EP-4738790-A1 - SYSTEMS AND METHODS FOR MANAGING DEVICE FAILOVER AND DATA ROUTING IN NETWORK SYSTEMS
Abstract
The subject technology is directed to a switch apparatus for managing device states and data traffic between multiple devices. In an embodiment, the switch apparatus includes a first port coupled to a first device and a second port coupled to a second device. The apparatus further includes a controller configured to assign an active state to the first device and a passive state to the second device. The apparatus further includes a scheduler configured to monitor the operational status of the first device and detect a failure. Upon detecting the failure, the controller reassigns the active state to the second device and the passive state to the first device, ensuring continuous data traffic flow and reducing downtime through dynamic switching. There are other embodiments as well.
Inventors
- Khaparde, Ajit Kumar
Assignees
- Avago Technologies International Sales Pte. Limited
Dates
- Publication Date: 2026-05-06
- Application Date: 2025-10-27
Claims (15)
- A switch apparatus comprising: a first port coupled to a first device; a second port coupled to a second device; a controller coupled to the first port, the controller being configured to assign an active state to the first device and a passive state to the second device; a scheduler coupled to the controller, the scheduler being configured to monitor an operational status of the first device by detecting a first failure associated with the first device; and a routing unit coupled to the controller, the routing unit being configured to determine a first routing path between the first device and the first port for managing data traffic; wherein in response to the scheduler detecting the first failure, the controller is configured to reassign the active state from the first device to the second device and the passive state from the second device to the first device, and the routing unit is configured to determine a second routing path between the second device and the second port.
- The apparatus of claim 1, wherein the routing unit comprises a route table configured to store the first routing path.
- The apparatus of claim 2, wherein the routing unit is further configured to update the route table to store the second routing path.
- The apparatus of any one of the claims 1 to 3, wherein the first device comprises a first network interface card (NIC) and the second device comprises a second NIC.
- The apparatus of any one of the claims 1 to 4, wherein the scheduler is configured to monitor the operational status of the first device based on a predefined time interval.
- The apparatus of any one of the claims 1 to 5, wherein the first failure is detected based on a loss of electrical connectivity between the first device and the first port.
- The apparatus of any one of the claims 1 to 5, wherein the first failure is detected based on an error in a configuration space of the first device.
- The apparatus of any one of the claims 1 to 5, wherein the first failure is detected based on a success rate of data transactions between the first device and the first port.
- The apparatus of any one of the claims 1 to 8, further comprising a third port coupled to a third device.
- The apparatus of claim 9, further comprising at least one of the following features: (A) the first device is configured to perform a direct memory access (DMA) transfer to the third device; (B) the third device comprises a graphics processing unit (GPU); (C) the third device comprises a storage device.
- The apparatus of any one of the claims 1 to 10, wherein the first device is coupled to the second device via a peripheral component interconnect express (PCIe) interface.
- The apparatus of any one of the claims 1 to 11, further comprising a fourth port coupled to a host, and the controller being configured to communicate the active state of the first device to the host.
- A method comprising: assigning, by a controller, an active state to a first device coupled to a first port and a passive state to a second device coupled to a second port; monitoring, by a scheduler, an operational status of the first device; determining, by a routing unit, a first routing path between the first device and the first port for managing data traffic; in response to detecting a first failure associated with the first device, reassigning the active state to the second device and the passive state to the first device; and determining, by the routing unit, a second routing path between the second device and the second port for managing data traffic.
- The method of claim 13, wherein the first device comprises a first network interface card (NIC) and the second device comprises a second NIC.
- The method of claim 13 or 14, wherein the first failure is detected based on a loss of electrical connectivity between the first device and the first port.
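The failure conditions recited in the claims above (loss of electrical connectivity, an error in the configuration space, and the success rate of data transactions, checked at a predefined time interval) can be illustrated with a minimal sketch. This is not the claimed implementation. The names `DeviceStatus`, `detect_failure`, and `poll`, the reason strings, and the 0.9 success-rate threshold are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class DeviceStatus:
    """One status sample for a device (hypothetical structure)."""
    link_up: bool            # electrical connectivity between device and port
    config_space_ok: bool    # no error found in the device's configuration space
    transactions_ok: int     # successful data transactions in the last interval
    transactions_total: int  # attempted data transactions in the last interval


def detect_failure(status, min_success_rate=0.9):
    """Return a reason string if a failure is detected, otherwise None.

    The 0.9 default threshold is an assumption, not a value from the source.
    """
    if not status.link_up:
        return "link_loss"
    if not status.config_space_ok:
        return "config_space_error"
    if status.transactions_total > 0:
        rate = status.transactions_ok / status.transactions_total
        if rate < min_success_rate:
            return "low_success_rate"
    return None


def poll(samples, min_success_rate=0.9):
    """Check one sample per monitoring interval and stop at the first failure.

    Returns (interval_index, reason), or None if no sample shows a failure.
    """
    for tick, status in enumerate(samples):
        reason = detect_failure(status, min_success_rate)
        if reason is not None:
            return tick, reason
    return None


samples = [
    DeviceStatus(True, True, 100, 100),  # healthy interval
    DeviceStatus(True, True, 50, 100),   # half of the transactions failed
]
print(poll(samples))
```

In this sketch each element of `samples` stands in for one reading taken at the predefined interval. A real scheduler would obtain these readings from the switch hardware rather than from a list.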
Description
BACKGROUND OF THE INVENTION
In modern computing and networking environments, reliable and efficient communication between devices is important for maintaining system performance and uptime. Many systems involve multiple devices, such as network interface cards (NICs), storage devices, and processing units, that work together to handle high-volume data traffic. These devices may be interconnected through switches, which manage data routing between devices and external systems, including host systems and other endpoints.
Some approaches for data transfer between devices rely on direct memory access (DMA), which allows devices to access memory directly without burdening the central processing unit (CPU). This improves overall efficiency by reducing processing overhead and enabling faster data transfers. For instance, peripheral component interconnect express (PCIe) is a standard that supports high-speed communication between devices, such as NICs, processing units, and storage controllers. PCIe enables direct connections between devices via a bus structure, facilitating efficient data flow between multiple endpoints through switches.
As systems become more complex, especially with high-performance workloads such as artificial intelligence (AI) and machine learning (ML), the likelihood of device failures increases. These workloads often rely on multiple devices working together in a coordinated manner, and a failure in one device can have cascading effects throughout the system. For example, when a NIC that transfers data to one or more processing units fails, the associated processing units may be left unused, causing a loss of processing power and reducing overall system efficiency.
Various approaches for addressing device failure in complex systems have been explored, but they have proven to be insufficient. It is important to recognize the need for new and improved systems and methods.
BRIEF DESCRIPTION OF THE DRAWINGS
A further understanding of the nature and advantages of particular embodiments may be realized by reference to the remaining portions of the specification and the drawings, in which like reference numerals are used to refer to similar components. In some instances, a sub-label is associated with a reference numeral to denote one of multiple similar components. When reference is made to a reference numeral without specification to an existing sub-label, it is intended to refer to all such multiple similar components.
Figure 1 is a block diagram illustrating an architecture of a computing system, in accordance with various embodiments of the subject technology.
Figure 2 is a block diagram illustrating an architecture of a computing system, in accordance with various embodiments of the subject technology.
Figure 3 is a block diagram illustrating a switch apparatus, in accordance with various embodiments of the subject technology.
DETAILED DESCRIPTION OF THE INVENTION
The subject technology is directed to a switch apparatus for managing device states and data traffic between multiple devices. In an embodiment, the switch apparatus includes a first port coupled to a first device and a second port coupled to a second device. The apparatus further includes a controller configured to assign an active state to the first device and a passive state to the second device. The apparatus further includes a scheduler configured to monitor the operational status of the first device and detect a failure. Upon detecting the failure, the controller reassigns the active state to the second device and the passive state to the first device, ensuring continuous data traffic flow and reducing downtime through dynamic switching. There are other embodiments as well.
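The active/passive reassignment described above, together with the routing unit and route table recited in the claims, can be sketched in a few lines. This is an illustrative model under stated assumptions, not the disclosed implementation. The class names `State`, `Port`, and `Switch`, the method `on_failure`, the route-table key `"data"`, and the device names `nic0` and `nic1` are all hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum


class State(Enum):
    ACTIVE = "active"
    PASSIVE = "passive"


@dataclass
class Port:
    port_id: int
    device: str  # name of the device coupled to this port


@dataclass
class Switch:
    first: Port
    second: Port
    states: dict = field(default_factory=dict)       # device name -> State
    route_table: dict = field(default_factory=dict)  # traffic key -> (device, port_id)

    def __post_init__(self):
        # Controller role: the first device starts active, the second passive.
        self.states[self.first.device] = State.ACTIVE
        self.states[self.second.device] = State.PASSIVE
        # Routing unit role: store the first routing path in the route table.
        self.route_table["data"] = (self.first.device, self.first.port_id)

    def on_failure(self, failed_device):
        """Handle a failure reported by the scheduler for the given device."""
        if self.states.get(failed_device) is not State.ACTIVE:
            return  # only a failure of the active device triggers failover
        standby = self.second if failed_device == self.first.device else self.first
        # Controller role: swap the active and passive states.
        self.states[failed_device] = State.PASSIVE
        self.states[standby.device] = State.ACTIVE
        # Routing unit role: determine the second path and update the table.
        self.route_table["data"] = (standby.device, standby.port_id)


switch = Switch(Port(1, "nic0"), Port(2, "nic1"))
switch.on_failure("nic0")
print(switch.states["nic1"], switch.route_table["data"])
```

Keeping the state swap and the route-table update in one method mirrors the text, where both happen in response to the same detected failure. A hardware switch would likely perform these steps in separate units, as the claims describe.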
One general aspect includes a switch apparatus, which comprises: a first port coupled to a first device; a second port coupled to a second device; a controller coupled to the first port, the controller being configured to assign an active state to the first device and a passive state to the second device; a scheduler coupled to the controller, the scheduler being configured to monitor an operational status of the first device by detecting a first failure associated with the first device; and a routing unit coupled to the controller, the routing unit being configured to determine a first routing path between the first device and the first port for managing data traffic, the routing unit comprising a route table configured to store the first routing path. In response to the scheduler detecting the first failure, the controller is configured to reassign the active state from the first device to the second device and the passive state from the second device to the first device, and the routing unit is configured to determine a second routing path between the second device and the second port and update the route table to store the second routing path. Implementations may include one or more of the following features. The first device comprises a first network inter