US-20260127041-A1 - Unified Device Presentation for Hot-Plug Events in Reconfigurable Data Flow

Abstract

A data processing system includes a pool of reconfigurable data flow resources having a plurality of reconfigurable processors interconnected via a bus, a controller, and a runtime processor. The controller monitors ports of the bus connected to respective reconfigurable processors and generates hot-plug events in response to detecting disconnection or addition of reconfigurable processors. The runtime processor includes a kernel module with a device abstraction module that presents all reconfigurable processors to user space as a single virtual device file, and keeps this unified presentation unchanged as the pool changes. Hot-plug events are transmitted as interrupts to a daemon module, which initializes the clocks, bus interfaces, and memory resources of added processors. The system supports distributed hot-plug controllers, one at each bus port, that communicate with a controller service and driver. A resource manager and a scheduler dynamically adjust hardware resource availability and configuration file mapping in response to hot-plug events while execution of user applications continues.
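The unified presentation described in the abstract can be illustrated with a minimal sketch. Everything named here is hypothetical and not taken from the publication: the class `UnifiedDevice`, the path `/dev/rdu_pool`, and the method names are assumptions chosen only to show the idea that the pool behind the device file may grow or shrink while the handle seen by user space stays the same.

```python
from dataclasses import dataclass, field


@dataclass
class UnifiedDevice:
    """Hypothetical stand-in for the single virtual device file seen by user space."""

    # Assumed device path for illustration; the source does not name one.
    path: str = "/dev/rdu_pool"
    # Maps a bus port number to the state of the processor attached to it.
    processors: dict = field(default_factory=dict)

    def on_hot_plug(self, port: int, present: bool) -> None:
        """Apply a hot-plug event. Only the pool changes; the path does not."""
        if present:
            self.processors[port] = "available"
        else:
            # Removing an absent port is a no-op rather than an error.
            self.processors.pop(port, None)

    def capacity(self) -> int:
        """Number of reconfigurable processors currently behind the device."""
        return len(self.processors)
```

In this sketch a user application would keep using the same `path` before and after an event, and only `capacity()` would reflect the change in the pool.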

Inventors

  • Anand Misra
  • Conrad Alexander TURLIK
  • Maran Wilson
  • Anand Vayyala
  • Raghu Shenbagam
  • Ranen Chatterjee
  • Pushkar Shridhar Nandkar
  • Shivam Raikundalia

Assignees

  • SambaNova Systems, Inc.

Dates

Publication Date: 2026-05-07
Application Date: 2025-12-30

Claims (16)

  1. A data processing system, comprising: a pool of reconfigurable data flow resources having a plurality of reconfigurable processors interconnected via a bus, each reconfigurable processor including arrays of physical configurable units; a controller connected to the bus and configured to: monitor a plurality of ports of the bus, wherein each port of the plurality of ports is connected to a respective reconfigurable processor of the plurality of reconfigurable processors; detect a disconnection of at least one reconfigurable processor from a corresponding port of the plurality of ports; and generate a hot-plug event in response to detecting the disconnection; and a runtime processor connected to the controller and configured to: receive the hot-plug event from the controller; determine that the at least one reconfigurable processor is unallocated; and make the at least one reconfigurable processor unavailable for subsequent allocations while continuing execution of user applications on other reconfigurable processors of the plurality of reconfigurable processors.
  2. The data processing system of claim 1, wherein the bus comprises at least one of a Peripheral Component Interconnect Express (PCIe) bus, a Universal Serial Bus (USB), or an Inter-Integrated Circuit (I2C) bus.
  3. The data processing system of claim 1, wherein the controller is implemented as a master controller that monitors all ports of the plurality of ports.
  4. The data processing system of claim 1, wherein the controller comprises: a controller service and driver; and a plurality of distributed hot-plug controllers, each distributed hot-plug controller associated with a respective port of the plurality of ports, wherein each distributed hot-plug controller is configured to communicate with the controller service and driver to notify the controller service and driver of changes at the respective port.
  5. The data processing system of claim 1, wherein the disconnection of the at least one reconfigurable processor is reactive to an error event, the error event comprising at least one of: a single-event upset (SEU) in a configuration memory; a single-event latch-up (SEL); a single-event gate rupture (SEGR); or a single-event burnout (SEB).
  6. The data processing system of claim 1, wherein the hot-plug event is transmitted to a kernel module of the runtime processor as an interrupt.
  7. The data processing system of claim 6, wherein the kernel module is configured to transmit the hot-plug event as an interrupt to a daemon module in user space.
  8. A data processing system, comprising: a pool of reconfigurable data flow resources having a plurality of reconfigurable processors, each reconfigurable processor including arrays of physical configurable units; a controller connected to the pool of reconfigurable data flow resources and configured to generate an addition hot-plug event in response to detecting an addition of an other reconfigurable processor to the pool of reconfigurable data flow resources; and a runtime processor connected to the controller and configured to: receive the addition hot-plug event from the controller indicating the addition of the other reconfigurable processor; execute an initialization of clocks, bus interfaces, and memory resources of arrays of physical configurable units in the other reconfigurable processor; and make the other reconfigurable processor available for subsequent allocations of subsequent virtual data flow resources and subsequent executions of subsequent user applications, while a subset of the plurality of reconfigurable processors continues execution of user applications.
  9. The data processing system of claim 8, wherein the other reconfigurable processor is at least one of a previously removed reconfigurable processor from the pool of reconfigurable data flow resources or a newly added reconfigurable processor.
  10. The data processing system of claim 8, wherein the addition hot-plug event is transmitted to a module in the runtime processor as an interrupt, and the module is configured to respond to the interrupt by transmitting a file descriptor data structure using an input-output control (IOCTL) system call, wherein the file descriptor data structure specifies the initialization of the clocks, the bus interfaces, and the memory resources of the arrays of physical configurable units in the other reconfigurable processor.
  11. The data processing system of claim 8, wherein the bus interfaces include at least one of a peripheral component interconnect express (PCIe) channel, a direct memory access (DMA) channel, a double data rate (DDR) channel, an InfiniBand channel, or an Ethernet channel.
  12. The data processing system of claim 8, wherein the memory resources include at least one of a main memory, a local secondary storage, or a remote secondary storage.
  13. The data processing system of claim 8, wherein the runtime processor comprises: a daemon module including an event manager and a local fabric initializer, wherein the event manager directs the local fabric initializer to initialize the clocks, the bus interfaces, and the memory resources of the arrays of physical configurable units in the other reconfigurable processor.
  14. A data processing system, comprising: a pool of reconfigurable data flow resources having a plurality of reconfigurable processors interconnected using a peripheral component interconnect express (PCIe) bus, each reconfigurable processor including arrays of physical configurable units; a controller connected to the pool of reconfigurable data flow resources and configured to generate a hot-plug event in response to detecting a removal of a virtual function on an allocated array of physical configurable units in the pool of reconfigurable data flow resources; and a runtime processor connected to the controller and configured to: receive the hot-plug event from the controller indicating the removal of the virtual function; and make the virtual function unavailable for subsequent allocation of subsequent virtual data flow resources and subsequent execution of subsequent user applications, while other allocated arrays of physical configurable units continue execution of user applications, wherein the virtual function is initialized using a single-root input-output virtualization (SR-IOV) interface.
  15. The data processing system of claim 14, wherein the hot-plug event is transmitted to a module in the runtime processor as an interrupt, and the module is configured to respond to the interrupt by transmitting a file descriptor data structure using an input-output control (IOCTL) system call, wherein the file descriptor data structure specifies that the virtual function is unavailable for the subsequent allocation of the subsequent virtual data flow resources and the subsequent execution of the subsequent user applications.
  16. The data processing system of claim 14, wherein the controller is further configured to generate an additional hot-plug event in response to detecting an addition of a second virtual function on an initialized array of physical configurable units in the pool of reconfigurable data flow resources, and wherein the runtime processor is further configured to: receive the additional hot-plug event from the controller indicating the addition of the second virtual function; and make the second virtual function available for subsequent allocation of subsequent virtual data flow resources and subsequent execution of subsequent user applications.
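The runtime behavior recited in independent claims 1 and 8 can be sketched as a small state machine. This is an illustrative sketch only, not the claimed implementation: the names `State`, `RuntimeSketch`, `on_removal`, `on_addition`, and the `init_log` list are all assumptions, and the initialization of clocks, bus interfaces, and memory resources is reduced to log entries.

```python
from enum import Enum


class State(Enum):
    AVAILABLE = "available"      # in the pool and free to allocate
    ALLOCATED = "allocated"      # running a user application
    UNAVAILABLE = "unavailable"  # removed from future allocations


class RuntimeSketch:
    """Hypothetical runtime processor tracking per-processor allocation state."""

    def __init__(self, processor_ids):
        self.state = {pid: State.AVAILABLE for pid in processor_ids}
        self.init_log = []  # records the (assumed) initialization steps

    def allocate(self):
        """Allocate the first available processor, or return None."""
        for pid, st in self.state.items():
            if st is State.AVAILABLE:
                self.state[pid] = State.ALLOCATED
                return pid
        return None

    def on_removal(self, pid):
        """Claim 1 path: an unallocated processor becomes unavailable.

        Allocated processors are left untouched here, because claim 1
        only covers the unallocated case.
        """
        if self.state.get(pid) is State.AVAILABLE:
            self.state[pid] = State.UNAVAILABLE
            return True
        return False

    def on_addition(self, pid):
        """Claim 8 path: initialize, then make the processor available."""
        for step in ("clocks", "bus_interfaces", "memory_resources"):
            self.init_log.append((pid, step))
        self.state[pid] = State.AVAILABLE
```

Note that neither handler touches the state of other processors, which mirrors the claim language that user applications on the rest of the pool continue executing.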

Description

CROSS-REFERENCE TO RELATED APPLICATIONS

This application is a continuation of U.S. patent application Ser. No. 18/083,403, entitled, “Hot-Plug Events In a Pool of Reconfigurable Data Flow Resources,” filed on Dec. 16, 2022.

RELATED APPLICATIONS AND REFERENCES

The following are incorporated by reference for all purposes as if fully set forth herein:

  • Prabhakar et al., “Plasticine: A Reconfigurable Architecture for Parallel Patterns,” ISCA '17, Jun. 24-28, 2017, Toronto, ON, Canada;
  • Koeplinger et al., “Spatial: A Language And Compiler For Application Accelerators,” Proceedings Of The 39th ACM SIGPLAN Conference On Programming Language Design And Implementation (PLDI), Proceedings of the 43rd International Symposium on Computer Architecture, 2018;
  • U.S. Nonprovisional patent application Ser. No. 16/239,252, now U.S. Pat. No. 10,698,853 B1, filed Jan. 3, 2019, entitled, “VIRTUALIZATION OF A RECONFIGURABLE DATA PROCESSOR;”
  • U.S. Nonprovisional patent application Ser. No. 16/197,826, now U.S. Pat. No. 10,831,507 B2, filed Nov. 21, 2018, entitled, “CONFIGURATION LOAD OF A RECONFIGURABLE DATA PROCESSOR;”
  • U.S. Nonprovisional patent application Ser. No. 16/198,086, now U.S. Pat. No. 11,188,497 B2, filed Nov. 21, 2018, entitled, “CONFIGURATION UNLOAD OF A RECONFIGURABLE DATA PROCESSOR;”
  • U.S. Nonprovisional patent application Ser. No. 16/260,548, now U.S. Pat. No. 10,768,899 B2, filed Jan. 29, 2019, entitled, “MATRIX NORMAL/TRANSPOSE READ AND A RECONFIGURABLE DATA PROCESSOR INCLUDING SAME;”
  • U.S. Nonprovisional patent application Ser. No. 16/536,192, now U.S. Pat. No. 11,080,227 B2, filed Aug. 8, 2019, entitled, “COMPILER FLOW LOGIC FOR RECONFIGURABLE ARCHITECTURES;”
  • U.S. Nonprovisional patent application Ser. No. 16/407,675, now U.S. Pat. No. 11,386,038 B2, filed May 9, 2019, entitled, “CONTROL FLOW BARRIER AND RECONFIGURABLE DATA PROCESSOR;”
  • U.S. Nonprovisional patent application Ser. No. 16/504,627, now U.S. Pat. No. 11,055,141 B2, filed Jul. 8, 2019, entitled, “QUIESCE RECONFIGURABLE DATA PROCESSOR;”
  • U.S. Nonprovisional patent application Ser. No. 16/572,516, filed Sep. 16, 2019, entitled, “EFFICIENT EXECUTION OF OPERATION UNIT GRAPHS ON RECONFIGURABLE ARCHITECTURES BASED ON USER SPECIFICATION;”
  • U.S. Nonprovisional patent application Ser. No. 16/744,077, filed Jan. 15, 2020, entitled, “COMPUTATIONALLY EFFICIENT SOFTMAX LOSS GRADIENT BACKPROPAGATION;”
  • U.S. Nonprovisional patent application Ser. No. 16/590,058, now U.S. Pat. No. 11,327,713 B2, filed Oct. 1, 2019, entitled, “COMPUTATION UNITS FOR FUNCTIONS BASED ON LOOKUP TABLES;”
  • U.S. Nonprovisional patent application Ser. No. 16/695,138, now U.S. Pat. No. 11,328,038 B2, filed Nov. 25, 2019, entitled, “COMPUTATION UNITS FOR BATCH NORMALIZATION;”
  • U.S. Nonprovisional patent application Ser. No. 16/688,069, now U.S. Pat. No. 11,327,717 B2, filed Nov. 19, 2019, entitled, “LOOK-UP TABLE WITH INPUT OFFSETTING;”
  • U.S. Nonprovisional patent application Ser. No. 16/718,094, now U.S. Pat. No. 11,150,872 B2, filed Dec. 17, 2019, entitled, “COMPUTATION UNITS FOR ELEMENT APPROXIMATION;”
  • U.S. Nonprovisional patent application Ser. No. 16/560,057, now U.S. Pat. No. 11,327,923 B2, filed Sep. 4, 2019, entitled, “SIGMOID FUNCTION IN HARDWARE AND A RECONFIGURABLE DATA PROCESSOR INCLUDING SAME;”
  • U.S. Nonprovisional patent application Ser. No. 16/572,527, now U.S. Pat. No. 11,410,027 B2, filed Sep. 16, 2019, entitled, “PERFORMANCE ESTIMATION-BASED RESOURCE ALLOCATION FOR RECONFIGURABLE ARCHITECTURES;”
  • U.S. Nonprovisional patent application Ser. No. 15/930,381, now U.S. Pat. No. 11,250,105 B2, filed May 12, 2020, entitled, “COMPUTATIONALLY EFFICIENT GENERAL MATRIX-MATRIX MULTIPLICATION (GeMM);”
  • U.S. Nonprovisional patent application Ser. No. 16/890,841, filed Jun. 2, 2020, entitled, “ANTI-CONGESTION FLOW CONTROL FOR RECONFIGURABLE PROCESSORS;”
  • U.S. Nonprovisional patent application Ser. No. 16/922,975, filed Jul. 7, 2020, entitled, “RUNTIME VIRTUALIZATION OF RECONFIGURABLE DATA FLOW RESOURCES;” and
  • U.S. Nonprovisional patent application Ser. No. 17/554,913, filed Dec. 17, 2021, entitled, “HOT-PLUG EVENTS IN A POOL OF RECONFIGURABLE DATA FLOW RESOURCES.”

FIELD OF THE TECHNOLOGY DISCLOSED

The present technology relates to hot-plug events in a pool of reconfigurable data flow resources, and more particularly to the hot-plug removal of reconfigurable data flow resources from the pool of reconfigurable data flow resources and/or the hot-plug insertion of reconfigurable data flow resources to the pool of reconfigurable data flow resources. Such hot-plug events in the pool of reconfigurable data flow resources are particularly applicable to cloud offerings of coarse-grained reconfigurable architectures (CGRAs).

BACKGROUND

The subject matter discussed in this section should not be assumed to be prior art merely as a result of its mention in this section. Similarly, a problem mentioned in this section or associated with the subject matter provided as background should not be assumed to have been previously recognized in the pr