
CN-122001894-A - PCIe-based CPU, NPU and GPU interconnection cluster management method

CN 122001894 A

Abstract

The invention discloses a PCIe-based CPU, NPU and GPU interconnection cluster management method, and belongs to the technical field of interconnection cluster management. The method comprises: building interconnection links through a PCIe (Peripheral Component Interconnect Express) switch; dynamically allocating bandwidth over these links to adapt to task demands; supporting multi-priority data transmission, with bandwidth resources allocated in order of priority; pre-configuring multiple operation modes and allocating physical and virtual computing power resources on demand; establishing a device state monitoring mechanism so that idle devices enter a sleep mode; and, at the same time, surveying all transmission links and selecting an optimal path. The method achieves fine-grained scheduling of cluster resources, improves the utilization of bandwidth and computing power resources, enhances adaptability to different tasks, ensures orderly and stable data transmission, reduces the energy consumption of cluster operation, and helps the interconnected cluster run efficiently and stably.

Inventors

  • TANG CHAO

Assignees

  • 四川华鲲振宇智能科技有限责任公司 (Sichuan Huakun Zhenyu Intelligent Technology Co., Ltd.)

Dates

Publication Date
2026-05-08
Application Date
2026-02-28

Claims (10)

  1. A PCIe-based CPU, NPU and GPU interconnection cluster management method, characterized by comprising the following steps: S1, constructing interconnection links among a CPU, an NPU and a GPU through a PCIe switch, performing dynamic allocation of transmission bandwidth over the interconnection links, collecting task demands, and adjusting the bandwidth allocation in real time according to the task demands; S2, performing data transmission over the dynamically allocated bandwidth, supporting multi-priority data transmission, and granting bandwidth occupation rights in order of priority, wherein data transmission of high-priority tasks occupies bandwidth preferentially and data transmission of tasks of other priorities occupies the remaining bandwidth in turn; S3, pre-configuring a plurality of operation modes, selecting a corresponding mode according to the computing power requirements of different tasks, and allocating physical CPU, virtual CPU, physical NPU, virtual NPU, physical GPU or virtual GPU resources, wherein physical resources and virtual resources can be used in combination; S4, establishing a CPU, NPU and GPU usage state monitoring mechanism, monitoring the usage state of each device in real time, putting devices with no allocated resources into a sleep mode, and at the same time surveying all data transmission links among the devices, analyzing the link transmission parameters, and determining and adopting an optimal path for data transmission.
  2. The method according to claim 1, characterized in that step S1 comprises the sub-steps of: S1.1, deploying PCIe switches to build the interconnection links, defining the link connection relations, and surveying the direct and indirect connection links among the devices; S1.2, on the basis of the defined link connection relations, collecting the task demands of the current operation, determining the bandwidth demand of each task, and adjusting the bandwidth allocation proportions among the devices in real time in conjunction with the link transmission capacity; S1.3, recording the result of the bandwidth allocation adjustment and synchronizing it to the data transmission links.
  3. The method according to claim 1, characterized in that step S2 comprises the sub-steps of: S2.1, surveying all data to be transmitted, determining the task corresponding to each item of data, setting a priority identifier for each task distinguishing high, medium and low priority, and ordering the data by priority; S2.2, according to the priority ordering, preferentially allocating the dynamically allocated bandwidth to the data transmission of high-priority tasks; S2.3, after the high-priority data transmission has occupied its bandwidth, letting the data transmission of medium-priority and then low-priority tasks occupy in turn the remaining bandwidth, including bandwidth freed as higher-priority transmissions complete, until all data transmission is finished.
  4. The method according to claim 1, characterized in that step S3 comprises the sub-steps of: S3.1, pre-configuring CPU virtualization, NPU virtualization, GPU virtualization, CPU pass-through, NPU pass-through and GPU pass-through modes, and defining the parameter configuration standard of each mode; S3.2, acquiring the computing power demand parameters of each task, comparing them with the parameter configuration standards of the modes, analyzing the degree of fit, and selecting one mode or a combination of modes; S3.3, allocating the corresponding resources according to the selected mode or mode combination, defining the allocation quota and lifetime of each resource, and completing the allocation of computing power resources.
  5. The method according to claim 1, characterized in that step S4 comprises the sub-steps of: S4.1, establishing a device usage state monitoring mechanism, setting the monitoring period and indicators, collecting resource occupation data each period, and judging whether each device has been allocated task resources; S4.2, according to the judgment result, generating and sending a sleep control signal to devices with no allocated resources, surveying all possible links among the devices, and recording the transmission-related parameters of each link; S4.3, analyzing the recorded link transmission parameters, screening the links that meet preset requirements, determining an optimal path, and synchronizing the optimal path to the data transmission links.
  6. The method according to claim 1, wherein in step S1, transmission protocol parameters are configured when the interconnection links are established and are matched to the operating parameters of the PCIe switch; a data frame structure, a data check algorithm and a transmission rate adaptation rule are configured, wherein the data frame structure is adapted to the data interaction formats of the CPU, the NPU and the GPU, the integrity of transmitted data is checked by cyclic redundancy check (CRC), and a corresponding rate level is dynamically matched to the actual transmission capability of each segment of the interconnection link; after configuration is completed, all protocol parameters are synchronized to the dynamic bandwidth allocation step.
  7. The method according to claim 1, wherein in step S1, the data before and after each bandwidth adjustment is recorded in a log; the log records the unique task identifier, the execution time of the bandwidth adjustment, the number of the interconnection link concerned, the inter-device bandwidth allocation ratio before the adjustment, the inter-device bandwidth allocation ratio after the adjustment, and the trigger factor of the adjustment; the log data is stored in chronological order, with its storage path associated with the configuration information of the interconnection links, and is retrieved during subsequent task demand analysis to compare the bandwidth demand differences of different task types in different operation stages and to refine the adjustment logic of dynamic bandwidth allocation.
  8. The method according to claim 3, wherein in step S2.1, three factors are considered when setting the priority identifier: task operation dependency, data importance and execution urgency. The task operation dependency is determined by the execution order of the tasks, tasks upstream in an execution chain being set to a higher priority than the downstream tasks that depend on them; the data importance is determined by the influence of the data on the overall task result, tasks whose data has a wider influence being set to a higher priority; the execution urgency is determined by the preset completion deadline of the task, tasks with a shorter deadline being set to a higher priority. The priority identifier of a task is determined from the results for these three factors and is marked in the header of the data to be transmitted for that task.
  9. The method according to claim 4, wherein in step S3.1, when the mode parameter standards are configured, the resource invocation rules, the resource allocation upper limit and the resource sharing mechanism of each mode are defined; the full process of creating, scheduling and releasing resources is defined for the virtualization modes, and the binding between physical resources and the corresponding devices is defined for the pass-through modes; the maximum resource occupation of each operation mode is limited according to the total resources of a single device; the resource sharing mechanism is set only for the virtualization modes, allowing virtual resources of the same type to be switched and shared between different tasks as required; and the parameter standards of all operation modes are stored uniformly in a mode configuration library.
  10. The method according to claim 5, wherein in step S4.2, the recorded transmission parameters include link transmission distance, current data traffic, stability parameters and error rate data; the link transmission distance is calculated from the topology of the interconnection links, the current data traffic is counted from the number of data packets transmitted over the link per unit time, the stability parameters are derived from the number of link interruptions within a preset monitoring period and the recovery time after each interruption, and the error rate data is derived from the proportion of data packets failing the check during transmission to the total number of packets transmitted; all transmission parameters are recorded and classified by interconnection link number, and the parameter data is updated in real time according to the actual operating state of the links.
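The priority-ordered bandwidth allocation described in claims 1 to 3 can be illustrated with a minimal sketch. This is not the patented implementation; the function, task names and bandwidth figures are all hypothetical, and the sketch assumes each task simply takes as much of the remaining bandwidth as it demands, in priority order.

```python
# Hypothetical sketch of priority-ordered bandwidth allocation (claims 1-3).
# Tasks are ordered high -> medium -> low; each takes up to its demand
# from whatever bandwidth the higher-priority tasks left over.

PRIORITY_ORDER = {"high": 0, "medium": 1, "low": 2}

def allocate_bandwidth(total_bw, tasks):
    """tasks: list of (name, priority, demanded_bw). Returns {name: granted_bw}."""
    grants = {}
    remaining = total_bw
    for name, prio, demand in sorted(tasks, key=lambda t: PRIORITY_ORDER[t[1]]):
        granted = min(demand, remaining)
        grants[name] = granted
        remaining -= granted
    return grants

# Example: a 16 GB/s link shared by three tasks of different priorities.
print(allocate_bandwidth(16, [
    ("telemetry", "low", 4),
    ("inference", "high", 10),
    ("logging", "medium", 8),
]))
# -> {'inference': 10, 'logging': 6, 'telemetry': 0}
```

In this toy run the high-priority task is fully served, the medium-priority task receives only the 6 GB/s left over, and the low-priority task waits, matching the "remaining bandwidth in turn" behaviour of step S2.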
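The optimal-path screening of claims 5 and 10 can likewise be sketched: links are filtered by a preset error-rate requirement and the survivors are ranked on their recorded parameters. The scoring weights and thresholds below are illustrative assumptions, not values from the patent.

```python
# Hypothetical sketch of optimal-path screening (claims 5 and 10).
# Each candidate link carries the recorded parameters of claim 10:
# distance, current traffic, interruption count and error rate.

def best_link(links, max_error_rate=0.01):
    """links: list of dicts with keys 'id', 'distance', 'traffic',
    'interruptions', 'error_rate'. Returns the best link id, or None."""
    # Screening step: drop links that fail the preset error-rate requirement.
    candidates = [l for l in links if l["error_rate"] <= max_error_rate]
    if not candidates:
        return None
    # Lower is better for every parameter, so rank by a simple
    # weighted sum (the weights are illustrative assumptions).
    def score(l):
        return 1.0 * l["traffic"] + 5.0 * l["interruptions"] + 0.1 * l["distance"]
    return min(candidates, key=score)["id"]

links = [
    {"id": "direct",      "distance": 1, "traffic": 7.0, "interruptions": 0, "error_rate": 0.001},
    {"id": "via_switch2", "distance": 3, "traffic": 2.0, "interruptions": 0, "error_rate": 0.002},
    {"id": "degraded",    "distance": 1, "traffic": 0.5, "interruptions": 4, "error_rate": 0.05},
]
print(best_link(links))  # -> via_switch2
```

The "degraded" link is screened out on error rate despite its low traffic, and the lightly loaded indirect path beats the congested direct one, mirroring the screen-then-select flow of step S4.3.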
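The mode selection of claims 4 and 9 boils down to comparing a task's computing power demand against per-mode standards and choosing between pass-through and virtualization. A minimal sketch, under the assumption (ours, not the patent's) that a task filling a whole device gets the pass-through mode and anything smaller gets a virtualized slice:

```python
# Hypothetical sketch of operation-mode selection (claims 4 and 9).
# A demand that fills a whole device is bound to that physical device
# (pass-through); smaller demands are served by a virtual share.
# The 1.0 threshold is an illustrative assumption.

def select_mode(device, demand, device_capacity):
    """Return an operation-mode name for one device type.
    device: 'CPU', 'NPU' or 'GPU'; demand and device_capacity in the same unit."""
    if demand / device_capacity >= 1.0:
        return f"{device.lower()}_passthrough"   # bind the physical device
    return f"{device.lower()}_virtualized"       # allocate a virtual share

print(select_mode("GPU", 40, 40))  # -> gpu_passthrough
print(select_mode("GPU", 10, 40))  # -> gpu_virtualized
```

A real implementation would also enforce the per-mode quotas, lifetimes and sharing rules of claim 9; the sketch only shows the comparison step of S3.2.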

Description

PCIe-based CPU, NPU and GPU interconnection cluster management method

Technical Field

The invention relates to the technical field of interconnection cluster management, and in particular to a PCIe-based CPU, NPU and GPU interconnection cluster management method.

Background

With the rapid development of the digital economy, the demand for computing power keeps rising in fields such as artificial intelligence, big data processing and high-performance computing, and interconnected clusters with the CPU, NPU and GPU as core computing units are seeing ever wider use. PCIe (Peripheral Component Interconnect Express), by virtue of its high transmission rate, low latency and good compatibility, has become the dominant technology for building CPU, NPU and GPU interconnect clusters. At present, the industry commonly builds interconnection links to achieve multi-device cooperation, applies virtualization technology and physical pass-through modes within clusters, and runs many tasks concurrently, the tasks showing diverse requirements on bandwidth and computing power. Meanwhile, cluster management is gradually developing toward refinement and high efficiency, and the technical schemes for link configuration, resource scheduling, data transmission and other aspects are iterated continuously to adapt to increasingly complex application scenarios and meet the operating requirements of different tasks. Nevertheless, a series of technical problems remain in the current management of interconnected CPU, NPU and GPU clusters, and these affect operating efficiency.
First, after existing interconnection links are built, bandwidth is mostly allocated in fixed proportions or adjusted statically and cannot adapt dynamically to the real-time demands of tasks, so link resources mismatch task requirements: some tasks receive insufficient bandwidth while other resources sit idle. Second, the data transmission process lacks an effective priority mechanism; data of various tasks contend for bandwidth in a disorderly way, key tasks struggle to obtain priority transmission, and overall task progress suffers. Third, operation mode configuration is monolithic and does not fully combine the strengths of physical and virtual resources, making it hard to flexibly match the differing computing power demands of different tasks, so computing resource utilization is low. Fourth, comprehensive monitoring of device usage states is lacking, so idle devices cannot enter a low-power mode in time and energy is wasted. Finally, the surveying and optimization of data transmission links is insufficient: the optimal transmission path cannot be identified accurately, data transmission latency is high, and stability is poor. Together these problems constrain the overall operating efficiency and adaptability of interconnected clusters.

Disclosure of Invention

The invention aims to overcome the defects of the prior art and provides a PCIe-based CPU, NPU and GPU interconnection cluster management method.
The aim of the invention is achieved by the following technical scheme. The PCIe-based CPU, NPU and GPU interconnection cluster management method comprises the following steps: S1, constructing interconnection links among a CPU, an NPU and a GPU through a PCIe switch, performing dynamic allocation of transmission bandwidth over the interconnection links, collecting task demands, and adjusting the bandwidth allocation in real time according to the task demands; S2, performing data transmission over the dynamically allocated bandwidth, supporting multi-priority data transmission, and granting bandwidth occupation rights in order of priority, wherein data transmission of high-priority tasks occupies bandwidth preferentially and data transmission of tasks of other priorities occupies the remaining bandwidth in turn; S3, pre-configuring a plurality of operation modes, selecting a corresponding mode according to the computing power requirements of different tasks, and allocating physical CPU, virtual CPU, physical NPU, virtual NPU, physical GPU or virtual GPU resources, wherein physical resources and virtual resources can be used in combination; S4, establishing a CPU, NPU and GPU usage state monitoring mechanism, monitoring the usage state of each device in real time, putting devices with no allocated resources into a sleep mode, and at the same time surveying all data transmission links among the devices, analyzing the link transmission parameters, and determining and adopting an optimal path for data transmission. Further, step S1 comprises the sub-s