CN-122027717-A - OISA protocol intelligent memory state feedback optimization method, device and equipment

CN 122027717 A

Abstract

The invention provides a method, apparatus, and device for optimizing state feedback in OISA-protocol intelligent memory access. By caching the latest sensed state information on the switching device in real time and updating, on the return path of the intelligent feedback message, the state information corresponding to the original bottleneck position, the method ensures that the source GPU receives the most current network state data, largely avoids erroneous flow-scheduling decisions based on outdated historical information, and markedly improves the accuracy of flow scheduling. With accurate, real-time state information, the source GPU can promptly adjust its sending strategy to match the actual network state, avoiding unnecessary rate reductions and fully utilizing link bandwidth. This meets the stringent high-bandwidth, low-latency requirements of All-to-All communication in large-model training scenarios and improves the overall transmission efficiency of GPU inter-card interconnects.

Inventors

  • YOU JUNPING
  • WANG CHAOQUN
  • YANG YI
  • LUO BIN

Assignees

  • 格创通信(浙江)有限公司

Dates

Publication Date
2026-05-12
Application Date
2026-04-15

Claims (10)

  1. An OISA-protocol intelligent memory access state feedback optimization method, applied to a switching device interconnecting GPU cards, the method comprising: S101, when the switching device receives an intelligent message on a detection path, caching the latest sensed state information of the current position; S102, monitoring in real time the intelligent feedback message on the detection path, parsing the sensing tag in the intelligent feedback message, and judging from the sensing-tag content whether the bottleneck position on the detection path matches the current position; S103, if they match, updating the latest sensed state information of the current position into the intelligent feedback message and forwarding the updated intelligent feedback message to the source end; and S104, if they do not match, forwarding the intelligent feedback message to the source end without updating it.
  2. The method of claim 1, wherein caching the latest sensed state information of the current position comprises: maintaining a sensed-state information table that caches the latest sensed state information of the current position, collected when triggered by an intelligent message, the table comprising: a location field recording the detection path and information about the sensed-state location; and a state field recording the latest sensed state information, comprising at least a sensing type and a state degree value; and wherein judging whether the bottleneck position on the detection path matches the current position according to the sensing-tag content comprises: parsing the bottleneck position from the header and sensing tag of the intelligent feedback message, matching the parsed bottleneck position against the content of the location field in the sensed-state information table, and judging from the matching result whether it coincides with the current position.
  3. The method of claim 2, wherein the location field of the sensed-state information table comprises at least a source device identifier, a source device port identifier, a destination device identifier, and a destination device port identifier; when the sensing type is 'least available bandwidth ratio (LABR)', the sensing granularity is a port of the switching device; when the sensing type is 'maximum single-hop delay (MSHD)', the sensing granularity is the switching device; and when the sensing type is 'maximum queue depth used (MQDU)', the sensing granularity is a virtual-channel queue of a port, and the location field further comprises a virtual-channel identifier.
  4. The method of claim 1, wherein a sensed-state feedback threshold corresponding to each sensing type is preset on the switching device; before updating the latest sensed state information of the current position into the intelligent feedback message, the switching device judges whether the state degree value of that information exceeds the sensed-state feedback threshold; if it exceeds the threshold, the state degree value in the sensing tag of the intelligent feedback message is replaced, and the latest sensed state information of the current position is fed back to the source end via the intelligent feedback message; if it does not exceed the threshold, the intelligent feedback message is intercepted and not forwarded to the source end.
  5. The method of claim 1, wherein the return path of the intelligent feedback message is controlled by the destination device through an Entropy field set in the intelligent feedback message; the worst state of the return path itself is not detected, and the sensing tag of the intelligent feedback message is compared and updated only at the bottleneck position detected by the intelligent message.
  6. An OISA-protocol intelligent memory access state feedback optimization apparatus, applied to a switching device interconnecting GPU cards, the apparatus comprising: a state caching module for caching the latest sensed state information of the current position upon receiving an intelligent message on the detection path; a message monitoring module for monitoring in real time the intelligent feedback message on the detection path and parsing its sensing tag; a comparison judging module for judging from the sensing-tag content whether the bottleneck position on the detection path matches the current position; a message updating module for updating the latest sensed state information of the current position into the intelligent feedback message when the bottleneck position matches the current position; and a message processing module for forwarding the updated intelligent feedback message to the source end, or performing normal forwarding for intelligent feedback messages that need no update.
  7. The apparatus of claim 6, wherein the state caching module is further configured to maintain a sensed-state information table that caches the latest sensed state information of the current position, collected when triggered by an intelligent message, the table comprising: a location field recording the detection path and information about the sensed-state location; and a state field recording the latest sensed state information, comprising at least a sensing type and a state degree value; and wherein the comparison judging module parses the bottleneck position from the header and sensing tag of the intelligent feedback message, matches the parsed bottleneck position against the content of the location field in the sensed-state information table, and judges from the matching result whether it coincides with the current position.
  8. The apparatus of claim 7, wherein the location field of the sensed-state information table comprises at least a source device identifier, a source device port identifier, a destination device identifier, and a destination device port identifier; when the sensing type is 'least available bandwidth ratio (LABR)', the sensing granularity is a port of the switching device; when the sensing type is 'maximum single-hop delay (MSHD)', the sensing granularity is the switching device; and when the sensing type is 'maximum queue depth used (MQDU)', the sensing granularity is a virtual-channel queue of a port, and the location field further comprises a virtual-channel identifier.
  9. The apparatus of claim 6, further comprising: a threshold maintenance module for maintaining, on the switching device, sensed-state feedback thresholds consistent with those of the GPUs in the network; wherein the message updating module is further configured to judge whether the state degree value of the current position's sensed state information exceeds the sensed-state feedback threshold, replacing the state degree value in the sensing tag of the intelligent feedback message if it does, and performing no replacement otherwise; and the message processing module forwards the updated intelligent feedback message to the source end when the latest state degree value of the current position exceeds the threshold, and intercepts the intelligent feedback message without forwarding it to the source end when it does not.
  10. A switching device, comprising a software/hardware processing unit running inside the switching chip to perform the method according to any one of claims 1 to 5.
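The switch-side flow of claims 1 and 4 can be sketched as follows. This is an illustrative model only, not the patented implementation: the class names, the table keyed by a (source device, source port, destination device, destination port) tuple, the single threshold value, and the convention that a larger state degree value means a worse state are all assumptions made for the sketch.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

# Hypothetical location key: (src_dev, src_port, dst_dev, dst_port), per the
# location-field contents of claims 2 and 3.
Location = Tuple[str, str, str, str]

@dataclass
class SensedState:
    sense_type: str   # e.g. "LABR", "MSHD", "MQDU"
    degree: float     # state degree value; assumed larger = worse

@dataclass
class FeedbackMsg:
    bottleneck: Location  # bottleneck position parsed from header + sensing tag
    tag: SensedState      # sensing tag carried back toward the source GPU

class Switch:
    def __init__(self, feedback_threshold: float):
        self.table: Dict[Location, SensedState] = {}  # sensed-state table (claim 2)
        self.threshold = feedback_threshold           # per-sensing-type in claim 4

    def on_probe(self, loc: Location, state: SensedState) -> None:
        # S101: cache the latest sensed state of the current position
        self.table[loc] = state

    def on_feedback(self, msg: FeedbackMsg, here: Location) -> Optional[FeedbackMsg]:
        # S102: compare the bottleneck position in the tag with the current position
        if msg.bottleneck != here or here not in self.table:
            return msg  # S104: no match -- forward the message unchanged
        latest = self.table[here]
        # Claim 4: forward only if the latest degree value still exceeds the threshold
        if latest.degree <= self.threshold:
            return None  # intercept: do not forward to the source end
        # S103: refresh the sensing tag with the latest state, then forward
        msg.tag = SensedState(latest.sense_type, latest.degree)
        return msg
```

In this sketch a stale degree value in a returning feedback message is overwritten with the switch's freshly cached value, which is the mechanism the abstract credits with preventing scheduling decisions based on outdated state.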

Description

OISA protocol intelligent memory state feedback optimization method, device and equipment

Technical Field

The invention relates to the field of high-performance computing cluster GPU interconnection communication, and in particular to an intelligent memory access state feedback optimization method based on the OISA protocol, a corresponding implementing apparatus, and switching equipment applying the method.

Background

As the scale and complexity of AI model applications continue to rise, vast computing clusters consisting of numerous GPUs have become the industry mainstream. OISA (Omni-directional Intelligent Sensing Express Architecture) is designed specifically for large-scale AI computing clusters, in particular GPU interconnection scenarios within super nodes, and provides a high-bandwidth, low-latency inter-card communication solution. The OISA protocol systematically strips away mechanisms that traditional network protocol stacks include for generality but that are redundant under the deterministic traffic model of an AI cluster, lightening the protocol stack, reducing message-header overhead, and increasing effective bandwidth. On this simplified protocol base, OISA adds deep optimizations for AI workloads, centered on advanced features such as intelligent in-transit sensing and Transaction Layer Packet (TLP) reconstruction, improving the system efficiency and operability of AI computing clusters. Within the OISA protocol system, the intelligent memory access mode is the core mechanism for realizing end-network cooperative flow optimization.
In the intelligent memory access mode, a sensing tag (Sensing Tag) is carried in the data message. A device port recognizes the transmission mode by the protocol type field and enables the path-following intelligent sensing function; when the message is received at or sent from the port, the corresponding port device updates its key state information into the sensing tag of the message in real time by compare-and-replace. The sensing tag defines key fields such as hop count, device port number, sensing type, and state degree value, and supports detection of multiple network states such as least available bandwidth ratio, maximum single-hop delay, and maximum queue depth used. It follows the 'keep the worst state' principle: the sensing-tag content in the message is updated only when the local state is worse, so that the tag records the performance bottleneck on the end-to-end transmission path. To close the state-sensing loop, the OISA protocol also defines a source-end active detection mechanism and a sensing-information feedback mechanism. The source-end active detection mechanism is an in-band, source-GPU-initiated collection of interconnect state information. The feedback mechanism standardizes the trigger rules and message-generation format for feedback, and the protocol sets a threshold-based feedback suppression mechanism: the destination GPU generates and sends a feedback message only when the sensed state degree value reaches or is worse than a preset threshold.
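The 'keep the worst state' compare-and-replace described above can be illustrated with a small sketch. The names, field layout, and the per-type direction of "worse" (smaller is worse for LABR, larger is worse for MSHD and MQDU) are illustrative assumptions, not the protocol's wire format:

```python
from dataclasses import dataclass

# Assumed direction of "worse" per sensing type: for LABR (least available
# bandwidth ratio) a smaller value is worse; for MSHD (max single-hop delay)
# and MQDU (max queue depth used) a larger value is worse.
WORSE_IF_SMALLER = {"LABR"}

@dataclass
class SensingTag:
    sense_type: str
    degree: float
    hop: int = 0
    port: str = ""  # device port number of the currently recorded bottleneck

def update_tag(tag: SensingTag, local_degree: float, local_port: str) -> SensingTag:
    """Compare-and-replace at one port: keep whichever state is worse."""
    smaller_is_worse = tag.sense_type in WORSE_IF_SMALLER
    local_worse = (local_degree < tag.degree) if smaller_is_worse \
        else (local_degree > tag.degree)
    if local_worse:
        tag.degree = local_degree
        tag.port = local_port
    tag.hop += 1
    return tag
```

As the data message traverses the path, each port calls such an update; on arrival the tag holds the path's worst observed state and the port where it occurred, which is exactly the bottleneck information the destination GPU later feeds back.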
However, in the scale-up networking scenario of an AI computing cluster, the port queue state, link bandwidth occupancy, forwarding delay, and other network states of the switch chip change rapidly and in real time as massive data from many GPU nodes is transmitted concurrently. The existing intelligent memory access mechanism uses an end-to-end closed loop: the path's 'worst state' is detected at the bottleneck position, carried to the destination GPU in an intelligent data message, and fed back from the destination GPU to the source GPU in an intelligent feedback message. By the moment the source GPU receives the intelligent feedback message, the network state on the detection path may have changed substantially; if the source GPU then adjusts its traffic path or sending rate according to the state carried in the feedback message, it may adopt a mismatched adjustment strategy, impairing the effectiveness of the intelligent message mechanism and reducing whole-network processing efficiency.

Disclosure of Invention

The invention aims to overcome the defects of lagging state feedback and inaccurate source-end flow-scheduling decisions in the existing OISA-protocol intelligent memory access mechanism, and provides an OISA-protocol intelligent memory access state feedback optimization method, apparatus, and device, by realizing the latest state cac