CN-122027590-A - Network controller, electronic equipment and message management method

Abstract

Embodiments of the invention disclose a network controller, an electronic device, and a message management method, which relate to the technical field of communication and can effectively save cache resources and improve cache utilization. The network controller comprises a shared cache and a cache management component. The shared cache is coupled to at least two sending queues and is configured to cache the sent messages of the at least two sending queues. The cache management component is coupled to the shared cache and is configured to manage the sent messages in the shared cache according to index information of each sent message, wherein the index information comprises a queue identifier of the sending queue corresponding to the sent message, a message number of the sent message, and a storage position of the sent message in the shared cache.

Inventors

  • XIE NING

Assignees

  • 海光信息技术(成都)有限公司

Dates

Publication Date
20260512
Application Date
20260112

Claims (20)

  1. A network controller, comprising: a shared cache, configured to be coupled to at least two sending queues and configured to cache sent messages of the at least two sending queues; and a cache management component, coupled to the shared cache and configured to manage the sent messages in the shared cache according to index information of each sent message, wherein the index information comprises a queue identifier of the sending queue corresponding to the sent message, a message number of the sent message, and a storage position of the sent message in the shared cache.
  2. The network controller of claim 1, wherein the cache management component is specifically configured to perform, according to the index information and a feedback message from a receiver of the sent message, one of the following operations: clearing the corresponding sent message from the shared cache according to the feedback message; or searching the shared cache for the sent message indicated by the feedback message so as to resend that message.
  3. The network controller of claim 2, wherein the shared cache comprises a plurality of cache blocks, and the cache management component comprises: a cache state marking module configured to mark whether each cache block is occupied; and a first indexing module configured to: obtain the feedback message from the receiver of the sent message, wherein the feedback message comprises a first feedback message carrying a first queue identifier and a first message number, the first queue identifier being the queue identifier of the sending queue corresponding to a successfully received sent message, and the first message number being the message number of the successfully received sent message; search the shared cache for a first target cache block according to the first feedback message and the index information, wherein the first target cache block comprises the cache block of each sent message whose queue identifier indicated by the index information is identical to the first queue identifier and whose message number indicated by the index information is before or equal to the first message number; and set the state corresponding to the first target cache block in the cache state marking module to unoccupied.
  4. The network controller of claim 3, wherein the queue identifier and the message number in the index information are stored in a first array, the array elements of the first array correspond one-to-one to the cache blocks of the shared cache, and the queue identifier and the message number of the sent message stored in each cache block form one array element of the first array.
  5. The network controller of claim 2, wherein the shared cache comprises a plurality of cache blocks, and the cache management component comprises: a cache state marking module configured to mark whether each cache block is occupied; and a second indexing module configured to: obtain the feedback message from the receiver of the sent message, wherein the feedback message comprises a second feedback message carrying a second queue identifier and a second message number, the second queue identifier being the queue identifier of the sending queue corresponding to a sent message that was received in error, and the second message number being the message number of that sent message; and search the shared cache for a second target cache block according to the second feedback message and the index information, so as to resend the sent message stored in the second target cache block, wherein the second target cache block is the cache block of the sent message whose queue identifier indicated by the index information is the same as the second queue identifier and whose message number indicated by the index information is the same as the second message number.
  6. The network controller of claim 2, wherein the cache management component is further configured to: determine that a sent message has timed out if no feedback message from the receiver of the sent message is received within a preset time period after the sent message is sent; and search the shared cache for the timed-out sent message according to its index information so as to resend the message.
  7. The network controller of claim 5 or 6, wherein each sent message occupies N cache blocks, N being a positive integer, and the index information is stored in a second array, each array element of the second array comprising the queue identifier of the sending queue corresponding to the sent message, the message number of the sent message, and the addresses of the cache blocks occupied by the sent message in the shared cache, listed in storage order, so that the sent message is resent according to the storage order.
  8. The network controller of claim 5 or 6, wherein each sent message occupies N cache blocks, and the index information is stored in a third array and a fourth array, wherein the number of array elements of the third array is smaller than the number of cache blocks in the shared cache, the number of array elements of the fourth array is equal to the number of cache blocks in the shared cache, and the array elements of the fourth array correspond one-to-one to the cache blocks of the shared cache; the third array is configured to store, for each sent message, the queue identifier of the corresponding sending queue, the message number of the sent message, and a head pointer of the sent message, the head pointer indicating the address of the first cache block occupied by the sent message; and the fourth array is configured to store the addresses of the cache blocks occupied by each sent message other than the first cache block, so that the cache management component performs the following operations: according to the correspondence between the array elements of the fourth array and the cache blocks of the shared cache, looking up the array element of the fourth array that corresponds to the first cache block to obtain a first element, wherein the first element contains the address of the next cache block occupied by the sent message; taking the next cache block as the new first cache block and repeating the lookup in the fourth array; and iterating until the address of the next cache block in the first element found is a preset end mark, so as to resend the sent message in the order in which its cache blocks were found.
  9. The network controller of any one of claims 1 to 4, wherein the cache management component is further configured to, in response to receiving a new message storage request from any one of the sending queues, store the sent message to be stored into the shared cache and record the index information of the stored sent message.
  10. The network controller of claim 9, wherein the shared cache comprises a plurality of cache blocks, and the cache management component comprises: a cache state marking module configured to mark whether each cache block is occupied; and a storage control module configured to, in response to receiving a new message storage request from any one of the sending queues, store the sent message to be stored into at least one unoccupied cache block of the shared cache and mark the state of the at least one cache block in the cache state marking module as occupied.
  11. The network controller of claim 10, wherein the storage control module comprises: an available cache statistics sub-module configured to traverse the occupancy state of each cache block in the cache state marking module and obtain the addresses of the unoccupied cache blocks as available cache addresses; and a storage control sub-module configured to, in response to receiving a new message storage request from any one of the sending queues, store the sent message to be stored into at least one cache block of the shared cache according to the available cache addresses collected by the available cache statistics sub-module, and mark the state of the at least one cache block in the cache state marking module as occupied.
  12. The network controller of claim 11, wherein the available cache statistics sub-module comprises at least two statistics units, each statistics unit being configured to traverse the occupancy state of the cache blocks within a preset range in the cache state marking module, wherein the preset ranges of the statistics units differ from one another, the union of the preset ranges covers all cache blocks marked in the cache state marking module, and the statistics units perform their traversals in parallel.
  13. The network controller of claim 11, wherein the storage control module further comprises: an available cache address storage sub-module configured to store, in a first-in first-out buffer, each available cache address collected by the available cache statistics sub-module, wherein the first-in first-out buffer supports writing a plurality of available cache addresses in one clock cycle and/or reading out 0 or 1 available cache address in one clock cycle; and the storage control sub-module is configured to, in response to receiving a new message storage request from any one of the sending queues, read one available cache address from the available cache address storage sub-module in each clock cycle, store the sent message to be stored into the cache block corresponding to that available cache address, and mark the state of that cache block in the cache state marking module as occupied, until the sent message to be stored has been completely stored into the shared cache.
  14. The network controller of claim 1, wherein the sent message comprises a sent message based on a remote direct memory access protocol.
  15. The network controller of claim 1, wherein the size of the shared cache is less than K × BW × RTT, wherein K is the number of the at least two sending queues, BW is the bandwidth of the physical link between the network controller and the receiver of the sent message, and RTT is the signal round-trip delay between the network controller and the receiver.
  16. An electronic device, comprising the network controller of any one of claims 1 to 15.
  17. A message management method, comprising: caching sent messages of at least two sending queues in a shared cache; and managing the sent messages in the shared cache according to index information of each sent message, wherein the index information comprises a queue identifier of the sending queue corresponding to the sent message, a message number of the sent message, and a storage position of the sent message in the shared cache.
  18. The message management method of claim 17, wherein managing the sent messages in the shared cache according to the index information of each sent message comprises: performing, according to the index information and a feedback message from a receiver of the sent message, one of the following operations: clearing the corresponding sent message from the shared cache according to the feedback message; or searching the shared cache for the sent message indicated by the feedback message so as to resend that message.
  19. The message management method of claim 18, wherein the shared cache comprises a plurality of cache blocks, and clearing the corresponding sent message from the shared cache according to the feedback message comprises: obtaining the feedback message from the receiver of the sent message, wherein the feedback message comprises a first feedback message carrying a first queue identifier and a first message number, the first queue identifier being the queue identifier of the sending queue corresponding to a successfully received sent message, and the first message number being the message number of the successfully received sent message; searching the shared cache for a first target cache block according to the first feedback message and the index information, wherein the first target cache block comprises the cache block of each sent message whose queue identifier indicated by the index information is identical to the first queue identifier and whose message number indicated by the index information is before or equal to the first message number; and setting the state corresponding to the first target cache block to unoccupied.
  20. The message management method of claim 19, wherein the queue identifier and the message number in the index information are stored in a first array, the array elements of the first array correspond one-to-one to the cache blocks of the shared cache, and the queue identifier and the message number of the sent message stored in each cache block form one array element of the first array.
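
The indexing scheme recited in the claims can be illustrated with a small software model. The following Python sketch is not the patented hardware design; it is a minimal illustration, under stated assumptions, of a cache shared by several sending queues, with an occupancy bitmap, a per-message head pointer, and a per-block next-pointer array as in claim 8. The class name `SharedRetryBuffer`, its method names, and the `END` sentinel are hypothetical. Message numbers are assumed not to wrap around, and the head-pointer table is an unbounded Python list rather than a fixed-size array.

```python
END = -1  # hypothetical value standing in for the "preset end mark"


class SharedRetryBuffer:
    """Toy model of a retry cache shared by several sending queues."""

    def __init__(self, num_blocks):
        self.blocks = [None] * num_blocks      # payload held by each cache block
        self.occupied = [False] * num_blocks   # role of the cache state marking module
        self.next_block = [END] * num_blocks   # "fourth array": one entry per cache block
        self.heads = []                        # "third array": (queue_id, msg_no, head)

    def free_blocks(self):
        """Addresses of the cache blocks currently marked unoccupied."""
        return [i for i, used in enumerate(self.occupied) if not used]

    def store(self, queue_id, msg_no, chunks):
        """Store one sent message (a list of block-sized chunks) and index it."""
        free = self.free_blocks()
        if not chunks or len(chunks) > len(free):
            return False  # nothing to store, or not enough unoccupied blocks
        addrs = free[:len(chunks)]
        for pos, (addr, chunk) in enumerate(zip(addrs, chunks)):
            self.blocks[addr] = chunk
            self.occupied[addr] = True
            # Link each block to the next one occupied by the same message.
            self.next_block[addr] = addrs[pos + 1] if pos + 1 < len(addrs) else END
        self.heads.append((queue_id, msg_no, addrs[0]))
        return True

    def _chain(self, head):
        """Follow next pointers from the head block until the end mark."""
        addr = head
        while addr != END:
            yield addr
            addr = self.next_block[addr]

    def ack(self, queue_id, msg_no):
        """Cumulative ACK: release every message of this queue numbered <= msg_no."""
        kept = []
        for q, n, head in self.heads:
            if q == queue_id and n <= msg_no:
                for addr in list(self._chain(head)):  # copy before unlinking
                    self.occupied[addr] = False
                    self.next_block[addr] = END
            else:
                kept.append((q, n, head))
        self.heads = kept

    def lookup(self, queue_id, msg_no):
        """Return the chunks of one message in storage order, for resending."""
        for q, n, head in self.heads:
            if q == queue_id and n == msg_no:
                return [self.blocks[a] for a in self._chain(head)]
        return None


buf = SharedRetryBuffer(num_blocks=8)
buf.store(0, 1, ["a0", "a1"])        # queue 0, message 1
buf.store(1, 1, ["b0"])              # queue 1, message 1
buf.store(0, 2, ["c0", "c1", "c2"])  # queue 0, message 2
buf.ack(0, 1)                        # receiver confirms queue 0 up to message 1
buf.store(1, 2, ["d0", "d1", "d2"])  # reuses the released blocks, which are not contiguous
retry = buf.lookup(1, 2)             # chunks come back in storage order
```

In this model the blocks released by one queue are immediately available to any other queue, which is the property that lets a single cache serve all sending queues. The next-pointer array keeps a message readable in order even when its blocks are scattered.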

Description

Network controller, electronic equipment and message management method

Technical Field

The present invention relates to the field of chip data transmission technologies, and in particular, to a network controller, an electronic device, and a message management method.

Background

The training process of large artificial intelligence models typically involves multiple parallel computing modes, such as data parallelism, pipeline parallelism, and tensor parallelism, and often requires frequent collective communication operations between multiple coprocessors (e.g., GPUs). To meet this need, RDMA (Remote Direct Memory Access) technology has emerged. RDMA is derived from DMA and allows user programs to bypass the operating system kernel and interact directly with the network card for network communication, thereby providing high bandwidth and low latency.

In an RDMA communication network, multiple Queue Pairs (QPs) may be provided in each RDMA device, each including a send queue and a receive queue. After RDMA establishes a connection, each send queue (or receive queue) is bound to a unique receive queue (or send queue) of the remote node to form a QP connection, which can achieve end-to-end lossless transmission through the transport layer's message sequence numbers and its message acknowledgement and retransmission mechanism. To deliver messages reliably, a retransmission buffer (Retry Buffer) may be created at the sending end to temporarily store messages that have been sent but not yet acknowledged by the receiving end, so that a message can be retransmitted quickly after a reception error message is returned by the receiving end or when the retransmission timer expires.

In the related art, when multiple QP connections communicate over the same physical link, a Retry Buffer of size BW × RTT needs to be created at the sending end for each QP connection, in order to cope with a traffic pattern in which only one QP connection is active during a given period, where BW is the bandwidth of the physical link between the sending end and the receiving end, and RTT is the end-to-end signal round-trip delay. The total size of the Retry Buffers in the whole system is therefore K × BW × RTT, where K is the number of QP connections. However, although there are logically multiple QP connections, the physical link between the sending end and the receiving end can only carry a bandwidth of BW, so the total Retry Buffer size at the sending end far exceeds what the physical link can carry, and a large part of the buffer is often idle, resulting in a large waste of resources.

Disclosure of Invention

In view of the above, embodiments of the present invention provide a network controller, an electronic device, and a message management method, which can effectively save cache resources and improve cache utilization.

In a first aspect, an embodiment of the present invention provides a network controller, including: a shared cache, configured to be coupled to at least two sending queues and configured to cache sent messages of the at least two sending queues; and a cache management component, coupled to the shared cache and configured to manage the sent messages in the shared cache according to index information of each sent message, wherein the index information comprises a queue identifier of the sending queue corresponding to the sent message, a message number of the sent message, and a storage position of the sent message in the shared cache.
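
The sizing argument in the background can be made concrete with a short calculation. The Python sketch below is an illustration only. The function names and the link figures (400 Gbit/s, 10 µs, 1024 connections) are hypothetical and do not come from the source. It compares the related-art total of K × BW × RTT with a single bandwidth-delay product BW × RTT, which is roughly the amount of data a link of bandwidth BW can have outstanding within one round trip.

```python
def bdp_bytes(bw_bits_per_s: int, rtt_ns: int) -> int:
    """Bandwidth-delay product BW x RTT in bytes, using integer arithmetic."""
    return bw_bits_per_s * rtt_ns // (8 * 1_000_000_000)


def per_qp_total_bytes(k: int, bw_bits_per_s: int, rtt_ns: int) -> int:
    """Related-art total: one BW x RTT Retry Buffer for each of K QP connections."""
    return k * bdp_bytes(bw_bits_per_s, rtt_ns)


# Hypothetical figures, chosen only for illustration.
BW = 400_000_000_000  # link bandwidth: 400 Gbit/s
RTT = 10_000          # round-trip delay: 10 microseconds, expressed in nanoseconds
K = 1024              # number of QP connections sharing the link

one_bdp = bdp_bytes(BW, RTT)                # 500_000 bytes
dedicated = per_qp_total_bytes(K, BW, RTT)  # 512_000_000 bytes
```

With these assumed numbers, per-connection buffers add up to 512,000,000 bytes, while one bandwidth-delay product is 500,000 bytes. The source states only that the shared cache is smaller than K × BW × RTT; how much smaller is a design choice that this calculation does not determine.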

In one embodiment, the cache management component is specifically configured to perform, according to the index information and a feedback message from a receiver of the sent message, one of the following operations: clearing the corresponding sent message from the shared cache according to the feedback message; or searching the shared cache for the sent message indicated by the feedback message so as to resend that message.

In one embodiment, the shared cache includes a plurality of cache blocks, and the cache management component includes: a cache state marking module configured to mark whether each cache block is occupied; and a first indexing module configured to: obtain the feedback message from the receiver of the sent message, wherein the feedback message comprises a first feedback message carrying a first queue identifier and a first message number, the first queue identifier being the queue identifier of the sending queue corresponding to a successfully received sent message, and the first message number being the message number of the successfully received sent message; search the shared cache for a first target cache block according to the first feedback message and the index information, wherein the first target cache block comprises a cache block in which a queue identifier indicated by the index info