
CN-121979813-A - Data transmission method, data receiving method and data transmission system


Abstract

The present disclosure provides a data transmission method, a data receiving method, a data transmission system, an electronic device, a readable storage medium and a computer program product, and relates to artificial intelligence fields such as processors, collective communication operations, and large models. By offloading the RDMA communication control plane of both the data sending process and the data receiving process to a transmission proxy unit, the disclosure effectively reduces data transmission latency during sending and receiving.

Inventors

  • LI SHUKUN
  • ZHOU ZHONGLIANG
  • WANG PEILONG
  • SHAN QIANG
  • MIAO MAO

Assignees

  • Beijing Baidu Netcom Science and Technology Co., Ltd. (北京百度网讯科技有限公司)

Dates

Publication Date
20260505
Application Date
20251219

Claims (20)

  1. A data transmission method, comprising: after receiving a data transmission task, splitting the data transmission task to obtain at least one data transmission subtask; for a current data transmission subtask, acquiring data to be transmitted corresponding to the current data transmission subtask; in response to determining, according to a head pointer of a send data buffer, that the send data buffer meets a data receiving condition, transmitting the data to be transmitted to the send data buffer; and in response to determining that the data to be transmitted has been completely transmitted, updating a tail pointer of the send data buffer according to the size of the data to be transmitted, for polling by a transmission proxy unit.
  2. The method of claim 1, further comprising: in response to determining that the head pointer of the send data buffer has been updated, executing the next data transmission subtask, and repeating until all of the data transmission subtasks have been executed.
  3. The method of claim 1, wherein transmitting the data to be transmitted to the send data buffer comprises: dividing the data to be transmitted into a plurality of data blocks of the same size; and transmitting the plurality of data blocks obtained by the division to the send data buffer.
  4. The method of claim 1, wherein determining, according to the head pointer of the send data buffer, that the send data buffer meets the data receiving condition comprises: determining that the send data buffer meets the data receiving condition in response to determining, according to the head pointer of the send data buffer and the size of the data to be transmitted, that the required storage space does not exceed the remaining storage space of the send data buffer.
  5. A data transmission method applied to a transmission proxy unit, comprising: after receiving a transmission proxy task corresponding to a data transmission task, polling a tail pointer of a send data buffer; in response to determining, according to the tail pointer of the send data buffer, that the send data buffer contains data to be transmitted, acquiring the data to be transmitted from the send data buffer; and transmitting the data to be transmitted, and in response to determining that the data to be transmitted has been completely transmitted, updating a head pointer of the send data buffer according to the size of the data to be transmitted.
  6. The method of claim 5, wherein acquiring the data to be transmitted from the send data buffer comprises: copying the data in the send data buffer by direct memory access (DMA) to acquire the data to be transmitted.
  7. The method of claim 5, wherein transmitting the data to be transmitted comprises: converting the data to be transmitted, which is based on memory semantics, into a data packet to be transmitted based on remote direct memory access (RDMA) semantics; and transmitting the data packet to be transmitted.
  8. A data receiving method, comprising: after receiving a data receiving task, splitting the data receiving task to obtain at least one data receiving subtask; and in response to determining that a tail pointer of a receive data buffer has been updated, acquiring data to be received from the receive data buffer.
  9. The method of claim 8, further comprising: after the data to be received has been processed according to the data receiving subtask corresponding to the data to be received, updating a head pointer of the receive data buffer according to the size of the data to be received, for polling by a transmission proxy unit.
  10. A data receiving method applied to a transmission proxy unit, comprising: after receiving a transmission proxy subtask corresponding to a data receiving task, acquiring at least one piece of data to be received, and polling a head pointer of a receive data buffer; in response to determining, according to the head pointer of the receive data buffer, that the receive data buffer meets a data receiving condition, sending current data to be received to the receive data buffer; and in response to determining that the current data to be received has been completely sent, updating a tail pointer of the receive data buffer according to the size of the current data to be received.
  11. The method of claim 10, further comprising: in response to determining that the head pointer of the receive data buffer has been updated, sending the next data to be received, and repeating until all of the data to be received has been sent.
  12. The method of claim 10, wherein determining, according to the head pointer of the receive data buffer, that the receive data buffer meets the data receiving condition comprises: determining that the receive data buffer meets the data receiving condition in response to determining, according to the head pointer of the receive data buffer and the size of the data to be received, that the required storage space does not exceed the remaining storage space of the receive data buffer.
  13. The method of claim 10, wherein acquiring the at least one piece of data to be received comprises: receiving at least one data packet to be received based on remote direct memory access (RDMA) semantics; and converting each of the at least one data packet to be received into data to be received based on memory semantics, to obtain the at least one piece of data to be received.
  14. The method of claim 10, wherein sending the current data to be received to the receive data buffer comprises: copying the current data to be received to the receive data buffer by direct memory access (DMA).
  15. A data transmission system, comprising a task allocation unit, a transmission proxy unit, and a plurality of computing units located in the same communication group, wherein: the task allocation unit is configured to, after receiving a communication operation, determine a data transmission task and a transmission proxy task corresponding to the communication operation, send the data transmission task to a target computing unit of the plurality of computing units, and send the transmission proxy task to the transmission proxy unit; the target computing unit comprises a send data buffer and a receive data buffer, and is configured to execute the data transmission method according to any one of claims 1-4 and the data receiving method according to any one of claims 8-9; and the transmission proxy unit comprises a hardware proxy thread, a copy engine, and an RDMA transmission engine, and is configured to execute the data transmission method according to any one of claims 5-7 and the data receiving method according to any one of claims 10-14.
  16. The data transmission system of claim 15, wherein, when executing the data transmission method according to any one of claims 5-7: the transmission proxy unit is configured to poll the tail pointer of the send data buffer of the target computing unit; the copy engine is configured to copy the data to be transmitted in the send data buffer by DMA; and the RDMA transmission engine is configured to send the copied data to be transmitted.
  17. The data transmission system of claim 15, wherein, when executing the data receiving method according to any one of claims 10-14: the transmission proxy unit is configured to acquire at least one piece of data to be received through the RDMA transmission engine; the hardware proxy thread is configured to poll the head pointer of the receive data buffer of the target computing unit; and the copy engine is configured to copy the current data to be received to the receive data buffer by DMA.
  18. An electronic device, comprising: at least one processor; and a memory communicatively coupled to the at least one processor, wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-14.
  19. A non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform the method of any one of claims 1-14.
  20. A computer program product comprising a computer program which, when executed by a processor, implements the method of any one of claims 1-14.
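
The receive-side handshake described in the data receiving claims (a proxy that fills a receive data buffer and a computing unit that drains it) can be illustrated with a small single-threaded simulation. This is only a sketch under stated assumptions, not the claimed implementation: the names `ReceiveBuffer`, `proxy_deliver`, `unit_consume` and `run_receive_task` are hypothetical, the head and tail pointers are modeled as monotonically increasing byte counters, the DMA copy is modeled as a plain byte copy, and the conversion from RDMA semantics to memory semantics is omitted.

```python
from collections import deque


class ReceiveBuffer:
    """Hypothetical ring buffer owned by the target computing unit."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.slots = bytearray(capacity)
        self.head = 0  # bytes consumed so far; advanced only by the computing unit
        self.tail = 0  # bytes written so far; advanced only by the proxy


def proxy_deliver(buf: ReceiveBuffer, incoming: deque) -> bool:
    """Proxy side: check the head pointer, copy one item in, then advance tail."""
    if not incoming:
        return False
    item = incoming[0]
    remaining = buf.capacity - (buf.tail - buf.head)
    if len(item) > remaining:
        return False  # receiving condition not met; wait for head to advance
    for i, byte in enumerate(item):  # stand-in for the DMA copy
        buf.slots[(buf.tail + i) % buf.capacity] = byte
    buf.tail += len(item)  # published only after the copy has finished
    incoming.popleft()
    return True


def unit_consume(buf: ReceiveBuffer) -> bytes:
    """Computing-unit side: read everything between head and tail, then advance head."""
    pending = buf.tail - buf.head
    data = bytes(buf.slots[(buf.head + i) % buf.capacity] for i in range(pending))
    buf.head += pending  # updated only after the data has been processed
    return data


def run_receive_task(packets: list, capacity: int) -> bytes:
    """Interleave proxy and computing unit; real hardware would run them concurrently."""
    if any(len(p) > capacity for p in packets):
        raise ValueError("an item larger than the buffer can never be delivered")
    buf = ReceiveBuffer(capacity)
    incoming = deque(packets)
    received = []
    while incoming:
        if not proxy_deliver(buf, incoming):
            received.append(unit_consume(buf))  # draining frees space for the proxy
    received.append(unit_consume(buf))
    return b"".join(received)
```

Because each pointer has exactly one writer in this sketch (the proxy writes the tail, the computing unit writes the head), each side only needs to poll the other side's pointer to coordinate.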

Description

Data transmission method, data receiving method and data transmission system

Technical Field

The present disclosure relates to the field of communications technologies, and in particular to artificial intelligence fields such as processors, collective communication operations, and large models. A data transmission method, a data receiving method, a data transmission system, an electronic device, a readable storage medium, and a computer program product are provided.

Background

As large models continue to grow in scale, the video memory of a single card can no longer hold a complete model and its intermediate state, and multi-machine, multi-card collaboration has become the mainstream form of inference and training. However, multi-machine collaboration introduces new communication performance bottlenecks; in inference services in particular, communication latency directly affects overall response speed and service experience. The data transmission schemes commonly used in the prior art for communication between computing units have problems such as CPU involvement, which increases data transmission latency, and the occupation of valuable computing resources in the computing units.

Disclosure of Invention

According to a first aspect of the present disclosure, a data transmission method is provided, including: after receiving a data transmission task, splitting the data transmission task to obtain at least one data transmission subtask; for a current data transmission subtask, acquiring the data to be transmitted corresponding to the current data transmission subtask; in response to determining, according to a head pointer of a send data buffer, that the send data buffer meets a data receiving condition, transmitting the data to be transmitted to the send data buffer; and in response to determining that the data to be transmitted has been completely transmitted, updating a tail pointer of the send data buffer according to the size of the data to be transmitted, for polling by a transmission proxy unit.

According to a second aspect of the present disclosure, a data transmission method applied to a transmission proxy unit is provided, including: after receiving a transmission proxy task corresponding to a data transmission task, polling a tail pointer of a send data buffer; in response to determining, according to the tail pointer of the send data buffer, that the send data buffer contains data to be transmitted, acquiring the data to be transmitted from the send data buffer; and transmitting the data to be transmitted, and in response to determining that the data to be transmitted has been completely transmitted, updating a head pointer of the send data buffer according to the size of the data to be transmitted.
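
The interplay between the first and second aspects (the computing unit produces into the buffer and advances the tail pointer; the proxy polls the tail pointer, drains the buffer, and advances the head pointer) can be sketched as follows. This is an illustrative model only, under assumptions that are not in the disclosure: `SendBuffer`, `producer_try_send`, `proxy_poll_once` and `run_send_task` are hypothetical names, the pointers are monotonically increasing byte counters, subtasks are fixed-size slices of the task, and the RDMA send is replaced by appending to a Python list.

```python
class SendBuffer:
    """Hypothetical ring buffer shared by a computing unit and a transmission proxy."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.storage = bytearray(capacity)
        self.head = 0  # bytes already sent; advanced only by the proxy
        self.tail = 0  # bytes already written; advanced only by the computing unit

    def free_space(self) -> int:
        return self.capacity - (self.tail - self.head)


def producer_try_send(buf: SendBuffer, payload: bytes) -> bool:
    """Computing-unit side: write the payload if it fits, then advance tail."""
    if len(payload) > buf.free_space():
        return False  # receiving condition not met; wait for the proxy to move head
    for i, byte in enumerate(payload):
        buf.storage[(buf.tail + i) % buf.capacity] = byte
    buf.tail += len(payload)  # updated only after the write has completed
    return True


def proxy_poll_once(buf: SendBuffer, wire: list) -> int:
    """Proxy side: poll tail, copy out pending bytes, 'send' them, then advance head."""
    pending = buf.tail - buf.head
    if pending == 0:
        return 0
    chunk = bytes(buf.storage[(buf.head + i) % buf.capacity] for i in range(pending))
    wire.append(chunk)   # stand-in for the actual network send
    buf.head += pending  # updated only after the send has completed
    return pending


def run_send_task(task: bytes, subtask_size: int, capacity: int) -> bytes:
    """Split a task into subtasks and push them through the buffer one by one."""
    if not 0 < subtask_size <= capacity:
        raise ValueError("subtask_size must be positive and fit in the buffer")
    buf = SendBuffer(capacity)
    wire: list = []
    for start in range(0, len(task), subtask_size):
        payload = task[start:start + subtask_size]
        while not producer_try_send(buf, payload):
            proxy_poll_once(buf, wire)  # in hardware both sides run concurrently
    while proxy_poll_once(buf, wire):
        pass  # drain whatever is still buffered
    return b"".join(wire)
```

For example, `run_send_task(b"abcdefghijklmnopqrstuvwxyz", 5, 8)` pushes a 26-byte task through an 8-byte buffer in 5-byte subtasks; the producer stalls whenever the buffer lacks room and resumes once the proxy has advanced the head pointer.
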
According to a third aspect of the present disclosure, a data receiving method is provided, including: after receiving a data receiving task, splitting the data receiving task to obtain at least one data receiving subtask; and in response to determining that a tail pointer of a receive data buffer has been updated, acquiring data to be received from the receive data buffer.

According to a fourth aspect of the present disclosure, a data receiving method applied to a transmission proxy unit is provided, including: after receiving a transmission proxy subtask corresponding to a data receiving task, acquiring at least one piece of data to be received, and polling a head pointer of a receive data buffer; in response to determining, according to the head pointer of the receive data buffer, that the receive data buffer meets a data receiving condition, sending current data to be received to the receive data buffer; and in response to determining that the current data to be received has been completely sent, updating a tail pointer of the receive data buffer according to the size of the current data to be received.

According to a fifth aspect of the present disclosure, a data transmission system is provided, including a task allocation unit, a transmission proxy unit, and a plurality of computing units located in the same communication group, where the task allocation unit is configured to, after receiving a communication operation, determine a data transmission task and a transmission proxy task corresponding to the communication operation, send the data transmission task to a target computing unit of the plurality of computing units, and send the transmission proxy task to the transmission proxy unit; the target computing unit includes a send data buffer and a receive data buffer, and is configured to perform the above data transmission method and the above data receiving method; and the transmission proxy unit includes a hardware proxy thread, a copy engine, and an RDMA transmission engine, and is configured to perform the above data transmission method applied to the transmission pr