CN-121979653-A - Communication and data processing methods, apparatus, devices, media and products

Abstract

The disclosure provides communication and data processing methods, apparatus, devices, media and products, and relates to the technical field of artificial intelligence, in particular to cloud computing, large models, computing power and the like. The communication method is applied to a CPU and comprises: receiving a target expert index and data sent by a current GPU, wherein the target expert index is determined by the current GPU based on a token sequence and the data comprise hidden state data of all tokens in the token sequence; storing the data in a memory of the CPU; executing a communication calculation process based on the target expert index to obtain communication metadata; and sending, based on the communication metadata, the hidden state data of the cross-node tokens stored in the memory to a cross-node GPU. Communication-related processing can thus be offloaded from the GPU to the CPU, which balances resource utilization across the hardware and improves overall throughput.

Inventors

SHAN QIANG
WANG WENBO

Assignees

北京百度网讯科技有限公司

Dates

Publication Date: 20260505
Application Date: 20251209

Claims (13)

1. A communication method applied to a CPU, the method comprising: receiving a target expert index and data sent by a current GPU, wherein the target expert index is determined by the current GPU based on a token sequence, and the data comprises hidden state data of all tokens in the token sequence; storing the data into a memory of the CPU; executing a communication calculation process based on the target expert index to obtain communication metadata, wherein the communication metadata comprises target storage information and information of a cross-node GPU, the target storage information is storage information of hidden state data of a cross-node token in the memory, and the target expert of the cross-node token comprises a cross-node expert which is deployed on the cross-node GPU; and sending, based on the communication metadata, the hidden state data of the cross-node token stored in the memory to the cross-node GPU.
2. The method of claim 1, wherein the sending the hidden state data of the cross-node token stored in the memory to the cross-node GPU based on the communication metadata comprises: creating a task unit, wherein the task unit comprises the target storage information and the information of the cross-node GPU; and in response to the task unit meeting a preset condition, sending a trigger instruction to a network card, so that the network card obtains the task unit based on the trigger instruction, obtains the hidden state data of the cross-node token from the memory according to the task unit, and sends the hidden state data to the cross-node GPU.
3. The method of claim 2, wherein the target expert index and the data are sent to the CPU by the current GPU through a first PCIe bus; the trigger instruction is sent to the network card by the CPU through a second PCIe bus; and the hidden state data of the cross-node token is obtained from the memory of the CPU by the network card through the second PCIe bus.
4. The method of claim 2, wherein the network card is an RNIC; and the hidden state data of the cross-node token is sent to the cross-node GPU by the RNIC based on RDMA communication.
5. The method according to any one of claims 1 to 4, wherein the CPU comprises a plurality of thread pools; and the executing a communication calculation process based on the target expert index comprises: dividing the target expert index into a plurality of groups; and executing the communication calculation process on the groups of the target expert index in parallel, each group being processed by a respective thread pool.
6. A data processing method applied to a current GPU, the method comprising: receiving a token sequence, wherein the token sequence comprises hidden state data of a plurality of tokens; determining a target expert index corresponding to the token sequence; and sending the target expert index and data to a CPU, so that the CPU executes a communication calculation process based on the target expert index and sends hidden state data of a cross-node token stored in a memory to a cross-node GPU according to communication metadata; wherein the data comprises hidden state data of all tokens in the token sequence, and the target expert of the cross-node token comprises a cross-node expert deployed on the cross-node GPU.
7. The method of claim 6, further comprising: determining a local token in the token sequence, wherein a target expert corresponding to the local token comprises a local expert, and the local expert is deployed on the current GPU; and performing expert calculation on the local token using the local expert.
8. The method of claim 6, further comprising: determining a node token in the token sequence, wherein a target expert corresponding to the node token comprises a node expert, and the node expert is deployed on a node GPU; and sending the hidden state data of the node token to the node GPU through an intra-node communication bus, so that the node GPU performs expert calculation on the node token using the node expert.
9. A communication device applied to a CPU, the device comprising: a receiving module, configured to receive a target expert index and data sent by a current GPU, wherein the target expert index is determined by the current GPU based on a token sequence, and the data comprises hidden state data of all tokens in the token sequence; a storage module, configured to store the data into a memory of the CPU; a calculation module, configured to execute a communication calculation process based on the target expert index to obtain communication metadata, wherein the communication metadata comprises target storage information and information of a cross-node GPU, the target storage information is storage information of hidden state data of a cross-node token in the memory, and the target expert of the cross-node token comprises a cross-node expert which is deployed on the cross-node GPU; and a communication module, configured to send the hidden state data of the cross-node token stored in the memory to the cross-node GPU according to the communication metadata.
10. A data processing apparatus applied to a current GPU, the apparatus comprising: a receiving module, configured to receive a token sequence, wherein the token sequence comprises hidden state data of a plurality of tokens; a determining module, configured to determine a target expert index corresponding to the token sequence; and a sending module, configured to send the target expert index and data to a CPU, so that the CPU executes a communication calculation process based on the target expert index and sends hidden state data of a cross-node token stored in a memory to a cross-node GPU according to communication metadata; wherein the data comprises hidden state data of all tokens in the token sequence, and the target expert of the cross-node token comprises a cross-node expert deployed on the cross-node GPU.
11. An electronic device, comprising: at least one processor; and a memory communicatively coupled to the at least one processor, wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-8.
12. A non-transitory computer readable storage medium storing computer instructions for causing a computer to perform the method of any one of claims 1-8.
13. A computer program product comprising a computer program which, when executed by a processor, implements the method according to any of claims 1-8.
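As an illustration of claims 2 and 5, the sketch below splits the target expert index into groups, processes the groups in parallel, and emits one task unit per cross-node destination. It is a minimal sketch under stated assumptions, not the claimed implementation: the names `TaskUnit`, `split_into_groups`, `process_group` and `build_task_units` are hypothetical, hidden states are assumed to be stored contiguously in token order, GPUs are assumed to be numbered node by node, and a single `ThreadPoolExecutor` with several workers stands in for the plurality of thread pools. The preset condition and the trigger instruction to the network card are not modeled.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskUnit:
    """Hypothetical task unit: where the data sits and where it must go."""
    offset: int    # byte offset of the hidden state in CPU memory (assumed layout)
    length: int    # size of the hidden state in bytes
    dest_gpu: int  # global id of the cross-node GPU hosting the target expert


def split_into_groups(expert_index, num_groups):
    """Split the per-token expert lists into contiguous (start_token, rows) groups."""
    size = max(1, -(-len(expert_index) // num_groups))  # ceiling division
    return [(s, expert_index[s:s + size]) for s in range(0, len(expert_index), size)]


def process_group(start, rows, expert_to_gpu, gpus_per_node, current_node, hidden_bytes):
    """Communication calculation for one group: keep only cross-node destinations."""
    units = []
    for i, experts in enumerate(rows):
        token_id = start + i
        dests = sorted({expert_to_gpu[e] for e in experts
                        if expert_to_gpu[e] // gpus_per_node != current_node})
        units.extend(TaskUnit(token_id * hidden_bytes, hidden_bytes, g) for g in dests)
    return units


def build_task_units(expert_index, expert_to_gpu, gpus_per_node, current_node,
                     hidden_bytes, num_groups=4):
    """Process all groups in parallel and return task units in token order."""
    groups = split_into_groups(expert_index, num_groups)
    with ThreadPoolExecutor(max_workers=num_groups) as pool:
        futures = [pool.submit(process_group, s, rows, expert_to_gpu,
                               gpus_per_node, current_node, hidden_bytes)
                   for s, rows in groups]
        # Collect in submission order so the result does not depend on scheduling.
        return [u for f in futures for u in f.result()]
```

The sketch shows structure only: in CPython the global interpreter lock limits the speedup of pure-Python work across threads, so a real CPU-side implementation would likely use native threads.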

Description

Communication and data processing methods, apparatus, devices, media and products

Technical Field

The disclosure relates to the technical field of artificial intelligence, in particular to cloud computing, large models, computing power and the like, and more particularly to communication and data processing methods, apparatus, devices, media and products.

Background

To improve the performance of a large language model (LLM), a Mixture of Experts (MoE) network may be introduced into the LLM.

Disclosure of Invention

The present disclosure provides communication and data processing methods, apparatus, devices, media and products.

According to one aspect of the disclosure, a communication method applied to a CPU is provided. The method comprises: receiving a target expert index and data sent by a current GPU, wherein the target expert index is determined by the current GPU based on a token sequence, and the data comprise hidden state data of all tokens in the token sequence; storing the data in a memory of the CPU; performing a communication calculation process based on the target expert index to obtain communication metadata, wherein the communication metadata comprise target storage information and information of a cross-node GPU, the target storage information is storage information of hidden state data of a cross-node token in the memory, the target expert of the cross-node token comprises a cross-node expert, and the cross-node expert is deployed on the cross-node GPU; and sending, based on the communication metadata, the hidden state data of the cross-node token stored in the memory to the cross-node GPU.
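The CPU-side method above can be illustrated with a short sketch: derive the communication metadata from the target expert index, then use that metadata to pick the hidden state data of cross-node tokens out of CPU memory. This is a sketch under stated assumptions rather than the disclosed implementation. The names `CommMetadata`, `compute_comm_metadata` and `gather_payloads` are hypothetical, hidden states are assumed to sit contiguously in token order with a fixed size of `hidden_bytes`, GPUs are assumed to be numbered node by node, and grouping bytes per destination stands in for the actual send.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CommMetadata:
    """Hypothetical metadata record: target storage info plus cross-node GPU info."""
    offset: int    # byte offset of the token's hidden state in CPU memory
    length: int    # size of the hidden state in bytes
    dest_gpu: int  # global id of the cross-node GPU hosting a target expert


def compute_comm_metadata(expert_index, expert_to_gpu, gpus_per_node,
                          current_gpu, hidden_bytes):
    """expert_index[t] lists the target experts of token t."""
    current_node = current_gpu // gpus_per_node  # assumes node-major GPU numbering
    metadata = []
    for token_id, experts in enumerate(expert_index):
        # A token is sent at most once to each cross-node GPU (an assumption).
        dests = sorted({expert_to_gpu[e] for e in experts
                        if expert_to_gpu[e] // gpus_per_node != current_node})
        for gpu in dests:
            metadata.append(CommMetadata(token_id * hidden_bytes, hidden_bytes, gpu))
    return metadata


def gather_payloads(memory, metadata):
    """Collect the bytes destined for each cross-node GPU (stand-in for the send)."""
    out = {}
    for m in metadata:
        out.setdefault(m.dest_gpu, bytearray()).extend(
            memory[m.offset:m.offset + m.length])
    return {gpu: bytes(buf) for gpu, buf in out.items()}
```

With two GPUs per node and the current GPU numbered 0, a token routed only to experts on GPUs 0 and 1 produces no metadata, while a token routed to experts on GPUs 2 and 3 produces two records that share one offset.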
According to another aspect of the disclosure, a data processing method applied to a current GPU is provided. The method comprises: receiving a token sequence; determining a target expert index corresponding to the token sequence; and sending the target expert index and data to a CPU, so that the CPU executes a communication calculation process based on the target expert index and sends hidden state data of a cross-node token stored in a memory to a cross-node GPU according to communication metadata, wherein the data comprises hidden state data of all tokens in the token sequence, the target expert of the cross-node token comprises a cross-node expert, and the cross-node expert is deployed on the cross-node GPU.

According to another aspect of the disclosure, a communication device applied to a CPU is provided. The device comprises a receiving module, a storage module, a calculation module and a communication module. The receiving module is used for receiving a target expert index and data sent by a current GPU, wherein the target expert index is determined by the current GPU based on a token sequence, and the data comprises hidden state data of all tokens in the token sequence. The storage module is used for storing the data into a memory of the CPU. The calculation module is used for executing a communication calculation process according to the target expert index to obtain communication metadata, wherein the communication metadata comprises target storage information and information of a cross-node GPU, the target storage information is storage information of hidden state data of the cross-node token in the memory, the target expert of the cross-node token comprises a cross-node expert, and the cross-node expert is deployed on the cross-node GPU. The communication module is used for sending the hidden state data of the cross-node token stored in the memory to the cross-node GPU according to the communication metadata.
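On the GPU side, the method above, together with the local and node tokens of claims 7 and 8, implies sorting tokens by where their target experts are deployed. The following is a minimal sketch of that classification, assuming a hypothetical `expert_to_gpu` placement map and GPUs numbered node by node. The function name `classify_tokens` is not from the source, and a token with several target experts may fall into more than one category.

```python
def classify_tokens(expert_index, expert_to_gpu, gpus_per_node, current_gpu):
    """Return (local, node, cross_node) token id lists for the current GPU.

    expert_index[t] lists the target experts of token t; expert_to_gpu maps an
    expert id to the global id of the GPU it is deployed on (assumed layout).
    """
    current_node = current_gpu // gpus_per_node
    local, node, cross_node = [], [], []
    for token_id, experts in enumerate(expert_index):
        gpus = {expert_to_gpu[e] for e in experts}
        # Local token: at least one target expert lives on the current GPU.
        if current_gpu in gpus:
            local.append(token_id)
        # Node token: a target expert lives on another GPU of the same node.
        if any(g != current_gpu and g // gpus_per_node == current_node for g in gpus):
            node.append(token_id)
        # Cross-node token: a target expert lives on a GPU of another node.
        if any(g // gpus_per_node != current_node for g in gpus):
            cross_node.append(token_id)
    return local, node, cross_node
```

Under the described scheme, only the cross-node list is relevant to the CPU-side communication; local tokens are computed in place and node tokens travel over the intra-node communication bus.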
According to another aspect of the disclosure, a data processing device applied to a current GPU is provided. The device comprises a receiving module, a determining module and a sending module. The receiving module is used for receiving a token sequence, wherein the token sequence comprises hidden state data of a plurality of tokens. The determining module is used for determining a target expert index corresponding to the token sequence. The sending module is used for sending the target expert index and data to a CPU, so that the CPU executes a communication calculation process based on the target expert index and sends hidden state data of a cross-node token stored in a memory to a cross-node GPU according to communication metadata, wherein the data comprises hidden state data of all tokens in the token sequence, and the target expert of the cross-node token comprises a cross-node expert which is deployed on the cross-node GPU.

According to another aspect of the present disclosure, there is provided an electronic device comprising at least one processor and a memory communicatively coupled to the at least one processor, wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of the above aspects.

According to