
CN-122019456-A - Communication method, communication apparatus, electronic device, computer-readable storage medium, and computer program product


Abstract

The application provides a communication method, a communication apparatus, an electronic device, a computer-readable storage medium, and a computer program product. The method comprises: distributing data to be processed to a plurality of target computing cards, wherein each target computing card comprises a plurality of computing chiplets and at least one communication chiplet, and each computing chiplet is allocated one data shard of the data to be processed; for each target computing card, determining a target chiplet corresponding to the target computing card from the at least one communication chiplet, and constructing a first communication network based on the computing chiplets and the target chiplet; controlling the computing chiplets to perform reduction processing on the data shards through the first communication network to obtain a first processing result of the target computing card; and constructing a second communication network based on the target chiplets, and controlling the target chiplets to perform global reduction processing on the first processing results of the target computing cards through the second communication network to obtain a second processing result. The method and apparatus can improve the efficiency of collective communication between computing cards.

Inventors

  • LIU XIANBIN

Assignees

  • 上海东方算芯科技有限公司

Dates

Publication Date
20260512
Application Date
20260415

Claims (12)

  1. A communication method, comprising: in response to receiving data to be processed, distributing the data to be processed to a plurality of target computing cards, wherein each target computing card comprises a plurality of computing chiplets and at least one communication chiplet, each computing chiplet is allocated one data shard of the data to be processed, and the communication chiplet has an inter-card communication function; for each target computing card, determining a target chiplet corresponding to the target computing card from the at least one communication chiplet, and constructing a first communication network based on the computing chiplets and the target chiplet; controlling the computing chiplets to perform reduction processing on the data shards through the first communication network to obtain a first processing result of the target computing card, and sending the first processing result to the target chiplet corresponding to the target computing card through the first communication network; constructing a second communication network based on the target chiplets, and controlling the target chiplets to perform global reduction processing on the first processing results of the target computing cards through the second communication network to obtain a second processing result; and for each target chiplet, controlling the target chiplet to broadcast the second processing result, through the first communication network, to each computing chiplet in the target computing card corresponding to the target chiplet.
  2. The method of claim 1, wherein the determining, from the at least one communication chiplet, a target chiplet corresponding to the target computing card comprises: for each communication chiplet, acquiring communication performance indexes between the communication chiplet and the communication chiplets in the other target computing cards; determining performance evaluation data of each communication chiplet based on the communication performance indexes, and sorting the communication chiplets from best to worst performance evaluation data to obtain a communication chiplet sequence; and determining the first K communication chiplets in the communication chiplet sequence as the target chiplets corresponding to the target computing card, wherein K is a positive integer.
  3. The method of claim 1, wherein the constructing a first communication network based on the computing chiplets and the target chiplet comprises: acquiring a physical interconnection structure between each computing chiplet and the target chiplet; determining a link performance index between each computing chiplet and the target chiplet based on the physical interconnection structure; determining, based on the link performance indexes, a target logical topology for connecting the computing chiplets and the target chiplet; and generating the first communication network based on the target logical topology.
  4. The method of claim 1, wherein each data shard comprises a plurality of data blocks, and the controlling the computing chiplets to perform reduction processing on the data shards through the first communication network to obtain a first processing result of the target computing card comprises: for each computing chiplet, determining a target computing chiplet adjacent to the computing chiplet based on a topology of the first communication network; determining a target data block to be processed by the computing chiplet based on the plurality of data blocks corresponding to the computing chiplet and the plurality of data blocks corresponding to the target computing chiplet, and controlling the computing chiplet to perform an aggregation operation on the target data block to obtain a first aggregation result of the computing chiplet; and combining the plurality of first aggregation results to obtain the first processing result.
  5. The method of claim 2, wherein the sending the first processing result to the target chiplet corresponding to the target computing card through the first communication network comprises: dividing the first processing result into K sub-processing results; and sending the sub-processing results to the target chiplets, wherein the target chiplets are in one-to-one correspondence with the sub-processing results.
  6. The method of claim 1, wherein the constructing a second communication network based on the target chiplets comprises: acquiring inter-card link bandwidths between the target chiplets; determining, based on the inter-card link bandwidths, a global interconnection topology for connecting the target chiplets; and generating the second communication network based on the global interconnection topology.
  7. The method of claim 1, wherein the controlling the target chiplets to perform global reduction processing on the first processing results of the target computing cards through the second communication network to obtain a second processing result comprises: determining data interaction paths between the target chiplets based on a global interconnection topology used to generate the second communication network; and controlling the target chiplets to perform a global aggregation operation on the first processing results of the target computing cards based on the data interaction paths to obtain the second processing result.
  8. The method of any one of claims 1 to 7, further comprising: acquiring performance data of each target chiplet in the second communication network; when the performance data satisfies a preset abnormality condition, re-determining a new target chiplet from the target computing card on which the target chiplet is located; and reconstructing the first communication network and the second communication network based on the new target chiplet.
  9. A communication apparatus, comprising: a data distribution module, configured to distribute, in response to receiving data to be processed, the data to be processed to a plurality of target computing cards, wherein each target computing card comprises a plurality of computing chiplets and at least one communication chiplet, each computing chiplet is allocated one data shard of the data to be processed, and the communication chiplet has an inter-card communication function; a chiplet determining module, configured to determine, for each target computing card, a target chiplet corresponding to the target computing card from the at least one communication chiplet, and to construct a first communication network based on the computing chiplets and the target chiplet; a first processing module, configured to control the computing chiplets to perform reduction processing on the data shards through the first communication network to obtain a first processing result of the target computing card, and to send the first processing result to the target chiplet corresponding to the target computing card through the first communication network; a second processing module, configured to construct a second communication network based on the target chiplets, and to control the target chiplets to perform global reduction processing on the first processing results of the target computing cards through the second communication network to obtain a second processing result; and a data broadcasting module, configured to control, for each target chiplet, the target chiplet to broadcast the second processing result, through the first communication network, to each computing chiplet in the target computing card corresponding to the target chiplet.
  10. An electronic device, comprising: a memory for storing computer-executable instructions or a computer program; and a processor for implementing the communication method of any one of claims 1 to 8 when executing the computer-executable instructions or the computer program stored in the memory.
  11. A computer-readable storage medium storing computer-executable instructions or a computer program which, when executed by a processor, implement the communication method of any one of claims 1 to 8.
  12. A computer program product comprising computer-executable instructions or a computer program which, when executed by a processor, implement the communication method of any one of claims 1 to 8.
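
For illustration only, the selection step of claim 2 and the splitting step of claim 5 can be sketched in Python as follows. The claims do not define the performance indexes or the evaluation formula, so the `CommChiplet` record, its two fields, and the `score` function (bandwidth divided by latency) are hypothetical assumptions. A "chiplet" here is the unit the translated text also calls a core grain or core particle.

```python
from dataclasses import dataclass


@dataclass
class CommChiplet:
    """Hypothetical record for one communication chiplet on a card."""
    name: str
    bandwidth_gbps: float  # assumed performance index: higher is better
    latency_us: float      # assumed performance index: lower is better


def score(chiplet: CommChiplet) -> float:
    # Assumed evaluation formula; the claims do not specify one.
    return chiplet.bandwidth_gbps / chiplet.latency_us


def select_targets(chiplets, k):
    """Claim 2: rank from best to worst and keep the first K."""
    if k < 1:
        raise ValueError("K must be a positive integer")
    return sorted(chiplets, key=score, reverse=True)[:k]


def split_result(result, k):
    """Claim 5: divide the first processing result into K sub-results."""
    base, extra = divmod(len(result), k)
    parts, start = [], 0
    for i in range(k):
        # The first `extra` parts take one additional element each.
        end = start + base + (1 if i < extra else 0)
        parts.append(result[start:end])
        start = end
    return parts


chiplets = [
    CommChiplet("io0", bandwidth_gbps=400.0, latency_us=2.0),
    CommChiplet("io1", bandwidth_gbps=300.0, latency_us=1.0),
    CommChiplet("io2", bandwidth_gbps=100.0, latency_us=4.0),
]
targets = select_targets(chiplets, k=2)
sub_results = split_result([1, 2, 3, 4, 5], k=len(targets))
# One sub-result per target chiplet (one-to-one correspondence).
assignment = {t.name: part for t, part in zip(targets, sub_results)}
```

In this sketch each selected target chiplet receives exactly one contiguous sub-result. How an actual embodiment partitions the result is not stated in the claims.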

Description

Communication method, communication apparatus, electronic device, computer-readable storage medium, and computer program product

Technical Field

The present application relates to the field of semiconductor technology, and in particular to a communication method, a communication apparatus, an electronic device, a computer-readable storage medium, and a computer program product.

Background

With the rapid growth of artificial intelligence models and high-performance computing applications, the demand for graphics processing unit (GPU) computing power continues to increase. To meet this demand, Multi-Chip Module (MCM) technology is adopted, in which a plurality of smaller computing chips are integrated into one computing card through advanced packaging technologies such as a silicon interposer. This architecture brings new challenges for collective communication (such as global reduction) in large-scale distributed training. In the related art, collective communication is carried out over the inter-card links between computing cards, so the high-speed intra-card links are underutilized, which results in inefficient communication between the computing cards.

Disclosure of Invention

Embodiments of the present application provide a communication method, a communication apparatus, an electronic device, a computer-readable storage medium, and a computer program product, capable of improving the efficiency of collective communication between computing cards.
The technical solutions of the embodiments of the present application are implemented as follows. An embodiment of the present application provides a communication method, comprising: in response to receiving data to be processed, distributing the data to be processed to a plurality of target computing cards, wherein each target computing card comprises a plurality of computing chiplets and at least one communication chiplet, each computing chiplet is allocated one data shard of the data to be processed, and the communication chiplet has an inter-card communication function; for each target computing card, determining a target chiplet corresponding to the target computing card from the at least one communication chiplet, and constructing a first communication network based on the computing chiplets and the target chiplet; controlling the computing chiplets to perform reduction processing on the data shards through the first communication network to obtain a first processing result of the target computing card, and sending the first processing result to the target chiplet corresponding to the target computing card through the first communication network; constructing a second communication network based on the target chiplets, and controlling the target chiplets to perform global reduction processing on the first processing results of the target computing cards through the second communication network to obtain a second processing result; and for each target chiplet, controlling the target chiplet to broadcast the second processing result, through the first communication network, to each computing chiplet in the target computing card corresponding to the target chiplet.
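
As a minimal sketch of the three stages described above (intra-card reduction, global reduction among the target chiplets, intra-card broadcast), the following Python simulation models each data shard as a list of numbers. The element-wise sum is an assumed reduction operation, since the text does not name one, and the function names are hypothetical. Real hardware would move data over the two communication networks rather than operate on in-memory lists.

```python
def reduce_vectors(vectors):
    """Element-wise sum of equally sized vectors (sum is an assumed reduction)."""
    return [sum(column) for column in zip(*vectors)]


def hierarchical_all_reduce(cards):
    """cards[i][j] is the data shard held by computing chiplet j of card i."""
    # Stage 1: reduction inside each card over the first communication
    # network; the per-card result ends up on that card's target chiplet.
    first_results = [reduce_vectors(shards) for shards in cards]

    # Stage 2: global reduction among the target chiplets over the
    # second (inter-card) communication network.
    second_result = reduce_vectors(first_results)

    # Stage 3: each target chiplet broadcasts the global result to every
    # computing chiplet on its own card over the first network.
    return [[list(second_result) for _ in shards] for shards in cards]


cards = [
    [[1, 2], [3, 4]],      # card 0: two computing chiplets
    [[10, 20], [30, 40]],  # card 1: two computing chiplets
]
out = hierarchical_all_reduce(cards)
```

After the call, every computing chiplet on every card holds the same globally reduced vector, and only one vector per card crosses the inter-card links in stage 2.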
An embodiment of the present application provides a communication apparatus, comprising: a data distribution module, configured to distribute, in response to receiving data to be processed, the data to be processed to a plurality of target computing cards, wherein each target computing card comprises a plurality of computing chiplets and at least one communication chiplet, each computing chiplet is allocated one data shard of the data to be processed, and the communication chiplet has an inter-card communication function; a chiplet determining module, configured to determine, for each target computing card, a target chiplet corresponding to the target computing card from the at least one communication chiplet, and to construct a first communication network based on the computing chiplets and the target chiplet; a first processing module, configured to control the computing chiplets to perform reduction processing on the data shards through the first communication network to obtain a first processing result of the target computing card, and to send the first processing result to the target chiplet corresponding to the target computing card through the first communication network; a second processing module, configured to construct a second communication network based on the target chiplets, and to control the target chiplets to perform global reduction processing on the first processing results of the target computing cards through the second communication network to obtain a second processing result; and a data broadcasting module, configured to control the