CN-122027393-A - Collective communication method, apparatus, storage medium, and computer program product
Abstract
Embodiments of the present application provide a collective communication method, device, storage medium, and computer program product. For a computing system comprising N computing nodes, each comprising M computing units, a logical topology of the computing system is constructed. The logical topology comprises a first ring topology among N computing units belonging to different computing nodes and a second ring topology among M computing units belonging to the same computing node. Two different transmission paths can be constructed based on the two ring topologies: one preferentially uses the connections of the first ring topology for data transmission, and the other preferentially uses the connections of the second ring topology. By controlling the number of data channels that adopt each of the two transmission paths, and transmitting data on each data channel over its corresponding transmission path, the bandwidth resources inside the computing nodes and the bandwidth resources between the computing nodes are both fully utilized, maximizing bandwidth utilization.
Inventors
- LIU YAOZHONG
- ZHANG JUN
- DONG JIANBO
Assignees
- 云智能资产控股(新加坡)私人股份有限公司
Dates
- Publication Date
- 20260512
- Application Date
- 20241111
Claims (15)
- 1. A collective communication method, characterized in that it is applied to a computing system, wherein the computing system comprises N computing nodes, each computing node comprises M computing units, and M and N are both positive integers, the method comprising: determining a logical topology established over a physical topology of the computing system, the logical topology comprising a first ring topology among N computing units belonging to different computing nodes and a second ring topology among M computing units belonging to the same computing node; creating, on the logical topology and according to bandwidth resources between the computing nodes and bandwidth resources inside the computing nodes, K1 data channels adapted to a first transmission path and K2 data channels adapted to a second transmission path, wherein K1 and K2 are positive integers, the first transmission path is a path that preferentially uses connections in the second ring topology to perform data transmission among the M x N computing units, and the second transmission path is a path that preferentially uses connections in the first ring topology to perform data transmission among the M x N computing units; and performing data transmission among the M x N computing units over the first transmission path in the K1 data channels, and performing data transmission among the M x N computing units over the second transmission path in the K2 data channels.
- 2. The method of claim 1, wherein determining a logical topology established over a physical topology of the computing system comprises: arranging the N x M computing units in an array comprising N rows and M columns, wherein the N rows correspond to the N computing nodes and the M columns correspond to the M computing units in the same computing node; based on the physical topology of the computing system, connecting the M computing units in each row in sequence and head to tail to obtain N second ring topologies; and based on the physical topology of the computing system, connecting the N computing units in each column in sequence and head to tail to obtain M first ring topologies; wherein the N second ring topologies and the M first ring topologies form the logical topology.
- 3. The method of claim 1, wherein creating K1 data channels adapted to the first transmission path and K2 data channels adapted to the second transmission path on the logical topology according to bandwidth resources between the computing nodes and bandwidth resources inside the computing nodes comprises: creating C data channels on the logical topology, wherein C is a positive integer; determining the number K1 of data channels adapted to the first transmission path and the number K2 of data channels adapted to the second transmission path, with the goal that the inter-node bandwidth and intra-node bandwidth consumed by the C data channels respectively match the bandwidth resources between the computing nodes and the bandwidth resources inside the computing nodes, wherein K1+K2=C; and dividing the C data channels into K1 data channels adapted to the first transmission path and K2 data channels adapted to the second transmission path.
- 4. The method of claim 3, wherein determining the number K1 of data channels adapted to the first transmission path and the number K2 of data channels adapted to the second transmission path, with the goal that the inter-node bandwidth and intra-node bandwidth consumed by the C data channels respectively match the bandwidth resources between the computing nodes and the bandwidth resources inside the computing nodes, comprises: taking the number K1 of data channels as the quantity to be solved, and calculating the inter-node bandwidth and the intra-node bandwidth consumed by the C data channels according to a first connection number, a second connection number, a third connection number, and a fourth connection number, wherein the first connection number and the second connection number are, respectively, the number of connections in the first transmission path that belong to the first ring topology and the number that belong to the second ring topology, and the third connection number and the fourth connection number are, respectively, the number of connections in the second transmission path that belong to the first ring topology and the number that belong to the second ring topology; and solving for the number K1 of data channels with minimizing the absolute value of the difference between a first bandwidth ratio and a second bandwidth ratio as the optimization target, and taking C-K1 as the number K2 of data channels, wherein the first bandwidth ratio is the ratio of the inter-node bandwidth consumed by the C data channels to the intra-node bandwidth consumed by the C data channels, and the second bandwidth ratio is the ratio of the bandwidth resources between the computing nodes to the bandwidth resources inside the computing nodes.
- 5. The method of claim 3, wherein dividing the C data channels into K1 data channels adapted to the first transmission path and K2 data channels adapted to the second transmission path comprises: randomly selecting K1 data channels from the C data channels as the data channels adapted to the first transmission path, and taking the remaining K2 data channels as the data channels adapted to the second transmission path; or, according to the order of the identifiers of the C data channels, selecting the K1 data channels with the smallest or the largest identifiers as the data channels adapted to the first transmission path, and taking the remaining K2 data channels as the data channels adapted to the second transmission path.
- 6. The method of claim 1, further comprising: acquiring predefined first path description information and second path description information; and determining the first transmission path and the second transmission path according to the logical topology, the first path description information, and the second path description information.
- 7. The method according to any one of claims 1-6, wherein, in the case that the N x M computing units are arranged in an array, the first transmission path comprises the connections other than the head-to-tail connection in the first of the second ring topologies and the connections other than the head-to-tail connection in each first ring topology; the head-to-tail connection refers to the connection between the first computing unit and the tail computing unit in the first of the second ring topologies or in each first ring topology.
- 8. The method according to any one of claims 1-6, wherein, in the case that the N x M computing units are arranged in an array, the second transmission path comprises the connections other than the head-to-tail connection in the first of the first ring topologies and the connections other than the head-to-tail connection in each second ring topology; the head-to-tail connection refers to the connection between the first computing unit and the tail computing unit in the first of the first ring topologies or in each second ring topology.
- 9. The method of claim 7, wherein performing data transmission among the M x N computing units over the first transmission path in the K1 data channels comprises: for any one of the K1 data channels, controlling data to be transmitted from the first computing unit in the first of the second ring topologies, through each subsequent computing unit in that ring topology, to the tail computing unit in that ring topology; and controlling the data to be transmitted from the first computing unit in each first ring topology, through each subsequent computing unit in that ring topology, to the tail computing unit in that ring topology.
- 10. The method of claim 8, wherein performing data transmission among the M x N computing units over the second transmission path in the K2 data channels comprises: for any one of the K2 data channels, controlling data to be transmitted from the first computing unit in the first of the first ring topologies, through each subsequent computing unit in that ring topology, to the tail computing unit in that ring topology; and controlling the data to be transmitted from the first computing unit in each second ring topology, through each subsequent computing unit in that ring topology, to the tail computing unit in that ring topology.
- 11. The method of claim 10, wherein at least some of the N computing nodes further comprise a CPU, and wherein the collective communication method is performed by the CPU on any of the computing nodes that include a CPU, or the collective communication method is performed by a device other than the N computing nodes.
- 12. An electronic device comprising a memory and a processor, the memory having stored therein a computer program, the processor being coupled to the memory for executing the computer program for implementing the steps of the method of any of claims 1-11.
- 13. The electronic device of claim 12, wherein the electronic device is implemented as one computing node in the computing system, the electronic device further comprising M computing units, M being a positive integer.
- 14. A computer readable storage medium storing a computer program, which, when executed by a processor, causes the processor to carry out the steps of the method according to any one of claims 1-11.
- 15. A computer program product comprising computer programs/instructions which, when executed by a processor, cause the processor to carry out the steps of the method of any of claims 1-11.
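The array arrangement of claim 2 and the two transmission paths of claims 7 and 8 can be illustrated with a minimal Python sketch. Everything named here (`ring`, `build_logical_topology`, `open_chain`, `first_transmission_path`, `second_transmission_path`, `count_connections`) is hypothetical and not taken from the application. The sketch assumes that "the first" ring means the ring at index 0 of the array, that N and M are both at least 2, and that a path is simply a list of directed connections; it models only the logical topology, not any physical link.

```python
from typing import List, Tuple

Unit = Tuple[int, int]    # (row = computing node index, column = unit index in the node)
Edge = Tuple[Unit, Unit]  # directed connection between two computing units


def ring(units: List[Unit]) -> List[Edge]:
    """Connect units in sequence and close the loop with a tail-to-head edge."""
    return [(units[i], units[(i + 1) % len(units)]) for i in range(len(units))]


def build_logical_topology(n: int, m: int) -> Tuple[List[List[Edge]], List[List[Edge]]]:
    """Return (M first rings across nodes, N second rings inside nodes)."""
    if n < 2 or m < 2:
        raise ValueError("this sketch assumes N >= 2 and M >= 2")
    second_rings = [ring([(r, c) for c in range(m)]) for r in range(n)]  # one per row
    first_rings = [ring([(r, c) for r in range(n)]) for c in range(m)]   # one per column
    return first_rings, second_rings


def open_chain(ring_edges: List[Edge]) -> List[Edge]:
    """Drop the head-to-tail connection, which ring() always emits last."""
    return ring_edges[:-1]


def first_transmission_path(first_rings, second_rings) -> List[Edge]:
    """Row 0 first (intra-node), then every column (inter-node)."""
    edges = open_chain(second_rings[0])
    for column_ring in first_rings:
        edges.extend(open_chain(column_ring))
    return edges


def second_transmission_path(first_rings, second_rings) -> List[Edge]:
    """Column 0 first (inter-node), then every row (intra-node)."""
    edges = open_chain(first_rings[0])
    for row_ring in second_rings:
        edges.extend(open_chain(row_ring))
    return edges


def count_connections(path: List[Edge]) -> Tuple[int, int]:
    """Return (inter-node connections, intra-node connections) in a path."""
    inter = sum(1 for src, dst in path if src[0] != dst[0])
    return inter, len(path) - inter


first_rings, second_rings = build_logical_topology(n=2, m=4)
path_1 = first_transmission_path(first_rings, second_rings)
path_2 = second_transmission_path(first_rings, second_rings)
print(count_connections(path_1))  # (4, 3)
print(count_connections(path_2))  # (1, 6)
```

Under these assumptions each path contains M x N - 1 connections and reaches every computing unit, but the two paths differ in how many inter-node and intra-node connections they consume, which is what makes the split between K1 and K2 a useful tuning knob.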
Description
Collective communication method, apparatus, storage medium, and computer program product
Technical Field
The present application relates to the field of communications technologies, and in particular, to a collective communication method, apparatus, storage medium, and computer program product.
Background
With the increasing size of large language models (LLMs), training them requires ever more training data and computing resources. These training data and computing resources are often distributed across multiple computing nodes for distributed training. To improve the efficiency of distributed training, collective communication is employed, which is a communication mode for efficient data exchange among multiple computing nodes in a distributed computing environment. For example, parameter synchronization and gradient aggregation in distributed model training can both use collective communication.
At present, collective communication generally adopts a ring topology: multiple computing nodes form an end-to-end ring communication path through their network cards, and data is transmitted along this ring communication path between the computing nodes and among the computing units inside each computing node, so that data synchronization completes quickly. However, the overall communication performance of such a scheme depends heavily on both the bandwidth resources between nodes and the bandwidth resources inside nodes; whichever of the two becomes the bottleneck reduces overall communication performance and wastes bandwidth resources.
Disclosure of Invention
Aspects of the present application provide a collective communication method, apparatus, storage medium, and computer program product for improving overall communication performance and reducing waste of bandwidth resources.
An embodiment of the present application provides a collective communication method applied to a computing system, wherein the computing system comprises N computing nodes, each computing node comprises M computing units, and M and N are both positive integers. The method comprises: determining a logical topology established over the physical topology of the computing system, the logical topology comprising a first ring topology among N computing units belonging to different computing nodes and a second ring topology among M computing units belonging to the same computing node; creating, on the logical topology and according to bandwidth resources between the computing nodes and bandwidth resources inside the computing nodes, K1 data channels adapted to a first transmission path and K2 data channels adapted to a second transmission path, wherein K1 and K2 are positive integers, the first transmission path is a path that preferentially uses connections in the second ring topology to perform data transmission among the M x N computing units, and the second transmission path is a path that preferentially uses connections in the first ring topology to perform data transmission among the M x N computing units; and performing data transmission among the M x N computing units over the first transmission path in the K1 data channels, and over the second transmission path in the K2 data channels.
An embodiment of the present application also provides an electronic device comprising a memory and a processor, wherein the memory stores a computer program, and the processor is coupled to the memory and executes the computer program to implement the steps of the collective communication method.
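The creation of K1 and K2 data channels according to the two kinds of bandwidth resources can be sketched as a small search, following the optimization target stated in claim 4. This is only an illustration under stated assumptions, not the applicant's implementation: the function names are hypothetical, the bandwidth consumed by a channel is assumed to be proportional to the number of connections its path uses, the connection counts assume the paths of claims 7 and 8 on an N x M array, and the bandwidth values in the usage lines are made up.

```python
from typing import List, Tuple


def solve_channel_split(c: int, n: int, m: int,
                        inter_bw: float, intra_bw: float) -> Tuple[int, int]:
    """Pick K1 in 1..C-1 so the consumed inter/intra ratio best matches the available ratio."""
    if c < 2 or n < 2 or m < 2:
        raise ValueError("this sketch assumes C >= 2, N >= 2 and M >= 2")
    if inter_bw <= 0 or intra_bw <= 0:
        raise ValueError("bandwidth resources must be positive")

    # Assumed connection counts (first..fourth connection numbers of claim 4).
    first_inter, first_intra = m * (n - 1), m - 1    # first transmission path
    second_inter, second_intra = n - 1, n * (m - 1)  # second transmission path

    target_ratio = inter_bw / intra_bw               # second bandwidth ratio

    def gap(k1: int) -> float:
        k2 = c - k1
        consumed_inter = k1 * first_inter + k2 * second_inter
        consumed_intra = k1 * first_intra + k2 * second_intra
        return abs(consumed_inter / consumed_intra - target_ratio)

    k1 = min(range(1, c), key=gap)  # K1 and K2 must both stay positive
    return k1, c - k1


def assign_channels(channel_ids: List[int], k1: int) -> Tuple[List[int], List[int]]:
    """Give the K1 smallest identifiers to the first path, the rest to the second."""
    ordered = sorted(channel_ids)
    return ordered[:k1], ordered[k1:]


# Hypothetical system: 2 nodes x 4 units, 8 channels, intra-node links 4x faster.
k1, k2 = solve_channel_split(c=8, n=2, m=4, inter_bw=50.0, intra_bw=200.0)
print(k1, k2)                               # 1 7
print(assign_channels(list(range(8)), k1))  # ([0], [1, 2, 3, 4, 5, 6, 7])
```

An exhaustive search over K1 is used because C is small in practice and the objective is a one-dimensional absolute difference; a closed-form solution followed by rounding would work equally well under the same proportionality assumption.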
An embodiment of the present application also provides a computer-readable storage medium storing a computer program which, when executed by a processor, causes the processor to implement the steps of the collective communication method. An embodiment of the present application also provides a computer program product comprising computer programs/instructions which, when executed by a processor, cause the processor to implement the steps of the collective communication method. In the embodiments of the present application, for a computing system comprising N computing nodes, each comprising M computing units, a logical topology of the computing system is constructed. The logical topology comprises a first ring topology among N computing units belonging to different computing nodes and a second ring topology among M computing units belonging to the same computing node. Two different transmission paths can be constructed based on the two ring topologies: one preferentially uses the connections of the first ring topology for data transmission, and the other preferentially uses the connections of the second ring topology for