EP-4738117-A1 - COMMUNICATION METHOD AND RELATED DEVICE

Abstract

This application provides a communication method and a related device, applied to the communication field. The method includes: obtaining a first sequence number of a first network interface card in a server and a second sequence number of the first network interface card on a communication plane; computing a phase deflection value of the communication plane based on the first sequence number; computing a first communication partner sequence number based on the second sequence number, where the first communication partner sequence number is a sequence number, on the communication plane, of a communication object of the first network interface card in a Kth step, and the first communication partner sequence number corresponds to a second network interface card; combining the first communication partner sequence number with the phase deflection value, to obtain a second communication partner sequence number, where the second communication partner sequence number is a sequence number of a third network interface card on the communication plane; and replacing the second network interface card with the third network interface card as the communication object of the first network interface card in the Kth step. In this application, a process of constructing phase deflection of communication traffic is introduced, so that a phase difference is generated between inter-node traffic on different communication planes, thereby balancing traffic on an entire network, and improving communication performance.

Inventors

CAI, Xin
HUANG, ZHIYUAN

Assignees

Huawei Technologies Co., Ltd.

Dates

Publication Date: 2026-05-06
Application Date: 2024-03-13

Claims (17)

A communication method, comprising: obtaining a first sequence number, wherein the first sequence number is a sequence number of a first network interface card in a target server, the target server comprises N network interface cards, each of the N network interface cards corresponds to a communication plane, the first network interface card corresponds to a target communication plane, the target communication plane comprises M network interface cards, the M network interface cards belong to different servers, sequence numbers of the M network interface cards in respective servers are the same as the first sequence number, the M network interface cards communicate with each other based on a target communication algorithm, and the target communication algorithm comprises a plurality of steps; obtaining a second sequence number, wherein the second sequence number is a sequence number of the first network interface card on the target communication plane; computing a phase deflection value of the target communication plane based on the first sequence number, a quantity of network interface cards comprised in the target server, and a quantity of network interface cards comprised on the target communication plane; computing a first communication partner sequence number based on the second sequence number, wherein the first communication partner sequence number is a sequence number, on the target communication plane, of a communication object of the first network interface card in a Kth step, and the first communication partner sequence number corresponds to a second network interface card; combining the first communication partner sequence number with the phase deflection value, to obtain a second communication partner sequence number, wherein the second communication partner sequence number is a sequence number of a third network interface card on the target communication plane; and replacing the second network interface card with the third network interface card as the communication object of the first network interface card in the Kth step.
The method according to claim 1, wherein the target communication algorithm is a halving-doubling algorithm, and the plurality of steps are log₂ M steps.
The method according to claim 2, wherein before the obtaining a first sequence number, the method further comprises: performing collective communication on the N network interface cards by using a reduce-scatter operator; and after the replacing the second network interface card with the third network interface card as the communication object of the first network interface card in the Kth step, the method further comprises: performing collective communication on the N network interface cards by using an all-gather operator.
The method according to claim 2 or 3, wherein the computing a first communication partner sequence number based on the second sequence number comprises: performing computation according to the following formula: First communication partner sequence number = Second sequence number ^ (1 << ((K + First sequence number) % log₂ M)), wherein ^ represents a bitwise exclusive OR operation, << represents a left shift operation, and % represents a modulo operation.
The method according to claim 1, wherein the target communication algorithm is a pairwise algorithm, and the plurality of steps are M-1 steps.
The method according to claim 5, wherein before the obtaining a first sequence number, the method further comprises: performing collective communication on the N network interface cards by using an all-to-all operator.
The method according to claim 5 or 6, wherein the first communication partner sequence number comprises a third communication partner sequence number and a fourth communication partner sequence number, and the computing a first communication partner sequence number based on the second sequence number comprises: computing the third communication partner sequence number and the fourth communication partner sequence number based on the second sequence number, wherein the third communication partner sequence number and the fourth communication partner sequence number are computed according to the following formulas: Third communication partner sequence number = (Second sequence number + M-K) % M, and Fourth communication partner sequence number = (Second sequence number + K) % M, wherein % represents a modulo operation.
A communication device, comprising: an obtaining unit, configured to obtain a first sequence number, wherein the first sequence number is a sequence number of a first network interface card in a target server, the target server comprises N network interface cards, each of the N network interface cards corresponds to a communication plane, the first network interface card corresponds to a target communication plane, the target communication plane comprises M network interface cards, the M network interface cards belong to different servers, sequence numbers of the M network interface cards in respective servers are the same as the first sequence number, the M network interface cards communicate with each other based on a target communication algorithm, and the target communication algorithm comprises a plurality of steps, wherein the obtaining unit is further configured to obtain a second sequence number, wherein the second sequence number is a sequence number of the first network interface card on the target communication plane; a computation unit, configured to compute a phase deflection value of the target communication plane based on the first sequence number, a quantity of network interface cards comprised in the target server, and a quantity of network interface cards comprised on the target communication plane, wherein the computation unit is further configured to compute a first communication partner sequence number based on the second sequence number, wherein the first communication partner sequence number is a sequence number, on the target communication plane, of a communication object of the first network interface card in a Kth step, and the first communication partner sequence number corresponds to a second network interface card; a combination unit, configured to combine the first communication partner sequence number with the phase deflection value, to obtain a second communication partner sequence number, wherein the second communication partner sequence number is a sequence number of a third network interface card on the target communication plane; and a replacement unit, configured to replace the second network interface card with the third network interface card as the communication object of the first network interface card in the Kth step.
The communication device according to claim 8, wherein the target communication algorithm is a halving-doubling algorithm, and the plurality of steps are log₂ M steps.
The communication device according to claim 9, wherein the communication device further comprises: a communication unit, configured to perform collective communication on the N network interface cards by using a reduce-scatter operator, wherein the communication unit is further configured to: perform collective communication on the N network interface cards by using an all-gather operator.
The communication device according to claim 9 or 10, wherein the computation unit is specifically configured to: compute the first communication partner sequence number according to the following formula: First communication partner sequence number = Second sequence number ^ (1 << ((K + First sequence number) % log₂ M)), wherein ^ represents a bitwise exclusive OR operation, << represents a left shift operation, and % represents a modulo operation.
The communication device according to claim 8, wherein the target communication algorithm is a pairwise algorithm, and the plurality of steps are M-1 steps.
The communication device according to claim 12, wherein the communication unit is further configured to: perform collective communication on the N network interface cards by using an all-to-all operator.
The communication device according to claim 12 or 13, wherein the first communication partner sequence number comprises a third communication partner sequence number and a fourth communication partner sequence number, and the computation unit is specifically configured to: compute the third communication partner sequence number and the fourth communication partner sequence number based on the second sequence number, wherein the third communication partner sequence number and the fourth communication partner sequence number are computed according to the following formulas: Third communication partner sequence number = (Second sequence number + M-K) % M, and Fourth communication partner sequence number = (Second sequence number + K) % M, wherein % represents a modulo operation.
A communication device, comprising a processor and a memory, wherein the memory is configured to store instructions; and the processor is configured to execute the instructions stored in the memory, to implement the method according to any one of claims 1 to 7.
A computer-readable storage medium, storing a computer program, wherein when the computer program is executed by one or more processors, the method according to any one of claims 1 to 7 is implemented.
A computer program product comprising instructions, wherein when the computer program product is run on a computer, the computer is enabled to perform the method according to any one of claims 1 to 7.
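
As an illustrative aid that is not part of the claims, the partner formulas recited in claims 4 and 7 (and repeated in claims 11 and 14) can be sketched in Python. This is a minimal sketch under stated assumptions: the function names are hypothetical, M is assumed to be a power of two for the halving-doubling case so that log₂ M is an integer, and K is assumed to be counted from 0 for halving-doubling. The claims do not state which of the two pairwise partners is the send target and which is the receive source, so the sketch only returns both values.

```python
def halving_doubling_partner(second_seq: int, first_seq: int, k: int, m: int) -> int:
    """Partner sequence number on the plane, per the formula in claim 4:
    second_seq ^ (1 << ((k + first_seq) % log2(m))).

    second_seq: sequence number of the NIC on its communication plane
    first_seq:  sequence number of the NIC inside its server
    k:          step index (assumed 0-based here)
    m:          number of NICs on the plane (assumed to be a power of two)
    """
    steps = m.bit_length() - 1  # equals log2(m) when m is a power of two
    if m < 2 or (1 << steps) != m:
        raise ValueError("m must be a power of two that is at least 2")
    return second_seq ^ (1 << ((k + first_seq) % steps))


def pairwise_partners(second_seq: int, k: int, m: int) -> tuple[int, int]:
    """Third and fourth partner sequence numbers, per the formulas in claim 7:
    third  = (second_seq + m - k) % m
    fourth = (second_seq + k) % m
    """
    third = (second_seq + m - k) % m
    fourth = (second_seq + k) % m
    return third, fourth
```

With M = 8 and K = 0, the NIC with plane sequence number 0 is paired with NIC 1, 2, or 4 when its in-server sequence number is 0, 1, or 2 respectively, so different planes reach different servers in the same step. Because the halving-doubling partner is obtained by an exclusive OR with a single bit, applying the function to the partner returns the original sequence number, which keeps the pairing symmetric.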

Description

This application claims priority to Chinese Patent Application No. CN202310957997.9, filed with the China National Intellectual Property Administration on July 31, 2023 and entitled "COMMUNICATION METHOD AND RELATED DEVICE", which is incorporated herein by reference in its entirety.

TECHNICAL FIELD

This application relates to the field of communication technologies, and in particular, to a communication method and a related device.

BACKGROUND

In AI scenarios, computing volume keeps increasing, and so does the required scale of computing cluster networking. Currently, the fat-tree topology is the mainstream networking form for computing clusters. However, when a fat-tree is applied to an ultra-large-scale networking solution, the quantity of fat-tree layers increases. In other words, a large quantity of core-layer switches need to be added, resulting in high networking costs. In an ultra-large-scale networking scenario, mesh networking has far better scalability and lower networking costs than the fat-tree, and is therefore an effective alternative. Mesh networking is a type of networking with a mesh-like topology, and includes full mesh networking and dragonfly networking.

In actual application, it is found that when a plurality of network interface cards communicate in parallel, symmetric collective communication takes longer in mesh networking than in the fat-tree topology.
For example, actual tests show that, compared with a fat-tree topology with 8 servers and 64 network interface cards (each server has 8 network interface cards), dragonfly networking of the same scale increases the communication duration of an all-reduce operator in the symmetric collective communication by more than 50% and the communication duration of an all-to-all operator in the symmetric collective communication by more than 100%. It can be learned that, in a mesh networking scenario, performance of the symmetric collective communication deteriorates severely, and an innovative solution for improvement is urgently needed.

SUMMARY

This application provides a communication method and a related device, which can reduce the probability of inter-node link congestion and improve communication performance.

According to a first aspect of this application, a communication method is provided, and may be applied to a communication device. The method includes:
obtaining a first sequence number, where the first sequence number is a sequence number of a first network interface card in a target server, the target server includes N network interface cards, each of the N network interface cards corresponds to a communication plane, the first network interface card corresponds to a target communication plane, the target communication plane includes M network interface cards, the M network interface cards belong to different servers, sequence numbers of the M network interface cards in respective servers are the same as the first sequence number, the M network interface cards communicate with each other based on a target communication algorithm, and the target communication algorithm includes a plurality of steps;
obtaining a second sequence number, where the second sequence number is a sequence number of the first network interface card on the target communication plane;
computing a phase deflection value of the target communication plane based on the first sequence number, a quantity of network interface cards included in the target server, and a quantity of network interface cards included on the target communication plane;
computing a first communication partner sequence number based on the second sequence number, where the first communication partner sequence number is a sequence number, on the target communication plane, of a communication object of the first network interface card in a Kth step, and the first communication partner sequence number corresponds to a second network interface card;
combining the first communication partner sequence number with the phase deflection value, to obtain a second communication partner sequence number, where the second communication partner sequence number is a sequence number of a third network interface card on the target communication plane; and
replacing the second network interface card with the third network interface card as the communication object of the first network interface card in the Kth step.

The N network interface cards of the target server each correspond to one communication plane. Therefore, the quantity of network interface cards included in the target server is the quantity of communication planes. A plurality of communication planes may perform parallel communication. The first network interface card corresponds to the target communication plane. The sequence numbers, in respective servers, of the netwo