CN-121996394-A - Data channel updating method and related equipment
Abstract
The embodiments of the application disclose a data channel updating method and related equipment in the technical field of computers. The method avoids the frequent link breaking and link establishment caused by cluster scale expansion in the DC communication mode, addresses the problem that the delay added by the growing number of link establishments and link breaks limits the expansion of the computing cluster scale, and thereby improves the connection scalability of reliable connection transmission in RDMA. The method comprises: obtaining a connection state of a dynamic connection initiator (DCI) corresponding to a process, wherein the connection state comprises a link establishment state, the DCI and a receiving end are in the link establishment state after a link is established between them, and the process sends data through the DCI in the link establishment state to the receiving end corresponding to that DCI; and updating the DCI corresponding to the process according to the connection state of the DCI corresponding to the process.
Inventors
- SUN XIN
- PENG BIYU
- FANG TUO
Assignees
- 成都华为技术有限公司
Dates
- Publication Date
- 20260508
- Application Date
- 20241031
Claims (12)
- 1. A method for updating a data channel, the method comprising: acquiring a connection state of a dynamic connection initiator (DCI) corresponding to a process, wherein the connection state comprises a link establishment state, the DCI and a receiving end are in the link establishment state after establishing a link, and the process sends data to the receiving end corresponding to the DCI through the DCI in the link establishment state; and updating the DCI corresponding to the process according to the connection state of the DCI corresponding to the process.
- 2. The method of claim 1, wherein the updating the DCI corresponding to the process according to the connection state of the DCI corresponding to the process comprises: acquiring a first number, wherein the first number is the number of DCIs in the link establishment state among the DCIs corresponding to the process; and if the first number exceeds a first threshold, creating a first DCI, wherein the connection state of the first DCI is a broken link state, and the broken link state indicates that the DCI has no correspondingly connected receiving end.
- 3. The method according to claim 2, wherein the method further comprises: acquiring a second number, wherein the second number is the number of DCIs corresponding to each process in a node where the process is located; if the second number exceeds an upper threshold, acquiring a first time delay of the process for transmitting data, wherein the first time delay is the time delay of the process for transmitting data before the first DCI is created; acquiring a second time delay of the process for transmitting data, wherein the second time delay is the time delay of the process for transmitting data after the first DCI is created; and if the second time delay is larger than the first time delay, deleting the first DCI.
- 4. The method according to any one of claims 1-3, wherein the updating the DCI corresponding to the process according to the connection state of the DCI corresponding to the process includes: acquiring a third number, wherein the third number is the number of DCIs in a broken link state among the DCIs corresponding to the process, and the broken link state indicates that the DCI has no correspondingly connected receiving end; and if the third number exceeds a second threshold, deleting a second DCI, wherein the connection state of the second DCI is the broken link state.
- 5. The method according to any one of claims 1-4, wherein the DCI corresponding to the process includes a target DCI, the target DCI and a target receiving end are in the link establishment state, and the link between them is not broken after data is transmitted.
- 6. The method according to any one of claims 1-5, further comprising: acquiring a first transmission task corresponding to the process; establishing a link between the updated DCI and a receiving end corresponding to the process, so that the updated DCI is in the link establishment state; transmitting data to the receiving end corresponding to the process through the updated DCI; and receiving a response packet returned by the receiving end corresponding to the process.
- 7. The method of claim 6, wherein the method further comprises: disconnecting the updated DCI from the receiving end corresponding to the process, so that the DCI is in the broken link state.
- 8. The method according to any one of claims 1-7, wherein before the obtaining the connection state of the dynamic connection initiator DCI corresponding to the process, the method further comprises: creating the DCI corresponding to the process according to the operation history record corresponding to the process.
- 9. A data channel updating apparatus, the apparatus comprising: an acquisition module, configured to acquire a connection state of a dynamic connection initiator (DCI) corresponding to a process, wherein the connection state comprises a link establishment state, the DCI and a receiving end are in the link establishment state after establishing a link, and the process sends data to the receiving end corresponding to the DCI through the DCI in the link establishment state; and a processing module, configured to update the DCI corresponding to the process according to the connection state of the DCI corresponding to the process.
- 10. A computing node comprising a processor and a memory for storing the processor-executable instructions, the processor configured to execute the instructions to cause the server to perform the data channel update method of any of claims 1-8.
- 11. A computer program product comprising instructions which, when executed by a server, cause the server to perform the data channel updating method of any of claims 1-8.
- 12. A computer readable storage medium, characterized in that the computer readable storage medium comprises computer program instructions which, when executed by a server, perform the data channel updating method according to any of claims 1-8.
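The update rules recited in claims 2 to 4 can be sketched as follows. This is a minimal illustration under stated assumptions, not the applicant's implementation: the names `State`, `Dci`, `DciPool`, and `rollback_if_slower` are hypothetical, the precedence of creation over deletion within one update is assumed, and the choice of which broken-link DCI to delete is assumed. No real RDMA library is called.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class State(Enum):
    LINKED = "link establishment state"
    BROKEN = "broken link state"


@dataclass
class Dci:
    dci_id: int
    state: State = State.BROKEN
    receiver: Optional[str] = None  # None while the link is broken


@dataclass
class DciPool:
    """DCIs corresponding to one process (hypothetical container)."""
    first_threshold: int   # limit on DCIs in the link establishment state
    second_threshold: int  # limit on DCIs in the broken link state
    dcis: List[Dci] = field(default_factory=list)
    next_id: int = 0

    def create(self) -> Dci:
        # A newly created DCI starts in the broken link state.
        dci = Dci(self.next_id)
        self.next_id += 1
        self.dcis.append(dci)
        return dci

    def count(self, state: State) -> int:
        return sum(1 for d in self.dcis if d.state is state)

    def update(self) -> Optional[Dci]:
        """Apply one update step; return the DCI created, if any.

        Assumption: creation (claim 2) is checked before deletion (claim 4).
        """
        if self.count(State.LINKED) > self.first_threshold:
            return self.create()
        if self.count(State.BROKEN) > self.second_threshold:
            # Assumption: delete the first DCI found in the broken link state.
            victim = next(d for d in self.dcis if d.state is State.BROKEN)
            self.dcis.remove(victim)
        return None


def rollback_if_slower(pool: DciPool, created: Dci, node_dci_count: int,
                       upper_threshold: int, delay_before: float,
                       delay_after: float) -> bool:
    """Claim 3: delete the created DCI if the node is over its upper
    threshold and the send delay got worse after the creation."""
    if node_dci_count > upper_threshold and delay_after > delay_before:
        pool.dcis.remove(created)
        return True
    return False
```

In this sketch the caller measures the send delay before and after `update()` returns a new DCI and then passes both values to `rollback_if_slower`; how the delays are measured is left open, as it is in the claims.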
Description
Data channel updating method and related equipment
Technical Field
The embodiments of the application relate to the technical field of computers, and in particular to a data channel updating method and related equipment.
Background
Remote direct memory access (RDMA) is one of the techniques widely used in the computing field at present. RDMA allows a computer to directly access the memory of a remote computer without involving the operating system or central processing unit (CPU) of the remote computer, and features high bandwidth and low latency. RDMA supports a variety of communication service protocols, including reliable connection transmission and unreliable datagrams. The reliable connection (RC) protocol is a communication model based on queue pairs (QPs). In a computing cluster, each QP in a process establishes a connection with a QP of another, remote process. In this case, the relevant information for each QP is recorded in a queue pair context (QPC). RC is widely used because of advantages such as support for multiple RDMA communication semantics and reliable transmission. However, because the cache capacity on the network interface card is limited, a large number of QPC cache misses may occur as the size of the computing cluster increases, so that the corresponding QPC must be fetched from host memory, which seriously degrades communication performance. Reliable connection transmission in RDMA therefore faces a serious connection scalability problem.
Disclosure of Invention
The embodiments of the application provide a data channel updating method and related equipment, which can improve the connection scalability of reliable connection transmission in RDMA.
In a first aspect, a data channel updating method is provided. The method includes: obtaining a connection state of a dynamic connection initiator (DCI) corresponding to a process, wherein the connection state includes a link establishment state, the DCI and a receiving end are in the link establishment state after a link is established between them, and the process sends data through the DCI in the link establishment state to the receiving end corresponding to that DCI; and updating the DCI corresponding to the process according to the connection state of the DCI corresponding to the process.
As can be seen from the above, updating the DCIs corresponding to the process according to their connection states dynamically adjusts the number of DCIs corresponding to the process. This avoids the frequent link breaking and link establishment caused by expansion of the computing cluster scale in the DC communication mode and effectively reduces the delay that link breaking and link establishment introduce. It thereby addresses the problem that the delay added by the growing number of link establishments and link breaks limits the growth of the computing cluster scale, and improves the connection scalability of the reliable connection transmission mode of RDMA.
In one possible implementation, a first number is obtained, wherein the first number is the number of DCIs in the link establishment state among the DCIs corresponding to the process; if the first number exceeds a first threshold, a first DCI is created, wherein the connection state of the first DCI is a broken link state, and the broken link state indicates that the DCI has no correspondingly connected receiving end.
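To illustrate why the number of DCIs available to a process affects how often links must be established, the following toy model counts link establishments for a sequence of sends. It is a sketch under assumptions that do not come from the application: the function name `count_link_setups` is hypothetical, links are kept open after a send until the DCI is needed for a different receiving end, and the link chosen for breaking is the least recently used one.

```python
from collections import OrderedDict
from typing import Iterable


def count_link_setups(destinations: Iterable[str], num_dcis: int) -> int:
    """Count the link establishments needed to serve the sends in order.

    Assumed policy (not from the source): reuse a DCI already linked to the
    destination; otherwise use a DCI in the broken link state; otherwise
    break the least recently used link and establish a new one.
    """
    if num_dcis < 1:
        raise ValueError("at least one DCI is required")
    linked = OrderedDict()  # receiving end -> None, least recently used first
    setups = 0
    for dest in destinations:
        if dest in linked:
            linked.move_to_end(dest)    # link already established, reuse it
            continue
        if len(linked) >= num_dcis:
            linked.popitem(last=False)  # break the least recently used link
        linked[dest] = None             # establish a link to the new receiver
        setups += 1
    return setups


sends = ["A", "B", "C"] * 4              # 12 sends cycling over 3 receivers
print(count_link_setups(sends, 1))       # prints 12
print(count_link_setups(sends, 3))       # prints 3
```

With a single DCI every send in this sequence targets a different receiving end than the previous one, so each send needs a new link. With three DCIs each receiving end keeps its link and only the first three sends pay the setup cost.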
As can be seen from the above, when the number of DCIs in the link establishment state among the DCIs corresponding to the process exceeds the first threshold and the process continues to establish connections with different receiving ends and send data, a DCI in the link establishment state must first break its link before it can establish a connection with a different receiving end and send data. Creating the first DCI at this point effectively reduces the number of link breaks and link establishments for the DCIs corresponding to the process, thereby reducing the delay.
In one possible implementation, a second number is obtained, wherein the second number is the number of DCIs corresponding to each process in the computing node where the process is located; if the second number exceeds an upper threshold, a first delay of the process sending data is obtained, wherein the first delay is the delay of the process sending data before the first DCI is created; a second delay of the process sending data is obtained, wherein the second delay is the delay of the process sending data after the first DCI is created; and if the second delay is greater than the first delay, the first DCI is deleted.
As can be seen from the above, the first DCI is created while the process is running, so that the number of DCIs corresponding to the process is increased, the number of broken links