
CN-122027617-A - Large model file distribution acceleration method and system based on stateless peer-to-peer network

CN122027617A

Abstract

The invention belongs to the technical field of machine learning / large language models, and in particular relates to a large model file distribution acceleration method and system based on a stateless peer-to-peer network. In the method, each node synchronizes cluster state through a decentralized protocol and builds a consistent hash ring that serves as the routing layer. When a request is judged to be the first entry into the cluster, a collaborative cache-preheating mode is triggered: the first node organizes the cluster to download the file and cache it in a distributed manner. When a request is judged not to be the first entry, a parallel download acceleration mode is triggered, and subsequent nodes obtain the data in parallel directly from the cluster's cache nodes. The method aims to solve the problem that conventional large model distribution methods cannot achieve both extreme distribution performance and extreme bandwidth-cost control while preserving protocol compatibility and automated operation and maintenance. Through parallel transmission the invention increases download speed by several times to more than ten times; through a single back-to-source download shared across the cluster it reduces traffic consumption by more than 90 percent, achieving efficient, low-cost distribution of large model files.

Inventors

  • LI YAZHOU
  • CAI CHENGHANG
  • LIU JUNCHENG

Assignees

  • Beijing SiliconFlow Technology Co., Ltd. (北京硅基流动科技有限公司)

Dates

Publication Date
2026-05-12
Application Date
2026-01-21

Claims (10)

  1. A stateless peer-to-peer network based large model file distribution acceleration method, applied to a distributed cluster comprising a plurality of service nodes, the method comprising: each service node synchronizes cluster membership state through a decentralized communication protocol and independently builds a consistent hash ring locally based on the current membership state to establish a dynamic route mapping relation, wherein the consistent hash ring maps the physical nodes participating in data service onto an annular space; when a service node receives a client request for a target model file and the request enters the cluster for the first time, the service node logically divides the target large model file into a plurality of data segments according to the route mapping relation and stores the segments in a distributed manner on the corresponding service nodes in the cluster, so that all data segments of the target large model file form a complete distributed cache copy within the cluster; and when a service node receives a client request for a target model file and the request does not enter the cluster for the first time, the service node parses the download request to determine the corresponding data segments, routes the request for each data segment to the service node that caches that segment based on the route mapping relation, and obtains the data in parallel over the intra-cluster network to complete the download.
  2. The method of claim 1, wherein logically dividing the target large model file into a plurality of data segments and routing the download request for each data segment to the corresponding service node in the cluster comprises: receiving a download request initiated at a first service node, parsing the request, and extracting a file identifier and byte-range information; mapping the byte-range information to corresponding data segment identifiers based on a preset data segment size parameter, and combining the file identifier with each data segment identifier to generate a routing key uniquely corresponding to that data segment, wherein each data segment identifier comprises a data segment number obtained from the ratio of the byte offset to the data segment size; querying the locally maintained consistent hash ring with the routing key of each data segment to determine the target service node responsible for caching that segment, wherein the consistent hash ring is dynamically constructed based on the current cluster membership state; and for a data segment routed to a peer node, the current node forwards the download request to that peer node, and the peer node downloads the corresponding data segment from the upstream source and caches it locally.
  3. The method of claim 2, wherein querying the consistent hash ring with a data segment routing key comprises: calculating a hash value of the routing key; on the consistent hash ring, starting from the position corresponding to that hash value, searching in the clockwise direction for the first virtual node; and determining the physical service node associated with that first virtual node as the target service node responsible for caching the data segment.
  4. The method of claim 1, wherein routing the request for each data segment to the corresponding service node that caches the segment based on the route mapping relation and obtaining the data in parallel over the intra-cluster network to complete the accelerated download comprises: generating the routing key of each data segment; querying the locally maintained consistent hash ring with the routing key and determining the target service node from the query result; if the target service node is the current node itself, checking the local cache and reading the segment directly on a hit; and if the target service node is a peer node, forwarding the request to that peer node, whereupon the peer node checks its local cache, reads the segment directly on a hit, and obtains it from the upstream source on a miss.
  5. The method of claim 2, further comprising: when a service node receives a client HTTP range request for the target model file, parsing the request and checking whether a proxy-source identifier is present in the request header; if the proxy-source identifier is absent, judging that the request for the target model file enters the cluster for the first time, and if it is present, judging that the request does not enter the cluster for the first time; wherein forwarding the request to a peer node comprises: before forwarding, the current node sets the proxy-source identifier in the request header, and if the peer node, after receiving the request, detects that the request header contains the proxy-source identifier, the peer node terminates the forwarding flow and processes the request locally; and wherein forwarding the request to a peer node further comprises: a cache-control identifier is set in the response header, and when the peer node returns the requested data and the current service node detects that the response header contains the cache-control identifier, the current service node is forbidden to cache the received data locally.
  6. The method of claim 2, further comprising: triggering automatic degradation if a communication fault is detected before response data begins to be returned to the client while forwarding the request to the peer node, wherein the communication fault comprises at least one of a refused network connection, a transport layer security certificate verification failure, a domain name resolution error, and a connection timeout; wherein the automatic degradation comprises: stopping the forwarding flow to the peer node, removing the proxy-source identifier from the request header, changing the target address of the request to the address of the upstream source server, re-initiating the request directly from the current node to the upstream source server, and returning the acquired data to the client.
  7. The method of claim 1, wherein each service node synchronizing cluster membership state through a decentralized communication protocol comprises: each service node periodically sends heartbeat messages through a Gossip protocol to other randomly selected nodes, wherein the heartbeat messages carry the liveness state of the sending node and part of its known cluster membership information; and the receiving node merges the received membership information, updates its locally maintained cluster membership view, and achieves eventual consistency through propagation.
  8. The method of claim 7, wherein each service node synchronizing cluster membership state through a decentralized communication protocol further comprises: when a service node fails to receive heartbeat information from a peer node within a preset time, marking that peer as suspected failed and propagating the suspicion through the cluster via the Gossip protocol; and after the cluster reaches consensus on the node failure through multiple rounds of information exchange and indirect probing, each surviving service node removes the failed node from its cluster membership view and triggers reconstruction of the consistent hash ring.
  9. The method of claim 1, wherein, in the step of each service node synchronizing cluster membership state through a decentralized communication protocol and independently constructing a consistent hash ring locally based on the current membership state, the nodes are divided into seed nodes and regular nodes; the seed nodes are configured not to participate in data sharding, their metadata carries a participate-in-sharding flag set to false, and they provide a stable bootstrap entry for the cluster and participate in the management and propagation of cluster membership state; the regular nodes are configured to participate in data sharding, their metadata carries a participate-in-sharding flag set to true, and they participate in the management of cluster membership state and, as the physical nodes of the data service, carry out data caching and request processing; and wherein, when a service node constructs the consistent hash ring in a local, stateless manner, only the regular nodes whose participate-in-sharding flag is true are selected from the cluster membership state and mapped onto the annular space.
  10. A large model file distribution acceleration system comprising a plurality of nodes, the system being configured to perform the stateless peer-to-peer network based large model file distribution acceleration method of any one of claims 1 to 9.
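Claims 2 and 3 describe how a byte range is mapped to a segment routing key and how that key is resolved on the consistent hash ring by a clockwise virtual-node search. The following Python sketch illustrates the idea; the 64 MiB segment size, the `#`-separated virtual-node naming, and the use of MD5 are illustrative assumptions, not details taken from the patent.

```python
import bisect
import hashlib

def _hash(key: str) -> int:
    """Place a string key on the ring (MD5 is an illustrative choice)."""
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

class HashRing:
    """Minimal consistent hash ring: each physical node contributes several
    virtual nodes, and a key is served by the first virtual node found
    clockwise from the key's position."""

    def __init__(self, nodes, vnodes: int = 100):
        self._ring = sorted((_hash(f"{node}#{i}"), node)
                            for node in nodes for i in range(vnodes))
        self._positions = [pos for pos, _ in self._ring]

    def lookup(self, key: str) -> str:
        # bisect_right finds the first virtual node clockwise; the modulo
        # wraps around past the top of the ring.
        idx = bisect.bisect_right(self._positions, _hash(key)) % len(self._ring)
        return self._ring[idx][1]

# Hypothetical 64 MiB segments: segment number = byte offset // segment size.
SEGMENT_SIZE = 64 * 1024 * 1024

def segment_routing_key(file_id: str, byte_offset: int) -> str:
    """Combine the file identifier with the segment number to form the
    routing key that is looked up on the ring (claim 2)."""
    return f"{file_id}:{byte_offset // SEGMENT_SIZE}"
```

Because every node builds the same ring from the same membership view, `ring.lookup(segment_routing_key(...))` yields the same owner on every node without any shared routing state, which is what makes the routing stateless.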
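Claim 5's loop prevention and single-copy caching reduce to simple header checks. A minimal sketch follows; the header names are hypothetical placeholders, since the patent only requires that some proxy-source identifier and some cache-control identifier exist.

```python
# Hypothetical header names; the patent does not specify concrete names.
PROXY_SOURCE_HEADER = "X-Proxy-Source"
NO_STORE_HEADER = "X-Cluster-No-Store"

def is_first_entry(request_headers: dict) -> bool:
    """A request enters the cluster for the first time iff the proxy-source
    marker is absent from its headers (claim 5's first/non-first test)."""
    return PROXY_SOURCE_HEADER not in request_headers

def forward_headers(request_headers: dict) -> dict:
    """Stamp the proxy-source marker before forwarding to a peer, so the
    peer terminates the forwarding chain and serves the request locally."""
    headers = dict(request_headers)
    headers[PROXY_SOURCE_HEADER] = "1"
    return headers

def should_cache_response(response_headers: dict) -> bool:
    """The forwarding node must not cache data whose response carries the
    cache-control marker, keeping a single cached copy of each segment
    at its ring-designated owner."""
    return NO_STORE_HEADER not in response_headers
```

On a communication fault (claim 6), the forwarding node would strip `PROXY_SOURCE_HEADER` again and retarget the request at the upstream source server, so degradation is just the inverse of `forward_headers`.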
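Claims 7 and 8 rely on Gossip-style anti-entropy: each node periodically pushes its membership view to a random peer, and views are merged by keeping the freshest entry per node. A minimal sketch, assuming per-node heartbeat counters as the freshness measure (the patent does not fix a concrete data structure):

```python
import random

def merge_views(local: dict, received: dict) -> dict:
    """Merge a received membership digest into the local view, keeping the
    entry with the higher heartbeat counter per node; repeated pairwise
    exchange yields eventual consistency (claim 7)."""
    merged = dict(local)
    for node, heartbeat in received.items():
        if heartbeat > merged.get(node, -1):
            merged[node] = heartbeat
    return merged

def gossip_round(views: dict) -> None:
    """One gossip round: every node bumps its own heartbeat counter and
    pushes its view to one randomly chosen peer, which merges it.
    views maps node name -> that node's membership view."""
    for node in list(views):
        views[node][node] = views[node].get(node, 0) + 1
        peer = random.choice([n for n in views if n != node])
        views[peer] = merge_views(views[peer], views[node])
```

A node whose heartbeat counter stops advancing would be marked suspected failed and, once the cluster agrees, removed from every view, triggering hash-ring reconstruction as in claim 8.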

Description

Large model file distribution acceleration method and system based on stateless peer-to-peer network

Technical Field

The invention belongs to the technical field of machine learning / large language models, and in particular relates to a large model file distribution acceleration method and system based on a stateless peer-to-peer network.

Background

With the rapid development of artificial intelligence technology, large-scale pre-trained models such as large language models often reach tens or even hundreds of gigabytes, and the model files are huge. In scenarios such as clustered training, batch inference, or elastic service deployment, distributing these enormous model files to hundreds or even thousands of computing nodes efficiently, reliably, and at low cost has become a key bottleneck constraining the efficiency of AI infrastructure. Existing large model distribution methods can be broadly divided into two categories: centralized Content Delivery Networks (CDNs) and classical peer-to-peer (P2P) file sharing technology. Although CDNs and their caching schemes are popular, their hierarchical edge-caching architecture suffers from a "back-to-source storm" caused by cache misses when faced with large-scale concurrent downloads of the same huge file. For example, when hundreds of training nodes simultaneously request a 130 GB model and the edge cache misses, the origin server instantaneously bears an egress traffic load of up to 13 TB, which is very likely to cause bandwidth bottlenecks, network congestion, and huge cost overhead.
Classical peer-to-peer (P2P) file sharing technologies, such as BitTorrent and its derivative IPFS, have the advantage of decentralization, but they were not designed for dynamic HTTP services in modern cloud-native environments. They rely on complex stateful maintenance (such as Tracker servers, DHTs, and torrent metadata files); in elastic clusters where nodes frequently join and leave, the state synchronization overhead is huge and the consistency and timeliness of data location are difficult to guarantee. Moreover, the "pre-split the file, then share" mode does not fit the standard HTTP Range request protocol for fetching file fragments on demand that is commonly adopted by AI tool chains (such as the transformers library of Hugging Face and the DataLoader of PyTorch), so forced adaptation requires adding heavy intermediate layers, harming transparency and ease of use. In addition, traditional centralized proxies and simple P2P schemes fall short in automated operation and maintenance: they rely on statically configured or manually maintained node lists, require manual intervention and service restarts when the cluster needs elastic scaling, and lack a fast, automatic failure detection and eviction mechanism when a node fails, which reduces service availability and greatly increases operation and maintenance complexity and cost. In view of this, the present invention has been made.

Disclosure of Invention

The invention aims to solve the problem that conventional large model distribution methods cannot achieve both extreme distribution performance and extreme bandwidth-cost control while preserving protocol compatibility and automated operation and maintenance.
In order to achieve the above object, the present invention provides a method for accelerating distribution of large model files based on a stateless peer-to-peer network, applied to a distributed cluster comprising a plurality of service nodes, the method comprising: each service node synchronizes cluster membership state through a decentralized communication protocol and independently builds a consistent hash ring locally based on the current membership state to establish a dynamic route mapping relation, wherein the consistent hash ring maps the physical nodes participating in data service onto an annular space; when a service node receives a client request for a target model file and the request enters the cluster for the first time, the service node logically divides the target large model file into a plurality of data segments according to the route mapping relation and stores the segments in a distributed manner on the corresponding service nodes in the cluster, so that all data segments of the target large model file form a complete distributed cache copy within the cluster; and when a service node receives a client request for a target model file and the request does not enter the cluster for the first time, the service node parses the download request to determine the corresponding data segments, routes the request for each data segment to the service node that caches that segment based on the route mapping relation, and obtains the data in parallel over the intra-cluster network to complete the download.
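The parallel acceleration path described above (split the file into fixed-size segments, fetch each segment from the node that caches it, reassemble in order) can be sketched as follows. Here `fetch_segment` stands in for the routed intra-cluster HTTP range GET; the 64 MiB segment size and the thread-pool fan-out of 8 are illustrative assumptions.

```python
from concurrent.futures import ThreadPoolExecutor

SEGMENT_SIZE = 64 * 1024 * 1024  # hypothetical fixed segment size

def plan_segments(file_size: int, segment_size: int = SEGMENT_SIZE):
    """Split [0, file_size) into (offset, length) ranges, one per data segment."""
    return [(off, min(segment_size, file_size - off))
            for off in range(0, file_size, segment_size)]

def download(file_id: str, file_size: int, fetch_segment,
             segment_size: int = SEGMENT_SIZE) -> bytes:
    """Fetch every segment in parallel from whichever cluster node caches it
    and reassemble the file in order; fetch_segment(file_id, offset, length)
    is a placeholder for the ring-routed intra-cluster range request."""
    segments = plan_segments(file_size, segment_size)
    with ThreadPoolExecutor(max_workers=8) as pool:
        # pool.map preserves input order, so the parts concatenate correctly.
        parts = list(pool.map(
            lambda seg: fetch_segment(file_id, seg[0], seg[1]), segments))
    return b"".join(parts)
```

Since each segment request is routed to a different cache node, the aggregate download bandwidth scales with the number of participating nodes rather than being limited by a single source, which is the basis for the claimed several-fold to ten-fold speedup.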