CN-120785779-B - Model data processing method and device, storage medium and electronic equipment
Abstract
The application discloses a method and a device for processing model data, a storage medium, and electronic equipment. The method comprises: in response to a model training request sent by a server, training based on local training data and initial model parameters to obtain local model parameters and link state information; sending the link state information to the server; receiving a maximum communication loop determined by the server according to the link state information, wherein the maximum communication loop is the communication path with the largest value of M among paths that pass through each of M clients a number of times meeting a preset count condition, M being a positive integer; and sequentially obtaining, according to the maximum communication loop, the local model parameters of the other clients and aggregating them to generate global model parameters. The application solves the technical problem that the efficiency of model training is difficult to guarantee when communication between the client and the server is unstable.
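The loop-ordered aggregation summarized above is spelled out in claim 1 as splitting the parameters into M blocks and then running two rounds of M-1 neighbor-to-neighbor exchanges, which resembles the well-known ring all-reduce pattern. The following is a minimal single-process sketch of that pattern, not the patented implementation. It assumes that parameters are flat lists of floats, that `local_params[i]` belongs to the i-th client in loop order, and that aggregation is a plain element-wise sum; the names `split_blocks` and `ring_aggregate` are hypothetical.

```python
from typing import List


def split_blocks(params: List[float], m: int) -> List[List[float]]:
    """Split a flat parameter vector into m contiguous blocks (sizes differ by at most 1)."""
    base, extra = divmod(len(params), m)
    blocks, start = [], 0
    for i in range(m):
        size = base + (1 if i < extra else 0)
        blocks.append(list(params[start:start + size]))
        start += size
    return blocks


def ring_aggregate(local_params: List[List[float]]) -> List[List[float]]:
    """Simulate the two-phase exchange around the loop.

    Returns the final parameter vector held by each client. Under the
    assumptions above, every client ends with the element-wise sum.
    """
    m = len(local_params)
    blocks = [split_blocks(p, m) for p in local_params]

    # Phase 1 (distributed accumulation): each client adds the block received
    # from the previous client to its own block; repeated M-1 times.
    for step in range(m - 1):
        # Snapshot what every client sends so all sends happen "simultaneously".
        sent = [blocks[i][(i - step) % m] for i in range(m)]
        for i in range(m):
            prev = (i - 1) % m
            idx = (prev - step) % m  # index of the block the previous client sent
            blocks[i][idx] = [a + b for a, b in zip(blocks[i][idx], sent[prev])]

    # Phase 2 (synchronization): each client overwrites its stored block with
    # the fully accumulated block received from the previous client; M-1 times.
    for step in range(m - 1):
        sent = [blocks[i][(i + 1 - step) % m] for i in range(m)]
        for i in range(m):
            prev = (i - 1) % m
            idx = (prev + 1 - step) % m
            blocks[i][idx] = list(sent[prev])

    # Flatten each client's blocks back into a single vector.
    return [[x for block in client for x in block] for client in blocks]
```

Each client sends and receives only one block per step, so the traffic on any single link stays roughly constant as M grows, which is the usual motivation for this pattern. Dividing the result by M would give an average rather than a sum; the source does not say which is intended.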
Inventors
- FAN XIAOKUN
- LI SHUCHEN
- LIU YAN
- SUN JUNSHUAI
- HE QI
- GENG YUNXIN
Assignees
- 中国星网网络创新研究院有限公司
Dates
- Publication Date: 2026-05-12
- Application Date: 2025-09-03
Claims (10)
- 1. A method of processing model data, comprising: in response to a model training request sent by a server, training based on local training data and initial model parameters to obtain local model parameters and link state information, and sending the link state information to the server, wherein the model training request comprises the initial model parameters; receiving a maximum communication loop determined by the server according to the link state information, wherein the maximum communication loop represents the communication path with the largest value of M among paths that pass through each of M clients a number of times meeting a preset count condition; and sequentially acquiring, according to the maximum communication loop, the local model parameters corresponding to the other clients and aggregating them to generate global model parameters, which comprises: dividing the local model parameters into M blocks based on the number of clients in the maximum communication loop, and sequentially performing distributed accumulation and synchronization on the M blocks according to the maximum communication loop to generate the global model parameters; wherein the distributed accumulation and synchronization comprises: based on the maximum communication loop, summing a received block sent by the previous client with the client's own corresponding block and sending the updated block to the next client, repeated M-1 times to obtain intermediate model parameters; and, based on the maximum communication loop, replacing the locally stored intermediate model parameters with the received intermediate model parameters sent by the previous client, repeated M-1 times to generate the global model parameters.
- 2. The method according to claim 1, wherein the method further comprises: determining a model metric parameter threshold based on the model training request; determining a current model metric parameter after training based on the local training data and the initial model parameters; determining whether to end training based on a comparison of the model metric parameter threshold and the current model metric parameter; and sending the global model parameters to the server when training is to be ended.
- 3. A method of processing model data, comprising: sending a model training request to a client, wherein the model training request comprises initial model parameters; receiving link state information sent by the client, wherein the link state information represents information obtained by the client through training based on local training data and the initial model parameters; determining a maximum communication loop based on the link state information, wherein the maximum communication loop represents the communication path with the largest value of M among paths that pass through each of M clients a number of times meeting a preset count condition, and M is a positive integer; and receiving global model parameters uploaded by the client, wherein the global model parameters represent model parameters generated by the client by sequentially acquiring, according to the maximum communication loop, the local model parameters corresponding to the other clients and aggregating them; wherein determining the maximum communication loop based on the link state information comprises: constructing a network topology graph based on a predetermined client set; determining connection states among the clients in the network topology graph based on the link state information; and determining the maximum communication loop according to the network topology graph and the connection states, which comprises: randomly selecting a client from the network topology graph as an initial node and constructing a path record structure; starting from the initial node, recursively exploring each unvisited neighbor node based on the link state information, wherein each explored neighbor node is regarded as a current node; determining whether the current node meets a termination condition by marking the current node as visited and adding it to the end of the path record structure; when the current node does not meet the termination condition, continuing the exploration with each neighbor node of the current node that is marked as unvisited as the new current node, until the traversal is complete; and generating the maximum communication loop when the current node meets the termination condition.
- 4. The method according to claim 3, wherein determining whether the current node meets a termination condition by marking the current node as visited and adding it to the end of the path record structure comprises: generating a valid loop when the total number of nodes in the path record structure equals the total number of nodes in the network topology graph and the head and tail nodes coincide; and determining that the termination condition is met when the length of the valid loop exceeds a historical maximum, and updating the maximum communication loop to the valid loop; wherein, when all neighbor nodes of the current node are marked as visited or no valid loop is found, the current node is removed from the end of the path record structure and its visited state is reset.
- 5. The method according to claim 3, wherein determining whether the current node meets a termination condition by marking the current node as visited and adding it to the end of the path record structure comprises: generating a valid loop when the total number of nodes in the path record structure equals the total number of nodes in the network topology graph and the head and tail nodes coincide; determining the sum of the weights of all edges in the valid loop when the length of the valid loop exceeds a historical maximum, wherein the weight of each edge indicates the communication rate of that edge; and determining that the termination condition is met when the sum of the weights of all edges in the valid loop exceeds a historical maximum, and updating the maximum communication loop to the valid loop; wherein, when all neighbor nodes of the current node are marked as visited or no valid loop is found, the current node is removed from the end of the path record structure and its visited state is reset.
- 6. A processing apparatus for model data, comprising: a training module, configured to, in response to a model training request sent by a server, train based on local training data and initial model parameters to obtain local model parameters and link state information, and to send the link state information to the server, wherein the model training request comprises the initial model parameters; a first receiving module, configured to receive a maximum communication loop determined by the server according to the link state information, wherein the maximum communication loop represents the communication path with the largest value of M among paths that pass through each of M clients a number of times meeting a preset count condition; and a generation module, configured to sequentially acquire, according to the maximum communication loop, the local model parameters corresponding to the other clients and aggregate them to generate global model parameters, which comprises: dividing the local model parameters into M blocks based on the number of clients in the maximum communication loop, and sequentially performing distributed accumulation and synchronization on the M blocks according to the maximum communication loop to generate the global model parameters; wherein the generation module is configured to: based on the maximum communication loop, sum a received block sent by the previous client with its own corresponding block and send the updated block to the next client, repeated M-1 times to obtain intermediate model parameters; and, based on the maximum communication loop, replace the locally stored intermediate model parameters with the received intermediate model parameters sent by the previous client, repeated M-1 times to generate the global model parameters.
- 7. A processing apparatus for model data, comprising: a sending module, configured to send a model training request to a client, wherein the model training request comprises initial model parameters; a second receiving module, configured to receive link state information sent by the client, wherein the link state information represents information obtained by the client through training based on local training data and the initial model parameters; a determining module, configured to determine a maximum communication loop based on the link state information, wherein the maximum communication loop represents the communication path with the largest value of M among paths that pass through each of M clients a number of times meeting a preset count condition; and a third receiving module, configured to receive the global model parameters uploaded by the client, wherein the global model parameters represent model parameters generated by the client by sequentially acquiring, according to the maximum communication loop, the local model parameters corresponding to the other clients; wherein the determining module is further configured to: construct a network topology graph based on a predetermined client set; determine connection states among the clients in the network topology graph based on the link state information; and determine the maximum communication loop according to the network topology graph and the connection states, which comprises: randomly selecting a client from the network topology graph as an initial node and constructing a path record structure; starting from the initial node, recursively exploring each unvisited neighbor node based on the link state information, wherein each explored neighbor node is regarded as a current node; determining whether the current node meets a termination condition by marking the current node as visited and adding it to the end of the path record structure; when the current node does not meet the termination condition, continuing the exploration with each neighbor node of the current node that is marked as unvisited as the new current node, until the traversal is complete; and generating the maximum communication loop when the current node meets the termination condition.
- 8. A computer-readable storage medium, characterized in that the computer-readable storage medium has stored therein a computer program, wherein the computer program, when executed by a processor, implements the steps of the method of any of claims 1 to 2 or the steps of the method of any of claims 3 to 5.
- 9. A computer program product comprising computer program/instructions which, when executed by a processor, implement the steps of the method of any one of claims 1 to 2 or the steps of the method of any one of claims 3 to 5.
- 10. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor, when executing the computer program, implements the steps of the method of any one of claims 1 to 2 or the steps of the method of any one of claims 3 to 5.
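The loop search recited in claims 3 to 5 (pick an initial node, keep a path record, recursively visit unvisited neighbors, record a closed loop when one is found, then backtrack) can be illustrated with a small depth-first search. This is a hedged sketch under several assumptions that are not in the source: the link state is given as a symmetric adjacency dictionary mapping each client to its reachable neighbors and link rates, each client may appear at most once in a loop, and the initial node is passed in rather than chosen at random. It also relaxes the claimed condition that a valid loop must contain every node of the graph: any closed path of three or more clients is a candidate, the longest wins, and ties are broken by the sum of link rates as in claim 5. The name `find_max_loop` and the `Graph` alias are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical link-state format: client -> {neighbor: communication rate}.
Graph = Dict[str, Dict[str, float]]


def find_max_loop(graph: Graph, start: str) -> Tuple[List[str], float]:
    """Return (loop, total_rate) for the longest closed loop through `start`.

    The loop is listed without repeating the start node at the end.
    Returns ([], 0.0) if no closed loop through `start` exists.
    """
    best_path: List[str] = []
    best_weight = 0.0
    path = [start]      # path record structure
    visited = {start}   # nodes currently marked as visited

    def explore(current: str, weight: float) -> None:
        nonlocal best_path, best_weight
        # A closed loop exists if the current node links back to the start.
        if len(path) >= 3 and start in graph[current]:
            total = weight + graph[current][start]
            # Prefer more clients; break ties by total link rate.
            if (len(path), total) > (len(best_path), best_weight):
                best_path, best_weight = list(path), total
        for neighbor, rate in graph[current].items():
            if neighbor not in visited:
                visited.add(neighbor)
                path.append(neighbor)
                explore(neighbor, weight + rate)
                # Backtrack: remove from the end of the path record and
                # reset the visited state.
                path.pop()
                visited.discard(neighbor)

    explore(start, 0.0)
    return best_path, best_weight
```

Exhaustive search of this kind takes exponential time in the worst case, so it is only practical for small client sets; the source does not state how large the constellation is expected to be.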
Description
Model data processing method and device, storage medium and electronic equipment
Technical Field
The present application relates to the field of computers, and in particular to a method and apparatus for processing model data, a storage medium, and an electronic device.
Background
Satellite internet is a core component of global communication infrastructure. Its low-orbit satellite constellations accumulate large amounts of user location, communication traffic, and satellite state data through multi-satellite networking. This data is valuable for the intelligent transformation of the satellite internet, but it must be processed in a way that preserves data privacy and security. Federated learning was introduced in this field to allow multiple satellite clients to cooperatively train artificial intelligence models without sharing raw data. The related technology generally adopts a centralized federated learning framework: each satellite acts as a client and trains a model on its local data, then uploads its local model parameters to a ground central server for aggregation; the server generates a global model and distributes it to each satellite. This architecture has notable drawbacks. First, a large number of model parameters are sent to a single central node within a short time, which easily causes network congestion and exceeds the bandwidth of the satellite-to-ground link. Second, because the communication windows between satellites and ground stations are short, a large amount of time is spent waiting for all satellites to finish uploading their models. Finally, the central server is a single point of failure: once it fails, the whole training process is interrupted.
Although some improved schemes attempt to optimize efficiency through hierarchical aggregation or asynchronous mechanisms, they still cannot solve the intermittent interruption of inter-satellite links caused by the high-speed motion of satellites, so model exchange and aggregation remain unreliable. Federated learning over low-orbit satellite constellations therefore faces the technical problem that the training efficiency of model parameters is difficult to guarantee because inter-satellite links are dynamically unstable. No effective solution to this problem has yet been proposed.
Disclosure of Invention
The embodiments of the application provide a method and a device for processing model data, a storage medium, and electronic equipment, which at least solve the technical problem that the efficiency of model training is difficult to guarantee when communication between a client and a server is unstable.
According to one aspect of the embodiments of the application, a method of processing model data is provided, which comprises: in response to a model training request sent by a server, training based on local training data and initial model parameters to obtain local model parameters and link state information, and sending the link state information to the server, wherein the model training request comprises the initial model parameters; receiving a maximum communication loop determined by the server according to the link state information, wherein the maximum communication loop represents the communication path with the largest value of M among paths that pass through each of M clients a number of times meeting a preset count condition; and sequentially acquiring, according to the maximum communication loop, the local model parameters corresponding to the other clients and aggregating them to generate global model parameters.
According to another aspect of the embodiments of the application, a method of processing model data is provided, which comprises: sending a model training request to a client; receiving link state information sent by the client, wherein the link state information represents information obtained by the client through training based on the local training data and the initial model parameters; determining a maximum communication loop based on the link state information, wherein the maximum communication loop represents the communication path with the largest value of M among paths that pass through each of M clients a number of times meeting a preset count condition, and M is a positive integer; and receiving global model parameters uploaded by the client, wherein the global model parameters represent model parameters generated by the client by sequentially acquiring, according to the maximum communication loop, the local model parameters corresponding to the other clients.
According to still another aspect of the embodiments of the application, a processing device for model data is provided, which comprises a training module, a first receivi