CN-115907041-B - Model training method and device

CN115907041B

Abstract

The application discloses a model training method applied to the field of federated learning. The method comprises: obtaining a plurality of first gradients and a plurality of second gradients, wherein the first gradients are gradients corresponding to a plurality of first parameters in a target model, the second gradients are gradients corresponding to a plurality of second parameters in the target model, the first parameters are updated in the previous iteration of federated learning, and the second parameters are not updated in the previous iteration of federated learning; selecting partial gradients from the first gradients and the second gradients, the partial gradients being used to update the target model in the current iteration of federated learning; and transmitting information of the updated target model to a plurality of first devices, wherein the first devices belong to a plurality of terminals. Because the server selects and updates the values of only part of the parameters in the target model each time, the amount of gradient data transmitted from the server to the terminals can be effectively reduced.

Inventors

  • ZHANG YU
  • LU JIAXUN
  • SHAO YUNFENG
  • XU HONG

Assignees

  • Huawei Technologies Co., Ltd. (华为技术有限公司)

Dates

Publication Date
2026-05-08
Application Date
2022-11-02

Claims (20)

  1. A model training method applied to a server, the server being in communication with a plurality of terminals, the method comprising: acquiring a plurality of first gradients and a plurality of second gradients, wherein the plurality of first gradients are gradients corresponding to a plurality of first parameters in a target model, the plurality of second gradients are gradients corresponding to a plurality of second parameters in the target model, the plurality of first parameters are updated in a previous iteration of federated learning, and the plurality of second parameters are not updated in the previous iteration of federated learning; selecting partial gradients from the plurality of first gradients and the plurality of second gradients, the partial gradients being used for updating the target model in a current iteration of federated learning; and transmitting information of the updated target model to a plurality of first devices, wherein the plurality of first devices belong to the plurality of terminals.
  2. The method of claim 1, wherein the partial gradients are the largest gradients among the plurality of first gradients and the plurality of second gradients.
  3. The method according to claim 1 or 2, wherein the plurality of first gradients are obtained by aggregating a plurality of third gradients sent from a plurality of second devices in the current iteration round; the plurality of second gradients are obtained according to gradients corresponding to the plurality of second parameters determined in the previous iteration round and a plurality of fourth gradients sent from the plurality of second devices in the current iteration round; and the plurality of second devices belong to the plurality of terminals.
  4. The method according to claim 1 or 2, wherein the information of the target model comprises a parameter update amount of the updated target model relative to a first model, the first model being a model obtained by updating the target model in an iteration round preceding the current iteration round; and before the acquiring of the plurality of first gradients and the plurality of second gradients, the method further comprises: broadcasting parameter values of the updated first model to the plurality of terminals.
  5. The method of claim 1 or 2, wherein the information of the target model comprises a parameter update amount of the updated target model relative to a second model, the second model being an initial model of the target model.
  6. The method according to claim 1 or 2, wherein the information of the target model comprises a parameter update amount of the updated target model relative to a third model, wherein the plurality of first devices comprise a first target device, and the third model is a model obtained by the first target device updating the target model in an iteration round preceding the current iteration round; and the transmitting of the information of the updated target model to the plurality of first devices comprises: transmitting the parameter update amount of the updated target model relative to the third model to the first target device.
  7. The method of claim 6, wherein the iteration round preceding the current iteration round is specifically the iteration round in which the first target device last updated the target model before the current iteration round.
  8. A system comprising a server and a plurality of terminals, the server being in communication with the plurality of terminals, wherein the server is configured to: acquire a plurality of first gradients and a plurality of second gradients, wherein the plurality of first gradients are gradients corresponding to a plurality of first parameters in a target model, the plurality of second gradients are gradients corresponding to a plurality of second parameters in the target model, the plurality of first parameters are updated in a previous iteration of federated learning, and the plurality of second parameters are not updated in the previous iteration of federated learning; select partial gradients from the plurality of first gradients and the plurality of second gradients, the partial gradients being used for updating the target model in a current iteration of federated learning; and transmit information of the updated target model to a plurality of first devices, wherein the plurality of first devices belong to the plurality of terminals.
  9. The system of claim 8, wherein the partial gradients are the largest gradients among the plurality of first gradients and the plurality of second gradients.
  10. The system of claim 8 or 9, wherein a plurality of second devices in the plurality of terminals are configured to send a plurality of third gradients and a plurality of fourth gradients to the server, the plurality of third gradients being gradients corresponding to the plurality of first parameters in the target model and the plurality of fourth gradients being gradients corresponding to the plurality of second parameters in the target model, the plurality of first parameters being updated and the plurality of second parameters not being updated in the previous iteration of federated learning; and the server is specifically configured to aggregate the plurality of third gradients to obtain the plurality of first gradients, and to aggregate the plurality of fourth gradients and fuse them with the gradients corresponding to the plurality of second parameters determined in the previous iteration round to obtain the plurality of second gradients.
  11. The system of claim 10, wherein the plurality of second devices in the plurality of terminals are specifically configured to determine a plurality of gradients corresponding to the target model in a previous iteration of federated learning, and the plurality of third gradients and the plurality of fourth gradients are randomly selected from the plurality of gradients.
  12. The system according to claim 11, wherein the plurality of second devices in the plurality of terminals are specifically configured to perform lossless compression or linear unbiased compression on information indicating the plurality of third gradients and the plurality of fourth gradients, and to send the compression result to the server.
  13. The system according to claim 8 or 9, wherein the information of the target model comprises a parameter update amount of the updated target model relative to a first model, the first model being a model obtained by updating the target model in an iteration round preceding the current iteration round; and the server is further configured to broadcast parameter values of the updated first model to the plurality of terminals before acquiring the plurality of first gradients and the plurality of second gradients.
  14. The system of claim 8 or 9, wherein the information of the target model comprises a parameter update amount of the updated target model relative to a second model, the second model being an initial model of the target model.
  15. The system according to claim 8 or 9, wherein the information of the target model comprises a parameter update amount of the updated target model relative to a third model, wherein the plurality of first devices comprise a first target device, and the third model is a model obtained by the first target device updating the target model in an iteration round preceding the current iteration round; and the server is specifically configured to transmit the parameter update amount of the updated target model relative to the third model to the first target device.
  16. The system of claim 15, wherein the iteration round preceding the current iteration round is specifically the iteration round in which the first target device last updated the target model before the current iteration round.
  17. A model training apparatus for use with a server in communication with a plurality of terminals, the apparatus comprising: an acquisition module configured to acquire a plurality of first gradients and a plurality of second gradients, wherein the plurality of first gradients are gradients corresponding to a plurality of first parameters in a target model, the plurality of second gradients are gradients corresponding to a plurality of second parameters in the target model, the plurality of first parameters are updated in a previous iteration of federated learning, and the plurality of second parameters are not updated in the previous iteration of federated learning; a gradient selection module configured to select partial gradients from the plurality of first gradients and the plurality of second gradients, the partial gradients being used to update the target model in a current iteration of federated learning; and a sending module configured to transmit information of the updated target model to a plurality of first devices, wherein the plurality of first devices belong to the plurality of terminals.
  18. The apparatus of claim 17, wherein the partial gradients are the largest gradients among the plurality of first gradients and the plurality of second gradients.
  19. The apparatus according to claim 17 or 18, wherein the plurality of first gradients are obtained by aggregating a plurality of third gradients sent from a plurality of second devices in the current iteration round; the plurality of second gradients are obtained according to gradients corresponding to the plurality of second parameters determined in the previous iteration round and a plurality of fourth gradients sent from the plurality of second devices in the current iteration round; and the plurality of second devices belong to the plurality of terminals.
  20. The apparatus according to claim 17 or 18, wherein the information of the target model comprises a parameter update amount of the updated target model relative to a first model, the first model being a model obtained by updating the target model in an iteration round preceding the current iteration round; and the sending module is further configured to broadcast parameter values of the updated first model to the plurality of terminals before the acquiring of the plurality of first gradients and the plurality of second gradients.

Description

Model training method and device

Technical Field

The application relates to the field of artificial intelligence, and in particular to a model training method and device.

Background

Artificial intelligence (AI) is the theory, method, technique, and application system that uses a digital computer, or a machine controlled by a digital computer, to simulate and extend human intelligence, perceive the environment, acquire knowledge, and use that knowledge to obtain optimal results. In other words, artificial intelligence is a branch of computer science that attempts to understand the essence of intelligence and to produce a new kind of intelligent machine that can react in a manner similar to human intelligence. Artificial intelligence is the study of the design principles and implementation methods of various intelligent machines, so that the machines have the functions of perception, reasoning, and decision-making. Federated learning systems train machine learning models based on data generated by a large number of users interacting with their devices (e.g., smartphones), without taking the data off the devices. For example, a subset of the online devices is selected in each cycle, and the current version of the machine learning model is sent to those selected devices. Each selected device is tasked with computing an update to the model using its own locally generated and locally stored data. The model updates are then sent back to the server, averaged, and applied to the server's model to generate a new version of the model for the next iteration (e.g., with the next subset of devices).
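The per-round averaging procedure described in the background can be sketched as follows. This is a minimal illustrative sketch of plain federated averaging, not the patented method; the function names (`local_gradient`, `federated_round`) and the quadratic toy loss are assumptions for illustration.

```python
import numpy as np

def local_gradient(model, data):
    # Stand-in for a terminal's gradient computation on local data:
    # gradient of the toy loss 0.5 * ||model - data||^2 w.r.t. the model.
    return model - data

def federated_round(model, terminal_data, lr=0.5):
    """One federated averaging round: each terminal computes a gradient on
    its local data; the server averages the gradients (equal weights here)
    and applies the averaged gradient to its copy of the model."""
    grads = [local_gradient(model, d) for d in terminal_data]
    avg_grad = np.mean(grads, axis=0)
    return model - lr * avg_grad

model = np.zeros(4)
terminals = [np.array([1.0, 2.0, 3.0, 4.0]),
             np.array([3.0, 2.0, 1.0, 0.0])]
for _ in range(100):
    model = federated_round(model, terminals)
# With this toy loss, the model converges to the mean of the local data.
```

Note that every round transmits the full dense gradient vector in both directions, which is exactly the traffic cost the application targets.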
Federated learning comprises two steps: model issuing and model uploading. A central node issues a model to the terminal devices over the network; each terminal device computes the gradient of the model using its local data, encrypts the gradient, and uploads it to the central node; and the central node gathers the gradients of the terminal nodes and updates the parameters of the central model using a parameter averaging algorithm. At the beginning of training, the server sends an initial model to each end side. After the different end sides perform several iterations on the model using local data, they feed the model change amount (i.e., the gradients corresponding to the parameters) back to the server. The server performs a weighted average on the fed-back gradients, updates the initial model using the resulting average gradient, sends the updated model to each end user, and starts the next iteration. The problem with the existing federated training framework is that, when user data are not independent and identically distributed, the gradient directions after iteration at the individual user nodes differ greatly, so the server cannot obtain an effective model gradient update direction. As a result, the server model converges slowly, a large number of gradients must be transferred back and forth between the end-side users and the server, and a large amount of traffic is consumed. In the current network environment, the overall bandwidth of the network grows at a rate far lower than the growth in the size of neural network models. Therefore, how to effectively reduce the traffic overhead is a problem to be solved in federated learning.

Disclosure of Invention

The application provides a model training method that can effectively reduce the amount of gradient data transmitted from the server to the terminals.
In a first aspect, the application provides a model training method applied to a server in communication with a plurality of terminals. The method comprises: acquiring a plurality of first gradients and a plurality of second gradients, wherein the plurality of first gradients are gradients corresponding to a plurality of first parameters in a target model, the plurality of second gradients are gradients corresponding to a plurality of second parameters in the target model, the plurality of first parameters are updated in the previous iteration of federated learning, and the plurality of second parameters are not updated in the previous iteration of federated learning; selecting partial gradients from the plurality of first gradients and the plurality of second gradients, the partial gradients being used to update the target model in the current iteration of federated learning; and transmitting information of the updated target model to a plurality of first devices, wherein the plurality of first devices belong to the plurality of terminals. Because only the values of part of the parameters of the target model are selected and updated each time, the amount of gradient data transmitted from the server to the terminals can be effectively reduced. In one possible implementation, the partial gradients are the largest of the plurality of first gradients and the plurality of second gradients.
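The server-side selection step of the first aspect can be sketched as follows, assuming the "largest gradients" rule of the possible implementation and assuming that gradients of unselected parameters are carried over and fused into the next round (in the spirit of the second gradients of claim 10). All function and variable names are illustrative, not from the application.

```python
import numpy as np

def select_partial_gradients(residual, fresh_grads, k):
    """Server-side step: fuse the freshly aggregated gradients with the
    residual kept for parameters not updated last round, pick the k gradients
    largest in magnitude, and return (sparse update, new residual)."""
    fused = residual + np.mean(fresh_grads, axis=0)  # aggregate, then fuse
    idx = np.argsort(np.abs(fused))[-k:]             # indices of k largest
    update = np.zeros_like(fused)
    update[idx] = fused[idx]                         # only these params move
    new_residual = fused - update                    # unsent part carried over
    return update, new_residual

residual = np.zeros(6)
fresh = [np.array([0.9, -0.1, 0.05, -2.0, 0.3, 0.0]),
         np.array([1.1,  0.1, 0.15, -1.0, 0.1, 0.2])]
update, residual = select_partial_gradients(residual, fresh, k=2)
# Only 2 of the 6 entries of `update` are nonzero, so the server transmits
# far fewer values per round than a dense gradient would require.
```

Keeping the unselected mass in `residual` means a consistently small gradient is not lost forever: it accumulates until it becomes one of the largest and is eventually sent.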