
CN-115841134-B - Neural network model optimization method and related equipment


Abstract

The method processes a first neural network model to obtain a second neural network model, the second neural network model comprising an optimized attention layer and at least two preceding network layers. The input of the optimized Query feature transformation module is obtained according to the output features of at least one preceding network layer of the optimized attention layer; the input of the optimized Key feature transformation module is obtained according to the output features of at least one preceding network layer of the optimized attention layer; the input of the optimized Value feature transformation module is obtained according to the output features of at least one preceding network layer of the optimized attention layer; and the input of at least one of the optimized Query feature transformation module, the optimized Key feature transformation module and the optimized Value feature transformation module is obtained according to the output features of at least one non-adjacent preceding network layer of the optimized attention layer. The method enhances the expressive power of the second neural network model.
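For illustration only, the following is a minimal sketch of the connection pattern described above: a self-attention layer whose Query, Key and Value feature transformation modules read the outputs of different preceding network layers, at least one of them non-adjacent. The code is PyTorch-style Python; the class name, layer indices and tensor shapes are our own assumptions and are not taken from the patent.

    # Sketch only: Q/K/V projections reading different preceding layers.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class CrossLayerAttention(nn.Module):
        def __init__(self, dim, q_src=0, k_src=2, v_src=2):
            super().__init__()
            # Indices of the preceding layers feeding each transformation module;
            # with three preceding layers (0, 1, 2), q_src=0 is non-adjacent.
            self.q_src, self.k_src, self.v_src = q_src, k_src, v_src
            self.q_proj = nn.Linear(dim, dim)  # Query feature transformation module
            self.k_proj = nn.Linear(dim, dim)  # Key feature transformation module
            self.v_proj = nn.Linear(dim, dim)  # Value feature transformation module

        def forward(self, layer_outputs):
            # layer_outputs: list of [batch, tokens, dim] tensors, one per preceding layer
            q = self.q_proj(layer_outputs[self.q_src])
            k = self.k_proj(layer_outputs[self.k_src])
            v = self.v_proj(layer_outputs[self.v_src])
            scores = q @ k.transpose(-2, -1) / (q.shape[-1] ** 0.5)
            return F.softmax(scores, dim=-1) @ v

    # Toy usage: three serially connected preceding layers; the attention layer
    # reads layer 0 (non-adjacent) for Q and layer 2 (adjacent) for K and V.
    outs = [torch.randn(1, 4, 8) for _ in range(3)]
    print(CrossLayerAttention(dim=8)(outs).shape)  # torch.Size([1, 4, 8])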

Inventors

  • SUN YUNXIAO
  • ZHOU YUCONG
  • ZHONG ZHAO

Assignees

  • HUAWEI TECHNOLOGIES CO., LTD. (华为技术有限公司)

Dates

Publication Date
2026-05-12
Application Date
2021-09-18

Claims (18)

  1. A neural network model optimization method, characterized by comprising: performing optimization processing on a first neural network model to obtain a second neural network model, wherein a task of the first neural network model comprises at least one of target detection, target classification, image classification, machine translation, speech recognition or text recognition; the second neural network model comprises an optimized attention layer and at least two preceding network layers located before the optimized attention layer, the at least two preceding network layers being connected in series; the optimized attention layer comprises an optimized Query feature transformation module, an optimized Key feature transformation module and an optimized Value feature transformation module; the input of the optimized Query feature transformation module is obtained according to the output features of at least one preceding network layer of the optimized attention layer; the input of the optimized Key feature transformation module is obtained according to the output features of at least one preceding network layer of the optimized attention layer; the input of the optimized Value feature transformation module is obtained according to the output features of at least one preceding network layer of the optimized attention layer; and the input of at least one of the optimized Query feature transformation module, the optimized Key feature transformation module and the optimized Value feature transformation module is obtained according to the output features of at least one non-adjacent preceding network layer of the optimized attention layer.
  2. The method according to claim 1, wherein the first neural network model comprises an attention layer and at least two preceding network layers located before the attention layer, the at least two preceding network layers being connected in series, and wherein the optimizing of the first neural network model to obtain the second neural network model comprises: determining a search space of the first neural network model, wherein elements in the search space comprise the preceding network layers that can be connected with a first Query feature transformation module, a first Key feature transformation module and a first Value feature transformation module in the attention layer; and determining the optimized attention layer by using a search algorithm based on the search space, wherein the search algorithm is used for determining, according to a search condition, a first preceding network layer connected with the optimized Query feature transformation module, a second preceding network layer connected with the optimized Key feature transformation module and a third preceding network layer connected with the optimized Value feature transformation module, and at least one of the first preceding network layer, the second preceding network layer and the third preceding network layer is a non-adjacent preceding network layer of the optimized attention layer.
  3. The method of claim 2, wherein the search algorithm comprises any one of an evolutionary algorithm, a reinforcement learning algorithm, and a network architecture search algorithm.
  4. The method of claim 2 or 3, wherein the elements in the search space further comprise at least one of: an optional activation function of the first neural network model, an optional normalization operation of the first neural network model, an optional feature-map operation type of the first neural network model, an optional number of parallel branches of the first neural network model, an optional number of modules in a search unit, and an optional connection between preceding network layers other than the attention layer.
  5. The method of claim 1, wherein the input of a target feature transformation module is an input feature obtained by weighted summation of the output features of at least two preceding network layers of the optimized attention layer and the weights of those preceding network layers, and wherein the target feature transformation module is any one of the optimized Query feature transformation module, the optimized Key feature transformation module, and the optimized Value feature transformation module.
  6. The method of claim 5, wherein the second neural network model further comprises a first fusion module, a second fusion module and a third fusion module, wherein the output ends of all the preceding network layers of the optimized attention layer are connected with the input end of the first fusion module, the output end of the first fusion module is connected with the input end of the optimized Query feature transformation module, the first fusion module is used for obtaining a first input feature of the optimized Query feature transformation module by weighted summation over the preceding network layers connected with the first fusion module, and the parameters of the first fusion module comprise first weights corresponding to the preceding network layers connected with the first fusion module; the second fusion module is used for obtaining a second input feature of the optimized Key feature transformation module by weighted summation over the preceding network layers connected with the second fusion module, and the parameters of the second fusion module comprise second weights corresponding to the preceding network layers connected with the second fusion module; and the output ends of all the preceding network layers of the optimized attention layer are connected with the input end of the third fusion module, the output end of the third fusion module is connected with the input end of the optimized Value feature transformation module, the third fusion module is used for obtaining a third input feature of the optimized Value feature transformation module by weighted summation over the preceding network layers connected with the third fusion module, and the parameters of the third fusion module comprise third weights corresponding to the preceding network layers connected with the third fusion module.
  7. The method of claim 6, wherein any one of the first fusion module, the second fusion module, and the third fusion module comprises any one of a static weighting module, a multi-layer perceptron module, and an attention module.
  8. A model optimization apparatus, characterized by comprising: a processing module configured to perform optimization processing on a first neural network model to obtain a second neural network model, wherein a task of the first neural network model comprises at least one of target detection, target classification, image classification, machine translation, speech recognition or text recognition; the second neural network model comprises an optimized attention layer and at least two preceding network layers located before the optimized attention layer, the at least two preceding network layers being connected in series, and the optimized attention layer comprises an optimized Query feature transformation module, an optimized Key feature transformation module and an optimized Value feature transformation module; the input of the optimized Query feature transformation module is obtained according to the output features of at least one preceding network layer of the optimized attention layer; the input of the optimized Key feature transformation module is obtained according to the output features of at least one preceding network layer of the optimized attention layer; the input of the optimized Value feature transformation module is obtained according to the output features of at least one preceding network layer of the optimized attention layer; and the input of at least one of the optimized Query feature transformation module, the optimized Key feature transformation module and the optimized Value feature transformation module is obtained according to the output features of at least one non-adjacent preceding network layer of the optimized attention layer.
  9. The apparatus of claim 8, wherein the first neural network model comprises an attention layer and at least two preceding network layers located before the attention layer, the at least two preceding network layers being connected in series, and wherein, to optimize the first neural network model to obtain the second neural network model, the processing module is configured to: determine a search space of the first neural network model, wherein elements in the search space comprise the preceding network layers that can be connected with a first Query feature transformation module, a first Key feature transformation module and a first Value feature transformation module in the attention layer; and determine the optimized attention layer by using a search algorithm based on the search space, wherein the search algorithm is used for determining, according to a search condition, a first preceding network layer connected with the optimized Query feature transformation module, a second preceding network layer connected with the optimized Key feature transformation module and a third preceding network layer connected with the optimized Value feature transformation module, and at least one of the first preceding network layer, the second preceding network layer and the third preceding network layer is a non-adjacent preceding network layer of the optimized attention layer.
  10. The apparatus of claim 9, wherein the search algorithm comprises any one of an evolutionary algorithm, a reinforcement learning algorithm, and a network architecture search algorithm.
  11. The apparatus of claim 9 or 10, wherein the elements in the search space further comprise at least one of: an activation function selectable for the first neural network model, a normalization operation selectable for the first neural network model, a feature-map operation type selectable for the first neural network model, a number of parallel branches selectable for the first neural network model, a selectable number of modules in a search unit, and a selectable connection between preceding network layers other than the attention layer.
  12. The apparatus of claim 8, wherein the input of a target feature transformation module is an input feature obtained by weighted summation of the output features of at least two preceding network layers of the optimized attention layer and the weights of those preceding network layers, and wherein the target feature transformation module is any one of the optimized Query feature transformation module, the optimized Key feature transformation module, and the optimized Value feature transformation module.
  13. The apparatus of claim 12, wherein the second neural network model further comprises a first fusion module, a second fusion module and a third fusion module, wherein the output ends of all the preceding network layers of the optimized attention layer are connected with the input end of the first fusion module, the output end of the first fusion module is connected with the input end of the optimized Query feature transformation module, the first fusion module is used for obtaining a first input feature of the optimized Query feature transformation module by weighted summation over the preceding network layers connected with the first fusion module, and the parameters of the first fusion module comprise first weights corresponding to the preceding network layers connected with the first fusion module; the second fusion module is used for obtaining a second input feature of the optimized Key feature transformation module by weighted summation over the preceding network layers connected with the second fusion module, and the parameters of the second fusion module comprise second weights corresponding to the preceding network layers connected with the second fusion module; and the output ends of all the preceding network layers of the optimized attention layer are connected with the input end of the third fusion module, the output end of the third fusion module is connected with the input end of the optimized Value feature transformation module, the third fusion module is used for obtaining a third input feature of the optimized Value feature transformation module by weighted summation over the preceding network layers connected with the third fusion module, and the parameters of the third fusion module comprise third weights corresponding to the preceding network layers connected with the third fusion module.
  14. The apparatus of claim 13, wherein any one of the first fusion module, the second fusion module, and the third fusion module comprises any one of a static weighting module, a multi-layer perceptron module, and an attention module.
  15. A model optimization device comprising a processor and a memory, wherein the processor is coupled to the memory, the memory is configured to store program code, and the processor is configured to invoke the program code to perform the neural network model optimization method of any one of claims 1 to 7.
  16. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program which, when executed by a processor, implements the neural network model optimization method of any one of claims 1 to 7.
  17. A computer program product comprising instructions which, when run on a computer, cause the computer to perform the neural network model optimization method of any one of claims 1 to 7.
  18. A terminal device on which the second neural network model obtained according to any one of claims 1 to 7 is run.
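To illustrate the search described in claims 2 to 4 and 9 to 11, the following sketch encodes a candidate as the indices of the preceding network layers connected to the Query, Key and Value feature transformation modules and refines candidates with a simple evolutionary loop. It is a simplification under our own assumptions: the evaluate() function is a placeholder for training and validating the model built from a candidate, and the layer count is arbitrary.

    # Sketch only: evolutionary search over which preceding layer feeds Q, K and V.
    import random

    NUM_PRECEDING_LAYERS = 4                 # layers 0..3 precede the attention layer
    ADJACENT_LAYER = NUM_PRECEDING_LAYERS - 1

    def evaluate(candidate):
        # Placeholder fitness; a real search would build the model from `candidate`,
        # train or fine-tune it, and return a validation metric.
        return random.random()

    def mutate(candidate):
        new = list(candidate)
        new[random.randrange(3)] = random.randrange(NUM_PRECEDING_LAYERS)
        return tuple(new)

    def evolutionary_search(generations=50, population=8):
        pop = [tuple(random.randrange(NUM_PRECEDING_LAYERS) for _ in range(3))
               for _ in range(population)]
        for _ in range(generations):
            pop.sort(key=evaluate, reverse=True)      # keep the fitter half as parents
            parents = pop[: population // 2]
            pop = parents + [mutate(random.choice(parents))
                             for _ in range(population - len(parents))]
        # Search condition from claim 2: at least one of the three connected layers
        # must be a non-adjacent preceding layer of the attention layer.
        valid = [c for c in pop if any(src != ADJACENT_LAYER for src in c)]
        return max(valid or pop, key=evaluate)

    print("(q_src, k_src, v_src) =", evolutionary_search())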
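Similarly, the weighted summation described in claims 5 to 7 and 12 to 14 can be illustrated with a "static weighting" fusion module that holds one learnable weight per preceding network layer and feeds the weighted sum to its Query, Key or Value feature transformation module. The sketch below is our own; the softmax normalization of the weights is an implementation choice, not something required by the claims.

    # Sketch only: a static-weighting fusion module producing the Q/K/V inputs.
    import torch
    import torch.nn as nn

    class StaticWeightedFusion(nn.Module):
        def __init__(self, num_preceding_layers):
            super().__init__()
            # One scalar weight per preceding layer (the fusion module's parameters).
            self.weights = nn.Parameter(torch.ones(num_preceding_layers))

        def forward(self, layer_outputs):
            # layer_outputs: list of [batch, tokens, dim] tensors from the preceding layers
            w = torch.softmax(self.weights, dim=0)    # normalization is our choice
            return sum(w[i] * x for i, x in enumerate(layer_outputs))

    # One fusion module per feature transformation module, as in claim 6.
    outs = [torch.randn(1, 4, 8) for _ in range(3)]
    fuse_q, fuse_k, fuse_v = (StaticWeightedFusion(3) for _ in range(3))
    q_in, k_in, v_in = fuse_q(outs), fuse_k(outs), fuse_v(outs)
    print(q_in.shape)  # torch.Size([1, 4, 8])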

Description

Neural network model optimization method and related equipment

Technical Field

The embodiments of this application relate to the field of artificial intelligence, and in particular to a neural network model optimization method and related equipment.

Background

Neural network models can perform tasks such as target detection, target classification, machine translation and speech recognition, and are therefore widely used in fields such as security, transportation and industrial production. The Transformer network model is a deep neural network model that contains no convolution layers and is composed entirely of Self-Attention layers, Encoder-Attention layers and Feed-Forward layers. Thanks to the ability of the Self-Attention operation to extract features over a global receptive field, the Transformer network model is widely applied in computer vision, natural language processing and other areas. The structure of the standard Transformer is shown in FIG. 1a; FIG. 1a and FIG. 1b are schematic structural diagrams of the standard Transformer network model, which consists of 6 encoders (Encoder) and 6 decoders (Decoder). The Feed-Forward layer (F in FIG. 1a) consists of fully connected operations responsible for extracting correlations of the data across the channel dimension, while the Self-Attention layer (S in FIG. 1a) consists mainly of Self-Attention operations that extract features along another dimension of the data (patch or token). The Encoder-Attention layer performs an attention operation between the output features of the encoder and the intermediate features of the decoder. Furthermore, the inputs of the Query feature transformation module, the Key feature transformation module and the Value feature transformation module of every Self-Attention layer in the Transformer network model are identical, namely the output of the previous layer. Referring to FIG. 1b, taking the sixth layer of the model as an example of a Self-Attention layer, the inputs of the Query feature transformation module (Q in FIG. 1b), the Key feature transformation module (K in FIG. 1b) and the Value feature transformation module (V in FIG. 1b) are all output features of the fifth layer of the model. The overall form of the Transformer network is therefore a straight pipeline, and because this input connection pattern is fixed by the network design, the Transformer network cannot be guaranteed to perform well on all tasks.

Disclosure of Invention

This application provides a neural network model optimization method and related equipment, which can optimize a neural network model, improve the expressive power of the model and improve the performance of the model.
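For contrast with the optimized connection pattern described above, the fixed pattern of the standard Transformer background, in which the Query, Key and Value feature transformation modules of a Self-Attention layer all read the same previous-layer output, can be sketched as follows (illustrative Python; function and variable names are our own and not taken from the patent):

    # Sketch only: standard self-attention where Q, K and V share one input.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    def standard_self_attention(prev_layer_out, q_proj, k_proj, v_proj):
        # prev_layer_out: [batch, tokens, dim]; all three projections share this input.
        q, k, v = q_proj(prev_layer_out), k_proj(prev_layer_out), v_proj(prev_layer_out)
        scores = q @ k.transpose(-2, -1) / (q.shape[-1] ** 0.5)
        return F.softmax(scores, dim=-1) @ v

    dim = 8
    x5 = torch.randn(1, 4, dim)   # e.g. output features of the fifth layer (as in FIG. 1b)
    y = standard_self_attention(x5, nn.Linear(dim, dim), nn.Linear(dim, dim), nn.Linear(dim, dim))
    print(y.shape)  # torch.Size([1, 4, 8])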
In a first aspect, a neural network model optimization method is provided, comprising: performing optimization processing on a first neural network model to obtain a second neural network model; wherein the second neural network model comprises an optimized attention layer and at least two preceding network layers located before the optimized attention layer, the at least two preceding network layers being connected in series; the optimized attention layer comprises an optimized Query feature transformation module, an optimized Key feature transformation module and an optimized Value feature transformation module; the input of the optimized Query feature transformation module is obtained according to the output features of at least one preceding network layer of the optimized attention layer; the input of the optimized Key feature transformation module is obtained according to the output features of at least one preceding network layer of the optimized attention layer; the input of the optimized Value feature transformation module is obtained according to the output features of at least one preceding network layer of the optimized attention layer; and the input of at least one of the optimized Query feature transformation module, the optimized Key feature transformation module and the optimized Value feature transformation module is obtained according to the output features of at least one non-adjacent preceding network layer of the optimized attention layer. According to the optimization method in the embodiments of this application, the first neural network model is optimized to obtain the second neural network model, wherein the second neural network model comprises an optimized attention layer and at least two preceding net