
CN-115965068-B - Neural network architecture optimization method and device, computer equipment and storage medium

CN 115965068 B

Abstract

The embodiments of the present application disclose a neural network architecture optimization method and device, computer equipment, and a storage medium, belonging to the field of machine learning. The method comprises: determining a sub-network, wherein the sub-network is a differentiable network in a neural network architecture, the sub-network is composed of at least two nodes and edges connecting the nodes, and the edges connecting the nodes are used to represent basic operations in the neural network; determining a network loss of the sub-network based on a first training data set; minimizing the network loss by optimizing network parameters, wherein the network parameters comprise architecture parameters that characterize the weights of the basic operations and model parameters that characterize the operation modes of the basic operations; determining target network parameters from the optimized network parameters based on the change trend and/or the optimization result of the network parameters during optimization; and generating the optimized sub-network based on the target network parameters. The scheme of the embodiments of the present application can avoid extreme selection of network parameters during optimization of the network parameters.

Inventors

  • ZHAO JUANPING

Assignees

  • OPPO广东移动通信有限公司 (OPPO Guangdong Mobile Telecommunications Co., Ltd.)

Dates

Publication Date
2026-05-05
Application Date
2021-10-12

Claims (9)

  1. A method for optimizing a neural network architecture, the method comprising: determining a sub-network, wherein the sub-network is a differentiable network in the neural network architecture, the sub-network is composed of at least two nodes and edges connecting the nodes, the edges connecting the nodes are used to represent basic operations in the neural network, and the sub-network is suitable for image classification; determining a network parameter range based on a network parameter and a range size, wherein the network parameter range is a parameter range centered on the network parameter; determining candidate network parameters within the network parameter range based on a parameter selection step length, wherein the parameter selection step length is adjusted along with the network loss during optimization of the neural network architecture; for each candidate network parameter in the network parameter range, inputting training data from a first training data set into the sub-network adopting the candidate network parameter to obtain a network output of the sub-network, wherein the first training data set is an image data set and the true value corresponding to the training data in the first training data set is an image classification label; determining, based on the network output, a candidate network loss corresponding to the candidate network parameter; determining the sum of the candidate network losses corresponding to the candidate network parameters as the network loss of the sub-network over the network parameter range; minimizing the network loss by optimizing network parameters, the network parameters including architecture parameters for characterizing weights of the basic operations and model parameters for characterizing operation modes of the basic operations; determining the architecture parameters that exhibit an ascending trend during optimization as target architecture parameters, wherein the change trend during optimization is the change of an architecture parameter over optimization time; determining, among the optimized model parameters, the model parameters corresponding to the target architecture parameters as target model parameters; and generating the optimized sub-network based on target network parameters, wherein the target network parameters comprise the target architecture parameters and the target model parameters.
  2. The method of claim 1, wherein said minimizing said network loss by optimizing network parameters comprises: optimizing the network parameters through a first-order optimization algorithm to minimize the network loss and obtain the optimized network parameters.
  3. The method of claim 1, wherein after said minimizing said network loss by optimizing network parameters, the method further comprises: constructing a target network model based on the optimized sub-network, wherein the target network model is obtained by stacking the optimized sub-networks; and performing model training on the target network model based on the first training data set to obtain a trained target network model.
  4. The method according to claim 3, wherein the target network model is constructed from optimized up-sampling and down-sampling sub-networks using a U-shaped neural network architecture.
  5. The method of claim 4, wherein after the model training of the target network model based on the first training data set to obtain the trained target network model, the method further comprises: performing performance verification on the trained target network model based on a second training data set to obtain a performance verification result, wherein the second training data set is different from the first training data set, and the performance verification result comprises an accuracy verification result and a generalization verification result.
  6. An apparatus for optimizing a neural network architecture, the apparatus comprising: a first construction module, configured to determine a sub-network, wherein the sub-network is a differentiable network in a neural network architecture, the sub-network is composed of at least two nodes and edges connecting the nodes, the edges connecting the nodes are used to characterize basic operations in the neural network, and the sub-network is suitable for image classification; a first determining module, configured to determine a network parameter range based on a network parameter and a range size, wherein the network parameter range is a parameter range centered on the network parameter; determine candidate network parameters within the network parameter range based on a parameter selection step length, wherein the parameter selection step length is adjusted along with the network loss during optimization of the neural network architecture; for each candidate network parameter in the network parameter range, input training data from a first training data set into the sub-network adopting the candidate network parameter to obtain a network output of the sub-network, wherein the first training data set is an image data set and the true value corresponding to the training data in the first training data set is an image classification label; determine, based on the network output, a candidate network loss corresponding to the candidate network parameter; and determine the sum of the candidate network losses corresponding to the candidate network parameters as the network loss of the sub-network over the network parameter range; an optimization module, configured to minimize the network loss by optimizing network parameters, the network parameters including architecture parameters for characterizing weights of the basic operations and model parameters for characterizing operation modes of the basic operations; a second determining module, configured to determine the first k architecture parameters, in descending order of the optimized architecture parameters, as target architecture parameters, wherein k is an integer greater than 1; determine the architecture parameters that exhibit an ascending trend during optimization as target architecture parameters, wherein the change trend during optimization is the change of an architecture parameter over optimization time; and determine, among the optimized model parameters, the model parameters corresponding to the target architecture parameters as target model parameters; and a generation module, configured to generate the optimized sub-network based on target network parameters, wherein the target network parameters comprise the target architecture parameters and the target model parameters.
  7. A computer device, comprising a processor and a memory, wherein the memory stores at least one instruction, at least one program, a code set, or an instruction set that is loaded and executed by the processor to implement the method for optimizing a neural network architecture according to any one of claims 1 to 5.
  8. A computer-readable storage medium having stored therein at least one instruction, at least one program, a code set, or an instruction set, wherein the at least one instruction, the at least one program, the code set, or the instruction set is loaded and executed by a processor to implement the method for optimizing a neural network architecture according to any one of claims 1 to 5.
  9. A computer program product, comprising computer instructions that are executed by a processor to implement the method for optimizing a neural network architecture according to any one of claims 1 to 5.
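As a rough illustration of the range-and-step scheme recited in claims 1 and 6, the Python sketch below enumerates candidate network parameters in a range centered on the current parameter and sums their candidate losses into the network loss over that range. All names and the toy quadratic loss are illustrative assumptions, not from the patent, and the loss-driven adaptation of the step length is not modeled here.

```python
import numpy as np

def range_loss(loss_fn, param, range_size, step):
    """Sum the candidate losses over a parameter range centered on `param`.

    Sketch of the claimed range-and-step scheme; names are illustrative.
    """
    offsets = np.arange(-range_size, range_size + 1e-9, step)
    candidates = [param + d for d in offsets]      # candidate network parameters
    total = sum(loss_fn(c) for c in candidates)    # network loss over the range
    return total, candidates

# Toy quadratic loss standing in for the sub-network's classification loss.
loss = lambda w: float((w - 2.0) ** 2)
total, cands = range_loss(loss, param=1.0, range_size=0.5, step=0.5)
# candidates 0.5, 1.0, 1.5 -> losses 2.25 + 1.0 + 0.25 = 3.5
```

Because the loss is aggregated over the whole range rather than evaluated at a single point, no one candidate dominates the update, which is consistent with the application's goal of avoiding extreme parameter selection.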

Description

Neural network architecture optimization method and device, computer equipment and storage medium

Technical Field

The embodiments of the present application relate to the field of machine learning, and in particular to a neural network architecture optimization method and device, computer equipment, and a storage medium.

Background

Neural Architecture Search (NAS) is a technique for automatically designing neural networks: given a sample set, an algorithm can automatically design a high-performance network structure. Differentiable Architecture Search (DARTS), one of the NAS methods, improves the optimization efficiency of the neural network architecture by constructing a continuously relaxed search space, and is widely used in the field of machine learning. In the related art, DARTS builds a neural network model by searching for one sub-network and then connecting multiple such sub-networks. Specifically, a search space is built from nodes and directed edges connecting the nodes, and the search space is relaxed by placing a mixture of the candidate operations that may exist on each directed edge. The architecture parameters and the model parameters are jointly optimized by a bilevel optimization method, and the operation corresponding to the maximum architecture parameter is selected to generate the final sub-network. The related technical scheme has the following problem: after the architecture parameters and the model parameters are jointly optimized by the bilevel optimization method, only the operation corresponding to the maximum architecture parameter is kept and the other operations are discarded. This easily leads to an extreme selection of network parameters and affects the accuracy and generalization performance of the neural network model.
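The continuous relaxation and argmax discretization described above can be sketched as follows. The candidate operations, their names, and the toy input are illustrative assumptions; the final line shows the discretization step whose "extreme selection" the application seeks to avoid.

```python
import numpy as np

def softmax(a):
    e = np.exp(a - np.max(a))
    return e / e.sum()

# Candidate operations on one directed edge (names are illustrative).
ops = {
    "identity": lambda x: x,
    "scale":    lambda x: 0.5 * x,
    "zero":     lambda x: np.zeros_like(x),
}

def mixed_op(x, alpha):
    """Edge output = softmax(alpha)-weighted sum of all candidate operations."""
    w = softmax(alpha)
    return sum(wi * op(x) for wi, op in zip(w, ops.values()))

x = np.ones(4)
alpha = np.array([2.0, 0.0, -2.0])   # architecture parameters for this edge
y = mixed_op(x, alpha)

# Vanilla DARTS discretization: keep only the argmax operation and drop the
# rest -- the "extreme selection" this application aims to avoid.
best = list(ops)[int(np.argmax(alpha))]
```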
Disclosure of Invention

The embodiments of the present application provide a neural network architecture optimization method and device, computer equipment, and a storage medium, which can avoid extreme network parameter selection during optimization of the neural network architecture and improve the accuracy and generalization performance of the neural network model. The technical scheme is as follows.

In one aspect, an embodiment of the present application provides a method for optimizing a neural network architecture, the method comprising: determining a sub-network, wherein the sub-network is a differentiable network in the neural network architecture, is composed of at least two nodes and edges connecting the nodes, and the edges connecting the nodes are used to represent basic operations in the neural network; determining a network loss of the sub-network based on a first training data set; minimizing the network loss by optimizing network parameters, the network parameters including architecture parameters for characterizing weights of the basic operations and model parameters for characterizing operation modes of the basic operations; determining target network parameters from the optimized network parameters based on the change trend of the network parameters during optimization and/or the optimization result; and generating the optimized sub-network based on the target network parameters.
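A minimal sketch of the trend-based selection described above, assuming a simple endpoint comparison stands in for the patent's trend test; the operation names and trajectories are invented for illustration.

```python
def ascending_trend(history):
    """True if the architecture parameter rose over optimization time.

    A simple endpoint comparison standing in for the claimed trend test.
    """
    return history[-1] > history[0]

# Per-operation architecture-parameter trajectories (invented for illustration).
trajectories = {
    "conv3x3": [0.10, 0.25, 0.40],   # ascending  -> kept as target parameter
    "skip":    [0.50, 0.30, 0.20],   # descending -> dropped
    "pool":    [0.20, 0.20, 0.35],   # ascending  -> kept
}
targets = [op for op, hist in trajectories.items() if ascending_trend(hist)]
# -> ["conv3x3", "pool"]
```

Selecting by trend rather than by a single final argmax keeps every operation whose weight is still growing, rather than committing to one winner.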
In another aspect, an embodiment of the present application provides an apparatus for optimizing a neural network architecture, the apparatus comprising: a first construction module, configured to determine a sub-network, wherein the sub-network is a differentiable network in a neural network architecture, the sub-network is composed of at least two nodes and edges connecting the nodes, and the edges connecting the nodes are used to characterize basic operations in the neural network; a first determining module, configured to determine a network loss of the sub-network based on a first training data set; an optimization module, configured to minimize the network loss by optimizing network parameters, the network parameters including architecture parameters for characterizing weights of the basic operations and model parameters for characterizing operation modes of the basic operations; a second determining module, configured to determine target network parameters from the optimized network parameters based on the change trend of the network parameters during optimization and/or the optimization result; and a generation module, configured to generate the optimized sub-network based on the target network parameters.

In another aspect, an embodiment of the present application provides a computer device, where the computer device comprises a processor and a memory, the memory stores at least one instruction, at least one program, a code set, or an instruction set, and the at least one instruction, the at least one program, the code set, or the instruction set is loaded and executed by the processor to implement the method for optimizing a neural network architecture as described in the above aspect.
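Claim 3 obtains the target network model by stacking the optimized sub-network. A minimal sketch of that stacking, with a stand-in cell function and illustrative per-stage weights (both assumptions, not from the patent):

```python
import numpy as np

def cell(x, weight):
    """Stand-in for one optimized sub-network (a searched cell)."""
    return np.tanh(weight * x)

def build_target_model(x, stage_weights):
    """Target network model obtained by stacking the optimized cell in
    sequence, as described in claim 3. Per-stage weights are illustrative."""
    for w in stage_weights:
        x = cell(x, w)
    return x

out = build_target_model(np.ones(3), stage_weights=[1.0, 0.5, 2.0])
```

In a U-shaped variant as in claim 4, the stacked stages would alternate down-sampling and up-sampling cells instead of repeating a single cell.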