CN-116468073-B - Differential neural network architecture searching method based on zero-order approximation

CN116468073B

Abstract

The invention provides a differentiable neural network architecture search method based on zero-order approximation, characterized by comprising the following steps: S1, constructing an initial neural network comprising neural cells and downsampling blocks; S2–S6, constructing a structural parameter matrix, an oscillation structural parameter matrix, a model parameter matrix, and an oscillation model parameter matrix; S7, updating the model parameter matrix and the oscillation model parameter matrix; S8, inputting a validation data set into the initial neural network and obtaining an updated structural parameter matrix through a second loss function and the gradient descent method; S9, judging whether the iteration is complete; and S10, for each connecting edge, selecting the neural network operation corresponding to the maximum value among that edge's structural parameters in the updated structural parameter matrix as the neural network operation of that edge, thereby obtaining a complex neural network. In summary, the method improves both the efficiency of searching for the complex neural network and the accuracy of the resulting network.

Inventors

  • XIE LUNCHEN
  • HUANG KAIYU
  • LIANG XIN
  • SHI QINGJIANG

Assignees

  • TONGJI UNIVERSITY (同济大学)

Dates

Publication Date
2026-05-05
Application Date
2023-04-26

Claims (7)

  1. A differentiable neural network architecture search method based on zero-order approximation, which uses a training data set, a validation data set, and a neural network operations to obtain a corresponding complex neural network for solving an image recognition problem, wherein a plurality of pieces of picture data are randomly selected from a data set as the training data set, and the same number of pieces of picture data are randomly selected from the remaining pictures in the data set as the validation data set, the method comprising the following steps:
     Step S1, constructing an initial neural network comprising c neural cells and d downsampling blocks, each neural cell comprising k connecting edges;
     Step S2, selecting b neural network operations from the a neural network operations as a neural network operation set;
     Step S3, associating each of the k connecting edges of the neural cells with each operation in the neural network operation set to obtain the corresponding k·b structural parameters, constructing a structural parameter matrix α from the k·b structural parameters, and setting the iteration round p to 1;
     Step S4, setting a minimum value μ and a random unit vector u according to the structural parameter matrix α;
     Step S5, obtaining an oscillation structural parameter matrix α̂ from the structural parameter matrix α, the minimum value μ, and the random unit vector u;
     Step S6, setting all weight parameter matrices of the initial neural network as a model parameter matrix w, and setting an oscillation model parameter matrix ŵ according to the model parameter matrix w;
     Step S7, obtaining an updated model parameter matrix w′ and an updated oscillation model parameter matrix ŵ′ from the training data set, the structural parameter matrix α, and the oscillation structural parameter matrix α̂ through a first loss function and the gradient descent method;
     Step S8, inputting the validation data set into the initial neural network, and obtaining an updated structural parameter matrix α′ through a second loss function and the gradient descent method;
     Step S9, adding 1 to the iteration round p, and judging whether the iteration round p is less than the maximum iteration round q; if so, taking the updated structural parameter matrix α′ as the structural parameter matrix α and returning to step S4; if not, proceeding to step S10;
     Step S10, for each connecting edge, selecting the neural network operation corresponding to the maximum value among the b structural parameters of that edge in the updated structural parameter matrix α′ as the neural network operation of that edge, thereby obtaining the complex neural network;
     wherein, in step S4, the random unit vector is u = v/‖v‖₂, where v is a random vector with k·b elements, each element independently following a normal distribution; and
     wherein, in step S5, the oscillation structural parameter matrix is α̂ = α + μU, where U is the matrix obtained by aligning the random unit vector u with the dimensions of the structural parameter matrix α.
  2. The differentiable neural network architecture search method based on zero-order approximation according to claim 1, wherein the neural network operation is any operation that maintains the data dimension.
  3. The differentiable neural network architecture search method based on zero-order approximation according to claim 2, wherein the neural network operations comprise a zero operation, a skip connection operation, convolution operations, and an average pooling operation, the zero operation representing multiplication of the data by 0.
  4. The differentiable neural network architecture search method based on zero-order approximation according to claim 1, wherein the first loss function and the second loss function are both cross-entropy functions.
  5. The differentiable neural network architecture search method based on zero-order approximation according to claim 1, wherein step S7 comprises the following substeps:
     Step S7-1, setting an inner iteration round to its initial value;
     Step S7-2, inputting the training data set into the initial neural network and, combining the structural parameter matrix α and the oscillation structural parameter matrix α̂, updating the model parameter matrix w and the oscillation model parameter matrix ŵ according to the first loss function and the learning rate;
     Step S7-3, adding 1 to the inner iteration round, and judging whether it is less than the maximum inner iteration round; if so, returning to step S7-2; otherwise, taking the update result of the model parameter matrix w at the maximum inner iteration round as the updated model parameter matrix w′, and the update result of the oscillation model parameter matrix ŵ as the updated oscillation model parameter matrix ŵ′.
  6. The differentiable neural network architecture search method based on zero-order approximation according to claim 5, wherein in step S7-2, the first loss function consists of two losses: one computed on the training data set from the model parameter matrix w and the structural parameter matrix α, and one computed on the training data set from the oscillation model parameter matrix ŵ and the oscillation structural parameter matrix α̂; the updated model parameter matrix w′ is obtained by the gradient descent method from the learning rate and the gradient of the first of these losses with respect to the model parameter matrix w, and serves as the model parameter matrix for the next inner iteration round; and the updated oscillation model parameter matrix ŵ′ is obtained by the gradient descent method from the learning rate and the gradient of the second of these losses with respect to the oscillation model parameter matrix ŵ, and serves as the oscillation model parameter matrix for the next inner iteration round.
  7. The differentiable neural network architecture search method based on zero-order approximation according to claim 1, wherein in step S8, the second loss function is a validation loss computed on the validation data set from the model parameter matrix and the structural parameter matrix; the approximate gradient of that loss with respect to the structural parameter matrix α is the sum of two terms: one term in which the updated model parameter matrix w′ is treated in the gradient computation as depending on the structural parameter matrix α, and one term in which w′ is treated as a constant independent of α, the superscript T denoting the transpose operation; and the updated structural parameter matrix α′ is obtained by the gradient descent method from a set learning rate and this approximate gradient.
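The concrete formulas of claims 6 and 7 were lost in extraction; the following is a hedged reconstruction in standard notation, using η for the learning rates and L_train, L_val for the first and second loss functions (these symbol names are assumptions, and the zero-order term is written in the standard finite-difference form such methods use, not quoted verbatim from the patent):

```latex
% Claim 6: parallel gradient-descent updates of the two parameter copies
w' = w - \eta \,\nabla_{w} L_{\mathrm{train}}(w, \alpha),
\qquad
\hat{w}' = \hat{w} - \eta \,\nabla_{\hat{w}} L_{\mathrm{train}}(\hat{w}, \hat{\alpha}),
\qquad
\hat{\alpha} = \alpha + \mu U .

% Claim 7: zero-order approximation of the architecture gradient
\tilde{\nabla}_{\alpha} L_{\mathrm{val}}
 = \underbrace{\nabla_{\alpha} L_{\mathrm{val}}(w', \alpha)}_{w' \text{ treated as constant}}
 + \underbrace{u \left( \tfrac{\hat{w}' - w'}{\mu} \right)^{\top}
   \nabla_{w'} L_{\mathrm{val}}(w', \alpha)}_{w' \text{ viewed as a function of } \alpha},
\qquad
\alpha' = \alpha - \eta_{\alpha}\, \tilde{\nabla}_{\alpha} L_{\mathrm{val}} .
```

Here (ŵ′ − w′)/μ is a finite-difference estimate of the directional derivative of the trained weights along u, which replaces the second-order term that a DARTS-style bilevel gradient would otherwise require.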
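Claims 1 and 5 together describe an outer architecture-search loop wrapped around an inner weight-training loop. A minimal runnable sketch of that structure, on a toy quadratic objective in place of a real network (the toy losses, dimensions, and hyperparameter values are illustrative assumptions, not the patent's implementation):

```python
import numpy as np

rng = np.random.default_rng(0)
k_times_b = 6                      # k connecting edges x b candidate operations
alpha = np.zeros(k_times_b)        # structural parameter matrix (flattened)
w = rng.normal(size=k_times_b)     # model parameter matrix (toy stand-in)
mu, lr, q = 1e-3, 0.1, 50          # minimum value, learning rate, max outer rounds

def val_grad_w(w):
    # Gradient of a toy validation loss ||w - 1||^2 with respect to w.
    return 2.0 * (w - 1.0)

for p in range(1, q + 1):
    # S4: random unit vector with i.i.d. normally distributed elements
    v = rng.normal(size=k_times_b)
    u = v / np.linalg.norm(v)
    # S5/S6: oscillation copies of the structural and model parameters
    alpha_hat = alpha + mu * u
    w_hat = w.copy()
    # S7 (claim 5 inner loop): descend a toy training loss ||w - alpha||^2,
    # updating w against alpha and w_hat against alpha_hat in parallel
    for _ in range(5):
        w = w - lr * 2.0 * (w - alpha)
        w_hat = w_hat - lr * 2.0 * (w_hat - alpha_hat)
    # S8: zero-order estimate of the architecture gradient (see claim 7);
    # (w_hat - w)/mu approximates the directional derivative of the
    # trained weights along u
    indirect = u * np.dot((w_hat - w) / mu, val_grad_w(w))
    alpha = alpha - lr * indirect
# S10: discretize by taking the largest structural parameter per edge
best = int(np.argmax(alpha))
```

Because the toy validation loss depends on α only through the trained weights, only the indirect (zero-order) term of the approximate gradient appears here; with a real network, the direct term ∇_α L_val would be added as well.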

Description

Differential neural network architecture searching method based on zero-order approximation

Technical Field

The invention belongs to the technical field of artificial neural networks, and particularly relates to a differentiable neural network architecture search method based on zero-order approximation.

Background

Deep learning, as an end-to-end artificial intelligence technique, allows the feature extraction process to be carried out entirely by the neural network model, without feature engineering or expert experience. Methods in this field have achieved outstanding results on tasks such as image and speech processing, but designing an efficient and accurate neural network model requires a great deal of expertise and manual parameter tuning, which greatly hinders the application of deep learning to many practical problems. To address this problem, the field of artificial intelligence began to consider neural architecture search (Neural Architecture Search, NAS) approaches. The DARTS algorithm is a representative differentiable search strategy in NAS: it makes the search space continuous by assigning continuous weights to the candidate operations and mixing them with those weights during search, yielding a differentiable bilevel optimization problem that can be solved by gradient descent. After optimization, DARTS selects the operation with the largest weight from each mixed operation, thereby determining a high-performance neural network architecture with a complex topology within a rich search space. However, the existing DARTS automatic architecture search algorithm is not accurate enough in solving this optimization problem, incurs a large time cost, and still suffers from low efficiency and low accuracy.
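The continuous relaxation at the heart of DARTS, described above, can be illustrated with a short sketch (a hedged toy example in NumPy; the candidate operations and shapes are illustrative, and the third operation merely stands in for a parameterized operation such as a convolution). Each candidate preserves the data dimension, so the outputs can be mixed elementwise:

```python
import numpy as np

# Toy candidate operations on one edge; each preserves the data dimension.
ops = [
    lambda x: np.zeros_like(x),   # zero operation: multiply the data by 0
    lambda x: x,                  # skip (jump) connection
    lambda x: np.tanh(x),         # stand-in for a parameterized operation
]

def mixed_op(x, alpha_edge):
    """DARTS-style mixed operation: a softmax-weighted sum of all
    candidate operations applied to the same input."""
    weights = np.exp(alpha_edge) / np.sum(np.exp(alpha_edge))
    return sum(wt * op(x) for wt, op in zip(weights, ops))

x = np.ones(4)
alpha_edge = np.array([0.1, 2.0, -1.0])  # continuous architecture weights
y = mixed_op(x, alpha_edge)
# After search, the discrete choice is the op with the largest weight:
chosen = int(np.argmax(alpha_edge))      # here index 1: the skip connection
```

The softmax keeps the mixture differentiable in the architecture weights, which is what allows the search space to be optimized by gradient descent before the final discrete selection.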
Disclosure of Invention

The present invention has been made to solve the above-mentioned problems, and an object of the present invention is to provide a differentiable neural network architecture search method based on zero-order approximation.

The invention provides a differentiable neural network architecture search method based on zero-order approximation, which obtains a corresponding complex neural network from a training data set, a validation data set, and a neural network operations, and is characterized by comprising the following steps. Step S1, constructing an initial neural network comprising c neural cells and d downsampling blocks, the neural cells comprising k connecting edges. Step S2, selecting b neural network operations from the a neural network operations as a neural network operation set. Step S3, associating the k connecting edges of the neural cells with the operations in the neural network operation set to obtain the corresponding k·b structural parameters, constructing a structural parameter matrix α from the k·b structural parameters, and setting the iteration round p to 1. Step S4, setting a minimum value μ and a random unit vector u according to the structural parameter matrix α. Step S5, obtaining an oscillation structural parameter matrix α̂ from the structural parameter matrix α, the minimum value μ, and the random unit vector u. Step S6, setting all weight parameter matrices of the initial neural network as a model parameter matrix w, and setting an oscillation model parameter matrix ŵ according to the model parameter matrix w. Step S7, obtaining an updated model parameter matrix w′ and an updated oscillation model parameter matrix ŵ′ from the training data set, the structural parameter matrix α, and the oscillation structural parameter matrix α̂ through a first loss function and the gradient descent method. Step S8, inputting the validation data set into the initial neural network, and obtaining an updated structural parameter matrix α′ through a second loss function and the gradient descent method. Step S9, adding 1 to the iteration round p, and judging whether the iteration round p is less than the maximum iteration round q; if so, taking the updated structural parameter matrix α′ as the structural parameter matrix α and entering step S4; if not, entering step S10. Step S10, for each connecting edge, selecting the neural network operation corresponding to the maximum value among the b structural parameters of that edge in the updated structural parameter matrix α′ as the neural network operation of that edge, thereby obtaining the complex neural network.

The differentiable neural network architecture search method based on zero-order approximation provided by the invention may further be characterized in that the neural network operation is any operation that maintains the data dimension. The differentiable neural network architecture search method based on zero-order approximation provided by the invention may further be characterized in that the neural network operation