CN-112561039-B - Improved evolutionary neural network architecture searching method based on super network
Abstract
The invention relates to an improved evolutionary neural network architecture search method based on a super network. The method comprises the following steps: S1, taking an input layer as the first layer and encapsulating five computing modules; S2, binarizing the connections between the computing nodes in the neural network; S3, learning a structural weight for each computing node; S4, constructing a parent population P by binary tournament selection; S5, forming an offspring population Q; S6, performing a mutation operation on the individuals in the offspring population Q; and S8, merging the parent population P and the offspring population Q into a population R, selecting a number of individuals as the original population of the next generation by environmental selection, and returning to step S4 until a preset maximum number of generations is reached. After evolution ends, the individual with the highest fitness value is output as the optimal neural network architecture.
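The generational loop summarized above (steps S4 to S8) can be sketched as follows. This is a minimal sketch under stated assumptions, not the patented implementation: `evolve` and its callable parameters are hypothetical names, super-network training (S3) is assumed to happen inside `evaluate`, and the demo uses a toy bit-counting fitness as a stand-in for validation accuracy.

```python
import random
from typing import Callable, List, Sequence, Tuple

Chromosome = List[int]


def evolve(
    population: List[Chromosome],
    evaluate: Callable[[Chromosome], float],
    select_parents: Callable[[Sequence[float]], List[int]],
    vary: Callable[[List[Chromosome]], List[Chromosome]],
    max_generations: int,
) -> Tuple[Chromosome, float]:
    """Hypothetical generational loop following steps S4 to S8.

    evaluate       -- fitness of one chromosome (e.g. validation accuracy G / H)
    select_parents -- returns indices of the parent population P (S4)
    vary           -- crossover and mutation producing offspring Q (S5, S6)
    """
    size = len(population)
    fitness = [evaluate(c) for c in population]
    for _ in range(max_generations):
        idx = select_parents(fitness)                     # S4
        parents = [population[i] for i in idx]
        parent_fit = [fitness[i] for i in idx]
        offspring = vary(parents)                         # S5 + S6
        offspring_fit = [evaluate(c) for c in offspring]  # S7
        # S8: R = P + Q, sorted by fitness; the best `size` individuals survive
        merged = sorted(
            zip(parents + offspring, parent_fit + offspring_fit),
            key=lambda pair: pair[1],
            reverse=True,
        )[:size]
        population = [c for c, _ in merged]
        fitness = [f for _, f in merged]
    return max(zip(population, fitness), key=lambda pair: pair[1])


if __name__ == "__main__":
    rng = random.Random(0)

    def onemax(c: Chromosome) -> float:  # toy stand-in for G / H
        return sum(c) / len(c)

    def pick(fits: Sequence[float]) -> List[int]:  # simple binary tournaments
        return [max(rng.sample(range(len(fits)), 2), key=lambda i: fits[i]) for _ in fits]

    def flip(ps: List[Chromosome]) -> List[Chromosome]:  # bit-flip variation only
        return [[1 - g if rng.random() < 0.1 else g for g in p] for p in ps]

    start = [[rng.randint(0, 1) for _ in range(8)] for _ in range(6)]
    print(evolve(start, onemax, pick, flip, max_generations=20))
```

Passing the operators in as callables keeps the loop independent of how selection, crossover and mutation are implemented.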
Inventors
- JIN YAOCHU
- SHEN XIUPING
Assignees
- 上海悠络客电子科技股份有限公司
Dates
- Publication Date
- 20260421
- Application Date
- 20201226
- Priority Date
- 20201226
Claims (9)
- 1. An improved evolutionary neural network architecture search method based on a super network, characterized by comprising the following steps: Step S1, taking an input layer as the first layer, encapsulating five computing modules, each containing M computing nodes, and finally using a fully connected layer as the output layer of the neural network, where M is a natural number greater than 1; Step S2, encoding the neural network structure in a hybrid coding mode, binarizing the connections between the computing nodes in the neural network, and randomly generating N chromosomes to construct an original population, where the number of computing nodes in any chromosome is smaller than the preset total number of computing nodes of a chromosome and N is a natural number greater than 1; Step S3, uniformly sampling individuals in the population and training them on the training data to generate a structural weight for each computing node, and evaluating the fitness of the individuals using the classification accuracy on the validation set as the fitness function, which specifically includes: Step S31, dividing a preset training data set into B batches according to a given batch size, where B is a natural number greater than N; for each batch, randomly selecting an individual from the parent population P, decoding it into the corresponding neural network and training it on that batch, until the maximum number of training batches B is reached; Step S32, evaluating the fitness value of each individual in the parent population, using the classification accuracy on the images of the validation set as the fitness function: $\text{fitness} = G / H$, where G is the number of images correctly identified by the model and H is the total number of images in the validation set; Step S4, constructing a parent population P by binary tournament selection; Step S5, based on a given crossover rate pc, performing pairwise crossover on the chromosome individuals in the parent population by a hybrid crossover method to obtain a number of new chromosomes that form an offspring population Q; Step S6, based on a given mutation rate pm, performing a mutation operation on the individuals in the offspring population Q by a hybrid mutation method; Step S7, decoding each individual in the offspring population Q into the corresponding neural network, obtaining its structural weights by inheritance or random initialization, and evaluating its fitness using the classification accuracy on the validation set as the fitness function; and Step S8, merging the parent population P and the offspring population Q into a population R, selecting a number of individuals as the original population of the next generation by an environmental selection method, and returning to step S4 until a preset maximum number of generations is reached; after evolution ends, the individual with the highest fitness value is output as the optimal neural network architecture.
- 2. The improved super-network-based evolutionary neural network architecture search method according to claim 1, wherein the input layer is formed by encapsulating, in order, a convolutional layer, a ReLU activation function and a batch normalization layer.
- 3. The improved super-network-based evolutionary neural network architecture search method according to claim 1, wherein in step S1 the computing nodes are the computing units of the neural network and are randomly selected from an operation search space θ; all computing nodes in the first, third and fifth computing modules have a stride of 1, and all computing nodes in the second and fourth computing modules have a stride of 2.
- 4. The improved super-network-based evolutionary neural network architecture search method according to claim 1, wherein in step S2 the hybrid coding mode combines integers and binary numbers: the integer codes describe the type of each computing node in the neural network architecture and the connection relations between nodes, and the binary numbers binarize the connections between computing nodes, describing whether the connection between two computing nodes is activated; specifically: Step S21, each computing node i is encoded as a five-tuple, in which I1 and I2 are the indexes of the computing units connected to computing node i, that is, computing node i is connected to computing nodes I1 and I2; I1 and I2 are a pair of integers, and J1 and J2 are a pair of binary numbers that represent the four states of the connections between computing node i and computing nodes I1 and I2, specifically: J1 = 0, J2 = 0 means that the connections between computing node i and both I1 and I2 are activated; the output feature maps of I1 and I2 are fused and used as the input of computing node i, which processes them with its computing unit, so the output of computing node i is $\delta = O_i(I_1(x_c) + I_2(x_d))$; J1 = 0, J2 = 1 means that the connection between computing node i and computing node I1 is activated and the connection between computing node i and computing node I2 is closed; J1 = 1, J2 = 0 means that the connection between computing node i and computing node I1 is closed and the connection between computing node i and computing node I2 is activated; J1 = 1, J2 = 1 means that the connections between computing node i and both I1 and I2 are closed, i.e. the current computing node i is masked, so the fused output feature maps of I1 and I2 bypass the computing unit and are used directly as the output of computing node i: $\delta = I_1(x_c) + I_2(x_d)$, where $x_c$ and $x_d$ are the inputs of computing nodes I1 and I2, $I_1(x_c)$ and $I_2(x_d)$ are their output feature maps, and $O_i(\cdot)$ denotes the computing unit of computing node i; Step S22, a computing module contains M computing nodes, so the coding structure of one computing module is the sequence of its M node codes, $(\text{node}_1, \text{node}_2, \ldots, \text{node}_M)$; Step S23, a chromosome is one neural network architecture, and each architecture contains five computing modules, so the coding structure of one neural network architecture is $(\text{module}_1, \text{module}_2, \ldots, \text{module}_5)$.
- 5. The improved super-network-based evolutionary neural network architecture search method according to claim 1, wherein in step S4 the binary tournament selection comprises the following steps: Step S41, randomly selecting two individuals from the original population, keeping the individual with the higher fitness value in the parent population P and returning the individual with the lower fitness value to the original population; Step S42, repeating step S41 until the parent population P contains a preset number K of individuals, where K is a natural number greater than 1.
- 6. The improved super-network-based evolutionary neural network architecture search method according to claim 1, wherein in step S5, based on a given crossover rate pc, the chromosome individuals in the parent population P are crossed pairwise by a hybrid crossover method to obtain a number of chromosome individuals, with the following specific steps: Step S51, splitting each chromosome into an integer chromosome part and a binary chromosome part; Step S52, generating a random number r in the interval [0, 1], randomly selecting two individuals p1 and p2 from the parent population P, and using r to decide whether p1 and p2 undergo crossover; Step S53, if r ≤ pc, left-aligning the integer chromosome parts of the two chromosomes and performing single-point crossover, i.e. randomly setting one crossover point, at the same position in both integer chromosomes, and exchanging the genes at that point; and left-aligning the binary chromosome parts of the two chromosomes and performing multi-point crossover, i.e. randomly selecting several crossover points, at the same positions in both binary chromosomes, and exchanging the genes at those points; Step S54, if r > pc, storing the two individuals p1 and p2 selected in step S52 in the offspring population Q.
- 7. The improved super-network-based evolutionary neural network architecture search method according to claim 1, wherein in step S6, based on a given mutation rate pm, the individuals in the offspring population Q are mutated by a hybrid mutation method, with the following specific steps: Step S61, splitting each chromosome into an integer chromosome part and a binary chromosome part; Step S62, for each gene locus of a chromosome, generating a random number t in the interval [0, 1] and using it to decide whether to mutate that gene locus; Step S63, if t ≤ pm, applying polynomial mutation to the integer chromosome part of the chromosome, in which a_i is the gene at the i-th locus of the chromosome, a'_i is the new gene generated from a_i, u is a random number generated in the interval [0, 1], and the mutated value is constrained by the upper and lower bounds of gene a_i; Step S64, if t > pm, applying reverse (bit-flip) mutation to the binary chromosome part of the chromosome, i.e. randomly selecting several mutation points in the chromosome and flipping the gene at each of them, so that a 0 becomes 1 and a 1 becomes 0.
- 8. The improved super-network-based evolutionary neural network architecture search method according to claim 1, wherein in step S7 each individual in the offspring population obtains its structural weights by inheritance or random initialization; specifically, for any chromosome individual in the offspring population Q, if it was produced by the hybrid crossover method of step S5, its weights are inherited from the corresponding computing nodes of the parent chromosome individual, and if it was produced by the hybrid mutation method of step S6, the weights of its computing nodes are generated by random initialization.
- 9. The improved super-network-based evolutionary neural network architecture search method according to claim 1, wherein in step S8 the parent population P and the offspring population Q are merged into a population R, and a number of individuals are selected as the original population of the next generation by environmental selection, with the following specific steps: Step S81, sorting the individuals in population R by fitness value in descending order; Step S82, selecting the individuals ranked 1 to N in population R as the next-generation population, according to the preset population size N.
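Steps S31 and S32 of claim 1 (super-network training by uniform sampling, then fitness G / H) can be sketched as follows. This is a minimal sketch, not the patented implementation: `train_supernet`, `classification_fitness` and the `train_step` callback are hypothetical names, and `train_step` is assumed to decode an individual into its sub-network and update the shared super-network weights on one batch.

```python
import random
from typing import Callable, List, Sequence, TypeVar

Individual = TypeVar("Individual")
Batch = TypeVar("Batch")


def train_supernet(
    parents: Sequence[Individual],
    batches: Sequence[Batch],
    train_step: Callable[[Individual, Batch], None],
    rng: random.Random,
) -> List[Individual]:
    """Step S31: for each of the B batches, train one individual sampled uniformly from P.

    Returns the sampled individuals in batch order.
    """
    sampled: List[Individual] = []
    for batch in batches:
        individual = rng.choice(parents)  # uniform sampling from the parent population
        train_step(individual, batch)     # hypothetical: decode + update shared weights
        sampled.append(individual)
    return sampled


def classification_fitness(correct: int, total: int) -> float:
    """Step S32: fitness = G / H, the fraction of validation images classified correctly."""
    if total <= 0:
        raise ValueError("the validation set must contain at least one image")
    return correct / total
```

Because only one sub-network is updated per batch, every individual shares the same underlying weights, which is what lets later generations be scored without retraining from scratch.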
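The node encoding of claims 3 and 4 can be sketched as below. The field order of `NodeGene`, the function name `node_output`, and the use of plain floats and callables in place of feature maps and computing units are all assumptions. The two fully specified states (J1 = J2 = 0 and J1 = J2 = 1) follow the claim; the handling of the two mixed states is an ASSUMPTION, namely that a closed input bypasses the computing unit, by analogy with the fully masked case.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence

# Claim 3: strides of modules 1..5 (modules 1, 3, 5 -> 1; modules 2, 4 -> 2).
MODULE_STRIDES = (1, 2, 1, 2, 1)


@dataclass(frozen=True)
class NodeGene:
    """Hypothetical field order for the five-tuple of one computing node i."""
    in1: int    # I1: index of the first input node (integer gene)
    flag1: int  # J1: 0 = connection to I1 activated, 1 = closed (binary gene)
    in2: int    # I2: index of the second input node (integer gene)
    flag2: int  # J2: 0 = connection to I2 activated, 1 = closed (binary gene)
    op: int     # index of the computing unit in the search space θ (integer gene)


# Steps S22-S23: a module is M node genes; an architecture is five modules.
Module = List[NodeGene]
Architecture = List[Module]


def node_output(
    gene: NodeGene,
    outputs: Dict[int, float],
    ops: Sequence[Callable[[float], float]],
) -> float:
    """Output δ of node i, given the outputs I1(xc) and I2(xd) of its input nodes."""
    a, b = outputs[gene.in1], outputs[gene.in2]
    unit = ops[gene.op]
    if gene.flag1 == 0 and gene.flag2 == 0:
        return unit(a + b)  # both activated: fuse (add), then process
    if gene.flag1 == 1 and gene.flag2 == 1:
        return a + b        # node masked: the fused maps bypass the unit
    # Mixed states (ASSUMPTION): the closed input bypasses the unit.
    if gene.flag1 == 0:
        return unit(a) + b
    return a + unit(b)
```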
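The binary tournament of claim 5 can be sketched as follows. The function name and the choice to return population indices are assumptions; the sketch reads "placing the lower back" as removing each winner from the pool, so K must be smaller than the population size for every tournament to have two entrants.

```python
import random
from typing import List, Sequence


def binary_tournament(fitness: Sequence[float], k: int, rng: random.Random) -> List[int]:
    """Return the indices of K parents chosen by binary tournament (steps S41-S42)."""
    if not 1 <= k < len(fitness):
        raise ValueError("need 1 <= k < population size so every tournament has two entrants")
    pool = list(range(len(fitness)))
    parents: List[int] = []
    while len(parents) < k:
        a, b = rng.sample(pool, 2)                      # S41: two random individuals
        winner = a if fitness[a] >= fitness[b] else b   # ties go to the first drawn
        pool.remove(winner)                             # winner moves to P
        parents.append(winner)                          # loser stays in the pool
    return parents
```

Since the weakest individual loses every tournament it enters, it can never reach the parent population.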
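The hybrid crossover of claim 6 can be sketched on chromosomes that have already been split into integer and binary parts (step S51). Assumptions: `hybrid_crossover` and `n_points` are hypothetical names, both parents have the same length (so the left alignment is trivial and unequal lengths are not handled), single-point crossover swaps the tails after the cut, and the multi-point step swaps the bits located at the chosen positions.

```python
import random
from typing import List, Tuple

Part = List[int]


def hybrid_crossover(
    ints1: Part, bins1: Part, ints2: Part, bins2: Part,
    pc: float, rng: random.Random, n_points: int = 2,
) -> Tuple[Tuple[Part, Part], Tuple[Part, Part]]:
    """Return two children as (integer part, binary part) pairs."""
    if len(ints1) != len(ints2) or len(bins1) != len(bins2):
        raise ValueError("this sketch only handles equal-length parents")
    if len(ints1) < 2:
        raise ValueError("integer part needs at least two genes for a cut point")
    if rng.random() > pc:  # S52 / S54: no crossover, copy the parents
        return (ints1[:], bins1[:]), (ints2[:], bins2[:])
    # S53, integer parts: single-point crossover, same cut in both parents.
    cut = rng.randrange(1, len(ints1))
    c1_ints = ints1[:cut] + ints2[cut:]
    c2_ints = ints2[:cut] + ints1[cut:]
    # S53, binary parts: swap the bits at several shared positions.
    c1_bins, c2_bins = bins1[:], bins2[:]
    for pos in rng.sample(range(len(bins1)), min(n_points, len(bins1))):
        c1_bins[pos], c2_bins[pos] = c2_bins[pos], c1_bins[pos]
    return (c1_ints, c1_bins), (c2_ints, c2_bins)
```

At every position the two children together hold exactly the two parental genes, so crossover recombines genes without inventing new values; new values come only from mutation.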
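The hybrid mutation of claim 7 can be sketched as below. Because the claim's polynomial-mutation formula is not reproduced in this text, the sketch swaps in Deb's standard polynomial mutation with a distribution index `eta` (an assumed parameter), rounded and clipped to the gene's bounds. The claim pairs t ≤ pm with the integer part and t > pm with the binary part; this sketch instead uses the common convention of mutating each gene of either part independently with probability pm. All names are hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def polynomial_mutation(a: int, low: int, high: int, u: float, eta: float) -> int:
    """Deb-style polynomial mutation of integer gene a in [low, high]; u is uniform in [0, 1)."""
    if u < 0.5:
        delta = (2.0 * u) ** (1.0 / (eta + 1.0)) - 1.0
    else:
        delta = 1.0 - (2.0 * (1.0 - u)) ** (1.0 / (eta + 1.0))
    # delta lies in [-1, 1]; scale by the gene range, round, clip to the bounds
    return min(max(round(a + delta * (high - low)), low), high)


def hybrid_mutation(
    ints: Sequence[int], bins: Sequence[int],
    low: Sequence[int], high: Sequence[int],
    pm: float, rng: random.Random, eta: float = 20.0,
) -> Tuple[List[int], List[int]]:
    """Polynomial mutation on the integer part, bit flips on the binary part."""
    new_ints = [
        polynomial_mutation(a, lo, hi, rng.random(), eta) if rng.random() <= pm else a
        for a, lo, hi in zip(ints, low, high)
    ]
    new_bins = [1 - b if rng.random() <= pm else b for b in bins]  # 0 -> 1, 1 -> 0
    return new_ints, new_bins
```

Per-gene bounds matter here: an input index I1 or I2 must stay within the nodes that precede node i, while the operation gene must stay within the size of θ.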
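The weight assignment of claim 8 can be sketched as follows. Assumptions: node weights are stored in a dictionary keyed by a hypothetical node key (for example a (position, operation) pair), `init_weights` uses an arbitrary uniform range of [-0.1, 0.1], and a crossover child node with no parental counterpart falls back to random initialization.

```python
import random
from typing import Dict, Hashable, List

Weights = List[float]


def init_weights(size: int, rng: random.Random) -> Weights:
    """Hypothetical random initializer: uniform values in [-0.1, 0.1]."""
    return [rng.uniform(-0.1, 0.1) for _ in range(size)]


def assign_weights(
    child_nodes: List[Hashable],
    parent_weights: Dict[Hashable, Weights],
    origin: str,
    size: int,
    rng: random.Random,
) -> Dict[Hashable, Weights]:
    """Give every computing node of a child its structural weights (step S7).

    origin == "crossover": copy the weights of the corresponding parent node.
    origin == "mutation":  draw fresh random weights.
    """
    if origin not in ("crossover", "mutation"):
        raise ValueError(f"unknown origin: {origin!r}")
    child: Dict[Hashable, Weights] = {}
    for key in child_nodes:
        if origin == "crossover" and key in parent_weights:
            child[key] = list(parent_weights[key])  # inherit a copy
        else:
            # mutation, or (assumption) a crossover node with no parent counterpart
            child[key] = init_weights(size, rng)
    return child
```

Copying rather than aliasing the parent's lists keeps later training of the child from silently changing the parent's weights.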
Description
Technical Field
The invention relates to the technical field of image classification model construction, and in particular to an improved evolutionary neural network architecture search method based on a super network.
Background
Image classification is an image processing technique that distinguishes objects of different categories according to the different feature information reflected in an image. Because many models designed for image classification can be transferred to other computer vision fields as feature extraction networks, image classification is a fundamental task in computer vision, and the design of image classification models is a focus of researchers' attention. However, manually designing a neural network model requires experienced experts, who can arrive at a high-performing model only through careful study of the distribution and characteristics of the data set and repeated trial and error, which consumes a great deal of time and labor. Neural architecture search (NAS) algorithms are currently attracting considerable attention from researchers. Such algorithms can automatically design an efficient neural network architecture for a given data set without requiring much expertise. Because NAS algorithms typically need to evaluate neural network models in the search space repeatedly, they consume a large amount of computing resources. There are two main approaches to improving the search efficiency of NAS algorithms. The first approach is to construct an end-to-end performance predictor. This approach requires an encoding method that uniquely maps a neural network architecture to a set of numerical decision variables.
The encoding of each neural network architecture and its performance (e.g., classification accuracy) form a data pair, and such pairs are used as the training input of the performance predictor. Once trained, the predictor can directly predict the performance of neural network models in the search space without training them, which improves search efficiency. However, this approach follows a train-then-predict paradigm and requires a set of training samples to train the predictor. In general, the more training samples, the better the predictor performs, but collecting more samples consumes more computing resources and therefore reduces search efficiency. In practice, an incremental strategy is therefore needed to sample better-performing architectures, which still incurs a certain computational cost. The second approach is super-network-based neural architecture search (one-shot neural architecture search). It first trains a super network (one-shot model) that serves as the search space, then randomly samples a number of sub-networks from the super network, evaluates their performance, ranks them accordingly, and finally outputs the sub-network with the best evaluated performance. Because the sub-networks inherit their weights from the super network and can be evaluated without training, the search efficiency of the NAS algorithm is effectively improved. However, existing super-network-based neural architecture search algorithms have certain drawbacks.
First, the internal nodes of the super network are trained unevenly, which makes the performance ranking in the sub-network evaluation stage inaccurate, so the algorithm may fail to find the best-performing network architecture. Second, during super-network training, mutual interference among different sub-networks can make super-network-based architecture search unstable: the super network converges slowly or even fails to converge, and the performance predictions for sub-models become poor.
Disclosure of Invention
To address the shortcomings of prior-art super-network-based neural architecture search methods, namely unstable performance and slow or even failed convergence of super-network training, the invention aims to provide a super-network-based evolutionary neural network architecture search method that uses an evolutionary algorithm as the search strategy to automatically generate neural network architectures based on the super network, thereby improving the classification accuracy of image classification tasks. In order to solve the technical problems, the invention adopts the following tech