CN-121998015-A - Neural network pruning system and method based on layer-by-layer pruning rate optimization


Abstract

The invention relates to network pruning, and in particular to a neural network pruning system and method based on layer-by-layer pruning rate optimization. A modeling module identifies the prunable layers of the neural network to be pruned and constructs a pruning rate optimization problem model. An evaluation module provides search direction guidance for a global optimizing module and supports rapid fine-tuning of the neural network pruned with a pruning rate vector. The global optimizing module runs an improved sparrow search algorithm (ISSA) that integrates three mechanisms: dual-role topological collaboration, enhanced lens imaging reverse learning based on stagnation monitoring and dynamic intensity modulation, and late-stage local fine search; it solves the pruning rate optimization problem model for the optimal pruning rate vector of the whole network. A pruning execution module performs unstructured pruning on the prunable layers of the neural network to be pruned according to the optimal pruning rate vector.

Inventors

LIU YOUYU
ZHOU XIANGXIANG
Wang Luohan
ZHANG XIANGWEI

Assignees

Anhui Polytechnic University (安徽工程大学)

Dates

Publication Date: 2026-05-08
Application Date: 2026-01-04

Claims (10)

1. A neural network pruning system based on layer-by-layer pruning rate optimization, characterized by comprising a modeling module, an evaluation module, a global optimizing module and a pruning execution module; the modeling module is used for identifying the prunable layers of the neural network to be pruned and constructing a pruning rate optimization problem model; the evaluation module provides search direction guidance for the global optimizing module and supports rapid fine-tuning of the neural network pruned with a pruning rate vector; the global optimizing module is used for running an improved sparrow search algorithm (ISSA) that integrates three mechanisms, namely dual-role topological collaboration, enhanced lens imaging reverse learning based on stagnation monitoring and dynamic intensity modulation, and late-stage local fine search, and for directly solving the pruning rate optimization problem model for the optimal pruning rate vector of the whole network in a single run; and the pruning execution module performs unstructured pruning on the prunable layers of the neural network to be pruned according to the optimal pruning rate vector, generating a lightweight neural network suitable for edge devices with limited memory and computing power.
2. The neural network pruning system based on layer-by-layer pruning rate optimization of claim 1, wherein the modeling module identifying the prunable layers of the neural network to be pruned and constructing a pruning rate optimization problem model comprises: identifying all prunable layers in the neural network, the prunable layers comprising convolution layers and fully connected layers; and, in order to ensure the accuracy of the final classification or regression output mapping of the neural network, excluding the final output layer of the neural network from active optimization when the variables to be optimized are constructed.
3. The neural network pruning system based on layer-by-layer pruning rate optimization of claim 2, wherein the modeling module constructing a pruning rate optimization problem model comprises: mapping the task of assigning a pruning rate to each prunable layer into a pruning rate optimization problem model in an L-dimensional continuous space, and defining the variable to be optimized as the pruning rate vector; wherein the l-th component of the pruning rate vector is the pruning rate of the l-th prunable layer, l = 1, 2, ..., L, and each component is constrained between a minimum pruning rate and a maximum pruning rate; the minimum pruning rate ensures that the neural network achieves basic parameter compression and avoids a negligible pruning effect, and the maximum pruning rate prevents a drastic drop in the performance of the neural network due to excessive pruning and guarantees the usability of the neural network after pruning.
4. The neural network pruning system based on layer-by-layer pruning rate optimization of claim 1, wherein the evaluation module providing search direction guidance for the global optimizing module comprises: guiding the search direction of the global optimizing module with the goal of minimizing a fitness function, the fitness function being expressed by the following formula: ; wherein the terms of the formula are the accuracy on the validation set of the neural network pruned with the pruning rate vector, the ratio of the total number of parameters of the neural network pruned with the pruning rate vector to the total number of parameters of the original neural network, and a weight coefficient that balances accuracy and compression, the weight coefficient being adjusted according to the actual deployment environment of the neural network.
5. The neural network pruning system based on layer-by-layer pruning rate optimization of claim 4, wherein the evaluation module supporting rapid fine-tuning of the neural network pruned with a pruning rate vector comprises: to support high-frequency iterative evaluation within a single run, performing rapid fine-tuning for a preset number of epochs on the neural network pruned with the pruning rate vector; and, after the fine-tuning is completed, obtaining the accuracy of the fine-tuned neural network on the validation set and substituting it into the fitness function to calculate the fitness value of the pruning rate vector, which provides the basis for updating the population positions.
6. The neural network pruning system based on layer-by-layer pruning rate optimization of claim 1, wherein the core component of the global optimizing module is an architecturally reconstructed improved sparrow search optimization engine ISSA ENGINE, which is customized for the high-dimensional continuous space of the pruning rate optimization problem model and does not require layer-by-layer pruning thresholds to be preset manually; the improved sparrow search optimization engine ISSA ENGINE is executed cooperatively by a dual-role topology cooperative module, a stagnation sensing and reverse breakthrough module ELLO and a late-stage neighborhood fine mining module, so that the optimizing process is fully automated and converges globally in a single run.
7. The neural network pruning system based on layer-by-layer pruning rate optimization of claim 6, wherein the dual-role topology cooperative module performs a lightweight reconstruction of the population topology, rebuilding it as a two-layer model of 'finders' and 'adders' and eliminating the 'alerter' role of the standard sparrow search algorithm, so as to greatly reduce the number of floating-point operations and the memory footprint while the algorithm runs, reduce computational redundancy and improve search efficiency; the dual-role topology cooperative module drives the population to update its positions dynamically according to fitness values in the L-dimensional continuous solution space, each dimension of which corresponds to the pruning rate of one prunable layer, and the lightweight two-layer population topology enables efficient exploration and information exchange of the population in the solution space, laying the foundation for finding the global optimal solution.
8. The neural network pruning system based on layer-by-layer pruning rate optimization of claim 7, wherein the stagnation sensing and reverse breakthrough module ELLO monitors the updating of the global optimal solution in real time and judges whether the algorithm has fallen into a stagnation state by counting the number of consecutive iterations in which the global optimal solution is not updated; when this number exceeds a preset stagnation threshold, the module automatically activates the enhanced lens imaging reverse learning mechanism, dynamically calculates a reverse solution according to the overall state of the current population, and forces the population to jump out of the attraction domain of a local optimum, thereby avoiding premature convergence of the algorithm and ensuring the global exploration capability of the search process, specifically comprising: 1) reverse solution generation, in which generating a reverse solution ensures that the algorithm can effectively refract from the currently stagnant solution region to a potentially better solution region: ; wherein X is the position of an original sparrow individual, X_opp is the reverse solution, g_best is the current global optimal solution, X_mean is the mean position of the current population, rand(dim) is a random vector of dimension L whose elements are each uniformly distributed in the range [0,1], and intensity is a dynamic learning intensity factor; 2) dynamic intensity modulation, realized through the dynamic learning intensity factor intensity, which is positively correlated with the number count of consecutive non-updates of the global optimal solution and negatively correlated with the iteration progress, so that a larger reverse disturbance intensity is applied in the early stage of the algorithm to enhance global exploration and help the population traverse a wider solution space, and a smaller reverse disturbance intensity is applied in the later stage to avoid destroying individual solutions that are converging and to ensure stable convergence of the algorithm: ; wherein intensity_base is the initial learning intensity factor, progress is the normalized iteration progress coefficient, a value closer to 1 indicating that the algorithm is closer to convergence, t is the current iteration number, T is the maximum iteration number, and count' is the preset stagnation threshold.
9. The neural network pruning system based on layer-by-layer pruning rate optimization of claim 8, wherein the late-stage neighborhood fine mining module is triggered automatically when the iteration progress of the single run lies within a preset progress interval, so as to perform local fine optimization of the current global optimal solution in the later stage of the algorithm; the module first locks the global optimal solution found so far, then applies small-step random disturbances following a corresponding distribution within its neighborhood, generates and evaluates a plurality of candidate solutions, and mines better solutions in the neighborhood through high-frequency local exploitation to improve the precision of the optimal pruning rate vector, specifically comprising: 1) trigger condition: the module is triggered when the normalized iteration progress coefficient progress lies within the preset progress interval, the preset progress interval being set according to the convergence behavior of the algorithm, so that the module starts only after the population has preliminarily converged to a region of better solutions, inefficient local exploitation caused by premature triggering is avoided, and enough iterations remain to complete the local fine optimization; 2) small-step random disturbance generation: ; wherein the random disturbance step vector has dimension L and ensures the randomness and reasonableness of the disturbance, and k is a disturbance coefficient, k ≤ 0.02, which controls the disturbance intensity; the late-stage neighborhood fine mining module uses the generated random disturbance step vector to apply small-step disturbances to the current global optimal solution, generates a plurality of candidate solutions in its neighborhood, evaluates them, and selects a better pruning rate vector, thereby mining a better solution in the neighborhood.
10. A neural network pruning method based on layer-by-layer pruning rate optimization, applied to the neural network pruning system based on layer-by-layer pruning rate optimization as claimed in claim 1, characterized by comprising the following steps: S1, identifying all prunable layers in the neural network, the prunable layers comprising convolution layers and fully connected layers, and, in order to ensure the accuracy of the final classification or regression output mapping of the neural network, excluding the final output layer of the neural network from active optimization when the variables to be optimized are constructed; S2, mapping the task of assigning a pruning rate to each prunable layer into a pruning rate optimization problem model in an L-dimensional continuous space, and defining the variable to be optimized as the pruning rate vector; S3, initializing the parameters of the improved sparrow search algorithm ISSA and starting the improved sparrow search optimization engine ISSA ENGINE; S4, using the dual-role topology cooperative module to drive the population to update its positions dynamically according to fitness values in the L-dimensional continuous solution space, realizing global exploration, reducing computational redundancy and improving search efficiency; S5, during the iterations, monitoring the update state of the global optimal solution in real time through the stagnation sensing and reverse breakthrough module ELLO, and, when the algorithm falls into a stagnation state, immediately activating the enhanced lens imaging reverse learning mechanism to generate a reverse solution and using the dynamic learning intensity factor to force the population to jump out of the attraction domain of a local optimum, thereby avoiding premature convergence of the algorithm and guaranteeing the global exploration capability of the search process; S6, when the iteration progress reaches the later stage of the algorithm, automatically triggering the late-stage neighborhood fine mining module, applying small-step random disturbances following a corresponding distribution within the neighborhood of the current global optimal solution, performing high-precision local exploitation, mining better solutions in the neighborhood and improving the precision of the optimal pruning rate vector; S7, outputting the optimal pruning rate vector after the single run is finished; S8, performing unstructured pruning on the prunable layers of the neural network according to the optimal pruning rate vector, namely: calculating the L1 norms of the weight parameters of each prunable layer and sorting all L1 norms in ascending order; determining the pruning rate of each prunable layer from the optimal pruning rate vector; marking for removal, according to the ranking, the proportion of weight parameters given by that pruning rate; constructing a binary mask; removing the redundant weight parameters using the binary mask while keeping the structure of the final output layer unchanged; generating a lightweight neural network suitable for edge devices with limited memory and computing power; and exporting the lightweight neural network for subsequent deployment.
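
For illustration only, the late-stage neighborhood search of claim 9 and step S6 might be sketched as below. The disturbance formula itself is not reproduced in this text, so the sketch assumes a step of k times standard normal noise per dimension, clipped to the pruning rate bounds. The function name `late_neighborhood_search`, its parameters, and the toy fitness function are hypothetical and are not part of the claims; in the claimed system the fitness would come from the evaluation module after rapid fine-tuning.

```python
import random


def late_neighborhood_search(g_best, fitness, p_min, p_max, k=0.02,
                             n_candidates=10, rng=None):
    """Perturb the locked global best with small steps; keep the best candidate.

    ASSUMPTION: each step is k * N(0, 1) per dimension; the source only says
    the disturbance follows "a corresponding distribution" with k <= 0.02.
    """
    rng = rng or random.Random()
    best, best_fit = list(g_best), fitness(g_best)
    for _ in range(n_candidates):
        # Small-step disturbance, clipped to [p_min, p_max] in every dimension.
        cand = [min(p_max, max(p_min, p + k * rng.gauss(0.0, 1.0)))
                for p in g_best]
        cand_fit = fitness(cand)
        if cand_fit < best_fit:  # the fitness function is minimized
            best, best_fit = cand, cand_fit
    return best, best_fit


if __name__ == "__main__":
    # Toy stand-in for the real fitness (hypothetical): distance to a target.
    target = [0.5, 0.7, 0.3]
    toy = lambda p: sum((a - b) ** 2 for a, b in zip(p, target))
    print(late_neighborhood_search([0.52, 0.68, 0.31], toy, 0.1, 0.9,
                                   rng=random.Random(0)))
```

Because the current best is kept unless a candidate improves on it, the returned fitness can never be worse than that of the locked solution, which matches the stated aim of refining rather than disturbing a converging result.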

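As a minimal sketch of the mask-based unstructured pruning in step S8, the following assumes that each prunable layer is given as a flat list of weights, that the L1 norm of an individual weight is its absolute value, and that the weights with the smallest magnitudes are the ones removed (the usual magnitude pruning convention, which the claim wording does not spell out). The names `magnitude_mask` and `prune_network` are hypothetical.

```python
def magnitude_mask(weights, rate):
    """Return a 0/1 mask that removes the `rate` fraction of smallest-|w| weights."""
    n_prune = int(len(weights) * rate)
    # Indices sorted by ascending absolute value (L1 norm of a single weight).
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    mask = [1] * len(weights)
    for i in order[:n_prune]:
        mask[i] = 0
    return mask


def prune_network(layers, rates):
    """Apply per-layer pruning rates to the prunable layers.

    layers: list of flat weight lists, one per prunable layer (the final
            output layer is assumed to be excluded by the caller).
    rates:  pruning rate vector with one entry per prunable layer.
    """
    if len(layers) != len(rates):
        raise ValueError("need exactly one pruning rate per prunable layer")
    pruned = []
    for weights, rate in zip(layers, rates):
        mask = magnitude_mask(weights, rate)
        pruned.append([w if m else 0.0 for w, m in zip(weights, mask)])
    return pruned
```

For example, `magnitude_mask([0.5, -0.1, 0.3, -0.9], 0.5)` zeroes the two weights of smallest magnitude and keeps the other two. A real deployment would apply the same mask to framework tensors rather than Python lists.
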
Description

Neural network pruning system and method based on layer-by-layer pruning rate optimization

Technical Field

The invention relates to network pruning, and in particular to a neural network pruning system and method based on layer-by-layer pruning rate optimization.

Background

Deep neural networks have made breakthrough progress in computer vision, natural language processing, and complex industrial engineering prediction. However, as model performance improves, the number of parameters and the computational complexity grow sharply, which poses a significant challenge for deploying models on resource-constrained devices (e.g., mobile terminals, embedded sensors, edge computing gateways). Particularly in the industrial Internet and intelligent manufacturing, tasks such as robot polishing parameter optimization, building material strength prediction, aircraft aerodynamic performance analysis and real-time building energy efficiency evaluation often require real-time inference on embedded chips with low computing power. Network pruning is a mainstream model compression technique that aims to reduce model size and inference latency by removing redundant connections. Among pruning approaches, unstructured pruning has attracted great interest because of its high compression ratio.
However, in practical industrial applications, existing network pruning techniques face serious challenges:

1) Lack of scene adaptivity and structural safety, and a long development cycle. Industrial scenes differ widely, and from image recognition on mobile terminals to regression prediction in industrial settings the model structures for different tasks differ greatly. Existing schemes generally rely on manual experience or a cumbersome test-mark-reshelding process to set layer-by-layer pruning thresholds, so that when facing a new task, developers must repeatedly experiment to find suitable pruning parameters; the model adaptation cycle becomes too long to meet the needs of rapid industrial iteration.

2) Difficulty balancing accuracy and efficiency on low-power devices. Existing schemes either incur computational costs too high to run directly on edge devices, or adopt uniform or coarse pruning strategies that easily damage the key feature extraction layers of the model (for example, when processing weak sensor vibration signals or aerodynamic data). As a result, prediction accuracy drops sharply in the pursuit of a lightweight model, and either signal distortion from excessive pruning or slow inference from insufficient pruning readily occurs, so industrial safety and real-time requirements cannot be met.

Therefore, there is a need for a neural network pruning system and method that can automatically search for a globally optimal pruning strategy in a single run for different industrial scenes, at low computational cost, and that can effectively balance high accuracy and high compression rate at the edge side.
Disclosure of Invention

(I) Technical problem to be solved

In view of the defects of the prior art, the invention provides a neural network pruning system and method based on layer-by-layer pruning rate optimization, which can effectively overcome the defects that the prior art lacks scene adaptivity and structural safety and has difficulty balancing high accuracy and high compression rate at the edge side.

(II) Technical solution

To achieve the above purpose, the invention is realized by the following technical solution:

A neural network pruning system based on layer-by-layer pruning rate optimization comprises a modeling module, an evaluation module, a global optimizing module and a pruning execution module; the modeling module is used for identifying the prunable layers of the neural network to be pruned and constructing a pruning rate optimization problem model; the evaluation module provides search direction guidance for the global optimizing module and supports rapid fine-tuning of the neural network pruned with a pruning rate vector; the global optimizing module is used for running an improved sparrow search algorithm (ISSA) that integrates three mechanisms, namely dual-role topological collaboration, enhanced lens imaging reverse learning based on stagnation monitoring and dynamic intensity modulation, and late-stage local fine search, and for directly solving the pruning rate optimization problem model for the optimal pruning rate vector of the whole network in a single run; and the pruning execution module performs unstructured pruning on the prunable layers of the neural network to be pruned according to the optimal pruning rate vector to genera