CN-121998016-A - Neural network model pruning method and system
Abstract
The application discloses a neural network model pruning method and system in the field of model pruning. The method comprises: obtaining a first neural network model; replacing the standard layers in the model to obtain a second neural network model; adding a relevance gate to each residual module in the model to control relevance propagation; and adding a sparse filter at the output of each linear layer to remove noise during relevance propagation, finally obtaining a third neural network model. By adding this relevance gating and filtering mechanism to the model to be pruned, the method avoids dependence on complex propagation schemes designed for nonlinear layers, reducing implementation complexity and computation cost.
Inventors
- Jiang Jingfei
- Niu Di
- Xu Jinwei
- Pan Hengyue
- Lv Qianru
- Li Liangwei
- Zhou Shunan
- Zhu Minghua
Assignees
- National University of Defense Technology of the Chinese People's Liberation Army (中国人民解放军国防科技大学)
Dates
- Publication Date
- 20260508
- Application Date
- 20260410
Claims (10)
- 1. A neural network model pruning method, comprising: acquiring a first neural network model; replacing the standard layers in the first neural network model to obtain a second neural network model; adding a relevance gate to each residual module in the second neural network model to control relevance propagation, and adding a sparse filter at the output of each linear layer in the second neural network model to remove noise during relevance propagation, thereby obtaining a third neural network model.
- 2. The neural network model pruning method of claim 1, further comprising: acquiring image data; and inputting the image data into the third neural network model, pruning according to the computed component relevance scores, and obtaining a fourth neural network model.
- 3. The neural network model pruning method of claim 1, wherein replacing the standard layers in the first neural network model comprises: rewriting only the forward- and backward-propagation computation logic of the first neural network model, replacing all linear layers with LRP linear modules, and replacing the nonlinear layers and normalization layers with LRP nonlinear modules.
- 4. The neural network model pruning method of claim 3, wherein replacing all linear layers with LRP linear modules specifically comprises: during forward propagation, the LRP linear module performs the same operation as the original linear layer, so that the output feature values are identical to those of the original model, and caches the input and output tensors; during backward propagation, the LRP linear module computes a stabilization factor from the cached output tensor, performs element-wise products with the input tensor and the parameters to produce its output, and computes the relevance scores of the parameters.
- 5. The neural network model pruning method of claim 3, wherein replacing the nonlinear layers and normalization layers with LRP nonlinear modules specifically comprises: during forward propagation, the LRP nonlinear module leaves the original propagation logic of the first neural network model unchanged; during backward propagation, the LRP nonlinear module applies an identity-mapping rule and passes the relevance tensor returned by the subsequent layer through unchanged.
- 6. The neural network model pruning method of claim 1, wherein adding a relevance gate to each residual module in the second neural network model to control relevance propagation comprises: during forward propagation, the relevance gate propagates using an identity-mapping rule; during backward propagation, the relevance gate outputs an all-zero tensor with the same shape as its input.
- 7. The neural network model pruning method of claim 1, wherein adding a sparse filter at the output of each linear layer in the second neural network model to remove noise during relevance propagation comprises: during forward propagation, the sparse filter propagates using an identity-mapping rule; during backward propagation, the sparse filter computes relevance scores along the feature dimension of the incoming relevance tensor, retains a preset portion of the relevance tensor, forcibly sparsifies the remainder, and compensates the retained relevance through an energy-compensation coefficient.
- 8. The neural network model pruning method of claim 2, wherein the component relevance score is computed as: S_i = (1/n) Σ_{j=1..n} R_i^{(j)}; wherein x_j is the j-th sample, n is the total number of samples, and R_i^{(j)} is the relevance score generated by the j-th sample on the i-th parameter.
- 9. A neural network model pruning system, comprising: an acquisition module for acquiring a first neural network model; a replacement module for replacing the standard layers in the first neural network model to obtain a second neural network model; and a model pruning module for adding a relevance gate to each residual module in the second neural network model to control relevance propagation, adding a sparse filter at the output of each linear layer in the second neural network model to remove noise during relevance propagation, and finally obtaining a third neural network model.
- 10. The neural network model pruning system of claim 9, wherein the acquisition module is further configured to acquire image data, and the model pruning module is further configured to input the image data into the third neural network model, prune according to the computed component relevance scores, and obtain a fourth neural network model.
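To make the behavior described in claim 4 concrete, the LRP linear module can be sketched in a few lines of NumPy. This is a hypothetical reading of the claim, not the patented implementation: the class name `LRPLinear`, the epsilon-style stabilization factor, and the exact relevance formulas are assumptions chosen to match the claim's description (forward pass identical to the original linear layer with tensors cached; backward pass distributing relevance via element-wise products with the input and the parameters).

```python
import numpy as np

class LRPLinear:
    """Hypothetical sketch of an LRP linear module (epsilon-style rule).

    Forward: identical to a plain linear layer; caches the input and
    output tensors. Backward: computes a stabilization factor from the
    cached output, then redistributes relevance via element-wise
    products with the input tensor and the weights, also producing a
    per-parameter relevance score.
    """

    def __init__(self, W, b, eps=1e-6):
        self.W, self.b, self.eps = W, b, eps

    def forward(self, x):
        # Same operation as the original linear layer; cache for LRP.
        self.x = x
        self.y = self.W @ x + self.b
        return self.y

    def backward(self, R_out):
        # Stabilization factor derived from the cached output tensor.
        z = self.y + self.eps * np.sign(self.y)
        s = R_out / z                         # normalized relevance
        R_in = self.x * (self.W.T @ s)        # relevance passed upstream
        R_W = self.W * np.outer(s, self.x)    # per-parameter relevance
        return R_in, R_W
```

With a zero bias, the sketch conserves total relevance: the relevance passed to the input sums (up to the stabilizer) to the relevance that entered the layer, which is the property that makes per-parameter scores comparable across layers.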
Description
Neural network model pruning method and system

Technical Field

The invention relates to the field of model compression, and in particular to a neural network model pruning method and system.

Background

With the development of computer science and the continuous growth of computing power, the scale and structural complexity of deep learning models keep increasing. When such models are deployed on resource-constrained edge devices (such as embedded systems and mobile terminals), problems such as high inference latency, increased power consumption, higher memory-bandwidth usage, and even insufficient compute or storage frequently arise, limiting practical deployment in scenarios that demand high real-time performance and high energy efficiency.

Existing importance metrics can be broadly divided into three categories. (1) Magnitude-based metrics, which use the absolute value of the weights as the criterion and are widely adopted for their simple implementation and low cost. However, they struggle to capture the nonlinear associations and cross-layer couplings between parameters and performance, and easily yield sub-optimal trade-offs between model compactness and accuracy. (2) Metrics based on first-order gradients, which measure a parameter's sensitivity to the loss using the first-order term of a Taylor expansion of the loss function, and correlate more finely with the objective. However, they remain first-order approximations and insufficiently capture interactions between parameters. (3) Metrics based on second-order Hessian information, which evaluate joint effects between parameters; the off-diagonal elements of the Hessian matrix can reflect cross-parameter and cross-layer interactions, typically leading to better pruning performance.
However, these methods incur large computation and storage costs, the joint effects are not fully exploited, and customized designs are often required for specific network architectures, making them complex to implement and limiting their portability and scalability. In recent years, interpretability-based pruning has attracted attention. This line of work quantifies the "relevance/contribution" of network elements using interpretive attribution methods and prunes accordingly. Prior work has verified feasibility on small- and medium-scale CNNs and further explored how the choice of propagation rules and hyperparameter settings for different layers affects pruning results. Nevertheless, the direction as a whole remains largely exploratory; compared with the mature first-order/second-order methods there is still room for improvement in uniformity and generality, and a unified framework covering both CNNs and Transformers is still lacking.

To address the common problems of existing pruning metrics, how to avoid dependence on complex propagation schemes designed for nonlinear layers during pruning, and thereby reduce implementation complexity and computation, is a technical problem to be solved in this field.

Disclosure of Invention

To solve the above technical problems, the application aims to provide a neural network model pruning method and system that remarkably simplify the propagation rules through the established gating and filtering mechanism, avoid dependence on complex propagation schemes designed for nonlinear layers, and reduce complexity and computation. To this end, the present application provides a neural network model pruning method and system.
The above purpose of the application is achieved through the following technical scheme. A neural network model pruning method comprises: acquiring a first neural network model; replacing the standard layers in the first neural network model to obtain a second neural network model; adding a relevance gate to each residual module in the second neural network model to control relevance propagation, and adding a sparse filter at the output of each linear layer in the second neural network model to remove noise during relevance propagation, finally obtaining a third neural network model.

Preferably, the method further comprises: acquiring image data; and inputting the image data into the third neural network model, pruning according to the computed component relevance scores, and obtaining a fourth neural network model.

Preferably, replacing the standard layers in the first neural network model includes: rewriting only the forward- and backward-propagation computation logic of the first neural network model, replacing all linear layers with LRP linear modules, and replacing the nonlinear layers and normalization layers with LRP nonlinear modules.
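The gating and filtering mechanism summarized in the technical scheme above can be illustrated with a minimal NumPy sketch. This is a hypothetical reading of the description, not the patented implementation: the class names, the top-k retention strategy, and the energy-compensation coefficient (total relevance divided by retained relevance) are assumptions chosen to match the stated behavior (identity mapping in the forward pass; an all-zero tensor, or a sparsified and compensated relevance tensor, in the backward pass).

```python
import numpy as np

class RelevanceGate:
    """Forward: identity mapping. Backward: emit an all-zero tensor
    of the input's shape, cutting relevance flow through a residual
    branch as described for the gate in each residual module."""

    def forward(self, x):
        self.shape = x.shape
        return x

    def backward(self, R):
        return np.zeros(self.shape)

class SparseFilter:
    """Forward: identity mapping. Backward: keep the top-k relevance
    entries (by magnitude), zero the rest, and rescale the kept
    entries by an energy-compensation coefficient so that total
    relevance magnitude is preserved."""

    def __init__(self, keep=0.1):
        self.keep = keep  # fraction of relevance entries to retain

    def forward(self, x):
        return x

    def backward(self, R):
        k = max(1, int(self.keep * R.size))
        thresh = np.sort(np.abs(R).ravel())[-k]
        kept = R * (np.abs(R) >= thresh)      # forcibly sparsify the rest
        total, kept_sum = np.abs(R).sum(), np.abs(kept).sum()
        coef = total / kept_sum if kept_sum > 0 else 1.0
        return kept * coef                    # energy compensation
```

Because the forward pass of both modules is the identity, inserting them changes nothing about the model's predictions; they act only on the relevance tensors flowing backward during attribution.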