CN-115271059-B - Optical remote sensing target detection light-weight method based on knowledge distillation network
Abstract
The invention provides a lightweight optical remote sensing target detection method based on a knowledge distillation network. It adopts knowledge distillation as a model compression method to reduce the network size and generate a lightweight network, suitable for deployment on the on-board payload, that is used for target detection. Built on a general deep learning framework and designed around a satellite-borne GPU with limited computing resources, the lightweight network structure allows the model to perform fast, accurate and flexible remote sensing image target detection while maintaining high detection accuracy, enables on-orbit real-time computation on key-frame remote sensing images, and is applicable to a satellite-borne GPU platform with limited computing resources.
Inventors
- NI SHUYAN
- WANG HAINING
- YANG YING
- FU QIWEI
- XU JING
- CUI JIANHUA
- LIAO YURONG
Assignees
- Space Engineering University, PLA Strategic Support Force (中国人民解放军战略支援部队航天工程大学)
Dates
- Publication Date
- 20260512
- Application Date
- 20220330
Claims (5)
- 1. The lightweight optical remote sensing target detection method based on the knowledge distillation network is characterized in that a target detection neural network model based on knowledge distillation is adopted to carry out target detection on optical remote sensing images shot by a satellite in real time, wherein the target detection neural network model is obtained as follows: S1, acquiring a target area and a background area of each optical remote sensing image, taking the optical remote sensing images and the target areas as the input and output of the CSPNet v network respectively, and training the CSPNet v network to obtain a teacher network; S2, taking the teacher network as the input of a distillation network, and training the distillation network under the guidance of the network parameters of the teacher network to obtain a student network; wherein the distillation process comprises the following steps: 1) extending imitation from the top-layer output features to the intermediate-layer features, so that more knowledge is transferred to the student network, and outputting a fine feature map; 2) designing a distillation loss function to constrain the training of the student network and the teacher network, giving key target areas in the remote sensing image a high weight value and background information areas a low weight value; the loss function comprises two constraints, namely a rotation IoU loss that constrains the regression task of the target detection neural network, and a distillation loss that transfers information between the teacher network and the student network and constrains the training of the distillation network; 3) updating the parameters of the student network by back propagation, and updating the weights of the teacher network nodes on a downstream task by forward propagation; specifically, judging whether the distillation loss between the student network and the teacher network is smaller than a set threshold; if so, the current student network is the target detection neural network model; if not, the distillation loss is used for back propagation of the distillation network and the rotation IoU loss is used for back propagation of the CSPNet v network, so that the distillation network and the CSPNet v network are adjusted until the distillation loss is smaller than the set threshold; the loss function is expressed as L(M) = L_IoU(M) + γ · L_tea-stu(X, Y), wherein M denotes the network parameters of the model, L_IoU denotes the rotation IoU loss constraint, γ is a balance parameter, and L_tea-stu(X, Y) denotes the distillation loss between the student network and the teacher network; the rotation IoU loss L_IoU is defined in terms of N, the corner coordinates of the target's ground-truth detection box generated with the 180-degree five-parameter representation, and the corner coordinates of the distilled target prediction box generated with the same representation, where the five parameters comprise the box width, the box height, the abscissa and ordinate of the target center point, and the rotation angle of the box; the distillation loss L_tea-stu(X, Y) is defined over the feature maps generated during distillation-network training, wherein W, H and S respectively denote the length, width and number of channels of the feature map, F_feature represents the distillation loss relation between the teacher network and the student network, F_stu(x, y, z) denotes the feature map extracted by the student network, F_tea(x, y, z) denotes the feature map extracted by the teacher network, ω is a set balance parameter, and the sum of the number of non-zero pixel points in the feature map F_tea(x, y, z) serves as the normalization term.
- 2. The method of claim 1, wherein the CSPNet v network adopts the YOLOX framework, the convolutional layers are depthwise separable convolutional layers, and the connections between network nodes are skip connections.
- 3. The lightweight optical remote sensing target detection method based on the knowledge distillation network according to claim 1, wherein the convolution calculation process of the CSPNet v network and the distillation network is divided into depthwise convolution and pointwise convolution operations, wherein the number of output channels of the Depthwise layer is set to 128 and its convolution kernel size is 3 x 1, and the number of output channels of the Pointwise layer is set to be consistent with the Depthwise layer and its convolution kernel size is 1 x 1.
- 4. The method of claim 1, wherein each optical remote sensing image in step S1 is extracted from the DIOR dataset.
- 5. The lightweight optical remote sensing target detection method based on a knowledge distillation network according to claim 1, wherein each optical remote sensing image is filtered and denoised and then uniformly cut into pixel blocks of a set size, the pixel blocks are divided into target pixel blocks and background pixel blocks, and finally the target pixel blocks and the background pixel blocks are used as the input of the CSPNet v network.
Description
Optical remote sensing target detection light-weight method based on knowledge distillation network
Technical Field
The invention belongs to the field of intelligent image processing, and particularly relates to a lightweight optical remote sensing target detection method based on a knowledge distillation network.
Background
As information processing technology develops toward intelligence, deep convolutional neural networks are becoming increasingly important in remote sensing image target detection. Because the optical remote sensing images acquired by satellite payloads are very large, real-time processing is difficult to achieve by downloading the data from the satellite, so on-orbit processing has become an important development direction for optical remote sensing target detection. Deep convolutional neural network models often suffer from large parameter counts and redundant network nodes, which cause serious memory overhead and limit deployment on a satellite-borne GPU with limited computing resources. In addition, traditional satellite remote sensing target detection downloads the data and performs detection at the ground station, so the results generally lack real-time processing capability. Traditional space-based optical remote sensing is therefore ill-suited to the accurate, flexible and rapid response requirements of disaster relief, intelligence reconnaissance, target monitoring and the like.
Disclosure of the Invention
To solve these problems, the invention provides a lightweight optical remote sensing target detection method based on a knowledge distillation network, which can detect key-frame images acquired by the satellite optical payload in real time, quickly, accurately and flexibly, and can be deployed on hardware platforms with limited computing resources such as satellite on-orbit processors. The method adopts a target detection neural network model based on knowledge distillation to carry out target detection on optical remote sensing images shot by the satellite in real time, and the model is obtained as follows: S1, acquire the target area and background area of each optical remote sensing image, take the optical remote sensing images and the target areas as the input and output of the CSPNet v network respectively, and train the CSPNet v network to obtain a teacher network; S2, take the teacher network as the input of a distillation network, and train the distillation network under the guidance of the teacher network's parameters to obtain a student network; S3, judge whether the distillation loss between the student network and the teacher network is smaller than a set threshold; if so, the current student network is the target detection neural network model; if not, use the distillation loss for back propagation of the distillation network and the rotation IoU loss for back propagation of the CSPNet v network, so as to adjust the distillation network and the CSPNet v network until the distillation loss is smaller than the set threshold.
Further, the rotation IoU loss L_IoU is defined over N, the corner coordinates of the target's ground-truth detection box generated with the 180-degree five-parameter representation, and the corner coordinates of the distilled target prediction box generated with the same representation; the five parameters comprise the box width, the box height, the abscissa and ordinate of the target center point, and the rotation angle of the box. Further, the distillation loss L_tea-stu(X, Y) is defined as follows: L_tea-stu(X, Y) = (1/M) · Σ_{x=1..W} Σ_{y=1..H} Σ_{z=1..S} ω · F_feature(F_tea(x, y, z), F_stu(x, y, z)), wherein W, H and S respectively denote the length, width and number of channels of the feature map generated during distillation-network training, F_feature represents the distillation loss relation between the teacher network and the student network, F_stu(x, y, z) denotes the feature map extracted by the student network, F_tea(x, y, z) denotes the feature map extracted by the teacher network, ω is a set balance parameter, and M is the sum of the number of non-zero pixel points in the feature map F_tea(x, y, z). Further, the CSPNet v network adopts the YOLOX framework, the convolutional layers are depthwise separable convolutional layers, and the connections between network nodes are skip connections. Further, the convolution calculation process of the CSPNet v network and the distillation network is divided into depthwise convolution and pointwise convolution operations.
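As a concrete illustration of the loss terms defined above, the following is a minimal sketch, assuming PyTorch. The conversion of the 180-degree five-parameter boxes to corner coordinates, the squared-error form standing in for the unspecified F_feature relation, and the default values of `gamma` and `omega` are assumptions, not the patent's definitive formulas.

```python
import torch

def five_param_to_corners(boxes: torch.Tensor) -> torch.Tensor:
    """Convert 180-degree five-parameter boxes (cx, cy, w, h, theta_rad) to
    corner coordinates: (B, 5) -> (B, 4, 2)."""
    cx, cy, w, h, theta = boxes.unbind(-1)
    cos_t, sin_t = torch.cos(theta), torch.sin(theta)
    # Corner offsets in the box's local frame, then rotate and shift to the center.
    local = torch.stack([
        torch.stack([ w / 2,  h / 2], -1),
        torch.stack([-w / 2,  h / 2], -1),
        torch.stack([-w / 2, -h / 2], -1),
        torch.stack([ w / 2, -h / 2], -1),
    ], dim=1)                                           # (B, 4, 2)
    rot = torch.stack([
        torch.stack([cos_t, -sin_t], -1),
        torch.stack([sin_t,  cos_t], -1),
    ], dim=1)                                           # (B, 2, 2)
    center = torch.stack([cx, cy], -1).unsqueeze(1)     # (B, 1, 2)
    return local @ rot.transpose(1, 2) + center

def distillation_loss(f_stu: torch.Tensor, f_tea: torch.Tensor,
                      omega: float = 1.0) -> torch.Tensor:
    """L_tea-stu: feature imitation between teacher and student feature maps,
    normalized by M, the number of non-zero teacher pixels; squared error is
    used here as an assumed F_feature."""
    mask = (f_tea != 0).float()
    m = mask.sum().clamp(min=1.0)                       # M in the definition above
    return omega * ((f_stu - f_tea) ** 2 * mask).sum() / m

def total_loss(l_riou: torch.Tensor, l_dist: torch.Tensor,
               gamma: float = 0.5) -> torch.Tensor:
    """Combined objective of claim 1: rotation IoU loss plus gamma-weighted distillation loss."""
    return l_riou + gamma * l_dist
```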
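The depthwise and pointwise convolution split described in the last paragraph (and in claim 3) can be sketched as follows, assuming PyTorch. The 128 channels, the 3 x 1 depthwise kernel and the 1 x 1 pointwise kernel follow claim 3; the padding, batch normalization and SiLU activation are assumptions.

```python
import torch
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    """Depthwise convolution followed by a 1 x 1 pointwise convolution."""

    def __init__(self, in_channels: int = 128, out_channels: int = 128,
                 kernel_size=(3, 1)):
        super().__init__()
        pad = (kernel_size[0] // 2, kernel_size[1] // 2)
        # Depthwise: one filter per input channel (groups = in_channels).
        self.depthwise = nn.Conv2d(in_channels, in_channels, kernel_size,
                                   padding=pad, groups=in_channels, bias=False)
        # Pointwise: 1 x 1 convolution; output channels match the Depthwise
        # layer, as claim 3 requires.
        self.pointwise = nn.Conv2d(in_channels, out_channels, kernel_size=1, bias=False)
        self.bn = nn.BatchNorm2d(out_channels)
        self.act = nn.SiLU()        # SiLU as in YOLOX-style blocks (assumption)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.act(self.bn(self.pointwise(self.depthwise(x))))

# Example: a 128-channel feature map of size 64 x 64 keeps its shape.
x = torch.randn(1, 128, 64, 64)
y = DepthwiseSeparableConv()(x)     # -> torch.Size([1, 128, 64, 64])
```

Splitting a standard convolution this way replaces in x out x k_h x k_w weights with in x k_h x k_w + in x out, which helps keep the student network small enough for a satellite-borne GPU with limited computing resources.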