CN-122024112-A - Knowledge distillation-based lightweight low-altitude intelligent identification method and system

Abstract

The application provides a knowledge distillation-based lightweight low-altitude intelligent identification method and system, and relates to the field of image processing. The method comprises: acquiring low-altitude remote sensing image data; inputting the image data into a lightweight low-altitude intelligent recognition model, which comprises a feature extraction network, a feature pyramid feature fusion network and a target detection task head connected in sequence, and extracting multi-scale feature information of the image through the feature extraction network; performing cross-scale feature fusion based on the multi-scale feature information to obtain an enhanced feature representation; and executing a target detection task according to the enhanced feature representation, locating and classifying targets in the low-altitude remote sensing image, and outputting a target recognition result for the low-altitude scene. The method is applied to knowledge distillation-based lightweight low-altitude intelligent recognition and solves the technical problem in the prior art that model overhead cannot be reduced while recognition accuracy is maintained.

Inventors

SU LIPING
ZHANG YAFEI
XUE BOWEI
MA XIAOLIANG

Assignees

中科星图数字地球合肥有限公司

Dates

Publication Date: 20260512
Application Date: 20260410

Claims (10)

1. A knowledge distillation-based lightweight low-altitude intelligent identification method, characterized by comprising the following steps: acquiring low-altitude remote sensing image data; inputting the low-altitude remote sensing image data into a lightweight low-altitude intelligent recognition model and extracting multi-scale feature information of the image through a feature extraction network, wherein the lightweight low-altitude intelligent recognition model comprises the feature extraction network, a feature pyramid feature fusion network and a target detection task head connected in sequence, and wherein the lightweight low-altitude intelligent recognition model is obtained by constructing a training set based on low-altitude remote sensing image data, performing knowledge distillation training through a heterogeneous knowledge distillation framework constructed between a teacher network and a student network, and pruning and quantizing the student network after the distillation training; performing cross-scale feature fusion based on the multi-scale feature information to obtain an enhanced feature representation; and executing a target detection task according to the enhanced feature representation, locating and identifying targets in the low-altitude remote sensing image data, and outputting a target recognition result for the low-altitude scene.
2. The method of claim 1, wherein the lightweight low-altitude intelligent recognition model is trained by knowledge distillation, comprising: constructing a training set based on low-altitude remote sensing image data; constructing a heterogeneous knowledge distillation framework between a teacher network and a student network, wherein the feature extraction network of the teacher network is a feature extraction network based on a vision Transformer structure, and the student network is used to construct the lightweight low-altitude intelligent recognition model; inputting the training set into the teacher network, extracting multi-scale teacher features, and outputting soft-label supervision information; performing channel dimension matching and spatial resolution alignment on the multi-scale teacher features through a feature projection mapping module to obtain teacher supervision features consistent with the feature space of the student network; inputting the training set into the student network and extracting multi-scale student features; constructing a distillation loss function based on the soft-label supervision information and the teacher supervision features; and performing distillation training on the student network based on the multi-scale student features, the multi-scale teacher features, the teacher supervision features and the loss function to obtain a distillation-enhanced student network.
3. The method of claim 2, wherein the distillation training of the student network further comprises: performing multi-angle rotation transformations on the images of the training set and constructing a rotation-consistency distillation constraint that keeps the feature representation of the student network consistent across differently rotated inputs; determining potential small-target regions according to the teacher supervision features and generating a small-target region mask; during distillation training, spatially weighting the distillation loss given by the loss function with the small-target region mask, so as to strengthen the student network's feature learning for tiny targets in the low-altitude remote sensing image data; and constructing a multi-task learning mechanism from a target classification task, a target localization task and a rotation angle prediction task to jointly optimize the student network.
4. The method of claim 2, wherein the distillation loss function satisfies the following formula: [formula and symbols not preserved in the source text]; wherein the soft-label distillation loss measures the difference between the probability distribution output by the teacher network and the probability distribution predicted by the student network; the feature-alignment distillation loss measures the difference between the k-th scale feature of the student network and the corresponding k-th scale feature of the teacher network; the feature mapping module performs channel dimension matching and spatial resolution alignment on the teacher features; N represents the number of feature layers participating in distillation; the weight coefficient is that of the feature distillation loss; the norm term denotes the squared L2 norm; i is the target category index; S denotes the student network; and T denotes the teacher network.
5. The method of claim 2, wherein after the distillation-enhanced student network is obtained, the method further comprises lightweight processing of the student network, comprising: removing redundant convolution layers and low-contribution feature channels from the student network by structured pruning to obtain a pruned student network; performing model quantization on the pruned student network by applying low-bit quantization to the network weights and activation values to obtain a quantized student network; and performing model format conversion and graph optimization on the quantized student network to obtain the lightweight low-altitude intelligent recognition model.
6. The method according to claim 1, wherein inputting the low-altitude remote sensing image data into the lightweight low-altitude intelligent recognition model and extracting multi-scale feature information of the image through the feature extraction network comprises: inputting the low-altitude remote sensing image data into the feature extraction network and performing layer-by-layer convolutional feature extraction on the image to obtain basic feature maps at different depths; inputting the basic feature maps into the feature pyramid feature fusion network, performing up-sampling and down-sampling on feature maps of different scales, and realizing multi-scale feature fusion through cross-layer connections to obtain multi-scale feature layers, wherein the multi-scale feature layers comprise a high-resolution feature layer, a medium-resolution feature layer and a low-resolution feature layer and represent feature information of targets at different scales; and outputting the multi-scale feature layers as the multi-scale feature information.
7. The method of claim 1, wherein performing cross-scale feature fusion based on the multi-scale feature information to obtain an enhanced feature representation comprises: up-sampling the low-resolution feature layer in the multi-scale feature information, fusing it with the medium-resolution feature layer in the multi-scale feature information, and outputting a first fusion feature; up-sampling the first fusion feature, fusing it with the high-resolution feature layer in the multi-scale feature information, and outputting a second fusion feature; down-sampling the high-resolution feature layer in the multi-scale feature information, fusing it with the medium-resolution feature layer in the multi-scale feature information, and outputting a third fusion feature; down-sampling the third fusion feature, fusing it with the low-resolution feature layer in the multi-scale feature information, and outputting a fourth fusion feature; and performing feature enhancement processing on the second fusion feature and the fourth fusion feature to obtain the enhanced feature representation.
8. The method of claim 7, wherein performing feature enhancement processing on the second fusion feature and the fourth fusion feature to obtain the enhanced feature representation comprises: applying channel attention weighting along the channel dimension of the second fusion feature to obtain a first enhancement feature; applying channel attention weighting along the channel dimension of the fourth fusion feature to obtain a second enhancement feature; concatenating and fusing the first enhancement feature and the second enhancement feature to obtain a fusion feature; and generating a spatial weight based on the importance of each spatial position of the fusion feature, and weighting the fusion feature with the spatial weight to obtain the enhanced feature representation.
9. The method of claim 1, wherein executing the target detection task according to the enhanced feature representation, locating and classifying targets in the low-altitude remote sensing image data, and outputting a target recognition result for the low-altitude scene comprises: inputting the enhanced feature representation into a target detection network and performing feature decoding on the enhanced feature representation to obtain a detection feature map; generating a plurality of candidate target regions based on the detection feature map and extracting the region features corresponding to the candidate target regions; constructing a target classification branch and a target localization branch using the region features; performing category probability prediction on the candidate target regions through the target classification branch and bounding-box position regression on the candidate target regions through the target localization branch to obtain candidate target detection results; extracting contextual features of the low-altitude scene based on the enhanced feature representation; and performing confidence adjustment and redundancy suppression on the candidate target detection results based on the contextual features to obtain the target recognition result.
10. A knowledge distillation-based lightweight low-altitude intelligent recognition system, characterized by comprising data acquisition equipment and electronic equipment; wherein the data acquisition equipment is used for acquiring low-altitude remote sensing image data; and the electronic equipment is used for: inputting the low-altitude remote sensing image data into a feature extraction network of a lightweight low-altitude intelligent recognition model to extract multi-scale feature information of the image, wherein the lightweight low-altitude intelligent recognition model comprises the feature extraction network, a feature pyramid feature fusion network and a target detection task head connected in sequence, and is obtained by constructing a training set based on low-altitude remote sensing image data, performing knowledge distillation training through a heterogeneous knowledge distillation framework constructed between a teacher network and a student network, and pruning and quantizing the student network after the distillation training; performing cross-scale feature fusion based on the multi-scale feature information to obtain an enhanced feature representation; and executing a target detection task according to the enhanced feature representation, locating targets in the low-altitude remote sensing image and recognizing their categories, and outputting a target recognition result for the low-altitude scene.
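The distillation loss described in claim 4 can be illustrated with a minimal Python sketch. Because the formula itself is not preserved in this text, the sketch assumes one common form that is consistent with the symbol definitions: a KL divergence between the teacher and student class distributions, plus a weighted sum over the N distilled scales of the squared L2 distance between each student feature and the projected teacher feature. The KL form, the plain sum over scales, the flat-vector features, the `temperature` parameter and every name below (`softmax`, `soft_label_loss`, `feature_alignment_loss`, `distillation_loss`, `average_pairs`) are assumptions made for illustration, not the claimed formula.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]


def softmax(logits: Sequence[float], temperature: float = 1.0) -> Vector:
    """Turn logits into a probability distribution (temperature is assumed)."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def soft_label_loss(teacher_probs: Sequence[float],
                    student_probs: Sequence[float]) -> float:
    """Assumed form: KL(teacher || student), summed over class index i."""
    eps = 1e-12  # guards against log(0)
    return sum(
        p_t * math.log((p_t + eps) / (p_s + eps))
        for p_t, p_s in zip(teacher_probs, student_probs)
    )


def feature_alignment_loss(student_feats: Sequence[Vector],
                           teacher_feats: Sequence[Vector],
                           project: Callable[[Vector], Vector]) -> float:
    """Sum over scales k of the squared L2 distance between the student
    feature and the projected (dimension-matched) teacher feature."""
    total = 0.0
    for f_s, f_t in zip(student_feats, teacher_feats):
        aligned = project(f_t)
        total += sum((a - b) ** 2 for a, b in zip(f_s, aligned))
    return total


def distillation_loss(teacher_logits: Sequence[float],
                      student_logits: Sequence[float],
                      student_feats: Sequence[Vector],
                      teacher_feats: Sequence[Vector],
                      project: Callable[[Vector], Vector],
                      lam: float = 0.5) -> float:
    """Soft-label loss plus lam times the feature-alignment loss."""
    p_t = softmax(teacher_logits)
    p_s = softmax(student_logits)
    return soft_label_loss(p_t, p_s) + lam * feature_alignment_loss(
        student_feats, teacher_feats, project
    )


def average_pairs(feature: Vector) -> Vector:
    """Hypothetical projection: halve the channel count by averaging
    neighbouring pairs, standing in for the feature mapping module."""
    return [(feature[i] + feature[i + 1]) / 2.0
            for i in range(0, len(feature) - 1, 2)]
```

In this sketch, a student whose logits and features already match the projected teacher yields a loss of zero, and `lam` controls how strongly feature mismatch is penalized relative to the soft-label term. A real implementation would operate on multi-channel feature maps and use a learned projection.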

Description

Knowledge distillation-based lightweight low-altitude intelligent identification method and system

Technical Field

The application relates to the field of image processing, and in particular to a knowledge distillation-based lightweight low-altitude intelligent identification method and system.

Background

Compared with traditional satellite remote sensing, low-altitude remote sensing offers low acquisition cost, high spatial resolution and a flexible acquisition period, and can capture finer ground scene information over a smaller area. However, because low-altitude remote sensing images typically exhibit large differences in target size, dense target distribution and pronounced changes in viewing angle, accurately identifying and locating targets in them remains challenging.
In recent years, with the development of deep learning, target detection methods based on convolutional neural networks have been widely applied to target recognition in remote sensing images. Existing methods generally build a multi-layer feature extraction network to represent image features and combine it with a multi-scale feature fusion structure to improve detection of targets at different scales. However, in low-altitude remote sensing scenes, targets are typically small, numerous and set against complex backgrounds, so existing target detection methods still have limitations in practice.
On the one hand, to obtain higher recognition accuracy, existing target detection models usually adopt deeper or more complex network structures, which increases the number of model parameters and the amount of computation, making efficient deployment difficult on resource-constrained unmanned aerial vehicle platforms or edge computing devices. On the other hand, in complex low-altitude scenes, the way information is exchanged between features of different scales strongly affects target detection performance, and existing multi-scale feature fusion methods still have room for improvement in feature expression capability and information utilization efficiency. Therefore, how to reduce model overhead while maintaining recognition accuracy is a technical problem to be solved in the field of low-altitude intelligent perception.

Disclosure of Invention

The application provides a knowledge distillation-based lightweight low-altitude intelligent identification method and system, which solve the technical problem in the prior art that model overhead cannot be reduced while recognition accuracy is maintained.
In order to achieve the above purpose, the application adopts the following technical scheme:
In a first aspect, a knowledge distillation-based lightweight low-altitude intelligent identification method is provided. The method comprises: acquiring low-altitude remote sensing image data; inputting the image data into a lightweight low-altitude intelligent recognition model, which comprises a feature extraction network, a feature pyramid feature fusion network and a target detection task head connected in sequence, and extracting multi-scale feature information of the image through the feature extraction network; performing cross-scale feature fusion based on the multi-scale feature information to obtain an enhanced feature representation; and executing a target detection task according to the enhanced feature representation, locating and classifying targets in the low-altitude remote sensing image, and outputting a target recognition result for the low-altitude scene.
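The cross-scale feature fusion step in the scheme above can be sketched in a few lines of Python, following the top-down and bottom-up paths recited in claim 7. This is a simplified illustration under stated assumptions, not the application's implementation: feature layers are single-channel grids whose sides halve from one level to the next, up-sampling is nearest-neighbour, down-sampling is 2x2 average pooling, and fusion is element-wise addition. The source does not specify any of these operators, and the names `upsample2x`, `downsample2x`, `fuse` and `cross_scale_fusion` are hypothetical.

```python
from typing import List, Tuple

Grid = List[List[float]]


def upsample2x(grid: Grid) -> Grid:
    """Nearest-neighbour up-sampling by a factor of 2 in both directions."""
    out: Grid = []
    for row in grid:
        wide = [value for value in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def downsample2x(grid: Grid) -> Grid:
    """2x2 average pooling with stride 2 (grid sides must be even)."""
    return [
        [
            (grid[r][c] + grid[r][c + 1]
             + grid[r + 1][c] + grid[r + 1][c + 1]) / 4.0
            for c in range(0, len(grid[0]), 2)
        ]
        for r in range(0, len(grid), 2)
    ]


def fuse(a: Grid, b: Grid) -> Grid:
    """Fuse two equally sized maps by element-wise addition (assumed)."""
    return [[x + y for x, y in zip(row_a, row_b)]
            for row_a, row_b in zip(a, b)]


def cross_scale_fusion(high: Grid, mid: Grid, low: Grid) -> Tuple[Grid, Grid]:
    """Return the second and fourth fusion features of the two paths."""
    # Top-down path: low -> medium -> high resolution.
    first = fuse(upsample2x(low), mid)
    second = fuse(upsample2x(first), high)
    # Bottom-up path: high -> medium -> low resolution.
    third = fuse(downsample2x(high), mid)
    fourth = fuse(downsample2x(third), low)
    return second, fourth
```

The second fusion feature keeps the high-resolution grid size and the fourth keeps the low-resolution size; in the claimed method these two outputs are then passed to the feature enhancement step. A practical network would typically replace the addition with learned convolutions over multi-channel tensors.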
With reference to the first aspect, in one possible implementation, the lightweight low-altitude intelligent recognition model is trained by knowledge distillation, which comprises: constructing a training set based on low-altitude remote sensing image data; constructing a heterogeneous knowledge distillation framework between a teacher network and a student network, wherein the teacher network is a feature extraction network based on a vision Transformer structure and the student network is used to construct the lightweight low-altitude intelligent recognition model; inputting the training set into the teacher network, extracting multi-scale teacher features and outputting soft-label supervision information; performing channel dimension matching and spatial resolution alignment on the multi-scale teacher features through a feature projection mapping module to obtain teacher supervision features consistent with the feature space of the student network; inputting the training set into the student network and extracting multi-scale student features; constructing a distillation loss function based on the soft-label supervision information and the teacher supervision features; and performing distillation training on the student networ