CN-119131632-B - Aerial image target detection method based on classification-positioning double-branch interactive distillation
Abstract
The invention discloses an aerial image target detection method based on classification-positioning double-branch interactive distillation. The method uses a pre-trained teacher network, a student network, an interactive distillation region selection module and an adaptive generation distillation module. The interactive distillation region selection module lets the knowledge of the pre-trained teacher network and the detection head of the student network guide each other, so that the optimal distillation regions are selected dynamically. The classification branch of the student network detection head is integrated into the adaptive generation distillation module, which masks the output feature layers of that classification branch so that the output feature layers of the pre-trained teacher network are reconstructed adaptively, further strengthening the learning ability of the student network. Finally, a KL divergence loss is introduced to constrain the distillation regions. The invention realizes interaction between the classification and positioning tasks during distillation between the pre-trained teacher network and the student network, and improves the detection accuracy and speed of the student network while reducing the number of parameters.
Inventors
- WANG JUN
- Ran Yiyang
- LI YULIAN
- SHEN ZHENGWEN
- LI YAMENG
- YANG ZHIYONG
Assignees
- 中国矿业大学
Dates
- Publication Date
- 20260512
- Application Date
- 20240909
Claims (5)
- 1. An aerial image target detection method based on classification-positioning double-branch interactive distillation, characterized by comprising the following steps: Step S1, downloading the aerial image detection dataset DIOR-R, randomly cropping the images into 800 × 800 patches, randomly dividing the uniformly sized patches into a training dataset and a test dataset at a ratio of 7:3, applying data augmentation to the training dataset to form a teacher-student learning network training dataset, and going to step S2; Step S2, pre-training a teacher network on the teacher-student learning network training dataset to obtain a pre-trained teacher network, and going to step S3; Step S3, constructing a teacher-student learning network from the pre-trained teacher network, a student network, an adaptive generation distillation module and an interactive distillation region selection module, and going to step S4; Step S4, inputting the teacher-student learning network training dataset into the teacher-student learning network for training; extracting the multi-scale feature layers and corresponding output predictions of the pre-trained teacher network and of the student network; constructing the interactive distillation region selection module to dynamically select suitable classification and positioning distillation regions for distillation; constructing the adaptive generation distillation module to mask the output feature layers of the classification branch of the student network so as to adaptively reconstruct the output feature layers of the pre-trained teacher network; fixing the parameters of the pre-trained teacher network and updating the parameters of the student network through the overall loss function of the student network, finally obtaining a trained student network; wherein the interactive distillation region selection module is constructed as follows: the interactive distillation region selection module comprises a classification main distillation region, a positioning-assisted classification distillation region and a classification-assisted positioning distillation region; the classification main distillation region and the positioning main distillation region are determined directly by label assignment, that is, by the positions of the positive samples during training; the positioning-assisted classification distillation region uses the positioning branch of the pre-trained teacher network to guide the selection of the classification distillation region of the student network; going to step S5; Step S5, inputting the test dataset into the trained student network for detection, merging the detection results of the patches, and outputting the categories and positions of all targets in the images of the test dataset to obtain the detection accuracy of the trained student network.
- 2. The aerial image target detection method based on classification-positioning double-branch interactive distillation according to claim 1, wherein in step S4 the classification main distillation region and the positioning main distillation region in the interactive distillation region selection module are selected as follows: all output features of the detection heads of the pre-trained teacher network and the student network are extracted, and the intersection-over-union (IoU) matrix between all anchor boxes of layer l and the rotated ground-truth boxes is calculated by the formula [formula], whose terms denote the preset anchor boxes of layer l, the rotated ground-truth boxes, and the degree of overlap between two boxes. A threshold is applied to this IoU matrix, namely the positive-sample threshold set in the base framework; the regions whose IoU exceeds the threshold are the classification main distillation region and the positioning main distillation region that the student network must learn, and each of the two regions is recorded as its own mask. The layer-l loss function of the classification main distillation region is calculated by [formula], and the layer-l loss function of the positioning main distillation region by [formula], whose terms denote the classification prediction of the layer-l detection head of the student network, the classification prediction of the layer-l detection head of the pre-trained teacher network, the regression prediction of the layer-l detection head of the student network, the regression prediction of the layer-l detection head of the pre-trained teacher network, and the KL divergence loss. These losses are calculated for all feature layers; the final loss function of the classification main distillation region is [formula] and the final loss function of the positioning main distillation region is [formula], in which the total number of output feature layers appears.
- 3. The aerial image target detection method based on classification-positioning double-branch interactive distillation according to claim 2, wherein in step S4 the positioning-assisted classification distillation region and the classification-assisted positioning distillation region in the interactive distillation region selection module are selected as follows: for the positioning-assisted classification distillation region, the IoU matrix between the prediction boxes of the layer-l detection head of the pre-trained teacher network and the J ground-truth boxes is first calculated by [formula], in which the output prediction boxes of the layer-l detection head of the pre-trained teacher network appear. For each ground-truth box, the maximum value in the IoU matrix between that ground-truth box and the prediction boxes of the layer-l detection head of the pre-trained teacher network is removed; among the remaining values, the sum of the top k values is calculated and recorded as iou_sum, with k = min(1000, [iou_sum]), where [iou_sum] denotes iou_sum rounded down. k is the number of samples to be distilled for that ground-truth box, and the index values of those samples give the classification region that the student network should learn; parts of this region whose index values coincide with the classification main distillation region are not distilled again. The operation of obtaining the index values is defined as get_index(). The mask of the positioning-assisted classification distillation region of the layer-l detection head is then calculated by [formula], and the corresponding layer-l loss function by [formula]. The loss function of the positioning-assisted classification distillation region is calculated for each layer of detection heads, and the final loss function of the positioning-assisted classification distillation region is [formula], whose terms denote the classification predictions of all layers of detection heads of the student network, the classification predictions of all layers of detection heads of the pre-trained teacher network, and the mask of the positioning-assisted classification distillation region over all layers of detection heads. The classification-assisted positioning distillation region is selected in a similar way to the positioning-assisted classification distillation region: the classification predictions of all layers of detection heads of the pre-trained teacher network are combined with the IoU matrix between all anchor boxes and the rotated ground-truth boxes of all layers to define a classification-weighted IoU matrix, calculated by [formula]. For each ground-truth box, the maximum value in the classification-weighted IoU matrix of the layer-l detection head of the pre-trained teacher network corresponding to that ground-truth box is removed; among the remaining values, the sum of the top k' classification-weighted IoU values is calculated and recorded as iou_sum', with k' = min(1000, [iou_sum']). k' is the number of samples to be distilled for that ground-truth box, and the index values of those samples give the positioning region that the student network should learn. The mask of the classification-assisted positioning distillation region of the layer-l detection head is [formula], and the corresponding layer-l loss function is [formula], in which the classification-weighted IoU matrix of the layer-l detection head of the pre-trained teacher network appears. The loss function of the classification-assisted positioning distillation region is calculated for each layer of detection heads, and the final loss function of the classification-assisted positioning distillation region is [formula], whose terms denote the regression predictions of all layers of detection heads of the student network, the regression predictions of all layers of detection heads of the pre-trained teacher network, and the mask values of the classification-assisted positioning distillation region over all layers of detection heads.
- 4. The aerial image target detection method based on classification-positioning double-branch interactive distillation according to claim 3, wherein in step S4 the adaptive generation distillation module is constructed as follows: the layer-l classification prediction of the pre-trained teacher network and the layer-l classification prediction of the student network are given by [formula] and [formula], defined in terms of the height of the layer-l feature, the width of the layer-l feature, and K, the number of feature channels. A random mask is designed to randomly mask the predictions output by the classification branch of the student network, calculated by [formula], whose terms denote the mask value at a position on the layer-l feature map, the feature value at an arbitrary position of the layer-l feature map, and the set mask threshold: if the feature value at a position is greater than or equal to the mask threshold, the mask value at that position is set to 1; otherwise it is set to 0. The layer-l classification prediction of the student network is multiplied by the mask value at each corresponding position of layer l and fed into a classification restorer consisting of two 3 × 3 convolutions and a ReLU function; the restored classification features are calculated by [formula], giving the classification prediction of layer l of the student network restored after random masking. This is reshaped by [formula], using a dimension transformation function, to obtain the classification prediction probability of layer l of the student network restored after random masking. The restored classification prediction probability is added to the classification main distillation region and the positioning-assisted classification distillation region. The loss function of the classification main distillation region is redefined as [formula], in which the classification prediction probability of layer l of the pre-trained teacher network appears; the loss function of the positioning-assisted classification distillation region is redefined as [formula], whose terms denote the classification prediction probabilities of all layers of the student network restored after random masking and the classification prediction probabilities of all layers of the pre-trained teacher network.
- 5. The aerial image target detection method based on classification-positioning double-branch interactive distillation according to claim 4, wherein in step S4 the overall loss function of the student network is constructed as follows: when training the student network, all parameters of the pre-trained teacher network are kept unchanged and the parameters of the student network are updated through the loss function; the overall loss function L of the student network is [formula], whose terms denote the original detection task loss of the student network, which comprises a classification loss and a regression loss, and the coefficients used to balance the different distillation loss terms.
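The dynamic sample-count rule in claim 3, k = min(1000, [iou_sum]), can be sketched in plain Python for a single ground-truth box. This is only one reading of the claim: the helper name `select_assisted_indices` is hypothetical, and it is assumed that the single largest IoU is dropped first and that iou_sum is summed over at most the 1000 highest remaining values. The claim text does not pin these details down.

```python
import math


def select_assisted_indices(ious, main_indices=frozenset(), cap=1000):
    """Pick dynamic-k sample indices for one ground-truth box.

    ious: IoU of each teacher prediction box with this ground-truth box.
    main_indices: indices already in the main distillation region (skipped).
    cap: upper bound on k (1000 in the claim).
    """
    if not ious:
        return []
    # Drop the single best-matching prediction, as the claim is read here.
    best = max(range(len(ious)), key=lambda i: ious[i])
    rest = [i for i in range(len(ious)) if i != best]
    # Rank the remaining predictions by IoU, highest first.
    rest.sort(key=lambda i: ious[i], reverse=True)
    # ASSUMPTION: iou_sum is taken over at most `cap` top-ranked candidates.
    iou_sum = sum(ious[i] for i in rest[:cap])
    k = min(cap, math.floor(iou_sum))
    # Keep the top-k indices, excluding ones the main region already covers.
    return [i for i in rest[:k] if i not in main_indices]
```

For the IoUs `[0.9, 0.8, 0.7, 0.6, 0.1]`, the largest value is dropped, the remaining values sum to about 2.2, so k = 2 and indices 1 and 2 are selected. If index 1 already belongs to the main region, only index 2 is returned.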
Description
Aerial image target detection method based on classification-positioning double-branch interactive distillation
Technical Field
The invention relates to the technical field of unmanned aerial vehicle aerial images, in particular to an aerial image target detection method based on classification-positioning double-branch interactive distillation.
Background
Aerial image target detection is an important research direction in remote sensing and is widely applied in fields such as military reconnaissance, disaster monitoring and urban planning. Traditional high-precision aerial image target detection models often achieve their accuracy at the cost of a large computation and parameter count, whereas the memory and computing resources of edge computing platforms such as satellites and unmanned aerial vehicles are limited. Therefore, while pursuing high detection accuracy, the model must be lightweight so that the algorithm can be deployed on a mobile platform.
To address these problems, many model lightweighting methods have emerged. Knowledge distillation is a commonly used model compression technique that trains a lightweight student network to approximate the performance of a pre-trained, complex teacher network. Through knowledge distillation, the student network can greatly reduce its demand for computing resources with little performance loss, making it better suited to deployment in practical applications. In the knowledge distillation process, however, not all prediction results of the pre-trained teacher network help the student network learn, and an improperly chosen learning region may reduce the learning effect.
The detection head of a target detector usually consists of two independent, parallel branches: a classification branch and a positioning branch. Because aerial images have high resolution, complex backgrounds and multi-scale targets, this complexity increases the difficulty of the classification and positioning tasks, and the two independent branches become seriously misaligned when understanding and predicting targets. Therefore, if the information of the classification branch and the positioning branch can be cross-fused during knowledge distillation, the student network can better learn the knowledge of the detection head of the pre-trained teacher network.
Disclosure of Invention
The invention provides an aerial image target detection method based on classification-positioning double-branch interactive distillation, which effectively mines the knowledge contained in the detection head, combines interactive learning with adaptive generation distillation, effectively enhances the robustness and detection performance of the model, and reduces the computation of the model.
The technical scheme of the invention is an aerial image target detection method based on classification-positioning double-branch interactive distillation, comprising the following steps:
Step S1, downloading the aerial image detection dataset DIOR-R, randomly cropping the images into 800 × 800 patches, randomly dividing the uniformly sized patches into a training dataset and a test dataset at a ratio of 7:3, applying data augmentation to the training dataset to form a teacher-student learning network training dataset, and going to step S2.
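As a rough illustration of the data preparation in step S1, the sketch below picks a random 800 × 800 crop window and splits a list of patches 7:3. The function names `random_crop_box` and `split_train_test`, the fixed seed, and the rounding of the split point are assumptions made for this example; the source does not specify how the cropping or the split is implemented.

```python
import random


def random_crop_box(width, height, size=800, rng=random):
    """Return (left, top, right, bottom) of a random size x size crop window."""
    if width < size or height < size:
        raise ValueError("image is smaller than the crop size")
    # randint is inclusive on both ends, so the window always fits the image.
    left = rng.randint(0, width - size)
    top = rng.randint(0, height - size)
    return left, top, left + size, top + size


def split_train_test(items, train_ratio=0.7, seed=0):
    """Shuffle items and split them into train/test lists (7:3 by default)."""
    shuffled = list(items)
    # A seeded generator keeps the split reproducible (an assumption here).
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]
```

With ten patches, `split_train_test` returns seven training items and three test items; data augmentation would then be applied to the training list only, as step S1 states.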
Step S2, pre-training the teacher network on the teacher-student learning network training dataset to obtain a pre-trained teacher network, and going to step S3.
Step S3, constructing a teacher-student learning network from the pre-trained teacher network, the student network, the adaptive generation distillation module and the interactive distillation region selection module, and going to step S4.
Step S4, inputting the teacher-student learning network training dataset into the teacher-student learning network for training; extracting the multi-scale feature layers and corresponding output predictions of the pre-trained teacher network and of the student network; constructing the interactive distillation region selection module to dynamically select suitable classification and positioning distillation regions for distillation; constructing the adaptive generation distillation module to mask the output feature layers of the classification branch of the student network so as to adaptively reconstruct the output feature layers of the pre-trained teacher network; fixing the parameters of the pre-trained teacher network and updating the parameters of the student network through the overall loss function of the student network, finally obtaining a trained student network; and going to step S5.
Step S5, input