CN-122023911-A - Small target detection model training method, small target detection method and system based on diffusion model


Abstract

The application discloses a small target detection model training method, a small target detection method, and a small target detection system based on a diffusion model, and relates to the field of computer vision. The training method trains a small target detection model using a high-resolution image and a low-resolution image, where the training comprises forward propagation and back propagation. The small target detection model comprises a backbone network, a super-resolution branch, a target detection branch, and a dense condition module. Back propagation is performed on the small target detection model according to the loss value in order to optimize the model. Because the super-resolution branch and the target detection branch are jointly optimized during training, the target detection branch provides accurate target position information to the super-resolution branch, and the super-resolution branch selectively enhances the features of the target area, which improves the accuracy of small target detection.

Inventors

WANG ZELONG
CHANG YAJUN
WANG YINGYING
LING CHENGYANG
WANG XINGWANG
ZHENG ZHONGHUA
HAN RUI
ZHANG CHI

Assignees

National University of Defense Technology, Chinese People's Liberation Army (中国人民解放军国防科技大学)

Dates

Publication Date: 20260512
Application Date: 20260130

Claims (10)

1. A small target detection model training method based on a diffusion model, characterized by comprising: training a small target detection model using a high-resolution image and a low-resolution image of a visible-light image, the training process comprising forward propagation and back propagation, wherein:
the forward propagation comprises inputting the high-resolution image and the low-resolution image into the small target detection model to obtain a detection result for a small target to be detected in the visible-light image, the small target detection model comprising a backbone network, a super-resolution branch, a target detection branch, and a dense condition module, wherein:
the backbone network is used to perform feature extraction on the low-resolution image to obtain multi-scale features, the multi-scale features serving as the input of the target detection branch and of the dense condition module;
the target detection branch consists of a feature pyramid and a path aggregation network and is used to process the multi-scale features to obtain a predicted confidence, a classification label, and bounding box coordinates, which serve as the detection result defining the small target to be detected in the visible-light image;
the super-resolution branch is used to add noise to the high-resolution image so as to convert it into a pure noise image, and to extract features from the pure noise image with a neural network model to obtain downsampling features and upsampling features;
the dense condition module is used to fuse the multi-scale features, the downsampling features, and the upsampling features to obtain a super-resolution image;
the back propagation comprises calculating a loss value of the small target detection model according to the predicted confidence, the classification label, and the bounding box coordinates, together with the super-resolution image and the super-resolution true value; and
when the number of executions of the training process reaches a preset value, the trained small target detection model is obtained.
2. The method of claim 1, wherein performing feature extraction on the low-resolution image to obtain multi-scale features comprises: sequentially performing multi-stage downsampling operations on the low-resolution image using the backbone network of a YOLOv s model to obtain the multi-scale features, wherein the multi-scale features comprise multi-layer downsampling features, and the downsampling feature of each layer is the feature that the backbone network of the YOLOv s model extracts from the low-resolution image at that layer.
3. The diffusion model-based small target detection model training method of claim 2, wherein processing the multi-scale features to obtain the predicted confidence, the classification label, and the bounding box coordinates comprises: processing the last three layers of the multi-layer downsampling features using the target detection branch formed by the feature pyramid and the path aggregation network to obtain a processing result; inputting the processing result into a detection head to obtain a target confidence, class probabilities, and bounding box offsets; and removing duplicate detections by non-maximum suppression to obtain the predicted confidence, the classification label, and the bounding box coordinates.
4. The diffusion model-based small target detection model training method of claim 1, wherein adding noise to the high-resolution image to convert the high-resolution image into a pure noise image comprises: adding noise conforming to a standard normal distribution to the high-resolution image to obtain a noise image; for time step $t$ from 1 to $T$, defining that the noise image $x_t$ at time $t$ and the noise image $x_{t-1}$ at time $t-1$ satisfy the relation: $x_t = \sqrt{1-\beta_t}\,x_{t-1} + \sqrt{\beta_t}\,\epsilon$; wherein: $\beta_t$ is a fixed constant that becomes larger as the time step $t$ increases; $\epsilon$ represents the noise; defining $\alpha_t = 1 - \beta_t$, the relation is rewritten as: $x_t = \sqrt{\alpha_t}\,x_{t-1} + \sqrt{1-\alpha_t}\,\epsilon$; deriving from the rewritten relation by mathematical induction: $x_t = \sqrt{\bar{\alpha}_t}\,x_0 + \sqrt{1-\bar{\alpha}_t}\,\epsilon$, with $\bar{\alpha}_t = \prod_{i=1}^{t} \alpha_i$; wherein: $x_0$ represents the high-resolution image at the initial time; and increasing the time step $t$ so that it tends to $T$, thereby converting the high-resolution image into a pure noise image.
5. The method for training a small target detection model based on a diffusion model according to claim 1, wherein fusing the multi-scale features, the downsampling features, and the upsampling features to obtain a super-resolution image comprises: acquiring the multi-scale features, wherein the multi-scale features are multi-layer downsampling features obtained by multi-stage feature extraction of the low-resolution image by the backbone network, and the multi-layer downsampling features comprise a high-level semantic feature obtained by the last stage of feature extraction and a plurality of other-layer features obtained by the other stages; performing feature fusion of the high-level semantic feature with each of the other-layer features to obtain enhanced features, and performing an upsampling operation on the enhanced feature with the largest size, wherein each enhanced feature is generated using a convolution, defined by its weight and bias, and an upsampling operation; acquiring the downsampling features and the upsampling features, which are obtained by feature extraction of the pure noise image through the neural network model; concatenating the upsampling features and the downsampling features of the same scale, and modeling the concatenated features using the enhanced features to obtain latent features, wherein each latent feature is generated from the corresponding enhanced feature, downsampling feature, and upsampling feature; and processing the latent features using an implicit feature representation to obtain the super-resolution image.
6. The diffusion model-based small target detection model training method according to claim 1, wherein calculating the loss value of the small target detection model according to the predicted confidence, the classification label, and the bounding box coordinates, together with the super-resolution image and the super-resolution true value, comprises: calculating a loss value of the target detection branch according to the predicted confidence, the classification label, and the bounding box coordinates; calculating a loss value of the super-resolution branch according to the super-resolution image and the super-resolution true value; and calculating the loss value of the small target detection model from the loss value of the target detection branch, the loss value of the super-resolution branch, and an adjustment factor.
7. The method for training a small target detection model based on a diffusion model according to claim 6, wherein the loss value of the target detection branch is calculated from the predicted and true confidences, the predicted and actual classification labels, and the predicted and actual bounding box coordinates, using a cross-entropy loss function, an L1 loss function, and a weight coefficient.
8. The method of claim 7, wherein calculating the loss value of the super-resolution branch from the super-resolution image and the super-resolution true value comprises: creating a binary mask for the visible-light image, wherein the binary mask sets the pixel values in the target area to 1 and the pixel values in the background area to 0; multiplying the binary mask with the high-resolution image to obtain the true value of the super-resolution image; and calculating the loss value of the super-resolution branch according to the true value of the super-resolution image and the super-resolution image, wherein the loss value of the super-resolution branch is a norm computed from the super-resolution image, the binary mask, and the high-resolution image.
9. A small target detection method, characterized in that the small target detection method comprises: acquiring a high-resolution image and a low-resolution image of a target image; and inputting the high-resolution image and the low-resolution image of the target image into a small target detection model based on a diffusion model to output a small target to be detected in the target image; wherein the small target detection model based on the diffusion model is obtained by training according to the diffusion model-based small target detection model training method of any one of claims 1 to 8.
10. A small target detection system, the small target detection system comprising: an image input unit for acquiring a high-resolution image and a low-resolution image of a target image; and a small target detection unit for inputting the high-resolution image and the low-resolution image of the target image into a small target detection model based on a diffusion model to output a small target to be detected in the target image; wherein the small target detection model based on the diffusion model is obtained by training according to the diffusion model-based small target detection model training method of any one of claims 1 to 8.
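Illustrative note on claim 3: the duplicate-removal step named there, non-maximum suppression, can be sketched in a few lines. This is a minimal greedy variant written for illustration only, not the applicant's implementation; the `(x1, y1, x2, y2)` box format, the function names `iou` and `nms`, and the 0.5 overlap threshold are all assumptions.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: visit boxes by descending score, drop those overlapping a kept box."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep box i only if it does not overlap any already kept box too strongly.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


# Two heavily overlapping candidates and one separate candidate (made-up values).
boxes = [(10, 10, 20, 20), (11, 11, 21, 21), (50, 50, 60, 60)]
scores = [0.9, 0.8, 0.7]
kept = nms(boxes, scores)
```

Here the second box overlaps the first with an IoU of about 0.68, so only the higher-scoring of the two survives alongside the separate third box.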
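Illustrative note on claim 4: the noising relation and its closed form can be checked numerically. The sketch below assumes a linear schedule for the per-step constant (the claim only requires that it grow with the time step), 1000 steps, and a flat list of pixel values standing in for the high-resolution image; the function names are hypothetical.

```python
import math
import random


def linear_betas(num_steps=1000, beta_start=1e-4, beta_end=2e-2):
    """Assumed linear schedule: the per-step noise constant grows with the step index."""
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + step * i for i in range(num_steps)]


def cumulative_alphas(betas):
    """Running product of (1 - beta_i), the coefficient used by the closed form."""
    alpha_bars, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def noise_step_by_step(x0, betas, rng):
    """Apply x_t = sqrt(1 - beta_t) * x_{t-1} + sqrt(beta_t) * eps once per step."""
    x = list(x0)
    for beta in betas:
        keep, add = math.sqrt(1.0 - beta), math.sqrt(beta)
        x = [keep * v + add * rng.gauss(0.0, 1.0) for v in x]
    return x


def noise_closed_form(x0, alpha_bar_t, rng):
    """Jump straight to step t: sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps."""
    keep, add = math.sqrt(alpha_bar_t), math.sqrt(1.0 - alpha_bar_t)
    return [keep * v + add * rng.gauss(0.0, 1.0) for v in x0]


def std(values):
    mean = sum(values) / len(values)
    return math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))


rng = random.Random(0)
x0 = [rng.uniform(-1.0, 1.0) for _ in range(1000)]  # stand-in for normalized pixels
betas = linear_betas()
alpha_bars = cumulative_alphas(betas)
x_T_iter = noise_step_by_step(x0, betas, rng)
x_T_closed = noise_closed_form(x0, alpha_bars[-1], rng)
```

Under these assumptions the cumulative coefficient at the last step is close to zero, so both routes leave almost nothing of the original image and produce samples with roughly unit standard deviation, which is what "pure noise image" means in the claim.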
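Illustrative note on claims 6 to 8: the masked super-resolution target and the joint loss can be sketched as follows. Several details are assumptions rather than statements of the claimed formulas: the mask is built from `(x1, y1, x2, y2)` boxes, the norm is taken to be a mean absolute (L1-style) difference, and the adjustment factor is assumed to scale the super-resolution term. All function names are hypothetical.

```python
def box_mask(height, width, boxes):
    """Binary mask: 1 inside any (x1, y1, x2, y2) target box, 0 in the background."""
    mask = [[0.0] * width for _ in range(height)]
    for x1, y1, x2, y2 in boxes:
        for y in range(y1, y2):
            for x in range(x1, x2):
                mask[y][x] = 1.0
    return mask


def masked_target(hr_image, mask):
    """Super-resolution true value: the high-resolution image multiplied by the mask."""
    return [[m * p for m, p in zip(mask_row, hr_row)]
            for mask_row, hr_row in zip(mask, hr_image)]


def sr_loss(sr_image, hr_image, mask):
    """Assumed norm: mean absolute difference between SR output and masked target."""
    target = masked_target(hr_image, mask)
    diffs = [abs(s - t)
             for sr_row, target_row in zip(sr_image, target)
             for s, t in zip(sr_row, target_row)]
    return sum(diffs) / len(diffs)


def total_loss(det_loss, sr_loss_value, adjust=0.1):
    """Assumed combination: detection loss plus adjustment factor times SR loss."""
    return det_loss + adjust * sr_loss_value


# Toy 8x8 image with a single 2x2 target region (made-up values).
hr = [[0.5] * 8 for _ in range(8)]
mask = box_mask(8, 8, [(2, 2, 4, 4)])
perfect_sr = masked_target(hr, mask)
```

With this target, a super-resolution output that reproduces only the target region incurs zero loss, while an output that also reproduces the background is penalized, which matches the stated aim of enhancing the target area specifically.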

Description

Small target detection model training method, small target detection method and system based on diffusion model

Technical Field

The application relates to the field of computer vision, and in particular to a small target detection model training method, a small target detection method, and a small target detection system based on a diffusion model.

Background

Super-resolution technology can directly increase the resolution of a small target and recover its detail information, offering a promising path to improving small target detection performance. However, super-resolution methods based on convolutional neural networks often over-smooth the image and lose high-frequency details, while methods based on generative adversarial networks (GANs) can generate sharper textures but suffer from unstable training and artifacts. In contrast, a diffusion model starts from random noise and reconstructs the image by step-by-step denoising; it has a simple mathematical formulation and stable training, and is expected to overcome the limitations of existing methods.

Diffusion models are one of the mainstream approaches in the current field of deep generative models. Depending on the generation conditions, diffusion models can be divided into two types: unconditional generation and conditional generation. The former focuses on increasing the diversity of the generated samples, while the latter controls the output by introducing condition information so that it meets the expected target. Diffusion models have already been successfully applied in a number of areas of computer vision.
However, because of its high computational cost and low inference speed, a diffusion model is difficult to integrate directly into a super-resolution task to support real-time detection. In addition, image distortion and mode collapse occur when super-resolution technology is used to assist small target detection, which limits the target detection accuracy.

Disclosure of Invention

The application aims to provide a small target detection model training method, a small target detection method, and a small target detection system based on a diffusion model, which can improve the accuracy of small target detection.

In order to achieve the above object, the present application provides the following solutions:

In a first aspect, the present application provides a small target detection model training method based on a diffusion model, comprising: training a small target detection model using a high-resolution image and a low-resolution image of a visible-light image, the training process comprising forward propagation and back propagation, wherein:
the forward propagation comprises inputting the high-resolution image and the low-resolution image into the small target detection model to obtain a detection result for a small target to be detected in the visible-light image, the small target detection model comprising a backbone network, a super-resolution branch, a target detection branch, and a dense condition module, wherein:
the backbone network is used to perform feature extraction on the low-resolution image to obtain multi-scale features, the multi-scale features serving as the input of the target detection branch and of the dense condition module;
the target detection branch consists of a feature pyramid and a path aggregation network and is used to process the multi-scale features to obtain a predicted confidence, a classification label, and bounding box coordinates, which serve as the detection result defining the small target to be detected in the visible-light image;
the super-resolution branch is used to add noise to the high-resolution image so as to convert it into a pure noise image, and to extract features from the pure noise image with a neural network model to obtain downsampling features and upsampling features;
the dense condition module is used to fuse the multi-scale features, the downsampling features, and the upsampling features to obtain a super-resolution image;
the back propagation comprises calculating a loss value of the small target detection model according to the predicted confidence, the classification label, and the bounding box coordinates, together with the super-resolution image and the super-resolution true value; and
when the number of executions of the training process reaches a preset value, the trained small target detection model is obtained.

In a second aspect, the present application provides a small target detection method, the small target detection method comprising: acquiring a high-resolution image and a low-resolutio