CN-121999401-A - Road disease detection method and device under unmanned aerial vehicle visual angle, electronic equipment and program product

Abstract

The application discloses a road disease detection method and device under an unmanned aerial vehicle viewing angle, an electronic device, and a program product. The method is implemented with a trained target detection model. During training, an adaptive noise query generation mechanism dynamically adjusts the initial queries so that the generated noise queries better match the target distribution, which improves the stability of the query representation, the convergence efficiency of the model, and the detection precision. After training is completed, the noise query generation process is removed from the model, so inference complexity does not increase. In addition, a Hungarian matching cost function and a consistency loss function that combine scale awareness, shape consistency, and direction consistency are used during training, which improves matching reliability for small targets and complex disease targets. The backbone network of the model is also provided with a direction-aware strip feature enhancement module, which strengthens the representation of the spatial continuity features of cracks, joints, and strip diseases in both the training process and the application process.

Inventors

WANG PENGFEI
WANG PENG
LIU JIAMEI
WU YINYIN

Assignees

深圳市锐明像素科技有限公司

Dates

Publication Date: 20260508
Application Date: 20260410

Claims (10)

1. A method for detecting road disease under an unmanned aerial vehicle viewing angle, characterized by comprising: inputting an acquired image to be detected into a pre-trained target detection model to obtain a road disease detection result corresponding to the image to be detected, wherein the image to be detected is a road image under the unmanned aerial vehicle viewing angle; the target detection model is obtained by training an initial detection model, wherein the initial detection model comprises a backbone network, a Transformer coding network, a Transformer decoding network, and a detection module, and the backbone network is used for extracting features of an input image to obtain image features; in the training process of the initial detection model, an ADQG module is arranged on the Transformer coding network, the ADQG module is used for generating a noise query related to the content of a training sample according to a real target frame of the training sample in combination with the image features corresponding to the training sample, and the Transformer decoding network is further used for executing a denoising task on the noise query to recover the real target frame corresponding to the noise query, so as to assist the initial detection model in converging; and the target detection model is the converged initial detection model with the ADQG module removed.
2. The road disease detection method of claim 1, wherein the ADQG module is specifically configured to, for each training sample: split the coded image features corresponding to the current training sample into multi-scale two-dimensional features; determine, in the two-dimensional features of each scale, the target content local features corresponding to a real target frame of the current training sample; splice the target content local features of all scales to obtain target content features; and generate a noise query corresponding to the current training sample based on the target content features and the real target frame.
3. The road disease detection method of claim 2, wherein generating the noise query corresponding to the current training sample based on the target content features and the real target frame comprises: generating disturbance information according to the target content features; determining a noise reference frame based on the disturbance information and scale information corresponding to the real target frame; and generating the noise query corresponding to the current training sample based on the target content features and the noise reference frame.
4. The road disease detection method of claim 1, wherein the loss function of the initial detection model comprises a Hungarian matching cost function and a matching consistency loss function, wherein the Hungarian matching cost function introduces, on the basis of a classification cost, a position cost, and an overlap cost, a scale constraint representing target scale differences, a morphological constraint representing target morphological differences, and a direction constraint representing target direction differences, and the matching consistency loss function is used for constraining consistency between the prediction results of different decoding stages.
5. The road disease detection method according to any one of claims 1 to 4, wherein the backbone network is provided with a DSFEM module, and the DSFEM module is used for extracting direction-aware features along two directions of a feature map, respectively, to model spatial continuity features of a strip disease target.
6. The road disease detection method of claim 5, wherein the DSFEM module extracting direction-aware features along two directions of the input features, respectively, to model spatial continuity features of the strip disease target comprises: performing feature transformation on the input features to obtain basic features; performing a strip spatial attention operation on the basic features along each of the two directions to obtain corresponding directional strip enhancement features; performing statistical processing on the strip enhancement features in each direction to obtain corresponding direction weights; weighting each directional strip enhancement feature by its direction weight, and fusing the weighted directional strip enhancement features to obtain a direction fusion feature; fusing the direction fusion feature and the basic features based on preset learnable parameters to obtain a target enhancement feature; and fusing the target enhancement feature with the input features to obtain output features corresponding to the input features.
7. The road disease detection method of claim 6, wherein performing the strip spatial attention operation on the basic features along the two directions to obtain the corresponding directional strip enhancement features comprises, in each direction: performing global average pooling on the basic features to obtain a pooling result; sequentially performing a convolution operation, an attention operation, and a reshaping operation on the pooling result to obtain strip weights; unfolding the basic features with a sliding window along the current direction to obtain a plurality of local strips in the current direction; weighting the reshaped local strips in the current direction by the respective strip weights to obtain local strip enhancement features in the current direction; and summing all the local strip enhancement features and reshaping the summation result to obtain the strip enhancement feature of the current direction.
8. A road disease detection device under an unmanned aerial vehicle viewing angle, characterized by comprising: a detection module, used for inputting an acquired image to be detected into a pre-trained target detection model to obtain a road disease detection result corresponding to the image to be detected, wherein the image to be detected is a road image under the unmanned aerial vehicle viewing angle; the target detection model is obtained by training an initial detection model, wherein the initial detection model comprises a backbone network, a Transformer coding network, a Transformer decoding network, and a detection module, and the backbone network is used for extracting features of an input image to obtain image features; in the training process of the initial detection model, an ADQG module is arranged on the Transformer coding network, the ADQG module is used for generating a noise query related to the content of a training sample according to a real target frame of the training sample in combination with the image features corresponding to the training sample, and the Transformer decoding network is further used for executing a denoising task on the noise query to recover the real target frame corresponding to the noise query, so as to assist the initial detection model in converging; and the target detection model is the converged initial detection model with the ADQG module removed.
9. An electronic device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the road disease detection method under the unmanned aerial vehicle viewing angle according to any one of claims 1 to 7.
10. A computer program product comprising a computer program, characterized in that the computer program, when executed by a processor, implements the method for road disease detection at the unmanned aerial vehicle viewing angle according to any one of claims 1 to 7.
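
The matching cost of claim 4 can be illustrated with a minimal sketch. The claims do not give formulas, so everything below is an assumption made for illustration: boxes are axis-aligned (cx, cy, w, h) tuples, the scale term is a log-area ratio, the morphological term is a log aspect-ratio ratio, the direction term is the difference of the box diagonal angles, and all terms share a single hypothetical weight `w_extra`. A brute-force search over assignments stands in for the Hungarian algorithm, which is only practical for tiny examples.

```python
import itertools
import math

EPS = 1e-9  # guards against division by zero and log(0)

def iou(a, b):
    """IoU of two axis-aligned boxes given as (cx, cy, w, h)."""
    iw = min(a[0] + a[2] / 2, b[0] + b[2] / 2) - max(a[0] - a[2] / 2, b[0] - b[2] / 2)
    ih = min(a[1] + a[3] / 2, b[1] + b[3] / 2) - max(a[1] - a[3] / 2, b[1] - b[3] / 2)
    inter = max(0.0, iw) * max(0.0, ih)
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / (union + EPS)

def pair_cost(pred_box, pred_probs, gt_box, gt_label, w_extra=1.0):
    """Cost of assigning one prediction to one real target (all terms are assumed forms)."""
    pw, ph = pred_box[2] + EPS, pred_box[3] + EPS
    gw, gh = gt_box[2] + EPS, gt_box[3] + EPS
    cls_cost = 1.0 - pred_probs[gt_label]                         # classification cost
    pos_cost = sum(abs(p - g) for p, g in zip(pred_box, gt_box))  # position cost (L1)
    overlap_cost = 1.0 - iou(pred_box, gt_box)                    # overlap cost
    scale_cost = abs(math.log((pw * ph) / (gw * gh)))             # scale constraint
    shape_cost = abs(math.log((pw / ph) / (gw / gh)))             # morphological constraint
    direction_cost = abs(math.atan2(ph, pw) - math.atan2(gh, gw)) # direction constraint
    extra = scale_cost + shape_cost + direction_cost
    return cls_cost + pos_cost + overlap_cost + w_extra * extra

def match(preds, gts):
    """Return (pred_index, gt_index) pairs with minimal total cost.

    Brute force over assignments; assumes len(preds) >= len(gts).
    """
    cost = [[pair_cost(pb, pp, gb, gl) for gb, gl in gts] for pb, pp in preds]
    best = min(
        itertools.permutations(range(len(preds)), len(gts)),
        key=lambda perm: sum(cost[p][g] for g, p in enumerate(perm)),
    )
    return [(p, g) for g, p in enumerate(best)]

# Two real targets: a small square defect and a long horizontal crack.
gts = [((0.30, 0.30, 0.10, 0.10), 0), ((0.60, 0.70, 0.40, 0.05), 1)]
preds = [
    ((0.60, 0.70, 0.38, 0.05), [0.1, 0.9]),  # horizontal strip near the crack
    ((0.31, 0.30, 0.10, 0.11), [0.8, 0.2]),  # near the square defect
    ((0.60, 0.70, 0.05, 0.40), [0.1, 0.9]),  # same center as the crack, wrong orientation
]
matches = match(preds, gts)
```

In this toy setup the third prediction shares its center and area with the crack, so only the morphological and direction terms (plus the overlap cost) separate it from the first prediction. A production matcher would normally use a proper assignment solver instead of enumeration.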

Description

Road disease detection method and device under unmanned aerial vehicle visual angle, electronic equipment and program product

Technical Field

The application belongs to the technical field of image processing, and particularly relates to a road disease detection method under an unmanned aerial vehicle viewing angle, a road disease detection device under the unmanned aerial vehicle viewing angle, an electronic device, and a computer program product.

Background

With the wide application of unmanned aerial vehicle platforms in urban inspection and road maintenance, automatic detection of road diseases from unmanned aerial vehicle aerial images has gradually become an important technical means in intelligent traffic and fine-grained urban management. Compared with the traditional vehicle-mounted detection mode, an unmanned aerial vehicle offers a flexible viewing angle, wide coverage, and low deployment cost, and can quickly patrol scenes such as urban roads, park roads, and expressways without affecting traffic operation.

However, road disease targets in unmanned aerial vehicle aerial images generally show large scale differences, a discrete distribution, and a high proportion of small targets, so existing detection models have difficulty detecting the different disease targets accurately and stably. In particular, in query-based target detection frameworks, noise queries are generated by adding random noise to the positions of real target frames to assist model convergence; in complex scenes, however, this denoising training scheme brings only limited improvement in model convergence efficiency and detection precision.

Disclosure of Invention

The application provides a road disease detection method under an unmanned aerial vehicle viewing angle, a road disease detection device under the unmanned aerial vehicle viewing angle, an electronic device, and a computer program product.

In a first aspect, the present application provides a method for detecting road disease under an unmanned aerial vehicle viewing angle, including: inputting an acquired image to be detected into a pre-trained target detection model to obtain a road disease detection result corresponding to the image to be detected, wherein the image to be detected is a road image under the unmanned aerial vehicle viewing angle; the target detection model is obtained by training an initial detection model, wherein the initial detection model comprises a backbone network, a Transformer coding network, a Transformer decoding network, and a detection module, and the backbone network is used for extracting features of an input image to obtain image features; in the training process of the initial detection model, an ADQG module is arranged on the Transformer coding network, the ADQG module is used for generating a noise query related to the content of a training sample according to the real target frame of the training sample in combination with the image features corresponding to the training sample, and the Transformer decoding network is further used for executing a denoising task on the noise query to recover the real target frame corresponding to the noise query, so as to assist the initial detection model in converging; and the target detection model is the converged initial detection model with the ADQG module removed.
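
The content-related noise query described above can be sketched as follows. This is only an illustration under stated assumptions, not the application's implementation: feature maps are nested lists indexed as `fmap[row][col][channel]`, boxes are normalized (cx, cy, w, h) tuples, the hypothetical projection weights `proj` stand in for learned parameters, and `max_ratio` is an assumed cap that ties the disturbance to the size of the real target frame.

```python
import math

def pooled_box_feature(fmap, box):
    """Average the channel vectors of the cells covered by a normalized (cx, cy, w, h) box."""
    rows, cols = len(fmap), len(fmap[0])
    cx, cy, w, h = box
    x0 = min(cols - 1, max(0, int((cx - w / 2) * cols)))
    x1 = min(cols, max(x0 + 1, math.ceil((cx + w / 2) * cols)))
    y0 = min(rows - 1, max(0, int((cy - h / 2) * rows)))
    y1 = min(rows, max(y0 + 1, math.ceil((cy + h / 2) * rows)))
    cells = [fmap[y][x] for y in range(y0, y1) for x in range(x0, x1)]
    return [sum(c[k] for c in cells) / len(cells) for k in range(len(cells[0]))]

def content_feature(multi_scale_maps, box):
    """Concatenate the pooled box features from every scale."""
    feat = []
    for fmap in multi_scale_maps:
        feat.extend(pooled_box_feature(fmap, box))
    return feat

def clamp01(v):
    return min(1.0, max(0.0, v))

def noisy_reference_box(box, content, proj, max_ratio=0.4):
    """Perturb a real target frame with a content-dependent, size-scaled disturbance."""
    cx, cy, w, h = box
    # Disturbance in (-1, 1) per coordinate, driven by the content feature.
    d = [math.tanh(sum(wk * ck for wk, ck in zip(row, content))) for row in proj]
    # Shifts are proportional to the frame's own size, so small targets move less.
    return (
        clamp01(cx + d[0] * max_ratio * w / 2),
        clamp01(cy + d[1] * max_ratio * h / 2),
        clamp01(w * (1 + d[2] * max_ratio)),
        clamp01(h * (1 + d[3] * max_ratio)),
    )

# Two toy scales: a 4x4 map and a 2x2 map, each with 2 channels.
maps = [
    [[[float(x + y), 1.0] for x in range(4)] for y in range(4)],
    [[[0.5, -0.5] for _ in range(2)] for _ in range(2)],
]
gt_box = (0.5, 0.5, 0.2, 0.1)
proj = [[0.1, 0, 0, 0], [0, -0.2, 0, 0], [0, 0, 0.5, 0], [0, 0, 0, 0.5]]  # hypothetical weights
content = content_feature(maps, gt_box)
noise_box = noisy_reference_box(gt_box, content, proj)
noise_query = (content, noise_box)  # content part plus noisy reference frame
```

Because the disturbance depends on the pooled features rather than on a random draw alone, two targets with the same frame but different image content receive different noise queries. None of this runs at inference time, since the module is removed after training.
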

Optionally, the ADQG module is specifically configured to, for each training sample: split the coded image features corresponding to the current training sample into multi-scale two-dimensional features; determine, in the two-dimensional features of each scale, the target content local features corresponding to a real target frame of the current training sample; splice the target content local features of all scales to obtain target content features; and generate a noise query corresponding to the current training sample based on the target content features and the real target frame.

Optionally, generating the noise query corresponding to the current training sample based on the target content features and the real target frame includes: generating disturbance information according to the target content features; determining a noise reference frame based on the disturbance information and scale information corresponding to the real target frame; and generating the noise query corresponding to the current training sample based on the target content features and the noise reference frame.

Optionally, the loss function of the initial detection model includes a Hungarian matching cost function and a matching consistency loss function, wherein the Hungarian matching cost function introduces a scale constraint for representing a target scale difference, a morphology constraint for representing a target morphology difference