CN-121999207-A - D-FINE algorithm-based complex scene road target detection method


Abstract

The application discloses a method for detecting road targets in complex scenes based on the D-FINE algorithm, and belongs to the technical field of computer vision target detection. The method comprises: in response to receiving two scale features output by a backbone network of an improved D-FINE model, performing channel projection on each of the two scale features in a coding network of the improved D-FINE model, and inputting them respectively into two MS-GKT modules in the coding network to obtain corresponding enhancement features, wherein the features are extracted from an image to be detected in a complex road scene, and each MS-GKT module comprises a GKT branching module, a local convolution branching module and a channel attention gating module; obtaining, based on the enhancement features, the output features of the coding network respectively corresponding to the enhancement features; and inputting the output features into a decoding network of the improved D-FINE model to obtain a target detection result output by a detection head of the decoding network. The method improves the accuracy and real-time performance of road target detection in complex scenes.

Inventors

LUO QIANG
GAN SHENG
LIU JINXIN
DENG JIFAN
SONG WEIZHEN
LI ZHIBIN
HU LIANG

Assignees

Jiangxi Science and Technology Normal University

Dates

Publication Date: 20260508
Application Date: 20260407

Claims (10)

1. A method for detecting road targets in complex scenes based on the D-FINE algorithm, characterized by comprising the following steps: in response to receiving two scale features output by a backbone network of an improved D-FINE model, performing channel projection on each of the two scale features in a coding network of the improved D-FINE model, and inputting them respectively into two MS-GKT modules in the coding network to obtain corresponding enhancement features, wherein the features are extracted from an image to be detected in a complex road scene, and each MS-GKT module comprises a GKT branching module, a local convolution branching module and a channel attention gating module; obtaining, based on the enhancement features, the output features of the coding network respectively corresponding to the enhancement features; and inputting the output features into a decoding network of the improved D-FINE model to obtain a target detection result output by a detection head of the decoding network.
2. The method for detecting the road target of the complex scene based on the D-FINE algorithm according to claim 1, wherein the GKT branching module is formed by circularly connecting N GroupKANChannelBlock modules with a shape transformation module, the shape transformation module is used for deforming the feature, and the GroupKANChannelBlock module is used for carrying out channel nonlinearity and space modeling on the deformed feature, wherein N is a positive integer.
3. The method for detecting the road target in the complex scene based on the D-FINE algorithm according to claim 2, wherein each GroupKANChannelBlock module comprises G KANLINEAR modules and a splicing layer for concatenating the outputs of the G KANLINEAR modules along the channel dimension, wherein each KANLINEAR corresponds to the features of one group of channels of the deformed feature and is used for constructing a nonlinear relation among the channels in the module based on KAN, and G is a positive integer.
4. The method for detecting a road object in a complex scene based on the D-FINE algorithm according to claim 2, wherein the shape transformation module is PWDWConv.
5. The method for detecting the road target of the complex scene based on the D-FINE algorithm according to claim 1, wherein the local convolution branching module comprises a DWConv module and a 1×1 convolution module, the DWConv module is used for capturing a local structure from the projected features, and the 1×1 convolution module is used for performing channel mapping.
6. The method for detecting the road targets of the complex scene based on the D-FINE algorithm according to claim 1, wherein the channel attention gating module comprises a GAP module, a C-MLP module and a Sigmoid gating module, wherein the GAP module is used for obtaining image-level channel descriptions based on projected features, the C-MLP module is used for generating channel weights of the GKT branching module and the local convolution branching module based on the image-level channel descriptions, and the Sigmoid gating module is used for splitting the channel weights into a first weight corresponding to the GKT branching module and a second weight corresponding to the local convolution branching module.
7. The method for detecting a complex scene road target based on a D-FINE algorithm according to claim 6, wherein the first weight is multiplied element-by-element with the output of the GKT branching module, the second weight is multiplied element-by-element with the output of the local convolution branching module, and the two resulting products are residually connected with the projected feature to obtain the enhancement feature.
8. The method for detecting the road target in the complex scene based on the D-FINE algorithm according to claim 1, wherein the two scale features comprise a large-scale feature and a small-scale feature, and obtaining, based on the enhancement features, the output features of the coding network respectively corresponding to the enhancement features comprises: performing vector encoding, based on the coding network, on the enhancement feature corresponding to the small-scale feature to obtain an intermediate vector; and performing feature fusion, based on the FPN of the coding network, on the enhancement feature corresponding to the large-scale feature and the intermediate vector to obtain the output features of the coding network respectively corresponding to the large-scale feature and the small-scale feature.
9. A computer device comprising a processor, a memory and a program or instructions stored on the memory and executable on the processor, which when executed by the processor, implement the steps of the D-FINE algorithm based complex scene road object detection method according to any one of claims 1-8.
10. A readable storage medium, wherein a program or instructions is stored on the readable storage medium, which when executed by a processor, implements the steps of the complex scene road object detection method based on D-FINE algorithm as claimed in any one of claims 1 to 8.
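
By way of non-limiting illustration, the grouped channel mapping recited in claims 2 and 3 might be sketched as follows in Python. The sketch operates on the channel vector of a single spatial position and omits the shape transformation module. The Gaussian basis functions, the grid `GRID`, and the helper names `basis`, `kan_linear`, `group_kan_channel_block` and `constant_coefs` are assumptions introduced for illustration only; the claims do not specify the basis functions or the internal structure of KANLINEAR.

```python
import math

# Assumed basis: fixed Gaussian bumps on a small grid. This is a stand-in for
# the spline bases often used in KAN layers; the claims do not specify one.
GRID = [-1.0, 0.0, 1.0]

def basis(x):
    """Evaluate every assumed basis function at the scalar x."""
    return [math.exp(-(x - g) ** 2) for g in GRID]

def kan_linear(x, coef):
    """KAN-style layer: out[j] = sum_i phi_ij(x[i]), where each edge function
    phi_ij is a learnable combination of the fixed basis functions.
    coef[j][i][k] weights basis k on the edge from input i to output j."""
    feats = [basis(v) for v in x]
    return [
        sum(c * b for edge, f in zip(row, feats) for c, b in zip(edge, f))
        for row in coef
    ]

def group_kan_channel_block(x, group_coefs):
    """Split the channel vector into G equal groups, apply one kan_linear per
    group, and concatenate the group outputs along the channel axis."""
    g = len(group_coefs)
    if len(x) % g != 0:
        raise ValueError("channel count must be divisible by the number of groups")
    size = len(x) // g
    out = []
    for idx, coef in enumerate(group_coefs):
        out.extend(kan_linear(x[idx * size:(idx + 1) * size], coef))
    return out

def constant_coefs(n_out, n_in, value):
    """Hypothetical helper: coefficient tensor filled with one value."""
    return [[[value] * len(GRID) for _ in range(n_in)] for _ in range(n_out)]

# Usage: 4 channels split into G = 2 groups of 2 channels each.
channels = [0.0, 0.0, 1.0, 1.0]
coefs = [constant_coefs(2, 2, 1.0), constant_coefs(2, 2, 0.0)]
mapped = group_kan_channel_block(channels, coefs)
```

Because each group has its own coefficients, the number of learnable edge functions grows with the square of the group size rather than of the full channel count, which is one plausible motivation for the grouping.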

Description

D-FINE algorithm-based complex scene road target detection method

Technical Field
The application belongs to the technical field of computer vision target detection, and particularly relates to a method for detecting a road target in a complex scene based on a D-FINE algorithm.

Background
With the rapid development of automatic driving technology and intelligent traffic systems, target detection in complex road scenes has become a key research direction in the field of computer vision. The YOLO series, as a representative of single-stage target detection algorithms, is widely used in road target detection because of its high detection speed. However, YOLO-based detection methods still have certain limitations; in particular, in scenes with dense targets and severe occlusion, the performance of existing methods still needs to be improved. The Transformer architecture has problems such as high computational complexity, large training-data requirements and insufficient small-target detection performance, which limit its application in real-time road target detection. Therefore, how to achieve real-time detection of road targets in complex scenes with a Transformer-architecture model is a problem to be solved.

Disclosure of Invention
The embodiment of the application aims to provide a method for detecting a road target in a complex scene based on a D-FINE algorithm, which can solve the problem of how to improve the accuracy and the real-time performance of road target detection in complex scenes.
In order to solve the above technical problem, the application is realized as follows: In a first aspect, an embodiment of the present application provides a method for detecting a road target in a complex scene based on a D-FINE algorithm, where the method includes: in response to receiving two scale features output by a backbone network of an improved D-FINE model, performing channel projection on each of the two scale features in a coding network of the improved D-FINE model, and inputting them respectively into two MS-GKT modules in the coding network to obtain corresponding enhancement features, wherein the features are extracted from an image to be detected in a complex road scene, and each MS-GKT module comprises a GKT branching module, a local convolution branching module and a channel attention gating module; obtaining, based on the enhancement features, the output features of the coding network respectively corresponding to the enhancement features; and inputting the output features into a decoding network of the improved D-FINE model to obtain a target detection result output by a detection head of the decoding network.
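
By way of non-limiting illustration, the channel attention gating and residual fusion performed inside an MS-GKT module (as further recited in claims 6 and 7) might be sketched as follows in Python. The outputs of the GKT branch and the local convolution branch are taken as given inputs. The two-layer form of the C-MLP, the ReLU activation, the ordering of sigmoid before splitting, and the names `global_avg_pool`, `linear`, `channel_gates` and `fuse` are assumptions introduced for illustration only and are not specified by the application.

```python
import math

def global_avg_pool(x):
    """GAP: x is a list of C channels, each an H x W nested list.
    Returns one image-level descriptor value per channel."""
    return [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in x]

def linear(v, weight, bias):
    """Dense layer: weight is a list of rows, one row per output unit."""
    return [sum(w * a for w, a in zip(row, v)) + b for row, b in zip(weight, bias)]

def channel_gates(x, w1, b1, w2, b2):
    """GAP -> assumed two-layer channel MLP -> sigmoid -> split into two gates.
    The MLP emits 2C values: the first C gate the GKT branch, the last C gate
    the local convolution branch."""
    desc = global_avg_pool(x)
    hidden = [max(0.0, h) for h in linear(desc, w1, b1)]  # ReLU is an assumption
    logits = linear(hidden, w2, b2)
    gates = [1.0 / (1.0 + math.exp(-z)) for z in logits]
    c = len(x)
    return gates[:c], gates[c:]

def fuse(x, gkt_out, local_out, gate_gkt, gate_local):
    """Residual fusion per channel and per pixel:
    x + gate_gkt * gkt_out + gate_local * local_out."""
    return [
        [[xv + ga * gv + gl * lv for xv, gv, lv in zip(xr, gr, lr)]
         for xr, gr, lr in zip(xc, gc, lc)]
        for xc, gc, lc, ga, gl in zip(x, gkt_out, local_out, gate_gkt, gate_local)
    ]
```

Since the gates are derived from a globally pooled descriptor, each channel receives a single scalar weight per branch for the whole image, so the balance between the two branches is set at the image level rather than per pixel.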
In a second aspect, an embodiment of the present application provides a device for detecting a complex scene road target based on a D-FINE algorithm, the device including: a feature enhancement module, used for, in response to receiving two scale features output by a backbone network of an improved D-FINE model, performing channel projection on each of the two scale features in a coding network of the improved D-FINE model, and inputting them respectively into two MS-GKT modules in the coding network to obtain corresponding enhancement features, wherein the features are extracted from an image to be detected in a complex road scene, and each MS-GKT module comprises a GKT branching module, a local convolution branching module and a channel attention gating module; a coding module, used for obtaining, based on the enhancement features, the output features of the coding network respectively corresponding to the enhancement features; and a target detection module, used for inputting the output features into the decoding network of the improved D-FINE model to obtain a target detection result output by a detection head of the decoding network.
In a third aspect, embodiments of the present application provide a computer device comprising a processor, a memory and a program or instructions stored on the memory and executable on the processor, the program or instructions, when executed by the processor, implementing the steps of the method as described in the first aspect.
In a fourth aspect, embodiments of the present application provide a computer readable storage medium having stored thereon a program or instructions which, when executed by a processor, perform the steps of the method according to the first aspect.
In a fifth aspect, embodiments of the present application also provide a computer program product comprising a computer program which, when executed by a processor, implements the steps of the method according to the first aspect. In