CN-121999469-A - Traffic sign detection method, device and storage medium


Abstract

The application discloses a traffic sign detection method, device and storage medium. The traffic sign detection device acquires a traffic scene image; extracts features from the traffic scene image through a backbone network in a trained detection model to obtain a multi-scale basic feature map; performs frequency-domain feature adaptive enhancement on the multi-scale basic feature map through a preset neck network in the trained detection model, using brightness information of the traffic scene image, to obtain enhanced features; and performs traffic sign detection on the enhanced features through a detection head in the trained detection model to obtain a traffic sign detection result. With this scheme, the method adapts to environmental changes, strengthens key features of traffic signs from a frequency-domain perspective, overcomes the detection performance bottleneck of current detection methods in complex real scenes, and effectively improves the robustness and precision of traffic sign detection.

Inventors

ZHAO BIN
LIU YAJIE
FANG PENGFEI
WANG CHUAN

Assignees

陕西物流集团产业研究院有限公司

Dates

Publication Date: 20260508
Application Date: 20260409

Claims (10)

1. A traffic sign detection method, the method comprising: acquiring a traffic scene image; extracting features of the traffic scene image through a backbone network in a trained detection model to obtain a multi-scale basic feature map; performing frequency-domain feature adaptive enhancement on the multi-scale basic feature map through a preset neck network in the trained detection model and brightness information of the traffic scene image to obtain enhanced features, wherein the frequency-domain feature adaptive enhancement denotes a process of differentially enhancing a low-frequency region and a high-frequency region of an amplitude spectrum corresponding to the multi-scale basic feature map by using a weight vector determined by the brightness information, so as to determine the enhanced features based on the enhanced amplitude spectrum; and performing traffic sign detection on the enhanced features through a detection head in the trained detection model to obtain a traffic sign detection result.
2. The traffic sign detection method of claim 1, wherein the preset neck network comprises a preset frequency-domain adaptive attention module; and the performing frequency-domain feature adaptive enhancement on the multi-scale basic feature map through the preset neck network in the trained detection model to obtain the enhanced features comprises: performing a fast Fourier transform on the multi-scale basic feature map through the preset frequency-domain adaptive attention module to obtain a complex frequency spectrum corresponding to the multi-scale basic feature map; performing global brightness estimation on the traffic scene image to obtain the brightness information; and performing frequency-domain feature adaptive enhancement on the multi-scale basic feature map based on the complex frequency spectrum and the brightness information to obtain the enhanced features.
3. The traffic sign detection method according to claim 2, wherein the performing frequency-domain feature adaptive enhancement on the multi-scale basic feature map based on the complex frequency spectrum and the brightness information to obtain the enhanced features comprises: separating the complex frequency spectrum into an amplitude spectrum and a phase spectrum, wherein the amplitude spectrum represents the energy intensity of each frequency component in the multi-scale basic feature map; performing nonlinear mapping on the brightness information to obtain a two-dimensional dynamic weight vector, wherein the two-dimensional dynamic weight vector comprises a low-frequency enhancement weight and a high-frequency enhancement weight, the low-frequency enhancement weight represents the enhancement intensity of low-frequency components in the amplitude spectrum, and the high-frequency enhancement weight represents the enhancement intensity of high-frequency components in the amplitude spectrum; determining an enhanced amplitude spectrum based on the two-dimensional dynamic weight vector and the amplitude spectrum; determining an enhanced feature map corresponding to the multi-scale basic feature map based on the enhanced amplitude spectrum and the phase spectrum; and determining the enhanced features based on the enhanced feature map.
4. The traffic sign detection method of claim 3, wherein the determining an enhanced amplitude spectrum based on the two-dimensional dynamic weight vector and the amplitude spectrum comprises: dividing the amplitude spectrum into a central low-frequency region and a peripheral high-frequency region according to a preset frequency radius threshold; determining a binary mask corresponding to the central low-frequency region and a binary mask corresponding to the peripheral high-frequency region, respectively; multiplying the low-frequency enhancement weight by the binary mask corresponding to the central low-frequency region, and multiplying the high-frequency enhancement weight by the binary mask corresponding to the peripheral high-frequency region, to obtain two weighted masks; and adding the two weighted masks to obtain a fusion mask, and multiplying the fusion mask element by element with the amplitude spectrum to obtain the enhanced amplitude spectrum.
5. The traffic sign detection method according to claim 4, wherein the determining an enhanced feature map corresponding to the multi-scale basic feature map based on the enhanced amplitude spectrum and the phase spectrum comprises: determining a first frequency-domain feature from the enhanced amplitude spectrum and the phase spectrum; performing an inverse fast Fourier transform on the first frequency-domain feature to obtain a complex-valued feature result; and taking the real part of the complex-valued feature result to obtain the enhanced feature map.
6. The traffic sign detection method of claim 5, wherein the determining the enhanced features based on the enhanced feature map comprises: applying a residual connection between the enhanced feature map and the multi-scale basic feature map to obtain a fused intermediate feature map; assigning an attention weight to each feature channel in the fused intermediate feature map; and multiplying the attention weights element by element with the fused intermediate feature map to obtain the enhanced features.
7. The traffic sign detection method according to claim 1, wherein the extracting features of the traffic scene image through the backbone network in the trained detection model to obtain a multi-scale basic feature map comprises: performing convolution and downsampling at different levels on the traffic scene image through the backbone network to obtain low-level basic visual features, mid-level structural features and high-level semantic features of the traffic scene image, wherein the low-level basic visual features represent edge, texture and color information of the traffic scene image, the mid-level structural features represent shape and local contour information of a target in the traffic scene image, and the high-level semantic features represent category and attribute information of the target in the traffic scene image; generating feature maps of different spatial sizes from the low-level basic visual features, the mid-level structural features and the high-level semantic features, respectively; and determining the multi-scale basic feature map from the feature maps of different spatial sizes.
8. The traffic sign detection method according to claim 1, wherein the performing traffic sign detection on the enhanced features through the detection head in the trained detection model to obtain a traffic sign detection result comprises: performing bounding box regression on the enhanced features through the detection head to obtain bounding boxes of all predicted traffic sign targets in the traffic scene image; performing category prediction on the enhanced features within each bounding box to obtain a plurality of traffic sign category candidate results; calculating a confidence for each traffic sign category candidate result, and determining the maximum of the resulting confidences as a target confidence; and determining the traffic sign detection result based on the target confidence and the bounding box corresponding to the target confidence, wherein the traffic sign detection result comprises the bounding box of a target traffic sign, a category label, and the target confidence corresponding to the category label.
9. A traffic sign detection device comprising a processor and a memory storing instructions executable by the processor, wherein the instructions, when executed by the processor, implement the traffic sign detection method according to any one of claims 1 to 8.
10. A computer-readable storage medium having stored thereon a computer program, wherein the computer program, when executed by a processor, implements the traffic sign detection method according to any one of claims 1 to 8.
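
The enhancement chain recited in claims 2 to 6 can be illustrated with a minimal NumPy sketch. It is an illustration under stated assumptions, not the applicant's implementation. The function names (`brightness_to_weights`, `radial_masks`, `enhance`) and the parameters `w_map`, `b_map` and `radius` are hypothetical. The claims do not specify how brightness is estimated, what the nonlinear mapping is, or how channel attention weights are computed; the sketch assumes the mean pixel value, an affine map followed by `1 + sigmoid`, and a sigmoid over the global average of each channel.

```python
import numpy as np


def brightness_to_weights(image, w_map, b_map):
    """Map global brightness to (low, high) enhancement weights.

    Assumption: brightness is the mean pixel value, and the nonlinear
    mapping is an affine map followed by 1 + sigmoid.
    """
    brightness = float(image.mean())
    logits = w_map * brightness + b_map          # shape (2,)
    return 1.0 + 1.0 / (1.0 + np.exp(-logits))   # each weight lies in (1, 2)


def radial_masks(height, width, radius):
    """Binary masks for the central low-frequency disc and its complement."""
    yy, xx = np.mgrid[0:height, 0:width]
    dist = np.hypot(yy - height // 2, xx - width // 2)
    low = (dist <= radius).astype(np.float64)
    return low, 1.0 - low


def enhance(feature, image, radius, w_map, b_map):
    """Frequency-domain adaptive enhancement of one (C, H, W) feature map."""
    axes = (-2, -1)
    # FFT over the spatial axes, shifted so low frequencies sit at the center.
    spectrum = np.fft.fftshift(np.fft.fft2(feature, axes=axes), axes=axes)
    amplitude, phase = np.abs(spectrum), np.angle(spectrum)

    # Fusion mask: weighted low-frequency mask plus weighted high-frequency mask.
    w_low, w_high = brightness_to_weights(image, w_map, b_map)
    low, high = radial_masks(feature.shape[-2], feature.shape[-1], radius)
    enhanced_amplitude = amplitude * (w_low * low + w_high * high)

    # Recombine with the unchanged phase, invert, and keep the real part.
    recombined = enhanced_amplitude * np.exp(1j * phase)
    spatial = np.fft.ifft2(np.fft.ifftshift(recombined, axes=axes), axes=axes).real

    # Residual connection, then channel attention (assumed form: sigmoid of
    # the global average of each channel).
    fused = spatial + feature
    channel_weights = 1.0 / (1.0 + np.exp(-fused.mean(axis=axes, keepdims=True)))
    return fused * channel_weights
```

Because the backbone produces feature maps at several spatial sizes, a function like `enhance` would be called once per scale, with the same brightness-derived weights and a radius threshold chosen for each map size.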

Description

Traffic sign detection method, device and storage medium

Technical Field

The application relates to the technical field of computer vision, and in particular to a traffic sign detection method, a traffic sign detection device and a storage medium.

Background

With the rapid development of artificial intelligence and the Internet of Things, intelligent traffic systems and automatic driving technologies have become core research directions in the modern traffic field. Traffic sign detection is a key link in vehicle environment perception, and its accuracy directly determines driving safety and decision reliability. Accurate, real-time traffic sign detection is therefore important for raising the intelligence level of automatic driving, optimizing traffic management efficiency and reducing the traffic accident rate.

Existing traffic sign detection methods rely on hand-crafted features such as color segmentation, shape matching and edge detection, usually combined with classifiers such as a support vector machine (Support Vector Machine, SVM) or K nearest neighbors (K-Nearest Neighbors, KNN) to complete recognition. In real traffic scenes, however, factors such as drastic illumination changes, severe weather such as rain, fog and snow, and fouling or occlusion of traffic signs seriously interfere with the extraction of hand-crafted features, so detection performance drops sharply. These methods are also highly sensitive to changes in viewing angle and scale, generalize poorly, and struggle to meet the strict real-time and stability requirements of practical applications.
Some deep-learning-based target detection algorithms, especially the YOLO series of single-stage detectors, have become the mainstream scheme for traffic sign detection because of their efficient end-to-end inference. Current YOLO models nevertheless have significant shortcomings in real traffic scenes. First, complex illumination (backlight and night) and weather conditions weaken the edge and texture information of traffic signs; existing models mostly adopt a spatial-domain attention mechanism, which is extremely sensitive to feature intensity disturbances caused by illumination and cannot effectively distinguish and strengthen key discriminative features. Second, distant, small-scale traffic signs are easily submerged by background noise in the feature pyramid, so the miss rate is high. Many current improvements focus on lightweight network architectures or adjustments to the loss function, and fail to address these problems at the fundamental level of feature representation.

Disclosure of Invention

The application provides a traffic sign detection method, a device and a storage medium, which can adapt to environmental changes, strengthen key features of traffic signs from a frequency-domain perspective, overcome the detection performance bottleneck of current detection methods in complex real scenes, and effectively improve the robustness and precision of traffic sign detection.
In order to achieve the above object, the present application provides the following technical solutions:

In a first aspect, an embodiment of the present application provides a traffic sign detection method, including: acquiring a traffic scene image; extracting features of the traffic scene image through a backbone network in a trained detection model to obtain a multi-scale basic feature map; performing frequency-domain feature adaptive enhancement on the multi-scale basic feature map through a preset neck network in the trained detection model and brightness information of the traffic scene image to obtain enhanced features, wherein the frequency-domain feature adaptive enhancement denotes a process of differentially enhancing a low-frequency region and a high-frequency region of an amplitude spectrum corresponding to the multi-scale basic feature map by using a weight vector determined by the brightness information, so as to determine the enhanced features based on the enhanced amplitude spectrum; and performing traffic sign detection on the enhanced features through a detection head in the trained detection model to obtain a traffic sign detection result.

In some embodiments of the application, the preset neck network comprises a preset frequency-domain adaptive attention module; and performing frequency-domain feature adaptive enhancement on the multi-scale basic feature map through the preset neck network in the trained detection model to obtain the enhanced features comprises: performing a fast Fourier transform on the multi-scale basic feature map through the preset frequency-domain adaptive attention module to obtain a complex frequency spectrum corresponding to the multi-scale b