
CN-116958936-B - Traffic sign detection and identification method based on multi-information attention fusion network


Abstract

The invention provides a traffic sign detection and identification method based on a multi-information attention fusion network, which designs a MIAF-Net model that balances detection precision and speed. The method comprises the following steps. First, the MIAF-Net model is constructed and initialized, namely a backbone network built from a Conv module and the designed FCSP module, a neck network composed of an FPN feature aggregation network, a PAN feature aggregation network and a foreground perception attention module FPA, and a multi-scale information fusion detection head MIFH. Second, the backbone network extracts features from the input image, the neck network fuses the multi-scale features output by the backbone network, and the multi-scale information fusion detection head MIFH decouples the fused features; the extraction, fusion and decoupling are carried out according to set rules. Third, the MIAF-Net hyperparameters are initialized, the model is trained with a road traffic sign training set, road traffic signs are detected and identified with the trained model, and the results are output.

Inventors

  • ZHAO YIFAN
  • WANG CHANGHONG
  • ZHONG JIAPENG
  • LI YUANWEI
  • CAI CHANG

Assignees

  • Harbin Institute of Technology (Anshan) Industrial Technology Research Institute

Dates

Publication Date
2026-05-08
Application Date
2023-07-31

Claims (8)

  1. A traffic sign detection and identification method based on a multi-information attention fusion network, characterized in that the method adopts a MIAF-Net model to detect and identify street-level traffic signs while balancing detection precision and speed, and specifically comprises the following steps: Step one, constructing and initializing the MIAF-Net model, wherein the MIAF-Net model comprises a backbone network built from a Conv module and a designed FCSP module, a neck network consisting of an FPN feature aggregation network, a PAN feature aggregation network and a foreground perception attention module FPA, and a detection head MIFH for multi-scale information fusion; Step two, adopting the backbone network to extract features from the input image, adopting the neck network to fuse the multi-scale features output by the backbone network, adopting the detection head MIFH for multi-scale information fusion to decouple the fused features, and carrying out the extraction, fusion and decoupling according to set rules; Step three, initializing the MIAF-Net hyperparameters, training the MIAF-Net model with a road traffic sign training set, detecting and identifying road traffic signs with the trained MIAF-Net model, and outputting the results; In step one, the neck network composed of the FPN feature aggregation network, the PAN feature aggregation network and the foreground perception attention module FPA fuses the semantic and position information in features of different scales, and has a two-stage, top-down and bottom-up feature fusion model, comprising: 1) a top-down first-stage feature fusion model combining the FPN feature aggregation network and the foreground perception attention module FPA; 2) a bottom-up second-stage feature fusion model combining the PAN feature aggregation network and the foreground perception attention module FPA; In step two, extracting, fusing and decoupling the features according to the set rules comprises: 1) inputting the input image into the backbone network and extracting its features through the backbone network, wherein the backbone network has five branches of feature output, namely the C1, C2, C3, C4 and C5 branches; 2) sending the outputs of the C2 to C5 branches of the backbone network to the first-stage feature fusion model, which comprises four layers, wherein the first to third layers fuse by cascade residuals and the foreground perception attention module FPA, and the fourth layer outputs through the foreground perception attention module FPA only, obtaining the first-stage fusion features; 3) sending the output of the first-stage feature fusion model to the second-stage feature fusion model, which comprises four layers, wherein the first layer outputs through the foreground perception attention module FPA only, and the second to fourth layers fuse by cascade residuals and the foreground perception attention module FPA, obtaining the second-stage fusion features; 4) taking the second-stage fusion features as input, decoupling the features with the detection head for multi-scale information fusion, and outputting the category and position of the traffic sign in the image.
  2. The traffic sign detection and identification method based on the multi-information attention fusion network according to claim 1, wherein in step one, the backbone network constructed from a Conv module and an FCSP module is used to extract multi-scale image features and outputs features on a plurality of branches, respectively providing semantic and position information at a plurality of different scales, wherein the branches comprise a C1 branch, a C2 branch, a C3 branch, a C4 branch and a C5 branch, whose feature sizes are respectively 1/2, 1/4, 1/8, 1/16 and 1/32 of the input image size.
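The five branch strides in claim 2 amount to a simple size calculation. The sketch below is illustrative only (the function name and the 640×640 example input are assumptions, not from the patent):

```python
def branch_feature_sizes(height, width):
    """Map an input image size to the C1..C5 branch feature-map sizes.

    Claim 2 states the branch features are 1/2, 1/4, 1/8, 1/16 and 1/32
    of the input image size, i.e. strides 2, 4, 8, 16 and 32.
    """
    strides = {"C1": 2, "C2": 4, "C3": 8, "C4": 16, "C5": 32}
    return {name: (height // s, width // s) for name, s in strides.items()}

# Example: a 640x640 input image
sizes = branch_feature_sizes(640, 640)
```

For a 640×640 input this yields C1 at 320×320 down to C5 at 20×20, which is why C5 carries the strongest semantics and C1/C2 the finest position information.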
  3. The traffic sign detection and identification method based on the multi-information attention fusion network according to claim 1, wherein in step one, the designed FCSP module is composed of a Conv module and a Faster BottleNeck.
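The claim does not detail the Faster BottleNeck's internals. A common "faster" bottleneck design (e.g. the partial convolution of FasterNet-style blocks) filters only a fraction of the channels and passes the rest through untouched, cutting FLOPs. The NumPy sketch below shows only that channel-partial idea, with a 3×3 box filter standing in for a learned convolution; all names and the 1/4 ratio are assumptions:

```python
import numpy as np

def partial_conv3x3(x, ratio=0.25):
    """Channel-partial 3x3 filtering on a (C, H, W) feature map.

    Only the first `ratio` of the channels are filtered (here with a
    3x3 box filter as a stand-in for a learned conv); the remaining
    channels pass through unchanged, which is the source of the speedup.
    """
    c = x.shape[0]
    cp = max(1, int(c * ratio))          # channels to process
    head, tail = x[:cp], x[cp:]
    padded = np.pad(head, ((0, 0), (1, 1), (1, 1)), mode="edge")
    out = np.zeros_like(head)
    for dy in range(3):                  # accumulate the 3x3 window
        for dx in range(3):
            out += padded[:, dy:dy + head.shape[1], dx:dx + head.shape[2]]
    out /= 9.0
    return np.concatenate([out, tail], axis=0)

x = np.random.rand(8, 16, 16)
y = partial_conv3x3(x)                   # same shape; 6 of 8 channels untouched
```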
  4. The method for detecting and identifying traffic signs based on the multi-information attention fusion network according to claim 1, wherein in step one, the multi-scale information fusion detection head MIFH fuses semantic and position information of a plurality of scales, provides features with different preferences for detection and identification, and outputs the detection and identification results.
  5. The traffic sign detection and identification method based on a multi-information attention fusion network according to claim 1, wherein the outputs of branches C2 to C5 of the backbone network are sent to the first-stage feature fusion model according to the set rule, the first to third layers fuse by cascade residuals and the foreground perception attention module FPA, and the fourth layer outputs through the foreground perception attention module FPA only, obtaining the first-stage fusion features as follows: 1) the C5 branch feature and the C4 branch feature are introduced as input features into the first-layer feature fusion model, and the first-stage fusion feature F5 is generated after learning by the foreground perception attention module FPA, with a feature size of 1/16 of the input image size; 2) the C3 branch feature and the F5 feature are introduced as input features into the second-layer feature fusion model, and the first-stage fusion feature F4 is generated after learning by the foreground perception attention module FPA, with a feature size of 1/8 of the input image size; 3) the C2 branch feature and the F4 feature are introduced as input features into the third-layer feature fusion model, and the first-stage fusion feature F3 is generated after learning by the foreground perception attention module FPA, with a feature size of 1/4 of the input image size; 4) the F3 feature is introduced as an input feature into the foreground perception attention module FPA for learning, generating the first-stage fusion feature F2, with a feature size of 1/2 of the input image size.
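The top-down wiring of claim 5 can be sketched with NumPy. The patent does not specify the FPA internals, the cascade-residual form, or the upsampling method, so the sketch uses an identity stub for FPA, element-wise addition for fusion, and nearest-neighbour upsampling; only the layer order and the output sizes (1/16, 1/8, 1/4, 1/2 of the input) come from the claim:

```python
import numpy as np

def upsample2x(x):
    """Nearest-neighbour 2x upsampling of a (C, H, W) feature map."""
    return x.repeat(2, axis=1).repeat(2, axis=2)

def fpa(x):
    """Identity stand-in for the foreground perception attention
    module FPA (its internals are not given in the claim)."""
    return x

def top_down_fusion(c2, c3, c4, c5):
    """First-stage (FPN-side) fusion: each layer upsamples the deeper
    feature, fuses it with the next branch feature, and applies FPA.
    Addition stands in for the unspecified cascade-residual fusion."""
    f5 = fpa(upsample2x(c5) + c4)   # 1/16 of the input size
    f4 = fpa(upsample2x(f5) + c3)   # 1/8
    f3 = fpa(upsample2x(f4) + c2)   # 1/4
    f2 = fpa(upsample2x(f3))        # 1/2 (fourth layer: FPA only)
    return f2, f3, f4, f5
```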
  6. The traffic sign detection and identification method based on a multi-information attention fusion network according to claim 5, wherein the output of the first-stage feature fusion model is sent to the second-stage feature fusion model according to the set rule, the first layer outputs through the foreground perception attention module FPA only, and the second to fourth layers fuse by cascade residuals and the foreground perception attention module FPA, obtaining the second-stage fusion features as follows: 1) the first-stage fusion feature F2 is introduced as an input feature and, after learning by the foreground perception attention module FPA, the second-stage fusion feature P2 is generated, with a feature map size of 1/2 of the input image size; 2) the first-stage fusion feature F3 and the second-stage fusion feature P2 are introduced as input features into the second-layer feature fusion model, and the second-stage fusion feature P3 is generated after learning by the foreground perception attention module FPA, with a feature map size of 1/4 of the input image size; 3) the first-stage fusion feature F4 and the second-stage fusion feature P3 are introduced as input features into the third-layer feature fusion model, and the second-stage fusion feature P4 is generated after learning by the foreground perception attention module FPA, with a feature map size of 1/8 of the input image size; 4) the first-stage fusion feature F5 and the second-stage fusion feature P4 are introduced as input features into the fourth-layer feature fusion model, and the second-stage fusion feature P5 is generated after learning by the foreground perception attention module FPA, with a feature map size of 1/16 of the input image size.
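The bottom-up wiring of claim 6 mirrors the top-down stage. As before, the FPA stub, addition-as-fusion, and stride-2 subsampling are assumptions standing in for unspecified details; the layer order and output sizes (1/2, 1/4, 1/8, 1/16 of the input) come from the claim:

```python
import numpy as np

def downsample2x(x):
    """2x downsampling of a (C, H, W) map by stride-2 subsampling."""
    return x[:, ::2, ::2]

def fpa(x):
    """Identity stand-in for the FPA module (internals not specified)."""
    return x

def bottom_up_fusion(f2, f3, f4, f5):
    """Second-stage (PAN-side) fusion: the first layer is FPA only;
    each later layer downsamples the shallower fused feature, combines
    it with the next first-stage feature, and applies FPA. Addition
    stands in for the unspecified cascade-residual fusion."""
    p2 = fpa(f2)                        # 1/2 (first layer: FPA only)
    p3 = fpa(downsample2x(p2) + f3)     # 1/4
    p4 = fpa(downsample2x(p3) + f4)     # 1/8
    p5 = fpa(downsample2x(p4) + f5)     # 1/16
    return p2, p3, p4, p5
```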
  7. The method for detecting and identifying traffic signs based on a multi-information attention fusion network according to claim 6, wherein taking the second-stage fusion features as input according to the set rule, decoupling the features with the detection head for multi-scale information fusion, and outputting the categories and positions of the traffic signs in the image comprises the following steps: 1) the second-stage fusion feature P4 is upsampled 2× and then concatenated with the second-stage fusion feature P5 to obtain the feature to be classified, the C5 branch feature is downsampled 2× and then added to the second-stage fusion feature P5 to obtain the feature to be positioned, the two features are decoupled respectively, and the category and position of the H5-layer traffic sign are output; 2) the second-stage fusion feature P3 is upsampled 2× and then concatenated with the second-stage fusion feature P4 to obtain the feature to be classified, the second-stage fusion feature P5 is downsampled 2× and then added to the second-stage fusion feature P4 to obtain the feature to be positioned, the two features are decoupled respectively, and the category and position of the H4-layer traffic sign are output; 3) the second-stage fusion feature P2 is upsampled 2× and then concatenated with the second-stage fusion feature P3 to obtain the feature to be classified, the second-stage fusion feature P4 is downsampled 2× and then added to the second-stage fusion feature P3 to obtain the feature to be positioned, the two features are decoupled respectively, and the category and position of the H3-layer traffic sign are output; 4) the C1 branch feature is concatenated with the second-stage fusion feature P2 to obtain the feature to be classified, the second-stage fusion feature P3 is downsampled 2× and then added to the second-stage fusion feature P2 to obtain the feature to be positioned, the two features are decoupled respectively, and the category and position of the H2-layer traffic sign are output; 5) by setting a confidence threshold and an IoU threshold, detection results from the H2 to H5 output layers that fall below the confidence threshold and IoU threshold are masked out, the remaining H2 to H5 outputs are sorted, the output with the highest confidence and IoU is selected as the final traffic sign detection result, and marked in the picture.
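Step 5 of claim 7 describes standard confidence filtering followed by IoU-based suppression. The sketch below implements that post-processing in plain Python; the function names, detection tuple layout, and the 0.25/0.5 default thresholds are illustrative assumptions:

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter else 0.0

def filter_detections(dets, conf_thresh=0.25, iou_thresh=0.5):
    """Confidence filtering + non-maximum suppression over the pooled
    H2..H5 outputs: discard low-confidence detections, sort the rest by
    confidence, and keep a box only if it does not overlap an already
    kept box above the IoU threshold.

    `dets` is a list of (box, score, cls) tuples.
    """
    dets = sorted((d for d in dets if d[1] >= conf_thresh),
                  key=lambda d: d[1], reverse=True)
    kept = []
    for box, score, cls in dets:
        if all(iou(box, kb) < iou_thresh for kb, _, _ in kept):
            kept.append((box, score, cls))
    return kept
```

Applied to, say, two heavily overlapping candidates for the same sign plus one distant sign, the lower-confidence duplicate is suppressed and the distant sign survives.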
  8. The traffic sign detection and identification method based on the multi-information attention fusion network according to claim 1, wherein in step three, the MIAF-Net model is trained as follows: setting the model hyperparameters, inputting the traffic sign training set into the MIAF-Net model, training the model, detecting and identifying road traffic signs with the trained model, and outputting the results.

Description

Traffic sign detection and identification method based on multi-information attention fusion network

Technical Field

The invention relates to the technical field of navigation and image processing, in particular to a traffic sign detection and identification method based on a multi-information attention fusion network.

Background

The detection and identification of traffic signs is an important technical direction in computer vision and one of the core technologies in the automatic driving field; accurate detection and identification results can provide timely and definite road and safety information for drivers or intelligent vehicles. In outdoor traffic sign detection, the detection effect is often not ideal because traffic signs are varied, occupy a small pixel area, and are easily disturbed by the external environment. Traditional detection methods have complex structures, find it difficult to ensure real-time performance, and suffer severely reduced detection precision when faced with complex field backgrounds; under normal conditions they target only a few specific traffic signs and have poor extensibility and generalization capability. The deep convolutional neural network, as the most advanced feature extraction technology, has greatly promoted the development of the target detection field and also offers new possibilities for traffic sign detection. However, current deep learning network architectures have high computational complexity: although detection and identification precision is improved, detection speed is severely reduced, which is contrary to the high timeliness required in applications. It is therefore necessary to propose an end-to-end traffic sign detection and identification method that balances detection accuracy and speed.
Disclosure of Invention

In order to solve the technical problems in the background art, the invention provides a traffic sign detection and identification method based on a multi-information attention fusion network, which adopts the designed MIAF-Net model, can detect and identify street-level traffic signs from single pictures, continuous frames and videos transmitted by a vehicle-mounted camera, and balances detection precision and speed. In order to achieve the above purpose, the invention is realized by the following technical scheme: A traffic sign detection and identification method based on a multi-information attention fusion network adopts a MIAF-Net model to detect and identify street-level traffic signs while balancing detection precision and speed, and specifically comprises the following steps: Step one, constructing and initializing the MIAF-Net model, wherein the MIAF-Net model comprises a backbone network built from a Conv module and a designed FCSP module, a neck network consisting of an FPN feature aggregation network, a PAN feature aggregation network and a foreground perception attention module FPA, and a detection head MIFH for multi-scale information fusion; Step two, adopting the backbone network to extract features from the input image, adopting the neck network to fuse the multi-scale features output by the backbone network, adopting the detection head MIFH for multi-scale information fusion to decouple the fused features, and carrying out the extraction, fusion and decoupling according to set rules; Step three, initializing the MIAF-Net hyperparameters, training the MIAF-Net model with a road traffic sign training set, detecting and identifying road traffic signs with the trained MIAF-Net model, and outputting the results.
In step one, the backbone network constructed from the Conv module and the FCSP module is used to extract multi-scale image features and has feature outputs on a plurality of branches, providing semantic and position information at different scales, namely the C1, C2, C3, C4 and C5 branches, whose feature sizes are respectively 1/2, 1/4, 1/8, 1/16 and 1/32 of the input image size. Further, in step one, the FCSP module is composed of a Conv module and a Faster BottleNeck. Further, in step one, the neck network composed of the FPN feature aggregation network, the PAN feature aggregation network and the foreground perception attention module FPA fuses the semantic and position information in the features of different scales, and has a two-stage, top-down and bottom-up feature fusion model, comprising: 1) a top-down first-stage feature fusion model combining the FPN feature aggregation network and the foreground perception attention module FPA; 2) a bottom-up second-stage feature fusion model combining