CN-119580219-B - Road traffic sign recognition method based on image semantic understanding

Abstract

The invention provides a road traffic sign recognition method based on image semantic understanding. The method comprises: acquiring images with image acquisition equipment; preprocessing the acquired images to construct a road traffic sign data set; constructing a road traffic sign detection model based on a Blip network and training it on the road traffic sign data set to obtain a trained road traffic sign detection model; and testing the effect of the trained model on the road traffic sign data set. The invention can recognize road traffic signs and generate sentence descriptions of them in terms of color, shape, composition and the like. By identifying lane types from a macroscopic perspective, it avoids dependence on marked signs, so that efficient and accurate traffic sign recognition can be achieved in complex environments, giving the invention high practical value.

Inventors

CHENG XIN
ZHOU JINGMEI
SHANG XUMING
ZHOU ZHOU
SUN ZHENZHONG
ZHANG LICHENG
HAO RURU
ZHAO XIANGMO

Assignees

Chang'an University

Dates

Publication Date: 20260512
Application Date: 20241113

Claims (6)

1. A road traffic sign recognition method based on image semantic understanding, characterized by comprising the following steps: performing image acquisition with image acquisition equipment; preprocessing the acquired images to construct a road traffic sign data set; and constructing a road traffic sign detection model based on a Blip network, and training the model on the road traffic sign data set to obtain a trained road traffic sign detection model; wherein constructing the road traffic sign detection model based on the Blip network specifically comprises: constructing a road traffic sign detection model comprising a visual encoder, a text encoder, a visual text encoder and a visual text decoder, wherein the visual encoder adopts a Vision Transformer structure and a Cross-Attention mechanism, extracts feature information of road signs in images, and serves as one of the joint representations; the text encoder adopts a BERT architecture and extracts text features; the visual text encoder adopts a Cross-Attention mechanism, its attention part adopts a bidirectional Self-Attention mechanism, and an additional [Encode] token is introduced to carry out classification tasks using the image features and the text features; the visual text decoder adopts a Cross-Attention mechanism with Causal-Attention, and an additional [Decode] token and an end token are introduced; the road traffic sign detection model combines, through the Cross-Attention mechanism, the image features extracted by the visual encoder and the text features of the road traffic signs extracted by the visual text decoder to form new representation features containing multiple kinds of feature information about the road traffic signs; and the road traffic sign detection model adopts the Blip downstream task Image Captioning to generate a sentence description of the road traffic sign image, using a language modeling objective function to generate a text description of the given road traffic sign image.
2. The method according to claim 1, characterized in that the image acquisition with the image acquisition equipment specifically comprises: obtaining, in real time, image data of the road scene in front of the vehicle from camera equipment installed on the vehicle, wherein the camera equipment comprises a high-definition camera and a night vision camera, which are switched according to environmental requirements.
3. The method according to claim 1, characterized in that preprocessing the acquired images to construct a road traffic sign data set specifically comprises: obtaining the collected image data and adjusting its image size so that all images have a uniform size; denoising the uniformly sized image data; performing color correction on the denoised image data; performing data enhancement processing on the color-corrected image data; manually labeling the enhanced image data; and compiling the labeled image data into a road traffic sign data set.
4. The method according to claim 3, wherein the uniformly sized image data is denoised by Gaussian filtering or median filtering to remove noise from the image.
5. The method according to claim 4, wherein the color correction of the denoised image data is performed by: obtaining the denoised image data, inspecting the image data, and applying white balance adjustment and color correction to images with uneven illumination.
6. The method according to claim 5, wherein the data enhancement processing of the color-corrected image data specifically comprises: performing image enhancement on the color-corrected image data with an image enhancement model, wherein the image enhancement model comprises an independent feature processing module, a joint feature processing module and a feature fusion module; the independent feature processing module learns five private feature images, the joint feature processing module learns one public feature image, and the feature fusion module forms a unified representation of the private feature images and the public feature image, thereby achieving effective enhancement of the image.
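The two attention patterns named in claim 1 (bidirectional Self-Attention in the visual text encoder, Causal-Attention in the visual text decoder) differ only in which positions each token is allowed to see. The following is a minimal illustrative sketch of that difference, not the patented implementation; the function names `bidirectional_mask`, `causal_mask` and `masked_softmax`, as well as the sample tokens, are hypothetical and chosen only for this example.

```python
import math


def bidirectional_mask(n):
    """Encoder-style mask: every token may attend to every token."""
    return [[True] * n for _ in range(n)]


def causal_mask(n):
    """Decoder-style mask: token i may attend only to positions 0..i."""
    return [[j <= i for j in range(n)] for i in range(n)]


def masked_softmax(scores, mask_row):
    """Softmax over one row of attention scores; masked positions get weight 0.

    Assumes at least one position in mask_row is visible.
    """
    visible = [s for s, keep in zip(scores, mask_row) if keep]
    peak = max(visible)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) if keep else 0.0
            for s, keep in zip(scores, mask_row)]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    # Hypothetical caption tokens, used only to label the rows.
    tokens = ["[Decode]", "red", "circular", "sign"]
    for token, row in zip(tokens, causal_mask(len(tokens))):
        print(token, row)
```

With a causal mask, the weights for a token never include later positions, which is what allows a caption to be generated one token at a time.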

Description

Road traffic sign recognition method based on image semantic understanding

Technical Field

The invention relates to the technical field of computer vision and intelligent traffic, in particular to a road traffic sign recognition method based on image semantic understanding.

Background

With the development of intelligent traffic systems, automatic driving technology has become an important trend in future traffic. Road traffic sign recognition is one of the key links of automatic driving, and provides important data such as traffic rules and indication information for automatic driving vehicles. At present, traffic sign recognition methods based on traditional image recognition technology suffer from problems such as low recognition rates and poor environmental adaptability. Therefore, it is necessary to design a road traffic sign recognition method based on image semantic understanding.

Disclosure of Invention

In order to overcome the defects of the prior art, the invention aims to provide a road traffic sign recognition method based on image semantic understanding.
In order to achieve the above object, the present invention provides the following solution: a road traffic sign recognition method based on image semantic understanding, which comprises the following steps: image acquisition is carried out with image acquisition equipment; the acquired images are preprocessed to construct a road traffic sign data set; a road traffic sign detection model is constructed based on a Blip network and trained on the road traffic sign data set to obtain a trained road traffic sign detection model; and an effect test is performed on the trained road traffic sign detection model based on the road traffic sign data set.
Preferably, the image acquisition with the image acquisition equipment is specifically as follows: image data of the road scene in front of the vehicle is obtained in real time from camera equipment installed on the vehicle, wherein the camera equipment comprises a high-definition camera and a night vision camera, which are switched according to environmental requirements.
Preferably, the collected images are preprocessed to construct a road traffic sign data set, specifically: the collected image data is obtained and its image size is adjusted so that all images have a uniform size; the uniformly sized image data is denoised; color correction is performed on the denoised image data; data enhancement processing is performed on the color-corrected image data; the enhanced image data is manually labeled; and the labeled image data is compiled into a road traffic sign data set.
Preferably, the uniformly sized image data is denoised by Gaussian filtering or median filtering, so that noise in the image is removed.
Preferably, the color correction of the denoised image data is specifically as follows: the denoised image data is obtained and inspected, and white balance adjustment and color correction are applied to images with uneven illumination.
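The denoising and color correction steps described above can be illustrated with a minimal pure-Python sketch. It is written under stated assumptions and is not the patented implementation: the text names median filtering but no window size, so a 3x3 window is assumed; it names white balance adjustment but no algorithm, so the common gray-world method is assumed. The function names `median_filter3` and `gray_world_balance` are hypothetical.

```python
from statistics import median


def median_filter3(img):
    """3x3 median filter on a grayscale image given as a list of rows.

    Border pixels use only the neighbours that exist. For windows with an
    even number of pixels, statistics.median averages the two middle values.
    """
    h, w = len(img), len(img[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            window = [img[yy][xx]
                      for yy in range(max(0, y - 1), min(h, y + 2))
                      for xx in range(max(0, x - 1), min(w, x + 2))]
            row.append(median(window))
        out.append(row)
    return out


def gray_world_balance(pixels):
    """Gray-world white balance on a flat list of (r, g, b) tuples.

    Assumption: gray-world is used here only as an example algorithm.
    Each channel is scaled so that its mean matches the overall mean.
    """
    n = len(pixels)
    means = [sum(p[c] for p in pixels) / n for c in range(3)]
    gray = sum(means) / 3
    gains = [gray / m if m else 1.0 for m in means]
    return [tuple(min(255.0, p[c] * gains[c]) for c in range(3))
            for p in pixels]


if __name__ == "__main__":
    # A flat patch with a single impulse-noise pixel in the centre.
    noisy = [[10, 10, 10], [10, 255, 10], [10, 10, 10]]
    print(median_filter3(noisy))
    # Two pixels with a red cast.
    print(gray_world_balance([(100, 50, 50), (200, 100, 100)]))
```

Median filtering suits isolated impulse noise such as the single bright pixel in the example, while Gaussian filtering, the other option named in the text, smooths noise by weighted averaging instead.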
Preferably, the data enhancement processing of the color-corrected image data is specifically as follows: image enhancement is performed on the color-corrected image data with an image enhancement model, wherein the image enhancement model comprises an independent feature processing module, a joint feature processing module and a feature fusion module; the independent feature processing module learns five private feature images, the joint feature processing module learns one public feature image, and the feature fusion module forms a unified representation of the private feature images and the public feature image, thereby achieving effective enhancement of the image.
Preferably, a road traffic sign detection model is constructed based on a Blip network, specifically: the road traffic sign detection model is constructed, wherein the model comprises a visual encoder, a text encoder, a visual text encoder and a visual text decoder; the visual encoder adopts a Vision Transformer structure and a Cross-Attention mechanism, extracts feature information of road signs in images, and serves as one of the joint representations; the text encoder adopts a BERT architecture and extracts text features; the visual text encoder adopts a Cross-Attention mechanism, its attention part adopts a bidirectional Self-Attention mechanism, and additional [Encode] tokens are introduced to carry out classification tasks using the image features and the text features; the visual text decoder adopts a Cross-Attention mechanism with Causal-Attention, additional [Decode] tokens a