CN-115937591-B - Method and system for improving detection and classification precision of traffic scene targets

Abstract

The invention discloses a method and a system for improving the detection and classification precision of traffic scene targets. The method comprises the following steps: S1, acquiring a traffic video; S2, converting the traffic video into traffic images; S3, identifying the traffic images based on a convolutional neural network to obtain classification results for the traffic images; and S4, issuing an early warning based on the classification results of the traffic images. The method can improve the convergence rate of the model, accelerate model training, and improve the recognition accuracy of the model.

Inventors

MO WANGZHONG
WU JINFENG
SONG TENGFEI
ZHANG QIAOHUAN
CHEN RUISHENG
JIANG DONGQI

Assignees

浙江中控信息产业股份有限公司

Dates

Publication Date: 20260505
Application Date: 20221212

Claims (9)

1. A method for improving the detection and classification precision of traffic scene targets, characterized by comprising the following steps: S1, acquiring a traffic video; S2, converting the traffic video into traffic images; S3, identifying the traffic images based on a convolutional neural network, comprising: inputting the traffic images, calculating anchor boxes and performing data augmentation; downsampling the traffic image five times using a CSP structure; performing feature extraction and fusion on the feature maps obtained by the third, fourth and fifth downsampling, respectively, to obtain feature maps T3, T4 and T5; connecting a GSConv structure after each of T3, T4 and T5 to generate feature maps P3, P4 and P5; upsampling P5 by a factor of 2 and superposing it with P4 along the channel dimension, passing the result through a C3 structure, upsampling the result by a factor of 2 and superposing it with P3 along the channel dimension, and passing the result through a C3 structure to obtain feature M3; downsampling M3 by a factor of 2, superposing it with P4 along the channel dimension, and passing the result through a C3 structure to obtain feature M4; downsampling M4 by a factor of 2, superposing it with P5 along the channel dimension, and passing the result through a C3 structure to obtain feature M5; passing M3, M4 and M5 through a Conv+BN+SiLU convolution block to obtain feature maps Q3, Q4 and Q5, respectively; inputting Q3, Q4 and Q5 into the detection head; and performing prediction on Q3, Q4 and Q5, generating bounding boxes and obtaining the classification result of the traffic image; and S4, issuing an early warning based on the classification result of the traffic image.
2. The method for improving the detection and classification accuracy of traffic scene objects according to claim 1, wherein the traffic image is scaled according to the image size of the convolutional neural network model.
3. The method for improving the detection and classification accuracy of traffic scene targets according to claim 1 or 2, wherein the feature extraction and fusion process of the feature map obtained by the third, fourth and fifth downsampling comprises the step of performing feature extraction and feature fusion by adopting 1x1 convolution.
4. The method for improving the detection and classification precision of traffic scene targets according to claim 1 or 2, wherein the GSConv structure operates as follows: a feature map C1 is input; C1 passes through a Conv+BN+SiLU convolution block to obtain a feature map C21; C21 passes through a further Conv+BN+SiLU convolution block to obtain a feature map C22; C21 and C22 are superposed along the channel dimension; a shuffle operation is applied to rearrange the channels; and the feature map C2 is output.
5. The method for improving the detection and classification accuracy of traffic scene targets according to claim 1, wherein the sample allocation strategy in the convolutional neural network model comprises the following steps: S301, matching anchors with ground truths (GT) and determining the positive-sample anchors of the current feature map; S302, assigning the positive samples of the current feature map to the corresponding grid cells; S303, calculating the regression and classification loss of each positive sample with respect to each GT to obtain a cost matrix and an IoU matrix; S304, sorting by the IoU matrix and selecting the top ten candidate boxes; S305, summing the IoU values of the ten candidate boxes and rounding down to obtain the number k of candidate boxes; S306, selecting the top k candidate boxes according to the cost matrix and removing duplicate candidate boxes.
6. The method for improving the detection and classification accuracy of traffic scene targets according to claim 1, wherein in step S4, if the classification result matches an early-warning category, an early warning for the classification result is issued by text, sound and light.
7. The method according to claim 1, wherein in step S2, one traffic image is extracted from the traffic video every 5 to 20 frames.
8. A system for improving the detection and classification precision of traffic scene targets is suitable for the method for improving the detection and classification precision of traffic scene targets according to any one of claims 1-7, and is characterized by comprising a video acquisition module, wherein the video acquisition module is connected with an operation and maintenance transmission module, the operation and maintenance transmission module is connected with a video storage module, the video storage module is connected with a video processing module and a display module, the video processing module is connected with a target detection module, the target detection module is connected with a target storage module and a target early warning module, and the target storage module is connected with the display module.
9. The system for improving the detection and classification accuracy of traffic scene targets according to claim 8, wherein the video acquisition module is arranged on a support rod where traffic lights are located or on monitoring rods on two sides of a road.
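
The sample allocation strategy of claim 5 (steps S304 to S306) can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the function name `dynamic_k_assign`, the nested-list layout of the matrices, the lower bound of one candidate per ground truth, and the rule that a candidate selected by several ground truths stays with the one of lowest cost are all assumptions made for the example.

```python
import math


def dynamic_k_assign(iou, cost, top_n=10):
    """Assign candidates to ground truths with a per-ground-truth dynamic k.

    iou[g][a] and cost[g][a] hold the IoU and the cost between ground truth g
    and candidate a. Returns a dict mapping candidate index -> ground-truth index.
    """
    num_gt = len(iou)
    num_cand = len(iou[0]) if num_gt else 0
    matches = {}  # candidate index -> (ground-truth index, cost)
    for g in range(num_gt):
        # S304: take the top_n largest IoU values for this ground truth.
        top_ious = sorted(iou[g], reverse=True)[:top_n]
        # S305: k is the floor of their sum.
        # Assumption: k is clamped to at least 1 so every ground truth gets a match.
        k = max(1, math.floor(sum(top_ious)))
        # S306: select the k candidates with the lowest cost.
        chosen = sorted(range(num_cand), key=lambda a: cost[g][a])[:k]
        for a in chosen:
            # Assumption: a candidate chosen by several ground truths is kept
            # only for the ground truth with the lowest cost.
            if a not in matches or cost[g][a] < matches[a][1]:
                matches[a] = (g, cost[g][a])
    return {a: g for a, (g, _) in matches.items()}


# Toy data: 2 ground truths, 4 candidates (values are illustrative only).
iou = [[0.9, 0.8, 0.7, 0.1],
       [0.2, 0.75, 0.6, 0.5]]
cost = [[1.0, 2.0, 3.0, 9.0],
        [8.0, 1.5, 2.5, 3.0]]
assignment = dynamic_k_assign(iou, cost)
print(assignment)  # {0: 0, 1: 1, 2: 1}
```

In the toy data, the IoU sum of each ground truth rounds down to 2, so k = 2 for both. Candidate 1 is selected by both ground truths and is kept for ground truth 1, where its cost is lower.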

Description

Method and system for improving detection and classification precision of traffic scene targets

Technical Field

The invention relates to the technical field of traffic detection, and in particular to a method and a system for improving the detection and classification precision of traffic scene targets.

Background

In recent years, deep-learning-based target detection has been applied ever more widely in traffic scenes. Early work focused mainly on the position of traffic targets, but as the technology becomes more tightly integrated with daily life, increasingly fine-grained categories of traffic targets are required. In the prior art, classification and recognition rely mainly on adding an attention module so that the network refines its intermediate-layer features and classification precision improves. However, adding an attention module increases the complexity and the inference time of the model, which hinders its deployment and use. Traffic scenes in particular place high demands on inference speed, and the model must remain lightweight when deployed on devices such as Jetson. The prior art therefore suffers from long model recognition times.

For example, Chinese patent publication CN113011251A, with an application date of February 3, 2021, discloses a method for identifying pedestrian traffic lights based on the geometric properties of traffic lights. That invention identifies the dynamic state of traffic lights from traffic-light frame images and the recognized traffic-light shapes, so as to provide more accurate guidance for visually impaired people, but it suffers from long model recognition time and low accuracy.

Disclosure of Invention

To address the long recognition time and low accuracy of models in the prior art, the invention provides a method and a system for improving the detection and classification precision of traffic scene targets, which can improve the convergence rate of the model, accelerate model training, and improve the recognition accuracy of the model.

The technical scheme of the invention is as follows. A method for improving the detection and classification precision of traffic scene targets comprises the following steps:

S1, acquiring a traffic video;
S2, converting the traffic video into traffic images;
S3, identifying the traffic images based on a convolutional neural network to obtain a classification result for the traffic images;
S4, issuing an early warning based on the classification result of the traffic images.

In this scheme, the traffic video is acquired by the video acquisition module and converted into traffic images, so that the convolutional neural network can identify and classify the traffic images and produce a classification result, and an early warning is issued based on that result. Classification information can thus be obtained from traffic videos, and the convergence speed and recognition accuracy of the model can be improved.

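
The flow of steps S1 to S4 can be sketched as follows. This is a hedged illustration only: the names `sample_frames` and `run_pipeline`, the `classify` callable standing in for the convolutional neural network, and the `warning_classes` set are hypothetical, and the stride of 10 is an arbitrary value inside the 5 to 20 frame range given in claim 7.

```python
def sample_frames(frames, stride=10):
    """S2: keep one frame out of every `stride` frames, with its index."""
    if stride < 1:
        raise ValueError("stride must be at least 1")
    return [(i, frames[i]) for i in range(0, len(frames), stride)]


def run_pipeline(frames, classify, warning_classes, stride=10):
    """S2 to S4: sample frames, classify each image, collect warnings.

    `classify` is a placeholder for the convolutional neural network; it takes
    one image and returns a class label.
    """
    alerts = []
    for index, image in sample_frames(frames, stride):
        label = classify(image)       # S3: classification result
        if label in warning_classes:  # S4: label matches a warning category
            alerts.append((index, label))
    return alerts


# Toy usage: each "frame" is already its own label, so classify is the identity.
frames = ["car"] * 25 + ["truck"] * 25   # S1: stand-in for an acquired video
alerts = run_pipeline(frames, classify=lambda image: image,
                      warning_classes={"truck"}, stride=10)
print(alerts)  # [(30, 'truck'), (40, 'truck')]
```

With a stride of 10, frames 0, 10, 20, 30 and 40 are sampled, and only the last two fall in the "truck" segment, so two warnings are produced.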
Preferably, S3 comprises the following steps:

S31, inputting the traffic images, calculating anchor boxes and performing data augmentation;
S32, downsampling the traffic image five times using a CSP structure;
S33, performing feature extraction and fusion on the feature maps obtained by the third, fourth and fifth downsampling, respectively, to obtain feature maps T3, T4 and T5;
S34, connecting a GSConv structure after each of T3, T4 and T5, and denoting the generated feature maps P3, P4 and P5, respectively;
S35, upsampling P5 by a factor of 2 and superposing it with P4 along the channel dimension, then further refining the fused features through a C3 structure; upsampling the result by a factor of 2 and superposing it with P3 along the channel dimension, then further refining the fused features through a C3 structure to obtain feature M3; downsampling M3 by a factor of 2 and superposing it with P4 along the channel dimension, then further refining the fused features through a C3 structure to obtain feature M4; downsampling M4 by a factor of 2 and superposing it with P5 along the channel dimension, then further refining the fused features through a C3 structure to obtain feature M5;
S36, passing M3, M4 and M5 through a Conv+BN+SiLU convolution block to obtain feature maps denoted Q3, Q4 and Q5, respectively;
S37, inputting Q3, Q4 and Q5 into the detection head;
S38, performing prediction on Q3, Q4 and Q5, generating bounding boxes and predicting the classification of the traffic image.

In this scheme, the traffic image is downsampled five times, and the feature map obtained through downsampling is processed through the GSCon