CN-121999462-A - Target detection method and system for road side low-quality scene
Abstract
The invention discloses a target detection method and system for road-side low-quality scenes, relating to the field of intelligent parking management. The method comprises the following steps: for event data, a parallel feature extraction network based on image frames and event data is constructed, using the voxel grid representation of the events as input, and a target detector is built by combining multi-scale information through a feature pyramid network. Compared with current road-side low-quality-scene target detection based on image frames alone, the method exploits the advantages of the event camera in low-quality scenes, is more robust when detecting targets in common low-quality images, improves the detection precision of targets such as vehicles and pedestrians in road-side low-quality scenes, and facilitates traffic management in road-side parking scenes.
Inventors
- YAN HAO
- DING LIZHU
Assignees
- Aipark Technology Co., Ltd. (爱泊车科技有限公司)
Dates
- Publication Date: 2026-05-08
- Application Date: 2024-11-08
Claims (10)
- 1. A target detection method for a road-side low-quality scene, characterized by comprising the following steps: constructing an image-event data set from road-side scene data and open-source simulation data; extracting features of the image data and the event data in the image-event data set respectively through a preset parallel feature extraction model; fusing the extracted image and event features, and inputting the fused features into a preset feature pyramid network for multi-scale information extraction to obtain pyramid feature information; and training a target detector according to the pyramid feature information, and detecting targets in the road-side low-quality scene through the trained target detector.
- 2. The target detection method for a road-side low-quality scene according to claim 1, wherein the step of constructing the image-event data set from road-side scene data and open-source simulation data comprises: calculating a transformation matrix between the pictures shot by the two cameras according to the installation positions of the acquisition cameras, and unifying the two cameras to the same viewing angle according to the transformation matrix, so as to obtain image and event sequence pairs in the road-side scene.
- 3. The target detection method for a road-side low-quality scene according to claim 1 or 2, wherein before the step of extracting features of the image data and the event data in the image-event data set through the preset parallel feature extraction model, the method further comprises: representing the event data input and aligning the events with the image data according to the formulas e_i = (x_i, y_i, t_i, p_i), δ_b(a) = max(0, 1 − |a|), V(x, y, t) = Σ_i p_i · δ_b(x − x_i) · δ_b(y − y_i) · δ_b(t − t_i*) with t_i* = (B − 1)(t_i − t_1)/ΔT, and p_rgb = K_rgb · R_event,rgb · K_event^(−1) · p_event with R_event,rgb = R_rgb · R_event^(−1), where x_i, y_i represent the pixel coordinate position, t_i represents the event trigger time, p_i represents the event polarity, p_i = +1 or −1, V(x, y, t) is the voxel grid into which the events within a time window ΔT are converted, δ_b(·) represents the bilinear sampling function, K_rgb and K_event represent the intrinsic parameters of the conventional camera and the event camera, B represents the number of time bins within ΔT, RGB represents the RGB image output by the conventional camera, t_1 represents the trigger time of the first event, R_rgb and R_event represent the rotation matrix parameters of the conventional camera and the event camera respectively, and R_event,rgb represents the rotation matrix parameter for coordinate conversion between the conventional camera and the event camera.
- 4. The target detection method for a road-side low-quality scene according to claim 1, wherein the steps of fusing the extracted image and event features and inputting the fused features into the preset feature pyramid network for multi-scale information extraction to obtain pyramid feature information comprise: processing the extracted image data features and event data features with a preset convolutional neural network to obtain feature maps of different sizes; performing lateral 1×1 convolution channel transformation on the feature maps of different sizes; upsampling the feature maps of different sizes with a nearest-neighbor interpolation upsampling algorithm; and adding, pixel by pixel, the upsampled feature maps to the channel-transformed feature maps of the same size to obtain the pyramid feature information.
- 5. The target detection method for a road-side low-quality scene according to claim 1, wherein before the step of detecting targets in the road-side low-quality scene through the trained target detector, the method further comprises: training and updating the target detector according to the formula L_det = λ_1·L_reg + λ_2·L_cls, where L_reg represents the position regression loss function, L_cls represents the classification loss function, and λ_1 and λ_2 represent weight coefficients.
- 6. A target detection system for a road-side low-quality scene, the system comprising: a construction module, configured to construct an image-event data set from road-side scene data and open-source simulation data; an extraction module, configured to extract features of the image data and the event data in the image-event data set respectively through a preset parallel feature extraction model; a fusion module, configured to fuse the extracted image and event features, and input the fused features into a preset feature pyramid network for multi-scale information extraction to obtain pyramid feature information; and a detection module, configured to train a target detector according to the pyramid feature information, and detect targets in the road-side low-quality scene through the trained target detector.
- 7. The target detection system for a road-side low-quality scene according to claim 6, wherein the construction module is specifically configured to calculate a transformation matrix between the pictures shot by the two cameras according to the installation positions of the acquisition cameras, and unify the two cameras to the same viewing angle according to the transformation matrix, so as to obtain image and event sequence pairs in the road-side scene.
- 8. The target detection system for a road-side low-quality scene according to claim 6 or 7, wherein the extraction module is further configured to represent the event data input and align the events with the image data according to the formulas e_i = (x_i, y_i, t_i, p_i), δ_b(a) = max(0, 1 − |a|), V(x, y, t) = Σ_i p_i · δ_b(x − x_i) · δ_b(y − y_i) · δ_b(t − t_i*) with t_i* = (B − 1)(t_i − t_1)/ΔT, and p_rgb = K_rgb · R_event,rgb · K_event^(−1) · p_event with R_event,rgb = R_rgb · R_event^(−1), where x_i, y_i represent the pixel coordinate position, t_i represents the event trigger time, p_i represents the event polarity, p_i = +1 or −1, V(x, y, t) is the voxel grid into which the events within a time window ΔT are converted, δ_b(·) represents the bilinear sampling function, K_rgb and K_event represent the intrinsic parameters of the conventional camera and the event camera, B represents the number of time bins within ΔT, RGB represents the RGB image output by the conventional camera, t_1 represents the trigger time of the first event, R_rgb and R_event represent the rotation matrix parameters of the conventional camera and the event camera respectively, and R_event,rgb represents the rotation matrix parameter for coordinate conversion between the conventional camera and the event camera.
- 9. The target detection system for a road-side low-quality scene according to claim 6, wherein the fusion module is specifically configured to process the extracted image data features and event data features with a preset convolutional neural network to obtain feature maps of different sizes, perform lateral 1×1 convolution channel transformation on the feature maps of different sizes, upsample the feature maps of different sizes with a nearest-neighbor interpolation upsampling algorithm, and add, pixel by pixel, the upsampled feature maps to the channel-transformed feature maps of the same size to obtain the pyramid feature information.
- 10. The target detection system for a road-side low-quality scene according to claim 6, wherein the detection module is further configured to train and update the target detector according to the formula L_det = λ_1·L_reg + λ_2·L_cls, where L_reg represents the position regression loss function, L_cls represents the classification loss function, and λ_1 and λ_2 represent weight coefficients.
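The multi-scale fusion recited in claims 4 and 9 follows the standard feature-pyramid pattern: lateral 1×1 convolutions to a common channel width, nearest-neighbor upsampling of the coarser map, then pixel-wise addition. The following is a minimal illustrative sketch of that pattern (with made-up tensor shapes and random weights, not the patented implementation):

```python
import numpy as np

def lateral_1x1(feat, w):
    # feat: (C_in, H, W); w: (C_out, C_in). A 1x1 convolution is a
    # per-pixel linear map over the channel dimension.
    c_in, h, wd = feat.shape
    return (w @ feat.reshape(c_in, -1)).reshape(w.shape[0], h, wd)

def upsample_nearest_2x(feat):
    # nearest-neighbor interpolation: repeat each pixel 2x along H and W
    return feat.repeat(2, axis=1).repeat(2, axis=2)

def fpn_merge(coarse, fine, w_coarse, w_fine):
    # transform both maps to the same channel width, upsample the coarser
    # map, then add the two maps pixel by pixel (claims 4 and 9)
    top = upsample_nearest_2x(lateral_1x1(coarse, w_coarse))
    lat = lateral_1x1(fine, w_fine)
    return top + lat

rng = np.random.default_rng(0)
c4 = rng.standard_normal((256, 8, 8))    # coarser backbone stage (assumed shape)
c3 = rng.standard_normal((128, 16, 16))  # finer backbone stage (assumed shape)
w4 = rng.standard_normal((64, 256)) * 0.01
w3 = rng.standard_normal((64, 128)) * 0.01
p3 = fpn_merge(c4, c3, w4, w3)
print(p3.shape)  # (64, 16, 16)
```

In a real detector the 1×1 convolutions would be learned layers of the network; here they are random matrices purely to demonstrate the shape bookkeeping.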
Description
Target detection method and system for road side low-quality scene

Technical Field

The invention relates to the field of intelligent parking management, and in particular to a target detection method and system for road-side low-quality scenes.

Background

In recent years, with social and economic development, people's demand for convenient daily travel has kept growing and the number of automobiles has risen continuously, while the growth in the number of parking berths has lagged far behind the growth in vehicles. Difficult and disorderly parking has therefore become one of the main problems to be solved in urban traffic development in China. At present, traffic management departments everywhere are actively working to resolve the difficult and disorderly parking faced by vehicle owners. In particular, with the development of intelligent technologies in recent years, traffic management departments have made various attempts at building intelligent parking management systems to alleviate the problem.

Intelligent parking management is currently the mainstream approach. It mainly consists of deploying high-position video cameras at the road side to collect data, and then applying visual algorithms such as target detection to detect vehicle targets in road-side parking scenes. However, compared with other spectral bands, the imaging range of visible light is markedly narrower: it is effective only in environments with good illumination and high visibility, and may fail in special environments such as night, rain, and fog. In such low-quality scenes the images captured by the camera are of low quality, vehicle targets have low visibility, and the images contain excessive noise, so the accuracy of target detection tasks in low-quality scenes is low.
In addition, sensing devices such as lidar can improve the accuracy of target detection tasks in low-quality scenes, but lidar is expensive and is currently unsuitable for large-scale installation in road-side scenes.

Disclosure of Invention

To solve the above technical problems, the invention provides a target detection method and system for road-side low-quality scenes, which can address the low accuracy and high implementation cost of target detection in such scenes. To achieve the above object, the invention provides a target detection method for a road-side low-quality scene, the method comprising: constructing an image-event data set from road-side scene data and open-source simulation data; extracting features of the image data and the event data in the image-event data set respectively through a preset parallel feature extraction model; fusing the extracted image and event features, and inputting the fused features into a preset feature pyramid network for multi-scale information extraction to obtain pyramid feature information; and training a target detector according to the pyramid feature information, and detecting targets in the road-side low-quality scene through the trained target detector.

Further, the step of constructing the image-event data set from road-side scene data and open-source simulation data comprises the following steps: calculating a transformation matrix between the pictures shot by the two cameras according to the installation positions of the acquisition cameras, and unifying the two cameras to the same viewing angle according to the transformation matrix, so as to obtain image and event sequence pairs in the road-side scene.
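The view-unification step above applies a 3×3 transformation matrix to map pixel coordinates from one camera's image plane onto the other's. A minimal sketch of applying such a matrix is shown below; the matrix values are purely hypothetical (a small rotation plus translation), since the patent derives the actual matrix from the installation positions of the two cameras:

```python
import numpy as np

def warp_points(H, pts):
    # apply a 3x3 planar transformation matrix to Nx2 pixel coordinates
    pts_h = np.hstack([pts, np.ones((len(pts), 1))])  # homogeneous coordinates
    out = (H @ pts_h.T).T
    return out[:, :2] / out[:, 2:3]                   # dehomogenize

# hypothetical matrix mapping the event camera's view onto the RGB view:
# a 2-degree rotation plus a pixel translation (illustrative values only)
theta = np.deg2rad(2.0)
H = np.array([[np.cos(theta), -np.sin(theta), 12.0],
              [np.sin(theta),  np.cos(theta), -5.0],
              [0.0,            0.0,            1.0]])

corners = np.array([[0.0, 0.0], [639.0, 0.0], [0.0, 479.0], [639.0, 479.0]])
print(warp_points(H, corners).round(1))
```

In practice such a matrix would be estimated once from point correspondences between the two mounted cameras and then used to warp every event frame into the RGB view.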
Further, before the step of extracting features of the image data and the event data in the image-event data set through the preset parallel feature extraction model, the method further includes: representing the event data input and aligning the events with the image data according to the formulas e_i = (x_i, y_i, t_i, p_i), δ_b(a) = max(0, 1 − |a|), V(x, y, t) = Σ_i p_i · δ_b(x − x_i) · δ_b(y − y_i) · δ_b(t − t_i*) with t_i* = (B − 1)(t_i − t_1)/ΔT, and p_rgb = K_rgb · R_event,rgb · K_event^(−1) · p_event with R_event,rgb = R_rgb · R_event^(−1), where x_i, y_i represent the pixel coordinate position, t_i represents the event trigger time, p_i represents the event polarity, p_i = +1 or −1; the events within a certain time window ΔT are converted into a voxel grid V of size B×H×W, where H and W represent the height and width of the event frame and B represents the number of time bins; K_rgb and K_event represent the intrinsic parameters of the conventional camera and the event camera, RGB represents the RGB image output by the conventional camera, t_1 represents the trigger time of the first event, R_rgb and R_event represent the rotation matrix parameters of the conventional camera and the event camera respectively, and R_event,rgb represents the rotation matrix parameter for coordinate conversion between the conventional camera and the event camera. Further, the step of fusing the ex
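The voxel grid construction described above can be sketched as follows. This is an illustrative implementation of the standard event voxel grid with the bilinear kernel δ_b(a) = max(0, 1 − |a|) applied over normalized timestamps; the event values are synthetic and the code is not the patent's implementation:

```python
import numpy as np

def events_to_voxel_grid(events, B, H, W):
    """Convert events (x, y, t, p) within a window of length dT into a
    B x H x W voxel grid, weighting each event's polarity into adjacent
    time bins with the bilinear kernel delta_b(a) = max(0, 1 - |a|)."""
    x, y, t, p = (events[:, i] for i in range(4))
    t1, dT = t.min(), max(t.max() - t.min(), 1e-9)
    t_star = (B - 1) * (t - t1) / dT               # normalized timestamps
    grid = np.zeros((B, H, W))
    for b in range(B):
        w = np.maximum(0.0, 1.0 - np.abs(b - t_star))  # bilinear time weight
        # accumulate signed polarity at each event's pixel location
        np.add.at(grid[b], (y.astype(int), x.astype(int)), p * w)
    return grid

# four synthetic events: (x, y, t, polarity)
ev = np.array([[2, 3, 0.00, +1.0],
               [2, 3, 0.05, -1.0],
               [5, 1, 0.10, +1.0],
               [7, 7, 0.20, +1.0]])
vg = events_to_voxel_grid(ev, B=5, H=8, W=8)
print(vg.shape)  # (5, 8, 8)
```

With B = 5 and a 0.2 s window, the normalized timestamps land exactly on bins 0, 1, 2, and 4, so each event contributes its full polarity to a single bin; events falling between bins would be split across the two neighbors by the bilinear weight.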