CN-121999610-A - Traffic monitoring method, device and medium based on radar-vision fusion and large model
Abstract
The invention discloses a traffic monitoring method, device and medium based on radar-vision fusion and a large model. The method comprises: acquiring radar point cloud data and visual image data; performing target recognition and feature extraction on the visual image data and the radar point cloud data respectively, and tokenizing the results to obtain a plurality of visual tokens and visual targets and a plurality of radar tokens and radar targets; fusing the visual tokens and the radar tokens to obtain a fused feature vector; obtaining the type of a traffic event and a structured target information list from the fused feature vector using a large inference model constructed by the system; retrieving the corresponding inference text template based on a traffic event knowledge graph and the type of the traffic event; and obtaining the data information of each target from the structured target information list to dynamically generate a traffic monitoring inference result. The invention addresses the large image-processing load, low efficiency and related problems of existing traffic event monitoring.
Inventors
- CHEN HONGJUN
- HUO QIFENG
Assignees
- 广东智视云控科技有限公司
Dates
- Publication Date
- 2026-05-08
- Application Date
- 2026-01-27
Claims (10)
- 1. A traffic monitoring method based on radar-vision fusion and a large model, characterized by comprising the following steps: a data acquisition step of determining an area to be monitored according to a monitoring requirement, and synchronously acquiring radar point cloud data and visual image data of the area to be monitored in real time; a tokenization step of performing target recognition and feature extraction on the visual image data to obtain visual feature vectors, tokenizing the visual feature vectors to obtain a plurality of visual tokens and a plurality of visual targets, performing target recognition and feature extraction on the radar point cloud data to obtain radar feature vectors, tokenizing the radar feature vectors to obtain a plurality of radar tokens and a plurality of radar targets, and fusing the plurality of visual tokens and the plurality of radar tokens according to preset rules to obtain a fused feature vector; a model inference step of obtaining the type of a traffic event and a structured target information list from the fused feature vector using a large inference model constructed by the system, wherein the structured target information list comprises a visual target list, a radar target list, association pairs of visual targets and radar targets, data information of each visual target and data information of each radar target; and a reasoning step of calling a corresponding natural language reasoning template based on a traffic event knowledge graph constructed by the system and the type of the traffic event, obtaining the data information of each target from the structured target information list, and dynamically generating a traffic monitoring reasoning result from the data information of each target and the corresponding natural language reasoning template.
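The four claimed steps can be outlined as a minimal Python sketch. Every function name, data value and return format below is a hypothetical stand-in; the claim does not prescribe any concrete implementation:

```python
def acquire(region):
    """Step 1: synchronized radar + camera capture (dummy data stands in
    for real sensor frames)."""
    return {"radar": [(1.0, 0.5, 12.0)], "image": [[0.1, 0.2], [0.3, 0.4]]}

def tokenize(sample):
    """Step 2: per-modality target extraction, tokenization and fusion
    (stubbed; real tokens would be fixed-dimension INT8 vectors)."""
    radar_tokens = [len(sample["radar"])]
    visual_tokens = [len(sample["image"])]
    return radar_tokens + visual_tokens      # stand-in "fused feature vector"

def infer(fused):
    """Step 3: large-model inference -> event type + structured target list."""
    return "stopped_vehicle", [{"id": 0, "speed": 0.0}]

def reason(event_type, targets):
    """Step 4: fill a natural-language reasoning template with target data."""
    template = "Event {e}: {n} target(s), first speed {v} m/s"
    return template.format(e=event_type, n=len(targets), v=targets[0]["speed"])

sample = acquire("zone-1")
event, targets = infer(tokenize(sample))
print(reason(event, targets))
```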
- 2. The traffic monitoring method based on radar-vision fusion and a large model according to claim 1, wherein performing target recognition and feature extraction on the visual image data to obtain visual feature vectors and tokenizing the visual feature vectors to obtain a plurality of visual tokens and a plurality of visual targets specifically comprises: a preprocessing and detection sub-step of preprocessing the visual image data and performing target detection on it with an INT8-quantized target detection network to identify a plurality of visual targets and their data information, wherein the data information of each visual target comprises a target type, a target position and a bounding box; a pooling sub-step of obtaining the candidate box of each visual target and performing an RoI-Align operation on it, so as to pool each candidate box into a visual feature map of fixed size; a feature flattening sub-step of flattening the visual feature map into a multidimensional feature vector; and a tokenization sub-step of splicing the multidimensional feature vector of each visual target with the data information of the corresponding visual target, and compressing the result into an INT8 vector of fixed dimension through an INT8 convolutional projection layer, thereby obtaining the visual token corresponding to each visual target.
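The flatten-splice-quantize path of the tokenization sub-step can be illustrated with NumPy. The projection matrix, feature-map size, metadata fields and the symmetric per-tensor quantizer are all illustrative assumptions; the claim only requires a fixed-dimension INT8 output:

```python
import numpy as np

def quantize_int8(x):
    """Symmetric per-tensor INT8 quantization (one common scheme; the
    patent does not specify the exact quantizer)."""
    scale = max(float(np.max(np.abs(x))), 1e-8) / 127.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

rng = np.random.default_rng(0)
feature_map = rng.standard_normal((7, 7, 64))   # pooled RoI feature map
flat = feature_map.reshape(-1)                  # feature flattening sub-step
proj = rng.standard_normal((flat.size, 128)) / np.sqrt(flat.size)
meta = np.array([2.0, 120.0, 340.0])            # type id, x, y (illustrative)
vec = np.concatenate([flat @ proj, meta])       # splice features + metadata
token, scale = quantize_int8(vec)               # fixed-dimension INT8 token
print(token.dtype, token.shape)                 # int8 (131,)
```

A real system would use a learned INT8 convolutional projection layer rather than a random matrix; the random projection only stands in for the dimensionality reduction.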
- 3. The traffic monitoring method based on radar-vision fusion and a large model according to claim 1, wherein performing target recognition and feature extraction on the radar point cloud data to obtain radar feature vectors and tokenizing the radar feature vectors to obtain a plurality of radar tokens and a plurality of radar targets specifically comprises: a preprocessing sub-step of performing multidimensional fast Fourier transform processing on the radar point cloud data, wherein the dimensions comprise a range dimension, a Doppler dimension, an angle dimension and a height dimension; a clustering sub-step of clustering the points of the radar point cloud data through CFAR detection and DBSCAN clustering to extract each radar target and its parameter data, and obtaining the target type of each radar target with a lightweight classifier, wherein the parameter data of a radar target comprises center coordinates, a velocity vector, size, RCS mean value and point cloud density; and a tokenization sub-step of splicing the target type of each radar target with its parameter data to form a structured feature group, and compressing the structured feature group with INT8 quantization to obtain a radar token.
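The CFAR detection named in the clustering sub-step can be sketched as a 1-D cell-averaging CFAR; the guard/training window sizes and threshold factor below are illustrative, and a production radar would apply CFAR on the 2-D range-Doppler map before DBSCAN clustering:

```python
import numpy as np

def ca_cfar(power, guard=2, train=8, scale=3.0):
    """1-D cell-averaging CFAR: flag cells whose power exceeds `scale`
    times the mean of the surrounding training cells (guard cells and
    the cell under test are excluded from the noise estimate)."""
    n = len(power)
    hits = []
    for i in range(n):
        lo = [j for j in range(i - guard - train, i - guard) if 0 <= j < n]
        hi = [j for j in range(i + guard + 1, i + guard + train + 1) if j < n]
        ring = lo + hi
        if ring and power[i] > scale * np.mean(power[ring]):
            hits.append(i)
    return hits

noise = np.ones(64)
noise[20] = 30.0   # strong reflector -> should be detected
noise[40] = 1.5    # weak fluctuation -> should be ignored
print(ca_cfar(noise))   # → [20]
```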
- 4. The traffic monitoring method based on radar-vision fusion and a large model according to claim 1, wherein fusing the plurality of visual tokens and the plurality of radar tokens according to preset rules to obtain a fused feature vector specifically comprises: first constructing association pairs of visual targets and radar targets based on spatial alignment, then fusing the plurality of visual tokens and the plurality of radar tokens according to the preset rules and the association pairs to obtain the fused feature vector, and storing the fused feature vector in a shared SRAM; constructing the association pairs of visual targets and radar targets based on spatial alignment specifically comprises: a depth estimation sub-step of projecting the radar point cloud onto the imaging plane of the camera device to generate a dense depth map; a feature lifting sub-step of back-projecting the pixels in the candidate box of each visual target into 3D space through a Lift operation based on the dense depth map to generate a pseudo point cloud; and a BEV conversion sub-step of uniformly discretizing the pseudo point cloud and the radar point cloud of the radar point cloud data into a BEV grid to construct the association pairs of visual targets and radar targets; fusing the plurality of visual tokens and the plurality of radar tokens according to the preset rules and the association pairs to obtain the fused feature vector and storing it in the shared SRAM specifically comprises: obtaining the data information of each visual target and the parameter data of each radar target from the visual targets and radar targets in the association pairs, and splicing the data information of the visual targets, the visual tokens, the parameter data of the radar targets and the radar tokens according to a preset data rule to form a token sequence of fixed length, namely the fused feature vector, and storing the token sequence in the shared SRAM; the SRAM comprises a radar token area, a visual token area, a padding token area and a metadata area, wherein the radar token area is used for storing the parameter data of the radar targets and the radar tokens, the visual token area is used for storing the data information of the visual targets and the visual tokens, the padding token area is used for storing a plurality of zero tokens, the metadata area is used for storing the overlap proportion and the matching score of the visual targets and the radar targets, the radar token area is located at the starting position, and the padding token area is located behind the metadata area.
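The BEV-grid association in the spatial alignment step can be reduced to a toy example: discretize both the camera-derived pseudo point cloud and the radar targets onto a common ground-plane grid and pair targets that land in the same cell. Grid resolution, coordinates and the one-target-per-cell simplification are all illustrative assumptions:

```python
import numpy as np

def bev_cell(xy, res=0.5):
    """Discretize an (x, y) ground-plane point into a BEV grid cell."""
    return (int(xy[0] // res), int(xy[1] // res))

def associate(visual_targets, radar_targets, res=0.5):
    """Pair visual and radar targets that fall in the same BEV cell; a toy
    stand-in for the claimed Lift + BEV-grid spatial alignment."""
    radar_cells = {bev_cell(t["xy"], res): i for i, t in enumerate(radar_targets)}
    pairs = []
    for v, t in enumerate(visual_targets):
        cell = bev_cell(t["xy"], res)
        if cell in radar_cells:
            pairs.append((v, radar_cells[cell]))
    return pairs

vis = [{"xy": (3.2, 1.1)}, {"xy": (10.0, 4.0)}]   # from pseudo point cloud
rad = [{"xy": (3.4, 1.2)}, {"xy": (25.0, 9.0)}]   # from radar clustering
print(associate(vis, rad))   # → [(0, 0)]
```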
- 5. The traffic monitoring method based on radar-vision fusion and a large model according to claim 4, wherein after the association pairs of visual targets and radar targets are constructed based on spatial alignment, the method further comprises verifying the association pairs of visual targets and radar targets based on color consistency; the verification based on color consistency comprises: first projecting each radar target onto the original RGB image of the visual image data to calculate the projection region of each radar target on the original RGB image, then extracting the RGB color histogram within each projection region and calculating the color consistency score of each projection region, and finally, when the spatial overlap between the projection region corresponding to a radar target and the corresponding visual target is greater than a preset threshold and the color consistency score is greater than the corresponding threshold, considering that the association pair of the radar target and the visual target meets the requirements.
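The color-consistency score can be sketched with per-channel RGB histograms compared by histogram intersection. The patent does not fix the scoring formula, so the intersection metric, bin count and patch sizes here are assumptions:

```python
import numpy as np

def rgb_histogram(patch, bins=8):
    """Concatenated per-channel histogram of an HxWx3 patch, normalized
    to sum to 1."""
    hists = [np.histogram(patch[..., c], bins=bins, range=(0, 256))[0]
             for c in range(3)]
    h = np.concatenate(hists).astype(float)
    return h / h.sum()

def consistency_score(a, b):
    """Histogram intersection in [0, 1]; one plausible consistency metric
    (identical histograms score 1.0)."""
    return float(np.minimum(a, b).sum())

rng = np.random.default_rng(1)
patch = rng.integers(0, 256, (16, 16, 3))        # projection-region pixels
same = consistency_score(rgb_histogram(patch), rgb_histogram(patch))
other = consistency_score(rgb_histogram(patch),
                          rgb_histogram(np.zeros((16, 16, 3), dtype=int)))
print(round(same, 3), round(other, 3))
```

An association pair would then be accepted when both the spatial overlap and this score exceed their thresholds.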
- 6. The traffic monitoring method based on radar-vision fusion and a large model according to claim 1, wherein the construction process of the large inference model specifically comprises: acquiring multi-modal samples from a plurality of different traffic scenes by collecting historical data, wherein each multi-modal sample comprises visual image data and radar point cloud data, labeling the type of traffic event occurring in each multi-modal sample to construct a data set, and dividing the data set into a training set, a validation set and a calibration set; training a general large model based on the training set and the validation set to obtain a trained large model, wherein a multi-task output head is constructed on the output layer of the model during training to obtain a multi-task predictive large inference model, the multi-task output head comprising an event classification head, a description regression head and a root cause analysis head, the event classification head adopting a fully-connected layer structure to output the probability of each type of traffic event, the description regression head comprising a Transformer-decoder-based structure to generate structured event description text, and the root cause analysis head comprising a Transformer-decoder-based structure that selects corresponding features to output the root cause probability distribution of each type of traffic event; during training, a multi-task loss function is adopted for joint optimization, the total loss function being the weighted sum of the per-task loss functions, the optimizer being AdamW, and the general large model being at least one of the following models: Qwen, LLaMA, ChatGLM, BERT and ViT; and quantizing the trained large inference model based on the calibration set and INT8 quantization, strictly constraining all weights and activation values of the large model within the INT8 range based on a quantization-aware training strategy to obtain the quantized large inference model.
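The "total loss = weighted sum of per-task losses" construction can be shown numerically. The loss weights and the two scalar stand-in losses below are illustrative; the patent does not disclose concrete values:

```python
import numpy as np

def cross_entropy(probs, label):
    """Negative log-likelihood of the true class (event classification head)."""
    return -float(np.log(probs[label] + 1e-12))

def total_loss(losses, weights):
    """Weighted sum of per-task losses, as in the claimed joint
    optimization of the three output heads."""
    return sum(w * l for w, l in zip(weights, losses))

event_probs = np.array([0.7, 0.2, 0.1])   # event-classification head output
l_cls = cross_entropy(event_probs, 0)
l_desc = 0.42                             # stand-in description-regression loss
l_root = 0.31                             # stand-in root-cause head loss
print(round(total_loss([l_cls, l_desc, l_root], [1.0, 0.5, 0.5]), 4))
```

In actual training this scalar would be backpropagated through the shared trunk and all three heads by the AdamW optimizer.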
- 7. The traffic monitoring method based on radar-vision fusion and a large model according to claim 1, wherein the data acquisition step specifically comprises synchronizing the radar point cloud data and the visual image data of the area to be monitored in real time through a clock synchronization method, which comprises the following steps: first, initializing the radar-end device and the camera-end device, generating a clock synchronization signal through a clock source, and distributing the clock synchronization signal to the radar-end device and the camera-end device so that the two devices are clock-synchronized; then, obtaining visual image data through the camera-end device, the system capturing the moment of the rising edge of the clock synchronization signal at the camera-end device, latching the current value of a counter in the system and recording it as a visual timestamp, obtaining radar point cloud data through the radar-end device within the valid period of the clock synchronization signal, and synchronously reading the current value of a status register through the SPI interface of the radar-end device and recording it as a radar timestamp; and finally, calculating an expected visual timestamp from the radar timestamp and a preset time mapping model, calculating the time offset between the expected visual timestamp and the visual timestamp, and considering the radar point cloud data and the visual image data clock-synchronized when the time offset is within a preset range.
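The final offset check reduces to a few lines. A linear time-mapping model is assumed here (the claim only says "preset time mapping model"), and the coefficients and tolerance are illustrative:

```python
def expected_visual_ts(radar_ts, a=1.000002, b=120):
    """Assumed linear time-mapping model: t_vis ≈ a * t_rad + b,
    capturing clock drift (a) and a fixed pipeline delay (b, in µs)."""
    return a * radar_ts + b

def in_sync(radar_ts, visual_ts, tolerance_us=50):
    """Accept the frame pair when the measured visual timestamp is within
    tolerance of the value predicted from the radar timestamp."""
    offset = visual_ts - expected_visual_ts(radar_ts)
    return abs(offset) <= tolerance_us, offset

ok, off = in_sync(radar_ts=1_000_000, visual_ts=1_000_130)
print(ok, round(off, 1))   # → True 8.0
```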
- 8. The traffic monitoring method based on radar-vision fusion and a large model according to claim 1, wherein the construction of the traffic event knowledge graph specifically comprises: performing data mining on the data information of each traffic event that occurred historically to obtain the characteristic data of each type of traffic event, and constructing the event ontology and attribute data of each type of traffic event based on its characteristic data, wherein the attribute information of each type of traffic event comprises required attributes, optional attributes and relational attributes; and the reasoning step further comprises pushing the traffic monitoring reasoning result to an external traffic decision system, so that the traffic decision system generates a decision notification according to the traffic monitoring reasoning result.
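The template-driven reasoning step can be illustrated by filling a per-event-type template from the structured target information list. The template text, event key and target fields below are hypothetical stand-ins for the knowledge-graph-backed templates:

```python
# Hypothetical natural-language reasoning templates keyed by event type,
# standing in for templates retrieved from the traffic event knowledge graph.
TEMPLATES = {
    "rear_end_collision": ("Rear-end collision: {n} vehicles involved near "
                           "{location}; lead vehicle speed {speed} km/h."),
}

def generate_report(event_type, target_list):
    """Fill the template for this event type from the target list."""
    tpl = TEMPLATES[event_type]
    return tpl.format(n=len(target_list),
                      location=target_list[0]["location"],
                      speed=target_list[0]["speed"])

targets = [{"location": "K12+300", "speed": 0},
           {"location": "K12+305", "speed": 35}]
print(generate_report("rear_end_collision", targets))
```

The resulting text is what the claim calls the dynamically generated traffic monitoring reasoning result, which claim 8 then pushes to the external traffic decision system.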
- 9. A traffic monitoring device based on radar-vision fusion and a large model, comprising a memory and a processor, wherein a traffic monitoring program executable on the processor is stored in the memory, the traffic monitoring program being a computer program, characterized in that the processor, when executing the traffic monitoring program, implements the steps of the traffic monitoring method based on radar-vision fusion and a large model according to any one of claims 1-8.
- 10. A computer-readable storage medium on which a traffic monitoring program is stored, characterized in that the traffic monitoring program is a computer program which, when executed by a processor, implements the steps of the traffic monitoring method based on radar-vision fusion and a large model according to any one of claims 1-8.
Description
Traffic monitoring method, device and medium based on radar-vision fusion and large model

Technical Field

The invention relates to the field of intelligent traffic, and in particular to a traffic monitoring method, device and medium based on radar-vision fusion and a large model.

Background

In the traffic field, traffic incidents and road conditions are generally monitored by shooting videos or images of road traffic through installed video monitoring equipment, and then monitoring the various traffic incidents on the road through image analysis. Because large volumes of images and videos must be processed, system performance is low; moreover, staff must often review the images or videos further, so traffic incidents and road conditions cannot be judged in time, early warnings cannot be issued in time, and the probability of traffic accidents increases.

Disclosure of Invention

To overcome the defects of the prior art, the first purpose of the invention is to provide a traffic monitoring method based on radar-vision fusion and a large model, which solves the problems of the prior art that, when traffic conditions are monitored through image analysis, system performance is low, manual assistance is needed and timely prediction cannot be performed. The second purpose of the invention is to provide a traffic monitoring device based on radar-vision fusion and a large model that solves the same problems.
The third objective of the invention is to provide a computer-readable storage medium that solves the same problems of low system performance, need for manual assistance and inability to predict in time when traffic conditions are monitored through image analysis. The first purpose of the invention is realized by the following technical scheme: a traffic monitoring method based on radar-vision fusion and a large model comprises the following steps: a data acquisition step of determining an area to be monitored according to a monitoring requirement, and synchronously acquiring radar point cloud data and visual image data of the area to be monitored in real time; a tokenization step of performing target recognition and feature extraction on the visual image data to obtain visual feature vectors, tokenizing the visual feature vectors to obtain a plurality of visual tokens and a plurality of visual targets, performing target recognition and feature extraction on the radar point cloud data to obtain radar feature vectors, tokenizing the radar feature vectors to obtain a plurality of radar tokens and a plurality of radar targets, and fusing the plurality of visual tokens and the plurality of radar tokens according to preset rules to obtain a fused feature vector; a model inference step of obtaining the type of a traffic event and a structured target information list from the fused feature vector using a large inference model constructed by the system, wherein the structured target information list comprises a visual target list, a radar target list, association pairs of visual targets and radar targets, data information of each visual target and data information of each radar target; and a reasoning step of calling a corresponding natural language reasoning template based on a traffic event knowledge graph constructed by the system and the type of the traffic event, obtaining the data information of each target from the structured target information list, and dynamically generating a traffic monitoring reasoning result from the data information of each target and the corresponding natural language reasoning template. Further, performing target recognition and feature extraction on the visual image data to obtain visual feature vectors and tokenizing the visual feature vectors to obtain a plurality of visual tokens and a plurality of visual targets specifically comprises: a preprocessing and detection sub-step of preprocessing the visual image data and performing target detection on it with an INT8-quantized target detection network to identify a plurality of visual targets and their data information, wherein the data information of each visual target comprises a target type, a target position and a bounding box; a pooling sub-step of obtaining the candidate box of each visual target and performing an RoI-Align operation on it, so as to pool each candidate box into a visual feature map of fixed size; a feature flattening sub-step of flattening the visual feature map into a multidimensional feature vector; and a tokenization sub-step of splicing