CN-121980337-A - Traffic regulation violation reminding method and system based on multi-mode large model

Abstract

The invention provides a traffic regulation violation reminding method and system based on a multi-modal large model, belonging to the technical field of intelligent driving of automobiles. The method comprises: acquiring multi-modal data of a current traffic scene, the multi-modal data comprising image data, audio data and text data; inputting the multi-modal data into a pre-trained multi-modal large model, which performs real-time inference and outputs an inference result; and generating traffic regulation violation reminder information according to the inference result. By fusing on-board vision, audio, positioning, vehicle state and other multi-source data, the invention builds a comprehensive road scene understanding capability. Compared with conventional single-modal recognition schemes, the method can greatly improve the accuracy of traffic violation detection in complex driving environments (such as severe weather and dense building clusters), and effectively addresses the limited field of view and insufficient environmental adaptability of a single sensor.
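
The three steps in the abstract (acquire scene data, run the model, generate a reminder) can be sketched as below. This is only an illustrative sketch: the names `SceneData`, `InferenceResult`, `remind` and `stub_model` are hypothetical, the model is treated as an opaque callable, and the stub's string check merely stands in for real multi-modal inference, which the source does not specify at this level.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class SceneData:
    """One multi-modal snapshot of the current traffic scene (hypothetical shape)."""
    image: bytes  # e.g. an encoded camera frame
    audio: bytes  # e.g. a short in-car or exterior audio clip
    text: str     # e.g. serialized positioning/state data and regulation context


@dataclass
class InferenceResult:
    """Hypothetical result shape: a violation type, or None if nothing was detected."""
    violation_type: Optional[str]
    clause: Optional[str] = None


# The pre-trained multi-modal large model is modeled as an opaque callable.
Model = Callable[[SceneData], InferenceResult]


def remind(scene: SceneData, model: Model) -> Optional[str]:
    """Feed the scene to the model and turn its result into a reminder, if any."""
    result = model(scene)
    if result.violation_type is None:
        return None
    message = f"Warning: {result.violation_type}"
    if result.clause:
        message += f" ({result.clause})"
    return message


def stub_model(scene: SceneData) -> InferenceResult:
    """Stand-in for the real model: flags a red light when the text context says so."""
    if "signal=red" in scene.text and "moving=true" in scene.text:
        return InferenceResult("running a red light", "placeholder clause R-1")
    return InferenceResult(None)
```

Keeping the model behind a plain callable means the acquisition and reminder steps can be exercised with a stub before any trained model exists.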

Inventors

WANG ZIMING

Assignees

Dongfeng Motor Group Co., Ltd. (东风汽车集团股份有限公司)

Dates

Publication Date: 2026-05-05
Application Date: 2026-01-05

Claims (10)

1. A traffic regulation violation reminding method based on a multi-modal large model, characterized by comprising the following steps: acquiring multi-modal data of a current traffic scene, wherein the multi-modal data of the current traffic scene comprises image data, audio data and text data; inputting the multi-modal data of the current traffic scene into a pre-trained multi-modal large model, the pre-trained multi-modal large model performing real-time inference and outputting an inference result; and generating traffic regulation violation reminder information according to the inference result.
2. The method of claim 1, further comprising a step of training the multi-modal large model, wherein the step of training the multi-modal large model comprises: acquiring multi-modal sample data from a plurality of different traffic scenes; inputting the multi-modal sample data from the plurality of different traffic scenes into a preset multi-modal large model for training; and introducing real violation cases into the preset multi-modal large model for fine-tuning to obtain the pre-trained multi-modal large model; wherein the architecture of the preset multi-modal large model adopts an encoder-decoder structure, the encoder comprises a vision Transformer for processing image data, a speech Transformer for processing audio data and a text Transformer for processing text data, and the preset multi-modal large model realizes feature fusion through a cross-modal attention mechanism.
3. The method according to claim 1, wherein the image data comprises road images acquired by an on-board camera, the audio data comprises in-car sound data and/or out-of-car environmental sound data, and the text data comprises a traffic regulation knowledge graph and positioning and state data.
4. The method according to claim 3, wherein the step of inputting the multi-modal data of the current traffic scene into a pre-trained multi-modal large model, the pre-trained multi-modal large model performing real-time inference and outputting an inference result, comprises a step of preprocessing the multi-modal data of the current traffic scene, the preprocessing step comprising: denoising the image data and correcting its distortion; performing noise reduction and sound source localization on the audio data; and matching the positioning and state data against a high-precision map to obtain basic road section information.
5. The method according to claim 1, 2 or 3, wherein the step of inputting the multi-modal data of the current traffic scene into a pre-trained multi-modal large model, the pre-trained multi-modal large model performing real-time inference and outputting an inference result, comprises: the pre-trained multi-modal large model recognizing, from the image data, at least one of traffic signs, traffic light states, road markings and temporary construction signs, and outputting visual semantic tags; the pre-trained multi-modal large model recognizing voice features from the audio data; and the pre-trained multi-modal large model performing fusion inference on the visual semantic tags, the voice features, the positioning and state data and the traffic regulation knowledge graph, and outputting an inference result comprising a violation type and a mapping relation between the violation type and regulation clauses.
6. The method of claim 1, 2 or 3, wherein generating the traffic regulation violation reminder information according to the inference result comprises: generating personalized reminder content according to the violation type, the severity and the driving scene; and sending the personalized reminder content to an on-board loudspeaker for playback.
7. The method of claim 1, 2 or 3, further comprising a voice query step, wherein the voice query step comprises: when a voice query request is acquired, outputting interpretation information for the regulation clause corresponding to the violation type, according to the mapping relation between regulation clauses and violation types.
8. A traffic regulation violation reminding system based on a multi-modal large model, configured to implement the method of any one of claims 1 to 7, the system comprising: an acquisition module for acquiring multi-modal data of a current traffic scene, wherein the multi-modal data of the current traffic scene comprises image data, audio data and text data; a model processing module for inputting the multi-modal data of the current traffic scene into a pre-trained multi-modal large model, the pre-trained multi-modal large model performing real-time inference and outputting an inference result; and a reminding module for generating traffic regulation violation reminder information according to the inference result.
9. An electronic device, comprising: one or more processors; and a memory for storing one or more programs, wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any one of claims 1 to 7.
10. A computer-readable medium having a computer program stored thereon, characterized in that the computer program, when executed by a processor, implements the steps of the method according to any one of claims 1 to 7.
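
The mapping between violation types and regulation clauses in claims 5 to 7 might be used along the following lines. This is a minimal sketch under stated assumptions, not the patented implementation: the `CLAUSES` table, the clause identifiers, the 1-to-3 severity scale and the function names are all hypothetical placeholders, and playback through the on-board loudspeaker is left out.

```python
from dataclasses import dataclass

# Hypothetical mapping from violation type to (clause id, plain-language explanation).
# In the claimed method this would come from the traffic regulation knowledge graph;
# the entries here are placeholders, not real regulation text.
CLAUSES = {
    "red_light": ("Clause A (placeholder)", "Vehicles must stop at a red signal."),
    "solid_line_crossing": ("Clause B (placeholder)", "Vehicles must not cross a solid lane line."),
}


@dataclass
class Violation:
    kind: str      # violation type, a key of CLAUSES
    severity: int  # assumed scale: 1 (minor) to 3 (severe)
    scene: str     # driving scene label, e.g. "urban" or "highway"


def build_reminder(v: Violation) -> str:
    """Compose reminder text from violation type, severity and driving scene."""
    clause_id, _ = CLAUSES[v.kind]
    prefix = {1: "Note", 2: "Caution", 3: "Warning"}[v.severity]
    readable = v.kind.replace("_", " ")
    return f"{prefix}: possible {readable} on {v.scene} road, see {clause_id}."


def answer_query(kind: str) -> str:
    """Answer a (transcribed) voice query by looking up the clause for a violation type."""
    entry = CLAUSES.get(kind)
    if entry is None:
        return "No regulation clause is recorded for that violation type."
    clause_id, explanation = entry
    return f"{clause_id}: {explanation}"
```

Because reminder generation and voice query both read the same table, the clause cited in a reminder and the clause explained on request stay consistent.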

Description

Traffic regulation violation reminding method and system based on multi-mode large model

Technical Field

The invention relates to the technical field of intelligent driving of automobiles, and in particular to a traffic regulation violation reminding method and system based on a multi-modal large model.

Background

In modern society, as technology advances, vehicle ownership keeps growing and traffic scenes become increasingly complex, accurate real-time monitoring of and reminding about traffic violations have become more important. Novice drivers in particular must take in more information as the number of vehicles increases. When their attention is drawn to pedestrians or other vehicles, they may overlook traffic signals and markings such as traffic lights and double yellow lines, so information overload creates a risk of violating traffic regulations through inattention. Enforcement of existing traffic regulations relies mainly on traditional technical means, which show many limitations in complex and changeable traffic environments. In simple traffic scenes, meanwhile, a driver who can otherwise drive normally may be over-familiar with the route and road conditions, or distracted by pets, mobile phone messages or other things in the vehicle, and so commit violations such as running a red light by mistake or failing to follow lane lines. Such violations can cause losses to the driver and pose safety risks to vehicles and pedestrians on the road.

In terms of voice reminders, existing on-board voice systems do offer some reminders when intelligent driving or navigation is enabled, but these are mostly based on preset rules and fixed scenes and lack dynamic perception of, and flexible response to, real-time traffic conditions. For example, a typical vehicle navigation voice prompt can only issue reminders based on information entered in advance, such as fixed speed-check points and restricted traffic areas, and cannot respond in real time to temporary traffic control, sudden road conditions or a driver's momentary violations. Such systems often fail to provide effective reminders in special situations such as lane changes required by temporary construction or on-site direction by traffic police.

In addition, existing traffic violation monitoring and voice reminder systems are mostly based on single-modal data processing and lack the ability to analyze and understand multi-source information comprehensively. Even when traffic signs such as speed limits and no-stopping signs are recognized and displayed on the instrument panel, the driver does not necessarily notice them. Taking image recognition as an example, camera images alone make it difficult to account simultaneously for the vehicle's running state, the driver's operating behavior, the dynamics of surrounding traffic participants, and the overall perception of traffic signs and environmental information. In complex scenes, a single information source easily leads to missing information or misjudgment, so potential violations cannot be captured accurately. Moreover, when handling traffic regulation knowledge, most such systems perform simple rule matching and lack the ability to understand and reason about the deeper semantics of regulation clauses, making it difficult to cope with complex and changeable real traffic scenes.

Therefore, how to recognize and remind about dynamic traffic scenes such as temporary signs in real time, and thereby overcome the reliance on fixed data, has become a key technical problem to be solved urgently.

Disclosure of Invention

The invention aims to solve at least one of the problems in the prior art. It improves anti-interference capability through multi-modal fusion and raises the recognition accuracy of traffic scenes in complex environments; it moves from feature recognition to semantic understanding, performing logical inference based on a traffic regulation knowledge graph and providing explanatory reminders; and it optimizes the driver interaction experience by supporting natural-language dialogue reminders and meeting the need for personalized regulation queries.

In a first aspect, an embodiment of the present invention provides a traffic regulation violation reminding method based on a multi-modal large model, comprising: acquiring multi-modal data of a current traffic scene, wherein the multi-modal data of the current traffic scene comprises image data, audio data and text data; inputting the multi-modal data of the current traffic scene into a pre-trained multi-modal large model, the pre-trained multi-modal large model performing real-time inference and outputting an inference result; and generating traffic regulation violation reminder information according to the inference result.

In a preferred embodiment, the method further comprise