CN-122024186-A - Method and system for detecting sprinkled objects based on a multi-modal model
Abstract
The invention provides a method and a system for detecting sprinkled objects based on a multi-modal model, and relates to the technical field of image recognition. The method comprises the following steps:

- acquiring road image data and annotating the sprinkled objects in it to obtain first training data;
- based on the first training data, extracting sprinkled-object region images as positive samples and non-sprinkled-object region images as negative samples to generate second training data;
- training an initial multi-modal recognition model on the first and second training data to obtain a target sprinkled-object detection model;
- collecting road images in real time and inputting them into the model for a first recognition to obtain a preliminary detection result;
- extracting the sprinkled-object regions from the image based on the preliminary detection result, and inputting the extracted images into the model again for a second recognition to obtain a final detection result.

This addresses several technical problems of the prior art: high cost, restriction to preset categories, difficulty in detecting small targets, and low recognition accuracy caused by splitting detection and classification into separate tasks. The result is improved accuracy in sprinkled-object recognition.
Inventors
- WU QINGGANG
- SUN HAO
- QI XINZHOU
- SUN TUO
- HAO RUOCHEN
- CHEN BEI
- LIN QIHENG
- CHEN BENWEI
Assignees
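- 兆边(上海)科技有限公司 (Zhaobian (Shanghai) Technology Co., Ltd.)
- 北京云星宇交通科技股份有限公司 (Beijing Yunxingyu Traffic Science and Technology Co., Ltd.)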
Dates
- Publication Date: 2026-05-12
- Application Date: 2026-04-13
Claims (10)
- 1. A method for detecting sprinkled objects based on a multi-modal model, comprising: acquiring road image data; annotating sprinkled objects in the road image data to obtain first training data; based on the first training data, extracting sprinkled-object region images from the road image data as positive samples and non-sprinkled-object region images as negative samples, and generating second training data from the positive and negative samples; training an initial multi-modal recognition model on the first training data and the second training data to obtain a target sprinkled-object detection model; acquiring a road image in real time and inputting it into the target sprinkled-object detection model for first recognition to obtain a preliminary detection result; and extracting sprinkled-object regions from the real-time road image based on the preliminary detection result, and inputting the extracted images into the target sprinkled-object detection model again for second recognition to obtain a target sprinkled-object detection result.
- 2. The method for detecting sprinkled objects based on a multi-modal model according to claim 1, wherein acquiring the road image data comprises: acquiring video stream data and splitting the video stream data into frames to obtain original image data; performing edge detection on the road area in the original image data to obtain the edge curves on both sides of the road; joining the two edge curves into a closed region and segmenting the road picture in the video stream data based on the closed region to obtain a road segmentation image; and converting the non-road area of the road segmentation image to grayscale to obtain the road image data.
- 3. The method for detecting sprinkled objects based on a multi-modal model according to claim 1, wherein annotating sprinkled objects in the road image data to obtain the first training data comprises: based on a preset image coordinate system rule, using an annotation tool to draw a bounding box around each sprinkled object in the road image data, obtaining the position information of each sprinkled-object bounding box, and recording the corresponding label information to obtain initial annotation data; cleaning the initial annotation data by removing images that contain no sprinkled-object bounding box to obtain cleaned data; and applying offline augmentation to the cleaned data to obtain the first training data.
- 4. The method for detecting sprinkled objects based on a multi-modal model according to claim 1, wherein extracting the positive and negative samples and generating the second training data comprises the following steps. The positive samples are sprinkled-object region images, and the negative samples are non-sprinkled-object region images, both extracted from the road image data based on the first training data. The steps are: cropping, based on the position information of each sprinkled-object bounding box in the first training data, the corresponding region of the road image data to obtain a sprinkled-object region image as a positive sample; randomly cropping non-sprinkled-object areas of the road image data to obtain non-sprinkled-object region images as negative samples; cleaning the positive samples and the negative samples separately to obtain cleaned positive samples and cleaned negative samples; and applying offline augmentation to the cleaned positive and negative samples to obtain the second training data.
- 5. The method for detecting sprinkled objects based on a multi-modal model according to claim 1, wherein training the initial multi-modal recognition model on the first training data and the second training data to obtain the target sprinkled-object detection model comprises: formatting the first training data into detection-task question-answer pairs and the second training data into classification-task question-answer pairs to obtain a training question-answer pair set; acquiring the base weights of the initial multi-modal recognition model and initializing a first low-rank matrix and a second low-rank matrix; inputting the training question-answer pair set into the initial multi-modal recognition model to obtain an answer text for each question-answer pair, and updating the first and second low-rank matrices by backpropagation until the initial multi-modal recognition model converges, to obtain a target first low-rank matrix and a target second low-rank matrix, where the optimization objective is the similarity between the answer text and the standard answer of the corresponding question-answer pair; and constructing the target sprinkled-object detection model from the target first low-rank matrix, the target second low-rank matrix, and the base weights.
- 6. The method for detecting sprinkled objects based on a multi-modal model according to claim 1, wherein acquiring the road image in real time and inputting it into the target sprinkled-object detection model for first recognition to obtain the preliminary detection result comprises: acquiring a road image in real time and converting it into image byte-stream data; combining the image byte-stream data with a preset detection question according to a preset request format to obtain a first recognition request; inputting the first recognition request into the target sprinkled-object detection model to obtain a first recognition result; and splitting the first recognition result into target items to obtain the preliminary detection result, wherein the preliminary detection result comprises a plurality of target items, each comprising the coordinate position of a sprinkled object in the image and a corresponding preliminary category label.
- 7. The method for detecting sprinkled objects based on a multi-modal model according to claim 1, wherein the second-recognition step comprises the following operations. In this step, the sprinkled-object regions are extracted from the real-time road image based on the preliminary detection result and input into the target sprinkled-object detection model again to obtain the target sprinkled-object detection result. The operations are: cropping, based on the position information of each target item in the preliminary detection result, the corresponding region of the real-time road image to obtain a plurality of sprinkled-object images to be classified; converting each sprinkled-object image to be classified into image byte-stream data and combining it with a preset classification question according to the preset request format to obtain a second recognition request; inputting the second recognition request into the target sprinkled-object detection model to obtain a second recognition result for each sprinkled-object image to be classified; and replacing the preliminary category label of the corresponding target item in the preliminary detection result with the fine classification label from the second recognition result to obtain the target sprinkled-object detection result.
- 8. A system for detecting sprinkled objects based on a multi-modal model, comprising: a data acquisition module for acquiring road image data; an image annotation module for annotating sprinkled objects in the road image data to obtain first training data; an image extraction module for extracting, based on the first training data, sprinkled-object region images from the road image data as positive samples and non-sprinkled-object region images as negative samples, and generating second training data from the positive and negative samples; a model training module for training an initial multi-modal recognition model on the first training data and the second training data to obtain a target sprinkled-object detection model; a primary recognition module for acquiring road images in real time and inputting them into the target sprinkled-object detection model for first recognition to obtain a preliminary detection result; and a secondary recognition module for extracting sprinkled-object regions from the real-time road images based on the preliminary detection result, and inputting the extracted images into the target sprinkled-object detection model again for second recognition to obtain a target sprinkled-object detection result.
- 9. An electronic device comprising a memory and a processor, the memory storing a computer program that can be loaded by the processor to perform the method for detecting sprinkled objects based on a multi-modal model according to any one of claims 1 to 7.
- 10. A computer-readable storage medium storing a computer program that can be loaded by a processor to perform the method for detecting sprinkled objects based on a multi-modal model according to any one of claims 1 to 7.
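The road-masking step of claim 2 can be sketched as follows. This is a minimal pure-Python sketch, not the patented implementation, and it makes three assumptions:

- The edge-detection step has already produced one left and one right road-edge x-coordinate per image row (the hypothetical `left`/`right` lists).
- The pixels between those two edges form the closed road region.
- Every pixel outside that region is converted to grayscale with the ITU-R BT.601 luma weights.

Frame splitting and the edge detector itself are out of scope, and the function names are illustrative.

```python
from typing import List, Tuple

# Illustrative sketch: data layout and function names are assumptions, not from the patent.
Pixel = Tuple[int, int, int]   # (R, G, B), each 0-255
Image = List[List[Pixel]]      # image[row][col]


def to_gray(p: Pixel) -> Pixel:
    """BT.601 luma, copied into all three channels so the image keeps its layout."""
    r, g, b = p
    y = round(0.299 * r + 0.587 * g + 0.114 * b)
    return (y, y, y)


def mask_non_road(image: Image, left: List[int], right: List[int]) -> Image:
    """Keep colour inside the closed region bounded by the two road-edge curves.

    left[row] and right[row] are the x-coordinates of the left and right road
    edges in that row (assumed output of an earlier edge-detection step).
    Pixels with left[row] <= x <= right[row] count as road; every other pixel
    is converted to grayscale.
    """
    if len(left) != len(image) or len(right) != len(image):
        raise ValueError("need one left and one right edge position per row")
    masked: Image = []
    for row, pixels in enumerate(image):
        lo, hi = left[row], right[row]
        masked.append([p if lo <= x <= hi else to_gray(p)
                       for x, p in enumerate(pixels)])
    return masked
```

For example, on a 2 x 4 all-red frame with `left=[1, 1]` and `right=[2, 2]`, the middle two columns stay red and the outer columns become `(76, 76, 76)`. Keeping the gray pixels as three-channel values means the downstream model sees an image of unchanged shape.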
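The cleaning and offline-augmentation steps of claim 3 can be illustrated as below. The claim fixes neither the annotation format nor the augmentation operations, so this hedged sketch makes two assumptions:

- Boxes are stored as `(x1, y1, x2, y2)` pixel coordinates with the origin at the top-left corner and exclusive right/bottom edges (one possible "preset image coordinate system rule").
- Horizontal flipping serves as a single example augmentation.

`clean_annotations`, `hflip`, and `augment` are hypothetical names.

```python
from typing import Dict, List, Tuple

# Illustrative sketch: box convention and helper names are assumptions, not from the patent.
Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2); origin top-left, x2/y2 exclusive
Grid = List[list]                # image[row][col]


def clean_annotations(annotations: Dict[str, List[Box]]) -> Dict[str, List[Box]]:
    """Drop every image that has no sprinkled-object bounding box."""
    return {name: boxes for name, boxes in annotations.items() if boxes}


def hflip(image: Grid, boxes: List[Box]) -> Tuple[Grid, List[Box]]:
    """Mirror an image left-right and move its boxes to match.

    Column x maps to w - 1 - x, so an exclusive span [x1, x2) maps to
    [w - x2, w - x1).
    """
    w = len(image[0])
    flipped = [list(reversed(row)) for row in image]
    new_boxes = [(w - x2, y1, w - x1, y2) for (x1, y1, x2, y2) in boxes]
    return flipped, new_boxes


def augment(annotations: Dict[str, List[Box]],
            images: Dict[str, Grid]) -> List[Tuple[str, Grid, List[Box]]]:
    """Offline augmentation: keep each cleaned sample and add its flipped copy."""
    samples = []
    for name, boxes in clean_annotations(annotations).items():
        samples.append((name, images[name], boxes))
        f_img, f_boxes = hflip(images[name], boxes)
        samples.append((name + "_hflip", f_img, f_boxes))
    return samples
```

Augmenting offline means the enlarged dataset is written once, before training, rather than generated on the fly.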
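Positive and negative sample generation (claim 4) can be sketched as follows. Positives are the labelled boxes cropped out of the frame. Negatives are random fixed-size crops that share no pixel with any labelled box.

This sketch assumes the same exclusive-edge box convention as above, square negative crops of side `size` (which must not exceed the image width or height), and a retry cap so the loop terminates even when boxes cover most of the image. The helper names are hypothetical, and the patent does not state its crop sizes.

```python
import random
from typing import List, Tuple

# Illustrative sketch: crop size, retry cap, and helper names are assumptions.
Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2); x2/y2 exclusive
Grid = List[list]                # image[row][col]


def crop(image: Grid, box: Box) -> Grid:
    x1, y1, x2, y2 = box
    return [row[x1:x2] for row in image[y1:y2]]


def overlaps(a: Box, b: Box) -> bool:
    """True if the two boxes share at least one pixel."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def make_samples(image: Grid, boxes: List[Box], n_neg: int, size: int,
                 rng: random.Random,
                 max_tries: int = 1000) -> Tuple[List[Grid], List[Grid]]:
    """Return (positives, negatives); may return fewer than n_neg negatives
    if no free area is found within max_tries attempts."""
    positives = [crop(image, b) for b in boxes]
    h, w = len(image), len(image[0])
    negatives: List[Grid] = []
    for _ in range(max_tries):
        if len(negatives) >= n_neg:
            break
        x = rng.randrange(0, w - size + 1)
        y = rng.randrange(0, h - size + 1)
        candidate = (x, y, x + size, y + size)
        if not any(overlaps(candidate, b) for b in boxes):
            negatives.append(crop(image, candidate))
    return positives, negatives
```

Passing an explicit `random.Random` instance with a fixed seed makes the negative set reproducible across runs.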
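The question-answer formatting in claim 5 can be sketched as follows. The patent does not disclose its prompt wording or answer syntax, so the question strings, the JSON answer layout, and the record fields (`image`, `question`, `answer`) below are all hypothetical.

The sketch shows only the general shape of the two pair types:

- Full-frame detection samples become "find the sprinkled objects" pairs whose answer lists boxes and labels.
- Crop samples become "what is this?" pairs whose answer is one label, with background crops assumed to be labelled `none`.

```python
import json
from typing import Dict, List, Tuple

# Illustrative sketch: prompts, answer syntax, and field names are hypothetical.
Box = Tuple[int, int, int, int]

DETECTION_Q = ("Find every sprinkled object on the road. Answer with a JSON list "
               "of objects, each with a 'bbox' [x1, y1, x2, y2] and a 'label'.")
CLASSIFY_Q = "What kind of sprinkled object is shown? Answer with one label, or 'none'."


def detection_pair(image_path: str, objects: List[Tuple[Box, str]]) -> Dict[str, str]:
    """First training data (full image + boxes) -> detection QA pair."""
    answer = json.dumps([{"bbox": list(box), "label": label} for box, label in objects])
    return {"image": image_path, "question": DETECTION_Q, "answer": answer}


def classification_pair(crop_path: str, label: str) -> Dict[str, str]:
    """Second training data (positive or negative crop) -> classification QA pair."""
    return {"image": crop_path, "question": CLASSIFY_Q, "answer": label}


def build_qa_set(detections: List[Tuple[str, List[Tuple[Box, str]]]],
                 crops: List[Tuple[str, str]]) -> List[Dict[str, str]]:
    """Merge both task types into one training set for joint fine-tuning."""
    pairs = [detection_pair(path, objs) for path, objs in detections]
    pairs += [classification_pair(path, label) for path, label in crops]
    return pairs
```

Putting both task types into one set is what lets a single fine-tuned model serve both the first (detection) and second (classification) recognition passes.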
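The weight construction in claim 5 pairs frozen base weights with two trainable low-rank matrices. This matches the standard low-rank adaptation (LoRA) scheme, although the patent does not use that name. The sketch below covers only initialization and the final merge `W = W0 + (alpha / r) * B @ A` for a single weight matrix; the backpropagation loop is omitted.

Which matrix the patent calls "first" and "second", the zero initialization of `B`, and the `alpha` scaling are assumptions taken from common LoRA practice, not from the source.

```python
import random
from typing import List, Tuple

# Illustrative LoRA-style sketch; "first"/"second" mapping and alpha scaling are assumptions.
Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain (m x k) @ (k x n) matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def init_low_rank(d_out: int, d_in: int, rank: int,
                  rng: random.Random) -> Tuple[Matrix, Matrix]:
    """A (assumed "first" matrix): rank x d_in, small random values.
    B (assumed "second" matrix): d_out x rank, zeros, so B @ A = 0 and the
    adapted model starts out identical to the base model."""
    a = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(rank)]
    b = [[0.0] * rank for _ in range(d_out)]
    return a, b


def merge(w0: Matrix, a: Matrix, b: Matrix, alpha: float) -> Matrix:
    """Fold the trained low-rank update into the frozen base weight."""
    rank = len(a)
    scale = alpha / rank
    delta = matmul(b, a)  # d_out x d_in, with rank at most `rank`
    return [[w0[i][j] + scale * delta[i][j] for j in range(len(w0[0]))]
            for i in range(len(w0))]
```

Only `A` and `B` are trained. For a `d_out x d_in` weight, that is `rank * (d_in + d_out)` parameters instead of `d_in * d_out`. For example, a 4096 x 4096 matrix at rank 8 needs 65,536 trainable values instead of 16,777,216.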
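The request construction and result splitting of claim 6 might look like the following. Neither the "preset request format" nor the model's answer syntax is disclosed, so this sketch assumes:

- a JSON request carrying the image as base64 text next to the question;
- a model answer that is a JSON list of `{"bbox": [...], "label": ...}` objects.

Malformed entries are skipped rather than guessed. `build_request`, `split_result`, and `TargetItem` are hypothetical names.

```python
import base64
import json
from dataclasses import dataclass
from typing import List, Tuple

# Illustrative sketch: request and answer formats are assumptions, not from the patent.


@dataclass(frozen=True)
class TargetItem:
    box: Tuple[int, int, int, int]  # (x1, y1, x2, y2) in image pixels
    label: str                      # preliminary category label


def build_request(image_bytes: bytes, question: str) -> str:
    """Pack the image byte stream and the detection question into one request."""
    return json.dumps({
        "image": base64.b64encode(image_bytes).decode("ascii"),
        "question": question,
    })


def split_result(answer_text: str) -> List[TargetItem]:
    """Split the model's answer into target items, ignoring malformed entries."""
    try:
        data = json.loads(answer_text)
    except json.JSONDecodeError:
        return []
    if not isinstance(data, list):
        return []
    items: List[TargetItem] = []
    for entry in data:
        if not isinstance(entry, dict):
            continue
        box, label = entry.get("bbox"), entry.get("label")
        if (isinstance(box, list) and len(box) == 4
                and all(isinstance(v, (int, float)) for v in box)
                and isinstance(label, str)):
            items.append(TargetItem(tuple(int(v) for v in box), label))
    return items
```

In a deployment, `image_bytes` would be an encoded frame (for example JPEG) from the real-time camera feed. Parsing defensively matters because a generative model can return text that is not valid JSON.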
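The second-pass refinement of claim 7 can be sketched as a loop with three steps: crop each preliminary box, ask for a fine label, and overwrite the preliminary label. In this sketch, the hypothetical `classify` callable stands in for sending the crop plus the preset classification question to the same fine-tuned model. Boxes are clipped to the frame, and an item keeps its preliminary label when its crop is empty or the classifier returns nothing. Both of these are assumptions rather than behaviour stated in the patent.

```python
from dataclasses import dataclass, replace
from typing import Callable, List, Tuple

# Illustrative sketch: `classify` is a hypothetical stand-in for the second model request.
Grid = List[list]  # image[row][col]


@dataclass(frozen=True)
class TargetItem:
    box: Tuple[int, int, int, int]  # (x1, y1, x2, y2), x2/y2 exclusive
    label: str


def crop(image: Grid, box: Tuple[int, int, int, int]) -> Grid:
    """Cut a box out of the frame, clipped to the image bounds."""
    h, w = len(image), len(image[0])
    x1, y1, x2, y2 = box
    x1, y1 = max(0, x1), max(0, y1)
    x2, y2 = min(w, x2), min(h, y2)
    if x2 <= x1 or y2 <= y1:
        return []
    return [row[x1:x2] for row in image[y1:y2]]


def refine(image: Grid, items: List[TargetItem],
           classify: Callable[[Grid], str]) -> List[TargetItem]:
    """Re-classify each cropped region and replace its preliminary label."""
    refined: List[TargetItem] = []
    for item in items:
        patch = crop(image, item.box)
        fine_label = classify(patch) if patch else ""
        refined.append(replace(item, label=fine_label) if fine_label else item)
    return refined
```

Because the crop enlarges a small object relative to the model's input resolution, this second pass is where the claimed gain on small sprinkled objects would come from. The box coordinates from the first pass are kept unchanged.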
Description
Technical Field
The invention relates to the technical field of image recognition, and in particular to a method and a system for detecting sprinkled objects based on a multi-modal model.
Background
In recent years, expressway mileage and traffic volume in China have grown continuously, and road traffic safety problems have become increasingly prominent. In high-speed scenarios vehicles travel fast and traffic is dense. Once a sprinkled object appears on the road surface, it can very easily cause a serious traffic accident, or even secondary accidents and traffic paralysis. This poses great challenges to the safety of people's lives and property and to traffic management departments. Quickly and accurately detecting sprinkled objects on the road surface and identifying their categories has therefore become an important research direction in intelligent traffic monitoring.
At present, sprinkled-object detection technologies based on image and video processing fall mainly into two types: traditional algorithms (such as comparison methods, statistical methods, and time-series-based filtering methods) and object detection algorithms based on deep learning. Both types have limitations in practical application. Traditional algorithms do not rely on large amounts of annotated data for modeling, but their robustness is poor. Their detection rates are extremely low under complex illumination, shadow occlusion, bad weather, and similar conditions, so they can hardly meet the strict requirements of real road monitoring. Deep-learning-based detection algorithms achieve higher detection accuracy, but their performance depends heavily on large-scale, high-quality annotated datasets, which makes deployment costly.
More importantly, the detection categories of such algorithms are strictly limited to the categories in the training set, so sprinkled objects of categories not defined in the training set are very likely to be missed. These methods are also limited by image resolution and feature extraction capability, and they perform poorly on small sprinkled objects. Furthermore, the prior art usually splits object detection and classification into separate tasks, detecting first and classifying afterwards. This separated processing makes it difficult for the final result to describe the true attributes of the sprinkled objects accurately and comprehensively. Together, these limitations lead to insufficient overall accuracy in sprinkled-object recognition.
Disclosure of Invention
In view of the defects in the prior art, an object of the invention is to provide a method for detecting sprinkled objects based on a multi-modal model that improves the accuracy of sprinkled-object recognition.
The first object of the invention is achieved by the following technical solution: a method for detecting sprinkled objects based on a multi-modal model, comprising the following steps: acquiring road image data; annotating sprinkled objects in the road image data to obtain first training data; based on the first training data, extracting sprinkled-object region images from the road image data as positive samples and non-sprinkled-object region images as negative samples, and generating second training data from the positive and negative samples; training an initial multi-modal recognition model on the first training data and the second training data to obtain a target sprinkled-object detection model; acquiring a road image in real time and inputting it into the target sprinkled-object detection model for first recognition to obtain a preliminary detection result; and extracting sprinkled-object regions from the real-time road image based on the preliminary detection result, and inputting the extracted images into the target sprinkled-object detection model again for second recognition to obtain a target sprinkled-object detection result. This technical solution constructs multi-modal training data covering both detection and classification tasks and jointly fine-tunes the initial multi-modal recognition model, so that a single model can both locate and finely classify sprinkled objects. A two-pass cascade strategy then runs preliminary detection on the whole image to locate the sprinkled-object regions, followed by a second, fine-grained recognition on the cropped local images. This effectively improves the detection rate and classification accuracy for small sprinkled objects. Meanwhile, types of sprinkled objects not seen in training can be recognized generically, independent of the preset detection categories, so that the compr