CN-122023252-A - Intelligent detection system based on multi-mode data fusion

CN 122023252 A

Abstract

The invention relates to the technical field of automated inspection, and in particular to an intelligent detection system based on multi-modal data fusion. The system comprises, connected in sequence, an automated optical inspection unit, a lightweight model detection unit, a multi-modal large model detection unit, and a detection agent unit. The automated optical inspection unit collects optical images of devices and performs preliminary screening; the lightweight model detection unit receives the candidate defect region images output by the automated optical inspection unit; the multi-modal large model detection unit receives the candidate defect region images together with associated text information when the confidence output by the lightweight model detection unit falls within a set intermediate range; and the detection agent unit performs routing of detection tasks and dynamic adjustment of threshold parameters.

Inventors

  • ZHOU BUSHU
  • LI FENG
  • ZHANG ZHIQIANG

Assignees

  • 广东中邦达智能技术有限公司

Dates

Publication Date
2026-05-12
Application Date
2025-12-25

Claims (10)

  1. An intelligent detection system based on multi-modal data fusion, characterized by comprising the following units connected in sequence: an automated optical inspection (AOI) unit for acquiring optical images of devices, performing preliminary screening, and passing clearly qualified products through directly; a lightweight model detection unit (AVI) for receiving the candidate defect region images output by the automated optical inspection unit, running a lightweight neural network model to judge defects, and outputting a confidence value for the judgment result; a multi-modal large model detection unit for receiving the candidate defect region images and associated text information when the confidence output by the lightweight model detection unit falls within a set intermediate range, and running a multi-modal fusion model to perform image-text alignment reasoning, defect category discrimination, and result generation; and a detection agent unit connected to and coordinating the automated optical inspection unit, the lightweight model detection unit, and the multi-modal large model detection unit, for performing detection task routing, dynamic threshold adjustment, and detection result cache management, and for triggering an incremental learning mechanism based on manual review feedback.
  2. The intelligent detection system based on multi-modal data fusion according to claim 1, wherein the range of objects for which the automated optical inspection unit performs pass-through processing is determined by a predetermined rule algorithm, the range being dynamically adjustable based on historical inspection data.
  3. The intelligent detection system based on multi-modal data fusion according to claim 1, wherein the lightweight neural network model deployed by the lightweight model detection unit adopts a Transformer architecture or a variant thereof, and is used for identifying known defect types while ensuring detection speed.
  4. The intelligent detection system based on multi-modal data fusion according to claim 1, wherein the multi-modal large model detection unit comprises: a visual encoder for extracting image features; a text encoder for extracting text features; and a fusion reasoner for performing alignment matching and logical inference on the visual features and the text features in a joint feature space, and outputting defect categories, position information, and a detection report.
  5. The intelligent detection system based on multi-modal data fusion according to claim 1, wherein the detection agent unit compares the confidence value output by the lightweight model detection unit against a preset first threshold and a preset second threshold: if the confidence is higher than the first threshold, the result of the lightweight model detection unit is adopted directly; if the confidence is lower than the second threshold and a specific condition is met, the task is routed to a manual review step; and if the confidence lies between the first threshold and the second threshold, the task is routed to the multi-modal large model detection unit for detailed analysis.
  6. The intelligent detection system based on multi-modal data fusion according to claim 5, wherein the detection agent unit supports dynamic updating of the first and second thresholds according to statistics including historical false positive rates and false negative rates.
  7. The intelligent detection system based on multi-modal data fusion according to claim 1, wherein the detection agent unit has a detection result caching function, storing feature vectors of specific low-confidence patterns as prototype vectors; for a subsequent input image, if the similarity between its features and any prototype vector exceeds a set threshold, the detection result of the corresponding prototype vector is reused directly, avoiding a call to the multi-modal large model detection unit.
  8. The intelligent detection system based on multi-modal data fusion according to claim 1, wherein the system comprises an incremental learning mechanism: when a manual review result is inconsistent with the output of the multi-modal large model detection unit or the lightweight model detection unit, the system writes the manual review result into a knowledge base as a high-confidence sample and initiates online fine-tuning of the multi-modal large model detection unit.
  9. The intelligent detection system based on multi-modal data fusion according to claim 8, wherein the online fine-tuning is realized by a parameter-efficient fine-tuning technique, specifically: keeping the main parameters of the multi-modal large model detection unit unchanged, and performing gradient updates and hot-swapping only on additional adapter module parameters, thereby achieving rapid online optimization and deployment of model parameters without full retraining of the entire model.
  10. The intelligent detection system based on multi-modal data fusion according to claim 1, wherein the text information is a text prompt describing process parameters or expected defect types of the inspected object.
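As an illustration only (not part of the claims' text), the tiered confidence routing of claims 5-6 and the prototype-vector cache of claim 7 might be sketched as follows. The threshold values, the use of cosine similarity as the similarity measure, and all identifiers are assumptions made for this sketch, not specifics disclosed by the patent:

```python
import math

def route_by_confidence(confidence, t_high=0.9, t_low=0.4):
    """Tiered routing per claims 5-6 (threshold values are illustrative)."""
    if confidence > t_high:
        return "accept_lightweight_result"   # claim 5, first case
    if confidence < t_low:
        return "manual_review"               # claim 5, second case
    return "multimodal_large_model"          # claim 5, third case

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

class PrototypeCache:
    """Result cache keyed by prototype feature vectors (claim 7 sketch)."""
    def __init__(self, sim_threshold=0.95):
        self.sim_threshold = sim_threshold
        self.entries = []  # list of (prototype_vector, cached_result)

    def lookup(self, feature):
        # Reuse a cached result when the new feature is close enough to a
        # stored prototype, so the large-model call can be skipped.
        for proto, result in self.entries:
            if cosine_similarity(feature, proto) >= self.sim_threshold:
                return result
        return None

    def store(self, feature, result):
        self.entries.append((feature, result))
```

In this sketch the cache lookup would run before `route_by_confidence`, so that a near-duplicate of a previously analyzed low-confidence pattern never reaches the multi-modal model at all.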
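Likewise purely illustrative: the parameter-efficient fine-tuning of claim 9 (main parameters frozen, gradient updates only on adapter parameters) can be reduced to a toy one-parameter sketch. The model form `y = (w_backbone + w_adapter) * x`, the squared-error loss, and the learning rate are all assumptions for the sketch, not the patent's implementation:

```python
def adapter_step(w_backbone, w_adapter, x, y_true, lr=0.05):
    """One gradient step on a squared-error loss, updating ONLY the
    adapter parameter; the backbone parameter stays frozen (claim 9)."""
    y_pred = (w_backbone + w_adapter) * x
    grad = 2.0 * (y_pred - y_true) * x   # d(loss)/d(w_adapter)
    return w_adapter - lr * grad         # backbone is never touched

# Toy run: the target behavior is y = 3x, the frozen backbone contributes
# 2x, so the adapter parameter should converge toward 1.
w_backbone, w_adapter = 2.0, 0.0
for _ in range(200):
    w_adapter = adapter_step(w_backbone, w_adapter, x=1.0, y_true=3.0)
```

The point of the sketch is only the division of labor: the gradient flows solely into the small adapter parameter, which is what lets the claimed hot-swap deployment avoid full retraining.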

Description

Intelligent detection system based on multi-modal data fusion

Technical Field

The invention relates to the technical field of automated inspection, and in particular to an intelligent detection system based on multi-modal data fusion.

Background

With the continuing trend toward miniaturization and higher density in electronic products, solder joint sizes keep shrinking while component mounting density keeps rising, which poses serious challenges for defect detection during manufacturing. Conventional Automated Optical Inspection (AOI) systems typically rely on preset rule algorithms, such as gray-level thresholding or template matching, to screen target images. However, such conventional methods have significant limitations in practice. On the one hand, because they struggle to overcome complex interfering factors such as changing illumination conditions, the reflective characteristics of solder, and flux residues, these systems often generate a considerable number of false defect judgments. The resulting high-frequency false positives require substantial human resources for review and confirmation, significantly reducing overall detection efficiency. On the other hand, for complicated defects with extremely small or atypical morphological characteristics, such as micron-scale cold solder joints, the tombstone effect, or solder ball bridging, a single rule-matching visual analysis often lacks a sufficient discriminative basis, and it is difficult to distinguish boundary conditions within the normal fluctuation of process parameters from actual functional defects, leading to missed detections. To improve detection precision, existing improved technical schemes have mainly turned to artificial intelligence models represented by Convolutional Neural Networks (CNNs).
Although such methods are more accurate than traditional rule algorithms to a certain extent, they still have various shortcomings. For example, a trained CNN model often suffers from insufficient generalization ability and weak adaptability when facing new defect forms that may appear on a production line, or new situations after a production process is changed. More importantly, visual analysis models based on a single image cannot effectively use text information carrying important semantic value, such as design documents and process parameters related to the mounting process. This separation from text information makes it difficult to combine relevant background knowledge for in-depth root-cause analysis during defect identification. In addition, directly deploying a multi-modal fusion large model with a huge number of parameters in pursuit of higher precision cannot meet the strict real-time requirements of production inspection, because the model's inference and computation complexity is too high and its processing speed cannot keep up with the takt time of an actual production line. In recent years, multi-modal data fusion technology, particularly image-text alignment reasoning, has offered a new technical direction for addressing these challenges in electronic product inspection and related fields. However, existing multi-modal fusion research has not been specifically optimized for the high-throughput, low-latency operational requirements of inspection scenarios. In particular, there is a lack of a progressive processing mechanism that effectively coordinates rule-based prescreening, efficient lightweight models, and powerful but relatively time-consuming multi-modal models.
Meanwhile, existing systems also generally lack efficient online self-updating learning capability, making it difficult to dynamically optimize their own performance based on feedback during continuous operation in a production environment, so as to cope with continually changing detection requirements. For this reason, we propose an intelligent detection system based on multi-modal data fusion.

Disclosure of Invention

The application aims to solve the technical problems that a static model is difficult to adapt to rapid process iteration, that traditional models are prone to missed detections or misjudgments, and that the detection flow lacks a dynamic optimization mechanism. To solve these technical problems, an embodiment of the application provides an intelligent detection system based on multi-modal data fusion, comprising the following units connected in sequence: an automated optical inspection (AOI) unit for acquiring optical images of devices, performing preliminary screening, and passing clearly qualified products through directly; a lightweight model detection unit (AVI) for receiving the candidate defect region images output by t