CN-121998971-A - Multi-modal large-model automobile part defect detection method


Abstract

The application discloses a multi-modal large-model automobile part defect detection method, relating to the technical field of part defect detection. The method comprises: inputting a real automobile part image and a visual prompt vector into a multi-modal large model to obtain image feature data; inputting a text prompt vector and task-related category labels into the multi-modal large model to obtain text feature data; matching the image feature data with the text feature data to obtain predicted defect data; comparing the predicted defect data with real defect data based on a loss function to obtain prediction error data; calculating gradient data for the visual prompt vector based on the prediction error data; correcting the visual prompt vector according to the gradient data to obtain a defect detection model; and acquiring image data of an automobile part to be identified, inputting the image data into the defect detection model to obtain defect identification data, and outputting the defect identification data. The application improves defect recognition accuracy.

Inventors

ZHANG JIANXIN

Assignees

Hunan University (湖南大学)

Dates

Publication Date: 2026-05-08
Application Date: 2026-03-16

Claims (8)

1. A multi-modal large-model automobile part defect detection method, characterized by comprising the following steps: inputting a real automobile part image and a visual prompt vector into a multi-modal large model, the multi-modal large model processing the input part image based on the input visual prompt vector to obtain image feature data; inputting a text prompt vector and task-related category labels into the multi-modal large model, the multi-modal large model processing the category labels based on the text prompt vector to obtain text feature data; matching the extracted image feature data with the text feature data to obtain predicted defect data for the automobile part image; acquiring real defect data of the part image, comparing the predicted defect data with the real defect data based on a built-in loss function, and calculating prediction error data; calculating gradient data for the visual prompt vector based on the prediction error data, correcting the visual prompt vector according to the gradient data until the corrected prompt vector can accurately assist the multi-modal large model in identifying defects of automobile parts, and marking the trained multi-modal large model as a defect detection model; and acquiring image data of an automobile part to be identified, inputting the image data into the defect detection model, the defect detection model identifying the image data to obtain defect identification data, and outputting the defect identification data.
2. The multi-modal large-model automobile part defect detection method of claim 1, wherein inputting the real automobile part image and the visual prompt vector into the multi-modal large model, the multi-modal large model processing the input part image based on the input visual prompt vector to obtain image feature data, comprises: initializing a learnable visual prompt vector to obtain an initial visual vector; segmenting the input part image to obtain an image block sequence; concatenating the initial visual vector with the image block sequence to obtain composite image data; and inputting the composite image data into a visual encoder, the visual encoder guiding its attention according to the initial visual vector to obtain a guided image, and extracting image features from the guided image to obtain guided image feature data.
3. The multi-modal large-model automobile part defect detection method of claim 2, wherein inputting the composite image data into the visual encoder, the visual encoder guiding its attention according to the initial visual vector to obtain the guided image, and extracting image features from the guided image to obtain guided image feature data, comprises: marking the image block sequence according to the initial visual vector to obtain an importance distribution map of the image block sequence; extracting image features of non-key regions in the importance distribution map to obtain basic feature data; extracting image features of key regions in the importance distribution map to obtain second feature data; and screening the second feature data using the basic feature data as a reference, retaining only the image features exclusive to the key regions, and marking them as the image feature data of the guided image.
4. The multi-modal large-model automobile part defect detection method of claim 1, wherein inputting the text prompt vector and the task-related category labels into the multi-modal large model, the multi-modal large model processing the category labels based on the text prompt vector to obtain text feature data, comprises: initializing the text prompt vector to obtain an initial text vector, and matching and concatenating the initial text vector with the category labels to obtain composite label data; a text encoder refining and supplementing the category labels according to the initial text vector in the composite label data to obtain detailed label data; and extracting text features from the detailed label data to obtain the text feature data.
5. The multi-modal large-model automobile part defect detection method of claim 1, wherein matching the extracted image feature data with the text feature data to obtain predicted defect data for the automobile part image comprises: matching the image feature data against the text feature data, and determining the similarity between each item of text feature data and the image feature data to obtain similarity data; and taking the category label corresponding to the text feature data that has similarity data as the defect of the corresponding part image, to obtain the predicted defect data of the corresponding automobile part image.
6. The multi-modal large-model automobile part defect detection method of claim 1, wherein acquiring the real defect data of the part image, comparing the predicted defect data with the real defect data based on the built-in loss function, and calculating the prediction error data comprises: comparing the predicted defect data with the real defect data to determine whether the predicted defect data of the part image contains the real defect data; if the real defect data is determined to be present, checking the similarity data in the predicted defect data to determine whether the similarity data is greater than a built-in evaluation threshold; and if the similarity data is smaller than the built-in evaluation threshold or the real defect data is not present, performing error calculation on the predicted defect data and the real defect data based on the built-in loss function to obtain the prediction error data.
7. The multi-modal large-model automobile part defect detection method of claim 6, wherein, if the real defect data is determined to be present, checking the similarity data in the predicted defect data to determine whether the similarity data is greater than the built-in evaluation threshold further comprises: if the similarity data is determined to be greater than the built-in evaluation threshold, marking the initial visual vector corresponding to the part image to obtain marked vector data; determining the number of defects predicted for the part image, and if the part image has only the real defect, not correcting the marked vector data; if a plurality of defects are predicted, marking the non-real defects among the plurality of predicted defects as defect data to be corrected; comparing the defect data to be corrected with the real defect data, and determining the data values of the index items on which the defect data to be corrected is similar to the real defect data, to obtain index data to be corrected; and performing data prediction on the marked vector data based on the action relationship between prompt items and index items in the multi-modal large model and the index data to be corrected of the corresponding index items to obtain corresponding pseudo-gradient data, and adjusting the marked vector data based on the pseudo-gradient data.
8. The multi-modal large-model automobile part defect detection method of claim 1, wherein calculating the gradient data for the visual prompt vector based on the prediction error data and correcting the visual prompt vector according to the gradient data comprises: when an error exists between the predicted defect data and the real defect data, determining the visual prompt vector corresponding to the error according to the index item that exhibits the difference, and marking that visual prompt vector as a prompt item; and acquiring the action relationship between prompt items and index items in the multi-modal large model, and performing data prediction on the prompt item according to the action relationship and the prediction error data of the index item to obtain the gradient data.
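
As a non-limiting illustration that forms no part of the claims, the matching of claim 5 and the threshold check of claim 6 might be sketched in Python as follows. Cosine similarity, the `min_similarity` cutoff, the `eval_threshold` value, and all function names are assumptions made for this sketch; the claims do not specify a similarity measure or any threshold values.

```python
import math

def cosine(a, b):
    """Cosine similarity between two feature vectors (assumed similarity measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def match_defects(image_feat, text_feats, min_similarity=0.5):
    """Claim 5 sketch: keep each category label whose text feature is similar to the image feature.

    text_feats maps a category label to its text feature vector.
    min_similarity is a hypothetical cutoff for "having similarity".
    """
    sims = {label: cosine(image_feat, feat) for label, feat in text_feats.items()}
    return {label: s for label, s in sims.items() if s >= min_similarity}

def needs_error_calculation(predicted, true_label, eval_threshold=0.8):
    """Claim 6 sketch: decide whether the loss function must be evaluated.

    The error is calculated when the real defect is missing from the prediction,
    or when its similarity is smaller than the (hypothetical) evaluation threshold.
    """
    if true_label not in predicted:
        return True
    return predicted[true_label] < eval_threshold

# Toy two-dimensional features, invented for the example.
text_feats = {"scratch": [1.0, 0.0], "dent": [0.0, 1.0]}
predicted = match_defects([0.9, 0.1], text_feats)
print(sorted(predicted))                              # labels judged similar to the image
print(needs_error_calculation(predicted, "scratch"))  # real defect found with high similarity
print(needs_error_calculation(predicted, "dent"))     # real defect missing from the prediction
```

In this sketch a part image whose real defect is predicted with high similarity skips the loss calculation, which corresponds to the branch that claim 7 then handles separately.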

Description

Multi-modal large-model automobile part defect detection method

Technical Field

The application relates to the technical field of part defect detection, and in particular to a multi-modal large-model automobile part defect detection method.

Background

In modern automobile manufacturing, the quality of automobile parts directly affects the performance, safety, and reliability of the whole vehicle. With the rapid development of the automobile industry, consumers' quality expectations keep rising, which pushes automobile manufacturers toward stricter quality control of parts. During the production of automobile parts, various defects inevitably arise from the combined influence of factors such as raw material characteristics, fluctuations in process parameters, and equipment wear; surface defects are especially common and critical. These defects not only affect the appearance of a part but can also cause functional failure, creating safety hazards and shortening product service life.
At present, most widely used machine vision systems rely solely on optical imaging to acquire two-dimensional image information of the part surface for defect analysis. Although a high-resolution camera can capture fine surface detail, under complex real-world conditions such as uneven illumination, strong reflection, and shadow occlusion, purely vision-based feature extraction is easily disturbed, so that some defects are not accurately identified or normal textures are misjudged as defects. There is room for improvement.

Disclosure of Invention

In order to improve the accuracy of defect detection and identification, the application provides a multi-modal large-model automobile part defect detection method.
The multi-modal large-model automobile part defect detection method provided by the application adopts the following technical solution. The method comprises the following steps: inputting a real automobile part image and a visual prompt vector into a multi-modal large model, the multi-modal large model processing the input part image based on the input visual prompt vector to obtain image feature data; inputting a text prompt vector and task-related category labels into the multi-modal large model, the multi-modal large model processing the category labels based on the text prompt vector to obtain text feature data; matching the extracted image feature data with the text feature data to obtain predicted defect data for the automobile part image; acquiring real defect data of the part image, comparing the predicted defect data with the real defect data based on a built-in loss function, and calculating prediction error data; calculating gradient data for the visual prompt vector based on the prediction error data, correcting the visual prompt vector according to the gradient data until the corrected prompt vector can accurately assist the multi-modal large model in identifying defects of automobile parts, and marking the trained multi-modal large model as a defect detection model; and acquiring image data of an automobile part to be identified, inputting the image data into the defect detection model, the defect detection model identifying the image data to obtain defect identification data, and outputting the defect identification data.
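
As a non-limiting illustration, the training steps above might be sketched in Python as follows. The sketch assumes a toy stand-in for the frozen multi-modal large model (mean pooling over the prompt token and the image block features, followed by dot-product matching against fixed text features) and a cross-entropy loss; these choices, the function names, and the hyperparameters are assumptions for illustration and are not taken from the application. Only the visual prompt vector is updated.

```python
import math

def encode_image(prompt, patches):
    """Stand-in for a frozen visual encoder: mean-pool the prompt token and patch tokens."""
    tokens = [prompt] + patches
    dim = len(prompt)
    return [sum(tok[d] for tok in tokens) / len(tokens) for d in range(dim)]

def softmax(logits):
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def predict(prompt, patches, text_feats):
    """Match the image feature against each text feature; return class probabilities."""
    feat = encode_image(prompt, patches)
    logits = [sum(f * t for f, t in zip(feat, tf)) for tf in text_feats]
    return softmax(logits)

def train_prompt(patches, text_feats, true_idx, steps=200, lr=0.5):
    """Correct only the visual prompt vector by gradient descent on cross-entropy."""
    dim = len(text_feats[0])
    prompt = [0.0] * dim                 # initial visual vector
    n_tokens = len(patches) + 1
    for _ in range(steps):
        probs = predict(prompt, patches, text_feats)
        # Prediction error: d(loss)/d(logit_k) = probs[k] - onehot[k]
        err = [p - (1.0 if k == true_idx else 0.0) for k, p in enumerate(probs)]
        # Chain rule through the mean pooling gives the gradient w.r.t. the prompt.
        grad = [
            sum(err[k] * text_feats[k][d] for k in range(len(text_feats))) / n_tokens
            for d in range(dim)
        ]
        prompt = [w - lr * g for w, g in zip(prompt, grad)]
    return prompt

# Toy data invented for the example: class 0 = "scratch", class 1 = "dent".
text_feats = [[1.0, 0.0], [0.0, 1.0]]
patches = [[0.6, 0.4], [0.5, 0.5]]
before = predict([0.0, 0.0], patches, text_feats)
after = predict(train_prompt(patches, text_feats, true_idx=1), patches, text_feats)
print("before:", before)
print("after: ", after)
```

A practical system would typically compute the gradient by automatic differentiation through the actual encoders; the closed-form gradient here is possible only because the stand-in encoder is linear.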
Preferably, a learnable visual prompt vector is initialized to obtain an initial visual vector; the input part image is segmented to obtain an image block sequence; the initial visual vector is concatenated with the image block sequence to obtain composite image data; and the composite image data is input into a visual encoder, which guides its attention according to the initial visual vector to obtain a guided image and extracts image features from the guided image to obtain guided image feature data.
Preferably, the image block sequence is marked according to the initial visual vector to obtain an importance distribution map of the image block sequence; image features of non-key regions in the importance distribution map are extracted to obtain basic feature data; image features of key regions in the importance distribution map are extracted to obtain second feature data; and the second feature data is screened using the basic feature data as a reference, retaining only the image features exclusive to the key regions, which are marked as the image feature data of the guided image.
Preferably, initializing the text prompt