CN-121982420-A - Fusion perception type cross-mode three-dimensional anomaly detection method based on wavelet distillation
Abstract
The invention discloses a wavelet-distillation-based fusion-perception cross-modal three-dimensional anomaly detection method comprising the following steps: A, acquiring two-dimensional image data of an industrial product to be detected and the corresponding three-dimensional point cloud data; B, obtaining multi-scale features through discrete wavelet transformation; C, taking the features output by a teacher model as knowledge constraints for a student model; D, aligning and fusing the features of different modalities at the semantic level; E, performing end-to-end optimization training by introducing a multi-task joint loss function composed of a distillation loss, a cross-modal mapping loss, and a feature reconstruction loss; and F, realizing automatic detection and discrimination of defects of the industrial product through model inference. The invention keeps the model stably perceptive under complex surface structures and in weak-anomaly scenes, enhances the simultaneous characterization of local anomalies and global deformations, reduces the interference of noise and background texture on anomaly judgment, and allows the model to achieve high inference efficiency without additional complex post-processing.
Inventors
- LIU JING
- LIN BO
Assignees
- Guangzhou Research Institute, Xidian University (西安电子科技大学广州研究院)
Dates
- Publication Date: 2026-05-05
- Application Date: 2026-02-05
Claims (7)
- 1. A fusion-perception cross-modal three-dimensional anomaly detection method based on wavelet distillation, characterized by comprising the following steps: step A, acquiring two-dimensional image data of an industrial product to be detected and the corresponding three-dimensional point cloud data; step B, performing discrete wavelet transformation on the data of step A, decomposing the original data or its initial feature representation into feature representations at a plurality of scales and frequency sub-bands, and obtaining multi-scale features comprising low-frequency global structure information and high-frequency local detail information; step C, inputting the multi-scale features decomposed in step B into a teacher model and a student model respectively, extracting the corresponding two-dimensional and three-dimensional modal features, and guiding the student model at the feature level by taking the features output by the teacher model as knowledge constraints; step D, constructing a bidirectional cross-modal feature mapping network on the basis of step C, realizing mapping from the two-dimensional feature space to the three-dimensional feature space and from the three-dimensional feature space to the two-dimensional feature space, so as to align and fuse the features of different modalities at the semantic level; step E, performing end-to-end optimization training of the student model and the cross-modal mapping network by introducing a multi-task joint loss function consisting of a distillation loss, a cross-modal mapping loss, and a feature reconstruction loss; and step F, generating anomaly responses and computing a final anomaly score from the difference between the two-dimensional and three-dimensional features output by the student model and the normal feature distribution, thereby realizing automatic detection and discrimination of defects of the industrial product through model inference.
- 2. The wavelet-distillation-based fusion-perception cross-modal three-dimensional anomaly detection method according to claim 1, characterized in that in step B, the discrete wavelet transformation performs multi-scale, multi-frequency decomposition of the two-dimensional image and the three-dimensional point cloud data so as to enhance the model's ability to represent industrial defects of different scales; the two-dimensional image input is expressed as X_2D ∈ R^{H×W×C}, wherein H and W respectively denote the spatial resolution of the image and C denotes the number of channels; the three-dimensional input is expressed as X_3D ∈ R^{N×3}, wherein N is the number of points in the point cloud; a two-dimensional discrete wavelet transform of the image (or of its shallow features) separates it into one low-frequency sub-band and a plurality of high-frequency sub-bands, {F_LL, F_LH, F_HL, F_HH} = DWT(X_2D), wherein the low-frequency sub-band F_LL mainly preserves the global structure information of the image, while the high-frequency sub-bands F_LH, F_HL, F_HH mainly highlight edge variations and local details, helping to characterize micro-defects; the three-dimensional point cloud data is first mapped to a feature space through a point cloud feature encoding function, F_3D = φ(X_3D), and discrete wavelet transformation is then applied to the point cloud features to obtain multi-scale three-dimensional feature sub-bands {F_3D^(s)}; through the above processing, the two-dimensional image and the three-dimensional point cloud are uniformly represented as a multi-scale feature set F = {F_2D^(s), F_3D^(s)}.
- 3. The wavelet-distillation-based fusion-perception cross-modal three-dimensional anomaly detection method according to claim 2, characterized in that in step C, the teacher model learns a highly discriminative cross-modal feature representation from the multi-scale features, which serves as the knowledge reference for the student model; the multi-scale two-dimensional feature set F_2D^(s) and the multi-scale three-dimensional feature set F_3D^(s) are respectively input into the two-dimensional and three-dimensional feature encoding networks of the teacher model, and the feature extraction process is expressed as F_T^2D = E_T^2D(F_2D^(s)) and F_T^3D = E_T^3D(F_3D^(s)), wherein F_T^2D denotes the two-dimensional teacher features and F_T^3D denotes the three-dimensional teacher features.
- 4. The wavelet-distillation-based fusion-perception cross-modal three-dimensional anomaly detection method according to claim 3, characterized in that in step C, the student model receives the same multi-scale feature input as the teacher model, and its feature extraction process is expressed as F_S^2D = E_S^2D(F_2D^(s)) and F_S^3D = E_S^3D(F_3D^(s)), wherein F_S^2D and F_S^3D respectively denote the two-dimensional and three-dimensional features output by the student model.
- 5. The wavelet-distillation-based fusion-perception cross-modal three-dimensional anomaly detection method according to claim 2, characterized in that in step D, the bidirectional cross-modal feature mapping network realizes explicit alignment between the features of different modalities; taking the output features of the student model as input, the following mapping relations are constructed: F̂_3D = M_{2D→3D}(F_S^2D) and F̂_2D = M_{3D→2D}(F_S^3D), wherein M_{2D→3D} and M_{3D→2D} denote the cross-modal mapping networks, which progressively narrow the inter-modal differences through multi-layer feature projection and nonlinear transformation.
- 6. The wavelet-distillation-based fusion-perception cross-modal three-dimensional anomaly detection method according to claim 2, characterized in that in step E, the multi-task joint loss function is defined as L = λ1·L_distill + λ2·L_map + λ3·L_recon, wherein L_distill constrains the distributional consistency between the student features and the teacher features, L_map constrains the consistency of the cross-modally mapped features, and L_recon enhances the stability of the features during transformation.
- 7. The wavelet-distillation-based fusion-perception cross-modal three-dimensional anomaly detection method, characterized in that in step F, in the inference stage, anomaly responses are computed with the trained model; the two-dimensional and three-dimensional anomaly responses are obtained from the difference between the student features and the teacher features, R_2D = ‖F_T^2D − F_S^2D‖² and R_3D = ‖F_T^3D − F_S^3D‖²; the two responses are then fused by weighting to obtain a sample-level anomaly score, S = α·R_2D + (1 − α)·R_3D, wherein α is a weight coefficient; the anomaly score measures the degree to which the industrial sample under test deviates from the normal state, thereby realizing defect detection and discrimination.
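The wavelet decomposition of step B (claim 2) can be illustrated with a minimal single-level 2-D Haar transform. This is a sketch under the assumption of a single-channel image with even dimensions; the averaging normalization is an illustrative choice, not one fixed by the patent:

```python
import numpy as np

def haar_dwt2(x):
    """Single-level 2-D Haar wavelet transform of an (H, W) array,
    H and W even. Returns one low-frequency sub-band (LL) and three
    high-frequency sub-bands (LH, HL, HH), each of shape (H/2, W/2)."""
    a = x[0::2, 0::2]  # top-left of each 2x2 block
    b = x[0::2, 1::2]  # top-right
    c = x[1::2, 0::2]  # bottom-left
    d = x[1::2, 1::2]  # bottom-right
    ll = (a + b + c + d) / 4.0  # global structure (low frequency)
    lh = (a - b + c - d) / 4.0  # column-direction detail
    hl = (a + b - c - d) / 4.0  # row-direction detail
    hh = (a - b - c + d) / 4.0  # diagonal detail
    return ll, lh, hl, hh

# A flat surface carries no high-frequency content.
flat = np.full((4, 4), 5.0)
ll, lh, hl, hh = haar_dwt2(flat)
```

On a constant image all three detail sub-bands are exactly zero, which is why the high-frequency sub-bands isolate edge variation and micro-defects while the LL sub-band preserves the global structure.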
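The teacher-student knowledge constraint of step C (claims 3 and 4) is commonly realized as a feature-level distillation loss. The MSE-over-normalized-features form below is one standard choice and an assumption on my part, since the claims do not fix the exact formula:

```python
import numpy as np

def distillation_loss(f_teacher, f_student, eps=1e-8):
    """Feature-level distillation loss: mean squared distance between
    L2-normalized teacher and student feature vectors of shape (N, D)."""
    t = f_teacher / (np.linalg.norm(f_teacher, axis=-1, keepdims=True) + eps)
    s = f_student / (np.linalg.norm(f_student, axis=-1, keepdims=True) + eps)
    return float(np.mean(np.sum((t - s) ** 2, axis=-1)))

rng = np.random.default_rng(0)
f_t = rng.standard_normal((10, 64))       # teacher features (e.g. F_T)
loss_same = distillation_loss(f_t, f_t)   # student matches teacher exactly
loss_diff = distillation_loss(f_t, -f_t)  # student maximally mismatched
```

A student that reproduces the teacher's features gets zero loss; the more its features deviate on normal data, the larger the loss, which is what later makes the teacher-student difference usable as an anomaly response.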
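Step D's bidirectional cross-modal mapping (claim 5) can be sketched with two tiny projection heads. The single tanh layer and the feature widths below are illustrative stand-ins for the multi-layer networks M_{2D→3D} and M_{3D→2D}, whose architecture the claim leaves open:

```python
import numpy as np

rng = np.random.default_rng(0)

class MappingHead:
    """One-layer nonlinear projection standing in for a cross-modal
    mapping network (the real network would be multi-layer)."""
    def __init__(self, d_in, d_out):
        self.W = rng.standard_normal((d_in, d_out)) * 0.1
        self.b = np.zeros(d_out)

    def __call__(self, x):
        return np.tanh(x @ self.W + self.b)

d2, d3 = 8, 6                       # assumed 2-D / 3-D feature widths
map_2d_to_3d = MappingHead(d2, d3)  # M_{2D→3D}
map_3d_to_2d = MappingHead(d3, d2)  # M_{3D→2D}

f2d = rng.standard_normal((5, d2))  # student 2-D features (5 tokens)
f3d = rng.standard_normal((5, d3))  # student 3-D features

# Cross-modal mapping loss: each modality, once projected, should
# match the other modality's features.
loss_map = (np.mean((map_2d_to_3d(f2d) - f3d) ** 2)
            + np.mean((map_3d_to_2d(f3d) - f2d) ** 2))
```

Minimizing both directions jointly is what pulls the two feature spaces toward a shared semantic alignment rather than a one-way projection.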
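The multi-task joint loss of step E (claim 6) is a weighted sum of the three terms; the weights λ are hyperparameters the claims do not fix, so the defaults here are placeholders:

```python
def joint_loss(l_distill, l_map, l_recon, lams=(1.0, 1.0, 1.0)):
    """Multi-task joint loss L = λ1·L_distill + λ2·L_map + λ3·L_recon."""
    return lams[0] * l_distill + lams[1] * l_map + lams[2] * l_recon
```

In training, the three component losses from the previous steps are combined through this single scalar, which is what makes the student model and the mapping network optimizable end to end.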
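The inference of step F (claim 7) turns teacher-student feature differences into anomaly responses and fuses them into a sample-level score. The squared-distance response and the max over locations are common conventions assumed here, since the claim only specifies a weighted fusion:

```python
import numpy as np

def anomaly_responses(t2d, s2d, t3d, s3d):
    """Per-location anomaly responses: squared teacher-student
    feature distance for the 2-D and 3-D branches."""
    r2d = np.sum((t2d - s2d) ** 2, axis=-1)
    r3d = np.sum((t3d - s3d) ** 2, axis=-1)
    return r2d, r3d

def sample_score(r2d, r3d, alpha=0.5):
    """Weighted fusion of the strongest 2-D and 3-D responses
    into a sample-level anomaly score S."""
    return alpha * float(r2d.max()) + (1.0 - alpha) * float(r3d.max())

# Normal sample: the student reproduces the teacher, so the score is zero.
t = np.ones((4, 8))
r2d, r3d = anomaly_responses(t, t, t, t)
normal_score = sample_score(r2d, r3d)

# Defective sample: the student deviates at one location.
s_bad = t.copy()
s_bad[2] += 1.0
r2d_b, r3d_b = anomaly_responses(t, s_bad, t, t)
defect_score = sample_score(r2d_b, r3d_b)
```

Thresholding the fused score then yields the final normal/defective decision without any additional post-processing stage.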
Description
Fusion perception type cross-mode three-dimensional anomaly detection method based on wavelet distillation
Technical Field
The invention belongs to the technical field of industrial defect detection, and in particular relates to a fusion-perception cross-modal three-dimensional anomaly detection method based on wavelet distillation.
Background
Industrial defect detection is a key technical link in industrial automation and intelligent manufacturing; its results directly affect product quality stability, production efficiency, and manufacturing cost control. In conventional industrial processes, defect detection typically relies on manual visual inspection or on rule-based methods designed from manual experience, such as detection based on thresholds, edges, or template matching. In practice these methods suffer from low detection efficiency, strong subjectivity, and poor generalization; they depend heavily on the experience of inspection personnel and are difficult to adapt to the high-speed, continuous, large-scale detection requirements of modern industrial production. As the complexity of industrial scenes continues to increase, the performance of traditional rule-based methods degrades markedly under illumination changes, surface reflection differences, noise interference, and diverse workpiece surface materials, leading to false or missed detections, and such methods struggle to meet modern industry's requirements for high-precision, high-robustness, high-consistency defect detection. In recent years, with the rapid development of artificial intelligence, and of deep learning in particular, its application to industrial defect detection has become a research hotspot.
Detection methods based on deep neural networks can, to a certain extent, overcome the limitations of traditional hand-crafted feature design by automatically learning feature representations, and they markedly improve detection accuracy. However, the practical deployment of deep learning in industrial defect detection still faces several key technical bottlenecks. On the one hand, deep learning models usually depend on large numbers of accurately labeled training samples to perform well, whereas in industrial production defect samples tend to occur rarely, vary in type, and cost much to label, making it difficult to construct labeled datasets of sufficient scale and quality; this severely restricts the applicability of supervised learning to industrial defect detection. On the other hand, in many practical industrial applications, detection methods that rely only on two-dimensional color (2D) images struggle to cope with complex production environments. For example, varying illumination, changing shooting angles, and surface reflection characteristics easily introduce visual noise, making detection results highly sensitive to imaging conditions; at the same time, some structural or deformation defects are inconspicuous in color space, and their geometric morphology is difficult to describe accurately from two-dimensional images alone, which limits detection accuracy. Against this background, advances in three-dimensional sensing have made it possible to acquire workpiece surface geometry from three-dimensional point clouds or depth information.
Compared with a two-dimensional image, three-dimensional point cloud data directly reflects the spatial structure and morphological changes of the target, and has natural advantages in describing geometric defects such as pits, bulges, and deformations. How to effectively integrate the texture information of two-dimensional images with the geometric information of three-dimensional point clouds, and thereby build an industrial defect detection method that is both accurate and robust, has therefore become an important technical problem in current research and engineering practice. However, two-dimensional images and three-dimensional point clouds differ significantly in data structure, feature distribution, and expression, and simple feature concatenation or direct alignment usually fails to fully exploit their complementary information. Meanwhile, industrial defects generally vary in scale and take complex forms, so feature representations at a single scale or in a single spatial domain cannot comprehensively characterize them. Therefore, the introduction of a feature analysis method capable of simultaneously describi