CN-121482063-B - Robust unsupervised image anomaly detection method based on multi-network cooperative comparison
Abstract
The invention discloses a robust unsupervised image anomaly detection method based on multi-network cooperative comparison. The method comprises: acquiring images of pre-screened defect-free products as normal images; performing an anomaly synthesis operation on the normal images to generate simulated anomaly images and anomaly region mask images, the normal images, simulated anomaly images and anomaly region mask images together constructing a mixed data set; constructing an industrial image anomaly detection model and inputting the mixed data set into it for training to obtain a trained industrial image anomaly detection model; inputting an industrial image to be detected into the trained model to obtain reconstructed feature maps, and fusing the reconstructed feature maps to obtain a pixel-level total anomaly score map; and judging whether the industrial image to be detected is abnormal according to the pixel-level total anomaly score map. The invention can concentrate on correcting real anomalies, thereby greatly reducing the false alarm rate and enhancing robustness in complex industrial environments.
Inventors
- TAN ZHU
- CHEN CHAOHUI
- GAO HAIDONG
- YU YANG
Assignees
- Zhejiang University of Science and Technology (浙江科技大学)
Dates
- Publication Date
- 20260508
- Application Date
- 20260112
Claims (7)
- 1. A robust unsupervised image anomaly detection method based on multi-network cooperative comparison, characterized by comprising the following steps: S1, acquiring pre-screened images of defect-free products as normal images, and performing an anomaly synthesis operation on the normal images to generate simulated anomaly images and anomaly region mask images, wherein the normal images, the simulated anomaly images and the anomaly region mask images jointly construct a mixed data set; S2, constructing an industrial image anomaly detection model, and inputting the mixed data set into the industrial image anomaly detection model for training to obtain a trained industrial image anomaly detection model; S3, inputting an industrial image to be detected into the trained industrial image anomaly detection model for reconstruction to obtain reconstructed feature maps, and fusing the reconstructed feature maps to obtain a pixel-level total anomaly score map; S4, judging whether the industrial image to be detected is abnormal according to the pixel-level total anomaly score map; in step S2, the industrial image anomaly detection model comprises a sentinel network, a student network, a bottleneck module, and a teacher network with frozen network parameters; the normal images in the mixed data set and the corresponding simulated anomaly images are input into the teacher network and the sentinel network, and the two networks output multi-scale feature maps of the normal images and of the simulated anomaly images respectively; the loss function of the sentinel network is calculated from the multi-scale feature maps of the normal images output by the teacher network, the multi-scale feature maps of the simulated anomaly images output by the sentinel network, and the anomaly region mask images in the mixed data set; the multi-scale feature maps of the normal images and of the simulated anomaly images output by the teacher network and the sentinel network are then input into the bottleneck module, which processes them to obtain compressed feature maps of the teacher network and of the sentinel network; the compressed feature maps of the teacher network and of the sentinel network are input into the student network, which outputs reconstructed feature maps under the guidance of the teacher network and of the sentinel network respectively; the loss function of the student network is calculated from the reconstructed feature maps under teacher-network guidance, the reconstructed feature maps under sentinel-network guidance, the multi-scale feature maps output by the teacher network and the sentinel network, and the anomaly region mask images; the total loss function of the industrial image anomaly detection model is obtained by weighted summation of the loss function of the sentinel network and the loss function of the student network, and the network parameters are updated accordingly.
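The data flow among the four networks described in claim 1 can be summarized in a short sketch. The placeholder callables below are hypothetical stand-ins (the real teacher/sentinel are deep backbones and the student is a reconstruction decoder, and the real distance is cosine-based); only the wiring between the networks follows the claim:

```python
import numpy as np

# Hypothetical placeholder networks (claim-1 wiring only; the actual
# networks are CNNs and the actual distances are cosine-based).
teacher    = lambda img: img.mean(axis=-1)           # frozen feature extractor
sentinel   = lambda img: img.mean(axis=-1)           # trainable twin extractor
bottleneck = lambda f: f[::2, ::2]                   # feature compression
student    = lambda z: np.kron(z, np.ones((2, 2)))   # feature reconstruction

def train_step(normal, simulated, mask, lam2=0.1):
    """One training step following the claim-1 data flow (illustrative)."""
    # Teacher and sentinel each receive the images and emit feature maps.
    ft_n, fq_a = teacher(normal), sentinel(simulated)
    # Sentinel loss: agree with the teacher on normal pixels, diverge
    # (up to an assumed margin of 0.5) on synthesized anomaly pixels.
    d = np.abs(fq_a - ft_n)                          # stand-in distance map
    l_cnc = ((1 - mask) * d).mean() + (mask * np.maximum(0, 0.5 - d)).mean()
    # Student reconstructs from the compressed features; reconstructions
    # under each guidance are compared against the other extractor's features.
    rec_t = student(bottleneck(ft_n))
    rec_q = student(bottleneck(fq_a))
    l_rec = np.abs(rec_t - fq_a).mean() + np.abs(rec_q - ft_n).mean()
    return l_rec + lam2 * l_cnc                      # weighted total loss

loss = train_step(np.ones((8, 8, 3)), np.ones((8, 8, 3)), np.zeros((8, 8)))
```

With identical normal and simulated inputs and an empty mask, every term vanishes and the total loss is zero, which is the expected fixed point of the wiring.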
- 2. The robust unsupervised image anomaly detection method based on multi-network cooperative comparison of claim 1, wherein step S1 specifically comprises the following steps: S1.1, acquiring pre-screened images of defect-free products as normal images, and extracting the foreground region of each normal image to construct a foreground mask image; S1.2, randomly generating a noise map and binarizing it to obtain a noise mask of an initial anomaly shape, then multiplying the foreground mask image and the noise mask element by element to obtain an anomaly region mask image; S1.3, randomly acquiring a texture image from a preset texture data set, and fusing the texture image, the normal image and the anomaly region mask image to obtain a simulated anomaly image; S1.4, combining the normal images, the simulated anomaly images and the anomaly region mask images, which serve as pixel-level ground-truth masks, to form the mixed data set.
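The synthesis steps S1.2–S1.3 can be sketched in NumPy as follows. This is a minimal illustration under assumptions: the binarization threshold `thresh` and blending weight `beta` are placeholder values not fixed by the patent, and a uniform noise map stands in for whatever noise generator the method actually uses:

```python
import numpy as np

def synthesize_anomaly(normal, foreground_mask, texture, rng,
                       thresh=0.6, beta=0.7):
    """Sketch of the anomaly-synthesis operation (steps S1.2-S1.3).

    thresh (noise binarization level) and beta (texture blending weight)
    are illustrative assumptions, not values taken from the patent.
    """
    h, w = normal.shape[:2]
    # S1.2: random noise map, binarized into an initial anomaly shape,
    # then restricted to the product foreground by element-wise product.
    noise = rng.random((h, w))
    anomaly_mask = (noise > thresh).astype(np.float32) * foreground_mask
    # S1.3: fuse the texture into the masked region of the normal image.
    m = anomaly_mask[..., None]
    simulated = (1 - m) * normal + m * (beta * texture + (1 - beta) * normal)
    return simulated, anomaly_mask

rng = np.random.default_rng(0)
normal = np.full((8, 8, 3), 0.5, dtype=np.float32)
fg = np.ones((8, 8), dtype=np.float32)
texture = rng.random((8, 8, 3)).astype(np.float32)
img, mask = synthesize_anomaly(normal, fg, texture, rng)
```

Pixels outside the mask are left untouched, so the synthesized image differs from the normal image only inside the anomaly region, which is exactly what the pixel-level ground-truth mask records.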
- 3. The robust unsupervised image anomaly detection method based on multi-network cooperative comparison of claim 1, wherein the bottleneck module comprises a multi-scale feature fusion unit and a single-class embedding unit connected in sequence; the multi-scale feature maps of the normal image and of the simulated anomaly image output by the teacher network and the sentinel network respectively are input into the multi-scale feature fusion unit for multi-scale feature fusion, obtaining a teacher-network fused feature map and a sentinel-network fused feature map; the teacher-network fused feature map and the sentinel-network fused feature map are then input into the single-class embedding unit for feature extraction and compression to obtain the compressed feature maps of the teacher network and of the sentinel network; the single-class embedding unit is a ResNet residual block.
- 4. The robust unsupervised image anomaly detection method based on multi-network cooperative comparison of claim 1, wherein the loss function of the industrial image anomaly detection model is formed by weighted summation of the loss function of the sentinel network and the loss function of the student network, specifically:

  L_Total = L_Rec + λ_2·L_CNC

  wherein L_Total represents the loss function of the industrial image anomaly detection model, L_Rec represents the loss function of the student network, L_CNC represents the loss function of the sentinel network, and λ_2 represents a scale parameter balancing the sentinel-network and student-network losses; the loss function of the sentinel network is set according to the following formulas:

  L_CNC = Σ_{i=1}^{3} (L_C^i + L_D^i)
  L_C^i = [Σ_{h,w} (1 − M^i(h,w))·|D_CNC^i(h,w)|] / [Σ_{h,w} (1 − M^i(h,w)) + ε]
  L_D^i = [Σ_{h,w} M^i(h,w)·max(0, τ − D_CNC^i(h,w))] / [Σ_{h,w} M^i(h,w) + ε]
  D_CNC(h,w) = 1 − (F_Q(h,w)·F_T(h,w)) / (‖F_Q(h,w)‖·‖F_T(h,w)‖)

  wherein L_C^i represents the normal-region consensus loss of the i-th feature layer, L_D^i represents the abnormal-region divergence loss of the i-th feature layer, (h,w) represents the position of the h-th row and w-th column of the feature map, M^i represents the anomaly region mask image on the i-th feature layer (a pixel (h,w) belongs to a normal region when M^i(h,w)=0 and to an abnormal region when M^i(h,w)=1), D_CNC^i(h,w) represents the feature distance map of the i-th feature layer, ε represents a small positive parameter preventing a zero denominator, τ represents a preset distance threshold, F_Q(h,w) represents the multi-scale feature map of the simulated anomaly image output by the sentinel network, F_T(h,w) represents the multi-scale feature map of the normal image output by the teacher network, and ‖·‖ denotes the L2 norm used as a normalization factor; the loss function of the student network is set according to the following formulas:

  L_Rec = Σ_i (L_UAR^i + λ_1·L_g^i)
  L_UAR = f_UAR(P_TS(I_n/I_a), P_Q(I_n)) + f_UAR(P_QS(I_n/I_a), P_T(I_n))
  f_UAR(P_A, P_B) = (1/N_i)·[Σ_{k=1}^{N_i} ‖P_k^A − P_k^B‖² / (exp(β·H(P_k^A)) + exp(β·H(P_k^B))) + β·Σ_{k=1}^{N_i} (H(P_k^A) + H(P_k^B))]
  L_g^i = 1 − (SG(φ(F_T^i))·φ(F_S^i)) / (‖SG(φ(F_T^i))‖·‖φ(F_S^i)‖)

  wherein L_UAR^i represents the uncertainty-aware reconstruction loss of the i-th feature layer, L_g^i represents the global cosine similarity loss of the i-th feature layer, λ_1 represents a scale parameter balancing the uncertainty-aware reconstruction loss and the global cosine similarity loss, P_A and P_B represent the two compared probability distribution maps converted from feature maps, f_UAR(P_A, P_B) represents the uncertainty-aware reconstruction function, P_TS(I_n/I_a) represents the probability distribution of the features output after the student network S reconstructs the input normal image I_n or anomaly image I_a under the guidance of the teacher network T, P_Q(I_n) represents the probability distribution obtained when the sentinel network Q extracts and converts features from the input normal image, P_QS(I_n/I_a) represents the probability distribution of the features output after the student network S reconstructs the input normal image I_n or anomaly image I_a under the guidance of the sentinel network Q, P_T(I_n) represents the probability distribution obtained when the teacher network T extracts and converts features from the input normal image, N_i represents the spatial size of the i-th feature map, β denotes a positive parameter controlling the sensitivity to the entropy value, H(P_k) denotes the Shannon entropy at the k-th pixel position, φ(·) denotes a flattening operation, SG(·) denotes a stop-gradient operation, and ‖·‖ denotes the L2 norm used as a normalization factor.
- 5. The robust unsupervised image anomaly detection method based on multi-network cooperative comparison of claim 1, wherein the teacher network and the sentinel network have the same network architecture, both adopting the WideResNet-50 architecture.
- 6. The robust unsupervised image anomaly detection method based on multi-network cooperative comparison of claim 1, wherein step S3 specifically comprises: inputting the industrial image to be detected into the trained industrial image anomaly detection model; inputting the industrial image to be detected into the teacher network and the trained sentinel network respectively for feature extraction to obtain respective multi-scale feature maps; inputting the respective multi-scale feature maps into the bottleneck module for processing to obtain compressed feature maps of the teacher network and of the sentinel network; inputting the compressed feature maps of the teacher network and of the sentinel network into the student network for processing to obtain reconstructed feature maps; then calculating the cosine distances between the multi-scale feature maps and the corresponding reconstructed feature maps to obtain corresponding raw anomaly score maps; and, after bilinear interpolation of the raw anomaly score maps, performing mean fusion of the pixels at corresponding positions to obtain the pixel-level total anomaly score map.
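The score fusion in step S3 can be sketched as follows. This is a NumPy illustration under assumed shapes: the networks are replaced by precomputed per-layer feature maps and their reconstructions, and only the cosine-distance, bilinear-interpolation and mean-fusion steps are shown:

```python
import numpy as np

def bilinear_resize(a, out_h, out_w):
    """Bilinear interpolation of a 2-D score map (align-corners style)."""
    in_h, in_w = a.shape
    ys = np.linspace(0, in_h - 1, out_h)
    xs = np.linspace(0, in_w - 1, out_w)
    y0 = np.floor(ys).astype(int); y1 = np.minimum(y0 + 1, in_h - 1)
    x0 = np.floor(xs).astype(int); x1 = np.minimum(x0 + 1, in_w - 1)
    wy = (ys - y0)[:, None]; wx = (xs - x0)[None, :]
    top = a[np.ix_(y0, x0)] * (1 - wx) + a[np.ix_(y0, x1)] * wx
    bot = a[np.ix_(y1, x0)] * (1 - wx) + a[np.ix_(y1, x1)] * wx
    return top * (1 - wy) + bot * wy

def total_anomaly_map(features, reconstructions, out_size):
    """Fuse per-layer cosine-distance maps into one pixel-level score map.

    features / reconstructions: lists of (C, H_i, W_i) arrays per layer.
    """
    maps = []
    for f, r in zip(features, reconstructions):
        # Raw anomaly score: 1 - cosine similarity along the channel axis.
        num = (f * r).sum(axis=0)
        den = np.linalg.norm(f, axis=0) * np.linalg.norm(r, axis=0) + 1e-6
        maps.append(bilinear_resize(1.0 - num / den, *out_size))
    return np.mean(maps, axis=0)   # mean fusion across layers

rng = np.random.default_rng(2)
feats = [rng.standard_normal((4, s, s)) for s in (4, 8)]
total = total_anomaly_map(feats, [f.copy() for f in feats], (16, 16))
```

For the step-S4 decision, the image would then be flagged anomalous when `total.max()` exceeds a preset threshold; here, with perfect reconstructions, every score is near zero.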
- 7. The robust unsupervised image anomaly detection method based on multi-network cooperative comparison of claim 1, wherein step S4 specifically comprises comparing the maximum pixel value in the pixel-level total anomaly score map with a preset threshold: if the maximum pixel value is greater than the preset threshold, the industrial image to be detected is an anomalous image; if the maximum pixel value is less than or equal to the preset threshold, the industrial image to be detected is a normal image.
Description
Robust unsupervised image anomaly detection method based on multi-network cooperative comparison
Technical Field
The invention relates to the field of industrial image detection, and in particular to a robust unsupervised image anomaly detection method based on multi-network cooperative comparison.
Background
In modern industrial manufacturing, safety monitoring, medical diagnosis and similar fields, automatic anomaly detection technology is important. It aims to automatically identify, through an algorithm, data samples that are inconsistent with the normal mode, such as scratches or flaws on a product surface, or lesion areas in medical images. Unsupervised image anomaly detection (UIAD) is an important research direction here; its core advantage is that model training uses only normal (defect-free) image data, thereby avoiding the high cost and labor required to collect and label a large number of diverse anomaly samples. Currently, the UIAD method based on feature reconstruction is the main technical route in this field. The rationale of this type of approach is to use a deep learning model (e.g., an autoencoder) to learn the "standard" feature representation and reconstruction capability of normal samples. In the inference stage, when an image to be detected is input, if the image contains an abnormal region, the model cannot accurately reconstruct it into the normal mode, so a large reconstruction error is produced at the abnormal position. By measuring such reconstruction errors, the system can identify and localize anomalies. However, such prior-art methods generally suffer from one or more technical drawbacks. First, reliance on a single, unreliable reconstruction-error indicator results in poor detection robustness. Conventional reconstruction models perform poorly in visually complex scenes. On the one hand, for some "difficult normal samples" (e.g., object surfaces with complex textures, natural variations or strong specular reflections), the model itself may produce high reconstruction errors, which are very prone to being falsely reported by the system as anomalies (i.e., a high false alarm rate). On the other hand, for some "minor anomalies" (e.g., slight scratches, slight discoloration or slight structural deviations), the disturbance caused in the feature space is very small, resulting in a reconstruction error comparable to the normal background and thus causing missed detections by the system (i.e., a high false negative rate). This dependence on a single indicator makes it difficult for existing methods to balance robustness to normal variation with sensitivity to minor anomalies. Second, model training lacks explicit discriminative supervision, resulting in inadequate discrimination in the feature space. In the conventional training paradigm, the encoder receives only indirect, non-discriminative supervisory signals, with the sole objective of minimizing reconstruction errors on normal samples. The model is not explicitly guided to learn how to "distinguish" between normal and abnormal features. This leaves the finally learned feature space suboptimal for separating normal and abnormal modes; when the normal and abnormal modes are close in feature space, the model struggles to make an accurate judgment. Therefore, there is a strong need in the art for a new unsupervised anomaly detection method that can overcome the dependence on a single reconstruction error and learn more discriminative feature representations, thereby enabling accurate and robust detection of various anomalies (especially small anomalies) in complex industrial environments.
Disclosure of Invention
In order to solve the problems in the background art, the invention provides a robust unsupervised image anomaly detection method based on multi-network cooperative comparison.
The technical scheme adopted by the invention is as follows. The invention comprises the following steps: S1, acquiring pre-screened images of defect-free products as normal images, and performing an anomaly synthesis operation on the normal images to generate simulated anomaly images and anomaly region mask images, wherein the normal images, the simulated anomaly images and the anomaly region mask images jointly construct a mixed data set; S2, constructing an industrial image anomaly detection model, and inputting the mixed data set into the industrial image anomaly detection model for training to obtain a trained industrial image anomaly detection model; S3, inputting the industrial image to be detected into the trained industrial image anomaly detection model for reconstruction to obtain reconstructed feature maps, and fusing the reconstructed feature maps to obtain a pixel-level total anomaly score map; and S4, judging whether the industrial image to be detected is abnormal according to the pixel-level total anomaly score map.