CN-121998923-A - Industrial image anomaly detection method, equipment and storage medium
Abstract
An industrial image anomaly detection method comprises: constructing a detection model for anomaly detection in industrial scene images; synthesizing a simulated abnormal image from a normal image of the product; inputting it to an autoencoder (AE) for reconstruction, with an added memory enhancement module that yields an anomaly score map of the latent space; outputting a reconstructed image from the decoder; concatenating the reconstructed image with the synthesized simulated abnormal image and inputting them to a discrimination network, which outputs a segmentation map for anomaly detection and localization; and obtaining an image-level anomaly score. The invention embeds a multi-scale space-channel refining module in the discrimination network, combining multi-scale feature fusion with an attention mechanism to improve discrimination accuracy for small-target anomalies and complex boundaries; constructs a memory enhancement module in the reconstruction network to improve the network's prototype learning of normal patterns; and constructs a memory consistency loss function between the reconstructed image and the original image that separately constrains the model's predictions in normal and abnormal regions.
Inventors
- RUAN YADUAN
- ZHOU HONGSHENG
- WANG MIAO
- YU DINGWEN
Assignees
- Nanjing University (南京大学)
Dates
- Publication Date
- 20260508
- Application Date
- 20260114
Claims (6)
- 1. An industrial image anomaly detection method, characterized in that a detection model is constructed for anomaly detection of industrial scene images, the construction of the detection model comprising the following steps: Step1, obtaining a normal image of a product in an industrial production scene, and combining the normal image with a random noise image to generate a simulated abnormal image; Step2, reconstructing the simulated abnormal image with an autoencoder serving as the reconstruction network to obtain a reconstructed image; Step3, concatenating the simulated abnormal image and the reconstructed image along the channel dimension and inputting the result into the discrimination network branch, wherein the discrimination network adopts a multi-level U-Net-based architecture in which the skip connection of each layer is replaced by a multi-scale space-channel refining module (MSCR), the multi-scale feature maps output by the encoder are fused by the MSCR and then input to the decoder of the corresponding layer, and the anomaly segmentation result is output; the MSCR performs the following steps: Step3.1, feature fusion: receiving the feature maps of the current layer and an adjacent layer output by the encoder, up-sampling or down-sampling them to a common resolution, concatenating them to form the initial fused features, and reducing the channel dimension with a 1×1 convolution; Step3.2, spatial path: applying convolution kernels of three sizes, 3×3, 5×5 and 7×7, in parallel to the dimension-reduced features of Step3.1 and summing the results to fuse multi-scale spatial features; Step3.3, channel path: applying global average pooling and global max pooling to the dimension-reduced features of Step3.1, generating channel attention weights through an MLP to obtain a channel attention map, and modeling the dependencies among channels; Step3.4, fusing the features processed by the spatial path and the channel path by element-wise addition or fusion, adjusting them to the target number of channels with a 1×1 convolution, and outputting the enhanced feature map to the decoder of the current layer; Step4, outputting a pixel-level anomaly segmentation map from the discrimination network, applying local average pooling to the segmentation map to smooth noise, and taking the global maximum as the image-level anomaly score; Step5, training the network end to end to obtain the detection model, inputting an image of the product to be inspected, and performing anomaly detection, wherein anomalies are localized by the anomaly segmentation map and the anomaly grade is determined by the anomaly score.
- 2. The industrial image anomaly detection method according to claim 1, wherein Step1 uses a Simplex noise generator to generate the random noise image, and generating the simulated abnormal image comprises: generating a Simplex noise image and binarizing it to obtain an anomaly mask; randomly sampling a texture source image from a texture dataset and extracting anomalous texture features in combination with the anomaly mask; and superimposing the anomalous texture features onto the original normal image using a transparency parameter.
- 3. The industrial image anomaly detection method according to claim 1, wherein in Step2 a memory enhancement module is provided in the reconstruction network, features of the normal image are extracted in advance by the encoder of the reconstruction network, and the memory enhancement module maintains a learnable memory matrix; the memory enhancement module uses the stored normal-pattern prototypes to enhance the feature Z of the simulated abnormal image: first, the feature map is reshaped so that the spatial positions are separated from the feature dimension; then the cosine similarity between the feature vector at each spatial position and all memory items is computed, where the feature map Z is represented with its spatial dimensions flattened and i denotes the spatial position index in the feature map, with value range [1, H×W]; Softmax is then applied along the memory dimension to the similarity matrix formed by the cosine similarities to obtain an attention distribution representing the association strength between each position and each memory item, and the enhanced features are generated as the attention-weighted sum of the memory items; the output memory-enhanced features are fused with the original encoder output features through a residual connection, and the reconstructed image is then produced by the decoder.
- 4. The industrial image anomaly detection method according to claim 3, wherein the memory enhancement module outputs, in addition to the memory-enhanced features, an anomaly score map of the latent space; in Step6 the network is trained end to end with a combined loss function comprising a reconstruction loss, a focal loss, and a memory consistency loss; the memory consistency loss constrains the scores of the normal and abnormal regions, and in its definition: a weight coefficient for the normal-region loss balances the loss contributions of the normal region and the abnormal region; the memory consistency loss provides a direct supervision signal for the anomaly score map obtained in the reconstruction network; a small constant provides numerical stability and prevents division-by-zero errors; the ground-truth anomaly mask takes the value 0 for normal regions and 1 for abnormal regions; the normal region and the abnormal region are derived from the anomaly mask by an indicator function that returns 1 when its condition is true and 0 otherwise, the continuous anomaly mask being binarized with 0.5 as the threshold in the indicator condition, so that the binary mask generated from random noise serves as the supervision signal; the normal-region term constrains the normal region to output low anomaly scores and the abnormal-region term constrains the abnormal region to output high anomaly scores, thereby sharpening the model's decision boundary between normal and abnormal regions.
- 5. An electronic device, characterized in that it comprises a processor and a memory, wherein at least one instruction or at least one program is stored in the memory, and the at least one instruction or at least one program is loaded and executed by the processor, so as to implement the industrial image anomaly detection method according to any one of claims 1 to 4.
- 6. A computer-readable storage medium, wherein at least one instruction or at least one program is stored in the computer-readable storage medium, and when the at least one instruction or the at least one program is executed, the industrial image anomaly detection method according to any one of claims 1 to 4 is implemented.
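The anomaly synthesis of claim 2 can be illustrated with a minimal NumPy sketch. It is not the patented implementation: `smooth_noise` is a hypothetical stand-in for a Simplex noise generator (a bilinearly interpolated random grid), the names `synthesize_anomaly`, `beta` and `threshold` are illustrative, and the blending rule (mixing texture and normal image inside the mask with weight `beta`) is an assumed reading of superimposing texture with a transparency parameter.

```python
import numpy as np


def smooth_noise(height, width, cell=8, rng=None):
    """Stand-in for Simplex noise (assumption): a coarse random grid,
    bilinearly upsampled to (height, width), values in [0, 1)."""
    rng = np.random.default_rng() if rng is None else rng
    grid = rng.random((height // cell + 2, width // cell + 2))
    ys = np.arange(height) / cell
    xs = np.arange(width) / cell
    y0, x0 = ys.astype(int), xs.astype(int)
    fy = (ys - y0)[:, None]          # vertical interpolation weights
    fx = (xs - x0)[None, :]          # horizontal interpolation weights
    top = grid[y0][:, x0] * (1 - fx) + grid[y0][:, x0 + 1] * fx
    bottom = grid[y0 + 1][:, x0] * (1 - fx) + grid[y0 + 1][:, x0 + 1] * fx
    return top * (1 - fy) + bottom * fy


def synthesize_anomaly(normal, texture, beta=0.5, threshold=0.6, rng=None):
    """Return (simulated abnormal image, binary anomaly mask).

    normal, texture: float arrays of shape (H, W, C) with values in [0, 1].
    beta: transparency parameter (hypothetical name), weight of the texture.
    """
    h, w = normal.shape[:2]
    noise = smooth_noise(h, w, rng=rng)
    # Binarize the noise image to obtain the anomaly mask.
    mask = (noise > threshold).astype(normal.dtype)[..., None]  # (H, W, 1)
    # Inside the mask, blend texture into the normal image; outside, keep it.
    blended = (1.0 - beta) * normal + beta * texture
    abnormal = (1.0 - mask) * normal + mask * blended
    return abnormal, mask[..., 0]
```

The returned mask doubles as the pixel-level supervision target, which is why the sketch returns it alongside the image.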
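The memory read described in claim 3 (cosine similarity, Softmax along the memory dimension, weighted sum of memory items, residual fusion) can be sketched as follows. This is a hedged illustration only: the function name `memory_enhance`, the shapes `(C, H, W)` and `(N, C)`, the constant `eps`, and the choice of plain addition for the residual fusion are assumptions, since the claim's formulas are not reproduced in this text.

```python
import numpy as np


def memory_enhance(z, memory, eps=1e-8):
    """z: (C, H, W) encoder feature map; memory: (N, C) memory items.

    Returns (enhanced features of shape (C, H, W),
             attention of shape (H*W, N)).
    """
    c, h, w = z.shape
    # Separate spatial positions from the feature dimension: (H*W, C).
    flat = z.reshape(c, h * w).T
    # Cosine similarity between every position and every memory item.
    flat_n = flat / (np.linalg.norm(flat, axis=1, keepdims=True) + eps)
    mem_n = memory / (np.linalg.norm(memory, axis=1, keepdims=True) + eps)
    sim = flat_n @ mem_n.T                          # (H*W, N)
    # Softmax along the memory dimension (shifted for numerical stability).
    attn = np.exp(sim - sim.max(axis=1, keepdims=True))
    attn /= attn.sum(axis=1, keepdims=True)
    # Weighted sum of memory items, one read vector per position.
    read = attn @ memory                            # (H*W, C)
    # Residual fusion with the encoder output (assumed to be addition).
    enhanced = flat + read
    return enhanced.T.reshape(c, h, w), attn
```

Because every read vector is a convex combination of memory items that store normal patterns, anomalous positions are pulled toward normal prototypes before decoding, which is the stated purpose of the module.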
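The channel path of Step3.3 in claim 1 (global average pooling and global max pooling followed by an MLP) might look like the sketch below. The claim does not specify the MLP layout or the activation, so the shared two-layer MLP with ReLU, the summation of the two branches, the sigmoid, and the names `channel_attention`, `w1` and `w2` are all assumptions in the style of common channel-attention designs.

```python
import numpy as np


def channel_attention(x, w1, w2):
    """x: (C, H, W) dimension-reduced features.
    w1: (C, C_hidden) and w2: (C_hidden, C) are MLP weights (hypothetical).

    Returns (reweighted features (C, H, W), channel weights (C,)).
    """
    avg = x.mean(axis=(1, 2))    # global average pooling -> (C,)
    mx = x.max(axis=(1, 2))      # global max pooling     -> (C,)

    def mlp(v):
        # Two-layer perceptron with ReLU, shared by both pooled vectors.
        return np.maximum(v @ w1, 0.0) @ w2

    logits = mlp(avg) + mlp(mx)
    weights = 1.0 / (1.0 + np.exp(-logits))   # sigmoid -> (0, 1)
    return x * weights[:, None, None], weights
```

In the full module these reweighted features would be combined with the output of the 3×3, 5×5 and 7×7 spatial path before the final 1×1 convolution; that part is omitted here to keep the sketch short.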
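Step4 of claim 1 derives the image-level score by local average pooling followed by a global maximum. A minimal sketch, assuming a stride-1 pooling window with an odd `kernel` size and edge padding (the claim fixes neither), with the illustrative name `image_level_score`:

```python
import numpy as np


def image_level_score(seg_map, kernel=3):
    """seg_map: (H, W) pixel-level anomaly segmentation map.
    kernel: odd pooling window size (assumed; not given in the claim).

    Returns (image-level anomaly score, smoothed map of shape (H, W)).
    """
    h, w = seg_map.shape
    pad = kernel // 2
    padded = np.pad(seg_map, pad, mode="edge")
    # Stride-1 local average pooling as a sum of shifted views.
    smoothed = sum(
        padded[dy:dy + h, dx:dx + w]
        for dy in range(kernel)
        for dx in range(kernel)
    ) / float(kernel * kernel)
    # The global maximum of the smoothed map is the image-level score.
    return float(smoothed.max()), smoothed
```

With a 3×3 window, a single isolated pixel of value 1 contributes a score of only 1/9, whereas a 3×3 patch of ones still scores 1, which is the noise-smoothing effect the step describes.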
Description
Industrial image anomaly detection method, equipment and storage medium
Technical Field
The invention belongs to the technical field of computer vision and deep learning, relates to quality control technology in industrial intelligent manufacturing, and in particular relates to an industrial image anomaly detection method, equipment and a storage medium.
Background
With the development of intelligent manufacturing, machine-vision-based industrial image anomaly detection has become a key technology for guaranteeing product quality. In real industrial scenarios, abnormal samples (defective products) are extremely scarce and highly varied, so conventional supervised learning methods struggle to obtain sufficient training data. Unsupervised anomaly detection, which trains on normal samples only, has therefore become the mainstream research direction.
Current mainstream unsupervised methods fall into reconstruction-based methods and feature-embedding-based methods, both of which have limitations. First, reconstruction-based methods such as autoencoders assume that the model cannot reconstruct anomalous regions it has not seen. However, deep neural networks often generalize too well and reconstruct abnormal regions accurately as well (the "identity mapping" problem), which leads to missed detections. Moreover, simple pixel-level reconstruction error has difficulty capturing complex structural defects. Second, feature-embedding-based methods rely on features extracted by networks pre-trained on natural image datasets such as ImageNet. Because the distributions of industrial and natural images differ considerably, using pre-trained features directly introduces domain shift.
In addition, the prior art has investigated anomaly synthesis, which trains a discriminative model on generated simulated defects. However, conventional synthesis methods have drawbacks: when Perlin noise is used, for example, the generated anomaly shapes are not natural enough, directional artifacts are common, and the computational cost is high; the learned decision boundary is not compact enough; and the ability to detect small-target anomalies such as fine scratches is insufficient. How to design an anomaly detection method that synthesizes realistic anomalies efficiently, combines the advantages of reconstruction and discrimination effectively, and localizes multi-scale defects accurately is therefore a technical problem to be solved.
Disclosure of Invention
To address the low small-target detection accuracy and low training efficiency caused by over-generalization of reconstruction models, large domain shift of pre-trained features, and insufficient realism of synthesized anomalies in existing industrial image anomaly detection techniques, the invention provides an anomaly detection method that combines the advantages of anomaly synthesis and reconstruction-based discrimination and accurately localizes multi-scale defects.
The technical solution of the invention is an industrial image anomaly detection method in which a detection model is constructed for anomaly detection of industrial scene images, the construction of the detection model comprising the following steps: Step1, obtaining a normal image of a product in an industrial production scene, and combining the normal image with a random noise image to generate a simulated abnormal image; Step2, reconstructing the simulated abnormal image with an autoencoder serving as the reconstruction network to obtain a reconstructed image; Step3, concatenating the simulated abnormal image and the reconstructed image along the channel dimension and inputting the result into the discrimination network branch, wherein the discrimination network adopts a multi-level U-Net-based architecture in which the skip connection of each layer is replaced by a multi-scale space-channel refining module (MSCR), the multi-scale feature maps output by the encoder are fused by the MSCR and then input to the decoder of the corresponding layer, and the anomaly segmentation result is output; the MSCR performs the following steps: Step3.1, feature fusion: receiving the feature maps of the current layer and an adjacent layer output by the encoder, up-sampling or down-sampling them to a common resolution, concatenating them to form the initial fused features, and reducing the channel dimension with a 1×1 convolution; Step3.2, spatial path: applying convolution kernels of three sizes, 3×3, 5×5 and 7×7, in parallel to the dimension-reduced features of Step3.1 and summing the results to fuse multi-scale spatial fe