CN-121999279-A - Image detection method and computing device


Abstract

The embodiments of this specification provide a method and computing device for detecting images. The method comprises: dividing a first image to be detected into a plurality of image blocks; performing frequency domain analysis on each image block to obtain a frequency domain analysis result for each image block; determining, based on the frequency domain analysis results, a plurality of first image blocks meeting preset screening conditions from the plurality of image blocks, wherein the preset screening conditions at least comprise that the frequency domain analysis result indicates that the richness of the contained high-frequency features ranks in the top N, and the high-frequency features correspond to areas of the image block in which the gray level variation amplitude exceeds a specified threshold; determining, through an extraction network, image features corresponding to the first image based on the plurality of first image blocks; and determining, through a detection network, a detection result based at least on the image features, the detection result indicating at least whether the first image is a generated image, so that generated images can be detected accurately.
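The block division and top-N screening described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the specification does not fix a block size, a transform, or a richness measure, so the 8x8 blocks, the FFT-based high-frequency energy ratio with a radial `cutoff`, and the helper names `split_into_blocks`, `high_freq_ratio`, and `select_top_n` are all hypothetical.

```python
import numpy as np

def split_into_blocks(gray, block):
    """Tile a 2-D grayscale image into non-overlapping block x block patches.

    Pixels at the right/bottom edge that do not fill a whole patch are dropped
    (an assumption; padding would be an alternative). Each entry is
    ((row, col), patch) so the patch position is kept.
    """
    h, w = gray.shape
    return [
        ((r, c), gray[r:r + block, c:c + block])
        for r in range(0, h - block + 1, block)
        for c in range(0, w - block + 1, block)
    ]

def high_freq_ratio(patch, cutoff=0.25):
    """Share of spectral energy above a normalized radial frequency `cutoff`.

    Hypothetical richness measure; frequencies are in cycles/pixel, so 0.5 is
    the Nyquist frequency along each axis.
    """
    spectrum = np.abs(np.fft.fft2(patch.astype(np.float64))) ** 2
    fy = np.fft.fftfreq(patch.shape[0])[:, None]
    fx = np.fft.fftfreq(patch.shape[1])[None, :]
    radius = np.hypot(fx, fy)
    total = spectrum.sum()
    if total == 0.0:
        return 0.0  # an all-zero patch carries no energy at all
    return float(spectrum[radius > cutoff].sum() / total)

def select_top_n(gray, block=8, n=4, cutoff=0.25):
    """Return the positions of the N patches richest in high-frequency content."""
    blocks = split_into_blocks(gray, block)
    scores = np.array([high_freq_ratio(p, cutoff) for _, p in blocks])
    order = np.argsort(-scores, kind="stable")[:n]  # descending; ties keep tile order
    return [blocks[i][0] for i in order]
```

Ranking by an energy ratio rather than by absolute energy keeps bright and dark blocks comparable; only the selected blocks would then be passed on to the extraction network.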

Inventors

ZHANG YI
LI JIANSHU
GAO WEIZE
DENG WENZHONG
YAO WEIBIN

Assignees

蚂蚁区块链科技(上海)有限公司 (Ant Blockchain Technology (Shanghai) Co., Ltd.)

Dates

Publication Date: 2026-05-08
Application Date: 2026-01-07

Claims (10)

1. A method of detecting an image, comprising: dividing a first image to be detected into a plurality of image blocks; performing frequency domain analysis on each image block to obtain a frequency domain analysis result for each image block; determining, based on the frequency domain analysis results, a plurality of first image blocks meeting preset screening conditions from the plurality of image blocks, wherein the preset screening conditions at least comprise that the frequency domain analysis result indicates that the richness of the included high-frequency features ranks in the top N, and the high-frequency features correspond to areas of the image block in which the gray level variation amplitude exceeds a specified threshold; determining, through an extraction network, image features corresponding to the first image based on the plurality of first image blocks; and determining, through a detection network, a detection result based at least on the image features, wherein the detection result indicates at least whether the first image is a generated image.
2. The method of claim 1, wherein the first image is a face image, and the preset screening conditions further comprise that the frequency domain analysis result indicates that the richness of the included high-frequency features ranks in the bottom M.
3. The method of claim 1, wherein determining the image features of the first image comprises: processing each first image block with a preset feature enhancement algorithm to obtain a corresponding enhanced image block, wherein the preset feature enhancement algorithm is used to enhance high-frequency features in an image; and obtaining, through the extraction network, the image features of the first image based on the enhanced image blocks.
4. The method of claim 1, wherein each frequency domain analysis result comprises a set of DCT coefficients corresponding to the respective image block, and determining the plurality of first image blocks meeting the preset screening conditions comprises: determining a forgery sensitivity value for each image block based on the DCT coefficient set of that image block and a preset forgery sensitivity formula, wherein the forgery sensitivity value is positively correlated with the richness of the high-frequency features in the image block; and determining, from the image blocks, the N image blocks with the highest forgery sensitivity values as the first image blocks.
5. The method of any one of claims 1-4, further comprising, prior to determining the detection result: determining, based on the first image, semantic features of the first image at least through an encoder corresponding to a specified image processing model, wherein each sample image in a training set of the specified image processing model is a real image; and wherein determining the detection result comprises: combining the image features and the semantic features to determine the detection result through the detection network.
6. The method of claim 5, wherein the specified image processing model is an image reconstruction model, and determining the semantic features of the first image comprises: obtaining the semantic features through an encoder of the image reconstruction model based on the first image.
7. The method of claim 5, wherein the specified image processing model is an image generation model, and determining the semantic features of the first image comprises: processing the first image with an encoder corresponding to the image generation model to obtain a first latent encoding of the first image; generating, by the image generation model based on the first latent encoding, a generated image; and determining the semantic features of the first image based on image differences between the generated image and the first image and/or encoding differences between the first latent encoding and a latent encoding distribution corresponding to the image generation model.
8. The method of claim 5, wherein combining the image features and the semantic features to determine the detection result through the detection network comprises: fusing the image features and the semantic features to obtain fused features; and obtaining the detection result through the detection network based on the fused features.
9. The method of claim 5, wherein each sample image in the training set of the specified image processing model is a real face image and the first image is a face image.
10. A computing device comprising a memory and a processor, wherein the memory stores executable code, and the processor, when executing the executable code, implements the method of any one of claims 1-9.
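The DCT-based screening of claims 2 and 4 can be illustrated with a per-block 2-D DCT. The claims do not disclose the preset sensitivity formula, so the score below (the energy-weighted mean of the frequency index u + v) is a hypothetical stand-in chosen only because it is zero for a flat block and grows as energy moves to higher frequencies, i.e. it is positively correlated with high-frequency richness. The names `dct_matrix`, `dct2`, `forgery_sensitivity`, and `screen_blocks` are likewise assumptions; `m > 0` models the face-image variant of claim 2, which additionally keeps the last (lowest-ranked) M blocks.

```python
import numpy as np

def dct_matrix(size):
    """Orthonormal DCT-II basis matrix C, so that coefficients = C @ X @ C.T."""
    k = np.arange(size)[:, None]  # frequency index (rows)
    n = np.arange(size)[None, :]  # sample index (columns)
    c = np.cos(np.pi * (2 * n + 1) * k / (2 * size)) * np.sqrt(2.0 / size)
    c[0, :] = np.sqrt(1.0 / size)  # DC row uses the smaller normalization
    return c

def dct2(patch):
    """2-D DCT-II of a square patch (the per-block DCT coefficient set)."""
    c = dct_matrix(patch.shape[0])  # assumes patch.shape[0] == patch.shape[1]
    return c @ patch.astype(np.float64) @ c.T

def forgery_sensitivity(patch):
    """Hypothetical sensitivity score: energy-weighted mean of u + v."""
    coeffs = dct2(patch)
    energy = coeffs ** 2
    total = energy.sum()
    if total == 0.0:
        return 0.0
    u, v = np.indices(coeffs.shape)
    return float(((u + v) * energy).sum() / total)

def screen_blocks(patches, n, m=0):
    """Indices of the top-N blocks by score, plus the bottom-M blocks (claim 2).

    Bottom blocks already chosen as top blocks are skipped, so fewer than M
    may be added when N + M exceeds the number of blocks.
    """
    scores = np.array([forgery_sensitivity(p) for p in patches])
    order = np.argsort(-scores, kind="stable")  # most sensitive first
    top = list(order[:n])
    bottom = [i for i in order[::-1][:m] if i not in top]
    return [int(i) for i in top + bottom]
```

Using an orthonormal DCT keeps the total coefficient energy equal to the block's pixel energy, so scores from different blocks are directly comparable.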

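Claim 3 leaves the preset feature enhancement algorithm unspecified. As one hypothetical choice, unsharp masking amplifies high-frequency detail by adding back the difference between a block and a low-pass copy of it. The 3x3 box blur, the `amount` parameter, and the names `box_blur3` and `unsharp_mask` are assumptions made for illustration only.

```python
import numpy as np

def box_blur3(patch):
    """3x3 mean filter with edge replication (a simple low-pass filter)."""
    padded = np.pad(patch.astype(np.float64), 1, mode="edge")
    h, w = patch.shape
    acc = np.zeros((h, w))
    for dy in range(3):
        for dx in range(3):
            acc += padded[dy:dy + h, dx:dx + w]  # sum the 9 shifted copies
    return acc / 9.0

def unsharp_mask(patch, amount=1.5):
    """Enhance high frequencies: patch + amount * (patch - low_pass(patch)).

    The output is left as unclipped float values, which suits feeding a
    network rather than displaying the block.
    """
    patch = patch.astype(np.float64)
    detail = patch - box_blur3(patch)  # high-frequency residual
    return patch + amount * detail
```

Flat regions pass through unchanged because their residual is zero, while edges are steepened, which is the behavior claim 3 asks of the enhancement step.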
Description

Image detection method and computing device

Technical Field

The present disclosure relates to the field of artificial intelligence, and in particular, to a method and computing device for detecting images.

Background

With the development of deep learning technology, images produced by generative models such as generative adversarial networks (GANs) and latent diffusion models (LDMs) (hereinafter referred to as generated images) are becoming increasingly realistic. These generative models can produce highly realistic face images or videos that are generally difficult to identify with the naked eye, so users cannot judge the authenticity of online content, which threatens content security and the authenticity of the online environment.

Researchers have found that generative models such as GANs have limited ability to reconstruct the high-frequency details of images and often produce periodic grid-like artifacts (i.e., upsampling artifacts). For this reason, the related art proposes image detection methods based on frequency-domain anomalies to identify the authenticity of an image. Such methods convert an image to the frequency domain by the fast Fourier transform (FFT) or the discrete cosine transform (DCT) to obtain a spectrogram, analyze the spectrogram, and extract frequency-domain statistical features from it (e.g., energy distribution and peak features). A classifier (e.g., an SVM) is then trained on the frequency-domain statistical features extracted from the spectrograms of real images and of generated images, together with their respective labels indicating image authenticity, so that the binary classifier maps the frequency-domain statistical features of a spectrogram (e.g., energy distribution and peak features) to an authenticity class label.
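As one example of the related-art frequency-domain statistical features described above, an FFT spectrogram can be reduced to an azimuthally averaged log power spectrum, a compact description of how energy is distributed over frequency. This sketches the prior-art feature extraction, not the claimed method; the function name `radial_spectrum` and the bin count are hypothetical, and the classifier training step (e.g., an SVM fitted on labeled real and generated images) is omitted.

```python
import numpy as np

def radial_spectrum(gray, bins=16):
    """1-D azimuthally averaged log power spectrum of a grayscale image.

    The image is FFT-transformed, the spectrum is centered with fftshift, and
    the log power is averaged over rings of equal radial frequency, giving a
    fixed-length feature vector regardless of image orientation.
    """
    f = np.fft.fftshift(np.fft.fft2(gray.astype(np.float64)))
    power = np.log1p(np.abs(f) ** 2)  # log compresses the large DC term
    h, w = gray.shape
    yy, xx = np.indices((h, w))
    r = np.hypot(yy - h // 2, xx - w // 2)  # fftshift puts DC at (h//2, w//2)
    r_norm = r / r.max()  # assumes the image is larger than 1x1
    idx = np.minimum((r_norm * bins).astype(int), bins - 1)
    sums = np.bincount(idx.ravel(), weights=power.ravel(), minlength=bins)
    counts = np.bincount(idx.ravel(), minlength=bins)
    return sums / np.maximum(counts, 1)  # mean log power per ring
```

Periodic upsampling artifacts would be expected to appear as excess energy in the outer (high-frequency) bins, which is the kind of trace such a classifier learns; as the background notes, newer generators may not leave it.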
Subsequently, the trained classifier identifies the authenticity of each image from the spectrogram obtained by FFT or DCT processing of that image. However, such image detection methods depend heavily on frequency-domain traces, such as the fixed-pattern artifacts left by a specific generative model (e.g., early GANs). When the generative model is upgraded (e.g., a smoother upsampling method replaces transposed convolution) or a new image generation technique appears (e.g., diffusion models), these particular frequency-domain traces (e.g., periodic grid-like artifacts) may disappear or change; that is, the frequency-domain statistical features extracted from the spectrograms of generated images may also disappear or change, causing detection to fail. In view of this, an improved image detection method is needed to achieve accurate detection of generated images.

Disclosure of Invention

One or more embodiments of the present disclosure provide a method and computing device for detecting images, so as to enable accurate detection of generated images.
According to a first aspect, there is provided a method of detecting an image, comprising: dividing a first image to be detected into a plurality of image blocks; performing frequency domain analysis on each image block to obtain a frequency domain analysis result for each image block; determining, based on the frequency domain analysis results, a plurality of first image blocks meeting preset screening conditions from the plurality of image blocks, wherein the preset screening conditions at least comprise that the frequency domain analysis result indicates that the richness of the included high-frequency features ranks in the top N, and the high-frequency features correspond to areas of the image block in which the gray level variation amplitude exceeds a specified threshold; determining, through an extraction network, image features corresponding to the first image based on the plurality of first image blocks; and determining, through a detection network, a detection result based at least on the image features, wherein the detection result indicates at least whether the first image is a generated image.

According to a second aspect, there is provided an apparatus for detecting an image, comprising: a segmentation module configured to segment a first image to be detected into a plurality of image blocks; an obtaining module configured to perform frequency domain analysis on each image block to obtain a frequency domain analysis result for each image block; a first determining module configured to determine, based on the frequency domain analysis results, a plurality of first image blocks meeting preset screening conditions from the plurality of image blocks, wherein the preset screening conditions at least comprise that the frequency domain analysis result indicates that the richness of the included high-frequency features ranks in the top N, and the high-frequency features correspond to areas in which the gray level variation amplitude in the