CN-121982501-A - Method and system for analyzing interpretability of image counterfeiting detection model
Abstract
The invention relates to the technical field of computer vision and discloses a method and system for interpretability analysis of an image forgery detection model. The method comprises: extracting intermediate-layer features of the image forgery detection model to be interpreted and generating semantic feature maps by non-negative matrix factorization with a sparsity constraint; performing feature-importance pre-screening on the semantic feature maps to obtain important feature maps; for each important feature map, locating a high-activation region and extracting an image block, analyzing the dominant frequency component and bandwidth of the image block corresponding to the high-activation region, constructing a band-stop filter in the frequency domain of the original image, generating a perturbed image, and assigning an initial weight to the feature based on a decision distance; and grouping and fusing the features according to the signs of the initial weights to generate final visual-saliency evidence maps supporting forgery and authenticity, respectively. By combining non-negative matrix factorization with frequency-domain perturbation masking, the method effectively improves the interpretability of the image forgery detection model.
Inventors
- YANG HUI
- LIU SHIYU
- KANG RONGBAO
- RAO ZHIHONG
- LIU FANG
- LIU ZHENGJUN
- ZHANG WENBO
Assignees
- 中国电子科技集团公司第三十研究所 (The 30th Research Institute of China Electronics Technology Group Corporation)
Dates
- Publication Date: 2026-05-05
- Application Date: 2026-04-08
Claims (10)
- 1. An image forgery detection model interpretability analysis method, comprising: extracting intermediate-layer features of the image forgery detection model to be interpreted, and generating semantic feature maps by non-negative matrix factorization with a sparsity constraint; performing feature-importance pre-screening on the semantic feature maps to obtain important feature maps; for each important feature map, locating a high-activation region and extracting an image block, analyzing the dominant frequency component and bandwidth of the image block corresponding to the high-activation region, constructing a band-stop filter in the frequency domain of the original image, generating a perturbed image, and assigning an initial weight to each feature based on a decision distance; and grouping and fusing the features according to the signs of the initial weights to generate final visual-saliency evidence maps supporting forgery and authenticity, respectively.
- 2. The method for analyzing the interpretability of the image forgery detection model according to claim 1, wherein extracting the intermediate-layer features of the image forgery detection model to be interpreted and generating the semantic feature maps by non-negative matrix factorization with a sparsity constraint comprises: taking the intermediate-layer output feature tensor $F \in \mathbb{R}^{h \times w \times c}$ of the image forgery detection model to be interpreted and flattening it along the spatial dimensions into a matrix $V \in \mathbb{R}^{hw \times c}$, where $h$, $w$ and $c$ respectively denote the height, width and number of channels of the feature map, and $\mathbb{R}$ denotes the real number field; applying non-negative matrix factorization with an $\ell_1$ sparsity constraint to the transposed matrix $V^{\top}$ to obtain a basis matrix $W$ and a coefficient matrix $H$:
$$\min_{W,H}\ \lVert V^{\top} - WH \rVert_F^2 + \lambda \lVert H \rVert_1, \qquad W \ge 0,\ H \ge 0,$$
where $\lVert\cdot\rVert_F$ denotes the Frobenius norm of a matrix, i.e. the square root of the sum of the squares of all its elements; $\lVert\cdot\rVert_1$ denotes the $\ell_1$ norm, i.e. the sum of the absolute values of the elements of the matrix; each column of the basis matrix $W$ is a semantic basis and each row of the coefficient matrix $H$ is the global activation of one semantic basis; $\lambda$ is the weight coefficient of the regularization term applied to the coefficient matrix $H$ and controls the sparsity strength; and $K$ is the number of basis vectors; and reshaping each row of the coefficient matrix $H$ into a spatial map to obtain a set of $K$ semantic feature maps $\{M_k\}_{k=1}^{K}$, where the $k$-th semantic feature map $M_k$ is obtained by reshaping the $k$-th row of the coefficient matrix $H$, $k = 1, \dots, K$ (an illustrative code sketch of this step follows the claims).
- 3. The method for analyzing the interpretability of the image forgery detection model according to claim 2, wherein after generating the semantic feature maps by the sparsity-constrained non-negative matrix factorization, the method further comprises a step of visualizing the semantic bases: inputting each image of a reference image set $D$ into the same image forgery detection model, extracting the same intermediate-layer features, and computing at each spatial position the dot product between the image's features and each semantic basis, to obtain response maps of the semantic bases over the reference image set $D$; binarizing the response map of the $k$-th semantic basis with a response threshold, setting the value to 1 where the response exceeds the threshold and to 0 otherwise, to obtain the activation mask $A_k$ of that semantic basis on the image; if the reference image set $D$ provides pixel-level semantic annotation, computing for each semantic concept $c$ the intersection-over-union of the activation mask $A_k$ and the semantic label $L_c$:
$$\mathrm{IoU}(A_k, L_c) = \frac{\lvert A_k \cap L_c \rvert}{\lvert A_k \cup L_c \rvert};$$
accumulating the intersection-over-union of the activation mask and each semantic label over all images of the reference image set $D$ to obtain the overall intersection-over-union between the $k$-th semantic basis and semantic concept $c$:
$$\mathrm{IoU}_{k,c} = \frac{\sum_{i=1}^{N} \lvert A_k^{(i)} \cap L_c^{(i)} \rvert}{\sum_{i=1}^{N} \lvert A_k^{(i)} \cup L_c^{(i)} \rvert},$$
where $N$ is the size of the reference image set, and $A_k^{(i)}$ and $L_c^{(i)}$ are respectively the activation mask and the semantic label of the $i$-th image; and taking the one or more semantic labels with the highest overall intersection-over-union as the final semantic labels, so that each semantic basis is assigned a related semantic label, the labels being used subsequently to interpret which specific visual anomaly type corresponds to the highlighted regions in the final saliency maps.
- 4. The method for analyzing the interpretability of the image forgery detection model according to claim 2, wherein performing feature-importance pre-screening on the semantic feature maps to obtain the important feature maps comprises: on a validation set, computing for each semantic feature map $M_k$ an average-activation-intensity vector $a_k$ and the original predictive probability vector $p$ of the image forgery detection model for forged images, where $a_k$ describes the average activity of the semantic feature over all images of the validation set, and $p$ describes the model's original confidence that each image of the validation set is forged; for the $i$-th validation image and its $k$-th semantic feature map $M_k^{(i)}$, the average activation intensity is
$$a_k^{(i)} = \frac{1}{hw} \sum_{x=1}^{h} \sum_{y=1}^{w} M_k^{(i)}(x, y);$$
computing the Pearson correlation coefficient between the average-activation-intensity vector $a_k$ and the original predictive probability vector $p$ as the measure of feature importance:
$$\rho_k = \frac{\mathrm{Cov}(a_k, p)}{\sigma(a_k)\,\sigma(p)},$$
where $\mathrm{Cov}$ denotes covariance and $\sigma$ denotes standard deviation; the closer the absolute value of $\rho_k$ is to 1, the stronger the correlation between the average activation intensity of the corresponding feature and the model's decision confidence; and retaining the features with $\lvert \rho_k \rvert \ge \tau$ to form a set $S$ of important feature maps, where $\tau$ is a preset threshold and $K' \le K$ is the number of semantic features satisfying the condition (an illustrative code sketch of this screening follows the claims).
- 5. The method of claim 4, wherein locating a high-activation region and extracting an image block for each important feature map comprises: in the important feature map $M_k$, selecting all coordinates whose activation value exceeds an activation threshold, or directly selecting the connected region containing the maximum activation value, as the high-activation region; computing the center coordinate or bounding rectangle of the high-activation region, the rectangular region being defined by its upper-left corner $(x_1, y_1)$ and lower-right corner $(x_2, y_2)$; since the important feature map $M_k$ is at a downsampled scale of the original image $I$, mapping the coordinates back to the scale of the original image:
$$X_1 = x_1 \cdot \frac{W_0}{w}, \quad Y_1 = y_1 \cdot \frac{H_0}{h}, \quad X_2 = x_2 \cdot \frac{W_0}{w}, \quad Y_2 = y_2 \cdot \frac{H_0}{h},$$
where $(X_1, Y_1)$ is the upper-left corner and $(X_2, Y_2)$ the lower-right corner of the region in the original image $I$, and $H_0$ and $W_0$ are respectively the height and width of the original image; and cropping the rectangular region from the original image $I$ to obtain the image block $P$ (an illustrative code sketch covering this claim and claim 6 follows the claims).
- 6. The method for analyzing the interpretability of an image forgery detection model of claim 5, wherein analyzing the dominant frequency component and bandwidth of the image block corresponding to the high-activation region comprises: converting the image block $P$ to grayscale and applying the two-dimensional discrete Fourier transform:
$$F(u, v) = \sum_{x=0}^{W_p - 1} \sum_{y=0}^{H_p - 1} p(x, y)\, e^{-j 2\pi \left( \frac{ux}{W_p} + \frac{vy}{H_p} \right)},$$
where $W_p$ and $H_p$ are respectively the width and height of the image block $P$ and $(u, v)$ are the frequency-domain coordinates, and centering the spectrum to obtain the complex spectrum $F_c(u, v)$; to analyze the energy distribution, computing the power spectrum
$$P_s(u, v) = \lvert F_c(u, v) \rvert^2;$$
in the power spectrum, ignoring the center point representing the DC component and searching for the local maximum with the strongest energy, whose frequency coordinate $(u^*, v^*)$ is the dominant frequency component, converted to coordinates $(f_u, f_v)$ relative to the image center, which represent the frequencies of the most prominent periodic pattern in the horizontal and vertical directions of the local image content; and, centered on $(u^*, v^*)$, computing the energy of the power spectrum within increasing radii and, for a preset energy threshold $\eta$, taking the smallest radius at which the accumulated energy reaches the threshold as the bandwidth $b$:
$$b = \min \Big\{ r : \sum_{(u,v) \in \Omega_r} P_s(u, v) \ \ge\ \eta \Big\},$$
where $\Omega_r$ is the region centered at $(u^*, v^*)$ with radius $r$.
- 7. The method of claim 6, wherein constructing a band-stop filter in the frequency domain of the original image and generating a perturbed image comprises: normalizing the spectrum parameters of the image block $P$ to obtain the normalized spatial frequency $(\hat f_u, \hat f_v)$ and the normalized bandwidth radius $\hat b$, and mapping the normalized frequency back to the coordinate system of the centered full-image spectrum as $(u_0, v_0)$; constructing the full-image band-stop filter $B(u, v)$, centered at $(u_0, v_0)$ with a stop band of radius proportional to $\hat b$, where $s$ is a scaling factor and $\alpha$ is the attenuation factor determining the attenuation strength at the stop band; computing the two-dimensional discrete Fourier transform of the original image $I$ to obtain the original spectrum $F_I(u, v)$, and multiplying the original spectrum and the band-stop filter point by point (complex multiplication) to complete the frequency-domain perturbation masking:
$$F'(u, v) = F_I(u, v) \cdot B(u, v);$$
and applying the two-dimensional inverse discrete Fourier transform to the perturbed spectrum $F'(u, v)$ and taking the real part to obtain the perturbed image $I'$ (an illustrative code sketch covering this claim and claim 8 follows the claims).
- 8. The method of claim 7, wherein assigning an initial weight to each feature based on a decision distance comprises: defining the decision distance $d$ from the detection probability $p_f$ of the forged class as the distance of $p_f$ from the decision boundary, $d = \lvert p_f - 0.5 \rvert$; feeding the perturbed image $I'$ into the image forgery detection model again for judgment and, in combination with the decision distance $d$, designing the initial weight $w_k$ of the feature: if the judgment on the perturbed image $I'$ differs from that on the original image $I$, the magnitude of the initial weight $w_k$ is set from the decision distances, and specifically, if the original image $I$ is judged forged and the perturbed image $I'$ is judged real, then $w_k$ is positive, while in the opposite case $w_k$ is negative; if the judgments on $I'$ and $I$ are the same, the initial weight $w_k$ is determined from the decision distances of the two judgments.
- 9. The method for analyzing the interpretability of the image forgery detection model according to claim 8, wherein grouping and fusing the features according to the signs of the initial weights to respectively generate the final visual-saliency evidence maps supporting forgery and authenticity comprises the following steps: feature grouping and weight normalization, i.e. dividing the features by weight sign into a forgery-supporting group $G_f = \{k : w_k > 0\}$ and an authenticity-supporting group $G_r = \{k : w_k < 0\}$, and normalizing the absolute values of the weights within each group to obtain the normalized weights
$$\tilde w_k = \frac{\lvert w_k \rvert}{\sum_{j \in G} \lvert w_j \rvert}, \qquad k \in G,\ G \in \{G_f, G_r\};$$
weighted fusion to generate the initial saliency maps, i.e. weighting and summing the semantic feature maps within the same group with the normalized weights to obtain the initial saliency maps $S_f$ and $S_r$:
$$S = \sum_{k \in G} \tilde w_k\, M_k;$$
spatial optimization, i.e. using the grayscale image $g$ of the original input image as the guide image and applying guided filtering to the initial saliency map $S$ so as to smooth interior regions while preserving edges:
$$S^{*} = \mathrm{GuidedFilter}(g, S;\ r, \varepsilon),$$
where $S^{*}$ is the optimized saliency map, $r$ is the filter radius and $\varepsilon$ is the regularization parameter; and upsampling and final output, i.e. bilinearly upsampling the optimized saliency maps to the original image size $H_0 \times W_0$ to obtain the two final visual-saliency evidence maps $E_f$ and $E_r$, whose highlighted portions respectively support the regions on which the image forgery detection model bases its forgery and authenticity judgments (an illustrative code sketch follows the claims).
- 10. An image forgery detection model interpretability analysis system, comprising: a feature extraction and feature-map generation module configured to extract intermediate-layer features of the image forgery detection model to be interpreted and to generate semantic feature maps by non-negative matrix factorization with a sparsity constraint; a feature-importance pre-screening module configured to perform feature-importance pre-screening on the semantic feature maps to obtain important feature maps; a fine evaluation and weight assignment module configured to locate a high-activation region and extract an image block for each important feature map, analyze the dominant frequency component and bandwidth of the image block corresponding to the high-activation region, construct a band-stop filter in the frequency domain of the original image, generate a perturbed image, and assign an initial weight to each feature based on a decision distance; and a feature fusion and result output module configured to group and fuse the features according to the signs of the initial weights and to generate final visual-saliency evidence maps supporting forgery and authenticity, respectively.
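
The sketches below are illustrative only and are not part of the claims. First, a minimal sketch of the sparsity-constrained NMF step of claim 2, assuming scikit-learn's NMF and a post-ReLU (hence non-negative) activation tensor; the array names, shapes, and the regularization weight are hypothetical:

```python
# Claim 2 sketch: sparsity-constrained NMF over intermediate-layer features.
# Hypothetical shapes; `features` stands in for a real post-ReLU activation tensor.
import numpy as np
from sklearn.decomposition import NMF

h, w, c, K = 14, 14, 256, 8                   # feature-map size, channels, basis count
features = np.random.rand(h, w, c)            # placeholder (non-negative, as after ReLU)

V = features.reshape(h * w, c)                # flatten spatial dims: (hw, c)
# Factorize V^T ~= W H with a pure L1 penalty on H to encourage sparse activations.
nmf = NMF(n_components=K, init="nndsvda", alpha_H=0.1, alpha_W=0.0,
          l1_ratio=1.0, max_iter=500)
W = nmf.fit_transform(V.T)                    # basis matrix (c, K): columns = semantic bases
H = nmf.components_                           # coefficient matrix (K, hw)

semantic_maps = H.reshape(K, h, w)            # each row reshaped into a semantic feature map
```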
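A sketch of the Pearson pre-screening of claim 4, assuming the per-image semantic maps and the model's fake probabilities have already been collected over a validation set; the threshold value is an arbitrary assumption:

```python
# Claim 4 sketch: keep features whose mean activation tracks the model's confidence.
import numpy as np
from scipy.stats import pearsonr

def prescreen(maps_per_image, p_fake, tau=0.3):
    """maps_per_image: (N, K, h, w) semantic maps; p_fake: (N,) fake probabilities."""
    N, K = maps_per_image.shape[:2]
    a = maps_per_image.reshape(N, K, -1).mean(axis=2)   # a_k^(i): mean activation per image
    keep = []
    for k in range(K):
        rho, _ = pearsonr(a[:, k], p_fake)              # correlation with decision confidence
        if abs(rho) >= tau:                             # tau: preset importance threshold
            keep.append(k)
    return keep
```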
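A sketch of claims 5-6: scaling a feature-map bounding box back to image coordinates, then finding the dominant frequency and an energy-based bandwidth in the cropped block. Using the global (rather than a local) spectral maximum after zeroing the DC term, and the energy fraction, are simplifying assumptions:

```python
# Claims 5-6 sketch: coordinate mapping plus dominant-frequency/bandwidth analysis.
import numpy as np

def map_box(box, feat_hw, img_hw):
    """Scale (x1, y1, x2, y2) from feature-map coords to original-image coords."""
    (h, w), (H0, W0) = feat_hw, img_hw
    x1, y1, x2, y2 = box
    return (int(x1 * W0 / w), int(y1 * H0 / h),
            int(x2 * W0 / w), int(y2 * H0 / h))

def dominant_frequency(block_gray, energy_frac=0.5):
    """Return the spectral peak (relative to the center) and an energy-based bandwidth."""
    Hp, Wp = block_gray.shape
    spec = np.fft.fftshift(np.fft.fft2(block_gray))      # centered complex spectrum
    power = np.abs(spec) ** 2                            # power spectrum
    cy, cx = Hp // 2, Wp // 2
    power[cy, cx] = 0.0                                  # ignore the DC component
    v_star, u_star = np.unravel_index(np.argmax(power), power.shape)
    fu, fv = u_star - cx, v_star - cy                    # dominant frequency (centered)
    # Bandwidth: smallest radius around the peak holding `energy_frac` of total energy.
    yy, xx = np.indices(power.shape)
    dist = np.hypot(xx - u_star, yy - v_star)
    order = np.argsort(dist.ravel())
    cum = np.cumsum(power.ravel()[order])
    idx = np.searchsorted(cum, energy_frac * cum[-1])
    bandwidth = dist.ravel()[order][min(idx, cum.size - 1)]
    return (fu, fv), bandwidth
```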
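A sketch of claims 7-8. The claims fix only the overall structure (a band-stop filter around the mapped dominant frequency, point-wise spectral multiplication, inverse transform, and a weight sign from decision flips); the Gaussian notch shape, the symmetric conjugate notch, and the weight magnitudes below are assumptions:

```python
# Claims 7-8 sketch: frequency-domain perturbation masking and decision-distance weight.
import numpy as np

def band_stop_perturb(img_gray, center_uv, radius, alpha=1.0, s=1.0):
    """Attenuate a Gaussian notch at +/-center_uv (offsets from the spectrum center)."""
    H0, W0 = img_gray.shape
    spec = np.fft.fftshift(np.fft.fft2(img_gray))
    vv, uu = np.indices(spec.shape)
    cu, cv = W0 // 2 + center_uv[0], H0 // 2 + center_uv[1]
    cu2, cv2 = W0 // 2 - center_uv[0], H0 // 2 - center_uv[1]  # conjugate-symmetric notch
    sigma = max(s * radius, 1e-6)
    d1 = (uu - cu) ** 2 + (vv - cv) ** 2
    d2 = (uu - cu2) ** 2 + (vv - cv2) ** 2
    B = 1.0 - alpha * (np.exp(-d1 / (2 * sigma ** 2)) + np.exp(-d2 / (2 * sigma ** 2)))
    perturbed_spec = spec * np.clip(B, 0.0, 1.0)               # point-wise masking
    return np.real(np.fft.ifft2(np.fft.ifftshift(perturbed_spec)))

def initial_weight(p_orig, p_pert):
    """Signed weight from a decision flip; the magnitudes are illustrative."""
    d0, d1 = abs(p_orig - 0.5), abs(p_pert - 0.5)              # decision distances
    if (p_orig >= 0.5) != (p_pert >= 0.5):                     # judgment flipped
        sign = 1.0 if p_orig >= 0.5 else -1.0                  # fake -> real: supports forgery
        return sign * (d0 + d1)
    return p_orig - p_pert                                     # no flip: confidence change
```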
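Finally, a sketch of the grouped fusion and spatial optimization of claim 9. It assumes opencv-contrib-python for cv2.ximgproc.guidedFilter, and it upsamples before filtering so that the original-size grayscale image can serve directly as the guide; the radius and epsilon values are assumptions:

```python
# Claim 9 sketch: group by weight sign, fuse, upsample, and guided-filter.
import numpy as np
import cv2

def evidence_maps(semantic_maps, weights, guide_gray, radius=8, eps=1e-3):
    """semantic_maps: (K, h, w); weights: (K,); guide_gray: original-size grayscale."""
    H0, W0 = guide_gray.shape
    out = {}
    for name, group in (("fake", weights > 0), ("real", weights < 0)):
        if not group.any():
            out[name] = np.zeros((H0, W0), np.float32)
            continue
        wg = np.abs(weights[group])
        wg /= wg.sum()                                          # normalize |w| within group
        fused = np.tensordot(wg, semantic_maps[group], axes=1)  # weighted sum of maps
        fused = cv2.resize(fused.astype(np.float32), (W0, H0),
                           interpolation=cv2.INTER_LINEAR)      # bilinear upsampling
        out[name] = cv2.ximgproc.guidedFilter(
            guide_gray.astype(np.float32), fused, radius, eps)  # edge-preserving smoothing
    return out
```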
Description
Method and system for analyzing interpretability of image counterfeiting detection model
Technical Field
The invention relates to the technical field of computer vision, and in particular to a method and system for interpretability analysis of an image forgery detection model.
Background
Image forgery detection techniques based on deep learning have demonstrated a strong ability to identify false content produced by generative artificial intelligence or by conventional image manipulation. However, such high-performance models are commonly regarded as "black boxes": their internal decision process lacks transparency, they output only a binary real/fake classification probability, and they cannot provide human-understandable visual evidence for their decisions. Studies have shown that current image forgery detection techniques have drawbacks in this respect. For example, Nazneen Mansoor et al., in the paper Explainable AI for DeepFake Detection (2025), point out that existing detection systems lack human interpretability and fall short of providing consistent, interpretable explanations for their classification decisions.
To alleviate this problem, the prior art has attempted to introduce interpretability analysis methods, but most have significant limitations. One common approach performs independent post-hoc processing on top of a trained detection model, such as generating a coarse class-activation heat map; its core logic is to measure the importance of each element to the final decision by computing the gradient of the model output with respect to the input pixels or intermediate features. Representative methods include Grad-CAM and its variants. The underlying assumption is that the magnitude of the gradient directly reflects the extent to which small changes in the input affect the output. The limitations, however, are significant: these methods depend heavily on the chosen network layer, and the generated heat maps typically only identify regions relevant to the decision without revealing what visual anomaly within the region actually drives it. In essence, because the explanation is produced after the model's decision is complete, it may deviate from the model's actual inference logic, and it is difficult to map the model's high-dimensional features back to human-understandable semantic concepts.
Another line of research infers feature importance by perturbing the input and observing output changes, for example by blurring or masking a particular region: some portion of the input or of the intermediate features is systematically masked or modified, and the change in the model's output probability is used to infer the importance of the modified portion. The method is intuitive, but its limitations are equally profound. First, it faces a causal-confusion problem: when masking an image region changes the prediction, one cannot determine whether this is because true forgery evidence was removed or because the semantic consistency of the image was broken, triggering a generalized response of the model to anomalies. Second, it faces the computational dilemma of combinatorial explosion: images contain a large number of pixels or feature units, exhaustive perturbation tests over them are computationally infeasible, and relying on heuristic rules to select perturbation regions introduces subjective bias, making the reliability of the explanation questionable.
Meanwhile, performing detection and explanation simultaneously with a multi-modal large language model has in recent years become a frontier research direction. This approach faces bottlenecks that are difficult to overcome: its performance depends heavily on massive paired "image / accurate description" data, while fine-grained annotation of high-fidelity forgery traces is extremely costly and scarce; the model may hallucinate, offering defects that do not exist in the image as explanations; its detection accuracy consistently lags behind that of specialized models; and its enormous computational cost makes it difficult to deploy in real-world scenarios. In summary, the prior art faces a dilemma: on the one hand, high-performance detection models are not trustworthy enough because they are hard to interpret; on the other hand, rudimentary interpretability methods often produce coarse, unstable and unreliable explanations and cannot reveal which internal features the model actually uses as evidence, while the emerging large-model route is limited by its demand for training data, by model hallucination, and by high computational cost.
Disclosure of the Invention
In order to solve the problem that model decisions are unreliable because their features and evidence are opaque