CN-121982471-A - Multi-mode fusion fake image identification method, equipment and medium


Abstract

The invention relates to the field of counterfeit image identification and provides a multi-modal fusion method, device and medium for identifying counterfeit images. The method comprises: extracting multi-modal features covering visual, physical and semantic modalities; adaptively fusing the multi-modal features to obtain multi-modal fused features; performing bidirectional trajectory verification based on the multi-modal fused features to obtain a bidirectional trajectory difference; identifying counterfeit images based on the bidirectional trajectory difference; and using the bidirectional trajectory difference to trace the forged regions in a counterfeit image and output tracing evidence. The invention addresses the low recognition accuracy, poor generalization to unknown forgery techniques and poor interpretability of recognition results that affect current counterfeit image detection.

Inventors

LIU FANG
YANG HUI
RAO ZHIHONG
KANG RONGBAO
XU RUI
LIU SHIYU

Assignees

中国电子科技集团公司第三十研究所 (The 30th Research Institute of China Electronics Technology Group Corporation)

Dates

Publication Date: 20260505
Application Date: 20260407

Claims (10)

1. A multi-modal fusion method for identifying counterfeit images, comprising: extracting multi-modal features covering visual, physical and semantic modalities; adaptively fusing the multi-modal features to obtain multi-modal fused features; performing bidirectional trajectory verification based on the multi-modal fused features to obtain a bidirectional trajectory difference; identifying a counterfeit image based on the bidirectional trajectory difference; and tracing the forged region in the counterfeit image by using the bidirectional trajectory difference, and outputting tracing evidence.
2. The method according to claim 1, wherein extracting the multi-modal features covering visual, physical and semantic modalities comprises: extracting visual modal features, whose sub-features comprise pixel-level features and texture-level features; extracting physical modal features, whose sub-features comprise illumination features and shadow features; and extracting semantic modal features.
3. The method according to claim 2, wherein adaptively fusing the multi-modal features to obtain the multi-modal fused features comprises: constructing a modal weight prediction sub-network, and inputting the information entropy of each modal feature into the sub-network to obtain a weight coefficient for each modal feature; computing, with a cross-modal attention mechanism, association matrices between the sub-features of different modalities, and adjusting the corresponding weight coefficients through the attention mechanism based on the association matrices; and fusing the multi-modal features based on the weight coefficients to obtain the multi-modal fused features.
4. The method according to claim 1, wherein performing bidirectional trajectory verification based on the multi-modal fused features to obtain the bidirectional trajectory difference comprises: taking the multi-modal fused features as the initial state of a forward trajectory and performing noise-based forward trajectory computation over a plurality of time steps, which simulates the natural degradation of a real image into noise; taking random noise as the initial state of a reverse trajectory and performing noise-based reverse trajectory computation over a plurality of time steps, which simulates the generation of image features from noise; and computing, with the Euclidean norm, the difference between the forward trajectory and the reverse trajectory at each time step, and accumulating these differences to obtain a bidirectional trajectory total difference index.
5. The method according to claim 4, wherein the noise-based forward trajectory computation over a plurality of time steps is expressed by a formula whose terms denote: the forward state at the t-th time step of the forward trajectory; the forward state at the (t-1)-th time step of the forward trajectory; the dynamic noise coefficient at the t-th time step; and Gaussian noise, following a standard normal distribution, sampled during the forward trajectory computation.
6. The method according to claim 4, wherein the noise-based reverse trajectory computation over a plurality of time steps is expressed by a formula whose terms denote: the reverse state at the t-th time step of the reverse trajectory; the reverse state at the (t+1)-th time step of the reverse trajectory; the dynamic noise coefficient at the t-th time step; the adaptive sampling variance at the t-th time step; the sampling noise; and the predicted noise at the t-th time step.
7. The method according to claim 4, wherein identifying the counterfeit image based on the bidirectional trajectory difference comprises: comparing the bidirectional trajectory total difference index with an optimal decision threshold; if the index exceeds the optimal decision threshold, the image is judged to be a counterfeit image, and otherwise it is judged to be a real image; wherein the optimal decision threshold is obtained by training on real image samples and counterfeit image samples.
8. The method according to claim 1, wherein tracing the forged region in the counterfeit image by using the bidirectional trajectory difference and outputting tracing evidence comprises: mapping the time steps at which the bidirectional trajectory difference exceeds a local threshold back to image blocks in the image, and marking those image blocks as suspected forged blocks; computing the similarity between the multi-modal fused features of the forged region and each category of modal features in a multi-modal forgery-type feature library, the category with the highest similarity that also exceeds a second set value being taken as the forgery-type judgment result; and outputting the tracing evidence, which comprises a modal difference heat map, a feature comparison table and a judgment report, wherein the modal difference heat map visually displays the distribution of differences under each modality, the feature comparison table quantifies the differences between the forged region and real regions in terms of modal features, and the judgment report contains the bidirectional trajectory total difference, the forgery-type judgment result and a confidence computed from the similarity.
9. An electronic device, comprising: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor, and the instructions, when executed, cause the at least one processor to perform the method of any one of claims 1-8.
10. A computer readable storage medium for storing instructions that, when executed, cause the method of any one of claims 1-8 to be implemented.
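
Claim 3 does not specify the architecture of the modal weight prediction sub-network or how the attention output adjusts the weights. The following is a minimal, stdlib-only Python sketch of the three fusion steps. It rests on several assumptions. Each modality is reduced to one feature vector of equal length, so sub-features are not modelled separately. Information entropy is estimated from a 16-bin histogram. The sub-network is replaced by a hypothetical scaled softmax over the entropies. Each weight is then rescaled by the average attention that modality receives in a scaled dot-product association matrix. The names `fuse`, `shannon_entropy` and `predictor_scale` are illustrative and do not come from the patent.

```python
import math
from collections import Counter


def shannon_entropy(values, bins=16):
    """Histogram-based Shannon entropy (in bits) of one modality's feature vector."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return 0.0  # constant vector: a single bin, zero entropy
    width = (hi - lo) / bins
    counts = Counter(min(int((v - lo) / width), bins - 1) for v in values)
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def fuse(features, predictor_scale=1.0):
    """Fuse a dict {modality name: feature vector} into one vector.

    Returns (adjusted weight per modality, fused feature vector).
    """
    names = list(features)
    vecs = [features[name] for name in names]
    k, dim = len(vecs), len(vecs[0])

    # Step 1 (hypothetical weight predictor): softmax over scaled entropies.
    weights = softmax([predictor_scale * shannon_entropy(v) for v in vecs])

    # Step 2: cross-modal association matrix via scaled dot-product attention;
    # row i holds how strongly modality i attends to every modality j.
    scores = [[sum(a * b for a, b in zip(vi, vj)) / math.sqrt(dim) for vj in vecs]
              for vi in vecs]
    attention = [softmax(row) for row in scores]
    # Average attention each modality receives; all entries sum to k,
    # so these k values sum to 1.
    received = [sum(attention[i][j] for i in range(k)) / k for j in range(k)]
    adjusted = [w * r for w, r in zip(weights, received)]
    norm = sum(adjusted)
    adjusted = [a / norm for a in adjusted]

    # Step 3: weighted sum of the modality vectors.
    fused = [sum(adjusted[m] * vecs[m][d] for m in range(k)) for d in range(dim)]
    return dict(zip(names, adjusted)), fused


if __name__ == "__main__":
    feats = {
        "visual": [0.1, 0.9, 0.4, 0.7],
        "physical": [0.2, 0.2, 0.3, 0.2],
        "semantic": [0.5, 0.1, 0.8, 0.6],
    }
    weights, fused = fuse(feats)
    print(weights)
    print(fused)
```

The adjusted weights are renormalized to sum to one, so the fused vector is a convex combination of the modality vectors. A trained sub-network would take the place of the fixed softmax over entropies.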
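
The formulas in claims 5 and 6 are not reproduced in this text. Their listed terms are a per-step dynamic noise coefficient, standard Gaussian noise, an adaptive sampling variance, sampling noise and a predicted noise. These match the forward and reverse steps of a standard denoising diffusion probabilistic model (DDPM). One plausible reading, offered only as an assumption, is given below with these symbols:

- β_t is the dynamic noise coefficient.
- x_0 is the multi-modal fused feature.
- ε_θ is a hypothetical trained noise predictor.
- σ_t² is the sampling variance.

Claim 6 indexes the reverse step from t+1 to t. The DDPM convention used below (t to t-1) is the same step shifted by one index.

```latex
\begin{aligned}
&\text{Forward (claim 5):} && x_t = \sqrt{1-\beta_t}\,x_{t-1} + \sqrt{\beta_t}\,\epsilon_t,
  \quad \epsilon_t \sim \mathcal{N}(0, I),\quad t = 1,\dots,T \\
&\text{Closed form:} && x_t = \sqrt{\bar\alpha_t}\,x_0 + \sqrt{1-\bar\alpha_t}\,\epsilon,
  \quad \epsilon \sim \mathcal{N}(0, I),\quad \alpha_t = 1-\beta_t,\quad \bar\alpha_t = \textstyle\prod_{s=1}^{t}\alpha_s \\
&\text{Reverse (claim 6):} && \hat{x}_{t-1} = \frac{1}{\sqrt{\alpha_t}}\Big(\hat{x}_t - \frac{\beta_t}{\sqrt{1-\bar\alpha_t}}\,\epsilon_\theta(\hat{x}_t, t)\Big) + \sigma_t z,
  \quad z \sim \mathcal{N}(0, I),\quad \hat{x}_T \sim \mathcal{N}(0, I) \\
&\text{Total difference (claim 4):} && D = \sum_{t=0}^{T} \lVert x_t - \hat{x}_t \rVert_2
\end{aligned}
```

In the DDPM literature, two common choices of sampling variance are σ_t² = β_t and σ_t² = ((1 - ᾱ_{t-1}) / (1 - ᾱ_t)) β_t. The claim's "adaptive" variance may differ from both.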
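
The verification loop of claim 4 can be sketched in plain Python under the same DDPM-style assumption, since the patent's exact step formulas are not given in this text. The sketch proceeds in three steps:

1. It runs the forward trajectory from the fused feature.
2. It runs the reverse trajectory from random Gaussian noise, using a caller-supplied noise predictor.
3. It sums the per-step Euclidean distances between the two trajectories.

`predict_noise` stands in for a trained noise-prediction network and is hypothetical. The zero predictor below only exercises the code and has no detection power.

```python
import math
import random


def euclidean(a, b):
    """Euclidean (L2) norm of the difference between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def forward_trajectory(x0, betas, rng):
    """Assumed step x_t = sqrt(1 - b_t) * x_{t-1} + sqrt(b_t) * eps; returns x_0..x_T."""
    states = [list(x0)]
    for b in betas:
        prev = states[-1]
        states.append([math.sqrt(1 - b) * v + math.sqrt(b) * rng.gauss(0.0, 1.0)
                       for v in prev])
    return states


def reverse_trajectory(dim, betas, predict_noise, rng):
    """Assumed DDPM reverse step from x_T ~ N(0, I) down to x_0; returns x_0..x_T."""
    T = len(betas)
    alpha_bar, prod = [], 1.0
    for b in betas:
        prod *= 1 - b
        alpha_bar.append(prod)  # alpha_bar[t - 1] is the cumulative product up to step t
    states = [None] * (T + 1)
    x = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    states[T] = x
    for t in range(T, 0, -1):
        b, ab = betas[t - 1], alpha_bar[t - 1]
        eps = predict_noise(x, t)
        sigma = math.sqrt(b) if t > 1 else 0.0  # no noise added on the final step
        x = [(v - b / math.sqrt(1 - ab) * e) / math.sqrt(1 - b)
             + sigma * rng.gauss(0.0, 1.0)
             for v, e in zip(x, eps)]
        states[t - 1] = x
    return states


def trajectory_difference(x0, betas, predict_noise, seed=0):
    """Total and per-step L2 differences between forward and reverse trajectories."""
    rng = random.Random(seed)
    fwd = forward_trajectory(x0, betas, rng)
    rev = reverse_trajectory(len(x0), betas, predict_noise, rng)
    per_step = [euclidean(f, r) for f, r in zip(fwd, rev)]
    return sum(per_step), per_step


def zero_noise_predictor(x, t):
    """Hypothetical placeholder for a trained noise-prediction network."""
    return [0.0] * len(x)


if __name__ == "__main__":
    betas = [0.01 * (i + 1) for i in range(10)]  # linear schedule, T = 10
    total, per_step = trajectory_difference([0.3, -0.2, 0.5], betas, zero_noise_predictor)
    print(f"total difference: {total:.4f} over {len(per_step)} steps")
```

The per-step list matters as well as the total, because claim 8 uses the steps at which the difference exceeds a local threshold to locate suspected forged blocks.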
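
Claim 7 says only that the optimal decision threshold is "obtained by training" on real and counterfeit samples. One simple reading, assumed here rather than taken from the patent, is an exhaustive sweep that picks the threshold maximizing accuracy on labelled training scores. Other criteria, such as Youden's J statistic on the ROC curve, could be swapped in the same way. The candidate thresholds cover every possible split of the training scores:

- the midpoints between consecutive distinct scores;
- one value below the minimum score;
- the maximum score itself.

The function names are hypothetical.

```python
def optimal_threshold(scores, labels):
    """Pick the accuracy-maximizing threshold; label 1 = counterfeit, 0 = real.

    An image is judged counterfeit when its total difference index exceeds the threshold.
    """
    values = sorted(set(scores))
    candidates = ([values[0] - 1.0]
                  + [(a + b) / 2 for a, b in zip(values, values[1:])]
                  + [values[-1]])

    def accuracy(threshold):
        hits = sum((s > threshold) == bool(y) for s, y in zip(scores, labels))
        return hits / len(scores)

    best = max(candidates, key=accuracy)  # the first candidate wins ties
    return best, accuracy(best)


def classify(total_difference, threshold):
    return "counterfeit" if total_difference > threshold else "real"


if __name__ == "__main__":
    train_scores = [1.0, 1.5, 2.0, 3.5, 4.0, 5.0]
    train_labels = [0, 0, 0, 1, 1, 1]
    t, acc = optimal_threshold(train_scores, train_labels)
    print(t, acc)  # 2.75 1.0
    print(classify(3.1, t))  # counterfeit
```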
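
The first two steps of claim 8 are flagging suspected forged blocks and matching a forgery type. The claim does not say how time steps correspond to image blocks, or which similarity measure is used. The sketch below therefore makes three assumptions:

- a caller-supplied, hypothetical `step_to_block` mapping from time steps to image blocks;
- cosine similarity as the similarity measure;
- the best similarity reported as the confidence.

The forgery-type names and prototype vectors in the usage are illustrative only.

```python
import math


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def suspected_blocks(step_differences, local_threshold, step_to_block):
    """Map every time step whose difference exceeds the local threshold to its block."""
    return sorted({step_to_block[t] for t, d in enumerate(step_differences)
                   if d > local_threshold})


def match_forgery_type(region_feature, type_library, second_set_value):
    """Return (best type or None, its similarity) against a {type: prototype} library."""
    sims = {name: cosine_similarity(region_feature, proto)
            for name, proto in type_library.items()}
    best = max(sims, key=sims.get)
    return (best if sims[best] > second_set_value else None), sims[best]


if __name__ == "__main__":
    blocks = suspected_blocks([0.1, 0.8, 0.3, 0.9], 0.5,
                              {0: "block_0", 1: "block_1", 2: "block_1", 3: "block_2"})
    print(blocks)  # ['block_1', 'block_2']
    library = {
        "face_swap": [1.0, 0.0, 0.0],
        "splicing": [0.0, 1.0, 0.0],
        "full_synthesis": [0.0, 0.0, 1.0],
    }
    print(match_forgery_type([0.9, 0.1, 0.0], library, 0.8))
```

When no category clears the second set value, the sketch returns `None` together with the best similarity. This reflects the claim's requirement that the winning category must both rank highest and exceed that value.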

Description

Technical Field

The invention relates to the field of counterfeit image identification, and in particular to a multi-modal fusion method, device and medium for identifying counterfeit images.

Background

Image forgery technology is developing rapidly. It has progressed from early simple pixel tampering to high-fidelity face generation based on GANs (Generative Adversarial Networks), to Deepfake face swapping, and to full-scene image forgery using diffusion models. The visual difference between forged and real images keeps shrinking, which poses a major challenge to verifying the authenticity of information.

Existing counterfeit image recognition technology has clear shortcomings. First, feature extraction covers only a single dimension. Most methods rely solely on pixel statistics or shallow texture features, and struggle to capture deeper anomalies in a forged image's semantic logic or physical properties. For example, they cannot recognize hidden forgery traces such as "the lighting direction on a face contradicts that of the background" or "an object's shadow does not match the position of the light source". Second, feature fusion is simplistic. The few methods that attempt to combine multiple features only concatenate them or take a weighted sum, ignoring the complementarity and correlation between features from different modalities. This leaves high feature redundancy and dilutes the useful information after fusion, so recognition accuracy stalls. Third, generalization is weak. A model trained on one specific forgery method loses much of its accuracy when it faces a novel forgery technique; for example, a model trained on GAN-forged images typically achieves less than 65% accuracy on such newer forgeries.
Fourth, interpretability is poor. Most existing recognition models are "black box" models, such as deep neural networks, that output only a "fake/real" classification. They cannot locate forged regions, explain the forgery type (such as face swapping or splicing), or identify the principle behind the forgery technique. As a result, fields such as the judiciary and journalism find their results hard to trust.

Therefore, an image forgery identification technology that combines high accuracy, robustness and strong interpretability is needed to solve the above technical problems.

Disclosure of Invention

The invention aims to provide a multi-modal fusion method, device and medium for identifying counterfeit images. It addresses the low recognition accuracy, poor generalization to unknown forgery techniques and poor interpretability of recognition results in current counterfeit image detection.

In a first aspect, the present invention provides a multi-modal fusion method for identifying counterfeit images, comprising: extracting multi-modal features covering visual, physical and semantic modalities; adaptively fusing the multi-modal features to obtain multi-modal fused features; performing bidirectional trajectory verification based on the multi-modal fused features to obtain a bidirectional trajectory difference; identifying a counterfeit image based on the bidirectional trajectory difference; and tracing the forged region in the counterfeit image by using the bidirectional trajectory difference, and outputting tracing evidence.
In a preferred embodiment, extracting the multi-modal features covering visual, physical and semantic modalities comprises: extracting visual modal features, whose sub-features comprise pixel-level features and texture-level features; extracting physical modal features, whose sub-features comprise illumination features and shadow features; and extracting semantic modal features.

In a preferred embodiment, adaptively fusing the multi-modal features to obtain multi-modal fused features comprises: constructing a modal weight prediction sub-network, and inputting the information entropy of each modal feature into the sub-network to obtain a weight coefficient for each modal feature; computing, with a cross-modal attention mechanism, association matrices between the sub-features of different modalities, and adjusting the corresponding weight coefficients through the attention mechanism based on the association matrices; and fusing the multi-modal features based on the weight coefficients to obtain the multi-modal fused features.

In a preferred embodiment, performing bidirectional trajectory verification based on the multi-modal fused features to obtain a bidirectional trajectory difference