CN-121999540-A - Face counterfeiting detection positioning and implicit identity tracing method
Abstract
The invention belongs to the technical fields of computer vision and information security, and provides a face counterfeiting detection positioning and implicit identity tracing method. The method extracts multi-modal features through three parallel branches for RGB, frequency domain and noise information; performs dynamic weighted fusion on the extracted multi-modal features to generate a fused feature map; introduces a generalized forgery detection adapter that uses the pre-trained visual language model CLIP to construct an external general forgery knowledge cache, which is adaptively and linearly fused with the original discrimination result; trains a ViT model through contrastive learning to distinguish background features and constructs a background feature library keyed by identity tags; and extracts the background features of a query image, calculates and ranks their similarity against the feature library, and finally determines the implicit identity of the query image. The method achieves accurate forgery localization, strong robustness and strong cross-domain generalization, supports identity tracing, and provides a feasible technical scheme for the comprehensive governance of deepfake content.
Inventors
- SHI HUI
- LI FEI
- LIU MINGYANG
Assignees
- 辽宁师范大学 (Liaoning Normal University)
Dates
- Publication Date
- 20260508
- Application Date
- 20260121
Claims (7)
- 1. The face counterfeiting detection positioning and implicit identity tracing method is characterized by comprising the following steps: Step 1, carrying out face detection on an input image through a YOLO algorithm, dividing it into a face part and a background part, inputting the face part into MDFNet for image forgery detection and localization, analyzing the image synchronously through three parallel feature extraction branches, and extracting the corresponding RGB features, frequency domain features and noise features respectively; Step 2, dynamically weighting and fusing the extracted multi-modal features to generate a fused feature map with high discrimination of forgery traces; Step 3, introducing a forgery detection adapter, constructing an external general forgery knowledge cache by utilizing the pre-trained visual language model CLIP, and performing adaptive linear fusion with the original discrimination result of MDFNet; Step 4, training a ViT model through contrastive learning to distinguish background features, and using the model to construct a feature library associating feature vectors with identity tags for subsequent tracing; and Step 5, extracting the background features of a query image, performing similarity calculation with the feature library, sorting the results, indexing the identity tag of the image according to the best matching result, and finally determining the implicit identity of the image.
- 2. The face counterfeiting detection positioning and implicit identity tracing method according to claim 1, wherein step 1 comprises the following specific steps: Step 1.1, carrying out face detection on the input image through a YOLO algorithm, dividing it into a face part and a background part; after the face part enters the MDFNet module, the main branch is processed by an internal Transformer encoder to extract a semantic feature map capable of representing the high-level semantic content of the image; Step 1.2, extracting RGB features: first, hierarchical features of the original input image are extracted sequentially through three cascaded Transformer modules, the generated features are then enhanced by the multi-scale attention-based fusion module MACF, and finally the enhanced features are input into a fourth Transformer module to extract the RGB feature map, as shown in formula (1): F_rgb = T4(M(T3(T2(T1(I))))) (1); wherein F_rgb represents the finally generated RGB features; T1, T2, T3 and T4 represent the first, second, third and fourth Transformer modules respectively; M represents the MACF module; I is the original input image; Step 1.3, extracting frequency domain features: first, a two-dimensional Fourier transform is applied to the original input image through the frequency domain branch to obtain an amplitude spectrum, which is then mapped by a lightweight convolutional encoder E to extract a frequency domain feature map capable of revealing global structural artifacts and periodic anomalies, as shown in formula (2); hierarchical features are then extracted sequentially through three cascaded Transformer modules, enhanced by the MACF module, and finally input into the fourth Transformer module, as shown in formula (3): X_freq = E(log(|FFT(I_gray)| + ε)) (2); F_freq = T4(M(T3(T2(T1(X_freq))))) (3); wherein F_freq represents the finally generated frequency domain features; X_freq represents the generated frequency domain feature map; FFT(·) represents the two-dimensional Fourier transform operation; I_gray represents the grayscale version of the input image; ε is a very small constant for preventing logarithmic operation errors; Step 1.4, extracting noise features: initial noise features X_noise are first extracted from the RGB image through a high-pass filter, the features are then hierarchically processed by the cascaded Transformer modules, enhanced by the MACF module, and finally processed by the fourth Transformer module to generate the noise feature map, as shown in formula (4): F_noise = T4(M(T3(T2(T1(X_noise))))) (4); wherein F_noise represents the finally generated noise features; X_noise represents the generated noise domain feature map.
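The frequency and noise branch inputs of steps 1.3 and 1.4 can be sketched as follows. This is a minimal illustration assuming numpy and a grayscale array; the 3×3 Laplacian is only one common choice of high-pass kernel (the claim does not fix a filter), and the downstream Transformer/MACF stages are omitted.

```python
import numpy as np

def log_amplitude_spectrum(gray, eps=1e-8):
    """Frequency-branch input (cf. formula (2)): log(|FFT(I_gray)| + eps),
    where eps guards against log(0)."""
    return np.log(np.abs(np.fft.fft2(gray)) + eps)

def high_pass_residual(gray):
    """Initial noise features (cf. step 1.4): convolve with a 3x3 Laplacian
    high-pass kernel — one common choice, not specified by the claim."""
    k = np.array([[0, -1, 0], [-1, 4, -1], [0, -1, 0]], dtype=float)
    pad = np.pad(gray, 1, mode="edge")
    h, w = gray.shape
    out = np.zeros((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = np.sum(pad[i:i + 3, j:j + 3] * k)
    return out

img = np.random.default_rng(0).random((16, 16))
spec = log_amplitude_spectrum(img)   # input to the frequency-domain branch
res = high_pass_residual(img)        # input to the noise branch
```

Note that a constant (artifact-free) region yields a zero noise residual, which is why high-pass filtering isolates tampering noise from image content.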
- 3. The face counterfeiting detection positioning and implicit identity tracing method according to claim 2, wherein the dynamic weighted fusion of the multi-modal features in step 2 comprises the following specific steps: Step 2.1, realizing dynamic fusion of the multi-modal features by adaptively weighting and summing F_rgb, F_freq and F_noise, with the fusion formula shown in (5): F_fus = w_rgb·F_rgb + w_freq·F_freq + w_noise·F_noise (5); wherein F_fus is the final fused feature map; w_rgb, w_freq and w_noise are dynamically generated fusion weights and satisfy w_rgb + w_freq + w_noise = 1; Step 2.2, deep fusion is carried out on the fused multi-modal features by the space-aware feature recalibration module SAFR, and the fused unified features are finally input into a multi-layer perceptron to generate the initial classification discrimination result y_init, as shown in formula (10): y_init = MLP(SAFR(F_fus)) (10); wherein SAFR(·) represents the operation of the space-aware feature recalibration module for refining the fused feature map; MLP(·) represents the operation of the multi-layer perceptron.
- 4. The face counterfeiting detection positioning and implicit identity tracing method according to claim 3, wherein step 2.1 specifically comprises the following steps: Step 2.1.1, the features F_rgb, F_freq and F_noise of the three branches are spliced in the channel dimension to form a fusion candidate tensor F_cat, as shown in formula (6): F_cat = Concat(F_rgb, F_freq, F_noise) ∈ R^(B×3C×H×W) (6); wherein Concat(·) is the splicing operation; B indicates the batch size; C represents the number of channels; H represents the height of the feature map and W represents the width of the feature map; B×3C×H×W is the data dimension of the fusion candidate tensor F_cat; Step 2.1.2, a 1×1 convolution is used to perform channel compression and information mixing on F_cat, as shown in formula (7), and a global average pooling operation is applied to generate a vector z capable of representing image-level context information, as shown in formula (8): U = δ(Conv_1×1(F_cat)) (7); z = GAP(U) (8); wherein Conv_1×1(·) is the 1×1 convolution operation; δ is a nonlinear activation function; GAP(·) is the global average pooling operation; U is an intermediate feature representation; Step 2.1.3, the vector z is input into a fully connected layer and normalized by a Softmax function to generate the fusion weights w_rgb, w_freq and w_noise, as shown in formula (9): [w_rgb, w_freq, w_noise] = Softmax(FC(z)) (9); wherein [w_rgb, w_freq, w_noise] represents the fusion weight tensor, the fusion weights being normalized by the Softmax function for the weighted combination of multiple features or pieces of information; FC(·) represents the operation of the fully connected layer.
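The dynamic weighting pipeline of formulas (6)–(9), followed by the fusion of formula (5), can be sketched in numpy as below. The 1×1 convolution and fully connected layer are random stand-ins for learned parameters, and the batch dimension is dropped for brevity; ReLU is assumed as the nonlinear activation.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def fusion_weights(f_rgb, f_freq, f_noise, rng):
    C, H, W = f_rgb.shape
    f_cat = np.concatenate([f_rgb, f_freq, f_noise], axis=0)  # (3C, H, W), formula (6)
    w1 = rng.normal(size=(C, 3 * C))                          # 1x1 conv == per-pixel linear map
    u = np.maximum(0.0, np.einsum("oc,chw->ohw", w1, f_cat))  # formula (7), ReLU activation
    z = u.mean(axis=(1, 2))                                   # global average pooling, formula (8)
    w2 = rng.normal(size=(3, C))                              # fully connected layer
    return softmax(w2 @ z)                                    # formula (9)

f_rgb, f_freq, f_noise = (rng.random((4, 8, 8)) for _ in range(3))
w = fusion_weights(f_rgb, f_freq, f_noise, rng)
fused = w[0] * f_rgb + w[1] * f_freq + w[2] * f_noise         # formula (5)
```

Because the weights come from a Softmax over image-level context, they are non-negative and sum to 1, so each input image gets its own branch mixture rather than a fixed one.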
- 5. The face counterfeiting detection positioning and implicit identity tracing method according to claim 4, wherein step 3 comprises the following specific steps: Step 3.1, first, partial images are selected from a plurality of data sets to construct the cache model; for each selected cache image x_i, the pre-trained and frozen CLIP image encoder E extracts its high-dimensional visual features, which after L2 normalization serve as the cache Key k_i, as shown in formula (11): k_i = E(x_i) / ||E(x_i)||_2 (11); second, the real label y_i corresponding to each cached image x_i is converted into one-hot encoded form as the cache Value v_i, as shown in formula (12): v_i = OneHot(y_i) (12); wherein OneHot(·) represents the one-hot encoding operation; finally, all extracted Key-Value pairs are stored to form the cache model C, as shown in formula (13): C = {(k_i, v_i)}, i = 1, ..., N (13); wherein C represents the cache model; Step 3.2, the image to be detected x_test is input in parallel into MDFNet and the frozen CLIP image encoder E; the classification module MLP of MDFNet generates the initial discrimination result, while E extracts q = E(x_test) as the query feature, ensuring consistency with the cache feature space; the cosine similarity between the query feature q and all cache Keys is calculated, and a non-negative weight A_i is generated based on the similarity, as shown in formula (14): A_i = exp(-β(1 - q·k_i)) (14); wherein β is a hyperparameter for controlling the intensity of the similarity decay; the prediction P_cache of the cache model for the query is obtained by weighting the cache Values, as in formula (15): P_cache = Σ_i A_i·v_i (15); the prediction P_cache of the cache model and the MDFNet initial classification discrimination result y_init are linearly weighted and fused to generate the final classification discrimination result y_final, as in formula (16): y_final = λ1·P_cache + λ2·y_init (16); wherein λ1 and λ2 are both hyperparameters for balancing the contributions of P_cache and y_init in the final discrimination result.
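The cache adapter of step 3 (formulas (11)–(16)) follows the style of training-free key-value cache models. The numpy sketch below uses random vectors as stand-ins for CLIP features and a placeholder MDFNet output; beta, lam1 and lam2 are hyperparameters, and all symbol names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
N, D, K = 6, 8, 2          # cache size, feature dim, number of classes

def l2norm(x):
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

keys = l2norm(rng.normal(size=(N, D)))       # L2-normalized features, formula (11)
labels = rng.integers(0, K, size=N)
values = np.eye(K)[labels]                   # one-hot cache Values, formula (12)

def cache_predict(q, beta=5.0):
    q = l2norm(q)                            # keep query in the cache feature space
    affinity = np.exp(-beta * (1.0 - keys @ q))  # non-negative weights, formula (14)
    return affinity @ values                 # weighted sum of Values, formula (15)

q = rng.normal(size=D)                       # stand-in for the CLIP query feature
p_cache = cache_predict(q)
p_init = np.array([0.3, 0.7])                # stand-in for MDFNet's initial result
lam1, lam2 = 0.5, 1.0                        # balancing hyperparameters
p_final = lam1 * p_cache + lam2 * p_init     # linear fusion, formula (16)
```

Since the CLIP encoder is frozen, the cache can be extended with new forgery examples without retraining, which is what gives the adapter its cross-domain generalization role.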
- 6. The face counterfeiting detection positioning and implicit identity tracing method according to claim 5, wherein step 4 comprises the following specific steps: Step 4.1, according to background image data of known implicit identity, image pairs are labeled: images from the same identity serve as positive sample pairs and images from different identities as negative sample pairs, providing clear contrastive signals for model learning; Step 4.2, the image samples are encoded by a separate ViT module V to extract their deep background feature representations, as shown in formula (17): f_a = V(x_a), f_b = V(x_b) (17); wherein V(·) represents the ViT module; f_a and f_b are the features extracted from x_a and x_b respectively; Step 4.3, the similarity between the features of positive and negative sample pairs is respectively calculated, and a loss function based on cosine similarity is adopted, defined as shown in formula (18): L = y·(1 - s(f_1, f_2)) + (1 - y)·max(0, s(f_1, f_2) - m) (18); wherein f_1 and f_2 are the feature vectors obtained by passing a pair of images through the feature extraction module V; y = 1 indicates that the pair of images has the same implicit identity and is a positive sample pair; y = 0 indicates that the pair of images has different implicit identities and is a negative sample pair; s(·,·) represents the cosine similarity calculation; m is a hyperparameter for adjusting the loss of negative samples, whose function is to pull apart the distances between different identity features; max(·) represents taking the maximum value of its arguments; Step 4.4, a feature database is constructed: the trained feature extraction module V processes a group of implicit identity image sets with known sources, extracts the implicit identity features of each image, and stores the features together with the corresponding identity tags to form the feature database D, which serves as the reference set for implicit identity comparison, as shown in formula (19): D = {(f_j, id_j)}, j = 1, ..., M (19); wherein f_j = V(x_j) and x_j represents the j-th image of known identity; id_j represents its corresponding identity tag; M is the total number of features in the database.
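The cosine-similarity contrastive loss of step 4.3 can be written directly from formula (18) as reconstructed above: positive pairs (y = 1) are penalized for similarity below 1, negative pairs (y = 0) only when their similarity exceeds the margin m. A pure-Python sketch, with the margin value chosen arbitrarily:

```python
import math

def cosine(a, b):
    """Cosine similarity of two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def contrastive_loss(f1, f2, y, m=0.2):
    """Formula (18): pull positive pairs together, push negative
    pairs below the margin m."""
    s = cosine(f1, f2)
    return y * (1.0 - s) + (1 - y) * max(0.0, s - m)

# identical features as a positive pair incur zero loss
assert contrastive_loss([1.0, 0.0], [1.0, 0.0], y=1) == 0.0
```

The max(0, ·) term means sufficiently dissimilar negative pairs contribute nothing, so training effort concentrates on hard negatives whose backgrounds still look alike.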
- 7. The face counterfeiting detection positioning and implicit identity tracing method according to claim 6, wherein step 5 comprises the following specific steps: Step 5.1, for the image to be traced x_q, the trained feature extraction module V is used to extract its implicit identity feature f_q, as shown in formula (20): f_q = V(x_q) (20); Step 5.2, the cosine similarity between the implicit identity feature f_q and each feature f_j stored in the feature database D is calculated, as shown in formula (21): s_j = s(f_q, f_j), j = 1, ..., M (21); wherein s(·,·) is the similarity calculation function; Step 5.3, all similarity scores are sorted, and the identity tag corresponding to the feature with the highest score is determined as the tracing result, as shown in formula (22): id* = id_(j*), j* = argmax_j s_j (22); wherein f_(j*) is the known image feature in the database most similar to the query image feature; id* is the identity tag used to determine the final implicit identity.
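The retrieval of step 5 (formulas (20)–(22)) reduces to a ranked nearest-neighbor lookup by cosine similarity. A minimal sketch with toy feature vectors standing in for ViT background features:

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

def trace_identity(query, database):
    """database: list of (feature_vector, identity_tag) pairs.
    Scores every stored feature (formula (21)), sorts, and returns
    the tag of the best match (formula (22))."""
    scores = [(cosine(query, f), tag) for f, tag in database]
    scores.sort(reverse=True)        # Step 5.3: rank by similarity
    return scores[0][1]

db = [([1.0, 0.0, 0.0], "id_A"), ([0.0, 1.0, 0.0], "id_B")]
print(trace_identity([0.9, 0.1, 0.0], db))  # -> id_A
```

In practice the top score would also be thresholded so that queries with no close match in the database can be rejected rather than force-assigned an identity, though the claim itself only specifies the argmax.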
Description
Face counterfeiting detection positioning and implicit identity tracing method Technical Field The invention belongs to the technical fields of computer vision and information security, and particularly relates to a face counterfeiting detection positioning and implicit identity tracing method. Background Most existing deepfake detection methods depend on a single feature space (such as the RGB space) and suffer from problems such as a single dimension of forgery features and insufficient multi-modal feature fusion. This detection paradigm based on a single domain or limited features severely constrains the generalization ability of a model when facing unknown forgery techniques or different data sets (i.e., cross-domain scenarios). In recent years, some studies have attempted to statically fuse different types of features (such as semantics and textures), but static fusion is difficult to adjust adaptively according to the forgery characteristics of an image, which limits the flexibility of a model in cross-compression-rate and variable forgery scenarios and leads to poor generalization in complex, unknown forgery environments. In addition, most existing feature extraction methods adopt a global processing strategy, making it difficult to maintain sensitivity to global artifacts while accurately capturing local fine-grained forgery traces, which easily causes feature redundancy or loss of key information. Moreover, most existing face forgery detection technologies only identify authenticity or locate forged regions, lack tracing of the source of forged images, and cannot effectively trace the source in the face of false information propagation. Disclosure of Invention The embodiment of the invention aims to provide a face counterfeiting detection positioning and implicit identity tracing method, so as to solve the problems raised in the background art.
The embodiment of the invention relates to a face counterfeiting detection positioning and implicit identity tracing method, which comprises the following steps: Step 1, carrying out face detection on an input image through a YOLO algorithm, dividing it into a face part and a background part, inputting the face part into a multi-modal dynamic fusion network (Multi-modal Dynamic Fusion Network for Image Forgery Localization, MDFNet) for image forgery detection and localization, analyzing the image synchronously through three parallel feature extraction branches, and extracting the corresponding RGB features, frequency domain features and noise features respectively; Step 2, dynamically weighting and fusing the extracted multi-modal features to generate a fused feature map with high discrimination of forgery traces; Step 3, introducing a forgery detection adapter, constructing an external general forgery knowledge cache by utilizing the pre-trained visual language model CLIP, and performing adaptive linear fusion with the original discrimination result of MDFNet; Step 4, training a ViT model through contrastive learning to distinguish background features, and using the model to construct a feature library associating feature vectors with identity tags for subsequent tracing; and Step 5, extracting the background features of a query image, performing similarity calculation with the feature library, sorting the results, indexing the identity tag of the image according to the best matching result, and finally determining the implicit identity of the image.
According to a further technical scheme, step 1 comprises the following specific steps: Step 1.1, carrying out face detection on the input image through a YOLO algorithm, dividing it into a face part and a background part; after the face part enters the MDFNet module, the main branch is processed by an internal Transformer encoder to extract a semantic feature map capable of representing the high-level semantic content of the image; Step 1.2, extracting RGB features: first, hierarchical feature extraction is carried out on the original input image through three cascaded Transformer modules, the generated features are then enhanced by a multi-scale attention-based fusion module (Multi-scale Attention-based Contextual Fusion, MACF), and finally the enhanced features are input into a fourth Transformer module to extract the RGB feature map, as shown in formula (1): F_rgb = T4(M(T3(T2(T1(I))))) (1); wherein F_rgb represents the finally generated RGB features; T1, T2, T3 and T4 represent the first, second, third and fourth Transformer modules respectively; M represents the MACF module; I is the original input image; Step 1.3, extracting frequency domain features: first, a two-dimensional Fourier transform (FFT) is applied to the original input image through the frequency domain branch to obtain an amplitude spectrum, which is then mapped by a lightweight convolutional encoder to extract a frequency domain feature map capable of revealing global str