CN-121999502-A - Document falsification detection method based on dynamic kernel fusion and adaptive gradient modulation

CN121999502A

Abstract

The invention relates to a document falsification detection method based on dynamic kernel fusion and adaptive gradient modulation, and belongs to the technical field of digital forensics and computer vision. The method comprises: acquiring an RGB image of the document to be detected; constructing a feature encoder comprising a dual-branch feature extraction module and a feature updating module, with ConvNeXt V-Base as the backbone, fusing visual and frequency-domain features through a cross-domain feature calibration module, and performing learnable wavelet decomposition to obtain optimized multi-scale features; constructing a frequency-domain adaptive feature decoder that completes dynamic kernel fusion through Fourier band grouping weights, kernel element precise modulation and space-frequency band dynamic matching, and outputs a tamper prediction map; and, during training, jointly optimizing with an adaptive gradient cosine loss, a cross-entropy loss and a Lovász loss. The method can accurately capture the frequency anomalies and traces of document falsification, effectively alleviates the class imbalance problem, and achieves high detection accuracy and robustness under different compression scenarios.

Inventors

  • GAO ZAN
  • WU CHUANG
  • ZHAO YIBO
  • MA CHUNJIE
  • LI CHEN
  • HU NIAN
  • LI XINHUI

Assignees

  • Tianjin University of Technology (天津理工大学)
  • Shandong Artificial Intelligence Institute (山东省人工智能研究院)

Dates

Publication Date
2026-05-08
Application Date
2026-04-10

Claims (10)

  1. A document falsification detection method based on dynamic kernel fusion and adaptive gradient modulation, characterized by comprising the following steps: S1, acquiring an RGB image of the document to be detected; S2, constructing a feature encoder comprising a dual-branch feature extraction module and a feature updating module, wherein the dual-branch feature extraction module comprises a ConvNeXt V-Base backbone network and a cross-domain feature calibration module, and the feature updating module comprises a preprocessing unit, a frequency component separation unit, a frequency component enhancement unit, a cross-scale fusion unit and a final feature optimization unit; the RGB image is processed by the ConvNeXt V-Base backbone network and the cross-domain feature calibration module to obtain an initial multi-scale visual feature set and a calibrated fusion feature, the set consisting of a first, a second, a third and a fourth visual feature at progressively coarser resolutions defined relative to the height H and width W of the RGB image; the calibrated fusion feature replaces the second visual feature in the initial set to give an updated multi-scale visual feature set, which the feature updating module processes through a learnable wavelet decomposition operator to obtain an optimized multi-scale feature set; S3, constructing a frequency-domain adaptive feature decoder and completing deep feature fusion through a dynamic kernel fusion mechanism, wherein the decoder comprises a multi-scale feature unified mapping unit and a frequency-domain adaptive convolution module, the latter comprising a Fourier band grouping weight (FBGW) sub-module, a kernel element precise modulation (KEPM) sub-module and a space-frequency band dynamic matching (SFM) sub-module; the optimized multi-scale feature set is processed by the frequency-domain adaptive feature decoder to generate the final output prediction map; S4, during training, jointly optimizing the model parameters with an adaptive gradient cosine loss, a cross-entropy loss and a Lovász loss.
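As a structural sketch only, the four steps of claim 1 can be arranged as a minimal pipeline. Every function below is a placeholder stub; the names `backbone`, `calibrate`, `wavelet_update` and `decode`, the channel width 32, and the scale factors 4/8/16/32 are assumptions typical of such backbones, not values taken from the patent. The sketch shows only how the feature sets flow from encoder to decoder.

```python
import numpy as np

def backbone(img):
    """Stub for the ConvNeXt V-Base backbone: emit four visual
    features at progressively coarser spatial scales (factors assumed)."""
    H, W, _ = img.shape
    return [np.random.rand(H // s, W // s, 32) for s in (4, 8, 16, 32)]

def calibrate(img, f2):
    """Stub for the cross-domain feature calibration module: fuse a
    frequency-domain cue with the second visual feature."""
    return f2 + 0.1  # placeholder for the calibrated fusion feature

def wavelet_update(feats):
    """Stub for the learnable wavelet decomposition operator."""
    return feats  # placeholder: optimized multi-scale feature set

def decode(feats, out_hw):
    """Stub for the frequency-domain adaptive decoder: produce a
    per-pixel tamper prediction map at image resolution."""
    return np.zeros(out_hw)

def detect(img):
    feats = backbone(img)                 # S2: dual-branch extraction
    feats[1] = calibrate(img, feats[1])   # replace the second visual feature
    feats = wavelet_update(feats)         # feature updating module
    return decode(feats, img.shape[:2])   # S3: prediction map

pred = detect(np.random.rand(256, 256, 3))  # S1: RGB input
print(pred.shape)  # (256, 256)
```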
  2. The document falsification detection method based on dynamic kernel fusion and adaptive gradient modulation according to claim 1, wherein the cross-domain feature calibration module in step S2 specifically comprises: converting the RGB image into the YCbCr color space and applying a discrete cosine transform to the Y channel to extract a frequency feature; concatenating the frequency feature and the second visual feature along the channel dimension to obtain a first concatenated feature; applying global average pooling to each channel of the first concatenated feature, compressing the spatial pixel values of each channel into a single per-channel statistic, to generate a channel feature vector; processing the channel feature vector with a weight generator to obtain channel weights, the weight generator comprising a fully connected layer (a dimension-reducing fully connected layer followed by a dimension-increasing fully connected layer) and a Sigmoid activation function; multiplying the channel weights element-wise with the first concatenated feature along the channel dimension to generate a channel-weighted feature; applying global max pooling and global average pooling over the channel dimension of the channel-weighted feature to obtain a max-pooled feature map and an average-pooled feature map, concatenating the two along the channel dimension to obtain a second concatenated feature, reducing the second concatenated feature to one channel with a 1×1 convolution layer, and passing it through a Sigmoid activation function to obtain spatial weights; multiplying the spatial weights element-wise with the channel-weighted feature over the spatial dimensions to generate a channel-space dual-weighted feature; adjusting the channel count of the dual-weighted feature with a 1×1 convolution layer, a batch normalization layer and a ReLU activation function to obtain an enhanced frequency-domain feature; and, after processing the enhanced frequency-domain feature with a zero-initialized convolution layer, adding it element-wise to the second visual feature to obtain the calibrated fusion feature.
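A minimal NumPy sketch of the channel-then-spatial weighting described in claim 2. The learned fully connected layers are passed in as raw matrices (biases omitted) and the learned 1×1 convolution over the pooled maps is replaced by a fixed 2-tap mix, so this illustrates only the data flow, not the patent's trained module.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def dual_weight(x, w_down, w_up):
    """Channel-space dual weighting of a (H, W, C) feature map.
    w_down, w_up: assumed stand-ins for the dimension-reducing and
    dimension-increasing fully connected layers."""
    # Channel weights: global average pool -> FC down -> FC up -> sigmoid.
    v = x.mean(axis=(0, 1))               # per-channel statistic, shape (C,)
    cw = sigmoid(v @ w_down @ w_up)       # channel weights, shape (C,)
    xc = x * cw                           # channel-weighted feature
    # Spatial weights: channel-wise max and average pooling, then a fixed
    # 2-tap mix standing in for the learned 1x1 convolution, then sigmoid.
    mx = xc.max(axis=2, keepdims=True)
    av = xc.mean(axis=2, keepdims=True)
    sw = sigmoid(0.5 * mx + 0.5 * av)     # spatial weights, (H, W, 1)
    return xc * sw                        # channel-space dual-weighted feature

rng = np.random.default_rng(0)
x = rng.normal(size=(8, 8, 4))
w_down = rng.normal(size=(4, 2))
w_up = rng.normal(size=(2, 4))
y = dual_weight(x, w_down, w_up)
print(y.shape)  # (8, 8, 4)
```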
  3. The document falsification detection method based on dynamic kernel fusion and adaptive gradient modulation according to claim 2, wherein the preprocessing unit in step S2 specifically passes the updated multi-scale visual feature set through a 1×1 convolution layer to unify the channel number and outputs a standardized feature set.
  4. The document falsification detection method based on dynamic kernel fusion and adaptive gradient modulation according to claim 3, wherein the frequency component separation unit in step S2 specifically processes the features in the standardized feature set with an average-aggregation convolution kernel and a difference-aware convolution kernel, and then generates the corresponding low-frequency component features and high-frequency component features through average downsampling with a stride of 2.
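A plain NumPy sketch of the claim-4 style separation: an average-aggregation kernel gives the smooth part, a difference-aware kernel (identity minus the average) gives the detail part, and stride-2 average downsampling yields the two component features. The 3×3 kernel size and edge padding are assumptions; the translation does not specify them.

```python
import numpy as np

def box3(x):
    """3x3 average-aggregation kernel with edge padding (assumed size)."""
    p = np.pad(x, 1, mode="edge")
    return sum(p[i:i + x.shape[0], j:j + x.shape[1]]
               for i in range(3) for j in range(3)) / 9.0

def separate_bands(x):
    """Split a (H, W) feature map into low- and high-frequency component
    features via stride-2 average downsampling of the smooth and detail
    parts respectively."""
    smooth = box3(x)          # average-aggregation path
    detail = x - smooth       # difference-aware path (identity - average)
    ds = lambda y: y.reshape(y.shape[0] // 2, 2,
                             y.shape[1] // 2, 2).mean(axis=(1, 3))
    return ds(smooth), ds(detail)

x = np.arange(16.0).reshape(4, 4)
low, high = separate_bands(x)
print(low.shape, high.shape)  # (2, 2) (2, 2)
```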
  5. The document falsification detection method based on dynamic kernel fusion and adaptive gradient modulation according to claim 4, wherein the frequency component enhancement unit in step S2 specifically applies an adaptive enhancement flow to the low-frequency component features and the high-frequency component features respectively to obtain a low-frequency enhanced component and a high-frequency enhanced component; the operation comprises: passing the low-frequency component features through a lightweight feature processing block for dimension adjustment and nonlinear mapping, then sequentially applying a channel attention and a spatial attention mechanism, and taking the Hadamard product of the attended result with the low-frequency component features to obtain the low-frequency enhanced component; the high-frequency component features are processed in the same way to obtain the high-frequency enhanced component.
  6. The document falsification detection method based on dynamic kernel fusion and adaptive gradient modulation according to claim 5, wherein the cross-scale fusion unit in step S2 specifically processes the low-frequency enhanced component set and the high-frequency enhanced component set by guided feature aggregation and window linear modeling to obtain a fused low-frequency component and a fused high-frequency component; taking the high-frequency enhanced component set as an example, the specific processes of window linear modeling and guided feature aggregation are as follows. Window linear modeling: the features in the high-frequency enhanced component set are ordered by scale as a first-scale feature, a second-scale feature, a third-scale feature and a fourth-scale feature, the first-scale feature having the largest scale; taking the first-scale feature as the reference, the second-, third- and fourth-scale features are up-sampled to the resolution of the first-scale feature to obtain the second, third and fourth preliminary alignment features, while the first-scale feature is left unchanged, its spatial dimensions remaining the same, and is recorded as the first preliminary alignment feature; the four preliminary alignment features are divided into windows, and a scaling parameter and an offset parameter are learned for each window, so that the single-window aligned feature of the k-th preliminary alignment feature (k = 1, 2, 3, 4) equals its single-window feature multiplied by the scaling parameter of that window plus the offset parameter of that window; the aligned features of all windows are concatenated to obtain the alignment features. Guided feature aggregation: the first-scale feature is processed by a 1×1 convolution layer and a Sigmoid activation function to generate a guidance weight map; each alignment feature is assigned a scale weight according to its scale contribution; the fused high-frequency component is generated from the guidance weight map, the scale weights and the alignment features by element-wise multiplication and summation; the fused low-frequency component is obtained in the same way.
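A NumPy sketch of the two claim-6 operations. The per-window scale and offset are passed in explicitly (in the patent they are learned), and the exact aggregation formula is lost in the translation, so a guide-modulated weighted sum is assumed.

```python
import numpy as np

def window_linear_align(feat, gamma, beta, win=4):
    """Window linear modeling: split a (H, W) aligned feature map into
    non-overlapping win x win windows and apply a per-window scale gamma
    and offset beta (window size 4 is an assumption)."""
    H, W = feat.shape
    out = feat.copy()
    for wi, i in enumerate(range(0, H, win)):
        for wj, j in enumerate(range(0, W, win)):
            out[i:i + win, j:j + win] = (gamma[wi, wj] * feat[i:i + win, j:j + win]
                                         + beta[wi, wj])
    return out

def guided_aggregate(aligned, guide, scale_w):
    """Guided feature aggregation (assumed form): sum the aligned
    features weighted by their scale contributions, then modulate
    element-wise by the guidance weight map."""
    return guide * sum(w * a for w, a in zip(scale_w, aligned))

rng = np.random.default_rng(1)
f = rng.normal(size=(8, 8))
gamma = np.ones((2, 2))
beta = np.zeros((2, 2))
assert np.allclose(window_linear_align(f, gamma, beta), f)  # identity case
fused = guided_aggregate([f, f], guide=np.full((8, 8), 0.5), scale_w=[0.3, 0.7])
print(fused.shape)  # (8, 8)
```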
  7. The document falsification detection method based on dynamic kernel fusion and adaptive gradient modulation according to claim 6, wherein the final feature optimization unit in step S2 specifically comprises: up-sampling the fused high-frequency component by bilinear interpolation to the resolution of the first visual feature, concatenating the up-sampled fused high-frequency component with the first visual feature along the channel dimension to obtain a third concatenated feature, and feeding the third concatenated feature into a 1×1 convolution layer to generate a high-frequency dominant feature; up-sampling the fused low-frequency component by bilinear interpolation to the resolution of the fourth visual feature, concatenating the up-sampled fused low-frequency component with the fourth visual feature along the channel dimension to obtain a fourth concatenated feature, and feeding the fourth concatenated feature into a 1×1 convolution layer to generate a low-frequency dominant feature; finally obtaining the optimized multi-scale feature set.
  8. The document falsification detection method based on dynamic kernel fusion and adaptive gradient modulation according to claim 7, wherein the multi-scale feature unified mapping unit in step S3 specifically maps the features in the optimized multi-scale feature set to a unified embedding dimension through a lightweight MLP module, up-samples them to the same spatial resolution, and concatenates them along the channel dimension to obtain a fifth concatenated feature.
  9. The document falsification detection method based on dynamic kernel fusion and adaptive gradient modulation according to claim 8, wherein the frequency-domain adaptive convolution module in step S3 specifically comprises: the Fourier band grouping weight FBGW sub-module, which, given fixed initial parameter weights as input, reshapes the fixed initial parameter weights into a Fourier-domain spectral coefficient matrix; the Fourier indices, i.e. the horizontal and vertical coordinates of the spectral coefficient matrix, are sorted from low to high by their norm and uniformly divided into disjoint parameter groups; each parameter group is converted into spatial-domain data by an inverse discrete Fourier transform, and the spatial-domain data are cropped and reassembled into standard convolution weights, forming 10 weight groups with differentiated frequency responses. The kernel element precise modulation KEPM sub-module comprises a local channel perception branch and a global channel guidance branch, with the following specific process: the fifth concatenated feature is fed into the local channel perception branch, which captures local channel information of the document with a lightweight 1D convolution to obtain a dense modulation matrix; the fifth concatenated feature is also fed into the global channel guidance branch, which extracts global statistics by global average pooling and, through three different fully connected layers with three different weight matrices and three different biases, outputs an input-channel modulation value, an output-channel modulation value and a kernel-space modulation value, whose outer product yields a sparse modulation vector; the dense modulation matrix and the sparse modulation vector are fused by a Hadamard product to obtain the final modulation matrix; the standard convolution weights and the final modulation matrix undergo a Hadamard product to obtain the modulated parallel weights; attention coefficients generated from the fifth concatenated feature control the i-th modulated parallel weight, and the fused weight is output as the attention-weighted sum of the modulated parallel weights, the attention coefficients summing to 1. The specific process of the space-frequency band dynamic matching SFM sub-module is as follows: the fused weight is zero-padded to the size of the fifth concatenated feature and, using binary masks divided by octaves, the zero-padded fused weight is decomposed into 4 band weights from very low frequency up to high frequency, the band weight of the b-th band being obtained by applying the discrete Fourier transform, the mask, and the inverse discrete Fourier transform; based on the convolution theorem, the feature response of each band is computed directly in the Fourier domain, yielding the convolved output feature of the b-th band; the fifth concatenated feature, after dimension reduction by a 1×1 convolution layer and a Sigmoid activation function, generates a spatial modulation map for each band; the spatial modulation map is multiplied with the band output by broadcast multiplication, and the modulated features of all bands are summed to obtain the final output prediction map.
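A single-channel NumPy sketch of the SFM idea: decompose a zero-padded weight into octave band weights with binary Fourier-domain masks, compute each band's response via the convolution theorem, modulate by a per-band spatial map, and sum. The octave edges and the circular (rather than zero-padded) convolution are assumptions made to keep the sketch short.

```python
import numpy as np

def octave_masks(n, bands=4):
    """Binary masks over an n x n Fourier grid, split by octaves of the
    radial frequency; the top band absorbs everything above the last edge."""
    fy = np.fft.fftfreq(n)[:, None]
    fx = np.fft.fftfreq(n)[None, :]
    r = np.hypot(fy, fx)
    edges = [0.5 / 2 ** k for k in range(bands - 1, 0, -1)] + [np.inf]
    masks, lo = [], -1.0
    for hi in edges:
        masks.append((r > lo) & (r <= hi))
        lo = hi
    return masks

def sfm_response(x, weight, spatial_maps):
    """Band-wise filtering of a real (n, n) feature map x by a small
    kernel, with per-band spatial modulation maps."""
    n = x.shape[0]
    W = np.fft.fft2(weight, s=(n, n))   # zero-pad weight to feature size
    X = np.fft.fft2(x)
    out = np.zeros_like(x)
    for mask, smap in zip(octave_masks(n), spatial_maps):
        band = np.fft.ifft2(X * (W * mask)).real  # convolution theorem
        out += smap * band                        # broadcast modulation
    return out

rng = np.random.default_rng(2)
x = rng.normal(size=(16, 16))
k = rng.normal(size=(3, 3))
maps = [np.full((16, 16), 1.0)] * 4              # all-pass spatial maps
y = sfm_response(x, k, maps)
full = np.fft.ifft2(np.fft.fft2(x) * np.fft.fft2(k, s=(16, 16))).real
print(np.allclose(y, full))  # True: the bands sum back to the full response
```

Because the masks partition the Fourier grid, the four band responses with all-pass modulation maps reconstruct the full circular convolution exactly, which is a quick sanity check on the decomposition.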
  10. The document falsification detection method based on dynamic kernel fusion and adaptive gradient modulation according to claim 9, wherein the total loss function is defined as the weighted sum of the adaptive gradient cosine loss, the cross-entropy loss and the Lovász loss, with three different loss weights respectively.
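The claim-10 combination is a plain weighted sum. In the sketch below only the cross-entropy term is implemented; the adaptive gradient cosine loss and the Lovász loss ("Lowatt"/"Lowz" in the machine translation, read here as Lovász) are passed in as precomputed scalars, since the translation does not define them in reproducible detail.

```python
import numpy as np

def cross_entropy(p, y, eps=1e-7):
    """Pixel-wise binary cross-entropy on predicted tamper probabilities."""
    p = np.clip(p, eps, 1 - eps)
    return float(-(y * np.log(p) + (1 - y) * np.log(1 - p)).mean())

def total_loss(p, y, l_agc, l_lovasz, lam=(1.0, 1.0, 1.0)):
    """Weighted combination of the three loss terms; l_agc and l_lovasz
    are placeholder scalars standing in for the adaptive gradient cosine
    and Lovász losses."""
    l1, l2, l3 = lam
    return l1 * l_agc + l2 * cross_entropy(p, y) + l3 * l_lovasz

p = np.array([[0.9, 0.1], [0.2, 0.8]])
y = np.array([[1.0, 0.0], [0.0, 1.0]])
print(round(total_loss(p, y, l_agc=0.05, l_lovasz=0.10), 4))  # 0.3143
```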

Description

Document falsification detection method based on dynamic kernel fusion and adaptive gradient modulation

Technical Field

The invention belongs to the technical field of digital forensics and computer vision, and particularly relates to a document falsification detection method based on dynamic kernel fusion and adaptive gradient modulation.

Background

With the rapid development of digitization, the digital storage and transmission of documents have become increasingly common, but document falsification techniques are also evolving and tampering methods are increasingly covert, such as text replacement, content erasure, copy-paste and seal forgery, which seriously threaten information security and social trust. Document falsification detection is one of the core technologies of digital forensics; it aims to accurately identify and localize tampered regions in a document and provides technical support for verifying information authenticity. Existing document falsification detection methods have three main shortcomings. First, feature extraction is one-sided: most methods rely on visual features and ignore frequency-domain anomaly cues, so fine differences such as JPEG compression artifacts and inconsistent high-frequency noise at tampered edges are hard to capture. Second, multi-scale feature fusion is ineffective: a traditional convolutional network, limited by spatial invariance, cannot effectively perceive local frequency mutations in strongly structured components such as text and tables. Third, the loss function design is unreasonable: a tampered region usually occupies only 1%-5% of the image, so a traditional loss function (such as the cross-entropy loss) is biased toward the background region by class imbalance and its ability to capture fine tampering traces is insufficient.
Therefore, a document falsification detection method that fuses multi-domain features, accurately captures frequency anomalies and adapts to class imbalance is urgently needed to improve detection accuracy and robustness.

Disclosure of the Invention

The object of the invention is achieved by the following technical scheme. The invention provides a document falsification detection method based on dynamic kernel fusion and adaptive gradient modulation, comprising the following steps: S1, acquiring an RGB image of the document to be detected; S2, constructing a feature encoder comprising a dual-branch feature extraction module and a feature updating module, wherein the dual-branch feature extraction module comprises a ConvNeXt V-Base backbone network and a cross-domain feature calibration module, and the feature updating module comprises a preprocessing unit, a frequency component separation unit, a frequency component enhancement unit, a cross-scale fusion unit and a final feature optimization unit; the RGB image is processed by the ConvNeXt V-Base backbone network and the cross-domain feature calibration module to obtain an initial multi-scale visual feature set and a calibrated fusion feature, the set consisting of a first, a second, a third and a fourth visual feature at progressively coarser resolutions defined relative to the height H and width W of the RGB image; the calibrated fusion feature replaces the second visual feature in the initial set to give an updated multi-scale visual feature set, which the feature updating module processes through a learnable wavelet decomposition operator to obtain an optimized multi-scale feature set; S3, constructing a frequency-domain adaptive feature decoder, designed for document structural characteristics and frequency anomaly characteristics, and completing deep feature fusion through a dynamic kernel fusion mechanism; the decoder comprises a multi-scale feature unified mapping unit and a frequency-domain adaptive convolution module; the frequency-domain adaptive convolution module completes feature fusion and achieves precise capture of frequency features, and comprises a Fourier band grouping weight FBGW sub-module, a kernel element precise modulation KEPM sub-module and a space-frequency band dynamic matching SFM sub-module; the optimized multi-scale feature set is processed by the frequency-domain adaptive feature decoder to generate the final output prediction map; S4, during training, adopting an adaptive gradient cosine loss, a cross-entropy loss and a Lovász loss to jointly optimize the model parameters.