
CN-122001987-A - Method and system for quickly generating robust watermark for screen shooting based on structural perception

CN122001987A

Abstract

The invention provides a method and a system for quickly generating a screen-shooting-robust watermark based on structural perception. The method comprises: step 1, mapping the original watermark information into a high-dimensional continuous latent space with redundancy characteristics; step 2, constructing a condition-guided watermark residual generation network that directly maps the redundant watermark vector into the distribution space of watermark residual signals under an image-structure prior, achieving joint control of the amplitude and spatial distribution of the residual signal; step 3, achieving fast watermark extraction through multi-stage downsampling and a channel-constraint mechanism; and step 4, combining several continuous illumination-field generation mechanisms with an imaging distortion model to construct combined multi-scene illumination disturbances. The invention shifts watermark embedding from direct optimization in the watermark image domain to structural modeling of the generated watermark residual, markedly reducing computational complexity while improving robustness under screen-shooting conditions.

Inventors

  • Zhang Jianbo
  • Wang Baowei
  • Cui Qi
  • Meng Ruohan
  • Du Xilei
  • Yang Gaobo

Assignees

  • Nanjing University of Information Science and Technology (南京信息工程大学)
  • Jiangsu Yuchi Blockchain Technology Research Institute Co., Ltd. (江苏羽驰区块链科技研究院有限公司)

Dates

Publication Date
2026-05-08
Application Date
2026-01-22

Claims (10)

  1. A method for quickly generating a screen-shooting-robust watermark based on structural perception, characterized by comprising the following steps: step 1, preprocessing the watermark information to be embedded: constructing a learnable distortion-aware latent representation and mapping the original watermark information into a high-dimensional continuous latent space with redundancy characteristics, so as to predefine a redundant representation of the watermark signal; step 2, constructing a condition-guided watermark residual generation network: without explicitly modeling the semantic features of the original image, constraining the mapping of the redundant watermark representation using an image-structure prior, so that the redundant watermark vector adaptively generates watermark residual signals satisfying amplitude and spatial-distribution constraints; step 3, constructing a lightweight watermark extraction network based on a trusted support domain: the extraction network adopts a multi-stage downsampling paradigm, introduces trusted-support-domain convolution for feature extraction at each stage, and reduces computational complexity while maintaining watermark discrimination accuracy through channel-constraint and redundancy-elimination mechanisms, achieving real-time watermark extraction; and step 4, introducing a screen-shooting simulation layer in the training stage: through collaborative modeling of a continuous illumination-field generation mechanism and an imaging distortion model, the simulation layer constructs a joint distortion model describing the spatially continuous brightness variation and imaging non-idealities of the screen-shooting process, uniformly representing the various shooting interference factors introduced by illumination changes and the imaging process.
  2. The method of claim 1, wherein step 1 comprises: step 1-1, representing the original watermark information as a binary vector of length 64 bits, embedding it into a continuous real-valued space via a differentiable embedding map, and constructing a multi-component redundancy mapping operator controlled by random deactivation that projects the continuous watermark representation into a high-dimensional continuous redundancy space; the operator comprises several redundancy mapping components, each with its own weight matrix and bias vector, whose outputs are masked dimension-wise by random deactivation vectors drawn from a Bernoulli distribution with a given drop rate, producing a redundant watermark feature vector of length 256 bits; this multi-component random mapping introduces controlled random perturbation into the redundancy space, yielding a watermark representation robust to random loss and nonlinear distortion; step 1-2, applying energy normalization and distribution-reassignment constraints to the redundant representation, with a small stability term preventing a zero denominator in the normalization, and adjusting its distribution through a differentiable redundancy reassignment function to obtain the redundant representation finally used for watermark residual generation; step 1-3, modeling distortion-aware redundancy decoding and information consistency: applying strong random perturbation to the redundant representation and reconstructing it back to the watermark information space through a redundancy decoding operator, likewise controlled by Bernoulli random deactivation with a given drop rate, with per-component weight matrices and bias vectors; step 1-4, before integrating the redundancy mapping and decoding operators into the watermark generation and extraction networks, performing noise-aware pre-training that minimizes a multi-constraint joint objective between the original and reconstructed watermark information: a first term constrains watermark-information consistency through a smooth continuous-to-discrete approximate mapping, a second term limits the statistical dispersion (variance) of the redundant representation, and a third term suppresses the overall bias drift of the redundant representation, each term carrying a weight coefficient.
  3. The method according to claim 2, wherein in step 1-4, after pre-training is completed, the redundancy mapping operator and the redundancy decoding operator are embedded as differentiable redundancy-constraint modules into the end-to-end joint training of the watermark residual generation network and the watermark extraction network, and keep their parameters updatable in the joint optimization stage, so as to adaptively adjust the redundancy representation structure.
  4. A method according to claim 3, wherein step 2 comprises: step 2-1, reshaping the redundant watermark feature vector through a tensor reshaping operator into a two-dimensional spatial representation, constructing a spatial prior for subsequent convolution operations and yielding an initial watermark feature tensor; step 2-2, applying to the initial watermark feature tensor a convolution layer (with learnable weight parameters and bias) that expands the channel dimension and models local features, followed by batch normalization and a ReLU activation, to form the basic feature representation; step 2-3, introducing a stage-wise upsampling network of cascaded transposed-convolution modules, each stage doubling the spatial resolution of the feature map while halving the channel count, until the final watermark template feature map matches the carrier image in height and width; step 2-4, constructing structural condition information based on a gradient energy field, extracted from the carrier image with the Sobel operator, specifically: step 2-4-1, defining the horizontal and vertical Sobel gradient operators; step 2-4-2, convolving the carrier image with them to obtain the horizontal and vertical gradient responses; step 2-4-3, defining the structural condition map as the normalized gradient energy field, in which the element-wise absolute values of the two gradient responses are combined and linearly normalized to the interval [0, 1]; step 2-5, concatenating the watermark template feature map and the structural condition map along the channel dimension into a joint feature representation, applying a convolution module for feature fusion, compressing the channel count through a 1x1 convolution layer, and generating the watermark residual signal R through a Tanh activation; step 2-6, constructing a Markov discriminator under structural-condition constraints, whose input is the channel-wise concatenation of the generated watermark residual signal and the structural condition map; the discriminator models the input layer by layer through multi-layer local-receptive-field convolutions with LeakyReLU activations after each convolution layer, and finally maps the multi-channel features through a 1x1 convolution into a single-channel discrimination score map used for adversarial learning and local-consistency constraint; each spatial position of the two-dimensional score map is the discrimination score of a local patch of the input image, approximating the log conditional probability that the local watermark residual follows the true watermark pattern under the structural-condition constraint; since the discriminator judges only the watermark residual distribution within each local patch given the structural condition, the Markov local-conditional independence assumption is satisfied, and the overall conditional discrimination probability decomposes into the product of the local discrimination probabilities; step 2-7, superimposing the watermark residual signal on the carrier image by linear addition in pixel space, scaled by an intensity coefficient that balances robustness against imperceptibility, to generate the final watermarked image.
  5. The method of claim 4, wherein step 3 comprises: step 3-1, representing the input image of the lightweight trusted-support-domain watermark extraction network by its height and width, and processing it as follows: step 3-1-1, constructing a multi-stage feature transformation with a set of learnable stage mapping operators, each stage having its own parameter set and a feature recurrence relation that produces a feature tensor with its own channel count, height, and width; step 3-1-2, for the intermediate feature map of any stage, constructing a channel credibility mapping function from a learnable weight matrix and bias vector in a joint channel-and-spatial statistical mapping; its channel statistics aggregation function combines global average pooling, max pooling, and variance statistics, and a GELU activation produces soft credibility weights for each channel; step 3-1-3, using the channel credibility weights to apply channel-gated re-parameterization to the input features, then constructing a partial convolution map over the trusted support domain: at each spatial position, the conditional convolution outputs a feature response computed from the re-parameterized, credibility-modulated responses at horizontally and vertically offset positions within a standard convolution domain, using a conditional convolution kernel weight function from input channel to output channel; step 3-1-4, applying to the intermediate features a cross-channel linear mixing via a 1x1 cross-channel convolution operator, together with a normalization mapping, to produce the stage output; step 3-2, after the multi-stage transformation, introducing a region partition set over the final-stage feature map and defining region-level statistical aggregation: for each channel and each local region sub-block, maximum-response and average-response statistics are computed; the channel-level statistical maps built from these region statistics over all region locations are concatenated and passed through a learnable linear projection operator to obtain single-channel decision features, which are finally fed into the redundancy decoding network to complete watermark information reconstruction.
  6. The method of claim 5, wherein step 4 comprises: in the network training stage, each training batch dynamically instantiates an illumination generation model through a random-variable-driven condition selection mechanism, as follows: step 4-1-1, defining the continuous low-frequency illumination field as a linear combination of direction-variable two-dimensional Gaussian basis functions, each basis function having a weight coefficient, a rotation angle applied to the coordinates, and a Gaussian diffusion coefficient, with exp the natural exponential function; step 4-1-2, introducing a global illumination tilt model based on a brightness gradient field, defined by a gradient intensity vector (with horizontal and vertical components), a global illumination offset, and a small-scale random perturbation term; step 4-1-3, introducing a nonlinear illumination model based on procedurally generated noise: a two-dimensional noise function with a frequency parameter and a parameter controlling the influence intensity of the noise illumination; step 4-1-4, employing a simplified reflection model to simulate the specular reflection of the screen surface under an external light source, with diffuse and specular reflection coefficients, an ambient light distribution, a per-pixel reflection direction vector, a camera line-of-sight direction vector, and a highlight attenuation exponent; step 4-1-5, introducing in the training stage a discrete random variable with a uniform probability distribution which, through an indicator function, selects the corresponding illumination generation model; and step 4-2, simulating moiré distortion by generating periodic interference fringes, and countering perspective distortion by applying perspective transformation.
  7. The method of claim 6, wherein in step 4-1-5 the training process uses a loss function formed as a weighted sum, with separate weight coefficients, of the watermark generator loss, the watermark extractor loss, and the Markov discriminator loss; the generator loss is the mathematical expectation, over all spatial positions in the image domain, of the deviation of the output watermark residual signal from a zero-valued image; the extractor loss is the expectation, over all watermark messages sampled uniformly from the binary set, of the decoding error; and the conditional discriminator loss combines a statistical expectation over the set of real image samples with a statistical expectation over the watermark message space, taken with respect to the condition-guided watermark residual generation network and its parameters.
  8. A structural-perception-based system for quickly generating a screen-shooting-robust watermark, implemented according to the method of any one of claims 1-7, comprising: a watermark generation unit for receiving the watermark and the image structure information and outputting a watermark signal; a Markov condition discrimination unit for receiving the watermark signal and the image structure information and enhancing, through adversarial training, the robustness of the watermark signal under noise attack; a watermark embedding unit for efficiently obtaining a watermarked image by spatial-domain superposition of the resulting watermark signal and the carrier image; a simulated distortion unit for simulating the distortions present in the screen-shooting process, helping the watermark network enhance its robustness in screen-shooting scenarios; and a decoding and extraction unit for extracting the hidden watermark information from the output image of the simulated distortion unit.
  9. An electronic device comprising a processor and a memory, the memory storing program code that, when executed by the processor, causes the processor to perform the steps of the method of any one of claims 1 to 7.
  10. A storage medium storing a computer program or instructions which, when run on a computer, perform the steps of the method of any one of claims 1 to 7.
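The redundancy mapping of claims 1-2 (multiple random projections masked by Bernoulli dropout, then energy-normalized) can be sketched numerically. This is a minimal NumPy illustration, not the patent's trained network: the weight matrices are random stand-ins for learnable parameters, and the dimensions (64-bit message, 256-dim redundancy, 4 components, 0.3 drop rate) follow the claim where stated and are otherwise assumed.

```python
import numpy as np

rng = np.random.default_rng(0)

def redundancy_map(w, n_components=4, out_dim=256, drop_rate=0.3):
    """Multi-component redundancy mapping with random deactivation (claim 2 sketch).

    Each component projects the 64-bit watermark into a 256-dim space; a Bernoulli
    keep-mask randomly zeroes dimensions so the representation tolerates random loss.
    """
    z = np.zeros(out_dim)
    for _ in range(n_components):
        W = rng.standard_normal((out_dim, w.shape[0])) * 0.1  # stand-in for learnable weights
        b = np.zeros(out_dim)                                 # stand-in for learnable bias
        mask = (rng.random(out_dim) > drop_rate).astype(float)  # Bernoulli random deactivation
        z += mask * (W @ w + b)
    # energy normalization: zero mean, unit RMS; small eps keeps the denominator nonzero
    zc = z - z.mean()
    return zc / np.sqrt((zc ** 2).mean() + 1e-6)

w = rng.integers(0, 2, 64).astype(float)  # 64-bit watermark message
z = redundancy_map(w)
```

In the patent the same dropout-controlled structure is mirrored by a redundancy *decoding* operator, and both are pre-trained against a consistency objective before joint training.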
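The structural condition map of claim 4 (steps 2-4 and 2-7) is the one part whose math is fully standard: Sobel gradients, an absolute-value energy field normalized to [0, 1], and linear spatial-domain superposition. A small sketch, with a naive convolution loop for clarity and an assumed intensity coefficient alpha:

```python
import numpy as np

def structure_condition(img):
    """Gradient-energy structural condition map via Sobel operators (claim 4, step 2-4)."""
    Sx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], float)  # horizontal Sobel
    Sy = Sx.T                                                    # vertical Sobel
    H, W = img.shape
    p = np.pad(img, 1, mode="edge")
    Gx, Gy = np.zeros_like(img), np.zeros_like(img)
    for i in range(H):
        for j in range(W):
            win = p[i:i + 3, j:j + 3]
            Gx[i, j] = (Sx * win).sum()
            Gy[i, j] = (Sy * win).sum()
    E = np.abs(Gx) + np.abs(Gy)                        # gradient energy field
    return (E - E.min()) / (E.max() - E.min() + 1e-8)  # linear normalization to [0, 1]

def embed(img, residual, alpha=0.05):
    """Spatial-domain superposition I_w = I + alpha * R (claim 4, step 2-7);
    alpha trades robustness against imperceptibility."""
    return np.clip(img + alpha * residual, 0.0, 1.0)

img = np.outer(np.linspace(0, 1, 8), np.ones(8))  # toy vertical-ramp image in [0, 1]
C = structure_condition(img)
watermarked = embed(img, C)
```

In the patent this condition map gates where the generated residual may carry energy, so the watermark concentrates in textured regions where it is least visible.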
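The channel-credibility gating of claim 5 (step 3-1-2) aggregates average, max, and variance statistics per channel and maps them through a learnable function with GELU activation to soft per-channel weights. A sketch under assumptions: the weight matrix is a random stand-in, and the final sigmoid squash to (0, 1) is my addition (the claim names only GELU and "soft credibility weights"):

```python
import numpy as np

def gelu(x):
    # tanh approximation of GELU
    return 0.5 * x * (1 + np.tanh(np.sqrt(2 / np.pi) * (x + 0.044715 * x ** 3)))

def channel_credibility(F, W, b):
    """Soft channel credibility weights from joint pooling statistics (claim 5 sketch).

    F: feature map of shape (C, H, W). Per-channel global average, max, and variance
    statistics are mapped through a learnable affine layer, GELU, and a sigmoid squash,
    then used to gate the channels.
    """
    avg = F.mean(axis=(1, 2))
    mx = F.max(axis=(1, 2))
    var = F.var(axis=(1, 2))
    s = np.stack([avg, mx, var], axis=1)   # (C, 3) channel statistics aggregation
    g = gelu(s @ W + b).ravel()            # learnable mapping, one score per channel
    wgt = 1 / (1 + np.exp(-g))             # soft credibility weights in (0, 1) [assumption]
    return wgt[:, None, None] * F, wgt     # channel-gated features, weights

rng = np.random.default_rng(2)
F = rng.standard_normal((8, 16, 16))
W = rng.standard_normal((3, 1)) * 0.1      # stand-in for learnable parameters
b = np.zeros(1)
Fg, wgt = channel_credibility(F, W, b)
```

In the extraction network these weights modulate the features before the trusted-support-domain convolution, suppressing channels whose statistics look distortion-corrupted.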
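The illumination simulation of claim 6 mixes several generators per training batch. The following sketch implements two of them, a rotated-Gaussian low-frequency field (step 4-1-1) and a linear tilt field (step 4-1-2), plus the uniform random model selection of step 4-1-5; all numeric ranges for centres, angles, widths, and gradients are assumptions, not the patent's values:

```python
import numpy as np

rng = np.random.default_rng(1)

def gaussian_field(H, W, n_basis=3):
    """Low-frequency illumination field: weighted sum of rotated 2-D Gaussian lobes
    (claim 6, step 4-1-1). Centres, angles, widths, and weights are random stand-ins."""
    ys, xs = np.mgrid[0:H, 0:W] / max(H, W)
    L = np.zeros((H, W))
    for _ in range(n_basis):
        cx, cy = rng.random(2)            # lobe centre
        theta = rng.uniform(0, np.pi)     # rotation angle of the basis function
        sig = rng.uniform(0.2, 0.5)       # Gaussian diffusion coefficient
        xr = (xs - cx) * np.cos(theta) + (ys - cy) * np.sin(theta)
        yr = -(xs - cx) * np.sin(theta) + (ys - cy) * np.cos(theta)
        L += rng.uniform(0.2, 1.0) * np.exp(-(xr ** 2 + 2 * yr ** 2) / (2 * sig ** 2))
    return L / n_basis

def tilt_field(H, W):
    """Global illumination tilt: linear brightness gradient, offset, small random
    perturbation (claim 6, step 4-1-2)."""
    ys, xs = np.mgrid[0:H, 0:W] / max(H, W)
    gx, gy = rng.uniform(-0.3, 0.3, 2)    # gradient intensity vector
    return 1.0 + gx * xs + gy * ys + rng.normal(0, 0.01, (H, W))

def sample_illumination(H, W):
    """Uniform random model selection per training batch (claim 6, step 4-1-5)."""
    models = [gaussian_field, tilt_field]
    return models[int(rng.integers(len(models)))](H, W)

L = sample_illumination(32, 32)
```

The full claim also includes a procedural-noise model and a specular-reflection model; they would slot into the same `models` list.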
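Claim 7 combines three weighted loss terms. Since the exact formulas were lost in extraction, this sketch uses conventional stand-ins consistent with the claim's descriptions: an L2 pull of the residual toward the zero-value image (generator), binary cross-entropy over uniformly sampled messages (extractor), and a standard conditional-GAN term (discriminator). The weights `lam_g`/`lam_e`/`lam_d` are placeholders for the unspecified coefficients:

```python
import numpy as np

def joint_loss(residual, msg_true, msg_pred, d_real, d_fake,
               lam_g=1.0, lam_e=10.0, lam_d=0.1):
    """Weighted joint objective in the spirit of claim 7 (loss forms assumed)."""
    eps = 1e-8
    # generator: expectation over spatial positions of the residual vs a zero-value image
    l_gen = np.mean(residual ** 2)
    # extractor: binary cross-entropy between true and decoded watermark bits
    p = np.clip(msg_pred, eps, 1 - eps)
    l_ext = -np.mean(msg_true * np.log(p) + (1 - msg_true) * np.log(1 - p))
    # discriminator: real samples scored high, generated residuals scored low
    l_dis = (-np.mean(np.log(np.clip(d_real, eps, 1.0)))
             - np.mean(np.log(np.clip(1.0 - d_fake, eps, 1.0))))
    return lam_g * l_gen + lam_e * l_ext + lam_d * l_dis
```

With a zero residual, perfectly decoded bits, and a confident discriminator, the loss collapses toward zero, matching the claim's intent that each term independently penalizes its own failure mode.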

Description

Method and system for quickly generating robust watermark for screen shooting based on structural perception

Technical Field

The invention belongs to the field of information security, and particularly relates to a method and a system for quickly generating a robust watermark for screen shooting based on structural perception.

Background

In the era of explosive growth of digital content, the popularity of mobile terminals and high-definition imaging technology has greatly increased the convenience of image creation, sharing, and transaction. However, open propagation environments also expose digital works to unprecedented copyright risks. In particular, on devices with limited computing and storage resources, such as smartphones and tablets, realizing an efficient, lightweight, and robust watermarking algorithm under limited computing power, so as to meet the demands of copyright confirmation and leakage tracing, has become an important problem to be solved. The digital watermark is an important branch of information hiding; its core idea is to embed a segment of invisible identification information into a carrier image, realizing ownership verification, content tracking, and anti-counterfeiting authentication while maintaining visual quality. Unlike encryption or steganography, which focus on information confidentiality, the robust watermark emphasizes that the embedded information can still be reliably extracted after the image undergoes various distortion operations such as compression and shooting. Conventional watermarking algorithms are typically based on embedding strategies in the spatial domain or transform domain.
For example, spatial-domain methods directly modify pixel values such as the least significant bits (Least Significant Bit, LSB), while transform-domain methods embed the watermark into frequency components using the discrete cosine transform (Discrete Cosine Transform, DCT) or the discrete wavelet transform (Discrete Wavelet Transform, DWT). These approaches are robust to common digital-domain distortions (e.g., JPEG compression, Gaussian noise, filtering), but tend to prove very fragile under cross-modal physical attacks. In recent years, the screen-shooting attack has become one of the main means of digital content leakage. An attacker can bypass a digital rights management system simply by photographing the display screen with a mobile phone camera, easily obtaining a high-fidelity copy. The composite distortion introduced in this process is extremely complex, including lens blur, moiré interference, uneven illumination, color shift, perspective distortion, and subsequent secondary compression by social media platforms. These distortions are not only nonlinear and non-differentiable, but also coupled to each other in the spatial and frequency dimensions, significantly impairing the structural consistency and recoverability of conventional watermark signals and thus rendering traceability verification almost ineffective. The development of deep learning brings new opportunities for robust watermarking. Through end-to-end embedding and extraction networks, the watermark can be adaptively encoded into the structural features of the image, significantly improving its resistance to distortion. However, existing deep watermarking frameworks still face three bottlenecks. First, it is difficult to achieve performance and efficiency simultaneously.
In order to maintain high robustness under various complex distortions, many methods explicitly model image features using deep or multi-branch convolutional network structures, resulting in very large parameter counts and high computational complexity. Moreover, the watermark embedding paradigm remains at the level of direct image-level optimization, overloading model learning. Conventional deep watermarking methods generally treat watermark embedding as an end-to-end pixel-level optimization problem over the watermarked or carrier image, requiring the network to simultaneously balance image perceptual quality, watermark robustness, and adaptability to various distortions. Such methods often rely on deep convolutional structures or multi-branch networks for redundant modeling, introducing large parameter and computation overheads and facing limited-efficiency problems in mobile or real-time deployment scenarios. Second, existing watermark decision mechanisms rely excessively on pixel-level or globally averaged features and are highly sensitive to local distortion. In the watermark extraction stage, many methods decode the watermark directly by pixel-wise prediction or global feature regression, without fully considering the local blurring, reflection interference, and regional illumination variation which are commonly existe