CN-122027802-A - Screen-shooting-resistant video watermarking method based on fast Zernike moments and the SuperPoint network
Abstract
The invention discloses a screen-shooting-resistant video watermarking method based on fast Zernike moments and the SuperPoint network. The method comprises two main stages, watermark embedding and watermark extraction, and its core is to combine the geometric invariance of Zernike moments with the high-precision feature localization of the SuperPoint network, together with modified quantization index modulation (MQIM), to realize video watermarking that withstands screen-shooting attacks. In terms of attack resistance, the method performs outstandingly under the composite interference specific to screen shooting: after screen-shooting attacks including moiré patterns and geometric distortion, the average normalized correlation coefficient (NC) of the extracted watermark exceeds 0.98 and the extraction success rate remains stably above 95%. In terms of computational efficiency, watermark embedding and extraction run at 35 FPS, fully meeting real-time processing requirements.
Inventors
- CHEN BOYU
- TAO KEWEI
- ZHANG SHANQING
- LU JIANFENG
- LI LI
Assignees
- Hangzhou Dianzi University
Dates
- Publication Date
- 20260512
- Application Date
- 20251230
Claims (10)
- 1. A screen-shooting-resistant video watermarking method based on fast Zernike moments and the SuperPoint network, characterized by comprising the following steps: (I) Watermark embedding process: Step 1-1, video frame format conversion and selection of the embedding carrier; Step 1-2, feature point extraction and screening based on the SuperPoint network; Step 1-3, computing the Zernike moment magnitude with a fast recursive Zernike moment algorithm; Step 1-4, watermark embedding based on modified QIM (MQIM); Step 1-5, video frame reconstruction and encoding. (II) Watermark extraction process: Step 2-1, preprocessing and geometric correction of the screen-shot video; Step 2-2, feature point matching and embedding region localization; Step 2-3, Zernike moment extraction and MQIM decoding; Step 2-4, copyright verification.
- 2. The screen-shooting-resistant video watermarking method according to claim 1, wherein step 1-1 specifically comprises: (1) Format conversion: converting the input RGB-format video frame into the YUV color space; (2) Carrier selection: selecting the U channel as the watermark embedding carrier.
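Claim 2's conversion step can be sketched in a few lines. A minimal NumPy version is shown below, assuming BT.601 conversion coefficients and a chroma offset of 128 — the patent does not specify a particular YUV variant, so these constants are the author's assumptions:

```python
import numpy as np

def rgb_to_yuv(frame):
    """Convert an RGB frame (H x W x 3, uint8) to float YUV using BT.601
    coefficients, with U and V offset by 128 (an assumed convention)."""
    rgb = frame.astype(np.float64)
    y = 0.299 * rgb[..., 0] + 0.587 * rgb[..., 1] + 0.114 * rgb[..., 2]
    u = -0.14713 * rgb[..., 0] - 0.28886 * rgb[..., 1] + 0.436 * rgb[..., 2] + 128.0
    v = 0.615 * rgb[..., 0] - 0.51499 * rgb[..., 1] - 0.10001 * rgb[..., 2] + 128.0
    return np.stack([y, u, v], axis=-1)

# The U plane of the converted frame is the embedding carrier (claim 2).
frame = np.zeros((4, 4, 3), dtype=np.uint8)      # black test frame
u_channel = rgb_to_yuv(frame)[..., 1]
```

After embedding, the inverse conversion with the untouched Y and V planes reconstructs the RGB frame, as claim 6 describes.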
- 3. The screen-shooting-resistant video watermarking method based on fast Zernike moments and the SuperPoint network as claimed in claim 2, wherein the feature point extraction and screening is specifically as follows: (1) Feature point extraction: detecting corner features of the U-channel frame with the SuperPoint network, generating a dense feature point set, outputting a feature-point confidence heatmap, and adding feature points with confidence ≥ 0.8 to a candidate set; (2) Feature point screening, comprising: ① boundary constraint: with the video frame resolution A × B and the embedding region radius k, each feature point coordinate (x, y) must satisfy x ≥ k, y ≥ k, A − x ≥ k and B − y ≥ k; ② non-overlap screening: sorting the candidate feature points in ascending order of x coordinate and, with a sliding window, keeping only points whose distance to adjacent kept feature points is ≥ k; ③ deduplication: storing feature point coordinates in a hash table and eliminating repeated coordinates.
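The three screening rules of claim 3 (boundary margin, minimum spacing, hash-based deduplication) can be illustrated with a short sketch. The function name is illustrative, and checking spacing against all kept points (rather than a sliding window) is a simplification by the author:

```python
def screen_feature_points(points, A, B, k):
    """Apply claim 3's three screening rules to detected feature points:
    boundary margin k, minimum spacing k between kept points (checked
    against all kept points, a simplification of the sliding window),
    and hash-set deduplication of repeated coordinates."""
    seen, candidates = set(), []
    for (x, y) in points:
        if (x, y) in seen:
            continue                              # dedup via hash set
        seen.add((x, y))
        if x >= k and y >= k and A - x >= k and B - y >= k:
            candidates.append((x, y))             # boundary constraint
    candidates.sort(key=lambda p: p[0])           # ascending x order
    kept = []
    for p in candidates:
        if all((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2 >= k * k for q in kept):
            kept.append(p)                        # spacing >= k
    return kept
```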
- 4. The screen-shooting-resistant video watermarking method based on fast Zernike moments and the SuperPoint network according to claim 3, wherein the fast recursive Zernike moment algorithm optimizes the numerical computation of Zernike moments with a recursive strategy, specifically implemented as follows: (1) Zernike moment definition: Zernike moments are orthogonal moments on the unit disk, with basis functions V_nm(ρ, θ) = R_nm(ρ)·e^{jmθ} composed of the radial basis function R_nm(ρ) and the angular function e^{jmθ}, where the polar radius ρ is the normalized distance from the point (x, y) to the centre of the unit circle and satisfies 0 ≤ ρ ≤ 1, n is the order, m is the repetition, n − |m| is even and |m| ≤ n; (2) Recursive strategy: the recursion derives low-repetition radial polynomials from the radial polynomials of fixed order n and high repetition: R_{n,m−4}(ρ) = H1·R_{n,m}(ρ) + (H2 + H3/ρ²)·R_{n,m−2}(ρ), wherein the higher-order-term recurrence coefficient H1, the intermediate-term recurrence coefficient H2 and the compensation-term coefficient H3 are defined as: H3 = −4(m − 2)(m − 3) / [(n + m − 2)(n − m + 4)], H2 = H3·(n + m)(n − m + 2) / [4(m − 1)] + (m − 2), H1 = m(m − 1)/2 − m·H2 + H3·(n + m + 2)(n − m)/8; Special-case handling: when n = m or n = m + 2, closed-form solutions (R_nn(ρ) = ρⁿ, R_{n,n−2}(ρ) = n·ρⁿ − (n − 1)·ρ^{n−2}) are used to avoid recursive divergence; Complexity optimization: by sharing radial basis function values across frames, the computational complexity is reduced from O(n⁴) to O(n²); (3) Normalization: the watermark embedding region is mapped onto the unit disk and the Zernike moment magnitude |A_nm| is computed.
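As a reference point for the recursion, the magnitude |A_nm| of claim 4 can be computed directly from the factorial-form radial polynomial. The sketch below uses this slow direct form (not the patent's recursive speed-up) on a square block mapped to the unit disk; the grid mapping and normalization constant are the author's plausible assumptions:

```python
import math
import numpy as np

def radial_poly(n, m, rho):
    """Direct factorial-form Zernike radial polynomial R_nm
    (the patent replaces this with a recursive computation)."""
    m = abs(m)
    R = np.zeros_like(rho)
    for s in range((n - m) // 2 + 1):
        c = ((-1) ** s * math.factorial(n - s)
             / (math.factorial(s)
                * math.factorial((n + m) // 2 - s)
                * math.factorial((n - m) // 2 - s)))
        R = R + c * rho ** (n - 2 * s)
    return R

def zernike_magnitude(block, n, m):
    """|A_nm| of a square block mapped onto the unit disk (claim 4, step 3)."""
    N = block.shape[0]
    ys, xs = np.mgrid[0:N, 0:N]
    x = (2.0 * xs - N + 1) / (N - 1)          # map pixel grid to [-1, 1]
    y = (2.0 * ys - N + 1) / (N - 1)
    rho = np.sqrt(x ** 2 + y ** 2)
    theta = np.arctan2(y, x)
    inside = rho <= 1.0                        # unit-disk support
    kernel = radial_poly(n, m, rho) * np.exp(-1j * m * theta)
    area = (2.0 / (N - 1)) ** 2                # pixel area in normalized coords
    A = (n + 1) / np.pi * np.sum(block * kernel * inside) * area
    return abs(A)
```

Rotating the block only changes the phase of A_nm, not its magnitude, which is the geometric invariance the method relies on.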
- 5. The screen-shooting-resistant video watermarking method based on fast Zernike moments and the SuperPoint network according to claim 4, wherein the watermark embedding is specifically as follows: (1) Watermark preprocessing: the watermark is a 64-bit binary sequence generated by encoding key copyright information according to a predefined format; an 8-bit CRC check code is generated and appended to the 64-bit original watermark sequence to form a 72-bit watermark string, the key copyright information comprising a copyright identifier, a timestamp and a device identification code; (2) Dynamic embedding strength adjustment: modified quantization index modulation (MQIM) dynamically adjusts the quantization step S according to the Zernike moment magnitude |A_nm|: S = S_min + a·(S_max − S_min), where a is the normalized magnitude value (0 ≤ a ≤ 1) and S ∈ [0.01, 0.03]; (3) Bit embedding: each watermark bit w in the watermark string is embedded one by one into the Zernike moment magnitude, the embedded magnitude |A′_nm| satisfying: when w = 1, |A′_nm| ∈ [kS + S/2, (k + 1)S); when w = 0, |A′_nm| ∈ [kS, kS + S/2), where k is an integer; finally, the frame is updated by residual superposition: f_W(x, y) = f(x, y) + I·Δ(x, y), where f_W(x, y) is the pixel value of the watermarked U-channel frame at coordinates (x, y), f(x, y) is the pixel value of the original U-channel frame at (x, y), I is the embedding strength coefficient, and Δ(x, y) is the watermark residual.
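A minimal sketch of the MQIM bit embedding in claim 5, assuming the dynamic step is a linear interpolation over [0.01, 0.03] and that each bit is written to the midpoint of its half-interval for maximum decoding margin — both are the author's assumptions; the patent only constrains the range and the intervals:

```python
S_MIN, S_MAX = 0.01, 0.03   # quantization step range from claim 5

def dynamic_step(a):
    """Map the normalized magnitude a in [0, 1] to S in [0.01, 0.03].
    Linear interpolation is an assumption; the patent only gives the range."""
    return S_MIN + a * (S_MAX - S_MIN)

def mqim_embed(magnitude, bit, S):
    """Place |A_nm| in the half-interval encoding `bit` (claim 5):
    bit 1 -> [kS + S/2, (k+1)S), bit 0 -> [kS, kS + S/2).
    The interval midpoint is chosen for maximum decoding margin."""
    k = int(magnitude // S)
    return k * S + (0.75 * S if bit == 1 else 0.25 * S)
```

Writing to the midpoint leaves a margin of S/4 on each side, so a magnitude perturbation smaller than S/4 cannot flip the decoded bit.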
- 6. The screen-shooting-resistant video watermarking method according to claim 5, wherein the video frame reconstruction and encoding is specifically as follows: (1) Channel merging: merging the watermarked U-channel frame with the original Y and V channel frames and converting the result back into an RGB-format reconstructed frame; (2) Video encoding: encoding the reconstructed frames with the H.264 coding standard while keeping the bit-rate fluctuation no greater than a set threshold, finally generating the watermarked video, which consists of the series of reconstructed video frames.
- 7. The screen-shooting-resistant video watermarking method based on fast Zernike moments and the SuperPoint network according to claim 1, wherein the screen-shot video preprocessing and geometric correction is specifically as follows: (1) Deblurring: processing each frame of the screen-shot video with non-local means filtering to remove the motion blur introduced by screen capture while preserving screen edge details; (2) Edge enhancement: enhancing the temporal saliency of the screen frame through histogram-of-oriented-gradients features and generating a multi-scale edge map with the Canny edge detection algorithm; (3) Perspective correction: ① corner detection: candidate corners are first quickly selected in the edge map with the FAST corner detection algorithm; a random sample consensus (RANSAC) procedure then repeatedly samples a small number of candidate points to fit a perspective transformation model and counts the inliers satisfying each model; the corner set corresponding to the model with the most inliers is taken as the correct corner library; a matching error threshold is set, and points in the corner library whose reprojection error against the ideal model exceeds the threshold are treated as outliers and removed, yielding the optimized corner library; ② perspective transformation: a perspective transformation matrix is computed from the corner coordinates in the corner library; based on this matrix, the screen-shot video frame, geometrically distorted by the shooting angle, is perspective-transformed and corrected into a regular rectangular frame; the non-screen area is cropped by a slicing algorithm and the frame is restored to the original video resolution.
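The final step of claim 7's perspective correction, computing the perspective matrix from four matched corners, can be sketched with a direct linear solve. This is textbook 4-point homography estimation, not the patent's full FAST + RANSAC pipeline:

```python
import numpy as np

def homography_from_4pts(src, dst):
    """Solve the 3x3 perspective matrix H (with h33 = 1) mapping four
    source corners to four destination corners via a direct linear system."""
    A, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        A.append([x, y, 1, 0, 0, 0, -u * x, -u * y]); b.append(u)
        A.append([0, 0, 0, x, y, 1, -v * x, -v * y]); b.append(v)
    h = np.linalg.solve(np.array(A, dtype=float), np.array(b, dtype=float))
    return np.append(h, 1.0).reshape(3, 3)

def warp_point(H, x, y):
    """Apply the perspective transform H to one point."""
    p = H @ np.array([x, y, 1.0])
    return p[0] / p[2], p[1] / p[2]
```

Applying H⁻¹ to every pixel of the distorted frame (with interpolation) yields the corrected rectangular frame described in step ②.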
- 8. The screen-shooting-resistant video watermarking method based on fast Zernike moments and the SuperPoint network as claimed in claim 7, wherein the feature point matching and embedding region localization is specifically as follows: (1) Feature point extraction: extracting feature points with the SuperPoint network using the parameter configuration of the embedding stage, setting a confidence threshold for filtering so that only feature points whose probability value is not lower than the threshold are retained and added to the candidate set for subsequent matching; (2) Nearest-neighbour matching: mapping the extracted feature points to the coordinate system used at embedding with a K-nearest-neighbour matching algorithm; (3) Region localization: determining the watermark embedding region from the matched feature point coordinates, ensuring its position is consistent with the region used in the embedding stage.
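Claim 8's nearest-neighbour matching step can be illustrated with a brute-force sketch. The Lowe-style ratio test and its value of 0.8 are the author's assumptions, since the patent only names K-nearest-neighbour matching:

```python
import numpy as np

def nn_match(desc_a, desc_b, ratio=0.8):
    """Brute-force nearest-neighbour descriptor matching with a ratio test.
    Returns (index_in_a, index_in_b) pairs for accepted matches."""
    matches = []
    for i, d in enumerate(desc_a):
        dists = np.linalg.norm(desc_b - d, axis=1)
        order = np.argsort(dists)
        # accept only if the best match is clearly better than the runner-up
        if len(order) > 1 and dists[order[0]] < ratio * dists[order[1]]:
            matches.append((i, int(order[0])))
    return matches
```

The accepted pairs anchor the corrected frame to the embedding-stage coordinate system, so the same regions can be located for extraction.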
- 9. The screen-shooting-resistant video watermarking method according to claim 8, wherein step 2-3 specifically operates as follows: (1) Zernike moment calculation: computing the Zernike moment magnitude |A″_nm| of the localized embedding region with the fast recursive Zernike moment algorithm; (2) MQIM inverse-mapping decoding: comparing |A″_nm| with the quantization step S to decode the watermark bits: when |A″_nm| − kS < S/2, the bit is decoded as 0; when |A″_nm| − kS ≥ S/2, the bit is decoded as 1; (3) Check-code verification: if the verification fails, the watermark information is restored by interpolation from adjacent frames before being output as valid watermark information; the interpolation restoration specifically uses the 64-bit watermark information successfully extracted from adjacent frames and passing verification, and restores the erroneous watermark bits of the current frame by majority voting or temporal interpolation.
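The decoding rule of claim 9 and the check over the 72-bit string can be sketched as follows. The CRC-8 polynomial 0x07 is an assumed choice, since the patent does not fix one:

```python
def mqim_decode(magnitude, S):
    """Decode one watermark bit from |A''_nm| (claim 9):
    remainder below S/2 -> 0, otherwise -> 1."""
    return 0 if magnitude - int(magnitude // S) * S < S / 2 else 1

def crc8(bits, poly=0x07):
    """Bitwise CRC-8, MSB-first, zero initial register.
    The polynomial 0x07 is an assumption; the patent does not specify one."""
    reg = 0
    for b in bits:
        fb = ((reg >> 7) & 1) ^ b
        reg = (reg << 1) & 0xFF
        if fb:
            reg ^= poly
    return [(reg >> (7 - i)) & 1 for i in range(8)]
```

With this construction, appending crc8(msg) to the 64-bit message makes the CRC of the full 72-bit string all zeros, which is the property checked at extraction.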
- 10. The screen-shooting-resistant video watermarking method according to claim 9, wherein step 2-4 specifically operates as follows: calculating the normalized correlation coefficient NC between the decoded watermark and the original watermark: NC = Σ_{j=1}^{N} (w_j − w̄)(w′_j − w̄′) / √( Σ_{j=1}^{N} (w_j − w̄)² · Σ_{j=1}^{N} (w′_j − w̄′)² ), where N is the total length of the watermark sequence, w_j is the value of the original watermark sequence at the j-th position, w̄ is the mean of the original watermark sequence w, w′_j is the value of the watermark sequence extracted from the screen-shot video at the j-th position, and w̄′ is the mean of the extracted watermark sequence; when NC ≥ 0.95 the video is judged legitimate, when NC < 0.95 it is judged pirated, and the leakage tracing information read from the successfully extracted watermark bits is output.
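Claim 10's zero-mean normalized correlation can be written directly from its definition; a minimal NumPy sketch:

```python
import numpy as np

def normalized_correlation(w, w_ext):
    """Zero-mean normalized correlation NC between the original watermark w
    and the extracted watermark w_ext (claim 10)."""
    w = np.asarray(w, dtype=float)
    w_ext = np.asarray(w_ext, dtype=float)
    wc, ec = w - w.mean(), w_ext - w_ext.mean()
    return float(np.sum(wc * ec) / np.sqrt(np.sum(wc ** 2) * np.sum(ec ** 2)))
```

NC is 1.0 for a perfect extraction, and each flipped bit lowers it, so the 0.95 threshold tolerates only a small number of bit errors in a 72-bit string.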
Description
Screen-shooting-resistant video watermarking method based on fast Zernike moments and the SuperPoint network
Technical Field
The invention belongs to the technical field of digital watermarking and information security, and in particular relates to a robust video watermarking method for video content copyright protection. It is particularly suited to resisting screen-shooting attacks (such as the geometric distortion and moiré interference caused by filming a screen with a mobile phone) and is applied to digital media copyright authentication, leakage tracking and content integrity verification.
Background
Existing video watermarking techniques mainly comprise hybrid transform-domain watermarking algorithms, quantization index modulation methods and deep-learning-based watermarking models. All of these expose significant limitations when dealing with the special attack pattern of screen shooting. A screen-shooting attack involves a digital-to-analog-to-digital conversion that introduces complex distortions, including dynamic moiré interference, lens geometric distortion (e.g., barrel/pincushion distortion) and CMOS sensor noise, which pose a serious challenge to watermark robustness. Conventional transform-domain and quantization methods, based on transforms such as DCT or DWT or on quantization index modulation, have the core drawback of relying on spectral decomposition with fixed basis functions or on preset quantization steps. Facing the dynamic geometric deformation caused by screen shooting (such as perspective, rotation and scaling), they struggle to maintain stable feature extraction, easily lose global geometric information, and suffer greatly reduced watermark extraction accuracy.
Meanwhile, such anti-attack embedding methods adapt poorly to the compound noise specific to screen shooting (such as moiré and illumination changes); raising the embedding strength to improve attack resistance often directly causes unacceptable degradation of video quality, making imperceptibility and robustness hard to balance. Deep-learning-based watermarking models, although able to adaptively optimize embedding and extraction strategies through end-to-end training and to simulate screen-shooting distortion to some extent, generally suffer from high computational complexity and the need for large-scale labeled training data, making it hard to meet the low-latency requirements of real-time scenarios such as surveillance video and live streaming. Moreover, these models generalize poorly to changes caused by unknown attacks not covered by the training data or by different device combinations (such as resolution and sensor differences between the display and the capturing device), giving them poor cross-device compatibility. For example, a low-resolution screen combined with a high-resolution camera can cause the watermark extraction rate to drop by more than 10%. In summary, existing video watermarking techniques, when coping with screen-shooting attacks, exhibit obvious shortcomings: weak resistance to geometric attacks, difficulty balancing imperceptibility and robustness, poor cross-device compatibility, and low computational efficiency.
Disclosure of the Invention
The invention provides a screen-shooting-resistant video watermarking method based on fast Zernike moments and the SuperPoint network, which aims to systematically solve the watermarking problems under screen-shooting attack: to resist the moiré interference, lens geometric distortion and CMOS noise of screen-shooting attacks and ensure that watermark information can still be accurately extracted in a high-distortion environment. While guaranteeing high video quality (PSNR ≥ 45 dB), the method improves watermark robustness against geometric attacks (such as rotation and scaling), compression attacks (such as H.264/HEVC) and frame-loss attacks, achieving joint optimization of imperceptibility and attack resistance. Algorithmic optimization reduces computational complexity and enables real-time watermark embedding and extraction (inference speed ≥ 30 FPS), meeting the requirements of real-time application scenarios such as surveillance video and live streaming. Cross-device compatibility is improved so that the watermark extraction rate remains stably above 95% across combinations of different displays and capturing devices, overcoming performance fluctuation caused by device differences. The watermark embedding position and strength strategy are optimized, and the influence of local area loss (such as frame