CN-121981876-A - Multimode watermark conversion method and device


Abstract

The invention provides a method and a device for converting multi-modal watermarks, relating to the technical field of digital watermarking. The method comprises identifying the request type of user request information. If the request type is a first type, feature data of an image watermark is extracted through a deep convolutional network, the feature data is quantized using a preset codebook to obtain discretized codewords, an adjacency matrix is constructed from the discretized codewords, text information of the image watermark is determined based on the adjacency matrix, and the text information is embedded into the initial noise features of a diffusion model to generate a watermark-containing image. If the request type is a second type, binary text watermark information is extracted from the watermark-containing image through the diffusion model, the adjacency matrix is determined based on the binary text watermark information, feature data of the binary text watermark information is determined using the preset codebook, and that feature data is repaired and decoded to obtain the image watermark in the watermark-containing image.
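The codebook quantization step in the abstract can be illustrated with a small sketch. It assumes nearest-neighbour lookup under squared Euclidean distance, which is a common choice for vector quantization but is not stated in this record; the function names `quantize` and `dequantize` and the toy codebook are hypothetical.

```python
from typing import List, Sequence


def quantize(features: Sequence[Sequence[float]],
             codebook: Sequence[Sequence[float]]) -> List[int]:
    """Map each feature vector to the index of its nearest codeword.

    Assumption: "nearest" means smallest squared Euclidean distance.
    """
    indices = []
    for vec in features:
        # Squared distance from this feature vector to every codeword.
        dists = [sum((a - b) ** 2 for a, b in zip(vec, code))
                 for code in codebook]
        indices.append(dists.index(min(dists)))
    return indices


def dequantize(indices: Sequence[int],
               codebook: Sequence[Sequence[float]]) -> List[List[float]]:
    """Look the codewords back up from their indices (decoding direction)."""
    return [list(codebook[i]) for i in indices]


# Toy data: three 2-D codewords and three feature vectors.
codebook = [[0.0, 0.0], [1.0, 1.0], [0.0, 1.0]]
features = [[0.1, -0.1], [0.9, 1.2], [0.2, 0.8]]

indices = quantize(features, codebook)
restored = dequantize(indices, codebook)
```

The discrete indices are what a later stage could arrange into a matrix and serialize; the lookup in `dequantize` corresponds to using the same codebook on the decoding side to recover approximate feature data.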

Inventors

CHEN PING
FANG ZHONGLI
YANG RUIHAO
WANG ZELIN
Wang Lvchun
HU JIAYIN

Assignees

Fudan University

Dates

Publication Date: 2026-05-05
Application Date: 2026-04-03

Claims (9)

1. A method of converting a multi-modal watermark, comprising the steps of: acquiring user request information and identifying a request type of the user request information; if the request type of the user request information is a first type for image generation, extracting feature data of an image watermark through a deep convolutional network, quantizing the feature data using a preset codebook to obtain discretized codewords, and constructing an adjacency matrix from the discretized codewords; if the request type of the user request information is a second type for decoding the watermark in a watermark-containing image, extracting binary text watermark information in the watermark-containing image through a diffusion model, determining an adjacency matrix based on the binary text watermark information, determining feature data of the binary text watermark information using the preset codebook, repairing the feature data of the binary text watermark information to obtain repaired features, and decoding the repaired features to obtain the image watermark in the watermark-containing image.
2. The method of claim 1, wherein, prior to the step of acquiring user request information and identifying a request type of the user request information, the method further comprises: training an initial codebook with a straight-through estimator according to a preset loss function, to obtain a preset codebook that satisfies a mutual information condition and a capacity condition.
3. The method of claim 2, wherein the preset loss function comprises an embedding loss term, a commitment loss term, and a perceptual loss term, and the preset loss function is expressed as $L_{\text{total}} = L_{\text{perceptual}} + \beta (L_{\text{embedding}} + L_{\text{commitment}})$, wherein $L_{\text{total}}$ is the total loss, $L_{\text{perceptual}}$ is the perceptual loss term, $L_{\text{embedding}}$ is the embedding loss term, and $L_{\text{commitment}}$ is the commitment loss term.
4. The method for converting a multi-modal watermark according to claim 1, wherein the step of repairing the feature data of the binary text watermark information to obtain the repaired features comprises: compressing, as part of the repair, the feature data of the binary text watermark information to obtain compressed data; and up-sampling, as part of the repair, the compressed data to obtain the repaired features.
5. The method for converting a multi-modal watermark according to claim 4, wherein the step of compressing, as part of the repair, the feature data of the binary text watermark information to obtain compressed data comprises: acquiring size information of a watermark image and determining compression parameters based on the size information; and compressing, as part of the repair, the feature data of the binary text watermark information according to the compression parameters to obtain the compressed data.
6. The method of claim 1, wherein the step of determining text information of the image watermark based on the adjacency matrix comprises: adjusting the adjacency matrix into a target matrix of a preset size, and expressing the data in the target matrix in an 8-bit binary format.
7. A multi-modal watermark conversion apparatus, the apparatus comprising: a request identification module, configured to acquire user request information and identify a request type of the user request information; a first computing module, configured to, if the request type of the user request information is a first type for image generation, extract feature data of an image watermark through a deep convolutional network, quantize the feature data using a preset codebook to obtain discretized codewords, and construct an adjacency matrix from the discretized codewords; and a second computing module, configured to, if the request type of the user request information is a second type for decoding the watermark in a watermark-containing image, extract binary text watermark information in the watermark-containing image through a diffusion model, determine an adjacency matrix based on the binary text watermark information, determine feature data of the binary text watermark information using the preset codebook, repair the feature data of the binary text watermark information to obtain repaired features, and decode the repaired features to obtain the image watermark in the watermark-containing image.
8. An electronic device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the method of converting a multimodal watermark as claimed in any of claims 1 to 6 when executing the computer program.
9. A computer readable storage medium storing a computer program, wherein the computer program when executed by a processor implements the method of converting a multimodal watermark as claimed in any one of claims 1 to 6.
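Claim 6 describes adjusting the adjacency matrix to a preset size and expressing its entries in 8-bit binary form. The sketch below shows one way such a serialization could look. Several details are assumptions that the claims do not specify: the matrix is taken to be square, resizing is done by cropping or zero-padding, entries are integers in the range 0 to 255, and bits are emitted row by row. The function names `matrix_to_bits` and `bits_to_matrix` are hypothetical.

```python
from typing import List


def matrix_to_bits(matrix: List[List[int]], size: int) -> str:
    """Crop or zero-pad `matrix` to size x size, then emit 8 bits per entry.

    Assumption: cropping/zero-padding stands in for the unspecified
    "adjusting to a preset size" step.
    """
    target = [[0] * size for _ in range(size)]
    for r, row in enumerate(matrix[:size]):
        for c, value in enumerate(row[:size]):
            if not 0 <= value <= 255:
                raise ValueError("each entry must fit in 8 bits")
            target[r][c] = value
    # Row-major order, each entry as a zero-padded 8-bit binary string.
    return "".join(format(v, "08b") for row in target for v in row)


def bits_to_matrix(bits: str, size: int) -> List[List[int]]:
    """Invert matrix_to_bits: read 8 bits per entry, row by row."""
    if len(bits) != 8 * size * size:
        raise ValueError("bit string length does not match the target size")
    values = [int(bits[i:i + 8], 2) for i in range(0, len(bits), 8)]
    return [values[r * size:(r + 1) * size] for r in range(size)]


# A 2x2 matrix padded to the (hypothetical) preset size of 3x3.
matrix = [[3, 0], [1, 255]]
bits = matrix_to_bits(matrix, size=3)
recovered = bits_to_matrix(bits, size=3)
```

Because the bit string has a fixed length of 8 × size × size, the decoding side can rebuild the matrix without any extra framing information, provided both sides agree on the preset size.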

Description

Multimode watermark conversion method and device

Technical Field

The present invention relates to the field of digital watermarking, and in particular to a method and an apparatus for converting multi-modal watermarks.

Background

With the rapid development of artificial intelligence generated content (AIGC) technology, generative models represented by diffusion models are widely used in image generation, content creation, information dissemination, and other fields. High-quality, low-threshold image generation significantly improves production efficiency and creative freedom, but it also brings a series of security and compliance risks, such as difficulty in establishing copyright ownership, missing content traceability, the spread of false information, and abuse of generated content. How to reliably identify and trace the source of a generated image and protect the legitimate rights and interests in the content has become a key problem in digital content security.

Digital watermarking is a direct and effective means of content tracing and copyright protection, and is widely used for identity marking and ownership authentication of digital images. Existing image watermarking methods fall mainly into two categories: watermarking methods based on text information and steganography methods based on an image carrier. Text-based watermarking methods typically track and verify content sources by embedding binary identification information during image generation or propagation. These methods are relatively robust to conventional image processing operations, but their watermark capacity is strictly limited, so they can hardly carry complex or high-dimensional copyright information.
In contrast, image steganography can embed secret images or structured information at high capacity, but such methods generally rely on an idealized noise-free training environment, resist common attacks such as compression, cropping, and noise perturbation poorly in practice, and readily lose or distort the watermark information. In complex real-world scenarios, images are often processed, transmitted, and redistributed many times, and traditional watermarking methods struggle to balance robustness, information capacity, and security: text watermarking can hardly meet the demand for high information density, while image steganography can hardly remain stably recoverable under strong attacks. This contradiction is particularly prominent for generated images and limits the further application of existing watermarking technology to AIGC copyright protection and trusted generation.

In addition, with the development of multi-modal artificial intelligence, the semantic association between text and images is increasingly close, and a single-modal watermarking mechanism can hardly meet the need of generative models for collaborative processing of multi-modal information. Most existing watermarking schemes are limited to single-modal information embedding and extraction, lack a unified representation space, can hardly convert flexibly between text watermarks and image watermarks, and cannot fully exploit the capacity of deep models to express high-dimensional features and structural information.
Disclosure of Invention

To overcome the shortcomings of the prior art, the invention provides a multi-modal watermark conversion method that addresses the following problems: existing schemes are limited to single-modal information embedding and extraction, lack a unified representation space, can hardly convert flexibly between text watermarks and image watermarks, and cannot fully exploit the capacity of deep models to express high-dimensional features and structural information.

To solve these technical problems, the invention adopts the following technical solution.

In a first aspect, the present application provides a method for converting a multi-modal watermark, comprising the steps of: acquiring user request information and identifying a request type of the user request information; if the request type of the user request information is a first type for image generation, extracting feature data of an image watermark through a deep convolutional network, quantizing the feature data using a preset codebook to obtain discretized codewords, and constructing an adjacency matrix from the discretized codewords; if the request type of the user request information is a second type for decoding the watermark in a watermark-containing image, extracting binary text watermark information in the watermark-containing image th