CN-121997303-A - Imperceptible layered watermark embedding method based on language model multi-bit embedding


Abstract

The invention relates to the technical field of industrial data processing and discloses an imperceptible hierarchical watermark embedding method based on language model multi-bit embedding. The method comprises: obtaining an industrial design script sequence to be processed, a candidate mark list, and an instruction logic tree; determining a logic topology depth value of a candidate mark relative to a global reference node based on the node dependency relationships; calculating a dynamic sampling offset operator by combining a private key, a hash value, and the depth value; performing dimension remapping correction on marks containing a geometric constraint value; and performing sampling space partition modulation on a high-entropy semantic interval based on the operator, thereby embedding a multi-bit watermark.
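
The sampling space partition modulation named above can be illustrated with a minimal Python sketch. It assumes a common keyed-partition scheme rather than the patent's own rule: a SHA-256 hash of the private key and the script prefix seeds a shuffle of the candidate marks (tokens, in language model terms), the shuffled list is split into 2^k partitions for k watermark bits, and the partition selected by the bits has its logits raised by the offset. The function names, the 1-bit entropy floor, and the round-robin partition rule are hypothetical choices for illustration. The source text does not reproduce the formula for the dynamic sampling offset operator, so `offset` is simply passed in as a number.

```python
import hashlib
import math
import random


def softmax(logits):
    """Convert un-normalized logits into probabilities."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy_bits(probs):
    """Shannon entropy of a distribution, in bits."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def modulate_logits(logits, key, context, bits, offset, entropy_floor=1.0):
    """Raise the logits of the vocabulary partition selected by `bits`.

    Assumptions (not taken from the source): positions whose entropy is
    below `entropy_floor` are treated as low-entropy and left untouched,
    and partitions are assigned round-robin over a keyed shuffle.
    """
    if entropy_bits(softmax(logits)) < entropy_floor:
        return list(logits)

    # A keyed hash of the context makes the shuffle reproducible for
    # anyone holding the key, and unpredictable otherwise.
    digest = hashlib.sha256(key + context.encode("utf-8")).digest()
    order = list(range(len(logits)))
    random.Random(int.from_bytes(digest, "big")).shuffle(order)

    n_parts = 1 << len(bits)                      # 2^k partitions
    target = int("".join(str(b) for b in bits), 2)

    out = list(logits)
    for rank, token_id in enumerate(order):
        if rank % n_parts == target:
            out[token_id] += offset
    return out
```

With eight equally likely candidates and two watermark bits, the sketch boosts exactly two candidates; a sharply peaked distribution, such as one dominated by a numeric literal, is returned unchanged.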

Inventors

LU XIAOYUAN
JIN JIAMIN

Assignees

上海浦东密码研究院

Dates

Publication Date: 2026-05-08
Application Date: 2026-01-29

Claims (10)

1. An imperceptible hierarchical watermark embedding method based on language model multi-bit embedding, characterized by comprising the following steps:
Step S1, acquiring an industrial design script sequence to be processed, a candidate mark list corresponding to the industrial design script sequence to be processed, and an instruction logic tree corresponding to the industrial design script sequence to be processed, wherein the instruction logic tree comprises a global reference node, a geometric entity definition node, a size association constraint node, and a manufacturing parameter instruction node;
Step S2, determining, based on the hierarchical dependency relationship of each node in the instruction logic tree, the logic topology depth value of the candidate mark currently to be generated relative to the global reference node in the instruction logic tree, wherein the logic topology depth value represents the logical hierarchical order of the current instruction in the parameterized modeling flow;
Step S3, calculating a dynamic sampling offset operator from a dynamic hash value, generated from a preset private key and the industrial design script sequence to be processed, in combination with the logic topology depth value, so as to establish a probability mapping relation between watermark bits and key instruction fields;
Step S4, before sampling space modulation is executed, searching the high-entropy semantic interval of the candidate mark list for a candidate mark containing a geometric constraint value and, if one exists, performing dimension remapping correction on the probability distribution vector of the candidate mark list using the dynamic sampling offset operator, mapping the watermark information outside the instruction field corresponding to the geometric constraint value so as to limit the probability distribution offset of the candidate mark list to a preset value fluctuation interval;
Step S5, according to the value of the dynamic sampling offset operator, performing sampling space partition modulation on the high-entropy semantic interval of the candidate mark list, so that the generated candidate mark carries a multi-bit superposition watermark containing model source information and user tracing information.
2. The imperceptible hierarchical watermark embedding method based on language model multi-bit embedding according to claim 1, wherein step S2 specifically comprises: traversing the instruction logic tree and identifying the geometric variable node or process parameter node to which the candidate mark currently to be generated belongs; calculating the logic path length of that node relative to the initial definition node; calculating the logic topology depth value from the logic path length; and, when the logic path length exceeds a preset length threshold, reducing the logic topology depth value so as to reduce the sampling offset intensity at the corresponding candidate mark.
3. The imperceptible hierarchical watermark embedding method based on language model multi-bit embedding according to claim 1, wherein in step S3 the dynamic sampling offset operator is calculated according to the following rule: [formula not reproduced in the source text], wherein H is the result of the hash operation performed on the preset private key and the industrial design script sequence to be processed, d is the logic topology depth value, and α is a preset topology response coefficient whose value ranges from 0.05 to 0.15.
4. The imperceptible hierarchical watermark embedding method based on language model multi-bit embedding according to claim 1, wherein step S5 specifically comprises: dividing the high-entropy semantic interval into a model code sampling interval and an identity mark sampling interval according to the value of the dynamic sampling offset operator; and, when the logic topology depth value of the candidate mark currently to be generated is detected to be greater than a preset depth threshold, mapping the identity mark sampling interval into the model code sampling interval in a nested manner, so that nonlinear embedding of the multi-bit superposition watermark is realized by changing the probability distribution of the candidate marks.
5. The imperceptible hierarchical watermark embedding method based on language model multi-bit embedding according to claim 1, further comprising a hierarchical detection step that comprises: counting the total number of instructions in an industrial script to be detected; step S52, if the total number of instructions is lower than a preset first quantity threshold, reconstructing a hash projection space according to the preset private key and performing model identification matching; and step S53, if the total number of instructions is not lower than the first quantity threshold, inversely deriving the bit offset characteristics in the high-entropy semantic interval based on the dynamic sampling offset operator and performing user tracing identification.
6. The imperceptible hierarchical watermark embedding method based on language model multi-bit embedding according to claim 1, wherein the high-entropy semantic interval is determined by: calculating an un-normalized predictive probability distribution for each candidate mark in the candidate mark list; and using the dynamic hash value to shuffle the high-entropy vocabulary and delimit a sampling interval that carries the multi-bit superposition watermark.
7. The imperceptible hierarchical watermark embedding method based on language model multi-bit embedding according to claim 1, wherein step S5 further comprises a consistency check step: retrieving the probability distribution offset corresponding to the candidate mark modulated by sampling space partitioning; and, if the probability distribution offset is determined to cause the geometric constraint value to vary beyond a preset tolerance range, applying negative compensation to the dynamic sampling offset operator until the output candidate mark satisfies the monotonicity constraint of the industrial design specification.
8. The imperceptible hierarchical watermark embedding method based on language model multi-bit embedding according to claim 1, wherein step S3 is preceded by a security certificate loading step: retrieving, from a third-party server, the security encryption certificate corresponding to the current industrial design task; extracting a feature vector from the security encryption certificate; and injecting the feature vector, as an auxiliary factor, into the initialization vector of the hash operation.
9. The imperceptible hierarchical watermark embedding method based on language model multi-bit embedding according to claim 5, wherein step S53 specifically comprises: extracting a key instruction paragraph from the industrial script to be detected and restoring the local topological feature corresponding to it; calculating the logical correlation between the local topological feature and the dynamic sampling offset operator; and reconstructing, based on the logical correlation, the bit stream hidden in the vocabulary probability distribution to obtain the embedded user tracing information.
10. The imperceptible hierarchical watermark embedding method based on language model multi-bit embedding according to claim 1, wherein the industrial design script sequence to be processed is a script generated in a parameterized modeling language, and the size association constraint nodes comprise the unit mm for defining geometric spacing and degrees for defining rotation angles.
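
The depth calculation described in claim 2 can be sketched in Python as follows. The sketch walks a small instruction logic tree, measures the path length from the root to a named node, and derives a depth value that is reduced once the path length passes a threshold. The `Node` class, the example chain of node names, the threshold of 3, and the halving rule are assumptions made for illustration; the claim states only that the depth value is reduced when the logic path length exceeds a preset length threshold.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One node of a (hypothetical) instruction logic tree."""
    name: str
    children: list = field(default_factory=list)


def path_length(root, target_name):
    """Return the number of edges from `root` to the node named
    `target_name`, or None if no such node exists."""
    stack = [(root, 0)]
    while stack:
        node, dist = stack.pop()
        if node.name == target_name:
            return dist
        stack.extend((child, dist + 1) for child in node.children)
    return None


def depth_value(length, max_length=3, damping=0.5):
    """Map a path length to a depth value.

    Assumed rule: the depth equals the path length up to `max_length`
    and is scaled by `damping` beyond it, which lowers the sampling
    offset intensity for deeply nested instructions.
    """
    if length <= max_length:
        return float(length)
    return length * damping


# Example tree; the ordering of node types is illustrative only.
tolerance = Node("tolerance_note")
manufacturing = Node("manufacturing_param", [tolerance])
constraint = Node("size_constraint", [manufacturing])
entity = Node("geometric_entity", [constraint])
root = Node("global_reference", [entity])
```

In this example `manufacturing_param` sits three edges below the root and keeps a depth value of 3.0, while `tolerance_note`, four edges down, is damped to 2.0.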

Description

Imperceptible layered watermark embedding method based on language model multi-bit embedding

Technical Field

The invention belongs to the technical field of industrial data processing, and particularly relates to an imperceptible hierarchical watermark embedding method based on language model multi-bit embedding.

Background

With the penetration of large language models into the field of industrial data processing, using generative artificial intelligence to produce industrial design scripts, processing instructions, and process logic descriptions has become a mainstream trend. To ensure the safety of industrial design assets, implanting invisible watermarks in the generated text to achieve data validation and tracing is a core means of current industrial data security processing. Industrial instruction data has high logical atomicity and parameter sensitivity. Conventional multi-bit watermark embedding schemes generally adopt a probability shift mechanism based on statistical distribution, modulating the log probabilities in the vocabulary sampling stage. In actual industrial scenarios, however, such schemes face a conflict between tracing information capacity and data logic precision. Industrial instruction sequences are short and highly repetitive, so conventional statistical schemes struggle to accumulate enough statistical significance within an extremely short character window, and extraction of the tracing information fails. In addition, conventional probability modulation schemes do not consider the logical constraint relationships among industrial parameters, which causes numerical values or structure definitions in the instruction scripts to deviate.
To address the difficulty of tracing short-sequence text, a conventional improvement path is to strengthen the signal by increasing the probability offset amplitude. Logical deduction shows, however, that simply increasing this parameter distorts the text, so that an industrial modeling script or processing instruction logically collapses in subsequent simulation execution and loses its engineering usability as an input source for industrial production. This engineering usability problem arises because the control logic of the watermark embedding algorithm is not coupled with industrial domain knowledge. For example, the Chinese patent with grant publication number CN119939544B discloses a method for detecting content generated by a large language model based on sentence semantic watermark injection, which establishes a mark mapping from sentence semantic features to improve tamper resistance. That method has adaptation defects in industrial data processing: its semantic coding mechanism identifies macroscopic semantic tendencies but ignores the node-level dependency relationships of the industrial instruction logic tree, so the watermark signal distribution is decoupled from the design skeleton and source confidence is low when short-sequence instructions are processed; it lacks a mechanism for avoiding geometric constraint values when screening sentence watermarks, so the pursuit of semantic mark uniqueness perturbs key size parameters or rotation angles; and its semantic space restricts the multi-bit carrying capacity, which makes it unsuitable for the precision required in CAD instruction processing.
Therefore, the technical problem to be solved by the invention is how to implant, in short-sequence text, a mark that is highly reliable, imperceptible, and supports hierarchical traceability while maintaining the logical integrity of the industrial instruction, so that the generated industrial digital asset remains traceable to its source under working conditions of different lengths.

Disclosure of Invention

The invention provides an imperceptible hierarchical watermark embedding method based on language model multi-bit embedding, which comprises the following steps:
Step S1, acquiring an industrial design script sequence to be processed, a candidate mark list corresponding to the industrial design script sequence to be processed, and an instruction logic tree corresponding to the industrial design script sequence to be processed, wherein the instruction logic tree comprises a global reference node, a geometric entity definition node, a size association constraint node, and a manufacturing parameter instruction node;
Step S2, determining, based on the hierarchical dependency relationship of each node in the instruction logic tree, the logic topology depth value of the candidate mark currently to be generated relative to the global reference node in the instruction logic tree, wherein the logic topology depth value represents the logical hierarchical order of the current instruction in the parameterized modeling flow;
Step S3, calculating a dynamic sampling offset operator by utilizing a dynamic hash value generated by a preset private key and an industrial design script sequence to be proce