CN-122020682-A - Conditional triggering text watermarking method for official content trusted verification

Abstract

The invention relates to a conditional triggering text watermarking method for trusted verification of official content. Through an end-to-end pipeline of large language model confidence extraction, hierarchically triggered watermark embedding, pseudo-random red-green set division, statistical feature detection, and threshold-based authenticity judgment, it achieves implicit watermark embedding in official text and rapid screening at the public opinion monitoring end. The invention balances the natural readability of official text against the detectability of the watermark: the detection Z-score can reach 7.49 and the green-set hit proportion can reach 69.21%. It is suitable for scenarios such as government document release, clarifying news releases, and online public opinion management, and offers good extensibility and engineering practicality.

Inventors

HE MINGZE
PENG JING
TIAN TING
YANG XINGCHUN
XU HONG
YANG MIAO
QU XIAOFENG
ZENG QIANG

Assignees

Sichuan Police College (四川警察学院)
Sichuan Provincial Public Security Research Center (四川省公安科研中心)

Dates

Publication Date: 2026-05-12
Application Date: 2025-12-26

Claims (7)

1. A conditional triggering text watermarking method for official content trusted verification, characterized by comprising the following steps: step one, calling a large language model to generate official document text, and extracting the confidence information output by the model during generation as the basis for watermark embedding; step two, applying hierarchical control to the generation process according to the confidence information, wherein watermark embedding is skipped at high confidence and triggered at medium and low confidence, with the bias strength adjusted according to the level; step three, when watermark embedding is triggered, dividing the candidate outputs into a red set and a green set using a pseudo-random function with a preset seed, so that the division results at the generation end and the detection end are consistent; step four, applying the corresponding bias to the green set according to the hierarchical strategy and completing sampling, thereby generating official text carrying implicit watermark features; step five, in the public opinion monitoring stage, detecting suspected official text circulating on the network, wherein the detection end reconstructs the red set and the green set using the same pseudo-random rule as the generation end and counts the watermark features; and step six, judging the authenticity of the text according to the detection result and a preset threshold interval, wherein the text is judged to be officially generated when the watermark is detected and unofficially generated when it is not, thereby realizing trusted verification of authoritative content and public opinion screening.
2. The conditional triggering text watermarking method for official content trusted verification according to claim 1, wherein step one specifically comprises: S1.1, inputting preset prompt information or fact elements, and initializing the language model to generate official document text; S1.2, at each prediction step, obtaining the candidate token set […] output by the large language model and its corresponding probability distribution […], wherein […]; S1.3, determining from the probability distribution P the highest-probability candidate token […] and its probability value […], and the second-highest-probability candidate token […] and its probability value […]; S1.4, calculating the confidence index […] and taking its value as the confidence information of the current prediction step.
3. The conditional triggering text watermarking method for official content trusted verification according to claim 2, wherein step two specifically comprises: S2.1, when the confidence index satisfies […], the step is judged to be at the high confidence level, the highest-probability candidate token is output directly, and watermark embedding is not performed; S2.2, when the confidence index satisfies […], the step is judged to be at the medium confidence level, and after the red and green sets are divided, a linear bias […] is applied to the logits of the green-set candidate tokens to implement medium-strength watermark embedding, wherein […] is the linear amplification factor; S2.3, when […], the step is judged to be at the low confidence level, and after the red and green sets are divided, an exponential bias […] is applied to the logits of the green-set candidate tokens to implement strong watermark embedding, wherein […] is the maximum bias strength and […] is the exponential growth coefficient.
4. The conditional triggering text watermarking method for official content trusted verification according to claim 3, wherein step three specifically comprises: S3.1, denoting the candidate token set as […] and setting a fixed random seed […]; S3.2, calling a pseudo-random function […] to generate a pseudo-random value […] for each candidate token in the set, wherein […] is the index of the candidate token in the set; S3.3, dividing the set into a red set […] and a green set […] according to the division ratio parameter […]; S3.4, fixing the pseudo-random function and the random seed so that the detection end can reproduce the red set and the green set under the same conditions, ensuring that the division results are consistent.
5. The conditional triggering text watermarking method for official content trusted verification according to claim 4, wherein step four specifically comprises: S4.1, determining the bias strength […] of the current prediction step according to the hierarchical control result, wherein the high confidence level corresponds to […], the medium confidence level corresponds to the linear bias function, and the low confidence level corresponds to the exponential bias function; S4.2, adding the bias value to the logits score of each candidate token in the green set while leaving the logits of the red set unchanged; S4.3, normalizing the modified logits to obtain a new probability distribution; and S4.4, sampling from the modified probability distribution and outputting the current predicted token as part of the generated text, thereby generating official text carrying implicit watermark features.
6. The conditional triggering text watermarking method for official content trusted verification according to claim 5, wherein step five specifically comprises: S5.1, receiving and parsing the text to be detected, and splitting it into a sequence of consecutive tokens; S5.2, at each prediction step, calling the same pseudo-random function and seed as the generation end to reconstruct the corresponding red set and green set; S5.3, checking whether each output token in the text belongs to the green set, and accumulating the number of green-set hits […]; S5.4, calculating the detection statistic […] from the counting result, wherein […] is the number of detected positions and […] is the green-set proportion, and using […] as the basis for judging the watermark features.
7. The conditional triggering text watermarking method for official content trusted verification according to claim 6, wherein step six specifically comprises: S6.1, comparing the detection statistic […] with a preset threshold interval […]; S6.2, when […], the text is judged to contain watermark features and to be officially generated; S6.3, when […], the text is judged to contain no watermark features and is regarded as unofficially generated; when […], the text is marked as suspicious and submitted to a human reviewer or another system for review.
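
The generation and detection steps of claims 2 to 7 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation. The formulas and thresholds are not legible in this text, so every specific below is hypothetical: the confidence index (top-1 minus top-2 probability), the thresholds, both bias functions, the SHA-256-based pseudo-random function, the seed, and the decision interval. The detection statistic is assumed to be the usual one-proportion z-test on the green hit count, and random logits stand in for a real language model.

```python
import hashlib
import math
import random

# All constants below are illustrative assumptions, not values from the patent.
SEED = "demo-seed"             # shared secret seed (hypothetical)
GAMMA = 0.5                    # green-set proportion
TAU_HIGH, TAU_LOW = 0.6, 0.2   # confidence thresholds
K_LINEAR = 5.0                 # linear amplification factor
DELTA_MAX, LAMBDA = 6.0, 5.0   # maximum bias and exponential growth coefficient


def is_green(token_id, seed=SEED, gamma=GAMMA):
    """Map (seed, token index) to a value in [0, 1); green if below gamma."""
    digest = hashlib.sha256(f"{seed}:{token_id}".encode()).digest()
    return int.from_bytes(digest[:8], "big") / 2**64 < gamma


def softmax(logits):
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def bias_strength(confidence):
    """Assumed tiers: 0 at high confidence, linear at medium, capped exponential at low."""
    if confidence >= TAU_HIGH:
        return 0.0
    if confidence >= TAU_LOW:
        return K_LINEAR * (TAU_HIGH - confidence)
    base = K_LINEAR * (TAU_HIGH - TAU_LOW)  # keeps the bias continuous at TAU_LOW
    return min(DELTA_MAX, base * math.exp(LAMBDA * (TAU_LOW - confidence)))


def generate_step(logits, rng):
    """Emit one token id: greedy at high confidence, green-biased sampling otherwise."""
    probs = softmax(logits)
    top1, top2 = sorted(probs, reverse=True)[:2]
    confidence = top1 - top2  # assumed confidence index
    if confidence >= TAU_HIGH:
        return probs.index(top1)
    delta = bias_strength(confidence)
    biased = [x + delta if is_green(i) else x for i, x in enumerate(logits)]
    return rng.choices(range(len(logits)), weights=softmax(biased))[0]


def z_score(token_ids, gamma=GAMMA):
    """One-proportion z-test on the green hit count over all detected positions."""
    total = len(token_ids)
    hits = sum(is_green(t) for t in token_ids)
    return (hits - gamma * total) / math.sqrt(total * gamma * (1 - gamma))


def verdict(z, z_low=2.0, z_high=4.0):
    """Three-way decision over an assumed threshold interval [z_low, z_high]."""
    if z >= z_high:
        return "official"
    if z <= z_low:
        return "unofficial"
    return "suspicious"


if __name__ == "__main__":
    rng = random.Random(0)
    vocab_size = 1000
    # Random logits stand in for a real language model here.
    text = [generate_step([rng.gauss(0, 1) for _ in range(vocab_size)], rng)
            for _ in range(200)]
    z = z_score(text)
    print(f"z = {z:.2f}, verdict = {verdict(z)}")
```

In this sketch the partition depends only on the seed and the token index, so the detector needs neither the model nor the prompt. The detector cannot tell which positions were skipped at high confidence, so it counts every position, and skipped positions dilute the green hit rate.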

Description

Conditional triggering text watermarking method for official content trusted verification

Technical Field

The invention relates to a conditional triggering text watermarking method for trusted verification of official content. It is suitable for generating and protecting against forgery of official texts such as government documents and news releases, and for rapidly screening fake official texts in online public opinion monitoring. It belongs to the technical fields of natural language processing and information security.

Background

With the rapid development of artificial intelligence, particularly large language models (LLMs), applications of AI-generated content (AIGC) are becoming increasingly widespread. Government departments, media organizations, and large enterprises increasingly use large language models to draft policy documents, news releases, public opinion responses, corporate statements, and similar content automatically, in order to improve the efficiency and reach of information release. The naturalness and fluency of AI-generated text make it highly valuable in public administration, commercial communication, and social governance. However, as these applications spread, so does false content in cyberspace that impersonates official or authoritative bodies. Malicious actors can use publicly available large language models to generate fake statements, notices, or news articles that closely resemble official texts in form and style, and use them to spread rumors, mislead public opinion, or even seriously disrupt social security and public order. For example, a fake announcement may trigger fluctuations in financial markets, and a fake statement may cause public panic, which makes guaranteeing the credibility of information an urgent need.
To cope with these risks, academia and industry have proposed text watermarking techniques. The basic idea is to embed implicit, detectable features in generated text so that its origin and authenticity can later be verified algorithmically. Existing work mainly relies on vocabulary replacement, probability distribution perturbation, or candidate set division to embed statistical features into generated text for tracing and detection. However, existing text watermarking methods still have significant drawbacks:

(1) Forced embedding. Many methods embed the watermark at every generation step, which easily reduces the fluency of the text and can even produce semantically unnatural output, harming the public credibility and readability of official documents.

(2) Most methods embed the watermark with a fixed strength or a uniform rule and cannot adapt to differences in the confidence of the model output. As a result, the watermark signal is too weak and detection accuracy is insufficient in some scenarios, while in others the signal is too strong and degrades text quality.

(3) Existing methods focus mainly on general text generation and model provenance verification. They lack a systematic scheme covering the complete application chain of official information generation, network propagation monitoring, and false content screening, and therefore struggle to meet the dual requirements of authority and timeliness in public opinion governance.

In summary, the prior art cannot preserve the naturalness of the text while also providing controllable embedding strength and robust detection, and it lacks an overall solution for public opinion screening.
Therefore, a new technical method is urgently needed that can exploit the output characteristics of a large language model to embed an implicit watermark adaptively, and that can achieve high-confidence detection and judgment in the public opinion monitoring stage, thereby effectively preventing the spread of false and impersonating texts and maintaining the credibility of information in cyberspace and the stability of public opinion.

Disclosure of Invention

To solve the problems of text distortion caused by forced embedding, the lack of an adaptive mechanism, and insufficient adaptation to public opinion scenarios in existing text watermarking methods, the invention provides a conditional triggering text watermarking method for trusted verification of official content, which realizes intelligent watermark embedding at the official text generation end and rapid authenticity screening at the public opinion monitoring end. A conditional triggering text watermarking method for official content trusted verification, comprising the following steps: calling a large language model to generate official document tex