CN-122020707-A - Sentence level differential privacy prompt word protection method and device for large language model
Abstract
The invention discloses a sentence-level differential privacy prompt word protection method and device for large language models. The method acquires the original prompt word a user inputs to a large language model, searches for candidate replacement words for each token in the original prompt word to obtain a candidate replacement word set per token, randomly selects candidate replacement words from these sets and combines them into a plurality of candidate perturbed prompt words, and determines a target perturbed prompt word from the candidates by performing privacy and utility calculations on each one. By screening a candidate replacement word set for every token and combining the sets into sentence-level candidate perturbed prompt words, the invention breaks through the limitation of traditional token-level perturbation: the perturbation unit is upgraded to a complete sentence, which fundamentally avoids the excessive noise and semantic breakage caused by fragmenting the privacy budget across the tokens of a long prompt word, and achieves a deeper balance between privacy and utility.
Inventors
- ZHENG LELE
- ZHANG CHAO
- ZHANG TAO
- ZHU XINGHUI
- CHENG KE
- SHEN YULONG
Assignees
- 西安电子科技大学 (Xidian University)
Dates
- Publication Date: 2026-05-12
- Application Date: 2026-01-06
Claims (8)
- 1. A sentence-level differential privacy prompt word protection method for a large language model, characterized by comprising the following steps: acquiring an original prompt word input by a user to a large language model, the original prompt word comprising a plurality of tokens; searching for candidate replacement words for each token in the original prompt word to obtain a candidate replacement word set corresponding to each token; randomly selecting candidate replacement words from the candidate replacement word sets corresponding to the tokens and combining them to obtain a plurality of candidate perturbed prompt words corresponding to the original prompt word; and determining a target perturbed prompt word from the plurality of candidate perturbed prompt words by performing privacy and utility calculations on each candidate perturbed prompt word.
- 2. The sentence-level differential privacy prompt word protection method for a large language model according to claim 1, wherein determining the target perturbed prompt word from the plurality of candidate perturbed prompt words by performing privacy and utility calculations on each candidate perturbed prompt word comprises: calculating the cosine similarity between the original prompt word and each candidate perturbed prompt word to obtain the utility function value corresponding to each candidate; exponentiating the utility function values according to a preset privacy budget to obtain the utility index value corresponding to each candidate; and determining the target perturbed prompt word from the plurality of candidates according to their utility index values.
- 3. The sentence-level differential privacy prompt word protection method for a large language model according to claim 1, wherein randomly selecting candidate replacement words from the candidate replacement word sets corresponding to the tokens and combining them to obtain a plurality of candidate perturbed prompt words comprises: randomly selecting and combining candidate replacement words from the sets to obtain a plurality of initial candidate perturbed prompt words; calculating the perplexity of each initial candidate perturbed prompt word; and, based on a preset perplexity threshold, keeping the initial candidates whose perplexity is below that threshold to obtain the plurality of candidate perturbed prompt words corresponding to the original prompt word.
- 4. The sentence-level differential privacy prompt word protection method for a large language model according to claim 1, wherein searching for candidate replacement words for each token in the original prompt word to obtain a candidate replacement word set corresponding to each token comprises: generating a target embedding vector for each token in the original prompt word with a text embedding model; calculating, over the preset vocabulary of the text embedding model, the Euclidean distance between the target embedding vector of each token and the embeddings of the replacement words in the vocabulary; and determining the candidate replacement word set for each token from these Euclidean distances and a preset distance threshold.
- 5. The sentence-level differential privacy prompt word protection method for a large language model according to claim 2, wherein the utility function value corresponding to each candidate perturbed prompt word is expressed as:

  u(x, x'_i) = cos(Emb(x), Emb(x'_i))

  where u(x, x'_i) is the utility function value corresponding to candidate perturbed prompt word x'_i, x is the original prompt word, x'_i is the i-th candidate perturbed prompt word, and Emb(·) is the embedding vector representation.
- 6. The sentence-level differential privacy prompt word protection method for a large language model according to claim 5, wherein the utility index value corresponding to each candidate perturbed prompt word is expressed as:

  s(x'_i) = exp(ε · u(x, x'_i) / (2Δu))

  where s(x'_i) is the utility index value corresponding to candidate perturbed prompt word x'_i, ε is the preset privacy budget, Δu is the sensitivity of the utility function, and exp is the exponential function.
- 7. The sentence-level differential privacy prompt word protection method for a large language model according to claim 3, wherein the perplexity of an initial candidate perturbed prompt word is expressed as:

  PPL(x'_i) = exp(−(1/n) Σ_{j=1}^{n} log p(w_j | w_{<j}))

  where PPL(x'_i) is the perplexity of initial candidate perturbed prompt word x'_i, n is the number of candidate replacement words comprised in x'_i, w_j is the j-th candidate replacement word, and p(w_j | w_{<j}) is the conditional probability of w_j given the preceding words.
- 8. A sentence-level differential privacy prompt word protection device for a large language model, characterized by comprising: an acquisition module for acquiring an original prompt word input by a user to the large language model, the original prompt word comprising a plurality of tokens; a replacement word determining module for searching for candidate replacement words for each token in the original prompt word to obtain a candidate replacement word set corresponding to each token; a sentence-level perturbation construction module for randomly selecting candidate replacement words from the candidate replacement word sets corresponding to the tokens and combining them to obtain a plurality of candidate perturbed prompt words corresponding to the original prompt word; and a target perturbed prompt word determining module for determining the target perturbed prompt word from the plurality of candidate perturbed prompt words by performing privacy and utility calculations on each candidate perturbed prompt word.
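The candidate-search step of claim 4 can be sketched as follows. The toy 2-D embedding table, the `candidate_set` helper, and the threshold value are illustrative assumptions, not part of the patent; a real system would query the vocabulary of an actual text embedding model.

```python
import math

def euclidean(u, v):
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def candidate_set(token, vocab_embeddings, threshold):
    """All vocabulary words whose embedding lies within `threshold`
    of the token's own embedding (the token itself qualifies too)."""
    target = vocab_embeddings[token]
    return [w for w, e in vocab_embeddings.items()
            if euclidean(target, e) <= threshold]

# Toy 2-D "embedding table" standing in for a real model's vocabulary.
vocab = {
    "doctor": (1.0, 0.0),
    "medic":  (1.1, 0.1),
    "nurse":  (0.9, 0.2),
    "banana": (5.0, 5.0),
}
print(candidate_set("doctor", vocab, threshold=0.5))  # near-synonyms survive, "banana" does not
```

Because each token keeps its own candidate set, the later combination step can form whole-sentence perturbations instead of perturbing tokens one at a time.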
Description
Technical Field

The invention belongs to the technical field of large language models, and in particular relates to a sentence-level differential privacy prompt word protection method and device for a large language model.

Background

Large language models (LLMs) have advanced the field of natural language processing significantly, supporting tasks such as text generation, question answering, and dialogue systems. Users interact with a model by providing prompt words, which not only convey user intent but often contain sensitive information such as personal identifiers, behavioral preferences, and proprietary data. In black-box deployment or cloud inference scenarios, users cannot know how their input is stored and processed, and some public language models have leaked sensitive information from past user input, so serious privacy risks exist. Existing privacy protection methods each have drawbacks. Cryptography-based methods offer strong privacy guarantees but incur high computational cost and large communication latency, making them hard to adapt to real-time or large-scale scenarios. Client-server hybrid architectures require partial access to the architecture or parameters of the LLMs, which limits their applicability in black-box scenarios. Methods based on differential privacy (DP) mostly apply token-level perturbation: distributing the privacy budget across tokens injects too much noise into long prompt words and lowers output quality, and perturbing tokens independently lacks context awareness, easily producing semantically incoherent or unnatural sentences. These methods therefore cannot preserve the utility of the prompt words while guaranteeing privacy.
In this context, a prompt word perturbation framework that balances privacy protection and language quality is needed to meet the requirements of trustworthy deployment of LLM inference services.

Disclosure of Invention

To solve the problems in the prior art, the invention provides a sentence-level differential privacy prompt word protection method and device for a large language model. The technical problems to be solved by the invention are addressed by the following technical scheme.

In a first aspect, the invention provides a sentence-level differential privacy prompt word protection method for a large language model, comprising: acquiring an original prompt word input by a user to a large language model, the original prompt word comprising a plurality of tokens; searching for candidate replacement words for each token in the original prompt word to obtain a candidate replacement word set corresponding to each token; randomly selecting candidate replacement words from the candidate replacement word sets corresponding to the tokens and combining them to obtain a plurality of candidate perturbed prompt words corresponding to the original prompt word; and determining a target perturbed prompt word from the plurality of candidate perturbed prompt words by performing privacy and utility calculations on each candidate perturbed prompt word.
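The privacy and utility calculation named in the first aspect, a cosine-similarity utility exponentially weighted by the privacy budget as in claims 5 and 6, can be sketched as follows. The function names, toy vectors, and parameter values are illustrative assumptions, not taken from the patent.

```python
import math
import random

def cosine(u, v):
    """Cosine similarity -- the utility u(x, x'_i)."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def select_perturbed(orig_emb, cand_embs, epsilon, sensitivity=1.0, rng=None):
    """Exponential mechanism: sample index i with probability proportional
    to exp(epsilon * u(x, x'_i) / (2 * sensitivity))."""
    rng = rng or random.Random()
    weights = [math.exp(epsilon * cosine(orig_emb, e) / (2 * sensitivity))
               for e in cand_embs]
    r = rng.random() * sum(weights)
    acc = 0.0
    for i, w in enumerate(weights):
        acc += w
        if r <= acc:
            return i
    return len(weights) - 1  # guard against floating-point rounding

# Candidate 0 is nearly identical to the original embedding, so with a
# generous privacy budget it should be chosen almost every time.
orig = (1.0, 0.0)
cands = [(1.0, 0.1), (0.0, 1.0), (0.5, 0.9)]
rng = random.Random(42)
picks = [select_perturbed(orig, cands, epsilon=20.0, rng=rng) for _ in range(1000)]
```

Smaller privacy budgets flatten the weights, so less-similar candidates are chosen more often: the single parameter epsilon trades utility against privacy.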
In a second aspect, the invention provides a sentence-level differential privacy prompt word protection device for a large language model, comprising: an acquisition module for acquiring an original prompt word input by a user to the large language model, the original prompt word comprising a plurality of tokens; a replacement word determining module for searching for candidate replacement words for each token in the original prompt word to obtain a candidate replacement word set corresponding to each token; a sentence-level perturbation construction module for randomly selecting candidate replacement words from the candidate replacement word sets corresponding to the tokens and combining them to obtain a plurality of candidate perturbed prompt words corresponding to the original prompt word; and a target perturbed prompt word determining module for determining the target perturbed prompt word from the plurality of candidate perturbed prompt words by performing privacy and utility calculations on each candidate perturbed prompt word.

The sentence-level differential privacy prompt word protection method and device for a large language model provided by the invention break through the limitation of traditional token-level perturbation: the perturbation unit is upgraded to a complete sentence, which fundamentally avoids the excessive noise and semantic breakage of long prompt words caused by privacy budget fragmentation. Further, by performing privacy and utility calculations on each candidate perturbed prompt word and selecting the optimal perturbed prompt word, the invention balances privacy protection with the utility of the prompt.
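The perplexity screen of claims 3 and 7, which discards combined candidates that are not fluent sentences, can be sketched as follows. The hand-picked per-token probabilities and the threshold are illustrative assumptions; a real implementation would obtain p(w_j | w_{<j}) from a language model.

```python
import math

def perplexity(token_probs):
    """PPL(x'_i) = exp(-(1/n) * sum_j log p(w_j | w_<j))."""
    n = len(token_probs)
    return math.exp(-sum(math.log(p) for p in token_probs) / n)

def screen(candidates, threshold):
    """Keep only the candidates whose perplexity is below the threshold."""
    return [c for c in candidates if perplexity(c["probs"]) < threshold]

# Made-up conditional probabilities: the fluent sentence gets high per-token
# probabilities, the implausible one a very low probability for "banana".
fluent  = {"text": "the medic saw a patient",  "probs": [0.9, 0.8, 0.85, 0.9, 0.7]}
garbled = {"text": "the banana saw a patient", "probs": [0.9, 0.0001, 0.85, 0.9, 0.7]}
kept = screen([fluent, garbled], threshold=5.0)
```

Screening before the exponential-mechanism selection means the privacy budget is spent only on candidates that already read as natural sentences.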