CN-122021788-A - Knowledge poisoning attack defending method for retrieval enhancement generation system


Abstract

A knowledge poisoning attack defense method for a retrieval-augmented generation (RAG) system relates to the technical fields of artificial intelligence security and graph neural networks. It addresses the problem that existing defense methods rely on isolated detection of individual document content and ignore both the structural relations between documents and the interaction context between documents and user queries, which makes it difficult to capture malicious documents that have high retrieval relevance but low semantic consistency. The method models the query and the retrieved documents as a heterogeneous graph, fuses semantic embeddings with graph structural features, constructs a global graph comprising query-document edges and document-document similarity edges, classifies document nodes with an edge-type-aware graph attention network, and identifies maliciously injected documents. The method improves the defensive capability of the RAG system against knowledge poisoning attacks and provides safe and reliable external knowledge input for a large language model.

Inventors

REN WEIWU
LIANG CONG

Assignees

Changchun University of Science and Technology (长春理工大学)

Dates

Publication Date: 20260512
Application Date: 20260410

Claims (8)

1. A knowledge poisoning attack defense method for a retrieval-augmented generation (RAG) system, characterized by comprising the following steps: Step S1, collecting labeled retrieval data under known attack scenarios, wherein each sample in the retrieval data comprises a user query and a corresponding retrieved document set; Step S2, encoding the user query and each document in the retrieved document set with a pre-trained sentence encoding model to obtain a query semantic embedding vector and document semantic embedding vectors; Step S3, constructing a global heterogeneous graph from the query semantic embedding vector and the document semantic embedding vectors obtained in step S2; Step S4, calculating a four-dimensional structural feature vector for each document node in the global heterogeneous graph; Step S5, concatenating the document semantic embedding vector and the four-dimensional structural feature vector to form an initial feature representation of each node, and encoding the global heterogeneous graph with an edge-type-aware graph attention network model to obtain embedded representations of all document nodes, wherein the graph attention network model fuses edge attributes and edge type embeddings, adaptively aggregates document node features through a multi-head attention mechanism, and calculates the probability that each document is malicious; and Step S6, setting a decision threshold, judging each document whose probability is higher than the decision threshold to be a malicious document, and inputting the remaining documents as context into a large language model to generate a final answer, thereby defending against knowledge poisoning attacks in the RAG system.
2. The knowledge poisoning attack defense method for a RAG system according to claim 1, wherein in step S1 an original sample set $S$ is constructed from the labeled retrieval data, the original sample set $S$ comprises a plurality of samples, and each sample comprises a user query $q$ and its corresponding retrieved document set $D = \{d_1, d_2, \ldots, d_n\}$, wherein each document $d_i$ is marked with a binary label $y_i \in \{0, 1\}$, 0 representing a normal document and 1 representing a malicious document.
3. The knowledge poisoning attack defense method for a RAG system according to claim 2, wherein in step S3 the global heterogeneous graph is constructed as follows: Step S31, initializing a graph data list $L$ and selecting the first sample from the original sample set $S$ as the current sample to be processed; Step S32, constructing a query-document heterogeneous graph for the current sample by adding a query node and document nodes and adding a directed edge from the query node to each document node; Step S33, constructing semantic association edges among the document nodes to obtain the query-document heterogeneous graph of the current sample; Step S34, adding the query-document heterogeneous graph of the current sample to the graph data list $L$, selecting the next sample in the original sample set $S$, and judging whether that sample exists; if so, returning to step S32, otherwise executing step S35; Step S35, merging the query-document heterogeneous graphs of all samples in the graph data list $L$ into a global heterogeneous graph $G = (V_q, V_d, E)$, where $V_q$ is the set of query nodes, $V_d$ is the set of document nodes, and $E$ is the set of edges.
4. The knowledge poisoning attack defense method for a RAG system according to claim 3, wherein in step S33 the semantic association edges are constructed as follows: computing the cosine similarity between the current document $d_i$ and each remaining document $d_j$; if the similarity is greater than a preset threshold $\tau_s$ and the remaining document $d_j$ is among the top $k$ most similar neighbors of the current document $d_i$, adding a directed edge from the current document $d_i$ to the remaining document $d_j$, the edge attribute being the cosine similarity value.
5. The knowledge poisoning attack defense method for a RAG system according to claim 4, wherein in step S4 the four-dimensional structural feature vector is calculated as follows: four structural features are defined, namely the reciprocal edge proportion, the neighbor average similarity, the local clustering coefficient, and the retrieval score; the reciprocal edge proportion represents the proportion of the outgoing edges of a document node for which a reverse edge exists; the neighbor average similarity represents the average of the edge attributes of all outgoing edges of the document node; the local clustering coefficient represents, in the undirected subgraph formed by reciprocal edges, the ratio of the actual number of connections between the neighbors of the document node to the theoretical maximum number of connections; the retrieval score represents the retrieval score of the document node; and the four structural features are standardized to obtain the four-dimensional structural feature vector.
6. The knowledge poisoning attack defense method for a RAG system according to claim 5, wherein the reciprocal edge proportion is calculated as: $\rho(v_i) = \dfrac{|\{v_j \in N(v_i) : (v_j, v_i) \in E_{dd}\}|}{\deg^{+}(v_i) + \epsilon}$; where $v_i$ is a document node in the heterogeneous graph; $E_{dd}$ is the set of directed edges between document nodes; $\deg^{+}(v_i)$ is the out-degree of document node $v_i$; $N(v_i)$ is the neighbor set of document node $v_i$; and $\epsilon$ is a very small positive number; the neighbor average similarity is calculated as: $\bar{s}(v_i) = \dfrac{1}{|N(v_i)|} \sum_{v_j \in N(v_i)} s_{ij}$; where $s_{ij}$ is the cosine similarity between document node $v_i$ and document node $v_j$; the local clustering coefficient is calculated by first recording the nodes joined to $v_i$ by reciprocal edges as the reciprocal neighbor set $R(v_i)$, and then computing: $C(v_i) = \dfrac{2\,|\{(v_j, v_k) \in E_r : v_j, v_k \in R(v_i)\}|}{|R(v_i)|\,(|R(v_i)| - 1)}$; where $E_r$ is the set of reciprocal edges, $R(v_i)$ is the reciprocal neighbor set of document node $v_i$, and $v_j$, $v_k$ are document nodes in the heterogeneous graph; if $|R(v_i)| < 2$, then $C(v_i) = 0$; the retrieval score is calculated as: $r(d_i) = \mathbf{e}_q \cdot \mathbf{e}_{d_i}$; where $\mathbf{e}_q$ is the semantic embedding vector of the query $q$; $\mathbf{e}_{d_i}$ is the semantic embedding vector of document $d_i$; and $\cdot$ denotes the vector dot product.
7. The knowledge poisoning attack defense method for a RAG system according to claim 6, wherein in step S5 the graph attention network model is implemented as follows: Step S51, initializing an edge-type-aware graph attention network model and a fully connected classification layer, and setting a maximum number of training rounds $T_{\max}$ and the current round $t = 1$; Step S52, executing forward propagation for the $t$-th round: the global heterogeneous graph serving as the training set is input into the graph attention network model, the network model uses a multi-head attention mechanism to fuse edge attribute and edge type information, computes new embedded representations of the document nodes, and outputs through the fully connected classification layer the predicted probability $\hat{y}$ that each document is a malicious document; during training, the loss between the predicted probability $\hat{y}$ and the true label $y$ is calculated, and the network model parameters are updated by the back-propagation algorithm; Step S53, executing validation and early-stopping judgment: the performance of the current network model is evaluated on the validation set, and it is judged whether the current round $t$ is smaller than $T_{\max}$ and the early-stopping condition has not been triggered; if so, let $t = t + 1$ and return to step S52; otherwise, end training, save the optimal network model parameters, and execute step S6.
8. The knowledge poisoning attack defense method for a RAG system according to claim 7, wherein step S6 is implemented as follows: Step S61, setting a decision threshold search interval $[\theta_{\min}, \theta_{\max}]$, $\theta_{\min}$ being the lower limit of the decision threshold and $\theta_{\max}$ the upper limit, and selecting a current threshold $\theta$; Step S62, calculating the classification F1 score under the current threshold $\theta$ and recording the optimal threshold $\theta^{*}$ corresponding to the best F1 score; Step S63, selecting the next threshold $\theta'$ and judging whether the next threshold $\theta'$ is less than or equal to $\theta_{\max}$; if so, setting $\theta = \theta'$ and returning to step S62, otherwise executing step S64; Step S64, performing inference on newly retrieved documents with the optimal network model parameters and the optimal decision threshold $\theta^{*}$; if the malicious probability output by the network model is greater than $\theta^{*}$, the document is determined to be a malicious document.
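The graph construction loop of claim 3 (steps S31 to S35) can be sketched roughly as below. This is an illustration under stated assumptions, not the patented implementation: the names `build_sample_graph` and `merge_graphs`, the dictionary layout, and the placeholder attribute `1.0` on query-document edges are all hypothetical, and the merge is a plain disjoint union that shifts node ids by a running offset.

```python
def build_sample_graph(n_docs, doc_edges):
    """Per-sample graph: query node has local id 0, documents have ids 1..n_docs.

    doc_edges maps (i, j) -> similarity for directed document edges,
    using 0-based document indices (hypothetical layout).
    """
    # Directed edge from the query node to every document node (step S32).
    # The attribute 1.0 is a placeholder; the claims do not specify one.
    edges = [(0, 1 + i, "query-doc", 1.0) for i in range(n_docs)]
    # Semantic association edges between documents (step S33).
    edges += [(1 + i, 1 + j, "doc-doc", sim)
              for (i, j), sim in sorted(doc_edges.items())]
    return {"num_nodes": n_docs + 1, "edges": edges}


def merge_graphs(graphs):
    """Disjoint union of per-sample graphs into one global graph (step S35)."""
    offset = 0
    query_nodes, doc_nodes, edges = [], [], []
    for g in graphs:
        query_nodes.append(offset)
        doc_nodes.extend(range(offset + 1, offset + g["num_nodes"]))
        edges.extend((s + offset, t + offset, etype, attr)
                     for s, t, etype, attr in g["edges"])
        offset += g["num_nodes"]
    return {"query_nodes": query_nodes, "doc_nodes": doc_nodes, "edges": edges}


# Two toy samples: one with 2 documents and one edge, one with 3 documents.
g1 = build_sample_graph(2, {(0, 1): 0.9})
g2 = build_sample_graph(3, {})
G = merge_graphs([g1, g2])
```

Keeping the edge type as a separate field on every edge is what later allows an edge-type-aware model to treat query-document and document-document edges differently.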
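The edge rule of claim 4 combines two conditions: the similarity must exceed a preset threshold, and the target must be among the top k most similar documents of the source. A minimal sketch follows, assuming plain Python lists as embeddings; the function names and the default values `tau=0.5` and `k=2` are hypothetical and are not taken from the patent.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu > 0 and nv > 0 else 0.0


def build_doc_edges(doc_embs, tau=0.5, k=2):
    """Return {(i, j): similarity} for directed edges i -> j.

    An edge is added only when sim(i, j) > tau AND j is one of the
    k most similar other documents of i.
    """
    edges = {}
    n = len(doc_embs)
    for i in range(n):
        sims = [(cosine(doc_embs[i], doc_embs[j]), j)
                for j in range(n) if j != i]
        sims.sort(key=lambda t: (-t[0], t[1]))  # most similar first
        for sim, j in sims[:k]:
            if sim > tau:
                edges[(i, j)] = sim  # edge attribute is the cosine similarity
    return edges


docs = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
edges = build_doc_edges(docs, tau=0.5, k=1)
```

In this toy input the first two documents point at each other, while the third document gets no outgoing edge because its best neighbor falls below the threshold. Because the top-k test is applied per source document, edges are not necessarily symmetric, which is what makes the reciprocal edge proportion of claim 5 informative.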
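The four structural features of claims 5 and 6 can be computed directly from the directed document edges. The sketch below is illustrative only: the function names are hypothetical, the `eps` guard on the mean similarity is an added assumption to avoid division by zero, and z-score scaling is assumed for the "standardization" step, which the claims do not define further.

```python
import math


def structural_features(n_docs, edges, retrieval_scores, eps=1e-8):
    """Per document: [reciprocal ratio, mean neighbor sim, clustering, score].

    edges maps (i, j) -> cosine similarity for directed edges i -> j.
    """
    out = {i: [] for i in range(n_docs)}
    for (i, j) in edges:
        out[i].append(j)
    # Reciprocal neighbors: j such that both i -> j and j -> i exist.
    recip = {i: {j for j in out[i] if (j, i) in edges} for i in range(n_docs)}
    feats = []
    for i in range(n_docs):
        deg = len(out[i])
        ratio = len(recip[i]) / (deg + eps)
        mean_sim = sum(edges[(i, j)] for j in out[i]) / (deg + eps)
        r = sorted(recip[i])
        if len(r) < 2:
            clustering = 0.0  # fewer than two reciprocal neighbors
        else:
            # Count reciprocal links among the reciprocal neighbors of i.
            links = sum(1 for a in range(len(r)) for b in range(a + 1, len(r))
                        if r[b] in recip[r[a]])
            clustering = 2.0 * links / (len(r) * (len(r) - 1))
        feats.append([ratio, mean_sim, clustering, retrieval_scores[i]])
    return feats


def standardize(rows, eps=1e-8):
    """Column-wise z-score (assumed form of the standardization step)."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((x - m) ** 2 for x in c) / len(c))
            for c, m in zip(cols, means)]
    return [[(x - m) / (s + eps) for x, m, s in zip(row, means, stds)]
            for row in rows]


# Documents 0, 1, 2 form a reciprocal triangle; document 3 has a one-way edge.
edges = {(0, 1): 0.9, (1, 0): 0.9, (0, 2): 0.8, (2, 0): 0.8,
         (1, 2): 0.7, (2, 1): 0.7, (3, 0): 0.6}
feats = structural_features(4, edges, [0.5, 0.4, 0.3, 0.9])
z = standardize(feats)
```

Document 3 in this toy graph shows the pattern the background section describes: a high retrieval score combined with no reciprocal edges and zero clustering.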
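To make the idea of fusing edge attributes and edge type embeddings into attention (claims 1 and 7) more concrete, the following is a deliberately simplified sketch: a single attention head with no learned projection matrices, written in plain Python. It is not the patented network. The scoring form (LeakyReLU over a concatenation of destination features, source features, edge type embedding, and edge attribute), the function name `edge_aware_attention`, and all numeric values are assumptions, and all nodes are given the same feature dimension for simplicity.

```python
import math


def leaky_relu(x, slope=0.2):
    return x if x > 0 else slope * x


def edge_aware_attention(h, edges, type_emb, a):
    """One simplified attention aggregation step.

    h:        {node: feature list}
    edges:    iterable of (src, dst, edge_type, edge_attr)
    type_emb: {edge_type: embedding list}
    a:        attention vector of length 2 * dim + type_dim + 1
    Returns (new_h, alpha). Nodes with no incoming edge are absent from new_h.
    """
    incoming = {}
    for src, dst, etype, attr in edges:
        # Concatenate destination, source, edge type embedding, edge attribute.
        z = h[dst] + h[src] + type_emb[etype] + [attr]
        logit = leaky_relu(sum(w * x for w, x in zip(a, z)))
        incoming.setdefault(dst, []).append((src, logit))
    new_h, alpha = {}, {}
    for dst, msgs in incoming.items():
        m = max(logit for _, logit in msgs)  # subtract max for stability
        exps = [math.exp(logit - m) for _, logit in msgs]
        total = sum(exps)
        agg = [0.0] * len(h[dst])
        for (src, _), e in zip(msgs, exps):
            w = e / total  # softmax weight over incoming edges
            alpha[(src, dst)] = w
            agg = [x + w * y for x, y in zip(agg, h[src])]
        new_h[dst] = agg
    return new_h, alpha


h = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [1.0, 1.0]}
type_emb = {"query-doc": [1.0, 0.0], "doc-doc": [0.0, 1.0]}
a = [0.1] * 7
edges = [(0, 2, "query-doc", 1.0), (1, 2, "doc-doc", 0.8)]
new_h, alpha = edge_aware_attention(h, edges, type_emb, a)
```

A practical model would add learned weight matrices, several heads, and a classification layer on top. The sketch only shows how the edge type and edge attribute enter the attention logit, so that two neighbors with identical features can still receive different weights.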
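The threshold search of claim 8 amounts to a sweep over candidate thresholds that keeps the one with the best F1 score, followed by filtering at inference time. A minimal sketch is given below. The grid bounds, the fixed step size, and the function names are assumptions, since the claim does not state how the next threshold is chosen.

```python
def f1_score(y_true, y_pred):
    """Binary F1 with the malicious class (label 1) as positive."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom > 0 else 0.0


def search_threshold(probs, labels, theta_min=0.1, theta_max=0.9, step=0.05):
    """Sweep thresholds on [theta_min, theta_max]; return (best theta, best F1)."""
    best_theta, best_f1 = theta_min, -1.0
    n_steps = int(round((theta_max - theta_min) / step))
    for s in range(n_steps + 1):
        theta = theta_min + s * step
        preds = [1 if p > theta else 0 for p in probs]
        f1 = f1_score(labels, preds)
        if f1 > best_f1:  # keep the first threshold reaching the best F1
            best_theta, best_f1 = theta, f1
    return best_theta, best_f1


def filter_context(docs, probs, theta):
    """Keep only documents whose malicious probability does not exceed theta."""
    return [d for d, p in zip(docs, probs) if p <= theta]


probs = [0.05, 0.22, 0.37, 0.61, 0.83, 0.96]
labels = [0, 0, 0, 1, 1, 1]
best_theta, best_f1 = search_threshold(probs, labels)
kept = filter_context(["a", "b", "c", "d", "e", "f"], probs, best_theta)
```

The threshold would normally be tuned on validation data rather than on the documents being filtered, and only the documents returned by `filter_context` would be passed to the large language model as context.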

Description

Knowledge poisoning attack defense method for a retrieval-augmented generation system

Technical Field

The invention relates to the technical fields of artificial intelligence security and graph neural networks, and in particular to a knowledge poisoning attack defense method for a retrieval-augmented generation system.

Background

A retrieval-augmented generation (RAG) system improves factual accuracy by retrieving documents from an external knowledge base to assist a large language model in generating responses. However, an attacker may inject a small number of malicious documents into the knowledge base to manipulate the retrieval results and cause the model to generate false or harmful content; such an attack is referred to as a knowledge poisoning attack. Existing defense methods rely primarily on isolated detection of individual document content, for example judging document authenticity with a text classifier or comparing multiple sources through consistency checks. Such methods ignore the structural relationships between documents and the interaction context between documents and user queries, so they have difficulty capturing the typical anomaly pattern of malicious documents: high retrieval relevance but low semantic consistency. A dynamic defense mechanism that jointly models the multi-source relationships between queries and documents and fuses semantic and structural features is therefore needed to identify knowledge poisoning attacks efficiently and robustly.
Disclosure of Invention

The invention provides a knowledge poisoning attack defense method for a retrieval-augmented generation system. It aims to solve the problem that existing defense methods rely on isolated detection of individual document content and ignore the structural relations between documents and the interaction context between documents and user queries, which makes it difficult to capture malicious documents with high retrieval relevance but low semantic consistency.

The knowledge poisoning attack defense method for a retrieval-augmented generation system is realized by the following steps:

Step S1, collecting labeled retrieval data under known attack scenarios, wherein each sample in the retrieval data comprises a user query and a corresponding retrieved document set;

Step S2, encoding the user query and each document in the retrieved document set with a pre-trained sentence encoding model to obtain a query semantic embedding vector and document semantic embedding vectors;

Step S3, constructing a global heterogeneous graph from the query semantic embedding vector and the document semantic embedding vectors obtained in step S2;

Step S4, calculating a four-dimensional structural feature vector for each document node in the global heterogeneous graph;

Step S5, concatenating the document semantic embedding vector and the four-dimensional structural feature vector to form an initial feature representation of each node, and encoding the global heterogeneous graph with an edge-type-aware graph attention network model to obtain embedded representations of all document nodes, wherein the graph attention network model fuses edge attributes and edge type embeddings, adaptively aggregates document node features through a multi-head attention mechanism, and calculates the probability that each document is malicious;

Step S6, setting a decision threshold, judging each document whose probability is higher than the decision threshold to be a malicious document, and inputting the remaining documents as context into a large language model to generate a final answer, thereby defending against knowledge poisoning attacks in the retrieval-augmented generation system.

The invention has the following beneficial effects:

The knowledge poisoning attack defense method models the RAG retrieval results as a global heterogeneous graph containing cross-query edges and fuses semantic features with fine-grained graph structural features, which significantly improves the ability to detect concealed poisoned samples.

The knowledge poisoning attack defense method supports real-time deployment, is suitable for practical RAG security protection scenarios, improves the defensive capability of the RAG system against knowledge poisoning attacks, and provides safe and reliable external knowledge input for a large language model.

The knowledge poisoning attack defense method was compared with an existing RAG defense method and a trusted RAG detection method on false positive rate (FPR), accuracy (DACC), and false negative rate (FNR), and it clearly outperforms the existing methods.

Drawings

FIG. 1 is a flow chart of the knowledge poisoning attack defense method for a retrieval-augmented generation system according to the present invention.

FIG. 2 is a graph comparing the present invention with two prior art methods on DACC.

FIG. 3 is a graph comparing the F