CN-122021789-A - Knowledge pollution detection and filtering method for retrieval-augmented generation system

Abstract

A knowledge pollution detection and filtering method for a retrieval-augmented generation (RAG) system relates to the technical field of artificial intelligence security and addresses two shortcomings of prior-art RAG knowledge pollution detection methods: they ignore the structural characteristics among documents, and they have weak anti-interference capability. The method combines a detection framework of dual-view feature fusion with robust graph learning. It extracts pre-trained semantic vectors and sparse TF-IDF vocabulary features from the texts, and computes a weighted-fusion similarity matrix to construct graph data that captures both word-level and semantic differences. GATv2 layers and a KAN layer, joined by a residual connection, then mine the nonlinear features of the nodes. The design maintains inference efficiency while ensuring high detection robustness and, combined with a greedy filtering strategy, detects and filters knowledge pollution attacks on an online RAG system, providing high-precision security defense for the RAG system.
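
The graph construction summarized above (weighted fusion of two similarity matrices, then a nearest-neighbor graph over the resulting distances) can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names are hypothetical, cosine similarity is assumed for both views, and the fusion is assumed to be a convex combination governed by a semantic weight `alpha`. The distance D = 1 - S, the bidirectional edges, and the self-loops follow claim 6.

```python
import math


def cosine_similarity_matrix(vectors):
    """Pairwise cosine similarity for a list of equal-length vectors."""
    # A zero vector gets norm 1.0 so that the division below is safe.
    norms = [math.sqrt(sum(x * x for x in v)) or 1.0 for v in vectors]
    n = len(vectors)
    return [
        [
            sum(a * b for a, b in zip(vectors[i], vectors[j])) / (norms[i] * norms[j])
            for j in range(n)
        ]
        for i in range(n)
    ]


def fuse_similarity(sem_sim, lex_sim, alpha):
    """Weighted fusion of two similarity matrices.

    Assumption: a convex combination, with alpha weighting the semantic view.
    """
    return [
        [alpha * s + (1.0 - alpha) * l for s, l in zip(sem_row, lex_row)]
        for sem_row, lex_row in zip(sem_sim, lex_sim)
    ]


def knn_edges(similarity, k):
    """Build a k-nearest-neighbor edge list from the distance D = 1 - S."""
    n = len(similarity)
    edges = set()
    for i in range(n):
        distances = [(1.0 - similarity[i][j], j) for j in range(n) if j != i]
        for _, j in sorted(distances)[:k]:
            edges.add((i, j))
            edges.add((j, i))  # bidirectional edge
        edges.add((i, i))  # self-loop
    return sorted(edges)


# Toy example: documents 0 and 1 are near-duplicates, document 2 is unrelated.
semantic = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
lexical = [[1.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
fused = fuse_similarity(
    cosine_similarity_matrix(semantic),
    cosine_similarity_matrix(lexical),
    alpha=0.5,
)
edge_index = knn_edges(fused, k=1)
```

In a real pipeline the dense rows would come from a sentence embedding model and a TF-IDF vectorizer, and the edge list would be converted to the tensor format expected by the graph library in use.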

Inventors

REN WEIWU
WANG ZHIWEI

Assignees

Changchun University of Science and Technology (长春理工大学)

Dates

Publication Date: 20260512
Application Date: 20260410

Claims (10)

1. A knowledge pollution detection and filtering method for a retrieval-augmented generation (RAG) system, characterized by comprising the following steps: step one, constructing a training data set; step two, constructing semantic view features and vocabulary view features respectively for the training data set of step one, and calculating a similarity matrix; step three, calculating a distance matrix from the similarity matrix of step two, and constructing graph data; step four, constructing a KAGATv-Res model, and outputting the corresponding classification score with the trained KAGATv-Res model, the specific process being as follows: step 4.1, inputting the graph data into the first GATv2 layer of the KAGATv-Res model, calculating the attention coefficients between each node and its neighbor nodes through a dynamic attention mechanism, and aggregating neighbor information to obtain hidden-layer features; step 4.2, inputting the hidden-layer features into a KAN layer, recursively generating Chebyshev polynomial terms, introducing a learnable spectral gating parameter to weight and screen the polynomial terms, calculating a nonlinear output, and concatenating the nonlinear output with the features of the base linear layer in the KAN layer to obtain the output features of the KAN layer; step 4.3, mapping the output features of the KAN layer through a linear projection layer to a dimension consistent with the output of the first GATv2 layer, and performing a residual connection to obtain residual features; and step 4.4, aggregating the residual features through the second GATv2 layer, and calculating the final classification score through a fully connected layer.
2. The knowledge pollution detection and filtering method for a retrieval-augmented generation system according to claim 1, wherein in step one, the training data set comprises original data as positive sample data and malicious documents as negative sample data, and the specific process of constructing the negative sample data is as follows: step 1.1, loading a retriever model for white-box optimization, and configuring the attack parameters of the white-box adversarial text attack; step 1.2, reading an input question set M from the input text, calling an external large language model (LLM) to generate a component I, and concatenating a question text Q with the component I to generate a malicious document P; step 1.3, judging whether the black-box attack is selected; if so, setting the component S = Q, generating the malicious document P = S⊕I, and executing step 1.4; otherwise, loading the white-box attack, performing HotFlip iterative optimization, generating the combined text S⊕I, and executing step 1.4; and step 1.4, decoding and outputting the natural text, calling the external LLM for verification, outputting and storing the result when the verification succeeds, returning to step 1.2, and, when all question texts have been processed, storing the final result as the negative sample data of the training data set.
3. The knowledge pollution detection and filtering method for a retrieval-augmented generation system according to claim 2, wherein in step 1.3, the specific process of performing HotFlip iterative optimization is as follows: taking the component I generated by the external LLM as the component I of the white-box attack, encoding the tokens of the question text Q with the retriever model, and calculating the embedding vector of the question text; constructing an initial token sequence for the component S, using the tokenized sequence of the question text Q as the initial value; inputting the malicious document P = S⊕I into the retriever model, calculating the embedded representation of the malicious document P, calculating the cosine similarity between this representation and the embedding vector of the question text, and taking the cosine similarity as the optimization objective; backpropagating the optimization objective to obtain the gradient of the retriever model's embedding layer at each token position of the component S, and, for the token position currently to be replaced, selecting the top k candidate tokens according to the dot products of the gradient with the word vectors of the retriever model's embedding layer; replacing the token to be replaced in the component S with each of the top k candidate tokens, calculating, for each combined text after replacement, the cosine similarity with the embedding vector of the question text, and updating the corresponding token in the component S with the candidate token that has the highest similarity score, thereby obtaining an optimized token sequence; and obtaining the combined text S⊕I when the similarity score exceeds a set threshold.
4. The knowledge pollution detection and filtering method for a retrieval-augmented generation system according to claim 1, wherein in step two, semantic features are extracted with a Transformer-based sentence embedding model, sparse vocabulary features are extracted with TF-IDF, and the semantic features and the sparse vocabulary features are concatenated to obtain a node feature matrix.
5. The knowledge pollution detection and filtering method for a retrieval-augmented generation system according to claim 4, wherein in step two, a semantic similarity matrix S_sem and a vocabulary similarity matrix S_lex are calculated, and a final similarity matrix S is obtained through weighted fusion, expressed by the following formula: S = α·S_sem + (1 − α)·S_lex; where α is the semantic weight coefficient.
6. The knowledge pollution detection and filtering method for a retrieval-augmented generation system according to claim 5, wherein the distance matrix D = 1 − S is calculated from the final similarity matrix S, the k nearest neighbors of each node are searched, an edge index E comprising bidirectional edges and self-loops is constructed, and the graph data are finally formed.
7. The method of claim 6, wherein in step four, in the training phase of the KAGATv-Res model, a total loss function comprising a classification cross-entropy loss function and a Jacobian regularization term is constructed to smooth the decision boundary, and an automatic differentiation mechanism is used to calculate the Jacobian J = ∂H_KAN/∂H_1 of the KAN layer output with respect to its input, where H_KAN denotes the output features of the KAN layer and H_1 denotes the hidden-layer features output by the first GATv2 layer; the total loss function L is as follows: L = L_CE(ŷ, y) + λ·max(J); where ŷ is the binary classification score, L_CE(·) is the classification cross-entropy loss function, λ is the regularization weight hyperparameter, y is the true label, and max(·) takes the maximum value of the Jacobian.
8. The knowledge pollution detection and filtering method for a retrieval-augmented generation system according to any one of claims 1-7, further comprising performing detection inference with the trained KAGATv-Res model, and cleaning and automatically refilling the retrieved Top-K documents in real time using a greedy filtering strategy.
9. The knowledge pollution detection and filtering method for a retrieval-augmented generation system according to claim 8, characterized in that a preliminary Top-K candidate document set is retrieved for the user query, and the candidate documents are arranged in descending order of retrieval score to construct a candidate pool; an inference subgraph is constructed in real time for the currently selected candidate document subset and input into the trained KAGATv-Res model to obtain the predicted category ŷ of each candidate document; if ŷ = 1, the document is determined to be malicious text; and the greedy filtering strategy is executed, finally realizing the detection and filtering of knowledge pollution attacks in the RAG system.
10. The knowledge pollution detection and filtering method for a retrieval-augmented generation system according to claim 9, wherein the specific process of the greedy filtering strategy is as follows: selecting documents from the candidate pool in sequence; if the current document is detected as a malicious document, removing it from the selected list and recording it as filtered; automatically selecting from the candidate pool the next document not marked as malicious to fill the vacancy, and repeating the detection process until K safe documents have been selected; and concatenating the K safe documents as context and inputting them into the external LLM to generate the final user answer, thereby realizing knowledge pollution detection and filtering in the RAG system.
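
The greedy filtering strategy of claims 9 and 10 can be illustrated with a short sketch. It is only an approximation under stated assumptions: `greedy_filter` and its parameters are hypothetical names, and the trained KAGATv-Res detector (together with the real-time inference subgraph) is replaced by a caller-supplied `is_malicious` callback that receives the documents selected so far and the current candidate.

```python
from typing import Callable, List, Sequence, Tuple


def greedy_filter(
    candidates: Sequence[Tuple[str, float]],
    is_malicious: Callable[[List[str], str], bool],
    k: int,
) -> Tuple[List[str], List[str]]:
    """Pick up to k documents judged safe, walking the pool by descending score.

    candidates: (document, retrieval_score) pairs.
    is_malicious: stand-in for the trained detector (hypothetical interface).
    Returns (selected_documents, filtered_documents).
    """
    # Candidate pool ordered by descending retrieval score.
    pool = [doc for doc, _ in sorted(candidates, key=lambda item: item[1], reverse=True)]
    selected: List[str] = []
    filtered: List[str] = []
    for doc in pool:
        if len(selected) == k:
            break  # K safe documents have been collected
        if is_malicious(selected, doc):
            filtered.append(doc)  # drop it; the next candidate fills the slot
        else:
            selected.append(doc)
    return selected, filtered


# Toy detector: flags any document containing the marker "poison".
def toy_detector(context: List[str], doc: str) -> bool:
    return "poison" in doc


safe_docs, dropped_docs = greedy_filter(
    [("a", 0.9), ("poison-1", 0.8), ("b", 0.7), ("c", 0.6)],
    toy_detector,
    k=2,
)
```

If the pool is exhausted before K safe documents are found, this sketch simply returns fewer than K documents; how the method handles that case is not stated in the claims.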

Description

Knowledge pollution detection and filtering method for retrieval-augmented generation system

Technical Field

The invention relates to the technical field of artificial intelligence security, in particular to a defense technique against knowledge base pollution (Knowledge Poisoning) for securing a retrieval-augmented generation (Retrieval-Augmented Generation, RAG) system. Specifically, the invention discloses a knowledge pollution detection and filtering method based on a KAN-enhanced graph attention network (KAGATv-Res), which automatically identifies and filters malicious texts injected by PoisonedRAG-type attackers (including black-box text generation and white-box attacks) among the Top-k documents retrieved by a RAG system, thereby improving the robustness and security of question answering by the downstream large model (external large language model, LLM).

Background

With the wide application of the RAG system architecture in scenarios such as question answering, search, and knowledge-enhanced generation, attackers use knowledge pollution to inject misleading text (malicious text) into the retrieval corpus or index, and induce the LLM to generate wrong answers through carefully constructed prompts or contexts. Existing defenses fall mainly into two groups. Blacklist filtering based on keywords or patterns is easily evaded by adversarial samples and incurs a large recall loss. Classifiers that judge each document in isolation ignore the neighbor relationships and group pattern (collective anomaly) information among documents, and are therefore prone to false positives and missed detections. A knowledge pollution detection and filtering method for retrieval-augmented generation systems is therefore needed that uses both the semantic representation of each document and the relationships between documents (local neighbors and cross-class boundaries) to capture group patterns and structural anomalies.

Disclosure of Invention

The invention provides a knowledge pollution detection and filtering method for a retrieval-augmented generation system, aiming to solve the problems that prior-art RAG knowledge pollution detection methods judge each document independently and ignore semantic neighbor information, which easily causes misjudgment, and that their anti-interference capability is weak.

The knowledge pollution detection and filtering method for a retrieval-augmented generation system is realized by the following steps:

Step one, constructing a training data set;
Step two, constructing semantic view features and vocabulary view features respectively for the training data set of step one, and calculating a similarity matrix;
Step three, calculating a distance matrix from the similarity matrix of step two, and constructing graph data;
Step four, constructing a KAGATv-Res model, and outputting the corresponding classification score with the trained KAGATv-Res model, the specific process being as follows:
Step 4.1, inputting the graph data into the first GATv2 layer of the KAGATv-Res model, calculating the attention coefficients between each node and its neighbor nodes through a dynamic attention mechanism, and aggregating neighbor information to obtain hidden-layer features;
Step 4.2, inputting the hidden-layer features into a KAN layer, recursively generating Chebyshev polynomial terms, introducing a learnable spectral gating parameter to weight and screen the polynomial terms, calculating a nonlinear output, and concatenating the nonlinear output with the features of the base linear layer in the KAN layer to obtain the output features of the KAN layer;
Step 4.3, mapping the output features of the KAN layer through a linear projection layer to a dimension consistent with the output of the first GATv2 layer, and performing a residual connection to obtain residual features;
Step 4.4, aggregating the residual features through the second GATv2 layer, and calculating the final classification score through a fully connected layer.

The beneficial effects of the invention are as follows: the method builds a robust detection framework that combines a Chebyshev KAN layer with a spectral gating mechanism (RobustChebyLayer, KAN) and a residual graph attention network (Graph Attention Network v2, GATv2). The KAN-enhanced graph attention network constructs a document relationship graph by fusing semantic vectors with TF-IDF vocabulary features, strengthens nonlinear feature extraction with the spectrally gated Chebyshev polynomial layer, and introduces Jacobian regularization during training to improve model robustness. Finally, the retrieved Top-K documents are dynamically cleaned by combining a greedy filtering strategy, potential maliciously injected text is effectively identified and removed, and the generation safet