CN-121980059-A - RAG recall method and device based on encryption space and related equipment

Abstract

The embodiment of the invention discloses an RAG recall method, an RAG recall device, and related equipment based on an encryption space. The method comprises: obtaining a user ID from the current user request, and constructing a corresponding encryption space and encryption key according to the user ID; obtaining the sensitive information fields of the user, and encrypting them based on the encryption space and the encryption key to generate encrypted fields; inputting the encrypted fields into a preset embedding model to obtain a base vector; applying the vector perturbation function corresponding to the encryption space to the base vector to obtain a final encrypted vector, and storing the final encrypted vector in the index space of the current user; obtaining a query request of the current user, and encrypting the query request with the encryption key to obtain an encrypted query vector; and performing vector matching and plaintext recall in the index space based on the encrypted query vector to obtain a final retrieval result. The method can still maintain high semantic retrieval and plaintext recall performance while the information is encrypted.

Inventors

  • LAN FENG
  • LIU YU

Assignees

  • 北京泰信天成科技有限公司

Dates

Publication Date
2026-05-05
Application Date
2026-01-30

Claims (6)

  1. An encryption space-based RAG recall method, comprising: receiving a current user request, parsing the user information in the request to obtain a user ID, and constructing a corresponding encryption space and encryption key based on the user ID, wherein the encryption space is a logically isolated vector processing and storage space that comprises at least an independent vector perturbation parameter set, a vector index structure, and space identification information, and different encryption spaces are isolated from one another at the parameter layer, the index layer, and the access layer; acquiring a sensitive information field of the user, and encrypting the sensitive information field based on the encryption space and the encryption key to generate an encrypted field; inputting the encrypted field into a preset embedding model for embedding to obtain a base vector; invoking a vector perturbation function corresponding to the encryption space to process the base vector to obtain a final encrypted vector, and storing the final encrypted vector in the index space of the current user; acquiring a query request of the current user, and encrypting the query request with the encryption key to obtain an encrypted query vector; and performing vector matching and plaintext recall in the index space based on the encrypted query vector to obtain a final retrieval result.
  2. The encryption space-based RAG recall method of claim 1, wherein acquiring the sensitive information field of the user comprises: acquiring text information accessed by the user, and performing information identification on the text information to obtain the sensitive information field.
  3. The encryption space-based RAG recall method of claim 1, wherein invoking the vector perturbation function corresponding to the encryption space to process the base vector to obtain a final encrypted vector comprises: calculating the final encrypted vector according to a formula (the formula and its symbols are missing from this text) whose terms denote, in the order listed: the vector perturbation function; a random orthogonal matrix of the encryption space; the space offset; the embedding process; and the i-th base vector.
  4. An encryption space-based RAG recall device, comprising: an analysis module, configured to receive a current user request, parse the user information in the request to obtain a user ID, and construct a corresponding encryption space and encryption key based on the user ID, wherein the encryption space is a logically isolated vector processing and storage space that comprises at least an independent vector perturbation parameter set, a vector index structure, and space identification information, and different encryption spaces are isolated from one another at the parameter layer, the index layer, and the access layer; an encryption module, configured to acquire a sensitive information field of the user and encrypt the sensitive information field based on the encryption space and the encryption key to generate an encrypted field; an embedding processing module, configured to input the encrypted field into a preset embedding model for embedding to obtain a base vector; a storage module, configured to invoke a vector perturbation function corresponding to the encryption space to process the base vector to obtain a final encrypted vector, and to store the final encrypted vector in the index space of the current user; an acquisition module, configured to acquire a query request of the current user and encrypt the query request with the encryption key to obtain an encrypted query vector; and a matching module, configured to perform vector matching and plaintext recall in the index space based on the encrypted query vector to obtain a final retrieval result.
  5. A computer device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the encryption space-based RAG recall method of any one of claims 1 to 3 when executing the computer program.
  6. A computer-readable storage medium, wherein the computer-readable storage medium stores a computer program which, when executed by a processor, causes the processor to perform the encryption space-based RAG recall method of any one of claims 1 to 3.
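
The perturbation step of claim 3 can be illustrated with a small sketch. The formula's symbols did not survive in this text, so the affine form f(v) = R v + b used below is an assumption inferred from the listed terms (a random orthogonal matrix and a space offset). The helper names, the seed strings, and the 4-dimensional vectors are hypothetical. Because R is orthogonal and the same offset is added to every vector, Euclidean distances between vectors inside one space are unchanged, which is presumably what lets vector matching continue to work after perturbation.

```python
import math
import random


def random_orthogonal_matrix(dim, seed):
    """Seeded Gaussian rows, orthonormalized with modified Gram-Schmidt."""
    rng = random.Random(seed)
    rows = []
    while len(rows) < dim:
        v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        for r in rows:  # remove the component along each accepted row
            dot = sum(a * b for a, b in zip(v, r))
            v = [a - dot * b for a, b in zip(v, r)]
        norm = math.sqrt(sum(a * a for a in v))
        if norm > 1e-9:  # skip a degenerate draw
            rows.append([a / norm for a in v])
    return rows


def perturb(vec, matrix, offset):
    """Assumed affine perturbation f(v) = R v + b (not confirmed by the source)."""
    return [sum(m * x for m, x in zip(row, vec)) + b
            for row, b in zip(matrix, offset)]


def euclidean(p, q):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(p, q)))


DIM = 4
# Hypothetical per-space parameters, derived here from fixed seed strings.
R = random_orthogonal_matrix(DIM, seed="space-user-42")
offset_rng = random.Random("offset-user-42")
b = [offset_rng.uniform(-1.0, 1.0) for _ in range(DIM)]

v1 = [0.2, -0.5, 0.1, 0.7]   # two made-up base vectors
v2 = [0.3, -0.4, 0.0, 0.6]

before = euclidean(v1, v2)
after = euclidean(perturb(v1, R, b), perturb(v2, R, b))
print(f"distance before: {before:.6f}, after: {after:.6f}")
```

Seeding the matrix and the offset per space means two spaces map the same base vector to different final vectors, while distances within each space are preserved up to floating-point error.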

Description

RAG recall method and device based on encryption space and related equipment

Technical Field

The embodiment of the invention relates to the technical field of data processing, and in particular to an RAG recall method and device based on an encryption space, and related equipment.

Background

With the continuing development of large-scale pre-trained language models, retrieval-augmented generation (RAG), which combines external knowledge retrieval with large-model generation, has markedly improved model accuracy and controllability in scenarios such as domain-specific question answering and knowledge-assisted decision making. In actual deployment, however, RAG systems often need to access proprietary data sources containing large amounts of sensitive information, such as medical records, financial account information, internal government files, and business materials.

Existing RAG systems mostly manage data with a unified vector space and a unified encryption key system, and generally achieve data isolation through logical mechanisms such as access control lists (ACLs), tenant identifiers, or role permissions. Such schemes take effect only at the application layer: once the underlying storage medium, the vector index, or the system master encryption key is leaked, an attacker can still infer sensitive semantic information at the vector layer, or even re-identify data across users, which poses a serious security risk. In addition, directly encrypting the vectors in full destroys the distance relationships in the vector space, so semantic similarity calculation fails and retrieval recall is severely degraded. How to achieve cryptographic-level data isolation in multi-user scenarios without significantly reducing semantic retrieval performance has therefore become an important technical problem for current RAG technology.

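
The Background's point that directly encrypting vectors destroys their distance relationships can be seen in a small experiment. The sketch below is an illustration only: the SHA-256 counter-mode keystream is a stand-in for a real stream cipher, and the key, nonces, vectors, and function names are all hypothetical. Three vectors are quantized to bytes and encrypted with per-vector nonces; the plaintext distances show that `a` is close to `b` and far from `c`, whereas the ciphertext distances carry no such guarantee.

```python
import hashlib
import math


def keystream(key, nonce, length):
    """SHA-256 in counter mode as a stand-in stream cipher (illustration only)."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(4, "big")).digest()
        counter += 1
    return out[:length]


def xor_bytes(data, key, nonce):
    """XOR data with the keystream; applying it twice restores the input."""
    return bytes(d ^ k for d, k in zip(data, keystream(key, nonce, len(data))))


def quantize(vec):
    """Map each component in [-1, 1] to one byte."""
    return bytes(round((x + 1.0) * 127.5) for x in vec)


def euclid(p, q):
    """Euclidean distance between two equal-length byte strings or vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(p, q)))


key = b"shared-index-key"            # hypothetical key
a = [0.9, 0.1, -0.3, 0.5]
b = [0.8, 0.2, -0.2, 0.4]            # close to a
c = [-0.7, -0.9, 0.6, -0.1]          # far from a

qa, qb, qc = quantize(a), quantize(b), quantize(c)
ea = xor_bytes(qa, key, b"nonce-a")
eb = xor_bytes(qb, key, b"nonce-b")
ec = xor_bytes(qc, key, b"nonce-c")

print("plaintext  d(a,b), d(a,c):", euclid(qa, qb), euclid(qa, qc))
print("ciphertext d(a,b), d(a,c):", euclid(ea, eb), euclid(ea, ec))
```

Decrypting with the same key and nonce recovers the quantized vector exactly, so the data is not lost; it is only the geometry between ciphertexts that stops reflecting the geometry between plaintexts.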
Disclosure of Invention

The embodiment of the invention provides an RAG recall method, an RAG recall device, and related equipment based on an encryption space, and aims to solve the technical problem that conventional techniques struggle to prevent sensitive data leakage while keeping semantic retrieval usable.

In a first aspect, an embodiment of the present invention provides an encryption space-based RAG recall method, which includes: receiving a current user request, parsing the user information in the request to obtain a user ID, and constructing a corresponding encryption space and encryption key based on the user ID, wherein the encryption space is a logically isolated vector processing and storage space that comprises at least an independent vector perturbation parameter set, a vector index structure, and space identification information, and different encryption spaces are isolated from one another at the parameter layer, the index layer, and the access layer; acquiring a sensitive information field of the user, and encrypting the sensitive information field based on the encryption space and the encryption key to generate an encrypted field; inputting the encrypted field into a preset embedding model for embedding to obtain a base vector; invoking a vector perturbation function corresponding to the encryption space to process the base vector to obtain a final encrypted vector, and storing the final encrypted vector in the index space of the current user; acquiring a query request of the current user, and encrypting the query request with the encryption key to obtain an encrypted query vector; and performing vector matching and plaintext recall in the index space based on the encrypted query vector to obtain a final retrieval result.

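
The steps of the first aspect can be sketched end to end as follows. This is a minimal illustration under several assumptions that the text does not confirm: per-user material is derived from the user ID with HMAC-SHA256 and a hypothetical `MASTER_SECRET`; "encryption" of a field is modelled as deterministic keyed token pseudonymization so that equal tokens still match; the embedding model is a toy hashed bag-of-tokens; the perturbation is a signed permutation (a simple orthogonal matrix) plus an offset; and the query is assumed to pass through the same encrypt, embed, and perturb path as stored fields. All class and function names are hypothetical.

```python
import hashlib
import hmac
import math
import random

MASTER_SECRET = b"demo-only-secret"  # hypothetical; not from the patent
DIM = 256                            # toy embedding size (assumption)


def derive(user_id, label):
    """Per-user material from the user ID via HMAC-SHA256 (an assumed KDF)."""
    msg = f"{label}:{user_id}".encode()
    return hmac.new(MASTER_SECRET, msg, hashlib.sha256).digest()


def euclid(p, q):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(p, q)))


class EncryptionSpace:
    """Isolated per-user space: key, perturbation parameters, index, identifier."""

    def __init__(self, user_id):
        self.space_id = derive(user_id, "space-id").hex()[:16]
        self.key = derive(user_id, "field-key")
        rng = random.Random(derive(user_id, "perturbation"))
        # Stand-in perturbation parameters: signed permutation plus offset.
        self.perm = list(range(DIM))
        rng.shuffle(self.perm)
        self.signs = [rng.choice((-1.0, 1.0)) for _ in range(DIM)]
        self.offset = [rng.uniform(-1.0, 1.0) for _ in range(DIM)]
        # (final encrypted vector, plaintext) pairs; a real deployment would
        # also protect the stored text.
        self.index = []

    def encrypt_field(self, text):
        """Deterministic keyed token pseudonymization (stand-in for encryption)."""
        return [hmac.new(self.key, tok.encode(), hashlib.sha256).hexdigest()
                for tok in text.lower().split()]

    def embed(self, tokens):
        """Toy hashed bag-of-tokens embedding, L2-normalized."""
        vec = [0.0] * DIM
        for tok in tokens:
            vec[hashlib.sha256(tok.encode()).digest()[0] % DIM] += 1.0
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0
        return [x / norm for x in vec]

    def perturb(self, vec):
        return [self.signs[i] * vec[self.perm[i]] + self.offset[i]
                for i in range(DIM)]

    def to_vector(self, text):
        return self.perturb(self.embed(self.encrypt_field(text)))

    def store(self, text):
        self.index.append((self.to_vector(text), text))

    def recall(self, query, k=1):
        q = self.to_vector(query)
        ranked = sorted(self.index, key=lambda item: euclid(q, item[0]))
        return [plaintext for _, plaintext in ranked[:k]]


spaces = {}  # user ID -> EncryptionSpace


def space_for(user_id):
    """Return the user's space, constructing it on first use."""
    if user_id not in spaces:
        spaces[user_id] = EncryptionSpace(user_id)
    return spaces[user_id]


alice = space_for("alice")
alice.store("account 6222 balance overdue")
alice.store("blood pressure reading high")
print(alice.recall("account balance"))
```

Each user's index, key, and perturbation parameters live in a separate object, so a query issued in one space is never compared against vectors stored in another, and the same field text yields different stored vectors in different spaces.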
In a second aspect, an embodiment of the present invention provides an encryption space-based RAG recall device, including: an analysis module, configured to receive a current user request, parse the user information in the request to obtain a user ID, and construct a corresponding encryption space and encryption key based on the user ID, wherein the encryption space is a logically isolated vector processing and storage space that comprises at least an independent vector perturbation parameter set, a vector index structure, and space identification information, and different encryption spaces are isolated from one another at the parameter layer, the index layer, and the access layer; an encryption module, configured to acquire a sensitive information field of the user and encrypt the sensitive information field based on the encryption space and the encryption key to generate an encrypted field; an embedding processing module, configured to input the encrypted field into a preset embedding model for embedding to obtain a base vector; a storage module, configured to invoke a vector perturbation function co