CN-122019742-A - Power system knowledge retrieval method and platform based on RAG framework
Abstract
The invention provides a power system knowledge retrieval method and platform based on a RAG (retrieval-augmented generation) framework, and relates to the technical field of knowledge retrieval. The method comprises: constructing a knowledge base from power system files based on the RAG framework to generate a vector database; reconstructing the user's question into a business adaptation expression; inputting the business adaptation expression into a word embedding model to convert it into a 1024-dimensional question vector; performing child-block vector similarity matching to screen and recall K candidate block metadata; associating the K parent block IDs to which the candidates belong; retrieving the K parent-block 1024-dimensional vectors and, by comparison with the question vector, screening and outputting M candidate block metadata; and returning the complete chapter content and traceability information. The invention addresses the technical problem that prior-art vectorization uses a general-purpose dimension and does not optimize the vector dimension for the text complexity of discipline inspection files, so that the vectors represent business information inadequately and a traditional knowledge base struggles to meet the efficient retrieval requirements of power system discipline inspection work.
Inventors
- WANG CHUNHUA
- XIONG ZHONGHAO
- ZHI WEI
- GAO PENGSONG
- HUANG YANJUN
- LI ZHIJIN
- MA BOYANG
- SHU CHANG
- ZHANG HUI
- LI YU
- BAI QUANSHENG
Assignees
- 中国大唐集团数字科技有限公司
Dates
- Publication Date
- 2026-05-12
- Application Date
- 2025-12-09
Claims (8)
- 1. A power system knowledge retrieval method based on a RAG framework, characterized by comprising the following steps: performing knowledge base construction processing on power system files based on the RAG framework to generate a vector database; after receiving a user's input question, performing business question reconstruction to generate a business adaptation expression; inputting the business adaptation expression into a word embedding model and converting it into a 1024-dimensional question vector; inputting the 1024-dimensional question vector into the vector database and performing child-block vector similarity matching to screen and recall K candidate block metadata; associating, according to the K parent-child relationships of the K candidate block metadata, the K parent block IDs to which the K candidate block metadata belong; after retrieving K parent-block 1024-dimensional vectors according to the K parent block IDs, screening and outputting M candidate block metadata from the K candidate block metadata by comparison with the 1024-dimensional question vector; and returning complete chapter content and traceability information according to the M candidate block metadata.
- 2. The RAG framework based power system knowledge retrieval method according to claim 1, wherein performing knowledge base construction processing on the power system files based on the RAG framework to generate a vector database comprises: performing multi-granularity structuring processing on the power system files to generate a structured data file; performing semantic-aware dynamic hierarchical blocking on the structured data file to obtain a parent-child two-level block metadata set; inputting the parent-child two-level block metadata set into a pre-trained word embedding model for vectorization, and outputting a parent-child two-level block 1024-dimensional vector set; and, after mapping and binding the parent-child two-level block metadata set to the parent-child two-level block 1024-dimensional vector set, storing the two sets in association according to the parent-child relationships to generate the vector database.
- 3. The RAG framework based power system knowledge retrieval method according to claim 2, wherein each piece of block metadata is labeled with a unique ID, a parent-child relationship, a file source and a token count.
- 4. The RAG framework based power system knowledge retrieval method according to claim 3, wherein inputting the 1024-dimensional question vector into the vector database and performing child-block vector similarity matching to screen and recall K candidate block metadata comprises: inputting the 1024-dimensional question vector into the vector database; traversing the child-block vector set within the parent-child two-level block 1024-dimensional vector set and performing similarity matching against the 1024-dimensional question vector to obtain a child-block similarity set; and, after ordering the child-block similarity set, recalling the K candidate block metadata corresponding to the top-K block similarities.
- 5. The RAG framework based power system knowledge retrieval method according to claim 1, wherein performing business question reconstruction after receiving the user's input question to generate a business adaptation expression comprises: performing semantic segmentation on the user's input question to obtain a semantic unit sequence; with intent consistency as a constraint, performing standardized term replacement on the semantic unit sequence to generate a standardized term sequence; and structurally normalizing the standardized term sequence by supplementing business context, and outputting the business adaptation expression.
- 6. The RAG framework based power system knowledge retrieval method according to claim 2, wherein performing semantic-aware dynamic hierarchical blocking on the structured data file to obtain a parent-child two-level block metadata set comprises: dividing the structured data file into a plurality of parent-level block metadata, using the native chapter structure of the structured data file as the dividing unit; traversing the parent-level block metadata and, with a preset length threshold as the segmentation trigger condition, performing child-block segmentation to obtain a plurality of candidate child-block sequences; performing sentence-boundary symbol detection on the plurality of candidate child-block sequences and then adjusting and correcting the segmentation positions to obtain a plurality of corrected child-block sequences; and designing overlapping regions between adjacent child blocks in the plurality of corrected child-block sequences to obtain a plurality of groups of child-level block metadata; wherein the plurality of parent-level block metadata form a parent-level block metadata set, the plurality of groups of child-level block metadata form a child-level block metadata set, and the parent-level block metadata set and the child-level block metadata set together form the parent-child two-level block metadata set.
- 7. The RAG framework based power system knowledge retrieval method according to claim 2, wherein inputting the parent-child two-level block metadata set into a pre-trained word embedding model for vectorization and outputting a parent-child two-level block 1024-dimensional vector set comprises: inputting a first linear character string of first parent-level block metadata into the word embedding model; and executing, in the word embedding model: S1, extracting first semantic features through a Transformer encoder layer of the word embedding model; S2, generating a first sentence vector through a mean pooling layer of the word embedding model based on the first semantic features; and S3, outputting a 1024-dimensional vector of the first parent block through a 1024-dimensional fully connected layer of the word embedding model based on the first sentence vector.
- 8. A RAG framework based power system knowledge retrieval platform for implementing the RAG framework based power system knowledge retrieval method of any one of claims 1-7, the platform comprising: a knowledge base construction processing module, used for performing knowledge base construction processing on power system files based on the RAG framework to generate a vector database; a business question reconstruction module, used for performing business question reconstruction after receiving a user's input question to generate a business adaptation expression; a vector conversion module, used for inputting the business adaptation expression into a word embedding model and converting it into a 1024-dimensional question vector; a similarity matching module, used for inputting the 1024-dimensional question vector into the vector database and performing child-block vector similarity matching to screen and recall K candidate block metadata; an ID association module, used for associating, according to the K parent-child relationships of the K candidate block metadata, the K parent block IDs to which the K candidate block metadata belong; a metadata screening module, used for screening and outputting M candidate block metadata from the K candidate block metadata by comparison with the 1024-dimensional question vector after retrieving K parent-block 1024-dimensional vectors according to the K parent block IDs; and a traceability information returning module, used for returning the complete chapter content and traceability information according to the M candidate block metadata.
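As an illustration of the dynamic hierarchical blocking recited in claim 6 and the per-block metadata recited in claim 3, the following is a minimal Python sketch and not the claimed implementation. The names `Block`, `split_children` and `build_blocks`, the set of sentence-boundary symbols, the default threshold and overlap values, and the use of character counts in place of token counts are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Assumed sentence-boundary symbols (Chinese and Western punctuation).
SENTENCE_END = "。！？；.!?;"


@dataclass
class Block:
    block_id: str             # unique ID
    parent_id: Optional[str]  # None marks a parent-level block
    source: str               # file source
    text: str
    token_count: int          # approximated by character count in this sketch


def split_children(text: str, max_len: int, overlap: int) -> List[str]:
    """Split one parent block into overlapping child blocks."""
    if not 0 <= overlap < max_len:
        raise ValueError("overlap must satisfy 0 <= overlap < max_len")
    if len(text) <= max_len:      # threshold not exceeded: no split is triggered
        return [text]
    children, start = [], 0
    while True:
        end = min(start + max_len, len(text))
        if end < len(text):
            # Move the tentative cut back to the last sentence boundary, if any.
            for i in range(end - 1, start + overlap, -1):
                if text[i] in SENTENCE_END:
                    end = i + 1
                    break
        children.append(text[start:end])
        if end == len(text):
            return children
        start = end - overlap     # adjacent children share `overlap` characters


def build_blocks(chapters: List[Tuple[str, str]], source: str,
                 max_len: int = 200, overlap: int = 20) -> List[Block]:
    """Treat each (title, body) chapter as a parent block and split it into children."""
    blocks: List[Block] = []
    for p, (title, body) in enumerate(chapters):
        parent_id = f"{source}#p{p}"
        parent_text = f"{title}\n{body}"
        blocks.append(Block(parent_id, None, source, parent_text, len(parent_text)))
        for c, piece in enumerate(split_children(body, max_len, overlap)):
            blocks.append(Block(f"{parent_id}.c{c}", parent_id, source, piece, len(piece)))
    return blocks
```

Each child block keeps the ID of the chapter it came from, so a later retrieval step can map a matched child back to its full chapter.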
Description
Power system knowledge retrieval method and platform based on RAG framework
Technical Field
The invention relates to the technical field of knowledge retrieval, and in particular to a power system knowledge retrieval method and platform based on a RAG framework.
Background
As artificial intelligence is applied more deeply in the power industry, discipline inspection work increasingly demands the ability to quickly locate regulations and systems and accurately match business questions. Current RAG-based knowledge base construction methods have three shortcomings. First, file processing is not targeted: files are not structurally decomposed according to the chapter characteristics of discipline inspection files, and information is easily lost when tables are converted. Second, the blocking scheme is rigid: fixed-length partitioning easily cuts through the semantic logic of discipline inspection clauses, and no parent-child block association is established, so context is lost during retrieval. Third, user questions are poorly adapted to the business: question wording is not optimized for power discipline inspection terminology, and retrieval stops at a single block level, so accuracy is insufficient. In addition, existing vectorization mostly uses a general-purpose dimension, such as 768, and does not optimize the vector dimension for the text complexity of discipline inspection files, so the vectors represent business information inadequately. These problems make it difficult for conventional knowledge bases to meet the efficient retrieval requirements of power system discipline inspection work.
Disclosure of Invention
The application provides a power system knowledge retrieval method and platform based on a RAG framework, and aims to solve the technical problem that prior-art vectorization uses a general-purpose dimension and does not optimize the vector dimension for the text complexity of discipline inspection files, so that the vectors represent business information inadequately and a traditional knowledge base struggles to meet the efficient retrieval requirements of power system discipline inspection work.
In a first aspect, the application provides a power system knowledge retrieval method based on a RAG framework, comprising: performing knowledge base construction processing on power system files based on the RAG framework to generate a vector database; after receiving a user's input question, performing business question reconstruction to generate a business adaptation expression; inputting the business adaptation expression into a word embedding model and converting it into a 1024-dimensional question vector; inputting the 1024-dimensional question vector into the vector database and performing child-block vector similarity matching to screen and recall K candidate block metadata; associating, according to the K parent-child relationships of the K candidate block metadata, the K parent block IDs to which the K candidate block metadata belong; after retrieving K parent-block 1024-dimensional vectors according to the K parent block IDs, screening and outputting M candidate block metadata from the K candidate block metadata by comparison with the 1024-dimensional question vector; and returning complete chapter content and traceability information according to the M candidate block metadata.
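The two-stage retrieval of the first aspect (child-block recall of K candidates, followed by parent-block screening down to M) can be sketched as follows. This is a hedged illustration rather than the method as filed: the source does not name the similarity measure or the exact screening criterion, so cosine similarity and ranking candidates by the similarity between the question vector and each candidate's parent vector are assumptions, as are the names `cosine` and `retrieve`. The sketch works with vectors of any dimension, not only 1024.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question_vec: Vector,
             child_vecs: Dict[str, Vector],
             parent_of: Dict[str, str],
             parent_vecs: Dict[str, Vector],
             k: int, m: int) -> List[Tuple[str, str]]:
    """Return up to m (child_id, parent_id) pairs for the question vector."""
    # Stage 1: match the question against every child-block vector, keep the top K.
    ranked = sorted(child_vecs,
                    key=lambda cid: cosine(question_vec, child_vecs[cid]),
                    reverse=True)
    candidates = ranked[:k]

    # Stage 2: follow each candidate's parent-child relationship to its parent ID,
    # fetch the parent vector, and score it against the question vector.
    scored = [(cosine(question_vec, parent_vecs[parent_of[cid]]), cid)
              for cid in candidates]

    # Keep the M candidates whose parent blocks match best. Python's sort is
    # stable, so ties keep their stage-1 order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [(cid, parent_of[cid]) for _, cid in scored[:m]]
```

A caller would then use each returned parent ID to load the complete chapter text and the file source recorded in the block metadata, which together give the chapter content and traceability information described above.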
In a second aspect, the application provides a power system knowledge retrieval platform based on a RAG framework, the platform being used to implement the above RAG framework based power system knowledge retrieval method and comprising a knowledge base construction processing module, a business question reconstruction module, a vector conversion module, a similarity matching module, an ID association module and a metadata screening module. The knowledge base construction processing module is used for performing knowledge base construction processing on power system files based on the RAG framework to generate a vector database. The business question reconstruction module is used for performing business question reconstruction after receiving a user's input question to generate a business adaptation expression. The vector conversion module is used for inputting the business adaptation expression into a word embedding model and converting it into a 1024-dimensional question vector. The similarity matching module is used for inputting the 1024-dimensional question vector into the vector database and performing child-block vector similarity matching to screen and recall K candidate block metadata. The ID association module is used for associating, according to the K parent-child relationships of the K candidate block metadata, the K parent block IDs to which the K candidate block metadata belong. The metadata screening module is used for calling the K parent block 1024-dimensional vectors according to the K parent block IDs, and the M candidate block metadata is used for returning candidate block metadata f