CN-121980039-A - RAG and preference alignment collaborative optimization method and system for electric power field

CN 121980039 A

Abstract

The invention relates to the technical field of artificial intelligence and discloses a RAG and preference alignment collaborative optimization method and system for the electric power field. The method at least comprises: performing text chunking on a power business decision knowledge base, and processing each obtained text block through vectorization representation and retrieval-augmented generation (RAG) to obtain a power business adaptation text; based on the power business adaptation text, combining a pre-invoked first large language model with retrieval-augmented generation to obtain a plurality of power business decision questions, together with an expert decision text and a common decision text corresponding to each power business decision question; and, based on a preference data set, fine-tuning a pre-trained second large language model using a weighted fusion of direct preference optimization and probability ratio preference optimization to obtain a target large language model. The invention provides a reliable intelligent modeling tool for complex decisions in power systems, and also provides a reusable optimization paradigm for the practical deployment of reinforcement learning and models in professional fields.

Inventors

FENG JUN
XU HAO
YANG SHAOJIE
LI ZHONGXIU
WANG YIDAN
GUO ZIXUAN
HAN MIAOLIN
LI HUIBIN
SONG HAOMIN
CHEN ZEYU

Assignees

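State Grid Zhejiang Electric Power Co., Ltd. Information and Telecommunication Branch (国网浙江省电力有限公司信息通信分公司)
State Grid Zhejiang Electric Power Co., Ltd. (国网浙江省电力有限公司)
Xi'an Jiaotong University (西安交通大学)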

Dates

Publication Date: 20260505
Application Date: 20260409

Claims (10)

1. A RAG and preference alignment collaborative optimization method for the electric power field, characterized by comprising the following steps: acquiring a plurality of documents related to power business decisions to obtain a power business decision knowledge base; performing text chunking on the power business decision knowledge base, and processing each obtained text block through vectorization representation and retrieval-augmented generation to obtain a power business adaptation text; based on the power business adaptation text, combining a pre-invoked first large language model with retrieval-augmented generation to generate a plurality of power business decision questions, together with an expert decision text and a common decision text corresponding to each power business decision question; associating and combining each power business decision question with its corresponding expert decision text and common decision text according to a triple structure to obtain a preference data set; and, based on the preference data set, fine-tuning a pre-trained second large language model using a weighted fusion of direct preference optimization and probability ratio preference optimization to obtain a target large language model.
2. The RAG and preference alignment collaborative optimization method for the electric power field according to claim 1, wherein acquiring a plurality of documents related to power business decisions to obtain a power business decision knowledge base comprises: collecting a plurality of PDF documents related to power business decisions from a plurality of public data sources; converting the format of each PDF document to obtain a corresponding Markdown document; and performing data cleaning on each Markdown document, and combining the cleaned Markdown documents to form the power business decision knowledge base.
3. The RAG and preference alignment collaborative optimization method for the electric power field according to claim 1, wherein performing text chunking on the power business decision knowledge base and processing each obtained text block through vectorization representation and retrieval-augmented generation to obtain a power business adaptation text comprises: performing text chunking on the power business decision knowledge base according to document length to obtain a plurality of text blocks; vectorizing each text block with a preset text embedding model to obtain a vector representation corresponding to each text block, and combining the vector representations to form a power business vector knowledge base; and generating the power business adaptation text through retrieval-augmented generation based on the power business vector knowledge base.
4. The RAG and preference alignment collaborative optimization method for the electric power field according to claim 1, wherein generating, based on the power business adaptation text and in combination with a pre-invoked first large language model and retrieval-augmented generation, a plurality of power business decision questions together with an expert decision text and a common decision text corresponding to each power business decision question comprises: based on the power business adaptation text, generating a plurality of power business decision questions by combining a preset question generation template with the pre-invoked first large language model; for each power business decision question, generating the corresponding expert decision text by combining the first large language model with retrieval-augmented generation; and, for each power business decision question, generating the corresponding common decision text using the second large language model in its initial state.
5. The RAG and preference alignment collaborative optimization method for the electric power field according to claim 1, wherein fine-tuning the pre-trained second large language model based on the preference data set using a weighted fusion of direct preference optimization and probability ratio preference optimization to obtain a target large language model comprises: dividing the preference data set into a training set, a test set, and a validation set according to a preset ratio; based on the training set, fine-tuning the pre-trained second large language model using a dynamic weighted fusion of direct preference optimization and probability ratio preference optimization to obtain an initial large language model; adjusting the hyperparameters of the initial large language model in real time using the validation set to obtain an intermediate large language model; and evaluating the performance of the intermediate large language model using the test set to obtain the target large language model.
6. The RAG and preference alignment collaborative optimization method for the electric power field according to claim 5, wherein fine-tuning the pre-trained second large language model based on the training set using a dynamic weighted fusion of direct preference optimization and probability ratio preference optimization to obtain an initial large language model comprises: based on the training set, fine-tuning the pre-trained second large language model using a fusion loss function formed by the dynamically weighted sum of the direct preference optimization loss and the probability ratio preference optimization loss to obtain the initial large language model.
7. The RAG and preference alignment collaborative optimization method for the electric power field according to claim 6, wherein the dynamic weight factor of the direct preference optimization loss is determined by a base weight factor adjusted in segments according to the number of training steps, a model performance deviation during training, and a value range constraint.
8. The RAG and preference alignment collaborative optimization method for the electric power field according to claim 7, wherein the adjustment strategy of the base weight factor comprises: dividing the training steps into a warm-up period, a rising period, and a stable period according to preset phase thresholds; when training is in the warm-up period, gradually increasing the base weight factor from a warm-up initial weight to a warm-up end weight; when training is in the rising period, gradually increasing the base weight factor from the warm-up end weight to a weight peak; and when training is in the stable period, gradually decreasing the base weight factor from the weight peak to a balance weight at the end of training.
9. The RAG and preference alignment collaborative optimization method for the electric power field according to claim 5, wherein evaluating the performance of the intermediate large language model using the test set to obtain the target large language model comprises: using the test set, evaluating the performance of the intermediate large language model along multiple dimensions, including semantic consistency, professional reliability, and preference alignment, to obtain the target large language model.
10. A RAG and preference alignment collaborative optimization system for the electric power field, comprising: a data acquisition and processing module, configured to acquire a plurality of documents related to power business decisions to obtain a power business decision knowledge base; a vectorization and RAG module, configured to perform text chunking on the power business decision knowledge base and to process each obtained text block through vectorization representation and retrieval-augmented generation to obtain a power business adaptation text; a question and text generation module, configured to generate, based on the power business adaptation text and in combination with a pre-invoked first large language model and retrieval-augmented generation, a plurality of power business decision questions together with an expert decision text and a common decision text corresponding to each power business decision question; an association and combination module, configured to associate and combine each power business decision question with its corresponding expert decision text and common decision text according to a triple structure to obtain a preference data set; and a fine-tuning training module, configured to fine-tune a pre-trained second large language model, based on the preference data set, using a weighted fusion of direct preference optimization and probability ratio preference optimization to obtain a target large language model.
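By way of non-limiting illustration, the dynamically weighted fusion loss of claims 6 to 8 might be sketched as follows. The claims state only that the base weight rises through a warm-up period and a rising period, falls during the stable period, is adjusted by a model performance deviation, and is subject to a value range constraint. Everything else here is an assumption made for illustration: the linear interpolation within each period, the additive form of the deviation adjustment, the complementary weight (1 - w) on the probability ratio preference optimization loss, and all names and numeric defaults (`WeightSchedule`, `deviation_gain`, and so on).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WeightSchedule:
    """Hypothetical schedule parameters; none of these values come from the claims."""
    total_steps: int = 1000
    warmup_end: int = 100          # phase threshold: warm-up -> rising
    rise_end: int = 400            # phase threshold: rising -> stable
    w_warmup_start: float = 0.2    # warm-up initial weight
    w_warmup_end: float = 0.4      # warm-up end weight
    w_peak: float = 0.8            # weight peak
    w_balance: float = 0.5         # balance weight at the end of training
    w_min: float = 0.1             # value range constraint (lower bound)
    w_max: float = 0.9             # value range constraint (upper bound)
    deviation_gain: float = 0.1    # assumed scale of the deviation adjustment

    def __post_init__(self):
        if not 0 < self.warmup_end < self.rise_end < self.total_steps:
            raise ValueError("need 0 < warmup_end < rise_end < total_steps")


def _lerp(start: float, end: float, fraction: float) -> float:
    """Linear interpolation (an assumed form of 'gradually' changing)."""
    return start + (end - start) * fraction


def base_weight(s: WeightSchedule, step: int) -> float:
    """Base weight factor, adjusted in segments by training step."""
    step = min(max(step, 0), s.total_steps)
    if step < s.warmup_end:   # warm-up period: initial weight -> warm-up end weight
        return _lerp(s.w_warmup_start, s.w_warmup_end, step / s.warmup_end)
    if step < s.rise_end:     # rising period: warm-up end weight -> peak
        span = s.rise_end - s.warmup_end
        return _lerp(s.w_warmup_end, s.w_peak, (step - s.warmup_end) / span)
    span = s.total_steps - s.rise_end   # stable period: peak -> balance weight
    return _lerp(s.w_peak, s.w_balance, (step - s.rise_end) / span)


def dynamic_weight(s: WeightSchedule, step: int, performance_deviation: float) -> float:
    """Base weight plus an assumed additive deviation term, clamped to [w_min, w_max]."""
    raw = base_weight(s, step) + s.deviation_gain * performance_deviation
    return min(max(raw, s.w_min), s.w_max)


def fused_loss(dpo_loss: float, ratio_loss: float, weight: float) -> float:
    """Weighted sum of the two preference losses; the (1 - weight) term is assumed."""
    return weight * dpo_loss + (1.0 - weight) * ratio_loss
```

In a training loop, `dynamic_weight` would be evaluated once per step and the result passed to `fused_loss` together with the two per-batch loss values; the same arithmetic applies unchanged if the losses are framework tensors rather than floats.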

Description

RAG and preference alignment collaborative optimization method and system for electric power field
Technical Field
The invention relates to the technical field of artificial intelligence, and in particular to a RAG and preference alignment collaborative optimization method and system for the electric power field.
Background
In power system activities such as operation control, state control, and daily operation, the reliability and agility of the decision model directly determine the safety, stability, and economy of power grid operation. At present, decision support in the electric power field still relies heavily on manually constructed structured rule bases and expert experience models. These models offer a degree of static reliability, but they struggle to quickly convert massive unstructured professional knowledge, such as dispatching regulations, equipment manuals, and fault reports, into machine-executable logic. Manually coded updates usually take a long time, cannot keep up with the dynamic optimization requirements of the complex and changeable operating environment of the power distribution network, and can hardly support real-time decision making and agile regulation. With the development of generative artificial intelligence, large language models (LLMs), with their strong natural language understanding and instruction-following capabilities, offer new possibilities for efficiently processing unstructured knowledge in the electric power field and for automatically constructing decision models.
However, the high professional barriers of the power system and its core demand for high safety mean that directly applying a general-purpose LLM faces multiple challenges. First, general-purpose LLM pre-training data covers electric power expertise insufficiently, so the semantic parsing of industry-specific terms and complex rules is prone to deviation. Second, the hallucination problem inherent to LLMs is especially harmful in power scenarios: fabricated rule exceptions or incorrectly associated business conditions may lead to serious decision errors. Third, there is a fundamental contradiction between the probabilistic generation nature of LLMs and the strict determinism required by power decisions, which makes it difficult to guarantee the professional consistency and reliability of generated content.
Disclosure of Invention
To solve the above problems, the invention provides a RAG and preference alignment collaborative optimization method and system for the electric power field, which overcomes the respective limitations of RAG and preference alignment through their technical collaboration and constructs a dedicated decision model for the electric power field with high-quality knowledge support and accurate preference alignment.
In a first aspect, an embodiment of the present invention provides a RAG and preference alignment collaborative optimization method for the electric power field, comprising: acquiring a plurality of documents related to power business decisions to obtain a power business decision knowledge base; performing text chunking on the power business decision knowledge base, and processing each obtained text block through vectorization representation and retrieval-augmented generation to obtain a power business adaptation text; based on the power business adaptation text, combining a pre-invoked first large language model with retrieval-augmented generation to generate a plurality of power business decision questions, together with an expert decision text and a common decision text corresponding to each power business decision question; associating and combining each power business decision question with its corresponding expert decision text and common decision text according to a triple structure to obtain a preference data set; and, based on the preference data set, fine-tuning a pre-trained second large language model using a weighted fusion of direct preference optimization and probability ratio preference optimization to obtain a target large language model.
Preferably, acquiring a plurality of documents related to power business decisions to obtain a power business decision knowledge base comprises: collecting a plurality of PDF documents related to power business decisions from a plurality of public data sources; converting the format of each PDF document to obtain a corresponding Markdown document; and performing data cleaning on each Markdown document, and combining the cleaned Markdown documents to form the power business decision knowledge base.
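By way of non-limiting illustration, the association of each power business decision question with its expert decision text and common decision text into a triple structure might be sketched as follows. The field names `prompt`, `chosen`, and `rejected` follow a common convention for preference data and are assumptions, as are the treatment of the expert decision text as the preferred answer, the filtering rules, the JSON Lines serialization, and the helper names `build_preference_dataset` and `to_jsonl`.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class PreferenceTriple:
    prompt: str    # power business decision question
    chosen: str    # expert decision text (assumed to be the preferred answer)
    rejected: str  # common decision text (assumed to be the dispreferred answer)


def build_preference_dataset(questions, expert_texts, common_texts):
    """Associate each question with its expert and common decision texts.

    expert_texts and common_texts are assumed to be dicts keyed by question.
    Questions lacking either text, or whose two texts are identical after
    trimming, are skipped because they carry no preference signal
    (an assumed filtering rule, not stated in the source).
    """
    triples = []
    for question in questions:
        if question not in expert_texts or question not in common_texts:
            continue
        chosen = expert_texts[question].strip()
        rejected = common_texts[question].strip()
        if not chosen or not rejected or chosen == rejected:
            continue
        triples.append(PreferenceTriple(question, chosen, rejected))
    return triples


def to_jsonl(triples):
    """Serialize triples as JSON Lines, keeping non-ASCII text readable."""
    return "\n".join(json.dumps(asdict(t), ensure_ascii=False) for t in triples)
```

Keeping the three fields together in one record means a later split into training, validation, and test sets operates on whole triples, so a question can never be separated from its two answers.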
Preferably, the text chunking is performed on the power business decision knowledge base, each text block is generated through vectorization representation and retrieval