CN-121808400-B - Large-model-based power line loss analysis data set construction method, device and equipment

Abstract

The application relates to a large-model-based method, device and equipment for constructing a power line loss analysis data set. The method comprises: obtaining line loss related data exported from each power system in a power distribution network; inputting the line loss related data into a pre-built first-stage multi-modal large model; generating, in the first-stage multi-modal large model, a summary synthesis prompt based on a structure analysis prompt and the line loss related data; obtaining a data description summary corresponding to the line loss related data based on the summary synthesis prompt; retrieving a power line loss domain knowledge base based on the data description summary to obtain line loss domain knowledge corresponding to the line loss related data; generating a line loss related data extraction instruction according to the line loss domain knowledge; and inputting the line loss related data extraction instruction and the line loss related data into a pre-built second-stage multi-modal large model to obtain a power line loss analysis data set. The method improves the accuracy of the finally constructed data set.

Inventors

  • LIU TINGJUN
  • LU YUXIN
  • ZHAO YUN
  • LIU NIAN
  • LIU XIPENG
  • LIN WEIBIN
  • CAI ZIWEN
  • WANG ZONGYI
  • WANG HAOLIN

Assignees

  • China Southern Power Grid Research Institute Co., Ltd. (南方电网科学研究院有限责任公司)

Dates

Publication Date
2026-05-08
Application Date
2026-03-10

Claims (10)

  1. A method for constructing a power line loss analysis data set based on a large model, characterized by comprising the following steps:
     acquiring line loss related data exported from each power system in a power distribution network, inputting the line loss related data into a pre-constructed first-stage multi-modal large model, analyzing the line loss related data based on a structure analysis prompt in the first-stage multi-modal large model to obtain metadata of the line loss related data, generating a summary synthesis prompt corresponding to the line loss related data according to the metadata, re-inputting the summary synthesis prompt into the first-stage multi-modal large model to obtain an original data description summary corresponding to the line loss related data, and filling the original data description summary into a pre-configured classification judgment prompt template to obtain a data description summary corresponding to the line loss related data, wherein the data description summary comprises a line loss service type corresponding to the line loss related data;
     searching a service type definition table in a power line loss domain knowledge base by using the line loss service type to obtain a core standard field ID corresponding to the line loss service type;
     retrieving a standard field main table in the power line loss domain knowledge base according to the core standard field ID to obtain a core standard field corresponding to the core standard field ID;
     constructing a semantic search query vector according to the data description summary, searching a pre-constructed vector database according to the semantic search query vector to obtain a nonstandard line loss expression corresponding to the semantic search query vector, and determining a corresponding mapping relation from a history mapping relation table in the power line loss domain knowledge base based on the nonstandard line loss expression, wherein the history mapping relation table is used for storing mapping relations from nonstandard line loss expressions to standard line loss expressions, and the vector database is used for storing the nonstandard line loss expressions in the history mapping relation table;
     obtaining, based on the mapping relation and the core standard field, a standardized field definition, a historical alias set and a business rule fragment corresponding to the line loss related data;
     generating a line loss related data extraction instruction according to the standardized field definition, the historical alias set and the business rule fragment; and
     inputting the line loss related data extraction instruction and the line loss related data into a pre-constructed second-stage multi-modal large model, and outputting the power line loss analysis data set through the second-stage multi-modal large model.
  2. The method of claim 1, wherein generating the line loss related data extraction instruction according to the standardized field definition, the historical alias set and the business rule fragment comprises: obtaining a pre-configured machine-readable structured instruction template; and filling the standardized field definition, the historical alias set and the business rule fragment into the corresponding positions of the machine-readable structured instruction template to obtain the line loss related data extraction instruction.
  3. The method of claim 2, wherein inputting the line loss related data extraction instruction and the line loss related data into the pre-constructed second-stage multi-modal large model and outputting the power line loss analysis data set through the second-stage multi-modal large model comprises: performing semantic understanding on the line loss related data to obtain semantically understood line loss related data; matching the semantically understood line loss related data with the standardized fields in the line loss related data extraction instruction, and extracting the corresponding standardized data from the semantically understood line loss related data in JSON format; and combining the standardized data to obtain the power line loss analysis data set.
  4. The method of claim 1, wherein obtaining, based on the mapping relation and the core standard field, the standardized field definition, the historical alias set and the business rule fragment corresponding to the line loss related data comprises: mapping the core standard field according to the mapping relation; and, if the mapping succeeds, extracting the standardized field definition, the historical alias set and the business rule fragment corresponding to the line loss related data from the standard field main table according to the core standard field.
  5. A large-model-based power line loss analysis data set construction apparatus, the apparatus comprising:
     an acquisition module, used for acquiring line loss related data exported from each power system in a power distribution network, inputting the line loss related data into a pre-constructed first-stage multi-modal large model, analyzing the line loss related data based on a structure analysis prompt in the first-stage multi-modal large model to obtain metadata of the line loss related data, generating a summary synthesis prompt corresponding to the line loss related data according to the metadata, re-inputting the summary synthesis prompt into the first-stage multi-modal large model to obtain an original data description summary corresponding to the line loss related data, and filling the original data description summary into a pre-configured classification judgment prompt template to obtain a data description summary corresponding to the line loss related data, wherein the data description summary comprises a line loss service type corresponding to the line loss related data;
     a retrieval module, used for searching a service type definition table in a power line loss domain knowledge base by using the line loss service type to obtain a core standard field ID corresponding to the line loss service type; retrieving a standard field main table in the power line loss domain knowledge base according to the core standard field ID to obtain a core standard field corresponding to the core standard field ID; constructing a semantic search query vector according to the data description summary, searching a pre-constructed vector database according to the semantic search query vector to obtain a nonstandard line loss expression corresponding to the semantic search query vector, and determining a corresponding mapping relation from a history mapping relation table in the power line loss domain knowledge base based on the nonstandard line loss expression, wherein the history mapping relation table is used for storing mapping relations from nonstandard line loss expressions to standard line loss expressions, and the vector database is used for storing the nonstandard line loss expressions in the history mapping relation table; and obtaining, based on the mapping relation and the core standard field, a standardized field definition, a historical alias set and a business rule fragment corresponding to the line loss related data;
     a generation module, used for generating a line loss related data extraction instruction according to the standardized field definition, the historical alias set and the business rule fragment; and
     a construction module, used for inputting the line loss related data extraction instruction and the line loss related data into a pre-constructed second-stage multi-modal large model, and outputting the power line loss analysis data set through the second-stage multi-modal large model.
  6. The apparatus of claim 5, wherein the generation module is further used for obtaining a pre-configured machine-readable structured instruction template, and filling the standardized field definition, the historical alias set and the business rule fragment into the corresponding positions of the machine-readable structured instruction template to obtain the line loss related data extraction instruction.
  7. The apparatus of claim 5, wherein the construction module is further used for performing semantic understanding on the line loss related data to obtain semantically understood line loss related data, matching the semantically understood line loss related data with the standardized fields in the line loss related data extraction instruction, extracting the corresponding standardized data from the semantically understood line loss related data in JSON format, and combining the standardized data to obtain the power line loss analysis data set.
  8. The apparatus of claim 5, wherein the retrieval module is further used for mapping the core standard field according to the mapping relation and, if the mapping succeeds, extracting the standardized field definition, the historical alias set and the business rule fragment corresponding to the line loss related data from the standard field main table according to the core standard field.
  9. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor, when executing the computer program, implements the steps of the method of any one of claims 1 to 4.
  10. A computer-readable storage medium on which a computer program is stored, characterized in that the computer program, when executed by a processor, implements the steps of the method of any one of claims 1 to 4.
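
As a reading aid only, the knowledge base retrieval of claims 1 and 4 and the template filling of claim 2 can be sketched in Python as below. This is a minimal sketch under stated assumptions, not the claimed implementation: every table entry, field ID and field name is hypothetical, the three knowledge base tables are plain dictionaries, and the vector database is replaced by a toy bag-of-words cosine similarity rather than a real embedding model.

```python
import json
import math
from collections import Counter

# All table contents, IDs and field names below are hypothetical examples.
# Service type definition table: line loss service type -> core standard field IDs.
SERVICE_TYPE_TABLE = {"feeder_line_loss": ["F001", "F002"]}

# Standard field main table: field ID -> definition, historical aliases, business rules.
STANDARD_FIELD_TABLE = {
    "F001": {"name": "supply_energy_kwh",
             "definition": "Energy supplied to the feeder, in kWh",
             "aliases": ["input energy", "supplied kWh"],
             "rules": ["must be >= 0"]},
    "F002": {"name": "sales_energy_kwh",
             "definition": "Energy sold to customers on the feeder, in kWh",
             "aliases": ["sold energy", "billed kWh"],
             "rules": ["must be <= supply_energy_kwh"]},
}

# History mapping relation table: nonstandard expression -> standard field ID.
HISTORY_MAPPING_TABLE = {
    "feeder input power qty": "F001",
    "billed electricity amount": "F002",
}

# Assumed shape of the machine-readable structured instruction template.
INSTRUCTION_TEMPLATE = {"task": "extract_line_loss_fields",
                        "output_format": "json",
                        "fields": []}


def embed(text):
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


# The "vector database" stores the nonstandard expressions of the mapping table.
VECTOR_DB = [(expr, embed(expr)) for expr in HISTORY_MAPPING_TABLE]


def retrieve_knowledge(service_type, summary, top_k=1):
    """Return field definitions, alias sets and rule fragments for one data source."""
    core_ids = SERVICE_TYPE_TABLE.get(service_type, [])
    core_fields = {fid: STANDARD_FIELD_TABLE[fid] for fid in core_ids}
    query = embed(summary)
    ranked = sorted(VECTOR_DB, key=lambda item: cosine(query, item[1]), reverse=True)
    knowledge = []
    for expr, _ in ranked[:top_k]:
        field_id = HISTORY_MAPPING_TABLE[expr]
        if field_id in core_fields:  # the mapping succeeded for a core standard field
            field = core_fields[field_id]
            knowledge.append({"field": field["name"],
                              "definition": field["definition"],
                              "aliases": field["aliases"],
                              "rules": field["rules"],
                              "matched_expression": expr})
    return knowledge


def build_instruction(knowledge):
    """Fill the structured template and serialize it as the extraction instruction."""
    instruction = dict(INSTRUCTION_TEMPLATE, fields=knowledge)
    return json.dumps(instruction, ensure_ascii=False, indent=2)


if __name__ == "__main__":
    found = retrieve_knowledge("feeder_line_loss",
                               "table with billed electricity amount per feeder")
    print(build_instruction(found))
```

In this sketch a mapping counts as successful when the retrieved nonstandard expression maps to one of the core standard fields of the detected service type; that is one possible reading of claim 4, chosen here for illustration.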

Description

Large-model-based power line loss analysis data set construction method, device and equipment

Technical Field

The application relates to the technical field of power distribution network line loss analysis, in particular to a method, a device, computer equipment, a computer-readable storage medium and a computer program product for constructing a power line loss analysis data set based on a large model.

Background

In constructing high-quality data sets for theoretical line loss analysis of power distribution networks, a core challenge is how to automatically process and fuse multi-source heterogeneous tabular data from different business systems. At present, such data sets are mainly constructed in one of two ways. The first is fully manual operation relying on expert experience, in which data are spliced together through manual comparison. The second is a dedicated script based on hard-coded rules, in which a fixed program is written for a specific data source to extract data automatically. However, the data sets constructed by both methods are insufficiently accurate.

Disclosure of Invention

In view of the foregoing, it is desirable to provide a large-model-based power line loss analysis data set construction method, apparatus, computer device, computer-readable storage medium, and computer program product that can improve the accuracy of constructed data sets.
In a first aspect, the present application provides a method for constructing a power line loss analysis data set based on a large model, including: acquiring line loss related data exported from each power system in a power distribution network, inputting the line loss related data into a pre-constructed first-stage multi-modal large model, analyzing the line loss related data based on a structure analysis prompt in the first-stage multi-modal large model to obtain metadata of the line loss related data, generating a summary synthesis prompt corresponding to the line loss related data according to the metadata, and obtaining a data description summary corresponding to the line loss related data based on the summary synthesis prompt; searching a power line loss domain knowledge base based on the data description summary to obtain a standardized field definition, a historical alias set and a business rule fragment corresponding to the line loss related data; generating a line loss related data extraction instruction according to the standardized field definition, the historical alias set and the business rule fragment; and inputting the line loss related data extraction instruction and the line loss related data into a pre-constructed second-stage multi-modal large model, and outputting a power line loss analysis data set through the second-stage multi-modal large model.

In one embodiment, obtaining the data description summary corresponding to the line loss related data based on the summary synthesis prompt includes: re-inputting the summary synthesis prompt into the first-stage multi-modal large model to obtain an original data description summary corresponding to the line loss related data; and filling the original data description summary into a pre-configured classification judgment prompt template to obtain the data description summary corresponding to the line loss related data.
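
The first-stage steps described above form a short prompt chain, which can be sketched as follows. This is an illustrative sketch only: the prompt wording, the `describe` function and the `fake_model` stub are all hypothetical, the model is passed in as a plain callable so that no particular model API is implied, and it is an assumption that the filled classification judgment prompt template is sent to the model once more to obtain the line loss service type.

```python
from typing import Callable

# Hypothetical prompt texts; the source does not disclose the actual wording.
STRUCTURE_PROMPT = "List the column names, units and row count of this table:\n{data}"
SUMMARY_PROMPT = "Write a short description of a table with this metadata:\n{metadata}"
CLASSIFY_TEMPLATE = ("Summary: {summary}\n"
                     "Choose the line loss service type from {types} "
                     "and answer with the type only.")


def describe(data: str, model: Callable[[str], str], service_types: list[str]) -> dict:
    """Run the three first-stage calls and return the data description summary."""
    # Step 1: structure analysis prompt -> metadata of the exported data.
    metadata = model(STRUCTURE_PROMPT.format(data=data))
    # Step 2: summary synthesis prompt built from the metadata -> original summary.
    raw_summary = model(SUMMARY_PROMPT.format(metadata=metadata))
    # Step 3: fill the classification template (assumed to be answered by the model).
    answer = model(CLASSIFY_TEMPLATE.format(summary=raw_summary, types=service_types))
    return {"summary": raw_summary, "service_type": answer.strip()}


def fake_model(prompt: str) -> str:
    """Deterministic stand-in for the first-stage model, for demonstration only."""
    if prompt.startswith("List the column names"):
        return "columns: feeder_id, supplied kWh, billed kWh; rows: 2"
    if prompt.startswith("Write a short description"):
        return "Monthly supplied and billed energy per feeder."
    return "feeder_line_loss\n"


if __name__ == "__main__":
    table = "feeder_id,supplied kWh,billed kWh\nA,100,95\nB,80,78"
    print(describe(table, fake_model, ["feeder_line_loss", "transformer_loss"]))
```

Passing the model as a callable keeps the chain testable with a stub; in practice the same function would wrap whatever client is used to reach the first-stage model.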
In one embodiment, generating the line loss related data extraction instruction according to the standardized field definition, the historical alias set and the business rule fragment includes: obtaining a pre-configured machine-readable structured instruction template; and filling the standardized field definition, the historical alias set and the business rule fragment into the corresponding positions of the machine-readable structured instruction template to obtain the line loss related data extraction instruction.

In one exemplary embodiment, inputting the line loss related data extraction instruction and the line loss related data into the pre-constructed second-stage multi-modal large model and outputting the power line loss analysis data set through the second-stage multi-modal large model includes: performing semantic understanding on the line loss related data to obtain semantically understood line loss related data; matching the semantically understood line loss related data with the standardized fields in the line loss related data extraction instruction, and extracting the corresponding standardized data from the semantically understood line loss related data in JSON format; and combining the standardized data to obtain the power line loss analysis data set.

In one embodiment, the power line loss domain knowledge base comprises a service type definition table, a standard field main table and a history mapping relation table, wherein the history mapping relation table is used for storing mapping relations from nonstandard line loss expressions to standard line loss expressions; retrieving a power line