CN-121636515-B - Multi-table automatic analysis and structured warehousing method and system for power distribution network cost file
Abstract
The invention discloses a multi-table automatic analysis and structured warehousing method and system for power distribution network cost files, relating to the technical field of power grid engineering cost data processing. The method comprises: acquiring a cost file set of a target power distribution network project, parsing the cost tables in the cost file set, and identifying the table type of each cost table and its positional relation within the cost files; identifying the cost roles of the data fields in each cost table based on the table types and positional relations; constructing a field-level cost role mapping relation for data fields that have the same field name but different cost roles, and generating corresponding cost constraints; and structurally reorganizing the parsed data of each cost table based on the cost constraints and writing it into a cost database. By constructing a cost generation sequence relation that reflects the cost formation process, and by establishing field-level cost role mappings and constraint descriptions, the invention effectively controls the cross-table role differences and transfer paths of cost elements.
Inventors
- LI RONG
- ZHOU FANGYUAN
- ZHAO YINGYING
- SHEN SI
- Fu Anyuan
- TANG YUE
- Xia Yali
- LU XINXIN
- SHI XIAOMIN
- CHEN FULEI
- QI ZHENBIAO
- GAO XIANG
- FAN SHEN
- LI JIANQING
- YANG FAN
Assignees
- State Grid Anhui Electric Power Co., Ltd. Economic and Technological Research Institute (国网安徽省电力有限公司经济技术研究院)
Dates
- Publication Date
- 20260508
- Application Date
- 20260204
Claims (7)
- 1. A multi-table automatic analysis and structured warehousing method for power distribution network cost files, characterized by comprising the following steps: acquiring a cost file set of a target power distribution network project, parsing the cost tables in the cost file set, and identifying the table type of each cost table and its positional relation within the cost files; identifying, based on the table types and positional relations, the cost role of each data field in each cost table, which comprises determining the input attribute or output attribute of the cost table in the cost formation process and limiting the range of cost roles available to the data fields in the table, based on the table type and the stage position of the cost table in the cost generation sequence relation; constructing a field-level cost role mapping relation for data fields that have the same field name but different cost roles, and generating corresponding cost constraints, which comprises determining, based on the cost role mapping relation, the cost stage and applicable table types corresponding to each field role instance, so as to form stage constraints of the field roles; and structurally reorganizing the parsed data of each cost table based on the cost constraints and writing it into a cost database, which comprises: checking the parsed data fields against the cost constraints to judge whether they satisfy the corresponding stage constraints and reference constraints; writing the data fields that satisfy the constraints into different target data structure nodes according to their field role instances, so that same-name fields are stored separately by role; marking the data fields that do not satisfy the constraints as abnormal data; and organizing the data structure nodes according to the cost stage order associated with the field role instances to generate a staged structured data result.
- 2. The multi-table automatic analysis and structured warehousing method for power distribution network cost files according to claim 1, wherein parsing the cost tables in the cost file set and identifying the table type of each cost table and its positional relation within the cost files comprises: parsing the cost file and extracting the data area of each page in the cost file as a table-level structural unit; generating a basic position mark for each table-level structural unit based on its page number and in-page order; ordering all table-level structural units according to the basic position marks to construct a cost generation sequence relation; determining the intra-chapter positional relation based on the relative positions of the table-level structural units and the adjacent chapter titles; determining the initial table type of each cost table based on the cost generation sequence relation and the intra-chapter positional relation; and, when the initial table type is inconsistent with the stage in the cost generation sequence relation, correcting the table type according to the sequence relation.
- 3. The multi-table automatic analysis and structured warehousing method for power distribution network cost files according to claim 2, wherein ordering all table-level structural units according to the basic position marks to construct a cost generation sequence relation comprises: determining the page-level precedence relation and the in-page arrangement order based on the basic position mark of each table-level structural unit; generating an initial sequence of the table-level structural units according to the page-level precedence relation and the in-page arrangement order; identifying, based on the initial sequence, cross-page continuation relations between adjacent table-level structural units, and merging the table-level structural units linked by a cross-page continuation relation into continuous table units; and determining the relative stage order of each table-level structural unit in the cost formation process based on the overall position of the continuous table units in the sequence, thereby constructing the cost generation sequence relation.
- 4. The multi-table automatic analysis and structured warehousing method for power distribution network cost files according to claim 3, wherein determining the intra-chapter positional relation based on the relative positions of the table-level structural units and the adjacent chapter titles comprises: identifying title text units with chapter identification features in the cost file, and generating a chapter title sequence according to the order in which the title text units appear; for each table-level structural unit, combining the chapter title sequence with the cost generation sequence relation to locate the chapter title that precedes the table and is nearest to it, and taking that chapter title as the chapter attribution of the table-level structural unit; and constructing, based on the chapter attribution, the intra-chapter and inter-chapter positional relations among the cost tables.
- 5. The multi-table automatic analysis and structured warehousing method for power distribution network cost files according to claim 4, wherein correcting the table type according to the sequence relation comprises: defining a candidate table type range for the cost table based on the stage position of the cost table in the cost generation sequence relation; and, if the initial table type of the cost table does not belong to the candidate table type range, determining a target table type that is continuous in cost logic according to the confirmed table types of the adjacent cost tables and the stage sequence relation, and taking the target table type as the corrected table type of the cost table.
- 6. The multi-table automatic analysis and structured warehousing method for power distribution network cost files according to claim 5, wherein constructing a field-level cost role mapping relation for data fields that have the same field name but different cost roles comprises: grouping the same-name data fields by stage, based on the table type of the cost table and its stage position in the cost generation sequence; determining, based on the grouping result and in combination with the way each data field participates in calculation during the cost formation process and its cross-table transfer relations, the different cost roles borne by the same-name data fields at different stages; and mapping the same-name data fields that bear different cost roles to corresponding field role instances and associating each instance with its applicable stage, so as to form the cost role mapping relation.
- 7. A system using the multi-table automatic analysis and structured warehousing method for power distribution network cost files according to any one of claims 1 to 6, characterized by comprising the following modules: a cost table structure parsing module, used for acquiring a cost file set of a target power distribution network project, parsing the cost tables in the cost file set, and identifying the table type of each cost table and its positional relation within the cost files; a cost field role identification module, used for identifying the cost roles of the data fields in each cost table based on the table types and positional relations; a data field mapping module, used for constructing a field-level cost role mapping relation for data fields that have the same field name but different cost roles and generating corresponding cost constraints; and a cost data reorganization module, used for structurally reorganizing the parsed data of each cost table based on the cost constraints and writing it into a cost database.
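The structure-parsing steps of claims 2 to 5 can be illustrated with a minimal Python sketch. It is not the patented implementation. Every name in it (`TableUnit`, `Heading`, `STAGE_ORDER`, the continuation test based on identical headers and a missing caption, and the fallback rule in `correct_types`) is a hypothetical assumption made for illustration, since the claims do not specify table types, data structures, or detection heuristics.

```python
from dataclasses import dataclass

# Hypothetical stage order; the source does not enumerate table types.
STAGE_ORDER = ["quantity", "quota", "fee_calculation", "summary"]

@dataclass
class TableUnit:
    page: int               # page number of the extracted data area
    order_in_page: int      # arrangement order among elements on that page
    header: tuple           # column names, used here to detect continuations
    initial_type: str       # table type guessed from content alone
    has_title: bool = True  # False for a fragment without its own caption

@dataclass
class Heading:
    page: int
    order_in_page: int
    text: str

def position_mark(item):
    """Basic position mark: (page number, in-page order)."""
    return (item.page, item.order_in_page)

def merge_continuations(units):
    """Sort units by position mark and merge cross-page fragments."""
    groups = []
    for unit in sorted(units, key=position_mark):
        prev = groups[-1][-1] if groups else None
        continues = (
            prev is not None
            and unit.page == prev.page + 1   # starts on the following page
            and unit.header == prev.header   # same columns (assumed heuristic)
            and not unit.has_title           # no caption of its own
        )
        if continues:
            groups[-1].append(unit)
        else:
            groups.append([unit])
    return groups

def chapter_of(group, headings):
    """Nearest chapter heading that precedes the table's first fragment."""
    start = position_mark(group[0])
    preceding = [h for h in headings if position_mark(h) < start]
    return max(preceding, key=position_mark).text if preceding else None

def correct_types(groups):
    """Keep stages non-decreasing along the sequence and fix outliers."""
    corrected, prev_idx = [], 0
    for group in groups:
        table_type = group[0].initial_type
        candidates = STAGE_ORDER[prev_idx:]     # candidate table type range
        if table_type not in candidates:
            table_type = STAGE_ORDER[prev_idx]  # assumed rule: reuse neighbour's stage
        corrected.append(table_type)
        prev_idx = STAGE_ORDER.index(table_type)
    return corrected

# Usage with made-up data: the quota table continues onto page 2.
units = [
    TableUnit(2, 1, ("item", "unit price"), "quota", has_title=False),
    TableUnit(1, 1, ("item", "quantity"), "quantity"),
    TableUnit(1, 3, ("item", "unit price"), "quota"),
    TableUnit(3, 2, ("item", "amount"), "quantity"),
]
headings = [
    Heading(1, 0, "1 Bill of quantities"),
    Heading(1, 2, "2 Quota pricing"),
    Heading(3, 1, "3 Summary"),
]
groups = merge_continuations(units)
chapters = [chapter_of(g, headings) for g in groups]
types = correct_types(groups)
```

In this sketch the last table is initially typed as `quantity`, which falls outside the candidate range once a quota table has been confirmed, so it is corrected. A production rule would likely also consult the following table and the chapter attribution, which the sketch omits.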
Description
Multi-table automatic analysis and structured warehousing method and system for power distribution network cost file
Technical Field
The invention relates to the technical field of power grid engineering cost data processing, and in particular to a multi-table automatic analysis and structured warehousing method and system for power distribution network cost files.
Background
With the continuous expansion of power distribution network construction, engineering cost management is gradually shifting from manual auditing to informatized, automated processing. At present, a power distribution network engineering cost file generally consists of multiple cost tables. Different cost tables describe different cost links, such as engineering quantities, quotas, fee calculation, and summarization, and the same cost element often appears repeatedly in several cost tables while bearing different cost roles at different cost stages. Conventional multi-table automatic parsing techniques for power distribution network cost files generally parse and structure the cost table data based on table types or field names. They cannot identify the role differences and constraint relations of the same cost element across different cost tables, so data with the same name but different cost roles is mixed during warehousing. As a result, the cost data after structured warehousing has only a formal storage structure and lacks consistency of cost stages and role constraint relations. It is difficult to use for subsequent cost checking, associated calculation, and consistency analysis, which impairs the practical usability of power distribution network cost data.
The above technical solution has at least the following technical problem: existing multi-table automatic parsing techniques for power distribution network cost files cannot identify and maintain the role differences and constraint relations of the same cost element across multiple tables, so the data after structured warehousing is logically unusable and cannot be verified.
Disclosure of Invention
To overcome the defects of the prior art, embodiments of the invention provide a multi-table automatic analysis and structured warehousing method and system for power distribution network cost files. A cost generation sequence relation reflecting the cost formation process is constructed, and on that basis field-level cost role mappings and constraint descriptions are established for same-name data fields. This effectively controls the cross-table role differences and transfer paths of cost elements, avoids cross-stage misuse of cost data, and gives the structured warehousing result logical consistency and verifiability.
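As an illustration of field-level role mapping and constraint checking, the following minimal Python sketch stores same-name fields separately by role. It is a sketch under stated assumptions, not the disclosed implementation: the stage names, table type names, role names, the record layout, and in particular the reading of a "reference constraint" as "the upstream role must already have been stored" are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional

# Hypothetical stage order; the source does not enumerate cost stages.
STAGE_ORDER = ["quantity", "quota", "fee_calculation", "summary"]

@dataclass(frozen=True)
class FieldRole:
    field_name: str                    # name as printed in the table
    stage: str                         # cost stage where the role applies
    role: str                          # cost role borne at that stage
    table_types: frozenset             # table types where the role may appear
    source_role: Optional[str] = None  # upstream role it refers to (assumed)

# Hypothetical mapping: one field name, two role instances.
ROLES = [
    FieldRole("unit price", "quota", "quota_base_price",
              frozenset({"quota_table"})),
    FieldRole("unit price", "fee_calculation", "adjusted_price",
              frozenset({"fee_table"}), source_role="quota_base_price"),
]

def warehouse(records, roles=ROLES, stage_order=STAGE_ORDER):
    """Check records against the constraints and store them by role."""
    rank = {stage: i for i, stage in enumerate(stage_order)}
    by_key = {(r.field_name, r.stage): r for r in roles}
    nodes = defaultdict(list)  # (stage, role) -> stored values
    abnormal = []              # records that violate a constraint
    stored_roles = set()
    # Process in cost stage order so upstream roles are stored first.
    for rec in sorted(records, key=lambda r: rank.get(r["stage"], len(rank))):
        role = by_key.get((rec["field"], rec["stage"]))
        # Stage (phase) constraint: a role exists for this stage and table type.
        stage_ok = role is not None and rec["table_type"] in role.table_types
        # Reference constraint (assumed meaning): upstream role already stored.
        ref_ok = stage_ok and (
            role.source_role is None or role.source_role in stored_roles
        )
        if ref_ok:
            nodes[(role.stage, role.role)].append(rec["value"])
            stored_roles.add(role.role)
        else:
            abnormal.append(rec)
    return dict(nodes), abnormal

# Usage with made-up records that all carry the field name "unit price".
records = [
    {"field": "unit price", "stage": "fee_calculation",
     "table_type": "fee_table", "value": 118.0},
    {"field": "unit price", "stage": "quota",
     "table_type": "quota_table", "value": 100.0},
    {"field": "unit price", "stage": "summary",
     "table_type": "summary_table", "value": 5.0},
]
nodes, abnormal = warehouse(records)
```

Here the two valid "unit price" values end up in separate nodes keyed by stage and role, while the summary-stage record, for which no role instance is defined, is marked abnormal rather than mixed with the others.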
To achieve the above purpose, the invention provides the following technical solution: a multi-table automatic analysis and structured warehousing method for power distribution network cost files, comprising: acquiring a cost file set of a target power distribution network project, parsing the cost tables in the cost file set, and identifying the table type of each cost table and its positional relation within the cost files; identifying the cost roles of the data fields in each cost table based on the table types and positional relations; constructing a field-level cost role mapping relation for data fields that have the same field name but different cost roles, and generating corresponding cost constraints; and structurally reorganizing the parsed data of each cost table based on the cost constraints and writing it into a cost database. In a preferred embodiment, parsing the cost tables in the cost file set and identifying the table type of each cost table and its positional relation within the cost files comprises: parsing the cost file and extracting the data area of each page in the cost file as a table-level structural unit; generating a basic position mark for each table-level structural unit based on its page number and in-page order; ordering all table-level structural units according to the basic position marks to construct a cost generation sequence relation; determining the intra-chapter positional relation based on the relative positions of the table-level structural units and the adjacent chapter titles; determining the initial table type of each cost table based on the cost generation sequence relation and the intra-chapter positional relation; and, when the initial table type is inconsistent with the stage in the cost generation sequence relation, correcting the table type according to the sequence relation. In a pr