CN-122021610-A - Structured analysis and reconstruction method and system for power communication demand text
Abstract
The invention discloses a method and system for structured parsing and reconstruction of power communication demand text. The method comprises: performing structural parsing on an original demand document and then paginating it; constructing a dual-branch multimodal deep learning model comprising a text branch and a visual branch to determine the content type and locate the key content; constructing a prompt template based on the key content under each content type, so that a large language model extracts the key elements and attribute features related to the key content; performing conflict detection on the key elements and attribute features and, when a conflict exists, performing arbitration with the large language model and outputting a final attribute feature value; constructing a structured demand data model based on the key elements and the final attribute feature values; and mapping the structured demand data model into a standardized demand specification document.
Inventors
- Quan Yizhan
- WANG LEI
- CAI XINZHONG
- JIN JING
- ZHANG YUJIA
- FENG HAO
- AN ZHIYUAN
- ZHU PENGYU
- LIU YAN
- DING RUIQI
- SHENG LEI
- LIU MING
Assignees
- 南京南瑞信息通信科技有限公司
- 国网河南省电力公司信息通信分公司
Dates
- Publication Date
- 20260512
- Application Date
- 20251231
Claims (10)
- 1. A method for structured parsing and reconstruction of power communication demand text, characterized by comprising the following steps: performing structural parsing on an original demand document and then paginating it; constructing a dual-branch multimodal deep learning model comprising a text branch and a visual branch, extracting text semantic features from each page of the document through the text branch, extracting page layout structure features of each page through the visual branch, and determining the content type and locating the key content of each page based on the text semantic features and the page layout structure features; constructing a prompt template based on the key content under each content type, so that a large language model extracts key elements and attribute features related to the key content based on the constructed prompt template; performing conflict detection by associating and comparing the key elements and attribute features and, when a conflict exists, constructing a new prompt based on all information corresponding to the key elements involved in the conflict, the conflict details, and domain knowledge, having the large language model perform semantic arbitration based on the new prompt, and outputting the final attribute feature value of the key elements in conflict; and mapping the structured demand data model into a standardized demand specification document to complete the reconstruction of the power communication demand text.
- 2. The method for structured parsing and reconstruction of power communication demand text according to claim 1, wherein paginating the original demand document after structural parsing comprises: calculating the text density of each text block, preliminarily merging adjacent blocks whose density difference is smaller than a preset density threshold, and then performing preliminary clustering based on the spatial positions of the text blocks; and, within each cluster, extracting semantic embedding vectors of the text blocks, calculating the cosine similarity of the semantic embedding vectors, judging the relevance among the text blocks according to a preset semantic threshold and the cosine similarity, identifying coherent regions based on the relevance, and paginating based on the coherent regions.
- 3. The method for structured parsing and reconstruction of power communication demand text according to claim 1, wherein the text semantic features comprise overall text semantic features and full text semantic features, the overall text semantic features representing the text as a whole and the full text semantic features representing all detail features of the text; a cross-modal attention mechanism is used to fuse the overall text semantic features with the page layout structure features, and the fused feature vectors are classified to obtain the content type of each page of the document.
- 4. The method for structured parsing and reconstruction of power communication demand text according to claim 3, wherein dynamically matching the full semantic features of the text against a keyword library corresponding to the content type to locate the key content comprises: mapping the keyword library into the text feature space, calculating its semantic similarity with the full semantic features of the text, and determining the position of the key content according to the semantic similarity.
- 5. The method for structured parsing and reconstruction of power communication demand text according to claim 4, wherein determining the search range on each page of the document according to the keyword types in the keyword library comprises: applying horizontal-neighborhood capture to attribute keywords, vertical-region extraction to structural keywords, and full-text association positioning to entity keywords.
- 6. The method for structured parsing and reconstruction of power communication demand text according to claim 5, wherein the associated region of a keyword is determined according to the following formula and used as the search range: $R(k) = \{\, p \in P \mid d(p, p_k) \le \tau_k \,\}$, where $R(k)$ denotes the associated region of keyword $k$, $p_k$ denotes the position coordinates of keyword $k$, $d(p, p_k)$ is a distance measure function giving the distance between any position $p$ on page $P$ and keyword $k$, and $\tau_k$ is a cutoff threshold dynamically adjusted according to the keyword type.
- 7. The method for structured parsing and reconstruction of power communication demand text according to claim 5, wherein, in the horizontal-neighborhood capture, the capture range for attribute keywords is determined by a formula whose terms are: the horizontal capture width for the attribute keyword, the average character width, the length of the keyword, and the adaptive tuning parameters α and β.
- 8. A system for structured parsing and reconstruction of power communication demand text, implementing the method for structured parsing and reconstruction of power communication demand text according to any one of claims 1 to 7, characterized in that the system comprises: a document parsing and pagination module for performing structural parsing on the original demand document and then paginating it; a content classification module for constructing a dual-branch multimodal deep learning model comprising a text branch and a visual branch, extracting text semantic features from each page of the document through the text branch, extracting page layout structure features of each page through the visual branch, and determining the content type and locating the key content of each page based on the text semantic features and the page layout structure features; an object extraction and attribute identification module for constructing a prompt template based on the key content under each content type, so that a large language model extracts key elements and attribute features related to the key content based on the constructed prompt template; a relation construction module for performing conflict detection by associating and comparing the key elements and attribute features and, when a conflict exists, constructing a new prompt based on all information corresponding to the key elements involved in the conflict, the conflict details, and domain knowledge, having the large language model perform semantic arbitration based on the new prompt, and outputting the final attribute feature value of the key elements in conflict; and a data reconstruction module for mapping the structured demand data model into a standardized demand specification document to complete the reconstruction of the power communication demand text.
- 9. An electronic device comprising a processor and a storage medium, characterized in that: the storage medium is used for storing instructions; and the processor operates according to the instructions to perform the steps of the method for structured parsing and reconstruction of power communication demand text according to any one of claims 1 to 7.
- 10. A computer readable storage medium having stored thereon a computer program, characterized in that the program when executed by a processor realizes the steps of the structured parsing and reconstruction method of power communication demand text according to any of claims 1-7.
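The pagination steps of claim 2 can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the patented implementation: text density is taken to be characters per unit of bounding-box area, "spatial position" is reduced to the vertical gap between consecutive blocks in reading order, and the `Block` class, the function names, and all threshold values are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Block:
    text: str
    y: float      # top edge of the block on the page
    h: float      # block height
    w: float      # block width
    emb: tuple    # semantic embedding from any encoder (hypothetical input)

    @property
    def density(self):
        # Assumed definition: characters per unit of bounding-box area.
        return len(self.text) / (self.w * self.h)


def cosine(a, b):
    dot = sum(p * q for p, q in zip(a, b))
    norm = math.sqrt(sum(p * p for p in a)) * math.sqrt(sum(q * q for q in b))
    return dot / norm if norm else 0.0


def spatial_clusters(blocks, density_thr, gap_thr):
    """Stage 1: merge adjacent blocks of similar density that sit close together."""
    clusters = []
    for blk in blocks:
        if clusters:
            prev = clusters[-1][-1]
            similar = abs(blk.density - prev.density) < density_thr
            near = blk.y - (prev.y + prev.h) <= gap_thr
            if similar and near:
                clusters[-1].append(blk)
                continue
        clusters.append([blk])
    return clusters


def coherent_regions(cluster, semantic_thr):
    """Stage 2: split a cluster wherever consecutive blocks are not semantically related."""
    regions = [[cluster[0]]]
    for prev, blk in zip(cluster, cluster[1:]):
        if cosine(prev.emb, blk.emb) >= semantic_thr:
            regions[-1].append(blk)
        else:
            regions.append([blk])
    return regions


def paginate(blocks, density_thr=0.05, gap_thr=20.0, semantic_thr=0.8):
    """Return one list of blocks per logical page (coherent region)."""
    pages = []
    for cluster in spatial_clusters(blocks, density_thr, gap_thr):
        pages.extend(coherent_regions(cluster, semantic_thr))
    return pages
```

Comparing only consecutive blocks keeps the sketch short; a fuller version might compare every pair of blocks inside a cluster before deciding on region boundaries.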
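The key-content positioning of claim 4 can be illustrated with a small sketch. It assumes the keyword library has already been mapped into the same feature space as the token-level ("full") text features, and it uses cosine similarity as the semantic similarity measure. The function name `locate_key_content`, the `min_sim` cutoff, and the input layout are hypothetical choices for illustration only.

```python
import math


def cosine(a, b):
    dot = sum(p * q for p, q in zip(a, b))
    norm = math.sqrt(sum(p * p for p in a)) * math.sqrt(sum(q * q for q in b))
    return dot / norm if norm else 0.0


def locate_key_content(token_feats, keyword_embs, min_sim=0.7):
    """Map each keyword to the best-matching token position.

    token_feats  : list of feature vectors, one per token position (assumed input)
    keyword_embs : {keyword: vector in the same feature space} (assumed input)
    Returns {keyword: (position, similarity)} for matches at or above min_sim.
    """
    hits = {}
    for keyword, kw_vec in keyword_embs.items():
        scored = ((i, cosine(vec, kw_vec)) for i, vec in enumerate(token_feats))
        best_idx, best_sim = max(scored, key=lambda pair: pair[1], default=(None, 0.0))
        if best_idx is not None and best_sim >= min_sim:
            hits[keyword] = (best_idx, best_sim)
    return hits
```

Keeping only the single best position per keyword is a simplification; returning every position above the cutoff would be an equally plausible reading of the claim.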
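The search-range logic of claims 5 to 7 can be sketched as a distance cutoff around the keyword position. The formula for the horizontal capture width is not legible in this text, so the linear form below (`alpha * avg_char_width * kw_length + beta`) is purely an assumption, as are the default parameter values, the fixed structural cutoff, and the function names.

```python
import math


def lateral_capture_width(avg_char_width, kw_length, alpha=1.5, beta=4.0):
    # ASSUMED linear form; the source formula is not available in this text.
    return alpha * avg_char_width * kw_length + beta


def cutoff_for(kw_type, avg_char_width, kw_length):
    """Pick a cutoff threshold by keyword type (all values are hypothetical)."""
    if kw_type == "attribute":       # horizontal neighborhood around the keyword
        return lateral_capture_width(avg_char_width, kw_length)
    if kw_type == "structural":      # larger vertical region
        return 200.0
    if kw_type == "entity":          # full-text association: no cutoff
        return math.inf
    raise ValueError(f"unknown keyword type: {kw_type}")


def associated_region(positions, kw_pos, tau, dist=math.dist):
    """Keep every page position whose distance to the keyword is at most tau."""
    return [p for p in positions if dist(p, kw_pos) <= tau]


def horizontal_dist(p, q, line_tol=2.0):
    """Distance along a text line; positions on other lines count as unreachable."""
    return abs(p[0] - q[0]) if abs(p[1] - q[1]) <= line_tol else math.inf
```

Passing `dist=horizontal_dist` restricts an attribute keyword to its own text line, while the default Euclidean `math.dist` gives a circular neighborhood; which distance measure the patent intends is not stated in this text.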
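The conflict detection and arbitration step of claim 1 can be sketched as follows. The sketch treats a conflict as the same attribute of the same key element carrying more than one distinct value, and it only builds the arbitration prompt; the call to a large language model is deliberately left out. The record layout, function names, and prompt wording are hypothetical.

```python
from collections import defaultdict


def detect_conflicts(records):
    """Find attributes of an element that were extracted with differing values.

    records: iterable of (element, attribute, value, source) tuples (assumed layout).
    Returns {(element, attribute): {value: [sources]}} for conflicting entries only.
    """
    seen = defaultdict(dict)
    for element, attribute, value, source in records:
        seen[(element, attribute)].setdefault(value, []).append(source)
    return {key: values for key, values in seen.items() if len(values) > 1}


def build_arbitration_prompt(element, attribute, candidates, domain_knowledge):
    """Assemble a new prompt from the element info, conflict details and domain knowledge."""
    lines = [
        f"Element: {element}",
        f"Attribute: {attribute}",
        "Conflicting values:",
    ]
    for value, sources in candidates.items():
        lines.append(f"- {value} (from {', '.join(sources)})")
    lines.append(f"Domain knowledge: {domain_knowledge}")
    lines.append("Return the single most plausible value and a one-line reason.")
    return "\n".join(lines)
```

The resulting prompt would then be sent to whichever large language model the system uses, and the returned value recorded as the final attribute feature value.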
Description
Structured analysis and reconstruction method and system for power communication demand text
Technical Field
The invention belongs to the technical field of power system informatization and intelligent text processing, and particularly relates to a method and system for structured parsing and reconstruction of power communication demand text.
Background
In the planning, design, and operation and maintenance management of power communication systems, the demand document is the core carrier of business objectives, functional specifications and technical indicators, and its content is usually presented as unstructured natural language text. Such documents vary in structure, are dense with technical terms, and leave logical relationships implicit; the traditional approach of manual reading and collation is inefficient, extracts information incompletely, and is prone to subjective bias. In particular, when processing large-scale, multi-version demand documents, it is difficult to extract consistent equipment objects, performance parameters, business logic and constraints quickly and accurately by hand. As a result, information gaps and transmission loss arise between demand management and subsequent steps such as scheme design and resource allocation, which severely restricts the level of end-to-end digitization and automation in power communication engineering. To address these problems, the prior art has begun to explore automated text analysis using natural language processing techniques.
However, conventional methods rely on rule-based pattern matching or traditional machine learning models, which generalize poorly, adapt badly to the complex language and professional context of the power communication field, and cannot achieve deep semantic understanding or logical association across content. Although large language models show strong capability in general text understanding, applying them directly to the structured parsing of professional demand documents still faces two challenges. First, a single model struggles to accurately complete a series of hierarchical tasks such as page classification, key information positioning, domain object extraction and relation construction. Second, the business logic relations among different information entities are loose, and there is no effective mechanism to associate, align and integrate information about the same entity scattered across different paragraphs and pages, so the output is fragmented and cannot form a complete, consistent business view.
Disclosure of Invention
To overcome the deficiencies of the prior art, the invention provides a method and system for structured parsing and reconstruction of power communication demand text, which improve the depth of understanding of power communication professional text and the completeness of the structured output, provide a complete automated solution from document parsing to data generation for power communication demand management, and significantly improve the accuracy and efficiency of demand processing.
The invention adopts the following technical scheme.
The first aspect of the invention provides a method for structured parsing and reconstruction of power communication demand text, comprising the following steps: performing structural parsing on an original demand document and then paginating it; constructing a dual-branch multimodal deep learning model comprising a text branch and a visual branch, extracting text semantic features from each page of the document through the text branch, extracting page layout structure features of each page through the visual branch, and determining the content type and locating the key content of each page based on the text semantic features and the page layout structure features; constructing a prompt template based on the key content under each content type, so that a large language model extracts key elements and attribute features related to the key content based on the constructed prompt template; performing conflict detection by associating and comparing the key elements and attribute features and, when a conflict exists, constructing a new prompt based on all information corresponding to the key elements involved in the conflict, the conflict details, and domain knowledge, having the large language model perform semantic arbitration based on the new prompt, and outputting the final attribute feature value of the key elements in conflict; and mapping the structured demand data model into a standardized demand specification document to complete the reconstruction of the power communication demand te