
CN-121981224-A - Construction method of full life cycle knowledge base of energy equipment based on multi-source data

Abstract

The invention relates to the technical field of data processing and artificial intelligence, and in particular to a method for constructing a full life cycle knowledge base of energy equipment based on multi-source data. The method comprises: collecting multi-source data; classifying the multi-source data according to file format, calling a corresponding analysis model to extract text content, and outputting the corresponding text features and semantic content to form primary knowledge units; decomposing the primary knowledge units into knowledge elements and, in combination with knowledge graph technology, constructing hierarchical, causal, and cooperative relationships among the elements; performing multi-role collaborative evaluation on the generated knowledge units and screening high-quality knowledge units by comparing weighted comprehensive scores with a preset threshold; and classifying and integrating the screened high-quality knowledge units according to service links and application fields to construct the knowledge base. The method effectively addresses the cross-link information isolation found in the prior art and improves data management and knowledge management across the whole equipment manufacturing and operation and maintenance process.
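
As an illustration of the screening step described above, the following is a minimal sketch and not the patented implementation. It assumes three reviewer roles (document uploader, approver, and domain expert, as named in claim 9) that each score a knowledge unit in the range [0, 1]. The weights, the threshold of 0.7, and all names (`ROLE_WEIGHTS`, `THRESHOLD`, `KnowledgeUnit`, `weighted_score`, `screen`) are hypothetical, since the source gives no numeric values or data structures.

```python
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical role weights and threshold; the source does not state values.
ROLE_WEIGHTS: Dict[str, float] = {"uploader": 0.2, "approver": 0.3, "expert": 0.5}
THRESHOLD = 0.7


@dataclass
class KnowledgeUnit:
    unit_id: str
    scores: Dict[str, float]  # role name -> score in [0, 1]


def weighted_score(unit: KnowledgeUnit, weights: Dict[str, float] = ROLE_WEIGHTS) -> float:
    """Weighted average of the per-role scores, normalized by the total weight."""
    total_weight = sum(weights.values())
    weighted_sum = sum(weights[role] * unit.scores[role] for role in weights)
    return weighted_sum / total_weight


def screen(units: List[KnowledgeUnit], threshold: float = THRESHOLD) -> List[KnowledgeUnit]:
    """Keep only the units whose weighted comprehensive score reaches the threshold."""
    return [u for u in units if weighted_score(u) >= threshold]


# Example usage with made-up scores.
candidates = [
    KnowledgeUnit("ku-001", {"uploader": 0.9, "approver": 0.8, "expert": 0.8}),
    KnowledgeUnit("ku-002", {"uploader": 0.9, "approver": 0.5, "expert": 0.4}),
]
high_quality = screen(candidates)
```

Normalizing by the total weight keeps the score in [0, 1] even if the weights do not sum to one, so the threshold keeps the same meaning when weights are retuned.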

Inventors

  • WANG YUXIANG
  • HUANG LEI
  • ZHAO HANMING
  • LIU WENBO
  • DONG HUALI
  • ZHOU HONGLIN
  • LIAO HONG
  • ZHANG CHUAN

Assignees

  • 东方电气自动控制工程有限公司 (Dongfang Electric Autocontrol Engineering Co., Ltd.)

Dates

Publication Date
2026-05-05
Application Date
2026-01-12

Claims (10)

  1. A method for constructing a full life cycle knowledge base of energy equipment based on multi-source data, characterized by comprising the following steps: S1, collecting structured and unstructured multi-source data, preprocessing the multi-source data, and establishing a unified data storage model and a document index system; S2, based on the index system, jointly considering three core features, namely file format, content information, and metadata, classifying the multi-source data according to file format, calling a corresponding analysis model to extract text content, and outputting the corresponding text features and semantic content to form primary knowledge units; S3, decomposing the primary knowledge units into knowledge elements, and constructing hierarchical, causal, and cooperative relationships among the elements in combination with knowledge graph technology to achieve multidimensional knowledge fusion; S4, performing multi-role collaborative evaluation on the generated knowledge units, and screening high-quality knowledge units by comparing weighted comprehensive scores with a preset threshold; and S5, classifying and integrating the screened high-quality knowledge units according to service links and application fields, constructing a cross-domain, multi-level knowledge base, and eliminating duplicate and redundant content to support dynamic updating and expansion.
  2. The method for constructing a full life cycle knowledge base of energy equipment based on multi-source data according to claim 1, wherein the multi-source data comprises: expert knowledge documents, CAD/CAE drawing files, and simulation analysis reports of the design stage; technical rules and quality detection reports of the manufacturing stage; performance test records of the test stage; and fault maintenance sheets and sensor monitoring logs of the operation and maintenance stage.
  3. The method for constructing a full life cycle knowledge base of energy equipment based on multi-source data according to claim 1, wherein the preprocessing specifically comprises cleaning, formatting, and time-synchronizing the multi-source data using a data standardization method, supported by metadata management.
  4. The method for constructing a full life cycle knowledge base of energy equipment based on multi-source data according to claim 1, wherein step S2 comprises the following steps: S21, performing preliminary classification of the multi-source data according to file format using a rule matching mechanism, to obtain four types: text files, non-text pictures, CAD design drawings, and files that cannot be edited directly; S22, analyzing the file content with a multi-modal combination algorithm for fine-grained classification and identification, comprising: for the text files, classifying content with a text classification model based on a support vector machine; for the non-text pictures, classifying image content with a pre-trained AlexNet model; for the CAD design drawings, parsing metadata and layer information with the ezdxf tool library, taking the extracted metadata and layer information as core classification features, and inputting them into a support vector machine model to complete content classification of the CAD design drawings; and S23, parsing the classified file content and extracting core knowledge elements to form primary knowledge units.
  5. The method for constructing a full life cycle knowledge base of energy equipment based on multi-source data according to claim 4, wherein step S21 specifically comprises: reading the header information in the first several bytes of a file and performing rule matching against a preset 'header-type' mapping table, while also using the python-magic library for Python to call its built-in, mature header rule set, so as to automatically identify the physical type of the file and achieve the preliminary classification of the multi-source data.
  6. The method for constructing a full life cycle knowledge base of energy equipment based on multi-source data according to claim 1, characterized by further comprising the following steps: using python-docx during data preprocessing to uniformly convert texts of different formats into plain-text strings in UTF-8 encoding; filtering invalid characters from the text with regular expressions; segmenting the text into words, breaking long texts into analyzable word and phrase units; constructing feature vectors that meet the model's input requirements, with keyword frequency and semantic vectors at their core; and inputting the feature vectors into a trained SVM model, which outputs the class probabilities of the text content.
  7. The method for constructing a full life cycle knowledge base of energy equipment based on multi-source data according to claim 6, wherein constructing the feature vectors that meet the model's input requirements, with keyword frequency and semantic vectors at their core, comprises: calculating the weight of each word after word segmentation with the TF-IDF algorithm to generate a basic feature matrix; introducing a pre-trained Word2Vec model to map the word segmentation results into high-dimensional semantic vectors that capture semantic associations among words; concatenating the basic feature matrix and the semantic vectors column-wise to form high-dimensional combined features; and performing dimensionality reduction on the combined features to generate feature vectors that meet the model's input requirements.
  8. The method for constructing a full life cycle knowledge base of energy equipment based on multi-source data according to claim 1, wherein step S3 specifically comprises: decomposing the primary knowledge units into normalized knowledge elements; performing semantic labeling and version management; and constructing, with a graph neural network in combination with knowledge graph technology, a multi-stage, cross-domain knowledge association model covering design, manufacturing, operation and maintenance, and management.
  9. The method for constructing a full life cycle knowledge base of energy equipment based on multi-source data, wherein the multi-role collaborative evaluation specifically comprises combining the preliminary quality evaluation by the document uploader, the compliance evaluation by the approver, and the professional evaluation by a domain expert to construct a multi-source evaluation system.
  10. The method for constructing a full life cycle knowledge base of energy equipment based on multi-source data according to claim 1, wherein step S5 specifically comprises: based on the high-quality knowledge units screened in step S4 and the knowledge elements constructed in step S3 together with their association relations, performing semantic alignment according to preset cross-domain mapping rules and a conflict resolution mechanism; constructing a multi-layer index structure and a multi-dimensional label system based on the association relations of the knowledge graph; and forming a multi-level, multi-dimensional intelligent knowledge base of energy equipment covering a design optimization base, a manufacturing process base, an operation and maintenance guide base, and an operation management base.
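
The header-based preliminary classification in claim 5 might be sketched as follows. This is an illustrative sketch under stated assumptions, not the patented implementation: the table `HEADER_TYPE_TABLE`, the category names, the 32-byte read size, the decision to treat PDF as a file that cannot be edited directly, and the function names `classify_header` and `classify_file` are all hypothetical. The byte signatures themselves are well-known file magic numbers. The sketch uses only the standard library; the python-magic rule set that the claim also names is not used here.

```python
import codecs

# Hypothetical "header-type" mapping table (magic bytes -> coarse category).
# Category names and the PDF -> "non_editable" choice are assumptions.
HEADER_TYPE_TABLE = [
    (b"%PDF-", "non_editable"),          # PDF document
    (b"\x89PNG\r\n\x1a\n", "picture"),   # PNG image
    (b"\xff\xd8\xff", "picture"),        # JPEG image
    (b"AC10", "cad"),                    # DWG drawing (AC10xx version tag)
    (b"AutoCAD Binary DXF", "cad"),      # binary DXF sentinel
]

HEADER_BYTES = 32  # number of leading bytes to inspect (assumed value)


def classify_header(head: bytes) -> str:
    """Map the leading bytes of a file to a coarse category."""
    if not head:
        return "unknown"
    for signature, category in HEADER_TYPE_TABLE:
        if head.startswith(signature):
            return category
    try:
        # The incremental decoder tolerates a multi-byte character that was
        # cut off at the end of the header, but still rejects invalid bytes.
        decoder = codecs.getincrementaldecoder("utf-8")()
        text = decoder.decode(head, final=False)
    except UnicodeDecodeError:
        return "unknown"
    # Heuristic: ASCII DXF files usually open with group code 0 and SECTION.
    if text.split()[:2] == ["0", "SECTION"]:
        return "cad"
    return "text"


def classify_file(path: str) -> str:
    """Read only the file header and classify it."""
    with open(path, "rb") as fh:
        return classify_header(fh.read(HEADER_BYTES))
```

Checking explicit signatures before falling back to a UTF-8 test keeps binary formats from being mistaken for text. A production system would still need the finer content-based classification of step S22, because a header alone cannot distinguish, for example, a design specification from a maintenance log.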

Description

Construction method of full life cycle knowledge base of energy equipment based on multi-source data

Technical Field

The invention relates to the technical field of data processing and artificial intelligence, and in particular to a method for constructing a full life cycle knowledge base of energy equipment based on multi-source data.

Background

Energy equipment manufacturing is currently undergoing a deep digital and intelligent transformation. The industry has accumulated a large volume of multi-source heterogeneous information resources, including design drawings, manufacturing process specifications, test and inspection records, and operation, maintenance, and repair data. However, existing knowledge base construction relies mainly on manual curation and static document storage, and has several limitations:

(1) Knowledge updates lag. As new technologies and advanced operation and maintenance experience continue to emerge, existing knowledge bases cannot incorporate the latest knowledge in real time or update quickly. Knowledge maintenance depends heavily on manual intervention, so the knowledge system struggles to reflect the current state of actual production and operation and maintenance in a timely manner.

(2) The knowledge island effect is prominent. Information from the design, manufacturing, inspection, and operation and maintenance links often resides in relatively independent systems and lacks effective cross-domain association and integration. Knowledge resources are therefore scattered and built redundantly, and a complete full life cycle knowledge network is difficult to form.

(3) Intelligent reasoning and dynamic matching capabilities are insufficient. Faced with complex and changeable production, manufacturing, and operation and maintenance decision scenarios, a traditional static knowledge base lacks context-based intelligent reasoning and dynamic knowledge matching, and struggles to meet real-time, personalized knowledge service requirements.

(4) Knowledge representation lacks structure and computability. A large amount of key knowledge exists as unstructured text or two-dimensional design drawings, without a unified, computable representation system, which limits automatic machine reasoning, knowledge mining, and intelligent retrieval.

Several knowledge base construction schemes have been proposed in the prior art, for example in the Chinese invention patent documents with publication numbers CN119809387A and CN118014072A. These schemes generally lack the ability to effectively extract and fuse multi-modal knowledge such as design drawings and operation and maintenance logs, and struggle to support automatic identification and extraction of entities and their semantic relations from multi-source heterogeneous data such as CAD files and sensor data. As a result, information is fragmented across business links such as design, manufacturing, and operation and maintenance, forming cross-link information islands.

Disclosure of Invention

To solve the above technical problems, the invention provides a method for constructing a full life cycle knowledge base of energy equipment based on multi-source data, which effectively addresses the cross-link information isolation found in the prior art and improves data management and knowledge management across the whole equipment manufacturing and operation and maintenance process.

The invention is realized by adopting the following technical scheme:

A method for constructing a full life cycle knowledge base of energy equipment based on multi-source data comprises the following steps:

S1, collecting structured and unstructured multi-source data, preprocessing the multi-source data, and establishing a unified data storage model and a document index system;

S2, based on the index system, jointly considering three core features, namely file format, content information, and metadata, classifying the multi-source data according to file format, calling a corresponding analysis model to extract text content, and outputting the corresponding text features and semantic content to form primary knowledge units;

S3, decomposing the primary knowledge units into knowledge elements, and constructing hierarchical, causal, and cooperative relationships among the elements in combination with knowledge graph technology to achieve multidimensional knowledge fusion;

S4, performing multi-role collaborative evaluation on the generated knowledge units, and screening high-quality knowledge units by comparing weighted comprehensive scores with a preset threshold;

S5, classifying and integrating the screened high-quality knowledge units ac