CN-122024956-A - Polyolefin material-oriented synthesis-structure-performance correlation knowledge graph construction method and application system

Abstract

The invention discloses a method for constructing a synthesis-structure-performance (SSP) correlation knowledge graph for polyolefin materials, and an application system. To address the unstructured nature of technical data sheets, patents and literature in the polyolefin domain and the large, complex range of product grades, the method constructs a domain ontology layer comprising products, processes, monomers, performances, applications, standards, additives and structures; uses a large language model with domain constraint rules to perform structured extraction and entity alignment on multi-source heterogeneous data, in particular achieving high-precision mapping for process identification, monomers, performances and structural analysis; and finally builds, in a graph database, a knowledge graph with SSP logic at its core. Based on this graph, the application system uses multi-hop path reasoning and a sub-graph matching algorithm to provide performance analysis, material recommendation and new material design assistance for polyolefin grades.

Inventors

  • HONG XIAODONG
  • REN CONGJING
  • LIAO ZUWEI
  • JIA TINGHAO
  • YANG YAO
  • YANG YONGRONG

Assignees

  • ZJU-Hangzhou Global Scientific and Technological Innovation Center (浙江大学杭州国际科创中心)

Dates

Publication Date
2026-05-12
Application Date
2026-01-23

Claims (10)

  1. A method for constructing a synthesis-structure-performance correlation knowledge graph for polyolefin materials, characterized by comprising the following steps:
     S1, domain ontology construction: defining a polyolefin-specific synthesis-structure-performance ontology model, wherein the ontology model comprises at least product nodes, process nodes, monomer nodes, additive nodes, performance nodes, application nodes, standard nodes and structure nodes;
     S2, multi-source data acquisition and preprocessing: acquiring technical data sheets, patent texts and literature data for polyolefin materials, and performing text cleaning, noise removal and format conversion to obtain preprocessed unstructured text;
     S3, entity and relationship extraction based on constraint rules: using a pre-trained large language model combined with constraint rules for the polyolefin domain to extract entities and their attributes from the preprocessed unstructured text, and to identify the association relationships among the entities and the attributes of those relationships, wherein the constraint rules comprise at least product identification rules, process identification rules, monomer extraction rules, additive extraction rules, performance extraction rules, application extraction rules, standard extraction rules and structure analysis rules;
     S4, multi-granularity entity alignment and normalization: performing standardized mapping on the extracted entities, wherein the standardized mapping comprises normalizing product names, monomer names, process names, additive names, application names, standard names, performance names and structure names so as to eliminate synonyms, abbreviations and ambiguity, and the multi-granularity comprises coarse-grained distance calculation and fine-grained semantic matching; and
     S5, graph instantiation and storage: storing the aligned entities and relationships in a graph database to form a polyolefin-domain knowledge graph supporting multi-hop reasoning, wherein the graph database supports node indexing and relationship query optimization.
  2. The method according to claim 1, wherein in step S1: the product node comprises name, manufacturer and type attributes; the process node comprises name, reactor type and catalyst system attributes; the monomer node comprises name and chemical formula attributes; the additive node comprises additive name and type attributes; the performance node comprises name, unit and test standard attributes; the application node comprises name and industry attributes; the standard node comprises name and grade attributes; the structure node comprises microstructure characterization name and type attributes; and the synthesis-structure-performance ontology model represents specific data by storing quantified attributes on the relationships between entities, wherein the product node and the process node are connected through a 'produced by' relationship whose attributes record the process operating conditions; the product node and the monomer node are connected through a 'polymerized from' relationship whose attributes record the comonomer amount or proportion; the product node and the additive node are connected through a 'contains component' relationship whose attributes record the additive amount; the product node and the performance node are connected through a 'has performance' relationship whose attributes record the specific test data; the product node and the structure node are connected through a 'has microstructure' relationship whose attributes include vectorized molecular weight distribution data and vectorized chemical composition distribution data; the product node is connected with the application node through an 'applied to' relationship; and the performance node is connected with the standard node through a 'tested by' relationship.
  3. The method for constructing a synthesis-structure-performance correlation knowledge graph for polyolefin materials according to claim 1, wherein in step S2 the format conversion includes converting a PDF file into parseable text.
  4. The method for constructing a synthesis-structure-performance correlation knowledge graph for polyolefin materials according to claim 1, wherein in step S3:
     the association relationships comprise at least 'produced by', 'polymerized from', 'contains component', 'has performance', 'applied to', 'has microstructure' and 'tested by';
     extracting the entities and their attributes and identifying the association relationships among the entities specifically comprises: extracting process condition parameters from the text and mapping them to relationship attributes; extracting monomer proportion data and mapping them to relationship attributes; extracting additive amount data and mapping them to relationship attributes; extracting performance test results and mapping them to relationship attributes, wherein the test results comprise numerical values, curves and pictures; extracting or converting microscopic distribution curve data and mapping them to vectorized relationship attributes; and extracting application suitability and mapping it to relationship attributes;
     the pre-trained large language model uses few-shot learning or prompt engineering to improve its adaptation to the domain;
     the product identification rules are used for identifying commercial grades and manufacturer entities in technical data sheets, and experimental sample identifiers or synthesized material numbers in literature text, and for filtering out generic polyolefin nouns that do not refer to a specific product;
     the process identification rules are used for mapping keywords for the reactor type and catalyst system, and the process operating conditions;
     the monomer extraction rules are used for identifying and resolving monomer and comonomer abbreviations and the composition of the monomers and comonomers in the polymer;
     the performance extraction rules are used for separating result-unit-test condition triplets from unstructured descriptions and associating them with test standards;
     the application extraction rules are used for extracting the processing methods and end-use application fields of materials from the text, mapping unstructured application descriptions to standard industry classifications, and obtaining suitability evaluation values through semantic analysis;
     the additive extraction rules are used for identifying the types, chemical names and specific addition amounts or proportions of additives;
     the structure analysis rules are used for extracting characteristic parameters or curve feature data of the molecular weight distribution and chemical composition distribution from descriptive text or data tables, and for providing structured input for the standardized representation; and
     the standard extraction rules are used for identifying test standard numbers and version information.
  5. The method for constructing a synthesis-structure-performance correlation knowledge graph for polyolefin materials according to claim 1, wherein in step S4 the coarse-grained distance calculation uses a string similarity algorithm, the fine-grained semantic matching uses an embedding model to calculate contextual similarity, and the normalization uses a predefined dictionary or a chemical database to standardize monomer names.
  6. The method for constructing a synthesis-structure-performance correlation knowledge graph for polyolefin materials according to claim 1, wherein in step S5 the graph database is Neo4j or another graph database supporting the Cypher query language, and the storing process comprises creating nodes and relationships, using indexes to optimize frequently queried attributes, and verifying the integrity of the graph through SPARQL or an equivalent query to ensure that there are no isolated nodes and that the relationships are consistent.
  7. The method for constructing a synthesis-structure-performance correlation knowledge graph for polyolefin materials according to any one of claims 1 to 6, further comprising the step of: S6, knowledge graph verification and updating: verifying the quality of the graph by calculating the entity coverage rate and the relationship accuracy rate, and supporting incremental data import to update the graph dynamically.
  8. A synthesis-structure-performance correlation knowledge graph application system for polyolefin materials, characterized by comprising:
     a knowledge graph construction module for executing the method for constructing a synthesis-structure-performance correlation knowledge graph for polyolefin materials according to any one of claims 1 to 7, so as to construct the synthesis-structure-performance correlation knowledge graph;
     a reasoning engine module for executing multi-hop path reasoning and a sub-graph matching algorithm on the constructed synthesis-structure-performance correlation knowledge graph, wherein the multi-hop path reasoning uses Cypher queries to traverse the associations among entities, and the sub-graph matching algorithm uses GraphSAGE or another graph neural network to calculate similarity;
     a user interaction module comprising a front-end interface and a back-end service, for receiving user query input and outputting visualized results; and
     a recommendation module for providing, based on the reasoning engine module, performance-based matching of competing polyolefin grades, application recommendation and assisted design of new material formulations, wherein the assisted design of new material formulations derives combinations of process nodes and monomer nodes from a target performance node through reverse reasoning.
  9. The synthesis-structure-performance correlation knowledge graph application system for polyolefin materials according to claim 8, wherein the reasoning engine module comprises a machine learning component that uses a graph neural network to enhance the sub-graph similarity calculation so as to support matching of grades with similar performance; the back-end service of the user interaction module uses a RESTful API, supports cloud deployment and data encryption, and ensures a query response time of no more than 5 seconds; and when performing assisted design of new material formulations, the recommendation module uses a constraint optimization algorithm to generate feasible combinations of process parameters and outputs a confidence score.
  10. Use of the method for constructing a synthesis-structure-performance correlation knowledge graph for polyolefin materials according to any one of claims 1 to 7, or of the synthesis-structure-performance correlation knowledge graph application system for polyolefin materials according to claim 8 or 9, for performance analysis, material recommendation and new material design assistance for polyolefin materials.
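
The two-stage alignment of claim 5 (coarse string similarity, then fine-grained semantic matching, backed by a predefined dictionary) might be sketched as follows. This is only an illustration under stated assumptions: the alias dictionary, the cutoff values and the function names are hypothetical, and the character-bigram `embed` function is a stand-in for the embedding model the claim refers to, not the one the applicants use.

```python
from collections import Counter
from difflib import SequenceMatcher
from math import sqrt

# Hypothetical alias dictionary standing in for the "predefined dictionary".
CANONICAL = {
    "ethylene": ["c2", "ethene"],
    "propylene": ["c3", "propene"],
    "1-butene": ["c4", "butene-1"],
    "1-hexene": ["c6", "hexene-1"],
}


def coarse_score(a, b):
    """Coarse-grained stage: character-level string similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def embed(text):
    """Placeholder embedding (assumption): character-bigram counts."""
    t = text.lower()
    return Counter(t[i:i + 2] for i in range(len(t) - 1))


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm = sqrt(sum(x * x for x in u.values())) * sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0


def normalize_monomer(mention, context="", coarse_cutoff=0.5, fine_cutoff=0.3):
    """Map a raw monomer mention to a canonical name, or None if unmatched."""
    m = mention.strip().lower()
    # Exact dictionary hits (canonical names and known aliases) win outright.
    for canon, aliases in CANONICAL.items():
        if m == canon or m in aliases:
            return canon
    # Coarse stage: shortlist canonical names that look similar as strings.
    shortlist = [c for c in CANONICAL if coarse_score(m, c) >= coarse_cutoff]
    if not shortlist:
        return None
    # Fine stage: rerank the shortlist by similarity of mention plus context.
    query = embed((mention + " " + context).strip())
    best = max(shortlist, key=lambda c: cosine(query, embed(c)))
    return best if cosine(query, embed(best)) >= fine_cutoff else None


print(normalize_monomer("C4"))       # alias resolved through the dictionary
print(normalize_monomer("1-hexen"))  # misspelling resolved by the two stages
```

Running the cheap string comparison first keeps the more expensive embedding comparison to a short candidate list; a real embedding model would replace `embed` without changing the control flow.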

Description

Polyolefin material-oriented synthesis-structure-performance correlation knowledge graph construction method and application system

Technical Field

The invention relates to the fields of materials science, knowledge engineering and artificial intelligence, and in particular to a method for constructing a polyolefin synthesis-structure-performance (SSP) correlation knowledge graph using a large language model and graph database technology, and to an application system thereof.

Background

Polyolefins (such as PE and PP) are the polymer materials with the largest global output and the widest range of applications, and play an irreplaceable role in key fields such as packaging, automobiles, construction and medical care. As downstream industries increasingly demand customized and high-end material properties, polyolefin development and production is moving rapidly from the traditional trial-and-error, experience-driven mode to a data-driven one.

However, the polyolefin field currently faces serious challenges in data management and knowledge utilization. On the one hand, the data are multi-source and heterogeneous and their utilization rate is low. The core technical data are scattered across technical data sheets (TDS), patent specifications and academic literature, usually as PDF or unstructured text, and contain a large amount of information ranging from the synthesis process and monomer composition to the microstructure, macroscopic performance and application scenarios. Because there is no unified organizational standard, retrieval is difficult and effective correlation analysis is hard to carry out. On the other hand, the product grades are numerous and the logic linking them is deep. There are many polyolefin grades, and the coupling between their synthesis, structure and performance is highly nonlinear.
For example, the choice of polymerization process (e.g., gas-phase process, slurry process) and monomer type (e.g., 1-butene, 1-hexene) directly determines the molecular chain structure and thus final properties such as tensile strength and impact strength. However, existing data are often isolated in silos, and there is no inference model that can connect this cross-source knowledge and support performance targeting, substitution of competing products, or reverse design of new materials.

To solve these problems, academia and industry have gradually introduced knowledge engineering and artificial intelligence techniques with the aim of constructing knowledge graphs for the materials field. A knowledge graph is a structured semantic knowledge base that builds an interconnected data network from entities, attributes and relationships and supports complex semantic queries and reasoning. In general materials science, research has used ontology models to define the composition-structure-performance relationships of materials and has combined them with natural language processing (NLP) to extract material information automatically from large volumes of literature, achieving a preliminary construction of a materials database (CN 114238663B).

In knowledge extraction, with the development of deep learning, large language models (LLMs) perform well on information extraction tasks. A pre-trained model combined with domain prompt engineering can extract entities and relationships efficiently from unstructured text, which provides a technical path for constructing high-quality knowledge graphs.
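
The cross-source query that the background describes, going from a target property back to the process and monomer behind it, can be illustrated with a minimal in-memory sketch. It is not the applicants' implementation: the `TinyGraph` class, the relation labels, the grade names and all numeric values are invented for illustration, and a real system would use a graph database rather than a Python dictionary.

```python
from collections import defaultdict


class TinyGraph:
    """Toy property graph: edges carry attributes, as in an SSP-style model."""

    def __init__(self):
        self.edges = defaultdict(list)  # source -> [(relation, target, attrs)]

    def add(self, src, rel, dst, **attrs):
        self.edges[src].append((rel, dst, attrs))

    def out(self, src, rel):
        """Return (target, attrs) pairs for one relation type from a node."""
        return [(dst, attrs) for r, dst, attrs in self.edges[src] if r == rel]


g = TinyGraph()
# All grade names and values below are made up for the example.
g.add("Grade-A", "PRODUCED_BY", "gas-phase process", reactor="fluidized bed")
g.add("Grade-A", "POLYMERIZED_FROM", "1-hexene", comonomer_wt_pct=6.0)
g.add("Grade-A", "HAS_PERFORMANCE", "tensile strength", value=25.0, unit="MPa")
g.add("Grade-B", "PRODUCED_BY", "slurry process", reactor="loop")
g.add("Grade-B", "POLYMERIZED_FROM", "1-butene", comonomer_wt_pct=3.0)
g.add("Grade-B", "HAS_PERFORMANCE", "tensile strength", value=18.0, unit="MPa")


def reverse_lookup(graph, prop, min_value):
    """Two-hop reverse query: property -> grade -> (processes, monomers)."""
    hits = []
    for grade in list(graph.edges):
        for name, attrs in graph.out(grade, "HAS_PERFORMANCE"):
            if name == prop and attrs["value"] >= min_value:
                processes = [d for d, _ in graph.out(grade, "PRODUCED_BY")]
                monomers = [d for d, _ in graph.out(grade, "POLYMERIZED_FROM")]
                hits.append((grade, processes, monomers))
    return hits


print(reverse_lookup(g, "tensile strength", 20.0))
```

Storing the measured value on the edge rather than on the property node lets one shared "tensile strength" node serve every grade, which is what makes the reverse traversal possible.
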
In the polyolefin sub-field, related work on intelligent methods has also begun. Academia and industry have attempted to build material knowledge bases to support performance prediction, for example by using machine learning algorithms to associate catalyst systems with polymer performance and establish preliminary structure-activity relationship models. In general, the prior art, from ontology construction and data extraction to preliminary applications, lays a foundation for the intelligent management of polyolefins.

Despite these advances, the prior art has significant limitations with respect to the deeper needs of polyolefin research and development:

(1) Lack of domain-specific SSP ontology logic. Existing general-purpose materials knowledge graphs struggle to represent accurately the multidimensional associations specific to polyolefins, in particular the deep coupling between the 'synthesis layer' (such as reactor type, catalyst system and initiator), the 'structure layer' (such as main monomer, comonomer type and proportion) and the 'performance layer' (such as numerical values under different test standards and conditions).

(2) Insufficient entity extraction precision and normalization. The polyolefin field has a large number of synonyms, abbreviations (for example, C4/C6 referring to monomers) and complex process descriptions, and traditional NLP methods are very easily disturbed by contextual noise, so that extraction errors ar