CN-121979940-A - Method and system for automatic generation and management of standardized OT data based on a large model
Abstract
The invention provides a method and a system for automatically generating standardized OT data based on a large model. The method comprises the following steps: S1, constructing an OT metadata ontology library into an OT term vector space according to standard regularized metadata meeting preset requirements; S2, mapping a natural language instruction to the OT term vector space through the large model to obtain standard structured OT data; and S3, performing quality verification and enhancement processing on the standard structured OT data. The method improves data consistency, resolves naming conflicts during integration through a unified standard specification, markedly enhances data interoperability, substantially improves efficiency, reduces the workload of data cleaning, shifts data integration from traditional project-by-project manual work to a product-grade process, and provides high-quality training data as a foundation for large-scale industrial deployment.
Inventors
- Guo Yinchen
- Wu Yiping
- Wu Haoyang
- Feng Juanyong
- Wu Yuecong
- Dai Zhenhu
Assignees
- Shanghai Baosight Software Co., Ltd. (上海宝信软件股份有限公司)
Dates
- Publication Date
- 2026-05-05
- Application Date
- 2026-01-15
Claims (10)
- 1. A method for automatically generating and managing standardized OT data based on a large model, characterized by comprising the following steps: S1, constructing an OT metadata ontology library into an OT term vector space according to standard regularized metadata meeting preset requirements; S2, mapping a natural language instruction to the OT term vector space through a large model to obtain standard structured OT data; and S3, performing quality verification and enhancement processing on the standard structured OT data.
- 2. The method for automatically generating standardized OT data based on a large model according to claim 1, wherein the step S1 comprises: S1.1, obtaining natural language descriptions of the activities, states and events in an OT system; S1.2, parsing the natural language descriptions through semantic role labeling based on a domain-adaptive pre-trained model to extract element tuples meeting preset requirements, and establishing a term relationship topology with an ontology reasoning engine to form an OT metadata ontology library; S1.3, identifying conflicting terms with a Transformer-based term alignment model constructed from the standard regularized metadata, and reconciling the conflicting terms with a preset expert decision matrix to form a heterogeneous-document knowledge graph; and S1.4, compiling natural language specifications into multimodal machine instructions through computable rule compilation, which comprises encoding event-sequence rules with temporal logic expressions, defining data field constraints with a lightweight JSON Schema, and constructing the OT term vector space to realize concept-level semantic indexing.
- 3. The method for automatically generating standardized OT data based on a large model according to claim 1, wherein the step S2 further comprises training the large model so that it maps natural language instructions to standard structured OT data based on the OT term vector space; the large model is built on DeBERTa-v3-Large and pre-trained across multiple modalities, including time series, images and text, realizing cross-modal alignment of data-semantic metadata and spatio-temporal context metadata.
- 4. The method for automatically generating standardized OT data based on a large model according to claim 1, wherein the step S2 comprises: S2.1, deconstructing the user's natural language input with an instruction parse tree, and performing multi-granularity semantic matching against the OT term vector space to generate a candidate set of standardized OT data entities; S2.2, feeding the candidate set into a differential-algebraic-equation solver, performing physical plausibility verification on the candidate entity values, and screening the entities that satisfy preset constraints; S2.3, performing a system-level consistency check on the screened entities through a graph-neural-network propagation layer; and S2.4, performing edge-side JSON Schema format verification and ontology logic rule verification on the entities that passed the consistency check in a lightweight rule-verification container, and outputting the final standard structured OT data.
- 5. The method for automatically generating standardized OT data based on a large model according to claim 1, wherein the step S3 comprises: S3.1, verifying event-link compliance with a temporal-logic model checker, and detecting physical-rule anomalies with an anomaly diagnosis model deployed on a device causal graph; S3.2, computing a multimodal confidence score C_comprehensive = δ·C_model + (1-δ)·C_expert for each generated data point, wherein δ is a weight, C_model is the model evaluation score, and C_expert is the expert committee score; when the confidence falls below a threshold, triggering an event-sequence compression algorithm to extract the temporal pattern and associating expert-evaluation annotation text to construct a multimodal fault case; and S3.3, adding adaptive Laplace noise to aggregated query results to realize data enhancement.
- 6. A system for automatically generating and managing standardized OT data based on a large model, comprising: a module M1, constructing an OT metadata ontology library into an OT term vector space according to standard regularized metadata meeting preset requirements; a module M2, mapping natural language instructions to the OT term vector space through a large model to obtain standard structured OT data; and a module M3, performing quality verification and enhancement processing on the standard structured OT data.
- 7. The system for automatically generating standardized OT data based on a large model according to claim 6, wherein the module M1 comprises: a module M1.1, obtaining natural language descriptions of the activities, states and events in the OT system; a module M1.2, parsing the natural language descriptions through semantic role labeling based on a domain-adaptive pre-trained model to extract element tuples meeting preset requirements, and establishing a term relationship topology with an ontology reasoning engine to form an OT metadata ontology library; a module M1.3, identifying conflicting terms with a Transformer-based term alignment model constructed from the standard regularized metadata, and reconciling the conflicting terms with a preset expert decision matrix to form a heterogeneous-document knowledge graph; and a module M1.4, compiling natural language specifications into multimodal machine instructions through computable rule compilation, which comprises encoding event-sequence rules with temporal logic expressions, defining data field constraints with a lightweight JSON Schema, and constructing the OT term vector space to realize concept-level semantic indexing.
- 8. The system for automatically generating standardized OT data based on a large model according to claim 6, wherein the module M2 further comprises training the large model so that it maps natural language instructions to standard structured OT data based on the OT term vector space; the large model is built on DeBERTa-v3-Large and pre-trained across multiple modalities, including time series, images and text, realizing cross-modal alignment of data-semantic metadata and spatio-temporal context metadata.
- 9. The system for automatically generating standardized OT data based on a large model according to claim 6, wherein the module M2 comprises: a module M2.1, deconstructing the user's natural language input with an instruction parse tree, and performing multi-granularity semantic matching against the OT term vector space to generate a candidate set of standardized OT data entities; a module M2.2, feeding the candidate set into a differential-algebraic-equation solver, performing physical plausibility verification on the candidate entity values, and screening the entities that satisfy preset constraints; a module M2.3, performing a system-level consistency check on the screened entities through a graph-neural-network propagation layer; and a module M2.4, performing edge-side JSON Schema format verification and ontology logic rule verification on the entities that passed the consistency check in a lightweight rule-verification container, and outputting the final standard structured OT data.
- 10. The system for automatically generating standardized OT data based on a large model according to claim 6, wherein the module M3 comprises: a module M3.1, verifying event-link compliance with a temporal-logic model checker, and detecting physical-rule anomalies with an anomaly diagnosis model deployed on the device causal graph; a module M3.2, computing a multimodal confidence score C_comprehensive = δ·C_model + (1-δ)·C_expert for each generated data point, wherein δ is a weight, C_model is the model evaluation score, and C_expert is the expert committee score; when the confidence falls below a threshold, triggering an event-sequence compression algorithm to extract the temporal pattern and associating expert-evaluation annotation text to construct multimodal fault cases; and a module M3.3, adding adaptive Laplace noise to aggregated query results to realize data enhancement.
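The confidence fusion in claims 5 and 10 combines a model evaluation score and an expert committee score with a weight δ, and triggers the fault-case workflow below a threshold. A minimal sketch (the function names, the default δ, and the threshold value are illustrative assumptions, not values given in the patent):

```python
def fuse_confidence(c_model: float, c_expert: float, delta: float = 0.6) -> float:
    """Weighted fusion: C_comprehensive = delta * C_model + (1 - delta) * C_expert."""
    if not 0.0 <= delta <= 1.0:
        raise ValueError("delta must lie in [0, 1]")
    return delta * c_model + (1 - delta) * c_expert


def needs_fault_case(c_model: float, c_expert: float,
                     delta: float = 0.6, threshold: float = 0.7) -> bool:
    """Trigger fault-case construction when fused confidence falls below the threshold."""
    return fuse_confidence(c_model, c_expert, delta) < threshold
```

For example, `fuse_confidence(0.9, 0.5, delta=0.5)` yields 0.7, exactly at a 0.7 threshold, so no fault case is triggered.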
Description
Method and system for automatic generation and management of standardized OT data based on a large model

Technical Field

The invention relates to the technical field of data management, in particular to a method and a system for automatically generating standardized OT data based on a large model.

Background

In industry, when using an extensible and configurable industrial OT data management tool, data governance personnel must manually enter and curate data into the appropriate standard form according to their own expertise. This process is not only time-consuming and laborious but also error-prone. Because governance personnel differ in expertise and experience, data entry standards are inconsistent, which degrades data quality and usability. In addition, the data approval process is cumbersome: data can be released and used only after approval by multiple levels of management. This prolongs data release, reduces the timeliness of the data, and fails to meet the needs of production and operation in time.

Disclosure of Invention

To address the defects of the prior art, the invention aims to provide a method and a system for automatically generating standardized OT data based on a large model. The method comprises the following steps: S1, constructing an OT metadata ontology library into an OT term vector space according to standard regularized metadata meeting preset requirements; S2, mapping a natural language instruction to the OT term vector space through a large model to obtain standard structured OT data; and S3, performing quality verification and enhancement processing on the standard structured OT data.
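The S1/S2 pipeline above can be illustrated with a toy term vector space: standardized terms are embedded, and a natural language instruction is mapped to its nearest term. The bag-of-words embedding, cosine similarity, and the two example term names below are simplified stand-ins for the patent's large-model embeddings, not its actual implementation:

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy bag-of-words embedding standing in for learned term vectors."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


# Hypothetical OT term vector space: standardized term -> embedding.
TERM_SPACE = {
    "furnace_temperature": embed("furnace temperature reading celsius"),
    "conveyor_speed": embed("conveyor belt speed meters per second"),
}


def map_instruction(instruction: str) -> str:
    """Map a natural-language instruction to the nearest standardized OT term."""
    vec = embed(instruction)
    return max(TERM_SPACE, key=lambda term: cosine(vec, TERM_SPACE[term]))
```

With this toy space, an instruction such as "log the furnace temperature" resolves to the standardized term `furnace_temperature`.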
Preferably, the step S1 includes: S1.1, obtaining natural language descriptions of the activities, states and events in the OT system; S1.2, parsing the natural language descriptions through semantic role labeling based on a domain-adaptive pre-trained model to extract element tuples meeting preset requirements, and establishing a term relationship topology with an ontology reasoning engine to form an OT metadata ontology library; S1.3, identifying conflicting terms with a Transformer-based term alignment model constructed from the standard regularized metadata, and reconciling the conflicting terms with a preset expert decision matrix to form a heterogeneous-document knowledge graph; and S1.4, compiling natural language specifications into multimodal machine instructions through computable rule compilation, which comprises encoding event-sequence rules with temporal logic expressions, defining data field constraints with a lightweight JSON Schema, and constructing the OT term vector space to realize concept-level semantic indexing.

Preferably, the step S2 further comprises training the large model so that it maps natural language instructions to standard structured OT data based on the OT term vector space; the large model is built on DeBERTa-v3-Large and pre-trained across multiple modalities, including time series, images and text, realizing cross-modal alignment of data-semantic metadata and spatio-temporal context metadata.
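Step S1.4's lightweight JSON Schema field constraints can be sketched as follows. The schema fields and the tiny validator are illustrative assumptions (the patent gives no concrete schema), and the validator covers only the type/range/enum subset of JSON Schema rather than the full specification:

```python
# A lightweight JSON-Schema-style field constraint for one OT data point.
# Field names, bounds, and units are hypothetical examples.
POINT_SCHEMA = {
    "type": "object",
    "required": ["tag", "value", "unit"],
    "properties": {
        "tag": {"type": "string"},
        "value": {"type": "number", "minimum": 0, "maximum": 2000},
        "unit": {"type": "string", "enum": ["C", "K"]},
    },
}


def validate(record: dict, schema: dict) -> list:
    """Check a record against a tiny subset of JSON Schema; return violations."""
    errors = []
    for field in schema.get("required", []):
        if field not in record:
            errors.append(f"missing field: {field}")
    for field, rule in schema.get("properties", {}).items():
        if field not in record:
            continue
        val = record[field]
        if rule.get("type") == "number" and not isinstance(val, (int, float)):
            errors.append(f"{field}: expected number")
        elif rule.get("type") == "string" and not isinstance(val, str):
            errors.append(f"{field}: expected string")
        if "minimum" in rule and isinstance(val, (int, float)) and val < rule["minimum"]:
            errors.append(f"{field}: below minimum {rule['minimum']}")
        if "maximum" in rule and isinstance(val, (int, float)) and val > rule["maximum"]:
            errors.append(f"{field}: above maximum {rule['maximum']}")
        if "enum" in rule and val not in rule["enum"]:
            errors.append(f"{field}: not in {rule['enum']}")
    return errors
```

A conforming record such as `{"tag": "T1", "value": 850, "unit": "C"}` yields no violations; an out-of-range value or unknown unit is flagged, which is the edge-side format check described in step S2.4.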
Preferably, the step S2 includes: S2.1, deconstructing the user's natural language input with an instruction parse tree, and performing multi-granularity semantic matching against the OT term vector space to generate a candidate set of standardized OT data entities; S2.2, feeding the candidate set into a differential-algebraic-equation solver, performing physical plausibility verification on the candidate entity values, and screening the entities that satisfy preset constraints; S2.3, performing a system-level consistency check on the screened entities through a graph-neural-network propagation layer; and S2.4, performing edge-side JSON Schema format verification and ontology logic rule verification on the entities that passed the consistency check in a lightweight rule-verification container, and outputting the final standard structured OT data.

Preferably, the step S3 includes: S3.1, verifying event-link compliance with a temporal-logic model checker, and detecting physical-rule anomalies with an anomaly diagnosis model deployed on a device causal graph; S3.2, computing a multimodal confidence score C_comprehensive = δ·C_model + (1-δ)·C_expert for each generated data point, wherein δ is a weight, C_model is the model evaluation score, and C_expert is the expert committee score; when the confidence falls below a threshold, triggering an event-sequence compression algorithm to extract the temporal pattern and associating expert-evaluation annotation text to construct a multimodal fault case; and S3.3, adding adaptive Laplace noise to aggregated query results to realize data enhancement.
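Step S3.3's enhancement can be sketched as Laplace-noise addition on an aggregated query. The patent does not specify how the noise scale adapts, so the sketch below assumes the standard differential-privacy calibration, scale = sensitivity / ε; the function names and the sum query are illustrative:

```python
import math
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Draw one sample from Laplace(0, scale) via inverse-CDF sampling."""
    u = rng.random() - 0.5  # u in (-0.5, 0.5)
    sign = 1.0 if u >= 0 else -1.0
    return -scale * sign * math.log(1.0 - 2.0 * abs(u))


def noisy_aggregate(values, epsilon: float = 1.0, sensitivity: float = 1.0,
                    seed: int = 42) -> float:
    """Add Laplace noise to a sum query; scale adapts as sensitivity / epsilon,
    the usual differential-privacy calibration (an assumption here)."""
    rng = random.Random(seed)
    return sum(values) + laplace_noise(sensitivity / epsilon, rng)
```

A smaller ε (stronger privacy) yields a larger noise scale, so the trade-off between data utility and protection is controlled by a single parameter.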