CN-122021704-A - Industrial data processing method, equipment, medium and product based on large model
Abstract
The application relates to the technical field of industrial control, and in particular to an industrial data processing method, equipment, medium and product based on a large model. In the method, industrial data are examined and analyzed using a data examination strategy, and the industrial data are optimized according to the resulting data examination result. Variable relevance analysis is then performed on the optimized data using a multi-agent debate strategy to determine the causal chain relationship among the feature variables, and feature selection and dimension reduction are performed on the feature variables based on that relationship to obtain a target feature set. The target feature set is iteratively self-corrected using a retrieval-augmented generation strategy and a chain-of-thought reasoning strategy to obtain standardized data, which are used for model training. The resulting trained industrial process prediction model predicts target parameters in the industrial process, improving the processing efficiency of the industrial data.
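The data examination step summarized above (statistics plus boundary detection) might look roughly like the following sketch. In the application this kind of code is generated and executed by a large language model in a code interpreter mode; here it is written by hand, and the function names `examine_column` and `in_coverage`, the equal-width binning, and the sample values are illustrative assumptions rather than anything specified in the source.

```python
import math
import statistics


def examine_column(values, expected_range, n_bins=5):
    """Summarize one variable: missing count, basic statistics, coverage gaps.

    `expected_range` is an assumed operating range supplied by the caller;
    the application does not say how such a range is obtained.
    """
    # Treat None and NaN as missing readings.
    present = [v for v in values if v is not None and not math.isnan(v)]
    missing = len(values) - len(present)

    # Count readings per equal-width bin to find intervals with no data.
    lo, hi = expected_range
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in present:
        if lo <= v <= hi:
            idx = min(int((v - lo) / width), n_bins - 1)
            counts[idx] += 1
    uncovered = [
        (lo + i * width, lo + (i + 1) * width)
        for i, c in enumerate(counts)
        if c == 0
    ]

    return {
        "missing": missing,
        "mean": statistics.fmean(present),
        "stdev": statistics.stdev(present),  # crude proxy for noise level
        "observed_range": (min(present), max(present)),
        "uncovered_intervals": uncovered,
    }


def in_coverage(report, x):
    """Boundary check: is a prediction request inside the observed data range?"""
    lo, hi = report["observed_range"]
    return lo <= x <= hi
```

A report built this way could feed the "data missing prompt" and "data distribution evaluation" outputs named in claim 2, and `in_coverage` illustrates how a prediction request outside the training range could be flagged instead of silently answered.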
Inventors
- DU WENLI
- DUAN ZHAOYANG
- YE ZHENCHENG
- QIAN FENG
Assignees
- East China University of Science and Technology (华东理工大学)
Dates
- Publication Date
- 20260512
- Application Date
- 20260213
Claims (10)
- 1. A large model-based industrial data processing method, the method comprising: performing examination analysis on industrial data based on a preset data examination strategy to obtain a data examination result, wherein the data examination strategy generates and executes a code program, based on a large language model in a code interpreter mode, to perform statistical analysis and boundary detection on the data; optimizing the industrial data based on the data examination result to obtain optimized data, wherein the optimized data represents the industrial data after quality optimization and supplementation; performing variable relevance analysis on the optimized data based on a preset multi-agent debate strategy to determine a causal chain relationship among the feature variables in the optimized data, wherein the multi-agent debate strategy performs semantic reasoning and statistical verification on the optimized data by means of a domain expert agent, a data statistics agent and a decision agent to identify the causal chain relationship among the variables; performing feature selection and dimension reduction on the feature variables based on the causal chain relationship to obtain a target feature set, wherein the target feature set represents a dimensionally compact subset of the optimized data that retains the key features after causal verification; performing iterative self-correction on the target feature set based on a preset retrieval-augmented generation strategy and a chain-of-thought reasoning strategy to obtain standardized data; and performing model training based on the standardized data to obtain a trained industrial process prediction model for predicting target parameters in the industrial process.
- 2. The method of claim 1, wherein performing the examination analysis on the industrial data based on the preset data examination strategy to obtain the data examination result comprises: generating a code program based on a preset large language model to calculate statistics corresponding to the industrial data; and determining a coverage range and a noise level of the industrial data based on the statistics, and generating the data examination result, wherein the data examination result comprises at least one of a data missing prompt, a noise interval mark and a data distribution evaluation.
- 3. The method of claim 1, wherein performing the variable relevance analysis on the optimized data based on the preset multi-agent debate strategy to determine the causal chain relationship among the feature variables in the optimized data comprises: instantiating a domain expert agent, a data statistics agent and a decision agent based on the multi-agent debate strategy; performing, by the domain expert agent, physical principle analysis on each feature variable to determine causal hypotheses among the feature variables; performing, by the data statistics agent, statistical tests on the optimized data to obtain a statistical verification result for the causal hypotheses; and comprehensively arbitrating, by the decision agent, between the causal hypotheses and the statistical verification result to determine the causal chain relationship.
- 4. The method of claim 1, wherein performing the feature selection and dimension reduction on the feature variables based on the causal chain relationship to obtain the target feature set comprises: semantically grouping the feature variables based on the causal chain relationship and the large language model to obtain a plurality of functional groups; compressing the dimensions of the feature variables in each functional group using the dimension reduction model corresponding to that functional group, to generate latent variables corresponding to each functional group; and obtaining the target feature set based on the latent variables of the functional groups.
- 5. The method of claim 1, wherein performing the iterative self-correction on the target feature set based on the preset retrieval-augmented generation strategy and the chain-of-thought reasoning strategy to obtain the standardized data comprises: determining a matching standardization rule from a preset rule database based on metadata of the target feature set; when no matching standardization rule is found, acquiring a corresponding industry standard from a preset industry knowledge base based on the retrieval-augmented generation strategy; generating data conversion code based on the standardization rule or the industry standard in combination with the chain-of-thought reasoning strategy; performing format unification and unit conversion on the target feature set based on the data conversion code; and when the data conversion code fails to execute or the data conversion produces an abnormal value, feeding error information back to the large language model, and iteratively performing code correction and re-conversion based on the large language model until the standardized data is obtained.
- 6. The method of claim 1, wherein before performing the examination analysis on the industrial data based on the preset data examination strategy, the method further comprises: acquiring original industrial data; and performing structured preprocessing on the original industrial data to obtain the industrial data, wherein the structured preprocessing comprises at least one of missing value processing, abnormal value detection and data consistency checking.
- 7. The method of claim 1, wherein the target parameter comprises at least one of a reactor outlet concentration, a temperature, or a pressure.
- 8. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor, when executing the computer program, implements the steps of the method of any one of claims 1 to 7.
- 9. A computer storage medium having stored thereon computer program instructions, characterized in that the computer program instructions, when executed by a processor, carry out the steps of the method of any one of claims 1 to 7.
- 10. A computer program product comprising computer program instructions, characterized in that the computer program instructions, when executed by a processor, carry out the steps of the method of any one of claims 1 to 7.
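The rule lookup and iterative self-correction recited in claim 5 can be illustrated with a minimal sketch. Everything named here is hypothetical: `find_rule`, `self_correcting_convert` and `stub_llm` are not from the application, plain dictionaries stand in for the rule database and the industry knowledge base, and `stub_llm` is a fixed stand-in for a real large language model that deliberately returns faulty code on its first attempt.

```python
from typing import Callable, Optional


def find_rule(unit: str, rule_db: dict, knowledge_base: dict) -> Optional[str]:
    """Look up a standardization rule; fall back to the knowledge base on a miss."""
    rule = rule_db.get(unit)
    if rule is None:
        # Assumption: a dict lookup stands in for retrieval-augmented generation.
        rule = knowledge_base.get(unit)
    return rule


def self_correcting_convert(
    values: list,
    rule: Optional[str],
    generate_code: Callable[[Optional[str], Optional[str]], str],
    is_valid: Callable[[list], bool],
    max_rounds: int = 3,
) -> list:
    """Generate conversion code, run it, and retry with feedback on failure."""
    feedback = None
    for _ in range(max_rounds):
        source = generate_code(rule, feedback)
        namespace: dict = {}
        try:
            # Generated code must define convert(x); sandbox this in real use.
            exec(source, namespace)
            result = [namespace["convert"](v) for v in values]
        except Exception as exc:
            feedback = f"execution failed: {exc!r}"
            continue
        if is_valid(result):
            return result
        feedback = f"abnormal values after conversion: {result}"
    raise RuntimeError("no valid conversion produced within max_rounds")


def stub_llm(rule: Optional[str], feedback: Optional[str]) -> str:
    """Hypothetical model: wrong sign first, corrected once feedback arrives."""
    if feedback is None:
        return "def convert(c):\n    return c - 273.15\n"
    return "def convert(c):\n    return c + 273.15\n"
```

Calling `self_correcting_convert([25.0, 100.0], rule, stub_llm, lambda xs: all(x > 0 for x in xs))` rejects the first attempt because it yields negative Kelvin values, feeds that back, and accepts the second attempt. The `max_rounds` cap is a design choice of this sketch; the claim itself only says the loop repeats until standardized data is obtained.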
Description
Industrial data processing method, equipment, medium and product based on large model
Technical Field
The application relates to the technical field of industrial process control, and in particular to an industrial data processing method, equipment, medium and product based on a large model.
Background
Chemical engineering is a cornerstone of modern industry and is widely applied in key fields such as energy, chemicals, pharmaceuticals and materials. As the intelligent transformation of industry advances, data-driven modeling has become a key means of improving production efficiency and optimizing the operation of chemical processes. This technology builds prediction models by analyzing the massive data generated in industrial processes, enabling soft measurement and optimal control of key parameters, and is of great significance for achieving intelligent and refined operation of industrial processes.
Currently, industrial data processing and modeling rely mainly on traditional data preprocessing, feature engineering and machine learning methods. In the data preprocessing stage, automated scripts based on thresholds or statistical rules are generally used to fill missing values and reject abnormal values. In the feature engineering stage, most methods rely on linear analysis techniques such as the Pearson correlation coefficient to screen features, or use unsupervised methods such as principal component analysis to reduce dimensions. In the data standardization stage, manually written rules or fixed scripts are generally used to process multi-source heterogeneous data.
However, conventional data preprocessing methods cannot actively identify gaps in data coverage, and still generate unreliable prediction results when a prediction request falls outside the training data range.
Moreover, conventional feature selection methods have difficulty mining complex nonlinear causal relationships between variables, so the resulting feature subsets lack physical interpretability. In addition, standardization methods based on fixed rules have high maintenance costs and poor adaptability when faced with the heterogeneous and dynamic data characteristics of industrial sites, which severely restricts the performance and prediction accuracy of soft measurement models. Therefore, there is a need for a large model-based industrial data processing method to improve industrial data processing efficiency.
Disclosure of Invention
The invention provides an industrial data processing method, equipment, medium and product based on a large model, which are used to improve the efficiency and accuracy of predicting the coke layer thickness of an ethylene cracking furnace.
In a first aspect, the present application provides a large model-based industrial data processing method, the method comprising:
performing examination analysis on industrial data based on a preset data examination strategy to obtain a data examination result, wherein the data examination strategy generates and executes a code program, based on a large language model in a code interpreter mode, to perform statistical analysis and boundary detection on the data;
optimizing the industrial data based on the data examination result to obtain optimized data, wherein the optimized data represents the industrial data after quality optimization and supplementation;
performing variable relevance analysis on the optimized data based on a preset multi-agent debate strategy to determine a causal chain relationship among the feature variables in the optimized data, wherein the multi-agent debate strategy performs semantic reasoning and statistical verification on the optimized data by means of a domain expert agent, a data statistics agent and a decision agent to identify the causal chain relationship among the variables;
performing feature selection and dimension reduction on the feature variables based on the causal chain relationship to obtain a target feature set, wherein the target feature set represents a dimensionally compact subset of the optimized data that retains the key features after causal verification;
performing iterative self-correction on the target feature set based on a preset retrieval-augmented generation strategy and a chain-of-thought reasoning strategy to obtain standardized data;
performing model training based on the standardized data to obtain a trained industrial process prediction model for predicting target parameters in the industrial process.
Optionally, performing the examination analysis on the industrial data based on the preset data examination strategy to obtain the data examination result includes: generating a code program based on a preset large language mo