CN-122019983-A - Large-model-based program code and industrial data association analysis and mining method
Abstract
The invention relates to a large-model-based method for association analysis and mining of program code and industrial data, in particular to the field of industrial data association analysis and mining. Dynamic quality evaluation senses data missingness, noise and anomalies in real time; a conditional adversarial repair network generates high-fidelity data under physical-rule constraints, improving information reliability at the source; a quality-aware meta-controller drives the analysis model to dynamically adjust its parameters and strategies according to the data quality, achieving accurate and adaptive association mining; and finally the quality evaluation, repair and decision modules are globally co-optimized through reinforcement learning, so that the whole system automatically adapts, over long-term operation, to complex industrial conditions such as sensor drift and process changes, markedly improving the accuracy of analysis conclusions, the robustness of decisions and the stability over the full life cycle.
Inventors
- QIN JUN
- MAO XIANXIN
- LEI JIAN
- ZHANG JINGRU
Assignees
- 北京领翼工软科技有限公司
Dates
- Publication Date: 2026-05-12
- Application Date: 2026-02-13
Claims (10)
- 1. A large-model-based program code and industrial data association analysis and mining method, characterized by comprising the following steps: Step S1, a quality evaluation module responds to input of a raw industrial data stream from a sensor network and associated environmental context data, performs sliding-window segmentation of the raw industrial data stream to obtain data windows to be evaluated, computes in parallel a plurality of basic quality indexes of each data window, including a missing rate, a signal-to-noise ratio and a statistical anomaly score, computes the uncertainty of the prediction error of the current data window, using a lightweight prediction model pre-trained on historical segments of the raw industrial data stream, as a context deviation degree, combines the missing rate, the signal-to-noise ratio, the statistical anomaly score and the context deviation degree into a dynamic quality vector of the current data window, and arranges the dynamic quality vectors of the data windows in time order to form a dynamic quality vector sequence; Step S2, the dynamic quality vector obtained in step S1 and the corresponding data window segmented in step S1 are input together into a conditional adversarial repair network; the conditional adversarial repair network comprises a generator and a discriminator, wherein the generator takes a data window and its dynamic quality vector as conditional input and generates a repaired data window, and the specific behavior mode by which it generates the repaired data window according to the quality state represented by the dynamic quality vector is defined as a repair strategy; the discriminator takes a data window sample to be discriminated and the corresponding dynamic quality vector as conditional input and discriminates the authenticity and plausibility of the data window sample under the corresponding quality condition, wherein the data window samples to be discriminated comprise the repaired data windows generated by the generator and preset real clean data samples; Step S3, the repaired data window and the corresponding dynamic quality vector output in step S2 are input into a quality-aware meta-controller, which analyzes the dynamic quality vector sequence and outputs fine-tuning parameters or policy instructions for an association-analysis program code model; and Step S4, the quality evaluation module of step S1, the conditional adversarial repair network of step S2 and the quality-aware meta-controller of step S3 are jointly constructed into a reinforcement learning agent, wherein the agent takes the dynamic quality vector sequence, the model performance indexes and its own historical actions as the state, takes the thresholds of the quality evaluation module, the repair strategy of the conditional adversarial repair network and the policy instructions of the quality-aware meta-controller as the actions, takes the long-term cumulative return of a combined analysis-precision and system-stability index as the total reward, and continuously optimizes its policy with a reinforcement learning algorithm by maximizing the long-term cumulative return.
- 2. The large-model-based program code and industrial data association analysis and mining method as set forth in claim 1, wherein in step S1 the specific process of computing the plurality of basic quality indexes of the data window in parallel is as follows: for each data window, the ratio of the number of invalid or null data points to the total number of data points is calculated to obtain the missing rate; the raw industrial data stream within the data window is separated, by a preset signal decomposition algorithm, into a trend item representing the long-term trend and a residual item representing random fluctuation, the square of each data point value in the trend item is computed and all squares are summed to obtain the trend energy, the square of each data point value in the residual item is computed and all squares are summed to obtain the residual energy, the ratio of the trend energy to the residual energy is computed, and ten times the base-ten logarithm of that ratio gives the signal-to-noise ratio; unsupervised anomaly detection is performed on all data points in the data window with an isolation forest algorithm, and the mean of the anomaly scores of all data points is taken as the statistical anomaly score; the raw industrial data stream comprises time-series data collected by one or more of a temperature sensor, a pressure sensor and a vibration sensor, and the environmental context data comprise an equipment running-state identifier, ambient temperature data and ambient humidity data.
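The three basic quality indexes of claim 2 can be sketched as follows. This is a minimal illustration, not the patented implementation: a moving average stands in for the unspecified signal decomposition algorithm, and a mean absolute z-score stands in for the isolation-forest anomaly score.

```python
import numpy as np

def basic_quality_indexes(window, trend_len=5):
    """Missing rate, SNR and anomaly score for one data window (claim 2).

    Assumptions: missing points are encoded as NaN; a moving average of
    length `trend_len` approximates the trend item; a z-score replaces the
    isolation-forest anomaly score for brevity.
    """
    window = np.asarray(window, dtype=float)
    # Missing rate: share of invalid (NaN) points among all points.
    missing_rate = float(np.isnan(window).mean())
    valid = window[~np.isnan(window)]
    # Trend item via moving average; residual item is what remains.
    kernel = np.ones(trend_len) / trend_len
    trend = np.convolve(valid, kernel, mode="same")
    residual = valid - trend
    # SNR = 10 * log10(trend energy / residual energy).
    snr = 10.0 * np.log10(np.sum(trend ** 2) / (np.sum(residual ** 2) + 1e-12))
    # Statistical anomaly score: mean absolute z-score (isolation-forest stand-in).
    z = (valid - valid.mean()) / (valid.std() + 1e-12)
    anomaly_score = float(np.abs(z).mean())
    return missing_rate, float(snr), anomaly_score
```

The three values, together with the context deviation degree of claim 3, form the dynamic quality vector of the window.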
- 3. The large-model-based program code and industrial data association analysis and mining method as set forth in claim 2, wherein computing the uncertainty of the prediction error of the current data window with the lightweight prediction model pre-trained on historical segments of the raw industrial data stream comprises the following specific steps: each data point in the current data window is predicted by the lightweight prediction model, which outputs a predicted-value mean estimate and a predicted-value standard-deviation estimate for each data point; for each data point, a numerical interval is determined from its predicted-value mean estimate, its predicted-value standard-deviation estimate and a preset confidence coefficient, the centre of the interval being the predicted-value mean estimate and the radius being the product of the predicted-value standard-deviation estimate and the confidence coefficient; for each data point it is judged whether the actual observed value falls outside the corresponding interval, a first judgment value being generated if it does and a second judgment value, different from the first, if it does not; for each data point the quotient of its predicted-value standard-deviation estimate and its predicted-value mean estimate is computed; the quotient of each data point is multiplied by its first or second judgment value to obtain the weighted deviation amount of that data point; and the weighted deviation amounts of all data points in the current data window are summed and divided by the total number of data points in the window, the result being the context deviation degree.
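The context deviation degree of claim 3 reduces to a simple per-point computation. The sketch below assumes the lightweight prediction model's per-point mean and standard-deviation estimates are already given, and encodes the first and second judgment values as 1 and 0, an assumed encoding the claim does not fix.

```python
import numpy as np

def context_deviation(actual, pred_mean, pred_std, confidence_coeff=2.0):
    """Context deviation degree of claim 3.

    `pred_mean` / `pred_std` are the per-point outputs of the (unspecified)
    lightweight prediction model; judgment values 1 / 0 are assumptions.
    """
    actual = np.asarray(actual, float)
    mu = np.asarray(pred_mean, float)
    sigma = np.asarray(pred_std, float)
    # Interval centred on the mean estimate, radius = std estimate * coefficient.
    radius = sigma * confidence_coeff
    outside = np.abs(actual - mu) > radius        # first (1) / second (0) judgment value
    quotient = sigma / (np.abs(mu) + 1e-12)       # relative predictive uncertainty
    weighted = quotient * outside.astype(float)   # weighted deviation per point
    # Sum of weighted deviations divided by the number of points in the window.
    return float(weighted.sum() / len(actual))
```

Points whose observations stay inside their confidence interval contribute nothing; points outside contribute in proportion to the model's relative uncertainty at that point.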
- 4. The large-model-based program code and industrial data association analysis and mining method as set forth in claim 3, wherein in step S2 the specific operation by which the generator, taking the data window and the dynamic quality vector as conditional input, generates the repaired data window is as follows: the generator internally comprises a quality-aware routing module which analyzes the missing rate, the signal-to-noise ratio, the statistical anomaly score and the context deviation degree contained in the dynamic quality vector, and selectively activates different processing sub-networks within the generator according to the comparison of these values with corresponding preset thresholds set in the generator: when the missing rate in the dynamic quality vector exceeds a first preset threshold for deciding whether to trigger data completion, the quality-aware routing module activates a data-completion sub-network, which generatively fills the missing positions using the spatio-temporal correlation of the valid data points in the data window; when the reciprocal of the signal-to-noise ratio exceeds a second preset threshold for deciding whether to trigger noise reduction, the quality-aware routing module activates a noise-reduction filtering sub-network, which adaptively filters the data window; when the statistical anomaly score or the context deviation degree exceeds a third preset threshold for deciding whether to trigger anomaly reconstruction, the quality-aware routing module activates an anomaly-reconstruction sub-network, which reconstructs the anomalous data according to a physical rule defined by a simplified process mechanism equation and the adjacent normal mode of the data window; the generator synthesizes the outputs of the activated sub-networks, the synthesized result being the repaired data window, and this specific behavior mode, in which the generator selectively activates processing sub-networks through the quality-aware routing module according to the dynamic quality vector and synthesizes their outputs, is defined as the repair strategy of the generator.
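The routing logic of claim 4's quality-aware routing module can be illustrated with plain threshold comparisons. The threshold names and the returned sub-network labels are hypothetical; in the patented system the sub-networks are learned generative components, not string labels.

```python
def route_repair(quality_vector, thresholds):
    """Select which generator sub-networks to activate (claim 4).

    quality_vector: (missing_rate, snr, anomaly_score, context_deviation).
    thresholds: hypothetical names for the three preset thresholds.
    """
    missing_rate, snr, anomaly_score, context_dev = quality_vector
    active = []
    # Missing rate above the first threshold -> generative data completion.
    if missing_rate > thresholds["completion"]:
        active.append("completion")
    # Reciprocal of the SNR above the second threshold (i.e. a low positive
    # SNR), per the claim's wording -> adaptive noise-reduction filtering.
    if snr > 0 and 1.0 / snr > thresholds["denoise"]:
        active.append("denoise")
    # Anomaly score or context deviation above the third threshold ->
    # physics-guided anomaly reconstruction.
    if anomaly_score > thresholds["reconstruct"] or context_dev > thresholds["reconstruct"]:
        active.append("reconstruct")
    return active
```

Several conditions may fire at once, in which case the generator synthesizes the outputs of all activated sub-networks into the repaired window.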
- 5. The large-model-based program code and industrial data association analysis and mining method as set forth in claim 4, wherein the specific process of introducing a physical-constraint reconstruction loss function and applying the physical-rule constraint based on the simplified process mechanism equation to the output of the generator is as follows: a simplified mechanism-equation operator describing the physical relationships of the industrial process is predefined; the repaired data window output by the generator is input into the simplified mechanism-equation operator, which after computation outputs a physical residual vector corresponding to the repaired data window; each element of the physical residual vector is squared and all squares are summed, the sum being the physical-constraint reconstruction loss value; and when training the conditional adversarial repair network, the physical-constraint reconstruction loss value and the generator adversarial loss, produced by the discriminator's judgment of the generator output, are combined by weighted summation into the total loss function of the generator, the generator adversarial loss serving to make the generator output meet the authenticity requirement.
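The generator's total loss in claim 5 is a weighted sum of the adversarial loss and the physical-constraint reconstruction loss. A minimal sketch, in which the mechanism-equation operator is supplied by the caller and the weighting coefficient `phys_weight` is an assumed hyperparameter:

```python
import numpy as np

def generator_total_loss(repaired_window, mechanism_operator, adv_loss, phys_weight=0.1):
    """Total generator loss of claim 5.

    `mechanism_operator` maps a repaired window to its physical residual
    vector; the (adv + w * phys) weighting is one reading of the claimed
    weighted summation.
    """
    residual = mechanism_operator(np.asarray(repaired_window, float))
    # Physical-constraint reconstruction loss: sum of squared residual elements.
    phys_loss = float(np.sum(residual ** 2))
    return adv_loss + phys_weight * phys_loss, phys_loss
```

As a toy example, a first-difference operator penalizes repaired windows that violate a "slowly varying signal" rule.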
- 6. The large-model-based program code and industrial data association analysis and mining method as set forth in claim 5, wherein the training process of the conditional adversarial repair network is specifically as follows: the discriminator takes a data window sample to be discriminated and the corresponding dynamic quality vector as conditional input, and discriminates the authenticity and plausibility of the data window sample under the quality condition represented by that dynamic quality vector, wherein the data window samples to be discriminated comprise the repaired data windows generated by the generator and preset real clean data samples, which in the training context are real clean data windows.
- 7. The large-model-based program code and industrial data association analysis and mining method according to claim 6, wherein in step S3 the quality-aware meta-controller analyzes the dynamic quality vector sequence and outputs fine-tuning parameters or policy instructions for the association-analysis program code model as follows: the quality-aware meta-controller first processes the dynamic quality vector sequence with an attention mechanism to generate a quality-state semantic representation; a decision logic unit within the quality-aware meta-controller then generates, based on the quality-state semantic representation, a regulation instruction for the association-analysis program code model, the regulation instruction taking two forms: the first form is a set of increment values for adjusting trainable parameters in the association-analysis program code model, namely the fine-tuning parameters, each individual value in the set corresponding to a correction of a weight or bias of a network layer in the model; the second form is a selection instruction controlling the operating logic of the model, namely the policy instruction, which selects one of a plurality of preset processing paths in the model for activation according to the quality condition indicated by the dynamic quality vector while bypassing the other paths.
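The two instruction forms of claim 7 can be sketched over plain dictionaries. The `("delta", ...)` / `("path", ...)` encodings are assumptions for illustration, not the patent's actual representation.

```python
def apply_regulation(base_params, instruction):
    """Apply a meta-controller regulation instruction (claim 7).

    Form 1 ("delta"): increment values added to trainable parameters.
    Form 2 ("path"): a policy instruction selecting one processing path.
    Returns (updated_params, selected_path).
    """
    kind, payload = instruction
    if kind == "delta":
        # Fine-tuning parameters: add each increment to its base parameter.
        return {k: base_params[k] + payload[k] for k in base_params}, None
    if kind == "path":
        # Policy instruction: activate one preset path, bypassing the others.
        return base_params, payload
    raise ValueError(f"unknown instruction kind: {kind}")
```

The downstream model either runs with the updated parameters or routes its data through the selected path.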
- 8. The large-model-based program code and industrial data association analysis and mining method as set forth in claim 7, wherein the specific process by which the association-analysis program code model adaptively adjusts its analysis behavior according to the quality state characterized by the dynamic quality vector and generates the model performance index is as follows: the association-analysis program code model receives and applies the fine-tuning parameters or policy instructions output by the quality-aware meta-controller while receiving the repaired data window; if the received instruction is a set of fine-tuning parameters, the model adds the corresponding values of the received fine-tuning parameters, one by one, to a set of basic parameters stored within it, obtaining a set of updated model parameters; if the received instruction is a policy instruction, then according to the valid bits indicated by its code the model selects, from a plurality of preset and distinct feature-extraction or data-fusion functions within it, the function or group of functions corresponding to the valid bits to form an activated sub-process, reconstructs its internal data-processing path accordingly, and bypasses the feature-extraction or data-fusion functions not indicated by the code; after the task is executed, a quantized model performance index is generated; the quality-aware meta-controller realizes this function by optimizing its own parameters, the optimization goal being to minimize a total loss function formed by adding two parts: the first part, called the immediate task loss, is the loss produced when the association-analysis program code model, after applying the regulation instruction output by the quality-aware meta-controller, performs association analysis and mining on the repaired data window; the second part, called the performance-stability regular term, is defined as the expected value, over the dynamic quality vector sequence formed by historical dynamic quality vectors, of the variance of the model performance index of the association-analysis program code model after the regulation instruction output by the quality-aware meta-controller is applied, and is multiplied by a preset positive coefficient before being added into the total loss function.
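The meta-controller objective of claim 8, an immediate task loss plus a coefficient-scaled performance-stability regular term, can be sketched as follows. The empirical variance of a performance-index history stands in for the claimed expectation over the dynamic quality vector sequence; the coefficient value is an assumption.

```python
import numpy as np

def meta_controller_loss(task_loss, performance_history, stability_coeff=0.05):
    """Meta-controller training objective of claim 8.

    task_loss: immediate loss of the association-analysis model on the
    repaired window after the regulation instruction is applied.
    performance_history: historical model performance indexes; their
    empirical variance approximates the claimed expected variance.
    """
    perf = np.asarray(performance_history, float)
    stability_term = float(perf.var())  # performance-stability regular term
    # Total loss = immediate task loss + positive coefficient * regular term.
    return task_loss + stability_coeff * stability_term
```

A perfectly stable performance history contributes nothing, so the regularizer only penalizes instructions whose effects make performance fluctuate.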
- 9. The large-model-based program code and industrial data association analysis and mining method as set forth in claim 8, wherein in step S4 the specific operation by which the reinforcement learning agent uses the dynamic quality vector sequence, the model performance indexes and its own historical actions as the state is as follows: the state is composed of three parts of information: the first part is the dynamic quality vector sequence, formed by the dynamic quality vectors of the last L consecutive moments and originating from step S1; the second part is a model performance index sequence, formed by the model performance indexes generated by the association-analysis program code model at the last K consecutive moments and originating from step S3; the third part is a historical action sequence, formed by the actions executed by the reinforcement learning agent at the last M consecutive moments, which records the agent's past decision behavior.
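Assembling the three-part state of claim 9 is straightforward; the window lengths L, K and M below are illustrative defaults, and the dictionary layout is an assumed representation.

```python
def build_state(quality_vectors, performance_indexes, past_actions, L=4, K=3, M=2):
    """Assemble the reinforcement-learning state of claim 9.

    Keeps the last L dynamic quality vectors, the last K model performance
    indexes and the last M historical actions.
    """
    return {
        "quality": list(quality_vectors)[-L:],          # last L quality vectors (step S1)
        "performance": list(performance_indexes)[-K:],  # last K performance indexes (step S3)
        "actions": list(past_actions)[-M:],             # last M agent actions
    }
```

In practice each component would be a fixed-length tensor fed to the policy network; truncating the histories keeps the state dimension constant.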
- 10. The large-model-based program code and industrial data association analysis and mining method according to claim 9, wherein the specific process of using the long-term cumulative return of the comprehensive analysis-precision and system-stability index as the total reward comprises: the total reward is calculated from three parts: the first part, called the immediate analysis-precision reward, has a value positively correlated with the single model performance index newly generated by the association-analysis program code model after the action is executed; the second part, called the multiscale stability reward, has a value positively correlated with the smoothness and trend robustness of the model performance index sequence generated by the association-analysis program code model over the most recent continuous period; when this reward is calculated, the model performance index sequence is smoothed separately over a plurality of different periods forming a preset set of window widths, for each window width the corresponding smoothed sequence being obtained by computing the moving average of the data points within that window width, and the degree of difference between the smoothed sequences of different window widths is then calculated, specifically by computing, for each pair of distinct smoothed sequences, the squares of the differences between their data-point values at the same moments and summing the squares over all moments to obtain a sum of squared differences; the third part, called the action-amplitude penalty term, has a value inversely related to the amplitude of the action vector executed by the reinforcement learning agent; finally, the total reward is obtained by weighted summation after the immediate analysis-precision reward and the multiscale stability reward are multiplied by a first preset weight coefficient and a second preset weight coefficient respectively.
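The three-part reward of claim 10 can be sketched as follows. The weight coefficients, the window widths, the negative-exponential mapping of the smoothed-sequence difference onto a positive stability reward, and the Euclidean norm as the action amplitude are all assumptions; the claim fixes only the correlations.

```python
import numpy as np
from itertools import combinations

def total_reward(perf_index, perf_sequence, action, widths=(3, 5),
                 w1=1.0, w2=0.5, w3=0.1):
    """Total reward of claim 10: precision reward + stability reward - penalty."""
    seq = np.asarray(perf_sequence, float)
    # Moving-average smoothing at each preset window width, cropped to a
    # common length so the sequences are comparable moment by moment.
    common = len(seq) - max(widths) + 1
    smoothed = [np.convolve(seq, np.ones(w) / w, mode="valid")[:common]
                for w in widths]
    # Sum of squared differences between every pair of smoothed sequences.
    diff = sum(float(np.sum((a - b) ** 2)) for a, b in combinations(smoothed, 2))
    stability_reward = np.exp(-diff)  # smoother sequence -> reward closer to 1
    # Action-amplitude penalty: inversely related to reward via its norm.
    amplitude_penalty = float(np.linalg.norm(np.asarray(action, float)))
    return w1 * perf_index + w2 * stability_reward - w3 * amplitude_penalty
```

A perfectly flat performance sequence yields the maximum stability reward, and a zero action vector incurs no penalty, so the reward then reduces to the weighted precision term plus the stability weight.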
Description
Large-model-based program code and industrial data association analysis and mining method
Technical Field
The invention relates to the field of industrial data association analysis and mining, in particular to a large-model-based program code and industrial data association analysis and mining method.
Background
In the quality-control link of an intelligent manufacturing factory, a sensor network widely deployed across the production line continuously collects multi-dimensional time-series industrial data such as temperature, pressure and vibration. These data are the core basis of production-system state sensing, process-parameter optimization and predictive equipment maintenance. In a typical control flow, a background analysis system relies on a preset program code model to perform real-time association analysis and pattern mining on the industrial data streaming in, so as to identify in advance the weak-symptom abnormal patterns of equipment performance degradation or process deviation and thereby trigger early-warning or automatic-adjustment instructions. However, the industrial field environment is complex: internal factors such as sensor precision drift and data-transmission network congestion, together with external factors such as electromagnetic interference, mechanical vibration and long-term equipment aging in the production environment, commonly cause continuous missing values from packet loss in the collected raw data, high-frequency noise interference, and outliers unrelated to the working condition. These inherent data-quality problems undermine the input premise of analysis models and program code built on the assumption of clean, complete data, and directly affect the reliability of the whole quality-control link.
In the prior art, to cope with industrial data-quality problems, an independent preprocessing stage is usually applied before the data enter the analysis model. Common methods include moving-average and median filtering based on traditional mathematical statistics to smooth noise, or simple threshold rules and regression interpolation to handle outliers and missing values. Although these methods improve the regularity of the data to some extent, their limitations are increasingly prominent. First, they are mainly static, fixed-rule processing and can hardly cope adaptively with the high-dimensional, nonlinear and time-varying pollution patterns of industrial data. Second, the preprocessing step and the analysis program code are often decoupled: the data-cleaning strategy cannot be dynamically adjusted according to the real-time feedback and performance of the subsequent specific analysis task. For example, a filtering algorithm may smooth away all high-frequency signals, yet thereby filter out exactly the characteristic frequency components that are critical to a specific cutter-wear early warning. Because preprocessing and the analysis model each operate in isolation, the system is rigid and inefficient when facing complex and changeable data-quality problems. The core problem is the lack of a mechanism that can deeply understand the contextual semantics of industrial data, evaluate quality conditions in real time, and intelligently cooperate and dynamically adapt with the upper-layer analysis program code. As a result, an analysis program that depends on low-quality data input very easily produces false alarms and missed alarms, which not only defeats preventive maintenance and causes unplanned shutdowns and wasted production resources, but may also lead to safety accidents through failure to identify critical equipment fault precursors in time, causing obvious economic loss.
Disclosure of Invention
Aiming at the above technical problems in the prior art, the invention provides a large-model-based program code and industrial data association analysis and mining method, so as to solve the problems in the background art. The technical scheme for solving these problems comprises the following steps: Step S1, a quality evaluation module responds to input of a raw industrial data stream from a sensor network and associated environmental context data, performs sliding-window segmentation of the raw industrial data stream to obtain data windows to be evaluated, computes in parallel a plurality of basic quality indexes of each data window, including a missing rate, a signal-to-noise ratio and a statistical anomaly score, computes the uncertainty of the prediction error of the current data window, using a lightweight prediction model pre-trained on historical segments of the raw industrial data stream, as a context deviation degree, and combines the missing rate, the signal-to-noise ratio, the statistical anomaly score and the context deviation degr