CN-122019973-A - Method and system for cleaning and acquiring AI modeling data based on probability strategy
Abstract
The application discloses a method and a system for cleaning and acquiring AI modeling data based on a probability strategy, and relates to the technical field of industrial time series data processing and artificial intelligence modeling. The method comprises: establishing a time series statistical model containing a trend term; performing parameter estimation on the model based on a probability strategy and calculating the probability weight of each data point; performing weighted modeling of the model to obtain a cleaned time series representation; extracting, from the cleaned representation, trend characteristic parameters that reflect the operating state of the system corresponding to the target time series data; and distinguishing the dynamic state from the steady state of the target time series data according to the trend characteristic parameters, yielding cleaned dynamic data and cleaned steady-state data that are acquired separately by class. The application can improve the precision, efficiency and adaptability of data processing.
Inventors
- TAO SHAOHUI
- WANG XIJUN
- ZHANG XILONG
- ZHAO JUN
- SUN XIAOYAN
- LI ZHENGYONG
- XIA LI
- XIANG SHUGUANG
Assignees
- Qingdao University of Science and Technology (青岛科技大学)
Dates
- Publication Date: 2026-05-12
- Application Date: 2026-01-28
Claims (10)
- 1. A method for cleaning and acquiring AI modeling data based on a probability strategy, characterized by comprising the following steps: acquiring target time series data; constructing, according to the target time series data, a time series statistical model containing a trend term, wherein the time series statistical model describes the trend characteristics and random disturbance characteristics of the data points as they vary over time; performing parameter estimation on the time series statistical model based on a probability strategy, and calculating a probability weight for each data point; performing weighted modeling of the time series statistical model according to the probability weight of each data point, so that abnormal data points are adaptively down-weighted during model parameter estimation, to obtain a cleaned time series representation; extracting, based on the cleaned time series representation, trend characteristic parameters reflecting the operating state of the system corresponding to the target time series data; and distinguishing the dynamic state from the steady state of the target time series data according to the trend characteristic parameters to obtain cleaned dynamic data and cleaned steady-state data, thereby acquiring the cleaned data separately as dynamic data and steady-state data.
- 2. The method for cleaning and acquiring AI modeling data based on a probability strategy according to claim 1, wherein constructing a time series statistical model containing a trend term according to the target time series data specifically comprises: segmenting the target time series data by a sliding window method to obtain a plurality of time windows; and constructing a time series statistical model containing a trend term for each time window.
- 3. The method for cleaning and acquiring AI modeling data based on a probability strategy according to claim 1, wherein performing parameter estimation on the time series statistical model based on a probability strategy and calculating a probability weight for each data point specifically comprises: calculating an anomaly probability or a trusted probability for each data point based on the degree to which the data point deviates from the time series statistical model; and determining the probability weight of each data point in the model parameter estimation process according to its anomaly probability or trusted probability.
- 4. The method for cleaning and acquiring AI modeling data based on a probability strategy according to claim 3, wherein the probability weights have the following properties: the less a data point deviates from the time series statistical model, the larger its probability weight; the more a data point deviates from the time series statistical model, the smaller its probability weight.
- 5. The method for cleaning and acquiring AI modeling data based on a probability strategy according to claim 1, wherein the trend characteristic parameters include a trend slope parameter and its corresponding statistical significance index; and distinguishing the dynamic state from the steady state of the target time series data according to the trend characteristic parameters to obtain cleaned dynamic data and cleaned steady-state data specifically comprises: judging, according to the trend slope parameter and its corresponding statistical significance index, whether the target time series data within a time window show a significant trend; when the data within the time window show no significant trend, classifying the data as steady-state data to obtain the cleaned steady-state data; and when the data within the time window show a significant trend, classifying the data as dynamic data to obtain the cleaned dynamic data.
- 6. The method for cleaning and acquiring AI modeling data based on a probability strategy according to claim 1, wherein the cleaned dynamic data are used to construct a dynamic artificial intelligence model, the dynamic artificial intelligence model comprising a recurrent neural network model, a long short-term memory network model or another artificial intelligence model capable of processing dynamic data; and the cleaned steady-state data are used to construct a steady-state artificial intelligence model, the steady-state artificial intelligence model comprising a support vector regression model, a multi-layer perceptron model, a Gaussian process model or another artificial intelligence model capable of processing steady-state data.
- 7. The method for cleaning and acquiring AI modeling data based on a probability strategy according to claim 2, wherein, in the sliding window method, the window size and the step size of the sliding window are adaptively adjusted or manually set according to the characteristics of the target time series data and the actual application scenario.
- 8. The method for cleaning and acquiring AI modeling data based on a probability strategy according to claim 1, wherein the target time series data comprise at least one of data collected during industrial operation, financial time series data, environmental monitoring time series data and medical monitoring time series data; and the data collected during industrial operation comprise at least one of temperature, pressure, flow, liquid level, concentration, voltage and current.
- 9. The method for cleaning and acquiring AI modeling data based on a probability strategy according to claim 1, wherein parameter estimation of the time series statistical model uses a likelihood function in combination with an iterative optimization algorithm, the iterative optimization algorithm comprising at least one of an EM algorithm, a gradient descent algorithm and a Newton iteration algorithm.
- 10. A system for cleaning and acquiring AI modeling data based on a probability strategy, configured to implement the method for cleaning and acquiring AI modeling data based on a probability strategy according to any one of claims 1 to 9, the system comprising: a data acquisition module for acquiring target time series data; a time series statistical model construction module for constructing, according to the target time series data, a time series statistical model containing a trend term, wherein the time series statistical model describes the trend characteristics and random disturbance characteristics of the data points as they vary over time; a parameter estimation module for performing parameter estimation on the time series statistical model based on a probability strategy and calculating a probability weight for each data point; a weighted modeling and cleaning module for performing weighted modeling of the time series statistical model according to the probability weight of each data point, so that abnormal data points are adaptively down-weighted during model parameter estimation, to obtain a cleaned time series representation; a trend characteristic parameter extraction module for extracting, based on the cleaned time series representation, trend characteristic parameters reflecting the operating state of the system corresponding to the target time series data; and a dynamic and steady state judging module for distinguishing the dynamic state from the steady state of the target time series data according to the trend characteristic parameters to obtain cleaned dynamic data and cleaned steady-state data, thereby acquiring the cleaned data separately as dynamic data and steady-state data.
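The probability weighting of claims 3, 4 and 9 can be illustrated with a minimal sketch. The claims do not fix a noise distribution, so everything specific here is an assumption made for illustration: a straight-line trend `y = a + b*t`, residuals drawn from a two-component contaminated normal (a narrow "trusted" component and a component `k` times wider), a fixed contamination rate `eps`, and the hypothetical function names `weighted_line_fit`, `trusted_probability` and `robust_trend_fit`. Under those assumptions an EM iteration alternates between computing each point's trusted probability and refitting the trend with the resulting weights.

```python
import math

def weighted_line_fit(t, y, w):
    """Weighted least squares for y = a + b*t; returns (a, b)."""
    sw = sum(w)
    tm = sum(wi * ti for wi, ti in zip(w, t)) / sw
    ym = sum(wi * yi for wi, yi in zip(w, y)) / sw
    stt = sum(wi * (ti - tm) ** 2 for wi, ti in zip(w, t))
    sty = sum(wi * (ti - tm) * (yi - ym) for wi, ti, yi in zip(w, t, y))
    b = sty / stt
    return ym - b * tm, b

def trusted_probability(r, sigma, eps, k):
    """Posterior probability that residual r belongs to the narrow component.

    Computed through the log-odds of the wide component to avoid underflow.
    """
    z2 = (r / sigma) ** 2
    log_odds_bad = math.log(eps / ((1.0 - eps) * k)) + 0.5 * z2 * (1.0 - 1.0 / k ** 2)
    return 1.0 / (1.0 + math.exp(min(log_odds_bad, 700.0)))

def robust_trend_fit(t, y, eps=0.1, k=10.0, iters=50):
    """EM for the assumed contaminated-normal trend model (eps and k held fixed)."""
    n = len(t)
    a, b = weighted_line_fit(t, y, [1.0] * n)  # ordinary least squares start
    res = [yi - (a + b * ti) for ti, yi in zip(t, y)]
    # Robust initial scale from the median absolute residual.
    sigma = max(1.4826 * sorted(abs(r) for r in res)[n // 2], 1e-9)
    for _ in range(iters):
        # E-step: trusted probability of each point.
        p = [trusted_probability(r, sigma, eps, k) for r in res]
        # Effective weight: 1 for a trusted point, 1/k^2 for an abnormal one.
        u = [pi + (1.0 - pi) / k ** 2 for pi in p]
        # M-step: weighted refit of the trend and the noise scale.
        a, b = weighted_line_fit(t, y, u)
        res = [yi - (a + b * ti) for ti, yi in zip(t, y)]
        sigma = max(math.sqrt(sum(ui * r * r for ui, r in zip(u, res)) / n), 1e-9)
    p = [trusted_probability(r, sigma, eps, k) for r in res]
    return a, b, sigma, p
```

In this sketch the weight falls monotonically as the residual grows, which matches the property stated in claim 4. A point far from the fitted trend receives a trusted probability near zero and contributes almost nothing to the slope estimate.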
Description
Method and system for cleaning and acquiring AI modeling data based on probability strategy
Technical Field
The application relates to the technical field of industrial time series data processing and artificial intelligence modeling, and in particular to a method and a system for cleaning and acquiring AI modeling data based on a probability strategy.
Background
In artificial intelligence modeling, data quality directly determines the performance and reliability of the model, and data cleaning and the classification of data into dynamic and steady-state data are key preliminary steps in building a high-quality AI model.
Existing data cleaning and state classification methods mainly include linear regression, moving averages and threshold detection. Although these methods can identify some abnormal data and steady-state segments, they rely heavily on manually set thresholds or prior knowledge, so data processing efficiency is low. They also cannot accurately distinguish valid data from abnormal data, or dynamic data from steady-state data, so data processing precision is low. Finally, they are not widely applicable and struggle to meet the processing requirements of different fields and different types of time series data.
How to improve the precision, efficiency and adaptability of data processing, and so provide a high-quality data basis for artificial intelligence modeling, is therefore a technical problem to be solved in this field.
Disclosure of Invention
The application aims to provide a method and a system for cleaning and acquiring AI modeling data based on a probability strategy, which can improve the precision, efficiency and adaptability of data processing. To achieve this object, the application provides the following solutions.
In a first aspect, the application provides a method for cleaning and acquiring AI modeling data based on a probability strategy, comprising the following steps.
- Acquiring target time series data.
- Constructing, according to the target time series data, a time series statistical model containing a trend term, wherein the time series statistical model describes the trend characteristics and random disturbance characteristics of the data points as they vary over time.
- Performing parameter estimation on the time series statistical model based on a probability strategy, and calculating a probability weight for each data point.
- Performing weighted modeling of the time series statistical model according to the probability weight of each data point, so that abnormal data points are adaptively down-weighted during model parameter estimation, to obtain a cleaned time series representation.
- Extracting, based on the cleaned time series representation, trend characteristic parameters reflecting the operating state of the system corresponding to the target time series data.
- Distinguishing the dynamic state from the steady state of the target time series data according to the trend characteristic parameters to obtain cleaned dynamic data and cleaned steady-state data, thereby acquiring the cleaned data separately as dynamic data and steady-state data.
Optionally, constructing a time series statistical model containing a trend term according to the target time series data specifically comprises the following steps.
- Segmenting the target time series data by a sliding window method to obtain a plurality of time windows.
- Constructing a time series statistical model containing a trend term for each time window.
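The sliding-window segmentation and the final dynamic/steady-state decision can be sketched as follows. This is an illustrative reading rather than the application's implementation: it assumes a linear trend per window, uses the t-statistic of the slope as the statistical significance index, and compares it with a hypothetical threshold `t_crit`. For brevity it fits each window by ordinary least squares, standing in for the probability-weighted fit on cleaned data. The names `sliding_windows`, `slope_t_statistic` and `classify_windows` are invented for this example.

```python
import math

def sliding_windows(n, size, step):
    """Yield (start, stop) index pairs of windows of length `size` over range(n)."""
    for start in range(0, n - size + 1, step):
        yield start, start + size

def slope_t_statistic(y):
    """Least-squares fit of y = a + b*i over i = 0..n-1; returns (b, t) with t = b / se(b)."""
    n = len(y)
    tm = (n - 1) / 2
    ym = sum(y) / n
    stt = sum((i - tm) ** 2 for i in range(n))
    b = sum((i - tm) * (yi - ym) for i, yi in enumerate(y)) / stt
    a = ym - b * tm
    sse = sum((yi - (a + b * i)) ** 2 for i, yi in enumerate(y))
    se = math.sqrt(sse / (n - 2) / stt)  # standard error of the slope
    t_stat = b / se if se > 0 else (math.inf if b != 0 else 0.0)
    return b, t_stat

def classify_windows(y, size=10, step=10, t_crit=3.0):
    """Label each window 'dynamic' if its slope is significant, else 'steady'."""
    labels = []
    for start, stop in sliding_windows(len(y), size, step):
        _, t_stat = slope_t_statistic(y[start:stop])
        labels.append((start, stop, "dynamic" if abs(t_stat) > t_crit else "steady"))
    return labels

# Usage: a flat segment followed by a ramp, both with small alternating noise.
y = [5 + 0.1 * (-1) ** i for i in range(10)] + [5 + j + 0.1 * (-1) ** j for j in range(10)]
print(classify_windows(y))  # [(0, 10, 'steady'), (10, 20, 'dynamic')]
```

The window size, step and threshold are fixed here for simplicity; the application also allows the window size and step to be adjusted adaptively.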
Optionally, performing parameter estimation on the time series statistical model based on a probability strategy and calculating a probability weight for each data point specifically comprises the following steps.
- Calculating an anomaly probability or a trusted probability for each data point, based on the degree to which the data point deviates from the time series statistical model.
- Determining the probability weight of each data point in the model parameter estimation process according to its anomaly probability or trusted probability.
Optionally, the probability weights have the following properties: the less a data point deviates from the time series statistical model, the larger its probability weight; the more a data point deviates from the time series statistical model, the smaller its probability weight.
Optionally, the trend characteristic parameters include a trend slope parameter and its corresponding statistical significance index. And judging the dy