
CN-122020511-A - Time sequence prediction method, system, equipment and medium based on stream batch fusion


Abstract

The invention belongs to the field of time series prediction and discloses a time series prediction method, system, device and medium based on stream-batch fusion. The method comprises: obtaining a feature set of a multivariate time series, and obtaining a feature subset to be verified by calculating the maximal information coefficient between each feature and the prediction target; performing a stationarity test on the feature subset to be verified and analyzing causal dependencies with the Granger causality test to obtain a causal feature subset; determining an optimal lag order and constructing a vector autoregressive prediction model based on the causal feature subset; processing historical batch data and online stream data in parallel with the trained vector autoregressive prediction model; and performing weighted fusion of the resulting batch processing prediction result and stream processing prediction result to obtain the time series prediction value. By combining feature selection with stream-batch fused prediction, the method achieves high-accuracy, low-latency time series prediction while ensuring the quality of the model input.
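
The final weighted-fusion step can be illustrated with a minimal sketch. This is not the patented implementation: the function name `fuse_predictions`, the parameter `stream_weight`, and the fallback to the batch result when no stream result is available are all assumptions made for illustration, since the text does not state how the weights are chosen or what happens when the stream branch yields nothing.

```python
from typing import Optional


def fuse_predictions(batch_pred: float,
                     stream_pred: Optional[float],
                     stream_weight: float = 0.5) -> float:
    """Weighted fusion of a batch prediction and a stream prediction.

    stream_weight is a hypothetical parameter in [0, 1]; the source does
    not specify how the fusion weights are determined.
    """
    if not 0.0 <= stream_weight <= 1.0:
        raise ValueError("stream_weight must lie in [0, 1]")
    # Assumption: if the stream branch produced no result (for example,
    # because concept drift was detected), use the batch result alone.
    if stream_pred is None:
        return batch_pred
    # Convex combination of the two branch outputs.
    return stream_weight * stream_pred + (1.0 - stream_weight) * batch_pred
```

For example, `fuse_predictions(10.0, 20.0, stream_weight=0.25)` weights the stream result by 0.25 and the batch result by 0.75.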

Inventors

  • MIAO XINPING
  • ZHANG ZHAO
  • WANG YIZHANG
  • HUANG LIHUANG
  • ZHANG LAN
  • LI WENKE
  • TIAN YUE
  • ZHU CHANGHUI
  • LIU LINJUN
  • Tian Changyuan
  • LIANG XIAOQIAN
  • YANG JIAN
  • SUN SHOUYU

Assignees

  • Guizhou Power Grid Co., Ltd. (贵州电网有限责任公司)

Dates

Publication Date
2026-05-12
Application Date
2025-12-26

Claims (10)

  1. A time series prediction method based on stream-batch fusion, characterized by comprising the following steps: acquiring a feature set of a multivariate time series, and obtaining a feature subset to be verified by calculating the maximal information coefficient between each feature and a prediction target; obtaining a causal feature subset by performing a stationarity test on the feature subset to be verified and analyzing causal dependencies using the Granger causality test; based on the causal feature subset, determining an optimal lag order, constructing a vector autoregressive prediction model, and estimating the model parameters by training on historical data; based on the trained vector autoregressive prediction model, processing historical batch data and online stream data in parallel to obtain a batch processing prediction result and a stream processing prediction result; and performing weighted fusion of the batch processing prediction result and the stream processing prediction result to obtain a final time series prediction value.
  2. The time series prediction method based on stream-batch fusion according to claim 1, wherein acquiring the feature set of the multivariate time series and obtaining the feature subset to be verified by calculating the maximal information coefficient between each feature and the prediction target comprises: based on the feature set, calculating the maximal information coefficient between each feature and the prediction target; ranking the features by maximal information coefficient; and selecting the features within a preset threshold range to form the feature subset to be verified.
  3. The time series prediction method based on stream-batch fusion according to claim 2, wherein obtaining the causal feature subset by performing the stationarity test on the feature subset to be verified and analyzing causal dependencies using the Granger causality test comprises: for each feature that passes the stationarity test, constructing an unrestricted autoregressive model containing the lag terms of the feature and a restricted autoregressive model containing only the lag terms of the prediction target; obtaining a statistic by calculating the prediction errors of the restricted autoregressive model and the unrestricted autoregressive model; and, when the confidence level corresponding to the statistic meets a preset condition, judging that a causal relationship exists between the feature and the prediction target, the features so judged forming the causal feature subset.
  4. The time series prediction method based on stream-batch fusion according to claim 3, wherein determining the optimal lag order comprises: traversing different lag orders within a preset range; and applying the Akaike information criterion (AIC) and selecting the lag order with the minimum AIC value as the optimal lag order of the vector autoregressive prediction model.
  5. The time series prediction method based on stream-batch fusion according to claim 4, wherein constructing the vector autoregressive prediction model and estimating the model parameters by training on historical data comprises: constructing a vector autoregressive prediction model in which the current value of the prediction target is jointly determined by the lag terms of the prediction target and the lag terms of each feature in the causal feature subset; and estimating the coefficient matrices and the intercept vector of the vector autoregressive prediction model by the least squares method so as to minimize the residual sum of squares.
  6. The time series prediction method based on stream-batch fusion according to claim 5, wherein obtaining the batch processing prediction result comprises: taking offline-stored historical batch data as input and preprocessing it; and, based on the preprocessed historical batch data, calculating the batch processing prediction result with the trained vector autoregressive prediction model.
  7. The time series prediction method based on stream-batch fusion according to claim 6, wherein obtaining the stream processing prediction result comprises: taking online stream data arriving in real time as input and preprocessing it; and performing data distribution detection on the preprocessed online stream data and, if no concept drift has occurred, calculating the stream processing prediction result with the trained vector autoregressive prediction model.
  8. A time series prediction system based on stream-batch fusion, applying the method according to any one of claims 1 to 7, comprising: a first screening module, configured to acquire a feature set of a multivariate time series and obtain a feature subset to be verified by calculating the maximal information coefficient between each feature and the prediction target; a second screening module, configured to obtain a causal feature subset by performing a stationarity test on the feature subset to be verified and analyzing causal dependencies using the Granger causality test; a model construction module, configured to determine an optimal lag order and construct a vector autoregressive prediction model based on the causal feature subset, and to estimate the model parameters by training on historical data; a parallel prediction module, configured to process historical batch data and online stream data in parallel based on the trained vector autoregressive prediction model to obtain a batch processing prediction result and a stream processing prediction result; and a weighted fusion module, configured to perform weighted fusion of the batch processing prediction result and the stream processing prediction result to obtain a final time series prediction value.
  9. An electronic device, comprising: a memory and a processor; the memory being configured to store computer-executable instructions which, when executed by the processor, implement the steps of the time series prediction method based on stream-batch fusion according to any one of claims 1 to 7.
  10. A computer-readable storage medium having computer-executable instructions stored thereon which, when executed by a processor, implement the steps of the time series prediction method based on stream-batch fusion according to any one of claims 1 to 7.
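
The restricted/unrestricted comparison of claim 3 and the AIC-based lag selection of claim 4 can be sketched as follows. This is an illustrative sketch under stated assumptions, not the claimed implementation: the function names (`ols_rss`, `granger_f`, `select_lag`) are hypothetical, the statistic is assumed to be the standard F statistic of the Granger causality test, and the AIC is computed for a single regression equation of the prediction target rather than for a full vector autoregressive system. The code uses only the Python standard library.

```python
import math


def ols_rss(rows, targets):
    """Fit targets ~ rows by least squares; return the residual sum of squares."""
    k = len(rows[0])
    # Normal equations (X^T X) b = X^T y, stored as an augmented matrix.
    a = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * t for r, t in zip(rows, targets))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(k):
            if r != col:
                f = a[r][col] / a[col][col]
                a[r] = [v - f * w for v, w in zip(a[r], a[col])]
    beta = [a[i][k] / a[i][i] for i in range(k)]
    return sum((t - sum(b * v for b, v in zip(beta, r))) ** 2
               for r, t in zip(rows, targets))


def lagged_rows(y, x, p, start):
    """Regressors [1, y lags 1..p, x lags 1..p] for t = start .. len(y) - 1."""
    return [[1.0] + [y[t - i] for i in range(1, p + 1)]
            + [x[t - i] for i in range(1, p + 1)] for t in range(start, len(y))]


def granger_f(y, x, p):
    """F statistic for the hypothesis 'x Granger-causes y' at lag order p."""
    unrestricted = lagged_rows(y, x, p, p)
    # The restricted model keeps only the intercept and the lags of y.
    restricted = [row[:p + 1] for row in unrestricted]
    targets = y[p:]
    rss_r = ols_rss(restricted, targets)
    rss_u = ols_rss(unrestricted, targets)
    dof = len(targets) - (2 * p + 1)
    return ((rss_r - rss_u) / p) / (rss_u / dof)


def select_lag(y, x, max_lag):
    """Return the lag order in 1..max_lag with the smallest AIC.

    Simplification: single-equation AIC, m * ln(RSS / m) + 2 * k, evaluated
    on a common sample so that the values are comparable across lag orders.
    """
    targets = y[max_lag:]
    m = len(targets)

    def aic(p):
        rss = ols_rss(lagged_rows(y, x, p, max_lag), targets)
        return m * math.log(rss / m) + 2 * (2 * p + 1)

    return min(range(1, max_lag + 1), key=aic)
```

A feature would be admitted to the causal feature subset when the significance level associated with `granger_f` satisfies the preset condition; converting the statistic to a significance level requires the F distribution, which is omitted here to keep the sketch free of external dependencies.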

Description

Time series prediction method, system, device and medium based on stream-batch fusion

Technical Field

The invention relates to the field of time series prediction, and in particular to a time series prediction method, system, device and medium based on stream-batch fusion.

Background

Time series prediction is currently applied in many fields, such as energy, medicine and weather forecasting, where it guides production and daily life. Time series prediction problems can be handled with statistical methods such as the autoregressive (AR) model, the autoregressive integrated moving average (ARIMA) model and the vector autoregressive (VAR) model, which have the advantages of simple models and easy training. In contrast, machine-learning-based methods have complex models and more parameters, and are difficult to train. The typical characteristics of data in the big-data era place high demands on the storage and processing capabilities of a system, and the need for real-time response and efficient computation is becoming increasingly important.

In practice, the predicted variables in a real scenario are typically not independent but belong to an interrelated system. Determining which of the many influencing factors in the whole system are related to the predicted variable is the first problem to be solved in multivariate prediction. Feature selection is introduced in the preprocessing stage of time series prediction to select the factors that strongly influence the prediction, thereby improving the prediction efficiency and accuracy of the model. Inappropriate feature selection, however, has a negative effect: for example, it can introduce noise, which reduces the accuracy of the prediction model.
The purpose of feature selection is to remove features in the original feature set that are unrelated to the prediction target or that are redundant. The former achieves data noise reduction, and the latter achieves data dimensionality reduction. Too high a data dimension not only increases the running time of a machine learning algorithm but may also cause problems such as overfitting. Therefore, a time series prediction method based on stream-batch fusion is needed that considers both the real-time performance and the reliability of stream data and batch data, so as to achieve accurate time series prediction.

Disclosure of Invention

The present invention has been made in view of the above problems of timeliness and reliability. Accordingly, the invention provides a time series prediction method, system, device and medium based on stream-batch fusion, which address the difficulty of feature selection in existing approaches and the fact that they predict using only one of real-time stream data or statically stored batch data. The two kinds of data have different characteristics: stream data arrives continuously, in small volumes and with low quality, so relying on stream data alone lacks reliability, whereas relying on historical data alone lacks timeliness and ignores local characteristics.
In order to solve the above technical problems, the invention provides the following technical solutions:

In a first aspect, the invention provides a time series prediction method based on stream-batch fusion, comprising: acquiring a feature set of a multivariate time series, and obtaining a feature subset to be verified by calculating the maximal information coefficient between each feature and a prediction target; obtaining a causal feature subset by performing a stationarity test on the feature subset to be verified and analyzing causal dependencies using the Granger causality test; based on the causal feature subset, determining an optimal lag order, constructing a vector autoregressive prediction model, and estimating the model parameters by training on historical data; based on the trained vector autoregressive prediction model, processing historical batch data and online stream data in parallel to obtain a batch processing prediction result and a stream processing prediction result; and performing weighted fusion of the batch processing prediction result and the stream processing prediction result to obtain a final time series prediction value.

In the time series prediction method based on stream-batch fusion, acquiring the feature set of the multivariate time series and obtaining the feature subset to be verified by calculating the maximal information coefficient between each feature and the prediction target comprises: based on the feature set, calculating the maximal information coefficient between each feature and the prediction target; Arranging the features accord