CN-121981325-A - Hybrid time-series network power consumption prediction method with reinforcement-learning dynamic calibration

Abstract

A power consumption prediction method based on a hybrid time-series network with reinforcement-learning dynamic calibration. The method comprises: collecting multivariate power consumption time-series data; preprocessing the data through resampling, feature engineering, normalization and a sliding-window technique to generate a supervised learning sample set, which is divided into a training set, a validation set and a test set; constructing a hybrid prediction model comprising a dynamic capture module, a long-term dependence modeling module, a regression prediction module and a reinforcement-learning dynamic fine-tuning module, and training and optimizing it to obtain the final model; and collecting multivariate power consumption time-series data in real time, preprocessing it, predicting the future total consumption in real time with the model, and outputting the final prediction result. The method enables accurate and efficient prediction of grid load and provides a reliable technical solution for scenarios such as power system dispatch optimization, demand-side management and market trading.

Inventors

JIANG XIAOJING
SU RUIXIAN
ZHONG YUMENG
XUE JINGYANG
DONG SHIJIAN
YANG SHIYU
XU HUIMIN
LIU XIAONA
SHAO YIQIN

Assignees

沈阳华盛冶金技术与装备有限责任公司

Dates

Publication Date: 20260505
Application Date: 20260120

Claims (8)

1. A hybrid time-series network power consumption prediction method with reinforcement-learning dynamic calibration, characterized by comprising the following steps: Step one, constructing and preprocessing a data set: collecting multivariate power consumption time-series data, completing data preprocessing through resampling, feature engineering, normalization and a sliding-window technique, generating a supervised learning sample set, and dividing the sample set into a training set, a validation set and a test set in chronological order; Step two, constructing a hybrid prediction model that fuses long-term features with dynamic deviation-correction decisions: S21, constructing a dynamic capture module using a bidirectional long short-term memory (LSTM) network to capture short-term local dependence and context information of the time-series data in both directions; S22, introducing a Transformer encoder to construct a long-term dependence modeling module that captures global correlations across time scales in the time-series data; S23, constructing a regression prediction module based on a multi-layer perceptron network that maps the global feature sequence to a preliminary power consumption prediction value; S24, constructing a reinforcement-learning dynamic fine-tuning module based on a deep Q-network (DQN) that adaptively calibrates the preliminary prediction value and corrects the prediction deviation; S25, training and optimizing with a staged training strategy, in which the first training stage jointly optimizes the dynamic capture module, the long-term dependence modeling module and the regression prediction module, and the second training stage optimizes the reinforcement-learning dynamic fine-tuning module, so as to ensure parameter convergence of each module and finally obtain the hybrid prediction model; Step three, predicting power consumption in real time: collecting multivariate power consumption time-series data in real time, preprocessing it, taking the preprocessed data as input, predicting the future total consumption in real time with the hybrid prediction model, and outputting the final prediction result after dynamic fine-tuning.
2. The hybrid time-series network power consumption prediction method with reinforcement-learning dynamic calibration according to claim 1, wherein the data set construction and preprocessing in step one are as follows: S11, collecting high-frequency raw power consumption data comprising the power consumption of multiple customers and the corresponding timestamps, ensuring that the data cover complete power consumption scenarios and time dimensions; S12, resampling and aggregating the data: the collected high-frequency raw data are resampled and aggregated by the hour, which smooths short-term random noise at the minute level and below, highlights the periodic patterns in the hour, day and week dimensions, and reduces data redundancy; S13, constructing periodic features from the timestamp information: the daily-period and weekly-period variables are converted into two pairs of continuous features by sine/cosine encoding using formula (1) and formula (2), and are combined with the customer power consumption data to form a multidimensional feature matrix; $x_{\sin} = \sin(2\pi t / T)$ (1); $x_{\cos} = \cos(2\pi t / T)$ (2); where $x_{\sin}$ is the sine-encoded feature value, $t$ is the current time value, $x_{\cos}$ is the cosine-encoded feature value, and $T$ is the period length; S14, normalizing the data: all feature data in the multidimensional feature matrix are normalized with a Min-Max scaler that maps all values to the interval [0,1], eliminating the influence of differing feature scales on model training; and S15, sliding-window sampling: rolling sampling is performed on the normalized time-series feature sequence with a fixed-length sliding window that advances one time step at a time; each window corresponds to one sample pair, in which the historical time-series data inside the window serve as the input features and the total power consumption at the next time step outside the window serves as the target value, thereby forming the supervised learning sample set.
3. The hybrid time-series network power consumption prediction method with reinforcement-learning dynamic calibration according to claim 2, wherein in step S21 the dynamic capture module is constructed as follows: S21-1, a bidirectional long short-term memory network is taken as the core, comprising a forward LSTM and a backward LSTM; the forward LSTM processes the input sequence from the start to the end of the time series, and the backward LSTM processes the input sequence in reverse from the end to the start, thereby capturing context information in both directions; S21-2, each LSTM cell uses a gating mechanism composed of a forget gate $f_t$, an input gate $i_t$ and an output gate $o_t$ to dynamically update the cell state $c_t$ and the hidden state $h_t$ at time $t$ according to formula (3) and formula (4) respectively; $c_t = f_t \odot c_{t-1} + i_t \odot \tanh(W_c [h_{t-1}, x_t] + b_c)$ (3); $h_t = o_t \odot \tanh(c_t)$ (4); where $f_t = \sigma(W_f [h_{t-1}, x_t] + b_f)$, $i_t = \sigma(W_i [h_{t-1}, x_t] + b_i)$ and $o_t = \sigma(W_o [h_{t-1}, x_t] + b_o)$, $\sigma$ is the Sigmoid activation function, $W_f$, $W_i$, $W_c$ and $W_o$ are the learnable weights of the forget gate, input gate, cell state and output gate respectively, $b_f$, $b_i$, $b_c$ and $b_o$ are the learnable bias vectors of the forget gate, input gate, cell state and output gate respectively, $h_{t-1}$ is the hidden state vector at time $t-1$, $x_t$ is the input feature vector at time $t$, $[\cdot,\cdot]$ is the vector concatenation operation, $\odot$ is element-wise multiplication, and $c_{t-1}$ is the cell state vector at time $t-1$; S21-3, the hidden state sequence $\overrightarrow{H}$ output by the forward LSTM and the hidden state sequence $\overleftarrow{H}$ output by the backward LSTM are concatenated time step by time step to obtain an enhanced feature representation sequence $H$ that fuses the context information of both directions and serves as the input to the subsequent module.
4. The hybrid time-series network power consumption prediction method with reinforcement-learning dynamic calibration according to claim 3, wherein in step S22 the long-term dependence modeling module operates as follows: S22-1, taking the enhanced feature sequence $H$ output by the dynamic capture module as input, a query matrix $Q$, a key matrix $K$ and a value matrix $V$ are generated through three independent linear projection layers, ensuring that the three have consistent dimensions so as to meet the requirements of the attention calculation; S22-2, several attention heads are computed in parallel through a multi-head self-attention mechanism, capturing long-distance dependencies from different feature subspaces, each attention head computing a weighted feature according to formula (5); $\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left(Q K^{\top} / \sqrt{d_k}\right) V$ (5); where $K^{\top}$ is the transpose of the key matrix, $d_k$ is the dimension of the key vectors, $\sqrt{d_k}$ is the scaling factor, and $\mathrm{softmax}$ is the normalization function; S22-3, the outputs of all attention heads are concatenated horizontally according to formula (6) and integrated through a linear projection layer to obtain the final output of the multi-head self-attention mechanism; $\mathrm{MultiHead}(Q, K, V) = \mathrm{Concat}(\mathrm{head}_1, \ldots, \mathrm{head}_h) W^{O}$ (6); where $\mathrm{head}_i$ is the output matrix of the $i$-th attention head, $h$ is the number of attention heads, $\mathrm{Concat}$ is the horizontal matrix concatenation operation, and $W^{O}$ is the learnable weight matrix of the output projection layer; S22-4, the multi-head attention output is passed in sequence through a residual connection, layer normalization and a feedforward neural network to form a complete Transformer encoder layer, and several encoder layers are stacked to strengthen the model's ability to capture global long-distance dependence and to output the final global feature sequence.
5. The hybrid time-series network power consumption prediction method with reinforcement-learning dynamic calibration according to claim 4, wherein in step S23 the regression prediction module is constructed as follows: S23-1, extracting the hidden state vector $h_T$ output at the last time step of the Transformer encoder as a summary representation of the input sequence, this vector integrating the local and global features of the entire historical time-series data; S23-2, inputting the high-dimensional feature vector $h_T$ into a multi-layer perceptron network composed of several fully connected layers, and mapping it through nonlinear transformations to a single scalar value, namely the preliminary predicted value $\hat{y}_{t+1}$ of the total power consumption at the next moment, as shown in formula (7); $\hat{y}_{t+1} = w_o^{\top} z_L + b_o$ (7); where $w_o$ and $b_o$ are the weight vector and scalar bias of the output layer respectively, and $z_L$ is the output vector of the $L$-th hidden layer, computed from $z_l = \phi(W_l z_{l-1} + b_l)$, in which $z_l$ and $z_{l-1}$ are the output vectors of the $l$-th and $(l-1)$-th hidden layers of the MLP respectively, $W_l$ and $b_l$ are the weight matrix and bias vector of the $l$-th hidden layer respectively, and $\phi$ is a nonlinear activation function.
6. The hybrid time-series network power consumption prediction method with reinforcement-learning dynamic calibration according to claim 5, wherein in step S24 the reinforcement-learning dynamic fine-tuning module is constructed as follows: S24-1, modeling the prediction fine-tuning process as a Markov decision process and defining the agent's state $s_t$ as the final hidden state vector $h_T$ extracted by the regression prediction module, as shown in formula (8); $s_t = h_T$ (8); S24-2, designing a finite action space $A = \{\text{up}, \text{hold}, \text{down}\}$ containing three discrete actions that are applied to the preliminary predicted value according to formula (9), where up corresponds to a positive adjustment, hold leaves the value unchanged, and down corresponds to a negative adjustment; $\tilde{y}_{t+1} = \begin{cases} \hat{y}_{t+1} + \delta, & a_t = a^{+} \\ \hat{y}_{t+1}, & a_t = a^{0} \\ \hat{y}_{t+1} - \delta, & a_t = a^{-} \end{cases}$ (9); where $a^{+}$ applies a positive adjustment of fixed step size $\delta$ to $\hat{y}_{t+1}$, $a^{0}$ is the hold action, which does not adjust $\hat{y}_{t+1}$, and $a^{-}$ applies a negative adjustment of fixed step size $\delta$ to $\hat{y}_{t+1}$; S24-3, adopting a deep Q-network as the agent's policy network, which takes the state $s_t$ as input and outputs the estimated Q value of each discrete action, namely $Q(s_t, a; \theta)$, where $\theta$ denotes the learnable parameters of the DQN; S24-4, constructing a multi-component hybrid reward function $r_t$ according to formula (10) to guide the agent to learn an optimal fine-tuning strategy that minimizes the prediction deviation while avoiding excessive adjustment; (10); where $y_{t+1}$ is the true value, $\hat{y}_{t+1}$ is the preliminary predicted value, $\tilde{y}_{t+1}$ is the adjusted predicted value, and $\lambda$ is a hyperparameter that balances short-term benefit against the long-term objective.
7. The hybrid time-series network power consumption prediction method with reinforcement-learning dynamic calibration according to claim 6, wherein in step S25 the training and optimization with the staged training strategy proceed as follows: S25-1, in the first stage, basic model pre-training, the DQN parameters are fixed and the parameters of the dynamic capture module, the long-term dependence modeling module and the regression prediction module are jointly trained and optimized with the mean squared error as the loss function, the objective function $\mathcal{L}_{\mathrm{MSE}}$ being constructed according to formula (11); $\mathcal{L}_{\mathrm{MSE}} = \frac{1}{N} \sum_{i=1}^{N} \left( f(X_i) - y_i \right)^2$ (11); where $f$ is the basic prediction model, $X_i$ is the input sequence, $y_i$ is the true value, and $N$ is the number of training samples; S25-2, in the second stage, reinforcement learning training, the parameters of the basic model obtained after the first stage has converged are fixed, an experience replay pool $\mathcal{D}$ is constructed to store the agent's interaction samples $(s, a, r, s')$, and the DQN network parameters $\theta$ are updated with the temporal-difference error as the loss function, the loss function $\mathcal{L}(\theta)$ being constructed according to formula (12); $\mathcal{L}(\theta) = \mathbb{E}_{(s, a, r, s') \sim \mathcal{D}} \left[ \left( r + \gamma \max_{a'} Q(s', a'; \theta^{-}) - Q(s, a; \theta) \right)^2 \right]$ (12); where $\mathbb{E}$ is the expectation operator, denoting the average over the samples in the experience replay pool $\mathcal{D}$, $(s, a, r, s')$ is a single sample in the experience replay pool, $s$ is the current state, $a$ is the action performed in the current state, $r$ is the immediate reward obtained after performing the action, $s'$ is the next state reached after performing the action, $\gamma$ is the discount factor with value range [0,1], $\max_{a'}$ selects, among all possible next actions $a'$, the one that maximizes the target Q value, $Q(s', a'; \theta^{-})$ is the Q value output by the target Q network, and $\theta^{-}$ denotes the parameters of the target Q network.
8. The hybrid time-series network power consumption prediction method with reinforcement-learning dynamic calibration according to claim 7, wherein in step three the process of predicting the future total consumption in real time with the hybrid prediction model and outputting the final prediction result after dynamic fine-tuning is as follows: S31, performing resampling, periodic feature engineering and normalization on the multivariate power consumption time-series data acquired in real time to generate real-time feature data that meet the input requirements of the model; S32, inputting the preprocessed real-time feature data into the trained hybrid prediction model and outputting the final result after basic prediction and dynamic fine-tuning; S32-1, inputting the real-time feature data $X_t$ into the basic prediction model to obtain the preliminary predicted value $\hat{y}_{t+1}$ for time $t+1$, and at the same time extracting the corresponding final hidden state vector $h_T$; S32-2, inputting the state vector $s_t$ into the DQN agent whose training has converged, and selecting the optimal action $a_t^{*}$ at time $t$ according to the Q-value maximization principle, as shown in formula (13); $a_t^{*} = \arg\max_{a \in A} Q(s_t, a; \theta^{*})$ (13); where $\theta^{*}$ denotes the optimized parameters of the DQN network after training has converged, and $Q(s_t, a; \theta^{*})$ is the Q value of executing action $a$ in state $s_t$; S32-3, adjusting the preliminary predicted value $\hat{y}_{t+1}$ according to the optimal action $a_t^{*}$ to obtain the final predicted value $\tilde{y}_{t+1}$, converting $\tilde{y}_{t+1}$ back to the original data scale through Min-Max inverse normalization, and outputting it as the final power consumption prediction result.
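
As an illustration of steps S13 to S15 in claim 2, the following minimal Python sketch builds sine/cosine periodic features, applies Min-Max scaling and cuts stride-one sliding windows. It is not the applicant's implementation: the function names, the synthetic two-customer data, the 24-step window and the placement of total consumption in column 2 are all assumptions made for the example.

```python
import numpy as np

def cyclic_encode(t, period):
    """Sine/cosine encoding of time value t with period length `period` (formulas (1)-(2))."""
    angle = 2.0 * np.pi * np.asarray(t, dtype=float) / period
    return np.sin(angle), np.cos(angle)

def min_max_scale(x):
    """Column-wise Min-Max scaling to [0, 1]; also returns (min, max) for later inversion."""
    lo, hi = x.min(axis=0), x.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)  # guard against constant columns
    return (x - lo) / span, lo, hi

def sliding_windows(features, target, window):
    """Stride-1 windows: X[i] = features[i:i+window], y[i] = target[i+window]."""
    n = len(features) - window
    X = np.stack([features[i:i + window] for i in range(n)])
    y = target[window:window + n]
    return X, y

# Hypothetical hourly data: 2 customers over 72 hours (synthetic values).
hours = np.arange(72)
rng = np.random.default_rng(0)
load = rng.uniform(1.0, 5.0, size=(72, 2))
total = load.sum(axis=1, keepdims=True)

# Two pairs of periodic features: daily (T = 24) and weekly (T = 168).
day_sin, day_cos = cyclic_encode(hours % 24, 24)
week_sin, week_cos = cyclic_encode(hours % 168, 168)
matrix = np.column_stack([load, total, day_sin, day_cos, week_sin, week_cos])

scaled, lo, hi = min_max_scale(matrix)
# Column 2 (total consumption) is the prediction target in this sketch.
X, y = sliding_windows(scaled, scaled[:, 2], window=24)
```

Each `X[i]` holds 24 consecutive hourly feature rows and `y[i]` is the normalized total consumption of the hour that follows. A chronological split into training, validation and test sets would then slice `X` and `y` by index rather than shuffling them.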

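The calibration steps in claims 6 to 8 can be sketched in the same hedged way. The NumPy code below shows the fixed-step adjustment of formula (9), the greedy action choice of formula (13) and the squared temporal-difference loss of formula (12). The one-hidden-layer Q-function, the parameter shapes, the step size `delta`, the discount `gamma` and all numeric values are assumptions for illustration; the reward is passed in as data because the form of formula (10) is not given in the claim text.

```python
import numpy as np

ACTIONS = ("up", "hold", "down")  # action space A of step S24-2

def apply_action(y_prelim, action_idx, delta):
    """Formula (9): shift the preliminary prediction by +delta, 0 or -delta."""
    return y_prelim + (delta, 0.0, -delta)[action_idx]

def q_values(state, params):
    """Toy Q-function with one hidden ReLU layer (a stand-in for the DQN of S24-3)."""
    w1, b1, w2, b2 = params
    hidden = np.maximum(0.0, state @ w1 + b1)
    return hidden @ w2 + b2  # one Q value per action

def greedy_action(state, params):
    """Formula (13): choose the action with the largest Q value."""
    return int(np.argmax(q_values(state, params)))

def td_loss(batch, params, target_params, gamma):
    """Formula (12): mean squared TD error over a replay batch (s, a, r, s')."""
    s, a, r, s_next = batch
    q_sa = q_values(s, params)[np.arange(len(a)), a]
    target = r + gamma * q_values(s_next, target_params).max(axis=1)
    return float(np.mean((target - q_sa) ** 2))

def init_params(rng, state_dim, hidden_dim, n_actions=3):
    """Small random weights and zero biases (hypothetical initialization)."""
    return (rng.normal(0.0, 0.1, (state_dim, hidden_dim)), np.zeros(hidden_dim),
            rng.normal(0.0, 0.1, (hidden_dim, n_actions)), np.zeros(n_actions))

rng = np.random.default_rng(0)
params = init_params(rng, state_dim=8, hidden_dim=16)
target_params = tuple(p.copy() for p in params)  # target network copy

state = rng.normal(size=8)   # stands in for the hidden state vector h_T
y_prelim = 0.62              # normalized preliminary prediction (made up)
a_star = greedy_action(state, params)
y_final = apply_action(y_prelim, a_star, delta=0.01)

# A made-up replay batch of four transitions; rewards are random placeholders.
batch = (rng.normal(size=(4, 8)), np.array([0, 1, 2, 1]),
         rng.normal(size=4), rng.normal(size=(4, 8)))
loss = td_loss(batch, params, target_params, gamma=0.9)
```

In a full training loop the loss would be minimized by gradient descent on `params` while `target_params` is refreshed only periodically, and `y_final` would be mapped back to the original scale by Min-Max inverse normalization as in step S32-3.
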
Description

Hybrid time-series network power consumption prediction method with reinforcement-learning dynamic calibration

Technical Field

The invention belongs to the technical field of data processing and intelligent prediction, and in particular relates to a hybrid time-series network power consumption prediction method with reinforcement-learning dynamic calibration.

Background

Power consumption data interleave complex short-term local dependencies with nonlinear dynamics. On microscopic time scales such as minutes and hours, power consumption is easily disturbed by random and sudden factors. Instantaneous residential consumption, the starting and stopping of industrial production lines, sudden peaks in commercial activity, and short-term extreme weather such as thunderstorms and cold waves all cause the power load sequence to exhibit pronounced high-frequency fluctuation, nonlinear dynamics and strong temporal dependence. Accurately capturing these fast-varying local dynamic patterns, which are closely tied to the current and adjacent moments, is the basis for high-precision short-term prediction. Although the conventional recurrent neural network (RNN) and its variants, such as long short-term memory (LSTM), are designed for sequence data, information passed along the time dimension tends to attenuate when they face extremely long sequences, and their sequential processing mechanism limits parallel computing efficiency. As a result, prediction accuracy declines as the time span grows, and computing power requirements and computation time are high.

Power system data also contain long-term global correlations and multiple periodic patterns spanning very large time scales. On a macroscopic time scale, power consumption data exhibit clear, superimposed periodic features: a daily peak-valley pattern in units of days, a weekday-weekend difference pattern in units of weeks, and a seasonal pattern in units of years (consumption peaks in winter and summer and is flat in spring and autumn). Complex global correlations also exist between distant time points. For example, the load curve of the current Monday is highly similar to that of a Monday several weeks earlier, and major holidays such as the Spring Festival and National Day not only change the consumption pattern on the day itself but also exert a profound, non-local influence on consumption during the pre-holiday preparation period and the post-holiday recovery period. Effectively capturing such long-term global dependence, with its large time spans and discontinuity, is key to improving the accuracy and reliability of medium- and long-term prediction. Sequence models such as LSTM rely on step-by-step iterative information transfer, so they have an inherent bottleneck in modeling long-distance dependence and struggle to establish direct, efficient associations between distant time points.

Existing prediction models also treat the prediction as the end point and lack the capability for dynamic adaptive calibration and correction. Most deep learning prediction models adopt a static end-to-end training and prediction paradigm: once the model structure is determined and training is complete, the internal parameters and prediction logic are frozen. However, any model is a simplification and approximation of the real world, and systematic or random deviations are inevitable owing to the inherent limitations of the model's structural assumptions, noise that cannot be completely removed from the training data, and the continuing influence of unmodeled hidden variables such as sudden events, system aging and process updates. Traditional error correction methods (such as linear regression correction based on historical error statistics, simple moving average filtering and Kalman filtering) use simple models with limited learning ability and cannot combine real-time prediction context with latent deviation patterns to achieve intelligent, dynamic and nonlinear adjustment. Consequently, the robustness of the model drops significantly in the face of data distribution drift or unknown anomalies, and prediction performance deteriorates sharply.

At present, although power consumption prediction methods are plentiful, each class of method shows obvious shortcomings in handling the three challenges above. The linearity and stationarity assumptions of statistical models (such as the autoregressive integrated moving average (ARIMA) model and exponential smoothing) are too strict, nonlinear dynamic characteristics with compli