
CN-122019991-A - Wind power data-oriented multi-scale contrast graph neural network time sequence interpolation method


Abstract

The invention relates to a wind power data-oriented multi-scale contrastive graph neural network time series interpolation method comprising the following steps: Step 1, data preprocessing and graph structure construction; Step 2, constructing a multi-scale contrastive graph neural network model, i.e., an end-to-end neural network model that integrates multi-scale modeling, contrastive learning and graph structure information; Step 3, model training; and Step 4, interpolation and output of missing data. By constructing an end-to-end framework that integrates multi-scale pyramid attention, seasonal-trend contrastive learning and a graph neural network, the method achieves high-precision, high-efficiency interpolation of missing values in wind power time series, and is particularly suited to long-sequence, multivariate wind power scenarios.

Inventors

  • GU HUIJIE
  • HU YAPING
  • Lai Kaiting
  • ZHOU HUAFENG
  • Fan tengfei
  • LI JIN

Assignees

  • 中国南方电网有限责任公司 (China Southern Power Grid Co., Ltd.)

Dates

Publication Date
2026-05-12
Application Date
2025-12-30

Claims (7)

  1. A wind power data-oriented multi-scale contrastive graph neural network time series interpolation method, characterized by comprising the following steps: Step 1, data preprocessing and graph structure construction; Step 2, constructing a multi-scale contrastive graph neural network model, i.e., an end-to-end neural network model that fuses multi-scale modeling, contrastive learning and graph structure information; Step 3, model training; and Step 4, interpolation and output of missing data.
  2. The wind power data-oriented multi-scale contrastive graph neural network time series interpolation method according to claim 1, characterized in that Step 1, data preprocessing and graph structure construction, comprises the following sub-steps. 1-1, Data acquisition and standardization: time series of the wind speed, power and wind direction of a plurality of fans (wind turbines) in a wind farm are acquired, typically at a sampling interval of 10 or 30 minutes, and the raw data are Z-score standardized to eliminate differences in dimension: $\tilde{x}_{i,t,d} = (x_{i,t,d} - \mu_d)/\sigma_d$, where $\tilde{x}_{i,t,d}$ is the standardized value, $x_{i,t,d}$ is the raw measurement of the $d$-th variable of the $i$-th fan at time point $t$, $\mu_d$ and $\sigma_d$ are the mean and standard deviation of the $d$-th variable over all fans and all time points, $N$ is the number of fans, $D$ is the number of variables per fan, and $T$ is the time length. 1-2, Missing-data simulation and dataset division: to train the model, missing values are introduced into a complete dataset at random, either as blocks or as individual points, with a certain probability (2.5%), and the corresponding mask matrix $M \in \{0,1\}^{N \times T \times D}$ is generated: for point missingness, $m_{i,t,d} = 0$ is set at random with probability $p_p$; for block missingness, with probability $p_b$ a start time $t_0$ and a length $l \sim \mathcal{U}(l_{\min}, l_{\max})$ are drawn at random and $l$ consecutive time points are set as missing. The dataset is then divided into training, validation and test sets in the ratio 70% : 10% : 20%. 1-3, Spatial adjacency matrix construction: from the geographic coordinates of the fans, $p_i = (\mathrm{lat}_i, \mathrm{lon}_i)$, the Euclidean distance $d_{ij} = \lVert p_i - p_j \rVert$ is computed and the spatial adjacency matrix is built with the thresholded Gaussian kernel method: $A_{ij} = \exp(-d_{ij}^2 / b^2)$ if $d_{ij} \le r_0$ and $A_{ij} = 0$ otherwise, where $b$ is the bandwidth parameter and $r_0$ is the connection radius. In addition, with the k-nearest-neighbor correlation graph method, the Pearson correlation coefficient $\rho_{ij}$ of each pair of fans over a period is computed and the Top-k correlations of each fan are retained to construct an adjacency matrix: $A_{ij} = \rho_{ij}$ if $j \in \mathrm{TopK}_i(\rho)$ and $A_{ij} = 0$ otherwise. The final adjacency matrix is normalized as $\tilde{A} = S^{-1/2} A S^{-1/2}$ with $S_{ii} = \sum_j A_{ij}$, where $\tilde{A}$ is the normalized adjacency matrix, $A$ is an $N \times N$ matrix representing the connections among the $N$ nodes of the graph, and $S$ is the degree matrix; this captures the spatial dependencies among the fans.
  3. The wind power data-oriented multi-scale contrastive graph neural network time series interpolation method according to claim 2, characterized in that the multi-scale contrastive graph neural network model is constructed as follows. Step 2-1, Multi-scale pyramid attention encoder (PAM Encoder): let the input sequence length be $L$; PAM divides the input sequence into $N_s$ scales and progressively compresses the sequence length by a downsampling factor at each scale, so that the sequence at the $k$-th scale has length $L_k = L / s^{k}$, where $s$ is the stride factor. Within each scale, local dependencies are modeled with an adjacent tree attention mechanism, and features are aggregated across scales through hierarchical skip connections to realize cross-scale information interaction. The pyramid attention output is defined as $Z^{(k)} = \mathrm{MSA}(\mathrm{LN}(W_q Q^{(k)}, W_k K^{(k)}, W_v V^{(k)})) + \mathrm{DownSample}(Z^{(k-1)})$, where MSA is multi-head self-attention, LN is layer normalization, and $\mathrm{DownSample}(\cdot)$ is max pooling or convolutional downsampling. The skip connections provide an $O(1)$ signal propagation path, and the total time complexity is $O(L)$, significantly lower than the $O(L^2)$ of a Transformer. Step 2-2, Seasonal-trend contrastive learning module: (1) a Trend Feature Decoupler (TFD) and a Seasonal Feature Decoupler (SFD) are applied to the intermediate representation $Z$ output by the encoder; (2) the TFD models the trend with a mixture of autoregressive experts, $Z^{trend} = \sum_{e=1}^{E} w_e(Z)\, f_e(Z)$, where $E$ is the number of experts, $w_e(\cdot)$ is the gating weight and $f_e(\cdot)$ is the $e$-th autoregressive expert; the TFD extracts time-domain trend features and is optimized with the time-domain contrastive loss $\mathcal{L}_{time}$, so that different augmentations of the same time series yield consistent trend representations; (3) the SFD extracts frequency-domain seasonal features with a learnable Fourier layer, $Z^{season} = \mathrm{MLP}(\mathcal{A}(\mathcal{F}(Z)), \Phi(\mathcal{F}(Z)))$, where $\mathcal{F}$ is the Fourier transform, $\mathcal{A}$ and $\Phi$ are the amplitude spectrum and phase spectrum respectively, and MLP is a multi-layer perceptron; optimizing the frequency-domain contrastive loss (comprising the amplitude loss $\mathcal{L}_{amp}$ and the phase loss $\mathcal{L}_{phase}$) makes the seasonal patterns robust in the frequency domain; (4) finally, by constraining the temporal correspondence of the trend features in the time domain and the amplitude and phase consistency of the seasonal features in the frequency domain, the trend and seasonal characteristics of the time series are accurately aligned, and matched time series sample pairs are distinguished from unmatched ones. The time-domain contrastive loss acts on the trend representations: $\mathcal{L}_{time} = -\log \frac{\exp(z_1 \cdot \tilde{z}_1 / \tau)}{\sum_{k} \exp(z_1 \cdot \tilde{z}_k / \tau)}$, where $z_1$ is the trend embedding vector of sample 1, $\tilde{z}_k$ is the trend embedding vector of the $k$-th sample (with $\tilde{z}_1$ the embedding of a differently augmented view of sample 1), and $\tau$ is the temperature parameter. The frequency-domain contrastive loss acts on the seasonal representations and consists of the amplitude loss $\mathcal{L}_{amp}$ and the phase loss $\mathcal{L}_{phase}$. The total contrastive loss is $\mathcal{L}_{con} = \mathcal{L}_{time} + \gamma(\mathcal{L}_{amp} + \mathcal{L}_{phase})$, where $\gamma$ is the balancing hyperparameter. Step 2-3, Graph neural network fusion module (Graph Fusion Module): the hidden state $h_i$ of each fan along the time dimension is taken as its graph node feature, $H = [h_1, \dots, h_N]^{\top}$, and spatial information is aggregated with a two-layer graph convolutional network (GCN), $V = \tilde{A}\, \sigma(\tilde{A} H W^{(0)}) W^{(1)}$, where $W^{(0)}, W^{(1)}$ are learnable weight matrices, $\sigma$ is the activation function and $\tilde{A}$ is the normalized adjacency matrix; the output $V = [v_1, \dots, v_N]^{\top}$ is the final spatio-temporal joint representation.
  4. The wind power data-oriented multi-scale contrastive graph neural network time series interpolation method according to claim 2, characterized in that Step 3, model training, is as follows. (1) Main interpolation loss: let the interpolated output of the model be $\hat{X}$ and the ground truth be $X$; the loss is computed only at observed positions, $\mathcal{L}_{rec} = \frac{1}{|\Omega|} \sum_{(i,t,d) \in \Omega} |\hat{x}_{i,t,d} - x_{i,t,d}|$, where $\Omega = \{(i,t,d) \mid m_{i,t,d} = 1\}$ is the set of observed entries; using MAE ensures robustness against outliers. (2) Joint loss: the total loss is the weighted sum of the reconstruction loss and the contrastive loss, $\mathcal{L} = \mathcal{L}_{rec} + \lambda \mathcal{L}_{con}$, where $\lambda > 0$ is a weight coefficient controlling the contribution of contrastive learning. (3) Training strategy: an Adam or AdamW optimizer is used with an initial learning rate of 1e-4 and a cosine annealing learning-rate scheduler; training runs for 200 epochs with early stopping based on validation-set performance; during training, an additional 5% of the input values are randomly masked to improve the model's robustness to noise and missing data; and for the single-step prediction task a pre-training strategy is adopted in which, during the initial epochs, the network is built as an autoencoder that performs reconstruction and prediction simultaneously, forcing it to learn long-range dependencies.
  5. The wind power data-oriented multi-scale contrastive graph neural network time series interpolation method according to claim 2, characterized in that Step 4, interpolation and output of missing data, is as follows: the wind power sequence to be interpolated, $X^{miss}$, which contains missing values, is input into the trained MSC-GNI model, and after encoding, contrastive learning, graph fusion and decoding, the complete reconstructed sequence $\hat{X}$ is output. The decoder is a simple fully connected layer, $\hat{x}_i = W_o v_i + b_o$, where $v_i$ is the final representation of the $i$-th fan.
  6. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor, when executing the program, implements the wind power data-oriented multi-scale contrastive graph neural network time series interpolation method according to any one of claims 1 to 5.
  7. A computer-readable storage medium having stored thereon computer instructions which, when executed by a processor, implement the wind power data-oriented multi-scale contrastive graph neural network time series interpolation method according to any one of claims 1 to 5.
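Sub-steps 1-1 to 1-3 of claim 2 can be sketched in NumPy as below. This is a minimal illustration under stated assumptions, not the patented implementation: all function names are hypothetical, drawing one block of missing values per (fan, variable) pair, the kernel form $\exp(-d_{ij}^2/b^2)$, and clipping negative correlations to zero are assumptions, and because the claim does not say how the distance graph and the correlation graph are merged, each one is simply built and normalized on its own.

```python
import numpy as np

def zscore(x):
    """Z-score each variable d over all fans and time steps; x has shape (N, T, D)."""
    mu = x.mean(axis=(0, 1), keepdims=True)      # mu_d
    sigma = x.std(axis=(0, 1), keepdims=True)    # sigma_d
    return (x - mu) / (sigma + 1e-8), mu, sigma  # epsilon avoids division by zero

def simulate_mask(shape, p_point, p_block, l_min, l_max, rng):
    """Mask M in {0,1}^(N x T x D): 1 = observed, 0 = artificially removed."""
    n, t, d = shape
    m = np.ones(shape, dtype=np.int8)
    m[rng.random(shape) < p_point] = 0           # point missingness with probability p_p
    for i in range(n):
        for k in range(d):
            if rng.random() < p_block:           # assumption: one block draw per (fan, variable)
                length = int(rng.integers(l_min, l_max + 1))  # l ~ U(l_min, l_max)
                t0 = int(rng.integers(0, max(t - length, 0) + 1))
                m[i, t0:t0 + length, k] = 0      # l consecutive time points set missing
    return m

def gaussian_adjacency(coords, b, r0):
    """Thresholded Gaussian kernel on Euclidean distances between (lat, lon) rows."""
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    a = np.exp(-(dist ** 2) / b ** 2)
    a[dist > r0] = 0.0                           # drop edges beyond connection radius r0
    return a

def topk_correlation_adjacency(series, k):
    """Keep each fan's top-k Pearson correlations (clipped at 0); series: (N, T)."""
    c = np.corrcoef(series)
    np.fill_diagonal(c, -np.inf)                 # exclude self-correlation from top-k
    idx = np.argsort(-c, axis=1)[:, :k]
    rows = np.arange(c.shape[0])[:, None]
    a = np.zeros_like(c)
    a[rows, idx] = np.clip(c[rows, idx], 0.0, None)
    return a

def normalize_adjacency(a):
    """Symmetric normalization S^{-1/2} A S^{-1/2} with S = diag(row sums of A)."""
    deg = a.sum(axis=1)
    inv_sqrt = np.zeros_like(deg)
    nz = deg > 0
    inv_sqrt[nz] = 1.0 / np.sqrt(deg[nz])        # isolated nodes keep all-zero rows
    return inv_sqrt[:, None] * a * inv_sqrt[None, :]
```

The Gaussian kernel gives each node a self-loop (distance 0 maps to weight 1), so every degree is positive and the normalization is well defined for the distance graph.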
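The sequence-length schedule of the pyramid encoder in Step 2-1 and the max-pooling variant of $\mathrm{DownSample}(\cdot)$ can be checked with the small sketch below, assuming scale $k$ has length $L/s^k$ and that $L$ is divisible by $s^{N_s-1}$. It deliberately omits the attention terms (adjacent tree attention, MSA, LN); `build_pyramid` and `max_pool_1d` are hypothetical names.

```python
import numpy as np

def max_pool_1d(z, s):
    """Non-overlapping max pooling along time; z has shape (L, F), L divisible by s."""
    length, feats = z.shape
    return z.reshape(length // s, s, feats).max(axis=1)  # groups s consecutive steps

def build_pyramid(z, s, n_scales):
    """Return [Z^(0), ..., Z^(N_s - 1)], where scale k has length L / s**k."""
    levels = [z]
    for _ in range(1, n_scales):
        levels.append(max_pool_1d(levels[-1], s))  # DownSample(Z^(k-1))
    return levels

levels = build_pyramid(np.random.default_rng(0).normal(size=(96, 8)), s=4, n_scales=3)
print([lvl.shape[0] for lvl in levels])  # [96, 24, 6]
```

With $L = 96$ and $s = 4$ the three scales hold 96, 24 and 6 steps. The total number of nodes over all scales is bounded by the geometric series $L \cdot s/(s-1)$, which is one way to see why a pyramid keeps the cost linear in $L$ when each node attends to a fixed number of neighbors.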
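The seasonal-trend contrastive objective of Step 2-2 can be sketched as an InfoNCE loss over a batch, in which the positive for each sample is a second augmented view of the same series and the other samples in the batch act as negatives. Several details are assumptions not stated in the claim: cosine similarity in place of a raw dot product, reuse of the same InfoNCE form for the amplitude and phase spectra (obtained with `numpy.fft.rfft`), and the combination $\mathcal{L}_{time} + \gamma(\mathcal{L}_{amp} + \mathcal{L}_{phase})$. The TFD and SFD networks themselves are not modeled, and all function names are hypothetical.

```python
import numpy as np

def info_nce(anchor, positive, tau=0.1):
    """InfoNCE: row k of `positive` is the positive for row k of `anchor`;
    every other row in the batch is a negative. Inputs have shape (B, F)."""
    a = anchor / np.linalg.norm(anchor, axis=1, keepdims=True)
    p = positive / np.linalg.norm(positive, axis=1, keepdims=True)
    logits = a @ p.T / tau                                # (B, B) cosine similarity / tau
    logits = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_prob)))             # positives lie on the diagonal

def amplitude_phase(x):
    """Amplitude and phase spectra along the time axis; x has shape (B, L)."""
    spec = np.fft.rfft(x, axis=-1)
    return np.abs(spec), np.angle(spec)

def contrastive_loss(trend, trend_aug, season, season_aug, gamma=0.5, tau=0.1):
    """Assumed form: L_con = L_time + gamma * (L_amp + L_phase)."""
    l_time = info_nce(trend, trend_aug, tau)
    amp, pha = amplitude_phase(season)
    amp_aug, pha_aug = amplitude_phase(season_aug)
    l_amp = info_nce(amp, amp_aug, tau)
    l_phase = info_nce(pha, pha_aug, tau)
    return l_time + gamma * (l_amp + l_phase)
```

When the two views of each sample agree closely the diagonal dominates every row and the loss approaches zero; unrelated pairs push it towards $\log B$ or above.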
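The remaining pieces, namely the two-layer GCN of Step 2-3, the fully connected decoder of claim 5, and the loss and schedule of claim 4, reduce to a few array operations. In this hypothetical NumPy sketch ReLU stands in for $\sigma$, the decoder uses the row-vector convention $v_i W_o + b_o$, and the cosine schedule decays to a floor `lr_min` of zero; all three are assumptions. A real implementation would use an autodiff framework with Adam/AdamW rather than these forward-only functions.

```python
import numpy as np

def gcn_two_layer(a_norm, h, w0, w1):
    """V = A_norm @ ReLU(A_norm @ H @ W0) @ W1, with h of shape (N, F_in)."""
    hidden = np.maximum(a_norm @ h @ w0, 0.0)   # assumption: sigma = ReLU
    return a_norm @ hidden @ w1

def decode(v, w_o, b_o):
    """Fully connected decoder: x_hat_i = v_i @ W_o + b_o for every fan i."""
    return v @ w_o + b_o

def masked_mae(x_hat, x, mask):
    """MAE over the observed set Omega = {mask == 1} only."""
    return float(np.abs(x_hat - x)[mask == 1].mean())

def total_loss(l_rec, l_con, lam):
    """L = L_rec + lambda * L_con, with lambda > 0."""
    return l_rec + lam * l_con

def cosine_lr(epoch, lr0=1e-4, total_epochs=200, lr_min=0.0):
    """Cosine annealing from lr0 down to lr_min over total_epochs."""
    return lr_min + 0.5 * (lr0 - lr_min) * (1.0 + np.cos(np.pi * epoch / total_epochs))

def extra_input_mask(mask, rate, rng):
    """Hide an extra `rate` fraction of observed entries from the model input."""
    drop = (rng.random(mask.shape) < rate) & (mask == 1)
    return np.where(drop, 0, mask)
```

Because `masked_mae` indexes only the entries where the mask is 1, a prediction can be arbitrarily wrong at a masked position without changing the loss, which is exactly the "loss only at observed positions" rule of claim 4.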

Description

Wind power data-oriented multi-scale contrast graph neural network time sequence interpolation method

Technical Field

The invention belongs to the technical field of artificial intelligence and energy big-data processing, and particularly relates to a method for interpolating missing values in wind power time series data.

Background

With the advance of China's "dual carbon" goals (carbon peaking and carbon neutrality), the share of wind power generation in the new-type power system keeps increasing. However, wind farms are usually deployed in harsh environments such as remote, high-altitude or offshore sites, where sensors are susceptible to corrosion, icing, communication interruptions and similar factors. As a result, the acquired key operating parameters, such as wind speed, power, wind direction and temperature, contain large numbers of randomly missing points, missing blocks and even missing segments caused by systematic faults. Such incomplete data severely limit the accuracy of advanced applications such as power forecasting, fault diagnosis and condition assessment.

Existing time series interpolation methods mainly include statistical methods (e.g., ARIMA, linear interpolation) and deep-learning methods (e.g., RNNs, Transformers). Statistical methods assume strong stationarity and struggle to capture the complex nonlinear dynamics and multi-scale periodicity (e.g., daily cycles and seasonal fluctuations) of wind power data. RNN-based models (e.g., BRITS) can handle sequential dependencies, but their ability to model long-range dependencies is limited, and because the computation path length grows linearly with the sequence length, they are inefficient. Transformer-based models can capture long-range dependencies through self-attention, but their computational complexity is as high as $O(L^2)$, which incurs heavy computation and memory overhead on long wind power sequences.
In recent years, sparse attention mechanisms (e.g., Informer, LogTrans) and pyramid structures (e.g., Pyraformer) have been proposed to reduce computational complexity. Meanwhile, contrastive learning in self-supervised learning (e.g., CoST) constructs positive and negative samples through data augmentation, learns representations that are invariant to perturbations, and improves the generalization ability of the model. In addition, the fans in a wind farm are geographically correlated and their operating data are spatially dependent, yet most existing interpolation methods ignore this graph structure information. A wind power time series interpolation method is therefore needed that effectively fuses temporal multi-scale features with spatial graph structure information, offers efficient long-sequence modeling, and learns robust representations.

Disclosure of Invention

The invention aims to overcome the shortcomings of the prior art by providing a multi-scale contrastive graph neural network time series interpolation method for wind power data. By constructing an end-to-end framework that integrates multi-scale pyramid attention, seasonal-trend contrastive learning and a graph neural network, the method achieves high-precision, high-efficiency interpolation of missing values in wind power time series and is particularly suited to long-sequence, multivariate wind power scenarios.
Technical solution: to achieve the above object, the invention provides a multi-scale contrastive graph neural network time series interpolation method for wind power data, comprising the following steps:

Step 1, data preprocessing and graph structure construction

(1) Data acquisition and standardization: multivariate time series such as the wind speed, power and wind direction of a plurality of fans in the wind farm are acquired, typically at a sampling interval of 10 or 30 minutes. The raw data are Z-score standardized to eliminate differences in dimension:

$$\tilde{x}_{i,t,d} = \frac{x_{i,t,d} - \mu_d}{\sigma_d}$$

where $\tilde{x}_{i,t,d}$ is the standardized value, $x_{i,t,d}$ is the raw measurement of the $d$-th variable of the $i$-th fan at time point $t$, $\mu_d$ and $\sigma_d$ are the mean and standard deviation of the $d$-th variable over all fans and time points, $N$ is the number of fans, $D$ is the number of variables per fan, and $T$ is the time length.

(2) Missing-data simulation and division: to train the model, missing values are introduced into a complete dataset at random, as blocks or as points, with a certain probability (2.5%), and the corresponding mask matrix $M \in \{0,1\}^{N \times T \times D}$ is generated: $m_{i,t,d} = 0$ is set at random with probability $p_p$; with probability $p_b$, a start time $t_0$ and a length $l \sim \mathcal{U}(l_{\min}, l_{\max})$ are selected at random, setting continuous l ti