CN-121983966-A - Single-station photovoltaic power prediction method based on vision and language multi-mode fusion

CN121983966A

Abstract

The invention discloses a single-station photovoltaic power prediction method based on vision and language multimodal fusion, in the technical field of new-energy generation power prediction. The method begins with step S1, collecting the target photovoltaic power station's historical power data, numerical weather forecast data, satellite cloud-image data for the corresponding times, and weather text description data. By fully integrating these four modalities (historical power data, numerical weather forecast data, satellite cloud-image visual data, and weather-text semantic data), the method breaks the single-data-source limitation of traditional approaches, comprehensively captures the various factors influencing photovoltaic power, provides richer information support for prediction, and markedly improves prediction accuracy.

Inventors

  • FAN HANG
  • WANG SHUAIKANG
  • ZHANG ZIEN
  • JIA HEPING
  • XU XIAOFENG
  • LIU DUNNAN
  • WANG HANFU
  • Run Wencai
  • TAN XIAOWEI

Assignees

  • North China Electric Power University (Baoding)

Dates

Publication Date
2026-05-05
Application Date
2026-02-08

Claims (10)

  1. A single-station photovoltaic power prediction method based on vision and language multimodal fusion, characterized by comprising the following steps: S1, collecting historical power data, numerical weather forecast data, satellite cloud-image data at the corresponding times, and weather text description data of a target photovoltaic power station; S2, normalizing the historical power data and the numerical weather forecast data to form numerical data; S3, extracting time-series feature vectors from the numerical data with a time-series backbone network, where a Patch embedding layer divides the long sequence into locally overlapping patches whose dimensional features are extracted by linear projection; S4, extracting visual feature vectors of the satellite cloud-image data and semantic feature vectors of the meteorological text description data with a trained CLIP model; S5, weighting and fusing the time-series, visual, and semantic feature vectors through an adaptive modal gating mechanism to generate the final predicted power sequence; and S6, constructing a loss function and performing iterative optimization with a three-stage progressive strategy.
  2. The single-station photovoltaic power prediction method based on vision and language multimodal fusion according to claim 1, wherein in S2 the specific formula of the normalization processing is: x_norm = (x - μ) / (σ + ε); wherein: x_norm is the normalized value; x is the original input data; μ is the mean of the input sequence; σ is the standard deviation of the input sequence; and ε is a normalization constant.
  3. The single-station photovoltaic power prediction method based on vision and language multimodal fusion according to claim 1, wherein in S3 the specific formula of the linear projection is: z_i = W x_i + b + E_i; wherein: z_i is the embedding vector of the i-th patch; x_i is the i-th slice of the original sequence; W is the projection weight matrix; b is a bias term; and E_i is a learnable position code.
  4. The single-station photovoltaic power prediction method based on vision and language multimodal fusion according to claim 1, wherein in S4 the CLIP model extracts the visual feature vector of the satellite cloud-image data and the semantic feature vector of the meteorological text description data respectively; specifically, the extracted feature sequence is fed into a Transformer encoder, dynamic dependencies are captured with a self-attention mechanism, three weight matrices are obtained through different matrix transformations, the attention score matrix of the input data is computed, the attention scores are normalized into attention weights, the value matrix is weighted and summed to obtain a context-dependent feature representation, and a feedforward neural network applies a nonlinear transformation to extract further features.
  5. The single-station photovoltaic power prediction method based on vision and language multimodal fusion according to claim 4, wherein the three weight matrices are specifically a query vector matrix, a key vector matrix, and a value vector matrix, with the specific formulas: Q = X W_Q; K = X W_K; V = X W_V; wherein: Q is the query vector matrix; K is the key vector matrix; V is the value vector matrix; and W_Q, W_K, and W_V are the corresponding learnable weight matrices.
  6. The single-station photovoltaic power prediction method based on vision and language multimodal fusion according to claim 4, wherein the specific formulas of the attention score matrix are: S = Q K^T / √d_k; Z = softmax(S) V; wherein: Z is the feature output of the self-attention layer; S is the attention score matrix; and √d_k is the scaling factor determined by the feature dimension.
  7. The single-station photovoltaic power prediction method based on vision and language multimodal fusion according to claim 4, wherein the feedforward neural network is specifically: Z' = W_2 σ(W_1 Z + b_1) + b_2; wherein: Z' is the final feature output of the encoder; W_1 and W_2 are the weights of the feedforward neural network; b_1 and b_2 are the biases; and σ(·) is the activation function.
  8. The single-station photovoltaic power prediction method based on vision and language multimodal fusion according to claim 1, wherein in S5 the adaptive modal gating mechanism specifically computes the gating weight as: g = σ(W_g [h_ts ; h_vl] + b_g); wherein: g is the gating weight; h_ts is the feature output of the time-series backbone; and h_vl is the joint vision-text feature; the final fusion feature and the prediction result are then generated by the specific formulas: h = h_ts + λ · g ⊙ h_vl; ŷ = W_p h + b_p; wherein: h is the final fusion feature; λ is the multimodal scaling factor; ŷ is the power prediction sequence for future times; and W_p and b_p are the weights and biases of the prediction head.
  9. The single-station photovoltaic power prediction method based on vision and language multimodal fusion according to claim 1, wherein in S6 the specific formula of the loss function is: L = (1/n) Σ_{i=1}^{n} (y_i - ŷ_i)²; wherein: L is the loss function value; y_i is the true power value; ŷ_i is the predicted power value; and n is the prediction horizon length.
  10. The single-station photovoltaic power prediction method based on vision and language multimodal fusion according to claim 1, wherein in S6 the three-stage progressive strategy is specifically: in the first stage, freezing the multimodal branch parameters and optimizing only the time-series backbone parameters with the loss function, establishing basic prediction capability; in the second stage, freezing the time-series backbone parameters and optimizing the multimodal encoder and gating parameters to learn vision-language features; and in the third stage, unfreezing all parameters and collaboratively optimizing the model with differential learning rates. When the validation loss stops decreasing for a set number of rounds, an early-stopping mechanism is triggered and iteration stops.
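As a reading aid, the formulas claimed above (normalization in claim 2, scaled dot-product attention in claims 5-6, the feedforward transform in claim 7, gated fusion in claim 8, and the loss in claim 9) can be sketched in plain NumPy. This is a minimal illustration under stated assumptions, not the patented implementation: the matrix shapes, the ReLU activation, and the gate parameters W_g, b_g and scaling factor λ follow common practice where the claims leave details open.

```python
import numpy as np

def normalize(x, eps=1e-8):
    # Claim 2: z-score normalization with a small constant for stability.
    return (x - x.mean()) / (x.std() + eps)

def softmax(s):
    e = np.exp(s - s.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def self_attention(x, w_q, w_k, w_v):
    # Claims 5-6: Q/K/V projections, scaled dot-product scores, weighted sum.
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])   # attention score matrix S
    return softmax(scores) @ v                # context-dependent features Z

def feed_forward(z, w1, b1, w2, b2):
    # Claim 7: two-layer FFN; ReLU is an assumed activation choice.
    return np.maximum(z @ w1 + b1, 0.0) @ w2 + b2

def gated_fusion(h_ts, h_vl, w_g, b_g, lam=0.5):
    # Claim 8: sigmoid gate over concatenated features, scaled residual fusion.
    g = 1.0 / (1.0 + np.exp(-(np.concatenate([h_ts, h_vl]) @ w_g + b_g)))
    return h_ts + lam * g * h_vl

def mse_loss(y_true, y_pred):
    # Claim 9: mean squared error over the n-step prediction horizon.
    return np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2)
```

With a feature width of 8, the projection matrices are (8, 8) and the gate matrix (16, 8); the gate adaptively scales the joint vision-language features before they are added to the time-series branch, which is the "weighted fusion" of claim 8.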

Description

Single-station photovoltaic power prediction method based on vision and language multi-mode fusion

Technical Field

The invention relates to the technical field of new-energy generated power prediction, in particular to a single-station photovoltaic power prediction method based on vision and language multimodal fusion.

Background

With the acceleration of the energy transition, large-scale grid connection of photovoltaic generation makes the operating characteristics of the power system more complex and poses greater challenges to photovoltaic power prediction. For grid dispatchers, photovoltaic power prediction is a core link of smart-grid dispatch: accurate predictions help grid enterprises optimize generation plans, guide efficient consumption of new energy, support power balance, and ensure safe and reliable operation of the power system. Traditional photovoltaic power prediction methods rely mainly on historical power data and numerical weather forecast data, a single data modality. In multimodal data fusion, existing methods often lack information support from additional dimensions: technicians typically adopt simple concatenation or fixed-weight fusion, which cannot adaptively adjust each modality's contribution according to the quality and relevance of its data, so the fusion effect is poor and prediction accuracy suffers. The model training process also lacks an effective iterative optimization strategy, making it difficult to balance convergence speed against prediction performance, and one is often sacrificed for the other.
Disclosure of Invention

The embodiment of the invention provides a single-station photovoltaic power prediction method based on vision and language multimodal fusion, aiming to solve the problems that the data modality is single, the fusion effect is poor, and the convergence speed and prediction performance of the model are difficult to balance. To achieve the above purpose, the embodiment of the present invention adopts the following technical scheme. A single-station photovoltaic power prediction method based on vision and language multimodal fusion comprises the following steps: S1, collecting historical power data, numerical weather forecast data, satellite cloud-image data at the corresponding times, and weather text description data of a target photovoltaic power station; S2, normalizing the historical power data and the numerical weather forecast data to form numerical data; S3, extracting time-series feature vectors from the numerical data with a time-series backbone network, where a Patch embedding layer divides the long sequence into locally overlapping patches whose dimensional features are extracted by linear projection; S4, extracting visual feature vectors of the satellite cloud-image data and semantic feature vectors of the meteorological text description data with a trained CLIP model; S5, weighting and fusing the time-series, visual, and semantic feature vectors through an adaptive modal gating mechanism to generate the final predicted power sequence; and S6, constructing a loss function and performing iterative optimization with a three-stage progressive strategy.
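The S3 patch-embedding step above can be sketched as follows. The patch length, stride, and embedding width are illustrative choices, not values from the disclosure, and the randomly initialized matrices stand in for parameters the network would learn.

```python
import numpy as np

def patch_embed(seq, patch_len=16, stride=8, d_model=32, seed=0):
    """Divide a long 1-D series into locally overlapping patches, linearly
    project each patch to d_model dimensions, and add a (here randomly
    initialized) learnable position code, as in step S3."""
    rng = np.random.default_rng(seed)
    starts = range(0, len(seq) - patch_len + 1, stride)
    patches = np.stack([seq[i:i + patch_len] for i in starts])  # (n_patches, patch_len)
    w = rng.standard_normal((patch_len, d_model)) * 0.02        # projection matrix W
    b = np.zeros(d_model)                                       # bias term b
    pos = rng.standard_normal((patches.shape[0], d_model)) * 0.02  # position codes E_i
    return patches @ w + b + pos

# A 96-step series (e.g. one day at 15-minute resolution) as input:
emb = patch_embed(np.sin(np.linspace(0.0, 10.0, 96)))
```

With the defaults, a 96-step series yields an (11, 32) embedding: eleven patches of length 16 taken at stride 8 (so adjacent patches overlap by half), each mapped to 32 dimensions.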
Further, in S2, the specific formula of the normalization processing is: x_norm = (x - μ) / (σ + ε); wherein: x_norm is the normalized value; x is the original input data; μ is the mean of the input sequence; σ is the standard deviation of the input sequence; and ε is a normalization constant. Further, in S3, the specific formula of the linear projection is: z_i = W x_i + b + E_i; wherein: z_i is the embedding vector of the i-th patch; x_i is the i-th slice of the original sequence; W is the projection weight matrix; b is a bias term; and E_i is a learnable position code. Further, in S4, the CLIP model extracts the visual feature vector of the satellite cloud-image data and the semantic feature vector of the meteorological text description data respectively; specifically, the extracted feature sequence is fed into a Transformer encoder, dynamic dependencies are captured with a self-attention mechanism, three weight matrices are obtained through different matrix transformations, the attention score matrix of the input data is computed, the attention scores are normalized into attention weights, the value matrix is weighted and summed to obtain a context-dependent feature representation, and a feedforward neural network applies a nonlinear transformation to extract further features. Further, the three weight matrices are specifically a query vector matrix, a key vector matrix, and a value vector matrix, and the specific formulas are as follows