CN-122020391-A - ENSO prediction method and prediction system based on time-space efficient probability mechanism

CN 122020391 A

Abstract

The invention provides an ENSO prediction method and system based on a time-space efficient probability mechanism, belonging to the technical field of artificial intelligence. The method comprises the following steps: preprocessing time-series gridded sea surface temperature data into a five-dimensional tensor and dividing it into patches; projecting the features of each patch through a fully connected layer and summing the projected features, temporal position encodings, and learnable spatial embedding vectors; passing the space-time embedded sequence through a layer normalization layer into a Transformer encoder; and having the decoder perform self-attention over its input sequence to fully learn the spatio-temporal information in the sequence, then use the resulting sequence as the Query to interrogate the memory matrix output by the encoder before projecting the output. The deep learning method improves model robustness, greatly reduces computational cost and memory use, and markedly improves ENSO prediction skill.

Inventors

  • Hu Chundi
  • Xu Chunhui
  • Lei Chengjie

Assignees

  • Zhejiang University (浙江大学)

Dates

Publication Date
2026-05-12
Application Date
2026-02-09

Claims (10)

  1. An ENSO prediction method based on a time-space efficient probability mechanism, characterized by comprising the following steps: preprocessing time-series gridded sea surface temperature data into a five-dimensional tensor and dividing it into patches; projecting the features of each patch through a fully connected layer; summing the projected features, the position encodings generated for each time step, and the learnable embedding vectors assigned to each spatial patch, and outputting the embedded sequence through layer normalization; inputting the embedded sequence into an encoder, which outputs a memory matrix rich in global spatio-temporal context information; having the decoder perform self-attention over its input sequence, then use the result as the Query in probabilistic cross-attention to query the memory matrix output by the encoder and obtain historical context information; and outputting the projection, followed by a patch reconstruction operation to produce the final output.
  2. The ENSO prediction method of claim 1, wherein the patch division comprises the following steps: the spatial dimensions of the input tensor are divided using non-overlapping sliding windows, and the pixel values in each window are flattened, changing the shape of the input tensor according to the following formula: (B × T_in × C × H × W) → (B × T_in × (C × k_h × k_w) × S) ... (1) where S = (H / k_h) × (W / k_w).
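The patch division of claim 2 can be sketched as a pair of reshape/transpose operations; the following is a minimal illustrative sketch, not the patent's implementation, and the function name and toy tensor sizes are assumptions.

```python
import numpy as np

def patchify(x, k_h, k_w):
    """Reshape (B, T, C, H, W) -> (B, T, C*k_h*k_w, S) with S = (H/k_h)*(W/k_w),
    as in formula (1), using non-overlapping k_h x k_w windows."""
    B, T, C, H, W = x.shape
    assert H % k_h == 0 and W % k_w == 0, "grid must divide evenly into windows"
    # Split H into (window rows, k_h) and W into (window cols, k_w)
    x = x.reshape(B, T, C, H // k_h, k_h, W // k_w, k_w)
    # Group the window-content axes (C, k_h, k_w) together, window indices last
    x = x.transpose(0, 1, 2, 4, 6, 3, 5)           # (B, T, C, k_h, k_w, H/k_h, W/k_w)
    return x.reshape(B, T, C * k_h * k_w, -1)      # flatten pixels, then windows

# Hypothetical example: 12 months of single-channel 24x72 SST grids
sst = np.random.randn(2, 12, 1, 24, 72)
patches = patchify(sst, k_h=4, k_w=8)
print(patches.shape)                               # (2, 12, 32, 54): S = (24/4)*(72/8)
```

Each column of the last axis now holds one spatial patch, ready for the fully connected projection described in claim 1.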
  3. The ENSO prediction method of claim 2, wherein the position encoding generated for each time step uses the Transformer's native sine-cosine functions to encode the sequence of time steps; for time-step position pos and dimension index i: PE_{(pos, 2i)} = sin(pos / 10000^{2i/d_model}) ... (2) PE_{(pos, 2i+1)} = cos(pos / 10000^{2i/d_model}) ... (3)
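Equations (2)-(3) are the standard sinusoidal encoding; a minimal sketch (function name and sizes are assumptions):

```python
import numpy as np

def sincos_positional_encoding(T_in, d_model):
    """PE[pos, 2i] = sin(pos / 10000^(2i/d_model)),
    PE[pos, 2i+1] = cos(pos / 10000^(2i/d_model)), as in eqs. (2)-(3)."""
    pos = np.arange(T_in)[:, None]               # (T_in, 1) time-step positions
    two_i = np.arange(0, d_model, 2)[None, :]    # even dimension indices 2i
    angle = pos / np.power(10000.0, two_i / d_model)
    pe = np.zeros((T_in, d_model))
    pe[:, 0::2] = np.sin(angle)                  # even dims: sine
    pe[:, 1::2] = np.cos(angle)                  # odd dims: cosine
    return pe

# Hypothetical sizes: 12 input time steps, model width 64
pe = sincos_positional_encoding(12, 64)
```

This matrix is what gets summed with the projected patch features and the learnable spatial embeddings before layer normalization.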
  4. The ENSO prediction method of claim 3, wherein the overall formula of the encoder is: Encoder(X) = LayerNorm( L_{N_enc}( ... L_2( L_1(X) ) ... ) ) ... (5) Each encoder layer comprises two sub-layers, wherein sub-layer 1 is a probabilistic self-attention layer and sub-layer 2 is a feed-forward network: L_i(X) = LayerNorm_2( X + FFN( LayerNorm_1( X + ProbAttention(X) ) ) ) ... (6) The probabilistic self-attention layer reshapes the input features so that the time series of each spatial patch is treated as an independent sequence, and temporal attention is computed over these sequences in parallel; the feed-forward network consists of two linear transformation layers with an activation function and a Dropout layer between them.
  5. The ENSO prediction method according to claim 4, wherein the operation mechanism of the probabilistic self-attention layer of sub-layer 1 comprises the following steps: the input tensor is split into H heads, and keys are randomly sampled from the complete K, with the number of samples u_part = min( factor × ⌈log L_k⌉, L_k ) ... (11) A subset of queries with the largest information content is then screened by computing an importance score M: a small-scale dot product between all queries Q and the sampled keys K_sample yields a preliminary attention score matrix, and for each query the difference between the maximum and the mean of its score row is taken as its importance score: M = max(Q · K_sample^T) − (1 / u_part) × sum(Q · K_sample^T) ... (12) According to the importance score M, the top-u queries are retained to form Q_reduce: u = min( factor × ⌈log L_q⌉, L_q ) ... (13) Sparse attention weights are computed using only Q_reduce and the complete K: Attention_weights = softmax( (Q_reduce · K^T) / √d_head ) ... (14) The context matrix V_init is initialized as: V_init = { mean(V), without masking; cumsum(V), with masking } ... (15) Only the context positions corresponding to Q_reduce are updated with the computed attention weights: C_final[ M_top ] = Attention_weights · V ... (16) where M_top is the index set of the top-ranked queries, and the context at other positions is kept unchanged; the outputs of the multiple heads are re-concatenated, transformed in dimension, restored to (B × S) × T × d_model, and finally passed through a Dropout layer.
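The steps of claim 5 can be sketched for a single head without masking as follows; this is an illustrative sketch under assumptions (single head, mean-initialized context, hypothetical names), not the patented implementation:

```python
import numpy as np

def prob_sparse_attention(Q, K, V, factor=5, rng=None):
    """Single-head sketch of eqs. (11)-(16): sample keys, score queries,
    attend only with the top-u queries, fall back to mean(V) elsewhere."""
    rng = np.random.default_rng(0) if rng is None else rng
    L_q, d = Q.shape
    L_k = K.shape[0]
    # (11): randomly sample u_part keys from the complete K
    u_part = min(factor * int(np.ceil(np.log(L_k))), L_k)
    K_sample = K[rng.choice(L_k, u_part, replace=False)]
    # (12): importance score M = row max - row mean of the sampled scores
    scores_sample = Q @ K_sample.T                      # (L_q, u_part)
    M = scores_sample.max(axis=1) - scores_sample.mean(axis=1)
    # (13): retain the top-u queries as Q_reduce
    u = min(factor * int(np.ceil(np.log(L_q))), L_q)
    M_top = np.argsort(M)[-u:]
    Q_reduce = Q[M_top]
    # (14): softmax attention using only Q_reduce and the complete K
    A = Q_reduce @ K.T / np.sqrt(d)
    A = np.exp(A - A.max(axis=1, keepdims=True))
    A = A / A.sum(axis=1, keepdims=True)
    # (15)-(16): initialize context to mean(V), update only the top positions
    context = np.tile(V.mean(axis=0), (L_q, 1))
    context[M_top] = A @ V
    return context

rng = np.random.default_rng(1)
Q, K, V = rng.standard_normal((3, 16, 8))   # toy sequence of length 16, width 8
ctx = prob_sparse_attention(Q, K, V)
```

Because only u = O(log L_q) queries attend over the full key set, the dominant cost drops from O(L_q × L_k) to O(L_k log L_q), which is the source of the claimed compute and memory savings.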
  6. The ENSO prediction method of claim 4, wherein the feed-forward network of sub-layer 2 operates according to the following formulas: output = x · gate + out · (1 − gate) ... (7) gate = sigmoid( sigmoid(x) + sigmoid( Normalization(out − x) ) ) ... (8) where x is the sub-layer input and out is the sub-layer output.
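The gated combination of eqs. (7)-(8) can be sketched as below; note that the patent does not specify which normalization eq. (8) uses, so the layer-norm-style normalization of the residual here is an assumption, as are the function names.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gated_residual(x, out, eps=1e-6):
    """output = x*gate + out*(1-gate), with
    gate = sigmoid(sigmoid(x) + sigmoid(Normalization(out - x))), eqs. (7)-(8)."""
    delta = out - x
    # Assumed reading of "Normalization": standardize the residual feature-wise
    norm = (delta - delta.mean(axis=-1, keepdims=True)) / (
        delta.std(axis=-1, keepdims=True) + eps)
    gate = sigmoid(sigmoid(x) + sigmoid(norm))
    return x * gate + out * (1.0 - gate)

x = np.random.randn(4, 8)
out = x + np.tanh(x)          # stand-in for the feed-forward sub-layer output
y = gated_residual(x, out)
```

The gate interpolates element-wise between the sub-layer input and output; when out equals x the residual vanishes and the layer passes x through unchanged.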
  7. The ENSO prediction method of claim 1, wherein the decoder is symmetrical to the encoder, with the overall formula: Decoder(Y, Memory) = LayerNorm( L_{N_dec}( ... L_2( L_1(Y, Memory) ) ... ) ) ... (9) Each decoder layer comprises three sub-layers, as shown in equation (10): L_i(Y, M) = LayerNorm_3( Y + FFN( LayerNorm_2( Y + CrossAttention( LayerNorm_1( Y + SelfAttention(Y) ), M ) ) ) ) ... (10) Sub-layer 1 is a probabilistic self-attention layer, sub-layer 2 is a probabilistic cross-attention layer, and sub-layer 3 is a feed-forward network.
  8. A prediction system implementing the ENSO prediction method according to any one of claims 1 to 7, comprising: a division module for preprocessing time-series gridded sea surface temperature data into a five-dimensional tensor for patch division, projecting the features of each patch through a fully connected layer, summing the projected features, the position encodings generated for each time step, and the learnable embedding vectors assigned to each spatial patch, and outputting the embedded sequence through layer normalization; an encoding module for inputting the embedded sequence into an encoder, which outputs a memory matrix rich in global spatio-temporal context information; and a decoding module for having the decoder perform self-attention over its input sequence, use the result as the Query in probabilistic cross-attention to query the memory matrix output by the encoder and obtain historical context information, then output the projection, and finally produce the output after a patch reconstruction operation.
  9. A computer-readable storage medium having a computer program stored thereon, characterized in that the program, when executed, implements the steps of the ENSO prediction method according to any one of claims 1-7.
  10. A computer device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the steps of the ENSO prediction method according to any one of claims 1-7 when executing said program.

Description

ENSO prediction method and prediction system based on time-space efficient probability mechanism

Technical Field

The invention relates to the field of abnormal weather prediction, in particular to an ENSO prediction method and an ENSO prediction system based on a time-space efficient probability mechanism.

Background

The El Niño–Southern Oscillation (ENSO) is a large-scale ocean-atmosphere coupled phenomenon occurring in the tropical Pacific, marked by abnormal warming (El Niño) or cooling (La Niña) of its sea surface temperature. As the strongest interannual variability signal in the Earth's climate system, ENSO profoundly affects the global climate through atmospheric teleconnection processes and can trigger extreme weather events including storms, floods, droughts, and heat waves, directly influencing agricultural yields, water resource management, ecosystem balance, and even socio-economic stability. For the past half century, scientists have relied primarily on two broad classes of models for ENSO prediction: dynamical models and statistical models. A dynamical model is a numerical framework constructed from the physical equations (e.g., the Navier-Stokes equations and thermodynamic equations) that govern the ocean-atmosphere system, ranging in form from simplified conceptual models (e.g., the recharge oscillator) to complex fully coupled general circulation models (CGCMs). These models simulate and predict the onset, evolution, and decay of an ENSO event by solving a system of partial differential equations. The advantage of dynamical models is that they have a clear physical mechanism and a solid theoretical basis, and can simulate the intrinsic variability of the climate system starting from first principles.
However, their limitation is that the models have inherent systematic biases, such as the "excessive cold tongue" bias and erroneous representation of ocean-atmosphere feedback processes common in the tropical Pacific, and these errors are amplified as the integration time increases, severely affecting prediction accuracy. Minor errors in the ocean and atmosphere initial fields are one of the major sources of prediction uncertainty; despite improvements in data assimilation techniques, accurate acquisition of the initial field remains a significant challenge. High-resolution fully coupled models also require massive computation, which severely limits their capability for large-scale ensemble forecasting, long-term integration, and high-resolution simulation, and makes it difficult to meet the real-time requirements of operational forecasting. A statistical model does not solve the physical equations directly; instead, it establishes an empirical relationship between predictors and the predicted object (e.g., the Niño 3.4 index) using statistical methods (e.g., regression analysis, principal component analysis, machine learning) based on historical observation data. Statistical models have the advantages of high computational efficiency and sensitivity to statistical regularities in historical data. Their limitation is the dependence on data length: the reliability of the model hinges on sufficiently long, high-quality observational records. The problem of insufficient data is particularly pronounced for predicting decadal-scale changes or rare extreme events. A further inherent methodological constraint is that traditional linear statistical methods struggle to capture the complex nonlinear characteristics of ENSO evolution; their extrapolation capability is limited, and predictive performance degrades significantly when the climate system enters a new state not covered by the training data.
The physical mechanism is also ambiguous: such a model operates more as a "black box" or "gray box", lacks a clear explanation of the underlying physical processes, and can hardly provide physical insight. In recent years, with the rapid development of artificial intelligence technology, deep learning models have been successfully introduced into the field of ENSO prediction and have shown great potential. Unlike traditional statistical models, deep learning models can automatically learn complex nonlinear mapping relationships from massive climate-model simulation data and observation data. Overview of existing deep learning techniques: 1. Convolutional neural networks (CNNs) are widely used for processing spatially gridded data such as sea surface temperature fields and air pressure fields, and can effectively extract local spatial features. However, the inherent local connectivity of CNNs makes it difficult to capture long-range spatio-temporal dependencies in long sequences, and information decay problems are likely to occur. 2. Graph convolutional networks (GCNs) model the climate system as a graph structure and can process non-Euclidean data, but their performance depends largely on a predefined or stati