
CN-122020137-A - Rainfall prediction method based on climate partition and LGBM model


Abstract

The invention discloses a precipitation prediction method based on climate zoning and LGBM (LightGBM) models, relating to the technical field of meteorological data processing. The method comprises: acquiring precipitation-related multi-source data covering a target area and constructing a structured raw data set; partitioning the structured raw data set according to the climate-zone rules of the target area and applying precipitation-scenario-specific preprocessing to generate a standardized precipitation modeling data set for each climate zone; training an LGBM precipitation prediction model on each zone's standardized data set to obtain an optimal precipitation prediction model matched to each climate zone; acquiring precipitation-related data of the target area for the period to be predicted, partitioning it by climate zone, and feeding each part into the corresponding optimal model to generate per-zone precipitation predictions; and fusing the per-zone predictions by spatial position into a complete precipitation prediction distribution map of the target area. By fusing multi-source precipitation-related data and adapting and optimizing the model for each zone, the method yields a precipitation prediction scheme suited to complex climate and terrain characteristics.

Inventors

  • ZHAO PENGGUO
  • LIU YONG
  • LUAN QINGZU

Assignees

  • Chengdu University of Information Technology (成都信息工程大学)

Dates

Publication Date
2026-05-12
Application Date
2026-04-15

Claims (10)

  1. A precipitation prediction method based on climate zoning and an LGBM model, characterized by comprising the following steps: S1, acquiring precipitation-related multi-source data covering a target area, and constructing a structured raw data set; S2, dividing the structured raw data set by climate zone according to the climate-zone rules of the target area, performing precipitation-scenario-specific data preprocessing, and generating a standardized precipitation modeling data set for each climate zone; S3, training an LGBM precipitation prediction model on the standardized precipitation modeling data set of each climate zone, independently tuning the core hyperparameters of each zone's LGBM model by Bayesian optimization, and validating the models with time-series cross-validation, to generate an optimal precipitation prediction model matched to each climate zone; S4, acquiring precipitation-related data of the target area for the period to be predicted, dividing the data by climate zone, inputting each part into the corresponding optimal precipitation prediction model to generate a precipitation prediction result for each climate zone, and fusing the results by spatial position to form a precipitation prediction distribution map of the complete target area.
  2. The precipitation prediction method based on climate zoning and an LGBM model according to claim 1, wherein the precipitation-related multi-source data include weather station observation data, satellite remote sensing data, atmospheric reanalysis data, and geographic auxiliary data; the precipitation-related multi-source data are screened to obtain valid data, the valid data are classified into several subsets by data type, and a three-dimensional structured attribute comprising timestamp, spatial longitude/latitude, and factor value is constructed for each subset to obtain the structured raw data set; the structured raw data set comprises a core target data subset, a thermal factor data subset, a dynamic factor data subset, a cloud microphysical factor data subset, and an underlying-surface and auxiliary factor data subset.
  3. The precipitation prediction method based on climate zoning and an LGBM model according to claim 2, wherein the precipitation-scenario-specific data preprocessing specifically comprises: S21, performing spatiotemporal alignment and resolution unification on the structured raw data set, unifying the time windows and spatial resolutions of all subsets to the precipitation observation standard while preserving terrain details, and defining the correspondence between each subset's interpolation method and its original resolution; S22, cleaning the precipitation data in the aligned and resolution-unified core target data subset using threshold screening and stratified sampling; S23, deriving dynamic-factor synergy features from the aligned and resolution-unified dynamic factor data subset; S24, filling missing values in the subsets processed in steps S21 to S23; S25, performing factor standardization and global spatiotemporal fusion on all subsets processed in step S24 to obtain single-table structured data; S26, partitioning the single-table structured data to obtain the standardized precipitation modeling data set of each climate zone.
  4. The precipitation prediction method based on climate zoning and an LGBM model according to claim 3, wherein step S21 specifically comprises: unifying the time window of all subset data to the daily scale matching the precipitation observation period, and unifying the spatial resolution to a preset resolution, where a subset whose resolution is lower than the preset resolution is refined by bilinear interpolation and a subset whose resolution is higher than or equal to the preset resolution is coarsened by nearest-neighbour interpolation, while terrain and land-use details are preserved; precipitation-scenario-specific factors are then added to the resolution-unified underlying-surface and auxiliary factor data subset.
  5. The precipitation prediction method based on climate zoning and an LGBM model according to claim 4, wherein the resolution-unified underlying-surface and auxiliary factor data subset comprises DEM data at the preset resolution, a topographic relief $F_h$ is calculated from this DEM data, and the topographic relief is used as a precipitation-scenario-specific factor; $F_h$ equals the maximum elevation within a rectangular window minus the minimum elevation within the same window, where the rectangular window is the range of neighbouring DEM grid cells at the preset resolution.
  6. The precipitation prediction method based on climate zoning and an LGBM model according to claim 3, wherein the dynamic factor data subset comprises the wind speed WS and wind direction WD from the ERA5 data set, and the dynamic-factor synergy feature is derived as follows: a wind speed-wind direction cofactor $F_{ws\_wd}$, which quantifies water vapour transport intensity, is calculated from WS and WD as $F_{ws\_wd} = \cos[WS \times (WD - 180^\circ)]$.
  7. The precipitation prediction method based on climate zoning and an LGBM model according to claim 3, wherein missing values are filled by combining the climate-zone mean with inverse distance weighting (IDW) interpolation: first, the same-period mean of the factor within the climate zone to which the missing point belongs is taken as the initial value of the missing value; this value is then corrected by IDW using the valid factor values of the 3 surrounding grid cells at the preset resolution, so that the filled value V equals the sum of the products of the 3 surrounding cells' valid factor values and the corresponding inverse squared distances, divided by the sum of the inverse squared distances: $V = \frac{\sum_{i=1}^{3} v_i / d_i^2}{\sum_{i=1}^{3} 1 / d_i^2}$, where $v_i$ denotes the valid factor value of the $i$-th surrounding grid cell and $d_i$ denotes the distance from the $i$-th surrounding grid cell to the location of the missing value.
  8. The precipitation prediction method based on climate zoning and an LGBM model according to claim 3, wherein the single-table structured data are obtained as follows: dimensional normalization, in which Z-score normalization is applied to all factors in all subsets processed in step S24 except the core target data subset, suiting the normal-distribution characteristics of precipitation factors and preventing extreme values from distorting the data distribution; and global spatiotemporal fusion, in which the processed thermal factor, dynamic factor, cloud microphysical factor, and underlying-surface and auxiliary factor data subsets, as factor columns, are fused with the target column of the core target data subset on the daily timestamp and the latitude-longitude grid at the preset resolution to form the single-table structured data.
  9. The precipitation prediction method based on climate zoning and an LGBM model according to claim 8, wherein generating the optimal precipitation prediction model for each climate zone in step S3 specifically comprises: S31, constructing an LGBM base model that retains LGBM's native core mechanisms of Gradient-based One-Side Sampling (GOSS) and Exclusive Feature Bundling (EFB), and initializing its training parameters; S32, adjusting the retention ratio of large-gradient samples and the sampling ratio of small-gradient samples in GOSS according to the precipitation characteristics of the standardized precipitation modeling data set, and bundling the high-dimensional factors of that data set by category of precipitation physical mechanism, to obtain the LGBM precipitation prediction model; S33, setting precipitation-scenario search ranges for the core hyperparameters of the LGBM precipitation prediction model and optimizing iteratively by Bayesian optimization with the objectives of minimizing the root mean square error (RMSE) and maximizing the coefficient of determination (R²); S34, verifying the validity of the optimal hyperparameter combination by time-series ten-fold cross-validation, ensuring that every training set precedes its validation set in time; S35, retraining the LGBM precipitation prediction model on the full training set with the validated optimal hyperparameter combination to obtain the optimal precipitation prediction model.
  10. The precipitation prediction method based on climate zoning and an LGBM model according to claim 2, further comprising the steps of: S5, based on the preset climate-zoning rules, performing zone-by-zone verification on the standardized precipitation modeling data set and the optimal precipitation prediction model of each climate zone, and outputting targeted interpretation results of the zonal precipitation mechanisms; S6, embedding the optimal precipitation prediction model in a SHAP interpretation framework, performing multidimensional SHAP analysis on the standardized precipitation modeling data set of each climate zone, and outputting targeted SHAP interpretation results; S7, structurally integrating the zonal precipitation-mechanism interpretation results with the SHAP interpretation results to build a global-versus-regional comparison system of precipitation driving mechanisms, and identifying both the driving rules common to precipitation across climate zones and the zone-specific regulating mechanisms.
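
Step S4 of claim 1 routes each part of the prediction-period data to its own zone model and then fuses the outputs by position. A minimal Python sketch of that routing, assuming each trained zone model can be called as a plain function of a feature vector and that zone membership can be looked up from grid-cell coordinates; `predict_by_zone`, `zone_of`, and the toy models are hypothetical names, not part of the patent:

```python
from typing import Callable, Dict, List, Tuple

Cell = Tuple[float, float]  # (lat, lon) of a grid-cell centre
Model = Callable[[List[float]], float]  # stand-in for a trained zone model


def predict_by_zone(
    features: Dict[Cell, List[float]],
    zone_of: Callable[[Cell], str],
    models: Dict[str, Model],
) -> Dict[Cell, float]:
    """Split cells by climate zone, predict with that zone's model, fuse by location."""
    # Partition the prediction-period grid by climate zone
    by_zone: Dict[str, List[Cell]] = {}
    for cell in features:
        by_zone.setdefault(zone_of(cell), []).append(cell)

    # Each zone uses only its own model; all results land on one shared map
    fused: Dict[Cell, float] = {}
    for zone, cells in by_zone.items():
        model = models[zone]
        for cell in cells:
            fused[cell] = model(features[cell])
    return fused


def zone_of(cell: Cell) -> str:
    """Hypothetical zoning rule: split at 30 degrees N."""
    return "south" if cell[0] < 30.0 else "north"


models = {"south": lambda x: 2.0 * x[0], "north": lambda x: x[0]}
grid = {(25.0, 100.0): [3.0], (35.0, 100.0): [3.0]}
print(predict_by_zone(grid, zone_of, models))  # {(25.0, 100.0): 6.0, (35.0, 100.0): 3.0}
```

Keying the fused result by cell position means the per-zone outputs tile the target area directly into one distribution map, provided every cell belongs to exactly one zone.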
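
Claim 2 gives every subset the same three-part structure (timestamp, longitude/latitude, factor value). One way to sketch that record type and the classification into the five named subsets follows. It assumes a hypothetical factor-name mapping (`SUBSET_OF`) and a simple validity screen (drop unknown factors and NaN values), neither of which the claim specifies:

```python
import math
from dataclasses import dataclass
from datetime import date
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class FactorRecord:
    """One structured record: timestamp, location, factor value."""
    day: date
    lat: float
    lon: float
    factor: str
    value: float


# Hypothetical factor-to-subset mapping; the claim does not list factor names
SUBSET_OF = {
    "precip": "core_target",
    "t2m": "thermal",
    "ws": "dynamic",
    "wd": "dynamic",
    "cloud_water": "cloud_microphysical",
    "dem": "underlying_auxiliary",
}


def build_subsets(records: Iterable[FactorRecord]) -> Dict[str, List[FactorRecord]]:
    """Screen records for validity, then classify them into subsets by data type."""
    subsets: Dict[str, List[FactorRecord]] = {s: [] for s in SUBSET_OF.values()}
    for rec in records:
        subset = SUBSET_OF.get(rec.factor)
        # Assumed screening rule: drop unknown factors and missing (NaN) values
        if subset is None or math.isnan(rec.value):
            continue
        subsets[subset].append(rec)
    return subsets
```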
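
Step S22 of claim 3 names threshold screening and stratified sampling but gives no thresholds. The sketch below assumes a 0 to 500 mm/day plausibility range and stratifies by the common 24 h rainfall grades (light from 0.1 mm, moderate from 10 mm, heavy from 25 mm, rainstorm from 50 mm). The bounds, the per-grade sampling fractions, and the function names are illustrative assumptions:

```python
import random
from typing import Dict, List, Optional

# Assumed 24 h intensity grades (mm/day); the claim does not list its thresholds
GRADES = [(0.1, "light"), (10.0, "moderate"), (25.0, "heavy"), (50.0, "rainstorm")]


def grade(p: float) -> str:
    """Return the highest grade whose lower bound p reaches ("dry" below 0.1)."""
    label = "dry"
    for lower, name in GRADES:
        if p >= lower:
            label = name
    return label


def clean_and_stratify(
    values: List[float],
    max_valid: float = 500.0,
    frac: Optional[Dict[str, float]] = None,
    seed: int = 0,
) -> Dict[str, List[float]]:
    """Threshold-screen daily precipitation, then sample separately within each grade."""
    frac = frac or {}
    # Threshold screening: drop negative or implausibly large values (bounds assumed)
    valid = [p for p in values if 0.0 <= p <= max_valid]
    strata: Dict[str, List[float]] = {}
    for p in valid:
        strata.setdefault(grade(p), []).append(p)
    # Stratified sampling: each grade keeps its own fraction (default: keep all)
    rng = random.Random(seed)
    return {
        name: rng.sample(group, round(len(group) * frac.get(name, 1.0)))
        for name, group in strata.items()
    }
```

Sampling within grades lets abundant dry or light-rain days be thinned without discarding the rare heavy-rain samples.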
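
Claim 4 ties the interpolation method to the source resolution. Bilinear interpolation is used when a subset is coarser than the preset grid. Nearest-neighbour interpolation is used when the subset is at least as fine. The following NumPy sketch applies that rule on a regular 2-D grid. It assumes cell-centred alignment and treats the target as finer only when it has more cells along both axes. It does not attempt the terrain and land-use detail preservation the claim also mentions, and `regrid` is a hypothetical name:

```python
import numpy as np


def _src_coords(n_src: int, n_dst: int) -> np.ndarray:
    """Fractional source indices of destination cell centres (clipped to the grid)."""
    x = (np.arange(n_dst) + 0.5) * n_src / n_dst - 0.5
    return np.clip(x, 0, n_src - 1)


def regrid(field: np.ndarray, shape: tuple) -> np.ndarray:
    """Bilinear when refining a coarser field, nearest-neighbour otherwise."""
    ny, nx = field.shape
    ys, xs = _src_coords(ny, shape[0]), _src_coords(nx, shape[1])
    if shape[0] > ny and shape[1] > nx:
        # Source resolution lower than the preset resolution: bilinear interpolation
        y0, x0 = np.floor(ys).astype(int), np.floor(xs).astype(int)
        y1, x1 = np.minimum(y0 + 1, ny - 1), np.minimum(x0 + 1, nx - 1)
        wy, wx = (ys - y0)[:, None], (xs - x0)[None, :]
        top = field[np.ix_(y0, x0)] * (1 - wx) + field[np.ix_(y0, x1)] * wx
        bottom = field[np.ix_(y1, x0)] * (1 - wx) + field[np.ix_(y1, x1)] * wx
        return top * (1 - wy) + bottom * wy
    # Source resolution at or above the preset resolution: nearest-neighbour
    return field[np.ix_(np.rint(ys).astype(int), np.rint(xs).astype(int))]
```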
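
The relief factor F_h of claim 5 is a moving-window range filter on the DEM. The sketch below assumes a 3 × 3 neighbourhood, since the claim says only "adjacent grid range", and uses edge padding at the borders:

```python
import numpy as np


def relief(dem: np.ndarray, size: int = 3) -> np.ndarray:
    """Topographic relief F_h = max - min elevation in a size x size window."""
    if size % 2 == 0:
        raise ValueError("window size must be odd so it centres on a cell")
    r = size // 2
    # Edge padding repeats border values, so max/min only see real neighbours
    padded = np.pad(dem.astype(float), r, mode="edge")
    ny, nx = dem.shape
    out = np.empty((ny, nx))
    for i in range(ny):
        for j in range(nx):
            window = padded[i:i + size, j:j + size]
            out[i, j] = window.max() - window.min()
    return out
```

Because edge padding only repeats existing border elevations, a border cell's relief is computed over the neighbours that actually exist. For large grids, a vectorized sliding-window filter would replace the explicit loops.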
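
The wind speed-wind direction cofactor of claim 6 can be transcribed directly. This sketch implements the formula exactly as the claim writes it, F_ws_wd = cos[WS × (WD - 180°)]. The claim does not state units, so evaluating the product in degrees before taking the cosine is an assumption:

```python
import numpy as np


def ws_wd_cofactor(ws, wd_deg):
    """F_ws_wd = cos[WS x (WD - 180 deg)], as written in claim 6."""
    ws = np.asarray(ws, dtype=float)
    wd = np.asarray(wd_deg, dtype=float)
    # Assumption: WS x (WD - 180) is read as an angle in degrees
    return np.cos(np.deg2rad(ws * (wd - 180.0)))


print(ws_wd_cofactor([1.0, 1.0, 2.0], [180.0, 0.0, 270.0]))
```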
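
The gap-filling rule of claim 7 reduces to a zone-mean initial value followed by an inverse-distance-squared weighted average. The sketch below fills a single missing cell. It assumes the caller supplies the surrounding cells' values and distances (the claim uses 3 neighbours). As an added assumption, it keeps the zone mean when no valid neighbour is available:

```python
import math
from typing import Sequence


def fill_missing(zone_mean: float, values: Sequence[float], distances: Sequence[float]) -> float:
    """Fill one missing grid value: zone mean as the initial value, refined by IDW."""
    v = zone_mean  # initial value: same-period climate-zone mean
    # Keep neighbours with a valid value and a positive distance
    pairs = [(vi, di) for vi, di in zip(values, distances) if not math.isnan(vi) and di > 0]
    if pairs:
        # V = sum(v_i / d_i^2) / sum(1 / d_i^2): inverse-distance-squared weights
        v = sum(vi / di ** 2 for vi, di in pairs) / sum(1.0 / di ** 2 for _, di in pairs)
    return v  # falls back to the zone mean when no valid neighbour exists (assumption)


print(fill_missing(5.0, [10.0, 0.0], [1.0, 2.0]))  # 8.0
```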
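
Claim 8 standardizes every factor column with a Z-score, leaves the precipitation target unscaled, and joins all columns on the daily timestamp and grid cell. The following small sketch shows both operations using NumPy and plain dictionaries. It assumes records are keyed by hypothetical `(day, lat, lon)` tuples and that an unmatched factor value becomes NaN:

```python
from typing import Dict, List, Tuple

import numpy as np

Key = Tuple[str, float, float]  # (ISO day, lat, lon) on the preset grid


def standardize(column: Dict[Key, float]) -> Dict[Key, float]:
    """Z-score one factor column: (x - mean) / std."""
    keys = list(column)
    x = np.array([column[k] for k in keys], dtype=float)
    sd = x.std()
    z = (x - x.mean()) / sd if sd > 0 else x - x.mean()  # constant column -> zeros
    return dict(zip(keys, z.tolist()))


def fuse(target: Dict[Key, float], factors: Dict[str, Dict[Key, float]]) -> List[dict]:
    """Join standardized factor columns to the precipitation target on (day, lat, lon)."""
    scaled = {name: standardize(col) for name, col in factors.items()}
    rows = []
    for key in sorted(target):
        day, lat, lon = key
        row = {"day": day, "lat": lat, "lon": lon}
        for name, col in scaled.items():
            row[name] = col.get(key, float("nan"))  # unmatched cell -> NaN (assumption)
        row["precip"] = target[key]  # target column is not standardized
        rows.append(row)
    return rows
```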
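
Claim 9 relies on LightGBM's native GOSS and EFB mechanisms, Bayesian hyperparameter search against RMSE and R², and time-ordered ten-fold validation. The sketch below covers the parts that can be shown without external dependencies:

- a starting parameter dictionary using LightGBM's `data_sample_strategy`, `top_rate`, `other_rate`, and `enable_bundle` parameters (the values are assumptions);
- an expanding-window fold generator in which every training window precedes its validation window;
- the two metrics.

The Bayesian search loop itself is not shown. The claim does not say how the RMSE and R² objectives are combined:

```python
import math
from typing import Dict, Iterator, List, Sequence, Tuple


def base_params(top_rate: float = 0.2, other_rate: float = 0.1) -> Dict[str, object]:
    """Starting LightGBM parameters with GOSS and EFB enabled (values assumed)."""
    return {
        "objective": "regression",
        "data_sample_strategy": "goss",  # GOSS sampling (LightGBM >= 4.0)
        "top_rate": top_rate,  # retention ratio of large-gradient samples
        "other_rate": other_rate,  # sampling ratio of small-gradient samples
        "enable_bundle": True,  # Exclusive Feature Bundling (on by default)
        "metric": "rmse",
    }


def time_series_folds(n: int, n_splits: int = 10) -> Iterator[Tuple[List[int], List[int]]]:
    """Expanding-window folds over time-sorted indices; training precedes validation."""
    fold = n // (n_splits + 1)
    if fold == 0:
        raise ValueError("need at least n_splits + 1 samples")
    start = n - n_splits * fold  # size of the first training window
    for k in range(n_splits):
        end = start + k * fold
        yield list(range(end)), list(range(end, end + fold))


def rmse(y: Sequence[float], p: Sequence[float]) -> float:
    """Root mean square error."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, p)) / len(y))


def r2(y: Sequence[float], p: Sequence[float]) -> float:
    """Coefficient of determination."""
    mean = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, p))
    ss_tot = sum((a - mean) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot
```

`data_sample_strategy` was introduced in LightGBM 4.0. Older releases select GOSS with `boosting="goss"` instead.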
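
Step S7 of claim 10 contrasts the SHAP results across zones to separate shared drivers from zone-specific ones. The sketch below shows one hypothetical way to make that comparison concrete. It assumes the mean absolute SHAP value of each factor has already been computed per zone, and it uses top-k overlap as the comparison rule. Zone and factor names are illustrative:

```python
from typing import Dict, Set, Tuple


def compare_drivers(
    importance: Dict[str, Dict[str, float]], top_k: int = 3
) -> Tuple[Set[str], Dict[str, Set[str]]]:
    """Find drivers shared by every zone's top-k and drivers unique to one zone."""
    # Top-k factors per zone, ranked by mean |SHAP| value
    top = {
        zone: set(sorted(scores, key=scores.get, reverse=True)[:top_k])
        for zone, scores in importance.items()
    }
    common = set.intersection(*top.values()) if top else set()
    # Zone-specific: in this zone's top-k but in no other zone's top-k
    specific = {
        zone: feats - set().union(*(f for z, f in top.items() if z != zone))
        for zone, feats in top.items()
    }
    return common, specific
```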

Description

Technical Field

The invention relates to the technical field of meteorological data processing, and in particular to a precipitation prediction method based on climate zoning and LGBM models.

Background

Precipitation is a core link in the Earth's water cycle. Its spatiotemporal distribution directly regulates the evolution of ecosystems, the layout of agricultural production, and water-resource security, and it is also a key driver of meteorological disasters such as floods and droughts. Differences in climate and terrain across regions give regional precipitation pronounced spatiotemporal heterogeneity and make extreme precipitation events frequent. Understanding how the influencing factors drive precipitation is therefore a key prerequisite for improving prediction accuracy and for formulating regionally differentiated disaster prevention and mitigation strategies. It is also a research hotspot and a difficult problem in current atmospheric science and meteorology.
Current approaches to studying the influence of precipitation factors fall into two major categories, traditional statistical methods and conventional machine learning methods, and both have clear limitations. Traditional statistical methods take linear association as their core assumption. They are technically mature, computationally cheap, and highly interpretable, but the linear assumption limits their ability to capture nonlinear relationships. As a result, they cannot identify threshold effects, handle high-dimensional variables poorly, and struggle to quantify multi-factor cooperative regulation mechanisms and the strength of interaction effects. They also fail to capture small-scale local precipitation features.
Conventional machine learning methods overcome the linear assumption through data-driven modelling and perform well in tasks such as identifying extreme precipitation events and short-term precipitation forecasting. However, they share the inherent defects of black-box models:

- they cannot accurately quantify each influencing factor's share of the contribution or whether its effect is positive or negative;
- they lack local interpretability;
- they cannot delineate factor threshold ranges for different climate zones and precipitation intensities;
- they have difficulty showing, in an intuitive way, the form and contribution strength of interactions among factors;
- their generalization depends strongly on the training data distribution.

Existing research also has systematic framework deficiencies. Its scale is limited to a single climate zone or a few river basins, and systematic national-scale studies are lacking. In variable selection, underlying-surface factors and cloud microphysical factors are not considered comprehensively. The spatial variation patterns of the influencing factors remain unclear, and regional heterogeneity is ignored. Because there is no targeted nonlinear framework for quantifying synergistic mechanisms, understanding of how precipitation forms remains at the level of phenomenological association. These defects make it difficult for existing results to support regionally differentiated precipitation management strategies and responses to meteorological disasters. A new technical solution is therefore needed to overcome these limitations.

Disclosure of Invention

The invention aims to provide a precipitation prediction method based on climate zoning and an LGBM model. The method forms a precipitation prediction scheme suited to complex climate and terrain characteristics through three components: multi-source fusion of precipitation-related data, precipitation-scenario-specific preprocessing, and precipitation-adapted optimization of the LGBM model. To achieve this object, the present application proposes the following solutions. In one aspect, the application provides a precipitation prediction method based on climate zoning and LGBM models, which specifically comprises the following steps: S1, acquiring precipitation-related multi-source data covering a target area, and constructing a structured raw data set; S2, dividing the structured raw data set by climate zone according to the climate-zone rules of the target area, performing precipitation-scenario-specific data preprocessing, and generating a standardized precipitation modeling data set for each climate zone; S3, training an LGBM precipitation prediction model on the standardized precipitation modeling data set of each climate zone, independently tuning the core hyperparameters of each zone's LGBM model by Bayesian optimization, and per