CN-121996976-A - Multi-source cooperative variable driven near-surface air temperature data downscaling method
Abstract
The invention discloses a multi-source cooperative variable driven near-surface air temperature data downscaling method in the technical field of air temperature data processing. The method has four steps: data preprocessing, sample division, downscaling model construction, and air temperature data processing.

Following the Model Output Statistics (MOS) approach, elements such as cloud cover, longwave/shortwave radiation and sensible/latent heat flux are introduced as cooperative variables to build a multidimensional, multi-source variable data set. A complete data set is then formed through preprocessing such as resampling.

Samples are divided and balanced with K-fold cross-validation and a first-fit decreasing algorithm to obtain a total training set and an independent validation set, and stable, effective features are screened. Three models are then built: an elastic net, a Bayesian-optimized random forest, and a long short-term memory (LSTM) network based on neural architecture search. The data to be processed are fed into the models to generate high-resolution air temperature data, and data accuracy is ensured through multidimensional validation.
Inventors
- WU XIAOJUAN
- Yan Yiding
- HUANG JIAWEI
- CUI ZHENYING
- YANG JIE
- WANG FUQING
- DU XIAOXIAO
- NIU HAO
- Ren Baiyao
- PENG JUNYI
Assignees
- Chengdu University of Information Technology (成都信息工程大学)
Dates
- Publication Date: 2026-05-08
- Application Date: 2026-01-30
Claims (6)
- 1. A multi-source cooperative variable driven near-surface air temperature data downscaling method, characterized by comprising the following steps. Step one, data preprocessing: integrating various cooperative variables in a database to obtain a multidimensional feature set, and preprocessing the near-surface air temperature data in the database based on that feature set to obtain a complete data set. Step two, sample division: dividing and balancing the samples of the complete data set to obtain the sample data of each fold, and constructing a total training set and an independent validation set. Step three, downscaling model construction: constructing, from the total training set, an elastic net model, a Bayesian-optimization-based random forest model, and a long short-term memory (LSTM) network model based on neural architecture search. Step four, air temperature data processing: inputting the near-surface air temperature data into the three models to generate high-resolution near-surface air temperature data, validating the high-resolution data against the independent validation set, and outputting a final data set.
- 2. The multi-source cooperative variable driven near-surface air temperature data downscaling method of claim 1, wherein the cooperative variables in the database are integrated as follows. A multidimensional underlying-surface cooperative variable subset comprises terrain features, land cover and vegetation features, human activity intensity features, and surface radiation features. The terrain features comprise SRTM DEM data and its derivatives: slope, aspect, maximum terrain relief, mean terrain relief, maximum slope variability, and mean slope variability. The land cover data are the MODIS MCD12Q1 land cover type product, and the vegetation data are the GK-2A AMI fractional vegetation cover. The human activity data are monthly nighttime light composites built from the VIIRS Day/Night Band (DNB) average radiance. The GHSL R A built-up characteristics, building height, built-up surface, and local climate zone data for the 2018 reference year serve as static background auxiliary data. For surface radiation, GK-2A AMI surface emissivity data and black-sky narrowband albedo data characterize surface emission and reflection. Following the Model Output Statistics (MOS) approach, the method exploits the shared-source driving characteristics of the ERA5 and ERA5-Land reanalysis products and the near-surface air temperature data to be corrected. Variables that have both a physical association and a statistical correlation with the air temperature bias are introduced as the MOS cooperative variable subset: cloud cover, longwave/shortwave radiation, sensible/latent heat flux, planetary boundary layer height, 10 m wind speed, sea-level pressure, dew point/humidity, surface temperature, and snow depth. The underlying-surface subset and the MOS subset are integrated to construct a multidimensional, multi-source cooperative variable data set.
- 3. The multi-source cooperative variable driven near-surface air temperature data downscaling method according to claim 2, wherein the near-surface air temperature data in the database are preprocessed as follows. Near-surface air temperature data are acquired from the ERA5-Land reanalysis product of the European Centre for Medium-Range Weather Forecasts (ECMWF). These include daily mean, daily maximum and daily minimum air temperature at the preset base resolution as the original spatial resolution, over a time range covering the study period, and they serve as the coarse-resolution data to be downscaled. Valid measured air temperature data from weather stations in the study area are also collected. The station data and the coarse-resolution data are synchronized on a daily time scale according to their observation timestamps. The multidimensional multi-source cooperative variable data set, the coarse-resolution data to be downscaled, and the measured air temperature label set are then unified into the WGS84 coordinate system and a preset projection.
Guided by the station data set, the multidimensional multi-source cooperative variable data set and the coarse-resolution data to be downscaled are resampled (upsampled or downsampled) to the target resolution. Methods include nearest-neighbor, bilinear interpolation, cubic convolution interpolation, area-weighted averaging, vector mean, and majority (mode) resampling. The grid cells corresponding to each station are then extracted, giving spatiotemporally matched coarse-resolution data, multidimensional multi-source cooperative variables, and measured air temperature labels. These are combined along the sample dimension into a complete data set containing the input features, the target variable to be downscaled, and the ground-truth labels. The sample size is determined by the number of stations in the study area and the length of the study period.
- 4. The multi-source cooperative variable driven near-surface air temperature data downscaling method as claimed in claim 3, wherein the complete data set is divided as follows. The coarse-resolution data to be downscaled, the multidimensional multi-source cooperative variables, and the measured air temperature label at the same place and time form one sample. Using the observation station as the grouping unit, all samples from the same station are assigned to the same group. K-fold cross-validation is then performed on these station groups. Samples from the same station are strongly spatially correlated, so ordinary cross-validation may leak information between training and test folds; station-grouped division ensures that the folds do not overlap in space. A first-fit decreasing algorithm balances the number of samples in each fold. The division runs entirely in code without manual intervention, which prevents human factors from being introduced. Finally, a total training set and a final (independent) validation set are obtained, and the total training set is further divided into combinations of sub-training and sub-test sets. All continuous features are Z-score standardized using the statistics of the training fold, except the measured air temperature and the dummy, categorical and angular variables. The same standardization parameters are applied to the corresponding test fold and to the final validation set, which keeps the evaluation independent and comparable.
- 5. The multi-source cooperative variable driven near-surface air temperature data downscaling method of claim 1, wherein the three models are an elastic net model, a Bayesian-optimization-based random forest model, and a long short-term memory (LSTM) network model based on neural architecture search, constructed as follows. Elastic net model: the regularization strength Alpha and the L1 ratio are determined by an adaptive multi-round grid search. Alpha takes values within a preset logarithmic range and the L1 ratio within a preset interval. A preset number of (Alpha, L1 ratio) combinations is evaluated first, with minimum RMSE as the objective, and the search is then refined over successive rounds. It is considered converged when the improvement in the best RMSE over consecutive fine-search rounds falls below a preset tolerance, or when the number of iterations reaches a preset value. Bayesian-optimization-based random forest model: search ranges are defined for several types of hyperparameters, such as the number of trees and the maximum depth of the decision trees, and three optional modes are provided for the feature sampling ratio. With minimization of the root mean square error on the sub-test set as the objective, the BayesSearchCV tool runs a preset number of search rounds. In each round, candidate hyperparameters are screened by a probabilistic surrogate model and the expected improvement criterion. After training, features are ranked by variance reduction, a preset number of top features is selected for each fold, and the features that are stable across folds are counted. The model is then retrained on this feature set with the optimal hyperparameters to form the final model. LSTM network model based on neural architecture search: input sequences are built with sliding windows of a preset length and stride, and single-step regression is used to predict the measured data. A hyperparameter search space is set, covering for example the number of hidden units and the number of stacked layers. A preset number of architecture and hyperparameter combinations is randomly sampled, early stopping is enabled while each candidate is trained, and the architecture with the lowest root mean square error on the sub-test set is selected. The time-series data of the total training set are then fed in and trained with the set batch size and number of epochs. The weights are optimized by backpropagation while the gates regulate the temporal information, which completes model construction.
- 6. The multi-source cooperative variable driven near-surface air temperature data downscaling method according to claim 1, wherein the high-resolution near-surface air temperature data are validated as follows: core statistical index validation, extreme air temperature event validation, spatiotemporal stability validation, and visual validation are performed on the high-resolution near-surface air temperature data, and the high-resolution near-surface air temperature data set is output once it passes validation.
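Claim 2 derives slope, aspect and terrain relief from the SRTM DEM but does not say how. The sketch below assumes a projected DEM with square cells in metres. It uses Horn's (1981) weighted finite differences, a standard choice that the patent does not name. The window radius used for relief is a hypothetical parameter.

```python
import math

def horn_slope_aspect(win, cell_size):
    """Slope (deg) and aspect (deg clockwise from north) at the centre of a 3x3 window.

    win: 3x3 nested list of elevations (m), row 0 = north edge, column 0 = west edge.
    cell_size: grid spacing in metres (assumes square cells in a projected CRS).
    """
    (a, b, c), (d, _e, f), (g, h, i) = win
    # Horn (1981) weighted differences
    dz_dx = ((c + 2 * f + i) - (a + 2 * d + g)) / (8 * cell_size)  # > 0 when rising eastward
    dz_dy = ((g + 2 * h + i) - (a + 2 * b + c)) / (8 * cell_size)  # > 0 when rising southward
    slope = math.degrees(math.atan(math.hypot(dz_dx, dz_dy)))
    if dz_dx == 0 and dz_dy == 0:
        return slope, None  # flat cell: aspect undefined
    # Downslope direction as (east, north) components, converted to an azimuth from north
    aspect = math.degrees(math.atan2(-dz_dx, dz_dy)) % 360.0
    return slope, aspect

def local_relief(dem, row, col, radius=1):
    """Maximum minus minimum elevation in a (2r+1)x(2r+1) window, clipped at grid edges."""
    vals = [dem[r][c]
            for r in range(max(0, row - radius), min(len(dem), row + radius + 1))
            for c in range(max(0, col - radius), min(len(dem[0]), col + radius + 1))]
    return max(vals) - min(vals)
```

As a check, a plane that rises 30 m per 30 m cell toward the east has a 45° slope and faces west (aspect 270°).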
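Claim 3 requires daily-scale synchronization of the ERA5-Land series with station observations by timestamp. The sketch below assumes that daily mean, maximum and minimum are built from hourly 2 m temperature values, which ERA5-Land stores in kelvin. It also assumes that days are UTC calendar days; the claim states neither the input frequency nor the day boundary. The two function names are hypothetical.

```python
from collections import defaultdict
from datetime import date, datetime

KELVIN_OFFSET = 273.15

def daily_stats(hourly):
    """Aggregate (timestamp, temperature in K) pairs into per-day mean/max/min in deg C.

    Assumes naive UTC datetimes and UTC calendar days as the daily window.
    """
    by_day = defaultdict(list)
    for ts, t_kelvin in hourly:
        by_day[ts.date()].append(t_kelvin - KELVIN_OFFSET)
    return {day: {"tmean": sum(v) / len(v), "tmax": max(v), "tmin": min(v)}
            for day, v in by_day.items()}

def sync_daily(grid_daily, station_daily):
    """Inner-join gridded daily stats with station observations on the date key."""
    return [(day, grid_daily[day], station_daily[day])
            for day in sorted(grid_daily.keys() & station_daily.keys())]

if __name__ == "__main__":
    hourly = [(datetime(2024, 1, 1, h), 273.15 + h) for h in range(24)]
    hourly.append((datetime(2024, 1, 2, 0), 280.15))
    station = {date(2024, 1, 2): 6.5, date(2024, 1, 3): 5.0}
    for day, grid, obs in sync_daily(daily_stats(hourly), station):
        print(day, round(grid["tmean"], 2), obs)
```

Only days present in both sources survive the join, so station gaps never get paired with unmatched grid values.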
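Two of the resampling methods listed in claim 3, bilinear interpolation and nearest-neighbor, can be sketched for extracting a value at a station location. The sketch assumes a regular, ascending latitude/longitude grid where `values[i][j]` sits at `(lat0 + i*dlat, lon0 + j*dlon)`; real ERA5 grids are often stored with descending latitude. Nearest-neighbor is the usual choice for categorical layers such as land cover, because interpolating class codes is meaningless.

```python
import math

def bilinear(values, lat0, lon0, dlat, dlon, lat, lon):
    """Bilinear interpolation on a regular grid; raises ValueError outside the grid."""
    fi, fj = (lat - lat0) / dlat, (lon - lon0) / dlon
    n_rows, n_cols = len(values), len(values[0])
    if not (0 <= fi <= n_rows - 1 and 0 <= fj <= n_cols - 1):
        raise ValueError("point outside grid")
    i0, j0 = math.floor(fi), math.floor(fj)
    i1, j1 = min(i0 + 1, n_rows - 1), min(j0 + 1, n_cols - 1)  # clamp on the last row/column
    wi, wj = fi - i0, fj - j0
    row_lo = values[i0][j0] * (1 - wj) + values[i0][j1] * wj
    row_hi = values[i1][j0] * (1 - wj) + values[i1][j1] * wj
    return row_lo * (1 - wi) + row_hi * wi

def nearest(values, lat0, lon0, dlat, dlon, lat, lon):
    """Nearest grid node (Python's round() sends exact .5 ties to the even index)."""
    i, j = round((lat - lat0) / dlat), round((lon - lon0) / dlon)
    if not (0 <= i < len(values) and 0 <= j < len(values[0])):
        raise ValueError("point outside grid")
    return values[i][j]
```

For example, the centre of a 2x2 cell with corner values 0, 10, 20 and 30 interpolates bilinearly to 15.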
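Claim 4 combines station-grouped folds, fold balancing and training-fold-only Z-score scaling. The sketch below reads the fold-balancing algorithm as first-fit decreasing (FFD) bin packing over whole stations. The per-fold capacity of ceil(total / K) and the fallback to the lightest fold are hypothetical choices, since the claim gives no capacity rule. Scaling parameters are fitted on training rows only and then reused unchanged on test and validation rows; columns not listed (labels, dummy, categorical and angle variables) pass through untouched.

```python
import math
import statistics

def ffd_station_folds(station_sizes, k):
    """Assign whole stations to k folds by first-fit decreasing.

    station_sizes: {station_id: number_of_samples}. Capacity per fold is ceil(total / k)
    (an assumption); a station that fits in no fold goes to the currently lightest fold.
    """
    capacity = math.ceil(sum(station_sizes.values()) / k)
    folds, loads = [[] for _ in range(k)], [0] * k
    for sid in sorted(station_sizes, key=station_sizes.get, reverse=True):
        size = station_sizes[sid]
        target = next((f for f in range(k) if loads[f] + size <= capacity), None)
        if target is None:
            target = loads.index(min(loads))
        folds[target].append(sid)  # a station never appears in two folds
        loads[target] += size
    return folds, loads

def fit_zscore(train_rows, continuous_cols):
    """Mean and population std per continuous column, from training rows only."""
    params = {}
    for col in continuous_cols:
        vals = [row[col] for row in train_rows]
        sd = statistics.pstdev(vals)
        params[col] = (statistics.fmean(vals), sd if sd > 0 else 1.0)
    return params

def apply_zscore(rows, params):
    """Standardize the fitted columns; every other column passes through unchanged."""
    return [{c: (v - params[c][0]) / params[c][1] if c in params else v for c, v in row.items()}
            for row in rows]
```

For example, six stations with 5, 4, 3, 3, 2 and 1 samples split into two folds of 9 samples each.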
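Three library-independent loops from claim 5 are sketched below under stated assumptions; this is not the authors' code. The hypothetical parts are the window-halving refinement and `patience` counter in `adaptive_grid_search`, the alignment of each target with the window's last step in `sliding_windows`, and the `top_k`/`min_folds` thresholds in `stable_features`. In practice `score` would fit a model such as scikit-learn's `ElasticNet(alpha=..., l1_ratio=...)` on the sub-training set and return the RMSE on the sub-test set. The per-fold importances could come from `RandomForestRegressor.feature_importances_`, which is impurity-based (variance reduction) for regression.

```python
import itertools
import math
from collections import Counter

def _linspace(lo, hi, n):
    """n evenly spaced values from lo to hi inclusive (n >= 2)."""
    return [lo + (hi - lo) * k / (n - 1) for k in range(n)]

def adaptive_grid_search(score, log_alpha_range=(-4.0, 1.0), l1_range=(0.05, 1.0),
                         n=5, tol=1e-4, patience=2, max_rounds=15):
    """Minimize score(alpha, l1_ratio) over successive, shrinking n x n grids.

    After each round both windows are halved around the best point. The search stops after
    `patience` consecutive rounds improving by less than `tol`, or after `max_rounds`.
    """
    (la_lo, la_hi), (l1_lo, l1_hi) = log_alpha_range, l1_range
    best, best_score, stale = None, math.inf, 0
    for _ in range(max_rounds):
        prev = best_score
        for la, l1 in itertools.product(_linspace(la_lo, la_hi, n), _linspace(l1_lo, l1_hi, n)):
            s = score(10 ** la, l1)
            if s < best_score:
                best, best_score = (la, l1), s
        stale = stale + 1 if prev - best_score < tol else 0
        if stale >= patience:
            break
        la_w, l1_w = (la_hi - la_lo) / 4, (l1_hi - l1_lo) / 4  # half-widths of the next window
        la_lo, la_hi = max(log_alpha_range[0], best[0] - la_w), min(log_alpha_range[1], best[0] + la_w)
        l1_lo, l1_hi = max(l1_range[0], best[1] - l1_w), min(l1_range[1], best[1] + l1_w)
    return 10 ** best[0], best[1], best_score

def sliding_windows(features, targets, length, stride=1):
    """Windows of `length` consecutive steps; each target is the value at the window's last step."""
    X, y = [], []
    for start in range(0, len(features) - length + 1, stride):
        X.append(features[start:start + length])
        y.append(targets[start + length - 1])
    return X, y

def stable_features(fold_importances, top_k, min_folds):
    """Features ranked in the top_k of at least min_folds folds, sorted by name."""
    counts = Counter()
    for imp in fold_importances:
        counts.update(sorted(imp, key=imp.get, reverse=True)[:top_k])
    return sorted(f for f, c in counts.items() if c >= min_folds)
```

The grid search works in log10(alpha), matching the claim's logarithmic range for the regularization strength.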
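Claim 6 lists core statistical indices and extreme air temperature event validation without defining them. The sketch below assumes the common set of bias, MAE, RMSE and Pearson r. It also assumes that "extreme" means observations at or below the 5th or at or above the 95th nearest-rank percentile. Both the metric set and the thresholds are hypothetical choices.

```python
import math

def core_metrics(pred, obs):
    """Bias, MAE, RMSE and Pearson correlation of predictions against observations."""
    n = len(obs)
    err = [p - o for p, o in zip(pred, obs)]
    mp, mo = sum(pred) / n, sum(obs) / n
    cov = sum((p - mp) * (o - mo) for p, o in zip(pred, obs))
    sp = math.sqrt(sum((p - mp) ** 2 for p in pred))
    so = math.sqrt(sum((o - mo) ** 2 for o in obs))
    return {
        "bias": sum(err) / n,
        "mae": sum(abs(e) for e in err) / n,
        "rmse": math.sqrt(sum(e * e for e in err) / n),
        "r": cov / (sp * so) if sp > 0 and so > 0 else float("nan"),
    }

def nearest_rank(values, q):
    """Nearest-rank percentile q (0-100) of values."""
    s = sorted(values)
    return s[max(1, math.ceil(q * len(s) / 100)) - 1]

def extreme_rmse(pred, obs, low_q=5, high_q=95):
    """RMSE restricted to cold (<= low_q pct) and hot (>= high_q pct) observations."""
    lo, hi = nearest_rank(obs, low_q), nearest_rank(obs, high_q)
    out = {}
    for name, keep in (("cold", lambda o: o <= lo), ("hot", lambda o: o >= hi)):
        pairs = [(p, o) for p, o in zip(pred, obs) if keep(o)]  # never empty: lo/hi are in obs
        out[name] = math.sqrt(sum((p - o) ** 2 for p, o in pairs) / len(pairs))
    return out
```

Reporting the extreme-subset RMSE separately shows whether a gain in overall RMSE hides degradation on extreme-temperature samples, a weakness the Background section names explicitly.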
Description
Multi-source cooperative variable driven near-surface air temperature data downscaling method
Technical Field
The invention relates to the technical field of air temperature data processing, in particular to a near-surface air temperature data downscaling method driven by multi-source cooperative variables.
Background
Near-surface air temperature is core basic data for climate system monitoring, ecological environment simulation, disaster risk assessment and related fields. Acquiring air temperature data with high spatial resolution and high spatiotemporal continuity is therefore of great significance, which motivates a multi-source cooperative variable driven near-surface air temperature data downscaling method. Existing sources of near-surface air temperature data fall into three types: meteorological station observations, satellite remote sensing, and reanalysis data. Station observations are limited by station density and can hardly reflect spatial heterogeneity over complex terrain. Satellite remote sensing data are easily obscured by clouds, so their spatiotemporal completeness over long time series is insufficient. Reanalysis data such as ERA5-Land and CLDAS-V2.0 have good spatiotemporal continuity, but their spatial resolution is too low for fine-scale applications.
Downscaling has become a key technology for improving the spatial resolution of reanalysis data. Mainstream methods are either dynamical or statistical. Dynamical downscaling is based on atmospheric physical processes and is mechanistically interpretable, but its accuracy is easily affected by complex underlying surfaces and it generalizes poorly. Statistical downscaling builds a statistical relationship between large-scale and regional climate variables and generalizes better. Existing statistical downscaling techniques, however, have several defects:
1. Insufficient correction of systematic bias: the shared-source driving characteristics of reanalysis data are not exploited, so the systematic bias of near-surface air temperature cannot be effectively suppressed, and the cold bias over complex terrain is especially prominent.
2. Weak capture of nonlinear relationships: traditional linear models and simple machine learning models struggle to describe the complex nonlinear association between the bias and the cooperative variables, which limits gains in downscaling accuracy.
3. Limited correction of extreme air temperatures: bias correction for extremely high and low temperature samples is insufficient, the RMSE (root mean square error) improvement on the extreme-bias subset is not significant, and accuracy can even degrade for occasional extreme-bias samples.
4. Insufficient spatiotemporal stability: the correction effect depends strongly on season, the bias improvement fluctuates markedly in key periods such as winter, early spring and summer, and performance varies spatially across terrain types.
5. Incomplete feature fusion: multi-source underlying-surface cooperative variables such as terrain, land cover, human activity and surface radiation are not systematically integrated, so the input information lacks dimensions and the capability to support high-resolution inversion is limited.
Disclosure of Invention
In view of the above technical defects, the object of the invention is to provide a multi-source cooperative variable driven near-surface air temperature data downscaling method. The invention provides a multi-source cooperative variable driven near-surface air temperature data downscaling method comprising the following steps. Step one, data preprocessing: integrating various cooperative variables in a database to obtain a multidimensional feature set, and preprocessing the near-surface air temperature data in the database based on that feature set to obtain a complete data set. Step two, sample division: dividing and balancing the samples of the complete data set to obtain the sample data of each fold, and constructing a total training set and an independent validation set. Step three, downscaling model construction: constructing, from the total training set, an elastic net model, a Bayesian-optimization-based random forest model, and a long short-term memory network model based on neural architecture search. Step four, air temperature data processing: inputting the near-surface air temperature data into the three models, generating high-resolution near-surface air temperature data, and verifying the high-resolution near-surface air temperature data based on