CN-122020608-A - Global climate long sequence data reconstruction method for heat stress risk assessment


Abstract

The invention discloses a global climate long sequence data reconstruction method for heat stress risk assessment. Missing values in the meteorological data of a target station are first filled by spatial interpolation, using a ridge regression model built on data from spatially adjacent reference stations, and are then filled by time interpolation, using a ridge regression model built on the physical relationships among different meteorological variables at the target station. By introducing a regularization term, ridge regression stabilizes the coefficient estimates and effectively overcomes multicollinearity, achieving high-precision, adaptive spatiotemporal collaborative interpolation.
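The regularized estimate described in the abstract can be sketched as follows. This is a minimal illustration of closed-form ridge regression, not the patented implementation: the function names `ridge_fit` and `ridge_predict`, the centring of the data, the value `lam=1.0`, and the synthetic station series are all assumptions made for the example.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Closed-form ridge regression on centred data.

    Returns (beta, intercept). Centring and the separate intercept are
    assumptions of this sketch; the source does not specify them.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean = X.mean(axis=0)
    y_mean = y.mean()
    Xc = X - x_mean
    yc = y - y_mean
    p = X.shape[1]
    # Solve (Xc^T Xc + lam * I) beta = Xc^T yc instead of inverting explicitly.
    beta = np.linalg.solve(Xc.T @ Xc + lam * np.eye(p), Xc.T @ yc)
    intercept = y_mean - x_mean @ beta
    return beta, intercept

def ridge_predict(X, beta, intercept):
    """Predict the target variable from reference-station values."""
    return np.asarray(X, dtype=float) @ beta + intercept

# Synthetic example: two nearly collinear "reference stations" (hypothetical data).
rng = np.random.default_rng(0)
base = rng.normal(25.0, 5.0, size=200)            # shared regional signal
X = np.column_stack([
    base + rng.normal(0.0, 0.1, size=200),        # reference station 1
    base + rng.normal(0.0, 0.1, size=200),        # reference station 2
])
y = base + rng.normal(0.0, 0.5, size=200)         # target station
beta, b0 = ridge_fit(X, y, lam=1.0)
filled = ridge_predict(X[:5], beta, b0)           # values used to fill gaps
```

Because the regularization term is added to the diagonal, the linear system stays solvable even when two reference stations carry almost identical signals, which is the multicollinearity case the abstract refers to.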

Inventors

ZHANG SIQI
REN YUYU
REN GUOYU
WU PING
ZHANG YONGQIANG

Assignees

National Climate Center (国家气候中心)

Dates

Publication Date: 20260512
Application Date: 20260202

Claims (10)

1. A global climate long sequence data reconstruction method for heat stress risk assessment, characterized in that missing values in the meteorological data of a target station are first filled by spatial interpolation using a ridge regression model based on data from spatially adjacent reference stations, and are then filled by time interpolation using a ridge regression model based on the physical relationships among different meteorological variables at the target station.
2. The global climate long sequence data reconstruction method for heat stress risk assessment according to claim 1, wherein the spatial interpolation comprises: grouping the daily-scale data of the target station and the reference stations by variable and by month; taking the valid observation sequence of the target variable at the target station for the month as the dependent variable Y; selecting the synchronous data of the same variable from the K top-ranked reference stations to form the independent variable matrix X; constructing a ridge regression model whose regression coefficients are calculated as $\hat{\beta} = (X^{T}X + \lambda I)^{-1}X^{T}Y$, where λ is the ridge (regularization) parameter and I is the identity matrix; and using the constructed ridge regression model to predict and fill the missing values of the target variable at the target station for that month.
3. The global climate long sequence data reconstruction method for heat stress risk assessment according to claim 1 or 2, wherein the time interpolation comprises: grouping the spatially interpolated multivariate daily-scale data of the target station by month, the multivariate data comprising at least the target variable and the key covariates; for each monthly group, constructing a bivariate ridge regression model based on the physical relationships among the variables, the bivariate ridge regression model taking the target variable as the dependent variable and at least one other meteorological variable as the independent variable, with regression coefficient $\hat{\beta}$ calculated as $\hat{\beta} = (X^{T}X + \lambda I)^{-1}X^{T}Y$, where X is the sequence of the independent variable, Y is the sequence of the dependent variable, and λ is the regularization parameter; and using the constructed bivariate ridge regression model to perform a secondary interpolation of the target variable values that are still missing after spatial interpolation.
4. The global climate long sequence data reconstruction method for heat stress risk assessment according to claim 1, further comprising a model optimization step of: calculating the coefficient of determination R² of each ridge regression model used for spatial interpolation and for time interpolation; setting a minimum performance threshold for R²; adopting only the output values of models whose R² is higher than the preset threshold; and, when the same missing data point has output values from several qualified models, selecting the output value of the model with the highest R² as the final interpolated value for that data point.
5. The global climate long sequence data reconstruction method for heat stress risk assessment according to claim 1, wherein the ridge parameter λ in the ridge regression model is determined by automated, adaptive optimization through a random forest algorithm.
6. The global climate long sequence data reconstruction method for heat stress risk assessment according to claim 1, further comprising a data preprocessing step of: mapping all meteorological stations worldwide onto a regular longitude-latitude grid with a preset resolution; and, in each grid cell, selecting a representative station according to a preset comprehensive data integrity index, wherein the comprehensive data integrity index is the number of days on which the target interpolation variable and the key covariates all have valid observations on the same calendar date, a weighted score is computed by combining this index with the station's total number of valid record years, and the station with the highest score is selected as the representative station of the grid cell.
7. The global climate long sequence data reconstruction method for heat stress risk assessment according to claim 6, wherein the regular longitude-latitude grid is a 5° × 5° global regular grid.
8. The global climate long sequence data reconstruction method for heat stress risk assessment according to claim 3, wherein the other meteorological variables comprise dew point temperature and station air pressure, and the key covariates comprise daily mean dew point temperature and daily mean station air pressure.
9. The global climate long sequence data reconstruction method for heat stress risk assessment according to claim 2, further comprising a step of constructing a spatial reference station set: taking the grid cell in which the target station is located as the center, initially selecting all other stations in that grid cell and in several geographically adjacent grid cells as the candidate set of spatial reference stations; calculating a comprehensive geographic distance index between the target station and each candidate reference station, the index being based on spherical distance, longitude and latitude differences, and altitude difference; sorting all candidate reference stations by the comprehensive geographic distance index in ascending order; and selecting the K top-ranked reference stations to form the spatial reference station set, K being set to 5 to 10 according to data density.
10. The global climate long sequence data reconstruction method for heat stress risk assessment according to claim 2 or 3, wherein, when a ridge regression model is constructed, the valid data of the corresponding month are divided into a training set and a validation set at a ratio of 8:2, the model is trained on the training set, and its performance is verified on the validation set.
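The reference-station ranking of claim 9 and the R²-based selection of claim 4 can be illustrated with a short sketch. The claims do not state how spherical distance, longitude and latitude differences, and altitude difference are combined, nor the value of the R² threshold, so the weights `w_dist`, `w_deg`, `w_alt`, the default `r2_min=0.5`, the function names, and the example stations below are all hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle (spherical) distance between two points, in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def composite_distance(target, cand, w_dist=1.0, w_deg=10.0, w_alt=0.1):
    """Weighted sum of spherical distance, lat/lon difference and altitude difference.

    The weights are hypothetical; the source only names the three ingredients.
    """
    d = haversine_km(target["lat"], target["lon"], cand["lat"], cand["lon"])
    dlat = abs(target["lat"] - cand["lat"])
    dlon = abs(target["lon"] - cand["lon"])
    dlon = min(dlon, 360.0 - dlon)  # handle wrap-around at the date line
    dalt = abs(target["alt_m"] - cand["alt_m"])
    return w_dist * d + w_deg * (dlat + dlon) + w_alt * dalt

def select_reference_stations(target, candidates, k=5):
    """Sort candidates by composite distance (ascending) and keep the first k."""
    ranked = sorted(candidates, key=lambda c: composite_distance(target, c))
    return ranked[:k]

def pick_fill_value(model_outputs, r2_min=0.5):
    """Choose the prediction of the qualified model with the highest R².

    model_outputs is a list of (predicted_value, r2) pairs for one missing
    data point. Returns None when no model exceeds the threshold.
    """
    qualified = [(value, r2) for value, r2 in model_outputs if r2 > r2_min]
    if not qualified:
        return None
    return max(qualified, key=lambda pair: pair[1])[0]

# Hypothetical stations for illustration only.
target = {"id": "T", "lat": 40.0, "lon": 116.0, "alt_m": 50.0}
candidates = [
    {"id": "A", "lat": 40.5, "lon": 116.5, "alt_m": 60.0},
    {"id": "B", "lat": 45.0, "lon": 120.0, "alt_m": 50.0},
    {"id": "C", "lat": 40.1, "lon": 116.1, "alt_m": 1500.0},
]
nearest = select_reference_stations(target, candidates, k=2)
fill = pick_fill_value([(30.1, 0.42), (29.4, 0.81), (29.9, 0.66)])
```

With these hypothetical weights, station C is horizontally closest to the target but ranks behind station A because of its large altitude difference, which shows why a composite index can order candidates differently from distance alone.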

Description

Global climate long sequence data reconstruction method for heat stress risk assessment

Technical Field

The invention relates to the technical field of meteorological data reconstruction, in particular to a global climate long sequence data reconstruction method for heat stress risk assessment.

Background

In the context of global climate change, heat stress risk assessment is critical to protecting public health, guiding outdoor work, and developing adaptation policies. Wet-bulb globe temperature (WBGT), a composite index that jointly accounts for temperature and humidity, has been adopted by the International Organization for Standardization (ISO) as a core parameter for evaluating environmental heat load, and its long-term evolution trend is a cornerstone of risk assessment.

Currently, the WBGT trend assessment and risk analysis methods commonly employed in the industry rely mainly on two types of data source: meteorological reanalysis data and raw meteorological station observations. The typical implementation path is to use the temperature and humidity fields provided by these data sources directly to calculate the historical daily WBGT sequence of each grid point or station, then to analyze the long-term WBGT trend with a statistical method (such as linear fitting), and to calculate the frequency of days exceeding a preset risk threshold (for example, WBGT ≥ 30 °C) and its change over time, so as to reach a qualitative or semi-quantitative judgment on whether regional heat risk is increasing or decreasing.

However, the prior art solutions described above have the following inherent drawbacks. The spatiotemporal reliability of the evaluation results is limited by the quality of the underlying data. Reanalysis data offer continuous spatiotemporal coverage and complete sequences, but are essentially a fusion product of a numerical forecast model and an assimilation system applied to sparse observations.
Systematic biases of the models themselves, changes in assimilation schemes, and inhomogeneities in the observing system itself are all introduced into the reanalysis data as "artifacts", creating uncertainty in how well the true climate state is reproduced and thereby affecting the reliability of the WBGT evaluation conclusions derived from them. On the other hand, although ground station observations best reflect the real local climate state, the global observation network is very unevenly distributed (dense over land and sparse over the ocean, dense in the Northern Hemisphere and sparse in the Southern Hemisphere), and historical records commonly suffer from large numbers of missing values, interruptions, and various quality problems. Trend statistics and risk assessment are carried out directly on these incomplete and sparse original sequences, and in data-scarce areas (such as Africa ...

Against the broad background of steadily intensifying global climate change, heat stress risk assessment has become a key support for safeguarding public health, regulating outdoor work, and formulating scientific climate adaptation policies. Wet-bulb globe temperature (WBGT) is a composite index that accounts for multiple factors such as air temperature and humidity. Because it accurately characterizes environmental heat load, it has been established by the International Organization for Standardization (ISO) as a core parameter for heat stress assessment, and its long-term evolution is a core foundation for regional and even global heat risk assessment.
The spatiotemporal reliability of the evaluation results is therefore severely limited, in two respects. On the one hand, although meteorological reanalysis data offer continuous spatiotemporal coverage and complete data sequences, they are essentially a product obtained by fusing sparse global observations through a numerical forecast model and a data assimilation system. During data generation, the systematic biases of the numerical forecast model, the iterative updating of the assimilation scheme, and the inhomogeneities of the global observing system across different periods are all introduced into the reanalysis data as "non-natural climate signals", so that the accuracy with which the real climate state is reproduced carries inherent uncertainty, which directly affects the reliability of WBGT evaluation conclusions based on these data. On the other hand, although ground meteorological station observations most directly reflect real local climate conditions, the global meteorological observation network is markedly unbalanced in its distribution: observation stations are dense over land areas and extremely sparse over ocean areas, the distribution density of northern h