CN-122021319-A - Daily-scale runoff prediction method based on feature screening and interpretable analysis
Abstract
The invention discloses a daily-scale runoff prediction method based on feature screening and interpretable analysis, and belongs to the field of hydrology and water resources. The method establishes a SWAT model from river basin spatial data, meteorological data, and hydrological data; performs sensitivity analysis and calibration of the hydrological parameters and outputs the related hydrological process variables; screens input factors from the meteorological elements and the SWAT output variables using a feature selection method that combines Spearman rank correlation analysis and random forests; designs different feature input sets and constructs BiLSTM models whose key hyperparameters are optimized by a Bayesian optimization algorithm; evaluates the performance of the different models with common hydrological statistical indexes and analyzes simulation performance under extreme runoff conditions; and introduces the SHAP interpretability analysis method to analyze the behavior mechanism of the coupled model in depth. The invention combines the advantages of a physical mechanism model and a deep learning model, and achieves high-precision simulation of the runoff process and of extreme runoff.
Inventors
- JIN LINA
- Jia Xufan
- JIANG ZHIQIANG
- WAN LI
- WANG JINGYI
- LIANG YONGZHEN
- LI ZHIJIN
- ZHANG CHI
- LUO ZHIMIN
- LU QIANG
Assignees
- Huazhong University of Science and Technology (华中科技大学)
Dates
- Publication Date
- 20260512
- Application Date
- 20260202
Claims (8)
- 1. A daily-scale runoff prediction method based on feature screening and interpretable analysis, characterized by comprising the following steps: S1, acquiring spatial data such as river basin topography, land use, and soil, together with daily meteorological and hydrological observation data, and performing outlier processing, missing-value filling, projection unification, and resampling as preprocessing to form a basic data set; S2, dividing Hydrological Response Units (HRUs) based on the river basin topography, setting the model time period and the sub-basin and HRU parameters, completing initialization of the basic model parameters, and constructing a SWAT model; S3, carrying out sensitivity analysis and parameter calibration on the hydrological parameters of the SWAT model, and outputting a daily-scale runoff process and related hydrological process variables; S4, based on the meteorological elements in step S1 and the SWAT output variables in step S3, carrying out feature screening with a feature selection method combining Spearman rank correlation analysis and random forest (RF) to form an optimal feature input set; S5, constructing different input sets based on the feature variables in steps S1, S3, and S4, inputting the different input sets into a bidirectional long short-term memory (BiLSTM) network model for training, optimizing the model hyperparameters with a Bayesian optimization (BO) algorithm, and outputting a daily-scale runoff prediction result; S6, evaluating the performance of each model using the correlation coefficient, deterministic coefficient, and root mean square error, and analyzing the extreme runoff simulation capability in combination with a flow duration curve; S7, carrying out interpretability analysis on the coupled model in step S5 using the SHAP method, so as to interpret the model output and identify key driving factors.
- 2. The method for predicting daily-scale runoff based on feature screening and interpretable analysis according to claim 1, wherein the meteorological observation data in S1 include precipitation PRE, maximum temperature Tmax, minimum temperature Tmin, relative humidity RH, wind speed WIN, and sunshine duration SD.
- 3. The method of claim 1, wherein the hydrological process variables in S3 include potential evapotranspiration PET, actual evapotranspiration ET, soil water content SW, percolation PERC, surface runoff SURQ, lateral flow LATQ, and simulated runoff $Q_{sim}$.
- 4. The method for predicting daily-scale runoff based on feature screening and interpretable analysis of claim 1, wherein S4 comprises: S41, retaining the key features whose cumulative percentage is smaller than 0.95 according to the feature importance scores of the RF algorithm; S42, calculating the Spearman rank correlation coefficients of the retained key features and, for features whose correlation is higher than a preset threshold, removing the one with lower importance, thereby identifying redundant variables and forming an optimal input feature set.
- 5. The method for predicting daily-scale runoff based on feature screening and interpretable analysis according to claim 1 or 4, wherein the RF algorithm adopts a permutation importance method based on out-of-bag (OOB) samples, and the importance is quantified by repeatedly permuting feature values and measuring the error increment, the importance score being calculated as: $FI_j = \frac{1}{R}\sum_{r=1}^{R}\left(E_{j}^{(r)} - E_{0}\right)$; wherein $FI_j$ represents the importance of feature $j$; $R$ represents the number of repetitions; $E_{j}^{(r)}$ is the OOB error after the $r$-th permutation of feature $j$; and $E_{0}$ is the reference OOB error for the original data; to identify key features, the cumulative percentage $CP_k$ of the $k$-th feature is calculated as: $CP_k = \frac{\sum_{i=1}^{k} FI_i}{\sum_{i=1}^{m} FI_i}$; wherein $FI_i$ represents the importance of the $i$-th feature after sorting by $FI$ in descending order, and $m$ is the total number of features.
- 6. The method for predicting daily-scale runoff based on feature screening and interpretable analysis of claim 4, wherein the Spearman rank correlation coefficient is defined as: $\rho_{i,j} = \frac{\sum_{k=1}^{n}\left(r_{k,i}-\bar{r}_{i}\right)\left(r_{k,j}-\bar{r}_{j}\right)}{\sqrt{\sum_{k=1}^{n}\left(r_{k,i}-\bar{r}_{i}\right)^{2}\,\sum_{k=1}^{n}\left(r_{k,j}-\bar{r}_{j}\right)^{2}}}$; wherein $n$ is the number of samples; $r_{k,i}$ and $r_{k,j}$ respectively denote the ranks, within their variable sample sequences, of the values of the $i$-th and $j$-th variables in the $k$-th sample; $\bar{r}_{i}$ and $\bar{r}_{j}$ respectively denote the mean of the rank sequence of the $i$-th variable and of the $j$-th variable; $\rho_{i,j}$ is the Spearman rank correlation coefficient of the $i$-th and $j$-th variables and takes values in $[-1, 1]$; the closer the value is to $\pm 1$, the stronger the correlation, and the closer it is to 0, the weaker the correlation; the correlation threshold is set to 0.75, and variables exceeding the threshold are regarded as redundant and removed.
- 7. The method for predicting daily-scale runoff based on feature screening and interpretable analysis according to claim 1, wherein the different input sets in S5 include an input set containing only meteorological elements, a full-feature input set of meteorological elements and output variables of a SWAT model, and an optimal feature input set obtained by the screening in S4 on the basis of the full features.
- 8. The method of claim 1, wherein the SHAP method in S7 is an interpretability analysis tool based on Shapley values, which aims to quantify the contribution of each feature to the output of a complex machine learning model, wherein Shapley values originate from cooperative game theory and represent the average marginal contribution of each participant to the total benefit over different coalitions: $\phi_i = \sum_{S \subseteq N \setminus \{i\}} \frac{|S|!\,(M-|S|-1)!}{M!}\left[v(S \cup \{i\}) - v(S)\right]$; wherein $M$ is the total number of features; $N = \{1, 2, \ldots, M\}$; $S$ is any feature subset that does not contain feature $i$; $v(S)$ is the model output when only subset $S$ is used; and $\phi_i$ is the contribution of feature $i$ to the prediction; the higher the absolute SHAP value, the greater the impact of the corresponding feature.
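The Shapley formula in claim 8 can be illustrated with a brute-force sketch. This is not the SHAP library and not the patent's implementation: it enumerates every subset exactly, which is only practical for a handful of features, and the value function `toy_output` and its feature names are hypothetical stand-ins for "the model output when only subset S is used".

```python
from itertools import combinations
from math import factorial


def shapley_values(features, v):
    """Exact Shapley values; v maps a frozenset of feature names to an output."""
    m = len(features)
    phi = {}
    for i in features:
        others = [f for f in features if f != i]
        total = 0.0
        for size in range(m):
            # Weight |S|! (M - |S| - 1)! / M! from the formula in claim 8.
            weight = factorial(size) * factorial(m - size - 1) / factorial(m)
            for subset in combinations(others, size):
                s = frozenset(subset)
                total += weight * (v(s | {i}) - v(s))  # marginal contribution
        phi[i] = total
    return phi


def toy_output(subset):
    """Hypothetical value function: additive effects plus one interaction."""
    out = 0.0
    if "PRE" in subset:
        out += 2.0
    if "SW" in subset:
        out += 1.0
    if "PRE" in subset and "SW" in subset:
        out += 1.0  # interaction term, split equally by the Shapley weights
    return out


features = ["PRE", "SW", "Tmax"]
phi = shapley_values(features, toy_output)
print(phi)
```

In this toy case the contributions sum to the difference between the output with all features and the output with none, which is the efficiency property that makes Shapley values usable as an additive attribution. Practical SHAP explainers approximate this sum rather than enumerating all subsets.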
Description
Daily-scale runoff prediction method based on feature screening and interpretable analysis
Technical Field
The invention belongs to the field of hydrology and water resources, and particularly relates to a daily-scale runoff prediction method based on feature screening and interpretable analysis.
Background
Surface runoff is an important component of the water cycle and is closely related to the quantity of water resources and their spatiotemporal distribution. Accurate prediction of the basin runoff process is important for flood control, disaster reduction, and water resource management, and provides a scientific basis for deepening basin hydrological theory. However, in the context of global warming and intensified human activity, runoff exhibits nonlinear, stochastic, and non-stationary characteristics, which reduces the applicability and reliability of conventional runoff prediction models. How to further improve prediction accuracy remains a hot and difficult problem in the field of hydrological prediction. Current runoff prediction methods are mainly divided into physical hydrological models and data-driven models. The hydrological model is based on a physical mechanism or a conceptualized process and has definite hydrological meaning, but is easily affected by the uncertainty of parameters and boundary conditions. The data-driven model can learn and capture complex nonlinear relations from a large amount of data, is simple to build, and offers higher accuracy, but lacks physical mechanism constraints and suffers from a black-box nature and weak generalization capability.
In recent years, studies have coupled the two types of models, for example by replacing some sub-modules of a hydrological model with a machine learning model, by training a machine learning model on hydrological model output to simulate runoff, or by using a machine learning model to predict the residual error of the hydrological model and correct its output. However, owing to the influence of the model structure and the coupling mode, existing coupled models still face problems such as error accumulation, insufficient adaptability to extremes, and poor interpretability. Therefore, more reasonable and robust runoff prediction models need to be explored to address the shortcomings of the prior art in accuracy, computational efficiency, and interpretability, and to provide more reliable technical support for applications such as basin runoff prediction, flood control and disaster reduction, and water resource management.
Disclosure of Invention
In view of the above defects or improvement needs of the prior art, the invention provides a basin daily-scale runoff prediction method that couples a physical mechanism with deep learning. It uses deep learning as a post-processor of a physical hydrological model to improve model accuracy and reduce uncertainty, combines feature screening with SHAP interpretability analysis to improve the stability and interpretability of simulation under extreme conditions, and provides a reliable daily-scale runoff process basis for reservoir operation and basin water resource management.
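The feature screening named above follows the two stages defined in claims 4 to 6: keep features whose cumulative importance percentage is below 0.95, then drop the less important member of any pair whose Spearman coefficient exceeds 0.75. The sketch below is one possible reading, not the patent's implementation. It assumes the importance scores have already been produced by a random forest, applies the threshold to the absolute coefficient, resolves redundancy greedily in descending importance order, and uses made-up scores and series.

```python
from statistics import mean


def rank(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        avg = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            ranks[order[k]] = avg
        pos = end + 1
    return ranks


def spearman(x, y):
    """Spearman coefficient as the Pearson correlation of the rank sequences."""
    rx, ry = rank(x), rank(y)
    mx, my = mean(rx), mean(ry)
    num = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    den = (sum((a - mx) ** 2 for a in rx) * sum((b - my) ** 2 for b in ry)) ** 0.5
    return num / den


def screen_features(importance, data, cp_limit=0.95, rho_limit=0.75):
    """Return retained feature names, most important first."""
    ordered = sorted(importance, key=importance.get, reverse=True)
    total = sum(importance.values())
    kept, running = [], 0.0
    for name in ordered:
        running += importance[name]
        if running / total >= cp_limit:  # keep only features with CP < 0.95
            break
        kept.append(name)
    selected = []
    for name in kept:  # assumption: greedy removal, absolute coefficient
        if all(abs(spearman(data[name], data[s])) <= rho_limit for s in selected):
            selected.append(name)
    return selected


# Hypothetical importance scores and short daily series, for illustration only.
importance = {"Qsim": 0.50, "PRE": 0.20, "SURQ": 0.15,
              "SW": 0.08, "Tmax": 0.04, "WIN": 0.03}
data = {
    "Qsim": [1, 2, 3, 4, 5, 6, 7, 8],
    "PRE": [3, 1, 4, 1, 5, 9, 2, 6],
    "SURQ": [1.1, 2.3, 2.9, 4.5, 5.2, 6.8, 7.1, 9.0],
    "SW": [5, 3, 8, 1, 7, 2, 6, 4],
    "Tmax": [20, 22, 19, 25, 24, 21, 23, 26],
    "WIN": [2, 3, 1, 4, 2, 5, 3, 1],
}
selected = screen_features(importance, data)
print(selected)
```

With these made-up inputs, Tmax and WIN fall outside the 0.95 cumulative cut, and SURQ is dropped because it is perfectly rank-correlated with the more important Qsim, leaving Qsim, PRE, and SW.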
The technical scheme adopted by the invention to solve the technical problems is a daily-scale runoff prediction method based on feature screening and interpretable analysis, comprising the following steps:
S1, acquiring spatial data such as river basin topography, land use, and soil, together with daily meteorological and hydrological observation data, and performing preprocessing such as outlier processing, missing-value filling, projection unification, and resampling to form a basic data set.
S2, dividing Hydrological Response Units (HRUs) based on the river basin topography, setting the model time period and the sub-basin and HRU parameters, completing initialization of the basic model parameters, and constructing a SWAT model.
S3, carrying out sensitivity analysis and parameter calibration on the hydrological parameters of the SWAT model, and outputting a daily-scale runoff process and related hydrological process variables.
S4, based on the meteorological elements in step S1 and the SWAT output variables in step S3, carrying out feature screening with a feature selection method combining Spearman rank correlation analysis and Random Forest (RF) to form an optimal feature input set.
S5, constructing different input sets based on the feature variables in steps S1, S3, and S4, inputting the different input sets into a bidirectional long short-term memory (BiLSTM) network model for training, optimizing the model hyperparameters with a Bayesian Optimization (BO) algorithm, and outputting a daily-scale runoff prediction result.
S6, evaluating the performance of each model by adopting indexes such as a correlation coefficient, a deterministic coefficient, a root mean square error and the