CN-122022075-A - Runoff prediction method, device and medium integrating physical priori and machine learning
Abstract
The invention discloses a runoff prediction method, device and medium that integrate physical priors with machine learning. The method takes the basin basic unit as the smallest spatial unit. It runs multiple types of physically based and conceptual hydrologic models in parallel and extracts physical prior factors from each model's output to build a physical prior factor library. Systematic feature engineering is applied to the raw meteorological, remote-sensing and topographic time series and to the prior factors, and an interpretable ensemble machine-learning model is trained on the combined features. After training, the percent bias (PBIAS), which characterizes water balance, is calculated to verify physical consistency. An integrated interpretation framework is then used to calculate a comprehensive importance score for each factor and to extract response thresholds. The basin basic units are then partitioned by clustering their contribution vectors, so that models are adaptively selected and matched to each partition according to its dominant factors before runoff is predicted. Implementation results show that the method improves the accuracy of daily-scale runoff simulation under various climates and terrains and enhances the physical interpretability and engineering operability of the model.
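The PBIAS water-balance check mentioned above, and the NSE metric used in the claims, can be computed as in the following minimal Python sketch. It assumes PBIAS is the summed prediction error relative to the summed observations, so positive values mean overestimation. The acceptance tolerance in the hypothetical helper `passes_water_balance` is an illustrative value, not one given in the source.

```python
from typing import Sequence


def nse(obs: Sequence[float], sim: Sequence[float]) -> float:
    """Nash-Sutcliffe efficiency: 1 - SSE / total sum of squares of the observations."""
    mean_obs = sum(obs) / len(obs)
    sse = sum((o - s) ** 2 for o, s in zip(obs, sim))
    sst = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - sse / sst


def pbias(obs: Sequence[float], sim: Sequence[float]) -> float:
    """Percent bias in %; positive means the model overestimates total runoff (assumed sign)."""
    return 100.0 * sum(s - o for o, s in zip(obs, sim)) / sum(obs)


def passes_water_balance(obs: Sequence[float], sim: Sequence[float],
                         tol_percent: float = 10.0) -> bool:
    """Hypothetical acceptance rule: |PBIAS| must not exceed tol_percent (assumed value)."""
    return abs(pbias(obs, sim)) <= tol_percent


if __name__ == "__main__":
    observed = [10.0, 20.0, 30.0, 40.0]
    predicted = [11.0, 19.0, 33.0, 41.0]
    print(pbias(observed, predicted))  # 4.0
    print(nse(observed, predicted))
```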
Inventors
- SHEN XIAOXUAN
- ZHANG KE
- LIU JIE
- JIANG YUANAN
- YU BIXIN
- HUO DA
Assignees
- 河海大学 (Hohai University)
- 新疆维吾尔自治区气象台 (Xinjiang Uygur Autonomous Region Meteorological Observatory)
Dates
- Publication Date
- 20260512
- Application Date
- 20260410
Claims (10)
- 1. A runoff prediction method integrating physical priors and machine learning, characterized by comprising the following steps: S1, dividing a target river basin into basin basic units and collecting multi-source data for each basin basic unit; S2, based on the collected multi-source data, running a plurality of different types of hydrologic models for each basin basic unit to extract candidate physical prior factors, wherein the different types of hydrologic models comprise a physical-process hydrologic model, a semi-physical semi-empirical hydrologic model and a conceptual hydrologic model; S3, performing single-factor perturbation sensitivity analysis on the extracted candidate physical prior factors and screening core physical prior factors from them to construct a joint training feature set; S4, constructing a Stacking ensemble model, training it for daily-scale runoff prediction with the joint training feature set as input, and calculating the percent bias index PBIAS after training to verify the water-balance physical consistency of the prediction results; S5, performing interpretive analysis on the trained Stacking ensemble model with an integrated interpretation framework, which comprises calculating the comprehensive importance score of each feature factor in the joint training feature set, extracting response thresholds for the feature factors whose comprehensive importance scores exceed a preset threshold, and constructing an HRU-level contribution vector for each basin basic unit from the comprehensive importance scores, the response thresholds and the single-factor contribution intensities obtained in the interpretive analysis; S6, clustering and partitioning the target river basin with the K-means method based on the HRU-level contribution vectors of the basin basic units; S7, within each partition obtained by clustering, evaluating the models in a candidate model pool formed by the different types of hydrologic models run in step S2 and the Stacking ensemble model trained in step S4, determining a preferred model set for the partition, and outputting a daily-scale runoff simulation result according to the preferred model set determined for each partition.
- 2. The runoff prediction method according to claim 1, wherein in step S2, running a plurality of different types of hydrologic models for each basin basic unit based on the collected multi-source data to extract candidate physical prior factors comprises the following operations. The time-varying meteorological and hydrologic continuous variables in the multi-source data are uniformly processed into daily time series. The static topographic continuous features and the categorical features in the multi-source data are preprocessed. The selected hydrologic models are run for each basin basic unit over a preset time range, and during the runs hydrologic process variables are extracted from each model's output as candidate physical prior factors. Multi-scale sliding-window statistical features are then constructed, with the time-varying meteorological and hydrologic continuous variables and the candidate physical prior factors serving as continuous input variables. The continuous input variables comprise precipitation, potential evapotranspiration, daily maximum air temperature, daily minimum air temperature, daily mean air temperature, dew point temperature, wind speed, solar radiation, thermal radiation, relative humidity, net radiation and the extracted candidate physical prior factors. For each continuous input variable, sliding-window statistical features are calculated over the past W days; they comprise the accumulated value, mean, standard deviation, maximum, minimum and linear slope, with W taking the values 5, 15 and 30. A precipitation lag-accumulation feature is also constructed for the precipitation variable among the continuous input variables.
- 3. The runoff prediction method according to claim 2, wherein in step S3, performing single-factor perturbation sensitivity analysis on the extracted candidate physical prior factors and screening core physical prior factors from them to construct the joint training feature set comprises the following operations. A single-factor perturbation of ±δ%, with δ between 5% and 20%, is applied to each candidate physical prior factor. The relative change ΔNSE is then calculated between two values: the baseline NSE of the hydrologic model corresponding to that factor under the reference run, and the NSE after the perturbation is applied. A candidate physical prior factor is judged to be a core physical prior factor when ΔNSE ≥ 5%. Statistical derivative features and ratio derivative features are constructed from the determined core physical prior factors. The categorical features are one-hot encoded to obtain category-coded features. Z-score standardization is applied to the continuous meteorological time-series variables, the static topographic continuous features, the candidate physical prior factors, and the statistical and ratio derivative features of the candidate physical prior factors. Joint training feature vectors are constructed from the standardized features and together form the joint training feature set. Each joint training feature vector comprises the time-varying meteorological and hydrologic continuous variables in the multi-source data and their sliding-window statistical features, the precipitation lag-accumulation feature, the static topographic continuous features, the category-coded features, the determined core physical prior factors, and the statistical and ratio derivative features constructed from the core physical prior factors.
- 4. The runoff prediction method according to claim 3, wherein in step S4, the Stacking ensemble model comprises base learners and a meta-learner. The base learners comprise two or more of XGBoost, CatBoost, LightGBM and random forest, and the meta-learner is linear regression or Lasso. The parameters of the base learners and the meta-learner are optimized by K-fold cross-validation with K between 3 and 10, and an early-stopping strategy is adopted for learners that support iterative training to prevent overfitting.
- 5. The runoff prediction method according to claim 4, wherein in step S4, the percent bias index PBIAS is calculated as $\mathrm{PBIAS} = \frac{\sum_{i=1}^{n} (\hat{Q}_i - Q_i)}{\sum_{i=1}^{n} Q_i} \times 100\%$, where $\hat{Q}_i$ and $Q_i$ are the predicted and observed daily runoff on day $i$, respectively, and $n$ is the total number of daily-scale samples used in the PBIAS calculation.
- 6. The runoff prediction method according to claim 1, wherein in step S5, extracting the response thresholds of the feature factors whose comprehensive importance scores exceed the preset threshold in the joint training feature set comprises the following operations. A smoothing spline is fitted to the PDP curve of the feature factor, and the absolute value of the second derivative of the fitted curve is calculated at each feature value. The 95th percentile of the resulting sequence of absolute second-derivative values is taken as the curvature screening threshold, and the feature-value positions corresponding to this threshold are taken as first candidate response thresholds. The 25th and 75th percentiles of the distribution of the feature factor's sample values are taken as second candidate response thresholds. The first and second candidate response thresholds are combined into a candidate response threshold set.
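- 7. The method according to claim 6, wherein in step S5, the comprehensive importance score of each feature factor in the joint training feature set is $S_j = w_1 \tilde{S}^{\mathrm{SHAP}}_j + w_2 \tilde{S}^{\mathrm{PFI}}_j + w_3 R^2_{\mathrm{PDP},j} + w_4 C_{\mathrm{ICE},j}$. Here $S_j$ is the comprehensive importance score of the $j$-th feature factor in the joint training feature set; $\tilde{S}^{\mathrm{SHAP}}_j$ and $\tilde{S}^{\mathrm{PFI}}_j$ are its SHAP and PFI values normalized to [0,1], respectively; $R^2_{\mathrm{PDP},j}$ is the goodness of fit of its PDP; $C_{\mathrm{ICE},j}$ is its ICE curve consistency measure; and the weights $w_1, \dots, w_4$ are determined automatically based on a validation set.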
- 8. The runoff prediction method according to any one of claims 1-7, further comprising, after step S7: S8, performing online accuracy feedback according to preset performance trigger conditions and, when triggered, iteratively optimizing by re-screening factors, retraining, and re-adapting partitions and models. In step S8, automatic iterative optimization is triggered when, for M consecutive months, the monthly NSE is below γ or the monthly RMSE exceeds its baseline by more than p%. After the optimization, the result is verified with at least 30 days of measured data and is adopted only if the NSE over the verification period is at least 0.75. Here p = 20%, γ = 0.7 and M = 2.
- 9. A runoff prediction device comprising a processor and a memory, wherein the memory stores a program or instructions that are loaded and executed by the processor to implement the steps of the runoff prediction method according to any one of claims 1 to 8.
- 10. A computer readable storage medium, characterized in that the readable storage medium has stored thereon a program or instructions which, when executed by a processor, implement the steps of the runoff prediction method according to any one of claims 1 to 8.
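Claim 2's multi-scale sliding-window features and the precipitation lag-accumulation feature can be sketched in Python with NumPy as below. This is a minimal illustration under stated assumptions. The window for day t is taken to include day t itself, and the standard deviation is the sample (ddof=1) version. The lag set (1, 3, 7) days for the precipitation feature is hypothetical, because the source does not specify it, and the function names are illustrative only.

```python
import numpy as np

STATS = ("sum", "mean", "std", "max", "min", "slope")


def window_features(x, windows=(5, 15, 30)):
    """Trailing W-day statistics of one daily series; NaN until W days are available.

    Assumption: the window for day t covers days t-W+1 .. t (inclusive of day t).
    """
    x = np.asarray(x, dtype=float)
    n = len(x)
    feats = {}
    for w in windows:
        t = np.arange(w, dtype=float)  # day index inside the window, used for the trend
        out = {k: np.full(n, np.nan) for k in STATS}
        for i in range(w - 1, n):
            seg = x[i - w + 1 : i + 1]
            out["sum"][i] = seg.sum()        # accumulated value
            out["mean"][i] = seg.mean()
            out["std"][i] = seg.std(ddof=1)  # sample standard deviation (assumed)
            out["max"][i] = seg.max()
            out["min"][i] = seg.min()
            out["slope"][i] = np.polyfit(t, seg, 1)[0]  # least-squares slope per day
        for k in STATS:
            feats[f"{k}_{w}d"] = out[k]
    return feats


def lagged_precip_accumulation(p, lags=(1, 3, 7)):
    """Precipitation summed over the L days before day t (hypothetical lag set)."""
    p = np.asarray(p, dtype=float)
    n = len(p)
    csum = np.concatenate([[0.0], np.cumsum(p)])  # csum[i] = p[0] + ... + p[i-1]
    feats = {}
    for lag in lags:
        v = np.full(n, np.nan)
        if lag < n:
            v[lag:] = csum[lag:n] - csum[: n - lag]  # p[t-lag] + ... + p[t-1]
        feats[f"p_lag_acc_{lag}d"] = v
    return feats
```

Each continuous input variable (meteorological series and extracted prior factors alike) would pass through `window_features`, giving 6 statistics × 3 windows = 18 columns per variable before standardization.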
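The single-factor perturbation screening of claim 3 could look like the sketch below. `simulate` is a hypothetical stand-in for re-running the corresponding hydrologic model with one prior factor scaled by (1 ± δ). Taking ΔNSE as the larger of the two absolute relative changes is an assumption, because the claim does not say how the +δ and -δ runs are combined. The toy model and factor names in the usage block are illustrative only.

```python
import numpy as np


def nse(obs, sim):
    """Nash-Sutcliffe efficiency of a simulated series against observations."""
    obs, sim = np.asarray(obs, float), np.asarray(sim, float)
    return 1.0 - np.sum((obs - sim) ** 2) / np.sum((obs - obs.mean()) ** 2)


def screen_core_factors(simulate, factors, obs, delta=0.10, threshold=0.05):
    """Return (baseline NSE, {factor: relative dNSE}) for factors judged 'core'.

    simulate: hypothetical callable mapping {name: series} -> simulated runoff.
    A factor is core if perturbing it by +delta or -delta changes NSE by at least
    `threshold` relative to the baseline (abs value, worst of both signs: assumed).
    """
    if not 0.05 <= delta <= 0.20:
        raise ValueError("delta should lie in the 5%-20% range given in the claim")
    base = nse(obs, simulate(factors))
    core = {}
    for name, series in factors.items():
        worst = 0.0
        for sign in (1.0, -1.0):
            perturbed = dict(factors)  # shallow copy; only one factor is replaced
            perturbed[name] = np.asarray(series, float) * (1.0 + sign * delta)
            rel = abs(nse(obs, simulate(perturbed)) - base) / abs(base)
            worst = max(worst, rel)
        if worst >= threshold:
            core[name] = worst
    return base, core


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 365
    toy = {"soil_moisture": rng.uniform(0.5, 1.5, n), "snowmelt": rng.uniform(0.0, 1.0, n)}

    def toy_model(f):
        # Toy stand-in: runoff driven mostly by soil moisture, barely by snowmelt.
        return 3.0 * f["soil_moisture"] + 0.01 * f["snowmelt"]

    observed = toy_model(toy) + rng.normal(0.0, 0.01, n)
    print(screen_core_factors(toy_model, toy, observed))
```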
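The response-threshold extraction of claim 6 can be sketched with SciPy's `UnivariateSpline`, which fits a cubic smoothing spline. The PDP curve is assumed to be already evaluated on an increasing grid of feature values. The smoothing rule controlled by `rel_smoothing` is a hypothetical choice. Every grid point whose |f''| reaches the 95th-percentile cut is kept as a first candidate.

```python
import numpy as np
from scipy.interpolate import UnivariateSpline


def candidate_thresholds(grid, pdp, samples, pct=95.0, rel_smoothing=1e-4):
    """Candidate response thresholds for one feature from its PDP curve.

    grid: strictly increasing feature values at which the PDP was evaluated.
    rel_smoothing: hypothetical knob; spline smoothing s = rel_smoothing * n * var(pdp).
    """
    grid = np.asarray(grid, float)
    pdp = np.asarray(pdp, float)
    s = rel_smoothing * len(grid) * np.var(pdp)
    spline = UnivariateSpline(grid, pdp, k=3, s=s)   # cubic smoothing spline
    curvature = np.abs(spline.derivative(2)(grid))  # |f''| at each feature value
    cut = np.percentile(curvature, pct)             # curvature screening threshold
    first = grid[curvature >= cut]                  # first candidate thresholds
    second = np.percentile(np.asarray(samples, float), [25, 75])  # quartiles
    return {
        "curvature": first,
        "quantile": second,
        "candidates": np.unique(np.concatenate([first, second])),
    }
```

For a sigmoid-shaped PDP, the curvature candidates cluster around the two "knees" of the curve, while the quartiles anchor the bulk of the observed sample distribution.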
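Reading the comprehensive importance score of claim 7 as a weighted sum of its four components, it might be computed as follows. The SHAP and PFI inputs are min-max normalized to [0, 1], as the claim states. The PDP goodness-of-fit and ICE consistency inputs are assumed to be in [0, 1] already. The weights are taken as given and rescaled to sum to 1; the claim says they are determined on a validation set but not how. The function names are illustrative.

```python
import numpy as np


def minmax(v):
    """Scale a vector to [0, 1]; a constant vector maps to zeros."""
    v = np.asarray(v, float)
    span = v.max() - v.min()
    return np.zeros_like(v) if span == 0 else (v - v.min()) / span


def composite_importance(shap_mean_abs, pfi, pdp_r2, ice_consistency, weights):
    """Weighted sum of normalized SHAP, PFI, PDP goodness of fit and ICE consistency.

    weights: four non-negative numbers (assumed to come from a validation-set search),
    rescaled here to sum to 1 (an assumption, not stated in the source).
    """
    w = np.asarray(weights, float)
    w = w / w.sum()
    terms = np.vstack([
        minmax(shap_mean_abs),
        minmax(pfi),
        np.asarray(pdp_r2, float),
        np.asarray(ice_consistency, float),
    ])
    return w @ terms  # one score per feature factor


def important_factors(names, scores, threshold):
    """Names of factors whose comprehensive score exceeds the preset threshold."""
    return [n for n, s in zip(names, scores) if s > threshold]
```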
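Step S6 of claim 1 partitions the HRUs by running K-means on their contribution vectors. The sketch below uses scikit-learn's `KMeans`. Choosing the number of partitions by the silhouette score is an assumption, because the source does not say how k is set. `partition_hrus` is a hypothetical helper name.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score


def partition_hrus(contrib, k_range=range(2, 7), seed=0):
    """Cluster HRU contribution vectors with K-means; k chosen by silhouette (assumed).

    contrib: array of shape (n_hrus, n_features), one contribution vector per HRU.
    Returns (chosen k, partition label per HRU).
    """
    contrib = np.asarray(contrib, float)
    best_score, best_k, best_labels = -np.inf, None, None
    for k in k_range:
        km = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(contrib)
        score = silhouette_score(contrib, km.labels_)
        if score > best_score:
            best_score, best_k, best_labels = score, k, km.labels_
    return best_k, best_labels
```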
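Step S7 leaves the rule for the preferred model set open. The following sketch illustrates one hypothetical rule. Within a partition, every candidate (process, semi-empirical and conceptual models plus the Stacking model) is scored by NSE. Candidates within a tolerance of the best score are kept, and their outputs are averaged with equal weights. Both the tolerance and the equal-weight combination are assumptions.

```python
import numpy as np


def nse(obs, sim):
    """Nash-Sutcliffe efficiency of a simulated series against observations."""
    obs, sim = np.asarray(obs, float), np.asarray(sim, float)
    return 1.0 - np.sum((obs - sim) ** 2) / np.sum((obs - obs.mean()) ** 2)


def preferred_models(obs, candidates, tolerance=0.05):
    """Keep candidates whose NSE is within `tolerance` of the best one (hypothetical rule).

    candidates: {model name: simulated daily series for this partition}.
    """
    scores = {name: nse(obs, sim) for name, sim in candidates.items()}
    best = max(scores.values())
    chosen = sorted(name for name, s in scores.items() if s >= best - tolerance)
    return chosen, scores


def partition_output(candidates, chosen):
    """Equal-weight mean of the chosen models' daily series (assumed combination)."""
    return np.mean([np.asarray(candidates[name], float) for name in chosen], axis=0)
```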
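The online trigger and acceptance rule of claim 8 reduce to two small checks. They are sketched below with the claim's values: p = 20%, γ = 0.7, M = 2, a 30-day verification period and a 0.75 NSE floor. The trigger is interpreted as "every one of the last M months fails the NSE or RMSE test", which is an assumption. `baseline_rmse` and the helper names are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class MonthlyScore:
    nse: float
    rmse: float


def should_reoptimize(history: Sequence[MonthlyScore], baseline_rmse: float,
                      gamma: float = 0.7, p: float = 0.20, m: int = 2) -> bool:
    """True if each of the last m months has NSE < gamma or RMSE > (1 + p) * baseline."""
    if len(history) < m:
        return False
    return all(s.nse < gamma or s.rmse > (1.0 + p) * baseline_rmse
               for s in history[-m:])


def accept_update(validation_nse: float, validation_days: int,
                  min_days: int = 30, nse_floor: float = 0.75) -> bool:
    """Adopt the re-optimized model only after >= min_days of checks with NSE >= floor."""
    return validation_days >= min_days and validation_nse >= nse_floor
```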
Description
Runoff prediction method, device and medium integrating physical priori and machine learning
Technical Field
The invention belongs to the field of hydrology and water resources engineering and relates to basin runoff prediction, interpretable machine learning and zonal modeling technology. In particular, it relates to a method and system in which the outputs of multiple classes of physical/conceptual hydrologic models are fed into a machine learning model as physical priors, together with meteorological and underlying-surface factors. Factor contributions and response thresholds are then extracted quantitatively through joint analysis with an integrated interpretation framework (SHAP, PDP, ICE and PFI, abbreviated HyPS-IF). Finally, the basin basic units (HRUs) are partitioned by clustering their contribution vectors (K-means) to achieve adaptive matching and dynamic simulation according to the dominant factors.
Background
Traditional process-based hydrologic models (for example SWAT and TOPMODEL) offer good physical interpretability. However, they suffer from complex parameter calibration, high data demands and difficult transfer across basins. Purely data-driven methods (including deep learning and traditional machine learning) can achieve higher fitting accuracy in many cases. However, they often ignore physical priors, lack interpretability and are prone to physically unreasonable predictions under small-sample or extrapolation conditions. Existing research has attempted to couple physical models with data-driven models, but generally faces three open questions: how to systematically extract and screen "verifiable" physical prior factors, how to introduce physical consistency checks while preserving training efficiency, and how to convert interpretation results into engineering rules for partitioning and model adaptation.
Meanwhile, existing runoff prediction schemes rely on a single physical model or a single data-driven model. They have difficulty adapting to differences in runoff generation and routing mechanisms across climatic zones, terrains and underlying-surface conditions. In complex basins, or in regions with strong climatic gradients, they tend to suffer from reduced simulation accuracy, insufficient physical consistency and limited regional applicability. As a result, they struggle to meet practical engineering needs for long-sequence, multi-partition collaborative runoff prediction.
Disclosure of Invention
The invention aims to provide a runoff prediction method, device and storage medium that integrate physical priors with machine learning to improve prediction accuracy. To solve the above technical problems, the invention provides the following technical solution. The invention first provides a runoff prediction method integrating physical priors and machine learning, comprising the following steps: S1, dividing a target river basin into basin basic units and collecting multi-source data for each basin basic unit; S2, based on the collected multi-source data, running a plurality of different types of hydrologic models for each basin basic unit to extract candidate physical prior factors, wherein the different types of hydrologic models comprise a physical-process hydrologic model, a semi-physical semi-empirical hydrologic model and a conceptual hydrologic model; S3, performing single-factor perturbation sensitivity analysis on the extracted candidate physical prior factors and screening core physical prior factors from them to construct a joint training feature set; S4, constructing a Stacking ensemble model, training it for daily-scale runoff prediction with the joint training feature set as input, and calculating the percent bias index PBIAS after training to verify the water-balance physical consistency of the prediction results; S5, performing interpretive analysis on the trained Stacking ensemble model with an integrated interpretation framework, which comprises calculating the comprehensive importance score of each feature factor in the joint training feature set, extracting response thresholds for the feature factors whose comprehensive importance scores exceed a preset threshold, and constructing an HRU-level contribution vector for each basin basic unit from the comprehensive importance scores, the response thresholds and the single-factor contribution intensities obtained in the interpretive analysis; S6, clustering and partitioning the target river basin with the K-means method based on the HRU-level contribution vectors of the basin basic units; S7, within each partition obtained by clustering, evaluating the models in a candidate model pool formed by the different types of hydrologic models run in the step