CN-121981407-A - Water ecological environment index grade prediction method based on ordinal regression stacking
Abstract
The invention discloses a water ecological environment index grade prediction method based on ordinal regression stacking, comprising the following steps: S1, collecting the water ecological environment index to be predicted and its driving factor data, identifying abnormal values, marking the abnormal values as missing values, and then interpolating the missing values or eliminating the observation points that contain them; S2, dividing the water ecological environment index into grades according to threshold values of the index to be predicted; S3, selecting ordinal machine learning methods as base learners and an ordinal generalized linear regression model as the meta learner, and using ordinal regression stacking to establish the response relationship between the grade of the index to be predicted and the driving factors, thereby realizing ensemble prediction of the target index grade; S4, evaluating the ordinal regression stacking model with a comprehensive multidimensional index evaluation system, and predicting the grade of the water ecological environment index with the optimized ordinal regression stacking model.
Inventors
- HUO SHOULIANG
- ZHANG HANXIAO
Assignees
- 北京师范大学
Dates
- Publication Date: 20260505
- Application Date: 20260407
Claims (5)
- 1. A water ecological environment index grade prediction method based on ordinal regression stacking, characterized by comprising the following steps: S1, collecting the water ecological environment index to be predicted and its driving factor data, identifying abnormal values, marking the abnormal values as missing values, and then interpolating the missing values or eliminating the observation points that contain them; S2, dividing the water ecological environment index into grades according to threshold values of the index to be predicted; S3, selecting ordinal machine learning methods as base learners and an ordinal generalized linear regression model as the meta learner, and using ordinal regression stacking to establish the response relationship between the grade of the index to be predicted and the driving factors, thereby realizing ensemble prediction of the target index grade; S4, evaluating the ordinal regression stacking model with a multidimensional index evaluation system covering accuracy, consistency and correlation, improving the prediction performance for the ordinal response variable by Bayesian multi-objective optimization, and predicting the grade of the water ecological environment index with the optimized ordinal regression stacking model.
- 2. The water ecological environment index grade prediction method based on ordinal regression stacking of claim 1, wherein the specific process of step S1 is: S11, for a time series with periodic characteristics, decomposing the series into trend, seasonal and residual components by the STL method, identifying abnormal values in the residual component by the isolation forest method, and taking the sum of the STL trend and seasonal components at the time point of each abnormal value as the interpolated value; S12, for a time series without periodicity, extracting the trend component by the LOESS method, identifying abnormal values in the residual component by the isolation forest method, and taking the LOESS trend value at the time point of each missing value as the interpolated value; S13, for spatially aggregated data, identifying abnormal values directly by the isolation forest method, marking them as missing values, and eliminating the observation points that contain missing values.
- 3. The water ecological environment index grade prediction method based on ordinal regression stacking of claim 1, wherein the specific process of step S3 is: S31, selecting ordinal classification and regression trees, ordered random forests and ordinal forests as base learners, and establishing for each an ordinal prediction model between the water ecological environment index grade and the driving factors to obtain the prediction results of the base learners; the ordinal classification and regression tree converts the ordered multi-class task into a series of binary partition problems that respect the class order, and the splitting process of the tree only permits partitions of the form $Y \le y$ and $Y > y$, where $Y$ is the ordered class variable; the ordered random forest converts the ordered multi-class task into a series of binary classification problems: if the response variable has $N$ grades, $N-1$ binary variables are constructed to describe the cumulative probabilities, an independent random forest classifier is trained for each, and the probability of each original class is obtained as the difference of cumulative probabilities, calculated as $\hat{P}_{i,n} = \hat{P}(Y_i \le n) - \hat{P}(Y_i \le n-1)$, where $N$ is the number of categories in the ordered classification; $\hat{P}_{i,n}$ is the final estimated probability that the $i$-th observation sample belongs to class $n$; $\hat{P}(Y_i \le n)$ is the estimated cumulative probability that the $i$-th observation sample belongs to a category less than or equal to $n$; $\hat{P}(Y_i \le n-1)$ is the estimated cumulative probability that the $i$-th observation sample belongs to a category less than or equal to $n-1$; the ordinal forest assumes that the observed ordered categories derive from a latent continuous variable: diverse candidate score sets that map the categories onto the latent variable are constructed, the optimal score set is selected by maximizing out-of-bag performance, regression trees are trained and aggregated on these scores to obtain continuous predicted values, and each predicted value is finally mapped back to the nearest ordered category; S32, selecting the ordinal logistic regression model among ordinal generalized linear regression models as the meta learner, taking the prediction results of the base learners as input variables and the actual grade of the water ecological environment index as the output variable, and establishing the ordinal regression stacking model to realize ensemble prediction of the target water ecological environment index grade.
- 4. The water ecological environment index grade prediction method based on ordinal regression stacking of claim 3, wherein the specific process of step S4 is: S41, constructing a comprehensive index evaluation system and evaluating the performance of the ordinal regression stacking model from multiple dimensions: in the accuracy dimension, the degree of agreement between the classification results of the ordinal regression stacking model and the true grades is measured by the prediction accuracy and the macro-average AUC; the accuracy formula is $Acc = \frac{1}{M}\sum_{k=1}^{K} T_k$, where $Acc$ is the accuracy; $K$ is the total number of grades; $k$ is the grade index; $T_k$ is the number of data instances correctly predicted in the $k$-th grade; and $M$ is the total number of samples; the macro-average AUC formulas are $AUC_{macro} = \frac{1}{C}\sum_{c=1}^{C} AUC_c$, $AUC = \int_0^1 TPR \, d(FPR)$, $TPR = \frac{TP}{TP + FN}$, $FPR = \frac{FP}{FP + TN}$, where $AUC_{macro}$ is the macro-average AUC value; $C$ is the total number of categories; $c$ is the category index; $AUC_c$ is the binary-classification AUC of the $c$-th category against the remaining categories; $AUC$ is the area under the receiver operating characteristic curve; $TPR$ in the integrand is the true positive rate taken as a function of the false positive rate; $d(FPR)$ is the differential of the false positive rate; $TPR$ is the true positive rate; $FPR$ is the false positive rate; $TP$ is the number of true positives; $FN$ is the number of false negatives; $FP$ is the number of false positives; and $TN$ is the number of true negatives; in the consistency dimension, the agreement between the predicted results and the actual observations in grade order is measured by Cohen's Kappa coefficient, with the formulas $\kappa = \frac{p_o - p_e}{1 - p_e}$, $p_o = \frac{1}{M}\sum_{k=1}^{K} T_k$, $p_e = \frac{1}{M^2}\sum_{k=1}^{K} a_k b_k$, where $\kappa$ is the consistency coefficient measuring the agreement between the predicted results and the actual observations; $p_o$ is the observed agreement rate; $p_e$ is the agreement rate expected by chance; $a_k$ is the total number of observed samples in the $k$-th grade; and $b_k$ is the total number of predicted samples in the $k$-th grade; in the correlation dimension, the degree of correlation between the predicted grades and the true grades in rank order is described by the Kendall rank correlation coefficient $\tau$, computed from $d$, the number of sample pairs whose predicted ordering is opposite in direction to their true ordering; S42, based on the multidimensional index evaluation system, tuning the hyperparameters of the ordinal regression stacking model by Bayesian multi-objective optimization, with accuracy, macro-average AUC, Kappa coefficient and Kendall's Tau as the joint optimization targets, to obtain a balanced hyperparameter combination.
- 5. The water ecological environment index grade prediction method based on ordinal regression stacking of claim 4, wherein step S42 comprises: S421, constructing a multi-objective utility function: $U(\theta) = w_1 \widetilde{Acc}(\theta) + w_2 \widetilde{AUC}_{macro}(\theta) + w_3 \tilde{\kappa}(\theta) + w_4 \tilde{\tau}(\theta)$, where $\theta$ is a hyperparameter combination of the ordinal regression stacking model; $U(\theta)$ is the comprehensive utility value corresponding to the hyperparameter combination $\theta$; $w_1$, $w_2$, $w_3$ and $w_4$ are weight coefficients; and $\widetilde{Acc}$, $\widetilde{AUC}_{macro}$, $\tilde{\kappa}$ and $\tilde{\tau}$ are the normalized accuracy, macro-average AUC, Kappa coefficient and Kendall's Tau correlation coefficient, respectively; S422, taking the utility function as the objective function of Bayesian optimization, randomly sampling several hyperparameter combinations $\theta_1, \theta_2, \ldots, \theta_m$ in the hyperparameter space, substituting them into the ordinal regression stacking model for training, and computing the corresponding $U(\theta_1), U(\theta_2), \ldots, U(\theta_m)$; based on this initial sample set, modeling the posterior distribution of $U(\theta)$ with a Gaussian process, selecting a new candidate hyperparameter combination $\theta_{new}$ through an acquisition function, computing its utility value, updating the historical sample set and the posterior distribution, and iterating until a preset iteration-count threshold is reached, thereby obtaining the hyperparameter combination with the maximum comprehensive utility value; S423, training the ordinal regression stacking model with the hyperparameter combination that has the maximum comprehensive utility value, and predicting the grade of the water ecological environment index with the optimized ordinal regression stacking model.
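As a minimal sketch of two operations described in the claims above (the threshold grading of step S2 and the cumulative-probability differencing used by the ordered random forest in claim 3), the following Python code may be useful. The function names, the example thresholds, the convention that a value equal to a threshold falls in the lower grade, and the clip-and-renormalize handling of non-monotone cumulative estimates are all assumptions made for illustration; the claims do not specify them.

```python
from bisect import bisect_left
from typing import List, Sequence


def to_grade(value: float, thresholds: Sequence[float]) -> int:
    """Map a continuous index value to an ordinal grade in 1..len(thresholds)+1.

    `thresholds` must be sorted in ascending order. A value equal to a
    threshold is assigned to the lower grade (assumption, not from the source).
    """
    return bisect_left(thresholds, value) + 1


def class_probs_from_cumulative(cum: Sequence[float]) -> List[float]:
    """Convert estimates of P(Y <= n), n = 1..N-1, into P(Y = n), n = 1..N.

    Implements P(Y = n) = P(Y <= n) - P(Y <= n-1) with the boundary values
    P(Y <= 0) = 0 and P(Y <= N) = 1. Because the N-1 binary classifiers are
    trained independently, their outputs need not be monotone; negative
    differences are clipped to zero and the result is renormalized
    (assumption, not from the source).
    """
    padded = [0.0, *cum, 1.0]
    probs = [max(0.0, hi - lo) for lo, hi in zip(padded, padded[1:])]
    total = sum(probs)  # always >= 1 after clipping, so never zero
    return [p / total for p in probs]
```

With hypothetical thresholds `[10, 25, 50]`, `to_grade(30, [10, 25, 50])` returns grade 3, and `class_probs_from_cumulative([0.2, 0.7, 0.9])` yields four class probabilities of approximately 0.2, 0.5, 0.2 and 0.1.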
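The accuracy, Cohen's Kappa and Kendall rank correlation measures named in claim 4 can be illustrated with a short pure-Python sketch. The function names are hypothetical. The Kendall coefficient is computed in its tau-a form (concordant minus discordant pairs, divided by the total number of pairs); this variant is an assumption, since the text above defines only the discordant-pair count d. Macro-average AUC is left out because it needs class probabilities rather than predicted grades.

```python
from collections import Counter
from itertools import combinations
from typing import Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of samples whose predicted grade equals the true grade."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def cohen_kappa(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Unweighted Cohen's Kappa: (p_o - p_e) / (1 - p_e).

    Undefined (division by zero) when p_e == 1, i.e. when every observed
    and every predicted grade is the same single grade.
    """
    n = len(y_true)
    p_o = accuracy(y_true, y_pred)
    observed = Counter(y_true)   # a_k: observed samples per grade
    predicted = Counter(y_pred)  # b_k: predicted samples per grade
    p_e = sum(observed[k] * predicted[k] for k in observed) / (n * n)
    return (p_o - p_e) / (1 - p_e)


def kendall_tau_a(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Kendall tau-a (assumed variant): (C - D) / (n * (n - 1) / 2).

    Tied pairs count as neither concordant nor discordant.
    """
    concordant = discordant = 0
    for (t1, p1), (t2, p2) in combinations(zip(y_true, y_pred), 2):
        sign = (t1 - t2) * (p1 - p2)
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n = len(y_true)
    return (concordant - discordant) / (n * (n - 1) / 2)
```

Because ordinal grades produce many tied pairs, a tie-corrected variant such as tau-b may be preferable in practice; the choice affects the scale of the value that enters the utility function of claim 5.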
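The weighted utility of step S421 and the random initial sampling at the start of step S422 might be sketched as follows. This is only an illustration under stated assumptions: the names `utility`, `initial_design` and `best_so_far` are hypothetical, the rescaling of Kappa and Kendall's Tau from [-1, 1] to [0, 1] is an assumed normalization (the claim says only that the metrics are normalized), and the Gaussian-process surrogate and acquisition-function loop of S422 are not implemented here.

```python
import random
from typing import Callable, Dict, List, Sequence, Tuple

Metrics = Dict[str, float]
Theta = Dict[str, object]


def utility(metrics: Metrics, weights: Dict[str, float]) -> float:
    """Weighted sum of normalized metrics, U = sum_j w_j * metric_j.

    Accuracy and macro-average AUC already lie in [0, 1]; Kappa and Tau are
    mapped from [-1, 1] to [0, 1] (assumed normalization). No constraint on
    the weights is enforced.
    """
    normalized = {
        "accuracy": metrics["accuracy"],
        "macro_auc": metrics["macro_auc"],
        "kappa": (metrics["kappa"] + 1) / 2,
        "tau": (metrics["tau"] + 1) / 2,
    }
    return sum(weights[name] * value for name, value in normalized.items())


def initial_design(
    evaluate: Callable[[Theta], Metrics],
    space: Dict[str, Sequence[object]],
    n_init: int,
    weights: Dict[str, float],
    seed: int = 0,
) -> List[Tuple[Theta, float]]:
    """Randomly sample n_init hyperparameter combinations and score each.

    `evaluate` is a user-supplied (hypothetical) function that trains the
    stacking model with the given hyperparameters and returns its metrics.
    The returned history is the sample set a surrogate model would be
    fitted to in the subsequent Bayesian optimization iterations.
    """
    rng = random.Random(seed)
    history = []
    for _ in range(n_init):
        theta = {name: rng.choice(values) for name, values in space.items()}
        history.append((theta, utility(evaluate(theta), weights)))
    return history


def best_so_far(history: List[Tuple[Theta, float]]) -> Tuple[Theta, float]:
    """Return the sampled combination with the largest utility value."""
    return max(history, key=lambda item: item[1])
```

In a full implementation, each later iteration would fit a Gaussian process to `history`, pick the next candidate by an acquisition function, score it with `utility`, and append it to `history` until the iteration limit is reached.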
Description
Water ecological environment index grade prediction method based on ordinal regression stacking
Technical Field
The invention belongs to the technical field of water quality prediction, and particularly relates to a water ecological environment index grade prediction method based on ordinal regression stacking.
Background
Water ecosystems such as lakes, reservoirs and rivers are prone to eutrophication, cyanobacterial blooms and ecological degradation under the combined influence of factors such as excessive nutrient input, climate change and human disturbance. Grade prediction of key water ecological environment indexes such as chlorophyll a, total nitrogen, total phosphorus and dissolved oxygen has therefore become an important technical requirement for risk prevention and control and for fine-grained management. Existing methods such as multiple linear regression, support vector machines, random forests and XGBoost perform well in unordered classification and regression tasks, but generally neglect the ordered nature of index grades, so that their predictions are deficient in consistency and interpretability. Meanwhile, a single model has limited generalization capability when handling complex nonlinear, high-dimensional, imbalanced and noisy data, which makes prediction accuracy and stability difficult to guarantee. Although ensemble learning methods (such as stacking, boosting and bagging) have gradually been applied to environmental data modeling in recent years and have improved prediction to some extent, research on stacking methods for ordered multi-class scenarios remains scarce, and a systematic framework that accounts for accuracy, consistency and correlation together is lacking.
Therefore, a new method that integrates ordinal features, ensemble learning and multi-objective optimization is needed to achieve comprehensive improvement and scientific early warning in grade prediction of water ecological environment indexes.
Disclosure of Invention
To solve the above problems, the invention provides a water ecological environment index grade prediction method based on ordinal regression stacking, which is straightforward to use, combines ordinal information with multi-model ensembling and multi-objective optimization, achieves efficient prediction of the water ecological environment index grade, and ensures balanced performance in accuracy, consistency and correlation. To achieve the above purpose, the invention adopts the following technical scheme: a water ecological environment index grade prediction method based on ordinal regression stacking, comprising the following steps: S1, collecting the water ecological environment index to be predicted and its driving factor data, identifying abnormal values, marking the abnormal values as missing values, and then interpolating the missing values or eliminating the observation points that contain them; S2, dividing the water ecological environment index into grades according to threshold values of the index to be predicted; S3, selecting ordinal machine learning methods as base learners and an ordinal generalized linear regression model as the meta learner, and using ordinal regression stacking to establish the response relationship between the grade of the index to be predicted and the driving factors, thereby realizing ensemble prediction of the target index grade; S4, evaluating the ordinal regression stacking model with a multidimensional index evaluation system covering accuracy, consistency and correlation, improving the prediction performance for the ordinal response variable by Bayesian multi-objective optimization, and predicting the grade of the water ecological environment index with the optimized ordinal regression stacking model. Preferably, the specific process of step S1 is: S11, for a time series with periodic characteristics, decomposing the series into trend, seasonal and residual components by the STL method, identifying abnormal values in the residual component by the isolation forest method, and taking the sum of the STL trend and seasonal components at the time point of each abnormal value as the interpolated value; S12, for a time series without periodicity, extracting the trend component by the LOESS method, identifying abnormal values in the residual component by the isolation forest method, and taking the LOESS trend value at the time point of each missing value as the interpolated value; S13, for spatially aggregated data, identifying abnormal values directly by the isolation forest method, marking the abnormal val