CN-121983167-A - Dissolved oxygen concentration prediction method and device based on machine learning
Abstract
The invention relates to the technical field of dissolved oxygen concentration prediction, and in particular to a machine-learning-based dissolved oxygen concentration prediction method, device, computer equipment and storage medium. The method comprises: obtaining a dissolved oxygen concentration daily-scale lag feature time series and a water quality parameter daily-scale lag feature time series of a sample area; combining the two time series and dividing them to construct a plurality of training sets; performing model construction and model training on the plurality of training sets using a machine learning method to obtain a dissolved oxygen concentration prediction model; obtaining a water quality parameter daily-scale lag feature time series of a target area; and inputting that time series into the dissolved oxygen concentration prediction model to predict the dissolved oxygen concentration, obtaining a dissolved oxygen concentration prediction result for the target area.
Inventors
- LU YIDA
- CHEN XINGDA
- CHEN SHUISEN
- Ci Qianling
- HAO JIAHUI
- WANG ZHONGYANG
- LI DAN
Assignees
- Guangzhou Institute of Geography, Guangdong Academy of Sciences (广东省科学院广州地理研究所)
Dates
- Publication Date
- 20260505
- Application Date
- 20251230
Claims (10)
- 1. A machine-learning-based dissolved oxygen concentration prediction method, characterized by comprising the following steps: obtaining a dissolved oxygen concentration daily-scale lag feature time series and a water quality parameter daily-scale lag feature time series of a sample area, wherein the dissolved oxygen concentration daily-scale lag feature time series comprises dissolved oxygen concentration data for a plurality of consecutive days; combining the dissolved oxygen concentration daily-scale lag feature time series and the water quality parameter daily-scale lag feature time series and dividing them to construct a plurality of training sets; performing model construction and model training on the plurality of training sets using a machine learning method to obtain a dissolved oxygen concentration prediction model; and obtaining a water quality parameter daily-scale lag feature time series of a target area and inputting it into the dissolved oxygen concentration prediction model to predict the dissolved oxygen concentration, obtaining a dissolved oxygen concentration prediction result for the target area.
- 2. The machine-learning-based dissolved oxygen concentration prediction method of claim 1, wherein the dissolved oxygen concentration prediction model comprises a first dissolved oxygen concentration prediction model and a second dissolved oxygen concentration prediction model, and obtaining the dissolved oxygen concentration prediction model using a machine learning method comprises the following steps: calculating the standard deviation of the dissolved oxygen concentration data for the plurality of days in each training set to obtain the dissolved oxygen concentration standard deviation of each training set; classifying each training set according to its dissolved oxygen concentration standard deviation and a preset dissolved oxygen concentration threshold to obtain a plurality of first training sets and a plurality of second training sets; inputting the plurality of first training sets into a preset first machine learning model for model training to obtain the first dissolved oxygen concentration prediction model; and inputting the plurality of second training sets into a preset second machine learning model for model training to obtain the second dissolved oxygen concentration prediction model.
- 3. The machine-learning-based dissolved oxygen concentration prediction method of claim 2, wherein the first machine learning model comprises a plurality of XGBoost learners, and inputting the plurality of first training sets into the preset first machine learning model for model training to obtain the first dissolved oxygen concentration prediction model comprises the following steps: taking the first samples in the first training sets as the input data sets of the current iteration, and inputting each input data set into each XGBoost learner of the current iteration to predict the dissolved oxygen concentration, obtaining dissolved oxygen concentration prediction data for each input data set of the current iteration, wherein each first sample comprises a plurality of water quality parameter data; training each XGBoost learner of the current iteration according to each input data set of the current iteration, the dissolved oxygen concentration prediction data of the corresponding input data set of the current iteration, and the dissolved oxygen concentration prediction data of the corresponding input data set of the previous iteration, to obtain each XGBoost learner of the next iteration; and taking the first samples as the input data sets of the next iteration, inputting each input data set into each XGBoost learner of the next iteration, and repeating the dissolved oxygen concentration prediction and model training to obtain the first dissolved oxygen concentration prediction model.
- 4. The machine-learning-based dissolved oxygen concentration prediction method of claim 3, wherein training each XGBoost learner of the current iteration according to each input data set of the current iteration, the dissolved oxygen concentration prediction data of the corresponding input data set of the current iteration, and the dissolved oxygen concentration prediction data of the corresponding input data set of the previous iteration, to obtain each XGBoost learner of the next iteration, comprises the following steps: obtaining the loss value of each XGBoost learner of the current iteration according to each input data set of the current iteration, the dissolved oxygen concentration prediction data of the corresponding input data set of the previous iteration, and a preset regularized loss function, and training each XGBoost learner of the current iteration according to the loss value to obtain each XGBoost learner of the next iteration, wherein the regularized loss function is: $L^{(t)} = \sum_{i} l\left(y_i,\ \hat{y}_i^{(t-1)} + f_t(x_i)\right) + \Omega(f_t)$, where $L^{(t)}$ is the loss value of the XGBoost learners at the t-th iteration, $y_i$ is the dissolved oxygen concentration prediction data of the corresponding input data set of the current iteration, $\hat{y}_i^{(t-1)}$ is the dissolved oxygen concentration prediction data of the corresponding input data set of the previous iteration, $f_t$ is the decision tree function, $x_i$ is the input data set of the current iteration, $\Omega(f_t)$ is the regularization term, and $l(\cdot,\cdot)$ denotes the per-sample loss.
- 5. The machine-learning-based dissolved oxygen concentration prediction method of claim 4, wherein the second machine learning model comprises a base learner layer and a meta-learning layer, the base learner layer comprising a plurality of sub dissolved oxygen concentration prediction models, and constructing the second machine learning model from the plurality of second training sets, with the water quality parameter data as the independent variable and the dissolved oxygen concentration data as the dependent variable, and inputting the plurality of second training sets into the second machine learning model for model training to obtain the second dissolved oxygen concentration prediction model, comprises the following steps: performing model construction according to the second samples to obtain the plurality of sub dissolved oxygen concentration prediction models, wherein each second sample comprises a plurality of water quality parameter data; inputting the plurality of second training sets into the plurality of sub dissolved oxygen concentration prediction models for model training, obtaining a plurality of trained sub dissolved oxygen concentration prediction models and the dissolved oxygen concentration prediction data that the trained sub models output for each second sample; converting the dissolved oxygen concentration prediction data output by each trained sub dissolved oxygen concentration prediction model for each second sample into vector form, obtaining a dissolved oxygen concentration prediction vector for each second sample; and performing random forest model construction and model training according to the training feature matrix of each second sample and the dissolved oxygen concentration label data, obtaining a trained random forest model.
- 6. The machine-learning-based dissolved oxygen concentration prediction method according to claim 5, wherein each sub dissolved oxygen concentration prediction model is a machine learning model constructed with the water quality parameter data as the independent variable and the dissolved oxygen concentration data as the dependent variable, and the sub models comprise a decision tree model, an extremely randomized trees model, a gradient boosting model and an extreme gradient boosting model; the decision tree model and the extremely randomized trees model both adopt a minimized mean square error function as the objective function, wherein the minimized mean square error function is: $\mathrm{MSE}_{\min} = \min \frac{1}{N}\sum_{i=1}^{N}\left(\hat{y}_i - y_i\right)^2$, where $\mathrm{MSE}_{\min}$ is the minimized mean square error, $N$ is the number of second training sets, $\hat{y}_i$ is the dissolved oxygen concentration prediction data corresponding to the second training set, and $y_i$ is the dissolved oxygen concentration label data corresponding to the second training set, obtained from the dissolved oxygen concentration daily-scale lag feature time series corresponding to the second training set; the gradient boosting model adopts a mean square error function as the objective function, and the extreme gradient boosting model adopts the regularized loss function as the objective function, wherein the mean square error function is: $\mathrm{MSE} = \frac{1}{N}\sum_{i=1}^{N}\left(F_m(x_i) - y_i\right)^2$ with $F_m(x) = F_{m-1}(x) + \eta\, h_m(x)$, where $\mathrm{MSE}$ is the mean square error, $F_m(x)$ is the dissolved oxygen concentration prediction data output by the gradient boosting model at the m-th iteration, $\eta$ is the learning rate, and $h_m(x)$ is the data processing function of the gradient boosting model at the m-th iteration.
- 7. The machine-learning-based dissolved oxygen concentration prediction method according to claim 6, wherein performing random forest model construction and model training according to the training feature matrix of each second sample and the dissolved oxygen concentration label data comprises the following steps: sampling the training feature matrix of each second sample with replacement to generate a plurality of third samples, wherein each third sample comprises a plurality of candidate features, the candidate features being dissolved oxygen concentration prediction vectors; at each current node, selecting a target feature from the candidate feature set according to the feature values of the candidate features and the gain maximization principle, splitting the node on that feature, and stopping splitting when a preset recursive splitting termination condition is met, thereby constructing each decision tree, and combining the decision trees to construct the random forest model; and inputting the training feature matrix of each second sample and the dissolved oxygen concentration label data into the random forest model for training, obtaining the trained random forest model.
- 8. A machine-learning-based dissolved oxygen concentration prediction device, comprising: a data acquisition module, configured to obtain a dissolved oxygen concentration daily-scale lag feature time series and a water quality parameter daily-scale lag feature time series of a sample area, wherein the dissolved oxygen concentration daily-scale lag feature time series comprises dissolved oxygen concentration data for a plurality of consecutive days; a model training module, configured to combine the dissolved oxygen concentration daily-scale lag feature time series and the water quality parameter daily-scale lag feature time series and divide them to construct a plurality of training sets, and to perform model construction and model training on the plurality of training sets using a machine learning method to obtain a dissolved oxygen concentration prediction model; and a dissolved oxygen concentration prediction module, configured to obtain a water quality parameter daily-scale lag feature time series of a target area and input it into the dissolved oxygen concentration prediction model to predict the dissolved oxygen concentration, obtaining a dissolved oxygen concentration prediction result for the target area.
- 9. A computer device comprising a processor, a memory and a computer program stored in the memory and executable on the processor, the processor implementing the steps of the machine learning based dissolved oxygen concentration prediction method of any one of claims 1 to 7 when the computer program is executed.
- 10. A storage medium storing a computer program which, when executed by a processor, implements the steps of the machine learning-based dissolved oxygen concentration prediction method according to any one of claims 1 to 7.
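The division of training sets by dissolved oxygen variability described in claim 2 can be illustrated with a minimal Python sketch. It is not the claimed implementation. The function name `split_by_do_variability`, the `(features, labels)` layout of a training set, the use of the population standard deviation, and the choice to place sets at or below the threshold into the first group are all assumptions made for illustration; the claims do not state which side of the threshold maps to which group.

```python
from statistics import pstdev
from typing import List, Tuple

# Assumed layout: a training set is (feature rows, dissolved oxygen labels).
TrainingSet = Tuple[List[List[float]], List[float]]


def split_by_do_variability(
    training_sets: List[TrainingSet], threshold: float
) -> Tuple[List[TrainingSet], List[TrainingSet]]:
    """Split training sets by the standard deviation of their DO labels."""
    first_sets: List[TrainingSet] = []
    second_sets: List[TrainingSet] = []
    for features, labels in training_sets:
        # Population standard deviation over the days in this set (assumed choice).
        if pstdev(labels) <= threshold:
            first_sets.append((features, labels))   # low variability (assumed mapping)
        else:
            second_sets.append((features, labels))  # high variability (assumed mapping)
    return first_sets, second_sets


# Toy data: three days per set, two water quality parameters per day.
stable = ([[20.1, 7.2], [20.3, 7.1], [20.2, 7.2]], [7.0, 7.1, 6.9])
volatile = ([[24.0, 6.8], [27.5, 7.9], [25.1, 7.3]], [4.0, 8.0, 6.0])
first_sets, second_sets = split_by_do_variability([stable, volatile], threshold=0.5)
print(len(first_sets), len(second_sets))  # prints: 1 1
```

Each group would then be passed to its own model for training, so that periods of stable and strongly fluctuating dissolved oxygen are fitted separately.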
Description
Dissolved oxygen concentration prediction method and device based on machine learning
Technical Field
The invention relates to the technical field of dissolved oxygen concentration prediction, and in particular to a machine-learning-based dissolved oxygen concentration prediction method, device, computer equipment and storage medium.
Background
Dissolved oxygen is a key indicator for evaluating the quality of the surface water environment, and is especially important for the health of estuaries and coastal areas. Low-dissolved-oxygen water bodies flowing into the ocean negatively impact marine habitats and the offshore environment, while drastic changes in coastal river water quality and complex water environment characteristics make dissolved oxygen concentration prediction a challenging task. Traditional water quality models rely on complex physical, chemical and biological process equations. Although they simulate the diffusion and spatial distribution of contaminants well, they are complex to construct, computationally expensive, and extremely demanding in terms of input data. Conventional statistical methods such as multiple linear regression can quantify the long-term influence of a single environmental factor on dissolved oxygen, but cannot capture and express the complex nonlinear relationships between dissolved oxygen and other water quality parameters in the water environment, so their prediction performance is limited.
Disclosure of Invention
Based on the above, the invention aims to provide a machine-learning-based dissolved oxygen concentration prediction method, device, computer equipment and storage medium that use a dissolved oxygen concentration daily-scale lag feature time series and a water quality parameter daily-scale lag feature time series for model construction and model training, capture and express the complex nonlinear relationships between dissolved oxygen and other water quality parameters in the water environment, and improve the accuracy and efficiency of dissolved oxygen concentration prediction.
In a first aspect, an embodiment of the present application provides a machine-learning-based dissolved oxygen concentration prediction method, comprising the following steps: obtaining a dissolved oxygen concentration daily-scale lag feature time series and a water quality parameter daily-scale lag feature time series of a sample area, wherein the dissolved oxygen concentration daily-scale lag feature time series comprises dissolved oxygen concentration data for a plurality of consecutive days; combining the dissolved oxygen concentration daily-scale lag feature time series and the water quality parameter daily-scale lag feature time series and dividing them to construct a plurality of training sets; performing model construction and model training on the plurality of training sets using a machine learning method to obtain a dissolved oxygen concentration prediction model; and obtaining a water quality parameter daily-scale lag feature time series of a target area and inputting it into the dissolved oxygen concentration prediction model to predict the dissolved oxygen concentration, obtaining a dissolved oxygen concentration prediction result for the target area.
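The first two steps of the method, building daily-scale lag features and dividing them into training sets, can be sketched in Python as follows. This is an illustrative sketch only. The function names `make_lag_features` and `build_training_sets`, the number of lag days `n_lags`, the fixed `window` length of each training set, and the use of non-overlapping consecutive windows are assumptions; the text does not specify how many lag days are used or how the series are divided.

```python
from typing import Dict, List, Sequence, Tuple


def make_lag_features(series: Sequence[float], n_lags: int) -> List[List[float]]:
    """For each day t >= n_lags, return [x[t-1], ..., x[t-n_lags]]."""
    return [
        [series[t - k] for k in range(1, n_lags + 1)]
        for t in range(n_lags, len(series))
    ]


def build_training_sets(
    do_series: Sequence[float],
    wq_series: Dict[str, Sequence[float]],
    n_lags: int,
    window: int,
) -> List[Tuple[List[List[float]], List[float]]]:
    """Combine lagged water quality features with DO labels, then cut the
    result into consecutive, non-overlapping training sets of `window` days."""
    names = sorted(wq_series)  # fixed column order across parameters
    lagged = {name: make_lag_features(wq_series[name], n_lags) for name in names}
    n_rows = len(do_series) - n_lags
    features = [
        [value for name in names for value in lagged[name][i]]
        for i in range(n_rows)
    ]
    labels = list(do_series[n_lags:])  # DO on the day being predicted
    return [
        (features[start:start + window], labels[start:start + window])
        for start in range(0, n_rows - window + 1, window)
    ]


# Toy daily series for one sample area (values are made up for illustration).
do = [8.0, 7.5, 7.0, 6.5, 6.0, 6.2, 6.8]
wq = {
    "temp": [20.0, 21.0, 22.0, 23.0, 24.0, 25.0, 26.0],
    "ph": [7.0, 7.1, 7.2, 7.3, 7.4, 7.5, 7.6],
}
sets = build_training_sets(do, wq, n_lags=2, window=2)
```

With these toy inputs the five usable days yield two training sets of two days each, and the trailing day that does not fill a window is dropped. A regression model trained on such sets takes only lagged water quality parameters as input, which matches the prediction step, where only the water quality series of the target area is supplied.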
In a second aspect, an embodiment of the present application provides a machine-learning-based dissolved oxygen concentration prediction device, comprising: a data acquisition module, configured to obtain a dissolved oxygen concentration daily-scale lag feature time series and a water quality parameter daily-scale lag feature time series of a sample area, wherein the dissolved oxygen concentration daily-scale lag feature time series comprises dissolved oxygen concentration data for a plurality of consecutive days; a model training module, configured to combine the dissolved oxygen concentration daily-scale lag feature time series and the water quality parameter daily-scale lag feature time series and divide them to construct a plurality of training sets, and to perform model construction and model training on the plurality of training sets using a machine learning method to obtain a dissolved oxygen concentration prediction model; and a dissolved oxygen concentration prediction module, configured to obtain a water quality parameter daily-scale lag feature time series of a target area and input it into the dissolved oxygen concentration prediction model to predict the dissolved oxygen concentration, obtaining a dissolved oxygen concentration prediction result for the target area.
In a third aspect, an embodiment of the present application provides a computer device, including a processor, a memory, and a computer program stored in the memory