CN-121981302-A - Two-stage machine learning algal bloom prediction method based on a geostationary satellite
Abstract
The invention discloses a two-stage machine learning method for predicting algal blooms based on geostationary satellite data. The method acquires hourly environmental parameter data and uses geostationary satellite data to obtain chlorophyll a (Chla) data by remote sensing inversion. Environmental features for the optimal time interval are determined by correlation analysis between the environmental parameter data and the Chla data, and the sample database is expanded with the environmental features and the Chla data at different moments. A first-stage machine learning model is trained on these samples to predict missing Chla data from the environmental features, yielding continuous Chla concentration data. Chla concentration features for the optimal time interval are then determined from the continuous Chla concentration data by the same method, and a second-stage machine learning model is trained on the Chla concentration features, the environmental features and the continuous Chla concentration data, so that subsequent Chla concentration changes in a eutrophic lake can be predicted from the environmental features and the Chla concentration features. The method predicts the whole-lake mean Chla concentration at a daily scale and provides technical support for the management of eutrophic lakes and the prevention and control of algal blooms.
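The two-stage flow described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: a toy nearest-neighbour regressor stands in for the trained machine learning models, days are plain integers, and every name (`NearestNeighbourRegressor`, `fill_missing_chla`, `forecast_chla`, the 3-day `lookback`) and every number is a hypothetical chosen only for illustration.

```python
from statistics import mean


class NearestNeighbourRegressor:
    """Toy stand-in for a trained model (NOT the models used in the patent)."""

    def fit(self, rows, targets):
        self.rows, self.targets = list(rows), list(targets)
        return self

    def predict_one(self, row):
        # Return the target of the closest training row (squared Euclidean distance).
        dists = [sum((a - b) ** 2 for a, b in zip(r, row)) for r in self.rows]
        return self.targets[dists.index(min(dists))]


def fill_missing_chla(env_features, chla_by_day):
    """Stage 1: predict Chla from environmental features on days with no retrieval."""
    observed = [d for d, v in chla_by_day.items() if v is not None]
    model = NearestNeighbourRegressor().fit(
        [env_features[d] for d in observed], [chla_by_day[d] for d in observed]
    )
    return {
        d: v if v is not None else model.predict_one(env_features[d])
        for d, v in chla_by_day.items()
    }


def forecast_chla(env_features, daily_chla, target_day, lookback=3):
    """Stage 2: predict Chla on target_day from environmental features plus
    the mean Chla of the preceding `lookback` days."""

    def row(day):
        recent = [daily_chla[day - k] for k in range(1, lookback + 1)]
        return list(env_features[day]) + [mean(recent)]

    first_day = min(daily_chla)
    train_days = [d for d in sorted(daily_chla) if first_day + lookback <= d < target_day]
    model = NearestNeighbourRegressor().fit(
        [row(d) for d in train_days], [daily_chla[d] for d in train_days]
    )
    return model.predict_one(row(target_day))


# Hypothetical data: day -> (wind speed feature, air temperature feature).
env = {0: (2.0, 20.0), 1: (3.0, 21.0), 2: (6.0, 18.0), 3: (2.5, 22.0),
       4: (5.5, 18.5), 5: (2.2, 21.5), 6: (3.1, 21.0)}
# Hypothetical whole-lake mean Chla per day; None marks a day without a usable image.
chla = {0: 30.0, 1: 28.0, 2: None, 3: 32.0, 4: 15.0, 5: None}

continuous_chla = fill_missing_chla(env, chla)  # stage 1: gap-filled daily series
prediction = forecast_chla(env, continuous_chla, target_day=6)  # stage 2
```

The point of the sketch is the data flow: the second stage can only build its Chla history features because the first stage has already filled the missing days. In practice the stand-in regressor would be replaced by the trained models.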
Inventors
- HUANG ZEHUI
- MA RONGHUA
- XUE KUN
- HU MINQI
Assignees
- Nanjing Institute of Geography and Limnology, Chinese Academy of Sciences (中国科学院南京地理与湖泊研究所)
Dates
- Publication Date
- 20260505
- Application Date
- 20260122
Claims (10)
- 1. A two-stage machine learning algal bloom prediction method based on a geostationary satellite, characterized by comprising the following steps: obtaining Chla concentration data by inversion of geostationary satellite remote sensing data; acquiring hourly data for a plurality of environmental parameters, and calculating statistical values of the environmental parameter data over different time periods before a date to be predicted as alternative parameters; performing correlation analysis between each environmental parameter, together with its alternative parameters, and the Chla concentration on the date to be predicted, and taking at least one alternative parameter with the highest correlation as a characteristic variable of the environmental parameter; taking the characteristic variables of the environmental parameters as input, and predicting Chla concentration data for the dates missing from the Chla concentration data by using a trained first machine learning model, to obtain a continuous daily-scale Chla data set; calculating, from the daily-scale Chla data set, statistical values of the Chla concentration data over different time periods before the date to be predicted, performing correlation analysis between these statistical values, as alternative parameters, and the Chla concentration on the date to be predicted, and taking at least one alternative parameter with the highest correlation as a characteristic variable of the Chla concentration data; and taking the characteristic variables of the environmental parameters and of the Chla concentration data as input, and predicting the future Chla concentration change of the lake by using a trained second machine learning model.
- 2. The method of claim 1, wherein the environmental parameters include wind speed, air temperature, solar radiation, air pressure, rainfall, surface runoff, and evaporation.
- 3. The method of claim 1, wherein the statistics comprise a mean value, a maximum value, and a cumulative value.
- 4. The method according to claim 2, characterized in that, for the wind speed, air temperature, solar radiation and air pressure parameters, the mean and maximum values over different time periods before the date to be predicted are obtained as alternative parameters; for the rainfall, surface runoff and evaporation parameters, the mean and cumulative values over different time periods before the date to be predicted are obtained as alternative parameters; and the mean value of the Chla concentration data over different time periods before the date to be predicted is obtained as an alternative parameter.
- 5. The method of claim 1, wherein the correlation analysis is performed by feature importance ranking of random forests.
- 6. The method of claim 1, wherein the correlation analysis is performed by: establishing a random forest model with a given environmental parameter, or the Chla concentration data, together with its alternative parameters as input and the Chla concentration data on the date to be predicted as output, and selecting the top n characteristic variables according to the feature importance ranking.
- 7. The method according to claim 1, wherein Chla concentration values acquired at different moments from the geostationary satellite remote sensing data are taken as sample values, the corresponding characteristic variables are acquired by treating each moment as a prediction date, and the sample database for model training is expanded with the sample values and the corresponding characteristic variables.
- 8. The method of claim 1, wherein the first machine learning model and the second machine learning model are extreme gradient boosting (XGBoost) models.
- 9. The method of claim 1, wherein the first machine learning model is trained by taking the inverted Chla concentration data as the true value and the characteristic variables of the environmental parameters at the corresponding moments as inputs; and the second machine learning model is trained by taking the daily-scale Chla data set as the true value and the characteristic variables of the environmental parameters and of the Chla concentration data at the corresponding moments as inputs.
- 10. The method of claim 1, wherein the geostationary satellite remote sensing data is selected from GOCI and GOCI-II satellite data.
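The alternative (candidate) parameters of claims 3 and 4 and the sample expansion of claim 7 can be sketched in Python as follows. This is an illustration under stated assumptions, not the claimed implementation: the function names, the dictionary layout, the 24-hour window and the synthetic series are all hypothetical, and the claims do not specify which window lengths are evaluated.

```python
from statistics import mean

# Statistics per parameter type, following claim 4; "sum" is the cumulative value.
STATS_BY_PARAMETER = {
    "wind_speed": ("mean", "max"),
    "air_temperature": ("mean", "max"),
    "solar_radiation": ("mean", "max"),
    "air_pressure": ("mean", "max"),
    "rainfall": ("mean", "sum"),
    "surface_runoff": ("mean", "sum"),
    "evaporation": ("mean", "sum"),
}
_FUNCS = {"mean": mean, "max": max, "sum": sum}


def alternative_parameters(hourly, target_hour, windows_h, stats):
    """Statistics of an hourly series over look-back windows ending just
    before target_hour (an index into the hourly list)."""
    out = {}
    for w in windows_h:
        window = hourly[max(0, target_hour - w):target_hour]
        if not window:
            continue  # no data before the target hour
        for s in stats:
            out[f"{s}_{w}h"] = _FUNCS[s](window)
    return out


def build_samples(chla_obs, hourly_by_parameter, windows_h):
    """Treat every satellite observation time as its own prediction time,
    so each Chla retrieval contributes one training sample (claim 7)."""
    samples = []
    for hour, chla in sorted(chla_obs.items()):
        features = {}
        for name, series in hourly_by_parameter.items():
            stats = STATS_BY_PARAMETER[name]
            for key, value in alternative_parameters(series, hour, windows_h, stats).items():
                features[f"{name}_{key}"] = value
        samples.append((features, chla))
    return samples


# Hypothetical 48-hour series and two Chla retrievals (hour index -> Chla value).
wind = [float(h) for h in range(48)]
rain = [0.0] * 40 + [1.0] * 8
samples = build_samples(
    {30: 20.0, 48: 25.0},
    {"wind_speed": wind, "rainfall": rain},
    windows_h=(24,),
)
```

Each sample pairs a feature dictionary such as `wind_speed_max_24h` or `rainfall_sum_24h` with the Chla value retrieved at that time. Ranking these features, for example by random forest feature importance as in claims 5 and 6, would then select the characteristic variables.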
Description
Two-stage machine learning algal bloom prediction method based on a geostationary satellite
Technical Field
The invention belongs to the technical field of satellite remote sensing and water environment analysis, and particularly relates to a two-stage machine learning algal bloom prediction method based on a geostationary satellite.
Background
Harmful algal blooms deplete dissolved oxygen in water bodies, accumulate toxins, release malodors and degrade ecosystems, and have become an important environmental problem threatening water safety and public health. Under the influence of human activities and climate change, lake algal blooms are expanding globally. Because algal blooms show pronounced spatiotemporal heterogeneity over short periods, on-site monitoring alone cannot give a comprehensive picture of the overall situation. Satellite remote sensing offers wide coverage, rapid acquisition and periodic revisits, and has been widely used to monitor algal blooms in lakes.
Existing approaches to algal bloom prediction fall mainly into mechanism-driven and data-driven prediction models. Mechanism-driven models have the drawbacks of requiring many parameters, being complex and transferring poorly across regions, while data-driven models generally depend on large amounts of measured data and struggle to meet prediction requirements when high-frequency observations are lacking. Chlorophyll a (Chla) is an important indicator of the degree of eutrophication of water and of the occurrence of algal blooms, and related studies have attempted to construct prediction models from satellite data. However, the prediction performance of existing methods degrades when the time series is discontinuous or the measured data are limited.
Therefore, a technical scheme is needed that makes full use of long-term satellite observation data and still achieves continuous, high-precision Chla prediction when measured data are lacking, so as to improve the timeliness and reliability of algal bloom early warning.
Disclosure of Invention
The invention aims to provide a two-stage machine learning algal bloom prediction method based on a geostationary satellite. To achieve this purpose, the invention adopts the following technical scheme:
A two-stage machine learning algal bloom prediction method based on a geostationary satellite, the method comprising: obtaining Chla concentration data by inversion of geostationary satellite remote sensing data; acquiring hourly data for a plurality of environmental parameters, and calculating statistical values of the environmental parameter data over different time periods before a date to be predicted as alternative parameters; performing correlation analysis between each environmental parameter, together with its alternative parameters, and the Chla concentration on the date to be predicted, and taking at least one alternative parameter with the highest correlation as a characteristic variable of the environmental parameter; taking the characteristic variables of the environmental parameters as input, and predicting Chla concentration data for the dates missing from the Chla concentration data by using a trained first machine learning model, to obtain a continuous daily-scale Chla data set; calculating, from the daily-scale Chla data set, statistical values of the Chla concentration data over different time periods before the date to be predicted, performing correlation analysis between these statistical values, as alternative parameters, and the Chla concentration on the date to be predicted, and taking at least one alternative parameter with the highest correlation as a characteristic variable of the Chla concentration data; and taking the characteristic variables of the environmental parameters and of the Chla concentration data as input, and predicting the future Chla concentration change of the lake by using a trained second machine learning model.
In some embodiments of the invention, the environmental parameters include wind speed, air temperature, solar radiation, air pressure, rainfall, surface runoff, and evaporation.
In some embodiments of the invention, the statistical values include a mean, a maximum, and a cumulative value.
In some embodiments of the invention, for the wind speed, air temperature, solar radiation and air pressure parameters, the mean and maximum values over different time periods before the date to be predicted are obtained as alternative parameters; for the rainfall, surface runoff and evaporation parameters, the mean and cumulative values over different time periods before the date to be predicted are obtained as alternative parameters; and the mean value of the Chla concentration data over different time periods before the date to be predicted is obtained as an alternative parameter.
In some embodiments of the invention, the correlation analysis is performed by feature