
CN-120544692-B - Method and system for identifying farmland runoff pollution in surface water by utilizing microbial fingerprints


Abstract

The invention discloses a method and a system for identifying farmland runoff pollution in surface water by utilizing microbial fingerprints. The method comprises: collecting microorganism samples from multiple types of water bodies; obtaining microorganism composition data through 16S rDNA sequencing; determining farmland runoff pollution microorganism fingerprints based on sensitivity-specificity screening indexes combined with obligate anaerobic characteristics; establishing a dual-model integrated judgment mechanism in which an artificial neural network processes microorganism category data and an XGBoost model analyzes microorganism relative abundance data; extracting the classification decision basis of the two models through a control variable method; and judging the results of the two models based on logic rules. By combining microbial fingerprints with an integration of machine learning classification models, the invention enables accurate and simple identification of farmland runoff pollution in surface water and improves both the accuracy and the efficiency of that identification.
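The control variable step named in the abstract can be illustrated with a minimal Python sketch. It assumes a trained classifier is available as a plain function; `probe_binary_patterns`, `sweep_abundance` and the two toy models are hypothetical names introduced here for illustration and are not taken from the patent. The first helper enumerates every presence/absence pattern, the second varies one relative abundance while holding the others at zero.

```python
from itertools import product
from typing import Callable, List, Sequence, Tuple


def probe_binary_patterns(
    predict: Callable[[Sequence[int]], int], n_features: int
) -> List[Tuple[int, ...]]:
    """Return every 0/1 input pattern that the model labels positive (1)."""
    return [p for p in product((0, 1), repeat=n_features) if predict(p) == 1]


def sweep_abundance(
    predict: Callable[[Sequence[float]], int],
    index: int,
    n_features: int,
    start: float,
    stop: float,
    step: float,
) -> List[float]:
    """Vary one abundance on a fixed grid while all others stay at zero.

    Returns the grid values for which the model outputs a positive label.
    """
    n_steps = int(round((stop - start) / step))
    hits = []
    for k in range(n_steps + 1):
        x = [0.0] * n_features
        x[index] = start + k * step
        if predict(x) == 1:
            hits.append(x[index])
    return hits


# Hypothetical stand-ins for trained models (assumptions, not the patent's models).
def toy_category_model(pattern: Sequence[int]) -> int:
    # Positive whenever the last fingerprint is present.
    return int(pattern[-1] == 1)


def toy_abundance_model(x: Sequence[float]) -> int:
    # Positive whenever the second abundance reaches an arbitrary cutoff.
    return int(x[1] >= 0.0045)


positive_patterns = probe_binary_patterns(toy_category_model, 5)
positive_values = sweep_abundance(toy_abundance_model, 1, 5, 0.0, 0.01, 0.001)
```

With five fingerprints there are only 32 presence/absence patterns, so exhaustive enumeration is cheap; the abundance sweep is where the grid step determines how finely a threshold interval can be located.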

Inventors

  • LIU XINHUI
  • ZHANG XIN
  • LI PENGCHENG
  • LI LIPING
  • DONG LU
  • XUE MENGZHU
  • LI BAIHAN
  • XIA GUOHUI
  • WANG KENING
  • ZHANG HANDAN

Assignees

  • Beijing Normal University (北京师范大学)

Dates

Publication Date
20260508
Application Date
20250515

Claims (6)

  1. A method for identifying farmland runoff pollution in surface water by using microbial fingerprints, the method comprising:
     acquiring a plurality of different types of water samples, and acquiring pollution source microorganism composition data of each water sample based on a 16S rDNA sequencing method, wherein each water sample corresponds to one water environment;
     based on a sensitivity-specificity analysis method combined with obligate anaerobic characteristics, screening out a plurality of farmland runoff pollution related microorganisms from the plurality of pollution source microorganism composition data, and taking the farmland runoff pollution related microorganisms as farmland runoff pollution microorganism fingerprints, wherein the farmland runoff pollution microorganism fingerprints comprise f_Desulfuromonadaceae, g_Geobacter, f_AKAU3564_sediment_group, o_Dehalococcoidales and g_Citrifermentans;
     constructing an artificial neural network model and an XGBoost model, respectively, based on farmland runoff pollution microorganism fingerprint category data corresponding to each of the plurality of farmland runoff pollution microorganism fingerprints and farmland runoff pollution microorganism fingerprint relative abundance data corresponding to each of the plurality of farmland runoff pollution microorganism fingerprints;
     acquiring target data of target microorganism fingerprints in a target surface water sample to be identified, and inputting the target data into the artificial neural network model and the XGBoost model, wherein the target data comprises target microorganism fingerprint category data corresponding to the target microorganism fingerprints and target microorganism fingerprint relative abundance data corresponding to the target microorganism fingerprints;
     determining, by adopting a logic rule and based on the judgment results respectively output by the artificial neural network model and the XGBoost model, whether farmland runoff pollution exists in the target surface water sample corresponding to the target data of the target microorganism fingerprints;
     changing, based on a control variable method, the farmland runoff pollution microorganism fingerprint category data input into the artificial neural network model to obtain a plurality of first output results;
     determining, based on the plurality of first output results, a binarization judgment condition under which the artificial neural network model outputs a positive judgment result;
     changing, based on a control variable method, the relative abundance data of farmland runoff pollution microorganism fingerprints input into the XGBoost model to obtain a plurality of second output results; and
     determining, based on the plurality of second output results, a relative abundance threshold interval under which the XGBoost model outputs a positive judgment result;
     wherein determining, based on the plurality of first output results, the binarization judgment condition under which the artificial neural network model outputs a positive judgment result includes:
     performing binarization processing on the farmland runoff pollution microorganism fingerprint category data to obtain a binarized 0/1 matrix, wherein 0 represents absence and 1 represents presence; and
     analyzing, based on the binarized 0/1 matrix, single farmland runoff pollution microorganism fingerprints and/or combinations of a plurality of farmland runoff pollution microorganism fingerprints, wherein determining the binarization judgment condition comprises: when the g_Citrifermentans microbial fingerprint is detected to exist, the extraction result is 1 and the binarization judgment condition is determined to be met; or, when any two or more microbial fingerprints, other than the combination of f_Desulfuromonadaceae and g_Geobacter, are detected to exist simultaneously, the extraction result is 1 and the binarization judgment condition is determined to be met.
  2. The method of claim 1, wherein determining, based on the plurality of second output results, a relative abundance threshold interval under which the XGBoost model outputs a positive judgment result comprises:
     adjusting the relative abundance data of farmland runoff pollution microorganism fingerprints in precision steps of 0.0000001%, wherein determining the relative abundance threshold interval comprises:
     if only a single farmland runoff pollution microorganism fingerprint is detected, the relative abundance threshold interval needs to satisfy 0.0067339% ≥ f_AKAU3564_sediment_group ≥ 0.0050275%; f_Desulfuromonadaceae ≥ 0.0177225%; 0.0397715% ≥ g_Geobacter ≥ 0.0077625%; 0.0078340% ≥ o_Dehalococcoidales ≥ 0.0031500%; or 0.0105874% ≥ o_Dehalococcoidales ≥ 0.0091936%;
     if f_Desulfuromonadaceae and g_Geobacter are detected at the same time, the relative abundance threshold interval needs to satisfy f_Desulfuromonadaceae ≥ 0.0177225%, or f_Desulfuromonadaceae < 0.0177225% and 0.0397719% ≥ g_Geobacter ≥ 0.0077645%.
  3. The method for identifying farmland runoff pollution in surface water by using microbial fingerprints according to claim 1, wherein determining, by adopting a logic rule and based on the judgment results respectively output by the artificial neural network model and the XGBoost model, whether farmland runoff pollution exists in the target surface water sample corresponding to the target microorganism fingerprint category data comprises:
     matching whether the target microorganism fingerprint category data meets the binarization judgment condition; and
     when the target microorganism fingerprint category data meets the binarization judgment condition, determining that the judgment result output by the artificial neural network model is a positive judgment result, and determining that farmland runoff pollution exists in the target surface water sample corresponding to the target microorganism fingerprint category data.
  4. The method for identifying farmland runoff pollution in surface water by using microbial fingerprints according to claim 1, wherein determining, by adopting a logic rule and based on the judgment results respectively output by the artificial neural network model and the XGBoost model, whether farmland runoff pollution exists in the target surface water sample corresponding to the target data of the target microorganism fingerprints comprises:
     when the judgment result output by the artificial neural network model is a negative judgment result, acquiring the relative abundance data of the target microorganism fingerprints in the target surface water sample;
     matching whether the relative abundance data of the target microorganism fingerprints meets the relative abundance threshold interval; and
     when the relative abundance data of the target microorganism fingerprints meets the relative abundance threshold interval, determining that the judgment result output by the XGBoost model is a positive judgment result, and determining that farmland runoff pollution exists in the target surface water sample corresponding to the relative abundance data of the target microorganism fingerprints.
  5. The method for identifying farmland runoff pollution in surface water by using microbial fingerprints according to claim 1, wherein determining, by adopting a logic rule and based on the judgment results respectively output by the artificial neural network model and the XGBoost model, whether farmland runoff pollution exists in the target surface water sample corresponding to the target data of the target microorganism fingerprints comprises:
     matching whether the target microorganism fingerprint category data meets the binarization judgment condition;
     matching whether the relative abundance data of the target microorganism fingerprints in the target surface water sample meets the relative abundance threshold interval; and
     when the target microorganism fingerprint category data meets the binarization judgment condition and/or the relative abundance data of the target microorganism fingerprints meets the relative abundance threshold interval, determining whether farmland runoff pollution exists in the target surface water sample corresponding to the target data of the target microorganism fingerprints.
  6. A system for identifying farmland runoff pollution in surface water using microbial fingerprints, the system comprising:
     a water body sample acquisition module, used for acquiring a plurality of different types of water body samples and acquiring microorganism composition data of each water body sample based on a 16S rDNA sequencing method, wherein each water body sample corresponds to one water body environment;
     a farmland runoff pollution microorganism fingerprint determining module, used for screening out a plurality of farmland runoff pollution related microorganisms from the plurality of pollution source microorganism composition data based on a sensitivity-specificity analysis method combined with obligate anaerobic characteristics, and taking the farmland runoff pollution related microorganisms as farmland runoff pollution microorganism fingerprints, wherein the farmland runoff pollution microorganism fingerprints comprise f_Desulfuromonadaceae, g_Geobacter, f_AKAU3564_sediment_group, o_Dehalococcoidales and g_Citrifermentans;
     a model construction module, used for constructing an artificial neural network model and an XGBoost model, respectively, based on farmland runoff pollution microorganism fingerprint category data corresponding to each of the plurality of farmland runoff pollution microorganism fingerprints and farmland runoff pollution relative abundance data corresponding to each of the plurality of farmland runoff pollution microorganism fingerprints;
     a data acquisition module, used for acquiring target data of target microorganism fingerprints in a target surface water sample to be identified and inputting the target data into the artificial neural network model and the XGBoost model, wherein the target data comprises target microorganism fingerprint category data corresponding to the target microorganism fingerprints and target microorganism fingerprint relative abundance data corresponding to the target microorganism fingerprints;
     a farmland runoff pollution determining module, used for determining, by adopting a logic rule and based on the judgment results respectively output by the artificial neural network model and the XGBoost model, whether farmland runoff pollution exists in the target surface water sample corresponding to the target data of the target microorganism fingerprints;
     a first output result determining module, used for changing, based on a control variable method, the farmland runoff pollution microorganism fingerprint category data input into the artificial neural network model to obtain a plurality of first output results;
     a binarization judgment condition determining module, used for determining, based on the plurality of first output results, a binarization judgment condition under which the artificial neural network model outputs a positive judgment result;
     a second output result determining module, used for changing, based on a control variable method, the relative abundance data of farmland runoff pollution microorganism fingerprints input into the XGBoost model to obtain a plurality of second output results; and
     a relative abundance threshold interval determining module, used for determining, based on the plurality of second output results, a relative abundance threshold interval under which the XGBoost model outputs a positive judgment result;
     wherein the binarization judgment condition determining module includes:
     a binarization processing unit, used for performing binarization processing on the farmland runoff pollution microorganism fingerprint category data to obtain a binarized 0/1 matrix, wherein 0 represents absence and 1 represents presence; and
     a binarization judgment condition determining unit, used for analyzing, based on the binarized 0/1 matrix, single farmland runoff pollution microorganism fingerprints and/or combinations of a plurality of farmland runoff pollution microorganism fingerprints to determine the binarization judgment condition;
     wherein the binarization judgment condition determining unit includes:
     a first judgment condition subunit, used for determining that the binarization judgment condition is met, with an extraction result of 1, when the g_Citrifermentans microbial fingerprint is detected to exist; and
     a second judgment condition subunit, used for determining that the binarization judgment condition is met, with an extraction result of 1, when any two or more microbial fingerprints, other than the combination of f_Desulfuromonadaceae and g_Geobacter, are detected to exist simultaneously.
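The logic rules in claims 1 to 5 can be read as a small rule table. The following Python sketch is one possible reading, not the patented implementation. It makes three assumptions: a fingerprint counts as "present" when its relative abundance is above zero; the exception in claim 1 applies only when exactly f_Desulfuromonadaceae and g_Geobacter are present; and the two checks are combined with a plain OR, following claim 5. The function names are hypothetical, and abundances are given in percent.

```python
from typing import Dict

# Taxon labels as listed in claim 1.
DESULF = "f_Desulfuromonadaceae"
GEOB = "g_Geobacter"
AKAU = "f_AKAU3564_sediment_group"
DEHALO = "o_Dehalococcoidales"
CITRI = "g_Citrifermentans"
FINGERPRINTS = (DESULF, GEOB, AKAU, DEHALO, CITRI)

INF = float("inf")
# Single-fingerprint intervals from claim 2, as closed (low, high) pairs in percent.
SINGLE_INTERVALS = {
    AKAU: [(0.0050275, 0.0067339)],
    DESULF: [(0.0177225, INF)],
    GEOB: [(0.0077625, 0.0397715)],
    DEHALO: [(0.0031500, 0.0078340), (0.0091936, 0.0105874)],
}


def present_set(abundance: Dict[str, float]) -> set:
    # Assumption: presence means a relative abundance above zero.
    return {t for t in FINGERPRINTS if abundance.get(t, 0.0) > 0.0}


def binarization_positive(abundance: Dict[str, float]) -> bool:
    """Presence/absence rule extracted from the neural network (claim 1)."""
    present = present_set(abundance)
    if CITRI in present:
        return True
    # Assumption: only the exact pair Desulfuromonadaceae + Geobacter is excluded.
    return len(present) >= 2 and present != {DESULF, GEOB}


def abundance_positive(abundance: Dict[str, float]) -> bool:
    """Relative abundance rule extracted from the XGBoost model (claim 2)."""
    present = present_set(abundance)
    if len(present) == 1:
        (taxon,) = present
        value = abundance[taxon]
        return any(lo <= value <= hi for lo, hi in SINGLE_INTERVALS.get(taxon, []))
    if present == {DESULF, GEOB}:
        if abundance[DESULF] >= 0.0177225:
            return True
        return 0.0077645 <= abundance[GEOB] <= 0.0397719
    return False


def farmland_runoff_detected(abundance: Dict[str, float]) -> bool:
    """Combine both rules with OR, following the and/or wording of claim 5."""
    return binarization_positive(abundance) or abundance_positive(abundance)
```

For example, `farmland_runoff_detected({"g_Geobacter": 0.01})` falls through the presence rule (a single fingerprint that is not g_Citrifermentans) and is decided by the g_Geobacter interval from claim 2.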

Description

Method and system for identifying farmland runoff pollution in surface water by utilizing microbial fingerprints

Technical Field

The invention belongs to the technical field of environmental pollution source identification, and particularly relates to a method and a system for identifying farmland runoff pollution in surface water by utilizing microbial fingerprints.

Background

With the expansion of agricultural production, agricultural non-point source pollution has become an important source of water pollution. Compared with orchards, grasslands and forest lands, intensively managed farmland has a larger influence on surface water quality: applied fertilizers, pesticides and the like are washed into water bodies by rainfall and similar processes, carrying pollutants with them and threatening human health and ecological safety. Current pollution source identification methods depend on fingerprints such as stable isotopes, water chemistry indexes and characteristic compounds. They are accurate in certain situations, but they are limited to specific pollutants and cannot distinguish farmland pollution from other agricultural sources. Microbial fingerprints are sensitive to environmental changes and can reflect complex pollution patterns, which makes them a promising alternative. In recent years, research on water pollution identification based on microbial fingerprints has advanced significantly. The prior art mainly constructs a microbial fingerprint database for a specific pollution source, for example the species fingerprint spectra of aquaculture sources and planting sources established with the 16S rRNA gene in patent CN117512147A, and the rice field pollution specific gene fingerprint spectrum proposed in CN117535435A.
However, three technical bottlenecks remain in the art. First, the traditional fingerprint matching method relies on detecting microbial fingerprints, which are biologically limited in host specificity and sensitivity, so the false positive rate of the results is high. Second, pollution identification based on machine learning focuses on the abundance data of microbial fingerprints and ignores microbial category information; because the indicative effect of certain fingerprints on pollution sources is neglected, a robust and accurate pollution identification framework cannot be constructed. Third, although machine learning models are good at finding correlations that improve prediction accuracy, they often require complicated model construction, optimization and screening, and they lack causal reasoning or statistical interpretability. In particular, the black-box nature of neural networks and ensemble models makes their decision paths untraceable, so it is difficult to meet the evidence chain requirements of environmental law enforcement. Therefore, there is a need for an accurate and simple method for identifying pollution sources that fuses multidimensional microbial fingerprint features with interpretable and transparent machine learning.

Disclosure of Invention

In view of the defects of the prior art, the invention aims to provide a method and a system for identifying farmland runoff pollution in surface water by utilizing microbial fingerprints, so as to improve the efficiency and accuracy of farmland runoff pollution identification.
In accordance with one aspect of the present application, a method for identifying farmland runoff pollution in surface water using microbial fingerprints is disclosed, the method comprising: acquiring a plurality of different types of water samples, and acquiring pollution source microorganism composition data of each water sample based on a 16S rDNA sequencing method, wherein each water sample corresponds to one water environment; based on a sensitivity-specificity analysis method combined with obligate anaerobic characteristics, screening out a plurality of farmland runoff pollution related microorganisms from the plurality of pollution source microorganism composition data, and taking the farmland runoff pollution related microorganisms as farmland runoff pollution microorganism fingerprints, wherein the farmland runoff pollution microorganism fingerprints comprise f_Desulfuromonadaceae, g_Geobacter, f_AKAU3564_sediment_group, o_Dehalococcoidales and g_Citrifermentans; constructing an artificial neural network model and an XGBoost model, respectively, based on farmland runoff pollution microorganism category data corresponding to the plurality of farmland runoff pollution microorganism fingerprints and farmland runoff pollution microorganism relative abundance data corresponding to the plurality of farmland runoff pollution microorganism fingerprints; acquiring target data of target microorganism fingerprints in a target surface water sample to