
CN-122020211-A - Online detection and analysis method for water quality data for intelligent water affairs


Abstract

The invention relates to the field of data analysis, and in particular to an online detection and analysis method for water quality data for intelligent water affairs. The method comprises: obtaining a water quality prediction model by collecting and preprocessing water quality related data and training a model on it; obtaining an optimal clustering result by performing cluster analysis and cluster evaluation on historical water quality data; obtaining a true anomaly confidence by mapping each data vector whose deviation exceeds a threshold to the optimal clustering result and performing trend analysis; obtaining a true anomaly confidence index by comprehensively judging the true anomaly confidence; and obtaining a water quality anomaly detection result by jointly judging the prediction deviation and the true anomaly confidence index. The method addresses the problem that existing water quality monitoring methods based on an LSTM prediction deviation threshold cannot distinguish a real water quality anomaly from a prediction error caused by a change in the relationships among indexes.

Inventors

  • SI LILI
  • WANG SHUGUANG
  • WANG XINHUA
  • SONG CHAO
  • LIU SHIJUN
  • QI MINGMING
  • DONG PENG
  • CHENG XUEFENG

Assignees

  • Shandong Hengkun Environmental Engineering Technology Co., Ltd. (山东恒坤环境工程技术有限公司)
  • Shandong University (山东大学)

Dates

Publication Date
2026-05-12
Application Date
2026-04-15

Claims (10)

  1. An online detection and analysis method for water quality data for intelligent water affairs, characterized by comprising the following steps: step S1, obtaining a water quality prediction model by collecting and preprocessing water quality related data and training a model on it; step S2, obtaining an optimal clustering result by performing cluster analysis and cluster evaluation on historical water quality data; step S3, obtaining a true anomaly confidence by mapping each data vector whose deviation exceeds the threshold to the optimal clustering result and performing trend analysis; step S4, obtaining a true anomaly confidence index by comprehensively judging the true anomaly confidence; and step S5, obtaining a water quality anomaly detection result by jointly judging the prediction deviation and the true anomaly confidence index.
  2. The online detection and analysis method for water quality data for intelligent water affairs according to claim 1, wherein obtaining the water quality prediction model by collecting and preprocessing water quality related data and training a model on it comprises: arranging online water quality monitoring equipment in a water source area, a water delivery pipe network or a water treatment facility, setting a sampling period, and continuously sampling the monitored water body at the sampling period to obtain water quality related data, wherein the water quality related data at least comprise chemical oxygen demand monitoring data, dissolved oxygen monitoring data, pH value monitoring data, turbidity monitoring data and ammonia nitrogen monitoring data; filling missing values in the chemical oxygen demand, dissolved oxygen, pH value, turbidity and ammonia nitrogen monitoring data, removing outliers, aligning the data in time, and standardizing or normalizing the processed monitoring data to obtain historical water quality data; and taking the dissolved oxygen, pH value, turbidity and ammonia nitrogen monitoring data as input features, taking the chemical oxygen demand monitoring data at the time point corresponding to the input features as the prediction target, and training a long short-term memory (LSTM) network to obtain the water quality prediction model.
  3. The online detection and analysis method for water quality data for intelligent water affairs according to claim 1, wherein obtaining the optimal clustering result by performing cluster analysis and cluster evaluation on the historical water quality data comprises: obtaining a candidate clustering result set by performing candidate cluster number analysis and clustering on the historical water quality data; and obtaining the optimal clustering result by performing cluster evaluation on the candidate clustering result set.
  4. The online detection and analysis method for water quality data for intelligent water affairs according to claim 3, wherein obtaining the candidate clustering result set by performing candidate cluster number analysis and clustering on the historical water quality data comprises: for each target time point in the historical water quality data, combining the chemical oxygen demand, dissolved oxygen, pH value, turbidity and ammonia nitrogen monitoring data corresponding to the target time point into a historical water quality data vector for that time point, and taking the set of historical water quality data vectors of all target time points as the historical water quality data vector set for cluster analysis; obtaining the number of water quality index types in the historical water quality data vector set, taking that number as the maximum candidate cluster number, constructing a candidate cluster number set consisting of the integers from 1 to the maximum candidate cluster number, analyzing the historical water quality data vector set with the Gap Statistic algorithm for each candidate cluster number in the set to obtain the Gap value for that candidate cluster number, and determining the candidate cluster number with the largest Gap value as the ideal cluster number; and constructing a candidate cluster number range centered on the ideal cluster number, clustering the historical water quality data vector set with the K-means algorithm for each candidate cluster number in the range to obtain the candidate clustering result for that candidate cluster number, and taking the set of candidate clustering results for all candidate cluster numbers in the range as the candidate clustering result set.
  5. The online detection and analysis method for water quality data for intelligent water affairs according to claim 3, wherein obtaining the optimal clustering result by performing cluster evaluation on the candidate clustering result set comprises: for each candidate clustering result in the candidate clustering result set, obtaining the candidate cluster number, the total sample count and the sample count of each cluster for that candidate clustering result, taking the smallest of the cluster sample counts as the minimum cluster sample count, and taking the minimum cluster sample count divided by the total sample count as the minimum cluster proportion evaluation; for each candidate clustering result in the candidate clustering result set, setting a small-cluster division coefficient, determining as effective clusters those clusters whose sample count is greater than the product of the small-cluster division coefficient and the total sample count, obtaining the mean sample count of all effective clusters, taking, for each effective cluster, the absolute value of the difference between its sample count and the mean sample count as its cluster scale deviation evaluation, summing the cluster scale deviation evaluations of all effective clusters, applying an exponential mapping with base e to the negative of the sum, and taking the mapped result as the cluster scale consistency evaluation; taking the sum of the minimum cluster proportion evaluation and the cluster scale consistency evaluation as the cluster scale rationality factor of the candidate clustering result; obtaining the Gap value for the candidate cluster number of the candidate clustering result, applying a hyperbolic tangent mapping to the Gap value, adding the constant 1 to the mapped value and dividing the sum by the constant 2 to obtain a Gap value normalization evaluation, dividing the cluster scale rationality factor by the constant 2 to obtain a cluster scale rationality enhancement evaluation, multiplying the Gap value normalization evaluation and the cluster scale rationality enhancement evaluation by their respective influence weights, and adding the products to obtain the cluster result evaluation score of the candidate clustering result; and comparing the cluster result evaluation scores of all candidate clustering results in the candidate clustering result set, and determining the candidate clustering result with the largest cluster result evaluation score as the optimal clustering result.
  6. The online detection and analysis method for water quality data for intelligent water affairs according to claim 1, wherein obtaining the true anomaly confidence by mapping each data vector whose deviation exceeds the threshold to the optimal clustering result and performing trend analysis comprises: obtaining a target home cluster by performing cluster mapping on the data vector whose deviation exceeds the threshold; obtaining a typical true anomaly confidence by performing typical anomaly matching analysis on the target home cluster data; and obtaining an atypical true anomaly confidence by performing trend continuity analysis on local residual sequence data.
  7. The online detection and analysis method for water quality data for intelligent water affairs according to claim 6, wherein obtaining the target home cluster by performing cluster mapping on the data vector whose deviation exceeds the threshold comprises: for each target moment, obtaining the chemical oxygen demand, dissolved oxygen, pH value, turbidity and ammonia nitrogen monitoring data corresponding to the target moment, and combining them into the data vector corresponding to the target moment; comparing the absolute value of the prediction deviation with a preset deviation threshold, and determining the data vector corresponding to the target moment as a data vector whose deviation exceeds the threshold when the absolute value of the prediction deviation is greater than the preset deviation threshold; and determining the cluster corresponding to the cluster center with the minimum Euclidean distance as the target home cluster of the data vector whose deviation exceeds the threshold.
  8. The online detection and analysis method for water quality data for intelligent water affairs according to claim 6, wherein obtaining the typical true anomaly confidence by performing typical anomaly matching analysis on the target home cluster data comprises: for each data vector whose deviation exceeds the threshold, obtaining the cluster sample count of the target home cluster corresponding to that data vector, and obtaining the total sample count and the candidate cluster number of the optimal clustering result; for each data vector whose deviation exceeds the threshold, obtaining the cluster sample counts of all clusters in the optimal clustering result, taking the largest of them as the maximum cluster sample count and the smallest as the minimum cluster sample count, taking the difference between the maximum and minimum cluster sample counts as a cluster scale normalization scale, dividing the cluster scale difference evaluation by the cluster scale normalization scale, adding the constant 1 to the quotient and multiplying the sum by one half to obtain a cluster scale normalization evaluation, and taking the difference between the constant 1 and the cluster scale normalization evaluation as a cluster scale anomaly tendency evaluation; for each data vector whose deviation exceeds the threshold, obtaining the Euclidean distance between that data vector and the cluster center of its target home cluster; and taking the result calculated from the cluster scale anomaly tendency evaluation and the cluster center proximity evaluation as the typical true anomaly confidence of the data vector whose deviation exceeds the threshold.
  9. The online detection and analysis method for water quality data for intelligent water affairs according to claim 6, wherein obtaining the atypical true anomaly confidence by performing trend continuity analysis on the local residual sequence data comprises: for the target moment corresponding to each data vector whose deviation exceeds the threshold, obtaining the prediction deviation at each moment within a local time range that ends at the target moment, and constructing a local residual sequence; for each residual in the local residual sequence, obtaining the perpendicular distance from the point corresponding to that residual to the residual fitting straight line, taking the sum of the perpendicular distances of all residuals in the local residual sequence as a trend deviation evaluation, setting a trend direction function to 1 when the slope is not equal to 0 and to 0 when the slope is equal to 0, applying an exponential mapping with base e to the negative of the trend deviation evaluation, and multiplying the mapped result by the trend direction function to obtain a trend continuity evaluation; for the local residual sequence, obtaining the first-order autocorrelation coefficient between residuals at adjacent moments, adding the constant 1 to the first-order autocorrelation coefficient and dividing the sum by the constant 2 to obtain an autocorrelation normalization evaluation; and taking the sum of the trend continuity evaluation and the autocorrelation normalization evaluation as the atypical true anomaly confidence of the data vector whose deviation exceeds the threshold.
  10. The online detection and analysis method for water quality data for intelligent water affairs according to claim 1, wherein obtaining the true anomaly confidence index by comprehensively judging the true anomaly confidence comprises: for each data vector whose deviation exceeds the threshold, obtaining the typical true anomaly confidence and the atypical true anomaly confidence of that data vector, dividing the typical true anomaly confidence by the constant 2 to obtain a typical true anomaly normalization evaluation, and dividing the atypical true anomaly confidence by the constant 2 to obtain an atypical true anomaly normalization evaluation; and comparing the typical true anomaly normalization evaluation with the atypical true anomaly normalization evaluation, and determining the larger of the two as the true anomaly confidence index of the data vector whose deviation exceeds the threshold.
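
The cluster result evaluation score recited in claim 5 can be illustrated with a minimal Python sketch. It is not the applicant's implementation: the function names, the default small-cluster coefficient of 0.05 and the equal influence weights of 0.5 are assumptions chosen only for illustration, since the claim leaves those values open. The Gap values are assumed to have been computed beforehand.

```python
import math


def cluster_result_score(cluster_sizes, gap_value,
                         small_cluster_coeff=0.05,  # assumed value, not from the claim
                         w_gap=0.5, w_scale=0.5):   # assumed influence weights
    """Score one candidate clustering result from its cluster sizes and Gap value."""
    total = sum(cluster_sizes)

    # Smallest cluster size divided by the total sample count.
    min_share = min(cluster_sizes) / total

    # Effective clusters: sample count greater than coeff * total.
    effective = [n for n in cluster_sizes if n > small_cluster_coeff * total]
    if effective:
        mean_size = sum(effective) / len(effective)
        deviation_sum = sum(abs(n - mean_size) for n in effective)
        consistency = math.exp(-deviation_sum)
    else:
        # Assumption: with no effective cluster, consistency contributes nothing.
        consistency = 0.0

    rationality = min_share + consistency

    # (tanh(Gap) + 1) / 2 maps the Gap value into the interval (0, 1).
    gap_norm = (math.tanh(gap_value) + 1.0) / 2.0
    scale_eval = rationality / 2.0

    return w_gap * gap_norm + w_scale * scale_eval


def select_best(candidates):
    """candidates maps cluster number -> (cluster_sizes, gap_value)."""
    return max(candidates, key=lambda k: cluster_result_score(*candidates[k]))
```

Because the scale deviations are summed over raw sample counts, the exponential term is close to zero unless the effective clusters are almost equal in size, so in this sketch the score is driven mainly by the Gap value and the smallest cluster's share.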

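The trend continuity analysis of claim 9 and the final index of claim 10 can be sketched in the same way. The sketch assumes that the residual fitting straight line is an ordinary least-squares line through the points (position in sequence, residual), which the claim text does not state explicitly; the function names are hypothetical, and the typical confidence of claim 8 is taken as a given input.

```python
import math


def fit_line(residuals):
    """Least-squares line through (i, residuals[i]); returns (slope, intercept)."""
    n = len(residuals)
    if n < 2:
        raise ValueError("need at least two residuals to fit a line")
    mean_x = (n - 1) / 2.0
    mean_y = sum(residuals) / n
    sxx = sum((i - mean_x) ** 2 for i in range(n))
    sxy = sum((i - mean_x) * (r - mean_y) for i, r in enumerate(residuals))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def trend_continuity(residuals):
    """exp(-sum of perpendicular distances to the line) times the direction function."""
    slope, intercept = fit_line(residuals)
    norm = math.sqrt(slope ** 2 + 1.0)
    deviation = sum(abs(slope * i + intercept - r) / norm
                    for i, r in enumerate(residuals))
    direction = 1.0 if slope != 0 else 0.0
    return math.exp(-deviation) * direction


def lag1_autocorrelation(residuals):
    """First-order autocorrelation coefficient between adjacent residuals."""
    n = len(residuals)
    mean = sum(residuals) / n
    denom = sum((r - mean) ** 2 for r in residuals)
    if denom == 0:
        return 0.0  # assumption: a constant sequence is treated as uncorrelated
    num = sum((residuals[i] - mean) * (residuals[i + 1] - mean)
              for i in range(n - 1))
    return num / denom


def atypical_confidence(residuals):
    """Trend continuity plus the autocorrelation mapped into [0, 1]."""
    return trend_continuity(residuals) + (lag1_autocorrelation(residuals) + 1.0) / 2.0


def true_anomaly_confidence_index(typical, atypical):
    """Larger of the two confidences after each is divided by 2 (claim 10)."""
    return max(typical / 2.0, atypical / 2.0)
```

A residual sequence that drifts steadily in one direction lies close to its fitted line and has positive autocorrelation, so both terms are high; a flat sequence has zero slope, which sets the trend continuity term to zero.
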
Description

Water quality data online detection and analysis method for intelligent water affairs

Technical Field

The invention relates to the technical field of data analysis, and in particular to an online detection and analysis method for water quality data for intelligent water affairs.

Background

In intelligent water affairs applications, online water quality monitoring equipment is arranged in a water source area, a water delivery pipe network or a water treatment facility to continuously collect multiple water quality indexes such as chemical oxygen demand, dissolved oxygen, pH value, turbidity and ammonia nitrogen, and the water quality state is analyzed online based on the collected multi-index time series data. In the prior art, methods that model multi-index water quality data with a long short-term memory (LSTM) network generally take historical monitoring data of indexes such as dissolved oxygen, pH value, turbidity and ammonia nitrogen as input features, take chemical oxygen demand monitoring data as the prediction target, and obtain a water quality prediction model through training. During actual operation, the multi-index water quality data collected at the current moment are input into the water quality prediction model to obtain a predicted chemical oxygen demand value, which is compared with the actual monitored value. When the deviation between the predicted value and the actual monitored value exceeds a preset range, the current water quality state is considered possibly abnormal, thereby realizing online detection and analysis of the water quality data.
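
The prior-art deviation check described above can be sketched in a few lines of Python. The sketch assumes that the predicted chemical oxygen demand values have already been produced by a trained model; the function name and the sample values are hypothetical and only illustrate the thresholding step.

```python
def flag_deviations(predicted_cod, observed_cod, threshold):
    """Return the indices whose absolute prediction deviation exceeds the threshold."""
    if len(predicted_cod) != len(observed_cod):
        raise ValueError("predicted and observed series must have the same length")
    return [
        i
        for i, (pred, obs) in enumerate(zip(predicted_cod, observed_cod))
        if abs(obs - pred) > threshold
    ]


# Hypothetical values: only the second time point deviates by more than 1.0.
flagged = flag_deviations([10.0, 12.0, 11.0], [10.5, 15.0, 11.2], threshold=1.0)
```

A check of this kind flags every large deviation alike, whatever its cause.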
However, because the LSTM model is built on the relationships among water quality indexes during the historical period, factors such as water source switching, changes in pollutant input, seasonal fluctuation and water treatment process adjustment may all change the relationships among the indexes during actual operation. In that case, even if the current water body has no real anomaly, the model may produce a large deviation between the predicted and actual chemical oxygen demand values because it has not adapted to the new relationship pattern. The system then misjudges a prediction deviation caused by an index relationship change or a model error as a real water quality anomaly, which causes false alarms and reduces the accuracy and reliability of the online water quality detection result. Therefore, how to further distinguish, on the basis of online detection with a water quality prediction model, whether a prediction deviation stems from a real water quality anomaly or from an index relationship change or a model error is the technical problem currently to be solved.

Disclosure of Invention

In view of the above, the invention aims to provide an online detection and analysis method for water quality data for intelligent water affairs, so as to solve the problem that existing water quality monitoring methods based on an LSTM prediction deviation threshold cannot distinguish a real water quality anomaly from a prediction error caused by a change in the relationships among indexes.
To achieve the above purpose, the technical scheme of the invention is realized as follows: an online detection and analysis method for water quality data for intelligent water affairs comprises the following steps: step S1, obtaining a water quality prediction model by collecting and preprocessing water quality related data and training a model on it; step S2, obtaining an optimal clustering result by performing cluster analysis and cluster evaluation on historical water quality data; step S3, obtaining a true anomaly confidence by mapping each data vector whose deviation exceeds the threshold to the optimal clustering result and performing trend analysis; step S4, obtaining a true anomaly confidence index by comprehensively judging the true anomaly confidence; and step S5, obtaining a water quality anomaly detection result by jointly judging the prediction deviation and the true anomaly confidence index. Further, obtaining the water quality prediction model by collecting and preprocessing water quality related data and training a model on it includes: arranging online water quality monitoring equipment in a water source area, a water delivery pipe network or a water treatment facility, setting a sampling period, and continuously sampling the monitored water body at the sampling period to obtain water quality related data, wherein the water quality related data at least comprise chemical oxygen demand monitoring data, dissolved oxygen monitoring data, pH value monitoring data, turbidity monitoring data and am