CN-122020461-A - Precipitation data anomaly identification model construction method, device, equipment and storage medium
Abstract
The invention relates to the technical field of meteorological data anomaly identification, and in particular to a method, a device, equipment and a storage medium for constructing a precipitation data anomaly identification model. The method comprises: obtaining a plurality of first historical precipitation data sets; clustering the plurality of first historical precipitation data sets to obtain a first clustering result; screening the plurality of first historical precipitation data sets according to the first clustering result, and obtaining a plurality of first precipitation classes according to the screening result and the first clustering result; and finally constructing a precipitation data anomaly identification model for each first precipitation class. The method requires neither advance data labeling nor extensive training or computation, so it consumes few resources both when the model is built and when it is applied. In addition, because data are checked for anomalies twice, once by clustering and once by the model, the anomaly judgment is more accurate and the judgment effect is ensured.
Inventors
- MA SHUANGYU
- HUANG ZHONGKANG
- DONG BAOHUA
- ZHANG JIN
- YANG JING
- WANG YILIN
- WU HUINAN
- YUAN XIAOYU
- LI WANG
- ZHANG MEI
Assignees
- Hebei Provincial Meteorological Information Center (河北省气象信息中心)
Dates
- Publication Date
- 20260512
- Application Date
- 20260123
Claims (10)
- 1. A precipitation data anomaly identification model construction method, characterized by comprising the following steps: acquiring a plurality of first historical precipitation data sets, wherein each first historical precipitation data set comprises a plurality of precipitation data and a plurality of factor data influencing the precipitation data; clustering the plurality of first historical precipitation data sets to obtain a first clustering result; screening the plurality of first historical precipitation data sets according to the first clustering result, and obtaining a plurality of first precipitation classes according to the screening result and the first clustering result; and constructing a precipitation data anomaly identification model for each first precipitation class, wherein the precipitation data anomaly identification model determines an anomaly index from an input precipitation data set by means of data dimension transformation.
- 2. The precipitation data anomaly identification model construction method according to claim 1, wherein clustering the plurality of first historical precipitation data sets to obtain the first clustering result comprises: obtaining a neighborhood radius and a first neighbor count; taking as initial data sets those first historical precipitation data sets that have more than the first neighbor count of first historical precipitation data sets within the neighborhood radius; extracting an unclustered data set from the plurality of initial data sets as an origin data set, and performing the following steps after extraction: searching, with the origin data set as the center, for unclustered first historical precipitation data sets within the neighborhood radius of the origin data set; if an unclustered first historical precipitation data set is found within the neighborhood radius, adding the found data set to the class of the origin data set, taking the found data set as the origin data set, and returning to the step of searching, with the origin data set as the center, for unclustered first historical precipitation data sets within the neighborhood radius of the origin data set; and if all the initial data sets have been clustered, marking the remaining unclustered first historical precipitation data sets with a non-clustered mark.
- 3. The precipitation data anomaly identification model construction method of claim 2, wherein the neighborhood radius is determined through multiple rounds of clustering, comprising: acquiring a plurality of first cluster numbers; clustering the plurality of first historical precipitation data sets according to each first cluster number to obtain a second clustering result; determining, for each second clustering result, a first aggregation index according to a first formula: [formula not reproduced in the source text], whose symbols denote, in order: the first aggregation index; the first cluster number; the number of first historical precipitation data sets in a class; a given first historical precipitation data set of a given class; and the class center of a given class; drawing a first graph of the aggregation index against the cluster number, with the first cluster number on the horizontal axis and the first aggregation index on the vertical axis; finding the point of maximum curvature on the first graph, and taking the first cluster number corresponding to that point as a target cluster number; taking the second clustering result obtained with the target cluster number as a target clustering result; for each class in the target clustering result, determining an average neighborhood radius, wherein the average neighborhood radius is the mean of a plurality of first neighborhood radii, and a first neighborhood radius is the distance between a first historical precipitation data set and its nearest first historical precipitation data set in the class; and taking the maximum of the plurality of average neighborhood radii as the neighborhood radius.
- 4. The precipitation data anomaly identification model construction method of claim 1, wherein the first clustering result comprises a plurality of second precipitation classes and a plurality of first historical precipitation data sets marked with a non-clustered mark, and wherein screening the plurality of first historical precipitation data sets according to the first clustering result and obtaining a plurality of first precipitation classes according to the screening result and the first clustering result comprises: deleting the first historical precipitation data sets marked with the non-clustered mark; taking the second precipitation class containing the largest number of data sets as a reference class; determining a threshold number according to a first proportional threshold and the number of data sets in the reference class; deleting each second precipitation class whose number of data sets is smaller than the threshold number; and taking the remaining second precipitation classes as the first precipitation classes.
- 5. The precipitation data anomaly identification model construction method according to any one of claims 1 to 4, wherein constructing a precipitation data anomaly identification model for each first precipitation class comprises: obtaining a base model and a plurality of first coefficient arrays, wherein the base model has a plurality of first coefficients, and each first coefficient array contains as many coefficients as there are first coefficients; substituting the plurality of first coefficient arrays into the base model respectively to obtain a plurality of process models; for each process model, substituting the data sets in the first precipitation class into the process model respectively, and taking the average of the resulting deviation indexes as a model deviation index; if an iteration number threshold has not been reached, adjusting the plurality of first coefficient arrays according to the plurality of model deviation indexes, and returning to the step of substituting the plurality of first coefficient arrays into the base model respectively to obtain a plurality of process models; if the iteration number threshold has been reached, taking the process model with the minimum model deviation index as the anomaly identification model; and determining an anomaly judgment threshold according to the model deviation index of the anomaly identification model.
- 6. The precipitation data anomaly identification model construction method of claim 5, wherein the base model is: [formula not reproduced in the source text], whose symbols denote, in order: the compressed vector; the original vector constructed from the data set; the restored vector; the compression coefficient matrix; the element at a given row and column of the compression coefficient matrix; the restoration coefficient matrix; the element at a given row and column of the restoration coefficient matrix; the total number of elements in the compressed vector; the total number of data in the data set; and the deviation index.
- 7. The method of claim 6, wherein adjusting the plurality of first coefficient arrays according to the plurality of model deviation indexes comprises: obtaining a plurality of deviation index queues, wherein each queue corresponds to one first coefficient array; adding each model deviation index to the deviation index queue corresponding to its first coefficient array; finding the smallest index in each deviation index queue, and taking the historical first coefficient array corresponding to that smallest index as a process optimal array; taking the first coefficient array whose model deviation index is smallest as a current optimal array; and adjusting each first coefficient array according to a second formula, the process optimal array and the current optimal array, wherein the second formula is: [formula not reproduced in the source text], whose symbols denote, in order: a datum of the first coefficient array after one adjustment; the same datum of the first coefficient array after the adjacent adjustment; the corresponding datum of the historical first coefficient array; the first adjustment coefficient; the corresponding datum of the current optimal array; and the second adjustment coefficient.
- 8. A precipitation data anomaly identification model construction apparatus for implementing the precipitation data anomaly identification model construction method according to any one of claims 1 to 7, the apparatus comprising: a historical precipitation data acquisition module, used for acquiring a plurality of first historical precipitation data sets, wherein each first historical precipitation data set comprises a plurality of precipitation data and a plurality of factor data influencing the precipitation data; a data clustering module, used for clustering the plurality of first historical precipitation data sets to obtain a first clustering result; a data screening module, used for screening the plurality of first historical precipitation data sets according to the first clustering result and obtaining a plurality of first precipitation classes according to the screening result and the first clustering result; and an anomaly identification model construction module, used for constructing a precipitation data anomaly identification model for each first precipitation class, wherein the precipitation data anomaly identification model determines an anomaly index from an input precipitation data set by means of data dimension transformation.
- 9. An electronic device comprising a memory and a processor, the memory having stored therein a computer program executable on the processor, characterized in that the processor implements the steps of the method according to any of the preceding claims 1 to 7 when the computer program is executed.
- 10. A computer readable storage medium storing a computer program, characterized in that the computer program when executed by a processor implements the steps of the method according to any of the preceding claims 1 to 7.
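The clustering of claim 2 and the screening of claim 4 can be illustrated with a minimal Python sketch. It is an illustration under assumptions rather than the patented implementation: data sets are treated as numeric tuples compared by Euclidean distance, and the names `density_cluster`, `screen_classes`, `min_neighbors`, `ratio` and `NOISE` are hypothetical. A data set whose class size equals the threshold number is kept, since claim 4 only deletes classes that are smaller than it.

```python
from math import dist

NOISE = -1  # hypothetical label standing in for the non-clustered mark


def density_cluster(points, radius, min_neighbors):
    """Grow classes outward from initial data sets, following claim 2 as written."""
    n = len(points)
    # Neighbours of each data set inside the neighborhood radius (assumed Euclidean).
    neighbors = [
        [j for j in range(n) if j != i and dist(points[i], points[j]) <= radius]
        for i in range(n)
    ]
    # Initial data sets have more than min_neighbors neighbours inside the radius.
    initial = [i for i in range(n) if len(neighbors[i]) > min_neighbors]
    labels = [None] * n
    next_label = 0
    for seed in initial:
        if labels[seed] is not None:
            continue  # already absorbed into an earlier class
        labels[seed] = next_label
        frontier = [seed]
        while frontier:
            origin = frontier.pop()
            for j in neighbors[origin]:
                if labels[j] is None:
                    labels[j] = next_label
                    frontier.append(j)  # the found data set becomes the next origin
        next_label += 1
    # Whatever is still unlabelled receives the non-clustered mark.
    return [NOISE if label is None else label for label in labels]


def screen_classes(labels, ratio):
    """Return the class labels kept after the screening of claim 4."""
    sizes = {}
    for label in labels:
        if label != NOISE:  # marked data sets are deleted first
            sizes[label] = sizes.get(label, 0) + 1
    if not sizes:
        return set()
    threshold = ratio * max(sizes.values())  # reference class is the largest one
    return {label for label, size in sizes.items() if size >= threshold}


# Toy data: a class of six, a class of three, and one isolated data set.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (0.5, 0),
          (10, 10), (10, 10.5), (10.5, 10), (50, 50)]
labels = density_cluster(points, radius=1.0, min_neighbors=1)
kept = screen_classes(labels, ratio=0.6)
```

With this toy data the isolated data set is marked as non-clustered, and the three-member class is dropped at a ratio of 0.6 but kept at a ratio of 0.5.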
Description
Precipitation data anomaly identification model construction method, device, equipment and storage medium
Technical Field
The invention relates to the technical field of meteorological data anomaly identification, and in particular to a precipitation data anomaly identification model construction method, device, equipment and storage medium.
Background
Precipitation data anomaly identification is a key link in meteorological research and in disaster prevention and mitigation. Its core task is to accurately screen out "abnormal values" (including error values, extreme values and the like) that deviate from the normal pattern in massive precipitation observations (such as station observations, satellite retrievals and radar estimates) or simulation data. Abnormal values may be caused by instrument faults or human recording errors, or they may reflect real extreme weather events (such as extremely heavy rain or abnormal drought), so the identification process must weigh data characteristics, physical laws and operational requirements together.
At present, some precipitation data anomalies are identified by cross-validation using correlations between features. Because precipitation data involve many elements (such as air temperature, air pressure, humidity and wind speed), the data are strongly coupled and related in complex ways. As a result, when the data dimension is large, the model becomes too complex, and considerable manpower and computing power are needed to build and apply it. This places heavy demands on modeling expertise and on resource investment.
Therefore, a precipitation data anomaly identification model construction method needs to be developed and designed.
Disclosure of Invention
The embodiments of the invention provide a precipitation data anomaly identification model construction method, device, equipment and storage medium, which address the problem that the prior art requires a large investment of resources.
In a first aspect, an embodiment of the present invention provides a method for constructing a precipitation data anomaly identification model, including: acquiring a plurality of first historical precipitation data sets, wherein each first historical precipitation data set comprises a plurality of precipitation data and a plurality of factor data influencing the precipitation data; clustering the plurality of first historical precipitation data sets to obtain a first clustering result; screening the plurality of first historical precipitation data sets according to the first clustering result, and obtaining a plurality of first precipitation classes according to the screening result and the first clustering result; and constructing a precipitation data anomaly identification model for each first precipitation class, wherein the precipitation data anomaly identification model determines an anomaly index from an input precipitation data set by means of data dimension transformation.
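The last step, determining an anomaly index by data dimension transformation, can be pictured with a minimal Python sketch. The base model formula is not reproduced in this text, so the sketch assumes a linear form: a compression coefficient matrix maps the original vector to a shorter compressed vector, a restoration coefficient matrix maps it back, and the deviation index is taken as the mean squared difference between the restored and original vectors. The function names, the example matrices and the threshold value are all hypothetical.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of numbers)."""
    return [sum(a * x for a, x in zip(row, vector)) for row in matrix]


def deviation_index(compress, restore, original):
    """Assumed deviation index: mean squared restoration error."""
    compressed = matvec(compress, original)   # dimension reduction
    restored = matvec(restore, compressed)    # back to the original dimension
    return sum((r - x) ** 2 for r, x in zip(restored, original)) / len(original)


def is_abnormal(compress, restore, data_set, threshold):
    """Flag a data set whose deviation index exceeds the judgment threshold."""
    return deviation_index(compress, restore, data_set) > threshold


# Hypothetical coefficients that keep only the component along the direction (1, 2).
compress = [[0.2, 0.4]]      # 2 values -> 1 value
restore = [[1.0], [2.0]]     # 1 value -> 2 values

typical = [1.0, 2.0]         # lies on the pattern, so it restores almost exactly
unusual = [2.0, -1.0]        # orthogonal to the pattern, so it restores poorly

print(deviation_index(compress, restore, typical))
print(deviation_index(compress, restore, unusual))
print(is_abnormal(compress, restore, unusual, threshold=0.5))
```

The design idea is that coefficients fitted to one precipitation class restore members of that class with little error, so a large deviation index suggests that an input data set does not follow the class pattern.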
In one possible implementation, clustering the plurality of first historical precipitation data sets to obtain the first clustering result includes: obtaining a neighborhood radius and a first neighbor count; taking as initial data sets those first historical precipitation data sets that have more than the first neighbor count of first historical precipitation data sets within the neighborhood radius; extracting an unclustered data set from the plurality of initial data sets as an origin data set, and performing the following steps after extraction: searching, with the origin data set as the center, for unclustered first historical precipitation data sets within the neighborhood radius of the origin data set; if an unclustered first historical precipitation data set is found within the neighborhood radius, adding the found data set to the class of the origin data set, taking the found data set as the origin data set, and returning to the step of searching, with the origin data set as the center, for unclustered first historical precipitation data sets within the neighborhood radius of the origin data set; and if all the initial data sets have been clustered, marking the remaining unclustered first historical precipitation data sets with a non-clustered mark.
In one possible implementation, the neighborhood radius is determined through multiple rounds of clustering, including: acquiring a plurality of first cluster numbers; clustering the plurality of first historical precipitation data sets according to each first cluster number to obtain a plurality of second clustering results; determining, for each second clustering result, a first aggregation index according to a first formula: [formula not reproduced in the source text], whose symbols denote, in order: the first aggregation index; the first cluster number; the number of first historical pr