CN-116482597-B - Electric energy meter operation data variable screening method, electronic equipment and storage medium
Abstract
The invention discloses a method for screening operation data variables of an electric energy meter, an electronic device and a storage medium. The basic error BE at each moment and the variable set corresponding to the basic error BE are constructed; the maximum information coefficient between the basic error at each moment in the basic error vector of the smart electric meter and each sample in the sample data set is calculated; samples whose maximum information coefficient is smaller than a set threshold are deleted, and the remaining samples form the variable set. Abnormal values are then detected: the KNN abnormal-value detection results are quantitatively analyzed using an improved weighted Euclidean distance together with SC and CH, and abnormal basic errors are corrected using the detected weights. The corrected basic error vector and variable set then undergo a second variable screening step, which determines the final basic error vector and variable set. The invention can effectively and scientifically detect abnormal data while ensuring data integrity, and can rapidly and reasonably screen out uncorrelated variables and jointly correlated variables, avoiding the collinearity problem.
Inventors
- QIN YUHONG
- LUO YUN
- WU XIANG
- JIANG ZHIBO
- TANG RENBO
- YU SHAOQIN
Assignees
- 中国电建集团中南勘测设计研究院有限公司
Dates
- Publication Date
- 20260512
- Application Date
- 20230426
Claims (6)
- 1. A method for screening operation data variables of an electric energy meter, characterized by comprising the following steps: S1, constructing the basic error BE at each moment and the variable set θ corresponding to the basic error BE, θ = [T, R, P, V, W, YW, IP, T], wherein T, R, P, V, W, YW, IP, T respectively represent temperature, humidity, pressure, illumination, wind speed, salt fog, electric power and time; S2, calculating the maximum information coefficient between the basic error at each moment in the basic error vector of the smart electric meter and each sample in the sample data set, deleting the samples whose maximum information coefficient is smaller than a set threshold, the remaining samples forming a variable set θ1; S3, correcting the basic error BE using the following formula: [formula missing], wherein the formula yields the corrected basic error; W_s denotes the correction weight of the s-th data point; the distance term is the weighted Euclidean distance from the s-th data point to the g-th data point, in which the weight of the i-th variable is the maximum information coefficient between the basic error and the i-th variable in the variable set θ1; x_{s,i} represents the i-th variable of data point s in the variable set θ1; x_{g,i} represents the i-th variable of data point g in the variable set θ1; y_s is the basic error of data point s; y_g is the basic error of data point g; and n is the total number of variables in the variable set θ1; the corrected basic error vector and the variable set θ1 are used as the variables to be screened; S4, further screening variables from the corrected basic error vector and the variable set θ1, the screened basic error vector and variable set being the finally selected variables; in step S4, the screened variable set θ2 is obtained as follows: performing Spearman correlation analysis between each basic error in the corrected basic error vector and each variable in the variable set θ1 to obtain g Spearman correlation coefficients, wherein g is the number of variables in the variable set θ1; and, among all Spearman correlation coefficients larger than the threshold, retaining the variable corresponding to the largest Spearman correlation coefficient and deleting the variables corresponding to the remaining ones.
- 2. The method for screening operation data variables of an electric energy meter according to claim 1, wherein in step S2 the maximum information coefficient is calculated as: $\mathrm{MIC}(x, y) = \max_{a \cdot b < B} \frac{I(X, Y)}{\log_2 \min(a, b)}$, wherein MIC(x, y) is the maximum information coefficient; a and b are the numbers of grid segments in the X direction and the Y direction respectively; B is set to the 0.6th power of the total grid number; X and Y are random variables, Y corresponding to the basic error in the basic error vector and X corresponding to the samples in the sample data set θ; and I(X, Y) is the mutual information.
- 3. The method for screening operation data variables of an electric energy meter according to claim 1, wherein judging whether the basic error of a data point is normal comprises the following steps: 1) initializing k and m, wherein k is a positive integer and 0 < m ≤ 1; 2) calculating the weighted Euclidean distance from the current data point to every other data point; 3) sorting all the distance values, selecting the k adjacent points with the smallest distances, and taking the maximum distance value among these k adjacent points; 4) repeating step 2) and step 3) for every other data point until all data points have been traversed, obtaining M distance values, wherein M is the number of data points; 5) among the M distance values, setting the points whose distance is larger than a threshold as abnormal points and the other points as normal points, assigning a first label value to the abnormal points and a second label value to the normal points, wherein the threshold is set to the maximum weighted Euclidean distance between the (M×m)-th data point and its k adjacent points; 6) calculating SC and CH: [formulas missing], wherein a is the average of all distance values among the M distance values and b is the class corresponding to the first label value or the class corresponding to the second label value; the formulas also use the total number of samples in the basic error vector and the variable set θ1; e is the number of categories; B_e is the covariance matrix between category data; W_e is the covariance matrix of the data within a category; and tr denotes the trace of a matrix; 7) increasing the value of k by a set step length and the value of m by a set step length, and returning to step 2) until the upper limits of the adjacent value interval and the abnormal proportion interval are reached, the adjacent value k and abnormal proportion m corresponding to the optimal values of SC and CH being taken as the final adjacent value and abnormal proportion; 8) repeating steps 3) to 5) for each data point corresponding to the basic error vector, using the final adjacent value and abnormal proportion as inputs.
- 4. The method of claim 1, wherein the threshold is set to 0.8.
- 5. An electronic device, comprising: one or more processors; and a memory storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the steps of the method of any of claims 1-4.
- 6. A computer-readable storage medium, characterized in that it stores a computer program which, when executed by a processor, implements the steps of the method according to any one of claims 1-4.
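The second screening step of claim 1 (step S4) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the function names `ranks`, `spearman` and `screen_variables`, the sample data, and the choice to apply the 0.8 value of claim 4 to the Spearman threshold are all assumptions. The raw (signed) coefficient is compared with the threshold, following the literal wording of the claim, and variables at or below the threshold are assumed to be retained.

```python
from statistics import mean


def ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the run while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        avg_rank = (pos + end) / 2 + 1
        for j in range(pos, end + 1):
            result[order[j]] = avg_rank
        pos = end + 1
    return result


def spearman(x, y):
    """Spearman coefficient, computed as the Pearson correlation of the ranks."""
    rx, ry = ranks(x), ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant series carries no rank information
    return cov / (var_x * var_y) ** 0.5


def screen_variables(basic_error, variables, threshold=0.8):
    """Among variables whose coefficient exceeds the threshold keep only the
    strongest one; variables at or below the threshold are kept (assumption)."""
    coeffs = {name: spearman(col, basic_error) for name, col in variables.items()}
    high = [name for name, rho in coeffs.items() if rho > threshold]
    if not high:
        return list(variables), coeffs
    best = max(high, key=lambda name: coeffs[name])
    kept = [name for name in variables if name not in high or name == best]
    return kept, coeffs


# Hypothetical data: corrected basic errors and three candidate variables.
basic_error = [0.10, 0.12, 0.15, 0.11, 0.18, 0.20]
variables = {
    "temperature": [20, 22, 25, 21, 28, 30],
    "power": [1.0, 1.3, 1.4, 1.1, 1.9, 1.8],
    "humidity": [60, 40, 55, 45, 50, 65],
}
kept, coeffs = screen_variables(basic_error, variables)
print(kept)  # ['temperature', 'humidity']
```

In this made-up data both temperature and power exceed the threshold, so only the stronger of the two (temperature) survives, while humidity stays because its coefficient is below the threshold.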
Description
Electric energy meter operation data variable screening method, electronic equipment and storage medium
Technical Field
The invention relates to the processing of electric energy meter operation data variables, and in particular to a method for screening operation data variables of an electric energy meter, an electronic device and a storage medium.
Background
During operation, the measurement accuracy and reliability of an electric energy meter are mainly affected by environmental factors such as temperature and humidity. In coastal areas the meter can also be seriously affected by salt fog, wind speed and the like; in high-altitude areas it is affected by environmental factors such as altitude and illumination; and it is affected to different degrees when measuring different electric powers. Under the combined action of these factors, the measurement accuracy and operational reliability of the meter gradually decline. However, some factors have far greater influence than others; pressure within the same area is always a stable variable; and some variables are strongly correlated, so bringing several strongly correlated variables into a model often causes a collinearity problem.
The electric energy meter serves as a nerve ending of the smart grid, so studying its metering accuracy and reliability under multiple influencing factors is of great importance. The operation data of the meter contain a certain number of abnormal values, which can seriously affect the convergence of subsequent research models and the reliability of calculation results.
Existing research by scholars and experts on variable screening and anomaly detection includes the following:
The invention patent application with application number CN115346682A discloses a variable screening method and system based on breast cancer data, and a readable storage medium, belonging to the technical field of medical data processing. The method comprises: acquiring breast cancer data; preprocessing the breast cancer data; performing correlation analysis on the preprocessed data by the maximum information coefficient method to obtain a first screening result; and performing variable screening on the first screening result to obtain a second screening result, wherein the variable screening uses one or a combination of the Lasso algorithm, random forest, SIS variable screening and DC-SIS variable screening. The method can address variables that are marginally uncorrelated yet jointly correlated, reduces the error rate and gives the data biological significance.
The invention patent with application number CN109459409B discloses a KNN-based method for identifying abnormal near-infrared spectra, addressing the problem that abnormal spectral data seriously affect the accuracy and reliability of spectral analysis models in near-infrared spectroscopy. The method comprises: selecting a similarity measure; selecting the hyperparameter k; calculating the distance measure between spectra; finding the k samples at the shortest distance; calculating an anomaly measure for each sample; sorting the samples by anomaly measure; and identifying and eliminating the samples with high anomaly measures. The method is used to identify and eliminate abnormal spectra in near-infrared spectral analysis models.
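The KNN-based identification steps summarized above can be sketched roughly as follows. This is only an illustration of the general technique, not the method of the cited patent: Euclidean distance as the similarity measure, the distance to the k-th nearest neighbour as the anomaly measure, the proportion-based cut-off, and the names `knn_anomaly_scores` and `flag_outliers` are all assumptions.

```python
import math


def knn_anomaly_scores(samples, k):
    """Score each sample by the distance to its k-th nearest neighbour."""
    if not 1 <= k < len(samples):
        raise ValueError("k must satisfy 1 <= k < number of samples")
    scores = []
    for i, point in enumerate(samples):
        # Distances from this sample to every other sample, smallest first.
        distances = sorted(
            math.dist(point, other) for j, other in enumerate(samples) if j != i
        )
        scores.append(distances[k - 1])  # largest distance among the k nearest
    return scores


def flag_outliers(samples, k, proportion):
    """Return the indices of the highest-scoring samples and all scores.

    `proportion` is the assumed fraction of samples treated as abnormal.
    """
    scores = knn_anomaly_scores(samples, k)
    n_out = int(len(samples) * proportion)
    ranked = sorted(range(len(samples)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:n_out]), scores


# Hypothetical data: four clustered points and one far-away point.
samples = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]
outliers, scores = flag_outliers(samples, k=2, proportion=0.2)
print(outliers)  # [4]
```

The far-away point receives a much larger k-th neighbour distance than the clustered points, so it is the one sample flagged when one fifth of the data is treated as abnormal.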
However, the above schemes do not address the influence of each variable on the detection of abnormal values in the operation data of the monitored object, the electric energy meter, and their steps for screening out irrelevant and jointly correlated variables are too complex.
Disclosure of Invention
In view of the shortcomings of the prior art, the invention provides a method for screening operation data variables of an electric energy meter, an electronic device and a storage medium, which detect abnormal data, ensure data integrity, and rapidly and reasonably screen out irrelevant variables and joint variables to avoid the collinearity problem.
To solve the above technical problem, the technical scheme adopted by the invention is a method for screening operation data variables of an electric energy meter, comprising the following steps: S1, constructing the basic error BE at each moment and the variable set θ corresponding to the basic error BE, θ = [T, R, P, V, W, YW, IP, T], wherein T, R, P, V, W, YW, IP, T respectively represent temperature, humidity, pressure, illumination, wind speed, salt fog, electric power and time;