CN-122020228-A - Power load curve coding and clustering method and system, storage medium and terminal
Abstract
The invention discloses a power load curve coding and clustering method and system, a storage medium and a terminal, and belongs to the technical field of power load data analysis. The method comprises: collecting and preprocessing batch power load curve data; constructing an initial encoder that includes, among other units, an orthogonal square wave basis function generating unit; fusing multiple encodings in a hybrid coding unit to obtain a hybrid coding matrix; training the encoder in an unsupervised manner with an online clustering loss function that combines an attraction loss and a repulsion loss; inputting the preprocessed data into the trained encoder to obtain codes; clustering the codes using multiple evaluation indexes and consistency decision rules to obtain the final number of clusters and the cluster centers; and preprocessing and encoding new data and assigning it to the cluster of the nearest cluster center. The method and system fully extract the characteristics of load curves, improve clustering accuracy and efficiency, enhance model interpretability, protect data security, reduce application cost and produce a good cluster distribution, making them suitable for the refined management of power systems.
Inventors
- LI WENZHAO
- LIU GE
- FAN TAO
- CHEN XI
- GUO DALEI
- LI BORANG
- MA JIAN
Assignees
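- State Grid Tianjin Electric Power Company Chengxi Power Supply Branch (国网天津市电力公司城西供电分公司)
- State Grid Tianjin Electric Power Company (国网天津市电力公司)
- State Grid Corporation of China (国家电网有限公司)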
Dates
- Publication Date
- 20260512
- Application Date
- 20260416
Claims (10)
- 1. A power load curve coding and clustering method, characterized by comprising the following steps: Step 1, collecting batch power load curve data Lx, and cleaning and standardizing it to obtain preprocessed power load curve data; Step 2, constructing an initial encoder, wherein the initial encoder comprises an orthogonal square wave basis function generating unit, a frequency domain parameter extracting unit, a hybrid encoding unit, a Transformer encoding component, an adaptive weighted pooling unit and an output encoding layer, connected in sequence; the hybrid coding unit is used for performing feature coding, absolute position coding and relative frequency coding on the frequency domain parameter vectors of the batch power load curve data, and combining them into a hybrid coding matrix of the power load curve data, the hybrid coding matrix being the sum of the feature coding matrix, the relative frequency coding matrix and the absolute position coding matrix; Step 3, training the initial encoder to obtain a trained encoder, wherein the training adopts an unsupervised mode and uses an online clustering loss function combining an attraction loss and a repulsion loss; Step 4, inputting the preprocessed power load curve data into the trained encoder to obtain the batch power load curve data codes z; Step 5, clustering the codes z of the batch power load curve data to obtain the final number of clusters of the batch power load curve data and the cluster center of each cluster; Step 6, cleaning and standardizing newly collected power load curve data to obtain preprocessed new power load curve data, inputting it into the trained encoder to obtain a new power load curve data code, and assigning the new code to the cluster of the nearest cluster center.
- 2. The method according to claim 1, wherein in step 1, each power load curve data Lx contains active power values at L time points in a day.
- 3. The power load curve coding and clustering method according to claim 2, wherein in step 2, the orthogonal square wave basis function generating unit is configured to generate an orthogonal square wave basis function combination w, where w contains square wave basis functions at a number of frequencies equal to half the number of data points in each piece of power load curve data; the frequency domain parameter extraction unit is used for computing the dot product of each piece of preprocessed power load curve data with each frequency square wave in w, and combining the dot products of the same piece of power load curve data into a vector, thereby obtaining the frequency domain parameter vectors f of the batch power load curve data; the hybrid coding unit is specifically used for: (1) mapping the frequency domain parameter vector f of the batch power load curve data to a high-dimensional feature space to obtain a feature coding matrix, using a feature coding weight matrix and a feature coding bias vector, the feature dimension defining the size of the feature space and T denoting matrix transposition; (2) encoding the absolute position of each element in the frequency domain parameter vector f with sinusoidal functions, where pos is the absolute position index in the frequency domain parameter vector and p is the dimension index, each position pos taking one encoded value in every even dimension 2p and another in every odd dimension 2p+1; the absolute position codes of all positions pos are combined into a position coding matrix; (3) for the frequency domain parameter vector f of the batch power load curve data, taking the relative relations among the frequency domain parameters into account to capture the internal relations among frequencies, and constructing a relative relation matrix R whose element at position (i, j) is the relation strength between the i-th and j-th frequency domain parameters; this element combines a term based on the frequency domain index difference, which represents the relative distance of the frequency domain positions, and a term based on the absolute values of the frequency domain parameters, which represents their relative magnitudes, with a small constant added to prevent division by zero; after R is constructed, the relative frequency coding matrix is calculated from R and a learnable, trainable relative frequency coding weight matrix; (4) adding the three coding matrices to obtain the hybrid coding matrix H; the Transformer encoding component is used for inputting the hybrid coding matrix of the power load curve data into a Transformer model to obtain the Transformer codes of the batch power load curve data; the adaptive weighted pooling unit is used for weighting and pooling the Transformer codes of the batch power load curve data over the frequency dimensions and extracting the pooled global feature vector, where each frequency dimension vector of the Transformer output receives an attention weight computed with a learnable attention weight vector, and the denominator normalizes the attention weights by summing over all frequency dimensions r; the output coding layer is used for mapping the pooled global feature vector to the final low-dimensional encoding vector, obtaining the final encoder encoding vector z1 of the batch power load curve data through a first layer (first layer weight matrix and bias vector) followed by a second layer (second layer weight matrix and bias vector), the hidden layer dimension and the final encoding dimension taking preset values.
- 4. The power load curve coding and clustering method according to claim 3, wherein in step 3, the online clustering loss function comprises: (1) an attraction loss, which measures the average distance between each sample and its nearest cluster center; here, the number of samples processed in each training step is the batch size, each sample's encoding vector has the final encoding dimension, the encoding vectors of the samples serve as potential cluster centers in online clustering, the minimization operator finds the cluster center nearest to each sample, and distances are Euclidean; (2) a repulsion loss, which measures how far the distances between cluster centers fall short of a required minimum; here, a double summation traverses all pairs of distinct cluster centers, the hinge operator returns its argument when it is positive and 0 otherwise, a distance margin (a hyperparameter) controls the minimum expected distance between cluster centers, the distance between two cluster centers is Euclidean, and the sum is divided by a normalization factor; (3) a total loss, which is the objective optimized during model training and combines the attraction loss and the repulsion loss through a weight coefficient that balances their importance.
- 5. The power load curve coding and clustering method according to claim 4, wherein step 5 specifically comprises: Step 5.1, determining the value range of the candidate cluster number K as [2, K_max], where K_max is the maximum number of clusters, obtained with a rounding operation; Step 5.2, calculating a set of valid candidate optimal K values based on the batch power load curve data codes z, as follows: (1) computing the clustering result for each candidate cluster number in [2, K_max]: for each value of K, the K-means algorithm is run with the Euclidean distance between codes as the distance measure, yielding the cluster center of each of the K clusters and the power load curve cluster belonging to each center, k being the label of each cluster; (2) determining a first candidate optimal cluster number using the elbow method: for each candidate cluster number, the clustering inertia of the corresponding clustering result is calculated over the codes attributed to each of the K cluster centers, and the point where the curvature of the inertia curve changes most is selected as the first candidate optimal cluster number; (3) determining a second candidate optimal cluster number using the silhouette coefficient: for each candidate cluster number, the global silhouette coefficient of the clustering result is calculated, and the K with the largest coefficient is the second candidate optimal cluster number; (4) determining a third candidate optimal cluster number using the Calinski-Harabasz index: for each candidate cluster number, the Calinski-Harabasz index CH(K) of the clustering result is calculated, and the K with the largest CH(K) is the third candidate optimal cluster number; (5) determining a fourth candidate optimal cluster number using the Davies-Bouldin index: for each candidate cluster number, the Davies-Bouldin index DB(K) of the clustering result is calculated, and the K with the largest DB(K) is selected as the fourth candidate optimal cluster number; (6) collecting the four candidates into the valid candidate optimal K value set; Step 5.3, generating the final cluster number, the corresponding cluster centers and the power load curve cluster of each center from the valid candidate set, as follows: Step 5.31, computing consistency measures from the candidate set, namely range consistency (based on the maximum and minimum of the set), mode consistency and median consistency; Step 5.32, determining the final cluster number from the consistency measures with decision rules a1 to a4, executed in decreasing order of priority: as soon as a rule is satisfied, its result is output directly and the remaining rules are skipped; otherwise the next rule is tried. Rule a1: if at least 3 indexes recommend the same K value, that value is chosen; rule a2: if the maximum difference among all candidate K values does not exceed 2, the candidates are treated as consistent and a single value is derived from them; rule a3: if there are only two primary candidate values, the K value with the larger silhouette coefficient is selected; rule a4: if there is no mode, the elbow result is trusted preferentially but must pass a verification, otherwise the next best choice is tried. The final cluster number is obtained from the outcome of these rules, and the corresponding cluster centers and the power load curve cluster of each center are obtained accordingly.
- 6. The power load curve coding and clustering method according to claim 5, wherein after step 5, the method further comprises a step 7 of generating a representative curve vector for each cluster center, specifically: Step 7.1, calculating the weight of each piece of power load curve data q in a cluster, the weight being inversely proportional to the distance between its encoding vector and the cluster center; here, o is the sample index within the cluster, the distance is the Euclidean distance from the sample's encoding vector to the cluster center, a weight attenuation coefficient of 0.5 controls how sensitive the weight is to the distance, and the weights are normalized; Step 7.2, generating the representative curve vector of the cluster center as the weighted combination, with these weights, of the q-th pieces of power load curve data belonging to that cluster under the final cluster number.
- 7. A power load curve coding and clustering system for executing the power load curve coding and clustering method according to any one of claims 1 to 6, characterized by comprising a data acquisition processing module, an initial encoder construction module, a training module, a coding module, a clustering center output module and a clustering module; the data acquisition processing module is used for collecting batch power load curve data Lx, and cleaning and standardizing it to obtain preprocessed power load curve data; the initial encoder construction module is used for constructing an initial encoder comprising an orthogonal square wave basis function generating unit, a frequency domain parameter extracting unit, a hybrid coding unit, a Transformer encoding component, an adaptive weighted pooling unit and an output coding layer, connected in sequence; the hybrid coding unit is used for performing feature coding, absolute position coding and relative frequency coding on the frequency domain parameter vectors of the batch power load curve data, and combining them into a hybrid coding matrix of the power load curve data, the hybrid coding matrix being the sum of the feature coding matrix, the relative frequency coding matrix and the absolute position coding matrix; the training module is used for training the initial encoder to obtain a trained encoder, wherein the training adopts an unsupervised mode and uses an online clustering loss function combining an attraction loss and a repulsion loss; the coding module is used for inputting the preprocessed power load curve data into the trained encoder to obtain the batch power load curve data codes z; the clustering center output module is used for clustering the codes z of the batch power load curve data to obtain the final number of clusters of the batch power load curve data and the cluster center of each cluster; the clustering module is used for cleaning and standardizing newly collected power load curve data to obtain preprocessed new power load curve data, inputting it into the trained encoder to obtain a new power load curve data code, and assigning the new code to the cluster of the nearest cluster center.
- 8. The power load curve coding and clustering system according to claim 7, characterized in that each piece of power load curve data Lx contains active power values at L time points in a day.
- 9. A storage medium having stored therein at least one instruction, at least one program, code set, or instruction set, the at least one instruction, the at least one program, the code set, or instruction set being loaded and executed by a processor to implement the power load curve encoding and clustering method of any one of claims 1-6.
- 10. An electronic terminal comprising a processor and a memory, wherein the memory stores at least one instruction, at least one program, a set of codes, or a set of instructions, the at least one instruction, the at least one program, the set of codes, or the set of instructions being loaded and executed by the processor to implement the power load curve encoding and clustering method of any one of claims 1-6.
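Claim 3's frequency domain parameter extraction can be illustrated with a short NumPy sketch. The claims do not say which orthogonal square waves are used. This sketch assumes Walsh functions: the rows of a Sylvester-type Hadamard matrix, sorted by their number of sign changes. Walsh functions are mutually orthogonal ±1 square waves, but the construction requires the number of time points L to be a power of two. The function names `walsh_basis` and `frequency_parameters` are hypothetical.

```python
import numpy as np


def walsh_basis(L: int) -> np.ndarray:
    """Return L sequency-ordered Walsh functions (orthogonal +/-1 square waves).

    Assumes L is a power of two, as required by the Sylvester construction.
    """
    if L < 1 or L & (L - 1):
        raise ValueError("L must be a power of two")
    H = np.array([[1.0]])
    while H.shape[0] < L:
        H = np.block([[H, H], [H, -H]])
    # Sequency = number of sign changes along a row; sorting by it makes row m
    # behave like a square wave of increasing "frequency".
    sequency = (np.diff(H, axis=1) != 0).sum(axis=1)
    return H[np.argsort(sequency, kind="stable")]


def frequency_parameters(curves: np.ndarray) -> np.ndarray:
    """Dot each preprocessed curve (n, L) with the first L/2 square waves -> (n, L/2)."""
    _, L = curves.shape
    w = walsh_basis(L)[: L // 2]   # L/2 square-wave basis functions, as in claim 3
    return curves @ w.T
```

For example, `frequency_parameters(x)` on a batch `x` of shape (10, 64) returns the (10, 32) matrix of frequency domain parameter vectors f.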
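The hybrid coding of claim 3 (feature coding + relative frequency coding + absolute position coding) can be sketched as follows. This is a sketch under several assumptions, none taken from the claims:

- The position encoding follows the standard Transformer sine/cosine form with base 10000 and an even feature dimension d.
- Each scalar frequency parameter is mapped to d dimensions through a weight vector plus bias.
- The relative relation matrix R is passed in as an input, because its exact formula is not recoverable from this text.

All names are hypothetical.

```python
import numpy as np


def sinusoidal_position_encoding(n_pos: int, d: int, base: float = 10000.0) -> np.ndarray:
    """P[pos, 2p] = sin(pos / base**(2p/d)), P[pos, 2p+1] = cos(pos / base**(2p/d))."""
    if d % 2:
        raise ValueError("d must be even")
    pos = np.arange(n_pos)[:, None]          # absolute positions, shape (n_pos, 1)
    two_p = np.arange(0, d, 2)[None, :]      # even dimension indices, shape (1, d/2)
    angle = pos / base ** (two_p / d)
    P = np.empty((n_pos, d))
    P[:, 0::2] = np.sin(angle)               # even dimensions
    P[:, 1::2] = np.cos(angle)               # odd dimensions
    return P


def hybrid_encoding(f, W_f, b_f, R, W_r):
    """H = feature coding + relative frequency coding + absolute position coding.

    f: (N,) frequency parameters of one curve; W_f, b_f: (d,);
    R: (N, N) relative relation matrix; W_r: (N, d) learnable weights.
    """
    E_feat = f[:, None] * W_f[None, :] + b_f   # assumed: each f_i -> f_i * W_f + b_f
    E_rel = R @ W_r                            # relative frequency coding, (N, d)
    P = sinusoidal_position_encoding(len(f), W_f.shape[0])
    return E_feat + E_rel + P
```

Keeping the three terms as separate functions makes it easy to swap in the patent's exact R formula or feature mapping once those are known.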
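The adaptive weighted pooling and output coding layer of claim 3 can be sketched in NumPy as below.

- **Pooling.** The attention weights are assumed to be a softmax of a^T h_r over the frequency dimensions r. This is one reading of "a learnable attention weight vector" with a denominator that sums over all frequency dimensions.
- **Output layer.** The ReLU between the two output layers is an assumed activation; the claims do not name one.

In a real model `a`, `W1`, `b1`, `W2` and `b2` would be trained parameters.

```python
import numpy as np


def attention_pool(Ht: np.ndarray, a: np.ndarray) -> np.ndarray:
    """Ht: (n, N, d) Transformer outputs; a: (d,) attention vector -> (n, d)."""
    scores = Ht @ a                                   # (n, N) one score per frequency dim
    scores = scores - scores.max(axis=1, keepdims=True)  # numerical stability
    alpha = np.exp(scores)
    alpha /= alpha.sum(axis=1, keepdims=True)         # softmax over frequency dims
    return np.einsum("nr,nrd->nd", alpha, Ht)         # weighted sum of h_r


def output_layer(g, W1, b1, W2, b2):
    """Two-layer projection of the pooled vector g to the final code z1 (ReLU assumed)."""
    hidden = np.maximum(0.0, g @ W1 + b1)
    return hidden @ W2 + b2
```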
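The online clustering loss of claim 4 can be evaluated with the NumPy sketch below. Actual training would express the same computation in an autodiff framework such as PyTorch or JAX; this version only evaluates it. The following are assumptions, not taken from the claims:

- The cluster centers are passed in as an array.
- The total loss is attraction + λ·repulsion.
- The normalization factor is the number of center pairs, K(K−1)/2.
- The `margin` and `lam` defaults are illustrative only.

```python
import numpy as np


def online_cluster_loss(z, centers, margin=1.0, lam=0.5):
    """z: (B, d) mini-batch codes; centers: (K, d) current cluster centers.

    Returns (total, attraction, repulsion).
    """
    # Attraction: mean Euclidean distance from each sample to its nearest center.
    dist = np.linalg.norm(z[:, None, :] - centers[None, :, :], axis=-1)   # (B, K)
    attraction = dist.min(axis=1).mean()
    # Repulsion: hinge on center pairs that are closer than the margin.
    K = len(centers)
    cdist = np.linalg.norm(centers[:, None, :] - centers[None, :, :], axis=-1)
    iu = np.triu_indices(K, k=1)              # each unordered pair once: K(K-1)/2
    hinge = np.maximum(0.0, margin - cdist[iu])
    repulsion = hinge.sum() / max(len(iu[0]), 1)
    return attraction + lam * repulsion, attraction, repulsion
```

The margin sets the scale at which centers stop repelling each other. Pairs farther apart than the margin contribute nothing, so the repulsion term only prevents collapse and does not push centers apart indefinitely.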
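Step 5.2 and rules a1 to a4 of claim 5 can be sketched as below. The per-K metrics could be computed with scikit-learn (`KMeans(...).inertia_`, `silhouette_score`, `calinski_harabasz_score`, `davies_bouldin_score`); here they are taken as precomputed lists. Several choices are assumptions, because the exact formulas are not recoverable from this text:

- The elbow is approximated by the largest discrete second difference of the inertia curve.
- Lower Davies-Bouldin values indicate better-separated clusters, so the sketch takes the minimum, although the translated claim reads "largest".
- Rule a2 returns the rounded median of the candidates.
- Rule a4 uses a hypothetical tolerance `tol` as its verification step.

```python
from collections import Counter

import numpy as np


def candidate_ks(ks, inertia, silhouette, ch, db):
    """Return [elbow, silhouette, CH, DB] candidate K values (needs >= 3 ks)."""
    ks = np.asarray(ks)
    inertia = np.asarray(inertia, dtype=float)
    # Elbow proxy: largest discrete second difference of the inertia curve.
    second_diff = inertia[:-2] - 2 * inertia[1:-1] + inertia[2:]
    k_elbow = int(ks[1 + np.argmax(second_diff)])
    k_sil = int(ks[np.argmax(silhouette)])
    k_ch = int(ks[np.argmax(ch)])
    k_db = int(ks[np.argmin(db)])      # lower Davies-Bouldin = better separation
    return [k_elbow, k_sil, k_ch, k_db]


def decide_k(cands, ks, silhouette, tol=0.9):
    """Apply rules a1-a4 in priority order; the first rule that fires wins."""
    sil = {int(k): float(s) for k, s in zip(ks, silhouette)}
    k_elbow, k_sil = cands[0], cands[1]
    value, count = Counter(cands).most_common(1)[0]
    if count >= 3:                                   # a1: >= 3 indexes agree
        return value
    if max(cands) - min(cands) <= 2:                 # a2: narrow range -> median
        return int(round(float(np.median(cands))))
    distinct = sorted(set(cands))
    if len(distinct) == 2:                           # a3: two values -> better silhouette
        return max(distinct, key=lambda k: sil[k])
    # a4 (hypothetical check): keep the elbow K if its silhouette is close to the best.
    if sil[k_elbow] >= tol * sil[k_sil]:
        return k_elbow
    return k_sil
```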
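The representative curve of claim 6 can be sketched as a distance-weighted average of a cluster's curves. The claims say only that the weight is inversely proportional to the code-space distance, with an attenuation coefficient of 0.5. This sketch assumes the form w ∝ (d + ε)^(−β) with β = 0.5, normalized to sum to 1. Both `representative_curve` and `eps` are hypothetical names.

```python
import numpy as np


def representative_curve(curves, codes, center, beta=0.5, eps=1e-8):
    """Weighted representative curve of one cluster.

    curves: (m, L) preprocessed load curves in the cluster;
    codes: (m, d) their encoding vectors; center: (d,) cluster center.
    """
    dist = np.linalg.norm(codes - center, axis=1)   # Euclidean distance in code space
    w = (dist + eps) ** (-beta)                     # assumed inverse-power weighting
    w = w / w.sum()                                  # normalize: weights sum to 1
    return w @ curves, w
```

Because the weights are computed in code space but applied to the preprocessed curves, the result stays in the original time-point domain. That keeps the representative curve physically readable.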
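Step 6 of claim 1 assigns each new code to the nearest cluster center. Assuming the Euclidean distance used elsewhere in the claims, this reduces to an argmin over a distance matrix. The function name is hypothetical.

```python
import numpy as np


def assign_to_nearest(z_new: np.ndarray, centers: np.ndarray) -> np.ndarray:
    """Return the index of the nearest cluster center (Euclidean) for each new code."""
    dist = np.linalg.norm(z_new[:, None, :] - centers[None, :, :], axis=-1)  # (n_new, K)
    return dist.argmin(axis=1)
```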
Description
Power load curve coding and clustering method and system, storage medium and terminal
Technical Field
The invention relates to the technical field of power load data analysis, in particular to a power load curve coding and clustering method and system, a storage medium and a terminal.
Background
With the accelerating construction of smart grids and the continuous development of energy management systems, the analysis and processing of power load curves plays an increasingly important role in power and energy applications such as load forecasting, demand response and energy efficiency evaluation, and has become a key technical support for the refined management of power systems. However, existing power load curve analysis techniques still have a number of shortcomings in practice and struggle to meet the requirements of large-scale, high-precision and high-security load data analysis. The specific problems are as follows:

(1) Insufficient feature extraction. Traditional analysis methods mostly rely on statistical feature extraction or simple data transformations. They cannot effectively capture the complex temporal patterns and frequency-domain features contained in load curves, and they miss key information that reflects load characteristics, such as load fluctuation patterns, equipment start-stop features and peak-valley power distribution.

(2) Limited clustering capability. Existing clustering methods generally measure similarity by the spatial distance between raw data. This identifies the morphological similarity of load curves inaccurately and makes it difficult to mine the deeper load patterns behind the data.

(3) Poor interpretability. Related deep learning models behave as typical black boxes. The codes they output lack definite physical meaning, and neither the coding process nor its result can be effectively interpreted.

(4) Low computational efficiency. Some complex analysis models have long training periods and high computational complexity, so data processing is not sufficiently real-time.

(5) Prominent data security risks. In the prior art, load curves are displayed, transmitted and stored in their original form. Key information such as the curve shape and the energy usage pattern is therefore directly exposed in the data, which can easily leak energy usage characteristics and compromise data security.

(6) High labeling cost. Supervised coding methods require a large amount of load curve data to be labeled manually, which consumes considerable labor and material resources. The labeling work must also meet high standards of accuracy and consistency, which makes it difficult and greatly raises the cost and threshold of applying the technology.

In summary, existing power load curve analysis techniques have unsolved problems in feature extraction, clustering effectiveness, interpretability, computational efficiency, data security and labeling cost. A new load curve processing method is needed to overcome these shortcomings and improve the overall performance of power load curve analysis.
Disclosure of Invention
In view of the technical problems identified in the background art, the invention aims to provide a power load curve coding and clustering method and system, a storage medium and a terminal.
In order to achieve the purpose of the invention, the technical solution provided by the invention is as follows. First aspect: the invention provides a power load curve coding and clustering method, comprising the following steps: Step 1, collecting batch power load curve data Lx, and cleaning and standardizing it to obtain preprocessed power load curve data; Step 2, constructing an initial encoder, wherein the initial encoder comprises an orthogonal square wave basis function generating unit, a frequency domain parameter extracting unit, a hybrid encoding unit, a Transformer encoding component, an adaptive weighted pooling unit and an output encoding layer, connected in sequence; the hybrid coding unit is used for performing feature coding, absolute position coding and relative frequency coding on the frequency domain parameter vectors of the batch power load curve data, and combining them into a hybrid coding matrix of the power load curve data, the hybrid coding matrix being the sum of the feature coding matrix, the relative frequency coding matrix and the absolute position coding matrix; Step 3, training the initial encoder to obtain a trained encoder, wherein the training of th