CN-121980455-A - Mode perception prediction model for building load unbalance data

Abstract

The invention provides a mode perception prediction model for building load unbalance data and relates to the field of power load prediction. The method identifies the typical operation modes in building load data (startup impact, morning, afternoon and shutdown dormancy) through a K-means clustering algorithm; introduces a time-window feature enhancement strategy that expands the original one-dimensional feature space into a three-dimensional feature space; constructs a mode-aware oversampling method, PP-SMOTE; obtains an optimal balance factor through a balance-factor optimization strategy and verifies the quality of the data generated by PP-SMOTE; and applies PP-SMOTE to a LightGBM prediction model, determining the optimal degree of data balance through a comparison experiment across multiple balance factors so as to balance prediction accuracy against the data distribution characteristics.

Inventors

JI YING
WEI HAOSEN
LIAN HUIHUI
YUAN SIBO
YIN XIAOJIA
SHI CHENXIN
XIE JINGCHAO

Assignees

Beijing University of Technology (北京工业大学)

Dates

Publication Date: 20260505
Application Date: 20260121

Claims (6)

1. A mode perception prediction model for building load unbalance data, characterized in that its construction process comprises the following steps: S1, acquiring and preprocessing historical building data, wherein the historical building data comprise cooling load data, meteorological data and real-time operation parameters; outlier analysis is performed on the cooling load data using a box plot, the abnormal values are removed, the resulting gaps are filled by linear interpolation, and the filled data are filtered and smoothed by GCMWSG to obtain a building load data set, which is divided proportionally into a training set and a test set; S2, constructing a mode-aware oversampling method, PP-SMOTE, by introducing K-means mode recognition and a time-window feature enhancement mechanism; S3, applying PP-SMOTE to the training set from S1 with 9 balance factors from 0.2 to 1.0, thereby generating 9 corresponding balanced data sets; S4, optimizing the balance factor, wherein a LightGBM model is tested on each balanced data set obtained in S3 and comprehensively evaluated with evaluation indexes to obtain the optimal balance factor, and the quality of the data generated by PP-SMOTE is verified; and S5, after the data generated by PP-SMOTE pass the quality verification in S4, applying PP-SMOTE to a LightGBM model to obtain PP-SMOTE-LightGBM.
2. The mode perception prediction model for building load unbalance data according to claim 1, wherein the interpolation calculation formula in S1 is: $x_{new} = x_i + \mathrm{rand}(0,1) \times (\hat{x}_i - x_i)$; where $x_{new}$ is the newly generated sample, $x_i$ is the source sample, $\hat{x}_i$ is a neighboring sample, $(\hat{x}_i - x_i)$ is the distance between the two minority samples, and $\mathrm{rand}(0,1)$ is a random value between 0 and 1.
3. The mode perception prediction model for building load unbalance data according to claim 1, wherein the K-means mode recognition in S2 is realized by two rounds of K-means clustering: the first round clusters all load data to identify working and non-working hours, and the second round clusters the load data within the working hours into a minority class and a majority class; specifically, the following operations are executed: let the building load data set be $D = \{(x_i, h_i, y_i)\}$; the data are grouped by hour type and the statistical features of each hour are calculated: $\mu_h = \frac{1}{|I_h|}\sum_{i \in I_h} y_i$; $\sigma_h = \sqrt{\frac{1}{|I_h|}\sum_{i \in I_h}(y_i - \mu_h)^2}$; $CV_h = \frac{\sigma_h}{\mu_h + \varepsilon}$; where $x_i$ is the feature vector, $h_i$ is the hour type with a value range of 0 to 23, $y_i$ is the corresponding load value, $\mu_h$, $\sigma_h$ and $CV_h$ are the average load, standard deviation and coefficient of variation respectively, $\varepsilon$ is a constant, and $I_h$ is the index set of the samples corresponding to hour $h$; a time encoding is introduced to capture the periodic characteristics of time: $h_{\sin} = \sin(2\pi h / 24)$; $h_{\cos} = \cos(2\pi h / 24)$; a clustering feature vector is constructed: $f_h = [\mu_h, \sigma_h, CV_h, h_{\sin}, h_{\cos}]$; after the feature vectors are standardized, clustering is performed with the K-means algorithm: $C = \arg\min_{C}\sum_{k=1}^{K}\sum_{f_h \in C_k}\lVert f_h - c_k \rVert^2$; where $C$ is the clustering result, $K$ is the number of clusters, and $c_k$ is the $k$-th cluster center.
4. The mode perception prediction model for building load unbalance data according to claim 3, wherein the time-window feature enhancement mechanism expands the original one-dimensional feature space into a three-dimensional feature space, and the time window matches feature vectors to the data in the building load data set: for a time point $t$, a time window $W_t$ of size $w$ is defined, and the time-window feature extraction function is defined as: $\phi(x_t) = [x_t, \mu_t^{w}, \sigma_t^{w}]$; where $x_t$ is the original feature vector at time point $t$, $\mu_t^{w}$ is the mean of the continuous features within the window, and $\sigma_t^{w}$ is the standard deviation of the continuous features within the window; $\mu_t^{w} = \frac{1}{w}\sum_{\tau \in W_t} x_\tau^{c}$; $\sigma_t^{w} = \sqrt{\frac{1}{w}\sum_{\tau \in W_t}(x_\tau^{c} - \mu_t^{w})^2}$; where $x^{c}$ denotes the continuous feature components of the feature vector, and the discrete features remain unchanged; neighbor search is performed in the three-dimensional space of the enhanced time-window features: $N_k(x_i) = \mathrm{kNN}(\phi(x_i), k)$; the time-window collaborative generation mechanism interpolates the feature vector and the load value at the same time: let the original sample be $x_i$ and its $k$ nearest neighbor samples be $\{x_i^{(1)}, \dots, x_i^{(k)}\}$, with corresponding load values $y_i$ and $\{y_i^{(1)}, \dots, y_i^{(k)}\}$; a random neighbor $x_i^{(j)}$ is selected and an interpolation coefficient $\lambda$ is generated: $x_{new} = x_i + \lambda (x_i^{(j)} - x_i)$; $y_{new} = y_i + \lambda (y_i^{(j)} - y_i)$; where $x_{new}$ and $y_{new}$ are the generated feature vector and load value respectively, and the shared coefficient $\lambda$ ensures synchronous interpolation of the features and the load.
5. The mode perception prediction model for building load unbalance data according to claim 4, wherein in S3 the inputs are the building load data set $D$, the balance factor $\beta$, the number of neighbors $k$ and the time window size $w$, and the output is the balanced data set $D_{bal}$.
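6. The mode perception prediction model for building load unbalance data according to claim 5, wherein in S4 the evaluation indexes of the predicted values comprise the mean absolute error (MAE), the mean absolute percentage error (MAPE), the root mean square error (RMSE) and the coefficient of determination ($R^2$): $\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\lvert \hat{y}_i - y_i \rvert$; $\mathrm{MAPE} = \frac{100\%}{n}\sum_{i=1}^{n}\left\lvert \frac{\hat{y}_i - y_i}{y_i} \right\rvert$; $\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(\hat{y}_i - y_i)^2}$; $R^2 = 1 - \frac{\sum_{i=1}^{n}(y_i - \hat{y}_i)^2}{\sum_{i=1}^{n}(y_i - \bar{y})^2}$; where $n$ is the total number of samples, $\hat{y}_i$ is the $i$-th predicted value, $y_i$ is the $i$-th true (actually observed) value, $\bar{y}$ is the average of all actual observations, and $i$ is the sample index, taking values from 1 to $n$.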
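
For illustration only, the time-window enhancement and synchronous interpolation recited in claim 4 can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the patented implementation: the function names `window_features`, `k_nearest` and `synthesize` are hypothetical; the window is assumed to be trailing (the text does not state its alignment); only a single continuous feature is handled; the neighbor search is brute force; and the enhanced three-dimensional vector itself is interpolated, which the text does not confirm. The sample values are invented.

```python
import math
import random
import statistics


def window_features(series, w):
    """Expand each scalar x_t into (x_t, window mean, window std).

    Assumption: the window is trailing, i.e. it covers up to `w` points
    ending at t; the source does not state the window alignment.
    """
    enhanced = []
    for t in range(len(series)):
        window = series[max(0, t - w + 1): t + 1]
        mean = statistics.fmean(window)
        std = statistics.pstdev(window)  # population std; 0.0 for one point
        enhanced.append((series[t], mean, std))
    return enhanced


def k_nearest(idx, points, k):
    """Indices of the k points closest to points[idx] (Euclidean distance)."""
    others = [j for j in range(len(points)) if j != idx]
    others.sort(key=lambda j: math.dist(points[idx], points[j]))
    return others[:k]


def synthesize(features, loads, k, n_new, rng):
    """Generate n_new (feature, load) pairs, one shared coefficient per pair."""
    out = []
    for _ in range(n_new):
        i = rng.randrange(len(features))              # source sample
        j = rng.choice(k_nearest(i, features, k))     # random neighbor
        lam = rng.random()                            # coefficient in [0, 1)
        x_new = tuple(a + lam * (b - a)
                      for a, b in zip(features[i], features[j]))
        y_new = loads[i] + lam * (loads[j] - loads[i])  # same lam as x_new
        out.append((x_new, y_new))
    return out


# Hypothetical minority-mode samples: one feature and the matching load.
temps = [24.0, 25.5, 27.0, 30.0, 31.5, 29.0]
loads = [110.0, 130.0, 150.0, 210.0, 240.0, 190.0]
feats = window_features(temps, w=3)
samples = synthesize(feats, loads, k=2, n_new=4, rng=random.Random(0))
```

Because the feature vector and the load share one coefficient, each synthetic point lies on the segment between a real sample and one of its neighbors in the joint feature-load space, rather than pairing an interpolated feature with an unrelated load.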

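The four evaluation indexes named in claim 6 follow their standard definitions, so they can be computed with a few lines of Python. The sketch below is illustrative only: the function name `regression_metrics` and the sample values are hypothetical, MAPE is reported in percent, and the code assumes that no true value is zero and that the true values are not all identical.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return MAE, MAPE (percent), RMSE and R^2 for paired sequences."""
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    errors = [p - t for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    # MAPE is undefined for a zero true value; this sketch assumes none occur.
    mape = 100.0 * sum(abs(e / t) for e, t in zip(errors, y_true)) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)                 # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    r2 = 1.0 - ss_res / ss_tot  # assumes y_true is not constant
    return {"MAE": mae, "MAPE": mape, "RMSE": rmse, "R2": r2}


# Hypothetical true and predicted loads.
scores = regression_metrics([100.0, 200.0, 300.0, 400.0],
                            [110.0, 190.0, 310.0, 390.0])
```

Lower MAE, MAPE and RMSE and an $R^2$ closer to 1 indicate a better fit, which is how a set of candidate balance factors could be ranked against one another.
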
Description

Mode perception prediction model for building load unbalance data

Technical Field

The invention relates to the field of power load prediction, and in particular to a mode perception prediction model for building load unbalance data.

Background

With the rapid development of the global economy, building energy consumption keeps increasing. According to statistics, the building sector is a major contributor to global energy consumption and carbon emissions, accounting for about 40% of total global energy consumption and 36% of total carbon emissions. Research has shown that machine learning and big data have great potential to assist the decarbonization of buildings, and more and more prediction models are being used for energy management. Traditional load prediction methods fall into three kinds: white-box models, gray-box models and black-box models.

Existing research in the field of building load prediction has the following two notable shortcomings. First, most prior studies have not examined in depth how data imbalance affects the accuracy of prediction models, so machine learning models tend to perform poorly on rare but critical load events; with smaller load data sets in particular, a model often cannot capture the features of the minority-class data effectively, which degrades overall prediction performance. Second, although the SMOTE algorithm can handle unbalanced data sets effectively, its results are not ideal when it is applied directly to building load data sets.

These shortcomings limit the reliability, robustness and operating efficiency of prediction models in practical applications. It is therefore desirable to provide a mode perception prediction model for building load unbalance data to address the above problems.

Disclosure of Invention

The invention aims to provide a mode perception prediction model for building load unbalance data, and proposes the PP-SMOTE oversampling method, which introduces K-means mode recognition and a time-window feature enhancement mechanism to address the problem that traditional SMOTE ignores temporal ordering and abrupt changes when processing unbalanced building load data. Data processed by PP-SMOTE allow the LightGBM prediction model to learn the characteristics of rare load events fully while significantly improving its accuracy.

In order to achieve the above object, the present invention provides a mode perception prediction model for building load unbalance data, whose construction process comprises the following steps:

S1, acquiring and preprocessing historical building data, wherein the historical building data comprise cooling load data, meteorological data and real-time operation parameters; outlier analysis is performed on the cooling load data using a box plot, the abnormal values are removed, the resulting gaps are filled by linear interpolation, and the filled data are filtered and smoothed by GCMWSG to obtain a building load data set, which is divided proportionally into a training set and a test set;
S2, constructing a mode-aware oversampling method, PP-SMOTE, by introducing K-means mode recognition and a time-window feature enhancement mechanism;
S3, applying PP-SMOTE to the training set from S1 with 9 balance factors from 0.2 to 1.0, thereby generating 9 corresponding balanced data sets;
S4, optimizing the balance factor, wherein a LightGBM model is tested on each balanced data set obtained in S3 and comprehensively evaluated with evaluation indexes to obtain the optimal balance factor, and the quality of the data generated by PP-SMOTE is verified;
S5, after the data generated by PP-SMOTE pass the quality verification in S4, applying PP-SMOTE to a LightGBM model to obtain PP-SMOTE-LightGBM.

Preferably, the interpolation calculation formula in S1 is: $x_{new} = x_i + \mathrm{rand}(0,1) \times (\hat{x}_i - x_i)$; where $x_{new}$ is the newly generated sample, $x_i$ is the source sample, $\hat{x}_i$ is a neighboring sample, $(\hat{x}_i - x_i)$ is the distance between the two minority samples, and $\mathrm{rand}(0,1)$ is a random value between 0 and 1.

Preferably, the K-means mode recognition in S2 is realized by two rounds of K-means clustering: the first round clusters all load data to identify working and non-working hours, and the second round clusters the load data within the working hours into a minority class and a majority class; specifically, the following operations are executed: let the building load data set be $D = \{(x_i, h_i, y_i)\}$; the data are grouped by hour type and the statistical features of each hour are calculated: $\mu_h = \frac{1}{|I_h|}\sum_{i \in I_h} y_i$; $\sigma_h = \sqrt{\frac{1}{|I_h|}\sum_{i \in I_h}(y_i - \mu_h)^2}$; $CV_h = \frac{\sigma_h}{\mu_h + \varepsilon}$; where $x_i$ is the feature vector, $h_i$ is the hour type with a value range of 0 to 23, $y_i$ is the corresponding load value, and $\mu_h$, $\sigma_h$ and $CV_h$ are the average load, standard deviation and coefficient of variation