CN-122022068-A - Provincial monthly power grid carbon emission factor prediction and spatial heterogeneity analysis method, system, device and medium based on interpretable ensemble learning


Abstract

The invention discloses a method, system, device and medium for predicting provincial monthly power grid carbon emission factors and analyzing their spatial heterogeneity based on interpretable ensemble learning. The method comprises: constructing a province-month panel sample; preprocessing and encoding the data; constructing lag features for price indices and selecting the optimal lag; dividing the data by forward-chaining time-slice cross-validation; dynamically screening features to determine a unified feature subset; training multiple ensemble regression learners and tuning their hyperparameters to output predicted values; and performing interpretability and spatial heterogeneity analysis based on SHAP, PDP/ICE and bivariate PDP. The invention requires neither fine-grained dispatch data nor inter-provincial transaction data, achieves high-frequency prediction with interpretable attribution, and is suitable for carbon accounting, policy evaluation and regional collaborative emission reduction management.

Inventors

WANG JUNFENG
Bao Ai
LIN YUCHEN
ZHANG YAHUI

Assignees

Nankai University (南开大学)

Dates

Publication Date: 20260512
Application Date: 20260318

Claims (10)

1. A method for predicting provincial monthly power grid carbon emission factors and analyzing their spatial heterogeneity based on interpretable ensemble learning, characterized by comprising the following steps: Step S1, constructing a panel sample indexed by province-month, and acquiring and aligning provincial monthly power grid carbon emission factor data and feature data, wherein the feature data at least comprise socioeconomic activity features, power system structure features, province identifiers and month identifiers; Step S2, preprocessing and encoding the data of step S1, wherein the preprocessing and encoding at least comprise missing value imputation, outlier handling, cyclical month encoding and province category encoding; Step S3, constructing multi-period lag features for the price index features, and selecting the optimal lag of each base variable by averaging importance rankings across multiple models; Step S4, dividing the training set and the validation set by forward-chaining time-slice cross-validation; Step S5, dynamically screening the feature set based on model importance indices within the cross-validation framework to obtain a unified feature subset; Step S6, training multiple tree-based ensemble regression learners on the unified feature subset, performing hyperparameter optimization within the time slices, selecting a target model based on prediction performance indices, and outputting predicted provincial monthly power grid carbon emission factors; Step S7, performing prediction evaluation and residual diagnosis on the target model; and Step S8, performing interpretability and spatial heterogeneity analysis on the target model, and outputting driving factor rankings, marginal and interaction effects, and inter-provincial difference results.
2. The method of claim 1, wherein the socioeconomic activity features in step S1 include at least the monthly outputs of 29 industrial products, 15 producer price indices (PPI) for industrial products, total retail sales of consumer goods, and the consumer price index (CPI); the power system structure features include at least the thermal power generation share and the zero-carbon power generation share; and the feature data optionally include meteorological control variables.
3. The method according to claim 2, wherein the missing value imputation in step S2 adopts a two-step strategy: piecewise linear interpolation is first performed along each province's time series, and mean imputation is then applied to the remaining missing values; and the outlier handling adopts winsorization, truncating observed values to within ±3 standard deviations of the mean of each variable.
4. The method of claim 2, wherein the cyclical month encoding of step S2 includes constructing two periodic features, month_sin = sin(2πm/12) and month_cos = cos(2πm/12), wherein m denotes the month; and the province category encoding adopts one-hot encoding or target statistical encoding.
5. The method of claim 2, wherein the price index features in step S3 include PPI and CPI, the multi-period lags are lags of 1 to 3 periods, and the multi-model average importance ranking comprises computing importance rankings under random forest regression, gradient boosting tree and ordered boosting tree models respectively, and averaging the rankings to determine the optimal lag period for each PPI/CPI base variable.
6. The method of claim 1, wherein the dynamic screening of step S5 includes a Top-K truncation strategy and a cumulative contribution threshold strategy, wherein K takes values from a discrete set within the range 15 to 100, the cumulative contribution threshold takes values from a discrete set within the range 70% to 90%, and the final unified feature subset is determined by the minimum validation error in time-slice cross-validation.
7. The method of claim 1, wherein the multiple tree-based ensemble regression learners of step S6 include at least random forest regression, gradient boosting tree models and ordered boosting tree models, and the hyperparameter optimization is performed within the time-slice cross-validation framework by grid search, random search or Bayesian optimization.
8. The method of claim 1, wherein the prediction evaluation indices of step S7 include at least R², RMSE, MAE and MAPE, and the residual diagnosis further examines the residual distribution, heteroscedasticity and inter-provincial bias to measure model robustness.
9. The method of claim 1, wherein the interpretability and spatial heterogeneity analysis of step S8 includes additively decomposing predictions using SHAP values to obtain global and local contributions, characterizing the marginal effects of key features with partial dependence plots (PDP) or individual conditional expectation (ICE) curves, characterizing interaction effects with bivariate PDP, and aggregating SHAP values by province to obtain spatial heterogeneity results.
10. A power grid carbon emission factor prediction and interpretation system implementing the method of any one of claims 1-9, comprising: a data access and alignment module for executing step S1; a preprocessing module for executing step S2; a feature construction module for executing step S3; a validation and screening module for executing steps S4 and S5; a modeling and hyperparameter tuning module for executing step S6; and an evaluation and diagnosis module and an interpretation and spatial analysis module for executing steps S7 and S8.
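As an illustration of the preprocessing recited in claims 3 and 4, the two-step imputation, the 3-standard-deviation winsorization and the cyclical month encoding might be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the patented implementation: all function names are hypothetical, missing values are assumed to be represented as None, the mean used for the second imputation step is assumed to be the mean of the series itself, and the population standard deviation is assumed because the claims do not say which estimator is used.

```python
import math
from statistics import mean, pstdev


def interpolate_linear(series):
    """Step 1 (assumed form): fill interior gaps (None) of one province's
    time series by linear interpolation between the nearest known values."""
    out = list(series)
    known = [i for i, v in enumerate(out) if v is not None]
    for a, b in zip(known, known[1:]):
        for i in range(a + 1, b):
            out[i] = out[a] + (out[b] - out[a]) * (i - a) / (b - a)
    return out


def fill_remaining_with_mean(series):
    """Step 2 (assumed form): replace values that are still missing
    (for example leading or trailing gaps) with the mean of the known values."""
    known = [v for v in series if v is not None]
    if not known:
        return list(series)
    m = mean(known)
    return [m if v is None else v for v in series]


def winsorize_3sigma(series, k=3.0):
    """Clip every value to the interval mean +/- k standard deviations.
    The population standard deviation is an assumption of this sketch."""
    m, s = mean(series), pstdev(series)
    lo, hi = m - k * s, m + k * s
    return [min(max(v, lo), hi) for v in series]


def month_cyclic(m):
    """Return (month_sin, month_cos) = (sin(2*pi*m/12), cos(2*pi*m/12))."""
    angle = 2 * math.pi * m / 12
    return math.sin(angle), math.cos(angle)
```

With this encoding the twelve months lie evenly spaced on the unit circle, so December and January are as close to each other as any other pair of consecutive months, which a plain integer month index would not capture.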

Description

Provincial monthly power grid carbon emission factor prediction and spatial heterogeneity analysis method, system, device and medium based on interpretable ensemble learning

Technical Field

The invention relates to the technical fields of power system carbon emission accounting, socioeconomic activity impact analysis, data-driven prediction and interpretable machine learning, and in particular to a method, system, device and medium for interpretable ensemble learning prediction and spatial heterogeneity analysis of China's provincial monthly grid carbon emission factor (GCEF).

Background

The power grid carbon emission factor is a basic parameter for accounting indirect emissions on the electricity consumption side, corporate carbon inventories, green power trading and regional collaborative emission reduction management. In current practice, provincial power grid carbon emission factors are published as annual or regional averages, which can hardly reflect monthly fluctuations and short-term disturbances or meet the needs of dynamic carbon management. Some high-frequency calculation methods rely on fine-grained dispatch data and on inter-provincial power flow or transaction data; they have a high data threshold and poor availability, and are difficult to apply stably across multiple regions over the long term. Other methods can produce estimates with less data, but have limited ability to respond to monthly changes in socioeconomic activity.
Machine learning methods can improve prediction accuracy, but in the power grid carbon factor setting, a lack of strict time-series validation design easily causes information leakage and an overestimate of generalization performance. Moreover, black-box models can hardly quantify the marginal contributions and interaction mechanisms through which socioeconomic activity and power supply structure affect the carbon factor, and therefore can hardly support policy evaluation and regional collaborative governance. A technical scheme is therefore needed that achieves high-frequency prediction of provincial monthly power grid carbon emission factors without fine-grained dispatch data or inter-provincial transaction data, and that explains and identifies key socioeconomic drivers, power structure effects and spatial differences.

Disclosure of Invention

The invention aims to solve the problems in the prior art of a high data threshold, insufficiently rigorous validation and unexplainable models in provincial monthly power grid carbon emission factor prediction, and provides a method, system, device and medium for provincial monthly power grid carbon emission factor prediction and spatial heterogeneity analysis based on interpretable ensemble learning, which achieve provincial monthly GCEF prediction, driver attribution and spatial heterogeneity analysis.
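The time-ordered validation that the background identifies as necessary to avoid information leakage could be sketched as a forward-chaining split over months. The following Python sketch is illustrative only: the function name and its parameters (n_folds, val_size, min_train) are hypothetical, months are assumed to be "YYYY-MM" strings so that lexicographic order equals chronological order, and an expanding training window is assumed because the source only states that each fold trains on history and validates on the months that immediately follow.

```python
def forward_chain_splits(months, n_folds, val_size=1, min_train=12):
    """Return a list of (train_months, val_months) pairs.

    Each fold trains on all months before a cut point and validates on the
    val_size months immediately after it, so no future month ever appears
    in a training window. Splitting by month (not by row) keeps all
    provinces observed in the same month on the same side of the cut.
    """
    ordered = sorted(set(months))
    splits = []
    for k in range(n_folds):
        train_end = min_train + k * val_size
        val_end = train_end + val_size
        if val_end > len(ordered):
            break  # not enough months left for another full validation block
        splits.append((ordered[:train_end], ordered[train_end:val_end]))
    return splits


# Hypothetical usage: 18 months of panel data, three folds of two months each.
months = [f"2021-{m:02d}" for m in range(1, 13)] + [f"2022-{m:02d}" for m in range(1, 7)]
for train, val in forward_chain_splits(months, n_folds=3, val_size=2):
    # Rows of the province-month panel would be selected by membership
    # of their month in `train` or `val`.
    print(train[0], "to", train[-1], "| validate on", val)
```

A sliding window of fixed length would satisfy the same no-leakage property; the expanding window is chosen here only for simplicity.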
The technical scheme of the invention is as follows. A first aspect of the invention provides a method for predicting provincial monthly power grid carbon emission factors and analyzing their spatial heterogeneity based on interpretable ensemble learning, comprising the following steps: Step S1, constructing a panel sample indexed by province-month, and acquiring and aligning provincial monthly power grid carbon emission factor data and feature data, wherein the feature data at least comprise socioeconomic activity features, power system structure features, province identifiers and month identifiers; Step S2, preprocessing and encoding the data of step S1, wherein the preprocessing and encoding at least comprise missing value imputation, outlier handling, cyclical month encoding and province category encoding; Step S3, constructing multi-period lag features for the price index features, and selecting the optimal lag of each base variable by averaging importance rankings across multiple models; Step S4, dividing the training set and the validation set by forward-chaining time-slice cross-validation, wherein each fold trains only on the historical window and validates on the month segment immediately following it, so that leakage of future information is avoided; Step S5, dynamically screening the feature set based on model importance indices within the cross-validation framework to obtain a unified feature subset; Step S6, training multiple tree-based ensemble regression learners on the unified feature subset, performing hyperparameter optimization within the time slices, selecting a target model based on prediction performance indices, and outputting predicted provincial monthly power grid carbon emission factors; Step S7, performing prediction evaluation and residual diagnosis on the target model; and Step S8, performing interpretability and spatial heterogeneity analysis on the target
model, and outputting driving factor sequ