CN-122022368-A - Industrial park multi-load electric-carbon collaborative management decision-making method based on layered safety reinforcement learning


Abstract

The invention provides a multi-load electricity-carbon collaborative management decision-making method for industrial parks based on layered safety reinforcement learning. The method comprises: calculating the node carbon potential and the carbon flow rate of the power distribution network; establishing a multi-load collaborative energy consumption model; constructing a layered reinforcement learning decision-making system based on a dynamic electricity-carbon flow coupling model, using a dual-time-scale framework with a first decision period and a second decision period; generating, through an upper-layer decision-making module, a global scheduling strategy for the power purchased from the main grid and the renewable energy generation output; generating a local adjustment strategy through a lower-layer decision-making module; converting the global scheduling strategy and the local adjustment strategy into scheduling instructions and sending them to the execution units, so as to achieve collaborative control of the industrial park's multiple loads and energy systems; monitoring grid operation and the state of the carbon emission flow in real time; and updating the calculation parameters of the dynamic electricity-carbon flow coupling model and the training strategy of the layered reinforcement learning decision-making system based on the monitored feedback data, thereby completing iterative optimization of the model and the strategy.

Inventors

Hou Luyang
SHEN WEIMING
FANG YUJUAN
GE LEIJIAO
WANG HANYU
SHI YANJUN
WU HONGXING

Assignees

福建福耀科技大学

Dates

Publication Date: 2026-05-12
Application Date: 2026-02-13

Claims (10)

1. A multi-load electricity-carbon collaborative management decision-making method for an industrial park based on layered safety reinforcement learning, characterized by comprising the following steps: based on multi-source operation data of the industrial park, calculating the node carbon potential and the carbon flow rate of the power distribution network by combining power flow calculation with carbon emission flow theory, introducing a carbon flow entropy index to quantify the distribution characteristics of carbon emissions and using it as a component of the reward function of a lower-layer decision module, while establishing a multi-load collaborative energy consumption model, thereby constructing a dynamic electric power-carbon flow coupling model; based on the dynamic electric power-carbon flow coupling model, constructing a layered reinforcement learning decision system that adopts a dual-time-scale framework with a first decision period and a second decision period, wherein the first decision period is longer than the second decision period; generating, through an upper-layer decision module, a global scheduling strategy for the power purchased from the main grid and the renewable energy generation output, and generating, through the lower-layer decision module, a local adjustment strategy for the slow loads, fast loads and energy storage system of the industrial park, wherein the decision system introduces an enhanced prioritized experience replay mechanism to correct the parameter drift problem of multi-agent reinforcement learning, embeds grid safety constraints and a carbon quota constraint into the reward function, and quantifies the constraint violation cost; and converting the global scheduling strategy and the local adjustment strategy into scheduling instructions, issuing the scheduling instructions to execution units to achieve collaborative control of the multiple loads and energy systems of the industrial park, monitoring the operation of the park grid and the state of the carbon emission flow in real time, updating the calculation parameters of the dynamic electric power-carbon flow coupling model and the training strategy of the layered reinforcement learning decision system based on the monitored feedback data, and completing iterative optimization of the model and the strategy.
2. The industrial park multi-load electricity-carbon collaborative management decision-making method based on layered safety reinforcement learning according to claim 1, characterized in that the multi-source operation data of the industrial park comprise electric power parameters, operation parameters of multiple load types, environmental and meteorological parameters, and economic and carbon parameters, wherein the multi-source operation data are input into the dynamic electric power-carbon flow coupling model after time synchronization, outlier rejection, missing-value completion and standardization preprocessing, and the standardization preprocessing normalizes raw data of different dimensions to the interval [0,1].
3. The industrial park multi-load electricity-carbon collaborative management decision-making method based on layered safety reinforcement learning, characterized in that the node carbon potential is the generation-side carbon emission corresponding to a unit of electricity consumed at the node; the carbon potential of a single node is obtained by summing, over all branches flowing into the node, the product of branch carbon flow density and branch power flow, adding the product of the node's generation injection power and the carbon emission intensity of the corresponding generating unit, and dividing the result by the node's active power flux; the carbon flow rate comprises the load carbon flow rate, the branch carbon flow rate and the total system carbon flow rate, wherein the load carbon flow rate is obtained by multiplying the load distribution parameters by the corresponding node carbon potential, and the branch carbon flow rate is calculated from the power flow distribution of the distribution network and the node carbon potential; and the carbon flow entropy index is obtained by multiplying, for each node, the ratio of that node's carbon emission to the total carbon emission of the system by the natural logarithm of that ratio, and accumulating the results over all nodes.
4. The multi-load electricity-carbon collaborative management decision-making method for the industrial park based on layered safety reinforcement learning according to claim 1, characterized in that the slow load is the manufacturing production load, and the fast load comprises the data center computing load and the charging load of vehicle-to-grid (V2G) charging stations; the first decision period is at the hour level and the second decision period is at the minute level; the upper-layer decision result serves as an environmental condition of the lower-layer decision to realize state transfer, and the upper-layer reward function fuses all lower-layer reward values within the corresponding period to realize reward fusion; and the operation of the energy storage system must satisfy the upper and lower limits on its state of charge and the limits on its charge and discharge power range.
5. The industrial park multi-load electricity-carbon collaborative management decision-making method based on layered safety reinforcement learning according to claim 1, characterized in that the enhanced prioritized experience replay mechanism is implemented as follows: the sample priority is calculated from the temporal-difference (TD) error of each experience sample, a larger TD error giving a higher priority; samples are drawn from the experience replay buffer by priority-weighted random sampling; the training bias caused by non-uniform sampling is corrected by introducing importance sampling weights; and the priority of erroneous-action, negative-reward experiences that arise during training and violate a grid constraint or the carbon quota constraint is set to the highest value, forcing the reinforcement learning network to learn from these experiences repeatedly in order to correct the policy network parameters.
6. The industrial park multi-load electricity-carbon collaborative management decision-making method based on layered safety reinforcement learning, characterized in that the grid safety constraints comprise a node voltage constraint, a branch current constraint and a power flow constraint, and the constraint violation cost is the accumulated value of the node voltage constraint violation degree, the branch current constraint violation degree and the power flow constraint violation degree, wherein the node voltage constraint violation degree is the sum of the amount by which the actual node voltage falls below the lower voltage limit and the amount by which it exceeds the upper voltage limit, counting only the non-negative part of each; the branch current constraint violation degree is the ratio of the actual branch current to the rated branch current minus 1, counting only the non-negative part; and the power flow constraint violation degree is the ratio of the actual branch apparent power to the rated branch apparent power minus 1.
7. The multi-load electricity-carbon collaborative management decision-making method for the industrial park based on layered safety reinforcement learning, characterized in that the grid safety constraints and the carbon quota constraint are embedded into the reward function as follows: the constraint violation cost is introduced into the original reward function to obtain a corrected reward function, the corrected reward being the original reward value minus the product of the constraint violation cost and a preset penalty coefficient, plus a constraint satisfaction reward; when the constraint violation cost is zero, the preset constraint satisfaction reward is granted, and when the constraint violation cost is not zero, no constraint satisfaction reward is granted.
8. The multi-load electricity-carbon collaborative management decision-making method for the industrial park based on layered safety reinforcement learning according to claim 1, characterized in that the layered reinforcement learning decision system is built on a multi-agent twin delayed deep deterministic policy gradient (MATD3) framework combined with a hierarchical reinforcement learning algorithm, comprises separate actor-critic networks for the upper layer and the lower layer, and updates the target network parameters through a soft update mechanism.
9. The industrial park multi-load electricity-carbon collaborative management decision-making method based on layered safety reinforcement learning according to claim 1, wherein the monitored feedback data comprise the actual operating power of each execution unit, the actual voltages of the distribution network nodes, the actual branch power flows, the actual state of charge of the energy storage system, the measured parameters of the carbon emission flow, and the actual operating cost and carbon emissions of the industrial park; the calculation parameters of the dynamic electricity-carbon flow coupling model are updated online; and the training strategy of the layered reinforcement learning decision system is updated by adding real-time state-action-reward experiences to the experience replay buffer and training the reinforcement learning network online.
10. An industrial park multi-load electricity-carbon collaborative management decision-making system based on layered safety reinforcement learning, characterized by comprising a multi-source information sensing module, an electric power-carbon flow coupling modeling module, a layered safety reinforcement learning decision module, and an execution and feedback closed-loop module, wherein the multi-source information sensing module is used for collecting and preprocessing the multi-source operation data of the industrial park; the electric power-carbon flow coupling modeling module is used for constructing the dynamic electric power-carbon flow coupling model, calculating the node carbon potential, the carbon flow rate and the carbon flow entropy index, and using the carbon flow entropy as a component of the reward function of the lower-layer decision module; the layered safety reinforcement learning decision module is used for constructing the layered reinforcement learning decision system and generating the global scheduling strategy and the local adjustment strategy; and the execution and feedback closed-loop module is used for issuing scheduling instructions, monitoring the operating state, collecting feedback data, and completing iterative optimization of the model and the strategy.
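
The quantities recited in claims 3, 5, 6 and 7 can be illustrated with a short Python sketch. It is not the applicant's implementation. All function and parameter names are hypothetical, and several details are assumptions that the claims do not settle: the entropy is negated in the conventional way (claim 3 does not state a sign), the voltage term is read as penalizing excursions below the lower limit and above the upper limit, the power flow term is clipped at zero like the current term, and the exponent `beta` and offset `eps` follow common prioritized replay practice.

```python
import math
import random


def carbon_flow_entropy(node_emissions):
    """Claim 3: accumulate p_i * ln(p_i) over nodes, where p_i is a node's
    share of total system carbon emission. Negated here (assumption)."""
    total = sum(node_emissions)
    shares = [e / total for e in node_emissions if e > 0]
    return -sum(p * math.log(p) for p in shares)


def voltage_violation(v, v_min, v_max):
    """Claim 6 voltage term, read as a two-sided excursion (assumption)."""
    return max(0.0, v_min - v) + max(0.0, v - v_max)


def ratio_violation(actual, rated):
    """Claim 6 current / power flow term: ratio minus 1, non-negative part.
    Clipping the power flow term is an assumption."""
    return max(0.0, actual / rated - 1.0)


def constraint_violation_cost(voltages, v_min, v_max,
                              currents, rated_currents,
                              powers, rated_powers):
    """Claim 6: accumulated voltage, current and power flow violations."""
    cost = sum(voltage_violation(v, v_min, v_max) for v in voltages)
    cost += sum(ratio_violation(i, r) for i, r in zip(currents, rated_currents))
    cost += sum(ratio_violation(s, r) for s, r in zip(powers, rated_powers))
    return cost


def corrected_reward(raw_reward, cost, penalty_coeff, satisfaction_bonus):
    """Claim 7: raw reward minus penalty * cost, plus a bonus only when
    the violation cost is zero."""
    bonus = satisfaction_bonus if cost == 0 else 0.0
    return raw_reward - penalty_coeff * cost + bonus


def replay_priority(td_error, violated, max_priority, eps=1e-6):
    """Claim 5: priority grows with |TD error|; constraint-violating
    experiences receive the highest priority."""
    return max_priority if violated else abs(td_error) + eps


def sample_batch(buffer, priorities, batch_size, beta=0.4):
    """Claim 5: priority-weighted sampling with importance sampling
    weights, normalized by the largest weight in the batch."""
    total = sum(priorities)
    probs = [p / total for p in priorities]
    idx = random.choices(range(len(buffer)), weights=probs, k=batch_size)
    n = len(buffer)
    weights = [(n * probs[i]) ** (-beta) for i in idx]
    w_max = max(weights)
    return [buffer[i] for i in idx], [w / w_max for w in weights]
```

In this sketch a uniform emission distribution over four nodes gives an entropy of ln 4, and a feasible operating point (zero violation cost) receives the raw reward plus the satisfaction bonus.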

Description

Industrial park multi-load electric-carbon collaborative management decision-making method based on layered safety reinforcement learning

Technical Field

The invention belongs to the technical fields of industrial energy optimization, artificial intelligence, and low-carbon environmental protection, and in particular relates to a multi-load electricity-carbon collaborative management decision-making method for industrial parks based on layered safety reinforcement learning. The method covers technical directions such as multi-agent collaborative energy scheduling in industrial parks, coupled management of electric power and carbon emission flows, and the design of layered safety reinforcement learning algorithms. It is particularly suited to energy management and control in complex industrial parks that integrate intelligent manufacturing units, data centers, vehicle-to-grid (V2G) charging stations, photovoltaic power generation systems and energy storage systems.

Background

Against the backdrop of the comprehensive advancement of the dual-carbon strategy, the industrial park, as a core carrier of manufacturing agglomeration, is a key link in achieving carbon peaking and carbon neutrality in the industrial sector. By integrating technologies such as the Internet of Things, big data and artificial intelligence, the industrial park forms a complex cyber-physical energy system spanning source, network, load and storage. Its energy structure comprises renewable energy sources such as photovoltaic power generation, main grid supply, energy storage systems, manufacturing production loads, data center computing loads, and electric vehicle charging loads, and it plays an irreplaceable role in driving the green transformation of manufacturing.
However, low-carbon operation and energy management in today's industrial parks face technical challenges along several dimensions, and traditional scheduling methods and management and control systems struggle to adapt to the complex energy ecosystem and dynamic operating conditions. The specific problems are as follows:

Lack of a multi-type load collaborative scheduling mechanism and low energy utilization efficiency

Industrial park loads are markedly heterogeneous and divide into slow loads and fast loads. Slow loads are centered on the production load of manufacturing units and are characterized by stable energy consumption, long scheduling periods and strict process constraints; dynamic adjustments of the production plan (such as rush-order insertion, product type changes and delivery date adjustments) easily disrupt the original resource scheduling scheme. Fast loads include the computing load of the data center (for tasks such as load forecasting, production optimization and fault diagnosis) and the charging load of charging stations, and are characterized by frequent fluctuations in energy consumption, fast response and large adjustment potential; their computing task allocation and charge and discharge power adjustment are easily affected by real-time demand. Traditional scheduling methods mainly perform single-objective optimization for a single load type, such as minimizing the energy consumption of manufacturing loads or optimizing the completion time of charging loads, and lack a collaborative scheduling mechanism across load types. As a result, it is difficult to balance the stability requirement of slow loads against the flexibility requirement of fast loads, which leads to a low renewable energy accommodation rate, a large peak-valley difference in grid load, and inefficient allocation of energy resources.
Imperfect electric power-carbon flow coupling management system and difficulty in tracing and controlling carbon emissions

The carbon emissions of an industrial park are closely tied to the direction of power flow. The carbon emission flow is a virtual network flow that coexists with the active power flow, and its distribution is affected by multiple factors such as the carbon emission intensity of generating units, grid topology, line impedance and load distribution. However, the prior art has not established a dynamic coupling model of power flow and carbon emission flow; it can only provide macroscopic statistics of carbon emissions, cannot quantify the carbon footprint of each node, load and branch, and can hardly achieve accurate tracing and dynamic control of carbon emissions. Meanwhile, existing research mostly analyzes carbon emission flows under static or quasi-static assumptions, and influences of factors such as renewable energy source output fluctuation, load dynamic change, energy