
CN-122020180-A - Electric power interaction model training, electric power interaction method, device, equipment and medium


Abstract

The invention discloses a method, device, equipment, and medium for training a power resource interaction processing scenario model and for processing power resource interactions. The training method is applied to the unit systems in a power resource interaction system. A model is trained for each unit system, structured as a strategy network cooperating with value networks, and value difference calculation and parameter adjustment are carried out using the state data, interaction operations, and value feedback of the unit system at different moments, so that each unit system's interaction decisions are based on its own state data. Embodiments of the invention can improve model training efficiency and the decision efficiency of power resource interaction operations.

Inventors

  • CAO JINJIA
  • XU XU
  • YAO WEITAO
  • XUE FEI

Assignees

  • Xi'an Jiaotong-Liverpool University (西交利物浦大学)

Dates

Publication Date
20260512
Application Date
20260210

Claims (10)

  1. A power resource interaction processing scenario model training method, characterized by being applied to a unit system in a power resource interaction system, wherein the power resource interaction system comprises at least one power output system and a power energy storage system, and the unit system comprises the power output system or the power energy storage system, the method comprising: inputting the state data of the unit system at a first moment into a strategy network to obtain the interaction operation of the unit system at the first moment; inputting the state data and interaction operations of the power resource interaction system at the first moment into a first value network to obtain an initial value; adjusting the initial value according to the state data and interaction operation of the unit system at a second moment to obtain a target value of the unit system at the first moment, wherein the second moment is earlier than the first moment; calculating a value difference between a compensation value of the unit system at the second moment and the target value, wherein the compensation value at the second moment is obtained by the unit system inputting the state data and interaction operations of the power resource interaction system at the second moment into a second value network for processing; and adjusting, according to the value difference, the parameters of the strategy network and the value network corresponding to the unit system.
  2. The method of claim 1, wherein adjusting the initial value according to the state data and interaction operation of the unit system at the second moment to obtain the target value of the unit system at the first moment comprises: determining a resource exchange cost of the unit system at the second moment according to the state data and interaction operation of the unit system at the second moment; calculating a Shapley value of the power resource interaction system at the first moment according to the interaction operations of the power resource interaction system at the first moment; and adjusting the initial value according to the resource exchange cost and the Shapley value to obtain the target value of the unit system at the first moment.
  3. The method of claim 2, wherein adjusting the initial value according to the resource exchange cost and the Shapley value to obtain the target value of the unit system at the first moment comprises: determining the number of visits to the state data according to the state data of the unit system at the first moment and the state data seen during the historical training process; calculating an exploration reward of the unit system at the first moment according to the number of visits; calculating a correction value according to the exploration reward, the resource exchange cost, and the Shapley value; and adjusting the initial value according to the correction value to obtain the target value of the unit system at the first moment.
  4. A power resource interaction processing method, characterized by being applied to a unit system in a power resource interaction system, wherein the power resource interaction system comprises at least one power output system and a power energy storage system, and the unit system comprises the power output system or the power energy storage system, the method comprising: acquiring state data at the current moment; and inputting the state data at the current moment into a pre-trained strategy network to obtain the interaction operation at the current moment, wherein the strategy network is trained by the power resource interaction processing scenario model training method according to any one of claims 1-4.
  5. The method of claim 4, wherein inputting the state data at the current moment into the pre-trained strategy network to obtain the interaction operation at the current moment comprises: when the unit system is the power output system, inputting the load level, energy storage state, power output, and output cost of the power output system in the current time period into the pre-trained strategy network to obtain the external power grid interaction quantity, the system interaction quantity, and the load adjustment quantity at the current moment.
  6. The method of claim 4, wherein inputting the state data at the current moment into the pre-trained strategy network to obtain the interaction operation at the current moment comprises: when the unit system is the power energy storage system, inputting the energy storage state and output cost of the power energy storage system in the current time period into the pre-trained strategy network to obtain the system interaction quantity and the output cost adjustment quantity at the current moment.
  7. A power resource interaction processing scenario model training device, characterized by being applied to a unit system in a power resource interaction system, wherein the power resource interaction system comprises at least one power output system and a power energy storage system, and the unit system comprises the power output system or the power energy storage system, the device comprising: an operation determining module, used for inputting the state data of the unit system at a first moment into a strategy network to obtain the interaction operation of the unit system at the first moment; a value evaluation module, used for inputting the state data and interaction operations of the power resource interaction system at the first moment into a first value network to obtain an initial value; a target value determining module, used for adjusting the initial value according to the state data and interaction operation of the unit system at a second moment to obtain a target value of the unit system at the first moment, wherein the second moment is earlier than the first moment; a value difference calculation module, used for calculating a value difference between a compensation value of the unit system at the second moment and the target value, wherein the compensation value at the second moment is obtained by the unit system inputting the state data and interaction operations of the power resource interaction system at the second moment into a second value network; and a parameter adjustment module, used for adjusting the parameters of the strategy network and the value network corresponding to the unit system according to the value difference.
  8. A power resource interaction processing device, characterized by being applied to a unit system in a power resource interaction system, wherein the power resource interaction system comprises at least one power output system and a power energy storage system, and the unit system comprises the power output system or the power energy storage system, the device comprising: a data acquisition module, used for acquiring state data at the current moment; and an operation output module, used for inputting the state data at the current moment into a pre-trained strategy network to obtain the interaction operation at the current moment, wherein the strategy network is trained by the power resource interaction processing scenario model training method according to any one of claims 1-4.
  9. A power resource interaction processing scenario model training or power resource interaction processing device, characterized in that the device comprises: at least one processor; and a memory communicatively coupled to the at least one processor, wherein the memory stores a computer program executable by the at least one processor to enable the at least one processor to perform the power resource interaction processing scenario model training method or the power resource interaction processing method of any of claims 1-6.
  10. A computer-readable storage medium storing computer instructions that, when executed, cause a processor to implement the power resource interaction processing scenario model training method or the power resource interaction processing method of any of claims 1-6.
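
Claims 2 and 3 name three ingredients of the target value (a resource exchange cost, a cooperative-game value, and a visit-count-based exploration reward) but this section gives no formulas for them. The Python sketch below shows one way they could fit together, reading the value named in claims 2 and 3 as the Shapley value from cooperative game theory. Everything concrete in it is an assumption made for illustration, not the patent's method: the unit names, the coalition value function, the per-unit crediting of Shapley shares, the state discretization step, the bonus form beta / sqrt(N), the discount factor gamma, and the way the correction value is combined with the initial value.

```python
from collections import Counter
from itertools import permutations
from math import sqrt


def shapley_values(players, coalition_value):
    """Exact Shapley values: average marginal contribution over all join orders."""
    totals = {p: 0.0 for p in players}
    orders = list(permutations(players))
    for order in orders:
        coalition = frozenset()
        for p in order:
            joined = coalition | {p}
            totals[p] += coalition_value(joined) - coalition_value(coalition)
            coalition = joined
    return {p: total / len(orders) for p, total in totals.items()}


class VisitCounter:
    """Counts visits to discretized states seen during training (hypothetical helper)."""

    def __init__(self, step=0.1):
        self.step = step
        self.counts = Counter()

    def visit(self, state):
        # Bucket each state component so nearby states share one counter.
        key = tuple(round(x / self.step) for x in state)
        self.counts[key] += 1
        return self.counts[key]


def exploration_reward(visit_count, beta=0.1):
    # ASSUMPTION: rarely visited states earn a larger bonus, beta / sqrt(N).
    return beta / sqrt(visit_count)


def target_value(initial_value, exchange_cost, shapley, bonus, gamma=0.95):
    # ASSUMPTION: the correction value rewards contribution and exploration,
    # penalizes exchange cost, and is added to the discounted initial value.
    correction = shapley + bonus - exchange_cost
    return correction + gamma * initial_value


# Hypothetical three-unit system whose joint value grows with coalition size.
units = ["pv", "wind", "storage"]
phi = shapley_values(units, lambda coalition: float(len(coalition) ** 2))
counter = VisitCounter()
bonus = exploration_reward(counter.visit((0.52, 0.31)))
y = target_value(initial_value=10.0, exchange_cost=2.0,
                 shapley=phi["storage"], bonus=bonus)
```

Enumerating every join order costs n! evaluations, so this exact form is only practical for a handful of units; a larger system would need a sampled approximation.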

Description

Electric power interaction model training, electric power interaction method, device, equipment and medium

Technical Field

The invention relates to the technical field of the internet, and in particular to a method, device, equipment, and medium for power resource interaction processing scenario model training and power resource interaction processing.

Background

Current power systems face several trends: a high proportion of renewable energy sources, increasingly complex electric energy structures, and rapid growth of distributed energy sources. These trends have pushed the power market from centralized scheduling toward a pattern of distributed, multi-entity collaborative interaction. Power resource interaction processing means that a power system contains a plurality of energy entities that exchange power resources with one another, so that electric energy is used fully and efficiently and the power generation cost is reduced.

To decide how the entities should interact so as to reduce the power generation cost as far as possible, the prior art mostly adopts centralized model training, which trains the model with the whole power system as the training object. The state data of all entities are combined into a high-dimensional input space for end-to-end modeling, so this training method has high computational complexity and low training efficiency. In application, the interaction operation of each entity cannot be output in time.

Disclosure of Invention

The invention provides a method, device, equipment, and medium for power resource interaction processing scenario model training and power resource interaction processing, which can improve model training efficiency and the decision efficiency of power resource interaction operations.

According to an aspect of the present invention, an embodiment of the present invention provides a power resource interaction processing scenario model training method, the method comprising:

inputting the state data of the unit system at a first moment into a strategy network to obtain the interaction operation of the unit system at the first moment;

inputting the state data and interaction operations of the power resource interaction system at the first moment into a first value network to obtain an initial value;

adjusting the initial value according to the state data and interaction operation of the unit system at a second moment to obtain a target value of the unit system at the first moment, wherein the second moment is earlier than the first moment;

calculating a value difference between a compensation value of the unit system at the second moment and the target value, wherein the compensation value at the second moment is obtained by the unit system inputting the state data and interaction operations of the power resource interaction system at the second moment into a second value network for processing;

and adjusting, according to the value difference, the parameters of the strategy network and the value network corresponding to the unit system.

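
The training steps above can be sketched as a single update in Python. This is a minimal illustration under stated assumptions, not the patent's implementation: linear functions stand in for the strategy and value networks, the reward at the second moment replaces the unspecified adjustment of the initial value, the strategy update is a CACLA-style rule (move toward the action taken only when the value difference is positive), and the first value network is treated as a slowly updated copy of the second. The names `LinearNet`, `Transition`, `train_step`, `gamma`, `lr`, and `tau` are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


class LinearNet:
    """Stand-in for a neural network: one linear output over the inputs."""

    def __init__(self, n_inputs: int):
        self.w = [0.0] * n_inputs

    def __call__(self, x: List[float]) -> float:
        return sum(wi * xi for wi, xi in zip(self.w, x))

    def step(self, x: List[float], scale: float) -> None:
        # For a linear output the gradient with respect to w is x itself.
        self.w = [wi + scale * xi for wi, xi in zip(self.w, x)]


@dataclass
class Transition:
    unit_state_2: List[float]     # unit state at the second (earlier) moment
    unit_action_2: float          # action the unit actually took at that moment
    system_input_2: List[float]   # system state plus all actions, second moment
    reward_2: float               # ASSUMPTION: e.g. negative exchange cost
    unit_state_1: List[float]     # unit state at the first (later) moment
    system_state_1: List[float]   # system state at the first moment
    other_actions_1: List[float]  # other units' actions at the first moment


def train_step(policy, first_value, second_value, tr,
               gamma=0.95, lr=0.1, tau=0.05):
    # Step 1: the strategy network picks the interaction operation.
    action_1 = policy(tr.unit_state_1)
    # Step 2: the first value network scores the system state and actions.
    initial = first_value(tr.system_state_1 + [action_1] + tr.other_actions_1)
    # Step 3 (ASSUMPTION): target = reward at second moment + discounted value.
    target = tr.reward_2 + gamma * initial
    # Step 4: compensation value from the second value network, then the gap.
    compensation = second_value(tr.system_input_2)
    diff = target - compensation
    # Step 5: move the second value network toward the target.
    second_value.step(tr.system_input_2, lr * diff)
    if diff > 0:
        # ASSUMPTION (CACLA-style): the taken action beat expectations,
        # so pull the strategy output toward it.
        error = tr.unit_action_2 - policy(tr.unit_state_2)
        policy.step(tr.unit_state_2, lr * error)
    # ASSUMPTION: the first value network slowly tracks the second.
    first_value.w = [(1 - tau) * f + tau * s
                     for f, s in zip(first_value.w, second_value.w)]
    return diff


tr = Transition(unit_state_2=[0.5, 0.8], unit_action_2=0.3,
                system_input_2=[0.5, 0.8, 0.3, -0.1], reward_2=1.0,
                unit_state_1=[0.6, 0.7], system_state_1=[0.6, 0.7],
                other_actions_1=[0.2])
policy, first_value, second_value = LinearNet(2), LinearNet(4), LinearNet(4)
diff = train_step(policy, first_value, second_value, tr)
```

Each unit system would hold its own `policy` and value networks and run this step on its own transitions. The strategy network sees only the unit's state, while the value networks take system-wide state and actions as input.
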
According to another aspect of the present invention, an embodiment of the present invention further provides a power resource interaction processing scenario model training device, the device comprising:

an operation determining module, used for inputting the state data of the unit system at a first moment into a strategy network to obtain the interaction operation of the unit system at the first moment;

a value evaluation module, used for inputting the state data and interaction operations of the power resource interaction system at the first moment into a first value network to obtain an initial value;

a target value determining module, used for adjusting the initial value according to the state data and interaction operation of the unit system at a second moment to obtain a target value of the unit system at the first moment, wherein the second moment is earlier than the first moment;

a value difference calculation module, used for calculating a value difference between a compensation value of the unit system at the second moment and the target value, wherein the compensation value at the second moment is obtained by the unit system inputting the state data and interaction operations of the power resource interaction system at the second moment into a second value network;

and a parameter adjustment module, used for adjusting the parameters of the strategy network and the value network corresponding to the unit system according to the value difference.

According to an aspect of the present invention, an embodiment of the present invention provides a power resource inter