CN-122022400-A - User-side-oriented multi-objective decision execution method and system

Abstract

The application discloses a user-side-oriented multi-objective decision execution method and system, and relates to the field of intelligent scheduling decisions. The method comprises: collecting operating state data of a plurality of user-side devices and constructing a decision state vector; inputting the decision state vector into a preset Markov policy network to output a decision action vector, wherein the Markov policy network is obtained by iteratively training a feedforward neural network based on an action space constraint and a state transition incentive, the action space constraint is constructed from the decision state vector, the state transition incentive is constructed from a plurality of optimization objectives, and the plurality of optimization objectives comprise operating cost optimization, carbon emission optimization and instruction response optimization; and clipping the decision action vector according to the action space constraint to obtain an actionable action vector, which is sent to the plurality of user-side devices to execute the decision. Implementing the application can improve the accuracy of user-side-oriented multi-objective decisions.

Inventors

ZHANG HANBING
JI QINGFENG
CHENG XIANG
LIU BIN
JI AOYING
XIA TONG
JIANG CHANG
CHEN YUEJUN
LU WU
YE JICHAO
WU XIAOGANG
HU XINWEI
GONG HUAYONG
HUANG HUI
XU YONGHAI

Assignees

State Grid Zhejiang Electric Power Co., Ltd. Lishui Power Supply Company (国网浙江省电力有限公司丽水供电公司)

Dates

Publication Date: 20260512
Application Date: 20260413

Claims (10)

1. A user-side-oriented multi-objective decision execution method, comprising: collecting operating state data of a plurality of user-side devices awaiting a decision, and constructing a decision state vector; inputting the decision state vector into a preset Markov policy network and outputting a decision action vector, wherein the Markov policy network is obtained by iteratively training a feedforward neural network based on an action space constraint and a state transition incentive, the action space constraint is constructed from each data column of the decision state vector, the state transition incentive is constructed from a plurality of preset optimization objectives, and the plurality of optimization objectives comprise operating cost optimization, carbon emission optimization and instruction response optimization; and clipping the decision action vector according to the action space constraint to obtain an actionable action vector, and sending the actionable action vector to the plurality of user-side devices so that the plurality of user-side devices execute the decision according to the actionable action vector.
2. The user-side-oriented multi-objective decision execution method according to claim 1, wherein collecting operating state data of a plurality of user-side devices awaiting a decision and constructing a decision state vector specifically comprises: collecting operating state data of the plurality of user-side devices, wherein the operating state data comprise local measurement data of the plurality of user-side devices and group perception information obtained by a grid-side system monitoring the plurality of user-side devices; and fusing the local measurement data of the plurality of user-side devices with the group perception information to construct the decision state vector.
3. The user-side-oriented multi-objective decision execution method according to claim 1, wherein obtaining the Markov policy network by iteratively training a feedforward neural network based on the action space constraint and the state transition incentive specifically comprises: constructing the action space constraint according to the device types of the plurality of user-side devices in combination with each data column of the decision state vector; constructing the state transition incentive based on the plurality of optimization objectives in combination with the decision state vector; and setting model parameters of the feedforward neural network according to the decision state vector, and iteratively training the feedforward neural network with the action space constraint and the state transition incentive to obtain the Markov policy network.
4. The user-side-oriented multi-objective decision execution method according to claim 3, wherein constructing the action space constraint according to the device types of the plurality of user-side devices in combination with each data column of the decision state vector specifically comprises: determining an executable action vector of the Markov policy network according to the device types of the plurality of user-side devices; determining an action space of the Markov policy network based on each data column of the decision state vector; and constructing the action space constraint from the executable action vector in combination with the action space.
5. The user-side-oriented multi-objective decision execution method according to claim 3, wherein constructing the state transition incentive based on the plurality of optimization objectives in combination with the decision state vector specifically comprises: determining an operating cost optimization incentive based on the operating cost optimization objective, in combination with a system electricity price component and an instruction response component in the decision state vector; determining a carbon emission optimization incentive based on the carbon emission optimization objective, in combination with a system marginal carbon emission intensity component in the decision state vector; determining an instruction response optimization incentive based on the instruction response optimization objective, in combination with the instruction response component in the decision state vector; and normalizing the operating cost optimization incentive, the carbon emission optimization incentive and the instruction response optimization incentive, and taking their weighted sum as the state transition incentive of the Markov policy network.
6. The user-side-oriented multi-objective decision execution method according to claim 1, wherein inputting the decision state vector into a preset Markov policy network and outputting a decision action vector specifically comprises: determining an output mode of the decision action vector of the Markov policy network according to the control modes of the plurality of user-side devices, wherein the output mode comprises the output vector length of the decision action vector and the output type of each component; and inputting the decision state vector into the Markov policy network to obtain the decision action vector under the control of the output mode.
7. The user-side-oriented multi-objective decision execution method according to claim 1, wherein clipping the decision action vector according to the action space constraint to obtain an actionable action vector specifically comprises: performing inverse normalization on the decision action vector to obtain an original action vector; and clipping the original action vector against an action boundary constructed from the action space constraint to obtain the actionable action vector.
8. A user-side-oriented multi-objective decision execution system, characterized by comprising a device data acquisition module, a decision action generation module and a decision action execution module; the device data acquisition module is configured to collect operating state data of a plurality of user-side devices awaiting a decision and to construct a decision state vector; the decision action generation module is configured to input the decision state vector into a preset Markov policy network and output a decision action vector, wherein the Markov policy network is obtained by iteratively training a feedforward neural network based on an action space constraint and a state transition incentive, the action space constraint is constructed from each data column of the decision state vector, the state transition incentive is constructed from a plurality of preset optimization objectives, and the plurality of optimization objectives comprise operating cost optimization, carbon emission optimization and instruction response optimization; the decision action execution module is configured to clip the decision action vector according to the action space constraint to obtain an actionable action vector, and to send the actionable action vector to the plurality of user-side devices so that the plurality of user-side devices execute the decision according to the actionable action vector.
9. The user-side-oriented multi-objective decision execution system according to claim 8, wherein obtaining the Markov policy network by iteratively training a feedforward neural network based on the action space constraint and the state transition incentive specifically comprises: constructing the action space constraint according to the device types of the plurality of user-side devices in combination with each data column of the decision state vector; constructing the state transition incentive based on the plurality of optimization objectives in combination with the decision state vector; and setting model parameters of the feedforward neural network according to the decision state vector, and iteratively training the feedforward neural network with the action space constraint and the state transition incentive to obtain the Markov policy network.
10. The user-side-oriented multi-objective decision execution system according to claim 9, wherein constructing the action space constraint according to the device types of the plurality of user-side devices in combination with each data column of the decision state vector specifically comprises: determining an executable action vector of the Markov policy network according to the device types of the plurality of user-side devices; determining an action space of the Markov policy network based on each data column of the decision state vector; and constructing the action space constraint from the executable action vector in combination with the action space.
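
Illustrative sketch (not part of the claims): claim 5 recites a state transition incentive formed as a weighted sum of three normalized incentives but gives no formulas. The Python below is one possible reading under stated assumptions. The field names, the scaling constants `cost_scale` and `carbon_scale`, the weights, and the rule that an unmet instruction is charged at the current price are all hypothetical and do not come from the application.

```python
from dataclasses import dataclass


@dataclass
class StepSignals:
    """Quantities for one decision step; all field names are hypothetical."""
    price: float             # system electricity price component (currency per kWh)
    carbon_intensity: float  # system marginal carbon emission intensity (kg CO2 per kWh)
    energy_kwh: float        # energy drawn from the grid during the step
    requested_kwh: float     # adjustment requested by the grid instruction
    delivered_kwh: float     # adjustment the devices actually delivered


def clamp(x: float, lo: float, hi: float) -> float:
    return max(lo, min(hi, x))


def response_ratio(s: StepSignals) -> float:
    """Fraction of the requested adjustment that was delivered, in [0, 1]."""
    if s.requested_kwh <= 0:
        return 1.0  # nothing was requested, so nothing was missed
    return clamp(s.delivered_kwh / s.requested_kwh, 0.0, 1.0)


def normalized_incentives(s: StepSignals, cost_scale: float, carbon_scale: float):
    """Return (cost, carbon, response) incentives, each normalized to [-1, 0]."""
    ratio = response_ratio(s)
    shortfall = max(0.0, s.requested_kwh) * (1.0 - ratio)
    # Assumption: an unmet instruction is charged at the current price, so the
    # cost incentive depends on both the price and the response component.
    cost = -clamp(s.price * (s.energy_kwh + shortfall) / cost_scale, 0.0, 1.0)
    carbon = -clamp(s.carbon_intensity * s.energy_kwh / carbon_scale, 0.0, 1.0)
    response = ratio - 1.0
    return cost, carbon, response


def state_transition_incentive(s: StepSignals, weights=(0.5, 0.3, 0.2),
                               cost_scale: float = 11.0,
                               carbon_scale: float = 12.0) -> float:
    """Weighted sum of the three normalized incentives."""
    parts = normalized_incentives(s, cost_scale, carbon_scale)
    return sum(w * p for w, p in zip(weights, parts))


example = StepSignals(price=0.5, carbon_intensity=0.6, energy_kwh=10.0,
                      requested_kwh=4.0, delivered_kwh=3.0)
print(state_transition_incentive(example))
```

Normalizing each term to the same range before weighting keeps the weights interpretable as relative priorities. Without that step, whichever objective has the largest raw magnitude would dominate the sum.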

Description

User-side-oriented multi-objective decision execution method and system

Technical Field

The application relates to the field of intelligent scheduling decisions, and in particular to a user-side-oriented multi-objective decision execution method and system.

Background

With the large-scale integration of distributed renewable energy on the user side, user-side decision scheduling in the power system must now address additional problem dimensions such as economic operation and carbon emission reduction, which poses new challenges.

At present, user-side decision scheduling in the power system relies mainly on two kinds of methods: decision methods based on traditional control strategies, and intelligent decision methods based on agents. Methods based on traditional control strategies monitor only local user-side information, lack system-level coordination information, and mostly optimize a single objective, so they struggle to account for the new problem dimensions; as a result, decision scheduling accuracy after distributed renewable energy is connected to the user side fails to meet target requirements. Agent-based intelligent decision methods can handle multi-objective optimization and therefore the new problem dimensions, but the agents they depend on are highly complex and cannot take into account the user side's response to power system instructions, so actual decision scheduling accuracy still leaves considerable room for improvement.

Therefore, how to improve the accuracy of user-side-oriented multi-objective decisions under large-scale integration of distributed renewable energy remains a technical problem to be solved in the prior art.
Disclosure of Invention

The application provides a user-side-oriented multi-objective decision execution method and system, aiming to solve the technical problem that the accuracy of existing user-side multi-objective decisions does not meet target requirements.

According to a first aspect of the embodiments of the present application, there is provided a user-side-oriented multi-objective decision execution method, comprising: collecting operating state data of a plurality of user-side devices awaiting a decision, and constructing a decision state vector; inputting the decision state vector into a preset Markov policy network and outputting a decision action vector, wherein the Markov policy network is obtained by iteratively training a feedforward neural network based on an action space constraint and a state transition incentive, the action space constraint is constructed from each data column of the decision state vector, the state transition incentive is constructed from a plurality of preset optimization objectives, and the plurality of optimization objectives comprise operating cost optimization, carbon emission optimization and instruction response optimization; and clipping the decision action vector according to the action space constraint to obtain an actionable action vector, and sending the actionable action vector to the plurality of user-side devices so that the plurality of user-side devices execute the decision according to the actionable action vector.
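
Illustrative sketch (not part of the disclosure): the three steps of the first aspect can be pictured as the short Python pipeline below. It is a minimal sketch under stated assumptions. The two example devices (a battery and a flexible load), their ratings, the state layout, the tanh output layer and the untrained random weights are all hypothetical, and the bound formulas in `action_bounds` are one plausible way to derive an action boundary from state columns, not the application's own construction.

```python
import math
import random


def build_state(local, group):
    """Fuse local measurements and grid-side group information into one vector."""
    return [float(v) for v in local] + [float(v) for v in group]


def init_network(sizes, seed=0):
    """Feedforward layers as (weights, biases); random and untrained."""
    rng = random.Random(seed)
    return [
        ([[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)],
         [0.0] * n_out)
        for n_in, n_out in zip(sizes[:-1], sizes[1:])
    ]


def forward(layers, x):
    """tanh on every layer, so each output component lies in [-1, 1]."""
    for weights, biases in layers:
        x = [math.tanh(sum(w * v for w, v in zip(row, x)) + b)
             for row, b in zip(weights, biases)]
    return x


def denormalize(action_norm, rated):
    """Inverse normalization: scale [-1, 1] outputs to physical units (kW)."""
    return [a * r for a, r in zip(action_norm, rated)]


def action_bounds(soc, load_kw, rated=(5.0, 3.0), capacity_kwh=10.0, step_h=1.0):
    """Hypothetical state-dependent bounds for [battery power, load reduction]."""
    discharge_max = min(rated[0], soc * capacity_kwh / step_h)
    charge_max = min(rated[0], (1.0 - soc) * capacity_kwh / step_h)
    lower = [-charge_max, 0.0]
    upper = [discharge_max, min(rated[1], load_kw)]
    return lower, upper


def clip(action, lower, upper):
    """Clip each component of the action to its [lower, upper] interval."""
    return [max(lo, min(hi, a)) for a, lo, hi in zip(action, lower, upper)]


# Step 1: build the decision state vector.
local = [0.2, 3.0]       # battery state of charge, current load in kW
group = [0.5, 0.6, 1.0]  # price, marginal carbon intensity, instruction signal
state = build_state(local, group)

# Step 2: the policy network maps the state to a normalized action vector.
policy = init_network([len(state), 8, 2])
raw_action = denormalize(forward(policy, state), rated=(5.0, 3.0))

# Step 3: clip against bounds derived from the state before dispatch.
lower, upper = action_bounds(soc=local[0], load_kw=local[1])
actionable = clip(raw_action, lower, upper)
print(actionable)
```

Clipping after inverse normalization means the bounds can be written in physical units. In this sketch a battery at 20% state of charge cannot be asked to discharge more than 2 kW for the hour, whatever the network outputs.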
Compared with the prior art, the application constructs the action space constraint of the Markov policy network from each data column of the decision state vector, constructs the state transition incentive of the Markov policy network from a plurality of optimization objectives, namely operating cost optimization, carbon emission optimization and instruction response optimization, and then trains a feedforward neural network with the action space constraint and the state transition incentive to obtain the Markov policy network. Constructing a plurality of optimization objectives allows the policy to be generated from several different decision dimensions, which makes decisions more comprehensive and thereby improves the decision accuracy for the user-side devices. Constructing the instruction response optimization objective fully reflects the degree to which the user side responds to grid decision instructions; optimizing the decision action against this objective improves the accuracy of the decision action and thereby the decision accuracy for the user-side devices.

In some embodiments of the present application, collecting operating state data of a plurality of user-side devices awaiting a decision and constructing a decision state vector specifically comprises: collecting operating state data of the plurality of user-side devices, wherein the operating state data comprise local measurement data of the plurality of user sid