CN-121981850-A - Distributed power supply transaction intelligent clearing and optimizing method and system based on deep reinforcement learning
Abstract
The invention discloses an intelligent clearing and optimizing method and system for distributed power supply transactions based on deep reinforcement learning, and relates to the technical fields of distributed power supply transactions and artificial intelligence. The method comprises: collecting multi-dimensional data covering the full distributed power supply transaction scenario and preprocessing the data to generate a state vector representing the market state; inputting the state vector into a hybrid intelligent optimization model that fuses a learning strategy gradient with an improved hawk search algorithm to obtain an optimal clearing strategy; automatically executing the clearing process, based on the optimal clearing strategy, through a smart contract deployed on a blockchain, and synchronously storing the clearing process and result data on the blockchain as historical clearing data; and retrieving the historical clearing data from the blockchain and using it to perform feedback optimization of the core parameters of the hybrid intelligent optimization model. Autonomous optimization and trusted execution of the distributed power supply transaction clearing strategy are thereby realized.
Inventors
- ZHU YUNAN
- XU HAIYANG
- JI CONG
- CHEN HAO
- JIANG MING
- TANG YIMING
- CAI MINGMING
- LIU YUNPENG
- LIU LIU
- XIA YUHANG
- CHEN JINING
Assignees
- 国网江苏省电力有限公司营销服务中心
- 江苏电力交易中心有限公司
- 江苏方天电力技术有限公司
Dates
- Publication Date
- 20260505
- Application Date
- 20251226
Claims (10)
- 1. An intelligent clearing and optimizing method for distributed power supply transactions based on deep reinforcement learning, characterized by comprising the following steps: collecting multi-dimensional data covering the full distributed power supply transaction scenario, and preprocessing the multi-dimensional data to generate a state vector representing the market state; inputting the state vector into a hybrid intelligent optimization model that fuses a learning strategy gradient with an improved hawk search algorithm to obtain an optimal clearing strategy, wherein the improved hawk search algorithm generates an initialization strategy; based on the optimal clearing strategy, automatically executing a clearing process through a smart contract deployed on a blockchain, and synchronously storing the clearing process and result data on the blockchain as historical clearing data, wherein the optimal clearing strategy comprises a transaction matching priority coefficient, a price adjustment coefficient and a deviation processing weight; and retrieving the historical clearing data from the blockchain, and performing feedback optimization on the core parameters of the hybrid intelligent optimization model according to the historical clearing data.
- 2. The method of claim 1, wherein the improved hawk search algorithm generating an initialization strategy comprises: encoding the core decision parameters constituting the optimal clearing strategy into population individuals, each comprising a combination of strategy parameters; generating reverse individuals using a reverse learning (opposition-based learning) mechanism, merging the reverse individuals with a random initial population, and selecting high-fitness individuals to construct an initial population with high diversity; iteratively evolving the population using a three-stage optimization mechanism comprising search space selection, space exploration and dive capture; and mapping the optimal population individual obtained after iterative evolution to the initialization parameters of the learning strategy gradient.
- 3. The method of claim 2, wherein iteratively evolving the population using the three-stage optimization mechanism comprising search space selection, space exploration and dive capture comprises: based on the individual with the highest fitness in the current population and the mean of all population individuals, guiding the other individuals to move in a direction between the optimal region and the population center region, so that a candidate strategy region satisfying the constraints of the distributed power supply transaction scenario is locked through global exploration; within the candidate strategy region, performing local optimization on the population individuals along refined search trajectories, so that the population approaches the optimal solution within the region; and, based on the current best individual in the population, introducing a step-size optimization strategy to adjust the update amplitude of each individual, thereby obtaining high-quality population individuals suited to the distributed power supply transaction requirements.
- 4. The method of claim 1, wherein the strategy iteration and reinforcement performed by the learning strategy gradient comprises: constructing, based on the multi-dimensional data of the full distributed power supply transaction scenario, a multi-scenario meta-training environment comprising states, clearing actions, rewards and next states; constructing an objective function that targets maximization of the expected comprehensive transaction return, and introducing regularization terms for strategy entropy and prediction vector entropy into the objective function; and computing the meta-gradient by backpropagation truncated over a sliding window, and iteratively updating the model parameters with an adaptive optimizer until the change in the objective function value falls below a preset convergence threshold.
- 5. The method of claim 1, wherein automatically executing the clearing process through a smart contract deployed on a blockchain based on the optimal clearing strategy comprises: sorting the declared orders of the electricity purchasers and the electricity sellers based on the transaction matching priority coefficient in the optimal clearing strategy, taking as the matching constraint that the electricity purchase quotation is not lower than the sum of the electricity sale quotation, the network-loss-related costs and the policy surcharges, and preferentially locking the purchase-sale transaction pairs with better comprehensive benefit; determining a base transaction price according to the price adjustment coefficient in the optimal clearing strategy in combination with the electricity purchase quotation, and adding the transmission and distribution service costs, the allocated network loss costs and the government funds to the base transaction price to determine a final transaction price that complies with policy and market rules; fixing the core transaction terms, including the identity information of both transaction parties, the transaction electricity quantity and the final transaction price, in an electronic contract; and calculating the hash value of the electronic contract and synchronously storing the hash value in the blockchain distributed ledger to form a transaction certificate.
- 6. The method of claim 5, wherein automatically executing the clearing process through a smart contract deployed on a blockchain based on the optimal clearing strategy further comprises a deviation handling step: acquiring, in real time through the blockchain, the actual output data of the distributed power supply and the actual consumption data of the power user, and calculating the deviation electricity quantity and deviation ratio of the electricity seller and of the electricity purchaser respectively, with reference to the agreed power in the electronic contract; when any deviation ratio exceeds a preset allowable deviation threshold, automatically calculating a deviation assessment fee and executing fee settlement through the smart contract based on the deviation processing weight in the optimal clearing strategy, and synchronously storing the settlement result in the blockchain distributed ledger; and when the deviation ratio is within the preset allowable deviation threshold, synchronously recording the deviation data and the related performance records on the blockchain as evidence.
- 7. The method of claim 1, wherein performing feedback optimization on the core parameters of the hybrid intelligent optimization model according to the historical clearing data comprises: retrieving the historical clearing data from the blockchain, and constructing from it feedback feature vectors representing the transaction execution effect and market changes; based on the feedback feature vectors, recalibrating the objective function of the learning strategy gradient, and adjusting the model learning rate and the iteration convergence condition; and dynamically adjusting, according to market fluctuation conditions, the control parameters of the improved hawk search algorithm that govern its global exploration range and local convergence speed.
- 8. The method of claim 7, wherein synchronously storing the clearing process and result data on the blockchain as historical clearing data comprises: organizing the clearing process and result data in a hash tree data structure, and storing the hash values of the data in the block body of the blockchain; and encrypting sensitive data, including transaction quotations and actual electricity consumption, with a symmetric encryption algorithm, and managing the key of the symmetric encryption algorithm using asymmetric encryption, so as to protect data privacy.
- 9. The method of claim 5, wherein automatically executing the clearing process through the smart contract deployed on the blockchain further comprises a clearing result consensus verification step, specifically comprising: forming a consensus cluster from a transaction center node and distributed heavy nodes, and performing consistency verification on the clearing result output by the smart contract using the distributed consensus algorithm preset for the blockchain network; and when the number of consensus nodes endorsing the clearing result reaches a preset proportion, finally confirming the clearing result and recording it on the blockchain.
- 10. An intelligent clearing and optimizing system for distributed power supply transactions based on deep reinforcement learning, characterized by comprising: a state feature extraction module, used for collecting multi-dimensional data covering the full distributed power supply transaction scenario and preprocessing the multi-dimensional data to generate a state vector representing the market state; a hybrid intelligent optimization module, used for inputting the state vector into a hybrid intelligent optimization model that fuses a learning strategy gradient with an improved hawk search algorithm to obtain an optimal clearing strategy, wherein the improved hawk search algorithm generates an initialization strategy; an intelligent clearing execution module, used for automatically executing a clearing process through a smart contract deployed on a blockchain, and synchronously storing the clearing process and result data on the blockchain as historical clearing data, wherein the optimal clearing strategy comprises a transaction matching priority rule, a price adjustment rule and a deviation assessment rule; and a feedback optimization module, used for retrieving the historical clearing data from the blockchain and performing feedback optimization on the core parameters of the hybrid intelligent optimization model according to the historical clearing data.
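The population initialization recited in claim 2 can be illustrated with a minimal Python sketch. It is an illustration only and forms no part of the claims. It assumes that each individual is a vector of the three coefficients named in claim 1, that every coefficient is bounded to [0, 1], that the reverse individual of x is lo + hi - x (the usual opposition-based learning definition), and that fitness is a toy function. The names `opposition_init`, `BOUNDS` and `toy_fitness` are hypothetical and do not come from the specification.

```python
import random
from typing import Callable, List, Sequence, Tuple

Individual = List[float]
Bounds = Sequence[Tuple[float, float]]

# Assumed bounds for (matching priority coefficient,
# price adjustment coefficient, deviation processing weight).
BOUNDS: Bounds = [(0.0, 1.0), (0.0, 1.0), (0.0, 1.0)]


def toy_fitness(ind: Individual) -> float:
    """Placeholder fitness: closeness to an arbitrary target vector.

    A real fitness would score clearing outcomes; this one is an assumption.
    """
    target = (0.6, 0.3, 0.1)
    return -sum((g - t) ** 2 for g, t in zip(ind, target))


def opposition_init(
    pop_size: int,
    bounds: Bounds,
    fitness: Callable[[Individual], float],
    seed: int = 0,
) -> List[Individual]:
    """Build an initial population from random and reverse individuals."""
    rng = random.Random(seed)
    # Random initial population, sampled uniformly inside the bounds.
    randoms = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    # Reverse individual: reflect each gene about the midpoint of its bounds.
    reverses = [
        [lo + hi - gene for gene, (lo, hi) in zip(ind, bounds)] for ind in randoms
    ]
    # Merge both sets and keep the pop_size individuals with highest fitness.
    merged = randoms + reverses
    merged.sort(key=fitness, reverse=True)
    return merged[:pop_size]


if __name__ == "__main__":
    population = opposition_init(10, BOUNDS, toy_fitness, seed=1)
    for individual in population[:3]:
        print([round(g, 3) for g in individual])
```

Keeping the best half of the merged set is one simple way to "screen high-fitness individuals"; the claim does not fix the selection rule, so other schemes such as tournament selection would fit the wording equally well.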
Description
Distributed power supply transaction intelligent clearing and optimizing method and system based on deep reinforcement learning
Technical Field
The embodiments of the invention relate to the technical fields of distributed power supply transactions and artificial intelligence, and in particular to an intelligent clearing and optimizing method and system for distributed power supply transactions based on deep reinforcement learning.
Background
With the deepening reform of the electric power system, distributed power sources (such as photovoltaic, wind power and small hydropower) have developed rapidly owing to their clean and flexible characteristics, and are gradually becoming an important component of the energy supply system. Distributed power supply transactions are characterized by many participants, high frequency, heterogeneous data and strong constraints, and the traditional centralized transaction mode faces several bottlenecks. On the one hand, traditional reinforcement learning algorithms that depend on manually designed cost functions and temporal-difference rules have difficulty adapting to a dynamically changing market environment and complex network constraints, so the clearing strategy lacks flexibility. On the other hand, a single metaheuristic optimization algorithm easily falls into local optima and cannot achieve global optimization of the clearing strategy. Meanwhile, problems such as lack of trust among participants, the risk of data tampering and opaque transaction processes in distributed transactions further restrict transaction efficiency and fairness.
In the prior art, although blockchain technology can provide decentralized, tamper-resistant and traceable trusted support, it lacks deep integration with intelligent optimization algorithms, which makes intelligent and adaptive transaction clearing difficult to achieve; moreover, traditional clearing algorithms depend on fixed rules and cannot autonomously learn an optimal strategy that balances multi-objective optimization requirements such as transaction success rate, price fairness and power grid loss. Therefore, a closed-loop system integrating an advanced intelligent optimization algorithm with blockchain technology needs to be constructed to solve the core problems in distributed power supply transactions, such as insufficient strategy learning, weak global optimization capability and insufficient transaction credibility.
Disclosure of Invention
To address the technical pain points of distributed power supply transactions, the invention combines the advantages of deep reinforcement learning, metaheuristic optimization and blockchain technology to provide an intelligent clearing and optimizing method and system for distributed power supply transactions based on deep reinforcement learning, realizing autonomous optimization and trusted execution of the distributed power supply transaction clearing strategy.
In a first aspect, an embodiment of the present invention provides an intelligent clearing and optimizing method for distributed power supply transactions based on deep reinforcement learning, including: collecting multi-dimensional data covering the full distributed power supply transaction scenario, and preprocessing the multi-dimensional data to generate a state vector representing the market state; inputting the state vector into a hybrid intelligent optimization model that fuses a learning strategy gradient with an improved hawk search algorithm to obtain an optimal clearing strategy, wherein the improved hawk search algorithm generates an initialization strategy; based on the optimal clearing strategy, automatically executing a clearing process through a smart contract deployed on a blockchain, and synchronously storing the clearing process and result data on the blockchain as historical clearing data, wherein the optimal clearing strategy comprises a transaction matching priority coefficient, a price adjustment coefficient and a deviation processing weight; and retrieving the historical clearing data from the blockchain, and performing feedback optimization on the core parameters of the hybrid intelligent optimization model according to the historical clearing data. As a preferred embodiment, the improved hawk search algorithm generating an initialization strategy comprises: encoding the core decision parameters constituting the optimal clearing strategy into population individuals, each comprising a combination of strategy parameters; generating reverse individuals using a reverse learning (opposition-based learning) mechanism, merging the reverse individuals with a random initial population, and selecting high-fitness individuals to construct an initial population with high diversity; iteratively evolving the population using a three-stage optimization mechanism comprising search space selection, space exploration and dive capture; and mapping the optimal