
CN-121981471-A - Reinforcement learning battery swap station charging method based on prediction enhancement and action simplification

CN 121981471 A

Abstract

The invention relates to the field of electric vehicle energy supply, and in particular to an intelligent scheduling method for a reinforcement learning battery swap station that integrates time-series prediction with simplified actions. To address the complexity and uncertainty of swap station energy management under electricity price fluctuation, random vehicle arrivals, battery aging, and multi-objective operating constraints, an intelligent scheduling scheme with look-ahead decision-making capability is provided. The method first performs short-term prediction of future electricity prices and vehicle swap demand with a time-series prediction model, and feeds the prediction results into a reinforcement learning decision model as an extended state, enhancing the policy's perception of future environmental changes. At the same time, a resource-level action abstraction mechanism is constructed: by deciding only the numbers of fast- and slow-charging interfaces and combining them with a preset battery connection rule, batteries are automatically matched to charging interfaces, which effectively reduces the size of the action space and improves the stability of policy learning. Validated in a swap station simulation environment comprising a battery aging model, a vehicle queuing mechanism, and dynamic electricity prices, the method shows marked effects in reducing swap station operating cost, delaying battery aging, and shortening vehicle waiting time, achieves efficient and stable intelligent scheduling under uncertain operating environments, and has good robustness and practical application value.

Inventors

  • CHEN SHAOMIAO
  • TENG ZHENTAO
  • XIAO LIJUN
  • YANG JIANG
  • YANG BOYONG
  • TANG JIAWEI

Assignees

  • Hunan University of Science and Technology (湖南科技大学)

Dates

Publication Date
2026-05-05
Application Date
2026-01-24

Claims (5)

  1. A reinforcement learning battery swap station charging method based on prediction enhancement and action simplification, characterized by comprising the following five steps: Step 1, acquiring the running state information of a battery swap station in a closed scenario in discrete time slots, the running state information comprising the state of charge of each battery in the battery inventory, queuing information of the vehicle queue, current and historical vehicle battery charge information, vehicle arrivals, and grid electricity price information; Step 2, generating an electricity price prediction sequence for a plurality of future time slots through a prediction module based on the grid electricity price information, generating a swap demand prediction sequence for a plurality of future time slots through a prediction module based on historical vehicle battery charge and vehicle arrival data, and forming the state input of a reinforcement learning control module from the running state information, the electricity price prediction sequence, and the swap demand prediction sequence together; Step 3, inputting the state into the reinforcement learning control module and outputting a simplified control action representing the charging-resource activation pattern, wherein the simplified control action comprises a ratio parameter representing the proportion of activated fast-charging interfaces to the total number of available charging interfaces in the current time slot, and an allocation-rule parameter representing the connection relation between the charging interfaces and the batteries to be charged; Step 4, after the current time slot ends, calculating an instant reward from the operating revenue, electricity cost, battery aging cost, and vehicle waiting time of the swap station; and Step 5, updating the control strategy of the reinforcement learning control module based on the instant reward, the state of the current time slot, the action, and the state of the next time slot, so as to obtain an optimized decision strategy for charging control in subsequent time slots.
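The five-step loop of claim 1 can be sketched in code. The sketch below is a minimal, self-contained illustration: the class and function names (Predictor, Agent, run_time_slot), the persistence forecast, and the toy reward are assumptions for illustration only, not the patent's disclosed implementation.

```python
# Minimal, self-contained sketch of the five-step loop in claim 1.
# All names and the toy reward are illustrative, not the patent's design.

class Predictor:
    """Stand-in for the time-series prediction module (step 2)."""
    def forecast(self, history, m=4):
        # Naive persistence forecast: repeat the last observation m times.
        return [history[-1]] * m

class Agent:
    """Stand-in for the reinforcement-learning control module."""
    def act(self, state):
        # Simplified action (step 3): fast-charge interface ratio plus a
        # discrete allocation-rule index (0 = rule A, 1 = rule B).
        return 0.5, 0

    def update(self, state, action, reward, next_state):
        # Step 5: a TD-style policy update would go here.
        pass

def run_time_slot(prices, demand, agent, predictor):
    # Steps 1-2: augment the raw observation with m-step forecasts.
    state = {
        "price_now": prices[-1],
        "price_forecast": predictor.forecast(prices),
        "demand_forecast": predictor.forecast(demand),
    }
    ratio, rule = agent.act(state)                    # step 3
    # Step 4: toy instant reward = service revenue minus energy cost.
    reward = demand[-1] * 10.0 - ratio * prices[-1]
    agent.update(state, (ratio, rule), reward, None)  # step 5
    return reward
```

A real implementation would replace the persistence forecast with a trained time-series model and the fixed action with a learned policy.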
  2. The reinforcement learning battery swap station charging method based on prediction enhancement and action simplification according to claim 1, wherein step 2 specifically comprises: in the electricity price prediction part, a long short-term memory (LSTM) network is used to predict future electricity prices; this module takes as input the electricity price sequence of the past n time steps (p_{t-n+1}, ..., p_t) and outputs the predicted prices for the future m steps (p̂_{t+1}, ..., p̂_{t+m}); in the swap demand prediction part, an LSTM network is used to predict future swap demand; this module takes as input the vehicle charge distribution sequence of the past n time steps (e_{t-n+1}, ..., e_t) and the demand sequence (d_{t-n+1}, ..., d_t), and outputs the swap demand predictions for the future m steps (d̂_{t+1}, ..., d̂_{t+m}); the vehicle charge distribution is represented as e_t = (e_t^0, e_t^1, ..., e_t^9), where, for k ∈ {0, 1, ..., 9}, e_t^k is the number of vehicles whose battery charge in time slot t lies in the interval [10k%, 10(k+1)%); the swap station running state information, the electricity price prediction sequence, and the swap demand prediction sequence together form the state input of the reinforcement learning control module, and the state vector s_t comprises the following elements: the states of charge SOC_t = (SOC_t^1, ..., SOC_t^B) of the B batteries in the inventory in time slot t; the average waiting time w̄_t of vehicles in the waiting queue in time slot t; the current electricity price together with the forecasts p̂_{t+1}, ..., p̂_{t+m}, whose values represent the grid purchase price forecast for the corresponding time slots; and the current demand together with the forecasts d̂_{t+1}, ..., d̂_{t+m}, whose values represent the predicted number of swap vehicles for the corresponding time slots.
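The vehicle charge-distribution feature of claim 2 (ten 10% state-of-charge bins per time slot) can be computed as a simple histogram. The function name and the 0.0-1.0 SOC encoding below are illustrative assumptions:

```python
# Hedged sketch of the charge-distribution feature e_t in claim 2:
# bin k counts arriving vehicles whose battery SOC lies in [10k%, 10(k+1)%).

def charge_distribution(soc_list):
    """Bin vehicle SOCs (floats in 0.0-1.0) into ten 10% intervals."""
    bins = [0] * 10
    for soc in soc_list:
        k = min(int(soc * 10), 9)  # clamp SOC == 1.0 into the top bin
        bins[k] += 1
    return bins
```

For example, `charge_distribution([0.05, 0.17, 0.95, 1.0])` yields `[1, 1, 0, 0, 0, 0, 0, 0, 0, 2]`.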
  3. The reinforcement learning battery swap station charging method based on prediction enhancement and action simplification according to claim 1, wherein step 3 specifically comprises: the simplified action is a_t = (ρ_t, σ_t), where ρ_t denotes the ratio of fast-charging interfaces to available interfaces at the swap station in time slot t, and σ_t denotes the connection rule between the charging interfaces and the batteries in time slot t; ρ_t determines, among the N charging interfaces, the number of fast-charging interfaces N_t^F, the number of slow-charging interfaces N_t^S, and the number of idle interfaces N_t^I; based on the preset allocation rule, the action restoration module uses dynamic programming to distribute the fast- and slow-charging interface counts of time slot t over the individual charging interfaces, so as to determine the working mode x_i^t of each charging interface i, where 1 denotes the fast-charging mode and 0 the slow-charging mode, the charging powers of the fast and slow modes are P_F and P_S respectively, e_i^t denotes the charge of battery i in time slot t, and E_max denotes the maximum charge of a battery; the auxiliary variables of the dynamic program are defined as g_i^F = min(P_F Δt, E_max − e_i^t) and g_i^S = min(P_S Δt, E_max − e_i^t), which respectively denote the charge added to battery i when interface i selects the fast or slow mode, and c_i^F = 1(e_i^t + P_F Δt ≥ E_max) and c_i^S = 1(e_i^t + P_S Δt ≥ E_max), which respectively denote the number of fully charged batteries added when interface i selects the fast or slow mode; the indicator function 1(·) takes the value 1 when its condition holds and 0 otherwise; for rule A, the goal is to maximize the total charge delivered in the current time slot, subject to using at most N_t^F fast-charging interfaces, with the dynamic programming recursion f(i, j) = max(f(i−1, j) + g_i^S, f(i−1, j−1) + g_i^F), where f(i, j) denotes the maximum effective charge when the first i batteries use j fast-charging interfaces; for rule B, the goal is to maximize the number of full batteries at the end of the time slot, under the same interface constraint, with the recursion h(i, j) = max(h(i−1, j) + c_i^S, h(i−1, j−1) + c_i^F), where h(i, j) denotes the number of fully charged batteries when the first i batteries use j fast-charging interfaces; the restored charging action of the swap station is thus obtained through dynamic programming.
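The interface-allocation dynamic program for rule A of claim 3 (maximize total charge delivered in the slot, using at most a given number of fast-charging interfaces) can be sketched as follows. The charging powers, capacity, function name, and backtracking scheme are assumed values for illustration, not the patent's exact formulation:

```python
# Illustrative dynamic program for "rule A" in claim 3: choose which
# connected batteries fast-charge, given at most n_fast fast interfaces,
# to maximize total charge delivered in the time slot.

def allocate_rule_a(socs, n_fast, p_fast=0.5, p_slow=0.1, cap=1.0):
    """socs: current battery charge levels (0.0-1.0, per slot energy units).
    Returns (best_total, modes), modes[i] = 1 for fast, 0 for slow."""
    n = len(socs)
    # Auxiliary gains: charge added under fast/slow mode, clipped at capacity.
    gain_fast = [min(p_fast, cap - e) for e in socs]
    gain_slow = [min(p_slow, cap - e) for e in socs]
    NEG = float("-inf")
    # f[i][j]: max charge over the first i batteries using j fast interfaces.
    f = [[NEG] * (n_fast + 1) for _ in range(n + 1)]
    f[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(n_fast + 1):
            best = f[i - 1][j] + gain_slow[i - 1]       # battery i-1 slow
            if j > 0 and f[i - 1][j - 1] != NEG:
                best = max(best, f[i - 1][j - 1] + gain_fast[i - 1])  # fast
            f[i][j] = best
    # Allow using fewer than n_fast fast interfaces.
    best_j = max(range(n_fast + 1), key=lambda j: f[n][j])
    # Backtrack to recover the per-interface working modes x_i.
    modes, j = [0] * n, best_j
    for i in range(n, 0, -1):
        if (j > 0 and f[i - 1][j - 1] != NEG
                and abs(f[i][j] - (f[i - 1][j - 1] + gain_fast[i - 1])) < 1e-12):
            modes[i - 1], j = 1, j - 1
    return f[n][best_j], modes
```

Rule B follows the same pattern with the 0/1 "battery becomes full" indicators in place of the charge gains.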
  4. The reinforcement learning battery swap station charging method based on prediction enhancement and action simplification according to claim 1, wherein step 4 specifically comprises: the instant reward function jointly considers the operating revenue of the swap station, the electricity cost, the battery aging cost, and the vehicle waiting time, so as to balance charging efficiency, service quality, and battery life. Battery charging model: the charge of battery i is updated as e_i^{t+1} = min(e_i^t + P_i^t Δt, E_max), where P_i^t is the power of the charging mode of the interface to which battery i is connected; the charging behavior of a battery is constrained by the charging mode of its interface, the different charging rates of the different modes cause different degrees of battery degradation, and the battery aging cost is measured through capacity loss, where C_t^age denotes the capacity-loss cost of all batteries in time slot t, β denotes the conversion coefficient between capacity loss and battery aging cost, r_F and r_S respectively denote the charging C-rates of fast and slow charging, i.e. the ratio of the charging power per unit time to the rated battery capacity, and E_max denotes the maximum capacity of the battery. Vehicle queuing model: Q_t denotes the waiting queue of time slot t, ordered by arrival, w_i^t denotes the waiting time of the i-th vehicle in time slot t, and the queue length is L_t; when a vehicle needing a battery swap arrives, it enters a queuing buffer of maximum capacity L_max; a battery from the set of swap-ready batteries is exchanged for the vehicle's battery, which is recycled into the set of rechargeable batteries; the number of vehicles that can enter the queue is a_t = min(λ_t, L_max − L_t), where λ_t is the number of arrivals; the number of serviceable vehicles in the current time slot is n_t = min(L_t + a_t, B_t^full), where B_t^full is the number of fully charged batteries; the queue length is updated as L_{t+1} = L_t + a_t − n_t; the waiting time of each vehicle remaining in the queue is updated as w_i^{t+1} = w_i^t + 1; and the average waiting time of the serviced vehicles in time slot t is w̄_t = (1/n_t) Σ_{i=1}^{n_t} w_i^t, where a_t denotes the number of vehicles that can enter the queue in slot t and n_t the number of vehicles that can be serviced in slot t. Energy supply model: the power supply module provides the electric energy required by the charging tasks; the swap station draws power from the grid, whose price fluctuates dynamically across time slots; the operating revenue is R_t = F_t − C_t^elec, where R_t denotes the operating revenue of time slot t, F_t = f_swap · n_t denotes the service-fee revenue of time slot t, C_t^elec denotes the electricity cost of time slot t, p_t denotes the grid electricity price in time slot t, and f_swap denotes the fee charged for one swap service. The reward function is r_t(s_t, a_t) = R_t − C_t^age − ω w̄_t, where ω denotes the trade-off coefficient between economic benefit and service quality, and r_t(s_t, a_t) denotes the instant reward obtained by executing action a_t in state s_t in time slot t.
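The composite reward of claim 4 (revenue minus electricity cost, aging cost, and a waiting-time penalty) can be sketched as a single function. All parameter names and the linear weighting are illustrative assumptions:

```python
# Hedged sketch of the composite instant reward in claim 4. Unit prices
# and coefficient values are assumed for illustration only.

def slot_reward(n_swaps, swap_fee, energy_kwh, grid_price,
                capacity_loss_cost, avg_wait, w_service=1.0):
    """reward = operating revenue - electricity cost - aging cost
               - w_service * average vehicle waiting time."""
    revenue = n_swaps * swap_fee             # service-fee income this slot
    energy_cost = energy_kwh * grid_price    # electricity purchased from grid
    return revenue - energy_cost - capacity_loss_cost - w_service * avg_wait
```

The `w_service` coefficient plays the role of the economic-benefit vs. service-quality trade-off weight in the claim.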
  5. The reinforcement learning battery swap station charging method based on prediction enhancement and action simplification according to claim 1, wherein step 5 specifically comprises the following substeps: Step 5.1, forming a training sample from the state of the current time slot, the simplified control action, the instant reward, and the state of the next time slot; Step 5.2, updating the parameters of the control strategy with a reinforcement learning algorithm based on the training sample, so as to reduce the deviation between the predicted return and the actual return; and Step 5.3, using the updated control strategy for the charging-resource activation decision of the next time slot.
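Claim 5 leaves the update rule algorithm-agnostic. One common concrete choice for the (state, action, reward, next-state) update it describes is a one-step tabular Q-learning update, sketched here with assumed names; the patent does not specify this particular algorithm:

```python
# Minimal tabular Q-learning update matching the transition tuple of
# step 5. Names, alpha, and gamma are illustrative assumptions.

def td_update(q, state, action, reward, next_state, actions,
              alpha=0.1, gamma=0.99):
    """One-step Q-learning update on a dict-backed table q.
    Returns the updated value q[(state, action)]."""
    best_next = max(q.get((next_state, a), 0.0) for a in actions)
    key = (state, action)
    td_error = reward + gamma * best_next - q.get(key, 0.0)
    q[key] = q.get(key, 0.0) + alpha * td_error
    return q[key]
```

In practice the high-dimensional state of claim 2 would call for a function approximator (e.g. an actor-critic network) rather than a table, but the sample structure is the same.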

Description

Reinforcement learning battery swap station charging method based on prediction enhancement and action simplification

Technical Field

The invention belongs to the field of energy management, and particularly relates to a reinforcement learning battery swap station charging method based on prediction enhancement and action simplification.

Background Art

The rapid popularization of electric vehicles (Electric Vehicle, EV) has driven the diversification of energy supply modes. Battery swap stations (Battery Swap Station, BSS) have become an important solution to the long charging times of electric vehicles by virtue of their efficiency and convenience. Compared with traditional charging piles, a swap station can complete a battery replacement within a few minutes, greatly improving user experience and travel efficiency. Meanwhile, the swap station is tightly coupled with the power grid, battery service life, and vehicle demand, and its energy scheduling strategy plays a key role in the economy and sustainability of the system. Therefore, how to realize intelligent battery management and scheduling of swap stations in a dynamic environment has become an important research direction in intelligent transportation and smart energy systems.

In actual operation, a swap station must simultaneously cope with electricity price fluctuation and the randomness of swap demand, while coordinating multiple objectives such as battery aging, vehicle waiting time, and operating cost. In an environment with multiple charging interfaces and multiple batteries, if each interface and battery is treated as an independent resource and the agent selects specific bindings, the number of action combinations grows exponentially with the number of resources. Such a high-dimensional discrete action space greatly increases the training complexity of reinforcement learning algorithms, reduces learning efficiency, and prolongs convergence time. How to effectively reduce environmental uncertainty and substantially compress the action space while preserving scheduling flexibility, thereby improving the stability and optimization efficiency of the policy, has become the core problem to be solved.

Disclosure of Invention

To achieve the above purpose, the invention adopts a reinforcement learning battery swap station charging method based on prediction enhancement and action simplification, comprising the following steps: Step 1, acquiring the running state information of a battery swap station in a closed scenario in discrete time slots, the running state information comprising the state of charge of each battery in the battery inventory, queuing information of the vehicle queue, current and historical vehicle battery charge information, vehicle arrivals, and grid electricity price information; Step 2, generating an electricity price prediction sequence for a plurality of future time slots through a prediction module based on the grid electricity price information, generating a swap demand prediction sequence for a plurality of future time slots through a prediction module based on historical vehicle battery charge and vehicle arrival data, and forming the state input of a reinforcement learning control module from the running state information, the electricity price prediction sequence, and the swap demand prediction sequence together; Step 3, inputting the state into the reinforcement learning control module and outputting a simplified control action representing the charging-resource activation pattern, wherein the simplified control action comprises a ratio parameter representing the proportion of activated fast-charging interfaces to the total number of available charging interfaces in the current time slot, and an allocation-rule parameter representing the connection relation between the charging interfaces and the batteries to be charged; Step 4, after the current time slot ends, calculating an instant reward from the operating revenue, electricity cost, battery aging cost, and vehicle waiting time of the swap station; and Step 5, updating the control strategy of the reinforcement learning control module based on the instant reward, the state of the current time slot, the action, and the state of the next time slot, so as to obtain an optimized decision strategy for charging control in subsequent time slots.

Step 1 specifically includes: the whole simulation process is divided into a series of discrete time steps of fixed length, denoted t ∈ T. Under a closed scene, the power exchange s