CN-116227215-B - Dynamic highway toll collection method, device, equipment and storage medium
Abstract
The invention discloses a dynamic expressway toll collection method, device, equipment and storage medium. The method comprises: establishing an expressway network simulation environment model and determining the system state of a reinforcement learning model; establishing a traveler route choice model and determining the system policy of the reinforcement learning model according to the system state and the route choice model; determining, according to the system policy, the system action by which the agent corresponding to each toll road section adjusts its rate; calculating the system reward after the current system action is executed and continuously adjusting the system policy of the reinforcement learning model according to the reward; and, when the system reward is maximized, determining the currently optimal rate adjustment scheme for each toll road section. The invention applies a reinforcement learning algorithm for dynamic tolling to a simulation environment modeled on a real Chinese expressway network together with a traveler route choice model, so as to relieve congestion and improve the revenue of all parties.
Inventors
- ZHANG XI
- WANG WEI
- CHEN JING
Assignees
- YUNNAN UNIVERSITY
Dates
- Publication Date: 2026-05-05
- Application Date: 2023-03-15
Claims (6)
- 1. A dynamic highway toll collection method, comprising the steps of: establishing a highway network simulation environment model, and determining the system state of the reinforcement learning model; wherein the step of establishing the highway network simulation environment model comprises: building an expressway network triple G = (V, E, A), wherein V is a finite non-empty set representing the nodes of the expressway network, n = |V| is the number of nodes, E is the set of edges between adjacent nodes, and the adjacency matrix A represents the connectivity information between pairs of nodes in the highway network; defining each time step t and the set T of all time steps; defining Δt as the interval, in time steps, between the system actions by which the agent adjusts the rate, Δt being a fixed value determined by the toll administration; and defining the time series of rate adjustments as T_a = {Δt, 2Δt, ...}; wherein the step of establishing the highway network simulation environment model further comprises: modeling the vehicle flow demand d_t(o, u) from a source node o to a destination node u at time step t by a modified Gaussian random variable with mean μ and standard deviation σ; and modeling traffic flow with the Daganzo cell transmission model (CTM), each road i in the road network being divided into separate sub-sections, with C_i denoting the set of all sub-sections belonging to road section i; for each sub-section c ∈ C_i, its length l_c is the distance travelled by a vehicle at free-flow speed in one time step; at each time step, the road traffic state consists of the number of vehicles n_c in each sub-section, and v_i, q_i and ρ_i respectively denote the free-flow speed of vehicles travelling on road section i, the capacity of road section i, and its density when congestion occurs; establishing a traveler route choice model, and determining the system policy of the reinforcement learning model according to the system state and the traveler route choice model; wherein the step of establishing the traveler route choice model comprises: establishing the traveler route choice model on the basis of a decision route model, such that when a traveler selects a road at a toll gate entrance or at a road crossing while driving, the route choice model simultaneously evaluates the distances and the utility function values of the different routes, and the traveler enters the route with the shorter distance or the larger utility function value; wherein the utility function value is computed from: e, a road segment on the way from the current node to the destination node; r, one path from the current node to the destination node, namely the set of all road segments traversed from the current node to the destination node; R, the set of all distinct paths from the current node to the destination node; p_r, the total toll of route r; w_e, the cost weight coefficient of road segment e; t_e, approximated as the time required to traverse road segment e in the current state; and n_e, the current total number of vehicles on road segment e; the traveler route choice model further uses a function dist(k) to calculate the total physical distance of the k-th path; wherein the system state comprises the time step t and the traffic flow in each sub-section at that step, the system state being described as s_t = (t, {n_c}); the system reward comprises revenue, system running time and road network throughput, wherein, in continuously adjusting the system policy of the reinforcement learning model according to the system reward, f_{i→j}(t_1, t_2) denotes the total vehicle flow moving from road section i to road section j within the time period from time step t_1 to time step t_2, and out_t denotes the sum over all vehicles exiting the high-speed network at time step t; determining, according to the system policy, the system action by which the agent corresponding to each toll road section adjusts its rate; calculating the system reward after the current system action is executed, and continuously adjusting the system policy of the reinforcement learning model according to the system reward; and, when the system reward reaches a maximum, determining the currently optimal rate adjustment scheme for each toll road section.
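The Daganzo cell transmission model used for the simulation environment in claim 1 can be sketched as follows. This is a minimal illustrative implementation, not the patent's own code; all names (`ctm_step`, `q_max`, `n_jam`, `w_over_v`) are assumptions, and the cell length is taken, as in the claim, to be the free-flow distance covered in one time step.

```python
def ctm_step(n, demand_in, q_max, n_jam, w_over_v):
    """Advance the cell occupancies of one road by one time step.

    n         : list of vehicle counts per sub-section (cell)
    demand_in : vehicles wanting to enter the first cell this step
    q_max     : max vehicles that may cross a cell boundary per step (capacity)
    n_jam     : max vehicles a cell can hold (jam density * cell length)
    w_over_v  : ratio of backward wave speed to free-flow speed
    """
    m = len(n)
    # Flow across each upstream boundary, computed from the old state:
    # min(sending, capacity, receiving) as in the standard CTM.
    upstream = [demand_in] + n[:-1]
    y = []
    for i in range(m):
        sending = upstream[i]
        receiving = w_over_v * (n_jam - n[i])
        y.append(min(sending, q_max, max(receiving, 0.0)))
    outflow = min(n[-1], q_max)  # last cell discharges freely
    new_n = list(n)
    for i in range(m):
        out = y[i + 1] if i + 1 < m else outflow
        new_n[i] = n[i] + y[i] - out
    return new_n, outflow
```

The three-way minimum is what propagates congestion: when a downstream cell approaches `n_jam`, its receiving capacity shrinks and vehicles queue upstream.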
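The route choice step in claim 1 compares route distances and utility function values, but the exact utility formula is not reproduced in this text. The sketch below therefore assumes a common linear disutility built from the quantities the claim names — a per-segment cost weight, a congestion-dependent traversal time, and the route's total toll; all function and field names are hypothetical, and the BPR-style delay term is my own stand-in for the claim's occupancy-based time approximation.

```python
def segment_time(length_km, v_free_kmh, n_vehicles, capacity):
    """Approximate traversal time, growing with current occupancy (BPR-like)."""
    ratio = n_vehicles / capacity
    return (length_km / v_free_kmh) * (1.0 + 0.15 * ratio ** 4)

def route_utility(route, tolls, weight=1.0):
    """Utility of one route: weighted travel time plus toll, negated.

    route : list of segment dicts with keys id, length, v_free, n, cap
    tolls : current toll per segment id
    """
    total_time = sum(
        segment_time(s["length"], s["v_free"], s["n"], s["cap"]) for s in route
    )
    total_toll = sum(tolls[s["id"]] for s in route)
    return -(weight * total_time + total_toll)  # larger (less negative) is better

def choose_route(routes, tolls):
    """Pick the index of the route with the largest utility value."""
    return max(range(len(routes)), key=lambda k: route_utility(routes[k], tolls))
```

Under this form, raising the toll on a congested segment lowers the utility of every route containing it, which is the lever the tolling agent pulls.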
- 2. The dynamic highway toll collection method according to claim 1, wherein the step of continuously adjusting the system policy of the reinforcement learning model according to the system reward further comprises: measuring, over all time steps, the maximum difference between the numbers of vehicles on the lanes in the two different directions, the two quantities compared representing the roads in the different directions; and defining network-independent statistics, by means of which, together with the maximum difference, the operation of the road network in multiple dimensions is described.
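The directional-imbalance measurement of claim 2 can be sketched as follows, assuming the two inputs are per-time-step vehicle counts for the two travel directions; the normalized companion statistic is one plausible reading of the claim's "network independent statistic", and all names are mine.

```python
def directional_stats(n_right, n_left):
    """Imbalance between the two directions of a road over time.

    n_right, n_left : sequences of vehicle counts per time step.
    Returns (largest absolute gap, largest gap relative to total load),
    the second being scale-free and thus comparable across networks.
    """
    gaps = [abs(r - l) for r, l in zip(n_right, n_left)]
    rel = [g / (r + l) if (r + l) else 0.0
           for g, r, l in zip(gaps, n_right, n_left)]
    return max(gaps), max(rel)
```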
- 3. The dynamic highway toll collection method according to claim 2, wherein the step of continuously adjusting the system policy of the reinforcement learning model according to the system reward further comprises: defining ratios that respectively measure travel below the minimum speed limit on the roads in the two directions and the total overspeed travel on both roads, the ratios being built from indicator variables: one indicator takes 1 when, at time step t, the number of vehicles on road section i is higher than the number of vehicles at which the required minimum speed can still be maintained, and 0 otherwise; another indicator takes 1 when, at time step t, the travel speed of vehicles on road section i is greater than the maximum speed allowed on the expressway, and 0 otherwise; the ratios violationRight, violationLeft and sumOverSpeedViolation are used to measure the extent to which vehicles violate the speed limits while travelling.
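The indicator-variable ratios of claim 3 can be sketched as follows, assuming per-time-step, per-segment occupancy and speed samples; the threshold names `n_cong` and `v_max` are my own labels for the claim's occupancy and maximum-speed thresholds.

```python
def violation_ratios(occ_right, occ_left, speeds, n_cong, v_max):
    """Share of (time step, segment) samples in violation.

    occ_right / occ_left : per-step lists of vehicle counts by direction
    speeds               : per-step lists of observed speeds (both roads)
    n_cong               : occupancy above which the minimum speed is lost
    v_max                : maximum speed allowed on the expressway
    """
    def ratio(samples, pred):
        # Average of the 0/1 indicator over all flattened samples.
        flat = [x for row in samples for x in row]
        return sum(1 for x in flat if pred(x)) / len(flat)

    violation_right = ratio(occ_right, lambda n: n > n_cong)
    violation_left = ratio(occ_left, lambda n: n > n_cong)
    over_speed = ratio(speeds, lambda v: v > v_max)
    return violation_right, violation_left, over_speed
```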
- 4. A dynamic highway toll collection apparatus for use in the dynamic highway toll collection method according to any one of claims 1 to 3, the dynamic highway toll collection apparatus comprising: a first building module, used for building the highway network simulation environment model and determining the system state of the reinforcement learning model; a second building module, used for building the traveler route choice model and determining the system policy of the reinforcement learning model according to the system state and the traveler route choice model; a first determining module, used for determining, according to the system policy, the system action by which the agent corresponding to each toll road section adjusts its rate; a calculation module, used for calculating the system reward after the current system action is executed and continuously adjusting the system policy of the reinforcement learning model according to the system reward; and a second determining module, used for determining the currently optimal rate adjustment scheme for each toll road section when the system reward reaches its maximum.
- 5. A dynamic highway toll collection device comprising a memory, a processor, and a dynamic highway toll collection program stored in the memory and executable on the processor, wherein the program, when executed by the processor, implements the steps of the dynamic highway toll collection method according to any one of claims 1 to 3.
- 6. A storage medium having stored thereon a dynamic highway toll collection program which, when executed by a processor, implements the steps of the dynamic highway toll collection method according to any one of claims 1 to 3.
Description
Dynamic highway toll collection method, device, equipment and storage medium
Technical Field
The invention relates to the technical field of reinforcement learning, and in particular to a dynamic highway toll collection method, device, equipment and storage medium.
Background
The dynamic highway charging scheme actually deployed in China is differential charging, i.e. rates differentiated by payment method, vehicle type, time period, direction and road branch; its charging "dynamics" are limited, and it cannot adapt well to the real-time dynamic changes of the highway. Congestion charging means that, on otherwise free urban roads, a fee is charged to travelers on part of the road network or in certain areas during periods of traffic congestion so as to relieve the congestion; it is essentially a traffic demand management measure. Congestion charging is generally classified into static charging and dynamic charging: static charging considers only the spatial dimension, ignores the time-varying nature of the system, and neglects the influence of the current charge on the road network; dynamic charging integrates the two dimensions of time and space, so that the fees charged differ across time periods and road sections [8]. Highway dynamic charging is essentially an extended application of dynamic congestion charging to Chinese highways; it is a charging scheme that dynamically adjusts the charging rate by matching the rate to the traffic state on the basis of congestion charging. Joksimovic et al. (2005) and Lu et al. (2008) both studied the dynamic charging problem. Zhang et al. (2013) proposed a method based on traffic dynamics, but the method assumes that the traffic demand between different nodes is fixed, a condition that is too idealized.
Δ-tolling (2017), while charging dynamically on the basis of real-time traffic flow, may be difficult to drive to optimal performance because the method does not actively take traffic demand into account in the model. The first application of reinforcement learning to the road dynamic charging problem is due to Chen et al. (2018), who formulated the traffic dynamics problem as a Markov Decision Process (MDP) and provided a dynamic model, DyETC, with the PG-β algorithm at its core; this remarkably improves on the problems of the Δ-tolling method and achieves good results in relieving traffic congestion. Although DyETC works well, it can only operate on a road network with 11 areas and cannot be extended to large-scale road networks. In order to allow the dynamic charging model to be robustly extended to large scales, Qia et al. divided the whole road network into different sub-areas according to geographic and economic characteristics, then used a multi-agent reinforcement learning algorithm to train a charging agent (Agent) for each sub-area, proposing the DPG-β algorithm and achieving better results in both performance and scalability. In addition, Pandey et al. (2020) used existing reinforcement learning algorithms to study the dynamic charging problem of managed lanes in the United States. These studies mainly target urban roads with dense network structures, where only some road sections are charged during congested periods, and their methods cannot adapt well to the characteristics of the Chinese expressway network or the route choice behavior of Chinese travelers. Existing domestic and foreign research on dynamic road charging is mainly divided into two aspects: research on the congestion charging problem of urban roads at home and abroad, and research on the determination of charging rates.
Although these research results are rich, the following problems remain: (1) Current research focuses mainly on urban road networks at home and abroad, which consist of roads tolled during congested periods and adjacent free roads; the dense travelers of an urban road network easily change paths under the influence of tolls, and the difference in distance between the different path choices to a destination is limited. The density of the Chinese highway network is far lower than that of urban roads, all of its road sections are tolled, and once travelers have entered the expressway it is difficult for them to change routes; the two kinds of road network therefore differ greatly in their characteristics, and the existing research results on congestion tolling are difficult to adapt to the actual conditions of dynamic tolling on Chinese highways. (2) Based on the characteristics of urban road networks at home and abroad, the existing reinforcement-learning-based dynamic charging schemes assume that a traveler performs route selection only according to the current travel time and travel cost when arriving at an intersection, namely, selects a follow-up driving toll road or free