CN-122009288-A - Autonomous decision-making method for urban rail train group


Abstract

The invention provides an autonomous decision-making method for urban rail train groups, and belongs to the field of urban rail transit train operation control and intelligent scheduling. According to the invention, by constructing a distributed collaborative framework based on vehicle-to-vehicle communication, establishing a dynamic passenger flow prediction model and designing an autonomous decision maker based on Sarsa reinforcement learning, the self-adaptive adjustment and collaborative optimization of the running interval of multiple trains under the dynamic passenger flow condition are realized, so that the service quality of passengers is improved and the running cost is reduced on the premise of ensuring the running safety.

Inventors

FENG XIAOYUN
LUO CAN
SUN PENGFEI
WANG QINGYUAN

Assignees

Southwest Jiaotong University (西南交通大学)

Dates

Publication Date: 2026-05-12
Application Date: 2026-01-30

Claims (8)

1. An autonomous decision-making method for urban rail train groups, characterized by comprising the following steps: S1, initializing the system, and acquiring in real time the running state data of each train on an urban rail line and the passenger flow data of each platform; S2, constructing a dynamic passenger flow model, and using it to predict, from the historical passenger arrival rate and the current real-time passenger flow data, the passenger flow distribution state when each train arrives at each station; S3, on the basis of the predicted passenger flow distribution state, constructing an inter-train communication topology graph in combination with the running state data of each train, and, based on this graph, using a predecessor-following communication topology so that each train exchanges information only with the train ahead of it to acquire preceding-train information, wherein a communication link is established only between train i and its preceding train i-1, and the head train i=1 has no link to a preceding train; S4, inputting the acquired real-time station passenger flow data and the preceding-train information into the dynamic passenger flow model, cooperatively predicting the future passenger flow distribution, and updating the dynamic passenger flow model; S5, designing, for each train, an autonomous decision maker based on Sarsa reinforcement learning, inputting the running state data of the train, the updated passenger flow distribution prediction and the constraint conditions of a train group multi-objective distributed optimization model into the autonomous decision maker, and performing offline training through interaction with a simulation environment to learn an optimal running interval and stop time adjustment strategy; and S6, in the deployment stage, executing online autonomous decision-making with the offline-trained autonomous decision maker to obtain the optimal control instruction output by the autonomous decision maker of each train.
2. The urban rail train group autonomous decision-making method according to claim 1, wherein the passenger flow distribution state comprises the following quantities (equations not reproduced in this text; symbol definitions follow in their original order): the number of passengers arriving at station k from station s, whose symbols denote the passenger arrival rate at the station, the moment train i arrives at station s, the moment train i leaves the previous station, the total number of trains, the total number of stations on the line, the smoothing coefficient, the currently measured arrival rate, and the arrival rate at the previous moment; the total number of passengers at station s waiting for train i, whose symbol denotes the number of passengers at station s; the number of passengers alighting from train i at station s, whose symbol denotes the number of passengers who boarded at station k and are destined for station s; the total boarding demand at platform s, whose symbols denote the boarding rate, the alighting rate, the total number of stations S, and the moment train i leaves the station; the actual number of passengers boarding train i at station s; the number of passengers on train i when it leaves station s, whose symbol denotes the number of passengers on board train i when it leaves the preceding station; and the updated number of passengers at station s.
3. The urban rail train group autonomous decision-making method according to claim 1, wherein the information interaction is expressed as follows (equations not reproduced in this text; symbol definitions follow in their original order): for the head train i=1, the symbols denote the predicted total passenger flow when the head train arrives at station k, the predicted passenger flow at station k when the head train is located at station s, the passenger arrival rate at station k, the moment the head train is expected to arrive at station k, the moment the head train leaves station s, the station index s, and the total number of stations S on the line; for every train other than the head train i=1, the symbols denote the predicted total passenger flow at station k when train i arrives, the number x of a train preceding train i, the number of passengers that preceding train x is expected to carry at station k, the total number of trains, and the train number i.
4. The urban rail train group autonomous decision-making method according to claim 1, wherein the objective function of the train group multi-objective distributed optimization model (equation not reproduced in this text; symbol definitions follow in their original order) involves: the multi-objective function; the passenger dissatisfaction function f_p; the operating cost function; four weight coefficients; four cost coefficients; the average passenger waiting time at station j; the degree to which the occupancy of the train at station j deviates from the comfort zone; the total number of stations; the traction energy consumption in section j; and the stop time at station j.
5. The urban rail train group autonomous decision-making method according to claim 1, wherein the constraints of the train group multi-objective distributed optimization model include the following (equations not reproduced in this text; symbol definitions follow in their original order): a punctuality constraint, whose symbols denote the moment the head train leaves the first station, the time window boundaries of the first and last stations respectively, the total operating time of train i, the moment train i leaves the terminal station, the total number of trains, and the train number i; a station capacity constraint, stated as a conditional (if ... then ...), whose symbols denote the total number of passengers in the station at time t, the upper limit of the station capacity, the total number of passengers at station k, the number of passengers retained at station s, the station index s, and the total number of stations S; a line average full load rate constraint, whose symbols denote the total number of stations on the line, the number of passengers on train i when it leaves station s, the rated passenger capacity of the train, and the upper limit of the line average full load rate; a safety protection constraint between preceding and following trains, whose symbols denote the moment train i arrives at station s, the moment the preceding train i-1 arrives at station s, the minimum operating interval, the moment train i leaves station s, and the moment the preceding train i-1 leaves station s; and run time and stop time constraints, whose symbols denote the moment train i arrives at station j, the moment train i leaves the previous station j-1, the maximum operating interval, and the station number j.
6. The urban rail train group autonomous decision-making method according to claim 1, wherein performing offline training in S5 comprises the following steps (equations not reproduced in this text; symbol definitions follow in their original order): constructing a simulation environment and observing the state of the current station in it, the state space being defined over the station number j and the total number of stations on the line; based on the state of the current station, selecting an action by applying the action selection policy to the current Q-value table, wherein a run-time adjustment action and a stop-time adjustment action are defined for each station j, the symbols denoting the action space, the run-time adjustment action at station j, the stop-time adjustment action at station j, the maximum run time at station j, the minimum run time at station j, the maximum stop time at station j, and the minimum stop time at station j, and wherein, for both adjustment actions, a negative value shortens the time, a positive value extends it, and 0 keeps the original plan; executing the action to adjust the running time and the stop time; based on the adjustment result, obtaining the instant reward and transferring to the next state, the symbols of the reward function denoting the instant reward obtained after action a is performed in state s, the four weight coefficients of the terms in the reward function, the waiting-time cost coefficient, the passenger waiting time term of train i at station j, the full load rate deviation cost coefficient, the full load rate deviation term of train i at station j, the energy consumption cost coefficient, the traction energy consumption term of train i in section j, the stop-time cost coefficient, the stop time of train i at station j, the station index variables x and y, the moment train i arrives at station y, the number of boarding passengers at station y, the boarding rate at station y, the number of passengers on the train at station x, the rated passenger capacity of the train, the times of the train at station x and station x+1 respectively, the function relating the traction force of the train to its speed, the running speed v of the train, the time differential dt, and the moment train i arrives at station x; in the new state, selecting the next action; based on the selected next action, updating the Q value, the symbols of the update equation denoting the state-action value function for executing the action in the state, the state corresponding to station j, the action selected at station j, the learning rate, the instant reward, the discount factor, the state corresponding to the next station j+1, the action to be selected at the next station, and the Q value of executing that action in the next state; the learning rate follows an adaptive decay strategy and gradually decreases as the number of training rounds increases, the symbols denoting the learning rate in the e-th training round, the initial learning rate, the decay coefficient, and the current training round number e; and repeating the above process until the train reaches the terminal station to complete one training round, offline training being complete when the number of training rounds reaches its maximum or the Q values converge.
7. The urban rail train group autonomous decision-making method according to claim 6, wherein the policy selection action is expressed as follows (equation not reproduced in this text; symbol definitions follow in their original order): the symbols denote the optimal action selected at station j, a random number between 0 and 1, the state corresponding to station j, and the learning rate.
8. The urban rail train group autonomous decision-making method according to claim 6, wherein the selection of states and the update of the action space satisfy constraint conditions (equations not reproduced in this text; symbol definitions follow in their original order) whose symbols denote: the run time of the train at station j, the run-time adjustment action at station j, the stop time of the train at station j, the stop-time adjustment action at station j, the minimum operating interval, the maximum operating interval, the minimum stop time at the station, the maximum stop time at the station, the run time of the train at station j+1, and the stop time of the train at station j+1.
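
The offline training loop recited in claims 6 and 7 can be illustrated with a minimal tabular Sarsa sketch in Python. This is not the claimed implementation, and several parts are assumptions made only for illustration: the state is reduced to the station index, the action selection policy is taken to be epsilon-greedy (the text does not name it), the learning rate decays exponentially (the text states only that it decreases with the training round number), and `toy_reward`, `ACTIONS` and all parameter values are hypothetical stand-ins for the simulation environment and the weighted reward function.

```python
import random
from collections import defaultdict


def decayed_learning_rate(alpha0, decay, episode):
    # ASSUMPTION: exponential decay; the source only says the rate
    # decreases as the training round number grows.
    return alpha0 * decay ** episode


def epsilon_greedy(q, state, actions, epsilon, rng):
    # ASSUMPTION: epsilon-greedy selection over the current Q-value table.
    if rng.random() < epsilon:
        return rng.choice(actions)  # explore
    return max(actions, key=lambda a: q[(state, a)])  # exploit


def sarsa_update(q, s, a, reward, s_next, a_next, alpha, gamma, terminal=False):
    # On-policy target: uses the action actually selected in the next state.
    target = reward if terminal else reward + gamma * q[(s_next, a_next)]
    q[(s, a)] += alpha * (target - q[(s, a)])


def train(num_stations, actions, reward_fn, episodes,
          alpha0=0.5, decay=0.99, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    q = defaultdict(float)  # Q-value table, zero-initialised
    for episode in range(episodes):
        alpha = decayed_learning_rate(alpha0, decay, episode)
        s = 0  # ASSUMPTION: the state is just the station index
        a = epsilon_greedy(q, s, actions, epsilon, rng)
        # One training round ends when the train reaches the terminal station.
        while s < num_stations - 1:
            reward = reward_fn(s, a)
            s_next = s + 1
            terminal = s_next == num_stations - 1
            a_next = epsilon_greedy(q, s_next, actions, epsilon, rng)
            sarsa_update(q, s, a, reward, s_next, a_next, alpha, gamma, terminal)
            s, a = s_next, a_next
    return q


# Hypothetical action space: (run-time adjustment, stop-time adjustment) in
# seconds; negative shortens, positive extends, 0 keeps the original plan.
ACTIONS = [(dr, dd) for dr in (-10, 0, 10) for dd in (-5, 0, 5)]


def toy_reward(station, action):
    # Hypothetical reward: penalise any deviation from the plan. It stands in
    # for the weighted waiting-time, load, energy and stop-time terms.
    return -float(abs(action[0]) + abs(action[1]))


q_table = train(num_stations=5, actions=ACTIONS, reward_fn=toy_reward, episodes=200)
```

After training, the greedy adjustment at a station can be read from the table with `max(ACTIONS, key=lambda a: q_table[(station, a)])`, which corresponds to the online decision step S6 under the same assumptions.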

Description

Autonomous decision-making method for urban rail train group

Technical Field

The invention belongs to the field of urban rail transit train operation control and intelligent scheduling, and particularly relates to an urban rail train group autonomous decision-making method.

Background

In the daily operation of urban rail transit, operating efficiency and service quality depend heavily on how well the train operation plan matches actual passenger demand. Current urban rail systems generally schedule trains with a preset fixed timetable. This handles regular changes in passenger flow, but under irregular fluctuations (such as sudden large passenger flows or tidal passenger flows) a preset timetable cannot dynamically match passenger flow to train capacity, so trains are overloaded and passengers wait too long in some periods while capacity is wasted in others. Traditional train operation optimization methods mainly comprise timetable optimization based on historical data, real-time scheduling based on centralized control, and rule-based emergency response strategies. For example, existing automatic train operation (ATO) systems rely mainly on preset running curves and cannot adapt to dynamic passenger flows. Centralized dispatching systems can achieve global optimization, but their high computational complexity and long response delay make it hard to meet the real-time decision-making needs of large-scale train groups. Rule-based methods depend on manual experience, lack self-learning capability, and cannot adapt to complex and changing operating environments.
The prior art has the following defects: the train operation plan lacks self-adaptive adjustment capability for dynamic passenger flows, so capacity and demand are difficult to match in real time; the centralized control architecture carries a single-point-of-failure risk and is difficult to extend to collaborative optimization of large-scale train groups; an inter-train distributed collaborative decision mechanism is lacking, so train-to-train communication technology cannot be fully exploited for intelligent train group scheduling; and traditional optimization methods struggle to balance passenger service quality (waiting time and full load rate) against operating cost (energy consumption and stop time) in a multi-objective optimization.

Disclosure of Invention

In view of the above defects in the prior art, the urban rail train group autonomous decision-making method provided by the invention addresses three problems: the train operation plan lacks self-adaptive adjustment capability for dynamic passenger flows, the centralized control architecture has a single-point-of-failure risk, and a multi-train distributed collaborative decision mechanism is lacking.
To achieve this purpose, the invention adopts the following technical scheme: an urban rail train group autonomous decision-making method comprising the following steps: S1, initializing the system, and acquiring in real time the running state data of each train on an urban rail line and the passenger flow data of each platform; S2, constructing a dynamic passenger flow model, and using it to predict, from the historical passenger arrival rate and the current real-time passenger flow data, the passenger flow distribution state when each train arrives at each station; S3, on the basis of the predicted passenger flow distribution state, constructing an inter-train communication topology graph in combination with the running state data of each train, and, based on this graph, using a predecessor-following communication topology so that each train exchanges information only with the train ahead of it to acquire preceding-train information, wherein a communication link is established only between train i and its preceding train i-1, and the head train i=1 has no link to a preceding train; S4, inputting the acquired real-time station passenger flow data and the preceding-train information into the dynamic passenger flow model, cooperatively predicting the future passenger flow distribution, and updating the dynamic passenger flow model; S5, designing, for each train, an autonomous decision maker based on Sarsa reinforcement learning, inputting the running state data of the train, the updated passenger flow distribution prediction and the constraint conditions of a train group multi-objective distributed optimization model into the autonomous decision maker, and performing offline training through interaction with a simulation environment to learn an optimal running interval and stop time adjustment strategy; and S6, in the deployment stage, executing online a