CN-121973833-A - Railway signal scheduling optimization method and system based on AlphaEvolve algorithm

Abstract

The invention discloses a railway signal scheduling optimization method and system based on the AlphaEvolve algorithm, relating to the technical field of railway signal scheduling. The method comprises the following steps: S1, constructing a track state graph; S2, constructing a track state representation model, a policy network and a value network; S3, constructing a Monte Carlo tree, searching it through the selection, expansion, simulation and backup stages of Monte Carlo tree search, and generating a preliminary scheduling strategy once the search finishes; S4, taking a plurality of candidate scheduling sequences from the preliminary scheduling strategy as a population, generating new scheduling samples with an evolution strategy, screening the new scheduling samples based on path cost and updating the policy network; and S5, selecting, from the candidate samples, the scheduling scheme with the fewest conflicts and the least delay as the optimal scheduling scheme and outputting it. Compared with traditional railway dispatching methods based on rules or on a single heuristic algorithm, the method has stronger learning and generalization capability and can continuously adapt to different operating scenarios.

Inventors

WEI LEQI
FU LINTAI
ZHANG JUAN
YE CHENG
LI NANXIN
YUAN LI

Assignees

卡斯柯信号(成都)有限公司

Dates

Publication Date: 20260505
Application Date: 20260109

Claims (10)

1. A railway signal scheduling optimization method based on the AlphaEvolve algorithm, characterized by comprising the following steps: S1, constructing a track state graph and initializing the state graph; S2, constructing a track state representation model, a policy network and a value network, wherein the track state representation model is built on a graph neural network and is used to encode the state graph and extract key track state features, the policy network is used to generate an initial scheduling probability distribution, and the value network is used to estimate the potential scheduling quality of the current state; S3, constructing a Monte Carlo tree and searching it through the selection, expansion, simulation and backup stages of Monte Carlo tree search, wherein during the search the track state representation model extracts the key track state features of each state node in the tree, and the state nodes are evaluated by the policy network and the value network to guide the search; S4, taking a plurality of candidate scheduling sequences from the preliminary scheduling strategy as a population, generating new scheduling samples with an evolution strategy, screening the new scheduling samples based on path cost and updating the policy network, wherein the screened new scheduling samples and the preliminary scheduling strategy together form the candidate samples; and S5, selecting, from the candidate samples, the scheduling scheme with the fewest conflicts and the least delay as the optimal scheduling scheme and outputting it.
2. The railway signal scheduling optimization method based on the AlphaEvolve algorithm as claimed in claim 1, wherein in step S3 the Monte Carlo tree search includes: a selection stage, namely taking the initialized track state graph as the root node and constructing a search tree, then, starting from the root node of the search tree, selecting the optimal scheduling action path among the child nodes according to the PUCT formula, which combines the policy network output with the visit counts; an expansion stage, namely, if the selected child node is not fully expanded, adding a new child node representing a new scheduling decision; a simulation stage, namely performing one complete scheduling simulation from the new child node, wherein the scheduling simulation comprises simulating each train's passage along its path in turn and determining conflict or success; and a backup stage, namely returning the simulation result to the nodes on the path and updating the value estimates of those nodes, wherein the simulation result comprises whether a conflict occurred, the path cost and the delay cost.
3. The railway signal scheduling optimization method based on the AlphaEvolve algorithm as claimed in claim 1, wherein the selection stage of the Monte Carlo tree search includes: starting from the root node of the search tree, selecting the optimal scheduling action path among the child nodes according to the PUCT strategy until a leaf node or a node that has not yet been expanded is reached; the strategies applied by the selection stage in railway dispatching include: each node represents a train scheduling state; action selection considers the currently schedulable train set, the priority of each train and the availability of the current signal sections; and the initial scheduling probability distribution provided by the policy network is balanced against the visit counts according to:
$$U(s,a) = Q(s,a) + c_{\mathrm{puct}}\, P(s,a)\, \frac{\sqrt{N(s)}}{1 + N(s,a)}$$
where $U(s,a)$ is the comprehensive evaluation of the action; $Q(s,a)$ is the action value; $c_{\mathrm{puct}}$ is the exploration-exploitation balance constant; $P(s,a)$ is the initial scheduling probability distribution; $N(s)$ is the visit count of the parent node; and $N(s,a)$ is the visit count of the action.
4. The railway signal scheduling optimization method based on the AlphaEvolve algorithm as claimed in claim 1, wherein the expansion stage of the Monte Carlo tree search includes: if the currently selected node is not fully expanded, adding to it a child node representing a new scheduling action; the rules applied by the expansion stage in railway dispatching include: for a dispatching state, the expandable actions include arranging for a train to enter a signal section, designating the next running path for a train, and adjusting the departure interval between trains; and service constraints are taken into account, namely two trains are prevented from occupying the same block section at the same time, and speed limits and turnout states are treated as hard constraints.
5. The railway signal scheduling optimization method based on the AlphaEvolve algorithm as claimed in claim 1, wherein the simulation stage of the Monte Carlo tree search includes: starting from the newly expanded child node, performing one complete scheduling simulation from the current state until a terminal state is reached, and evaluating the scheduling quality with the value network; the simulation process applied by the simulation stage in railway dispatching comprises: guiding the simulation direction with the policy network while adding a degree of randomness to maintain diversity; generating a simulated scheduling trajectory comprising each train's running path, the time at which it passes each signal, and the section occupancy record; and outputting a simulated evaluation value, in which the negative factors include the number of conflicts, the delay time and the path deviation, and the positive factors include the overall passing efficiency and the degree of match with the train operation diagram, the factors being combined into a simulated return value.
6. The railway signal scheduling optimization method based on the AlphaEvolve algorithm as claimed in claim 1, wherein the backup stage of the Monte Carlo tree search includes: propagating the return value obtained in the simulation stage to every node along the traversed path, so as to update the average value Q and the visit count N of each node, wherein the backup formula is:
$$N_i \leftarrow N_i + 1, \qquad Q_i \leftarrow Q_i + \frac{v - Q_i}{N_i}$$
where $N_i$ is the visit count of the $i$-th node; $Q_i$ is the average value of the $i$-th node; and $v$ is the state value estimate; the meaning of the backup stage in railway dispatching includes: a higher node visit count indicates a better scheduling path; a high Q value indicates that the corresponding scheduling action has performed well historically; and the backup reinforces high-quality scheduling paths and suppresses inefficient or unsafe paths.
7. The railway signal scheduling optimization method based on the AlphaEvolve algorithm as claimed in claim 1, wherein step S4 includes: S41, initializing the candidate population, namely obtaining a plurality of high-scoring scheduling sequences from the Monte Carlo tree search as the population, wherein each individual in the population comprises the running path of each train, a departure time plan and a signal occupation plan; S42, generating new samples by mutation, namely applying perturbation operators to individuals to generate new scheduling samples; S43, computing the fitness function, namely using the fitness function to compute the score of each new scheduling sample under multiple objectives; and S44, selecting winners and feeding them back, namely selecting the new scheduling samples whose scores rank in the top k% as winners, and using the winners to update the policy network, to expand new actions at nodes of the Monte Carlo search tree, and to serve as the initial individuals of the next round of evolution.
8. The railway signal scheduling optimization method based on the AlphaEvolve algorithm as claimed in claim 7, wherein in step S42 the perturbation operators include: time perturbation, namely slightly adjusting a train's departure or arrival time; path replacement, namely selecting an equivalent branch from the feasible paths; order rearrangement, namely adjusting the scheduling order of two trains; and section exchange, namely exchanging the arrival order of trains at a junction.
9. The railway signal scheduling optimization method based on the AlphaEvolve algorithm as claimed in claim 7, wherein in step S43 the fitness function is:
$$F(x) = -\alpha\, C(x) - \beta\, D(x) + \gamma\, M(x) + \delta\, P(x)$$
where $F(x)$ is the score of the new sample $x$; $C(x)$ is the number of signal conflicts; $D(x)$ is the total train delay time; $M(x)$ indicates whether the original operation diagram plan is met; $P(x)$ is the passenger priority weighting; and $\alpha$, $\beta$, $\gamma$, $\delta$ are adjustable coefficients that reflect the scheduling policy tendency.
10. A railway signal scheduling optimization system based on the AlphaEvolve algorithm, based on the railway signal scheduling optimization method according to any one of claims 1 to 9, characterized in that it comprises: a track topology modeling module, which represents the railway signal system as a graph structure, modeling stations, sections, turnouts and signals as nodes and edges to form an inputtable state graph; a state representation and self-supervised learning module, communicatively connected to the track topology modeling module and provided with a track state representation model, wherein the track state representation model encodes the state graph with a graph neural network, extracts key track state features, and is trained in combination with a self-supervised learning mechanism; an AlphaEvolve intelligent search scheduling module, communicatively connected to the state representation and self-supervised learning module, which combines Monte Carlo tree search with policy-value network guidance to search over the key track state features, simulate railway signal scheduling and generate a preliminary scheduling strategy; an evolution strategy fusion module, communicatively connected to the AlphaEvolve intelligent search scheduling module, which introduces a plurality of candidate scheduling individuals from the preliminary scheduling strategy, evaluates their strategy performance asynchronously and in parallel, and performs mutation and screening according to the reward signal to obtain screened new scheduling samples; a scheduling rule constraint module, communicatively connected to both the AlphaEvolve intelligent search scheduling module and the evolution strategy fusion module, which integrates railway service rules and embeds a scheduling judgment mechanism; and an output interface module, communicatively connected to both the AlphaEvolve intelligent search scheduling module and the evolution strategy fusion module, which obtains the optimal scheduling scheme from the preliminary scheduling strategy and the screened new scheduling samples and outputs it.
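
The PUCT selection of claim 3 and the value backup of claim 6 can be illustrated with a short Python sketch. It assumes the standard AlphaZero-style PUCT form and an incremental-mean update; the `Node` class, the function names and the default `c_puct` value are hypothetical and are not taken from the patent.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Node:
    """One scheduling state in the search tree (hypothetical structure)."""
    prior: float            # P: prior probability from the policy network
    visits: int = 0         # N: visit count
    q: float = 0.0          # Q: running mean of backed-up values
    children: dict = field(default_factory=dict)  # action -> Node


def puct_score(parent: Node, child: Node, c_puct: float = 1.5) -> float:
    """U = Q + c_puct * P * sqrt(N_parent) / (1 + N_child)."""
    explore = c_puct * child.prior * math.sqrt(parent.visits) / (1 + child.visits)
    return child.q + explore


def select_action(parent: Node, c_puct: float = 1.5):
    """Return the child action with the highest PUCT score."""
    return max(parent.children,
               key=lambda a: puct_score(parent, parent.children[a], c_puct))


def backup(path: list, value: float) -> None:
    """Propagate one simulation return to every node on the path."""
    for node in path:
        node.visits += 1
        node.q += (value - node.q) / node.visits  # incremental mean
```

A rarely visited action with a modest prior can outrank a frequently visited one, which is how the visit counts temper the policy network's initial distribution.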
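
The expansion rule of claim 4, under which two trains must never occupy the same block section at the same time, can be sketched as a filter over candidate actions. This is a minimal illustration only: the `State` tuple, the `unavailable` set (standing in for blocks ruled out by turnout state or speed limit) and the single action type "train enters block" are simplifying assumptions, not the patent's data model.

```python
from typing import NamedTuple


class State(NamedTuple):
    """Hypothetical dispatching state."""
    occupied: frozenset  # block ids currently occupied by a train
    waiting: tuple       # (train_id, requested_block) pairs not yet dispatched


def legal_actions(state: State, unavailable: frozenset = frozenset()) -> list:
    """Keep only actions that violate no hard constraint."""
    return [(train, block) for train, block in state.waiting
            if block not in state.occupied and block not in unavailable]


def expand(state: State, unavailable: frozenset = frozenset()) -> dict:
    """Create one child state per legal action (action -> child state)."""
    children = {}
    for train, block in legal_actions(state, unavailable):
        children[(train, block)] = State(
            occupied=state.occupied | {block},
            waiting=tuple(w for w in state.waiting if w[0] != train),
        )
    return children
```

Because the constraint is applied before children are created, conflicting states never enter the tree, rather than being penalized after the fact.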
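
Steps S42 to S44 of claim 7, with two of the perturbation operators of claim 8, can be sketched as a single generation of mutate, score and select. The `Schedule` class is a deliberately reduced individual (dispatch order and departure minutes only), the scoring function is supplied by the caller, and all names and default values are assumptions made for illustration.

```python
import math
import random
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Schedule:
    """Simplified individual (hypothetical): trains with departure minutes."""
    order: tuple       # train ids, in scheduling order
    departures: tuple  # departure minute of the train at the same index


def time_perturb(s: Schedule, rng: random.Random, max_shift: int = 2) -> Schedule:
    """Time perturbation: nudge one train's departure by a few minutes."""
    i = rng.randrange(len(s.order))
    shifted = list(s.departures)
    shifted[i] = max(0, shifted[i] + rng.randint(-max_shift, max_shift))
    return replace(s, departures=tuple(shifted))


def order_swap(s: Schedule, rng: random.Random) -> Schedule:
    """Order rearrangement: two trains trade places (and time slots)."""
    i, j = rng.sample(range(len(s.order)), 2)
    order = list(s.order)
    order[i], order[j] = order[j], order[i]
    return replace(s, order=tuple(order))


def evolve(population, score, k_percent=20.0, children_per_parent=5, seed=0):
    """One generation: mutate (S42), score (S43), keep the top k% (S44)."""
    rng = random.Random(seed)
    operators = [time_perturb, order_swap]
    offspring = [rng.choice(operators)(parent, rng)
                 for parent in population
                 for _ in range(children_per_parent)]
    ranked = sorted(offspring, key=score, reverse=True)
    keep = max(1, math.ceil(len(ranked) * k_percent / 100.0))
    return ranked[:keep]
```

In the claimed method the winners would additionally be fed back to update the policy network and to seed new tree nodes; this sketch stops at returning them.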
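
The fitness function of claim 9 combines four quantities with adjustable coefficients. The sketch below assumes a linear weighted sum in which conflicts and delay are penalized and plan conformance and passenger priority are rewarded; the signs, the default coefficient values and the parameter names are assumptions, chosen so that a higher score is better, consistent with the top-k% selection in claim 7.

```python
def fitness(conflicts: int, total_delay_min: float, matches_plan: bool,
            passenger_priority: float,
            alpha: float = 10.0, beta: float = 1.0,
            gamma: float = 5.0, delta: float = 2.0) -> float:
    """Score one schedule; higher is better (assumed sign convention)."""
    penalty = alpha * conflicts + beta * total_delay_min
    reward = gamma * (1.0 if matches_plan else 0.0) + delta * passenger_priority
    return reward - penalty


# A conflict-free, on-plan schedule versus one with a conflict and 4 min of delay.
on_plan = fitness(conflicts=0, total_delay_min=0.0, matches_plan=True,
                  passenger_priority=0.5)
degraded = fitness(conflicts=1, total_delay_min=4.0, matches_plan=False,
                   passenger_priority=0.5)
print(on_plan, degraded)  # 6.0 -13.0
```

Raising `alpha` relative to `beta` expresses a policy that tolerates delay in order to avoid conflicts, which is the kind of scheduling tendency the coefficients are said to reflect.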
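
The track topology modeling module of claim 10 represents stations, sections, turnouts and signals as a graph. A minimal sketch of such a state graph, with a per-node feature vector of the kind a graph neural network could take as input, is given below. The class, the field names and the feature layout (one-hot node kind plus an occupancy flag) are hypothetical; the patent does not specify them.

```python
from dataclasses import dataclass, field

NODE_KINDS = ("station", "section", "turnout", "signal")


@dataclass
class TrackGraph:
    kind: dict = field(default_factory=dict)         # node id -> one of NODE_KINDS
    neighbours: dict = field(default_factory=dict)   # node id -> set of adjacent ids
    occupied_by: dict = field(default_factory=dict)  # node id -> train id

    def add_node(self, node_id: str, kind: str) -> None:
        if kind not in NODE_KINDS:
            raise ValueError(f"unknown node kind: {kind}")
        self.kind[node_id] = kind
        self.neighbours.setdefault(node_id, set())

    def connect(self, a: str, b: str) -> None:
        """Add an undirected edge between two existing nodes."""
        self.neighbours[a].add(b)
        self.neighbours[b].add(a)

    def features(self) -> dict:
        """Per-node features: one-hot node kind plus an occupancy flag."""
        rows = {}
        for node_id, kind in self.kind.items():
            one_hot = [1.0 if kind == k else 0.0 for k in NODE_KINDS]
            occupied = 1.0 if node_id in self.occupied_by else 0.0
            rows[node_id] = one_hot + [occupied]
        return rows
```

Initializing the state graph in step S1 would then amount to populating `occupied_by` from the current train positions before the graph is used as the search root.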

Description

Railway signal scheduling optimization method and system based on AlphaEvolve algorithm

Technical Field

The invention relates to the technical field of railway signal scheduling, and in particular to a railway signal scheduling optimization method and system based on the AlphaEvolve algorithm.

Background

Railway signal dispatching is a core link in the railway transportation dispatching system. It is mainly responsible for real-time monitoring, instruction issuing and coordinated control of train operation by means of signal equipment and communication technology, so as to ensure that trains run safely and efficiently according to the train operation diagram.

In the prior art, railway signal scheduling has the following problems:

1. Conventional railway signal scheduling systems face complexity and real-time challenges

As train density and network complexity increase, a railway signal scheduling system must achieve efficient cooperative operation of multiple trains while guaranteeing safety. However, conventional scheduling systems mostly rely on rule-driven or static-diagram scheduling methods and struggle to respond in real time to unexpected conditions such as late trains, temporary rescheduling or line occupation conflicts. Manual intervention by dispatchers is inefficient and error-prone, which seriously restricts the improvement of railway transport capacity and the development of intelligent dispatching.

2. The application of existing intelligent algorithms in the railway field is limited

In recent years, heuristic algorithms such as genetic algorithms, ant colony algorithms and classical Monte Carlo tree search (MCTS) have been introduced to solve scheduling problems, but most of these methods target manufacturing systems or computing task scheduling, and they have clear shortcomings in real-time performance, safety and the modeling of topological constraints for railway signal systems. In addition, the lack of an efficient state modeling approach and a strategy generalization mechanism makes these methods difficult to adapt to complex and changeable railway operation scenarios, so they cannot meet the demand for large-scale, multi-objective, high-frequency scheduling decisions.

3. Deep learning and reinforcement learning fusion technology is not fully combined with railway service characteristics

Although deep reinforcement learning frameworks such as AlphaGo exhibit extremely strong decision optimization capabilities, migrating them directly to a railway dispatching scenario is not practical. The main reason is that the railway dispatching problem is multi-objective (safety and efficiency), strongly constrained (signal sections, train-meeting rules) and highly real-time, so the dispatching algorithm must combine strong learning capability with an understanding of railway service logic and the ability to model graph structures. At present, no scheduling optimization system that integrates deep learning with evolutionary optimization and has self-supervised state understanding capability is available to meet these requirements.

Disclosure of Invention

In order to overcome the defects in the prior art, the invention discloses a railway signal scheduling optimization method and system based on the AlphaEvolve algorithm. The method and system integrate the AlphaEvolve algorithm (consisting of Monte Carlo tree search (MCTS), a policy-value network, an evolution strategy and a self-supervised learning mechanism) and address the problems of frequent train path conflicts, delayed scheduling response and the low degree of intelligence of current systems.

In order to achieve the above purpose, the invention adopts the following technical solution:

In a first aspect, the invention provides a railway signal scheduling optimization method based on the AlphaEvolve algorithm, which includes the following steps:

1. Track state initialization

S1, constructing a track state graph and initializing the state graph;

Preferably, step S1 comprises: obtaining the current train operation diagram, the train positions and the equipment states; constructing a track topology graph from the obtained data and generating the track state graph; and initializing the initial scheduling state node in the state graph to serve as the root node of the Monte Carlo tree search.

2. Model network construction

S2, constructing a track state representation model, a policy network and a value network, wherein the track state representation model is built on a graph neural network and is used to encode the state graph and extract key track state features, the policy network is used to generate an initial scheduling probability distribution, and the value network is used to estimate the potential scheduling quality of the current state;

In the pres