CN-121981444-A - Cascade ship lock scheduling method and device

Abstract

The invention relates to a cascade ship lock scheduling method and device. The method comprises: inputting a current state vector into a proximal policy optimization (PPO) algorithm to obtain an optimal action; performing policy reconstruction on the optimal action based on a multi-objective adaptive large neighborhood search strategy in a reinforcement learning environment to obtain a candidate solution; performing constraint and multi-objective calculation on the candidate solution through a flow control scheduling optimization model to obtain a final Pareto solution set; and scheduling the voyage tasks on the cascade ship lock according to the final Pareto solution set.

Inventors

DU LEI
YU ZHONGYU
WAN YUQI
ZHANG FAN
ZHANG LEI
WEN YUANQIAO

Assignees

Wuhan University of Technology (武汉理工大学)

Dates

Publication Date: 20260505
Application Date: 20251230

Claims (10)

1. A cascade ship lock scheduling method, comprising: acquiring all initial voyage tasks to be scheduled in a cascade ship lock, and constructing a hybrid intelligent optimization algorithm in a reinforcement learning environment, wherein the hybrid intelligent optimization algorithm comprises a proximal policy optimization (PPO) algorithm and a multi-objective adaptive large neighborhood search strategy; performing sequencing and scheduling optimization on all the initial voyage tasks to obtain a current state vector; inputting the current state vector into the PPO algorithm to obtain an optimal action; performing policy reconstruction on the optimal action based on the multi-objective adaptive large neighborhood search strategy in the reinforcement learning environment to obtain a candidate solution; performing constraint and multi-objective calculation on the candidate solution through a flow control scheduling optimization model to obtain a multi-dimensional objective vector; iteratively updating a Pareto archive with the multi-dimensional objective vector to obtain a final Pareto solution set; and scheduling the voyage tasks on the cascade ship lock according to the final Pareto solution set, wherein the multi-objective calculation comprises calculating total operation cost, water area congestion degree, ship lock utilization rate, reversing cost, and in-hub load balancing degree.
2. The cascade ship lock scheduling method according to claim 1, wherein performing sequencing and scheduling optimization on all the initial voyage tasks to obtain a current state vector comprises: sorting all the initial voyage tasks in overall descending order based on a composite sorting rule to obtain a sorting result, wherein the composite sorting rule uses the preset ship priority of each initial voyage task as the primary sort key and the initial registered arrival time of the ship as the secondary sort key; iteratively searching, one by one, for the optimal insertion point of each initial voyage task in the sorting result based on a sequential greedy insertion heuristic algorithm to obtain an initial scheduling plan solution; evaluating the initial scheduling plan solution with the flow control scheduling optimization model to obtain an initial performance vector, and taking the initial performance vector as the initial solution of the Pareto archive; and constructing the current state vector from the parameters of the reinforcement learning environment and the contents of the Pareto archive.
3. The cascade ship lock scheduling method according to claim 2, wherein the flow control scheduling optimization model comprises a total operation cost objective function, a water area congestion degree objective function, a ship lock utilization rate objective function, a reversing cost objective function, and an in-hub load balancing objective function.
4. The cascade ship lock scheduling method according to claim 2, wherein the constraint conditions of the flow control scheduling optimization model comprise scheduling general constraints, ship lock operation constraints, lock chamber arrangement constraints, and bidirectional flow gating constraints, wherein the scheduling general constraints comprise ship lock time constraints, scheduling unit unique allocation constraints, and intra-lock scheduling consistency constraints; the ship lock operation constraints comprise adjacent lock time minimum interval constraints and scheduling period constraints; the lock chamber arrangement constraints comprise ship berthing boundary constraints and ship berthing non-overlapping constraints; and the bidirectional flow gating constraints comprise an inter-dam water area capacity constraint, a downstream flow gating constraint, and an upstream flow gating constraint.
5. The cascade ship lock scheduling method according to claim 1, wherein after performing sequencing and scheduling optimization on all the initial voyage tasks to obtain a current state vector, the method further comprises: updating the current iteration count and determining whether the current iteration count is less than the maximum iteration count; if so, inputting the current state vector into the proximal policy optimization (PPO) algorithm to obtain an optimal action; and if not, determining the Pareto archive as the final Pareto solution set.
6. The cascade ship lock scheduling method according to claim 1, wherein inputting the current state vector into the proximal policy optimization (PPO) algorithm to obtain an optimal action comprises: calculating, with the policy network of the PPO algorithm, the selection probability of each composite neighborhood search strategy under the current state vector, and determining the composite neighborhood search strategy with the highest selection probability as the optimal action.
7. The cascade ship lock scheduling method according to claim 1, wherein performing policy reconstruction on the optimal action based on the multi-objective adaptive large neighborhood search strategy in the reinforcement learning environment to obtain a candidate solution, performing constraint and multi-objective calculation on the candidate solution through a flow control scheduling optimization model to obtain a multi-dimensional objective vector, and iteratively updating a Pareto archive with the multi-dimensional objective vector to obtain a final Pareto solution set comprises: determining a parent solution from the Pareto archive of the multi-objective adaptive large neighborhood search strategy based on the reinforcement learning environment; destroying and reconstructing the parent solution according to the composite neighborhood search strategy corresponding to the optimal action to obtain a candidate solution; performing multi-objective evaluation on the candidate solution with the flow control scheduling optimization model to obtain a multi-dimensional objective vector; comparing the multi-dimensional objective vector one by one with the performance vectors of all solutions in the Pareto archive to obtain a comparison result; updating the Pareto archive according to the comparison result, updating the current iteration count, and determining a next state vector; when the updated current iteration count is less than the maximum iteration count, taking the next state vector as the current state vector and continuing the iterative update in the reinforcement learning environment; and when the updated current iteration count is not less than the maximum iteration count, determining the updated Pareto archive as the final Pareto solution set.
8. The cascade ship lock scheduling method according to claim 1, further comprising: acquiring historical shipping data and determining an earliest date and a latest date for random sampling; determining a set of scheduling problem instances according to the historical shipping data, the earliest date, and the latest date; creating a plurality of independent sub-processes, and instantiating a corresponding initial reinforcement learning environment for each independent sub-process; determining the current state in each of the plurality of initial reinforcement learning environments to form a batch state tensor; inputting the batch state tensor into a policy network and a value network of the proximal policy optimization (PPO) algorithm to obtain initial actions and value estimates; iterating in the independent sub-process of each initial reinforcement learning environment according to the multi-objective adaptive large neighborhood search strategy and the corresponding initial action to obtain a tuple for each initial reinforcement learning environment, and storing the tuple and the corresponding value estimate in an experience replay buffer; when the amount of data in the experience replay buffer reaches its upper limit, evaluating all data in the experience replay buffer based on a generalized advantage estimation (GAE) algorithm to obtain an evaluation result; updating the weights of the policy network and the value network according to the evaluation result, emptying the experience replay buffer, and updating the iteration count; and when the iteration count reaches the total number of training steps, obtaining an updated PPO algorithm.
9. The cascade ship lock scheduling method according to claim 4, wherein the inter-dam water area capacity constraint is as follows: [formula missing], where [symbol missing] is the set of vessels in the inter-dam waters at time [symbol missing], and [symbol missing] is a preset maximum safe capacity; the downstream flow gating constraint is as follows: [formula missing], where [symbol missing] is the theoretical maximum capacity of the downstream hub within any future time window [symbol missing], and [symbol missing] is the set of downstream vessels expected to arrive at the downstream hub within the time window under the current decision; and the upstream flow gating constraint is as follows: [formula missing], where [symbol missing] is the theoretical maximum capacity of the upstream hub within the future time window [symbol missing], and [symbol missing] is the set of upstream vessels expected to reach the upstream hub.
10. A cascade ship lock scheduling device, comprising: a task acquisition module configured to acquire all initial voyage tasks to be scheduled in a cascade ship lock and to construct a hybrid intelligent optimization algorithm in a reinforcement learning environment, wherein the hybrid intelligent optimization algorithm comprises a proximal policy optimization (PPO) algorithm and a multi-objective adaptive large neighborhood search strategy; a state determining module configured to perform sequencing and scheduling optimization on all the initial voyage tasks to obtain a current state vector; an action determining module configured to input the current state vector into the PPO algorithm to obtain an optimal action; and a task scheduling module configured to perform policy reconstruction on the optimal action based on the multi-objective adaptive large neighborhood search strategy in the reinforcement learning environment to obtain a candidate solution, to perform constraint and multi-objective calculation on the candidate solution through a flow control scheduling optimization model to obtain a multi-dimensional objective vector, to iteratively update a Pareto archive with the multi-dimensional objective vector to obtain a final Pareto solution set, and to schedule the voyage tasks on the cascade ship lock according to the final Pareto solution set, wherein the multi-objective calculation comprises calculating total operation cost, water area congestion degree, ship lock utilization rate, reversing cost, and in-hub load balancing degree.
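
Claim 9 defines the inter-dam water area capacity constraint in terms of the set of vessels in the inter-dam waters at a given moment and a preset maximum safe capacity. A minimal Python sketch of one way to check such a constraint is given below. It assumes that each vessel's stay in the inter-dam waters is known as a half-open time interval and that the constraint bounds the vessel count at every moment; the names `peak_occupancy`, `satisfies_capacity`, and `max_safe_capacity` are hypothetical and do not come from the patent.

```python
from typing import List, Tuple

# Assumption: (entry_time, exit_time) of one vessel in the inter-dam waters,
# treated as the half-open interval [entry_time, exit_time).
Interval = Tuple[float, float]


def peak_occupancy(intervals: List[Interval]) -> int:
    """Return the largest number of vessels present at the same moment."""
    events = []
    for start, end in intervals:
        events.append((start, 1))   # vessel enters the inter-dam waters
        events.append((end, -1))    # vessel leaves the inter-dam waters
    # At equal times, departures (-1) sort before arrivals (+1), so a vessel
    # leaving at time t does not count against one entering at time t.
    events.sort(key=lambda e: (e[0], e[1]))
    current = peak = 0
    for _, delta in events:
        current += delta
        peak = max(peak, current)
    return peak


def satisfies_capacity(intervals: List[Interval], max_safe_capacity: int) -> bool:
    """Hypothetical check: vessel count never exceeds the preset safe capacity."""
    return peak_occupancy(intervals) <= max_safe_capacity


stays = [(0.0, 4.0), (1.0, 3.0), (2.0, 5.0), (4.0, 6.0)]
print(peak_occupancy(stays))         # 3
print(satisfies_capacity(stays, 3))  # True
print(satisfies_capacity(stays, 2))  # False
```

The sweep over sorted entry and exit events avoids sampling time on a grid, so the check is exact for any set of intervals.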

Description

Cascade ship lock scheduling method and device

Technical Field

The invention relates to the technical field of inland navigation scheduling, and in particular to a cascade ship lock scheduling method and device.

Background

As key infrastructure in an inland navigation system, the ship lock is a core node for overcoming natural or artificial water level drops, keeping the channel open, and connecting major waterway transport arteries. Its operating efficiency directly determines the traffic capacity and economic benefit of the whole river basin shipping system, and it plays a vital role in global logistics and supply chains. Particularly in a cascade hub system formed by multiple ship locks, efficient and cooperative scheduling is the cornerstone of keeping this lifeline safe and stable.

However, the actual operating environment of a ship lock system is complex and changeable: its traffic capacity is not constant but is subject to various internal and external factors at any time. Traditional scheduling algorithms also fall short when facing such large-scale, multi-objective, strongly constrained combinatorial optimization problems. Common heuristic algorithms mostly adopt a greedy strategy of "sequential construction" or "gap insertion", in which early locally optimal choices easily leave subsequent ships with "no way forward", and they may fail to produce a fully feasible solution at all when resources are tight.

Therefore, there is an urgent need for a cascade ship lock scheduling method and device that solve the problem that prior-art scheduling algorithms adopt a greedy strategy in which early locally optimal choices easily leave subsequent ships with "no way forward".

Disclosure of Invention

In view of the foregoing, it is necessary to provide a cascade ship lock scheduling method and device that solve the problem that prior-art scheduling algorithms adopt a greedy strategy in which early locally optimal choices easily leave subsequent ships with "no way forward".

In order to solve the above problem, in a first aspect, the present invention provides a cascade ship lock scheduling method, comprising: acquiring all initial voyage tasks to be scheduled in a cascade ship lock, and constructing a hybrid intelligent optimization algorithm in a reinforcement learning environment, wherein the hybrid intelligent optimization algorithm comprises a proximal policy optimization (PPO) algorithm and a multi-objective adaptive large neighborhood search strategy; performing sequencing and scheduling optimization on all the initial voyage tasks to obtain a current state vector; inputting the current state vector into the PPO algorithm to obtain an optimal action; performing policy reconstruction on the optimal action based on the multi-objective adaptive large neighborhood search strategy in the reinforcement learning environment to obtain a candidate solution; performing constraint and multi-objective calculation on the candidate solution through a flow control scheduling optimization model to obtain a multi-dimensional objective vector; iteratively updating a Pareto archive with the multi-dimensional objective vector to obtain a final Pareto solution set; and scheduling the voyage tasks on the cascade ship lock according to the final Pareto solution set, wherein the multi-objective calculation comprises calculating total operation cost, water area congestion degree, ship lock utilization rate, reversing cost, and in-hub load balancing degree.
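
The step of iteratively updating a Pareto archive with a multi-dimensional objective vector can be illustrated with a minimal Python sketch. It is an assumption-laden illustration, not the patented implementation: every objective is assumed to be expressed so that smaller is better (an objective such as ship lock utilization rate would have to be negated first), and the names `dominates` and `update_archive` are hypothetical.

```python
from typing import List, Tuple

Vector = Tuple[float, ...]  # one multi-dimensional objective vector


def dominates(a: Vector, b: Vector) -> bool:
    """True if a is no worse than b in every objective and strictly
    better in at least one (assumption: all objectives are minimized)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def update_archive(archive: List[Vector], candidate: Vector) -> Tuple[List[Vector], bool]:
    """Compare the candidate with every archived vector and return the
    updated archive plus a flag telling whether the candidate was accepted."""
    # Reject the candidate if an archived vector dominates or duplicates it.
    if any(dominates(kept, candidate) or kept == candidate for kept in archive):
        return archive, False
    # Otherwise drop every archived vector the candidate dominates, then add it.
    survivors = [kept for kept in archive if not dominates(candidate, kept)]
    survivors.append(candidate)
    return survivors, True


archive = [(10.0, 5.0), (8.0, 7.0)]
archive, accepted = update_archive(archive, (7.0, 6.0))
print(archive, accepted)  # [(10.0, 5.0), (7.0, 6.0)] True
archive, accepted = update_archive(archive, (11.0, 6.0))
print(archive, accepted)  # [(10.0, 5.0), (7.0, 6.0)] False
```

In a reinforcement learning loop, the acceptance flag could feed the reward signal, although the text above does not state how the reward is defined.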

In a second aspect, the present invention also provides a cascade ship lock scheduling device, comprising: a task acquisition module used for acquiring all initial voyage tasks to be scheduled in a cascade ship lock and constructing a hybrid intelligent optimization algorithm in a reinforcement learning environment, wherein the hybrid intelligent optimization algorithm comprises a proximal policy optimization (PPO) algorithm and a multi-objective adaptive large neighborhood search strategy; a state determining module used for performing sequencing and scheduling optimization on all the initial voyage tasks to obtain a current state vector; an action determining module used for inputting the current state vector into the PPO algorithm to obtain an optimal action; and a task scheduling module used for carrying out policy reconstruction on the optimal action based on the multi-objective adaptive large neighborhood search strategy in the reinforcement learning environment to obtain a candidate solution, carrying out constraint and multi-objective calculation on the candidate solution through a flow control scheduling optimization model to obtain a multi-dimensional objective vector, carrying out iterativ