CN-121998292-A - Intelligent scheduling method and system for underground mining truck fleet


Abstract

The invention discloses an intelligent scheduling method and system for underground mining truck fleets, and relates to the technical field of underground mine transportation scheduling. The method comprises: constructing a scene model of the underground mine transportation system and obtaining the topological structure of the underground mine; designing an observation space and obtaining the physical state and logical state of each vehicle together with the global task state; adopting a graph attention network as the feature extractor for reinforcement learning, which extracts cooperative features from the observation-space state data as input to a policy network; constructing the policy network based on a reinforcement learning algorithm for continuous action spaces, such that at each time step the agent obtains the current cooperative features and outputs a continuous action; and designing a shaped reward function inspired by the dynamic window approach to guide the agent's learning, balance efficiency and safety, and achieve optimal vehicle scheduling. The invention enables vehicles to perform smooth, fine-grained dynamic control, effectively improving transportation efficiency and reducing conflicts.

Inventors

GU QING
LIU YANG
YANG JUN
MENG YU

Assignees

University of Science and Technology Beijing (北京科技大学)

Dates

Publication Date: 2026-05-08
Application Date: 2025-12-16

Claims (10)

1. An intelligent scheduling method for an underground mining truck fleet, characterized by comprising the following steps: S1, constructing a scene model of an underground mine transportation system and acquiring the topological structure of the underground mine; S2, designing an observation space and acquiring the physical state and logical state of each vehicle and the global task state; S3, adopting a graph attention network as a feature extractor for reinforcement learning, dynamically constructing adjacency relations from the state data of the observation space, and aggregating neighbor node information through an attention mechanism to extract cooperative features for input to a policy network; S4, constructing the policy network based on a reinforcement learning algorithm for continuous action spaces, wherein at each time step an agent acquires the current cooperative features and outputs a continuous action, the action space of the agent being defined as a continuous acceleration command; S5, designing a shaped reward function inspired by the dynamic window approach to guide the agent's learning and achieve optimal scheduling of the vehicles, wherein the shaped reward function comprises a distance penalty term and a speed reward term.
2. The intelligent scheduling method for the underground mining truck fleet according to claim 1, wherein in step S1, the topological structure of the underground mine comprises loading nodes, unloading nodes, bidirectional single-lane road sections, passing bays in which vehicles meet, and intersection nodes.
3. The intelligent scheduling method for the underground mining truck fleet according to claim 1, wherein in step S2, the physical state of each vehicle includes a normalized vehicle position and a normalized vehicle speed; the logical state of each vehicle includes whether the vehicle is loaded, whether it has been assigned a task, and its intersection conflict role, wherein the intersection conflict role indicates whether the vehicle acts as the yielding vehicle or as the vehicle that proceeds first in an intersection conflict, together with the distance and speed of the opposing vehicle; and the global task state includes a normalized value of the remaining task volume at each loading point.
4. The intelligent scheduling method for the underground mining truck fleet according to claim 1, wherein in step S3, a graph attention network is adopted as the feature extractor; at each time step, a distance matrix between vehicles is calculated from the real-time positions of all vehicles, and an adjacency matrix and an edge index are dynamically constructed according to a preset perception radius; the state features of each vehicle are used as node features and input into multi-layer graph attention convolutions, where the graph attention mechanism allows the agent to dynamically evaluate the importance of different neighboring vehicles and to aggregate their information weighted by that importance, thereby obtaining graph embedding features; and the graph embedding features are concatenated with the global task state features of the observation space to obtain the cooperative features.
5. The intelligent scheduling method for the underground mining truck fleet according to claim 4, wherein the state features of each vehicle comprise physical state features and logical state features; the features are first embedded through a linear layer, and the embedded data and the edge index are then fed into two GAT convolution layers, with a dropout layer added between the first and second GAT convolution layers; the graph embedding features are obtained by globally averaging the output of the second GAT convolution layer; and the graph embedding features are concatenated with the global task state features and passed through an output feature layer to obtain the high-dimensional cooperative features.
6. The intelligent scheduling method for the underground mining truck fleet according to claim 1, wherein in step S4, the action space of the agent is defined as a continuous, standardized acceleration command; at each decision step the agent outputs a specific acceleration value, which can be substituted directly into the kinematic equations of the vehicle in the environment.
7. The intelligent scheduling method for the underground mining truck fleet according to claim 1, wherein in step S5, the distance penalty term is used to ensure safety and specifically comprises: calculating the minimum distance d_obs between the current vehicle and the other vehicles, and applying a penalty inversely proportional to that minimum distance when d_obs is smaller than the perception range; and the speed reward term is used to encourage efficiency and specifically comprises: granting a positive reward based on the ratio of the vehicle's current speed to the maximum speed self.max_speed.
8. An intelligent scheduling system for an underground mining truck fleet, for implementing the method of any one of claims 1 to 7, the system comprising: a scene construction module for constructing a scene model of the underground mine transportation system and acquiring the topological structure of the underground mine; a state acquisition module for designing an observation space and acquiring the physical state and logical state of each vehicle and the global task state; a feature extractor for processing the state data of the observation space, wherein a graph attention network is adopted as the feature extractor for reinforcement learning, adjacency relations are dynamically constructed from the state data of the observation space, and neighbor node information is aggregated through an attention mechanism to extract cooperative features for input to a policy network; the policy network, for making decisions according to the output of the feature extractor and outputting continuous actions, wherein the policy network is constructed based on a reinforcement learning algorithm for continuous action spaces, and at each time step an agent acquires the current cooperative features and outputs a continuous action, the action space of the agent being defined as a continuous acceleration command; and a reward function design module for designing a shaped reward function inspired by the dynamic window approach to guide the agent's learning and achieve optimal scheduling of the vehicles, wherein the shaped reward function comprises a distance penalty term and a speed reward term.
9. An electronic device, comprising: a processor; and a memory having stored thereon computer-readable instructions which, when loaded and executed by the processor, implement the method of any one of claims 1 to 7.
10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores program code that can be called by a processor to execute the method according to any one of claims 1 to 7.
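The quantities named in claims 4, 6 and 7 can be made concrete with a minimal Python sketch. It is an illustration under stated assumptions, not the patented implementation: the function names, the 2-D Euclidean distance, the Euler update, the scaling constant max_accel, the weights w_dist and w_speed, and the small constant eps are all hypothetical choices that the claims do not specify. The parameter max_speed stands in for self.max_speed of claim 7.

```python
import math


def build_edge_index(positions, perception_radius):
    """Adjacency matrix and edge index from real-time positions (claim 4).

    Assumption: positions are 2-D points and distance is Euclidean.
    """
    n = len(positions)
    adjacency = [[0] * n for _ in range(n)]
    sources, targets = [], []
    for i in range(n):
        for j in range(n):
            # Connect two distinct vehicles when they lie within the radius.
            if i != j and math.dist(positions[i], positions[j]) <= perception_radius:
                adjacency[i][j] = 1
                sources.append(i)
                targets.append(j)
    return adjacency, [sources, targets]


def step_vehicle(position, speed, action, dt=1.0, max_accel=2.0, max_speed=10.0):
    """Apply a continuous action as an acceleration command (claim 6).

    Assumptions: the action lies in [-1, 1], is scaled by max_accel, and the
    vehicle moves in 1-D along the roadway with a simple Euler update.
    """
    action = max(-1.0, min(1.0, action))      # clip to the assumed range
    accel = action * max_accel
    new_speed = max(0.0, min(max_speed, speed + accel * dt))
    new_position = position + new_speed * dt
    return new_position, new_speed


def shaping_reward(d_obs, speed, max_speed, perception_range,
                   w_dist=1.0, w_speed=1.0, eps=1e-6):
    """Distance penalty plus speed reward (claim 7); weights are assumed."""
    distance_penalty = 0.0
    if d_obs < perception_range:
        # Penalty inversely proportional to the minimum distance d_obs.
        distance_penalty = -w_dist / max(d_obs, eps)
    # Positive reward proportional to the ratio of current to maximum speed.
    speed_reward = w_speed * speed / max_speed
    return distance_penalty + speed_reward
```

With the default weights, a vehicle at half of its maximum speed whose nearest neighbor is 5 units away inside a perception range of 10 receives -1/5 + 0.5 = 0.3, while the same vehicle with no neighbor inside the perception range receives 0.5. The edge index is returned as two parallel lists (source and target indices), the layout commonly expected by graph neural network libraries.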

Description

Intelligent scheduling method and system for underground mining truck fleet

Technical Field

The invention relates to the technical field of underground mine transportation scheduling, in particular to an intelligent scheduling method and system for underground mining truck fleets based on a graph attention network and continuous-action reinforcement learning.

Background

With the development of modern mining technology, intelligence and automation have become core trends in mine operation. Underground mines, with their complex working environment, place extremely high demands on the coordinated operation of multiple pieces of equipment. The production cycle of underground mining involves several closely connected stages, including mining, tunneling, transportation and hoisting, among which the transportation system is the key link that connects the production stages and guarantees production continuity. The underground mining truck fleet is the main body carrying out the transportation task. However, unlike the grid-like networks of open-pit mines or traditional manufacturing, the transport network of an underground mine has a unique topology: it consists mainly of long, narrow bidirectional single-lane roadways in which space is extremely limited. To allow vehicles travelling in opposite directions to pass each other, a limited number of passing bays are arranged along the roadway. In addition, the main roadway and the branches leading to different loading nodes form a number of intersection nodes. This long, straight and narrow topology, which depends on specific avoidance points, makes the vehicle scheduling problem extremely complex and highly prone to traffic conflicts, operational deadlock and efficiency bottlenecks caused by unnecessary waiting.
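The topology described above (loading nodes, unloading nodes, passing bays and intersection nodes joined by bidirectional single-lane segments) can be pictured as a small graph. The following Python sketch is only an illustration: the names NodeType, MineTopology and build_example_topology, the node labels and the segment lengths are hypothetical and do not come from the patent.

```python
from collections import deque
from dataclasses import dataclass, field
from enum import Enum


class NodeType(Enum):
    LOADING = "loading"
    UNLOADING = "unloading"
    PASSING_BAY = "passing_bay"
    INTERSECTION = "intersection"


@dataclass
class MineTopology:
    node_types: dict = field(default_factory=dict)
    segments: dict = field(default_factory=dict)  # node -> {neighbor: length}

    def add_node(self, name, node_type):
        self.node_types[name] = node_type
        self.segments.setdefault(name, {})

    def add_segment(self, a, b, length):
        # A single-lane segment is traversable in both directions.
        self.segments[a][b] = length
        self.segments[b][a] = length

    def route(self, start, goal):
        """Breadth-first search for the route with the fewest segments."""
        previous = {start: None}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            if node == goal:
                break
            for neighbor in self.segments[node]:
                if neighbor not in previous:
                    previous[neighbor] = node
                    queue.append(neighbor)
        if goal not in previous:
            return []
        path, node = [], goal
        while node is not None:
            path.append(node)
            node = previous[node]
        return path[::-1]

    def passing_bays_on(self, route):
        """Passing bays along a route, i.e. the places where trucks can meet."""
        return [n for n in route if self.node_types[n] is NodeType.PASSING_BAY]


def build_example_topology():
    # Hypothetical layout: U -- B1 -- X -- L1, with a branch X -- L2.
    topo = MineTopology()
    topo.add_node("U", NodeType.UNLOADING)
    topo.add_node("B1", NodeType.PASSING_BAY)
    topo.add_node("X", NodeType.INTERSECTION)
    topo.add_node("L1", NodeType.LOADING)
    topo.add_node("L2", NodeType.LOADING)
    topo.add_segment("U", "B1", 200.0)
    topo.add_segment("B1", "X", 150.0)
    topo.add_segment("X", "L1", 100.0)
    topo.add_segment("X", "L2", 120.0)
    return topo
```

In this example layout every haul between the unloading node and either loading node crosses the single passing bay B1, which illustrates why a few avoidance points dominate the scheduling problem.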
Existing research on underground mine fleet management, such as vehicle-by-vehicle dispatching or global scheduling strategies, often relies on static rules or traditional operations research optimization and struggles to cope with real-time fluctuations in transportation tasks and dynamic changes in the environment. These methods fail to fully exploit the potential of modern autonomous mining trucks, in particular intelligent vehicles that can respond accurately to scheduling instructions and adjust their travel speed flexibly. In recent years, reinforcement learning (RL) has shown great advantages as a powerful machine learning method for complex sequential decision problems. In reinforcement learning, an agent learns an optimal policy by trial and error through interaction with the environment so as to maximize cumulative reward (e.g., maximizing transport efficiency or minimizing waiting time). This capacity for self-learning and policy iteration makes it well suited to highly dynamic and uncertain problems such as underground mine scheduling. Although reinforcement learning has shown potential in the scheduling field, existing research has focused mainly on grid-like scheduling in open-pit mine operations or on general automated guided vehicles (AGVs). The topology of these scenarios is relatively simple and the conflict types are limited. At present, no research has explored how to use reinforcement learning to solve the complex scheduling problem posed by the bidirectional single-lane roadways and passing bays that are unique to underground mines. More importantly, the few existing reinforcement learning scheduling attempts generally discretize the action space. Specifically, the agent's decisions are limited to a few high-level discrete instructions, such as "advance", "stop", "enter chamber" or "wait".
The drawback of such a discrete action space is that vehicle control is all-or-nothing. For example, when two vehicles meet, the agent can only choose to "stop completely" or "go forward at full speed". This abrupt stop-and-start control not only makes the driving process rigid and jerky, but also wastes considerable time. In many scenarios where the vehicles could "slow down slightly and slip past each other", discrete control may force a vehicle to wait unnecessarily long. It also leads to high energy consumption and cost: frequent complete stops and restarts are a major source of vehicle energy consumption and aggravate mechanical wear (e.g., of the engine and brake system). Furthermore, discrete actions cannot express fine-grained decisions and cannot handle subtle interactions. For example, they cannot express a more human-like decision such as "slow down to 70% speed" or "accelerate slightly to clear the intersection before the other vehicle", which results in rigid and inefficient vehicle control.

Disclosure of Invention

Aiming at the rigid control caused by traditional discrete actions, the invention provides an intelligent scheduling method and system for underground mining