CN-122022625-A - Deep reinforcement learning-based intelligent scheduling method for AGV production logistics under complete-set constraints
Abstract
The invention discloses a deep reinforcement learning-based intelligent scheduling method for AGV production logistics under complete-set constraints, which comprises the following steps: 1, constructing a unified data paradigm to form the state space of deep reinforcement learning and defining the action space; 2, obtaining the current-time-step state of the state space and calculating a set of Q values from it, then deciding on the basis of the Q-value set to obtain the current-time-step scheduling action of each AGV; 3, each AGV executing its current-time-step scheduling action and obtaining an execution result, and calculating a current-time-step reward value from the execution results with a reward function, thereby completing the scheduling of the current time step; 4, the main network obtaining the next-time-step state of the state space after the current time step has been scheduled, and then repeating steps 2 and 3 to complete the scheduling of the next time step. The invention provides a complete data basis for scheduling decisions and forms an effective cooperative material-distribution strategy.
Inventors
- LING LIN
- WANG SHIXING
- ZHANG XI
- GE MAOGEN
- LIU MINGZHOU
- HU JING
Assignees
- Hefei University of Technology (合肥工业大学)
Dates
- Publication Date: 2026-05-12
- Application Date: 2026-01-06
Claims (6)
- 1. An intelligent scheduling method for AGV production logistics under complete-set constraints based on deep reinforcement learning, characterized by comprising the following steps: Step 1, constructing a unified data paradigm adapted to complete-set management, wherein the unified data paradigm comprises complete-set groups, AGV states, task demands and environment states; defining a deep reinforcement learning action space based on the cooperative material-distribution requirements under the complete-set constraint, combined with the response characteristics of the AGVs, wherein the action space is a discrete action set for each AGV and comprises the material-distribution tasks awaiting response in all associated complete-set groups and an idle action, the idle action serving to avoid invalid or wasteful empty runs when an AGV is not suited to any distribution task; Step 2, the DQN network of deep reinforcement learning acquires the complete-set data, AGV state data, task demand data and environment state data of the current time step as the current-time-step state of the state space, and calculates a set of Q values from that state; a decision is then made on the basis of the Q-value set through the decision strategy of deep reinforcement learning so as to determine the current-time-step action of each AGV within the action space; Step 3, each AGV executes its current-time-step scheduling action to carry out the complete-set distribution operation for the current time step and obtains its execution result; a reward function is applied to the current-time-step execution results of the AGVs to obtain the current-time-step reward value; the current-time-step state of the state space, the current-time-step scheduling actions and the current-time-step reward value form a current experience sample, which is added to the experience replay pool of deep reinforcement learning, thereby completing the scheduling of the current time step; Step 4, the DQN network acquires the complete-set groups, AGV states, task demands and environment states after the current time step is scheduled as the next-time-step state of the state space; when the number of experience samples in the experience replay pool reaches a preset threshold, a preset batch of experience samples is randomly drawn from the pool, the network loss of the main network is calculated with a mean-squared-error loss function, and the parameters of the main network are updated by gradient descent; steps 2 and 3 are then repeated to complete the scheduling of the next time step.
- 2. The deep reinforcement learning-based intelligent scheduling method for AGV production logistics under complete-set constraints according to claim 1, characterized in that in step 1 the complete-set group comprises a bill of materials, an associated order ID, a complete-set group ID, a required delivery time, a target station ID and a matching progress; the AGV state comprises a current task ID, a load state, an operating state, a remaining battery level, a real-time position and an AGV ID; the task demand comprises a task ID, an associated complete-set group ID, the IDs of the materials to be distributed, a target station ID, a task state, a task priority and a latest completion time; the environment state comprises a station state, an environment state ID, a station ID, a path congestion coefficient and a data update time (a data-structure sketch follows the claims).
- 3. The deep reinforcement learning-based intelligent scheduling method for AGV production logistics under complete-set constraints according to claim 1, characterized in that in step 2 the decision strategy of deep reinforcement learning adopts an ε-greedy strategy: a decision is made on the basis of the Q-value set by selecting the action with the maximum Q value with probability (1 − ε) and selecting an action at random with probability ε, thereby determining the current-time-step action of each AGV from the action space (an ε-greedy sketch follows the claims).
- 4. The deep reinforcement learning-based intelligent scheduling method for AGV production logistics under complete-set constraints according to claim 1, characterized in that in step 3 the execution result specifically comprises the completion state of the corresponding material-distribution task, whether it was executed within the latest completion time, the change in matching progress of the associated complete-set group, and the real-time state of the AGV after executing the action.
- 5. The deep reinforcement learning-based intelligent scheduling method for AGV production logistics under complete-set constraints according to claim 1, characterized in that in step 3 the reward function is a composite reward function comprising a high-value sparse reward, a dense reward, an auxiliary basic reward and an invalid-behavior penalty (a reward-function sketch follows the claims).
- 6. The deep reinforcement learning-based intelligent scheduling method for AGV production logistics under complete-set constraints according to any one of claims 1-5, characterized in that step 4 further comprises randomly drawing a plurality of experience samples from the experience replay pool, calculating the network loss from each experience sample, and updating the parameters of the main network according to the network loss (a training-step sketch follows the claims).
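The four record types of the unified data paradigm in claim 2 map naturally onto plain data structures. Below is a minimal sketch using Python dataclasses; every field name and type is an assumption inferred from the claim's wording, not part of the patent.

```python
from dataclasses import dataclass


@dataclass
class KitGroup:                        # "complete-set group" in claim 2
    kit_group_id: str
    order_id: str                      # associated order ID
    bill_of_materials: dict[str, int]  # material ID -> required quantity
    required_delivery_time: float
    target_station_id: str
    matching_progress: float           # fraction of the kit already delivered


@dataclass
class AGVState:
    agv_id: str
    current_task_id: str | None
    load_state: str                    # e.g. "empty" / "loaded"
    operating_state: str               # e.g. "idle" / "moving" / "charging"
    remaining_battery: float
    position: tuple[float, float]      # real-time position


@dataclass
class TaskDemand:
    task_id: str
    kit_group_id: str                  # associated complete-set group ID
    material_ids: list[str]            # materials to be distributed
    target_station_id: str
    task_state: str
    priority: int
    latest_completion_time: float


@dataclass
class EnvironmentState:
    env_state_id: str
    station_id: str
    station_state: str
    path_congestion: float             # path congestion coefficient
    updated_at: float                  # data update time
```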
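Claim 3's ε-greedy decision over the Q-value set is a standard construction. A minimal NumPy sketch, assuming one Q-value vector per AGV per time step; the epsilon value and the Q values in the usage lines are illustrative.

```python
import numpy as np


def epsilon_greedy(q_values: np.ndarray, epsilon: float,
                   rng: np.random.Generator) -> int:
    """Pick a random action with probability epsilon, else the argmax action."""
    if rng.random() < epsilon:
        return int(rng.integers(len(q_values)))   # explore: uniform random action
    return int(np.argmax(q_values))               # exploit: action with maximum Q value


# usage: one decision per AGV per time step
rng = np.random.default_rng(0)
q = np.array([0.2, 1.3, -0.4, 0.0])   # Q values for this AGV's discrete actions
action = epsilon_greedy(q, epsilon=0.1, rng=rng)
```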
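Claim 5 names four reward components but fixes neither their form nor their weights. The sketch below shows one plausible composition; the signal names in the `result` dict and all weights are hypothetical.

```python
def composite_reward(result: dict) -> float:
    """Combine the four components named in claim 5; weights are illustrative."""
    r = 0.0
    if result["kit_completed"]:          # high-value sparse reward: whole kit delivered
        r += 10.0
    r += 1.0 * result["progress_delta"]  # dense reward: matching-progress increase
    if result["task_done_on_time"]:      # auxiliary basic reward: met latest completion time
        r += 0.5
    if result["invalid_action"]:         # invalid-behavior penalty: e.g. an empty run
        r -= 2.0
    return r
```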
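Steps 2-4 of claim 1, together with claim 6, describe a standard DQN update loop: accumulate experience samples in a replay pool, sample a random batch once a preset threshold is reached, compute a mean-squared-error loss for the main network, and update its parameters by gradient descent. A minimal PyTorch sketch under common DQN assumptions; the network architecture, the separate target network, gamma, and the replay-pool layout are assumptions, not the patent's specification.

```python
import random
from collections import deque

import torch
import torch.nn as nn


class QNet(nn.Module):
    """Q-network: maps a state vector to one Q value per discrete action."""
    def __init__(self, state_dim: int, n_actions: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 128), nn.ReLU(),
            nn.Linear(128, n_actions),
        )

    def forward(self, s: torch.Tensor) -> torch.Tensor:
        return self.net(s)


replay_pool: deque = deque(maxlen=10_000)   # experience replay pool of (s, a, r, s2) tuples


def train_step(main: QNet, target: QNet, opt: torch.optim.Optimizer,
               batch_size: int = 64, gamma: float = 0.99) -> None:
    """One DQN update: sample a batch, compute the MSE TD loss, take a gradient step."""
    if len(replay_pool) < batch_size:       # wait until the preset threshold is reached
        return
    batch = random.sample(list(replay_pool), batch_size)
    s  = torch.stack([torch.as_tensor(b[0], dtype=torch.float32) for b in batch])
    a  = torch.tensor([b[1] for b in batch], dtype=torch.long)
    r  = torch.tensor([b[2] for b in batch], dtype=torch.float32)
    s2 = torch.stack([torch.as_tensor(b[3], dtype=torch.float32) for b in batch])
    q = main(s).gather(1, a.unsqueeze(1)).squeeze(1)      # Q(s, a) from the main network
    with torch.no_grad():
        y = r + gamma * target(s2).max(dim=1).values      # bootstrapped TD target
    loss = nn.functional.mse_loss(q, y)                   # mean-squared-error loss
    opt.zero_grad()
    loss.backward()
    opt.step()                                            # gradient-descent parameter update
```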
Description
Deep reinforcement learning-based intelligent scheduling method for AGV production logistics under complete-set constraints

Technical Field

The invention relates to the field of AGV intelligent scheduling methods, in particular to a deep reinforcement learning-based intelligent scheduling method for AGV production logistics under complete-set constraints.

Background

In modern manufacturing, "complete-set" delivery of materials (i.e., the multiple materials needed for production are matched in groups and delivered to stations synchronously) is a core premise for keeping a production line running continuously. In actual production, station downtime caused by incomplete material kits accounts for more than 30% of the total, directly delaying production plans and reducing equipment utilization. With the spread of multi-variety, small-batch production modes, workshop material demand shows multi-batch, multi-combination characteristics, and a single order can require tens of materials matched in specific proportions, further increasing the complexity of complete-set management. To address this challenge, manufacturing workshops commonly introduce AGVs to replace manual material distribution, hoping to raise turnover efficiency through automation. With the maturing of industrial Internet-of-Things technology, data such as material inventory, AGV states and complete-set group demands can now be acquired in real time, providing a data foundation for breaking through this dilemma. Mainstream research currently tries to optimize AGV scheduling with intelligent algorithms such as deep reinforcement learning; although results have been achieved in path planning and task response time, the following key shortcomings remain in practical application, and they are especially prominent when handling complete-set constraints:

(1) Complete-set constraint adaptation is absent, and production coordination is insufficient. Existing AGV scheduling focuses on the punctuality or path optimization of single tasks and does not treat material completeness as a core constraint. Because AGV path length is optimized while the real production requirement of synchronous material delivery is ignored, stations either stop for lack of materials or accumulate materials ahead of schedule, seriously affecting production continuity.

(2) Heterogeneous data paradigms do not integrate complete-set information, and data consistency is poor. Current data-driven methods can acquire AGV states, task demands and similar data through the Internet of Things, but lack a unified data format aimed at completeness. Key information such as kit composition, material matching relations and station kit requirements is often stored in a scattered manner, so the complete-set state cannot be obtained in real time when a scheduling decision is made, reducing the adaptability of the scheduling strategy.

(3) Fixed scheduling plans struggle to cope with dynamic task disturbances and respond with lag. To handle dynamic tasks, existing methods mostly adopt an AGV self-organizing mode.
However, this mode only optimizes a single task or a single AGV locally; it performs no global optimization of the overall completion efficiency of the complete-set group. Being locally greedy, it easily unbalances the material-distribution progress within a kit group, lengthens the overall kitting cycle, and limits the global benefit of the production logistics system.

Disclosure of Invention

The invention provides a deep reinforcement learning-based intelligent scheduling method for AGV production logistics under complete-set constraints, aiming to solve the problems that prior-art scheduling methods adapt poorly to complete-set constraints, lack data support and lag in dynamic response. To achieve the above purpose, the technical scheme adopted by the invention is as follows: the intelligent scheduling method for AGV production logistics under complete-set constraints based on deep reinforcement learning comprises the following steps: Step 1, constructing a unified data paradigm adapted to complete-set management, wherein the unified data paradigm comprises complete-set groups, AGV states, task demands and environment states; defining a deep reinforcement learning action space based on the cooperative material-distribution requirements under the complete-set constraint, combined with the response characteristics of the AGVs, wherein the action space is a discrete action set for each AGV and comprises the material-distribution tasks awaiting response in all associated complete-set groups and an idle action, the idle action serving to avoid invalid or wasteful empty runs when an AGV is not suited to any distribution task (a sketch of this action-space enumeration follows below).
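Step 1 gives each AGV a discrete action set: one action per pending distribution task of its associated complete-set groups, plus an idle action. A minimal sketch of that enumeration; the `IDLE` sentinel at index 0 and the task-ID strings are assumptions for illustration.

```python
from typing import Sequence

IDLE = "IDLE"


def build_action_space(pending_task_ids: Sequence[str]) -> list[str]:
    """One discrete action per pending distribution task, plus an idle action.

    The idle action lets an AGV decline tasks it is not suited to,
    avoiding invalid empty runs (step 1 of the method).
    """
    return [IDLE, *pending_task_ids]


# usage: actions for one AGV at the current time step; the index chosen by the
# epsilon-greedy policy selects from this list, with 0 meaning "stay idle"
actions = build_action_space(["T-017", "T-018", "T-021"])
```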