CN-121981485-A - Cloud manufacturing scheduling method based on discrete event simulation and multi-agent reinforcement learning
Abstract
The invention discloses a cloud manufacturing scheduling method based on discrete event simulation and multi-agent reinforcement learning, belonging to the technical field of intelligent manufacturing and cloud manufacturing service scheduling. The method models each manufacturing task as an independent agent, builds a partially observable Markov decision process model, fuses information among agents through the policy network, performs global value estimation with a value mixing network satisfying a monotonicity constraint, triggers cooperative agent decisions through an event-driven mechanism in a discrete event simulation environment, maps factory-allocation actions to manufacturing or logistics events, and trains the networks with a prioritized experience replay mechanism. The method breaks the sequence restriction of the traditional "sequence first, select later" paradigm, realizes synchronous joint optimization of service selection and task scheduling, markedly improves scheduling efficiency and sample efficiency in complex manufacturing scenarios, and is suitable for multi-task, multi-factory distributed cloud manufacturing environments.
Inventors
- WANG CHENG
- FAN JIAPENG
Assignees
- Zhejiang University of Technology (浙江工业大学)
Dates
- Publication Date: 2026-05-05
- Application Date: 2026-01-30
Claims (7)
- 1. A cloud manufacturing scheduling method based on discrete event simulation and multi-agent reinforcement learning, characterized by comprising the following steps: Step 1, establishing a cloud manufacturing system model comprising manufacturing tasks, factory resources, machine capabilities and logistics parameters, and defining the optimization objective of the model as minimizing the total makespan; Step 2, formulating the scheduling optimization problem in the cloud manufacturing system model as a partially observable Markov decision process; Step 3, constructing a multi-agent deep reinforcement learning framework based on the centralized-training, decentralized-execution paradigm to solve the partially observable Markov decision process, wherein the framework comprises an agent policy network and a value mixing network; Step 4, training the constructed multi-agent deep reinforcement learning framework in a discrete event simulation environment to optimize its network parameters; Step 5, deploying the agent policy network trained in Step 4 to a cloud manufacturing platform for real-time scheduling decisions on newly arrived orders.
- 2. The cloud manufacturing scheduling method based on discrete event simulation and multi-agent reinforcement learning according to claim 1, wherein in Step 1: the manufacturing tasks are obtained by decomposing each manufacturing order into a plurality of sequentially executed sub-tasks, each sub-task defining a process type and a standard processing time; the factory resources and machine capabilities comprise the distributed factories and the machines in each factory; the logistics parameters comprise the geographical coordinates of the factories and the warehouse, used to compute the transport distance and the corresponding transport time of sub-tasks between different factories; the optimization objective is to minimize the total makespan, defined as the total time span from the start of processing until the last sub-task completes processing and returns to the warehouse.
- 3. The cloud manufacturing scheduling method based on discrete event simulation and multi-agent reinforcement learning of claim 1, wherein in Step 2 the partially observable Markov decision process is defined as the tuple < O, S, A, R, Y, γ >, wherein: each manufacturing task is modeled as an independent agent; the local observation space O is the local information each agent can acquire at the decision moment; the global state space S is formed by concatenating the vector of remaining processing times of the tasks currently on each machine with all agents' local observations; the action space A has a masking mechanism, wherein each agent's action is to select one factory from the platform's factory set or to wait, and the selectable actions are constrained by a binary mask vector; the state transition function Y is driven by manufacturing-completion events and logistics-completion events in the discrete event simulation environment, realizing the system state update; the reward function R assigns, at each decision step, a reward equal to the negative of the simulation-clock advance between the two adjacent decisions; the discount factor γ adjusts the weight of future rewards in the cumulative reward.
- 4. The cloud manufacturing scheduling method based on discrete event simulation and multi-agent reinforcement learning according to claim 1, wherein in Step 3 the agent policy network is a deep neural network that computes each agent's action-value estimation vector from the observation matrix, and the value mixing network is a mixing network satisfying a monotonicity constraint that computes the global total action value from each agent's action-value estimation vector and the global state.
- 5. The cloud manufacturing scheduling method based on discrete event simulation and multi-agent reinforcement learning according to claim 4, wherein the agent policy network adopts an encoder-decoder structure: the encoder processes the observation matrix of all agents through a multi-head self-attention mechanism to generate a feature matrix fusing the information of all agents, and the decoder takes each agent's corresponding feature vector in the feature matrix as input and, through a feedforward neural network, outputs each agent's action-value estimation vector.
- 6. The cloud manufacturing scheduling method based on discrete event simulation and multi-agent reinforcement learning according to claim 4, wherein during training the value mixing network extracts, according to a greedy strategy, the scalar value of the corresponding action from each agent's action-value estimation vector to form a selected-action value vector, takes the selected-action value vector and the global state as input, and computes the global total action value through a feedforward network, wherein the weights of the feedforward network are generated by a hyper-network from the global state and have their absolute values taken.
- 7. The cloud manufacturing scheduling method based on discrete event simulation and multi-agent reinforcement learning according to claim 1, wherein Step 4 proceeds as follows: 1) at each decision point, based on each agent's current observation and action mask, compute each agent's action-value estimation with the policy network, and select an action from each agent's valid action space according to the exploration strategy and the decision order, forming the joint action; 2) action-event mapping: map each agent's factory-allocation action in the joint action to a manufacturing event or a logistics event according to the sequential position of the agent's current sub-task within its task, and record it in the manufacturing event list or the logistics event list accordingly; 3) determine and execute the event with the shortest remaining time from the manufacturing and logistics event lists, advance the simulation clock, and update the event lists, machine states and task states according to the event; 4) algorithm training: under the centralized training framework, sample historical data from the experience replay pool, compute the loss based on the temporal-difference target, and update the parameters of the policy network and the value mixing network by gradient descent.
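The makespan objective and coordinate-based transport times of claim 2 can be sketched as follows. This is a minimal illustration, not the patented model: the Euclidean distance metric, the constant transport speed, and all coordinates are assumptions introduced here, and the schedule is simplified to one serial task.

```python
import math

# Hypothetical coordinates (km) for the warehouse and two factories; the
# patent only states that coordinates are used to derive transport times,
# so the straight-line metric and the constant speed below are assumptions.
WAREHOUSE = (0.0, 0.0)
FACTORIES = {"F1": (30.0, 40.0), "F2": (60.0, 0.0)}
SPEED_KM_PER_H = 50.0

def transport_time(a, b):
    """Travel time between two sites, assuming Euclidean distance."""
    return math.dist(a, b) / SPEED_KM_PER_H

def makespan(assignments, proc_times):
    """Makespan of one serial task: warehouse -> chosen factories -> warehouse.

    `assignments` lists the factory chosen for each sub-task in order;
    `proc_times` lists each sub-task's processing time (hours).
    """
    t, here = 0.0, WAREHOUSE
    for factory, p in zip(assignments, proc_times):
        t += transport_time(here, FACTORIES[factory]) + p
        here = FACTORIES[factory]
    return t + transport_time(here, WAREHOUSE)  # return trip to warehouse
```

With the coordinates above, `makespan(["F1", "F2"], [2.0, 3.0])` accumulates 1.0 h of transport to F1, 2.0 h of processing, 1.0 h to F2, 3.0 h of processing, and 1.2 h back to the warehouse.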
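The action space of claim 3, with its binary mask vector, and the reward defined as the negative simulation-clock advance admit a compact sketch. Only masked greedy selection and the reward sign convention come from the claim; the array sizes and names below are illustrative assumptions.

```python
import numpy as np

N_FACTORIES = 4      # illustrative: actions 0..3 pick a factory,
WAIT = N_FACTORIES   # and the last action index means "wait"

def masked_greedy(q_values, mask):
    """Pick the highest-valued action among those the binary mask allows."""
    q = np.where(mask.astype(bool), q_values, -np.inf)
    return int(np.argmax(q))

def step_reward(clock_before, clock_after):
    """Reward R: negative of the clock advance between adjacent decisions."""
    return -(clock_after - clock_before)
```

An exploration strategy (e.g. epsilon-greedy over the masked actions) would wrap `masked_greedy` during training; the greedy form shown here is what deployment in Step 5 would use.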
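A rough sketch of the encoder-decoder policy network of claim 5, using plain NumPy with untrained random weights. The head count, hidden width, shared decoder across agents, and all sizes are assumptions introduced here; the patent only fixes the multi-head self-attention encoder and feedforward decoder structure.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def encoder(obs, heads):
    """Multi-head self-attention over the agent dimension: row i of the
    output is agent i's feature vector fused with all agents' information."""
    outs = []
    for w_q, w_k, w_v in heads:
        q, k, v = obs @ w_q, obs @ w_k, obs @ w_v
        att = softmax(q @ k.T / np.sqrt(w_q.shape[1]))  # (M, M) agent weights
        outs.append(att @ v)
    return np.concatenate(outs, axis=-1)                # (M, d) feature matrix

def decoder(features, w1, b1, w2, b2):
    """Shared feedforward head: per-agent action-value estimation vectors."""
    return np.maximum(features @ w1 + b1, 0.0) @ w2 + b2  # (M, |A|)

# Illustrative sizes: M=3 agents, d=8 observation features, 2 heads,
# |A|=5 actions (4 factories + wait).
M, D, H, A = 3, 8, 2, 5
heads = [tuple(rng.standard_normal((D, D // H)) for _ in range(3))
         for _ in range(H)]
w1, b1 = rng.standard_normal((D, 16)), np.zeros(16)
w2, b2 = rng.standard_normal((16, A)), np.zeros(A)

obs = rng.standard_normal((M, D))  # observation matrix, one row per agent
q_vals = decoder(encoder(obs, heads), w1, b1, w2, b2)
```

Because attention mixes rows before the decoder runs, each agent's Q-vector depends on every other agent's observation, which is how the claimed information fusion among agents arises.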
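The monotone value mixing of claim 6 can be sketched as follows, in the style of QMIX: hyper-networks map the global state to mixing weights whose absolute values are taken, so the global total action value never decreases when any agent's selected value increases. Layer sizes and the ReLU activation are assumptions of this sketch.

```python
import numpy as np

rng = np.random.default_rng(1)

def chosen_action_values(q_matrix):
    """Greedy extraction: the scalar value of each agent's selected action."""
    return q_matrix.max(axis=1)

def q_total(chosen_q, state, hyper_w1, hyper_w2):
    """Mix per-agent selected values into the global total action value.

    Both weight layers come from hyper-networks applied to the global
    state and are passed through abs(), which together with the monotone
    ReLU enforces dQ_tot/dQ_i >= 0 (the monotonicity constraint)."""
    n = chosen_q.shape[0]
    w1 = np.abs(state @ hyper_w1).reshape(n, -1)  # (n_agents, hidden)
    w2 = np.abs(state @ hyper_w2)                 # (hidden,)
    return float(np.maximum(chosen_q @ w1, 0.0) @ w2)

# Illustrative sizes: 3 agents, 4 actions, state dim 6, hidden width 8.
N, A, S, HID = 3, 4, 6, 8
hyper_w1 = rng.standard_normal((S, N * HID))
hyper_w2 = rng.standard_normal((S, HID))
state = rng.standard_normal(S)
q_matrix = rng.standard_normal((N, A))
q_sel = chosen_action_values(q_matrix)
```

Monotonicity is what lets the decentralized greedy actions of the individual agents also maximize the centralized Q_tot during training.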
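The event-driven core of Step 4 (claim 7, items 2 and 3), in which factory-allocation actions become manufacturing or logistics events, the event with the shortest remaining time fires next, and the reward is the negative clock advance, can be sketched with a heap. Class and method names are illustrative; a single heap stands in for the separate manufacturing and logistics event lists.

```python
import heapq

class Simulator:
    """Minimal discrete-event core: schedule events, fire the soonest one."""

    def __init__(self):
        self.clock = 0.0
        self.events = []  # min-heap of (finish_time, kind, task_id)

    def schedule(self, kind, task_id, duration):
        """Map an allocation action to a 'manufacture' or 'logistics' event
        that finishes `duration` time units from the current clock."""
        heapq.heappush(self.events, (self.clock + duration, kind, task_id))

    def advance(self):
        """Execute the event with the shortest remaining time: advance the
        clock to its finish and return ((kind, task_id), reward), where the
        reward is the negative of the clock advance."""
        finish, kind, task_id = heapq.heappop(self.events)
        reward = -(finish - self.clock)
        self.clock = finish
        return (kind, task_id), reward
```

After each `advance`, the real method would update machine and task states, let the affected agents decide again, and push the resulting transition into the prioritized replay pool for the temporal-difference update of item 4.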
Description
Cloud manufacturing scheduling method based on discrete event simulation and multi-agent reinforcement learning
Technical Field
The invention belongs to the technical field of intelligent manufacturing and cloud manufacturing service scheduling, and particularly relates to a cloud manufacturing scheduling method based on discrete event simulation and multi-agent reinforcement learning.
Background
A cloud manufacturing platform provides on-demand manufacturing services to users by integrating distributed manufacturing resources. In practice, customer orders are typically decomposed into sub-tasks that are scheduled and executed on manufacturing resources at different factories. Efficiently composing manufacturing services while optimizing the task scheduling order is a core challenge in cloud manufacturing systems. Existing research mainly covers scheduling rules, meta-heuristic algorithms and reinforcement-learning-based methods. In recent years, reinforcement-learning-based methods have been widely applied to cloud manufacturing service composition optimization, but most studies adopt a "sequence first, select later" paradigm: the scheduling order of the sub-tasks is fixed in advance, randomly or by some rule, and a manufacturing service is then selected for each sub-task. This approach has evident drawbacks: the scheduling order is fixed, the search space is limited, the method easily falls into local optima, and service selection and task scheduling are manually decoupled, making global optimization difficult. In addition, a single-agent perspective can hardly capture the competitive and cooperative relationships over manufacturing resources among multiple tasks.
Therefore, a reinforcement learning method is needed that breaks the fixed-sequence constraint, enables joint optimization of service selection and task scheduling, and effectively coordinates multi-task decisions.
Disclosure of Invention
Aiming at the problems in the prior art, the invention provides a cloud manufacturing scheduling method based on discrete event simulation and multi-agent reinforcement learning, which models each manufacturing task as an independent agent and embeds its decision process into a discrete event simulation environment, so as to realize synchronous joint optimization of service selection and task scheduling. The method comprises the following steps: Step 1, establishing a cloud manufacturing system model comprising manufacturing tasks, factory resources, machine capabilities and logistics parameters, and defining the optimization objective of the model as minimizing the total makespan; Step 2, formulating the scheduling optimization problem in the cloud manufacturing system model as a partially observable Markov decision process; Step 3, constructing a multi-agent deep reinforcement learning framework based on the centralized-training, decentralized-execution paradigm to solve the partially observable Markov decision process, wherein the framework comprises an agent policy network and a value mixing network; Step 4, training the constructed multi-agent deep reinforcement learning framework in a discrete event simulation environment to optimize its network parameters; Step 5, deploying the agent policy network trained in Step 4 to a cloud manufacturing platform for real-time scheduling decisions on newly arrived orders.
Further, in Step 1: the manufacturing tasks are obtained by decomposing each manufacturing order into a plurality of sequentially executed sub-tasks, each sub-task defining a process type and a standard processing time; the factory resources and machine capabilities comprise the distributed factories and the machines in each factory; the logistics parameters comprise the geographical coordinates of the factories and the warehouse, used to compute the transport distance and the corresponding transport time of sub-tasks between different factories; the optimization objective is to minimize the total makespan, defined as the total time span from the start of processing until the last sub-task completes processing and returns to the warehouse. Further, in Step 2, the partially observable Markov decision process is defined as the tuple < O, S, A, R, Y, γ >, wherein: each manufacturing task is modeled as an independent agent; the local observation space O is the local information each agent can acquire at the decision moment; the global state space S is formed by concatenating the vector of remaining processing times of the tasks currently on each machine with all agents' local observations; the action space A has a masking mechanism, wherein each agent's action is to select one factory from the platform's factory set or to wait, and the selectable actions are constrained by a binary mask vector.