CN-113496347-B - Apparatus and method for scheduling a set of jobs for a plurality of machines
Abstract
Apparatus and methods are provided for scheduling a set of jobs for a plurality of machines. Method for scheduling a set of jobs for a plurality of machines, wherein each job is defined by at least one feature characterizing the processing time of the job, wherein if any machine is idle, one job is selected from the set of jobs to be carried out by the machine and is scheduled for the machine, wherein the jobs are selected in that a graphical neural network receives as input the set of jobs and at least the current state of the idle machine, the graphical neural network outputs rewards for each job if started on the machine, the state is input into the graphical neural network, and the job for the idle machine is selected in dependence of the graphical neural network output.
Inventors
- A. Tattler
- C. DANIEL
- D. D. Castro
- F. M. Richter
- J. Olen
- M. Lefarov
- N. M. dizbin
- Z. Feldman
Assignees
- 罗伯特·博世有限公司
Dates
- Publication Date
- 20260508
- Application Date
- 20210401
- Priority Date
- 20200403
Claims (9)
- 1. A method for scheduling a set of jobs on a plurality of machines, each job defined by at least one characteristic characterizing a processing time of the job, the method comprising: When any of the plurality of machines is an idle machine: selecting a job from a set of jobs to be performed by the idle machine, Scheduling the selected job on the idle machine, wherein each of the plurality of machines is a cutter for cutting or a gun drill for drilling in the manufacturing system, Wherein the job is selected as follows: The graph neural network receives as input a bipartite and undirected graph comprising two disjoint sets of independent nodes and a set of edges, wherein a first set of nodes comprises at least a current state of an idle machine and a second set of nodes comprises a set of jobs, and the edges characterize connections between nodes of the first set of nodes and nodes of the second set of nodes and are undirected, The graphic neural network outputs rewards for each job initiated on at least one of the plurality of machines, the status of which is input to the graphic neural network, and Selecting a job for the idle machine depending on rewards output by the graph neural network to perform the selected job with minimized total completion time, wherein parameterization of the graph neural network is optimized by deep Q network learning And wherein, for selecting a subsequent job, a Monte Carlo tree search is applied, wherein the Monte Carlo tree search iteratively builds a search tree starting from a current state of each of the plurality of machines and the set of jobs, wherein, for expanding the search tree, an output reward of the graph neural network is used as a search heuristic, wherein the subsequent job involves physical execution of the task by the idle machine, and wherein the subsequent job is selected depending on the search tree, and The generated control signal is used to control the idle machine to perform physical execution of cutting or drilling in subsequent jobs, Performing physical execution of the subsequent job by the idle machine, wherein the idle machine comprises a robot, wherein the control signal is generated in accordance with the subsequent job, and wherein performing comprises applying the control signal to cause the robot to perform physical execution of the subsequent job.
- 2. The method of claim 1, wherein the jobs are defined by further features characterizing one of a plurality of classes, wherein each of the plurality of classes characterizes a corresponding setting for a plurality of machines required to perform the corresponding job, wherein a current state of each machine also characterizes its current setting, and wherein the graph neural network receives further input that is a time penalty when a machine has to be reconfigured by changing the setting of any machine due to two immediately successive jobs for different classes of the same machine.
- 3. The method of claim 1, wherein additional jobs arrive at a particular point in time and additional jobs are added to the job set.
- 4. The method of claim 1, wherein parameterization of the graph neural network is optimized by reinforcement learning 。
- 5. The method of claim 1, wherein to expand the search tree, an action is selected based on a confidence cap.
- 6. The method of claim 1, wherein the search tree is pruned by considering only a predetermined number of jobs having a highest Q value for at least one test run.
- 7. The method of claim 1, wherein the status of each of the plurality of machines is determined in dependence upon the received sensor signal, wherein the status of each of the plurality of machines further comprises a current load status and/or a parameter indicative of maintenance requirements of a respective one of the plurality of machines.
- 8. A non-transitory machine-readable storage medium having stored thereon a computer program for scheduling a set of jobs on a plurality of machines, each job defined by at least one feature characterizing a processing time of the job, the computer program when executed by a processor causes the processor to perform: When any of the plurality of machines is an idle machine: selecting a job from a set of jobs to be performed by the idle machine, Scheduling the selected job on the idle machine, wherein each of the plurality of machines is a cutter for cutting or a gun drill for drilling in the manufacturing system, Wherein the job is selected as follows: The graph neural network receives as input a bipartite and undirected graph comprising two disjoint sets of independent nodes and a set of edges, wherein a first set of nodes comprises at least a current state of an idle machine and a second set of nodes comprises a set of jobs, and the edges characterize connections between nodes of the first set of nodes and nodes of the second set of nodes and are undirected, The graphic neural network outputs rewards for each job initiated on at least one of the plurality of machines, the status of which is input to the graphic neural network, and Selecting a job for the idle machine depending on rewards output by the graph neural network to perform the selected job with minimized total completion time, wherein parameterization of the graph neural network is optimized by deep Q network learning And wherein, for selecting a subsequent job, a Monte Carlo tree search is applied, wherein the Monte Carlo tree search iteratively builds a search tree starting from a current state of each of the plurality of machines and the set of jobs, wherein, for expanding the search tree, an output reward of the graph neural network is used as a search heuristic, wherein the subsequent job involves physical execution of the task by the idle machine, and wherein the subsequent job is selected depending on the search tree, and The generated control signal is used to control the idle machine to perform physical execution of cutting or drilling in subsequent jobs, Performing physical execution of the subsequent job by the idle machine, wherein the idle machine comprises a robot, wherein the control signal is generated in accordance with the subsequent job, and wherein performing comprises applying the control signal to cause the robot to perform physical execution of the subsequent job.
- 9. A system configured to schedule a set of jobs on a plurality of machines, each job defined by at least one feature characterizing a processing time of the job, the system configured to: When any of the plurality of machines is an idle machine: selecting a job from a set of jobs to be performed by the idle machine, Scheduling the selected job on the idle machine, wherein each of the plurality of machines is a cutter for cutting or a gun drill for drilling in the manufacturing system, Wherein the job is selected as follows: The graph neural network receives as input a bipartite and undirected graph comprising two disjoint sets of independent nodes and a set of edges, wherein a first set of nodes comprises at least a current state of an idle machine and a second set of nodes comprises a set of jobs, and the edges characterize connections between nodes of the first set of nodes and nodes of the second set of nodes and are undirected, The graphic neural network outputs rewards for each job initiated on at least one of the plurality of machines, the status of which is input to the graphic neural network, and Selecting a job for the idle machine depending on rewards output by the graph neural network to perform the selected job with minimized total completion time, wherein parameterization of the graph neural network is optimized by deep Q network learning And wherein, for selecting a subsequent job, a Monte Carlo tree search is applied, wherein the Monte Carlo tree search iteratively builds a search tree starting from a current state of each of the plurality of machines and the set of jobs, wherein, for expanding the search tree, an output reward of the graph neural network is used as a search heuristic, wherein the subsequent job involves physical execution of the task by the idle machine, and wherein the subsequent job is selected depending on the search tree, and The generated control signal is used to control the idle machine to perform physical execution of cutting or drilling in subsequent jobs, Performing physical execution of the subsequent job by the idle machine, wherein the idle machine comprises a robot, wherein the control signal is generated in accordance with the subsequent job, and wherein performing comprises applying the control signal to cause the robot to perform physical execution of the subsequent job.
Description
Apparatus and method for scheduling a set of jobs for a plurality of machines Technical Field The present invention relates to a job scheduler for a machine, in particular a production machine, for continuously selecting jobs via a neural network. The invention further relates to a computer program and a machine readable storage medium and a system configured to implement the job scheduler. Background Prior Art Optimization of throughput or resource utilization is challenging and has the potential for significant cost savings in manufacturing plants and significant potential for increased productivity. One challenge is the combined nature of the problems. There are many possible schedules and the problem of finding the best schedule is often NP-hard. Another challenge is its online (or dynamic) nature. Additional tasks arrive during the current scheduled execution. The nature of the arriving jobs is not known until they arrive. Mao et al use reinforcement learning algorithm REINFORCE embedded with graph convolution for job scheduling on clusters. THE ADVANTAGES OF THE PRESENT INVENTION The present invention discloses a job scheduler based on reinforcement learning and preferably further based on global search, more precisely by combining a graph rolling network trained with DQN (deep Q network) algorithm and with MCTS (monte carlo tree search). The advantages are given by (1) the job scheduler has good performance when scheduling decisions need to be made quickly. In this case, there is no time to search and it is suggested to create a schedule based on only the value network of the DQN. Experiments have shown that the scheduling created by the DQN value network has been superior to standard heuristics for scheduling, such as rule-based scheduling, e.g. weighted shortest processing time heuristics. (2) The job scheduler may fully utilize any available time for interrogation. When some time is available for optimal scheduling via MCTS. Since MCTS is an anytime algorithm, it can stop at any time and produce an output schedule. Because the output of the value network of the DQN is used as a search heuristic, the scheduling of the output is at least as good as the scheduling generated by the value network alone. The longer the MCTS is running, the output continues to improve. Given enough time, the MCTS can reach optimal scheduling. (3) The job scheduler may take into account information about how additional tasks arrive over time. Disclosure of Invention In a first aspect of the invention, a method for scheduling a plurality of jobs is disclosed. Preferably, it is a computer-implemented method. In particular, the allocation and/or ordering of work within a time slot to an industrial actuator that is a machine may be referred to as scheduling. The job set contains all jobs that have to be executed and are currently waiting to be executed. A job may be understood as a task that has to be performed by a machine. Multiple machines may be the same or similar machines and may operate in parallel. The machines may all be in one plant or distributed within several plants. Each job is defined by at least a first characteristic characterizing a job processing time. The processing time may be the duration that the machine needs in order to fulfill the corresponding job. Optionally, the second feature characterizes the importance of the job. If any machine is idle, a job is selected from a set of jobs to be performed by the machine and scheduled for the machine. Ideally, the jobs are selected such that the total completion time of the job set is minimized. Under an idle machine, it is understood that the machine has completed its current job/the current job is canceled and the machine is able to begin a new job or the machine has not begun any jobs. If a machine performs a job, the corresponding state of the machine may be referred to as occupied. The job is selected as follows. The Graph Neural Network (GNN) receives as input a set of jobs and a current state of at least an idle machine. GNNs are trained to estimate rewards that will be achieved when a job is performed such that the total completion time is minimized. More precisely, GNNs are trained to output high rewards for small completion times of job sets. The current state of the machine may include the current settings and the remaining processing time. The state of the machine may further include, for example, the class for which they are currently set, whether they are currently occupied by a job, and if so, the remaining processing time. Alternatively, the duration until the next maintenance operation may be carried out. It is also possible that the GNN receives as input the status of all machines. The irrelevant machines and jobs are then preferably masked off, for example by zero. If the assumption is started on the machine, the GNN outputs a reward for the job set, particularly the expected cumulative discount, and the status is entered into the GNN. It can