
CN-122022304-A - Distributed non-replacement dynamic flow shop scheduling optimization method based on deep reinforcement learning

CN-122022304-A

Abstract

The invention discloses a distributed non-replacement (non-permutation) dynamic flow shop scheduling optimization method based on deep reinforcement learning, belonging to the technical field of distributed green production scheduling. Addressing the difficult problem of multi-objective dynamic scheduling in a distributed heterogeneous environment, the method aims to minimize the makespan (maximum completion time), the total energy consumption, and the tardiness cost. First, a heterogeneous graph model of the shop state is constructed, and a graph attention network extracts the topological features of operations and machines; second, a PPO decision model combined with an action-masking mechanism is built to output scheduling actions that satisfy the physical constraints; finally, a post-processing strategy based on the critical path exploits the energy-saving potential of non-critical operations. The invention can respond in real time to dynamic disturbances such as urgent order insertion, effectively balance production efficiency against green energy-saving indicators, and achieve efficient utilization of shop resources.

Inventors

  • ZHAN YAN
  • FU XINBO
  • Fang Zhouqi
  • CHEN QINGFENG
  • TANG HONGTAO
  • LU JIANXIA
  • Liu Saimiao
  • YANG WENJIE
  • SHEN LE
  • WU MINGXUAN

Assignees

  • Zhejiang University of Technology (浙江工业大学)

Dates

Publication Date
2026-05-12
Application Date
2026-01-23

Claims (7)

  1. A multi-objective distributed non-replacement flow shop scheduling method based on deep reinforcement learning, characterized by comprising the following steps: S1, environment modeling and initialization: construct a multi-objective distributed scheduling model that considers heterogeneous factories and dynamic events, establish the optimization objectives and constraint conditions, and initialize the production environment parameters; S2, heterogeneous graph state construction: model the dynamically changing shop state as an operation–machine heterogeneous graph, and characterize the process constraints and resource allocation relations through nodes and connecting edges; S3, feature extraction and embedding: embed the heterogeneous graph using a graph attention network, and extract high-dimensional features reflecting the global state by aggregating neighborhood information; S4, decision model construction: build a PPO model with an Actor-Critic architecture that outputs a joint operation–machine action distribution and a state value estimate; S5, model training and updating: based on a multi-objective hybrid reward and generalized advantage estimation, update the network parameters by back-propagating a clipped objective function; and S6, scheduling scheme generation: respond to dynamic events in real time, feed the instance to be solved into the model for inference, and decode the output actions to generate the final operation sequencing and machine allocation scheme.
  2. The method for optimizing distributed non-replacement dynamic flow shop scheduling based on deep reinforcement learning according to claim 1, wherein the environment modeling and initialization in step S1 constructs a multi-objective distributed scheduling model that considers heterogeneous factories and dynamic events, establishes the optimization objectives and constraint conditions, and initializes the production environment parameters, specifically comprising: step S1-1, defining the basic sets of the scheduling problem, comprising the set of workpieces to be processed, the set of factories distributed at different geographic locations with heterogeneous processing capabilities, and the set of operations of each workpiece, and loading the standard processing time of each operation in each heterogeneous factory, the energy consumption of each machine, and the due-date data of the workpieces; step S1-2, defining the dynamic event types, wherein the dynamic events at least comprise the random arrival of urgent orders, and urgent orders are assigned a priority weight higher than that of ordinary orders; step S1-3, establishing the multi-dimensional optimization objective, which simultaneously minimizes three performance indicators: the makespan reflecting production efficiency, the total energy consumption reflecting the level of green manufacturing, and the total tardiness cost reflecting customer satisfaction; and step S1-4, setting rigid constraint conditions, comprising the non-preemption constraints that the operations of a workpiece must be processed strictly in the order of its process route, that a machine can process only one workpiece at any time, and that an operation, once started, cannot be interrupted.
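The sets and parameters that step S1 loads can be sketched as a plain data model. This is an illustrative sketch only; all identifiers here (`Job`, `Factory`, `Instance`, `std_time`, and so on) are my own naming, not the patent's.

```python
from dataclasses import dataclass, field

@dataclass
class Job:
    """A workpiece: an ordered, non-preemptive route of operations."""
    job_id: int
    n_ops: int                 # operations must run strictly in route order
    due_date: float
    urgent: bool = False       # urgent orders get a higher priority weight
    priority: float = 1.0

@dataclass
class Factory:
    """A heterogeneous factory: per-machine speed/energy characteristics."""
    factory_id: int
    machine_speed: list        # speed factor per machine (heterogeneous)
    machine_power: list        # power draw per machine while processing

@dataclass
class Instance:
    jobs: list
    factories: list
    # std_time[(job, op)] -> standard processing time before speed scaling
    std_time: dict = field(default_factory=dict)

# Tiny illustrative instance: two jobs, one two-machine factory.
inst = Instance(
    jobs=[Job(0, n_ops=2, due_date=10.0),
          Job(1, n_ops=1, due_date=5.0, urgent=True, priority=2.0)],
    factories=[Factory(0, machine_speed=[1.0, 1.5], machine_power=[3.0, 5.0])],
    std_time={(0, 0): 4.0, (0, 1): 3.0, (1, 0): 2.0},
)
```

The dynamic-event constraint of step S1-2 is captured by the `urgent` flag and the higher `priority` weight on job 1.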
  3. The method for optimizing distributed non-replacement dynamic flow shop scheduling based on deep reinforcement learning according to claim 1, wherein the heterogeneous graph state construction in step S2 models the shop state as an operation–machine heterogeneous graph and characterizes the process constraints and resource allocation relations through nodes and connecting edges, specifically comprising: step S2-1, at any decision time, defining a heterogeneous graph based on the currently arrived workpieces and the machine states, wherein the node set comprises an operation node set and a machine node set; step S2-2, defining the operation node feature mapping, whose components are the normalized standard processing time, the number of remaining operations of the workpiece, the due-date lead time, the operation status flag, the workpiece urgency type flag (normal or urgent), and the workpiece priority value; step S2-3, defining the machine node features, whose components are the machine's current completion time, its accumulated energy consumption, its currently set running speed level index, and a machine capability factor characterizing the heterogeneous factory; and step S2-4, constructing the edge set, comprising precedence constraint edges connecting adjacent operations of the same workpiece, compatibility edges connecting each operation with its candidate machines, and fully connected edges linking all machine nodes.
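The node features and edge types of step S2 can be sketched as follows; the feature orderings and edge-list encoding are my own illustrative choices, following the component lists in steps S2-2 through S2-4.

```python
import numpy as np

def op_features(std_time, max_time, remaining_ops, due_slack,
                scheduled, urgent, priority):
    """Six-dimensional operation-node feature vector (claim step S2-2)."""
    return np.array([std_time / max_time,   # normalized processing time
                     remaining_ops,         # remaining operations of the job
                     due_slack,             # due-date lead time
                     float(scheduled),      # operation status flag
                     float(urgent),         # urgency type flag
                     priority])             # workpiece priority value

def machine_features(completion_time, energy_used, speed_level, capability):
    """Four-dimensional machine-node feature vector (claim step S2-3)."""
    return np.array([completion_time, energy_used, speed_level, capability])

# Edge lists (claim step S2-4): precedence (op -> next op of same job),
# compatibility (op -> candidate machine), full connection among machines.
prec_edges = [(0, 1)]                   # op 0 precedes op 1 of the same job
compat_edges = [(0, 0), (0, 1), (1, 1)]
machine_edges = [(0, 1), (1, 0)]

x_op = np.stack([op_features(4.0, 4.0, 2, 6.0, False, False, 1.0),
                 op_features(3.0, 4.0, 1, 6.0, False, False, 1.0)])
x_m = np.stack([machine_features(0.0, 0.0, 1, 1.0),
                machine_features(0.0, 0.0, 2, 1.5)])
```

Because operation and machine nodes have different feature dimensions (6 vs. 4), each node type would get its own input projection before the graph attention layers of step S3.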
  4. The method for optimizing multi-objective distributed non-replacement flow shop scheduling based on deep reinforcement learning according to claim 1, wherein the feature extraction and embedding in step S3 embeds the heterogeneous graph using a graph attention network and extracts high-dimensional features reflecting the global state by aggregating neighborhood information, specifically comprising: step S3-1, applying a learnable weight matrix W to linearly transform the node features and obtain the initial embedding h_i = W x_i; step S3-2, computing the attention coefficient between node i and its neighbor j as e_ij = LeakyReLU(a^T [h_i ∥ h_j]), wherein a is the attention weight vector and ∥ denotes vector concatenation; and step S3-3, normalizing the attention coefficients over each neighborhood with the softmax function to obtain α_ij, and aggregating features with a K-head multi-head attention mechanism, h_i′ = ∥_{k=1..K} σ(Σ_{j∈N(i)} α_ij^k W^k h_j), wherein K is the number of attention heads and σ is a nonlinear activation function; the final node embedding matrix is output after stacking L such layers.
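A minimal single-head version of the attention layer in step S3 can be written in NumPy; the multi-head variant would concatenate K such heads. This is a generic textbook-style GAT sketch, not the patent's implementation, and the random inputs are purely illustrative.

```python
import numpy as np

def leaky_relu(x, slope=0.2):
    return np.where(x > 0, x, slope * x)

def gat_layer(X, adj, W, a):
    """One single-head GAT layer following steps S3-1..S3-3:
    h_i = W x_i; e_ij = LeakyReLU(a^T [h_i || h_j]); alpha = softmax over
    each node's neighbors; output = ELU(sum_j alpha_ij * h_j)."""
    H = X @ W.T                                    # linear transform (S3-1)
    n = H.shape[0]
    e = np.full((n, n), -np.inf)                   # -inf = "not a neighbor"
    for i in range(n):
        for j in range(n):
            if adj[i, j]:
                e[i, j] = leaky_relu(a @ np.concatenate([H[i], H[j]]))
    e -= e.max(axis=1, keepdims=True)              # numerically stable softmax
    alpha = np.exp(e)
    alpha /= alpha.sum(axis=1, keepdims=True)      # normalization (S3-3)
    out = alpha @ H                                # neighborhood aggregation
    return np.where(out > 0, out, np.exp(np.minimum(out, 0)) - 1), alpha  # ELU

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 6))        # 4 nodes with 6 raw features each
adj = np.ones((4, 4), dtype=bool)  # fully connected demo graph
W = rng.normal(size=(8, 6))        # projects 6-d features to 8-d embeddings
a = rng.normal(size=16)            # attention vector over [h_i || h_j]
H_out, alpha = gat_layer(X, adj, W, a)
```

Each row of `alpha` is a probability distribution over the node's neighbors, which is exactly the softmax normalization the claim describes.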
  5. The method for optimizing multi-objective distributed non-replacement flow shop scheduling based on deep reinforcement learning according to claim 1, wherein the decision model construction in step S4 builds a PPO model with an Actor-Critic architecture and outputs the joint operation–machine action distribution and state value estimate, specifically comprising: step S4-1, constructing an Actor policy network that receives the node embeddings and outputs a logits vector over actions, wherein an action is defined as selecting an operation and assigning it to a machine; step S4-2, constructing an action mask vector that marks as infeasible any action violating a process constraint (e.g., the preceding operation is unfinished) or a machine constraint (e.g., the machine is unavailable), and marks all other actions as feasible; step S4-3, computing the masked joint action probability distribution by forcing the logits of infeasible actions to negative infinity before applying the softmax; and step S4-4, constructing a Critic value network that compresses the graph features into a graph-level state vector by global average pooling and outputs the state value estimate.
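The masking of step S4-3 is a standard trick: infeasible actions get a logit of negative infinity, so the softmax assigns them exactly zero probability. A minimal pure-Python sketch (illustrative logits and mask, not the patent's values):

```python
import math

def masked_action_probs(logits, mask):
    """Masked joint-action distribution (step S4-3): logits of infeasible
    (operation, machine) pairs are driven to -inf before the softmax, so
    those actions receive exactly zero probability."""
    masked = [l if feasible else -math.inf
              for l, feasible in zip(logits, mask)]
    mx = max(masked)                       # finite if any action is feasible
    exps = [math.exp(l - mx) for l in masked]
    z = sum(exps)
    return [e / z for e in exps]

# 2 operations x 2 machines flattened into 4 joint actions; actions 1 and 3
# violate a constraint (machine busy / predecessor unfinished).
logits = [1.0, 3.0, 0.5, 2.0]
mask = [True, False, True, False]
probs = masked_action_probs(logits, mask)
```

Note that feasible actions are renormalized among themselves, so the Actor only ever samples schedules that satisfy the physical constraints of step S1-4.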
  6. The method for optimizing multi-objective distributed non-replacement flow shop scheduling according to claim 1, wherein the model training and updating in step S5, based on the multi-objective hybrid reward and generalized advantage estimation, updates the network parameters by back-propagating the clipped objective function, specifically comprising: step S5-1, defining a multi-objective hybrid instant reward as a weighted sum of the reductions of the three indicators (makespan, total energy consumption, and tardiness cost), wherein the weight coefficients balance the objectives and a tardiness-penalty amplification factor for urgent orders guides the model to schedule high-priority urgent workpieces first; step S5-2, computing the advantage function with generalized advantage estimation, Â_t = Σ_{l≥0} (γλ)^l δ_{t+l}, with δ_t = r_t + γV(s_{t+1}) − V(s_t); and step S5-3, constructing the PPO total loss L(θ) = −E_t[min(r_t(θ)Â_t, clip(r_t(θ), 1−ε, 1+ε)Â_t)] + c₁L_value − c₂H(π_θ) and optimizing it by stochastic gradient descent, wherein r_t(θ) is the ratio between the new and old policies and H(π_θ) is the policy entropy term.
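The three pieces of step S5 (hybrid reward, GAE, clipped surrogate) are standard and can be sketched compactly. The objective weights, amplification factor, and discount values below are illustrative assumptions, not values from the patent.

```python
def hybrid_reward(d_makespan, d_energy, d_tardy, urgent,
                  w=(0.5, 0.3, 0.2), amp=3.0):
    """Step S5-1 sketch: weighted sum of per-step reductions of the three
    objectives; the tardiness weight is amplified for urgent jobs."""
    tardy_w = w[2] * (amp if urgent else 1.0)
    return w[0] * d_makespan + w[1] * d_energy + tardy_w * d_tardy

def gae(rewards, values, gamma=0.99, lam=0.95):
    """Step S5-2: delta_t = r_t + gamma*V(s_{t+1}) - V(s_t);
    A_t = sum_l (gamma*lam)^l * delta_{t+l}, computed backwards."""
    adv, running = [0.0] * len(rewards), 0.0
    for t in reversed(range(len(rewards))):
        v_next = values[t + 1] if t + 1 < len(values) else 0.0
        delta = rewards[t] + gamma * v_next - values[t]
        running = delta + gamma * lam * running
        adv[t] = running
    return adv

def clipped_surrogate(ratio, advantage, eps=0.2):
    """Step S5-3, one sample: min(r*A, clip(r, 1-eps, 1+eps)*A);
    the policy loss is the negation of its mean."""
    clipped = max(1.0 - eps, min(1.0 + eps, ratio))
    return min(ratio * advantage, clipped * advantage)
```

With gamma = lam = 1 the advantages reduce to plain returns minus values, which is an easy sanity check; the clipping keeps large policy-ratio steps from being rewarded, exactly the PPO stabilization the claim relies on.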
  7. The method for optimizing multi-objective distributed non-replacement flow shop scheduling based on deep reinforcement learning according to claim 1, wherein the scheduling scheme generation in step S6 performs model inference on the input instance to be solved and decodes the output actions into the final operation sequencing and machine allocation scheme, specifically comprising: step S6-1, performing inference on the instance with the trained policy network, decoding an initial scheduling scheme, and constructing the corresponding directed acyclic graph; step S6-2, computing the earliest start time and the latest completion time of each operation with the critical path method, and identifying the set of critical operations; step S6-3, for each operation on a non-critical path, computing its time float; and step S6-4, under the constraint that the float is not exceeded, adjusting the running speed level of the machine and selecting the most energy-efficient level, wherein the power and the processing speed of the machine at each speed level, together with the operation workload, determine the operation's energy, so that the total energy consumption is minimized while the makespan remains unchanged.
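The critical-path post-processing of steps S6-2 through S6-4 can be sketched on a tiny DAG. The graph, durations, and speed levels below are illustrative; the energy model assumed here is simply power × (workload / speed).

```python
def forward_backward(succ, dur, order):
    """Steps S6-2/S6-3 sketch: earliest start via a forward pass, latest
    finish via a backward pass over the scheduling DAG; an operation's
    float = latest_finish - earliest_start - duration (0 on the critical
    path). `order` must be a topological order of the nodes."""
    es = {v: 0.0 for v in order}
    for v in order:
        for w in succ.get(v, []):
            es[w] = max(es[w], es[v] + dur[v])
    makespan = max(es[v] + dur[v] for v in order)
    lf = {v: makespan for v in order}
    for v in reversed(order):
        for w in succ.get(v, []):
            lf[v] = min(lf[v], lf[w] - dur[w])
    slack = {v: lf[v] - es[v] - dur[v] for v in order}
    return slack, makespan

def best_speed_level(workload, levels, allowed_time):
    """Step S6-4 sketch: among (power, speed) levels, pick the speed of the
    lowest-energy level whose time workload/speed fits within the allowed
    time (current duration + float); fall back to the fastest level."""
    feasible = [(p * workload / s, s) for p, s in levels
                if workload / s <= allowed_time]
    return min(feasible)[1] if feasible else max(s for _, s in levels)

# Chain A(3) -> B(2) in parallel with C(6): C is critical, A and B have float.
succ = {"A": ["B"], "C": []}
dur = {"A": 3.0, "B": 2.0, "C": 6.0}
slack, makespan = forward_backward(succ, dur, ["A", "C", "B"])
# B has 1 unit of float, so it may be slowed to the cheapest feasible level.
lvl = best_speed_level(2.0, [(5.0, 1.5), (3.0, 1.0), (2.0, 0.7)],
                       dur["B"] + slack["B"])
```

In this example the slowest level (speed 0.7) still fits within B's float, so it is chosen and the makespan of 6 is preserved, which is the energy-saving mechanism the claim describes.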

Description

Distributed non-replacement dynamic flow shop scheduling optimization method based on deep reinforcement learning

Technical Field

The invention belongs to the technical field of distributed green production scheduling, and particularly relates to a dynamic distributed non-replacement flow shop scheduling optimization method based on deep reinforcement learning.

Background

With the manufacturing industry's transition toward intensive and intelligent production, distributed heterogeneous flow shops have become the dominant form of modern production. Unlike a homogeneous plant, machines in heterogeneous environments differ significantly in processing speed and energy consumption characteristics, which makes the matching of workpieces to machines critical. On this basis, a non-replacement (non-permutation) scheduling strategy is introduced: the processing sequence of workpieces on subsequent machines may be flexibly adjusted according to the real-time state. Although this breaks the efficiency bottleneck of traditional permutation scheduling, it makes the solution space explode exponentially. In addition, actual production must balance three conflicting objectives, minimizing the makespan, reducing the total energy consumption, and reducing the total tardiness cost, and requires real-time rescheduling capability to cope with dynamic disturbances such as urgent order insertion. Research on dynamic multi-objective scheduling methods that efficiently handle heterogeneous resource matching and non-permutation timing constraints is therefore of great significance for improving the flexibility and core competitiveness of manufacturing systems. For this high-dimensional, complex scheduling problem, prior-art schemes still face specific challenges in practical application.
Exact algorithms such as mixed-integer linear programming are limited by the huge search space of non-permutation scheduling and struggle to solve medium- and large-scale instances. Meta-heuristics such as genetic algorithms are widely applied, but their iterative search is time-consuming, struggles to meet the real-time response requirement of urgent-order-insertion scenarios, and easily falls into local convergence in multi-objective optimization. More importantly, most existing deep reinforcement learning methods represent the state with fixed-length vectors, omitting the connection topology among heterogeneous machines and the strict precedence constraints among operations; the model therefore struggles to perceive the graph-structural characteristics of the shop, and the generated schedules have low feasibility or fail to achieve an effective balance among multiple objectives.

Disclosure of Invention

In view of the low solving efficiency and poor generalization that commonly afflict prior-art approaches to the distributed non-replacement dynamic flow shop scheduling problem, the invention provides a distributed non-replacement dynamic flow shop scheduling optimization method based on deep reinforcement learning. It takes minimizing the makespan, the total energy consumption, and the total tardiness cost as its optimization objectives, trains an intelligent decision model by combining a graph attention network with the proximal policy optimization algorithm, and achieves efficient, green, low-tardiness collaborative optimized scheduling of distributed heterogeneous shop production under urgent order insertion.
In order to achieve the above purpose, the present invention provides the following technical solution. A distributed non-replacement dynamic flow shop scheduling optimization method based on deep reinforcement learning comprises the following steps: S1, environment modeling and initialization: construct a multi-objective distributed scheduling model that considers heterogeneous factories and dynamic events, establish the optimization objectives and constraint conditions, and initialize the production environment parameters; S2, heterogeneous graph state construction: model the dynamically changing shop state as an operation–machine heterogeneous graph, and characterize the process constraints and resource allocation relations through nodes and connecting edges; S3, feature extraction and embedding: embed the heterogeneous graph using a graph attention network, and extract high-dimensional features reflecting the global state by aggregating neighborhood information; S4, decision model construction: build a PPO model with an Actor-Critic architecture that outputs a joint operation–machine action distribution and a state value estimate; S5, model training and updating: update the network parameters by back-propagating a clipped objective function based o