
CN-121764010-B - Novel flexible manufacturing system dynamic scheduling method based on multi-agent reinforcement learning

CN 121764010 B

Abstract

The invention discloses a novel flexible manufacturing system dynamic scheduling method based on multi-agent reinforcement learning, relating to the technical field of intelligent manufacturing. The method comprises: constructing a state space driven by a cross-domain knowledge graph, fusing process knowledge and equipment capability for semantic embedding; establishing an agent cooperative network whose spatio-temporal heterogeneous graph topology adaptively evolves, dynamically adjusting the topology according to task dependency and equipment coupling degree; performing policy optimization with a progressive cooperative value decomposition network to achieve accurate credit assignment; and constructing a multi-scale closed-loop compensation decision mechanism that forms deep coupling and closed-loop feedback across the equipment layer, the production line layer and the system layer.

Inventors

  • Tian Xinkai
  • Yang Hongbing
  • Wang Ziyan
  • Lian Zihan

Assignees

  • Soochow University (苏州大学)

Dates

Publication Date
2026-05-08
Application Date
2026-03-04

Claims (10)

  1. A novel flexible manufacturing system dynamic scheduling method based on multi-agent reinforcement learning, characterized by comprising the following steps: acquiring raw target data in a flexible manufacturing system, and constructing a state space driven by a cross-domain knowledge graph based on the raw target data; mapping each manufacturing unit (equipment) in the system to an agent, and constructing an agent cooperative network with a spatio-temporal heterogeneous graph topology by taking the agents as nodes; acquiring a global situation representation of the agents from the agent cooperative network, and combining it with each agent's local observation information to make agent decisions and generate scheduling actions; executing the scheduling actions, calculating action rewards based on a preset global value function, and constructing corresponding experience samples, wherein each experience sample comprises the current state, the scheduling action, the action reward, the next state and a completion flag; and performing weighted sampling of high-value experience samples using a prioritized experience replay mechanism, and updating the policy parameters of the agents using a progressive target network.
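The interaction loop of claim 1 can be sketched as follows. This is a minimal illustration, not the patent's implementation: the `env` and agent interfaces (`observe`, `step`, `act`) are hypothetical names, while the five-field experience sample matches the claim's enumeration.

```python
from collections import namedtuple

# Experience sample as enumerated in the claim: current state, scheduling
# action, action reward, next state, and completion flag.
Experience = namedtuple("Experience",
                        ["state", "action", "reward", "next_state", "done"])

def collect_experience(env, agents, replay_buffer):
    """One decision step: each agent acts on the global situation
    representation plus its own local observation, and the joint
    transition is stored as an experience sample.
    `env` and the agent interface are illustrative assumptions."""
    state = env.observe()
    actions = {i: ag.act(state.global_repr, state.local_obs[i])
               for i, ag in enumerate(agents)}
    next_state, reward, done = env.step(actions)
    replay_buffer.append(Experience(state, actions, reward, next_state, done))
    return done
```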
  2. The flexible manufacturing system dynamic scheduling method based on multi-agent reinforcement learning according to claim 1, wherein the raw target data comprises equipment running states, process constraint relations and task demand information; and wherein the constructing a state space driven by a cross-domain knowledge graph based on the raw target data comprises the following steps: mapping the equipment running states, process constraint relations and task demand information in the flexible manufacturing system to initial feature vectors; and performing semantic embedding on the initial feature vectors based on a process knowledge base and an equipment capability map, generating state representation vectors fusing process knowledge through a heterogeneous relation propagation algorithm, and constructing a multi-level state space from the state representation vectors.
  3. The flexible manufacturing system dynamic scheduling method based on multi-agent reinforcement learning according to claim 2, wherein the performing semantic embedding on the initial feature vectors based on the process knowledge base and the equipment capability map, and the generating state representation vectors fusing process knowledge through the heterogeneous relation propagation algorithm, comprise: extracting the process relation types between processes from the process knowledge base, and constructing process relation triplets of the form (process A, process relation type, process B); extracting the capability attributes of each device and the compatibility relations between devices from the equipment capability map, and constructing equipment relation triplets, wherein the capability attributes comprise processing capability and precision grade, and the processing capability comprises the process types executable by the device; combining the process relation triplets and the equipment relation triplets into knowledge triplets to construct a knowledge triplet set; performing entity embedding and relation embedding on the knowledge triplet set using a knowledge graph embedding algorithm, and performing multi-hop neighborhood aggregation on the embedded vectors through a graph convolutional network to obtain entity representation vectors fused with semantic information; and performing cross-modal alignment between the initial feature vectors and the entity representation vectors, computing their semantic similarity through an attention mechanism, and performing weighted fusion on the initial feature vectors according to the semantic similarity to generate state representation vectors fusing the process knowledge.
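The final fusion step of claim 3 (attention-based semantic similarity followed by weighted fusion) can be sketched as below. This is a generic scaled-dot-product attention sketch, not the patent's disclosed network; the projection matrices `W_q` and `W_k` and the residual-style fusion are illustrative assumptions.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def fuse_with_knowledge(feat, entity_embs, W_q, W_k):
    """Attention-weighted fusion of an initial feature vector with entity
    representation vectors from the knowledge graph.
    feat: (d,) initial feature vector; entity_embs: (n, d) entity vectors.
    Returns a state representation vector of the same dimension as feat."""
    q = W_q @ feat                               # query from the raw feature
    keys = entity_embs @ W_k.T                   # keys from KG entity embeddings
    sim = softmax(keys @ q / np.sqrt(len(q)))    # semantic similarity weights
    context = sim @ entity_embs                  # weighted sum of entity vectors
    return feat + context                        # fused state representation
```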
  4. The flexible manufacturing system dynamic scheduling method based on multi-agent reinforcement learning according to claim 1, wherein the constructing an agent cooperative network with a spatio-temporal heterogeneous graph topology by taking the agents as nodes further comprises: calculating dynamic weights between agents based on task dependency and equipment coupling degree, and adaptively adjusting the communication edges of the spatio-temporal heterogeneous graph topology according to the dynamic weights; and performing bidirectional self-attention encoding on the node features and edge features in the spatio-temporal heterogeneous graph topology to generate the agent global situation representation fusing spatio-temporal information.
  5. The flexible manufacturing system dynamic scheduling method based on multi-agent reinforcement learning of claim 4, wherein the calculating dynamic weights between agents based on task dependency and equipment coupling degree comprises: acquiring the task information currently executed by each agent, and determining the task dependency between agents based on the process precedence relations in the task information, wherein the task dependency takes a preset high weight value if a direct process dependency exists between two agents, and takes a preset medium weight value if an indirect process dependency exists; acquiring the physical layout information and resource sharing information of the equipment to which each agent belongs, determining the physical distance between devices from the physical layout information, determining a preliminary equipment coupling degree on the principle that the shorter the distance, the higher the coupling degree, determining a weight factor of the equipment coupling degree from the resource sharing information, and multiplying the preliminary coupling degree by the weight factor to obtain the final equipment coupling degree; and performing nonlinear fusion of the task dependency and the equipment coupling degree to obtain the dynamic weights between agents; and wherein the adaptively adjusting the communication edges of the spatio-temporal heterogeneous graph topology according to the dynamic weights comprises: setting a dynamic weight threshold, establishing a communication edge between the nodes of the corresponding agents when the dynamic weight is greater than the threshold, and deleting the communication edge between the nodes of the corresponding agents when the dynamic weight is less than or equal to the threshold.
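The weight computation and threshold rule of claim 5 can be sketched as follows. The patent fixes neither the preset weight values nor the fusion function, so the constants, the exponential distance decay, the resource-sharing factor and the sigmoid fusion here are all illustrative assumptions.

```python
import numpy as np

# Illustrative stand-ins for the claim's preset high / medium weight values.
DIRECT_DEP, INDIRECT_DEP, NO_DEP = 1.0, 0.5, 0.0

def dynamic_weight(task_dep, distance, shares_resources, k=1.0):
    """Nonlinear fusion of task dependency and equipment coupling degree.
    Coupling decays with physical distance (closer -> higher) and is
    scaled by a resource-sharing weight factor; the sigmoid is one
    possible choice of nonlinear fusion."""
    coupling = np.exp(-distance) * (1.5 if shares_resources else 1.0)
    return 1.0 / (1.0 + np.exp(-k * (task_dep + coupling - 1.0)))

def update_edges(weights, threshold=0.5):
    """Keep a communication edge (i, j) iff its dynamic weight exceeds
    the dynamic weight threshold, per the adaptive topology rule."""
    return {(i, j) for (i, j), w in weights.items() if w > threshold}
```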
  6. The flexible manufacturing system dynamic scheduling method based on multi-agent reinforcement learning of claim 4, wherein the bidirectional self-attention encoding of node features and edge features in the spatio-temporal heterogeneous graph topology comprises: updating the node features according to each agent's task execution progress, wherein the node features comprise equipment state vectors, task queue vectors and performance index vectors, performing temporal encoding on the node features, and capturing their temporal evolution patterns; and updating the edge features according to the material flow and resource competition relations between agents, wherein the edge features comprise communication delay, cooperation strength and conflict probability, performing spatial encoding on the edge features, and capturing the spatial interaction patterns between agents; and wherein the generating the agent global situation representation fusing spatio-temporal information comprises: inputting the updated node features and edge features into a spatio-temporal graph neural network, and aggregating information between agents through a message passing mechanism to generate the agent global situation representation fusing spatio-temporal information.
  7. The flexible manufacturing system dynamic scheduling method based on multi-agent reinforcement learning according to claim 1, wherein the global value function is the expected cumulative reward of the joint actions of all agents in the current state, and its value is the sum of the agents' local value functions plus a cooperative value function between agents; the local value function of each agent is computed by the agent's independent value network, whose input is the agent's local observation information and whose output is the estimated value of the agent's scheduling action; and the cooperative value function is computed by a mixing network, whose inputs are the local value function outputs of all agents and the global situation representation, wherein mixing weights are generated by a hypernetwork and used to form a weighted combination of the agents' local value functions, and a nonlinear cooperative term is introduced to capture the cooperative effects between agents, generating the cooperative value function between agents.
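The decomposition of claim 7 follows the general pattern of QMIX-style value mixing. The patent does not disclose its exact "progressive cooperative value decomposition network", so the sketch below only shows that generic pattern: a hypernetwork conditioned on the global situation representation produces non-negative mixing weights, which makes the joint value monotone in each local value and so keeps per-agent credit assignment consistent with greedy local action selection.

```python
import numpy as np

def mixed_value(local_qs, global_state, hyper_w, hyper_b):
    """QMIX-style mixing sketch (illustrative, not the patent's network).
    local_qs: (n,) local value estimates, one per agent.
    global_state: (d,) global situation representation.
    hyper_w: (n, d), hyper_b: (k, d) hypernetwork weight matrices."""
    w = np.abs(hyper_w @ global_state)   # non-negative mixing weights
    b = hyper_b @ global_state           # state-dependent cooperative input
    # weighted combination of local values + nonlinear cooperative term
    return float(w @ local_qs + np.tanh(b.sum()))
```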
  8. The flexible manufacturing system dynamic scheduling method based on multi-agent reinforcement learning of claim 1, wherein the weighted sampling of high-value experience samples using a prioritized experience replay mechanism comprises: storing the experience samples generated by agent interaction in an experience replay pool; calculating a priority weight for each experience sample, wherein the priority weight is determined from the absolute value of the temporal-difference (TD) error, and experience samples with larger TD errors receive higher priority weights; and during training, performing weighted sampling of the experience samples according to the priority weights, wherein the sampling probability is proportional to the priority weight; and wherein the updating the policy parameters of the agents using the progressive target network comprises: computing the policy update gradient from the TD error, and introducing importance-sampling weights to correct the policy update gradient, wherein the importance-sampling weight is inversely proportional to the sampling probability.
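The sampling mechanism of claim 8 can be sketched as a minimal proportional prioritized replay buffer. The `alpha`/`beta` exponents are conventional defaults, not values from the patent; only the proportionality of sampling probability to priority and the inverse relation of the importance-sampling weight are stated in the claim.

```python
import numpy as np

class PrioritizedReplay:
    """Minimal proportional prioritized experience replay sketch.
    Priority = (|TD error| + eps)^alpha; the importance-sampling weight
    (N * P(i))^-beta is inversely proportional to sampling probability."""

    def __init__(self, alpha=0.6, beta=0.4):
        self.alpha, self.beta = alpha, beta
        self.samples, self.priorities = [], []

    def add(self, sample, td_error, eps=1e-6):
        self.samples.append(sample)
        self.priorities.append((abs(td_error) + eps) ** self.alpha)

    def sample(self, batch_size, rng):
        p = np.asarray(self.priorities)
        probs = p / p.sum()
        idx = rng.choice(len(self.samples), size=batch_size, p=probs)
        # IS correction, normalized by the max weight for stability
        w = (len(self.samples) * probs[idx]) ** (-self.beta)
        return [self.samples[i] for i in idx], w / w.max()
```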
  9. The multi-agent reinforcement learning based flexible manufacturing system dynamic scheduling method of claim 1, further comprising: continuously monitoring the flexible manufacturing system and identifying disturbance events; and, when a disturbance event is detected, generating a local adjustment strategy at the equipment level, a global rescheduling scheme at the production line level, and a resource reconfiguration instruction at the system level, fusing the local adjustment strategy, the global rescheduling scheme and the resource reconfiguration instruction into a comprehensive compensation decision through a multi-scale closed-loop fusion mechanism, and, after feasibility verification, coordinating the comprehensive compensation decision with the normal scheduling strategy to form and execute a complete adaptive scheduling instruction sequence.
  10. The multi-agent reinforcement learning based flexible manufacturing system dynamic scheduling method of claim 9, wherein the continuously monitoring the flexible manufacturing system and identifying disturbance events comprises: collecting real-time sensing data from each device in the flexible manufacturing system, preprocessing the data and extracting features to generate the current device state; performing temporal modeling of the device state under the normal operation mode based on a long short-term memory (LSTM) network, learning the state evolution pattern under normal operation, and calculating the deviation of the current device state from the normal-operation device state; when the deviation exceeds a preset threshold, triggering a disturbance event detection flow and identifying the disturbance event type through multi-classifier fusion, wherein the disturbance event types comprise equipment fault, material shortage, process abnormality and order change; and determining the level at which a compensation decision needs to be made according to the disturbance event type and its scope of influence, wherein the levels comprise the equipment level, the production line level and the system level.
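The threshold-triggered detection flow of claim 10 can be sketched as follows. The patent uses an LSTM as the normal-operation predictor and multi-classifier fusion for type identification; in this sketch any one-step predictor and any classifier callable stand in for those components, which are assumptions, not the disclosed models.

```python
import numpy as np

def deviation_score(current_state, predicted_state):
    """Deviation of the observed device state from the state predicted by
    a normal-operation model (an LSTM in the patent; any one-step
    predictor can stand in here)."""
    return float(np.linalg.norm(current_state - predicted_state))

def detect_disturbance(current_state, predicted_state, threshold, classify):
    """Trigger the disturbance-identification flow only when the deviation
    exceeds the preset threshold; `classify` stands in for the patent's
    multi-classifier fusion step. Returns None under normal operation."""
    if deviation_score(current_state, predicted_state) > threshold:
        return classify(current_state)  # e.g. 'equipment fault'
    return None
```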

Description

Novel flexible manufacturing system dynamic scheduling method based on multi-agent reinforcement learning

Technical Field

The invention relates to the technical field of intelligent manufacturing, and in particular to a novel flexible manufacturing system dynamic scheduling method based on multi-agent reinforcement learning.

Background

The flexible manufacturing system is a core carrier for realizing intelligent manufacturing, characterized by diverse product varieties, frequent batch changes and reconfigurable equipment. Traditional scheduling methods for flexible manufacturing systems mainly adopt strategies such as mathematical programming, heuristic algorithms and rule-based scheduling; these methods handle scheduling problems well in deterministic environments, but struggle to respond quickly and adjust adaptively in the face of dynamic disturbances and uncertainty. In recent years, artificial intelligence, and reinforcement learning in particular, has shown great potential in the field of dynamic scheduling, and multi-agent reinforcement learning, with its distributed decision-making and collaborative optimization capabilities, is an effective way to solve the scheduling problem of complex flexible manufacturing systems. Existing multi-agent reinforcement learning scheduling methods, however, still have many shortcomings. Chinese patent CN120103803A discloses a flexible production line adaptive scheduling control method incorporating multi-agent reinforcement learning, which maps devices to agents, establishes a local communication network by constructing state and action spaces, and updates a state-action value function using an experience replay pool. This method has the following problems. First, the state space lacks deep fusion of process knowledge: it performs only a simple mapping based on raw feature vectors, so the rich process knowledge and equipment capability information in the manufacturing system cannot be effectively utilized, resulting in insufficient state representation. Second, the agent communication network adopts a static or semi-static topology, and the communication pattern cannot be adjusted in real time according to dynamic changes in tasks and equipment states, so cooperation between agents is inefficient. Third, the value function update adopts a traditional centralized or simple distributed method, which cannot effectively handle the credit assignment problem among multiple agents, making it difficult to accurately evaluate each agent's individual contribution and cooperative gain. Fourth, the compensation scheduling mechanism lacks multi-level closed-loop coordination: compensation decisions are made at a single level only, and deep coupling and closed-loop feedback between the equipment layer, the production line layer and the system layer cannot be formed, so disturbance recovery capability is insufficient.

Disclosure of Invention

The invention aims to overcome the above defects in the prior art and provides a novel flexible manufacturing system dynamic scheduling method based on multi-agent reinforcement learning, which solves the technical problems of insufficient state representation, low agent cooperation efficiency, inaccurate credit assignment and weak disturbance recovery capability in the prior art, and can realize intelligent adaptive dynamic scheduling of a flexible manufacturing system. To achieve the above purpose, the invention adopts the following technical scheme. The invention provides a novel flexible manufacturing system dynamic scheduling method based on multi-agent reinforcement learning, comprising the following steps: acquiring raw target data in a flexible manufacturing system, and constructing a state space driven by a cross-domain knowledge graph based on the raw target data; mapping each manufacturing unit (equipment) in the system to an agent, and constructing an agent cooperative network with a spatio-temporal heterogeneous graph topology by taking the agents as nodes; acquiring a global situation representation of the agents from the agent cooperative network, and combining it with each agent's local observation information to make agent decisions and generate scheduling actions; executing the scheduling actions, calculating action rewards based on a preset global value function, and constructing corresponding experience samples, wherein each experience sample comprises the current state, the scheduling action, the action reward, the next state and a completion flag; and performing weighted sampling of high-value experience samples using a prioritized experience replay mechanism, and updating the policy parameters of the agents using a progressive target network. Optionally, the ori