CN-122022267-A - Multi-enterprise-oriented loosely-coupled MDP Lagrange collaborative optimization system and method


Abstract

The invention discloses a multi-enterprise-oriented loosely coupled MDP Lagrangian collaborative optimization system and method, belonging to the technical field of port supply chain collaborative scheduling. The invention adopts an optimization framework that combines LC-MDP sub-models, Lagrangian relaxation, and a global coordination function C: cross-enterprise business is decomposed into a plurality of independently solvable LC-MDP sub-models, global constraints are handled through Lagrangian relaxation with iteratively updated multipliers, policy fusion and conflict resolution are realized by the global coordination function C, and a three-layer architecture with a feedback mechanism adapts dynamically to supply chain disturbances. The invention thereby enables scalable collaborative optimization of large-scale multi-enterprise business, supports distributed training and real-time scheduling, addresses resource conflicts and dynamic decision-making, and improves the collaborative efficiency and stability of the supply chain.

Inventors

QIU XIAOPING
LI PENGFEI
LIU HAIXIANG
XU MINGYUAN
MA YUHANG
FENG CHONG
CHEN MINGYU

Assignees

Southwest Jiaotong University (西南交通大学)
Tangshan Research Institute of Southwest Jiaotong University (西南交通大学唐山研究院)

Dates

Publication Date: 2026-05-12
Application Date: 2025-12-31

Claims (9)

1. A multi-enterprise-oriented loosely coupled MDP Lagrangian collaborative optimization system, characterized in that the system is applied to integrated port and clustered supply chain environments and adopts a three-layer architecture comprising a business layer, an optimization decision layer, and an execution feedback layer, the layers cooperating in sequence to realize collaborative optimization of multi-enterprise business; the business layer provides business state data and receives fed-back execution results; the optimization decision layer builds an optimization model from the business state data, performs constraint relaxation and iterative solution, and generates a unified collaborative policy; and the execution feedback layer receives and executes the unified collaborative policy, monitors the execution process, and feeds data back to the optimization decision layer, thereby realizing dynamic adaptation.
2. The multi-enterprise-oriented loosely coupled MDP Lagrangian collaborative optimization system of claim 1, wherein the business layer comprises a loading and unloading operation system, a warehouse management system, a transportation scheduling system, and other business systems, each of which operates independently and outputs its task state, equipment state, resource occupancy state, and environmental state data to the optimization decision layer; the optimization decision layer comprises a data acquisition and state construction module, an LC-MDP sub-model management module, a Lagrangian relaxation module, and a global coordination function C module, which work cooperatively: the data acquisition and state construction module acquires the data output by the business layer and constructs a system state in a unified format; the LC-MDP sub-model management module constructs and manages a plurality of loosely coupled Markov decision process (LC-MDP) sub-models, each corresponding to one cross-enterprise business link; the Lagrangian relaxation module establishes a set of global resource and time window constraints, constructs a Lagrangian function, and initializes and updates the Lagrange multiplier $\lambda$; the global coordination function C module receives the local policies of the LC-MDP sub-models, performs resource conflict resolution, task dependency sequencing, and policy fusion, and generates a global coordination policy; the execution feedback layer comprises a collaborative policy execution module and an execution monitoring and feedback module: the collaborative policy execution module receives the unified collaborative policy output by the optimization decision layer and issues it to each system of the business layer for execution; and the execution monitoring and feedback module monitors policy execution in real time, collects execution data including resource occupancy data, constraint violation data, and task completion data, and feeds the data back to the data acquisition and state construction module of the optimization decision layer.
3. A multi-enterprise-oriented loosely coupled MDP Lagrangian collaborative optimization method, characterized by comprising the following steps: S1, decomposition: decomposing the overall cross-enterprise business into a plurality of loosely coupled Markov decision process (LC-MDP) sub-models, wherein each LC-MDP sub-model comprises a state space module, a decision and policy module, and a reward and penalty module; S2, Lagrangian relaxation of global constraints: establishing a set of global resource and time window constraints, constructing a Lagrangian function, and initializing a Lagrange multiplier $\lambda$, where $\lambda$ is the penalty coefficient vector for the constraints; S3, solving the sub-problems: given the Lagrange multiplier $\lambda$, solving each LC-MDP sub-model independently and in parallel to obtain the optimal local policy of each sub-model and the corresponding expected resource occupancy vector; S4, solving the master problem (multiplier update): adjusting the Lagrange multiplier $\lambda$ according to a comparison of the expected resource occupancy vectors of all sub-models with the constraint upper-limit vector, increasing $\lambda$ if resource occupancy exceeds capacity and decreasing $\lambda$ if the resource is idle, and iterating until the dual converges; S5, policy integration: fusing the optimal local policies of all sub-models through a global coordination function C to perform resource conflict resolution, task dependency sequencing, and unification of policy time scales, and generating a unified collaborative policy; and S6, execution and feedback: issuing the unified collaborative policy, collecting execution data through an execution monitoring and feedback module, feeding the execution data back to an optimization decision layer, and dynamically adapting to supply chain disturbances.
4. The multi-enterprise-oriented loosely coupled MDP Lagrangian collaborative optimization method of claim 3, wherein in S1, the LC-MDP sub-models include at least two of a ship unloading operation scheduling sub-model, a port transfer scheduling sub-model, a warehouse entry management sub-model, a transport vehicle scheduling sub-model, and an outbound and distribution sub-model.
5. The multi-enterprise-oriented loosely coupled MDP Lagrangian collaborative optimization method of claim 3, wherein in S1, the state of the state space module comprises a task state, an equipment state, a resource occupancy state, and an environmental state; the actions of the decision and policy module comprise resource allocation, priority adjustment, and path selection; and the feedback of the reward and penalty module comprises a task completion reward, a delay penalty, a resource conflict penalty, and a resource balance reward.
6. The multi-enterprise-oriented loosely coupled MDP Lagrangian collaborative optimization method of claim 3, wherein in S2, the set of global resource and time window constraints includes an equipment capacity constraint, a time window constraint, a task dependency constraint, a fair allocation constraint, and a multi-enterprise shared resource constraint; the Lagrangian function is: $L(\pi, \lambda) = \sum_{i} J_i(\pi_i) - \lambda^{\top} g(\pi)$; wherein $J_i(\pi_i)$ is the expected cumulative return of the i-th sub-model over the entire decision period under a given local policy $\pi_i$; $\lambda$ is the Lagrange multiplier vector, representing the constraint penalty coefficients; $g(\pi)$ is the violation vector of each constraint under the joint policy $\pi$; and $\lambda^{\top} g(\pi)$ is the weighted penalty term for constraint violations: when resource occupancy exceeds capacity, $g(\pi)$ is positive and the penalty term increases, so that such a policy is suppressed during optimization.
7. The multi-enterprise-oriented loosely coupled MDP Lagrangian collaborative optimization method according to claim 6, wherein in S3, the objective function for independently solving each LC-MDP sub-model is: $\pi_i^{*}(\lambda) = \arg\max_{\pi_i} \left[ J_i(\pi_i) - \lambda^{\top} d_i(\pi_i) \right]$; wherein $\pi_i^{*}(\lambda)$ is the optimal local policy of the i-th sub-model under the current constraint penalty coefficients $\lambda$; $d_i(\pi_i)$ is the subsystem's expected resource demand vector for each global constraint under policy $\pi_i$; and $\lambda^{\top} d_i(\pi_i)$ is the cost of using the corresponding resources; the larger $\lambda$ is, the scarcer the resource, and the more the sub-model tends to occupy less of it.
8. The multi-enterprise-oriented loosely coupled MDP Lagrangian collaborative optimization method of claim 7, wherein in S4, the update formula of the Lagrange multiplier $\lambda$ is: $\lambda^{k+1} = \max\{0,\ \lambda^{k} + \alpha_k\, g(\pi^{k})\}$, with $g(\pi^{k}) = \sum_{i} d_i(\pi_i^{k}) - b$; wherein $k$ is the outer iteration round; $\lambda^{k}$ is the multiplier vector at the k-th iteration; $\pi^{k}$ is the joint policy obtained by solving all sub-problems in the k-th iteration; $g(\pi^{k})$ is the vector of violation amounts of each constraint under the joint policy; $b$ is the constraint upper-limit vector, and when $\sum_{i} d_i(\pi_i^{k}) > b$ the resource is overused; and $\alpha_k > 0$ is the step-size coefficient of the k-th update, which controls how fast the multiplier changes and follows a fixed, diminishing, or adaptive step-size strategy.
9. The multi-enterprise-oriented loosely coupled MDP Lagrangian collaborative optimization method of claim 8, wherein in S5, the fusion logic of the global coordination function C is rule-based decision logic, an optimization algorithm, and a learning model, and the output of C satisfies $\pi = C(\pi_1^{*}, \dots, \pi_N^{*})$, wherein $\{\pi_1^{*}, \dots, \pi_N^{*}\}$ is the set of local policies output by all sub-models; $C$ is the coordination and fusion operator, derived from rule-based decision logic, an optimization algorithm, and a learning model, and used to reselect actions when local policies conflict; and $\pi$ is the coordinated global execution policy, which maximizes the overall benefit on the premise of satisfying the global constraints.
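
As an illustration of steps S3 to S5 (claims 3 and 6 to 9), the following Python sketch runs the multiplier iteration on a toy instance. It is a minimal sketch under stated assumptions, not the claimed implementation: each LC-MDP sub-model is reduced to a one-shot choice among a few actions with a fixed reward and resource demand, every sub-model has a zero-demand "wait" action, the step size diminishes as step0/(k+1), multipliers are kept nonnegative, and the coordination function C is a simple reward-ordered rule. All names and numbers (SUBMODELS, CAPACITY, solve_submodel, update_multipliers, coordinate, optimize) are hypothetical.

```python
# Toy instance (hypothetical): two shared resources, [cranes, trucks].
CAPACITY = [2, 2]
SUBMODELS = {
    "ship_unloading": [
        {"name": "wait", "reward": 0.0, "demand": [0, 0]},
        {"name": "one_crane", "reward": 5.0, "demand": [1, 0]},
        {"name": "two_cranes", "reward": 8.0, "demand": [2, 0]},
    ],
    "port_transfer": [
        {"name": "wait", "reward": 0.0, "demand": [0, 0]},
        {"name": "crane_plus_truck", "reward": 6.0, "demand": [1, 1]},
    ],
    "vehicle_dispatch": [
        {"name": "wait", "reward": 0.0, "demand": [0, 0]},
        {"name": "one_truck", "reward": 3.0, "demand": [0, 1]},
        {"name": "two_trucks", "reward": 4.0, "demand": [0, 2]},
    ],
}


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def solve_submodel(actions, lam):
    """S3 stand-in: pick the action maximizing reward minus priced demand."""
    return max(actions, key=lambda a: a["reward"] - dot(lam, a["demand"]))


def update_multipliers(lam, usage, cap, step):
    """S4: raise a price when usage exceeds capacity, lower it when idle,
    and never let it drop below zero."""
    return [max(0.0, l + step * (u - c)) for l, u, c in zip(lam, usage, cap)]


def coordinate(submodels, local, cap):
    """S5 stand-in for C: admit local choices in order of reward; if a choice
    no longer fits the leftover capacity, reselect the best action that does."""
    remaining = list(cap)
    plan = {}
    for name in sorted(local, key=lambda n: -local[n]["reward"]):
        feasible = [a for a in submodels[name]
                    if all(d <= r for d, r in zip(a["demand"], remaining))]
        choice = local[name] if local[name] in feasible else max(
            feasible, key=lambda a: a["reward"])
        plan[name] = choice
        remaining = [r - d for r, d in zip(remaining, choice["demand"])]
    return plan


def optimize(submodels, cap, rounds=50, step0=1.0):
    lam = [0.0] * len(cap)
    local = {}
    for k in range(rounds):
        # Sub-problems are independent given lam, so they could run in parallel.
        local = {n: solve_submodel(acts, lam) for n, acts in submodels.items()}
        usage = [sum(a["demand"][j] for a in local.values())
                 for j in range(len(cap))]
        lam = update_multipliers(lam, usage, cap, step0 / (k + 1))
    return coordinate(submodels, local, cap), lam


if __name__ == "__main__":
    plan, lam = optimize(SUBMODELS, CAPACITY)
    for name, action in plan.items():
        print(name, "->", action["name"])
    print("multipliers:", lam)
```

In this sketch the multipliers act as resource prices: a sub-model gives up a crane or truck only when its price outweighs the reward, and the coordination step guarantees that the final plan respects the capacities even if the last set of local choices does not.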

Description

Multi-enterprise-oriented loosely-coupled MDP Lagrange collaborative optimization system and method

Technical Field

The invention relates to the technical field of port supply chain collaborative scheduling, and in particular to a multi-enterprise-oriented loosely coupled MDP Lagrangian collaborative optimization system and method.

Background

In integrated port and clustered supply chain environments, enterprises of several types (loading and unloading, warehousing, transportation, and others) face tightly coupled business processes, long scheduling chains, intense competition among multiple parties for shared resources (equipment, storage yards, vehicles), and numerous, complex global constraints (capacity, time windows, fairness, task dependencies). Integrated optimization based on a single model suffers from state-dimension explosion and computation that does not scale. Existing centralized optimization and reinforcement learning methods cannot solve the problem effectively at large scale with many enterprises and many coordinated business links.

At present, the conventional approaches used in China and abroad are mainly centralized resource scheduling models (MILP, CP), but the model size grows exponentially with the number of enterprises and is unsuited to dynamic scenarios. Workflow and business process management (BPM/WfMS) can describe the process but cannot perform optimal scheduling calculations. Reinforcement learning (RL/MDP) is mostly applied to scheduling within a single port area or for a single type of equipment; once it is extended to multi-enterprise linkage, the state space explodes and the problem becomes unsolvable. None of these approaches meets the combined requirements of multiple parties, shared resources, multiple constraints, and real-time scheduling.
In existing research in China and abroad, the Markov decision process (MDP) has been applied to warehouse entry and exit policies, quay crane trajectory optimization, and automatic allocation of transportation tasks, but it does not scale: it cannot handle coupled decisions among multiple enterprises, cannot express large-scale resource constraints, and lacks a unified coordination mechanism across MDPs. Lagrangian relaxation has been used to optimize traffic network flows, logistics distribution routes, and power distribution, but it cannot handle reinforcement-learning decision models, its sub-problem solving is not fused with policy learning, and it lacks unified business semantics for enterprise cooperation. The prior art cannot address dynamic decision-making and global constraint handling at the same time; it lacks a pluggable, extensible decision structure for multi-enterprise collaborative scenarios; it has difficulty incorporating task dependencies, resource conflicts, fair allocation, and the like into a unified framework; it cannot be solved in parallel in a multi-party environment and is therefore hard to apply in real time; and policy execution suffers frequent conflicts because there is no unified coordination function.

Disclosure of Invention

The invention aims to provide a multi-enterprise-oriented loosely coupled MDP Lagrangian collaborative optimization system and method that, in a multi-enterprise collaborative scenario, decompose a large-scale business resource scheduling problem into a plurality of independently solvable local decision models and, through a unified coordination mechanism, achieve an overall optimal collaborative policy while guaranteeing the global constraints.
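
To make the decomposition idea concrete, the following worked equations show why relaxing a shared capacity constraint lets the local decision models be solved independently. This is a sketch under assumed notation, not a formula quoted from the specification: $N$ is the number of local models, $\pi_i$ the local policy of model $i$, $J_i(\pi_i)$ its expected return, $d_i(\pi_i)$ its expected resource demand vector, $b$ the capacity vector, and $\lambda \ge 0$ the multiplier vector.

```latex
\begin{aligned}
&\max_{\pi_1,\dots,\pi_N}\ \sum_{i=1}^{N} J_i(\pi_i)
  \quad \text{s.t.} \quad \sum_{i=1}^{N} d_i(\pi_i) \le b, \\[4pt]
&L(\pi,\lambda)
  = \sum_{i=1}^{N} J_i(\pi_i) - \lambda^{\top}\Big(\sum_{i=1}^{N} d_i(\pi_i) - b\Big)
  = \sum_{i=1}^{N} \Big[ J_i(\pi_i) - \lambda^{\top} d_i(\pi_i) \Big] + \lambda^{\top} b .
\end{aligned}
```

For a fixed $\lambda$, the term $\lambda^{\top} b$ is constant and each bracketed term depends on a single $\pi_i$, so maximizing $L$ over the joint policy splits into $N$ separate maximizations, one per local model, with $\lambda$ acting as a price on the shared resources.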
To achieve the above aim, the invention provides a multi-enterprise-oriented loosely coupled MDP Lagrangian collaborative optimization system applied to integrated port and clustered supply chain environments. The system adopts a three-layer architecture comprising a business layer, an optimization decision layer, and an execution feedback layer, the layers cooperating in sequence to realize collaborative optimization of multi-enterprise business. The business layer provides business state data and receives fed-back execution results. The optimization decision layer builds an optimization model from the business state data, performs constraint relaxation and iterative solution, and generates a unified collaborative policy. The execution feedback layer receives and executes the unified collaborative policy, monitors the execution process, and feeds data back to the optimization decision layer, thereby realizing dynamic adaptation. Preferably, the business layer comprises a loading and unloading operation system, a warehouse management system, a transportation scheduling system, and other business systems, and each system operates independently and outputs its task state, equipment state, resource occupancy state, and environmental state data to the optimizati