CN-121981339-A - Supply chain dispatching optimization system based on reinforcement learning

Abstract

The invention discloses a supply chain scheduling optimization system based on reinforcement learning, relating to the technical field of intelligent supply chain scheduling. The system collects supply chain node inventory data, replenishment instruction data, and order placement and arrival time data to form an in-transit order time sequence data set; calculates order sequence inversion relations from the order placement and arrival time data; identifies order crossing sections and rearranges them by arrival order to obtain real arrival sequence data of the in-transit orders; combines these data with the supply chain node inventory data to calculate the inventory evolution path under order crossing and generate state transition correction data; re-associates the replenishment instruction data with the actual arrival feedback data to construct profit attribution correction data; and inputs the state transition correction data and the profit attribution correction data into a reinforcement learning scheduling model to update the strategy and output an optimized supply chain scheduling strategy.

Inventors

Fan Taisheng
PAN JIN

Assignees

阳光学院

Dates

Publication Date: 20260505
Application Date: 20260331

Claims (7)

1. A reinforcement learning-based supply chain scheduling optimization system, comprising:
a time sequence acquisition module, used for acquiring supply chain node inventory data, replenishment instruction data, order placement time data and order arrival time data to form an in-transit order time sequence data set;
a cross recognition module, used for calculating order sequence inversion relations based on the order placement time data and the order arrival time data, identifying order crossing sections, and generating order crossing identification data;
an order rearrangement module, used for rearranging the in-transit order time sequence data set by arrival order based on the order crossing identification data to construct real arrival sequence data of the in-transit orders;
a path correction module, used for calculating an inventory evolution path under order crossing based on the real arrival sequence data of the in-transit orders combined with the supply chain node inventory data, and generating state transition correction data;
an attribution correction module, used for re-associating the replenishment instruction data with the corresponding replenishment feedback data based on the state transition correction data to construct profit attribution correction data; and
a strategy updating module, used for inputting the state transition correction data and the profit attribution correction data into a reinforcement learning scheduling model to update the strategy, and outputting a supply chain scheduling optimization strategy.
2. The reinforcement learning-based supply chain scheduling optimization system of claim 1, wherein acquiring the supply chain node inventory data, replenishment instruction data, order placement time data and order arrival time data to form the in-transit order time sequence data set specifically comprises:
continuously acquiring the supply chain node inventory data in time order and performing time alignment to form an inventory change time sequence;
matching the replenishment instruction data within the corresponding time range according to the inventory change time sequence, and constructing correspondence data between inventory changes and replenishment instructions;
time-indexing the replenishment instruction data and extracting the order placement time data;
parsing the order execution records corresponding to the replenishment instruction data and extracting the order arrival time data; and
associating and arranging the replenishment instruction data based on the order placement time data and the order arrival time data to form the in-transit order time sequence data set.
3. The reinforcement learning-based supply chain scheduling optimization system of claim 2, wherein calculating the order sequence inversion relations based on the order placement time data and the order arrival time data, identifying the order crossing sections, and generating the order crossing identification data specifically comprises:
extracting, from the in-transit order time sequence data set, the correspondence between the order placement time data and the order arrival time data, constructing an order sequence in order of placement time, and synchronously recording the arrival-time position of each order;
comparing the placement order of adjacent orders in the order sequence with their arrival order, and identifying order pairs whose placement order is inconsistent with their arrival order;
merging the contiguous intervals over which such order pairs are distributed in the in-transit order time sequence data set to form an order crossing section set; and
marking the corresponding orders based on the order crossing section set, and generating order crossing identification data in one-to-one correspondence with the in-transit order time sequence data set.
4. The reinforcement learning-based supply chain scheduling optimization system of claim 3, wherein rearranging the in-transit order time sequence data set by arrival order based on the order crossing identification data to construct the real arrival sequence data of the in-transit orders specifically comprises:
extracting the marked order crossing sections from the in-transit order time sequence data set based on the order crossing identification data;
re-sorting the orders within each order crossing section according to the order arrival time data to construct the arrival order within the section;
replacing the original placement-order positions within each order crossing section with the re-sorted arrival order; and
performing a consistency check of the resulting overall order sequence against the original order placement time data to form the real arrival sequence data of the in-transit orders.
5. The reinforcement learning-based supply chain scheduling optimization system of claim 4, wherein calculating the inventory evolution path under order crossing based on the real arrival sequence data of the in-transit orders combined with the supply chain node inventory data to generate the state transition correction data specifically comprises:
determining, based on the real arrival sequence data of the in-transit orders, the inventory update relationship of the supply chain node inventory data at each order arrival time, and forming a moment-by-moment inventory state change path;
extracting, based on the order crossing section set, the moment-by-moment inventory state change path within each order crossing section, and distinguishing the individual impact of each order arrival within the section on the inventory change; and
combining the individually distinguished inventory change results within the order crossing sections with the inventory state change path to obtain the transition path of the inventory state under order crossing, forming the state transition correction data.
6. The reinforcement learning-based supply chain scheduling optimization system of claim 5, wherein re-associating the replenishment instruction data with the corresponding replenishment feedback data based on the state transition correction data to construct the profit attribution correction data specifically comprises:
extracting, based on the state transition correction data, the transition path of the inventory state under order crossing;
re-matching the order arrival times with the replenishment instruction data and establishing correspondences, forming a new association between the replenishment instruction data and the corresponding replenishment feedback data under order crossing; and
determining the real impact of the replenishment instruction data on the inventory change based on the new association, forming the profit attribution correction data.
7. The reinforcement learning-based supply chain scheduling optimization system of claim 6, wherein inputting the state transition correction data and the profit attribution correction data into the reinforcement learning scheduling model to update the strategy and outputting the supply chain scheduling optimization strategy specifically comprises:
constructing supply chain scheduling state input data based on the state transition correction data and the profit attribution correction data;
correcting and updating the state transition relationship and the profit distribution in the reinforcement learning scheduling model using the supply chain scheduling state input data;
calculating a replenishment action sequence for the supply chain scheduling state using the corrected and updated reinforcement learning scheduling model; and
ordering the replenishment action sequence according to its correspondence with the supply chain node inventory data to form the supply chain scheduling optimization strategy.
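
The crossing identification and rearrangement steps recited in claims 3 and 4 can be illustrated with a minimal Python sketch. It is not the patented implementation: the `Order` record, the integer time steps, and the function names `find_crossing_sections`, `rearrange_by_arrival` and `is_arrival_consistent` are all hypothetical, chosen only for illustration. The sketch sorts orders by placement time, flags adjacent pairs whose arrival order is inverted, merges contiguous inverted pairs into sections, and re-sorts each section by arrival time.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Order:
    order_id: str
    placed_at: int   # placement time (hypothetical integer time step)
    arrived_at: int  # actual arrival time
    quantity: int


def find_crossing_sections(orders):
    """Sort by placement time and merge adjacent inverted pairs into sections.

    Returns the placement-ordered list and a list of inclusive (start, end)
    index ranges in which arrival order contradicts placement order.
    """
    seq = sorted(orders, key=lambda o: o.placed_at)
    sections, start = [], None
    for i in range(len(seq) - 1):
        if seq[i].arrived_at > seq[i + 1].arrived_at:  # later order arrived first
            if start is None:
                start = i
        elif start is not None:
            sections.append((start, i))
            start = None
    if start is not None:
        sections.append((start, len(seq) - 1))
    return seq, sections


def rearrange_by_arrival(seq, sections):
    """Re-sort each crossing section by arrival time, leaving other orders in place."""
    result = list(seq)
    for start, end in sections:
        result[start:end + 1] = sorted(result[start:end + 1], key=lambda o: o.arrived_at)
    return result


def is_arrival_consistent(seq):
    """Simple consistency check: arrivals must be non-decreasing along the sequence."""
    return all(a.arrived_at <= b.arrived_at for a, b in zip(seq, seq[1:]))


orders = [
    Order("A", placed_at=1, arrived_at=6, quantity=10),
    Order("B", placed_at=2, arrived_at=5, quantity=20),
    Order("C", placed_at=3, arrived_at=4, quantity=15),
    Order("D", placed_at=4, arrived_at=8, quantity=5),
]
seq, sections = find_crossing_sections(orders)
real_sequence = rearrange_by_arrival(seq, sections)
print(sections)                             # [(0, 2)]
print([o.order_id for o in real_sequence])  # ['C', 'B', 'A', 'D']
print(is_arrival_consistent(real_sequence)) # True
```

Comparing only adjacent orders detects local inversions, so two neighbouring sections can still be out of order relative to each other after re-sorting. In this sketch the final check plays the role of the consistency check in claim 4: when it fails, the affected sections would need to be merged and re-sorted.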

Description

Supply chain dispatching optimization system based on reinforcement learning

Technical Field

The invention relates to the technical field of intelligent supply chain scheduling, and in particular to a supply chain scheduling optimization system based on reinforcement learning.

Background

In the field of supply chain management, existing supply chain scheduling methods generally predict inventory consumption according to the order in which orders are placed, and formulate replenishment instructions accordingly. In actual supply chain operation, factors such as logistics transportation delays and fluctuations in order execution efficiency often cause orders to reach a supply chain node in a sequence different from the sequence in which they were placed. This is the order crossing phenomenon, and it means that a replenishment strategy formulated from the placement sequence cannot accurately reflect the real changes in inventory. Existing supply chain scheduling methods cannot effectively handle the inventory management deviation caused by crossed order arrivals, which leads to a mismatch between replenishment decisions and actual inventory changes and reduces the accuracy of overall inventory control and resource scheduling in the supply chain.

Disclosure of Invention

In order to overcome the above defects in the prior art, the embodiments of the invention provide a supply chain scheduling optimization system based on reinforcement learning, so as to solve the problems identified in the background art.
In order to achieve the above purpose, the present invention provides the following technical solution: a reinforcement learning-based supply chain scheduling optimization system, comprising:
a time sequence acquisition module, used for acquiring supply chain node inventory data, replenishment instruction data, order placement time data and order arrival time data to form an in-transit order time sequence data set;
a cross recognition module, used for calculating order sequence inversion relations based on the order placement time data and the order arrival time data, identifying order crossing sections, and generating order crossing identification data;
an order rearrangement module, used for rearranging the in-transit order time sequence data set by arrival order based on the order crossing identification data to construct real arrival sequence data of the in-transit orders;
a path correction module, used for calculating an inventory evolution path under order crossing based on the real arrival sequence data of the in-transit orders combined with the supply chain node inventory data, and generating state transition correction data;
an attribution correction module, used for re-associating the replenishment instruction data with the corresponding replenishment feedback data based on the state transition correction data to construct profit attribution correction data; and
a strategy updating module, used for inputting the state transition correction data and the profit attribution correction data into a reinforcement learning scheduling model to update the strategy, and outputting a supply chain scheduling optimization strategy.
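
As a rough illustration of how the path correction, attribution correction and strategy updating modules could fit together, the following Python sketch replays inventory using actual arrival times, records the individual stock impact of each arrival, and feeds one corrected transition into a learning update. Several points are assumptions rather than anything stated in the source: the text does not name a specific reinforcement learning algorithm, so tabular Q-learning is used here purely as a stand-in; unmet demand is treated as lost; the reward is a made-up holding cost; and the names `replay_inventory` and `q_update` are hypothetical.

```python
from collections import defaultdict


def replay_inventory(initial_stock, arrivals, demand, horizon):
    """Replay stock step by step, adding each order at its actual arrival time.

    arrivals: list of (order_id, arrival_time, quantity) tuples
    demand:   dict mapping time step -> units consumed (assumed known)
    Returns the stock path and, per order, the stock just before and after arrival.
    """
    by_time = defaultdict(list)
    for order_id, t, qty in arrivals:
        by_time[t].append((order_id, qty))

    stock, path, effect = initial_stock, [], {}
    for t in range(horizon):
        for order_id, qty in by_time[t]:
            effect[order_id] = (stock, stock + qty)  # individual impact of this arrival
            stock += qty
        stock = max(stock - demand.get(t, 0), 0)     # assumption: unmet demand is lost
        path.append(stock)
    return path, effect


def q_update(q, state, action, reward, next_state, actions, alpha=0.1, gamma=0.95):
    """One tabular Q-learning step on a corrected (state, action, reward, next_state)."""
    best_next = max(q[(next_state, a)] for a in actions)
    q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])


# Order B was placed after order A but arrived first (order crossing).
arrivals = [("B", 1, 4), ("A", 3, 6)]
demand = {0: 2, 1: 3, 2: 3, 3: 1}
path, effect = replay_inventory(5, arrivals, demand, horizon=4)
print(path)    # [3, 4, 1, 6]
print(effect)  # {'B': (3, 7), 'A': (1, 7)}

# Credit order B's replenishment action with the transition it actually caused.
stock_before, stock_after = effect["B"]
reward = -0.5 * stock_after  # hypothetical holding-cost reward
q = defaultdict(float)
q_update(q, stock_before, 4, reward, stock_after, actions=[0, 4, 6])
```

Keying `effect` by order ID is what lets the reward be attributed to the replenishment instruction that actually produced the stock change, rather than to whichever instruction was expected to arrive at that position in the placement sequence.
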
In a preferred embodiment, acquiring the supply chain node inventory data, replenishment instruction data, order placement time data and order arrival time data to form the in-transit order time sequence data set specifically comprises: continuously acquiring the supply chain node inventory data in time order and performing time alignment to form an inventory change time sequence; matching the replenishment instruction data within the corresponding time range according to the inventory change time sequence, and constructing correspondence data between inventory changes and replenishment instructions; time-indexing the replenishment instruction data and extracting the order placement time data; parsing the order execution records corresponding to the replenishment instruction data and extracting the order arrival time data; and associating and arranging the replenishment instruction data based on the order placement time data and the order arrival time data to form the in-transit order time sequence data set.
In a preferred embodiment, calculating the order sequence inversion relations based on the order placement time data and the order arrival time data, identifying the order crossing sections, and generating the order crossing identification data specifically comprises: extracting, from the in-transit order time sequence data set, the correspondence between the order placement time data and the order arrival time data, constructing an order sequence in order of placement time, and synchronously recording the arrival-time position of each order; comparing the placement order of adjacent orders in the order sequence with their arrival order, and identifying the order pairs of which the order time sequence is inconsistent with the order time sequen