
CN-120632606-B - Container position recommendation method, system, equipment and medium based on reinforcement learning

CN120632606B

Abstract

The application relates to the technical field of container scheduling, and in particular to a container slot recommendation method, system, device, and medium based on reinforcement learning. The method comprises: acquiring yard state data and attribute data of a container to be stored, and encoding them into a joint state vector; inputting the joint state vector into a trained slot recommendation model, computing the action value of every available slot, selecting the slot with the highest action value as the recommended slot, and outputting its position information; executing the actual slot selection and recording the position of the actually selected slot; generating a feedback signal according to the actual selection; storing the joint state vector, the actually selected slot, the feedback signal, and the post-selection yard state data in an experience replay pool; and updating the parameters of the slot recommendation model by sampling training data. The application can improve yard space utilization and operating efficiency, reduce slot conflicts and container-reshuffling cost, achieve multi-objective collaborative optimization, and adapt well to changing environments.
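The joint state vector described above splices a flattened yard state matrix with an encoded container feature vector. A minimal Python sketch of one possible encoding follows; the dimensions, cargo-category set, and normalization constants are illustrative assumptions, not values from the patent.

```python
import numpy as np

# Hypothetical dimensions: 2 yard areas, 4 bays, 3 rows per area.
N_AREAS, N_BAYS, N_ROWS = 2, 4, 3
CARGO_TYPES = ["general", "reefer", "hazardous"]  # assumed category set

def encode_yard_state(yard):
    """Flatten per-area sub-matrices (bay x row) into one vector.

    `yard` is an (areas, bays, rows) integer array whose entries code
    vacancy / stored-container type / tier-height limit, as in the claims.
    """
    return yard.astype(np.float32).reshape(-1)

def encode_container(cargo_type, size_feet, weight_tons):
    """One-hot the cargo category and append normalized scalar attributes."""
    one_hot = np.zeros(len(CARGO_TYPES), dtype=np.float32)
    one_hot[CARGO_TYPES.index(cargo_type)] = 1.0
    scalars = np.array([size_feet / 40.0, weight_tons / 30.0], dtype=np.float32)
    return np.concatenate([one_hot, scalars])

def joint_state(yard, cargo_type, size_feet, weight_tons):
    """Splice the yard state matrix and container feature vector together."""
    return np.concatenate([
        encode_yard_state(yard),
        encode_container(cargo_type, size_feet, weight_tons),
    ])

yard = np.zeros((N_AREAS, N_BAYS, N_ROWS), dtype=np.int64)
s = joint_state(yard, "reefer", 40, 24.0)
# Length = 2*4*3 yard elements + 3 one-hot + 2 scalars = 29
```

An embedding layer could replace the one-hot step for high-cardinality attributes such as destination port, as the claims also allow.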

Inventors

  • GONG RUSHENG
  • JIANG TAO
  • CHEN YU
  • ZHENG TUO
  • WANG YUNHUA
  • CHEN ANYI
  • WANG HONGCHANG
  • WANG SHUAI
  • WANG JIE

Assignees

  • 山东港口陆海国际物流集团有限公司
  • 山东港口陆海国际物流集团发展有限公司
  • 山东陆海通数字科技有限公司

Dates

Publication Date
2026-05-08
Application Date
2025-04-16

Claims (7)

  1. A container slot recommendation method based on reinforcement learning, characterized by comprising the following steps: S1, acquiring yard state data and attribute data of a container to be stored, wherein the yard state data comprises the positions and vacancy status of each area, bay, and row in the yard, and the positions and attribute data of stored containers; S2, encoding the yard state data and the attribute data of the container to be stored into a joint state vector; the encoding rules of the joint state vector include: encoding the yard state data into a two-dimensional yard state matrix, wherein the yard state matrix comprises as many yard state sub-matrices as there are areas, each sub-matrix corresponds to one area, the rows and columns of each sub-matrix correspond to the bays and rows of that area, and each element of a sub-matrix represents the vacancy status, stored-container type, and tier-height limit of the corresponding position; the attribute data of the container to be stored is converted into a feature vector by one-hot encoding or embedding encoding and spliced with the yard state matrix into the joint state vector; S3, inputting the joint state vector into a trained slot recommendation model and computing the action value of every available slot, wherein the slot recommendation model is a reinforcement learning model; selecting the slot with the highest action value as the recommended slot and outputting its position information; the recommended slot is selected using an epsilon-greedy strategy whose exploration probability epsilon decays exponentially with the training rounds; S4, executing the actual slot selection and recording the position information of the actually selected slot; S5, generating a feedback signal according to the actually selected slot, wherein the feedback signal is related to a quality evaluation index of the slot selection; the feedback signal R is calculated as a weighted combination of the following quantities: S_match, a matching coefficient, equal to 1 when the actually selected slot coincides with the recommended slot and 0 otherwise; N_move, the estimated number of moves, calculated from the number of conflicts between the operating paths of the target slot and associated containers; and R_waste, the space waste rate, whose expression involves N_empty, the current number of vacant positions in the target row, N_max, the theoretical maximum number of stacking tiers of the target row, and N_current, the number of containers stored in the target row; the weight coefficients α, β, γ satisfy α+β+γ=1; and S6, storing the joint state vector, the actually selected slot, the feedback signal, and the post-selection yard state data in an experience replay pool, and updating the parameters of the slot recommendation model by sampling training data.
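The epsilon-greedy selection with exponentially decaying exploration (step S3) and the experience replay pool (step S6) can be sketched as below. All names (`ReplayPool`, `select_slot`) and the decay constants are illustrative assumptions; the patent does not specify them.

```python
import math
import random
from collections import deque

class ReplayPool:
    """Experience replay pool storing (state, slot, reward, next_state) tuples."""
    def __init__(self, capacity=10000):
        self.buffer = deque(maxlen=capacity)  # old samples are evicted FIFO

    def push(self, state, slot, reward, next_state):
        self.buffer.append((state, slot, reward, next_state))

    def sample(self, batch_size):
        """Uniformly sample a training mini-batch."""
        return random.sample(self.buffer, min(batch_size, len(self.buffer)))

def epsilon(episode, eps_start=1.0, eps_end=0.05, decay=0.01):
    """Exploration probability decaying exponentially with training rounds."""
    return eps_end + (eps_start - eps_end) * math.exp(-decay * episode)

def select_slot(q_values, available_slots, episode):
    """Epsilon-greedy: random available slot with probability epsilon,
    otherwise the slot with the highest action value (Q)."""
    if random.random() < epsilon(episode):
        return random.choice(available_slots)
    return max(available_slots, key=lambda slot: q_values[slot])
```

Restricting the argmax to `available_slots` enforces the claim's requirement that only vacant, feasible slots are scored.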
  2. The container slot recommendation method according to claim 1, wherein in step S1 the attribute data of a stored container comprises its cargo category, size, loading/unloading priority, and time of storage; and the attribute data of the container to be stored comprises its cargo category, size, weight, destination port, and shipping priority.
  3. The container slot recommendation method according to claim 1, wherein in step S3 the slot recommendation model is a deep Q-network (DQN) model whose input is the joint state vector S and whose output is the Q value of each available slot; the slot recommendation model comprises an online network and a target network, and the target network parameters are periodically synchronized from the online network parameters.
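The online/target-network arrangement of claim 3 can be sketched with a toy linear Q-network; the network form, dimensions, and synchronization period `SYNC_EVERY` are assumptions for illustration, not values from the patent.

```python
import numpy as np

rng = np.random.default_rng(0)

class LinearQNet:
    """Toy linear Q-network: Q(s) = W s + b, one output per slot."""
    def __init__(self, state_dim, n_slots):
        self.W = rng.normal(scale=0.1, size=(n_slots, state_dim))
        self.b = np.zeros(n_slots)

    def q_values(self, state):
        return self.W @ state + self.b

    def copy_from(self, other):
        """Hard parameter synchronization (target <- online)."""
        self.W = other.W.copy()
        self.b = other.b.copy()

online = LinearQNet(state_dim=29, n_slots=12)
target = LinearQNet(state_dim=29, n_slots=12)
SYNC_EVERY = 100  # assumed synchronization period, in update steps

for step in range(1, 301):
    # ... a gradient update of `online` from sampled replay data
    #     (using `target` for the bootstrap Q-value) would go here ...
    if step % SYNC_EVERY == 0:
        target.copy_from(online)
```

Keeping the bootstrap targets on a slowly updated copy is the standard DQN device for stabilizing training; a soft (Polyak) update would be an alternative to the hard copy shown.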
  4. The container slot recommendation method according to claim 1, wherein the calculation rule of N_move comprises: for every stored container in the row of the target slot whose vessel name or bill of lading conflicts with the container to be stored, N_move is increased by 1; and R_waste satisfies: if the number of vacant positions in the target row is less than 10% of the theoretical maximum number of stacking tiers, R_waste is forcibly set to 1.
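Claim 4's rules for N_move and the R_waste override can be sketched as below. Two caveats: "conflict" is interpreted here as a vessel-name or bill-of-lading mismatch, and the base expression for R_waste is elided in the translated text, so `1 - n_empty / n_max` is an assumed stand-in that is merely directionally consistent with the 10% override.

```python
def count_moves(target_row_containers, inbound):
    """N_move: +1 for every stored container in the target row whose vessel
    name or bill of lading conflicts with the inbound container (claim 4).
    'Conflict' is interpreted here as a mismatch -- an assumption."""
    return sum(
        1 for c in target_row_containers
        if c["vessel"] != inbound["vessel"] or c["bol"] != inbound["bol"]
    )

def space_waste(n_empty, n_max):
    """R_waste with the claim-4 override: forcibly 1 when fewer than 10% of
    the theoretical maximum number of stacking tiers remain vacant.
    The base expression is elided in the source; 1 - n_empty / n_max is an
    assumed placeholder."""
    if n_empty < 0.1 * n_max:
        return 1.0
    return 1.0 - n_empty / n_max
```

With weight coefficients α+β+γ=1 as in claim 1, these quantities would then be combined into the scalar feedback signal R.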
  5. A reinforcement-learning-based container slot recommendation system, configured to implement the container slot recommendation method according to any one of claims 1 to 4, comprising: a data acquisition module for acquiring yard state data and attribute data of a container to be stored; a data encoding module for encoding the yard state data and the attribute data of the container to be stored into a joint state vector; a slot recommendation module for inputting the joint state vector into the trained slot recommendation model, computing the action value of every available slot, selecting the slot with the highest action value as the recommended slot, and outputting its position information; a slot recording module for recording the position information of the actually selected slot when it differs from the recommended slot; a feedback signal generation module for generating a feedback signal according to the actually selected slot; and an experience replay and model update module for storing the joint state vector, the actually selected slot, the feedback signal, and the post-selection yard state data in an experience replay pool, and updating the parameters of the slot recommendation model by sampling training data.
  6. An electronic device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the computer program, performs the steps of the container slot recommendation method according to any one of claims 1 to 4.
  7. A storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of the container slot recommendation method according to any one of claims 1 to 4.

Description

Container position recommendation method, system, equipment and medium based on reinforcement learning

Technical Field

The application relates to the technical field of container scheduling, and in particular to a container slot recommendation method, system, device, and medium based on reinforcement learning.

Background

Container terminals, as core nodes of the global logistics chain, handle the entry, exit, stacking, and transport of large numbers of containers. With the rapid growth of international trade, the business scale and complexity of terminals keep rising, and traditional manual scheduling can hardly meet the demand for high throughput; bottlenecks are especially evident in yard space optimization, reduction of repeated container handling, and operating efficiency. In the prior art, an automated yard must determine, through slot recommendation, at which position (area, bay, row, and tier) of the yard a new container should be stacked, so as to optimize yard space, reduce repeated handling, and improve operating efficiency. The core goal of slot recommendation is to optimize the stacking positions of containers so as to maximize yard utilization and reduce the cost and time of future container handling. Existing automated slot recommendation methods mainly adopt heuristic or rule-based algorithms, matching stacking rules to historical data in fixed scenarios.
Such methods work to some extent in standardized operations, but their recommendation accuracy and adaptability degrade sharply in the face of dynamically changing yard states, diverse container attributes, and high-dimensional state spaces. Their main limitations are that they depend on manually preset rules and cannot adapt to site layout changes in real time, that they are inefficient under complex constraints (such as mixed storage of containers of multiple vessels and sizes), and that, lacking an autonomously learned optimization strategy, they tend to waste storage space and increase handling cost. In addition, existing systems lack a feedback mechanism for real-time interaction with dispatchers and cannot dynamically adjust the recommendation strategy to actual service demands.

Disclosure of Invention

In view of the technical problems that existing slot recommendation methods lack dynamic adaptability, are inefficient under complex constraints, can hardly learn autonomously, and lack a feedback mechanism, the application provides a container slot recommendation method, system, device, and medium based on reinforcement learning, which can improve yard space utilization and operating efficiency, reduce slot conflicts and container-reshuffling cost, achieve multi-objective collaborative optimization, and adapt well to changing environments.
In a first aspect, the application provides a reinforcement-learning-based container slot recommendation method, comprising the steps of: S1, acquiring yard state data and attribute data of a container to be stored, the yard state data comprising the positions and vacancy status of each area, bay, and row in the yard, and the positions and attribute data of stored containers; S2, encoding the yard state data and the attribute data of the container to be stored into a joint state vector; S3, inputting the joint state vector into a trained slot recommendation model and computing the action value of every available slot, the slot recommendation model being a reinforcement learning model; selecting the slot with the highest action value as the recommended slot and outputting its position information, the position information comprising the combination of area, bay, and row in which the recommended slot is located; S4, executing the actual slot selection and recording the position information of the actually selected slot; S5, generating a feedback signal according to the actually selected slot, the feedback signal being related to a quality evaluation index of the slot selection; and S6, storing the joint state vector, the actually selected slot, the feedback signal, and the post-selection yard state data in an experience replay pool, and updating the parameters of the slot recommendation model by sampling training data. It should be further noted that, in step S1,