CN-122022027-A - Smart port decision optimization method and device based on reinforcement learning

Abstract

The application provides a reinforcement-learning-based intelligent port decision optimization method and device. The method comprises: collecting port environment data, equipment state data and ship dynamic data through multiple types of sensors and generating a port operation situation map; generating, based on the port operation situation map, a resource scheduling instruction comprising berth allocation and operation scheduling; checking the resource scheduling instruction by anomaly detection and rule base matching verification; when conflict alarm information is detected, triggering a reinforcement learning model update and generating a rescheduling instruction; when no conflict is detected, generating an equipment action instruction through a low-level policy; verifying the executability of the equipment action instruction against real-time feedback data; and executing the instruction when verification passes. Through multi-modal data fusion and reinforcement learning, the method optimizes resource scheduling, handles conflicts and reschedules intelligently, and ensures that equipment action instructions are executable, thereby enabling efficient and accurate port operation decisions and improving port operating efficiency and the level of intelligence.
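
The control flow summarized in the abstract can be illustrated with a minimal Python sketch. All names here (`Outcome`, `run_decision_cycle`, and the five callables passed in) are hypothetical and do not come from the application; the scheduler, conflict detector, low-level policy and verifier are stand-ins supplied by the caller. Rejecting the whole batch when any single action fails verification is likewise an assumption of this sketch, not something the application states.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Outcome:
    status: str        # "reschedule", "rejected" or "executed"
    detail: str = ""


def run_decision_cycle(situation_map: dict,
                       schedule: Callable[[dict], dict],
                       find_conflicts: Callable[[dict], List[str]],
                       to_actions: Callable[[dict], List[dict]],
                       is_executable: Callable[[dict], bool],
                       execute: Callable[[dict], None]) -> Outcome:
    """One pass: schedule, check, then either reschedule or act."""
    # Resource scheduling instruction (berth allocation, job scheduling).
    instruction = schedule(situation_map)

    # Anomaly detection and rule base matching are folded into one callable.
    conflicts = find_conflicts(instruction)
    if conflicts:
        # The caller is expected to update the model and reschedule.
        return Outcome("reschedule", "; ".join(conflicts))

    # No conflict: the low-level policy produces equipment actions.
    actions = to_actions(instruction)

    # Executability check against real-time feedback (assumed batch-wise).
    blocked = [a for a in actions if not is_executable(a)]
    if blocked:
        return Outcome("rejected", f"{len(blocked)} action(s) failed verification")

    for action in actions:
        execute(action)
    return Outcome("executed", f"{len(actions)} action(s) executed")
```

A caller would react to a `"reschedule"` outcome by triggering the model update and generating a new scheduling instruction, which this sketch deliberately leaves outside the function.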

Inventors

SONG YINGLIN
LIU FAN
GE BING
LI YONG
WANG WEIAN
YU ZIHAO
YU HAO
MA XIAOLONG

Assignees

CCCC First Harbor Consultants Co., Ltd. (中交第一航务工程勘察设计院有限公司)
Shandong Port Yantai Port Group Co., Ltd. (山东港口烟台港集团有限公司)

Dates

Publication Date: 20260512
Application Date: 20260121

Claims (8)

1. A reinforcement-learning-based intelligent port decision optimization method, characterized by comprising the following steps: collecting port environment data, equipment state data and ship dynamic data through multiple types of sensors, and generating a port operation situation map through multi-modal data fusion; generating, based on the port operation situation map, a resource scheduling instruction comprising berth allocation and job scheduling; checking the resource scheduling instruction by anomaly detection and rule base matching verification, triggering a reinforcement learning model update and generating a rescheduling instruction when conflict alarm information is detected, and generating an equipment action instruction through a low-level policy when no conflict is detected; verifying the executability of the equipment action instruction against real-time feedback data, and executing the instruction when verification passes; wherein a reinforcement learning model is used to generate the resource scheduling instruction comprising berth allocation and job scheduling, the reinforcement learning model comprises a multi-objective optimization algorithm, the multi-objective optimization algorithm dynamically adjusts reward weights through an adaptive reward function, and the adaptive reward function automatically searches for an optimal objective trade-off strategy according to the different stages of port operation and business requirements; and recording an execution result of the executed instruction, performing an incremental parameter update on the reinforcement learning model based on the execution result, achieving dynamic optimization of port decisions through the updated reinforcement learning model, recording performance indicators before and after each reinforcement learning model update, and dynamically adjusting the reinforcement learning exploration rate coefficient according to the trend of the indicators.
2. The reinforcement-learning-based intelligent port decision optimization method of claim 1, wherein generating the resource scheduling instruction comprising berth allocation and job scheduling based on the port operation situation map comprises: receiving the ship's estimated time of arrival (ETA), cargo priority and tidal cycle data through a high-level policy network of hierarchical reinforcement learning, and outputting a berth allocation scheme and loading and unloading operation time window instructions.
3. The reinforcement-learning-based intelligent port decision optimization method of claim 1, wherein checking the resource scheduling instruction by anomaly detection and rule base matching verification comprises: calling a preset port physical constraint rule base to verify the relationship between the ship's draft and the berth water depth, and the match between the crane jib span and the yard position.
4. The reinforcement-learning-based intelligent port decision optimization method of claim 1, wherein verifying the executability of the equipment action instruction against real-time feedback data comprises: verifying, using the crane hook position, gantry crane travel speed and energy consumption parameters fed back in real time by the equipment state sensors, whether the displacement and speed values in the action instruction exceed the physical limits of the equipment.
5. A reinforcement-learning-based intelligent port decision optimization device, characterized by comprising: a fusion module for collecting port environment data, equipment state data and ship dynamic data through multiple types of sensors and generating a port operation situation map through multi-modal data fusion; a scheduling module for generating, based on the port operation situation map, a resource scheduling instruction comprising berth allocation and job scheduling; a verification module for checking the resource scheduling instruction by anomaly detection and rule base matching verification, triggering a reinforcement learning model update and generating a rescheduling instruction when conflict alarm information is detected, and generating an equipment action instruction through a low-level policy when no conflict is detected; and an execution module for verifying the executability of the equipment action instruction against real-time feedback data and executing the instruction when verification passes; wherein a reinforcement learning model is used to generate the resource scheduling instruction comprising berth allocation and job scheduling, the reinforcement learning model comprises a multi-objective optimization algorithm, the multi-objective optimization algorithm dynamically adjusts reward weights through an adaptive reward function, and the adaptive reward function automatically searches for an optimal objective trade-off strategy according to the different stages of port operation and business requirements; and wherein an execution result of the executed instruction is recorded, an incremental parameter update is performed on the reinforcement learning model based on the execution result, dynamic optimization of port decisions is achieved through the updated reinforcement learning model, performance indicators before and after each reinforcement learning model update are recorded, and the reinforcement learning exploration rate coefficient is dynamically adjusted according to the trend of the indicators.
6. The reinforcement-learning-based intelligent port decision optimization device of claim 5, wherein the scheduling module generating the resource scheduling instruction comprising berth allocation and job scheduling based on the port operation situation map comprises: receiving the ship's estimated time of arrival (ETA), cargo priority and tidal cycle data through a high-level policy network of hierarchical reinforcement learning, and outputting a berth allocation scheme and loading and unloading operation time window instructions.
7. The reinforcement-learning-based intelligent port decision optimization device of claim 5, wherein the verification module checking the resource scheduling instruction by anomaly detection and rule base matching verification comprises: calling a preset port physical constraint rule base to verify the relationship between the ship's draft and the berth water depth, and the match between the crane jib span and the yard position.
8. The reinforcement-learning-based intelligent port decision optimization device of claim 5, wherein the execution module verifying the executability of the equipment action instruction against real-time feedback data comprises: verifying, using the crane hook position, gantry crane travel speed and energy consumption parameters fed back in real time by the equipment state sensors, whether the displacement and speed values in the action instruction exceed the physical limits of the equipment.
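
The physical constraint check of claims 3 and 7, the executability check of claims 4 and 8, and the exploration-rate adjustment recited at the end of claims 1 and 5 might be sketched as follows. This is only an illustration under stated assumptions. The function names, the under-keel clearance margin, the one-dimensional hook travel model, the step size and the bounds on the exploration rate are all hypothetical and are not taken from the application. The sketch also assumes that a higher performance indicator is better, and it omits the energy consumption parameter.

```python
from typing import List


def rule_violations(ship_draft_m: float, berth_depth_m: float,
                    crane_reach_m: float, yard_distance_m: float,
                    min_clearance_m: float = 0.5) -> List[str]:
    """Check a scheduling instruction against two physical constraints."""
    violations = []
    # Draft plus an assumed safety clearance must fit the berth water depth.
    if ship_draft_m + min_clearance_m > berth_depth_m:
        violations.append("draft exceeds berth depth")
    # The yard position must lie within the crane jib span.
    if yard_distance_m > crane_reach_m:
        violations.append("yard position beyond crane reach")
    return violations


def action_is_executable(displacement_m: float, speed_mps: float,
                         hook_position_m: float, max_travel_m: float,
                         max_speed_mps: float) -> bool:
    """Check a commanded move against assumed travel and speed limits."""
    # Target position is derived from the hook position fed back in real time.
    target_m = hook_position_m + displacement_m
    within_travel = 0.0 <= target_m <= max_travel_m
    within_speed = abs(speed_mps) <= max_speed_mps
    return within_travel and within_speed


def adjust_exploration(epsilon: float, metric_before: float,
                       metric_after: float, step: float = 0.05,
                       lo: float = 0.01, hi: float = 0.5) -> float:
    """Nudge the exploration rate according to the indicator trend."""
    if metric_after > metric_before:
        epsilon -= step   # indicator improved: explore less
    else:
        epsilon += step   # indicator flat or worse: explore more
    # Keep the rate inside the assumed bounds.
    return min(hi, max(lo, epsilon))
```

For example, `rule_violations(13.8, 14.0, 40.0, 45.0)` reports both violations, because 13.8 m of draft plus the assumed 0.5 m clearance exceeds the 14.0 m berth depth and the 45.0 m yard distance exceeds the 40.0 m reach.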

Description

Smart port decision optimization method and device based on reinforcement learning

Technical Field

The application belongs to the field of port optimization decision making, and particularly relates to an intelligent port decision optimization method and device based on reinforcement learning.

Background

Smart port technology aims to advance the digital and intelligent transformation of port operations by applying advanced information technology, automated equipment and intelligent algorithms. Traditional port management systems have clear shortcomings in data acquisition, transmission, processing and decision making. Specifically, the types of data acquired are limited, and the sampling strategy is difficult to adjust flexibly according to how the data changes, so the acquired data cannot reflect the actual operating condition of the port comprehensively and in real time. In data transmission, the lack of edge computing capability means that all data must be sent to a central processing module, which causes network congestion and delays data processing responses. In decision making, traditional systems mostly rely on preset rules and simple algorithmic models, which cannot adapt to the complex and changing operating scenarios of a port and struggle to balance multiple objectives effectively. In addition, updating a model generally requires retraining on a large amount of data, a tedious and time-consuming process that cannot respond promptly to dynamic changes in the port operating environment.

Disclosure of Invention

The application aims to overcome the defects in the prior art and provide an intelligent port decision optimization method and device based on reinforcement learning.

The application provides a reinforcement-learning-based intelligent port decision optimization method, comprising the following steps: collecting port environment data, equipment state data and ship dynamic data through multiple types of sensors, and generating a port operation situation map through multi-modal data fusion; generating, based on the port operation situation map, a resource scheduling instruction comprising berth allocation and job scheduling; checking the resource scheduling instruction by anomaly detection and rule base matching verification, triggering a reinforcement learning model update and generating a rescheduling instruction when conflict alarm information is detected, and generating an equipment action instruction through a low-level policy when no conflict is detected; verifying the executability of the equipment action instruction against real-time feedback data, and executing the instruction when verification passes; wherein a reinforcement learning model is used to generate the resource scheduling instruction comprising berth allocation and job scheduling, the reinforcement learning model comprises a multi-objective optimization algorithm, the multi-objective optimization algorithm dynamically adjusts reward weights through an adaptive reward function, and the adaptive reward function automatically searches for an optimal objective trade-off strategy according to the different stages of port operation and business requirements; and recording an execution result of the executed instruction, performing an incremental parameter update on the reinforcement learning model based on the execution result, achieving dynamic optimization of port decisions through the updated reinforcement learning model, recording performance indicators before and after each reinforcement learning model update, and dynamically adjusting the reinforcement learning exploration rate coefficient according to the trend of the indicators.

Optionally, generating the resource scheduling instruction comprising berth allocation and job scheduling based on the port operation situation map includes: receiving the ship's estimated time of arrival (ETA), cargo priority and tidal cycle data through a high-level policy network of hierarchical reinforcement learning, and outputting a berth allocation scheme and loading and unloading operation time window instructions.

Optionally, checking the resource scheduling instruction by anomaly detection and rule base matching verification includes: calling a preset port physical constraint rule base to verify the relationship between the ship's draft and the berth water depth, and the match between the crane jib span and the yard position.

Optionally, verifying the executability of the equipment action instruction against real-time feedback data includes: verifying whether the displacement and speed values in the action instruction exceed the physical limits of the equipment, using the crane hook position, gantry crane travel speed and energy consumption parameters fed back in rea