
CN-122020375-A - Multi-cargo ship cargo capacity prediction method based on multi-source heterogeneous data and machine learning model

CN122020375A

Abstract

The application belongs to the technical field of ship cargo capacity prediction and relates in particular to a multi-cargo ship cargo capacity prediction method based on multi-source heterogeneous data and a machine learning model. The method comprises: fusing ship leg data with arrival/departure data, matching the loaded cargo types in the arrival/departure data to the ship leg data, and dividing the ship leg data into separate data sets by loaded cargo type; correcting the draft and cargo load of the data in each data set, removing abnormal data, and filling in missing values; preprocessing the data set for each loaded cargo type to obtain input data for that cargo type; and constructing a plurality of ship cargo load prediction models for each loaded cargo type and evaluating them to obtain the optimal prediction model for that cargo type. The method addresses the poor data reliability and low prediction accuracy of existing ship cargo capacity prediction methods.
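
The fusion and splitting steps summarized above can be sketched as follows. This is a minimal illustration, not the applicants' implementation: the record layout and the field names `ship_id`, `port_call_id` and `cargo_type` are hypothetical, and the source does not state which key is used to match arrival/departure records to ship legs.

```python
from collections import defaultdict


def fuse_and_split(legs, port_calls):
    """Attach the cargo type from arrival/departure records to ship legs,
    then group the legs into one data set per cargo type.

    The join key (ship_id, port_call_id) is an assumption for illustration.
    """
    # Index arrival/departure records by the hypothetical join key.
    cargo_by_key = {
        (call["ship_id"], call["port_call_id"]): call["cargo_type"]
        for call in port_calls
    }

    datasets = defaultdict(list)
    for leg in legs:
        cargo = cargo_by_key.get((leg["ship_id"], leg["port_call_id"]))
        if cargo is None:
            # Legs without a matched cargo type are left out of every data set.
            continue
        datasets[cargo].append({**leg, "cargo_type": cargo})
    return dict(datasets)
```

Each value in the returned dictionary is then cleaned and modeled independently, which is what allows a separate optimal model per cargo type.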

Inventors

  • LU TIANYE
  • CHEN SHUHANG
  • WANG ZHIHUAN
  • LIU XINXIN

Assignees

  • 中远海运科技股份有限公司
  • 上海海事大学

Dates

Publication Date
2026-05-12
Application Date
2026-01-22

Claims (10)

  1. A multi-cargo ship cargo capacity prediction method based on multi-source heterogeneous data and a machine learning model, characterized by comprising the following steps: S1, fusing ship leg data with arrival/departure data, and matching the loaded cargo types in the arrival/departure data to the ship leg data; S2, dividing the ship leg data from S1 into data sets by loaded cargo type, each loaded cargo type corresponding to its own data set; S3, correcting the draft and cargo load of the data in the data sets obtained in S2, including using adjacent draft data to correct data whose draft is unchanged before and after a berthing operation, correcting data with excessive empty draft, and correcting data in which draft and cargo load are not positively correlated; S4, for the data sets processed in S3, checking on the basis of ship structure theory whether ship draft and cargo load are positively correlated and, if not, judging the data abnormal and removing it from the data set; S5, using a random forest model to fill in missing values in the data sets processed in S4; S6, preprocessing the data in the data set for each loaded cargo type to obtain input data for that cargo type; and S7, for each loaded cargo type, constructing a plurality of ship cargo load prediction models based on different models, training them with the input data for that cargo type, and evaluating the model outputs to obtain the optimal ship cargo load prediction model for that cargo type.
  2. The multi-cargo ship cargo capacity prediction method based on multi-source heterogeneous data and a machine learning model according to claim 1, wherein the ship leg data are segmented and extracted from AIS data; in S1, ships whose liquid tank volume field in the ship leg data is non-empty are classified as carrying crude oil; and the data in the data sets in S2 are ship leg data that have a cargo type field.
  3. The multi-cargo ship cargo capacity prediction method based on multi-source heterogeneous data and a machine learning model according to claim 2, wherein in S3 the adjacent draft data are used to correct data whose draft is unchanged before and after a berthing operation as follows: all ship leg data whose navigational state is berthed are extracted from the data set obtained in S2, each record is taken as the start of one voyage, and the records are traversed in sequence; for ship leg data whose loading/unloading state has changed but whose initial and final drafts are unchanged, the subsequent adjacent ship leg data are traversed row by row until a record with a changed initial draft is found, and that draft is used as the replacement.
  4. The multi-cargo ship cargo capacity prediction method based on multi-source heterogeneous data and a machine learning model according to claim 2, wherein the correction of data with excessive empty draft in S3 addresses ship leg data in which the cargo load is 0 but the recorded draft is too large and deviates from the draft expected when the ship is empty, and comprises: step 1, querying all ship leg data with a cargo load of 0 in the data set obtained in S2; step 2, calculating the absolute value of the difference between the actual draft in the ship leg data and the minimum midship draft, and taking 5% of the ship's actual draft as the allowable draft deviation based on empirical observation; and step 3, comparing the absolute value with the draft deviation: if the absolute value is greater than the draft deviation, meaning that the actual draft in the empty-load data exceeds the maximum allowable empty draft of the ship, generating a random number in the interval [minimum midship draft, minimum midship draft + draft deviation] to replace the actual draft in the ship leg data; if the absolute value is less than or equal to the draft deviation, meaning that the actual draft in the empty-load data is within the maximum allowable empty draft range of the ship, retaining the draft in the ship leg data.
  5. The multi-cargo ship cargo capacity prediction method based on multi-source heterogeneous data and a machine learning model according to claim 2, wherein the minimum midship draft is calculated by formula (1), whose variables are the minimum midship draft and the vessel length.
  6. The multi-cargo ship cargo capacity prediction method based on multi-source heterogeneous data and a machine learning model according to claim 5, wherein the correction in S3 of data in which draft and cargo load are not positively correlated is performed according to the actual draft condition: if the ship is theoretically empty but the actual data is not, the actual draft is corrected to 0; if the ship is theoretically not empty but the actual data does not show a positive correlation, the following processing is performed: step 1, for all data in the data set, calculating the absolute value of the difference between the actual draft in the ship leg data and the minimum midship draft, and taking 5% of the ship's actual draft as the allowable draft deviation; step 2, comparing the absolute value with the draft deviation: if the absolute value is less than or equal to the draft deviation, meaning that the actual draft is close to the minimum midship draft, correcting the cargo load in the ship leg data to 0; if the absolute value is greater than the draft deviation, meaning that the actual draft is not close to the minimum midship draft and the ship is not empty, processing further by calculating the expected cargo load with formula (2); step 3, calculating the deviation percentage between the expected cargo load and the actual cargo load with formula (3); and step 4, if the deviation percentage is greater than 5%, replacing the cargo load in the ship leg data with the expected cargo load, and otherwise keeping the cargo load in the ship leg data unchanged.
  7. The multi-cargo ship cargo capacity prediction method based on multi-source heterogeneous data and a machine learning model according to claim 6, wherein abnormal data in S4 are judged using a diagram of the relationship between ship cargo load and draft, with draft on the x-axis and cargo load on the y-axis: point A corresponds to the design draft and the maximum cargo capacity of the ship; points B and C have drafts of 90% and 110% of the design draft respectively, with the maximum cargo capacity of the ship as their y-coordinate; point D is the minimum empty draft according to ship safety design requirements; point E is the upper limit of the empty draft; data within the quadrilateral BCDE are normal data, and data outside the quadrilateral BCDE are abnormal data.
  8. The multi-cargo ship cargo capacity prediction method based on multi-source heterogeneous data and a machine learning model according to claim 7, wherein the missing values in S5 are handled as follows: in the data set processed in S4, the field with missing values is the change in load tonnage per centimetre of ship draft; RF, GBDT or XGBoost models, which allow feature importance to be inspected, are used with maximum load tonnage, length, width, height and actual draft as inputs; and the output of the model with the highest R² and the lowest MAE is selected as the fill value.
  9. The multi-cargo ship cargo capacity prediction method based on multi-source heterogeneous data and a machine learning model according to claim 8, wherein the preprocessing in S6 is a logarithmic transformation.
  10. The multi-cargo ship cargo capacity prediction method based on multi-source heterogeneous data and a machine learning model according to claim 9, wherein the models in S7 comprise K-nearest neighbors, decision tree, random forest, extreme gradient boosting, gradient boosting decision tree or extra trees models, and the model output data in S7 are evaluated to obtain the optimal ship cargo load prediction model for each loaded cargo type as follows: Bayesian search is used to optimize the hyperparameters of each model; the outputs of the different models are inverse-normalized and inverse-log-transformed; formulas (4) to (6) are used to compute accuracy, where formula (4) is computed for samples whose y_real is not 0, formula (5) is computed for samples whose y_real is 0, and formula (6) gives the accuracy of the model over all samples; and the optimized model is saved. Testing found that cargo load prediction based on the K-nearest neighbors algorithm has higher accuracy and better generalization performance. The K-nearest neighbors calculation proceeds as follows: (1) the Euclidean distance is calculated by formula (7), whose variables are the sample to be predicted, the number of feature dimensions N, and the feature index k; (2) the predicted value is calculated as a weighted average by formula (8), where K is the number of nearest neighbors and the remaining variables are the target value of the i-th neighbor and the distance between the point to be predicted and the i-th neighbor.
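
The K-nearest neighbors calculation in claim 10 can be illustrated with a short sketch. Formulas (7) and (8) are not reproduced in this text, so the weighting below is an assumption: it uses the common inverse-distance weighting, which matches the variables the claim lists (neighbor target values and neighbor distances) but may differ from the applicants' exact formula. The function names and the `eps` guard against division by zero are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two feature vectors of equal length N."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(query, features, targets, k=3, eps=1e-12):
    """Predict a target as a distance-weighted average of the k nearest samples.

    Assumption: weights are 1 / (distance + eps); the source does not show
    the weighting used in its formula (8).
    """
    # Pair each training sample's distance with its target, keep the k closest.
    nearest = sorted(
        (euclidean(query, f), t) for f, t in zip(features, targets)
    )[:k]
    weights = [1.0 / (d + eps) for d, _ in nearest]
    weighted_sum = sum(w * t for w, (_, t) in zip(weights, nearest))
    return weighted_sum / sum(weights)
```

With inverse-distance weights, a query that coincides with a training sample returns essentially that sample's target, while equidistant neighbors contribute equally.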

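Two of the rule-based cleaning steps in the claims above, the empty-draft correction of claim 4 and the BCDE envelope check of claim 7, can be sketched as follows. This is an illustration under stated assumptions, not the applicants' code. The minimum midship draft is passed in as an input because formula (1) is not reproduced in this text. The upper bound of the random interval (minimum midship draft plus the draft deviation) and the placement of points D and E at a cargo load of 0 are assumptions, as are all function and parameter names.

```python
import random


def correct_empty_draft(actual_draft, min_midship_draft, rng=random):
    """Claim 4 sketch: repair an implausibly large draft on an empty leg."""
    # Allowable deviation: 5% of the actual draft, as stated in the claim.
    deviation = 0.05 * actual_draft
    if abs(actual_draft - min_midship_draft) > deviation:
        # Assumed interval: [min draft, min draft + deviation].
        return rng.uniform(min_midship_draft, min_midship_draft + deviation)
    return actual_draft


def in_bcde_envelope(draft, load, design_draft, max_load,
                     min_empty_draft, max_empty_draft):
    """Claim 7 sketch: True if (draft, load) lies inside quadrilateral BCDE.

    Assumes D and E sit at load 0 and min_empty_draft < max_empty_draft,
    so the vertices D, E, C, B form a convex, counter-clockwise trapezoid.
    """
    polygon = [
        (min_empty_draft, 0.0),          # D
        (max_empty_draft, 0.0),          # E
        (1.1 * design_draft, max_load),  # C
        (0.9 * design_draft, max_load),  # B
    ]
    for (x1, y1), (x2, y2) in zip(polygon, polygon[1:] + polygon[:1]):
        # A negative cross product means the point is to the right of an edge,
        # i.e. outside a counter-clockwise convex polygon.
        if (x2 - x1) * (load - y1) - (y2 - y1) * (draft - x1) < 0:
            return False
    return True
```

Records for which `in_bcde_envelope` returns False would be treated as abnormal and dropped in S4; points exactly on an edge are counted as normal in this sketch.
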
Description

Multi-cargo ship cargo capacity prediction method based on multi-source heterogeneous data and machine learning model

Technical Field

The invention belongs to the technical field of ship cargo capacity prediction, and particularly relates to a multi-cargo ship cargo capacity prediction method based on multi-source heterogeneous data and a machine learning model.

Background

In the shipping field, ship load information directly affects the estimation of port throughput and port influence. However, because information in the industry is opaque and data sources are limited, ship load data are difficult to obtain or estimate directly from industry data, and traditional load estimation methods are inaccurate and cumbersome to carry out. Current conventional load estimation methods fall mainly into two groups:

(1) Equipment measurement methods, which mainly comprise the draft survey (water gauge weighing) method and the instrument measurement method. The draft survey method relies on a crew member or manager reading the ship's six draft marks and estimating the load from the ship's load and draft comparison table. The instrument measurement method uses ultrasonic, pressure, laser or other sensors to detect the ship's current draft and then derives the ship load by computer analysis.

(2) Empirical formula methods, which are based mainly on the relevant empirical formula in "Method to estimate cargo tonne-miles" of the 2020 IMO report (361) and obtain the ship load from ship parameters including design draft, instantaneous draft and design speed.

The drawbacks of the existing methods are that the equipment measurement methods are complex to carry out and perform poorly in real time, and that the empirical formula methods estimate ship load with low accuracy because they cannot readily account for the effect of different ship sizes.
Disclosure of Invention

The invention addresses the poor data reliability and low prediction accuracy of existing ship cargo capacity prediction methods, and provides a multi-cargo ship cargo capacity prediction method based on multi-source heterogeneous data and a machine learning model.

The technical scheme of the invention is as follows:

1. A multi-cargo ship cargo capacity prediction method based on multi-source heterogeneous data and a machine learning model, characterized by comprising the following steps: S1, fusing ship leg data with arrival/departure data, and matching the loaded cargo types in the arrival/departure data to the ship leg data; S2, dividing the ship leg data from S1 into data sets by loaded cargo type, each loaded cargo type corresponding to its own data set; S3, correcting the draft and cargo load of the data in the data sets obtained in S2, including using adjacent draft data to correct data whose draft is unchanged before and after a berthing operation, correcting data with excessive empty draft, and correcting data in which draft and cargo load are not positively correlated; S4, for the data sets processed in S3, checking on the basis of ship structure theory whether ship draft and cargo load are positively correlated and, if not, judging the data abnormal and removing it from the data set; S5, using a random forest model to fill in missing values in the data sets processed in S4; S6, preprocessing the data in the data set for each loaded cargo type to obtain input data for that cargo type; and S7, for each loaded cargo type, constructing a plurality of ship cargo load prediction models based on different models, training them with the input data for that cargo type, and evaluating the model outputs to obtain the optimal ship cargo load prediction model for that cargo type.

2. The multi-cargo ship cargo capacity prediction method based on multi-source heterogeneous data and a machine learning model according to claim 1, wherein the ship leg data are segmented and extracted from AIS data; in S1, ships whose liquid tank volume field in the ship leg data is non-empty are classified as carrying crude oil; and the data in the data sets in S2 are ship leg data that have a cargo type field.

3. The multi-cargo ship cargo capacity prediction method based on multi-source heterogeneous data and a machine learning model according to claim 2, wherein in S3 the adjacent draft data are used to correct data whose draft is unchanged before and after a berthing operation as follows: all ship leg data whose navigational state is berthed are extracted from the