CN-116340797-B - Ship type classification prediction method and system based on K-means and XG-Boost
Abstract
The invention provides a ship type classification prediction method and system based on K-means and XG-Boost. Ship data are first acquired and preprocessed. Each item of data in the preprocessed ship data is clustered with the K-means clustering algorithm to obtain a plurality of clusters, the sum of squared errors of all data in each cluster is calculated, the number of classes for each item of data is calculated from the sum of squared errors, a number of classes is selected using the elbow method, and all clustered ships are marked with ship types according to that number of classes. The ships marked with ship types are then used as training set samples, which are trained with the XG-Boost classification algorithm to obtain a plurality of classification prediction models. These models are verified to obtain an optimal classification prediction model, which is used to predict and mark the ship types of all ships worldwide. The invention can accurately classify all ships worldwide and prevents over-fitting and under-fitting of the model while ensuring accuracy.
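The clustering stage summarized in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the two-feature ship vectors are made-up, already normalized values, the farthest-point seeding in `init_centroids` is an assumed way to make the run deterministic, and `elbow_k` uses one common elbow heuristic (the largest vertical gap below the straight line joining the first and last points of the error curve), which the source does not specify.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def init_centroids(points, k):
    """Deterministic farthest-point seeding (an assumption, not from the source)."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(squared_distance(p, c) for c in centroids))
        )
    return centroids


def kmeans(points, k, max_iter=100):
    """Plain Lloyd-style K-means; returns (centroids, labels)."""
    centroids = init_centroids(points, k)
    for _ in range(max_iter):
        # Assignment step: attach each ship to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        updated = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                updated.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                updated.append(centroids[c])  # an empty cluster keeps its centroid
        if updated == centroids:
            break
        centroids = updated
    return centroids, labels


def total_sse(points, centroids, labels):
    """Sum the per-cluster sums of squared errors into one total."""
    per_cluster = [0.0] * len(centroids)
    for p, lab in zip(points, labels):
        per_cluster[lab] += squared_distance(p, centroids[lab])
    return sum(per_cluster)


def elbow_k(sse_by_k):
    """Pick the k with the largest gap below the chord of the SSE curve."""
    ks = sorted(sse_by_k)
    k_lo, k_hi = ks[0], ks[-1]
    sse_lo, sse_hi = sse_by_k[k_lo], sse_by_k[k_hi]

    def gap(k):
        chord = sse_lo + (sse_hi - sse_lo) * (k - k_lo) / (k_hi - k_lo)
        return chord - sse_by_k[k]

    return max(ks, key=gap)


# Hypothetical normalized (overall length, deadweight tonnage) pairs.
ships = [
    (0.10, 0.12), (0.12, 0.10), (0.11, 0.14), (0.09, 0.11),
    (0.50, 0.52), (0.52, 0.49), (0.48, 0.51), (0.51, 0.53),
    (0.90, 0.88), (0.88, 0.91), (0.91, 0.90), (0.89, 0.87),
]

sse_by_k = {}
for k in range(1, 7):
    centroids, labels = kmeans(ships, k)
    sse_by_k[k] = total_sse(ships, centroids, labels)

chosen_k = elbow_k(sse_by_k)
print(chosen_k, sse_by_k)
```

On this toy data the total sum of squared errors drops sharply up to three clusters and is nearly flat afterwards, so the heuristic selects three. Real ship data would use all the listed features and may call for a different elbow criterion.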
Inventors
- WANG SHAOHAN
- WANG XIANGYU
Assignees
- 中远海运科技股份有限公司
- 上海船舶运输科学研究所有限公司
Dates
- Publication Date
- 20260508
- Application Date
- 20230210
Claims (8)
- 1. A ship type classification prediction method based on K-means and XG-Boost, characterized by comprising the following steps: a data acquisition and preprocessing step of acquiring ship data and preprocessing the ship data; a ship type classification and marking step of clustering each item of data in the preprocessed ship data using a K-means clustering algorithm to obtain a plurality of clusters, calculating the sum of squared errors of all data in each cluster, summing the sums of squared errors of all clusters to obtain the total sum of squared errors, calculating the number of classes for each item of data according to the total sum of squared errors, and marking all clustered ships with ship types according to the number of classes for each item of data, wherein the ship data comprise main engine rated power, operating speed, deadweight tonnage, third-level ship type, overall length, ship depth, draft, displacement and main engine rated speed; and a ship type classification prediction step of using the ships marked with ship types as training set samples, training on the training set samples with an XG-Boost classification algorithm to obtain a plurality of classification prediction models, verifying the plurality of trained classification prediction models to obtain an optimal classification prediction model, and predicting and marking the ship types of all ships worldwide according to the optimal classification prediction model.
- 2. The ship type classification prediction method based on K-means and XG-Boost according to claim 1, wherein in the ship type classification prediction step, verifying the classification prediction models comprises: the XG-Boost classification algorithm sequentially learns a plurality of weak estimators while evaluating the error of each trained classification prediction model, resamples the predicted ship type data with emphasis on the samples that were predicted incorrectly, constructs some new weak estimators, re-evaluates the prediction performance of the new weak estimators on the originally mispredicted samples, iterates this process repeatedly, and aggregates all the weak estimators to obtain the optimal classification prediction model.
- 3. The ship type classification prediction method based on K-means and XG-Boost according to claim 1, wherein in the data acquisition and preprocessing step, the preprocessing includes deleting abnormal data and duplicate data, filling in missing data, removing noise, and normalizing the data.
- 4. The ship type classification prediction method based on K-means and XG-Boost according to claim 1, wherein the draft and displacement data in the ship data are acquired inside the enterprise using navigation mark data and outside the enterprise using HIS data.
- 5. A ship type classification prediction system based on K-means and XG-Boost, characterized by comprising a data acquisition and preprocessing module, a ship type classification and marking module and a ship type classification prediction module connected in sequence, wherein: the data acquisition and preprocessing module acquires ship data and preprocesses the ship data; the ship type classification and marking module clusters each item of data in the preprocessed ship data using a K-means clustering algorithm to obtain a plurality of clusters, calculates the sum of squared errors of all data in each cluster, sums the sums of squared errors of all clusters to obtain the total sum of squared errors, calculates the number of classes for each item of data according to the total sum of squared errors, and marks all clustered ships with ship types according to the number of classes for each item of data, wherein the ship data comprise main engine rated power, operating speed, deadweight tonnage, third-level ship type, overall length, ship depth, draft, displacement and main engine rated speed; and the ship type classification prediction module uses the ships marked with ship types as training set samples, trains on the training set samples with an XG-Boost classification algorithm to obtain a plurality of classification prediction models, verifies the plurality of trained classification prediction models to obtain an optimal classification prediction model, and predicts and marks the ship types of all ships worldwide according to the optimal classification prediction model.
- 6. The ship type classification prediction system based on K-means and XG-Boost according to claim 5, wherein in the ship type classification prediction module, verifying the classification prediction models comprises: the XG-Boost classification algorithm sequentially learns a plurality of weak estimators while evaluating the error of each trained classification prediction model, resamples the predicted ship type data with emphasis on the samples that were predicted incorrectly, constructs some new weak estimators, re-evaluates the prediction performance of the new weak estimators on the originally mispredicted samples, iterates this process repeatedly, and aggregates all the weak estimators to obtain the optimal classification prediction model.
- 7. The ship type classification prediction system based on K-means and XG-Boost according to claim 5, wherein the preprocessing includes deleting abnormal data and duplicate data, filling in missing data, removing noise, and normalizing the data.
- 8. The ship type classification prediction system based on K-means and XG-Boost according to claim 5, wherein the draft and displacement data in the ship data are acquired inside the enterprise using navigation mark data and outside the enterprise using HIS data.
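The preprocessing named in claims 3 and 7 can be sketched as follows. This is an illustrative outline under stated assumptions, not the claimed implementation: the field names `length_m` and `deadweight_t` are hypothetical, "abnormal" is taken to mean a non-positive value, missing values are filled with the column mean, and normalization is min-max scaling. The claims do not fix any of these choices, and noise removal is not covered here.

```python
def preprocess(records, fields):
    """Drop abnormal and duplicate rows, fill gaps, then min-max normalize.

    Assumes every field has at least one observed (non-None) value.
    """
    # 1. Delete abnormal data: here, any observed value that is not positive
    #    (an assumed plausibility rule for physical ship parameters).
    plausible = [
        rec for rec in records
        if all(rec.get(f) is None or rec[f] > 0 for f in fields)
    ]

    # 2. Delete duplicate data while preserving the original order.
    seen, unique = set(), []
    for rec in plausible:
        key = tuple(rec.get(f) for f in fields)
        if key not in seen:
            seen.add(key)
            unique.append({f: rec.get(f) for f in fields})

    # 3. Fill missing values (None) with the mean of the observed values.
    for f in fields:
        observed = [r[f] for r in unique if r[f] is not None]
        mean = sum(observed) / len(observed)
        for r in unique:
            if r[f] is None:
                r[f] = mean

    # 4. Normalize each field to the range [0, 1].
    for f in fields:
        lo = min(r[f] for r in unique)
        hi = max(r[f] for r in unique)
        span = hi - lo
        for r in unique:
            r[f] = (r[f] - lo) / span if span else 0.0

    return unique


# Hypothetical raw records; values are invented for illustration.
raw = [
    {"length_m": 100.0, "deadweight_t": 5000.0},
    {"length_m": 100.0, "deadweight_t": 5000.0},   # duplicate row
    {"length_m": 200.0, "deadweight_t": None},     # missing deadweight
    {"length_m": 300.0, "deadweight_t": 15000.0},
    {"length_m": -5.0, "deadweight_t": 8000.0},    # abnormal length
]

clean = preprocess(raw, ["length_m", "deadweight_t"])
print(clean)
```

Normalization matters for the later clustering step because K-means relies on Euclidean distance, so a feature measured in tonnes would otherwise dominate one measured in metres.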
Description
Ship type classification prediction method and system based on K-means and XG-Boost
Technical Field
The invention relates to the technical field of ship classification, and in particular to a ship type classification prediction method and system based on K-means and XG-Boost.
Background
In recent years, artificial intelligence and machine learning methods have gradually penetrated the technical fields of many industries and have greatly promoted technology iteration and innovation in traditional industries. Building algorithm models to solve specific industry problems has become a hot topic of current research. Ships, an important means of marine transportation, come in many kinds, large numbers and different sizes. Ship categories differ in structure and performance according to their intended use. Common classification methods currently include classification by use, by power plant, by sailing state, by hull material, and so on. However, when ships are classified by a single feature, the resulting categories show obvious gaps in multidimensional features and obvious scale effects, and cannot intuitively reflect key ship operating data, such as the fuel consumption of ships under way.
Disclosure of Invention
In order to solve the problems of obvious multidimensional feature gaps and obvious scale effects in existing ship classification, the invention provides a ship type classification prediction method based on K-means and XG-Boost. Based on ship data, a number of classes is selected using the K-means clustering algorithm combined with the elbow method, and all clustered ships are marked with ship types. A classification prediction model is then established using the XG-Boost classification algorithm, and the ship types of all ships worldwide are predicted according to that model, so that all ships worldwide can be accurately classified and over-fitting and under-fitting of the model are avoided while accuracy is ensured. The invention also relates to a ship type classification prediction system based on K-means and XG-Boost.
The technical scheme of the invention is as follows:
A ship type classification prediction method based on K-means and XG-Boost, characterized by comprising the following steps:
a data acquisition and preprocessing step of acquiring ship data and preprocessing the ship data;
a ship type classification and marking step of clustering each item of data in the preprocessed ship data using a K-means clustering algorithm to obtain a plurality of clusters, calculating the sum of squared errors of all data in each cluster, summing the sums of squared errors of all clusters to obtain the total sum of squared errors, calculating the number of classes for each item of data according to the total sum of squared errors, selecting a number of classes using the elbow method, and marking all clustered ships with ship types according to the selected number of classes;
and a ship type classification prediction step of using the ships marked with ship types as training set samples, training on the training set samples with an XG-Boost classification algorithm to obtain a plurality of classification prediction models, verifying the plurality of trained classification prediction models to obtain an optimal classification prediction model, and predicting and marking the ship types of all ships worldwide according to the optimal classification prediction model.
Preferably, in the ship type classification prediction step, verifying the classification prediction models includes: the XG-Boost classification algorithm sequentially learns a plurality of weak estimators while evaluating the error of each trained classification prediction model, resamples the predicted ship type data with emphasis on the samples that were predicted incorrectly, constructs some new weak estimators, re-evaluates the prediction performance of the new weak estimators on the originally mispredicted samples, iterates this process repeatedly, and aggregates all the weak estimators to obtain the optimal classification prediction model.
Preferably, in the data acquisition and preprocessing step, the ship data include main engine rated power, operating speed, deadweight tonnage, third-level ship type, overall length, ship depth, draft, displacement and main engine rated speed.
Preferably, in the data acquisition and preprocessing step, the preprocessing includes deleting abnormal data and duplicate data, filling in missing data, removing noise, and normalizing the data.
Preferably, the draft and displacement data in the ship data are acquired inside the enterprise using navigation mark data and outside the ente