CN-122023895-A - Underwater garbage identification method based on improved YOLOv s

Abstract

The invention discloses an underwater garbage identification method based on an improved YOLOv s, which mainly comprises the following steps: (1) establishing an underwater garbage data set; (2) adding annotation information to the images in the data set; (3) constructing an underwater garbage identification model based on the improved YOLOv s; (4) training the model with a training set and a validation set, and saving the trained model as the optimal model; and (5) testing the optimal model with a test set and, once the test result meets the precision requirement, obtaining the final underwater garbage identification model based on the improved YOLOv s. Compared with the prior art, the disclosed method effectively reduces the false detection rate and the missed detection rate of underwater garbage identification.

Inventors

ZHOU XINXIN
DU YUANZE
ZHAO HONGHAO
KANG HUIYING
GAO XIONG

Assignees

Northeast Electric Power University (东北电力大学)

Dates

Publication Date: 20260512
Application Date: 20260127

Claims (5)

1. An underwater garbage identification method based on an improved YOLOv s, characterized by comprising the following steps:
Step 1, acquiring underwater garbage images to form a first data set, wherein the images in the first data set may be collected from the network or captured by an underwater robot;
Step 2, adding annotation information to the images in the first data set to form a second data set, and dividing the second data set into a training set, a validation set and a test set according to a preset ratio;
Step 3, constructing an underwater garbage identification model based on the improved YOLOv s, wherein the model comprises an improved backbone network, an improved neck network and an improved head network, and the construction of the model further comprises steps 3.1 to 3.3:
Step 3.1, the improved backbone network consists of Conv-1, Conv-2, C2f-1, Conv-3, C2f-2, Conv-4, C2f-Faster-EMA-1, Conv-5, C2f-Faster-EMA-2 and SPPF modules connected in sequence; the training set and the validation set in the second data set are taken as the input of the improved backbone network; the improved backbone network outputs feature information at 5 different scales through the C2f-1 module, the C2f-2 module, the C2f-Faster-EMA-1 module, the C2f-Faster-EMA-2 module and the SPPF module respectively;
Step 3.2, the improved neck network consists of BRSAM, TFE-1C2f-3, TFE-2C2f-4, MS-DSFLAM, CPAM-1, UpSample, CPAM-2, Concat-1 and Concat-2 modules; the outputs of C2f-1 and C2f-2 serve as the inputs of TFE-2C2f-4, and the output of C2f-1 serves as an input of CPAM-1; the outputs of C2f-2, C2f-Faster-EMA-1 and C2f-Faster-EMA-2 serve as the inputs of MS-DSFLAM, and the outputs of MS-DSFLAM and TFE-2C2f-4 serve as the inputs of CPAM-2; the output of MS-DSFLAM also serves as the input of UpSample, and the output of UpSample serves as an input of CPAM-1; the output of SPPF serves as the input of BRSAM, the outputs of C2f-Faster-EMA-1 and BRSAM serve as the inputs of TFE-1C2f-3, the outputs of CPAM-2 and TFE-1C2f-3 serve as the inputs of Concat-1, and the outputs of TFE-1C2f-3 and Concat-1 serve as the inputs of Concat-2;
Step 3.3, the improved head network comprises a Detection-1 module, a Detection-2 module, a Detection-3 module and a Detection-4 module, wherein the output of CPAM-1 serves as the input of the Detection-1 module, the output of CPAM-2 serves as the input of the Detection-2 module, the output of Concat-1 serves as the input of the Detection-3 module, and the output of Concat-2 serves as the input of the Detection-4 module;
Step 4, training the underwater garbage identification model based on the improved YOLOv s with the training set and the validation set, and saving the trained model as the optimal model, wherein the training process further comprises steps 4.1 to 4.4:
Step 4.1, setting the training parameters of the underwater garbage identification model based on the improved YOLOv s, the training parameters including the number of epochs, batch size, optimizer, learning rate, momentum, weight decay and number of worker threads;
Step 4.2, inputting the training set and the validation set into the underwater garbage identification model based on the improved YOLOv s, and computing the gradient of the loss function with respect to the model parameters by the back-propagation algorithm;
Step 4.3, updating the model parameters with the optimizer in the direction of gradient descent until the loss functions on the training set and the validation set no longer decrease and the evaluation indexes precision P, recall R, mAP and mAP50-95 no longer improve;
Step 4.4, saving the trained model parameters as the optimal model;
Step 5, testing the optimal model with the test set, and evaluating the test result with objective evaluation indexes; when the precision requirement is met, the final underwater garbage identification model based on the improved YOLOv s is obtained.
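The module wiring recited in steps 3.1 to 3.3 can be checked mechanically. The following is a minimal sketch, not the patented implementation: it records only which module feeds which, using the module names from the claim (written here as C2f-Faster-EMA and Detection-1 to Detection-4), and uses Python's standard `graphlib` to confirm that the connections form a valid feed-forward order. The node name `input` and the helper names `build_graph` and `execution_order` are assumptions introduced for illustration.

```python
from graphlib import TopologicalSorter

# Backbone modules, connected in sequence (step 3.1).
BACKBONE = ["Conv-1", "Conv-2", "C2f-1", "Conv-3", "C2f-2", "Conv-4",
            "C2f-Faster-EMA-1", "Conv-5", "C2f-Faster-EMA-2", "SPPF"]

# Neck and head wiring (steps 3.2 and 3.3): module -> modules that feed it.
INPUTS = {
    "TFE-2C2f-4": ["C2f-1", "C2f-2"],
    "MS-DSFLAM": ["C2f-2", "C2f-Faster-EMA-1", "C2f-Faster-EMA-2"],
    "UpSample": ["MS-DSFLAM"],
    "CPAM-1": ["C2f-1", "UpSample"],
    "CPAM-2": ["MS-DSFLAM", "TFE-2C2f-4"],
    "BRSAM": ["SPPF"],
    "TFE-1C2f-3": ["C2f-Faster-EMA-1", "BRSAM"],
    "Concat-1": ["CPAM-2", "TFE-1C2f-3"],
    "Concat-2": ["TFE-1C2f-3", "Concat-1"],
    "Detection-1": ["CPAM-1"],
    "Detection-2": ["CPAM-2"],
    "Detection-3": ["Concat-1"],
    "Detection-4": ["Concat-2"],
}


def build_graph():
    """Return a dict mapping every module to the list of its predecessors."""
    graph = {name: list(sources) for name, sources in INPUTS.items()}
    graph["Conv-1"] = ["input"]  # hypothetical name for the image tensor
    for previous, current in zip(BACKBONE, BACKBONE[1:]):
        graph[current] = [previous]
    return graph


def execution_order():
    """Topologically sort the modules; raises graphlib.CycleError on a cycle."""
    return list(TopologicalSorter(build_graph()).static_order())


if __name__ == "__main__":
    print(" -> ".join(execution_order()))
```

Any order returned by `execution_order()` is one valid forward-pass schedule; the sketch says nothing about tensor shapes or what each module computes.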
2. The underwater garbage identification method based on the improved YOLOv s according to claim 1, wherein the two C2f-Faster-EMA modules in step 3.1 are identical in structure, and their construction further comprises steps 3.1.1 to 3.1.3:
Step 3.1.1, the C2f-Faster-EMA module is obtained by replacing the BottleNeck module in the C2f module of the original YOLOv s network with a Faster-EMA module;
Step 3.1.2, the Faster-EMA module of step 3.1.1 is obtained by adding the EMA attention mechanism to the FasterBlock module of the FasterNet network;
Step 3.1.3, the EMA attention mechanism of step 3.1.2 is based on a multi-scale parallel processing architecture and comprises two 1×1 convolution paths and one 3×3 convolution path; in the two 1×1 convolution paths, two 1D global average pooling operations encode channel information along the x-axis and y-axis directions respectively, and the two feature encodings are concatenated along the h direction; in the 3×3 convolution path, a single 3×3 convolution layer is used, local cross-channel interaction is achieved through the 3×3 convolution operation, a global average pooling layer is also introduced into the 3×3 branch, and adaptive transformation of the feature dimensions is achieved through a joint activation function.
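The directional pooling used in the two 1×1 paths of step 3.1.3 can be illustrated on a single channel. This is a minimal sketch under stated assumptions: the function name `directional_avg_pool` is hypothetical, "along the x axis" is taken to mean averaging over the width so that one value per row remains, and the shared 1×1 convolution, the activation and the 3×3 branch are omitted.

```python
def directional_avg_pool(feature_map):
    """Pool one channel along each spatial axis and concatenate the results.

    feature_map: list of H rows, each a list of W floats.
    Returns a list of length H + W.
    """
    height = len(feature_map)
    width = len(feature_map[0])
    # Average over the width: one value per row (length H).
    pooled_x = [sum(row) / width for row in feature_map]
    # Average over the height: one value per column (length W).
    pooled_y = [
        sum(feature_map[i][j] for i in range(height)) / height
        for j in range(width)
    ]
    # Concatenate the two encodings into a single vector.
    return pooled_x + pooled_y


if __name__ == "__main__":
    print(directional_avg_pool([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]))
```

In a full attention module the concatenated vector would be passed through the shared convolution and then split back into its row and column parts to reweight the feature map.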
3. The underwater garbage identification method based on the improved YOLOv s according to claim 1, wherein the BRSAM module in step 3.2 is obtained by replacing the channel attention mechanism in the CBAM attention mechanism with the Bilevel-RoutingAttention attention mechanism while retaining the spatial attention mechanism;
the construction of the Bilevel-RoutingAttention attention mechanism further comprises steps 3.2.1 to 3.2.5:
Step 3.2.1, given a 2D input feature map $X \in \mathbb{R}^{H \times W \times C}$, where H denotes the height, W the width and C the number of channels, X is divided into S×S non-overlapping regions, each containing $\frac{HW}{S^2}$ feature vectors; this step is accomplished by reshaping X into $X^r \in \mathbb{R}^{S^2 \times \frac{HW}{S^2} \times C}$; the query, key and value tensors Q, K, V are then obtained by linear projection, as shown in formulas (1) to (3):
$Q = X^r W^q$ (1)
$K = X^r W^k$ (2)
$V = X^r W^v$ (3)
in formulas (1) to (3), Q, K and V denote the query, key and value respectively, and $W^q$, $W^k$ and $W^v$ denote the projection weights of the query, key and value respectively;
Step 3.2.2, region-level queries and keys $Q^r$, $K^r$ are obtained by taking the average over each region; the region-to-region affinity matrix is derived by matrix multiplication between $Q^r$ and the transpose of $K^r$, as shown in formula (4):
$A^r = Q^r (K^r)^T$ (4)
in formula (4), the elements of the adjacency matrix $A^r$ measure the degree of semantic relatedness between two regions;
Step 3.2.3, pruning the affinity graph to obtain the routing index matrix $I^r$, as shown in formula (5):
$I^r = \mathrm{topK}(A^r)$ (5)
in formula (5), k denotes the number of most relevant regions selected for each region, $A^r$ denotes the adjacency matrix, and topK denotes the index selection function, which returns the k regions most relevant to the current region;
Step 3.2.4, each query token in region i attends to all key-value pairs residing in the k routed regions given by the i-th row of the routing index matrix; with $I^r$ as the index, the key and value tensors are gathered, as shown in formulas (6) and (7):
$K^g = \mathrm{gather}(K, I^r)$ (6)
$V^g = \mathrm{gather}(V, I^r)$ (7)
in formulas (6) and (7), $K^g$ and $V^g$ are the gathered key and value tensors;
Step 3.2.5, applying attention to the gathered key-value pairs, as shown in formula (8):
$O = \mathrm{Attention}(Q, K^g, V^g) + \mathrm{LCE}(V)$ (8)
in formula (8), O denotes the attention output feature and LCE(V) denotes the local context enhancement term;
the construction of the spatial attention mechanism further comprises steps 3.2.6 to 3.2.7:
Step 3.2.6, the feature $F$ obtained from the Bilevel-RoutingAttention attention module is processed by global max pooling and global average pooling operations, generating two weight matrices that describe the importance of each spatial position;
Step 3.2.7, the two weight matrices are concatenated along the channel dimension, the 2 channels are reduced to 1 channel by a 7×7 convolution operation, and a Sigmoid activation function is applied to obtain the spatial attention map $M_s(F)$, as shown in formula (9):
$M_s(F) = \sigma\left(f^{7 \times 7}\left([\mathrm{AvgPool}(F); \mathrm{MaxPool}(F)]\right)\right) = \sigma\left(f^{7 \times 7}\left([F_{avg}; F_{max}]\right)\right)$ (9)
in formula (9), $f^{7 \times 7}$ is the 7×7 convolution operation, AvgPool is the global average pooling operation, MaxPool is the global max pooling operation, σ is the Sigmoid activation function, $F_{avg}$ is the average-pooled feature, and $F_{max}$ is the max-pooled feature.
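The region-level routing of steps 3.2.2 to 3.2.4 can be sketched in a few lines. The code below is an illustration under stated assumptions rather than the claimed implementation: tensors are plain nested lists, the linear projections of step 3.2.1 are taken to be the identity so that the same tokens serve as queries and keys, and the helper names (`region_means`, `affinity`, `topk_index`, `gather`) are hypothetical.

```python
def region_means(regions):
    """Average the token vectors of each region (region-level query or key).

    regions: list of regions; each region is a list of C-dimensional tokens.
    """
    means = []
    for region in regions:
        channels = len(region[0])
        means.append(
            [sum(tok[d] for tok in region) / len(region) for d in range(channels)]
        )
    return means


def affinity(q_r, k_r):
    """Region-to-region affinity: product of q_r with the transpose of k_r."""
    return [[sum(a * b for a, b in zip(q, k)) for k in k_r] for q in q_r]


def topk_index(a_r, k):
    """For each row, the indices of the k largest affinities (ties: lower index)."""
    return [sorted(range(len(row)), key=lambda j: -row[j])[:k] for row in a_r]


def gather(tensor, index):
    """Collect, for each region, all tokens of its k routed regions."""
    return [[tok for j in routed for tok in tensor[j]] for routed in index]


if __name__ == "__main__":
    regions = [
        [[1.0, 0.0], [1.0, 0.0]],
        [[0.0, 1.0], [0.0, 1.0]],
        [[1.0, 1.0], [1.0, 1.0]],
    ]
    means = region_means(regions)
    routes = topk_index(affinity(means, means), k=2)
    print(routes)
    print(gather(regions, routes)[0])
```

Fine-grained token-to-token attention would then be computed between each region's queries and its gathered keys and values; that step and the local context enhancement term are not shown.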
4. The underwater garbage identification method based on the improved YOLOv s according to claim 1, wherein, in the improved neck network of step 3.2, the MS-DSFLAM module is constructed by adding the DySample dynamic sampling module to the SSFF module of the ASF-YOLO neck network;
the MS-DSFLAM module comprises three inputs P3, P4-1 and P5; two Conv1d modules, two DySample modules, and Concat, Conv3d, BN, LeakyReLU and MaxPool3d modules; and one output P4-2;
the DySample dynamic sampling module converts feature up-sampling into a point-based resampling process by learning sampling positions and dynamically adjusting the regular sampling points; specifically, the module takes a low-resolution feature map as input, generates sampling offsets with a lightweight offset prediction network, combines the sampling offsets with a predefined regular sampling grid to obtain the dynamic sampling point positions, and then reconstructs the high-resolution features in continuous feature space by bilinear resampling, so that the computational complexity is markedly reduced while adaptive modeling of target edges and structural information is achieved.
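The resampling idea behind the DySample module (regular grid plus learned offset, followed by bilinear interpolation) can be sketched on a single-channel map. This is a simplified illustration, not the module itself: the offsets are passed in as data instead of being predicted by a network, border points are clamped (an assumption), and the function names `bilinear_sample` and `dynamic_upsample` are hypothetical.

```python
def bilinear_sample(feat, y, x):
    """Bilinearly interpolate a 2D map (list of rows) at continuous (y, x)."""
    height, width = len(feat), len(feat[0])
    # Clamp the sampling point to the valid range (border handling is an assumption).
    y = min(max(y, 0.0), height - 1.0)
    x = min(max(x, 0.0), width - 1.0)
    y0, x0 = int(y), int(x)
    y1, x1 = min(y0 + 1, height - 1), min(x0 + 1, width - 1)
    dy, dx = y - y0, x - x0
    top = feat[y0][x0] * (1 - dx) + feat[y0][x1] * dx
    bottom = feat[y1][x0] * (1 - dx) + feat[y1][x1] * dx
    return top * (1 - dy) + bottom * dy


def dynamic_upsample(feat, offsets, scale=2):
    """Upsample feat by `scale`, shifting each regular sampling point by an offset.

    offsets[i][j] is a (dy, dx) pair for output pixel (i, j), in input-pixel units.
    """
    height, width = len(feat), len(feat[0])
    out = []
    for i in range(height * scale):
        row = []
        for j in range(width * scale):
            # Regular grid: centre of the output pixel mapped to input coordinates.
            grid_y = (i + 0.5) / scale - 0.5
            grid_x = (j + 0.5) / scale - 0.5
            dy, dx = offsets[i][j]
            row.append(bilinear_sample(feat, grid_y + dy, grid_x + dx))
        out.append(row)
    return out


if __name__ == "__main__":
    feat = [[0.0, 2.0], [4.0, 6.0]]
    zero = [[(0.0, 0.0)] * 4 for _ in range(4)]
    for row in dynamic_upsample(feat, zero):
        print(row)
```

With all offsets set to zero the function reduces to ordinary bilinear upsampling; non-zero offsets move individual sampling points, which is what allows the sampling to adapt to edges and structure.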
5. The underwater garbage identification method based on the improved YOLOv s according to claim 1, wherein step 5 further comprises steps 5.1 to 5.3:
Step 5.1, inputting the test set into the optimal model saved in step 4;
Step 5.2, calculating the performance indexes of the model, namely precision P, recall R, mAP and mAP50-95, with the following formulas:
$P = \frac{TP}{TP + FP}$ (10)
$R = \frac{TP}{TP + FN}$ (11)
$AP = \int_0^1 P(R)\,dR$ (12)
$mAP = \frac{1}{m} \sum_{i=1}^{m} AP_i$ (13)
in formulas (10) to (13), P is the precision, R is the recall, mAP is the mean of the average precision over all categories, AP is the average precision, m is the total number of underwater garbage label categories, TP denotes the number of positive samples correctly identified as positive, FP denotes the number of negative samples incorrectly identified as positive, and FN denotes the number of positive samples incorrectly identified as negative; mAP50 denotes the mean average precision with the IoU threshold fixed at 0.5; mAP50-95 is obtained by computing the mean average precision at IoU thresholds from 0.5 to 0.95 in steps of 0.05 and then averaging these values;
Step 5.3, evaluating the test result of the test set with the objective evaluation indexes; when the precision requirement is met, the final underwater garbage identification model based on the improved YOLOv s is obtained.
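The evaluation indexes of step 5.2 can be computed as follows. This is a minimal sketch under stated assumptions: the integral defining AP is approximated by all-point interpolation over a monotone precision envelope, a common convention that the claim does not specify, and the function names are hypothetical.

```python
def precision(tp, fp):
    """P = TP / (TP + FP); defined as 0.0 when there are no predictions."""
    return tp / (tp + fp) if tp + fp else 0.0


def recall(tp, fn):
    """R = TP / (TP + FN); defined as 0.0 when there are no positives."""
    return tp / (tp + fn) if tp + fn else 0.0


def average_precision(recalls, precisions):
    """Area under the precision-recall curve (all-point interpolation).

    recalls must be sorted in ascending order and paired with precisions.
    """
    r = [0.0] + list(recalls) + [1.0]
    p = [0.0] + list(precisions) + [0.0]
    # Replace each precision by the maximum precision at any higher recall.
    for i in range(len(p) - 2, -1, -1):
        p[i] = max(p[i], p[i + 1])
    return sum((r[i + 1] - r[i]) * p[i + 1] for i in range(len(r) - 1))


def mean_ap(ap_per_class):
    """mAP: mean of the per-class AP values over the m categories."""
    return sum(ap_per_class) / len(ap_per_class)


def map50_95(map_by_threshold):
    """Average of the mAP values at IoU thresholds 0.50, 0.55, ..., 0.95."""
    return sum(map_by_threshold.values()) / len(map_by_threshold)


if __name__ == "__main__":
    print(precision(8, 2), recall(8, 8))
    print(average_precision([0.5, 1.0], [1.0, 0.5]))
```

Matching detections to ground-truth boxes at a given IoU threshold, which produces the TP, FP and FN counts, is outside the scope of this sketch.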

Description

Underwater garbage identification method based on improved YOLOv s

Technical Field

The invention relates to the field of artificial intelligence, and in particular to an underwater garbage identification method based on an improved YOLOv s.

Background

In recent years, autonomous underwater robots have become a key tool for environmental protection and underwater operations in deep-sea, strong-current and polluted waters by virtue of their intelligence and autonomy, markedly improving operational safety and coverage. However, existing underwater garbage recognition technology still faces multiple challenges: underwater garbage varies markedly in shape, color and size, and is often similar to the surrounding environment in color and texture, so target features are difficult to extract effectively; in addition, uneven underwater illumination, interference from suspended matter, image degradation and similar problems further reduce the accuracy and robustness of recognition models. Therefore, to meet the specific requirements of underwater garbage identification, existing detection models need targeted optimization to improve recognition accuracy and stability, thereby providing reliable technical support for autonomous cleaning operations by underwater robots.

Disclosure of Invention

To address the problems in the prior art, the invention provides an underwater garbage identification method based on an improved YOLOv s, which, by improving the YOLOv s model, reduces missed detections and false detections of garbage that varies in morphology, differs markedly in color and size, and closely resembles the background.
The technical scheme provided by the invention comprises the following steps:
Step 1, acquiring underwater garbage images to form a first data set;
Step 2, adding annotation information to the images in the first data set to form a second data set, and dividing the second data set into a training set, a validation set and a test set according to a preset ratio;
Step 3, constructing an underwater garbage identification model based on the improved YOLOv s, wherein the model comprises an improved backbone network, an improved neck network and an improved head network, and the construction of the model further comprises steps 3.1 to 3.3:
Step 3.1, the improved backbone network consists of Conv-1, Conv-2, C2f-1, Conv-3, C2f-2, Conv-4, C2f-Faster-EMA-1, Conv-5, C2f-Faster-EMA-2 and SPPF modules connected in sequence; the training set and the validation set in the second data set are taken as the input of the improved backbone network; the improved backbone network outputs feature information at 5 different scales through the C2f-1 module, the C2f-2 module, the C2f-Faster-EMA-1 module, the C2f-Faster-EMA-2 module and the SPPF module respectively;
Step 3.2, the improved neck network consists of BRSAM, TFE-1C2f-3, TFE-2C2f-4, MS-DSFLAM, CPAM-1, UpSample, CPAM-2, Concat-1 and Concat-2 modules; the outputs of C2f-1 and C2f-2 serve as the inputs of TFE-2C2f-4, and the output of C2f-1 serves as an input of CPAM-1; the outputs of C2f-2, C2f-Faster-EMA-1 and C2f-Faster-EMA-2 serve as the inputs of MS-DSFLAM, and the outputs of MS-DSFLAM and TFE-2C2f-4 serve as the inputs of CPAM-2; the output of MS-DSFLAM also serves as the input of UpSample, and the output of UpSample serves as an input of CPAM-1; the output of SPPF serves as the input of BRSAM, the outputs of C2f-Faster-EMA-1 and BRSAM serve as the inputs of TFE-1C2f-3, the outputs of CPAM-2 and TFE-1C2f-3 serve as the inputs of Concat-1, and the outputs of TFE-1C2f-3 and Concat-1 serve as the inputs of Concat-2;
Step 3.3, the improved head network comprises a Detection-1 module, a Detection-2 module, a Detection-3 module and a Detection-4 module, wherein the output of CPAM-1 serves as the input of the Detection-1 module, the output of CPAM-2 serves as the input of the Detection-2 module, the output of Concat-1 serves as the input of the Detection-3 module, and the output of Concat-2 serves as the input of the Detection-4 module;
Step 4, training the underwater garbage identification model based on the improved YOLOv s with the training set and the validation set, and saving the trained model parameters as the optimal model;
Step 5, testing the saved optimal model with the test set, and evaluating the test result with objective evaluation indexes; when the precision requirement is met, the final underwater garbage identification model based on the improved YOLOv s is obtained.
Further, in step 1, the images in the first data set may be collected from the network or captured by an underwater robot; preferably, in step 2, the training set, the validation set and the test set may be divided in a ratio of 8:1:1.
Further, the 2C 2 f-fast