CN-120708103-B - Unmanned aerial vehicle-based water surface garbage identification method and system

Abstract

The invention relates to a water surface garbage identification method and system based on an unmanned aerial vehicle. The method comprises the following steps: collecting unmanned aerial vehicle water surface garbage images containing target information, labeling the garbage, and constructing an unmanned aerial vehicle water surface garbage image dataset; constructing an LTM-YOLO11 detection model, in which a large-kernel LSK attention mechanism is added between the SPPF module of the backbone (main) layer and the C2PSA module of the neck layer, an interactive triple attention mechanism is added after each C3K2 module in the second column of the neck layer, the outputs of the three interactive triple attention mechanisms are connected to the multi-band adaptive dilated (expansion) convolution module MADC of the head layer, and target detection results are obtained through three detection heads; and training the detection model on the unmanned aerial vehicle water surface garbage image dataset and using it to detect water surface garbage in unmanned aerial vehicle images. The invention addresses the problem that unmanned aerial vehicle water surface garbage detection cannot achieve both high detection accuracy and a lightweight network model at the same time.

Inventors

ZHAO YANFEI

Assignees

天津市海河管理中心

Dates

Publication Date: 20260512
Application Date: 20250613

Claims (4)

1. A water surface garbage identification method based on an unmanned aerial vehicle, characterized by comprising the following steps:
1) Collecting unmanned aerial vehicle water surface garbage images containing target information, labeling the garbage, and constructing an unmanned aerial vehicle water surface garbage image dataset;
2) Constructing an LTM-YOLO11 detection model. The LTM-YOLO11 detection model is based on a YOLO11 reference model: a large-kernel LSK attention mechanism is added between the SPPF module of the backbone (main) layer and the C2PSA module of the neck layer, an interactive triple attention mechanism is added after each C3K2 module in the second column of the neck layer, and the outputs of the three interactive triple attention mechanisms are connected to the multi-band adaptive dilated (expansion) convolution module MADC of the head layer. The MADC module decomposes the input features into three frequency bands (high, medium and low) and learns dilation rates (expansion rates) of different magnitudes by controlling the sampling interval of the convolution kernel: in the high-frequency region, a small-dilation convolution with a dilation rate of 1 convolves adjacent pixels; in the mid-frequency region, a medium-dilation convolution with a dilation rate of 2 is used; and in the low-frequency region, a large-dilation convolution with a dilation rate of 4 is used. The LSK attention mechanism combines a large convolution kernel with separable convolution: it captures contextual feature information through the large kernel and decomposes the two-dimensional convolution into two one-dimensional convolution operations. Without introducing an information bottleneck, the interactive triple attention mechanism extracts cross-dimensional features from the feature map through different rotation and permutation operations, and uses three different branch structures to compute attention weights from the cross-dimensional interaction of the feature data. The specific process of the interactive triple attention mechanism is as follows: the three dimensions of channel, height and width interact with one another, and the input is processed through three branches respectively; the first branch is processed in turn by a Z-pooling operation, a convolution layer of size [size missing in source], a sigmoid function and a rotation operation to obtain the output of the first branch; the second branch is processed in turn by a Z-pooling operation, a convolution layer of size [size missing in source], a sigmoid function and a rotation operation to obtain the output of the second branch; the third branch is processed in turn by a Z-pooling operation and a convolution layer of size [size missing in source]; and the outputs of the three branches are averaged to obtain the output of the interactive triple attention mechanism;
3) Training the model and performing detection: dividing the unmanned aerial vehicle water surface garbage image dataset of step 1) into a training set, a prediction (test) set and a validation set, training the LTM-YOLO11 detection model on the training set to obtain a trained LTM-YOLO11 detection model, detecting unmanned aerial vehicle water surface garbage with the trained LTM-YOLO11 detection model, and outputting a recognition result.
2. The method of claim 1, wherein the LTM-YOLO11 detection model has a detection accuracy of not less than 40% in mAP@0.5 and 24.9% in mAP@0.5-0.95.
3. A computer-readable storage medium having stored thereon a computer program, characterized in that the program, when executed by a processor, implements the steps of the identification method according to claim 1 or 2.
4. A water surface garbage identification system based on an unmanned aerial vehicle, wherein the system performs the steps of the identification method of claim 1 or 2 and comprises: unmanned aerial vehicle image acquisition equipment for acquiring water surface garbage images; an image preprocessing module for labeling and enhancing the acquired images; the LTM-YOLO11 detection model for performing target detection on water surface garbage having a wide target size distribution and small target sizes; and an alarm module for issuing early warnings according to the recognition result of the LTM-YOLO11 detection model, wherein a cleaning warning is issued if garbage is present on the water surface, and, if no garbage is present, the unmanned aerial vehicle updates its geographic position and performs inspection and re-detection.
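
The dilation rates (expansion rates) of 1, 2 and 4 that claim 1 assigns to the high, medium and low frequency bands can be illustrated with a small one-dimensional example. This is a minimal sketch and not the MADC module itself: the 3-tap kernel, the function names and the band labels in `BAND_DILATION` are assumptions made for illustration, and the frequency-band decomposition and any learned, adaptive behaviour are not modelled. It only shows how the sampling interval of the kernel widens the region each output value sees.

```python
def effective_kernel_size(kernel_size: int, dilation: int) -> int:
    """Number of input samples spanned by a dilated kernel along one axis."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)


def dilated_conv1d(signal, kernel, dilation):
    """Valid-mode 1-D dilated correlation: kernel taps sit `dilation` samples apart."""
    span = effective_kernel_size(len(kernel), dilation)
    out = []
    for start in range(len(signal) - span + 1):
        acc = 0.0
        for tap, weight in enumerate(kernel):
            acc += weight * signal[start + tap * dilation]
        out.append(acc)
    return out


# Dilation rates as stated in claim 1; the dictionary keys are illustrative labels.
BAND_DILATION = {"high": 1, "mid": 2, "low": 4}

# Span covered by an assumed 3-tap kernel in each band: high 3, mid 5, low 9.
spans = {band: effective_kernel_size(3, d) for band, d in BAND_DILATION.items()}

# A difference kernel applied to a ramp: larger dilation compares pixels farther apart.
ramp = [1, 2, 3, 4, 5, 6, 7, 8, 9]
responses = {band: dilated_conv1d(ramp, [1, 0, -1], d) for band, d in BAND_DILATION.items()}
```

With a dilation rate of 1 the kernel compares immediately neighbouring samples, which suits fine detail, while a rate of 4 spans nine samples with the same three weights, which suits slowly varying content without adding parameters.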

Description

Unmanned aerial vehicle-based water surface garbage identification method and system

Technical Field

The invention belongs to the technical field of water surface garbage identification, and particularly relates to a water surface garbage identification method and system based on an unmanned aerial vehicle.

Background

Even with the development of deep learning, unmanned aerial vehicle target detection remains a major challenge in the field of target detection, because the scales of the detected targets are extremely unbalanced: the target size distribution range is wide, and the proportion of small targets is high. Compared with a large object, a small object carries limited image information and is easily affected by various factors. There are two main types of methods for visual target detection by unmanned aerial vehicles. The first is single-stage target detection, such as YOLO and SSD. The second is two-stage target detection, such as R-CNN, Fast R-CNN and Faster R-CNN. Two-stage methods first generate candidate boxes and then classify and regress them, whereas single-stage methods perform classification and regression without generating candidate boxes. YOLO is a single-stage target detection algorithm with the advantages of simplicity, speed and ease of deployment, so it is widely applied in industrial target detection, target tracking, target segmentation and other fields. The YOLO target detection algorithm is mainly characterized by high training speed and high detection accuracy, which makes it suitable for real-time detection of small targets in unmanned aerial vehicle imagery in complex scenes.
In unmanned aerial vehicle scenes, image scales vary and small targets account for a large proportion of the targets, so extremely small targets in the images cause false detections and missed detections, and the lightweight requirement cannot be met while accuracy and speed are maintained. The Chinese patent with publication number CN119887850A discloses a multi-target tracking method for complex scenes based on adaptive association, which takes YOLO11 as a reference model and performs multi-target detection by introducing context feature extraction and mixed attention. However, its mixed attention mechanism includes multi-branch pooling and unpooling operations, so the computation amount is large, the model cannot be made lightweight, and the real-time detection performance of the model is affected.

Disclosure of Invention

In view of the above problems in the prior art, the technical problem to be solved by the invention is to provide a water surface garbage identification method and system based on an unmanned aerial vehicle, solving the technical problem that the prior art cannot satisfy detection accuracy and the lightweight requirement of the network model at the same time. The invention takes the YOLO11 model as the reference network and constructs an LTM-YOLO11 detection model, so that unmanned aerial vehicle water surface garbage detection can achieve both detection accuracy and a lightweight network model.
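
The lightweight goal stated above relates to the decomposition of a two-dimensional convolution into two one-dimensional convolutions, which claim 1 attributes to the LSK attention mechanism. The following is a minimal sketch of that decomposition only, not of the LSK module: all function names are hypothetical, the kernel sizes are assumptions, and plain Python lists stand in for feature maps. It shows that a k x k kernel needs k*k weights while a 1 x k pass followed by a k x 1 pass needs 2*k, and that the two give identical results when the two-dimensional kernel is the outer product of the two one-dimensional kernels. For a general kernel the two-pass form is a cheaper substitute rather than an exact equivalent.

```python
def conv2d_valid(image, kernel):
    """Valid-mode 2-D correlation on nested lists (no padding, stride 1)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(kernel[i][j] * image[r + i][c + j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def separable_conv2d_valid(image, col_taps, row_taps):
    """Apply a 1 x k horizontal pass, then a k x 1 vertical pass."""
    horizontal = conv2d_valid(image, [list(row_taps)])
    return conv2d_valid(horizontal, [[t] for t in col_taps])


def weight_counts(k):
    """Weights per channel: (full k x k kernel, two 1-D kernels of length k)."""
    return k * k, 2 * k


# Small integer example so the comparison is exact.
image = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
col_taps, row_taps = [1, 2, 1], [1, 0, -1]
full_kernel = [[c * r for r in row_taps] for c in col_taps]  # outer product

same_result = conv2d_valid(image, full_kernel) == separable_conv2d_valid(
    image, col_taps, row_taps
)
full_weights, split_weights = weight_counts(7)  # assumed kernel size of 7
```

The saving grows with kernel size, which is why the decomposition matters most for large kernels: for an assumed kernel size of 7, the full kernel has 49 weights per channel and the two one-dimensional kernels have 14.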
The technical scheme adopted to solve the technical problem is as follows:
An unmanned aerial vehicle-based water surface garbage identification method comprises the following steps:
1) Collecting unmanned aerial vehicle water surface garbage images containing target information, labeling the garbage, and constructing an unmanned aerial vehicle water surface garbage image dataset;
2) Constructing an LTM-YOLO11 detection model. The LTM-YOLO11 detection model is based on a YOLO11 reference model: a large-kernel LSK attention mechanism is added between the SPPF module of the backbone (main) layer and the C2PSA module of the neck layer, an interactive triple attention mechanism is added after each C3K2 module in the second column of the neck layer, and the outputs of the three interactive triple attention mechanisms are connected to the multi-band adaptive dilated (expansion) convolution module MADC of the head layer. The MADC module decomposes the input features into three frequency bands (high, medium and low) and learns dilation rates (expansion rates) of different magnitudes by controlling the sampling interval of the convolution kernel: in the high-frequency region, a small-dilation convolution with a dilation rate of 1 convolves adjacent pixels; in the mid-frequency region, a medium-dilation convolution with a dilation rate of 2 is used; and in the low-frequency region, a large-dilation convolution with a dilation rate of 4 is used;
3) Training and detecting a model; dividing the unmanned aerial vehicle wate