CN-121999424-A - Street garbage detection method based on improved YOLOv8s

Abstract

The invention discloses a street garbage detection method based on an improved YOLOv8s, comprising the following steps: constructing a street garbage dataset and preprocessing it; fusing the FocusedLinearAttention self-attention mechanism into the C2f module of the YOLOv8s model to form a C2f-FLA module, which improves the feature fusion effect of the model and its ability to represent global context information; introducing the AIFI module of RT-DETR into the neck layer of YOLOv8s, where the AIFI module enhances feature representation through a multi-head self-attention mechanism and a feed-forward network layer, captures global context information and detail features more effectively, and improves the detection accuracy for small targets; and replacing the CIoU loss function used in YOLOv8s with the Focal-EIoU loss function to address inaccurate localization in street garbage detection. The improved YOLOv8s model strengthens the representation of global context information, so that shallow and deep information in an image are fully fused and image information is processed more effectively. The invention addresses the low detection accuracy in street garbage detection scenes with complex backgrounds and many small targets while maintaining a high detection speed.

Inventors

XU GUOYAN
Pu Zicai

Assignees

Hohai University

Dates

Publication Date: 20260508
Application Date: 20241107

Claims (4)

1. A street garbage detection method based on the improved YOLOv8s, characterized by comprising the following steps: Step 1, constructing a street garbage dataset and preprocessing the dataset; Step 2, based on the YOLOv8s model, fusing the FocusedLinearAttention self-attention mechanism to form a C2f-FLA module, which enhances the expressive capacity of the model, obtains a larger receptive field, and improves the model's ability to extract global context information; Step 3, introducing the AIFI module of RT-DETR into the neck layer of YOLOv8s, wherein the AIFI module enhances feature representation through a multi-head self-attention mechanism and a feed-forward network layer and improves the detection of small targets; and Step 4, replacing the CIoU loss function used in YOLOv8s with the Focal-EIoU loss function to address inaccurate localization in street garbage detection.
2. The street garbage detection method based on the improved YOLOv8s according to claim 1, wherein in Step 2, fusing the FocusedLinearAttention self-attention mechanism comprises the following: (1) FocusedLinearAttention uses linear attention, which differs from the SoftMax attention of the Transformer model: it decouples SoftMax into two independent functions, so that the order of the attention computation can be changed from (query·key)·value to query·(key·value), reducing the overall computational complexity to linear; (2) the focusing function in the FocusedLinearAttention self-attention mechanism addresses the problem that the linear attention distribution is overly even, which makes the output features close to the average of all features and prevents the model from focusing on the more informative features; the focusing function is expressed as follows: where x^p denotes the element-wise p-th power of x. It can be shown that, under an easily satisfied premise, the focusing function makes the inner products of similar vectors larger and those of dissimilar vectors smaller, thereby achieving focus; (3) for the problem that linear attention cannot obtain a full-rank attention matrix, the FocusedLinearAttention self-attention mechanism uses a lightweight depthwise separable convolution (DWC) to restore the rank of the attention matrix, with the mathematical formula as follows: In the depthwise separable convolution, each query attends only to several spatially adjacent features rather than to all values; in this case, even if the outputs corresponding to two queries in linear attention are similar, the model can still obtain different outputs from the local features extracted by the DWC, which preserves feature diversity. The role of the DWC can also be explained from the viewpoint of matrix rank, with the mathematical formula as follows:
3. The street garbage detection method based on the improved YOLOv8s according to claim 1, wherein in Step 3, the AIFI module is an attention-based intra-scale feature interaction module comprising a multi-head self-attention mechanism layer and a feed-forward network layer: (1) the multi-head self-attention mechanism layer captures global context information in the image and refines the self-attention layer; it has multiple groups of weight matrices for the query, key and value vectors, maps the query, key and value through h different linear transformations, concatenates the resulting attention outputs, and finally applies one more linear transformation, wherein each group of attention maps the input to a different representation subspace, so that the model can attend to different positions in different representation subspaces; (2) the feed-forward network layer is a fully connected feed-forward network in which each position independently passes through the same feed-forward neural network; it consists of two linear transformations, namely two fully connected layers, and the activation function of the first fully connected layer is the ReLU activation function.
4. The street garbage detection method based on the improved YOLOv8s according to claim 1, wherein in Step 4, the CIoU loss function of YOLOv8s is replaced with the Focal-EIoU loss function, and the Focal-EIoU (Focal Enhanced Intersection over Union) loss function adopted by the improved model explicitly measures the differences in three elements, namely overlapping area, center point and side length, with the mathematical expression as follows: L_EIoU = L_IoU + L_dis + L_asp, where L_IoU is the IoU loss, L_dis is the center point distance loss, L_asp is the width and height loss, and w and h are the width and height of the smallest bounding box covering the predicted box and the ground-truth box.
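The loss in claim 4 can be illustrated with a small pure-Python sketch. It is only an illustration under stated assumptions, not the patented implementation: the expanded forms of L_dis and L_asp and the focal weighting IoU^gamma follow the commonly published Focal-EIoU formulation, and the function names `eiou_loss` and `focal_eiou_loss`, the box format `(x1, y1, x2, y2)`, the default `gamma=0.5`, and the `eps` stabilizer are assumptions introduced here rather than values taken from this document.

```python
def eiou_loss(pred, target, eps=1e-9):
    """Return (L_EIoU, IoU) for two boxes given as (x1, y1, x2, y2).

    L_EIoU = L_IoU + L_dis + L_asp, as in claim 4. The expanded terms
    below are an assumption based on the published EIoU formulation.
    """
    px1, py1, px2, py2 = pred
    tx1, ty1, tx2, ty2 = target
    pw, ph = px2 - px1, py2 - py1
    tw, th = tx2 - tx1, ty2 - ty1

    # Overlapping area and IoU.
    inter_w = max(0.0, min(px2, tx2) - max(px1, tx1))
    inter_h = max(0.0, min(py2, ty2) - max(py1, ty1))
    inter = inter_w * inter_h
    iou = inter / (pw * ph + tw * th - inter + eps)

    # w and h of the smallest box covering both boxes.
    cw = max(px2, tx2) - min(px1, tx1)
    ch = max(py2, ty2) - min(py1, ty1)

    # Offset between the two box centers.
    dx = (px1 + px2) / 2 - (tx1 + tx2) / 2
    dy = (py1 + py2) / 2 - (ty1 + ty2) / 2

    l_iou = 1.0 - iou                                         # overlap term
    l_dis = (dx * dx + dy * dy) / (cw * cw + ch * ch + eps)   # center distance term
    l_asp = ((pw - tw) ** 2 / (cw * cw + eps)
             + (ph - th) ** 2 / (ch * ch + eps))              # width/height term
    return l_iou + l_dis + l_asp, iou


def focal_eiou_loss(pred, target, gamma=0.5, eps=1e-9):
    """Weight L_EIoU by IoU**gamma (gamma is an assumed hyperparameter)."""
    loss, iou = eiou_loss(pred, target, eps)
    return (iou ** gamma) * loss


if __name__ == "__main__":
    predicted = (0.0, 0.0, 2.0, 2.0)
    ground_truth = (1.0, 1.0, 3.0, 3.0)
    print(eiou_loss(predicted, ground_truth))
    print(focal_eiou_loss(predicted, ground_truth))
```

In this sketch the IoU^gamma factor gives higher-quality (better-overlapping) boxes more weight; a pair of boxes with no overlap receives zero weight, which is a property of this particular formulation and may differ from how a given training framework handles such cases.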

Description

Street garbage detection method based on improved YOLOv8s

Technical Field

The invention relates to a street garbage detection method based on an improved YOLOv8s, and belongs to the field of target detection.

Background

Research on street garbage detection algorithms helps raise the level of urban sanitation management and makes garbage classification and cleaning more intelligent and precise. Applying such algorithms can also significantly reduce labor costs, promote resource recycling, and support the building of green, intelligent and livable cities. However, in street garbage detection scenes with complex backgrounds and many small targets, detection accuracy and detection speed remain insufficient.

The YOLO algorithm is a one-stage object detection algorithm that treats object detection as a regression problem, mapping an input image directly to output bounding boxes and categories. All predictions are completed in a single pass: no candidate regions need to be generated, and target localization and classification are performed directly on the whole image. Its detection speed is significantly faster than that of two-stage detection algorithms such as Faster R-CNN, but this emphasis on speed sacrifices part of its ability to capture detail features and global context information.

Therefore, designing an efficient and reliable street garbage image detection method for street garbage detection scenes with complex backgrounds and many small targets is a technical problem that urgently needs to be solved.
Disclosure of Invention

To address the shortcomings of the prior art, the invention provides a street garbage detection method based on an improved YOLOv8s, which solves the problem of low detection accuracy in street garbage detection scenes with complex backgrounds and many small targets, and improves the detection efficiency and detection accuracy for small targets.

To achieve this aim, the street garbage detection method based on the improved YOLOv8s provided by the invention comprises the following steps:

Step 1, obtaining street garbage images, annotating and preprocessing them, constructing a dataset from the preprocessed data, and dividing the dataset into a training set and a test set in proportion;

Step 2, based on the YOLOv8s model, fusing the FocusedLinearAttention self-attention mechanism to form a C2f-FLA module, which enhances the expressive capacity of the model, obtains a larger receptive field, and improves the model's ability to extract global context information;

Step 3, introducing the AIFI module of RT-DETR into the neck layer of YOLOv8s, wherein the AIFI module enhances feature representation through a multi-head self-attention mechanism and a feed-forward network layer and improves the detection of small targets;

Step 4, replacing the CIoU loss function of YOLOv8s with the Focal-EIoU loss function to address inaccurate localization in street garbage detection.

Preferably, in Step 2, fusing the FocusedLinearAttention self-attention mechanism specifically includes:

(1) FocusedLinearAttention uses linear attention, which differs from the SoftMax attention of the Transformer model: it decouples SoftMax into two independent functions, so that the order of the attention computation can be changed from (query·key)·value to query·(key·value), reducing the overall computational complexity to linear.
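The reordering described in item (1) can be checked numerically with a short pure-Python sketch. It is a minimal illustration, not the patented implementation: the ReLU feature map `relu_map`, the row normalization, the helper names, and the toy matrices are all assumptions introduced here. The point is only that, once SoftMax is replaced by a decomposable feature map, computing (query·key)·value and query·(key·value) gives the same output, while the second order never builds the N x N similarity matrix.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def transpose(m):
    """Transpose a matrix given as a list of rows."""
    return [list(col) for col in zip(*m)]


def relu_map(m):
    """Assumed feature map applied to queries and keys: element-wise ReLU."""
    return [[max(v, 0.0) for v in row] for row in m]


def quadratic_order(q, k, v):
    """(phi(Q) phi(K)^T) V: builds an N x N similarity matrix first."""
    fq, fk = relu_map(q), relu_map(k)
    sim = matmul(fq, transpose(fk))            # N x N
    out = matmul(sim, v)                       # N x d
    # Normalize each output row by the sum of its similarity scores.
    return [[x / sum(s) for x in o] for s, o in zip(sim, out)]


def linear_order(q, k, v):
    """phi(Q) (phi(K)^T V): builds a d x d summary first, never N x N."""
    fq, fk = relu_map(q), relu_map(k)
    kv = matmul(transpose(fk), v)              # d x d
    k_sum = [sum(col) for col in zip(*fk)]     # length d
    out = matmul(fq, kv)                       # N x d
    denom = [sum(a * b for a, b in zip(row, k_sum)) for row in fq]
    return [[x / z for x in o] for o, z in zip(out, denom)]


# Toy inputs with N = 4 tokens and d = 2 channels (illustrative values only).
Q = [[1.0, 0.5], [0.2, 1.0], [0.7, 0.3], [0.4, 0.9]]
K = [[0.3, 0.8], [1.0, 0.1], [0.5, 0.5], [0.9, 0.6]]
V = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]

if __name__ == "__main__":
    print(quadratic_order(Q, K, V))
    print(linear_order(Q, K, V))
```

With N tokens and d channels, the first ordering costs on the order of N^2·d multiplications and the second on the order of N·d^2, which is the sense in which the complexity becomes linear in N when d is fixed.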
(2) The distribution of linear attention is rather even, which makes the output features close to the average of all features, so the more informative features cannot be focused on. The focusing function in Flatten Transformer solves this problem: where x^p denotes the element-wise p-th power of x. It can be shown that, under an easily satisfied premise, the focusing function makes the inner products of similar vectors larger and those of dissimilar vectors smaller, thereby achieving focus;

(3) Linear attention cannot obtain a full-rank attention matrix; its rank is limited by the smaller of the per-head dimension d and the number of features N, expressed as follows: In general, d is smaller than N; for example, d = 64 and N = 196 in DeiT, and d = 32 and N = 49 in Swin Transformer. The linear attention matrix therefore cannot reach full rank N, which means there is redundancy between the weights of different rows. Since the self-attention output is obtained by using these weights to form weighted combinations of the same set of values, the homogeneity of the weights inevitably reduces the diversity of the model output and thus degrades model performance. To address this, a lightweight depthwise separable convolution (DWC) is used to restore the rank of the attention matrix, with the mathematical formula as follows: In the depthwise separable convolution, each query only focuses on several features which