CN-121982509-A - Underwater environment real-time lightweight target detection method and readable storage medium
Abstract
The invention relates to the technical field of underwater target detection, and discloses a real-time lightweight target detection method for an underwater environment and a readable storage medium. A standard YOLOv11n model is improved: a C3-CPMSFA module fuses global context information with multi-scale local features and, combined with a partial-convolution strategy, achieves efficient feature extraction; a C2SFA module extends feature processing to the frequency domain and uses a dynamic frequency-domain mask to achieve joint spatial-frequency adaptive perception, accurately filtering backscattering noise; and an auxiliary head achieves semantic denoising and small-target enhancement through progressive multi-scale feature fusion and depthwise separable convolution. The invention significantly improves the accuracy and robustness of underwater small-target detection while remaining lightweight and real-time.
Inventors
- TANG ZIJUN
- LV YONG
- Miao Linghui
Assignees
- Soochow University (苏州大学)
Dates
- Publication Date: 2026-05-05
- Application Date: 2026-04-07
Claims (10)
- 1. A real-time lightweight target detection method for an underwater environment, characterized by comprising the following steps: inputting an underwater image into an improved YOLOv11n model, and acquiring an underwater target detection result through a backbone network, a neck network and a detection network; wherein the improvement of the YOLOv11n model comprises replacing the C3K2 module in the standard YOLOv11n model with a C3-CPMSFA module, and the C3-CPMSFA module comprises: convolving the input features with a convolution unit and then splitting them along the channel dimension with a splitting unit to obtain a first branch feature and a second branch feature; passing the first branch feature sequentially through a plurality of C-PMSFA modules connected in series and taking the output of the last C-PMSFA module as a multi-scale feature, wherein each C-PMSFA module divides its input feature according to a preset channel proportion, extracts features at the corresponding scales with context-mixing dynamic convolution units of different scales, concatenates the extracted features, and outputs them through a residual connection with the input feature; and concatenating the multi-scale feature with the second branch feature and outputting the result through a 1×1 convolution.
- 2. The real-time lightweight target detection method for an underwater environment according to claim 1, wherein the C-PMSFA module comprises: dividing the input feature by channel into a 1/2-channel feature, a first 1/4-channel feature and a second 1/4-channel feature; performing feature extraction on the 1/2-channel feature, the first 1/4-channel feature and the second 1/4-channel feature with context-mixing dynamic convolution units whose convolution kernels are 3×3, 5×5 and 7×7, respectively, to obtain initial features at the corresponding scales; concatenating the initial features of all scales and applying a 1×1 convolution unit to obtain a fused concatenated feature; performing a residual connection between the fused concatenated feature and the input feature to obtain the output feature of the C-PMSFA module; wherein the input feature of the first C-PMSFA module is the first branch feature, and the input feature of each subsequent C-PMSFA module is the output feature of the previous C-PMSFA module.
- 3. The real-time lightweight target detection method for an underwater environment according to claim 1, wherein the improvement of the YOLOv11n model further comprises replacing the C2PSA module in the backbone network with a C2SFA module, and the C2SFA module comprises: convolving the input features with a convolution unit and then splitting them along the channel dimension with a splitting unit to obtain a third branch feature and a fourth branch feature; passing the third branch feature sequentially through a plurality of SFA modules connected in series and taking the output of the last SFA module as an enhanced feature, wherein the SFA module extracts the attention feature and the global context feature of its input feature; and concatenating the enhanced feature with the fourth branch feature and outputting the result through a 1×1 convolution.
- 4. The real-time lightweight target detection method for an underwater environment according to claim 3, wherein the SFA module performs attention feature and global context feature extraction on the input feature by: passing the input feature through a self-attention mechanism to obtain an attention feature; performing a residual connection between the attention feature and the input feature to obtain a fused attention feature; performing global context feature extraction on the fused attention feature through an FFN module to obtain an attention global feature; and performing a residual connection between the attention global feature and the fused attention feature as the output of the SFA module.
- 5. The real-time lightweight target detection method for an underwater environment according to claim 4, wherein the FFN module in the SFA module is replaced with an ADFFN module, and the ADFFN module comprises: performing partial convolution and depthwise separable convolution on the input fused attention feature to obtain convolution features; applying a linear activation to the convolution features and multiplying them element by element to obtain a fused convolution feature; performing partial convolution on the fused convolution feature and dividing the result into two paths, applying a fast Fourier transform to the first path to obtain a high-channel feature, and passing the second path through a global average pooling, a lightweight MLP and an activation function connected in series along the forward propagation direction to obtain a dynamic weight vector; multiplying the dynamic weight vector element by element with the high-channel feature to obtain a weighted channel feature; and applying an inverse fast Fourier transform and block recombination in sequence to the weighted channel feature as the output of the ADFFN module (an illustrative sketch of this frequency-domain weighting follows the claims).
- 6. The method of claim 5, wherein the lightweight MLP comprises a 1×1 convolution, a linear activation and a 1×1 convolution connected in series along the forward propagation direction.
- 7. The real-time lightweight target detection method for an underwater environment according to claim 1, wherein the improvement of the YOLOv11n model further comprises configuring an auxiliary head branch in the detection network, and the auxiliary head branch comprises: sorting the downsampled features output by the backbone network from large to small by downsampling rate, and taking the first three downsampled features as inputs of the auxiliary head branch; applying a 1×1 convolution and 2× upsampling to the first downsampled feature to obtain a first upsampled feature; performing a residual connection between the second downsampled feature and the first upsampled feature and then 2× upsampling to obtain a second upsampled feature; performing a residual connection between the third downsampled feature and the second upsampled feature and then applying a depthwise separable convolution to obtain a convolved upsampled feature; and inputting the convolved upsampled feature into the auxiliary head and outputting a prediction label.
- 8. The real-time lightweight target detection method for an underwater environment according to claim 7, wherein training the improved YOLOv11n model comprises: obtaining the prediction labels and the real labels of the detection heads and constructing a fine-grained loss; obtaining the prediction label of the auxiliary head and the real label and constructing a coarse-grained loss; and calculating the weighted sum of the fine-grained loss and the coarse-grained loss to obtain the total loss function, and training the improved YOLOv11n model to obtain a trained improved YOLOv11n model.
- 9. The real-time lightweight target detection method for an underwater environment according to claim 8, wherein the total loss function L_total is expressed as: L_total = Σ_{i=1}^{N} L_fine^{(i)}(P_det^{(i)}, Y) + λ · L_coarse(P_aux, Y); wherein N indicates the number of branches in which detection heads are located, L_fine^{(i)} represents the fine-grained loss of the i-th detection head, P_det^{(i)} represents the prediction label of the i-th detection head, Y represents the real label, λ represents the auxiliary weight coefficient, L_coarse represents the coarse-grained loss of the auxiliary head, and P_aux represents the prediction label of the auxiliary head (an illustrative sketch of this weighting follows the claims).
- 10. A computer-readable storage medium having a computer program stored thereon, characterized in that the computer program, when executed, implements the steps of the real-time lightweight target detection method for an underwater environment according to any one of claims 1 to 9.
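Illustrative Sketches
As an informal illustration of the frequency-domain dynamic weighting described in claims 5 and 6, the following PyTorch sketch transforms one path with a fast Fourier transform and derives a per-channel dynamic weight vector on the other path via global average pooling, a lightweight MLP and an activation before the inverse transform. It is a simplified sketch under assumptions: the partial convolutions and block recombination of claim 5 are omitted, and all names (FrequencyGate, reduction, the choice of GELU and Sigmoid) are illustrative rather than taken from the disclosure.
```python
import torch
import torch.nn as nn

class FrequencyGate(nn.Module):
    """Sketch of the frequency-domain dynamic weighting of claims 5-6 (names assumed):
    path 1 applies an FFT over the spatial dimensions; path 2 produces a per-channel
    dynamic weight vector via global average pooling, a lightweight MLP
    (1x1 conv -> activation -> 1x1 conv, claim 6) and an activation; the weights
    modulate the frequency-domain feature before the inverse FFT."""
    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        hidden = max(channels // reduction, 8)
        self.mlp = nn.Sequential(
            nn.Conv2d(channels, hidden, kernel_size=1),
            nn.GELU(),
            nn.Conv2d(hidden, channels, kernel_size=1),
        )
        self.gate = nn.Sigmoid()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Path 1: fast Fourier transform of the spatial feature map (complex output).
        freq = torch.fft.rfft2(x, norm="ortho")
        # Path 2: global average pooling -> lightweight MLP -> activation
        # yields the per-channel dynamic weight vector.
        weights = self.gate(self.mlp(x.mean(dim=(2, 3), keepdim=True)))
        # Element-wise weighting of the frequency-domain feature, then the inverse FFT.
        return torch.fft.irfft2(freq * weights, s=x.shape[-2:], norm="ortho")
```
Likewise, a minimal sketch of the total loss weighting of claims 8 and 9, assuming the fine-grained losses of the detection-head branches and the coarse-grained loss of the auxiliary head have already been computed; the helper name total_loss and the default value of lambda_aux are assumptions for illustration only.
```python
import torch

def total_loss(fine_losses: list[torch.Tensor],
               coarse_loss: torch.Tensor,
               lambda_aux: float = 0.25) -> torch.Tensor:
    """L_total = sum_i L_fine^i + lambda * L_coarse (claims 8-9)."""
    return torch.stack(list(fine_losses)).sum() + lambda_aux * coarse_loss

# Usage sketch with dummy scalar losses for three detection-head branches.
loss = total_loss([torch.tensor(1.2), torch.tensor(0.8), torch.tensor(0.5)],
                  coarse_loss=torch.tensor(0.9))
```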
Description
Underwater environment real-time lightweight target detection method and readable storage medium

Technical Field

The invention relates to the technical field of underwater target detection, in particular to a real-time lightweight target detection method for an underwater environment and a readable storage medium.

Background

Target detection algorithms based on deep learning are mainly divided into two types: two-stage algorithms and single-stage algorithms. Two-stage algorithms, such as the R-CNN series, have high precision but low speed and are suitable for scenes with high accuracy requirements; single-stage algorithms, such as the YOLO series and SSD, are fast and suitable for real-time detection, but their precision is slightly lower than that of two-stage algorithms. The YOLO series is widely applied in the field of target detection because of its strong performance, and YOLOv11 shows impressive performance, but problems such as image blurring and occlusion are still encountered in underwater target detection. The C2PSA module of the neck of conventional YOLOv11 relies on clear color and texture features to activate spatial attention, while degradation of underwater images impairs the effectiveness of this mechanism, leading to failure of the attention weight distribution over key targets. The C3k2 modules of the backbone focus on lightweight design but do not introduce dedicated feature enhancement modules for low contrast, which may increase the false detection rate in an underwater target detection environment. The aspect-ratio consistency term of the CIoU loss function assumes that the shape distribution of objects follows a certain regularity, and forcing the aspect ratio to fit may prevent the model from learning the true boundaries of elongated or irregular underwater objects. In summary, the conventional YOLOv11-based underwater target recognition method cannot focus on key underwater targets and ignores the irregular shapes of underwater targets during training, so it suffers from low recognition accuracy and a high false detection rate.

Disclosure of Invention

Therefore, the invention aims to solve the technical problems in the prior art that key underwater targets cannot be focused on and the irregular shapes of underwater targets are ignored during training, resulting in low recognition precision and a high false detection rate.
In order to solve the above technical problems, the invention provides a real-time lightweight target detection method for an underwater environment, which comprises the following steps: inputting an underwater image into an improved YOLOv11n model, and acquiring an underwater target detection result through a backbone network, a neck network and a detection network; wherein the improvement of the YOLOv11n model comprises replacing the C3K2 module in the standard YOLOv11n model with a C3-CPMSFA module, and the C3-CPMSFA module comprises: convolving the input features with a convolution unit and then splitting them along the channel dimension with a splitting unit to obtain a first branch feature and a second branch feature; passing the first branch feature sequentially through a plurality of C-PMSFA modules connected in series and taking the output of the last C-PMSFA module as a multi-scale feature, wherein each C-PMSFA module divides its input feature according to a preset channel proportion, extracts features at the corresponding scales with context-mixing dynamic convolution units of different scales, concatenates the extracted features, and outputs them through a residual connection with the input feature; and concatenating the multi-scale feature with the second branch feature and outputting the result through a 1×1 convolution. Preferably, the C-PMSFA module comprises: dividing the input feature by channel into a 1/2-channel feature, a first 1/4-channel feature and a second 1/4-channel feature; performing feature extraction on the 1/2-channel feature, the first 1/4-channel feature and the second 1/4-channel feature with context-mixing dynamic convolution units whose convolution kernels are 3×3, 5×5 and 7×7, respectively, to obtain initial features at the corresponding scales; concatenating the initial features of all scales and applying a 1×1 convolution unit to obtain a fused concatenated feature; performing a residual connection between the fused concatenated feature and the input feature to obtain the output feature of the C-PMSFA module; wherein the input feature of the first C-PMSFA module is the first branch feature, and the input feature of each subsequent C-PMSFA module is the output feature of the previous C-PMSFA module.
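As a concrete illustration of the split-convolve-fuse structure of the C-PMSFA module described above, a minimal PyTorch sketch is given below. It is a simplification under assumptions: plain depthwise convolutions stand in for the context-mixing dynamic convolution units, and the class name CPMSFABlock and all parameters are illustrative rather than part of the disclosed design.
```python
import torch
import torch.nn as nn

class CPMSFABlock(nn.Module):
    """Illustrative sketch (not the disclosed implementation): the input is split by
    channel into 1/2, 1/4 and 1/4 parts, each part is processed at a different scale
    (3x3, 5x5, 7x7), the results are concatenated, fused by a 1x1 convolution, and
    added back to the input as a residual connection."""
    def __init__(self, channels: int):
        super().__init__()
        assert channels % 4 == 0, "channel count must be divisible by 4"
        c_half, c_quarter = channels // 2, channels // 4
        # Depthwise convolutions stand in for the context-mixing dynamic convolutions.
        self.branch3 = nn.Conv2d(c_half, c_half, 3, padding=1, groups=c_half)
        self.branch5 = nn.Conv2d(c_quarter, c_quarter, 5, padding=2, groups=c_quarter)
        self.branch7 = nn.Conv2d(c_quarter, c_quarter, 7, padding=3, groups=c_quarter)
        self.fuse = nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        c_half, c_quarter = x.shape[1] // 2, x.shape[1] // 4
        x1, x2, x3 = torch.split(x, [c_half, c_quarter, c_quarter], dim=1)
        y = torch.cat([self.branch3(x1), self.branch5(x2), self.branch7(x3)], dim=1)
        return x + self.fuse(y)  # residual connection with the input feature

# Usage sketch: several C-PMSFA-style blocks stacked in series, as on the first
# branch of the C3-CPMSFA module.
blocks = nn.Sequential(CPMSFABlock(64), CPMSFABlock(64))
out = blocks(torch.randn(1, 64, 80, 80))  # -> shape (1, 64, 80, 80)
```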