CN-121982506-A - Lightweight underwater semantic segmentation method based on fine granularity attention and self-adaptive feature fusion


Abstract

The invention discloses a lightweight underwater semantic segmentation method based on fine-grained attention and adaptive feature fusion, belonging to the field of computer vision and underwater image processing. To address illumination attenuation, complex backgrounds, and the difficulty of segmenting small targets in underwater aquaculture images, the invention provides a lightweight underwater semantic segmentation network and a matching training method with three core improvements: first, a fine-grained multi-level feature attention module that enhances shallow feature extraction through cascaded dilated convolution and spatial and channel attention mechanisms; second, an adaptive feature fusion module that uses wavelet downsampling to preserve spatial details and suppress redundant features; and third, a Gaussian Dice loss function that uses Gaussian weighting to address the training instability caused by small-target boundary deviations. The invention improves segmentation accuracy in complex underwater environments while significantly reducing computational cost.

Inventors

CAO LIJIE
HE ZHIQIAN
HE QIUSHI
WANG SIYUAN

Assignees

Dalian Ocean University (大连海洋大学)

Dates

Publication Date: 2026-05-05
Application Date: 2026-01-19

Claims (10)

1. A lightweight underwater semantic segmentation method based on fine-grained attention and adaptive feature fusion, characterized by comprising the following steps: S1, acquiring original underwater images of an aquaculture environment, preprocessing the images, and constructing an underwater semantic segmentation dataset comprising a training set and a validation set; S2, constructing a lightweight underwater semantic segmentation network comprising a fine-grained multi-level feature attention module, an adaptive feature fusion module, and a pixel-level classifier, wherein the fine-grained multi-level feature attention module enhances shallow feature extraction from the input underwater image through multi-branch convolution and spatial and channel attention mechanisms; the adaptive feature fusion module preserves spatial structure information through wavelet-transform downsampling and introduces channel weighting and a cosine similarity mechanism to achieve adaptive, complementary fusion of deep and shallow features; and the pixel-level classifier generates a semantic segmentation mask from the fused deep semantic features output by the adaptive feature fusion module; S3, constructing a hybrid loss function comprising Focal Loss and Gaussian Dice Loss, iteratively training the lightweight underwater semantic segmentation network constructed in step S2 on the training set constructed in step S1, and updating the network parameters by backpropagation until the loss function converges, to obtain a trained optimal model; S4, inputting an underwater image to be segmented into the trained optimal model and outputting a pixel-level semantic segmentation mask, thereby completing recognition and segmentation of the underwater targets.
2. The lightweight underwater semantic segmentation method based on fine-grained attention and adaptive feature fusion according to claim 1, wherein in step S1 the underwater semantic segmentation dataset is constructed as follows: underwater images are acquired in an aquaculture environment such that they cover different illumination conditions, different turbidity levels, and targets of different scales; the target areas in the acquired underwater images are labeled at pixel level to generate a binary mask or a multi-class mask corresponding to each original underwater image; the labeled underwater images are preprocessed, including image normalization, size unification, flipping, rotation, and brightness adjustment, to increase data diversity and the generalization capability of the model, forming the underwater semantic segmentation dataset; and the dataset is divided into a training set, a validation set, and a test set such that scenes are reasonably distributed across the subsets.
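The preprocessing of claim 2 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: images and masks are assumed to be nested lists of numbers, and the function names, the flip and rotation probabilities, the 0.7 to 1.3 brightness range, and the 70/15/15 split are hypothetical choices that the claim does not specify. Size unification is omitted. The sketch shows that geometric transforms are applied identically to an image and its mask, while brightness adjustment touches only the image.

```python
import random


def normalize(image, max_value=255.0):
    """Scale pixel intensities to the range [0, 1]."""
    return [[px / max_value for px in row] for row in image]


def hflip(grid):
    """Mirror a 2D grid left to right."""
    return [row[::-1] for row in grid]


def rot90(grid):
    """Rotate a 2D grid by 90 degrees clockwise."""
    return [list(row) for row in zip(*grid[::-1])]


def adjust_brightness(image, factor):
    """Scale normalized intensities and clip them back to [0, 1]."""
    return [[min(1.0, max(0.0, px * factor)) for px in row] for row in image]


def augment(image, mask, rng):
    """Augment one (image, mask) pair; probabilities and ranges are assumed."""
    image = normalize(image)
    if rng.random() < 0.5:            # same flip for image and mask
        image, mask = hflip(image), hflip(mask)
    if rng.random() < 0.5:            # same rotation for image and mask
        image, mask = rot90(image), rot90(mask)
    factor = rng.uniform(0.7, 1.3)    # brightness changes the image only
    return adjust_brightness(image, factor), mask


def split_dataset(samples, rng, train=0.7, val=0.15):
    """Shuffle and split into training, validation, and test subsets."""
    samples = list(samples)
    rng.shuffle(samples)
    n_train = int(len(samples) * train)
    n_val = int(len(samples) * val)
    return (samples[:n_train],
            samples[n_train:n_train + n_val],
            samples[n_train + n_val:])
```

A random split such as this does not by itself guarantee a reasonable scene distribution; stratifying by scene before splitting would be one way to approach that requirement.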
3. The lightweight underwater semantic segmentation method based on fine-grained attention and adaptive feature fusion according to claim 1, wherein the processing flow of the fine-grained multi-level feature attention module is as follows: the input feature F is processed by a two-dimensional channel normalization layer and divided along the channel dimension into four branch features; for the first branch, base features are extracted using a standard 3×3 convolution; for the second branch, the branch feature is fused with the output feature of the previous stage, and features are extracted using a convolution kernel with a dilation rate of 3; for the third branch, the branch feature is fused with the output feature of the previous stage, features are extracted using a convolution kernel with a dilation rate of 5, and spatial attention processing is applied to the result; for the fourth branch, channel attention processing is performed; finally, the output features of the four branches are concatenated along the channel dimension, fused by a 3×3 convolution, further processed by normalization and an activation function, and output as the final feature.
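The effect of cascading the dilated convolutions of claim 3 (a standard 3×3 convolution followed by dilation rates 3 and 5) can be illustrated with a small worked example. The sketch below is only an illustration under stated assumptions: it uses a single channel and an all-ones kernel, it leaves out the channel split, the attention steps, and the final fusion, and all function names are hypothetical. It shows how each stage widens the region of the input that influences one output pixel.

```python
def effective_kernel_size(kernel_size, dilation):
    """Span of a dilated kernel: k + (k - 1) * (d - 1)."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)


def receptive_field(dilations, kernel_size=3):
    """Receptive field of stacked stride-1 convolutions with given dilations."""
    rf = 1
    for d in dilations:
        rf += (kernel_size - 1) * d
    return rf


def dilated_conv2d(feature, kernel, dilation=1):
    """Zero-padded, same-size 2D cross-correlation with a dilated kernel."""
    h, w = len(feature), len(feature[0])
    k = len(kernel)
    pad = dilation * (k // 2)
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for i in range(k):
                for j in range(k):
                    yy = y + i * dilation - pad
                    xx = x + j * dilation - pad
                    if 0 <= yy < h and 0 <= xx < w:
                        acc += kernel[i][j] * feature[yy][xx]
            out[y][x] = acc
    return out


def cascade_support(size=21):
    """Push one impulse through the cascade; return response width per stage."""
    feature = [[0.0] * size for _ in range(size)]
    mid = size // 2
    feature[mid][mid] = 1.0
    ones = [[1.0] * 3 for _ in range(3)]   # assumed all-ones kernel
    widths = []
    for d in (1, 3, 5):                    # dilation rates named in claim 3
        feature = dilated_conv2d(feature, ones, dilation=d)
        widths.append(sum(1 for v in feature[mid] if v != 0.0))
    return widths


print([effective_kernel_size(3, d) for d in (1, 3, 5)])  # [3, 7, 11]
print(cascade_support())                                  # [3, 9, 19]
```

With nine weights per stage, the cascade covers a 19×19 neighborhood, which is consistent with the lightweight design goal stated in the abstract.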
4. The lightweight underwater semantic segmentation method based on fine-grained attention and adaptive feature fusion according to claim 1, wherein the adaptive feature fusion module (AFFM) comprises a wavelet transform enhancement module and a channel-aware fusion module; the wavelet transform enhancement module applies a two-dimensional discrete wavelet transform to the input feature map to obtain four sub-bands, namely a low-frequency approximation component and three high-frequency directional components (horizontal, vertical, and diagonal), integrates the three high-frequency components, combines them with the low-frequency features, and generates by convolution a downsampled feature containing both global structure and local edges, the process being expressed as formula (6), wherein the input feature map comprises the enhanced shallow features output by the fine-grained multi-level feature attention module, the deep semantic features output by the previous AFFM module, and the features obtained from the original underwater image by two-dimensional discrete wavelet transform; the channel-aware fusion module first weights and concatenates the shallow features, the deep semantic features, and the downsampled features to obtain a preliminary fused feature, as expressed in formula (7), wherein three learnable parameters adaptively control the importance of each feature channel; a feature adjustment mechanism based on cosine similarity is then introduced, which updates the fusion result by dynamic weighting according to the measured similarity among the features, thereby suppressing redundant features and enhancing key semantic responses, as expressed in formula (8), which uses the cosine similarity along the channel dimension and yields the final fused feature after dynamic weighting adjustment, namely the deep semantic feature.
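The building blocks named in claim 4 can be sketched in a few lines. This is a hedged illustration, not a reconstruction of formulas (6) to (8), which are not reproduced in this text. It assumes the Haar wavelet (the claim does not name a wavelet), single-channel feature maps stored as nested lists with even height and width, and fixed weights `w_low` and `w_high` standing in for a learned 1×1 convolution over the low-frequency band and the summed high-frequency bands. The sub-band naming convention and all function names are likewise assumptions. How the cosine similarity enters the dynamic weighting of formula (8) is left open; only the similarity itself is computed.

```python
import math


def haar_dwt2(x):
    """Single-level 2D Haar transform; returns (LL, LH, HL, HH) at half size."""
    ll, lh, hl, hh = [], [], [], []
    for r in range(0, len(x), 2):
        row_ll, row_lh, row_hl, row_hh = [], [], [], []
        for c in range(0, len(x[0]), 2):
            a, b = x[r][c], x[r][c + 1]
            e, f = x[r + 1][c], x[r + 1][c + 1]
            row_ll.append((a + b + e + f) / 2)  # low-frequency approximation
            row_lh.append((a + b - e - f) / 2)  # change between rows
            row_hl.append((a - b + e - f) / 2)  # change between columns
            row_hh.append((a - b - e + f) / 2)  # diagonal change
        ll.append(row_ll)
        lh.append(row_lh)
        hl.append(row_hl)
        hh.append(row_hh)
    return ll, lh, hl, hh


def wavelet_downsample(x, w_low=1.0, w_high=0.5):
    """Combine LL with the summed high-frequency bands (assumed fixed weights)."""
    ll, lh, hl, hh = haar_dwt2(x)
    return [[w_low * ll[i][j] + w_high * (lh[i][j] + hl[i][j] + hh[i][j])
             for j in range(len(ll[0]))] for i in range(len(ll))]


def scale(feature, weight):
    """Multiply every element of a 2D map by one scalar weight."""
    return [[weight * v for v in row] for row in feature]


def weighted_concat(shallow, deep, down, alpha, beta, gamma):
    """Weight three same-sized maps and stack them as channels."""
    return [scale(shallow, alpha), scale(deep, beta), scale(down, gamma)]


def cosine_similarity(u, v, eps=1e-8):
    """Cosine similarity between two flattened feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + eps)
```

Unlike strided convolution or pooling, the Haar transform is orthonormal, so the four sub-bands together retain all of the energy of the input; this is one way to read the claim that wavelet downsampling preserves spatial structure information.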
5. The lightweight underwater semantic segmentation method based on fine-grained attention and adaptive feature fusion according to claim 1, wherein the pixel-level classifier maps the deep semantic feature map output by the adaptive feature fusion module to the number of classes defined by the dataset through a 1×1 convolution layer, outputting unnormalized prediction values; a Softmax activation function is then applied along the channel dimension to compute, for each pixel, the probability distribution over the classes, generating a probability map; finally, an Argmax operation is performed at each spatial position of the probability map to convert the probability distribution into a discrete class index map, namely the final semantic segmentation mask output by the network, in which the value of each pixel represents the object class predicted for that point.
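The three stages of the pixel-level classifier in claim 5 (1×1 convolution, Softmax along the channel dimension, Argmax per position) can be sketched as follows. This is a minimal sketch assuming features stored as nested lists in channel, height, width order; the function names and the weight layout (one row of weights per class) are hypothetical.

```python
import math


def conv1x1(features, weights, bias):
    """Per-pixel linear map from C feature channels to K class scores."""
    c = len(features)
    h, w = len(features[0]), len(features[0][0])
    return [[[bias[k] + sum(weights[k][ch] * features[ch][y][x]
                            for ch in range(c))
              for x in range(w)] for y in range(h)]
            for k in range(len(weights))]


def softmax(values):
    """Numerically stable softmax over a list of scores."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def classify(features, weights, bias):
    """Return the per-pixel probability map and the class index mask."""
    logits = conv1x1(features, weights, bias)
    k, h, w = len(logits), len(logits[0]), len(logits[0][0])
    probs = [[softmax([logits[cls][y][x] for cls in range(k)])
              for x in range(w)] for y in range(h)]
    mask = [[p.index(max(p)) for p in row] for row in probs]
    return probs, mask
```

Because Softmax is monotonic, taking the Argmax of the unnormalized scores gives the same mask; the probability map is needed for the loss during training rather than for the final mask.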
6. The lightweight underwater semantic segmentation method based on fine-grained attention and adaptive feature fusion according to claim 1, wherein the underwater semantic segmentation network is trained with a hybrid loss function combining Gaussian Dice loss and Focal Loss, the hybrid loss function being calculated according to formula (9), whose terms are a hyperparameter, the Gaussian Dice loss, and the Focal Loss.
7. The lightweight underwater semantic segmentation method based on fine-grained attention and adaptive feature fusion according to claim 6, wherein the Gaussian Dice loss is constructed as follows: first, a two-dimensional Gaussian distribution map is generated from the centroid of each connected target region in the ground-truth label to obtain a spatial weight matrix, in which pixels closer to the target center are assigned larger weights and pixels farther away are assigned smaller weights; based on the spatial weight matrix, the Gaussian Dice coefficient and the Gaussian Dice loss are defined by formula (10), which is evaluated over the pixel indices of the flattened feature map and uses, for each pixel, the predicted probability output by the underwater semantic segmentation network that the pixel belongs to the target class, the ground-truth label value of the pixel, and the spatial Gaussian weight corresponding to the pixel, together with a smoothing term.
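The construction in claim 7 can be sketched as follows. Formula (10) is not reproduced in this text, so the weighted form used here, 1 - (2 Σ w p g + ε) / (Σ w p + Σ w g + ε), is an assumption modeled on the standard Dice coefficient rather than the patent's exact definition. The 4-connectivity, the value of `sigma`, taking the maximum where the Gaussians of several targets overlap, and all function names are also hypothetical choices.

```python
import math


def connected_components(mask):
    """Return 4-connected foreground regions as lists of (row, col)."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    regions = []
    for r in range(h):
        for c in range(w):
            if mask[r][c] and not seen[r][c]:
                stack, region = [(r, c)], []
                seen[r][c] = True
                while stack:
                    y, x = stack.pop()
                    region.append((y, x))
                    for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                        if (0 <= ny < h and 0 <= nx < w
                                and mask[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            stack.append((ny, nx))
                regions.append(region)
    return regions


def gaussian_weight_map(mask, sigma=2.0):
    """Place a 2D Gaussian at each region centroid; keep the per-pixel maximum."""
    h, w = len(mask), len(mask[0])
    weights = [[0.0] * w for _ in range(h)]
    for region in connected_components(mask):
        cy = sum(y for y, _ in region) / len(region)
        cx = sum(x for _, x in region) / len(region)
        for r in range(h):
            for c in range(w):
                dist2 = (r - cy) ** 2 + (c - cx) ** 2
                g = math.exp(-dist2 / (2 * sigma ** 2))
                weights[r][c] = max(weights[r][c], g)
    return weights


def gaussian_dice_loss(pred, target, weights, eps=1.0):
    """Assumed Gaussian-weighted Dice loss; eps is the smoothing term."""
    num = den = 0.0
    for p_row, g_row, w_row in zip(pred, target, weights):
        for p, g, w in zip(p_row, g_row, w_row):
            num += w * p * g
            den += w * p + w * g
    return 1.0 - (2.0 * num + eps) / (den + eps)
```

Under this assumed form, an error at a pixel far from a target centroid contributes less to the loss than the same error near the centroid, which matches the stated aim of damping the effect of small boundary deviations on small targets.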
8. The lightweight underwater semantic segmentation method based on fine-grained attention and adaptive feature fusion according to claim 6, wherein the Focal Loss is calculated according to formula (11), whose parameters are a weight factor for balancing positive and negative samples and a focusing parameter.
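The Focal Loss of claim 8 and its combination with the Gaussian Dice loss in claim 6 can be sketched as follows. Formulas (9) and (11) are not reproduced in this text. The sketch therefore assumes the widely used binary Focal Loss, -α_t (1 - p_t)^γ log(p_t), averaged over pixels, and assumes that the single hyperparameter of claim 6 scales the Gaussian Dice term; the actual combination in the patent may differ. The default values and function names are hypothetical.

```python
import math


def focal_loss(pred, target, alpha=0.25, gamma=2.0, eps=1e-7):
    """Mean binary focal loss over a 2D probability map.

    alpha balances positive and negative samples; gamma is the focusing
    parameter that down-weights pixels that are already classified well.
    """
    total, count = 0.0, 0
    for p_row, y_row in zip(pred, target):
        for p, y in zip(p_row, y_row):
            p = min(max(p, eps), 1.0 - eps)          # avoid log(0)
            p_t = p if y == 1 else 1.0 - p           # probability of true class
            a_t = alpha if y == 1 else 1.0 - alpha   # class balancing weight
            total += -a_t * (1.0 - p_t) ** gamma * math.log(p_t)
            count += 1
    return total / count


def hybrid_loss(focal, gaussian_dice, lam=0.5):
    """Assumed combination: Focal Loss plus lam times Gaussian Dice loss."""
    return focal + lam * gaussian_dice
```

With `gamma=0` and `alpha=0.5` the function reduces to half the ordinary binary cross-entropy, which is a quick way to check an implementation.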
9. An electronic device comprising a memory, a processor and a computer program stored on the memory and capable of running on the processor, wherein the processor, when executing the computer program, causes the electronic device to perform the lightweight underwater semantic segmentation method based on fine-grained attention and adaptive feature fusion as set forth in any one of claims 1-8.
10. A storage medium comprising a computer program, characterized in that the computer program, when run on an electronic device, causes the electronic device to perform the lightweight underwater semantic segmentation method based on fine-grained attention and adaptive feature fusion as claimed in any of claims 1-8.

Description

Lightweight underwater semantic segmentation method based on fine granularity attention and self-adaptive feature fusion

Technical Field

The invention relates to the technical field of computer vision and deep learning, and in particular to a lightweight semantic segmentation method for underwater images in aquaculture environments.

Background

As the aquaculture industry moves toward digitization and intelligent operation, using computer vision to automatically identify, monitor the state of, and analyze the behavior of individual organisms such as fish and shellfish in the aquaculture environment has become a key way to improve aquaculture efficiency. Image semantic segmentation, a core technology of underwater perception, classifies every pixel in an image and thereby achieves fine-grained segmentation of targets. In practical underwater applications, however, image semantic segmentation faces severe challenges. On the one hand, absorption and scattering of light by the water cause illumination attenuation, color distortion, and blurring, which make feature extraction difficult; on the other hand, underwater biological targets are often small and easily confused with complex backgrounds of aquatic plants and rocks, so segmentation boundaries are unclear. To address these challenges, the prior art mainly uses either high-accuracy deep learning models or lightweight models, but both have significant limitations.
High-accuracy models such as the DeepLab series recognize targets accurately, but their large parameter counts and high computational complexity make them difficult to run in real time on underwater robots with limited computing resources; existing lightweight models are fast, but they easily lose shallow spatial detail in low-contrast underwater environments, which leads to missed detection of small targets. In addition, when handling small-scale targets, the conventional Dice loss function is overly sensitive to small deviations of the predicted boundary, so the loss value tends to oscillate severely during training and convergence to an optimal solution is difficult. Developing a method that remains lightweight while effectively enhancing feature extraction and stabilizing small-target training is therefore a technical problem that currently needs to be solved.

Disclosure of Invention

To solve the problems in the prior art that a lightweight model and high accuracy are difficult to achieve together and that small targets are segmented poorly, the invention provides a lightweight underwater semantic segmentation method based on fine-grained attention and adaptive feature fusion.
The method constructs a lightweight semantic segmentation network for underwater scenes, LUSSNet. First, a fine-grained multi-level feature attention module (FGMLFA) is designed to enhance the feature extraction capability of the shallow backbone network under high-noise conditions, improving the model's perception of target bodies and blurred edges. Second, an Adaptive Feature Fusion Module (AFFM) is constructed to achieve effective fusion of multi-scale semantic information and redundancy suppression of channel features; it introduces a wavelet downsampling mechanism to capture edge information at different scales and performs dynamic weighted integration of semantic features. Finally, a Gaussian Dice loss function combining class balancing with small-target modeling optimization is proposed to increase the training weight of positive samples and alleviate the scale offset problem in small-target segmentation, improving the model's accuracy in discriminating positive and negative samples and in small-target segmentation tasks.
The technical scheme of the invention is as follows: a lightweight underwater semantic segmentation method based on fine-grained attention and adaptive feature fusion comprises the following steps: S1, acquiring original underwater images of an aquaculture environment, preprocessing the images, and constructing an underwater semantic segmentation dataset comprising a training set and a validation set; S2, constructing a lightweight underwater semantic segmentation network, LUSSNet, comprising a fine-grained multi-level feature attention module (FGMLFA), an Adaptive Feature Fusion Module (AFFM), and a pixel-level classifier, wherein the fine-grained multi-level feature attention module enhances shallow feature extraction from the input underwater image through multi-branch convolution and spatial and channel attention mechanisms; the adaptive feature fusion module preserves spatial structure information through wavelet-transform downsampling, and i