
CN-121982504-A - Underwater camouflage target segmentation method based on spatial domain and frequency domain collaborative awareness

CN 121982504 A

Abstract

The invention provides an underwater camouflage target segmentation method based on spatial-domain and frequency-domain collaborative perception. An underwater image to be processed is input into an encoder-decoder network, and encoding features of the first to fourth scales plus a bottleneck feature are obtained through successive 2× downsampling. A context feature fusion module is set for each scale: it takes the current-scale feature and the adjacent deeper feature as a pair of inputs, applies global average pooling and convolution to each, concatenates the results, and derives two weight paths through a Sigmoid; the two feature paths are then weighted channel-by-channel and pixel-by-pixel and aggregated into a fusion feature. A dual-domain attention module is introduced that models context in the spatial domain and, in the frequency domain, extracts high-frequency and low-frequency components via a two-dimensional discrete Fourier transform, modulates them with a learnable filter, and fuses the inverse-transformed result with the spatial-domain result, highlighting the saliency of the camouflaged target and suppressing the homogeneous bottleneck background. Decoding starts from the bottleneck feature and upsamples step by step, aggregating in parallel with the fusion features of the corresponding scales and reforming by convolution, finally outputting the pixel-level classification result of the underwater camouflaged target.

Inventors

  • LIU MOUYUAN
  • WANG RUIXUE
  • CHENG BIN
  • ZHAO XINYU
  • LU MIAOXIN
  • ZHAN CHUNLEI
  • AN JIE

Assignees

  • Dalian Maritime University (大连海事大学)

Dates

Publication Date
2026-05-05
Application Date
2025-12-30

Claims (7)

  1. An underwater camouflage target segmentation method based on spatial-domain and frequency-domain cooperative sensing, characterized by comprising the following steps: acquiring an underwater image to be processed, and constructing an encoder-decoder segmentation network consisting of four-level encoding blocks and four-level decoding blocks, wherein 2× downsampling is performed between adjacent encoding blocks to obtain encoding features of the first to fourth scales and a bottleneck feature of the underwater image; setting a corresponding context feature fusion module for each scale, taking the encoding feature of that scale as the low-level feature and the adjacent deeper feature as the high-level feature, wherein both features keep their respective original spatial resolutions on entering the context feature fusion module and no spatial alignment is performed; in the context feature fusion module, applying global average pooling and 1×1 convolution to the low-level and high-level features respectively to obtain context description vectors, generating two weight paths through concatenation and Sigmoid normalization, and performing channel-by-channel and pixel-by-pixel weighted aggregation of the two feature paths according to the weights to form a fusion feature; constructing a dual-domain attention module that extracts the contextual feature response of the underwater image in the spatial domain, extracts high-frequency and low-frequency features through a two-dimensional discrete Fourier transform in the frequency domain, adjusts band saliency with a learnable filter, and after inverse transformation fuses the result with the spatial features to form a spatial-frequency collaborative representation; in the decoding stage, upsampling step by step starting from the bottleneck feature, performing parallel aggregation and convolutional reforming with the fusion features of the corresponding scales to gradually recover spatial details and boundaries; and finally outputting the pixel-level segmentation result of the underwater camouflaged target.
  2. The method for segmenting an underwater camouflage target based on spatial-domain and frequency-domain collaborative awareness according to claim 1, wherein the context feature fusion module comprises a global average pooling layer, a 1×1 convolution layer, a concatenation layer, and a Sigmoid normalization unit, and is used to generate weighting coefficients for the low-level and high-level features.
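The following is a minimal PyTorch sketch of the context feature fusion module in claim 2. The patent names only the building blocks (global average pooling, 1×1 convolution, concatenation, Sigmoid), so the channel widths, the projection convolutions, and the bilinear upsampling of the deeper path at aggregation time (added here only so the element-wise sum is defined, since claim 1 states the two inputs are not spatially aligned beforehand) are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ContextFeatureFusion(nn.Module):
    """Claim 2 sketch: GAP + 1x1 conv context descriptors, concatenation,
    Sigmoid weight generation, weighted aggregation of two feature paths."""
    def __init__(self, low_ch: int, high_ch: int, out_ch: int):
        super().__init__()
        # 1x1 convolutions turn the pooled vectors into context descriptors.
        self.low_ctx = nn.Conv2d(low_ch, out_ch, kernel_size=1)
        self.high_ctx = nn.Conv2d(high_ch, out_ch, kernel_size=1)
        # After concatenation, a 1x1 conv + Sigmoid yields the two weights.
        self.weight_gen = nn.Conv2d(2 * out_ch, 2 * out_ch, kernel_size=1)
        self.low_proj = nn.Conv2d(low_ch, out_ch, kernel_size=1)
        self.high_proj = nn.Conv2d(high_ch, out_ch, kernel_size=1)

    def forward(self, low: torch.Tensor, high: torch.Tensor) -> torch.Tensor:
        # Global average pooling gives one context vector per input path;
        # each path keeps its own spatial resolution up to this point.
        low_vec = self.low_ctx(F.adaptive_avg_pool2d(low, 1))
        high_vec = self.high_ctx(F.adaptive_avg_pool2d(high, 1))
        # Concatenate and normalize with Sigmoid to get two weight paths.
        w = torch.sigmoid(self.weight_gen(torch.cat([low_vec, high_vec], dim=1)))
        w_low, w_high = w.chunk(2, dim=1)
        # Channel-wise weighting of each path, then aggregation. Upsampling
        # the deeper path here is an assumption so the sum is well-defined.
        low_f = self.low_proj(low) * w_low
        high_f = self.high_proj(high) * w_high
        high_f = F.interpolate(high_f, size=low_f.shape[-2:],
                               mode="bilinear", align_corners=False)
        return low_f + high_f
```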
  3. The method for segmenting an underwater camouflage target based on spatial-domain and frequency-domain collaborative awareness according to claim 1, wherein the dual-domain attention module comprises a spatial attention sub-module and a frequency attention sub-module; the spatial attention sub-module extracts the contextual features of the underwater image using maximum pooling and average pooling in parallel, and the frequency attention sub-module decomposes the input features into a high-frequency component and a low-frequency component through a two-dimensional discrete Fourier transform.
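A sketch of the spatial attention sub-module of claim 3, assuming a CBAM-style design in which maximum and average pooling run in parallel along the channel axis; the 7×7 kernel and the multiplicative application of the map are assumptions, since the claim specifies only the two pooling branches.

```python
import torch
import torch.nn as nn

class SpatialAttention(nn.Module):
    """Claim 3 sketch: parallel max/avg pooling along the channel axis,
    a convolution over the two pooled maps, and a Sigmoid gate."""
    def __init__(self, kernel_size: int = 7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        max_map, _ = x.max(dim=1, keepdim=True)   # maximum-pooled context
        avg_map = x.mean(dim=1, keepdim=True)     # average-pooled context
        attn = torch.sigmoid(self.conv(torch.cat([max_map, avg_map], dim=1)))
        return x * attn
```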
  4. The method for segmenting an underwater camouflage target based on spatial-domain and frequency-domain collaborative awareness according to claim 1, wherein the high-frequency and low-frequency components in the frequency attention sub-module are adjusted by a learnable filter and then reconstructed into a frequency-domain enhanced feature through an inverse two-dimensional discrete Fourier transform.
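A sketch of the frequency attention sub-module of claims 3 and 4: a 2D DFT, a circular low-pass/high-pass band split, learnable multiplicative filters per band, and an inverse DFT. The band-split radius, the per-band filter shape, and the fixed feature-map size are assumptions not specified in the patent.

```python
import torch
import torch.nn as nn

class FrequencyAttention(nn.Module):
    """Claims 3-4 sketch: 2D DFT -> low/high band split -> learnable
    filters -> inverse 2D DFT back to a spatial-domain feature."""
    def __init__(self, channels: int, height: int, width: int, radius: int = 8):
        super().__init__()
        # One learnable multiplicative filter per frequency band (assumed form).
        self.low_filter = nn.Parameter(torch.ones(1, channels, height, width))
        self.high_filter = nn.Parameter(torch.ones(1, channels, height, width))
        # Circular low-pass mask around the (shifted) spectrum centre.
        yy, xx = torch.meshgrid(torch.arange(height), torch.arange(width),
                                indexing="ij")
        dist = ((yy - height // 2) ** 2 + (xx - width // 2) ** 2).float().sqrt()
        self.register_buffer("low_mask", (dist <= radius).float())

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        spec = torch.fft.fftshift(torch.fft.fft2(x, norm="ortho"), dim=(-2, -1))
        low = spec * self.low_mask * self.low_filter          # low-frequency band
        high = spec * (1.0 - self.low_mask) * self.high_filter  # high-frequency band
        # Modulated bands are recombined and mapped back to the spatial domain.
        out = torch.fft.ifft2(torch.fft.ifftshift(low + high, dim=(-2, -1)),
                              norm="ortho")
        return out.real
```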
  5. The method for segmenting an underwater camouflage target based on spatial-domain and frequency-domain collaborative awareness according to claim 1, wherein the spatial-domain and frequency-domain outputs are convolutionally fused, and a dual-domain weight mask is generated through a Sigmoid to dynamically adjust the response of salient regions over the channel and spatial dimensions.
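A sketch of the dual-domain fusion in claim 5. The two branch modules are injected so this composes with the SpatialAttention and FrequencyAttention sketches above; the 1×1 fusion convolution and the purely multiplicative mask are assumptions.

```python
import torch
import torch.nn as nn

class DualDomainAttention(nn.Module):
    """Claim 5 sketch: convolutional fusion of the spatial and frequency
    branch outputs, then a Sigmoid dual-domain weight mask."""
    def __init__(self, channels: int, spatial_branch: nn.Module,
                 frequency_branch: nn.Module):
        super().__init__()
        self.spatial = spatial_branch
        self.frequency = frequency_branch
        self.fuse = nn.Conv2d(2 * channels, channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        s = self.spatial(x)    # spatial-domain context response
        f = self.frequency(x)  # frequency-domain enhanced feature
        # Convolutional fusion, then a Sigmoid mask re-weights the input
        # jointly over the channel and spatial dimensions.
        mask = torch.sigmoid(self.fuse(torch.cat([s, f], dim=1)))
        return x * mask
```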
  6. The method for segmenting an underwater camouflage target based on spatial-domain and frequency-domain collaborative awareness according to claim 1, wherein the decoding stage adopts step-by-step upsampling with cross-layer aggregation, and each step's upsampled result is spliced with the fusion feature of the corresponding scale and then reformed by convolution, realizing detail recovery and boundary enhancement.
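One decoding step of claim 6 might look as follows; the 3×3 kernels and the BatchNorm/ReLU pairing used for the "convolutional reforming" stage are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DecoderBlock(nn.Module):
    """Claim 6 sketch: upsample, splice with the fusion feature of the
    matching scale, then reform the spliced tensor with convolutions."""
    def __init__(self, in_ch: int, skip_ch: int, out_ch: int):
        super().__init__()
        self.reform = nn.Sequential(
            nn.Conv2d(in_ch + skip_ch, out_ch, 3, padding=1),
            nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, 3, padding=1),
            nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True),
        )

    def forward(self, x: torch.Tensor, skip: torch.Tensor) -> torch.Tensor:
        x = F.interpolate(x, scale_factor=2, mode="bilinear",
                          align_corners=False)
        return self.reform(torch.cat([x, skip], dim=1))
```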
  7. The method for segmenting an underwater camouflage target based on spatial-domain and frequency-domain collaborative awareness according to claim 1, wherein the bottleneck layer is channel-compressed by convolution and then subjected to convolutional feature reforming, so as to improve feature transfer efficiency and robustness in the decoding stage.
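A sketch of the bottleneck treatment in claim 7; the 2× compression ratio and the 3×3 reforming kernel are assumptions.

```python
import torch.nn as nn

class BottleneckCompress(nn.Module):
    """Claim 7 sketch: 1x1 channel compression followed by convolutional
    feature reforming before the decoding stage."""
    def __init__(self, channels: int):
        super().__init__()
        self.compress = nn.Conv2d(channels, channels // 2, kernel_size=1)
        self.reform = nn.Conv2d(channels // 2, channels // 2, 3, padding=1)

    def forward(self, x):
        return self.reform(self.compress(x))
```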

Description

Underwater camouflage target segmentation method based on spatial domain and frequency domain collaborative awareness

Technical Field

The invention relates to the technical field of underwater camouflage target segmentation, in particular to an underwater camouflage target segmentation method based on spatial-domain and frequency-domain collaborative perception.

Background

Underwater camouflaged targets (e.g., marine organisms, biomimetic equipment) tend to be highly similar in color, texture, and morphology to the background environment, so that differences between target and background in spectral distribution, edge gradient, and texture are extremely weak. In addition, scattering and absorption of light by the water body further reduce imaging contrast, blurring target contours and destroying details, which makes recognition and segmentation extremely difficult. Traditional segmentation methods based on single-scale convolutional features or single-domain enhancement struggle to characterize camouflage features with strong similarity and weak boundaries, and conflicts often arise between multi-scale semantic interaction and fine-grained structure restoration, leading to missed camouflaged regions or boundary misclassification. Moreover, the saliency of a camouflaged target depends on its subtle differences from the background in both the spatial and the frequency domain. Conventional methods model saliency only in the spatial domain and lack a description of the high- and low-frequency energy distribution among frequency components; the missing synergy between high-frequency details (textures and edges) and low-frequency semantics (shapes and contours) further weakens the model's sensitivity to and discrimination of weak features. Under complex backgrounds, low contrast, and non-uniform illumination, such methods have difficulty achieving stable detection and accurate segmentation of camouflaged targets.

Disclosure of Invention

Aiming at the problems of the prior art, the invention discloses an underwater camouflage target segmentation method based on spatial-domain and frequency-domain collaborative perception, in which context feature fusion and a dual-domain attention mechanism are introduced into an encoder-decoder framework to jointly model the complementary relationship between spatial structure and frequency components. Cross-layer context and local geometric constraints are captured in the spatial domain, the high- and low-frequency energy distribution is analyzed in the frequency domain, and adaptive modulation enhances the salient response of the camouflaged target, suppresses homogeneous background interference, and achieves fine segmentation and accurate boundary recovery of weak-feature regions.
The technical scheme of the invention specifically adopts the following steps:

S1, inputting the underwater image to be processed into an encoder-decoder segmentation network formed by four-level encoding blocks and four-level decoding blocks, and obtaining encoding features of the first to fourth scales and a bottleneck feature through 2× downsampling between adjacent encoding blocks;

S2, setting a context feature fusion module for each scale, taking the encoding feature of that scale as the low-level feature and the adjacent deeper feature as the high-level feature, and letting the two features enter the fusion module without spatial alignment while keeping their respective original spatial resolutions;

S3, in the fusion module, applying global average pooling and 1×1 convolution to the low-level and high-level features respectively to obtain context description vectors, generating two weight paths through concatenation and Sigmoid normalization, and performing channel-by-channel and pixel-by-pixel weighted aggregation of the two feature paths according to the weights;

S4, further constructing a dual-domain attention module, extracting the contextual feature response in the spatial domain, extracting high-frequency and low-frequency features through a two-dimensional discrete Fourier transform in the frequency domain, adjusting band saliency with a learnable filter, and fusing the inverse-transformed result with the spatial features to form a spatial-frequency collaborative representation;

S5, in the decoding stage, performing step-by-step upsampling starting from the bottleneck feature and, at each step, parallel aggregation and convolutional reforming with the fusion feature of the corresponding scale, thereby gradually restoring spatial details and boundaries, and finally outputting the pixel-level classification result of the underwater camouflaged target.

Furthermore, dynamic fusion of the cross-scale features is realized without performing spatial alignment.
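To make steps S1 to S5 concrete, the sketch below wires the module sketches from the Claims section (ContextFeatureFusion, SpatialAttention, FrequencyAttention, DualDomainAttention, BottleneckCompress, DecoderBlock) into one network. The channel widths, the max-pooling downsampler, the 256×256 input size, and the single-channel output head are all assumptions; the patent fixes only the four-scale encoder-decoder topology.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
# Assumes the ContextFeatureFusion, SpatialAttention, FrequencyAttention,
# DualDomainAttention, BottleneckCompress, and DecoderBlock sketches above.

class UnderwaterCamoSegNet(nn.Module):
    def __init__(self, in_ch=3, num_classes=1, widths=(32, 64, 128, 256),
                 bottleneck_hw=(16, 16)):
        super().__init__()
        chans = (in_ch,) + widths
        def conv_block(ci, co):
            return nn.Sequential(nn.Conv2d(ci, co, 3, padding=1),
                                 nn.ReLU(inplace=True))
        # S1: four encoding blocks; 2x downsampling between adjacent blocks.
        self.encoders = nn.ModuleList(conv_block(chans[i], chans[i + 1])
                                      for i in range(4))
        c_b = widths[3] * 2
        self.bottleneck = conv_block(widths[3], c_b)
        # S4: dual-domain attention applied to the bottleneck feature.
        self.dda = DualDomainAttention(
            c_b,
            spatial_branch=SpatialAttention(),
            frequency_branch=FrequencyAttention(c_b, *bottleneck_hw))
        self.compress = BottleneckCompress(c_b)  # claim 7
        # S2/S3: one fusion module per scale; the "high" path is the
        # next-deeper encoder feature (the bottleneck for the 4th scale).
        high_ch = widths[1:] + (c_b,)
        self.cff = nn.ModuleList(
            ContextFeatureFusion(widths[i], high_ch[i], widths[i])
            for i in range(4))
        # S5: step-by-step upsampling with cross-layer aggregation.
        dec_in = (c_b // 2, widths[3], widths[2], widths[1])
        self.decoders = nn.ModuleList(
            DecoderBlock(dec_in[k], widths[3 - k], widths[3 - k])
            for k in range(4))
        self.head = nn.Conv2d(widths[0], num_classes, kernel_size=1)

    def forward(self, x):
        feats = []
        for i, enc in enumerate(self.encoders):
            if i > 0:
                x = F.max_pool2d(x, 2)   # 2x downsampling (S1)
            x = enc(x)
            feats.append(x)
        xb = self.bottleneck(F.max_pool2d(feats[-1], 2))
        xb = self.dda(xb)                # S4 at the bottleneck
        highs = feats[1:] + [xb]
        fused = [cff(feats[i], highs[i]) for i, cff in enumerate(self.cff)]
        d = self.compress(xb)            # claim 7 channel compression
        for k, dec in enumerate(self.decoders):  # S5: coarse to fine
            d = dec(d, fused[3 - k])
        return self.head(d)              # pixel-level result

if __name__ == "__main__":
    net = UnderwaterCamoSegNet()
    mask = net(torch.randn(1, 3, 256, 256))
    print(mask.shape)  # torch.Size([1, 1, 256, 256])
```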