CN-121982047-A - Edge-aware image segmentation method, device, electronic equipment and storage medium

CN121982047A

Abstract

The application provides an edge-aware image segmentation method, apparatus, electronic device and storage medium. The method comprises: extracting features of an input image through a SAM encoder; extracting global edge features of the input image and dynamically fusing them into each layer of features of the SAM encoder to form edge-enhanced coding features; generating a dense embedding from the global edge features and the edge-enhanced coding features; inputting the input image into an image description model to generate a text description of the image and generating a text semantic embedding with a text embedding model; mapping the global edge features to the feature dimension of the text semantic embedding and fusing them to form a multi-modal sparse prompt vector; and inputting the multi-modal sparse prompt vector, the coding features and the dense embedding into the SAM decoder to generate a segmentation mask. The application addresses the difficulty of segmenting camouflaged targets against complex backgrounds, and achieves accurate extraction and efficient compression of key information in semantic communication.
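The abstract describes a four-stage pipeline (edge-enhanced encoding, dense embedding, multi-modal sparse prompt, decoding). The flow can be sketched as a minimal NumPy toy model; all dimensions, weights, and the linear/tanh stand-ins for learned layers are illustrative assumptions, not the patent's actual networks.

```python
import numpy as np

rng = np.random.default_rng(0)

def linear(x, w):
    """Toy stand-in for a learned projection layer."""
    return x @ w

# Toy dimensions (hypothetical; the patent does not fix sizes).
C_IMG, C_EDGE, C_TXT = 8, 4, 6
img_feat  = rng.standard_normal((16, C_IMG))   # SAM encoder features (tokens x channels)
edge_feat = rng.standard_normal((16, C_EDGE))  # global edge features

# 1. Edge-enhanced coding features: fuse edge features into encoder features.
w_fuse = rng.standard_normal((C_IMG + C_EDGE, C_IMG))
enhanced = linear(np.concatenate([img_feat, edge_feat], axis=-1), w_fuse)

# 2. Dense embedding from enhanced + edge features (nonlinear mapping).
w_dense = rng.standard_normal((C_IMG + C_EDGE, C_IMG))
dense = np.tanh(linear(np.concatenate([enhanced, edge_feat], axis=-1), w_dense))

# 3. Multi-modal sparse prompt: map pooled edge features to the
#    text-embedding dimension and fuse with a (stubbed) caption embedding.
text_emb = rng.standard_normal((1, C_TXT))      # from captioning + text models
w_map = rng.standard_normal((C_EDGE, C_TXT))
sparse_prompt = linear(edge_feat.mean(0, keepdims=True), w_map) + text_emb

# 4. Decoder stub: combine enhanced features and dense embedding into a mask.
w_dec = rng.standard_normal((C_IMG, 1))
mask_logits = linear(enhanced + dense, w_dec)   # per-token mask logits
mask = (mask_logits > 0).astype(np.uint8)
print(mask.shape)  # (16, 1)
```

The sketch only fixes the data flow between the four stages; in the patent each projection is a trained module and the decoder is the SAM mask decoder.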

Inventors

  • JU PENG
  • LI XIAOYANG
  • XIE DONGFENG
  • XIA LIANJIE
  • LI YANBO
  • ZHANG JIANJUN
  • LI GUANG
  • GAO ZHEN
  • CAI XIANGKE
  • GENG SHAOGUANG
  • WANG YALI

Assignees

  • 天津七一二通信广播股份有限公司 (Tianjin 712 Communication and Broadcasting Co., Ltd.)
  • 天津电子信息职业技术学院 (Tianjin Electronic Information College)

Dates

Publication Date
2026-05-05
Application Date
2026-04-09

Claims (8)

  1. An edge-aware image segmentation method, comprising: performing multi-level feature extraction on an input image acquired by an acquisition end through the Transformer blocks of a SAM encoder, extracting global edge features of the input image through an edge perception module, and dynamically fusing the global edge features into each layer of features of the SAM encoder through an edge feature adapter to form edge-enhanced coding features; inputting the global edge features and the edge-enhanced coding features together into a dense embedding reconstruction module, and generating a dense embedding rich in semantic and structural information through deformable convolution and nonlinear mapping; inputting the input image into an image description model to generate a text description of the image, generating a text semantic embedding using a text embedding model, and mapping the global edge features to the feature dimension of the text semantic embedding, fusing them to form a multi-modal sparse prompt vector; and inputting the multi-modal sparse prompt vector, the edge-enhanced coding features and the reconstructed dense embedding together into a SAM decoder to generate a segmentation mask, encoding the mask and transmitting it to a receiving end, where the receiving end decodes it and reconstructs a semantically complete scene image using a generative model.
  2. The method according to claim 1, wherein extracting the global edge features of the input image through the edge perception module comprises: progressively extracting feature layers of different scales based on a ResNet network to construct a feature pyramid, and constructing multi-scale edge features through deformable convolution; and aggregating the local features by introducing a multi-head self-attention mechanism to obtain the global edge features.
  3. The method according to claim 1, wherein: the encoder features output by the current Transformer block and the global edge features output by the edge perception module are spliced to form composite features fusing local details and global boundary information; the spliced high-dimensional composite features are linearly projected through a fully connected layer to obtain dimension-reduced features, followed by a ReLU activation function, and then projected back to the original input dimension through a fully connected layer; and the edge-modulated features output by the edge feature adapter and the original encoder features are combined by weighted summation with a weight coefficient, and the edge-enhanced features are finally output to the next Transformer block.
  4. The method according to claim 1, wherein: a base image embedding is extracted from the output of the SAM encoder and spliced with the edge features along the channel dimension to form a composite feature, and the spliced composite feature is input into a deformable convolution layer to obtain boundary-aware refined features; and the refined features are dimension-mapped through a fully connected layer, a nonlinear transformation is introduced through a GELU activation function, and the dense embedding is finally generated.
  5. The method according to claim 4, wherein the adaptive feature extraction of the base embedding using deformable convolution is guided by the global edge features generated by the edge perception module, comprising: dynamically predicting the sampling offsets of the convolution kernel according to the boundary direction and texture complexity information contained in the global edge features, and performing feature resampling and aggregation on the base image embedding through deformable convolution based on the predicted offsets to obtain the refined features.
  6. An edge-aware image segmentation apparatus, comprising: a feature extraction and fusion unit configured to perform multi-level feature extraction on an input image acquired by an acquisition end through the Transformer blocks of a SAM encoder, extract global edge features of the input image through an edge perception module, and dynamically fuse the global edge features into each layer of features of the SAM encoder through an edge feature adapter to form edge-enhanced coding features; a dense embedding generating unit configured to input the global edge features and the edge-enhanced coding features together into a dense embedding reconstruction module and generate a dense embedding rich in semantic and structural information through deformable convolution and nonlinear mapping; a multi-modal prompt generation unit configured to input the input image into an image description model to generate a text description of the image, generate a text semantic embedding using a text embedding model, and map the global edge features to the feature dimension of the text semantic embedding to form a multi-modal sparse prompt vector by fusion; and a mask output unit configured to input the multi-modal sparse prompt vector, the edge-enhanced coding features and the reconstructed dense embedding together into a SAM decoder to generate a segmentation mask, encode the mask and transmit it to a receiving end, where the receiving end decodes it and reconstructs a semantically complete scene image using a generative model.
  7. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the edge-aware image segmentation method according to any one of claims 1-5 when executing the program.
  8. A non-transitory computer-readable storage medium, wherein the non-transitory computer-readable storage medium stores computer instructions for causing a computer to perform the edge-aware image segmentation method according to any one of claims 1-5.
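The adapter of claim 3 (splice, down-project, ReLU, up-project, weighted residual) can be sketched directly in NumPy. Token count, feature dimensions, the bottleneck size, and the fixed weight coefficient alpha are illustrative assumptions; in the patent the projections are learned fully connected layers and alpha is a trainable coefficient.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(x, 0.0)

# Hypothetical sizes: N tokens, encoder dim D, edge dim E, bottleneck H.
N, D, E, H = 16, 8, 4, 3
enc_feat  = rng.standard_normal((N, D))   # current Transformer block output
edge_feat = rng.standard_normal((N, E))   # global edge features

# Splice (concatenate) encoder and edge features into a composite feature.
composite = np.concatenate([enc_feat, edge_feat], axis=-1)   # (N, D+E)

# Fully connected down-projection -> ReLU -> project back to dimension D.
w_down = rng.standard_normal((D + E, H))
w_up   = rng.standard_normal((H, D))
modulated = relu(composite @ w_down) @ w_up                  # (N, D)

# Weighted sum of the edge-modulated feature and the original encoder
# feature; alpha is fixed here but learnable in the patent.
alpha = 0.2
out = alpha * modulated + (1.0 - alpha) * enc_feat           # to next block
```

The bottleneck (D+E -> H -> D) keeps the adapter lightweight, and the residual weighting lets the network interpolate between the original and the edge-modulated features.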

Description

Edge-aware image segmentation method, device, electronic equipment and storage medium

Technical Field

The application belongs to the technical field of image processing, and particularly relates to an edge-aware image segmentation method, an edge-aware image segmentation apparatus, an electronic device, and a storage medium.

Background

With the rapid development of applications such as the Internet of Things, autonomous driving and remote interaction, there is an urgent demand for efficient and intelligent image transmission technologies. Semantic communication, as a next-generation communication paradigm, breaks through the traditional transmission concept of bit-level fidelity and turns to the understanding and transfer of image meaning. It first identifies and extracts the key semantic information in an image at the transmitting end and encodes and transmits only that information, so that an image with the same semantic content is reconstructed at the receiving end. This fundamentally eliminates pixel-level redundancy, achieves extremely high compression ratios, and is particularly suitable for bandwidth-limited or delay-sensitive scenarios. As the basis for semantic understanding of images, high-quality image segmentation is a core prerequisite of semantic communication systems. In recent years, prompt-based visual foundation models, in particular the Segment Anything Model (SAM), have provided a revolutionary tool for general image segmentation with powerful zero-shot generalization capability and flexible interaction modes. However, directly applying SAM to semantic communication has significant drawbacks, especially when facing camouflaged targets (i.e. targets whose texture and color are highly similar to the background and whose boundaries are indistinguishable): first, without external cues SAM randomly initializes the dense embedding and lacks high-level semantic information, leaving segmentation decisions without a basis; second, SAM is designed for general segmentation and its encoder is insufficiently sensitive to target boundaries, so inaccurate or broken segmentation results are easily produced when processing camouflaged targets with blurred boundaries; and finally, semantic communication requires the system to output stable and accurate target masks for subsequent compression, while SAM's uncertainty in complex scenes affects the reliability of the whole communication link.

Disclosure of Invention

In view of the foregoing, the present application aims to provide an edge-aware image segmentation method, an edge-aware image segmentation apparatus, an electronic device, and a storage medium, so as to solve at least one of the above problems.
In order to achieve the above purpose, the technical scheme of the application is realized as follows.

In a first aspect, the present application provides an edge-aware image segmentation method, comprising: performing multi-level feature extraction on an input image acquired by an acquisition end through the Transformer blocks of a SAM encoder, extracting global edge features of the input image through an edge perception module, and dynamically fusing the global edge features into each layer of features of the SAM encoder through an edge feature adapter to form edge-enhanced coding features; inputting the global edge features and the edge-enhanced coding features together into a dense embedding reconstruction module, and generating a dense embedding rich in semantic and structural information through deformable convolution and nonlinear mapping; inputting the input image into an image description model to generate a text description of the image, generating a text semantic embedding using a text embedding model, and mapping the global edge features to the feature dimension of the text semantic embedding, fusing them to form a multi-modal sparse prompt vector; and inputting the multi-modal sparse prompt vector, the edge-enhanced coding features and the reconstructed dense embedding together into a SAM decoder to generate a segmentation mask, encoding the mask and transmitting it to a receiving end, where the receiving end decodes it and reconstructs a semantically complete scene image using a generative model.
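The dense embedding reconstruction step (channel-wise splicing, boundary-aware refinement, then a fully connected layer with GELU) can be sketched as follows. This is a minimal NumPy illustration: all sizes are assumed, and the patent's deformable convolution is replaced here by an ordinary per-token linear mixing for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)

def gelu(x):
    """Tanh approximation of the GELU activation."""
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

# Hypothetical sizes: N tokens, base embedding dim D, edge dim E,
# output dense embedding dim DENSE.
N, D, E, DENSE = 16, 8, 4, 8
base_emb  = rng.standard_normal((N, D))   # base image embedding from the SAM encoder
edge_feat = rng.standard_normal((N, E))   # global edge features

# 1. Splice along the channel dimension to form the composite feature.
composite = np.concatenate([base_emb, edge_feat], axis=-1)        # (N, D+E)

# 2. Boundary-aware refinement (stand-in for the deformable conv layer).
w_refine = rng.standard_normal((D + E, D))
refined = composite @ w_refine                                    # (N, D)

# 3. Dimension mapping through a fully connected layer + GELU nonlinearity.
w_out = rng.standard_normal((D, DENSE))
dense_embedding = gelu(refined @ w_out)                           # (N, DENSE)
```

In the patented method, step 2 would additionally predict per-position sampling offsets from the edge features (claim 5) so that the convolution kernel samples along object boundaries rather than on a fixed grid.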
In a second aspect, based on the same inventive concept, the present application further provides an edge-aware image segmentation apparatus, comprising: a feature extraction and fusion unit configured to perform multi-level feature extraction on an input image acquired by an acquisition end through the Transformer blocks of the SAM encoder, extract global edge features of the input image through an edge perception module, and dynamically fuse the global edge features into each layer of features of the SAM encoder through an edge feature adapter to form edge-enhanced coding features; a dense embedding generating unit configured to input the global edge feature a