CN-121999233-A - Camouflage target segmentation method of a reversible unfolding network based on SAM guidance
Abstract
The invention relates to the field of computer vision and camouflage target segmentation, in particular to a camouflage target segmentation method based on a SAM-guided reversible unfolding network, which comprises: first, constructing a foreground spatial prior map, a background spatial prior map and a high-quality SAM pseudo mask using the Segment Anything Model (SAM), eliminating prior redundancy through low-dimensional orthogonal subspace projection, and enhancing the separability of foreground and background; then, fusing pixel-level and gradient-level feature fitting terms with a SAM subspace prior constraint term to construct an overall objective function, unfolding the objective function into a multi-stage alternately iterated process of a foreground optimization sub-module SFOS and a background optimization sub-module SBOS, and gradually refining the foreground and background feature maps; and finally, generating the camouflage target segmentation mask from the iteratively optimized foreground feature map. Through large-model prior guidance, two-level feature modeling and multi-stage unfolding optimization, the completeness and accuracy of camouflage target segmentation are significantly improved.
Inventors
- DENG LIZHEN
- BAI JIAHAO
- XU GUOXIA
- ZHU HU
Assignees
- Nanjing University of Posts and Telecommunications (南京邮电大学)
Dates
- Publication Date
- 2026-05-08
- Application Date
- 2026-04-10
Claims (10)
- 1. A camouflage target segmentation method for a SAM-guided reversible unfolding network, characterized by comprising the following steps: Step one, receiving and preprocessing an input camouflage target image to obtain an original image; Step two, inputting the original image into the Segment Anything Model (SAM), constructing and optimizing a foreground spatial prior map, a background spatial prior map and a high-quality SAM pseudo mask, and realizing adaptive foreground-background separation; Step three, performing dimensionality reduction and orthogonalization on the foreground spatial prior map and the background spatial prior map, eliminating feature redundancy and making the foreground and background subspaces orthogonal, so as to obtain an orthogonally reconstructed foreground prior map and an orthogonally reconstructed background prior map; Step four, constructing an objective function based on the original image, the original-image gradient feature map, the foreground spatial prior map, the background spatial prior map, the orthogonally reconstructed foreground prior map, the orthogonally reconstructed background prior map and the high-quality SAM pseudo mask, and unfolding the objective function into a K-stage alternately iterated foreground and background updating process, obtaining the foreground feature map and the background feature map stage by stage; Step five, inputting the foreground feature map obtained after the K-th stage of iteration into a convolution layer, obtaining a segmentation probability map through a Sigmoid activation function, and binarizing the segmentation probability map to obtain the final segmentation mask of the camouflage target.
- 2. The camouflage target segmentation method based on a SAM-guided reversible unfolding network of claim 1, wherein the preprocessing flow of step one is to normalize the input RGB camouflage target image to a uniform size of H×W×3, where H denotes the image height, W denotes the image width, and 3 denotes the number of image channels, corresponding to the three RGB color channels; a zero-padding strategy is adopted to ensure size consistency in the subsequent gradient extraction and convolution operations, and the preprocessed image is recorded as the original image X.
- 3. The camouflage target segmentation method based on a SAM-guided reversible unfolding network according to claim 1, wherein the specific implementation of step two is as follows: Step 1, applying geometric augmentation to the original image, including flipping, rotation and scaling, to obtain V augmented views, where V is the number of augmented views; Step 2, inputting each augmented view into the SAM to obtain the single-view foreground soft response map, the single-view background soft response map and a native segmentation mask, then mapping these outputs back to the original image coordinate system by the inverse transformation, ensuring that the outputs of all views are aligned in the same space; Step 3, computing the foreground-background prediction entropy at each pixel position; Step 4, performing weighted fusion of the foreground/background soft response maps of all views according to pixel-level weights and image-level weights, obtaining the final foreground spatial prior map P_f and background spatial prior map P_b; Step 5, performing weighted fusion of the native segmentation masks of the multiple views to obtain a soft fused pseudo mask, and finally obtaining the high-quality SAM pseudo mask M through thresholding, hole filling and largest-connected-component screening post-processing, for use in the subsequent subspace prior constraint.
- 4. The camouflage target segmentation method based on a SAM-guided reversible unfolding network according to claim 3, wherein the dimensionality reduction and orthogonalization in step three comprise: Step 1, flattening the two-dimensional foreground spatial prior map P_f and background spatial prior map P_b into vectors of length N, recorded as the foreground spatial prior flattened vector v_f and the background spatial prior flattened vector v_b, performing min-max normalization to [0,1], and stacking them into a joint feature matrix, where N = H×W is the total number of image pixels, H denotes the image height and W denotes the image width; Step 2, after de-centering the joint feature matrix, computing its covariance matrix and obtaining the projection matrix P through eigenvalue decomposition, then projecting v_f and v_b into a low-dimensional subspace to obtain the low-dimensional foreground feature vector z_f and the low-dimensional background feature vector z_b; Step 3, applying Gram-Schmidt orthogonalization to z_f and z_b to obtain the mutually orthogonal low-dimensional foreground vector and low-dimensional background vector; Step 4, back-projecting the orthogonalized vectors into the original pixel space through the projection matrix P and reshaping to size H×W, obtaining the orthogonally reconstructed foreground prior map P̃_f and the orthogonally reconstructed background prior map P̃_b.
- 5. The camouflage target segmentation method based on the SAM-guided reversible unfolding network according to claim 4, wherein in step four an objective function fusing a pixel-gradient two-level feature fitting term and a SAM subspace prior constraint term is constructed and taken as the optimization basis for the subsequent unfolding, specifically: Step 1, the two-level feature fitting term E_fit comprises pixel-level data fitting and gradient-level feature fitting, so that the original image and its gradient features can be reconstructed from the prior-weighted foreground and background, with the formula E_fit = α‖X − (P_f⊙F + P_b⊙B)‖₂² + β‖∇X − ∇(P_f⊙F + P_b⊙B)‖₂², where α is the learnable weight of the pixel-level fitting term, β is the learnable weight of the gradient-level fitting term, X, F and B are respectively the original image, the foreground feature map and the background feature map, P_f is the foreground spatial prior map, P_b is the background spatial prior map, ⊙ is the Hadamard product, ∇X is the gradient feature map of the original image, ∇ is the gradient extraction operator, and ‖·‖₂² is the squared L2 norm; Step 2, defining the foreground response map R_f = Φ(P̃_f⊙F) and the background response map R_b = Φ(P̃_b⊙B) guided by the orthogonally reconstructed foreground prior map P̃_f and the orthogonally reconstructed background prior map P̃_b, where Φ denotes a fixed response mapping operator that maps the prior-weighted foreground or background feature map into a single-channel response space consistent with the pseudo mask; Step 3, constructing the subspace prior constraint term E_sub = ‖R_f − M‖₂² + ‖R_b − (1 − M)‖₂², where R_f is the foreground response map, R_b is the background response map, and M is the high-quality SAM pseudo mask; Step 4, the overall objective function E = E_fit + λE_sub, where λ is the fixed weight of the subspace prior constraint term, controlling the contribution of the SAM prior.
- 6. The camouflage target segmentation method based on the SAM-guided reversible unfolding network according to claim 5, wherein the original-image gradient feature map ∇X is obtained by extracting gradients in the x/y directions with the Sobel operator, with the following specific procedure: Step 1, defining a 3×3 Sobel horizontal convolution kernel K_x and a Sobel vertical convolution kernel K_y, and performing Sobel convolution on each of the three channels of the normalized RGB original image to obtain the gradient response of each channel in the x/y directions; Step 2, averaging the three-channel gradients to obtain the global x-direction gradient G_x and the global y-direction gradient G_y; Step 3, normalizing G_x and G_y to [0,1] and concatenating them along the channel dimension to form the gradient feature map ∇X of the original image.
- 7. The camouflage target segmentation method based on the SAM-guided reversible unfolding network according to claim 6, wherein the alternately iterated foreground updating and background updating in step four comprise two alternately iterated sub-modules, namely the SAM-guided foreground optimization sub-module SFOS and the SAM-guided background optimization sub-module SBOS; the SFOS is specifically implemented as: Step 1, constructing the k-th stage foreground optimization sub-problem, fixing the k-th stage background feature map B^(k), ignoring the constant terms unrelated to the foreground, and simplifying the overall objective function into an optimized form related only to the foreground feature map; Step 2, solving the optimization sub-problem with a proximal gradient algorithm and deriving the closed-form update of the (k+1)-th stage foreground feature map F^(k+1) = A_f⁻¹(W_f F^(k) + r_f + λs_f), where A_f is the coefficient matrix of the foreground optimization sub-problem, W_f is the weight matrix of the previous-stage foreground feature map, r_f is the foreground residual term jointly formed by the pixel-level fitting term and the gradient-level fitting term, s_f is the prior alignment term jointly formed by the orthogonally reconstructed foreground prior map P̃_f and the high-quality SAM pseudo mask M, λ is the fixed weight of the subspace prior constraint term, and F^(k) is the k-th stage foreground feature map, with initial value F^(0) = 0, an all-zero matrix; the SBOS is specifically implemented as: Step 1, constructing the k-th stage background optimization sub-problem, fixing the (k+1)-th stage foreground feature map F^(k+1), ignoring the constant terms unrelated to the background, and simplifying the overall objective function into an optimized form related only to the background feature map; Step 2, solving the optimization sub-problem with a proximal gradient algorithm and deriving the closed-form update of the (k+1)-th stage background feature map B^(k+1) = A_b⁻¹(W_b B^(k) + r_b + λs_b), where A_b is the coefficient matrix of the background optimization sub-problem, W_b is the weight matrix of the previous-stage background feature map, r_b is the background residual term jointly formed by the pixel-level fitting term and the gradient-level fitting term, s_b is the prior alignment term jointly formed by the orthogonally reconstructed background prior map P̃_b and the complement (1 − M) of the high-quality SAM pseudo mask, λ is the fixed weight of the subspace prior constraint term, and B^(k) is the k-th stage background feature map, with initial value B^(0) = 0, an all-zero matrix.
- 8. The camouflage target segmentation method based on the SAM-guided reversible unfolding network of claim 7, wherein the SFOS sub-module and the SBOS sub-module adopt a K-stage alternating iteration update rule, specifically F^(k+1) = SFOS(F^(k), B^(k)) and B^(k+1) = SBOS(F^(k+1), B^(k)), where F^(0) = 0 and B^(0) = 0 are the initial all-zero matrices and K is the number of network stages; the foreground and background outputs of each stage serve as the inputs of the next stage, gradually refining the feature maps; the SAM-guided reversible unfolding segmentation module guides the iterative optimization with a stage-weighted segmentation loss function, which is the weighted sum of the segmentation losses of all stages, with the stage weights set by exponential decay, so that the model focuses on the later stages with finer results, where Ŷ_k = Sigmoid(Conv_{1×1}(F^(k))) is the segmentation mask generated from the k-th stage foreground feature map, Sigmoid is the Sigmoid activation function, Conv_{1×1} is a 1×1 convolution layer, and Y is the ground-truth segmentation mask of the camouflage target; the weighted binary cross-entropy loss is used to address the class imbalance in camouflage target segmentation tasks; the weighted IoU (intersection-over-union) loss is used to measure the overlap between the predicted mask and the ground-truth mask; ω_k is the stage weight corresponding to the k-th stage loss, decaying exponentially with the stage number, so that the model focuses on the later stages with finer foreground/background features.
- 9. The camouflage target segmentation method based on the SAM-guided reversible unfolding network of claim 8, wherein the specific flow of step five is as follows: the final foreground feature map obtained after K stages of iterative optimization is input into a 1×1 convolution layer, a pixel-level segmentation probability map is generated through a Sigmoid activation function, and the probability map is binarized to obtain the final segmentation mask of the camouflage target.
- 10. A camouflage target segmentation system of a SAM-guided reversible unfolding network, characterized in that it is used to implement the camouflage target segmentation method of the SAM-guided reversible unfolding network as claimed in any one of claims 1 to 9, the system comprising: a camouflage image input module, used to receive and preprocess the input camouflage target image; a SAM spatial prior construction module, used to construct and optimize the foreground spatial prior map and the background spatial prior map from the native output of the Segment Anything Model (SAM), replacing hard masks to realize adaptive foreground-background separation; a low-dimensional orthogonal subspace projection module, used to perform dimensionality reduction and orthogonalization on the foreground and background spatial prior maps, eliminating feature redundancy and forcing the foreground and background subspaces to be strictly orthogonal; a SAM-guided reversible unfolding segmentation module, used to fuse pixel-gradient two-level feature fitting with the subspace prior constraint and realize progressive segmentation of the camouflage target through the alternately iterated foreground and background optimization sub-modules; and a segmentation result output module, used to generate the final segmentation mask of the camouflage target based on the iteratively optimized foreground feature map.
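The multi-view prior construction of claim 3 (steps 3-4) can be sketched as follows. This is an illustrative reading rather than the patented implementation: the exact entropy weighting and fusion formulas are not preserved in this translation, and all function and variable names are assumptions.

```python
import numpy as np

def fuse_view_priors(fg_maps, bg_maps, eps=1e-8):
    """Fuse per-view SAM foreground/background soft responses into spatial
    prior maps, down-weighting pixels with high prediction entropy
    (an illustrative reading of claim 3; names are assumptions)."""
    fg = np.stack(fg_maps)  # (V, H, W), soft responses in [0, 1]
    bg = np.stack(bg_maps)
    # Pixel-level weight: low foreground/background entropy -> high confidence.
    p = np.clip(fg / (fg + bg + eps), eps, 1 - eps)
    entropy = -(p * np.log(p) + (1 - p) * np.log(1 - p))
    pixel_w = 1.0 - entropy / np.log(2)           # confidence in [0, 1]
    # Image-level weight: mean confidence of each augmented view.
    view_w = pixel_w.mean(axis=(1, 2), keepdims=True)
    w = pixel_w * view_w
    w = w / (w.sum(axis=0, keepdims=True) + eps)  # normalize over views
    prior_fg = (w * fg).sum(axis=0)
    prior_bg = (w * bg).sum(axis=0)
    return prior_fg, prior_bg
```

Confident views and pixels thus dominate the fused priors P_f and P_b, which is the stated purpose of the entropy-based weighting.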
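The low-dimensional orthogonal subspace projection of claim 4 admits a compact sketch. The claim does not fix the subspace dimension or the exact decomposition layout, so this version projects the two flattened priors through a 2×2 covariance eigendecomposition and orthogonalizes the resulting components by Gram-Schmidt; all names are assumptions.

```python
import numpy as np

def orthogonal_prior_reconstruction(prior_fg, prior_bg, eps=1e-8):
    """Claim-4-style projection: flatten, normalize, de-center, eigendecompose,
    Gram-Schmidt-orthogonalize, then back-project and reshape (a sketch)."""
    h, w = prior_fg.shape
    def minmax(v):
        return (v - v.min()) / (v.max() - v.min() + eps)
    vf = minmax(prior_fg.ravel())              # flattened foreground prior
    vb = minmax(prior_bg.ravel())              # flattened background prior
    X = np.stack([vf, vb])                     # joint feature matrix (2, N)
    mean = X.mean(axis=1, keepdims=True)
    Xc = X - mean                              # de-centering
    cov = Xc @ Xc.T / Xc.shape[1]              # 2x2 covariance matrix
    _, P = np.linalg.eigh(cov)                 # projection matrix
    Z = P.T @ Xc                               # low-dimensional subspace features
    zf, zb = Z[0], Z[1]
    # Gram-Schmidt: make the background component orthogonal to the foreground.
    zb = zb - (zb @ zf) / (zf @ zf + eps) * zf
    Zo = np.stack([zf, zb])
    R = P @ Zo + mean                          # back-projection to pixel space
    return R[0].reshape(h, w), R[1].reshape(h, w)
```

The returned maps play the role of the orthogonally reconstructed priors P̃_f and P̃_b used by the subspace constraint term.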
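The Sobel gradient feature map of claim 6 can be reproduced directly. The `conv2d_same` helper (a hypothetical name) uses the zero-padding strategy mentioned in claim 2; the output stacks the normalized x/y gradients into two channels.

```python
import numpy as np

SOBEL_X = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
SOBEL_Y = SOBEL_X.T

def conv2d_same(img, kernel):
    """Naive 3x3 'same' correlation with zero padding (illustration only)."""
    h, w = img.shape
    pad = np.pad(img, 1)
    out = np.zeros((h, w), dtype=float)
    for i in range(h):
        for j in range(w):
            out[i, j] = (pad[i:i + 3, j:j + 3] * kernel).sum()
    return out

def gradient_feature_map(rgb, eps=1e-8):
    """H x W x 3 image -> H x W x 2 normalized Sobel gradient feature map."""
    gx = np.mean([conv2d_same(rgb[..., c], SOBEL_X) for c in range(3)], axis=0)
    gy = np.mean([conv2d_same(rgb[..., c], SOBEL_Y) for c in range(3)], axis=0)
    def minmax(g):
        return (g - g.min()) / (g.max() - g.min() + eps)
    return np.stack([minmax(gx), minmax(gy)], axis=-1)
```

In practice the per-channel convolution would be vectorized (e.g. with a framework's conv2d), but the arithmetic is the same.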
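The alternating update rule of claims 7-8 can be illustrated with a toy unfolding loop. This sketch replaces the patent's closed-form solutions (whose coefficient matrices are not reproduced in this translation) with plain gradient steps on the pixel-level fitting term alone; the step size `eta` and all names are assumptions.

```python
import numpy as np

def sfos_step(X, F, B, P_f, P_b, eta=0.5):
    """Stand-in for the SFOS update: one gradient step on the pixel-level
    fitting term ||X - (P_f*F + P_b*B)||^2 with respect to F."""
    resid = X - (P_f * F + P_b * B)
    return F + eta * P_f * resid

def sbos_step(X, F, B, P_f, P_b, eta=0.5):
    """Stand-in for the SBOS update: the symmetric step with respect to B."""
    resid = X - (P_f * F + P_b * B)
    return B + eta * P_b * resid

def unfold(X, P_f, P_b, stages=8):
    """K-stage alternation of claim 8: F and B start as all-zero matrices;
    each stage refines F first, then B using the freshly updated F."""
    F = np.zeros_like(X)  # F^(0) = 0
    B = np.zeros_like(X)  # B^(0) = 0
    for _ in range(stages):
        F = sfos_step(X, F, B, P_f, P_b)
        B = sbos_step(X, F, B, P_f, P_b)
    return F, B
```

With uniform priors such as P_f = 0.7 and P_b = 0.3, the reconstruction residual shrinks geometrically across stages, mirroring the stage-by-stage refinement the claims describe.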
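The stage-weighted loss of claim 8 combines a weighted binary cross-entropy and a weighted IoU loss under exponentially decaying stage weights. A minimal sketch, assuming a decay base `gamma`, NumPy arrays as stage predictions, and a simple positive-class reweighting (the patent's exact weighting scheme is not preserved in this translation):

```python
import numpy as np

def stage_weights(K, gamma=0.8):
    """Exponentially decaying weights omega_k = gamma**(K - k) for k = 1..K,
    so later (finer) stages contribute more; gamma is an assumed base."""
    return np.array([gamma ** (K - k) for k in range(1, K + 1)])

def weighted_bce(pred, target, eps=1e-8):
    """Weighted BCE: up-weight the (typically rarer) foreground class."""
    pos_w = (target.size - target.sum()) / (target.sum() + eps)
    return float(-np.mean(pos_w * target * np.log(pred + eps)
                          + (1 - target) * np.log(1 - pred + eps)))

def weighted_iou(pred, target, eps=1e-8):
    """Soft IoU loss: 1 minus intersection over union of soft masks."""
    inter = (pred * target).sum()
    union = (pred + target - pred * target).sum()
    return float(1.0 - (inter + eps) / (union + eps))

def total_loss(stage_preds, target, gamma=0.8):
    """Stage-weighted sum of per-stage (wBCE + wIoU) losses."""
    w = stage_weights(len(stage_preds), gamma)
    return float(sum(wk * (weighted_bce(p, target) + weighted_iou(p, target))
                     for wk, p in zip(w, stage_preds)))
```

Because omega_K = 1 and earlier weights decay by powers of gamma, the final stage dominates the total loss, as the claim intends.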
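Step five (claim 9) reduces to a per-pixel 1×1 convolution, a Sigmoid activation, and thresholding. A sketch, with the convolution weight vector `w` and bias `b` standing in for learned parameters:

```python
import numpy as np

def final_mask(F, w, b=0.0, thresh=0.5):
    """Per-pixel 1x1 convolution (a linear map over channels), Sigmoid,
    then binarization at 0.5; w and b stand in for learned parameters."""
    logits = np.tensordot(F, w, axes=([-1], [0])) + b  # H x W logits
    prob = 1.0 / (1.0 + np.exp(-logits))               # Sigmoid probabilities
    return (prob > thresh).astype(np.uint8)            # binary mask
```

Given the K-th stage foreground feature map of shape H×W×C, this yields the final H×W binary segmentation mask of the camouflage target.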
Description
Camouflage target segmentation method of a reversible unfolding network based on SAM guidance

Technical Field

The invention relates to the technical field of computer vision and artificial intelligence, in particular to the camouflage target segmentation task, and specifically to a camouflage target segmentation method and system that integrate the strong visual prior of the Segment Anything Model (SAM) with a reversible unfolding framework, suitable for image segmentation scenarios such as camouflage target segmentation, polyp image segmentation and transparent target segmentation.

Background

Camouflage target segmentation (COS) is a basic and highly challenging task in the field of computer vision. Its core goal is to accurately segment a target region from an image in which the target is highly fused with the background visually, and it is widely applied in fields such as military reconnaissance, medical image analysis and industrial inspection. The core challenges of camouflage target segmentation are mainly twofold: first, the foreground and background of a camouflaged target have inherent similarity in visual characteristics, making effective discriminative features difficult to extract; second, fine-grained features of the camouflaged target, such as fine edges and textures, are easily submerged by the background, leading to incomplete segmentation results and blurred edges. With the development of deep learning, camouflage target segmentation methods based on deep networks have progressed significantly, and various deep models have been applied to the task to improve segmentation performance.
However, the existing methods still have several problems to be solved. First, some methods rely on a hard mask to establish the association between the foreground and the original image; the rigid constraint brought by the hard mask cuts off the fine discriminative features of the camouflaged target, so that target edges and details are lost in segmentation. Second, existing methods lack effective prior guidance from large vision foundation models; their regularization terms are mostly hand-designed, and their generalization capability is limited. Third, most methods only perform pixel-level data fitting in the objective function, neglecting the modeling of gradient-level fine-grained features (such as edges and textures) that are critical for distinguishing the camouflaged target from the background, and therefore cannot capture the detailed structure of the camouflaged target. The Segment Anything Model (SAM), a powerful vision foundation model proposed in recent years and trained on ultra-large-scale mask data, possesses excellent zero-shot generalization capability and general segmentation performance, providing an effective visual prior for various segmentation tasks. SAM can output soft mask probabilities for targets in an image and thus provides reliable prior information for camouflage target segmentation. However, in the prior art, the utilization of the SAM prior remains at the level of a single pseudo mask; the potential of SAM to provide a spatially adaptive prior for an unfolding framework for camouflage target segmentation has not been explored, and a method to effectively fuse the strong visual prior of SAM with a model-driven reversible unfolding framework is also lacking, so the prior value of SAM for camouflage target segmentation cannot be fully exploited.
Therefore, a novel camouflage target segmentation method is needed that can effectively fuse the strong visual prior of SAM with a reversible unfolding framework, solve the problems of hard-mask constraints, lack of large-model priors and neglect of gradient feature modeling in existing methods, improve the accuracy, robustness and generalization capability of camouflage target segmentation, and at the same time adapt to complex camouflage target segmentation tasks such as small targets, multiple targets and degraded scenes.

Disclosure of Invention

To solve the technical problems of existing camouflage target segmentation methods, this application provides a camouflage target segmentation method based on a SAM-guided reversible unfolding network. The core idea is to deeply fuse the strong visual prior of SAM with a reversible unfolding framework, reconstruct the camouflage target segmentation model from the three dimensions of spatial prior guidance, pixel-gradient two-level feature modeling and multi-stage reversible unfolding optimization, abandon the rigid constraint of hard masks, effectively inject the visual prior of a large model into the segmentation framework, strengthen the modeling capability for fine-grained features, and finally realize accurate and robust segmentation of camouflaged targets. The reversible unfolding referred to in this application is not a reversible neural network in a