CN-121999214-A - Multi-granularity image segmentation method based on unified query driving

Abstract

The invention discloses a multi-granularity image segmentation method based on unified query driving. A multi-scale feature map is extracted and input into a pixel decoder to obtain a pixel-level feature representation and the corresponding mask features. Object query vectors and the pixel-level feature representation are input into a Transformer decoder, which iteratively updates the object queries in a multi-layer decoding structure so that they align with potential objects. Segmentation loss functions are constructed at the object, component and sub-component levels, and the losses of all levels are combined by weighted summation; a point sampling strategy based on prediction uncertainty is used to compute the mask loss and the Dice loss on the sampled points for back-propagation training. At inference, thresholding and overlap processing are applied to the resulting mask and class predictions, the corresponding component and sub-component results are generated inside each object mask, and a semantic segmentation map, an instance segmentation map, a component segmentation map and a sub-component segmentation map are output. The method supports multi-granularity selectable inference, flexibly balancing accuracy against computational cost.
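The unified query-mask idea (one set of object queries driving object, component and sub-component masks, with granularity selectable at inference) can be illustrated with a minimal NumPy sketch. It assumes Mask2Former-style mask embeddings (an MLP maps each query to a C-dimensional vector whose inner product with the mask features gives a mask logit map), a fixed number of component and sub-component slots per query, and an extra "no object" class. All names (`predict`, `make_mlp`, the `heads` keys) and sizes are hypothetical, and the weights are random rather than trained.

```python
import numpy as np

def make_mlp(rng, d_in, d_hidden, d_out):
    """Random (untrained) weights for a 2-layer MLP; sizes are illustrative."""
    return (rng.standard_normal((d_in, d_hidden)) * 0.1, np.zeros(d_hidden),
            rng.standard_normal((d_hidden, d_out)) * 0.1, np.zeros(d_out))

def mlp(x, params):
    """Input layer -> hidden fully connected layer with ReLU -> output layer."""
    w1, b1, w2, b2 = params
    return np.maximum(x @ w1 + b1, 0.0) @ w2 + b2

def predict(queries, mask_feats, heads, n_comp, n_sub, level="subcomponent"):
    """queries: (N, C) object queries from the Transformer decoder.
    mask_feats: (C, H, W) mask features shared by all three levels.
    level: 'object', 'component' or 'subcomponent' (selectable granularity)."""
    n, c = queries.shape
    out = {"class_logits": queries @ heads["cls"]}                  # (N, K + 1)
    obj_emb = mlp(queries, heads["obj"])                              # (N, C)
    out["object"] = np.einsum("nc,chw->nhw", obj_emb, mask_feats)    # (N, H, W)
    if level in ("component", "subcomponent"):
        # Component decoder: one mask embedding per predefined component slot.
        comp_emb = mlp(queries, heads["comp"]).reshape(n, n_comp, c)
        out["component"] = np.einsum("npc,chw->nphw", comp_emb, mask_feats)
    if level == "subcomponent":
        # Sub-component decoder: same structure, independent weights, and it reads
        # the object query directly, so it runs in parallel with (not after) "comp".
        sub_emb = mlp(queries, heads["sub"]).reshape(n, n_sub, c)
        out["subcomponent"] = np.einsum("nsc,chw->nshw", sub_emb, mask_feats)
    return out

# Toy sizes: N queries, C channels, H x W mask, K classes, P components, S sub-components.
rng = np.random.default_rng(0)
N, C, H, W, K, P, S = 4, 16, 8, 8, 5, 3, 6
heads = {
    "cls": rng.standard_normal((C, K + 1)),
    "obj": make_mlp(rng, C, 32, C),
    "comp": make_mlp(rng, C, 32, P * C),
    "sub": make_mlp(rng, C, 32, S * C),
}
queries = rng.standard_normal((N, C))
mask_feats = rng.standard_normal((C, H, W))
full = predict(queries, mask_feats, heads, P, S)
coarse = predict(queries, mask_feats, heads, P, S, level="object")
print(full["subcomponent"].shape)  # (4, 6, 8, 8)
print(sorted(coarse))              # ['class_logits', 'object']
```

Because the sub-component branch reads the queries directly, lowering `level` simply skips the later decoder and inner-product calls; in this sketch that skipped work is where the accuracy/computation trade-off comes from.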

Inventors

WANG WENBIN
GONG QIWEN
REN DONG
YU JUNBO

Assignees

China Three Gorges University

Dates

Publication Date: 20260508
Application Date: 20260116

Claims (10)

1. A multi-granularity image segmentation method based on unified query driving, characterized by comprising the following steps: S1, feature extraction: preprocessing an input image and feeding it into a backbone network to extract a multi-scale feature map; S2, inputting the multi-scale feature map into a pixel decoder and fusing information from different scales through an attention or deformable attention mechanism to obtain a pixel-level feature representation and the corresponding mask features; S3, object query generation and update: initializing a fixed number of object query vectors, inputting them together with the pixel-level feature representation into a Transformer decoder, and iteratively updating the object queries in a multi-layer decoding structure so that they align with potential objects; S4, multi-granularity segmentation prediction: based on the same set of object queries updated by the Transformer decoder and the shared mask features, generating an object-level segmentation mask and class prediction, a component-level segmentation mask and a sub-component-level segmentation mask by an object-level prediction head, a component decoder and a sub-component decoder respectively, wherein components and sub-components are not classified separately, and the corresponding masks are generated according to the predefined component and sub-component sets of the object class; S5, joint training: constructing object-level, component-level and sub-component-level segmentation loss functions, computing a weighted sum of the losses of all levels, and using a point sampling strategy based on prediction uncertainty to compute the mask loss and the Dice loss on the sampled points for back-propagation training; S6, inference and post-processing: completing forward inference according to steps S1 to S4, applying thresholding and overlap processing to the obtained mask and class predictions, generating the corresponding component and sub-component results inside each object mask, and outputting a semantic segmentation map, an instance segmentation map, a component segmentation map and a sub-component segmentation map.
2. The multi-granularity image segmentation method based on unified query driving according to claim 1, wherein the multi-granularity segmentation prediction in step S4 comprises: S401, object-level segmentation: generating an object mask and an object class prediction for each object query with an object-level mask classification head, and using the Hungarian matching algorithm to put the predictions in one-to-one correspondence with the annotated objects so as to construct a supervision signal comprising a classification loss, a mask loss and a Dice loss; S402, component-level segmentation: for each object matched to an annotation that has component labels, inputting the corresponding object query and the mask features into the component decoder, which generates component mask embeddings; performing an inner product between the component mask embeddings and the mask features to generate component masks; and computing a component mask loss and a component-level Dice loss, without a separate component category classification; S403, sub-component-level segmentation: for each object matched to an annotation that has sub-component labels, inputting the corresponding object query and the mask features into the sub-component decoder, which generates sub-component mask embeddings; performing an inner product between the sub-component mask embeddings and the mask features to generate sub-component masks; and computing a sub-component mask loss and a sub-component-level Dice loss, without a separate sub-component category classification.
3. The multi-granularity image segmentation method based on unified query driving according to claim 1, wherein the component decoder and the sub-component decoder each have a multi-layer perceptron structure comprising: an input layer for receiving the object query or features transformed from it; at least one hidden fully connected layer with a ReLU activation function for nonlinear feature transformation; and an output layer for outputting a mask embedding vector whose dimension matches that of the mask features, the component or sub-component mask being generated by an inner product with the mask features; wherein the component decoder and the sub-component decoder share the same structure but have mutually independent parameters.
4. The multi-granularity image segmentation method based on unified query driving according to claim 3, wherein the component decoder and the sub-component decoder each take the object query and the mask features directly as inputs and output the component mask embedding and the sub-component mask embedding respectively; the generation of the sub-component mask does not depend on the component mask or the component features, so that the component task and the sub-component task are independent in representation space and the component task does not crowd out the sub-component representation.
5. The multi-granularity image segmentation method based on unified query driving according to claim 1, wherein in step S2, the pixel-level feature encoding adopts a multi-scale encoding structure containing deformable attention, which aligns and fuses backbone features from different scales and outputs mask features with a fixed channel dimension and an adjustable spatial resolution, so as to capture both high-resolution detail and global context information.
6. The multi-granularity image segmentation method based on unified query driving according to claim 1, wherein the point sampling strategy in step S5 comprises: in each forward pass, computing the uncertainty of the predicted mask logits; selecting importance sampling points according to this uncertainty from candidate points obtained by random oversampling; and computing the mask loss and the Dice loss on the sampled points.
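7. The multi-granularity image segmentation method based on unified query driving according to claim 6, wherein the total loss function in step S5 satisfies $\mathcal{L} = \lambda_{\text{obj}}\mathcal{L}_{\text{obj}} + \lambda_{\text{comp}}\mathcal{L}_{\text{comp}} + \lambda_{\text{sub}}\mathcal{L}_{\text{sub}}$; when the sub-component-to-component hierarchical consistency constraint is enabled, the total loss function further comprises a consistency term, $\mathcal{L} = \lambda_{\text{obj}}\mathcal{L}_{\text{obj}} + \lambda_{\text{comp}}\mathcal{L}_{\text{comp}} + \lambda_{\text{sub}}\mathcal{L}_{\text{sub}} + \lambda_{\text{cons}}\mathcal{L}_{\text{cons}}$; wherein $\mathcal{L}_{\text{obj}}$ is the object-level loss, comprising the object class loss, the object mask loss and the object-level Dice loss; $\mathcal{L}_{\text{comp}}$ is the component-level loss, comprising the component mask loss and the component-level Dice loss; $\mathcal{L}_{\text{sub}}$ is the sub-component-level loss, comprising the sub-component mask loss and the sub-component-level Dice loss; $\mathcal{L}_{\text{cons}}$ is the sub-component-to-component hierarchical consistency loss; and $\lambda_{\text{obj}}$, $\lambda_{\text{comp}}$, $\lambda_{\text{sub}}$ and $\lambda_{\text{cons}}$ are the corresponding non-negative loss weight coefficients.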
8. The method according to claim 1, wherein the inference and post-processing in step S6 comprises: determining the set of pixels covered by each object mask in the object-level segmentation result; within each such pixel set, applying a sigmoid followed by an argmax over the class channels of the component-level and sub-component-level mask predictions respectively; and assigning each pixel to the corresponding component or sub-component class of that object, so that the component and sub-component masks are explicitly guaranteed to lie within the corresponding object mask.
9. The multi-granularity image segmentation method based on unified query driving according to claim 1, wherein the component decoder and the sub-component decoder are enabled or disabled by switches at the inference stage to realize multi-granularity selectable inference: outputting only the object-level result; outputting the object-level and component-level results; or outputting the object-level, component-level and sub-component-level results.
10. A multi-granularity image segmentation model architecture based on unified query driving, for executing the multi-granularity image segmentation method based on unified query driving according to any one of claims 1 to 9, wherein the architecture comprises a backbone network, a pixel decoder, a Transformer decoder, a component decoder, a sub-component decoder, and an output and post-processing module.
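Step S401 of claim 2 pairs predictions with annotated objects by Hungarian matching. The minimal NumPy sketch below assumes a matching cost made of the negative softmax probability of the target class plus a pairwise soft-Dice cost (a mask binary cross-entropy cost is often added as well and is omitted here for brevity); `match`, `dice_cost` and the weights are hypothetical. To stay dependency-light it solves the assignment by exhaustive search, which returns the same optimal one-to-one matching as the Hungarian algorithm but only scales to a few queries; a practical implementation would call a Hungarian solver such as `scipy.optimize.linear_sum_assignment`.

```python
import itertools
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def dice_cost(mask_logits, gt_masks, eps=1.0):
    """Pairwise (1 - soft Dice) between N predicted and M target masks."""
    p = 1.0 / (1.0 + np.exp(-mask_logits.reshape(len(mask_logits), -1)))  # (N, HW)
    g = gt_masks.reshape(len(gt_masks), -1).astype(float)                  # (M, HW)
    inter = p @ g.T                                                        # (N, M)
    denom = p.sum(1)[:, None] + g.sum(1)[None, :]
    return 1.0 - (2.0 * inter + eps) / (denom + eps)

def match(class_logits, mask_logits, gt_classes, gt_masks, w_cls=1.0, w_dice=1.0):
    """One-to-one (query, target) pairs with minimum total cost; needs N >= M."""
    cost = (-w_cls * softmax(class_logits)[:, gt_classes]
            + w_dice * dice_cost(mask_logits, gt_masks))                   # (N, M)
    n, m = cost.shape
    # Exhaustive search over assignments: same optimum as the Hungarian algorithm,
    # but only practical for a handful of queries.
    best = min(itertools.permutations(range(n), m),
               key=lambda qs: sum(cost[q, t] for t, q in enumerate(qs)))
    return [(q, t) for t, q in enumerate(best)]

gt_masks = np.array([[[1, 0], [0, 0]],      # target 0: top-left pixel
                     [[0, 0], [0, 1]]])     # target 1: bottom-right pixel
mask_logits = np.array([[[-9.0, -9.0], [-9.0, 9.0]],   # query 0
                        [[-9.0, -9.0], [-9.0, -9.0]],  # query 1 (empty)
                        [[9.0, -9.0], [-9.0, -9.0]]])  # query 2
class_logits = np.zeros((3, 3))             # uniform class scores for simplicity
print(match(class_logits, mask_logits, np.array([0, 1]), gt_masks))  # [(2, 0), (0, 1)]
```

Unmatched queries (query 1 here) would then be supervised only toward the "no object" class, while matched ones receive the mask and Dice losses.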
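The uncertainty-based point sampling of claim 6 follows the importance-sampling pattern popularized by PointRend and Mask2Former. The sketch below is a minimal NumPy version under several assumptions: uncertainty is taken as -|logit| (highest where the predicted probability is near 0.5), points are read by nearest-pixel lookup instead of bilinear interpolation, 75% of the points are importance-sampled from a 3x oversampled candidate pool and the rest are uniform, and the "mask loss" is binary cross-entropy. Function names and default ratios are illustrative.

```python
import numpy as np

def sample_uncertain_points(logits, num_points, oversample=3, importance_ratio=0.75, rng=None):
    """Return (num_points, 2) integer (row, col) coordinates on an H x W logit map."""
    rng = np.random.default_rng() if rng is None else rng
    h, w = logits.shape
    # 1. Randomly oversample candidate points.
    n_cand = oversample * num_points
    cand = np.stack([rng.integers(0, h, n_cand), rng.integers(0, w, n_cand)], axis=1)
    # 2. Uncertainty = -|logit|: largest where the prediction is closest to 0.5.
    uncertainty = -np.abs(logits[cand[:, 0], cand[:, 1]])
    n_imp = int(importance_ratio * num_points)
    important = cand[np.argsort(-uncertainty)[:n_imp]]
    # 3. Fill the remaining budget with uniformly random points for coverage.
    n_rand = num_points - n_imp
    rand = np.stack([rng.integers(0, h, n_rand), rng.integers(0, w, n_rand)], axis=1)
    return np.concatenate([important, rand], axis=0)

def point_losses(logits, target, points):
    """Binary cross-entropy (mask loss) and Dice loss evaluated only at `points`."""
    x = logits[points[:, 0], points[:, 1]]
    y = target[points[:, 0], points[:, 1]].astype(float)
    # Numerically stable BCE with logits.
    bce = np.mean(np.maximum(x, 0.0) - x * y + np.log1p(np.exp(-np.abs(x))))
    p = 1.0 / (1.0 + np.exp(-x))
    dice = 1.0 - (2.0 * (p * y).sum() + 1.0) / (p.sum() + y.sum() + 1.0)
    return bce, dice

rng = np.random.default_rng(0)
logits = rng.normal(0.0, 4.0, size=(64, 64))        # stand-in for one predicted mask
target = (logits + rng.normal(0.0, 1.0, size=logits.shape)) > 0
pts = sample_uncertain_points(logits, num_points=112, rng=rng)
print(pts.shape)                                     # (112, 2)
print(point_losses(logits, target, pts))
```

Evaluating the losses on a few hundred points per mask instead of every pixel keeps memory low at high mask resolution, while the importance points concentrate supervision near uncertain boundaries.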
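The weighted sum in claim 7, together with claim 2's rule that component and sub-component losses are computed only for objects matched to annotations carrying those labels, can be sketched as follows. The per-object loss values are assumed to be computed elsewhere (for example, by point-sampled mask and Dice losses); averaging over labelled objects, returning zero when a level has no labels, and all names and default weights are illustrative assumptions. The consistency term is passed in as an opaque value because its exact form is not specified here.

```python
import numpy as np

def level_loss(mask_losses, dice_losses, labelled):
    """Mean of (mask loss + Dice loss) over matched objects annotated at this level."""
    labelled = np.asarray(labelled, dtype=bool)
    if not labelled.any():
        return 0.0  # no supervision at this level for this batch
    total = np.asarray(mask_losses, dtype=float) + np.asarray(dice_losses, dtype=float)
    return float(total[labelled].mean())

def total_loss(l_obj, l_comp, l_sub, lam_obj=1.0, lam_comp=1.0, lam_sub=1.0,
               l_cons=None, lam_cons=0.0):
    """Weighted sum of the three level losses, plus an optional consistency term."""
    loss = lam_obj * l_obj + lam_comp * l_comp + lam_sub * l_sub
    if l_cons is not None:
        loss += lam_cons * l_cons
    return loss

# Three matched objects: the first two carry component labels, none carry sub-component labels.
l_comp = level_loss([0.5, 0.25, 1.0], [0.25, 0.5, 0.75], labelled=[True, True, False])
l_sub = level_loss([0.5, 0.5, 0.5], [0.5, 0.5, 0.5], labelled=[False, False, False])
print(l_comp, l_sub)                                 # 0.75 0.0
print(total_loss(1.5, l_comp, l_sub, lam_comp=2.0))  # 3.0
```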
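The containment rule of claim 8 can be sketched for a single object: the object mask is thresholded into a pixel set, a sigmoid and then an argmax over the component channels are evaluated only inside that set, and every pixel outside it stays unassigned, so a component can never extend past its object. The 0.5 threshold, the -1 "unassigned" label and the function names are assumptions; the same function applies unchanged to sub-component channels. This sketch assigns every pixel inside the object to some component.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def assign_components(object_logits, component_logits, threshold=0.5):
    """object_logits: (H, W) mask logits of one kept object.
    component_logits: (P, H, W), one channel per predefined component of its class.
    Returns an (H, W) map holding a component index inside the object and -1 outside."""
    inside = sigmoid(object_logits) > threshold          # pixel set covered by the object
    probs = sigmoid(component_logits)                    # per-channel sigmoid
    labels = np.full(inside.shape, -1, dtype=int)
    labels[inside] = probs[:, inside].argmax(axis=0)    # argmax over component channels
    return labels

obj = np.array([[5.0, 5.0, -5.0],
                [5.0, 5.0, -5.0]])                      # object covers the first two columns
comps = np.array([[[3.0, -3.0, 9.0], [3.0, -3.0, 9.0]],   # component 0
                  [[-3.0, 3.0, 9.0], [-3.0, 3.0, 9.0]]])  # component 1
print(assign_components(obj, comps).tolist())           # [[0, 1, -1], [0, 1, -1]]
```

Both component channels are confident in the third column, yet those pixels stay unassigned because they fall outside the object mask.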

Description

Technical Field

The invention belongs to the technical field of computer vision and deep learning, and particularly relates to a multi-granularity image segmentation method based on unified query driving.

Background

Image segmentation is one of the basic tasks of computer vision and is widely applied in fields such as autonomous driving, medical image analysis, industrial inspection and precision manufacturing. Existing mainstream methods focus on single-level tasks such as semantic segmentation or instance segmentation, or extend them to a two-level object-component structure for parsing coarser-grained components. However, in application scenarios such as organ-region-fine structure hierarchies in medical imaging or product-component-part-level defects in industry, the sub-component-level structures beneath each component often need to be segmented and identified more finely, and object-level or two-level object-component parsing alone can hardly meet the requirements of fine structure and hierarchical semantics. Existing unified object-component frameworks typically predict object masks and component masks simultaneously by sharing queries and mask features, but as a whole they remain limited to a two-level structure. When a third level of sub-components needs to be introduced, the common practice is either to train a separate sub-component model or to cascade an additional sub-branch onto the component branch, which easily leads to a complex structure, unstable training and confused handling of hierarchical relationships.
The prior art mainly has the following defects in multi-granularity, hierarchical segmentation scenarios:
(1) Insufficient segmentation levels: there is no efficient scheme that extends naturally to a three-level object-component-sub-component structure within a unified query-mask framework, and sub-component-level segmentation usually relies on an additional model or a complex cascade structure.
(2) High task coupling: in a simple cascade design, sub-component prediction often depends directly on the component prediction result, so the sub-component representation is easily squeezed by the component task; mutual interference is especially likely when the semantic relationship between a component and its sub-components is weak.
(3) Squeezed representation space: if sub-components are simply handled by the same set of component-branch parameters, the easier upper-level task dominates the gradients during optimization, while sub-components, being finer, sparser and harder to learn, suffer from degraded predictions and rough boundaries, or are ignored altogether.
Therefore, it is necessary to design a multi-granularity image segmentation method based on unified query driving to solve the above problems.
Disclosure of Invention

The technical problem to be solved by the invention is to provide a multi-granularity image segmentation method based on unified query driving that adds a third level of sub-component segmentation while retaining the original unified object-component framework and its simplicity of implementation. A sub-component decoder that shares the structure of the component decoder but has independent parameters gives sub-components their own representation capacity, so that they are not squeezed by the component task; sub-components are predicted directly from the object-level representation, in parallel with components, rather than generated in cascade from the components. At the inference stage, the object-level branch and the component-level branch can be enabled as required, and the sub-component-level branch can further be enabled when the component-level branch is enabled, thereby realizing multi-granularity selectable inference and flexibly balancing accuracy against computational cost.

To achieve the above technical effects, the invention adopts the following technical scheme: a multi-granularity image segmentation method based on unified query driving, which, within a unified Transformer object-query and mask-feature framework, uses a single set of object queries to drive the generation of object-level, component-level and sub-component-level masks simultaneously, and specifically comprises the following steps:
S1, feature extraction: preprocessing an input image and feeding it into a backbone network to extract a multi-scale feature map;
S2, inputting the multi-scale feature map into a pixel decoder and fusing information from different scales through an attention or deformable attention mechanism to obtain a pixel-level feature representation and the corresponding mask features;
S3, o