CN-121982301-A - Intelligent mask generation method, device and equipment for interactive image segmentation


Abstract

The invention provides an intelligent mask generation method, device and equipment for interactive image segmentation. The method comprises: obtaining initial image segmentation interaction information input by a user; performing segmentation constraint signal conversion processing on the initial image segmentation interaction information to obtain target image segmentation information; performing feature fusion on the constraint masks of the target image segmentation information regions to obtain a fused feature map; performing preliminary segmentation on the fused feature map to obtain an initial mask; segmenting the initial mask according to secondary interaction information to obtain an intermediate mask; and performing edge fusion processing on the intermediate mask to obtain a target mask. The method and device can improve the efficiency, accuracy and multi-modal understanding capability of intelligent mask generation.

Inventors

GAO TIANMING
LUO XINKAI
CAO HONGZHI
Sun Shidan

Assignees

中译文娱科技(青岛)有限公司

Dates

Publication Date: 20260505
Application Date: 20251229

Claims (10)

1. An intelligent mask generation method for interactive image segmentation, characterized by comprising the following steps: acquiring initial image segmentation interaction information input by a user; performing segmentation constraint signal conversion processing on the initial image segmentation interaction information to obtain target image segmentation information; performing feature fusion on the constraint mask of the target image segmentation information region to obtain a fused feature map; performing preliminary segmentation on the fused feature map to obtain an initial mask; segmenting the initial mask according to secondary interaction information to obtain an intermediate mask; and performing edge fusion processing on the intermediate mask to obtain a target mask.
2. The intelligent mask generation method for interactive image segmentation according to claim 1, wherein performing segmentation constraint signal conversion processing on the initial image segmentation interaction information to obtain target image segmentation information comprises: extracting semantic tags and attribute weights from a text instruction to obtain a target text instruction; performing weighted diffusion on point selection marks to obtain target point selection marks; and performing a confidence check on box selection data to obtain target box selection data.
3. The intelligent mask generation method for interactive image segmentation according to claim 1, wherein performing feature fusion on the constraint mask of the target image segmentation information region to obtain a fused feature map comprises: obtaining feature weights according to the importance of each feature in the target image segmentation information; and obtaining the fused feature map according to the feature weights.
4. The intelligent mask generation method for interactive image segmentation according to claim 1, wherein performing preliminary segmentation on the fused feature map to obtain an initial mask comprises: performing bounding box positioning on the fused feature map to obtain a guide mask; and combining the guide mask with the visual features of the image to obtain the initial mask.
5. The intelligent mask generation method for interactive image segmentation according to claim 1, wherein segmenting the initial mask according to the secondary interaction information to obtain an intermediate mask comprises: performing deviation evaluation on the initial mask according to the secondary interaction information to obtain a deviation evaluation result; obtaining a deviation heat map according to the deviation evaluation result; and adjusting the initial mask according to the deviation heat map to obtain the intermediate mask.
6. The intelligent mask generation method for interactive image segmentation according to claim 1, wherein performing edge fusion processing on the intermediate mask to obtain a target mask comprises: obtaining an edge gradient according to a horizontal gradient and a vertical gradient; and performing edge fusion processing on the intermediate mask according to the edge gradient to obtain the target mask.
7. The intelligent mask generation method for interactive image segmentation according to claim 1, wherein performing edge fusion processing on the intermediate mask to obtain a target mask further comprises: splitting the intermediate mask into layers according to the semantic level of the target text instruction to obtain the target mask.
8. An intelligent mask generation apparatus for interactive image segmentation, comprising: an acquisition module, configured to acquire initial image segmentation interaction information input by a user; and a processing module, configured to perform segmentation constraint signal conversion processing on the initial image segmentation interaction information to obtain target image segmentation information, perform feature fusion on the constraint mask of the target image segmentation information region to obtain a fused feature map, perform preliminary segmentation on the fused feature map to obtain an initial mask, segment the initial mask according to secondary interaction information to obtain an intermediate mask, and perform edge fusion processing on the intermediate mask to obtain a target mask.
9. A computing device, comprising: one or more processors; and storage means for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the method of any of claims 1 to 7.
10. A computer readable storage medium, characterized in that the computer readable storage medium has stored therein a program which, when executed by a processor, implements the method according to any of claims 1 to 7.
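
The edge gradient of claim 6 can be illustrated with a minimal sketch. The claims do not name a gradient operator or a fusion rule, so everything below is an assumption made for illustration: central differences stand in for the horizontal and vertical gradients, their Euclidean magnitude is taken as the edge gradient, and "edge fusion" is approximated by blending each pixel toward its 3x3 neighbourhood mean in proportion to that magnitude. The function names (edge_gradient, feather_edges) and the strength parameter are hypothetical.

```python
import math


def _clamped(mask, y, x):
    """Read mask[y][x], clamping coordinates so borders reuse the nearest pixel."""
    h, w = len(mask), len(mask[0])
    return mask[min(max(y, 0), h - 1)][min(max(x, 0), w - 1)]


def edge_gradient(mask):
    """Per-pixel gradient magnitude from horizontal and vertical central differences."""
    h, w = len(mask), len(mask[0])
    grad = []
    for y in range(h):
        row = []
        for x in range(w):
            gx = (_clamped(mask, y, x + 1) - _clamped(mask, y, x - 1)) / 2.0  # horizontal
            gy = (_clamped(mask, y + 1, x) - _clamped(mask, y - 1, x)) / 2.0  # vertical
            row.append(math.hypot(gx, gy))
        grad.append(row)
    return grad


def feather_edges(mask, strength=1.0):
    """Assumed fusion rule: soften the mask only where the edge gradient is high."""
    h, w = len(mask), len(mask[0])
    grad = edge_gradient(mask)
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            mean = sum(
                _clamped(mask, y + dy, x + dx)
                for dy in (-1, 0, 1)
                for dx in (-1, 0, 1)
            ) / 9.0
            alpha = min(1.0, strength * grad[y][x])  # blend factor in [0, 1]
            row.append((1.0 - alpha) * mask[y][x] + alpha * mean)
        out.append(row)
    return out


# A hard vertical edge: left half background (0.0), right half foreground (1.0).
intermediate_mask = [[0.0, 0.0, 0.0, 1.0, 1.0, 1.0] for _ in range(4)]
gradient = edge_gradient(intermediate_mask)
target_mask = feather_edges(intermediate_mask)
```

In this sketch, pixels far from the boundary have zero gradient and keep their original value, while the two columns on either side of the boundary are pulled toward intermediate values, which gives a softer transition than the hard 0/1 step.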

Description

Intelligent mask generation method, device and equipment for interactive image segmentation

Technical Field

The invention relates to the technical field of image editing and processing, and in particular to an intelligent mask generation method, device and equipment for interactive image segmentation.

Background

Currently, in image segmentation and mask generation, segmentation accuracy determines the efficiency of image processing. Existing image segmentation and mask generation techniques have the following defects:

Automatic segmentation methods adapt poorly to complex scenes. Segmentation precision is low when the target resembles the background, is occluded, or has blurred edges, and the generated masks suffer from missed or excess regions, so they cannot be used directly for precise editing.

Traditional interactive tools rely on a single interaction mode (for example, supporting only clicks or only box selection), require multiple inputs from the user to approach the target, and are cumbersome to operate. They also lack semantic understanding and cannot specify the segmentation target through a text description (such as "segment all circular objects in the image").

Complex requests that combine spatial position and attribute descriptions, such as "segment the third white cat from the left", are difficult to handle, and the segmentation result is not sufficiently aligned with the user's intention.

Masks produced by existing methods have rough edges and segment fine-grained or transparent targets such as hair strands, glass and smoke poorly, so manual retouching is required, which is inefficient.

Disclosure of Invention

The invention aims to provide an intelligent mask generation method, device and equipment for interactive image segmentation that can improve the efficiency, accuracy and multi-modal understanding capability of intelligent mask generation.
In order to solve the above technical problems, the technical scheme of the invention is as follows:

An intelligent mask generation method for interactive image segmentation, comprising: acquiring initial image segmentation interaction information input by a user; performing segmentation constraint signal conversion processing on the initial image segmentation interaction information to obtain target image segmentation information; performing feature fusion on the constraint mask of the target image segmentation information region to obtain a fused feature map; performing preliminary segmentation on the fused feature map to obtain an initial mask; segmenting the initial mask according to secondary interaction information to obtain an intermediate mask; and performing edge fusion processing on the intermediate mask to obtain a target mask.

Optionally, performing segmentation constraint signal conversion processing on the initial image segmentation interaction information to obtain target image segmentation information includes: extracting semantic tags and attribute weights from a text instruction to obtain a target text instruction; performing weighted diffusion on point selection marks to obtain target point selection marks; and performing a confidence check on box selection data to obtain target box selection data.

Optionally, performing feature fusion on the constraint mask of the target image segmentation information region to obtain a fused feature map includes: obtaining feature weights according to the importance of each feature in the target image segmentation information; and obtaining the fused feature map according to the feature weights.

Optionally, performing preliminary segmentation on the fused feature map to obtain an initial mask includes: performing bounding box positioning on the fused feature map to obtain a guide mask; and combining the guide mask with the visual features of the image to obtain the initial mask.
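
The weighted diffusion of point selection marks and the importance-weighted feature fusion described above can be sketched as follows. The source does not specify the diffusion kernel, how importance is turned into feature weights, or any function names. The Gaussian falloff, the sum-to-one normalization, the fusion of raw constraint maps rather than learned features, and the names diffuse_point, box_mask and fuse are all assumptions made for illustration only.

```python
import math


def diffuse_point(height, width, point, sigma=1.5, weight=1.0):
    """Spread one point selection mark into a soft constraint map (assumed Gaussian)."""
    py, px = point
    return [
        [
            weight * math.exp(-((y - py) ** 2 + (x - px) ** 2) / (2.0 * sigma ** 2))
            for x in range(width)
        ]
        for y in range(height)
    ]


def box_mask(height, width, box):
    """Binary constraint map for a box given as (top, left, bottom, right), inclusive."""
    top, left, bottom, right = box
    return [
        [1.0 if top <= y <= bottom and left <= x <= right else 0.0 for x in range(width)]
        for y in range(height)
    ]


def fuse(maps, importances):
    """Weighted sum of constraint maps; importances are normalized to sum to 1."""
    total = sum(importances)
    weights = [value / total for value in importances]
    height, width = len(maps[0]), len(maps[0][0])
    return [
        [sum(wt * m[y][x] for wt, m in zip(weights, maps)) for x in range(width)]
        for y in range(height)
    ]


# One click at the centre of a 5x5 image plus a 3x3 box around it.
point_map = diffuse_point(5, 5, point=(2, 2))
region_map = box_mask(5, 5, box=(1, 1, 3, 3))
# Hypothetical importances: the click is trusted three times as much as the box.
fused = fuse([point_map, region_map], importances=[3.0, 1.0])
```

Normalizing the importances keeps the fused map in the same value range as its inputs, so a pixel supported by every constraint reaches 1.0 and a pixel outside the box and far from the click stays close to 0.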
Optionally, segmenting the initial mask according to the secondary interaction information to obtain an intermediate mask includes: performing deviation evaluation on the initial mask according to the secondary interaction information to obtain a deviation evaluation result; obtaining a deviation heat map according to the deviation evaluation result; and adjusting the initial mask according to the deviation heat map to obtain the intermediate mask.

Optionally, performing edge fusion processing on the intermediate mask to obtain a target mask includes: obtaining an edge gradient according to a horizontal gradient and a vertical gradient; and performing edge fusion processing on the intermediate mask according to the edge gradient to obtain the target mask.

Optionally, performing edge fusion processing on the intermediate mask to obtain a target mask further includes: splitting the intermediate mask into layers according to the semantic level of the target text instruction to obtain the target mask.

The embodiment of the