EP-4154217-B1 - HYBRID VIDEO SEGMENTATION AIDED BY OPTICAL FLOW FOR OBJECTS IN MOTION

Inventors

  • CHEN, Mengyu
  • ZHU, Miaoqi
  • TAKASHIMA, Yoshikazu
  • OUYANG, Chao
  • DE LA ROSA, Daniel
  • LAFUENTE, Michael
  • SHAPIRO, Stephen

Dates

Publication Date
2026-05-06
Application Date
2021-05-28

Claims (14)

  1. A computer-implemented method, comprising: applying a gamma adjustment (110) to frames of an input video with particular values of parameters to generate at least one set of gamma-adjusted frames; applying a segmentation technique (120) to the at least one set of gamma-adjusted frames to generate segmentation masks; applying an optical flow technique (130) to the at least one set of the gamma-adjusted frames to generate optical flow maps; and combining (140) the segmentation masks and the optical flow maps to generate hybrid segmentation masks; wherein combining the segmentation masks and the optical flow maps comprises multiplying a pixel-wise code value for red, green, and blue components for sub-pixels of objects which have been applied with the segmentation technique.
  2. The method of claim 1, wherein the segmentation technique comprises: categorizing each frame of the at least one set of gamma-adjusted frames into a class; detecting objects within each frame and drawing boundaries around the objects; and identifying parts of each frame and corresponding the parts to the objects.
  3. The method of claim 2, wherein each flow map of the optical flow maps is a map of motion of the objects between consecutive frames of the at least one set of gamma-adjusted frames.
  4. The method of claim 1, wherein the at least one set of gamma-adjusted frames comprises a first set of gamma-adjusted frames and a second set of gamma-adjusted frames.
  5. The method of claim 4, wherein i) the first set of gamma-adjusted frames is generated for the segmentation technique or ii) the second set of gamma-adjusted frames is generated for the optical flow technique.
  6. The method of claim 1, further comprising: applying the gamma adjustment to the frames of the input video with different values of the parameters to generate different sets of gamma-adjusted frames and repeating the applications and the combination of the segmentation technique and the optical flow technique, when a misalignment exists between the segmentation masks and the optical flow maps such that the segmentation masks and the optical flow maps are out of alignment by a predetermined number of pixels along a boundary.
  7. A system (600), comprising: a gamma function applicator (610) to apply a gamma adjustment to frames of an input video using a set of values of parameters to generate at least one set of gamma-adjusted frames; a segmentation mask generator (620) to apply a segmentation technique to the at least one set of gamma-adjusted frames to generate segmentation masks; an optical flow map generator (630) to apply an optical flow technique to the at least one set of the gamma-adjusted frames to generate optical flow maps; and a combiner (640) to combine the segmentation masks and the optical flow maps to generate hybrid segmentation masks; wherein the combiner (640) further multiplies a pixel-wise code value for red, green, and blue components for sub-pixels of objects which have been applied with the segmentation technique.
  8. The system of claim 7, further comprising a processor to receive the hybrid segmentation masks and to determine if the combiner produced an acceptable alignment between the segmentation masks and the optical flow maps such that the segmentation mask and the optical flow map are not out of alignment by a predetermined number of pixels along a boundary, the processor to instruct the gamma function applicator to use another set of values of the parameters to improve the alignment between the segmentation masks and the optical flow maps and to repeat processes performed by the segmentation mask generator and the optical flow map generator, if the alignment is not acceptable.
  9. The system of claim 7, wherein the segmentation mask generator comprises: a categorizer to categorize each frame of the at least one set of gamma-adjusted frames into a class; a detector to detect objects within each frame and to draw boundaries around the objects; and an identifier to identify parts of each frame and to correspond the parts to the objects.
  10. The system of claim 9, wherein each flow map of the optical flow maps is a map of motion of the objects between consecutive frames of the at least one set of gamma-adjusted frames.
  11. The system of claim 7, wherein the at least one set of gamma-adjusted frames comprises a first set of gamma-adjusted frames and a second set of gamma-adjusted frames.
  12. The system of claim 11, wherein either i) the first set of gamma-adjusted frames is generated for the segmentation technique or ii) the second set of gamma-adjusted frames is generated for the optical flow technique.
  13. A non-transitory computer-readable storage medium storing a computer program to generate segmentation masks, the computer program comprising executable instructions that cause a computer to: apply a gamma adjustment (110) to frames of an input video with particular values of parameters to generate at least one set of gamma-adjusted frames; apply a segmentation technique (120) to the at least one set of gamma-adjusted frames to generate segmentation masks; apply an optical flow technique (130) to the at least one set of the gamma-adjusted frames to generate optical flow maps; and combine (140) the segmentation masks and the optical flow maps to generate hybrid segmentation masks; wherein the segmentation technique further comprises executable instructions that cause the computer to multiply a pixel-wise code value for red, green, and blue components for sub-pixels of objects which have been applied with the segmentation technique.
  14. The computer-readable storage medium of claim 13, wherein the segmentation technique comprises executable instructions that cause the computer to: categorize each frame of the at least one set of gamma-adjusted frames into a class; detect objects within each frame and draw boundaries around the objects; and identify parts of each frame and correspond the parts to the objects.
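The pixel-wise combination recited in claims 1, 7, and 13 can be sketched as a per-sub-pixel multiplication of the two arrays. This is a minimal sketch, not the claimed implementation: the use of NumPy, the H x W x 3 array shapes, the normalization of code values to [0.0, 1.0], and the rendering of the optical flow map as an RGB image are all illustrative assumptions.

```python
import numpy as np

def combine_masks(seg_mask: np.ndarray, flow_map: np.ndarray) -> np.ndarray:
    """Multiply per-sub-pixel (R, G, B) code values of a segmentation
    mask by an optical-flow map to form a hybrid segmentation mask.

    Both inputs are assumed to be H x W x 3 float arrays with code
    values in [0.0, 1.0]; this convention is an assumption, not part
    of the claims.
    """
    assert seg_mask.shape == flow_map.shape
    # Element-wise product: sub-pixels outside the segmented objects
    # (mask value 0) are suppressed; those inside are weighted by flow.
    return seg_mask * flow_map

# Illustrative 1x2-pixel "frame": the mask keeps the left pixel only.
seg = np.array([[[1.0, 1.0, 1.0], [0.0, 0.0, 0.0]]])
flow = np.array([[[0.5, 0.25, 1.0], [0.9, 0.9, 0.9]]])
hybrid = combine_masks(seg, flow)
```

With these inputs, the left pixel inherits the flow map's code values and the right pixel is zeroed out by the mask.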

Description

BACKGROUND

Field

The present disclosure relates to video segmentation, and more specifically, to generating improved segmentation masks.

Background

Conventional machine learning (ML)-based segmentation techniques using masks generate good-enough results for non-professional media content, such as low-resolution videos on social media. However, the quality of the segmentation masks may not be high enough to meet the requirements of professional image/video processing tasks. For example, edge clarity varies from frame to frame, which may cause incorrectly-inferred sub-pixels to appear in the masked area. Thus, ML-based segmentation techniques may fail to produce reliable and/or consistent segmentation masks in certain scenarios, including:

  • High-resolution images (e.g., HD, 4K);
  • Dynamic scenes, particularly those with fast-moving objects;
  • Color-graded content (e.g., low brightness, similar texture);
  • Dark scenes; and
  • Multiple target objects in the scene to be segmented independently.

Prior art includes:

  • CAO, Yang, et al.: "A novel segmentation based video-denoising method with noise level estimation", Information Sciences, vol. 281, pages 507-520, DOI: 10.1016/j.ins.2014.05.031, which discloses video denoising using optical flow following graph-based segmentation, taking gamma correction into consideration.
  • PALMERO, Cristina, et al.: "Multi-modal RGB-Depth-Thermal Human Body Segmentation", arXiv.org, Cornell University Library, vol. 118, no. 2, 13 April 2016, pages 217-239, DOI: 10.1007/s11263-016-0901-x, which discloses human segmentation from RGB-depth-thermal video data without gamma correction, using bounding boxes. Following background subtraction, regions of interest (ROIs) are extracted.

SUMMARY

The present disclosure is defined by the appended set of claims.
BRIEF DESCRIPTION OF THE DRAWINGS

The details of the present disclosure, both as to its structure and operation, may be gleaned in part by study of the appended drawings, in which like reference numerals refer to like parts, and in which:

  • FIG. 1 is a flow diagram of a method for generating a segmentation mask that is more consistent and accurate in accordance with one implementation of the present disclosure;
  • FIG. 2 shows the original frames of the input video and the gamma-adjusted frames of the second video;
  • FIG. 3 shows the gamma-adjusted frames and the generated segmentation masks;
  • FIG. 4A shows one implementation of the gamma-adjusted frames and the generated optical flow maps;
  • FIG. 4B shows the optical flow maps generated from the gamma-adjusted frames when the frames include multiple moving objects;
  • FIG. 5 shows the segmentation masks, the optical flow maps, and the generated hybrid segmentation masks;
  • FIG. 6 is a block diagram of a segmentation mask generation system in accordance with one implementation of the present disclosure;
  • FIG. 7A is a representation of a computer system and a user in accordance with an implementation of the present disclosure; and
  • FIG. 7B is a functional block diagram illustrating the computer system hosting the video application in accordance with an implementation of the present disclosure.

DETAILED DESCRIPTION

As described above, the conventional ML-based segmentation techniques may fail to produce reliable and/or consistent segmentation masks in scenarios involving high-resolution images, dynamic scenes including fast-moving objects, color-graded content, dark scenes, and/or multiple target objects in the scene. Certain implementations of the present disclosure provide methods and systems for using a hybrid segmentation and optical flow technique to generate a more consistent and accurate segmentation mask. Further, an image preprocessing technique including gamma correction is used to ensure the effectiveness of the technique.
Although the optimal input video for a segmentation mask generation process and an optical flow map generation process may be different, each input video can be adjusted (e.g., with gamma correction) to achieve improved combined performance of the stacked-layer segmentation result.

After reading the descriptions below, it will become apparent how to implement the disclosure in various implementations and applications. Although various implementations of the present disclosure will be described herein, it is understood that these implementations are presented by way of example only, and not limitation. As such, the detailed description of various implementations should not be construed to limit the scope or breadth of the present disclosure.

In one implementation, to improve the consistency and accuracy of a segmentation mask, the following steps may be taken: (a) apply the gamma function to the raw frames to fine-tune the brightness and/or contrast; (b) apply a segmentation technique to the gamma-adjusted frames to generate the segmentation mask; (c) apply an optical flow technique to the gamma-adjusted frames to generate an optical flow map; and (d) stack
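Steps (a) through (d) above can be sketched as the following pipeline. This is a minimal sketch under loud assumptions: the `segment` and `flow_maps` functions below are hypothetical placeholders (a fixed threshold and an absolute frame difference) standing in for a real ML segmentation model and a real optical flow estimator; a single gamma value is used for both branches, although the disclosure notes the optimal adjustment for each branch may differ; and "stacking" is realized as element-wise multiplication, as in the claims.

```python
import numpy as np

def gamma_adjust(frames: np.ndarray, gamma: float) -> np.ndarray:
    """Step (a): apply a gamma curve to frames with code values in [0, 1]
    to fine-tune brightness/contrast."""
    return np.power(frames, gamma)

def segment(frames: np.ndarray) -> np.ndarray:
    """Step (b) placeholder: a real implementation would run an ML
    segmentation model; a fixed threshold stands in for it here."""
    return (frames > 0.5).astype(float)

def flow_maps(frames: np.ndarray) -> np.ndarray:
    """Step (c) placeholder: a real implementation would estimate motion
    between consecutive frames; the absolute difference of consecutive
    frames stands in for the flow magnitude here."""
    return np.abs(np.diff(frames, axis=0))

def hybrid_masks(frames: np.ndarray, gamma: float = 0.8) -> np.ndarray:
    """Step (d): stack (multiply) the segmentation masks with the
    optical flow maps to form hybrid segmentation masks.

    `frames` is assumed to be a T x H x W x 3 float array in [0, 1].
    """
    adjusted = gamma_adjust(frames, gamma)
    masks = segment(adjusted)[1:]   # drop first mask to align with per-pair flow maps
    flows = flow_maps(adjusted)     # one flow map per consecutive frame pair
    return masks * flows

# Illustrative two-frame clip of 1x1-pixel frames (shape: T x H x W x 3).
frames = np.array([[[[0.0, 0.0, 0.0]]],
                   [[[1.0, 1.0, 1.0]]]])
out = hybrid_masks(frames)
```

In a full implementation, a misalignment check along the mask boundary (as in claims 6 and 8) would decide whether to repeat the pipeline with a different set of gamma parameter values.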