KR-20260065757-A - METHODS AND APPARATUS TO PROCESS VIDEO FRAME PIXEL DATA USING ARTIFICIAL INTELLIGENCE VIDEO FRAME SEGMENTATION
Abstract
The disclosed examples include a video frame segmenter circuit that generates segmentation data for first video frame pixel data, the segmentation data including metadata corresponding to a foreground region and a background region of the first video frame pixel data. The disclosed examples also include a video encoder circuit that generates a first foreground boundary region and a first background boundary region based on the segmentation data, determines a first virtual tile of the first video frame pixel data, the first virtual tile being located in the first foreground boundary region, encodes the first virtual tile into a video data bitstream without encoding the first background boundary region, and transmits the video data bitstream over a network.
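The encoder behavior described in the abstract, encoding only tiles that fall in the foreground while skipping background-only tiles, can be sketched as follows. This is a minimal illustration, not the patented implementation; the function name, tile layout, and mask format are assumptions.

```python
def select_foreground_tiles(mask, tile_size):
    """Return (tile_row, tile_col) indices of virtual tiles that overlap
    the foreground region; background-only tiles are never returned,
    mirroring encoding a virtual tile without encoding the background
    boundary region.

    `mask` is a 2-D list where truthy values mark foreground pixels,
    e.g. the output of an AI segmentation model. (Illustrative sketch.)
    """
    h, w = len(mask), len(mask[0])
    tiles = []
    for r in range(0, h, tile_size):
        for c in range(0, w, tile_size):
            # Keep the tile if any pixel inside it is foreground.
            if any(mask[y][x]
                   for y in range(r, min(r + tile_size, h))
                   for x in range(c, min(c + tile_size, w))):
                tiles.append((r // tile_size, c // tile_size))
    return tiles

# A 4x4 frame whose top-left 2x2 block is foreground:
mask = [[1, 1, 0, 0],
        [1, 1, 0, 0],
        [0, 0, 0, 0],
        [0, 0, 0, 0]]
print(select_foreground_tiles(mask, tile_size=2))  # [(0, 0)]
```

Only the single tile touching the foreground would then be passed to the encoder; the three background-only tiles are omitted from the bitstream entirely.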
Inventors
- Guruva Reddiar, Palanivel
- Boyce, Jill
- Nair, Praveen
Assignees
- Intel Corporation
Dates
- Publication Date
- 20260511
- Application Date
- 20260408
- Priority Date
- 20211217
Claims (20)
- A video coding device comprising: machine-readable instructions; and at least one processor circuit programmed, based on the machine-readable instructions, to: segment a video frame into a foreground region and a background region; generate a message comprising first bounding box data specifying first coordinates of a first bounding box associated with the foreground region, a first label assigned to the first bounding box, second bounding box data specifying second coordinates of a second bounding box associated with the background region, and a second label assigned to the second bounding box; and encode samples associated with the foreground region of the video frame and the message into a video stream.
- The video coding device of claim 1, wherein the message is a supplemental enhancement information (SEI) message.
- The video coding device of claim 2, wherein the message is an annotated regions supplemental enhancement information message.
- The video coding device of claim 1, wherein the first coordinates specify the top-left corner of the first bounding box, and the second coordinates specify the top-left corner, width, and height of the second bounding box.
- The video coding device of claim 1, wherein the samples are first samples, and at least one of the at least one processor circuit is to encode second samples associated with the background region of the video frame in the video stream.
- The video coding device of claim 1, wherein the first label identifies the first bounding box as a foreground bounding box, and the second label identifies the second bounding box as a background bounding box.
- The video coding device of claim 1, wherein the message is a first message, at least one of the at least one processor circuit is to encode, in at least one of the first message or a second message of the video stream, an identifier of a first virtual background among a plurality of virtual backgrounds, and pixel data associated with the second bounding box of the video frame is to be reconstructed based on the first virtual background.
- At least one non-transitory computer-readable storage medium comprising instructions that cause at least one processor circuit to at least: segment a video frame into a foreground region and a background region; generate a message comprising first bounding box data specifying first coordinates of a first bounding box associated with the foreground region, a first label assigned to the first bounding box, second bounding box data specifying second coordinates of a second bounding box associated with the background region, and a second label assigned to the second bounding box; and encode samples associated with the foreground region of the video frame and the message into a video stream.
- The at least one non-transitory computer-readable storage medium of claim 8, wherein the message is a supplemental enhancement information (SEI) message.
- The at least one non-transitory computer-readable storage medium of claim 9, wherein the message is an annotated regions supplemental enhancement information message.
- The at least one non-transitory computer-readable storage medium of claim 8, wherein the first coordinates specify the top-left corner of the first bounding box, and the second coordinates specify the top-left corner, width, and height of the second bounding box.
- The at least one non-transitory computer-readable storage medium of claim 8, wherein the samples are first samples, and the instructions cause at least one of the at least one processor circuit to encode second samples associated with the background region of the video frame in the video stream.
- The at least one non-transitory computer-readable storage medium of claim 8, wherein the first label identifies the first bounding box as a foreground bounding box, and the second label identifies the second bounding box as a background bounding box.
- The at least one non-transitory computer-readable storage medium of claim 8, wherein the message is a first message, the instructions cause at least one of the at least one processor circuit to encode, in at least one of the first message or a second message of the video stream, an identifier of a first virtual background among a plurality of virtual backgrounds, and pixel data associated with the second bounding box of the video frame is reconstructed based on the first virtual background.
- A system comprising: means for segmenting a video frame into a foreground region and a background region; and means for encoding a video stream, the encoding means to: generate a message comprising first bounding box data specifying first coordinates of a first bounding box associated with the foreground region, a first label assigned to the first bounding box, second bounding box data specifying second coordinates of a second bounding box associated with the background region, and a second label assigned to the second bounding box; and encode samples associated with the foreground region of the video frame and the message into the video stream.
- The system of claim 15, wherein the message is a supplemental enhancement information (SEI) message.
- The system of claim 16, wherein the message is an annotated regions supplemental enhancement information message.
- The system of claim 15, wherein the first coordinates specify the top-left corner of the first bounding box, and the second coordinates specify the top-left corner, width, and height of the second bounding box.
- The system of claim 15, wherein the samples are first samples, and the encoding means is to encode second samples associated with the background region of the video frame in the video stream.
- The system of claim 15, wherein the first label identifies the first bounding box as a foreground bounding box, and the second label identifies the second bounding box as a background bounding box.
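The message layout recited in the claims, two bounding boxes, each carrying coordinates plus a foreground or background label, can be mocked up as a plain data structure. The field names below are hypothetical and do not reflect the actual annotated regions SEI bitstream syntax; this only illustrates the information the claimed message carries.

```python
from dataclasses import dataclass

@dataclass
class BoundingBox:
    left: int    # x of top-left corner
    top: int     # y of top-left corner
    width: int
    height: int

def build_region_message(foreground_box, background_box):
    """Assemble a message with first/second bounding box data and the
    label assigned to each box, as recited in claim 1. (Field names
    are illustrative, not SEI syntax.)"""
    return {
        "regions": [
            {"box": foreground_box, "label": "foreground"},
            {"box": background_box, "label": "background"},
        ]
    }

# A foreground box around the user, and a background box covering the frame:
msg = build_region_message(BoundingBox(40, 20, 320, 440),
                           BoundingBox(0, 0, 640, 480))
print([r["label"] for r in msg["regions"]])  # ['foreground', 'background']
```

A decoder receiving such metadata alongside the encoded foreground samples could locate the background bounding box and substitute a virtual background there, as in dependent claims 7 and 14.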
Description
Methods and apparatus to process video frame pixel data using artificial intelligence video frame segmentation

The present disclosure relates generally to computing, and in particular to methods and apparatus for processing video frame pixel data using artificial intelligence video frame segmentation. Electronic user devices, such as laptops or mobile devices, include a camera that captures images. The camera can be used during a video call in which images of the device's user are transmitted to other user devices.

FIG. 1 is a flowchart illustrating a conventional process of segmenting and encoding video frames during a video conference. FIG. 2 is a flowchart illustrating a conventional process of decoding and displaying video frames during a video conference. FIG. 3 illustrates an exemplary user device structured to communicate in an end-to-end video conference using artificial intelligence video frame segmentation in accordance with the teachings of the present disclosure. FIG. 4 is a block diagram of an exemplary implementation of the video frame segmenter circuit and video encoder circuit of the user device of FIG. 3. FIG. 5 is a block diagram of an exemplary implementation of the video decoder circuit and video display controller circuit of the user device of FIG. 3. FIG. 6 illustrates an exemplary buffer pool in which read and write operations are performed at different times by the video decoder circuit and video display controller circuit of FIGS. 3 and 5. FIG. 7 is an exemplary representation of video frames that are decoded by the exemplary video decoder circuit of FIGS. 3 and 5 in low-power mode and displayed on a user device in accordance with the teachings of the present disclosure.

FIG. 8 is a flowchart illustrating exemplary machine-readable instructions and/or operations that may be executed and/or instantiated by an exemplary processor circuit to implement the video frame segmenter circuit and video encoder circuit of FIG. 3 and/or FIG. 4 to segment, encode, and transmit video frames. FIG. 9 is a flowchart illustrating exemplary machine-readable instructions and/or operations that may be executed and/or instantiated by an exemplary processor circuit to implement the video decoder circuit and video display controller circuit of FIG. 3 and/or FIG. 5 to decode, render, and display video frames. FIG. 10 is a flowchart illustrating exemplary machine-readable instructions and/or operations that may be executed and/or instantiated by an exemplary processor circuit to implement read and write operations in a buffer pool by the video decoder circuit and video display controller circuit of FIG. 3, FIG. 5, and/or FIG. 6 to decode, store, and/or update intra-frame data in the buffer pool. FIG. 11 is a block diagram of an exemplary processing platform comprising a processor circuit structured to execute the exemplary machine-readable instructions of FIGS. 8, 9, and 10 to implement the user device of FIGS. 3 through 5 to implement artificial intelligence video frame segmentation in accordance with the teachings of the present disclosure. FIG. 12 is a block diagram of an exemplary implementation of the processor circuit of FIG. 11. FIG. 13 is a block diagram of another exemplary implementation of the processor circuit of FIG. 11. FIG. 14 is a block diagram of an exemplary software distribution platform (e.g., one or more servers) for distributing software (e.g., software corresponding to the exemplary machine-readable instructions of FIGS. 8, 9, and/or 10) to client devices associated with end users and/or consumers (e.g., for licensing, sales, and/or use), retailers (e.g., for sales, resale, licensing, and/or sub-licensing), and/or original equipment manufacturers (e.g., for inclusion in products to be distributed to other end users, such as retailers and/or direct purchase customers).
Generally, the same reference numbers will be used throughout the drawing(s) and accompanying descriptions to refer to identical or similar parts. The drawings are not drawn to scale. Instead, the thickness of layers or regions may be enlarged in the drawings. Although the drawings show layers and regions with clean lines and boundaries, some or all of these lines and/or boundaries may be idealized. In reality, boundaries and/or lines may be unobservable, blended, or irregular. Unless specifically stated otherwise, descriptors such as "first," "second," "third," etc. are used herein without imputing any meaning of priority, physical order, arrangement in a list, and/or ordering, but are used merely as labels and/or arbitrary designations to distinguish elements for ease of understanding the disclosed examples. In some examples, the descriptor "first" may be used to refer to an element in the detailed description, and the same element may be referred to by a different descriptor, such as "second" or "third," in the claims. In such cases, it should be understood that such descriptors are used merely to distinguish those elements.