
EP-4738338-A1 - LOW LATENCY VIDEO STREAMING REDUCING FRAME BUFFERS

EP 4738338 A1

Abstract

In an example, a device may include logic to receive a plurality of video frames from a video source, logic to process the plurality of video frames, logic to provide at least some of the plurality of video frames to a video sink, and logic to reduce a display latency of each of the video frames provided to the video sink by reducing a number of frames in a video pipeline comprising the device.

Inventors

  • MAMIDWAR, RAJESH
  • CHEN, XUEMIN
  • HENG, BRIAN

Assignees

  • Avago Technologies International Sales Pte. Limited

Dates

Publication Date
2026-05-06
Application Date
2025-10-28

Claims (15)

  1. A device, comprising logic to receive a plurality of encoded video frames from a video source; logic to process at least some of the plurality of encoded video frames; and logic to provide at least some of the processed video frames to a video sink; wherein processing the plurality of video frames reduces a display latency of each of the video frames provided to the video sink by reducing a number of video frames stored in one or more frame buffers in a video pipeline associated with the device.
  2. The device of claim 1, wherein the video pipeline comprises a plurality of video processing stages within the device and a plurality of frame buffers, the plurality of processing stages comprising a decoder, and the plurality of frame buffers comprising a video processing frame buffer and a display frame buffer.
  3. The device of claim 2, wherein processing the plurality of video frames comprises decoding at least some of the plurality of encoded video frames to produce a plurality of decoded video frames; and providing at least some of the processed video frames to the video sink comprises providing at least some of the decoded video frames to the video sink.
  4. The device of claim 3, wherein reducing the number of frames stored in one or more frame buffers in the video pipeline comprises identifying a plurality of available decoded video frames in the video pipeline; identifying a most recently available decoded video frame; providing the most recently available decoded video frame to the video sink; and discarding a remainder of the plurality of available decoded video frames.
  5. The device of claim 3 or 4, wherein the received plurality of video frames are encoded at a fixed frame rate imposing an order on the video frames; and providing the most recently available decoded video frame to the video sink comprises providing the most recently available decoded video frame out of the order imposed by the fixed frame rate.
  6. The device of any one of the claims 3 to 5, wherein reducing the number of frames stored in one or more buffers in the video pipeline comprises providing the plurality of decoded video frames to the video sink without storing any of the plurality of decoded video frames in the video processing frame buffer.
  7. The device of any one of the claims 3 to 6, wherein reducing a number of frames stored in one or more buffers in the video pipeline comprises providing the plurality of decoded video frames to the video sink while storing no more than a single frame in the display frame buffer.
  8. The device of any one of the claims 3 to 7, wherein providing the plurality of decoded video frames to the video sink comprises providing at least some of the plurality of decoded video frames to the video sink via a High-Definition Multimedia Interface (HDMI) connection; wherein in particular providing the plurality of decoded frames to the video sink further comprises providing at least some of the plurality of decoded video frames to the video sink via an alternate connection separate from the HDMI connection.
  9. The device of any one of the claims 3 to 8, wherein reducing the number of frames stored in one or more buffers in the video pipeline further comprises storing a plurality of reference frames in a reference buffer outside the video pipeline and separate from a display buffer of the video pipeline; wherein in particular the device further comprises logic to generate a video frame from one or more of the reference frames stored in the reference buffer.
  10. The device of any one of the claims 1 to 9, wherein reducing the number of frames stored in one or more buffers in the video pipeline further comprises allowing frame tearing when displaying the plurality of decoded video frames; wherein in particular allowing frame tearing when displaying the plurality of decoded video frames comprises selectively allowing frame tearing based on user input or on a characteristic of an application displaying the plurality of video frames.
  11. The device of any one of the claims 1 to 10, wherein providing at least some of the processed video frames to the video sink comprises decoding at least some of the plurality of encoded video frames at a rate faster than a frame rate of the plurality of encoded video frames; and providing a decoded portion of an encoded video frame to an output of the device before an entirety of the encoded video frame is fully decoded; wherein in particular the decoded portion of the encoded video frame comprises one or more macroblock units or one or more coding units; and providing the decoded portion of the encoded video frame to the output of the device reduces a display latency of the decoded portion of the encoded video frame to a number of lines specified by the one or more macroblock units or one or more coding units.
  12. The device of any one of the claims 1 to 11, wherein the device is a set-top box, a component of a set-top box, or a system on a chip (SoC); or wherein the device is a television.
  13. The device of any one of the claims 1 to 12, further comprising logic to selectively disable the logic to reduce the display latency of the video frames provided to the video sink based on an application associated with the video frames; configuration settings; or user controls.
  14. A method, comprising receiving, by a device, a plurality of video frames from a video source; processing the plurality of video frames; and providing at least some of the plurality of video frames to a video sink; wherein processing the plurality of video frames reduces a display latency of each of the video frames provided to the video sink by reducing a number of video frames in a video pipeline associated with the device.
  15. A set-top box, comprising an input interface to receive a plurality of video frames from a video source; a decoder to decode the plurality of video frames to produce a plurality of decoded video frames; an interface to provide the plurality of decoded video frames to a video sink; and logic to reduce a display latency of each of the decoded video frames provided to the video sink by reducing a number of video frames stored in one or more frame buffers in a video pipeline of the set-top box.
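The frame-selection behavior recited in claim 4 — identifying the available decoded frames, forwarding only the most recently available one, and discarding the remainder — can be sketched as follows. This is an illustrative model only; the `Frame` class, queue, and function names are assumptions for the sketch and are not part of the claims or the specification:

```python
from collections import deque


class Frame:
    """Minimal stand-in for a decoded video frame (illustrative only)."""
    def __init__(self, sequence_number):
        self.sequence_number = sequence_number


def select_latest_frame(decoded_queue):
    """Return the most recently available decoded frame and discard the
    rest, reducing the number of frames held in the pipeline's buffers."""
    if not decoded_queue:
        return None  # underflow: the caller may repeat the previous frame
    latest = decoded_queue.pop()  # newest decoded frame is at the tail
    decoded_queue.clear()         # discard the older, now-stale frames
    return latest


# Example: three decoded frames are waiting; only the newest is forwarded
# to the video sink, so no backlog accumulates in the pipeline.
queue = deque(Frame(n) for n in (7, 8, 9))
frame = select_latest_frame(queue)
print(frame.sequence_number, len(queue))  # newest frame kept, queue empty
```

Dropping the older frames trades strict frame-order playback for latency, which matches claim 5's allowance for providing frames out of the order imposed by the fixed frame rate.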

Description

Technical Field

This document relates generally to video streaming and more specifically to low latency video streaming by reducing a display latency of video frames provided to a video sink.

Background

Traditional video streaming, whether broadcast or IP-based, has historically relied on complex network infrastructure to deliver smooth video experiences to end users. This typically involves a video player decoding and sending every video and audio frame to the TV at the right time. Such traditional approaches relied on complex network architectures and extensive buffering to ensure consistent frame delivery, often resulting in higher costs due to increased bandwidth and storage requirements. To address these challenges, various video coding standards (e.g., MPEG2, AVC, HEVC, VP9, AV1) have been developed for efficient compression, but this often comes at the cost of increased computational complexity. Further, to ensure smooth video playback, buffering mechanisms have been implemented at different stages of a video pipeline, including cloud servers, networks, and video players (e.g., set-top boxes or over-the-top (OTT) clients). For instance, popular streaming services like YouTube and Netflix typically buffer 10-40 seconds of video frames.

Emerging applications such as cloud gaming, video conferencing, and virtual reality demand low-latency performance and are driving the development of new network, encoder, and system standards. Cloud gaming has gained popularity as internet speeds have improved. In this model, the gaming server resides in the cloud, while the local game controller sends commands to the cloud server. The game is rendered on the cloud server, and the encoded video is transmitted to the user's device (e.g., TV or set-top box) over the video pipeline for display. Further, video conferencing applications have become essential for remote work, online education, and telehealth.
Low latency is crucial for a seamless experience. Many users are turning to OTT devices or set-top boxes for larger screen displays, as opposed to traditional conference equipment. Furthermore, virtual reality experiences often require high-resolution video and low latency. Cloud-based rendering can provide the necessary processing power, while local devices can focus on displaying the rendered content.

For applications such as gaming, video conferencing, virtual reality, and the like, end-to-end latency is more important than smooth video. Traditional set-top boxes and OTT devices, designed for smooth streaming video, often rely on fixed frame rates and buffering at multiple stages of the video pipeline. In other words, the video pipeline typically includes frame buffers at various stages, such as the encoder, decoder, video processing, high-definition multimedia interface (HDMI) input, and within the TV itself. This buffering ensures smooth playback but can also contribute to latency, as each stage buffers multiple frames to ensure that complete frame data is available at the input before feeding the output stage. Traditional pipelines often require the transmission of entire frames, regardless of the number of pixels that have changed. This approach can introduce significant latency as well, which is undesirable for latency-sensitive applications like cloud gaming, video conferencing, and virtual reality. In other examples, at the HDMI interface, the frame rate is typically fixed and cannot be adjusted dynamically during playback. For example, in a 60 FPS configuration, one complete frame of data must be sent every 1/60th of a second. If the next frame is not ready in time (i.e., an underflow condition), the previous frame is repeated, resulting in potential visual artifacts. This fixed frame rate requirement can limit the ability to reduce latency in applications that demand real-time responsiveness.

Brief Description of the Drawings

Fig. 1 is a block diagram illustrating components of a device that can reduce latency in a video pipeline in accordance with some embodiments.
Fig. 2 is a functional block diagram illustrating a device with a video pipeline in accordance with some embodiments.
Fig. 3 is a frame timing diagram for a video pipeline in accordance with some embodiments.
Fig. 4 is a flow diagram illustrating an exemplary method for reducing latency in a video pipeline in accordance with some embodiments.
Fig. 5 is a flow diagram illustrating an exemplary method for reducing latency in a video pipeline in accordance with some embodiments.
Fig. 6 is a functional block diagram illustrating a device comprising a video pipeline with reduced latency in accordance with some embodiments.
Fig. 7 is a functional block diagram illustrating a device with a video pipeline having reduced latency in accordance with some embodiments.
Fig. 8 is a functional block diagram illustrating a device with a video pipeline having reduced latency in accordance with some embodiments.
Fig. 9 is a flow diag
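The latency arithmetic implied by the background can be made concrete: at a fixed 60 FPS, each pipeline stage that holds one full frame adds roughly 1/60th of a second (about 16.7 ms) of display latency, so buffering accumulates stage by stage. The following is a rough accounting sketch; the stage names and per-stage frame counts are illustrative assumptions, not values taken from the specification:

```python
FRAME_RATE = 60.0                      # fixed HDMI frame rate, frames per second
FRAME_PERIOD_MS = 1000.0 / FRAME_RATE  # one frame time, about 16.7 ms

# Frames buffered at each stage of a traditional pipeline (illustrative values).
stages = {
    "decoder": 2,
    "video_processing": 1,
    "display_buffer": 1,
    "tv_input": 1,
}

total_buffered = sum(stages.values())
latency_ms = total_buffered * FRAME_PERIOD_MS
print(f"{total_buffered} buffered frames -> {latency_ms:.1f} ms of buffering latency")
```

Under these assumed values, five buffered frames contribute roughly 83 ms of latency; reducing the pipeline to a single buffered display frame, as the claims describe, would cut that contribution to roughly one frame period.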