US-12621439-B2 - Systems and methods for encoding and decoding video with memory-efficient prediction mode selection

Abstract

A method of memory-efficient prediction mode selection includes receiving, by an encoder, a coded bitstream including a current frame, determining, by the encoder, costs of a first prediction mode and a second prediction mode, wherein determining further comprises determining, for the first prediction mode, a first bit cost and a first memory cost and determining, for the second prediction mode, a second bit cost and a second memory cost, selecting, by the encoder, a current prediction mode of the first prediction mode and the second prediction mode as a function of the first bit cost, first memory cost, second bit cost, and second memory cost, and encoding, by the encoder, the current frame using the current prediction mode. The prediction mode may be informed by at least one parameter received from the decoder.

Inventors

Hari Kalva
Borivoje Furht
Velibor Adzic

Assignees

OP SOLUTIONS, LLC

Dates

Publication Date: 2026-05-05
Application Date: 2023-12-29

Claims (11)

1. A method of encoding features of a video signal in a system for machine video consumption comprising: receiving an input video signal; performing feature extraction on the received input video signal to generate a feature signal comprising at least one feature in a current video frame; determining costs of a first prediction mode and a second prediction mode, wherein determining further comprises: determining, for the first prediction mode, a first bit cost and a first memory cost associated with encoding the feature signal; and determining, for the second prediction mode, a second bit cost and a second memory cost associated with encoding the feature signal; selecting a current prediction mode of the first prediction mode and the second prediction mode as a function of the first bit cost, first memory cost, second bit cost, and second memory cost; and encoding the feature signal of the current frame using the current prediction mode.
2. The encoding method of claim 1, wherein determining the first memory cost further comprises retrieving a stored value representing the first memory cost.
3. The encoding method of claim 1, wherein determining the second memory cost further comprises retrieving a stored value representing the second memory cost.
4. The encoding method of claim 1, wherein determining the first memory cost further comprises receiving processor architecture data from a decoder, and determining the first memory cost from the processor architecture data.
5. The encoding method of claim 1, wherein determining the second memory cost further comprises receiving processor architecture data from a decoder, and determining the second memory cost from the processor architecture data.
6. The encoding method of claim 1, wherein selecting further comprises determining a threshold value based on the first memory cost and the second memory cost, and selecting as a function of the threshold value.
7. The encoding method of claim 6, wherein selecting further comprises comparing a difference between the first bit cost and the second bit cost to the threshold value.
8. The encoding method of claim 1, wherein the first prediction mode is vertical intra prediction and the second prediction mode is horizontal intra prediction.
9. The encoding method of claim 8, wherein selecting a current prediction mode includes determining a difference in bit cost between vertical intra prediction and horizontal intra prediction and selecting vertical intra prediction when the difference is less than a predetermined threshold value.
10. The encoding method of claim 10, wherein feature extraction is performed using a partial convolutional neural network trained using a machine model for a machine task performed at a decoder site.
11. The encoding method of claim 10, wherein encoding the feature signal is performed using a VVC compliant encoding method.
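
The threshold-based selection recited in claims 6, 7 and 9 can be illustrated with a minimal Python sketch. It is not taken from the patent and is not the claimed implementation: the names `ModeCost`, `STORED_MEMORY_COST`, `threshold_from_memory` and `select_mode`, the numeric cost values, and the `bits_per_memory_unit` exchange rate are all hypothetical. In particular, deriving the threshold as a scaled memory-cost gap is only one assumed way of basing a threshold on the two memory costs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModeCost:
    """Costs of one candidate prediction mode (illustrative units)."""
    name: str
    bit_cost: float     # estimated bits needed to encode with this mode
    memory_cost: float  # estimated memory cost, arbitrary units


# Hypothetical stored memory costs, standing in for the "stored value"
# of claims 2 and 3. The numbers are made up for illustration.
STORED_MEMORY_COST = {"vertical": 1.0, "horizontal": 4.0}


def threshold_from_memory(first: ModeCost, second: ModeCost,
                          bits_per_memory_unit: float = 2.0) -> float:
    """Assumed rule: the threshold is the memory-cost gap between the two
    modes, converted to bits by a hypothetical exchange rate."""
    return (second.memory_cost - first.memory_cost) * bits_per_memory_unit


def select_mode(first: ModeCost, second: ModeCost,
                bits_per_memory_unit: float = 2.0) -> ModeCost:
    """Pick the first mode when its extra bit cost over the second mode
    is less than the memory-derived threshold, else pick the second."""
    threshold = threshold_from_memory(first, second, bits_per_memory_unit)
    bit_difference = first.bit_cost - second.bit_cost
    return first if bit_difference < threshold else second


if __name__ == "__main__":
    vertical = ModeCost("vertical", 105.0, STORED_MEMORY_COST["vertical"])
    horizontal = ModeCost("horizontal", 100.0, STORED_MEMORY_COST["horizontal"])
    # Threshold is (4.0 - 1.0) * 2.0 = 6.0 bits; vertical costs 5.0 more bits.
    print(select_mode(vertical, horizontal).name)  # prints: vertical
```

With these made-up numbers, vertical intra prediction is chosen even though it costs 5 more bits, because the difference stays below the 6-bit threshold. Raising the vertical bit cost to 120 would push the difference past the threshold and select horizontal intra prediction instead.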

Description

CROSS-REFERENCE TO RELATED APPLICATIONS

This application is a continuation of international application PCT/US22/35647 filed on Jun. 30, 2022, and entitled SYSTEMS AND METHODS FOR ENCODING AND DECODING VIDEO WITH MEMORY-EFFICIENT PREDICTION MODE SELECTION, which application claims the benefit of priority of U.S. Provisional Application, Ser. No. 63/218,732, filed on Jul. 6, 2021, and entitled SYSTEMS AND METHODS FOR MEMORY-EFFICIENT PREDICTION MODE SELECTION, each of which is incorporated herein by reference in its entirety.

FIELD OF THE INVENTION

The present invention generally relates to the field of video encoding and decoding. In particular, the present invention is directed to systems and methods for organizing and searching a video database.

BACKGROUND

A video codec can include an electronic circuit or software that compresses or decompresses digital video. It can convert uncompressed video to a compressed format or vice versa. In the context of video compression, a device that compresses video (and/or performs some function thereof) can typically be called an encoder, and a device that decompresses video (and/or performs some function thereof) can be called a decoder.

A format of the compressed data can conform to a standard video compression specification. The compression can be lossy in that the compressed video lacks some information present in the original video. A consequence of this can include that decompressed video can have lower quality than the original uncompressed video because there is insufficient information to accurately reconstruct the original video.

There can be complex relationships between the video quality, the amount of data used to represent the video (e.g., determined by the bit rate), the complexity of the encoding and decoding algorithms, sensitivity to data losses and errors, ease of editing, random access, end-to-end delay (e.g., latency), and the like.
Motion compensation can include an approach to predict a video frame or a portion thereof given a reference frame, such as previous and/or future frames, by accounting for motion of the camera and/or objects in the video. It can be employed in the encoding and decoding of video data for video compression, for example in the encoding and decoding using the Moving Picture Experts Group (MPEG)'s advanced video coding (AVC) standard (also referred to as H.264). Motion compensation can describe a picture in terms of the transformation of a reference picture to the current picture. The reference picture can be previous in time when compared to the current picture, or from the future when compared to the current picture. When images can be accurately synthesized from previously transmitted and/or stored images, compression efficiency can be improved.

SUMMARY OF THE DISCLOSURE

A video encoder is provided that is configured with memory-efficient prediction mode selection. The encoder includes a processor programmed to perform the encoding operations. The encoder is configured to receive an input video including a current frame. The encoder determines costs of a first prediction mode and a second prediction mode. Preferably, the determining operation can further comprise determining, for the first prediction mode, a first bit cost and a first memory cost and determining, for the second prediction mode, a second bit cost and a second memory cost. The encoder selects a current prediction mode of the first prediction mode and the second prediction mode as a function of the first bit cost, first memory cost, second bit cost, and second memory cost. The current frame can be encoded using the current prediction mode.

In some embodiments, determining the first memory cost can further comprise retrieving a stored value representing the first memory cost.
Similarly, in certain embodiments, determining the second memory cost further comprises retrieving a stored value representing the second memory cost. In still other embodiments, determining the first memory cost may further comprise receiving processor architecture data from a decoder and determining the first memory cost from the processor architecture data. In further embodiments, determining the second memory cost can further comprise receiving processor architecture data from a decoder, and determining the second memory cost from the processor architecture data. In some encoder embodiments, selecting can further comprise determining a threshold value based on the first memory cost and the second memory cost, and selecting as a function of the threshold value. The selecting operation may further comprise comparing a difference between the first bit cost and the second bit cost to the threshold value. In one encoder embodiment, the first prediction mode is vertical intra prediction and the second prediction mode is horizontal intra prediction. A video decoder is provided that is configured to operate with an encoder having a memory-efficient prediction mode. The decoder can be configured t