US-20260129215-A1 - FEATURE PICTURE HEADER FOR FEATURE COMPRESSION BITSTREAMS

US 20260129215 A1

Abstract

Systems, methods, and instrumentalities for a feature picture header (FPH) for feature compression bitstreams. An example video decoding device may receive a first FPH network abstraction layer (NAL) unit that indicates a first feature picture parameter set (FPPS) associated with a first frame. The device may decode the first frame. The device may reconstruct a first set of features associated with the first frame based on the first FPPS. The device may receive a second FPH NAL unit that indicates a second FPPS associated with a second frame. The device may decode the second frame. The device may reconstruct a second set of features associated with the second frame based on the second FPPS.

Inventors

  • Fabien Racape
  • Hyomin Choi
  • Ahmed Hamza
  • Gurdeep Bhullar

Assignees

  • INTERDIGITAL VC HOLDINGS, INC.

Dates

Publication Date
May 7, 2026
Application Date
Dec. 20, 2024

Claims (20)

  1. A device for video decoding, the device comprising: a processor configured to: receive a feature picture header (FPH) network abstraction layer (NAL) unit, wherein the FPH NAL unit comprises an identifier of a frame; associate the frame with the FPH NAL unit based on the identifier of the frame; and reconstruct a set of features associated with the frame based on the FPH NAL unit.
  2. The device of claim 1, wherein the identifier of the frame comprises an indication of a picture order count associated with the frame.
  3. The device of claim 1, wherein the frame is a first frame, the FPH NAL unit further indicates whether temporal up-sampling is enabled, and the processor is further configured to, on a condition that temporal up-sampling is enabled, automatically generate a second frame that is temporally prior to the first frame.
  4. The device of claim 1, wherein the FPH NAL unit further indicates whether temporal up-sampling is enabled, and the processor is further configured to, on a condition that temporal up-sampling is disabled, determine to skip temporal up-sampling.
  5. The device of claim 1, wherein the processor is further configured to use the reconstructed set of features as an input to at least a part of a neural network.
  6. The device of claim 1, wherein the FPH NAL unit is a first FPH NAL unit, the identifier of the frame is a first identifier of a first frame, the set of features is a first set of features, and the processor is further configured to: receive a second FPH NAL unit, wherein the second FPH NAL unit comprises a second identifier of a second frame; associate the second frame with the second FPH NAL unit based on the second identifier of the second frame; and reconstruct a second set of features associated with the second frame based on the second FPH NAL unit.
  7. The device of claim 1, wherein the FPH NAL unit further indicates a feature picture parameter set (FPPS) and the processor being configured to reconstruct the set of features associated with the frame based on the FPH NAL unit comprises the processor being configured to reconstruct the set of features associated with the frame based on the FPPS.
  8. The device of claim 1, wherein the FPH NAL unit further indicates a feature picture parameter set (FPPS), the FPPS comprises a parameter for reconstructing intermediate data associated with the frame, and the processor being configured to reconstruct the set of features associated with the frame based on the FPH NAL unit comprises the processor being configured to reconstruct the set of features based on the parameter for reconstructing the intermediate data associated with the frame.
  9. The device of claim 1, wherein the frame is a decoded video frame.
  10. A device for video encoding, the device comprising: a processor configured to: associate a feature picture header (FPH) network abstraction layer (NAL) unit with a frame; encode the FPH NAL unit, wherein the FPH NAL unit comprises an identifier of the frame; and include the encoded FPH NAL unit in video data.
  11. The device of claim 10, wherein the identifier of the frame comprises an indication of a picture order count associated with the frame.
  12. The device of claim 10, wherein the frame is a first frame, the processor is further configured to determine that temporal up-sampling is enabled, and the FPH NAL unit further comprises an indication to use temporal up-sampling to automatically generate a second frame that is temporally prior to the first frame.
  13. The device of claim 10, wherein the processor is further configured to determine that temporal up-sampling is disabled, and the FPH NAL unit further comprises an indication to skip temporal up-sampling.
  14. The device of claim 10, wherein the processor is further configured to receive, from an output of at least a part of a neural network, intermediate data associated with feature reconstruction, and wherein the FPH NAL unit indicates the intermediate data.
  15. The device of claim 10, wherein the FPH NAL unit is a first FPH NAL unit, the identifier of the frame is a first identifier of a first frame, and the processor is further configured to: associate a second FPH NAL unit with a second frame; encode the second FPH NAL unit, wherein the second FPH NAL unit comprises a second identifier of the second frame; and include the encoded second FPH NAL unit in the video data.
  16. The device of claim 10, wherein the processor is further configured to determine a feature picture parameter set (FPPS) associated with the frame, wherein the FPH NAL unit further indicates the FPPS.
  17. The device of claim 10, wherein the processor is further configured to receive, from at least a part of a neural network, features associated with the frame, and the processor is further configured to determine a feature picture parameter set (FPPS) based on the features associated with the frame, wherein the FPH NAL unit further indicates the FPPS.
  18. The device of claim 10, wherein the frame is an encoded video frame.
  19. A method for video decoding, the method comprising: receiving a feature picture header (FPH) network abstraction layer (NAL) unit, wherein the FPH NAL unit comprises an identifier of a frame; associating the frame with the FPH NAL unit based on the identifier of the frame; and reconstructing a set of features associated with the frame based on the FPH NAL unit.
  20. The method of claim 19, wherein the identifier of the frame comprises an indication of a picture order count associated with the frame.
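The decoder-side flow recited in claims 1-9 can be sketched as a minimal model: an FPH NAL unit carries a frame identifier (a picture order count) and a reference to an FPPS, the decoder associates the header with an already-decoded frame via that identifier, and reconstructs features using the FPPS parameters, optionally generating a temporally prior frame when temporal up-sampling is enabled. All names below (`Fpps`, `FphNalUnit`, `FeatureDecoder`, `poc`, `fpps_id`, `temporal_upsampling_enabled`) are illustrative assumptions; the claims do not specify a concrete bitstream syntax or API.

```python
# Hypothetical sketch of the decoder-side flow in claims 1-9.
# Field and class names are illustrative, not taken from the claims.
from dataclasses import dataclass, field

@dataclass
class Fpps:
    """Feature picture parameter set: parameters for feature reconstruction."""
    fpps_id: int
    recon_params: dict = field(default_factory=dict)

@dataclass
class FphNalUnit:
    """Feature picture header NAL unit."""
    poc: int                                  # picture order count identifying the frame
    fpps_id: int                              # FPPS referenced by this header
    temporal_upsampling_enabled: bool = False

class FeatureDecoder:
    def __init__(self, fpps_list):
        self.fpps_table = {p.fpps_id: p for p in fpps_list}
        self.frames = {}                      # poc -> decoded frame data

    def decode_frame(self, poc, frame_bits):
        # Stand-in for the actual video decoding of the frame.
        self.frames[poc] = f"decoded({frame_bits})"

    def reconstruct_features(self, fph):
        # Associate the header with a frame via its identifier (POC),
        # then reconstruct features using the referenced FPPS.
        frame = self.frames[fph.poc]
        fpps = self.fpps_table[fph.fpps_id]
        features = {"frame": frame, "params": fpps.recon_params}
        if fph.temporal_upsampling_enabled:
            # Generate an extra frame temporally prior to this one (claim 3).
            features["upsampled_prior_frame"] = f"interp_before_poc_{fph.poc}"
        return features

dec = FeatureDecoder([Fpps(0, {"scale": 0.5})])
dec.decode_frame(7, "bits")
feats = dec.reconstruct_features(FphNalUnit(poc=7, fpps_id=0))
print(feats["params"])  # {'scale': 0.5}
```

The encoder side (claims 10-18) mirrors this: it builds an `FphNalUnit` for each frame, setting the POC, the FPPS reference, and the temporal up-sampling flag, and writes the unit into the video data.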

Description

CROSS-REFERENCE TO RELATED APPLICATIONS

This application claims the benefit of U.S. Provisional Application No. 63/715,339, filed Nov. 1, 2024, the contents of which are incorporated by reference herein.

BACKGROUND

The present application is related to video coding systems that may be used to compress digital video signals, e.g., to reduce the storage and/or transmission bandwidth needed for such signals. Video coding systems may include, for example, block-based, wavelet-based, and/or object-based systems.

BRIEF SUMMARY

Systems, methods, and instrumentalities are disclosed herein for a feature picture header for feature compression bitstreams. An example device for video decoding may receive a first feature picture header (FPH) network abstraction layer (NAL) unit that indicates a first feature picture parameter set (FPPS) associated with a first frame. The device may decode the first frame. The device may reconstruct a first set of features associated with the first frame based on the first FPPS. The device may receive a second FPH NAL unit that indicates a second FPPS associated with a second frame. The device may decode the second frame. The device may reconstruct a second set of features associated with the second frame based on the second FPPS. The first FPH may further indicate whether temporal up-sampling is enabled. On a condition that temporal up-sampling is enabled, the device may automatically generate a third frame that is temporally prior to the first frame. The device may output the decoded first frame, the decoded second frame, the reconstructed first set of features, and the reconstructed second set of features. The first FPH NAL unit may include an indication of a first picture order count associated with the first frame. The second FPH NAL unit may include an indication of a second picture order count associated with the second frame. The device may identify the first frame based on the first picture order count. The device may identify the second frame based on the second picture order count.

An example device for video encoding may determine a first FPPS associated with a first frame. The device may encode a first FPH NAL unit that comprises an indication of the first frame and the determined first FPPS. The device may determine a second FPPS associated with a second frame. The device may encode a second FPH NAL unit that comprises an indication of the second frame and the determined second FPPS. The FPH NAL unit may include an indication of whether temporal up-sampling is enabled. The first FPPS may include a first parameter for reconstructing intermediate data associated with the first frame. The second FPPS may include a second parameter for reconstructing intermediate data associated with the second frame. The first FPH NAL unit may include an indication of a first picture order count associated with the first frame. The second FPH NAL unit may include an indication of a second picture order count associated with the second frame.

BRIEF DESCRIPTION OF THE DRAWINGS

The following detailed description will be better understood when read in conjunction with the appended drawings, in which there are shown examples of one or more of the multiple embodiments of the present disclosure. It should be understood, however, that the embodiments described herein are not limited to the precise arrangements and instrumentalities shown in the drawings.

FIG. 1 shows an example system according to one or more embodiments of the present disclosure.
FIG. 2 shows an example video encoder according to one or more embodiments of the present disclosure.
FIG. 3 shows an example video decoder according to one or more embodiments of the present disclosure.
FIG. 4 shows an example of split computing in which the inference of a neural network (NN) may be shared between multiple devices (e.g., two remote devices).
FIG. 5 illustrates an example of decomposing feature compression and/or decompression.
FIG. 6 illustrates an example of a feature coding for machine (FCM) decoder.
FIG. 7 illustrates an example bitstream structure for FCM using a video bitstream augmented with supplemental enhancement information (SEI) messages.
FIG. 8 illustrates an example bitstream organization for FCM that includes sets of network abstraction layer (NAL) units.
FIG. 9 illustrates an example FCM bitstream structure.
FIG. 10 illustrates an example FCM bitstream structure.
FIG. 11 illustrates an example structure augmented with SEI coding for vision model (VM) related information.
FIG. 12 illustrates an example structure for video coding for machine (VCM).

DETAILED DESCRIPTION

In describing the various embodiments of the present disclosure, certain terminology is used herein for convenience only and should not be considered as limiting such embodiments. In the drawings, the same reference numerals are employed for designating the same elements throughout the several figures and the present description. Referring to the drawings, there is sh