EP-4736459-A1 - MULTI-LAYER SPLIT POINTS OUTPUT INFORMATION


Abstract

A WTRU may perform inference processing on video data to generate intermediate data. The WTRU may determine from the intermediate data a plurality of tuples and may generate metadata from the plurality of tuples. The metadata may comprise an encoding type that may indicate an encoding algorithm. The metadata may further comprise a length indicating the length of the metadata. The metadata may also comprise an indication of the number of tuples that are comprised in the metadata. The metadata may further comprise the plurality of tuples. Each tuple may comprise a respective layer identifier and tensor shape information. The device may generate a bitstream from the intermediate data and may transmit the bitstream and the metadata to another device which may perform split inference processing on the generated bitstream using the metadata.
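The abstract names the metadata fields (an encoding type, a length, a tuple count, and the serially listed tuples of layer identifier and tensor shape) but does not fix their binary encoding. The following Python sketch shows one possible serialization under purely illustrative assumptions: big-endian byte order, a uint8 encoding type, a uint32 length counting the whole metadata record including its header, a uint16 tuple count, and, per tuple, a uint16 layer identifier, a uint8 number of dimensions, and one uint32 per dimension size. The function name `encode_metadata` and every field width are hypothetical and are not taken from the source.

```python
import struct
from typing import List, Sequence, Tuple

# Hypothetical layout (the source does not fix field widths or byte order):
#   encoding_type : uint8
#   length        : uint32, total metadata size in bytes, header included
#   num_tuples    : uint16
#   then, per tuple: layer_id uint16, num_dims uint8, num_dims x uint32 sizes
HEADER_FMT = ">BIH"  # big-endian, no padding: encoding_type, length, num_tuples
HEADER_SIZE = struct.calcsize(HEADER_FMT)  # 1 + 4 + 2 = 7 bytes

LayerTuple = Tuple[int, Sequence[int]]  # (layer_id, tensor shape)


def encode_metadata(encoding_type: int, tuples: List[LayerTuple]) -> bytes:
    """Serialize (layer_id, shape) tuples one after the other behind a header."""
    body = bytearray()
    for layer_id, shape in tuples:
        # Tensor shape information: number of dimensions, then each dimension size.
        body += struct.pack(">HB", layer_id, len(shape))
        body += struct.pack(f">{len(shape)}I", *shape)
    # The length field covers the header and all tuples.
    length = HEADER_SIZE + len(body)
    header = struct.pack(HEADER_FMT, encoding_type, length, len(tuples))
    return header + bytes(body)
```

For example, `encode_metadata(1, [(3, (1, 256, 68, 120)), (5, (1, 256, 34, 60))])`, with two hypothetical split-point tensors, produces 7 header bytes plus 19 bytes per tuple, i.e. 45 bytes, and its length field is set to 45.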

Inventors

ONNO, STEPHANE
RACAPE, FABIEN
QUINQUIS, CYRIL
FILOCHE, THIERRY

Assignees

InterDigital VC Holdings, Inc.

Dates

Publication Date: 2026-05-06
Application Date: 2024-07-17

Claims (1)

What is Claimed:

1. A wireless transmit and receive unit (WTRU), comprising: a processor configured to: perform split inference processing on video data to generate intermediate data; determine from the intermediate data a plurality of tensors, each tensor associated with a layer; and send the intermediate data and metadata associated with the intermediate data, the metadata comprising, for each of the plurality of tensors, a layer identifier and tensor shape information.
2. The WTRU of claim 1, wherein the metadata comprises a plurality of tuples, each tuple associated with one of the plurality of tensors and comprising a respective layer identifier and tensor shape information.
3. The WTRU of claim 2, wherein the metadata further comprises: an encoding type, the encoding type indicating an encoding algorithm; and a length, the length indicating a length of the metadata.
4. The WTRU of claim 2, wherein the plurality of tuples are listed serially in the metadata.
5. The WTRU of claim 1, wherein the video data comprises a first portion of video data and a second portion of video data; and wherein the processor configured to perform split inference processing on the video data is further configured to perform split inference processing on the first portion of video data.
6. The WTRU of claim 1, wherein the tensor shape information comprises tensor dimension information.
7. The WTRU of claim 6, wherein the tensor dimension information comprises a number of dimensions and, for each dimension, a dimension size.
8. A method comprising: performing split inference processing on video data to generate intermediate data; determining from the intermediate data a plurality of tensors, each tensor associated with a layer; and sending the intermediate data and metadata associated with the intermediate data, the metadata comprising, for each of the plurality of tensors, a layer identifier and tensor shape information.
9. The method of claim 8, wherein the metadata comprises a plurality of tuples, each tuple associated with one of the plurality of tensors and comprising a respective layer identifier and tensor shape information.
10. The method of claim 9, wherein the metadata further comprises: an encoding type, the encoding type indicating an encoding algorithm; and a length, the length indicating a length of the metadata.
11. The method of claim 9, wherein the plurality of tuples are listed serially in the metadata.
12. The method of claim 8, wherein the video data comprises a first portion of video data and a second portion of video data; and wherein performing split inference processing on the video data further comprises performing split inference processing on the first portion of the video data.
13. The method of claim 8, wherein the tensor shape information comprises tensor dimension information.
14. The method of claim 13, wherein the tensor dimension information comprises a number of dimensions and, for each dimension, a dimension size.
15. A network node comprising: a processor configured to: perform split inference processing on video data to generate intermediate data; determine from the intermediate data a plurality of tensors, each tensor associated with a layer; and send the intermediate data and metadata associated with the intermediate data, the metadata comprising, for each of the plurality of tensors, a layer identifier and tensor shape information.
16. The network node of claim 15, wherein the metadata comprises a plurality of tuples, each tuple associated with one of the plurality of tensors and comprising a respective layer identifier and tensor shape information.
17. The network node of claim 16, wherein the metadata further comprises: an encoding type, the encoding type indicating an encoding algorithm; and a length, the length indicating a length of the metadata.
18. The network node of claim 16, wherein the plurality of tuples are listed serially in the metadata.
19. The network node of claim 15, wherein the video data comprises a first portion of video data and a second portion of video data; and wherein the processor configured to perform split inference processing on the video data is further configured to perform split inference processing on the first portion of the video data.
20. The network node of claim 15, wherein the tensor shape information comprises tensor dimension information; and wherein the tensor dimension information comprises a number of dimensions and, for each dimension, a dimension size.
21. A network node, comprising: a processor configured to: receive intermediate data and metadata associated with the intermediate data, the intermediate data associated with a split inference and the metadata comprising, for each of a plurality of tensors, a layer identifier and tensor shape information; and perform split inference processing using the intermediate data and the metadata.
22. The network node of claim 21, wherein the processor configured to perform split inference processing is further configured to reconstruct the plurality of tensors.
23. The network node of claim 21, wherein the metadata comprises a plurality of tuples, each tuple associated with one of the plurality of tensors and comprising a respective layer identifier and tensor shape information.
24. The network node of claim 23, wherein the metadata further comprises: an encoding type, the encoding type indicating an encoding algorithm; and a length, the length indicating a length of the metadata.
25. The network node of claim 23, wherein the plurality of tuples are listed serially in the metadata.
26. The network node of claim 21, wherein the processor configured to perform split inference processing on the video data is further configured to perform split inference processing on a second portion of video data comprising a first portion and the second portion.
27. The network node of claim 21, wherein the tensor shape information comprises tensor dimension information.
28. A method comprising: receiving intermediate data and metadata associated with the intermediate data, the intermediate data associated with a split inference and the metadata comprising, for each of a plurality of tensors, a layer identifier and tensor shape information; and performing split inference processing using the intermediate data and the metadata.
29. The method of claim 28, wherein performing split inference processing further comprises reconstructing the plurality of tensors.
30. The method of claim 28, wherein the metadata comprises a plurality of tuples, each tuple associated with one of the plurality of tensors and comprising a respective layer identifier and tensor shape information.
31. The method of claim 30, wherein the metadata further comprises: an encoding type, the encoding type indicating an encoding algorithm; and a length, the length indicating a length of the metadata.
32. The method of claim 30, wherein the plurality of tuples are listed serially in the metadata.
33. The method of claim 29, wherein performing split inference processing on the video data further comprises performing split inference processing on a second portion of video data comprising a first portion and the second portion.
34. The method of claim 28, wherein the tensor shape information comprises tensor dimension information.
35. A wireless transmit and receive unit (WTRU), comprising: a processor configured to: receive intermediate data and metadata associated with the intermediate data, the intermediate data associated with a split inference and the metadata comprising, for each of a plurality of tensors, a layer identifier and tensor shape information; and perform split inference processing using the intermediate data and the metadata.
36. The WTRU of claim 35, wherein the processor configured to perform split inference processing is further configured to reconstruct the plurality of tensors.
37. The WTRU of claim 35, wherein the metadata comprises a plurality of tuples, each tuple associated with one of the plurality of tensors and comprising a respective layer identifier and tensor shape information.
38. The WTRU of claim 37, wherein the metadata further comprises: an encoding type, the encoding type indicating an encoding algorithm; and a length, the length indicating a length of the metadata.
39. The WTRU of claim 37, wherein the plurality of tuples are listed serially in the metadata.
40. The WTRU of claim 35, wherein the processor configured to perform split inference processing on the video data is further configured to perform split inference processing on a second portion of video data comprising a first portion and the second portion.
41. The WTRU of claim 35, wherein the tensor shape information comprises tensor dimension information.
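Claims 21 to 41 cover the receiving side, and claims 22, 29, and 36 add reconstructing the plurality of tensors from the intermediate data using the metadata. A minimal Python sketch of that step follows. The binary layout it parses is hypothetical and not specified by the claims (big-endian uint8 encoding type, uint32 total length, uint16 tuple count, then per tuple a uint16 layer identifier, a uint8 number of dimensions, and one uint32 per dimension size). It also assumes the bitstream has already been decoded into a flat sequence of values whose tensors appear in the same order as the tuples. The names `parse_metadata` and `split_tensors` are illustrative only.

```python
import math
import struct
from typing import Dict, List, Sequence, Tuple

# Hypothetical layout, chosen for illustration only.
HEADER_FMT = ">BIH"  # encoding_type (uint8), length (uint32), num_tuples (uint16)
HEADER_SIZE = struct.calcsize(HEADER_FMT)  # 7 bytes

Shape = Tuple[int, ...]


def parse_metadata(meta: bytes) -> Tuple[int, List[Tuple[int, Shape]]]:
    """Return (encoding_type, [(layer_id, shape), ...]), reading tuples serially."""
    encoding_type, length, count = struct.unpack_from(HEADER_FMT, meta, 0)
    if length != len(meta):
        raise ValueError("metadata length field does not match buffer size")
    offset = HEADER_SIZE
    tuples = []
    for _ in range(count):
        layer_id, ndims = struct.unpack_from(">HB", meta, offset)
        offset += 3
        shape = struct.unpack_from(f">{ndims}I", meta, offset)  # one size per dimension
        offset += 4 * ndims
        tuples.append((layer_id, shape))
    if offset != length:
        raise ValueError("tuples do not fill the declared metadata length")
    return encoding_type, tuples


def split_tensors(values: Sequence[float],
                  tuples: Sequence[Tuple[int, Shape]]) -> Dict[int, Tuple[Shape, List[float]]]:
    """Slice a flat decoded payload into one (shape, flat data) entry per layer."""
    tensors = {}
    pos = 0
    for layer_id, shape in tuples:
        n = math.prod(shape)  # number of elements in this tensor
        if pos + n > len(values):
            raise ValueError(f"payload too short for layer {layer_id}")
        tensors[layer_id] = (shape, list(values[pos:pos + n]))
        pos += n
    if pos != len(values):
        raise ValueError("payload has trailing values not described by metadata")
    return tensors
```

Checking the length field before interpreting the tuples is one way a receiver might detect truncated or padded metadata. A real decoder would also dispatch on the encoding type to select the matching bitstream decoder.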

Description

MULTI-LAYER SPLIT POINTS OUTPUT INFORMATION

CROSS-REFERENCE TO RELATED APPLICATIONS

[0001] This application claims the benefit of U.S. provisional patent application number 63/527,741, filed July 19, 2023, the contents of which are hereby incorporated by reference herein in their entirety.

BACKGROUND

[0002] Video coding systems may be used to compress digital video signals, e.g., to reduce the storage and/or transmission bandwidth needed for such signals. Video coding systems may include, for example, block-based, wavelet-based, and/or object-based systems.

SUMMARY

[0003] Systems, methods, and instrumentalities are described herein for providing metadata for use in split inference processing.

[0004] A device, which may be, for example, a wireless transmit and receive unit (WTRU) or a network node, may be configured to receive input data which may comprise, for example, video data, 3D video data, point cloud data, etc. The device may be configured to perform inference processing on the input video data to generate intermediate data. The video data may comprise a first portion of video data and a second portion of video data. The inference processing may be split inference processing and the device may be configured to perform a first part of a split inference model on the first portion of the video data to generate the intermediate data.

[0005] The device may be configured to determine, from the intermediate data, a plurality of tensors, wherein each tensor is associated with a layer.

[0006] The device may be configured to send the intermediate data and metadata associated with the intermediate data. The metadata may comprise a plurality of tuples, wherein each tuple may be associated with one of the plurality of tensors and may comprise a respective layer identifier and tensor shape information. The plurality of tuples may be listed serially in the metadata. The metadata may further comprise an encoding type that indicates an encoding algorithm, and a length that indicates a length of the metadata.

[0007] The metadata may comprise, for each of the plurality of tensors, a layer identifier and tensor shape information. The tensor shape information may comprise tensor dimension information which may include a number of dimensions and, for each dimension, a dimension size.

[0008] The device may send the intermediate data and the metadata associated with the intermediate data to a second device. The second device, which may be, for example, a network node or WTRU, may be configured to receive the intermediate data and metadata and to use the intermediate data and the metadata to reconstruct intermediate tensor data and to perform a second part of the split inference model on the second portion of the video data.

[0009] A device, which may be, for example, a wireless transmit and receive unit (WTRU) or a network node, may be configured to receive input data which may comprise, for example, video data, 3D video data, point cloud data, etc. The device may be configured to perform inference processing on the input video data and communicate intermediate data, which may comprise the output of the first part of the split inference, for inference processing of the second part of the split model at a second device.

[0010] The device may be configured to determine from the intermediate data a plurality of tuples where each of the plurality of tuples may comprise layer information and tensor shape information. The device may generate metadata from the plurality of tuples. The device may generate metadata comprising an encoding type. The encoding type may indicate an encoding algorithm or encoding structure that may have been used for encoding the data. The metadata may further comprise a length indicating the length of the metadata. The metadata may also comprise an indication of the number of tuples that are comprised in the metadata. The metadata may further comprise the plurality of tuples. Each tuple may comprise a respective layer identifier and tensor shape information. The tensor shape information may comprise, for example, tensor dimension information which may comprise a number of dimensions, and for each dimension, a size of the dimension. The tuples may be serially arranged one after the other in the metadata.

[0011] The device may generate a bitstream from the intermediate data. The bitstream may comprise the intermediate data. The device may transmit the bitstream and the metadata to another device which may perform split inference processing on the generated bitstream using the metadata.

BRIEF DESCRIPTION OF THE DRAWINGS

[0012] FIG. 1A is a system diagram illustrating an example communications system in which one or more disclosed embodiments may be implemented.

[0013] FIG. 1B is a system diagram illustrating an example wireless transmit/receive unit (WTRU) that may be used within the communications system illustrated in FIG. 1A according to an embodiment.

[0014] FIG. 1C is a system diagram illustrating an e