
US-12621519-B2 - Carriage and signaling of neural network representations


Abstract

A method is provided for defining a metadata box of a neural network representation (NNR) item data, wherein the NNR item data comprises an NNR bitstream; and defining an association between the NNR item data and an NNR configuration by using a configuration item property, wherein the NNR configuration item property comprises information about stored NNR item data. Corresponding apparatuses and computer program products are also provided.
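
The association described in the abstract can be pictured with a minimal sketch. It assumes a simplified in-memory model of a metadata box; the class names, the `description` field, and the helper methods are hypothetical and are not taken from the patent. The 1-based property index mirrors the convention used for item property associations in image file formats, which is also an assumption about how an implementation might choose to do it.

```python
from dataclasses import dataclass, field


@dataclass
class NNRConfigProperty:
    # Hypothetical field: the abstract only says the property carries
    # "information about stored NNR item data".
    description: str


@dataclass
class MetaBox:
    # item_ID -> raw NNR bitstream bytes
    items: dict = field(default_factory=dict)
    # Shared list of configuration properties
    properties: list = field(default_factory=list)
    # item_ID -> list of 1-based indices into `properties`
    associations: dict = field(default_factory=dict)

    def add_item(self, item_id: int, bitstream: bytes) -> None:
        """Store an NNR bitstream as item data."""
        self.items[item_id] = bitstream

    def associate(self, item_id: int, prop: NNRConfigProperty) -> int:
        """Attach a configuration property to an item; return its 1-based index."""
        if item_id not in self.items:
            raise KeyError(f"unknown item_ID {item_id}")
        self.properties.append(prop)
        index = len(self.properties)
        self.associations.setdefault(item_id, []).append(index)
        return index

    def config_for(self, item_id: int) -> list:
        """Return the configuration properties associated with an item."""
        return [self.properties[i - 1] for i in self.associations.get(item_id, [])]


meta = MetaBox()
meta.add_item(1, b"\x00\x01\x02")
meta.associate(1, NNRConfigProperty(description="example configuration"))
print(meta.config_for(1))
```

Keeping the properties in a shared list and referring to them by index lets several items point at the same configuration without duplicating it.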

Inventors

  • Emre Aksu
  • Miska Hannuksela
  • Francesco Cricrì
  • Hamed Rezazadegan Tavakoli

Assignees

  • NOKIA TECHNOLOGIES OY

Dates

Publication Date
20260505
Application Date
20241212

Claims (20)

  1. A method comprising: defining a media file comprising one or more neural network representation (NNR) tracks, wherein the one or more NNR tracks comprises one or more NNR units and metadata associated with the one or more NNR units and a media handler, wherein the media handler represents presence of NNR media in the media file, and wherein the NNR media is a media type referring to NNR; and using the media file for at least one of storing or streaming NNR.
  2. The method of claim 1, wherein the media file further comprises: an NNR sample entry for describing one or more NNRs in the one or more NNR tracks; and an NNR decoder configuration record, wherein the NNR decoder configuration record comprises information for initializing an NNR decoder information corresponding to a neural network model.
  3. The method of claim 1 further comprising defining an NNR track sample comprising the one or more NNR units, wherein the NNR track sample is one of independently decodable or dependent of previous sample for decoding.
  4. The method of claim 3, wherein the NNR track sample is linked to an NNR parameter set via one or more of a sample entry, a sample group, or a non-timed NNR item data.
  5. The method of claim 3, wherein the media file further comprises one or more media tracks, wherein the one or more media tracks are linked to the one or more NNR tracks, and wherein the NNR track sample is used by the one or more media tracks for decoding media samples of the one or more media tracks, and wherein a track referencing mechanism is used to associate the one or more NNR tracks with the one or more media tracks for decoding the media samples of the one or more media tracks.
  6. An apparatus comprising at least one processor; and at least one memory storing instructions that, when executed by the at least one processor, cause the apparatus at least to: defining a media file comprising one or more neural network representation (NNR) tracks, wherein the one or more NNR tracks comprises one or more NNR units and metadata associated with the one or more NNR units and a media handler, wherein the media handler represents presence of NNR media in the media file, and wherein the NNR media is a media type referring to NNR; and using the media file for at least one of storing or streaming NNR.
  7. The apparatus of claim 6, wherein the media file further comprises: an NNR sample entry for describing one or more NNRs in the one or more NNR tracks; and an NNR decoder configuration record, wherein the NNR decoder configuration record comprises information for initializing an NNR decoder information corresponding to a neural network model.
  8. The apparatus of claim 6, wherein the apparatus is further caused to perform: defining an NNR track sample comprising the one or more NNR units, and wherein the NNR track sample is one of independently decodable or dependent of previous sample for decoding.
  9. The apparatus of claim 8, wherein the NNR track sample is linked to an NNR parameter set via one or more of a sample entry, a sample group, or a non-timed NNR item data.
  10. The apparatus of claim 8, wherein the media file further comprises one or more media tracks, wherein the one or more media tracks are linked to the one or more NNR tracks, and wherein the NNR track sample is used by the one or more media tracks for decoding media samples of the one or more media tracks, and wherein a track referencing mechanism is used to associate the one or more NNR tracks with the one or more media tracks for decoding the media samples of the one or more media tracks.
  11. A method comprising: receiving, one or more tracks and a media handler of the one or more tracks, within a media file; decoding the media handler of the one or more tracks to obtain a decoded media handler; when the decoded media handler indicates presence of a neural network representation (NNR) media in the media file, wherein the NNR media is a media type referring to NNR, treating the one or more tracks as one or more NNR tracks comprising one or more NNR units and metadata associated with the one or more NNR units; and decoding the one or more tracks to obtain an uncompressed neural network.
  12. The method of claim 11, wherein the media file further comprises: an NNR sample entry for describing one or more NNRs in the one or more NNR tracks; and an NNR decoder configuration record, wherein the NNR decoder configuration record comprises information for initializing an NNR decoder information corresponding to a neural network model.
  13. The method of claim 11, wherein the one or more NNR units are comprised in an NNR track sample and wherein the NNR track sample is one of independently decodable or dependent of previous sample for decoding.
  14. The method of claim 13, wherein the NNR track sample is linked to an NNR parameter set via one or more of a sample entry, a sample group, or a non-timed NNR item data.
  15. The method of claim 13, wherein the media file further comprises one or more media tracks, wherein the one or more media tracks are linked to the one or more NNR tracks, and wherein the NNR track sample is used by the one or more media tracks for decoding media samples of the one or more media tracks, and wherein a track referencing mechanism is used to associate the one or more NNR tracks with the one or more media tracks for decoding the media samples of the one or more media tracks.
  16. An apparatus comprising at least one processor; and at least one memory storing instructions that, when executed by the at least one processor, cause the apparatus at least to perform: receiving, one or more tracks and a media handler of the one or more tracks, within a media file; decoding the media handler of the one or more tracks to obtain a decoded media handler; when the decoded media handler indicates presence of a neural network representation (NNR) media in the media file, wherein the NNR media is a media type referring to NNR, treating the one or more tracks as one or more NNR tracks comprising one or more NNR units and metadata associated with the one or more NNR units; and decoding the one or more tracks to obtain an uncompressed neural network.
  17. The apparatus of claim 16, wherein the media file further comprises: an NNR sample entry for describing one or more NNRs in the one or more NNR tracks; and an NNR decoder configuration record, wherein the NNR decoder configuration record comprises information for initializing an NNR decoder information corresponding to a neural network model.
  18. The apparatus of claim 16, wherein the one or more NNR units are comprised in an NNR track sample and wherein the NNR track sample is one of independently decodable or dependent of previous sample for decoding.
  19. The apparatus of claim 18, wherein the NNR track sample is linked to an NNR parameter set via one or more of a sample entry, a sample group, or a non-timed NNR item data.
  20. The apparatus of claim 18, wherein the media file further comprises one or more media tracks, wherein the one or more media tracks are linked to the one or more NNR tracks, and wherein the NNR track sample is used by the one or more media tracks for decoding media samples of the one or more media tracks, and wherein a track referencing mechanism is used to associate the one or more NNR tracks with the one or more media tracks for decoding the media samples of the one or more media tracks.
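
The receiving side recited in claims 11 and 16 (read the media handler, then decide whether to treat the track as an NNR track) can be illustrated with a minimal sketch. It assumes the media file follows the ISO base media file format, where a handler box (`hdlr`) carries a four-character handler type. The value `nnrm` used below is a hypothetical placeholder, since the claims do not state the actual four-character code, and the function names are illustrative only.

```python
import struct

# Hypothetical four-character code: the claims do not give the real value.
NNR_HANDLER = b"nnrm"


def build_hdlr_box(handler_type: bytes, name: str = "") -> bytes:
    """Serialize a handler box ('hdlr') in ISO base media file format layout."""
    if len(handler_type) != 4:
        raise ValueError("handler_type must be exactly 4 bytes")
    payload = (
        b"\x00\x00\x00\x00"                 # version (1 byte) + flags (3 bytes)
        + b"\x00\x00\x00\x00"               # pre_defined
        + handler_type                      # handler_type (4 bytes)
        + b"\x00" * 12                      # reserved (3 x 32 bits)
        + name.encode("utf-8") + b"\x00"    # null-terminated name
    )
    # Box header: 32-bit big-endian size (header included) + 4-byte type
    return struct.pack(">I4s", 8 + len(payload), b"hdlr") + payload


def read_handler_type(box: bytes) -> bytes:
    """Return the 4-byte handler type from a serialized 'hdlr' box."""
    if len(box) < 20:
        raise ValueError("buffer too short for a hdlr box")
    size, box_type = struct.unpack_from(">I4s", box, 0)
    if box_type != b"hdlr" or size < 20 or size > len(box):
        raise ValueError("not a complete hdlr box")
    # Skip 8-byte box header, 4-byte version/flags, 4-byte pre_defined
    return box[16:20]


def is_nnr_track(hdlr_box: bytes) -> bool:
    """Decide whether a track should be treated as an NNR track."""
    return read_handler_type(hdlr_box) == NNR_HANDLER


print(is_nnr_track(build_hdlr_box(NNR_HANDLER, "NNR media")))
print(is_nnr_track(build_hdlr_box(b"vide", "Video")))
```

A real reader would first walk the box hierarchy down to each track's media box to locate the handler box, and only then hand the track's samples to an NNR decoder; that traversal is omitted here.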

Description

RELATED APPLICATION

The present application is a divisional of U.S. patent application Ser. No. 18/247,631, filed on Mar. 31, 2023, which claims priority to International Application No. PCT/IB2021/059123, filed on Oct. 5, 2021, which claims benefit of U.S. Provisional Application No. 63/091,087, filed on Oct. 13, 2020, the content of which is incorporated herein by reference in its entirety.

TECHNICAL FIELD

The examples and non-limiting embodiments relate generally to multimedia transport and neural networks, and more particularly, to carriage and signaling of neural network representations.

BACKGROUND

It is known to provide standardized formats for exchange of neural network representations.

SUMMARY

An example method includes defining a metadata box for a neural network representation (NNR) item data, wherein the NNR item data comprises an NNR bitstream; and defining an association between the NNR item data and an NNR configuration by using a configuration item property, wherein the NNR configuration item property comprises information about stored NNR item data.

Another example method includes defining a media file comprising one or more neural network representation (NNR) tracks, wherein the one or more NNR tracks comprises one or more NNR units and metadata associated with the one or more NNR units; one or more media tracks, wherein the one or more media tracks are linked to the one or more NNR tracks; an NNR media, wherein NNR is a media type referring to an NNR; and a media handler, wherein the media handler represents presence of the NNR media in the media file; and using the media file for at least one of storing or streaming NNR.

An example apparatus comprises at least one processor; and at least one memory including computer program code; the at least one memory and the computer program code configured to, with the at least one processor, cause the apparatus at least to perform: define a metadata box for a neural network representation (NNR) item data, wherein the NNR item data comprises an NNR bitstream; and define an association between the NNR item data and an NNR configuration by using a configuration item property, wherein the NNR configuration item property comprises information about stored NNR item data.

Another example apparatus includes at least one processor; and at least one memory including computer program code; the at least one memory and the computer program code configured to, with the at least one processor, cause the apparatus at least to perform: define a media file comprising one or more neural network representation (NNR) tracks, wherein the one or more NNR tracks comprises one or more NNR units and metadata associated with the one or more NNR units; one or more media tracks, wherein the one or more media tracks are linked to the one or more NNR tracks; an NNR media, wherein NNR is a media type referring to an NNR; and a media handler, wherein the media handler represents presence of the NNR media in the media file; and use the media file for at least one of storing or streaming NNR.

BRIEF DESCRIPTION OF THE DRAWINGS

The foregoing aspects and other features are explained in the following description, taken in connection with the accompanying drawings, wherein:

FIG. 1 shows schematically an electronic device employing embodiments of the examples described herein.

FIG. 2 shows schematically a user equipment suitable for employing embodiments of the examples described herein.

FIG. 3 further shows schematically electronic devices employing embodiments of the examples described herein connected using wireless and wired network connections.

FIG. 4 shows schematically a block chart of an encoder on a general level.

FIG. 5 is a block diagram showing the interface between an encoder and a decoder in accordance with the examples described herein.

FIG. 6 illustrates a system configured to support streaming of media data from a source to a client device.

FIG. 7 is a block diagram of an apparatus that may be specifically configured in accordance with an example embodiment.

FIG. 8 illustrates an example structure of an NNR bitstream.

FIG. 9 illustrates an example association between a neural network representation (NNR) item data and configuration of the NNR item data.

FIG. 10 illustrates an example association between the NNR item data, configuration of the NNR item data, topology of the NNR item data, and data associated with NNR item data quantization, in accordance with an embodiment.

FIG. 11 illustrates an example embodiment in which NNR data item is configured into one or more NNR unit items.

FIG. 12 illustrates an example embodiment in which NNR unit header information is configured into an item property.

FIG. 13 illustrates a media file for storing and streaming NNR data, in accordance with an embodiment.

FIG. 14 is an example apparatus configured to implement mechanisms to link a high level syntax to an MPEG media storage and carriage format for a compressed representation of n