US-12627811-B2 - Method, an apparatus and a computer program product for video coding


Abstract

The embodiments relate to a method for encoding, comprising: receiving a representation (1005) of input media to be encoded; encoding the representation (1005) to generate encoded bitstreams to be delivered to a decoder, and using encoder's decoder-side neural network (en-DSNN) for decoding the encoded representation and/or post-processing the decoded representation; adapting (1030) the en-DSNN based at least on the representation or a signal derived from the representation and on an output of the en-DSNN or a signal derived from an output of the en-DSNN, thus obtaining a weight-update (1040) as a result; performing one or more iterations of compressing and decompressing the weight-update (1040) by using a weight-update codec; selecting values of one or more configuration parameters of the weight-update codec and using the selected values for a final compression of the weight-update (1040); and signaling the compressed weight update to a decoder.

Inventors

MARIA CLAUDIA SANTAMARIA GOMEZ
Francesco Cricrì
Ruiying YANG
Honglei Zhang
Hamed Rezazadegan Tavakoli
Miska Matias Hannuksela

Assignees

NOKIA TECHNOLOGIES OY

Dates

Publication Date: 2026-05-12
Application Date: 2023-02-13
Priority Date: 2022-04-04

Claims (4)

1. An apparatus comprising: at least one processor; and at least one memory including instructions; wherein the at least one memory and the instructions are configured to, with the at least one processor, cause the apparatus at least to: receive a representation of input media to be encoded; encode the representation to generate encoded bitstreams to be delivered to a decoder; use decoder-side neural network of an encoder (en-DSNN) to decode the encoded representation and/or post-processing the decoded representation; adapt the en-DSNN based at least on the representation or a signal derived from the representation and on an output of the en-DSNN or a signal derived from an output of the en-DSNN to obtain a weight-update; perform one or more iterations of compressing and decompressing the weight-update by using a weight-update codec, wherein for each iteration the apparatus is further caused to: compress and/or decompress the weight-update based at least on tested values for one or more configuration parameters of a weight-update codec; use the decompressed weight-update for updating the en-DSNN; use the updated en-DSNN for decoding the encoded representation and/or for post-processing the decoded representation obtaining a reconstructed or processed representation; compute a quality of the reconstructed or processed representation based at least on a quality metric computed based at least on the reconstructed or processed representation and on a ground-truth representation; compute a score measuring rate-distortion performance for each of the tested values of one or more configuration parameters of the weight-update codec; and select the tested values that yield the highest score; use the selected tested values for a final compression of the weight-update; and signal the compressed weight update to the decoder.
2. The apparatus according to claim 1, wherein the score is derived based on at least one of the following: weight-update reconstruction quality; difference between bitrates incurred when weight-update is not used and not signaled to the decoder compared to when weight-update is used and signaled to the decoder, at same or substantially same reconstructed or processed representation quality; difference between a reconstructed or processed representation quality incurred when weight-update is used and signaled to the decoder compared to when weight-update is not used and not signaled to the decoder, at same or substantially same bitrate; computed slope or an approximation of a slope of a line segment passing through a first rate-quality point representing the case when weight-update is used and signaled to the decoder and a second rate-quality point representing the case when weight-update is not used and is not signaled to the decoder; value of a rate-distortion Lagrangian function; or spatial, temporal or spatiotemporal portion of the reconstructed representation.
3. A method for encoding comprising: receiving a representation of input media to be encoded; encoding the representation to generate encoded bitstreams to be delivered to a decoder; using decoder-side neural network of an encoder (en-DSNN) to decode the encoded representation and/or post-processing the decoded representation; adapting the en-DSNN based at least on the representation or a signal derived from the representation and on an output of the en-DSNN or a signal derived from an output of the en-DSNN, to obtain a weight-update; performing one or more iterations of compressing and decompressing the weight-update by using a weight-update codec, wherein for each iteration the method comprises: compressing and/or decompressing the weight-update based at least on tested values for one or more configuration parameters of a weight-update codec; using the decompressed weight-update for updating the en-DSNN; using the updated en-DSNN for decoding the encoded representation and/or for post-processing the decoded representation obtaining a reconstructed or processed representation; computing a quality of the reconstructed or processed representation based at least on a quality metric computed based at least on the reconstructed or processed representation and on a ground-truth representation; computing a score measuring rate-distortion performance for each of the tested values of one or more configuration parameters of the weight-update codec; and selecting the tested values that yield the highest score; using the selected tested values for a final compression of the weight-update; and signaling the compressed weight update to the decoder.
4. The method according to claim 3, wherein the score is derived based on at least one of the following: weight-update reconstruction quality; difference between bitrates incurred when weight-update is not used and not signaled to the decoder compared to when weight-update is used and signaled to the decoder, at same or substantially same reconstructed or processed representation quality; difference between a reconstructed or processed representation quality incurred when weight-update is used and signaled to the decoder compared to when weight-update is not used and not signaled to the decoder, at same or substantially same bitrate; computed slope or an approximation of a slope of a line segment passing through a first rate-quality point representing the case when weight-update is used and signaled to the decoder and a second rate-quality point representing the case when weight-update is not used and is not signaled to the decoder; value of a rate-distortion Lagrangian function; and spatial, temporal or spatiotemporal portion of the reconstructed representation.
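
Illustrative sketch (not part of the claims). The per-iteration search recited in claims 1 and 3, with two of the score options listed in claims 2 and 4 (a rate-distortion Lagrangian and a rate-quality slope), might look roughly like the following Python. Everything concrete here is an assumption made for illustration: the uniform quantizer standing in for the weight-update codec, its step size as the only tested configuration parameter, the two-parameter gain/bias filter standing in for the en-DSNN, the additive weight-update, the crude bit-count proxy, and all function names.

```python
def compress(weight_update, step):
    """Toy weight-update codec (assumed): uniform quantization with step size `step`."""
    return [round(w / step) for w in weight_update]


def decompress(symbols, step):
    return [s * step for s in symbols]


def rate_bits(symbols):
    """Crude rate proxy (assumed): 1 bit per zero symbol, else 2 bits plus magnitude bits."""
    return sum(1 if s == 0 else 2 + abs(s).bit_length() for s in symbols)


def en_dsnn(weights, decoded):
    """Toy stand-in for the en-DSNN post-filter: one gain and one bias."""
    gain, bias = weights
    return [gain * x + bias for x in decoded]


def mse(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def make_lagrangian_score(lam):
    """Score = -(D + lam * R), negated so that the highest score is the best trade-off."""
    def score(distortion, bits, baseline_distortion):
        return -(distortion + lam * bits)
    return score


def slope_score(distortion, bits, baseline_distortion):
    """Quality gain per signaled bit, relative to sending no weight-update (0 bits)."""
    return (baseline_distortion - distortion) / bits


def select_codec_config(weights, weight_update, decoded, ground_truth,
                        tested_steps, score_fn):
    # Quality without any weight-update, used by slope-style scores.
    baseline = mse(en_dsnn(weights, decoded), ground_truth)
    best = None
    for step in tested_steps:
        symbols = compress(weight_update, step)          # compress with tested value
        restored = decompress(symbols, step)             # decompress
        updated = [w + d for w, d in zip(weights, restored)]  # update the en-DSNN
        distortion = mse(en_dsnn(updated, decoded), ground_truth)  # quality vs ground truth
        score = score_fn(distortion, rate_bits(symbols), baseline)
        if best is None or score > best["score"]:
            best = {"step": step, "symbols": symbols, "score": score}
    return best  # best["symbols"] is the final compressed weight-update


if __name__ == "__main__":
    choice = select_codec_config(
        weights=[1.0, 0.0],
        weight_update=[0.111, 0.0],
        decoded=[0.9, 1.8, 2.7, 3.6],
        ground_truth=[1.0, 2.0, 3.0, 4.0],
        tested_steps=[0.01, 0.1, 1.0],
        score_fn=make_lagrangian_score(0.001),
    )
    print(choice["step"], choice["symbols"])
```

Passing the score as a function keeps the search loop unchanged when a different score option is used; a real encoder would replace the toy quantizer and filter with its actual weight-update codec and neural network.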

Description

RELATED APPLICATION

This application claims priority to PCT Patent Application No. PCT/FI2023/050084, filed 13 Feb. 2023, which claims priority from Finland Application No. 20225285, filed on 4 Apr. 2022, which is incorporated herein by reference in its entirety.

The project leading to this application has received funding from the ECSEL Joint Undertaking (JU) under grant agreement No 876019. The JU receives support from the European Union's Horizon 2020 research and innovation programme and Germany, Netherlands, Austria, Romania, France, Sweden, Cyprus, Greece, Lithuania, Portugal, Italy, Finland, Turkey.

TECHNICAL FIELD

The present solution generally relates to video encoding and coding.

BACKGROUND

One of the elements in image and video compression is to compress data while maintaining the quality to satisfy human perceptual ability. However, in recent development of machine learning, machines can replace humans when analyzing data for example in order to detect events and/or objects in video/image. Thus, when decoded image data is consumed by machines, the quality of the compression can be different from the human approved quality. Therefore, a concept Video Coding for Machines (VCM) has been provided.

SUMMARY

The scope of protection sought for various embodiments of the invention is set out by the independent claims. The embodiments and features, if any, described in this specification that do not fall under the scope of the independent claims are to be interpreted as examples useful for understanding various embodiments of the invention.

Various aspects include a method, an apparatus and a computer readable medium comprising a computer program stored therein, which are characterized by what is stated in the independent claims. Various embodiments are disclosed in the dependent claims.
According to a first aspect, there is provided an apparatus for encoding comprising means for receiving a representation of input media to be encoded; means for encoding the representation to generate encoded bitstreams to be delivered to a decoder, and means for using encoder's decoder-side neural network (en-DSNN) for decoding the encoded representation and/or post-processing the decoded representation; means for adapting the en-DSNN based at least on the representation or a signal derived from the representation and on an output of the en-DSNN or a signal derived from an output of the en-DSNN, thus obtaining a weight-update as a result; means for performing one or more iterations of compressing and decompressing the weight-update by using a weight-update codec; means for selecting values of one or more configuration parameters of the weight-update codec and using the selected values for a final compression of the weight-update; and means for signaling the compressed weight update to a decoder.

According to a second aspect, there is provided an apparatus for decoding, comprising means for receiving an encoded bitstream; means for obtaining a compressed weight-update signal; means for decompressing the compressed weight-update signal; means for adapting a decoder-side neural network (DSNN) based at least on the decompressed weight-update signal; means for decompressing the bitstream and/or for post-processing the decompressed bitstream to generate a representation of an output media, based at least on the adapted decoder-side neural network.
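
As a minimal sketch of the decoder-side flow of the second aspect, the Python below decompresses a received weight-update, adapts the DSNN with it, and post-processes already-decoded media. The details are assumptions for illustration only: the payload layout (a step size plus integer symbols from a uniform quantizer), the additive form of the weight-update, the gain/bias filter standing in for the DSNN, and all names.

```python
def decompress_weight_update(payload):
    """Invert a toy uniform quantizer (assumed codec).

    `payload` is a hypothetical container holding the quantization step
    and the integer symbols of the compressed weight-update.
    """
    step = payload["step"]
    return [s * step for s in payload["symbols"]]


def adapt_dsnn(weights, weight_update):
    # Assumption: the weight-update is added element-wise to the current weights.
    return [w + d for w, d in zip(weights, weight_update)]


def post_process(weights, decoded_media):
    """Toy stand-in for the DSNN post-filter: one gain and one bias."""
    gain, bias = weights
    return [gain * x + bias for x in decoded_media]


def decode(decoded_media, dsnn_weights, payload):
    """Decoder-side flow: decompress the update, adapt the DSNN, post-process."""
    update = decompress_weight_update(payload)
    adapted = adapt_dsnn(dsnn_weights, update)
    return post_process(adapted, decoded_media)


if __name__ == "__main__":
    # `decoded_media` stands in for the output of the conventional decoding step.
    out = decode(
        decoded_media=[0.9, 1.8, 2.7, 3.6],
        dsnn_weights=[1.0, 0.0],
        payload={"step": 0.1, "symbols": [1, 0]},
    )
    print(out)
```

The decoder only needs the compressed weight-update and the codec configuration used for the final compression; the search over tested values happens entirely at the encoder.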
According to a third aspect, there is provided a method for encoding, comprising: receiving a representation of input media to be encoded; encoding the representation to generate encoded bitstreams to be delivered to a decoder, and using encoder's decoder-side neural network (en-DSNN) for decoding the encoded representation and/or post-processing the decoded representation; adapting the en-DSNN based at least on the representation or a signal derived from the representation and on an output of the en-DSNN or a signal derived from an output of the en-DSNN, thus obtaining a weight-update as a result; performing one or more iterations of compressing and decompressing the weight-update by using a weight-update codec; selecting values of one or more configuration parameters of the weight-update codec and using the selected values for a final compression of the weight-update; and signaling the compressed weight update to a decoder.

According to a fourth aspect, there is provided a method for decoding, comprising: receiving an encoded bitstream; obtaining a compressed weight-update signal; decompressing the compressed weight-update signal; adapting a decoder-side neural network (DSNN) based at least on the decompressed weight-update signal; decompressing the bitstream and/or post-processing the decompressed bitstream to generate a representation of an output media, based at least on the adapted DSNN.

According to a fifth aspect, there is provided an apparatus comprising at least one processor, memory including computer program code, the memory and the computer program code configured to, wi