
CN-122029815-A - Progressive warp layer with guided optical flow for INR-based video compression

CN 122029815 A

Abstract

Systems, methods, and tools are disclosed for performing INR-based video compression, for example, with a progressive warp layer having guided optical flow. A video decoding device may obtain an Implicit Neural Representation (INR) function that has been trained on a first image of a scene. For example, the INR function may have been previously calculated by optimizing a loss function and may be valid for the entire image or for a portion of the image (e.g., a tile, a superpixel, and/or any connected or disconnected set of pixels). The video decoding device may obtain a warp layer associated with the INR function. The warp layer may have been trained via optical flow to indicate a displacement field of a second image of the scene. The video decoding device may decode the second image based on the displacement field.
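
The publication does not specify a concrete network architecture, so the following is only a minimal sketch, assuming a SIREN-style coordinate MLP as the INR of the first image and a small MLP as the warp layer that maps coordinates to a displacement vector. All class and variable names are illustrative, not taken from the patent; the decoded second image is obtained by querying the first-image INR at coordinates shifted by the predicted displacement field.

```python
# Minimal sketch of an INR plus warp layer (illustrative assumptions throughout).
import torch
import torch.nn as nn

class SineLayer(nn.Module):
    """Coordinate-MLP layer with sinusoidal activation (SIREN-style)."""
    def __init__(self, in_f, out_f, w0=30.0):
        super().__init__()
        self.linear = nn.Linear(in_f, out_f)
        self.w0 = w0

    def forward(self, x):
        return torch.sin(self.w0 * self.linear(x))

class INR(nn.Module):
    """Maps pixel coordinates (x, y) in [-1, 1]^2 to RGB values of the first image."""
    def __init__(self, hidden=256):
        super().__init__()
        self.net = nn.Sequential(SineLayer(2, hidden), SineLayer(hidden, hidden), nn.Linear(hidden, 3))

    def forward(self, coords):
        return self.net(coords)

class WarpLayer(nn.Module):
    """Maps coordinates to a 2-D displacement vector (the displacement field)."""
    def __init__(self, hidden=64):
        super().__init__()
        self.net = nn.Sequential(SineLayer(2, hidden), nn.Linear(hidden, 2))

    def forward(self, coords):
        return self.net(coords)

# Decoding the second image: evaluate the first-image INR at coordinates shifted
# by the predicted displacement field (backward warping).
inr, warp = INR(), WarpLayer()
coords = torch.rand(1024, 2) * 2 - 1           # sampled pixel coordinates
second_image_rgb = inr(coords + warp(coords))  # warped query of the first-image INR
```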

Inventors

  • Lionel Oisel
  • B. B. Damodalan
  • Anne Lambert
  • F. Schnitzel

Assignees

  • InterDigital CE Patent Holdings

Dates

Publication Date
2026-05-12
Application Date
2024-10-11
Priority Date
2023-10-12

Claims (20)

  1. A video encoding apparatus comprising a processor configured to: obtain an Implicit Neural Representation (INR) function, wherein the INR function is trained on a first image portion of a scene; obtain a warp layer associated with the INR function, wherein the warp layer is trained using optical flow to indicate a displacement field of a second image portion of the scene; determine parameters associated with the warp layer; and transmit the parameters associated with the warp layer in a bitstream.
  2. The video encoding apparatus of claim 1, wherein the first image portion belongs to a first image and the second image portion belongs to the first image or a second image.
  3. The video encoding apparatus of claim 1, wherein the processor being configured to obtain the warp layer comprises the processor being configured to obtain the warp layer using coordinates associated with the first image portion as input and to map the coordinates to a vector indicative of the displacement field.
  4. The video encoding apparatus of claim 1, wherein the processor is further configured to determine an approximation of the displacement field based on an accuracy threshold.
  5. The video encoding apparatus of claim 4, wherein the processor being configured to determine the approximation of the displacement field comprises the processor being configured to increase a number of layers associated with the warp layer if the approximation of the displacement field is less than the accuracy threshold.
  6. The video encoding apparatus of claim 1, wherein the parameters associated with the warp layer comprise at least one of a number of warp layers, a dimension associated with each of the warp layers, a set of weights associated with the warp layers, or a set of updated parameters.
  7. The video encoding apparatus of claim 1, wherein the processor is further configured to: encode the second image portion using the displacement field; and transmit the encoded second image portion.
  8. The video encoding apparatus of claim 1, wherein the processor is further configured to determine a transform parameter associated with the first image portion, wherein the transform parameter is determined by minimizing a loss function.
  9. A video decoding apparatus comprising a processor configured to: receive a set of parameters associated with at least one warp layer associated with an Implicit Neural Representation (INR) function, wherein the INR function is trained on a first image portion of a scene; obtain the at least one warp layer using the set of parameters; reconstruct a neural network based on the INR function and the at least one warp layer; and decode a second image portion of the scene using the reconstructed neural network.
  10. The video decoding apparatus of claim 9, wherein the first image portion belongs to a first image and the second image portion belongs to the first image or a second image.
  11. The video decoding apparatus of claim 9, wherein the processor is further configured to receive a set of weights.
  12. The video decoding apparatus of claim 11, wherein the processor is further configured to obtain the at least one warp layer based on the set of parameters and the set of weights.
  13. The apparatus of any one of claims 1 to 12, further comprising a memory operatively connected to the processor.
  14. A method for video encoding, the method comprising: obtaining an Implicit Neural Representation (INR) function, wherein the INR function is trained on a first image portion of a scene; obtaining a warp layer associated with the INR function, wherein the warp layer is trained using optical flow to indicate a displacement field of a second image portion of the scene; determining parameters associated with the warp layer; and transmitting the parameters associated with the warp layer in a bitstream.
  15. The method of claim 14, wherein the first image portion belongs to a first image and the second image portion belongs to the first image or a second image.
  16. The method of claim 14, wherein obtaining the warp layer comprises obtaining the warp layer using coordinates associated with the first image portion as input and mapping the coordinates to a vector indicative of the displacement field.
  17. The method of claim 14, further comprising determining an approximation of the displacement field based on an accuracy threshold.
  18. The method of claim 17, wherein determining the approximation of the displacement field comprises increasing a number of layers associated with the warp layer if the approximation of the displacement field is less than the accuracy threshold.
  19. The method of claim 14, wherein the parameters associated with the warp layer comprise at least one of a number of warp layers, a dimension associated with each of the warp layers, a set of weights associated with the warp layers, or a set of updated parameters.
  20. The method of claim 14, further comprising: encoding the second image portion using the displacement field; and transmitting the encoded second image portion.
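
As an illustration of the progressive refinement recited in claims 4, 5, 17, and 18, the sketch below grows a warp network one hidden layer at a time and retrains it against a guiding optical-flow field, reading "the approximation of the displacement field is less than the accuracy threshold" as "the approximation is not yet accurate enough". The guiding flow, threshold value, layer sizes, and training loop are illustrative assumptions rather than details taken from the patent.

```python
# Hedged sketch: add warp layers until the displacement-field approximation,
# trained against a guiding optical flow, meets an accuracy threshold.
import torch
import torch.nn as nn

def fit_warp(num_layers, coords, guide_flow, steps=200, hidden=64):
    """Train a warp MLP with `num_layers` hidden layers against a guiding flow."""
    layers, in_f = [], 2
    for _ in range(num_layers):
        layers += [nn.Linear(in_f, hidden), nn.ReLU()]
        in_f = hidden
    layers += [nn.Linear(in_f, 2)]
    warp = nn.Sequential(*layers)
    opt = torch.optim.Adam(warp.parameters(), lr=1e-3)
    for _ in range(steps):
        loss = ((warp(coords) - guide_flow) ** 2).mean()  # optical-flow-guided loss
        opt.zero_grad()
        loss.backward()
        opt.step()
    return warp, loss.item()

coords = torch.rand(2048, 2) * 2 - 1
guide_flow = 0.05 * torch.stack([coords[:, 1], -coords[:, 0]], dim=1)  # toy guiding flow
threshold, num_layers = 1e-4, 1
while True:
    warp, err = fit_warp(num_layers, coords, guide_flow)
    if err <= threshold or num_layers >= 8:  # stop once the approximation is accurate enough
        break
    num_layers += 1                          # otherwise add another warp layer and retrain
```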

Description

Progressive warp layer with guided optical flow for INR-based video compression

Cross Reference to Related Applications

The present application claims the benefit of EP provisional patent application number 23306779.2, filed on October 12, 2023, the disclosure of which is incorporated herein by reference in its entirety.

Background

Video coding systems may be used to compress digital video signals, for example, to reduce the storage and/or transmission bandwidth required for such signals. Video coding systems may include, for example, block-based, wavelet-based, and/or object-based systems.

Disclosure of Invention

Systems, methods, and tools are disclosed for performing INR-based video compression, for example, with a progressive warp layer having guided optical flow.

In an example, a video encoding device may be configured to obtain an Implicit Neural Representation (INR) function and may train the INR function on a first image portion of a scene. The device may obtain a warp layer associated with the INR function and may train the warp layer using optical flow to indicate a displacement field of a second image portion of the scene. The device may determine parameters associated with the warp layer. The device may send the parameters associated with the warp layer in a bitstream to a decoding device. The first image portion may belong to a first image, and the second image portion may belong to the first image or a second image.

In an example, the device may obtain the warp layer using coordinates associated with the first image portion as input and mapping the coordinates to a vector indicative of the displacement field. The device may determine an approximation of the displacement field based on an accuracy threshold. If the approximation of the displacement field is less than the accuracy threshold, the device may increase the number of layers associated with the warp layer or with the neural network of which the warp layer is a part. The parameters associated with the warp layer may include at least one of a number of warp layers, a dimension associated with each of the warp layers, a set of weights associated with the warp layers, and/or a set of updated parameters. The device may encode the second image portion using the displacement field. The device may transmit the encoded second image portion. The device may determine transformation parameters associated with the first image portion and may determine the transformation parameters by minimizing a loss function.

In an example, a video decoding device may be configured to receive a set of parameters associated with at least one warp layer that is associated with an Implicit Neural Representation (INR) function, the INR function having been trained on a first image portion of a scene. The device may use the set of parameters to obtain the at least one warp layer. The device may reconstruct a neural network based on the INR function and the at least one warp layer. The device may decode a second image portion of the scene using the reconstructed neural network. The first image portion may belong to a first image, and the second image portion may belong to the first image or a second image. The device may receive a set of weights. The device may obtain the warp layer based on the set of parameters and the set of weights.

A video decoding device may obtain an Implicit Neural Representation (INR) function that has been trained on a first image of a scene.
For example, the INR function may have been previously calculated by optimizing a loss function and may be valid for the entire image or for a portion of the image (e.g., a tile, a superpixel, and/or any connected or disconnected set of pixels). The video decoding device may obtain a warp layer associated with the INR function. The warp layer may have been trained via optical flow to indicate a displacement field of a second image of the scene. The video decoding device may decode the second image based on the displacement field. The video decoding device may obtain a plurality of parameters associated with the warp layers (e.g., each warp layer). The video decoding device may reconstruct the neural network based on the INR function and the warp layer. The video decoding device may decode the second image based on the reconstructed neural network. The second image may be one of a coding tree, a set of coding blocks, or a set of superpixels.

A video encoding device may obtain an Implicit Neural Representation (INR) function. For example, the INR function may have been trained on a first image of the scene. The video encoding device may obtain a warp layer associated with the INR function. The warp layer may have been trained via optical flow to indicate a displacement field of a second image of the scene. The video encoding device may encode the second image based on the displacement field. The video encoding device may determine the displacement field including parameterizing the
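
The decoder-side flow described above (receive the warp-layer parameters, rebuild the network, decode the second image portion) might look roughly like the following. The parameter names, the stand-in INR, and the way weights would be restored are assumptions made for illustration; the patent excerpt does not define a bitstream syntax here.

```python
# Hedged sketch of decoder-side reconstruction from signaled warp-layer parameters.
import torch
import torch.nn as nn

def build_warp(num_layers, dims):
    """Rebuild the warp network from signaled structure (layer count and per-layer dimensions)."""
    layers, in_f = [], 2
    for d in dims[:num_layers]:
        layers += [nn.Linear(in_f, d), nn.ReLU()]
        in_f = d
    layers += [nn.Linear(in_f, 2)]
    return nn.Sequential(*layers)

# Parameters as they might be parsed from the bitstream (illustrative values).
params = {"num_layers": 2, "dims": [64, 64]}
warp = build_warp(params["num_layers"], params["dims"])
# The decoded weight set would then be restored into the rebuilt network, e.g.:
# warp.load_state_dict(decoded_weights)

# Stand-in for the INR trained on the first image portion.
inr = nn.Sequential(nn.Linear(2, 64), nn.ReLU(), nn.Linear(64, 3))
coords = torch.rand(1024, 2) * 2 - 1
second_image_portion = inr(coords + warp(coords))  # decode via a warped query of the INR
```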