CN-119922332-B - Video coding method and system based on implicit neural video representation

CN 119922332 B

Abstract

The invention provides a video coding method and system based on implicit neural video representation, belonging to the technical field of video coding. The method comprises: obtaining the local and global hidden layer state features of the current video frame at the previous moment; compensating the global hidden layer state features at the previous moment with mixed residual features extracted from a residual grid; processing the compensated global hidden layer state features with a coupled mapping RNN network to obtain local content features and background features; obtaining the local hidden layer features at the current moment via a first mapping RNN module and aggregating them with the background features into global information features; inputting the global information features and the global hidden layer state features at the previous moment into a second mapping RNN module to obtain the global hidden layer state features at the current moment; and inputting these into an up-sampling module to reconstruct the video frame, thereby obtaining the reconstructed video. The method effectively enhances the processing of temporal information between video frames, realizes the decomposition of global and local motion between frames, and improves the reconstruction quality of video frames.

Inventors

  • LI SHUAI
  • LI YIYANG
  • GAO YANBO
  • LEI JIANJUN
  • ZHANG JINGLIN
  • CAI XUN
  • YUAN HUI

Assignees

  • Shandong University (山东大学)

Dates

Publication Date
2026-05-12
Application Date
2025-01-21

Claims (10)

  1. A video coding method based on implicit neural video representation, comprising:
     acquiring the local hidden layer state features and global hidden layer state features of the current video frame at the previous moment;
     extracting mixed residual features for the video frame from a mixed residual grid, and adding them to the global hidden layer features at the previous moment to obtain compensated global hidden layer features;
     inputting the compensated global hidden layer state features into a coupled mapping RNN module for iterative updating to obtain the reconstructed video frame, wherein the coupled mapping RNN module processes the compensated global hidden layer state features by convolution to obtain local content features and background features; the local content features and the local hidden layer features at the previous moment are input into a first mapping RNN module to obtain the local hidden layer features at the current moment; the local hidden layer features at the current moment are aggregated with the background features to obtain global information features; and the global information features and the global hidden layer features at the previous moment are input into a second mapping RNN module to obtain the global hidden layer state features at the current moment; and
     inputting the global hidden layer state features at the current moment into an up-sampling module to reconstruct the video frame, finally obtaining the reconstructed video.
  2. The video coding method based on implicit neural video representation of claim 1, wherein the hybrid residual grid is a learnable feature grid G_r ∈ R^(L×C×H×W), where L, C, H and W are the time-dimension resolution, number of channels, height and width of the grid, respectively.
  3. The video coding method based on implicit neural video representation of claim 1, wherein the coupled mapping RNN module processes the compensated global hidden layer features by convolution to obtain the local content features and background features as follows: the compensated global hidden layer state features are input into a convolution layer to generate a soft mask, and the compensated features are decomposed into local content features f_t^c and background features f_t^b according to:
     h̃_t = h_{t-1}^g + r_t,  f_t^c = M ⊙ h̃_t,  f_t^b = (1 − M) ⊙ h̃_t,
     where h̃_t is the compensated global hidden layer feature, r_t is the hybrid residual feature, h_{t-1}^g is the global hidden layer state feature at the previous moment, M is the soft mask, and ⊙ denotes element-wise multiplication.
  4. The video coding method based on implicit neural video representation of claim 1, wherein the local hidden layer state features at the current moment are obtained as:
     h_t^l = F_1( W( h_{t-1}^l, G(f_t^c) ), f_t^c ),
     where h_t^l is the local hidden layer feature at the current moment, F_1 represents the processing operation of the first mapping RNN module, h_{t-1}^l is the local hidden layer state feature of the current video frame at the previous moment, f_t^c is the local content feature, G represents the motion information generation module, and W represents the deformation mapping (warping) operation.
  5. The video coding method based on implicit neural video representation of claim 1, wherein the global information features are specifically:
     f_t^g = h_t^l + f_t^b,
     where f_t^g is the global information feature, h_t^l is the local hidden layer feature at the current moment, and f_t^b is the background feature.
  6. The video coding method based on implicit neural video representation of claim 1, wherein the global hidden layer state features at the current moment are obtained as:
     h_t^g = F_2( W( h_{t-1}^g ), f_t^g ),
     where h_t^g is the global hidden layer feature at the current moment, F_2 represents the processing operation of the second mapping RNN module, W(·) represents the deformation mapping operation, h_{t-1}^g is the global hidden layer state feature at the previous moment, and f_t^g is the global information feature.
  7. The video coding method based on implicit neural video representation of claim 1, wherein the process of inputting the global hidden layer state features at the current moment into the up-sampling module to reconstruct the video frame and finally obtain the reconstructed video is as follows: the global hidden layer features are spatially projected to generate appearance expression features, and the up-sampling module reconstructs the video frame from the appearance expression features, specifically:
     v̂_t = U( P( h_t^g ) ),
     where v̂_t represents the reconstructed video frame, U represents the up-sampling module, formed by stacking multiple layers of convolution and up-sampling operations, P represents the spatial projection, implemented by a convolution layer, and h_t^g is the global hidden layer feature.
  8. A video coding system based on implicit neural video representation, comprising:
     a hidden layer state feature acquisition module, configured to acquire the local hidden layer state features and global hidden layer state features of the current video frame at the previous moment;
     a residual feature compensation module, configured to extract mixed residual features for the video frame from the mixed residual grid and add them to the global hidden layer features at the previous moment to obtain compensated global hidden layer features; and
     a video frame reconstruction module, configured to input the compensated global hidden layer state features into the coupled mapping RNN module for iterative updating to obtain the reconstructed video frame; wherein the coupled mapping RNN module processes the compensated global hidden layer state features by convolution to obtain local content features and background features; the local content features and the local hidden layer features at the previous moment are input into a first mapping RNN module to obtain the local hidden layer features at the current moment; the local hidden layer features at the current moment are aggregated with the background features to obtain global information features; the global information features and the global hidden layer features at the previous moment are input into a second mapping RNN module to obtain the global hidden layer state features at the current moment; and the global hidden layer state features at the current moment are input into an up-sampling module to reconstruct the video frame, finally obtaining the reconstructed video.
  9. A computer-readable storage medium having stored thereon a program which, when executed by a processor, performs the steps of the video coding method based on implicit neural video representation as claimed in any one of claims 1 to 7.
  10. An electronic device comprising a memory, a processor, and a program stored in the memory and executable on the processor, wherein the processor performs the steps of the video coding method based on implicit neural video representation as claimed in any one of claims 1 to 7 when executing the program.
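The soft-mask decomposition of claim 3 can be sketched in a few lines. This is a hedged toy illustration using plain Python floats in place of convolutional feature maps; the names `soft_mask` and `decompose` are illustrative, not from the patent, and a single sigmoid stands in for the mask-generating convolution layer.

```python
import math

def soft_mask(x: float) -> float:
    """Stand-in for Conv + sigmoid: squashes a feature value into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def decompose(h_comp: list) -> tuple:
    """Split compensated global hidden features into local content and
    background parts via an element-wise soft mask M:
        local = M * h,  background = (1 - M) * h,
    so local + background recovers h exactly."""
    m = [soft_mask(v) for v in h_comp]
    local = [mi * hi for mi, hi in zip(m, h_comp)]
    background = [(1.0 - mi) * hi for mi, hi in zip(m, h_comp)]
    return local, background

# Residual compensation (claim 1): previous global hidden state plus the
# mixed residual feature drawn from the residual grid.
h_prev = [0.2, -0.5, 1.0]
residual = [0.1, 0.3, -0.2]
h_comp = [a + b for a, b in zip(h_prev, residual)]

local, background = decompose(h_comp)
# Element-wise, local + background reconstructs the compensated features.
assert all(abs(l + b - h) < 1e-9 for l, b, h in zip(local, background, h_comp))
```

The complementary masks guarantee that the decomposition is lossless, which matches the claim's requirement that the two branches together carry all of the compensated hidden state.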

Description

Video coding method and system based on implicit neural video representation

Technical Field

The invention belongs to the technical field of video coding, and particularly relates to a video coding method and system based on implicit neural video representation.

Background

The statements in this section merely provide background information related to the present disclosure and may not necessarily constitute prior art.

With the development of networking and digitalization, artificial intelligence has flourished in many fields and achieved a series of breakthrough results. In the video field, thanks to improved equipment and the development of related industries, high-quality video plays an increasingly important role in people's daily life and in information exchange, so a video coding method offering both high compression efficiency and high reconstruction quality is required. Implicit neural video representation (Neural Representations for Videos, NeRV) is an implicit neural representation method that processes video signals frame by frame: it fits the video signal into an implicit neural network, converting the video compression problem into a neural network compression problem. The implicit network fitted to the video information can then be compressed with common model compression methods such as model pruning, model quantization and entropy coding, thereby achieving video compression. Compared with traditional video coding methods and learning-based video coding methods, implicit neural video representation offers faster decoding, higher compression efficiency, and other advantages.
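The NeRV idea described above — fit a small parametric model to frames indexed by time, then compress the model instead of the pixels — can be illustrated with a deliberately tiny stand-in. Here a quadratic in the frame index t replaces the implicit neural network, and coarse coefficient rounding stands in for model quantization; all names and the polynomial stand-in are illustrative assumptions, not the patent's method.

```python
def fit_quadratic(ts, ys):
    """Least-squares fit y ≈ a*t^2 + b*t + c via the 3x3 normal equations."""
    X = [[t * t, t, 1.0] for t in ts]            # design matrix rows
    A = [[sum(X[k][i] * X[k][j] for k in range(len(X))) for j in range(3)]
         for i in range(3)]                      # X^T X
    b = [sum(X[k][i] * ys[k] for k in range(len(X))) for i in range(3)]  # X^T y
    for i in range(3):                           # naive Gaussian elimination,
        p = A[i][i]                              # fine for a 3x3 SPD system
        for j in range(3):
            A[i][j] /= p
        b[i] /= p
        for r in range(3):
            if r != i:
                f = A[r][i]
                for j in range(3):
                    A[r][j] -= f * A[i][j]
                b[r] -= f * b[i]
    return b                                     # coefficients [a, b, c]

ts = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 1.2, 1.9, 3.1]                        # toy per-frame signal values
w = fit_quadratic(ts, ys)

def quantize(v, step=1 / 256):                   # crude stand-in for 8-bit
    return round(v / step) * step                # model quantization

w_q = [quantize(v) for v in w]
recon = [w_q[0] * t * t + w_q[1] * t + w_q[2] for t in ts]
# The quantized model still reconstructs the per-frame signal closely,
# which is the essence of compressing the representation, not the frames.
assert max(abs(r - y) for r, y in zip(recon, ys)) < 0.2
```

Storing three quantized coefficients in place of four samples mirrors, at toy scale, how NeRV trades raw frames for a compact, compressible network.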
However, existing implicit neural video representation methods have several problems: the long-term spatio-temporal relationships between frames are not fully considered when processing a video sequence; the ability to process temporal information between video frames is poor; the decomposition of global and local motion between video frames cannot be effectively realized; regular information and irregular residual information in the video cannot be modeled separately; and there remains considerable room for improvement in video representation quality and compression efficiency.

Disclosure of Invention

To overcome the above defects in the prior art, the invention provides a video coding method and system based on implicit neural video representation, which use a coupled mapping RNN (Coupled WarpRNN) network to effectively process temporal information between video frames, separately map and enhance the global and local motion information in the video to better handle inter-frame motion, and model both regular and irregular information in the video signal, thereby effectively improving video reconstruction quality and compression efficiency.
To achieve the above object, one or more embodiments of the present invention provide the following technical solutions. The first aspect of the invention provides a video coding method based on implicit neural video representation, comprising: acquiring the local hidden layer state features and global hidden layer state features of the current video frame at the previous moment; extracting mixed residual features for the video frame from a mixed residual grid, and adding them to the global hidden layer features at the previous moment to obtain compensated global hidden layer features; inputting the compensated global hidden layer state features into a coupled mapping RNN module for iterative updating to obtain the reconstructed video frame, wherein the coupled mapping RNN module processes the compensated global hidden layer state features by convolution to obtain local content features and background features, the local content features and the local hidden layer features at the previous moment are input into a first mapping RNN module to obtain the local hidden layer features at the current moment, the local hidden layer features at the current moment are aggregated with the background features to obtain global information features, and the global information features and the global hidden layer features at the previous moment are input into a second mapping RNN module to obtain the global hidden layer state features at the current moment; and inputting the global hidden layer state features at the current moment into an up-sampling module to reconstruct the video frame, finally obtaining the reconstructed video.
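The per-frame recurrence summarized above can be sketched end to end. This is a minimal, hedged sketch in which real modules (convolutions, motion generation, warping, the up-sampler) are replaced by scalar stand-ins; every function name is illustrative, not the patent's API, and the 0.5/0.5 blends merely stand in for the two mapping-RNN updates.

```python
import math

def sigmoid(x):                       # stand-in for the mask convolution
    return 1.0 / (1.0 + math.exp(-x))

def step(h_local_prev, h_global_prev, residual):
    # 1) Compensate the previous global hidden state with the mixed residual.
    h_comp = h_global_prev + residual
    # 2) Soft-mask decomposition into local content and background parts.
    m = sigmoid(h_comp)
    f_local, f_background = m * h_comp, (1.0 - m) * h_comp
    # 3) First mapping RNN: update the local hidden state (a toy blend in
    #    place of motion generation + deformation mapping).
    h_local = 0.5 * h_local_prev + 0.5 * f_local
    # 4) Aggregate into a global information feature, then the second
    #    mapping RNN updates the global hidden state.
    f_global = h_local + f_background
    h_global = 0.5 * h_global_prev + 0.5 * f_global
    # 5) Spatial projection + up-sampling would reconstruct the frame from
    #    h_global; here the "frame" is just the scalar state itself.
    return h_local, h_global

h_l, h_g = 0.0, 0.0
frames = []
for r in [0.1, -0.2, 0.3]:            # toy mixed residuals, one per frame
    h_l, h_g = step(h_l, h_g, r)
    frames.append(h_g)
assert len(frames) == 3
```

The point of the sketch is the control flow: the local and global hidden states are updated jointly each step, with the residual grid feeding the compensation and the soft mask routing content between the two coupled recurrences.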
As a further technical solution, the hybrid residual grid is a learnable feature grid G_r ∈ R^(L×C×H×W), where L, C, H and W are the time dimension res