
CN-122002083-A - Video interpolation frame generation method, device, equipment and medium

CN122002083A

Abstract

The application discloses a video interpolation frame generation method, device, equipment and medium. The method comprises: acquiring adjacent video frame images, namely a front frame image and a rear frame image; re-projecting the rear frame image in combination with the camera parameter matrix to obtain a plane-sweep volume image; obtaining the blending weights and background image of a multi-plane image from the front frame image and the plane-sweep volume image, and constructing a multi-plane image comprising a plurality of depth layers to obtain a plurality of depth plane images; transforming each depth plane image to obtain an intermediate reference frame image; extracting multi-scale image features from the front frame image, the rear frame image and the intermediate reference frame image, and then refining the optical flow layer by layer to obtain a fusion image; and aligning the fusion image, extracting the context features of the image, and generating a video interpolation frame. The method effectively combines spatial structure information and motion modeling in the interpolation process, generating interpolated video frames with higher precision.
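The re-projection step summarized above corresponds to building a plane-sweep volume: the rear frame is warped onto a set of fixed fronto-parallel depth planes via per-plane homographies. A minimal numpy sketch, assuming a pinhole camera with known intrinsics `K` and relative pose `(R, t)`; the function names and the nearest-neighbor warp are illustrative, not taken from the patent:

```python
import numpy as np

def plane_homography(K, R, t, depth):
    """Homography induced by the fronto-parallel plane z = depth
    (plane normal n = [0, 0, 1]) for a camera with intrinsics K and
    relative pose (R, t): H = K (R - t n^T / depth) K^-1."""
    n = np.array([0.0, 0.0, 1.0])
    return K @ (R - np.outer(t, n) / depth) @ np.linalg.inv(K)

def warp_to_plane(img, H):
    """Inverse warp of an (H, W, C) image under homography H
    (nearest-neighbor sampling; out-of-range pixels stay black)."""
    h, w = img.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w]
    pts = np.stack([xs.ravel(), ys.ravel(), np.ones(h * w)])  # homogeneous pixels
    src = H @ pts
    src = (src[:2] / src[2]).round().astype(int)              # de-homogenize
    valid = (0 <= src[0]) & (src[0] < w) & (0 <= src[1]) & (src[1] < h)
    out = np.zeros_like(img)
    out.reshape(h * w, -1)[valid] = img[src[1, valid], src[0, valid]]
    return out

def plane_sweep_volume(rear_img, K, R, t, depths):
    """Re-project the rear frame onto each fixed depth plane (a PSV)."""
    return np.stack([warp_to_plane(rear_img, plane_homography(K, R, t, d))
                     for d in depths])
```

With an identity pose the homography reduces to the identity for every depth, so each slice of the volume reproduces the input frame; differing depths only matter once the camera moves.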

Inventors

  • Zhao Yang
  • Hong Rui
  • Diao Shanding
  • Jia Wei
  • Liu Xiaoping

Assignees

  • Hefei University of Technology (合肥工业大学)

Dates

Publication Date
2026-05-08
Application Date
2025-12-22

Claims (10)

  1. A method for generating a video interpolation frame, comprising: acquiring adjacent video frame images, which are respectively a front frame image and a rear frame image; re-projecting the rear frame image in combination with the camera parameter matrix to obtain a plane-sweep volume image; obtaining the blending weights and background image of a multi-plane image from the front frame image and the plane-sweep volume image, and constructing a multi-plane image comprising a plurality of depth layers to obtain a plurality of depth plane images; transforming each depth plane image to obtain an intermediate reference frame image; extracting multi-scale image features from the front frame image, the rear frame image and the intermediate reference frame image, and then refining the optical flow layer by layer to obtain a fusion image; and aligning the fusion image, extracting the context features of the image, and generating a video interpolation frame.
  2. The method for generating a video interpolation frame according to claim 1, wherein re-projecting the rear frame image to obtain a plane-sweep volume image comprises: projecting, namely re-projecting the rear frame image onto a set of fixed depth planes; and plane processing, namely performing plane-sweep volume processing on the fixed depth planes to obtain a plane-sweep volume image.
  3. The method for generating a video interpolation frame according to claim 1, wherein obtaining the blending weights and background image of the multi-plane image from the front frame image and the plane-sweep volume image and constructing the multi-plane image comprising a plurality of depth layers comprises: inputting the front frame image and the plane-sweep volume image into a viewpoint synthesis module, obtaining the blending weights and background image of a set of multi-plane images based on a multi-plane image network, and simultaneously performing MPI blending, thereby constructing a multi-plane image comprising a plurality of depth layers.
  4. The method for generating a video interpolation frame according to claim 3, wherein obtaining a plurality of depth plane images comprises: organizing the multi-plane image containing a plurality of depth layers into a plurality of depth plane images according to a preset depth order, each depth plane image comprising a color map and a transparency map.
  5. The method for generating a video interpolation frame according to claim 4, wherein transforming each depth plane image to obtain an intermediate reference frame image comprises: projecting each depth plane image to the target viewpoint through inverse perspective transformation, and then compositing the projected images to obtain an intermediate reference frame image.
  6. The method for generating a video interpolation frame according to any one of claims 1 to 5, wherein extracting multi-scale image features from the front frame image, the rear frame image and the intermediate reference frame image and then refining the optical flow layer by layer to obtain a fusion image comprises: extracting the multi-scale image features with a feature extractor; and, after extraction, inputting the features into a module that performs fusion after optical flow estimation and refinement to obtain a fusion image.
  7. The method of claim 6, wherein aligning the fusion image, extracting image context features and generating a video interpolation frame comprises: performing reverse optical-flow warping on the original images to generate aligned images; extracting the context features of the images from the aligned images in the VFIBlock module; and finally integrating the front and rear frame images, the intermediate reference frame image, the alignment features, the final optical flow and the fusion image in a further module to generate a video interpolation frame.
  8. A video interpolation frame generation apparatus, comprising: a viewpoint synthesis module, configured to re-project the rear frame image in combination with the camera parameter matrix to obtain a plane-sweep volume image, obtain the blending weights and background image of the multi-plane image from the front frame image and the plane-sweep volume image, construct a multi-plane image comprising a plurality of depth layers, and obtain a plurality of depth plane images; a module configured to extract multi-scale image features from the front frame image, the rear frame image and the intermediate reference frame image, and to refine the optical flow layer by layer to obtain a fusion image; a module configured to align the fusion image and extract the context features of the image; and a module configured to generate a video interpolation frame.
  9. An electronic device, comprising: a memory for storing a computer program; and a processor for executing the computer program to implement the video interpolation frame generation method of any one of claims 1 to 7.
  10. A computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the video interpolation frame generation method of any one of claims 1 to 7.
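Claims 4 and 5 describe depth plane images that each carry a color map and a transparency map, projected to the target viewpoint and then composited into an intermediate reference frame. The compositing step in multi-plane image rendering is conventionally the back-to-front "over" operator; the sketch below assumes that standard formulation rather than quoting the patent:

```python
import numpy as np

def composite_mpi(colors, alphas):
    """Back-to-front 'over' compositing of multi-plane image layers.

    colors: (D, H, W, 3) per-plane color maps, farthest plane first.
    alphas: (D, H, W)    per-plane transparency maps in [0, 1].
    Returns the synthesized (H, W, 3) image.
    """
    out = np.zeros(colors.shape[1:])
    for color, alpha in zip(colors, alphas):   # far -> near
        a = alpha[..., None]
        out = color * a + out * (1.0 - a)      # 'over' operator
    return out
```

A nearer plane with transparency 0.5 thus contributes half its color and lets half of the farther layers show through, which is what lets the blended depth layers model partially occluded content.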

Description

Video interpolation frame generation method, device, equipment and medium

Technical Field

The present invention relates to the field of video interpolation technologies, and in particular to a method, an apparatus, a device and a medium for generating a video interpolation frame for a viewpoint-change sequence.

Background

Video Frame Interpolation (VFI) aims to increase the frame rate or create a fluent slow-motion effect by generating new intermediate frames between adjacent frames. Conventional video frame interpolation methods rely primarily on optical flow estimation or pixel-level motion analysis, generating intermediate frames by inferring motion information between adjacent frames. Such methods achieve good results on conventional video, but are prone to artifacts, blurring or unnatural transitions in complex scenes, especially in the presence of occlusions, rapid movements or non-rigid deformations. In recent years, with the development of deep learning, interpolation methods based on convolutional neural networks have gradually become mainstream, and they can alleviate the shortcomings of conventional optical flow methods to some extent. However, these methods typically model three-dimensional spatial motion projected onto a two-dimensional plane, lacking perception of the real spatial structure, and thus perform poorly on video sequences with viewpoint changes. On the other hand, viewpoint synthesis technology can infer the spatial structure of a scene from multi-viewpoint images and generate new viewpoint images, and is widely applied in fields such as three-dimensional reconstruction, virtual reality and free-viewpoint video. Although viewpoint synthesis can provide rich spatial information, holes and artifacts are easily produced in occluded or newly exposed areas, and using it directly for video interpolation can degrade visual quality.
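The optical-flow-based methods described in the background generate an intermediate frame by warping the original frames along estimated per-pixel motion. A minimal backward (reverse) warping sketch in numpy; nearest-neighbor sampling is used for brevity, whereas interpolation networks typically use differentiable bilinear sampling (an assumption here, not a detail from the patent):

```python
import numpy as np

def backward_warp(img, flow):
    """Sample `img` at positions displaced by `flow` (backward warping).

    img:  (H, W, C) source frame.
    flow: (H, W, 2) per-pixel (dx, dy) pointing from target into source.
    Out-of-range samples are left black; nearest-neighbor for brevity.
    """
    h, w = img.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w]
    sx = np.rint(xs + flow[..., 0]).astype(int)
    sy = np.rint(ys + flow[..., 1]).astype(int)
    valid = (0 <= sx) & (sx < w) & (0 <= sy) & (sy < h)
    out = np.zeros_like(img)
    out[valid] = img[sy[valid], sx[valid]]
    return out
```

A zero flow field reproduces the input unchanged; the black fill at invalid samples is exactly the kind of hole that motivates combining flow with spatial structure information.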
Meanwhile, viewpoint changes in a video are usually smooth and gradual, sufficient spatial information is difficult to obtain by performing viewpoint synthesis from only a small number of adjacent frames, and the quality of the generated intermediate frames is therefore limited.

Disclosure of the Invention

The invention provides a video interpolation frame generation method, device, equipment and medium for a viewpoint-change sequence. Traditional video interpolation methods easily produce distortion owing to the lack of spatial structure perception, while methods relying purely on viewpoint synthesis easily introduce artifacts and incomplete images; the invention therefore addresses the problem of how to effectively combine spatial structure information and motion modeling in the interpolation process to generate interpolated video frames, and further the problem of improving video interpolation precision and visual quality in viewpoint-change scenes.

To achieve the above purpose, the technical scheme provided by the invention is as follows. A video interpolation frame generation method comprises: acquiring adjacent video frame images, which are respectively a front frame image and a rear frame image; re-projecting the rear frame image in combination with the camera parameter matrix to obtain a plane-sweep volume image; obtaining the blending weights and background image of a multi-plane image from the front frame image and the plane-sweep volume image, and constructing a multi-plane image comprising a plurality of depth layers to obtain a plurality of depth plane images; transforming each depth plane image to obtain an intermediate reference frame image; extracting multi-scale image features from the front frame image, the rear frame image and the intermediate reference frame image, and then refining the optical flow layer by layer to obtain a fusion image; and aligning the fusion image, extracting the context features of the image, and generating a video interpolation frame.

As a further improvement, re-projecting the rear frame image to obtain a plane-sweep volume image comprises: projecting, namely re-projecting the rear frame image onto a set of fixed depth planes; and plane processing, namely performing plane-sweep volume processing on the fixed depth planes to obtain a plane-sweep volume image. As a further improvement, obtaining the blending weights and background image of the multi-plane image from the front frame image and the plane-sweep volume image and constructing the multi-plane image comprising a plurality of depth layers comprises: inputting the front frame image and the plane-sweep volume image into a viewpoint synthesis module, obtaining the blending weights and background image of a set of multi-plane images based on a multi-plane image network, and simultaneously performing MPI blending, thereby constructing a multi-plane image comprising a plurality of depth layers.
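The layer-by-layer optical flow refinement described in the method is, in outline, a coarse-to-fine loop over the multi-scale feature pyramid: the flow estimated at a coarse level is upsampled (with its displacements rescaled) and corrected by a residual at the next finer level. A schematic numpy sketch; `estimate_residual` stands in for the patent's unnamed flow estimation module and is purely illustrative:

```python
import numpy as np

def upsample_flow(flow, factor=2):
    """Nearest-neighbor upsampling of an (H, W, 2) flow field; pixel
    displacements scale with resolution, so values are multiplied too."""
    return np.repeat(np.repeat(flow, factor, axis=0), factor, axis=1) * factor

def refine_coarse_to_fine(feature_pyramid, estimate_residual):
    """Layer-by-layer flow refinement, coarsest level first.

    feature_pyramid:   list of (H_l, W_l, C) feature maps, coarse -> fine,
                       each level doubling in resolution.
    estimate_residual: callable(features, flow) -> (H_l, W_l, 2) update.
    """
    h, w = feature_pyramid[0].shape[:2]
    flow = np.zeros((h, w, 2))
    for level, feats in enumerate(feature_pyramid):
        if level > 0:
            flow = upsample_flow(flow)            # carry coarse estimate up
        flow = flow + estimate_residual(feats, flow)
    return flow
```

The design choice is standard in pyramid flow estimators: large motions are captured cheaply at coarse scales, and each finer level only has to predict a small residual correction.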