
CN-122001996-A - Frame inserting method and system based on deep learning

CN122001996A

Abstract

The invention provides a deep-learning-based frame interpolation method and system, comprising the following steps: extracting image features of two adjacent frame images; and predicting on the image features of the two adjacent frame images with an interpolation model F_main to output an intermediate frame K_predict as the inserted frame, where K_predict is generated from the two input frames and the residual component ΔK_{t+0.5}. Here K_t and K_{t+1} are the two adjacent frame images, ΔK_{t+0.5} is the residual component of the output of the interpolation model F_main, and ΔK_{t+0.5} = F_main(K_t, K_{t+1}). When the method is used for frame interpolation, the intermediate-frame prediction network F_main generates the intermediate frame directly from the original images and the residual component, so the overall running time is shorter, the network structure is easier to optimize, and uniform frame interpolation is ensured.

Inventors

  • HUANG CHENG
  • LU YANG
  • LI MENG
  • CHEN BO
  • XIONG XIAOYI

Assignees

  • 武汉高德智感科技有限公司 (Wuhan Guide Sensmart Tech Co., Ltd.)

Dates

Publication Date
2026-05-08
Application Date
2025-12-29

Claims (10)

  1. A deep-learning-based frame interpolation method, characterized by comprising the following steps: extracting image features of two adjacent frame images; and predicting on the image features of the two adjacent frame images with an interpolation model F_main to output an intermediate frame K_predict as the inserted frame, where K_predict is generated from the two input frames and the residual component ΔK_{t+0.5}; K_t and K_{t+1} are the two adjacent frame images, ΔK_{t+0.5} is the residual component of the output of the interpolation model F_main, and ΔK_{t+0.5} = F_main(K_t, K_{t+1}) (a minimal code sketch follows the claims).
  2. The frame interpolation method according to claim 1, wherein the interpolation model F_main is obtained as follows: extracting the image features of the adjacent t-th frame image I_t and (t+1)-th frame image I_{t+1}, respectively; predicting on the image features of I_t and I_{t+1} with the intermediate-frame prediction network F_main to obtain the residual component ΔI_{t+0.5} of its output, where ΔI_{t+0.5} = F_main(I_t, I_{t+1}), and generating from ΔI_{t+0.5} a predicted intermediate frame Î_{t+0.5} between the two adjacent frame images; and training the intermediate-frame prediction network F_main with the intermediate-frame prediction loss function L_main until F_main meets the training requirement, then outputting F_main as the interpolation model.
  3. The frame interpolation method according to claim 2, wherein the intermediate-frame prediction loss function L_main is: L_main = L_1 + λ_GAN · L_GAN; where L_1 is an L1 loss function between the predicted intermediate frame Î_{t+0.5} and the true intermediate frame I_gt, L_GAN is a GAN loss function, λ_GAN is the weight coefficient of the GAN loss, and I_gt is the true intermediate frame between the t-th frame image I_t and the (t+1)-th frame image I_{t+1} (see the loss sketch after the claims).
  4. The frame interpolation method according to claim 2, wherein obtaining the interpolation model F_main further comprises: inputting the t-th frame image I_t, the (t+1)-th frame image I_{t+1}, and the true intermediate frame I_gt between them into an optical-flow-assisted prediction network F_aux to obtain auxiliary predicted optical flow information f_aux; performing a Warp transformation based on f_aux to generate an optical-flow-assisted predicted intermediate frame Î_aux; and using Î_aux to supervise the generation of the predicted intermediate frame Î_{t+0.5} by the intermediate-frame prediction network F_main.
  5. The frame interpolation method according to claim 4, wherein the auxiliary predicted optical flow information f_aux comprises an auxiliary predicted optical flow f_{t→gt} between the t-th frame image I_t and the true intermediate frame I_gt, and an auxiliary predicted optical flow f_{gt→t+1} between I_gt and the (t+1)-th frame image I_{t+1}; Î_aux = M_1 ⊙ Warp(I_t, f_{t→gt}) + (1 − M_1) ⊙ Warp(I_{t+1}, f_{gt→t+1}), where Warp denotes the optical-flow warping transformation and M_1 is the first mask matrix (see the warp-and-mask sketch after the claims).
  6. The frame interpolation method according to claim 4, further comprising: inputting the t-th frame image I_t, the (t+1)-th frame image I_{t+1}, and the true intermediate frame I_gt between them into a teacher network F_teacher to obtain teacher-network predicted optical flow information f_teacher; and constructing an optical-flow-assisted prediction loss function L_aux based on f_teacher, and supervising with L_aux the generation of the optical-flow-assisted predicted intermediate frame Î_aux by the network F_aux; where λ_flow is the optical flow loss weighting coefficient and L_GAN is a GAN loss function.
  7. The frame interpolation method according to claim 6, further comprising: constructing an optical flow output loss function L_flow based on the optical flow information output by the teacher network F_teacher, and supervising with L_flow the process by which the optical-flow-assisted prediction network F_aux outputs the auxiliary predicted optical flow information f_aux, where L_flow = L_1(f_aux, f_teacher) and L_1 is the L1 loss function (see the distillation sketch after the claims).
  8. The frame interpolation method according to claim 6, further comprising: performing a Warp transformation based on the teacher-network predicted optical flow information f_teacher to generate a teacher-network predicted intermediate frame Î_teacher; and constructing a teacher network loss function L_teacher based on Î_teacher and training the teacher network F_teacher with it, where L_teacher = L_1(Î_teacher, I_gt) and L_1 is the L1 loss function.
  9. The frame interpolation method according to claim 8, wherein the teacher-network predicted optical flow information f_teacher comprises a teacher-network predicted optical flow f'_{t→gt} between the t-th frame image I_t and the true intermediate frame I_gt, and a teacher-network predicted optical flow f'_{gt→t+1} between I_gt and the (t+1)-th frame image I_{t+1}; Î_teacher = M_2 ⊙ Warp(I_t, f'_{t→gt}) + (1 − M_2) ⊙ Warp(I_{t+1}, f'_{gt→t+1}), where Warp denotes the optical-flow warping transformation and M_2 is the second mask matrix.
  10. A frame interpolation system, comprising: a feature extraction module for extracting image features of two adjacent frame images based on a feature extraction network; and a frame interpolation module for receiving the image features of the two adjacent frame images and outputting an intermediate frame I_predict as the inserted frame based on the interpolation model (see the system sketch after the claims).
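
The residual formulation of claims 1-2 can be illustrated with a minimal PyTorch sketch. The tiny convolutional network and the choice to add the residual to the average of the two input frames are illustrative assumptions; the claims define only ΔK_{t+0.5} = F_main(K_t, K_{t+1}) and do not reproduce the exact combination formula.

```python
import torch
import torch.nn as nn

class FMain(nn.Module):
    """Interpolation model F_main: predicts the residual dK_{t+0.5}."""
    def __init__(self, channels: int = 3):
        super().__init__()
        # Toy conv stack standing in for the real feature extractor/predictor.
        self.net = nn.Sequential(
            nn.Conv2d(2 * channels, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, channels, 3, padding=1),
        )

    def forward(self, k_t: torch.Tensor, k_t1: torch.Tensor) -> torch.Tensor:
        # dK_{t+0.5} = F_main(K_t, K_{t+1})
        return self.net(torch.cat([k_t, k_t1], dim=1))

def interpolate(f_main: FMain, k_t: torch.Tensor, k_t1: torch.Tensor) -> torch.Tensor:
    # Base-plus-residual combination; averaging the inputs is an assumption.
    return (k_t + k_t1) / 2 + f_main(k_t, k_t1)

k_t, k_t1 = torch.rand(1, 3, 64, 64), torch.rand(1, 3, 64, 64)
k_predict = interpolate(FMain(), k_t, k_t1)  # the inserted frame K_predict
```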
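
Claim 3's loss L_main = L_1 + λ_GAN · L_GAN can be sketched as follows. The toy discriminator, the non-saturating BCE form of the GAN term, and the value of λ_GAN are assumptions; the claim only names an L1 loss and "a loss function of GAN".

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Toy discriminator D(frame) -> patch logits; the architecture is assumed.
disc = nn.Sequential(
    nn.Conv2d(3, 16, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
    nn.Conv2d(16, 1, 4, stride=2, padding=1),
)

def l_main(i_pred: torch.Tensor, i_gt: torch.Tensor,
           lambda_gan: float = 0.01) -> torch.Tensor:
    l1 = F.l1_loss(i_pred, i_gt)  # L_1 term against the true intermediate frame
    # Generator-side GAN term: push the discriminator to score I_pred as real.
    logits = disc(i_pred)
    l_gan = F.binary_cross_entropy_with_logits(logits, torch.ones_like(logits))
    return l1 + lambda_gan * l_gan

loss = l_main(torch.rand(1, 3, 64, 64), torch.rand(1, 3, 64, 64))
```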
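
Claims 4-5 warp each input frame with an auxiliary predicted optical flow and blend the results with the mask matrix M_1. Below is a sketch assuming backward warping via grid_sample and a sigmoid soft mask; the flow direction convention (flows pointing from the intermediate timestamp back to each input frame, x before y in the channel order) is likewise an assumption.

```python
import torch
import torch.nn.functional as F

def warp(img: torch.Tensor, flow: torch.Tensor) -> torch.Tensor:
    """Backward-warp img (B, C, H, W) with a dense flow field (B, 2, H, W)."""
    _, _, h, w = img.shape
    ys, xs = torch.meshgrid(
        torch.arange(h, device=img.device),
        torch.arange(w, device=img.device),
        indexing="ij",
    )
    base = torch.stack((xs, ys), dim=0).float()  # (2, H, W), x then y
    coords = base.unsqueeze(0) + flow            # per-pixel sample positions
    # Normalize coordinates to [-1, 1] as grid_sample expects.
    x_norm = 2.0 * coords[:, 0] / (w - 1) - 1.0
    y_norm = 2.0 * coords[:, 1] / (h - 1) - 1.0
    grid = torch.stack((x_norm, y_norm), dim=-1)  # (B, H, W, 2)
    return F.grid_sample(img, grid, align_corners=True)

def aux_intermediate(i_t, i_t1, flow_mid_to_t, flow_mid_to_t1, m1_logits):
    # I_aux = M_1 * Warp(I_t, f) + (1 - M_1) * Warp(I_{t+1}, f'), with a
    # sigmoid turning the mask logits into a soft mask in [0, 1].
    m1 = torch.sigmoid(m1_logits)
    return m1 * warp(i_t, flow_mid_to_t) + (1 - m1) * warp(i_t1, flow_mid_to_t1)

# Toy usage with zero flows and a neutral mask.
i_t, i_t1 = torch.rand(1, 3, 64, 64), torch.rand(1, 3, 64, 64)
f0, f1 = torch.zeros(1, 2, 64, 64), torch.zeros(1, 2, 64, 64)
i_aux = aux_intermediate(i_t, i_t1, f0, f1, torch.zeros(1, 1, 64, 64))
```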
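
Claims 7-8 supervise the student flows and train the teacher with L1 terms. A sketch treating L_flow as L1(f_aux, f_teacher) and L_teacher as L1(Î_teacher, I_gt), which is a reconstruction: the claims' formula images are not reproduced in the text.

```python
import torch
import torch.nn.functional as F

def l_flow(f_aux: torch.Tensor, f_teacher: torch.Tensor) -> torch.Tensor:
    # Teacher flows act as pseudo ground truth; detach so gradients
    # reach only the student network F_aux (online distillation).
    return F.l1_loss(f_aux, f_teacher.detach())

def l_teacher(i_teacher: torch.Tensor, i_gt: torch.Tensor) -> torch.Tensor:
    # The teacher itself is trained against the true intermediate frame.
    return F.l1_loss(i_teacher, i_gt)
```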
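
The two-module decomposition of claim 10 can be sketched directly; the class names and toy layers are illustrative, and only the feature-extraction/interpolation split comes from the claim.

```python
import torch
import torch.nn as nn

class FeatureExtractor(nn.Module):
    """Feature extraction module: image features of one frame."""
    def __init__(self, channels: int = 3, feat: int = 32):
        super().__init__()
        self.conv = nn.Sequential(nn.Conv2d(channels, feat, 3, padding=1), nn.ReLU())

    def forward(self, frame: torch.Tensor) -> torch.Tensor:
        return self.conv(frame)

class FrameInterpolator(nn.Module):
    """Frame interpolation module: features of both frames -> I_predict."""
    def __init__(self, feat: int = 32, channels: int = 3):
        super().__init__()
        self.head = nn.Conv2d(2 * feat, channels, 3, padding=1)

    def forward(self, feat_t: torch.Tensor, feat_t1: torch.Tensor) -> torch.Tensor:
        return self.head(torch.cat([feat_t, feat_t1], dim=1))

extractor, interpolator = FeatureExtractor(), FrameInterpolator()
i_t, i_t1 = torch.rand(1, 3, 64, 64), torch.rand(1, 3, 64, 64)
i_predict = interpolator(extractor(i_t), extractor(i_t1))  # the inserted frame
```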

Description

Frame inserting method and system based on deep learning

Technical Field

The invention relates to the technical field of image processing, in particular to a frame interpolation method and system based on deep learning.

Background

In the prior art, video frame interpolation is generally used to convert a low-frame-rate video into a high-frame-rate video, so that playback is smoother, content detail is richer, and the user's viewing experience improves. At present, video interpolation is mostly completed by optical-flow estimation: the optical flow between adjacent frames is estimated, and the images are warped with that flow to generate an intermediate frame. However, under fast motion, occlusion, and complex textures, the optical flow estimate is often not accurate enough, producing ghosting, blurring, and artifacts in the interpolated result. Various deep-learning interpolation methods have also been developed, for example directly predicting the intermediate frame with a convolutional neural network or an attention mechanism; but these methods lack constraints on motion information, so interpolation is uneven and motion looks unsmooth and unnatural. Furthermore, existing optical-flow estimation results are insufficiently accurate, especially when the target moves fast or is occluded, which easily causes blurring and structural misalignment; and when training an optical-flow estimation network, scarce optical-flow supervision data often keeps training from reaching expectations, limiting model performance.

Disclosure of Invention

The invention aims to provide a deep-learning-based frame interpolation method and system in which, during interpolation, the intermediate-frame prediction network F_main generates the intermediate frame directly from the original images and a residual component, so the overall running time is shorter, the network structure is easier to optimize, and uniform interpolation is ensured. To this end, the following technical solutions are provided. In one aspect, a deep-learning-based frame interpolation method comprises: extracting image features of two adjacent frame images; and predicting on those features with an interpolation model F_main to output an intermediate frame K_predict as the inserted frame, where K_predict is generated from the two input frames and the residual component ΔK_{t+0.5}; K_t and K_{t+1} are the two adjacent frame images, ΔK_{t+0.5} is the residual component of the output of F_main, and ΔK_{t+0.5} = F_main(K_t, K_{t+1}). In another aspect, a frame interpolation system comprises: a feature extraction module for extracting image features of two adjacent frame images based on a feature extraction network; and a frame interpolation module for receiving those image features and outputting an intermediate frame I_predict as the inserted frame based on the interpolation model.
In summary, compared with the prior art, the invention has the following beneficial effects. During interpolation, the intermediate-frame prediction network F_main generates the intermediate frame directly from the original images and the residual component ΔI_{t+0.5}; compared with existing optical-flow estimation schemes, the overall running time is shorter and the network structure is easier to optimize, and accurate intermediate frames can be generated even in fast-motion and occlusion scenes, ensuring uniform interpolation. Meanwhile, during model training, the intermediate-frame prediction network F_main, the optical-flow-assisted prediction network F_aux, and the teacher network F_teacher are combined for online distillation, so that the flow output of F_aux is supervised by the accurate optical flow produced by the teacher network, and F_aux can output accurate optical flow even when optical-flow annotation data are scarce. Furthermore, the invention learns optical flow explicitly through F_aux, generates an intermediate frame through Warp and Soft-Mask, and uses the teacher network F_teacher to supervise the optical-flow-assisted predicted intermediate frame generated by F_aux.
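
One online-distillation training step wiring together the losses named above might look like the following sketch. The equal weighting of the three losses, the use of plain L1 terms, the omission of the GAN terms, and the joint optimization of all three networks are assumptions; the text does not fix these details.

```python
import torch
import torch.nn.functional as F

def training_step(i_pred_main, i_pred_aux, i_pred_teacher, i_gt,
                  f_aux, f_teacher, lambda_flow: float = 1.0):
    l_teacher = F.l1_loss(i_pred_teacher, i_gt)    # trains F_teacher on I_gt
    l_flow = F.l1_loss(f_aux, f_teacher.detach())  # teacher flows supervise F_aux
    l_aux = F.l1_loss(i_pred_aux, i_gt) + lambda_flow * l_flow
    # The optical-flow-assisted frame supervises F_main alongside the ground truth.
    l_main = F.l1_loss(i_pred_main, i_pred_aux.detach()) + F.l1_loss(i_pred_main, i_gt)
    return l_teacher + l_aux + l_main
```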