CN-121985229-A - Video frame interpolation method, apparatus, computer device, and storage medium
Abstract
The application relates to a video frame interpolation method, apparatus, computer device, and storage medium. The method is applied to a dual-camera system comprising a main camera and an event camera. The method comprises: obtaining a set of frames to be interpolated captured by the main camera at a first frame rate, together with event stream data output by the event camera and a reference image set captured synchronously with the frames to be interpolated; computing a spatial-alignment optical flow field between the main camera and the event camera from the set of frames to be interpolated and the reference image set; aligning the event stream data with the frames to be interpolated according to the spatial-alignment optical flow field to obtain aligned event data; and obtaining a target interpolated frame from the set of frames to be interpolated and the aligned event data. By adopting the method, interpolation accuracy can be improved in complex dynamic scenes.
Inventors
- GUO WENLONG
- WANG JIN
- LU YANQING
- QIU SHUANGZHONG
- LEI HUA
Assignees
- ArcSoft Corporation Limited (虹软科技股份有限公司)
Dates
- Publication Date: 2026-05-05
- Application Date: 2026-01-26
Claims (13)
- 1. A video frame interpolation method, characterized in that it is applied to a dual-camera system, the dual-camera system comprising a main camera and an event camera, the method comprising: acquiring a set of frames to be interpolated captured by the main camera at a first frame rate, event stream data output by the event camera, and a reference image set captured synchronously with the set of frames to be interpolated; computing a spatial-alignment optical flow field between the main camera and the event camera from the set of frames to be interpolated and the reference image set; aligning the event stream data with the set of frames to be interpolated according to the spatial-alignment optical flow field to obtain aligned event data; and obtaining a target interpolated frame from the set of frames to be interpolated and the aligned event data.
- 2. The video frame interpolation method of claim 1, wherein acquiring the set of frames to be interpolated captured by the main camera at the first frame rate, the event stream data output by the event camera, and the reference image set captured synchronously with the set of frames to be interpolated comprises: acquiring a first image captured by the main camera at a first moment and a second image captured by the main camera at a second moment to obtain the set of frames to be interpolated, wherein the time interval between the first moment and the second moment corresponds to the first frame rate; acquiring a third image captured by the event camera at the first moment and a fourth image captured by the event camera at the second moment to obtain the reference image set; and acquiring the event stream data output by the event camera between the first moment and the second moment.
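Illustration (not part of the claims): the acquisition step of claim 2 amounts to pairing the two key frames with the slice of the event stream recorded between their timestamps. A minimal NumPy sketch, with an assumed `(t, x, y, p)` event record layout:

```python
import numpy as np

# Assumed event record layout: timestamp, pixel coordinates, polarity.
EVENT_DTYPE = np.dtype([("t", "f8"), ("x", "i4"), ("y", "i4"), ("p", "i1")])

def select_event_window(events: np.ndarray, t1: float, t2: float) -> np.ndarray:
    """Return the events output by the event camera between the first
    moment t1 and the second moment t2 (claim 2, final step)."""
    mask = (events["t"] >= t1) & (events["t"] < t2)
    return events[mask]
```

For a main camera running at a first frame rate of 30 fps, the two moments would be spaced `1/30` s apart, and this window holds every event fired between the two key frames.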
- 3. The video frame interpolation method of claim 1, wherein computing the spatial-alignment optical flow field between the main camera and the event camera from the set of frames to be interpolated and the reference image set comprises: inputting the set of frames to be interpolated and the reference image set into a pre-trained first optical flow estimation network to obtain the spatial-alignment optical flow field.
- 4. The video frame interpolation method of claim 3, wherein the first optical flow estimation network further outputs a first optical-flow confidence set corresponding to the spatial-alignment optical flow field, and obtaining the spatial-alignment optical flow field further comprises: if the first optical-flow confidence set does not meet a preset abnormality condition, identifying optical flow values in the first optical-flow confidence set whose confidence is below a first confidence threshold to obtain optical flow values to be optimized; and if the first optical-flow confidence set meets the preset abnormality condition, stopping interpolation for the set of frames to be interpolated.
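The two branches of claim 4 can be sketched as a gating function over the confidence map. The threshold values and the abnormality condition below (a fraction of low-confidence pixels) are illustrative assumptions, not taken from the patent:

```python
import numpy as np

def gate_spatial_flow(conf: np.ndarray,
                      conf_threshold: float = 0.6,
                      abnormal_fraction: float = 0.5):
    """Claim-4-style gating with assumed thresholds.

    Returns None when the confidence set meets the abnormality condition
    (interpolation is stopped for this frame pair); otherwise returns a
    boolean mask marking the optical flow values to be optimized."""
    low = conf < conf_threshold
    if low.mean() >= abnormal_fraction:   # assumed abnormality condition
        return None                       # stop interpolating this pair
    return low
```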
- 5. The video frame interpolation method of claim 4, wherein, before inputting the set of frames to be interpolated and the reference image set into the pre-trained first optical flow estimation network, the method further comprises: preprocessing the set of frames to be interpolated to obtain a preprocessed set of frames to be interpolated, wherein the preprocessing comprises graying and resizing.
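The preprocessing of claim 5 (graying plus resizing) can be shown in a dependency-free sketch; nearest-neighbour resizing and channel-mean graying are stand-ins here, chosen only to keep the example self-contained:

```python
import numpy as np

def preprocess_frame(img: np.ndarray, out_hw: tuple) -> np.ndarray:
    """Gray the frame, then resize it to out_hw = (height, width).
    Nearest-neighbour resampling keeps the sketch free of image libraries."""
    gray = img.mean(axis=2) if img.ndim == 3 else img
    h, w = gray.shape
    oh, ow = out_hw
    rows = np.arange(oh) * h // oh   # source row per output row
    cols = np.arange(ow) * w // ow   # source column per output column
    return gray[rows][:, cols]
```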
- 6. The video frame interpolation method of claim 1, wherein obtaining the target interpolated frame from the set of frames to be interpolated and the aligned event data comprises: inputting the set of frames to be interpolated, the aligned event data, and a target interpolation moment into a pre-trained interpolation synthesis network to obtain the target interpolated frame, wherein the interpolation synthesis network comprises a second optical flow estimation network and an interpolation result synthesis network; the second optical flow estimation network is configured to generate, from the set of frames to be interpolated, the aligned event data, and the target interpolation moment, a bidirectional optical flow from the target interpolation moment to the set of frames to be interpolated and a fusion mask; and the interpolation result synthesis network is configured to perform interpolation computation according to the bidirectional optical flow, the fusion mask, and the set of frames to be interpolated to obtain the target interpolated frame.
- 7. The video frame interpolation method of claim 6, wherein performing the interpolation computation according to the bidirectional optical flow, the fusion mask, and the set of frames to be interpolated to obtain the target interpolated frame comprises: reverse-mapping the set of frames to be interpolated to the target interpolation moment according to the bidirectional optical flow to obtain mapped images; and performing weighted fusion of the mapped images according to the fusion mask to obtain the target interpolated frame.
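The two steps of claim 7 (reverse mapping, then mask-weighted fusion) can be sketched as follows. Nearest-neighbour sampling stands in for the bilinear kernel normally used in backward warping, so this is an illustration of the structure, not of the patented implementation:

```python
import numpy as np

def backward_warp(img: np.ndarray, flow: np.ndarray) -> np.ndarray:
    """Reverse mapping: sample img at positions displaced by flow.
    flow[..., 0] is the x displacement, flow[..., 1] the y displacement."""
    h, w = img.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w]
    src_x = np.clip(xs + flow[..., 0], 0, w - 1).round().astype(int)
    src_y = np.clip(ys + flow[..., 1], 0, h - 1).round().astype(int)
    return img[src_y, src_x]

def fuse(warp0: np.ndarray, warp1: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Weighted fusion of the two warped key frames under the fusion mask."""
    return mask * warp0 + (1.0 - mask) * warp1
```

With the two halves of the bidirectional flow, each key frame is warped to the target interpolation moment and the mask decides, per pixel, which warped frame dominates the result.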
- 8. The video frame interpolation method of claim 7, wherein the second optical flow estimation network further outputs a second optical-flow confidence set corresponding to the bidirectional optical flow, and the interpolation synthesis network further comprises an interpolation feasibility determination module configured to: identify image key points in each image of the set of frames to be interpolated to obtain a first key-point feature and a second key-point feature; compute a feature matching stability from the first key-point feature and the second key-point feature; perform an optical flow consistency computation on the bidirectional optical flow to obtain a motion credibility; determine an interpolation risk score for the set of frames to be interpolated from the second optical-flow confidence set, the feature matching stability, and the motion credibility; and, if the interpolation risk score does not meet a preset condition, perform interpolation filtering or weight reduction on the risk regions identified in the interpolation risk score.
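Two of the quantities in claim 8 admit simple illustrative formulas. The consistency measure and the weighted combination below are plausible stand-ins (the patent does not give the exact formulas), with hypothetical weights:

```python
import numpy as np

def motion_credibility(fwd: np.ndarray, bwd: np.ndarray) -> float:
    """Forward-backward consistency check: for perfectly consistent flows
    the forward and backward fields cancel, so the mean residual is zero
    and the credibility is 1."""
    residual = np.linalg.norm(fwd + bwd, axis=-1).mean()
    return float(np.exp(-residual))

def interpolation_risk(conf_mean: float, match_stability: float,
                       credibility: float,
                       weights=(0.4, 0.3, 0.3)) -> float:
    """Illustrative risk score: high when the three reliability terms
    (confidence, matching stability, motion credibility) are low."""
    a, b, c = weights
    reliability = a * conf_mean + b * match_stability + c * credibility
    return 1.0 - reliability
```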
- 9. The video frame interpolation method of claim 7, wherein the training process of the interpolation synthesis network comprises: acquiring a main-camera image sample set captured at a second frame rate by the main camera or by a camera of the same type as the main camera, together with sample event stream data output by the event camera and a reference image sample set captured synchronously with the main-camera image sample set, wherein the second frame rate is greater than the first frame rate; decimating the main-camera image sample set and the reference image sample set to the first frame rate, thereby determining a plurality of sample sets of frames to be interpolated and sample reference image sets at the first frame rate, and extracting from the sample event stream data a sample event stream subset corresponding to each sample set of frames to be interpolated; constructing a sample dataset from the sample images dropped during decimation and their acquisition times, together with the sample sets of frames to be interpolated, the sample reference image sets, and the sample event stream subsets; and training a preset machine learning model on the sample dataset to obtain the interpolation synthesis network.
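The decimation scheme of claim 9 is easy to make concrete: every `decimation`-th frame of the high-frame-rate capture becomes a key frame, and the frames dropped in between become ground-truth targets. A sketch (the dictionary keys are made up for readability):

```python
def build_interpolation_samples(frames, timestamps, decimation: int):
    """Decimate a sequence captured at the second (higher) frame rate:
    key-frame pairs at the first frame rate become the network inputs,
    and each dropped frame (with its acquisition time) becomes a
    ground-truth target for training."""
    samples = []
    for i in range(0, len(frames) - decimation, decimation):
        key_pair = (frames[i], frames[i + decimation])
        for j in range(i + 1, i + decimation):
            samples.append({"inputs": key_pair,
                            "target_time": timestamps[j],
                            "target_frame": frames[j]})
    return samples
```

For a 120 fps capture decimated by 4, the key pairs simulate a 30 fps first frame rate and each pair yields three supervised targets.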
- 10. The video frame interpolation method of claim 9, wherein constructing the sample dataset comprises: computing a sample spatial-alignment optical flow field from each sample set of frames to be interpolated and the corresponding sample reference image set, and aligning the corresponding sample event stream subset with the sample frames to be interpolated according to the sample spatial-alignment optical flow field to obtain sample aligned event data; and constructing the sample dataset by taking the sample set of frames to be interpolated, the sample aligned event data, and the acquisition time of each decimated sample image as input features, and the decimated sample image itself as the output feature.
- 11. A video frame interpolation apparatus for use in a dual-camera system comprising a main camera and an event camera, the apparatus comprising: an image acquisition module, configured to acquire a set of frames to be interpolated captured by the main camera at a first frame rate, event stream data output by the event camera, and a reference image set captured synchronously with the set of frames to be interpolated; an optical flow computation module, configured to compute a spatial-alignment optical flow field between the main camera and the event camera from the set of frames to be interpolated and the reference image set; an event alignment module, configured to align the event stream data with the set of frames to be interpolated according to the spatial-alignment optical flow field to obtain aligned event data; and an image interpolation module, configured to obtain a target interpolated frame from the set of frames to be interpolated and the aligned event data.
- 12. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor implements the method of any one of claims 1 to 11 when executing the computer program.
- 13. A computer-readable storage medium, on which a computer program is stored, characterized in that the computer program, when executed by a processor, implements the method of any one of claims 1 to 11.
Description
Video frame interpolation method, apparatus, computer device, and storage medium

Technical Field
The present application relates to the field of video processing technologies, and in particular to a video frame interpolation method, apparatus, computer device, and storage medium.

Background
Video frame interpolation improves frame rate and fluency by generating intermediate frames between the original video frames; as expectations for video quality rise, the accuracy with which those intermediate frames are generated becomes increasingly critical. Traditional interpolation methods predict the intermediate frame with an optical flow estimation or deep learning model: they infer the intermediate frame's pixel content from pixel changes between the input frames, so prediction accuracy depends heavily on the accuracy of the optical flow estimate, and therefore on clear, continuous spatio-temporal information in the input frames. Under high-speed motion or low illumination, however, such methods struggle to recover the optical flow from the preceding and following frames, producing erroneous intermediate frames with artifacts, blurring, or structural distortion that seriously degrade the viewing experience. The prior art therefore still suffers from low interpolation accuracy in complex dynamic scenes.
Disclosure of Invention
In view of the foregoing, it is desirable to provide a video frame interpolation method, apparatus, computer device, and storage medium capable of improving interpolation accuracy in complex dynamic scenes.

In a first aspect, the present application provides a video frame interpolation method applied to a dual-camera system, where the dual-camera system includes a main camera and an event camera, and the method includes: acquiring a set of frames to be interpolated captured by the main camera at a first frame rate, event stream data output by the event camera, and a reference image set captured synchronously with the set of frames to be interpolated; computing a spatial-alignment optical flow field between the main camera and the event camera from the set of frames to be interpolated and the reference image set; aligning the event stream data with the set of frames to be interpolated according to the spatial-alignment optical flow field to obtain aligned event data; and obtaining a target interpolated frame from the set of frames to be interpolated and the aligned event data.
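As a reading aid only, the four steps of the first aspect can be laid out as a pipeline skeleton. All names below are hypothetical; the three callables stand in for the first optical flow estimation network, the event alignment step, and the interpolation synthesis network of the claims:

```python
def interpolate_pair(frame_pair, ref_pair, events, target_time,
                     estimate_alignment_flow, warp_events, synthesize):
    """Skeleton of the claimed method, with the learned components
    injected as callables."""
    # 1. spatial-alignment optical flow field between the two cameras
    align_flow = estimate_alignment_flow(frame_pair, ref_pair)
    # 2. align the event stream with the main-camera key frames
    aligned_events = warp_events(events, align_flow)
    # 3. synthesize the target interpolated frame at target_time
    return synthesize(frame_pair, aligned_events, target_time)
```

Any concrete networks with matching signatures can be plugged in; the skeleton only fixes the data flow of the four claimed steps.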
In one embodiment, acquiring the set of frames to be interpolated captured by the main camera at the first frame rate, the event stream data output by the event camera, and the reference image set captured synchronously with the set of frames to be interpolated includes: acquiring a first image captured by the main camera at a first moment and a second image captured by the main camera at a second moment to obtain the set of frames to be interpolated, where the time interval between the first and second moments corresponds to the first frame rate; acquiring a third image captured by the event camera at the first moment and a fourth image captured by the event camera at the second moment to obtain the reference image set; and acquiring the event stream data output by the event camera between the first and second moments.

In one embodiment, computing the spatial-alignment optical flow field between the main camera and the event camera from the set of frames to be interpolated and the reference image set comprises: inputting the set of frames to be interpolated and the reference image set into a pre-trained first optical flow estimation network to obtain the spatial-alignment optical flow field.

In one embodiment, the first optical flow estimation network further outputs a first optical-flow confidence set corresponding to the spatial-alignment optical flow field, and obtaining the spatial-alignment optical flow field further includes: if the first optical-flow confidence set does not meet a preset abnormality condition, identifying optical flow values whose confidence is below a first confidence threshold to obtain optical flow values to be optimized; and if the first optical-flow confidence set meets the preset abnormality condition, stopping interpolation for the set of frames to be interpolated.
In one embodiment, before inputting the set of to-be-interpolated frame images