EP-4736129-A1 - KEYFRAME EXTRACTION FROM VIDEOS

Abstract

Examples described herein provide a method that includes receiving a video of an environment. The method further includes extracting keyframes from the video using a machine learning model to generate extracted keyframes. The method further includes performing blur detection on the extracted keyframes to remove invalid keyframes from the extracted keyframes to generate candidate keyframes. The method further includes performing image enhancement on at least one of the invalid keyframes to generate at least one enhanced keyframe, the at least one enhanced keyframe being added to the candidate keyframes. The method further includes generating a desired output based at least in part on the candidate keyframes.

Inventors

  • DU, Changyu
  • PARIAN, Jafar Amiri

Assignees

  • Faro Technologies, Inc.

Dates

Publication Date
20260506
Application Date
20240627

Claims (20)

  1. A method comprising: receiving a video of an environment; extracting keyframes from the video using a machine learning model to generate extracted keyframes; performing blur detection on the extracted keyframes to remove invalid keyframes from the extracted keyframes to generate candidate keyframes; performing image enhancement on at least one of the invalid keyframes to generate at least one enhanced keyframe, the at least one enhanced keyframe being added to the candidate keyframes; and generating a desired output based at least in part on the candidate keyframes.
  2. The computer-implemented method of claim 1, wherein generating the desired output comprises at least one of: generating a video summary of the video using the candidate keyframes; estimating a trajectory for the video using the candidate keyframes; and generating a point cloud of the environment using the candidate keyframes.
  3. The computer-implemented method of claim 1, wherein extracting the keyframes is performed using a deep learning-based approach to extract local features and keyframes.
  4. The computer-implemented method of claim 3, wherein the deep learning-based approach comprises: for a first keyframe of the video: adding the first keyframe of the video to an image collection using a neural network model; extracting first key points and first local features from the first keyframe of the video; and storing the first key points as current key points and the first local features as current descriptors; and for a second keyframe of the video: adding the second keyframe of the video to the image collection using the neural network model; extracting second key points and second local features from the second keyframe of the video; and storing the second key points as next key points and the second local features as next descriptors.
  5. The computer-implemented method of claim 4, wherein the deep learning-based approach further comprises: matching the current descriptors and the next descriptors to determine corresponding key points between the current key points and the next key points; calculating an average distance of the corresponding key points; and determining whether the average distance exceeds a threshold distance.
  6. The computer-implemented method of claim 5, wherein the deep learning-based approach further comprises: responsive to determining that the average distance of the corresponding key points exceeds the threshold distance, using the second keyframe as the current frame and repeating the deep learning-based approach to extract local features and keyframes.
  7. The computer-implemented method of claim 6, wherein the deep learning-based approach further comprises: responsive to determining that the average distance of the corresponding key points does not exceed the threshold distance, repeating the keyframe extraction using subsequent keyframes until the video is complete.
  8. The computer-implemented method of claim 1, wherein performing the blur detection comprises convolving the extracted keyframes with a Laplacian kernel, calculating a variance on the convolution result, and using the variance to determine at least one of the extracted keyframes is valid.
  9. The computer-implemented method of claim 1, wherein performing the blur detection comprises applying a blur detect filter.
  10. The computer-implemented method of claim 1, wherein performing the image enhancement on the at least one of the invalid keyframes to generate the at least one enhanced keyframe comprises applying a machine learning-based deblurring technique.
  11. A processing system comprising: a memory comprising computer readable instructions; and a processing device for executing the computer readable instructions, the computer readable instructions controlling the processing device to perform operations comprising: receiving a video of an environment; extracting keyframes from the video using a machine learning model to generate extracted keyframes; performing blur detection on the extracted keyframes to remove invalid keyframes from the extracted keyframes to generate candidate keyframes; performing image enhancement on at least one of the invalid keyframes to generate at least one enhanced keyframe, the at least one enhanced keyframe being added to the candidate keyframes; and generating a desired output based at least in part on the candidate keyframes.
  12. The processing system of claim 11, wherein generating the desired output comprises at least one of: generating a video summary of the video using the candidate keyframes; estimating a trajectory for the video using the candidate keyframes; and generating a point cloud of the environment using the candidate keyframes.
  13. The processing system of claim 11, wherein extracting the keyframes is performed using a deep learning-based approach to extract local features and keyframes.
  14. The processing system of claim 13, wherein the deep learning-based approach comprises: for a first keyframe of the video, adding the first keyframe of the video to an image collection using a neural network model; extracting first key points and first local features from the first keyframe of the video; and storing the first key points as current key points and the first local features as current descriptors; and for a second keyframe of the video, adding the second keyframe of the video to the image collection using the neural network model; extracting second key points and second local features from the second keyframe of the video; and storing the second key points as next key points and the second local features as next descriptors.
  15. The processing system of claim 14, wherein the deep learning-based approach further comprises: matching the current descriptors and the next descriptors to determine corresponding key points between the current key points and the next key points; calculating an average distance of the corresponding key points; and determining whether the average distance exceeds a threshold distance.
  16. The processing system of claim 15, wherein the deep learning-based approach further comprises: responsive to determining that the average distance of the corresponding key points exceeds the threshold distance, using the second keyframe as the current frame and repeating the deep learning-based approach to extract local features and keyframes.
  17. The processing system of claim 15, wherein the deep learning-based approach further comprises: responsive to determining that the average distance of the corresponding key points does not exceed the threshold distance, repeating the keyframe extraction using subsequent keyframes until the video is complete.
  18. The processing system of claim 11, wherein performing the blur detection comprises convolving the extracted keyframes with a Laplacian kernel, calculating a variance on the convolution result, and using the variance to determine at least one of the extracted keyframes is invalid.
  19. The processing system of claim 11, wherein performing the blur detection comprises applying a blur detect filter.
  20. The processing system of claim 11, wherein performing the image enhancement on the at least one of the invalid keyframes to generate the at least one enhanced keyframe comprises applying a machine learning-based deblurring technique.
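
Claims 4 through 7 describe a keyframe-selection loop: features and key points are extracted from a current frame and a next frame, the descriptors are matched to find corresponding key points, and a new keyframe is declared when the average image-space distance between corresponding key points exceeds a threshold. The sketch below illustrates that loop under one reading of the claims; the `extract` callable stands in for the patent's neural network model, and the function names, brute-force matching, and pixel threshold are illustrative assumptions, not the patent's implementation.

```python
import numpy as np

def match_descriptors(desc_a: np.ndarray, desc_b: np.ndarray) -> np.ndarray:
    # Brute-force nearest-neighbour matching: for each descriptor in
    # desc_a (N x D), return the index of the closest descriptor in
    # desc_b (M x D) under Euclidean distance.
    dists = np.linalg.norm(desc_a[:, None, :] - desc_b[None, :, :], axis=2)
    return np.argmin(dists, axis=1)

def average_keypoint_motion(kpts_a, desc_a, kpts_b, desc_b) -> float:
    # Average pixel distance between corresponding key points
    # (claim 5's "average distance of the corresponding key points").
    idx = match_descriptors(desc_a, desc_b)
    return float(np.linalg.norm(kpts_a - kpts_b[idx], axis=1).mean())

def select_keyframes(frames, extract, threshold: float):
    # Keep a frame as a new keyframe when the matched key points have
    # moved, on average, more than `threshold` pixels since the last
    # keyframe; otherwise continue with subsequent frames (claims 6-7).
    # `extract(frame) -> (keypoints N x 2, descriptors N x D)` stands in
    # for the neural feature extractor.
    keyframes = [0]
    cur_kpts, cur_desc = extract(frames[0])
    for i in range(1, len(frames)):
        kpts, desc = extract(frames[i])
        if average_keypoint_motion(cur_kpts, cur_desc, kpts, desc) > threshold:
            keyframes.append(i)          # enough motion: new keyframe
            cur_kpts, cur_desc = kpts, desc  # it becomes the current frame
    return keyframes
```

In this sketch the threshold trades off redundancy against coverage: a small value keeps nearly every frame, a large value keeps only frames with substantial camera motion.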
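
Claims 8 and 18 describe blur detection by convolving a keyframe with a Laplacian kernel, computing the variance of the result, and using that variance to judge validity. This matches the well-known variance-of-Laplacian sharpness measure: sharp frames have strong edge responses and high variance, blurred frames have low variance. A minimal numpy-only sketch under that reading; `laplacian_variance`, `is_blurry`, and the threshold value of 100 are assumptions for illustration, not values from the patent.

```python
import numpy as np

# Standard 3x3 discrete Laplacian kernel.
LAPLACIAN_KERNEL = np.array([[0,  1, 0],
                             [1, -4, 1],
                             [0,  1, 0]], dtype=np.float64)

def laplacian_variance(gray: np.ndarray) -> float:
    # Convolve a grayscale image (H x W) with the Laplacian kernel
    # ("valid" region only) and return the variance of the response.
    img = gray.astype(np.float64)
    h, w = img.shape
    out = np.zeros((h - 2, w - 2))
    for dy in range(3):
        for dx in range(3):
            out += LAPLACIAN_KERNEL[dy, dx] * img[dy:dy + h - 2, dx:dx + w - 2]
    return float(out.var())

def is_blurry(gray: np.ndarray, threshold: float = 100.0) -> bool:
    # Frames whose Laplacian variance falls below the threshold are
    # treated as invalid (blurred); the threshold is tunable per dataset.
    return laplacian_variance(gray) < threshold
```

Frames flagged as blurry would then be candidates for the machine learning-based deblurring of claims 10 and 20 before being re-added to the candidate keyframes.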

Description

KEYFRAME EXTRACTION FROM VIDEOS

CROSS-REFERENCE TO RELATED APPLICATION

[0001] This application claims the benefit of U.S. Provisional Patent Application Serial No. 63/510,744, filed June 28, 2023 and entitled “Keyframe Extraction From Videos,” the contents of which are incorporated by reference herein in their entirety.

BACKGROUND

[0002] Processing systems (e.g., smartphones, laptop computers, tablet computers, wearable computing devices, and/or the like, including combinations and/or multiples thereof) can include a sensor (e.g., a camera) for capturing images, such as of an object or environment. In some cases, the images are processed, analyzed, or otherwise used for some purpose, such as to measure environments or objects. For example, photogrammetry is a technique for measuring objects using images, such as photographic images acquired by a camera or other suitable sensor of a processing system. Photogrammetry can make 3D measurements from 2D images or photographs.

[0003] Accordingly, while existing processing systems are suitable for their intended purposes, the need for improvement remains, particularly in providing a processing system having the features described herein.

BRIEF DESCRIPTION

[0004] In one embodiment, a method is provided. The method includes receiving a video of an environment. The method further includes extracting keyframes from the video using a machine learning model to generate extracted keyframes. The method further includes performing blur detection on the extracted keyframes to remove invalid keyframes from the extracted keyframes to generate candidate keyframes. The method further includes performing image enhancement on at least one of the invalid keyframes to generate at least one enhanced keyframe, the at least one enhanced keyframe being added to the candidate keyframes. The method further includes generating a desired output based at least in part on the candidate keyframes.

[0005] In another embodiment, a system includes a memory having computer readable instructions. The system further includes a processing device for executing the computer readable instructions. The computer readable instructions control the processing device to perform operations. The operations include receiving a video of an environment. The operations further include extracting keyframes from the video using a machine learning model to generate extracted keyframes. The operations further include performing blur detection on the extracted keyframes to remove invalid keyframes from the extracted keyframes to generate candidate keyframes. The operations further include performing image enhancement on at least one of the invalid keyframes to generate at least one enhanced keyframe, the at least one enhanced keyframe being added to the candidate keyframes. The operations further include generating a desired output based at least in part on the candidate keyframes.

[0006] The above features and advantages, and other features and advantages, of the disclosure are readily apparent from the following detailed description when taken in connection with the accompanying drawings.

BRIEF DESCRIPTION OF DRAWINGS

[0007] The specifics of the exclusive rights described herein are particularly pointed out and distinctly claimed in the claims at the conclusion of the specification. The foregoing and other features and advantages of one or more embodiments described herein are apparent from the following detailed description taken in conjunction with the accompanying drawings, in which:

[0008] FIG. 1 is a schematic illustration of a processing system for keyframe extraction for videogrammetry according to one or more embodiments described herein;

[0009] FIG. 2 is a schematic illustration of a system for keyframe extraction for videogrammetry according to one or more embodiments described herein;

[0010] FIG. 3A is a flow diagram of a method for keyframe extraction for videogrammetry according to one or more embodiments described herein;

[0011] FIG. 3B is a flow diagram of a method for a deep learning-based approach to extract local features and keyframes according to one or more embodiments described herein;

[0012] FIG. 4 is a flow diagram of a method for keyframe extraction for videogrammetry according to one or more embodiments described herein;

[0013] FIG. 5 is a schematic illustration of a machine learning training and inference system according to one or more embodiments described herein; and

[0014] FIG. 6 is a schematic illustration of a processing system for implementing the presently described techniques according to one or more embodiments described herein.

[0015] The detailed description explains embodiments of the disclosure, together with advantages and features, by way of example with reference to the drawings.

DETAILED DESCRIPTION

[0016] Embodiments described herein provide for extracting keyframes from videos. According to an embodiment, keyframes are extracted from a video based on video scen