CN-122024125-A - Computer-implemented method, computing device, and article of manufacture for enhancing video streams
Abstract
Provided are a computer-implemented method, computing device, and article of manufacture for enhancing video streams. An example method includes: receiving, by a first computing device, a plurality of input images forming the video stream; for each of the plurality of input images, applying a geometric model to the input image to determine a surface orientation map indicative of a distribution of illumination on an object in the input image based on a surface geometry of the object; applying an ambient light estimation model to the input image to determine a direction of composite illumination to be applied to the input image to enhance at least a portion of the input image; applying a light energy model, based on the surface orientation map and the direction of composite illumination, to determine a quotient image indicative of an amount of light energy to be applied to each pixel of the input image; enhancing a portion of the input image based on the quotient image to generate a re-illuminated image; and displaying the plurality of re-illuminated images as an enhanced video stream.
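The abstract's per-frame relighting step can be sketched in code. The following is a minimal illustration, not the patented implementation: it assumes a simple Lambertian shading model, and the function name, the `intensity` parameter, and the array layouts are all illustrative assumptions. The quotient image acts as a per-pixel multiplicative gain, so that the re-illuminated frame is the input frame times the quotient (consistent with claim 16).

```python
import numpy as np

def relight_frame(image, normals, light_dir, intensity=0.6):
    """Sketch of the per-frame relighting step described in the abstract.

    image     : H x W x 3 float array in [0, 1] (the input frame)
    normals   : H x W x 3 unit surface normals (the "surface orientation map")
    light_dir : 3-vector, direction of the synthetic ("composite") light
    The Lambertian shading model and all names here are assumptions for
    illustration, not taken from the patent text.
    """
    light_dir = np.asarray(light_dir, dtype=np.float64)
    light_dir = light_dir / np.linalg.norm(light_dir)

    # Light "visibility": how strongly each surface point faces the light
    # (cf. the light visibility map of claim 11).
    visibility = np.clip(normals @ light_dir, 0.0, 1.0)      # H x W

    # Quotient image: per-pixel multiplicative gain, so that
    # relit = input * quotient (cf. claim 16).
    quotient = 1.0 + intensity * visibility[..., None]       # H x W x 1

    return np.clip(image * quotient, 0.0, 1.0)
```

A pixel whose normal faces the synthetic light is brightened by up to the `intensity` factor; a pixel facing away receives a quotient of 1 and is left unchanged.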
Inventors
- S. R. F. Fanello
- N. P. Sama
- Y.-T. Tsai
- R. K. Pandey
- P. Debevec
- M. Milne
- C. LeGendre
- J. T. Barron
- C. Lehman
- S. Bouaziz
Assignees
- Google LLC
Dates
- Publication Date: 2026-05-12
- Application Date: 2021-05-17
- Priority Date: 2020-09-30
Claims (20)
- 1. A computer-implemented method for enhancing a video stream, the method comprising: receiving, by a first computing device, a plurality of input images forming the video stream; for each of the plurality of input images: applying a geometric model to the input image to determine a surface orientation map indicative of a distribution of illumination on an object in the input image based on a surface geometry of the object; applying an ambient light estimation model to the input image to determine a direction of a composite illumination to be applied to the input image to enhance at least a portion of the input image; applying a light energy model, based on the surface orientation map and the direction of the composite illumination, to determine a quotient image indicative of an amount of light energy to be applied to each pixel of the input image; and enhancing the portion of the input image based on the quotient image to generate a re-illuminated image; and displaying the plurality of re-illuminated images as an enhanced video stream.
- 2. The method of claim 1, wherein applying the ambient light estimation model comprises: detecting a pose of the object in the input image; and determining the direction of the composite illumination based on the pose.
- 3. The method of claim 2, wherein detecting the pose comprises inferring a 3D surface geometry using a face geometry solution comprising a face pose transformation matrix and a triangular face mesh.
- 4. The method of claim 2, wherein the pose of the object is used to automatically infer the direction of the composite illumination.
- 5. The method of claim 2, wherein detecting the pose comprises performing high-fidelity upper-body pose tracking that infers two-dimensional (2D) upper-body landmarks from the plurality of input images.
- 6. The method of claim 1, wherein at least one of the geometric model, the ambient light estimation model, or the light energy model comprises a machine learning model, and wherein the method further comprises training the machine learning model based on a training dataset comprising a plurality of images of the object having a plurality of illumination profiles.
- 7. The method of claim 1, wherein the enhancing is performed on a second computing device, and the method further comprises: transmitting, by the first computing device, data associated with the input image to the second computing device; and receiving, by the first computing device, the re-illuminated image from the second computing device.
- 8. The method of claim 7, wherein the first computing device is a mobile device and the second computing device is a remote server.
- 9. The method of claim 1, wherein applying the ambient light estimation model comprises: receiving, via an interactive graphical user interface, a user preference for the direction of the composite illumination; and adjusting the direction of the composite illumination based on the user preference.
- 10. The method of claim 1, further comprising: detecting a plurality of objects in the input image; and applying different re-lighting effects to at least two objects of the plurality of objects.
- 11. The method of claim 1, further comprising: generating a light visibility map based on the surface orientation map and the direction of the composite illumination, wherein the quotient image is determined based on the light visibility map.
- 12. The method of claim 1, further comprising: post-processing the re-illuminated image, including one or more of compensating for an exposure level in the input image, compensating for a brightness level in the input image, or matting refinement of smooth edges.
- 13. The method of claim 1, further comprising: receiving the video stream from a camera integrated with the first computing device.
- 14. The method of claim 1, further comprising: displaying the enhanced video stream on a display integrated with the first computing device.
- 15. The method of claim 1, wherein the method is optimized to run at interactive frame rates by using a UNet model and float16 quantization.
- 16. The method of claim 1, wherein the quotient image is a quotient of the re-illuminated image and the input image.
- 17. The method of claim 1, further comprising: providing an interactive graphical user interface (GUI) on the first computing device to enable a user to adjust a light position and an intensity scale of the composite illumination in real time.
- 18. The method of claim 1, wherein applying the ambient light estimation model to the input image comprises generating a high dynamic range (HDR) illumination environment from low dynamic range (LDR) images of a set of reference objects, wherein each object in the set of reference objects has a respective bi-directional reflectance distribution function (BRDF).
- 19. The method of claim 1, wherein the receiving of the plurality of input images occurs at a real-time frame rate, and wherein the enhancing of each input image occurs at a rate sufficient to maintain the real-time frame rate.
- 20. A computing device for enhancing a video stream, comprising: one or more processors; and a data storage device having stored thereon computer-executable instructions that, when executed by the one or more processors, cause the computing device to perform the computer-implemented method of any of claims 1 to 19.
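Claims 2-4 derive the composite-light direction from a detected pose. One way this could work is sketched below; the fixed azimuth/elevation offsets, the local coordinate convention, and the function name are illustrative assumptions for a portrait-style key light, not the patent's method. The input is a 3x3 face-pose rotation matrix, as in the face-pose transformation matrix of claim 3.

```python
import numpy as np

def key_light_direction(pose_rotation, azimuth_deg=45.0, elevation_deg=30.0):
    """Illustrative sketch of claims 2-4: infer the composite-illumination
    direction automatically from a detected pose.

    pose_rotation : 3x3 rotation matrix of the face pose
    The azimuth/elevation offsets place an assumed key light above and to
    the side of the face, a common portrait-lighting convention.
    """
    az, el = np.radians(azimuth_deg), np.radians(elevation_deg)
    # Unit light direction in the face's local frame
    # (x: subject's right, y: up, z: toward the camera).
    local = np.array([np.sin(az) * np.cos(el),
                      np.sin(el),
                      np.cos(az) * np.cos(el)])
    # Rotate into camera/world coordinates so the light follows the head pose.
    return pose_rotation @ local
```

Because the offset is expressed in the face's local frame, the synthetic light tracks the head as it turns, which is the behavior claim 4 describes as automatically inferring the light direction from the pose.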
Description
Computer-implemented method, computing device, and article of manufacture for enhancing video streams

This application is a divisional of a patent application with an international filing date of May 17, 2021, Chinese application number 202180067003.5, and the title "Enhanced photo re-illumination based on machine learning models."

Cross-Reference to Related Applications
The present application claims priority from U.S. provisional patent application No. 63/085,529, filed on September 30, 2020, the entire contents of which are incorporated herein by reference.

Technical Field
The present disclosure relates to methods and apparatus for re-illuminating photographs for enhancement, and more particularly to methods and apparatus for re-illuminating photographs based on machine learning models.

Background
Many modern computing devices, including mobile phones, personal computers, and tablets, include image capture devices such as still and/or video cameras. The image capture device may capture images, including, for example, images of people, animals, landscapes, and/or objects. Some image capture devices and/or computing devices may correct or otherwise modify the captured image. For example, some image capture devices may provide "red-eye" correction that removes artifacts, such as the red-appearing eyes of humans and animals, that may be present in images captured using intense light such as flash illumination. After correcting a captured image, the corrected image may be saved, displayed, transmitted, printed on paper, and/or otherwise used. A professional photographer (such as, for example, a portrait photographer) creates an attractive picture of a subject using the properties of light on the subject. Such photographers often use specialized equipment, such as off-camera flashes and reflectors, to position the illumination and light their subjects to achieve a professional appearance.
In some instances, such activities are performed in a controlled studio environment and involve expertise in equipment, lighting, and the like. Mobile phone users often do not have access to such dedicated portrait-studio resources or know how to use them. However, users may prefer to obtain the professional, high-quality results of experienced portrait photographers.

Disclosure of Invention
In one aspect, the image capture device may be configured to translate a professional photographer's understanding of light and use of off-camera illumination into a computer-implemented method. Powered by a system of machine learning components, the image capture device may be configured to enable a user to create attractive illumination for a portrait or other type of image. In some aspects, a mobile device may be configured with these features so that images may be enhanced in real time. In some cases, the mobile device may automatically enhance the image. In other aspects, mobile phone users may enhance images non-destructively to match their preferences. Further, for example, pre-existing images in a user's image library may be enhanced based on the techniques described herein. In one aspect, a computer-implemented method is provided. The computing device applies a geometric model to the input image to determine a surface orientation map indicative of a lighting distribution on an object in the input image based on a surface geometry of the object. The computing device applies an ambient light estimation model to the input image to determine a direction of composite illumination to be applied to the input image to enhance at least a portion of the input image. The computing device applies a light energy model, based on the surface orientation map and the direction of the composite illumination, to determine a quotient image indicative of the amount of light energy to be applied to each pixel of the input image.
The computing device enhances the portion of the input image based on the quotient image. In another aspect, a computing device is provided. The computing device includes one or more processors and data storage. The data storage has stored thereon computer-executable instructions that, when executed by the one or more processors, cause the computing device to perform functions. The functions include applying a geometric model to the input image to determine a surface orientation map indicative of an illumination distribution on an object in the input image based on a surface geometry of the object, applying an ambient light estimation model to the input image to determine a direction of composite illumination to be applied to the input image to enhance at least a portion of the input image, applying a light energy model to determine a quotient image indicative of light energy to be applied to each pixel of the input image based on the surface orientation map and the direction of composite illumination, and enhancing the portion of the input image based on the quotient image.
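The functions above describe a per-frame pipeline of three models feeding a multiplicative enhancement. A minimal sketch of that control flow is shown below; the three model callables are placeholders for the geometric model, ambient-light estimation model, and light-energy model, and their signatures are assumptions for illustration, not the patent's interfaces.

```python
def enhance_stream(frames, geometry_model, light_model, energy_model):
    """Minimal sketch of the overall per-frame loop from the description.

    frames         : iterable of input frames (e.g. float image arrays)
    geometry_model : frame -> surface orientation map
    light_model    : frame -> composite-light direction
    energy_model   : (orientation map, direction) -> quotient image
    Yields one re-illuminated frame per input frame, so the enhanced video
    stream can be displayed as frames become available.
    """
    for frame in frames:
        normals = geometry_model(frame)              # surface orientation map
        light_dir = light_model(frame)               # composite-light direction
        quotient = energy_model(normals, light_dir)  # per-pixel light energy
        yield frame * quotient                       # re-illuminated frame
```

Writing the loop as a generator keeps per-frame latency low, which matters for claim 19's requirement that enhancement keep pace with a real-time frame rate.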