
KR-20260062990-A - FACIAL SYNTHESIS IN AUGMENTED REALITY CONTENT FOR THIRD PARTY APPLICATIONS

KR 20260062990 A

Abstract

The present technology receives, via a third-party application, a selection of a selectable graphical item to initiate generation of augmented reality content including facial synthesis, the third-party application being executed by a computing device distinct from a first-party application and from a messaging server system associated with the first-party application. The present technology captures image data with the client device. The present technology generates, by one or more hardware processors, sets of source pose parameters based at least in part on frames of source media content. The present technology generates output media content, based at least in part on the sets of source pose parameters, using an interface in communication with the messaging server system. The present technology then provides the augmented reality content, based at least in part on the output media content, for display on the computing device.
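The pipeline the abstract describes — capture image data, extract per-frame head-pose and expression parameters from source media, then pair those parameters with the captured target face to build output frames — can be sketched as a minimal, hypothetical Python flow. All class and function names below are illustrative stand-ins, not from the patent, and the "extraction" step fabricates neutral parameters where a real system would run a face tracker:

```python
from dataclasses import dataclass

@dataclass
class SourcePoseParams:
    """Per-frame head position and facial expression of the source actor."""
    head_position: tuple  # e.g. (yaw, pitch, roll)
    expression: dict      # e.g. blendshape weights

def extract_pose_params(source_frames):
    """Derive one set of source pose parameters per source frame.

    A real system would run a face tracker / landmark model here; this
    sketch just emits neutral parameters for each frame.
    """
    return [SourcePoseParams(head_position=(0.0, 0.0, 0.0),
                             expression={"smile": 0.5})
            for _ in source_frames]

def synthesize_output_media(target_face_image, pose_params):
    """Produce one output frame per pose-parameter set, each pairing the
    captured target face with the source actor's pose/expression."""
    return [{"target_face": target_face_image, "pose": p} for p in pose_params]

# Usage: three source frames drive three output frames.
source_frames = ["frame0", "frame1", "frame2"]
params = extract_pose_params(source_frames)
output_media = synthesize_output_media("captured_selfie", params)
print(len(output_media))  # → 3
```

The key structural point mirrored from the claims is that pose extraction and output synthesis are separate stages, so the same source pose parameters can drive any captured target face.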

Inventors

  • Tkachenko, Grigoriy
  • Zaitseva, Inna

Assignees

  • Snap Inc.

Dates

Publication Date
2026-05-07
Application Date
2022-03-30
Priority Date
2021-03-31

Claims (20)

  1. A method comprising: providing, by one or more hardware processors, a first graphical interface for display in a third-party application, the first graphical interface including a first selectable graphical item for text entry, the third-party application being executed by a computing device distinct from a first-party application and from a messaging server system associated with the first-party application, the first-party application and the messaging server system corresponding to a same entity; in response to receiving a selection of the first selectable graphical item, providing, by the one or more hardware processors, a second graphical interface for display in the third-party application, the second graphical interface including at least a second selectable graphical item to initiate generation of augmented reality content including facial synthesis; receiving, by the one or more hardware processors, the selection of the second selectable graphical item to initiate generation of the augmented reality content including facial synthesis, the selection of the second selectable graphical item being received by the third-party application; generating, by the one or more hardware processors and based at least in part on frames of source media content, sets of source pose parameters, the sets of source pose parameters including positions of a head of a source actor and facial expressions of the source actor in the frames of the source media content; generating, by the one or more hardware processors and based at least in part on the sets of source pose parameters, output media content using an interface in communication with the messaging server system, wherein at least one frame of the output media content includes an image of a target face from captured image data, the image of the target face being modified, based on at least one of the sets of source pose parameters, to mimic at least one of the facial expressions in the frames of the source media content or the positions of the head of the source actor; and providing the augmented reality content, based at least in part on the output media content, for display on the computing device.
  2. The method of claim 1, further comprising: providing, for display, a set of selectable graphical items corresponding to source media content having a source actor for facial synthesis, wherein each selectable graphical item is associated with respective source media content of a different source actor, the respective source media content of each different source actor comprising media content different from the media content of the different source actors associated with the other selectable graphical items.
  3. The method of claim 2, further comprising: receiving, by the third-party application, a selection of a particular selectable graphical item from the set of selectable graphical items; and in response to the selection, modifying, by the messaging server system, the image of the target face based on a set of source pose parameters to mimic at least one of the facial expressions in the frames of the source media content or the positions of the head of the source actor.
  4. The method of claim 3, wherein the modified image of the target face is based on generating, using a first encoder network, a first set of facial features based on a set of frames of the source media content, the generating producing a first set of image data with a lower dimensionality than the frames of the source media content, the lower dimensionality including a resolution lower than a particular resolution of the frames of the source media content.
  5. The method of claim 4, further comprising: generating, using a first decoder network associated with the target face, a first set of output images or frames included in the output media content, the first set of output images or frames including a first modification of head positions and facial expressions of a target actor based on a particular set of facial expressions of the source actor and facial features from the head of the source actor.
  6. The method of claim 5, wherein the first encoder network comprises a first deep convolutional neural network, the first decoder network comprises a second deep convolutional neural network, and the first deep convolutional neural network and the second deep convolutional neural network are different neural networks.
  7. The method of claim 3, wherein modifying the image of the target face comprises: receiving, through the interface in communication with the messaging server system, a request from the third-party application to modify the image of the target face; in response to the request, performing, by the messaging server system, a set of operations to modify the image of the target face in each frame included in the output media content; and transmitting, by the messaging server system via the interface, the augmented reality content, including the modified image of the target face in each frame of the output media content, to the third-party application.
  8. The method of claim 4, wherein generating the first set of facial features using the first encoder network comprises: deploying, by the messaging server system, a first machine learning model corresponding to the first encoder network, the first machine learning model being deployed to the messaging server system for execution in a machine learning execution environment provided by the messaging server system; and causing, by the messaging server system, the first machine learning model to perform a set of operations to generate the first set of facial features.
  9. The method of claim 5, wherein generating the first set of output images or frames included in the output media content using the first decoder network associated with the target face comprises: deploying, by the messaging server system, a second machine learning model corresponding to the first decoder network, the second machine learning model being deployed to the messaging server system for execution in a machine learning execution environment provided by the messaging server system; and causing, by the messaging server system, the second machine learning model to perform a set of operations to generate the first set of output images or frames included in the output media content.
  10. The method of claim 1, wherein providing the augmented reality content based at least in part on the output media content for display on the computing device comprises: transmitting, by the messaging server system via the interface, the augmented reality content to the third-party application; receiving, by the third-party application, the augmented reality content; and rendering, by the third-party application, the augmented reality content for display in a graphical interface of the third-party application.
  11. A system comprising: a processor; and a memory including instructions that, when executed by the processor, cause the processor to perform operations comprising: providing, by one or more hardware processors, a first graphical interface for display in a third-party application, the first graphical interface including a first selectable graphical item for text entry, the third-party application being executed by a computing device distinct from a first-party application and from a messaging server system associated with the first-party application, the first-party application and the messaging server system corresponding to a same entity; in response to receiving a selection of the first selectable graphical item, providing, by the one or more hardware processors, a second graphical interface for display in the third-party application, the second graphical interface including at least a second selectable graphical item to initiate generation of augmented reality content including facial synthesis; receiving, by the one or more hardware processors, the selection of the second selectable graphical item to initiate generation of the augmented reality content including facial synthesis, the selection of the second selectable graphical item being received by the third-party application; generating, by the one or more hardware processors and based at least in part on frames of source media content, sets of source pose parameters, the sets of source pose parameters including positions of a head of a source actor and facial expressions of the source actor in the frames of the source media content; and generating, by the one or more hardware processors and based at least in part on the sets of source pose parameters, output media content using an interface in communication with the messaging server system, wherein at least one frame of the output media content includes an image of a target face from captured image data, the image of the target face being modified, based on at least one of the sets of source pose parameters, to mimic at least one of the facial expressions in the frames of the source media content or the positions of the head of the source actor.
  12. The system of claim 11, the operations further comprising: providing, for display, a set of selectable graphical items corresponding to source media content having a source actor for facial synthesis, wherein each selectable graphical item is associated with respective source media content of a different source actor, the respective source media content of each different source actor comprising media content different from the media content of the different source actors associated with the other selectable graphical items.
  13. The system of claim 12, the operations further comprising: receiving, by the third-party application, a selection of a particular selectable graphical item from the set of selectable graphical items; and in response to the selection, modifying, by the messaging server system, the image of the target face based on a set of source pose parameters to mimic at least one of the facial expressions in the frames of the source media content or the positions of the head of the source actor.
  14. The system of claim 13, wherein the modified image of the target face is based on generating, using a first encoder network, a first set of facial features based on a set of frames of the source media content, the generating producing a first set of image data with a lower dimensionality than the frames of the source media content, the lower dimensionality including a resolution lower than a particular resolution of the frames of the source media content.
  15. The system of claim 14, the operations further comprising: generating, using a first decoder network associated with the target face, a first set of output images or frames included in the output media content, the first set of output images or frames including a first modification of head positions and facial expressions of a target actor based on the facial expressions of the source actor and a particular set of facial features from the head of the source actor.
  16. The system of claim 15, wherein the first encoder network comprises a first deep convolutional neural network, the first decoder network comprises a second deep convolutional neural network, and the first deep convolutional neural network and the second deep convolutional neural network are different neural networks.
  17. The system of claim 13, wherein the operation of modifying the image of the target face comprises: receiving, through the interface in communication with the messaging server system, a request from the third-party application to modify the image of the target face; in response to the request, performing, by the messaging server system, a set of operations to modify the image of the target face in each frame included in the output media content; and transmitting, by the messaging server system via the interface, the augmented reality content, including the modified image of the target face in each frame of the output media content, to the third-party application.
  18. The system of claim 14, wherein the operation of generating the first set of facial features using the first encoder network comprises: deploying, by the messaging server system, a first machine learning model corresponding to the first encoder network, the first machine learning model being deployed to the messaging server system for execution in a machine learning execution environment provided by the messaging server system; and causing, by the messaging server system, the first machine learning model to perform a set of operations to generate the first set of facial features.
  19. The system of claim 15, wherein the operation of generating the first set of output images or frames included in the output media content using the first decoder network associated with the target face comprises: deploying, by the messaging server system, a second machine learning model corresponding to the first decoder network, the second machine learning model being deployed to the messaging server system for execution in a machine learning execution environment provided by the messaging server system; and causing, by the messaging server system, the second machine learning model to perform a set of operations to generate the first set of output images or frames included in the output media content.
  20. A non-transitory computer-readable medium comprising instructions that, when executed by a computing device, cause the computing device to perform operations comprising: providing, by one or more hardware processors, a first graphical interface for display in a third-party application, the first graphical interface including a first selectable graphical item for text entry, the third-party application being executed by a computing device distinct from a first-party application and from a messaging server system associated with the first-party application, the first-party application and the messaging server system corresponding to a same entity; in response to receiving a selection of the first selectable graphical item, providing, by the one or more hardware processors, a second graphical interface for display in the third-party application, the second graphical interface including at least a second selectable graphical item to initiate generation of augmented reality content including facial synthesis; receiving, by the one or more hardware processors, the selection of the second selectable graphical item to initiate generation of the augmented reality content including facial synthesis, the selection of the second selectable graphical item being received by the third-party application; generating, by the one or more hardware processors and based at least in part on frames of source media content, sets of source pose parameters, the sets of source pose parameters including facial expressions of the source actor and positions of a head of the source actor in the frames of the source media content; and generating, by the one or more hardware processors and based at least in part on the sets of source pose parameters, output media content using an interface in communication with the messaging server system, wherein at least one frame of the output media content includes an image of a target face from captured image data, the image of the target face being modified, based on at least one of the sets of source pose parameters, to mimic at least one of the facial expressions in the frames of the source media content or the positions of the head of the source actor.
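Claims 4-6 (and 14-16) describe an encoder network that compresses source frames into facial features of lower dimensionality (including lower resolution) and a distinct decoder network "associated with" the target face that produces the output frames. The following is a toy sketch of that encoder/decoder split, not the patent's method: real systems use two different deep convolutional neural networks, while here the encoder is simple block-averaging and the decoder is nearest-neighbor upsampling plus a per-identity bias standing in for target-specific decoder weights:

```python
import numpy as np

def encode(frame: np.ndarray, factor: int = 4) -> np.ndarray:
    """Reduce an (H, W) frame to (H//factor, W//factor) 'facial features'
    by block averaging -- i.e., a resolution lower than the source frame."""
    h, w = frame.shape
    cropped = frame[:h - h % factor, :w - w % factor]
    return cropped.reshape(h // factor, factor,
                           w // factor, factor).mean(axis=(1, 3))

def decode(features: np.ndarray, target_face_bias: float,
           factor: int = 4) -> np.ndarray:
    """Expand features back to an output frame; the per-identity bias is
    a stand-in for decoder weights tied to one target face."""
    upsampled = features.repeat(factor, axis=0).repeat(factor, axis=1)
    return upsampled + target_face_bias

# Usage: a 64x64 source frame yields 16x16 features (lower dimensionality),
# which a target-specific decoder expands to a 64x64 output frame.
source_frame = np.random.rand(64, 64)
features = encode(source_frame)
output = decode(features, target_face_bias=0.1)
print(features.shape, output.shape)
```

The design point mirrored here is the asymmetry in the claims: one shared-style encoder can feed many decoders, each trained for (associated with) a different target face.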

Description

Facial Synthesis in Augmented Reality Content for Third-Party Applications

Claim of priority: This application claims the benefit of priority to U.S. Provisional Application No. 63/200,882, filed March 31, 2021, which is incorporated herein by reference in its entirety for all purposes.

With the increasing use of digital images, the affordability of portable computing devices, the availability of digital storage media with increased capacity, and the increased bandwidth and accessibility of network connections, digital images have become part of daily life for an increasing number of people.

[Prior Art Literature]

  • U.S. Patent Application Publication US 2022/0319127 A1
  • International Publication WO 2020/234852 A1

To facilitate identification of the discussion of any particular element or act, the most significant digit or digits of a reference number refer to the figure number in which that element is first introduced. FIG. 1 is a schematic representation of a networked environment in which the present disclosure may be deployed, according to some example embodiments. FIG. 2 is a schematic representation of a messaging client application, according to some example embodiments. FIG. 3 is a schematic representation of a data structure as maintained in a database, according to some example embodiments. FIG. 4 is a schematic representation of a message, according to some example embodiments. FIG. 5 is a schematic diagram illustrating the structure of the message annotations described in FIG. 4, which include additional information corresponding to a given message, according to some embodiments. FIG. 6 is a block diagram illustrating various modules of a messaging client application, according to certain example embodiments. FIG. 7 illustrates example interfaces (e.g., graphical user interfaces), according to some embodiments. FIG. 8 illustrates example interfaces (e.g., graphical user interfaces), according to some embodiments. FIG. 9 illustrates example interfaces (e.g., graphical user interfaces), according to some embodiments. FIG. 10 is a flowchart illustrating a method, according to certain example embodiments. FIG. 11 is a block diagram illustrating a software architecture within which the present disclosure can be implemented, according to some example embodiments. FIG. 12 is a schematic representation of a machine in the form of a computer system within which a set of instructions may be executed to cause the machine to perform any one or more of the methodologies discussed, according to some example embodiments.

Users with diverse interests, in various locations, capture digital images of a variety of subjects and make the captured images available to others via networks such as the Internet. Enhancing users' experiences with digital images and providing a variety of features requires computing devices to perform image processing operations on various objects and/or features captured in a wide range of changing conditions (e.g., changes in image scale, noise, lighting, movement, or geometric distortion), which can be difficult and computationally intensive.

Augmented reality technology aims to bridge the gap between virtual environments and a real-world environment by providing an enhanced real-world environment that is augmented with electronic information, so that the electronic information appears to the user to be part of the real-world environment. In an example, augmented reality technology further provides a user interface for interacting with the electronic information that is overlaid on the enhanced real-world environment.

As mentioned above, with the increasing use of digital images, the availability of portable computing devices, the availability of digital storage media with increased capacity, and the increased bandwidth and accessibility of network connections, digital images have become part of daily life for an increasing number of people. Users with diverse interests, in various locations, capture digital images of a variety of subjects and make the captured images available to others via networks such as the Internet. Enhancing users' experiences with digital images and providing a variety of features requires computing devices to perform image processing operations on various objects and/or features captured in a wide range of changing conditions (e.g., changes in image scale, noise, lighting, movement, or geometric distortion), which can be difficult and computationally intensive.

Messaging systems are frequently and increasingly used by users of mobile computing devices, in various settings, to provide different types of functionality in a convenient manner. As described herein, the subject messaging system includes practical applications that provide improvements in rendering augmented reality content generators (e.g., providing augmented reality experiences) on media content (e.g., images, video, and the like), in which particular augmented reality content generators can be activated, through an improved system that enables the provision of aug