EP-4315281-B1 - TRUE SIZE EYEWEAR IN REAL TIME
Inventors
- ASSOULINE, Avihay
- BERGER, Itamar
- LUO, Jean
- ZOHAR, Matan
Dates
- Publication Date
- 2026-05-13
- Application Date
- 2022-03-21
Claims (15)
- A method comprising: receiving (1001), by one or more processors, an image that includes a depiction of a face of a user; generating (1002) a plurality of landmarks of the face based on the received image; removing (1003) a set of predetermined landmarks from the plurality of landmarks resulting in a remaining set of landmarks of the plurality of landmarks; tracking movement of the remaining set of landmarks over a plurality of frames comprising the image; for each landmark of the remaining set of landmarks: determining a stability parameter as a function of the movement of the landmark over the plurality of frames; and in dependence on the landmark having a stability parameter that is lower than a threshold value, removing that landmark from the remaining set of landmarks; obtaining (1004) a depth map for the face of the user, the depth map being associated with the received image; and computing (1005) facial measurements of the face of the user based on the depth map and the remaining set of landmarks, wherein the computing (1005) involves determining, based on the depth map, a distance from a client device associated with the image to one or more of the landmarks.
- The method of claim 1, further comprising: sorting the remaining set of landmarks based on a visibility parameter of each landmark, the visibility parameter indicating an amount of a corresponding landmark that is visible in the image.
- The method of any one of claims 1-2, further comprising: obtaining a generic three-dimensional facial model representation; matching the remaining set of landmarks to the generic three-dimensional facial model representation; and determining the visibility parameter as a function of a number of the remaining set of landmarks that match the generic three-dimensional facial model representation.
- The method of any one of claims 1-3, wherein a number of frames included in the plurality of frames is based on a frame rate used to capture the plurality of frames.
- The method of any one of claims 1-4, wherein the set of predetermined landmarks comprises at least one of a hair region, one or more facial garments, or a face mask.
- The method of any one of claims 1-5, wherein generating the plurality of landmarks of the face based on the received image comprises applying a machine learning technique to the image to identify the plurality of landmarks.
- The method of any one of claims 1-6, further comprising: selecting a threshold number of top landmarks from the remaining set of landmarks based on visibility or stability parameters associated with each of the remaining set of landmarks, wherein the facial measurements of the face of the user are computed based on the depth map and the selected threshold number of top landmarks; preferably, wherein the threshold number comprises two top landmarks.
- The method of claim 7, further comprising: selecting eyes and nose landmarks as the top landmarks in response to determining that the eyes and nose landmarks are associated with greater visibility and stability parameters than other landmarks in the remaining set of landmarks.
- The method of any one of claims 7-8, wherein the top landmarks are selected at random from the remaining set of landmarks; preferably, wherein a first set of top landmarks are selected for a first subset of frames of a video comprising the image, and wherein a second set of top landmarks are selected for a second subset of frames of the video.
- The method of any one of claims 1-9, further comprising: updating the plurality of landmarks as each frame of a video depicting the face is received; and iteratively correcting the facial measurements based on the updated plurality of landmarks by repeating the removing and obtaining operations for the updated plurality of landmarks.
- The method of any one of claims 1-10, further comprising: obtaining an augmented reality graphical element comprising augmented reality eyewear; identifying a nose bridge landmark based on the remaining set of landmarks; and positioning the augmented reality graphical element within the image on the face of the user based on the nose bridge landmark and the depth map; preferably, further comprising: positioning a nose bridge portion of the augmented reality graphical element a predetermined distance above the nose bridge landmark.
- The method of claim 11, further comprising: adjusting a scale of the augmented reality graphical element based on the computed facial measurements of the face of the user.
- The method of claim 12, further comprising: computing a distance between the remaining set of landmarks; retrieving a measure of depth for the remaining set of landmarks; and generating a scaling factor based on the distance and the measure of depth that relates a size of the face of the user in the image to a real-world size of the face of the user, wherein a size of the augmented reality graphical element is modified as a function of the scaling factor.
- A system comprising: a processor (1102); and a memory component (1104) having instructions stored thereon that, when executed by the processor (1102), cause the processor (1102) to perform operations comprising: receiving (1001) an image that includes a depiction of a face of a user; generating (1002) a plurality of landmarks of the face based on the received image; removing (1003) a set of predetermined landmarks from the plurality of landmarks resulting in a remaining set of landmarks of the plurality of landmarks; tracking movement of the remaining set of landmarks over a plurality of frames comprising the image; for each landmark of the remaining set of landmarks: determining a stability parameter as a function of the movement of the landmark over the plurality of frames; and in dependence on the landmark having a stability parameter that is lower than a threshold value, removing that landmark from the remaining set of landmarks; obtaining (1004) a depth map for the face of the user, the depth map being associated with the received image; and computing (1005) facial measurements of the face of the user based on the depth map and the remaining set of landmarks, wherein the computing (1005) involves determining, based on the depth map, a distance from a client device associated with the image to one or more of the landmarks.
- A non-transitory computer-readable storage medium having stored thereon instructions that, when executed by a processor, cause the processor to perform operations comprising: receiving (1001) an image that includes a depiction of a face of a user; generating (1002) a plurality of landmarks of the face based on the received image; removing (1003) a set of predetermined landmarks from the plurality of landmarks resulting in a remaining set of landmarks of the plurality of landmarks; tracking movement of the remaining set of landmarks over a plurality of frames comprising the image; for each landmark of the remaining set of landmarks: determining a stability parameter as a function of the movement of the landmark over the plurality of frames; and in dependence on the landmark having a stability parameter that is lower than a threshold value, removing that landmark from the remaining set of landmarks; obtaining (1004) a depth map for the face of the user, the depth map being associated with the received image; and computing (1005) facial measurements of the face of the user based on the depth map and the remaining set of landmarks, wherein the computing (1005) involves determining, based on the depth map, a distance from a client device associated with the image to one or more of the landmarks.
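To make the sequence of operations in claim 1 concrete, the following is a minimal Python sketch of the landmark-filtering and measurement pipeline. It is illustrative only: the landmark detector, the depth-map convention, the stability formula, and the threshold value are assumptions, not the claimed implementation.

```python
import numpy as np

# Landmark categories excluded up front (hair, facial garments, face mask),
# per the "predetermined landmarks" step of claim 1. The names are illustrative.
PREDETERMINED_EXCLUSIONS = {"hair", "facial_garment", "face_mask"}
STABILITY_THRESHOLD = 0.8  # illustrative; the claims only require "a threshold value"


def stability(track):
    """Stability parameter as a function of a landmark's movement over frames.

    `track` has shape (num_frames, 2) in pixel coordinates. Here the parameter
    is 1 / (1 + mean frame-to-frame displacement), so jittery landmarks score
    low. This particular formula is an assumption, not taken from the patent.
    """
    displacement = np.linalg.norm(np.diff(track, axis=0), axis=1).mean()
    return 1.0 / (1.0 + displacement)


def compute_facial_measurements(frames, detect_landmarks, depth_map, focal_length_px):
    """Sketch of claim 1: filter landmarks by type and stability, then measure.

    `detect_landmarks(frame)` is a hypothetical detector returning a dict of
    {name: (x, y)} for one frame; `depth_map[y, x]` gives metric depth for the
    latest frame; `focal_length_px` is the camera focal length in pixels.
    """
    # Generate landmarks for each frame and remove the predetermined set.
    per_frame = [
        {name: xy for name, xy in detect_landmarks(f).items()
         if name not in PREDETERMINED_EXCLUSIONS}
        for f in frames
    ]
    remaining = set.intersection(*(set(d) for d in per_frame))

    # Track movement over the plurality of frames; drop unstable landmarks.
    tracks = {name: np.asarray([d[name] for d in per_frame]) for name in remaining}
    stable = {name: trk for name, trk in tracks.items()
              if stability(trk) >= STABILITY_THRESHOLD}

    # Use the depth map to convert pixel distances between the remaining
    # landmarks into real-world distances (pinhole-camera approximation).
    measurements = {}
    names = sorted(stable)
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            (xa, ya), (xb, yb) = stable[a][-1], stable[b][-1]
            depth = float(np.mean([depth_map[int(ya), int(xa)],
                                   depth_map[int(yb), int(xb)]]))
            pixel_dist = float(np.hypot(xb - xa, yb - ya))
            measurements[(a, b)] = pixel_dist * depth / focal_length_px
    return measurements
```

The final step uses the pinhole-camera relationship (real distance ≈ pixel distance × depth / focal length), consistent with determining, based on the depth map, a distance from the client device to one or more of the landmarks.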
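Claims 2, 3, and 7 rank the remaining landmarks by a visibility parameter, obtained by matching against a generic three-dimensional facial model, and keep only a threshold number of top landmarks (for example the eyes and nose of claim 8). The sketch below reuses `stability` from the previous sketch; the reprojection-distance test and tolerance are assumptions rather than the claimed matching criterion.

```python
import numpy as np

def visibility(landmark_xy, projected_model_xy, tol_px=10.0):
    """Visibility parameter in the spirit of claims 2-3: a landmark counts as
    visible when it lies close to the corresponding point of a generic 3D
    facial model projected into the image. The distance test and tolerance
    are illustrative assumptions.
    """
    dx, dy = np.subtract(landmark_xy, projected_model_xy)
    return 1.0 if np.hypot(dx, dy) <= tol_px else 0.0


def select_top_landmarks(tracks, projected_model, top_k=2):
    """Claim 7 sketch: rank by (visibility, stability) and keep the top `top_k`
    landmarks; the claim suggests two, e.g. the eyes and nose of claim 8.
    """
    ranked = sorted(
        tracks,
        key=lambda name: (visibility(tracks[name][-1], projected_model[name]),
                          stability(tracks[name])),
        reverse=True,
    )
    return ranked[:top_k]
```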
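Claims 11 to 13 position augmented reality eyewear at the nose-bridge landmark and scale it using a scaling factor that relates the size of the face in the image to its real-world size. The sketch below assumes the same pinhole-camera model; the offset value, helper names, and return format are illustrative.

```python
def scaling_factor(depth_m, focal_length_px):
    """Claim 13 style scaling factor: under a pinhole-camera approximation,
    real-world size = pixel size * depth / focal length, so this factor
    relates the face's size in the image to its real-world size.
    """
    return depth_m / focal_length_px


def place_eyewear(landmarks, depth_map, glasses_width_m, focal_length_px,
                  bridge_offset_px=4):
    """Claims 11-12 sketch: anchor the augmented reality eyewear at the nose
    bridge landmark and scale it to the computed facial measurements.

    `landmarks` maps names to (x, y) positions; `glasses_width_m` is the
    physical frame width of the selected eyewear; `bridge_offset_px` stands in
    for the "predetermined distance above the nose bridge landmark" and is an
    illustrative value.
    """
    x, y = landmarks["nose_bridge"]
    depth = float(depth_map[int(y), int(x)])

    # Convert the real eyewear width into pixels at this landmark's depth.
    width_px = glasses_width_m / scaling_factor(depth, focal_length_px)

    anchor = (x, y - bridge_offset_px)  # slightly above the nose bridge
    return {"anchor_xy": anchor, "width_px": width_px, "depth_m": depth}
```

In a video, the factor can be recomputed for each frame as the landmarks are updated (claim 10), so the rendered eyewear keeps its true size as the user moves toward or away from the camera.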
Description
CLAIM OF PRIORITY
This application claims the benefit of priority to U.S. Patent Application Serial No. 17/208,159, filed on March 22, 2021.
TECHNICAL FIELD
The present disclosure relates generally to providing augmented reality experiences using a messaging application.
BACKGROUND
Augmented Reality (AR) is a modification of a virtual environment. For example, in Virtual Reality (VR), a user is completely immersed in a virtual world, whereas in AR, the user is immersed in a world where virtual objects are combined with or superimposed on the real world. An AR system aims to generate and present virtual objects that interact realistically with a real-world environment and with each other. Examples of AR applications can include single or multiple player video games, instant messaging systems, and the like.
WO 2019/056579 A1 describes a method performed by an electronic device. The method includes receiving an image that depicts a face and detecting at least one facial landmark of the face in the image.
US 2020/219326 A1 describes systems, methods, and machine-readable media for virtual try-on of items such as spectacles. A virtual try-on interface may be implemented at a server or at a user device, and may use collision detection between three-dimensional models of the spectacles and of a user's face and head to determine the correct size and position of the spectacles for virtual try-on.
Hassner, Tal, et al., "Effective face frontalization in unconstrained images" describes a method for face frontalization that uses a single, unmodified 3D surface as an approximation to the shape of all input faces.
BRIEF DESCRIPTION OF THE SEVERAL VIEWS OF THE DRAWINGS
In the drawings, which are not necessarily drawn to scale, like numerals may describe similar components in different views. To easily identify the discussion of any particular element or act, the most significant digit or digits in a reference number refer to the figure number in which that element is first introduced. Some nonlimiting examples are illustrated in the figures of the accompanying drawings, in which:
- FIG. 1 is a diagrammatic representation of a networked environment in which the present disclosure may be deployed, in accordance with some examples.
- FIG. 2 is a diagrammatic representation of a messaging client application, in accordance with some examples.
- FIG. 3 is a diagrammatic representation of a data structure as maintained in a database, in accordance with some examples.
- FIG. 4 is a diagrammatic representation of a message, in accordance with some examples.
- FIG. 5 is a block diagram showing an example true size estimation system, in accordance with some examples.
- FIGS. 6-9 are diagrammatic representations of outputs of the true size estimation system, in accordance with some examples.
- FIGS. 10A and 10B are flowcharts illustrating example operations of the messaging application server, in accordance with some examples.
- FIG. 11 is a diagrammatic representation of a machine in the form of a computer system within which a set of instructions may be executed for causing the machine to perform any one or more of the methodologies discussed herein, in accordance with some examples.
- FIG. 12 is a block diagram showing a software architecture within which examples may be implemented.
DETAILED DESCRIPTION
The invention is defined by the appended claims.
The description that follows includes systems, methods, techniques, instruction sequences, and computing machine program products that embody illustrative examples of the disclosure. In the following description, for the purposes of explanation, numerous specific details are set forth in order to provide an understanding of various examples. It will be evident, however, to those skilled in the art, that examples may be practiced without these specific details. In general, well-known instruction instances, protocols, structures, and techniques are not necessarily shown in detail.
Typically, virtual reality (VR) and augmented reality (AR) systems allow users to add augmented reality elements, such as augmented reality glasses, to a face of the user depicted in a captured image. To do so, typical VR/AR systems use specialized techniques that require calibration to determine a scale of the user's face in the image. For example, such systems instruct the user to place a reference object, such as a credit card, on or next to the user's face so that a scale of the face can be computed. The systems can then display augmented reality glasses on the user's face based on the calibration. While such systems generally work well, the need to calibrate them places an additional burden on users and takes away from the enjoyment of the experience. Computing the scale by calibrating the system also takes additional time and resources, making such systems less efficient.