KR-102964453-B1 - Method and apparatus for creating a head avatar using captured video
Abstract
The present invention relates to a method and apparatus for generating a head avatar from captured video. A neutral-state head mesh and a neutral-state head texture map are generated based on the base mesh of a base model and at least one image representing a neutral face among a plurality of captured images extracted from camera video. A head mesh reflecting a specific expression and a head texture reflecting that expression are then generated based on at least one captured image representing the specific expression, the neutral-state head mesh, and the neutral-state head texture map.
Inventors
- 강준석
- 안상철
Assignees
- 한국과학기술연구원 (Korea Institute of Science and Technology)
Dates
- Publication Date: 2026-05-13
- Application Date: 2023-10-12
Claims (18)
- A method for generating a head avatar, comprising: receiving a captured video in which a subject's face has been photographed with a camera, and a base model, which is a mesh model representing the shape of a human head; generating, from the captured video, a plurality of captured images representing the subject's neutral face and faces with specific expressions; generating a neutral-state head mesh and a neutral-state head texture map of the subject based on the base mesh of the input base model and at least one of the generated captured images representing the neutral face; and generating a head mesh reflecting a specific expression and a head texture reflecting the specific expression based on at least one of the generated captured images representing a face with the specific expression, the generated neutral-state head mesh, and the generated neutral-state head texture map, wherein generating the neutral-state head mesh and the neutral-state head texture map of the subject comprises: training on the subject's neutral face by optimizing a Neural Implicit Function (NIF) in a direction that minimizes the difference between one captured image representing the neutral face and a rendered image rendered from the camera's shooting position with respect to the subject; and restoring the neutral-state head mesh of the subject using the optimized NIF.
- (Deleted)
- The method of claim 1, wherein training on the subject's neutral face optimizes the NIF and a head texture map in a direction that minimizes the difference between the one captured image representing the neutral face and the rendered image, and the optimized head texture map is generated as the neutral-state head texture map of the subject.
- The method of claim 3, wherein training on the subject's neutral face comprises: obtaining a per-vertex warp field for the base mesh by inputting the coordinates of the base mesh's vertices into the NIF; generating a head mesh by applying the obtained per-vertex warp field to each vertex of the base mesh; generating a rendered image at the camera's shooting position with respect to the subject by performing neural rendering on the generated head mesh and the head texture map; calculating an image loss from the difference between the generated rendered image and the one captured image representing the neutral face; and optimizing the NIF and the head texture map by backpropagating the calculated image loss.
- The method of claim 4, further comprising generating a plurality of sub-base meshes, each representing a part of the head, by dividing the base mesh by head part, wherein training on the subject's neutral face further comprises calculating a regularization loss of the generated head mesh using the generated head mesh and the generated plurality of sub-base meshes, and wherein optimizing the NIF and the head texture map comprises backpropagating the calculated regularization loss together with the calculated image loss.
- The method of claim 3, wherein restoring the neutral-state head mesh of the subject comprises: obtaining a per-vertex warp field for the base mesh by inputting the coordinates of the base mesh's vertices into the optimized NIF; and restoring the neutral-state head mesh of the subject by applying the obtained per-vertex warp field to each vertex of the base mesh.
- The method of claim 1, wherein generating the head mesh reflecting the specific expression and the head texture reflecting the specific expression comprises: restoring a blend shape of the neutral-state head mesh from the generated neutral-state head mesh, the base mesh, and the blend shape of the base mesh; restoring the head mesh reflecting the specific expression from one captured image representing a face with the specific expression and the blend shape of the generated neutral-state head mesh; and restoring the head texture reflecting the specific expression from one captured image representing a face with the specific expression and the generated neutral-state head texture map.
- The method of claim 7, wherein restoring the blend shape of the neutral-state head mesh comprises: calculating a blend-shape offset of the base mesh by subtracting the base mesh from the blend shape of the base mesh; and generating the blend shape of the neutral-state head mesh by adding the calculated blend-shape offset to the generated neutral-state head mesh.
- The method of claim 7, wherein generating the head mesh reflecting the specific expression comprises: performing mesh training for the subject's face with the specific expression by optimizing weights in a direction that minimizes the difference between one captured image representing the specific expression and a rendered image rendered from the camera's shooting position with respect to the subject; and restoring the head mesh reflecting the specific expression by applying the optimized weights to the blend shape of the generated neutral-state head mesh.
- The method of claim 9, wherein performing mesh training for the subject's face with the specific expression comprises: generating a head mesh by applying weights to the blend shape of the generated neutral-state head mesh; generating a rendered image at the camera's shooting position with respect to the subject by performing neural rendering on the generated head mesh and the generated neutral-state head texture map; calculating an image loss from the difference between the generated rendered image and one captured image representing a face with the specific expression; and optimizing the weights by backpropagating the calculated image loss.
- The method of claim 10, wherein restoring the head mesh reflecting the specific expression comprises multiplying each mesh in the blend shape of the generated neutral-state head mesh by its optimized weight and combining the weighted meshes to restore the head mesh reflecting the specific expression.
- The method of claim 7, wherein generating the head texture reflecting the specific expression comprises: performing texture training for the subject's face with the specific expression by optimizing a NIF in a direction that minimizes the difference between one captured image representing the specific expression and a rendered image rendered at the camera's shooting position with respect to the subject; and restoring the head texture reflecting the specific expression from the generated neutral-state head texture map using the optimized NIF.
- The method of claim 12, wherein performing texture training for the subject's face with the specific expression comprises: restoring the change in the head texture map according to the change in the subject's expression by inputting the texture coordinates of the generated neutral-state head texture map and a latent code into the NIF; generating a head texture by adding the restored change to the generated neutral-state head texture map; generating a rendered image at the camera's shooting position with respect to the subject by performing neural rendering on the restored head mesh reflecting the specific expression and the generated head texture; calculating an image loss from the difference between the generated rendered image and a captured image with the specific expression among the generated plurality of captured images; and optimizing the NIF and the latent code by backpropagating the calculated image loss.
- The method of claim 13, wherein restoring the head texture reflecting the specific expression comprises: restoring the change in the head texture map according to the change in the subject's expression by inputting the texture coordinates of the generated neutral-state head texture map and the optimized latent code into the optimized NIF; and restoring the head texture reflecting the specific expression by adding the change in the head texture map to the generated neutral-state head texture map.
- The method of claim 1, further comprising: modifying at least one of the generated neutral-state head mesh and the generated neutral-state head texture map according to information input by a user; and regenerating the head mesh reflecting the specific expression and the head texture reflecting the specific expression based on at least one captured image representing a face with the specific expression, the modified neutral-state head mesh, and the modified neutral-state head texture map.
- The method of claim 1, further comprising modifying the generated head mesh reflecting the specific expression and the generated head texture map reflecting the specific expression according to information input by a user.
- A computer-readable recording medium storing a program for executing the method of claim 1 on a computer.
- A head avatar generation apparatus comprising: a data input unit that receives a captured video in which a subject's face has been photographed with a camera, and a base model, which is a mesh model representing the shape of a human head; a preprocessing unit that generates, from the input captured video, a plurality of captured images representing the subject's neutral face and faces with specific expressions; a first head avatar generation unit that generates a neutral-state head mesh and a neutral-state head texture map of the subject based on at least one of the generated captured images representing the neutral face and the base mesh of the input base model; and a second head avatar generation unit that generates a head mesh reflecting a specific expression and a head texture reflecting the specific expression based on at least one of the generated captured images representing a face with the specific expression, the generated neutral-state head mesh, and the generated neutral-state head texture map, wherein the first head avatar generation unit trains on the subject's neutral face by optimizing a Neural Implicit Function (NIF) in a direction that minimizes the difference between one captured image representing the neutral face and a rendered image rendered from the camera's shooting position with respect to the subject, and restores the neutral-state head mesh of the subject using the optimized NIF.
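The optimization loop described in claims 1 and 4 — query the NIF at every base-mesh vertex to obtain a warp field, deform the mesh, render it, and backpropagate an image loss — can be sketched as below. This is a minimal, hypothetical illustration, not the patented implementation: the "NIF" is reduced to a single learnable offset and "neural rendering" to flattening vertex positions, so the example runs without an ML framework, and all names are invented for illustration.

```python
# Hypothetical sketch of the claims 1/4 loop: warp base-mesh vertices with a
# "NIF", render, and descend on the image loss. All names are illustrative.
from dataclasses import dataclass
from typing import List, Tuple

Vec3 = Tuple[float, float, float]

@dataclass
class TinyNIF:
    offset: List[float]  # stand-in for the NIF's learnable parameters

    def warp(self, v: Vec3) -> Vec3:
        # Claim 4: the NIF returns a warp-field value for a vertex coordinate.
        return (v[0] + self.offset[0], v[1] + self.offset[1], v[2] + self.offset[2])

def apply_warp_field(base_mesh: List[Vec3], nif: TinyNIF) -> List[Vec3]:
    # Claim 4: displace every base-mesh vertex by its warp-field value.
    return [nif.warp(v) for v in base_mesh]

def render(mesh: List[Vec3]) -> List[float]:
    # Placeholder for neural rendering from the camera's shooting position.
    return [c for v in mesh for c in v]

def image_loss(rendered: List[float], photo: List[float]) -> float:
    # Claim 4: squared difference between rendered and captured images.
    return sum((r - p) ** 2 for r, p in zip(rendered, photo))

# Toy gradient descent: optimize the offset so the warped mesh's rendering
# matches the "captured image" (here, a target vertex layout).
base = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
photo = render([(0.5, 0.2, 0.0), (1.5, 0.2, 0.0)])
nif = TinyNIF(offset=[0.0, 0.0, 0.0])
lr = 0.05
for _ in range(200):
    rendered = render(apply_warp_field(base, nif))
    # Analytic gradient of the squared loss w.r.t. each offset component.
    grads = [0.0, 0.0, 0.0]
    for i, (r, p) in enumerate(zip(rendered, photo)):
        grads[i % 3] += 2.0 * (r - p)
    nif.offset = [o - lr * g for o, g in zip(nif.offset, grads)]

final = image_loss(render(apply_warp_field(base, nif)), photo)
print(final < 1e-9)  # the loss converges toward zero
```

The same structure applies to claims 9 and 12, with the blend-shape weights or the texture NIF and latent code taking the place of the warp-field parameters.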
Description
Method and apparatus for creating a head avatar using captured video

The present invention relates to a method and apparatus for generating head avatars. Recently, with the advancement of virtual reality technology and the increased demand for contactless services brought on by COVID-19, the need for personal avatars that can represent an individual has grown significantly across many fields, and the demand for avatar creation technology has grown with it. However, conventional avatar creation technology requires production experts or complex systems, and avatars created with it have suffered from problems such as significantly lower graphical precision or difficulty of modification.

FIG. 1 is a configuration diagram of a head avatar generation device according to one embodiment of the present invention. FIG. 2 is a flowchart of a method for generating a head avatar according to one embodiment of the present invention. FIG. 3 is a detailed flowchart of step 401 shown in FIG. 2. FIG. 4 is a detailed flowchart of step 113 shown in FIG. 3. FIG. 5 is a detailed flowchart of step 115 shown in FIG. 3. FIG. 6 is a detailed flowchart of step 402 shown in FIG. 2. FIG. 7 is a detailed flowchart of step 22 shown in FIG. 6. FIG. 8 is a detailed flowchart of step 23 shown in FIG. 6.

Embodiments of the present invention are described in detail below with reference to the drawings. The embodiments relate to a method and apparatus for generating a head avatar that can easily produce a head avatar precisely representing an individual's face simply by photographing the face from rotating viewpoints with a camera such as a smartphone. Hereinafter, these are briefly referred to as the "head avatar generation method" and the "head avatar generation apparatus."

Referring to FIG. 1, the head avatar generation device according to this embodiment comprises a data input unit (10), a preprocessing unit (20), a mesh splitting unit (30), a head avatar generation unit (40), a data output unit (50), and a user interface (60). The data input unit (10), preprocessing unit (20), mesh splitting unit (30), head avatar generation unit (40), and data output unit (50) can be implemented as a combination of a processor and memory. The user interface (60) can be implemented as a combination of a display panel, a touchscreen panel, a keyboard, and a mouse.

Referring to FIG. 2, the method for generating a head avatar according to this embodiment consists of the following steps, performed by the head avatar generation device illustrated in FIG. 1.

In step 100, the data input unit (10) receives a captured video in which the face of a person, the subject, has been photographed with various expressions using a camera such as a smartphone, together with a base model, which is a mesh model representing the shape of a human head. The captured video input to the data input unit (10) can be obtained by photographing the subject's neutral face from various angles and then photographing the subject's face with a specific expression from various angles. The base model consists of a base mesh representing a neutral face and the blend shape of the base mesh. The blend shape of the base mesh consists of meshes representing faces with various expressions, each a modification of the mesh representing the neutral face. The base model can be obtained by downloading it from the internet or by creating it directly.
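The arithmetic behind the base model's blend shapes, as used in claims 8 and 11 (offset = blend shape minus base mesh; transfer by adding offsets to the subject's neutral head mesh; expression by weighted combination), can be sketched as follows. This is an illustrative reading, not the patented implementation: meshes are flattened coordinate lists with identical topology, the weighted combination is applied as weighted deltas on top of the neutral mesh, and all names and values are invented.

```python
# Hypothetical sketch of blend-shape offset transfer (claim 8) and weighted
# expression reconstruction (claim 11). All names and values are illustrative.
from typing import List

Mesh = List[float]  # flattened vertex coordinates, same topology throughout

def blendshape_offsets(base_mesh: Mesh, base_blendshapes: List[Mesh]) -> List[Mesh]:
    # Claim 8: offset = blend shape of the base mesh minus the base mesh.
    return [[s - b for s, b in zip(shape, base_mesh)] for shape in base_blendshapes]

def transfer_blendshapes(neutral_head: Mesh, offsets: List[Mesh]) -> List[Mesh]:
    # Claim 8: add each offset to the neutral-state head mesh to obtain the
    # blend shape of the neutral-state head mesh.
    return [[n + o for n, o in zip(neutral_head, off)] for off in offsets]

def apply_expression(neutral_head: Mesh, head_blendshapes: List[Mesh],
                     weights: List[float]) -> Mesh:
    # Claim 11 (one common reading): scale each blend-shape delta by its
    # optimized weight and combine the weighted deltas on the neutral mesh.
    result = list(neutral_head)
    for shape, w in zip(head_blendshapes, weights):
        for i, s in enumerate(shape):
            result[i] += w * (s - neutral_head[i])
    return result

base = [0.0, 0.0, 0.0]          # base mesh (one vertex, flattened)
smile = [0.0, 1.0, 0.0]         # a "smile" blend shape of the base mesh
neutral_head = [0.1, 0.2, 0.3]  # reconstructed neutral-state head mesh
head_shapes = transfer_blendshapes(neutral_head, blendshape_offsets(base, [smile]))
expr = apply_expression(neutral_head, head_shapes, [0.5])
print(expr)  # approximately [0.1, 0.7, 0.3]: a half-weight smile
```

Because the offsets are computed relative to the base mesh, the subject's reconstructed neutral geometry is preserved while the base model's expression deformations carry over unchanged.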
In step 200, the preprocessing unit (20) extracts a plurality of captured images from the captured video input to the data input unit (10) in step 100, and removes the background from each extracted image to generate a plurality of background-removed captured images. For each generated captured image representing a neutral face, the preprocessing unit (20) generates facial feature-point information and camera parameter information representing the camera's shooting position with respect to the subject. The multiple captured images represent the subject's neutral face and faces with specific expressions; for example, some captured images may show a neutral face while others show a smiling or crying face. To create a neutral head avatar representing the overall shape and texture of a 3D head centered on the face, multiple captured images representing the overall shape and texture of the subject's head are required. Since a specific expression head