US-12620189-B2 - Method, electronic device and computer program
Abstract
A method for user command-guided editing of an initial textured 3D morphable model of an object comprising: obtaining the initial textured 3D morphable model of the object comprising an initial texture map and an initial 3D mesh model of the object; and determining an edited texture map of the object corresponding to the user command by editing the initial texture map of the object based on a first artificial neural network; and/or determining an edited 3D mesh model of the object corresponding to the user command by editing the initial 3D mesh model of the object based on a second artificial neural network; and generating an edited textured 3D morphable model of the object corresponding to the user command based on the edited texture map of the object and/or the edited 3D mesh model of the object.
Inventors
- Lev Markhasin
- Iheb BELGACEM
- Shivangi ANEJA
- Matthias Niessner
- Angela DAI
Assignees
- SONY SEMICONDUCTOR SOLUTIONS CORPORATION
Dates
- Publication Date
- 20260505
- Application Date
- 20231128
- Priority Date
- 20221202
Claims (16)
- 1 . A method for user command-guided editing of an initial textured 3D morphable model of an object comprising: obtaining the initial textured 3D morphable model of the object comprising an initial texture map of the object and an initial 3D mesh model of the object; and determining an edited texture map of the object corresponding to the user command by editing the initial texture map of the object based on a first artificial neural network; and/or determining an edited 3D mesh model of the object corresponding to the user command by editing the initial 3D mesh model of the object based on a second artificial neural network; and generating an edited textured 3D morphable model of the object corresponding to the user command based on the edited texture map of the object and/or the edited 3D mesh model of the object, wherein the edited texture map is determined by a third artificial neural network based on a sum of the initial texture latent code and an offset texture latent code, wherein the third artificial neural network is trained by adversarial self-supervised training on a plurality of two-dimensional RGB images using differentiable rendering.
- 2 . The method of claim 1 , further comprising: generating the initial texture map of the object and a corresponding initial texture latent code based on a third artificial neural network; and/or generating the initial 3D mesh model of the object based on an initial general appearance parameter of the 3D mesh model.
- 3 . The method of claim 2 , further comprising: determining the offset texture latent code based on the initial texture latent code by the first artificial neural network corresponding to the user command; and/or determining an offset general appearance parameter of the 3D mesh model based on the initial texture latent code by the second artificial neural network corresponding to the user command, and determining the edited 3D mesh model of the object based on the offset general appearance parameter.
- 4 . The method of claim 2 , wherein the object is a human face, and the initial 3D mesh model is a FLAME model and the initial general appearance parameter of the 3D mesh model is linear expression coefficients; and/or wherein the object is a human person or parts thereof, and the initial 3D mesh model is a SMPL-X model, and the initial general appearance parameter of the 3D mesh model is a jaw joint parameter, finger joints parameter, remaining body joints parameter, combined body, face, hands shape parameters and/or facial expression parameters.
- 5 . The method of claim 1 , wherein the first artificial neural network and/or the second artificial neural network are trained based on one or more texture maps and corresponding texture latent codes, wherein the texture maps and corresponding texture latent codes are generated based on a third artificial neural network.
- 6 . The method of claim 5 , wherein the third artificial neural network is trained by an adversarial self-supervised training.
- 7 . The method of claim 6 , wherein the third artificial neural network is trained based on a plurality of RGB images.
- 8 . The method of claim 1 , wherein the first artificial neural network and/or the second artificial neural network are trained with regards to a loss function which is based on a pre-trained vision-language model supervision and the user command.
- 9 . The method of claim 8 , wherein a difference measure is determined between the user command and a descriptive text, which is generated by the pre-trained vision-language model, of a visual context in a rendered image based on the edited textured 3D morphable model of the object.
- 10 . The method of claim 8 , wherein the first artificial neural network and/or the second artificial neural network are trained based on a plurality of different user commands.
- 11 . The method of claim 1 , further comprising: rendering the edited texture map of the object or the initial texture map of the object together with the edited 3D mesh model of the object or the initial 3D mesh model of the object to obtain an image corresponding to the user command.
- 12 . The method of claim 1 , further comprising transcribing the user command, which is a speech input in human language, into a text.
- 13 . The method of claim 1 , further comprising: obtaining a plurality of different initial 3D mesh models of the object; and determining a plurality of edited texture maps of the object corresponding to the plurality of the initial 3D mesh models of the object and the user command by jointly editing the initial texture map of the object a plurality of times based on the first artificial neural network and on the plurality of initial 3D mesh models; and generating a plurality of edited textured 3D morphable models of the object corresponding to the user command based on the edited texture maps of the object and on the plurality of initial 3D mesh models of the object; and rendering the plurality of edited texture maps of the object together with the plurality of initial 3D mesh models of the object to obtain a plurality of images corresponding to the user command.
- 14 . A non-transitory computer-readable storage medium storing computer-readable instructions thereon which, when executed by a computer, cause the computer to carry out the steps of claim 1 .
- 15 . An electronic device for performing a user command-guided editing of an initial textured 3D morphable model, comprising: processing circuitry configured to obtain the initial textured 3D morphable model of the object comprising an initial texture map of the object and an initial 3D mesh model of the object; and determine an edited texture map of the object corresponding to the user command by editing the initial texture map of the object based on a first artificial neural network; and/or determine an edited 3D mesh model of the object corresponding to the user command by editing the initial 3D mesh model of the object based on a second artificial neural network; and generate an edited textured 3D morphable model of the object corresponding to the user command based on the edited texture map of the object and/or the edited 3D mesh model of the object, wherein the edited texture map is determined by a third artificial neural network based on a sum of the initial texture latent code and an offset texture latent code, wherein the third artificial neural network is trained by adversarial self-supervised training on a plurality of two-dimensional RGB images using differentiable rendering.
- 16 . A method for user command-guided editing of an initial textured 3D morphable model of an object comprising: obtaining the initial textured 3D morphable model of the object comprising an initial texture map of the object and an initial 3D mesh model of the object; and determining an edited texture map of the object corresponding to the user command by editing the initial texture map of the object based on a first artificial neural network; and/or determining an edited 3D mesh model of the object corresponding to the user command by editing the initial 3D mesh model of the object based on a second artificial neural network; and generating an edited textured 3D morphable model of the object corresponding to the user command based on the edited texture map of the object and/or the edited 3D mesh model of the object, wherein the third artificial neural network is trained using both a full-image discriminator and a patch discriminator that evaluates texture patches to generate high-frequency texture details.
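Claims 1–3 describe editing a texture by predicting an offset texture latent code from the initial texture latent code (first network) and decoding the sum of the two codes with a generator (third network). The following is a minimal numerical sketch of that latent-offset editing, not the claimed implementation: the trained networks are replaced by toy random linear maps, and all dimensions and weights are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

LATENT_DIM = 8      # toy size of the texture latent code
TEXTURE_DIM = 16    # toy size of the flattened texture map

# Stand-in for the third network (generator): decodes a texture
# latent code into a (flattened) texture map.
G = rng.normal(size=(TEXTURE_DIM, LATENT_DIM))

def generator(latent):
    """Decode a texture latent code into a flattened texture map."""
    return G @ latent

# Stand-in for the first network (mapper): predicts an offset latent
# code from the initial latent code for one fixed user command.
M = 0.1 * rng.normal(size=(LATENT_DIM, LATENT_DIM))

def mapper(latent):
    """Predict the offset texture latent code for the user command."""
    return M @ latent

initial_latent = rng.normal(size=LATENT_DIM)

# Per claim 1: the edited texture map is decoded from the SUM of the
# initial texture latent code and the offset texture latent code.
offset_latent = mapper(initial_latent)
edited_latent = initial_latent + offset_latent
edited_texture = generator(edited_latent)

# A zero offset leaves the texture unchanged (identity edit).
assert np.allclose(generator(initial_latent), G @ initial_latent)
print(edited_texture.shape)  # (16,)
```

The additive structure means the edit is a displacement in latent space, so the same trained mapper can be applied to different initial latent codes, which is consistent with claim 13's joint editing over a plurality of initial models.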
Description
CROSS-REFERENCE TO RELATED APPLICATION

This application is based upon and claims the benefit of priority of the prior European Patent Application No. 22211197.3, filed on Dec. 2, 2022, the entire contents of which are incorporated herein by reference.

TECHNICAL FIELD

The present disclosure generally pertains to the technical field of 3D image model generation, in particular to devices, methods and programs for a user command-guided editing of an initial textured 3D morphable model of an object.

TECHNICAL BACKGROUND

Modeling 3D content is central to many applications in our modern digital age, including asset creation for video games and films, as well as mixed reality and the like. In particular, modeling 3D human face avatars or full-body avatars is an important element of digital expression.

For 3D modeling, 3D mesh models are often used across many different applications, as they rely on the classical graphics pipeline with existing editing and animation frameworks, which may guarantee high interface compatibility. Neural implicit 3D representations for digital humans and scenes are also used for 3D modeling. Further, 3D morphable models may be used as an approach for modeling animatable avatars, for example with popular blendshape models used for human faces or bodies, which may offer a compact, parametric representation to model an object while maintaining a mesh representation that fits standard graphics pipelines for editing and animation.

Many content creation processes require extensive time from highly skilled artists to create compelling 3D object models (for example face models, human bodies or other objects), especially if a creator has a precise idea of what a 3D object, for example a 3D avatar or an animated sequence, should look like. It is therefore generally desirable to improve the user-commanded generation of 3D objects.
SUMMARY

According to a first aspect, the disclosure provides a method for user command-guided editing of an initial textured 3D morphable model of an object comprising: obtaining the initial textured 3D morphable model of the object comprising an initial texture map and an initial 3D mesh model of the object; and determining an edited texture map of the object corresponding to the user command by editing the initial texture map of the object based on a first artificial neural network; and/or determining an edited 3D mesh model of the object corresponding to the user command by editing the initial 3D mesh model of the object based on a second artificial neural network; and generating an edited textured 3D morphable model of the object corresponding to the user command based on the edited texture map of the object and/or the edited 3D mesh model of the object.

According to a second aspect, the disclosure provides an electronic device comprising circuitry configured to perform a user command-guided editing of an initial textured 3D morphable model of an object comprising: obtaining the initial textured 3D morphable model of the object comprising an initial texture map and an initial 3D mesh model of the object; and determining an edited texture map of the object corresponding to the user command by editing the initial texture map of the object based on a first artificial neural network; and/or determining an edited 3D mesh model of the object corresponding to the user command by editing the initial 3D mesh model of the object based on a second artificial neural network; and generating an edited textured 3D morphable model of the object corresponding to the user command based on the edited texture map of the object and/or the edited 3D mesh model of the object.

Further aspects are set forth in the dependent claims, the following description and the drawings.
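Per claims 8–9, the editing networks of the summary are trained against a loss that uses pre-trained vision-language model supervision: a difference measure between the user command and the rendered edited model. The sketch below illustrates one plausible such difference measure, cosine distance in a shared embedding space; the embeddings here are hypothetical random stand-ins, since the disclosure does not fix a particular vision-language model or metric.

```python
import numpy as np

def cosine_distance(a, b):
    """1 - cosine similarity: a difference measure between embeddings."""
    a = a / np.linalg.norm(a)
    b = b / np.linalg.norm(b)
    return 1.0 - float(a @ b)

rng = np.random.default_rng(1)
EMB_DIM = 4  # toy embedding size

# Hypothetical stand-ins for the pre-trained vision-language model:
# one embedding for the user command text, one for the rendered image
# of the edited textured 3D morphable model (or its generated
# descriptive text, per claim 9).
command_embedding = rng.normal(size=EMB_DIM)
rendered_embedding = command_embedding + 0.05 * rng.normal(size=EMB_DIM)

loss = cosine_distance(command_embedding, rendered_embedding)

# A rendering that matches the command yields a small loss; during
# training this loss would be backpropagated through differentiable
# rendering to the first/second networks (gradients omitted here).
assert 0.0 <= loss <= 2.0
```

Because the rendering step is differentiable (as recited in claims 1 and 15 for the third network's training), such a scalar difference measure can supervise latent-space edits without any paired ground-truth edited textures.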
BRIEF DESCRIPTION OF THE DRAWINGS

Embodiments are explained by way of example with respect to the accompanying drawings, in which:
- FIG. 1 shows a given 3D mesh with fixed topology, arbitrarily generated face textures as texture maps, and renderings from multiple viewpoints based thereon (top row on the left);
- FIG. 2 schematically shows a texture generation;
- FIG. 3 schematically shows a text-guided synthesis of textured 3D face models;
- FIG. 4 schematically shows a texture manipulation for animation sequences;
- FIG. 5 shows a comparison of different texturing qualities;
- FIG. 6 shows a qualitative comparison for texture manipulation;
- FIG. 7 shows texture manipulations;
- FIG. 8 shows expression manipulation that generates video sequences;
- FIG. 9 shows results for expression manipulation;
- FIG. 10 shows a flowchart of the generation of a desired avatar with regard to a user input;
- FIG. 11 shows a flowchart of different use cases for changing an existing avatar with regard to a user input; and
- FIG. 12 schematically describes an embodiment of an electronic device which may implement the functionality of the method for user command-guided editing of an initial textured 3D morphable model of an object.