CN-120495532-B - Three-dimensional face generation and expression editing method and system based on text picture driving

Abstract

The invention belongs to the field of computer vision processing and provides a three-dimensional face generation and expression editing method and system driven by text and pictures. Key description information is extracted from a source face text description, a control vector is generated by jointly mapping the key description information and a source face picture, and the source face picture is enhanced with the control vector to obtain a text-enhanced source face picture. An expression-migrated source face picture is then generated from a target face picture and the source face picture; the expression-migrated source face picture is enhanced with the target face text description to obtain a text-enhanced expression source face picture; and three-dimensional reconstruction is performed on the text-enhanced expression source face picture to obtain a source three-dimensional face model with the edited expression.
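As a rough illustration of how a control vector could be produced by jointly mapping a text embedding and an image latent code, the sketch below implements plain cross-attention, with random matrices standing in for learned projection weights; all names, dimensions, and the attention formulation itself are illustrative assumptions, not the patented architecture.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def control_vector(latent, text_emb, d, seed=0):
    """Cross-attention sketch: latent codes (queries) attend to the text
    embedding (keys/values) to produce one control vector per latent token.
    Random matrices stand in for learned projection weights."""
    rng = np.random.default_rng(seed)
    Wq = rng.standard_normal((latent.shape[-1], d))
    Wk = rng.standard_normal((text_emb.shape[-1], d))
    Wv = rng.standard_normal((text_emb.shape[-1], d))
    Q, K, V = latent @ Wq, text_emb @ Wk, text_emb @ Wv
    attn = softmax(Q @ K.T / np.sqrt(d))  # (n_latent, n_text); rows sum to 1
    return attn @ V, attn                 # control vector and attention map
```

In a full system the resulting control vector would be fed, together with the latent code, into a generator; here it is only shaped like one.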

Inventors

  • CHI JING
  • XU MING
  • XU MINFENG
  • WANG HAOTIAN
  • HAN SIYI

Assignees

  • Shandong University of Finance and Economics (山东财经大学)

Dates

Publication Date
2026-05-12
Application Date
2025-05-30

Claims (8)

  1. A three-dimensional face generation and expression editing method based on text-picture driving, characterized by comprising the following steps: extracting key description information from a source face text description, generating a control vector by jointly mapping the key description information and a source face picture, and enhancing the source face picture with the control vector to obtain a text-enhanced source face picture; wherein said extracting, generating, and enhancing specifically comprise: generating a random latent code from random noise, inverting the source face picture and jointly encoding it with the random latent code to generate a latent code; encoding the source face text description to extract the key description information and obtain a text embedding; performing attention decoding on the text embedding and the latent code to generate the control vector; and inputting the control vector and the latent code into a generator to obtain the text-enhanced source face picture; extracting face model parameters from the text-enhanced source face picture, generating a coarse shape guided by the face model parameters, and performing detail enhancement and map rendering on the coarse shape to generate a final three-dimensional face model; extracting a target expression from a target face picture and migrating it onto the source face picture to generate an expression-migrated source face picture, and enhancing the expression-migrated source face picture with a target face text description to obtain a text-enhanced expression source face picture, this enhancement following the same process as the preceding text enhancement; wherein said extracting a target expression and migrating it onto the source face picture specifically comprises: extracting a source identity from the source face picture with an identity embedder; extracting features of the source face picture and the target face picture to generate a facial feature matrix; and generating the expression-migrated source face picture with an identity-feature-conditioned denoising diffusion probability model, based on the source identity, a noised target face picture at a given time step, and the time step.
  2. The three-dimensional face generation and expression editing method based on text-picture driving according to claim 1, wherein said extracting face model parameters from the text-enhanced source face picture and generating a coarse shape guided by the face model parameters specifically comprises: encoding the text-enhanced source face picture with a visual encoder to obtain source face coding features; decoding the source face coding features to obtain facial shape parameters, posture parameters, expression parameters, albedo parameters, illumination parameters, and camera parameters; and inputting the facial shape parameters, posture parameters, and expression parameters into a facial model to generate the coarse shape.
  3. The three-dimensional face generation and expression editing method based on text-picture driving according to claim 1, wherein said performing detail enhancement and map rendering on the coarse shape to generate a final three-dimensional face model specifically comprises: inputting the albedo parameters, illumination parameters, and camera parameters among the face model parameters into a trained DECA decoder to obtain a displacement map and a model surface map; preprocessing and then encoding the coarse shape to extract its global features and local features; decoding the global features and local features to obtain a fine shape; and generating a normal map from the displacement map, performing detail rendering with the normal map and the fine shape, and then attaching the model surface map to the detail rendering result to obtain the final three-dimensional face model.
  4. The three-dimensional face generation and expression editing method based on text-picture driving according to claim 3, wherein said generating a normal map from the displacement map, performing detail rendering with the normal map and the fine shape, and then attaching the model surface map to the detail rendering result specifically comprises: generating the normal map from the displacement map; splicing the product of the normal map and the displacement map with the fine shape to obtain a refined fine shape; performing detail rendering on the refined fine shape with an illumination rendering model to obtain the detail rendering result; and attaching the model surface map to the detail rendering result to obtain the final three-dimensional face model.
  5. A three-dimensional face generation and expression editing system based on text-picture driving, characterized in that it adopts the three-dimensional face generation and expression editing method based on text-picture driving according to any one of claims 1-4, and comprises: a text enhancement module configured to extract key description information from a source face text description, generate a control vector by jointly mapping the key description information and a source face picture, and enhance the source face picture with the control vector to obtain a text-enhanced source face picture; a three-dimensional face reconstruction module configured to extract face model parameters from the text-enhanced source face picture, generate a coarse shape guided by the face model parameters, and perform detail enhancement and map rendering on the coarse shape to generate a final three-dimensional face model; and an expression editing module configured to extract a target expression from a target face picture and migrate it onto the source face picture to generate an expression-migrated source face picture, enhance the expression-migrated source face picture with a target face text description to obtain a text-enhanced expression source face picture, and perform three-dimensional reconstruction on the text-enhanced expression source face picture to obtain a source three-dimensional face model with the edited expression.
  6. A computer-readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of the three-dimensional face generation and expression editing method based on text-picture driving according to any one of claims 1-4.
  7. A computer device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor implements the steps of the three-dimensional face generation and expression editing method based on text-picture driving according to any one of claims 1-4 when executing the program.
  8. A computer program product comprising a computer program which, when executed by a processor, implements the steps of the three-dimensional face generation and expression editing method based on text-picture driving according to any one of claims 1-4.
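Claims 3 and 4 derive a normal map from a displacement map before detail rendering. A minimal, generic way to obtain tangent-space normals from a scalar displacement (height) map is finite differences, sketched below; this is a standard graphics technique and an assumption here, not necessarily the decoder used by the invention.

```python
import numpy as np

def normal_map(displacement):
    """Tangent-space normals from a scalar displacement (height) map via
    finite differences: n = normalize(-dh/dx, -dh/dy, 1)."""
    dy, dx = np.gradient(displacement.astype(float))  # per-pixel height gradients
    n = np.stack([-dx, -dy, np.ones_like(dx)], axis=-1)
    return n / np.linalg.norm(n, axis=-1, keepdims=True)
```

A flat displacement map yields normals pointing straight out of the surface, i.e. (0, 0, 1) everywhere.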

Description

Three-dimensional face generation and expression editing method and system based on text picture driving

Technical Field

The invention belongs to the technical field of computer vision processing, and particularly relates to a three-dimensional face generation and expression editing method and system based on text-picture driving.

Background

The statements in this section merely provide background information related to the present disclosure and may not necessarily constitute prior art.

With the continuous development of technology, 3D face modeling is gradually being applied in fields such as face recognition, virtual reality, and game development. One common way of representing a face model is to express the face under various expressions as a static face (i.e., a face without expression) plus a set of blendshape bases. Each blendshape basis models the change of the face under a specific expression relative to the static face, including changes in geometry and appearance, and carries specific semantics. These blendshape bases can therefore be linearly combined using blendshape coefficients and applied to the static face to obtain the face under a given expression.

The three general approaches to 3D face modeling are software modeling, instrument acquisition, and image-based modeling. Graphics-based 3D face modeling has also seen some improvements: combining text with graphics raises the fidelity with which face images are restored. However, three-dimensional face models generated by existing methods restore the real face picture poorly, face details are missing, and expression editing of such models easily causes problems such as identity drift and poor expressiveness for extreme expressions.
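The linear combination of blendshape bases described above can be sketched in a few lines; the array shapes and the toy numbers are purely illustrative.

```python
import numpy as np

def blend_face(neutral, deltas, weights):
    """Apply a linear combination of blendshape offsets to a static face.

    neutral: (V, 3) vertex positions of the expressionless face
    deltas:  (K, V, 3) per-basis offsets from the neutral face
    weights: (K,) blendshape coefficients
    """
    return neutral + np.tensordot(np.asarray(weights, float), deltas, axes=1)

# Toy example: 2 vertices, 2 bases; weights chosen so the offsets cancel.
neutral = np.zeros((2, 3))
deltas = np.stack([np.full((2, 3), 1.0), np.full((2, 3), -0.5)])
face = blend_face(neutral, deltas, [0.5, 1.0])  # 0.5*1.0 + 1.0*(-0.5) = 0
```

With all coefficients at zero the static face is returned unchanged, matching the semantics described in the background.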
Disclosure of Invention

To solve these problems, the invention provides a three-dimensional face generation and expression editing method and system based on text-picture driving. By combining a natural-language description of face details with a real face picture input by the user, the invention can generate a real, natural three-dimensional face model and expression that meet the user's expectations.

According to some embodiments, a first aspect of the invention provides a three-dimensional face generation and expression editing method based on text-picture driving, which adopts the following technical scheme. The method comprises the following steps: extracting key description information from a source face text description, generating a control vector by jointly mapping the key description information and a source face picture, and enhancing the source face picture with the control vector to obtain a text-enhanced source face picture; extracting face model parameters from the text-enhanced source face picture, generating a coarse shape guided by the face model parameters, and performing detail enhancement and map rendering on the coarse shape to generate a final three-dimensional face model; and extracting a target expression from a target face picture and migrating it onto the source face picture to generate an expression-migrated source face picture, enhancing the expression-migrated source face picture with a target face text description to obtain a text-enhanced expression source face picture, and performing three-dimensional reconstruction on the text-enhanced expression source face picture to obtain a source three-dimensional face model with the edited expression.
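The three steps above can be outlined as a purely structural sketch. Every function body below is a placeholder standing in for the corresponding stage (generator-based enhancement, parameter regression, diffusion-based expression transfer); none of the function names or internals come from the patent.

```python
import numpy as np

def text_enhance(source_img, control_vec):
    """Stage 1 stand-in: modulate the source picture with the control vector
    (the invention uses a generator; this is just additive modulation)."""
    return source_img + 0.1 * control_vec

def reconstruct_3d(enhanced_img):
    """Stage 2 stand-in: regress face model parameters, then a coarse shape."""
    params = {"shape": float(enhanced_img.mean()),
              "expression": float(enhanced_img.std())}
    coarse = np.full((4, 3), params["shape"])  # placeholder coarse mesh
    return params, coarse

def transfer_expression(source_img, target_img):
    """Stage 3 stand-in: a simple blend in place of the identity-conditioned
    denoising diffusion probability model."""
    return 0.7 * source_img + 0.3 * target_img

src, tgt = np.ones((8, 8)), np.zeros((8, 8))
enhanced = text_enhance(src, np.zeros((8, 8)))
params, coarse = reconstruct_3d(enhanced)
migrated = transfer_expression(enhanced, tgt)
```

The point of the sketch is only the data flow: the text-enhanced picture feeds both the reconstruction stage and the expression-transfer stage, and the expression-migrated result would itself be reconstructed to obtain the edited 3D model.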
Further, said extracting key description information from the source face text description, generating a control vector by jointly mapping the key description information and the source face picture, and enhancing the source face picture with the control vector specifically comprises: generating a random latent code from random noise, inverting the source face picture and jointly encoding it with the random latent code to generate a latent code; encoding the source face text description to extract the key description information and obtain a text embedding; performing attention decoding on the text embedding and the latent code to generate the control vector; and inputting the control vector and the latent code into a generator to obtain the text-enhanced source face picture. Further, said extracting face model parameters from the text-enhanced source face picture and generating a coarse shape guided by the face model parameters specifically comprises: encoding the text-enhanced source face picture with a visual encoder to obtain source face coding features; D