CN-121982224-A - Dynamic face reconstruction method and device based on partial differential equation

Abstract

The invention discloses a dynamic face reconstruction method and device based on partial differential equations. The method takes a template mesh as the reference in a unified two-dimensional parameter domain. It subtracts the template coordinates from the three-dimensional vertex coordinates of each frame mesh in a dynamic face sequence, which yields a three-dimensional displacement vector field over the parameter domain. A thin-plate vibration equation model is introduced so that the three-dimensional displacement can be represented compactly by two-dimensional spatial modes that satisfy the boundary constraints. A weighted least-squares objective function is built adaptively from the principal curvatures of the template mesh vertices, which allows the per-frame modal coefficients to be solved stably. The temporal evolution of the modal coefficients is then modeled parametrically. At the decoding end, the three-dimensional displacement field is recovered from the spatio-temporal modal parameters and the dynamic face mesh sequence is reconstructed. The invention significantly reduces the storage and transmission cost of dynamic face data while preserving reconstruction accuracy in high-curvature regions. It also offers good stability, compactness and practicality.
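The thin-plate model named in the abstract can be written out as follows. This is a sketch under stated assumptions, not the patent's own formula. It uses the standard Kirchhoff plate equation on the normalized domain [0,1]², with displacement w, areal mass ρh and bending stiffness D; these symbols are introduced here. It also reads the homogeneous Neumann conditions as vanishing first and third normal derivatives. Under those assumptions the cosine modes separate the equation exactly:

```latex
\rho h\,\frac{\partial^{2} w}{\partial t^{2}} + D\,\Delta^{2} w = 0,
  \qquad (u,v) \in [0,1]^{2},
\qquad
\frac{\partial w}{\partial \mathbf{n}} = \frac{\partial^{3} w}{\partial \mathbf{n}^{3}} = 0
  \ \text{on } \partial[0,1]^{2},

\phi_{mn}(u,v) = \cos(m\pi u)\cos(n\pi v),
\qquad
\Delta^{2}\phi_{mn} = \pi^{4}(m^{2}+n^{2})^{2}\,\phi_{mn},

w(u,v,t) \approx \sum_{m=0}^{M-1}\sum_{n=0}^{N-1}
  \bigl[a_{mn}\cos(\omega_{mn} t) + b_{mn}\sin(\omega_{mn} t)\bigr]\,\phi_{mn}(u,v),
\qquad
\omega_{mn} = \pi^{2}(m^{2}+n^{2})\sqrt{D/(\rho h)}.
```

Substituting a single term into the equation gives -ρh ω² + D π⁴(m²+n²)² = 0, and this fixes ω_mn. In the method, each Cartesian displacement component is expanded in this way. Only the retained coefficients and the truncation order then need to be stored.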

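The per-frame encoding step can be sketched in Python with NumPy. This is a minimal illustration, not the patented implementation. The following are hypothetical:

- the curvature weight w_i = 1 + alpha(|k1| + |k2|),
- the regularization weight lam,
- the helper names cosine_basis, encode_frame and decode_frame.

Principal curvatures are assumed to be computed elsewhere. The squared-Laplacian smoothness term is evaluated in closed form, because each cosine mode is an eigenfunction of the Laplacian on [0,1]².

```python
import numpy as np


def cosine_basis(uv, order):
    """Neumann cosine modes cos(m*pi*u) * cos(n*pi*v) for 0 <= m, n < order."""
    m, n = np.meshgrid(np.arange(order), np.arange(order), indexing="ij")
    m, n = m.ravel(), n.ravel()
    # Broadcast (V, 1) sample coordinates against (K,) mode indices -> (V, K).
    basis = np.cos(np.pi * uv[:, :1] * m) * np.cos(np.pi * uv[:, 1:] * n)
    return basis, m, n


def encode_frame(uv, frame_verts, template_verts, curvatures,
                 order=8, alpha=1.0, lam=1e-6):
    """Fit modal coefficients of shape (K, 3) to one frame's displacement field."""
    disp = frame_verts - template_verts          # (V, 3) displacement field
    B, m, n = cosine_basis(uv, order)
    # Hypothetical weighting: vertices with larger principal curvatures count more.
    w = 1.0 + alpha * np.abs(curvatures).sum(axis=1)
    # The integral of (Laplacian f)^2 over [0,1]^2 is diagonal in this basis:
    # Laplacian(phi_mn) = -pi^2 (m^2 + n^2) phi_mn, and ||phi_mn||^2 = norm.
    norm = np.where(m == 0, 1.0, 0.5) * np.where(n == 0, 1.0, 0.5)
    R = np.diag(np.pi**4 * (m**2 + n**2) ** 2 * norm)
    BtW = B.T * w                                # B^T W without forming W
    # Normal equations of the weighted, regularized least-squares problem,
    # solved for the x, y and z components at once.
    return np.linalg.solve(BtW @ B + lam * R, BtW @ disp)


def decode_frame(uv, template_verts, coeffs, order):
    """Recover displacements from coefficients and write them back to vertices."""
    B, _, _ = cosine_basis(uv, order)
    return template_verts + B @ coeffs
```

For a mesh with V vertices, each frame then costs 3·order² coefficients instead of 3V coordinates. The template and its UV coordinates are shared by all frames.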
Inventors

WANG YIHAN
JIN YAO
ZHANG HUAXIONG

Assignees

Zhejiang Sci-Tech University (浙江理工大学)

Dates

Publication Date: 2026-05-05
Application Date: 2026-04-07

Claims (10)

1. A dynamic face reconstruction method based on partial differential equations, characterized by comprising the following steps: (1) acquiring the mesh data of each frame of a face sequence, wherein the meshes of all frames have the same topological connectivity and have boundaries, all frames share the same face connectivity, and vertex indices correspond one-to-one across frames; (2) taking the 1st frame mesh from step (1) as the template mesh, parameterizing it to obtain two-dimensional parameter coordinates for each vertex of the template mesh, and normalizing the parameter coordinates to construct a unified UV parameter domain; (3) in the unified UV parameter domain, constructing a parameterized expression function of the dynamic face based on the analytic solution of a two-dimensional thin-plate vibration equation, wherein the function represents how the parameter-domain displacement scalar field changes over time, and its parameters comprise the modal parameters of each spatial mode and the time parameters describing the temporal variation of each spatial mode; (4) in the unified UV parameter domain, constructing a family of two-dimensional cosine spatial modal basis functions under Neumann boundary conditions, expressing the parameter-domain displacement scalar field as a finitely truncated modal superposition, and constructing an objective function comprising a data fitting term and a smoothness term; (5) minimizing the objective function of step (4) to obtain the modal parameters and time parameters of each spatial mode, storing and transmitting these coefficients together with the truncation order as compression parameters, recovering the parameter-domain displacement scalar field from the compression parameters, and writing it back to the three-dimensional mesh vertices to obtain the reconstructed face mesh sequence.
2. The method of claim 1, wherein step (2) comprises: extracting the sequence of boundary loop vertices of the template mesh; using the boundary loop as the parameterization boundary constraint by mapping it to a closed boundary in the two-dimensional parameter plane; solving the harmonic mapping equation under this boundary constraint to obtain initial UV coordinates; and performing iterative optimization from the initial UV coordinates to obtain the final set of UV coordinates.
3. The method according to claim 1, wherein in step (2) the UV coordinates are computed with a locally injective, low-distortion mapping that prevents triangle flips, specifically the SLIM parameterization method. This method uses the symmetric Dirichlet energy as the parameterization distortion measure and constrains the local injectivity of the mapping, so as to suppress triangle flips and reduce UV unfolding distortion.
4. The method according to claim 1, wherein in step (3) the parameterized expression function of the dynamic face is constructed as follows: the dynamic deformation of the face surface is expressed as a spatio-temporal process constrained by thin-plate dynamics, the two-dimensional thin-plate vibration equation is introduced as a physical prior on the displacement field, and the solution of this equation is taken as the expression function of the dynamic face.
5. The method of claim 1, wherein in step (3) the two-dimensional thin-plate vibration equation combines a mass (inertia) term in the time dimension with a biharmonic bending term in the spatial dimension. As a result, the displacement field satisfies both temporal-continuity and spatial-smoothness constraints, which suppresses the jitter and noise amplification caused by independent frame-by-frame reconstruction and improves cross-frame consistency and interpretability.
6. The method of claim 1, wherein in step (3) a thin-plate vibration equation satisfying homogeneous Neumann boundary conditions is solved in the unified UV parameter domain. The displacement field solution is expressed as a finite-term superposition in modal-expansion form, spanned by orthogonal basis functions that satisfy the boundary conditions, specifically two-dimensional cosine eigenbasis functions. This yields the displacement as a function of the parameter coordinates and time.
7. The method of claim 1, wherein in step (4), in the unified UV parameter domain, the three-dimensional displacement observations defined at the UV sampling points are taken as input. A vertex-weighted least-squares method is used to solve the corresponding modal coefficients separately for the displacement components in each of the three coordinate directions, and the vertex weights are determined adaptively from the principal curvatures of the template mesh vertices. The discrete coefficient sequence of each spatial mode in the three coordinate directions is then fitted to obtain the corresponding spatio-temporal modal parameters.
8. The method according to claim 7, wherein in step (4) the spatio-temporal modal parameters are fitted by solving an objective function composed of a data fitting term and a smoothness term. The data fitting term is the mean squared error between the value of the face expression function at each time instant and the observed value. The smoothness term is the squared Laplacian of the function, so that the solution conforms to the smoothness prior of dynamic facial deformation.
9. The method of claim 1, wherein in step (5) the objective function is minimized as a nonlinear least-squares problem using iterative Levenberg-Marquardt (LM) optimization. The resulting spatio-temporal modal parameters are stored or transmitted as compression parameters. At the decoding end, the three-dimensional displacement is recovered from the compression parameters and written back to the three-dimensional mesh vertices to obtain the reconstructed face mesh sequence.
10. A dynamic face reconstruction device based on partial differential equations, comprising a memory and one or more processors, the memory storing executable code, wherein the one or more processors, when executing the executable code, implement the dynamic face reconstruction method based on partial differential equations according to any one of claims 1-9.
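The initial parameterization of claim 2 can be sketched as a Tutte-style harmonic map. This illustration rests on several assumptions:

- It maps the boundary loop onto the perimeter of the unit square by arc length, so the result already lies in a normalized [0,1]² domain compatible with the cosine modes.
- It uses uniform graph-Laplacian weights instead of the cotangent weights commonly used for harmonic maps.
- It solves a dense system, which is suitable only for small meshes.
- It omits the SLIM refinement of claim 3.

The names boundary_loop, square_boundary and harmonic_uv are hypothetical.

```python
from collections import Counter

import numpy as np


def boundary_loop(faces):
    """Ordered boundary vertex loop of a consistently oriented disk-like mesh."""
    faces = [tuple(int(v) for v in f) for f in faces]
    directed = [(f[i], f[(i + 1) % 3]) for f in faces for i in range(3)]
    counts = Counter(tuple(sorted(e)) for e in directed)
    # A boundary edge belongs to exactly one triangle; keep its orientation.
    nxt = {a: b for a, b in directed if counts[tuple(sorted((a, b)))] == 1}
    loop = [min(nxt)]
    while nxt[loop[-1]] != loop[0]:
        loop.append(nxt[loop[-1]])
    return loop


def square_boundary(verts, loop):
    """Place boundary vertices on the unit-square perimeter by arc length."""
    pts = verts[loop]
    seg = np.linalg.norm(np.roll(pts, -1, axis=0) - pts, axis=1)
    s = 4.0 * np.concatenate([[0.0], np.cumsum(seg)[:-1]]) / seg.sum()
    side = np.floor(s).astype(int)               # which square edge (0..3)
    t = (s - side)[:, None]                      # position along that edge
    corners = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0]])
    return corners[side] + t * (corners[(side + 1) % 4] - corners[side])


def harmonic_uv(verts, faces):
    """Uniform-weight harmonic (Tutte) map into [0,1]^2 with a fixed boundary."""
    n = len(verts)
    loop = boundary_loop(faces)
    uv = np.zeros((n, 2))
    uv[loop] = square_boundary(verts, loop)
    L = np.zeros((n, n))
    for f in faces:
        for i in range(3):
            a, b = int(f[i]), int(f[(i + 1) % 3])
            L[a, b] = L[b, a] = -1.0
    L[np.diag_indices(n)] = -L.sum(axis=1)       # vertex degrees
    interior = np.setdiff1d(np.arange(n), loop)
    # Each interior vertex becomes the average of its neighbours.
    rhs = -L[np.ix_(interior, loop)] @ uv[loop]
    uv[interior] = np.linalg.solve(L[np.ix_(interior, interior)], rhs)
    return uv
```

Because the boundary is convex, the uniform-weight solve gives a fold-free initial map. Its output is the kind of starting point that claim 2 then refines iteratively.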

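The temporal fitting and LM step of claims 7 and 9 can be sketched for a single mode coefficient with SciPy's `least_squares`, which provides a Levenberg-Marquardt solver through `method="lm"`. The temporal model c(t) = a cos(ωt) + b sin(ωt) + c0 is a hypothetical choice. It is motivated by the undamped thin-plate modal solution plus a static offset, and the patent does not state this exact form. The initial frequency omega0 is assumed to come from a prior such as the plate's modal frequency.

```python
import numpy as np
from scipy.optimize import least_squares


def mode_model(params, t):
    """Hypothetical temporal model for one spatial mode's coefficient."""
    a, b, omega, c0 = params
    return a * np.cos(omega * t) + b * np.sin(omega * t) + c0


def fit_mode_series(t, series, omega0):
    """Fit (a, b, omega, c0) to a per-frame coefficient sequence with LM."""
    def residual(params):
        return mode_model(params, t) - series

    # Start from the data's spread and mean, with the prior frequency guess.
    x0 = np.array([series.std(), 0.0, omega0, series.mean()])
    result = least_squares(residual, x0, method="lm")
    return result.x
```

At the decoder, `mode_model(params, t)` regenerates the coefficient for every frame time. Each mode therefore costs four numbers per coordinate direction instead of one number per frame. LM only finds a local optimum, so a reasonable omega0 matters.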
Description

Dynamic face reconstruction method and device based on partial differential equation

Technical Field

The invention belongs to the technical field of computer graphics and image processing, and particularly relates to a dynamic face reconstruction method and device based on partial differential equations.

Background

With the rapid development of computer vision and three-dimensional face analysis, dynamic face reconstruction has been widely applied in virtual reality, expression capture, digital humans, medical image analysis and related fields. Dynamic reconstruction aims to recover, as a time series, the continuous geometric changes of the face across frames. However, facial expression changes carry a large amount of high-dimensional spatio-temporal information. This is especially true for high-resolution triangular or quadrilateral meshes, where a single frame already holds a large amount of data and the total grows rapidly as frames accumulate. Effective compression of face time-series data has therefore become a core problem for real-time reconstruction and transmission.

In the prior art, on the one hand, dynamic face reconstruction schemes based on multi-view geometry are widely used. For example, in a professional motion capture studio, tens of synchronized cameras capture multiple views of the actor's face. A high-density three-dimensional mesh sequence is then obtained by combining stereo matching, multi-view stereo reconstruction and similar methods. Such systems achieve high reconstruction accuracy in a controlled environment. However, each frame mesh typically contains tens of thousands to hundreds of thousands of vertices, so a performance lasting only a few minutes can produce several GB of data.
To simplify later editing and rendering, engineering practice often stores and manages the data directly as a frame-by-frame mesh sequence, without any systematic compression or modeling mechanism. Temporal redundancy is therefore hard to exploit, and the storage and transmission burden is heavy.

On the other hand, dynamic face reconstruction methods based on three-dimensional deformable templates and statistical shape models have also been widely studied and applied. For example, starting from the 3D Morphable Model (3DMM) or a statistical basis model trained on large-scale face data, personalized face shape and expression parameters are recovered by fitting one or more image frames. These methods achieve a degree of dimensionality reduction at the parameter level and are used in practice for expression driving and animation production. However, statistical models usually obtain their shape basis through linear principal component analysis. Their expressive power is therefore limited by the training data distribution and the linearity assumption, and they do not adequately capture high-frequency details of real expressions such as severe local deformations, wrinkles and folds. Moreover, many such methods focus on static frames or fit each frame independently. They lack explicit modeling in the time dimension and cannot constrain or exploit temporal correlation from a dynamics perspective, so the compression efficiency and reconstruction stability of dynamic sequences remain limited.

In recent years, a growing body of work has introduced deep learning to improve the robustness and degree of automation of dynamic face reconstruction.
For example, a convolutional neural network or graph neural network directly predicts the three-dimensional face mesh or displacement field of each frame from monocular video, in order to drive an avatar or capture expressions in real time. In a typical mobile application, the front-facing camera captures video at a high frame rate, and the reconstruction module must continuously output the corresponding sequence of three-dimensional face shapes. Such methods can achieve near-real-time performance on common devices, but they have several limitations. The network output is still typically high-dimensional vertex coordinates or dense displacement vectors without an interpretable physical structure. The strong correlation within the time-series data is not explicitly modeled or exploited for compression. In addition, jitter and noise arise easily under challenging conditions such as low light, occlusion or rapid motion, which makes further data compression difficult. Some studies have also attempted to constrain the deformation of the face surface with a physical model. For example, they treat the face as an elastic thin plate or elastic body, introduce elastic potential energy or thin-plate energy to regularize the face displacement field, and apply this as a smoothing step after static expression reconstruction or in interpolation between k