CN-122023642-A - Multi-prior, motion-unconstrained three-dimensional dynamic human body reconstruction method based on 3D Gaussian Splatting under sparse-view industrial camera input
Abstract
The invention relates to the technical field of three-dimensional dynamic human body reconstruction, and in particular to a multi-prior, motion-unconstrained three-dimensional dynamic human body reconstruction method, system and computer-readable storage medium based on 3D Gaussian Splatting under sparse-view industrial camera input. The method first obtains the model input data. Then, based on the vertex normal vectors of a parameterized human body model, it uses rasterization to compute the proportion of the human body region covered by each layer of point cloud, stopping the layer-wise dilation once the coverage reaches a preset threshold. A K-nearest-neighbour algorithm samples velocity feature vectors at the parameterized-model vertices; a neural network predicts linear blend skinning weight offsets for the Gaussian points, which are normalized to obtain a dynamic deformation mapping; a time-varying, view-independent dynamic color prediction model is constructed; and novel-view images are generated by differentiable rasterization under a joint multi-loss optimization strategy. Under sparse-view industrial camera input, the invention achieves high-precision dynamic three-dimensional human body reconstruction without constraining the subject to specific motions.
Inventors
- LIU YONGJIN
- LV TIAN
- ZHOU YU
Assignees
- 清华大学 (Tsinghua University)
Dates
- Publication Date: 2026-05-12
- Application Date: 2025-12-25
Claims (10)
- 1. A multi-prior, motion-unconstrained three-dimensional dynamic human body reconstruction method based on 3D Gaussian Splatting, characterized by comprising the following steps: S1, acquiring video information in real time with sparse-view industrial cameras, obtaining the intrinsic and extrinsic parameter sets of the cameras and the RGB image sequences of the corresponding frames, and combining these with a parameterized human body model parameter sequence and a human foreground mask image sequence as the input data; S2, based on the vertex normal vectors of the parameterized human body model, iteratively offsetting the vertices along their normals to generate a multi-layer structural prior point cloud, using rasterization to compute the proportion of the human body region covered by each layer of point cloud, and stopping the dilation once the coverage reaches a preset threshold; S3, sampling a velocity feature vector for each parameterized-model vertex with a K-nearest-neighbour algorithm, treating the vertices as Gaussian points, feeding them together with the time parameter, the Gaussian point spatial positions and the human motion parameters into a neural network to predict linear blend skinning weight offsets for the Gaussian points, and normalizing to obtain a dynamic deformation mapping; S4, constructing a time-varying, view-independent dynamic color prediction model, feeding the Gaussian point positional encodings, spherical harmonic coefficient features, velocity feature vectors and time parameter into a neural network to predict the RGB color of each Gaussian point at the current moment; S5, generating novel-view images by differentiable rasterization under a joint multi-loss optimization strategy, and training the model with the L1 loss, structural similarity loss, depth-aware perceptual loss, mask loss and Gaussian shape constraint loss between the predicted and ground-truth images.
- 2. The method of claim 1, wherein S1 comprises: S11, defining the intrinsic and extrinsic parameter set of the $N$ sparse-view cameras as $\{K_n, R_n, t_n\}_{n=1}^{N}$, where $K_n$ is the intrinsic matrix of camera $n$, $R_n$ its rotation matrix and $t_n$ its translation vector, the extrinsic matrix of each camera being composed of the rotation matrix $R_n$ and the translation vector $t_n$; S12, the cameras at the individual views collect a set of $N \times T$ sparse-view RGB images $\{I_{n,t}\}$, where the image resolution is unified to $H \times W$ and $T$ is the number of frames acquired by each camera, corresponding to the length of the human motion sequence.
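The intrinsic/extrinsic parameters in claim 2 are used in the standard pinhole projection; the sketch below illustrates how a world-space point maps into one camera's image plane. The function name and the plain-list matrix representation are illustrative, not taken from the patent.

```python
def project_point(K, R, t, X):
    """Project a 3D world point X to pixel coordinates with a pinhole
    camera: intrinsic matrix K (3x3), rotation R (3x3), translation t (3,).
    All arguments are plain nested lists."""
    # Camera-space point: Xc = R @ X + t
    Xc = [sum(R[i][j] * X[j] for j in range(3)) + t[i] for i in range(3)]
    # Apply intrinsics: p = K @ Xc
    p = [sum(K[i][j] * Xc[j] for j in range(3)) for i in range(3)]
    # Perspective divide by depth
    return (p[0] / p[2], p[1] / p[2])
```

A point on the optical axis projects to the principal point, which gives a quick sanity check of the convention.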
- 3. The method of claim 1, wherein S2 further comprises: S21, selecting the initial frame of the video as the reference frame, and computing the vertices of the parameterized human surface from the human motion parameters and shape parameters of the parameterized model through the linear blend skinning algorithm, obtaining the human mesh model in world space; S22, denoting the vertices determined by the initial parameterized human model as $V_0$, the normal vectors of all vertices as $N$, and the preset step size as $\delta$, the $l$-th dilated layer of vertices is expressed as $V_l = V_0 + l \cdot \delta \cdot N$; the number of vertices in the $l$-th layer is consistent with that of the initial vertices, the topology of each vertex is determined by the parameterized human model, yielding a mesh model for the current layer, and the region covered by this mesh model in image space is obtained by rasterization; S23, comparing the region covered by the mesh model in image space with the human mask and computing the ratio of the mesh-model area to the mask area; if the ratio exceeds a set threshold, the current point cloud essentially covers the body and clothing regions, the dilation stops, and the current number of layers is recorded as $L$; otherwise the iteration continues until the ratio reaches the threshold, and the vertices used for initialization are finally denoted $V_{\text{init}} = \{V_0, V_1, \dots, V_L\}$.
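The layer-growing loop of S22/S23 can be sketched as follows. The `coverage_ratio` callback is a stand-in for the rasterization-based mask-coverage test described in the claim (actual rasterization is out of scope here), and `max_layers` is an illustrative safeguard not stated in the source.

```python
def dilate_layers(vertices, normals, step, coverage_ratio, threshold, max_layers=20):
    """Iteratively offset template vertices along their normals, one layer
    at a time, until the accumulated layers cover enough of the human mask.
    vertices, normals: lists of (x, y, z) tuples with matching indices."""
    layers = [vertices]  # layer 0 is the template itself
    for l in range(1, max_layers + 1):
        # Layer-l vertex: v + l * step * n (same topology as the template)
        layer = [tuple(v[i] + l * step * n[i] for i in range(3))
                 for v, n in zip(vertices, normals)]
        layers.append(layer)
        if coverage_ratio(layers) >= threshold:
            break  # point cloud now covers body and clothing regions
    return layers
```

With a coverage callback that grows with the layer count, the loop stops exactly when the preset threshold is first reached.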
- 4. The method according to claim 3, wherein the vertex normals of the parameterized human surface are computed as follows: let the current vertex be $v$ and let it be shared by $M$ faces; for face $m$ with the three vertices $v_a, v_b, v_c$, the normal contributed by that face is computed as $n_m = \mathrm{norm}\big((v_b - v_a) \times (v_c - v_a)\big)$, where $\mathrm{norm}(\cdot)$ is the normalization function; the normal contributions of all faces are summed and normalized to obtain the vertex normal $n_v$.
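The face-normal averaging of claim 4 can be sketched directly; function names are illustrative, and each face is passed as its three vertex coordinates, as in the claim.

```python
import math

def face_normal(a, b, c):
    """Unit normal of triangle (a, b, c) via the cross product of two edges."""
    u = [b[i] - a[i] for i in range(3)]
    v = [c[i] - a[i] for i in range(3)]
    n = [u[1] * v[2] - u[2] * v[1],
         u[2] * v[0] - u[0] * v[2],
         u[0] * v[1] - u[1] * v[0]]
    norm = math.sqrt(sum(x * x for x in n)) or 1.0
    return [x / norm for x in n]

def vertex_normal(faces):
    """Sum the unit normals of all faces sharing the vertex, then renormalize."""
    s = [0.0, 0.0, 0.0]
    for a, b, c in faces:
        fn = face_normal(a, b, c)
        s = [s[i] + fn[i] for i in range(3)]
    norm = math.sqrt(sum(x * x for x in s)) or 1.0
    return [x / norm for x in s]
```

For a single triangle in the xy-plane with counter-clockwise winding, the result is the +z axis, matching the usual convention.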
- 5. The method of claim 1, wherein S3 further comprises: S31, processing the input video with the parameterized human body model to obtain a parameterized-model sequence, and applying the linear blend skinning algorithm to this sequence to obtain the spatial coordinates of the human-surface Gaussian points at the $T$ moments in world space, denoted $\{x_t\}_{t=1}^{T}$; S32, computing the velocity of a surface Gaussian point at time $t$ as $v_t = (x_{t+1} - x_t)/\Delta t$, where $\Delta t$ is the frame interval; S33, converting the vertex velocities of the parameterized model into Gaussian point velocity features with a K-nearest-neighbour algorithm; S34, constructing a Gaussian point linear blend skinning offset prediction network, each layer of which uses a one-dimensional convolution followed by a ReLU activation as its building block, taking as input the time, the Gaussian point spatial positions, the body joints from the human parameters and the Gaussian point velocity feature vectors, and outputting an offset vector whose dimension equals the number of joints of the parameterized human model; S35, initializing the linear blend skinning coefficients of the current Gaussian point by sampling the coefficients of its nearest parameterized vertex.
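The per-point velocity of S32 is a forward finite difference over consecutive frames; the sketch below applies it to a whole trajectory. The function name and list-of-frames layout are illustrative.

```python
def vertex_velocities(positions, dt):
    """Forward-difference velocity of each surface point across frames.
    positions: list over T frames, each frame a list of (x, y, z) tuples.
    Returns T-1 frames of per-point velocities v_t = (x_{t+1} - x_t) / dt."""
    vels = []
    for t in range(len(positions) - 1):
        frame = [tuple((b[i] - a[i]) / dt for i in range(3))
                 for a, b in zip(positions[t], positions[t + 1])]
        vels.append(frame)
    return vels
```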
- 6. The method of claim 5, wherein step S33 further comprises: with $k = 3$, taking the velocities $v_1, v_2, v_3$ of the 3 parameterized-model vertices nearest to the current Gaussian point at the current moment, together with the distances $d_1, d_2, d_3$ from the Gaussian point to these three vertices, and concatenating these six quantities into the velocity feature vector $f$ of the current Gaussian point, thereby converting the vertex velocities of the parameterized model into Gaussian point velocity features.
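A minimal sketch of the k-nearest-neighbour feature construction of claim 6, assuming (as one plausible reading) that the three neighbour velocities are concatenated first, followed by the three distances; the exact concatenation order is not specified in the source.

```python
import math

def knn_velocity_feature(gauss_pos, verts, vert_vels, k=3):
    """Velocity feature of one Gaussian point: concatenate the velocities
    of its k nearest template vertices with the distances to them.
    verts: (x, y, z) tuples; vert_vels: matching (vx, vy, vz) tuples."""
    # Sort template vertices by Euclidean distance to the Gaussian point
    dists = sorted((math.dist(gauss_pos, v), i) for i, v in enumerate(verts))
    feat = []
    for d, i in dists[:k]:
        feat.extend(vert_vels[i])   # neighbour velocity (3 components)
    for d, i in dists[:k]:
        feat.append(d)              # distance to that neighbour (1 component)
    return feat
```

With $k = 3$ this yields a 12-dimensional feature (three 3-vectors plus three scalars).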
- 7. The method of claim 5, wherein step S35 further comprises normalizing the coefficients of the parameterized vertices: $w_k = \dfrac{\hat w_k + \Delta w_k}{\sum_j (\hat w_j + \Delta w_j)}$, where $\hat w_k$ is the $k$-th linear blend skinning coefficient of the parameterized vertex closest to the current Gaussian point and $\Delta w_k$ is the $k$-th component of the skinning coefficient offset predicted by the network.
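A sketch of the normalization in claim 7: the predicted per-joint offsets are added to the nearest vertex's skinning weights and the sum is renormalized to one. Clamping negative weights at zero and the `eps` guard are extra safeguards assumed here, not stated in the source.

```python
def normalize_skin_weights(base_weights, offsets, eps=1e-8):
    """Add predicted per-joint offsets to the nearest vertex's LBS weights
    and renormalize so the corrected weights sum to one."""
    raw = [max(w + d, 0.0) for w, d in zip(base_weights, offsets)]  # clamp: assumed safeguard
    total = sum(raw) + eps
    return [r / total for r in raw]
```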
- 8. The method of claim 1, wherein S5 further comprises: S51, rendering an RGB image at the test view with differentiable rasterization; S52, training the model with the L1 loss, structural similarity loss, depth-aware perceptual loss, mask loss and Gaussian shape constraint loss between the predicted image and the ground-truth image, where the L1 loss is $\mathcal{L}_{1} = \lVert \hat I - I \rVert_1$, with $\hat I$ and $I$ the predicted image and the ground-truth image respectively; the structural similarity loss is $\mathcal{L}_{\text{SSIM}} = 1 - \mathrm{SSIM}(\hat I, I)$, where SSIM denotes the structural similarity index; the depth-aware perceptual loss is $\mathcal{L}_{\text{lpips}} = \lVert \Phi(\hat I) - \Phi(I) \rVert_1$, where $\Phi$ denotes a VGG feature extractor; the mask loss is $\mathcal{L}_{\text{mask}} = \lVert \hat M - M \rVert_1$, where $\hat M$ and $M$ are the predicted human mask image and the ground-truth human foreground mask respectively; and the Gaussian shape constraint loss is $\mathcal{L}_{\text{shape}} = \max\!\left(\frac{s_{\max}}{s_{\min}} - \tau,\, 0\right)$, where $s_{\max}$ and $s_{\min}$ denote the longest and shortest axes of a Gaussian ellipsoid respectively and $\tau$ is a predefined parameter.
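Two of the losses in claim 8 are simple enough to sketch without a deep-learning framework: the L1 term and the Gaussian shape constraint (SSIM and the VGG-based perceptual term need image models and are omitted). The hinge form of the shape loss and the `ratio_max` name are assumptions standing in for the predefined parameter in the claim.

```python
def l1_loss(pred, gt):
    """Mean absolute error between predicted and ground-truth pixel values."""
    return sum(abs(p - g) for p, g in zip(pred, gt)) / len(pred)

def shape_constraint_loss(scales, ratio_max=5.0):
    """Penalize overly anisotropic Gaussians: hinge on the ratio of the
    longest to the shortest ellipsoid axis (ratio_max is an assumed
    stand-in for the patent's predefined threshold)."""
    loss = 0.0
    for s in scales:
        r = max(s) / min(s)       # axis anisotropy of one Gaussian
        loss += max(r - ratio_max, 0.0)
    return loss / len(scales)
```

Isotropic Gaussians contribute zero; only needle-like ones (ratio above the threshold) are penalized, which matches the intent of a shape regularizer.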
- 9. A multi-prior, motion-unconstrained three-dimensional dynamic human body reconstruction device based on 3D Gaussian Splatting under sparse views, characterized by comprising: a parameter and image acquisition module for acquiring the intrinsic and extrinsic parameter sets of the sparse-view industrial cameras and the corresponding human RGB image sequences and mask image sequences; a multi-layer Gaussian point cloud generation and initialization module for iteratively offsetting the vertices of the parameterized human body model along their normals to generate a multi-layer Gaussian point cloud and determining the initial point cloud through rasterized coverage detection; a dynamic feature fusion and skinning coefficient prediction module for sampling the velocity features of the parameterized-model vertices with a K-nearest-neighbour algorithm and feeding them, together with the time, spatial positions and human parameters, into a neural network that predicts linear blend skinning coefficient offsets for the Gaussian points; a time-varying, view-independent RGB prediction module which, under the assumption of time-varying, view-independent appearance, feeds the time attribute, spatial positions and velocity features into a neural network that predicts the dynamic RGB colors of the Gaussian points; and a multi-modal loss optimization module for rasterizing and rendering the initialized Gaussian point cloud, computing the multi-modal loss functions including the L1 loss, SSIM loss, LPIPS loss, mask loss and shape constraint loss, and optimizing the model parameters.
- 10. A computer readable storage medium storing a computer program which, when executed by a processor, implements the method of any one of claims 1-8.
Description
Multi-prior, motion-unconstrained three-dimensional dynamic human body reconstruction method based on 3D Gaussian Splatting under sparse-view industrial camera input
Technical Field
The invention relates to the technical field of three-dimensional dynamic human body reconstruction, in particular to a multi-prior, motion-unconstrained three-dimensional dynamic human body reconstruction method, system and computer-readable storage medium based on 3D Gaussian Splatting under sparse-view industrial camera input.
Background
Dynamic three-dimensional human body reconstruction is an important research direction in human body modeling, computer vision and human-computer interaction, and is essential for obtaining realistic three-dimensional human body sequences and for analyzing human appearance, shape and motion. A reconstruction method must break through the limitations of the captured two-dimensional data and realize a suitable three-dimensional representation to model the human body; this capability is a key technical support for high-precision human modeling. In the field of human body reconstruction, traditional methods are often based on point cloud and mesh representations, but their expressive power is limited, and it is difficult to achieve high-precision reconstruction of human geometry and texture. In recent years, with the parallel development of deep learning and three-dimensional computer vision, Neural Radiance Fields (NeRF) and 3D Gaussian Splatting (3DGS) have enabled high-precision three-dimensional human reconstruction.
Neural radiance fields usually require a great deal of time to complete a reconstruction, and real-time rendering is difficult to guarantee, which limits practical application of the algorithm. As a state-of-the-art three-dimensional representation, 3D Gaussian Splatting achieves highly efficient rendering through a set of point clouds carrying various attribute parameters and a splatting-based rendering method, while offering high-precision reconstruction quality. Nevertheless, as a discretized representation, 3D Gaussian Splatting typically relies on dense-view inputs, greatly increasing the deployment cost of the associated devices. Three-dimensional dynamic human body reconstruction under sparse-view input is therefore a hot spot and a key focus of current research. Compared with the original 3D Gaussian Splatting technique, dynamic human reconstruction requires an additional time dimension. Some existing methods, such as Deformable 3DGS, 4DGS and Spacetime Gaussians, introduce time conditions to control the attribute changes of the three-dimensional Gaussian point cloud in a scene and thereby achieve dynamic visual effects, but these methods model fine-grained motion poorly. For human motion, some frontier works, including GauHuman and 3DGS-Avatar, build on a parameterized human body model and introduce neural-network-based modeling of non-rigid motion, further enhancing the realism of three-dimensional human reconstruction; however, these algorithms target monocular input only, so the motion of the subject is constrained (for example, the person must perform a full turn so that all body parts enter the camera's capture range), which limits the diversity of the subject's motions.
In addition, some methods such as LIFe-GoM enable fast novel-view rendering of the target human in a feed-forward manner by pre-training, on a large-scale synthetic human dataset, a prior model of human three-dimensional geometry and appearance distribution. However, such methods are generally limited by the quality and distribution of the upstream human dataset, and their generalization currently remains unsatisfactory.
Disclosure of Invention
The present invention aims to solve, at least to some extent, one of the technical problems in the related art. Therefore, a first object of the present invention is to provide a multi-prior, motion-unconstrained three-dimensional dynamic human body reconstruction method based on 3D Gaussian Splatting, comprising the steps of: S1, acquiring video information in real time with sparse-view industrial cameras, obtaining the intrinsic and extrinsic parameter sets of the cameras and the RGB image sequences of the corresponding frames, and combining these with a parameterized human body model parameter sequence and a human body foreground m