CN-116645464-B - Three-dimensional human body reconstruction method based on hand painting


Abstract

An embodiment of the invention discloses a three-dimensional human body reconstruction method based on hand drawing. The method comprises: building a new data set, Sketch3D, that establishes a mapping between hand-drawn human body sketches and three-dimensional models; using an end-to-end method based on a convolutional neural network and the SMPL model; modifying the network according to the characteristics of hand-drawn sketches; and introducing a new training scheme, step-by-step transfer training (SST STEP TRAINING). By creating a multi-channel branch attention network, the invention improves the model's ability to extract features from hand-drawn images and reconstructs a three-dimensional human body model from a single hand-drawn human body sketch. The method is suitable both for users with a drawing background and for users without one, and it greatly improves the efficiency of three-dimensional human body modeling.

Inventors

Wang Fei
Tang Kongzhang
Zhu Changsheng
Cai Hao

Assignees

Shantou University (汕头大学)

Dates

Publication Date: 20260512
Application Date: 20230323

Claims (8)

1. A three-dimensional human body reconstruction method based on hand drawing, characterized by comprising the following steps: S1, preparing a data set that pairs hand-drawn sketches with three-dimensional human body models, wherein the data set comprises synthetic drawings and hand-drawn drawings, together with the three-dimensional model of the corresponding human body; S2, using ResNet as the encoder of the convolutional neural network, so that information can flow directly from the input end to the output end; designing the decoder as a multi-branch structure comprising a pose decoder, a shape decoder and a projection position decoder, wherein each decoder is formed by stacking an attention module and linear layers, and the feature tensor extracted by the convolutional neural network is attention-optimized separately in the different branches; for the feature enhancement part of the decoder, the pose decoder, the shape decoder and the projection position decoder each adopt a self-attention model to enhance and optimize the feature vector obtained from the encoder, and then scale the dimension of their respective optimized feature vectors according to the different dimensions of the pose, shape and projection parameters; S3, pre-training with weak supervision from the two-dimensional joint point data available in the synthetic data of the data set, correcting erroneously labeled three-dimensional targets in the data set by optimization with the SMPLify method, and performing step-by-step fine-tuning training with hand-drawn sketch data to fine-tune the pose.
2. The three-dimensional human body reconstruction method based on hand drawing according to claim 1, wherein step S1 specifically comprises the steps of: S11, obtaining two-dimensional joint point information of a sketch by using the two-dimensional projection parameters trained on the synthetic data set; S12, obtaining the position information of the human body on the drawing board from the coordinates of the two-dimensional joint points.
3. The three-dimensional human body reconstruction method based on hand drawing according to claim 2, wherein step S12 further comprises: using a preprocessing program to directly detect the upper, lower, left and right boundaries of the human body in the sketch, thereby obtaining the bounding box of the sketch on the drawing board; using these boundaries to calculate the center coordinates and the scale of the sketch after it is resized to a shape of (224 ); obtaining a set of data representing the position of the human body in each sketch; and storing the obtained information in a file that is read during training.
4. The three-dimensional human body reconstruction method based on hand drawing according to claim 1, wherein the pose is decoded by establishing topological relations among the different joint points of the human body, which comprises constructing a list of 24 entries, the entries representing, from top to bottom, the 24 joint points of the SMPL human body model, wherein each entry stores a sub-list containing the indices of the other joint points that have the greatest influence on that joint point.
5. The method of claim 1, wherein the step of enhancing the feature vector obtained by the encoder using the self-attention model comprises: first, passing an input image through the encoder to obtain image features; then pooling the image features and reducing their dimension to obtain a feature vector, and inputting the feature vector into three self-attention networks; in each of the three self-attention networks, passing the feature vector through linear layers to obtain a query parameter and a key parameter, taking the feature vector itself as the value, and inputting these into the self-attention mechanism to compute the global attention of branch i, where i = 1, 2, 3; and finally adding the global attention of each branch to the feature vector to obtain the optimized global feature of that branch, wherein i denotes the branch. In this calculation, the query, key and value are three different parameters obtained from the encoder's feature vector through the neural network, representing respectively the index of the feature vector after encoding by the neural network, the value of the feature vector after encoding, and the value of the feature vector itself; each branch has its own attention pipeline composed of self-attention modules, which takes the query, key and value as input to compute the self-attention mechanism; the output of the attention pipeline has the same dimension as the output produced by the encoder, and the final enhanced feature also has that dimension.
6. The three-dimensional human body reconstruction method based on hand drawing according to claim 1, wherein the decoding process of the decoder is as follows: the pose decoder poseDecoder, the shape decoder shapDecoder and the projection position decoder camDecoder each take as input the concatenation (concat) of the corresponding feature vector obtained through the feature enhancement operation, which is a vector of dimension 2048, with the corresponding initial parameter init_phase, init_shape or init_cam, and output respectively the final SMPL pose parameter, shape parameter and projection position parameter Cam predicted by the network; init_phase, init_shape and init_cam respectively denote the SMPL parameters of the initial human body pose, shape and projection of the model without any deformation.
7. The three-dimensional human body reconstruction method based on hand drawing according to claim 6, further comprising the steps of: concatenating the three enhanced feature vectors with the pose parameters, shape parameters and projection parameters of the SMPL initialization model, respectively, the concatenation being performed along the first dimension to produce three new vectors; inputting the three new vectors into their respective multi-layer perceptrons to obtain the final results, namely the output of the pose decoder, the output of the shape decoder and the output of the projection position decoder; passing the output pose and shape parameters to SMPL to generate a 3D mesh file; and projecting the 3D mesh into the original image using the output projection parameters.
8. The three-dimensional human body reconstruction method based on hand drawing according to any one of claims 1 to 7, wherein S3 further comprises the calculation of an objective function, wherein L denotes a loss function used for parameter updating in deep learning training, shape3D refers to the three-dimensional human body mesh, 3D refers to the three-dimensional human body joint points, 2D refers to the two-dimensional human body joint points, θ is the pose parameter of SMPL and β is the shape parameter of SMPL; the objective function is computed over a training batch from the mesh predicted by the network and the target mesh, the three-dimensional joint points, the two-dimensional joint points, and the pose and shape parameters of SMPL; and separate objectives are used for the pre-training stage and for the fine-tuning training.
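The boundary detection described in claim 3 can be illustrated with a short sketch. It assumes that strokes are non-zero pixels on a zero background, that the canvas is resized to a square of side `out_size`, and that the scale is the larger box side as a fraction of the resized canvas; the function name, the parameters and the scale definition are all illustrative assumptions rather than details given in the claims.

```python
import numpy as np

def sketch_bbox_info(sketch, out_size=224, ink_threshold=0):
    """Return ((cx, cy), scale) of the drawn figure after resizing the canvas.

    Assumption: any pixel above ink_threshold belongs to a stroke.
    """
    ys, xs = np.nonzero(sketch > ink_threshold)
    if ys.size == 0:
        raise ValueError("sketch contains no strokes")
    top, bottom = ys.min(), ys.max()
    left, right = xs.min(), xs.max()

    height, width = sketch.shape
    sx, sy = out_size / width, out_size / height  # resize factors per axis

    # Center of the bounding box in resized-canvas coordinates.
    cx = (left + right) / 2 * sx
    cy = (top + bottom) / 2 * sy
    # Hypothetical scale: larger box side relative to the resized canvas.
    scale = max((right - left) * sx, (bottom - top) * sy) / out_size
    return (float(cx), float(cy)), float(scale)

# Example: a 448 x 448 canvas with a filled block standing in for a figure.
canvas = np.zeros((448, 448))
canvas[100:300, 200:300] = 1
center, scale = sketch_bbox_info(canvas)
```

The returned values could then be written to a file per sketch and read back during training, as the claim describes.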
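The 24-entry topology list of claim 4 might look like the following sketch. The claims do not say which joints count as having "the greatest influence", so this example assumes they are the kinematic neighbours (parent and children) in the standard SMPL joint tree, and it uses SMPL's usual joint indexing, which may differ from the top-to-bottom ordering mentioned in the claim.

```python
# Parent index of each of the 24 SMPL joints; -1 marks the root (pelvis).
SMPL_PARENTS = [-1, 0, 0, 0, 1, 2, 3, 4, 5, 6, 7, 8,
                9, 9, 9, 12, 13, 14, 16, 17, 18, 19, 20, 21]

def build_joint_topology(parents):
    """Return one sub-list per joint holding the indices of its neighbours.

    Assumption: "most influential" joints are the parent and the children.
    """
    topology = [[] for _ in parents]
    for child, parent in enumerate(parents):
        if parent >= 0:
            topology[child].append(parent)   # the parent influences the child
            topology[parent].append(child)   # and the child influences the parent
    return topology

TOPOLOGY = build_joint_topology(SMPL_PARENTS)
```

A decoder could use `TOPOLOGY[j]` to restrict which joints are considered when refining joint `j`; how the patented method consumes the list is not specified in the claims.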
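The feature enhancement of claim 5 can be sketched as three independent self-attention branches with a residual addition. This is a minimal NumPy illustration, not the patented network: it assumes scaled dot-product attention, assumes the 2048-dimensional feature vector is split into tokens of width 64 so that attention has more than one element to attend over, and uses random weights in place of trained linear layers. The class name and token width are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

class BranchSelfAttention:
    """One attention branch: query/key from linear layers, value is the feature itself."""

    def __init__(self, token_dim, rng):
        std = 1.0 / np.sqrt(token_dim)
        self.token_dim = token_dim
        self.w_q = rng.normal(0.0, std, (token_dim, token_dim))  # query projection
        self.w_k = rng.normal(0.0, std, (token_dim, token_dim))  # key projection

    def __call__(self, feature):
        tokens = feature.reshape(-1, self.token_dim)     # (T, D), assumed tokenization
        q = tokens @ self.w_q
        k = tokens @ self.w_k
        v = tokens                                       # value: the feature itself
        weights = softmax(q @ k.T / np.sqrt(self.token_dim))  # (T, T)
        attended = weights @ v                           # global attention output
        return feature + attended.reshape(feature.shape)  # residual addition

rng = np.random.default_rng(0)
feature = rng.normal(size=2048)                          # pooled encoder feature
branches = [BranchSelfAttention(64, rng) for _ in range(3)]  # pose, shape, cam
enhanced = [branch(feature) for branch in branches]
```

Each branch returns a vector of the same dimension as the encoder output, matching the claim's statement that the enhanced feature keeps the encoder's dimension.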
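Claims 6 and 7 describe concatenating each enhanced feature with an initial SMPL parameter, decoding it with a multi-layer perceptron, and projecting the resulting mesh with the predicted camera. The sketch below assumes the common SMPL sizes of 72 pose values and 10 shape values, a 3-value camera (scale, tx, ty) with weak-perspective projection, a single hidden layer of 256 units, and random weights; the camera format, projection model, layer sizes and function names are assumptions for illustration only.

```python
import numpy as np

FEATURE_DIM = 2048
PARAM_DIMS = {"pose": 72, "shape": 10, "cam": 3}  # cam size is an assumption
HIDDEN_DIM = 256                                   # hypothetical hidden width

def make_mlp(in_dim, out_dim, rng):
    return {
        "w1": rng.normal(0.0, 0.01, (in_dim, HIDDEN_DIM)),
        "b1": np.zeros(HIDDEN_DIM),
        "w2": rng.normal(0.0, 0.01, (HIDDEN_DIM, out_dim)),
        "b2": np.zeros(out_dim),
    }

def run_mlp(mlp, x):
    hidden = np.maximum(x @ mlp["w1"] + mlp["b1"], 0.0)  # ReLU activation
    return hidden @ mlp["w2"] + mlp["b2"]

def decode(features, init_params, mlps):
    """Concatenate each enhanced feature with its initial parameter, then decode."""
    outputs = {}
    for name in PARAM_DIMS:
        joined = np.concatenate([features[name], init_params[name]])
        outputs[name] = run_mlp(mlps[name], joined)
    return outputs

def weak_perspective_project(vertices, cam):
    """Project (N, 3) vertices to 2D; assumes cam = (scale, tx, ty)."""
    scale, tx, ty = cam
    return scale * (vertices[:, :2] + np.array([tx, ty]))

rng = np.random.default_rng(0)
features = {name: rng.normal(size=FEATURE_DIM) for name in PARAM_DIMS}
init_params = {"pose": np.zeros(72), "shape": np.zeros(10),
               "cam": np.array([1.0, 0.0, 0.0])}  # undeformed initial model
mlps = {name: make_mlp(FEATURE_DIM + dim, dim, rng)
        for name, dim in PARAM_DIMS.items()}
params = decode(features, init_params, mlps)
```

In a full pipeline, `params["pose"]` and `params["shape"]` would be passed to an SMPL implementation to obtain mesh vertices, which `weak_perspective_project` would then map onto the sketch using `params["cam"]`.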
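The loss terms named in claim 8 can be illustrated as follows. The exact formulas and the combinations used for pre-training and fine-tuning are not given, so this sketch assumes each term is a mean squared error over the batch and that a stage selects its terms through a weight dictionary; the weights shown are placeholders, and the 2D-only example merely mirrors the weak supervision with two-dimensional joint points mentioned in claim 1.

```python
import numpy as np

TERM_KEYS = ("shape3D", "3D", "2D", "theta", "beta")

def mse(pred, target):
    return float(np.mean((pred - target) ** 2))

def loss_terms(pred, target):
    """Compute one loss term per quantity that has both a prediction and a label."""
    return {key: mse(pred[key], target[key])
            for key in TERM_KEYS if key in pred and key in target}

def total_loss(terms, weights):
    """Weighted sum of the available terms; terms without a weight are ignored."""
    return sum(weights[key] * value for key, value in terms.items() if key in weights)

# Example: a sample that only carries 2D joint labels (weak supervision).
pred = {"2D": np.array([[1.0, 2.0]]), "theta": np.zeros(72)}
target = {"2D": np.array([[0.0, 0.0]])}
terms = loss_terms(pred, target)
loss = total_loss(terms, {"2D": 2.0})  # hypothetical weight
```

Skipping terms that lack labels lets the same code serve samples with different annotation levels, which is one simple way to organize staged training.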

Description

Three-dimensional human body reconstruction method based on hand painting

Technical Field

The invention relates to a three-dimensional reconstruction method, and in particular to a three-dimensional human body reconstruction method based on hand drawing.

Background

Three-dimensional reconstruction can be traced back to the origins of computer graphics, and the earliest three-dimensional reconstruction techniques were proposed in the 1960s and 1970s. However, because these methods depended heavily on scanners and cameras capturing real objects, they consumed considerable manpower and material resources and were not suitable for the general public. The earliest three-dimensional reconstruction technique was proposed by the American computer graphics pioneer Ivan Sutherland [1], and the field developed rapidly over the following decades with the development of computers and advances in hardware, particularly GPUs. David Marr proposed the concept of "visual intelligence" and a theoretical framework for extracting three-dimensional information from images. Marc Levoy et al. proposed a method of scanning the surface of a real object with a laser scanner and reconstructing a three-dimensional model, one of the earliest three-dimensional reconstruction methods based on real data. Chen and Medioni proposed a three-dimensional reconstruction method based on visual geometry, which reconstructs from images taken at multiple viewing angles. Since the turn of the twenty-first century, three-dimensional reconstruction techniques have been further developed and popularized thanks to the advent of GPUs; the most representative include point-cloud-based, stereo-vision-based, light-field-based and image-based reconstruction methods.
However, three-dimensional human body reconstruction from hand-drawn sketches has seen limited development, and most existing methods use neural networks to predict the mesh vertices of a human body model from a hand drawing. Brodt and Bessmeltsev transfer the pose of a sketch to a standard 3D human model by predicting three key elements (2D bone tangents, self-contacts and foreshortening), from which a standard three-dimensional body mesh can be derived directly from a hand drawing. Existing sketch-based three-dimensional human body reconstruction techniques still rely on joint point annotations and self-contact region annotations in the images; such annotation consumes a great deal of manpower and material resources and makes it troublesome to build a large data set. Moreover, given the abstraction and randomness of hand drawing, existing methods cannot extract the key information of the human body from sketches of arbitrary style. Fundamentally, this task is three-dimensional reconstruction from a single-view image, but applying current methods directly to sketches raises several problems.
(1) Previous networks all target RGB images with rich image features: the user must take a photograph in a specific pose and then input it into the network to obtain the reconstructed three-dimensional human body. This appears very convenient, but obtaining a large number of human models in different poses requires a large number of posed photographs, and some actions must be performed by professionals, which is a great inconvenience.
(2) A sketch is an abstract and sparse form of expression, so lines may be confused and shape proportions unbalanced, and the style and level of detail of sketches drawn by different users are unpredictable. Image features extracted with previous methods may fail to capture the critical information in the sketch.
(3) The information conveyed by a human figure includes its pose, its shape and its position in the picture. Previous methods use a single branch to predict all three sets of parameters directly, so the pose, shape and projection position in a sketch interfere with one another; a simple sketch tends to convey mainly pose information to the three-dimensional model, while shape information can also influence the pose to some extent. More importantly, when a hand-drawn sketch has no annotated 2D joints, previous methods cannot obtain an accurate projection of the three-dimensional model onto the human figure in two-dimensional space. Identifying the specific information expressed by a sketch is therefore a very challenging task for a computer.
(4) Existing reconstruction methods based on hand-drawn sketches still use networks designed for real images and rely on large amounts of annotation, such as sketch joint point annotations and sketch self-contact annotations, so the workload of building a large data set is huge. Discl