CN-115861540-B - Three-dimensional reconstruction method, device, equipment and storage medium for two-dimensional face

CN115861540B

Abstract

The invention relates to the field of three-dimensional reconstruction, and discloses a three-dimensional reconstruction method, device, equipment and storage medium for a two-dimensional face. The method comprises: receiving a two-dimensional face image; performing mouth key point detection on the two-dimensional face image based on a mediapipe neural network to obtain a two-dimensional key point set; obtaining a three-dimensional face image based on a pix2pix neural network; performing mouth key point detection on the three-dimensional face image based on the mediapipe neural network to obtain a three-dimensional key point set; extracting N pairs of two-dimensional key points from the two-dimensional key point set and N pairs of three-dimensional key points from the three-dimensional key point set, wherein N is a positive integer; obtaining a deviation absolute value according to a preset deviation algorithm; judging whether the deviation absolute value is smaller than an error threshold; if the deviation absolute value is smaller than the error threshold, adding the two-dimensional face image to a reconstructed two-dimensional image training set; and inputting the pictures in the reconstructed two-dimensional image training set into the pix2pix neural network for training, to generate a new pix2pix neural network.

Inventors

Su Pengyang
Chen Yonghua
Li Wei

Assignees

上海积图科技有限公司

Dates

Publication Date: 20260505
Application Date: 20221209

Claims (9)

1. A three-dimensional reconstruction method for a two-dimensional face, comprising the steps of: receiving a two-dimensional face image, and performing mouth key point detection on the two-dimensional face image based on a preset mediapipe neural network to obtain a two-dimensional key point set; performing three-dimensional reconstruction on the two-dimensional face image based on a preset pix2pix neural network to obtain a three-dimensional face image; performing mouth key point detection on the three-dimensional face image based on the preset mediapipe neural network to obtain a three-dimensional key point set; extracting N pairs of two-dimensional key points from the two-dimensional key point set, and extracting N pairs of three-dimensional key points from the three-dimensional key point set, wherein N is a positive integer; performing a deviation value operation on the N pairs of two-dimensional key points and the N pairs of three-dimensional key points according to a preset deviation algorithm to obtain a deviation absolute value; judging whether the deviation absolute value is smaller than a preset error threshold; if the deviation absolute value is smaller than the preset error threshold, adding the two-dimensional face image to a preset reconstructed two-dimensional image training set; and inputting the pictures in the reconstructed two-dimensional image training set into the preset pix2pix neural network for training, to generate a new pix2pix neural network; wherein performing the deviation value operation on the N pairs of two-dimensional key points and the N pairs of three-dimensional key points according to the preset deviation algorithm to obtain the deviation absolute value comprises: calculating the average number of pixels of longitudinal spacing between the two key points of each pair in the N pairs of two-dimensional key points to obtain a two-dimensional closure value, wherein the N pairs of two-dimensional key points comprise N pairs of two-dimensional upper and lower lip detection points; calculating the average number of pixels of longitudinal spacing between the two key points of each pair in the N pairs of three-dimensional key points to obtain a three-dimensional closure value, wherein the N pairs of three-dimensional key points comprise N pairs of three-dimensional upper and lower lip detection points; and calculating the absolute value of the difference between the two-dimensional closure value and the three-dimensional closure value to obtain the deviation absolute value.
2. The three-dimensional reconstruction method for a two-dimensional face according to claim 1, wherein extracting N pairs of two-dimensional key points from the two-dimensional key point set, and extracting N pairs of three-dimensional key points from the three-dimensional key point set, comprises: extracting N pairs of two-dimensional key points from the two-dimensional key point set based on the preset mediapipe neural network; and extracting, based on the preset mediapipe neural network, N pairs of three-dimensional key points from the three-dimensional key point set according to the correspondence between the N pairs of two-dimensional key points and the three-dimensional face image.
3. The three-dimensional reconstruction method for a two-dimensional face according to claim 1, wherein after inputting the pictures in the reconstructed two-dimensional image training set into the preset pix2pix neural network for training to generate a new pix2pix neural network, the method further comprises: replacing the preset pix2pix neural network with the new pix2pix neural network.
4. The three-dimensional reconstruction method for a two-dimensional face according to claim 1, wherein after judging whether the deviation absolute value is smaller than the preset error threshold, and before inputting the pictures in the reconstructed two-dimensional image training set into the preset pix2pix neural network for training to generate a new pix2pix neural network, the method further comprises: if the deviation absolute value is not smaller than the preset error threshold, adding the two-dimensional face image to a preset verification image set.
5. The three-dimensional reconstruction method for a two-dimensional face according to claim 4, wherein after inputting the pictures in the reconstructed two-dimensional image training set into the preset pix2pix neural network for training to generate a new pix2pix neural network, the method further comprises: performing three-dimensional reconstruction on each image of the verification image set based on the new pix2pix neural network to obtain a verification three-dimensional image set; performing deviation analysis on the verification three-dimensional image set according to a preset verification algorithm to obtain an analysis result; and when the analysis result is qualified, replacing the preset pix2pix neural network with the new pix2pix neural network.
6. The three-dimensional reconstruction method for a two-dimensional face according to claim 5, wherein performing deviation analysis on the verification three-dimensional image set according to the preset verification algorithm to obtain an analysis result comprises: extracting, based on the preset mediapipe neural network, M pairs of two-dimensional key points from the verification image set and the corresponding M pairs of three-dimensional key points from the verification three-dimensional image set, wherein M is a positive integer; performing a deviation value operation on the M pairs of two-dimensional key points and the M pairs of three-dimensional key points according to the preset deviation algorithm to obtain a deviation absolute value; when the deviation absolute value is smaller than a preset verification threshold, determining the analysis result to be a qualified result; and when the deviation absolute value is not smaller than the preset verification threshold, determining the analysis result to be a disqualified result.
7. A three-dimensional reconstruction apparatus for a two-dimensional face, characterized in that the apparatus comprises: a two-dimensional detection module, configured to receive a two-dimensional face image and perform mouth key point detection on the two-dimensional face image based on a preset mediapipe neural network to obtain a two-dimensional key point set; a three-dimensional reconstruction module, configured to perform three-dimensional reconstruction on the two-dimensional face image based on a preset pix2pix neural network to obtain a three-dimensional face image; a three-dimensional detection module, configured to perform mouth key point detection on the three-dimensional face image based on the preset mediapipe neural network to obtain a three-dimensional key point set; an extraction module, configured to extract N pairs of two-dimensional key points from the two-dimensional key point set and N pairs of three-dimensional key points from the three-dimensional key point set, wherein N is a positive integer; a deviation operation module, configured to perform a deviation value operation on the N pairs of two-dimensional key points and the N pairs of three-dimensional key points according to a preset deviation algorithm to obtain a deviation absolute value; a judging module, configured to judge whether the deviation absolute value is smaller than a preset error threshold; a training set adding module, configured to add the two-dimensional face image to a preset reconstructed two-dimensional image training set if the deviation absolute value is smaller than the preset error threshold; and a training module, configured to input the pictures in the reconstructed two-dimensional image training set into the preset pix2pix neural network for training, to generate a new pix2pix neural network; wherein the deviation operation module is specifically configured to: calculate the average number of pixels of longitudinal spacing between the two key points of each pair in the N pairs of two-dimensional key points to obtain a two-dimensional closure value, wherein the N pairs of two-dimensional key points comprise N pairs of two-dimensional upper and lower lip detection points; calculate the average number of pixels of longitudinal spacing between the two key points of each pair in the N pairs of three-dimensional key points to obtain a three-dimensional closure value, wherein the N pairs of three-dimensional key points comprise N pairs of three-dimensional upper and lower lip detection points; and calculate the absolute value of the difference between the two-dimensional closure value and the three-dimensional closure value to obtain the deviation absolute value.
8. A three-dimensional reconstruction device for a two-dimensional face, characterized by comprising a memory and at least one processor, wherein the memory stores instructions, and the memory and the at least one processor are interconnected through a line; the at least one processor invokes the instructions in the memory to cause the three-dimensional reconstruction device to perform the three-dimensional reconstruction method for a two-dimensional face according to any one of claims 1-6.
9. A computer-readable storage medium having a computer program stored thereon, wherein the computer program, when executed by a processor, implements the three-dimensional reconstruction method for a two-dimensional face according to any one of claims 1-6.
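
The deviation algorithm recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation. It assumes that each key point is an (x, y) pixel coordinate, that the "three-dimensional" key points are detected on the rendered three-dimensional face image and are therefore also pixel coordinates, and that the longitudinal spacing is the absolute difference of the y coordinates. The names `closure_value` and `deviation_abs` are hypothetical.

```python
from typing import Sequence, Tuple

Point = Tuple[float, float]      # (x, y) in pixels; assumed representation
LipPair = Tuple[Point, Point]    # (upper-lip point, lower-lip point)


def closure_value(pairs: Sequence[LipPair]) -> float:
    """Mean vertical pixel gap over N upper/lower lip point pairs."""
    if not pairs:
        raise ValueError("N must be a positive integer")
    gaps = [abs(lower[1] - upper[1]) for upper, lower in pairs]
    return sum(gaps) / len(gaps)


def deviation_abs(pairs_2d: Sequence[LipPair],
                  pairs_3d: Sequence[LipPair]) -> float:
    """Absolute difference between the 2D and 3D closure values."""
    if len(pairs_2d) != len(pairs_3d):
        raise ValueError("both inputs must contain the same N pairs")
    return abs(closure_value(pairs_2d) - closure_value(pairs_3d))


# Illustrative data only: a nearly closed mouth in the 2D photo and an
# open mouth in the reconstructed image.
pairs_2d = [((10.0, 50.0), (10.0, 52.0)), ((20.0, 50.0), (20.0, 54.0))]
pairs_3d = [((10.0, 50.0), (10.0, 58.0)), ((20.0, 50.0), (20.0, 60.0))]
deviation = deviation_abs(pairs_2d, pairs_3d)
```

With these sample points the two-dimensional closure value is 3 pixels and the three-dimensional closure value is 9 pixels, so the deviation absolute value is 6 pixels; the image would be accepted only if the preset error threshold were larger than 6.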

Description

Three-dimensional reconstruction method, device, equipment and storage medium for two-dimensional face

Technical Field

The present invention relates to the field of three-dimensional reconstruction, and in particular to a method, apparatus, device, and storage medium for three-dimensional reconstruction of a two-dimensional face.

Background

Many 3D reconstruction algorithms for 2D faces exist, but they share the problem that the lips are reconstructed inaccurately. For example, the mouth of a 2D face is closed, but after 3D reconstruction the lips of the 3D face are open. The present method uses the strong correspondence (conversion) relationship of pix2pix to convert a 2D face photo into a 3D face accurately, and thereby addresses the inaccurate mouth closure that almost inevitably arises in 3D reconstruction of 2D face photos. The inaccuracy stems from how the neural network is trained: its loss function measures global loss, so training optimizes the overall quality of the reconstructed 3D picture rather than the accuracy of mouth closure alone. Consequently, whatever other problems existing 3D reconstruction techniques have, they share, without exception, insufficiently accurate mouth closure. A new technique is therefore needed to solve the technical problem that the mouth is not reconstructed accurately enough when a two-dimensional face picture is reconstructed into a three-dimensional picture.

Disclosure of Invention

The main purpose of the invention is to solve the technical problem that the mouth is not reconstructed accurately enough when a two-dimensional face picture is reconstructed into a three-dimensional picture.
The first aspect of the present invention provides a three-dimensional reconstruction method for a two-dimensional face, comprising the steps of: receiving a two-dimensional face image, and performing mouth key point detection on the two-dimensional face image based on a preset mediapipe neural network to obtain a two-dimensional key point set; performing three-dimensional reconstruction on the two-dimensional face image based on a preset pix2pix neural network to obtain a three-dimensional face image; performing mouth key point detection on the three-dimensional face image based on the preset mediapipe neural network to obtain a three-dimensional key point set; extracting N pairs of two-dimensional key points from the two-dimensional key point set, and extracting N pairs of three-dimensional key points from the three-dimensional key point set, wherein N is a positive integer; performing a deviation value operation on the N pairs of two-dimensional key points and the N pairs of three-dimensional key points according to a preset deviation algorithm to obtain a deviation absolute value; judging whether the deviation absolute value is smaller than a preset error threshold; if the deviation absolute value is smaller than the preset error threshold, adding the two-dimensional face image to a preset reconstructed two-dimensional image training set; and inputting the pictures in the reconstructed two-dimensional image training set into the preset pix2pix neural network for training, to generate a new pix2pix neural network.
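
The selection steps of the first aspect can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the disclosed implementation: the pix2pix reconstruction, the mediapipe-based lip key point detection and the deviation algorithm are passed in as plain callables (`reconstruct`, `detect_lip_pairs`, `deviation`), all of which are hypothetical names, and the final retraining of the pix2pix network on the selected images is left out.

```python
from typing import Callable, Iterable, List, Sequence, Tuple, TypeVar

Image = TypeVar("Image")
Point = Tuple[float, float]      # (x, y) in pixels; assumed representation
LipPair = Tuple[Point, Point]    # (upper-lip point, lower-lip point)


def select_training_images(
    images: Iterable[Image],
    reconstruct: Callable[[Image], Image],
    detect_lip_pairs: Callable[[Image], Sequence[LipPair]],
    deviation: Callable[[Sequence[LipPair], Sequence[LipPair]], float],
    error_threshold: float,
) -> List[Image]:
    """Keep the 2D images whose reconstruction preserves mouth closure."""
    selected: List[Image] = []
    for image in images:
        pairs_2d = detect_lip_pairs(image)       # key points on the 2D face
        rendered = reconstruct(image)            # stand-in for pix2pix
        pairs_3d = detect_lip_pairs(rendered)    # key points on the 3D face
        if deviation(pairs_2d, pairs_3d) < error_threshold:
            selected.append(image)               # goes to the training set
    return selected


# Toy stand-ins so the sketch runs: an "image" is just its mouth gap in
# pixels, and the fake reconstruction opens odd-gap mouths by 5 pixels.
def fake_detect(gap: float) -> Sequence[LipPair]:
    return [((0.0, 0.0), (0.0, float(gap)))]


def fake_reconstruct(gap: float) -> float:
    return gap + 5 if gap % 2 else gap


def fake_deviation(a: Sequence[LipPair], b: Sequence[LipPair]) -> float:
    return abs(a[0][1][1] - b[0][1][1])


training_set = select_training_images(
    [1, 2, 3, 4], fake_reconstruct, fake_detect, fake_deviation, 2.0
)
```

Passing the models in as callables keeps the selection logic testable without loading any network; in a real pipeline the stand-ins would be replaced by the preset pix2pix and mediapipe networks.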
Optionally, in a first implementation of the first aspect of the present invention, performing the deviation value operation on the N pairs of two-dimensional key points and the N pairs of three-dimensional key points according to the preset deviation algorithm to obtain the deviation absolute value comprises: calculating the average number of pixels of spacing between the two key points of each pair in the N pairs of two-dimensional key points to obtain a two-dimensional closure value; calculating the average number of pixels of spacing between the two key points of each pair in the N pairs of three-dimensional key points to obtain a three-dimensional closure value; and calculating the absolute value of the difference between the two-dimensional closure value and the three-dimensional closure value to obtain the deviation absolute value.
Optionally, in a second implementation of the first aspect of the present invention, extracting N pairs of two-dimensional key points from the two-dimensional key point set, and extracting N pairs of three-dimensional key points from the three-dimensional key point set, comprises: extracting N pairs of two-dimensional key points from the two-dimensional key point set based on the preset mediapipe neural network; and extracting, based on the preset mediapipe neural network, N pairs of three-dimensional key points from the three-dimensional key point set according to