CN-117315742-B - Three-dimensional face key point detection method, device and storage medium


Abstract

The invention discloses a three-dimensional face key point detection method, a device, and a storage medium. The method comprises: acquiring original input data representing three-dimensional face shape and texture data; uniformly sampling the point cloud grid of the original input data to generate first sampling data; filling the first sampling data to generate first filling data; applying a graph convolution network to the first filling data to output a distribution probability map model of the three-dimensional face key points; and predicting the positions of the three-dimensional face key points based on the distribution probability map model. The conversion from the key point probability map model to key point positions is adapted to the training of the graph convolution network and improves the robustness and accuracy of the training result. In addition, the invention provides a normalized representation of three-dimensional face geometry and texture data that lets the trained network handle different types of data, giving it generalization capability across data sets.

Inventors

FAN ZHENFENG
XIA SHIHONG
ZHAO ZEJUN
DING BO

Assignees

中国科学院计算技术研究所

Dates

Publication Date: 20260505
Application Date: 20230906

Claims (11)

1. A three-dimensional face key point detection method, characterized by comprising the following steps: acquiring original input data, wherein the original input data represents three-dimensional face shape and texture data; uniformly sampling the point cloud grid of the original input data to generate first sampling data; filling the first sampling data to generate first filling data; using a graph convolution network to output, for the first filling data, a distribution probability map model of the three-dimensional face key points; and, based on the distribution probability map model, predicting the positions of the three-dimensional face key points by a directionally balanced sampling method suited to graph convolution network training.
2. The method for detecting three-dimensional face key points according to claim 1, wherein the raw input data includes: first format data representing shape data, the shape data comprising x, y, z channel data; second format data representing shape and black and white texture data, the black and white texture data comprising v-channel data; and third format data representing shape and color texture data, the color texture data comprising r, g, b three-channel color texture data.
3. The method for detecting three-dimensional face key points according to claim 1 or 2, wherein the uniform sampling of the point cloud grid is performed according to a formula (not reproduced in this text) relating the surface area of the input three-dimensional face point cloud grid, the number of sampling points, and the sampling interval radius r.
4. The method for detecting three-dimensional face key points according to claim 3, wherein the first sampling data comprises first format sampling data, second format sampling data, and third format sampling data, which respectively represent the data generated by uniformly sampling the point cloud grids of the first, second, and third format data.
5. The method for detecting three-dimensional face key points according to claim 4, wherein the first filling data is obtained by a six-channel normalized data representation method, which comprises: when the first format sampling data is input, filling the x, y, and z channel data into the first, second, and third of the six channels in sequence, and filling the fourth, fifth, and sixth channels with -1; when the second format sampling data is input, filling the x, y, z, and v channel data into the first, second, third, and fourth of the six channels in sequence, and filling the fifth and sixth channels with -1; and when the third format sampling data is input, filling the x, y, z, r, g, and b channel data into the six channels in sequence.
6. The method for detecting three-dimensional face key points according to claim 1, wherein the probability map is centered on the position of a key point and follows a Gaussian distribution.
7. The method for detecting three-dimensional face key points according to claim 1, 2, 4, 5 or 6, wherein predicting the three-dimensional face key points based on the distribution probability map model comprises the following steps: acquiring a target sampling point set; and processing the target sampling point set with a soft threshold maximizing method to output the positions of the three-dimensional face key points.
8. The method for detecting three-dimensional face key points according to claim 7, wherein acquiring the target sampling point set comprises the following steps: step 1, selecting the first center point, which corresponds to the maximum value of the heat map, and constructing an initial sampling point set from it; step 2, selecting a plurality of points closest to the first center point as a first set; step 3, selecting, in the first set, the second center point, which corresponds to the maximum heat map value, and adding the second center point to the sampling point set to obtain an updated sampling point set; step 4, removing from the first set the second center point and all points whose included angle with the vector formed by connecting the first center point and the second center point is smaller than a preset angle; step 5, repeating step 3 and step 4 until all points in the first set have been removed, and taking the finally obtained sampling point set as the target sampling point set.
9. The method for detecting three-dimensional face key points according to claim 7, wherein the soft threshold maximizing method is represented by a formula (not reproduced in this text) whose symbols denote the heat map, the coordinates of the target key point, the exponential coefficient, the value of the i-th coordinate in the heat map, and the coordinates of the i-th three-dimensional vertex.
10. A three-dimensional face key point detection apparatus employing the three-dimensional face key point detection method according to any one of claims 1 to 9, comprising: a data acquisition module for acquiring original input data comprising three-dimensional face shape and texture data; a data sampling module for uniformly sampling the point cloud grid of the original input data to generate first sampling data; a data filling module for filling the first sampling data to generate first filling data; a probability map model generation module for using a graph convolution network to output, for the first filling data, a distribution probability map model of the three-dimensional face key points; and a three-dimensional face key point position output module for predicting, based on the distribution probability map model, the positions of the three-dimensional face key points by a directionally balanced sampling method suited to graph convolution network training.
11. A storage medium storing a program for executing the three-dimensional face key point detection method according to any one of claims 1 to 9.
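
The six-channel filling of claim 5 and the Gaussian probability map of claim 6 can be illustrated with a minimal Python sketch. The function names, the use of NumPy, the Euclidean (rather than surface) distance, the peak value of 1, and the `sigma` parameter are all assumptions made for illustration; the claims specify only the channel layout, the -1 fill value, and that the map is Gaussian and centered on the key point.

```python
import numpy as np

FILL_VALUE = -1.0  # claim 5: unused channels are filled with -1


def pad_to_six_channels(points):
    """Pad an (N, 3), (N, 4) or (N, 6) per-vertex array to (N, 6).

    3 columns (x, y, z)          -> channels 4-6 set to -1
    4 columns (x, y, z, v)       -> channels 5-6 set to -1
    6 columns (x, y, z, r, g, b) -> copied unchanged
    """
    points = np.asarray(points, dtype=np.float64)
    if points.ndim != 2 or points.shape[1] not in (3, 4, 6):
        raise ValueError("expected an (N, 3), (N, 4) or (N, 6) array")
    padded = np.full((points.shape[0], 6), FILL_VALUE)
    padded[:, : points.shape[1]] = points
    return padded


def gaussian_heatmap(vertices, keypoint, sigma=1.0):
    """Per-vertex Gaussian value centered on `keypoint`.

    Assumptions (not from the patent text): Euclidean distance,
    an unnormalized Gaussian with peak value 1, and the width `sigma`.
    """
    vertices = np.asarray(vertices, dtype=np.float64)
    keypoint = np.asarray(keypoint, dtype=np.float64)
    sq_dist = np.sum((vertices - keypoint) ** 2, axis=1)
    return np.exp(-sq_dist / (2.0 * sigma ** 2))
```

With this layout, shape-only, grayscale-textured, and color-textured scans all reach the network as arrays of the same width, which is what allows a single network to be trained on mixed data sets.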

Description

Three-dimensional face key point detection method, device and storage medium

Technical Field

The invention relates to the field of face detection, and in particular to a three-dimensional face key point detection method, a three-dimensional face key point detection device, and a storage medium.

Background

Existing three-dimensional face key point detection methods fall mainly into traditional methods and deep learning methods.

Traditional methods detect three-dimensional face key points with classical machine learning, usually locating them by means of hand-crafted geometric features, a three-dimensional face prior model, or dense three-dimensional face registration. For example, Segundo et al. proposed a three-dimensional face key point detection algorithm that combines three-dimensional face surface curvature with depth confidence curves; Gilani et al. proposed densely registering three-dimensional faces using adaptive geometric functions and then locating the face key points; and The et al. proposed a method for face key point detection using scale-invariant features (SIFT: scale-invariant feature transform) and a grid function. By constructing hand-crafted geometric features, traditional methods achieve a certain robustness. However, because they do not effectively exploit and learn from data, their accuracy is limited, and they are difficult to apply to different kinds of three-dimensional face data across a wide range of application scenarios.

Deep learning methods locate three-dimensional face key points in an adaptive, data-driven manner using current deep learning techniques. They can in turn be subdivided into methods based on convolutional neural networks (CNN) and methods based on graph convolution networks (GCN).
Methods based on convolutional neural networks generally convert the three-dimensional shape into an image grid format, such as multi-view depth maps or geometry maps, detect two-dimensional key points with a convolutional neural network, and finally convert the two-dimensional key points into three-dimensional key points. Such methods typically rely on subjective preprocessing and suffer from systematic errors introduced by two-dimensional to three-dimensional re-projection. Research on methods based on graph convolution networks has only just begun. Wang et al. proposed adaptively regressing three-dimensional face key points with a graph convolution network, but the method extracts only geometric features without fusing texture features, and supports only geometric data sampled at a fixed resolution, so its generalization capability is limited.

Because of the particular nature of three-dimensional face point cloud data, existing three-dimensional face key point detection methods have two main defects. First, learning the mapping from a three-dimensional face to key point positions with a graph convolution network is an ill-posed optimization problem, and results obtained by directly regressing three-dimensional coordinates lack robustness. Second, three-dimensional face data can be textured or untextured, and existing methods cannot process these different types of data at the same time; they can therefore detect accurately on a single data set but lack generalization across data sets.

Disclosure of Invention

In view of the above defects in the prior art, the invention provides a three-dimensional face key point detection method that converts a probability map into key point positions in a way that is adapted to the training of a graph convolution network, overcoming the lack of robustness in the prior art.
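
As a rough illustration of converting a probability map into a key point position, the Python sketch below follows the five steps of claim 8 and then applies an exponentially weighted average in the spirit of the soft threshold maximizing method of claim 9. It is an interpretation under stated assumptions, not the patented formula, which this text does not reproduce. The names `balanced_sample` and `soft_argmax`, the parameters `k`, `min_angle_deg`, and `beta`, the exclusion of the first center point from the neighbor set, and the reading of step 4 as an angle measured at the first center point are all hypothetical.

```python
import numpy as np


def balanced_sample(vertices, heat, k=16, min_angle_deg=30.0):
    """Directionally balanced selection of vertex indices (claim 8, as read here)."""
    vertices = np.asarray(vertices, dtype=np.float64)
    heat = np.asarray(heat, dtype=np.float64)
    c0 = int(np.argmax(heat))                      # step 1: first center point
    selected = [c0]
    dist = np.linalg.norm(vertices - vertices[c0], axis=1)
    # step 2: the k nearest points (assumption: c0 itself is excluded)
    candidates = [int(i) for i in np.argsort(dist) if i != c0][:k]
    cos_limit = np.cos(np.deg2rad(min_angle_deg))
    while candidates:                              # step 5: loop until empty
        c = max(candidates, key=lambda i: heat[i])  # step 3: second center point
        selected.append(c)
        ref = vertices[c] - vertices[c0]
        ref_norm = np.linalg.norm(ref)
        kept = []
        for i in candidates:                       # step 4: prune similar directions
            if i == c:
                continue
            vec = vertices[i] - vertices[c0]
            denom = ref_norm * np.linalg.norm(vec)
            # angle < preset angle  <=>  cosine > cos(preset angle)
            if denom > 0 and float(ref @ vec) / denom > cos_limit:
                continue
            kept.append(i)
        candidates = kept
    return selected


def soft_argmax(vertices, heat, indices, beta=10.0):
    """Exponentially weighted mean of the selected vertices (assumed form)."""
    pts = np.asarray(vertices, dtype=np.float64)[indices]
    h = np.asarray(heat, dtype=np.float64)[indices]
    w = np.exp(beta * (h - h.max()))               # shifted for numerical stability
    return (w / w.sum()) @ pts
```

A possible usage: `idx = balanced_sample(vertices, heat)` followed by `soft_argmax(vertices, heat, idx)`. Pruning points that lie in nearly the same direction as an already selected point keeps a locally dense patch of vertices on one side of the peak from pulling the weighted average toward that side.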
To achieve the above object, the invention provides a three-dimensional face key point detection method comprising the following steps: acquiring original input data, wherein the original input data represents three-dimensional face shape and texture data; uniformly sampling the point cloud grid of the original input data to generate first sampling data; filling the first sampling data to generate first filling data; using a graph convolution network to output, for the first filling data, a distribution probability map model of the three-dimensional face key points; and predicting the positions of the three-dimensional face key points based on the distribution probability map model. Preferably, the raw input data includes: first format data representing shape data, the shape data comprising x, y, z channel data; second format data representing shape and black and white texture data, the black and white texture data comprising v-channel data; and third format data representing shape and color texture data, the color text