CN-116434302-B - Face alignment method of face alignment neural network based on boundary perception

Abstract

The application discloses a face alignment method based on a boundary-aware face alignment neural network, which comprises a boundary heat map estimation sub-network and a coordinate regression sub-network. The boundary heat map estimation sub-network comprises a CoordConv layer and a plurality of shallow-and-deep feature fusion (SDFusion) modules; the coordinate regression sub-network comprises a self-attention-based feature re-extraction (SAfeature) module and a Transformer-decoder module. The original face image is input at the input end of the boundary heat map estimation sub-network, and a boundary heat map is generated at its output end. The coordinate regression sub-network fuses the original face image, the latent features of the boundary heat map estimation sub-network and the generated boundary heat map, and uses the SAfeature module and the Transformer-decoder module to map the fused features to keypoint coordinates, so as to further improve the accuracy of boundary heat map prediction.

Inventors

  • PENG JINGLIANG
  • LI YINGXIN
  • NIU DONGMEI

Assignees

  • University of Jinan (济南大学)

Dates

Publication Date
2026-05-05
Application Date
2023-03-23

Claims (6)

  1. A face alignment method based on a boundary-aware face alignment neural network, characterized in that the network comprises a boundary heat map estimation sub-network and a coordinate regression sub-network, wherein the boundary heat map estimation sub-network comprises a CoordConv layer and a plurality of shallow-and-deep feature fusion (SDFusion) modules, and the coordinate regression sub-network comprises a self-attention-based feature re-extraction (SAfeature) module and a Transformer-decoder module; the original face image is input at the input end of the boundary heat map estimation sub-network, and a boundary heat map is generated at its output end; the coordinate regression sub-network fuses the original face image, the latent features of the boundary heat map estimation sub-network and the generated boundary heat map, and maps the fused features to keypoint coordinates using the SAfeature module and the Transformer-decoder module; the SDFusion module comprises a first residual module, wherein first image information is input at the input end of the first residual module, and the output end of the first residual module is connected with the input ends of a first convolution module and a pyramid pooling module; the output of the first convolution module, after an activation function, yields a first boundary sub-heat map of the first image information, and the first boundary sub-heat map is input to a second convolution module and output after an activation function; the first image information processed by the first residual module passes through the pyramid pooling module and is then channel-concatenated with second image information to obtain third image information; the pyramid pooling module is sequentially followed by a third convolution module, a second residual module and a fourth convolution module; the third image information and the first boundary sub-heat map output by the second convolution module undergo an element-wise multiplication to obtain contour-enhanced fourth image information; the SAfeature module comprises a fifth convolution module, a sixth convolution module, a seventh convolution module and an eighth convolution module arranged in parallel, wherein the output of the fifth convolution module and the output of the sixth convolution module undergo a matrix outer-product operation to obtain two groups of feature maps, the two groups of feature maps are channel-concatenated to obtain new feature map information, the output of the seventh convolution module and the new feature map information are added element-wise and input to a ninth convolution module, the output of the eighth convolution module and the output of the ninth convolution module are added element-wise and input to a third residual module, and the output end of the third residual module is connected with the input end of the Transformer-decoder module; and if the predicted keypoints need to be drawn on the original image, the parameters of the face bounding rectangle from data preprocessing are used: the model output is multiplied by the width and height of the rectangle, and the keypoints are then drawn on the original image using the position of the rectangle.
  2. The face alignment method based on the boundary-aware face alignment neural network according to claim 1, wherein the boundary heat map estimation sub-network further comprises a tenth convolution module, a fourth residual module and a first blur pooling module which are sequentially connected; the output end of the first blur pooling module is connected with the input end of the CoordConv layer; a fifth residual module, a sixth residual module and a first hourglass module are sequentially arranged between the CoordConv layer and the first SDFusion module; a first hourglass module is arranged between adjacent SDFusion modules; and the output end of the last SDFusion module is sequentially connected with a second hourglass module and an eleventh convolution module, the output of the eleventh convolution module, after an activation function, being the boundary heat map.
  3. The face alignment method based on the boundary-aware face alignment neural network according to claim 2, wherein the input of the first residual module in the first SDFusion module comes from the element-wise addition of the output of the sixth residual module and the output of the first hourglass module; the input of the first residual module in each remaining SDFusion module comes from the element-wise addition of the output of the SDFusion module of the previous stage and the output of the first hourglass module; and the second image information comes from the CoordConv layer.
  4. The face alignment method based on the boundary-aware face alignment neural network according to claim 2 or 3, wherein the coordinate regression sub-network further comprises a twelfth convolution module, a second blur pooling module, a thirteenth convolution module, a fourteenth convolution module, a third blur pooling module, a fifteenth convolution module, a fourth blur pooling module, a sixteenth convolution module and a fifth blur pooling module which are sequentially connected, and the output end of the fifth blur pooling module is connected with the input end of the SAfeature module.
  5. The face alignment method based on the boundary-aware face alignment neural network according to claim 4, wherein the boundary heat map obtained by the boundary heat map estimation sub-network is, after a difference operation and an average pooling operation, channel-concatenated with the original face image information and input to the twelfth convolution module; and the output information of the second blur pooling module is channel-concatenated with the output information of the last SDFusion module and input to the thirteenth convolution module.
  6. The face alignment method based on the boundary-aware face alignment neural network according to claim 5, wherein a loss function Loss is determined, the total loss comprising L_coord and L_heatmap, corresponding respectively to the keypoint coordinate loss and the boundary heat map loss; the total loss is defined as Loss = L_coord + β·L_heatmap, where L_coord = (1/N)·Σ_{i=1}^{N} ||S_i − Ŝ_i||², N representing the number of face keypoints and S_i and Ŝ_i representing the predicted keypoint coordinates and the annotated keypoint coordinates respectively; L_heatmap = Σ_{t=1}^{T} ||B_t − B̂_t||², T representing the number of predicted boundary heat maps, equal to the number of stacked hourglass modules, and B_t and B̂_t representing the predicted boundary heat map and the ground-truth boundary heat map respectively; and β is a hyper-parameter that balances the two types of losses, with a default value of 0.001.
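As a concrete illustration of the loss in claim 6, the following is a minimal pure-Python sketch. The squared-L2 form of both terms is an assumption (the claim does not fix the norm), and plain nested lists stand in for tensors:

```python
def total_loss(pred_pts, gt_pts, pred_maps, gt_maps, beta=0.001):
    """Sketch of the claim-6 loss: keypoint coordinate loss plus a
    beta-weighted boundary heat-map loss (squared-L2 form assumed).

    pred_pts / gt_pts: lists of (x, y) keypoint coordinates.
    pred_maps / gt_maps: lists of T flattened heat maps (T = number of
    stacked hourglass modules), each a list of pixel values.
    """
    n = len(pred_pts)
    # L_coord: mean squared Euclidean distance over the N keypoints
    l_coord = sum((px - gx) ** 2 + (py - gy) ** 2
                  for (px, py), (gx, gy) in zip(pred_pts, gt_pts)) / n
    # L_heatmap: squared error summed over all T predicted boundary maps
    l_map = sum((p - g) ** 2
                for pm, gm in zip(pred_maps, gt_maps)
                for p, g in zip(pm, gm))
    return l_coord + beta * l_map
```

With the default β = 0.001, the heat-map term acts only as a mild regularizer on the coordinate loss, matching the claim's description of β as a balancing hyper-parameter.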

Description

Face alignment method of a face alignment neural network based on boundary perception

Technical Field

The application relates to the technical field of image processing, and in particular to a face alignment method based on a boundary-aware face alignment neural network.

Background

Face alignment, also known as facial landmark detection (FLD), refers to the automatic localization of a set of predefined semantic feature points in a face image. Accurate face alignment plays a critical role in various face applications, including face recognition and verification, face reenactment, face morphing, and the like. Most deep-learning-based FLD algorithms locate keypoint coordinates via coordinate regression or keypoint heat map regression. The former directly predicts the location of each keypoint, while the latter estimates a heat map for each keypoint and locates the keypoint at the highest-response point in the heat map. However, these algorithms do not fully exploit the semantic and geometric correlations between keypoints to guide network learning, and thus cannot accurately capture the keypoints of a face.
Disclosure of Invention

In order to solve the above technical problems, the application provides the following technical scheme. In a first aspect, an embodiment of the application provides a face alignment method of a face alignment neural network based on boundary perception, which comprises a boundary heat map estimation sub-network and a coordinate regression sub-network. The boundary heat map estimation sub-network comprises a CoordConv layer and a plurality of shallow-and-deep feature fusion (SDFusion) modules; the coordinate regression sub-network comprises a self-attention-based feature re-extraction (SAfeature) module and a Transformer-decoder module. The original face image is input at the input end of the boundary heat map estimation sub-network, and a boundary heat map is generated at its output end. The coordinate regression sub-network fuses the original face image, the latent features of the boundary heat map estimation sub-network and the generated boundary heat map, and the fused features are mapped to keypoint coordinates using the SAfeature module and the Transformer-decoder module.
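The CoordConv layer mentioned above augments a feature map with explicit coordinate channels. A minimal sketch of that augmentation follows, using nested Python lists in place of tensors; the normalization of coordinates to [-1, 1] is the usual CoordConv convention and is an assumption, not stated in the text:

```python
def add_coord_channels(feat):
    """CoordConv-style augmentation (sketch): append normalized x and y
    coordinate channels to a feature map given as [C][H][W] nested lists."""
    h, w = len(feat[0]), len(feat[0][0])
    # x channel: each row runs from -1 (left) to +1 (right)
    x_chan = [[2 * x / (w - 1) - 1 if w > 1 else 0.0 for x in range(w)]
              for _ in range(h)]
    # y channel: each column runs from -1 (top) to +1 (bottom)
    y_chan = [[2 * y / (h - 1) - 1 if h > 1 else 0.0 for _ in range(w)]
              for y in range(h)]
    return feat + [x_chan, y_chan]
```

The extra channels give subsequent convolutions direct access to spatial position, which is useful here because boundary heat maps are inherently position-dependent.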
In a possible implementation manner, the SDFusion module includes a first residual module; first image information is input at the input end of the first residual module, and the output end of the first residual module is connected with the input ends of a first convolution module and a pyramid pooling module. The output of the first convolution module, after an activation function, is a first boundary sub-heat map of the first image information; the first boundary sub-heat map is input to a second convolution module and output after an activation function. The first image information processed by the first residual module passes through the pyramid pooling module and is then channel-concatenated with second image information to obtain third image information; the pyramid pooling module is sequentially followed by a third convolution module, a second residual module and a fourth convolution module. The third image information and the first boundary sub-heat map output by the second convolution module are multiplied element-wise to obtain contour-enhanced fourth image information.
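The contour-enhancement step at the end of the SDFusion module can be sketched as follows. A sigmoid is assumed as the activation applied to the boundary sub-heat map (the text does not name the activation function), and 2-D maps are plain nested lists:

```python
import math

def sdfusion_gate(third_info, sub_heatmap):
    """Sketch of the SDFusion contour enhancement: the fused feature map
    (third image information) is multiplied element-wise by the activated
    boundary sub-heat map to obtain the fourth image information."""
    # activate the sub-heat map so each value acts as a [0, 1] gate
    gate = [[1.0 / (1.0 + math.exp(-v)) for v in row] for row in sub_heatmap]
    # element-wise multiplication emphasizes features near the boundary
    return [[f * g for f, g in zip(f_row, g_row)]
            for f_row, g_row in zip(third_info, gate)]
```

The multiplication suppresses responses far from predicted boundaries and amplifies those on them, which is how the module injects boundary awareness into the feature stream.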
In a possible implementation manner, the SAfeature module includes a fifth convolution module, a sixth convolution module, a seventh convolution module and an eighth convolution module arranged in parallel; the output information of the fifth convolution module and the output information of the sixth convolution module undergo a matrix outer-product operation to obtain two sets of feature maps; the two sets of feature maps are channel-concatenated to obtain new feature map information; the output information of the seventh convolution module and the new feature map information are added element-wise and then input into a ninth convolution module; the output information of the eighth convolution module and the output information of the ninth convolution module are added element-wise and then input into a third residual module; and the output end of the third residual module is connected with the input end of the Transformer-decoder module.

In a possible implementation manner, the boundary heat map estimation sub-network further includes a tenth convolution module, a fourth residual module and a first blur pooling module which are sequentially connected; the output end of the first blur pooling module is connected with the input end of the CoordConv layer; a fifth residual module, a sixth residual module and a first hourglass module are sequentially arranged between the CoordConv layer and the first SDFusion module; and a first hourglass module is arranged between adjacent SDFusion modules.
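The self-attention-style core of the SAfeature module is the outer product of two parallel convolution branches. A minimal sketch with flattened feature vectors follows; the branch names are illustrative, and flattening to 1-D vectors is an assumption about how the maps enter the outer product:

```python
def safeature_outer_product(branch5, branch6):
    """Sketch of the SAfeature matrix outer product: the outputs of the
    fifth and sixth convolution modules, flattened to vectors, produce an
    attention-like map whose entry (i, j) is branch5[i] * branch6[j]."""
    return [[qi * kj for kj in branch6] for qi in branch5]
```

This mirrors the query-key product in standard self-attention: every position in one branch is scored against every position in the other, letting the module relate distant facial regions before the Transformer-decoder regresses the keypoint coordinates.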