CN-116152896-B - Face key point prediction method, APP, terminal equipment and storage medium

Abstract

A coarse key point coordinate set of a face region in an image frame is acquired, and a target facial feature region within the face region is located based on that set to obtain an image to be predicted containing the target facial feature region; the coarse key point coordinate set represents face structure information. The coarse key point coordinate set and the image to be predicted are input into a first lightweight neural network, which performs fusion processing to obtain a fine key point coordinate set comprising coordinates of a plurality of target facial feature regions. The coarse key point coordinates and the fine key point coordinates are then combined to obtain a key point prediction result. In this way, the key point prediction accuracy for the target facial feature regions of the face can be improved while staying within the computing performance of the mobile terminal.

Inventors

  • QIAN LIHUI
  • LI ZHIWEN
  • HAN XINTONG

Assignees

  • 佛山虎牙虎信科技有限公司

Dates

Publication Date
20260512
Application Date
20230217

Claims (9)

  1. A face key point prediction method, characterized by comprising the following steps: acquiring a coarse key point coordinate set of a face region in an image frame, and locating a target facial feature region in the face region based on the coarse key point coordinate set to obtain an image to be predicted containing the target facial feature region, wherein the coarse key point coordinate set represents face structure information, and wherein the step of obtaining the image to be predicted containing the target facial feature region at least comprises: obtaining, based on the coarse key point coordinate set, an image to be detected containing the target facial feature region, wherein the target facial feature comprises the eyes, the image to be detected is obtained according to a first expansion ratio in an eyebrow-jitter scene and according to a second expansion ratio in a non-eyebrow-jitter scene, the first expansion ratio is larger than the second expansion ratio, eyebrow jitter denotes jitter of the eyebrows relative to the eyes within the eye region, and the expansion ratio denotes the ratio of the image to be detected to the eye region image; inputting the coarse key point coordinate set and the image to be predicted into a first lightweight neural network, and performing fusion processing with the first lightweight neural network to obtain a fine key point coordinate set, wherein the fine key point coordinate set comprises coordinates of a plurality of target facial feature regions; and combining the coarse key point coordinates and the fine key point coordinates to obtain a key point prediction result, wherein the key point prediction result is used for placing virtual makeup on the target facial feature region.
  2. The method of claim 1, wherein before acquiring the coarse key point coordinate set of the face region in the image frame, the method further comprises: correcting the orientation of the face region in the image frame to be vertical; and the step of obtaining the key point prediction result comprises: restoring the orientation of the combined face region to the orientation of the face region in the image frame to obtain the key point prediction result.
  3. The method according to claim 2, wherein the step of obtaining the image to be predicted containing the target facial feature region comprises: correcting the orientation of the target facial feature region to be vertical based on the coarse key point coordinate set to obtain an image to be detected containing the vertical target facial feature region, wherein the fine key point coordinates are coordinates in the vertical target facial feature region.
  4. The method of claim 3, wherein the target facial feature comprises the eyes, and the step of obtaining the image to be detected containing the vertical target facial feature region comprises: mirror-flipping the eye region of one eye to obtain the image to be detected.
  5. The method of claim 1, wherein the first lightweight neural network is generated by training on input sample images to which an occlusion color patch is added with a preset probability, the occlusion color patch being a color patch that occludes the target facial feature region.
  6. The method of claim 5, wherein the first lightweight neural network is trained based on a distance loss function that characterizes the distance between the key points on the sample image and the predicted key points.
  7. The method according to claim 1, wherein the method further comprises: determining face pixels and occlusion pixels in the image frame by using a segmentation algorithm to obtain face segmentation information, wherein the face segmentation information can be used, in combination with the key point prediction result, to place virtual makeup on the target facial feature.
  8. The method of claim 1, wherein the key point prediction result is processed by a sliding window smoothing algorithm, a smoothing parameter of the sliding window smoothing algorithm is greater than or equal to a preset value, and the image to be predicted is smoothed by a Gaussian filter operator.
  9. An APP deployed with a first lightweight neural network as claimed in any one of claims 1 to 8 and with a second lightweight neural network; the second lightweight neural network takes an image frame as input data and outputs a coarse key point coordinate set of a face region in the image frame; the APP is used for acquiring the coarse key point coordinate set output by the second lightweight neural network, locating a target facial feature region in the face region based on the coarse key point coordinate set to obtain an image to be predicted containing the target facial feature region, the coarse key point coordinate set representing face structure information, and inputting the coarse key point coordinate set and the image to be predicted into the first lightweight neural network, wherein the step of obtaining the image to be predicted containing the target facial feature region at least comprises: obtaining, based on the coarse key point coordinate set, an image to be detected containing the target facial feature region, wherein the target facial feature comprises the eyes, and the image to be detected is obtained according to a first expansion ratio in an eyebrow-jitter scene; the first lightweight neural network is used for performing fusion processing to obtain a fine key point coordinate set, the fine key point coordinate set comprising coordinates of a plurality of target facial feature regions; the APP is further used for combining the coarse key point coordinates and the fine key point coordinates to obtain a key point prediction result, wherein the key point prediction result is used for placing virtual makeup on the target facial feature region.
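
Two of the claimed steps lend themselves to a short illustration: cropping the eye region with a larger expansion ratio when the eyebrows shake relative to the eyes (claim 1), and sliding-window smoothing of the predicted key points (claim 8). The Python sketch below is only one possible reading and is not taken from the patent. The function names, the ratio values 2.0 and 1.5, the interpretation of the expansion ratio as a side-length ratio about the box centre, and the use of a plain moving average with the window length as the smoothing parameter are all assumptions made for illustration; the claims only require the first ratio to exceed the second.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]  # (left, top, right, bottom)

# Hypothetical values; the claims only state that the first ratio is larger.
BROW_SHAKE_RATIO = 2.0  # "first expansion ratio"
DEFAULT_RATIO = 1.5     # "second expansion ratio"


def eye_crop_box(eye_points: Sequence[Point], brow_shake: bool) -> Box:
    """Expand the tight box around the coarse eye key points about its centre.

    Assumption: the expansion ratio is applied to the box width and height.
    """
    xs = [p[0] for p in eye_points]
    ys = [p[1] for p in eye_points]
    cx = (min(xs) + max(xs)) / 2
    cy = (min(ys) + max(ys)) / 2
    ratio = BROW_SHAKE_RATIO if brow_shake else DEFAULT_RATIO
    half_w = (max(xs) - min(xs)) / 2 * ratio
    half_h = (max(ys) - min(ys)) / 2 * ratio
    return (cx - half_w, cy - half_h, cx + half_w, cy + half_h)


def smooth_keypoints(history: Sequence[Sequence[Point]], window: int) -> List[Point]:
    """Average each key point over the most recent `window` frames.

    Assumption: a simple moving average stands in for the claimed
    sliding-window smoothing algorithm.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    recent = history[-window:]
    n = len(recent)
    return [
        (sum(frame[i][0] for frame in recent) / n,
         sum(frame[i][1] for frame in recent) / n)
        for i in range(len(recent[0]))
    ]
```

A larger crop in the shaking case keeps the eyebrows inside the image passed on for fine prediction, at the cost of a lower effective resolution for the eye itself.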

Description

Face key point prediction method, APP, terminal equipment and storage medium

Technical Field

The application relates to the field of face recognition, and in particular to a face key point prediction method, an APP, terminal equipment and a computer-readable storage medium.

Background

Face key point prediction refers to predicting the position points of a specific part of a face; the position or size of that part is determined from those points, so face key point prediction can be applied in a variety of scenes. Taking video makeup placement as an example, it refers to applying various makeup placements to a specific part of a face appearing in a video, such as placing a specific virtual makeup or virtual decoration on that part, so that the face shows the effect the user wants in the video.

Currently, the demand for makeup placement on a specific part of a face exists mainly on mobile terminals, which implement video makeup placement through key point prediction for that part. However, because the computing performance of a mobile terminal is limited, it is difficult to improve the key point prediction accuracy for the specific part of the face while staying within that computing performance.
Disclosure of Invention

To address the above technical problem, the application provides a face key point prediction method, an APP, terminal equipment and a computer-readable storage medium. The technical scheme is as follows.

According to a first aspect of the present application, there is provided a face key point prediction method, the method comprising:

acquiring a coarse key point coordinate set of a face region in an image frame, and locating a target facial feature region in the face region based on the coarse key point coordinate set to obtain an image to be predicted containing the target facial feature region, wherein the coarse key point coordinate set represents face structure information;

inputting the coarse key point coordinate set and the image to be predicted into a first lightweight neural network, and performing fusion processing with the first lightweight neural network to obtain a fine key point coordinate set, wherein the fine key point coordinate set comprises coordinates of a plurality of target facial feature regions;

and combining the coarse key point coordinates and the fine key point coordinates to obtain a key point prediction result.
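
The three steps of the first aspect can be sketched as a small pipeline. The Python code below is a minimal illustration under stated assumptions, not the patented implementation: `predict_keypoints`, `crop_region` and `fine_network` are hypothetical names, the networks are passed in as plain callables, and "combining" is assumed to mean that the fine key points replace the coarse ones for each target region while the remaining coarse points are kept. The text does not specify the combination rule at this point.

```python
from typing import Callable, Dict, List, Tuple

Point = Tuple[float, float]
KeypointSet = Dict[str, List[Point]]  # region name -> key points


def predict_keypoints(
    coarse: KeypointSet,
    target_regions: List[str],
    crop_region: Callable[[List[Point]], object],
    fine_network: Callable[[KeypointSet, object], List[Point]],
) -> KeypointSet:
    """Refine the target regions and merge the result with the coarse set."""
    # Start from a copy of the coarse set so non-target regions are preserved.
    result = {name: list(points) for name, points in coarse.items()}
    for name in target_regions:
        # Step 1: locate the target region from its coarse key points.
        image_to_predict = crop_region(coarse[name])
        # Step 2: the network receives both the full coarse set (face
        # structure information) and the cropped image.
        fine = fine_network(coarse, image_to_predict)
        # Step 3 (assumed rule): fine points replace the coarse ones.
        result[name] = fine
    return result
```

Passing the full coarse set to `fine_network`, rather than only the crop, mirrors the stated idea that the coarse coordinates carry face structure information into the fine prediction.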
According to a second aspect of the present application, there is provided an APP deployed with the first lightweight neural network described in the first aspect and with a second lightweight neural network. The APP is used for acquiring a coarse key point coordinate set output by the second lightweight neural network, and locating a target facial feature region in the face region based on the coarse key point coordinate set to obtain an image to be predicted containing the target facial feature region, wherein the coarse key point coordinate set represents face structure information. The first lightweight neural network is used for performing fusion processing to obtain a fine key point coordinate set, the fine key point coordinate set comprising coordinates of a plurality of target facial feature regions. The APP is further used for combining the coarse key point coordinates and the fine key point coordinates to obtain a key point prediction result.
With this technical scheme, fine key point prediction for the target facial feature region can be performed on the mobile terminal by a lightweight neural network, which greatly reduces the computational load on the mobile terminal. At the same time, to ensure prediction accuracy, the coarse key point coordinate set of the face region and the image to be predicted containing the target facial feature region are combined within the lightweight neural network, which performs fusion processing of the two to predict the fine key points of the target facial feature region. The coarse key point coordinate set provides structured information about the face and thereby assists the network's prediction; because the network makes full use of coarse key point coordinates that have already been obtained, wasted computation is avoided and the additional computation is reduced. The computational load on the mobile terminal is thus further reduced while the calculation accuracy is improved, and the key point prediction accuracy of the target facial feature region in the face can be improved while the computing performance of the mobile terminal is ada