EP-3980974-B1 - SINGLE IMAGE-BASED REAL-TIME BODY ANIMATION
Inventors
- NEMCHINOV, EGOR
- GORBATYUK, SERGEI
- MASHRABOV, ALEKSANDR
- SPIRIN, EGOR
- SOKOLOV, IAROSLAV
- SMIRDIN, ANDREI
- TUKH, IGOR
Dates
- Publication Date
- 2026-05-06
- Application Date
- 2020-05-20
Claims (11)
- A computer-implemented method for single image-based body animation, the method comprising: receiving (1505) an input image (110), the input image (110) including a body of a person; segmenting (1510) the input image (110) into a body portion (115) and a background portion (120), wherein the body portion (115) includes pixels of the input image (110), the pixels corresponding to the body of the person; fitting (1515) a model to the body portion (115) by: receiving, by the model, a set of pose parameters (130) representing a pose of the body; and generating, based on the set of pose parameters (130) and the model, an output image, the output image including an image of the body adopting the pose; receiving (1520) a series of further sets of pose parameters, each of the further sets of pose parameters representing at least one of further poses of the body; providing (1525) each of the series of further sets of pose parameters to the model to generate a series of output images of the body adopting the further poses; and generating (1530), based on the series of output images, an output video (140), wherein each frame (150) of the output video (140) includes at least one of the output images; wherein the model includes: a set of joint points in three-dimensional, 3D, space, the joint points indicating a location of joints in the body; a mesh including mesh points in the 3D space, each of the mesh points being assigned a set of skinning weights, each of the skinning weights associated with at least one of the joint points; and a texture map to generate a texture on the mesh; and wherein the texture map is generated by: unwrapping the mesh to generate a two-dimensional, 2D, representation of the mesh; and for each face of the 2D representation of the mesh: determining whether the face corresponds to a part of the body visible in the input image (110); based on a determination that the face corresponds to the part of the body visible in the input image (110), assigning a segment of the body portion (115) to the face of the 2D representation of the mesh; and based on a determination that the face does not correspond to the part of the body visible in the input image (110): generating a predicted face based on the body portion (115); and assigning the predicted face to the face of the 2D representation of the mesh.
- The method of claim 1, wherein the segmenting (1510) the input image is performed by a neural network.
- The method of claim 1 or 2, wherein: the set of pose parameters (130) includes rotational angles of the joint points with respect to a reference point; and generating the output image includes: transforming the mesh by transforming the mesh points, wherein each of the mesh points is rotated by angles, the angles being determined based on the rotational angles of the joint points and the skinning weights; and applying the texture map to the transformed mesh to generate a texture of the transformed mesh.
- The method of claim 1 or 2, wherein the fitting (1515) the model includes: determining, based on the body portion (115), a generic model (1020), the generic model (1020) including a set of key points (410) indicative of the joints in the body and a set of shape parameters indicative of a shape of the body; determining, based on the body portion (115), a first silhouette of the body image; determining, based on the generic model (1020), a second silhouette of the body image; determining a set of pairs of points (540, 550), wherein each of the pairs of points includes a first point (540) located on the first silhouette and a second point (550) located on the second silhouette; warping, based on the set of the pairs of points (540, 550), the generic model (1020) to obtain a warped model; and determining, based on the warped model, the mesh and the set of joint points.
- The method of claim 4, wherein the set of joint points is generated based on the mesh.
- The method of claim 4, wherein the set of joint points is the set of key points (410).
- The method of claim 4, wherein the set of key points (410) is determined by a first neural network and the generic model (1020) is determined by a second neural network.
- The method of any of the preceding claims, wherein the method is performed by a computing device (200).
- A computing device comprising at least one processor (1610) configured to perform the method of any of claims 1-7.
- A non-transitory processor-readable medium having instructions stored thereon, which when executed by one or more processors (1610), cause the one or more processors (1610) to implement a method for single image-based body animation according to any of claims 1-7.
- A computer program comprising instructions which, when the program is executed by a computer (1600), cause the computer (1600) to carry out the method of any of claims 1-7.
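The skinning transform recited in claim 3 — each mesh point rotated by angles determined from the joint rotational angles and that point's skinning weights — can be sketched as standard linear blend skinning. The sketch below is illustrative only, not the claimed implementation: the helper names are hypothetical, the joints are treated as a flat set (no kinematic hierarchy), and pose parameters are assumed to be axis-angle rotations per joint.

```python
import numpy as np

def rodrigues(axis_angle):
    """Axis-angle vector (3,) -> 3x3 rotation matrix via Rodrigues' formula."""
    theta = np.linalg.norm(axis_angle)
    if theta < 1e-8:
        return np.eye(3)
    kx, ky, kz = axis_angle / theta
    K = np.array([[0.0, -kz, ky],
                  [kz, 0.0, -kx],
                  [-ky, kx, 0.0]])
    return np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * (K @ K)

def skin_mesh(mesh_points, joint_points, skinning_weights, pose_params):
    """Linear blend skinning: rotate each mesh point about each joint and
    blend the rotated positions with that point's skinning weights.

    mesh_points:      (V, 3) mesh vertices in 3D space
    joint_points:     (J, 3) joint locations in 3D space
    skinning_weights: (V, J) weights, each row summing to 1
    pose_params:      (J, 3) axis-angle rotation per joint
    """
    out = np.zeros_like(mesh_points)
    for j in range(joint_points.shape[0]):
        R = rodrigues(pose_params[j])
        # Rotate every vertex about joint j, then weight the contribution.
        rotated = (mesh_points - joint_points[j]) @ R.T + joint_points[j]
        out += skinning_weights[:, j:j + 1] * rotated
    return out
```

With an all-zero pose the mesh is returned unchanged (the weights sum to 1), which is a convenient sanity check when fitting the model to the rest pose of the body portion.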
Description
TECHNICAL FIELD
This disclosure generally relates to digital image processing. More particularly, this disclosure relates to methods and systems for single image-based real-time body animation.

BACKGROUND
Body animation can be used in many applications, such as advertisements, entertainment shows, social media networks, computer games, videos, video conversations, virtual reality, augmented reality, and the like. An animation of a body of a person based on a single photograph can be especially useful in various applications. For example, the person in the photograph can "come alive" by performing movements similar to a real video, for example, dancing, performing acrobatics, or fighting. Animation of the body of a person based on a single photograph entails creating a realistic model of the body of a particular person and having the model perform actions or interactions within scenes.

International Patent Application Publication No. WO 2011/045768 A2 discloses an image animation method. The method includes fitting a fitting model to at least an object in the image and animating the object in accordance with a corresponding animation model. The fitting model is at least as rigid as the animation model, and the animation model is no more rigid than the fitting model.

U.S. Patent Application Publication No. US 2019/0116322 A1 relates generally to systems and methods for analyzing and manipulating images and video. In particular, a multi-view interactive digital media representation (MVIDMR) of a person can be generated from live images of a person captured from a hand-held camera. Using the image data from the live images, a skeleton of the person and a boundary between the person and a background can be determined from different viewing angles and across multiple images. Using the skeleton and the boundary data, effects can be added to the person, such as wings.
The effects can change from image to image to account for the different viewing angles of the person captured in each image.

Further, International Patent Application Publication No. WO 2017/029488 A2 describes a method of generating an image file of a personalized 3D head model of a user. The method comprises the steps of: (i) acquiring at least one 2D image of the user's face; (ii) performing automated face 2D landmark recognition based on the at least one 2D image of the user's face; (iii) providing a 3D face geometry reconstruction using a shape prior; (iv) providing texture map generation and interpolation with respect to the 3D face geometry reconstruction to generate a personalized 3D head model of the user; and (v) generating an image file of the personalized 3D head model of the user. A related system and computer program product are also described.

SUMMARY
This section is provided to introduce a selection of concepts in a simplified form that are further described below in the Detailed Description section. This summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used as an aid in determining the scope of the claimed subject matter. The invention is defined in the independent claims. Particular embodiments are set out in the dependent claims.

According to one embodiment of the disclosure, a method for single image-based real-time body animation is provided. The method may include receiving, by a computing device, an input image. The input image may include a body of a person. The method may further include segmenting, by the computing device, the input image into a body portion and a background portion. The body portion may include pixels of the input image corresponding to the body of the person. The method may also include fitting, by the computing device, a model to the body portion.
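The segmentation step described above can be sketched as follows. The disclosure states that segmentation can be performed by a neural network; that network is not shown here. The sketch assumes a binary mask is already available (any off-the-shelf person-segmentation model could stand in) and merely splits the input image into the body portion and the background portion given that mask. The function name is hypothetical.

```python
import numpy as np

def split_body_background(image, mask):
    """Split an input image into a body portion and a background portion.

    image: (H, W, 3) input image
    mask:  (H, W) binary segmentation mask, 1 where a pixel belongs to
           the body of the person (in the disclosure this mask would come
           from a segmentation neural network; here it is assumed given).
    """
    mask3 = mask[..., None].astype(image.dtype)
    body_portion = image * mask3                 # body pixels kept, background zeroed
    background_portion = image * (1.0 - mask3)   # background pixels kept, body zeroed
    return body_portion, background_portion
```

By construction the two portions are complementary: adding them pixel-wise reconstructs the input image, so the body portion can be animated and later composited back over the background.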
The model can be configured to receive a set of pose parameters representing a pose of the body and generate, based on the set of pose parameters, an output image. The output image may include an image of the body adopting the pose. The method may also include receiving, by the computing device, a series of further sets of pose parameters. Each of the further sets of pose parameters may represent at least one of further poses of the body. The method may include providing, by the computing device, each further set of the series of further sets of pose parameters to the model to generate a series of output images of the body adopting the further poses. The method may also include generating, by the computing device and based on the series of output images, an output video, wherein each frame of the output video includes at least one of the output images. The segmenting of the input image can be performed by a neural network. The series of further sets of pose parameters can be generated based on a motion video. The motion video can feature a further person adopting the further poses. The model may include a set of joint points in a three-dimensional (3D) space. The joint poin