US-20260127800-A1 - NEURAL MOTION RIG FOR INTERACTIVE MOTION AUTHORING

US 20260127800 A1

Abstract

One embodiment of the present invention sets forth a technique for generating a motion for a virtual character. The technique includes determining a graph representation of a plurality of sets of joints corresponding to a sequence of poses for the virtual character based on (i) one or more input poses for the virtual character and (ii) a set of constraints associated with one or more joints included in the plurality of sets of joints. The technique also includes generating, via execution of a neural network, a set of updated node states for the plurality of sets of joints based on the graph representation. The technique further includes generating, based on the updated node states, the motion that includes (i) a first set of joint positions for the plurality of sets of joints and (ii) a first set of joint orientations for the plurality of sets of joints.

Inventors

  • Martin Guay
  • Dhruv Agrawal
  • Robert Walker Sumner
  • Jakob Joachim BUHMANN
  • Dominik Tobias BORER

Assignees

  • DISNEY ENTERPRISES, INC.
  • ETH Zürich (Eidgenössische Technische Hochschule Zürich)

Dates

Publication Date
2026-05-07
Application Date
2024-11-04

Claims (20)

  1. A computer-implemented method for generating a motion for a virtual character, comprising: determining a graph representation of a plurality of sets of joints corresponding to a sequence of poses for the virtual character based on (i) one or more input poses for the virtual character and (ii) a set of constraints associated with one or more joints included in the plurality of sets of joints; generating, via execution of a first neural network, a set of updated node states for the plurality of sets of joints based on the graph representation; and generating, based on the set of updated node states, the motion that includes (i) a first set of joint positions for the plurality of sets of joints and (ii) a first set of joint orientations for the plurality of sets of joints.
  2. The computer-implemented method of claim 1, further comprising training the first neural network using (i) a first loss that is computed between a subset of the first set of joint positions and a second set of joint positions included in the one or more input poses and (ii) a second loss that is computed between a subset of the first set of joint orientations and a second set of joint orientations included in the one or more input poses.
  3. The computer-implemented method of claim 2, further comprising training the first neural network based on one or more additional losses associated with the set of constraints.
  4. The computer-implemented method of claim 2, wherein the first loss is further computed based on a first set of control parameters associated with preservation of the second set of joint positions in the motion and the second loss is computed based on a second set of control parameters associated with preservation of the second set of joint orientations in the motion.
  5. The computer-implemented method of claim 1, wherein determining the graph representation comprises: generating, via execution of a second neural network, a first set of embeddings associated with (i) a set of identities for the plurality of sets of joints and (ii) a temporal position of each set of joints included in the plurality of sets of joints within the sequence of poses; determining, based on the one or more input poses and the set of constraints, (i) a second set of joint positions for the plurality of sets of joints and (ii) a second set of joint orientations for the plurality of sets of joints; and converting, via execution of a third neural network, the second set of joint positions and the second set of joint orientations into a second set of embeddings for the plurality of sets of joints.
  6. The computer-implemented method of claim 5, wherein the second set of joint positions and the second set of joint orientations are further determined based on an interpolation associated with the one or more input poses and the set of constraints.
  7. The computer-implemented method of claim 1, wherein converting the graph representation into the set of updated node states comprises generating the set of updated node states based on a hierarchy of resolutions associated with the graph representation and a set of message-passing iterations.
  8. The computer-implemented method of claim 1, wherein generating the motion comprises: converting, via execution of one or more additional neural networks, the set of updated node states into the first set of joint positions and the first set of joint orientations; and updating the first set of joint positions and the first set of joint orientations based on a rest pose for the virtual character.
  9. The computer-implemented method of claim 1, wherein the set of constraints comprises at least one of a position constraint, an orientation constraint, or a ground contact constraint.
  10. The computer-implemented method of claim 1, wherein the first neural network comprises a set of cross-layer attention blocks associated with a plurality of resolutions for a skeletal structure of the virtual character.
  11. One or more non-transitory computer-readable media storing instructions that, when executed by one or more processors, cause the one or more processors to perform operations comprising: determining a graph representation of a plurality of sets of joints corresponding to a sequence of poses for a virtual character based on (i) one or more input poses for the virtual character and (ii) a set of constraints associated with one or more joints included in the plurality of sets of joints; generating, via execution of a first neural network, a set of updated node states for the plurality of sets of joints based on the graph representation; and generating, based on the set of updated node states, a motion that includes (i) a first set of joint positions for the plurality of sets of joints and (ii) a first set of joint orientations for the plurality of sets of joints.
  12. The one or more non-transitory computer-readable media of claim 11, wherein the operations further comprise training the first neural network using (i) a first loss that is computed between a first subset of the first set of joint positions and a second set of joint positions included in a ground truth sequence of poses for the virtual character and (ii) a second loss that is computed between a first subset of the first set of joint orientations and a second set of joint orientations included in the ground truth sequence of poses.
  13. The one or more non-transitory computer-readable media of claim 12, wherein the operations further comprise further training the first neural network based on one or more additional losses associated with the one or more input poses and the set of constraints.
  14. The one or more non-transitory computer-readable media of claim 13, wherein the operations further comprise sampling the set of constraints from the ground truth sequence of poses prior to computing the one or more additional losses.
  15. The one or more non-transitory computer-readable media of claim 13, wherein the one or more additional losses comprise (i) a third loss that is computed between a second subset of the first set of joint positions and a third set of joint positions included in the one or more input poses and the set of constraints and (ii) a fourth loss that is computed between a second subset of the first set of joint orientations and a third set of joint orientations included in the one or more input poses and the set of constraints.
  16. The one or more non-transitory computer-readable media of claim 11, wherein converting the graph representation into the set of updated node states comprises: computing a set of attention scores based on the graph representation; and generating the set of updated node states based on the set of attention scores.
  17. The one or more non-transitory computer-readable media of claim 16, wherein the set of attention scores is further computed based on a set of masks associated with the one or more input poses or the set of constraints.
  18. The one or more non-transitory computer-readable media of claim 11, wherein the graph representation comprises a plurality of nodes corresponding to the plurality of sets of joints, a plurality of spatial edges between a first subset of node pairs included in the plurality of nodes, and a plurality of temporal edges between a second subset of node pairs included in the plurality of nodes.
  19. The one or more non-transitory computer-readable media of claim 11, wherein the operations further comprise: outputting a set of motion curves corresponding to at least a portion of the motion within a user interface; determining an update to the set of constraints based on user input associated with the set of motion curves; and generating an updated motion for the virtual character based on the update to the set of constraints, wherein the updated motion includes (i) a second set of joint positions associated with the update to the set of constraints and (ii) a second set of joint orientations associated with the update to the set of constraints.
  20. A system, comprising: one or more memories that store instructions, and one or more processors that are coupled to the one or more memories and, when executing the instructions, are configured to perform operations comprising: determining a graph representation of a plurality of sets of joints corresponding to a sequence of poses for a virtual character based on (i) a starting input pose for the virtual character and (ii) an ending input pose for the virtual character; generating, via execution of a first neural network, a set of updated node states for the plurality of sets of joints based on the graph representation; and generating, based on the set of updated node states, a motion that includes (i) a first set of joint positions for the plurality of sets of joints and (ii) a first set of joint orientations for the plurality of sets of joints.
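The spatio-temporal graph recited in claim 18 — nodes corresponding to sets of joints, spatial edges between joints connected in the skeleton within a frame, and temporal edges between the same joint in consecutive frames — can be illustrated with a minimal sketch. This is a hypothetical reconstruction for illustration only, not the patented implementation; the `parents` array encoding of the skeletal hierarchy and the node-indexing scheme are assumptions.

```python
def build_spatiotemporal_edges(num_frames, parents):
    """Build edge lists for a spatio-temporal joint graph.

    parents: list where parents[j] is the parent joint of joint j
             (-1 for the root). Nodes are (frame, joint) pairs
             flattened to index frame * num_joints + joint.
    Returns (spatial_edges, temporal_edges) as lists of (src, dst) pairs.
    """
    num_joints = len(parents)
    node = lambda f, j: f * num_joints + j
    spatial, temporal = [], []
    for f in range(num_frames):
        for j, p in enumerate(parents):
            if p >= 0:
                # Undirected spatial edge between child and parent joint.
                spatial.append((node(f, j), node(f, p)))
                spatial.append((node(f, p), node(f, j)))
            if f + 1 < num_frames:
                # Undirected temporal edge between frames f and f + 1.
                temporal.append((node(f, j), node(f + 1, j)))
                temporal.append((node(f + 1, j), node(f, j)))
    return spatial, temporal

# Toy 3-joint chain (root -> spine -> head) over 4 frames.
spatial, temporal = build_spatiotemporal_edges(4, [-1, 0, 1])
```

In this toy example, each of the 4 frames contributes 4 directed spatial edges (two child-parent pairs, both directions), and each of the 3 frame transitions contributes 6 directed temporal edges (3 joints, both directions).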

Description

BACKGROUND

Field of the Various Embodiments

Embodiments of the present disclosure relate generally to computer vision and machine learning and, more specifically, to a neural motion rig for interactive motion authoring.

Description of the Related Art

Films, video games, virtual reality (VR) systems, augmented reality (AR) systems, mixed reality (MR) systems, robotics, and/or other types of interactive environments frequently include entities (e.g., characters, robots, etc.) that are posed and/or animated in three-dimensional (3D) space. Traditionally, an entity is posed via a time-consuming, iterative, and laborious process of manually manipulating multiple control handles corresponding to joints (or other parts) of the entity. An inverse kinematics (IK) technique can also be used to compute the positions and orientations of the remaining joints (or parts) of the entity that result in the desired configuration of the manipulated joints (or parts). To animate the entity, this manual process is repeated for additional keyframes within a sequence of poses representing movements of the entity, with poses for frames between keyframes generated by interpolating between the keyframes using parametric curves.

More recently, advancements in machine learning and deep learning have led to the development of neural motion completion models, which include deep neural networks that leverage full-body correlations learned from large datasets to predict frames that fall between keyframes within an animation. However, conventional neural motion completion models are associated with a number of limitations that interfere with their use in animation workflows. More specifically, conventional neural motion completion models operate using a dense context and/or set of constraints, such as a full-body pose, an upper and/or lower body pose, and/or a complete trajectory for a single joint.
Defining this dense context involves significant time and resource overhead that is analogous to traditional techniques for manually defining a pose via control handles. This dense context additionally prevents animators and/or other users from exploring, refining, and/or controlling the motion in a finer-grained manner. Further, conventional neural motion completion models cannot be used to perform motion editing, in which changes are made to select portions of an existing motion while preserving the remainder of the motion. Instead, these models may disregard existing motion while preserving constraints.

As the foregoing illustrates, what is needed in the art are more effective techniques for performing neural motion completion.

SUMMARY

One embodiment of the present invention sets forth a technique for generating a motion for a virtual character. The technique includes determining a graph representation of a plurality of sets of joints corresponding to a sequence of poses for the virtual character based on (i) one or more input poses for the virtual character and (ii) a set of constraints associated with one or more joints included in the plurality of sets of joints. The technique also includes generating, via execution of a first neural network, a set of updated node states for the plurality of sets of joints based on the graph representation. The technique further includes generating, based on the set of updated node states, the motion that includes (i) a first set of joint positions for the plurality of sets of joints and (ii) a first set of joint orientations for the plurality of sets of joints.

One technical advantage of the disclosed techniques relative to the prior art is the ability to generate complete motions from sparse poses and joint-level constraints.
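At a high level, the pipeline summarized above — encode sparse inputs into node states on the graph, update the states via a neural network, then decode per-joint positions and orientations — might look like the following sketch. The mean-aggregation update and the linear decoders are illustrative stand-ins for the learned message-passing and output networks; they are assumptions, not the disclosed architecture.

```python
import numpy as np

def update_node_states(states, edges, num_iterations=2):
    """Update each node's state by averaging over its graph neighborhood.

    states: (num_nodes, dim) array; edges: list of (src, dst) pairs.
    A learned message-passing network would replace this mean aggregation.
    """
    for _ in range(num_iterations):
        agg = states.copy()                 # each node includes its own state
        counts = np.ones(len(states))
        for src, dst in edges:
            agg[dst] += states[src]
            counts[dst] += 1
        states = agg / counts[:, None]      # mean over self + neighbors
    return states

def decode_motion(states, w_pos, w_rot):
    """Decode updated node states into 3D joint positions and unit
    quaternions. Linear maps stand in for the learned output networks."""
    positions = states @ w_pos                              # (num_nodes, 3)
    quats = states @ w_rot                                  # (num_nodes, 4)
    quats /= np.linalg.norm(quats, axis=1, keepdims=True)   # normalize
    return positions, quats

rng = np.random.default_rng(0)
states = rng.normal(size=(6, 8))                  # 6 nodes with dim-8 states
edges = [(0, 1), (1, 0), (1, 2), (2, 1)]          # tiny example graph
updated = update_node_states(states, edges)
pos, rot = decode_motion(updated, rng.normal(size=(8, 3)), rng.normal(size=(8, 4)))
```

In the disclosed technique, the update step would additionally respect the hierarchy of resolutions and attention mechanisms recited in the claims; this sketch only conveys the overall data flow from node states to joint positions and orientations.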
The disclosed techniques thus reduce time and resource overhead associated with manually defining dense poses and/or constraints in traditional animation workflows and/or as input into conventional neural completion models. The disclosed techniques additionally provide finer-grained control over the generated motions than conventional neural completion models that require dense context and/or constraints on poses within an animation. Another technical advantage of the disclosed techniques is the ability to make select changes to certain portions of a base motion while preserving remaining portions of the base motion. Consequently, the disclosed techniques can be used in motion editing workflows, unlike conventional approaches that disregard existing motion after constraints on the motion are specified. These technical advantages provide one or more technological improvements over prior art approaches.

BRIEF DESCRIPTION OF THE DRAWINGS

So that the manner in which the above recited features of the various embodiments can be understood in detail, a more particular description of the inventive concepts, briefly summarized above, may be had by reference to various embodiments, some of which are illustrated in the appended drawings. It is to be noted, howeve