US-20260127801-A1 - NEURAL MOTION RIG FOR INTERACTIVE MOTION EDITING

US 2026/0127801 A1

Abstract

One embodiment of the present invention sets forth a technique for generating a motion for a virtual character. The technique includes determining a graph representation of a plurality of sets of joints corresponding to a sequence of poses for the virtual character based on (i) a base motion associated with the sequence of poses and (ii) a set of constraints associated with one or more joints included in the plurality of sets of joints. The technique also includes generating, via execution of a first neural network, a set of updated node states for the plurality of sets of joints based on the graph representation. The technique further includes generating, based on the set of updated node states, the motion that includes (i) a first set of joint positions for the plurality of sets of joints and (ii) a first set of joint orientations for the plurality of sets of joints.
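The pipeline described in the abstract can be illustrated with a minimal sketch: initialize per-joint node states from a base motion, pin constrained joints, propagate states over the graph, and decode positions and orientations. This is a toy stand-in, not the patented method; the propagation here is simple temporal averaging rather than a learned neural network, and all names and shapes are hypothetical.

```python
import numpy as np

F, J = 8, 4          # frames, joints
D = 7                # per-joint state: xyz position + quaternion

rng = np.random.default_rng(0)
base_motion = rng.normal(size=(F, J, D))          # stand-in base motion

def build_graph(base, constraints):
    """Initialize node states from the base motion and pin any
    constrained joints to their target states."""
    nodes = base.copy()
    for (f, j), target in constraints.items():
        nodes[f, j] = target
    return nodes

def propagate(nodes, steps=2):
    """Toy message passing over temporal edges (same joint, adjacent
    frames): each state is averaged with its temporal neighbors.
    Note: np.roll wraps at the sequence ends, which a real model
    would handle with proper boundary conditions."""
    h = nodes.copy()
    for _ in range(steps):
        prev = np.roll(h, 1, axis=0)
        nxt = np.roll(h, -1, axis=0)
        h = (h + prev + nxt) / 3.0
    return h

def decode(h):
    """Split updated node states into positions and orientations,
    renormalizing the quaternion part."""
    pos, quat = h[..., :3], h[..., 3:]
    quat = quat / np.linalg.norm(quat, axis=-1, keepdims=True)
    return pos, quat

constraints = {(0, 0): np.ones(D), (F - 1, 2): -np.ones(D)}
nodes = build_graph(base_motion, constraints)
pos, quat = decode(propagate(nodes))
print(pos.shape, quat.shape)   # (8, 4, 3) (8, 4, 4)
```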

Inventors

  • Martin Guay
  • Dhruv Agrawal
  • Robert Walker Sumner
  • Jakob Joachim BUHMANN
  • Dominik Tobias BORER

Assignees

  • DISNEY ENTERPRISES, INC.
  • ETH Zürich (Eidgenössische Technische Hochschule Zürich)

Dates

Publication Date
2026-05-07
Application Date
2024-11-04

Claims (20)

  1. A computer-implemented method for generating a motion for a virtual character, the method comprising: determining a graph representation of a plurality of sets of joints corresponding to a sequence of poses for the virtual character based on (i) a base motion associated with the sequence of poses and (ii) a set of constraints associated with one or more joints included in the plurality of sets of joints; generating, via execution of a first neural network, a set of updated node states for the plurality of sets of joints based on the graph representation; and generating, based on the set of updated node states, the motion that includes (i) a first set of joint positions for the plurality of sets of joints and (ii) a first set of joint orientations for the plurality of sets of joints.
  2. The computer-implemented method of claim 1, further comprising training the first neural network using (i) a first loss that is computed between a subset of the first set of joint positions and a second set of joint positions included in the base motion and (ii) a second loss that is computed between a subset of the first set of joint orientations and a second set of joint orientations included in the base motion.
  3. The computer-implemented method of claim 2, further comprising training the first neural network based on one or more additional losses associated with the set of constraints.
  4. The computer-implemented method of claim 2, wherein at least one of the first loss or the second loss comprises a weight mask that is applied to a subset of the plurality of sets of joints based on a temporal proximity to the set of constraints.
  5. The computer-implemented method of claim 1, further comprising training the first neural network using a reconstruction loss that is computed between the motion and a ground truth motion associated with the sequence of poses.
  6. The computer-implemented method of claim 5, further comprising: sampling an additional set of constraints from the ground truth motion; and generating, via execution of a second neural network, the base motion based on the additional set of constraints.
  7. The computer-implemented method of claim 1, wherein determining the graph representation comprises: generating, via execution of a second neural network, a first set of embeddings associated with (i) a set of identities for the plurality of sets of joints and (ii) a temporal position of each set of joints included in the plurality of sets of joints within the sequence of poses; determining, based on the base motion and the set of constraints, (i) a second set of joint positions for the plurality of sets of joints and (ii) a second set of joint orientations for the plurality of sets of joints; and converting, via execution of a third neural network, the second set of joint positions and the second set of joint orientations into a second set of embeddings for the plurality of sets of joints.
  8. The computer-implemented method of claim 1, wherein determining the graph representation comprises initializing (i) a second set of joint positions for the plurality of sets of joints and (ii) a second set of joint orientations for the plurality of sets of joints using the base motion.
  9. The computer-implemented method of claim 1, wherein generating the motion comprises: converting, via execution of one or more additional neural networks, the set of updated node states into the first set of joint positions and the first set of joint orientations; and updating the first set of joint positions and the first set of joint orientations based on a rest pose for the virtual character.
  10. The computer-implemented method of claim 1, wherein the set of constraints comprises at least one of a position constraint, an orientation constraint, or a ground contact constraint.
  11. One or more non-transitory computer-readable media storing instructions that, when executed by one or more processors, cause the one or more processors to perform operations comprising: determining a graph representation of a plurality of sets of joints corresponding to a sequence of poses for a virtual character based on (i) a base motion associated with the sequence of poses and (ii) a set of constraints associated with one or more joints included in the plurality of sets of joints; generating, via execution of a first neural network, a set of updated node states for the plurality of sets of joints based on the graph representation; and generating, based on the set of updated node states, a motion that includes (i) a first set of joint positions for the plurality of sets of joints and (ii) a first set of joint orientations for the plurality of sets of joints.
  12. The one or more non-transitory computer-readable media of claim 11, wherein the operations further comprise training the first neural network using (i) a first loss associated with the base motion and (ii) a second loss associated with the set of constraints.
  13. The one or more non-transitory computer-readable media of claim 12, wherein the first loss comprises a weight mask that is applied to a subset of the plurality of sets of joints based on a temporal proximity to the set of constraints.
  14. The one or more non-transitory computer-readable media of claim 12, wherein the first loss is scaled by a control parameter associated with preservation of a second set of joint positions and a second set of joint orientations from the base motion in the motion.
  15. The one or more non-transitory computer-readable media of claim 11, wherein the operations further comprise training the first neural network using a reconstruction loss that is computed between the motion and a ground truth motion associated with the sequence of poses.
  16. The one or more non-transitory computer-readable media of claim 15, wherein the operations further comprise: sampling an additional set of constraints from the ground truth motion; and generating, via execution of a second neural network, the base motion based on the additional set of constraints.
  17. The one or more non-transitory computer-readable media of claim 11, wherein converting the graph representation into the set of updated node states comprises: computing a set of attention scores based on the graph representation; and generating the set of updated node states based on the set of attention scores.
  18. The one or more non-transitory computer-readable media of claim 11, wherein the operations further comprise: outputting a set of motion curves corresponding to at least a portion of the motion within a user interface; determining an update to the set of constraints based on user input associated with the set of motion curves; and generating an updated motion for the virtual character based on the update to the set of constraints, wherein the updated motion includes (i) a second set of joint positions associated with the update to the set of constraints and (ii) a second set of joint orientations associated with the update to the set of constraints.
  19. The one or more non-transitory computer-readable media of claim 11, wherein the graph representation comprises a plurality of nodes corresponding to the plurality of sets of joints, a plurality of spatial edges between a first subset of node pairs included in the plurality of nodes, and a plurality of temporal edges between a second subset of node pairs included in the plurality of nodes.
  20. A system, comprising: one or more memories that store instructions, and one or more processors that are coupled to the one or more memories and, when executing the instructions, are configured to perform the steps of: determining a graph representation of a plurality of sets of joints corresponding to a sequence of poses for a virtual character based on (i) a base motion associated with the sequence of poses and (ii) a set of constraints associated with one or more joints included in the plurality of sets of joints; generating, via execution of a first neural network, a set of updated node states for the plurality of sets of joints based on the graph representation; and generating, based on the set of updated node states, a motion that includes (i) a first set of joint positions for the plurality of sets of joints and (ii) a first set of joint orientations for the plurality of sets of joints.
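Claims 17 and 19 describe a graph with spatial edges (between joints within a frame) and temporal edges (same joint across frames), updated via attention scores. The sketch below is one plausible, simplified reading under those claims, not the disclosed implementation: node ids, the tiny skeleton, and the unparameterized dot-product attention are all illustrative assumptions.

```python
import numpy as np

def spatio_temporal_edges(num_frames, parents):
    """Edge list in the spirit of claim 19: spatial edges follow the
    skeleton's parent-child pairs within each frame; temporal edges
    connect the same joint across adjacent frames.
    Node id = frame * num_joints + joint."""
    J = len(parents)
    edges = []
    for f in range(num_frames):
        for j, p in enumerate(parents):
            if p >= 0:                       # spatial edge: joint <-> parent
                edges.append((f * J + j, f * J + p))
        if f + 1 < num_frames:               # temporal edges to next frame
            for j in range(J):
                edges.append((f * J + j, (f + 1) * J + j))
    return edges

def attention_update(h, edges):
    """One attention step in the spirit of claim 17: scores from scaled
    dot products over graph neighbors, softmax-normalized per node.
    A real model would use learned query/key/value projections."""
    n, d = h.shape
    out = h.copy()
    for dst in range(n):
        nbrs = [s for s, t in edges if t == dst] + \
               [t for s, t in edges if s == dst]
        if not nbrs:
            continue
        scores = np.array([h[dst] @ h[s] for s in nbrs]) / np.sqrt(d)
        w = np.exp(scores - scores.max())
        w /= w.sum()
        out[dst] = sum(wi * h[s] for wi, s in zip(w, nbrs))
    return out

parents = [-1, 0, 1, 1]                      # tiny 4-joint chain with a branch
edges = spatio_temporal_edges(num_frames=3, parents=parents)
h = np.random.default_rng(1).normal(size=(3 * 4, 8))
h2 = attention_update(h, edges)
print(len(edges), h2.shape)                  # 17 (12, 8)
```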

Description

BACKGROUND

Field of the Various Embodiments

Embodiments of the present disclosure relate generally to computer vision and machine learning and, more specifically, to a neural motion rig for interactive motion editing.

DESCRIPTION OF THE RELATED ART

Films, video games, virtual reality (VR) systems, augmented reality (AR) systems, mixed reality (MR) systems, robotics, and/or other types of interactive environments frequently include entities (e.g., characters, robots, etc.) that are posed and/or animated in three-dimensional (3D) space. Traditionally, an entity is posed via a time-consuming, iterative, and laborious process of manually manipulating multiple control handles corresponding to joints (or other parts) of the entity. An inverse kinematics (IK) technique can also be used to compute the positions and orientations of the remaining joints (or parts) of the entity that result in the desired configuration of the manipulated joints (or parts). To animate the entity, this manual process is repeated for additional keyframes within a sequence of poses representing movements of the entity, with poses for frames between keyframes generated by interpolating between the keyframes using parametric curves. More recently, advancements in machine learning and deep learning have led to the development of neural motion completion models, which include deep neural networks that leverage full-body correlations learned from large datasets to predict frames that fall between keyframes within an animation. However, conventional neural motion completion models are associated with a number of limitations that interfere with their use in animation workflows. More specifically, conventional neural motion completion models operate using a dense context and/or set of constraints, such as a full-body pose, an upper and/or lower body pose, and/or a complete trajectory for a single joint.
Defining this dense context involves significant time and resource overhead that is analogous to traditional techniques for manually defining a pose via control handles. This dense context additionally prevents animators and/or other users from exploring, refining, and/or controlling the motion in a finer-grained manner. Further, conventional neural motion completion models cannot be used to perform motion editing, in which changes are made to select portions of an existing motion while the remainder of the motion is preserved. Instead, these models may disregard the existing motion while preserving only the specified constraints. As the foregoing illustrates, what is needed in the art are more effective techniques for performing neural motion completion.

SUMMARY

One embodiment of the present invention sets forth a technique for generating a motion for a virtual character. The technique includes determining a graph representation of a plurality of sets of joints corresponding to a sequence of poses for the virtual character based on (i) a base motion associated with the sequence of poses and (ii) a set of constraints associated with one or more joints included in the plurality of sets of joints. The technique also includes generating, via execution of a first neural network, a set of updated node states for the plurality of sets of joints based on the graph representation. The technique further includes generating, based on the set of updated node states, the motion that includes (i) a first set of joint positions for the plurality of sets of joints and (ii) a first set of joint orientations for the plurality of sets of joints. One technical advantage of the disclosed techniques relative to the prior art is the ability to generate complete motions from sparse poses and joint-level constraints.
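The motion-editing behavior summarized above — deviating near the edited constraints while preserving the rest of the base motion — is what claims 4, 13, and 14 capture with a weight-masked, scaled loss. A minimal sketch of one such loss follows; the mask width, the frame-level (rather than joint-level) masking, and the `alpha` parameter name are all illustrative assumptions, not details from the disclosure.

```python
import numpy as np

def masked_base_loss(pred, base, constraint_frames, width=2, alpha=1.0):
    """Per-frame L2 loss against the base motion, zeroed near constrained
    frames so the network is free to deviate where edits are requested
    (cf. claims 4/13); `alpha` plays the role of the preservation control
    parameter of claim 14."""
    F = pred.shape[0]
    mask = np.ones(F)
    for cf in constraint_frames:
        lo, hi = max(0, cf - width), min(F, cf + width + 1)
        mask[lo:hi] = 0.0                    # ignore base motion near edits
    per_frame = ((pred - base) ** 2).mean(axis=(1, 2))
    return alpha * (mask * per_frame).sum() / max(mask.sum(), 1.0)

rng = np.random.default_rng(2)
base = rng.normal(size=(10, 4, 7))           # 10 frames, 4 joints, 7-dim state
pred = base + 0.1                            # uniform deviation from base
loss = masked_base_loss(pred, base, constraint_frames=[5])
print(round(float(loss), 4))                 # 0.01
```

With a uniform 0.1 deviation the unmasked frames each contribute 0.01, so the averaged loss is 0.01 regardless of how many frames the mask removes.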
The disclosed techniques thus reduce the time and resource overhead associated with manually defining dense poses and/or constraints in traditional animation workflows and/or as input into conventional neural motion completion models. The disclosed techniques additionally provide finer-grained control over the generated motions than conventional neural motion completion models that require dense context and/or constraints on poses within an animation. Another technical advantage of the disclosed techniques is the ability to make select changes to certain portions of a base motion while preserving the remaining portions of the base motion. Consequently, the disclosed techniques can be used in motion editing workflows, unlike conventional approaches that disregard existing motion after constraints on the motion are specified. These technical advantages provide one or more technological improvements over prior art approaches.

BRIEF DESCRIPTION OF THE DRAWINGS

So that the manner in which the above recited features of the various embodiments can be understood in detail, a more particular description of the inventive concepts, briefly summarized above, may be had by reference to various embodiments, some of which are illustrated in the appended drawings. It is to be noted, however, that the a