
US-12620157-B2 - Avatar dance animation system

US12620157B2

Abstract

A method of generating a real-time avatar animation starts with a processor receiving acoustic segments of a real-time acoustic signal. For each acoustic segment, the processor uses a music analyzer neural network to generate a tempo value and a dance energy category, and selects dance tracks based on that tempo value and dance energy category. Using the dance tracks, the processor generates dance sequences for avatars, generates real-time animations of the avatars based on the dance sequences and avatar characteristics, and causes the real-time animations of the avatars to be displayed on a first client device. Other embodiments are described herein.
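The per-segment selection step summarized in the abstract can be sketched in code. This is an illustrative toy only: the patent publishes no code, and the `select_tracks` function, the track library, and the BPM tolerance are invented stand-ins for the claimed behavior of matching dance tracks to a segment's tempo value and dance energy category.

```python
# Hypothetical sketch of dance-track selection by tempo and energy category.
# Thresholds and track metadata are invented for illustration.

ENERGY_CATEGORIES = ("idle", "slow", "lively", "vigorous")

def select_tracks(library, tempo_bpm, energy, bpm_tolerance=10):
    """Return names of tracks whose tempo is within `bpm_tolerance` of the
    segment's tempo and whose energy label matches the predicted category."""
    return [
        name for name, (track_bpm, track_energy) in library.items()
        if abs(track_bpm - tempo_bpm) <= bpm_tolerance and track_energy == energy
    ]

# Toy track library: name -> (tempo in BPM, dance energy category).
library = {
    "waltz_a": (90, "slow"),
    "disco_b": (120, "lively"),
    "house_c": (126, "lively"),
    "metal_d": (160, "vigorous"),
}
```

For a segment analyzed as 122 BPM and "lively", this returns `["disco_b", "house_c"]`; both matching tracks can then feed the downstream dance synthesizer.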

Inventors

  • Gurunandan Krishnan Gorumkonda
  • Shree K. Nayar

Assignees

  • SNAP INC.

Dates

Publication Date
2026-05-05
Application Date
2023-11-30

Claims (20)

  1. A method comprising: receiving, by a processor, a real-time acoustic signal comprising a plurality of acoustic segments; generating using a music analyzer neural network a tempo value and a dance energy category for each of the acoustic segments; selecting a plurality of dance tracks based on the tempo value and the dance energy category, wherein each of the plurality of dance tracks comprises a plurality of dance segments including a plurality of movements that are sequential; generating using the dance tracks a first dance sequence for a first avatar and a second dance sequence for a second avatar, wherein generating the first dance sequence and the second dance sequence comprises: generating a first dance segment by selecting a first starting movement of the plurality of movements in a first track of the plurality of tracks, and selecting a first transition movement in the plurality of movements in the first track, wherein the first dance segment starts at the first starting movement and ends at the first transition movement, and generating a second dance segment by identifying a second starting movement in a second dance track of the plurality of tracks that is similar to the first transition movement within a predetermined threshold, wherein the second dance segment starts at the second starting movement, wherein the first dance sequence and the second dance sequence include the first dance segment and the second dance segment; generating a first real-time animation of the first avatar and a second real-time animation of the second avatar based on the first dance sequence and the second dance sequence and a plurality of avatar characteristics associated with the first avatar and the second avatar; and causing to be displayed on a first client device the first real-time animation of the first avatar and the second real-time animation of the second avatar.
  2. The method of claim 1, wherein generating the first dance sequence and the second dance sequence further comprises: selecting a second transition movement in the plurality of movements in the second track, wherein the second dance segment ends at the second transition movement; and generating a third dance segment by identifying a third starting movement in a third dance track of the plurality of tracks that is similar to the second transition movement within the predetermined threshold, wherein the third dance segment starts at the third starting movement, wherein the first dance sequence and the second dance sequence include the third dance segment.
  3. The method of claim 1, wherein the tempo value comprises a value indicating beats per minute, and wherein the dance energy category is one of: idle, slow, lively, or vigorous.
  4. The method of claim 3, further comprising: training the music analyzer neural network, wherein training the music analyzer neural network comprises: receiving a plurality of test acoustic signals including a plurality of test acoustic segments, determining a plurality of test tempo values associated with the test acoustic segments, and associating each of the test acoustic segments with one of a plurality of test dance energy categories, wherein the test dance energy categories comprise idle, slow, lively, and vigorous.
  5. The method of claim 4, wherein associating each of the test acoustic segments with one of the test dance energy categories is based on music features of the test acoustic segments, wherein the music features comprise frequency response, chromagram, tempogram, or any combination thereof.
  6. The method of claim 5, wherein generating using the music analyzer neural network a tempo value and a dance energy category further comprises: generating the tempo value and the dance energy category for each of the acoustic segments based on the test tempo values and the test dance energy categories associated with the test acoustic segments.
  7. The method of claim 4, further comprising: receiving a plurality of test videos including a dancer performing dance movements and the test acoustic signals, the test videos comprising a plurality of test video segments, wherein each of the test video segments comprises a plurality of test video frames; and determining body poses for each of the test video frames using skeletal approximation of the dancer, wherein the body poses comprise joint positions and angles.
  8. The method of claim 7, further comprising: mapping the body poses for each of the test video frames to a plurality of avatar body poses associated with an avatar skeleton; and generating a plurality of avatar test videos using the plurality of avatar body poses.
  9. The method of claim 8, wherein the plurality of avatar test videos comprise the plurality of dance tracks.
  10. The method of claim 1, wherein generating the first dance sequence and the second dance sequence further comprises: generating the first dance sequence and the second dance sequence based on a position of the first avatar displayed on the first client device and a position of the second avatar displayed on the first client device to prevent an overlapping display of the first avatar and the second avatar.
  11. The method of claim 10, wherein the first transition movement in the first dance segment is selected to prevent the overlapping display of the first avatar and the second avatar.
  12. The method of claim 10, wherein the second starting movement in the second dance track is identified to prevent the overlapping display of the first avatar and the second avatar.
  13. The method of claim 1, further comprising: causing to be displayed on a second client device the real-time animation of the first avatar and the second avatar.
  14. The method of claim 1, wherein the first client device is associated with a first user and the second client device is associated with a second user, wherein the first user is associated with the first avatar, and the second user is associated with the second avatar.
  15. A system comprising: a processor; and a memory storing instructions that, when executed by the processor, cause the system to perform operations comprising: receiving a real-time acoustic signal comprising a plurality of acoustic segments; generating using a music analyzer neural network a tempo value and a dance energy category for each of the acoustic segments; selecting a plurality of dance tracks based on the tempo value and the dance energy category, wherein each of the plurality of dance tracks comprises a plurality of dance segments including a plurality of movements that are sequential; generating using the dance tracks a first dance sequence for a first avatar and a second dance sequence for a second avatar, wherein generating the first dance sequence and the second dance sequence comprises: generating a first dance segment by selecting a first starting movement of the plurality of movements in a first track of the plurality of tracks, and selecting a first transition movement in the plurality of movements in the first track, wherein the first dance segment starts at the first starting movement and ends at the first transition movement, and generating a second dance segment by identifying a second starting movement in a second dance track of the plurality of tracks that is similar to the first transition movement within a predetermined threshold, wherein the second dance segment starts at the second starting movement, wherein the first dance sequence and the second dance sequence include the first dance segment and the second dance segment; generating a first real-time animation of the first avatar and a second real-time animation of the second avatar based on the first dance sequence and the second dance sequence and a plurality of avatar characteristics associated with the first avatar and the second avatar; and causing to be displayed on a first client device the first real-time animation of the first avatar and the second real-time animation of the second avatar.
  16. The system of claim 15, wherein the tempo value comprises a value indicating beats per minute, and wherein the dance energy category is one of: idle, slow, lively, or vigorous.
  17. The system of claim 16, wherein the instructions further cause the system to perform operations comprising: training the music analyzer neural network, wherein training the music analyzer neural network comprises: receiving a plurality of test acoustic signals including a plurality of test acoustic segments, determining a plurality of test tempo values associated with the test acoustic segments, and associating each of the test acoustic segments with one of a plurality of test dance energy categories, wherein the test dance energy categories comprise idle, slow, lively, and vigorous.
  18. The system of claim 17, wherein generating using the music analyzer neural network a tempo value and a dance energy category further comprises: generating the tempo value and the dance energy category for each of the acoustic segments based on the test tempo values and the test dance energy categories associated with the test acoustic segments.
  19. The system of claim 17, wherein the instructions further cause the system to perform operations comprising: receiving a plurality of test videos including a dancer performing dance movements and the test acoustic signals, the test videos comprising a plurality of test video segments, wherein each of the test video segments comprises a plurality of test video frames; determining body poses for each of the test video frames using skeletal approximation of the dancer, wherein the body poses comprise joint positions and angles; mapping the body poses for each of the test video frames to a plurality of avatar body poses associated with an avatar skeleton; and generating a plurality of avatar test videos using the plurality of avatar body poses, wherein the plurality of avatar test videos comprise the plurality of dance tracks.
  20. The system of claim 15, wherein generating the first dance sequence and the second dance sequence further comprises: generating the first dance sequence and the second dance sequence based on a position of the first avatar displayed on the first client device and a position of the second avatar displayed on the first client device to prevent an overlapping display of the first avatar and the second avatar, wherein the first transition movement in the first dance segment is selected to prevent the overlapping display of the first avatar and the second avatar, or wherein the second starting movement in the second dance track is identified to prevent the overlapping display of the first avatar and the second avatar.

Description

CROSS REFERENCE TO RELATED APPLICATIONS

This application claims priority to U.S. Provisional Patent Application Ser. No. 63/385,611, filed Nov. 30, 2022, the contents of which are incorporated herein by reference in their entirety.

BACKGROUND

The popularity of electronic messaging, augmented reality, and virtual reality continues to grow. Users increasingly use customized avatars within different platforms, reflecting a global demand to communicate more visually. These customized avatars can be personalized by the users to represent them in various applications, video games, messaging services, etc. Because the customized avatars can be generated in a wide array of situations, display various emotions, and even be animated, users can communicate their feelings more accurately in messages and on different platforms, and thus be more adequately represented by proxy through their customized avatars.

BRIEF DESCRIPTION OF THE SEVERAL VIEWS OF THE DRAWINGS

The patent or application file contains at least one drawing executed in color. Copies of this patent or patent application publication with color drawing(s) will be provided by the Office upon request and payment of the necessary fee.

In the drawings, which are not necessarily drawn to scale, like numerals may describe similar components in different views. To easily identify the discussion of any particular element or act, the most significant digit or digits in a reference number refer to the figure number in which that element is first introduced. Some non-limiting examples are illustrated in the figures of the accompanying drawings, in which:

FIG. 1 is a diagrammatic representation of a networked environment in which the present disclosure may be deployed, according to some examples.
FIG. 2 is a diagrammatic representation of a messaging system, according to some examples, that has both client-side and server-side functionality.
FIG. 3 is a diagrammatic representation of a data structure as maintained in a database, according to some examples.
FIG. 4 is a diagrammatic representation of a message, according to some examples.
FIG. 5 illustrates details of the avatar animation system 232 in accordance with one embodiment.
FIG. 6 illustrates an example of a transition graph 600 in accordance with one embodiment.
FIG. 7 illustrates an example of a transition graph 700 in accordance with one embodiment.
FIG. 8 illustrates a process 800 of generating a real-time avatar animation in accordance with one embodiment.
FIG. 9 illustrates a system in which the head-wearable apparatus may be implemented, according to some examples.
FIG. 10 is a diagrammatic representation of a machine in the form of a computer system within which a set of instructions may be executed to cause the machine to perform any one or more of the methodologies discussed herein, according to some examples.
FIG. 11 is a block diagram showing a software architecture within which examples may be implemented.

DETAILED DESCRIPTION

Current avatar animation systems can generate dance animations for an avatar to perform based on the music being played. Most of these dance animations are hard-coded and do not take into account the music features of the song being played, which should be driving the avatar's dance. Further, the different parts of a given song can vary greatly in music features, so a more realistic dance animation needs to account for these changes. The music features can include, for example, tempo, rhythm, melody, harmony, timbre, dynamics, texture, and form. Embodiments of the present disclosure improve the functionality of dancing animation systems by incorporating, in an avatar animation system, a music analyzer neural network, a dance track selector, a dance synthesizer, and a dance animation controller.
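The first of the components named above, the music analyzer, can be illustrated with a toy heuristic. This is not the patent's neural network: the thresholds, the RMS-loudness input, and the `analyze` function are all invented stand-ins showing only the shape of its output (a tempo value plus one of the four dance energy categories per segment).

```python
# Hypothetical stand-in for the music analyzer neural network.
# Maps a segment's loudness (RMS, 0..1) and tempo to a dance energy category.

def analyze(segment_rms, segment_bpm):
    """Return (tempo value, dance energy category) for one acoustic segment.

    Categories follow the disclosure: idle, slow, lively, vigorous.
    Cutoffs here are arbitrary illustrative values."""
    if segment_rms < 0.05:          # near-silent segment
        return segment_bpm, "idle"
    if segment_bpm < 100:
        return segment_bpm, "slow"
    if segment_bpm < 140:
        return segment_bpm, "lively"
    return segment_bpm, "vigorous"
```

In the disclosed system this per-segment (tempo, energy) pair is what drives the dance track selector; a trained network would derive it from features such as the frequency response, chromagram, and tempogram rather than from hand-set cutoffs.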
The music analyzer neural network is configured to generate a tempo value (e.g., beats per minute) and a dance energy category (e.g., idle, slow, lively, or vigorous) for each segment of a song being played. These outputs inform the dance track selector which dance tracks to use in order to realistically animate avatars dancing to each segment of the song. Further, to generate avatar animations that appear dynamic and varied, in contrast to pre-recorded dances that are repetitive, the dance synthesizer generates dance sequences for the avatars by combining segments of different dance tracks while ensuring that the transitions between the segments are seamless, by identifying transition movements in the tracks that are similar. The dance animation controller then generates real-time avatar dancing animations for the avatars using the dance sequences. The tempo and the level of dance energy of a segment of a song can inform the danceability of that segment, which is defined as the quality or state of being able to be used for dancing. A highly danceable segment of a song is a segment that has the music features that people c