CN-116309688-B - Human motion prediction method and device, intelligent equipment and storage medium


Abstract

The invention discloses a human motion prediction method and device, an intelligent device, and a storage medium. The method comprises: obtaining kinematic information of a target human body, and performing a preset simplified dynamics operation on the kinematic information to obtain dynamics information of the target human body; inputting the kinematic information and the dynamics information into a neural network encoder to obtain kinematic spatiotemporal features and dynamics spatiotemporal features, respectively; and inputting the kinematic spatiotemporal features and the dynamics spatiotemporal features into a neural network decoder to obtain a human motion prediction result for the target human body. In this method, the kinematic information and the dynamics information complement and couple with each other in representing human motion, so the motion is described from different perspectives, and the pose of the human body over a future period can be predicted more accurately and over a longer time horizon.

Inventors

LI HAO
DAI JU
PAN JUNJUN

Assignees

Peng Cheng Laboratory (鹏城实验室)

Dates

Publication Date: 20260505
Application Date: 20230308

Claims (6)

1. A human motion prediction method, applied to a human motion prediction system, wherein the human motion prediction system comprises at least a neural network encoder and a neural network decoder; the neural network encoder comprises a kinematic encoder and a dynamics encoder; the kinematic encoder comprises a first spatial Transformer and a first encoder temporal Transformer; the dynamics encoder comprises a second spatial Transformer and a second encoder temporal Transformer; the neural network decoder comprises a kinematic decoder and a dynamics decoder; the dynamics decoder comprises a third spatial Transformer and a first decoder temporal Transformer; and the kinematic decoder comprises a second decoder temporal Transformer; the method comprising the following steps: acquiring kinematic information of a target human body, and performing a preset simplified dynamics operation on the kinematic information to obtain dynamics information of the target human body; combining the kinematic information with a preset spatial position embedding function and inputting the result to the first spatial Transformer to obtain kinematic spatial features; flattening the kinematic spatial features, combining them with a preset encoder temporal position embedding function, and inputting the result to the first encoder temporal Transformer to obtain kinematic spatiotemporal features; flattening the dynamics spatial features, combining them with a preset encoder temporal position embedding function, and inputting the result to the second encoder temporal Transformer to obtain dynamics spatiotemporal features; replicating the last frame of encoded data corresponding to the kinematic information a preset number of times and inputting the replicated data to the third spatial Transformer to obtain a first query vector; taking the dynamics spatiotemporal features as a first key-value vector, and inputting the first key-value vector and the first query vector to the first decoder temporal Transformer to obtain a second query vector; taking the kinematic spatiotemporal features as a second key-value vector, and inputting the second key-value vector and the second query vector to the second decoder temporal Transformer to obtain an initial prediction result; and converting the exponential-map representation of the initial prediction result into quaternions through a quaternion conversion layer, and training and optimizing in quaternion space to obtain a human motion prediction result for the target human body.
2. The human motion prediction method according to claim 1, wherein the kinematic information comprises joint point rotation angles of a human skeleton topology, and the dynamics information comprises joint forces between joint points; the step of performing a preset simplified dynamics operation on the kinematic information to obtain the dynamics information of the target human body comprises: determining an end joint point in the human skeleton topology, and determining a joint mass and an acceleration of the end joint point according to the joint point rotation angles; determining an end joint force of the end joint point based on the joint mass and the acceleration; and, based on the Newton-Euler iteration rule and the end joint force, iterating inward from the end joint point to obtain all joint forces between the joint points.
3. The human motion prediction method according to claim 2, wherein the step of determining the joint mass of the end joint point according to the joint point rotation angles comprises: determining a three-dimensional distance between the end joint point and the parent joint point corresponding to the end joint point according to the joint point rotation angles; and inputting the three-dimensional distance into a preset joint mass algorithm to obtain the joint mass of the end joint point.
4. A human motion prediction device, characterized in that the human motion prediction device comprises: a dynamics operation module, configured to acquire kinematic information of a target human body and perform a preset simplified dynamics operation on the kinematic information to obtain dynamics information of the target human body; a feature extraction module, configured to input the kinematic information and the dynamics information to a neural network encoder to obtain kinematic spatiotemporal features and dynamics spatiotemporal features, respectively; and a prediction output module, configured to input the kinematic spatiotemporal features and the dynamics spatiotemporal features to a neural network decoder to obtain a human motion prediction result for the target human body; wherein, in the human motion prediction device: the neural network encoder comprises a kinematic encoder and a dynamics encoder; the kinematic encoder comprises a first spatial Transformer and a first encoder temporal Transformer; the dynamics encoder comprises a second spatial Transformer and a second encoder temporal Transformer; the neural network decoder comprises a kinematic decoder and a dynamics decoder; the dynamics decoder comprises a third spatial Transformer and a first decoder temporal Transformer; the kinematic decoder comprises a second decoder temporal Transformer; and the human motion prediction system further comprises a quaternion conversion layer; and wherein the following are performed: acquiring the kinematic information of the target human body, and performing the preset simplified dynamics operation on the kinematic information to obtain the dynamics information of the target human body; combining the kinematic information with a preset spatial position embedding function and inputting the result to the first spatial Transformer to obtain kinematic spatial features; flattening the kinematic spatial features, combining them with a preset encoder temporal position embedding function, and inputting the result to the first encoder temporal Transformer to obtain the kinematic spatiotemporal features; flattening the dynamics spatial features, combining them with a preset encoder temporal position embedding function, and inputting the result to the second encoder temporal Transformer to obtain the dynamics spatiotemporal features; replicating the last frame of encoded data corresponding to the kinematic information a preset number of times and inputting the replicated data to the third spatial Transformer to obtain a first query vector; taking the dynamics spatiotemporal features as a first key-value vector, and inputting the first key-value vector and the first query vector to the first decoder temporal Transformer to obtain a second query vector; taking the kinematic spatiotemporal features as a second key-value vector, and inputting the second key-value vector and the second query vector to the second decoder temporal Transformer to obtain an initial prediction result; and converting the exponential-map representation of the initial prediction result into quaternions through the quaternion conversion layer, and training and optimizing in quaternion space to obtain the human motion prediction result for the target human body.
5. A smart device comprising a processor, a storage unit, and a human motion prediction program stored on the storage unit that is executable by the processor, wherein the human motion prediction program, when executed by the processor, implements the steps of the human motion prediction method of any one of claims 1 to 3.
6. A computer-readable storage medium, on which a human motion prediction program is stored, wherein the human motion prediction program, when executed by a processor, implements the steps of the human motion prediction method according to any one of claims 1 to 3.
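As an illustration of the quaternion conversion layer recited in claims 1 and 4, the following is a minimal sketch of the standard exponential-map (axis-angle) to unit-quaternion conversion. The claims do not specify the implementation; the function name `expmap_to_quaternion`, the `(w, x, y, z)` component order, and the small-angle threshold `eps` are assumptions made only for this example.

```python
import numpy as np

def expmap_to_quaternion(expmap, eps=1e-8):
    """Convert exponential-map vectors (..., 3) to unit quaternions (..., 4).

    Assumed convention: the vector direction is the rotation axis, its norm
    is the rotation angle in radians, and the output order is (w, x, y, z).
    """
    expmap = np.asarray(expmap, dtype=np.float64)
    theta = np.linalg.norm(expmap, axis=-1, keepdims=True)  # rotation angle
    half = 0.5 * theta
    # sin(theta/2) / theta, replaced by its limit 1/2 when theta is near zero
    safe_theta = np.where(theta < eps, 1.0, theta)
    scale = np.where(theta < eps, 0.5, np.sin(half) / safe_theta)
    return np.concatenate([np.cos(half), scale * expmap], axis=-1)

# Usage: a zero rotation and a half-turn about the z axis.
quats = expmap_to_quaternion([[0.0, 0.0, 0.0], [0.0, 0.0, np.pi]])
```

One plausible motivation for optimizing in quaternion space is that unit quaternions avoid the discontinuities that exponential-map coordinates can show near large rotation angles, although the claims themselves do not state the reason.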

Description

Human motion prediction method and device, intelligent equipment and storage medium

Technical Field

The present invention relates to the field of human motion analysis, and in particular to a human motion prediction method, a device, an intelligent device, and a computer-readable storage medium.

Background

Understanding and predicting human motion is an important topic in computer vision. Three-dimensional human motion prediction aims to predict the most probable future human poses from motion data observed in the past. This task is a fundamental research topic in the computing community. It plays a key role in a wide range of applications such as human-computer interaction, motion analysis, autonomous driving, and character animation. A common formulation of the task is to learn a model from a sequence of 3D poses and use it to predict a person's most likely future 3D poses.
Because human behavior is inherently complex and human pose is uncertain, predicting human motion accurately and with high fidelity is challenging, and long-term prediction is harder still.
However, existing methods, whether traditional probabilistic models or deep learning methods, take only the kinematic data of isolated skeletal joints as network input, that is, only the positions or rotations of the joints. These methods ignore higher-order interactions between skeleton segments (joints), that is, they ignore the dynamics information of the human body. As a result, current human motion prediction results are inaccurate and the predictable time horizon is short.

Disclosure of Invention

The main purpose of the invention is to provide a human motion prediction method, a device, an intelligent device, and a computer-readable storage medium, in order to solve the technical problems that current human motion prediction results are inaccurate and the predictable time horizon is short.
To achieve the above object, the present invention provides a human motion prediction method applied to a human motion prediction system, the human motion prediction system comprising at least a neural network encoder and a neural network decoder. The method comprises the following steps: acquiring kinematic information of a target human body, and performing a preset simplified dynamics operation on the kinematic information to obtain dynamics information of the target human body; inputting the kinematic information and the dynamics information to the neural network encoder to obtain kinematic spatiotemporal features and dynamics spatiotemporal features, respectively; and inputting the kinematic spatiotemporal features and the dynamics spatiotemporal features to the neural network decoder to obtain a human motion prediction result for the target human body.
Optionally, the kinematic information comprises joint point rotation angles of a human skeleton topology, and the dynamics information comprises joint forces between joint points; the step of performing a preset simplified dynamics operation on the kinematic information to obtain the dynamics information of the target human body comprises: determining an end joint point in the human skeleton topology, and determining a joint mass and an acceleration of the end joint point according to the joint point rotation angles; determining an end joint force of the end joint point based on the joint mass and the acceleration; and, based on the Newton-Euler iteration rule and the end joint force, iterating inward from the end joint point to obtain all joint forces between the joint points.
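The simplified dynamics operation described above can be sketched as follows. This is only an illustration under stated assumptions, not the patented implementation: joint positions are assumed to have already been computed from the rotation angles by forward kinematics, the joint mass rule (mass proportional to the distance to the parent joint, scaled by a hypothetical `density`) stands in for the unspecified preset joint mass algorithm, accelerations come from a finite difference, and the inward pass keeps only the translational term (each joint's own mass times acceleration plus the forces passed up from its children), ignoring gravity and torques. All function names are hypothetical.

```python
import numpy as np

def joint_masses(rest_positions, parents, density=1.0):
    """Hypothetical mass rule: mass proportional to the distance to the parent."""
    masses = np.zeros(len(parents))
    for j, p in enumerate(parents):
        if p >= 0:  # the root (parent index -1) gets zero mass in this sketch
            masses[j] = density * np.linalg.norm(rest_positions[j] - rest_positions[p])
    return masses

def accelerations(positions, dt):
    """Central second difference over time; positions has shape (T, J, 3)."""
    return (positions[2:] - 2.0 * positions[1:-1] + positions[:-2]) / dt**2

def joint_forces(positions, parents, masses, dt):
    """Accumulate forces from end joints inward; returns shape (T-2, J, 3).

    Assumes joints are ordered so that parents[j] < j for every non-root joint.
    """
    acc = accelerations(positions, dt)
    forces = masses[None, :, None] * acc  # own term m * a for every joint
    # Descending index order visits every child before its parent.
    for j in range(len(parents) - 1, 0, -1):
        p = parents[j]
        if p >= 0:
            forces[:, p] += forces[:, j]  # pass the child's force to the parent
    return forces

# Usage: a three-joint chain (root -> joint 1 -> joint 2) accelerating along x.
rest = np.array([[0.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 3.0, 0.0]])
parents = [-1, 0, 1]
t = np.arange(4.0)
positions = rest[None] + 0.5 * np.array([2.0, 0.0, 0.0]) * (t**2)[:, None, None]
forces = joint_forces(positions, parents, joint_masses(rest, parents), dt=1.0)
```

In this toy chain the end joint carries only its own inertial force, while each inner joint carries its own term plus everything outward of it, which is the inward accumulation pattern the text describes.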
Optionally, the step of determining the joint mass of the end joint point according to the joint point rotation angles comprises: determining a three-dimensional distance between the end joint point and the parent joint point corresponding to the end joint point according to the joint point rotation angles; and inputting the three-dimensional distance into a preset joint mass algorithm to obtain the joint mass of the end joint point.
Optionally, the neural network encoder comprises a kinematic encoder comprising a first spatial Transformer and a first encoder temporal Transformer; the step of inputting the kinematic information and the dynamics information to the neural network encoder to obtain kinematic spatiotemporal features and dynamics spatiotemporal features, respectively, comprises: combining the kinematic information with a preset spatial position embedding function and inputting the result to the first spatial Transformer to obtain kinematic spatial features; and flattening the kinematic spatial features, combining them with a preset encoder temporal position embedding function, and inputting the result to the first encoder temporal Transformer to obtain the kinematic spatiotemporal features.
Optionally, the neural network encoder comprises a dynamics encoder comprising a second spatial Transformer and a second encoder temporal Transformer; the step of inputting the