CN-121982171-A - Method for generating a simulated avatar based on motion capture
Abstract
The invention relates to the technical field of motion capture and discloses a method for generating a simulated avatar based on motion capture, comprising the following steps: S1, motion data acquisition and preprocessing, in which human motion data are collected with a hybrid capture system; S2, motion feature extraction and model construction, in which a clothing mesh is generated from the base-model topology and appearance characteristics are learned; S3, motion driving sequence generation and calibration, in which a mapping between the driving signals and the base model is established and the motion matching degree is optimized; S4, dynamic response calculation and rendering; and S5, avatar generation and optimization. By combining hybrid capture with multi-algorithm data preprocessing, the method achieves complementary improvements in motion data denoising, error correction and feature optimization. Compared with prior-art schemes that rely on a single sensor or a single processing algorithm, it resolves the problems of interference-prone data, accumulated drift error and incomplete feature extraction.
Inventors
- CHEN XIAOFANG
- WANG YU
Assignees
- 福建泽沛集团有限公司
Dates
- Publication Date
- 20260505
- Application Date
- 20260125
Claims (10)
- 1. A motion capture-based avatar generation method, comprising the steps of: S1, motion data acquisition and preprocessing: collecting human motion data with a hybrid capture system, performing filtering-based denoising, coordinate-system unification and correction of accumulated sensor error on the raw data, and proceeding to the next step when preset device-synchronization and data-precision conditions are met; S2, motion feature extraction and model construction: extracting a human-body key-point time sequence and dynamic features through a deep convolutional network, constructing a parameterized base human-body model, binding skeletal features, generating a clothing mesh from the base-model topology and learning appearance characteristics, and proceeding to the next step when preset feature-integrity, skeleton-binding-accuracy and mesh-adaptation conditions are met; S3, motion driving sequence generation and calibration: feeding the feature sequence into a dynamic masking module that completes the motion data of occluded frames, generating an avatar driving signal, establishing a mapping between the driving signal and the base model, optimizing the motion matching degree, and proceeding to the next step when preset driving-signal timing and motion-synchronization conditions are met; S4, dynamic response calculation and rendering: computing collision-response and deformation data between the clothing and the human body from the driving signals, fusing the motion data with the deformation data for neural rendering, and proceeding to the next step when preset deformation-computation frame-rate and rendered-image-quality conditions are met; and S5, avatar generation and optimization: applying inter-frame smoothing to the rendered sequence and outputting the avatar, the generation being complete when a preset generation-sequence latency condition is met.
- 2. The motion capture-based avatar generation method of claim 1, wherein in S1 the hybrid capture system collects human motion data by acquiring joint-angle and acceleration data through an inertial measurement unit while synchronously acquiring body-surface contour and spatial-coordinate data through an infrared camera, the two data streams being aligned by timestamp to form the raw motion data set.
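The timestamp alignment of claim 2 can be sketched as pairing each camera frame with the nearest-in-time IMU sample. This is a minimal illustration under assumed data layouts (the sample rates, dimensions and function name are hypothetical, not from the patent):

```python
import numpy as np

def align_by_timestamp(imu_t, imu_data, cam_t, cam_data):
    """Pair each camera frame with the nearest-in-time IMU sample.

    imu_t, cam_t : 1-D arrays of timestamps (seconds), sorted ascending.
    imu_data     : (N_imu, D_imu) joint-angle / acceleration samples.
    cam_data     : (N_cam, D_cam) contour / spatial-coordinate samples.
    Returns an aligned (N_cam, D_imu + D_cam) raw motion data set.
    """
    # For each camera timestamp, locate the closest IMU timestamp.
    idx = np.searchsorted(imu_t, cam_t)
    idx = np.clip(idx, 1, len(imu_t) - 1)
    left_closer = (cam_t - imu_t[idx - 1]) < (imu_t[idx] - cam_t)
    idx = np.where(left_closer, idx - 1, idx)
    return np.hstack([imu_data[idx], cam_data])

# Hypothetical streams: 100 Hz IMU, 30 Hz camera, over one second.
rng = np.random.default_rng(0)
imu_t = np.arange(0, 1, 0.01)
cam_t = np.arange(0, 1, 1 / 30)
imu_data = rng.random((len(imu_t), 6))   # e.g. 3 joint angles + 3 accelerations
cam_data = rng.random((len(cam_t), 3))   # e.g. 3-D surface coordinates
aligned = align_by_timestamp(imu_t, imu_data, cam_t, cam_data)
print(aligned.shape)  # (30, 9)
```

A production system would additionally handle clock offset between the two devices; nearest-neighbour pairing assumes the clocks are already synchronized, as the claim's device-synchronization condition implies.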
- 3. The motion capture-based avatar generation method of claim 1, wherein in S1 the preprocessing comprises the sub-steps of: S101, applying Kalman filtering to the raw data to suppress environmental and device noise; S102, correcting the accumulated error of the inertial measurement unit through complementary filtering to compensate the measurement bias of any single sensor; and S103, unifying the image coordinates acquired by the infrared camera and the local coordinates of the inertial measurement unit into a preset world coordinate system, completing the coordinate-system unification.
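The complementary filtering of sub-step S102 can be illustrated with a one-axis sketch: the gyroscope's integrated angle is drift-prone but smooth, while the accelerometer's angle estimate is noisy but drift-free, so the filter high-passes one and low-passes the other. All signal values and the blending factor below are illustrative assumptions:

```python
import numpy as np

def complementary_filter(gyro_rate, accel_angle, dt=0.01, alpha=0.98):
    """Bound gyroscope drift with an accelerometer reference (S102 sketch).

    gyro_rate   : angular-velocity samples (rad/s); integrating them alone drifts.
    accel_angle : absolute angle estimates from the accelerometer; noisy but drift-free.
    alpha       : trust placed in the integrated gyro signal at each step.
    """
    angle = accel_angle[0]
    fused = [angle]
    for w, a in zip(gyro_rate[1:], accel_angle[1:]):
        # High-pass the gyro integration, low-pass the accelerometer.
        angle = alpha * (angle + w * dt) + (1 - alpha) * a
        fused.append(angle)
    return np.array(fused)

# Hypothetical data: true angle fixed at 0.5 rad, a biased gyro, a noisy accelerometer.
rng = np.random.default_rng(0)
n = 1000
gyro = np.full(n, 0.02)                   # pure bias: integration alone would drift
accel = 0.5 + 0.05 * rng.standard_normal(n)
est = complementary_filter(gyro, accel)
# The fused estimate stays near the true angle instead of drifting without bound.
print(abs(est[-1] - 0.5) < 0.2)  # True
```

The Kalman filtering of S101 plays an analogous role but with a statistically optimal, covariance-driven gain instead of the fixed alpha used here.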
- 4. The motion capture-based avatar generation method of claim 1, wherein in S2 the motion feature extraction comprises: feeding the preprocessed motion data into a deep convolutional network, extracting low-level spatial features through convolutional layers, reducing dimensionality through pooling layers, progressively aggregating local and global features through a six-stage feature-fusion module, and finally outputting the human-body key-point time sequence through a key-point detection head and the dynamic features through a dynamic-analysis head.
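The network shape of claim 4 (convolution, pooling, then two output heads) can be mimicked with a toy numpy stand-in. This is not the patent's network; weights are random placeholders and the dimensions (17 key points, 8 dynamic features) are assumptions, shown only to make the data flow concrete:

```python
import numpy as np

rng = np.random.default_rng(0)

def conv1d(x, k):
    """Valid 1-D convolution over the time axis, applied per channel."""
    T = x.shape[0] - len(k) + 1
    return np.array([x[t:t + len(k)].T @ k for t in range(T)])

def extract_features(motion, n_keypoints=17):
    """Toy stand-in for the deep network of claim 4: a conv layer for local
    spatial features, stride-2 pooling for dimensionality reduction, and two
    output heads (key-point detection + dynamic analysis)."""
    kernel = rng.standard_normal(5)
    feat = np.maximum(conv1d(motion, kernel), 0)   # convolution + ReLU
    feat = feat[::2]                               # stride-2 pooling
    W_kp = rng.standard_normal((feat.shape[1], n_keypoints * 3))
    W_dyn = rng.standard_normal((feat.shape[1], 8))
    keypoints = feat @ W_kp    # key-point detection head: (x, y, z) per key point
    dynamics = feat @ W_dyn    # dynamic-analysis head
    return keypoints, dynamics

motion = rng.standard_normal((64, 9))   # 64 frames of preprocessed motion data
kp, dyn = extract_features(motion)
print(kp.shape, dyn.shape)  # (30, 51) (30, 8)
```

The claimed six-stage feature-fusion module would sit between the pooling and the heads; a single conv/pool pair is used here purely to keep the sketch short.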
- 5. The motion capture-based avatar generation method of claim 1, wherein in S2 the model construction comprises the sub-steps of: S201, constructing a parameterized base human-body model from human anatomy, defining the skeletal-joint degrees of freedom and muscle attachment points; S202, mapping the extracted human-body key-point time sequence one-to-one onto the skeleton nodes of the base model to complete the skeletal-feature binding; S203, generating, with a physics engine, a clothing mesh fitted to the body contour from the surface topology of the base model, partitioning the mesh cells and setting material parameters; and S204, feeding real clothing image samples into a neural appearance network to learn fabric texture, gloss and light-reflection characteristics, and assigning the learned result to the clothing mesh.
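Sub-step S203's idea of deriving the garment mesh from the base-model surface topology can be sketched as offsetting each body vertex along its normal, so the garment inherits the body mesh's connectivity. The offset value and the unit-sphere stand-in body are illustrative assumptions:

```python
import numpy as np

def generate_garment_mesh(body_verts, body_normals, offset=0.01):
    """S203 sketch: each garment vertex sits a small distance outward along
    the body normal, so the garment mesh shares the base model's topology.
    A real physics engine would then relax this mesh under material
    parameters; only the topological derivation is shown here."""
    return body_verts + offset * body_normals

# Hypothetical base model: points on a unit sphere standing in for a torso,
# where each vertex's outward normal equals the vertex itself.
rng = np.random.default_rng(1)
v = rng.standard_normal((100, 3))
v /= np.linalg.norm(v, axis=1, keepdims=True)
garment = generate_garment_mesh(v, v)
print(np.allclose(np.linalg.norm(garment, axis=1), 1.01))  # True
```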
- 6. The motion capture-based avatar generation method of claim 1, wherein in S3 the motion driving sequence generation comprises: feeding the extracted feature sequence into a dynamic masking module, masking the feature data corresponding to occluded frames, constraining the feature reconstruction with a diffusion loss function, and completing the missing motion data of the occluded frames from the motion trends and dynamic-feature correlations of adjacent frames, so as to form a complete avatar driving signal sequence.
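Claim 6 completes occluded frames with a masked-reconstruction network trained under a diffusion loss. As a minimal stand-in for that learned model, the sketch below fills each masked frame by linear interpolation from the nearest visible frames on either side, which captures only the "motion trend of adjacent frames" part of the claim:

```python
import numpy as np

def fill_occluded_frames(features, occluded):
    """Complete occluded frames from neighbouring motion trends.

    features : (T, D) feature sequence with occluded rows zeroed/invalid.
    occluded : (T,) boolean mask of occluded frames.
    The patent uses a learned network; linear interpolation is a simplification.
    """
    out = features.copy()
    t = np.arange(len(features))
    visible = ~occluded
    for d in range(features.shape[1]):
        out[occluded, d] = np.interp(t[occluded], t[visible], features[visible, d])
    return out

# Hypothetical sequence: straight-line motion with frames 3-5 occluded.
seq = np.linspace(0, 9, 10)[:, None] * np.ones((1, 2))
mask = np.zeros(10, dtype=bool)
mask[3:6] = True
filled = fill_occluded_frames(np.where(mask[:, None], 0.0, seq), mask)
print(np.allclose(filled, seq))  # True: the linear motion is recovered exactly
```

For non-linear motions the learned reconstruction of the claim would outperform this; the example only demonstrates the mask-and-complete data flow.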
- 7. The motion capture-based avatar generation method of claim 1, wherein in S3 the motion calibration comprises the sub-steps of: S301, establishing a mapping between the driving signal and the skeletal joints of the base model, and defining the joint motion range corresponding to each driving parameter; S302, constructing a generator network G that receives a driving signal and outputs a predicted model motion, and a discriminator network D that distinguishes predicted motions from real motion data; and S303, iteratively training with a loss function composed of a reconstruction term and a KL-divergence term, thereby optimizing the match between the driving signals and the model motion and completing the motion-synchronization calibration.
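The S303 training objective (reconstruction plus KL divergence) can be written out concretely. Assuming a diagonal-Gaussian latent, the KL term against a standard normal has the usual closed form; the weighting factor `beta` is a hypothetical design choice not specified by the claim:

```python
import numpy as np

def calibration_loss(pred, target, mu, logvar, beta=0.1):
    """S303 sketch: reconstruction term plus a KL-divergence term that pushes
    the latent distribution N(mu, exp(logvar)) toward N(0, 1)."""
    recon = np.mean((pred - target) ** 2)                     # reconstruction loss
    kl = -0.5 * np.mean(1 + logvar - mu ** 2 - np.exp(logvar))  # KL(N(mu,s) || N(0,1))
    return recon + beta * kl

# A perfect prediction with a standard-normal latent gives zero loss.
z = np.zeros(16)
print(calibration_loss(np.ones(5), np.ones(5), z, z))  # 0.0
```

During the adversarial training of S302 this loss would be combined with the discriminator's signal; only the claimed reconstruction-plus-KL part is shown.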
- 8. The motion capture-based avatar generation method of claim 1, wherein in S4 the dynamic response calculation comprises: feeding the driving signals into a physics engine, first resolving the human joint motion trajectories corresponding to the driving signals, computing the collision-detection results between the clothing and the body surface from the physical properties of the clothing mesh, and solving the deformation displacement of the clothing by finite-element analysis to produce frame-by-frame deformation data; and wherein the neural rendering takes the motion data and the deformation data as input, incorporates the shooting-view information through a view-encoding module, and generates image frames containing dynamic detail through a rendering network.
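The collision-detection-and-response step of claim 8 can be illustrated with a deliberately simplified body proxy: approximate the body by a sphere and project any penetrating garment vertex back onto its surface. The real system would use the full body mesh plus finite-element deformation; only the detect-and-displace step is sketched:

```python
import numpy as np

def resolve_collisions(cloth_verts, body_center, body_radius):
    """Per-frame collision response sketch: any garment vertex inside the
    spherical body proxy is projected radially back onto the surface."""
    rel = cloth_verts - body_center
    dist = np.linalg.norm(rel, axis=1, keepdims=True)
    inside = dist < body_radius                      # collision detection
    corrected = body_center + rel / dist * body_radius  # displacement response
    return np.where(inside, corrected, cloth_verts)

verts = np.array([[0.5, 0.0, 0.0],    # penetrating vertex
                  [2.0, 0.0, 0.0]])   # free vertex, left untouched
out = resolve_collisions(verts, np.zeros(3), 1.0)
print(np.allclose(out, [[1, 0, 0], [2, 0, 0]]))  # True
```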
- 9. The motion capture-based avatar generation method of claim 1, wherein in S4 the neural rendering supports view-dependent adjustment: the shading of the image is adjusted dynamically according to preset view parameters, dynamic shadows are computed in real time during rendering to keep the avatar consistent with the ambient shadows, and the resolution of the rendered output is not lower than a preset threshold.
- 10. The motion capture-based avatar generation method of claim 1, wherein in S5 the inter-frame smoothing dynamically adjusts the weight-distribution strategy of a bilateral filter according to the human-dynamics features extracted in S2, namely joint torque, muscle-contraction degree and motion acceleration: when the joint motion acceleration exceeds a preset threshold (a fast-action scene), the spatial-distance weight coefficient is raised to 0.6-0.8 and the grayscale-similarity weight coefficient is lowered to 0.2-0.4; when the joint motion acceleration is at or below the threshold (a slow-motion or static scene), the weight coefficients are adjusted in the opposite direction; and the motion timing information calibrated in S3 is referenced synchronously during the filtering.
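Claim 10's acceleration-driven weight scheduling and temporal bilateral smoothing can be sketched as follows. The specific weight values (0.7 / 0.3) sit inside the claimed ranges but are otherwise a design choice, and the filter parameters and frame dimensions are illustrative assumptions:

```python
import numpy as np

def adaptive_weights(joint_accel, accel_threshold):
    """Claim 10 weight scheduling: fast motion favours the distance term,
    slow or static motion favours the similarity term."""
    if joint_accel > accel_threshold:   # fast-action scene
        return 0.7, 0.3                 # (spatial-distance, grayscale-similarity)
    return 0.3, 0.7                     # slow-motion or static scene

def smooth_frame(frames, i, w_spatial, w_sim, radius=2, sigma_d=1.5, sigma_r=0.1):
    """Bilateral smoothing of frame i against its temporal neighbours."""
    lo, hi = max(0, i - radius), min(len(frames), i + radius + 1)
    neigh = frames[lo:hi]
    d = np.arange(lo, hi) - i
    w_d = np.exp(-d ** 2 / (2 * sigma_d ** 2))              # temporal-distance term
    diff = np.mean((neigh - frames[i]) ** 2,
                   axis=tuple(range(1, neigh.ndim)))
    w_r = np.exp(-diff / (2 * sigma_r ** 2))                # similarity term
    w = w_spatial * w_d + w_sim * w_r
    w /= w.sum()
    return np.tensordot(w, neigh, axes=1)

rng = np.random.default_rng(2)
frames = rng.standard_normal((10, 4, 4))    # tiny stand-in rendered sequence
ws, wr = adaptive_weights(joint_accel=5.0, accel_threshold=2.0)
print(ws, wr)        # 0.7 0.3 -> fast-action weighting
out = smooth_frame(frames, 5, ws, wr)
print(out.shape)     # (4, 4)
```

Combining the two terms additively (rather than multiplicatively, as in a classical bilateral filter) is one plausible reading of the claimed "weight-distribution strategy"; the patent text does not fix the combination rule.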
Description
Method for generating a simulated avatar based on motion capture

Technical Field

The invention relates to the technical field of motion capture, and in particular to a method for generating a simulated avatar based on motion capture.

Background

Motion capture-based avatar generation is a composite technology integrating sensor acquisition, algorithmic processing, three-dimensional modeling and rendering, and is widely applied in virtual livestreaming, film and television production, rehabilitation-training simulation, game development and other fields. By capturing real human motion data, the technology drives an avatar to reproduce motion naturally and accurately, breaks the interaction barrier between the virtual and the real, provides users with an immersive experience, and supports core needs such as motion analysis and skill simulation in professional settings, making it a key enabling technology for digital content production and intelligent interaction. In the existing avatar generation technology, the motion data acquisition and preprocessing stage mostly relies on a single sensor, or processes the data with a single type of algorithm, so it is difficult to achieve both comprehensive acquisition and accurate processing.
Data collected by a single sensor is easily affected by environmental interference and device noise, so its accuracy is insufficient; the accumulated drift error produced by long-term sensor operation cannot be effectively removed by a single algorithm; and during feature extraction it is difficult to account for both local detail and global trends, so feature information is lost. These issues ultimately degrade the realism and stability of the avatar's motion reproduction and restrict the technology's use in high-precision, high-real-time scenarios.

Disclosure of Invention

To address the shortcomings of the prior art, the invention provides a motion capture-based method for generating a simulated avatar, which solves the problems of interference-prone data, accumulated drift error and incomplete feature extraction caused by single-sensor acquisition or single-algorithm processing.
In order to achieve the above object, the invention provides a method for generating an avatar based on motion capture, comprising the steps of: S1, motion data acquisition and preprocessing: collecting human motion data with a hybrid capture system, performing filtering-based denoising, coordinate-system unification and correction of accumulated sensor error on the raw data, and proceeding to the next step when preset device-synchronization and data-precision conditions are met; S2, motion feature extraction and model construction: extracting a human-body key-point time sequence and dynamic features through a deep convolutional network, constructing a parameterized base human-body model, binding skeletal features, generating a clothing mesh from the base-model topology and learning appearance characteristics, and proceeding to the next step when preset feature-integrity, skeleton-binding-accuracy and mesh-adaptation conditions are met; S3, motion driving sequence generation and calibration: feeding the feature sequence into a dynamic masking module that completes the motion data of occluded frames, generating an avatar driving signal, establishing a mapping between the driving signal and the base model, optimizing the motion matching degree, and proceeding to the next step when preset driving-signal timing and motion-synchronization conditions are met; S4, dynamic response calculation and rendering: computing collision-response and deformation data between the clothing and the human body from the driving signals, fusing the motion data with the deformation data for neural rendering, and proceeding to the next step when preset deformation-computation frame-rate and rendered-image-quality conditions are met; and S5, avatar generation and optimization: applying inter-frame smoothing to the rendered sequence and outputting the avatar, the generation being complete when a preset generation-sequence latency condition is met. Preferably, in step S1 the hybrid capture system collects human motion data by acquiring joint-angle and acceleration data through an inertial measurement unit while synchronously acquiring body-surface contour and spatial-coordinate data through an infrared camera, the two data streams being aligned by timestamp to form the raw motion data set.