
CN-117372577-B - Method and device for generating mouth shape image of virtual object

CN117372577B

Abstract

An embodiment of the invention provides a method and a device for generating a mouth shape image of a virtual object. The method comprises: obtaining dubbing material to be processed, the dubbing material comprising audio data and/or text data corresponding to a virtual object; obtaining a deformer matched with the virtual object from a preset deformer template; generating an amplitude curve corresponding to the pronunciation mouth shapes based on the dubbing material; mapping the dubbing material into a skeleton model of the virtual object through the deformer to generate a facial mouth shape image synchronized with the dubbing material; and adjusting the facial mouth shape image into the mouth shape image of the virtual object through the amplitude curve. Because the deformer and the amplitude curve are matched to the virtual object, the dubbing material is converted into mouth shape images that follow Chinese pinyin rules and suit the style of the virtual object, which greatly improves the generation efficiency of mouth shape images and optimizes their audiovisual effect.
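The pronunciation mouth shapes described above are built from Chinese pinyin rules, where each syllable splits into an initial consonant (shengmu) and a final (yunmu). As a minimal illustrative sketch of that decomposition (the function name and initial list are assumptions, not from the patent):

```python
# Hypothetical sketch: split a pinyin syllable into its initial (shengmu)
# and final (yunmu), the two parts the patent maps to mouth shapes.
PINYIN_INITIALS = [
    "zh", "ch", "sh",  # two-letter initials must be matched first
    "b", "p", "m", "f", "d", "t", "n", "l",
    "g", "k", "h", "j", "q", "x", "r", "z", "c", "s", "y", "w",
]

def split_syllable(syllable: str) -> tuple[str, str]:
    """Return (initial, final); the initial may be empty for zero-initial syllables."""
    for initial in PINYIN_INITIALS:
        if syllable.startswith(initial):
            return initial, syllable[len(initial):]
    return "", syllable  # e.g. "an", "er" have no initial

print(split_syllable("zhong"))  # ('zh', 'ong')
print(split_syllable("an"))     # ('', 'an')
```

Each initial and final would then be associated with its own mouth shape, per the mapping relation the deformer stores.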

Inventors

  • WU HEKANG

Assignees

  • 完美世界(北京)软件科技发展有限公司 (Perfect World (Beijing) Software Technology Development Co., Ltd.)

Dates

Publication Date
2026-05-05
Application Date
2022-06-30

Claims (10)

  1. A method for generating a mouth shape image of a virtual object, comprising: acquiring dubbing material to be processed, wherein the dubbing material comprises audio data and/or text data corresponding to a virtual object; obtaining a deformer matched with the virtual object from a preset deformer template, wherein the deformer comprises a mapping relation between pronunciation mouth shapes and a skeleton model, and the pronunciation mouth shapes comprise initial consonant mouth shapes and/or final mouth shapes built by combination based on Chinese pinyin rules; generating an amplitude curve corresponding to the pronunciation mouth shapes based on the dubbing material, wherein the amplitude curve indicates the audio amplitude corresponding to each phoneme in the dubbing material, and each phoneme in the dubbing material corresponds one-to-one with an initial consonant mouth shape and/or a final mouth shape among the pronunciation mouth shapes; and mapping the dubbing material into a skeleton model of the virtual object through the deformer, generating a facial mouth shape image synchronized with the dubbing material, and adjusting the facial mouth shape image into the mouth shape image of the virtual object through the amplitude curve.
  2. The method of claim 1, wherein obtaining the dubbing material to be processed comprises: receiving audio data and/or text data input by a user; and identifying a plurality of virtual objects from the audio data and/or the text data, and dividing out data fragments corresponding to each virtual object from the audio data and/or the text data as the dubbing material.
  3. The method of claim 1, wherein obtaining a deformer matched with the virtual object from a preset deformer template comprises: displaying at least one preset deformer template in a deformer panel, wherein each deformer template comprises a deformer and a corresponding mapping pool, and the mapping pool stores the mapping relation between at least one pronunciation mouth shape and at least one skeleton model; and in response to a selection instruction for a deformer, determining the skeleton model corresponding to the virtual object, and selecting from the at least one deformer template the deformer matched with that skeleton model.
  4. The method of claim 1, further comprising: setting a corresponding skeleton model for each deformer in the deformer templates, wherein the corresponding skeleton model is reused across a plurality of virtual objects.
  5. The method of claim 1, wherein mapping the dubbing material into a skeleton model of the virtual object through the deformer, generating a facial mouth shape image synchronized with the dubbing material, and adjusting the facial mouth shape image into the mouth shape image of the virtual object through the amplitude curve, comprises: identifying each phoneme in the dubbing material through the deformer; mapping each identified phoneme into the skeleton model of the virtual object to obtain corresponding skeleton model parameters; calculating the facial mouth shape image based on the skeleton model parameters; displaying the amplitude curve in an amplitude panel; and in response to an editing instruction for the amplitude curve, adjusting the variation amplitude of the amplitude curve to change the variation amplitude of the mouth size in the mouth shape image.
  6. The method of claim 5, wherein generating the amplitude curve based on the dubbing material comprises: selecting key frames from the phonemes in the dubbing material, wherein the key frames comprise audio data frames corresponding to initial consonants and/or finals in the dubbing material; and wherein displaying the amplitude curve in an amplitude panel comprises: displaying the amplitude curve corresponding to the key frames in the amplitude panel.
  7. The method of claim 1, further comprising: in response to an editing instruction for the deformer template, adjusting the mapping parameters of the deformer to modify the mapping relation between the pronunciation mouth shapes and the skeleton model.
  8. The method of claim 1, further comprising: in response to an editing instruction for animation preset parameters, adjusting the animation preset parameters to modify the visual effect of the mouth shape image; wherein the animation preset parameters comprise at least one of mouth shape animation style, frame rate, sampling parameters, additional duration, and fade-in/fade-out.
  9. The method of claim 1, further comprising: performing semantic recognition on the dubbing material; judging, based on the recognition result, whether the dubbing material meets a preset condition; and if the dubbing material meets the preset condition, adding to the facial mouth shape image a specific visual element associated with the virtual object, wherein the specific visual element comprises a facial expression and/or an action bound to the skeleton model.
  10. A device for generating a mouth shape image of a virtual object, the device comprising: an acquisition module configured to acquire dubbing material to be processed, wherein the dubbing material comprises audio data and/or text data corresponding to a virtual object, and to obtain, from a preset deformer template, a deformer matched with the virtual object, wherein the deformer comprises a mapping relation between pronunciation mouth shapes and a skeleton model, and the pronunciation mouth shapes comprise initial consonant mouth shapes and/or final mouth shapes built by combination based on Chinese pinyin rules; and a generating module configured to generate an amplitude curve corresponding to the pronunciation mouth shapes based on the dubbing material, wherein the amplitude curve indicates the audio amplitude corresponding to each phoneme in the dubbing material and each phoneme corresponds one-to-one with an initial consonant mouth shape and/or a final mouth shape among the pronunciation mouth shapes, to map the dubbing material into a skeleton model of the virtual object through the deformer, to generate a facial mouth shape image synchronized with the dubbing material, and to adjust the facial mouth shape image into the mouth shape image of the virtual object through the amplitude curve.
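The claimed pipeline can be sketched in code: each phoneme is looked up in the deformer's mapping pool to obtain skeleton parameters, and the amplitude curve then modulates how far the mouth actually opens. This is a minimal illustrative sketch; all names (`Deformer`, `BonePose`, `mouth_open`, and the parameter values) are assumptions, not taken from the patent.

```python
from dataclasses import dataclass

@dataclass
class BonePose:
    """Hypothetical skeleton model parameters for one frame."""
    mouth_open: float   # jaw opening, 0..1
    lip_width: float    # lip spread, 0..1

class Deformer:
    """Holds a mapping pool from phonemes to skeleton parameters."""
    def __init__(self, mapping_pool: dict[str, BonePose]):
        self.mapping_pool = mapping_pool

    def map_phoneme(self, phoneme: str, amplitude: float) -> BonePose:
        base = self.mapping_pool.get(phoneme, BonePose(0.0, 0.0))
        # The amplitude curve scales the mouth opening per phoneme,
        # as in the claimed adjustment of the facial mouth shape image.
        return BonePose(base.mouth_open * amplitude, base.lip_width)

deformer = Deformer({"a": BonePose(1.0, 0.5), "b": BonePose(0.1, 0.3)})
frames = [deformer.map_phoneme(p, amp) for p, amp in [("b", 0.8), ("a", 0.6)]]
print([round(f.mouth_open, 2) for f in frames])  # [0.08, 0.6]
```

Reusing one mapping pool across many characters mirrors claim 4, where a skeleton model bound to a deformer template is shared by a plurality of virtual objects.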

Description

Method and device for generating mouth shape image of virtual object

Technical Field

The present invention relates to the field of image technologies, and in particular to a method and an apparatus for generating a mouth shape image of a virtual object.

Background

In scenes such as games, videos, and live webcasts, mouth shape animation matching the character's audio needs to be produced for a virtual character, so that the mouth movements in the animation match the pronunciation in the audio and the virtual character appears more realistic. Virtual characters are, for example, game characters, characters in film and television works, and the avatars of live-stream hosts. Most related technologies do not support Chinese pronunciation rules, which results in poor mouth shape animation for virtual characters, so manual production of virtual character mouth shape animation by technicians remains the main approach. In this production scheme, technicians collect actors' facial data through facial motion capture and then produce the mouth shape animation based on that data, combined with the design of the virtual character. This way of generating mouth shape animation has a low degree of automation and poor production efficiency, and is ill-suited to large-scale generation of virtual character mouth shape animation. In summary, how to automatically generate mouth animation for virtual characters is a technical problem to be solved.
Disclosure of Invention

Embodiments of the invention provide a method and a device for generating a mouth shape image of a virtual object, which automatically generate the mouth shape image, greatly improve generation efficiency, improve the synchronization and accuracy between the mouth shape image and the dubbing material, and optimize the audiovisual effect of the mouth shape image.

In a first aspect, an embodiment of the present invention provides a method for generating a mouth shape image of a virtual object, comprising: acquiring dubbing material to be processed, wherein the dubbing material comprises audio data and/or text data corresponding to a virtual object; obtaining a deformer matched with the virtual object from a preset deformer template, wherein the deformer comprises a mapping relation between pronunciation mouth shapes and a skeleton model, and the pronunciation mouth shapes comprise initial consonant mouth shapes and/or final mouth shapes built by combination based on Chinese pinyin rules; generating an amplitude curve corresponding to the pronunciation mouth shapes based on the dubbing material, wherein the amplitude curve indicates the audio amplitude corresponding to each phoneme in the dubbing material, and each phoneme corresponds one-to-one with an initial consonant mouth shape and/or a final mouth shape among the pronunciation mouth shapes; and mapping the dubbing material into a skeleton model of the virtual object through the deformer, generating a facial mouth shape image synchronized with the dubbing material, and adjusting the facial mouth shape image into the mouth shape image of the virtual object through the amplitude curve.
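The patent does not specify how the amplitude curve is computed from the audio data. As one hedged illustration only, per-frame RMS energy normalized to the range 0..1 is a plausible measure of the audio amplitude per phoneme; the function name and the choice of RMS are assumptions:

```python
import math

def amplitude_curve(samples: list[float], frame_size: int) -> list[float]:
    """Illustrative sketch: per-frame RMS energy of raw audio samples,
    normalized so the loudest frame maps to 1.0 (assumed measure, not
    specified by the patent)."""
    frames = [samples[i:i + frame_size] for i in range(0, len(samples), frame_size)]
    rms = [math.sqrt(sum(s * s for s in f) / len(f)) for f in frames]
    peak = max(rms) or 1.0  # avoid division by zero on silence
    return [r / peak for r in rms]

curve = amplitude_curve([0.0, 0.0, 0.5, -0.5, 1.0, -1.0], frame_size=2)
print([round(v, 2) for v in curve])  # [0.0, 0.5, 1.0]
```

Such a curve could then be shown in the amplitude panel and edited by the user, as described for claim 5, to change how widely the mouth opens per frame.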
In one possible embodiment, obtaining the dubbing material to be processed includes: identifying a plurality of virtual objects from the audio data and/or the text data, and dividing out data fragments corresponding to each virtual object from the audio data and/or the text data as the dubbing material.

In one possible embodiment, obtaining a deformer matched with the virtual object from a preset deformer template includes: displaying at least one preset deformer template in a deformer panel, wherein each deformer template comprises a deformer and a corresponding mapping pool, and the mapping pool stores the mapping relation between at least one pronunciation mouth shape and at least one skeleton model; and in response to a selection instruction for a deformer, determining the skeleton model corresponding to the virtual object, and selecting from the at least one deformer template the deformer matched with that skeleton model.

In one possible embodiment, the method further comprises setting a corresponding skeleton model for the deformer in the deformer template, wherein the corresponding skeleton model is reused across a plurality of virtual objects.

In one possible embodiment, mapping the dubbing material into a skeleton model of the virtual object through the deformer, generating a facial mouth shape image synchronized with the dubbing material, and adjusting the facial mouth shape image into the mouth shape image of the virtual object through the amplitude curve, includes: recognizing each phone