
CN-122024695-A - Dubbing method and device and electronic equipment

CN122024695A

Abstract

The application discloses a dubbing method, a dubbing device, and electronic equipment. The method comprises: obtaining at least one target role in a material to be dubbed; matching a target role prototype corresponding to the target role from a role prototype library, wherein the role prototypes in the role prototype library are obtained by multi-level clustering of voiceprint features in a plurality of media materials, the plurality of media materials comprise a plurality of cross-episode or cross-season media materials featuring the same role in the same program, and the target role prototype is used for reflecting the target voiceprint features of the target role; determining target audio data corresponding to a target language among a plurality of audio data corresponding to the target role prototype, wherein the target language is the language in which the target role is to be dubbed; and dubbing the target role in the material to be dubbed with the target audio data. The application thereby addresses the technical problem of related-art dubbing methods, in which the sound image of the same character is difficult to keep stable in cross-episode or cross-season multi-language dubbing.
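A minimal illustrative sketch of the flow in the abstract: match a target role's voiceprint to the closest prototype in the library, then select the audio recorded in the target language. The library layout, the function names (`match_prototype`, `select_audio`), and the use of cosine similarity are assumptions for illustration, not details from the patent.

```python
# Illustrative sketch only: the patent does not specify this data layout or metric.
import numpy as np

def cosine_sim(a, b):
    # Cosine similarity between two voiceprint feature vectors.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical prototype library: prototype id -> (voiceprint vector, {language: audio clip}).
library = {
    "role_A": (np.array([1.0, 0.0, 0.2]), {"en": "A_en.wav", "fr": "A_fr.wav"}),
    "role_B": (np.array([0.0, 1.0, 0.1]), {"en": "B_en.wav"}),
}

def match_prototype(target_voiceprint, library):
    # Return the prototype whose voiceprint is most similar to the target role's.
    return max(library, key=lambda rid: cosine_sim(target_voiceprint, library[rid][0]))

def select_audio(prototype_id, target_language, library):
    # Pick the audio data recorded in the target language, if any exists.
    audio_by_lang = library[prototype_id][1]
    return audio_by_lang.get(target_language)

target = np.array([0.9, 0.1, 0.2])       # voiceprint of the role to be dubbed
pid = match_prototype(target, library)   # closest prototype in the library
clip = select_audio(pid, "fr", library)  # audio data in the target language
```

Keeping one stable prototype per role, rather than re-clustering per episode, is what lets the matched voice stay consistent across episodes and language versions.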

Inventors

  • ZHANG CHAO
  • WANG JINGFEI
  • PENG YI

Assignees

  • Hunan Happy Sunshine Interactive Entertainment Media Co., Ltd. (湖南快乐阳光互动娱乐传媒有限公司)

Dates

Publication Date
2026-05-12
Application Date
2026-02-14

Claims (10)

  1. A dubbing method, comprising: acquiring at least one target role in a material to be dubbed; matching a target role prototype corresponding to the target role from a role prototype library, wherein the role prototypes in the role prototype library are obtained by multi-level clustering of voiceprint features in a plurality of media materials, the plurality of media materials comprise a plurality of cross-episode or cross-season media materials featuring the same role in the same program, and the target role prototype is used for reflecting the target voiceprint features of the target role; determining target audio data corresponding to a target language among a plurality of audio data corresponding to the target role prototype, wherein the target language is the language in which the target role is to be dubbed; and dubbing the target role in the material to be dubbed with the target audio data.
  2. The method of claim 1, wherein the role prototype library is constructed by: acquiring first role prototypes of a plurality of roles respectively corresponding to a plurality of media materials, wherein the first role prototypes are used for reflecting voiceprint features of the roles in each media material; clustering a plurality of first role prototypes corresponding to media materials of the same program to obtain a second role prototype, wherein the second role prototype is used for reflecting voiceprint features of the roles in each program; clustering a plurality of second role prototypes corresponding to the same role in different programs to obtain a third role prototype, wherein the third role prototype is used for reflecting the voiceprint features corresponding to the role across different programs; and determining the role prototype library from the first role prototypes, the second role prototypes, and the third role prototypes.
  3. The method of claim 2, wherein obtaining the first role prototypes of the plurality of roles respectively corresponding to the plurality of media materials comprises: obtaining all first voiceprint feature vectors corresponding to a target media material from a voiceprint feature table, wherein the voiceprint feature table is used for storing voiceprint feature vectors of voice paragraphs corresponding to historical media materials, and the target media material is any one of the plurality of media materials; and clustering the first voiceprint feature vectors to obtain the first role prototype corresponding to the first role in the target media material.
  4. The method of claim 2, wherein clustering the plurality of first role prototypes corresponding to the media materials of the same program to obtain a second role prototype comprises: acquiring a first role prototype set corresponding to a target program and prior information of the target program, wherein the prior information is used for reflecting the role type and the appearance mode of a second role in the target program; determining a similarity graph based on the first role prototype set and the prior information, wherein nodes in the similarity graph are the first role prototypes corresponding to the target program, and edges in the similarity graph are similarity scores between two nodes determined based on the prior information; adjusting the similarity graph by adopting a history editing sample to obtain a target similarity graph, wherein the history editing sample is used for reflecting role identity judgments made for the second role in historical edits; and clustering the target similarity graph to obtain the second role prototype corresponding to the second role in the target program.
  5. The method of claim 2, wherein clustering the plurality of second role prototypes corresponding to the same role in different programs to obtain a third role prototype comprises: acquiring a second role prototype set corresponding to any third role in different programs; determining the distance between the second voiceprint feature vectors of any two second role prototypes in the second role prototype set to obtain a voiceprint feature vector distance matrix, wherein the distance is used for quantitatively representing the degree of similarity between the second voiceprint feature vectors; and clustering the second role prototype set based on the voiceprint feature vector distance matrix to obtain the third role prototype corresponding to the third role.
  6. The method of claim 4, wherein the prior information is determined by: acquiring media information of the target program, wherein the media information is used for reflecting the program type of the target program and identity information of the second role in the target program; analyzing the caption text of the target program to obtain an identification result, wherein the identification result is used for reflecting the speaking mode of the second role; and determining the role type and the appearance mode of the second role based on the media information and the identification result to obtain the prior information.
  7. The method according to claim 1, further comprising: acquiring an editing log of a target object, wherein the editing log is used for storing editing operation records for clustering results of the multi-level clustering; determining an editing sample based on the editing operation records, wherein the editing sample comprises a first editing sample indicating that the target object performed a merge-role-prototype operation and a second editing sample indicating that the target object performed a split-role-prototype operation; and updating the clustering parameters corresponding to different program types and/or different role types by adopting the editing sample.
  8. The method of claim 1, wherein after matching the target role prototype corresponding to the target role from the role prototype library, the method further comprises: determining an initial confidence corresponding to a target role type of the target role, wherein the initial confidence is used for quantitatively representing the stability of the target role type; determining a target confidence of the target role prototype based on the initial confidence, wherein the target confidence is used for quantitatively representing the degree of matching between the target role and the target role prototype; and determining whether to add the target role prototype to an audit list based on the target confidence.
  9. A dubbing apparatus, comprising: an acquisition module configured to acquire at least one target role in a material to be dubbed; a matching module configured to match a target role prototype corresponding to the target role from a role prototype library, wherein the role prototypes in the role prototype library are obtained by multi-level clustering of voiceprint features in a plurality of media materials, the plurality of media materials comprise a plurality of cross-episode or cross-season media materials featuring the same role in the same program, and the target role prototype is used for reflecting the target voiceprint features of the target role; a determining module configured to determine target audio data corresponding to a target language from a plurality of audio data corresponding to the target role prototype, wherein the target language is the language in which the target role is to be dubbed; and a dubbing module configured to dub the target role in the material to be dubbed with the target audio data.
  10. An electronic device, comprising a memory for storing program instructions and a processor coupled to the memory for performing the dubbing method as claimed in any one of claims 1 to 8.
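The three-level prototype construction of claims 2, 3, and 5 — clustering voiceprint vectors within one media material, clustering the resulting prototypes across the episodes of one program, then clustering across programs — can be sketched roughly as below. The agglomerative clustering, cosine metric, thresholds, and centroid-style prototypes are illustrative assumptions; the claims do not fix a particular clustering algorithm.

```python
# Illustrative three-level clustering sketch; algorithm choices are assumptions.
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

def cluster_vectors(vectors, threshold):
    """Agglomerative clustering of voiceprint vectors; returns one
    centroid ("prototype") per resulting cluster."""
    if len(vectors) == 1:
        return [vectors[0]]
    Z = linkage(np.stack(vectors), method="average", metric="cosine")
    labels = fcluster(Z, t=threshold, criterion="distance")
    return [np.mean([v for v, l in zip(vectors, labels) if l == k], axis=0)
            for k in sorted(set(labels))]

# Level 1: voiceprint vectors of one media material -> first role prototypes.
episode_vectors = list(np.random.RandomState(0).randn(12, 8))
first = cluster_vectors(episode_vectors, threshold=0.9)

# Level 2: first prototypes from all episodes of one program -> second prototypes.
second = cluster_vectors(first, threshold=0.6)

# Level 3: second prototypes of the same role across programs -> third prototype.
third = cluster_vectors(second, threshold=0.4)
```

Each level compresses the previous one, so the number of prototypes can only shrink from level to level; the third-level prototype is the cross-program anchor that keeps a role's sound image stable.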

Description

Dubbing method and device and electronic equipment

Technical Field

The application relates to the technical field of audio processing, and in particular to a dubbing method, a dubbing device, and electronic equipment.

Background

With the development of the internet video industry, demand for diversified and internationalized video content is increasing. In particular, the production of multi-season, multi-episode, multi-language content such as variety shows, series, and short dramas requires that the sound image of a character remain consistent across episode numbers and language versions in order to improve content quality and audience experience. However, the dubbing method adopted in the related art performs one-off speaker clustering and character recognition only within a single-season variety show or series, making it difficult to keep the sound image of the same character stable in cross-episode or cross-season multi-language dubbing. No effective solution to this problem has yet been proposed.

Disclosure of Invention

The embodiments of the application provide a dubbing method, a dubbing device, and electronic equipment, which at least solve the technical problem that the dubbing method adopted in the related art struggles to keep the sound image of the same role stable in cross-episode or cross-season multi-language dubbing.
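The program-level clustering step of claim 4 — a similarity graph over first-level prototypes whose edges reflect prior information and are adjusted by historical editing samples — might be sketched as follows. Treating editor merges as must-link edges, editor splits as cannot-link edges, and clustering by thresholded connected components are assumptions for illustration; the patent does not specify these mechanics.

```python
# Illustrative similarity-graph clustering with historical-edit overrides.
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import connected_components

def cluster_with_edits(prototypes, prior_weight, merges, splits, threshold=0.8):
    """Cluster first-level prototypes via a similarity graph.

    prior_weight scales edge scores (a stand-in for prior information);
    merges/splits are historical edit samples that force edges on or off."""
    P = np.stack(prototypes)
    P = P / np.linalg.norm(P, axis=1, keepdims=True)
    sim = P @ P.T                       # cosine-similarity edge scores
    sim *= prior_weight                 # prior information adjusts the graph
    for i, j in merges:                 # editor merged these roles: must-link
        sim[i, j] = sim[j, i] = 1.0
    for i, j in splits:                 # editor split these roles: cannot-link
        sim[i, j] = sim[j, i] = 0.0
    adj = csr_matrix(sim >= threshold)  # keep only strong edges
    _, labels = connected_components(adj, directed=False)
    return labels

protos = [np.array([1.0, 0.0]), np.array([0.95, 0.05]), np.array([0.0, 1.0])]
# Prototypes 0 and 1 are acoustically near-identical, but a historical split
# edit keeps them as distinct roles.
labels = cluster_with_edits(protos, prior_weight=1.0, merges=[], splits=[(0, 1)])
```

Without the split sample, prototypes 0 and 1 would fall into the same component; the edit override is what lets human judgments from past editing rounds persist into later clustering runs, matching the editing-log feedback loop of claim 7.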
According to one aspect of the embodiments of the application, a dubbing method is provided, comprising: obtaining at least one target role in a material to be dubbed; matching a target role prototype corresponding to the target role from a role prototype library, wherein the role prototypes in the role prototype library are obtained by multi-level clustering of voiceprint features in a plurality of media materials, the plurality of media materials comprise a plurality of cross-episode or cross-season media materials featuring the same role in the same program, and the target role prototype is used for reflecting the target voiceprint features of the target role; determining target audio data corresponding to a target language among a plurality of audio data corresponding to the target role prototype, wherein the target language is the language in which the target role is to be dubbed; and dubbing the target role in the material to be dubbed with the target audio data.

In some embodiments of the application, the role prototype library is constructed by: obtaining first role prototypes of a plurality of roles respectively corresponding to a plurality of media materials, wherein the first role prototypes are used for reflecting voiceprint features of the roles in each media material; clustering the plurality of first role prototypes corresponding to media materials of the same program to obtain second role prototypes, wherein the second role prototypes are used for reflecting voiceprint features of the roles in each program; clustering the plurality of second role prototypes corresponding to the same role in different programs to obtain third role prototypes, wherein the third role prototypes are used for reflecting the voiceprint features commonly corresponding to the role across different programs; and determining the role prototype library from the first role prototypes, the second role prototypes, and the third role prototypes.

In some embodiments of the application, obtaining the first role prototypes of the plurality of roles respectively corresponding to the plurality of media materials comprises: obtaining all first voiceprint feature vectors corresponding to a target media material from a voiceprint feature table, wherein the voiceprint feature table is used for storing voiceprint feature vectors of voice paragraphs corresponding to historical media materials, and the target media material is any one of the plurality of media materials; and clustering the first voiceprint feature vectors to obtain the first role prototype corresponding to the first role in the target media material.

In some embodiments of the application, clustering the plurality of first role prototypes corresponding to media materials of the same program to obtain a second role prototype comprises: obtaining a first role prototype set corresponding to a target program and prior information of the target program, wherein the prior information is used for reflecting the role type and the appearance mode of a second role in the target program; determining a similarity graph based on the first role prototype set and the prior information, wherein nodes in the similarity graph are the first role prototypes corresponding to the target program, and edges in the similarity graph are similarity scores between two nodes determined based on the prior information; and adjusting the similarity graph by adopting a history editing sample