
CN-122027864-A - 2D digital teacher video course automatic generation method and system based on real teacher

CN 122027864 A

Abstract

The invention discloses a method and system for automatically generating 2D digital-teacher video courses based on a real teacher, in the technical field of digital education. It addresses the problems that existing educational video courses are costly and slow to shoot, difficult to iterate, and impossible to produce in batches, and that existing digital-human schemes teach unnaturally and lack teacher-specific functions. Through the coordinated operation of a teacher image acquisition and construction module, a teacher voice cloning and TTS text-reading module, a mouth-shape mapping and 2D expression driving module, an automatic course-script generation or import module, and a video rendering and course output module, the method generates highly consistent, highly natural 2D digital-teacher video courses from only a small amount of sample data of the real teacher. The invention greatly reduces course production cost and improves production efficiency; the generated courses closely reproduce the real teacher's style, are pedagogically friendly, and can meet the large-scale needs of universities and educational institutions.
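The five modules named in the abstract form a linear pipeline. The following is a minimal sketch of that orchestration; all function names, signatures, and return values are illustrative assumptions, not taken from the patent itself.

```python
# Hypothetical stand-ins for the five modules named in the abstract.
# Only the 30 s - 3 min audio constraint and the module order come from
# the patent; everything else here is an illustrative assumption.

def build_teacher_model(photo, expression_samples):
    """S1: build a 2D digital-teacher model from a small sample set."""
    return {"model": "2d-teacher", "expressions": list(expression_samples)}

def clone_voice(audio_seconds):
    """S2: clone the teacher's voice from 30 s - 3 min of audio."""
    assert 30 <= audio_seconds <= 180, "patent specifies 30 s to 3 min of audio"
    return {"voiceprint": "teacher-style"}

def load_script(source_text):
    """S4: import or auto-generate the course script."""
    return source_text.strip()

def drive_expressions(voice, script):
    """S3: map synthesized speech to viseme/expression keyframes."""
    return [("viseme", word) for word in script.split()]

def render_course(model, animation, background="whiteboard", resolution="1080P"):
    """S5: composite the digital teacher over a course background."""
    return {"model": model["model"], "frames": len(animation),
            "background": background, "resolution": resolution}

def generate_course(photo, samples, audio_seconds, source_text):
    model = build_teacher_model(photo, samples)
    voice = clone_voice(audio_seconds)
    script = load_script(source_text)
    animation = drive_expressions(voice, script)
    return render_course(model, animation)
```

For example, `generate_course("front.jpg", ["smile"], 60, "Hello class")` would return a rendered-course descriptor with one animation keyframe per script word.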

Inventors

  • YANG JING
  • LU MINGHUI
  • ZHU XINYUE
  • WANG RUNMIN
  • ZHANG SIQING

Assignees

  • Anhui Jingzhuo Information Technology Co., Ltd. (安徽省景卓信息技术有限公司)

Dates

Publication Date
2026-05-12
Application Date
2026-01-28

Claims (8)

  1. A method for automatically generating a high-naturalness 2D digital-teacher video course based on a real teacher, characterized by comprising the following steps: S1, teacher image acquisition and construction: collecting a frontal photo, a short video, facial feature points, lighting parameters, and expression samples of a real teacher, and generating a 2D digital-teacher model and a multi-expression framework, wherein the multi-expression framework includes mouth-open, mouth-closed, smiling, and emphasizing expressions; S2, teacher voice cloning and TTS text reading: collecting 30 seconds to 3 minutes of the teacher's audio, extracting voiceprint features, and generating speech that preserves the teacher's real intonation, carries the rhythm specific to teaching, and can adapt to different course scripts; S3, mouth-shape mapping and 2D expression driving: converting the audio generated in step S2 into a viseme mouth-shape sequence and an expression-change sequence, wherein the expression-change sequence is generated from the tone of the speech, forming a natural 2D digital-teacher expression animation; S4, automatic course-script generation or import: acquiring course content, whose sources include imported lesson-plan documents, text provided by the teacher, and AI-generated teaching scripts; and S5, video rendering and course output: compositing the digital-teacher footage obtained in step S3, adding a course background comprising a classroom, whiteboard, or PPT template, and outputting a 1080P/4K video course.
  2. The method according to claim 1, wherein in step S1 the collected facial feature points are used to accurately construct the facial structure of the 2D digital-teacher model, and the lighting parameters are used to ensure the model's visual consistency across different scenes.
  3. The method according to claim 1, wherein in step S2, when genuine voiceprint cloning is not performed, TTS voice synthesis is used instead to generate speech adapted to the course script.
  4. The method according to claim 1, wherein in step S3 the viseme mouth-shape sequence is precisely aligned with the time axis of the audio and the expression-change sequence is matched to tone changes in real time, achieving coordinated synchronization of mouth shape, expression, and audio.
  5. The method according to claim 1, wherein in step S4 the course content is supplemented or corrected by manual editing, knowledge-base extraction, or import through an external interface.
  6. The method according to claim 1, wherein in step S5 the digital-teacher footage and the course background are combined by an automatic fusion algorithm, so that the transition between the digital teacher and the background is natural and free of visual incongruity.
  7. A system for automatically generating 2D digital-teacher video courses based on a real teacher, characterized by comprising: S1, a teacher image acquisition and construction module, for acquiring a frontal photo or short video, facial feature points, lighting parameters, and expression samples of a real teacher and generating a 2D digital-teacher model and a multi-expression framework; S2, a teacher voice cloning and TTS text-reading module, for collecting the teacher's audio, extracting voiceprint features, generating teacher-style speech, and performing TTS text reading; S3, a mouth-shape mapping and 2D expression driving module, for converting the audio into viseme mouth-shape sequences and expression-change sequences to form the 2D digital-teacher expression animation; S4, an automatic course-script generation or import module, for acquiring course content, including imported lesson-plan documents, text provided by the teacher, and AI-generated teaching scripts; and S5, a video rendering and course output module, for compositing the digital-teacher footage, adding a course background, and outputting a video course.
  8. The system according to claim 7, wherein the 2D digital-teacher model can be replaced with a 3D digital-human model while the motion-driving logic remains consistent.
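The viseme-to-audio alignment described in claims 1 and 4 can be illustrated with a minimal sketch: given phoneme timings (as a forced aligner would produce from the synthesized audio), emit one viseme keyframe per video frame. The phoneme-to-viseme table below is a simplified assumption for illustration, not the patent's actual mapping.

```python
# Simplified illustrative phoneme-to-viseme table (NOT the patent's mapping).
PHONEME_TO_VISEME = {
    "AA": "open", "IY": "smile", "UW": "round",
    "M": "closed", "B": "closed", "P": "closed", "sil": "rest",
}

def viseme_track(phoneme_timings, fps=25):
    """Align visemes to the audio time axis, one viseme per video frame.

    phoneme_timings: list of (phoneme, start_sec, end_sec), sorted by start.
    Returns a list of viseme names, one per frame at the given frame rate.
    """
    if not phoneme_timings:
        return []
    end = phoneme_timings[-1][2]
    frames = []
    for i in range(int(end * fps)):
        t = i / fps
        current = "sil"  # default to silence between/outside phonemes
        for ph, start, stop in phoneme_timings:
            if start <= t < stop:
                current = ph
                break
        frames.append(PHONEME_TO_VISEME.get(current, "rest"))
    return frames
```

Because each frame is indexed off the audio clock rather than accumulated durations, the mouth shape cannot drift out of sync with the speech, which is the synchronization property claim 4 requires.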

Description

2D digital teacher video course automatic generation method and system based on real teacher

Technical Field

The invention relates to the technical field of digital education, and in particular to a method and system for automatically generating 2D digital-teacher video courses based on a real teacher.

Background

Existing educational video courses generally require real teachers to be filmed, which brings several industry pain points. First, shooting is costly and slow: production requires building sets and preparing lighting equipment, and filming long course content consumes substantial time and money. Second, course iteration is difficult: because teaching materials are updated every year, teachers must re-shoot courses, and the original footage is hard to reuse. Third, courses cannot be generated in batches: the same teacher cannot produce a large number of courses in a short time, so the course-volume demands of educational institutions are hard to meet. In addition, existing digital-human schemes suffer from inaccurate mouth shapes, stiff expressions, mismatched speech rates, and a lack of teaching characteristics, and thus struggle to meet the style requirements of a real classroom. Finally, the 2D digital humans currently on the market are mostly entertainment-oriented and lack optimization for educational visual expression, such as synchronization with board-writing rhythm or teacher-specific functions keyed to knowledge points. A method for automatically generating high-naturalness 2D digital-teacher video courses based on a real teacher is therefore urgently needed, to reduce shooting cost, improve production efficiency, and ensure that the teaching remains authentic, credible, and professional.
Disclosure of the Invention

The invention aims to provide a method and system for automatically generating 2D digital-teacher video courses based on a real teacher. The specific technical scheme is as follows.

1. Teacher image acquisition and construction: a frontal photo or short video of a real teacher is collected, along with facial feature points, lighting parameters, and expression samples. The facial feature points are used to precisely construct the facial structure of the 2D digital-teacher model, ensuring it matches the real teacher's facial features; the lighting parameters ensure the model's visual consistency across different courses and avoid abrupt lighting changes; the expression samples include mouth-open, mouth-closed, smiling, and emphasizing expressions, from which a multi-expression framework is generated as the basis for subsequent expression animation. The result is a 2D digital-teacher model that closely matches the real teacher's appearance.

2. Teacher voice cloning and TTS text reading: 30 seconds to 3 minutes of the teacher's audio is collected, the teacher's unique voiceprint features are obtained with a voiceprint feature extraction technique, and teacher-style speech is generated. This speech preserves the teacher's real intonation, includes the "pause → emphasize → explain" rhythm specific to teaching, and can adapt to different course scripts. If genuine voiceprint cloning is not performed, TTS voice synthesis can be used instead to generate speech that meets the course requirements.
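The voiceprint feature extraction in step 2 can be sketched in miniature. The features below (per-frame log energy and spectral centroid, pooled into a fixed-length vector) are a deliberately simple stand-in for illustration; the patent does not disclose its actual extraction method, and production systems would use richer speaker embeddings.

```python
import numpy as np

# Illustrative voiceprint embedding, NOT the patent's extraction method:
# frame the waveform, compute per-frame log energy and spectral centroid,
# then pool (mean and std) into a fixed-length vector usable for matching.

def voiceprint_embedding(signal, sr=16000, frame_len=400, hop=160):
    n = 1 + max(0, (len(signal) - frame_len) // hop)
    feats = []
    for i in range(n):
        frame = signal[i * hop : i * hop + frame_len]
        energy = np.log(np.sum(frame ** 2) + 1e-10)
        spectrum = np.abs(np.fft.rfft(frame))
        freqs = np.fft.rfftfreq(frame_len, d=1.0 / sr)
        centroid = np.sum(freqs * spectrum) / (np.sum(spectrum) + 1e-10)
        feats.append((energy, centroid))
    feats = np.array(feats)
    # [mean energy, mean centroid, std energy, std centroid]
    return np.concatenate([feats.mean(axis=0), feats.std(axis=0)])
```

At 16 kHz with a 400-sample frame and 160-sample hop (25 ms / 10 ms), the patent's 30 s minimum yields roughly 3,000 frames, which is why even a short sample suffices to produce a stable pooled vector.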
3. Mouth-shape mapping and 2D expression driving: the teacher-style or TTS-synthesized speech generated in step 2 is converted into viseme mouth-shape sequences and expression-change sequences. The viseme mouth-shape sequence is precisely aligned with the audio's time axis, ensuring the digital teacher's mouth shape is fully synchronized with the spoken pronunciation; the expression-change sequence is generated from the tone of the speech, matching tone and expression in real time and producing a natural, smooth 2D digital-teacher expression animation.

4. Automatic course-script generation or import: course content is acquired from multiple sources, including imported lesson-plan documents, text provided by the teacher, and teaching scripts automatically generated by AI. The content can additionally be supplemented or corrected by manual editing, knowledge-base extraction, or import through an external interface, ensuring its accuracy and completeness.

5. Video rendering and course output: compositing the digital teacher