CN-121985196-A - Interactive video and model generation method and device, electronic equipment and storage medium

CN121985196A

Abstract

The application discloses an interactive video generation method and apparatus, a model generation method and apparatus, an electronic device, and a storage medium. The interactive video generation method comprises: acquiring a digital person image and a scene file; generating corresponding interaction feature vectors according to the digital person image and the scene file, wherein the interaction feature vectors comprise a digital person interaction capability sub-vector, a scene attribute sub-vector, an interaction object feature sub-vector, and an interaction action sub-vector in the scene; and inputting the scene file and the interaction feature vectors into a trained interactive video model to obtain a target interactive video. The application ensures that the digital person has a definite interaction target and can precisely point to the interaction object, thereby improving video generation efficiency.

Inventors

  • Li Tao
  • Ruan Mengqing

Assignees

  • 郑州阿帕斯数云信息科技有限公司

Dates

Publication Date
2026-05-05
Application Date
2025-12-30

Claims (10)

  1. An interactive video generation method, comprising: acquiring a digital person image and a scene file; generating corresponding interaction feature vectors according to the digital person image and the scene file, wherein the interaction feature vectors comprise a digital person interaction capability sub-vector, a scene attribute sub-vector, an interaction object feature sub-vector, and an interaction action sub-vector in a scene; and inputting the scene file and the interaction feature vectors into a trained interactive video model to obtain a target interactive video.
  2. The method of claim 1, wherein the generating the corresponding interaction feature vectors according to the digital person image and the scene file comprises: generating the digital person interaction capability sub-vector according to the digital person image; generating the scene attribute sub-vector according to the scene file; extracting information of an interaction object in the scene file, and generating the interaction object feature sub-vector according to the information of the interaction object; generating the interaction action sub-vector according to a preset recommended interaction action; and generating the interaction feature vectors according to the digital person interaction capability sub-vector, the scene attribute sub-vector, the interaction object feature sub-vector, and the interaction action sub-vector.
  3. The method of claim 1, wherein the inputting the scene file and the interaction feature vectors into the trained interactive video model to obtain the target interactive video comprises: converting the interaction feature vectors into a character string in a predefined format, and wrapping the character string in the predefined format with a target marker to obtain a serialized interaction feature vector character string; splicing the serialized interaction feature vector character string, the scene file, and a video generation prompt to obtain a single text character string; and inputting the single text character string into the trained interactive video model to obtain the target interactive video.
  4. The method according to claim 2, wherein the inputting the scene file and the interaction feature vectors into the trained interactive video model to obtain the target interactive video further comprises: regenerating the corresponding interaction feature vectors and the target interactive video according to a modification made by a user, based on the target interactive video, to at least one of the digital person image, the scene file, and the recommended interaction action.
  5. A method for generating an interactive video model, comprising: obtaining a training data set, wherein the training data set comprises a plurality of groups of training data, each group of training data comprises a training scene file, training interaction feature vectors, and a corresponding training target interactive video, and the training interaction feature vectors comprise a training digital person interaction capability sub-vector, a training scene attribute sub-vector, a training interaction object feature sub-vector, and a training interaction action sub-vector in a training scene; inputting the training scene file and the training interaction feature vectors into an interactive video model to be trained to obtain a training candidate interactive video; and training the interactive video model to be trained according to the training candidate interactive video and the training target interactive video to obtain a trained interactive video model.
  6. The method of claim 5, wherein the obtaining a training data set comprises: generating a plurality of different training target interactive videos for a same training scene file and a plurality of different training interaction feature vectors.
  7. An interactive video generating apparatus, comprising: a first acquisition module, configured to acquire a digital person image and a scene file; a generation module, configured to generate corresponding interaction feature vectors according to the digital person image and the scene file, wherein the interaction feature vectors comprise a digital person interaction capability sub-vector, a scene attribute sub-vector, an interaction object feature sub-vector, and an interaction action sub-vector in a scene; and a first input module, configured to input the scene file and the interaction feature vectors into a trained interactive video model to obtain a target interactive video.
  8. An apparatus for generating an interactive video model, comprising: a second acquisition module, configured to obtain a training data set, wherein the training data set comprises a plurality of groups of training data, each group of training data comprises a training scene file, training interaction feature vectors, and a corresponding training target interactive video, and the training interaction feature vectors comprise a training digital person interaction capability sub-vector, a training scene attribute sub-vector, a training interaction object feature sub-vector, and a training interaction action sub-vector in a training scene; a second input module, configured to input the training scene file and the training interaction feature vectors into an interactive video model to be trained to obtain a training candidate interactive video; and a training module, configured to train the interactive video model to be trained according to the training candidate interactive video and the training target interactive video to obtain a trained interactive video model.
  9. An electronic device, comprising a processor, a memory, and a program or instructions stored in the memory and executable on the processor, the program or instructions, when executed by the processor, implementing the steps of the method according to any one of claims 1 to 4 or the steps of the method according to any one of claims 5 to 6.
  10. A readable storage medium, characterized in that the readable storage medium has stored thereon a program or instructions which, when executed by a processor, implement the steps of the method according to any one of claims 1 to 4 or the steps of the method according to any one of claims 5 to 6.
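The serialization and splicing described in claim 3 can be sketched as follows. This is a minimal illustration only: the JSON encoding, the `<IFV>` target marker, the sub-vector names, and the sample values are all assumptions for demonstration, not details taken from the patent.

```python
import json

def serialize_interaction_features(features: dict, marker: str = "<IFV>") -> str:
    """Convert the interaction feature vectors into a character string in a
    predefined format (JSON here, as an assumption) and wrap it with a
    target marker, yielding the serialized feature-vector string of claim 3."""
    payload = json.dumps(features, ensure_ascii=False)
    closing = marker.replace("<", "</", 1)  # "<IFV>" -> "</IFV>"
    return f"{marker}{payload}{closing}"

def build_model_input(scene_file: str, features: dict, prompt: str) -> str:
    """Splice the serialized feature-vector string, the scene file text, and
    the video generation prompt into the single text string fed to the model."""
    return "\n".join([serialize_interaction_features(features), scene_file, prompt])

# Hypothetical four sub-vectors named in claim 1, with placeholder values.
features = {
    "capability": [0.8, 0.6],  # digital person interaction capability sub-vector
    "scene": [0.1, 0.9],       # scene attribute sub-vector
    "object": [0.4, 0.2],      # interaction object feature sub-vector
    "action": [0.7, 0.3],      # interaction action sub-vector
}
text = build_model_input(
    "A showroom with a product stand.", features, "Generate an interactive video."
)
```

The single string `text` would then be passed to the trained interactive video model; the wrapping marker lets the model distinguish the structured feature payload from the free-form scene text.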

Description

Interactive video and model generation method and device, electronic equipment and storage medium

Technical Field

The application belongs to the technical field of artificial intelligence, and particularly relates to a method and a device for generating an interactive video and a model, electronic equipment, and a storage medium.

Background

With the development of artificial intelligence (AI) technology, digital person generation technology can quickly generate high-precision digital person images and voice videos from a single photo or video, and video generation technology can generate virtual scene videos conforming to a scene setting according to a scene text description. In the related art, a digital person and a scene are overlaid to generate a digital person scene video. However, this scheme has the following problems: 1) a core interaction object (such as an app icon or product device) cannot be automatically identified from the scene text description, so the digital person has no explicit interaction target; 2) the interaction actions of the digital person (pointing, touching, displaying, and the like) lack spatial calibration and logical binding with the interaction object, so defects such as "blank interaction" and "mismatch between action and object size" occur frequently, for example the digital person's finger not pointing accurately at the icon position; and 3) the split between text description, scene, and interaction cannot be automatically converted into a closed-loop flow of "scene generation + object creation + digital person interaction", so object parameters and action instructions must be configured manually, which is complex to operate and inefficient.
Disclosure of Invention

The embodiments of the application aim to provide a method and a device for generating an interactive video and a model, electronic equipment, and a storage medium, so as to solve the problems in the related art that a digital person has no definite interaction target, cannot precisely point to an interaction object, and that operation is complex and inefficient. To achieve the above purpose, the embodiments of the present application adopt the following technical scheme.

In a first aspect, an embodiment of the application provides a method for generating an interactive video, which comprises: obtaining a digital person image and a scene file; generating corresponding interaction feature vectors according to the digital person image and the scene file, wherein the interaction feature vectors comprise a digital person interaction capability sub-vector, a scene attribute sub-vector, an interaction object feature sub-vector, and an interaction action sub-vector in a scene; and inputting the scene file and the interaction feature vectors into a trained interactive video model to obtain a target interactive video.
In a second aspect, an embodiment of the present application provides a method for generating an interactive video model, including: obtaining a training data set, where the training data set includes a plurality of groups of training data, each group of training data includes a training scene file, training interaction feature vectors, and a corresponding training target interactive video, and the training interaction feature vectors include a training digital person interaction capability sub-vector, a training scene attribute sub-vector, a training interaction object feature sub-vector, and a training interaction action sub-vector in a training scene; inputting the training scene file and the training interaction feature vectors into an interactive video model to be trained to obtain a training candidate interactive video; and training the interactive video model to be trained according to the training candidate interactive video and the training target interactive video to obtain a trained interactive video model.

In a third aspect, an embodiment of the application provides an interactive video generating apparatus, which comprises a first acquisition module, a generation module, and a first input module, wherein the first acquisition module is configured to acquire a digital person image and a scene file; the generation module is configured to generate corresponding interaction feature vectors according to the digital person image and the scene file, the interaction feature vectors comprising a digital person interaction capability sub-vector, a scene attribute sub-vector, an interaction object feature sub-vector, and an interaction action sub-vector in a scene; and the first input module is configured to input the scene file and the interaction feature vectors into a trained interactive video model to obtain a target interactive video.
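The training loop of the second aspect (generate a candidate, compare to the target, update the model) can be sketched in miniature. The patent does not specify the model architecture or loss, so the scalar "video score", squared-error loss, and gradient-descent step below are purely illustrative stand-ins for whatever video generator and objective are actually used.

```python
def train_interactive_video_model(training_data, weight=0.0, lr=0.01, epochs=200):
    """Toy training loop mirroring the second aspect: each sample pairs a
    training interaction feature vector with a training target interactive
    video (reduced here to a scalar score, an assumption for illustration).

    For every sample, a training candidate "video" is produced from the
    features, compared against the training target, and the model parameter
    is updated to reduce the squared error.
    """
    for _ in range(epochs):
        for features, target in training_data:
            candidate = weight * sum(features)            # training candidate
            grad = 2 * (candidate - target) * sum(features)  # d(loss)/d(weight)
            weight -= lr * grad                           # gradient-descent update
    return weight

# Two hypothetical samples; both feature sums are 3.0 and targets are 6.0,
# so the loop should drive the weight toward 2.0.
data = [([1.0, 2.0], 6.0), ([2.0, 1.0], 6.0)]
trained_weight = train_interactive_video_model(data)
```

In the actual system the update step would be backpropagation through a video generation model, and the comparison would use a perceptual or reconstruction loss over video frames rather than a scalar difference; the sketch only shows the shape of the loop described in claim 5.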
In a fourth aspect, an embodiment of the present application provides a generating device of an interactive video model, including a second obtaining module, configured to obtain a training data