CN-116665696-B - Piano playing video generation method and device, computer equipment and storage medium


Abstract

The present invention relates to the field of speech analysis, and in particular to a piano playing video generation method and apparatus, a computer device, and a storage medium. The method comprises: obtaining audio stream data; inputting the audio stream data into an audio encoder for encoding to obtain audio codes; transcoding the audio codes through a piano video transcoding model to obtain a piano video codebook sequence; decoding the piano video codebook sequence through a piano video codebook decoder to obtain piano video stream data, wherein the piano video stream data is a video stream of hands playing, on a piano, the music corresponding to the audio codes; and combining the piano video stream data and the audio stream data to obtain a piano playing video. The invention converts the audio stream data into video stream data showing hands playing the piano, and finally generates a video containing both the audio and the hands playing that audio on the piano, which improves the video effect and quality as well as the user experience.

Inventors

KANG ZUHENG
PENG JUNQING
WANG JIANZONG
XIAO JING

Assignees

Ping An Technology (Shenzhen) Co., Ltd. (平安科技（深圳）有限公司)

Dates

Publication Date: 20260508
Application Date: 20230531

Claims (9)

1. A piano-playing video generation method, characterized by comprising: acquiring audio stream data; inputting the audio stream data into an audio encoder for encoding processing to obtain audio codes; transcoding the audio codes through a piano video transcoding model to obtain a piano video codebook sequence corresponding to the audio codes; decoding the piano video codebook sequence through a piano video codebook decoder to obtain piano video stream data, wherein the piano video stream data refers to a video stream of hands playing, on a piano, the music corresponding to the audio codes; and combining the piano video stream data and the audio stream data to obtain a piano playing video; wherein, before the piano video codebook sequence is decoded through the piano video codebook decoder to obtain the piano video stream data, the method comprises: acquiring a second piano video stream sample; extracting a first skeleton key point video stream from the second piano video stream sample through a hand model; encoding the second piano video stream sample through a codebook encoder to obtain a second video codebook sequence; quantizing the second video codebook sequence to obtain a video quantized codebook sequence; inputting the video quantized codebook sequence into an initial piano video codebook decoder for decoding processing to obtain a target piano video stream and a second skeleton key point video stream; determining a total loss value from the second piano video stream sample, the target piano video stream, the first skeleton key point video stream, the second video codebook sequence and the video quantized codebook sequence; and when the total loss value does not reach a preset convergence condition, iteratively updating initial parameters of the initial piano video codebook decoder until the total loss value reaches the preset convergence condition, and taking the converged initial piano video codebook decoder as the piano video codebook decoder.
2. The piano-playing video generation method of claim 1, wherein, before the transcoding of the audio codes through the piano video transcoding model to obtain the piano video codebook sequence corresponding to the audio codes, the method comprises: acquiring a piano playing video sample; encoding the piano playing video sample to obtain a first video codebook sequence and audio sample codes; inputting the audio sample codes into an initial piano video transcoding model to obtain a target video codebook sequence; determining a video loss value from the first video codebook sequence and the target video codebook sequence; and when the video loss value does not reach a preset convergence condition, iteratively updating initial parameters of the initial piano video transcoding model until the video loss value reaches the preset convergence condition, and taking the converged initial piano video transcoding model as the piano video transcoding model.
3. The piano-playing video generation method of claim 2, wherein the inputting of the audio sample codes into the initial piano video transcoding model to obtain the target video codebook sequence comprises: inputting the audio sample codes, as first input data, into the initial piano video transcoding model to obtain a first-frame video codebook sequence; splicing the audio sample codes with the first-frame video codebook sequence to obtain second input data, and inputting the second input data into the initial piano video transcoding model to obtain a second-frame video codebook sequence; splicing the audio sample codes, the first-frame video codebook sequence and the second-frame video codebook sequence to obtain third input data, and inputting the third input data into the initial piano video transcoding model to obtain a third-frame video codebook sequence; and when the audio sample codes have all been converted into video codebook sequences, obtaining the target video codebook sequence.
4. The piano-playing video generation method of claim 2, wherein the encoding of the piano playing video sample to obtain the first video codebook sequence and the audio sample codes comprises: splitting the piano playing video sample to obtain a first piano video stream sample and an audio stream sample; inputting the first piano video stream sample into a codebook encoder to obtain the first video codebook sequence; and inputting the audio stream sample into the audio encoder to obtain the audio sample codes.
5. The piano-playing video generation method of claim 4, wherein the determining of the total loss value from the second piano video stream sample, the target piano video stream, the first skeleton key point video stream, the second video codebook sequence and the video quantized codebook sequence comprises: determining a first loss value from the second piano video stream sample and the target piano video stream; determining a second loss value from the first skeleton key point video stream and the second skeleton key point video stream; determining a third loss value from the second video codebook sequence and the video quantized codebook sequence; and determining the total loss value from the first loss value, the second loss value, and the third loss value.
6. A piano-playing video generation apparatus, characterized by comprising: an audio stream data module, configured to acquire audio stream data; an audio encoding module, configured to input the audio stream data into an audio encoder for encoding processing to obtain audio codes; a piano video codebook sequence module, configured to transcode the audio codes through a piano video transcoding model to obtain a piano video codebook sequence corresponding to the audio codes; a piano video stream data module, configured to decode the piano video codebook sequence through a piano video codebook decoder to obtain piano video stream data, wherein the piano video stream data refers to a video stream of hands playing, on a piano, the music corresponding to the audio codes; and a piano playing video module, configured to combine the piano video stream data and the audio stream data to obtain a piano playing video; wherein the piano-playing video generation apparatus further comprises: a second piano video stream sample module, configured to acquire a second piano video stream sample; a first skeleton key point video stream module, configured to extract a first skeleton key point video stream from the second piano video stream sample through a hand model; a second video codebook sequence module, configured to encode the second piano video stream sample through a codebook encoder to obtain a second video codebook sequence; a video quantized codebook sequence module, configured to quantize the second video codebook sequence to obtain a video quantized codebook sequence; a decoding processing module, configured to input the video quantized codebook sequence into an initial piano video codebook decoder for decoding processing to obtain a target piano video stream and a second skeleton key point video stream; a total loss value module, configured to determine a total loss value from the second piano video stream sample, the target piano video stream, the first skeleton key point video stream, the second video codebook sequence and the video quantized codebook sequence; and a piano video codebook decoder module, configured to iteratively update the initial parameters of the initial piano video codebook decoder when the total loss value does not reach a preset convergence condition, until the total loss value reaches the preset convergence condition, and to take the converged initial piano video codebook decoder as the piano video codebook decoder.
7. The piano-playing video generation apparatus of claim 6, further comprising, preceding the piano video codebook sequence module: a piano playing video sample module, configured to acquire a piano playing video sample; an encoding processing module, configured to encode the piano playing video sample to obtain a first video codebook sequence and audio sample codes; a target video codebook sequence module, configured to input the audio sample codes into an initial piano video transcoding model to obtain a target video codebook sequence; a video loss value module, configured to determine a video loss value from the first video codebook sequence and the target video codebook sequence; and a piano video transcoding model module, configured to iteratively update the initial parameters of the initial piano video transcoding model when the video loss value does not reach a preset convergence condition, until the video loss value reaches the preset convergence condition, and to take the converged initial piano video transcoding model as the piano video transcoding model.
8. A computer device comprising a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor, wherein the processor, when executing the computer-readable instructions, implements the piano-playing video generation method of any one of claims 1 to 5.
9. One or more readable storage media storing computer-readable instructions that, when executed by one or more processors, cause the one or more processors to perform the piano-playing video generation method of any one of claims 1 to 5.
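
The frame-by-frame generation recited in claim 3 can be illustrated with a minimal Python sketch. It is not the patented implementation: `generate_codebook_sequence`, `toy_model`, the integer representation of codes, and the fixed `num_frames` stopping rule are all assumptions made for illustration (the claim stops once all audio sample codes have been converted, without stating how that is detected).

```python
from typing import Callable, List, Sequence

Code = int  # assumption: each audio code / codebook entry is one integer


def generate_codebook_sequence(
    audio_codes: Sequence[Code],
    model: Callable[[List[Code]], List[Code]],
    num_frames: int,
) -> List[List[Code]]:
    """Generate video codebook frames one at a time.

    Each step splices the audio codes with every frame produced so far
    and feeds the result to the model, mirroring the first/second/third
    input data of claim 3. `num_frames` is a hypothetical stopping rule.
    """
    frames: List[List[Code]] = []
    for _ in range(num_frames):
        # Code splicing: audio codes followed by all previous frames.
        model_input = list(audio_codes)
        for frame in frames:
            model_input.extend(frame)
        frames.append(model(model_input))
    return frames


def toy_model(model_input: List[Code]) -> List[Code]:
    # Stand-in for the transcoding model: a 2-entry "frame" derived from
    # the input length and sum, purely so the loop can be run.
    return [len(model_input), sum(model_input) % 7]


frames = generate_codebook_sequence([3, 1, 4], toy_model, num_frames=3)
print(frames)
```

Because each frame is appended to the next input, the model input grows by one frame per step, which is the splicing behavior the claim describes.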

Description

Piano playing video generation method and device, computer equipment and storage medium

Technical Field

The present invention relates to the field of speech analysis, and in particular to a piano playing video generation method and apparatus, a computer device, and a storage medium.

Background

With the development of science and technology and of the entertainment industry, people's entertainment has shifted from simple text reading and picture display to multimedia such as audio and video. In particular, with the rise of short videos, the demand for music videos is increasing. Music is currently obtained mainly through music software, which often provides only the audio of a piano performance, without a complete piano playing video showing hands playing the piano. Moreover, even when a corresponding piano playing video showing hands playing the piano is generated for a piece of piano music, such videos are usually produced by using speech to drive the motion of a virtual person, so the generated piano playing video is not smooth enough and is of poor quality, which affects the user experience.

Disclosure of Invention

Based on the foregoing, it is necessary to provide a piano playing video generation method and apparatus, a computer device, and a storage medium, so as to solve the problem of poor video effect in existing piano playing video generation technology.

A piano-playing video generation method, comprising: acquiring audio stream data; inputting the audio stream data into an audio encoder for encoding processing to obtain audio codes; transcoding the audio codes through a piano video transcoding model to obtain a piano video codebook sequence corresponding to the audio codes; decoding the piano video codebook sequence through a piano video codebook decoder to obtain piano video stream data, wherein the piano video stream data refers to a video stream of hands playing, on a piano, the music corresponding to the audio codes; and combining the piano video stream data and the audio stream data to obtain a piano playing video.

A piano-playing video generation apparatus, comprising: an audio stream data module, configured to acquire audio stream data; an audio encoding module, configured to input the audio stream data into an audio encoder for encoding processing to obtain audio codes; a piano video codebook sequence module, configured to transcode the audio codes through a piano video transcoding model to obtain a piano video codebook sequence corresponding to the audio codes; a piano video stream data module, configured to decode the piano video codebook sequence through a piano video codebook decoder to obtain piano video stream data, wherein the piano video stream data refers to a video stream of hands playing, on a piano, the music corresponding to the audio codes; and a piano playing video module, configured to combine the piano video stream data and the audio stream data to obtain a piano playing video.

A computer device comprising a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor, the processor implementing the piano-playing video generation method described above when executing the computer-readable instructions.
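
The four steps of the method summarized above (encode, transcode, decode, combine) can be sketched as a simple function composition. This is an illustrative sketch only: `generate_piano_video`, `PianoPlayingVideo`, and the three toy stand-in functions are hypothetical names, and the integer and string data types merely stand in for real audio and video streams.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class PianoPlayingVideo:
    video_frames: List[str]   # stand-in for decoded piano video stream data
    audio_samples: List[int]  # the original audio stream data


def generate_piano_video(
    audio_stream: List[int],
    audio_encoder: Callable[[List[int]], List[int]],
    transcoding_model: Callable[[List[int]], List[int]],
    codebook_decoder: Callable[[List[int]], List[str]],
) -> PianoPlayingVideo:
    # Step 1: encode the audio stream data into audio codes.
    audio_codes = audio_encoder(audio_stream)
    # Step 2: transcode audio codes into a piano video codebook sequence.
    codebook_sequence = transcoding_model(audio_codes)
    # Step 3: decode the codebook sequence into piano video stream data.
    video_frames = codebook_decoder(codebook_sequence)
    # Step 4: combine the video stream with the original audio stream.
    return PianoPlayingVideo(video_frames, list(audio_stream))


# Toy stand-ins (assumptions) so the pipeline can be run end to end.
def toy_audio_encoder(samples: List[int]) -> List[int]:
    return [abs(s) // 50 for s in samples]


def toy_transcoder(codes: List[int]) -> List[int]:
    return [c % 4 for c in codes]


def toy_decoder(sequence: List[int]) -> List[str]:
    return [f"frame-{index}" for index in sequence]


video = generate_piano_video(
    [120, -45, 300], toy_audio_encoder, toy_transcoder, toy_decoder
)
print(video)
```

Passing the three components in as callables keeps the sketch independent of any particular encoder, transcoding model, or decoder architecture, none of which is specified in this part of the text.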

One or more readable storage media storing computer-readable instructions that, when executed by one or more processors, cause the one or more processors to perform the piano-playing video generation method described above.

In the piano playing video generation method and apparatus, computer device, and storage medium described above, audio stream data is acquired; the audio stream data is input into an audio encoder for encoding to obtain audio codes; the audio codes are transcoded through a piano video transcoding model to obtain a piano video codebook sequence corresponding to the audio codes; the piano video codebook sequence is decoded through a piano video codebook decoder to obtain piano video stream data; and the piano video stream data and the audio stream data are combined to obtain a piano playing video. The invention encodes and transcodes the audio data into a piano video codebook sequence and decodes that sequence into piano video stream data (a video stream of hands playing, on a piano, the music corresponding to the audio codes). The piano video stream data and the audio stream data are then combined to obtain the piano playing video, so that the audio stream data is ultimately converted into a piano playing video containing both the audio and the hands playing that audio on the piano. The piano playing video generated from the audio stream data is smoother, which improves the video effect and quality as well as the user experience.

Drawings

In order to more