EP-4742679-A1 - DATA PROCESSING METHOD AND APPARATUS, AND ELECTRONIC DEVICE AND STORAGE MEDIUM


Abstract

Embodiments of the present disclosure provide a data processing method. The method comprises: in response to a first operation performed by a user, obtaining a first video, the first video comprising a first set of images and a first voice, and a timestamp of each frame image in the first set of images being the same as a timestamp of a corresponding first syllable in the first voice; and in response to a second operation performed by the user, playing a second video, the second video comprising the first set of images, a stretched first voice, and background music, a timestamp of a stretched first syllable in the stretched first voice being the same as a timestamp of a beat in the background music, and a timestamp of each frame image in the first set of images in the second video being the same as a timestamp of a corresponding stretched first syllable in the stretched first voice.
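The timestamp relationship described in the abstract can be illustrated with a small sketch. It is not the applicant's implementation: it assumes that each syllable onset has already been matched to a beat, and that frames lying between two syllables are moved by linear interpolation (the source only states that a frame shares the timestamp of its corresponding syllable). The function name `warp_time` and all sample values are hypothetical.

```python
from bisect import bisect_right


def warp_time(t, src, dst):
    """Map time t through the piecewise-linear warp sending src[i] to dst[i].

    src: syllable timestamps in the first video (sorted, seconds).
    dst: timestamps of the matched beats (same length, sorted).
    Linear interpolation between anchors is an assumption of this sketch.
    """
    if len(src) != len(dst) or len(src) < 2:
        raise ValueError("need at least two matching anchor points")
    # Locate the segment [src[i], src[i + 1]] that contains t (clamped at the ends).
    i = min(max(bisect_right(src, t) - 1, 0), len(src) - 2)
    frac = (t - src[i]) / (src[i + 1] - src[i])
    return dst[i] + frac * (dst[i + 1] - dst[i])


# Hypothetical data: syllable onsets before stretching and the beats they map to.
syllables = [0.0, 0.4, 0.9, 1.5]
beats = [0.0, 0.5, 1.0, 1.5]

# Frame timestamps from the first video; frames at 0.4 and 0.9 coincide with syllables.
frames = [0.0, 0.2, 0.4, 0.65, 0.9, 1.2, 1.5]
warped = [round(warp_time(t, syllables, beats), 3) for t in frames]
```

With these sample values, the frames that coincided with syllables at 0.4 s and 0.9 s land on the beats at 0.5 s and 1.0 s, so image and stretched voice stay synchronized in the second video.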

Inventors

ZHANG, YILIN
LI, BOCHEN
Yao, Shawn
SHEN, Lingxuan
PEI, Xiu
PAN, Xitong

Assignees

Lemon Inc.
Beijing Zitiao Network Technology Co., Ltd.

Dates

Publication Date: 2026-05-13
Application Date: 2024-06-06

Claims (16)

1. A data processing method, comprising: in response to a first operation performed by a user, obtaining a first video, the first video comprising a first set of images and a first voice, and a timestamp of each frame image in the first set of images being the same as a timestamp of a corresponding first syllable in the first voice; and in response to a second operation performed by the user, playing a second video, the second video comprising the first set of images, a stretched first voice, and background music, and a timestamp of a stretched first syllable in the stretched first voice being the same as a timestamp of a beat in the background music, and a timestamp of each frame image in the first set of images in the second video being the same as a timestamp of a corresponding stretched first syllable in the stretched first voice.
2. The method of claim 1, wherein in response to the first operation performed by the user, obtaining the first video comprises: in response to a video upload operation performed by the user, obtaining the first video; or in response to a recording operation performed by the user, obtaining the first video, wherein the first video comprises the first voice recorded by the user.
3. The method of claim 1, wherein in response to the second operation performed by the user, playing the second video comprises: in response to a selection, by the user, of one music style control among a plurality of music style controls, playing the second video, wherein the background music has the music style.
4. The method of any of claims 1 to 3, wherein: the background music comprises a first audio track and a second audio track; the first audio track comprises an instrumental audio track, and the second audio track comprises an audio track of the background music other than the instrumental audio track; and a timestamp of a stretched first syllable in the stretched first voice is the same as a timestamp of an instrumental sound in the first audio track, and a timestamp of a stretched first syllable in the stretched first voice is the same as a timestamp of a beat in the second audio track.
5. The method of claim 4, wherein: a timestamp of a first stressed syllable in each of voice segments in the stretched first voice is the same as a timestamp of a first beat of a bar in the second audio track, or is the same as a timestamp of a middle beat of the bar in the second audio track; and the voice segments are obtained by segmenting the first voice based on a silent fraction in the first voice; and the beat comprises a first beat of each bar in the second audio track and a middle beat of each bar in the second audio track.
6. The method of claim 5, wherein the method further comprises: determining audio energies of the respective first syllables in the first voice; for each of the first syllables in the first voice, judging whether the first syllable is a stressed syllable based on an audio energy of the first syllable and an audio energy of a first syllable adjacent to the first syllable; and determining a first stressed syllable in each of the voice segments based on whether each of the first syllables is a stressed syllable.
7. The method of claim 6, wherein the method further comprises: determining a start timestamp and an end timestamp of the second audio track based on a timestamp of the stretched first stressed syllable in a first one of the voice segments and a timestamp of the stretched first stressed syllable in a last one of the voice segments; and playing music in the second audio track in a loop based on the start timestamp and the end timestamp.
8. The method of claim 4, wherein the method further comprises: for a second syllable in the first voice, selecting, from two beats in the second audio track having timestamps closest to a timestamp of the second syllable before being stretched, a matched first beat for the second syllable, based on audio energies and timestamps of the two beats; and stretching a timestamp of the second syllable to be the same as a timestamp of the first beat; wherein the second syllable is a syllable among the respective first syllables of the first voice other than a first stressed syllable in each of voice segments, a first syllable in a first one of the voice segments, and a last syllable in a last one of the voice segments; and the voice segments are obtained by segmenting the first voice based on a silent fraction in the first voice.
9. The method of claim 8, wherein selecting, from the two beats in the second audio track having timestamps closest to the timestamp of the second syllable before being stretched, the matched first beat for the second syllable, based on audio energies and timestamps of the two beats, comprises: based on the audio energies of the two beats, a first time length required to stretch the timestamp of the second syllable before being stretched to the timestamps of the two beats respectively, and a variation of a second time length, selecting the matched first beat for the second syllable from the two beats; the second time length comprising a time length between the timestamp of the second syllable before being stretched and a first timestamp.
10. The method of claim 9, wherein, the first beat is a beat that is selected from the two beats in an order of audio energy magnitudes, that causes the first time length to be less than a length threshold value, and that causes the variation of the second time length to satisfy a variation requirement; and the first timestamp comprises a timestamp of a stretched first syllable located before the second syllable and/or a timestamp of a stretched first syllable located after the second syllable.
11. The method of claim 4, wherein the method further comprises: for a second syllable in the first voice, stretching, based on a stretching order of the respective second syllables, a timestamp of a second syllable that is earlier in the stretching order; stretching a timestamp of a second syllable that is later in the stretching order, based on a stretching amount of the timestamp of the second syllable that is earlier in the stretching order and the stretching order; and wherein the second syllable is a syllable among the respective first syllables of the first voice other than a first stressed syllable in each of voice segments, a first syllable in a first one of the voice segments, and a last syllable in a last one of the voice segments; and the voice segments are obtained by segmenting the first voice based on a silent fraction in the first voice.
12. The method of claim 11, wherein the method further comprises: determining, based on audio energies of the respective second syllables in the first voice and whether the respective second syllables are stressed syllables, a stretching order of the respective second syllables.
13. A data processing apparatus, comprising: an obtaining unit, configured to obtain a first video, in response to a first operation performed by a user, the first video comprising a first set of images and a first voice, and a timestamp of each frame image in the first set of images being the same as a timestamp of a corresponding first syllable in the first voice; a playing unit, configured to play a second video, in response to a second operation performed by the user, the second video comprising the first set of images, a stretched first voice, and background music, and a timestamp of a stretched first syllable in the stretched first voice being the same as a timestamp of a beat in the background music, and a timestamp of each frame image in the first set of images in the second video being the same as a timestamp of a corresponding stretched first syllable in the stretched first voice.
14. An electronic device, comprising: a processor; and a memory configured to store computer executable instructions, wherein the computer executable instructions, when executed, cause the processor to implement steps of the method of any of claims 1 to 12.
15. A computer readable storage medium for storing computer executable instructions, wherein the computer executable instructions, when executed by a processor, implement steps of the method of any of claims 1 to 12.
16. A computer program product stored in a non-transitory computer readable medium and comprising machine executable instructions, wherein the machine executable instructions, when executed, cause a machine to perform the method of any of claims 1 to 12.
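The stressed-syllable test and the beat selection recited in the dependent claims can be sketched roughly as follows. This is only one possible reading: the claims give no numeric values, so the energy ratio, the shift threshold, and the use of a gap ratio as the "variation requirement" are assumptions of this sketch, and the names `stressed_flags`, `first_stressed`, and `pick_beat` are hypothetical.

```python
def stressed_flags(energies, ratio=1.2):
    """Flag a syllable as stressed when its energy is at least `ratio` times
    the energy of every adjacent syllable (ratio is an assumed threshold)."""
    flags = []
    for i, e in enumerate(energies):
        neighbours = [energies[j] for j in (i - 1, i + 1) if 0 <= j < len(energies)]
        flags.append(bool(neighbours) and all(e >= ratio * n for n in neighbours))
    return flags


def first_stressed(flags):
    """Index of the first stressed syllable in a voice segment, or None."""
    return next((i for i, f in enumerate(flags) if f), None)


def pick_beat(syllable_t, prev_t, beats, max_shift=0.25, ratio_range=(0.5, 2.0)):
    """Choose a beat for a syllable from the two beats closest to it in time.

    beats: list of (timestamp, energy) pairs.
    prev_t: timestamp of the previous, already stretched syllable.
    Candidates are tried in descending energy order; one is accepted when the
    shift is below max_shift and the gap to prev_t changes by a factor inside
    ratio_range (both limits are assumptions, not values from the source).
    """
    old_gap = syllable_t - prev_t
    if old_gap <= 0:
        raise ValueError("syllable must come after the previous syllable")
    nearest = sorted(beats, key=lambda b: abs(b[0] - syllable_t))[:2]
    for beat_t, _energy in sorted(nearest, key=lambda b: b[1], reverse=True):
        new_gap = beat_t - prev_t
        if abs(beat_t - syllable_t) >= max_shift or new_gap <= 0:
            continue
        if ratio_range[0] <= new_gap / old_gap <= ratio_range[1]:
            return beat_t
    return None


# Hypothetical sample data.
flags = stressed_flags([0.2, 0.9, 0.3, 0.4, 1.0, 0.5])
track = [(0.0, 0.9), (0.5, 0.4), (1.0, 0.8), (1.5, 0.3)]
chosen = pick_beat(0.8, 0.5, track)
rejected = pick_beat(0.7, 0.5, track)
```

In this sample, the second and fifth syllables are flagged as stressed, the syllable at 0.8 s is matched to the beat at 1.0 s, and the syllable at 0.7 s finds no acceptable beat because the louder candidate is too far away and the nearer one would collapse the gap to the previous syllable.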

Description

CROSS-REFERENCE TO RELATED APPLICATION

The present application claims priority to Chinese Patent Application No. 202310829915.2, entitled "DATA PROCESSING METHOD AND APPARATUS, AND ELECTRONIC DEVICE AND STORAGE MEDIUM", filed on July 7, 2023, the disclosure of which is incorporated herein by reference in its entirety.

FIELD

The present disclosure relates to the field of audio and video processing, and in particular, to a data processing method, apparatus, electronic device, and storage medium.

BACKGROUND

Currently, there are various audio and video processing applications available on the market. Users can process recorded audio and video through such applications, for example, by adding props to the video or adding background music to the video.

SUMMARY

Embodiments of the present disclosure provide a data processing method, apparatus, electronic device, and storage medium, which are capable of converting voice into music having a certain musical style.

According to a first aspect, an embodiment of the present disclosure provides a data processing method, comprising: in response to a first operation performed by a user, obtaining a first video, the first video comprising a first set of images and a first voice, and a timestamp of each frame image in the first set of images being the same as a timestamp of a corresponding first syllable in the first voice; and in response to a second operation performed by the user, playing a second video, the second video comprising the first set of images, a stretched first voice, and background music, and a timestamp of a stretched first syllable in the stretched first voice being the same as a timestamp of a beat in the background music, and a timestamp of each frame image in the first set of images in the second video being the same as a timestamp of a corresponding stretched first syllable in the stretched first voice.

According to a second aspect, an embodiment of the present disclosure provides a data processing apparatus, comprising: an obtaining unit, configured to obtain a first video, in response to a first operation performed by a user, the first video comprising a first set of images and a first voice, and a timestamp of each frame image in the first set of images being the same as a timestamp of a corresponding first syllable in the first voice; and a playing unit, configured to play a second video, in response to a second operation performed by the user, the second video comprising the first set of images, a stretched first voice, and background music, and a timestamp of a stretched first syllable in the stretched first voice being the same as a timestamp of a beat in the background music, and a timestamp of each frame image in the first set of images in the second video being the same as a timestamp of a corresponding stretched first syllable in the stretched first voice.

According to a third aspect, an embodiment of the present disclosure provides an electronic device, comprising: a processor; and a memory configured to store computer executable instructions, wherein the computer executable instructions, when executed, cause the processor to implement steps of the method described in the first aspect.

According to a fourth aspect, an embodiment of the present disclosure provides a computer readable storage medium for storing computer executable instructions, and the computer executable instructions, when executed by a processor, implement steps of the method described in the first aspect.

In one or more embodiments of the present disclosure, a second video can be played based on a user operation. Compared with a first video, the second video includes not only a first set of images, but also a stretched first voice and background music. A timestamp of a stretched first syllable in the stretched first voice is the same as a timestamp of a beat in the background music, such that the stretched first voice and the background music are combined to form music with a certain musical style, thereby achieving the technical effect of converting voice into music. Furthermore, a timestamp of each frame image in the first set of images in the second video is the same as a timestamp of a corresponding stretched first syllable in the stretched first voice, such that each frame image is displayed in synchronization with the corresponding stretched first syllable during playback, thereby improving the user experience in viewing the video.

BRIEF DESCRIPTION OF THE DRAWINGS

In order to more clearly illustrate technical solutions in one or more embodiments of the present disclosure or in the prior art, the drawings required for use in the description of the embodiments or the prior art will be briefly introduced below. It should be understood that the drawings described below merely illustrate some embodiments disclosed herein, and those of ordinary skill in the art may also derive other drawings based on these drawings without any inventive effort. FIG. 1 is a schematic fl