US-12626725-B2 - Audio/video processing method and apparatus, device, and storage medium


Abstract

Provided are an audio/video processing method and apparatus, a device, and a storage medium. The method comprises: displaying text data corresponding to an audio/video to be edited, wherein the text data has a mapping relation with an audio/video timestamp of said audio/video; displaying said audio/video according to a time axis track; in response to a preset operation triggered for target text data in the text data, determining an audio/video timestamp corresponding to the target text data as a target audio/video timestamp; and processing, on the basis of the preset operation, an audio/video clip corresponding to the target audio/video timestamp in said audio/video.
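The abstract's core step is resolving a selection in the text data to a target audio/video timestamp through the text-to-timestamp mapping. The following is a minimal sketch of that step, not the patented implementation. It assumes the text data is a word-level transcript (for example, one produced by speech recognition) in which every word carries its own start and end time in seconds. The names `Word`, `build_text`, and `target_timestamp` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Word:
    """One transcript word and its position on the time axis (seconds)."""
    text: str
    start: float
    end: float


def build_text(words):
    """Join words with spaces and record each word's [char_start, char_end) span."""
    spans, pos = [], 0
    for w in words:
        spans.append((pos, pos + len(w.text)))
        pos += len(w.text) + 1  # +1 for the separating space
    return " ".join(w.text for w in words), spans


def target_timestamp(words, spans, sel_start, sel_end):
    """Map a character selection [sel_start, sel_end) to a (start, end) time range.

    Every word whose span overlaps the selection is included, so the result
    snaps to whole-word boundaries. Returns None if no word is selected.
    """
    hit = [i for i, (a, b) in enumerate(spans) if a < sel_end and sel_start < b]
    if not hit:
        return None
    return words[hit[0]].start, words[hit[-1]].end


words = [Word("so", 0.0, 0.3), Word("um", 0.3, 0.9), Word("welcome", 1.2, 1.8)]
text, spans = build_text(words)
print(text)                                  # so um welcome
print(target_timestamp(words, spans, 3, 5))  # (0.3, 0.9)
```

In this sketch, a selection that only partly covers a word still resolves to that whole word's clip. That is one plausible design choice; the source does not specify it.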

Inventors

Weiming Zheng
Cheng Li
Xuelun FU
Yixiu HUANG
RUI XIA
Xin Zheng
Lin Bao
Weisi Wang
Chen Ding

Assignees

BEIJING ZITIAO NETWORK TECHNOLOGY CO., LTD.

Dates

Publication Date: 20260512
Application Date: 20231222
Priority Date: 20210922

Claims (14)

1. An audio video processing method, the method comprising: presenting text data corresponding to an audio video to be edited, wherein the text data has a mapping relationship with an audio video timestamp of the audio video to be edited; presenting the audio video to be edited in accordance with a time axis track; determining an audio video timestamp corresponding to target text data in the text data as a target audio video timestamp, in response to a preset operation triggered for the target text data; and processing an audio video clip corresponding to the target audio video timestamp in the audio video to be edited based on the preset operation, wherein the preset operation is triggered by: presenting a first edit entry for invalid tone words or a preset silence clip, wherein the first edit entry corresponds to a first edit card, on which a one-click deletion control is provided; displaying the invalid tone words or the preset silence clip in the text data in accordance with a second display style, in response to a trigger operation for the first edit entry; and deleting the invalid tone words or the preset silence clip from the text data, in response to a trigger operation for the one-click deletion control.
2. The method according to claim 1, wherein the method further comprises: presenting a voice enhancement control on a second edit card; performing an enhancement processing on human voice in the audio video to be edited, in response to a trigger operation for the voice enhancement control.
3. The method according to claim 1, wherein the method further comprises: determining background music corresponding to the audio video to be edited based on at least one of a musical genre of the audio video to be edited or a content in the text data corresponding to the audio video to be edited; adding the background music to the audio video clip to be edited.
4. The method according to claim 1, wherein the method further comprises: presenting a loudness equalization control on a third edit card; performing a normalization processing on loudness of a volume in the audio video to be edited, in response to a trigger operation for the loudness equalization control.
5. The method according to claim 1, wherein the method further comprises: presenting an intelligent teaser control on a fourth edit card; adjusting a music volume and a human voice volume in an audio video clip in the audio video to be edited within a previous preset time period, in response to a trigger operation for the intelligent teaser control, so as to obtain an audio video clip with adjusted volume, wherein the music volume is inversely proportional to the human voice volume in the audio video clip with adjusted volume.
6. The method according to claim 1, wherein the preset operation comprises a selection operation, and the processing an audio video clip corresponding to the target audio video timestamp in the audio video to be edited based on the preset operation comprises: displaying the audio video clip corresponding to the target audio video timestamp in the audio video to be edited in accordance with a preset first display style.
7. The method according to claim 1, wherein the preset operation comprises a deletion operation, and the processing an audio video clip corresponding to the target audio video timestamp in the audio video to be edited based on the preset operation comprises: deleting the audio video clip corresponding to the target audio video timestamp in the audio video to be edited based on the deletion operation.
8. The method according to claim 1, wherein the preset operation comprises a modification operation, and the processing an audio video clip corresponding to the target audio video timestamp in the audio video to be edited based on the preset operation comprises: acquiring modified text data corresponding to the modification operation; generating an audio video clip based on the modified text data and tone color information in the audio video to be edited, as an audio video clip to be modified; performing replacement processing on the audio video clip corresponding to the target audio video timestamp in the audio video to be edited, by utilizing the audio video clip to be modified.
9. The method according to claim 1, wherein the method further comprises: upon receiving an addition operation for first text data in the text data, generating a first audio video clip based on the first text data and tone color information in the audio video to be edited; determining a first audio video timestamp corresponding to the first text data based on position information of the first text data in the text data; adding the first audio video clip to the audio video to be edited, based on the first audio video timestamp.
10. A non-transitory computer readable storage medium having stored therein instructions that, when being executed on a terminal device, cause the terminal device to implement a method comprising: presenting text data corresponding to an audio video to be edited, wherein the text data has a mapping relationship with an audio video timestamp of the audio video to be edited; presenting the audio video to be edited in accordance with a time axis track; determining an audio video timestamp corresponding to target text data in the text data as a target audio video timestamp, in response to a preset operation triggered for the target text data; and processing an audio video clip corresponding to the target audio video timestamp in the audio video to be edited based on the preset operation, wherein the preset operation is triggered by: presenting a first edit entry for invalid tone words or a preset silence clip, wherein the first edit entry corresponds to a first edit card, on which a one-click deletion control is provided; displaying the invalid tone words or the preset silence clip in the text data in accordance with a preset second display style, in response to a trigger operation for the first edit entry; and deleting the invalid tone words or the preset silence clip from the text data, in response to a trigger operation for the one-click deletion control.
11. The non-transitory computer readable storage medium according to claim 10, wherein the method further comprises: presenting a voice enhancement control on a second edit card; performing an enhancement processing on human voice in the audio video to be edited, in response to a trigger operation for the voice enhancement control.
12. The non-transitory computer readable storage medium according to claim 10, wherein the method further comprises: determining background music corresponding to the audio video to be edited based on at least one of a musical genre of the audio video to be edited or a content in the text data corresponding to the audio video to be edited; adding the background music to the audio video clip to be edited.
13. A device, comprising: a memory, a processor, and a computer program stored on the memory and executable on the processor, the processor, when executing the computer program, implementing a method comprising: presenting text data corresponding to an audio video to be edited, wherein the text data has a mapping relationship with an audio video timestamp of the audio video to be edited; presenting the audio video to be edited in accordance with a time axis track; determining an audio video timestamp corresponding to target text data in the text data as a target audio video timestamp, in response to a preset operation triggered for the target text data; and processing an audio video clip corresponding to the target audio video timestamp in the audio video to be edited based on the preset operation, wherein the preset operation is triggered by: presenting a first edit entry for invalid tone words or a preset silence clip, wherein the first edit entry corresponds to a first edit card, on which a one-click deletion control is provided; displaying the invalid tone words or the preset silence clip in the text data in accordance with a preset second display style, in response to a trigger operation for the first edit entry; and deleting the invalid tone words or the preset silence clip from the text data, in response to a trigger operation for the one-click deletion control.
14. The device according to claim 13, wherein the method further comprises: presenting a voice enhancement control on a second edit card; performing an enhancement processing on human voice in the audio video to be edited, in response to a trigger operation for the voice enhancement control.
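One way to read the intelligent teaser of claim 5 is as a form of ducking, in which the music level follows the inverse of the voice level over an opening window. The sketch below is a hypothetical interpretation, not the claimed implementation, and rests on these assumptions:

- Per-frame voice levels have already been measured.
- The "previous preset time period" is the first `preset_sec` seconds.
- `voice_gain`, `k`, and `max_music` are illustrative parameters that do not appear in the source.

Inside the window, the sketch keeps music gain × voice level equal to `k`. It caps the music gain where the voice is near silent, so the inverse proportion holds only where the cap is not reached.

```python
def intelligent_teaser(voice_levels, frame_sec, preset_sec,
                       voice_gain=2.0, k=0.5, max_music=1.0):
    """Return (voice_level, music_gain) per frame.

    Frames that start within the first preset_sec seconds get the voice boosted
    by voice_gain and a music gain of k / voice, so music * voice == k
    (inverse proportion) unless the cap max_music applies. Later frames keep
    their original voice level and unity music gain.
    """
    out = []
    for i, level in enumerate(voice_levels):
        if i * frame_sec >= preset_sec:
            out.append((level, 1.0))  # outside the teaser window: unchanged
            continue
        voice = level * voice_gain
        music = max_music if voice <= 0 else min(max_music, k / voice)
        out.append((voice, music))
    return out


frames = intelligent_teaser([0.0, 0.25, 0.5, 0.5], frame_sec=0.5, preset_sec=1.5)
print(frames)  # [(0.0, 1.0), (0.5, 1.0), (1.0, 0.5), (0.5, 1.0)]
```

The cap is an added safeguard: a strict k / voice rule would make the music grow without bound during silence.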

Description

CROSS-REFERENCE TO RELATED APPLICATIONS

The present application is a continuation application of International Application No. PCT/CN2022/116650, filed on Sep. 2, 2022, which claims priority to Chinese Patent Application No. 202111109213.4, entitled “AUDIO/VIDEO PROCESSING METHOD AND APPARATUS, DEVICE, AND STORAGE MEDIUM”, filed on Sep. 22, 2021. The disclosures of both applications are incorporated herein by reference in their entirety.

TECHNICAL FIELD

The present disclosure relates to the field of data processing, and in particular to an audio video processing method, apparatus, device, and storage medium.

BACKGROUND

As information on the internet grows ever more abundant, watching audio videos has become a common entertainment activity in people's daily lives. Audio video clipping is an important step before an audio video is published because it improves the viewing experience. At present, even a minor change such as cutting out invalid words requires the user to play the audio video repeatedly while finely adjusting the start and end times of the cut. These operations are cumbersome, and the accuracy of the resulting clipping leaves room for improvement.

SUMMARY

To solve, or at least partially solve, the above technical problem, an embodiment of the present disclosure provides an audio video processing method that can improve the accuracy of audio video clipping and simplify user operations.
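The background contrasts hand-tuning the start and end points of a cut with editing through the transcript. The following sketch shows how deleting a run of words can yield the exact cut interval without manual adjustment. It also shifts the later words so that the time axis stays consistent. The sketch is illustrative and not the disclosed method. It assumes word-level timestamps stored as `(text, start, end)` tuples in seconds, and `delete_words` is a hypothetical helper.

```python
def delete_words(words, first, last):
    """Delete words[first..last] (inclusive) and close the gap on the time axis.

    words: (text, start, end) tuples in seconds, sorted and non-overlapping.
    Returns the remaining words with shifted times and the removed interval.
    """
    cut_start, cut_end = words[first][1], words[last][2]
    shift = cut_end - cut_start
    kept = []
    for i, (text, start, end) in enumerate(words):
        if first <= i <= last:
            continue  # this word's clip is removed from the audio/video
        if start >= cut_end:
            start, end = start - shift, end - shift  # pull later words earlier
        kept.append((text, start, end))
    return kept, (cut_start, cut_end)


words = [("so", 0.0, 0.5), ("um", 0.5, 1.0), ("welcome", 1.5, 2.5)]
kept, cut = delete_words(words, 1, 1)
print(cut)   # (0.5, 1.0)
print(kept)  # [('so', 0.0, 0.5), ('welcome', 1.0, 2.0)]
```

The cut boundaries come directly from the word timestamps rather than from repeated listening. This is the accuracy gain that the summary attributes to text-based clipping.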
In a first aspect, the present disclosure provides an audio video processing method, including: presenting text data corresponding to an audio video to be edited, wherein the text data has a mapping relationship with an audio video timestamp of the audio video to be edited; presenting the audio video to be edited in accordance with a time axis track; determining an audio video timestamp corresponding to target text data in the text data as a target audio video timestamp, in response to a preset operation triggered for the target text data; and processing an audio video clip corresponding to the target audio video timestamp in the audio video to be edited based on the preset operation.

In an optional implementation, the method further includes: presenting a first edit entry for a preset keyword or a preset silence clip; and displaying the preset keyword or the preset silence clip in the text data in accordance with a preset second display style, in response to a trigger operation for the first edit entry.

In an optional implementation, the first edit entry corresponds to a first edit card, on which a one-click deletion control is provided. After the preset keyword or the preset silence clip is displayed in the text data in accordance with the preset second display style in response to the trigger operation for the first edit entry, the method further includes: deleting the preset keyword or the preset silence clip from the text data, in response to a trigger operation for the one-click deletion control.

In an optional implementation, the method further includes: presenting a voice enhancement control on a second edit card; and performing enhancement processing on human voice in the audio video to be edited, in response to a trigger operation for the voice enhancement control.
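The first edit entry and the one-click deletion control described above can be sketched in two steps. The first step finds the intervals to highlight in the second display style. The second step computes which time ranges survive the one-click deletion. The source does not specify how keywords or silences are detected, so the sketch rests on these assumptions:

- The text data is a word-level transcript of `(text, start, end)` tuples.
- `PRESET_KEYWORDS` is a hypothetical filler-word list.
- Any gap of at least `min_silence` seconds between words counts as a preset silence clip.

```python
PRESET_KEYWORDS = {"um", "uh", "er"}  # hypothetical filler-word list


def find_marks(words, total_sec, min_silence=1.0):
    """Return (start, end, label) intervals to show in the second display style."""
    marks, prev_end = [], 0.0
    for text, start, end in words:
        if start - prev_end >= min_silence:
            marks.append((prev_end, start, "silence"))  # long gap before this word
        if text.lower() in PRESET_KEYWORDS:
            marks.append((start, end, text))
        prev_end = end
    if total_sec - prev_end >= min_silence:
        marks.append((prev_end, total_sec, "silence"))  # trailing silence
    return marks


def one_click_delete(marks, total_sec):
    """Return the (start, end) ranges kept after deleting every marked interval."""
    keep, cursor = [], 0.0
    for start, end, _label in sorted(marks):
        if start > cursor:
            keep.append((cursor, start))
        cursor = max(cursor, end)
    if cursor < total_sec:
        keep.append((cursor, total_sec))
    return keep


words = [("so", 0.0, 0.5), ("um", 0.5, 1.0), ("welcome", 2.5, 3.0)]
marks = find_marks(words, total_sec=3.0)
print(marks)                         # [(0.5, 1.0, 'um'), (1.0, 2.5, 'silence')]
print(one_click_delete(marks, 3.0))  # [(0.0, 0.5), (2.5, 3.0)]
```

The sketch keeps marking and deletion as separate functions, mirroring the two user actions: the user can review the highlighted items before triggering the one-click deletion.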
In an optional implementation, the method further includes: determining background music corresponding to the audio video to be edited based on a musical genre of the audio video to be edited and/or a content in the text data corresponding to the audio video to be edited; and adding the background music to the audio video clip to be edited.

In an optional implementation, the method further includes: presenting a loudness equalization control on a third edit card; and performing normalization processing on the loudness of the volume in the audio video to be edited, in response to a trigger operation for the loudness equalization control.

In an optional implementation, the method further includes: presenting an intelligent teaser control on a fourth edit card; and adjusting a music volume and a human voice volume in an audio video clip of the audio video to be edited within a previous preset time period, in response to a trigger operation for the intelligent teaser control, so as to obtain an audio video clip with adjusted volume, wherein the music volume is inversely proportional to the human voice volume in the audio video clip with adjusted volume.

In an optional implementation, the preset operation includes a selection operation, and the processing an audio video clip corresponding to the target audio video timestamp in the audio video to be edited based on the preset operation includes: displaying the audio video clip corresponding to the target audio video timestamp in the audio video to be edited in accordance with a preset first display style.

In an optional implementation, the preset operation includes a deletion operation, and the processing an