CN-121983008-A - Method, apparatus, device, medium and program product for generating music podcast program

Abstract

The invention relates to the field of computer technology and provides a method, apparatus, device, medium, and program product for generating a music podcast program. The method first acquires a target song list and broadcasting style description information for broadcasting the target song list. It then generates a podcast manuscript based on the song information in the target song list and the broadcasting style description information, where the podcast manuscript contains song string words (the linking remarks spoken between songs) in the style corresponding to the broadcasting style description information. Finally, it converts the podcast manuscript into voice audio and synthesizes the voice audio with the song audio corresponding to the target song list to obtain the final music podcast program. The present disclosure can thereby generate a music podcast program with a particular style.

Inventors

  • ZHANG YUTAO
  • LI ZHAOMING
  • CHEN MIAORONG
  • CHEN JINHAI

Assignees

  • Hangzhou NetEase Cloud Music Technology Co., Ltd. (杭州网易云音乐科技有限公司)

Dates

Publication Date
2026-05-05
Application Date
2026-02-12

Claims (10)

  1. A method of generating a music podcast program, the method comprising: acquiring a target song list and broadcasting style description information for broadcasting the target song list; generating a podcast manuscript based on song information in the target song list and the broadcasting style description information, wherein the podcast manuscript comprises song string words in the style corresponding to the broadcasting style description information; and converting the podcast manuscript into voice audio, and synthesizing the voice audio with song audio corresponding to the target song list to obtain the music podcast program.
  2. The method of claim 1, wherein acquiring the target song list comprises: screening, from a preset music database, candidate songs matching the broadcasting style description information; and screening the candidate songs according to a preset screening rule to obtain the target song list.
  3. The method of claim 2, wherein after acquiring the target song list, the method further comprises: querying whether a music podcast program containing a candidate song has been generated based on the broadcasting style description information within a preset historical time period; and if so, removing that candidate song from the target song list.
  4. The method of claim 1, wherein generating the podcast manuscript based on the song information in the target song list and the broadcasting style description information comprises: constructing an input instruction, wherein the input instruction comprises the broadcasting style description information, a manuscript format constraint, and metadata of each song in the target song list, the metadata comprising at least a song name and a singer name; and inputting the input instruction into a text generation model to obtain the podcast manuscript.
  5. The method of claim 4, wherein inputting the input instruction into the text generation model to obtain the podcast manuscript comprises: packaging the metadata of all songs in the target song list into the same input instruction; and generating, through the text generation model, a podcast manuscript containing the string words of all songs in the target song list.
  6. The method of claim 1, wherein synthesizing the voice audio with the song audio corresponding to the target song list comprises: performing time sequence synthesis according to the playing sequence of songs in the target song list, and performing audio transition processing on at least one song audio to achieve a smooth transition between the voice audio and the song audio.
  7. An apparatus for generating a music podcast program, the apparatus comprising: an acquisition module configured to acquire a target song list and broadcasting style description information for broadcasting the target song list; a generation module configured to generate a podcast manuscript based on song information in the target song list and the broadcasting style description information, wherein the podcast manuscript comprises song string words in the style corresponding to the broadcasting style description information; and a synthesizing module configured to convert the podcast manuscript into voice audio and synthesize the voice audio with song audio corresponding to the target song list to obtain the music podcast program.
  8. An electronic device, comprising: a memory and a processor in communication with each other, the memory having stored therein computer instructions which, when executed by the processor, perform the method of generating a music podcast program of any one of claims 1 to 6.
  9. A computer readable storage medium having stored thereon computer instructions for causing a computer to perform the method of generating a music podcast program of any one of claims 1 to 6.
  10. A computer program product comprising computer instructions for causing a computer to perform the method of generating a music podcast program of any one of claims 1 to 6.
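
As an illustration of the time sequence synthesis and audio transition processing recited in claim 6, the following is a minimal sketch only, not the patented implementation. It plans where each voice clip and song would sit on a program timeline and how long each song fades in and out, so that speech overlaps the faded portion of a song. The names `Clip`, `Placement`, and `plan_timeline`, the 3-second default fade, and the rule that only songs are faded are all assumptions introduced for this example; actual mixing would additionally require an audio processing library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Clip:
    """One audio segment in playing order (hypothetical type)."""
    label: str
    kind: str        # "voice" or "song"
    duration: float  # seconds


@dataclass(frozen=True)
class Placement:
    """Where a clip sits on the program timeline, plus its fade lengths."""
    label: str
    start: float     # seconds from program start
    end: float
    fade_in: float   # seconds of volume ramp-up (songs only)
    fade_out: float  # seconds of volume ramp-down (songs only)


def plan_timeline(clips: list[Clip], fade: float = 3.0) -> list[Placement]:
    """Place clips in playing order, overlapping each song's fades with its neighbours."""
    placements: list[Placement] = []
    cursor = 0.0  # end time of the latest-ending clip placed so far
    for clip in clips:
        is_song = clip.kind == "song"
        # Assumption: only songs are faded, never for more than half their length.
        f = min(fade, clip.duration / 2) if is_song else 0.0
        overlap = 0.0
        if placements:
            prev = placements[-1]
            # A song fades in under the end of the previous clip; a voice
            # clip starts over the fade-out of the previous song.
            overlap = f if is_song else prev.fade_out
            overlap = min(overlap, prev.end - prev.start)
        start = cursor - overlap
        end = start + clip.duration
        placements.append(Placement(clip.label, start, end, f, f))
        cursor = max(cursor, end)
    return placements


if __name__ == "__main__":
    program = [
        Clip("opening remarks", "voice", 10.0),
        Clip("song 1", "song", 200.0),
        Clip("string words", "voice", 8.0),
        Clip("song 2", "song", 180.0),
    ]
    for p in plan_timeline(program):
        print(f"{p.label}: {p.start:.1f}s to {p.end:.1f}s, fade {p.fade_in:.1f}s")
```

Separating the timing plan from the actual mixing keeps the ordering logic easy to check before any audio is rendered.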

Description

Method, apparatus, device, medium and program product for generating music podcast program

Technical Field

The present disclosure relates to the field of computer technology, and in particular to a method, apparatus, device, medium, and program product for generating a music podcast program.

Background

With the rapid development of audio platforms, music podcasts have become an important channel through which users obtain music content. Music podcasts are currently produced mainly by hand: operators manually screen songs, write string words (the linking remarks spoken between songs), record the voice track, and then edit and splice the audio.
Partially automated content production schemes exist in the related art, but most are limited to filling fixed sentence templates, for example by simply embedding metadata such as song names and singers into slots in preset recommendation phrases, or can only generate a comment on a single song using simple rules. They cannot automatically generate a complete, logically coherent long podcast manuscript containing opening remarks, natural transitional string words between songs, and a closing. The resulting program content is highly homogeneous, lacks a host's distinctive personal style and emotional color, and struggles to meet listeners' demand for more diverse and richer content.
Therefore, a method for generating a music podcast program is needed to solve the problems of low production efficiency, a single content style, and severe homogeneity of music podcasts in the related art.

Disclosure of Invention

The present disclosure provides a method, apparatus, device, medium, and program product for generating a music podcast program to solve the problems of low production efficiency, a single content style, and severe homogeneity of music podcasts in the related art.
In a first aspect, the present disclosure provides a method of generating a music podcast program, the method comprising: acquiring a target song list and broadcasting style description information for broadcasting the target song list; generating a podcast manuscript based on song information in the target song list and the broadcasting style description information, wherein the podcast manuscript comprises song string words in the style corresponding to the broadcasting style description information; and converting the podcast manuscript into voice audio, and synthesizing the voice audio with song audio corresponding to the target song list to obtain the music podcast program.
In an alternative embodiment, acquiring the target song list includes: screening, from a preset music database, candidate songs matching the broadcasting style description information; and screening the candidate songs according to a preset screening rule to obtain the target song list.
In an alternative embodiment, after acquiring the target song list, the method further includes: querying whether a music podcast program containing a candidate song has been generated based on the broadcasting style description information within a preset historical time period; and if so, removing that candidate song from the target song list.
In an alternative embodiment, generating the podcast manuscript based on the song information in the target song list and the broadcasting style description information includes: constructing an input instruction, wherein the input instruction comprises the broadcasting style description information, a manuscript format constraint, and metadata of each song in the target song list, the metadata comprising at least a song name and a singer name; and inputting the input instruction into a text generation model to obtain the podcast manuscript.
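
The construction of the input instruction described above can be sketched as follows. This is a hedged illustration under stated assumptions, not the disclosed implementation: the `Song` type, the `build_input_instruction` function, and the plain-text layout of the instruction are hypothetical, and the call to an actual text generation model is deliberately left out because the disclosure does not name one.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Song:
    """Minimal song metadata: at least a song name and a singer name."""
    name: str
    singer: str


def build_input_instruction(
    style: str, format_constraint: str, songs: list[Song]
) -> str:
    """Combine the style description, the manuscript format constraint, and
    the metadata of every song into one instruction string (assumed layout)."""
    if not songs:
        raise ValueError("the target song list must not be empty")
    song_lines = [
        f"{i}. {song.name} - {song.singer}"
        for i, song in enumerate(songs, start=1)
    ]
    return "\n".join(
        [
            f"Broadcasting style: {style}",
            f"Manuscript format: {format_constraint}",
            "Songs, in playing order:",
            *song_lines,
        ]
    )


if __name__ == "__main__":
    playlist = [Song("Song A", "Singer X"), Song("Song B", "Singer Y")]
    instruction = build_input_instruction(
        "warm late-night radio host",
        "opening remarks, string words before each song, closing remarks",
        playlist,
    )
    # The instruction would then be passed to a text generation model.
    print(instruction)
```

Keeping every song in a single instruction gives the model the whole playing order at once, which is what allows it to write transitions that refer to both the previous and the next song.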
In an alternative embodiment, inputting the input instruction into the text generation model to obtain the podcast manuscript includes: packaging the metadata of all songs in the target song list into the same input instruction; and generating, through the text generation model, a podcast manuscript containing the string words of all songs in the target song list.
In an alternative embodiment, synthesizing the voice audio with the song audio corresponding to the target song list includes: performing time sequence synthesis according to the playing sequence of songs in the target song list, and performing audio transition processing on at least one song audio to achieve a smooth transition between the voice audio and the song audio.
In an alternative embodiment, performing time sequence synthesis according to the playing sequence of songs in the target song list and performing audio transition processing on at least one song audio includes: determining the entry time period and the exit time period of each song audio according to the playing sequence of songs in the target song list; executing volume gradual change processin