CN-121999740-A - AI-generation-based collaborative singing interaction method and system for children, electronic equipment and readable storage medium

Abstract

The application discloses an AI-generation-based collaborative singing interaction method and system for children, electronic equipment and a readable storage medium. The method comprises: collecting child language text and/or child voice audio; generating a target children's song from the child language text and/or the child voice audio by using an AI children's song lightweight generation model; collecting real-time audio of at least two persons collaboratively singing the target children's song; detecting the pitch deviation, volume stability and vocal range matching degree of the real-time audio; allocating singing roles corresponding to the real-time audio according to the pitch deviation, the volume stability and the vocal range matching degree; and playing the target children's song, collecting real-time audio of the target children's song being sung, and performing audio-visual linkage rendering on the real-time audio. The application improves children's experience of multi-person collaborative singing interaction.

Inventors

Wu Aobing
Ye Sheng
Ye Ziyun
Ye Taoying
Huang Zihao

Assignees

广东晔生科技股份有限公司

Dates

Publication Date: 2026-05-08
Application Date: 2026-03-20

Claims (10)

1. An AI-generation-based collaborative singing interaction method for children, characterized by comprising the following steps: collecting child language text and/or child voice audio; generating a target children's song from the child language text and/or the child voice audio by using an AI children's song lightweight generation model; collecting real-time audio of at least two children collaboratively singing the target children's song; detecting the pitch deviation, volume stability and vocal range matching degree of the real-time audio; allocating singing roles corresponding to the real-time audio according to the pitch deviation, the volume stability and the vocal range matching degree; and playing the generated target children's song, collecting real-time audio of the target children's song being sung, and performing audio-visual linkage rendering on the real-time audio.
2. The AI-generation-based collaborative singing interaction method for children according to claim 1, wherein the AI children's song lightweight generation model performs semantic understanding of the creative content contained in the input child language text and/or child voice audio, and then invokes a preset children's song library to generate, through template matching and rule constraints, a children's song consistent with the semantically understood content.
3. The AI-generation-based collaborative singing interaction method for children according to claim 2, wherein the children's song generated by the AI children's song lightweight generation model is split into a plurality of independent audio tracks, vocal range adaptation is performed on each independent audio track so that its range falls within 150 Hz to 800 Hz, which matches the comfortable vocal range of children, and the independent audio tracks after vocal range adaptation are recombined into the target children's song.
4. The AI-generation-based collaborative singing interaction method for children according to claim 1, wherein the pitch deviation, the volume stability and the vocal range matching degree of the real-time audio are weighted and summed to obtain a composite score, and the corresponding singing roles are then allocated according to the composite score.
5. The AI-generation-based collaborative singing interaction method for children according to claim 1, wherein after the real-time audio is collected, a hardware timestamp based on a master clock is added to each channel of real-time audio for time sequence alignment.
6. The AI-generation-based collaborative singing interaction method for children of claim 5, wherein the audio frames in the real-time audio are checked for uniqueness based on the hardware timestamp, and repeated audio frames in the real-time audio are removed.
7. The AI-generation-based collaborative singing interaction method for children of claim 1, wherein the audiovisual linkage rendering comprises outputting visual elements corresponding to the level and volume of real-time audio to a display device for display.
8. An AI-generation-based collaborative singing interaction system for children, characterized by comprising a creative children's song generation module, an analysis and allocation module and an audio-visual linkage rendering module; the creative children's song generation module comprises a creative input unit and an AI children's song lightweight generation model, and the analysis and allocation module comprises an audio acquisition unit, a singing capability analysis unit and a role allocation unit; the creative input unit is used for collecting child language text and/or child voice audio; the AI children's song lightweight generation model is used for generating a target children's song from the child language text and/or the child voice audio; the audio acquisition unit is used for collecting real-time audio of at least two children collaboratively singing the target children's song; the singing capability analysis unit is used for detecting the pitch deviation, the volume stability and the vocal range matching degree of the real-time audio; the role allocation unit is used for allocating singing roles corresponding to the real-time audio according to the pitch deviation, the volume stability and the vocal range matching degree; and the audio-visual linkage rendering module is used for playing the target children's song, collecting real-time audio of the target children's song being sung, and performing audio-visual linkage rendering on the real-time audio.
9. An electronic device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the AI-generation-based collaborative singing interaction method for children of any one of claims 1-7 when executing the computer program.
10. A readable storage medium storing computer instructions for causing a computer to perform the AI-generation-based collaborative singing interaction method for children of any one of claims 1-7.
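
The timestamp handling in claims 5 and 6 can be illustrated with a minimal Python sketch. It is not taken from the application: the `AudioFrame` type, the `deduplicate` and `align` helpers, the microsecond unit, and the use of the (channel, timestamp) pair as the uniqueness key are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AudioFrame:
    """One captured frame (hypothetical layout, not from the application)."""
    channel: int        # which singer's audio channel produced the frame
    timestamp_us: int   # hardware timestamp from the shared master clock (assumed microseconds)
    samples: tuple      # PCM samples carried by the frame


def deduplicate(frames):
    """Keep the first frame for each (channel, timestamp) pair and drop repeats."""
    seen = set()
    unique = []
    for frame in frames:
        key = (frame.channel, frame.timestamp_us)
        if key not in seen:
            seen.add(key)
            unique.append(frame)
    return unique


def align(frames):
    """Group frames from all channels by timestamp, ordered by time.

    Returns a list of {channel: frame} dicts, one per distinct timestamp.
    """
    by_time = {}
    for frame in frames:
        by_time.setdefault(frame.timestamp_us, {})[frame.channel] = frame
    return [by_time[t] for t in sorted(by_time)]


if __name__ == "__main__":
    captured = [
        AudioFrame(0, 1000, (1, 2)),
        AudioFrame(1, 1000, (3, 4)),
        AudioFrame(0, 1000, (1, 2)),  # repeated frame, removed by deduplicate
        AudioFrame(1, 2000, (5, 6)),
        AudioFrame(0, 2000, (7, 8)),
    ]
    for slot in align(deduplicate(captured)):
        print({channel: frame.samples for channel, frame in slot.items()})
```

Because every channel is stamped from the same master clock in this sketch, frames with equal timestamps can be treated as simultaneous, which is what makes the simple dictionary grouping sufficient here. A real device would probably need a tolerance window rather than exact equality.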

Description

AI-generation-based collaborative singing interaction method and system for children, electronic equipment and readable storage medium

Technical Field

The application belongs to the technical field of entertainment equipment, and particularly relates to an AI-generation-based collaborative singing interaction method and system for children, electronic equipment and a readable storage medium.

Background

Current technologies and products for children's collaborative singing interaction are essentially functional variants of traditional karaoke or sing-along systems. Song content depends on preset songs, or the user must manually run an online keyword search. Interactive song acquisition is limited to a textual mapping between preset keywords and song content, and there is no ability to decide dynamically whether to search for or generate a song according to the user's creative input (for example, an idea about song content or song characteristics expressed in natural-language text or in the user's voice audio). This shortcoming is more pronounced in entertainment equipment for children's collaborative singing: because children's cognitive and descriptive abilities are much weaker than those of teenagers and adults, it is difficult to meet the need to generate new content from a child's spontaneous ideas, the form of interaction is monotonous, and it is hard to stimulate children's creative thinking. In addition, prior-art AI music search and matching technology based on large-scale deep learning models typically has an inference latency of more than 10 seconds and cannot meet children's need for immediate feedback, and the generated content is not optimized for children in terms of vocal range and rhythm, so the barrier to singing is high.

On the other hand, technologies that support collaborative singing by multiple children usually allocate each child's singing part according to preset singing roles and lack flexible role allocation and rotation mechanisms, which reduces the willingness of children with weaker singing ability to keep participating. Their audio-visual feedback is limited to static lyrics and simple background animation and cannot be dynamically linked to real-time singing states such as intonation, volume and degree of coordination, so the immersive experience and the motivational effect are insufficient.

Disclosure of Invention

The application aims to provide an AI-generation-based collaborative singing interaction method and system for children, electronic equipment and a readable storage medium, which are used for improving children's experience of multi-person collaborative singing interaction.

In a first aspect, the present application provides an AI-generation-based collaborative singing interaction method for children, including: collecting child language text and/or child voice audio; generating a target children's song from the child language text and/or the child voice audio by using an AI children's song lightweight generation model; collecting real-time audio of at least two children collaboratively singing the target children's song; detecting the pitch deviation, volume stability and vocal range matching degree of the real-time audio; allocating singing roles corresponding to the real-time audio according to the pitch deviation, the volume stability and the vocal range matching degree; and playing the generated target children's song, collecting real-time audio of the target children's song being sung, and performing audio-visual linkage rendering on the real-time audio.

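The detection and allocation steps of the first aspect can be pictured with a short Python sketch. The application names the three metrics and, in claim 4, a weighted sum of them, but the formulas below are assumptions chosen only for illustration: pitch deviation as mean absolute error in cents, volume stability derived from the coefficient of variation of loudness, and vocal range matching as the share of melody notes inside a singer's range. The weights, the 100-cent cutoff, and the role names `lead` and `chorus` are likewise hypothetical.

```python
import math
from statistics import mean, pstdev


def pitch_deviation_cents(sung_hz, reference_hz):
    """Mean absolute pitch error in cents (100 cents = one semitone).

    Assumes both sequences hold positive frequencies for voiced frames only.
    """
    return mean(abs(1200 * math.log2(s / r)) for s, r in zip(sung_hz, reference_hz))


def volume_stability(rms_levels):
    """Map loudness variation into (0, 1]; 1.0 means perfectly steady volume.

    Assumes the recording is not silent, so the mean level is above zero.
    """
    return 1.0 / (1.0 + pstdev(rms_levels) / mean(rms_levels))


def range_match(reference_hz, low_hz, high_hz):
    """Fraction of melody notes that fall inside the singer's range."""
    return sum(low_hz <= f <= high_hz for f in reference_hz) / len(reference_hz)


def composite_score(deviation_cents, stability, match, weights=(0.5, 0.2, 0.3)):
    """Weighted sum of the three metrics, each scaled to [0, 1]."""
    # Assumption: an average error of 100 cents or more earns no pitch credit.
    pitch_score = max(0.0, 1.0 - deviation_cents / 100.0)
    w_pitch, w_volume, w_range = weights
    return w_pitch * pitch_score + w_volume * stability + w_range * match


def assign_roles(scores):
    """Give the highest composite score the 'lead' role, everyone else 'chorus'."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    return {name: ("lead" if i == 0 else "chorus") for i, name in enumerate(ranked)}


if __name__ == "__main__":
    melody = [262.0, 294.0, 330.0, 349.0]  # reference notes in Hz
    scores = {
        "child_a": composite_score(
            pitch_deviation_cents([262.0, 294.0, 330.0, 349.0], melody),
            volume_stability([0.5, 0.5, 0.5, 0.5]),
            range_match(melody, 150.0, 800.0),
        ),
        "child_b": composite_score(
            pitch_deviation_cents([270.0, 300.0, 320.0, 360.0], melody),
            volume_stability([0.2, 0.6, 0.3, 0.7]),
            range_match(melody, 280.0, 800.0),
        ),
    }
    print(assign_roles(scores))
```

A ranking-based rule like `assign_roles` is only one possible policy. Re-running it after each verse with fresh scores would be one simple way to approximate the role rotation that the background section says existing products lack.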
As an implementation manner of the first aspect, the AI children's song lightweight generation model performs semantic understanding of the creative content contained in the input child language text and/or child voice audio, and then invokes a preset children's song library to generate, through template matching and rule constraints, a children's song consistent with the semantically understood content.

Preferably, the children's song generated by the AI children's song lightweight generation model is split into a plurality of independent audio tracks, vocal range adaptation is performed on each independent audio track so that its range falls within 150 Hz to 800 Hz, which matches the comfortable vocal range of children, and the independent audio tracks after vocal range adaptation are recombined into the target children's song.

As an implementation manner of the first aspect, the pitch deviation, the volume stability and the vocal range matching degree of the real-time audio are weighted and summed to obtain a composite score, and the corresponding singing roles are then allocated according to the composite score.

As an implementation manner of the first aspect, after the real-time audio is collected, a hardware timestamp based on a mast