
CN-122024536-A - Multi-role interaction-based story machine control method, system, electronic equipment and readable storage medium

CN 122024536 A

Abstract

The application relates to the technical field of educational equipment, and in particular to a story machine control method and system based on multi-role interaction, an electronic device, and a readable storage medium. The method comprises: obtaining a user voice text through wake-up detection and voice recognition; performing intention recognition and emotion analysis on the user voice text with a pre-trained natural language understanding model to obtain a user intention and a user emotion label; calling at least one virtual character file according to the user intention and generating, based on the virtual character file and the dialogue history, a character response text corresponding to the user intention, wherein the character file comprises a knowledge field, a character template and a language style; outputting story content according to the user intention, the character response text and the user emotion label; and converting the character response text into voice audio matched with the timbre characteristics of the virtual character and outputting the voice audio according to the story content. The method achieves deep multi-role cooperative dialogue and personalized, dynamic narration based on real-time user feedback.
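
Read as a processing chain, the abstract describes one interaction turn: wake-up detection and speech recognition, intent and emotion analysis, character-profile selection, response generation, story update, and speech synthesis. The sketch below illustrates that control flow only; all names (CharacterProfile, recognize_speech, interaction_turn, and so on) are hypothetical and every model call is stubbed, since the patent does not disclose concrete implementations.

```python
from dataclasses import dataclass


@dataclass
class CharacterProfile:
    name: str
    knowledge_field: str      # e.g. "astronomy"
    character_template: str   # personality description
    language_style: str       # e.g. "slow, descriptive sentences"


@dataclass
class NluResult:
    intent: str         # e.g. "tell_moon_story"
    emotion_label: str  # e.g. "curious"


def recognize_speech(audio: bytes) -> str:
    """Stand-in for wake-up detection and ASR; a real device would call a speech engine here."""
    return "Tell me a story about the moon with the rabbit and the owl"


def analyze(text: str) -> NluResult:
    """Stand-in for the pre-trained NLU model (intent recognition and emotion analysis)."""
    return NluResult(intent="tell_moon_story", emotion_label="curious")


def load_profiles(intent: str) -> list[CharacterProfile]:
    """Select at least one virtual character profile according to the user intent."""
    return [
        CharacterProfile("Rabbit", "nature", "lively and playful", "short, bouncy sentences"),
        CharacterProfile("Owl", "astronomy", "calm and wise", "slow, descriptive sentences"),
    ]


def generate_response(profile: CharacterProfile, intent: str, history: list[str]) -> str:
    """Stand-in for the language generation model; a real system would build a prompt from
    the profile, the intent and the dialogue history and send it to an LLM."""
    return f"{profile.name} answers the '{intent}' request in a {profile.language_style} tone."


def synthesize(text: str, profile: CharacterProfile) -> bytes:
    """Stand-in for a per-character TTS voice matched to the character's timbre."""
    return text.encode("utf-8")


def interaction_turn(audio: bytes, history: list[str]) -> list[bytes]:
    """One interaction turn: ASR -> NLU -> per-character responses -> TTS clips."""
    text = recognize_speech(audio)
    nlu = analyze(text)
    # nlu.emotion_label would additionally steer how the story content is updated.
    clips = []
    for profile in load_profiles(nlu.intent):
        response = generate_response(profile, nlu.intent, history)
        history.append(response)  # the dialogue history feeds the next turn
        clips.append(synthesize(response, profile))
    return clips


if __name__ == "__main__":
    history: list[str] = []
    for clip in interaction_turn(b"<audio frames>", history):
        print(clip.decode("utf-8"))
```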

Inventors

  • SHI WEIXIONG
  • XU ZHENXING
  • REN QUAN

Assignees

  • 怡高科教(广东)有限公司

Dates

Publication Date
2026-05-12
Application Date
2026-01-26

Claims (10)

  1. A story machine control method based on multi-role interaction, characterized by comprising the following steps: performing wake-up detection and voice recognition on received user voice input to obtain a user voice text; transmitting the user voice text to a pre-trained natural language understanding model, and performing intention recognition and emotion analysis on the user voice text to obtain a user intention and a user emotion label; calling at least one virtual character file according to the user intention, and generating a character response text corresponding to the user intention based on the virtual character file and a dialogue history, wherein the character file comprises a knowledge field, a character template and a language style; outputting story content according to the user intention, the character response text and the user emotion label; and converting the character response text into voice audio matched with the timbre characteristics of the virtual character, and outputting the voice audio according to the story content.
  2. The story machine control method based on multi-role interaction of claim 1, wherein the step of outputting story content according to the user intention, the character response text and the user emotion label comprises: initializing or continuing a story line according to the user intention; generating story content comprising the character response text based on the story line; and updating the story content according to the user emotion label or a user interaction instruction, and outputting the updated story content.
  3. The story machine control method based on multi-role interaction of claim 1, wherein the step of calling at least one virtual character file according to the user intention and generating a character response text corresponding to the user intention based on the virtual character file and the dialogue history comprises: determining the number of virtual characters and the names of the virtual characters according to the user intention; if the number of virtual characters is greater than or equal to two, loading the corresponding virtual character files based on the virtual character names; generating a corresponding prompt word for each virtual character based on the user intention, each virtual character file and the dialogue history; and transmitting the prompt words to a pre-trained language generation model to obtain the character response text of each virtual character.
  4. The story machine control method based on multi-role interaction of claim 3, wherein the step of converting the character response text into voice audio matched with the timbre characteristics of the virtual character and outputting the voice audio according to the story content comprises: determining a speaking sequence of each virtual character based on the story content; and sequentially transmitting the character response text of each virtual character to the corresponding TTS model according to the speaking sequence, generating voice audio matched with the timbre characteristics of each virtual character, and outputting the voice audio.
  5. The story machine control method based on multi-role interaction of any one of claims 1 to 4, wherein after the step of converting the character response text into voice audio matched with the timbre characteristics of the virtual character and outputting the voice audio according to the story content, the method further comprises: suspending the audio output of the virtual character at a preset interaction condition node, and outputting interaction guidance audio corresponding to the preset interaction condition node; after receiving a voice input of the user, continuing the story line according to the voice input of the user; and updating the story content based on the continued story line.
  6. The story machine control method based on multi-role interaction of claim 1, further comprising: performing data desensitization on interaction data between the user and the virtual characters, and storing the desensitized interaction data in a structured manner according to a user ID, wherein the interaction data comprises an interaction timestamp, the user voice text, an ASR confidence, an NLU analysis result, the character response text and the selected virtual character; and performing user growth analysis on the desensitized interaction data, generating a user growth report, and outputting the user growth report to a visualization front end.
  7. The story machine control method based on multi-role interaction of claim 6, wherein the step of performing user growth analysis on the desensitized interaction data, generating a user growth report and outputting the user growth report to a visualization front end comprises: aggregating the desensitized interaction data according to a preset period to obtain periodic interaction data; analyzing a vocabulary index, a creativity index, a logic index, an emotion index and a communication index based on the periodic interaction data to obtain current-period index data; and comparing the current-period index data with previous-period index data to generate the user growth report, wherein the user growth report comprises a capability growth trend graph, a capability radar graph and a descriptive comment (a sketch of this computation follows the claims).
  8. A story machine control system based on multi-role interaction, comprising: a voice interaction module, configured to perform wake-up detection and voice recognition on received user voice input to obtain a user voice text; a language understanding module, configured to transmit the user voice text to a pre-trained natural language understanding model and perform intention recognition and emotion analysis on the user voice text to obtain a user intention and a user emotion label; a character dialogue module, configured to call at least one virtual character file according to the user intention and generate a character response text corresponding to the user intention based on the virtual character file and the dialogue history, wherein the character file comprises a knowledge field, a character template and a language style; a story generation module, configured to output story content according to the user intention, the character response text and the user emotion label; and an audio output module, configured to convert the character response text into voice audio matched with the timbre characteristics of the virtual character and output the voice audio according to the story content.
  9. An electronic device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the steps of the story machine control method based on multi-role interaction of any one of claims 1 to 7.
  10. A readable storage medium, characterized in that the readable storage medium is a computer-readable storage medium on which a computer program is stored, and the computer program, when executed by a processor, implements the steps of the story machine control method based on multi-role interaction of any one of claims 1 to 7.
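
As a concrete reading of the growth analysis described in claims 6 and 7, the sketch below aggregates desensitized interaction records into per-period index data and compares two periods. It is a minimal illustration under assumptions: the record fields mirror the interaction data listed in claim 6, but every index formula (distinct-word count for vocabulary, intent variety for creativity, and so on) is an invented placeholder, since the patent does not disclose the actual computations.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class InteractionRecord:
    timestamp: str          # interaction timestamp
    user_text: str          # desensitized user voice text
    asr_confidence: float   # ASR confidence
    intent: str             # NLU analysis result (intent)
    emotion_label: str      # NLU analysis result (emotion)
    response_text: str      # character response text
    character: str          # selected virtual character


def compute_indices(records: list[InteractionRecord]) -> dict[str, float]:
    """Placeholder formulas for the five indices named in claim 7."""
    words = [w for r in records for w in r.user_text.split()]
    positive = sum(1 for r in records if r.emotion_label in ("happy", "curious", "excited"))
    return {
        "vocabulary": float(len(set(words))),                      # distinct words used
        "creativity": float(len({r.intent for r in records})),     # variety of intents
        "logic": mean(len(r.user_text.split()) for r in records),  # average utterance length
        "emotion": positive / len(records),                        # share of positive emotions
        "communication": float(len(records)),                      # number of turns
    }


def growth_report(current: dict[str, float], previous: dict[str, float]) -> dict[str, str]:
    """Compare the current period against the previous one and produce a comment per index."""
    report = {}
    for name, value in current.items():
        delta = value - previous.get(name, 0.0)
        trend = "up" if delta > 0 else ("down" if delta < 0 else "flat")
        report[name] = f"{value:.1f} ({trend} by {abs(delta):.1f} versus last period)"
    return report


if __name__ == "__main__":
    demo = [InteractionRecord("2026-01-10T18:00", "why does the moon change shape", 0.93,
                              "ask_moon_phases", "curious", "Owl explains the phases", "Owl")]
    print(growth_report(compute_indices(demo), previous={"vocabulary": 4.0}))
```

In the real system the per-period dictionaries would feed the capability growth trend graph and radar graph that the claim assigns to the visualization front end.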

Description

Multi-role interaction-based story machine control method, system, electronic equipment and readable storage medium

Technical Field

The application relates to the technical field of educational equipment, and in particular to a story machine control method and system based on multi-role interaction, an electronic device, and a readable storage medium.

Background

Children's story machines have evolved from early one-way audio players into intelligent devices with basic voice interaction capabilities. However, existing products generally suffer from weak interactivity and a single function. Although some products introduce voice interaction technology, the interaction depth is insufficient: usually only a basic question-and-answer mode is supported, multi-role cooperative interaction cannot be realized, and content is difficult to generate dynamically according to user behavior.

Common children's story machine products fall into three categories: one-way playback story machines, simple voice-interactive story machines, and story machines based on fixed scripts. The one-way playback story machine plays stories from pre-stored audio files; it is low-cost and simple to operate, but it completely lacks interactivity and has difficulty arousing and maintaining children's interest. The simple voice-interactive story machine integrates basic voice recognition and response functions and can respond to simple instructions such as play and pause, or answer questions within a limited range, but its technical architecture cannot handle complex conversations with contextual associations and cannot support collaborative or debating scenes involving multiple virtual characters with different personalities and knowledge backgrounds. In a story machine based on fixed scripts, content generation depends on a preset story tree or a limited set of branch scripts; it can provide some plot variation, but because its narrative paths and character responses are logically fixed, the story direction and character behavior cannot be adjusted dynamically according to real-time feedback from children during interaction (such as sudden fantasies, emotional expressions or personalized choices), so the experience still tends to be single and predictable.

For example, Chinese patent publication No. CN104916172A discloses a method for dialogue between early-education story machines, in which two early-education story machines play preset voice dialogue files in sequence according to preset control commands so that the playback appears as a dialogue. This is in essence sequential playback of audio files rather than a genuinely interactive conversation: the characters lack context understanding and dynamic response capability, and the system cannot perform semantic analysis and intention recognition on the child's spoken input, so the interaction process is stiff and passive.

In summary, story machines currently on the market suffer from stiff interaction modes, an inability to support deep multi-role cooperative dialogue, fixed content generation mechanisms, and an inability to provide personalized, dynamic narration based on real-time user feedback.
The foregoing is provided merely to facilitate understanding of the technical solutions of the present application and is not an admission that it constitutes prior art.

Disclosure of Invention

The main purpose of the application is to provide a story machine control method, system, electronic device and readable storage medium based on multi-role interaction, aiming to solve the technical problems of the stiff interaction mode and fixed content generation mechanism of children's story machines.

The application provides a story machine control method based on multi-role interaction, which comprises: performing wake-up detection and voice recognition on received user voice input to obtain a user voice text; transmitting the user voice text to a pre-trained natural language understanding model, and performing intention recognition and emotion analysis on the user voice text to obtain a user intention and a user emotion label; calling at least one virtual character file according to the user intention, and generating a character response text corresponding to the user intention based on the virtual character file and the dialogue history, wherein the character file comprises a knowledge field, a character template and a language style; outputting story content according to the user intention, the character response text and the user emotion label; and converting the character response text into voice audio matched with the timbre characteristics of the virtual character, and outputting the voice audio according to the story content.
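
As one possible reading of the prompt-word generation and speaking-sequence steps described above (and in claims 3 and 4), the sketch below assembles one prompt per virtual character from its profile fields and the shared dialogue history, then picks a speaking order. The dictionary keys, template wording and round-robin ordering are illustrative assumptions, not details disclosed by the patent.

```python
def build_prompt(intent: str, profile: dict, history: list[str]) -> str:
    """Assemble one prompt per virtual character from its profile (knowledge field,
    character template, language style), the user intent and the dialogue history."""
    recent = "\n".join(history[-6:])  # keep only the most recent turns
    return (
        f"You are {profile['name']}, knowledgeable about {profile['knowledge_field']}.\n"
        f"Personality: {profile['character_template']}\n"
        f"Speaking style: {profile['language_style']}\n"
        f"Conversation so far:\n{recent}\n"
        f"The child's request: {intent}\n"
        f"Reply in character, in one or two child-friendly sentences."
    )


def speaking_sequence(profiles: list[dict], turns: int) -> list[dict]:
    """Stand-in for deriving the speaking order from the story content; a simple
    round-robin over the loaded characters is used instead."""
    return [profiles[i % len(profiles)] for i in range(turns)]


rabbit = {"name": "Rabbit", "knowledge_field": "nature",
          "character_template": "lively and playful", "language_style": "short, bouncy sentences"}
owl = {"name": "Owl", "knowledge_field": "astronomy",
       "character_template": "calm and wise", "language_style": "slow, descriptive sentences"}

for speaker in speaking_sequence([rabbit, owl], turns=4):
    print(build_prompt("tell_moon_story", speaker, ["Rabbit: Shall we visit the moon?"]))
```

Each generated prompt would then be sent to the language generation model, and the resulting character response texts would be passed to the per-character TTS voices in the chosen speaking order.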