CN-121983039-A - Dialogue intention recognition method of intelligent toy and related equipment

CN121983039A

Abstract

The application is applicable to the technical field of artificial intelligence and provides a dialogue intention recognition method for an intelligent toy, comprising the steps of: acquiring voice data of a target child user; performing parallel acoustic and text preprocessing on the voice data to obtain text semantic features and corresponding acoustic emotion features; performing feature fusion processing on the text semantic features and the acoustic emotion features to obtain fusion features; performing intention recognition processing on the voice data based on the fusion features to obtain a first dialogue intention; performing a rationality check and correction on the first dialogue intention through a preset intention correction model in combination with an intention memory database to obtain a target dialogue intention; and determining a target response strategy in a preset response strategy library based on the target dialogue intention and controlling the intelligent toy to execute the target response strategy. The application addresses the problem that existing methods, limited by the diversity of children's voice characteristics and modes of expression, struggle to accurately judge a child's true intention.

Inventors

  • MAO WEIPENG
  • LUO GAN

Assignees

  • 深圳市噜咔博士科技有限公司

Dates

Publication Date
2026-05-05
Application Date
2025-12-26

Claims (10)

  1. A method for recognizing the dialogue intention of an intelligent toy, the method comprising the steps of: acquiring voice data of a target child user; performing parallel acoustic and text preprocessing on the voice data to obtain text semantic features and corresponding acoustic emotion features; performing feature fusion processing on the text semantic features and the acoustic emotion features to obtain fusion features; performing intention recognition processing on the voice data based on the fusion features to obtain a first dialogue intention; performing, in combination with an intention memory database, a rationality check and correction on the first dialogue intention through a preset intention correction model to obtain a target dialogue intention; and determining a target response strategy in a preset response strategy library based on the target dialogue intention, and controlling the intelligent toy to execute the target response strategy.
  2. The method for recognizing the dialogue intention of an intelligent toy according to claim 1, wherein the text preprocessing performed on the voice data to obtain text semantic features corresponding to the voice data comprises: performing voice-to-text processing on the voice data to obtain text data to be processed; normalizing the text data to be processed to obtain normalized text data; and performing semantic feature extraction processing on the normalized text data to obtain text semantic features corresponding to the voice data.
  3. The method for recognizing the dialogue intention of an intelligent toy according to claim 1, wherein the acoustic preprocessing performed on the voice data to obtain acoustic emotion features corresponding to the voice data comprises: performing acoustic feature extraction processing on the voice data to obtain acoustic features; and performing emotion recognition processing on the acoustic features to obtain acoustic emotion features corresponding to the voice data.
  4. The method for recognizing the dialogue intention of an intelligent toy according to claim 1, wherein performing intention recognition processing on the voice data based on the fusion features to obtain a first dialogue intention comprises: calculating the semantic similarity between the fusion features and the semantic features in each scene template; determining a target scene template based on the semantic similarity; and determining a first dialogue intention in a preset dynamic intention map by combining context information with the target scene template.
  5. The method for recognizing the dialogue intention of an intelligent toy according to claim 4, wherein before determining the first dialogue intention in the preset dynamic intention map by combining the context information with the target scene template, the method further comprises: obtaining a plurality of groups of sample data, wherein each group comprises sample text semantic features, sample emotion features, and sample intonation curve features; and performing weighted fusion on each group of sample data to generate the dynamic intention map.
  6. The method for recognizing the dialogue intention of an intelligent toy according to claim 1, wherein the intention memory database comprises user historical interaction data and dialogue context information, and performing, in combination with the intention memory database, the rationality check and correction on the first dialogue intention through the preset intention correction model to obtain the target dialogue intention comprises: inputting the first dialogue intention, the context information, and the user historical interaction data into the preset intention correction model for rationality verification to obtain a verification result; and correcting the first dialogue intention based on the verification result to obtain the target dialogue intention.
  7. The method for recognizing the dialogue intention of an intelligent toy according to claim 1, wherein determining a target response strategy in a preset response strategy library based on the target dialogue intention comprises: determining a plurality of candidate response strategies in the preset response strategy library according to the target dialogue intention; and determining the target response strategy from the plurality of candidate response strategies by combining the attribute information and historical preference data of the target child user.
  8. A dialogue intention recognition device of an intelligent toy, characterized in that the device comprises: an acquisition module for acquiring voice data of a target child user; a first processing module for performing parallel acoustic and text preprocessing on the voice data to obtain text semantic features and corresponding acoustic emotion features; a second processing module for performing feature fusion processing on the text semantic features and the acoustic emotion features to obtain fusion features; a third processing module for performing intention recognition processing on the voice data based on the fusion features to obtain a first dialogue intention; a fourth processing module for performing, in combination with an intention memory database, a rationality check and correction on the first dialogue intention through a preset intention correction model to obtain a target dialogue intention; and a control module for determining a target response strategy in a preset response strategy library based on the target dialogue intention and controlling the intelligent toy to execute the target response strategy.
  9. An electronic device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the steps of the dialogue intention recognition method of an intelligent toy according to any one of claims 1 to 7 when executing the computer program.
  10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program which, when executed by a processor, implements the steps of the dialogue intention recognition method of an intelligent toy according to any one of claims 1 to 7.

Description

Dialogue intention recognition method of intelligent toy and related equipment

Technical Field

The application belongs to the technical field of artificial intelligence, and particularly relates to a dialogue intention recognition method of an intelligent toy and related equipment.

Background

With the wide application of intelligent dialogue systems for children in fields such as education, companionship, and emotional guidance, accurately identifying the real intention in a child's dialogue has become a key problem. Traditional speech recognition and natural language understanding models are often limited by the variety of children's speech features (such as unclear pronunciation, unstable speaking speed, and semantic jumps) and modes of expression, so such systems struggle to accurately judge a child's true intention. For example, when a child says to a little bear toy "I'm hungry" or "I want to drink milk", the system often cannot distinguish the behavioral intent behind the utterance (a request, an expression of emotion, or a pretend feeding action). Existing schemes rely on general semantic matching and lack deep modeling of children's language features and context, so recognition accuracy and interaction naturalness are insufficient. Therefore, a dialogue intention recognition method targeted at children's language features and context is needed to solve the problem that existing voice recognition methods, limited by the diversity of children's voice features and modes of expression, struggle to accurately judge children's true intentions, resulting in a poor user experience.
Disclosure of Invention

The embodiment of the application provides a dialogue intention recognition method of an intelligent toy, which can solve the problem that existing voice recognition methods, limited by the variety of children's voice characteristics and modes of expression, struggle to accurately judge children's true intentions, resulting in a poor user experience. The method comprises the steps of: performing parallel acoustic and text preprocessing on voice data to obtain text semantic features and corresponding acoustic emotion features; performing feature fusion processing on the text semantic features and the acoustic emotion features to obtain fusion features; performing intention recognition processing on the voice data according to the fusion features to obtain a first dialogue intention; performing a rationality check and correction on the first dialogue intention through a preset intention correction model in combination with an intention memory database to obtain a target dialogue intention; and determining a target response strategy in a preset response strategy library according to the target dialogue intention and controlling the intelligent toy to execute the target response strategy, thereby solving the above problem.
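The claimed pipeline can be sketched as follows. This is a minimal illustrative sketch, not the patented implementation: every function, feature vector, and template name below is a hypothetical placeholder, the fusion step is simple concatenation, and the scene-template matching uses cosine similarity as one plausible reading of the "semantic similarity" in claim 4. The rationality check against the intention memory database is reduced to a trivial lookup.

```python
import numpy as np

# Placeholder front-ends standing in for the ASR/semantic and
# acoustic/emotion preprocessing branches described in the application.
def extract_text_semantic_features(voice_data: np.ndarray) -> np.ndarray:
    return np.array([0.9, 0.1, 0.0])   # e.g. "hungry"-leaning semantics

def extract_acoustic_emotion_features(voice_data: np.ndarray) -> np.ndarray:
    return np.array([0.2, 0.8])        # e.g. a whiny / distressed tone

def fuse_features(text_feat: np.ndarray, emo_feat: np.ndarray) -> np.ndarray:
    # Concatenation fusion; the application leaves the fusion scheme open.
    return np.concatenate([text_feat, emo_feat])

# Hypothetical scene templates (fused-feature prototypes per intention).
INTENT_TEMPLATES = {
    "request_food":    np.array([0.9, 0.1, 0.0, 0.3, 0.7]),
    "express_emotion": np.array([0.1, 0.8, 0.1, 0.1, 0.9]),
    "pretend_play":    np.array([0.2, 0.1, 0.7, 0.5, 0.5]),
}

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def recognize_intent(fused: np.ndarray) -> str:
    # Nearest scene template by semantic similarity (cf. claim 4).
    return max(INTENT_TEMPLATES, key=lambda k: cosine(fused, INTENT_TEMPLATES[k]))

def correct_intent(first_intent: str, memory: list) -> str:
    # Toy rationality check: fall back if memory contradicts the first intent.
    return first_intent if first_intent not in memory else "express_emotion"

voice = np.zeros(16000)                # 1 s of dummy audio at 16 kHz
fused = fuse_features(extract_text_semantic_features(voice),
                      extract_acoustic_emotion_features(voice))
intent = correct_intent(recognize_intent(fused), memory=[])
print(intent)                          # → request_food
```

With the dummy feature vectors above, the fused vector lies closest to the "request_food" template, so that intention survives the (empty-memory) correction step unchanged.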
In a first aspect, an embodiment of the present application provides a dialogue intention recognition method of an intelligent toy, the method comprising the steps of: acquiring voice data of a target child user; performing parallel acoustic and text preprocessing on the voice data to obtain text semantic features and corresponding acoustic emotion features; performing feature fusion processing on the text semantic features and the acoustic emotion features to obtain fusion features; performing intention recognition processing on the voice data based on the fusion features to obtain a first dialogue intention; performing, in combination with an intention memory database, a rationality check and correction on the first dialogue intention through a preset intention correction model to obtain a target dialogue intention; and determining a target response strategy in a preset response strategy library based on the target dialogue intention, and controlling the intelligent toy to execute the target response strategy. Optionally, the text preprocessing performed on the voice data to obtain text semantic features corresponding to the voice data comprises: performing voice-to-text processing on the voice data to obtain text data to be processed; normalizing the text data to be processed to obtain normalized text data; and performing semantic feature extraction processing on the normalized text data to obtain text semantic features corresponding to the voice data. Optionally, the acoustic preprocessing performed on the voice data to obtain acoustic emotion features corresponding to the voice data comprises: performing acoustic feature extraction processing on the voice data to obtain acoustic features; and performing emotion recognition processing on the acoustic features to obtain acoustic emotion features corresponding to the voice data.
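The final step, selecting a target response strategy from the preset library by combining the child's attributes and historical preferences (cf. claim 7), could be sketched as below. All strategy names, tags, and the overlap-based scoring are illustrative assumptions, not taken from the application.

```python
# Hypothetical response-strategy library: candidate strategies per intention,
# each tagged so they can be matched against a child's historical preferences.
RESPONSE_LIBRARY = {
    "request_food": [
        {"name": "pretend_feed", "tags": {"play"}},
        {"name": "suggest_snack_to_parent", "tags": {"care"}},
    ],
    "express_emotion": [
        {"name": "comfort_phrase", "tags": {"care"}},
    ],
}

def pick_strategy(intent: str, user_prefs: set) -> str:
    candidates = RESPONSE_LIBRARY.get(intent, [])
    # Prefer candidates whose tags overlap the child's preference history.
    scored = sorted(candidates,
                    key=lambda s: len(s["tags"] & user_prefs),
                    reverse=True)
    return scored[0]["name"] if scored else "default_reply"

print(pick_strategy("request_food", {"play"}))   # → pretend_feed
```

A child whose history favors pretend play gets the playful strategy; an unknown intention falls back to a default reply, which stands in for the toy's safe-response path.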