CN-121983038-A - Multi-modal control method and device for an intelligent toy, electronic device, and storage medium
Abstract
The application relates to the technical field of artificial intelligence and provides a multi-modal control method for an intelligent toy, comprising the steps of: acquiring input data of a target user; converting the input data into text data; determining entity structure data and entity relation data corresponding to the input data based on the text data; generating answer data for the input data according to the text data, the entity structure data, and the entity relation data; and controlling the intelligent toy to output the answer data. The application addresses the poor user experience of existing speech recognition methods, which rely on single-modal interaction and lack deep integration of multi-modal information in children's dialogue scenarios, and realizes efficient semantic recognition and emotional interaction in user dialogue scenarios.
Inventors
- Zhao Pinlong
- Kuang Zhemin
Assignees
- 深圳市噜咔博士科技有限公司
Dates
- Publication Date: 2026-05-05
- Application Date: 2025-12-26
Claims (10)
- 1. A multi-modal control method for an intelligent toy, the method comprising the steps of: acquiring input data of a target user; converting the input data into text data; determining entity structure data and entity relation data corresponding to the input data based on the text data; generating answer data for the input data according to the text data, the entity structure data, and the entity relation data; and controlling the intelligent toy to output the answer data. (A hedged sketch of this pipeline follows the claims.)
- 2. The multi-modal control method for an intelligent toy according to claim 1, wherein the input data comprises voice data and/or image data, and converting the input data into text data comprises: performing text conversion processing on the voice data to obtain first text data corresponding to the voice data; and performing semantic extraction processing on the image data to obtain second text data describing the image data.
- 3. The multi-modal control method for an intelligent toy according to claim 1, wherein determining entity structure data and entity relation data corresponding to the input data based on the text data comprises: performing entity extraction processing on the text data through a preset entity extraction model to obtain entity structure data corresponding to the input data; and retrieving entity relation data in a preset graph database based on the entity structure data.
- 4. The multi-modal control method for an intelligent toy according to claim 3, wherein the entity structure data includes a target content body, the graph database includes content bodies and association relations between the content bodies, and retrieving entity relation data in a preset graph database based on the entity structure data comprises: searching the graph database according to the target content body to obtain an associated content body having an association relation with the target content body; and determining entity relation data based on the target content body and the associated content body.
- 5. The multi-modal control method for an intelligent toy according to claim 4, wherein the graph database includes a cache-based first graph database and a server-based second graph database, and searching the graph database according to the target content body comprises: searching the cache-based first graph database according to the target content body; and, if the search in the cache-based first graph database fails, searching the server-based second graph database.
- 6. The multi-modal control method for an intelligent toy according to claim 3, wherein, before determining entity structure data and entity relation data corresponding to the input data based on the text data, the method further comprises: acquiring a first scene corpus of the group to which the target user belongs; extracting sample entity structure data and sample entity relation data based on the first scene corpus; training the entity extraction model according to the first scene corpus and the sample entity structure data; and constructing the graph database according to the sample entity relation data.
- 7. The multi-modal control method for an intelligent toy according to any one of claims 1 to 6, wherein generating answer data for the input data from the text data, the entity structure data, and the entity relation data comprises: acquiring role information of the intelligent toy for the target user; packaging the text data, the entity structure data, and the entity relation data based on the role information of the intelligent toy for the target user to obtain request information for the input data; and processing the request information through a preset large language model to generate answer data for the input data.
- 8. A multi-modal control device for an intelligent toy, the device comprising: an acquisition module for acquiring input data of a target user; a conversion module for converting the input data into text data; a determining module for determining entity structure data and entity relation data corresponding to the input data based on the text data; a generating module for generating answer data for the input data according to the text data, the entity structure data, and the entity relation data; and a control module for controlling the intelligent toy to output the answer data.
- 9. An electronic device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the steps of the multi-modal control method for an intelligent toy according to any one of claims 1 to 7.
- 10. A computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the multi-modal control method for an intelligent toy according to any one of claims 1 to 7.
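For illustration only, the sketch below outlines the end-to-end pipeline of claims 1, 2, and 7 in Python. Every name in it (`speech_to_text`, `extract_entities`, `call_llm`, and so on) is a hypothetical placeholder chosen by the editor; the patent does not specify any API, model, or engine, so each stage is left as a stub to be filled in.

```python
# Hedged sketch of the claimed pipeline (claims 1, 2 and 7).
# All identifiers are illustrative placeholders, not terms from the patent.
from dataclasses import dataclass

@dataclass
class InputData:
    voice: bytes | None = None   # raw audio from the toy's microphone
    image: bytes | None = None   # raw frame from the toy's camera

def speech_to_text(audio: bytes) -> str:
    """Placeholder ASR step: voice data -> first text data."""
    raise NotImplementedError("plug in an ASR engine here")

def describe_image(image: bytes) -> str:
    """Placeholder captioning step: image data -> second text data."""
    raise NotImplementedError("plug in an image-captioning model here")

def extract_entities(text: str) -> list[str]:
    """Placeholder entity extraction (claim 3)."""
    raise NotImplementedError("plug in a NER model here")

def lookup_relations(entities: list[str]) -> list[tuple[str, str, str]]:
    """Placeholder graph lookup (claims 4-5): (head, relation, tail) triples."""
    raise NotImplementedError("plug in a graph database here")

def call_llm(prompt: str) -> str:
    """Placeholder large-language-model call (claim 7)."""
    raise NotImplementedError("plug in an LLM client here")

def answer(input_data: InputData, role_info: str) -> str:
    # Step 2: convert the multi-modal input into text data.
    parts = []
    if input_data.voice is not None:
        parts.append(speech_to_text(input_data.voice))
    if input_data.image is not None:
        parts.append(describe_image(input_data.image))
    text = " ".join(parts)

    # Step 3: entity structure data and entity relation data.
    entities = extract_entities(text)
    relations = lookup_relations(entities)

    # Claim 7: package text, entities, and relations together with the
    # toy's role information into one request for the LLM.
    prompt = (
        f"Role: {role_info}\n"
        f"Child said/showed: {text}\n"
        f"Entities: {entities}\n"
        f"Known relations: {relations}\n"
        "Answer in a friendly, child-appropriate tone."
    )
    return call_llm(prompt)  # Step 5: the toy speaks this answer aloud
```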
Description
Multi-modal control method and device for an intelligent toy, electronic device, and storage medium

Technical Field

The application belongs to the technical field of artificial intelligence, and in particular relates to a multi-modal control method and device for an intelligent toy, an electronic device, and a storage medium.

Background

An intelligent toy can provide question-and-answer interaction while accompanying a child: the user poses a question, and the intelligent toy answers according to the content of the question. Existing speech recognition methods mainly capture the child's language input through speech recognition technology and perform semantic analysis in combination with natural language processing (NLP) technology. However, these methods mostly focus on single-modal interaction and lack deep integration of multi-modal information in children's dialogue scenarios. A multi-modal control method combining semantic matching and emotion analysis is therefore needed to address the resulting poor user experience.

Disclosure of Invention

The embodiments of the application provide a multi-modal control method for an intelligent toy that can address the poor user experience caused by single-modal interaction and the lack of deep integration of multi-modal information in children's dialogue scenarios. According to the method, input data of a target user are converted into text data; entity structure data and entity relation data corresponding to the input data are determined from the text data; answer data for the input data are generated according to the text data, the entity structure data, and the entity relation data; and the intelligent toy is controlled to output the answer data, thereby realizing efficient semantic recognition and emotional interaction in user dialogue scenarios.

In a first aspect, an embodiment of the application provides a multi-modal control method for an intelligent toy, the method comprising: acquiring input data of a target user; converting the input data into text data; determining entity structure data and entity relation data corresponding to the input data based on the text data; generating answer data for the input data according to the text data, the entity structure data, and the entity relation data; and controlling the intelligent toy to output the answer data.

Optionally, the input data includes voice data and/or image data, and converting the input data into text data includes: performing text conversion processing on the voice data to obtain first text data corresponding to the voice data; and performing semantic extraction processing on the image data to obtain second text data describing the image data.
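As an illustration of this optional voice-and-image conversion step, the sketch below uses Hugging Face `transformers` pipelines for speech recognition and image captioning. The two-pipeline design and the specific model names are the editor's assumptions; the patent does not name any particular engine.

```python
# Hedged sketch of claim 2: voice -> first text data, image -> second text data.
# Model choices are assumptions, not something the patent specifies.
from transformers import pipeline

# ASR pipeline for the voice branch (first text data).
asr = pipeline("automatic-speech-recognition", model="openai/whisper-small")

# Image-captioning pipeline for the image branch (second text data).
captioner = pipeline("image-to-text", model="Salesforce/blip-image-captioning-base")

def convert_to_text(voice_path: str | None, image_path: str | None) -> str:
    """Merge both modalities into one text description of the input."""
    parts = []
    if voice_path is not None:
        parts.append(asr(voice_path)["text"])                     # first text data
    if image_path is not None:
        parts.append(captioner(image_path)[0]["generated_text"])  # second text data
    return " ".join(parts)

# Example: text = convert_to_text("question.wav", "drawing.jpg")
```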
Optionally, determining entity structure data and entity relation data corresponding to the input data based on the text data includes: performing entity extraction processing on the text data through a preset entity extraction model to obtain entity structure data corresponding to the input data; and retrieving entity relation data in a preset graph database based on the entity structure data.

Optionally, the entity structure data includes a target content body, the graph database includes content bodies and association relations between the content bodies, and retrieving entity relation data in a preset graph database based on the entity structure data includes: searching the graph database according to the target content body to obtain an associated content body having an association relation with the target content body; and determining entity relation data based on the target content body and the associated content body.

Optionally, the graph database includes a cache-based first graph database and a server-based second graph database, and searching the graph database according to the target content body includes: searching the cache-based first graph database according to the target content body; and, if the search in the cache-based first graph database fails, searching the server-based second graph database.

Optionally, before determining the entity structure data and the entity relation data corresponding to the input data based on the text data, the method further includes: acquiring a first scene corpus of the group to which the target user belongs; extracting sample entity structure data and sample entity relation data based on the first scene corpus; training the entity extraction model according to the first scene corpus and the sample entity structure data; and constructing the graph database according to the sample entity relation data.
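To make the entity-extraction and graph-retrieval steps concrete, here is a minimal sketch using an off-the-shelf `transformers` NER pipeline in place of the patent's "preset entity extraction model", and an in-memory triple list standing in for the graph database. The model choice and the example triples are invented for illustration.

```python
# Hedged sketch of the entity-extraction and graph-retrieval steps:
# NER -> target content bodies, then a lookup of associated content bodies.
from transformers import pipeline

# Off-the-shelf NER model as a stand-in for the preset entity extraction
# model; the model choice is an assumption.
ner = pipeline("ner", model="dslim/bert-base-NER", aggregation_strategy="simple")

# Toy in-memory graph of (head, relation, tail) triples. A production
# system would query a real graph database instead.
TRIPLES = [
    ("elephant", "lives_in", "savanna"),
    ("elephant", "eats", "grass"),
]

def extract_entities(text: str) -> list[str]:
    """Entity structure data: the entity spans found in the text."""
    return [ent["word"] for ent in ner(text)]

def lookup_relations(entity: str) -> list[tuple[str, str, str]]:
    """Entity relation data: every triple whose head or tail is the entity."""
    return [t for t in TRIPLES if entity in (t[0], t[2])]
```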
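The two-tier cache/server lookup can be sketched as a simple fall-through: consult a local cache first and query the remote graph service only on a miss. The `query_server` function is a placeholder for whatever RPC or HTTP call the server-side graph database exposes.

```python
# Hedged sketch of the tiered lookup: cache-based first graph database,
# with fallback to a server-based second graph database on a miss.
Triple = tuple[str, str, str]

# First graph database: a local cache of previously retrieved triples.
cache: dict[str, list[Triple]] = {}

def query_server(entity: str) -> list[Triple]:
    """Placeholder for the server-side (second) graph database query."""
    raise NotImplementedError("plug in the remote graph-database client here")

def retrieve_relations(entity: str) -> list[Triple]:
    # Search the cache-based first graph database.
    if entity in cache:
        return cache[entity]
    # Cache miss: fall back to the server-based second graph database,
    # then populate the cache so the next lookup stays local.
    triples = query_server(entity)
    cache[entity] = triples
    return triples
```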
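Finally, the optional offline preparation step (collect a scene corpus for the user's group, derive labeled samples, train the extractor, build the graph) could be orchestrated as below. Every function body is a stub, since the patent prescribes neither a corpus format, a model architecture, nor a graph-construction procedure.

```python
# Hedged sketch of the offline preparation before the online pipeline runs.
# All functions are stubs; their names are the editor's placeholders.

def load_scene_corpus(group: str) -> list[str]:
    """Acquire the first scene corpus for the target user's group."""
    raise NotImplementedError

def label_samples(corpus: list[str]) -> tuple[list, list]:
    """Derive sample entity structure data and sample entity relation data."""
    raise NotImplementedError

def train_entity_extractor(corpus: list[str], entity_samples: list):
    """Train the preset entity extraction model on corpus + entity labels."""
    raise NotImplementedError

def build_graph_database(relation_samples: list):
    """Construct the graph database from the sampled relations."""
    raise NotImplementedError

def prepare(group: str):
    corpus = load_scene_corpus(group)
    entity_samples, relation_samples = label_samples(corpus)
    model = train_entity_extractor(corpus, entity_samples)
    graph = build_graph_database(relation_samples)
    return model, graph
```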