
CN-121999776-A - Interaction method of intelligent device and intelligent device

CN121999776A

Abstract

The invention provides an interaction method for a smart device, and the smart device, the method being applied to the smart device and comprising: in response to a picked-up voice signal, extracting first voiceprint feature information from the voice signal; and in response to the similarity between the first voiceprint feature information and pre-stored reference first voiceprint feature information being higher than a corresponding first similarity threshold, starting, by the smart device, a dialogue mode so as to respond to subsequent dialogue instructions. In this way, the smart device can compare the first voiceprint feature information in the voice signal with the pre-stored reference first voiceprint feature information and start the dialogue mode directly based on the result of the similarity comparison, without needing to be woken by a wake-up word, thereby improving the efficiency and interaction experience of the user's dialogue with the smart device.

Inventors

  • Jiang Lianzeng
  • Bai Qingsong
  • Wang Pengcheng
  • Ren Fujia

Assignees

  • Hangzhou Robam Appliances Co., Ltd. (杭州老板电器股份有限公司)

Dates

Publication Date
2026-05-08
Application Date
2026-03-26

Claims (10)

  1. An interaction method for a smart device, applied to the smart device, the method comprising: in response to a picked-up voice signal, extracting first voiceprint feature information from the voice signal; and in response to the similarity between the first voiceprint feature information and pre-stored reference first voiceprint feature information being higher than a corresponding first similarity threshold, starting, by the smart device, a dialogue mode so as to respond to subsequent dialogue instructions.
  2. The method according to claim 1, further comprising: in response to a picked-up wake-up word signal, starting, by the smart device, the dialogue mode so as to respond to subsequent dialogue instructions.
  3. The method according to claim 1, wherein starting, by the smart device, the dialogue mode so as to respond to subsequent dialogue instructions in response to the similarity between the first voiceprint feature information and the pre-stored reference first voiceprint feature information being higher than the corresponding first similarity threshold comprises: in response to a dialogue instruction, extracting second voiceprint feature information from the dialogue instruction; in response to the similarity between the first voiceprint feature information and the second voiceprint feature information being higher than a corresponding second similarity threshold, preserving only the target dialogue content corresponding to the second voiceprint feature information in the dialogue instruction so as to perform voice recognition on the target dialogue content; and in response to the similarity between the first voiceprint feature information and the second voiceprint feature information being not higher than the corresponding second similarity threshold, discarding voice recognition of the dialogue instruction.
  4. The method according to claim 2, wherein starting, by the smart device, the dialogue mode so as to respond to subsequent dialogue instructions in response to the wake-up word signal being picked up comprises: extracting third voiceprint feature information from the wake-up word signal; in response to a dialogue instruction, extracting second voiceprint feature information from the dialogue instruction; in response to the similarity between the second voiceprint feature information and the third voiceprint feature information being higher than a corresponding third similarity threshold, preserving only the target dialogue content corresponding to the second voiceprint feature information in the dialogue instruction so as to perform voice recognition on the target dialogue content; and in response to the similarity between the second voiceprint feature information and the third voiceprint feature information being not higher than the corresponding third similarity threshold, discarding voice recognition of the dialogue instruction.
  5. The method according to any one of claims 1-4, further comprising: in each round of dialogue in the dialogue mode, picking up, by the smart device, a dialogue instruction within a preset dialogue pickup period; and if no dialogue instruction is picked up after the dialogue pickup period has elapsed, exiting, by the smart device, the dialogue mode.
  6. The method according to claim 1, wherein before the step of extracting the first voiceprint feature information from the voice signal in response to the picked-up voice signal, the method further comprises: entering the reference first voiceprint feature information and storing the reference first voiceprint feature information.
  7. The method according to claim 6, wherein the step of entering the reference first voiceprint feature information comprises: playing corresponding corpus information and issuing voiceprint entry prompt information corresponding to the corpus information, the voiceprint entry prompt information being used to instruct a user to read the corpus information aloud; and entering the reference first voiceprint feature information based on the user's voice reading the corpus information.
  8. The method according to claim 1, wherein the step of extracting the first voiceprint feature information from the voice signal in response to the picked-up voice signal comprises: in response to the picked-up voice signal, extracting feature vectors of the frames of the voice signal by means of Mel-frequency cepstral coefficients; and combining the feature vectors of the frames of the voice signal to obtain the first voiceprint feature information in the voice signal.
  9. The method according to claim 8, wherein the step of extracting the feature vectors of the frames of the voice signal by means of Mel-frequency cepstral coefficients comprises: preprocessing the voice signal, the preprocessing comprising pre-emphasis, framing, and windowing; performing a fast Fourier transform on each windowed frame of the voice signal to convert it from the time domain to the frequency domain, obtaining a magnitude spectrum for each frame; integrating the magnitude spectrum of each frame with each Mel filter to obtain a set of Mel-scale energy values; taking the natural logarithm of the energy values output from each Mel filter; performing a discrete cosine transform on the logarithmic energy values to obtain the cepstral coefficients of each frame; and performing dynamic feature extraction on the cepstral coefficients of each frame to obtain the feature vector of each frame.
  10. A smart device configured to perform the interaction method of any one of claims 1-9.
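The MFCC pipeline described in claims 8-9 can be sketched roughly as follows. This is an illustrative Python sketch only: the sample rate, frame size, filter count, and coefficient count are hypothetical choices, a naive DFT stands in for the fast Fourier transform, and the dynamic-feature (delta) step of claim 9 is omitted for brevity. A real implementation would use an FFT and tuned parameters.

```python
import math

# Hypothetical parameters chosen for illustration only.
SAMPLE_RATE = 8000
FRAME_LEN = 64          # samples per frame (kept tiny for the sketch)
FRAME_STEP = 32
NUM_FILTERS = 8
NUM_CEPS = 5

def pre_emphasis(signal, alpha=0.97):
    # Preprocessing step 1: boost high frequencies, y[n] = x[n] - alpha*x[n-1].
    return [signal[0]] + [signal[n] - alpha * signal[n - 1]
                          for n in range(1, len(signal))]

def frame_and_window(signal):
    # Preprocessing steps 2-3: overlapping frames, each Hamming-windowed.
    frames = []
    for start in range(0, len(signal) - FRAME_LEN + 1, FRAME_STEP):
        frame = signal[start:start + FRAME_LEN]
        frames.append([s * (0.54 - 0.46 * math.cos(2 * math.pi * n / (FRAME_LEN - 1)))
                       for n, s in enumerate(frame)])
    return frames

def magnitude_spectrum(frame):
    # Time domain -> frequency domain; naive DFT instead of an FFT.
    n_bins = FRAME_LEN // 2 + 1
    spec = []
    for k in range(n_bins):
        re = sum(s * math.cos(2 * math.pi * k * n / FRAME_LEN) for n, s in enumerate(frame))
        im = -sum(s * math.sin(2 * math.pi * k * n / FRAME_LEN) for n, s in enumerate(frame))
        spec.append(math.hypot(re, im))
    return spec

def mel_filterbank(n_bins):
    # Triangular filters evenly spaced on the Mel scale up to Nyquist.
    def hz_to_mel(hz): return 2595 * math.log10(1 + hz / 700)
    def mel_to_hz(mel): return 700 * (10 ** (mel / 2595) - 1)
    mel_points = [i * hz_to_mel(SAMPLE_RATE / 2) / (NUM_FILTERS + 1)
                  for i in range(NUM_FILTERS + 2)]
    bins = [int(round((n_bins - 1) * mel_to_hz(m) / (SAMPLE_RATE / 2))) for m in mel_points]
    banks = []
    for f in range(1, NUM_FILTERS + 1):
        left, center, right = bins[f - 1], bins[f], bins[f + 1]
        bank = [0.0] * n_bins
        for b in range(left, center):
            bank[b] = (b - left) / (center - left)
        for b in range(center, right + 1):
            bank[b] = (right - b) / (right - center)
        banks.append(bank)
    return banks

def mfcc_frame(spec, banks):
    # Filterbank energies -> natural log -> DCT-II, keeping NUM_CEPS coefficients.
    energies = [max(sum(s * w for s, w in zip(spec, bank)), 1e-10) for bank in banks]
    log_e = [math.log(e) for e in energies]
    return [sum(le * math.cos(math.pi * c * (m + 0.5) / NUM_FILTERS)
                for m, le in enumerate(log_e)) for c in range(NUM_CEPS)]

# Toy stand-in for a picked-up voice signal: a 500 Hz tone.
signal = [math.sin(2 * math.pi * 500 * n / SAMPLE_RATE) for n in range(256)]
frames = frame_and_window(pre_emphasis(signal))
banks = mel_filterbank(FRAME_LEN // 2 + 1)
features = [mfcc_frame(magnitude_spectrum(f), banks) for f in frames]
print(len(features), len(features[0]))  # prints "7 5": 7 frames, 5 coefficients each
```

Per claim 8, the per-frame vectors would then be combined (for example, pooled across frames) into the voiceprint feature information that is compared against the stored reference.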

Description

Interaction Method of a Smart Device, and Smart Device

Technical Field

The invention belongs to the technical field of smart homes, and in particular relates to an interaction method for a smart device, and to the smart device.

Background

At present, an existing smart device, once powered on, is either in a waiting-wake state or in a dialogue mode. The smart device enters the dialogue mode after recognizing a wake-up word. While in the dialogue mode, it continuously collects sound from the surrounding environment, processes the sound signals, and sends them over a Wi-Fi network to the cloud for recognition and understanding, then performs the corresponding actions according to the recognition and understanding results. If no voice is recognized for a period of time (for example, 15 seconds) after entering the dialogue mode, the smart device actively exits the dialogue mode and returns to the waiting-wake state. However, this overall process has two significant problems:

1. If the user interacts with the smart device by voice frequently (for example, once every 30 seconds), then because the smart device returns to the waiting-wake state after 15 seconds without collecting voice, the user must first speak a specific wake-up word to wake the smart device before speaking the dialogue content each time, which greatly affects the efficiency and experience of the user's dialogue with the smart device.
2. When the smart device is in the dialogue mode, it always collects sound from the surrounding environment. When other people speak nearby, what they say is collected by the smart device and sent to the cloud for recognition and understanding, and the smart device performs the corresponding actions according to the results. From the user's perspective, that content is not something the user intended to say to the smart device, so the device's responses to such sound make for a poor experience; and if the user is in a dialogue with the smart device, the device's frequent responses to other people's speech reduce the efficiency of the user's dialogue with it.

Disclosure of the Invention

Therefore, the invention aims to provide an interaction method for a smart device, and the smart device, whereby a user registers reference voiceprint feature information in advance so that the wake-up step can be omitted during interaction: the user speaks the interactive content to the smart device directly, avoiding the need to speak wake-up words repeatedly and thus improving the efficiency and experience of voice interaction; the sounds of other people nearby are effectively filtered out during the dialogue stage, avoiding their interference with the dialogue between the user and the smart device; and filtering out other people's sounds during the dialogue also reduces the amount of sound data uploaded to the cloud, thereby reducing cloud computing pressure.
In a first aspect, an embodiment of the invention provides an interaction method for a smart device, applied to the smart device, the method comprising: in response to a picked-up voice signal, extracting first voiceprint feature information from the voice signal; and in response to the similarity between the first voiceprint feature information and pre-stored reference first voiceprint feature information being higher than a corresponding first similarity threshold, starting, by the smart device, a dialogue mode so as to respond to subsequent dialogue instructions.

In an alternative embodiment of the present application, the method further comprises: in response to a picked-up wake-up word signal, starting, by the smart device, the dialogue mode so as to respond to subsequent dialogue instructions.

In an alternative embodiment of the present application, the step of starting the dialogue mode by the smart device so as to respond to subsequent dialogue instructions when the similarity between the first voiceprint feature information and the pre-stored reference first voiceprint feature information is higher than the corresponding first similarity threshold comprises: in response to a dialogue instruction, extracting second voiceprint feature information from the dialogue instruction, and preserving only the target dialogue content corresponding to the second voiceprint feature information in the dialogue instruction so as to perform voice recognition on the target dialogue content.
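The threshold decision of the first aspect can be illustrated with a minimal sketch. The cosine-similarity metric, the 0.85 threshold, and the toy voiceprint vectors below are all assumptions for illustration; the patent does not fix a particular similarity measure or threshold value.

```python
import math

FIRST_SIMILARITY_THRESHOLD = 0.85  # hypothetical value for illustration

def cosine_similarity(a, b):
    # Similarity between two voiceprint feature vectors, in [-1, 1].
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def should_start_dialogue(voiceprint, reference, threshold=FIRST_SIMILARITY_THRESHOLD):
    # Start the dialogue mode only when the picked-up voiceprint matches the
    # pre-stored reference closely enough; otherwise remain in the wake-wait state.
    return cosine_similarity(voiceprint, reference) > threshold

reference = [0.9, 0.2, 0.4, 0.1]      # enrolled user's stored reference voiceprint
matching = [0.88, 0.22, 0.41, 0.12]   # same speaker, slight variation
other = [0.1, 0.9, 0.05, 0.7]         # a different speaker

print(should_start_dialogue(matching, reference))  # prints True: dialogue mode starts
print(should_start_dialogue(other, reference))     # prints False: stay waiting for wake-up
```

The same comparison, applied per dialogue instruction against a second similarity threshold, would implement the speaker filtering of the alternative embodiment: non-matching speech is discarded rather than sent to the cloud for recognition.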