
US-12626700-B2 - Method and system for identifying a voice command in a continuous listening IoT environment

US12626700B2

Abstract

A method for identifying and executing a voice command in a continuous listening Internet of Things (IoT) environment, may include: receiving, by at least one IoT device, a voice input in the continuous listening IoT environment; detecting, by the at least one IoT device, an occurrence of at least one non-speech event in a vicinity of at least one other IoT device in the continuous listening IoT environment; determining, by the at least one IoT device, an ambient context associated with the at least one non-speech event; determining, by the at least one IoT device, a correlation between the ambient context and the at least one other IoT device based on an event location of the occurrence of the at least one non-speech event; and determining, by the at least one IoT device, presence of at least one voice command within the voice input based on the correlation.
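The pipeline described in the abstract (detect a non-speech event while a voice input is being received, derive its ambient context, correlate it with a co-located IoT device by event location, and only then treat the utterance as a command) can be sketched as follows. This is an illustrative sketch, not the patented implementation: the class names, the location-only correlation rule, and the `action_map` lookup are all assumptions introduced here.

```python
from dataclasses import dataclass

@dataclass
class NonSpeechEvent:
    sound_type: str   # e.g. "baby_crying", "glass_breaking"
    location: str     # room where the sound was detected
    urgency: float    # user attention level in [0, 1]

@dataclass
class IoTDevice:
    name: str
    location: str
    capabilities: set

def correlate(event: NonSpeechEvent, devices: list) -> list:
    """Correlate the event with devices based on the event location:
    return the devices co-located with the non-speech event."""
    return [d for d in devices if d.location == event.location]

def is_voice_command(utterance: str, correlated: list, action_map: dict) -> bool:
    """Treat the utterance as a command only if a correlated device can
    act on it; otherwise classify it as ordinary conversation."""
    action = action_map.get(utterance.lower())
    if action is None:
        return False
    return any(action in d.capabilities for d in correlated)
```

Under this sketch, "Play Lullaby" spoken while a baby cries in the bedroom is accepted as a command because a bedroom speaker is correlated with the crying event, even though the utterance was captured elsewhere.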

Inventors

  • Manjunath Belgod LOKANATH
  • Vinay Vasanth Patage

Assignees

  • SAMSUNG ELECTRONICS CO., LTD.

Dates

Publication Date
2026-05-12
Application Date
2023-12-12
Priority Date
2023-01-05

Claims (14)

  1. A method for identifying and executing a voice command in a continuous listening Internet of Things (IoT) environment, the method comprising: receiving, by at least one IoT device, a voice input in the continuous listening IoT environment; detecting, by the at least one IoT device while the voice input is being received, an occurrence of at least one non-speech event in a vicinity of at least one other IoT device in the continuous listening IoT environment; determining, by the at least one IoT device, an ambient context associated with the at least one non-speech event; determining, by the at least one IoT device, a correlation between the ambient context and the at least one other IoT device based on an event location of the occurrence of the at least one non-speech event; determining, by the at least one IoT device, presence of at least one voice command within the voice input based on the correlation; and executing, by the at least one IoT device, the at least one voice command, or instructing, by the at least one IoT device, the at least one other IoT device to execute the at least one voice command, wherein detecting the occurrence of the at least one non-speech event comprises: detecting at least one non-speech sound while the voice input is being received; and marking the at least one non-speech sound with a user attention level indicating a degree of urgency associated with the at least one non-speech sound.
  2. The method as claimed in claim 1, wherein the ambient context of the at least one non-speech event is determined based on at least one of the event location, a user attention, and a type of the at least one non-speech event.
  3. The method as claimed in claim 1, further comprising: transmitting, by continuous listening, the voice input for speech recognition upon receiving the voice input.
  4. The method as claimed in claim 3, further comprising: generating a textual description and at least one relevant tag associated with the voice input based on performing a voice activity detection and an automatic speech recognition.
  5. The method as claimed in claim 4, wherein the at least one relevant tag comprises information of a point of interest, a time, and a noun and at least one other part of speech associated with the voice input.
  6. The method as claimed in claim 1, wherein the determining the correlation comprises: feeding, to an Artificial Intelligence (AI), the at least one non-speech event, the ambient context, and a state of the at least one other IoT device that is present at the event location based on the voice input being received; and determining, by the AI, the correlation between the ambient context and the at least one other IoT device based on the event location, an urgency associated with the at least one non-speech event, and the state of the at least one other IoT device.
  7. The method as claimed in claim 6, wherein the at least one non-speech event occurs at a physical location of the at least one other IoT device.
  8. The method as claimed in claim 6, further comprising: generating a relational table of available IoT devices that are capable of executing the at least one voice command.
  9. The method as claimed in claim 1, wherein determining that the voice input comprises the at least one voice command comprises: receiving a textual description associated with the voice input; determining whether the textual description associated with the voice input comprises the at least one voice command; evaluating, based on determining that the textual description associated with the voice input is the at least one voice command, the textual description associated with the voice input, the at least one non-speech event, a first state of the at least one IoT device, a second state of the at least one other IoT device, a first capability associated with each of the at least one IoT device, and a second capability associated with the at least one other IoT device; and generating a probable outcome indicating that the voice input comprises the at least one voice command based on the evaluating.
  10. The method as claimed in claim 9, wherein the voice input, non-speech data, the first state and the second state, and the first capability and the second capability are fetched from a dynamic IoT mesh.
  11. The method as claimed in claim 9, further comprising: capturing, by a dynamic IoT mesh upon determining that the textual description is the voice command from the at least one IoT device in the continuous listening IoT environment, the at least one non-speech event, the first state, at least one operational capability associated with the at least one IoT device, at least one non-speech sound associated with the occurrence of the at least one non-speech event, a user attention associated with the at least one non-speech sound, the event location, and a physical location of the at least one IoT device.
  12. The method as claimed in claim 1, wherein the at least one non-speech event is an event with a degree of urgency where a non-speech sound is produced in the continuous listening IoT environment by at least one of a person, an animal, a device, and a machine.
  13. The method as claimed in claim 1, wherein the occurrence of the at least one non-speech event in a vicinity of the at least one other IoT device is based on at least one non-speech sound in the continuous listening IoT environment.
  14. A system for identifying and executing a voice command in a continuous listening Internet of Things (IoT) environment, the system comprising: at least one IoT device configured to receive a voice input in the continuous listening IoT environment, the at least one IoT device comprising: a memory storing instructions; and at least one processor operatively connected to the memory and configured to execute the instructions to: receive the voice input in the continuous listening IoT environment; detect, while the voice input is being received, an occurrence of at least one non-speech event in a vicinity of at least one other IoT device in the continuous listening IoT environment; determine an ambient context associated with the at least one non-speech event; determine a correlation between the ambient context and the at least one other IoT device based on an event location of the occurrence of the at least one non-speech event; determine presence of at least one voice command within the voice input based on the correlation; and execute the at least one voice command, or instruct the at least one other IoT device to execute the at least one voice command, wherein the occurrence of the at least one non-speech event is detected by: detecting at least one non-speech sound while the voice input is being received; and marking the at least one non-speech sound with a user attention level indicating a degree of urgency associated with the at least one non-speech sound.
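Two of the claimed steps lend themselves to a compact sketch: claim 8's relational table of available IoT devices capable of executing a resolved command, and claim 1's final execute-or-instruct step. The dict-based device schema (`name`, `location`, `state`, `capabilities`) and first-capable-device selection below are assumptions made for illustration, not the patent's data model.

```python
def build_capability_table(devices, action):
    """Sketch of claim 8's relational table: one row per available IoT
    device capable of executing the resolved voice command."""
    return [(d["name"], d["location"], d["state"])
            for d in devices if action in d["capabilities"]]

def dispatch(receiver, others, action):
    """Sketch of claim 1's final step: execute the command locally if the
    receiving device supports it, otherwise instruct a capable other
    device; reject if no device can act."""
    if action in receiver["capabilities"]:
        return ("execute", receiver["name"])
    capable = build_capability_table(others, action)
    if capable:
        return ("instruct", capable[0][0])  # assumed: pick first capable device
    return ("reject", None)
```

In the laundry-room scenario from the Description, the smart washing machine (the receiver) cannot play a lullaby, so `dispatch` would instruct the bedroom speaker instead of rejecting the command outright.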

Description

CROSS-REFERENCE TO RELATED APPLICATIONS

This application is a bypass continuation of International Application No. PCT/IB2023/061823, filed on Nov. 23, 2023, which is based on and claims priority to India Patent Application No. 202341001155, filed on Jan. 5, 2023, in Intellectual Property India, the disclosures of which are incorporated by reference herein in their entireties.

BACKGROUND

1. Field

The disclosure generally relates to the field of voice recognition, and more particularly to a method and system for identifying a voice command in a continuous listening Internet of Things (IoT) environment.

2. Description of Related Art

Continuous listening is an emerging technology in which a voice assistant can be interacted with without saying a wake-up word such as 'Hi Bixby', 'Ok Google', or 'Alexa'. A continuous listening classification model relies on command-versus-conversation identification to recognize a command and wake itself up. However, such a classification model supports only limited actions and is not dynamically scalable. A conventional model also fails to recognize the user's surroundings and dynamically scale itself for command-versus-conversation classification, degrading performance and the user experience. In particular, a conventional model fails to recognize non-speech event references in the case of contextual commands, and will therefore reject a contextual utterance of the user because it cannot accurately identify the non-speech event as context. FIGS. 1A and 1B illustrate various problem scenarios according to the related art. As can be seen in (A) of FIGS. 1A and 1B, the user is in a laundry room when a baby starts crying in a bedroom. The user provides the command "Play Lullaby". However, the user is interacting with a smart washing machine that is incapable of performing the command, and so the command is not recognized.
Thus, the smart washing machine rejects the command because it does not support it. In another example scenario, shown in (B) of FIGS. 1A and 1B, one user is playing in a living room while another user is working in a kitchen. The user in the kitchen starts a conversation with the user in the living room, but the user in the living room replies that he cannot hear the user in the kitchen. In a smart home environment, one or more smart devices are connected with each other and are capable of interacting with each other. However, in the scenario depicted in (B) of FIGS. 1A and 1B, the smart devices near the user in the living room are unable to classify any contextual utterance or contextual command of the user. Thus, any command that is part of an on-going conversation is rejected because the nearby smart devices do not support such a command. Accordingly, the command provided by the user in the living room may be rejected by the nearby smart devices. In related art solutions, the classification model fails to recognize a part of an on-going conversation as a command in correlation with a non-speech event happening in the vicinity of the user. Thus, there is a need to overcome the above-mentioned drawbacks.

SUMMARY

Provided are a method and a system for identifying and executing a voice command in a continuous listening Internet of Things (IoT) environment. Additional aspects will be set forth in part in the description which follows and, in part, will be apparent from the description, or may be learned by practice of the presented embodiments.
According to an aspect of the disclosure, a method for identifying and executing a voice command in a continuous listening Internet of Things (IoT) environment may include: receiving, by at least one IoT device, a voice input in the continuous listening IoT environment; detecting, by the at least one IoT device, an occurrence of at least one non-speech event in a vicinity of at least one other IoT device in the continuous listening IoT environment, wherein the at least one non-speech event is detected while the voice input is being received; determining, by the at least one IoT device, an ambient context associated with the at least one non-speech event; determining, by the at least one IoT device, a correlation between the ambient context and the at least one other IoT device based on an event location of the occurrence of the at least one non-speech event; determining, by the at least one IoT device, presence of at least one voice command within the voice input based on the correlation; and, by the at least one IoT device, executing the at least one voice command, or instructing the at least one other IoT device to execute the at least one voice command. The ambient context of the at least one non-speech event may be determined based on one or more of the event location, a user attention, and a type of the at least one non-speech event. The method may include transmitting, by continuous listening, the voice input for speech recognition upon receiving the voice input.
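Claims 1 and 12 require that each detected non-speech sound be marked with a user attention level indicating its degree of urgency. A minimal sketch of that marking step is below; the numeric scores, the sound-type names, and the lookup-table approach are all assumptions for illustration, since the disclosure does not prescribe how the level is computed.

```python
# Assumed urgency scores per sound type; the disclosure only requires that
# each detected non-speech sound be marked with a user attention level.
DEFAULT_ATTENTION = 0.1
ATTENTION_LEVELS = {
    "baby_crying": 0.9,      # high urgency: sound produced by a person
    "doorbell": 0.6,         # medium urgency: sound produced by a device
    "dishwasher_beep": 0.3,  # low urgency: sound produced by a machine
}

def mark_attention(sound_type: str) -> dict:
    """Mark a detected non-speech sound with its user attention level,
    falling back to a low default for unrecognized sounds."""
    return {"sound": sound_type,
            "attention_level": ATTENTION_LEVELS.get(sound_type, DEFAULT_ATTENTION)}
```

The resulting attention level would then feed the correlation step, where claim 6's AI weighs urgency alongside the event location and device state.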