US-12626716-B2 - Methods and systems for voice control
Abstract
A user device may detect speech and use “early exiting” when identifying a potential operational command in the detected speech. The implementation of early exiting may be based on a variable threshold, where variable sensitivity settings for the threshold may be used to control how quickly, and whether, an “early exit” or early prediction of an operational command will occur. An early exit threshold may be adjusted, for example, based on network conditions, to ensure optimal operational command determination from the audio.
Inventors
- Raphael Tang
- Karun Kumar
- Wenyan Li
- Gefei Yang
- Yajie Mao
- Yang Hu
- Rui Min
- Geoffrey Murray
Assignees
- COMCAST CABLE COMMUNICATIONS, LLC
Dates
- Publication Date: 2026-05-12
- Application Date: 2021-12-30
Claims (20)
- 1 . A method comprising: determining, based on detected audio, a portion of a predicted operational command; determining, based on the portion of the predicted operational command, a complete operational command and a confidence score associated with the complete operational command; determining one or more network conditions; and based on the one or more network conditions, and the confidence score satisfying a command execution threshold associated with how quickly the complete operational command will be executed, executing the complete operational command.
- 2 . The method of claim 1 , wherein the one or more network conditions comprise one or more of: an amount of data communicated, an amount of available bandwidth, an amount of errors, or an amount of operational commands or partial operational commands received by a network device.
- 3 . The method of claim 1 , further comprising determining the command execution threshold, wherein the command execution threshold comprises one or more network condition thresholds.
- 4 . The method of claim 3 , wherein: the command execution threshold is satisfied when at least a portion of a plurality of portions of the audio corresponds to at least a portion of a plurality of portions of the predicted operational command.
- 5 . The method of claim 1 , further comprising determining, based on a change in the one or more network conditions, an updated command execution threshold.
- 6 . The method of claim 1 , further comprising determining the detected audio comprises one or more of: a plurality of phonemes, a plurality of words, or a plurality of phonetic sounds.
- 7 . The method of claim 1 , wherein detecting the audio comprises one or more of: voice recognition or natural language processing.
- 8 . The method of claim 1 , further comprising determining the detected audio corresponds to one or more stored operational commands.
- 9 . The method of claim 1 , wherein the confidence score indicates that at least one of: a word associated with the audio corresponds to a word associated with the predicted operational command, or a phonetic sound associated with the audio corresponds to a phonetic sound associated with the predicted operational command.
- 10 . The method of claim 1 , wherein the predicted operational command is associated with a target device, wherein executing the predicted operational command comprises sending the predicted operational command to the target device.
- 11 . A method comprising: determining, based on a first portion of audio, one or more initial predicted operational commands and one or more initial confidence scores associated with the one or more initial predicted operational commands; determining, based on a network condition, a command execution threshold; determining the one or more initial confidence scores do not satisfy the command execution threshold; updating, based on detecting a change in the network condition, the command execution threshold; determining, based on the first portion of the audio and a second portion of the audio, a second predicted operational command and a second confidence score, wherein the second confidence score satisfies the updated command execution threshold; and based on the second confidence score satisfying the updated command execution threshold, executing the second predicted operational command.
- 12 . The method of claim 11 , wherein the network condition comprises at least one of: an amount of data communicated over a network, an amount of available bandwidth of the network, an amount of network errors, or an amount of operational commands received by a network device associated with the network.
- 13 . The method of claim 11 , wherein updating the command execution threshold comprises: receiving, from a network device, an indication of the change in the network condition; and updating, based on the indication of the change in the network condition, the command execution threshold.
- 14 . The method of claim 11 , wherein determining the one or more initial predicted operational commands comprises providing the first portion of the audio to a trained machine learning model.
- 15 . The method of claim 11 , wherein determining the one or more initial predicted operational commands comprises: determining a first portion of the audio corresponds to a first portion of an operational command; determining a second portion of the audio corresponds to a second portion of the operational command; and determining, based on the first portion of the audio corresponding to the first portion of the operational command and the second portion of the audio corresponding to the second portion of the operational command, a confidence score.
- 16 . The method of claim 11 , wherein the command execution threshold comprises a setting configured to control how quickly the second predicted operational command will be executed.
- 17 . A method comprising: detecting audio comprising a partial operational command; determining, based on the audio, a first confidence score associated with the partial operational command; based on detecting the audio, determining a command execution threshold, wherein the command execution threshold is associated with how quickly a complete operational command associated with the partial operational command will be executed; determining the first confidence score does not satisfy the command execution threshold; updating, based on detecting a change in a network condition, the command execution threshold; determining the first confidence score satisfies the updated command execution threshold; and based on the first confidence score satisfying the updated command execution threshold, executing the complete operational command associated with the partial operational command.
- 18 . The method of claim 17 , wherein the network condition comprises one or more of: an amount of data communicated over a network, an amount of available bandwidth of the network, an amount of network errors, or an amount of operational commands received by a network device associated with the network.
- 19 . The method of claim 17 , wherein the complete operational command is associated with one or more target devices, wherein executing the complete operational command comprises sending the complete operational command to the one or more target devices.
- 20 . The method of claim 17 , wherein the command execution threshold comprises a network condition threshold.
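The flow recited in claims 11 and 17 — score candidate commands against a command execution threshold, update the threshold when a network condition changes, and exit early once a confidence satisfies it — can be illustrated with a minimal sketch. The function names (`execution_threshold`, `try_early_exit`), the congestion estimate in [0, 1], and the specific base/swing values are assumptions made for this example, not taken from the patent:

```python
# Hypothetical sketch of the threshold-update flow of claims 11 and 17.
# A congested network favors an earlier (lower-threshold) exit, so less
# audio has to be streamed before a command is executed.

def execution_threshold(congestion: float) -> float:
    """Map a network-congestion estimate in [0, 1] to a command
    execution threshold; the 0.90 base and 0.25 swing are illustrative."""
    base, swing = 0.90, 0.25
    return base - swing * congestion

def try_early_exit(candidates, congestion):
    """Return the best predicted command if its confidence satisfies the
    current threshold, else None (keep listening for more audio).
    `candidates` is a list of (command, confidence) pairs."""
    threshold = execution_threshold(congestion)
    command, confidence = max(candidates, key=lambda pair: pair[1])
    return command if confidence >= threshold else None
```

On an idle network (congestion 0.0) a confidence of 0.8 does not satisfy the 0.90 threshold and the device keeps listening; on a congested network (congestion 0.8) the threshold drops to 0.70 and the same candidate exits early.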
Description
BACKGROUND

Speech recognition systems facilitate human interaction with computing devices, such as voice-enabled smart devices, by relying on speech. Such systems employ techniques to identify words spoken by a human user based on a received audio input (e.g., detected speech input, an utterance) and, combined with speech recognition and natural language processing techniques, determine one or more operational commands associated with the audio input. These systems enable speech-based control of a computing device to perform tasks based on the user's spoken commands. The speed at which the computing device, and/or a remote computing device, processes the received audio input has a direct impact on the user experience. Computational processing delays and network conditions such as traffic volume and error rates can negatively impact response times. Slow response times (e.g., the delay between when the user speaks and when the associated operational command is executed) degrade the user experience.

SUMMARY

It is to be understood that both the following general description and the following detailed description are exemplary and explanatory only and are not restrictive. Methods and systems for voice control are described. A computing device may receive audio. The audio may comprise one or more utterances. For example, a user device may comprise a voice-enabled device configured to receive and/or otherwise determine one or more utterances and may send the one or more utterances (and/or portions thereof) to the computing device. An utterance of the one or more utterances may comprise a word, a phrase, one or more portions thereof, combinations thereof, and the like. For example, the utterance may comprise one or more keywords. The computing device may be configured to process the one or more utterances and determine one or more operational commands associated with the one or more utterances.
The computing device may be configured for natural language processing (“NLP”) and/or natural language understanding (“NLU”) according to techniques known in the art. The computing device may be configured for “early exiting,” wherein, based on detecting (e.g., capturing, interpreting, etc.) a portion of an operational command (e.g., a partial operational command, etc.), one or more operational commands or one or more tasks related thereto may be predictively determined and executed. For example, if the computing device detects a first portion of an utterance comprising “H,” the computing device may determine one or more potential operational commands. For example, based on the first portion of the utterance “H,” the computing device may determine a first potential operational command of the one or more potential operational commands (e.g., “HBO”), a second potential operational command of the one or more potential operational commands (e.g., “HGTV”), and a third potential operational command of the one or more potential operational commands (e.g., “HBSN”).

This summary is not intended to identify critical or essential features of the disclosure, but merely to summarize certain features and variations thereof. Other details and features will be described in the sections that follow.

BRIEF DESCRIPTION OF THE DRAWINGS

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments and, together with the description, serve to explain the principles of the methods and systems:

FIG. 1 shows an example system;
FIG. 2A shows an example table;
FIG. 2B shows an example table;
FIG. 3 shows an example flowchart;
FIG. 4 shows an example flowchart;
FIG. 5 shows an example flowchart;
FIG. 6 shows an example flowchart;
FIG. 7 shows a block diagram of an example computing device; and
FIG. 8 shows example voice control results.
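The “early exiting” example above — a partial utterance “H” mapping to candidate commands — can be sketched as a simple prefix match over stored operational commands. This is an illustrative sketch only: the command inventory, the usage counts, and the frequency-based confidence scoring are assumptions for the example, not the patent's method.

```python
# Hypothetical inventory of stored operational commands with usage counts;
# both the commands and the counts are invented for illustration.
COMMAND_COUNTS = {"HBO": 60, "HGTV": 25, "HBSN": 15, "CNN": 40}

def predict_commands(partial: str):
    """Return (command, confidence) pairs for stored commands whose text
    starts with the detected partial utterance. Confidence here is the
    command's relative frequency among the matching candidates."""
    matches = {cmd: n for cmd, n in COMMAND_COUNTS.items()
               if cmd.startswith(partial.upper())}
    total = sum(matches.values()) or 1  # avoid division by zero on no match
    return sorted(((cmd, n / total) for cmd, n in matches.items()),
                  key=lambda pair: pair[1], reverse=True)
```

For the partial utterance “H” this returns “HBO,” “HGTV,” and “HBSN” ranked by confidence; a later, longer portion of the audio (e.g., “HG”) narrows the candidates and raises the confidence of those that survive.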
DETAILED DESCRIPTION

As used in the specification and the appended claims, the singular forms “a,” “an,” and “the” include plural referents unless the context clearly dictates otherwise. Ranges may be expressed herein as from “about” one particular value, and/or to “about” another particular value. When such a range is expressed, another configuration includes from the one particular value and/or to the other particular value. When values are expressed as approximations, by use of the antecedent “about,” it will be understood that the particular value forms another configuration. It will be further understood that the endpoints of each of the ranges are significant both in relation to the other endpoint, and independently of the other endpoint. “Optional” or “optionally” means that the subsequently described event or circumstance may or may not occur, and that the description includes cases where said event or circumstance occurs and cases where it does not. Throughout the description and claims of this specification, the word “comprise” and variations of the word, such as “comprising” and “comprises,” means “including but not limited to,” and is not intended to exclu