US-20210104238-A1 - VOICE ENABLEMENT AND DISABLEMENT OF SPEECH PROCESSING FUNCTIONALITY


Abstract

Methods and devices for enabling and disabling applications using voice are described herein. In some embodiments, an individual may speak an utterance to their electronic device, which may send audio data representing the utterance to a backend system. The backend system may generate text data representing the utterance, and may determine that an intent of the utterance was for an application to be enabled or disabled for the individual's user account on the backend system. If, for instance, the intent was to enable the application, the backend system may receive one or more rules for performing functionalities of the application, as well as one or more sample templates of sample utterances and sample responses that future utterances requesting the application may follow. Furthermore, one or more invocation phrases that may be used within future utterances to invoke the application may be received, along with slot values for the sample templates.
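The abstract combines two steps: recognizing an enable or disable intent in the text data, and, on enablement, receiving invocation phrases, sample templates, and slot values. The sketch below is illustrative only and assumes details the source does not give: the regular-expression intent patterns, the `ApplicationBundle` shape, the `{invocation}`/`{slot}` template syntax, and the names `classify` and `expand_templates` are all hypothetical. It shows one plausible way the received templates could be expanded into concrete sample utterances.

```python
import itertools
import re
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# Hypothetical surface patterns for the enable and disable intents.
_ENABLE = re.compile(r"^(?:please\s+)?enable\s+(?P<app>.+)$", re.IGNORECASE)
_DISABLE = re.compile(r"^(?:please\s+)?disable\s+(?P<app>.+)$", re.IGNORECASE)


def classify(text: str) -> Optional[Tuple[str, str]]:
    """Return ("enable" | "disable", application name), or None for other utterances."""
    text = text.strip()
    for intent, pattern in (("enable", _ENABLE), ("disable", _DISABLE)):
        match = pattern.match(text)
        if match:
            return intent, match.group("app").strip().lower()
    return None


@dataclass
class ApplicationBundle:
    """Hypothetical shape of what the backend receives when an application is enabled."""
    name: str
    invocation_phrases: List[str]
    sample_templates: List[str]          # e.g. "ask {invocation} for a {topic} joke"
    slot_values: Dict[str, List[str]]    # slot name -> allowed values


def expand_templates(bundle: ApplicationBundle) -> List[str]:
    """Fill each template with every invocation phrase and slot-value combination."""
    utterances = []
    for template in bundle.sample_templates:
        # Slot names appearing in the template, excluding the invocation placeholder.
        slots = sorted(set(re.findall(r"\{(\w+)\}", template)) - {"invocation"})
        choices = [bundle.slot_values[s] for s in slots]
        for invocation in bundle.invocation_phrases:
            for combo in itertools.product(*choices):
                values = dict(zip(slots, combo), invocation=invocation)
                utterances.append(template.format(**values))
    return utterances
```

For example, `classify("Enable Daily Jokes")` yields `("enable", "daily jokes")`, and a bundle with the template `"ask {invocation} for a {topic} joke"` and topics `cat` and `dog` expands to two sample utterances. Expanding templates eagerly is only one choice; a production NLU system would more likely train a model on the templates than match literal strings.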

Inventors

D'SOUZA SHAMAN
SUTTLE IAN
NORI SRIKANTH
REDDY RAJIV
KANITKAR AMOL
OROOJI TINA

Assignees

AMAZON TECH INC

Dates

Publication Date: 20210408
Application Date: 20200924
Priority Date: 20160627

Claims (20)

1.-20. (canceled)
21. A computer-implemented method, comprising: receiving first input data; determining that the first input data corresponds to a first command to enable audio initiation of a first action of a computing system; based at least in part on the first input data corresponding to the first command, configuring an updated speech processing component to cause the first action to be performed in response to recognition of a first phrase; after configuring the updated speech processing component, receiving first audio data; processing the first audio data using the updated speech processing component to detect the first phrase; and based at least in part on detection of the first phrase, causing the first action to be performed.
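The steps of claim 21 can be traced with a small, non-authoritative sketch. It assumes a hypothetical `SpeechComponent` that maps phrases to actions, and it stands in for audio data with a text transcript, since the claim does not specify how recognition works. The helpers `is_enable_command` and `run_flow` are likewise illustrative names, not the patent's terms. Producing a new component, rather than mutating the existing one, mirrors the claim's language of configuring an "updated" speech processing component.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional, Tuple

Action = Callable[[], str]


@dataclass(frozen=True)
class SpeechComponent:
    """Hypothetical stand-in for a speech processing component: phrase -> action."""
    phrases: Dict[str, Action] = field(default_factory=dict)

    def updated_with(self, phrase: str, action: Action) -> "SpeechComponent":
        # Return a new (updated) component; the current one is left unchanged.
        return SpeechComponent({**self.phrases, phrase.lower(): action})

    def detect(self, transcript: str) -> Optional[str]:
        # Stand-in for recognition: look for a known phrase in a transcript of the audio.
        text = transcript.lower()
        return next((p for p in self.phrases if p in text), None)


def is_enable_command(input_text: str) -> bool:
    # Hypothetical check that the first input data corresponds to an enable command.
    return input_text.strip().lower().startswith("enable ")


def run_flow(component: SpeechComponent, input_text: str, phrase: str,
             action: Action, audio_transcript: str) -> Tuple[SpeechComponent, Optional[str]]:
    """Follow claim 21 in order: check the input, configure, process later audio, act."""
    if is_enable_command(input_text):
        component = component.updated_with(phrase, action)
    detected = component.detect(audio_transcript)
    result = component.phrases[detected]() if detected is not None else None
    return component, result
```

Calling `run_flow(SpeechComponent(), "Enable weather", "what's the weather", lambda: "sunny", "Alexa, what's the weather today")` returns an updated component together with the action's result, while the original empty component remains unchanged.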
22. The computer-implemented method of claim 21, further comprising: prior to receiving the first input data, receiving a request to operate a first application; and after receiving the request to operate the first application, presenting a prompt to enable the audio initiation of the first action, wherein the first input data is received after presenting the prompt.
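Claim 22 orders the exchange as follows: a request to operate an application, then a prompt to enable it, then the enabling input. The sketch below models that ordering with a hypothetical `Session` class. The prompt wording, the accepted affirmative replies, and the in-memory `enabled` set are assumptions made for illustration only.

```python
from typing import Iterable, Optional, Set

# Hypothetical set of replies treated as confirming enablement.
_AFFIRMATIVE = {"yes", "yeah", "sure"}


class Session:
    """Hypothetical dialog state: which applications are enabled, and any pending prompt."""

    def __init__(self, enabled: Iterable[str] = ()) -> None:
        self.enabled: Set[str] = set(enabled)
        self.pending: Optional[str] = None  # application awaiting a yes/no answer

    def request(self, app: str) -> str:
        # A request to operate an application that is not yet enabled triggers a prompt.
        if app in self.enabled:
            return f"Opening {app}."
        self.pending = app
        return f"{app} is not enabled. Would you like to enable it?"

    def answer(self, reply: str) -> str:
        # The enabling input is only meaningful after the prompt has been presented.
        if self.pending is None:
            return "Nothing to confirm."
        app, self.pending = self.pending, None
        if reply.strip().lower() in _AFFIRMATIVE:
            self.enabled.add(app)
            return f"{app} is now enabled."
        return f"Okay, {app} was not enabled."
```

Keeping the pending application in session state means a stray "yes" is ignored unless it follows a prompt, which reflects the claim's requirement that the first input data be received after the prompt.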
23. The computer-implemented method of claim 22, further comprising: determining the first input data is related to the first application.
24. The computer-implemented method of claim 22, wherein the first action is related to the first application.
25. The computer-implemented method of claim 22, wherein causing the first action to be performed comprises invoking the first application.
26. The computer-implemented method of claim 21, wherein the first input data corresponds to a touch input.
27. The computer-implemented method of claim 21, wherein the first input data is received by a first device associated with a user account and the method further comprises: associating the updated speech processing component with the user account.
28. The computer-implemented method of claim 21, further comprising, after configuring the updated speech processing component: receiving second input data; determining that the second input data corresponds to a second command to disable audio initiation of the first action; and based at least in part on the second input data corresponding to the second command, configuring a further updated speech processing component.
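Claim 28 describes disablement as configuring a "further updated" component instead of simply deleting the first one. One way to picture this is as a sequence of read-only versions. The sketch below rests on several assumptions: it represents a component as a hypothetical mapping from phrase to action identifier (for example `"trivia.start"`), and it treats disabling audio initiation of an action as removing every phrase that triggers that action.

```python
from types import MappingProxyType
from typing import Mapping

Component = Mapping[str, str]  # phrase -> action identifier (hypothetical shape)


def enable_phrase(component: Component, phrase: str, action_id: str) -> Component:
    # Each call yields a new read-only version; earlier versions stay intact.
    return MappingProxyType({**component, phrase.lower(): action_id})


def disable_action(component: Component, action_id: str) -> Component:
    # "Further updated" component: drop every phrase that initiated this action.
    return MappingProxyType({p: a for p, a in component.items() if a != action_id})
```

Because earlier versions are never mutated, a backend following this pattern could keep the previous component available, for instance to roll back a mistaken disablement.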
29. The computer-implemented method of claim 21, further comprising: receiving the first input data from a first device; determining that a first speech processing component is associated with the first device; configuring the updated speech processing component at least in part by updating the first speech processing component; receiving the first audio data from the first device; and determining that the updated speech processing component is associated with the first device.
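Claim 29 ties the component to the device that supplied both the enabling input and the later audio. The following is a minimal sketch under stated assumptions: the `DeviceComponentRegistry` class, its device identifiers, and the use of transcripts in place of audio are all hypothetical. It shows a device-keyed lookup in which an update on one device leaves other devices unaffected.

```python
from typing import Dict, Optional


class DeviceComponentRegistry:
    """Hypothetical map from device identifier to that device's phrase->action component."""

    def __init__(self) -> None:
        self._by_device: Dict[str, Dict[str, str]] = {}

    def component_for(self, device_id: str) -> Dict[str, str]:
        # Determine which speech processing component is associated with the device.
        return self._by_device.get(device_id, {})

    def update(self, device_id: str, phrase: str, action_id: str) -> Dict[str, str]:
        # Configure the updated component by updating the device's current one.
        updated = {**self.component_for(device_id), phrase.lower(): action_id}
        self._by_device[device_id] = updated
        return updated

    def route(self, device_id: str, transcript: str) -> Optional[str]:
        # Process later audio (represented here by its transcript) with that device's component.
        text = transcript.lower()
        for phrase, action_id in self.component_for(device_id).items():
            if phrase in text:
                return action_id
        return None
```

Claim 27 associates the component with a user account instead. Under the same assumptions, keying the dictionary by account identifier rather than device identifier would cover that variant.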
30. The computer-implemented method of claim 21, further comprising: determining at least one word associated with the first input data, wherein configuring the updated speech processing component causes the at least one word to be associated with the first action.
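In claim 30, words drawn from the enabling input become the trigger for the action. The sketch below shows one simple heuristic for selecting those words, and it is not the patent's stated method. The `_IGNORED` word list is an assumption, and so are the names `words_for_action` and `associate`: the heuristic drops command and filler words from the input and joins whatever remains into the phrase associated with the action.

```python
import re
from typing import Dict, List

# Hypothetical command and filler words stripped before association.
_IGNORED = {"please", "enable", "the", "a", "an", "skill", "app", "application"}


def words_for_action(input_text: str) -> List[str]:
    """Keep the words of the input that plausibly name the functionality."""
    tokens = re.findall(r"[a-z0-9']+", input_text.lower())
    return [t for t in tokens if t not in _IGNORED]


def associate(component: Dict[str, str], input_text: str, action_id: str) -> Dict[str, str]:
    # Return an updated component in which the derived words trigger the action.
    phrase = " ".join(words_for_action(input_text))
    return {**component, phrase: action_id} if phrase else dict(component)
```

For instance, the input "Please enable the Cat Facts skill" reduces to the words `cat` and `facts`, and `associate` maps the phrase `"cat facts"` to the given action identifier.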
31. A system comprising: at least one processor; and at least one memory comprising instructions that, when executed by the at least one processor, cause the system to: receive first input data; determine that the first input data corresponds to a first command to enable audio initiation of a first action of a computing system; based at least in part on the first input data corresponding to the first command, configure an updated speech processing component to cause the first action to be performed in response to recognition of a first phrase; after configuration of the updated speech processing component, receive first audio data; process the first audio data using the updated speech processing component to detect the first phrase; and based at least in part on detection of the first phrase, cause the first action to be performed.
32. The system of claim 31, wherein the at least one memory further comprises instructions that, when executed by the at least one processor, further cause the system to: prior to receiving the first input data, receive a request to operate a first application; and after receiving the request to operate the first application, present a prompt to enable the audio initiation of the first action, wherein the first input data is received after presentation of the prompt.
33. The system of claim 32, wherein the at least one memory further comprises instructions that, when executed by the at least one processor, further cause the system to: determine the first input data is related to the first application.
34. The system of claim 32, wherein the first action is related to the first application.
35. The system of claim 32, wherein the instructions that cause the first action to be performed comprise instructions that, when executed by the at least one processor, cause the system to invoke the first application.
36. The system of claim 32, wherein the first input data corresponds to a touch input.
37. The system of claim 31, wherein the first input data is received by a first device associated with a user account and wherein the at least one memory further comprises instructions that, when executed by the at least one processor, further cause the system to: associate the updated speech processing component with the user account.
38. The system of claim 31, wherein the at least one memory further comprises instructions that, when executed by the at least one processor, further cause the system to, after configuration of the updated speech processing component: receive second input data; determine that the second input data corresponds to a second command to disable audio initiation of the first action; and based at least in part on the second input data corresponding to the second command, configure a further updated speech processing component.
39. The system of claim 31, wherein the at least one memory further comprises instructions that, when executed by the at least one processor, further cause the system to: receive the first input data from a first device; determine that a first speech processing component is associated with the first device; configure the updated speech processing component at least in part by updating the first speech processing component; receive the first audio data from the first device; and determine that the updated speech processing component is associated with the first device.

Description

CROSS REFERENCE TO RELATED APPLICATION

This application is a continuation of, and claims the benefit of priority of, U.S. Non-provisional patent application Ser. No. 16/447,426, filed Jun. 20, 2019 and entitled “VOICE ENABLEMENT AND DISABLEMENT OF SPEECH PROCESSING FUNCTIONALITY,” which is a continuation of, and claims the benefit of priority of, U.S. Non-provisional patent application Ser. No. 15/194,453, filed Jun. 27, 2016 and entitled “VOICE ENABLEMENT AND DISABLEMENT OF SPEECH PROCESSING FUNCTIONALITY,” which issued as U.S. Pat. No. 10,332,513 on Jun. 25, 2019. The contents of each of these applications are expressly incorporated herein by reference in their entirety.

BACKGROUND

Voice-activated electronic devices are capable of performing various functionalities. An individual speaks a command to activate such a device and, in response, the device performs various functions, such as outputting audio.

BRIEF DESCRIPTION OF THE DRAWINGS

FIGS. 1A and 1B are illustrative diagrams of systems for enabling an application using speech, in accordance with various embodiments;

FIG. 2A is an illustrative diagram of a portion of the system architecture of FIGS. 1A and 1B, in accordance with various embodiments;

FIG. 2B is an illustrative diagram of a multi-domain architecture for an NLU module of FIG. 2A, in accordance with various embodiments;

FIG. 2C is an illustrative diagram of a prompts module of FIG. 2A, in accordance with various embodiments;

FIG. 3 is an illustrative flowchart of a process for determining one or more applications to be enabled or disabled based on an utterance, in accordance with various embodiments;

FIG. 4 is an illustrative flowchart of a process for determining which application to enable when more than one application matches the requested application name, in accordance with various embodiments;

FIG. 5 is an illustrative flowchart of a process for determining an application to be enabled/disabled, or determining that no application could be identified for enablement, in accordance with various embodiments;

FIG. 6 is an illustrative flowchart of a process for causing an application to be enabled/disabled in response to receiving a confirmation utterance to enable/disable the application, in accordance with various embodiments;

FIG. 7 is an illustrative diagram of the NLU module of FIGS. 2A and 2B being provided with various identifiers, rules, invocations, application names, and invocation names associated with a first application, in accordance with various embodiments;

FIG. 8 is an illustrative diagram for determining that an utterance corresponds to an invocation, in accordance with various embodiments;

FIG. 9 is an illustrative flowchart of a process for enabling an application for a user account, in accordance with various embodiments;

FIG. 10 is an illustrative diagram of a system for determining that an application needs to be enabled for an utterance, in accordance with various embodiments; and

FIG. 11 is an illustrative diagram of a system for disabling an application using speech, in accordance with various embodiments.

DETAILED DESCRIPTION

The present disclosure, as set forth below, is generally directed to various embodiments of methods and devices for enabling and/or disabling various functionalities for a user account in response to an utterance. An individual may, in a non-limiting embodiment, speak an utterance to a requesting device in communication with a backend system that allows the individual to receive content and/or have one or more actions occur. Some content and/or some actions, however, may require the backend system to have stored thereon certain rules, conditions, and/or instructions, as well as to access one or more databases or other types of information sources, in order to provide responses to the requesting device and/or perform actions.
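The detailed description notes that some responses depend on rules stored on the backend system and that functionalities are enabled or disabled for individual user accounts. The following is a minimal sketch under stated assumptions: the hypothetical `Backend` class represents each application's rules as a single callable and tracks a set of enabled applications for each account. Under this model, rules are consulted only for accounts that have the application enabled.

```python
from typing import Callable, Dict, Optional, Set

Rule = Callable[[str], str]  # hypothetical: request text -> response text


class Backend:
    """Hypothetical backend: per-application rules plus per-account enablement."""

    def __init__(self, rules: Dict[str, Rule]) -> None:
        self._rules = rules                        # application name -> rule
        self._enabled: Dict[str, Set[str]] = {}    # account id -> enabled applications

    def set_enabled(self, account: str, app: str, enabled: bool = True) -> None:
        # Enabling or disabling only makes sense for applications the backend knows.
        if app not in self._rules:
            raise KeyError(f"unknown application: {app}")
        apps = self._enabled.setdefault(account, set())
        if enabled:
            apps.add(app)
        else:
            apps.discard(app)

    def respond(self, account: str, app: str, request: str) -> Optional[str]:
        # Without enablement, the account has no access to the application's rules.
        if app not in self._enabled.get(account, set()):
            return None
        return self._rules[app](request)
```

Both voice-driven and manual enablement could call the same `set_enabled` operation in such a design, so an account's enabled set stays consistent regardless of how the request arrived.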
The backend system, for instance, may include one or more first party applications, which may also be referred to as skills, functionalities, and/or capabilities. The backend system may also be in communication with one or more third party applications. The first party application(s) and/or the third party application(s) may be capable of providing a requesting device with desired content and/or causing particular actions to occur. To use the first and/or third party applications, a user account may need to have access to those applications' associated functionalities. Individuals may obtain access to these functionalities, in some embodiments, by speaking an utterance that requests that a particular application, having certain functionalities, be enabled. Conversely, individuals may disable certain functionalities by speaking an utterance that requests that a particular application be disabled. Applications may also be enabled or disabled in response to manual inputs. For instance, an individual may choose to enable a particular application on their touch-based electronic device by downloading a local client application to that device. The local