US-12626702-B2 - Multi-modal digital assistant
Abstract
Systems and processes for a multi-modal digital assistant are provided. An example method includes, at a computer system that is in communication with a display generation component and one or more input devices, while operating in a hands-free operating mode, receiving a natural-language speech input indicative of a task; in response to receiving the natural-language speech input, initiating performance of the task; and providing an output corresponding to the task, wherein providing the output includes displaying a first user interface associated with an application, the first user interface displayed according to the hands-free operating mode; while displaying the first user interface, receiving, via the one or more input devices, a touch input at a location corresponding to the first user interface; and in response to the touch input: transitioning from the hands-free operating mode to a hands-on operating mode different than the hands-free operating mode; and launching the application, wherein launching the application includes displaying a second user interface according to the hands-on operating mode.
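The mode transition the abstract describes (speech handled in a hands-free mode, then a touch on the resulting interface switching the system to a hands-on mode and launching the application) can be illustrated as a minimal state machine. This is a hedged sketch, not the patented implementation; all class, method, and string names below are hypothetical, and Python is used purely for illustration.

```python
from dataclasses import dataclass
from enum import Enum, auto


class Mode(Enum):
    HANDS_FREE = auto()
    HANDS_ON = auto()


@dataclass
class Assistant:
    """Toy model of the hands-free -> hands-on transition."""
    mode: Mode = Mode.HANDS_FREE

    def handle_speech(self, utterance: str) -> str:
        # A spoken request initiates the task and yields an output
        # interface rendered according to the current operating mode.
        if self.mode is Mode.HANDS_FREE:
            return f"first UI (hands-free) for task: {utterance}"
        return f"hands-on UI for task: {utterance}"

    def handle_touch(self) -> str:
        # A touch on the hands-free interface transitions the system
        # to the hands-on mode and launches the application's
        # second (hands-on) user interface.
        self.mode = Mode.HANDS_ON
        return "second UI (hands-on)"
```

In this sketch the touch handler both flips the mode flag and returns the hands-on interface, mirroring the claim language in which the single touch input causes both the mode transition and the application launch.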
Inventors
- Neal S. ELLIS
- Arian Behzadi
- Christopher P. FOSS
- Tyler C. LEPPEK
- Pedro MARI
- Gemma A. Roper
- Seyit Yilmaz
Assignees
- APPLE INC.
Dates
- Publication Date
- 2026-05-12
- Application Date
- 2024-03-15
Claims (20)
- 1 . A computer system, comprising: one or more processors; and memory storing one or more programs configured to be executed by the one or more processors, the one or more programs including instructions for: while operating in a hands-free operating mode, receiving a natural-language speech input indicative of a task; in response to receiving the natural-language speech input, initiating performance of the task; and providing an output corresponding to the task, wherein providing the output includes displaying at a first magnification greater than a second magnification a first user interface associated with an application, the first user interface displayed according to the hands-free operating mode; while displaying the first user interface, receiving, via the one or more input devices, a touch input at a location corresponding to the first user interface; and in response to the touch input: transitioning from the hands-free operating mode to a hands-on operating mode different than the hands-free operating mode; and launching the application, wherein launching the application includes displaying at the second magnification a second user interface according to the hands-on operating mode.
- 2 . The computer system of claim 1 , wherein the one or more programs further include instructions for: determining whether a first set of proximity criteria is met; in accordance with a determination that the first set of proximity criteria is met, maintaining operation in the hands-on operating mode; and in accordance with a determination that the first set of proximity criteria is not met, transitioning from the hands-on operating mode to the hands-free operating mode.
- 3 . The computer system of claim 2 , wherein transitioning from the hands-on operating mode to the hands-free operating mode includes displaying a third user interface associated with the application, wherein the third user interface is displayed according to the hands-free operating mode.
- 4 . The computer system of claim 1 , wherein the first user interface includes a second user interface object and the second user interface includes the second user interface object.
- 5 . The computer system of claim 4 , wherein: the second user interface object corresponds to a digital assistant of the computer system, the first user interface includes the second user interface object at a first location, and displaying the second user interface includes: ceasing to display the second user interface object at the first location; and displaying the second user interface object at a second location different than the first location.
- 6 . The computer system of claim 1 , wherein the first user interface includes a third user interface object corresponding to the application, and receiving the touch input at the location corresponding to the first user interface includes receiving the touch input at a location corresponding to the third user interface object.
- 7 . The computer system of claim 1 , wherein the one or more programs further include instructions for: selecting a mode of operation based on a second set of proximity criteria, wherein the hands-free operating mode is selected when one criterion of the second set of criteria is satisfied, and wherein the one criterion is satisfied when the computer system is oriented in a landscape orientation; in accordance with selecting the hands-free operating mode, operating the computer system in the hands-free operating mode.
- 8 . A non-transitory computer-readable storage medium storing one or more programs configured to be executed by one or more processors of a computer system, the one or more programs including instructions for: while operating in a hands-free operating mode, receiving a natural-language speech input indicative of a task; in response to receiving the natural-language speech input, initiating performance of the task; and providing an output corresponding to the task, wherein providing the output includes displaying at a first magnification greater than a second magnification a first user interface associated with an application, the first user interface displayed according to the hands-free operating mode; while displaying the first user interface, receiving, via the one or more input devices, a touch input at a location corresponding to the first user interface; and in response to the touch input: transitioning from the hands-free operating mode to a hands-on operating mode different than the hands-free operating mode; and launching the application, wherein launching the application includes displaying at the second magnification a second user interface according to the hands-on operating mode.
- 9 . The non-transitory computer-readable storage medium of claim 8 , wherein the one or more programs further include instructions for: determining whether a first set of proximity criteria is met; in accordance with a determination that the first set of proximity criteria is met, maintaining operation in the hands-on operating mode; and in accordance with a determination that the first set of proximity criteria is not met, transitioning from the hands-on operating mode to the hands-free operating mode.
- 10 . The non-transitory computer-readable storage medium of claim 9 , wherein transitioning from the hands-on operating mode to the hands-free operating mode includes displaying a third user interface associated with the application, wherein the third user interface is displayed according to the hands-free operating mode.
- 11 . The non-transitory computer-readable storage medium of claim 8 , wherein the first user interface includes a second user interface object and the second user interface includes the second user interface object.
- 12 . The non-transitory computer-readable storage medium of claim 11 , wherein: the second user interface object corresponds to a digital assistant of the computer system, the first user interface includes the second user interface object at a first location, and displaying the second user interface includes: ceasing to display the second user interface object at the first location; and displaying the second user interface object at a second location different than the first location.
- 13 . The non-transitory computer-readable storage medium of claim 8 , wherein the first user interface includes a third user interface object corresponding to the application, and receiving the touch input at the location corresponding to the first user interface includes receiving the touch input at a location corresponding to the third user interface object.
- 14 . The non-transitory computer-readable storage medium of claim 8 , wherein the one or more programs further include instructions for: selecting a mode of operation based on a second set of proximity criteria, wherein the hands-free operating mode is selected when one criterion of the second set of criteria is satisfied, and wherein the one criterion is satisfied when the computer system is oriented in a landscape orientation; in accordance with selecting the hands-free operating mode, operating the computer system in the hands-free operating mode.
- 15 . A method, comprising: at a computer system that is in communication with a display generation component and one or more input devices: while operating in a hands-free operating mode, receiving a natural-language speech input indicative of a task; in response to receiving the natural-language speech input, initiating performance of the task; and providing an output corresponding to the task, wherein providing the output includes displaying at a first magnification greater than a second magnification a first user interface associated with an application, the first user interface displayed according to the hands-free operating mode; while displaying the first user interface, receiving, via the one or more input devices, a touch input at a location corresponding to the first user interface; and in response to the touch input: transitioning from the hands-free operating mode to a hands-on operating mode different than the hands-free operating mode; and launching the application, wherein launching the application includes displaying at the second magnification a second user interface according to the hands-on operating mode.
- 16 . The method of claim 15 , further comprising: at the computer system: determining whether a first set of proximity criteria is met; in accordance with a determination that the first set of proximity criteria is met, maintaining operation in the hands-on operating mode; and in accordance with a determination that the first set of proximity criteria is not met, transitioning from the hands-on operating mode to the hands-free operating mode.
- 17 . The method of claim 16 , wherein transitioning from the hands-on operating mode to the hands-free operating mode includes displaying a third user interface associated with the application, wherein the third user interface is displayed according to the hands-free operating mode.
- 18 . The method of claim 15 , wherein the first user interface includes a second user interface object and the second user interface includes the second user interface object.
- 19 . The method of claim 18 , wherein: the second user interface object corresponds to a digital assistant of the computer system, the first user interface includes the second user interface object at a first location, and displaying the second user interface includes: ceasing to display the second user interface object at the first location; and displaying the second user interface object at a second location different than the first location.
- 20 . The method of claim 15 , wherein the first user interface includes a third user interface object corresponding to the application, and receiving the touch input at the location corresponding to the first user interface includes receiving the touch input at a location corresponding to the third user interface object.
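Claims 2 and 7 (and their counterparts 9, 14, and 16) describe selecting and maintaining an operating mode based on sets of proximity criteria, one of which is satisfied when the device is in a landscape orientation. A hedged sketch of such a selection policy follows; the distance threshold, the specific criteria, and the function name are all hypothetical, since the claims do not define the criteria beyond the orientation example.

```python
def select_mode(distance_cm: float, is_landscape: bool,
                near_threshold_cm: float = 40.0) -> str:
    """Illustrative operating-mode selection.

    Assumed policy: a landscape orientation satisfies one criterion of
    the 'second set of proximity criteria', selecting the hands-free
    mode; otherwise the system stays hands-on only while the user is
    close enough to touch the display (the 'first set of proximity
    criteria'), and falls back to hands-free when they are not.
    """
    if is_landscape:
        return "hands-free"
    if distance_cm <= near_threshold_cm:
        return "hands-on"
    return "hands-free"
```

Calling this on each proximity update would reproduce the claimed behavior of maintaining the hands-on mode while the first set of criteria is met and transitioning back to hands-free once it is not.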
Description
CROSS-REFERENCE TO RELATED APPLICATIONS
This application claims priority to U.S. Provisional Patent Application No. 63/465,234, entitled “MULTI-MODAL DIGITAL ASSISTANT,” filed on May 9, 2023, the entire contents of which are hereby incorporated by reference.
FIELD
This relates generally to intelligent automated assistants and, more specifically, to operating intelligent automated assistants in various modes.
BACKGROUND
Intelligent automated assistants (or digital assistants) can provide a beneficial interface between human users and electronic devices. Such assistants can allow users to interact with devices or systems using natural language in spoken and/or text forms. For example, a user can provide a speech input containing a user request to a digital assistant operating on an electronic device. The digital assistant can interpret the user's intent from the speech input and operationalize the user's intent into tasks. The tasks can then be performed by executing one or more services of the electronic device, and a relevant output responsive to the user request can be returned to the user.
SUMMARY
Example methods are disclosed herein.
An example method includes, at a computer system that is in communication with a display generation component and one or more input devices: displaying, via the display generation component, a user interface including a user interface object, the user interface object corresponding to a digital assistant of the computer system; detecting, via the one or more input devices, a user input at a location corresponding to the user interface; in accordance with a determination that the user input satisfies a set of input criteria: disabling the user interface object; and modifying a visual characteristic of the user interface object; and in accordance with a determination that the user input does not satisfy the set of input criteria: forgoing disabling the user interface object; and maintaining the visual characteristic of the user interface object.

An example method includes, at a computer system that is in communication with a display generation component and one or more input devices: determining that a first set of proximity criteria has been met; in response to detecting that the first set of proximity criteria has been met, displaying, via the display generation component, a user interface at a first magnification level; while displaying the user interface, receiving, via the one or more input devices, a natural-language speech input indicative of a task; in response to the natural-language speech input, initiating performance of the task, wherein initiating performance of the task includes updating a first portion of the user interface; determining that a second set of proximity criteria, different than the first set of proximity criteria, has been met; and in response to detecting that the second set of proximity criteria has been met, displaying, via the display generation component, a second portion of the user interface at a second magnification level different than the first magnification level.
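The proximity-dependent magnification described above (a first, larger magnification for the hands-free case and a second magnification once different proximity criteria are met) can be sketched as a simple mapping from viewer distance to display scale. The threshold and scale values below are assumptions for illustration only; the patent does not specify them.

```python
def display_scale(distance_cm: float, far_threshold_cm: float = 100.0) -> float:
    """Hypothetical mapping from viewer distance to UI magnification.

    When the first (far) set of proximity criteria is met, the
    interface is shown at a first, larger magnification so it is
    legible across the room; when the second (near) set is met, a
    second, smaller magnification shows the denser hands-on interface.
    """
    return 2.0 if distance_cm >= far_threshold_cm else 1.0
```

Re-evaluating this mapping as proximity changes would yield the claimed behavior of re-displaying a portion of the interface at a different magnification level when a different set of proximity criteria becomes satisfied.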
An example method includes, at a computer system that is in communication with a display generation component and one or more input devices: displaying, via the display generation component, a user interface including a user interface object, the user interface object corresponding to a digital assistant of the computer system; while displaying the user interface, detecting, via the one or more input devices, an input corresponding to a location of the user interface object; in response to the input corresponding to the location of the user interface object: in accordance with a determination that a set of enablement criteria is met for the user interface object, performing a function associated with the user interface object; and in accordance with a determination that the set of enablement criteria is not met for the user interface object, forgoing performing the function associated with the user interface object.

Example non-transitory computer-readable media are disclosed herein. An example non-transitory c