
US-12625595-B2 - Interactive GUI elements for indicating objects to supplement requests for generative output

US 12625595 B2

Abstract

Implementations set forth herein relate to a graphical user interface (GUI) element that can be manipulated at an interface to indicate a particular object and/or feature of interest to be considered when providing generative output for a separate user request. One or more GUI elements can be provided at a display interface, such as a touch display panel and/or virtual or augmented reality display interface, thereby allowing the GUI elements to be associated with rendered and/or tangible objects. When a user interacts with a GUI element, the GUI element can exhibit responsive behavior that is based on features of the interaction and/or other features of a particular object. When an object of interest is identified, processing can be performed to identify information about the object, and this information can then be utilized to facilitate provisioning of a generative output that is responsive to a separate user request.
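As an informal illustration of the mechanism summarized in the abstract, the following Python sketch shows one way an object marked by a relocatable GUI element could supplement a user's request before generative processing. It is not drawn from the patent; the classes and the supplementation format (ObjectInfo, GuiElement, build_supplemented_request) are hypothetical.

```python
# Illustrative only: hypothetical data structures for an element that marks an
# object of interest, and a helper that folds the object's information into a
# request before it is sent to a generative model.
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class ObjectInfo:
    label: str                  # e.g. "nutrition table" or "paragraph"
    extracted_text: str         # natural language content recognized for the object
    location: Tuple[int, int]   # position of the object on the display

@dataclass
class GuiElement:
    position: Tuple[int, int]
    associated_object: Optional[ObjectInfo] = None

def build_supplemented_request(user_request: str, element: GuiElement) -> str:
    """Return the user's request, supplemented with data about the marked object."""
    if element.associated_object is None:
        return user_request
    obj = element.associated_object
    return (f"{user_request}\n"
            f"[Object of interest: {obj.label} -- {obj.extracted_text}]")

# Example: the element has been dragged onto a recipe paragraph.
element = GuiElement(position=(120, 340),
                     associated_object=ObjectInfo("recipe", "2 cups flour, 1 egg", (118, 335)))
print(build_supplemented_request("How long should I bake this?", element))
```

In this sketch, only information about the indicated object accompanies the request, which mirrors the abstract's point that the entire display content need not be processed to produce a relevant generative output.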

Inventors

  • Ramprasad Sedouram
  • Karthik Srinivas
  • Dharma Teja
  • Saachi Grover
  • Ajay Prasad

Assignees

  • GOOGLE LLC

Dates

Publication Date
2026-05-12
Application Date
2024-07-08

Claims (20)

  1. A method implemented by one or more processors, the method comprising: receiving a user input directed to causing a graphical user interface (GUI) element to relocate the GUI element relative to an object that is visible via a display interface of a computing device, wherein the GUI element can be relocated at the display interface to be associated with a rendered or tangible object that is visible to a user who is viewing the display interface; generating, in response to receiving the user input, object data for the object that is visible via the display interface, wherein the object data characterizes an object feature of the object; causing the GUI element to exhibit a change to an element feature for the GUI element based on the object data, and in response to the user input directed to the GUI element, wherein the change to the element feature causes the GUI element to exhibit one or more features that are based on the object data; receiving a natural language input directed to an automated assistant, and/or another application, that uses one or more generative models to provide generative output based on the natural language input; causing, in response to receiving the natural language input, the one or more generative models to be employed for processing input data that is based on the natural language input and the object data; and causing the computing device, or a separate computing device, to render the generative output.
  2. The method of claim 1, wherein causing the GUI element to exhibit the change to the element feature includes: causing the GUI element to resemble a portion of a boundary of the object, wherein the object data characterizes the portion of the boundary of the object.
  3. The method of claim 1, wherein the object includes natural language content, and other objects visible via the display interface include other natural language content, and wherein the generative output is based on the GUI element being more proximate to the object than the other objects.
  4. The method of claim 1, wherein the object is one object of a plurality of objects that are visible via the display interface, and the method further comprises: causing the GUI element to appear to be automatically repelled from one or more other locations of one or more other objects of the plurality of objects.
  5. The method of claim 1, further comprising: determining a context of the user and content associated with the object; and causing, based on the context of the user and content associated with the object, the GUI element to appear to be automatically attracted to, or repelled from, an object location of the object.
  6. The method of claim 1, further comprising: causing at least a portion of the GUI element to extend or spread towards an object location of the object based on a pre-determined user preference, a context of the user, and/or a context of the object, wherein the object feature characterized by the object data is the object location.
  7. The method of claim 1, wherein the user input causes the GUI element to relocate to be less proximate to another location of a separate object based on a pre-determined user preference, a context of the user, and/or a context of the object.
  8. The method of claim 1, wherein the user input directed to causing the GUI element to relocate the GUI element relative to the object is a multimodal input that includes at least an audible component and/or a textual component.
  9. The method of claim 1, wherein receiving the natural language input directed to the automated assistant, and/or the other application includes: receiving an additional input directed to natural language content of the object, wherein the natural language input includes the natural language content of the object.
  10. The method of claim 9, wherein the additional input is a user gesture directed to the computing device, or the separate computing device, to cause a text search to be performed using the natural language content of the object.
  11. The method of claim 1, wherein causing the GUI element to exhibit the change to the element feature for the GUI element based on the object data includes: causing the GUI element to exhibit a change in transparency and/or in color that is proportional to one or more features of the user input.
  12. The method of claim 11, wherein the one or more features of the user input includes: a determined distance that a user extremity, and/or the GUI element, moves when providing at least a portion of the user input, and/or a determined velocity and/or acceleration of the user extremity, and/or the GUI element, when providing at least the portion of the user input.
  13. The method of claim 1, wherein the GUI element is visible at a virtual reality GUI, and/or an augmented reality GUI, rendered at the display interface of the computing device.
  14. The method of claim 1, wherein the natural language input corresponds to a request for the automated assistant application, or the other application, to provide the generative output for indicating a difference between the object and another object associated with a separate GUI element rendered at the display interface.
  15. The method of claim 1, further comprising: causing, in response to receiving the user input, the GUI element to exhibit a fluid characteristic relative to one or more object features of the object and/or other objects visible via the display interface of the computing device, wherein the fluid characteristic includes cohesion, hydrophilic movement, hydrophobic movement, and/or surface tension.
  16. The method of claim 15, wherein the GUI element dynamically exhibits the hydrophobic movement and/or hydrophilic movement when the element feature of the GUI element is changing based on the object data and in response to the user input directed to the GUI element.
  17. A method implemented by one or more processors, the method comprising: receiving one or more user inputs directed to causing graphical user interface (GUI) elements to relocate the GUI elements relative to objects that are visible via a display interface of a computing device, wherein the GUI elements can be relocated at the display interface to be associated with rendered and/or tangible objects that are visible to a user who is viewing the display interface; generating, in response to receiving the user input, object data for the objects that are visible via the display interface, wherein the object data characterizes object features of the objects; causing the GUI elements to exhibit changes to element features for the GUI elements based on the object data, and in response to the one or more user inputs directed to the GUI elements, wherein the changes to the element features cause the GUI elements to exhibit one or more features that are based on the object data; receiving a natural language input directed to an automated assistant, and/or another application, that uses one or more generative models to provide generative output based on the natural language input; causing, in response to receiving the natural language input, the one or more generative models to be employed for processing input data that is based on the natural language input and the object data; and causing the computing device, or a separate computing device, to render the generative output.
  18. The method of claim 17, wherein the GUI elements are visible at a virtual reality GUI, and/or an augmented reality GUI, rendered at the display interface of the computing device.
  19. A method implemented by one or more processors, the method comprising: receiving a user input directed to causing a graphical user interface (GUI) element to relocate the GUI element relative to an object that is visible via a display interface of a computing device, wherein the GUI element can be relocated at the display interface to be associated with a rendered object that is visible to a user who is viewing the display interface; generating, in response to receiving the user input, object data for the object that is visible via the display interface, wherein the object data characterizes an object feature of the object; causing the GUI element to exhibit a change to an element feature for the GUI element, and/or the object, based on the object data, and in response to the user input directed to the GUI element, wherein the change to the element feature causes the GUI element and/or the object to exhibit one or more features that are based on the object data and the user input; receiving a natural language input directed to an automated assistant, and/or another application, that uses one or more generative models to provide generative output based on the natural language input; causing, in response to receiving the natural language input, the one or more generative models to be employed for processing input data that is based on the natural language input and the one or more features; and causing the computing device, or a separate computing device, to render the generative output.
  20. The method of claim 19, wherein the GUI element dynamically exhibits hydrophobic movement and/or hydrophilic movement when the element feature of the GUI element is changing based on the object data and in response to the user input directed to the GUI element.
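The following is a minimal, self-contained Python sketch of the flow recited in claim 1, offered purely as an illustration. Object detection, the generative model, and rendering are reduced to simple stand-ins (detect_object_near, a lambda "model", print), and none of these names come from the patent.

```python
# A minimal, self-contained sketch of the method of claim 1. All names are
# hypothetical; detection, generation, and rendering are trivial stand-ins so
# that the sequence of steps is visible.

def detect_object_near(position, visible_objects):
    """Return the visible object closest to the GUI element's new position."""
    return min(visible_objects,
               key=lambda o: (o["x"] - position[0]) ** 2 + (o["y"] - position[1]) ** 2)

def relocate_element(element, new_position, visible_objects):
    """Claim 1, steps 1-3: relocate the element, generate object data for the
    nearby object, and change an element feature based on that data."""
    element["position"] = new_position
    obj = detect_object_near(new_position, visible_objects)
    object_data = {"label": obj["label"], "text": obj["text"]}
    element["highlight"] = obj["label"]        # element feature now reflects the object
    return object_data

def answer_request(natural_language_input, object_data, generate):
    """Claim 1, steps 4-6: process the request plus object data with a
    generative model and render the generative output."""
    prompt = f"{natural_language_input}\nRelevant object: {object_data['text']}"
    output = generate(prompt)
    print(output)                              # stand-in for rendering at a display
    return output

# Example usage with toy data and a trivial stand-in "model".
objects = [{"label": "recipe", "text": "2 cups flour, 1 egg", "x": 10, "y": 20},
           {"label": "photo caption", "text": "Sunset over the bay", "x": 200, "y": 40}]
element = {"position": (0, 0)}
data = relocate_element(element, (12, 18), objects)
answer_request("How many eggs does this need?", data, lambda p: f"[generated from: {p}]")
```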

Description

BACKGROUND

Humans may engage in human-to-computer dialogs with interactive software applications referred to herein as "automated assistants" (also referred to as "digital agents," "chatbots," "interactive personal assistants," "intelligent personal assistants," "assistant applications," "conversational agents," etc.). For example, humans (who, when they interact with automated assistants, may be referred to as "users") may provide commands and/or requests to an automated assistant using spoken natural language input (i.e., utterances), which may in some cases be converted into text and then processed, and/or by providing textual (e.g., typed) natural language input. In some instances, a user may interact with an automated assistant, or other application, to receive assistance with various tasks that can be performed by a computing device, or otherwise facilitated by a computing device.

However, an automated assistant may not be able to provide accurate responses to a user inquiry without relevant context. For example, a user viewing a display interface may provide an automated assistant with an inquiry regarding a particular portion of content in the display interface. In response, the automated assistant may provide a generated output based on the entire content of the display interface, and/or other contextual information, which may not be particularly relevant to the inquiry. As a result, the user may repeat their inquiry, thereby wasting resources at the computing device and any other remote device that may be employed to facilitate assistant functionality. Furthermore, processing of the content of the display interface can be wasteful when much of the content is not relevant to providing the generated output. For example, processing of non-specific display content can be performed using various machine learning models, and undertaking such processing can consume significant processing bandwidth. Therefore, determining how to facilitate accurate assistant responses and/or other outputs, while relying on less input content, can realize various benefits to local and remote devices, including reduced consumption of power and processing bandwidth.

SUMMARY

Implementations set forth herein relate to an automated assistant and/or another application that can provide generative output based on a natural language input and the location of one or more user-configurable GUI elements at a display interface. For example, a user-configurable GUI element can refer to a GUI shape, such as a droplet or other recognizable shape, that can be relocated at a display interface in response to a user input. The display interface can be, but is not limited to, a display panel, a touch display interface, a wearable display interface, a virtual reality GUI rendered at a device, an augmented reality GUI rendered at a device, and/or any other type of user interface. The GUI element can be relocated on the display interface to be increasingly or decreasingly associated with one or more objects that are visible via the display interface or another interface of a computing device. In some implementations, relocating the GUI element can be performed in response to one or more multimodal inputs, or an input at a single interface of a computing device. For example, a multimodal input can have an audio component, textual component, visual component, haptic component, tactile component, and/or any other component that can be associated with an input or an output for a computing device.
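To make the "increasingly or decreasingly associated" behavior concrete, the following small Python sketch scores an element's association with each visible object by proximity. The inverse-distance scoring rule and the function names are assumptions made for illustration and are not prescribed by the patent.

```python
# Illustrative sketch of proximity-based association between a relocatable GUI
# element (e.g. a droplet) and objects visible at the display interface.
import math

def association_scores(element_pos, objects):
    """Map each visible object to a score that grows as the element moves closer."""
    scores = {}
    for name, (x, y) in objects.items():
        distance = math.hypot(x - element_pos[0], y - element_pos[1])
        scores[name] = 1.0 / (1.0 + distance)   # closer => more strongly associated
    return scores

def most_associated(element_pos, objects):
    """Return the object the element is currently most associated with."""
    scores = association_scores(element_pos, objects)
    return max(scores, key=scores.get)

objects = {"paragraph_a": (50, 80), "image_b": (300, 120)}
print(most_associated((60, 90), objects))       # -> "paragraph_a"
```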
In some implementations, a facial expression plus a hand or finger movement can result in a modification to one or more features of a GUI element (e.g., a location of the GUI element and a transparency of the GUI element). Alternatively, or additionally, a spoken input plus a touch input can result in a modification to a location and a size of a GUI element.

When the display interface refers to a portion of an augmented reality device, the GUI element can be a portion of a rendered augmented reality GUI that is visible via an interface that is at least partially transparent. As a result, the GUI element can appear adjacent to, on top of, and/or otherwise proximate to a tangible object in reality that is visible when a user is directing their gaze at the object. Alternatively, and in some implementations, the GUI element can be displayed at a touch display interface, such as those of tablet computing devices, cellular phones, desktop displays, and/or any other computing devices with display interfaces that may not be entirely transparent.

In some implementations, the GUI element can be relocated and/or otherwise modified through user input, and the user inputs for modifying the GUI element can be multimodal. For example, the location of a GUI element can be modified using a touch input gesture or a non-touch gesture, such as the motion of an extremity in front of a camera. Alternatively, or additionally, the location of a GUI elemen
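The multimodal examples above (a facial expression plus a finger movement adjusting location and transparency, or speech plus touch adjusting location and size) could be organized roughly as in the sketch below. This Python code is hypothetical; the specific mappings and field names are assumptions rather than the patent's implementation.

```python
# Sketch of mapping multimodal input components to GUI element feature changes.
# The mappings below are illustrative assumptions, not the patent's design.

def apply_multimodal_input(element, inputs):
    """Update element features according to which input components are present."""
    if "finger_movement" in inputs:
        dx, dy = inputs["finger_movement"]
        element["position"] = (element["position"][0] + dx,
                               element["position"][1] + dy)
    if "facial_expression" in inputs:
        # e.g. a squint could make the element more opaque so it stands out
        element["transparency"] = 0.2 if inputs["facial_expression"] == "squint" else 0.6
    if "touch" in inputs:
        element["position"] = inputs["touch"]           # jump to the touched location
    if "speech" in inputs and "bigger" in inputs["speech"]:
        element["size"] = element.get("size", 1.0) * 1.5
    return element

element = {"position": (100, 100), "transparency": 0.6, "size": 1.0}
apply_multimodal_input(element, {"speech": "make it bigger", "touch": (220, 140)})
print(element)   # position moved to the touch point, size increased
```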