US-12626694-B2 - Interfacing between digital assistant applications and navigation applications
Abstract
The present disclosure is generally related to systems and methods of interfacing among multiple applications in a networked computer environment. A data processing system can access a navigation application to retrieve point locations within a reference frame corresponding to a geographic region displayed in a viewport of the navigation application. Each point location can have an identifier. The data processing system can parse an input audio signal to identify a request and a referential word. The data processing system can identify a point location within the reference frame based on the referential word parsed from the input audio signal and the identifier for the point location. The data processing system can generate an action data structure including the point location identified. The data processing system can transmit the action data structure to the navigation application to initiate a navigation guidance process using the point location.
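The pipeline the abstract describes — parse an audio input for a request and a referential word, resolve the referential word against the identifiers of point locations in the displayed region, then emit an action data structure for the navigation application — can be sketched as follows. All names, the keyword-based request detection, and the substring matching are illustrative assumptions, not the patented implementation.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class PointLocation:
    identifier: str   # label shown on the map, e.g. a business name
    lat: float
    lng: float

@dataclass
class ActionDataStructure:
    action: str
    point: PointLocation

def handle_input(transcript: str, viewport_points: list) -> Optional[ActionDataStructure]:
    """Parse a transcribed audio input for a navigation request and a
    referential word, then resolve it against points in the viewport."""
    text = transcript.lower()
    # Crude request detection; a real system would use a trained parser.
    if not any(kw in text for kw in ("directions", "navigate", "take me")):
        return None
    # Resolve the referential word against each point's identifier.
    for point in viewport_points:
        if point.identifier.lower() in text:
            return ActionDataStructure(action="navigate", point=point)
    return None

points = [PointLocation("Cafe Luna", 37.4220, -122.0841)]
action = handle_input("get directions to Cafe Luna", points)
```

Transmitting the resulting structure to the navigation application would then initiate the guidance process using the resolved point.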
Inventors
- Vikram Aggarwal
- Moises Morgenstern Gali
Assignees
- GOOGLE LLC
Dates
- Publication Date: 2026-05-12
- Application Date: 2024-09-16
Claims (20)
- 1 . A method implemented by one or more processors, the method comprising: receiving, while a display of a client device is rendering a viewport of a navigation application accessible by the client device, an input audio signal detected by a sensor of the client device; parsing the input audio signal to identify at least a referential word; transmitting, to a data processing system, at least the referential word, wherein transmitting the referential word to the data processing system causes the data processing system to: retrieve, based on the referential word, one or more point locations within a reference frame corresponding to a geographic region displayed in the viewport of the navigation application when the input audio signal is received, the viewport corresponding to an area of a display of the client device through which the reference frame is visible, and the reference frame corresponding to the geographic region displayed in the viewport is different from a previously presented reference frame corresponding to a current location of the client device; and transmit, to the client device, an indication of the one or more point locations within the reference frame corresponding to the geographic region displayed in the viewport of the navigation application when the input audio signal is received; receiving, from the data processing system, the indication of the one or more point locations within the reference frame corresponding to the geographic region displayed in the viewport of the navigation application when the input audio signal is received; and causing the navigation application to display the one or more point locations within the reference frame corresponding to the geographic region displayed in the viewport of the navigation application when the input audio signal is received.
- 2 . The method of claim 1 , wherein parsing the input audio signal further comprises: parsing the input audio signal to identify a request, wherein the request is transmitted to the data processing system along with the referential word.
- 3 . The method of claim 2 , wherein the request is a request to initiate a navigation guidance process, and wherein the navigation guidance process provides directions from the current location of the client device to a destination location associated with the given point location.
- 4 . The method of claim 2 , wherein the request is a request to display the one or more point locations associated with the referential word.
- 5 . The method of claim 2 , wherein the request is a request for information about the one or more point locations associated with the referential word.
- 6 . The method of claim 1 , further comprising: receiving, while the display of the client device is rendering the viewport of the navigation application accessible by the client device, an additional input audio signal detected by the sensor of the client device; parsing the additional input audio signal to identify a request; and causing, based on the given point location, the navigation application to initiate a navigation guidance process.
- 7 . The method of claim 6 , wherein the navigation guidance process provides directions from the current location of the client device to a destination location associated with the given point location.
- 8 . A system comprising: at least one processor; and memory storing instructions that, when executed, cause the at least one processor to be operable to: receive, while a display of a client device is rendering a viewport of a navigation application accessible by the client device, an input audio signal detected by a sensor of the client device; parse the input audio signal to identify at least a referential word; transmit, to a data processing system, at least the referential word, wherein transmitting the referential word to the data processing system causes the data processing system to: retrieve, based on the referential word, one or more point locations within a reference frame corresponding to a geographic region displayed in the viewport of the navigation application when the input audio signal is received, the viewport corresponding to an area of a display of the client device through which the reference frame is visible, and the reference frame corresponding to the geographic region displayed in the viewport is different from a previously presented reference frame corresponding to a current location of the client device; and transmit, to the client device, an indication of the one or more point locations within the reference frame corresponding to the geographic region displayed in the viewport of the navigation application when the input audio signal is received; receive, from the data processing system, the indication of the one or more point locations within the reference frame corresponding to the geographic region displayed in the viewport of the navigation application when the input audio signal is received; and cause the navigation application to display the one or more point locations within the reference frame corresponding to the geographic region displayed in the viewport of the navigation application when the input audio signal is received.
- 9 . The system of claim 8 , wherein the instructions to parse the input audio signal further comprise instructions to: parse the input audio signal to identify a request, wherein the request is transmitted to the data processing system along with the referential word.
- 10 . The system of claim 9 , wherein the request is a request to initiate a navigation guidance process, and wherein the navigation guidance process provides directions from the current location of the client device to a destination location associated with the given point location.
- 11 . The system of claim 9 , wherein the request is a request to display the one or more point locations associated with the referential word.
- 12 . The system of claim 9 , wherein the request is a request for information about the one or more point locations associated with the referential word.
- 13 . The system of claim 8 , wherein the instructions further cause the at least one processor to: receive, while the display of the client device is rendering the viewport of the navigation application accessible by the client device, an additional input audio signal detected by the sensor of the client device; parse the additional input audio signal to identify a request; and cause, based on the given point location, the navigation application to initiate a navigation guidance process.
- 14 . The system of claim 13 , wherein the navigation guidance process provides directions from the current location of the client device to a destination location associated with the given point location.
- 15 . A non-transitory computer-readable storage medium storing instructions that, when executed, cause at least one processor to be operable to perform operations, the operations comprising: receiving, while a display of a client device is rendering a viewport of a navigation application accessible by the client device, an input audio signal detected by a sensor of the client device; parsing the input audio signal to identify at least a referential word; transmitting, to a data processing system, at least the referential word, wherein transmitting the referential word to the data processing system causes the data processing system to: retrieve, based on the referential word, one or more point locations within a reference frame corresponding to a geographic region displayed in the viewport of the navigation application when the input audio signal is received, the viewport corresponding to an area of a display of the client device through which the reference frame is visible, and the reference frame corresponding to the geographic region displayed in the viewport is different from a previously presented reference frame corresponding to a current location of the client device; and transmit, to the client device, an indication of the one or more point locations within the reference frame corresponding to the geographic region displayed in the viewport of the navigation application when the input audio signal is received; receiving, from the data processing system, the indication of the one or more point locations within the reference frame corresponding to the geographic region displayed in the viewport of the navigation application when the input audio signal is received; and causing the navigation application to display the one or more point locations within the reference frame corresponding to the geographic region displayed in the viewport of the navigation application when the input audio signal is received.
- 16 . The non-transitory computer-readable storage medium of claim 15 , wherein parsing the input audio signal further comprises: parsing the input audio signal to identify a request, wherein the request is transmitted to the data processing system along with the referential word.
- 17 . The non-transitory computer-readable storage medium of claim 16 , wherein the request is a request to initiate a navigation guidance process, and wherein the navigation guidance process provides directions from the current location of the client device to a destination location associated with the given point location.
- 18 . The non-transitory computer-readable storage medium of claim 16 , wherein the request is a request to display the one or more point locations associated with the referential word.
- 19 . The non-transitory computer-readable storage medium of claim 16 , wherein the request is a request for information about the one or more point locations associated with the referential word.
- 20 . The non-transitory computer-readable storage medium of claim 15 , the operations further comprising: receiving, while the display of the client device is rendering the viewport of the navigation application accessible by the client device, an additional input audio signal detected by the sensor of the client device; parsing the additional input audio signal to identify a request; and causing, based on the given point location, the navigation application to initiate a navigation guidance process, wherein the navigation guidance process provides directions from the current location of the client device to a destination location associated with the given point location.
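The client-side flow recited in independent claims 1, 8, and 15 — receive audio while the viewport is rendered, parse a referential word, send it to the data processing system, receive point locations within the displayed reference frame, and display them — can be sketched as below. The server interaction is simulated by a local function, and the last-token "parser", the sample catalog, and all field names are illustrative assumptions.

```python
# Sketch of the client-side flow recited in claim 1; names are illustrative.

def parse_referential_word(transcript: str) -> str:
    # A real parser would use NLP; here the last token stands in for
    # the referential word (e.g. "restaurants" in "show restaurants").
    return transcript.split()[-1]

def data_processing_system(referential_word: str, viewport_region: dict) -> list:
    # Simulated server: return point locations inside the displayed
    # region whose identifier matches the referential word.
    catalog = [
        {"id": "restaurant", "lat": 40.71, "lng": -74.00},
        {"id": "museum", "lat": 40.78, "lng": -73.96},
    ]
    def inside(p):
        return (viewport_region["south"] <= p["lat"] <= viewport_region["north"]
                and viewport_region["west"] <= p["lng"] <= viewport_region["east"])
    return [p for p in catalog if p["id"] in referential_word and inside(p)]

def on_audio_received(transcript: str, viewport_region: dict) -> list:
    word = parse_referential_word(transcript)
    points = data_processing_system(word, viewport_region)  # a network call in practice
    return points  # the navigation application would render these in the viewport

region = {"south": 40.70, "north": 40.75, "west": -74.02, "east": -73.98}
shown = on_audio_received("show nearby restaurants", region)
```

Note that the region queried is the one currently displayed in the viewport, which per the claims may differ from a reference frame centered on the device's current location.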
Description
BACKGROUND

Digital assistant applications can operate in a networked computer environment in which processing associated with functionality provided at a client device is performed at a server connected to the client device by way of a network. The server can be provided with data associated with a request at the client device by way of the network. Excessive network transmissions, packet-based or otherwise, of network traffic data between computing devices can prevent a computing device from properly processing the network traffic data, completing an operation related to the network traffic data, or responding timely to the network traffic data. The excessive network transmissions of network traffic data can also complicate data routing or degrade the quality of the response when the responding computing device is at or above its processing capacity, which may result in inefficient bandwidth utilization, consumption of computing resources, and depletion of battery life. A portion of the excessive network transmissions can include transmissions for requests that are not valid requests. Additional challenges exist in the provision of a speech-based interface with applications that typically operate as a graphical user interface, particularly in such a networked environment in which it is desirable to minimize excessive network transmissions.

SUMMARY

According to an aspect of the disclosure, a system to interface among multiple applications in a networked computer environment can include a data processing system having one or more processors. A navigation interface component executed on the data processing system can access a navigation application executing on a first client device to retrieve a plurality of point locations within a reference frame corresponding to a geographic region displayed in a viewport of the navigation application. Each point location of the plurality of point locations can have an identifier.
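The "reference frame corresponding to a geographic region displayed in a viewport" can be pictured as a mapping between the viewport's pixel area and geographic bounds. The class below is a minimal sketch under that assumption; the simple linear projection and all names are illustrative (real map renderers typically use Web Mercator).

```python
# Illustrative sketch of a reference frame: the geographic bounds visible
# through a viewport, plus a mapping from coordinates to screen pixels.

class ReferenceFrame:
    def __init__(self, north, south, east, west, width_px, height_px):
        self.north, self.south = north, south
        self.east, self.west = east, west
        self.width_px, self.height_px = width_px, height_px

    def contains(self, lat, lng):
        # True if a point location falls inside the displayed region.
        return self.south <= lat <= self.north and self.west <= lng <= self.east

    def to_pixels(self, lat, lng):
        # Simple equirectangular mapping from geographic coordinates to
        # viewport pixels; real maps use a Web Mercator projection.
        x = (lng - self.west) / (self.east - self.west) * self.width_px
        y = (self.north - lat) / (self.north - self.south) * self.height_px
        return x, y

frame = ReferenceFrame(north=40.80, south=40.70, east=-73.90, west=-74.05,
                       width_px=1080, height_px=1920)
x, y = frame.to_pixels(40.75, -73.975)
```

Retrieving the plurality of point locations then amounts to filtering a catalog of identified points through `contains` for the frame currently shown.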
A natural language processor component executed on the data processing system can receive an input audio signal detected by a sensor of at least one of the first client device and a second client device. The natural language processor component can parse the input audio signal to identify a request and a referential word. The natural language processor component can identify, responsive to the identification of the request, a point location from the plurality of point locations within the reference frame based on the referential word parsed from the input audio signal and the identifier for the point location. An action handler component executed on the data processing system can generate an action data structure including the point location identified responsive to the detection of the input audio signal. The action handler component can transmit the action data structure to the first client device to cause the navigation application to initiate a navigation guidance process using the point location.

According to an aspect of the disclosure, a method of interfacing among multiple applications in a networked computer environment can include accessing a navigation application executing on a first client device to retrieve a plurality of point locations within a reference frame corresponding to a geographic region displayed in a viewport of the navigation application. Each point location of the plurality of point locations can have an identifier. The method can include receiving an input audio signal detected by a sensor of at least one of the first client device and a second client device. The method can include parsing the input audio signal to identify a request and a referential word. The method can include identifying, responsive to identifying the request, a point location from the plurality of point locations within the reference frame based on the referential word parsed from the input audio signal and the identifier for the point location.
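The identification step described above — choosing a point location from the plurality based on the referential word and each point's identifier — could be approximated as below. The substring-then-fuzzy strategy, the use of `difflib`, and all names are assumptions for illustration; the disclosure does not prescribe a particular matching algorithm.

```python
import difflib

def identify_point(referential_word, points):
    """points: mapping of identifier -> (lat, lng). Return the identifier
    the referential word refers to, or None if nothing matches."""
    word = referential_word.lower()
    names = list(points)
    # Exact or substring match on the identifiers first.
    for name in names:
        if word in name.lower():
            return name
    # Fall back to fuzzy matching to absorb speech-recognition noise.
    close = difflib.get_close_matches(word, [n.lower() for n in names],
                                      n=1, cutoff=0.6)
    if close:
        for name in names:
            if name.lower() == close[0]:
                return name
    return None

points = {"Main Street Diner": (40.7128, -74.0060),
          "City Museum": (40.7306, -73.9866)}
match = identify_point("diner", points)
```

The fuzzy fallback matters because the referential word arrives via audio transcription, so the parsed token may differ slightly from the identifier rendered in the viewport.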
The method can include generating an action data structure including the point location identified responsive to the detection of the input audio signal. The method can include transmitting the action data structure to the first client device to cause the navigation application to initiate a navigation guidance process using the point location.

Each aspect may include one or more of the following features. The navigation interface component may access the navigation application to determine a first portion of the reference frame corresponding to the geographic region displayed concurrently to the receipt of the input audio signal and to determine a second portion of the reference frame corresponding to the geographic region previously displayed in the viewport based on a velocity of the first client device acquired from an inertial motion unit. The natural language processor component may identify the point location from the plurality of point locations within the reference frame based on a travel direction of at least one of the first client device and the second client device determi