EP-4339940-B1 - DIRECTING A VEHICLE CLIENT DEVICE TO USE ON-DEVICE FUNCTIONALITY


Inventors

AGGARWAL, VIKRAM
KRISHNAN, VINOD

Dates

Publication Date: 20260513
Application Date: 20190212

Claims (9)

A method implemented by one or more processors, the method comprising: determining that natural language content embodied in a spoken utterance, received at a first vehicle computing device, corresponds to a first intent request; determining, based on determining that the natural language content corresponds to the first intent request, an extent to which the first intent request is supported by a server device for a version corresponding to the first vehicle computing device; generating, based on the extent to which the first intent request is supported by the server device, first data that characterizes an intent requested by the user; determining that other natural language content embodied in audio data that captures a further spoken utterance, received at a second vehicle computing device, corresponds to a second intent request; determining, based on determining that the other natural language content includes the second intent request, another extent to which the second intent request is supported by the server device for another version corresponding to the second vehicle computing device, wherein the version is different than the other version, and wherein the second intent request is not completely supported by the server; generating, based on the other extent to which the second intent request is supported by the server device, second data characterizing the other natural language content of the other spoken utterance, wherein the second data includes speech-to-text data that is generated by the server device based on performing speech-to-text processing of the audio data that captures the further spoken utterance; providing, to the first vehicle computing device, the first data in furtherance of causing the first vehicle computing device to fulfill the first intent request; and providing, to the second vehicle computing device, the second data in furtherance of causing the second vehicle computing device to fulfill the second intent request, wherein providing the second data to the second vehicle computing device causes the second vehicle computing device to locally perform an action that is locally generated by the second vehicle computing device based on the speech-to-text data of the second data, wherein the other version corresponding to the second vehicle computing device was initially supported by the server device subsequent to a time when the version corresponding to the first vehicle computing device was initially supported by the server device.
The method of claim 1, wherein the first intent request and the second intent request correspond to a type of vehicle hardware device, and wherein the first data further characterizes an action corresponding to the intent and an operation capable of being performed by the type of vehicle hardware device.
The method of claim 2, wherein, in response to the second vehicle computing device receiving the second data, the second data causes the type of vehicle hardware device to perform the operation and/or a different operation.
The method as in claim 2 or 3, wherein the type of vehicle hardware device includes one or more sensors, one or more other computing devices, and/or one or more electromechanical devices.
The method of any preceding claim, wherein the first data that characterizes the intent requested by the user comprises: action data for the first intent request that is executable by the first vehicle computing device to fulfill the first intent request.
The method of any preceding claim, wherein the second vehicle computing device locally generates the action using a local natural language understanding engine and a local action engine.
The method of any preceding claim, further comprising: receiving, from the second vehicle computing device, second device version data; and determining the second version based on the second device version data received from the second vehicle computing device.
A computer program comprising instructions, which, when executed by one or more processors, cause the one or more processors to carry out the method of any one of the preceding claims.
A system comprising one or more processors for carrying out the method of any one of claims 1 to 7.
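
As an illustration only, the device-side behaviour recited in the claims (executing server-provided action data, or locally generating an action from server-provided speech-to-text data) might be sketched as follows. The dictionary keys `action` and `text`, the function names, and the toy window-control intent are all hypothetical and do not appear in the claims; the two helper functions merely stand in for a local natural language understanding engine and a local action engine.

```python
def local_nlu(text: str) -> dict:
    """Toy stand-in for a local natural language understanding engine.

    Extracts a hypothetical intent and slot value from transcribed text.
    """
    words = text.lower().split()
    if "window" in words or "windows" in words:
        return {"intent": "set_window", "slot": "down" if "down" in words else "up"}
    return {"intent": "unknown", "slot": None}


def local_action_engine(understood: dict) -> str:
    """Toy stand-in for a local action engine: turns an intent into an action."""
    if understood["intent"] == "unknown":
        return "noop"
    return f"{understood['intent']}:{understood['slot']}"


def fulfill(server_data: dict) -> str:
    """Return the action the vehicle computing device would perform.

    If the server supplied executable action data (assumed key "action"),
    use it directly. Otherwise fall back to the speech-to-text data
    (assumed key "text") and generate the action locally.
    """
    if "action" in server_data:
        return server_data["action"]
    return local_action_engine(local_nlu(server_data["text"]))
```

Under these assumptions, `fulfill({"action": "set_window:down"})` and `fulfill({"text": "Roll the windows down"})` yield the same action, the first from server-generated data and the second from local processing.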

Description

Background

Humans may engage in human-to-computer dialogs with interactive software applications referred to herein as "automated assistants" (also referred to as "digital agents," "chatbots," "interactive personal assistants," "intelligent personal assistants," "assistant applications," "conversational agents," etc.). For example, humans (who, when they interact with automated assistants, may be referred to as "users") may provide commands and/or requests to an automated assistant using spoken natural language input (i.e., utterances), which may in some cases be converted into text and then processed, and/or by providing textual (e.g., typed) natural language input.

Automated assistants can be installed at a variety of different devices such as, for example, mobile phones, smart home devices, and/or vehicles. Unlike mobile phones and other computing devices, a vehicle can often be utilized for an extended period of time (e.g., ten or more years) by a respective owner before the owner eventually decides to purchase a replacement vehicle. During this period of ownership, software that is installed at the vehicle can be subject to updates. For example, updates can be provided to a vehicle computing device in order to allow the vehicle computing device to be responsive to commands that newer smart home devices and/or newer mobile phones can handle. However, a user may choose not to install certain updates, leading to incompatibility between the vehicle computing device and a remote server device with which the vehicle computing device interacts in responding to a command. Also, after a period of time (e.g., three or more years), updates may no longer be provided for the vehicle computing device due to, for example, end of support life, inability of hardware of the vehicle computing device to execute new updates, and/or other factor(s). This can also lead to incompatibility between the vehicle computing device and the remote server device.
When the vehicle computing device and the remote server device become incompatible, the server device may answer requests from the vehicle computing device with responses that are no longer interpretable by the vehicle computing device. This can lead to the vehicle computing device failing to appropriately respond to various commands, and to the vehicle computing device wastefully transmitting various data to the server device and/or the server device wastefully transmitting various data to the vehicle computing device (since some server device response(s) will no longer be interpretable by the vehicle computing device). Some techniques attempt to address this problem by having the server device be compatible with the most recent updates, as well as with prior versions of vehicle computing devices. However, providing such backwards compatibility indefinitely can require extensive storage, memory, and/or processor usage at the server device.

WO 2018/199483 A1 discloses a method and a device that can provide an intelligent agent configured to execute an application according to a user input. An electronic device comprises a housing, a touch screen display, a microphone, at least one speaker, a wireless communication circuit, a processor, and a memory.

Summary

The invention is defined in the appended claims. Implementations set forth herein relate to techniques for processing spoken utterances received at vehicle computing devices that, although including operable software, correspond to a version that is subject to a gradual phasing-out by the server device and/or any other supporting system. The gradual phasing-out can result in a spectrum of support for various server operations for versions of hardware and/or software. These server operations can include, but are not limited to, speech-to-text processing, natural language understanding (e.g., intent identification and/or slot value identification), action generation, and/or action execution.
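
By way of a non-limiting sketch, the spectrum of support described above could be modelled as a per-version level that determines how many server operations are performed before a response is returned. The version labels, the `Support` levels, the `SUPPORT_BY_VERSION` table, the default for unknown versions, and the stub functions below are all assumptions made for illustration and are not taken from the specification.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Optional


class Support(IntEnum):
    """Hypothetical levels: how far the server pipeline runs for a version."""
    SPEECH_TO_TEXT = 1      # server only transcribes the audio
    UNDERSTANDING = 2       # server also identifies the intent
    ACTION_GENERATION = 3   # server also generates an executable action


# Hypothetical mapping from device version to the extent of server support.
SUPPORT_BY_VERSION = {
    "v1": Support.SPEECH_TO_TEXT,
    "v2": Support.UNDERSTANDING,
    "v3": Support.ACTION_GENERATION,
}


@dataclass
class ServerResponse:
    text: str                     # speech-to-text data, always present
    intent: Optional[str] = None  # present only if the server ran understanding
    action: Optional[str] = None  # present only if the server generated an action


def transcribe(audio: bytes) -> str:
    # Stub: treats the audio bytes as UTF-8 text instead of running real STT.
    return audio.decode("utf-8")


def identify_intent(text: str) -> str:
    # Stub: a single keyword rule in place of a real understanding engine.
    return "set_temperature" if "temperature" in text else "unknown"


def generate_action(intent: str) -> str:
    # Stub: formats the intent as an action string.
    return f"execute:{intent}"


def handle_utterance(audio: bytes, version: str) -> ServerResponse:
    """Run only the server operations still supported for this version."""
    # Assumption: unknown versions receive speech-to-text support only.
    level = SUPPORT_BY_VERSION.get(version, Support.SPEECH_TO_TEXT)
    response = ServerResponse(text=transcribe(audio))
    if level >= Support.UNDERSTANDING:
        response.intent = identify_intent(response.text)
    if level >= Support.ACTION_GENERATION:
        response.action = generate_action(response.intent)
    return response
```

In this sketch, a device whose version maps to `Support.SPEECH_TO_TEXT` receives only transcribed text and is expected to perform the remaining operations locally, whereas a device at `Support.ACTION_GENERATION` receives an action it can execute directly.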
The server device can operate to gradually phase out performance of one or more of the operations over time, based on a version of hardware and/or software becoming outdated relative to other, newly released hardware and/or software. As a result, particular operations, such as generating actions and/or slot values, for particular versions can be exclusively performed locally as a result of those particular versions being subject to a phasing-out. In this way, computing devices that are typically operated longer than most others (e.g., vehicle computing devices) can nonetheless receive some amount of support from a server device for longer periods of time, despite not corresponding to a latest version. For instance, a spoken utterance can include natural language content that the user typically uses to control another one of their devices, such as a smart home device and/or a mobile phone, and the natural language content can specify a requested intent. When the vehicle computing device corresponds to a current version being supported by