US-12625889-B2 - AI user intent for actions on a user device

Abstract

Systems and methods for classifying user intent based on the user interaction with an entry bar on a user device include receiving an input from a user in an entry box; classifying the input into a category of a plurality of categories including a category for an Artificial Intelligence (AI) session for an AI agent; and performing an action responsive to the classified category, including utilizing the AI agent for the AI session when the category is the category for the AI session and bypassing the AI agent for other categories of the plurality of categories.
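As an illustration only, the classify-then-act flow summarized above can be sketched as a small dispatch table. The category names, handler names, and the log used to observe which action ran are all hypothetical and are not taken from the disclosure; the point is simply that only one category reaches the AI agent while the others bypass it.

```python
from enum import Enum
from typing import Callable, Dict, List


class Category(Enum):
    """Hypothetical category labels for a classified entry-box input."""
    URL = "url"
    DATA = "data"
    AI_SESSION = "ai_session"


def make_dispatcher(log: List[str]) -> Callable[[Category, str], None]:
    """Build a dispatcher that records which action ran for a classified input."""

    def load_url(text: str) -> None:
        log.append(f"load:{text}")    # AI agent is bypassed

    def fetch_resource(text: str) -> None:
        log.append(f"fetch:{text}")   # AI agent is bypassed

    def start_ai_session(text: str) -> None:
        log.append(f"ai:{text}")      # only this path involves the AI agent

    handlers: Dict[Category, Callable[[str], None]] = {
        Category.URL: load_url,
        Category.DATA: fetch_resource,
        Category.AI_SESSION: start_ai_session,
    }

    def dispatch(category: Category, text: str) -> None:
        handlers[category](text)

    return dispatch
```

In a real browser the handlers would navigate, query an external system, or open an AI session; here they only append to a list so the routing is easy to inspect.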

Inventors

Howie XU
Omer Shilo
Evgeny Sidorenko
Gal David Shilo
Christopher Joseph O'Connell
Danni Chen
Alejandro Romero
Pawel Stanek

Assignees

Gen Digital Inc.

Dates

Publication Date: 20260512
Application Date: 20241015

Claims (20)

1. A method implemented through a user device, the method comprising steps of: receiving an input from a user in an entry box of a web browser on the user device; classifying the input into a category of a plurality of categories including a Uniform Resource Locator (URL) category, a data/resource category and a category for an Artificial Intelligence (AI) session for an AI agent, the classifying being performed locally on the user device, as the input is being received and prior to completion of the input, and prior to initiating any remote AI-session or data-service request for the input, by first applying pattern matching using regular expressions until a confidence threshold is met and, if not met, invoking a general-purpose AI model associated with the AI agent that is configured via token reweighting to deterministically output a single-token class label; and performing an action responsive to the classified category, including preloading a webpage when the category is the URL category, establishing and priming an AI session when the category is the category for the AI session, and bypassing the AI agent for other categories of the plurality of categories.
2. The method of claim 1, wherein the classifying is performed by an AI model associated with the AI agent, the AI model being a general-purpose AI model configured to perform the AI session as well as to provide the classifying.
3. The method of claim 1, wherein the classifying is performed using matching via regular expressions, the regular expressions comprising predefined URL patterns and keyword templates that are executed entirely locally on the user device to filter likely categories before invoking any AI model.
4. The method of claim 1, wherein the classifying is performed first using matching via regular expressions as the input is being received until a confidence level is reached, and second via an AI model associated with the AI agent where the confidence level is not reached.
5. The method of claim 1, wherein the plurality of categories also includes a category for a Uniform Resource Locator (URL) for loading an associated address and a category for data for obtaining a resource from an external system.
6. The method of claim 1, wherein the classifying is performed as the input is being received until a confidence level is reached, without having a full input from the user.
7. The method of claim 1, wherein the classifying includes: for each character of the input during the receiving, attempting to determine the category; determining a confidence score for each of the attempting; and determining the classified category based on a level of the confidence score, wherein the confidence score is dynamically updated as each character is entered and used to trigger anticipatory preactions including connection setup or resource allocation.
8. The method of claim 1, wherein the performing the action includes pre-loading a webpage when the category is the Uniform Resource Locator (URL), the pre-loading beginning before the user completes the full URL input based on a predicted completion and continuing in parallel with the ongoing classification.
9. The method of claim 1, wherein the performing the action includes, when the category is the AI session: performing a connection to the AI agent; and loading base prompts and some parts of a query to the AI agent.
10. A non-transitory computer-readable medium storing instructions that, when executed, cause one or more processors on a user device to perform steps of: receiving an input from a user in an entry box of a web browser on the user device; classifying the input into a category of a plurality of categories including a Uniform Resource Locator (URL) category, a data/resource category, and a category for an Artificial Intelligence (AI) session for an AI agent, the classifying being performed locally on the user device, as the input is being received and prior to completion of the input, and prior to initiating any remote AI-session or data-service request for the input, by first applying pattern matching using regular expressions until a confidence threshold is met and, if not met, invoking a general-purpose AI model associated with the AI agent that is configured via token reweighting to deterministically output a single-token class label; and performing an action responsive to the classified category, including preloading a webpage when the category is the URL category, establishing and priming an AI session when the category is the category for the AI session, and bypassing the AI agent for other categories of the plurality of categories.
11. The non-transitory computer-readable medium of claim 10, wherein the classifying is performed by an AI model associated with the AI agent, the AI model being a general-purpose AI model configured to perform the AI session as well as to provide the classifying.
12. The non-transitory computer-readable medium of claim 10, wherein the classifying is performed using matching via regular expressions, the regular expressions comprising predefined URL patterns and keyword templates that are executed entirely locally on the user device to filter likely categories before invoking any AI model.
13. The non-transitory computer-readable medium of claim 10, wherein the classifying is performed first using matching via regular expressions as the input is being received until a confidence level is reached, and second via an AI model associated with the AI agent where the confidence level is not reached.
14. The non-transitory computer-readable medium of claim 10, wherein the plurality of categories also includes a category for a Uniform Resource Locator (URL) for loading an associated address and a category for data for obtaining a resource from an external system.
15. The non-transitory computer-readable medium of claim 10, wherein the classifying is performed as the input is being received until a confidence level is reached, without having a full input from the user.
16. The non-transitory computer-readable medium of claim 10, wherein the classifying includes: for each character of the input during the receiving, attempting to determine the category; determining a confidence score for each of the attempting; and determining the classified category based on a level of the confidence score, wherein the confidence score is dynamically updated as each character is entered and used to trigger anticipatory preactions including connection or resource allocation.
17. The non-transitory computer-readable medium of claim 10, wherein the performing the action includes pre-loading a webpage when the category is the Uniform Resource Locator (URL), the pre-loading beginning before the user completes the full URL input based on a predicted completion and continuing in parallel with the ongoing classification.
18. The non-transitory computer-readable medium of claim 10, wherein the performing the action includes, when the category is the AI session: performing a connection to the AI agent; and loading base prompts and some parts of a query to the AI agent.
19. A user device comprising: one or more processors; and memory storing instructions that, when executed, cause the one or more processors to: receive an input from a user in an entry box of a web browser on the user device; classify the input into a category of a plurality of categories including a Uniform Resource Locator (URL) category, a data/resource category, and a category for an Artificial Intelligence (AI) session for an AI agent, the classifying being performed locally on the user device, as the input is being received and prior to completion of the input, and prior to initiating any remote AI-session or data-service request for the input, by first applying pattern matching using regular expressions until a confidence threshold is met and, if not met, invoking a general-purpose AI model associated with the AI agent that is configured via token reweighting to deterministically output a single-token class label; and perform an action responsive to the classified category, including preloading a webpage when the category is the URL category, establishing and priming an AI session when the category is the category for the AI session, and bypassing the AI agent for other categories of the plurality of categories.
20. The user device of claim 19, wherein the user device includes an AI model associated with the AI agent, the AI model being a general-purpose AI model configured to perform the AI session as well as to classify the input.
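The two-stage classification recited in the claims (regular expressions evaluated as characters arrive, with a model consulted only when no pattern reaches the confidence threshold) might be sketched as follows. This is a minimal sketch under stated assumptions, not the patented implementation: the patterns, the per-pattern confidence values, and the names `RULES`, `regex_classify`, and `classify_stream` are invented for illustration, and the fallback is simplified to run once on the full input rather than during typing.

```python
import re
from typing import Callable, List, Optional, Pattern, Tuple

# Hypothetical patterns and confidence values, chosen only for illustration.
RULES: List[Tuple[str, Pattern[str], float]] = [
    ("url", re.compile(r"^(https?://|www\.)\S+$", re.IGNORECASE), 0.95),
    ("url", re.compile(r"^[\w-]+(\.[\w-]+)*\.(com|org|net)(/\S*)?$", re.IGNORECASE), 0.90),
    ("data", re.compile(r"^(weather|time|stock)\s", re.IGNORECASE), 0.85),
    ("ai_session", re.compile(r"^(how|why|explain|write|summarize)\s", re.IGNORECASE), 0.80),
]


def regex_classify(prefix: str) -> Tuple[Optional[str], float]:
    """Return the highest-confidence (category, confidence) for a partial input."""
    best: Tuple[Optional[str], float] = (None, 0.0)
    for category, pattern, confidence in RULES:
        if pattern.search(prefix) and confidence > best[1]:
            best = (category, confidence)
    return best


def classify_stream(
    keystrokes: str,
    threshold: float,
    fallback: Callable[[str], str],
) -> Tuple[str, int]:
    """Re-classify after each character and stop once the threshold is met.

    Returns (category, number of characters consumed). If no pattern ever
    reaches the threshold, the decision is deferred to `fallback`, which
    stands in for the local AI model.
    """
    for end in range(1, len(keystrokes) + 1):
        category, confidence = regex_classify(keystrokes[:end])
        if category is not None and confidence >= threshold:
            return category, end
    return fallback(keystrokes), len(keystrokes)
```

With these assumed rules and a threshold of 0.9, the input `www.acme.com` is classified as a URL after five characters (`www.a`), which is the point at which an action such as preloading could begin before the user finishes typing.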

Description

FIELD OF THE DISCLOSURE

The present disclosure relates generally to computing and Artificial Intelligence (AI). More particularly, the present disclosure relates to systems and methods for AI user intent for actions being performed on a user device such as through a browser, browser extension, plugin, etc., using general-purpose AI models as special purpose classifiers, and AI model bundling and splitting for widescale distribution.

BACKGROUND OF THE DISCLOSURE

User devices, such as smartphones, tablets, laptops, and desktop computers, serve as the physical platforms that run web browsers, which are the primary tools for accessing and interacting with the Internet. Web browsers include an entry box, which is also referred to as the address bar, Uniform Resource Locator (URL) bar, search bar, location bar, omnibox, or navigation bar, depending on the browser or context. Users interact with the entry box to enter a URL, a search query, or a specific command or question. For example, typing www.acme.com would invoke a URL, typing a specific command like setup may bring up the browser's configuration, and all other entries may be treated as a search query or question. Further, the browser may utilize history and autocomplete to assist with the user's intent.

Conversely, AI tool usage is proliferating and today focuses on external AI tools specifically invoked by a user. The conventional approach requires manual user interaction and selection of the AI tools. It would be advantageous to integrate AI agents directly into the browser environment, via the entry box, locally on the user device.

BRIEF SUMMARY OF THE DISCLOSURE

The present disclosure relates to systems and methods for AI user intent for actions being performed on a user device such as through a browser, browser extension, plugin, etc. Many products (e.g., software tools such as browsers) are now integrating AI agents into their workflows, typically in one of two ways.
The most common approach requires users to manually select when they want to use an AI agent, leading to additional user interface interactions and the need to educate users about the AI option. This also introduces the downsides of having separate modes in the interface (e.g., AI mode vs. non-AI mode). Alternatively, some products pass all user inputs to the AI, which may rely on other systems for support, integrating those outputs into its response (sometimes called a Retrieval Augmented Generation architecture). While this approach eliminates the need for mode selection, it introduces significant latency and costs due to the AI processing time. Moreover, incorporating additional resources into the AI's reasoning chain creates integration challenges and increases the risk of AI “hallucinations,” which can lead to inaccurate responses. This issue is particularly challenging for browsers, where it is crucial to determine whether the user intends to visit a URL, open a resource, access ephemeral information, engage with an AI assistant, or perform other actions.

To that end, the present disclosure includes various approaches to detect user intent via AI for various actions. In an embodiment, the present disclosure includes quickly and precisely classifying user interaction automatically as the user types in the entry box (or, alternatively, immediately upon pressing the “return” key or an action button). This classification is performed locally on the user device, and the present disclosure also includes various techniques for supporting an AI model locally on the user device. In another embodiment, the present disclosure includes the use of a general-purpose AI model on the user device as a special-purpose classifier for classifying the user interaction, thereby removing the need to have separate AI models on the user device.
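One way a general-purpose model could act as a special-purpose classifier is to reweight its next-token scores so that only a handful of label tokens remain eligible, then pick the highest-scoring one. The sketch below illustrates that idea on a plain dictionary of scores. It is an assumption-laden illustration, not the disclosed mechanism: the label tokens `U`, `D`, and `A`, the mapping to category names, and the functions `reweight` and `pick_label` are all hypothetical, and a real system would operate on a tokenizer's token ids and the model's actual logits.

```python
import math
from typing import Dict, Mapping, Sequence

# Hypothetical single-token labels and the categories they stand for.
LABEL_TOKENS: Sequence[str] = ("U", "D", "A")
LABEL_TO_CATEGORY: Dict[str, str] = {"U": "url", "D": "data", "A": "ai_session"}


def reweight(
    logits: Mapping[str, float],
    allowed: Sequence[str] = LABEL_TOKENS,
) -> Dict[str, float]:
    """Set every non-label token to -inf so that only label tokens can win."""
    return {
        token: (score if token in allowed else -math.inf)
        for token, score in logits.items()
    }


def pick_label(logits: Mapping[str, float]) -> str:
    """Greedy single-token decoding over the reweighted scores.

    Taking the argmax (no sampling) means the same scores always yield the
    same label; ties resolve to the earliest entry in LABEL_TOKENS.
    """
    biased = reweight(logits)
    token = max(LABEL_TOKENS, key=lambda t: biased.get(t, -math.inf))
    if biased.get(token, -math.inf) == -math.inf:
        raise ValueError("no label token present in logits")
    return LABEL_TO_CATEGORY[token]
```

For example, with scores `{"the": 9.0, "Sure": 7.0, "U": 1.2, "D": 0.3, "A": 2.5}`, an unconstrained model would emit `the`, whereas the reweighted selection returns `ai_session` because `A` is the strongest of the permitted label tokens.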
In another embodiment, the present disclosure includes approaches to bundling and splitting AI models for widespread distribution to different types of user devices (in terms of hardware, memory, processing capability, etc.).

The present disclosure makes a browser or the like into a “co-browser,” following a user every step of the way, giving shortcuts and streamlining interactions. This can save time by initiating AI sessions instead of submitting search queries. The present disclosure includes this approach, with various embodiments for classifying intent and providing the shortcut. It is based on both an AI model (local or remote) and information about the user (history, profile, etc.). The functionality of the entry box is more powerful than that of existing boxes, which toggle between search and URL.

The objective of these various techniques is to improve user experience by accurately and quickly inferring intent, eliminating requirements for the user to actively select different functions (e.g., search, AI, URL, or other actions) while improving latency. With the techniques described herein, the intent can be quickly determined, with slow, expensive and non-deterministic technique