EP-4736031-A1 - USER ACTIVITY HISTORY EXPERIENCES POWERED BY A MACHINE LEARNING MODEL
Abstract
Machine learning techniques are leveraged to provide personalized assistance on a computing device. In some configurations a timeline of a user's interactions with the computing device is generated. For example, screenshots and audio streams may be saved as entries in the timeline. Context – the state of the computing device when the entry is created, such as which documents and websites are open – is also stored. Entries in the timeline are processed by a model to generate embedding vectors. The timeline may be searched by finding the embedding vector that is closest to an embedding vector derived from a search query. The user may select a query result, causing the associated context to be restored. For example, if the query is "show me all documents related to my upcoming trip to Japan", the query result may open documents and websites that were open when booking a flight to Japan.
Inventors
- SALOWITZ, Elizabeth Picchietti
- PERRY, David Ben
- PESSOA, Carlos A.C.
- PRADEEP, Vivek
- VISWANATHAN, Sharath
- LUQUETTA-FISH, Nathan James
- BATHICHE, Steven
- SOMMERLADE, Eric Chris Wolfgang
- LARA SILVA, Jose Antonio
Assignees
- Microsoft Technology Licensing, LLC
Dates
- Publication Date: 2026-05-06
- Application Date: 2024-06-12
Claims (20)
- 1. A method comprising: providing a search query (500) to a machine learning model (330); receiving a query embedding vector (510) from the machine learning model (330) that represents the search query (500) in an embedding space (370); selecting an interaction embedding vector (340) from a plurality of interaction embedding vectors based on a distance between the query embedding vector (510) and the interaction embedding vector (340) in the embedding space (370); retrieving a context (360) of an application (310) at a previous point in time based on the selected interaction embedding vector (340); and configuring the application (310) based on the context (360).
- 2. The method of claim 1, further comprising: monitoring the application for a change in content; in response to the change in content: providing a portion of a screenshot of the application to the machine learning model; receiving the interaction embedding vector from the machine learning model that represents the portion of the screenshot in an embedding space; and determining the context of the application when the screenshot was taken.
- 3. The method of claim 1, wherein the context of the application comprises screen coordinates of the application, a file name of a document loaded by the application, a page of the document displayed by the application, a website address navigated to by the application, user credentials of the application, or login credentials of a website.
- 4. The method of claim 2, further comprising: segmenting the screenshot into a plurality of portions based on content type; and selecting the portion of the screenshot from the plurality of portions.
- 5. The method of claim 2, further comprising: storing the interaction embedding vector in a vector database, wherein the interaction embedding vector is selected from the plurality of interaction embedding vectors by searching the vector database for embedding vectors closest to or within a defined distance of the query embedding vector.
- 6. The method of claim 1, wherein the interaction embedding vector comprises an index usable to retrieve the screenshot and the context of the application.
- 7. The method of claim 6, further comprising: retrieving the screenshot and the context of the application using the interaction embedding vector; displaying the screenshot; and receiving a selection of the displayed screenshot, wherein the application is configured based on the context in response to receiving the selection of the screenshot.
- 8. The method of claim 1, wherein configuring the application based on the context comprises opening a document that was open when the screenshot was taken, navigating to a website that was open when the screenshot was taken, or filling out a form with content taken from the form when the screenshot was taken.
- 9. A system comprising: a processing unit; and a computer-readable storage medium having computer-executable instructions stored thereupon, which, when executed by the processing unit, cause the processing unit to: receive a current interaction (520) of an individual application (400); provide the current interaction (520) and a prompt (525) to a machine learning model (330); receive a current interaction embedding vector (530) from the machine learning model (330) that represents the current interaction (520) as it relates to the prompt (525) in an embedding space (370); select an interaction embedding vector (340) from a plurality of interaction embedding vectors based on a distance between the current interaction embedding vector (530) and the interaction embedding vector (340) in the embedding space (370), wherein the interaction embedding vector (340) is associated with a previous state of an application (310); generate an action (128) associated with the prompt (525) based on the selected interaction embedding vector (340); and perform the action (128).
- 10. The system of claim 9, wherein the computer-executable instructions further cause the processing unit to: display a selectable indication of the action, wherein the action is performed in response to receiving a selection of the selectable indication of the action.
- 11. The system of claim 9, wherein the action displays content relevant to the current interaction, completes a partially-completed portion of content, opens a document, schedules a meeting, shares a document during a meeting, or attaches a document to an email.
- 12. The system of claim 9, wherein the application comprises a videoconference application, wherein the interaction comprises a screenshot, wherein the individual application comprises an electronic message application, wherein the current interaction comprises a screenshot taken while drafting an electronic message, and wherein the action opens the document that was shared during the meeting based on content of the electronic message.
- 13. The system of claim 9, wherein the interaction comprises a screenshot or an audio stream.
- 14. The system of claim 9, wherein the current interaction includes an indication of user input.
- 15. The system of claim 9, wherein the prompt asks, given a set of documents, which documents a user might want to view.
- 16. A computer-readable storage medium having encoded thereon computer-readable instructions that when executed by a processing unit cause a system to: provide a search query (500) to a machine learning model (330); receive a query embedding vector (510) from the machine learning model (330) that represents the search query (500) in an embedding space (370); select an interaction embedding vector (340) from a plurality of interaction embedding vectors based on a distance between the query embedding vector (510) and the interaction embedding vector (340) in the embedding space (370); retrieve a context (360) based on the selected interaction embedding vector (340); and configure an application (310) based on the context (360).
- 17. The computer-readable storage medium of claim 16, wherein the search query references content included in the interaction of the application.
- 18. The computer-readable storage medium of claim 16, wherein the context of the application describes attributes of the application that are not derived from content displayed by the application.
- 19. The computer-readable storage medium of claim 16, wherein the plurality of interaction embedding vectors comprise a user history timeline, and wherein configuring the application based on the context returns the application to an earlier state.
- 20. The computer-readable storage medium of claim 19, wherein the machine learning model generates the query embedding vector based on relationships identified between entries in the user history timeline.
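For illustration, the storage-and-selection structure recited in claims 1, 5, and 6 (storing interaction embedding vectors alongside captured context, then selecting the vector nearest a query embedding) can be sketched as a minimal in-memory Python class. This is a sketch only; the class and method names, and the use of Euclidean distance, are illustrative assumptions and are not taken from the claims:

```python
import math

class VectorTimeline:
    """Minimal in-memory stand-in for the claimed vector database.

    Each entry pairs an interaction embedding vector with the context
    captured when the interaction was recorded, so that the vector's
    index can later be used to retrieve the context (claim 6).
    """

    def __init__(self):
        self._vectors = []   # interaction embedding vectors
        self._contexts = []  # application context at capture time

    def add_entry(self, embedding, context):
        # Store the embedding alongside its context (claim 5).
        self._vectors.append(list(embedding))
        self._contexts.append(context)

    def nearest(self, query_embedding, k=1):
        # Select the interaction embedding vector(s) closest to the
        # query embedding vector in the embedding space (claim 1).
        def dist(v):
            return math.sqrt(sum((a - b) ** 2
                                 for a, b in zip(v, query_embedding)))
        order = sorted(range(len(self._vectors)),
                       key=lambda i: dist(self._vectors[i]))
        return [self._contexts[i] for i in order[:k]]
```

A query embedding close to a stored entry's embedding would then return that entry's context, which could be used to reconfigure the application as in claim 1.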
Description
USER ACTIVITY HISTORY EXPERIENCES POWERED BY A MACHINE LEARNING MODEL

BACKGROUND

[0001] Users perform a wide variety of tasks with computing devices. Common tasks include booking travel, creating documents, video conferencing, and editing photos. Users often switch from one task to another, causing them to lose track of what they were working on. Similarly, when a user completes a task the user may lose track of confirmation emails, itineraries, and other resources generated when performing the task. Traditional search and retrieval methods, such as keyword-based searches, folder hierarchies, and app-specific organization tools, are often inadequate for quickly resuming a task or finding resources generated when a task was performed. These methods rely on users remembering specific details about their past activities, which can be challenging due to the vast amount of information that users generate and interact with.

[0002] For example, a user drafting a word processing document may not remember where the document was saved. This problem is exacerbated by the increasing number of storage locations available on modern computing devices. Instead of quickly picking up where they left off, the user may be forced to manually search through a number of directories, attachments, cloud drives, etc., before finding the file.

[0003] As another example, a user who was in the process of planning a trip may have forgotten which websites they were using to book flights and hotels. The user may attempt a keyword search on their browsing history, but keyword searches are often inadequate at deciphering context and user intent. For example, a search for travel-related websites may return results associated with a previous trip.

[0004] It is with respect to these and other considerations that the disclosure made herein is presented.

SUMMARY

[0005] Disclosed are systems and methods that leverage machine learning techniques to provide personalized assistance on a computing device.
In some configurations a timeline of a user's interactions with the computing device is generated. For example, screenshots and audio streams may be saved as entries in the timeline. Context - the state of the computing device when an entry is created, such as which documents and websites are open, or what content was filled into a form - is also stored. Entries in the timeline may be processed by a machine learning model, such as a large language model or multi-modal generative model, among others, to generate embedding vectors that represent the entries in an embedding space.

[0006] The timeline may be searched by evaluating the associated embedding vectors. For example, an embedding vector derived from a query may be compared to the embedding vectors derived from the timeline. Embedding vectors that are closer, i.e., the distance between them in the embedding space is shorter, are considered more closely related. As such, embedding vectors derived from the timeline that are closest to the query embedding vector, or which are within a defined distance of the query embedding vector, are selected as query results. In some configurations, the user may select one of the query results, causing the associated context to be restored. For example, documents and websites that were open when a vacation-planning timeline entry was created are re-opened, and data that was entered into a web form may be restored.

[0007] Technical benefits of the disclosed embodiments include improved human-computer interaction, conservation of processing resources, improved search of local computing resources, and the like. Human-computer interaction is improved by allowing a user to search for content that was previously displayed by an application, even if the content was transitory and was not stored in a file. This unlocks new avenues for answering questions that a user may have about their operation of the computing device.
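The distance-based selection described in paragraph [0006], choosing timeline entries whose embedding vectors are closest to the query embedding vector or within a defined distance of it, can be sketched as follows. The choice of cosine distance and the threshold value here are illustrative assumptions; the disclosure does not prescribe a particular distance metric:

```python
import math

def cosine_distance(a, b):
    # 1 - cosine similarity; a smaller value means the two embedding
    # vectors are considered more closely related.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return 1.0 - dot / (norm_a * norm_b)

def search_timeline(query_embedding, entries, max_distance=0.5):
    # entries: (embedding_vector, context) pairs from the timeline.
    # Return the contexts of entries within the defined distance of
    # the query embedding, nearest first.
    scored = [(cosine_distance(query_embedding, emb), ctx)
              for emb, ctx in entries]
    hits = sorted([(d, ctx) for d, ctx in scored if d <= max_distance],
                  key=lambda pair: pair[0])
    return [ctx for _, ctx in hits]
```

In a full system the returned contexts would drive restoration, e.g. re-opening the documents and websites recorded in the selected entry.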
The disclosed embodiments improve the conservation of processing resources by reducing the number of searches that a user may need to perform before they are able to retrieve the desired information, document, or interaction.

[0008] Features and technical benefits other than those explicitly described above will be apparent from a reading of the following Detailed Description and a review of the associated drawings. This Summary is provided to introduce a selection of concepts in a simplified form that are further described below in the Detailed Description. This Summary is not intended to identify key or essential features of the claimed subject matter, nor is it intended to be used as an aid in determining the scope of the claimed subject matter. The term "techniques," for instance, may refer to system(s), method(s), computer-readable instructions, module(s), algorithms, hardware logic, and/or operation(s) as permitted by the context described above and throughout the document.

BRIEF DESCRIPTION OF THE DRAWINGS

[0009] The Detailed Description is des