US-12626287-B2 - Processing crowd-sourced information using machine learning based language models


Abstract

A system, for example an online system, uses a machine learning based language model, for example a large language model (LLM), to process crowd-sourced information provided by users. The crowd-sourced information may include user comments represented as unstructured text. The system receives queries from users and answers them based on the crowd-sourced information it has collected. For a query requesting an insight on a topic, the system generates a prompt for input to the machine learning based language model based on the query, provides the prompt to the model for execution, and receives a response from the model. The response comprises the insight on the topic and evidence for the insight, the evidence identifying one or more comments used to obtain the insight.
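The insight-with-evidence flow described in the abstract can be sketched as below. This is a minimal Python sketch under stated assumptions, not the patented implementation: the JSON reply format, the comment IDs, the sample comments, and the names `Comment`, `Insight`, `build_insight_prompt`, and `parse_insight_response` are all hypothetical, and a hard-coded string stands in for the language model's reply.

```python
import json
from dataclasses import dataclass


@dataclass
class Comment:
    comment_id: str  # hypothetical identifier used to cite evidence
    text: str


@dataclass
class Insight:
    insight: str
    evidence: list[str]  # IDs of the comments the model cited as support


def build_insight_prompt(query: str, comments: list[Comment]) -> str:
    """Assemble a prompt asking for an insight plus the IDs of supporting comments."""
    lines = [
        "Answer the question using only the user comments below.",
        # Assumed reply format; the source does not specify one.
        'Reply as JSON: {"insight": "...", "evidence": ["<comment id>", ...]}',
        f"Question: {query}",
        "Comments:",
    ]
    lines += [f"[{c.comment_id}] {c.text}" for c in comments]
    return "\n".join(lines)


def parse_insight_response(raw: str, comments: list[Comment]) -> Insight:
    """Parse the JSON reply and keep only evidence IDs that name real comments."""
    data = json.loads(raw)
    known_ids = {c.comment_id for c in comments}
    evidence = [cid for cid in data.get("evidence", []) if cid in known_ids]
    return Insight(insight=data["insight"], evidence=evidence)


# Illustrative sample comments (hypothetical data).
comments = [
    Comment("c1", "Store A restocks oat milk every Tuesday."),
    Comment("c2", "Oat milk was sold out at Store A on Monday."),
    Comment("c3", "Parking is easy at Store B."),
]
prompt = build_insight_prompt("When is oat milk available at Store A?", comments)

# Stand-in for the language model's reply; a real system would send `prompt` to an LLM.
raw_reply = '{"insight": "Oat milk is usually restocked on Tuesdays.", "evidence": ["c1", "c2", "c9"]}'
result = parse_insight_response(raw_reply, comments)
print(result.insight)   # Oat milk is usually restocked on Tuesdays.
print(result.evidence)  # ['c1', 'c2']  (c9 is dropped: no such comment exists)
```

Checking cited IDs against the known comments is one simple way to keep the returned evidence traceable to actual user comments, even if the model cites an ID that was never in the prompt.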

Inventors

Li Tan
Haixun Wang
Shishir Kumar Prasad
Tejaswi TENNETI
Aomin Wu
Jagannath Putrevu

Assignees

MAPLEBEAR INC.

Dates

Publication Date: 2026-05-12
Application Date: 2024-03-05

Claims (20)

1. A method comprising: at a computer system comprising a processor and a computer-readable medium: receiving a plurality of comments from users; providing the plurality of comments to an index associated with a machine learning based large language model; receiving, from a client device, a request; querying the index based on the received request to retrieve context data relevant to the received request; generating a prompt for input to the machine learning based large language model based on the received request from the client device and the context data retrieved from the index; providing the prompt to the machine learning based large language model for execution; receiving a response generated by the machine learning based large language model based on the prompt; and providing request response information to the client device based on the response received from the machine learning based large language model.
2. The method of claim 1, wherein the query requests an insight on a particular topic, wherein the response generated by the machine learning based language model comprises the insight on the particular topic and evidence for the insight, the evidence identifying one or more comments used to obtain the insight.
3. The method of claim 2, wherein the query concerns an item, wherein the prompt requests the machine learning based language model to generate insight, wherein the insight identifies one or more properties of the item.
4. The method of claim 3, further comprising identifying a set of possible values for each of the one or more properties, wherein the set of possible values for a particular property is determined based on one or more of: accessing a database for identifying the set of possible values for the particular property; or generating a new prompt for the machine learning based language model, the prompt identifying a particular item and a particular property and requesting the machine learning based language model to list properties of the item.
5. The method of claim 4, wherein the response identifies one or more properties of the item, the method further comprising: configuring a user interface to display the one or more properties of the item, the user interface configured to display a set of values for each of the one or more properties and allowing users to select one or more values for each of the one or more properties; and sending the user interface for display.
6. The method of claim 1, further comprising: generating one or more aggregate representations based on the plurality of comments, wherein the response is based on the one or more aggregate representations based on the plurality of comments.
7. The method of claim 6, wherein the prompt for input to the machine learning based language model is based on the query and an aggregate representation of the plurality of comments.
8. The method of claim 6, further comprising: categorizing the plurality of comments into a plurality of categories, each category associated with a summary representing an aggregate representation of comments assigned to the category.
9. The method of claim 8, wherein the query represents a comment from a user, wherein the response determines a category associated with the query, the method further comprising: routing the comment to a particular user based on the category associated with the query.
10. A non-transitory computer readable storage medium storing instructions that when executed by one or more computer processors cause the one or more computer processors to perform steps comprising: receiving a plurality of comments from users; providing the plurality of comments to an index associated with a machine learning based large language model; receiving, from a client device, a request; querying the index based on the received request to retrieve context data relevant to the received request; generating a prompt for input to the machine learning based large language model based on the received request from the client device and the context data retrieved from the index; providing the prompt to the machine learning based large language model for execution; receiving a response generated by the machine learning based large language model based on the prompt; and providing request response information to the client device based on the response received from the machine learning based large language model.
11. The non-transitory computer readable storage medium of claim 10, wherein the query requests an insight on a particular topic, wherein the response generated by the machine learning based language model comprises the insight on the particular topic and evidence for the insight, the evidence identifying one or more comments used to obtain the insight.
12. The non-transitory computer readable storage medium of claim 11, wherein the query concerns an item, wherein the prompt requests the machine learning based language model to generate insight, wherein the insight identifies one or more properties of the item, the steps further comprising identifying a set of possible values for each of the one or more properties by performing one or more of: accessing a database for identifying the set of possible values for a particular property; or generating a new prompt for the machine learning based language model, the prompt identifying a particular item and a particular property and requesting the machine learning based language model to list properties of the item.
13. The non-transitory computer readable storage medium of claim 12, wherein the response identifies one or more properties of the item, the instructions further causing the one or more computer processors to perform steps comprising: configuring a user interface to display the one or more properties of the item, the user interface configured to display a set of values for each of the one or more properties and allowing users to select one or more values for each of the one or more properties; and sending the user interface for display.
14. The non-transitory computer readable storage medium of claim 10, the instructions further causing the one or more computer processors to perform steps comprising: generating one or more aggregate representations based on the plurality of comments, wherein the response is based on the one or more aggregate representations based on the plurality of comments.
15. The non-transitory computer readable storage medium of claim 14, the instructions further causing the one or more computer processors to perform steps comprising: categorizing the plurality of comments into a plurality of categories, each category associated with a summary representing an aggregate representation of comments assigned to the category.
16. The non-transitory computer readable storage medium of claim 15, wherein the query represents a comment from a user, wherein the response determines a category associated with the query, the instructions further causing the one or more computer processors to perform steps comprising: routing the comment to a particular user based on the category associated with the query.
17. A computer system comprising: one or more computer processors; and a non-transitory computer readable storage medium storing instructions that when executed by one or more computer processors cause the one or more computer processors to perform steps comprising: receiving a plurality of comments from users; providing the plurality of comments to an index associated with a machine learning based large language model; receiving, from a client device, a request; querying the index based on the received request to retrieve context data relevant to the received request; generating a prompt for input to the machine learning based large language model based on the received request from the client device and the context data retrieved from the index; providing the prompt to the machine learning based large language model for execution; receiving a response generated by the machine learning based large language model based on the prompt; and providing request response information to the client device based on the response received from the machine learning based large language model.
18. The computer system of claim 17, wherein the query requests an insight on a particular topic, wherein the response generated by the machine learning based language model comprises the insight on the particular topic and evidence for the insight, the evidence identifying one or more comments used to obtain the insight.
19. The computer system of claim 17, wherein the query concerns an item, wherein the prompt requests the machine learning based language model to generate insight, wherein the insight identifies one or more properties of the item, the steps further comprising identifying a set of possible values for each of the one or more properties by performing one or more of: accessing a database for identifying the set of possible values for a particular property; or generating a new prompt for the machine learning based language model, the prompt identifying a particular item and a particular property and requesting the machine learning based language model to list properties of the item.
20. The computer system of claim 17, wherein the response identifies one or more properties of an item, the instructions further causing the one or more computer processors to perform steps comprising: configuring a user interface to display the one or more properties of the item, the user interface configured to display a set of values for each of the one or more properties and allowing users to select one or more values for each of the one or more properties; and sending the user interface for display.
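The retrieval step recited in claim 1 (query the index with the request, then build the prompt from the request plus the retrieved context) can be sketched as follows. This is a hedged illustration only: the claims do not say what kind of index is used, so a simple keyword-overlap score stands in for it (an embedding-based vector index would be a common alternative), and `CommentIndex`, `answer_request`, `fake_llm`, and the sample comments are hypothetical.

```python
import re
from collections import Counter
from typing import Callable


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


class CommentIndex:
    """Toy keyword index: a hypothetical stand-in for the index associated with the LLM."""

    def __init__(self) -> None:
        self.comments: list[str] = []
        self.token_counts: list[Counter] = []

    def add(self, comment: str) -> None:
        self.comments.append(comment)
        self.token_counts.append(Counter(tokenize(comment)))

    def query(self, request: str, k: int = 2) -> list[str]:
        """Return up to k comments ranked by how often they contain the request's tokens."""
        request_tokens = set(tokenize(request))
        scored = []
        for i, counts in enumerate(self.token_counts):
            score = sum(counts[t] for t in request_tokens)
            if score > 0:
                scored.append((score, i))
        # Highest score first; ties keep insertion order.
        scored.sort(key=lambda pair: (-pair[0], pair[1]))
        return [self.comments[i] for _, i in scored[:k]]


def answer_request(request: str, index: CommentIndex, llm: Callable[[str], str]) -> dict:
    """Retrieve context, build a prompt from request + context, and call the model."""
    context = index.query(request)
    prompt = (
        "Use the following user comments as context.\n"
        + "\n".join(f"- {c}" for c in context)
        + f"\n\nRequest: {request}\nAnswer:"
    )
    return {"answer": llm(prompt), "context_used": context}


def fake_llm(prompt: str) -> str:
    # Stand-in for a real language model: reports how many context bullets it received.
    return f"{prompt.count('- ')} comments considered"


index = CommentIndex()
for text in [
    "The bakery section has fresh bagels every morning.",
    "Checkout lines are long on weekends.",
    "Fresh bagels sell out by noon.",
]:
    index.add(text)

result = answer_request("When are fresh bagels available?", index, fake_llm)
print(result["context_used"])
# ['The bakery section has fresh bagels every morning.', 'Fresh bagels sell out by noon.']
```

Token-overlap ranking is used here only to keep the example dependency-free. In this toy version, common words such as "are" still add to a comment's score, which is one reason a real system would more likely use a learned or weighted retrieval method.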

Description

CROSS-REFERENCE TO RELATED APPLICATIONS

This application claims the benefit of U.S. Provisional Application No. 63/450,255, filed on Mar. 6, 2023, and of U.S. Provisional Application No. 63/462,722, filed on Apr. 28, 2023, each of which is incorporated by reference herein in its entirety.

TECHNICAL FIELD

One or more aspects described herein relate generally to analyzing unstructured text data using machine learning based language models, and more specifically to processing unstructured text such as crowd-sourced information using machine learning based language models.

BACKGROUND

Systems such as online systems often store large amounts of unstructured data. For example, a system may store a large corpus of documents comprising unstructured data, or an online system may receive user feedback in the form of unstructured text. Such unstructured data often includes information that is useful to other users. Typical techniques for searching through such data include text search techniques. If the data comprises a large number of small documents, for example, documents representing user feedback, conventional text search techniques can be inefficient: many documents may match a single search request, requiring significant manual effort to process the search results.

SUMMARY

In accordance with one or more aspects of the disclosure, a system receives crowd-sourced information from multiple client devices. The system provides the crowd-sourced information to a machine learning based language model and uses aggregate representations of the crowd-sourced information to answer queries from users. The system receives a query and generates a prompt for the machine learning based language model based on the query. The system provides the prompt to the machine learning based language model for execution and receives a response generated by the machine learning based language model.
The system provides the generated response to the user who submitted the query.

According to one or more embodiments, the system receives crowd-sourced information comprising one or more statements from multiple client devices and generates an aggregate representation of the crowd-sourced information. The system receives a query from a client device and generates a prompt for a machine learning based language model based on the query and the aggregate representation of the crowd-sourced information. The system provides the prompt to the machine learning based language model for execution, receives the response generated by executing the model on the prompt, and provides the response to the user of the client device.

According to one or more embodiments, the crowd-sourced information represents comments received from users. The system receives requests from users and extracts insights from the corpus of comments. The system also provides evidence for each insight, for example, the specific comments from which a particular insight was derived.

According to one or more embodiments, the system receives comments from users and provides the comments to an index associated with a machine learning based language model. The index aggregates the information in the comments so that questions about the comments can be answered in conjunction with the language model. The system receives a query requesting an insight on a particular topic and generates a prompt for the machine learning based language model based on the query. The system provides the prompt to the machine learning based language model for execution and receives a response generated by the model based on the prompt. The response comprises the insight on the particular topic and evidence for the insight.
The evidence identifies one or more comments used to obtain the insight. The system provides the response to a user that requested the information.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1A illustrates an example system environment for an online system, in accordance with one or more embodiments.
FIG. 1B illustrates an example system environment for an online system, in accordance with one or more embodiments.
FIG. 2 illustrates an example system architecture for an online system, in accordance with one or more embodiments.
FIG. 3 is a flowchart for answering shopping related queries based on crowd-sourced information describing retailers collected from multiple users, in accordance with one or more embodiments.
FIG. 4 is a flowchart for answering queries based on crowd-sourced information, in accordance with one or more embodiments.
FIG. 5 is a flowchart for answering queries based on comments from users, in accordance with one or more embodiments.
FIG. 6 is a flowchart for improving user interactions associated with items such as products or services, in accordance