US-12626292-B2 - Processing queries using reinforcement learning

US12626292B2US 12626292 B2US12626292 B2US 12626292B2US-12626292-B2

Abstract

Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for processing a user query, comprising receiving a user query corresponding to a primary node in the computational graph; processing the user query to determine one or more listing nodes in the computational graph, providing, to the user device, data associated with one or more listing nodes. The processing comprises: determining a first set of random walks starting at the primary node according to a policy, determining a second set of random walks starting at one or more secondary nodes according to the policy, determining a final score for each of a first set of listing nodes reached by the first set of random walks, and determining the one or more listing nodes from the first set of listing nodes based on the final scores.

Inventors

Andrew Stanton
Arthur Maciejewicz
Stephen Balogh

Assignees

ETSY, INC.

Dates

Publication Date: 20260512
Application Date: 20231222

Claims (20)

1 . A method, comprising: receiving, from a user device, a user query corresponding to a primary node in a computational graph; processing the received user query to determine one or more listing nodes in the computational graph for the user query, the processing comprising: determining a first set of random walks starting at the primary node according to a policy trained using reinforcement learning, wherein the primary node is determined based on text of the user query, determining a second set of random walks starting at one or more secondary nodes in the computational graph according to the policy, wherein each of the one or more secondary nodes is determined based on context information associated with one or more user activities and is different from the primary node, generating a final score for each of a first set of listing nodes in the computational graph reached by the first set of random walks based on outputs of the first set of random walks and the second set of random walks, and identifying the one or more listing nodes from the first set of listing nodes based on the final scores; and providing, to the user device, data associated with one or more listing nodes that correspond to the user query.
2 . The method of claim 1 , wherein generating the final score for each of the first set of listing nodes in the computational graph comprises: determining the first set of listing nodes in the computational graph reached by the first set of random walks from the primary node according to the policy; determining a second set of listing nodes in the computational graph reached by the second set of random walks from the one or more secondary nodes according to the policy; and generating the final score for each of the first set of listing nodes by updating a score for one or more listing nodes in the first set of listing nodes according to the second set of listing nodes.
3 . The method of claim 1 , wherein updating the score for the one or more listing nodes in the first set of listing nodes comprises: for each listing node in the first set of listing nodes that is also in the second set of listing nodes, updating the score for the listing node in the first set of listing nodes by adding a modification value.
4 . The method of claim 3 , wherein the modification value is computed based on (i) a value proportional to a total count of reaches at the listing node in the second set of random walks and (ii) a boost value.
5 . The method of claim 4 , wherein the boost value includes a range from zero to one.
6 . The method of claim 3 , wherein the modification value is computed using an exponentiation function with a base proportional to a total count of reaches at the listing node in the second set of random walks and a pre-determined power value.
7 . The method of claim 1 , wherein the first set of random walks and the second set of random walks are performed in parallel using two or more threads using respective random seeds.
8 . A system, comprising: one or more memory devices storing instructions; and one or more data processing apparatus that are configured to interact with the one or more memory devices, and upon execution of the instructions, perform operations including: receiving, from a user device, a user query corresponding to a primary node in a computational graph; processing the received user query to determine one or more listing nodes in the computational graph for the user query, the processing comprising: determining a first set of random walks starting at the primary node according to a policy trained using reinforcement learning, wherein the primary node is determined based on text of the user query, determining a second set of random walks starting at one or more secondary nodes in the computational graph according to the policy, wherein each of the one or more secondary nodes is determined based on context information associated with one or more user activities and is different from the primary node, generating a final score for each of a first set of listing nodes in the computational graph reached by the first set of random walks based on outputs of the first set of random walks and the second set of random walks, and identifying the one or more listing nodes from the first set of listing nodes based on the final scores; and providing, to the user device, data associated with one or more listing nodes that correspond to the user query.
9 . The system of claim 8 , wherein generating the final score for each of the first set of listing nodes in the computational graph comprises: determining the first set of listing nodes in the computational graph reached by the first set of random walks from the primary node according to the policy; determining a second set of listing nodes in the computational graph reached by the second set of random walks from the one or more secondary nodes according to the policy; and generating the final score for each of the first set of listing nodes by updating a score for one or more listing nodes in the first set of listing nodes according to the second set of listing nodes.
10 . The system of claim 8 , wherein updating the score for the one or more listing nodes in the first set of listing nodes comprises: for each listing node in the first set of listing nodes that is also in the second set of listing nodes, updating the score for the listing node in the first set of listing nodes by adding a modification value.
11 . The system of claim 10 , wherein the modification value is computed based on (i) a value proportional to a total count of reaches at the listing node in the second set of random walks and (ii) a boost value.
12 . The system of claim 11 , wherein the boost value includes a range from zero to one.
13 . The system of claim 10 , wherein the modification value is computed using an exponentiation function with a base proportional to a total count of reaches at the listing node in the second set of random walks and a pre-determined power value.
14 . The system of claim 8 , wherein the first set of random walks and the second set of random walks are performed in parallel using two or more threads using respective random seeds.
15 . One or more non-transitory computer-readable media storing instructions that, when executed by one or more data processing apparatus, cause the one or more data processing apparatus to perform operations comprising: receiving, from a user device, a user query corresponding to a primary node in a computational graph; processing the received user query to determine one or more listing nodes in the computational graph for the user query, the processing comprising: determining a first set of random walks starting at the primary node according to a policy trained using reinforcement learning, wherein the primary node is determined based on text of the user query, determining a second set of random walks starting at one or more secondary nodes in the computational graph according to the policy, wherein each of the one or more secondary nodes is determined based on context information associated with one or more user activities and is different from the primary node, generating a final score for each of a first set of listing nodes in the computational graph reached by the first set of random walks based on outputs of the first set of random walks and the second set of random walks, and identifying the one or more listing nodes from the first set of listing nodes based on the final scores; and providing, to the user device, data associated with one or more listing nodes that correspond to the user query.
16 . The one or more non-transitory computer-readable media of claim 15 , wherein generating the final score for each of the first set of listing nodes in the computational graph comprises: determining the first set of listing nodes in the computational graph reached by the first set of random walks from the primary node according to the policy; determining a second set of listing nodes in the computational graph reached by the second set of random walks from the one or more secondary nodes according to the policy; and generating the final score for each of the first set of listing nodes by updating a score for one or more listing nodes in the first set of listing nodes according to the second set of listing nodes.
17 . The one or more non-transitory computer-readable media of claim 15 , wherein updating the score for the one or more listing nodes in the first set of listing nodes comprises: for each listing node in the first set of listing nodes that is also in the second set of listing nodes, updating the score for the listing node in the first set of listing nodes by adding a modification value.
18 . The one or more non-transitory computer-readable media of claim 17 , wherein the modification value is computed based on (i) a value proportional to a total count of reaches at the listing node in the second set of random walks and (ii) a boost value.
19 . The one or more non-transitory computer-readable media of claim 18 , wherein the boost value includes a range from zero to one.
20 . The one or more non-transitory computer-readable media of claim 17 , wherein the modification value is computed using an exponentiation function with a base proportional to a total count of reaches at the listing node in the second set of random walks and a pre-determined power value.

Description

BACKGROUND This specification relates to processing a user query for information, e.g., on an exchange platform, and in particular, relates to processing the user query along a computational graph using reinforcement learning. An example exchange platform enables the exchange of goods, content, and services between end users and providers. Providers can list or provide their goods, content, and services on the exchange platform, and end users obtain the goods, content, and services from the providers via the exchange platform. Reinforcement learning systems can be deployed in such platforms to facilitate various operations of the platform, including, e.g., search and retrieval of information, e.g., information related to items provided on the platform. In a reinforcement learning system, an agent generally interacts with an environment by performing actions that are selected by the reinforcement learning system in response to receiving observations that characterize the current state of the environment. Some reinforcement learning systems select the action to be performed by the agent in response to receiving a given observation in accordance with the output of a neural network. One example reinforcement learning technique includes random walks. SUMMARY This specification relates to processing a user query on a computing platform along a computational graph with multiple nodes and edges to generate output data for the user query. In particular, the processing described here is in parallel, and the user query is processed using reinforcement learning techniques. One example of reinforcement learning techniques described herein relates to performing multiple random walks starting at respective nodes in the computational graph in parallel. Based on the user query, a system performing the described techniques determines a primary node in the computational graph that corresponds to the user query and determines one or more secondary nodes in the computational graph that corresponds to contextual information related to the user providing the user query. The one or more secondary nodes are different from the primary node. The system performs a first set of random walks starting at the primary node to reach a first set of listing nodes and a second set of random walks, each starting at one of the one or more secondary nodes to teach a second set of listing nodes. Both the first and second sets of random walks are performed according to a policy trained using reinforcement learning techniques over the computational graph. The system determines a final score for each of the first set of listing nodes reached via the first set of random walks starting at the primary node based at least on the characteristics of the second set of listing nodes reached via the second set of random walks starting at the one or more secondary nodes. The system determines one or more listing nodes from the first set of listing nodes based on the final scores and provides data associated with the one or more listing nodes as an output for the user query to the user device. Particular embodiments of the subject matter described in this specification can be implemented to realize one or more of the following advantages. For example, the described techniques in this specification improve the accuracy of determining listing nodes for a user query. By including additional random walks from secondary nodes, the system can reach listing nodes via random walks starting at the secondary nodes, thus enabling the system explore possible listing nodes that might be reached via random walks starting at the primary node. In addition, since the secondary nodes are determined based on the contextual information (e.g., data related to user activities), listing nodes reached by random walks from secondary nodes could be used to determine output data that are more customized for users than those reached by primary nodes otherwise determined merely based on user queries. In addition, the techniques described herein can improve the computational efficiency of processing a user query since different sets of random walks starting at different nodes (e.g., primary nodes and secondary nodes) are performed using different threads/cores in parallel. Each thread or core can be initialized using different random seeds to ensure that each random walk result is independent or different from the rest of the random walks. The described multi-thread or multi-core implementation of random walks allows a system to perform ten (10) times more total random walk samplings within a given latency than serial operations. The details of one or more embodiments of the subject matter described in this specification are set forth in the accompanying drawings and the description below. Other features, aspects, and advantages of the subject matter will become apparent from the description, the drawings, and the claims. BRIEF DESCRIPTION OF THE DRAWINGS FIG. 1 is a block diagram of an exa