CN-122029535-A - Query routing in a generative machine learning model using self-jeopardy
Abstract
The present specification describes a method performed by one or more data processing apparatus. The method includes obtaining a query and obtaining a subset of context data from context data related to the query. The method further includes processing the query and the subset of context data using a generative machine learning model to generate response data related to the query, or to generate an indication that the query cannot be satisfied using the subset of context data. In response to determining that the query cannot be satisfied using the subset of the context data, the method includes processing the query and the context data related to the query using the generative machine learning model to generate the response data.
Inventors
- Zhuowan Li
- Cheng Li
- Mingyang Zhang
- Michael Bendersky
- Qiaozhu Mei
Assignees
- GDM Holdings LLC
Dates
- Publication Date
- 2026-05-12
- Application Date
- 2025-07-18
- Priority Date
- 2024-07-18
Claims (20)
- 1. A method performed by one or more data processing devices, the method comprising: obtaining a query; obtaining a subset of context data from context data related to the query; processing the query and the subset of context data using a generative machine learning model to generate response data related to the query, or to generate an indication that the query cannot be satisfied using the subset of context data; and in response to determining that the query cannot be satisfied using the subset of context data, processing the query and the context data related to the query using the generative machine learning model to generate the response data.
- 2. The method of claim 1, wherein processing the query and the subset of context data using the generative machine learning model comprises: prompting the generative machine learning model to predict whether the query can be satisfied using the subset of context data.
- 3. The method of claim 1 or 2, wherein obtaining the subset of context data from context data related to the query comprises selecting the subset of context data from the context data related to the query.
- 4. The method of claim 3, wherein selecting the subset of context data is performed by a retriever machine learning model.
- 5. The method of claim 3 or 4, wherein selecting the subset of context data is based on evaluating a distance metric between the query and a plurality of candidate chunks extracted from the context data related to the query.
- 6. The method of claim 5, wherein evaluating the distance metric between the query and the plurality of candidate chunks extracted from the context data related to the query comprises: generating a query embedding from the query; generating a respective chunk embedding for each candidate chunk of the plurality of candidate chunks; and evaluating the distance metric between the query embedding and the respective chunk embeddings.
- 7. The method of claim 6, wherein the distance metric is based on cosine similarity.
- 8. The method of claim 6 or 7, wherein the query embedding is generated by a first encoder; wherein the chunk embeddings are generated by a second encoder; and wherein the first encoder and the second encoder are the same encoder, or wherein the first encoder and the second encoder are different encoders that are jointly trained.
- 9. The method of any of claims 5-8, wherein selecting the subset of context data based on evaluating the distance metric between the query and the plurality of candidate chunks comprises: ranking the plurality of candidate chunks based on the distance metric evaluation; and selecting the k highest-ranked candidate chunks as the subset of context data.
- 10. The method of claim 9, further comprising: concatenating the selected k highest-ranked chunks in rank order to form the subset of context data.
- 11. The method of claim 9 or 10, wherein the value of k is determined based on the query.
- 12. The method of claim 11, wherein the value of k is determined based on a measure of query type or query complexity.
- 13. The method of claim 11 or 12, wherein the value of k is determined based on processing the query by the generative machine learning model to predict the value of k.
- 14. The method of claim 11 or 12, wherein the value of k is determined based on processing the query by a machine learning model different from the generative machine learning model to predict the value of k.
- 15. The method of any one of claims 9 to 14, wherein k is an integer in the range of 1 to 5 inclusive.
- 16. The method of any of claims 5 to 15, wherein the plurality of candidate chunks is generated based on a chunk size of approximately 300 words.
- 17. The method of any preceding claim, wherein the generative machine learning model is a large language model (LLM)-based machine learning model.
- 18. The method of any preceding claim, wherein processing the query and the subset of context data using the generative machine learning model to generate response data comprises: generating an indication that the query may only be partially satisfied using the subset of context data; and repeating the following operations: obtaining a further subset of the context data, and processing the query, the further subset of context data, and any previously obtained subset of context data using the generative machine learning model, until either the response data is generated or an indication is generated that the query cannot be satisfied.
- 19. The method of claim 18, wherein the indication that the query is only partially satisfied comprises a reformulated query, and wherein the further subset of context data is obtained based on the reformulated query.
- 20. The method of claim 18 or 19, wherein the indication that the query can only be partially satisfied includes a predicted size of additional context data necessary to satisfy the query, and wherein the further subset of context data is obtained based on the predicted size.
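Claims 18-20 describe an iterative variant in which retrieval repeats until the query is answered or declared unsatisfiable. A minimal sketch of that loop follows; the verdict strings, the `retrieve` callable, and the reformulated-query handling are illustrative assumptions, not language from the claims:

```python
def answer_iteratively(query, retrieve, llm, max_rounds=5):
    """Expand the retrieved context until the model answers or gives up.

    `llm(query, chunks)` is assumed to return a (verdict, payload) pair:
      ("ANSWER", response)    - response data was generated
      ("UNANSWERABLE", None)  - the query cannot be satisfied
      ("PARTIAL", new_query)  - partially satisfied; new_query may carry a
                               reformulated query (claim 19) or be None
    """
    gathered = []            # previously obtained subsets of context data
    current_query = query
    for _ in range(max_rounds):
        # Obtain a further subset, skipping chunks already gathered.
        gathered = gathered + retrieve(current_query, exclude=gathered)
        verdict, payload = llm(query, gathered)
        if verdict == "ANSWER":
            return payload
        if verdict == "UNANSWERABLE":
            return None
        current_query = payload or current_query   # "PARTIAL" case
    return None
```

A reformulated query (claim 19) simply replaces the retrieval query on the next round, while the original query is always what the model is asked to answer.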
Description
Query routing in a generative machine learning model using self-jeopardy
Background
The present description relates to processing data using a machine learning model. A machine learning model receives an input and generates an output, such as a predicted output, based on the received input. Some machine learning models are parametric models that generate an output based on the received input and the values of model parameters. Some machine learning models are deep models that employ multiple layers of models to generate an output for a received input. For example, a deep neural network is a deep machine learning model that includes an output layer and one or more hidden layers, each hidden layer applying a nonlinear transformation to a received input to generate an output.
Disclosure of Invention
According to a first aspect, a method performed by one or more data processing devices is provided. The method includes obtaining a query and obtaining a subset of context data from context data related to the query. The method further includes processing the query and the subset of context data using a generative machine learning model to generate response data related to the query, or to generate an indication that the query cannot be satisfied using the subset of context data. In response to determining that the query cannot be satisfied using the subset of context data, the method includes processing the query and the context data related to the query using the generative machine learning model to generate the response data.
In some implementations, processing the query and the subset of context data using the generative machine learning model includes prompting the generative machine learning model to predict whether the query can be satisfied using the subset of context data.
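The routing in the first aspect can be illustrated with a short sketch. All names here are hypothetical, since the patent does not prescribe an API; in particular, the `'UNANSWERABLE'` sentinel is an assumed convention for the model's indication that the query cannot be satisfied using the subset:

```python
UNANSWERABLE = "UNANSWERABLE"

# Hypothetical prompt implementing the "predict whether the query can be
# satisfied" behavior described above.
PROMPT = (
    "Answer the query using only the provided context. If the query cannot "
    "be satisfied using this context, reply exactly 'UNANSWERABLE'.\n"
    "Context: {context}\nQuery: {query}"
)

def route_query(query, context_subset, full_context, llm):
    """First try the cheaper retrieved subset; if the model declines,
    reprocess the query against the full context data."""
    reply = llm(PROMPT.format(context=context_subset, query=query))
    if reply.strip() != UNANSWERABLE:
        return reply  # the subset was sufficient: return the response data
    # The model indicated the subset cannot satisfy the query:
    # fall back to the full context related to the query.
    return llm(PROMPT.format(context=full_context, query=query))
```

The design point is cost: most queries are resolved on the first pass over a small subset, and only the remainder pay for a second pass over the full context.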
For example, the generative machine learning model may be prompted to generate a response to the query if the model predicts that the query can be satisfied using the subset of the context data, or to provide output text indicating that the query cannot be satisfied using the subset of the context data. In some implementations, obtaining the subset of context data from the context data related to the query includes selecting the subset of context data from the context data related to the query. In some implementations, selecting the subset of the context data is performed by a retriever machine learning model. A retriever machine learning model is a type of machine learning model that is trained to select, for a given query, one or more of the most relevant retrieval results from the context data. The methods described herein may be used in conjunction with an off-the-shelf retriever machine learning model. Examples of suitable retriever machine learning models include "Contriever", details of which can be found in Izacard, Gautier et al., "Unsupervised Dense Information Retrieval with Contrastive Learning", arXiv:2112.09118 (2021), which is hereby incorporated by reference in its entirety, and "DRAGON", details of which can be found in Lin, Sheng-Chieh et al., "How to Train Your DRAGON: Diverse Augmentation Towards Generalizable Dense Retrieval", arXiv:2302.07452 (2023), which is hereby incorporated by reference in its entirety. In some implementations, selecting the subset of context data is based on evaluating a distance metric between the query and a plurality of candidate chunks extracted from the context data related to the query. Any suitable method may be used to generate the plurality of candidate chunks.
For example, the context data may be divided into overlapping or non-overlapping chunks of uniform size. In some implementations, for text data, the chunk size is, or is approximately, 300 words (or tokens). In some implementations, evaluating the distance metric between the query and the plurality of candidate chunks extracted from the context data related to the query includes generating a query embedding from the query, generating a respective chunk embedding for each candidate chunk of the plurality of candidate chunks, and evaluating the distance metric between the query embedding and the respective chunk embeddings. In some implementations, the embeddings may be generated using the retriever machine learning model. In some implementations, the query embedding is generated by a first encoder. In some implementations, the chunk embeddings are generated by a second encoder. In some implementations, the first encoder and the second encoder are the same encoder. In other implementations, the first encoder and the second encoder are different encoders, but are jointly trained. In some implementations, the retriever machine learning model includes the first encoder and the second encoder. The distance metric may be any suitable distance metric, such as a cosine distance. In some implementations, the ret
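The retrieval step just described (fixed-size chunking, query and chunk embeddings, cosine similarity, and selection of the k highest-ranked chunks concatenated in rank order) can be sketched as follows. This is an illustrative sketch under stated assumptions, not the patent's implementation: `embed` stands in for whatever encoder the retriever model provides, and the chunk size and k defaults are the example values from the text.

```python
import math

def make_chunks(text, chunk_size=300, overlap=0):
    """Split text into word-level chunks of `chunk_size` words,
    with `overlap` words shared between consecutive chunks."""
    words = text.split()
    stride = chunk_size - overlap
    return [" ".join(words[i:i + chunk_size])
            for i in range(0, max(len(words) - overlap, 1), stride)]

def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) *
                  math.sqrt(sum(b * b for b in v)))

def select_subset(query, context, embed, chunk_size=300, k=3):
    """Rank candidate chunks by cosine similarity to the query embedding
    and concatenate the k highest-ranked chunks in rank order."""
    chunks = make_chunks(context, chunk_size)
    q = embed(query)
    ranked = sorted(chunks,
                    key=lambda c: cosine_similarity(q, embed(c)),
                    reverse=True)
    return "\n\n".join(ranked[:k])
```

Note that cosine similarity is a similarity score (higher is closer), so ranking sorts in descending order; a cosine distance would simply be one minus this value.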