US-12626068-B2 - Large language model output generation using data graphs

US12626068B2

Abstract

Systems and methods for generating output data based on a data graph are provided. An output request for the output data based on the data graph is received. The output request comprises one of a natural language request from a target user or an application request from an application of the target user. The data graph has nodes and edges between the nodes. The nodes represent entities associated with an enterprise organization and the edges represent relationships among the entities. A graph data query is generated with a large language model (LLM) using the output request as a first input to the LLM. The graph data query is performed against the data graph to obtain a graph data output that represents a sub-portion of the data graph. The output data is generated with the LLM using the graph data output as a second input to the LLM.
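The abstract describes a two-pass data flow: the LLM first turns the output request into a graph data query, and then consumes the query's result to generate the output data. A minimal sketch of that flow follows, with caller-supplied stand-ins for the LLM, the query engine, and the graph-to-text converter; all function and variable names here are illustrative, not taken from the patent:

```python
def generate_output(output_request, data_graph, llm, run_query, to_text):
    """Two-pass LLM flow over a data graph, as described in the abstract.

    The callables are caller-supplied stand-ins (illustrative only):
      llm(prompt)            -> the model's text completion
      run_query(query, g)    -> the sub-portion of graph g matching query
      to_text(graph_output)  -> LLM-readable text for graph data
    """
    # First pass: the output request is the first input to the LLM,
    # which generates a graph data query.
    graph_query = llm(f"Write a graph query for: {output_request}")

    # The query is performed against the data graph to obtain a
    # graph data output (a sub-portion of the graph).
    graph_output = run_query(graph_query, data_graph)

    # Second pass: the serialized graph data output is the second
    # input to the LLM, which generates the final output data.
    return llm(f"Answer '{output_request}' using:\n{to_text(graph_output)}")


# Deterministic stubs to show the data flow end to end.
transcript = []

def stub_llm(prompt):
    transcript.append(prompt)
    return f"llm-response-{len(transcript)}"

result = generate_output(
    "summarize documents for my next meeting",
    data_graph={},
    llm=stub_llm,
    run_query=lambda query, graph: {"nodes": ["doc-1"]},
    to_text=str,
)
```

With the stubs above, the LLM is called exactly twice: once with the output request and once with the serialized graph data output.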

Inventors

  • Vipindeep Vangala
  • Rajeev Gupta
  • Madhusudhanan Krishnamoorthy

Assignees

  • MICROSOFT TECHNOLOGY LICENSING, LLC

Dates

Publication Date
2026-05-12
Application Date
2023-06-15

Claims (20)

  1. A computer-implemented method of generating output data based on a data graph, the method comprising: receiving an output request for the output data based on the data graph, the output request comprising one of a natural language request from a target user or an application request from an application of the target user, the data graph having nodes and edges between the nodes, the nodes representing entities associated with an enterprise organization, and the edges representing relationships among the entities; generating a graph data query with a large language model (LLM) using the output request as a first input to the LLM; performing the graph data query against the data graph to obtain a graph data output that represents a sub-portion of the data graph; and generating the output data with the LLM using the graph data output as a second input to the LLM.
  2. The computer-implemented method of claim 1, wherein: the graph data output represents one or more nodes from the data graph; the data graph is a heterogeneous graph having nodes with different types; the entities include one or more of users, documents, emails, meetings, and conversations; and the relationships include one or more of document authorship, document modification, document sharing, meeting invites, linked data between documents, email sending, and email replying.
  3. The computer-implemented method of claim 2, wherein generating the output data with the LLM comprises converting the graph data output to a text data format that is readable by the LLM.
  4. The computer-implemented method of claim 3, wherein generating the output data with the LLM comprises providing the converted graph data output, as the second input, with one or more documents corresponding to the one or more nodes as a third input, to the LLM to generate the output data.
  5. The computer-implemented method of claim 4, wherein the converted graph data output represents weights for the one or more documents according to the target user; and generating the output data comprises generating a weighted summary of content of the one or more documents according to the weights.
  6. The computer-implemented method of claim 5, wherein the weights for the one or more documents comprise respective numbers of LLM tokens to be used for generating the weighted summary by the LLM.
  7. The computer-implemented method of claim 4, wherein: the graph data output represents a plurality of nodes from the data graph, the plurality of nodes comprises the one or more nodes, and the converted graph data output represents a user context for the target user; generating the output data comprises selecting the one or more nodes as a subset of the plurality of nodes for the output data according to the user context for the target user; and wherein generating the output data comprises generating a summary of content of the one or more documents, the converted graph data output representing weights for the one or more documents according to the target user.
  8. The computer-implemented method of claim 1, wherein: the output request comprises a request for nodes of the data graph that are related to the graph data query; and generating the graph data query with the LLM comprises providing an extraction prompt to the LLM, the extraction prompt comprising syntax examples for the LLM to extract graph data outputs from the data graph.
  9. A system for generating output data based on a data graph, the system comprising: at least one processor, and at least one memory storing computer-executable instructions that when executed by the at least one processor cause the at least one processor to: receive an output request for the output data based on the data graph, the output request comprising one of a natural language request from a target user or an application request from an application of the target user, the data graph having nodes and edges between the nodes, the nodes representing entities associated with an enterprise organization, and the edges representing relationships among the entities; generate a graph data query with a large language model (LLM) using the output request as a first input to the LLM; perform the graph data query against the data graph to obtain a graph data output that represents a sub-portion of the data graph; and generate the output data with the LLM using the graph data output as a second input to the LLM.
  10. The system of claim 9, wherein: the graph data output represents one or more nodes from the data graph; the data graph is a heterogeneous graph having nodes with different types; the entities include one or more of users, documents, emails, meetings, and conversations; and the relationships include one or more of document authorship, document modification, document sharing, meeting invites, linked data between documents, email sending, and email replying.
  11. The system of claim 10, wherein the computer-executable instructions cause the at least one processor to generate the output data with the LLM by converting the graph data output to a text data format that is readable by the LLM.
  12. The system of claim 11, wherein the computer-executable instructions cause the at least one processor to provide the converted graph data output, as the second input, with one or more documents corresponding to the one or more nodes as a third input, to the LLM to generate the output data.
  13. The system of claim 12, wherein the converted graph data output represents weights for the one or more documents according to the target user; and the computer-executable instructions cause the at least one processor to generate a weighted summary of content of the one or more documents according to the weights.
  14. The computer-implemented method of claim 1, wherein the LLM is trained using a training set derived from a plurality of graph data outputs converted into a text format readable by the LLM, and providing the converted outputs as input to the LLM during training.
  15. The computer-implemented method of claim 14, wherein the plurality of graph data outputs are generated from a training graph, the training graph comprising nodes representing entities associated with an enterprise organization and edges representing relationships among the entities.
  16. The computer-implemented method of claim 14, further comprising providing an extraction prompt to the LLM, the extraction prompt comprising a syntax example for the LLM to extract a second graph data output from the data graph, and providing the second graph data output as the output data.
  17. The system of claim 9, wherein the LLM is trained using a training set derived from a plurality of graph data outputs converted into a text format readable by the LLM, and providing the converted outputs as input to the LLM during training.
  18. The system of claim 17, wherein the plurality of graph data outputs are generated from a training graph, the training graph comprising nodes representing entities associated with an enterprise organization and edges representing relationships among the entities.
  19. The system of claim 17, further comprising providing an extraction prompt to the LLM, the extraction prompt comprising a syntax example for the LLM to extract a second graph data output from the data graph, and providing the second graph data output as the output data.
  20. Non-transitory computer storage media having computer-readable instructions embodied thereon that, when executed by at least one processor, perform operations, the operations comprising: receiving an output request for the output data based on the data graph, the output request comprising one of a natural language request from a target user or an application request from an application of the target user, the data graph having nodes and edges between the nodes, the nodes representing entities associated with an enterprise organization, and the edges representing relationships among the entities; generating a graph data query with a large language model (LLM) using the output request as a first input to the LLM; performing the graph data query against the data graph to obtain a graph data output that represents a sub-portion of the data graph; and generating the output data with the LLM using the graph data output as a second input to the LLM.
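Claims 5 and 6 recite document weights realized as respective numbers of LLM tokens for the weighted summary. One way such a budget could be apportioned is proportional allocation; the sketch below is purely illustrative and not taken from the specification:

```python
def token_budgets(doc_weights, total_tokens):
    """Split a summary's total LLM token budget across documents in
    proportion to their per-user weights (illustrative sketch only)."""
    total_weight = sum(doc_weights.values())
    budgets = {doc: int(total_tokens * weight / total_weight)
               for doc, weight in doc_weights.items()}
    # Assign any rounding remainder to the highest-weighted document
    # so the budgets always sum to total_tokens.
    remainder = total_tokens - sum(budgets.values())
    budgets[max(doc_weights, key=doc_weights.get)] += remainder
    return budgets

# A document weighted 3x relative to another gets 3x the summary tokens.
budgets = token_budgets({"spec.docx": 3, "notes.txt": 1}, total_tokens=400)
# -> {"spec.docx": 300, "notes.txt": 100}
```

The per-document budgets could then cap how many tokens the LLM spends summarizing each document, so that more relevant documents receive longer summaries.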

Description

BACKGROUND

Enterprise organizations such as businesses with hundreds or thousands of employees may manage large amounts of data for entities associated with the organization, such as various users (e.g., employees), emails sent by the users, documents generated by the users, meetings attended by the users, etc. These entities may have relationships among themselves; for example, a first user (e.g., a first entity) may have an authorship relationship with a document (e.g., a second entity) that the first user generated. Further relationships may be created or modified when the document is shared with a second user of the organization, included in an email message, or referenced within a meeting invite. Knowledge of these relationships may be leveraged to recommend relevant entities to a user when performing some tasks, such as sending an email (e.g., recommendations for documents to be attached) or composing a meeting invite (e.g., recommendations for users to invite). Data for the entities and relationships may be stored as a data graph having nodes representing the entities and edges between nodes representing the relationships.

However, creating a suitable query that extracts relevant information from the data graph may be challenging or time consuming for some users. Moreover, the result of a query (e.g., several documents and emails) may be too cumbersome or complex for a user to review when limited time is available, such as during preparations for a meeting that is about to begin.

It is with respect to these and other general considerations that embodiments have been described. Also, although relatively specific problems have been discussed, it should be understood that the embodiments should not be limited to solving the specific problems identified in the background.

SUMMARY

Aspects of the present disclosure are directed to using a data graph as an input to a neural network model, such as a large language model (LLM).
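The entities and relationships described in the background (users, documents, emails, meetings; authorship, sharing, and so on) can be pictured as a small heterogeneous graph of typed nodes and relationship edges. The sketch below is illustrative only; the class, field, and relation names are not from the specification:

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Node:
    node_id: str
    node_type: str  # e.g. "user", "document", "email", "meeting"

@dataclass
class DataGraph:
    nodes: dict = field(default_factory=dict)  # node_id -> Node
    edges: list = field(default_factory=list)  # (src_id, relation, dst_id)

    def add_node(self, node):
        self.nodes[node.node_id] = node

    def add_edge(self, src_id, relation, dst_id):
        self.edges.append((src_id, relation, dst_id))

    def neighbors(self, node_id, relation=None):
        """Nodes reachable from node_id, optionally filtered by relation."""
        return [self.nodes[dst] for src, rel, dst in self.edges
                if src == node_id and relation in (None, rel)]

# A first user authors a document and shares it with a second user.
graph = DataGraph()
graph.add_node(Node("u1", "user"))
graph.add_node(Node("u2", "user"))
graph.add_node(Node("d1", "document"))
graph.add_edge("u1", "authored", "d1")
graph.add_edge("u1", "shared_with", "u2")
```

A graph data query against such a structure would traverse edges like `authored` or `shared_with` to return a sub-portion of the graph relevant to a given user.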
In accordance with at least one example of the present disclosure, a computer-implemented method of generating output data based on a data graph is provided. The method includes: receiving an output request for the output data based on the data graph, the output request comprising one of a natural language request from a target user or an application request from an application of the target user, the data graph having nodes and edges between the nodes, the nodes representing entities associated with an enterprise organization, and the edges representing relationships among the entities; generating a graph data query with a large language model (LLM) using the output request as a first input to the LLM; performing the graph data query against the data graph to obtain a graph data output that represents a sub-portion of the data graph; and generating the output data with the LLM using the graph data output as a second input to the LLM.

In accordance with at least one example of the present disclosure, a computer-implemented method for training a large language model (LLM) is provided. The method comprises: converting training graph data outputs from a data graph into a text data format that is readable by the LLM, the data graph having nodes and edges between the nodes, the nodes representing entities associated with an enterprise organization, and the edges representing relationships among the entities; generating a training set that comprises the converted training graph data outputs; training the LLM to receive graph data as an input prompt using the training set; and providing an extraction prompt to the LLM, the extraction prompt comprising syntax examples for the LLM to extract second graph data outputs from the data graph.

In accordance with at least one example of the present disclosure, a system for generating output data based on a data graph is provided.
The system includes at least one processor, and at least one memory storing computer-executable instructions that when executed by the at least one processor cause the at least one processor to: receive an output request for the output data based on the data graph, the output request comprising one of a natural language request from a target user or an application request from an application of the target user, the data graph having nodes and edges between the nodes according to a graph schema, the nodes representing entities associated with an enterprise organization, and the edges representing relationships among the entities; generate a graph data query with a large language model (LLM) using the output request as a first input to the LLM, the graph data query being based on the graph schema; perform the graph data query against the data graph to obtain a graph data output that represents a sub-portion of the data graph; and generate the output data with the LLM using the graph data output as a second input to the LLM.

This summary is provided to introduce a selection of concepts in a simplified form that are further described