
US-20260127164-A1 - USER QUERY TO DATA QUERY TRANSFORMATION WITH KNOWLEDGE GRAPH RETRIEVAL-AUGMENTED GENERATION


Abstract

Methods, systems, and computer-readable storage media for receiving a user query provided in natural language and requesting data stored in a resource according to a data schema, determining a set of data schema and context data represented in a knowledge graph, the set of data schema and context data determined to be relevant to the user query from a super-set of data schema and context data stored in a graph database, generating a prompt using the user query and the set of data schema and context data, receiving a data query from an LLM that is responsive to the prompt, the data query being in a structured format, processing the data query to provide a query result, and displaying the query result to a user.
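The sequence of steps in the abstract can be sketched as a small orchestration function. This is a minimal, hypothetical illustration, not the disclosed implementation: `retrieve`, `llm`, and `execute` are injected callables standing in for the graph-database lookup, the LLM call, and the query against the resource, and the JSON shape (`url`, `filters`, `field`, `op`, `value`) is an assumption loosely based on the filter conditions, URL, and JSON format named in the claims.

```python
import json
from typing import Callable

Triple = tuple[str, str, str]

def build_prompt(user_query: str, context: list[Triple]) -> str:
    """Combine the user query with the retrieved schema and context triples."""
    facts = "\n".join(f"- {s} {p} {o}" for s, p, o in context)
    return (
        "Translate the question into a data query.\n"
        f"Schema and context:\n{facts}\n"
        'Reply only with JSON: {"url": ..., "filters": [...]}\n'
        f"Question: {user_query}"
    )

def answer(
    user_query: str,
    retrieve: Callable[[str], list[Triple]],
    llm: Callable[[str], str],
    execute: Callable[[dict], list[dict]],
) -> str:
    context = retrieve(user_query)              # relevant subset of the knowledge graph
    prompt = build_prompt(user_query, context)  # user query + schema/context data
    data_query = json.loads(llm(prompt))        # structured (JSON) data query from the LLM
    rows = execute(data_query)                  # process the data query against the resource
    return "\n".join(str(row) for row in rows)  # stand-in for displaying the result

# Hypothetical stubs so the sketch runs without a graph database or an LLM.
def fake_retrieve(query: str) -> list[Triple]:
    return [("SalesOrder", "hasProperty", "NetAmount")]

def fake_llm(prompt: str) -> str:
    # A real deployment would send `prompt` to an LLM; this returns a canned reply.
    return '{"url": "/SalesOrder", "filters": [{"field": "NetAmount", "op": "gt", "value": 1000}]}'

def fake_execute(data_query: dict) -> list[dict]:
    rows = [{"id": 1, "NetAmount": 500}, {"id": 2, "NetAmount": 1500}]
    # This stub only understands the "gt" operator.
    for cond in data_query["filters"]:
        rows = [r for r in rows if cond["op"] == "gt" and r[cond["field"]] > cond["value"]]
    return rows

print(answer("Which sales orders exceed 1000?", fake_retrieve, fake_llm, fake_execute))
```

Keeping the three dependencies injectable means the retrieval strategy (a graph query, a vector search, or both) could change without touching the orchestration; that separation is a choice of this sketch rather than something the source prescribes.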

Inventors

  • Yonggang Xie
  • Tao Bai

Assignees

  • SAP SE

Dates

Publication Date
2026-05-07
Application Date
2024-11-04

Claims (20)

  1. A computer-implemented method for querying resources using structured queries generated by large language models (LLMs), the method being executed by one or more processors and comprising: receiving a user query provided in natural language and requesting data stored in a resource according to a data schema; determining a set of data schema and context data represented in a knowledge graph stored in a graph database, the knowledge graph comprising multi-relational graphs using triples, each triple having a vector generated therefor, the set of data schema and context data being determined, by querying the graph database, to be relevant to the user query from a super-set of data schema and context data stored in the knowledge graph within the graph database; generating a prompt using the user query and the set of data schema and context data; receiving a data query from an LLM that is responsive to the prompt, the data query being in a structured format; processing the data query to provide a query result; and displaying the query result to a user.
  2. The method of claim 1, wherein generating a prompt using the user query and the set of data schema and context data comprises: providing a query vector based on the user query; identifying a sub-set of vectors from a set of vectors by comparing vectors in a set of vectors to the query vector; and retrieving the set of data schema and context data from a database, the set of data schema and context data being associated with vectors in the sub-set of vectors.
  3. The method of claim 1, wherein the set of data schema and context data is at least partially provided from the knowledge graph that represents the data schema and descriptions of entities.
  4. The method of claim 3, wherein the knowledge graph is at least partially built by extracting description information from an enterprise data graph and parsing metadata represented in a set of data schema files.
  5. The method of claim 1, wherein the set of data schema and context data comprises hierarchical relationships between data objects that are to be queried and contextual information descriptive of properties of the data objects.
  6. The method of claim 1, wherein the data query comprises a set of filter conditions and a uniform resource locator (URL) for the resource.
  7. The method of claim 1, wherein the structured format comprises JavaScript Object Notation (JSON).
  8. A non-transitory computer-readable storage medium coupled to one or more processors and having instructions stored thereon which, when executed by the one or more processors, cause the one or more processors to perform operations for querying resources using structured queries generated by large language models (LLMs), the operations comprising: receiving a user query provided in natural language and requesting data stored in a resource according to a data schema; determining a set of data schema and context data represented in a knowledge graph stored in a graph database, the knowledge graph comprising multi-relational graphs using triples, each triple having a vector generated therefor, the set of data schema and context data being determined, by querying the graph database, to be relevant to the user query from a super-set of data schema and context data stored in the knowledge graph within the graph database; generating a prompt using the user query and the set of data schema and context data; receiving a data query from an LLM that is responsive to the prompt, the data query being in a structured format; processing the data query to provide a query result; and displaying the query result to a user.
  9. The non-transitory computer-readable storage medium of claim 8, wherein generating a prompt using the user query and the set of data schema and context data comprises: providing a query vector based on the user query; identifying a sub-set of vectors from a set of vectors by comparing vectors in a set of vectors to the query vector; and retrieving the set of data schema and context data from a database, the set of data schema and context data being associated with vectors in the sub-set of vectors.
  10. The non-transitory computer-readable storage medium of claim 8, wherein the set of data schema and context data is at least partially provided from the knowledge graph that represents the data schema and descriptions of entities.
  11. The non-transitory computer-readable storage medium of claim 10, wherein the knowledge graph is at least partially built by extracting description information from an enterprise data graph and parsing metadata represented in a set of data schema files.
  12. The non-transitory computer-readable storage medium of claim 8, wherein the set of data schema and context data comprises hierarchical relationships between data objects that are to be queried and contextual information descriptive of properties of the data objects.
  13. The non-transitory computer-readable storage medium of claim 8, wherein the data query comprises a set of filter conditions and a uniform resource locator (URL) for the resource.
  14. The non-transitory computer-readable storage medium of claim 8, wherein the structured format comprises JavaScript Object Notation (JSON).
  15. A system, comprising: a computing device; and a computer-readable storage device coupled to the computing device and having instructions stored thereon which, when executed by the computing device, cause the computing device to perform operations for querying resources using structured queries generated by large language models (LLMs), the operations comprising: receiving a user query provided in natural language and requesting data stored in a resource according to a data schema; determining a set of data schema and context data represented in a knowledge graph stored in a graph database, the knowledge graph comprising multi-relational graphs using triples, each triple having a vector generated therefor, the set of data schema and context data being determined, by querying the graph database, to be relevant to the user query from a super-set of data schema and context data stored in the knowledge graph within the graph database; generating a prompt using the user query and the set of data schema and context data; receiving a data query from an LLM that is responsive to the prompt, the data query being in a structured format; processing the data query to provide a query result; and displaying the query result to a user.
  16. The system of claim 15, wherein generating a prompt using the user query and the set of data schema and context data comprises: providing a query vector based on the user query; identifying a sub-set of vectors from a set of vectors by comparing vectors in a set of vectors to the query vector; and retrieving the set of data schema and context data from a database, the set of data schema and context data being associated with vectors in the sub-set of vectors.
  17. The system of claim 15, wherein the set of data schema and context data is at least partially provided from the knowledge graph that represents the data schema and descriptions of entities.
  18. The system of claim 17, wherein the knowledge graph is at least partially built by extracting description information from an enterprise data graph and parsing metadata represented in a set of data schema files.
  19. The system of claim 15, wherein the set of data schema and context data comprises hierarchical relationships between data objects that are to be queried and contextual information descriptive of properties of the data objects.
  20. The system of claim 15, wherein the data query comprises a set of filter conditions and a uniform resource locator (URL) for the resource.
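The vector-based retrieval recited in claims 2, 9, and 16 (a query vector compared against a set of vectors, with the schema and context data associated with the closest vectors retrieved) can be sketched as a top-k similarity search over triples. This is a minimal sketch under stated assumptions: `embed` is a hypothetical toy stand-in for a real embedding model (a hashed bag of words), cosine similarity is assumed as the comparison metric because the claims only say "comparing", and an in-memory list plays the role of the database.

```python
import math
import re
import zlib

DIM = 256  # hypothetical embedding width for the toy embedder below

def embed(text: str) -> list[float]:
    """Toy stand-in for an embedding model: hashed bag of words, L2-normalized."""
    vec = [0.0] * DIM
    for token in re.findall(r"[a-z0-9]+", text.lower()):
        vec[zlib.crc32(token.encode()) % DIM] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def cosine(a: list[float], b: list[float]) -> float:
    # Both inputs are unit-length, so the dot product equals cosine similarity.
    return sum(x * y for x, y in zip(a, b))

def top_k_triples(triples, query: str, k: int = 2):
    # One vector per triple; the list of (triple, vector) pairs stands in for the database.
    index = [(t, embed(" ".join(t))) for t in triples]
    query_vec = embed(query)
    ranked = sorted(index, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [t for t, _ in ranked[:k]]

triples = [
    ("supplier", "has property", "supplier country"),
    ("sales order", "has property", "net amount"),
    ("employee", "has property", "hire date"),
]
print(top_k_triples(triples, "net amount of each sales order", k=1))
```

In practice the embedder would be a learned model and the index a vector store; the hashed embedder only keeps the example self-contained and deterministic.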

Description

BACKGROUND

Entities, such as commercial enterprises, use software systems to conduct operations. Example software systems can include, without limitation, enterprise resource planning (ERP) systems, customer relationship management (CRM) systems, human capital management (HCM) systems, and the like. Enterprises continuously seek to improve and gain efficiencies in their operations. To this end, enterprises integrate systems in the domain of the so-called intelligent enterprise, which can employ artificial intelligence (AI), including, for example, machine learning (ML) models. For example, AI can be used for data analytics and/or for automating tasks in support of enterprise operations. AI, however, presents technical hurdles and risks that need to be mitigated when used by enterprises.

SUMMARY

Implementations of the present disclosure are directed to a query processing system that leverages a large language model (LLM) to convert user queries to data queries. More particularly, implementations of the present disclosure are directed to a query processing system that includes a knowledge graph for retrieval-augmented generation (RAG) when leveraging an LLM to convert user queries to data queries.

In some implementations, actions include receiving a user query provided in natural language and requesting data stored in a resource according to a data schema, determining a set of data schema and context data represented in a knowledge graph, the set of data schema and context data determined to be relevant to the user query from a super-set of data schema and context data stored in a graph database, generating a prompt using the user query and the set of data schema and context data, receiving a data query from an LLM that is responsive to the prompt, the data query being in a structured format, processing the data query to provide a query result, and displaying the query result to a user.
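To make the "super-set of data schema and context data stored in a graph database" and the relevant subset drawn from it more concrete, the following minimal sketch flattens schema metadata into triples and then selects a subset with a simple graph query. Everything here is hypothetical: the schema dictionary, the predicate names (`hasDescription`, `hasProperty`, `hasChild`), and the relevance rule (entities named in the query plus their one-hop neighbors) are assumptions for illustration, and an in-memory list stands in for the graph database.

```python
from collections import defaultdict

Triple = tuple[str, str, str]

def schema_to_triples(schema: dict) -> list[Triple]:
    """Flatten hypothetical schema metadata into (subject, predicate, object) triples."""
    triples: list[Triple] = []
    for entity, meta in schema.items():
        triples.append((entity, "hasDescription", meta["description"]))
        for prop, desc in meta.get("properties", {}).items():
            triples.append((entity, "hasProperty", f"{entity}.{prop}"))
            triples.append((f"{entity}.{prop}", "hasDescription", desc))
        for child in meta.get("children", []):
            triples.append((entity, "hasChild", child))  # hierarchical relationship
    return triples

def relevant_subset(triples: list[Triple], user_query: str) -> list[Triple]:
    """Keep triples about entities named in the query, expanded by one hop."""
    words = {w.strip("?.,!").lower() for w in user_query.split()}
    seeds = {s for s, _, _ in triples if s.lower() in words}
    neighbors: dict[str, set[str]] = defaultdict(set)
    for s, _, o in triples:
        neighbors[s].add(o)
    keep = set(seeds)
    for s in seeds:
        keep |= neighbors[s]  # pull in properties and child entities
    return [t for t in triples if t[0] in keep]

schema = {
    "SalesOrder": {
        "description": "A customer order for goods",
        "properties": {"NetAmount": "Order net value", "CreatedAt": "Creation date"},
        "children": ["SalesOrderItem"],
    },
    "SalesOrderItem": {
        "description": "A line item of a sales order",
        "properties": {"Material": "Ordered material"},
    },
    "Supplier": {"description": "A vendor of goods"},
}

graph = schema_to_triples(schema)
for triple in relevant_subset(graph, "List SalesOrder records above 1000"):
    print(triple)
```

The name-matching seed step is deliberately crude; a fuller system might seed the graph query from vector similarity instead, in the spirit of the query-vector comparison described in the dependent claims.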
Other implementations of this aspect include corresponding systems, apparatus, and computer programs, configured to perform the actions of the methods, encoded on computer storage devices. These and other implementations can each optionally include one or more of the following features: generating a prompt using the user query and the set of data schema and context data includes providing a query vector based on the user query, identifying a sub-set of vectors from a set of vectors by comparing vectors in a set of vectors to the query vector, and retrieving the set of data schema and context data from a database, the set of data schema and context data being associated with vectors in the sub-set of vectors; the set of data schema and context data is at least partially provided from a knowledge graph that represents the data schema and descriptions of entities; the knowledge graph is at least partially built by extracting description information from an enterprise data graph and parsing metadata represented in a set of data schema files; the set of data schema and context data includes hierarchical relationships between data objects that are to be queried and contextual information descriptive of properties of the data objects; the data query comprises a set of filter conditions and a uniform resource locator (URL) for the resource; and the structured format includes JavaScript Object Notation (JSON).

The present disclosure also provides a computer-readable storage medium coupled to one or more processors and having instructions stored thereon which, when executed by the one or more processors, cause the one or more processors to perform operations in accordance with implementations of the methods provided herein. The present disclosure further provides a system for implementing the methods provided herein.
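Processing a data query that consists of a set of filter conditions and a URL, delivered as JSON, might look like the following minimal sketch. The JSON keys (`url`, `filters`, `field`, `op`, `value`), the operator whitelist, and the OData-style `$filter` encoding are all assumptions for illustration; the source does not specify the query syntax. Validating the LLM's output before issuing any request is likewise a design choice of this sketch.

```python
import json
from urllib.parse import urlencode

# Hypothetical whitelist; rejecting anything else guards against malformed LLM output.
ALLOWED_OPS = {"eq", "ne", "gt", "ge", "lt", "le"}

def to_request_url(data_query_json: str) -> str:
    """Turn a JSON data query (URL plus filter conditions) into a request URL."""
    query = json.loads(data_query_json)
    clauses = []
    for cond in query.get("filters", []):
        if cond["op"] not in ALLOWED_OPS:
            raise ValueError(f"unsupported operator: {cond['op']}")
        value = cond["value"]
        if isinstance(value, str):
            # Quote string literals, doubling any embedded single quotes.
            literal = "'" + value.replace("'", "''") + "'"
        else:
            literal = str(value)
        clauses.append(f"{cond['field']} {cond['op']} {literal}")
    if not clauses:
        return query["url"]
    return query["url"] + "?" + urlencode({"$filter": " and ".join(clauses)})

llm_output = json.dumps({
    "url": "https://example.com/odata/SalesOrder",
    "filters": [
        {"field": "NetAmount", "op": "gt", "value": 1000},
        {"field": "Status", "op": "eq", "value": "OPEN"},
    ],
})
print(to_request_url(llm_output))
```

`urlencode` percent-encodes the `$` and quote characters and turns spaces into `+`, so the filter expression survives transport as a single query parameter.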
The system includes one or more processors, and a computer-readable storage medium coupled to the one or more processors having instructions stored thereon which, when executed by the one or more processors, cause the one or more processors to perform operations in accordance with implementations of the methods provided herein.

It is appreciated that methods in accordance with the present disclosure can include any combination of the aspects and features described herein. That is, methods in accordance with the present disclosure are not limited to the combinations of aspects and features specifically described herein, but also include any combination of the aspects and features provided.

The details of one or more implementations of the present disclosure are set forth in the accompanying drawings and the description below. Other features and advantages of the present disclosure will be apparent from the description and drawings, and from the claims.

DESCRIPTION OF DRAWINGS

FIG. 1 depicts an example architecture that can be used to execute implementations of the present disclosure.

FIG. 2 depicts an example conceptual architecture in accordance with implementations of the present disclosure.

FIG. 3 depicts an example workflow