
US-20260127203-A1 - SYSTEM AND METHODS FOR A NATURAL-LANGUAGE DATABASE INTERFACE PROVIDING A DETERMINISTIC OUTPUT


Abstract

A system and method are disclosed for interfacing with one or more databases using natural-language queries. The system translates a natural-language input into an intermediate formal representation, such as a Concept Query Language (CQL), which references domain concepts rather than database-specific structures. A data access subsystem maps these domain concepts to database-specific queries using a domain dictionary, enabling seamless translation across heterogeneous databases and database management systems (DBMSs). The system supports distributed data retrieval, error recovery, and dynamic query planning. A presentation subsystem formats the results into user-friendly outputs such as charts or tables. The architecture allows for modular grammar and dictionary configuration, enabling rapid adaptation to new domains, schemas, or user roles without procedural code changes. This approach improves accessibility, maintainability, and scalability of database interactions by abstracting technical complexity from end users.
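
The translation path summarized in the abstract can be illustrated with a minimal sketch. Everything named here is a hypothetical stand-in rather than the application's actual CQL syntax or dictionary format: the `ConceptQuery` structure, the `DOMAIN_DICTIONARY` layout, and the two example sources (`crm`, `erp`) with their table and column names are assumptions made for illustration. The sketch only shows that one concept-level query can yield different database-specific SQL depending on which dictionary entry is used.

```python
from dataclasses import dataclass, field

# Hypothetical stand-in for a concept-level (CQL-like) query: it names
# domain concepts only, never tables or columns.
@dataclass(frozen=True)
class ConceptQuery:
    select: tuple                                  # concepts to return
    filters: dict = field(default_factory=dict)    # concept -> required value

# Hypothetical domain dictionary: one entry per data source, mapping each
# domain concept to a column in that source's schema.
DOMAIN_DICTIONARY = {
    "crm": {
        "table": "accounts",
        "concepts": {"customer name": "acct_name", "region": "sales_region"},
    },
    "erp": {
        "table": "CUSTOMER_MASTER",
        "concepts": {"customer name": "CUST_NM", "region": "RGN_CD"},
    },
}

def to_sql(query, source):
    """Translate a concept-level query into parameterized SQL for one source."""
    entry = DOMAIN_DICTIONARY[source]
    concepts = entry["concepts"]
    columns = ", ".join(concepts[c] for c in query.select)
    sql = f"SELECT {columns} FROM {entry['table']}"
    params = []
    if query.filters:
        clauses = []
        for concept, value in query.filters.items():
            # Values are passed as parameters, never spliced into the SQL text.
            clauses.append(f"{concepts[concept]} = ?")
            params.append(value)
        sql += " WHERE " + " AND ".join(clauses)
    return sql, params

# The same concept-level query, translated for two different schemas.
query = ConceptQuery(select=("customer name",), filters={"region": "West"})
crm_sql, crm_params = to_sql(query, "crm")
erp_sql, erp_params = to_sql(query, "erp")
```

Adapting the sketch to a new schema means adding a dictionary entry; `to_sql` itself does not change, which mirrors the abstract's point about adaptation without procedural code changes.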

Inventors

  • Earl David Sacerdoti
  • Muhammad Qasim
  • Nabia Mansoor Feroze
  • Sarah Mohrle

Assignees

  • Quarrio Corporation

Dates

Publication Date: 2026-05-07
Application Date: 2025-11-05

Claims (20)

  1. A method of responding to a natural-language query regarding information derived from one or more databases comprising: determining, by a session initiation subsystem on a computer system, one or more domains related to a particular sphere of business, activity, or knowledge; loading, into a memory by a grammar loader on the computer system, one or more grammar files corresponding to a domain-specific grammar corresponding to the determined one or more domains; transforming, by the grammar loader on the computer system, the grammar files into a specification of a grammar; loading, by a domain dictionary loader into a memory of the computer system, one or more domain dictionary files corresponding to a domain-specific domain dictionary corresponding to the determined one or more domains; transforming, by the domain dictionary loader on the computer system, the domain dictionary files into a specification of a domain dictionary; receiving, by a parser on the computer system, the specification of the grammar; receiving, by a language processing subsystem on the computer system, a natural-language query from a user interface; parsing, by the parser of the computer system, the natural-language query; identifying, by the language processing subsystem of the computer system using the domain-specific grammar, references to one or more domain concepts or sub-concepts within said natural-language query; determining, by the language processing subsystem of the computer system, one or more relationships among the identified references to the one or more domain concepts or sub-concepts; transforming, by the computer system, the one or more domain concepts or sub-concepts to one or more specific database fields or portions thereof using the domain dictionary; generating, by the language processing subsystem of the computer system, a Concept Query Language (“CQL”) query comprising the one or more domain concepts, said query representing the natural-language query; generating, by a data access subsystem of the computer system, one or more database queries from the CQL query; executing, by the data access subsystem of computer system, the one or more database queries against one or more data sources to retrieve data; generating, by the data access subsystem of computer system, a processed dataset based on the retrieved data; generating, by the computer system, one or more operator-viewable presentations based on the processed dataset; and presenting, by the computer system, the one or more operator-viewable presentations to the user interface.
  2. The method of claim 1 comprising: generating one or more additional database queries based on the retrieved data.
  3. The method of claim 2, wherein the one or more database queries are directed to multiple databases.
  4. The method of claim 3, wherein the multiple databases are processed by a plurality of DBMSs.
  5. The method of claim 1 further wherein the one or more domains comprises more than one domain.
  6. The method of claim 1 further wherein the one or more identified domain concepts refer to one or more other domain concepts as parameters.
  7. The method of claim 1 further comprising: generating, by the computer system, one or more clarification questions; presenting, by the computer system, the one or more clarification questions to the user interface; and receiving, by the computer system, one or more responses to the one or more clarification questions.
  8. The method of claim 1 further comprising: receiving, by the language processing subsystem on the computer system, a second natural-language query from the user interface; identifying, by the parser on the computer system, one or more ellipses corresponding to one or more missing words in the second natural-language query; replacing, by the parser on the computer system, the one or more ellipses based on context established by one or more previous natural-language queries.
  9. The method of claim 1 further comprising: generating, by the language processing subsystem of the computer system, a paraphrase of the natural-language query; and displaying, by the computer system, the paraphrase to the user interface.
  10. The method of claim 9, wherein the paraphrase comprises a restatement of the CQL query in response to submission of the natural-language query.
  11. A method comprising: determining, by a computer system, a domain related to a particular sphere of business, activity, or knowledge; generating, by the computer system, a specification of a domain-specific grammar from one or more grammar files or sub-grammar files; generating, by the computer system, a specification of a domain dictionary from one or more domain dictionary files; receiving, by a language processing subsystem on the computer system, a natural-language query from a user interface; parsing, by a parser of the computer system, the natural-language query; identifying, by the language processing subsystem using the domain-specific grammar, one or more domain concepts and sub-concepts related to the natural-language query; determining, by the language processing subsystem, one or more relationships between the domain concepts and sub-concepts; generating, by the language processing subsystem, a structure-independent query representing a meaning of the natural-language query; transforming, with the computer system, the identified domain concepts into one or more specific database fields or portions thereof using one or more domain dictionaries; generating, with the computer system, a sequence of one or more database queries referencing the one or more specific database fields; querying, with the computer system using the one or more database queries, a database management system; receiving, by the computer system, a raw dataset in response to querying the database; generating, with the computer system, a processed dataset based on the dataset returned by the database management system responsive to the one or more database queries; and presenting, with the computer system, a presentation based on the processed dataset to the user interface.
  12. The method of claim 11 further comprising: extracting schema information responsive to the domain dictionary; and generating a field translation structure, wherein the field translation structure maps domain concepts to respective database schema metadata.
  13. The method of claim 11, wherein the one or more database queries comprise multiple database queries.
  14. The method of claim 13, wherein the one or more database queries are directed to a plurality of databases.
  15. The method of claim 11 further comprising: determining, by the computer system, a preferred visualization type for the processed dataset.
  16. The method of claim 15, wherein determining the preferred visualization type further comprises: analyzing characteristics of the processed dataset, including data distribution, cardinality, and relationships between concepts; selecting, by the computer system, at least one visualization type from a group comprising pie chart, line graph, bar graph, or scatterplot based on the analysis; and automatically configuring visualization parameters, including axis labels, color schemes, and legend placement, such that the resulting visualization represents underlying data trends and relationships for presentation to the user interface.
  17. The method of claim 11 further comprising: receiving, by the computer system, one or more clarification questions; generating, by the computer system, instructions for displaying the one or more clarification questions; and receiving, by the computer system, one or more responses to the one or more clarification questions.
  18. The method of claim 11 further comprising: receiving, by the computer system, a second natural-language query from the user interface; identifying, by the computer system, one or more ellipses corresponding to one or more missing words in the second natural-language query; replacing, by the computer system, the one or more ellipses based on context established by one or more previous natural-language queries.
  19. The method of claim 11 further comprising: generating, by the computer system, a paraphrase of the natural-language query; and displaying, by the computer system, the paraphrase to the user interface.
  20. A tangible non-transitory storage medium containing instructions for execution by, or to control operation of, data processing apparatus, the instructions comprising steps comprising: determining a domain related to a particular sphere of business, activity, or knowledge to be processed by the data processing apparatus; generating a specification of a grammar from one or more grammar files or sub-grammar files; generating a specification of a domain dictionary from one or more domain dictionary files; receiving a natural-language query from a user interface; parsing the natural-language query; identifying one or more domain concepts and sub-concepts related to the natural-language query; determining one or more relationships between the domain concepts and sub-concepts; generating a structure-independent query representing a meaning of the natural-language query; mapping the identified domain concepts to all or portions of one or more specific database fields using the domain dictionary; generating a sequence of one or more database queries using the one or more mapped specific database fields; querying, with the computer system using the one or more database queries, a database management system; receiving, with the computer system, a raw dataset in response to querying the database; generating a processed dataset based on the data received responsive to the one or more database queries; and presenting the processed dataset to the user interface.
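
The visualization selection recited in claim 16 can be sketched as a small rule-based chooser. This is a minimal sketch under stated assumptions, not the method the application actually uses: the function name `choose_visualization`, the row format (a list of dicts), and the threshold of at most six pie slices are all hypothetical. It inspects the column types and cardinality of a processed dataset and returns one of the four chart types named in the claim.

```python
from datetime import date

def _is_number(value):
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, (int, float)) and not isinstance(value, bool)

def choose_visualization(rows, x, y, max_pie_slices=6):
    """Pick a chart type for plotting column `y` against column `x`.

    Hypothetical heuristics (assumptions, not taken from the application):
      - temporal x                               -> line graph
      - numeric x and numeric y                  -> scatterplot
      - few distinct categories, non-negative y  -> pie chart
      - anything else                            -> bar graph
    """
    if not rows:
        raise ValueError("dataset is empty")
    xs = [row[x] for row in rows]
    ys = [row[y] for row in rows]
    if not all(_is_number(v) for v in ys):
        raise ValueError("y column must be numeric")
    if all(isinstance(v, date) for v in xs):
        return "line graph"
    if all(_is_number(v) for v in xs):
        return "scatterplot"
    distinct = len(set(xs))
    parts_of_whole = all(v >= 0 for v in ys) and sum(ys) > 0
    if distinct == len(xs) and distinct <= max_pie_slices and parts_of_whole:
        return "pie chart"
    return "bar graph"

# Illustrative datasets (invented values, for demonstration only).
by_region = [{"region": "West", "revenue": 120.0},
             {"region": "East", "revenue": 80.0}]
by_month = [{"month": date(2025, 1, 1), "revenue": 10},
            {"month": date(2025, 2, 1), "revenue": 12}]
price_vs_units = [{"price": 1.0, "units": 5}, {"price": 2.5, "units": 3}]
by_product = [{"product": f"P{i}", "revenue": i} for i in range(1, 11)]

region_chart = choose_visualization(by_region, "region", "revenue")
month_chart = choose_visualization(by_month, "month", "revenue")
scatter_chart = choose_visualization(price_vs_units, "price", "units")
product_chart = choose_visualization(by_product, "product", "revenue")
```

Configuring axis labels, color schemes, and legend placement, which the claim also recites, is left out of the sketch.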

Description

CROSS-REFERENCE TO RELATED APPLICATIONS

This application claims the benefit of priority under 35 U.S.C. § 119(e) to U.S. Provisional Application No. 63/717,003, filed on Nov. 6, 2024, the entire disclosure of which is hereby incorporated by reference.

FIELD OF THE INVENTION

This Application generally describes techniques relating to an improved database interface.

BACKGROUND

In modern enterprises, information is often siloed across different departments and systems, making it difficult to access and utilize data effectively. Traditional methods of data analysis and information retrieval, such as spreadsheets and Business Intelligence (BI) software, require specialized skills and are often bottlenecks in the decision-making process.

While relatively large amounts of data can be maintained in a computer system having one or more data structures, such as relational database structures or other structures such as spreadsheets, retrieving useful information from database structures can sometimes be difficult.

A first method is to query a database structure using a database management system, using an applicable query language such as SQL (“structured query language”). While this first method can retrieve information desired by the operator (or other controlling element, as described herein), it has the drawback that to use it, operators (or developers of a database interface) should be familiar with the database schema, with how to code in SQL or another applicable query language, and with how to interface with the database management system.

A second method is to attempt to translate requests made in a different format (such as natural language requests) directly into SQL or another applicable database query language. While this second method can also retrieve information desired by the operator, it is subject to drawbacks.
First, the translation device should be familiar with the structure of the database (including information about the types of each field or other element of the database and how those elements are composed into aggregated structures such as tables, and the logical relationships among the elements, such as might be expressed in the database schema), with the domain concepts that operators might express, with how those domain concepts map to and from the values of database elements, and with how to interface with the database management system. Domain concepts can be affiliated with a specific domain, e.g., a specified sphere of business, activity, or knowledge, such as cellular phones, semiconductor fabrication, or sales of specific products or services.

Second, when it is desired to use this second method with a different database schema, or with different domain concepts that operators might express, with different ways those domain concepts map to values of database elements, or with a different database management system (such as one that uses a different query language), it might be necessary or desirable to redesign or reimplement significant portions of the translation algorithm. For example, it might be necessary or desirable to rewrite large portions of the translation device to provide translation of natural language queries into a different database query language or into queries for a database having a different schema.

For one example, the operator might request information from the database for a particular “calendar quarter”. In such cases, the translation device should be familiar with how the database maintains information about events and about their times and durations. In one such case, if each event is associated with a field that contains a particular date formatted as a string (such as “Mar. 3, 1953”), the translation device should know how to extract the month from that date string, and should know how to associate selected months with the calendar quarter to which they belong. In another such case, if each event is associated with a field that contains a particular date-time group formatted as a string (such as “19530303021506”), the translation device should know how to compare that timestamp with the earliest and latest times of each quarter, to determine to which calendar quarter that event belongs.

For another example, the operator might request information to be determined in response to the values maintained in one or more database fields. In one such case, the operator might request “contact information” (which the operator might intend to be an aggregation of multiple fields: phone number, email address, or otherwise). In another such case, the operator might request objects “within 50 miles” of a selected object (which might require determining the position of the selected object, determining the position of other objects, and calculation of distances between objects, each possibly in response to the values maintained in or derived from one or more database fields).

Each of these issues, as well as other possible considerations, might cause difficulty