EP-4736026-A1 - DATABASE WITH INTEGRATED GENERATIVE AI
Abstract
A system and method for managing, categorizing, and manipulating data within a server environment utilizing a user interface, a generative AI field in a database, and database datastores. The generative AI field accepts natural language requests from a user and determines appropriate prompts for a generative AI model. The generated prompt is configured to generate data providing the functionality specified in the natural language request. The functionality may include categorizing data, generating fields in a database, assisting prompt generation, and producing functions for further data manipulation.
Inventors
- KEENAN, Sean William
- HONG, Jerry
- LIU, Howard Thomas
- MAGGIO, Anthony H.
- CAI, Emily Yiming
Assignees
- Formagrid Inc
Dates
- Publication Date
- 20260506
- Application Date
- 20240627
Claims (20)
- 1. A computer-implemented method for generating data for a base, the computer-implemented method comprising: receiving, by a computing device, a request to generate a set of data providing a functionality in a base comprising structured data, the request comprising a natural language request for the functionality using at least a portion of the structured data of the base; determining, by the computing device, a prompt for a large language model to generate the set of data providing the functionality, wherein the prompt comprises a representation of: the natural language request for the functionality; the portion of the structured data used in the natural language request; a structure of the base; structural relationships in the base; and data types of the structured data in the base; in response to transmitting the prompt to the large language model, receiving, by the computing device, the set of data providing the functionality from the large language model; and transmitting the set of data for display as structured data in the base.
- 2. The computer-implemented method of claim 1, comprising determining, using the prompt at a network system hosting the large language model, the set of data providing the functionality, the large language model configured to: interpret the natural language request for the functionality using at least the structured data, identify contextual relationships between the functionality and one or more of the structure of the structured data, the structural relationships of data in the structured data, and data types of the structured data, and determine the set of data providing the functionality based on the contextual relationships.
- 3. The computer-implemented method of claim 1, further comprising: receiving, by the computing device, a selection of the large language model from a plurality of large language models, wherein the plurality of large language models is provided to a client device generating the prompt; and wherein generating the prompt for the large language model accounts for a configuration of the selected large language model.
- 4. The computer-implemented method of claim 1, wherein the natural language request comprises one or more data objects representing the portion of the structured data of the base.
- 5. The computer-implemented method of claim 1, comprising: receiving, at the computing device, an edit to the set of data displayed as structured data of the base; generating, at the computing device, a flag for the set of data as manipulated data based on the edit; responsive to receiving an additional request to modify the set of data, determining an additional prompt and modifying the set of data based on the generated flag.
- 6. The computer-implemented method of claim 1, wherein the structure of the structured data comprises one or more of: a plurality of elements in the base, each element comprising structured data; a set of rows in the base, the set of rows comprising one or more of the plurality of elements; a set of fields in the base, the set of fields comprising one or more of the plurality of elements; a label for each element of the plurality of elements, each row of the set of rows, and each field of the set of fields; and a size of the base.
- 7. The computer-implemented method of claim 1, wherein the base comprises a plurality of elements in a set of fields and a set of rows, and the structural relationships of data in the structured data comprise one or more of: a dependency of a first field in the set of fields on a second field in the set of fields; a dependency of a first row in the set of rows on a second row in the set of rows; a dependency of a first element of the plurality of elements on a second element of the plurality of elements; and one or more logical functions governing dependencies in the structural data.
- 8. The computer-implemented method of claim 1, wherein the prompt comprises a relationship between the base and one or more additional bases; wherein each of the one or more additional bases depends on structured data of the base.
- 9. The computer-implemented method of claim 8, wherein the set of data is propagated to the one or more additional bases that depend on the structured data of the base.
- 10. The computer-implemented method of claim 1, wherein the prompt comprises a representation of a relationship between the base and one or more additional bases; wherein the structured data of the base depend on structured data of the one or more additional bases.
- 11. The computer-implemented method of claim 10, wherein the set of data is propagated to the one or more additional bases that depend on the structured data of the base.
- 12. The computer-implemented method of claim 1, wherein: the base comprises a field, and the request to generate the set of data providing the functionality in the base is received as an input to the field; and the generated prompt is associated with the field.
- 13. The computer-implemented method of claim 12, wherein the received set of data is displayed in the field before being displayed as structured data in the base.
- 14. The computer-implemented method of claim 1, wherein the base comprises a data generation assistant function, and the request to generate the set of data providing the functionality in the base is received at the data generation assistant.
- 15. The computer-implemented method of claim 14, wherein the received set of data is displayed by the data generation assistant function before being displayed as structured data in the base.
- 16. The computer-implemented method of claim 1, wherein the functionality is categorizing data input into the base.
- 17. The computer-implemented method of claim 1, wherein the functionality is generating a function that manipulates a first portion of the structured data in the base based on a second portion of the structured data in the base.
- 18. The computer-implemented method of claim 1, wherein the functionality is translating a first portion of the structured data in the base.
- 19. A system comprising: one or more processors; and a non-transitory computer readable storage medium comprising computer program instructions for generating data for a base, the computer program instructions, when executed by the one or more processors, causing the one or more processors to: receive, at the system, a request to generate a set of data providing a functionality in a base comprising structured data, the request comprising a natural language request for the functionality using at least a portion of the structured data of the base; determine, by the system, a prompt for a large language model to generate the set of data providing the functionality, wherein the prompt comprises a representation of: the natural language request for the functionality; the portion of the structured data used in the natural language request; a structure of the base; structural relationships in the base; and data types of the structured data in the base; in response to transmitting the prompt to the large language model, receive, at the system, the set of data providing the functionality from the large language model; and transmit, to a client device, the set of data for display as structured data in the base.
- 20. A non-transitory computer readable storage medium comprising computer program instructions for generating data for a base, the computer program instructions, when executed by the one or more processors, causing the one or more processors to: receive, at a computing device, a request to generate a set of data providing a functionality in a base comprising structured data, the request comprising a natural language request for the functionality using at least a portion of the structured data of the base; determine, at the computing device, a prompt for a large language model to generate the set of data providing the functionality, wherein the prompt comprises a representation of: the natural language request for the functionality; the portion of the structured data used in the natural language request; a structure of the base; structural relationships in the base; and data types of the structured data in the base; in response to transmitting the prompt to the large language model, receive, at the computing device, the set of data providing the functionality from the large language model; and transmit, to a client device, the set of data for display as structured data in the base.
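The prompt recited in claims 1, 19, and 20 carries five items: the natural language request, the portion of structured data it uses, the structure of the base, its structural relationships, and its data types. The following is a minimal Python sketch of how such a prompt might be assembled. It is an illustration only, not the claimed implementation: `FieldSpec`, `build_prompt`, the JSON layout, and the instruction wording are all hypothetical, and the step that transmits the prompt to a large language model is omitted.

```python
import json
from dataclasses import dataclass, field


@dataclass
class FieldSpec:
    """One field (column) of a base. Hypothetical type for illustration."""
    name: str
    data_type: str  # e.g. "text", "number", "single_select"
    depends_on: list[str] = field(default_factory=list)  # fields this one derives from


def build_prompt(request: str, fields: list[FieldSpec],
                 rows: list[dict], used_fields: list[str]) -> str:
    """Assemble a prompt containing the five items listed in claim 1."""
    context = {
        # structure of the base: field names and size
        "structure": {"fields": [f.name for f in fields], "row_count": len(rows)},
        # structural relationships: which fields depend on which
        "relationships": {f.name: f.depends_on for f in fields if f.depends_on},
        # data types of the structured data
        "data_types": {f.name: f.data_type for f in fields},
        # only the portion of the data that the request refers to
        "data": [{name: row.get(name) for name in used_fields} for row in rows],
    }
    return (
        "You are generating data for a structured base.\n"
        f"Request: {request}\n"
        f"Context (JSON): {json.dumps(context, sort_keys=True)}\n"
        "Return one JSON value per row, in row order."
    )


# Assumed example base: a feedback table with a derived sentiment field.
fields = [
    FieldSpec("Feedback", "text"),
    FieldSpec("Sentiment", "single_select", depends_on=["Feedback"]),
]
rows = [{"Feedback": "Great onboarding"}, {"Feedback": "App crashes on login"}]
prompt = build_prompt("Categorize each Feedback entry by sentiment",
                      fields, rows, used_fields=["Feedback"])
```

Serializing the context as JSON is one design choice among many; it keeps the structure, relationships, and data types machine-readable so that the same representation could be reused when an additional prompt is determined later, as in claim 5.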
Description
DATABASE WITH INTEGRATED GENERATIVE AI
Inventors: Sean William Keenan, Jerry Hong, Howard Thomas Liu, Anthony H. Maggio, Emily Yiming Cai
CROSS-REFERENCE TO RELATED APPLICATIONS
[0001] This application claims the benefit of U.S. Provisional Application No. 63/523,588, filed June 27, 2023, which is hereby incorporated by reference in its entirety.
BACKGROUND
1. TECHNICAL FIELD
[0002] The subject matter described relates generally to databases and, in particular, to incorporating generative artificial intelligence (“AI”) into databases.
2. BACKGROUND INFORMATION
[0003] Generative AI is a powerful tool, but it can be difficult to integrate into complex, chained workflows. Existing solutions add generative AI in an open-ended fashion through existing formula functions. These solutions, while possible to integrate into existing workflows, do not do so in a structured fashion. Neither the data consumed nor the manner of integration into workflows to solve problems adopts a coherent structural approach. Consequently, it is difficult to configure the generative AI using such approaches, making them non-scalable.
SUMMARY
[0004] A database solution provides a generative AI field type. A generative AI field is able to integrate with a range of other primitives, such as automations, syncing, or interfaces. In various embodiments, a generative AI field can have a prompt building experience guided by many potential inputs, such as one or more of contextual information from the database, contextual information about the user, or explicit user selection from preconfigured options. The use of context-specific inputs can provide improved results from the generative AI compared with what is achieved with a user-provided prompt alone.
[0005] In some embodiments, the prompt building experience includes presenting the user with suggestions on how to build a better prompt by running a generative AI prompt against an initial prompt to offer specific suggestions with contextual information about the database or user. The user may also be able to automatically run multiple generated variations of suggested prompts and compare the results. Human-edited content may be captured and used to retrain the generative AI model or improve prompt engineering. For example, the prompt building experience may provide the user with previously human-edited examples that can be fed into the generative model directly. These human-edited examples can be selected using either a generative text model or an embedding model to select maximally different examples. Maximally categorically different examples typically result in the best generative text model output.
[0006] The capture of human-edited content may also allow back testing, in which (especially with large language models tuned to be fully deterministic) new prompts can be statically analyzed against a user’s data (including human-edited content) to ensure they perform better. Since data is generally “human scale,” adding a human in the loop enables cases where the generative AI generates new or substantively different results to be rated (e.g., by highlighting to the user the most different results).
[0007] In some embodiments, AI may also be used for database categorization using vector embeddings. This allows structured databases to have categorizations applied to them automatically. Adding new categories is cheap because, although computing the initial vector embedding is expensive, finding the nearest neighbor is cheap. By identifying clusters using K-means clustering or a similar approach, new cluster names can be proposed.
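Two of the embedding-based steps described above can be sketched in a few lines of Python: assigning a record to its nearest category, and choosing mutually dissimilar human-edited examples. This is a hedged illustration, not the described system. The two-dimensional vectors are toy stand-ins for the output of a real embedding model, the function names are hypothetical, and greedy farthest-point selection is a substituted heuristic, since the text does not specify how "maximally different" examples are chosen. K-means clustering itself is not shown.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def categorize(record_vec: list[float], category_vecs: dict[str, list[float]]) -> str:
    """Nearest-neighbor lookup: the category whose embedding is most similar."""
    return max(category_vecs, key=lambda name: cosine(record_vec, category_vecs[name]))


def pick_diverse(example_vecs: list[list[float]], k: int) -> list[int]:
    """Greedy farthest-point selection: indices of up to k dissimilar examples."""
    if not example_vecs or k <= 0:
        return []
    chosen = [0]  # seed with the first example (an arbitrary choice)
    while len(chosen) < min(k, len(example_vecs)):
        remaining = [i for i in range(len(example_vecs)) if i not in chosen]
        # take the example whose closest already-chosen neighbor is least similar
        nxt = min(
            remaining,
            key=lambda i: max(cosine(example_vecs[i], example_vecs[j]) for j in chosen),
        )
        chosen.append(nxt)
    return chosen


# Toy embeddings (assumed values, not produced by any real model).
categories = {"Billing": [1.0, 0.0], "Bug": [0.0, 1.0]}
label = categorize([0.2, 0.9], categories)  # "Bug"

examples = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
diverse = pick_diverse(examples, 2)  # [0, 2]
```

In this sketch, adding a category only requires embedding the new category name and adding it to `categories`; the stored record embeddings are reused, which is the cost asymmetry the paragraph describes.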
In some cases, recommended names may be generated for all clusters (by reversing the most centrally weighted point in each cluster and finding the most closely associated text). Cluster names can also be updated by suggesting new text for clusters that is more central to the existing cluster associated with that text.
[0008] In some aspects, the techniques described herein relate to a computer-implemented method for generating data for a base, the computer-implemented method including: receiving, by a computing device, a request to generate a set of data providing a functionality in a base including structured data, the request including a natural language request for the functionality using at least a portion of the structured data of the base; determining, by the computing device, a prompt for a large language model to generate the set of data providing the functionality, wherein the prompt includes a representation of: the natural language request for the functionality; the portion of the structured data used in the natural language request; a structure of the base; structural relationships in the base; and data types of the structured data in the base; in response to transmitting the prompt to the large language model, receiving, by the computing device, the set of data providing the functionality from the large language model; and transmitting the set of data for dis