EP-4736023-A1 - ENABLING AN EFFICIENT UNDERSTANDING OF CONTENTS OF A LARGE DOCUMENT WITHOUT STRUCTURING OR CONSUMING THE LARGE DOCUMENT

Abstract

The system obtains a record in a database and a property associated with the record in the database, where the record includes a large document, and where the large document is unstructured or semi-structured. The system receives an input indicating a type of analysis to perform associated with the record and performs, using an artificial intelligence, the analysis associated with the record to obtain an output. The type of analysis to obtain the output includes generating a document describing contents of the record, where the document describing the contents of the record is smaller than the record. The system stores the output as the property in the database and enables access to the database based on the property, thereby enabling an efficient understanding of contents of the document without consuming the document.
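
By way of a non-limiting illustration, the flow described in the abstract can be sketched in Python roughly as follows. The names Record, Analyzer, analyze_and_store, and naive_analyzer are hypothetical and do not appear in the application. In particular, naive_analyzer is a trivial placeholder that merely keeps the first sentence; it stands in for the artificial intelligence and is not the analysis the application describes.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict

# Hypothetical model interface: (analysis type, document text) -> output text.
Analyzer = Callable[[str, str], str]


@dataclass
class Record:
    """A database record: one large unstructured document plus its properties."""
    document: str
    properties: Dict[str, str] = field(default_factory=dict)


def naive_analyzer(analysis_type: str, text: str) -> str:
    """Placeholder for the AI model (assumption): keeps only the first sentence."""
    if analysis_type != "summary":
        raise ValueError(f"unsupported analysis type: {analysis_type}")
    first = text.split(". ")[0].strip()
    return first if first.endswith(".") else first + "."


def analyze_and_store(record: Record, analysis_type: str, analyzer: Analyzer) -> str:
    """Perform the requested analysis and store the output as a property."""
    output = analyzer(analysis_type, record.document)
    record.properties[analysis_type] = output
    return output


record = Record(
    document="The launch slipped two weeks. QA found a regression. Marketing was notified."
)
analyze_and_store(record, "summary", naive_analyzer)
print(record.properties["summary"])
```

In this sketch the stored property is shorter than the document it describes, so a reader or a later query can work from the property alone. Keying the property by analysis type is a simplification chosen for the example.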

Inventors

  • LEE, LINUS
  • LU, HE
  • LI, LU

Assignees

  • Notion Labs, Inc.

Dates

Publication Date: 2026-05-06
Application Date: 2024-06-27

Claims (20)

  1. A non-transitory, computer-readable storage medium comprising instructions recorded thereon, wherein the instructions when executed by at least one data processor of a system cause the system to: obtain a record in a database and a property associated with the record in the database, wherein the record includes a large document comprising text, and wherein the large document is unstructured or semi-structured; receive an input indicating a type of analysis to perform associated with the record; perform, using a large language model, the analysis associated with the record to obtain an output, wherein the type of analysis to obtain the output belongs to multiple categories including: summarizing the record to obtain the output including a summary; extracting indicated information from the record to obtain the output including the indicated information; generating a second document based on the large document to obtain the output including the second document, wherein the second document includes contents not included in the large document, and wherein the second document is smaller than the large document; and store the output as the property in the database; and enable access of the database based on the property, thereby enabling an efficient understanding of contents of the large document without structuring the large document or consuming the large document.
  2. The non-transitory, computer-readable storage medium of claim 1, comprising instructions to: obtain the property, wherein the property is associated with multiple categories, and wherein the property indicates that the output of the analysis falls into one of the multiple categories; receive the input indicating the type of analysis to perform; obtain, from the database, multiple records associated with multiple properties, wherein each property among the multiple properties is generated using the type of analysis; based on the multiple records and the multiple properties, determine a pattern of correspondence between the multiple properties and the multiple records; and based on the pattern of correspondence, determine the output, wherein the output falls into one of the multiple categories.
  3. The non-transitory, computer-readable storage medium of claim 1, comprising instructions to: obtain multiple records including the record in the database and multiple properties including the property associated with the multiple records in the database, wherein each property among the multiple properties is generated using the type of analysis; receive a search query; search the multiple properties based on the search query to produce a list of relevant results; and provide the list of relevant results.
  4. The non-transitory, computer-readable storage medium of claim 1, wherein instructions to extract the indicated information from the record to obtain the output comprise instructions to: provide a predetermined category of the type of analysis to perform associated with the record, wherein the predetermined category includes the extracting of indicated information; provide an entry for a natural language input modifying the predetermined category, wherein the natural language input requests information to extract from the record; receive a first input indicating to perform the predetermined category of the analysis and a second input including the natural language input modifying the predetermined category; based on the natural language input and the predetermined category, perform the predetermined category of the analysis; and upon performing the predetermined category of the analysis, produce the output indicating the requested information.
  5. The non-transitory, computer-readable storage medium of claim 1, comprising instructions to: provide a predetermined category of the type of analysis to perform associated with the record; receive the input indicating to perform the predetermined category of the type of analysis; and perform the predetermined category of the type of analysis.
  6. The non-transitory, computer-readable storage medium of claim 1, comprising instructions to: determine whether the record in the database is modified after the analysis is performed; upon determining that the record in the database is modified after the analysis is performed, automatically repeat a performance of the analysis to obtain a second output; and store the second output as the property in the database.
  7. The non-transitory, computer-readable storage medium of claim 1, comprising instructions to: receive an indication to perform the type of analysis on multiple records in the database; obtain from the database the multiple records on which to perform the type of analysis; select a subset of the multiple records, wherein the subset of the multiple records is a representative sample of the multiple records; perform the type of analysis on each record among the subset of the multiple records to obtain multiple outputs, wherein performing the type of analysis in each record among the subset of the multiple records requires less time than performing the type of analysis on each record among the multiple records; and provide the multiple outputs.
  8. A method comprising: obtaining a record in a database and a property associated with the record in the database, wherein the record includes a large document, and wherein the large document is unstructured or semi-structured; receiving an input indicating a type of analysis to perform associated with the record; performing, using an artificial intelligence, the analysis associated with the record to obtain an output, wherein the type of analysis to obtain the output comprises generating a document describing contents of the record, and wherein the document describing the contents of the record is smaller than the record; storing the output as the property in the database; and enabling access to the database based on the property, thereby enabling an efficient understanding of contents of the large document without consuming the large document.
  9. The method of claim 8, comprising: obtaining the property, wherein the property is associated with multiple categories, and wherein the property indicates that the output of the analysis falls into one of the multiple categories; receiving the input indicating the type of analysis to perform; obtaining, from the database, multiple records associated with multiple properties, wherein each property among the multiple properties is generated using the type of analysis; based on the multiple records and the multiple properties, determining a pattern of correspondence between the multiple properties and the multiple records; and based on the pattern of correspondence, determine the output, wherein the output falls into one of the multiple categories.
  10. The method of claim 8, comprising: obtaining multiple records including the record in the database and multiple properties including the property associated with the multiple records in the database, wherein each property among the multiple properties is generated using the type of analysis; receiving a search query; searching the multiple properties based on the search query to produce a list of relevant results; and providing the list of relevant results.
  11. The method of claim 8, wherein extracting indicated information from the record to obtain the output comprises: providing a predetermined category of the type of analysis to perform associated with the record, wherein the predetermined category includes the extracting of indicated information; providing an entry for a natural language input modifying the predetermined category, wherein the natural language input requests information to extract from the record; receiving a first input indicating to perform the predetermined category of the analysis and a second input including the natural language input modifying the predetermined category; based on the natural language input and the predetermined category, performing the predetermined category of the analysis; and upon performing the predetermined category of the analysis, producing the output indicating the requested information.
  12. The method of claim 8, comprising: determining whether the record in the database is modified after the analysis is performed; upon determining that the record in the database is modified after the analysis is performed, automatically repeating a performance of the analysis to obtain a second output; and storing the second output as the property in the database.
  13. The method of claim 8, comprising: receiving an indication to perform the type of analysis on multiple records in the database; obtaining from the database the multiple records on which to perform the type of analysis; selecting a subset of the multiple records, wherein the subset of the multiple records is a representative sample of the multiple records; performing the type of analysis on each record among the subset of the multiple records to obtain multiple outputs, wherein performing the type of analysis in each record among the subset of the multiple records requires less time than performing the type of analysis on each record among the multiple records; and providing the multiple outputs.
  14. A system comprising: at least one hardware processor; and at least one non-transitory memory storing instructions, which, when executed by the at least one hardware processor, cause the system to: obtain a record in a database and a property associated with the record in the database, wherein the record includes a large document, and wherein the large document is unstructured or semi-structured; receive an input indicating a type of analysis to perform associated with the record; perform, using an artificial intelligence, the analysis associated with the record to obtain an output, wherein the type of analysis to obtain the output comprises translating at least a portion of the large document into a different language including a natural language or a programming language; store the output as the property in the database; and enable access to the database based on the property, thereby enabling an efficient understanding of contents of the large document without consuming the large document.
  15. The system of claim 14, comprising instructions to: obtain the property, wherein the property is associated with multiple categories, and wherein the property indicates that the output of the analysis falls into one of the multiple categories; receive the input indicating the type of analysis to perform; obtain, from the database, multiple records associated with multiple properties, wherein each property among the multiple properties is generated using the type of analysis; based on the multiple records and the multiple properties, determine a pattern of correspondence between the multiple properties and the multiple records; and based on the pattern of correspondence, determine the output, wherein the output falls into one of the multiple categories.
  16. The system of claim 14, comprising instructions to: obtain multiple records including the record in the database and multiple properties including the property associated with the multiple records in the database, wherein each property among the multiple properties is generated using the type of analysis; receive a search query; search the multiple properties based on the search query to produce a list of relevant results; and provide the list of relevant results.
  17. The system of claim 14, wherein instructions to extract indicated information from the record to obtain the output comprise instructions to: provide a predetermined category of the type of analysis to perform associated with the record, wherein the predetermined category includes extract indicated information; provide an entry for a natural language input modifying the predetermined category, wherein the natural language input requests information to extract from the record; receive a first input indicating to perform the predetermined category of the analysis and a second input including the natural language input modifying the predetermined category; and based on the natural language input and the predetermined category, perform the predetermined category of the analysis; and upon performing the predetermined category of the analysis, produce the output indicating the requested information.
  18. The system of claim 14, comprising instructions to: provide a predetermined category of the type of analysis to perform associated with the record; receive the input indicating to perform the predetermined category of the type of analysis; and perform the predetermined category of analysis.
  19. The system of claim 14, comprising instructions to: determine whether the record in the database is modified after the analysis is performed; upon determining that the record in the database is modified after the analysis is performed, automatically repeat a performance of the analysis to obtain a second output; and store the second output as the property in the database.
  20. The system of claim 14, comprising instructions to: receive an indication to perform the type of analysis on multiple records in the database; obtain from the database the multiple records on which to perform the type of analysis; select a subset of the multiple records, wherein the subset of the multiple records is a representative sample of the multiple records; perform the type of analysis on each record among the subset of the multiple records to obtain multiple outputs, wherein performing the type of analysis in each record among the subset of the multiple records requires less time than performing the type of analysis on each record among the multiple records; and provide the multiple outputs.
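
As a non-limiting illustration of three of the dependent features recited above (searching stored properties, repeating the analysis when a record is modified, and analyzing only a subset of records), a minimal Python sketch might look like the following. Everything named here is hypothetical. The claims do not say how a modification is detected, how search is performed, or how a representative sample is chosen; the sketch assumes a content hash, a case-insensitive substring match, and a uniform random sample, respectively. The first_sentence function is a placeholder for the model-backed analysis.

```python
import hashlib
import random
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Record:
    record_id: str
    document: str
    properties: Dict[str, str] = field(default_factory=dict)
    # Assumption: remember a hash of the document per property at analysis time.
    analyzed_hashes: Dict[str, str] = field(default_factory=dict)


def content_hash(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def refresh_property(record: Record, name: str, analyze: Callable[[str], str]) -> bool:
    """Repeat the analysis only if the document changed since the last run."""
    current = content_hash(record.document)
    if record.analyzed_hashes.get(name) == current:
        return False  # property is still up to date
    record.properties[name] = analyze(record.document)
    record.analyzed_hashes[name] = current
    return True


def search_properties(records: List[Record], name: str, query: str) -> List[str]:
    """Search the stored properties, not the full documents; returns record ids."""
    needle = query.lower()
    return [r.record_id for r in records if needle in r.properties.get(name, "").lower()]


def sample_records(records: List[Record], k: int, seed: int = 0) -> List[Record]:
    """Uniform random subset, a simple stand-in for a representative sample."""
    return random.Random(seed).sample(records, min(k, len(records)))


def first_sentence(text: str) -> str:
    """Placeholder for the model-backed analysis (assumption)."""
    return text.split(".")[0].strip()


records = [
    Record("a", "Customer asked for dark mode. A long interview follows."),
    Record("b", "Bug report about login. Steps to reproduce follow."),
]
for r in records:
    refresh_property(r, "summary", first_sentence)

# Only record "a" is edited, so only its property is regenerated.
records[0].document = "Customer asked for offline mode. More notes follow."
changed = [refresh_property(r, "summary", first_sentence) for r in records]
hits = search_properties(records, "summary", "login")
subset = sample_records(records, 1)
```

Tracking one hash per property, rather than one per record, keeps a property from being treated as current merely because a different property was refreshed after the same edit.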

Description

ENABLING AN EFFICIENT UNDERSTANDING OF CONTENTS OF A LARGE DOCUMENT WITHOUT STRUCTURING OR CONSUMING THE LARGE DOCUMENT

BACKGROUND

[0001] Knowledge workers are often inundated with many documents that they need to make sense of efficiently. For example, salespeople may need to quickly understand a long list of prospective customers; designers may need to make sense of hundreds of user interviews to identify key takeaways or requests; engineers may need to digest multiple technical specifications, bug reports, and tasks; and managers of large teams may need to navigate dozens of long documents, project plans, and meeting notes inside a team. In collaborative environments, these kinds of documents often change constantly, adding to the problem.

BRIEF DESCRIPTION OF THE DRAWINGS

[0002] Detailed descriptions of implementations of the present invention will be described and explained through the use of the accompanying drawings.

[0003] Figure 1 is a block diagram of an example platform.

[0004] Figure 2 is a block diagram of an example transformer.

[0005] Figure 3 shows a system to enable an efficient understanding of contents of a large document.

[0006] Figure 4 shows various types of analysis that can be performed on the documents in the database.

[0007] Figure 5 shows various options associated with the summary analysis.

[0008] Figure 6 shows the various options associated with the custom analysis.

[0009] Figure 7 shows the document and multiple properties stored in the database.

[0010] Figure 8 is a flowchart of a method to enable an efficient understanding of contents of the document without consuming the document.

[0011] Figure 9 is a block diagram that illustrates an example of a computer system in which at least some operations described herein can be implemented.

[0012] The technologies described herein will become more apparent to those skilled in the art from studying the Detailed Description in conjunction with the drawings. Embodiments or implementations describing aspects of the invention are illustrated by way of example, and the same references can indicate similar elements. While the drawings depict various implementations for the purpose of illustration, those skilled in the art will recognize that alternative implementations can be employed without departing from the principles of the present technologies. Accordingly, while specific implementations are shown in the drawings, the technology is amenable to various modifications.

DETAILED DESCRIPTION

[0013] The disclosed system and method obtain a record in a database and a property associated with the record in the database. The property can be text or one or more tags belonging to a set of predetermined categories. The record can include a large text document of over 3,000 words. The large text document is unstructured or semi-structured. In other words, the large text document is not a spreadsheet such as an Excel file but a document containing any type of unstructured or semi-structured data. The system receives an input indicating a type of analysis to perform associated with the record, and performs, using a large language model, the analysis associated with the record to obtain an output.

[0014] The type of analysis belongs to multiple categories including summarizing, extracting, generating, and translating. When summarizing the record, the large language model produces the output including a summary. When extracting indicated information from the record, the large language model produces the output including the indicated information. When extracting the information, the large language model does not generate new words or concepts based on the document and only includes words contained in the record. When generating a second document based on the large text document, the large language model produces the output including the second document, where the second document includes contents not included in the large text document, and where the second document is smaller than the large text document. The second document can be a social media post generated by the large language model, such as a generative artificial intelligence. When translating, the large language model can translate at least a portion of the large text document into another language, including a different natural language or a programming language.

[0015] The system stores the output as the property in the database and enables access of the database based on the property, thereby enabling an efficient understanding of contents of the large text document without structuring the large text document or requiring the user to consume the large text document.

[0016] The description and associated drawings are illustrative examples and are not to be construed as limiting. This disclosure provides certain details for a thorough understanding and enabling description of these examples. One skilled in the relevant technology will understand, however, that the invention can be practiced wit