US-12619817-B1 - Document template generation

Abstract

Techniques are disclosed for using a machine learning model to generate document templates from input templates. For example, a computing system receives an input document. A machine learning model of the computing system processes the input document to identify one or more text items corresponding to respective variable field types of the plurality of variable field types. The computing system creates, for each text item of the one or more text items, a variable field for the variable field type of the plurality of variable field types corresponding to the text item and creates, for each variable field of the one or more created variable fields, a mapping for the variable field to a corresponding data source. The computing system generates, based on the input document, a document template comprising the one or more created variable fields and the respective one or more mappings.

Inventors

  • Erin Lindsay McNeill

Assignees

  • DOCUSIGN, INC.

Dates

Publication Date
20260505
Application Date
20230428

Claims (20)

  1. A computing system comprising processing circuitry having access to a memory, the processing circuitry configured to: receive an electronic input document; process, with a machine learning model, the electronic input document to identify one or more first text items corresponding to respective variable field types of a plurality of variable field types, wherein the machine learning model is trained, with a plurality of labeled electronic documents, to identify, in document text, text that corresponds to any of the plurality of variable field types, wherein each of the plurality of labeled electronic documents includes one or more second text items labeled with a corresponding variable field type of the plurality of variable field types; for each text item of the one or more first text items identified within the electronic input document, create a variable field for the variable field type of the plurality of variable field types corresponding to the text item; for each variable field of the one or more created variable fields, create a mapping for the variable field to a corresponding electronic data source; generate, based on the electronic input document, an electronic document template comprising at least a portion of document text of the electronic input document, the one or more created variable fields and, for each variable field of the one or more created variable fields, the mapping for the variable field to the corresponding electronic data source, wherein the mapping comprises data identifying the corresponding electronic data source; and parameterize, based on the corresponding mapping, each variable field of the one or more created variable fields of the electronic document template with data from the corresponding electronic data source to generate an electronic output document.
  2. The computing system of claim 1, wherein the electronic output document comprises the at least a portion of document text of the electronic input document.
  3. The computing system of claim 2, wherein the data from the corresponding electronic data source comprises text.
  4. The computing system of claim 1, wherein to generate, based on the electronic input document, the electronic document template, the processing circuitry is configured to: for each text item of the one or more first text items identified within the electronic input document, replace the text item within the electronic input document with the corresponding variable field of the one or more created variable fields.
  5. The computing system of claim 1, wherein the at least a portion of document text of the electronic input document does not correspond to any of the plurality of variable field types.
  6. The computing system of claim 1, wherein the plurality of variable field types comprise one or more of an agreement field, a clause, or an obligation.
  7. The computing system of claim 1, wherein the plurality of variable field types comprise one or more of a name, an address, a state of governing law, a payment term, an effective date, or a termination date.
  8. The computing system of claim 1, wherein a variable field type of the plurality of variable field types comprises an entity name, wherein to identify the one or more first text items, the machine learning model is configured to identify a text item of the one or more first text items corresponding to the entity name, the text item comprising a first name of an entity, and wherein the processing circuitry is configured to create a mapping, for the variable field of the one or more variable fields corresponding to the first name of the entity, comprising data identifying a column of the corresponding electronic data source, the column comprising entity names for a set of entities.
  9. The computing system of claim 1, wherein the processing circuitry is further configured to store the electronic document template in a database comprising a plurality of electronic document templates.
  10. The computing system of claim 1, wherein each variable field of the one or more created variable fields comprises a text string descriptive of the corresponding variable field type of the plurality of variable field types.
  11. A method comprising: receiving, by processing circuitry of a computing system, an electronic input document; processing, with a machine learning model executed by the processing circuitry, the electronic input document to identify one or more first text items corresponding to respective variable field types of a plurality of variable field types, wherein the machine learning model is trained, with a plurality of labeled electronic documents, to identify, in document text, text that corresponds to any of the plurality of variable field types, wherein each of the plurality of labeled electronic documents includes one or more second text items labeled with a corresponding variable field type of the plurality of variable field types; for each text item of the one or more first text items identified within the electronic input document, creating, by the processing circuitry, a variable field for the variable field type of the plurality of variable field types corresponding to the text item; for each variable field of the one or more created variable fields, creating, by the processing circuitry, a mapping for the variable field to a corresponding electronic data source; generating, by the processing circuitry and based on the electronic input document, an electronic document template comprising at least a portion of document text of the electronic input document, the one or more created variable fields and, for each variable field of the one or more created variable fields, the mapping for the variable field to the corresponding electronic data source, wherein the mapping comprises data identifying the corresponding electronic data source; and parameterize, by the processing circuitry and based on the corresponding mapping, each variable field of the one or more created variable fields of the electronic document template with data from the corresponding electronic data source to generate an electronic output document.
  12. The method of claim 11, wherein the electronic output document comprises the at least a portion of document text of the electronic input document.
  13. The method of claim 11, wherein the data from the corresponding electronic data source comprises text.
  14. The method of claim 11, wherein generating, based on the electronic input document, the electronic document template comprises: for each text item of the one or more first text items identified within the electronic input document, replacing the text item within the electronic input document with the corresponding variable field of the one or more created variable fields.
  15. The method of claim 11, wherein the at least a portion of document text of the electronic input document does not correspond to any of the plurality of variable field types.
  16. The method of claim 11, wherein the plurality of variable field types comprise one or more of an agreement field, a clause, or an obligation.
  17. The method of claim 11, wherein the plurality of variable field types comprise one or more of a name, an address, a state of governing law, a payment term, an effective date, or a termination date.
  18. The method of claim 11, wherein a variable field type of the plurality of variable field types comprises an entity name, wherein identifying the one or more first text items comprises identifying a text item of the one or more first text items corresponding to the entity name, the text item comprising a first name of an entity, and wherein the method further comprises creating, by the processing circuitry, a mapping, for the variable field of the one or more variable fields corresponding to the first name of the entity, comprising data identifying a column of the corresponding electronic data source, the column comprising entity names for a set of entities.
  19. The method of claim 11, wherein each variable field of the one or more created variable fields comprises a text string descriptive of the corresponding variable field type of the plurality of variable field types.
  20. A non-transitory, computer-readable medium comprising instructions that, when executed, are configured to cause processing circuitry of a computing system to: receive an electronic input document; process, with a machine learning model, the electronic input document to identify one or more first text items corresponding to respective variable field types of a plurality of variable field types, wherein the machine learning model is trained, with a plurality of labeled electronic documents, to identify, in document text, text that corresponds to any of the plurality of variable field types, wherein each of the plurality of labeled electronic documents includes one or more second text items labeled with a corresponding variable field type of the plurality of variable field types; for each text item of the one or more first text items identified within the electronic input document, create a variable field for the variable field type of the plurality of variable field types corresponding to the text item; for each variable field of the one or more created variable fields, create a mapping for the variable field to a corresponding electronic data source; generate, based on the electronic input document, an electronic document template comprising at least a portion of document text of the electronic input document, the one or more created variable fields and, for each variable field of the one or more created variable fields, the mapping for the variable field to the corresponding electronic data source, wherein the mapping comprises data identifying the corresponding electronic data source; and parameterize, based on the corresponding mapping, each variable field of the one or more created variable fields of the electronic document template with data from the corresponding electronic data source to generate an electronic output document.

Description

TECHNICAL FIELD

This disclosure generally relates to electronic document management, and more specifically to machine learning for document creation.

BACKGROUND

Online document management systems are used for creating and reviewing documents for various entities (e.g., people, companies, organizations). Such electronic documents may include various types of agreements that can be executed (e.g., electronically signed) by entities, such as non-disclosure agreements, indemnity agreements, purchase orders, lease agreements, and employment contracts. Online document management systems provide users with tools to edit, view, and execute the documents. Online document management systems are increasingly using cloud-based solutions that allow participants to collaborate on online documents.

SUMMARY

In general, the disclosure describes techniques for generating a document template from document text of a document. The generated document templates may be used to generate new documents by merging data into variable fields identified from the document text. For example, a computing system as described herein executes a machine learning model trained to identify, in document text, text that corresponds to any of a plurality of variable field types. The computing system receives an input document and applies the machine learning model to process the input document to identify text items corresponding to respective variable field types of the plurality of variable field types. For each recognized text item, the computing system creates a variable field for the variable field type corresponding to the text item. Further, for each of the created variable fields, the computing system creates a mapping for the variable field to a corresponding data source. The computing system generates, based on the input document, a document template comprising the created variable fields and the respective mappings.
For example, the computing system may replace each instance of each identified text item within the input document with a variable field for the variable field type corresponding to the instance of the text item to generate the document template. The computing system may thereafter use the mappings for the variable fields to the respective data sources to parameterize each variable field with a new text item of the corresponding variable field type to produce an output document.

The techniques of the disclosure may provide specific technical improvements to the computer-related field of electronic document management and document creation that have practical applications. For example, the techniques disclosed herein may enable a document management system to automatically identify probable dynamic data within document text and assist a user with converting, or in some cases autonomously convert, an input document into a document template having variable fields in place of the dynamic data. The document template may thereafter be usable to generate subsequent documents for use by different entities or in different scenarios by merging the variable fields with data from respective data sources mapped to the variable fields. A document management system as described herein may require only a single example of a document to rapidly prepare a document template for use by an entity. Therefore, the techniques of the disclosure may reduce the need for a technical expert to gather requirements of the entity (e.g., customer, business, legal, or other requirements) and comprehensively review multiple documents of a similar type to determine which portions of the documents are required to manually prepare the document template. Accordingly, the techniques of the disclosure may significantly reduce the time, expense, and labor required to prepare a document template usable for the generation of documents.
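The flow summarized above (identify text items, create variable fields, map each field to a data source, then parameterize the template) can be illustrated with a minimal Python sketch. This is not the patented implementation. The trained machine learning model is replaced by a hard-coded stand-in, and every name here (`VariableField`, `DocumentTemplate`, `identify_text_items`, the `{{...}}` placeholder syntax, the `crm_accounts` source, and its column names) is a hypothetical choice made only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class VariableField:
    field_type: str  # e.g. "entity_name"
    source: str      # hypothetical identifier of the mapped data source
    column: str      # hypothetical column within that data source


@dataclass
class DocumentTemplate:
    text: str
    fields: dict = field(default_factory=dict)  # placeholder -> VariableField


def identify_text_items(document_text):
    """Stand-in for the trained model: returns (text item, field type) pairs."""
    # Assumption: fixed predictions replace real model inference.
    candidates = [
        ("Acme Corp.", "entity_name"),
        ("January 1, 2024", "effective_date"),
    ]
    return [(item, ftype) for item, ftype in candidates if item in document_text]


def generate_template(document_text, source="crm_accounts"):
    template = DocumentTemplate(text=document_text)
    for item, ftype in identify_text_items(document_text):
        # The placeholder is a text string descriptive of the field type.
        placeholder = "{{" + ftype + "}}"
        # Replace every instance of the identified text item with its field.
        template.text = template.text.replace(item, placeholder)
        # The mapping records which data source (and column) fills the field.
        template.fields[placeholder] = VariableField(ftype, source, ftype)
    return template


def parameterize(template, data_sources, row):
    """Fill each variable field from its mapped data source."""
    output = template.text
    for placeholder, vf in template.fields.items():
        value = data_sources[vf.source][row][vf.column]
        output = output.replace(placeholder, value)
    return output


doc = ("This Agreement is made by Acme Corp. effective January 1, 2024. "
       "Acme Corp. agrees to the terms.")
template = generate_template(doc)
sources = {
    "crm_accounts": [
        {"entity_name": "Globex LLC", "effective_date": "March 5, 2025"},
    ]
}
output = parameterize(template, sources, row=0)
print(template.text)
print(output)
```

In this sketch the text that does not correspond to any variable field type ("This Agreement is made by", "agrees to the terms.") carries over unchanged from the input document to the template and then to the output document, while only the mapped fields change per data-source row.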
In one example, this disclosure describes a computing system comprising processing circuitry having access to a memory, the processing circuitry configured to: receive an input document; execute a machine learning model trained to identify, in document text, text that corresponds to any of a plurality of variable field types, the machine learning model configured to process the input document to identify one or more text items corresponding to respective variable field types of the plurality of variable field types; for each text item of the one or more text items, create a variable field for the variable field type of the plurality of variable field types corresponding to the text item; for each variable field of the one or more created variable fields, create a mapping for the variable field to a corresponding data source; and generate, based on the input document, a document template comprising the one or more created variable fields and the respective one or more mappings. In another example, this disclosure describes a method comprising: receiving, by processing circuitry of a computing system, an input doc