US-12626527-B2 - Automatic document template inference, generation, and refinement

Abstract

Various embodiments offer improved functionality for generating and/or refining templates that can be used for automatically extracting information from within an invoice or other document, based on geometric characteristics of the document. An initial template may be automatically generated, and such initial template may then be refined over time based on user feedback, so as to improve reliability and accuracy in information extraction.

Inventors

  • Gal Isaac Lellouche
  • Saed Hussain
  • Ernest Ho Hin Chow
  • Alexander Millar

Assignees

  • SAGE GLOBAL SERVICES LIMITED

Dates

Publication Date: 2026-05-12
Application Date: 2023-09-13

Claims (20)

  1. A computer-implemented method for extracting information from a document, comprising: at a hardware processing device, receiving a document; at the hardware processing device, automatically extracting image data from the received document; at the hardware processing device, automatically determining a source of the received document; at the hardware processing device, automatically determining whether a template exists for the determined source in a data storage device; responsive to a template not existing for the determined source: at the hardware processing device, automatically extracting information from the image data representing the received document; based on the extracted information, at the hardware processing device, automatically creating a template for the determined source; and storing the created template in the data storage device; responsive to a template existing for the determined source, automatically retrieving the template from the data storage device; based on the template for the determined source, at the hardware processing device, automatically extracting information from the image data representing the received document; at the hardware processing device, receiving feedback from at least one user specifying data expected to be found within the received document; at the hardware processing device, automatically assigning a confidence metric to the received feedback; at the hardware processing device, automatically identifying at least a subset of the feedback as having a confidence metric indicating that the subset is trustworthy; at the hardware processing device, comparing the identified subset of the received feedback with the extracted information; at the hardware processing device, automatically refining the template based on the results of the comparison; and outputting the extracted information on an output device.
  2. The method of claim 1, wherein: comparing the identified subset of the received feedback with the extracted information comprises identifying at least one error in the extracted information; and automatically refining the template comprises automatically updating the template based on the indicated at least one error.
  3. The method of claim 2, wherein: identifying at least one error in the extracted information comprises automatically determining at least one location in the received document corresponding to the at least one error; and automatically updating the template comprises automatically updating at least one portion of the template corresponding to the determined at least one location.
  4. The method of claim 3, wherein the received user feedback is text-based, and wherein automatically determining at least one location in the received document corresponding to the indicated at least one error comprises: automatically scanning the received document to identify at least one location corresponding to the received text-based feedback.
  5. The method of claim 1, wherein the received user feedback is text-based.
  6. The method of claim 1, further comprising, prior to outputting the extracted information, validating the extracted information to determine a confidence metric for the extracted information.
  7. The method of claim 1, wherein: the document comprises an invoice; and the source of the document comprises a vendor.
  8. The method of claim 1, further comprising automatically populating an accounting record using the extracted information.
  9. The method of claim 1, wherein receiving a document comprises receiving a visual representation of a scanned document.
  10. A non-transitory computer-readable medium for extracting information from a document, comprising instructions stored thereon, that when performed by a hardware processor, perform the steps of: receiving a document; automatically extracting image data from the received document; automatically determining a source of the document; automatically determining whether a template exists for the determined source in a data storage device; responsive to a template not existing for the determined source: automatically extracting information from the image data representing the received document; based on the extracted information, automatically creating a template for the determined source; and causing the created template to be stored in the data storage device; responsive to a template existing for the determined source, automatically retrieving the template from the data storage device; based on the template for the determined source, automatically extracting information from the image data representing the received document; receiving feedback from at least one user specifying data expected to be found within the received document; automatically assigning a confidence metric to the received feedback; automatically identifying at least a subset of the feedback as having a confidence metric indicating that the subset is trustworthy; comparing the identified subset of the received feedback with the extracted information; automatically refining the template based on the results of the comparison; and causing the extracted information to be output on an output device.
  11. The non-transitory computer-readable medium of claim 10, wherein: comparing the identified subset of the received feedback with the extracted information comprises identifying at least one error in the extracted information; and automatically refining the template comprises automatically updating the template based on the indicated at least one error.
  12. The non-transitory computer-readable medium of claim 11, wherein: identifying at least one error in the extracted information comprises automatically determining at least one location in the received document corresponding to the indicated at least one error; and automatically updating the template comprises automatically updating at least one portion of the template corresponding to the determined at least one location.
  13. The non-transitory computer-readable medium of claim 12, wherein the received user feedback is text-based, and wherein automatically determining at least one location in the received document corresponding to the indicated at least one error comprises: automatically scanning the received document to identify at least one location corresponding to the received text-based feedback.
  14. The non-transitory computer-readable medium of claim 10, wherein the received user feedback is text-based.
  15. The non-transitory computer-readable medium of claim 10, further comprising instructions stored thereon, that when performed by a hardware processor, perform the step of, prior to causing the extracted information to be output on an output device, validating the extracted information to determine a confidence metric for the extracted information.
  16. The non-transitory computer-readable medium of claim 10, wherein: the document comprises an invoice; and the source of the document comprises a vendor.
  17. The non-transitory computer-readable medium of claim 10, further comprising instructions stored thereon, that when performed by a hardware processor, perform the step of automatically populating an accounting record using the extracted information.
  18. The non-transitory computer-readable medium of claim 10, wherein receiving a document comprises receiving a visual representation of a scanned document.
  19. A system for extracting information from a document, comprising: a data storage device; an output device; a hardware processing device, communicatively coupled to the data storage device and the output device, configured to: receive a document; automatically extract image data from the received document; automatically determine a source of the document; automatically determine whether a template exists for the determined source in the data storage device; responsive to a template not existing for the determined source: automatically extract information from the image data representing the received document; based on the extracted information, automatically create a template for the determined source; and cause the created template to be stored in the data storage device; responsive to a template existing for the determined source, automatically retrieve the template from the data storage device; and based on the template for the determined source, automatically extract information from the image data representing the received document; and an input device, communicatively coupled to the hardware processing device, configured to receive feedback from at least one user specifying data expected to be found within the received document; automatically assign a confidence metric to the received feedback; wherein: the hardware processing device is further configured to: automatically identify at least a subset of the feedback as having a confidence metric indicating that the subset is trustworthy; compare the identified subset of the received feedback with the extracted information; and automatically refine the template based on the results of the comparison; and the output device is configured to output the extracted information.
  20. The system of claim 19, wherein: comparing the identified subset of the received feedback with the extracted information comprises identifying at least one error in the extracted information; and automatically refining the template comprises automatically updating the template based on the indicated at least one error.
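
Illustrative sketch (not part of the claims): the flow recited in claim 1 can be outlined in a few lines of Python. Everything named here is hypothetical and chosen only for illustration: the `Feedback` and `TemplateStore` types, the `process` function, the representation of a document as (word, position) pairs, the fixed trust threshold of 0.8, and the assumption that feedback arrives with a confidence value already assigned. The claims do not specify any of these details, so this is a minimal sketch under those assumptions rather than the claimed implementation.

```python
from dataclasses import dataclass, field


@dataclass
class Feedback:
    entity: str        # e.g. "invoice_number"
    value: str         # text the user expects to find in the document
    confidence: float  # assumed to be assigned by an upstream scorer, in [0, 1]


@dataclass
class TemplateStore:
    """In-memory stand-in for the data storage device."""
    templates: dict = field(default_factory=dict)


def locate(words, text):
    """Scan the document for `text`; return its position, or None if absent."""
    for word, position in words:
        if word == text:
            return position
    return None


def extract(template, words):
    """Read each entity's value from the position recorded in the template."""
    by_position = {position: word for word, position in words}
    return {entity: by_position.get(pos) for entity, pos in template.items()}


def process(store, source, words, detected, feedback, threshold=0.8):
    """Create or retrieve the source's template, extract, then refine it."""
    template = store.templates.get(source)
    if template is None:
        # No template yet: build one from template-free detections and store it.
        template = {e: locate(words, v) for e, v in detected.items()}
        store.templates[source] = template
    extracted = extract(template, words)
    # Keep only feedback whose confidence marks it as trustworthy.
    trusted = [f for f in feedback if f.confidence >= threshold]
    for f in trusted:
        if extracted.get(f.entity) != f.value:   # comparison found an error
            position = locate(words, f.value)    # where the expected text is
            if position is not None:
                template[f.entity] = position    # refine the stored template
                extracted[f.entity] = f.value
    return extracted


store = TemplateStore()
first_doc = [("Acme", (0, 0)), ("INV-1042", (0, 5)), ("PO-77", (1, 5))]
first = process(
    store, "acme", first_doc,
    detected={"invoice_number": "PO-77"},  # deliberately wrong initial detection
    feedback=[Feedback("invoice_number", "INV-1042", 0.95),
              Feedback("invoice_number", "Acme", 0.30)],  # untrusted, ignored
)
second_doc = [("Acme", (0, 0)), ("INV-2000", (0, 5)), ("PO-78", (1, 5))]
second = process(store, "acme", second_doc, detected={}, feedback=[])
print(first, second)
```

In this toy run the first document creates a template that points at the wrong field; the trusted, text-based feedback is located in the document and the template is corrected, so the second document from the same source is read from the corrected position without any feedback.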

Description

CROSS-REFERENCE TO RELATED APPLICATION

The present application is related to U.S. Utility application Ser. No. 17/939,809 for “Classifying Documents Using Geometric Information”, filed Sep. 7, 2022, which is incorporated by reference herein in its entirety.

TECHNICAL FIELD

The present document relates to techniques for automatically extracting information from documents.

BACKGROUND

An important component of software applications such as accounting applications is the extraction of information from documents that arrive from external sources, such as invoices, statements, and/or the like. Conventionally, when invoices and/or other documents arrive at a company, information from such documents must be entered into accounting software for payment and/or other processing. Such operations may take place by manual entry, or by optical character recognition of text contained in the incoming invoices and/or other documents. Such manual operations can be tedious, time-consuming, and error prone.

More specifically, in the context of accounting software, reliable extraction of information from incoming documents is critical so as to ensure that transactions are correctly encoded and recorded in a general ledger.

In general, existing optical character recognition (OCR) technology does a poor job of extracting information from documents such as vendor invoices, and often results in errors. In addition, conventional deep learning based tools often require large volumes of annotated training data before they are able to deliver adequate performance in reliably extracting information from documents. In addition, because of the expensive and long training times, conventional systems provide models that are focused on zero-shot prediction, to provide best results on average without improving over time.

SUMMARY

Various embodiments described herein offer improved functionality for generating and/or refining templates that can be used for automatically extracting information from an invoice or other document, based on geometric characteristics of the document. An initial template may be automatically generated, and such initial template may then be refined and improved over time based on user interactions and feedback, so as to improve reliability and accuracy in information extraction.

As described in the above-referenced related application, many vendors use a standard document template for invoices and/or other documents that may be generated by their accounting software or other source. Often, each vendor's standard document template has some geometric characteristics that are unique to that vendor. According to the techniques described in the above-referenced related application, geometric information extracted from incoming images of documents can be used to construct a unique vendor template, thereby providing a mechanism for automatically identifying particular vendors when new documents are received. Such techniques can be used for classifying any type of document.

The vendor template specifies where key information may be found within each document. Such key information (referred to herein as “entities”) may include, for example: vendor name, invoice number, document number, due date, and/or the like.

According to various embodiments described herein, vendor templates may be automatically determined so that entity location information from such templates may be used to automatically extract data from documents in a reliable and accurate manner. A software application is used to automatically determine the locations of a document's entities and to generate a template for the document; little to no human intervention is needed.

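
By way of illustration only, a vendor template of the kind described above can be pictured as a mapping from entity names to page regions. The Python sketch below assumes normalized page coordinates, rectangular regions, and OCR output reduced to word centers; the names `Box`, `Word`, `extract_entities`, and `acme_template` are hypothetical and do not come from this document, which does not specify how a template is encoded.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned region in normalized page coordinates (0..1), assumed here."""
    left: float
    top: float
    right: float
    bottom: float

    def contains(self, x: float, y: float) -> bool:
        return self.left <= x <= self.right and self.top <= y <= self.bottom


@dataclass(frozen=True)
class Word:
    """One recognized word and the center of its bounding box (hypothetical OCR output)."""
    text: str
    x: float
    y: float


def extract_entities(template, words):
    """For each entity, join in reading order the words centered inside its region."""
    result = {}
    for entity, box in template.items():
        hits = [w for w in words if box.contains(w.x, w.y)]
        hits.sort(key=lambda w: (round(w.y, 2), w.x))  # top-to-bottom, then left-to-right
        result[entity] = " ".join(w.text for w in hits)
    return result


# Hypothetical template for one vendor: entity name -> region where its value appears.
acme_template = {
    "vendor_name": Box(0.05, 0.03, 0.50, 0.08),
    "invoice_number": Box(0.70, 0.03, 0.95, 0.08),
    "due_date": Box(0.70, 0.10, 0.95, 0.15),
}

words = [
    Word("Acme", 0.10, 0.05), Word("Supplies", 0.20, 0.05),
    Word("INV-1042", 0.80, 0.05), Word("2023-10-01", 0.80, 0.12),
    Word("Thank", 0.10, 0.90),  # falls outside every region and is ignored
]

entities = extract_entities(acme_template, words)
print(entities)
```

Keeping the template as plain location data, separate from the extraction routine, is what would allow it to be stored per vendor and adjusted later without changing the extraction code.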
In at least one embodiment, when a document is first encountered, the system examines the document, identifies the entities and their location in the document, and stores this information in a template file. If there is user feedback, the template may be automatically updated based on such feedback, so as to provide a mechanism to automatically improve accuracy and confidence in the template. When a document from the same template is encountered in the future, the stored template may be retrieved and used to extract entities from the new document. In at least one embodiment, a given template can be used across users who may receive invoices from the same vendor.

In at least one embodiment, the system operates in real time (i.e., during a user interaction session). The template may be automatically generated from a single file with no human intervention, and may be updated continually, based on received user feedback and/or other information. The described system is thus able to automatically generate a document template, in real time, during user interaction, with no human intervention. Templates may be automatically and continually updated based on user feedback, so as to further increase accuracy and confidence. Document templates can be used to reliably and accurately extract entities as need