JP-7855086-B2 - A framework for extracting information from text layout.


Inventors

サラチンスキー，マーク
メリル，クリスティアンジョセフ
フィローティ，オクタビアンフローリン
パク，アイリーンユエ－リン

Assignees

Bristol-Myers Squibb Company

Dates

Publication Date: 20260507
Application Date: 20220420

Claims (20)

1. A method for extracting information, the method comprising: receiving, by a processor, a file of a first format, the file containing information and a plurality of regions of interest (ROIs); converting, by the processor, the file into an image; generating, by the processor using a first model, a first output including a first set of information extracted from the image and a first set of coordinates of the first set of information within the image; generating, by the processor using a second model, a second output including a second set of coordinates of each of the plurality of ROIs in the image; generating, by the processor using a third model, a third output including a second set of information extracted from the image and a third set of coordinates of the second set of information within the image; combining, by the processor, the first output and the third output to generate the information contained in the file and a plurality of coordinates, the plurality of coordinates including the coordinates of the information in the image; generating, by the processor using the second output, an output file of a second format having a plurality of sections, wherein each section of the plurality of sections corresponds to an ROI of the plurality of ROIs and is included in the output file based on the coordinates of the ROI in the second set of coordinates corresponding to that section; and inputting, by the processor, into each section of the plurality of sections in the output file, the portion of the information determined to correspond to that section, based on the coordinates corresponding to the portion of the information and the coordinates of that section, wherein the second format makes the information in the output file searchable while the output file is displayed on a graphical user interface (GUI) or stored in a data storage device.
2. The method according to claim 1, wherein the information comprises one or more words, and generating the first output or the third output includes generating a bounding box surrounding each of the one or more words in the image.
3. The method according to claim 1, wherein the third set of coordinates forms a bounding box surrounding each of the plurality of ROIs.
4. The method according to claim 1, wherein the first output is generated using optical character recognition (OCR).
5. The method according to claim 1, wherein the third output is generated using a neural network.
6. The method according to claim 1, wherein the output file is used by one or more machine learning models to generate a knowledge base.
7. The method according to claim 1, wherein the information is searchable within the output file.
8. The method according to claim 1, further comprising displaying, by the processor, the output file and the image.
9. The method according to claim 1, wherein the information includes words and/or images.
10. The method according to claim 9, wherein combining the first output and the third output to generate the information includes retaining the images included in the first output.
11. The method according to claim 9, further comprising: identifying, by the processor, one or more words in the first output that share the same coordinates as one or more words in the third output; and determining, by the processor, a similarity level between the one or more words in the first output and the one or more words in the third output.
12. The method according to claim 11, further comprising: assigning, by the processor, a first priority value to the first output and a second priority value to the third output; including, by the processor, the one or more words from the third output in a plurality of words based on the second priority value of the third output and on the similarity level being greater than a predetermined threshold; and excluding, by the processor, the one or more words from the first output from the plurality of words based on the first priority value of the first output and on the similarity level being greater than the predetermined threshold.
13. The method according to claim 11, further comprising: identifying, by the processor, a first data format of the one or more words in the first output and a second data format of the one or more words in the third output; including, by the processor, the one or more words from the first output in a plurality of words based on the first data format of the one or more words in the first output; and excluding, by the processor, the one or more words from the third output based on the second data format of the one or more words in the third output.
14. A system for extracting information, the system comprising: a memory containing instructions; and a processor coupled to the memory and configured to execute the instructions, wherein the instructions, when executed, cause the processor to: receive a file of a first format, the file containing information and a plurality of regions of interest (ROIs); convert the file into an image; generate, using a first model, a first output including a first set of information extracted from the image and a first set of coordinates of the first set of information within the image; generate, using a second model, a second output including a second set of coordinates of each of the plurality of ROIs in the image; generate, using a third model, a third output including a second set of information extracted from the image and a third set of coordinates of the second set of information within the image; combine the first output and the third output to generate the information contained in the file and a plurality of coordinates, the plurality of coordinates including the coordinates of the information in the image; generate, using the second output, an output file of a second format having a plurality of sections, wherein each section of the plurality of sections corresponds to an ROI of the plurality of ROIs and is included in the output file based on the coordinates of the ROI in the second set of coordinates corresponding to that section; and input, into each section of the plurality of sections in the output file, the portion of the information determined to correspond to that section, based on the coordinates corresponding to the portion of the information and the coordinates of that section, wherein the second format makes the information in the output file searchable while the output file is displayed on a graphical user interface (GUI) or stored in a data storage device.
15. The system according to claim 14, wherein the information comprises one or more words, and generating the first output or the third output includes generating a bounding box surrounding each of the one or more words in the image.
16. The system according to claim 14, wherein the third set of coordinates forms a bounding box surrounding each of the plurality of ROIs.
17. The system according to claim 14, wherein the first output is generated using optical character recognition (OCR).
18. The system according to claim 14, wherein the third output is generated using a neural network.
19. The system according to claim 14, wherein the output file is used by one or more machine learning models to generate a knowledge base.
20. The system according to claim 14, wherein the information is selectable within the output file.
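
As an illustration only, the final steps of the independent claims (creating one section per ROI and placing each portion of the information into the section whose coordinates match) might be sketched as follows. The claims do not say how the correspondence between information and section is determined. Testing whether the center of a word's bounding box falls inside an ROI box is an assumption made here, and the names `Box`, `Word`, `Section`, and `build_sections` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Box:
    """Axis-aligned bounding box in image pixel coordinates."""
    x0: float
    y0: float
    x1: float
    y1: float

    def center(self):
        return ((self.x0 + self.x1) / 2, (self.y0 + self.y1) / 2)

    def contains(self, point):
        x, y = point
        return self.x0 <= x <= self.x1 and self.y0 <= y <= self.y1


@dataclass
class Word:
    text: str
    box: Box


@dataclass
class Section:
    label: str
    box: Box
    words: list = field(default_factory=list)

    def text(self):
        # Approximate reading order: top-to-bottom, then left-to-right.
        ordered = sorted(self.words, key=lambda w: (w.box.y0, w.box.x0))
        return " ".join(w.text for w in ordered)


def build_sections(rois, words):
    """Create one section per ROI and place each word in the first ROI
    that contains the center of the word's box (assumed rule).
    Words that fall outside every ROI are returned separately."""
    sections = [Section(label, box) for label, box in rois]
    unassigned = []
    for word in words:
        center = word.box.center()
        target = next((s for s in sections if s.box.contains(center)), None)
        if target is None:
            unassigned.append(word)
        else:
            target.words.append(word)
    return sections, unassigned


# Hypothetical ROI and word coordinates, for illustration only.
rois = [("title", Box(0, 0, 100, 20)), ("body", Box(0, 30, 100, 100))]
words = [
    Word("Layout", Box(40, 5, 70, 15)),
    Word("Text", Box(5, 5, 30, 15)),
    Word("Hello", Box(5, 40, 30, 50)),
    Word("stray", Box(200, 200, 220, 210)),
]
sections, unassigned = build_sections(rois, words)
print({s.label: s.text() for s in sections})
```

Keeping the ROI box on each section means the coordinates travel with the text, so a viewer could highlight the matching region of the image when a search hits a section.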

Description

Information extraction is a crucial part of building a searchable knowledge base or database. Both information extraction and knowledge base creation depend on the ability to understand the data within files and to extract information from it. Files from which information is extracted may contain text, images, charts, and graphs, and they come in various formats with diverse layouts. As a result, accurately extracting data from files can be difficult, and mining files at scale for automatically processable information is challenging. Moreover, conventional systems cannot extract data from files in the way that a human reads them.
Provided herein are system, apparatus, device, method, and/or computer program embodiments, and/or combinations thereof, including alternative combinations, for extracting information from files. A particular embodiment includes a method for extracting information. The method includes receiving a file of a first format having information and a plurality of regions of interest (ROIs), and converting the file into an image. The method further includes generating, using a first model, a first output having a first set of information extracted from the image and a first set of coordinates of the first set of information in the image. The method includes generating, using a second model, a second output having a second set of coordinates of each of the plurality of ROIs in the image. The method includes generating, using a third model, a third output having a second set of information extracted from the image and a third set of coordinates of the second set of information in the image. The method further includes combining the first output and the third output to generate the information contained in the file and a plurality of coordinates, the plurality of coordinates having the coordinates of the information in the image. The method includes generating an output file of a second format having a plurality of sections.
Each section of the plurality of sections corresponds to an ROI of the plurality of ROIs, and each section is included in the output file based on the coordinates of the ROI in the second set of coordinates corresponding to that section. The method includes inputting, into each section of the plurality of sections in the output file, the portion of the information determined to correspond to that section, based on the coordinates corresponding to the portion of the information and the coordinates of the section. The second format makes the information in the output file searchable while the output file is displayed in a graphical user interface (GUI) or stored in a data storage device.
In some embodiments, the information includes one or more words, and generating the first and third outputs involves generating bounding boxes surrounding the one or more words in the image. In some embodiments, the third set of coordinates forms a bounding box surrounding each of the plurality of ROIs. In some embodiments, the first output is generated using optical character recognition (OCR). In some embodiments, the third output is generated using a neural network. In some embodiments, the output file is used by one or more machine learning models to generate a knowledge base. In some embodiments, the information is selectable within the output file. In some embodiments, the method further includes displaying the output file and the image.
In some embodiments, the information comprises words and/or images. Generating the information by combining the first output with the third output may include retaining the images contained in the first output. The method may include identifying one or more words in the first output that share the same coordinates as one or more words in the third output, and determining a similarity level between the one or more words in the first output and the one or more words in the third output.
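
A minimal sketch of this comparison is given below, under several assumptions that the text does not fix. Two words are treated as sharing the same coordinates when the intersection over union (IoU) of their bounding boxes reaches a chosen minimum. The similarity level is computed with `difflib.SequenceMatcher` from the Python standard library. When the similarity exceeds the threshold, the word from the higher-priority output is kept, following the priority rule recited in the dependent claims. The function names, thresholds, and sample words are hypothetical.

```python
from difflib import SequenceMatcher


def iou(a, b):
    """Intersection over union of two (x0, y0, x1, y1) boxes."""
    iw = min(a[2], b[2]) - max(a[0], b[0])
    ih = min(a[3], b[3]) - max(a[1], b[1])
    inter = max(0.0, iw) * max(0.0, ih)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def similarity(s, t):
    """Case-insensitive similarity level in [0, 1] (assumed metric)."""
    return SequenceMatcher(None, s.lower(), t.lower()).ratio()


def merge_outputs(first, third, first_priority, third_priority,
                  iou_min=0.5, sim_min=0.8):
    """Merge two lists of (text, box) words.

    Co-located words whose similarity exceeds sim_min are treated as
    duplicates, and only the word from the higher-priority output is kept.
    Co-located words below the threshold are returned as unresolved pairs,
    because this sketch does not decide between them.
    """
    merged, unresolved, used = [], [], set()
    for word_a in first:
        j = next(
            (k for k, word_b in enumerate(third)
             if k not in used and iou(word_a[1], word_b[1]) >= iou_min),
            None,
        )
        if j is None:
            merged.append(word_a)  # no co-located word in the third output
            continue
        used.add(j)
        word_b = third[j]
        if similarity(word_a[0], word_b[0]) > sim_min:
            merged.append(word_b if third_priority > first_priority else word_a)
        else:
            unresolved.append((word_a, word_b))
    # Words that appear only in the third output are kept as they are.
    merged.extend(w for k, w in enumerate(third) if k not in used)
    return merged, unresolved


# Hypothetical outputs: the first model misreads "I" as "l".
first = [("lnformation", (0, 0, 50, 10)), ("layout", (60, 0, 90, 10))]
third = [("Information", (1, 0, 51, 10)), ("2022", (100, 0, 120, 10))]
merged, unresolved = merge_outputs(first, third, first_priority=1, third_priority=2)
print([text for text, _ in merged])
```

Returning the low-similarity pairs separately leaves room for a further rule, such as one based on the data format of the words, to choose between them.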
The method may further include assigning a first priority value to the first output and a second priority value to the third output. Based on the second priority value of the third output and on the similarity level being greater than a predetermined threshold, the one or more words from the third output may be included in the plurality of words. Based on the first priority value of the first output and on the similarity level being greater than the predetermined threshold, the one or more words from the first output may be excluded from the plurality of words. The method may further include identifying a first data format for the one or more words in the first output and a second data format for the one or more words in the third output. Based on the first data format, the one or more words from the first output may be included in the plurality of words. The method may further include excluding one or more words from the third output of the plurality