KR-20260067434-A - Deep learning-based unstructured document classification system and method


Abstract

The present invention relates to a deep learning-based unstructured document classification system and method. The invention comprises: a text object extraction unit that receives a document image generated from an unstructured document, extracts text objects from the document image using a Document Understanding Transformer and OCR (Optical Character Reader) separately, and integrates the extracted text objects into one and provides the result; a text object tagging unit that performs NE tagging, using a Named Entity Recognition (NER) model, on the text objects provided by the text object extraction unit; and a document type classification unit that uses a pre-built deep learning model to classify the type of the document image according to the field combination of the NE tags applied by the text object tagging unit.

Inventors

박성제

Assignees

(주)네스지오

Dates

Publication Date: 20260513
Application Date: 20241104

Claims (12)

An unstructured document classification system based on deep learning natural language processing, comprising: a text object extraction unit that receives a document image generated from an unstructured document, extracts text objects from the document image using a Document Understanding Transformer and OCR (Optical Character Reader) separately, and integrates the extracted text objects into one and provides the result; a text object tagging unit that performs NE tagging, using a Named Entity Recognition (NER) model, on the text objects provided by the text object extraction unit; and a document type classification unit that uses a pre-built deep learning model to classify the type of the document image according to the field combination of the NE tags applied by the text object tagging unit.
The system of claim 1, wherein the text object extraction unit comprises: a first text object data generation unit that receives the document image and generates first text object data based on text objects recognized from the document image using the Document Understanding Transformer; a second text object data generation unit that receives the document image and generates second text object data based on text objects recognized from the document image using OCR; and a final text object data output unit that compares the first text object data with the second text object data and, when they match within a preset error range, outputs final text object data obtained by merging the first text object data and the second text object data.
The system of claim 2, wherein the first text object data generation unit comprises: an encoding unit that receives the document image and encodes it into an image embedding; a decoding unit that generates a token sequence for converting the image embedding into information in a structured format; and a first text object data extraction unit that receives the image embedding and the token sequence as input and extracts the text objects of the document image according to the token sequence through a pre-built Document Understanding Transformer.
The system of claim 2, wherein the second text object data generation unit comprises: a document object detection unit that uses a text object detection model to detect character regions where characters are located in the document image, crops each detected character region as a single object to generate a cropped image, and generates a position value for each cropped image; and a second text object data extraction unit that extracts text objects by performing character recognition on the cropped images using a pre-established OCR model.
The system of claim 1, wherein the text object tagging unit tokenizes and recognizes the text objects and applies an NE tag to each token through a NER model equipped with a predefined NE dataset, thereby identifying the location of each text object and classifying its field.
The system of claim 5, wherein the document type classification unit comprises: a document type prediction unit that has a deep learning model pre-trained to classify document types according to the field information of the NE tags, inputs the NE tags to the deep learning model, predicts the document type according to the composition ratio of the field information of the input NE tags and the arrangement relationship of the field information of the NE tags applied to adjacent text objects, and outputs the predicted document type information as the type of the document image; and a classification model retraining unit that retrains the deep learning model by adjusting the weight of each field of the NE tags when different document type information is predicted for two or more prediction-error documents having the same composition ratio and arrangement relationship of the field information of the NE tags.
An unstructured document classification method based on deep learning natural language processing, comprising: a text object extraction step of receiving a document image generated from an unstructured document, extracting text objects from the document image using a Document Understanding Transformer and OCR (Optical Character Reader) separately, and integrating the extracted text objects into one and providing the result; a text object tagging step of performing NE tagging, using a NER model, on the text objects provided by the text object extraction step; and a document type classification step of using a pre-built deep learning model to classify the type of the document image according to the field combination of the NE tags applied in the text object tagging step.
The method of claim 7, wherein the text object extraction step comprises: a first text object data generation step of receiving the document image and generating first text object data based on text objects recognized from the document image using the Document Understanding Transformer; a second text object data generation step of receiving the document image and generating second text object data based on text objects recognized from the document image using OCR; and a final text object data output step of comparing the first text object data with the second text object data and, when they match within a preset error range, outputting final text object data obtained by merging the first text object data and the second text object data.
The method of claim 8, wherein the first text object data generation step comprises: an encoding step of receiving the document image and encoding it into an image embedding; a decoding step of generating a token sequence for converting the image embedding into information in a structured format; and a first text object data extraction step of receiving the image embedding and the token sequence as input and extracting the text objects of the document image according to the token sequence through a pre-built Document Understanding Transformer.
The method of claim 8, wherein the second text object data generation step comprises: a document object detection step of using a text object detection model to detect character regions where characters are located in the document image, cropping each detected character region as a single object to generate a cropped image, and generating a position value for each cropped image; and a second text object data extraction step of extracting text objects by performing character recognition on the cropped images using a pre-established OCR model.
The method of claim 7, wherein the text object tagging step tokenizes and recognizes the text objects and applies an NE tag to each token through a NER model equipped with a predefined NE dataset, thereby identifying the location of each text object and classifying its field.
The method of claim 11, wherein the document type classification step comprises: a document type prediction step in which a deep learning model pre-trained to classify document types according to the field information of the NE tags receives the NE tags as input, predicts the document type according to the composition ratio of the field information of the input NE tags and the arrangement relationship of the field information of the NE tags applied to adjacent text objects, and outputs the predicted document type information as the type of the document image; and a classification model retraining step in which, when different document type information is predicted for two or more prediction-error documents having the same composition ratio and arrangement relationship of the field information of the NE tags, the weight of each field of the NE tags is adjusted to retrain the deep learning model.
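
The comparison and merging of the first and second text object data, and the composition ratio and arrangement relationship of NE tag fields recited in the claims above, can be illustrated with a minimal Python sketch. It is one possible reading under stated assumptions, not the implementation of the invention. The claims do not say how the "preset error range" is measured, so the sketch assumes a character-level mismatch computed with the standard library's difflib. It also assumes that a merged object keeps the text recognized by the Document Understanding Transformer together with the position value from the OCR branch. All names (TextObject, mismatch, merge_text_objects, field_composition_ratio, adjacent_field_pairs), the default threshold of 0.2, and the tag labels in the usage note are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass(frozen=True)
class TextObject:
    """A recognized text object; box is (x, y, width, height) in pixels."""
    text: str
    box: tuple


def mismatch(a: str, b: str) -> float:
    """Return 0.0 for identical strings and 1.0 for strings with no overlap.

    Assumed metric: one minus difflib's similarity ratio.
    """
    return 1.0 - SequenceMatcher(None, a, b).ratio()


def merge_text_objects(transformer_texts, ocr_objects, max_error=0.2):
    """Merge the two extraction results where they agree within max_error.

    Assumption: a merged object keeps the transformer's text and the OCR
    position value. Objects without a matching counterpart are left out.
    """
    remaining = list(transformer_texts)
    merged = []
    for ocr_obj in ocr_objects:
        for candidate in remaining:
            if mismatch(candidate, ocr_obj.text) <= max_error:
                merged.append(TextObject(candidate, ocr_obj.box))
                remaining.remove(candidate)  # use each transformer text once
                break
    return merged


def field_composition_ratio(tags):
    """Share of each NE field among all tags of one document."""
    total = len(tags)
    return {field: count / total for field, count in Counter(tags).items()}


def adjacent_field_pairs(tags):
    """Count how often one NE field directly follows another."""
    return Counter(zip(tags, tags[1:]))
```

For example, calling merge_text_objects(["INVOICE NO 1234"], [TextObject("INVOICE N0 1234", (10, 20, 200, 30))]) pairs the two readings despite the single misrecognized character and returns one object carrying the transformer text and the OCR box. The two feature functions take a document's NE tags in reading order, such as ["DATE", "AMOUNT", "AMOUNT", "VENDOR"], and produce the kind of ratio and adjacency features that a document type classifier could consume.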

Description

Deep learning-based unstructured document classification system and method

The present invention relates to a deep learning-based unstructured document classification system and method, and more specifically, to an unstructured document processing system and method capable of understanding and classifying unstructured documents by combining a DUT model and an OCR model to derive mutually complementary text recognition results.

Generally, unstructured documents consisting of text are processed in modern work environments just like structured documents such as invoices, receipts, bills of lading, and business cards; some of these exist as digital electronic files, while others exist in the form of scanned images or photographs. Visual Document Understanding (VDU) is a task that aims to understand documents regardless of their various formats, layouts, and content; therefore, it can be considered an important prerequisite for automated document processing.

Among VDU systems designed to understand the content and determine the type of unstructured documents, template-based OCR solutions are representative. To implement such an OCR solution, it is necessary to create a list of patterns that each field can represent and generate regular expressions for each. Furthermore, a process of searching and matching every document line to determine which pattern it matches is required, and through this process, it is possible to infer which field a specific extracted line belongs to.

However, approaches using such OCR solutions have limitations in identifying types of unstructured documents. In other words, document pattern matching is not suitable for unstructured data or documents whose structures change frequently.

FIG. 1 is a block diagram showing the overall configuration of an unstructured document classification system based on deep learning natural language processing according to one embodiment of the present invention.
FIG. 2 is a block diagram showing the configuration of a text object extraction unit according to an embodiment of the present invention.
FIG. 3 is a block diagram showing the configuration of a document type classification unit according to one embodiment of the present invention.
FIGS. 4 and 5 are diagrams illustrating the NE (Named Entity) tagging method of a text object tagging unit according to an embodiment of the present invention.
FIG. 6 is a block diagram showing the configuration of a classification model retraining unit according to one embodiment of the present invention.
FIG. 7 is a diagram illustrating the operation method of a web crawling execution unit according to an embodiment of the present invention.
FIG. 8 is a diagram illustrating the operation method of a field weighting adjustment unit according to an embodiment of the present invention.
FIG. 9 is a flowchart showing the overall configuration of an unstructured document classification method based on deep learning natural language processing according to another embodiment of the present invention.
FIG. 10 is a flowchart showing the configuration of a text object extraction step according to another embodiment of the present invention.
FIG. 11 is a flowchart showing the configuration of a document type classification step according to another embodiment of the present invention.
FIG. 12 is a flowchart showing the configuration of a classification model retraining step according to another embodiment of the present invention.

The terms used in this specification will be briefly explained, and the invention will be described in detail. The terms used in this invention have been selected based on currently widely used general terms, taking into account their functions within the invention; however, these terms may vary depending on the intent of those skilled in the art, case law, the emergence of new technologies, etc.
Additionally, in specific cases, terms have been arbitrarily selected by the applicant, and in such cases, their meanings will be described in detail in the relevant description of the invention. Therefore, the terms used in this invention should be defined not merely by their names, but based on their meanings and the overall content of the invention.

When a part is described in the specification as "comprising" a certain component, this means that, unless specifically stated otherwise, other components are not excluded and may additionally be included. Furthermore, terms such as "...part" or "module" as used in the specification refer to a unit that processes at least one function or operation, which may be implemented in hardware, in software, or as a combination of hardware and software.

Embodiments of the present invention are described below with reference to the attached drawings so that those skilled in the art can easily implement them. However, the present invention may be embodied in various different forms and is not limited to the embodiments described herein. Furthermore, in order to clear