CN-122024249-A - Method, system, device, medium and program product for classifying document images

Abstract

The present disclosure provides a method, a system, a device, a medium and a program product for classifying document images. The method comprises: performing image classification on a document image to be classified by using an image classification model to obtain a dimension vector of the image classification model; extracting text from the document image by using an OCR model and segmenting the extracted text into words to obtain a word segmentation set; performing text classification on the word segmentation set by using a text classification model to obtain a dimension vector of the text classification model; and inputting the dimension vector of the image classification model and the dimension vector of the text classification model into a multi-modal model to obtain a classification result of the document image to be classified. By combining the image classification model, the OCR model, the text classification model and the multi-modal model to perform image classification, text extraction, text classification and multi-modal classification on the document image, the method avoids the difficulties of real-time response and tuning and improves classification accuracy.
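To make the data flow of the abstract concrete, here is a minimal Python sketch of the four-stage pipeline. It also folds in the early-exit gating described in claims 2, 3 and 5 and the OCR-confidence and vocabulary filtering of claim 4. All of the following are assumptions, not details from the disclosure: every model is a hypothetical callable (`image_model`, `ocr_model`, `segment`, `text_model`, `fusion_model`), the `Prediction` type is invented for illustration, a model's "dimension vector" is taken to be a plain list of floats it returns, and the default thresholds 0.9 and 0.8 are placeholders for the unspecified preset confidences.

```python
from dataclasses import dataclass
from typing import Callable, List, Set, Tuple

Vector = List[float]


@dataclass
class Prediction:
    """Output of a hypothetical classifier: label, confidence and its 'dimension vector'."""
    label: str
    confidence: float
    vector: Vector


@dataclass
class DocumentClassifier:
    # All models are hypothetical callables supplied by the caller.
    image_model: Callable[[object], Prediction]
    ocr_model: Callable[[object], List[Tuple[str, float]]]  # returns (text, confidence) pairs
    segment: Callable[[str], List[str]]                      # word segmenter
    text_model: Callable[[List[str]], Prediction]
    fusion_model: Callable[[Vector, Vector], str]            # the multi-modal model
    target_classes: Set[str]
    vocabulary: Set[str]
    threshold: float = 0.9       # assumed "preset confidence" for classifier outputs
    ocr_threshold: float = 0.8   # assumed "preset confidence" for OCR text

    def classify(self, image: object) -> str:
        # Image model; stop early on a confident target-class result (claims 2 and 5).
        img = self.image_model(image)
        if img.label in self.target_classes and img.confidence > self.threshold:
            return img.label

        # OCR: keep confident text, segment it, drop out-of-vocabulary words (claim 4).
        words = [
            word
            for text, conf in self.ocr_model(image)
            if conf > self.ocr_threshold
            for word in self.segment(text)
            if word in self.vocabulary
        ]

        # Text model; stop early on a confident result (claims 3 and 5).
        txt = self.text_model(words)
        if txt.confidence > self.threshold:
            return txt.label

        # Otherwise fuse both dimension vectors in the multi-modal model (claim 1).
        return self.fusion_model(img.vector, txt.vector)
```

With the early exits, the multi-modal model only runs on images that neither single-modality model classifies confidently. This is one plausible reading of how the method addresses the real-time concern, not a statement of the authors' implementation.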

Inventors

YANG WEIJIA
YANG ZHI
ZHAO GUOQIANG
ZHANG YUQIN
CHEN GUORUI

Assignees

上海赢科信息技术有限公司 (Shanghai Yingke Information Technology Co., Ltd.)

Dates

Publication Date: 2026-05-12
Application Date: 2024-11-11

Claims (10)

1. A method for classifying document images, the method comprising: performing image classification on a document image to be classified by using an image classification model to obtain a dimension vector of the image classification model; extracting text from the document image to be classified by using an OCR model, and segmenting the extracted text into words to obtain a word segmentation set; performing text classification on the word segmentation set by using a text classification model to obtain a dimension vector of the text classification model; and inputting the dimension vector of the image classification model and the dimension vector of the text classification model into a multi-modal model to obtain a classification result of the document image to be classified.
2. The method for classifying document images according to claim 1, wherein the step of performing image classification on the document image to be classified by using the image classification model to obtain the dimension vector of the image classification model comprises: inputting the document image to be classified into the image classification model to obtain an image classification result of the document image; and acquiring the dimension vector of the image classification model in response to the image classification result not belonging to a target class, or the image classification result belonging to the target class with a confidence less than or equal to a preset confidence.
3. The method for classifying document images according to claim 2, wherein the step of performing text classification on the word segmentation set by using the text classification model to obtain the dimension vector of the text classification model comprises: inputting the word segmentation set into the text classification model to obtain a text classification result; and acquiring the dimension vector of the text classification model in response to the confidence of the text classification result being less than or equal to the preset confidence.
4. The method for classifying document images according to claim 1, wherein the step of extracting text from the document image to be classified by using the OCR model and segmenting the extracted text to obtain the word segmentation set comprises: inputting the document image to be classified into the OCR model for text extraction to obtain target text whose text confidence is greater than a preset confidence; and segmenting the target text into words and filtering out words that do not belong to a vocabulary to obtain the word segmentation set; and/or the step of inputting the dimension vector of the image classification model and the dimension vector of the text classification model into the multi-modal model to obtain the classification result of the document image to be classified comprises: inputting the dimension vector of the image classification model and the dimension vector of the text classification model into an attention module to obtain attention weights; fusing the two dimension vectors based on the attention weights to obtain a fused dimension vector; and inputting the fused dimension vector into a classification module to obtain the classification result of the document image to be classified.
5. The method for classifying document images according to claim 3, further comprising: outputting the image classification result as the classification result of the document image to be classified in response to the image classification result belonging to the target class with a confidence greater than the preset confidence; and/or outputting the classification result of the document image to be classified in response to the confidence of the text classification result being greater than the preset confidence.
6. The method for classifying document images according to claim 1, further comprising: acquiring target text whose text confidence is greater than a preset confidence, and segmenting the target text into words to obtain a word segmentation result; filtering the word segmentation result to obtain a word-frequency and category matrix; combining the word segmentation sets of the document images of each category to obtain a combined word segmentation set; removing, from the combined word segmentation set, words whose proportion is less than a preset proportion; deduplicating the remaining words to obtain a frequent word set; filtering the word-frequency and category matrix based on the frequent word set to remove word columns that are not in the frequent word set, so as to obtain a new word-frequency and category matrix; performing a chi-square test on the new word-frequency and category matrix to obtain a test result; obtaining a target word set consisting of words whose p-value in the test result is less than or equal to a preset p-value; and obtaining the vocabulary based on the target word set.
7. A classification system for document images, comprising: an image classification module configured to perform image classification on a document image to be classified by using an image classification model to obtain a dimension vector of the image classification model; a text extraction module configured to extract text from the document image to be classified by using an OCR model and to segment the extracted text into words to obtain a word segmentation set; a text classification module configured to perform text classification on the word segmentation set by using a text classification model to obtain a dimension vector of the text classification model; and a multi-modal classification module configured to input the dimension vector of the image classification model and the dimension vector of the text classification model into a multi-modal model to obtain a classification result of the document image to be classified.
8. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the computer program, implements the method for classifying document images according to any one of claims 1 to 6.
9. A computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the method for classifying document images according to any one of claims 1 to 6.
10. A computer program product comprising a computer program which, when executed by a processor, implements the method for classifying document images according to any one of claims 1 to 6.
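The vocabulary construction of claim 6 is the most algorithmic step in the claims. Below is a minimal stdlib-only Python sketch under stated assumptions. Each document arrives already OCR-filtered and segmented, which covers the first step of claim 6. The proportion threshold is read as a word's share of all tokens in its category's combined word list. The significance threshold in the test result is taken to be a chi-square p-value. The test uses the per-word statistic of observed versus expected category counts, which is the same statistic scikit-learn's `chi2` computes. The names `chi2_sf` and `build_vocabulary` and the thresholds `min_ratio` and `max_p` are hypothetical.

```python
import math
from collections import Counter
from typing import List, Sequence, Set


def chi2_sf(x: float, df: int) -> float:
    """P(X >= x) for a chi-square variable with integer df >= 1 (stdlib only)."""
    if x <= 0:
        return 1.0
    z = x / 2.0
    # Regularized upper incomplete gamma Q(df/2, z), built by the recurrence
    # Q(a + 1, z) = Q(a, z) + z**a * exp(-z) / Gamma(a + 1).
    if df % 2 == 0:
        a, q = 1.0, math.exp(-z)             # Q(1, z)
    else:
        a, q = 0.5, math.erfc(math.sqrt(z))  # Q(1/2, z)
    while a < df / 2.0:
        q += math.exp(a * math.log(z) - z - math.lgamma(a + 1))
        a += 1.0
    return q


def build_vocabulary(
    docs: Sequence[List[str]],   # segmented, OCR-filtered words per document
    labels: Sequence[str],       # category of each document
    min_ratio: float = 0.05,     # hypothetical preset proportion
    max_p: float = 0.05,         # hypothetical preset p-value
) -> Set[str]:
    categories = sorted(set(labels))
    if len(categories) < 2:
        raise ValueError("need at least two categories for a chi-square test")

    # Combine each category's words, drop low-share words, deduplicate.
    frequent: Set[str] = set()
    for cat in categories:
        combined = Counter(w for doc, y in zip(docs, labels) if y == cat for w in doc)
        total = sum(combined.values())
        if total:
            frequent |= {w for w, n in combined.items() if n / total >= min_ratio}

    # Word-frequency rows per document; only frequent words are looked up below.
    counts = [Counter(doc) for doc in docs]
    class_size = Counter(labels)
    n_docs = len(docs)
    df = len(categories) - 1

    vocabulary: Set[str] = set()
    for word in sorted(frequent):
        word_total = sum(c[word] for c in counts)
        stat = 0.0
        for cat in categories:
            observed = sum(c[word] for c, y in zip(counts, labels) if y == cat)
            expected = class_size[cat] / n_docs * word_total
            stat += (observed - expected) ** 2 / expected
        if chi2_sf(stat, df) <= max_p:
            vocabulary.add(word)
    return vocabulary
```

The sketch tallies observed counts per category instead of building a dense matrix, which keeps it dependency-free. With NumPy and scikit-learn available, `sklearn.feature_selection.chi2(X, y)` returns the same per-word statistics and p-values for a document-term count matrix `X`.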

Description

Method, system, device, medium and program product for classifying document images
Technical Field
The present disclosure relates to the field of image processing technologies, and in particular to a method, a system, a device, a medium and a program product for classifying document images.
Background
In many fields, such as insurance claims, coupon verification and after-sales service, document images uploaded by applicants must be classified. Existing techniques process these document images with an image classification model, an OCR (Optical Character Recognition) model or a multi-modal model used alone. Using any of these methods alone has the following problems.
Image features are hard to distinguish. Form images in particular are extremely similar to one another and are difficult to tell apart by their visual appearance, which makes accurate recognition by an image classification model difficult.
Text extraction is incomplete. Under varied shooting conditions, even the best-performing OCR model cannot extract all characters: words run together, characters are dropped and characters are misrecognized. This degrades the accuracy of classifying documents from OCR-extracted text.
Computing requirements are high and response is slow. A multi-modal model can be trained for classification, but its huge number of parameters demands substantial computing power, which makes real-time response requirements difficult to meet.
Multi-modal optimization is difficult. When a multi-modal model misclassifies, the black-box nature of neural networks makes it nearly impossible to locate the cause with existing techniques. The only remedy is to accumulate more samples and retrain, which can have unpredictable effects on every class.
Because of these problems, most self-service claims and verification systems still require substantial manual review after model prescreening. When documents are found to be missing or uploaded incorrectly, they must be returned and resubmitted, often over several rounds, which is time-consuming and labor-intensive. Since real-time automatic image classification cannot be achieved, customer satisfaction keeps declining.
Disclosure of Invention
The technical problem to be solved by the present disclosure is to overcome the low accuracy, difficult real-time response and difficult optimization of prior-art approaches that classify document images with a single model, and to provide a method, a system, a device, a medium and a program product for classifying document images.
The technical problem is solved by the following technical solutions.
A first aspect of the present disclosure provides a method for classifying document images, comprising: performing image classification on a document image to be classified by using an image classification model to obtain a dimension vector of the image classification model; extracting text from the document image to be classified by using an OCR model, and segmenting the extracted text into words to obtain a word segmentation set; performing text classification on the word segmentation set by using a text classification model to obtain a dimension vector of the text classification model; and inputting the dimension vector of the image classification model and the dimension vector of the text classification model into a multi-modal model to obtain a classification result of the document image to be classified.
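Claim 4 refines the final step of this first aspect: an attention module produces attention weights, the two dimension vectors are fused with those weights, and a classification module maps the fused vector to a class. The disclosure does not specify the attention form, so the following pure-Python sketch assumes the simplest one. Both vectors are assumed to already share a common length. Each vector gets a scalar score from a learned query vector, a softmax turns the two scores into weights, and a linear layer plus softmax serves as the classification module. `attn_query`, `class_weights` and `class_bias` are hypothetical learned parameters, and the function names are illustrative.

```python
import math
from typing import List, Sequence, Tuple


def softmax(scores: Sequence[float]) -> List[float]:
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def fuse_and_classify(
    img_vec: Sequence[float],
    txt_vec: Sequence[float],
    attn_query: Sequence[float],               # hypothetical learned query
    class_weights: Sequence[Sequence[float]],  # hypothetical: one row per class
    class_bias: Sequence[float],
) -> Tuple[int, List[float], List[float]]:
    if len(img_vec) != len(txt_vec):
        raise ValueError("sketch assumes both vectors share one dimension")
    # Attention module: one score per modality, normalized into weights.
    alpha = softmax([dot(attn_query, img_vec), dot(attn_query, txt_vec)])
    # Fusion: attention-weighted sum of the two dimension vectors.
    fused = [alpha[0] * a + alpha[1] * b for a, b in zip(img_vec, txt_vec)]
    # Classification module: linear layer followed by softmax.
    probs = softmax([dot(w, fused) + b for w, b in zip(class_weights, class_bias)])
    best = max(range(len(probs)), key=probs.__getitem__)
    return best, alpha, probs
```

Returning the attention weights alongside the prediction shows which modality dominated a given decision. That visibility is relevant to the tuning difficulty raised in the background, although the disclosure does not say the weights are used this way.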
Preferably, the step of performing image classification on the document image to be classified by using the image classification model to obtain the dimension vector of the image classification model comprises: inputting the document image to be classified into the image classification model to obtain an image classification result of the document image; and acquiring the dimension vector of the image classification model in response to the image classification result not belonging to a target class, or the image classification result belonging to the target class with a confidence less than or equal to a preset confidence.
Preferably, the step of performing text classification on the word segmentation set by using the text classification model to obtain the dimension vector of the text classification model comprises: inputting the word segmentation set into the text classification model to obtain a text classification result; and, in response to the confidence of the text classification result being less than or equal to the preset confidence, acquiring the dimension vector of the