CN-115273118-B - Method and device for extracting key information for document image recognition


Abstract

The invention provides a key information extraction method and device for document image recognition. The method comprises: determining a preset number of candidate key values for each key based on an OCR recognition result; determining a total candidate-path score over all keys' selections of candidate key values, according to the score of the candidate key value selected by each key; and determining the target key value selected by each key when the total candidate-path score is highest, thereby obtaining the corresponding information extraction result. The score for a key selecting a given candidate key value includes at least a score determined by the positional relationship between the key and that candidate key value. The method avoids the errors that rule-based extraction methods are prone to because they rely on fixed keywords. Because it considers the selections of all candidate key values jointly, it exploits the interdependence and mutual exclusivity among key information items, achieves higher accuracy, and solves the global optimization problem of extracting information about entities of the same category.

Inventors

LIU CHANGSONG
WANG YANWEI
LI JIE
ZHANG YUQI
ZHANG RUIXUE
ZHANG CHEN

Assignees

Tsinghua University
Shanghai Pudong Development Bank Co., Ltd.

Dates

Publication Date: 2026-05-05
Application Date: 2022-06-23

Claims (9)

1. A key information extraction method for document image recognition, comprising: determining a preset number of candidate key values for each key based on an OCR recognition result; determining a total candidate-path score over all keys' selections of candidate key values according to the score of the candidate key value selected by each key; and determining, under the condition that the total candidate-path score is highest, the target key value selected by each key to obtain a corresponding information extraction result; wherein the score for each key selecting a corresponding candidate key value comprises at least a score value determined by the positional relationship between the key and the candidate key value; wherein determining the total candidate-path score according to the score of the candidate key value selected by each key comprises: determining the score for each key selecting a corresponding candidate key value according to a geometric feature score, an OCR recognition score and an entity recognition score; and determining the total candidate-path score over all keys according to the scores of all keys selecting their corresponding candidate key values; wherein the geometric feature score is a score value determined according to the positional relationship between the key and the candidate key value, the OCR recognition score is a score value determined according to the posterior confidence of OCR recognition, and the entity recognition score is a score value determined according to the posterior probability output by an entity recognition model.
2. The key information extraction method for document image recognition according to claim 1, wherein determining a preset number of candidate key values for each key comprises: for each key, finding, according to the rectangular-frame distance between the key and each key value, the preset number of candidate key values whose rectangular frames are nearest to the key; wherein the rectangular-frame distance is calculated by taking the two points, one on the rectangular frame of the key and one on that of the key value, that are nearest in Euclidean distance, and using the smaller of the difference between their abscissas and the difference between their ordinates.
3. The key information extraction method for document image recognition according to claim 1, wherein determining the total candidate-path score according to the score of the candidate key value selected by each key comprises: for each key that shares no candidate key value with another key, calculating the score for selecting the corresponding candidate key value; for keys that share candidate key values, selecting candidate key values in a fixed priority order of the keys, such that a candidate key value already selected by another key is not selected again; and, for a key with shared candidate key values that has no selectable candidate key value left, assigning a candidate key value score of 0.
4. The key information extraction method for document image recognition according to claim 1, further comprising, before determining the score for each key selecting a corresponding candidate key value according to the geometric feature score, the OCR recognition score and the entity recognition score: determining the geometric feature score according to the following formula: …; wherein d is the distance between the rectangular frame of the selected candidate key value and the rectangular frame of the key, d1, d2 …, and h1 and h2 are the heights of the rectangular frames of the key and of the selected candidate key value characters, respectively.
5. The key information extraction method for document image recognition according to claim 1, further comprising, before determining the score for each key selecting a corresponding candidate key value according to the geometric feature score, the OCR recognition score and the entity recognition score: determining the OCR recognition score according to the following formula: …; wherein R is the OCR recognition score of the selected candidate key value, m is the number of characters in the selected candidate key value, and … is the posterior confidence of recognizing the j-th character of the selected candidate key value.
6. A key information extraction apparatus for document image recognition, comprising: a preliminary extraction module, configured to determine a preset number of candidate key values for each key based on an OCR recognition result; a scoring processing module, configured to determine a total candidate-path score over all keys' selections of candidate key values according to the score of the candidate key value selected by each key; and an accurate extraction module, configured to determine, under the condition that the total candidate-path score is highest, the target key value selected by each key to obtain a corresponding information extraction result; wherein the score for each key selecting a corresponding candidate key value comprises at least a score value determined by the positional relationship between the key and the candidate key value; wherein the scoring processing module 402 is specifically configured to determine the score for each key selecting a corresponding candidate key value according to a geometric feature score, an OCR recognition score and an entity recognition score, and to determine the total candidate-path score over all keys according to the scores of all keys selecting their corresponding candidate key values; wherein the geometric feature score is a score value determined according to the positional relationship between the key and the candidate key value, the OCR recognition score is a score value determined according to the posterior confidence of OCR recognition, and the entity recognition score is a score value determined according to the posterior probability output by an entity recognition model.
7. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the key information extraction method for document image recognition according to any one of claims 1 to 5 when the program is executed by the processor.
8. A non-transitory computer readable storage medium having stored thereon a computer program, wherein the computer program when executed by a processor implements the key information extraction method for document image recognition according to any one of claims 1 to 5.
9. A computer program product comprising a computer program which, when executed by a processor, implements the key information extraction method for document image recognition according to any one of claims 1 to 5.

Description

Method and device for extracting key information for document image recognition

Technical Field

The invention relates to the field of artificial intelligence, and in particular to a key information extraction method and device for document image recognition.

Background

After a text or document image is recognized, key information is extracted from the recognized content. Existing key information extraction methods fall into two kinds. One is rule-based: the corresponding key information is searched for using regular expressions or edit distance. The other performs entity recognition with machine learning or deep learning methods and outputs the key values and position information corresponding to the keywords. Rule-based extraction readily extracts key values when the keywords are fixed or the features are easy to describe; it is simple but error-prone. Deep learning models such as BERT or the Transformer give a prediction for each key value, but they do not evaluate the result at the level of the whole page or document, and they ignore the interdependence and mutual exclusivity among key information items. Nor can they directly solve, at the model level, the problem of distinguishing items of the same entity category: a payer and a payee, for example, belong to the same entity category but must be distinguished and extracted separately.

Disclosure of Invention

In view of the problems in the prior art, the invention provides a key information extraction method and device for document image recognition.
The invention provides a key information extraction method for document image recognition, comprising: determining a preset number of candidate key values for each key based on an OCR (Optical Character Recognition) result; determining the total candidate-path score over all keys' selections of candidate key values according to the score of the candidate key value selected by each key; and determining the target key value selected by each key when the total candidate-path score is highest, so as to obtain the corresponding information extraction result; wherein the score for each key selecting a corresponding candidate key value includes at least a score value determined by the positional relationship between the key and the candidate key value. In the key information extraction method provided by the invention, determining the candidate key values comprises finding, around each key, the preset number of candidate key values nearest to it according to the rectangular-frame distance between the key and each key value. The rectangular-frame distance is computed by taking the two points of the key's and the key value's rectangular frames that are nearest in Euclidean distance and using the smaller of their abscissa difference and ordinate difference.
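The candidate-selection step can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the `Box` type, the helper names `box_distance` and `nearest_candidates`, and the Euclidean tie-break are all hypothetical additions. The distance itself follows a literal reading of the text: for two axis-aligned boxes, the closest pair of points differs along each axis by exactly the gap between the nearest edges, so the rectangular-frame distance is the smaller of those two gaps.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Hypothetical axis-aligned text box in image coordinates (x0 < x1, y0 < y1)."""
    x0: float
    y0: float
    x1: float
    y1: float


def _gaps(a: Box, b: Box) -> tuple[float, float]:
    """Abscissa and ordinate differences of the closest pair of points on two boxes.

    Each gap is 0 when the boxes' projections on that axis overlap.
    """
    dx = max(b.x0 - a.x1, a.x0 - b.x1, 0.0)
    dy = max(b.y0 - a.y1, a.y0 - b.y1, 0.0)
    return dx, dy


def box_distance(a: Box, b: Box) -> float:
    """Literal reading of the rectangular-frame distance: the smaller of the
    abscissa and ordinate differences of the closest pair of points."""
    dx, dy = _gaps(a, b)
    return min(dx, dy)


def nearest_candidates(key: Box, values: list[Box], k: int) -> list[int]:
    """Indices of the k value boxes nearest to `key` under box_distance.

    Ties are broken by the Euclidean distance of the closest points; this
    tie-break is an assumption, not something the text specifies.
    """
    def rank(i: int) -> tuple[float, float]:
        dx, dy = _gaps(key, values[i])
        return min(dx, dy), math.hypot(dx, dy)

    return sorted(range(len(values)), key=rank)[:k]
```

Under this literal reading, any value box on the same text line as the key (vertical gap 0) has distance 0, which is why the sketch adds a secondary sort on the true Euclidean gap. For a key at `Box(0, 0, 10, 10)`, a box 5 pixels below it ranks ahead of a box 10 pixels to its right.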
In the key information extraction method provided by the invention, determining the total candidate-path score according to the score of the candidate key value selected by each key comprises: determining the score for each key selecting a corresponding candidate key value according to a geometric feature score, an OCR recognition score and an entity recognition score, and determining the total candidate-path score over all keys according to the scores of all keys selecting their corresponding candidate key values. The geometric feature score is determined by the positional relationship between the key and the candidate key value, the OCR recognition score by the posterior confidence of OCR recognition, and the entity recognition score by the posterior probability output by an entity recognition model. Further, determining the total candidate-path score comprises: for each key that shares no candidate key value with another key, calculating the score for selecting the corresponding candidate key value; for keys that share candidate key values, selecting candidate key values in a fixed priority order of the keys, without selecting a candidate key value already selected by another key; and, for such a key, assigning a score of 0 if no candidate key value remains selectable.
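The path scoring and global selection described above can be sketched as an exhaustive search. This is a hedged illustration, not the invention's implementation. Three points are assumptions: the names `combined_score` and `best_path`; combining the three scores as a weighted sum (the text does not say how they are combined); and treating a "candidate path" as an assignment built key by key in the fixed priority order, where each key picks any candidate not already taken by a higher-priority key.

```python
from typing import Dict, List, Optional, Tuple

Assignment = Dict[str, Optional[str]]


def combined_score(geo: float, ocr: float, ent: float,
                   w: Tuple[float, float, float] = (1.0, 1.0, 1.0)) -> float:
    """Hypothetical per-pair score: a weighted sum of the geometric, OCR and
    entity recognition scores. The source does not state the combination rule."""
    return w[0] * geo + w[1] * ocr + w[2] * ent


def best_path(scores: Dict[str, Dict[str, float]],
              priority: List[str]) -> Tuple[float, Assignment]:
    """Exhaustive search for the highest-scoring candidate path.

    scores[key][cand] is the score of `key` selecting candidate `cand`.
    Keys are visited in the fixed `priority` order; a candidate already
    taken by an earlier key is not selectable, and a key with no selectable
    candidate left is assigned None and contributes 0 to the path total.
    """
    best_total = float("-inf")
    best_assign: Assignment = {}

    def dfs(i: int, taken: frozenset, total: float, assign: Assignment) -> None:
        nonlocal best_total, best_assign
        if i == len(priority):
            # A complete path: keep it if its total score is the highest so far.
            if total > best_total:
                best_total, best_assign = total, dict(assign)
            return
        key = priority[i]
        free = [c for c in scores[key] if c not in taken]
        if not free:
            assign[key] = None  # nothing selectable: this key scores 0
            dfs(i + 1, taken, total, assign)
            return
        for cand in free:
            assign[key] = cand
            dfs(i + 1, taken | {cand}, total + scores[key][cand], assign)

    dfs(0, frozenset(), 0.0, {})
    return best_total, best_assign
```

For a payer and a payee that share candidates `A` and `B`, with scores `{"payer": {"A": 3.0, "B": 1.0}, "payee": {"A": 2.5, "B": 2.0}}`, the search compares the paths (payer A, payee B) with total 5.0 and (payer B, payee A) with total 3.5, and returns the first. This shows how a joint search can separate same-category entities that a per-key argmax would both assign to `A`. Exhaustive search grows as k^n for n keys with k candidates each, so it suits only a handful of keys; a practical system would likely prune or restrict the search to keys that actually share candidates.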
According to the method for extracting key information for document image recognition provided by the invention, before determining the score when each key selects the corresponding candidate key value according to the geometric feature score, the OCR recognition score and the entity recognition score, the method furt