CN-115797954-B - Form identification method, form identification device, electronic equipment, medium and program product
Abstract
Embodiments of this specification disclose a form identification method, a form identification apparatus, an electronic device, a medium, and a program product. The method comprises: obtaining a target image containing a table; determining, based on the target image, global relation information corresponding to the table, where the global relation information comprises the relations among all target characters in the table and the relations among all target cells; determining, based on the global relation information, local relation information corresponding to the table, where the local relation information comprises the relations among the target characters in each first area of the table and the relations among the target cells in each second area of the table; and finally reconstructing the table in the target image based on the local relation information.
Inventors
- XIA BOQIAN
- WANG HONGBIN
Assignees
- 支付宝(杭州)信息技术有限公司
Dates
- Publication Date
- 20260505
- Application Date
- 20221129
Claims (16)
- 1. A method of table identification, the method comprising: acquiring a target image containing a table; determining global relation information corresponding to the table based on the target image, wherein the global relation information comprises the relations among all target characters in the table and the relations among all target cells; determining local relation information corresponding to the table based on the global relation information, wherein the local relation information comprises target position relations among the target characters in each first area of the table and target row-column relations among the target cells in each second area of the table; and reconstructing the table in the target image based on the local relation information; wherein the determining of the local relation information corresponding to the table based on the global relation information comprises: determining the target position relations based on target feature information corresponding to the table, wherein the target feature information comprises the global relation information; and determining the target row-column relations based on the target position relations.
- 2. The method of claim 1, wherein the determining of the global relation information corresponding to the table based on the target image comprises: determining multi-modal information corresponding to each target character in the table based on the target characters in the target image, the target positions corresponding to the target characters, and the target image; and determining the global relation information corresponding to the table based on the multi-modal information.
- 3. The method of claim 2, wherein the multi-modal information comprises text-modality information, position-modality information, and image-feature-modality information corresponding to the target characters; and the determining of the multi-modal information corresponding to each target character in the table based on the target characters in the target image, the target positions corresponding to the target characters, and the target image comprises: extracting target image features corresponding to the target characters from the target image based on the target characters in the target image and the target positions corresponding to the target characters; and determining the text-modality information based on the target characters in the target image, the position-modality information based on the target positions corresponding to the target characters, and the image-feature-modality information based on the target image features corresponding to the target characters.
- 4. The method of claim 2, wherein the determining of the global relation information corresponding to the table based on the multi-modal information comprises: inputting the multi-modal information corresponding to each target character in the table into an encoder, and outputting target feature information corresponding to the table, wherein the target feature information comprises first target multi-modal feature information of the multi-modal information corresponding to each target character in the table and the global relation information corresponding to the table.
- 5. The method of claim 4, wherein the determining of the local relation information corresponding to the table based on the global relation information comprises: determining the target position relations based on the target feature information, wherein the target position relations are used to characterize whether the target characters in each first area of the table are located in the same cell; and determining the target row-column relations based on the target position relations and the first target multi-modal feature information.
- 6. The method of claim 5, wherein the determining of the target position relations based on the target feature information comprises: constructing a target single-character sub-graph set based on the target feature information, wherein the target single-character sub-graph set comprises at least one target single-character sub-graph, and each target single-character sub-graph comprises a plurality of target character nodes located in the same area of the table and the connection relations among the plurality of target character nodes; and inputting the target single-character sub-graph set into a first graph convolutional neural network, and outputting the target position relations among the target character nodes in each target single-character sub-graph, wherein the first graph convolutional neural network is trained on a plurality of single-character sub-graphs with known position relations among their character nodes.
- 7. The method of claim 5, wherein the determining of the target row-column relations based on the target position relations and the first target multi-modal feature information comprises: constructing a target cell graph set based on the target position relations and the first target multi-modal feature information, wherein the target cell graph set comprises at least one target cell graph, each target cell graph comprises a plurality of target cell nodes located in the same area of the table and the connection relations among the plurality of target cell nodes, each target cell node comprises second target multi-modal feature information corresponding to the target cell, and the second target multi-modal feature information is obtained based on the first target multi-modal feature information corresponding to all target characters located in the target cell; and inputting the target cell graph set into a second graph convolutional neural network, and outputting the target row-column relations among the target cell nodes in each target cell graph, wherein the second graph convolutional neural network is trained on a plurality of cell graphs with known row-column relations among their cell nodes.
- 8. The method of claim 7, wherein the constructing of the target cell graph set based on the target position relations and the first target multi-modal feature information comprises: determining target cell information corresponding to each target cell in the table based on the target position relations, wherein the target cell information comprises the target characters located in that target cell; fusing the first target multi-modal feature information corresponding to each target character in the target cell to obtain the second target multi-modal feature information; and constructing the target cell graph set based on the second target multi-modal feature information.
- 9. The method of claim 6, wherein the plurality of target character nodes comprises a central target character node, a first target character node adjacent to the central target character node, and a second target character node adjacent to the first target character node, and wherein the target position relations are used to characterize the position relations between the central target character node and the first and second target character nodes.
- 10. The method of claim 7, wherein the plurality of target cell nodes comprises a central target cell node, a first target cell node adjacent to the central target cell node, and a second target cell node adjacent to the first target cell node, and wherein the target row-column relations are used to characterize the row-column relations between the central target cell node and the first and second target cell nodes.
- 11. The method of claim 5, wherein the reconstructing of the table in the target image based on the local relation information comprises: reconstructing the table in the target image according to the target position relations and the target row-column relations in the table.
- 12. The method of claim 8, wherein the reconstructing of the table in the target image based on the local relation information comprises: generating a target row graph set and a target column graph set based on the target row-column relations and the target cell graph set, wherein the target row graph set comprises a target row graph corresponding to each row of the table, each target row graph comprises the target cell nodes corresponding to the target cells located in the same row, the target column graph set comprises a target column graph corresponding to each column of the table, and each target column graph comprises the target cell nodes corresponding to the target cells located in the same column; and reconstructing the table in the target image based on the target row graph set, the target column graph set, and the target cell information corresponding to the target cell nodes.
- 13. A form identification device, the device comprising: an acquisition module, configured to acquire a target image containing a table; a first determining module, configured to determine global relation information corresponding to the table based on the target image, wherein the global relation information comprises the relations among all target characters in the table and the relations among all target cells; a second determining module, configured to determine local relation information corresponding to the table based on the global relation information, wherein the local relation information comprises target position relations among the target characters in each first area of the table and target row-column relations among the target cells in each second area of the table; and a reconstruction module, configured to reconstruct the table in the target image based on the local relation information; wherein the second determining module is specifically configured to: determine the target position relations based on target feature information corresponding to the table, wherein the target feature information comprises the global relation information; and determine the target row-column relations based on the target position relations.
- 14. An electronic device, comprising a processor and a memory, wherein the processor is connected to the memory, the memory is configured to store executable program code, and the processor, by reading the executable program code stored in the memory, runs a program corresponding to the executable program code so as to perform the method according to any one of claims 1-12.
- 15. A computer storage medium storing a plurality of instructions adapted to be loaded by a processor to perform the method steps of any one of claims 1-12.
- 16. A computer program product comprising instructions which, when run on a computer or a processor, cause the computer or the processor to perform the form identification method of any one of claims 1-12.
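Claims 11 and 12 reconstruct the table from pairwise row and column relations between cells. The patent does not publish an implementation of this step; one hypothetical way to realize it is to treat each predicted "same row" / "same column" pair as an edge and recover row and column indices with union-find. Everything below (function names, the toy 2x2 table) is an illustrative sketch, not the claimed method itself:

```python
def find(parent, x):
    # path-compressing find for a union-find forest
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x

def group_ids(n, pairs):
    """Collapse pairwise 'same group' relations into a 0-based group index per item."""
    parent = list(range(n))
    for a, b in pairs:
        parent[find(parent, a)] = find(parent, b)
    roots = [find(parent, i) for i in range(n)]
    order = {r: k for k, r in enumerate(dict.fromkeys(roots))}
    return [order[r] for r in roots]

def rebuild_grid(texts, same_row, same_col):
    """Place each cell's text at the (row, col) recovered from the relation edges."""
    rows = group_ids(len(texts), same_row)
    cols = group_ids(len(texts), same_col)
    grid = [["" for _ in range(max(cols) + 1)] for _ in range(max(rows) + 1)]
    for text, r, c in zip(texts, rows, cols):
        grid[r][c] = text
    return grid

# four cells whose predicted relations form a 2x2 table
texts = ["Item", "Amount", "Tax", "10.00"]
grid = rebuild_grid(texts, same_row=[(0, 1), (2, 3)], same_col=[(0, 2), (1, 3)])
print(grid)  # [['Item', 'Amount'], ['Tax', '10.00']]
```

The row and column graph sets of claim 12 correspond here to the connected components found by `group_ids`; a real system would also have to resolve merged cells, which this sketch ignores.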
Description
Form identification method, form identification device, electronic equipment, medium and program product

Technical Field
The present disclosure relates to the field of information processing technologies, and in particular to a method, an apparatus, an electronic device, a medium, and a program product for identifying a form.

Background
Every industry involves a large volume of form extraction and entry. In the insurance industry, for example, claims accounting requires extracting invoice form details and related information. Relying entirely on manual extraction not only incurs high labor costs but also easily introduces errors. At present, when invoice forms are identified by machine, the invoice form formats are varied: rule-based identification methods and traditional image-recognition methods can accommodate new invoice form types only by continuously adjusting the thresholds set by the extraction rules or the traditional image algorithms, while approaches that use deep learning to detect the ruled lines of an invoice form, combined with character recognition, can only handle the identification and entry of wired (ruled) forms and cannot handle the identification and reconstruction of wireless (borderless) forms.

Disclosure of the Invention
Embodiments of this specification provide a form identification method, an apparatus, an electronic device, a medium, and a program product, in which the accuracy and robustness of form identification can be improved by combining the global relation information and the local relation information in a form to reconstruct the form.
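Per the claims, the local relation information in the scheme above is produced by running graph convolutional neural networks over sub-graphs of characters and cells. As a rough illustration of that building block (not the patent's architecture; the dimensions and the untrained random weights are invented), the sketch below applies one graph-convolution layer to a tiny three-character sub-graph in plain NumPy and scores each character pair for a "same cell" relation:

```python
import numpy as np

def gcn_layer(adj, feats, weight):
    """One graph-convolution layer: H' = ReLU(D^-1/2 (A+I) D^-1/2 H W)."""
    a_hat = adj + np.eye(adj.shape[0])          # add self-loops
    deg = a_hat.sum(axis=1)
    d_inv_sqrt = np.diag(1.0 / np.sqrt(deg))    # symmetric normalization
    return np.maximum(d_inv_sqrt @ a_hat @ d_inv_sqrt @ feats @ weight, 0.0)

def same_cell_scores(node_embs):
    """Score every node pair with a dot product; higher = more likely same cell."""
    return node_embs @ node_embs.T

rng = np.random.default_rng(0)
adj = np.array([[0, 1, 0],      # toy 3-character sub-graph: char0 - char1 - char2
                [1, 0, 1],
                [0, 1, 0]], dtype=float)
feats = rng.normal(size=(3, 8))    # per-character multi-modal features (invented dim)
weight = rng.normal(size=(8, 4))   # layer weight (learned during training in practice)

embs = gcn_layer(adj, feats, weight)
scores = same_cell_scores(embs)
print(scores.shape)  # (3, 3) pairwise relation scores
```

A trained system would stack several such layers and replace the dot-product scorer with a learned pairwise classifier; the sketch only shows the shape of the computation.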
The technical scheme is as follows. In a first aspect, an embodiment of the present disclosure provides a method for identifying a table, including: acquiring a target image containing a table; determining global relation information corresponding to the table based on the target image, wherein the global relation information comprises the relations among the target characters in the table and the relations among the target cells; determining local relation information corresponding to the table based on the global relation information, wherein the local relation information comprises the relations among the target characters in each first area of the table and the relations among the target cells in each second area of the table; and reconstructing the table in the target image based on the local relation information. In one possible implementation, the determining of the global relation information corresponding to the table based on the target image includes: determining multi-modal information corresponding to each target character in the table based on the target characters in the target image, the target positions corresponding to the target characters, and the target image; and determining the global relation information corresponding to the table based on the multi-modal information.
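The per-character multi-modal information above combines text, position, and image-feature modalities. A minimal sketch of how such an embedding might be assembled, assuming simple concatenation as the fusion step; the toy vocabulary, all dimensions, and the stand-in image descriptor are invented for illustration and are not prescribed by the patent:

```python
import numpy as np

VOCAB = {"金": 0, "额": 1, "1": 2, "0": 3}   # toy character vocabulary
EMB_DIM = 4
rng = np.random.default_rng(42)
text_table = rng.normal(size=(len(VOCAB), EMB_DIM))  # a learned lookup in practice

def text_embedding(char):
    # text modality: embedding of the character identity
    return text_table[VOCAB[char]]

def position_embedding(box, img_w, img_h):
    # position modality: bounding box (x0, y0, x1, y1) normalized to [0, 1]
    x0, y0, x1, y1 = box
    return np.array([x0 / img_w, y0 / img_h, x1 / img_w, y1 / img_h])

def image_feature(crop):
    # image-feature modality: stand-in for a CNN descriptor of the character crop
    return np.array([crop.mean(), crop.std(), crop.shape[0], crop.shape[1]])

def multimodal_embedding(char, box, crop, img_w, img_h):
    # concatenation is one simple fusion choice; the patent leaves fusion open
    return np.concatenate([
        text_embedding(char),
        position_embedding(box, img_w, img_h),
        image_feature(crop),
    ])

crop = rng.uniform(size=(12, 8))  # fake grayscale crop of one character
emb = multimodal_embedding("金", (10, 20, 18, 32), crop, img_w=640, img_h=480)
print(emb.shape)  # (12,)
```

The resulting per-character vectors are what the encoder of the next paragraph would consume to produce the table-level target feature information.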
In one possible implementation, the multi-modal information includes text-modality information, position-modality information, and image-feature-modality information corresponding to the target characters. The determining of the multi-modal information corresponding to each target character in the table based on the target characters in the target image, the target positions corresponding to the target characters, and the target image includes: extracting target image features corresponding to the target characters from the target image based on the target characters in the target image and the target positions corresponding to the target characters; and determining the text-modality information based on the target characters in the target image, the position-modality information based on the target positions corresponding to the target characters, and the image-feature-modality information based on the target image features corresponding to the target characters. In one possible implementation, the determining of the global relation information corresponding to the table based on the multi-modal information includes: inputting the multi-modal information corresponding to each target character in the table into an encoder, and outputting target feature information corresponding to the table, wherein the target feature information comprises first target multi-modal feature information of the multi-modal information corresponding to each target character in the table and the global relation information corresponding to the table. In one possible implementation, the local relation information includes target position relations among the target characters in each first area of the table and target row-column relations among the target cells in each second area of the table; the determining of the local relation information corresponding to the table based on the global relation information includes: determining the target posit