
CN-116152839-B - Table identification method, apparatus, device and storage medium

CN116152839B

Abstract

The invention provides a table recognition method, apparatus, device, and storage medium. The method comprises: acquiring a sequence of track points collected while a user writes on a screen, and preprocessing the track point sequence; determining table track points and text track points from the preprocessed sequence; rendering the table track points into an image to obtain a pure table image, and rendering the text track points into an image to obtain a text image; performing table recognition on the pure table image to obtain a table recognition result, and performing text recognition on the text image to obtain a text recognition result; matching the table recognition result with the text recognition result to obtain a matching result; and generating a table containing text content based on recognition information comprising the table recognition result, the text recognition result, and the matching result. The method can restore a table with text content as written by the user.
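The preprocessing step mentioned above (claim 1 specifies subtracting the coordinates of the track point at the upper-left corner of the writing area from every track point) can be sketched minimally in Python. The function name is illustrative, and taking the per-axis minimum as the "upper-left corner" is an assumption, one plausible reading of the claim:

```python
def normalize_track_points(points):
    # Shift every track point so the upper-left corner of the user's
    # writing area becomes the coordinate origin, per claim 1.
    # `points` is a list of (x, y) tuples; the upper-left corner is
    # taken here as the per-axis minimum over the whole sequence.
    if not points:
        return []
    min_x = min(x for x, _ in points)
    min_y = min(y for _, y in points)
    return [(x - min_x, y - min_y) for x, y in points]
```

This normalization makes the downstream rendering and region detection independent of where on the screen the user happened to write.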

Inventors

  • TAN BO
  • CHENG ZHIPENG
  • XIE MINGLIANG

Assignees

  • iFLYTEK Co., Ltd. (科大讯飞股份有限公司)

Dates

Publication Date
2026-05-05
Application Date
2023-02-20

Claims (12)

  1. A table recognition method, comprising: acquiring a track point sequence collected while a user writes on a screen, and preprocessing the track point sequence, wherein the preprocessing comprises traversing the sequence to find the track point at the upper-left corner of the user's writing area, and subtracting the coordinates of that upper-left track point from the coordinates of every track point in the sequence, yielding a track point sequence whose coordinate origin is the upper-left track point; determining table track points and text track points from the preprocessed track point sequence; rendering the table track points into an image to obtain a pure table image, and rendering the text track points into an image to obtain a text image; performing table recognition on the pure table image to obtain a table recognition result, and performing text recognition on the text image to obtain a text recognition result; matching the table recognition result with the text recognition result to obtain a matching result; and generating a table containing text content based on recognition information comprising the table recognition result, the text recognition result, and the matching result.
  2. The table recognition method according to claim 1, wherein the track point sequence is the sequence collected up to a first recognition moment, and the recognition information is the recognition information corresponding to the first recognition moment; the method further comprises: detecting, at preset time intervals, whether the track point sequence collected up to the current moment has changed relative to the sequence collected up to the previous recognition moment; if so, determining recognition information corresponding to a second recognition moment based on the track points, collected by the second recognition moment, that have changed relative to the sequence collected by the previous recognition moment, together with the recognition information corresponding to the previous recognition moment, wherein the second recognition moment is the current moment or a moment after it; and generating a table containing text content based on the recognition information corresponding to the second recognition moment.
  3. The table recognition method according to claim 1 or 2, wherein determining the table track points and the text track points from the preprocessed track point sequence comprises: rendering the preprocessed track point sequence into an image to serve as an image to be detected; detecting, in the image to be detected, a table region and a plurality of text regions inside the table region; determining, from the preprocessed track point sequence, the track points belonging to the table region and the track points belonging to each text region; and determining table track points from the track points belonging to the table region, and determining the track points belonging to each text region as text track points.
  4. The table recognition method according to claim 3, wherein determining the track points belonging to the table region and the track points belonging to each text region comprises: determining, from the plurality of track point subsequences contained in the preprocessed track point sequence, the subsequences belonging to the table region and the subsequences belonging to each text region, wherein each subsequence is the track point sequence forming one stroke; determining the track points contained in a subsequence belonging to the table region as track points belonging to the table region; and, for each text region, determining the track points contained in a subsequence belonging to that text region as track points belonging to that text region.
  5. The table recognition method according to claim 4, wherein determining the subsequences belonging to the table region and to each text region comprises, for each of the table region and the text regions: determining, from the plurality of track point subsequences contained in the preprocessed track point sequence, the subsequences meeting a preset condition, the condition being that the proportion of a subsequence's track points lying inside the region exceeds a preset proportion threshold; and determining a subsequence meeting the preset condition as a subsequence belonging to that region.
  6. The table recognition method according to claim 3, wherein rendering the text track points into an image to obtain a text image comprises: rendering the text track points of each text region into a separate image, obtaining a text image corresponding to each text region; and wherein performing table recognition on the pure table image to obtain a table recognition result and text recognition on the text images to obtain text recognition results comprises: inputting the pure table image into a pre-trained table recognition model to obtain a table recognition result containing the table structure information and the position of each table cell, the table recognition model being trained on training table images annotated with table structure information and cell positions; and inputting the text image corresponding to each text region into a pre-trained text recognition model to obtain a text recognition result containing the text content of that region, the text recognition model being trained on training text images annotated with text content.
  7. The table recognition method according to claim 2, wherein determining the recognition information corresponding to the second recognition moment comprises: determining a plurality of change regions based on the positions of the track points, collected by the second recognition moment, that have changed relative to the sequence collected by the previous recognition moment; if the change regions include a text change region, determining the changed text content based on the text track points belonging to that region; if the change regions include a table change region, re-obtaining the table recognition result based on the table track points belonging to that region together with the other table track points; and updating the recognition information of the previous recognition moment based on the changed text content and/or the re-obtained table recognition result, the updated recognition information serving as the recognition information corresponding to the second recognition moment.
  8. The table recognition method according to claim 7, wherein determining the plurality of change regions comprises: obtaining a pre-constructed modification table in which the entry for every pixel position of the screen is a first value; changing the entries at positions corresponding to the changed track points from the first value to a second value; and searching the modified table for connected domains of the second value, each connected domain being determined as one change region.
  9. The table recognition method according to claim 7, wherein determining a text change region and/or a table change region among the plurality of change regions comprises, for each change region: matching the change region against each text region and each table cell region of the table recognized at the previous recognition moment; if the change region matches a text region, determining it as a text change region; if it matches a table cell region, determining it as a table change region; and if it matches neither, detecting table and text regions within the change region using a pre-trained detection model.
  10. A table recognition apparatus, comprising a track point acquisition module, a track point determination module, an image acquisition module, a recognition module, a matching module, and a table generation module; wherein the track point acquisition module acquires a track point sequence collected while a user writes on a screen and preprocesses it, specifically by traversing the sequence to find the track point at the upper-left corner of the user's writing area and subtracting its coordinates from those of every track point, yielding a track point sequence whose coordinate origin is the upper-left track point; the track point determination module determines table track points and text track points from the preprocessed sequence; the image acquisition module renders the table track points into an image to obtain a pure table image and renders the text track points into an image to obtain a text image; the recognition module performs table recognition on the pure table image to obtain a table recognition result and text recognition on the text image to obtain a text recognition result; the matching module matches the table recognition result with the text recognition result to obtain a matching result; and the table generation module generates a table containing text content based on recognition information comprising the table recognition result, the text recognition result, and the matching result.
  11. A processing device, comprising a memory and a processor, wherein the memory stores a program, and the processor executes the program to implement the steps of the table recognition method according to any one of claims 1 to 9.
  12. A computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the steps of the table recognition method according to any one of claims 1 to 9.
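The change-region detection of claim 8 (a screen-sized modification table whose entries start at a first value, are flipped to a second value at changed track-point positions, and are then grouped into connected domains) could be sketched as follows. The 4-connectivity, the nested-list grid, and the function name are assumptions for illustration, not details taken from the patent:

```python
from collections import deque

def find_change_regions(changed_points, width, height):
    # Build the "modification table": every pixel starts at the first
    # value (0); positions of changed track points get the second
    # value (1). Then extract 4-connected components of 1s, each
    # returned as a set of (x, y) positions, i.e. one change region.
    table = [[0] * width for _ in range(height)]
    for x, y in changed_points:
        table[y][x] = 1
    regions, seen = [], set()
    for sy in range(height):
        for sx in range(width):
            if table[sy][sx] == 1 and (sx, sy) not in seen:
                region, queue = set(), deque([(sx, sy)])
                seen.add((sx, sy))
                while queue:  # breadth-first flood fill
                    x, y = queue.popleft()
                    region.add((x, y))
                    for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
                        if (0 <= nx < width and 0 <= ny < height
                                and table[ny][nx] == 1 and (nx, ny) not in seen):
                            seen.add((nx, ny))
                            queue.append((nx, ny))
                regions.append(region)
    return regions
```

Each returned component corresponds to one change region, which claim 9 would then match against the known text regions and table cell regions.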

Description

Table identification method, apparatus, device and storage medium

Technical Field

The present invention relates to the field of table recognition, and in particular to a table recognition method, apparatus, device, and storage medium.

Background

In some scenarios it is desirable to recognize a table that a user writes on a screen. Existing table recognition schemes are mostly image-based: a table image is obtained by capturing the table the user has written on the screen, and the table is recovered by analyzing that image. Although image-based recognition is workable, it is strongly limited by image quality: it performs well on high-quality table images but poorly on low-quality ones.

Disclosure of Invention

In view of the above, the present invention provides a table recognition method, apparatus, device, and storage medium intended to overcome the limitations of existing image-based schemes. The technical scheme is as follows.

A table recognition method comprises: acquiring a track point sequence collected while a user writes on a screen, and preprocessing the sequence; determining table track points and text track points from the preprocessed sequence; rendering the table track points into an image to obtain a pure table image, and rendering the text track points into an image to obtain a text image; performing table recognition on the pure table image to obtain a table recognition result, and text recognition on the text image to obtain a text recognition result; matching the table recognition result with the text recognition result to obtain a matching result; and generating a table containing text content based on recognition information comprising the table recognition result, the text recognition result, and the matching result.

Optionally, the track point sequence is the sequence collected up to a first recognition moment, and the recognition information corresponds to that moment; the method further comprises: detecting, at preset time intervals, whether the track point sequence collected up to the current moment has changed relative to the sequence collected up to the previous recognition moment; if so, determining recognition information corresponding to a second recognition moment based on the changed track points collected by the second recognition moment and the recognition information of the previous recognition moment, the second recognition moment being the current moment or a moment after it; and generating a table containing text content based on the recognition information corresponding to the second recognition moment.

Optionally, determining the table track points and text track points from the preprocessed sequence comprises: rendering the preprocessed track point sequence into an image to serve as an image to be detected; detecting, in the image to be detected, a table region and a plurality of text regions inside it; determining, from the preprocessed sequence, the track points belonging to the table region and the track points belonging to each text region; and determining table track points from the track points belonging to the table region, and determining the track points belonging to each text region as text track points.
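The rendering step above (drawing the preprocessed track points into an image to be detected) can be illustrated with a minimal point rasterizer. A real system would draw connected strokes with line interpolation between successive points; this point-plot, with assumed names, is only a sketch:

```python
def render_points_to_image(points, width, height):
    # Rasterize integer track points into a binary image, stored as a
    # row-major list of lists: 1 where a point falls, 0 elsewhere.
    # Points outside the canvas are silently clipped.
    image = [[0] * width for _ in range(height)]
    for x, y in points:
        if 0 <= x < width and 0 <= y < height:
            image[y][x] = 1
    return image
```

The resulting binary image is what a table/text region detector would consume to locate the table region and the text regions within it.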
Optionally, determining, from the preprocessed track point sequence, the track points belonging to the table region and the track points belonging to each text region comprises: determining, among the plurality of track point subsequences contained in the preprocessed sequence, the subsequences belonging to the table region and the subsequences belonging to each text region, wherein each subsequence is the track point sequence forming one stroke; determining the track points contained in a subsequence belonging to the table region as track points of the table region; and, for each text region, determining the track points contained in a subsequence belonging to that region as track points of that region. Optionally, determining the subsequences belonging to the table region and to each text region, from among the subsequences contained in the preprocessed track point sequence, comprises: For each of the form region and the number of