CN-117315697-B - Form structure extraction method, form structure extraction device, form structure extraction equipment and storage medium

Abstract

The invention discloses a form structure extraction method, device, equipment, and storage medium. The method comprises: acquiring a target form image; obtaining a binary image of the target form image using edge detection and a morphological algorithm; automatically locating the start position of the form in the binary image; performing a cell convolution operation on the binary image according to the start position to obtain a cell position; updating the start position with the obtained cell position and repeating the cell convolution operation to segment the remaining cell positions in the form; marking the parallel and up-down relationships among the cells; and analyzing the hierarchical relationships among the cells in the form to obtain the form structure. The feedback from image convolution is used to automatically split the form into cells and to help analyze the hierarchical relationships among them, so that the form structure can be extracted.

Inventors

  • LIN PING
  • WU XIN
  • TANG QISONG
  • XIE TAO

Assignees

  • 上海艺赛旗软件股份有限公司 (Shanghai i-Search Software Co., Ltd.)

Dates

Publication Date
2026-05-05
Application Date
2023-09-27

Claims (9)

  1. A method for extracting a form structure, the method comprising: acquiring a target form image, wherein the form image is an image containing a form structure; obtaining a binary image from the target form image by using edge detection and a morphological algorithm; automatically locating the start position of the form in the binary image; performing a cell convolution operation on the binary image according to the start position of the form to obtain a cell position, wherein the cell convolution operation comprises: determining a current receptive field and its size on the binary image by laterally and longitudinally locating cell boundaries from the start position, the receptive field size comprising the width and height of the receptive field; obtaining a corresponding convolution kernel according to the receptive field size; performing a convolution operation with the receptive field and the corresponding convolution kernel to obtain a convolution result; dividing the convolution result by the perimeter of the receptive field window, which is determined from the width and height of the receptive field, to obtain a cell confidence; and regarding the current receptive field as a cell if the cell confidence is greater than a first threshold; updating the start position with the obtained cell position, segmenting the remaining cell positions in the form by repeating the cell convolution operation, and marking the parallel and up-down relationships among the cells; and analyzing the hierarchical relationship of the cells in the form according to the parallel and up-down relationships among the cells to obtain the form structure.
  2. The method for extracting a form structure according to claim 1, wherein obtaining a binary image from the target form image by using edge detection and a morphological algorithm comprises: performing edge detection on the target form image to obtain an edge detection image; performing feature enhancement on the edge detection image with a morphological algorithm to obtain an enhanced feature map; and converting the enhanced feature map into a binary image with a screening mechanism.
  3. The method for extracting a form structure according to claim 2, wherein the edge detection adopts a Sobel edge detection algorithm; and/or, if the target form image is a color form image, the color form image is converted into a grayscale image before edge detection is performed on the target form image to obtain the edge detection image; and/or, converting the enhanced feature map into a binary image using a screening mechanism comprises: Binary(x, y) = 0 if Enhanced(x, y) ≤ 8, and Binary(x, y) = 1 if Enhanced(x, y) > 8, where Binary(x, y) represents a pixel value in the binary image, Enhanced(x, y) represents a pixel value of the enhanced feature map, and (x, y) represents the coordinates of the pixel in the image; that is, the screening mechanism classifies pixel values less than or equal to 8 in the enhanced feature map as 0 and pixel values greater than 8 as 1.
  4. The method for extracting a form structure according to claim 1, wherein automatically locating the start position of the form in the binary image includes: obtaining information on all long straight lines in the form by morphological opening with windows of specific sizes, wherein the long straight lines comprise horizontal lines and vertical lines longer than Q; acquiring the endpoint coordinates of all long straight lines in the form from this information; and selecting, from the endpoint coordinates of all long straight lines in the form, the coordinate closest to the upper-left corner of the target form image as the start position of the form; wherein obtaining the long straight lines in the form by morphological opening with windows of specific sizes comprises: setting a first window, of length Q and height 1, as the sampling window of the morphological opening, and capturing, in the horizontal direction, the horizontal lines longer than Q in the binary image; setting a second window, of length 1 and height Q, as the sampling window of the morphological opening, and capturing, in the vertical direction, the vertical lines longer than Q in the binary image; and collecting all captured horizontal-line and vertical-line information to obtain the information on all long straight lines in the form.
  5. The method of claim 1, wherein determining the current receptive field and its size on the binary image from the start position, by laterally and longitudinally locating cell boundaries, comprises: constructing an initial receptive field according to the start position and an initial receptive field size value; S4-2, laterally locating the cell boundary to determine the receptive field width, namely continuously expanding the receptive field width rightwards from the initial receptive field until the lateral boundary of the current cell is reached, then stopping the lateral expansion, returning the receptive field width that meets the standard, and updating the receptive field; specifically: setting the pixels of the bottom row of the receptive field to 1 and executing a first loop until a receptive field width meeting the standard is obtained, the first loop comprising: S4-2-1, performing an average pooling operation on all pixels of the right boundary of the current receptive field using a first pooling window to obtain a first average pooling value, the first pooling window having a height equal to the current receptive field height and a width of 1; if the first average pooling value is not greater than a second threshold, adding 1 to the receptive field width, updating the pixel values of the right boundary of the receptive field, and returning to S4-2-1; if the first average pooling value is greater than the second threshold, executing S4-2-2; S4-2-2, obtaining a corresponding convolution kernel according to the current receptive field size, performing a convolution operation with the receptive field and the corresponding convolution kernel to obtain a convolution result, and dividing the convolution result by the perimeter of the receptive field window to obtain the cell confidence; if the cell confidence is not greater than the first threshold, adding 1 to the receptive field width, updating the pixel values of the right boundary of the receptive field, and returning to S4-2-1; if the cell confidence is greater than the first threshold, regarding the right boundary of the current receptive field as the true right boundary of the cell, terminating the first loop, and returning the current receptive field width; S4-3, longitudinally locating the cell boundary to determine the receptive field height, namely continuously expanding the receptive field height downwards from the receptive field obtained in S4-2 until the longitudinal boundary of the current cell is reached, then stopping the longitudinal expansion, returning the receptive field height that meets the standard, and updating the receptive field to the current cell size; specifically: setting the pixels of the rightmost column of the updated receptive field to 1 and executing a second loop until a receptive field height meeting the standard is obtained, the second loop comprising: S4-3-1, performing an average pooling operation on all pixels of the bottom boundary of the current receptive field using a second pooling window to obtain a second average pooling value, the second pooling window having a width equal to the current receptive field width and a height of 1; if the second average pooling value is not greater than a third threshold, adding 1 to the receptive field height, updating the pixel values of the bottom boundary of the receptive field, and returning to S4-3-1; if the second average pooling value is greater than the third threshold, executing S4-3-2; S4-3-2, obtaining a corresponding convolution kernel according to the current receptive field size, performing a convolution operation with the receptive field and the corresponding convolution kernel to obtain a convolution result, and dividing the convolution result by the perimeter of the receptive field window to obtain the cell confidence; if the cell confidence is not greater than the first threshold, adding 1 to the receptive field height, updating the pixel values of the bottom boundary of the receptive field, and returning to S4-3-1; if the cell confidence is greater than the first threshold, regarding the bottom boundary of the current receptive field as the true bottom boundary of the cell, terminating the second loop, and returning the current receptive field height.
  6. The method for extracting a form structure according to claim 1, wherein updating the start position with the obtained cell position, segmenting the remaining cell positions in the form by repeating the cell convolution operation, and marking the parallel and up-down relationships among the cells comprises: S5-1, searching for right-linked cells: taking the top-right coordinate of the first cell B1 at the top-left corner of the form as a new start position and feeding it into the cell convolution operation; if a new cell is returned, recording it as cell B1-right1, a right-linked cell of cell B1; if not, terminating S5-1 and entering S5-2; feeding B1-right1 into the cell convolution operation to obtain further right-linked cells, and repeating this operation until no result is returned, thereby finding all cells parallel to cell B1; S5-2, searching for down-linked cells: taking the bottom-left coordinate of the first cell B1 at the top-left corner of the form as a new start position and feeding it into the cell convolution operation of S4; if a new cell is returned, recording the new cell B2 as a down-linked cell of cell B1; if no new cell is returned, terminating S5-2; after obtaining cell B2, repeating the right-linked-cell search of S5-1 to obtain the right-linked cells parallel to cell B2; and repeating the right-linked-cell search of S5-1 and the down-linked-cell search of S5-2 to capture the positions of all cells in the form in the rightward and downward directions and to mark the parallel and up-down relationships among the cells.
  7. The method for extracting a form structure according to claim 1, wherein analyzing the hierarchical relationship of the cells in the form according to the parallel and up-down relationships among the cells to obtain the form structure comprises: for parallel cells, if the information gap between the two cells is smaller than a fourth threshold, judging the two parallel cells to be of the same level; if the information gap between the two cells is greater than the fourth threshold and cell A is shorter than cell B, cell A is superior to cell B; for upper and lower cells, if B_i has no parallel cells and B_(i+1) has parallel cells, B_i is by default the upper level of all cells parallel to B_(i+1); if both B_i and B_(i+1) have parallel cells, B_i and B_(i+1) are of the same level; wherein the information gap between two cells is obtained as follows: convolving each cell with an all-ones convolution kernel matching the cell size to obtain the information amount of the cell; and, for parallel cells, calculating the information gap between the two cells from their information amounts.
  8. The method for extracting a form structure according to claim 7, wherein convolving a cell with a convolution kernel whose weights are all 1 and whose size matches the cell, to obtain the information amount of the cell, comprises: constructing, according to the size of each cell, a convolution kernel with all weights equal to 1, and convolving each cell with this kernel to obtain its information amount: R_conv1 = Σ_(x,y) Unit(x, y) · Kernel-1(x, y), where R_conv1 represents the information amount of the cell and Unit(x, y) represents the pixels in the cell; and/or, for parallel cells, calculating the information gap between two cells from their information amounts comprises: among the parallel cells, recording the information amount of the left cell A as R_conv1-A and the information amount of the right cell B as R_conv1-B, and calculating the information gap between the cells, wherein Dist_info is the information gap between cell A and cell B, and W_A and W_B are the widths of cell A and cell B, respectively.
  9. A form structure extraction device, the device comprising: an acquisition module for acquiring a target form image, wherein the form image is an image containing a form structure; an image processing module for obtaining a binary image from the target form image by using edge detection and a morphological algorithm; a positioning module for automatically locating the start position of the form in the binary image; a cell convolution operation module for performing a cell convolution operation on the binary image according to the start position of the form to obtain a cell position, wherein the cell convolution operation comprises: determining a current receptive field and its size on the binary image by laterally and longitudinally locating cell boundaries from the start position, the receptive field size comprising the width and height of the receptive field; obtaining a corresponding convolution kernel according to the receptive field size; performing a convolution operation with the receptive field and the corresponding convolution kernel to obtain a convolution result; dividing the convolution result by the perimeter of the receptive field window, which is determined from the width and height of the receptive field, to obtain the cell confidence; and regarding the current receptive field as a cell if the cell confidence is greater than a first threshold; a repositioning module for updating the start position with the obtained cell positions, segmenting the remaining cell positions in the form by repeating the cell convolution operation, and marking the parallel and up-down relationships among the cells; and a form structure construction unit for analyzing the hierarchical relationship of the cells in the form according to the parallel and up-down relationships among the cells to obtain the form structure.
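The cell-confidence test of claim 1 (convolution result divided by the window perimeter, compared against a first threshold) can be sketched in pure numpy. The kernel weights are my assumption: a kernel with ones on the window frame and zeros inside, so that the convolution at a single position counts boundary pixels and a fully framed window scores 1.0.

```python
import numpy as np

def cell_confidence(binary, x, y, w, h):
    """Confidence that the w x h window at (x, y) of a binary line image is a cell."""
    # Hypothetical border kernel: ones on the frame, zeros inside, so the
    # single-position convolution sums the boundary pixels of the window.
    kernel = np.zeros((h, w))
    kernel[0, :] = kernel[-1, :] = 1
    kernel[:, 0] = kernel[:, -1] = 1
    window = binary[y:y + h, x:x + w]
    conv = float((window * kernel).sum())
    # Perimeter of the receptive-field window, counted in border pixels.
    perimeter = 2 * (w + h) - 4
    return conv / perimeter
```

A window whose entire frame lies on table lines yields a confidence of 1.0, which would exceed any first threshold below 1.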
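Claim 4's start-position search can also be sketched without a deep learning framework. For binary images, morphological opening with a 1 x Q line window is equivalent to keeping horizontal runs of ones at least Q pixels long; the transpose handles vertical lines. As a simplification, this sketch lets every pixel of a long line (not only the endpoints, as the claim specifies) compete for the position closest to the top-left corner:

```python
import numpy as np

def open_rows(binary, Q):
    # Opening with a 1 x Q horizontal window on a binary image:
    # keep only horizontal runs of ones that are at least Q pixels long.
    out = np.zeros_like(binary)
    for r in range(binary.shape[0]):
        row, c = binary[r], 0
        while c < len(row):
            if row[c]:
                start = c
                while c < len(row) and row[c]:
                    c += 1
                if c - start >= Q:
                    out[r, start:c] = 1
            else:
                c += 1
    return out

def form_start(binary, Q):
    # Combine horizontal lines with vertical lines (opening on the transpose),
    # then pick the line pixel closest to the top-left corner of the image.
    lines = open_rows(binary, Q) | open_rows(binary.T, Q).T
    ys, xs = np.nonzero(lines)
    if len(xs) == 0:
        return None
    k = int(np.argmin(xs.astype(float) ** 2 + ys.astype(float) ** 2))
    return int(xs[k]), int(ys[k])
```

A production implementation would more likely call `cv2.morphologyEx` with `cv2.MORPH_OPEN` and rectangular structuring elements of size (Q, 1) and (1, Q).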
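The lateral expansion loop of claim 5 (steps S4-2-1 and S4-2-2) can be sketched as follows. The kernel and the threshold values (`conf_thresh`, `pool_thresh`) are my illustrative assumptions; the claim only names them as the first and second thresholds:

```python
import numpy as np

def cell_confidence(binary, x, y, w, h):
    # Border-ones kernel (assumed): the convolution sums the window frame pixels.
    kernel = np.zeros((h, w))
    kernel[0, :] = kernel[-1, :] = 1
    kernel[:, 0] = kernel[:, -1] = 1
    conv = float((binary[y:y + h, x:x + w] * kernel).sum())
    return conv / (2 * (w + h) - 4)

def expand_width(binary, x, y, h, w0=2, conf_thresh=0.9, pool_thresh=0.5):
    """S4-2, simplified: widen the receptive field rightwards until a boundary holds."""
    w = w0
    while x + w < binary.shape[1]:
        right_col = binary[y:y + h, x + w - 1]
        if right_col.mean() <= pool_thresh:        # S4-2-1: average pooling of the
            w += 1                                 # right boundary; no line yet,
            continue                               # so keep widening
        if cell_confidence(binary, x, y, w, h) > conf_thresh:
            return w                               # S4-2-2: true right boundary
        w += 1
    return w
```

The vertical expansion of S4-3 is symmetric: pool the bottom row with a (width x 1) window and grow the height instead of the width.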
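The right-link/down-link traversal of claim 6 can be sketched generically. Here `find_cell` is a hypothetical callback standing in for the cell convolution operation: given a candidate start position it returns the detected cell `(x, y, w, h)` or `None`.

```python
def link_cells(find_cell, first):
    """Capture all cells rightwards and downwards (S5-1/S5-2, simplified).

    Returns a list of rows; the cells inside one row are right-linked
    (parallel), and consecutive rows are down-linked (up-down relationship).
    """
    rows = []
    cell = first
    while cell is not None:                  # S5-2: follow down-links
        row = [cell]
        nxt = cell
        while True:                          # S5-1: follow right-links
            cx, cy, cw, ch = nxt
            nxt = find_cell(cx + cw, cy)     # top-right corner as new start
            if nxt is None:
                break
            row.append(nxt)
        rows.append(row)
        cx, cy, cw, ch = cell
        cell = find_cell(cx, cy + ch)        # bottom-left corner as new start
    return rows
```

With a stub detector over a 2 x 2 grid of 4 x 2 cells, the traversal returns two rows of two parallel cells each.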
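The information amount of claim 8, a convolution with an all-ones kernel the size of the cell, reduces to summing every pixel of the cell. A minimal sketch:

```python
import numpy as np

def info_amount(cell_pixels):
    """R_conv1: convolve the cell with an all-ones kernel of the same size."""
    kernel = np.ones_like(cell_pixels)   # all weights equal to 1
    return float((cell_pixels * kernel).sum())
```

The exact Dist_info formula is not reproduced in this text, so it is not sketched here; per claim 8 it combines R_conv1-A, R_conv1-B, and the cell widths W_A and W_B.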

Description

Form structure extraction method, form structure extraction device, form structure extraction equipment and storage medium

Technical Field

The invention relates to a form structure extraction method, device, equipment, and storage medium, and belongs to the technical field of data processing.

Background

A form is composed of cells of different sizes and different semantic levels. Each cell records some amount of text content, and when the text lies too close to the cell boundary, simple edge segmentation methods (e.g. Canny) struggle to extract the cell boundary effectively. Deep learning algorithms such as TableOCR, on the other hand, require a GPU to recognize tables and cannot be deployed quickly and effectively on CPU platforms.

Disclosure of Invention

Image convolution typically filters and processes image-block features with square convolution kernels, whose weights determine the outcome of the processing. Deformable convolution captures features with more extreme aspect ratios by controlling the shape and size of the convolution kernel through learned coordinate offsets. Since the wide variety of cell shapes is well suited to deformable convolution kernels, convolution calculations can be used to capture the features of different cells in a form. In view of at least one of the above technical problems, the invention provides a form structure extraction method, device, equipment, and storage medium that can segment a form structure effectively and rapidly, without depending on a deep learning framework, by automatically judging the boundaries of different cells in the form using the feedback of convolution calculations.
The technical scheme adopted by the invention is as follows. In a first aspect, the invention provides a method for extracting a form structure, the method comprising: acquiring a target form image, wherein the form image is an image containing a form structure; obtaining a binary image from the target form image by using edge detection and a morphological algorithm; automatically locating the start position of the form in the binary image; performing a cell convolution operation on the binary image according to the start position of the form to obtain a cell position; updating the start position with the obtained cell position, segmenting the remaining cell positions in the form by repeating the cell convolution operation, and marking the parallel and up-down relationships among the cells; and analyzing the hierarchical relationship of the cells in the form according to the parallel and up-down relationships among the cells to obtain the form structure. In some embodiments, obtaining a binary image from the target form image using edge detection and morphological algorithms includes: performing edge detection on the target form image to obtain an edge detection image; performing feature enhancement on the edge detection image with a morphological algorithm to obtain an enhanced feature map; and converting the enhanced feature map into a binary image with a screening mechanism. In some embodiments, the edge detection employs a Sobel edge detection algorithm. In some embodiments, if the target form image is a color form image, the method further includes converting the color form image into a grayscale image before performing edge detection on it to obtain the edge detection image.
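The binarization pipeline described above (Sobel edge detection followed by a screening mechanism) can be sketched in pure numpy; a real implementation would more likely use `cv2.Sobel` and `cv2.dilate` for the morphological enhancement step, which is omitted here:

```python
import numpy as np

def sobel_magnitude(gray):
    # Sobel gradient magnitude with zero padding (illustrative, loop-based).
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], float)
    ky = kx.T
    h, w = gray.shape
    p = np.pad(gray.astype(float), 1)
    out = np.zeros((h, w))
    for i in range(h):
        for j in range(w):
            win = p[i:i + 3, j:j + 3]
            out[i, j] = np.hypot((win * kx).sum(), (win * ky).sum())
    return out

def screen(enhanced):
    # Screening mechanism: pixel values <= 8 become 0, values > 8 become 1.
    return (enhanced > 8).astype(np.uint8)
```

On a vertical step edge the Sobel magnitude peaks at the transition and is zero in the flat regions, so the screening threshold of 8 keeps the edge and discards the background.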
In some embodiments, converting the enhanced feature map into a binary image using a screening mechanism includes: Binary(x, y) = 0 if Enhanced(x, y) ≤ 8, and Binary(x, y) = 1 if Enhanced(x, y) > 8, where Binary(x, y) represents a pixel value in the binary image, Enhanced(x, y) represents a pixel value of the enhanced feature map, and (x, y) represents the coordinates of the pixel in the image; that is, the screening mechanism classifies pixel values less than or equal to 8 in the enhanced feature map as 0 and pixel values greater than 8 as 1. In some embodiments, automatically locating the start position of the form in the binary image includes: obtaining information on all long straight lines in the form by morphological opening with windows of specific sizes, wherein the long straight lines comprise horizontal lines and vertical lines longer than Q; acquiring the endpoint coordinates of all long straight lines in the form from this information; and selecting, as the start position, the coordinate closest to the upper-left corner of the target form image from the endpoint coordinates of all long straight lines