CN-121982726-A - Automatic segmentation and extraction method for handwriting elements oriented to complex layout
Abstract
The invention discloses an automatic segmentation and extraction method for handwriting elements in complex layouts, relating to the technical field of automatic handwriting-element segmentation and extraction. It addresses the technical problem of insufficient precision when recognizing and separating handwritten content in mixed image-text layouts. The method comprises the steps of S200, accurate positioning and segmentation of handwriting areas; S201, a dynamic threshold segmentation algorithm; S202, context-aware connected-domain analysis; S203, judging whether a region is a handwriting element; S204, if so, retaining it as a handwriting-area candidate; and S205, if not, filtering and eliminating it. The dynamic threshold segmentation algorithm cooperates with the context-aware connected-domain analysis technique: dynamic thresholding adaptively adjusts the segmentation threshold according to the pixel mean and standard deviation of a local image window, while context-aware connected-domain analysis combines the document's context information to perform semantic analysis on each connected domain. Together they solve the problem of insufficient accuracy in identifying and separating handwritten content in mixed image-text layouts.
Inventors
- CHEN JIAHAI
- YE JIAMING
- ZHAO JIAYU
- DAI FEI
- ZHOU CHAO
- ZHAO ZHIHAO
- WANG TAOSHENG
- HAO ZHONGWEI
Assignees
- 安徽七天网络科技有限公司
Dates
- Publication Date: 2026-05-05
- Application Date: 2025-12-16
Claims (10)
- 1. An automatic segmentation and extraction method for handwriting elements facing complex layouts, characterized by comprising the following steps: S100, preprocessing the document image and extracting multi-modal features: preprocess the image and extract features of different modalities through three parallel paths (a deep neural network, layout analysis, and texture analysis), then fuse them to provide a high-quality, information-rich feature map for the subsequent steps; S200, accurately positioning and segmenting handwriting areas: using the feature map from S100, perform preliminary binarization via dynamic threshold segmentation, then perform context-aware connected-domain analysis to intelligently screen out real handwriting elements according to region attributes and contextual relations while filtering out various interferences; S201, dynamic threshold segmentation algorithm: adaptively adjust the segmentation threshold to meet the segmentation requirements of images under different illumination and ink concentration; S202, context-aware connected-domain analysis: perform semantic analysis on the segmented connected domains in combination with the document's context information; S203, judge whether each region is a handwriting element based on the connected-domain characteristics and context information; S204, if yes, retain the region as a handwriting-area candidate and generate a fine outline of the handwriting area, further optimizing the outline precision of regions confirmed as handwriting; S205, if no, filter and remove the non-handwriting interference elements and feed the filtering information back to step S202 to realize cyclic optimization; S300, semantic restoration and optimization of handwriting strokes: for possible stroke-quality problems after segmentation, restore the handwriting areas extracted in step S200 with an advanced attention-based model, improving stroke quality and laying a foundation for high-precision recognition; S400, output the processed areas with unified size and background, producing standard image blocks directly usable by downstream recognition tasks.
- 2. The automatic segmentation and extraction method for handwritten elements in complex layouts according to claim 1, wherein the step S100 comprises: S101, inputting a complex-layout scanned document image: take the complex layout to be processed as the input image I, where I represents a two-dimensional pixel matrix; S102, global structure understanding by the deep neural network: analyze the overall structure of the document with a CNN (convolutional neural network) deep-learning model, identify the macroscopic layout of text blocks, tables, and graphics, and output a global feature map F_g; S103, layout analysis: analyze the layout structure of the document, including the number of columns, paragraph distribution, and frame-style layout information, and output layout features F_l; S104, extracting handwriting texture features: extract fine-grained features of handwriting texture, edges, and ink concentration to distinguish the texture difference between handwriting and print, using Gabor filters to extract fine-grained features of the handwriting strokes, and output a texture feature map F_t; S105, multi-modal feature fusion and initial positioning of handwriting areas: fuse the multi-modal features of global structure, layout, and stroke texture to initially screen out the approximate range of the handwriting areas, fusing the three extracted feature maps into a fused feature map F_fused.
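Step S104's Gabor-based texture extraction can be illustrated with a minimal NumPy/SciPy sketch. This is a reconstruction for illustration only, not the patent's actual implementation; the kernel size, sigma, wavelength, and number of orientations are assumptions:

```python
import numpy as np
from scipy.ndimage import convolve

def gabor_kernel(ksize=15, sigma=3.0, theta=0.0, lambd=8.0, gamma=0.5):
    """Real part of a Gabor kernel: a Gaussian envelope times a cosine wave.

    All parameter values here are illustrative assumptions.
    """
    half = ksize // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)          # rotate coordinates
    yr = -x * np.sin(theta) + y * np.cos(theta)
    envelope = np.exp(-(xr**2 + (gamma * yr) ** 2) / (2 * sigma**2))
    return envelope * np.cos(2 * np.pi * xr / lambd)

def texture_feature_map(img, n_orientations=4):
    """Crude stand-in for F_t: max absolute filter response over orientations."""
    responses = [
        np.abs(convolve(img.astype(float),
                        gabor_kernel(theta=np.pi * k / n_orientations)))
        for k in range(n_orientations)
    ]
    return np.max(responses, axis=0)
```

Taking the maximum over orientations makes the response roughly rotation-tolerant, which matters for slanted handwriting.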
- 3. The automatic segmentation and extraction method for handwritten elements in complex layouts according to claim 2, wherein the multi-modal feature fusion in step S105 adopts a complementary fusion formula: F_fused = Σ_{i∈{g,l,t}} α_i·F̃_i + λ·R, where F_fused is the final output multi-modal fusion feature map; i ∈ {g, l, t} is the feature-type identifier, corresponding respectively to the deep neural network's global feature, the layout feature, and the Gabor texture feature; α_i is the dynamic attention weight for each of the three feature classes; F̃_i is the aligned feature after adaptive scale mapping; λ is the detail complementarity coefficient; and R = F̃_t − D(F̃_g) is the residual complementary term, formed from the aligned Gabor texture feature F̃_t and the downsampled aligned global feature D(F̃_g).
- 4. The automatic segmentation and extraction method for handwritten elements in complex layouts according to claim 1, wherein step S201 performs a preliminary binarization of the fused feature map using a locally adaptive threshold formula: T(x, y) = m(x, y) + k·s(x, y), where T(x, y) is the dynamic threshold at location (x, y) in the image; m(x, y) is the mean of the pixels in the local window around (x, y); s(x, y) is the standard deviation of the pixels within the local window; and k is an adjustment coefficient, typically negative to accommodate the processing requirements of low-contrast regions.
- 5. The automatic segmentation and extraction method for handwritten elements in complex layouts according to claim 4, wherein the binarization result of the binarization process is denoted B(x, y), defined as: B(x, y) = 1 if I(x, y) ≤ T(x, y), and B(x, y) = 0 otherwise, where I(x, y) is the pixel value of the input image at location (x, y) and B(x, y) is the binarized output, with 1 representing a foreground candidate region and 0 representing the background.
- 6. The automatic segmentation and extraction method of handwritten elements for complex layouts according to claim 1, wherein the context-aware connected-domain analysis in step S202 adopts a region similarity measurement formula: S(C_i, C_j) = w_1·IoU(C_i, C_j) + w_2·T_sim(C_i, C_j), where C_i and C_j are connected domains; IoU(·) is the intersection-over-union function; T_sim(·) is a texture similarity function; and w_1 and w_2 are weight parameters.
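Claim 6's region similarity can be illustrated as a weighted sum of bounding-box IoU and a texture term. The patent does not specify the form of the texture similarity function, so a simple distance-based score over texture descriptor vectors is assumed here:

```python
import numpy as np

def bbox_iou(a, b):
    """IoU of two axis-aligned boxes given as (x0, y0, x1, y1)."""
    x0, y0 = max(a[0], b[0]), max(a[1], b[1])
    x1, y1 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x1 - x0) * max(0, y1 - y0)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def texture_similarity(t_a, t_b):
    """Assumed T_sim: maps descriptor distance into (0, 1], 1 = identical."""
    t_a, t_b = np.asarray(t_a, float), np.asarray(t_b, float)
    return 1.0 / (1.0 + np.linalg.norm(t_a - t_b))

def region_similarity(box_a, box_b, tex_a, tex_b, w1=0.5, w2=0.5):
    """S(C_i, C_j) = w1*IoU + w2*T_sim, the weighted form of claim 6."""
    return w1 * bbox_iou(box_a, box_b) + w2 * texture_similarity(tex_a, tex_b)
```

With w1 + w2 = 1 the score stays in [0, 1], so a single threshold can decide whether two connected domains belong to the same handwriting region.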
- 7. The automatic segmentation and extraction method for handwritten elements in complex layouts according to claim 1, wherein the loop optimization mechanism of step S205 includes: calculating a segmentation uncertainty index based on the confidence distribution of the connected-domain classification; when the uncertainty index exceeds a preset threshold, activating a parameter-adaptive adjustment module that performs reverse iterative optimization of the binarization parameter k of step S201 and the connected-domain weights w_1, w_2 of step S202. The parameter-adaptive adjustment module performs closed-loop control as follows: the weights w_1 and w_2 are dynamically adjusted based on the filtering feedback information to iteratively optimize the connected-domain analysis; the weight adjustment adopts an adaptive learning algorithm; and the reverse adjustment of the binarization parameter k together with the connected-domain weights w_1, w_2 forms a closed optimization loop whose similarity-measurement threshold is dynamically tuned according to the historical judgment accuracy. The adaptive adjustment formula for the binarization parameter k is: k' = k − η·(U_tex + δ_c), where k' is the adjusted binarization parameter and k is the current parameter; U_tex is the texture uncertainty, defined as the texture-feature variance of connected domains whose judgment probability lies in the fuzzy interval [0.4, 0.6]; η is the learning rate; and δ_c is a contrast trend factor, positive when the local average contrast is below a global threshold, aimed at enhancing the capture of low-contrast ink. The adaptive adjustment formulas for the connected-domain weights w_1, w_2 are: w_1' = w_1 + β·U_ov and w_2' = w_2 + β·U_tex, where U_ov is the overlap uncertainty, defined as the proportion of connected domains whose judgment probability lies in the fuzzy interval and whose IoU > 0, and β is the adjustment step length. Lifting the weight w_1 via U_ov enhances the geometric separation capability for overlapped areas, and lifting the weight w_2 via U_tex enhances the recognition capability for blurred texture differences.
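Claim 7's closed-loop adjustment is described only loosely in the text; the following sketch assumes simple additive updates driven by the two uncertainty scores, with a hypothetical learning rate and step sizes (eta, beta1, beta2) and a renormalization step that the patent does not mention:

```python
def adapt_parameters(k, w1, w2, u_texture, u_overlap, contrast_low,
                     eta=0.05, beta1=0.05, beta2=0.05):
    """One closed-loop update step sketched from claim 7.

    k            -- current binarization coefficient (claim 4)
    w1, w2       -- current similarity weights (claim 6)
    u_texture    -- texture-uncertainty score in [0, 1]
    u_overlap    -- overlap-uncertainty score in [0, 1]
    contrast_low -- True if local mean contrast is below the global threshold
    """
    # Contrast trend factor: positive when local contrast is low,
    # nudging the binarization toward capturing faint ink.
    delta_c = 1.0 if contrast_low else 0.0
    k_new = k - eta * (u_texture + delta_c)
    # Overlap uncertainty lifts the geometric (IoU) weight; texture
    # uncertainty lifts the texture-similarity weight.
    w1_new = w1 + beta1 * u_overlap
    w2_new = w2 + beta2 * u_texture
    # Renormalize so the two similarity weights still sum to one
    # (an added convention, not stated in the claim).
    total = w1_new + w2_new
    return k_new, w1_new / total, w2_new / total
```

Running this once per filtering pass gives the feedback loop of S205: uncertain texture judgments shift weight toward the texture term, while uncertain overlaps shift weight toward geometry.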
- 8. The automatic segmentation and extraction method for handwritten elements in complex layouts according to claim 1, wherein the step S300 comprises: S301, judging whether a stroke fracture or defect exists, i.e., detecting whether the handwriting area contains stroke fractures or ink-leakage defects; S302, if yes, the attention-based semantic restoration module uses the attention mechanism to focus on semantically related areas and realize stroke-level intelligent restoration, after which the flow returns to step S301 for a second detection; S303, if no, no repair is needed and the flow proceeds directly to standardization; S304, normalization: unify the size and angle of the handwriting area and perform binarization to generate a normalized image; S305, output the normalized handwriting-area images and send them to the recognition engine; the processed handwriting areas can be used directly for the handwriting recognition of downstream tasks.
- 9. The automatic segmentation and extraction method of handwritten elements for complex layouts according to claim 6, wherein the attention mechanism in step S302 adopts the attention weight calculation formula: Attention(Q, K, V) = softmax(QK^T / √d_k)·V, where Q is the feature matrix of the defective handwriting-stroke area, composed of local features at the fracture/leakage positions; K is the feature matrix of the complete handwriting-stroke template, covering overall characteristics such as common stroke trends and connection forms; V is the stroke-semantic filling feature matrix carrying fine-grained stroke-completion information; and d_k is the dimension scaling factor.
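Claim 9's weight formula matches standard scaled dot-product attention, softmax(QK^T/√d_k)·V. A self-contained NumPy sketch (how Q, K, V are actually built from stroke features is not specified by the claim and is not attempted here):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """softmax(Q K^T / sqrt(d_k)) V -- the weight formula of claim 9.

    Q: (n_q, d_k) defect-region queries
    K: (n_k, d_k) stroke-template keys
    V: (n_k, d_v) stroke-completion values
    """
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                 # (n_q, n_k)
    scores -= scores.max(axis=-1, keepdims=True)    # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ V                              # (n_q, d_v)
```

Each defect-position query ends up as a convex combination of the template's completion vectors, which is what lets the restoration "borrow" strokes from the intact template.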
- 10. The automatic segmentation and extraction method for handwritten elements in complex layouts according to claim 6, wherein the normalization in step S304 uses an image scaling formula: I_norm = f(I_region, W_std, H_std), where I_norm is the normalized output image; W_std and H_std are the standard width and standard height, respectively; and f is a bilinear interpolation function.
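Claim 10's bilinear resize can be sketched without any library resize call. The coordinate mapping below is one common convention (align-corners style); the patent does not specify the mapping, so treat it as an assumption:

```python
import numpy as np

def bilinear_resize(img, out_w, out_h):
    """Resize a 2-D array to (out_h, out_w) with bilinear interpolation."""
    in_h, in_w = img.shape
    # Map each output pixel back into input coordinates (align-corners).
    ys = np.linspace(0, in_h - 1, out_h)
    xs = np.linspace(0, in_w - 1, out_w)
    y0 = np.floor(ys).astype(int)
    y1 = np.minimum(y0 + 1, in_h - 1)
    x0 = np.floor(xs).astype(int)
    x1 = np.minimum(x0 + 1, in_w - 1)
    wy = (ys - y0)[:, None]          # fractional row offsets
    wx = (xs - x0)[None, :]          # fractional column offsets
    img = img.astype(float)
    # Blend horizontally on the two bracketing rows, then vertically.
    top = img[np.ix_(y0, x0)] * (1 - wx) + img[np.ix_(y0, x1)] * wx
    bot = img[np.ix_(y1, x0)] * (1 - wx) + img[np.ix_(y1, x1)] * wx
    return top * (1 - wy) + bot * wy
```

Resizing every extracted handwriting region to the same (W_std, H_std) is what makes the S400 output blocks uniform for the downstream recognizer.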
Description
Automatic segmentation and extraction method for handwriting elements oriented to complex layout
Technical Field
The invention relates to the technical field of automatic segmentation and extraction of handwriting elements, and in particular to an automatic segmentation and extraction method of handwriting elements for complex layouts.
Background
At a moment of accelerating digital transformation, document digitization has become one of the core demands of many fields, such as government offices, financial services, medical archive management, and historical document preservation. Complex-layout documents, a common form in daily office work and historical preservation, often contain multiple elements such as printed text, handwritten annotations, form lines, and graphic identifiers. The handwritten elements carry key personalized information, such as handwritten signatures in notes, handwritten annotations in files, and handwritten content filled into forms; their accurate segmentation and extraction is the precondition for deep mining of document information.
The prior art either relies on a single dimension of features (for example, extracting only the global layout structure while ignoring the fine-grained texture of handwriting strokes, or focusing only on local texture detached from layout constraints), making it difficult to accurately lock the range of the handwriting area; or it adopts rigid algorithms such as fixed-threshold segmentation and simple connected-domain analysis, which cannot adapt to complex conditions in mixed layouts such as uneven illumination, varying ink shades, and overlapping or inclined elements. As a result, handwriting areas are easily missed during segmentation, and interference elements such as printed text or correction lines are easily misjudged as handwritten content. Meanwhile, for defects such as stroke breakage and ink leakage caused by the cross-writing common in mixed layouts, the prior art lacks a restoration means tailored to handwriting semantics: simple pixel-level filling cannot guarantee stroke integrity, which further aggravates deviations in handwriting-content recognition. Ultimately, the overall recognition and separation accuracy for handwritten content cannot meet the high-accuracy requirements of practical scenarios such as bills, historical files, and handwritten forms. In view of this, we propose an automatic segmentation and extraction method for handwriting elements oriented to complex layouts.
Disclosure of Invention
The invention aims to overcome the defects of the prior art, meet actual needs, and provide a method for automatically segmenting and extracting handwriting elements oriented to complex layouts, so as to solve the technical problem of insufficient precision in recognizing and separating handwritten content in mixed image-text layouts.
In order to solve the above technical problems, the invention provides the following technical scheme: an automatic segmentation and extraction method for handwriting elements facing complex layouts, comprising the following steps: S100, preprocessing the document image and extracting multi-modal features: preprocess the image and extract features of different modalities through three parallel paths (a deep neural network, layout analysis, and texture analysis), then fuse them to provide a high-quality, information-rich feature map for the subsequent steps; S200, accurately positioning and segmenting handwriting areas: using the feature map from S100, perform preliminary binarization via dynamic threshold segmentation, then perform context-aware connected-domain analysis to intelligently screen out real handwriting elements according to region attributes and contextual relations while filtering out various interferences; S201, dynamic threshold segmentation algorithm: adaptively adjust the segmentation threshold to meet the segmentation requirements of images under different illumination and ink concentration; S202, context-aware connected-domain analysis: perform semantic analysis on the segmented connected domains in combination with the document's context information; S203, judge whether each region is a handwriting element based on the connected-domain characteristics and context information; S204, if yes, retain the region as a handwriting-area candidate and generate a fine outline of the handwriting area, further optimizing the outline precision of regions confirmed as handwriting; S205, if no, filter and remove the non-handwriting interference elements and feed the filtering information back to step S202 to realize cyclic optimization; S300, semantic restoration and optimization of handwriting strokes: restoring the