CN-122024256-A - Hard-tipped pen stroke segmentation extraction method, system, device and storage medium
Abstract
The invention discloses a hard-tipped pen stroke segmentation and extraction method, system, device and storage medium, belonging to the technical field of image segmentation. The method comprises the steps of obtaining handwritten hard-tipped pen stroke image data and inputting it into a pre-trained improved TransUNet model to obtain a stroke segmentation and extraction result, wherein the improved TransUNet model comprises an encoder, a skip connection layer, a Transformer encoding layer and a decoder. The invention innovatively fuses different attention mechanisms into the encoder, the skip connection layer and the decoder respectively, can more comprehensively capture the edge, spatial and multi-scale context information of hard-tipped pen strokes, and remarkably improves segmentation precision and robustness compared with existing stroke segmentation and extraction methods.
Inventors
- Xu Zhanyang
- Niu Yaohui
- Shen Xun
- Dai Liangchen
- Li Mengting
- Zhang Mengjiao
Assignees
- Nanjing University of Information Science and Technology (南京信息工程大学)
Dates
- Publication Date: 2026-05-12
- Application Date: 2026-04-16
Claims (10)
- 1. A hard-tipped pen stroke segmentation and extraction method, characterized by comprising the following steps: acquiring handwritten hard-tipped pen stroke image data; inputting the handwritten hard-tipped pen stroke image data into a pre-trained improved TransUNet model to obtain a stroke segmentation and extraction result; wherein the improved TransUNet model includes an encoder, a skip connection layer, a Transformer encoding layer, and a decoder; in the encoder, each downsampling layer uses convolution to extract features from its input to obtain first features, a coordinate attention module extracts attention vectors of the first features in the horizontal direction and the vertical direction respectively, the horizontal-direction and vertical-direction attention vectors are multiplied element-wise with the first features to obtain enhanced features, and the enhanced features serve as the output of the downsampling layer; in the skip connection layer, depthwise separable convolution extracts boundary features of the enhanced features, a channel attention mechanism extracts channel attention weights of the enhanced features, the channel attention weights are multiplied element-wise with the enhanced features to obtain channel-enhanced features, a spatial attention mechanism pools the channel-enhanced features to obtain spatial attention weights, the spatial attention weights are multiplied element-wise with the channel-enhanced features to obtain spatially enhanced features, the boundary features and the spatially enhanced features are fused to obtain fine features, and the fine features serve as the output of the skip connection layer; the output of the Transformer encoding layer undergoes feature extraction by average pooling and multipath parallel convolution respectively, the features extracted by the average pooling and all of the convolutions undergo a first concatenation and convolution to obtain second features, a position attention mechanism performs global spatial dependency modeling on the second features to obtain position-enhanced features, and the position-enhanced features are input into the first upsampling layer of the decoder.
- 2. The hard-tipped pen stroke segmentation and extraction method according to claim 1, wherein extracting the attention vectors of the first features in the horizontal direction and the vertical direction with the coordinate attention module and multiplying the horizontal-direction and vertical-direction attention vectors element-wise with the first features to obtain the enhanced features comprises: performing global average pooling on the first features along the vertical direction and the horizontal direction respectively to obtain a vertical-direction coding vector and a horizontal-direction coding vector, expressed as: $z_c^h(h) = \frac{1}{W}\sum_{0 \le i < W} x_c(h, i)$; $z_c^w(w) = \frac{1}{H}\sum_{0 \le j < H} x_c(j, w)$; wherein $z_c^h(h)$ represents the vertical-direction coding vector, $W$ represents the width of the first features, $x_c(h, i)$ represents the feature of the first features at height $h$ and width $i$ in channel dimension $c$, $z_c^w(w)$ represents the horizontal-direction coding vector, $H$ represents the height of the first features, and $x_c(j, w)$ represents the feature of the first features at height $j$ and width $w$ in channel dimension $c$; performing a second concatenation of the vertical-direction coding vector and the horizontal-direction coding vector along the spatial dimension to obtain a first intermediate feature $f_0$; performing dimension reduction on the first intermediate feature $f_0$ by a 1×1 convolution to obtain a second intermediate feature $f$, and splitting the second intermediate feature $f$ along the spatial dimension into a third intermediate feature $f^h$ and a fourth intermediate feature $f^w$; applying a 1×1 convolution and a Sigmoid activation function to the third intermediate feature $f^h$ and the fourth intermediate feature $f^w$ respectively to obtain a vertical-direction attention vector $g^h$ and a horizontal-direction attention vector $g^w$; and multiplying the first features element-wise with the horizontal-direction attention vector $g^w$ and the vertical-direction attention vector $g^h$ to obtain the enhanced features.
- 3. The hard-tipped pen stroke segmentation and extraction method according to claim 1, wherein extracting the boundary features of the enhanced features using depthwise separable convolution comprises: performing depthwise separable convolution on the enhanced features followed by a Sigmoid activation function to obtain the boundary features, expressed as: $F_{boundary} = \sigma(\mathrm{DSConv}(F_{enh}))$; wherein $F_{boundary}$ represents the boundary features, $\sigma$ represents the Sigmoid activation function, $\mathrm{DSConv}$ represents the depthwise separable convolution, and $F_{enh}$ represents the enhanced features.
- 4. The hard-tipped pen stroke segmentation and extraction method according to claim 1, wherein the channel-enhanced features are expressed as: $F_{ch} = M_c \odot F_{enh}$; wherein $F_{ch}$ represents the channel-enhanced features, $M_c$ represents the channel attention weights, $F_{enh}$ represents the enhanced features, and $\odot$ represents element-wise multiplication; the spatially enhanced features are expressed as: $F_{sp} = M_s \odot F_{ch}$; wherein $F_{sp}$ represents the spatially enhanced features and $M_s$ represents the spatial attention weights.
- 5. The hard-tipped pen stroke segmentation and extraction method according to claim 1, wherein fusing the boundary features and the spatially enhanced features to obtain the fine features comprises: fusing the boundary features and the spatially enhanced features through a residual fusion mechanism to obtain the fine features, expressed as: $F_{fine} = F_{sp} + \gamma \cdot F_{boundary}$; wherein $F_{fine}$ represents the fine features, $F_{sp}$ represents the spatially enhanced features, $F_{boundary}$ represents the boundary features, and $\gamma$ represents a learnable parameter.
- 6. The hard-tipped pen stroke segmentation and extraction method according to claim 1, wherein performing feature extraction on the output of the Transformer encoding layer using average pooling and multipath parallel convolution respectively, and performing the first concatenation and convolution on the features extracted by the average pooling and all the convolutions to obtain the second features, comprises: the multipath parallel convolution comprises convolutions with dilation (void) rates of 1, 6, 12 and 18 respectively; and performing the first concatenation on the features extracted by the average pooling and all the convolutions, and applying a 1×1 convolution to the concatenated features to obtain the second features.
- 7. The hard-tipped pen stroke segmentation and extraction method according to claim 1, wherein performing global spatial dependency modeling on the second features using the position attention mechanism to obtain the position-enhanced features and inputting the position-enhanced features into the first upsampling layer of the decoder comprises: generating a query matrix $Q$, a key matrix $K$ and a value matrix $V$ from the second features by convolution respectively, then computing a spatial attention matrix, expressed as: $s_{ji} = \frac{\exp(q_j \cdot k_i)}{\sum_{i=1}^{N} \exp(q_j \cdot k_i)}$; wherein $s_{ji}$ indicates the correlation strength between the current $j$-th spatial position and any $i$-th spatial position, $k_i$ represents the key vector of the $i$-th spatial position, $q_j$ represents the query vector of the $j$-th spatial position, $N$ represents the total number of spatial positions, and $\exp$ represents the natural exponential function; and aggregating by weighting with the correlation strengths $s_{ji}$ to obtain the position-enhanced features, expressed as: $E_j = \alpha \sum_{i=1}^{N} s_{ji} v_i + F_j$; wherein $E_j$ represents the position-enhanced feature, $\alpha$ represents a learnable parameter, $v_i$ represents the value vector of the $i$-th spatial position, and $F_j$ represents the feature vector of the second features at the $j$-th spatial position.
- 8. A hard-tipped pen stroke segmentation and extraction system, comprising: a data acquisition module, used for acquiring handwritten hard-tipped pen stroke image data; and a segmentation and extraction module, used for inputting the handwritten hard-tipped pen stroke image data into a pre-trained improved TransUNet model to obtain a stroke segmentation and extraction result; wherein the improved TransUNet model includes an encoder, a skip connection layer, a Transformer encoding layer, and a decoder; in the encoder, each downsampling layer uses convolution to extract features from its input to obtain first features, a coordinate attention module extracts attention vectors of the first features in the horizontal direction and the vertical direction respectively, the horizontal-direction and vertical-direction attention vectors are multiplied element-wise with the first features to obtain enhanced features, and the enhanced features serve as the output of the downsampling layer; in the skip connection layer, depthwise separable convolution extracts boundary features of the enhanced features, a channel attention mechanism extracts channel attention weights of the enhanced features, the channel attention weights are multiplied element-wise with the enhanced features to obtain channel-enhanced features, a spatial attention mechanism pools the channel-enhanced features to obtain spatial attention weights, the spatial attention weights are multiplied element-wise with the channel-enhanced features to obtain spatially enhanced features, the boundary features and the spatially enhanced features are fused to obtain fine features, and the fine features serve as the output of the skip connection layer; the output of the Transformer encoding layer undergoes feature extraction by average pooling and multipath parallel convolution respectively, the features extracted by the average pooling and all of the convolutions undergo a first concatenation and convolution to obtain second features, a position attention mechanism performs global spatial dependency modeling on the second features to obtain position-enhanced features, and the position-enhanced features are input into the first upsampling layer of the decoder.
- 9. A hard-tipped pen stroke segmentation and extraction device, characterized by comprising a processor and a storage medium; the storage medium is used for storing instructions; and the processor is configured to operate according to the instructions to perform the steps of the hard-tipped pen stroke segmentation and extraction method according to any one of claims 1 to 7.
- 10. A computer-readable storage medium having a computer program stored thereon, characterized in that the program, when executed by a processor, implements the steps of the hard-tipped pen stroke segmentation and extraction method according to any one of claims 1 to 7.
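The encoder / skip connection / Transformer bottleneck / decoder pipeline of claim 1 can be sketched as a pure data-flow skeleton in Python. The function and parameter names below are ours, and every stage is a hypothetical stand-in callable rather than the patent's trained modules:

```python
import numpy as np

def improved_transunet_forward(image, downsample, refine_skip, transformer,
                               bottleneck, upsample):
    """Wiring of the improved TransUNet from claim 1 (data flow only).

    Each callable is a hypothetical stand-in for a trained sub-module:
    `downsample` layers emit coordinate-attention-enhanced features,
    `refine_skip` layers turn them into fine features, `bottleneck` covers
    the pooling / parallel-convolution / position-attention stage, and the
    decoder adds each upsampled feature to its refined skip feature.
    """
    skips, x = [], image
    for down in downsample:                    # encoder path
        x = down(x)                            # enhanced features
        skips.append(x)
    x = bottleneck(transformer(x))             # Transformer + position attention
    for up, refine, s in zip(upsample, refine_skip, reversed(skips)):
        x = up(x) + refine(s)                  # decoder with refined skips
    return x
```

With identity-like stand-ins this only traces the additions; in the real model each stage also changes resolution and channel count.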
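The coordinate attention of claim 2 can be sketched in NumPy. The 1×1 convolutions are replaced here by fixed random projections (the real module learns them), so the sketch illustrates only the pooling, concatenation, split, gating and reweighting data flow:

```python
import numpy as np

def coordinate_attention(x, reduction=8):
    """Sketch of claim 2's coordinate attention on a feature map x of shape (C, H, W).

    Width- and height-pooled vectors are concatenated, reduced, split back,
    squashed with a sigmoid, and used to reweight the input element-wise.
    The learned 1x1 convolutions are replaced by fixed random projections.
    """
    c, h, w = x.shape
    z_h = x.mean(axis=2)                        # (C, H): pool along width
    z_w = x.mean(axis=1)                        # (C, W): pool along height
    f0 = np.concatenate([z_h, z_w], axis=1)     # (C, H+W): spatial concat
    rng = np.random.default_rng(0)
    c_r = max(c // reduction, 1)
    w1 = rng.standard_normal((c_r, c)) * 0.1    # stand-in for the reducing 1x1 conv
    f = np.maximum(w1 @ f0, 0.0)                # (C/r, H+W), ReLU
    f_h, f_w = f[:, :h], f[:, h:]               # split along the spatial dim
    w_h = rng.standard_normal((c, c_r)) * 0.1   # stand-ins for the expanding 1x1 convs
    w_w = rng.standard_normal((c, c_r)) * 0.1
    sig = lambda t: 1.0 / (1.0 + np.exp(-t))
    g_h = sig(w_h @ f_h)                        # (C, H) vertical attention vector
    g_w = sig(w_w @ f_w)                        # (C, W) horizontal attention vector
    return x * g_h[:, :, None] * g_w[:, None, :]

x = np.random.default_rng(1).standard_normal((16, 8, 8))
y = coordinate_attention(x)
assert y.shape == x.shape
```

Because both gates lie in (0, 1), the enhanced features can only attenuate the input, never amplify it.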
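A minimal NumPy sketch of claim 3's boundary branch follows. The depthwise kernel here is a fixed Laplacian chosen for its edge sensitivity, and the pointwise 1×1 mix is left as identity; the patent's depthwise separable convolution learns both stages:

```python
import numpy as np

def depthwise_separable_boundary(x):
    """Boundary features B = sigmoid(DSConv(F)) from claim 3, sketched in numpy.

    Depthwise stage: each channel filtered with a fixed 3x3 Laplacian kernel
    (an illustrative edge detector); pointwise stage: identity 1x1 mix.
    """
    c, h, w = x.shape
    lap = np.array([[0, 1, 0], [1, -4, 1], [0, 1, 0]], dtype=float)
    pad = np.pad(x, ((0, 0), (1, 1), (1, 1)), mode="edge")
    dw = np.zeros_like(x)
    for i in range(h):
        for j in range(w):
            patch = pad[:, i:i + 3, j:j + 3]    # (C, 3, 3) window per channel
            dw[:, i, j] = (patch * lap).sum(axis=(1, 2))
    return 1.0 / (1.0 + np.exp(-dw))            # sigmoid -> boundary map

x = np.zeros((1, 6, 6)); x[0, :, 3:] = 1.0      # vertical step edge
b = depthwise_separable_boundary(x)
```

On the step-edge input the map sits at 0.5 in flat regions and deviates only along the edge columns, which is exactly the behavior the boundary branch is meant to expose.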
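Claim 4 specifies only how the channel and spatial weights are applied, not how they are computed; the sketch below fills that gap with a common CBAM-style choice (pooling followed by a sigmoid), which is our assumption rather than the patent's definition:

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def channel_spatial_attention(x):
    """Claim 4's cascaded weighting on x of shape (C, H, W), sketched in numpy.

    Channel weights M_c come from global average pooling through a sigmoid;
    spatial weights M_s come from channel-wise mean/max pooling (a CBAM-style
    assumption, untrained). Each weight multiplies the features element-wise.
    """
    m_c = sigmoid(x.mean(axis=(1, 2)))          # (C,) channel attention weights
    f_ch = m_c[:, None, None] * x               # channel-enhanced features
    avg = f_ch.mean(axis=0)                     # (H, W) mean over channels
    mx = f_ch.max(axis=0)                       # (H, W) max over channels
    m_s = sigmoid(avg + mx)                     # (H, W) spatial attention weights
    f_sp = m_s[None, :, :] * f_ch               # spatially enhanced features
    return f_ch, f_sp
```

A trained module would pass the pooled statistics through a small MLP or convolution before the sigmoid; the cascade order (channel first, then spatial) matches the claim.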
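The multipath branch of claim 6 resembles ASPP: a global-average-pooling branch plus 3×3 convolutions at dilation rates 1, 6, 12 and 18, concatenated and mixed by a 1×1 convolution. The sketch below substitutes a fixed smoothing kernel and a random 1×1 projection for the learned weights:

```python
import numpy as np

def aspp_like(x, rates=(1, 6, 12, 18)):
    """ASPP-style multipath extraction from claim 6, sketched on x of shape (C, H, W).

    One global-average-pooling branch plus one 3x3 dilated-convolution branch
    per rate; branch outputs are concatenated along channels ("first
    concatenation") and mixed by a 1x1 convolution (a random projection here).
    """
    c, h, w = x.shape
    branches = [np.broadcast_to(x.mean(axis=(1, 2))[:, None, None], x.shape)]
    kernel = np.full((3, 3), 1.0 / 9.0)         # illustrative smoothing kernel
    for r in rates:
        pad = np.pad(x, ((0, 0), (r, r), (r, r)), mode="edge")
        out = np.zeros_like(x)
        for i in range(h):
            for j in range(w):
                # 3x3 taps spaced r pixels apart (dilation / void rate r)
                patch = pad[:, i:i + 2 * r + 1:r, j:j + 2 * r + 1:r]
                out[:, i, j] = (patch * kernel).sum(axis=(1, 2))
        branches.append(out)
    cat = np.concatenate(branches, axis=0)      # (5C, H, W) first concatenation
    rng = np.random.default_rng(0)
    w1 = rng.standard_normal((c, cat.shape[0])) * 0.1   # 1x1 conv stand-in
    return np.einsum("oc,chw->ohw", w1, cat)    # second features (C, H, W)
```

The large rates sample context far from each pixel while the rate-1 branch keeps local detail, which is the point of running the paths in parallel before the 1×1 fusion.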
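The position attention of claim 7 can be sketched in NumPy with random stand-ins for the learned query/key/value convolutions; only the softmax correlation matrix s_ji and the residual aggregation E_j = α·Σ_i s_ji·v_i + F_j are reproduced faithfully:

```python
import numpy as np

def position_attention(x, alpha=1.0):
    """Position attention from claim 7, sketched on x of shape (C, H, W).

    Query/key/value projections are untrained random 1x1 convolutions; s[j, i]
    is the softmax correlation of position j with position i, and the output
    adds the aggregated values back to the input residually (alpha is the
    learnable scalar of the claim, fixed here).
    """
    c, h, w = x.shape
    n = h * w
    f = x.reshape(c, n)                         # flatten spatial positions
    rng = np.random.default_rng(0)
    wq = rng.standard_normal((c, c)) * 0.1      # stand-ins for learned projections
    wk = rng.standard_normal((c, c)) * 0.1
    wv = rng.standard_normal((c, c)) * 0.1
    q, k, v = wq @ f, wk @ f, wv @ f            # (C, N) each
    logits = q.T @ k                            # (N, N): q_j . k_i
    logits -= logits.max(axis=1, keepdims=True) # numerical stability
    s = np.exp(logits)
    s /= s.sum(axis=1, keepdims=True)           # softmax over i: rows sum to 1
    agg = v @ s.T                               # (C, N): sum_i s[j, i] * v_i
    return (alpha * agg + f).reshape(c, h, w)   # residual connection

x = np.random.default_rng(1).standard_normal((8, 4, 4))
e = position_attention(x)
assert e.shape == x.shape
```

Setting alpha to 0 recovers the input exactly, which shows why the learnable scalar lets the model phase the global term in gradually during training.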
Description
Hard-tipped pen stroke segmentation extraction method, system, device and storage medium

Technical Field

The invention belongs to the technical field of image segmentation, and particularly relates to a hard-tipped pen stroke segmentation and extraction method, system, device and storage medium.

Background

Strokes are the basic constituent units of hard-tipped pen characters, and stroke extraction results directly influence the accuracy of subsequent tasks such as handwriting structure analysis, writing quality evaluation and copying error correction. In the hard-tipped pen regular-script handwriting of primary and middle school students, phenomena such as connected strokes, adhesion, intersections and locally blurred boundaries often occur in single-character images, owing to marked differences in writing speed, pen pressure and individual style, which makes the stroke extraction task considerably difficult. Traditional methods based on skeletonization, contour tracing or heuristic rules achieve a certain effect in character scenes with regular structure and little noise, but in handwritten-character scenes they are prone to stroke breakage, mis-segmentation of crossing regions and loss of thin strokes. In recent years, deep-learning segmentation methods have been introduced into the field of hard-tipped pen stroke extraction. Convolutional neural networks can automatically learn stroke edge and texture features, the Transformer structure can model long-range dependencies, and combining the two can improve whole-character segmentation quality to a certain extent.
However, the existing schemes still have the following defects: (1) traditional convolutional neural networks have limited ability to model long-range dependencies, and the global structural information between strokes is difficult to capture fully; (2) the feature transfer mode of the skip connections in the encoding-decoding structure is too simple to focus effectively on the detail features of stroke edges and crossing regions; (3) spatial information is severely lost during decoder upsampling, which reduces segmentation precision for complex stroke structures.

Disclosure of Invention

The invention aims to overcome the above defects in the prior art, and provides a hard-tipped pen stroke segmentation and extraction method, system, device and storage medium, to solve the problems of edge detail loss, insufficient segmentation precision in crossing regions and spatial information loss in existing hard-tipped pen stroke extraction methods.
The invention provides the following technical scheme: In a first aspect, a hard-tipped pen stroke segmentation and extraction method is provided, comprising the steps of obtaining handwritten hard-tipped pen stroke image data; inputting the handwritten hard-tipped pen stroke image data into a pre-trained improved TransUNet model to obtain a stroke segmentation and extraction result; wherein the improved TransUNet model includes an encoder, a skip connection layer, a Transformer encoding layer, and a decoder; in the encoder, each downsampling layer uses convolution to extract features from its input to obtain first features, a coordinate attention module extracts attention vectors of the first features in the horizontal direction and the vertical direction respectively, the horizontal-direction and vertical-direction attention vectors are multiplied element-wise with the first features to obtain enhanced features, and the enhanced features serve as the output of the downsampling layer; in the skip connection layer, depthwise separable convolution extracts boundary features of the enhanced features, a channel attention mechanism extracts channel attention weights of the enhanced features, the channel attention weights are multiplied element-wise with the enhanced features to obtain channel-enhanced features, a spatial attention mechanism pools the channel-enhanced features to obtain spatial attention weights, the spatial attention weights are multiplied element-wise with the channel-enhanced features to obtain spatially enhanced features, the boundary features and the spatially enhanced features are fused to obtain fine features, and the fine features serve as the output of the skip connection layer; performing feature extraction on the output of the Transformer encoding layer using average pooling and multipath parallel convolution respectively, performing a first concatenation and convolution on the features extracted by the average pooling and all the convolutions to obtain second features, and performing global spatial dependency modeling on the