KR-20260063560-A - Data augmentation apparatus and method for character recognition


Abstract

The present invention discloses a data augmentation device and method for character recognition. The invention detects an outline in an input character image, selects points on the outline according to a predefined logic, and modifies the outline by connecting, from each selected point, a line whose length is determined through a random function toward the outside or the inside of the original character. It then performs Gaussian smoothing on the character image with the modified outline to generate a blurred image in which the outline surface is smoothed, and performs thresholding on the blurred image to generate, from the input character image, a character image with a modified outline. Highly reliable augmented data can thereby be generated from an original character image with only simple operations.
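
The four steps summarized in the abstract can be illustrated with a minimal, standard-library Python sketch. It is not the applicant's implementation. The image is assumed to be a binary list of lists (1 for character, 0 for background), and every name and default value here (`augment`, `p_select`, `max_len`, `sigma`, `level`) is hypothetical. Two simplifications are assumptions of this sketch: the "predefined logic" is replaced by an independent coin flip per outline point, and the outward direction is approximated from the positions of neighbouring background pixels.

```python
import math
import random

# The 8 neighbour offsets (dy, dx) around a pixel.
NEIGHBOURS = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1) if (dy, dx) != (0, 0)]


def pixel(img, y, x):
    """Return img[y][x]; everything outside the image counts as background (0)."""
    return img[y][x] if 0 <= y < len(img) and 0 <= x < len(img[0]) else 0


def outline_points(img):
    """Yield (y, x, ny, nx) for foreground pixels that touch background.

    (ny, nx) is an approximate outward unit normal: the normalised sum of the
    offsets towards background neighbours. Pixels where those offsets cancel
    out (for example on a one-pixel-wide stroke) are skipped.
    """
    for y, row in enumerate(img):
        for x, value in enumerate(row):
            if not value:
                continue
            background = [(dy, dx) for dy, dx in NEIGHBOURS if not pixel(img, y + dy, x + dx)]
            ny = sum(dy for dy, _ in background)
            nx = sum(dx for _, dx in background)
            norm = math.hypot(ny, nx)
            if norm > 0:
                yield y, x, ny / norm, nx / norm


def modify_outline(img, rng, p_select=0.3, max_len=3):
    """Draw random-length lines outward (fill with 1) or inward (erase to 0)."""
    out = [row[:] for row in img]
    for y, x, ny, nx in outline_points(img):
        if rng.random() >= p_select:  # assumed selection logic: independent coin flip
            continue
        length = rng.randint(1, max_len)  # line length from a random function
        sign, value = (1, 1) if rng.random() < 0.5 else (-1, 0)
        for step in range(length + 1):
            py, px = round(y + sign * step * ny), round(x + sign * step * nx)
            if 0 <= py < len(out) and 0 <= px < len(out[0]):
                out[py][px] = value
    return out


def gaussian_blur(img, sigma=1.0):
    """Separable Gaussian smoothing with edge clamping."""
    radius = max(1, math.ceil(3 * sigma))
    offsets = range(-radius, radius + 1)
    kernel = [math.exp(-(k * k) / (2 * sigma * sigma)) for k in offsets]
    total = sum(kernel)
    kernel = [k / total for k in kernel]
    h, w = len(img), len(img[0])

    def clamp(i, n):
        return min(max(i, 0), n - 1)

    rows = [[sum(kernel[k + radius] * img[y][clamp(x + k, w)] for k in offsets)
             for x in range(w)] for y in range(h)]
    return [[sum(kernel[k + radius] * rows[clamp(y + k, h)][x] for k in offsets)
             for x in range(w)] for y in range(h)]


def threshold(img, level=0.5):
    """Set each pixel to 1 (character) or 0 (background)."""
    return [[1 if v >= level else 0 for v in row] for row in img]


def augment(img, seed=None, p_select=0.3, max_len=3, sigma=1.0, level=0.5):
    """Outline detection, outline modification, Gaussian smoothing, thresholding."""
    rng = random.Random(seed)
    return threshold(gaussian_blur(modify_outline(img, rng, p_select, max_len), sigma), level)
```

Calling `augment(img, seed=k)` with different seeds yields different variants of the same character, and fixing the seed makes a variant reproducible. A production version would more likely trace ordered contours and use an image library for the blur; this sketch avoids external dependencies only to stay self-contained.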

Inventors

최강선
박채호

Assignees

한국기술교육대학교 산학협력단

Dates

Publication Date: 20260507
Application Date: 20241030

Claims (11)

A data augmentation method for character recognition, performed in a data augmentation device for character recognition comprising a memory storing instructions and a processor executing the instructions stored in the memory, the method comprising, by the processor executing the instructions stored in the memory: (a) detecting an outline from an input character image; (b) modifying the outline by connecting lines, each of a length determined through a random function, from those points among the points constituting the outline at which it is determined, according to a predefined logic, that the outline surface is to be modified; (c) generating a blurred image by performing Gaussian smoothing on the character image with the modified outline; and (d) performing thresholding on the blurred image to generate, from the input character image, a character image with a modified outline.
The method of claim 1, wherein in step (b) the processor decides, in accordance with Markov process logic, whether to modify the outline at each of the points constituting the outline according to Equation 1 below, the state determined by Equation 1 being either a modified state or a non-modified state for the outline: [Equation 1] $s_t = \mathrm{XOR}\left(s_{t-1},\ \left(\mathrm{rand}() \ge p_m\right)\right)$, where $s_t$ represents the state at the current point in time, $p_m$ is the probability of maintaining the previous state and has a preset value between 0 and 1, and rand() is a random function that generates a random number between 0 and 1.
The method of claim 1, wherein in step (b) the processor modifies the outline by connecting the lines in the normal direction of the outline.
The method of claim 1, wherein in step (b) the processor extends the outline surface by connecting lines toward the outside of the original character from the points at which it is determined that the outline surface is to be modified, or erodes the outline surface by connecting lines toward the inside of the original character.
The method of claim 1, wherein in step (d) the processor generates a clear character image by comparing the pixel value of each pixel of the blurred image with a predefined threshold and setting the pixel to a character area or a background area.
A computer program stored in a non-transitory storage medium and executed on a computer including a processor to perform the data augmentation method for character recognition according to any one of claims 1 to 5.
A data augmentation device for character recognition comprising a processor and a memory storing predetermined instructions, wherein the processor, executing the instructions stored in the memory, performs: (a) detecting an outline from an input character image; (b) modifying the outline by connecting lines, each of a length determined through a random function, from those points among the points constituting the outline at which it is determined, according to a predefined logic, that the outline surface is to be modified; (c) generating a blurred image by performing Gaussian smoothing on the character image with the modified outline; and (d) performing thresholding on the blurred image to generate, from the input character image, a character image with a modified outline.
The device of claim 7, wherein in step (b) the processor decides, in accordance with Markov process logic, whether to modify the outline at each of the points constituting the outline according to Equation 1 below, the state determined by Equation 1 being either a modified state or a non-modified state for the outline: [Equation 1] $s_t = \mathrm{XOR}\left(s_{t-1},\ \left(\mathrm{rand}() \ge p_m\right)\right)$, where $s_t$ represents the state at the current point in time, $p_m$ is the probability of maintaining the previous state and has a preset value between 0 and 1, and rand() is a random function that generates a random number between 0 and 1.
The device of claim 7, wherein in step (b) the processor modifies the outline by connecting the lines in the normal direction of the outline.
The device of claim 7, wherein in step (b) the processor extends the outline surface by connecting lines toward the outside of the original character from the points at which it is determined that the outline surface is to be modified, or erodes the outline surface by connecting lines toward the inside of the original character.
The device of claim 1, wherein in step (d) the processor generates a clear character image by comparing the pixel value of each pixel of the blurred image with a predefined threshold and setting it to 1 or 0.
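
Equation 1 of the claims can be read as a two-state Markov chain evaluated once per outline point: the previous state is kept with probability p_m and flipped otherwise. The Python sketch below illustrates that reading. The function name `markov_states`, the `initial_state` argument, and the convention that `True` means "modify" are assumptions of this sketch; the claims do not specify an initial state.

```python
import random


def markov_states(n_points, p_m, rng=None, initial_state=False):
    """Return one modify / do-not-modify decision per outline point.

    Implements s_t = XOR(s_{t-1}, rand() >= p_m): the state flips whenever
    the random draw is at least p_m, and is kept otherwise.
    """
    rng = rng or random.Random()
    state = initial_state  # assumed starting state; not given in the claims
    states = []
    for _ in range(n_points):
        flip = rng.random() >= p_m  # true with probability 1 - p_m
        state = state != flip       # XOR of two booleans
        states.append(state)
    return states
```

Because the state is kept with probability p_m at every step, runs of consecutive modified (or unmodified) points have an expected length of 1 / (1 - p_m). A value of p_m close to 1 therefore produces long contiguous stretches of modified outline, while a value close to 0 makes the state alternate at almost every point.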

Description

Data augmentation apparatus and method for character recognition
The present invention relates to a data augmentation device and method, and more specifically to a data augmentation device and method for character recognition.
Metal movable type printing has a long history in Korea. The Jikji Simche Yojeol, produced during the Goryeo Dynasty and printed in 1377, is the oldest known metal movable type book in the world. The tradition continued through the Joseon Dynasty, during which various metal movable type books were produced that predate the Gutenberg Bible, known as the oldest in Europe. Korea has thus long held a significant position in metal movable type printing technology.
However, the printing methods that used movable type in the Goryeo and Joseon dynasties faced various technical challenges. Character shapes varied between individual types cast for the same character, and print quality was inconsistent because of the limitations of the printing technology. The causes included ink not adhering properly to the character surfaces, the surfaces of the types not all lying in a single plane, and paper slipping during printing. Consequently, even repeated printing with the same movable type produced varying results, some of them of low quality.
Scientific methods for analyzing such ancient printed materials have been studied. They require precise scanning, character stroke extraction, character segmentation, and character recognition. The character recognition step in particular requires a large amount of training data in order to recognize characters of various shapes and qualities accurately. A large amount of training data can be generated from modern fonts, but performance remains limited unless data on the special typefaces used in ancient printed materials is added.
Because the available scanned data of ancient prints is insufficient relative to the number of character types used, a data augmentation technique that produces character shapes resembling those of ancient prints is required.
Meanwhile, clustering methods are used to evaluate the performance of such data augmentation techniques. Traditional clustering methods such as K-means, Hierarchical Clustering, and Density-based Approaches form clusters from distance-based similarity and group small to medium-sized datasets effectively. However, these methods are less efficient for complex or large-scale datasets.
In contrast, Deep Clustering uses Deep Learning to process large-scale, high-dimensional data and to uncover complex patterns that traditional methods cannot discover. Deep Clustering employs self-supervised learning to derive representations of high-dimensional data and performs clustering on those representations. Notable implementations include Representation Learning, inspired by Spectral Clustering, and Autoencoder-based methods, which excel at extracting features from unlabeled data for clustering.
Key metrics for evaluating clustering include Accuracy (ACC), Adjusted Rand Index (ARI), and Normalized Mutual Information (NMI). They quantify the effectiveness of clustering in terms of the accuracy of the clustering results, the degree of agreement with the labels, and the amount of shared information. ACC measures clustering accuracy by first finding the optimal assignment of cluster labels to the actual labels and then comparing the two. ARI evaluates the similarity between the actual labels and the clustering results while correcting for agreement that would occur by chance. NMI assesses how well the clustering reflects the underlying structure of the data by measuring the normalized mutual information between the clustering results and the actual data structure.
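As an illustration of the three metrics named above, the following standard-library Python sketch computes ACC, ARI, and NMI for two label sequences of equal length. The function names are hypothetical. ACC here searches all one-to-one mappings by brute force, which is only practical for a handful of clusters (an assignment solver such as the Hungarian algorithm would normally be used), and NMI is normalized by the arithmetic mean of the two entropies, which is one of several conventions in use and an assumption of this sketch.

```python
from collections import Counter
from itertools import permutations
from math import comb, log


def clustering_accuracy(labels, clusters):
    """ACC: best match rate over one-to-one mappings of clusters to labels."""
    label_ids = list(dict.fromkeys(labels))
    cluster_ids = list(dict.fromkeys(clusters))
    # Pad with None so that surplus clusters can stay unmatched.
    targets = label_ids + [None] * max(0, len(cluster_ids) - len(label_ids))
    pairs = Counter(zip(clusters, labels))
    best = max(
        sum(pairs[(c, t)] for c, t in zip(cluster_ids, mapping))
        for mapping in permutations(targets, len(cluster_ids))
    )
    return best / len(labels)


def adjusted_rand_index(labels, clusters):
    """ARI: pair-counting agreement corrected for chance (needs >= 2 samples)."""
    n = len(labels)
    together = sum(comb(v, 2) for v in Counter(zip(labels, clusters)).values())
    same_label = sum(comb(v, 2) for v in Counter(labels).values())
    same_cluster = sum(comb(v, 2) for v in Counter(clusters).values())
    expected = same_label * same_cluster / comb(n, 2)
    maximum = (same_label + same_cluster) / 2
    if maximum == expected:  # degenerate case, e.g. one label and one cluster
        return 1.0
    return (together - expected) / (maximum - expected)


def entropy(values):
    """Shannon entropy (natural log) of a label sequence."""
    n = len(values)
    return -sum(c / n * log(c / n) for c in Counter(values).values())


def normalized_mutual_information(labels, clusters):
    """NMI: mutual information divided by the mean of the two entropies."""
    n = len(labels)
    a, b = Counter(labels), Counter(clusters)
    mutual = sum(
        c / n * log(n * c / (a[u] * b[v]))
        for (u, v), c in Counter(zip(labels, clusters)).items()
    )
    mean_entropy = (entropy(labels) + entropy(clusters)) / 2
    return mutual / mean_entropy if mean_entropy > 0 else 1.0
```

All three functions ignore the actual cluster identifiers: a clustering that reproduces the true grouping under different names scores 1 on every metric, whereas a clustering that is statistically independent of the labels scores about 0 on NMI and about 0 or below on ARI.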
Handwritten Data Augmentation is a technology that increases the diversity of handwritten image data through adjustments such as thickening, thinning, stretching, and partially erasing characters. This approach can be integrated with CNN-based OCR models and, when used together with existing data augmentation techniques, can improve model performance.
FIG. 1 is a diagram illustrating the hardware configuration of a data augmentation device for character recognition according to a preferred embodiment of the present invention.
FIG. 2 is a block diagram illustrating the functions of a data augmentation device for character recognition and each step of a data augmentation method according to a preferred embodiment of the present invention.
FIG. 4 is a drawing illustrating a method of modifying the outline surface of a character according to a preferred embodiment of the present invention.
FIG. 5 is a drawing illustrating an example of augmented data according to a preferred embodiment of the present invention.
FIG. 6 is a drawing illustrating other examples of augmented data according to a preferred embodiment of the present inve