US-12620245-B2 - Image processing apparatus, image processing method, and storage medium
Abstract
A training image is generated that reflects the way a hane occurs in actual handwriting. Among the line segments constituting a handwritten character in a character image representing the handwritten character, a line segment at which a handwritten hane may occur is detected. A training image is then generated by adding a simulated hane to the end portion of the detected line segment.
Inventors
- Hidekazu Seto
Assignees
- CANON KABUSHIKI KAISHA
Dates
- Publication Date: 2026-05-05
- Application Date: 2022-06-10
- Priority Date: 2021-06-16
Claims (18)
- 1 . An image processing apparatus comprising: a memory that stores a program; and a processor that executes the program to perform: obtaining a character image representing a handwritten character; detecting a line segment at which an extra segment may occur in handwriting among line segments constituting the handwritten character in the character image; adding a simulated extra segment to the end portion of the detected line segment; and generating training data for machine learning by associating the character image in which the simulated extra segment has been added and a correct answer class with each other; performing the machine learning by using the generated training data and generating a trained model; and outputting text information by inputting a target image to the trained model, wherein the adding is performed by randomly determining a length of the simulated extra segment.
- 2 . The image processing apparatus according to claim 1 , wherein in the detecting: in an area accounting for a predetermined percentage in the bottom portion in the character image, a number of connected pixel groups for each row and a number of connected pixels in each connected pixel group are obtained; and in a case where the maximum value of the number of connected pixel groups for each row is 1 and the maximum number of connected pixels in each connected pixel group is less than or equal to a threshold value, a line segment existing in the area is detected as the line segment at which the extra segment may occur, in the adding, the simulated extra segment is added to the bottom end portion of the detected line segment, the connected pixel group for each row represents a black pixel group existing continuously in a direction horizontal with respect to an erect direction of the handwritten character, and the number of connected pixels represents a number of black pixels constituting the connected pixel group.
- 3 . The image processing apparatus according to claim 1 , wherein in the detecting: in an area accounting for a predetermined percentage in the top portion in the character image, a number of connected pixel groups for each row and a number of connected pixels in each connected pixel group are obtained; and in a case where the maximum value of the number of connected pixel groups for each row is 1 and the maximum number of connected pixels in each connected pixel group is less than or equal to a threshold value, a line segment existing in the area is detected as the line segment at which the extra segment may occur, in the adding, the simulated extra segment is added to the top end portion of the detected line segment, the connected pixel group for each row represents a black pixel group existing continuously in a direction horizontal with respect to an erect direction of the handwritten character, and the number of connected pixels represents a number of black pixels constituting the connected pixel group.
- 4 . The image processing apparatus according to claim 1 , wherein in the detecting: in an area accounting for a predetermined percentage in the right portion in the character image, a number of connected pixel groups for each column and a number of connected pixels in each connected pixel group are obtained; and in a case where the maximum value of the number of connected pixel groups for each column is 1 and the maximum number of connected pixels in each connected pixel group is less than or equal to a threshold value, a line segment existing in the area is detected as the line segment at which the extra segment may occur, in the adding, the simulated extra segment is added to the right end portion of the detected line segment, the connected pixel group for each column represents a black pixel group existing continuously in a direction vertical with respect to an erect direction of the handwritten character, and the number of connected pixels represents a number of black pixels constituting the connected pixel group.
- 5 . The image processing apparatus according to claim 1 , wherein the length is determined in a range of 1% to 10% of a height of the cut-out image in a case where the height is taken to be a reference.
- 6 . The image processing apparatus according to claim 1 , wherein the processor executes the program to perform: deformation processing for the obtained character image; and the detecting is performed by taking a deformed character image as a target.
- 7 . The image processing apparatus according to claim 1 , wherein the processor executes the program to perform: the machine learning by using the generated training data.
- 8 . The image processing apparatus according to claim 1 , wherein the adding is performed by randomly determining an angle of the simulated extra segment.
- 9 . The image processing apparatus according to claim 2 , wherein the adding is performed by randomly determining an angle of the simulated extra segment.
- 10 . The image processing apparatus according to claim 3 , wherein the adding is performed by randomly determining an angle of the simulated extra segment.
- 11 . The image processing apparatus according to claim 4 , wherein the adding is performed by specifying a landing point and a drawing start point of the simulated extra segment and determining an angle of the simulated extra segment based on the specified landing point and the specified drawing start point.
- 12 . The image processing apparatus according to claim 6 , wherein the deformation processing includes one of rotation, enlargement or reduction, expansion or contraction, and aspect ratio change.
- 13 . The image processing apparatus according to claim 9 , wherein the angle is determined in a range of 15° to 60° in a case where the angle in the exactly rightward direction is taken to be 0° and the angle in the exactly upward direction is taken to be 90° in the character image.
- 14 . The image processing apparatus according to claim 10 , wherein the angle is determined in a range of 135° to 225° in a case where the angle in the exactly rightward direction is taken to be 0° and the angle in the exactly upward direction is taken to be 90° in the character image.
- 15 . The image processing apparatus according to claim 11 , wherein in the adding, an angle formed by the specified landing point and the specified drawing start point is determined to be an angle of the simulated extra segment.
- 16 . The image processing apparatus according to claim 11 , wherein the adding is performed by determining a length of the simulated extra segment as a random length in a range in which a distance between two points of the specified drawing start point and the specified landing point is not exceeded.
- 17 . An image processing method comprising the steps of: obtaining a character image representing a handwritten character; detecting a line segment at which an extra segment may occur in handwriting among line segments constituting the handwritten character in the character image; adding a simulated extra segment to the end portion of the detected line segment; and generating training data for machine learning by associating the character image in which the simulated extra segment has been added and a correct answer class with each other; performing the machine learning by using the generated training data and generating a trained model; and outputting text information by inputting a target image to the trained model, wherein the adding is performed by randomly determining a length of the simulated extra segment.
- 18 . A non-transitory computer readable storage medium storing a program for causing a computer to perform an image processing method comprising the steps of: obtaining a character image representing a handwritten character; detecting a line segment at which an extra segment may occur in handwriting among line segments constituting the handwritten character in the character image; adding a simulated extra segment to the end portion of the detected line segment; and generating training data for machine learning by associating the character image in which the simulated extra segment has been added and a correct answer class with each other; performing the machine learning by using the generated training data and generating a trained model; and outputting text information by inputting a target image to the trained model, wherein the adding is performed by randomly determining a length of the simulated extra segment.
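The detection rule recited in claims 2 through 4 can be illustrated concretely for the bottom-hane case (claim 2): within an area occupying a fixed percentage of the bottom of the character image, count the connected black-pixel groups in each row and the number of pixels in each group; a hane-candidate line segment exists only when no row ever contains more than one group and the longest group stays at or below a threshold (i.e., the area holds a single thin stroke). The following is a minimal sketch, assuming a binary image given as nested lists of 0/1 with 1 = black; the function name, default area ratio, and default threshold are illustrative choices, not values taken from the patent:

```python
def detect_hane_candidate_bottom(image, area_ratio=0.25, max_run=5):
    """Return True if the bottom portion of a binary character image
    contains exactly one thin stroke (a candidate for adding a hane).

    image: list of rows, each a list of 0/1 pixels (1 = black).
    area_ratio: fraction of the image height examined at the bottom.
    max_run: threshold on the number of connected pixels in a group.
    """
    height = len(image)
    start = int(height * (1.0 - area_ratio))
    max_groups = 0   # maximum number of connected pixel groups per row
    max_pixels = 0   # maximum number of connected pixels in any group
    for row in image[start:]:
        groups = 0
        run = 0
        for px in row:
            if px:
                run += 1
                if run == 1:       # a new connected pixel group begins
                    groups += 1
            else:
                max_pixels = max(max_pixels, run)
                run = 0
        max_pixels = max(max_pixels, run)
        max_groups = max(max_groups, groups)
    # Single thin stroke: at most one group per row, none over the threshold
    return max_groups == 1 and max_pixels <= max_run
```

The top-end (claim 3) and right-end (claim 4) variants differ only in which area is scanned and whether runs are counted per row or per column.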
Description
BACKGROUND
Field
The present invention relates to a technique to generate training data.
Description of the Related Art
In recent years, due to the change in working environment accompanying the spread of computers, the chances that a business document is scanned and computerized are increasing. The computerization-target documents include documents to which a handwritten character is input, for example, a receipt, and in order to utilize computerized documents for data analysis, such as aggregation, character data is extracted by performing optical character recognition (OCR) on the handwritten character area.
Here, as one of the OCR techniques that cope with handwritten characters, there is a method that uses a trained model obtained by performing machine learning, such as a neural network. In this method, training is first performed by using training data (also called teacher data) that pairs a character image in which a handwritten character is drawn with a correct answer class obtained by converting the character included in the character image into text. Then, by inputting a character image including handwritten characters to the trained model, it is made possible to utilize the handwritten characters in the scanned document as text information.
In general, in order to perform image recognition by machine learning, a large number of images are necessary as training data, but handwritten characters take various shapes and it is difficult to comprehensively collect images of handwritten characters in all patterns. Consequently, data augmentation of the training data is generally performed by applying deformation processing, such as rotation and enlargement/reduction, to the character images of the prepared training data.
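The conventional deformation processing mentioned above (rotation, enlargement/reduction, aspect-ratio change; compare claim 12) can be sketched, for the aspect-ratio case, as a nearest-neighbour resize. This is a generic illustration, not code from the patent; the function name and scale factors are hypothetical:

```python
def stretch_aspect(image, x_scale=1.2, y_scale=0.9):
    """Nearest-neighbour resize of a binary image (nested 0/1 lists),
    used as a simple aspect-ratio-change deformation for augmentation."""
    h, w = len(image), len(image[0])
    new_h = max(1, int(h * y_scale))
    new_w = max(1, int(w * x_scale))
    # Map each destination pixel back to its nearest source pixel
    return [[image[min(h - 1, int(y / y_scale))][min(w - 1, int(x / x_scale))]
             for x in range(new_w)]
            for y in range(new_h)]
```

In practice a library routine (e.g., an image-processing library's resize with varying interpolation) would be used; the point is only that each deformed copy, paired with the unchanged correct answer class, yields an additional training sample.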
Then, as an example of deformation processing, a technique has been proposed which adds an extra segment (a segment that is not necessary but is added unintentionally to the last line segment of a character, for example, as shown in FIG. 13; hereinafter referred to as a "hane" (Japanese)) to a line segment constituting a character within a character image (Japanese Patent Laid-Open No. 2008-219825).
In Japanese Patent Laid-Open No. 2008-219825 described above, as to the method of adding a hane, there is only a description such as "the line segment is extracted and an ornament, such as a hane, is provided at the tip portion of the line segment", and details of in which case a hane is added and what kind of hane is added are not disclosed. There are many line segments constituting the characters within a character image, and in a case where an inappropriate hane is added, the character image becomes an image representing a handwritten character quite different from actual handwriting. In a case where many character images such as this are generated, it is no longer possible to achieve the original object and the training accuracy is reduced on the contrary. Further, depending on the position and shape of the added hane, there is a case where the character changes to a character that can be read differently by a person. For example, in the example in FIG. 13, as a result of adding a hane surrounded by a broken-line frame to the bottom portion of the figure "2", it can be read as the figure "3". Using a character image such as this as a correct answer image adversely affects the training, as in a case where training data to which a wrong correct answer class is attached is used.
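The addition step of the present disclosure can be pictured as drawing a short upward-sloping stroke from the detected bottom end point of a line segment. The sketch below uses the exemplary ranges that appear in the claims (length 1% to 10% of the image height, claim 5; angle 15° to 60°, where exactly rightward is 0° and exactly upward is 90°, claim 13); the function name, the seeded generator, and the pixel-by-pixel drawing are assumptions made for illustration, not the patent's implementation:

```python
import math
import random

def add_simulated_hane(image, end_x, end_y, seed=None):
    """Draw a simulated hane from the bottom end point (end_x, end_y)
    of a stroke in a binary image (nested 0/1 lists, 1 = black).

    Length is chosen randomly in 1%-10% of the image height and the
    angle randomly in 15-60 degrees (0 deg = rightward, 90 deg = upward),
    following the exemplary ranges in the disclosure.
    """
    rng = random.Random(seed)
    height = len(image)
    width = len(image[0])
    length = max(1, int(height * rng.uniform(0.01, 0.10)))
    angle = math.radians(rng.uniform(15.0, 60.0))
    for step in range(1, length + 1):
        x = end_x + int(round(step * math.cos(angle)))
        y = end_y - int(round(step * math.sin(angle)))  # image y grows downward
        if 0 <= x < width and 0 <= y < height:
            image[y][x] = 1   # one pixel of the upward-right hane
    return image
```

Constraining the length and angle to such narrow ranges is what keeps the augmented image close to actual handwriting and avoids the "2"-read-as-"3" failure described above.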
SUMMARY
An image processing apparatus according to the technique of the present disclosure is an image processing apparatus including: a memory that stores a program; and a processor that executes the program to perform: obtaining a character image representing a handwritten character; detecting a line segment at which an extra segment may occur in handwriting among line segments constituting the handwritten character in the character image; adding a simulated extra segment to the end portion of the detected line segment; and generating training data for machine learning by associating the character image in which the simulated extra segment has been added and a correct answer class with each other.
Further features of the present disclosure will become apparent from the following description of exemplary embodiments with reference to the attached drawings.
BRIEF DESCRIPTION OF THE DRAWINGS
- FIG. 1 is a diagram showing an example of a configuration of an image processing system;
- FIG. 2A is a block diagram showing an example of a hardware configuration of an image processing apparatus and FIG. 2B is a block diagram showing an example of a hardware configuration of a terminal device;
- FIG. 3 is a diagram showing an example of a table as a character image DB;
- FIG. 4 is a diagram showing an example of a table as a training image DB;
- FIG. 5A is a flowchart showing a flow of training processing and FIG. 5B is a flowchart showing a flow of estimation processing;
- FIG. 6 is a flowchart showing a flow of training data gen