CN-115620315-B - Handwritten text detection method, device, server and storage medium

Abstract

The invention discloses a handwritten text detection method, device, server and storage medium. Text detection is performed on an image to be detected to determine the text regions in the image. Text type detection is performed on each text region to determine the type of the text it contains. If a target text region whose text type is handwritten text exists among the text regions, the target text region is marked to obtain a marked image to be detected. Character recognition is performed on the handwritten text in the target text region to obtain a character recognition result for that region. The marked image to be detected is then output together with the character recognition result of the target text region in it.

Inventors

  • WANG LINJING
  • SU ZHIFENG
  • SUN TIE
  • SU QINNING
  • LONG BIN
  • GONG JING

Assignees

  • Ping An Bank Co., Ltd. (平安银行股份有限公司)

Dates

Publication Date
2026-05-12
Application Date
2022-11-04

Claims (9)

  1. A handwritten text detection method, the method comprising:
     acquiring an image to be detected;
     performing text detection on the image to be detected, and determining a text region in the image to be detected;
     performing text type detection on the text region to obtain the text type of the text in the text region;
     if a target text region whose text type is handwritten text exists among the text regions, marking the target text region to obtain a marked image to be detected;
     performing character recognition on the handwritten text in the target text region to obtain a character recognition result of the target text region; and
     outputting the marked image to be detected and the character recognition result of the target text region in the marked image to be detected;
     wherein performing text detection on the image to be detected, determining the text region in the image to be detected, and performing text type detection on the text region to obtain the text type of the text in the text region comprises: performing text detection on the image to be detected according to a preset text detection model, determining the text region in the image to be detected, performing text type detection on the text region, and determining the text type of the text in the text region, the text detection model being obtained by adjusting model parameters of a pre-trained model according to first sample data containing handwritten text regions;
     wherein, before performing text detection on the image to be detected according to the preset text detection model, the method further comprises: acquiring the first sample data, the first sample data comprising a plurality of first sample images, each first sample image containing handwritten text and position information of the handwritten text region in which the handwritten text is located; inputting the first sample data into the pre-trained model for text detection to obtain predicted position information of the handwritten text region in each first sample image; and adjusting the model parameters of the pre-trained model according to the predicted position information of the handwritten text region in each first sample image and the position information of the handwritten text region in each first sample image to obtain the text detection model; and
     wherein, before inputting the first sample data into the pre-trained model for text detection, the method further comprises: acquiring a handwritten single-character data set; randomly combining handwritten single-character images in the handwritten single-character data set to obtain a plurality of single-line handwritten text images; selecting a preset number of target single-line handwritten text images from the single-line handwritten text images; placing the selected target single-line handwritten text images on a preset canvas; determining the handwritten text regions of the target single-line handwritten text images on the canvas; taking the canvas with the determined handwritten text regions as an original sample image; placing preset printed text in the image areas of the original sample image other than the handwritten text regions to obtain second sample images; marking the handwritten text region in each second sample image to obtain second sample data; and inputting the second sample data into an initial model and training the initial model to obtain the pre-trained model.
  2. The handwritten text detection method according to claim 1, wherein inputting the second sample data into the initial model and training the initial model to obtain the pre-trained model comprises:
     inputting the second sample images in the second sample data into the initial model for feature extraction to obtain an approximate binarization feature map of each second sample image;
     performing contour recognition on the approximate binarization feature map of each second sample image to obtain training position information of the handwritten text region in each second sample image;
     obtaining a first training loss of the initial model according to a real binarization map of each second sample image and the approximate binarization feature map of each second sample image, wherein the real binarization map of a second sample image is obtained by binarization according to the handwritten text region in that second sample image;
     obtaining a second training loss of the initial model according to the cross entropy between the training position information of the handwritten text region in each second sample image and the real position information of the handwritten text region in each second sample image, wherein the real position information is the position information of the image area in which the target single-line handwritten text image is located in the second sample image; and
     obtaining a target training loss of the initial model according to the first training loss and the second training loss, iteratively training the initial model according to the target training loss, and stopping the iterative training when the initial model meets a preset model convergence condition, to obtain the pre-trained model.
  3. The handwritten text detection method according to claim 2, wherein inputting the second sample images in the second sample data into the initial model for feature extraction to obtain the approximate binarization feature map of each second sample image comprises:
     inputting the second sample images in the second sample data into the initial model for feature extraction at different scales to obtain feature maps of different scales for each second sample image;
     merging the feature maps of different scales of each second sample image to obtain a merged feature map of each second sample image;
     performing image convolution on the merged feature map of each second sample image to obtain a probability feature map of each second sample image, and performing an up-sampling operation on the merged feature map of each second sample image to obtain a threshold feature map of each second sample image; and
     performing differentiable binarization according to the difference map between the probability feature map and the threshold feature map of each second sample image to obtain the approximate binarization feature map of each second sample image.
  4. The handwritten text detection method according to claim 3, wherein obtaining the first training loss of the initial model according to the real binarization map of each second sample image and the approximate binarization feature map of each second sample image comprises:
     obtaining a binarization loss according to the cross entropy between the real binarization map of each second sample image and the approximate binarization feature map of each second sample image;
     determining a predicted handwritten text region of each second sample image according to the threshold feature map of each second sample image, and obtaining a threshold loss according to the distance between the predicted handwritten text region of each second sample image and the handwritten text region of each second sample image; and
     determining the first training loss of the initial model according to the binarization loss and the threshold loss.
  5. The handwritten text detection method according to claim 1, wherein performing text detection on the image to be detected according to the preset text detection model and determining the text region in the image to be detected comprises:
     inputting the image to be detected into the preset text detection model to obtain a probability matrix of the image to be detected;
     binarizing the probability matrix of the image to be detected to obtain a binarization matrix of the image to be detected; and
     selecting target pixels whose value in the binarization matrix of the image to be detected equals a preset value, and determining the text region in the image to be detected according to the positions of the target pixels.
  6. The handwritten text detection method according to any one of claims 1 to 5, wherein performing character recognition on the handwritten text in the target text region to obtain the character recognition result of the target text region comprises:
     segmenting the image to be detected according to the target text region to obtain the target image region in which the target text region is located in the image to be detected; and
     inputting the target image region into a preset character recognition model to perform character recognition on the handwritten text in the target image region, so as to obtain the character recognition result of the target text region.
  7. A handwritten text detection device, characterized in that the device comprises:
     an acquisition module, configured to acquire an image to be detected;
     a text region detection module, configured to perform text detection on the image to be detected and determine a text region in the image to be detected;
     a text type detection module, configured to perform text type detection on each text region and determine the text type of the text in each text region;
     a marking module, configured to mark a target text region, if a target text region whose text type is handwritten text exists among the text regions, to obtain a marked image to be detected;
     a text detection module, configured to perform character recognition on the handwritten text in the target text region to obtain a character recognition result of the target text region;
     an output module, configured to output the marked image to be detected and the character recognition result of the target text region in the marked image to be detected;
     a model detection module, configured to perform text detection on the image to be detected according to a preset text detection model, determine the text region in the image to be detected, perform text type detection on the text region, and determine the text type of the text in the text region;
     a training module, configured to acquire first sample data, the first sample data comprising a plurality of first sample images, each first sample image containing handwritten text and position information of the handwritten text region in which the handwritten text is located, to input the first sample data into a pre-trained model for text detection to obtain predicted position information of the handwritten text region in each first sample image, and to adjust model parameters of the pre-trained model according to the predicted position information of the handwritten text region in each first sample image and the position information of the handwritten text region in each first sample image to obtain the text detection model; and
     a pre-training module, configured to acquire a handwritten single-character data set, randomly combine handwritten single-character images in the handwritten single-character data set to obtain a plurality of single-line handwritten text images, select a preset number of target single-line handwritten text images from the plurality of single-line handwritten text images, place the selected target single-line handwritten text images on a preset canvas, determine the handwritten text regions of the target single-line handwritten text images on the canvas, take the canvas with the determined handwritten text regions as an original sample image, place preset printed text in the image areas of the original sample image other than the handwritten text regions to obtain second sample images, mark the handwritten text region in each second sample image to obtain second sample data, input the second sample data into an initial model, and train the initial model to obtain the pre-trained model.
  8. A server comprising a memory and a processor, the memory storing an application program and the processor being configured to run the application program in the memory to perform the operations of the handwritten text detection method according to any one of claims 1 to 6.
  9. A storage medium storing a plurality of instructions adapted to be loaded by a processor to perform the steps of the handwritten text detection method according to any one of claims 1 to 6.
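
The sample synthesis recited in claim 1 (combining single-character images into single-line images, placing them on a canvas, and recording the handwritten text regions) can be illustrated with a minimal sketch. Images are represented here as nested lists of pixel values. The function names, the top-to-bottom stacking layout, the one-row gap and the (top, left, bottom, right) region format are all assumptions made for illustration; the claims do not prescribe them, and the placement of printed text in the remaining areas is omitted.

```python
import random


def compose_line(char_images, length, rng):
    """Join `length` randomly chosen, equal-height character images side by side."""
    chosen = [rng.choice(char_images) for _ in range(length)]
    height = len(chosen[0])
    # Concatenate row `r` of every chosen character image into one long row.
    return [sum((glyph[r] for glyph in chosen), []) for r in range(height)]


def place_lines(lines, canvas_h, canvas_w, background=0):
    """Stack line images on a blank canvas and return (canvas, regions).

    Layout is an assumption: lines are left-aligned and separated by one blank row.
    Each region is (top, left, bottom, right) with inclusive coordinates.
    """
    canvas = [[background] * canvas_w for _ in range(canvas_h)]
    regions = []
    top = 0
    for line in lines:
        h, w = len(line), len(line[0])
        if top + h > canvas_h or w > canvas_w:
            raise ValueError("line image does not fit on the canvas")
        for dy in range(h):
            canvas[top + dy][:w] = line[dy]
        regions.append((top, 0, top + h - 1, w - 1))
        top += h + 1
    return canvas, regions
```

The returned regions play the role of the marked handwritten text regions that accompany each sample image as labels.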
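
The approximate binarization of claim 3 and the binarization loss of claim 4 can be illustrated as follows. The sketch assumes the formulation commonly used for differentiable binarization, B = 1 / (1 + exp(-k (P - T))), where P is the probability map and T the threshold map. The claims state neither this formula nor a value for k, so both are assumptions, as are the function names.

```python
import math


def approximate_binarization(prob_map, thresh_map, k=50.0):
    """Element-wise B = 1 / (1 + exp(-k * (P - T))).

    prob_map and thresh_map are equally sized lists of rows with values in [0, 1].
    k is an assumed amplification factor; the claims do not specify one.
    """
    return [
        [1.0 / (1.0 + math.exp(-k * (p - t))) for p, t in zip(p_row, t_row)]
        for p_row, t_row in zip(prob_map, thresh_map)
    ]


def binary_cross_entropy(real_map, approx_map, eps=1e-7):
    """Mean binary cross entropy between a real (0/1) map and an approximate map."""
    total, count = 0.0, 0
    for r_row, a_row in zip(real_map, approx_map):
        for r, a in zip(r_row, a_row):
            a = min(max(a, eps), 1.0 - eps)  # clamp to keep log() finite
            total -= r * math.log(a) + (1.0 - r) * math.log(1.0 - a)
            count += 1
    return total / count
```

Because the step function is replaced by a steep sigmoid of the difference P - T, the output stays close to 0 or 1 while remaining differentiable, which is what allows the binarization to take part in training.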
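
The post-processing of claim 5 (binarizing the probability matrix, selecting pixels that equal a preset value, and deriving text regions from their positions) can be sketched as below. The 0.5 threshold, the grouping of target pixels by 4-connectivity and the bounding-box output are assumptions made for illustration; the claim only requires that regions be determined from the positions of the target pixels.

```python
from collections import deque


def binarize(prob_matrix, threshold=0.5):
    """Map each probability to 1 if it reaches the (assumed) threshold, else 0."""
    return [[1 if p >= threshold else 0 for p in row] for row in prob_matrix]


def text_regions(binary, target_value=1):
    """Return one inclusive (top, left, bottom, right) box per 4-connected pixel group."""
    height = len(binary)
    width = len(binary[0]) if height else 0
    seen = [[False] * width for _ in range(height)]
    boxes = []
    for r in range(height):
        for c in range(width):
            if binary[r][c] != target_value or seen[r][c]:
                continue
            top, left, bottom, right = r, c, r, c
            queue = deque([(r, c)])
            seen[r][c] = True
            while queue:  # breadth-first flood fill over target pixels
                y, x = queue.popleft()
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < height and 0 <= nx < width
                            and not seen[ny][nx]
                            and binary[ny][nx] == target_value):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes
```

Each returned box could then be used to cut the target image region out of the image before character recognition, as in claim 6.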

Description

Handwritten text detection method, device, server and storage medium

Technical Field

The invention relates to the field of text detection and recognition, and in particular to a handwritten text detection method, device, server and storage medium.

Background

Conventional text detection based on deep learning achieves good results on regular printed text, for example in identity card recognition, bank card recognition, license plate recognition and PDF-to-Word conversion. Its accuracy is low, however, on the many types of digital documents that involve complex conditions such as multiple special printed fonts, handwritten fonts and tilted text.

Disclosure of Invention

The embodiments of the invention provide a handwritten text detection method, device, server and storage medium, so as to improve the accuracy of text detection.

In one aspect, an embodiment of the invention provides a handwritten text detection method, the method comprising:

acquiring an image to be detected;
performing text detection on the image to be detected, and determining a text region in the image to be detected;
performing text type detection on the text region to obtain the text type of the text in the text region;
if a target text region whose text type is handwritten text exists among the text regions, marking the target text region to obtain a marked image to be detected;
performing character recognition on the handwritten text in the target text region to obtain a character recognition result of the target text region; and
outputting the marked image to be detected and the character recognition result of the target text region in the marked image to be detected.
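
The method steps above can be sketched as a small pipeline. This is only an illustration under stated assumptions: the three callables (`detect_regions`, `classify_region`, `recognize`) are hypothetical stand-ins for the detection, text type and character recognition models, the label string "handwritten" is invented for the example, and the marked image is represented simply by the list of marked boxes.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Tuple

Box = Tuple[int, int, int, int]  # (top, left, bottom, right), inclusive


@dataclass
class DetectionResult:
    marked_boxes: List[Box]      # target text regions marked as handwritten
    recognized_text: List[str]   # one recognition result per marked region


def detect_handwritten_text(
    image: Any,
    detect_regions: Callable[[Any], List[Box]],
    classify_region: Callable[[Any, Box], str],
    recognize: Callable[[Any, Box], str],
) -> DetectionResult:
    """Detect regions, keep the handwritten ones, and recognize their text."""
    marked: List[Box] = []
    texts: List[str] = []
    for box in detect_regions(image):
        # Hypothetical label; only handwritten regions are marked and recognized.
        if classify_region(image, box) == "handwritten":
            marked.append(box)
            texts.append(recognize(image, box))
    return DetectionResult(marked, texts)
```

Passing the models in as callables keeps the sketch independent of any particular detection or recognition library.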
In another aspect, an embodiment of the invention provides a handwritten text detection device, comprising:

an acquisition module, configured to acquire an image to be detected;
a text region detection module, configured to perform text detection on the image to be detected and determine a text region in the image to be detected;
a text type detection module, configured to perform text type detection on each text region and determine the text type of the text in each text region;
a marking module, configured to mark a target text region, if a target text region whose text type is handwritten text exists among the text regions, to obtain a marked image to be detected;
a text detection module, configured to perform character recognition on the handwritten text in the target text region to obtain a character recognition result of the target text region; and
an output module, configured to output the marked image to be detected and the character recognition result of the target text region in the marked image to be detected.

In another aspect, an embodiment of the invention provides a server comprising a memory and a processor, wherein the memory stores an application program and the processor is configured to run the application program in the memory to perform the operations of the above handwritten text detection method.

In another aspect, an embodiment of the invention provides a storage medium storing a plurality of instructions adapted to be loaded by a processor to perform the steps of the above handwritten text detection method.
In the method, an image to be detected is acquired; text detection is performed on the image to determine the text regions in it; text type detection is performed on the text regions to determine the text type of each; if a target text region whose text type is handwritten text exists among the text regions, the target text region is marked to obtain a marked image to be detected; character recognition is performed on the handwritten text in the target text region to obtain a character recognition result of the target text region; and the marked image to be detected is output together with the character recognition result of the target text region in it.

Drawings

In order to illustrate the technical solutions of the embodiments of the invention more clearly, the drawings needed in the description of the embodiments are briefly introduced below. The drawings described below show only some embodiments of the invention; a person skilled in the art may obtain other drawings from them without inventive effort.

FIG. 1 is a schematic diagram of an application scenario of a handwritten text detection method provided by an embodiment of the invention;
FIG. 2 is a flow chart of a handwritten text detection method according to an embodiment of the invention;
FIG. 3 is a schematic diagram of a text detection model according to an embodiment of the invention;
FIG. 4 is a schematic structural diagram of a handwritten text detection device accordi