CN-119323796-B - Text detection and recognition method for medical bill image


Abstract

The invention relates to the technical field of text detection and recognition in images, and in particular to a text detection and recognition method for medical bill images. The method comprises the following steps: S1, performing feature enhancement on a medical bill image through a feature enhancement module and reconstructing the table structure of the medical bill image; S2, performing feature fusion on the medical bill image after the table structure has been reconstructed, to form a pre-detection table image; S3, integrating the pre-detection table image with the vision-language pre-trained model MDETR to enhance feature expression; S4, identifying text regions in the pre-detection table image using a connectionist temporal classification (CTC) method improved by an attention mechanism; and S5, performing in-depth analysis of the detected text regions through a fusion model of an SVTR network and PP-HGNet. Through feature enhancement, table structure reconstruction, feature fusion and integration with MDETR, text regions can be identified even in complex medical bill images.

Inventors

ZHANG FENGXIANG
ZHU BO
QIU LAN
LIU ZIYU

Assignees

Kunming University of Science and Technology (昆明理工大学)

Dates

Publication Date: 20260508
Application Date: 20241010

Claims (7)

1. A text detection and recognition method for medical bill images, characterized by comprising the following steps: S1, performing feature enhancement on a medical bill image through a feature enhancement module, and reconstructing the table structure of the medical bill image; S2, after the table structure is reconstructed, further performing feature fusion on the medical bill image to form a pre-detection table image; S3, integrating the pre-detection table image with the vision-language pre-trained model MDETR to enhance feature expression; S4, identifying text regions in the pre-detection table image using a connectionist temporal classification (CTC) method improved by an attention mechanism; S5, further performing in-depth analysis of the detected text regions through a fusion model of the SVTR network and PP-HGNet;
In step S1, performing feature enhancement on the medical bill image through the feature enhancement module comprises the following steps: S1.1, converting the color medical bill image into a grayscale medical bill image and adjusting its histogram; S1.2, smoothing the image with a Gaussian filtering algorithm to remove high-frequency noise, preliminarily extracting the area containing the table structure from the processed grayscale image, and dividing the extracted area into a plurality of small area blocks; S1.3, enhancing the contrast of the image by histogram stretching and sharpening image edges with a Laplacian operator; S1.4, enhancing horizontal line segments in the image with a dilation operation; S1.5, dynamically adjusting the threshold according to local characteristics of the image and performing adaptive binarization; S1.6, detecting and reconstructing horizontal line segments by applying the Hough transform in each area block, clustering the detected line spacings with the K-means algorithm, and calculating the average line spacing; S1.7, setting a threshold u based on the average line spacing and further extracting horizontal line segments with a dilation operation, setting a threshold h based on the average line spacing and detecting vertical line segments, and fusing the detected horizontal and vertical line segments to reconstruct the table structure of the medical bill image;
In step S2, performing feature fusion after the table structure is reconstructed comprises the following steps: S2.1, further processing the image with a custom kernel mask; S2.2, performing the dilation operation on the pre-extracted area blocks again and extracting horizontal line segments; S2.3, re-applying the standard Hough transform in the pre-extracted area blocks to reconstruct horizontal line segments; S2.4, clustering the reconstructed horizontal line segments again and computing the average line spacing; S2.5, setting a threshold based on the average line spacing, extracting vertical line segments, and further refining the table structure; S2.6, dividing the image into area blocks again and determining an area threshold m; S2.7, fusing the features of each block area and pre-extracting contours to form the pre-detection table image;
In step S3, enhancing feature expression by integrating the pre-detection table image with the vision-language pre-trained model MDETR comprises the following steps: S3.1, taking the pre-detection table image as input and standardizing it; S3.2, inputting the preprocessed pre-detection table image into the MDETR model; S3.3, letting the image features interact with the text features and encoding the image features; S3.4, decoding the encoded, interacted features and outputting a detection result; S3.5, post-processing the object detection result output by MDETR with non-maximum suppression (NMS); S3.6, using the detection result provided by MDETR to further confirm and refine the table boundaries and cell positions.
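Steps S1.6 and S2.4 cluster line spacings with K-means, but the claim does not say exactly what is clustered or how many clusters are used. The following is a minimal sketch under stated assumptions: the vertical positions of the detected horizontal lines are already available, the gaps between consecutive lines are clustered with k = 2 (regular row spacing versus short noise gaps), and the mean of the larger cluster is taken as the "average line spacing". `kmeans_1d` and `mean_row_spacing` are hypothetical helper names, not part of the described method.

```python
import numpy as np

def kmeans_1d(values, k=2, iters=50):
    """Plain 1-D k-means (Lloyd's algorithm) with deterministic quantile initialisation."""
    values = np.asarray(values, dtype=float)
    # Spread the initial centres across the value range so results are reproducible.
    centres = np.quantile(values, np.linspace(0, 1, k))
    for _ in range(iters):
        # Assign each value to its nearest centre.
        labels = np.argmin(np.abs(values[:, None] - centres[None, :]), axis=1)
        new_centres = np.array([
            values[labels == j].mean() if np.any(labels == j) else centres[j]
            for j in range(k)
        ])
        if np.allclose(new_centres, centres):
            break
        centres = new_centres
    return labels, centres

def mean_row_spacing(line_ys, k=2):
    """Hypothetical helper: mean gap of the dominant K-means cluster of line spacings."""
    ys = np.sort(np.asarray(line_ys, dtype=float))
    if len(ys) < 2:
        raise ValueError("need at least two horizontal lines to measure spacing")
    gaps = np.diff(ys)  # spacing between consecutive horizontal lines
    labels, _ = kmeans_1d(gaps, k=k)
    # Assumption: the most populated cluster holds the regular table-row spacing.
    dominant = np.bincount(labels, minlength=k).argmax()
    return float(gaps[labels == dominant].mean())
```

For lines at y = 10, 40, 70, 100, 130, 135, 160, the short 5-pixel gap falls into its own cluster and the function returns 29.0. How the thresholds u and h are then derived from this value is not specified in the claims.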
2. The text detection and recognition method for medical bill images according to claim 1, wherein in S1.2 the Gaussian filtering algorithm is specifically: $G(x, y) = \frac{1}{2\pi\sigma^2}\exp\left(-\frac{x^2 + y^2}{2\sigma^2}\right)$, where $x$ represents the offset of a pixel along the $x$-axis; $y$ represents the offset of a pixel along the $y$-axis; $\sigma$ represents the standard deviation of the Gaussian distribution; and $G(x, y)$ represents the weight given to each pixel of the medical bill image and its neighborhood during Gaussian filtering.
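The formula in claim 2 can be sampled on a pixel grid to obtain a discrete filter kernel. Below is a minimal NumPy sketch, assuming a 5 × 5 kernel, σ = 1.0 and edge-replicated borders (the claim fixes none of these). In practice a library routine such as OpenCV's `cv2.GaussianBlur` would normally be used; `gaussian_kernel` and `gaussian_blur` are hypothetical names.

```python
import numpy as np

def gaussian_kernel(size=5, sigma=1.0):
    """Sample G(x, y) = exp(-(x^2 + y^2) / (2 sigma^2)) / (2 pi sigma^2) on a size x size grid."""
    if size % 2 == 0:
        raise ValueError("size must be odd so the kernel has a centre pixel")
    r = size // 2
    # x and y are the pixel offsets from the kernel centre along each axis.
    x, y = np.meshgrid(np.arange(-r, r + 1), np.arange(-r, r + 1))
    g = np.exp(-(x**2 + y**2) / (2 * sigma**2)) / (2 * np.pi * sigma**2)
    # Renormalise: the truncated kernel must sum to 1 to preserve mean brightness.
    return g / g.sum()

def gaussian_blur(gray, size=5, sigma=1.0):
    """Filter a 2-D grayscale image with the Gaussian kernel, replicating edge pixels."""
    k = gaussian_kernel(size, sigma)
    r = size // 2
    padded = np.pad(gray.astype(float), r, mode="edge")
    h, w = gray.shape
    out = np.zeros((h, w), dtype=float)
    # Weighted sum of shifted copies; the kernel is symmetric, so this equals convolution.
    for dy in range(size):
        for dx in range(size):
            out += k[dy, dx] * padded[dy:dy + h, dx:dx + w]
    return out
```

A larger σ spreads the weight further from the centre pixel and removes more high-frequency noise, at the cost of blurring thin table lines; a common rule of thumb is a kernel width of about 6σ, rounded up to an odd number.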
3. The text detection and recognition method for medical bill images according to claim 1, wherein in S1.3 the Laplacian operator is specifically: $\nabla^2 f = \frac{\partial^2 f}{\partial x^2} + \frac{\partial^2 f}{\partial y^2}$, where $f(x, y)$ represents the gray value of the medical bill image at coordinates $(x, y)$; $x$ represents the $x$-axis of the two-dimensional plane; $y$ represents the $y$-axis of the two-dimensional plane; $f$ represents a function in the two-dimensional space; and $\partial$ represents the partial derivative.
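On a pixel grid, the two second derivatives in claim 3 are commonly approximated by finite differences, which gives the 4-neighbour kernel [[0, 1, 0], [1, -4, 1], [0, 1, 0]]; sharpening then subtracts a weighted Laplacian from the image. Below is a minimal sketch assuming a sharpening weight of 1.0 and edge-replicated borders (neither is stated in the claim); `laplacian` and `sharpen` are hypothetical names.

```python
import numpy as np

def laplacian(gray):
    """Discrete Laplacian: f(x+1,y) + f(x-1,y) + f(x,y+1) + f(x,y-1) - 4 f(x,y)."""
    f = np.pad(gray.astype(float), 1, mode="edge")
    right, left = f[1:-1, 2:], f[1:-1, :-2]
    down, up = f[2:, 1:-1], f[:-2, 1:-1]
    return right + left + down + up - 4.0 * f[1:-1, 1:-1]

def sharpen(gray, weight=1.0):
    """Edge sharpening: subtract the Laplacian (its centre coefficient is negative)."""
    out = gray.astype(float) - weight * laplacian(gray)
    # Clip back to the valid 8-bit gray range.
    return np.clip(out, 0, 255).astype(np.uint8)
```

Applied to an image whose rows are [100, 100, 150, 150], each row becomes [100, 50, 200, 150]: the dark side of the edge is pushed darker and the bright side brighter, which is the edge enhancement S1.3 relies on.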
4. The text detection and recognition method for medical bill images according to claim 1, wherein in S1.5, dynamically adjusting the threshold according to local characteristics of the medical bill image and performing adaptive binarization comprises the following steps: S1.51, selecting a 5×5 window and, for each pixel in the medical bill image, calculating the local average brightness around that pixel using the selected window; S1.52, calculating an offset value and adding it to the local average brightness to obtain the threshold b of the local area; S1.53, comparing the gray value of each pixel with the threshold b of its local area, setting the pixel to white if its gray value is greater than b and to black otherwise; S1.54, repeating steps S1.51 to S1.53 over the whole image until all pixels have been processed.
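Steps S1.51 to S1.54 map onto a short vectorised NumPy function. This is a minimal sketch assuming edge-replicated borders and a caller-supplied offset: the claim says the offset is "calculated" but not how, so `offset` here is a hypothetical fixed parameter, and `adaptive_binarize` is a hypothetical name. For comparison, OpenCV's `cv2.adaptiveThreshold` with `ADAPTIVE_THRESH_MEAN_C` subtracts its constant C from the local mean, so a negative offset here corresponds to a positive C there.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view  # NumPy >= 1.20

def adaptive_binarize(gray, window=5, offset=0.0):
    """Binarise with a per-pixel threshold b = local mean + offset (claim 4, S1.51-S1.54)."""
    if window % 2 == 0:
        raise ValueError("window must be odd so it is centred on each pixel")
    r = window // 2
    g = gray.astype(float)
    padded = np.pad(g, r, mode="edge")
    # S1.51: local average brightness of the window centred on every pixel.
    local_mean = sliding_window_view(padded, (window, window)).mean(axis=(-2, -1))
    # S1.52: add the offset to obtain the local threshold b.
    b = local_mean + offset
    # S1.53/S1.54, vectorised over all pixels: brighter than b -> white, else black.
    return np.where(g > b, 255, 0).astype(np.uint8)
```

With a negative offset, a flat white background stays white, while a small dark block such as a character stroke drops below its local threshold and becomes black.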
5. The text detection and recognition method for medical bill images according to claim 1, wherein in S1.6, applying the Hough transform to detect straight lines in the image comprises the following steps: S1.61, detecting straight lines with the standard Hough transform; S1.62, traversing each edge point in the medical bill image and accumulating votes in the parameter space for the straight lines passing through that edge point; S1.63, calculating all line parameter combinations for each edge point and incrementing the count at the corresponding positions in the parameter space; S1.64, searching for local maxima in the parameter space; S1.65, drawing the straight lines corresponding to the detected local maxima in the original medical bill image.
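Steps S1.62 to S1.64 describe standard Hough voting in the normal parameterisation ρ = x cos θ + y sin θ. Below is a minimal NumPy sketch, assuming 1° angular resolution, 1-pixel ρ resolution, and greedy peak picking with neighbourhood suppression as one way to realise "searching for local maxima" (the claim does not specify the peak-finding rule, and θ wrap-around at 0°/180° is ignored for simplicity). `hough_lines` is a hypothetical name; OpenCV's `cv2.HoughLines` is the usual production choice.

```python
import numpy as np

def hough_lines(edges, n_theta=180, threshold=10, suppress=(5, 5), max_lines=20):
    """Standard Hough transform on a binary edge image; returns (rho, theta_degrees) pairs."""
    h, w = edges.shape
    diag = int(np.ceil(np.hypot(h, w)))  # upper bound on |rho|
    thetas = np.deg2rad(np.arange(n_theta) * 180.0 / n_theta)  # theta in [0, 180) degrees
    cos_t, sin_t = np.cos(thetas), np.sin(thetas)
    acc = np.zeros((2 * diag + 1, n_theta), dtype=np.int32)  # accumulator (rho, theta)
    cols = np.arange(n_theta)

    # S1.62/S1.63: every edge point casts one vote per theta for the line through it.
    ys, xs = np.nonzero(edges)
    for x, y in zip(xs, ys):
        rho_idx = np.round(x * cos_t + y * sin_t).astype(int) + diag
        acc[rho_idx, cols] += 1

    # S1.64: repeatedly take the strongest cell, then zero its neighbourhood so a
    # plateau of near-identical (rho, theta) cells yields a single line.
    lines = []
    dr, dt = suppress
    while len(lines) < max_lines:
        i, j = np.unravel_index(np.argmax(acc), acc.shape)
        if acc[i, j] < threshold:
            break
        lines.append((int(i) - diag, j * 180.0 / n_theta))
        acc[max(i - dr, 0):i + dr + 1, max(j - dt, 0):j + dt + 1] = 0
    return lines
```

For a single horizontal row of edge pixels at y = 5, the sketch reports one line with ρ = 5 and θ within one degree of 90°; step S1.65 would then draw it back onto the image.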
6. The text detection and recognition method for medical bill images according to claim 1, wherein in S4, identifying text regions in the pre-detection table image using a connectionist temporal classification (CTC) method improved by an attention mechanism comprises the following steps: S4.1, inputting the pre-detection table image; S4.2, extracting features from the image with a convolutional neural network (CNN); S4.3, inputting the feature map into a recurrent neural network (RNN) to which an attention mechanism is added; S4.4, generating an output sequence and calculating the CTC loss between the output sequence and the ground-truth label; S4.5, decoding the output sequence with a greedy strategy, determining the text regions in the image from the decoding result, and recognizing the text regions.
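Step S4.5 decodes the network output greedily. Greedy (best-path) CTC decoding takes the most probable class in each frame, merges consecutive repeats, and then removes the blank symbol. Below is a minimal sketch, assuming a per-frame score matrix of shape (T, C) with the blank at index 0; `ctc_greedy_decode` is a hypothetical name, and the attention-augmented RNN that would produce the matrix is not shown.

```python
import numpy as np

def ctc_greedy_decode(probs, alphabet, blank=0):
    """Best-path CTC decoding: argmax per frame, merge repeats, then drop blanks."""
    best = np.argmax(probs, axis=1)  # most likely class index in each frame
    out = []
    prev = None
    for idx in best:
        # Emit a symbol only when it differs from the previous frame and is not blank.
        if idx != prev and idx != blank:
            out.append(alphabet[idx])
        prev = idx
    return "".join(out)
```

Because repeats are merged before blanks are removed, the frame path `a a - b b - b c` decodes to "abbc": the blank between the two runs of `b` is what lets a genuine double letter survive.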
7. The text detection and recognition method for medical bill images according to claim 1, wherein in S5, further performing in-depth analysis of the detected text regions through a fusion model of the SVTR network and PP-HGNet comprises the following steps: S5.1, inputting the detected text region images and preprocessing each one; S5.2, concatenating the feature map generated by the SVTR network with the feature map generated by PP-HGNet to form a new feature map; S5.3, further processing the concatenated feature map with an attention mechanism to generate the final feature representation; S5.4, converting the final feature representation into a text sequence with a CTC decoder and post-processing the decoded text sequence.
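Steps S5.2 and S5.3 concatenate two backbone feature maps and refine the result with attention. The sketch below shows only the tensor plumbing in NumPy, under loudly stated assumptions: both maps have already been resized to the same height and width, the height axis is averaged away to obtain a left-to-right sequence, and single-head scaled dot-product self-attention with caller-supplied projection matrices stands in for whatever attention module the method actually uses. It is not an implementation of SVTR or PP-HGNet; `fuse_features`, `w_q`, `w_k` and `w_v` are hypothetical names.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def fuse_features(svtr_map, hgnet_map, w_q, w_k, w_v):
    """Concatenate two (C, H, W) feature maps on the channel axis, then self-attend over width."""
    if svtr_map.shape[1:] != hgnet_map.shape[1:]:
        raise ValueError("feature maps must share the same H x W before concatenation")
    fused = np.concatenate([svtr_map, hgnet_map], axis=0)   # S5.2: (C1 + C2, H, W)
    seq = fused.mean(axis=1).T                               # assumption: average over H -> (W, C1 + C2)
    q, k, v = seq @ w_q, seq @ w_k, seq @ w_v                # projections, each (W, d)
    attn = softmax(q @ k.T / np.sqrt(q.shape[1]), axis=-1)  # S5.3: (W, W) attention weights
    return attn @ v                                          # final representation, (W, d)
```

The (W, d) output is one feature vector per horizontal position, which is the shape a CTC decoder (S5.4) consumes after a final projection to class scores.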

Description

Text detection and recognition method for medical bill image

Technical Field

The invention relates to the technical field of text detection and recognition in images, and in particular to a text detection and recognition method for medical bill images.

Background

Medical bill image data are mainly obtained in real-world settings from scans and from photographs taken with mobile phones and other mobile devices. Because mobile device cameras generally have limited resolution, captured medical bill images often contain various interfering factors such as stains, scratches, handwritten notes and stamp patterns. These interferences are diverse and can mask or cover important text information, which poses a serious challenge to accurate text detection, lowering algorithm accuracy and increasing the error rate. In addition, some images are compressed when transmitted over a network, which degrades image quality and further affects the accuracy of subsequent image processing and text recognition. Moreover, as a critical medical record, a medical bill has a compact and varied text layout: text occupies more than 80% of the page, covers a wide range of information categories, and mixes multiple languages and symbol systems. Specifically, Chinese is used to record patient details, medical service items and the like; English may appear in medical facility names, doctor information and the like; Greek letters are common in medical terms and symbols; and mathematical formulas may appear in examination results or specialist medical reports. This diversity and complexity of text forms makes medical bills harder to recognize and process, and calls for targeted parsing and recognition strategies for each kind of text.
In addition, in specific application scenarios such as instant settlement and online insurance auditing, text recognition of medical bill images must be highly timely. In view of the above, a text detection and recognition method for medical bill images is provided.

Disclosure of Invention

The invention aims to provide a text detection and recognition method for medical bill images that solves the problems identified in the background, namely complex medical bill images, poor image quality and diverse text forms. To achieve this object, the invention provides a text detection and recognition method for medical bill images, comprising the following steps: S1, performing feature enhancement on a medical bill image through a feature enhancement module, and reconstructing the table structure of the medical bill image; S2, after the table structure is reconstructed, further performing feature fusion on the medical bill image to form a pre-detection table image; S3, integrating the pre-detection table image with the vision-language pre-trained model MDETR to enhance feature expression; S4, identifying text regions in the pre-detection table image using a connectionist temporal classification (CTC) method improved by an attention mechanism; S5, further performing in-depth analysis of the detected text regions through a fusion model of the SVTR network and PP-HGNet.
As a further improvement of the technical scheme, in S1, performing feature enhancement on the medical bill image through the feature enhancement module comprises the following steps: S1.1, converting the color medical bill image into a grayscale medical bill image and adjusting its histogram; S1.2, smoothing the image with a Gaussian filtering algorithm to remove high-frequency noise, preliminarily extracting the area containing the table structure from the processed grayscale image, and dividing the extracted area into a plurality of smaller area blocks; S1.3, enhancing the contrast of the image by histogram stretching and sharpening image edges with a Laplacian operator; S1.4, enhancing horizontal line segments in the image with a dilation operation; S1.5, dynamically adjusting the threshold according to local characteristics of the image and performing adaptive binarization; S1.6, detecting and reconstructing horizontal line segments by applying the Hough transform in each area block, clustering the detected line spacings with the K-means algorithm, and calculating the average line spacing; S1.7, setting a threshold u based on the average line spacing and further extracting horizontal line segments with a dilation operation, setting a threshold h based on the average line spacing and detecting vertical line segments, and fusing the detected horizontal and vertical line segments to reconstruct the table structure of the medical bill image. As a further improvement of