CN-115565191-B - Inclined text line identification method, system and equipment


Abstract

The application relates to the technical field of text recognition, in particular to an inclined text line recognition method, an inclined text line recognition system and an inclined text line recognition device, which can, to a certain extent, solve the problem that inclined fonts in text lines cannot be accurately recognized during text recognition. The inclined text line recognition method comprises: obtaining an image to be recognized; extracting multi-scale features of the image and decoding them to obtain decoded multi-scale features; obtaining shared features based on the decoded multi-scale features; obtaining a nine-channel feature map based on the shared features and reading the four vertex coordinates of a text box from the feature map; performing perspective transformation on the shared features channel by channel based on the four vertex coordinates of the text box to obtain text region features; and obtaining the final recognized character string based on the text region features and outputting the character string.
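The abstract, like claim 3, obtains the shared features by fusing the decoded multi-scale features. The patent does not fix the backbone, the scale ratios or the fusion operator, so the following is only a minimal NumPy sketch under stated assumptions: four decoded maps in (C, H, W) layout whose sizes differ by integer factors (strides 4, 8, 16 and 32 are used purely for illustration), nearest-neighbour upsampling to the first scale, and channel-wise concatenation as one reading of "fusion in the channel dimension". The function names `upsample_nearest` and `fuse_to_shared_feature` are hypothetical.

```python
import numpy as np


def upsample_nearest(feat: np.ndarray, factor: int) -> np.ndarray:
    """Nearest-neighbour upsampling of a (C, H, W) array by an integer factor."""
    return feat.repeat(factor, axis=1).repeat(factor, axis=2)


def fuse_to_shared_feature(decoded_feats: list) -> np.ndarray:
    """Fuse decoded multi-scale features into a first-scale shared feature.

    decoded_feats[0] is taken as the first (largest) scale; every other map
    is assumed to be smaller than it by the same integer factor in H and W.
    """
    target_h, target_w = decoded_feats[0].shape[1:]
    aligned = []
    for feat in decoded_feats:
        factor = target_h // feat.shape[1]
        if feat.shape[1] * factor != target_h or feat.shape[2] * factor != target_w:
            raise ValueError("feature sizes must differ by one integer factor")
        aligned.append(upsample_nearest(feat, factor))
    # Assumed fusion: concatenation along the channel axis.
    return np.concatenate(aligned, axis=0)


# Usage with an assumed 64 x 128 input and 8 channels per scale.
rng = np.random.default_rng(0)
feats = [rng.standard_normal((8, 64 // s, 128 // s)) for s in (4, 8, 16, 32)]
shared = fuse_to_shared_feature(feats)
print(shared.shape)  # → (32, 16, 32)
```

Concatenation keeps every scale intact for the convolution layer that later produces the nine-channel map; an element-wise sum would be a cheaper alternative, but the source does not say which operator is used.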

Inventors

WANG BODI
PENG BIN
YAO YI

Assignees

深圳市凌云视迅科技有限责任公司

Dates

Publication Date: 20260508
Application Date: 20221008

Claims (7)

1. An inclined text line recognition method, the method comprising: acquiring an image to be recognized, wherein the image contains text lines; extracting multi-scale features of the image, decoding the multi-scale features, and acquiring decoded multi-scale features; obtaining a shared feature based on the decoded multi-scale features; based on the shared feature, obtaining a nine-channel feature map and reading the four vertex coordinates of a text box from the feature map, wherein the first channel of the feature map is a score map, and a high-confidence region in the score map corresponds to a text line region in the image; performing perspective transformation on the shared feature channel by channel based on the four vertex coordinates of the text box to obtain text region features; and, based on the text region features, obtaining a final recognized character string and outputting the character string; wherein the step of obtaining a nine-channel feature map based on the shared feature and reading the four vertex coordinates of a text box from the feature map further comprises: processing the shared feature through a convolution layer to generate a nine-channel feature map at the first scale, wherein, within the high-confidence region, the remaining eight channels of the feature map hold coordinate values, a region consisting of pixels whose confidence is greater than a first threshold is characterized as a high-confidence region, and the pixel values of all other regions are 0; and wherein the step of performing perspective transformation on the shared feature channel by channel based on the four vertex coordinates of the text box further comprises: taking the upper-left vertex of the text box as the starting point, ordering the vertices clockwise, and obtaining the four vertex coordinates in sequence; and performing perspective transformation on the four vertex coordinates, wherein the coordinates of the four vertices after perspective transformation are (0, 0), (w, 0), (w, h) and (0, h), h is a first height value, and w is the scaled width obtained when the aspect ratio of the text box is kept unchanged before and after the perspective transformation.
2. The inclined text line recognition method of claim 1, wherein the step of extracting multi-scale features of the image, decoding the multi-scale features and obtaining decoded multi-scale features further comprises: extracting a first-scale feature, a second-scale feature, a third-scale feature and a fourth-scale feature of the image, wherein each scale denotes the ratio of the size of the corresponding feature map to the size of the image to be recognized; and taking the first-scale feature among the multi-scale features as the shared feature.
3. The inclined text line recognition method of claim 1, wherein the step of extracting multi-scale features of the image, decoding the multi-scale features and obtaining decoded multi-scale features further comprises: extracting a first-scale feature, a second-scale feature, a third-scale feature and a fourth-scale feature of the image, wherein each scale denotes the ratio of the size of the corresponding feature map to the size of the image to be recognized; and fusing the decoded multi-scale features along the channel dimension to generate a first-scale feature map, wherein the first-scale feature map is the shared feature.
4. The method of claim 1, wherein the step of obtaining a final recognized character string based on the text region features further comprises: based on the text region features, generating a feature sequence through a convolutional neural network, a recurrent neural network and a fully connected layer, so as to output a posterior probability matrix; and obtaining the final recognized character string based on the posterior probability matrix.
5. An inclined text line recognition system, comprising: an image acquisition unit, configured to acquire an image to be recognized, wherein the image contains text lines; a shared feature obtaining unit, configured to extract multi-scale features of the image, decode the multi-scale features, acquire decoded multi-scale features and obtain a shared feature based on them; a vertex coordinate obtaining unit, configured to obtain a nine-channel feature map based on the shared feature and read the four vertex coordinates of a text box from the feature map, wherein the first channel of the feature map is a score map, and a high-confidence region in the score map corresponds to a text line region in the image; a perspective transformation unit, configured to perform perspective transformation on the shared feature channel by channel based on the four vertex coordinates of the text box to obtain text region features; a posterior probability matrix output unit, configured to generate a feature sequence through a convolutional neural network, a recurrent neural network and a fully connected layer so as to output a posterior probability matrix; and a character string output unit, configured to obtain a final recognized character string based on the posterior probability matrix and output the character string; wherein, in obtaining the nine-channel feature map based on the shared feature and reading the four vertex coordinates of the text box from the feature map, the vertex coordinate obtaining unit is further configured to: process the shared feature through a convolution layer to generate a nine-channel feature map at the first scale, wherein, within the high-confidence region, the remaining eight channels of the feature map hold coordinate values, a region consisting of pixels whose confidence is greater than a first threshold is characterized as a high-confidence region, and the pixel values of all other regions are 0.
6. The inclined text line recognition system of claim 5, wherein, in extracting the multi-scale features of the image and decoding the multi-scale features, the shared feature obtaining unit is further configured to: extract a first-scale feature, a second-scale feature, a third-scale feature and a fourth-scale feature of the image, wherein each scale denotes the ratio of the size of the corresponding feature map to the size of the image to be recognized; and take the first-scale feature among the multi-scale features as the shared feature.
7. An inclined text line recognition device, comprising a processor, a memory, a communication interface and a communication bus, wherein the processor, the memory and the communication interface communicate with one another through the communication bus; and the memory is configured to store at least one executable instruction that causes the processor to perform the operations of the inclined text line recognition method according to any one of claims 1-4.
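Claim 1 specifies the geometry of two steps fairly precisely: the four vertices are read from the eight coordinate channels inside the high-confidence region of the score map, ordered clockwise from the upper-left vertex, and mapped to (0, 0), (w, 0), (w, h) and (0, h), with w chosen so that the box keeps its aspect ratio. The sketch below is one possible reading, not the patented implementation. Its assumptions: channels 1 to 8 store absolute (x1, y1, …, x4, y4) in feature-map pixels; the coordinate votes of all high-confidence pixels are averaged; the first threshold of 0.5 and first height value of 32 are hypothetical; the upper-left vertex is picked as the one with the smallest x + y; and sampling is nearest-neighbour. The function names `read_box_vertices`, `homography` and `warp_text_region` are hypothetical.

```python
import numpy as np


def read_box_vertices(score_geo: np.ndarray, thresh: float = 0.5) -> np.ndarray:
    """Read four text-box vertices (4, 2) from a nine-channel (9, H, W) map."""
    mask = score_geo[0] > thresh  # channel 0: score map
    if not mask.any():
        raise ValueError("no pixel above the first threshold")
    # Assumed layout: channels 1..8 hold x1, y1, ..., x4, y4; average the votes.
    pts = score_geo[1:, mask].mean(axis=1).reshape(4, 2)
    # Order clockwise on screen (y points down) by angle around the centroid...
    centre = pts.mean(axis=0)
    angles = np.arctan2(pts[:, 1] - centre[1], pts[:, 0] - centre[0])
    pts = pts[np.argsort(angles)]
    # ...then rotate the list so it starts at the upper-left vertex (min x + y).
    return np.roll(pts, -int(np.argmin(pts.sum(axis=1))), axis=0)


def homography(src: np.ndarray, dst: np.ndarray) -> np.ndarray:
    """3x3 matrix H with dst ~ H @ [x, y, 1] for four point pairs (h33 = 1)."""
    rows, rhs = [], []
    for (x, y), (u, v) in zip(src, dst):
        rows.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        rows.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        rhs += [u, v]
    params = np.linalg.solve(np.array(rows, float), np.array(rhs, float))
    return np.append(params, 1.0).reshape(3, 3)


def warp_text_region(shared: np.ndarray, box: np.ndarray, out_h: int = 32) -> np.ndarray:
    """Map the box (TL, TR, BR, BL) of a (C, H, W) feature onto an out_h-high grid."""
    # w keeps the box aspect ratio once the height is scaled to out_h.
    box_w = (np.linalg.norm(box[1] - box[0]) + np.linalg.norm(box[2] - box[3])) / 2
    box_h = (np.linalg.norm(box[3] - box[0]) + np.linalg.norm(box[2] - box[1])) / 2
    out_w = max(1, int(round(out_h * box_w / box_h)))
    dst = np.array([[0, 0], [out_w, 0], [out_w, out_h], [0, out_h]], float)
    # Inverse mapping: locate the source point of every output pixel.
    inv = homography(dst, box)
    vs, us = np.mgrid[0:out_h, 0:out_w]
    sx, sy, sw = inv @ np.stack([us.ravel(), vs.ravel(), np.ones(us.size)])
    xs = np.clip(np.rint(sx / sw), 0, shared.shape[2] - 1).astype(int)
    ys = np.clip(np.rint(sy / sw), 0, shared.shape[1] - 1).astype(int)
    out = np.empty((shared.shape[0], out_h, out_w), shared.dtype)
    for c in range(shared.shape[0]):  # channel by channel, same sampling grid
        out[c] = shared[c, ys, xs].reshape(out_h, out_w)
    return out


# Usage: an axis-aligned 60 x 20 box whose vertices are stored out of order.
shared = np.arange(2 * 40 * 80, dtype=float).reshape(2, 40, 80)
score_geo = np.zeros((9, 40, 80))
score_geo[0, 10:20, 30:50] = 0.9
score_geo[1:, 10:20, 30:50] = np.array([70, 25, 10, 25, 10, 5, 70, 5], float)[:, None, None]
box = read_box_vertices(score_geo)
region = warp_text_region(shared, box)
print(region.shape)  # → (2, 32, 96)
```

Here w = 32 · 60 / 20 = 96, so the rectified region keeps the 3:1 aspect ratio of the box. Bilinear sampling would give smoother features; nearest-neighbour is used only to keep the sketch short.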

Description

Inclined text line identification method, system and equipment

Technical Field

The application relates to the technical field of text recognition, in particular to an inclined text line recognition method, an inclined text line recognition system and an inclined text line recognition device.

Background

With the development of science and technology, large amounts of information are recorded, edited, organized, stored and transmitted by computer. As the volume of information grows, an effective way of entering it into a computer quickly and accurately is needed: manually processing large amounts of text data is difficult to make efficient, whereas automatic text recognition by computer can solve this problem. Character recognition is a broad, interdisciplinary technology that draws on pattern recognition, digital signal processing, image processing, computer vision, fuzzy mathematics and other subjects, and text line recognition is becoming increasingly important within it. In existing text line recognition pipelines, features are often extracted by a single convolutional neural network, and the text line is then recognized with a deep learning model. Deep learning is widely used in industrial vision, where its performance in complex scenes is clearly superior to that of traditional image processing algorithms: being data-driven, it maps images into a high-dimensional feature space and then applies different processing for different tasks.
However, the characters forming the text in an image may be rotated at an angle, or may be italic (slanted) while running horizontally, so the text information in the image cannot be effectively recognized by a simple text line recognition process: the features extracted by a single convolutional neural network are limited, the orientation information of the text cannot be extracted, and inclined fonts in the text line cannot be accurately recognized.

Disclosure of Invention

To solve the problem that inclined fonts in a text line cannot be accurately recognized during text line recognition, the application provides an inclined text line recognition method, an inclined text line recognition system and an inclined text line recognition device. Embodiments of the present application are implemented as follows:

A first aspect of the embodiments of the present application provides an inclined text line recognition method, the method comprising: acquiring an image to be recognized, wherein the image contains text lines; extracting multi-scale features of the image, decoding the multi-scale features, and acquiring decoded multi-scale features; obtaining a shared feature based on the decoded multi-scale features; based on the shared feature, obtaining a nine-channel feature map and reading the four vertex coordinates of a text box from the feature map, wherein the first channel of the feature map is a score map, and a high-confidence region in the score map corresponds to a text line region in the image; performing perspective transformation on the shared feature channel by channel based on the four vertex coordinates of the text box to obtain text region features; and, based on the text region features, obtaining a final recognized character string and outputting the character string.
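The first aspect ends by turning the text region features into the final character string, and claim 4 adds that a convolutional neural network, a recurrent neural network and a fully connected layer first produce a posterior probability matrix. The patent does not name the decoding rule. A common choice for this CNN + RNN + fully connected layout is CTC-style best-path (greedy) decoding, sketched below under the assumptions that the matrix has shape (T, K + 1), that label 0 is the blank, and that labels 1 to K map onto a hypothetical character set `charset`. The function name `greedy_ctc_decode` is likewise hypothetical.

```python
import numpy as np


def greedy_ctc_decode(posteriors: np.ndarray, charset: str) -> str:
    """Best-path decoding of a (T, len(charset) + 1) posterior probability matrix.

    Assumes label 0 is the CTC blank and label k >= 1 means charset[k - 1].
    """
    best = posteriors.argmax(axis=1)  # most probable label per time step
    chars = []
    prev = 0
    for label in best.tolist():
        # Emit only when the label is not blank and differs from the previous step.
        if label != 0 and label != prev:
            chars.append(charset[label - 1])
        prev = label
    return "".join(chars)


# Usage: one-hot frames A, A, blank, A, B, B, blank over the charset "AB".
frames = np.eye(3)[[1, 1, 0, 1, 2, 2, 0]]
print(greedy_ctc_decode(frames, "AB"))  # → AAB
```

The blank between the second and third "A" frames is what lets the decoder emit a doubled character; consecutive identical frames without a blank collapse into one.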
In some embodiments, the step of extracting multi-scale features of the image, decoding the multi-scale features and obtaining decoded multi-scale features further comprises: extracting a first-scale feature, a second-scale feature, a third-scale feature and a fourth-scale feature of the image, wherein each scale denotes the ratio of the size of the corresponding feature map to the size of the image to be recognized; and taking the first-scale feature among the multi-scale features as the shared feature.

In some embodiments, the step of extracting multi-scale features of the image, decoding the multi-scale features and obtaining decoded multi-scale features further comprises: extracting a first-scale feature, a second-scale feature, a third-scale feature and a fourth-scale feature of the image, wherein each scale denotes the ratio of the size of the corresponding feature map to the size of the image to be recognized; and fusing the decoded multi-scale features along the channel dimension to generate a first-scale feature map, wherein the first-scale feature map is the shared feature.

In some embodiments, the step of obtaining a nine-channel first-scale feature map based on the shared feature and reading the four vertex coordinates of a text box in the text line further comprises: processing the shared features through a convolution layer to gene