JP-7856174-B2 - Image orientation recognition method, image orientation recognition model, and storage medium

Inventors

レイディン
ビヌドォン
シャヌシャヌジアン
ジィアシジャン
ヨンウエイジャン

Assignees

株式会社リコー

Dates

Publication Date: 2026-05-11
Application Date: 2025-01-17
Priority Date: 2024-02-21

Claims (13)

A computer-implemented image orientation identification method, comprising: detecting multiple character regions from a target image to be identified, and obtaining the position coordinates and region features of each character region; for each character region, identifying the image features of the character region in the target image based on the position coordinates, fusing the image features and the region features to generate a first fused feature of the character region, and generating multiple text line classification results for the character region based on the first fused feature; and acquiring positional features and morphological features of each character region, and fusing the multiple text line classification results for the character regions based on the positional features and morphological features to generate an orientation identification result for the target image.
The image orientation identification method according to claim 1, wherein acquiring the positional features and morphological features of each character region and fusing the multiple text line classification results for the character regions based on the positional features and morphological features to generate the orientation identification result of the target image comprises: inputting the position coordinates and the text line classification results of each character region into an image orientation identification module, and obtaining the orientation identification result of the target image output by the image orientation identification module; wherein the image orientation identification module, for each character region, identifies the positional features and morphological features based on the position coordinates, fuses the positional features and morphological features to generate a second fused feature, and generates a weight for the character region based on the second fused feature; and, based on the weight of each character region, fuses the multiple text line classification results for the character regions to generate the orientation identification result of the target image.
The image orientation identification method according to claim 2, wherein the positional features include a relative polar angle, which is the difference between a polar angle and a reference polar angle, and a polar diameter; and the morphological features include at least one of the height, width, area, contour length, and center of mass of the character region.
The image orientation identification method according to claim 2, wherein detecting multiple character regions from the target image and obtaining the position coordinates and region features of each character region comprises: inputting the target image into a character region detection module, and obtaining the position coordinates and region features of each character region output by the character region detection module.
The image orientation identification method according to claim 4, wherein, for each character region, identifying the image features of the character region in the target image based on the position coordinates, fusing the image features and the region features to generate the first fused feature of the character region, and generating multiple text line classification results for the character region based on the first fused feature comprises: inputting the target image and the position coordinates and region features of the character region into a feature fusion and text line classification module, and obtaining the multiple text line classification results for the character region output by the feature fusion and text line classification module.
The image orientation identification method according to claim 5, further comprising, before detecting the character regions from the target image: pre-training the character region detection module using first training data that includes multiple first images on which character regions are marked; inputting second training data, which includes multiple second images on which text line classification results are marked, into the trained character region detection module, and pre-training the feature fusion and text line classification module using the position coordinates and region features of the character regions output by the character region detection module, together with the second training data; obtaining third training data that includes multiple third images in the target region on which image orientations are marked; and inputting the third training data into the trained character region detection module, inputting the position coordinates and region features of the character regions output by the character region detection module, together with the third training data, into the trained feature fusion and text line classification module, and training the image orientation identification module using the text line classification results output by the feature fusion and text line classification module and the third training data.
An image orientation identification device, comprising: a character region detection module that detects multiple character regions from a target image to be identified and obtains the position coordinates and region features of each character region; a feature fusion and text line classification module that, for each character region, identifies the image features of the character region in the target image based on the position coordinates, fuses the image features and the region features to generate a first fused feature of the character region, and generates multiple text line classification results for the character region based on the first fused feature; and an image orientation identification module that acquires positional features and morphological features of each character region, and fuses the multiple text line classification results for the character regions based on the positional features and morphological features to generate an orientation identification result for the target image.
The image orientation identification device according to claim 7, wherein the image orientation identification module further: for each character region, identifies the positional features and morphological features based on the position coordinates, fuses the positional features and morphological features of the character region to generate a second fused feature, and generates a weight for the character region based on the second fused feature; and, based on the weight of each character region, fuses the multiple text line classification results for the character regions to generate the orientation identification result of the target image.
The image orientation identification device according to claim 8, wherein the positional features include a relative polar angle, which is the difference between a polar angle and a reference polar angle, and a polar diameter; and the morphological features include at least one of the height, width, area, contour length, and center of mass of the character region.
The image orientation identification device according to claim 8, wherein the character region detection module further: generates an image feature map corresponding to the target image, detects the character regions based on the image feature map, obtains the position coordinates of each character region, and extracts the features corresponding to each character region from the image feature map as the region features of that character region.
The image orientation identification device according to claim 10, further comprising a training module that: pre-trains the character region detection module using first training data that includes multiple first images on which character regions are marked; inputs second training data, which includes multiple second images on which text line classification results are marked, into the trained character region detection module, and pre-trains the feature fusion and text line classification module using the position coordinates and region features of the character regions output by the character region detection module, together with the second training data; obtains third training data that includes multiple third images in the target region on which image orientations are marked; and inputs the third training data into the trained character region detection module, inputs the position coordinates and region features of the character regions output by the character region detection module, together with the third training data, into the trained feature fusion and text line classification module, and trains the image orientation identification module using the text line classification results output by the feature fusion and text line classification module and the third training data.
A program for causing a computer to execute the image orientation identification method described in any one of claims 1 to 6.
A computer-readable storage medium storing the program described in claim 12.
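The region weighting and fusion recited in claims 2 and 3 can be illustrated with a short Python sketch. It is not the claimed implementation: the claims derive each region's weight from a second fused feature, whereas `region_weight` below is a hand-written stand-in heuristic. The axis-aligned box format, the use of the image centre as the pole, the `reference_angle` parameter, the reading of "polar diameter" as radial distance from the pole, and every name in the code are assumptions made for illustration only.

```python
import math
from dataclasses import dataclass

# Candidate orientations, in degrees, as listed in the description.
ORIENTATIONS = (0, 90, 180, 270)


@dataclass
class CharRegion:
    box: tuple     # (x0, y0, x1, y1) in pixels; hypothetical coordinate format
    scores: tuple  # one text line classification score per entry of ORIENTATIONS


def positional_features(region, image_size, reference_angle=0.0):
    """Return (relative polar angle, radial distance) of the region centre.

    Assumption: the pole is the image centre and angles are in radians.
    """
    width, height = image_size
    dx = (region.box[0] + region.box[2]) / 2 - width / 2
    dy = (region.box[1] + region.box[3]) / 2 - height / 2
    relative_angle = (math.atan2(dy, dx) - reference_angle) % (2 * math.pi)
    return relative_angle, math.hypot(dx, dy)


def morphological_features(region):
    """Return (width, height, area) of the region's bounding box."""
    x0, y0, x1, y1 = region.box
    return x1 - x0, y1 - y0, (x1 - x0) * (y1 - y0)


def region_weight(region, image_size):
    """Stand-in for the weight of a region: favour large, central regions."""
    _, radius = positional_features(region, image_size)
    _, _, area = morphological_features(region)
    return area / (1.0 + radius)


def fuse_orientation(regions, image_size):
    """Weighted sum of per-region scores, then pick the best orientation."""
    totals = [0.0] * len(ORIENTATIONS)
    for region in regions:
        weight = region_weight(region, image_size)
        for i, score in enumerate(region.scores):
            totals[i] += weight * score
    best = max(range(len(ORIENTATIONS)), key=totals.__getitem__)
    return ORIENTATIONS[best]
```

For example, on a 1000 x 800 image, a wide heading region `CharRegion((100, 100, 900, 200), (0.7, 0.1, 0.1, 0.1))` and a small footer region `CharRegion((400, 700, 600, 720), (0.05, 0.05, 0.85, 0.05))` fuse to 0, because the heading carries far more weight than the footer.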

Description

The present invention relates to image processing technology, and more specifically to an image orientation identification method, an image orientation identification model (also referred to as an image orientation identification device), and a storage medium.
In intelligent office applications, optical character recognition (OCR) is commonly performed on document images that users scan and upload. To ensure recognition accuracy, the image must be correctly oriented before recognition. However, the orientation of user-scanned images varies for many reasons (0°, 90°, 180°, 270°, and so on, with 0° regarded as correct), so the image must be rotated, manually or algorithmically, into the correct orientation before recognition. Here it is assumed that the character orientation is parallel to the long or short side of the image, and any slight tilt of the characters relative to the sides of the image is disregarded. The purpose of image orientation recognition is to allow an image to be rotated to the upright orientation.
Currently, the primary approach to image orientation recognition is to classify the image directly with a deep convolutional network. However, a conventional deep convolutional network extracts features from the entire image and does not reflect the important role that text plays in the image. This introduces considerable noise and lowers classification accuracy, so image orientation recognition is inaccurate. A method that improves the accuracy of image orientation recognition is therefore needed.
Other advantages and benefits will become clear from the following detailed description of the preferred embodiments. The drawings show preferred embodiments and should not be considered as limiting the present invention. In the drawings, identical parts are denoted by the same reference numerals.
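Once an orientation has been recognized, the algorithmic correction mentioned in the background is a lossless rotation by a multiple of 90°. The following is a minimal sketch of that step only, assuming the image is a row-major nested list and that a recognized angle of d degrees means the upright image was turned clockwise by d; the source does not state the direction of rotation, and the function names are hypothetical.

```python
def rotate_cw(image):
    """Rotate a row-major 2D grid by 90 degrees clockwise."""
    return [list(row) for row in zip(*image[::-1])]


def correct_orientation(image, detected_degrees):
    """Return an upright copy of an image recognized as rotated clockwise."""
    if detected_degrees not in (0, 90, 180, 270):
        raise ValueError("orientation must be 0, 90, 180 or 270 degrees")
    # Undoing a clockwise turn of d degrees equals turning clockwise by 360 - d.
    turns = ((360 - detected_degrees) % 360) // 90
    for _ in range(turns):
        image = rotate_cw(image)
    return [list(row) for row in image]
```

If the opposite convention applies (angles measured counterclockwise), the same code works with `turns` computed as `detected_degrees // 90`.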
Figure 1 is a schematic diagram showing the structure of an image orientation recognition model according to an embodiment of the present invention.
Figure 2 is a flowchart showing an image orientation identification method according to an embodiment of the present invention.
Figure 3 shows an example of an image orientation identification module according to an embodiment of the present invention.
Figure 4 shows an example of different character regions according to an embodiment of the present invention.
Figure 5 is a schematic diagram showing another structure of the image orientation recognition model according to an embodiment of the present invention.
Figure 6 is a schematic diagram showing yet another structure of the image orientation recognition model according to an embodiment of the present invention.
To make the technical problems, solutions, and advantages of the present invention clearer, they are described in detail below with reference to the drawings and specific embodiments. Specific arrangements and component details are given in the following description merely to aid a comprehensive understanding of the embodiments of the invention. Accordingly, various changes and modifications can be made to the embodiments described herein without departing from the scope and spirit of the invention. For clarity and conciseness, descriptions of known functions and structures are omitted.
Throughout the specification, the terms "one embodiment" and "one example" mean that a specific feature, structure, or characteristic associated with that embodiment is included in at least one embodiment of the present invention. Therefore, the phrases "in one embodiment" and "in one example" appearing in various parts of the specification do not necessarily refer to the same embodiment. These specific features, structures, or characteristics can be combined in any appropriate way in one or more embodiments.
In each embodiment of the present invention, the numbering of the processes described below does not indicate their order of execution. The order in which the processes are executed is determined by their functions and inherent logic, and the numbering does not limit the implementation of the embodiments of the present invention in any way.
Embodiments of the present invention provide an image orientation identification method for identifying the orientation of an image. Here, the orientation of an image is typically one of 0°, 90°, 180°, or 270°. For example, 0° indicates that the image is upright, while 90°, 180°, and 270° indicate that the image is rotated from the 0° position by 90°, 180°, and 270°, respectively. In embodiments of the present invention, an image being upright means that the orientation of the text in the image conforms to reading habits. In some cases, an image contains text in multiple orientations. In such cases, the orientation of the primary text is taken as the text orientation of the image. The primary text is typically the text that shares a single orientation and occupies the largest area in the image. Similarly, the orientation of text lines in an