CN-122024245-A - Multi-path OCR (optical character recognition) system based on multi-recognition target classification
Abstract
The invention relates to the technical field of image recognition and discloses a multi-path OCR recognition system based on multi-recognition target classification. The system comprises an image feature extraction module, a route distribution module and a parallel recognition module. The image feature extraction module obtains a shallow feature tensor. The route distribution module generates a route control mask according to the local gradient distribution: it extracts gradient tensors using directional derivative operators, determines an anisotropic response feature value from a gradient variance ratio, and constructs the route control mask from that value. The parallel recognition module comprises a print recognition unit and a handwriting recognition unit, and the feature tensor is distributed between them through an element-by-element multiplication operation.
Inventors
- LIU DAIDI
- WU JINGXING
- DU YONGHENG
- WU XIAOHONG
- NIE RONG
- WANG PENG
- BAI GANG
- HUANG YUFENG
- WEI GANGQIANG
- ZHANG ZUOQIANG
- SU QIGUI
Assignees
- 长沙谱蓝网络科技有限公司
Dates
- Publication Date: 2026-05-12
- Application Date: 2026-04-09
Claims (10)
- 1. A multi-path OCR recognition system based on multi-recognition target classification, comprising: an image feature extraction module for performing a convolution operation on an input image to generate a shallow feature tensor; a route distribution module, connected with the image feature extraction module, for generating a route control mask according to the local gradient distribution of the shallow feature tensor; a parallel recognition module, connected with the route distribution module and comprising a print recognition unit and a handwriting recognition unit; wherein the route distribution module is configured to call four directional derivative operators to extract, within a local receptive field of the shallow feature tensor, gradient tensors in the horizontal direction, the vertical direction and the two diagonal directions respectively, to sum the variances of the horizontal and vertical gradient tensors to obtain an orthogonal gradient variance sum, to sum the variances of the two diagonal gradient tensors to obtain a diagonal gradient variance sum, to calculate the ratio of the orthogonal gradient variance sum to the diagonal gradient variance sum to determine an anisotropic response feature value of the local receptive field, to construct the route control mask according to the anisotropic response feature value, and to generate the sub-feature tensors distributed to the print recognition unit or the handwriting recognition unit through element-by-element multiplication of the route control mask and the shallow feature tensor; a dynamic resource management module, connected with the route distribution module, for monitoring the distribution state of the route control mask in real time and adjusting the operator activation of the recognition branches; and a global fusion module, connected with the parallel recognition module, for logically reconstructing the local recognition results according to the spatial distribution information of the route control mask.
- 2. The multi-path OCR recognition system based on multi-recognition target classification according to claim 1, wherein the image feature extraction module comprises a multi-layer residual connection unit, and the shallow feature tensor is extracted from a feature layer of the image feature extraction module whose spatial resolution is greater than a preset pixel threshold; and the route distribution module is further configured to perform global pooling over the channel dimension of the shallow feature tensor to obtain a spatial saliency initial descriptor, and to use the initial descriptor to apply weight compensation to the extraction performed by the directional derivative operators, so as to suppress interference of background texture with the anisotropic response feature value, align the response edges of the route control mask with the physical stroke boundaries in the pixel coordinate system, and complete feature decoupling of printed strokes and handwritten strokes before the shallow feature tensor enters deep recognition.
- 3. The multi-path OCR recognition system based on multi-recognition target classification according to claim 1, wherein the route distribution module calculates the anisotropic response feature value R according to the following relationship: $R = \dfrac{\sigma_h^2 + \sigma_v^2}{\sigma_{d1}^2 + \sigma_{d2}^2}$, wherein R is the anisotropic response feature value, $\sigma_h^2$ and $\sigma_v^2$ are the gradient variances of the shallow feature tensor in the horizontal and vertical directions, and $\sigma_{d1}^2$ and $\sigma_{d2}^2$ are the gradient variances of the shallow feature tensor in the two diagonal directions; and the route distribution module is configured to determine that the local receptive field is a printed structure region when R is greater than a preset structuring judgment threshold, and to assign a weight mapping value for the print recognition unit at the spatial coordinates of the route control mask corresponding to the local receptive field.
- 4. The multi-path OCR recognition system based on multi-recognition target classification according to claim 1, wherein the dynamic resource management module is configured to monitor, in real time, the mean value of the route control mask over global coordinates, and, when the mask mean is lower than an activation threshold preset based on the image size of the input image, to stop distributing the sub-feature tensor to the corresponding print recognition unit or handwriting recognition unit.
- 5. The multi-path OCR recognition system based on multi-recognition target classification according to claim 1, wherein the four directional derivative operators are anisotropic Sobel operators, and the route distribution module is configured to convolve the shallow feature tensor with the anisotropic Sobel operators to obtain the directional gradient response of the stroke structure, to establish stroke topology distribution features according to the distribution states of the orthogonal gradient variance sum and the diagonal gradient variance sum, and to use the stroke topology distribution features to correct the recognition probability distribution output by the print recognition unit or the handwriting recognition unit.
- 6. The multi-path OCR recognition system based on multi-recognition target classification according to claim 1, wherein the route distribution module includes a distribution recalibration unit configured to convert the route control mask into a Boolean filter matrix and, based on the Boolean filter matrix, to zero out non-character regions in the shallow feature tensor, so as to eliminate false triggering of recognition paths caused by background creases during the feature flow stage.
- 7. The multi-path OCR recognition system based on multi-recognition target classification according to claim 1, wherein the print recognition unit is configured to receive its corresponding sub-feature tensor and perform text line alignment recognition using a preset fixed-size convolution kernel, and the handwriting recognition unit is configured to receive its corresponding sub-feature tensor and capture irregular stroke curve features using a multi-scale adaptation unit.
- 8. The multi-path OCR recognition system based on multi-recognition target classification according to claim 1, wherein the global fusion module is connected with the parallel recognition module and is configured to spatially correlate the local recognition results output by the print recognition unit and the handwriting recognition unit, and to reconstruct the logical text sequence of a composite-layout document according to the spatial position distribution information provided by the route control mask.
- 9. The multi-path OCR recognition system based on multi-recognition target classification according to claim 1, wherein the number of channels of the shallow feature tensor is not less than 64, the number of channels of the route control mask is equal to the number of recognition paths of the parallel recognition module, and the route distribution module normalizes the route control mask before performing the element-by-element multiplication operation so that the numerical range of the sub-feature tensor is between 0 and 1.
- 10. The multi-path OCR recognition system based on multi-recognition target classification according to claim 1, wherein the route distribution module is configured to calculate the discrete difference in anisotropic response feature values between adjacent local receptive fields and to adjust the contrast of the route control mask based on the magnitude of the discrete difference, so as to enhance the linear separability of printed stroke features and handwritten stroke features in the feature space.
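The routing steps recited in claims 1, 3 and 4 can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the applicant's implementation: the 3x3 Sobel-style kernel coefficients, the non-overlapping tiling into local receptive fields, the binary mask values, the `eps` guard against division by zero, and all function names and thresholds are hypothetical choices that the claims leave open. A single-channel feature map stands in for the shallow feature tensor.

```python
from statistics import pvariance

# Assumed 3x3 Sobel-style kernels for the four directions; the claims name
# the operators but do not give coefficients.
K_H = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]     # horizontal derivative
K_V = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]     # vertical derivative
K_D1 = [[0, 1, 2], [-1, 0, 1], [-2, -1, 0]]    # first diagonal
K_D2 = [[-2, -1, 0], [-1, 0, 1], [0, 1, 2]]    # second diagonal


def correlate_valid(patch, kernel):
    """Flat list of 'valid' 3x3 cross-correlation responses over a 2-D patch."""
    rows, cols = len(patch), len(patch[0])
    return [
        sum(kernel[u][v] * patch[i + u][j + v] for u in range(3) for v in range(3))
        for i in range(rows - 2)
        for j in range(cols - 2)
    ]


def anisotropic_response(patch, eps=1e-8):
    """R = orthogonal gradient variance sum / diagonal gradient variance sum."""
    def var(kernel):
        return pvariance(correlate_valid(patch, kernel))

    orthogonal = var(K_H) + var(K_V)
    diagonal = var(K_D1) + var(K_D2)
    return orthogonal / (diagonal + eps)  # eps is an added guard, not in the claims


def route(feature, field=6, threshold=1.0):
    """Build a binary print mask tile by tile and split the feature map.

    Tiles that do not fit the grid keep mask value 0, i.e. they fall to the
    handwriting branch (an arbitrary choice for this sketch).
    """
    rows, cols = len(feature), len(feature[0])
    print_mask = [[0.0] * cols for _ in range(rows)]
    for i0 in range(0, rows - field + 1, field):
        for j0 in range(0, cols - field + 1, field):
            tile = [row[j0:j0 + field] for row in feature[i0:i0 + field]]
            if anisotropic_response(tile) > threshold:
                for i in range(i0, i0 + field):
                    for j in range(j0, j0 + field):
                        print_mask[i][j] = 1.0
    hand_mask = [[1.0 - m for m in row] for row in print_mask]

    def apply(mask):
        # Element-by-element multiplication of mask and feature map.
        return [[m * x for m, x in zip(mr, fr)] for mr, fr in zip(mask, feature)]

    return print_mask, apply(print_mask), apply(hand_mask)


def branch_active(mask, activation_threshold):
    """Claim 4 style gate: a branch stays active only if the mask mean reaches the threshold."""
    values = [m for row in mask for m in row]
    return sum(values) / len(values) >= activation_threshold


# Usage: a 6x6 tile containing one vertical step edge.
edge = [[0, 0, 0, 1, 1, 1] for _ in range(6)]
print(round(anisotropic_response(edge), 3))  # 0.889 with these assumed kernels
```

With these particular kernels a single axis-aligned edge yields R slightly below 1, so how well the ratio separates printed from handwritten strokes depends on the operator coefficients and on the preset structuring judgment threshold, neither of which the claims fix.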
Description
Multi-path OCR (optical character recognition) system based on multi-recognition target classification
Technical Field
The invention belongs to the technical field of image recognition, and particularly relates to a multi-path OCR recognition system based on multi-recognition target classification.
Background
Current optical character recognition technology generally adopts an end-to-end deep convolutional neural network architecture: a unified backbone network extracts global features from the image to be recognized, and semantic representations at different scales are obtained by stacking convolutional layers and pooling layers. This architecture is stable when processing single-mode text images and copes with illumination changes or background interference. However, as the requirements of automatic document processing evolve, the images to be recognized often contain heavily intermixed heterogeneous visual targets such as printed text, handwriting and seals. Within the same receptive field, the convolution kernels of the backbone network respond simultaneously to regular printed features and to morphologically variable handwritten features, so the extracted feature tensor suffers feature-space aliasing in its underlying physical representation. To alleviate this contradiction, the industry generally follows a linear improvement path of increasing network depth or parameter scale. This approach cannot separate heterogeneous semantics at the level of mechanism; instead, it spends redundant computing resources on background regions with low visual information entropy, so that edge-side devices incur computational load and processing latency when handling large-scale document streams.
Beyond the limitations of neural network topology at the level of feature representation, underlying processing logic and matching algorithms built around specific character features also show fundamental shortcomings when heterogeneous targets vary dynamically. For example, the Chinese patent with grant publication number CN111553336B discloses a printed Uyghur document image recognition system and method based on connected segments, in which character recognition is realized by constructing a static feature template library and matching with Euclidean distance. That technique relies on the physical segmentation boundaries of connected segments and on the stability of character structure; its premise is that characters have high structural consistency. Under actual composite-layout conditions, handwritten strokes are strongly irregular and randomly variable, so a static template matching mechanism cannot adapt to the dynamic topological variation of the feature space. It also lacks a physically isolated routing mechanism for heterogeneous semantics, which makes targeted allocation of computing power difficult in the feature flow stage. Therefore, the technical problem to be solved by the invention is how to realize physical isolation of heterogeneous visual targets in the feature flow stage and, on that basis, to construct an asynchronous distribution mechanism that dispatches computing power dynamically according to the semantic distribution.
Disclosure of Invention
The invention provides a multi-path OCR (optical character recognition) system based on multi-recognition target classification, which comprises: an image feature extraction module for performing a convolution operation on an input image to generate a shallow feature tensor; a route distribution module, connected with the image feature extraction module, for generating a route control mask according to the local gradient distribution of the shallow feature tensor; a parallel recognition module, connected with the route distribution module and comprising a print recognition unit and a handwriting recognition unit; wherein the route distribution module is configured to call four directional derivative operators to extract, within a local receptive field of the shallow feature tensor, gradient tensors in the horizontal direction, the vertical direction and the two diagonal directions respectively, to sum the variances of the horizontal and vertical gradient tensors to obtain an orthogonal gradient variance sum, to sum the variances of the two diagonal gradient tensors to obtain a diagonal gradient variance sum, to calculate the ratio of the orthogonal gradient variance sum to the diagonal gradient variance sum to determine an anisotropic response feature value of the local receptive field, to construct the route control mask according to the anisotropic response feature value, and generati