CN-122020060-A - OCR path selection method and device based on dynamic routing and related equipment
Abstract
The embodiment of the invention discloses an OCR path selection method and device based on dynamic routing, and related equipment. The method comprises: obtaining a document to be processed and preprocessing it to obtain a plurality of pages to be processed; extracting multidimensional features of each page to be processed; obtaining a structured text of user constraint conditions and encoding the structured text to obtain a user constraint vector; splicing the multidimensional features and the user constraint vector to obtain splicing features; and, based on a pre-trained path decision model, making an OCR processing path decision for each page to be processed according to the splicing features to obtain an optimal processing path for each page. By jointly analyzing the multidimensional features and taking the user's constraints into account, the method selects the OCR processing path intelligently, automatically chooses the most suitable processing strategy, and improves the efficiency and accuracy of OCR processing.
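The splicing and decision steps summarized in the abstract can be illustrated with a minimal sketch. It is not the patented model: the candidate path names, the linear scoring layer, and all numeric weights below are assumptions chosen only to show how a spliced vector might be mapped to one processing path.

```python
import math

# Hypothetical candidate paths; the source does not enumerate them.
PATHS = ["fast_lightweight", "balanced", "high_accuracy"]

def splice(page_features, constraint_vector):
    """Vector splicing: concatenate page features with the user constraint vector."""
    return list(page_features) + list(constraint_vector)

def softmax(scores):
    """Convert raw scores to probabilities (shifted by the max for stability)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def decide_path(spliced, weights, biases):
    """Score every candidate path with a linear layer and pick the most probable."""
    scores = [
        sum(w * x for w, x in zip(row, spliced)) + b
        for row, b in zip(weights, biases)
    ]
    probs = softmax(scores)
    best = max(range(len(PATHS)), key=lambda i: probs[i])
    return PATHS[best], probs

# Illustrative page features, in order: text density, geometric deformation,
# image quality, background complexity, layout complexity.
page_features = [0.8, 0.1, 0.9, 0.2, 0.3]
# Illustrative constraint vector: accuracy priority, speed priority, power budget.
constraint_vector = [0.2, 0.9, 0.5]

# Hand-set stand-in weights (one row per path); a real model would learn these.
WEIGHTS = [
    [0.0, -1.0, 1.0, -1.0, -1.0, -1.0, 2.0, 0.0],  # fast_lightweight
    [0.0] * 8,                                      # balanced
    [0.0, 1.0, -1.0, 1.0, 1.0, 2.0, -1.0, 0.0],    # high_accuracy
]
BIASES = [0.0, 0.0, 0.0]

spliced = splice(page_features, constraint_vector)
path, probs = decide_path(spliced, WEIGHTS, BIASES)
print(path, [round(p, 3) for p in probs])
```

With these stand-in weights, a clean page combined with a high speed priority favors the lightweight path; changing the constraint vector alone can shift the decision, which is the point of splicing the two vectors before the model sees them.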
Inventors
- LIU YU
Assignees
- 北京泰信天成科技有限公司
Dates
- Publication Date
- 20260512
- Application Date
- 20260212
Claims (8)
- 1. An OCR path selection method based on dynamic routing, comprising: acquiring a document to be processed, and preprocessing the document to be processed to obtain a plurality of pages to be processed; identifying the content of each page to be processed, and dividing different content into regions to obtain a plurality of recognition units; extracting multidimensional features of each recognition unit; obtaining a structured text of a user constraint condition, and encoding the structured text to obtain a user constraint vector; performing vector splicing on the multidimensional features and the user constraint vector to obtain splicing features; and, based on a pre-trained path decision model, performing an OCR processing path decision on the processing path of each page to be processed according to the splicing features to obtain an optimal processing path for each recognition unit.
- 2. The dynamic routing-based OCR path selection method of claim 1, wherein extracting the multidimensional features of each page to be processed comprises: performing text density recognition on the page to be processed using a lightweight text detection network to obtain a text density feature; performing geometric feature recognition on the page to be processed through a Hough transform to obtain a geometric deformation feature; extracting an image quality feature of the page to be processed; performing background recognition on the page to be processed through color clustering and texture analysis to obtain a background complexity feature; and performing layout recognition on the page to be processed using edge detection and connected-region analysis to obtain a layout complexity feature.
- 3. The dynamic routing-based OCR path selection method of claim 1, wherein the training process of the path decision model comprises: acquiring sample pages corresponding to a sample document, wherein each sample page is labeled with a corresponding standard processing path; inputting the splicing features corresponding to each sample page into an initial path decision model for model training to obtain a predicted processing path output by the model; and calculating, based on a preset loss function, a decision loss between the predicted processing path and the standard processing path corresponding to the sample page, and optimizing model parameters of the initial path decision model based on the decision loss to obtain the path decision model.
- 4. The dynamic routing-based OCR path selection method of claim 1, further comprising: performing semantic recognition on each recognition unit, determining the association relations among the recognition units, and labeling each recognition unit according to the association relations to obtain association labels; and acquiring the OCR recognition results corresponding to all the recognition units, and fusing and correcting the OCR recognition results that have an association relation according to the association labels to obtain a complete recognition result.
- 5. An OCR path selection device based on dynamic routing, comprising: a preprocessing module, used for acquiring a document to be processed and preprocessing the document to be processed to obtain a plurality of pages to be processed; a feature extraction module, used for extracting multidimensional features of each page to be processed; a coding module, used for obtaining a structured text of a user constraint condition and encoding the structured text to obtain a user constraint vector; a splicing module, used for performing vector splicing on the multidimensional features and the user constraint vector to obtain splicing features; and a path selection module, used for performing, based on a pre-trained path decision model, an OCR processing path decision on the processing path of each page to be processed according to the splicing features to obtain an optimal processing path for each page to be processed.
- 6. The dynamic routing-based OCR path selection device of claim 5, wherein the feature extraction module comprises: a first extraction unit, used for performing text density recognition on the page to be processed using a lightweight text detection network to obtain a text density feature; a second extraction unit, used for performing geometric feature recognition on the page to be processed through a Hough transform to obtain a geometric deformation feature; a third extraction unit, used for extracting an image quality feature of the page to be processed; a fourth extraction unit, used for performing background recognition on the page to be processed through color clustering and texture analysis to obtain a background complexity feature; and a fifth extraction unit, used for performing layout recognition on the page to be processed using edge detection and connected-region analysis to obtain a layout complexity feature.
- 7. A computer device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor implements the dynamic routing-based OCR path selection method of any one of claims 1 to 4 when executing the computer program.
- 8. A computer-readable storage medium, characterized in that it stores a computer program which, when executed by a processor, causes the processor to perform the dynamic routing-based OCR path selection method according to any one of claims 1 to 4.
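The training procedure recited in claim 3 (predict a path, compare it with the labeled standard path through a loss function, then update the model parameters) can be illustrated with a minimal sketch. The claims only require "an initial path decision model" and "a preset loss function"; the softmax classifier, the cross-entropy loss, the learning rate, and the toy samples below are assumptions made for illustration.

```python
import math
import random

def softmax(scores):
    """Convert raw scores to probabilities (shifted by the max for stability)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def predict_probs(weights, biases, x):
    """Probability of each candidate path for one spliced feature vector."""
    scores = [sum(w * xi for w, xi in zip(row, x)) + b
              for row, b in zip(weights, biases)]
    return softmax(scores)

def train(samples, labels, num_paths, epochs=200, lr=0.5, seed=0):
    """Fit a softmax classifier with cross-entropy loss, one sample at a time."""
    rng = random.Random(seed)
    dim = len(samples[0])
    weights = [[rng.uniform(-0.01, 0.01) for _ in range(dim)]
               for _ in range(num_paths)]
    biases = [0.0] * num_paths
    history = []  # mean decision loss per epoch
    for _ in range(epochs):
        total_loss = 0.0
        for x, y in zip(samples, labels):
            probs = predict_probs(weights, biases, x)
            total_loss += -math.log(probs[y])  # cross-entropy vs. standard path
            for k in range(num_paths):
                # Gradient of cross-entropy w.r.t. score k is p_k - 1[k == y].
                grad = probs[k] - (1.0 if k == y else 0.0)
                for j in range(dim):
                    weights[k][j] -= lr * grad * x[j]
                biases[k] -= lr * grad
        history.append(total_loss / len(samples))
    return weights, biases, history

# Toy spliced features; label 0 and label 1 stand for two hypothetical paths.
samples = [[0.9, 0.1, 0.8], [0.8, 0.2, 0.9], [0.2, 0.9, 0.1], [0.1, 0.8, 0.2]]
labels = [0, 0, 1, 1]

weights, biases, history = train(samples, labels, num_paths=2)
predicted = []
for x in samples:
    p = predict_probs(weights, biases, x)
    predicted.append(p.index(max(p)))
print(predicted, round(history[0], 4), round(history[-1], 4))
```

The per-epoch loss history makes it easy to check that the decision loss actually decreases; any differentiable model and loss could replace the stand-ins used here.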
Description
OCR path selection method and device based on dynamic routing and related equipment
Technical Field
The embodiment of the invention relates to the technical field of artificial intelligence, and in particular to an OCR path selection method and device based on dynamic routing and related equipment.
Background
With the rapid growth of digital office and document processing demands, Optical Character Recognition (OCR) technology has become a core technology for digitizing documents. Conventional OCR systems typically employ a fixed processing flow that applies the same recognition algorithm and parameter configuration to all types of documents, and this "one-size-fits-all" approach shows significant limitations when faced with complex and diverse document types.
In recent years, researchers have begun to explore more intelligent document processing methods. Chinese patent application CN121457462A discloses a content-aware, intelligently routed document parsing method, which extracts multidimensional feature vectors from multimodal documents and uses a preset routing decision model to determine the parsing tool for each page of a document [CN121457462A]. Chinese patent application CN120182989B discloses a multimodal intelligent AI classification system for archive sorting, which introduces a dual-channel generator and a cross-modal consistency loss function and uses the PPO algorithm to make dynamic routing decisions [CN120182989B]. In addition, Chinese patent application CN121009497A proposes a routing method for multimodal questions on an AI platform, which processes data of various modalities through a multimodal intent fusion model and calculates an intent recognition confidence [CN121009497A].
However, existing OCR systems and document processing methods still have significant technical drawbacks.
Firstly, existing systems lack the ability to accurately evaluate document complexity and cannot adaptively select a processing path according to multidimensional features of a document such as text density, geometric deformation, image quality, background complexity, and layout complexity. Secondly, traditional methods ignore the influence of user constraints on the processing strategy and cannot integrate the user's specific requirements for precision, speed, power consumption, and the like into the path decision process. Thirdly, existing systems lack intelligent management of resource allocation; in particular, in environments with limited computing resources, such as mobile and edge devices, it is difficult to find an optimal balance among processing accuracy, latency, and power consumption. Finally, the path selection strategies of existing methods are relatively rigid and lack a dynamic optimization mechanism based on actual processing results, so overall processing efficiency is low and the differing requirements of different application scenarios cannot be met.
Disclosure of Invention
The embodiment of the invention provides an OCR path selection method and device based on dynamic routing and related equipment, which aim to solve the technical problem that, in the conventional technology, it is difficult to determine suitable OCR processing resources for a document to be processed.
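The user requirements on precision, speed, and power consumption discussed above have to be turned into numbers before they can influence a path decision. The following is a minimal sketch of one way to encode a structured constraint text as a fixed-length vector. The JSON format, the field names (`accuracy_priority`, `max_latency_ms`, `max_power_mw`), and the normalization ceilings are all hypothetical; the source does not specify the structured text schema or the encoder.

```python
import json

# Hypothetical mapping from a textual priority level to a number in [0, 1].
PRIORITY_LEVELS = {"low": 0.0, "medium": 0.5, "high": 1.0}

def clamp01(value):
    """Restrict a value to the closed interval [0, 1]."""
    return max(0.0, min(1.0, value))

def encode_constraints(structured_text, max_latency_ms=5000.0, max_power_mw=2000.0):
    """Encode a JSON constraint description as [accuracy, latency, power].

    Missing fields fall back to the least restrictive value (assumed defaults).
    """
    c = json.loads(structured_text)
    accuracy = PRIORITY_LEVELS[c.get("accuracy_priority", "medium")]
    latency = clamp01(float(c.get("max_latency_ms", max_latency_ms)) / max_latency_ms)
    power = clamp01(float(c.get("max_power_mw", max_power_mw)) / max_power_mw)
    return [accuracy, latency, power]

# Example: a user who wants high accuracy within 1 s and a 500 mW power budget.
text = '{"accuracy_priority": "high", "max_latency_ms": 1000, "max_power_mw": 500}'
vector = encode_constraints(text)
print(vector)
```

Keeping every component in the same [0, 1] range as the page features is a design choice for this sketch: it prevents one large raw value (for example, a latency in milliseconds) from dominating the spliced vector.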
In a first aspect, an embodiment of the present invention provides a dynamic routing-based OCR path selection method, including: acquiring a document to be processed, and preprocessing the document to be processed to obtain a plurality of pages to be processed; extracting multidimensional features of each page to be processed; obtaining a structured text of a user constraint condition, and encoding the structured text to obtain a user constraint vector; performing vector splicing on the multidimensional features and the user constraint vector to obtain splicing features; and, based on a pre-trained path decision model, performing an OCR processing path decision on the processing path of each page to be processed according to the splicing features to obtain an optimal processing path for each page to be processed.
In a second aspect, an embodiment of the present invention provides an OCR path selection device based on dynamic routing, including: a preprocessing module, used for acquiring a document to be processed and preprocessing the document to be processed to obtain a plurality of pages to be processed; a feature extraction module, used for extracting multidimensional features of each page to be processed; a coding module, used for obtaining a structured text of a user constraint condition and encoding the structured text to obtain a user constraint vector; a splicing module, used for carrying out vector splicing on the multidimensi