EP-4742197-A1 - OPTICAL CHARACTER RECOGNITION AND COMPUTER VISION METHOD


Abstract

The present invention concerns an OCR and computer vision technique that optimizes the use of processing and memory resources in every OCR stage subsequent to image acquisition. It uses adaptive thresholding to process inputs (text, images, and video) through a pre-processed "reversed binarization" (bitmapped and pixel-wise).
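As a rough illustration of the two operations named in the abstract, the following minimal sketch applies an adaptive mean threshold and then inverts the resulting binary image. It is not the applicant's implementation: the function names, the `radius` and `offset` parameters, and the convention that 1 marks foreground are all assumptions made for this example.

```python
from typing import List

Image = List[List[int]]  # grayscale rows, values 0..255 (assumed layout)


def adaptive_mean_threshold(gray: Image, radius: int = 1, offset: int = 0) -> Image:
    """Mark a pixel as foreground (1) when it is darker than its local mean minus offset."""
    height, width = len(gray), len(gray[0])
    binary = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            # Collect the neighbourhood, clipped at the image borders.
            window = [
                gray[ny][nx]
                for ny in range(max(0, y - radius), min(height, y + radius + 1))
                for nx in range(max(0, x - radius), min(width, x + radius + 1))
            ]
            local_mean = sum(window) / len(window)
            binary[y][x] = 1 if gray[y][x] < local_mean - offset else 0
    return binary


def reverse_binarization(binary: Image) -> Image:
    """Swap foreground and background so that the voids become the marked pixels."""
    return [[1 - pixel for pixel in row] for row in binary]


# Hypothetical input: two dark "strokes" on a light page.
page = [
    [200, 200, 200, 200, 200],
    [200, 10, 200, 10, 200],
    [200, 200, 200, 200, 200],
]
first_binary = adaptive_mean_threshold(page, radius=1, offset=10)
reversed_binary = reverse_binarization(first_binary)
```

In this sketch the reversed image marks every pixel that the first binarization treated as background, which is the input a later stage would need in order to treat the voids as objects.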

Inventors

CABRITA PAIS HOMEM, Luís Manuel

Assignees

Hebdómada Unipessoal, Lda

Dates

Publication Date: 2026-05-13
Application Date: 2025-11-10

Claims (8)

A computer-implemented optical character recognition and computer vision method for application in images and video including text, consisting of computing the void interval in between any original Foreground recognizable objects as new objects, comprising the stages:
- image acquisition, where the method receives an image or video containing information as input;
- pre-processing, where the method performs the enhancement of image quality for analysis by different techniques, including but not limited to noise reduction, contrast adjustment and color correction;
- image segmentation, where the method segments and extracts features, isolating distinctive patterns, detecting edges, corners, textures, and key points;
- object extraction, where the identifying features of objects included in the segments are recognized, including general patterns, shapes, and strokes, where it locates and identifies different objects, such as by drawing bounding boxes around recognized items;
- image classification process, in which the recognition by object or item is performed through algorithmic and/or machine learning identification, such as categorizing the entire image or objects into classes, and further labelling with confidence scores; and
- post-processing, where error correction, final validation of recognized objects and items and reconstruction of their structures are applied, including operations such as thresholding, result filtering, and context integration;
characterized in that the method further comprises the conduction of computing the void interval between any original Foreground recognizable or devised objects, through color or Black and White high contrast reversed binarization on the first binary image to generate a reversed binary use of the image.
A computer-implemented optical character recognition and computer vision method according to the previous claim, characterized in that the image segmentation stage or further stages comprise applying a reversed binarization procedure that identifies void intervals between original visual objects, contours, or textures and interprets said void intervals as new structural objects, resulting in a structured sequence of interval-based features suitable for further analysis or classification.
A computer-implemented optical character recognition and computer vision method according to any of the previous claims, characterized in that the pre-processing stage further includes an edge-detection or contour-detection sub-procedure applied to the reversed binarized image or video frame sequence.
A computer-implemented optical character recognition and computer vision method according to the previous claim, characterized in that the edge-detection sub-procedure includes performing a morphological operation selected from contour thinning, topological skeletonization, or medial-axis transform upon the reversed binary image, thereby reducing the thickness of interval-derived contours or object boundaries to single-pixel-wide structural lines for enhanced feature mapping and spatial topology extraction.
A computer-implemented optical character recognition and computer vision method according to any of the previous claims, characterized in that the binarization process employs a thresholding or segmentation method selected from the group consisting of, but not limited to, Canny edge-based thresholding, Sobel gradient thresholding, Laplacian of Gaussian (LoG) filtering, adaptive mean/Gaussian thresholding, region-growing segmentation, k-means clustering, or watershed segmentation, to optimize the separation of foreground and background regions before the reversed binarization process.
A computer-implemented optical character recognition and computer vision method according to any of the previous claims, characterized in that the machine learning identification stage utilizes neural network models selected from the group consisting of, but not limited to, convolutional neural networks (CNNs), vision transformers (ViTs), graph neural networks (GNNs), and spatio-temporal deep learning architectures trained on interval-derived contour or structural patterns produced by the reversed binarization pipeline.
A computer-implemented optical character recognition and computer vision method according to any of the previous claims, further comprising encrypting data using the interval objects or characters as cryptographic elements by at least one of the following methods:
a. concealing messages in plain image and video, including text, by encoding information in patterns of said intervals between objects or characters, whereby encrypted data is embedded in the spatial relationships between objects or characters rather than in their content, thereby achieving steganographic concealment;
b. employing asymmetric cryptography using public keys represented by patterns of said interval objects or characters, whereby cryptographic keys are distributed in visually inconspicuous spacing patterns, thereby reducing detectability of key exchange; and
c. applying diagonalization to said interval objects or characters to generate functionally equivalent but non-recognizable representations, whereby cryptographic functions are obfuscated from pattern analysis while maintaining computational equivalence.
A computer-program product comprising computer-executable instructions which, when executed by a processor, cause the processor to perform the method according to any one of the previous claims.
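The claims above describe treating the void intervals between foreground objects as new objects and producing a sequence of interval-based features. One possible reading of that idea, reduced to a row-wise scan of a binary image, is sketched below. The function name, the `(row, start_column, length)` tuple layout, and the choice to ignore voids that touch the image border are assumptions made for illustration and are not taken from the claims.

```python
from typing import List, Tuple

Interval = Tuple[int, int, int]  # (row, start_column, length), an assumed layout


def void_intervals(binary: List[List[int]]) -> List[Interval]:
    """List the background runs that lie between two foreground pixels on each row.

    Foreground pixels are assumed to be 1 and background pixels 0. Runs of
    background that touch the left or right border are skipped, since they are
    not enclosed between two foreground objects.
    """
    intervals: List[Interval] = []
    for row_index, row in enumerate(binary):
        last_foreground = None  # column of the most recent foreground pixel
        for column, pixel in enumerate(row):
            if pixel != 1:
                continue
            if last_foreground is not None and column - last_foreground > 1:
                gap_start = last_foreground + 1
                gap_length = column - last_foreground - 1
                intervals.append((row_index, gap_start, gap_length))
            last_foreground = column
    return intervals


# Hypothetical single-row image: three foreground objects separated by two voids.
row_image = [[0, 1, 1, 0, 0, 1, 0, 1, 0]]
features = void_intervals(row_image)  # [(0, 3, 2), (0, 6, 1)]
```

The resulting list is one simple form of a "structured sequence of interval-based features"; a two-dimensional variant would more likely label connected background regions instead of scanning row by row.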

Description

FIELD OF THE INVENTION

The present invention relates to the general field of OCR and computer vision technology and has several technical applications arising from a new use of the binarization mapping (under various OCR stages) of the bitmap code for any object (text, images, and video).

SUMMARY OF THE INVENTION

The present invention relates to a software-based system developed to make Optical Character Recognition (OCR) faster and more efficient in terms of processing and memory use. It introduces a new reversed binarization method, related to thinning or skeletonization, which enhances the way images and symbols are analyzed. This method allows the system to process both the foreground (the characters or objects) and the background (the surrounding space) through precise pixel mapping. It can also use multiple layers to combine other advanced technologies, such as cryptography, cellular automata, and machine learning techniques based on language and semantics. The system combines enhanced image pre-processing, machine-learning-based recognition, and context-aware post-processing to improve overall performance and accuracy. As a result, it enables more intelligent image analysis, better integration with AI and cryptographic methods, and improved performance in OCR applications involving text, images, and even video.

BACKGROUND OF THE INVENTION

A substantial body of academic work is relevant to the technology. An illustrative short list follows:
- "Document Image Binarization: A Comprehensive Survey". Author: K. N. Otsu. Publication: IEEE Transactions on Pattern Analysis and Machine Intelligence (1979);
- "Handwritten Digit Recognition with a Back-Propagation Network". Author: Yann LeCun et al. Publication: Advances in Neural Information Processing Systems (1989);
- "Convolutional Networks for Images, Speech, and Time Series". Author: Yann LeCun, Yoshua Bengio. Publication: MIT Press (1995);
- "Bidirectional Long Short-Term Memory Networks for Machine Reading". Author: Alex Graves, Jürgen Schmidhuber. Publication: Advances in Neural Information Processing Systems (2005);
- "Tesseract: An Open-Source Optical Character Recognition Engine". Author: Ray Smith. Publication: Google Research (2007).

All of the above, and similar works, are fundamental in having presented either theoretical advances or related surveys (artificial neural networks, including convolutional ones; different template and line matching; types and uses of memory; the original binarization methodology; computer vision and pattern recognition; more advanced pattern analysis under machine intelligence in deep learning text recognition, etc.).

Several previous patent documents that relate to the field of OCR technology are listed herewith:
- DE Patent 507 041, filed by Gustav Tauschek (1929), is directed to an early OCR machine;
- US Patent 2,663,758, filed by David Hammond Shepard in 1951, and US Patent 2,933,246, filed by Jacob Rabinow, address the general outlines of Character Recognition (CR) by optical means (from mechanical analog to electronic digital standpoints);
- US Patent 6,199,042, filed by Kurzweil in 2001, discloses a technology catalyst for sub-niches of different technical explorations and enablers (neural networks, recognition techniques for handwritten characters, general-purpose improving engines and techniques, OCR accuracy improvement by the use of multiple classifiers, including for the most challenging documents, diversification of specialized applications, adaptive processing, use of noise-tolerant n-gram models, handling of multiple languages, machine learning techniques, etc.).
The first OCR technical solutions, forebears of modern OCR technology, such as Gustav Tauschek's "Reading Machine" (1930s), were mechanical-analog electric conductivity ray controllers, wherein the apparatus and the provision of light positively trapped each other at the indicative "cells" for each character. These were opposite to the lens, after the carriers' placement, and from there, on the other side of the lens, a photo-electric cell was illuminated from a source of light, in front of which a wheel was moving (with stencil recesses bearing numerals or other characters, different from the blades of the wheel, which served as control means to open or close). Next in importance, D. H. Shepard's first "Apparatus for Reading" (1950s), an optical system for sensing/scanning particular characters, whether printed or formed by punched openings, also relied on light reflected through a lens system, which formed the proper image of the character on the surface of a so-called rotating mask (with sectors centrally disposed one at a time, or alternatively a travelling belt), duly endowed with openings to a second lens system, all focused onto a photoelectric cell or the like. It presumed a sort of "hitting" on at least a portion of each character, relative to a selected arrangement (in the radial angular direction) of openings of the mask. This was exemp