US-20260127907-A1 - SYSTEM AND METHOD FOR DETECTING AND ASSOCIATING ELEMENTS IN AN IMAGE
Abstract
A system for detecting and associating elements in an image detects a plurality of text tokens in a query image. A first element is determined based on an entry point object list and the plurality of text tokens, wherein the entry point object list comprises text objects or template shapes present in the query image. A plurality of regions of interest (ROIs) are determined around the first element in the query image, and a plurality of ROI images are created from the query image based on the plurality of ROIs. A potential second element present in the plurality of ROI images is determined, and a confidence score is generated for each of the potential second elements. The potential second elements are filtered based on the confidence score and a predetermined threshold to determine a second element. Subsequently, the second element is associated with the first element as a single component.
Inventors
- Yogananda Ganesh Kashyap Ramaprasad
- Srirama R Nakshathri
- Pratyusha Rasamsetty
- Deepak Kumar
Assignees
- Infrrd Inc
Dates
- Publication Date
- 20260507
- Application Date
- 20241107
Claims (13)
- 1 . A system for detecting and associating elements in an image, the system comprising one or more processors configured to: detect a plurality of text tokens in a query image; determine a first element based on an entry point object list and the plurality of text tokens, wherein the entry point object list comprises text objects or template shapes present in the query image; determine a plurality of regions of interest (ROIs) around the first element in the query image; create a plurality of ROI images from the query image based on the plurality of ROIs; determine a potential second element present in the plurality of ROI images; generate a confidence score for each of the potential second elements; filter the potential second elements based on the confidence score and a predetermined threshold to determine a second element; and associate the second element with the first element as a single component.
- 2 . The system according to claim 1 , wherein the one or more processors are configured to perform optical character recognition (OCR) on the query image to detect the plurality of text tokens, wherein the plurality of text tokens comprise detected text and associated location coordinates.
- 3 . The system according to claim 2 , wherein the one or more processors are configured to: perform text matching between a first text from the entry point object list and the plurality of text tokens to determine partial matches and exact matches; perform named entity recognition (NER) to obtain NER predictions; aggregate and determine matched text tokens based on the exact matches, the partial matches and the NER predictions; determine location coordinates of each of the matched text tokens; and store, as a first element, each matched text token with associated location coordinates.
- 4 . The system according to claim 3 , wherein the one or more processors are configured to: create a bounding box around the detected text; store the location coordinates of the detected text along with location coordinates associated with the bounding box; and perform NER based on the location coordinates associated with the bounding box.
- 5 . The system according to claim 1 , wherein the one or more processors are configured to: generate a contour around a first template shape from the entry point object list; fit a first polygon on the generated contour; identify template shapes in the query image and generate a contour around each identified template shape; determine a second polygon based on the generated contour around each identified template shape; match the first polygon with the second polygon, wherein the second polygon is associated with the identified template shape; determine location coordinates corresponding to each matched template shape; overlay text tokens from the plurality of text tokens present within a predefined region based on the location coordinates corresponding to each matched template shape; and store the matched template shape, corresponding location coordinates, and the overlaid text tokens as a first element.
- 6 . The system according to claim 5 , wherein the one or more processors are configured to: determine internal angles of the first polygon; determine internal angles of the second polygon associated with each template shape; and match the identified template shapes with the first template shape based on the internal angles of the first polygon and the internal angles of the second polygon.
- 7 . The system according to claim 6 , wherein the one or more processors are configured to: determine a scale associated with the query image; and estimate dimensions of the matched template shapes based on the location coordinates corresponding to each matched template shape and the determined scale associated with the query image.
- 8 . The system according to claim 1 , wherein the one or more processors are configured to determine at least two regions of interest (ROIs) of varying sizes around the first element in the query image.
- 9 . The system according to claim 1 , wherein the one or more processors are configured to generate the confidence score based on proximity of the potential second element and the first element.
- 10 . The system according to claim 1 , wherein the one or more processors are configured to determine at least one second element from the plurality of potential second elements, wherein the confidence score associated with the second element is higher than the predetermined threshold.
- 11 . A method for detecting and associating elements in an image, the method executed by one or more processors comprising the steps of: detecting, by an optical character recognition (OCR) module, a plurality of text tokens in a query image; determining, by a detection module, a first element based on an entry point object list and the plurality of text tokens, wherein the entry point object list comprises text objects or template shapes present in the query image; determining, by a region of interest (ROI) module, a plurality of regions of interest (ROIs) around the first element in the query image; creating, by an image creator module, a plurality of ROI images from the query image based on the plurality of ROIs; determining, by the detection module, a potential second element present in the plurality of ROI images; generating, by a confidence score module, a confidence score for each of the potential second elements; filtering, by the confidence score module, the potential second elements based on the confidence score and a predetermined threshold to determine a second element; and associating, by an association module, the determined second element with the first element as a single component.
- 12 . The method according to claim 11 , wherein the detection module is configured to execute the steps of: performing text matching between a first text from the entry point object list and the plurality of text tokens to determine partial matches and exact matches; performing named entity recognition (NER) to obtain NER predictions; aggregating and determining matched text tokens based on the exact matches, the partial matches and the NER predictions; determining location coordinates of each of the matched text tokens; and storing each matched text token with associated location coordinates as a first element.
- 13 . The method according to claim 11 , wherein the detection module is configured to execute the steps of: generating a contour around a first template shape from the entry point object list; fitting a first polygon on the generated contour; identifying template shapes in the query image and generating contours around the identified template shapes; determining a second polygon based on the generated contours around each of the identified template shapes; matching the first polygon with the second polygon, wherein the second polygon is associated with the identified template shape; determining location coordinates corresponding to each matched template shape; overlaying text tokens from the plurality of text tokens present within a predefined region based on the location coordinates corresponding to each matched template shape; and storing the matched template shape, corresponding location coordinates, and the overlaid text tokens as a first element.
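To illustrate the first-element detection recited in claims 3 and 12, the following is a minimal Python sketch of matching an entry-point text against OCR tokens by exact and partial matching. The token structure, the containment-based partial-match rule, and all names here are illustrative assumptions, not taken from the publication; the NER aggregation step is noted but omitted.

```python
from dataclasses import dataclass

@dataclass
class TextToken:
    text: str
    bbox: tuple  # (x1, y1, x2, y2) location coordinates from OCR

def match_entry_text(entry_text, tokens):
    """Return (token, kind) pairs that exactly or partially match the entry-point text."""
    matched = []
    needle = entry_text.lower()
    for tok in tokens:
        hay = tok.text.lower()
        if hay == needle:                      # exact match
            matched.append((tok, "exact"))
        elif needle in hay or hay in needle:   # partial match (one string contains the other)
            matched.append((tok, "partial"))
    # In the claimed system, NER predictions would be aggregated here as a
    # third match source before storing each match, with its coordinates,
    # as a first element.
    return matched

# Hypothetical OCR output for a P&ID fragment:
tokens = [TextToken("PUMP-101", (10, 10, 60, 20)),
          TextToken("PUMP", (80, 10, 110, 20)),
          TextToken("VALVE-7", (10, 40, 60, 50))]
first_elements = match_entry_text("PUMP-101", tokens)
```

Each matched token keeps its bounding box, so the downstream ROI step can be seeded directly from `first_elements`.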
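The internal-angle comparison of claims 5 and 6 can likewise be sketched. This is a hedged illustration under the assumption that polygons are given as vertex lists and that "matching" means comparing sorted internal angles within a tolerance; the function names and the 10-degree tolerance are invented for the example.

```python
import math

def internal_angles(polygon):
    """Internal angles (degrees) of a simple polygon given as (x, y) vertices in order."""
    n = len(polygon)
    angles = []
    for i in range(n):
        x0, y0 = polygon[i - 1]          # previous vertex
        x1, y1 = polygon[i]              # current vertex
        x2, y2 = polygon[(i + 1) % n]    # next vertex
        a1 = math.atan2(y0 - y1, x0 - x1)
        a2 = math.atan2(y2 - y1, x2 - x1)
        ang = math.degrees(a2 - a1) % 360.0
        angles.append(min(ang, 360.0 - ang))  # take the interior angle
    return angles

def shapes_match(poly_a, poly_b, tol_deg=10.0):
    """Match two polygons by comparing their sorted internal angles (claim 6)."""
    if len(poly_a) != len(poly_b):
        return False
    return all(abs(a - b) <= tol_deg
               for a, b in zip(sorted(internal_angles(poly_a)),
                               sorted(internal_angles(poly_b))))

square = [(0, 0), (2, 0), (2, 2), (0, 2)]
rect = [(0, 0), (4, 0), (4, 1), (0, 1)]
tri = [(0, 0), (3, 0), (0, 3)]
```

Angle-based matching is scale-invariant, which is why a square template still matches a rectangle symbol here; in practice (per claim 7) the diagram scale would then be used to estimate the matched shape's real dimensions.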
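Finally, the ROI creation, proximity-based confidence scoring, and threshold filtering of claims 1 and 8 through 10 can be sketched as follows. The two ROI scale factors, the inverse-distance score, and the threshold value are all assumptions chosen for illustration, not values from the publication.

```python
def make_rois(bbox, scales=(1.5, 2.5)):
    """Create at least two ROIs of varying sizes centered on the first element (claim 8)."""
    x1, y1, x2, y2 = bbox
    cx, cy = (x1 + x2) / 2, (y1 + y2) / 2
    w, h = x2 - x1, y2 - y1
    rois = []
    for s in scales:
        hw, hh = w * s / 2, h * s / 2
        rois.append((cx - hw, cy - hh, cx + hw, cy + hh))
    return rois

def proximity_score(first_bbox, cand_bbox):
    """Confidence from proximity of a candidate second element to the first element (claim 9)."""
    fx = (first_bbox[0] + first_bbox[2]) / 2
    fy = (first_bbox[1] + first_bbox[3]) / 2
    cx = (cand_bbox[0] + cand_bbox[2]) / 2
    cy = (cand_bbox[1] + cand_bbox[3]) / 2
    dist = ((fx - cx) ** 2 + (fy - cy) ** 2) ** 0.5
    return 1.0 / (1.0 + dist)  # closer candidates score higher

def associate(first_bbox, candidates, threshold=0.05):
    """Keep candidates whose confidence exceeds the predetermined threshold (claim 10)."""
    scored = [(c, proximity_score(first_bbox, c)) for c in candidates]
    return [(c, s) for c, s in scored if s > threshold]
```

A surviving candidate would then be stored with the first element as a single component, per the final association step of claims 1 and 11.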
Description
BACKGROUND

Field

The disclosed technology relates to image processing, specifically to the detection and association of elements within piping and instrumentation diagrams, to enhance the efficiency of retrieving relevant information for end users.

Description of the Related Art

Piping and instrumentation diagrams (P&IDs) are widely used in engineering and industrial settings to represent the arrangement of piping, equipment, and instrumentation within a system. These diagrams are crucial for design, maintenance, and operational tasks. However, the complexity of these diagrams, which often include numerous interconnected elements (or components), may make it challenging for users to quickly locate and associate specific components.

The need to efficiently detect and relate elements in P&IDs is well recognized. Traditional methods involve manual inspection or the use of basic software tools that allow for limited search and identification of components. These approaches are often time-consuming and prone to error, particularly when dealing with large-scale or intricate diagrams where visual elements are dispersed across the document.

Existing techniques for element detection in P&IDs typically focus on textual elements using optical character recognition (OCR). While effective for text, these methods fall short when it comes to visual elements such as symbols or icons that represent various components. Furthermore, these methods do not adequately address the need to associate related elements that are spatially or contextually linked but positioned at different locations within the diagram.

In view of the foregoing, there is a growing demand for an improved system that can not only detect visual elements within P&IDs but also establish relationships between them. Such a system would enhance the usability of these diagrams, allowing for faster and more accurate retrieval of relevant information, thereby improving efficiency in engineering and industrial workflows.
SUMMARY

A system for detecting and associating elements in an image is disclosed. The system, comprising one or more processors, may be configured to detect a plurality of text tokens in a query image. A first element may be determined based on an entry point object list and the plurality of text tokens, wherein the entry point object list may comprise text objects or template shapes present in the query image. A plurality of regions of interest (ROIs) may be determined around the first element in the query image, and a plurality of ROI images may be created from the query image based on the plurality of ROIs. Further, a potential second element present in the plurality of ROI images may be determined, and a confidence score may be generated for each of the potential second elements. Subsequently, the potential second elements may be filtered based on the confidence score and a predetermined threshold to determine a second element, and the second element may be associated with the first element as a single component.

BRIEF DESCRIPTION OF DRAWINGS

The patent or application file contains at least one drawing executed in color. Copies of this patent or patent application publication with color drawing(s) will be provided by the Office upon request and payment of the necessary fee. Embodiments are illustrated by way of example and not limitation in the figures of the accompanying drawings, in which like references indicate similar elements and in which:

FIG. 1A illustrates a system 100 enabling detection and association of elements present in piping and instrumentation diagrams (P&IDs), in accordance with an embodiment.

FIG. 1B illustrates a pipe mapping module 110, in accordance with an embodiment.

FIG. 2 illustrates a flowchart 200 for a method enabling the detection and association of elements within P&IDs, in accordance with an embodiment.

FIG. 3 illustrates an entry point object list 300, in accordance with an example embodiment.

FIG. 4A illustrates a flowchart 400A for identification of a first element based on a first text object among a plurality of text objects (302, 304), in accordance with an embodiment.

FIG. 4B illustrates a flowchart 400B for identification of the first element based on a first template shape among a plurality of template shapes (306, 308), in accordance with an embodiment.

FIG. 5 illustrates a flowchart 500 to perform visual object segmentation (VOS) on a plurality of ROI images, in accordance with an embodiment.

FIG. 6 depicts a query image 106, in accordance with an example embodiment.

FIG. 7 depicts a portion 700 of the query image 106 with each instance of the detected text object 302, in accordance with the example embodiment.

FIG. 8 depicts a portion 800 of the query image 106 with each instance of the detected template shape 306, in accordance with the example embodiment.

FIG. 9A depicts an ROI image 900A created based on a small ROI created around the text object 302, in accordance with the example embodiment.

FIG. 9B depict