
US-12619638-B2 - System for verifying ambiguous entities in documents


Abstract

A system for verifying ambiguous entities in documents. The system comprises a processor configured to tag at least one entity in at least one page of a document, create an entity relationship among the tagged entities of each of the at least one page, receive a question from a user for the document, generate at least one corresponding answer for said question, generate an embedded question and an embedded corresponding answer for said question and each of the at least one corresponding answer, respectively, retrieve k-similar answers with associated labels for each of the at least one corresponding embedded answer for said question, and validate each of the corresponding at least one embedded answer if the labels associated with the embedded question, the corresponding at least one embedded answer, and each of the k-similar answers match.
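The retrieval and validation steps described above can be illustrated with a minimal sketch. It is not the patented implementation: the `Labeled` record, the cosine similarity measure, the in-memory list standing in for the vector database, and the label names are all assumptions made for illustration. Only the rule itself comes from the abstract: an answer is validated when the labels of the embedded question, the embedded answer, and every one of the k-similar answers match.

```python
import math
from dataclasses import dataclass


@dataclass
class Labeled:
    """Hypothetical record: an embedding vector plus its entity label."""
    vector: list[float]
    label: str


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity (an assumed metric; the source does not name one)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def k_similar(answer: Labeled, store: list[Labeled], k: int) -> list[tuple[float, Labeled]]:
    """Return the k stored answers most similar to `answer`, with their scores."""
    scored = [(cosine(answer.vector, s.vector), s) for s in store]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


def validate(question: Labeled, answer: Labeled, neighbours: list[tuple[float, Labeled]]) -> bool:
    """Validate only if question, answer, and all k-similar answers share one label."""
    labels = {question.label, answer.label} | {s.label for _, s in neighbours}
    return len(labels) == 1


# Toy stand-in for the vector database (labels are invented examples).
store = [
    Labeled([1.0, 0.0], "invoice_date"),
    Labeled([0.9, 0.1], "invoice_date"),
    Labeled([0.0, 1.0], "due_date"),
]
question = Labeled([0.95, 0.05], "invoice_date")
answer = Labeled([1.0, 0.05], "invoice_date")

print(validate(question, answer, k_similar(answer, store, k=2)))  # labels all agree
print(validate(question, answer, k_similar(answer, store, k=3)))  # one neighbour differs
```

With k=2 both retrieved neighbours carry the same label as the question and the answer, so the answer is validated; with k=3 the third neighbour carries a different label and validation fails.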

Inventors

  • Jianglong He
  • Deepak Kumar

Assignees

  • Infrrd Inc

Dates

Publication Date
2026-05-05
Application Date
2024-01-04

Claims (8)

  1. A system for verifying ambiguous entities in documents, the system comprising: one or more processors configured to: tag at least one entity in at least one page of a document, wherein: the document is received by a server; and tagging associates a label with each of the entities; create an entity relationship among the tagged entities of each of the at least one page; receive a question from a user for the document; generate at least one corresponding answer for said question; generate an embedded question and an embedded corresponding answer for said question and each of the at least one corresponding answer, respectively; retrieve k-similar answers with associated labels for each of the at least one corresponding embedded answer for said question; and validate each of the corresponding at least one embedded answer if the labels associated with the embedded question, the corresponding at least one embedded answer, and each of the k-similar answers match; an Ambiguous Entity Verifier (AEV) module, wherein the AEV module comprises of: an input module configured to receive the document; a tagging and relationship module configured to: label each of the entities on each of the pages of the document; and create an entity relationship among the tagged entities; a hierarchical database comprises a vector database, wherein the vector database is configured to store plurality of vector embedding's with associated labels, wherein the hierarchical database is configured to store the plurality of vector embedding's and k-similar answers in a hierarchical manner; a decision module configured to receive at least one validated answer and determine a verified answer, wherein the decision module comprises: a context generator module configured to generate a context associated with said question; a pooling module configured to determine a pool output based on the similarity scores associated with each of the k-similar answers associated with each of the corresponding embedded answer; and a Language Model (LM) decoder configured to: receive the at least one validated answer with a confidence score and the context associated with said question; and determine the verified answer, wherein the verified answer is among the at least one validated answer; and an output module configured to output the verified answer chosen by the decision module; and a Verification Retriever Module (VRM), wherein the VRM comprises: a Question and Answer (QA) module configured to: direct the question to each of the at least one page of the document; and generate at least one corresponding answer along with the confidence score for said question; a Language Model (LM) embedding module that is configured to generate the embedded question and the embedded corresponding answer for said question and each of the at least one corresponding answer, respectively; a similar answer (SA) retriever module configured to retrieve the k-similar answers from the hierarchical database; a similarity score module configured to generate a similarity score for each of the retrieved k-similar answers; and a validation module configured to validate the corresponding embedded answer if the generated labels of said embedded question, each of the corresponding at least one embedded answer, and each of the k-similar answers match.
  2. The system according to claim 1, wherein the system is configured to receive a digital document, wherein: at least one page of the document is converted to an image; and an optical character recognition is performed to extract text from said image.
  3. The system according to claim 1, wherein the pool output is a maximum pool value, wherein the maximum pool value is the similarity score associated with one similar answer among the k-similar answers, wherein the said similar answer is closest to the corresponding embedded answer.
  4. The system according to claim 1, wherein the pool output is a mean pool value, wherein the mean pool value is the mean of similarity scores associated with k-similar answers.
  5. The system according to claim 1, wherein the hierarchical database is configured to: store labels associated with the embedded question and each of the corresponding embedded answers in hierarchical manner; and store context associated with the question in hierarchical manner.
  6. The system according to claim 5, wherein the hierarchical database is configured to: store all entity labels at a base level; store grouped entity labels at an advanced level; store entity label associated with the question at a question level; and store the context at a context level, wherein the context comprises of vicinity, text and additional information associated with the entity.
  7. A method for verifying ambiguous entities in documents, the method comprising the steps of: receiving, by an input module, a document with at least a first page; tagging, by a tagging and relationship module, each entity present on the first page in the document; creating, by the tagging and relationship module, a relationship among the tagged entities present on the first page; receiving from a user, by a Question and Answer (QA) module, a first question; generating, by the Question and Answer (QA) module, a first corresponding answer along with a first confidence score to said first question; generating, by a Language Model (LM) embedding module, a first embedded question and a first corresponding embedded answer for said first question and the first corresponding answer, respectively; retrieving, by a Similar Answer (SA) retriever module, first k-similar answers for the first corresponding embedded answer, wherein the SA module is configured to: retrieve k-similar answers with associated labels from a hierarchical database; and receive similarity scores for each of the k-similar answers; validating, by a validation module, the first corresponding embedded answer if the generated labels for the first embedded question, the first corresponding embedded answer, and the first k-similar answers match; outputting, by the validation module, a first Verification retriever module (VRM) answer, wherein the first VRM answer comprises the first corresponding embedded answer along with the first confidence score and the first k-similar answers; receiving, by a pooling module the first VRM answer with the first answer and the k-similar answers associated with the first answer, wherein similarity scores are associated with the first k-similar answers; calculating, by the pooling module a first pool output for the similarity scores associated with the k-similar answers associated with the first VRM answer; receiving, by a Language Model (LM) decoder, the first VRM answer with the first pool output; receiving, by the LM decoder, a first context associated with said first question; and outputting, by an output module, a first verified answer, wherein the first verified answer is the first answer with the first confidence score.
  8. The method according to claim 7, wherein the method further comprising the steps of: receiving, by the pooling module: the first VRM answer with the first answer, the first confidence score, and the k-similar answers associated with the first answer, wherein similarity scores are associated with the first k-similar answers; a second VRM answer with a second answer, a second confidence score, and second k-similar answers associated with the second answer, wherein similarity scores are associated with the second k-similar answers; and a Nth VRM answer with a Nth answer, a Nth confidence score, and Nth k-similar answers associated with the Nth answer, wherein similarity scores are associated with the Nth k-similar answers; calculating, by the pooling module: the first pool output for the similarity scores associated with the first k-similar answers; a second pool output for the similarity scores associated with the second k-similar answers; and a Nth pool output for the similarity scores associated with the Nth k-similar answers; receiving, by the Language Model (LM) decoder, the first answer with the first pool output, the second answer with the second pool output, the Nth answer with the Nth pool output; receiving, by the LM decoder, at least one context associated with said first question; and outputting, by the output module, at least one verified answer, wherein the at least one verified answer is among the first answer with a first reconfigured confidence score, the second answer with a second reconfigured confidence score, and the Nth answer with a Nth reconfigured confidence score.
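The two pool outputs named in claims 3 and 4 can be sketched as follows. This is an illustrative reading, not the patented implementation: the function names, the candidate names, and the sample scores are assumptions, and the sketch further assumes that a higher similarity score means a closer answer, so the "closest" similar answer of claim 3 is the one with the highest score. How the LM decoder uses the pool output to arrive at a reconfigured confidence score is not specified in the claims and is not modelled here.

```python
from statistics import fmean


def max_pool(similarity_scores: list[float]) -> float:
    """Maximum pool value: the score of the closest of the k-similar answers."""
    if not similarity_scores:
        raise ValueError("at least one similarity score is required")
    return max(similarity_scores)


def mean_pool(similarity_scores: list[float]) -> float:
    """Mean pool value: the mean of the k similarity scores."""
    if not similarity_scores:
        raise ValueError("at least one similarity score is required")
    return fmean(similarity_scores)


# Hypothetical similarity scores for the k-similar answers of N = 2 candidate answers.
candidates = {
    "first answer": [0.91, 0.84, 0.77],
    "second answer": [0.62, 0.58, 0.60],
}

# One pool output per candidate, as in the per-answer pooling step of claim 8.
pool_outputs = {name: (max_pool(s), mean_pool(s)) for name, s in candidates.items()}
for name, (max_value, mean_value) in pool_outputs.items():
    print(f"{name}: max pool = {max_value:.2f}, mean pool = {mean_value:.2f}")
```

Max pooling reflects only the single best match, while mean pooling reflects how consistently all k retrieved answers resemble the candidate.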

Description

FIELD OF THE INVENTION

This application relates generally to the field of verifying ambiguous entities in a document and, more particularly, to verifying the ambiguity of the entities present in the document based on the context.

BRIEF STATEMENT OF THE PRIOR ART

The increasing volume of digital information and documents poses significant challenges in accurately extracting and verifying entities with ambiguous references. Traditional document processing systems often struggle to appropriately tag and understand ambiguous entities within the documents, leading to potential misinterpretations and errors.

Current document processing systems typically lack the sophistication needed to handle ambiguous entities effectively. Ambiguous entities are those that may have multiple interpretations or meanings based on the context within a document. Conventional systems may fail to accurately tag such entities and establish their relationships within the document, leading to challenges in later retrieval and validation processes.

Moreover, user queries about documents with ambiguous entities often yield suboptimal results from existing systems. The lack of context-awareness and advanced processing capabilities hampers the generation of relevant and accurate answers to user queries. In many cases, users may receive answers that do not fully consider the nuanced nature of ambiguous entities within the document.

The absence of a robust entity relationship mechanism in current systems further compounds the problem. Establishing connections and dependencies among tagged entities within a document is crucial for understanding the overall context and meaning. Existing systems often fall short in creating comprehensive entity relationships, limiting their ability to provide accurate and contextually relevant answers to user queries.

Due to the shortcomings in the currently available approaches, there is a need for an innovative Ambiguous Entity Verification System in Document Processing.

SUMMARY OF THE INVENTION

In an embodiment, a system for verifying ambiguous entities in documents is disclosed. The system comprises one or more processors configured to tag at least one entity in at least one page of a document, wherein tagging associates a label with each of the entities. Further, the processor is configured to create an entity relationship among the tagged entities of each of the at least one page, receive a question from a user for the document, generate at least one corresponding answer for said question, and generate an embedded question and an embedded corresponding answer for said question and each of the at least one corresponding answer, respectively. The processor is further configured to retrieve k-similar answers with associated labels for each of the at least one corresponding embedded answer for said question, and validate each of the at least one corresponding embedded answer if the labels associated with the embedded question, the corresponding at least one embedded answer, and each of the k-similar answers match.

BRIEF DESCRIPTION OF DRAWINGS

Embodiments are illustrated by way of example and not limitation in the figures of the accompanying drawings, in which like references indicate similar elements and in which:

FIG. 1 illustrates a system 100 for verifying ambiguous entities in documents, in accordance with an embodiment.

FIG. 2 illustrates the architecture of a Verification Retriever Module (VRM) 116, in accordance with an embodiment.

FIG. 3 illustrates the architecture of a decision module 118, in accordance with an embodiment.

FIG. 4 illustrates a flowchart 400 depicting the working of the VRM 116 in an Ambiguous Entity Verifier (AEV) module 110, in accordance with an embodiment.

FIG. 5 illustrates a flowchart 500 depicting the working of the decision module 118 in the AEV module 110, in accordance with an embodiment.

FIGS. 6A and 6B illustrate a flowchart 600 for verifying ambiguous entities in documents with N pages, in accordance with an embodiment.

FIGS. 7A and 7B illustrate a flowchart 700 for verifying ambiguous entities in documents, in accordance with an example embodiment.

DETAILED DESCRIPTION

The following detailed description includes references to the accompanying drawings, which form a part of the detailed description. The drawings show illustrations in accordance with example embodiments. These example embodiments, which may be herein also referred to as “examples”, are described in enough detail to enable those skilled in the art to practice the present subject matter. However, it may be apparent to one with ordinary skill in the art that the present invention may be practised without these specific details. In other instances, well-known methods, procedures and components have not been described in detail so as not to unnecessarily obscure aspects of the embodiments. The embodiments can be combined, other embodiments can be utilized, or structural, logical, and design changes can be made without departing