KR-20260062345-A - FACT VERIFICATION METHOD BASED ON DOMAIN ONTOLOGY AND APPARATUS THEREOF
Abstract
According to one embodiment of the present invention, a fact verification device comprises: a generation unit configured to generate sentence-triple pairs by searching for a triple that matches an input sentence among triples in an ontology; a calculation unit configured to calculate a relevance score between the input sentence and the triple using a language model based on the sentence-triple pairs; and a verification unit configured to verify the factual status of the input sentence using a triple selected from among the triples based on the relevance score, wherein the triples include a schema triple containing structural information between entities included in the triple and an instance triple containing semantic information of those entities, and the relevance score can be calculated for each of the schema triple and the instance triple.
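As a rough illustration of the data model in the abstract, the two kinds of triples might be represented as follows. This is a minimal sketch, not the patented implementation; the class, field names, and the defense-domain example values are assumptions for illustration only.

```python
from dataclasses import dataclass

# Hypothetical representation of the two triple kinds in the abstract:
# a schema triple carries structural (type-level) information between
# entities, an instance triple carries semantic (entity-level) information.
@dataclass(frozen=True)
class Triple:
    subject: str
    predicate: str
    obj: str
    kind: str  # "schema" or "instance"

# Illustrative examples (not taken from the patent's ontology).
schema = Triple("Aircraft", "hasEngine", "Engine", kind="schema")
instance = Triple("F-16", "hasEngine", "F110-GE-129", kind="instance")

def to_text(t: Triple) -> str:
    """Linearize a triple so it can later be paired with a sentence."""
    return f"{t.subject} {t.predicate} {t.obj}"
```

A relevance score would then be computed separately for the schema triple and the instance triple, as the abstract states.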
Inventors
- 이경호
- 염규환
- 황석주
- 김동현
- 이경화
Assignees
- Republic of Korea (Administrator of the Defense Acquisition Program Administration)
Dates
- Publication Date: 2026-05-07
- Application Date: 2024-10-29
Claims (11)
- A fact verification device comprising: a generation unit configured to generate sentence-triple pairs by searching for triples that match an input sentence among triples in an ontology; a calculation unit configured to calculate a relevance score between the input sentence and the triple using a language model based on the sentence-triple pairs; and a verification unit configured to verify the factual status of the input sentence using a triple selected from among the triples based on the relevance score, wherein the triples include schema triples containing structural information between entities included in the triples and instance triples containing semantic information of the entities, and the relevance score is calculated for each of the schema triples and the instance triples.
- The fact verification device of claim 1, wherein the generation unit generates the sentence-triple pair by inserting a special token between the input sentence and the triple, and the language model distinguishes the input sentence from the triple in the sentence-triple pair based on the special token.
- The fact verification device of claim 1, wherein the language model learns the relationship between sentences and triples based on factual triples extracted from factual sentences and non-factual triples extracted from non-factual sentences, and the non-factual triples are generated by changing the type or instance of an object included in the factual triples.
- The fact verification device of claim 1, wherein the verification unit selects a plurality of schema triples and instance triples from among the triples, in descending order of relevance to the input sentence based on the relevance score, and inputs the input sentence, the schema triples, and the instance triples into a fact verification model to verify the factual status of the input sentence.
- The fact verification device of claim 4, wherein the fact verification model classifies the input sentence as a factual sentence or a non-factual sentence based on an attention value between at least one pair of the embedding of the input sentence, the embeddings of the schema triples, and the embeddings of the instance triples.
- A method for a fact verification device to verify the factual status of an input sentence based on an ontology, the method comprising: generating sentence-triple pairs by searching for triples that match the input sentence among triples in the ontology; calculating a relevance score between the sentence and the triple constituting each sentence-triple pair using a language model based on the sentence-triple pairs; and verifying the factual status of the input sentence using a triple selected from among the triples based on the relevance score, wherein the triples include schema triples containing structural information between entities included in the triples and instance triples containing semantic information of the entities, and the relevance score is calculated for each of the schema triples and the instance triples.
- The method of claim 6, wherein the generating comprises generating the sentence-triple pairs by inserting a special token between the input sentence and the triple, and the language model distinguishes the input sentence from the triple in the sentence-triple pair based on the special token.
- The method of claim 6, further comprising, prior to the generating, training the language model based on factual triples extracted from factual sentences and non-factual triples extracted from non-factual sentences, wherein the non-factual triples are generated by changing the type or instance of an object included in the factual triples.
- The method of claim 6, wherein the verifying comprises: selecting a plurality of schema triples and instance triples from among the triples, in descending order of relevance to the input sentence based on the relevance scores; and inputting the input sentence, the schema triples, and the instance triples into a fact verification model to verify the factual status of the input sentence.
- The method of claim 9, wherein the fact verification model classifies the input sentence as a factual sentence or a non-factual sentence based on an attention value between at least one pair of the embedding of the input sentence, the embeddings of the schema triples, and the embeddings of the instance triples.
- A computer program stored on a computer-readable recording medium, the computer program comprising instructions that, when executed by a processor, cause the processor to perform a fact verification method comprising: generating sentence-triple pairs by searching for triples that match an input sentence among triples in an ontology; calculating a relevance score between the sentence and the triple constituting each sentence-triple pair using a language model based on the sentence-triple pairs; and verifying the factual status of the input sentence using a triple selected from among the triples based on the relevance score, wherein the triples include schema triples containing structural information between entities included in the triples and instance triples containing semantic information of the entities, and the relevance score is calculated for each of the schema triples and the instance triples.
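The pipeline described by the claims above can be sketched end to end as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the `[SEP]` token (claim 2 only says "a special token"), the token-overlap scoring function (which stands in for the trained language model of claim 1), and the top-k selection value (claim 4) are all assumptions.

```python
import random

SEP = "[SEP]"  # assumed special token separating the sentence and the triple

def make_pairs(sentence, triples):
    """Claims 1-2: build sentence-triple pairs with a special token so a
    language model can distinguish the two segments."""
    return [f"{sentence} {SEP} {' '.join(t)}" for t in triples]

def corrupt_triple(fact_triple, replacement_objects):
    """Claim 3: derive a non-factual training triple by changing the
    object of a factual triple."""
    s, p, o = fact_triple
    candidates = [r for r in replacement_objects if r != o]
    return (s, p, random.choice(candidates))

def relevance_score(pair):
    """Stand-in for the language-model relevance score; here simply the
    fraction of triple tokens that also appear in the sentence."""
    sentence, triple = pair.split(f" {SEP} ")
    a, b = set(sentence.lower().split()), set(triple.lower().split())
    return len(a & b) / max(len(b), 1)

def select_top_k(sentence, triples, k=2):
    """Claim 4: keep the k triples most relevant to the input sentence,
    in descending order of relevance score."""
    pairs = make_pairs(sentence, triples)
    ranked = sorted(zip(triples, pairs),
                    key=lambda tp: relevance_score(tp[1]), reverse=True)
    return [t for t, _ in ranked[:k]]
```

In the patent, the score comes from a trained language model, scoring is done separately for schema and instance triples, and the selected triples are fed with the sentence into a fact verification model (claims 4-5); the overlap score above merely stands in for that model to keep the sketch self-contained.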
Description
The present invention relates to a method and apparatus for verifying facts, and specifically, to a method and apparatus for verifying, based on an ontology, whether a statement in a specialized domain, such as the defense domain, is true.

To utilize natural language processing (NLP) models in specialized fields such as the defense domain, securing high-quality data containing expertise specific to that domain is essential. However, security is critical for defense domain data, and constructing the data for training NLP models requires a high level of expertise. Therefore, to build NLP models specialized for the defense domain, an ontology capable of effectively representing this expertise is required. However, existing general-purpose ontologies contain knowledge irrelevant to the defense domain and are thus unsuitable for domains that demand precise knowledge, such as defense.

Furthermore, maintaining reliability is crucial in domains that require accurate knowledge and information, such as the defense domain. To utilize natural language processing models in these specialized domains, securing a domain corpus is essential. However, domain corpora may contain incorrect information due to factors such as facts changing over time or human error. Moreover, when generative AI is used to acquire large amounts of data, these problems can be exacerbated by the AI failing to understand domain knowledge or by hallucinations. In specialized domains, even minor errors can cause significant problems, so incorrect data must be kept out of the training process. To this end, a process for verifying the authenticity of each domain corpus is required. This process can be divided into methods performed manually by humans and methods utilizing fact-checking models.
When performed manually, domain experts must directly inspect the domain corpus, which incurs substantial costs. While using fact-checking models is cost-effective, it requires additional data for training, and reliability is difficult to establish because the results cannot be interpreted.

FIG. 1 is a diagram showing a fact verification device according to one embodiment of the present invention. FIG. 2 is a diagram illustrating a fact verification process according to one embodiment of the present invention. FIG. 3 is a diagram showing an ontology applied to one embodiment of the present invention. FIG. 4 is a diagram illustrating the process of generating non-factual triples for fact verification according to one embodiment of the present invention. FIG. 5 is a diagram illustrating the process of calculating a relevance score according to one embodiment of the present invention. FIG. 6 is a diagram illustrating a fact verification method according to an embodiment of the present invention.

The advantages and features of the present invention and the methods for achieving them will become clear by referring to the embodiments described below in conjunction with the accompanying drawings. However, the present invention is not limited to the embodiments disclosed below and may be implemented in various different forms. These embodiments are provided merely to ensure that the disclosure of the present invention is complete and to fully inform those skilled in the art of the scope of the invention; the present invention is defined only by the scope of the claims. Accordingly, in some embodiments, well-known processing steps, device structures, and techniques are not described in detail, so as to avoid obscuring the present invention.
The terms used in this specification have been selected from currently widely used general terms in view of their functions in the present invention; however, these may vary depending on the intent of those skilled in the art, legal precedent, the emergence of new technologies, and so on. Additionally, in specific cases, terms have been arbitrarily selected by the applicant, and in such cases their meanings are described in detail in the relevant description of the invention. Therefore, the terms used in this specification should be interpreted not merely by their names but based on their meanings and the overall content of the present invention.

Throughout this specification, when a part is described as 'comprising' a certain component, this means that, unless specifically stated otherwise, it does not exclude other components but may include additional components. Additionally, in this specification, the components of the fact verification device may refer to software or hardware components such as a Field Programmable Gate Array (FPGA) or an Application Specific Integrated Circuit (ASIC), and perform at least one function or operation. However, the components are not limited to software or hardware. The components may be configured to reside in an addressable storage