KR-20260062346-A - ACTIVE LEARNING METHOD AND APPARATUS FOR IMPROVING KNOWLEDGE GRAPH ACCURACY
Abstract
According to one embodiment of the present invention, a knowledge graph improvement device comprises a calculation unit configured to calculate validity scores of triples within a knowledge graph based on an embedding model, a selection unit configured to select error triples by sampling the triples based on the validity scores, and a modification unit configured to update the knowledge graph by modifying the labels of the error triples, wherein the selection unit can sort the triples using the validity scores according to at least one of the relationships of the triples in the knowledge graph and clusters configured for the triples.
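The calculate, select, and modify flow summarized above can be illustrated with a minimal Python sketch. It is not the claimed implementation: the validity scores are supplied as precomputed numbers instead of coming from an embedding model, labels are modeled as booleans, and the names `select_error_candidates`, `apply_corrections`, and `per_group` are hypothetical. Grouping is done by relation only; grouping by cluster would follow the same pattern with a cluster id as the key.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]  # (head, relation, tail)


def select_error_candidates(
    scores: Dict[Triple, float], per_group: int
) -> List[Triple]:
    """Group triples by relation, sort each group by ascending validity
    score, and return the `per_group` lowest-scoring triples of each group."""
    groups: Dict[str, List[Triple]] = defaultdict(list)
    for triple in scores:
        groups[triple[1]].append(triple)

    selected: List[Triple] = []
    for relation in sorted(groups):  # fixed order for reproducible output
        ranked = sorted(groups[relation], key=lambda t: scores[t])
        selected.extend(ranked[:per_group])
    return selected


def apply_corrections(
    labels: Dict[Triple, bool], corrections: Dict[Triple, bool]
) -> Dict[Triple, bool]:
    """Return a copy of the label table with the corrected labels applied."""
    updated = dict(labels)
    updated.update(corrections)
    return updated


# Toy validity scores (assumed values, higher means more plausible).
scores = {
    ("tank_a", "armed_with", "cannon_x"): 0.91,
    ("tank_a", "armed_with", "sonar_y"): 0.12,
    ("unit_1", "operates", "tank_a"): 0.88,
    ("unit_1", "operates", "river_z"): 0.30,
}

candidates = select_error_candidates(scores, per_group=1)

# Every triple starts out labeled as correct; a reviewer (simulated here)
# marks both selected candidates as incorrect.
labels = {triple: True for triple in scores}
labels = apply_corrections(labels, {triple: False for triple in candidates})
```

Sorting within each relation, instead of over the whole graph at once, means every relation contributes candidates for review regardless of how its scores compare with those of other relations.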
Inventors
- 이경호
- 황석주
- 김동현
- 이경화
Assignees
- 대한민국(방위사업청장)
Dates
- Publication Date
- 20260507
- Application Date
- 20241029
Claims (11)
- A device for improving a knowledge graph, comprising: a calculation unit configured to calculate validity scores of triples within a knowledge graph based on an embedding model; a selection unit configured to select error triples by sampling the triples based on the validity scores; and a modification unit configured to update the knowledge graph by modifying the labels of the error triples, wherein the selection unit sorts the triples using the validity scores for at least one of the relationships of the triples in the knowledge graph and clusters formed for the triples.
- The device of claim 1, wherein the selection unit selects samples starting from the triple with the lowest validity score among the triples.
- The device of claim 1, wherein the calculation unit calculates the uncertainty of the model prediction based on the validity scores, and the selection unit selects samples starting from the triple with the highest uncertainty among the triples.
- The device of claim 1, wherein the selection unit clusters the triples based on embeddings of the triples obtained from the embedding model.
- The device of claim 1, wherein the embedding model is trained to minimize a binary cross-entropy loss based on the triples and negative triples generated from the triples.
- A method by which a knowledge graph improvement device improves the accuracy of a knowledge graph, the method comprising: calculating validity scores of triples within a knowledge graph based on an embedding model; selecting error triples by sampling the triples using the validity scores for at least one of the relationships of the triples in the knowledge graph and clusters configured for the triples; and updating the knowledge graph by modifying the labels of the error triples.
- The method of claim 6, wherein the selecting comprises selecting samples starting from the triple with the lowest validity score among the triples.
- The method of claim 6, wherein the calculating further comprises calculating the uncertainty of the model prediction based on the validity scores, and the selecting comprises selecting samples starting from the triple with the highest uncertainty among the triples.
- The method of claim 6, further comprising, prior to the selecting, clustering the triples based on embeddings of the triples obtained from the embedding model.
- The method of claim 6, wherein the embedding model is trained to minimize a binary cross-entropy loss based on the triples and negative triples generated from the triples.
- A computer program stored on a computer-readable recording medium, the computer program comprising instructions that, when executed by a processor, cause the processor to perform a method for improving the accuracy of a knowledge graph, the method comprising: calculating validity scores of triples within a knowledge graph based on an embedding model; selecting error triples by sampling the triples using the validity scores for at least one of the relationships of the triples in the knowledge graph and clusters configured for the triples; and updating the knowledge graph by modifying the labels of the error triples.
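The dependent claims mention two mechanisms without giving formulas: an uncertainty derived from the validity score, and a binary cross-entropy loss over triples and negative triples. The Python sketch below shows one way these could look, under stated assumptions: the validity score is treated as a probability in [0, 1], uncertainty is taken to be binary entropy, and negative triples are generated by replacing the tail entity. None of these choices is confirmed by the text, and all function names are hypothetical.

```python
import math
import random
from typing import Dict, List, Sequence, Set, Tuple

Triple = Tuple[str, str, str]  # (head, relation, tail)


def binary_entropy(p: float, eps: float = 1e-12) -> float:
    """Assumed uncertainty measure: largest when the score is near 0.5."""
    p = min(max(p, eps), 1.0 - eps)
    return -(p * math.log(p) + (1.0 - p) * math.log(1.0 - p))


def most_uncertain_first(scores: Dict[Triple, float]) -> List[Triple]:
    """Order triples so that the most uncertain prediction is sampled first."""
    return sorted(scores, key=lambda t: binary_entropy(scores[t]), reverse=True)


def corrupt_tail(
    triple: Triple, entities: Sequence[str], known: Set[Triple], rng: random.Random
) -> Triple:
    """Assumed negative sampling: swap the tail for an entity that does not
    produce a triple already present in the graph."""
    head, relation, tail = triple
    options = [
        e for e in entities if e != tail and (head, relation, e) not in known
    ]
    if not options:
        raise ValueError("no valid replacement entity for this triple")
    return (head, relation, rng.choice(options))


def bce_loss(
    pos_scores: Sequence[float], neg_scores: Sequence[float], eps: float = 1e-12
) -> float:
    """Mean binary cross-entropy with target 1 for triples taken from the
    graph and target 0 for generated negative triples."""
    terms = [-math.log(max(p, eps)) for p in pos_scores]
    terms += [-math.log(max(1.0 - n, eps)) for n in neg_scores]
    return sum(terms) / len(terms)


# Toy scores (assumed values).
scores = {
    ("unit_1", "operates", "tank_a"): 0.50,
    ("unit_1", "operates", "tank_b"): 0.90,
    ("unit_1", "operates", "tank_c"): 0.02,
}
ranked = most_uncertain_first(scores)

entities = ["tank_a", "tank_b", "tank_c", "river_z"]
negative = corrupt_tail(
    ("unit_1", "operates", "tank_a"), entities, set(scores), random.Random(0)
)
loss = bce_loss(pos_scores=[0.9], neg_scores=[0.1])
```

Under these assumptions, lowest-score sampling and highest-uncertainty sampling pick different triples: the first surfaces triples the model considers most likely wrong, while the second surfaces triples the model cannot decide on.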
Description
Active Learning Method and Apparatus for Improving Knowledge Graph Accuracy
The present invention relates to a knowledge graph, and more specifically, to an active learning method and apparatus for improving the accuracy of a knowledge graph.
A knowledge graph is a graph-structured database that represents relational knowledge in the form of triples (head, relation, tail), and it is widely used in various application fields to effectively manage and utilize domain knowledge. However, because the structured knowledge constituting the knowledge graph is extracted automatically from large-scale data sources, incorrect knowledge, specifically error triples, can occur. This leads to a problem where the overall accuracy of the knowledge graph is lowered due to these error triples.
To improve the accuracy of a knowledge graph, a process of identifying and properly correcting erroneous triples is required. Furthermore, the participation of domain knowledge experts is necessary to guarantee the reliability of these accuracy improvements. In particular, the involvement of domain experts is essential because knowledge in the defense and military domains requires a higher level of understanding compared to general domain knowledge. However, since knowledge graphs generally consist of hundreds of thousands to hundreds of millions of triples, it is practically impossible for domain knowledge experts to verify and correct every single triple due to scale constraints. To address this issue, a method is required to effectively identify and correct erroneous triples within large-scale knowledge graphs.
FIG. 1 is a drawing showing a knowledge graph improvement device according to one embodiment of the present invention.
FIG. 2 is a diagram illustrating a process for improving the accuracy of a knowledge graph according to an embodiment of the present invention.
FIG. 3 is a diagram showing the validity score used in one embodiment of the present invention.
FIG. 4 is a diagram showing the long-tail distribution of a knowledge graph used in one embodiment of the present invention.
FIG. 5 is a drawing showing the results of improving the accuracy of a knowledge graph according to one embodiment of the present invention.
FIG. 6 is a diagram illustrating a method for improving the accuracy of a knowledge graph according to an embodiment of the present invention.
The advantages and features of the present invention and the methods for achieving them will become clear by referring to the embodiments described below in conjunction with the accompanying drawings. However, the present invention is not limited to the embodiments disclosed below but may be implemented in various different forms. These embodiments are provided merely to ensure that the disclosure of the present invention is complete and to fully inform those skilled in the art of the scope of the invention, and the present invention is defined only by the scope of the claims. Accordingly, in some embodiments, well-known processing steps, well-known device structures, and well-known techniques are not specifically described to avoid the present invention being interpreted ambiguously.
The terms used in this specification have been selected from among currently widely used general terms while considering their functions in the present invention; however, these may vary depending on the intent of those skilled in the art, case law, the emergence of new technologies, etc. Additionally, in specific cases, terms have been arbitrarily selected by the applicant, and in such cases, their meanings will be described in detail in the relevant description of the invention. Therefore, the terms used in this specification should be interpreted not merely by their names, but based on their meanings and the overall content of the present invention.
Throughout this specification, when a part is described as 'comprising' a certain component, this means that, unless specifically stated otherwise, it does not exclude other components but may include additional components. Additionally, in this specification, the components of a knowledge graph improvement device may refer to software components or to hardware components such as a Field Programmable Gate Array (FPGA) or an Application Specific Integrated Circuit (ASIC), and perform at least one function or operation. However, the components are not limited to software or hardware. The components may be configured to reside in an addressable storage medium or configured to execute on one or more processors. Accordingly, by way of example, the components include software components, object-oriented software components, class components, and task components, as well as processes, functions, attributes, procedures, subroutines, segments of program code, drivers, firmware, microcode, circuits, data, databases, data structures, tables, arrays, and variables. The functions provided by the components of the present invention may be combined into a smaller