DE-102024210901-A1 - Device, computer-implemented method, and data structure for clustering entities of a knowledge graph

DE102024210901A1DE 102024210901 A1DE102024210901 A1DE 102024210901A1DE-102024210901-A1

Abstract

A device, a computer-implemented method, and a data structure for clustering entities of a knowledge graph, wherein the method comprises: providing a knowledge graph, the knowledge graph comprising a set of entities; determining a kernel, the kernel comprising the similarities between pairs of entities (x, y) of the set of entities; determining parameters and a clustering of entities of the set of entities with an objective function that depends on a composition of a similarity measure and a given heuristic and a function that measures the quality of the clustering, the similarity measure depending on the kernel and the parameters.

Inventors

Sergei Chubanov

Assignees

Robert Bosch Gesellschaft mit beschränkter Haftung

Dates

Publication Date: 20260513
Application Date: 20241113

Claims (12)

A computer-implemented method for clustering entities of a knowledge graph, characterized in that the method comprises: providing (202) the knowledge graph, wherein the knowledge graph comprises a set of entities, determining (204) a kernel, wherein the kernel comprises the similarities between pairs of entities (x, y) of the set of entities, determining (208) parameters and a clustering of entities of the set of entities with an objective function that depends on a composition of an auxiliary similarity measure and a given heuristic and a function that measures the quality of the clustering, wherein the auxiliary similarity measure depends on the kernel and the parameters.
Procedure according to Claim 1 , characterized in that the method comprises determining (208) the auxiliary similarity measure using a neural network designed to map the kernel and parameters to the auxiliary similarity measure.
Procedure according to one of the Claims 1 or 2 , characterized in that the method comprises determining (204) the similarity of at least one pair of entities (x, y) in the kernel as a function of the similarity of lexical information associated with a first entity in the knowledge graph and of lexical information associated with a second entity in the knowledge graph, wherein the first entity is located in a first path of the knowledge graph (G), wherein the second entity is located in a second path of the knowledge graph, wherein the first path begins at or ends at one of the entities (x) of the pair of entities (x, y), and wherein the second path begins at or ends at the other entity (y) of the pair of entities (x, y).
Procedure according to Claim 3 , characterized in that the method comprises determining (204) a respective weight associated with the pair of the first entity and the second entity depending on the similarity of the lexical information associated with the first entity and the lexical information associated with the second entity, wherein the first entity is associated with the entity at the beginning or end of the first path with the same sequences of predicates with which the second entity is associated with the entity at the beginning or end of the second path.
Method according to one of the preceding claims, characterized in that the method comprises: providing (202) the knowledge graph with the set of entities, which includes entities representing physical quantities and which are labelled with the name of the respective physical quantity, and determining (208) the clustering depending on the objective function.
Procedure according to Claim 5 , characterized in that the method comprises: receiving a name of a physical quantity, finding a cluster comprising an entity labelled with the name of the physical quantity, in clustering, selecting another entity of the cluster labelled with the name of another physical quantity, and measuring the other physical quantity.
Method according to one of the preceding claims, characterized in that the method comprises outputting (210) the clustering.
Device (100) for clustering entities of a knowledge graph, characterized in that the device (100) comprises at least one processor (102) and at least one memory (104), wherein the at least one memory (104) stores instructions which, when executed by the at least one processor (102), cause the device (100) to execute the method according to one of the preceding claims, wherein the at least one processor (102) is designed to execute the instructions.
Computer program, characterized in that the computer program includes instructions which, when executed by a computer, cause the computer to perform the procedure according to one of the Claims 1 until 7 to execute.
Data structure (600), characterized in that the data structure (600) comprises at least one data field (602) for storing a knowledge graph, wherein the knowledge graph is a set of entities ten comprises, wherein the data structure (600) includes at least one data field (602) for storing a kernel, wherein the kernel comprises the similarities between pairs of entities (x, y) of the set of entities, wherein the data structure (600) includes at least one data field (602) for storing a clustering of entities of the set of entities and parameters determined by an objective function which depends on a composition of a similarity measure and a given heuristic and a function which measures the quality of the clustering, wherein the similarity measure depends on the kernel and the parameters.
Data structure (600) according to Claim 10 , characterized in that the data structure (600) includes at least one data field (602) for storing a neural network designed to map the kernel and parameters to the similarity measure.
Data structure (600) according to Claim 10 or 11 , characterized in that the data structure (600) comprises at least one data field (602) for storing the similarity of at least one pair of entities (x, y) in the kernel, which is determined depending on a similarity of lexical information associated in the knowledge graph with a first entity and of lexical information associated in the knowledge graph with a second entity, wherein the first entity is located in a first path of the knowledge graph, wherein the second entity is located in a second path of the knowledge graph, wherein the first path begins at one of the entities (x) or ends at one of the entities (x) of the pair of entities (x, y), and wherein the second path begins at the other entity (y) or ends at the other entity (y) of the pair of entities (x, y).

Description

State of the art The invention relates to a device, a computer-implemented method and a data structure for clustering entities of a knowledge graph. Disclosure of the invention A computer-implemented method for clustering entities in a knowledge graph, characterized in that the method comprises: providing a knowledge graph, wherein the knowledge graph comprises a set of entities; determining a kernel, wherein the kernel comprises the similarities between pairs of entities in the set of entities; determining parameters and a clustering of entities in the set of entities with an objective function that depends on a combination of an auxiliary similarity measure, a given heuristic, and a function that measures the quality of the clustering, wherein the auxiliary similarity measure depends on the kernel and the parameters. The method finds a clustering using any given heuristic based on the auxiliary similarity measure of pairs of entities. The auxiliary similarity measure is based on the kernel and on configuration parameters that define weights of a neural network. The auxiliary similarity measure is configured via an optimization algorithm to improve the quality of the resulting clustering found by the heuristic with respect to the original kernel. Therefore, clusters of a size that is neither too small nor too large are found. The procedure involves determining the auxiliary similarity measure using a neural network designed to map the kernel and parameters to the auxiliary similarity measure. In addition to the original kernel, the neural network captures deeper path-based similarities between the entities. The neural network enables clustering to be computed in a reasonable amount of time and improves the result of the chosen heuristic compared to what the heuristic would return if the original kernel were used to measure similarities between entity pairs. The procedure involves determining the similarity of at least one pair of entities in the kernel based on the similarity of lexical information associated with a first entity in the knowledge graph and lexical information associated with a second entity in the knowledge graph. The first entity is located in a first path of the knowledge graph, and the second entity is located in a second path. The first path begins or ends at one of the entities in the pair, while the second path begins or ends at the other entity in the pair. This means that the similarity of the at least one pair is based on pairs of entities that are lexically similar. The lexical information associated with an entity in the knowledge graph might include, for example, the entity's name, broken down into words, and optionally its synonyms. In this way, synonyms—that is, contextually similar words—are correctly identified based on the similarity of the lexical information. The procedure may involve determining a respective weight associated with the pair of first entity and second entity, depending on the similarity of the lexical information associated with the first entity and the lexical information associated with the second entity, wherein the first entity is associated with the entity at the beginning or end of the first path with the same sequences of predicates with which the second entity is associated with the entity at the beginning or end of the second path. The procedure can involve providing the knowledge graph with the set of entities representing physical quantities, labeled with the name of the respective physical quantity, and determining the clustering based on the objective function. The procedure finds a reasonably good solution in a reasonable amount of time. Using the objective function for the entities labeled with the name of the respective physical quantity leads to higher-quality clustering in terms of intra-cluster connectivity for the physical quantities. The clusters can provide access to a link between physical quantities that are not connected by a path in the knowledge graph. The procedure may include receiving a name of a physical quantity, finding a cluster containing an entity labeled with the name of the physical quantity, selecting another entity of the cluster labeled with the name of a different physical quantity, and measuring the other physical quantity. The process may include outputting the clustering results. A device for clustering entities of a knowledge graph comprises at least one processor and at least one memory, wherein the at least one memory stores instructions which, when executed by the at least one processor, cause the device to execute the procedure, wherein the at least one processor is designed to execute the instructions. A computer program comprises instructions which, when executed by a computer, cause the computer to perform the procedure. A data structure comprises at least one data field for storing a knowledge graph, wherein the knowledge graph comprises a set of entities, wherein the data structure comprises at least o