CN-122020323-A - Multi-level classification method, system and medium for knowledge graph data

Abstract

The invention belongs to the technical field of knowledge graph data classification and discloses a multi-level classification method, system and medium for knowledge graph data. The method first preprocesses the original data to obtain standardized data, then establishes a multidimensional feature space comprising an entity dimension, a relation dimension and an attribute dimension, constructs a feature matrix, and processes it hierarchically. Next, a deep neural network classification model performs feature extraction and fusion, and a system of feature matrix optimization equations adaptively optimizes the features through steps including feature importance evaluation, feature correlation calculation and feature weight optimization. Finally, a final classification model is obtained through model training and optimization, achieving multi-level classification of the knowledge graph data and solving the prior-art technical problem of low classification accuracy caused by insufficient extraction of knowledge graph data features.

Inventors

HE KUN
ZHU JIANPING

Assignees

Huazhong University of Science and Technology (华中科技大学)

Dates

Publication Date: 2026-05-12
Application Date: 2026-04-14

Claims (10)

1. A multi-level classification method for knowledge graph data, characterized by comprising:
a training stage, comprising: acquiring entity dimension features, relation dimension features and attribute dimension features from knowledge graph data to construct a corresponding entity feature submatrix, relation feature submatrix and attribute feature submatrix; organizing the entity feature submatrix, the relation feature submatrix and the attribute feature submatrix into a knowledge graph feature matrix; performing matrix decomposition on the knowledge graph feature matrix, the relation feature submatrix and the attribute feature submatrix respectively, to obtain correspondingly a first-layer feature set representing basic features of entity nodes in the knowledge graph, a second-layer feature set representing relation features among entity nodes in the knowledge graph, and a third-layer feature set representing hierarchical structure features of the knowledge graph; and training a deep neural network classification model with the features in the first-layer, second-layer and third-layer feature sets as training samples, the trained deep neural network classification model being obtained when the loss converges, wherein the deep neural network classification model comprises a feature extraction layer, a feature fusion layer and a classification layer, the feature extraction layer extracts representations of the features in the first-layer, second-layer and third-layer feature sets to obtain a first-layer feature vector, a second-layer feature vector and a third-layer feature vector respectively, the feature fusion layer fuses these feature vectors, and the classification layer outputs a classification result; and
an application stage, comprising: acquiring the first-layer feature set, second-layer feature set and third-layer feature set of the knowledge graph data to be classified, and inputting them into the trained deep neural network classification model to obtain a classification result.
2. The multi-level classification method for knowledge graph data according to claim 1, wherein the feature fusion layer adaptively fuses the first-layer feature vector, the second-layer feature vector and the third-layer feature vector using dynamic feature weight coefficients, which are calculated as follows: computing a weighted sum of the usage frequency, feature coverage and feature discrimination of each feature in the current feature vector to obtain an importance score for that feature; computing a weighted sum of the cosine similarity, mutual information value, Pearson correlation coefficient and distance metric value between any two features in the current feature vector to obtain a correlation score between those two features; and computing a weighted sum of the importance score, the correlation score, the historical classification accuracy and the current loss value of the model for each feature in the current feature vector to obtain the dynamic feature weight coefficient of that feature; wherein the current feature vector is the first-layer feature vector, the second-layer feature vector or the third-layer feature vector.
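The three weighted sums recited in claim 2 can be illustrated with a minimal Python sketch. The function names, the default weight values and the sum-to-one check are assumptions made for illustration and are not taken from the patent. The claim also does not say how pairwise correlation scores are reduced to one value per feature, so `dynamic_weight` simply accepts an already aggregated correlation value.

```python
import math


def weighted_sum(values, weights):
    """Weighted sum of two equally long sequences.

    Assumption: the weights sum to 1 (the constraint in the source is not legible).
    """
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    if not math.isclose(sum(weights), 1.0):
        raise ValueError("weights are assumed to sum to 1")
    return sum(v * w for v, w in zip(values, weights))


def importance_score(frequency, coverage, discrimination, weights=(0.4, 0.3, 0.3)):
    """Importance of one feature: usage frequency, coverage and discrimination."""
    return weighted_sum((frequency, coverage, discrimination), weights)


def correlation_score(cos_sim, mutual_info, pearson, distance, weights=(0.25,) * 4):
    """Correlation between two features from four pairwise measures."""
    return weighted_sum((cos_sim, mutual_info, pearson, distance), weights)


def dynamic_weight(importance, correlation, hist_accuracy, loss,
                   weights=(0.3, 0.2, 0.3, 0.2)):
    """Dynamic weight of one feature from its scores and the model state."""
    return weighted_sum((importance, correlation, hist_accuracy, loss), weights)


# Hypothetical input values, chosen only to show the call sequence.
imp = importance_score(frequency=0.8, coverage=0.6, discrimination=0.9)
cor = correlation_score(cos_sim=0.7, mutual_info=0.2, pearson=0.5, distance=0.3)
w = dynamic_weight(imp, cor, hist_accuracy=0.92, loss=0.15)
```

In this sketch every input is assumed to be pre-scaled to a comparable range. The claim does not state this, but without it the weighted sums would be dominated by whichever term has the largest magnitude.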
3. The multi-level classification method for knowledge graph data according to claim 2, wherein:
the importance score is calculated by a formula [formula not reproduced in the source text] whose terms are: the usage frequency of the feature; the feature coverage; the feature discrimination; the corresponding weight coefficients, which satisfy a constraint [not reproduced]; and an importance evaluation error term;
the correlation score is calculated by a formula [formula not reproduced in the source text] whose terms are: the cosine similarity, expressed through the angle between the feature vectors; the mutual information value; the Pearson correlation coefficient; the distance metric value; the corresponding weight coefficients, which satisfy a constraint [not reproduced]; and a correlation calculation error term; and
the dynamic feature weight coefficient is calculated by a formula [formula not reproduced in the source text] whose terms are: the historical classification accuracy; the current loss value of the model; the weight coefficients of the respective terms, which satisfy a constraint [not reproduced]; and a weight optimization error term.
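The formulas of claim 3 do not survive in this text. One reading that is consistent with the weighted sums described in claim 2 is sketched below. All symbols are assumptions introduced here: $f_i$, $c_i$, $d_i$ for usage frequency, coverage and discrimination; $\theta_{ij}$, $\mathrm{MI}_{ij}$, $\rho_{ij}$, $D_{ij}$ for the pairwise angle, mutual information, Pearson coefficient and distance; $A_i$ and $L$ for historical accuracy and current loss; $R_i$ for a per-feature aggregate of the pairwise scores. The sum-to-one constraints are likewise a guess at the illegible conditions.

```latex
\begin{align}
I_i    &= \alpha_1 f_i + \alpha_2 c_i + \alpha_3 d_i + \varepsilon_I,
       & \alpha_1 + \alpha_2 + \alpha_3 &= 1, \\
R_{ij} &= \beta_1 \cos\theta_{ij} + \beta_2\,\mathrm{MI}_{ij} + \beta_3\,\rho_{ij}
          + \beta_4 D_{ij} + \varepsilon_R,
       & \textstyle\sum_{k=1}^{4} \beta_k &= 1, \\
w_i    &= \gamma_1 I_i + \gamma_2 R_i + \gamma_3 A_i + \gamma_4 L + \varepsilon_w,
       & \textstyle\sum_{k=1}^{4} \gamma_k &= 1.
\end{align}
```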
4. The multi-level classification method for knowledge graph data according to claim 3, wherein the feature fusion layer fuses the first-layer feature vector, the second-layer feature vector and the third-layer feature vector into a fused feature vector according to a formula [formula not reproduced in the source text] whose terms are: the first-layer feature vector; the second-layer feature vector; the third-layer feature vector; the weight coefficient matrices formed respectively by the dynamic weight coefficients of the features in the first-layer, second-layer and third-layer feature vectors; and a feature fusion error term.
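As a rough illustration of the fusion in claim 4, the sketch below forms an element-wise weighted sum of the three layer vectors. It assumes, beyond what the claim states, that the three vectors have equal length and that each weight coefficient matrix is diagonal, so it can be stored as a list of per-feature weights. The error term is omitted and the name `fuse` is hypothetical.

```python
def fuse(vectors, weights):
    """Element-wise weighted sum of same-length feature vectors.

    vectors: list of feature vectors (lists of floats), one per layer.
    weights: list of per-feature weight lists, one per layer
             (assumed to be the diagonals of the weight coefficient matrices).
    """
    length = len(vectors[0])
    if len(vectors) != len(weights):
        raise ValueError("one weight list is required per feature vector")
    if any(len(v) != length for v in vectors) or any(len(w) != length for w in weights):
        raise ValueError("all vectors and weight lists must have the same length")
    return [
        sum(w[k] * v[k] for v, w in zip(vectors, weights))
        for k in range(length)
    ]


# Hypothetical two-dimensional layer vectors and weights.
fused = fuse(
    vectors=[[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]],
    weights=[[0.5, 0.5], [0.25, 0.25], [0.25, 0.25]],
)
```

A learned fusion layer would typically replace these fixed lists with weights updated during training, which is what the dynamic coefficients of claim 2 are meant to supply.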
5. The multi-level classification method for knowledge graph data according to any one of claims 1-4, wherein acquiring the entity dimension features, relation dimension features and attribute dimension features from the knowledge graph data comprises:
mapping entity nodes in the knowledge graph to unique numerical identifiers using an entity coding algorithm to obtain entity node identification features, converting entity type information into feature vectors using an entity type vectorization method to obtain entity type features, and encoding entity hierarchical relationship information into numerical features using a hierarchical coding algorithm to obtain entity hierarchy features, wherein the entity node identification features, the entity type features and the entity hierarchy features constitute the entity dimension features;
encoding the different types of relations among entities using a relation type coding algorithm to obtain relation type features, calculating weight values of the relations among entities using a relation strength calculation algorithm to obtain relation strength features, and representing the directionality of the relations among entities using a relation direction coding method to obtain relation direction features, wherein the relation type features, the relation strength features and the relation direction features constitute the relation dimension features; and
encoding the attribute names of entities using an attribute name coding algorithm to obtain attribute name features, processing attribute values of different types using an attribute value normalization method to obtain attribute value features, and calculating the importance of different attributes using an attribute weight calculation algorithm to obtain attribute weight features, wherein the attribute name features, the attribute value features and the attribute weight features constitute the attribute dimension features.
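The entity-dimension encodings of claim 5 can be sketched as follows. The claim names the algorithms only generically, so the concrete choices here are assumptions: sequential integers for node identifiers, one-hot vectors for entity types, and depth in a parent map for the hierarchy feature. The function name and input layout are hypothetical, and the parent map is assumed to be acyclic.

```python
def encode_entities(entities, parent_of):
    """Build entity-dimension features for each entity.

    entities:  list of (name, type) pairs.
    parent_of: dict mapping a child entity name to its parent name
               (assumed acyclic; roots do not appear as keys).
    """
    # Entity node identification feature: a unique integer per entity.
    ids = {name: index for index, (name, _) in enumerate(entities)}
    # Entity type feature: one-hot vector over the sorted set of types.
    types = sorted({entity_type for _, entity_type in entities})

    def depth(name):
        # Entity hierarchy feature: number of ancestors above the entity.
        level = 0
        while name in parent_of:
            name = parent_of[name]
            level += 1
        return level

    features = {}
    for name, entity_type in entities:
        one_hot = [1 if entity_type == t else 0 for t in types]
        features[name] = {"id": ids[name], "type": one_hot, "level": depth(name)}
    return features


# Hypothetical toy graph: two classes and one instance.
features = encode_entities(
    entities=[("animal", "Class"), ("dog", "Class"), ("rex", "Instance")],
    parent_of={"dog": "animal", "rex": "dog"},
)
```

Relation and attribute features could be encoded along the same lines, for example by mapping relation types to integers and scaling attribute values to a common range, but the claim leaves those algorithms equally open.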
6. The multi-level classification method for knowledge graph data according to claim 5, wherein:
the entity dimension features are given by a formula [formula not reproduced in the source text] whose terms are: the entity dimension feature vector; the entity node identification feature; the entity type feature; the entity hierarchy feature; the corresponding weight coefficients, which satisfy a constraint [not reproduced]; and an entity dimension feature error term;
the relation dimension features are given by a formula [formula not reproduced in the source text] whose terms are: the relation dimension feature vector; the relation type feature; the relation strength feature; the relation direction feature; the corresponding weight coefficients, which satisfy a constraint [not reproduced]; and a relation dimension feature error term; and
the attribute dimension features are given by a formula [formula not reproduced in the source text] whose terms are: the attribute dimension feature vector; the attribute name feature; the attribute value feature; the attribute weight feature; the corresponding weight coefficients, which satisfy a constraint [not reproduced]; and an attribute dimension feature error term.
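The formulas of claim 6 are missing from this text. Since each one lists three component features, a set of weight coefficients with a constraint, and an error term, a weighted-sum form is a plausible reading, sketched below. Every symbol and the sum-to-one constraints are assumptions: $e$, $r$ and $p$ denote entity, relation and attribute component features, and $\lambda$, $\mu$, $\nu$ the respective weights.

```latex
\begin{align}
F_E &= \lambda_1 e_{\mathrm{id}} + \lambda_2 e_{\mathrm{type}} + \lambda_3 e_{\mathrm{level}} + \varepsilon_E,
    & \lambda_1 + \lambda_2 + \lambda_3 &= 1, \\
F_R &= \mu_1 r_{\mathrm{type}} + \mu_2 r_{\mathrm{strength}} + \mu_3 r_{\mathrm{dir}} + \varepsilon_R,
    & \mu_1 + \mu_2 + \mu_3 &= 1, \\
F_A &= \nu_1 p_{\mathrm{name}} + \nu_2 p_{\mathrm{value}} + \nu_3 p_{\mathrm{weight}} + \varepsilon_A,
    & \nu_1 + \nu_2 + \nu_3 &= 1.
\end{align}
```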
7. The multi-level classification method for knowledge graph data according to any one of claims 1-4, wherein performing matrix decomposition on the knowledge graph feature matrix, the relation feature submatrix and the attribute feature submatrix respectively to obtain the first-layer feature set, the second-layer feature set and the third-layer feature set comprises: performing singular value decomposition on the knowledge graph feature matrix and taking the features whose singular values exceed a preset threshold as the first-layer feature set; performing non-negative matrix factorization on the relation feature submatrix to extract the second-layer feature set; and performing tensor decomposition on the attribute feature submatrix to extract the third-layer feature set.
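The singular value step of claim 7 can be sketched with NumPy as below. The claim does not say what "features" are taken after thresholding. This sketch assumes they are the row projections onto the retained singular directions, which is one common choice. The function name and the threshold value are hypothetical.

```python
import numpy as np


def first_layer_features(matrix, threshold):
    """Keep the SVD components whose singular value exceeds the threshold.

    Returns (features, kept_singular_values), where features has one row per
    row of the input matrix and one column per retained component.
    """
    u, s, _vt = np.linalg.svd(matrix, full_matrices=False)
    keep = s > threshold                 # boolean mask over singular values
    features = u[:, keep] * s[keep]      # scale each retained direction
    return features, s[keep]


# Hypothetical 3x3 feature matrix with singular values 3.0, 2.0 and 0.1.
demo = np.diag([3.0, 2.0, 0.1])
feats, kept = first_layer_features(demo, threshold=1.0)
```

The non-negative matrix factorization and tensor decomposition steps would need their own routines. They are left out here because the claim gives no detail on rank, initialization or stopping criteria.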
8. The multi-level classification method for knowledge graph data according to any one of claims 1-4, further comprising, before acquiring the entity dimension features, relation dimension features and attribute dimension features from the historical knowledge graph data, standardizing the knowledge graph data according to a formula [formula not reproduced in the source text] whose terms are: the standardized data; the knowledge graph data before standardization; the data mean; the data standard deviation; an adjustment factor; and a standardization error term.
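Because the standardization formula of claim 8 is missing, the sketch below shows only one possible reading: a z-score multiplied by the adjustment factor. Where the factor actually enters the patent's formula is unknown, so its placement here is an assumption, as are the function name and the handling of zero variance. The error term is omitted.

```python
import statistics


def standardize(values, factor=1.0):
    """Scale values to zero mean and unit (population) standard deviation.

    factor: assumed multiplicative adjustment factor applied to the z-score.
    Constant input has zero variance and is mapped to all zeros.
    """
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        return [0.0 for _ in values]
    return [factor * (value - mean) / std for value in values]


# Hypothetical raw attribute values.
scaled = standardize([1.0, 2.0, 3.0])
```

With `factor=1.0` the result has zero mean and unit standard deviation. A different factor rescales the spread without moving the mean.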
9. A multi-level classification system for knowledge graph data, characterized by comprising a computer-readable storage medium and a processor, wherein the computer-readable storage medium is configured to store executable instructions, and the processor is configured to read the executable instructions stored in the computer-readable storage medium to perform the multi-level classification method for knowledge graph data according to any one of claims 1-8.
10. A computer-readable storage medium having stored thereon a computer program, wherein the program, when executed by a processor, implements the multi-level classification method for knowledge graph data according to any one of claims 1-8.

Description

Multi-level classification method, system and medium for knowledge graph data

Technical Field

The invention belongs to the technical field of knowledge graph data classification, and particularly relates to a multi-level classification method, system and medium for knowledge graph data.

Background

Multi-level classification of knowledge graph data is an important research direction in artificial intelligence and knowledge engineering, used mainly to process and analyze large-scale knowledge graph data. Traditional knowledge graph data classification methods fall into three groups: rule-based, statistics-based and deep-learning-based.

Rule-based methods classify knowledge graph data using manually defined rule sets. They are highly interpretable, but constructing the rule sets requires extensive manual effort, and they handle complex classification scenarios poorly. Statistics-based methods classify by computing statistical correlations among features and can discover statistical regularities in the data automatically, but they struggle to model features with nonlinear relationships. Deep-learning-based methods use deep neural networks to learn feature representations automatically and have strong feature learning capability, but they have the following main shortcomings. First, feature extraction generally considers only entity and relation features and does not fully account for the multi-level nature of knowledge graph data, so the extracted features have limited expressive power. Second, feature fusion uses simple concatenation or linear combination, ignoring nonlinear relationships and interactions among features. Third, feature weights are determined mainly by empirical settings (such as fixed weights chosen by experience) or by simple statistical methods, and lack an adaptive optimization mechanism. In addition, when existing methods process large-scale knowledge graphs, the high feature dimensionality and complex feature relationships readily lead to feature redundancy and information loss.

Consequently, when the prior art classifies knowledge graph data, insufficient feature extraction, simplistic feature fusion and fixed feature weights make it difficult for classification accuracy to meet practical requirements. In particular, for knowledge graphs with complex hierarchical structure and rich semantic information, the prior art fails to capture the intrinsic characteristics and hierarchical relationships of the data effectively, which degrades classification performance.

In summary, the prior art suffers from the technical problem of low classification accuracy caused by insufficient extraction of knowledge graph data features.

Disclosure of Invention

To address the above defects or improvement needs of the prior art, the invention provides a multi-level classification method, system and medium for knowledge graph data, aiming to solve the technical problem of low classification accuracy caused by insufficient extraction of knowledge graph data features in the prior art.
To achieve the above object, the invention provides a multi-level classification method for knowledge graph data, comprising: a training stage, comprising: acquiring entity dimension features, relation dimension features and attribute dimension features from knowledge graph data to construct a corresponding entity feature submatrix, relation feature submatrix and attribute feature submatrix; organizing the entity feature submatrix, the relation feature submatrix and the attribute feature submatrix into a knowledge graph feature matrix; performing matrix decomposition on the knowledge graph feature matrix, the relation feature submatrix and the attribute feature submatrix respectively, to obtain correspondingly a first-layer feature set representing basic features of entity nodes in the knowledge graph, a second-layer feature set representing relation features among entity nodes in the knowledge graph, and a third-layer feature set representing hierarchical structure features of the knowledge graph; and training a deep neural network classification model with the features in the first-layer, second-layer and third-layer feature sets as training samples, the trained deep neural network classification model being obtained when the loss converges, wherein the deep neural network classification model comprises a feature extraction layer, a feature fusion layer a