CN-115270761-B - Relation extraction method integrating prototype knowledge
Abstract
A relation extraction method integrating prototype knowledge. At the input layer, training samples are fed in. At the encoding layer, a pre-trained language model marks the head and tail entities involved in the relation, and the hidden-layer vectors corresponding to the markers are concatenated to form the sentence encoding vector. At the prototype-vector layer, a prototype vector is initialized for each relation; through contrastive learning, the distance between a sample and the prototype vector of its own relation is minimized while the distances between the sample and the prototype vectors of the other relation classes are maximized, and the prototype vectors are learned and updated to obtain prototype knowledge. At the memory-fusion layer, the prototype knowledge is fused into the encoding vector of the current sample by a multi-head self-attention method for feature enhancement. At the output layer, a multi-class classifier predicts the relation between the entities from the sentence encoding vector fused with prototype knowledge.
Inventors
- PANG NING
- ZHAO XIANG
- XIAO WEIDONG
- GE BIN
- HU YANLI
- TAN ZHEN
- ZHANG LI
Assignees
- National University of Defense Technology, Chinese People's Liberation Army (中国人民解放军国防科技大学)
Dates
- Publication Date
- 2026-05-08
- Application Date
- 2022-07-28
Claims (5)
- 1. A relation extraction method integrating prototype knowledge, characterized by comprising the following steps. Step 1: input training samples at the input layer. Step 2: at the encoding layer, use a pre-trained language model to mark the head and tail entities involved in the relation, and concatenate the hidden-layer vectors corresponding to the markers as the sentence encoding vector. Step 3: at the prototype-vector layer, first initialize a prototype vector for each relation; then, through contrastive learning, minimize the distance between a sample and the prototype vector of its own relation, maximize the distances between the sample and the prototype vectors of the other relation classes, and learn and update the prototype vectors to obtain prototype knowledge. Step 4: at the memory-fusion layer, fuse the prototype knowledge into the encoding vector of the current sample by a multi-head self-attention method for feature enhancement. Step 5: at the output layer, classify and predict the relation between the entities with a multi-class classifier from the sentence encoding vector fused with prototype knowledge. In step 4, the memory-fusion layer fuses the prototype vectors, as memory knowledge, into the current sentence feature vector through a multi-head self-attention mechanism. First, the $i$-th attention head performs a self-attention operation on the sentence feature vector $s$: $head_i = \mathrm{softmax}\big((s W_i^Q)(C W_i^K)^\top / \sqrt{d_k}\big)\, C W_i^V$, where $s W_i^Q$ is a linear transformation of the feature vector $s$, and $C W_i^K$ and $C W_i^V$ are linear transformations of the prototype vector set $C$. To gather relevant information from multiple views, the hidden features produced by the attention heads are aggregated: $\hat{s} = W_o\,[head_1; head_2; \dots; head_M]$, where $W_o$ is a trainable parameter, $\hat{s}$ denotes the feature vector aggregated by the self-attention operation, $head_i$ denotes the feature vector produced by the $i$-th attention head, $M$ is the number of attention heads, and $\mathrm{softmax}$ is the normalization function. Finally, the prototype knowledge and the current sentence feature vector are fused by a feature combiner, which consists of a fully connected layer whose learned parameters dynamically adjust the combination strength of the two features; the fusion is expressed as $\tilde{s} = W_f\,[s; \hat{s}] + b_f$, where $W_f$ and $b_f$ are trainable parameters and $\tilde{s}$ denotes the sentence feature vector fused with prototype knowledge.
- 2. The method of claim 1, wherein in step 2, identifiers $\langle e_1\rangle$, $\langle/e_1\rangle$, $\langle e_2\rangle$, $\langle/e_2\rangle$ are added before and after the head and tail entities of the input text sentence, representing the start and end of the head entity and the start and end of the tail entity, respectively, so that the sentence is expressed as $x = \{[\mathrm{CLS}], w_1, \dots, \langle e_1\rangle, \dots, \langle/e_1\rangle, \dots, \langle e_2\rangle, \dots, \langle/e_2\rangle, \dots, w_n, [\mathrm{SEP}]\}$, where the symbols $[\mathrm{CLS}]$ and $[\mathrm{SEP}]$ indicate the start and end of the sentence and $w_t$ denotes the $t$-th word of the text. The sentence is input into the pre-trained language model to obtain its sequence of hidden-layer vectors: $H = \{h_{[\mathrm{CLS}]}, h_1, \dots, h_{\langle e_1\rangle}, \dots, h_{\langle e_2\rangle}, \dots, h_n, h_{[\mathrm{SEP}]}\}$, where $h_t$ denotes the hidden state corresponding to word $w_t$, and $h_{\langle e_1\rangle}$, $h_{\langle/e_1\rangle}$, $h_{\langle e_2\rangle}$, $h_{\langle/e_2\rangle}$ denote the hidden states corresponding to the start and end identifiers of the head and tail entities. The hidden-layer vectors corresponding to the identifiers $\langle e_1\rangle$ and $\langle e_2\rangle$ are concatenated and input into a fully connected layer to obtain the final sentence feature vector: $s = W_s\,[h_{\langle e_1\rangle}; h_{\langle e_2\rangle}] + b_s$, where $W_s$ and $b_s$ are trainable parameters, $[\cdot;\cdot]$ denotes the concatenation operation, and the dimension of the feature vector $s$ is set to $d$.
- 3. The method of claim 2, wherein in step 3, the prototype-vector layer first initializes a prototype vector set $C = \{c_1, c_2, \dots, c_K\}$, where $c_j$ denotes the prototype knowledge of the $j$-th relation category and $K$ is the total number of relation categories. The prototype vector set is then updated by contrastive learning; the updating principle is that, in the vector space, the prototype vector $c_j$ of each relation should be close to the feature vectors of all sentences belonging to that class and far from the feature vectors of sentences of the other classes. First, a training sample set is constructed: given a batch of sentence feature vectors $\{s_1, \dots, s_B\}$, where $B$ denotes the cardinality of the batch, the training set consists of sentence-prototype pairs; for a sentence feature vector $s_i$ of relation category $r_j$, its positive sentence-prototype pair is $(s_i, c_j)$ and its negative sentence-prototype pairs are $\{(s_i, c_k) \mid k \ne j\}$. Then the prototype vectors are updated and optimized; the basic idea is to minimize the positive-pair distance and maximize the negative-pair distance in the vector space, with the following loss function: $\mathcal{L}_{cl} = -\sum_{i=1}^{B} \log \frac{\exp(-d(s_i, c_{y_i}))}{\sum_{k=1}^{K} \exp(-d(s_i, c_k))}$, where $d(\cdot,\cdot)$ denotes a distance function, defined as $d(s, c) = \lVert s - c \rVert_2$.
- 4. The relation extraction method integrating prototype knowledge according to claim 3, wherein in step 5, the output layer determines the relation category from the sentence feature vector $\tilde{s}$ fused with prototype knowledge through a multi-class classifier: $p(y = r_k \mid x; \theta) = \mathrm{softmax}(W_r \tilde{s} + b_r)_k$, where $W_r$ and $b_r$ are trainable parameters, $\theta$ is the set of all training parameters, and $p(y = r_k \mid x; \theta)$ denotes the probability that sample $x$ belongs to category $r_k$.
- 5. The method of claim 4, wherein the loss function defined at the output layer is the cross-entropy loss function: $\mathcal{L}_{ce} = -\sum_{i=1}^{B} \sum_{k=1}^{K} \mathbb{I}(y_i = r_k) \log p(y = r_k \mid x_i; \theta)$, where $\mathbb{I}(\cdot)$ denotes the indicator function, which equals 1 when $r_k$ is the true category of sample $x_i$ and 0 otherwise.
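As a rough illustration of the contrastive prototype update in claim 3, the sketch below computes a softmax-over-negative-distances loss with Euclidean distance; both the exact loss form and the distance choice are assumptions for illustration, since the patent text here does not preserve the original formulas.

```python
import numpy as np

def prototype_contrastive_loss(S, C, y):
    """Contrastive loss pulling each sentence vector toward its own
    relation prototype and away from the other prototypes.

    S: (B, d) batch of sentence feature vectors
    C: (K, d) prototype vectors, one per relation category
    y: (B,)   integer relation label of each sentence
    """
    # Pairwise Euclidean distances d(s_i, c_k), shape (B, K)
    dists = np.linalg.norm(S[:, None, :] - C[None, :, :], axis=-1)
    # Softmax over negative distances: a closer prototype gets higher probability
    logits = -dists
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
    # Negative log-likelihood of the true prototype, averaged over the batch
    return -np.log(probs[np.arange(len(y)), y] + 1e-12).mean()

rng = np.random.default_rng(0)
S = rng.normal(size=(4, 8))   # 4 sentences, 8-dim features
C = rng.normal(size=(3, 8))   # 3 relation prototypes
y = np.array([0, 1, 2, 1])
loss = prototype_contrastive_loss(S, C, y)
```

Minimizing this quantity shrinks the positive-pair distance and, through the normalizing denominator, enlarges the negative-pair distances, matching the updating principle stated in claim 3.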
Description
Relation extraction method integrating prototype knowledge

Technical Field

The invention belongs to the technical field of natural language processing in artificial intelligence and relates to a relation extraction method integrating prototype knowledge.

Background

Relation extraction is a critical task in natural language processing that serves a range of downstream applications, such as knowledge graph construction and information retrieval. The task is defined as selecting an appropriate relation class label from a set of candidate relation classes for a pair of candidate entities, based on the contextual description of that entity pair in the text. Recent developments in deep learning have spurred interest in relation extraction through supervised learning over neural networks. In practice, these methods require a certain amount of labeled training data. However, such data often exhibit a long-tail phenomenon, i.e., some relation categories have few training samples, which leads the model to recognize these categories inaccurately. To alleviate this problem, class-imbalanced supervised learning methods have been proposed. Early research mainly focused on resampling or reweighting the training data: resampling reduces the sampling frequency of high-frequency classes and increases that of low-frequency relations, while reweighting assigns small weights to high-frequency classes to reduce their influence on model optimization and large weights to low-frequency classes to increase theirs. Different from these methods, this proposal provides a new class-imbalanced supervised learning method that obtains prototype knowledge through learning, thereby transferring information from high-frequency relations to low-frequency relations.
In class-imbalanced supervised learning, the data distribution exhibits a long-tail phenomenon: most categories contain only a small number of samples, which greatly affects the overall performance and generality of the model. A relation extraction method integrating prototype knowledge is therefore needed to solve this problem. Most previous relation extraction studies focused only on improving overall performance and paid no attention to long-tail relation categories. We therefore observe room for improvement in the relation extraction task and focus on the long-tail phenomenon in relation extraction. Existing methods mainly improve the optimization of the model, which is chiefly reflected in its training strategy, including data sampling and class weighting; they do not start from the characteristics of the data to obtain feature vectors with more discrimination between classes through contrastive learning. For an effective relation extraction model, however, learning the most discriminative feature vectors for the various relations is the key to accurate relation extraction. In addition, previous models do not consider using prototype knowledge, so they realize neither fusion of memory features with the current instance nor migration of high-frequency features to low-frequency ones. The features of some high-frequency relations may be more generic, i.e., they express low-frequency relation features to some extent; if they are fused into a low-frequency instance, feature migration can be realized, thereby enhancing the features of low-frequency relation categories. Nevertheless, previous relation extraction studies have largely ignored this approach to feature migration.
Disclosure of Invention

In view of the above, the present invention is directed to a relation extraction method integrating prototype knowledge, which provides a prototype-vector training method based on contrastive learning and a prototype-knowledge fusion method based on multi-head self-attention. First, after the original feature vectors of the samples are obtained, contrastive learning pulls together, in the vector space, sentences and prototypes with the same semantics and pushes apart sentences and prototypes with different semantics, yielding discriminative and representative prototype vectors. Second, after the prototype vectors are obtained, a multi-head self-attention method extracts, from multiple feature perspectives, the prototype features most relevant to the sentence feature vector, and a feature combiner adaptively combines the prototype knowledge with the sentence feature vector to strengthen the sentence features. Finally, the relation between the entities involved in the sentence is predicted by a multi-classifier.
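The fusion step described above (multi-head attention from the sentence vector over the prototype set, followed by a fully connected feature combiner) can be sketched as follows. All shapes, the head count, and the concatenation-based combiner are illustrative assumptions, not the patent's fixed parameterization.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def fuse_prototypes(s, C, Wq, Wk, Wv, Wo, Wf, bf, n_heads):
    """Fuse prototype knowledge into a sentence vector via multi-head
    attention, then combine the two features with a fully connected layer.

    s: (d,)   sentence feature vector (the query)
    C: (K, d) prototype vectors (keys and values)
    Wq, Wk, Wv: (n_heads, d, dh) per-head projections
    Wo: (n_heads * dh, d)        output projection
    Wf: (2 * d, d), bf: (d,)     feature-combiner parameters
    """
    dh = Wq.shape[-1]
    heads = []
    for i in range(n_heads):
        q = s @ Wq[i]                          # (dh,)  query projection
        K_ = C @ Wk[i]                         # (K, dh) key projections
        V_ = C @ Wv[i]                         # (K, dh) value projections
        attn = softmax(K_ @ q / np.sqrt(dh))   # (K,) weights over prototypes
        heads.append(attn @ V_)                # (dh,) per-head prototype summary
    s_hat = np.concatenate(heads) @ Wo         # aggregate the M heads
    return np.concatenate([s, s_hat]) @ Wf + bf  # feature combiner

rng = np.random.default_rng(1)
d, dh, K, M = 8, 4, 3, 2
s = rng.normal(size=(d,))
C = rng.normal(size=(K, d))
Wq, Wk, Wv = (rng.normal(size=(M, d, dh)) for _ in range(3))
Wo = rng.normal(size=(M * dh, d))
Wf, bf = rng.normal(size=(2 * d, d)), rng.normal(size=(d,))
fused = fuse_prototypes(s, C, Wq, Wk, Wv, Wo, Wf, bf, M)
print(fused.shape)  # (8,)
```

Because each head attends over the same prototype set with different projections, the fused vector can absorb high-frequency relation features into a low-frequency instance, which is the feature-migration effect the invention targets.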