CN-116383387-B - Event-logic-based joint event extraction method
Abstract
The invention discloses a joint event extraction method based on event logic, which comprises: inputting a sentence into an event-logic relation extraction model to obtain the event relation pairs in the sentence; and inputting each event of those relation pairs into a joint event extraction model based on a graph attention network to obtain the extraction result corresponding to the sentence, where the extraction result comprises trigger word classification and argument classification. The method and the device improve the accuracy of extracting multiple events from a sentence.
Inventors
- Song Shengli
- Duan Xinrong
- Li Jingyang
- Hu Guangneng
Assignees
- Xidian University (西安电子科技大学)
Dates
- Publication Date: 20260512
- Application Date: 20230406
Claims (6)
- 1. A joint event extraction method based on event logic, comprising: inputting a sentence into an event-logic relation extraction model to obtain the event relation pairs in the sentence; and inputting each event of the event relation pairs of the sentence into a joint event extraction model based on a graph attention network to obtain the extraction result corresponding to the sentence, the extraction result comprising trigger word classification and argument classification; the event-logic relation extraction model comprises a coding layer, a feature extraction layer and an event relation identification layer; inputting the sentence into the event-logic relation extraction model comprises: inputting the sentence into the coding layer to obtain the text feature matrix of the sentence output by the coding layer; inputting the text feature matrix into the feature extraction layer to obtain the global and local feature representation matrix output by the feature extraction layer; and inputting the global and local feature representation matrix into the event relation identification layer to identify the event relation pairs in the sentence; inputting the sentence into the coding layer comprises: inputting the sentence into an embedding layer of the coding layer to convert each word of the sentence into a word vector, and encoding the word vectors with a BERT model to generate a word vector representation matrix; introducing an external dictionary by the SoftLexicon method, matching each character of the sentence against the dictionary to obtain the words containing that character, and placing those words into four word sets B, M, E and S according to the position of the character within the word, the four sets indicating respectively that the character is at the beginning, in the middle or at the end of the word, or forms a word by itself; after the four word sets of each character of the sentence are obtained, representing each word set as a fixed-length vector by taking word frequency as the weight coefficient of each word and computing a weighted combination of the word embeddings of all words of the set, thereby obtaining the vector of each word set of each character; splicing the vectors of the four word sets of a character onto the BERT word vector of that character to obtain a new word vector representation matrix; fusing the trigger word features of the events, the sequence features of the events and the relation connective features with different weights to obtain a multidimensional feature matrix; and splicing the new word vector representation matrix with the multidimensional feature matrix to obtain the final text feature matrix; inputting the text feature matrix into the feature extraction layer comprises: inputting the text feature matrix into the convolution layers of the feature extraction layer to obtain the final feature representation of the multi-layer convolution, where m is the number of convolution kernels and n is the number of words of the sentence; performing a max pooling operation over all words to obtain a matrix P whose i-th row is the vector obtained by max pooling over the i-th word; inputting P into a self-attention layer of the feature extraction layer to obtain the vocabulary-level features; inputting the text feature matrix into the bidirectional gated recurrent unit of the feature extraction layer, which consists of a forward GRU and a backward GRU whose number of hidden units is set to s, to obtain an output matrix each row of which represents the sentence-level features of one word extracted by the bidirectional gated recurrent unit; inputting that output matrix into another self-attention layer of the feature extraction layer to obtain the sentence-level features; inputting the vocabulary-level and sentence-level features into the global attention mechanism layer of the feature extraction layer to obtain an output feature matrix G; and splicing the matrix P and the output matrix of the last hidden layer of the bidirectional gated recurrent unit onto the output matrix of the global attention layer, thereby outputting the global and local feature representation matrix; putting all events of the event relation pairs of the sentence into a set to form a text set, and inputting the text set into the graph-attention-network-based joint event extraction model to obtain the extraction result corresponding to the sentence comprises: splicing the word vector representation matrix, the part-of-speech embedding matrix and the entity class embedding matrix to obtain a text feature matrix; inputting that text feature matrix into a Bi-LSTM model to obtain an output matrix; performing dependency syntactic analysis of the sentence with DDParser to obtain a syntactic dependency graph, and expanding the syntactic dependency graph; taking the feature nodes and relation edges of the syntactic dependency graph as the layer inputs of an N-order graph attention neural network, which performs an aggregation calculation over the features of every node of the graph to obtain the aggregated features and finally the output of the graph attention network layer, the number of nodes in the node set being n+k+m; extracting trigger words and arguments jointly with the trigger word and argument identification layer of the joint event extraction model, the multi-classification task using the BIO labeling method: the output matrix O of the previous layer is first input into a fully connected layer followed by an activation function, and a softmax layer then normalizes the vectors of all types, realizing event trigger word classification; after the candidate trigger words are obtained, the output matrix is used to perform argument classification over the entity list of the sentence: average pooling of the several word vectors contained in a trigger word yields the vector representation of the candidate trigger word, which is spliced with the vector of each of the other words and input into a fully connected network followed by a softmax layer, realizing argument classification; expanding the syntactic dependency graph comprises: defining the shortest path existing between any two word vector nodes, and defining the edge between any two adjacent word vector nodes on that path; fusing the features of all nodes on the shortest path between two word vector nodes with a BiGRU network, and splicing the outputs of the forward and backward GRUs to obtain the fused feature vector h, namely the BiGRU output at time t, which is taken as a surrounding node of each of the two word vector nodes; and finally obtaining the extended syntactic dependency graph, where the node set V comprises three subsets: a set of n character vector nodes, n being the sentence length; a set of k word vector nodes obtained after word segmentation; and the set of surrounding nodes of each word vector node computed by the shortest-path algorithm, whose size is m.
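As an illustrative sketch (not part of the claims), the SoftLexicon step above — building the four B/M/E/S word sets for one character and combining their word embeddings with frequency weights — can be mimicked as follows. All names (`lexicon`, `word_vecs`, `word_freq`) and the normalization by the total frequency of each set are assumptions for the example.

```python
import numpy as np

def soft_lexicon_vector(char_idx, sentence, lexicon, word_vecs, word_freq, dim=50):
    """Build the four B/M/E/S word-set vectors for one character and
    concatenate them, as described in claim 1. `lexicon` is the external
    dictionary, `word_vecs` maps a word to its embedding, `word_freq` to
    its corpus frequency (all illustrative names)."""
    sets = {"B": [], "M": [], "E": [], "S": []}
    n = len(sentence)
    # Collect every dictionary word covering this character and record whether
    # the character is at its Beginning, Middle, End, or a Single-char word.
    for start in range(0, char_idx + 1):
        for end in range(char_idx + 1, n + 1):
            w = sentence[start:end]
            if w not in lexicon:
                continue
            if len(w) == 1:
                sets["S"].append(w)
            elif start == char_idx:
                sets["B"].append(w)
            elif end - 1 == char_idx:
                sets["E"].append(w)
            else:
                sets["M"].append(w)
    # Frequency-weighted average of the word embeddings of each set.
    parts = []
    for key in ("B", "M", "E", "S"):
        words = sets[key]
        if not words:
            parts.append(np.zeros(dim))
            continue
        total = sum(word_freq[w] for w in words)
        parts.append(sum(word_freq[w] * word_vecs[w] for w in words) / total)
    # Length 4 * dim; this is what gets spliced onto the BERT character vector.
    return np.concatenate(parts)
```

The resulting 4·dim vector is concatenated with the character's BERT embedding to form the new word vector representation of the claim.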
- 2. The method of claim 1, wherein the event relation identification layer employs a conditional random field (CRF) model; let a tag sequence output by the CRF be L = (l_1, ..., l_n); the total score of the tag sequence L is then score(L) = Σ_i A_{l_i, l_{i+1}} + Σ_i P_{i, l_i}, where A is the transition score matrix, A_{l_i, l_{i+1}} represents the transition probability from tag l_i to tag l_{i+1}, and P_{i, l_i} denotes the score of the i-th character under tag l_i; the objective function of the event-logic relation extraction model maximizes the probability of the correct tag sequence L*, so the loss function loss of the model is defined as the negative log-likelihood loss = log Σ_{L'} exp(score(L')) − score(L*), and the parameters are optimized by back propagation.
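A minimal sketch of the claim-2 CRF scoring and loss, with the partition function computed by brute-force enumeration purely for clarity (a real implementation would use the forward algorithm); `emissions`, `transitions` and the shapes are illustrative assumptions.

```python
import numpy as np
from itertools import product

def crf_score(emissions, transitions, tags):
    """Score of one tag sequence under a linear-chain CRF, matching the
    claim-2 decomposition: emission score of each character under its tag
    plus transition scores between consecutive tags.
    `emissions` is (n, T); `transitions` is (T, T)."""
    s = sum(emissions[i, t] for i, t in enumerate(tags))
    s += sum(transitions[tags[i], tags[i + 1]] for i in range(len(tags) - 1))
    return s

def crf_nll(emissions, transitions, gold_tags):
    """Negative log-likelihood of the gold sequence:
    log-sum-exp over all tag sequences minus the gold score."""
    n, T = emissions.shape
    all_scores = [crf_score(emissions, transitions, seq)
                  for seq in product(range(T), repeat=n)]
    log_z = np.log(np.sum(np.exp(all_scores)))
    return log_z - crf_score(emissions, transitions, gold_tags)
```

Minimizing `crf_nll` by back propagation over the emission and transition parameters is the optimization step the claim describes.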
- 3. The method of claim 1, wherein, before splicing the word vector representation matrix, the part-of-speech embedding matrix and the entity class embedding matrix to obtain the text feature matrix, the method further comprises: generating the word vector representation matrix by ERNIE model encoding in the graph-attention-network-based joint event extraction model; performing word segmentation and part-of-speech tagging of the event text of the input sentence in the graph-attention-network-based joint event extraction model to finally obtain the part-of-speech embedding matrix corresponding to the sentence S; and marking the entity classes of the text according to the BIO labeling rules, then randomly initializing the entity class vectors and optimizing them by back propagation to obtain trained entity class vectors, whereby the entity class embedding representation of each word is obtained and finally the entity class embedding matrix corresponding to the sentence S.
- 4. The method of claim 3, wherein performing word segmentation and part-of-speech tagging of the event text of the input sentence to finally obtain the part-of-speech embedding matrix corresponding to the sentence S comprises: performing word segmentation and part-of-speech tagging of the event text of the input sentence; labeling the part of speech of each character according to the BIO labeling rules, the tags comprising B-pos, I-pos and E-pos, with words consisting of a single character represented by S-pos, where pos refers to the part of speech of the word; then randomly initializing the part-of-speech vectors and optimizing them by back propagation to obtain trained part-of-speech vectors, whereby the part-of-speech embedding representation corresponding to each part of speech is obtained and finally the part-of-speech embedding matrix corresponding to the sentence S.
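The character-level B-pos/I-pos/E-pos/S-pos labeling of claim 4 can be sketched as follows; the input format (a list of `(word, pos)` pairs from the segmenter) is an assumption for the example.

```python
def bio_pos_tags(words_with_pos):
    """Character-level part-of-speech tags per claim 4: a multi-character
    word tagged `pos` yields B-pos, I-pos..., E-pos for its characters,
    while a single-character word yields S-pos."""
    tags = []
    for word, pos in words_with_pos:
        if len(word) == 1:
            tags.append(f"S-{pos}")
        else:
            tags.append(f"B-{pos}")
            tags.extend(f"I-{pos}" for _ in word[1:-1])  # interior characters
            tags.append(f"E-{pos}")
    return tags
```

Each resulting tag is then mapped to a randomly initialized vector that is trained by back propagation, giving the part-of-speech embedding matrix of the sentence.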
- 5. The method of claim 1, wherein the aggregation calculation by which the graph attention network aggregates the features of each node of the syntactic dependency graph to obtain the aggregated features is shown in the following formula: h_i' = σ( (1/K) Σ_{k=1}^{K} Σ_{j∈N_i} α_{ij}^k W^k h_j ), where K is the number of attention heads, W^k is the weight matrix of the k-th attention head with respect to the nodes, α_{ij}^k is the attention weight coefficient of the k-th head, N_i is the set of all neighbor nodes of node i in the syntactic dependency graph, and σ is a nonlinear activation function; through this calculation the output of the graph attention network layer is obtained; the number of nodes in the set is n+k+m, but in the subsequent classification process the k word vector nodes and the m surrounding nodes need not be classified, so they are discarded and only the first n character nodes are kept and converted into the matrix representation O.
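A small numeric sketch of one multi-head graph-attention aggregation in the spirit of claim 5: each head projects the neighbor features with its own weight matrix, scores them with an attention vector, normalizes over the neighborhood, and the head outputs are averaged before a nonlinearity. The additive attention form, the head-averaging choice and all shapes are assumptions, not fixed by the patent text.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def gat_layer(H, adj, W_heads, a_heads):
    """One multi-head graph-attention aggregation over node features H
    (n, d_in). `adj` is the (n, n) adjacency matrix with self-loops,
    `W_heads` a list of (d_in, d_out) projections, `a_heads` a list of
    attention vectors of length 2 * d_out (illustrative shapes)."""
    n = H.shape[0]
    K = len(W_heads)
    out = np.zeros((n, W_heads[0].shape[1]))
    for i in range(n):
        neigh = [j for j in range(n) if adj[i, j]]
        acc = np.zeros(W_heads[0].shape[1])
        for W, a in zip(W_heads, a_heads):
            z = np.stack([W.T @ H[j] for j in neigh])   # projected neighbors
            scores = softmax(np.array(
                [a @ np.concatenate([W.T @ H[i], zj]) for zj in z]))
            acc += scores @ z                           # weighted aggregation
        out[i] = np.tanh(acc / K)                       # average heads, activate
    return out
```

After the final layer, only the first n character-node rows would be kept as the matrix O, per the claim.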
- 6. The method of claim 1, wherein event trigger word classification is implemented by the formula y_t^i = softmax(W_t o_i), where y_t^i is the trigger word type probability distribution of the i-th entity, W_t is the parameter matrix of event trigger word classification, W_t ∈ R^{T×d}, where T is the number of event types and d is the vector dimension; argument classification is implemented by the formula y_a^{ij} = softmax(W_a [v_i; o_j]), where y_a^{ij} is the probability distribution of the role played by the j-th entity in the event triggered by the i-th candidate trigger word, W_a is the parameter matrix of event argument classification, and the number of rows of W_a is the number of argument types.
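The two softmax classifiers of claim 6 reduce to a linear map plus normalization; the sketch below shows both, with the parameter matrices taken as plain arrays (bias terms omitted, as the claim mentions only parameter matrices).

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def classify_trigger(o_i, W_t):
    """Trigger-type distribution for one token: softmax over a linear map
    of its graph-attention output o_i; W_t has shape (T, d), T being the
    number of event types."""
    return softmax(W_t @ o_i)

def classify_argument(v_trigger, o_j, W_a):
    """Argument-role distribution: the candidate trigger vector (obtained
    by average pooling its word vectors) is spliced with the j-th entity
    vector before the linear layer and softmax."""
    return softmax(W_a @ np.concatenate([v_trigger, o_j]))
```

Both outputs are probability distributions, so each sums to one over the event types and argument roles respectively.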
Description
Event-logic-based joint event extraction method

Technical Field

The invention relates to the technical field of event extraction, and in particular to a joint event extraction method based on event logic.

Background

With the rapid development of the internet and of text mining technology, research on event-related tasks is receiving increasing attention from researchers. A text often contains several events, and those events may all revolve around the same topic. Between these events there exist various event logics, such as temporal sequence, cause and effect, condition and contrast; by analyzing these logics it is possible to understand the evolution and progress of the events in the text more deeply and to help infer the relationships between the events. Event extraction is an important task that extracts structured event information from unstructured data. It typically comprises four subtasks: trigger word recognition, event type detection, event argument recognition and argument role detection. Research methods for sentence-level event extraction can be classified into pipeline-based and joint methods. The pipeline approach first identifies the event type and then extracts the event arguments, while the joint approach learns trigger words and arguments together and thus avoids trigger word extraction errors propagating into argument extraction. Event extraction is useful in many fields; for example, storing the extracted event information in a knowledge base can provide useful information for information retrieval and hence for knowledge reasoning.
The prior art is as follows. A patent application of the Institute of Automation, Chinese Academy of Sciences (patent number: 202110827424.5) provides an event extraction method, an event extraction device, electronic equipment and a storage medium. The event extraction method comprises inputting the document to be extracted into an event extraction model, where the model comprises a sentence-level feature extraction layer, a document-level feature extraction layer, a feature decoding layer and an event prediction layer. The sentence-level feature extraction layer encodes each sentence of the document with a Transformer model to obtain the corresponding context feature vectors and event element representation vectors; the document-level feature extraction layer then extracts features to obtain document encoding vectors and document event element representation vectors; the feature decoding layer produces role relationship, event relationship and event-to-role relationship representation vectors; and finally the event prediction layer extracts the several events, assigns the event elements and outputs the prediction result. The drawback of this method is that, in extracting the events, only the sequential features of the sentence are considered while its syntactic features are ignored, so the model has difficulty capturing the correlation of several events within one sentence, and different features are not given different weight information.
A patent of Beijing Ming Zhaohui Technology Co., Ltd. (patent number: 202210308591.3) provides a method, an apparatus, an electronic device and a readable storage medium for extracting causal relationships. It comprises: performing a word segmentation operation on the text to be extracted to obtain several unit words; performing part-of-speech labeling of each unit word to obtain the corresponding part-of-speech identifier; obtaining a preset event rule set and combining the part-of-speech identifiers with the unit words matching the event sub-rules of that rule set to obtain several unit events; and obtaining a trained rule model, inputting the unit events into it, and taking its output as the causal relationship extraction result of the text. The disadvantage of this approach is that the dependencies between words are not taken into account, nor is external lexical information used, so the semantics of the characters are not fully exploited. In addition, although manually constructed rules reach high accuracy in specific fields, their portability and generalization are weak, so the method cannot be applied widely to data of various fields. A patent application of Shanxi University (patent number: 202210348614.3) discloses a chapter-level event extraction method and apparatus based on a multi-granularity entity heterogeneous graph, which comprises extracting entities using sentence-based and paragraph-based context information respectively, and fusing the entity sets of the two granularities with a multi-granularity entity selection strategy, improving the accuracy of entity