CN-116167359-B - Central event extraction method, device and medium
Abstract
The invention discloses a method, device and medium for extracting a central event. The method comprises: determining a central sentence according to the similarity between each single sentence and the title, a first weight of the trigger words in each single sentence, and a second weight of the network security entities in each single sentence; determining the trigger words in the central sentence and, based on them, the event type to which the central sentence points; and processing the central sentence and the event type with a BiLSTM model and a CRF model to obtain the central event. Determining the central sentence along these three dimensions narrows the extraction range and reduces interference from secondary events, while using the BiLSTM and CRF models reduces the errors that arise in pipeline-style central event extraction, making the central event extraction task more convenient and effective.
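As a rough illustration of the three-dimension central sentence selection summarized above (and detailed in claims 1 and 3), the following Python sketch scores each tokenized sentence by its similarity to the title plus the two proportion-based weights, then keeps the top-ranked sentences within a length budget. It is a sketch under stated assumptions, not the patented implementation: a bag-of-words cosine similarity stands in for the unspecified sentence encoder, the three values are combined by an unweighted sum, length is counted in tokens, and all function names and sample data are hypothetical.

```python
import math
from collections import Counter
from typing import List, Set


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def score_sentence(tokens: List[str], title: List[str],
                   triggers: Set[str], entities: Set[str]) -> float:
    """Similarity to the title + trigger-word proportion + entity proportion."""
    if not tokens:
        return 0.0
    # Assumption: bag-of-words cosine stands in for the real sentence encoder.
    similarity = cosine(Counter(tokens), Counter(title))
    first_weight = sum(t in triggers for t in tokens) / len(tokens)
    second_weight = sum(t in entities for t in tokens) / len(tokens)
    # Assumption: the three dimensions are combined by an unweighted sum.
    return similarity + first_weight + second_weight


def select_central_sentences(sentences: List[List[str]], title: List[str],
                             triggers: Set[str], entities: Set[str],
                             max_tokens: int) -> List[List[str]]:
    """Rank sentences by score (descending) and keep the leading ones while
    the total token count stays within max_tokens."""
    ranked = sorted(sentences,
                    key=lambda s: score_sentence(s, title, triggers, entities),
                    reverse=True)
    selected, total = [], 0
    for sentence in ranked:
        if total + len(sentence) > max_tokens:
            break  # stop at the first sentence that would exceed the budget
        selected.append(sentence)
        total += len(sentence)
    return selected


# Usage with hypothetical, pre-tokenized data.
sentences = [
    ["attackers", "exploited", "a", "vulnerability", "in", "apache"],
    ["the", "weather", "was", "sunny"],
    ["ransomware", "encrypted", "hospital", "servers"],
]
title = ["ransomware", "attack", "on", "hospital"]
triggers = {"exploited", "encrypted"}
entities = {"apache", "ransomware", "vulnerability"}
central = select_central_sentences(sentences, title, triggers, entities, max_tokens=10)
print(central)
```

With this sample data the ransomware sentence scores 1.0 and ranks first, the vulnerability sentence scores 0.5 and ranks second, and both fit in the 10-token budget, while the unrelated weather sentence is dropped. Stopping at the first sentence that overflows the budget (rather than skipping it) keeps the selection a prefix of the ranking, which matches the claim's "top-ranked" wording.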
Inventors
- GU ZHAOQUAN
- ZHANG JUNJIAN
- JIA YAN
- ZHANG HUAN
- FANG BINXING
- HAN WEIHONG
- LUO CUI
- ZHOU KE
- WANG HAIYAN
- TAN HAO
Assignees
- 鹏城实验室 (Peng Cheng Laboratory)
Dates
- Publication Date
- 20260505
- Application Date
- 20230220
Claims (8)
- 1. A method for extracting a central event, characterized by comprising the following steps: constructing an initial trigger word list; screening, according to the part of speech of each word in the initial trigger word list, to obtain a target word list composed of target words whose part of speech is verb and/or proper noun; extracting the target words whose part of speech is verb from the target word list and taking them as main trigger words, and/or extracting the target words whose part of speech is proper noun from the target word list, taking the verb parts of those target words as main trigger words and the noun parts as associated words, wherein each main trigger word corresponds to one word family; receiving an input text to be processed, filtering the text to be processed, segmenting the filtered text to obtain a word list to be screened, and extracting the words whose part of speech is verb and/or proper noun from the word list to be screened to form a candidate word list; selecting, from the candidate word list, a plurality of candidate words whose similarity is equal to a preset similarity; extracting the candidate words whose part of speech is verb from the plurality of candidate words and adding them as secondary trigger words to the word families of their corresponding main trigger words, and/or extracting the candidate words whose part of speech is proper noun from the plurality of candidate words, adding the verb parts of those candidate words as secondary trigger words to the word families of their corresponding main trigger words, and adding the noun parts as associated words to the same word families; combining the word families to which the secondary trigger words and associated words have been added to obtain a trigger word dictionary; after determining the similarity between each single sentence in a text to be extracted and the title of that text, a first weight of the trigger words contained in each single sentence, and a second weight of the network security entities contained in each single sentence, determining a central sentence according to the similarity, the first weight and the second weight of each single sentence; determining the trigger words contained in the central sentence, and determining, based on those trigger words, the event type to which the central sentence points; and processing the central sentence and the event type with a BiLSTM model and a CRF model to obtain the central event; wherein determining the similarity between each single sentence and the title, the first weight and the second weight comprises: encoding each single sentence into a sentence vector and the title into a title vector, and determining the similarity between each sentence vector and the title vector; determining a first ratio of the number of trigger words contained in each single sentence to the number of words in that sentence, and taking the first ratio as the first weight; and determining a second ratio of the number of network security entities contained in each single sentence to the number of words in that sentence, and taking the second ratio as the second weight.
- 2. The method for extracting a central event according to claim 1, wherein, after the step of combining the word families to which the secondary trigger words and the associated words have been added to obtain the trigger word dictionary, the method further comprises: screening each word family in the trigger word dictionary and, if screening finds in one or more word families a main trigger word, secondary trigger word and/or associated word that does not meet a preset condition, eliminating the target word and/or candidate word corresponding to that word.
- 3. The method for extracting a central event according to claim 1, wherein the step of determining the central sentence according to the similarity, the first weight and the second weight of each single sentence comprises: sorting the single sentences in descending order of a score derived from the similarity, the first weight and the second weight of each single sentence, and selecting one or more top-ranked single sentences as the central sentence, wherein the total length of the central sentence does not exceed a preset length.
- 4. The method for extracting a central event according to claim 1, further comprising, before the step of determining the trigger words contained in the central sentence and determining, based on those trigger words, the event type to which the central sentence points: formulating an event type table according to the ACE standard, wherein the event type table is used to determine, from the trigger words, the event type to which the central sentence points.
- 5. The method for extracting a central event according to claim 1, further comprising, before the step of processing the central sentence and the event type with the BiLSTM model and the CRF model: splicing the central sentence and the event type to obtain a spliced input text, and performing word segmentation on the spliced input text to obtain a word sequence; and encoding the word sequence into a plurality of word vectors using a GloVe word vector model.
- 6. The method for extracting a central event according to claim 5, wherein the step of processing the central sentence and the event type with the BiLSTM model and the CRF model to obtain the central event comprises: inputting the word vectors into the BiLSTM model to obtain a plurality of groups of predicted probability vectors; and inputting the groups of predicted probability vectors into the CRF model, calculating, through the CRF model, the probability value of the predicted tag sequence corresponding to each group, determining the predicted tag sequence with the maximum probability value as the target sequence, and outputting the central event based on the target sequence.
- 7. An electronic device comprising a memory, a processor and a computer processing program stored on the memory and executable on the processor, the computer processing program being configured to implement the steps of the method of extracting a central event as claimed in any one of claims 1 to 6.
- 8. A computer readable storage medium, characterized in that the computer readable storage medium has stored thereon a computer processing program which, when executed by a processor, implements the steps of the method of extracting a central event according to any of claims 1 to 6.
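One way to picture the trigger word dictionary of claims 1 and 2 is as a map from each main trigger word to its word family of secondary trigger words and associated words. The sketch below is an illustrative reading, not the patented procedure: part-of-speech tags and the split of proper nouns into verb and noun parts are assumed to be supplied by an upstream tool, difflib's character-level `SequenceMatcher` ratio stands in for whatever word similarity the method actually uses, "equal to the preset similarity" is treated as a `>= threshold` test, the claim 2 screening step is omitted, and every name (`WordFamily`, `build_families`, `expand_families`, `to_dictionary`) and data item is hypothetical.

```python
from dataclasses import dataclass, field
from difflib import SequenceMatcher
from typing import Callable, Dict, List, Optional, Set, Tuple

# (surface form, part of speech, optional (verb part, noun part) split).
# The verb/noun split of proper nouns is assumed to come from an upstream tool.
TaggedWord = Tuple[str, str, Optional[Tuple[str, str]]]


@dataclass
class WordFamily:
    main_trigger: str
    secondary_triggers: Set[str] = field(default_factory=set)
    associated_words: Set[str] = field(default_factory=set)


def split_word(word: str, pos: str,
               split: Optional[Tuple[str, str]]) -> Optional[Tuple[str, Optional[str]]]:
    """Return (verb part, noun part) for verbs and proper nouns, else None."""
    if pos == "verb":
        return word, None
    if pos == "proper_noun" and split:
        return split
    return None


def build_families(targets: List[TaggedWord]) -> Dict[str, WordFamily]:
    """Create one word family per main trigger word from the target word list."""
    families: Dict[str, WordFamily] = {}
    for word, pos, split in targets:
        parts = split_word(word, pos, split)
        if parts is None:
            continue
        verb_part, noun_part = parts
        family = families.setdefault(verb_part, WordFamily(verb_part))
        if noun_part:
            family.associated_words.add(noun_part)
    return families


def expand_families(families: Dict[str, WordFamily], candidates: List[TaggedWord],
                    similarity: Callable[[str, str], float],
                    threshold: float) -> Dict[str, WordFamily]:
    """Attach each candidate to the word family of its most similar main trigger."""
    for word, pos, split in candidates:
        parts = split_word(word, pos, split)
        if parts is None:
            continue  # only verbs and proper nouns are candidates
        verb_part, noun_part = parts
        best = max(families, key=lambda m: similarity(verb_part, m), default=None)
        # Assumption: "equal to the preset similarity" is read as ">= threshold".
        if best is None or similarity(verb_part, best) < threshold:
            continue
        family = families[best]
        if verb_part != best:
            family.secondary_triggers.add(verb_part)
        if noun_part:
            family.associated_words.add(noun_part)
    return families


def to_dictionary(families: Dict[str, WordFamily]) -> Dict[str, str]:
    """Flatten word families into a trigger word -> main trigger word lookup."""
    lookup: Dict[str, str] = {}
    for main, family in families.items():
        lookup[main] = main
        for word in family.secondary_triggers:
            lookup[word] = main
    return lookup


def char_similarity(a: str, b: str) -> float:
    # Stand-in for a word-vector similarity (assumption).
    return SequenceMatcher(None, a, b).ratio()


# Usage with hypothetical tagged words.
targets = [("exploit", "verb", None), ("attack", "verb", None),
           ("DDoS attack", "proper_noun", ("attack", "DDoS"))]
candidates = [("exploited", "verb", None),
              ("phishing attacks", "proper_noun", ("attacks", "phishing")),
              ("weather", "noun", None)]
families = expand_families(build_families(targets), candidates, char_similarity, 0.8)
trigger_dictionary = to_dictionary(families)
print(trigger_dictionary)
```

Here "exploited" joins the family of "exploit" and "attacks" joins the family of "attack" (which also collects the associated words "DDoS" and "phishing"), while the noun "weather" is never considered. Flattening the families into a word-to-main-trigger lookup is a convenient shape for the later event type lookup, but the document does not prescribe it.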
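Claims 4 and 5 describe looking up the event type of the central sentence in a table, splicing the sentence with that type, segmenting the result, and encoding it with GloVe. The sketch below walks through those steps under loudly stated assumptions: the event type table is a hypothetical mapping keyed by main trigger word (its labels borrow the ACE 2005 `Type:Subtype` naming, such as `Conflict:Attack`, but the mapping is illustrative), the `[SEP]` separator and splice order are assumptions, whitespace splitting stands in for a real word segmenter, out-of-vocabulary words get a zero vector, and all function names are hypothetical. `load_glove` parses GloVe's plain-text format, in which each line is a word followed by its vector components.

```python
from typing import Dict, Iterable, List, Optional

# Hypothetical event type table keyed by main trigger word. The labels follow
# the ACE 2005 "Type:Subtype" naming style; the mapping itself is illustrative.
EVENT_TYPE_TABLE: Dict[str, str] = {
    "attack": "Conflict:Attack",
    "arrest": "Justice:Arrest-Jail",
}


def event_type_of(tokens: List[str], trigger_lookup: Dict[str, str],
                  table: Dict[str, str] = EVENT_TYPE_TABLE) -> Optional[str]:
    """Return the event type of the first trigger word found in the sentence."""
    for token in tokens:
        main = trigger_lookup.get(token)
        if main is not None and main in table:
            return table[main]
    return None


def splice(central_sentence: str, event_type: str, sep: str = "[SEP]") -> str:
    """Concatenate the central sentence and its event type (separator is an assumption)."""
    return f"{central_sentence} {sep} {event_type}"


def load_glove(lines: Iterable[str]) -> Dict[str, List[float]]:
    """Parse GloVe's text format: a word followed by its vector components."""
    vectors: Dict[str, List[float]] = {}
    for line in lines:
        parts = line.split()
        if len(parts) > 1:
            vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors


def encode(words: List[str], vectors: Dict[str, List[float]],
           dim: int) -> List[List[float]]:
    """Look up each word; out-of-vocabulary words get a zero vector (assumption)."""
    return [vectors.get(word.lower(), [0.0] * dim) for word in words]


# Usage with hypothetical data; whitespace splitting stands in for a segmenter.
sentence = "hackers attack the bank"
event_type = event_type_of(sentence.split(), {"attack": "attack", "attacked": "attack"})
words = splice(sentence, event_type).split()
glove = load_glove(["hackers 0.1 0.2", "bank 0.3 0.4"])
word_vectors = encode(words, glove, dim=2)
print(event_type, words)
```

The resulting list of word vectors is the kind of input claim 6 feeds to the BiLSTM. In a real system the vectors would come from a pretrained GloVe file and the segmenter would match the language of the threat intelligence text.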
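Claim 6 has the CRF pick the predicted tag sequence with the highest probability. In a linear-chain CRF this maximization is commonly done with Viterbi decoding, which finds the best-scoring sequence without enumerating every candidate; because the normalizing constant is shared by all sequences, the highest-scoring path is also the most probable one. The pure-Python sketch below assumes the BiLSTM has already produced per-token emission scores (if it outputs probabilities, take their logarithms first) and that a learned tag-transition matrix is available. Start and stop transitions are omitted for brevity, and the BIO tag set (`O`, `B-ARG`, `I-ARG`), the scores, and all function names are hypothetical, since the document does not specify its labels.

```python
from typing import List, Sequence, Tuple


def viterbi_decode(emissions: Sequence[Sequence[float]],
                   transitions: Sequence[Sequence[float]]) -> Tuple[List[int], float]:
    """Return the highest-scoring tag sequence under a linear-chain CRF.

    emissions[t][j]: score of tag j at position t (e.g. BiLSTM output).
    transitions[i][j]: score of moving from tag i to tag j.
    """
    n_tags = len(transitions)
    score = list(emissions[0])
    backpointers: List[List[int]] = []
    for t in range(1, len(emissions)):
        new_score, ptrs = [], []
        for j in range(n_tags):
            # Best previous tag for ending at tag j at position t.
            best_i = max(range(n_tags), key=lambda i: score[i] + transitions[i][j])
            ptrs.append(best_i)
            new_score.append(score[best_i] + transitions[best_i][j] + emissions[t][j])
        score = new_score
        backpointers.append(ptrs)
    last = max(range(n_tags), key=lambda j: score[j])
    path = [last]
    for ptrs in reversed(backpointers):  # follow back-pointers to recover the path
        path.append(ptrs[path[-1]])
    path.reverse()
    return path, score[last]


def bio_spans(tokens: List[str], tags: List[str]) -> List[str]:
    """Collect the token spans marked by B-/I- tags."""
    spans: List[str] = []
    current: List[str] = []
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            if current:
                spans.append(" ".join(current))
            current = [token]
        elif tag.startswith("I-") and current:
            current.append(token)
        else:
            if current:
                spans.append(" ".join(current))
            current = []
    if current:
        spans.append(" ".join(current))
    return spans


# Usage with hypothetical scores.
TAGS = ["O", "B-ARG", "I-ARG"]
emissions = [[0.1, 2.0, 0.0], [0.2, 0.3, 1.5], [1.0, 0.1, 0.4]]
transitions = [[0.0, 0.0, -10.0],  # from O (O -> I-ARG strongly discouraged)
               [0.0, -1.0, 0.5],   # from B-ARG
               [0.0, 0.0, 0.2]]    # from I-ARG
path, best_score = viterbi_decode(emissions, transitions)
tokens = ["Lazarus", "Group", "struck"]
spans = bio_spans(tokens, [TAGS[i] for i in path])
print(path, best_score, spans)
```

Here the decoder returns the tag path `B-ARG, I-ARG, O` with score 5.0, and the span "Lazarus Group" is read off from it. Viterbi runs in time linear in the sentence length (quadratic in the number of tags), whereas scoring every possible tag sequence would grow exponentially.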
Description
Technical Field
The present invention relates to the field of event extraction technology, and in particular to a method, a device and a medium for extracting a central event.
Background
In the prior art, central event extraction aims to extract the central event from an input text for subsequent analysis. For example, in security threat intelligence services, open-source threat intelligence can be monitored to derive evidence-based knowledge about existing or potential threats to IT or information assets, and that knowledge provides a basis for threat-response decisions. However, current open-source threat intelligence data is voluminous and complex and is stored and published as unstructured text, which makes it difficult to extract effective, key central events from it and hampers subsequent decision-making based on that intelligence.
Disclosure of Invention
The main object of the present invention is to provide a method, device and medium for extracting a central event, so as to solve the technical problem that existing central event extraction methods have difficulty extracting effective, key central events.
In order to achieve the above object, the present invention provides a method for extracting a central event, comprising the following steps: after determining the similarity between each single sentence in a text to be extracted and the title of that text, a first weight of the trigger words contained in each single sentence, and a second weight of the network security entities contained in each single sentence, determining a central sentence according to the similarity, the first weight and the second weight of each single sentence; determining the trigger words contained in the central sentence, and determining, based on those trigger words, the event type to which the central sentence points; and processing the central sentence and the event type with a BiLSTM model and a CRF model to obtain the central event.
Optionally, before the step of determining the similarity between each single sentence in the text to be extracted and the title of that text, the first weight of the trigger words contained in each single sentence, and the second weight of the network security entities contained in each single sentence, the method further includes: constructing an initial trigger word list; screening, according to the part of speech of each word in the initial trigger word list, to obtain a target word list composed of target words whose part of speech is verb and/or proper noun; extracting the target words whose part of speech is verb from the target word list and taking them as main trigger words, and/or extracting the target words whose part of speech is proper noun from the target word list, taking the verb parts of those target words as main trigger words and the noun parts as associated words, wherein each main trigger word corresponds to one word family; receiving an input text to be processed, filtering the text to be processed, segmenting the filtered text to obtain a word list to be screened, and extracting the words whose part of speech is verb and/or proper noun from the word list to be screened to form a candidate word list; selecting, from the candidate word list, a plurality of candidate words whose similarity is equal to a preset similarity; extracting the candidate words whose part of speech is verb from the plurality of candidate words and adding them as secondary trigger words to the word families of their corresponding main trigger words, and/or extracting the candidate words whose part of speech is proper noun from the plurality of candidate words, adding the verb parts of those candidate words as secondary trigger words to the word families of their corresponding main trigger words, and adding the noun parts as associated words to the same word families; and combining the word families to which the secondary trigger words and associated words have been added to obtain a trigger word dictionary.
Optionally, after the step of combining the word families to which the secondary trigger words and associated words have been added to obtain the trigger word dictionary, the method further includes: screening each word family in the trigger word dictionary and, if screening finds in one or more word families a main trigger word, secondary trigger word and/or associated word that does not meet a preset condition, eliminating the target word and/or candidate word corresponding to that word.
Optionally, in the