CN-121997930-A - Method and device for extracting weak supervision relation of collaborative rule discovery and prompt learning

Abstract

The application relates to a weakly supervised relation extraction method and device based on collaborative rule discovery and prompt learning. The method comprises: dividing data in a corpus to obtain a labeled data set and an unlabeled data set; mining the labeled data set based on prompt learning to obtain a mined rule set; judging whether the unlabeled data set is empty and, if it is not, extracting unlabeled data from the unlabeled data set and labeling it to obtain preliminary labeled data; performing relation extraction on the preliminary labeled data based on the mined rule set and a target relation extraction mode to obtain a corresponding relation extraction result; judging, based on the relation extraction result, whether the preliminary labeled data meets a preset deletion condition; and, if the condition is met, deleting the preliminary labeled data from the unlabeled data set, and otherwise adding the preliminary labeled data to a manual verification set. This addresses the technical problem in the related art that, in a weakly supervised setting, the small amount of labeled data leads to errors and omissions in relation extraction results.

Inventors

HONG LIANG
HOU WENJUN

Assignees

Wuhan University (武汉大学)

Dates

Publication Date: 2026-05-08
Application Date: 2024-11-07

Claims (10)

1. A weakly supervised relation extraction method based on collaborative rule discovery and prompt learning, characterized by comprising the following steps: dividing data in a corpus to obtain a labeled data set and an unlabeled data set; mining the labeled data set based on prompt learning to obtain a mined rule set; judging whether the unlabeled data set is an empty set and, if the unlabeled data set is not an empty set, extracting unlabeled data from the unlabeled data set for labeling to obtain preliminary labeled data, and performing relation extraction on the preliminary labeled data based on the mined rule set and a target relation extraction mode to obtain a corresponding relation extraction result; and judging, based on the relation extraction result, whether the preliminary labeled data meets a preset deletion condition, deleting the preliminary labeled data from the unlabeled data set if the preset deletion condition is met, and otherwise adding the preliminary labeled data to a manual verification set.
2. The method of claim 1, wherein mining the labeled data set based on prompt learning to obtain a mined rule set comprises: preprocessing the labeled data set to obtain a text data set; comparing a plurality of text sequences in the text data set, and mining common subsequences meeting a preset mining condition as rule bodies to obtain a mining result; and, based on the mining result, continuing to mine common subsequences meeting the preset mining condition from the remaining text sequences until a preset stopping condition is reached, so as to obtain the mined rule set.
3. The method according to claim 1, wherein performing relation extraction on the preliminary labeled data based on the mined rule set and the target relation extraction mode to obtain a corresponding relation extraction result comprises: constructing a rule semantic graph and a sentence semantic graph, respectively, based on the mined rule set and on the structure and semantic information of sentences in the preliminary labeled data; calculating the similarity between the rule semantic graph and the sentence semantic graph; and obtaining the relation extraction result based on the similarity.
4. The method according to claim 1, wherein performing relation extraction on the preliminary labeled data based on the mined rule set and the target relation extraction mode to obtain a corresponding relation extraction result comprises: extracting, from the mined rule set, the rule with the highest degree of association with the sentence, and integrating the rule into a prompt template of a pre-trained language model to construct the input of the pre-trained language model; and constraining the output of the pre-trained language model using a prototype-based mapper method to obtain the relation extraction result.
5. The method according to claim 1, wherein judging whether the preliminary labeled data meets a preset deletion condition based on the relation extraction result comprises: obtaining the relation extraction result of the previous round; judging whether the relation extraction result is consistent with the relation extraction result of the previous round; and, if the relation extraction result is consistent with the relation extraction result of the previous round, determining that the preliminary labeled data meets the preset deletion condition.
6. A weakly supervised relation extraction device based on collaborative rule discovery and prompt learning, comprising: a division module, used for dividing data in a corpus to obtain a labeled data set and an unlabeled data set; a mining module, used for mining the labeled data set based on prompt learning to obtain a mined rule set; an extraction module, used for judging whether the unlabeled data set is an empty set and, if the unlabeled data set is not an empty set, extracting unlabeled data from the unlabeled data set for labeling to obtain preliminary labeled data, and performing relation extraction on the preliminary labeled data based on the mined rule set and a target relation extraction mode to obtain a corresponding relation extraction result; and a judging module, used for judging, based on the relation extraction result, whether the preliminary labeled data meets a preset deletion condition, deleting the preliminary labeled data from the unlabeled data set if the preset deletion condition is met, and otherwise adding the preliminary labeled data to a manual verification set.
7. The apparatus of claim 6, wherein the mining module comprises: a processing unit, used for preprocessing the labeled data set to obtain a text data set; a first mining unit, used for comparing a plurality of text sequences in the text data set and mining common subsequences meeting a preset mining condition as rule bodies to obtain a mining result; and a second mining unit, used for continuing, based on the mining result, to mine common subsequences meeting the preset mining condition from the remaining text sequences until a preset stopping condition is reached, so as to obtain the mined rule set.
8. The apparatus of claim 6, wherein the extraction module comprises: a first construction unit, used for constructing a rule semantic graph and a sentence semantic graph, respectively, based on the mined rule set and on the structure and semantic information of sentences in the preliminary labeled data; a calculating unit, used for calculating the similarity between the rule semantic graph and the sentence semantic graph; and an acquisition unit, used for obtaining the relation extraction result based on the similarity.
9. An electronic device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor executes the program to implement the weakly supervised relation extraction method based on collaborative rule discovery and prompt learning of any one of claims 1-5.
10. A computer-readable storage medium having stored thereon a computer program, wherein the program, when executed by a processor, implements the weakly supervised relation extraction method based on collaborative rule discovery and prompt learning of any one of claims 1-5.
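
The round-based loop of claims 1 and 5 can be illustrated with a minimal Python sketch. It is not the patented implementation. The names `run_rounds` and `toy_extract` are hypothetical, the extractor is a stand-in for the combined rule and prompt-learning extraction, and two behaviors are assumptions that the claims do not state: the first round only records results because no previous round exists, and data sent to manual verification also leaves the unlabeled pool.

```python
from typing import Callable, Dict, List, Tuple


def run_rounds(
    unlabeled: List[str],
    extract: Callable[[str, int], str],
) -> Tuple[Dict[str, str], List[str]]:
    """Iterate until the unlabeled pool is empty (claim 1).

    A sentence is deleted from the pool as accepted when its extraction
    result matches the previous round's result (claim 5). Otherwise it is
    routed to the manual verification set.
    """
    remaining = list(dict.fromkeys(unlabeled))  # drop duplicates, keep order
    previous: Dict[str, str] = {}
    accepted: Dict[str, str] = {}
    manual_verification: List[str] = []
    round_no = 0
    while remaining:  # "is the unlabeled data set empty?"
        next_remaining: List[str] = []
        for sentence in remaining:
            result = extract(sentence, round_no)
            if sentence not in previous:
                # Assumption: nothing to compare with yet, keep in the pool.
                next_remaining.append(sentence)
            elif previous[sentence] == result:
                accepted[sentence] = result  # deletion condition met
            else:
                # Assumption: a human resolves it, so it leaves the pool.
                manual_verification.append(sentence)
            previous[sentence] = result
        remaining = next_remaining
        round_no += 1
    return accepted, manual_verification


def toy_extract(sentence: str, round_no: int) -> str:
    """Hypothetical extractor: stable on one pattern, unstable otherwise."""
    if "born in" in sentence:
        return "place_of_birth"
    return "founder_of" if round_no % 2 == 0 else "no_relation"


accepted, manual = run_rounds(
    ["Ada was born in London", "Bob met Acme"], toy_extract
)
```

In this toy run the first sentence receives the same label in both rounds and is accepted, while the second receives different labels and is sent to manual verification.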

Description

Method and device for extracting weak supervision relation of collaborative rule discovery and prompt learning

Technical Field

The application relates to the technical field of relation extraction, and in particular to a weakly supervised relation extraction method and device based on collaborative rule discovery and prompt learning.

Background

Relation extraction extracts structured information from unstructured and semi-structured corpora and is the basis of text analysis and mining. It helps to construct domain knowledge bases and supports intelligent applications such as information retrieval and knowledge recommendation. However, because of factors such as the need for domain expertise, the high cost of manual labeling, and data updates, relation extraction methods commonly suffer from sparse training data and a lack of prior knowledge. Rules are typically used to annotate data automatically. However, because rules rely on fixed pattern matching, they may miss triples, i.e., triples that express the target relation but are left unlabeled, which limits the coverage of rule discovery. Pre-trained language models now show stronger scalability in zero-shot natural language processing scenarios and can generate usable training data for small vertical-domain models.
In the related art, relation extraction is performed by combining rules with a pre-trained language model, but it remains difficult to avoid errors and omissions in the relation extraction result caused by the small amount of labeled data in a weakly supervised setting. For example, when the pre-trained language model lacks relevant domain knowledge, it may generate erroneous triples that are inconsistent with the true labels; weakly related and irrelevant information contained in the rules may mislead the pre-trained language model; and, given limited prior knowledge, it is difficult to verify the relation extraction results of the rules and the pre-trained language model one by one. Improvement is therefore needed.

Disclosure of Invention

The application provides a weakly supervised relation extraction method and device based on collaborative rule discovery and prompt learning, to solve the technical problem in the related art that, in a weakly supervised setting, the small amount of labeled data leads to errors and omissions in relation extraction results.
An embodiment of the first aspect of the application provides a weakly supervised relation extraction method based on collaborative rule discovery and prompt learning, comprising the following steps: dividing data in a corpus to obtain a labeled data set and an unlabeled data set; mining the labeled data set based on prompt learning to obtain a mined rule set; judging whether the unlabeled data set is an empty set and, if the unlabeled data set is not an empty set, extracting unlabeled data from the unlabeled data set for labeling to obtain preliminary labeled data; performing relation extraction on the preliminary labeled data based on the mined rule set and a target relation extraction mode to obtain a corresponding relation extraction result; judging, based on the relation extraction result, whether the preliminary labeled data meets a preset deletion condition; and deleting the preliminary labeled data from the unlabeled data set if the preset deletion condition is met, and otherwise adding the preliminary labeled data to a manual verification set.
Optionally, in one embodiment of the present application, mining the labeled data set based on prompt learning to obtain a mined rule set includes: preprocessing the labeled data set to obtain a text data set; comparing a plurality of text sequences in the text data set, and mining common subsequences satisfying a preset mining condition as rule bodies to obtain a mining result; and, based on the mining result, continuing to mine common subsequences satisfying the preset mining condition from the remaining text sequences until a preset stopping condition is reached, to obtain the mined rule set.
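
As an illustration of the common-subsequence mining step, the following minimal Python sketch may help. It is not the application's algorithm. It assumes token-level longest common subsequences, a hypothetical mining condition (`min_len` tokens shared by at least `min_support` sequences), and a hypothetical stopping condition (`max_rules` reached or fewer than two sequences left). The prompt-learning component is not modeled, and the function names and the `[E1]`/`[E2]` entity placeholders are illustrative only.

```python
from typing import List, Sequence, Tuple


def lcs(a: Sequence[str], b: Sequence[str]) -> List[str]:
    """Longest common subsequence of two token lists (dynamic programming)."""
    m, n = len(a), len(b)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m - 1, -1, -1):
        for j in range(n - 1, -1, -1):
            if a[i] == b[j]:
                dp[i][j] = 1 + dp[i + 1][j + 1]
            else:
                dp[i][j] = max(dp[i + 1][j], dp[i][j + 1])
    out: List[str] = []
    i = j = 0
    while i < m and j < n:  # walk the table to recover one LCS
        if a[i] == b[j]:
            out.append(a[i])
            i, j = i + 1, j + 1
        elif dp[i + 1][j] >= dp[i][j + 1]:
            i += 1
        else:
            j += 1
    return out


def is_subsequence(pattern: Sequence[str], tokens: Sequence[str]) -> bool:
    """True if `pattern` occurs in `tokens` in order, gaps allowed."""
    it = iter(tokens)
    return all(tok in it for tok in pattern)


def mine_rules(
    sequences: Sequence[Sequence[str]],
    min_len: int = 3,       # assumed mining condition
    min_support: int = 2,   # assumed mining condition
    max_rules: int = 10,    # assumed stopping condition
) -> List[Tuple[str, ...]]:
    remaining = [list(s) for s in sequences]
    rules: List[Tuple[str, ...]] = []
    while len(remaining) >= 2 and len(rules) < max_rules:
        seed = remaining[0]
        best: List[str] = []
        for other in remaining[1:]:
            candidate = lcs(seed, other)
            if len(candidate) > len(best):
                best = candidate
        covered = [s for s in remaining if is_subsequence(best, s)]
        if len(best) < min_len or len(covered) < min_support:
            remaining = remaining[1:]  # seed yields no rule, move on
            continue
        rules.append(tuple(best))
        # Continue mining on the sequences the new rule does not cover.
        remaining = [s for s in remaining if not is_subsequence(best, s)]
    return rules


corpus = [
    "[E1] was born in [E2]".split(),
    "[E1] , a poet , was born in [E2]".split(),
    "[E1] founded [E2] in 1990".split(),
    "[E1] founded the company [E2]".split(),
]
rules = mine_rules(corpus)
```

On this toy corpus the sketch first mines a rule body shared by the two "born in" sentences, then repeats the search on the two remaining sentences, mirroring the "mine, then continue on the remaining text sequences" structure described above.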
Optionally, in an embodiment of the present application, performing relation extraction on the preliminary labeled data based on the mined rule set and the target relation extraction mode to obtain a corresponding relation extraction result includes: constructing a rule semantic graph and a sentence semantic graph, respectively, based on the mined rule set and on the structure and semantic information of sentences in the preliminary labeled data; calculating the similarity between the rule semantic graph and the sentence semantic graph; and obtaining the relation extraction result based on the similarity.
Optionally, in one embodiment of the present application, performing relation extraction on the preliminary labeled data based on the mined rule set and the target relation extraction mode to obtain a corresponding relation extraction result includes extracting a rule with the highest sentence association degree