CN-115879449-B - Entity relation extraction method and device, electronic equipment and storage medium

Abstract

The embodiments of the invention disclose an entity relation extraction method and device, electronic equipment and a storage medium. The method comprises: acquiring at least one piece of text information from a preset text library; determining the embedded vectors corresponding to the text information; determining the vector pairs corresponding to any two embedded vectors, and storing each vector pair as a marking combination result; and determining the entity relation corresponding to the text information according to the marking combination result and a pre-trained classifier. Because the entity relation is extracted by having the pre-trained classifier analyze the embedded vectors of the text information, the embodiments can handle entity nesting and overlapping entity pairs, reduce errors, mitigate poor generalization, and improve the accuracy of entity relation extraction, while keeping the elements of each triple closely correlated during extraction.

Inventors

  • ZHU TIANYOU
  • WANG LUTAO
  • LI BO
  • Bian Jingchen
  • CHEN ZHENYU
  • LI JIWEI
  • CHEN SIYU
  • Liu Pufan

Assignees

  • Big Data Center of State Grid Corporation of China (国家电网有限公司大数据中心)

Dates

Publication Date
2026-05-12
Application Date
2022-11-07

Claims (8)

  1. A method for extracting an entity relationship, comprising:
     acquiring at least one piece of text information from a preset text library;
     determining the embedded vectors corresponding to the text information;
     determining the vector pairs corresponding to any two embedded vectors, and storing each vector pair as a marking combination result; and
     determining the entity relationship corresponding to the text information according to the marking combination result and a pre-trained classifier;
     wherein determining the vector pairs corresponding to any two embedded vectors and storing each vector pair as a marking combination result comprises:
     combining any two embedded vectors by enumeration to form the corresponding vector pairs, wherein at least one vector pair comprises two embedded vectors; and
     taking each vector pair as a corresponding marking combination result, wherein the marking combination result comprises a beginning embedded vector pair, formed from the beginning embedded vectors corresponding respectively to a head entity and a tail entity of the text information, and an ending embedded vector pair, formed from the ending embedded vectors corresponding respectively to the head entity and the tail entity, wherein the head entity is the head corresponding to an entity in the text information, and the tail entity is the tail corresponding to the entity in the text information;
     and wherein determining the entity relationship corresponding to the text information according to the marking combination result and the pre-trained classifier comprises:
     inputting the marking combination result into the pre-trained classifier, wherein the classifier at least comprises two classes of classifiers;
     determining, according to the pre-trained classifier, a first embedded vector label corresponding to the beginning embedded vector pair and a second embedded vector label corresponding to the ending embedded vector pair; and
     decoding a head entity, a tail entity and a corresponding entity relationship type for the text information according to the first embedded vector label and the second embedded vector label.
  2. The method of claim 1, further comprising, prior to acquiring the at least one piece of text information from the preset text library: performing text preprocessing on the text information; and storing the text information obtained after the text preprocessing in the preset text library.
  3. The method of claim 2, wherein the text preprocessing includes at least one of: segmenting the at least one piece of text information into words using a word segmentation tool, and labeling the part of speech of each word produced by the segmentation; removing stop words included in the at least one piece of text information; performing grammatical structure analysis on the at least one piece of text information according to pre-configured grammar processing rules, and obtaining an analysis result; and performing a de-duplication operation on the at least one piece of text information using a preset text similarity algorithm.
  4. The method of claim 1, wherein determining the embedded vectors corresponding to the text information comprises: inputting the text information into a preset pre-trained language model; and encoding the text information into the corresponding embedded vectors according to the preset pre-trained language model, wherein the text information comprises at least one word, and each word corresponds to at least one embedded vector.
  5. The method of claim 1, wherein the first embedded vector label and the second embedded vector label comprise at least three types of embedded vector labels.
  6. An entity relationship extraction apparatus, comprising:
     an acquisition module, used for acquiring at least one piece of text information from a preset text library;
     a first determining module, used for determining the embedded vectors corresponding to the text information;
     a second determining module, used for determining the vector pairs corresponding to any two embedded vectors and storing each vector pair as a marking combination result; and
     a relationship determining module, used for determining the entity relationship corresponding to the text information according to the marking combination result and a pre-trained classifier;
     wherein the second determining module includes:
     a combination unit, used for combining any two embedded vectors by enumeration to form the corresponding vector pairs, wherein at least one vector pair comprises two embedded vectors; and
     a result determining unit, used for taking each vector pair as a corresponding marking combination result, wherein the marking combination result comprises a beginning embedded vector pair, formed from the beginning embedded vectors corresponding respectively to a head entity and a tail entity of the text information, and an ending embedded vector pair, formed from the ending embedded vectors corresponding respectively to the head entity and the tail entity, wherein the head entity is the head corresponding to an entity in the text information, and the tail entity is the tail corresponding to the entity in the text information;
     and wherein the relationship determining module includes:
     an input unit, used for inputting the marking combination result into the pre-trained classifier, wherein the classifier at least comprises two classes of classifiers;
     a label determining unit, used for determining, according to the pre-trained classifier, a first embedded vector label corresponding to the beginning embedded vector pair and a second embedded vector label corresponding to the ending embedded vector pair; and
     a decoding unit, used for decoding the head entity, the tail entity and the corresponding entity relationship type for the text information according to the first embedded vector label and the second embedded vector label.
  7. An electronic device, comprising: at least one processor; and a memory communicatively coupled to the at least one processor, wherein the memory stores a computer program executable by the at least one processor, so that the at least one processor can perform the entity relationship extraction method of any one of claims 1-5.
  8. A computer-readable storage medium storing computer instructions that, when executed, cause a processor to implement the entity relationship extraction method of any one of claims 1-5.
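
The decoding step recited in claim 1 can be illustrated with a minimal Python sketch. The claims do not specify the label format or the matching rule, so several things here are assumptions made only for illustration: the labels are modeled as dictionaries that map a (head-side index, tail-side index) position to a relation type, the classifier output is hard-coded, each beginning pair is matched with the nearest ending pair of the same relation type, and the name `decode_triples` is hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Position = Tuple[int, int]     # (head-side token index, tail-side token index)
Triple = Tuple[str, str, str]  # (head entity, relation type, tail entity)


def decode_triples(
    tokens: Sequence[str],
    begin_labels: Dict[Position, str],
    end_labels: Dict[Position, str],
) -> List[Triple]:
    """Rebuild (head, relation, tail) triples from labeled token pairs.

    begin_labels marks pairs made of the first token of the head entity
    and the first token of the tail entity; end_labels marks pairs made
    of their last tokens. The matching rule is an assumption of this sketch.
    """
    triples: List[Triple] = []
    for (head_begin, tail_begin), relation in sorted(begin_labels.items()):
        # Keep only ending pairs of the same relation that close both spans.
        candidates = [
            (head_end, tail_end)
            for (head_end, tail_end), rel in end_labels.items()
            if rel == relation and head_end >= head_begin and tail_end >= tail_begin
        ]
        if not candidates:
            continue
        # Assumed rule: prefer the ending pair that yields the shortest spans.
        head_end, tail_end = min(
            candidates,
            key=lambda p: (p[0] - head_begin) + (p[1] - tail_begin),
        )
        head = " ".join(tokens[head_begin:head_end + 1])
        tail = " ".join(tokens[tail_begin:tail_end + 1])
        triples.append((head, relation, tail))
    return triples


# Hard-coded stand-in for the classifier output (illustrative only).
tokens = ["Marie", "Curie", "was", "born", "in", "Warsaw"]
begin_labels = {(0, 5): "born_in"}  # "Marie" begins the head, "Warsaw" begins the tail
end_labels = {(1, 5): "born_in"}    # "Curie" ends the head, "Warsaw" ends the tail
triples = decode_triples(tokens, begin_labels, end_labels)
print(triples)  # [('Marie Curie', 'born_in', 'Warsaw')]
```

Because positions are ordered (head side first, tail side second), the same routine also handles sentences in which the tail entity appears before the head entity.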

Description

Entity relation extraction method and device, electronic equipment and storage medium

Technical Field

The present invention relates to the field of natural language processing, and in particular to an entity relationship extraction method and apparatus, an electronic device, and a storage medium.

Background

In recent years, internet information technology has developed rapidly, and websites such as news sites and social networks generate massive amounts of new data every day. These data cover a wide variety of content, including a great deal of valuable information that plays a vital role in people's lives. The concept of the knowledge graph was proposed in order to extract and make effective use of this information. Although existing knowledge graphs already contain hundreds of millions of records, the information on the network keeps growing every day, and the knowledge graphs need to be updated accordingly.

In the prior art, the conventional pipeline method first identifies the entities and then extracts a relation for each possible entity pair. This approach is easy to carry out, but it ignores the close interdependence between entities and relations, so the final recognition result may suffer from errors and poor generalization. Other methods optimize the sentence feature representation with a relation-based attention mechanism and achieve joint extraction by mapping relation extraction onto entity pairs. This design alleviates the relation overlap problem to some extent, but overlapping entities are difficult to recognize on the basis of relations, the interaction remains insufficient, and relation overlap is still hard to resolve efficiently.

In general, although existing methods greatly improve the interaction between entities and relations, they ignore the close correlation among the elements of a triple during extraction, which can lead to errors and poor generalization, and their recall is low when handling overlapping relations.

Disclosure of Invention

In view of this, the invention provides an entity relation extraction method and device, an electronic device and a storage medium, which can handle entity nesting and overlapping entity pairs, reduce errors and poor generalization while keeping the elements of a triple closely correlated during extraction, and improve the accuracy of entity relation extraction.

According to an aspect of the present invention, an embodiment of the present invention provides a method for extracting an entity relationship, including: acquiring at least one piece of text information from a preset text library; determining the embedded vectors corresponding to the text information; determining the vector pairs corresponding to any two embedded vectors, and storing each vector pair as a marking combination result; and determining the entity relationship corresponding to the text information according to the marking combination result and a pre-trained classifier.

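
The pair-enumeration step of the method summarized above can be illustrated with a minimal Python sketch. It assumes that token-level embedded vectors are already available as plain lists of floats. The names `VectorPair` and `enumerate_vector_pairs`, the choice of ordered pairs (including a vector paired with itself), and the toy vectors are illustrative assumptions rather than details given in the patent text.

```python
from itertools import product
from typing import List, NamedTuple, Sequence, Tuple


class VectorPair(NamedTuple):
    """One candidate pair, stored so that a classifier can label it later."""
    i: int  # index of the first embedded vector
    j: int  # index of the second embedded vector
    vectors: Tuple[Tuple[float, ...], Tuple[float, ...]]


def enumerate_vector_pairs(embedded: Sequence[Sequence[float]]) -> List[VectorPair]:
    """Enumerate every ordered pair (i, j) of embedded vectors.

    Ordered pairs are used (an assumption of this sketch) so that an entity
    on the first side may appear before or after the one on the second side.
    """
    pairs: List[VectorPair] = []
    for i, j in product(range(len(embedded)), repeat=2):
        pairs.append(VectorPair(i, j, (tuple(embedded[i]), tuple(embedded[j]))))
    return pairs


# Toy 2-dimensional embedded vectors for a three-token text (illustrative only).
embedded = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
pairs = enumerate_vector_pairs(embedded)
print(len(pairs))  # 9
```

Exhaustive enumeration grows quadratically with the number of embedded vectors, which is the cost of being able to label any pair of positions, including nested or overlapping ones.
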
According to another aspect of the present invention, an embodiment of the present invention further provides an entity relationship extraction apparatus, including: an acquisition module, used for acquiring at least one piece of text information from a preset text library; a first determining module, used for determining the embedded vectors corresponding to the text information; a second determining module, used for determining the vector pairs corresponding to any two embedded vectors and storing each vector pair as a marking combination result; and a relationship determining module, used for determining the entity relationship corresponding to the text information according to the marking combination result and a pre-trained classifier.

According to another aspect of the present invention, an embodiment of the present invention further provides an electronic device, including: at least one processor; and a memory communicatively coupled to the at least one processor, wherein the memory stores a computer program executable by the at least one processor, so that the at least one processor can perform the entity relationship extraction method of any one of the embodiments of the present invention.

According to another aspect of the present invention, an embodiment of the present invention further provides a computer-readable storage medium storing computer instructions, where the computer instructions are configured to cause a processor to implement the entity relationship extraction method according to any one of the embodiments of the present invention.

According to the technical scheme, at least one piece of text information in a preset text library is acquired, embedded vectors corresponding to the text information