CN-121234933-B - Entity identification method based on semantic analysis and label visual feature expansion


Abstract

The invention belongs to the technical field of information extraction and provides an entity identification method based on semantic analysis and label visual feature expansion. The method addresses the difficulty of determining entity boundaries by implicitly constructing embedded representations that contain sentence structure information: it calculates the similarity between each candidate entity and the global semantics of the text, and extracts the interaction features between the two. It also systematically introduces visual features into the entity recognition task, constructing a text-visual cross-modal entity recognition paradigm that enriches the contextual semantic information of the embedded representation. The method combines three levels of semantic representation (character level, sentence level, and visual features): it considers local entity semantics, integrates global context information, and introduces visual features to overcome the limitations of embedded representations based on plain text.

Inventors

DU XIAN
WANG YUHANG
LI YINGSHUN
GUO ZHANNAN
SUN XUE
ZENG XIANGAN

Assignees

Dalian University of Technology (大连理工大学)
Shenyang Shunyi Technology Co., Ltd. (沈阳顺义科技股份有限公司)

Dates

Publication Date: 20260508
Application Date: 20251202

Claims (4)

1. An entity identification method based on semantic analysis and label visual feature expansion, characterized by comprising the following steps:
Step 1, acquiring a text data set to be processed and designating a tag set for it, wherein the tag set is a set of predefined entity categories and each element of the set is an entity category;
Step 2, encoding each text sample in the text data set with a pre-trained language model to obtain the character-level embedded vector sequence of the text sample, which contains one embedded vector per character of the text sample, each having the character-level embedding dimension;
Step 3, constructing the span-based candidate entity embedded representations of the text sample from its character-level embedded vector sequence, wherein each candidate entity embedded vector corresponds to a span that begins at one character and ends at another, and is given by the semantic average of the embedded vectors of the start character and the end character; entity identification is based on the assumption that the global semantic information of a text sample is jointly determined by the local semantic information of each entity in the text sample, so the contribution of the candidate entity embedded vectors of different spans to the global semantics of the text sample is computed to assist entity recognition;
Step 4, obtaining the sentence-level embedded vector containing the global semantics of the text sample by feeding the character-level embedded vector sequence of the text sample into a Bi-LSTM module and taking the semantic average of the hidden state of the first character and the hidden state of the last character, wherein these hidden states comprise the forward and reverse embedded vectors of the first character and the forward and reverse embedded vectors of the last character output by the Bi-LSTM module;
Step 5, designing a semantic similarity module;
Step 6, designing an interaction feature extraction module for the candidate entity embedding and the sentence-level embedding of the text sample;
Step 7, designing a candidate entity category prediction module;
Step 8, designing the loss functions of all modules, wherein: during training, the semantic similarity module adopts a margin-based loss computed on the predicted entity score tensor output by the module, with a positive sample margin threshold for positive candidate entities and a negative sample margin threshold for negative candidate entities, a term of this loss being approximated by a continuous function with a parameter that adjusts the degree of approximation; the interaction feature extraction module adopts a binary cross-entropy loss function whose label takes one value when the candidate entity is a real entity and another value when it is not, the predicted entity score output by the module being computed for each candidate entity through a tensor stacking operation, a convolution operation, an element-wise normalization operation on tensors, a flattening operation, and a fully connected layer with a linear transformation matrix; the candidate entity category prediction module adopts a multi-class cross-entropy loss function over the tags in the tag set, computed from the candidate entity embedded representation; and the total loss combines the three losses through three loss weights, which are used to adjust the learning effect;
Step 9, during prediction, keeping the semantic similarity module consistent with the loss function used during training, wherein the predicted entity set of the semantic similarity module consists of the candidate entities whose score is greater than the threshold, the predicted entity set of the interaction feature extraction module consists of the candidate entities whose positive class score is greater than their negative class score, and the predicted entity set of the candidate entity category prediction module consists of the candidate entities whose predicted category label is a valid label; the final predicted entity set is the intersection of the three sets, and it is then decoded into the predefined entity categories in the tag set.
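The span construction of step 3 and the set intersection of step 9 can be illustrated with a minimal Python sketch. It assumes character embeddings are plain lists of floats; the function names and the optional `max_len` cap on span length are hypothetical and are not stated in the claim.

```python
def span_candidates(char_embeddings, max_len=None):
    """Enumerate every span (i, j) with i <= j and embed it as the
    element-wise mean of its start and end character embeddings."""
    n = len(char_embeddings)
    candidates = {}
    for i in range(n):
        # max_len is an assumed convenience; None enumerates all spans.
        last = n if max_len is None else min(n, i + max_len)
        for j in range(i, last):
            start, end = char_embeddings[i], char_embeddings[j]
            candidates[(i, j)] = [(a + b) / 2.0 for a, b in zip(start, end)]
    return candidates


def final_entities(similarity_set, interaction_set, category_set):
    """Keep only the spans accepted by all three modules."""
    return similarity_set & interaction_set & category_set


embeddings = [[0.0, 2.0], [2.0, 4.0], [4.0, 6.0]]  # toy 3-character sample
candidates = span_candidates(embeddings)
kept = final_entities({(0, 1), (1, 2)}, {(0, 1), (2, 2)}, {(0, 1), (1, 2)})
```

Enumerating all spans grows quadratically with the number of characters, which is why a length cap is often used in span-based systems; whether the claimed method applies one is not specified.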
2. The entity recognition method based on semantic analysis and tag visual feature extension according to claim 1, wherein step 5 is implemented as follows: the similarity between the embedded vector of each candidate entity, which carries local semantics, and the sentence-level embedded vector, which carries the global semantics of the text sample, is calculated and used as the predicted entity score.
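A minimal Python sketch of such a similarity score follows. The claim text does not name the similarity measure, so cosine similarity is used here purely as an assumption; the function names and the threshold value are likewise hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def similarity_scores(candidates, sentence_vec):
    """Score each candidate span against the sentence-level embedding."""
    return {span: cosine_similarity(vec, sentence_vec)
            for span, vec in candidates.items()}


def select_by_threshold(scores, threshold):
    """Predicted entity set: spans whose score exceeds the threshold."""
    return {span for span, score in scores.items() if score > threshold}


candidates = {(0, 0): [1.0, 0.0], (0, 1): [0.0, 1.0]}
scores = similarity_scores(candidates, [1.0, 0.0])
predicted = select_by_threshold(scores, 0.5)  # 0.5 is an arbitrary example value
```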
3. The entity recognition method based on semantic analysis and label visual feature extension according to claim 2, wherein step 6 is implemented as follows: to extract the interaction features between the embedded vector of the candidate entity and the sentence-level embedded vector carrying the global semantics of the text sample, the two vectors are each reshaped into a two-dimensional embedded matrix of a given height and width, and the two matrices are stacked into a two-channel embedded tensor through a tensor stacking operation; the stacked two-channel embedded tensor is convolved and activated to extract the interaction features between the embedded vector of the candidate entity and the sentence-level embedded vector; the resulting feature map is average-pooled, flattened, and then input to a fully connected layer for classification to obtain the predicted entity score.
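The reshape, stack, convolve, pool, flatten, and classify pipeline described in this claim can be sketched in plain Python as below. The matrix size, the single convolution kernel, the ReLU activation, the 2 x 2 pooling window, the fixed weights, and the convention that index 0 is the positive class are all assumptions made for illustration; none of them is specified in the claim.

```python
def reshape(vec, height, width):
    """Reshape a flat vector into a height x width matrix (row-major)."""
    assert len(vec) == height * width, "length must equal height * width"
    return [vec[r * width:(r + 1) * width] for r in range(height)]


def conv2d(tensor, kernel):
    """Valid cross-correlation of a [C][H][W] tensor with one [C][kh][kw] kernel."""
    h, w = len(tensor[0]), len(tensor[0][0])
    kh, kw = len(kernel[0]), len(kernel[0][0])
    out = []
    for r in range(h - kh + 1):
        row = []
        for c in range(w - kw + 1):
            row.append(sum(
                tensor[ch][r + dr][c + dc] * kernel[ch][dr][dc]
                for ch in range(len(tensor))
                for dr in range(kh)
                for dc in range(kw)
            ))
        out.append(row)
    return out


def avg_pool(fmap, size=2):
    """Non-overlapping average pooling with a size x size window."""
    pooled = []
    for r in range(0, len(fmap) - size + 1, size):
        row = []
        for c in range(0, len(fmap[0]) - size + 1, size):
            window = [fmap[r + dr][c + dc]
                      for dr in range(size) for dc in range(size)]
            row.append(sum(window) / len(window))
        pooled.append(row)
    return pooled


def interaction_scores(cand_vec, sent_vec, height, width,
                       kernel, fc_weights, fc_bias):
    # Channel 0: candidate entity matrix, channel 1: sentence matrix.
    tensor = [reshape(cand_vec, height, width),
              reshape(sent_vec, height, width)]
    fmap = conv2d(tensor, kernel)
    fmap = [[max(0.0, x) for x in row] for row in fmap]  # ReLU (assumed)
    flat = [x for row in avg_pool(fmap) for x in row]    # pool, then flatten
    # Fully connected layer: one output per class.
    return [sum(w * x for w, x in zip(ws, flat)) + b
            for ws, b in zip(fc_weights, fc_bias)]


cand = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
sent = [1.0] * 9
kernel = [[[1.0, 0.0], [0.0, 1.0]], [[1.0, 1.0], [1.0, 1.0]]]
scores = interaction_scores(cand, sent, 3, 3, kernel,
                            [[0.5], [-0.5]], [0.0, 0.0])
is_entity = scores[0] > scores[1]  # positive class score vs negative class score
```

In practice the kernel and fully connected weights would be learned with the binary cross-entropy loss of step 8 rather than fixed by hand.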
4. The entity recognition method based on semantic analysis and label visual feature extension according to claim 3, wherein step 7 is implemented as follows: representative pictures are collected for each entity category in the tag set; each picture is encoded by a visual pre-trained model to obtain the image embedded vectors of the entity category tag; the image embedded vectors under each entity category tag are fused by a self-attention mechanism and then averaged position by position to obtain the image embedding of that entity category tag; a cross-attention mechanism is then used to obtain an embedded representation carrying the candidate entity tag information, with the reshaped candidate entity embedded representation as the query and the entity category tag image embedded representations, one per tag, as the keys and values; the embedded representation of the candidate entity tag information is concatenated with the reshaped candidate entity embedded representation to obtain a candidate entity embedded representation containing tag information; finally, the candidate entity embedded representation containing tag information is input to a fully connected layer for classification to obtain the candidate entity category scores, which indicate the degree to which the candidate entity belongs to each category.
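The label-image fusion in this claim can be sketched as follows. The sketch assumes scaled dot-product attention without learned projection matrices, image and candidate embeddings of equal dimension, and a particular concatenation order; these choices and all function names are illustrative assumptions, not details taken from the claim.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention (assumed form, no learned projections)."""
    scale = math.sqrt(len(keys[0]))
    out = []
    for q in queries:
        logits = [sum(a * b for a, b in zip(q, k)) / scale for k in keys]
        weights = softmax(logits)
        out.append([sum(w * v[i] for w, v in zip(weights, values))
                    for i in range(len(values[0]))])
    return out


def mean_by_position(vectors):
    """Average a list of equal-length vectors position by position."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def label_image_embedding(image_vectors):
    """Fuse the images of one entity category with self-attention, then average."""
    fused = attention(image_vectors, image_vectors, image_vectors)
    return mean_by_position(fused)


def tag_aware_candidate(candidate_vec, label_embeddings):
    """Cross-attention: candidate as query, label image embeddings as keys and values."""
    tag_info = attention([candidate_vec], label_embeddings, label_embeddings)[0]
    return candidate_vec + tag_info  # list concatenation; order is an assumption


labels = [label_image_embedding([[1.0, 3.0]]),   # one toy image per category
          label_image_embedding([[3.0, 5.0]])]
enriched = tag_aware_candidate([0.0, 0.0], labels)
```

The enriched vector would then be passed to a fully connected layer that outputs one score per entity category.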

Description

Entity identification method based on semantic analysis and label visual feature expansion

Technical Field

The invention belongs to the technical field of information extraction and relates to an entity identification method based on semantic analysis and label visual feature expansion.

Background

With the spread of computers and the rapid development of the Internet, a great deal of information is presented to people in the form of electronic documents. To cope with the serious challenge of information explosion, automated tools are urgently needed to help people quickly find the information they actually need in massive information sources; research on information extraction technology arose against this background. An information extraction system mainly extracts specific factual information from text. For example, details of terrorist events (time, place, perpetrator, victim, attack target, weapon used) are extracted from news reports; details of new product releases (company name, product name, release time, product performance) are extracted from economic news; and symptoms, diagnosis records, examination results, and prescriptions are extracted from patient medical records. The extracted information is typically described in a structured form that can be stored directly in a database for user queries and further analysis. In the late 1980s, the Message Understanding Conference (MUC) series made information extraction an important branch of natural language processing. MUC established Named Entity Recognition (NER) as a core subtask of information extraction in Natural Language Processing (NLP). The task of NER is to determine whether a text string in unstructured text represents a named entity and, if so, to determine the category of that named entity.
The development of named entity recognition is therefore important for extracting the (entity-relation-entity) triples used to construct knowledge graphs, and for downstream applications such as semantic analysis of natural language questions in intelligent question answering and semantic search. In recent years, with the rapid development of deep learning, and in particular the wide application of pre-trained language models based on the Transformer architecture (such as BERT and RoBERTa), the performance of NER has improved significantly. These Pretrained Language Models (PLMs) employ the attention mechanism and are pre-trained in a self-supervised manner on large-scale corpora; they generate context-dependent character or word embeddings that provide a powerful feature basis for NLP tasks. Deep-learning-based named entity recognition mainly follows two paradigms: 1) sequence labeling methods (such as BiLSTM-CRF and BERT-CRF), which convert entity recognition into a classification task for each character; and 2) span-based methods, which directly enumerate all possible text spans, judge whether each span is an entity, and then determine its category. However, existing entity recognition schemes have shortcomings. On the one hand, they do not make full use of sentence structure information. Prior studies have explored providing word-level knowledge for Chinese NER by introducing lexical information, which benefits boundary detection, but these approaches ignore the importance of the sentence-level linguistic knowledge provided by syntactic information, as noted by Xiao et al. in "DuST: Chinese NER using dual-grained syntax-aware transformer network".
Jie and Lu show in "Dependency-Guided LSTM-CRF for Named Entity Recognition" that capturing long-distance dependencies and syntactic relations between words in a sentence by encoding the complete dependency tree structure improves the performance of named entity recognition. On the other hand, existing schemes make little use of multimodal information. Moon et al., in "Multimodal Named Entity Recognition for Short Social Media Posts", point out that although many NER studies have succeeded in recognizing entities in formal text through word-context parsing and character-level feature fusion, social media posts are extremely short and noisy, and such short text lacks sufficient context to disambiguate ambiguous entities. Liu et al., in "Hierarchical Aligned Multimodal Learning for NER on Tweet Posts", indicate that supplementing visual information can alleviate the semantic ambiguity and insufficient information of plain text. Based on the technical analysis, a research and development conception for entity identification based on semantic analysis and label visual feature expansion is provided, the problem that an entity boun