
CN-116662924-B - Aspect-level multimodal sentiment analysis method based on dual-channel and attention mechanism


Abstract

The invention discloses an aspect-level multimodal sentiment analysis method based on a dual-channel and attention mechanism. Built on neural networks, the method extracts the sentiment information contained in image features at multiple scales by combining aspect-word features and sentence features, and introduces a GCN (graph convolutional network) into the aspect-level multimodal sentiment analysis task, greatly improving the feature extraction and interactive fusion capability of the model. A pre-trained encoder is adopted in the feature extraction layer to extract aspect-word, sentence, and image features; after the aspect-word and sentence features are fused bidirectionally in the attention mechanism layer, the final aspect-word and sentence feature representations are obtained. For the image features, an image feature extraction network is built with a channel attention mechanism and a spatial attention mechanism, and finally the interactive fusion features of all modalities are dynamically extracted by a GCN module. In experiments, the method improves the performance indices of aspect-level multimodal sentiment analysis on the benchmark datasets.

Inventors

  • Liang Yan
  • Hou Zenghui
  • Yin Entong
  • Chen Sixu
  • Xu Lu

Assignees

  • 重庆邮电大学 (Chongqing University of Posts and Telecommunications)

Dates

Publication Date
2026-05-08
Application Date
2023-03-20

Claims (7)

  1. An aspect-level multimodal sentiment analysis method based on a dual-channel and attention mechanism, characterized by comprising the following steps:
     Step 1, extracting hidden feature representations of the sentence features and aspect-word features in the dataset with a BERT pre-trained encoder, and extracting image features with a ResNet-152 pre-trained network;
     Step 2, computing the feature correlation between the sentence features and the aspect-word features through a multi-head attention mechanism, so that highly similar features receive correspondingly larger attention weights;
     Step 3, weighting the image features with the sentence-guided aspect-word features, and obtaining the image channel features through a channel attention mechanism;
     Step 4, weighting the image channel features with the aspect-word-guided sentence features, and generating a spatial attention map from the spatial relations of the features in a spatial attention mechanism to obtain the final feature representation of the image;
     Step 5, computing a dynamic adjacency matrix from the aspect-word-guided sentence features and the final image features generated by channel attention and spatial attention; and
     Step 6, classifying, through a classification module with a pooling mechanism, the final fusion features together with the aspect-word features and sentence features obtained in step 2 through the multi-head attention mechanism.
  2. The aspect-level multimodal sentiment analysis method based on a dual-channel and attention mechanism according to claim 1, wherein step 1 extracts hidden feature representations of the sentence features and aspect-word features in the dataset with a BERT pre-trained encoder and extracts the image features with a ResNet-152 pre-trained network, specifically: sentence and aspect-word feature information is output by two BERT-based pre-trained sentence encoders, and image features are extracted by a pre-trained ResNet network; the pre-trained models provide good initialization parameters, so that fine-tuning on the target task gives the model better generalization and faster convergence. The BERT pre-trained model yields sentence features $H_s \in \mathbb{R}^{n \times d}$ and aspect-word features $H_a \in \mathbb{R}^{m \times d}$, where $n$ is the sentence length, $m$ is the aspect-word length, and $d$ is the output feature dimension. The image features are expressed as $H_v = \mathrm{ResNet}(v) \in \mathbb{R}^{C \times W \times H}$, where $\mathrm{ResNet}(\cdot)$ denotes the ResNet-152 model, $C$ is the number of channels of the image features, and $W$, $H$ are the width and height of the image features, respectively; $a$, $s$, $v$ denote the original aspect words, sentence, and image, and $H_a$, $H_s$, $H_v$ denote the aspect-word, sentence, and image features extracted through the pre-trained networks (an illustrative sketch of this step is given after the claims).
  3. The aspect-level multimodal sentiment analysis method based on a dual-channel and attention mechanism according to claim 2, wherein step 2 adopts a multi-head attention mechanism to fuse the related information between the aspect-word features and the sentence features, specifically: to obtain the interactive features between the sentence features and the aspect-word features, a multi-head attention mechanism computes the similarity between them, which effectively realizes feature fusion between the two; the expressions are:
     $\mathrm{Att}(Q,K,V) = \mathrm{softmax}\left(QK^{\top}/\sqrt{d_k}\right)V$;
     $\tilde{h}^{(i)} = \mathrm{LN}\left(h^{(i-1)} + \mathrm{MHA}(h^{(i-1)})\right)$;
     $h^{(i)} = \mathrm{LN}\left(\tilde{h}^{(i)} + \sigma(\tilde{h}^{(i)}W_1)W_2\right)$;
     where $\mathrm{MHA}(\cdot)$ denotes the multi-head attention mechanism, $h$ denotes the input features, $\sqrt{d_k}$ is the scaling factor, $h^{(i)}$ denotes the output of the $i$-th layer of the Transformer, $\mathrm{LN}(\cdot)$ denotes layer normalization, $\sigma(\cdot)$ is the activation function, and $W_1$, $W_2$ denote trainable parameter matrices; the aspect-word features and the sentence features are used in turn as the query matrix $Q$ to compute the sentence-guided aspect-word features $H_{s \to a}$ and the aspect-word-guided sentence features $H_{a \to s}$ (see the attention-fusion sketch after the claims).
  4. The aspect-level multimodal sentiment analysis method based on a dual-channel and attention mechanism according to claim 3, wherein step 3 weights the image features with the sentence-guided aspect-word features and obtains the image channel features through the channel attention mechanism, specifically: to introduce the aspect-word features into the image, the aspect-word features and image features are fused through a multi-head self-attention mechanism, with the formulas:
     $M_c(F) = \sigma\left(\mathrm{MLP}(\mathrm{AvgPool}(F)) + \mathrm{MLP}(\mathrm{MaxPool}(F))\right)$;
     $F_c = M_c(F) \otimes F$;
     where the input $F$ of the channel attention mechanism is the image feature guided by the aspect words through the multi-head attention mechanism, $\mathrm{MLP}(\cdot)$ denotes a multi-layer perceptron, $\mathrm{AvgPool}(\cdot)$ denotes average pooling, $\mathrm{MaxPool}(\cdot)$ denotes max pooling, $\sigma$ denotes the sigmoid activation function, and $F_c$ denotes the output of the channel attention (the dual-channel sketch after the claims illustrates steps 3 and 4).
  5. The aspect-level multimodal sentiment analysis method based on a dual-channel and attention mechanism according to claim 4, wherein step 4 weights the image channel features with the aspect-word-guided sentence features and, in the spatial attention mechanism, generates a spatial attention map from the spatial relations of the features to obtain the final feature representation of the image, specifically: through a multi-head attention mechanism, the aspect-word-guided sentence features weight the image features output by the channel attention mechanism, highlighting the regions of the image features related to the sentiment of the aspect words; the formula is:
     $F_s = M_s(F_c) \otimes F_c$, with $M_s(F_c) = \mathrm{ReLU}\left(f\left([\mathrm{AvgPool}(F_c); \mathrm{MaxPool}(F_c)]\right)\right)$;
     Equation (7) gives the implementation details of the spatial attention mechanism, where $[\cdot\,;\cdot]$ denotes matrix concatenation, $f$ denotes a convolution operation, ReLU denotes the activation function, $F_c$ denotes the image features guided by the sentence features through the multi-head attention mechanism, and $F_s$ denotes the output of the spatial attention.
  6. The aspect-level multimodal sentiment analysis method based on a dual-channel and attention mechanism according to claim 5, wherein step 5 computes a dynamic adjacency matrix from the aspect-word-guided sentence features and the final image features generated by channel attention and spatial attention, and obtains the final fusion feature representation using the aggregation and message-passing capability of a graph neural network, specifically: the sentence features and the final image features are concatenated, an attention matrix is obtained by a self-attention mechanism, and this attention matrix is used as the adjacency matrix of the GCN. In the GCN, for a given node graph $G = (V, A)$, $V$ is the set of all nodes in the graph, i.e. the concatenation matrix of the sentence features and the final image features, and $A$ is the adjacency matrix between all nodes, whose weights depend on the similarity between the nodes:
     $X = [F_s; H_{a \to s}]$;
     $A = \mathrm{softmax}\left(XX^{\top}/\sqrt{d}\right)$;
     $h_i^{(l+1)} = \mathrm{ReLU}\left(\sum_j A_{ij} W^{(l)} h_j^{(l)}\right)$;
     where $X$ denotes the concatenation of the spatial-attention output $F_s$ and the aspect-word-guided sentence features $H_{a \to s}$, $h_i^{(l)}$ is the feature representation output by the $l$-th layer for node $i$, $W^{(l)}$ is the trainable weight matrix of the $l$-th GCN layer, and ReLU is the activation function. Since the GCN completes the feature extraction and encoding between associated nodes, the output of all nodes of the $l$-th layer is expressed as $H^{(l)} = [h_1^{(l)}, h_2^{(l)}, \ldots, h_n^{(l)}]$, where $n$ denotes the number of nodes (see the GCN sketch after the claims).
  7. The aspect-level multimodal sentiment analysis method based on a dual-channel and attention mechanism according to claim 6, wherein step 6 classifies, through a classification module with a pooling mechanism, the final fusion features together with the aspect-word features and sentence features obtained in step 2 through the multi-head attention mechanism, specifically: for the aspect-word features and sentence features, since a [CLS] token is prepended when the features are first extracted with the pre-trained model, the final hidden state of this token is taken as the pooled representation of the aspect-word and sentence features, recorded as $h_a$ and $h_s$; for the output of the part fused with the GCN features, its first representation $h_g$ is adopted as the classification feature, because it is a weighted sum over the features. After pooling and concatenation, the total output feature $h_{out}$ can be expressed as:
     $h_{out} = [h_a; h_s; h_g]$;
     in the classification phase:
     $\hat{y} = \mathrm{softmax}(W h_{out} + b)$;
     $\mathcal{L} = -\sum_{i=1}^{N} y_i \log \hat{y}_i$;
     where $W$ and $b$ are trainable weights, the cross-entropy loss function is used to compute the loss value $\mathcal{L}$, and $N$ and $y_i$ denote the number of training samples and the actual label of each sample, respectively (see the classification sketch after the claims).
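
The sketches below are illustrative additions, not part of the claims. This first one is a minimal PyTorch sketch of the feature-extraction layer of step 1 (claim 2), assuming the HuggingFace transformers and torchvision packages; the checkpoint names, example inputs, and variable names are assumptions rather than details from the patent.

    import torch
    from torchvision.models import resnet152
    from transformers import BertModel, BertTokenizer

    tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
    bert = BertModel.from_pretrained("bert-base-uncased")

    # Two BERT-based encoding passes: one for the sentence, one for the aspect words.
    sentence = "the food was great but the service was slow"
    aspect = "service"
    H_s = bert(**tokenizer(sentence, return_tensors="pt")).last_hidden_state  # (1, n, d)
    H_a = bert(**tokenizer(aspect, return_tensors="pt")).last_hidden_state    # (1, m, d)

    # ResNet-152 with the pooling/classification head removed, so the output keeps
    # its channel x width x height structure (H_v in claim 2).
    cnn = resnet152(weights="IMAGENET1K_V1")
    backbone = torch.nn.Sequential(*list(cnn.children())[:-2])
    image = torch.randn(1, 3, 224, 224)  # stand-in for a real preprocessed image
    H_v = backbone(image)                # (1, 2048, 7, 7) image feature map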
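
A minimal sketch of the bidirectional attention fusion of step 2 (claim 3), built on torch.nn.MultiheadAttention; the head count, feature dimension, sharing one attention layer for both directions, and the residual-plus-LayerNorm wiring are assumptions consistent with the Transformer equations in the claim.

    import torch
    import torch.nn as nn

    d, heads = 768, 8
    mha = nn.MultiheadAttention(embed_dim=d, num_heads=heads, batch_first=True)
    norm = nn.LayerNorm(d)

    def cross_attend(query, context):
        # Attend from `query` (Q) to `context` (K, V), then residual + LayerNorm.
        out, _ = mha(query, context, context)
        return norm(query + out)

    H_s = torch.randn(1, 20, d)     # sentence features from BERT
    H_a = torch.randn(1, 3, d)      # aspect-word features from BERT
    H_s2a = cross_attend(H_a, H_s)  # sentence-guided aspect-word features
    H_a2s = cross_attend(H_s, H_a)  # aspect-word-guided sentence features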
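
A sketch of the dual-channel image branch of steps 3 and 4 (claims 4 and 5). The structure follows the well-known CBAM pattern that the claims describe (an MLP over average- and max-pooled channel vectors, then a convolution over pooled spatial maps); the reduction ratio, the 7x7 kernel, and the sigmoid gates are assumptions.

    import torch
    import torch.nn as nn

    class ChannelAttention(nn.Module):
        def __init__(self, channels, reduction=16):
            super().__init__()
            self.mlp = nn.Sequential(
                nn.Linear(channels, channels // reduction),
                nn.ReLU(),
                nn.Linear(channels // reduction, channels),
            )
        def forward(self, F):                # F: (B, C, W, H), aspect-guided image features
            avg = F.mean(dim=(2, 3))         # average pooling -> (B, C)
            mx = F.amax(dim=(2, 3))          # max pooling     -> (B, C)
            Mc = torch.sigmoid(self.mlp(avg) + self.mlp(mx))
            return F * Mc[:, :, None, None]  # channel-weighted features F_c

    class SpatialAttention(nn.Module):
        def __init__(self, kernel=7):
            super().__init__()
            self.conv = nn.Conv2d(2, 1, kernel, padding=kernel // 2)
        def forward(self, Fc):               # Fc: (B, C, W, H)
            avg = Fc.mean(dim=1, keepdim=True)
            mx = Fc.amax(dim=1, keepdim=True)
            Ms = torch.sigmoid(self.conv(torch.cat([avg, mx], dim=1)))
            return Fc * Ms                   # spatially weighted features F_s

    F_in = torch.randn(1, 2048, 7, 7)        # image features from the ResNet backbone
    F_out = SpatialAttention()(ChannelAttention(2048)(F_in))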
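
A sketch of the dynamic-adjacency GCN of step 5 (claim 6): the adjacency matrix is computed on the fly from the pairwise similarity of the concatenated text and image nodes (a scaled-dot self-attention score) and then used for message passing. The layer count, width, and the projection of image features to the text dimension are assumptions.

    import math
    import torch
    import torch.nn as nn

    class DynamicGCN(nn.Module):
        def __init__(self, d, layers=2):
            super().__init__()
            self.weights = nn.ModuleList([nn.Linear(d, d) for _ in range(layers)])
        def forward(self, X):                # X: (B, n, d) node features
            # Dynamic adjacency: softmax-normalized pairwise similarity.
            A = torch.softmax(X @ X.transpose(1, 2) / math.sqrt(X.size(-1)), dim=-1)
            H = X
            for W in self.weights:           # h^{(l+1)} = ReLU(A h^{(l)} W^{(l)})
                H = torch.relu(A @ W(H))
            return H                         # (B, n, d) fused node outputs

    H_a2s = torch.randn(1, 20, 768)          # aspect-word-guided sentence features
    F_img = torch.randn(1, 49, 768)          # flattened, projected image features (7x7 positions)
    nodes = torch.cat([H_a2s, F_img], dim=1) # concatenated node set of the graph
    H_fused = DynamicGCN(768)(nodes)         # final fusion features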
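
A sketch of the pooling-and-classification module of step 6 (claim 7), assuming three polarity classes (positive, neutral, negative); nn.CrossEntropyLoss folds together the softmax and cross-entropy loss written out in the claim.

    import torch
    import torch.nn as nn

    d, num_classes = 768, 3
    classifier = nn.Linear(3 * d, num_classes)
    loss_fn = nn.CrossEntropyLoss()         # softmax + cross-entropy in one call

    h_a = torch.randn(1, 3, d)              # aspect-word hidden states ([CLS] first)
    h_s = torch.randn(1, 20, d)             # sentence hidden states ([CLS] first)
    h_g = torch.randn(1, 69, d)             # GCN outputs over all nodes

    # Pool: [CLS] states of both text encoders plus the first GCN representation.
    h_out = torch.cat([h_a[:, 0], h_s[:, 0], h_g[:, 0]], dim=-1)  # (1, 3d)
    logits = classifier(h_out)
    label = torch.tensor([2])               # gold polarity index for this sample
    loss = loss_fn(logits, label)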

Description

Aspect-level multimodal sentiment analysis method based on dual-channel and attention mechanism

Technical Field

The invention belongs to the fields of computer language processing and sentiment analysis, and particularly relates to an aspect-level multimodal sentiment analysis method based on a dual-channel and attention mechanism.

Background

In recent years, the content released by users on various online platforms has grown rapidly, and it is becoming increasingly important to analyze this content to identify public opinion regarding targeted aspects or entities. How to use artificial intelligence and other related technologies to mine the sentiment tendency toward a certain aspect has become a research hotspot in recent years. Sentiment expresses a person's attitude toward an objective thing and is usually conveyed in various ways such as body language, facial expression, and spoken or written words. Sentiment analysis (SA), also known as opinion mining (OM), aims to extract opinions from a large number of unstructured sentences and classify them as positive, neutral, or negative polarity. In the Internet age, with the development of social platforms such as Weibo, Zhihu, and WeChat, text and pictures are gradually becoming the main carriers through which users convey opinions and sentiment about target aspects or entities in the online world.

The task of aspect-based sentiment analysis has received extensive attention in academia and industry over the last decade. Early work usually generated sentence features using machine learning methods such as sentiment dictionaries, dependency relations, and statistical methods, but these traditional methods require a great deal of manual effort for feature selection and extraction, the features lack correlation between aspect words and sentence context, and their transferability and robustness are poor. The success of deep learning in various natural language processing tasks has also promoted the application of neural networks in aspect-level sentiment analysis. By using various neural network models to learn and extract the correlation between aspect words and sentence context, the performance of such models has gradually improved. Many deep network approaches have been proposed, such as convolutional neural networks (CNN), recurrent neural networks (RNN), graph neural networks (GNN), and attention mechanisms, further advancing sentence-based aspect-level sentiment analysis.

As the content of many online platforms becomes increasingly multimodal, predicting the sentiment polarity of a target from information in other modalities is also attracting growing interest from researchers, and the achievements of deep learning in the image processing field provide a theoretical basis for aspect-level multimodal sentiment analysis. Xu et al. first introduced image modality information into aspect-level sentiment analysis, extracting image features with a CNN and sentence features with a long short-term memory (LSTM) network, and verified the feasibility of the proposed method through an interactive attention mechanism.
Gu et al. then adopted a bidirectional gated recurrent unit (BiGRU) network and a multi-head self-attention mechanism to encode sentence semantic information, used a ResNet-152 model and a capsule network to extract image features, and applied a multi-head attention network for multimodal interactive fusion, maximizing the contribution of each modality to sentiment transmission and improving network performance. Yu et al. proposed a hierarchical interaction module for modeling pairwise interactions between the given aspect words, sentence information, and image information; to bridge the semantic gap between sentence features and image features, they further proposed an auxiliary reconstruction module based on the autoencoder concept, improving model performance. However, existing models still have some defects: 1) channel information and spatial information in the image cannot be fully extracted during image feature extraction, so the sentiment information in the image cannot be effectively combined with the aspect-word information; 2) information fusion between modalities cannot be carried out effectively, so model performance is not ideal. Therefore, this work studies the aspect-level multimodal sentiment analysis task and presents a more efficient model. CN114936623A is an aspect sentiment analysis method fusing multimodal data, which first preprocesses the data, adjusting text and image formats to meet t