CN-121706033-B - Fine granularity emotion analysis-oriented emotion data automatic labeling method and system
Abstract
The invention provides an automatic emotion-data labeling method and system for fine-granularity emotion analysis, belonging to the fields of artificial intelligence and emotion analysis. The method extracts text, speech, and visual features; performs coarse-granularity emotion pre-classification and confidence-weighted fusion; combines large-language-model reasoning with spectral clustering to achieve fine-granularity emotion recognition; refines the result with a conflict-resolution mechanism; generates the final label through multi-level voting integration; evaluates the credibility of the final label along the dimensions of multi-modal consistency, feature-space outlier degree, and conflict-resolution decision effect; and identifies low-credibility samples. The method realizes progressive analysis from coarse to fine granularity, effectively addresses modal heterogeneity, information conflict, and low labeling reliability, reduces manual labeling cost, and improves the accuracy and reliability of emotion analysis.
Inventors
- YUE JUNQING
- LIU MINYU
- ZHANG DI
- XIANG TAO
- LIAO XIAOFENG
Assignees
- 重庆大学
Dates
- Publication Date: 2026-05-08
- Application Date: 2026-02-13
Claims (9)
- 1. An automatic emotion-data labeling method for fine-granularity emotion analysis, characterized by comprising the following steps: collecting raw multi-modal data comprising text, speech, and visual data, and extracting high-dimensional semantic features of each modality using pre-trained deep learning models; performing three-way classification of positive, negative, and neutral emotion tendencies on the high-dimensional semantic features of each modality to obtain each modality's pre-classified coarse-granularity emotion label and corresponding classification confidence; dynamically computing each modality's fusion weight from its classification confidence, and applying the fusion weights in a weighted fusion of the per-modality high-dimensional semantic features to generate a fusion feature matrix; based on the fusion feature matrix and the pre-classified coarse-granularity emotion labels, predicting fine-granularity emotion labels over 11 categories by combining pre-trained large-language-model reasoning with spectral clustering; detecting and resolving conflicts between each modality's pre-classified coarse-granularity label and the coarse-granularity label predicted by the large model, and iteratively optimizing the coarse-granularity emotion classification result through a loss function, wherein the conflict detection and resolution and the iterative optimization comprise: for any sampling time, if each modality's pre-classified coarse-granularity label and the large model's predicted coarse-granularity label are completely consistent, judging that no conflict exists and taking the consistent label directly as the conflict-resolved coarse-granularity label; when a conflict is judged, performing a weighted voting decision using the fusion weights of the modalities to obtain the conflict-resolved coarse classification label for that sampling time; and updating the model parameters of the coarse classifier and the parameters of the pre-trained deep learning models with a cross-entropy loss function to iteratively optimize the coarse-granularity emotion classification result; integrating fine-granularity emotion labels from the different prediction sources and generating, through a weighted voting mechanism, the final fine-granularity emotion label and its corresponding coarse-granularity emotion label, the final fine-granularity label being generated by the weighted voting mechanism as ŷ_t = argmax_{y∈E} [α·1(y_t^cluster = y) + β·1(y_t^LLM = y)], where argmax_{y∈E} returns the emotion category with the greatest weighted sum, α and β are hyper-parameter voting weights, 1(·) is the indicator function, y_t^cluster is the fine-granularity emotion label of the cluster at time t, y_t^LLM is the fine-granularity emotion label predicted and output by the large model at time t, and E is the emotion label set; mapping the final fine-granularity emotion label to the final coarse-granularity label with a mapping function; and constructing a comprehensive evaluation system over the dimensions of multi-modal consistency, feature-space outlier degree, and conflict-resolution decision effect, quantitatively evaluating the credibility of the final label, and identifying low-credibility samples, wherein, for a sample at any sampling time, the multi-dimensional evaluation-index calculation and credibility judgment proceed as follows: computing the scores of multi-modal consistency, feature-space outlier degree, and conflict-resolution decision respectively; after the scores of the three dimensions are obtained, introducing hyper-parameter weights and computing the sample's final comprehensive credibility; and setting a credibility threshold and judging the sample to be a low-credibility sample when its comprehensive credibility is below the threshold.
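The weighted-voting step of claim 1 can be sketched as follows. This is a minimal illustration, not the patent's implementation: it assumes two fine-granularity prediction sources (the spectral-cluster label and the LLM label) and hyper-parameter voting weights α and β; all names are hypothetical.

```python
# Minimal sketch of the multi-level voting integration of claim 1 (hypothetical names).
# Final fine label: argmax over emotions of alpha*1(cluster==y) + beta*1(llm==y).

FINE_LABELS = ["happy", "angry", "sadness", "fear", "surprise", "disgust",
               "frustration", "confused", "anxious", "startled", "neutral"]

def vote_fine_label(cluster_label, llm_label, alpha=0.4, beta=0.6):
    """Weighted vote between the cluster label and the LLM-predicted label."""
    scores = {y: alpha * (cluster_label == y) + beta * (llm_label == y)
              for y in FINE_LABELS}
    return max(scores, key=scores.get)

# When the sources agree the shared label is kept; otherwise the larger weight wins.
print(vote_fine_label("happy", "happy"))    # happy
print(vote_fine_label("angry", "sadness"))  # sadness (beta > alpha, LLM wins)
```

With α = β the vote degenerates to a tie-break, so in practice the two weights would be tuned to reflect how much each prediction source is trusted.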
- 2. The automatic emotion data labeling method for fine granularity emotion analysis of claim 1, wherein extracting the high-dimensional semantic features of each modality using pre-trained deep learning models comprises: encoding the input text sequence X^text with a pre-trained text feature model into text features of fixed dimension, T = f_BERT(X^text) ∈ R^{n×d_t}, where a single text feature t_i = f_BERT(x_i^text) ∈ R^{d_t}, f_BERT is the pre-trained BERT text feature extraction model function, x_i^text is the i-th text unit of the text sequence X^text, i = 1, …, n, n is the sampling-sequence length, d_t is the text feature dimension, R^{n×d_t} is the set of real matrices of dimension n×d_t, and R^{d_t} is the set of real column vectors of length d_t; extracting speech features A = f_audio(X^audio) ∈ R^{n×d_a} from the input audio signal X^audio with a pre-trained speech feature model, where a single speech feature a_i = f_audio(x_i^audio) ∈ R^{d_a}, f_audio is the pre-trained speech feature extraction model function, x_i^audio is the i-th unit of the audio signal X^audio, i = 1, …, n, d_a is the speech feature dimension, R^{n×d_a} is the set of real matrices of dimension n×d_a, and R^{d_a} is the set of real column vectors of length d_a; and extracting spatial and temporal visual features V = f_vis(X^vis) ∈ R^{n×d_v} from the input image or video-frame sequence X^vis with pre-trained visual feature models, where a single image visual feature v_i = f_img(x_i^vis) ∈ R^{d_v}, f_img is the pre-trained image visual feature extraction model function, a single video visual feature v_i = f_video(x_i^vis) ∈ R^{d_v}, f_video is the pre-trained video visual feature extraction model function, d_v is the visual feature dimension, R^{n×d_v} is the set of real matrices of dimension n×d_v, R^{d_v} is the set of real column vectors of length d_v, and x_i^vis is the i-th image or video unit of the sequence X^vis, i = 1, …, n.
- 3. The automatic emotion data labeling method for fine granularity emotion analysis according to claim 2, wherein the pre-classified coarse-granularity emotion label is expressed as y_m = (y_m^1, …, y_m^n), y_m^t ∈ {pos, neg, neu}, where y_m is the pre-classified coarse-granularity emotion label sequence of modality m, the subscript m ∈ {text, audio, vis} corresponds to the three modalities of text features, speech features, and visual features respectively, pos is positive emotion tendency, neg is negative emotion tendency, neu is neutral emotion tendency, and n is the sequence length; the classification confidence is expressed as c_m = (c_m^1, …, c_m^n), where c_m is the confidence vector and the subscript m likewise corresponds to the three modalities of text, speech, and visual features.
- 4. The automatic emotion data labeling method for fine granularity emotion analysis according to claim 3, wherein dynamically calculating the fusion weight of each modality from its classification confidence and performing weighted fusion of the high-dimensional semantic features of each modality with the fusion weights to generate a fusion feature matrix comprises: performing Softmax normalization on the raw confidence vectors c_text, c_audio, c_vis, the subscript corresponding to the three modalities of text features, speech features, and visual features respectively; introducing a temperature parameter to adjust the smoothness of the weight distribution, the fusion weight of each modality being set as w_m = exp(c_m/τ) / Σ_{m'} exp(c_{m'}/τ), where exp is the natural exponential function, τ is the temperature parameter, c_m is the confidence probability value of modality m, m' is the index traversing all modalities, m is the modality currently being weighted, and c_{m'} is the confidence of each traversed modality; and, based on the calculated fusion weights w_text, w_audio, w_vis, performing weighted fusion of the text features T, speech features A, and visual features V respectively to obtain the fusion feature matrix F, the fusion feature matrix being F = w_text·T + w_audio·A + w_vis·V.
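The confidence-weighted fusion of claim 4 can be sketched with a temperature-scaled softmax. The function and variable names, feature dimensions, and confidence values below are illustrative assumptions, not the patent's code.

```python
import numpy as np

# Sketch of claim 4's confidence-weighted fusion (illustrative names):
# w_m = exp(c_m / tau) / sum_m' exp(c_m' / tau), then F = sum_m w_m * X_m.

def fusion_weights(confidences, tau=1.0):
    """Temperature-scaled softmax over per-modality confidence values."""
    c = np.asarray(confidences, dtype=float)
    e = np.exp((c - c.max()) / tau)   # subtract the max for numerical stability
    return e / e.sum()

def fuse_features(features, confidences, tau=1.0):
    """Weighted sum of per-modality feature matrices of equal shape."""
    w = fusion_weights(confidences, tau)
    return sum(wi * X for wi, X in zip(w, features))

T = np.ones((4, 8)) * 1.0   # text features (n=4 steps, d=8 dims)
A = np.ones((4, 8)) * 2.0   # speech features
V = np.ones((4, 8)) * 3.0   # visual features
F = fuse_features([T, A, V], confidences=[0.9, 0.5, 0.2], tau=0.5)
print(F.shape)  # (4, 8)
```

A smaller τ sharpens the distribution toward the most confident modality; a larger τ flattens it toward uniform averaging, which is exactly the "smoothness" knob the claim describes.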
- 5. The automatic emotion data labeling method for fine-granularity emotion analysis according to claim 4, wherein predicting fine-granularity emotion labels over 11 categories from the fusion feature matrix and the pre-classified coarse-granularity emotion labels by combining pre-trained large-language-model reasoning with spectral clustering comprises: converting the fusion feature matrix into a natural-language description vector through a lightweight semantic mapping network, and mapping each modality's pre-classified coarse-granularity emotion label into a natural-language description; based on a preset prompt template, integrating the natural-language description vector and the coarse-granularity label descriptions to construct the prompt input to the pre-trained large language model; inputting the prompt to the pre-trained large language model to obtain the fine-granularity emotion label predicted and output by the large model; establishing a mapping function g: R^11 → R^3 that maps the fine-granularity emotion label predicted by the large model to its corresponding coarse-granularity emotion label, where R^11 → R^3 denotes a mapping from the 11-dimensional real space to the 3-dimensional real space; and performing spectral clustering on the fusion feature matrix to obtain fine-granularity emotion labels over 11 cluster categories.
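The spectral-clustering branch of claim 5 can be sketched end to end with numpy alone: an RBF affinity matrix, the normalized graph Laplacian, its smallest eigenvectors, and a plain k-means over the embedded rows. For brevity this toy uses k = 2 on synthetic features rather than the patent's 11 emotion clusters; every name and parameter is an illustrative assumption.

```python
import numpy as np

# Toy sketch of claim 5's spectral clustering of the fusion feature matrix
# (numpy only; a real system would use k=11 fine-granularity clusters).

def spectral_cluster(F, k, sigma=1.0, iters=50):
    """RBF affinity -> normalized Laplacian -> k smallest eigenvectors -> k-means."""
    d2 = ((F[:, None, :] - F[None, :, :]) ** 2).sum(-1)   # pairwise sq. distances
    W = np.exp(-d2 / (2 * sigma ** 2))                    # affinity matrix
    D = W.sum(1)
    L = np.eye(len(F)) - W / np.sqrt(np.outer(D, D))      # normalized Laplacian
    _, vecs = np.linalg.eigh(L)                           # ascending eigenvalues
    U = vecs[:, :k]                                       # k smallest eigenvectors
    U = U / (np.linalg.norm(U, axis=1, keepdims=True) + 1e-12)
    centers = [U[0]]                                      # farthest-point k-means init
    for _ in range(1, k):
        dists = np.min([((U - c) ** 2).sum(1) for c in centers], axis=0)
        centers.append(U[int(dists.argmax())])
    centers = np.array(centers)
    for _ in range(iters):                                # Lloyd iterations
        labels = ((U[:, None, :] - centers[None, :, :]) ** 2).sum(-1).argmin(1)
        for j in range(k):
            if (labels == j).any():
                centers[j] = U[labels == j].mean(0)
    return labels

# Two well-separated groups of "fusion features" should land in two clusters.
F = np.vstack([np.zeros((5, 3)), np.ones((5, 3)) * 5.0])
labels = spectral_cluster(F, k=2)
print(labels)
```

The deterministic farthest-point initialization avoids the degenerate case where both k-means seeds land in the same cluster; production code would typically reach for `sklearn.cluster.SpectralClustering` instead.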
- 6. The automatic emotion data labeling method for fine granularity emotion analysis of claim 5, wherein the mapping function g is defined piecewise over the 11 fine-granularity emotion labels, assigning each to one of the three coarse-granularity labels, where g is the mapping function, y is the input variable holding a fine-granularity emotion label, happy represents the pleased fine-granularity emotion label, angry the anger label, sadness the sad label, fear the fear label, surprise the surprise label, disgust the aversion label, frustration the depression label, confused the confusion label, anxious the anxiety label, startled the startle label, and neutral the neutral fine-granularity emotion label.
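The fine-to-coarse mapping g of claim 6 can be expressed as a simple lookup table. Note that the translation drops the patent's actual piecewise assignments, so the concrete grouping below is an assumption chosen only to show the structure of g, not the patent's definition.

```python
# Hypothetical instance of claim 6's mapping g. The assignment of each fine
# label to {pos, neg, neu} is an ASSUMPTION for illustration only; the patent's
# own piecewise definition is not recoverable from the translated text.
FINE_TO_COARSE = {
    "happy": "pos", "surprise": "pos",
    "angry": "neg", "sadness": "neg", "fear": "neg", "disgust": "neg",
    "frustration": "neg", "anxious": "neg", "startled": "neg",
    "confused": "neu", "neutral": "neu",
}

def g(fine_label):
    """Map one of the 11 fine-granularity labels to a coarse label."""
    return FINE_TO_COARSE[fine_label]

print(g("happy"), g("fear"), g("neutral"))  # pos neg neu
```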
- 7. The automatic emotion data labeling method for fine granularity emotion analysis according to claim 6, wherein detecting and resolving conflicts between each modality's pre-classified coarse-granularity label and the coarse-granularity label predicted by the large model, and iteratively optimizing the coarse-granularity emotion classification result through a loss function, comprises: for any sampling time t, comparing each modality's pre-classified coarse-granularity emotion label y_t^text, y_t^audio, y_t^vis with the large model's predicted coarse-granularity emotion label y_t^LLM; if all are completely consistent, judging that no conflict exists and taking the consistent label directly as the conflict-resolved coarse classification label, where y_t^text, y_t^audio, y_t^vis are the coarse-granularity emotion pre-classification labels output at time t by the text, speech, and visual single-modality models respectively; when a conflict is judged, performing a weighted voting decision using the fusion weights of the modalities to obtain the conflict-resolved coarse classification label y_t^res for the sampling time t, expressed as y_t^res = argmax_{y∈E} Σ_{m∈M} w_t^m · 1(y_t^m = y), where argmax_{y∈E} returns the emotion category with the greatest weighted sum, w_t^m is the fusion weight of modality m at time t, 1(·) is the indicator function identifying whether the label categories match, y_t^m is the pre-classified coarse-granularity emotion label of modality m at time t, M is the modality set, M = {text, audio, vis}, and E is the emotion label set, E = {pos, neg, neu}; and updating the model parameters of the coarse classifier and the parameters of the pre-trained deep learning models with a cross-entropy loss function to iteratively optimize the coarse-granularity emotion classification result, the loss function being expressed mathematically as L_CE = -Σ_{t=1}^{n} Σ_{m∈M} Σ_{k∈E} ỹ_{t,m,k} · log p_{t,m,k}, where ỹ_{t,m,k} is the manually annotated coarse-granularity emotion label (one-hot over categories k) of modality m at time t, and log p_{t,m,k} is the natural logarithm of the confidence probability value.
- 8. The automatic emotion data labeling method for fine-granularity emotion analysis of claim 7, wherein for a sample at any sampling time t the multi-dimensional evaluation-index calculation and credibility judgment proceed as follows: calculating the multi-modal consistency score S_1^t, where S_1^t is the multi-modal consistency score of the sample at time t; calculating the feature-space outlier score S_2^t = exp(-D_M), where S_2^t is the feature-space outlier score of the sample at time t, exp is the natural exponential function, D_M is the Mahalanobis distance, D_M = sqrt((F_t - μ)^T Σ^{-1} (F_t - μ)), μ and Σ are respectively the mean and covariance corresponding to the fusion feature matrix F, Σ^{-1} is the inverse of the covariance matrix, and (F_t - μ)^T is the transpose of the deviation vector of the sample vector from the center; calculating the conflict-resolution decision score S_3^t, where S_3^t is the conflict-resolution decision score of the sample at time t: when no conflict exists, the weighted voting score of each emotion category k is calculated and the maximum value taken as the decision score at that time, this branch being executed when no conflict exists at all; after the scores of the three dimensions are obtained, introducing hyper-parameter weights and calculating the sample's final comprehensive credibility C_t = λ_1·S_1^t + λ_2·S_2^t + λ_3·S_3^t, where C_t is the final comprehensive credibility of the sample at time t, λ_j are the hyper-parameter weights, and S_j^t (j = 1, 2, 3) are the scores of the three dimensions; and setting a credibility threshold and judging the sample to be a low-credibility sample when its comprehensive credibility is below the threshold.
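The credibility evaluation of claim 8 can be sketched in two parts: a Mahalanobis-based outlier score over the fused features, and a weighted combination of the three dimension scores. The names, hyper-parameter weights, and synthetic data below are assumptions for illustration.

```python
import numpy as np

# Sketch of claim 8's credibility evaluation (illustrative names). The outlier
# score is exp(-D_M), the Mahalanobis distance of a sample's fused feature
# vector from the mean of the fusion feature matrix; the final credibility is
# a hyper-parameter-weighted sum of the three dimension scores.

def outlier_score(f_t, F):
    """S2 = exp(-D_M): higher means more typical, lower means more outlying."""
    mu = F.mean(0)
    cov = np.cov(F, rowvar=False) + 1e-6 * np.eye(F.shape[1])  # regularized
    d = f_t - mu
    dm = float(np.sqrt(d @ np.linalg.inv(cov) @ d))            # Mahalanobis distance
    return float(np.exp(-dm))

def credibility(s_cons, s_out, s_dec, lambdas=(0.4, 0.3, 0.3)):
    """Weighted combination C = l1*S1 + l2*S2 + l3*S3 of the dimension scores."""
    return sum(l * s for l, s in zip(lambdas, (s_cons, s_out, s_dec)))

rng = np.random.default_rng(0)
F = rng.normal(size=(200, 4))              # stand-in fusion features
typical, extreme = F[0], np.full(4, 8.0)   # a typical vs. an extreme sample
print(outlier_score(typical, F) > outlier_score(extreme, F))  # True
print(round(credibility(0.9, 0.5, 0.8), 2))                   # 0.75
```

Samples whose combined score falls below a chosen threshold would then be flagged as low-credibility, per the final step of the claim.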
- 9. An automatic emotion-data labeling system for fine-granularity emotion analysis, for performing the method of any of claims 1-8, the system comprising: a multi-modal feature extraction module for collecting raw multi-modal data comprising text, speech, and visual data and extracting high-dimensional semantic features of each modality using pre-trained deep learning models; a coarse-granularity emotion classification module for performing three-way classification of positive, negative, and neutral on the high-dimensional semantic features of each modality to obtain each modality's pre-classified coarse-granularity emotion label and corresponding classification confidence; a feature fusion module for dynamically computing each modality's fusion weight from its classification confidence and applying the fusion weights in a weighted fusion of the per-modality high-dimensional semantic features to generate a fusion feature matrix; a fine-granularity emotion recognition module for predicting fine-granularity emotion labels over 11 categories from the fusion feature matrix and the pre-classified coarse-granularity labels by combining pre-trained large-language-model reasoning with spectral clustering; a conflict resolution and iterative update module for detecting and resolving conflicts between each modality's pre-classified coarse-granularity label and the coarse-granularity label predicted by the large model, and iteratively optimizing the coarse-granularity emotion classification result through a loss function, wherein: for any sampling time, if each modality's pre-classified coarse-granularity label and the large model's predicted coarse-granularity label are completely consistent, no conflict exists and the consistent label is taken directly as the conflict-resolved coarse-granularity label; when a conflict is judged, a weighted voting decision using the fusion weights of the modalities yields the conflict-resolved coarse classification label for that sampling time; and the model parameters of the coarse classifier and the parameters of the pre-trained deep learning models are updated with a cross-entropy loss function to iteratively optimize the coarse-granularity emotion classification result; a multi-level voting integration module for integrating fine-granularity emotion labels from the different prediction sources and generating, through a weighted voting mechanism, the final fine-granularity emotion label and its corresponding coarse-granularity emotion label, the final fine-granularity label being generated by the weighted voting mechanism as ŷ_t = argmax_{y∈E} [α·1(y_t^cluster = y) + β·1(y_t^LLM = y)], where argmax_{y∈E} returns the emotion category with the greatest weighted sum, α and β are hyper-parameter voting weights, 1(·) is the indicator function, y_t^cluster is the fine-granularity emotion label of the cluster at time t, y_t^LLM is the fine-granularity emotion label predicted and output by the large model at time t, and E is the emotion label set, the final fine-granularity emotion label being mapped to the final coarse-granularity label with a mapping function; and a credibility comprehensive evaluation module for constructing a comprehensive evaluation system over the dimensions of multi-modal consistency, feature-space outlier degree, and conflict-resolution decision effect, quantitatively evaluating the credibility of the final label, and identifying low-credibility samples, wherein, for a sample at any sampling time, the multi-dimensional evaluation-index calculation and credibility judgment proceed as follows: the scores of multi-modal consistency, feature-space outlier degree, and conflict-resolution decision are computed respectively; after the scores of the three dimensions are obtained, hyper-parameter weights are introduced and the sample's final comprehensive credibility is computed; and a credibility threshold is set, the sample being judged a low-credibility sample when its comprehensive credibility is below the threshold.
Description
Fine granularity emotion analysis-oriented emotion data automatic labeling method and system Technical Field The invention belongs to the field of artificial intelligence and emotion analysis, and particularly relates to an automatic emotion-data labeling method and system for fine-granularity emotion analysis. Background With the rapid development of social media, human-computer interaction systems, and intelligent terminals, multimodal emotion data comprising text, speech, and visual information is growing explosively. Affective computing, an important interdisciplinary direction of artificial intelligence and cognitive science, has substantial application value in public-opinion analysis, mental-health assessment, intelligent customer service, personalized recommendation, and other fields. Traditional emotion analysis methods rely primarily on single-modality data, such as text-based sentiment analysis or speech-based emotion recognition. However, human emotional expression is complex and multidimensional, and a single modality often cannot reflect the real emotional state comprehensively and accurately. For example, a user may deliver an angry utterance in a calm intonation, or express sadness behind a smiling expression. Multi-modal emotion analysis has therefore become a key technical path for improving emotion-recognition accuracy. For multi-modal analysis of complex emotions, the prior art proposes a number of solutions. However, current multi-modal emotion analysis methods still face the following technical challenges: 1. Different modal data (text, speech, visual) have different feature representations, data structures, and semantic granularity; achieving efficient cross-modal feature alignment and semantic fusion is a core challenge. Existing methods generally adopt simple feature concatenation or late-fusion strategies and struggle to capture deep cross-modal semantic associations; 2.
In multi-modal emotion analysis, the emotion information conveyed by different modalities may contradict each other. Existing methods lack an effective conflict detection and resolution mechanism and cannot intelligently judge which modality should dominate in a given situation; 3. Fine-granularity emotion labeling requires professional annotators to manually label multi-modal data, which is costly, inefficient, and easily influenced by subjective factors; existing automatic labeling methods mostly depend on large-scale labeled data and are difficult to adapt to real scenarios where labeling resources are scarce; 4. Existing automatic labeling methods generally lack quantitative evaluation of the credibility of the generated labels, making it difficult to identify low-quality labeled samples, which affects the training of downstream models; 5. Predicting fine-granularity emotion categories directly from raw multi-modal data is difficult, and a progressive reasoning mechanism from coarse-granularity emotion cognition to fine-granularity emotion discrimination is lacking. Therefore, an innovative method is needed that can efficiently handle modal heterogeneity, intelligently resolve modal conflicts, realize automatic fine-granularity emotion labeling, and evaluate label credibility. Disclosure of Invention The invention aims to overcome the defects of the prior art by providing an automatic emotion-data labeling method and system for fine-granularity emotion analysis, which uses deep learning to realize end-to-end closed-loop automatic labeling of multi-modal emotion data through coarse-to-fine progressive recognition and conflict resolution, thereby enhancing the credibility and interpretability of the labeling results and reducing labor cost.
In order to achieve the above purpose, the present invention provides the following technical solutions: the invention provides an automatic emotion-data labeling method for fine-granularity emotion analysis, comprising the following steps: collecting raw multi-modal data comprising text, speech, and visual data, and extracting high-dimensional semantic features of each modality using pre-trained deep learning models; performing three-way classification of positive, negative, and neutral emotion tendencies on the high-dimensional semantic features of each modality to obtain each modality's pre-classified coarse-granularity emotion label and corresponding classification confidence; dynamically computing each modality's fusion weight from its classification confidence, and applying the fusion weights in a weighted fusion of the per-modality high-dimensional semantic features to generate a fusion feature matrix; based on the fusion feature matrix and the pre-classified coarse-granularity emotion labels, predicting fine