CN-121996706-A - Emotion semantic reasoning method with multi-modal feature cooperation
Abstract
The invention relates to the technical field of emotion turning-point prediction and discloses a multi-modal feature collaborative emotion semantic reasoning method. The method extracts multi-modal features, generates causal triplets, and integrates the triplets into a dynamic causal link graph through an incremental update algorithm. Potential energy values are assigned according to the path distance between triplet nodes and the emotion result nodes, forming a potential energy distribution graph. Sliding-window analysis generates a historical emotion intensity sequence, and a second-order difference yields an emotion acceleration sequence. A causal potential energy gradient vector is computed and spliced with the emotion acceleration sequence into a fused prediction feature vector, from which a model calculates the turning probability; an early warning is triggered when the predicted direction is negative and the probability exceeds a threshold. Reverse traversal of the causal graph generates a path set, attribution saliency scores are computed, and the method outputs a turning early-warning signal together with interpretation text for the highest-scoring causal path, realizing collaborative utilization of causal information and temporal information.
Inventors
- XUE LI
- HUO YUPENG
Assignees
- 安徽情感识别科技有限公司 (Anhui Emotion Recognition Technology Co., Ltd.)
Dates
- Publication Date: 2026-05-08
- Application Date: 2026-01-28
Claims (10)
- 1. A multi-modal feature collaborative emotion semantic reasoning method, characterized by comprising the following steps: acquiring a real-time multi-modal data stream of a dialogue, and generating a time-sequential multi-modal feature sequence using a multi-modal encoder; analyzing dialogue texts with an entity-relation extraction algorithm, identifying event entities, state entities and their causal relations, generating a real-time causal triplet sequence, and marking emotion result nodes; integrating the causal triplet sequence into a dynamic causal link graph using a causal-graph incremental update algorithm; using a causal potential energy distribution algorithm to assign the maximum potential energy value to emotion result nodes based on the topological structure of the dynamic causal link graph, and assigning attenuated potential energy values to other nodes according to their shortest path length to the emotion result nodes, thereby generating a causal potential energy distribution graph; analyzing the time-sequential multi-modal feature sequence with a sliding-window emotion analysis algorithm to generate a historical emotion intensity value sequence, and generating an emotion acceleration sequence by second-order difference calculation; determining the current dialogue position node with a potential energy gradient calculation algorithm, calculating causal potential energy gradient components along each outgoing-edge direction, and splicing the causal potential energy gradient vector with the emotion acceleration sequence to generate a fused prediction feature vector; calculating a turning probability value and a predicted causal potential gradient direction from the fused prediction feature vector with a turning-point joint prediction model, and generating a turning early-warning signal when the turning probability value exceeds a preset threshold and the direction is negative; traversing reversely along causal edges from the current position node to root nodes with a multi-hop backtracking algorithm, generating a multi-hop causal path set, and calculating attribution saliency scores for all paths; and converting the multi-hop causal path with the highest attribution saliency score into causal interpretation text, and outputting the turning early-warning signal together with the causal-chain interpretation.
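The early-warning trigger at the end of claim 1 combines two conditions: the turning probability must exceed the preset threshold and the predicted gradient direction must be negative. A minimal sketch, in which the function name and the example threshold of 0.7 are illustrative assumptions rather than values from the patent:

```python
def should_trigger_warning(turn_probability: float,
                           gradient_direction: str,
                           threshold: float = 0.7) -> bool:
    """Fire a warning only when BOTH conditions of claim 1 hold:
    probability above threshold AND predicted direction negative."""
    return turn_probability > threshold and gradient_direction == "negative"

print(should_trigger_warning(0.85, "negative"))  # both conditions met: True
print(should_trigger_warning(0.85, "positive"))  # direction not negative: False
print(should_trigger_warning(0.40, "negative"))  # probability too low: False
```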
- 2. The method of claim 1, wherein the causal potential energy distribution algorithm comprises: assigning the maximum potential energy value to each emotion result node, with the potential energy sign determined by the node's emotion polarity; for each non-result node, computing the shortest path length to every emotion result node by breadth-first search and taking the minimum as its distance value; and computing its potential energy value as the larger of the potential energy lower limit and the maximum potential energy value minus the potential energy attenuation coefficient multiplied by the distance value, with the potential energy sign inherited from the emotion polarity of the nearest emotion result node.
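The assignment rule of claim 2 is potential = max(lower_limit, max_potential − decay × distance), with the sign inherited from the nearest emotion result node. A minimal sketch using a multi-source BFS; the example graph, parameter values, and function name are illustrative assumptions, and edges are treated as undirected for distance purposes in this sketch:

```python
from collections import deque

def assign_potentials(adj, result_nodes, p_max=10.0, decay=2.0, p_min=0.5):
    """adj: dict node -> neighbor list; result_nodes: dict emotion-result
    node -> polarity (+1 or -1). Returns dict node -> signed potential."""
    # Multi-source BFS from all emotion result nodes at once: each reached
    # node records its distance to, and polarity of, the nearest result node.
    dist, polarity = {}, {}
    queue = deque()
    for n, pol in result_nodes.items():
        dist[n], polarity[n] = 0, pol
        queue.append(n)
    while queue:
        u = queue.popleft()
        for v in adj.get(u, []):
            if v not in dist:
                dist[v] = dist[u] + 1
                polarity[v] = polarity[u]
                queue.append(v)
    # Maximum potential at result nodes, linear decay with distance,
    # floored at the lower limit; sign inherited from the nearest result node.
    return {n: polarity[n] * max(p_min, p_max - decay * d)
            for n, d in dist.items()}

# Toy causal chain from the patent's background example:
adj = {
    "high cost": ["economic pressure"],
    "economic pressure": ["high cost", "distrust"],
    "distrust": ["economic pressure"],
}
pots = assign_potentials(adj, {"distrust": -1})
print(pots)  # distrust: -10.0, economic pressure: -8.0, high cost: -6.0
```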
- 3. The method of claim 2, wherein the maximum potential energy value ranges from 1 to 10, the potential energy attenuation coefficient is greater than zero and less than the maximum potential energy value, and the potential energy lower limit ranges from 0.1 to 1.
- 4. The method of claim 1, wherein the potential energy gradient calculation algorithm comprises: matching newly identified entities against nodes in the causal potential energy distribution graph by word-vector cosine similarity to determine the current position node; traversing all outgoing edges of the current position node and obtaining the successor node pointed to by each edge; for each outgoing-edge direction, computing the causal potential energy gradient component as the successor node's potential energy value minus the current position node's potential energy value; and combining the gradient components in all directions into a causal potential energy gradient vector.
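Per claim 4, each gradient component is simply a potential difference along one outgoing edge. A minimal sketch (node names and potential values are illustrative assumptions carried over from the patent's background example):

```python
def potential_gradient(current, out_edges, potentials):
    """out_edges: dict node -> list of successor nodes (directed causal
    edges). Returns one gradient component per outgoing edge: the
    successor's potential minus the current node's potential."""
    p0 = potentials[current]
    return [potentials[nxt] - p0 for nxt in out_edges.get(current, [])]

potentials = {"economic pressure": -8.0, "distrust": -10.0, "relief": 6.0}
out_edges = {"economic pressure": ["distrust", "relief"]}
grad = potential_gradient("economic pressure", out_edges, potentials)
print(grad)  # [-2.0, 14.0]: the first component slopes toward the negative node
```

A negative component indicates the causal conduction is sliding "downhill" toward a negative emotion result node, which is the signal the later prediction step looks for.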
- 5. The method of claim 4, wherein, prior to vector splicing, the causal potential energy gradient vector and the emotion acceleration sequence are separately Z-score normalized; the emotion acceleration sequence covers a plurality of time points, the number of time points ranging from 3 to 8.
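Claim 5's separate Z-score normalization before splicing keeps the two feature groups on a comparable scale. A minimal sketch with illustrative input values (the zero-variance guard is an added assumption, not stated in the patent):

```python
from math import sqrt

def zscore(xs):
    """Z-score normalize a sequence: subtract the mean, divide by the
    population standard deviation."""
    mean = sum(xs) / len(xs)
    std = sqrt(sum((x - mean) ** 2 for x in xs) / len(xs)) or 1.0  # guard std=0
    return [(x - mean) / std for x in xs]

gradient = [-2.0, 14.0]
acceleration = [0.1, -0.3, 0.5, -0.2, 0.4]  # 3-8 time points per claim 5
# Normalize each group separately, then splice into the fused vector.
fused = zscore(gradient) + zscore(acceleration)
print(fused[:2])  # [-1.0, 1.0]
```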
- 6. The method of claim 1, wherein the sliding-window emotion analysis algorithm is implemented with a recurrent neural network that receives the multi-modal feature vector sequence within a sliding window; the output layer maps hidden states to scalar emotion intensity values through a linear mapping layer, constraining the output to a negative-to-positive range through a hyperbolic tangent activation function; and the emotion acceleration is calculated as the emotion intensity value at the current moment minus twice the emotion intensity value at the previous moment, plus the emotion intensity value two moments earlier.
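The acceleration formula of claim 6 is the standard backward second-order difference, a_t = s_t − 2·s_{t−1} + s_{t−2}. A minimal sketch with an illustrative intensity series (the RNN producing the intensities is omitted):

```python
def emotion_acceleration(intensity):
    """Second-order difference of the emotion intensity sequence:
    a_t = s_t - 2*s_{t-1} + s_{t-2}, defined from the third point on."""
    return [intensity[t] - 2 * intensity[t - 1] + intensity[t - 2]
            for t in range(2, len(intensity))]

# A steadily rising series has zero acceleration; the sudden drop at the
# last point shows up as a strongly negative value where the trend bends.
accel = emotion_acceleration([0.1, 0.2, 0.3, 0.4, 0.0])
print(accel)  # approximately [0.0, 0.0, -0.5], up to floating-point rounding
```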
- 7. The method of claim 1, wherein the turning-point joint prediction model is a multi-layer perceptron comprising a probability output head and a direction output head; the probability output head outputs the turning probability value through a fully connected layer followed by a Sigmoid activation function; and the direction output head outputs a probability distribution over positive, negative and neutral direction categories through a fully connected layer followed by a Softmax activation function.
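The dual-head output of claim 7 can be sketched with plain Python. This is only an illustration of the output-layer shapes (a single linear layer per head, hand-set weights); the patent's full MLP, hidden layers, and trained weights are not specified here:

```python
from math import exp

def sigmoid(x):
    return 1.0 / (1.0 + exp(-x))

def softmax(xs):
    m = max(xs)                       # subtract max for numeric stability
    es = [exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def joint_predict(features, w_prob, w_dir):
    """Two heads over one fused feature vector: a Sigmoid probability
    head and a 3-way Softmax direction head (positive/negative/neutral)."""
    logit = sum(w * f for w, f in zip(w_prob, features))
    dir_logits = [sum(w * f for w, f in zip(row, features)) for row in w_dir]
    probs = softmax(dir_logits)
    direction = ["positive", "negative", "neutral"][probs.index(max(probs))]
    return sigmoid(logit), direction

# Illustrative weights chosen so the negative direction dominates.
prob, direction = joint_predict([1.0, -1.0], [2.0, 0.0],
                                [[0.0, 0.0], [3.0, 0.0], [0.0, 0.0]])
print(round(prob, 3), direction)  # 0.881 negative
```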
- 8. The method of claim 1, wherein the multi-hop backtracking algorithm comprises: initializing an empty multi-hop causal path set and a current path stack, and pushing the current position node onto the path stack; obtaining the set of predecessor nodes corresponding to all incoming edges of the stack-top node; if the predecessor set is empty, reversing the node sequence in the path stack, adding it to the set as a complete path, then popping the stack-top node and backtracking; otherwise, pushing each predecessor node onto the path stack in turn and continuing to backtrack; and repeating until the path stack is empty.
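Claim 8 enumerates every root-to-current causal path by walking incoming edges backwards. A minimal recursive sketch (the claim describes an equivalent explicit-stack version; the example graph is an illustrative assumption):

```python
def backtrack_paths(current, in_edges):
    """Enumerate all root-to-current paths by following incoming causal
    edges backwards. in_edges: dict node -> list of predecessor nodes."""
    preds = in_edges.get(current, [])
    if not preds:                      # no incoming edges: reached a root
        return [[current]]
    paths = []
    for p in preds:
        for path in backtrack_paths(p, in_edges):
            paths.append(path + [current])
    return paths

in_edges = {
    "distrust": ["economic pressure"],
    "economic pressure": ["high cost", "long wait"],
}
paths = backtrack_paths("distrust", in_edges)
print(paths)
# [['high cost', 'economic pressure', 'distrust'],
#  ['long wait', 'economic pressure', 'distrust']]
```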
- 9. The method of claim 1, wherein calculating the attribution saliency score comprises: counting the number of nodes on the multi-hop causal path as the path length; traversing each directed edge on the path and computing the product of edge confidence weights; computing a consistency coefficient between the potential energy polarity of the path's end node and the predicted causal potential energy gradient direction, equal to 1 when the two agree and 0 when they do not; and setting the attribution saliency score equal to the confidence weight product multiplied by the consistency coefficient and divided by the path length.
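Claim 9's score is (product of edge confidences) × consistency ÷ path length, which favors short, high-confidence paths agreeing with the predicted direction. A minimal sketch with illustrative confidence weights:

```python
def attribution_score(path, edge_confidence, end_polarity, predicted_direction):
    """Score = product of edge confidence weights along the path,
    times 1 if the end node's potential polarity matches the predicted
    gradient direction (else 0), divided by the node count."""
    product = 1.0
    for a, b in zip(path, path[1:]):   # walk consecutive edges on the path
        product *= edge_confidence[(a, b)]
    consistent = 1 if end_polarity == predicted_direction else 0
    return product * consistent / len(path)

conf = {("high cost", "economic pressure"): 0.9,
        ("economic pressure", "distrust"): 0.8}
path = ["high cost", "economic pressure", "distrust"]
score = attribution_score(path, conf, "negative", "negative")
print(score)  # approximately 0.24 (0.9 * 0.8 / 3)
```

An inconsistent direction zeroes the score, so only paths that explain the predicted turn survive ranking.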
- 10. A multi-modal feature collaborative emotion semantic reasoning system for performing the method of any of claims 1-9, comprising: a multi-modal feature extraction module for acquiring real-time multi-modal data streams of the dialogue and generating a time-sequential multi-modal feature sequence; a causal relation extraction module for analyzing dialogue texts to identify event entities, state entities and causal relations, generating a causal triplet sequence and marking emotion result nodes; a dynamic causal graph construction module for integrating the causal triplet sequence into a dynamic causal link graph; a causal potential energy distribution module for generating a causal potential energy distribution graph by assigning potential energy values to each node based on the graph topology; an emotion analysis module for analyzing the time-sequential multi-modal feature sequence to generate an emotion intensity value sequence and an emotion acceleration sequence; a potential energy gradient calculation module for computing the causal potential energy gradient vector and fusing it with the emotion acceleration sequence into a fused prediction feature vector; a turning prediction module for computing the turning probability value and turning direction from the fused prediction feature vector and generating a turning early-warning signal; a multi-hop attribution module for generating a multi-hop causal path set by reverse traversal from the current position node and computing saliency scores; and an interpretation generation module for converting the highest-scoring path into causal interpretation text and outputting the early-warning result.
Description
Emotion semantic reasoning method with multi-modal feature cooperation

Technical Field
The invention relates to the technical field of emotion turning-point prediction, in particular to an emotion semantic reasoning method with multi-modal feature cooperation.

Background
In a medical customer service dialogue scenario, multiple rounds of dialogue between a patient and a customer service agent involve topics such as condition consultation, expense explanation and appointment arrangement. The patient's emotional state is influenced by the superposition of multiple factors and may gradually develop from initial anxiety into discontent or trust. Existing emotion turning-point prediction methods have the following technical problems. First, existing prediction methods based on time-series statistics rely only on the numerical trend of emotion intensity: they output a turning probability but do not explain the reason for the turn, so customer service staff cannot adjust their coping strategy accordingly. Second, existing causal attribution methods can only perform post-hoc analysis after an emotion turn has occurred and cannot provide a prognosis before it occurs. Third, when the patient's emotion is about to deteriorate, the system can neither warn in advance nor predictively point out the deep causal chain that may cause the deterioration (such as "high examination cost → economic pressure → distrust of the hospital"), so customer service personnel miss the best intervention opportunity and lack an effective basis for intervention.
The root cause of these problems is that time-series prediction methods attend only to surface changes in emotion intensity and ignore the causal mechanism driving emotion evolution, while causal analysis methods can only trace emotion changes that have already occurred and lack the ability to predict causal conduction trends; because the two operate independently, prediction and attribution cannot be unified.

Disclosure of Invention
The invention provides a multi-modal feature collaborative emotion semantic reasoning method, which solves the technical problem in the related art that a purely time-series prediction method depends only on changes in emotion intensity values and ignores the causal driving mechanism. The method comprises the following steps: acquiring a real-time multi-modal data stream of a dialogue, and generating a time-sequential multi-modal feature sequence using a multi-modal encoder; analyzing dialogue texts with an entity-relation extraction algorithm, identifying event entities, state entities and their causal relations, generating a real-time causal triplet sequence, and marking emotion result nodes; integrating the causal triplet sequence into a dynamic causal link graph using a causal-graph incremental update algorithm; using a causal potential energy distribution algorithm to assign the maximum potential energy value to emotion result nodes based on the topological structure of the dynamic causal link graph, and assigning attenuated potential energy values to other nodes according to their shortest path length to the emotion result nodes, thereby generating a causal potential energy distribution graph; analyzing the time-sequential multi-modal feature sequence with a sliding-window emotion analysis algorithm to generate a historical emotion intensity value sequence, and generating an emotion acceleration sequence by second-order difference calculation; determining the current dialogue position node with a potential energy gradient calculation algorithm, calculating causal potential energy gradient components along each outgoing-edge direction, and splicing the causal potential energy gradient vector with the emotion acceleration sequence to generate a fused prediction feature vector; calculating a turning probability value and a predicted causal potential gradient direction from the fused prediction feature vector with a turning-point joint prediction model, and generating a turning early-warning signal when the turning probability value exceeds a preset threshold and the direction is negative; traversing reversely along causal edges from the current position node to root nodes with a multi-hop backtracking algorithm, generating a multi-hop causal path set, and calculating attribution saliency scores for all paths; and converting the multi-hop causal path with the highest attribution saliency score into causal interpretation text, and outputting the turning early-warning signal together with the causal-chain interpretation. Further, the causal potential energy allocation algorithm comprises: The maximum potential energy value is distributed for the emotion result node, a