
CN-122020572-A - Customer service dialogue emotion analysis processing method and system based on deep learning

CN 122020572 A

Abstract

The invention provides a customer service dialogue emotion analysis processing method and system based on deep learning, relating to the technical field of multi-modal emotion computing. The method comprises the following steps: Step 1, preprocessing and feature extraction are performed on collected voice signals, and global and local emotion-related features in the voice signals are captured to obtain deep voice emotion feature vectors; Step 2, multi-modal fusion analysis is performed on the deep voice emotion feature vectors, the facial micro-expression features extracted from the video stream, and the semantic features extracted from the text record, and the real-time emotion state conveyed in the customer service dialogue is identified to obtain an emotion recognition result. The invention realizes the recognition of customer service dialogue emotion, interpretable cause tracing, and intelligent business decision recommendation.

Inventors

  • YE YONGDIAN

Assignees

  • 厦门优数科技有限公司

Dates

Publication Date
2026-05-12
Application Date
2026-04-13

Claims (10)

  1. A customer service dialogue emotion analysis processing method based on deep learning, characterized by comprising the following steps: Step 1, performing preprocessing and feature extraction on collected voice signals, and capturing global and local emotion-related features in the voice signals to obtain deep voice emotion feature vectors; Step 2, performing multi-modal fusion analysis on the deep voice emotion feature vectors, facial micro-expression features extracted from the video stream, and semantic features extracted from the text record, and identifying the real-time emotion state conveyed in the customer service dialogue to obtain an emotion recognition result; Step 3, according to the facial micro-expression features, constructing a dynamically changing spatial geometric configuration from the three-dimensional spatial coordinate trajectories of three feature points, namely the left-eye outer canthus point, the right-eye outer canthus point and the mouth corner point, in continuous video frames, performing multi-scale mechanical coupling analysis on the spatial geometric configuration by regarding it as a composite laminated structure composed of a plurality of micro primitives, mapping the mechanical response of the configuration at different time scales into macroscopic emotion feature vectors by solving the mechanical interactions among the micro primitives and applying micromechanical homogenization theory, cascading the macroscopic emotion feature vectors with the initial emotion recognition result, and inputting the cascaded features into a preset correction network to obtain a corrected emotion recognition result; Step 4, based on the corrected emotion recognition result, inferring the relationship between emotion expression and its causes in the form of a directed graph by reconstructing the contextual relations of clauses in the dialogue text to obtain an emotion cause analysis report, and combining the emotion recognition result with the emotion cause analysis report to obtain a corresponding business decision recommendation.
  2. The deep learning-based customer service dialogue emotion analysis processing method according to claim 1, characterized by further comprising: collecting original multi-modal data during the customer service dialogue in real time through an Internet-of-Things perception-layer device, wherein the original multi-modal data comprises at least the customer and customer-service voice signals, the dialogue video stream, and the dialogue text record.
  3. The deep learning-based customer service dialogue emotion analysis processing method according to claim 2, wherein step 1, capturing global and local emotion-related features in the voice signal through preprocessing and feature extraction of the collected voice signal to obtain the deep voice emotion feature vector, comprises: extracting logarithmic Mel spectrum features from each frame of the voice signal to obtain a two-dimensional time-frequency feature map; inputting the two-dimensional time-frequency feature map into a convolutional neural network and extracting local emotion features through multi-layer convolution operations to obtain a local feature tensor; inputting the local feature tensor into a coordinate attention module, performing global average pooling on the feature map along the horizontal and vertical coordinate directions respectively to obtain position-sensitive feature vectors in the two directions, splicing the feature vectors of the two directions and applying a convolution transformation to compute the channel attention weights and spatial attention weights, and weighting the local feature tensor with these weights to obtain an emotion feature tensor; unfolding the emotion feature tensor along the time dimension and inputting it into a bidirectional long short-term memory network to capture long-range temporal emotion dependencies in the voice signal and obtain a global temporal emotion feature vector; and performing dimensionality reduction and nonlinear mapping on the global temporal emotion feature vector through a fully connected layer to obtain the final deep voice emotion feature vector.
  4. The deep learning-based customer service dialogue emotion analysis processing method according to claim 3, wherein said step 2 comprises: performing layer normalization on the deep voice emotion feature vector to obtain an aligned voice emotion feature representation; extracting spatio-temporal features from the video frame sequence through a three-dimensional convolutional neural network based on the facial micro-expression features extracted from the video stream to obtain a facial dynamic feature tensor; splicing the voice emotion feature representation, the facial dynamic feature tensor and the text semantic feature vector along the feature dimension to obtain a multi-modal fusion feature matrix; inputting the multi-modal fusion feature matrix into a cross-modal attention layer, taking the voice modal features as the query matrix and the visual and text modal features as the key and value matrices, and computing the voice-visual attention weights and the voice-text attention weights respectively; weighting and summing the visual modal features and the text modal features according to the attention weights to obtain visually enhanced features and text-enhanced features, and applying a residual connection with the original voice features to obtain the fused multi-modal emotion features; and inputting the multi-modal emotion features into a multi-layer perceptron for nonlinear transformation, computing the probability distribution over the emotion categories through a classification layer, and taking the category with the highest probability as the real-time emotion recognition result for the current dialogue segment.
  5. The deep learning-based customer service dialogue emotion analysis processing method according to claim 4, wherein step 3, constructing a dynamically changing spatial geometric configuration from the three-dimensional spatial coordinate trajectories of the three feature points, namely the left-eye outer canthus point, the right-eye outer canthus point and the mouth corner point, in continuous video frames according to the facial micro-expression features, comprises: based on the facial dynamic feature tensor, locating the left-eye outer canthus point, the right-eye outer canthus point and the mouth corner point in each video frame through a key-point detection network, and generating the two-dimensional coordinates of the three key points in the image coordinate system; predicting the depth information of each key point through a preset depth estimation network in combination with the two-dimensional coordinates, and mapping the two-dimensional coordinates into three-dimensional spatial coordinates to obtain a three-dimensional spatial coordinate trajectory sequence of the three key points across continuous video frames; performing surface fitting on the triangle formed by the three key points in each frame according to the three-dimensional spatial coordinate trajectory sequence to construct a dynamically changing spatial triangular mesh surface whose vertex coordinates change continuously over time; and discretizing the constructed spatial triangular mesh surface into a plurality of micro primitives, each corresponding to a local region of the surface, and defining the mechanical connection relationships between primitives to form a multi-scale composite laminated structure.
  6. The deep learning-based customer service dialogue emotion analysis processing method according to claim 5, wherein mapping the mechanical response of the geometric configuration at different time scales into macroscopic emotion feature vectors by solving the mechanical interactions among the micro primitives and applying micromechanical homogenization theory, cascading the macroscopic emotion feature vectors with the initial emotion recognition result, and inputting the cascaded features into a preset correction network to obtain the corrected emotion recognition result comprises: applying preset mechanical boundary conditions to the composite laminated structure, solving the stress-field and strain-field distributions of each micro primitive at different time scales by a finite element analysis method, and calculating the mechanical interaction strength between the micro primitives; inputting the obtained microscale mechanical response results into a micromechanical homogenization model, mapping the microscale stress-strain fields into macroscopic equivalent mechanical property parameters by a volume-average method, and generating the macroscopic emotion feature vector corresponding to each time scale; taking the real-time emotion recognition result as the initial emotion recognition result and cascading it with the macroscopic emotion feature vectors along the feature dimension to form a combined emotion feature representation; and inputting the combined emotion feature representation into a preset deep neural network correction model, performing error compensation and calibration on the initial emotion recognition result through multi-layer nonlinear transformation, and outputting the corrected emotion recognition result.
  7. The deep learning-based customer service dialogue emotion analysis processing method according to claim 6, wherein step 4, obtaining an emotion cause analysis report based on the corrected emotion recognition result by reconstructing the contextual relations of clauses in the dialogue text and inferring the relationship between emotion expression and its causes in the form of a directed graph, and obtaining a corresponding business decision recommendation by combining the emotion recognition result with the emotion cause analysis report, comprises: based on the corrected emotion recognition result, locating key clauses related to the emotion state change from the dialogue text record, performing dependency syntactic analysis and semantic role labeling on the key clauses, extracting the semantic dependency relations among the clauses, and constructing an initial semantic relation graph; inputting the initial semantic relation graph into a graph neural network, learning the context dependency relations among clause nodes through information propagation and aggregation between nodes, and describing the causal association paths between emotion expression nodes and candidate cause nodes in the form of a directed graph; generating an emotion cause analysis report from the causal association paths output by the graph neural network in combination with the corrected emotion recognition result, the report comprising the current emotion type, the key clause content triggering the emotion, the emotion evolution process and the corresponding cause description; and inputting the corrected emotion recognition result and the emotion cause analysis report into a preset business decision model, and outputting the corresponding business decision recommendation by matching rules in a business rule base.
  8. A deep learning-based customer service dialogue emotion analysis processing system implementing the method according to any one of claims 1 to 7, comprising: an extraction module, configured to capture global and local emotion-related features in the voice signals through preprocessing and feature extraction of the collected voice signals to obtain deep voice emotion feature vectors; a fusion module, configured to perform multi-modal fusion analysis on the deep voice emotion feature vectors, the facial micro-expression features extracted from the video stream and the semantic features extracted from the text record, and to identify the real-time emotion state conveyed in the customer service dialogue to obtain an emotion recognition result; a correction module, configured to construct a dynamically changing spatial geometric configuration from the three feature points, namely the left-eye outer canthus point, the right-eye outer canthus point and the mouth corner point, in continuous video frames according to the facial micro-expression features, perform multi-scale mechanical coupling analysis on the spatial geometric configuration by regarding it as a composite laminated structure composed of a plurality of micro primitives, map the mechanical response of the configuration at different time scales into macroscopic emotion feature vectors by solving the mechanical interactions among the micro primitives and applying micromechanical homogenization theory, cascade the macroscopic emotion feature vectors with the initial emotion recognition result, and input the cascaded features into a preset correction network to obtain a corrected emotion recognition result; and a processing module, configured to obtain an emotion cause analysis report based on the corrected emotion recognition result by reconstructing the contextual relations of clauses in the dialogue text and inferring the relationship between emotion expression and its causes in the form of a directed graph, and to obtain a corresponding business decision recommendation by combining the emotion recognition result with the emotion cause analysis report.
  9. A computing device, comprising: one or more processors; and storage means for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the method according to any one of claims 1 to 7.
  10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a program which, when executed by a processor, implements the method according to any one of claims 1 to 7.
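
The cross-modal fusion step of claim 4 can be illustrated with a short sketch. The PyTorch code below is a minimal illustration under stated assumptions, not the patented implementation: it assumes all three modalities have already been projected to a common feature dimension, and the use of `nn.MultiheadAttention`, the hidden sizes and the six emotion classes are illustrative choices.

```python
# Hedged sketch of the cross-modal fusion of claim 4: speech features act as the
# query, visual and text features provide keys/values, and a residual connection
# keeps the original speech representation. All dimensions are illustrative only.
import torch
import torch.nn as nn

class CrossModalFusion(nn.Module):
    def __init__(self, dim=128, heads=4, n_classes=6):
        super().__init__()
        self.norm = nn.LayerNorm(dim)                                      # layer-normalize speech features
        self.attn_v = nn.MultiheadAttention(dim, heads, batch_first=True)  # speech -> vision attention
        self.attn_t = nn.MultiheadAttention(dim, heads, batch_first=True)  # speech -> text attention
        self.classifier = nn.Sequential(                                   # multi-layer perceptron + classifier
            nn.Linear(dim, dim), nn.ReLU(),
            nn.Linear(dim, n_classes),
        )

    def forward(self, speech, vision, text):          # each: (batch, seq_len, dim)
        q = self.norm(speech)
        vis_enh, _ = self.attn_v(q, vision, vision)   # visually enhanced features
        txt_enh, _ = self.attn_t(q, text, text)       # text-enhanced features
        fused = speech + vis_enh + txt_enh            # residual connection with original speech
        logits = self.classifier(fused.mean(dim=1))   # pool over the segment, then classify
        return logits.softmax(dim=-1)                 # probability distribution over emotion categories

probs = CrossModalFusion()(torch.randn(2, 10, 128),   # speech feature sequence
                           torch.randn(2, 16, 128),   # facial dynamic features
                           torch.randn(2, 12, 128))   # text semantic features
print(probs.argmax(dim=-1))                           # real-time emotion class per dialogue segment
```

Because the speech modality supplies the query in both attention calls, the visual and text streams are re-weighted from the perspective of the acoustic emotion cues before the residual fusion, mirroring the order of operations in the claim.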
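
Claim 5 turns three facial key points into a dynamically deforming triangular surface that is then discretized into micro primitives. The NumPy sketch below shows one possible discretization, assuming the 3D coordinates of the left-eye outer canthus, right-eye outer canthus and mouth corner are already available per frame; the uniform barycentric subdivision and the toy coordinate trajectory are assumptions made purely for illustration.

```python
# Illustrative sketch only: a uniform subdivision of the key-point triangle into
# small triangles ("micro primitives"), tracked over a toy 3-frame trajectory.
import numpy as np

def subdivide_triangle(p0, p1, p2, n=4):
    """Split the key-point triangle into n*n small triangles (micro primitives)."""
    u, v = p1 - p0, p2 - p0
    prims = []
    for i in range(n):
        for j in range(n - i):
            a = p0 + u * i / n + v * j / n
            b, c = a + u / n, a + v / n
            prims.append(np.stack([a, b, c]))          # upward-pointing primitive
            if j < n - i - 1:
                prims.append(np.stack([b, c, b + v / n]))  # inverted primitive in the row
    return np.array(prims)                              # shape: (num_prims, 3, 3)

def triangle_areas(prims):
    e1, e2 = prims[:, 1] - prims[:, 0], prims[:, 2] - prims[:, 0]
    return 0.5 * np.linalg.norm(np.cross(e1, e2), axis=1)

# Toy trajectory over 3 frames: the mouth corner moves, deforming the configuration.
frames = [
    (np.array([-3.0, 1.0, 0.0]), np.array([3.0, 1.0, 0.0]), np.array([0.0, -2.0, 0.0])),
    (np.array([-3.0, 1.0, 0.0]), np.array([3.0, 1.0, 0.0]), np.array([0.2, -2.3, 0.1])),
    (np.array([-3.0, 1.0, 0.0]), np.array([3.0, 1.0, 0.0]), np.array([0.5, -2.6, 0.2])),
]
for t, (left_canthus, right_canthus, mouth_corner) in enumerate(frames):
    prims = subdivide_triangle(left_canthus, right_canthus, mouth_corner, n=4)
    print(f"frame {t}: {len(prims)} primitives, total area {triangle_areas(prims).sum():.3f}")
```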
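
Claim 6 reduces the micro-scale mechanical response to a macroscopic emotion descriptor by volume averaging and then recalibrates the initial recognition result with a correction network. The sketch below compresses that idea under explicit simplifications: the per-primitive "strain" is approximated by the relative area change between frames rather than a finite element stress/strain solution, and the correction network's layer sizes are arbitrary, so it should be read as a data-flow illustration, not the claimed method.

```python
# Hedged sketch of volume-average homogenization plus a correction network.
# The "strain" proxy and network sizes are illustrative assumptions.
import numpy as np
import torch
import torch.nn as nn

def macro_feature(areas_t0, areas_t1):
    """Volume(area)-weighted averaging of per-primitive strain -> macroscopic descriptor."""
    strain = (areas_t1 - areas_t0) / areas_t0           # per-primitive relative deformation
    weights = areas_t0 / areas_t0.sum()                  # volume-average weights
    return np.array([np.sum(weights * strain),           # mean macroscopic strain
                     np.sum(weights * strain ** 2),      # second moment (fluctuation energy)
                     strain.max() - strain.min()])       # heterogeneity of the response

class CorrectionNetwork(nn.Module):
    """Cascades macro features with the initial class probabilities and recalibrates."""
    def __init__(self, n_classes=6, macro_dim=3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_classes + macro_dim, 32), nn.ReLU(),
            nn.Linear(32, n_classes),
        )

    def forward(self, initial_probs, macro):
        joint = torch.cat([initial_probs, macro], dim=-1)   # feature cascading
        return self.net(joint).softmax(dim=-1)              # corrected emotion distribution

areas_t0 = np.full(16, 9.0 / 16)                            # primitive areas at frame t
areas_t1 = areas_t0 * (1 + 0.05 * np.random.randn(16))      # slightly deformed at frame t+1
macro = torch.tensor(macro_feature(areas_t0, areas_t1), dtype=torch.float32).unsqueeze(0)
initial = torch.tensor([[0.1, 0.6, 0.1, 0.1, 0.05, 0.05]])  # initial emotion recognition result
print(CorrectionNetwork()(initial, macro))                  # corrected probabilities
```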
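
Claim 7 builds a directed clause graph, uses a graph neural network to expose causal paths toward the emotion-bearing clause, and matches the outcome against a business rule base. The toy sketch below preserves only that data flow: the clauses, relations and rules are invented examples, and a plain path search over a networkx DiGraph stands in for the graph neural network.

```python
# Illustrative sketch of the cause-analysis and rule-matching data flow of claim 7.
# Clauses, edges, and the rule base are toy stand-ins; a path search replaces the GNN.
import networkx as nx

clauses = {
    0: "I have called three times already",
    1: "the refund still has not arrived",
    2: "I am really angry about this",          # clause tied to the detected emotion
}
g = nx.DiGraph()
g.add_nodes_from(clauses)
g.add_edge(0, 1, relation="temporal")           # semantic/dependency relations between clauses
g.add_edge(1, 2, relation="causal")

emotion, emotion_node = "anger", 2              # corrected emotion recognition result
cause_paths = [nx.shortest_path(g, src, emotion_node)
               for src in g.nodes
               if src != emotion_node and nx.has_path(g, src, emotion_node)]

report = {                                      # emotion cause analysis report
    "emotion": emotion,
    "trigger_clause": clauses[emotion_node],
    "cause_paths": [[clauses[n] for n in path] for path in cause_paths],
}

business_rules = {                              # toy stand-in for the business rule base
    ("anger", "refund"): "Escalate to a senior agent and expedite the refund check.",
}
topic = "refund" if any("refund" in c for c in clauses.values()) else "other"
print(report)
print("Recommendation:", business_rules.get((emotion, topic), "Continue standard handling."))
```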
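
Finally, claim 8 splits the system into extraction, fusion, correction and processing modules. The skeleton below shows one conceivable interface between those modules; every method body is a placeholder returning dummy values, and real models such as the sketches above would sit behind these interfaces.

```python
# Interface-only sketch of the module decomposition in claim 8; all outputs are dummies.
class ExtractionModule:
    def run(self, audio):
        return {"speech_vec": [0.0] * 128}                    # deep voice emotion feature vector

class FusionModule:
    def run(self, speech_vec, video, text):
        return {"emotion": "neutral", "probs": [1.0]}          # initial emotion recognition result

class CorrectionModule:
    def run(self, video, initial):
        return {"emotion": "anger"}                            # corrected result after mechanical analysis

class ProcessingModule:
    def run(self, corrected, text):
        return {"report": "...", "recommendation": "escalate"}

class EmotionAnalysisSystem:
    def __init__(self):
        self.extract, self.fuse = ExtractionModule(), FusionModule()
        self.correct, self.process = CorrectionModule(), ProcessingModule()

    def analyze(self, audio, video, text):
        speech = self.extract.run(audio)
        initial = self.fuse.run(speech["speech_vec"], video, text)
        corrected = self.correct.run(video, initial)
        return self.process.run(corrected, text)

print(EmotionAnalysisSystem().analyze(audio=None, video=None, text="..."))
```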

Description

Customer service dialogue emotion analysis processing method and system based on deep learning

Technical Field

The invention relates to the technical field of multi-modal emotion computing, and in particular to a customer service dialogue emotion analysis processing method and system based on deep learning.

Background

Existing artificial intelligence systems realize single-modal or simple multi-modal emotion feature extraction and classification. A typical case is an artificial intelligence customer service dialogue emotion analysis system used in the financial field: voice signals and text records of customer service dialogues are collected, voice acoustic features and text semantic features are extracted respectively by deep learning algorithms, and emotion recognition is completed by a classification network after simple feature concatenation, so as to judge states such as a customer's repayment willingness and emotional fluctuation and provide references for post-loan customer service work. However, such systems do not fuse visual modal features such as the customer's facial micro-expressions, so the model is prone to emotion misjudgment caused by acoustic environment interference and ambiguous text expression, and its recognition accuracy for weak and complex emotions is insufficient. They also lack causality-based emotion tracing, making it difficult to mine the underlying causes of customer emotions; they can only output basic emotion category results and cannot provide targeted business decision support for customer service personnel, which reduces the practical application value of the emotion analysis results.

Disclosure of the Invention

The invention provides a customer service dialogue emotion analysis processing method and system based on deep learning, which realize the recognition of customer service dialogue emotion, interpretable cause tracing, and intelligent business decision recommendation.
In order to solve the above technical problems, the technical scheme of the invention is as follows.

In a first aspect, a deep learning-based customer service dialogue emotion analysis processing method includes: Step 1, performing preprocessing and feature extraction on collected voice signals, and capturing global and local emotion-related features in the voice signals to obtain deep voice emotion feature vectors; Step 2, performing multi-modal fusion analysis on the deep voice emotion feature vectors, facial micro-expression features extracted from the video stream, and semantic features extracted from the text record, and identifying the real-time emotion state conveyed in the customer service dialogue to obtain an emotion recognition result; Step 3, according to the facial micro-expression features, constructing a dynamically changing spatial geometric configuration from the three-dimensional spatial coordinate trajectories of three feature points, namely the left-eye outer canthus point, the right-eye outer canthus point and the mouth corner point, in continuous video frames, performing multi-scale mechanical coupling analysis on the spatial geometric configuration by regarding it as a composite laminated structure composed of a plurality of micro primitives, mapping the mechanical response of the configuration at different time scales into macroscopic emotion feature vectors by solving the mechanical interactions among the micro primitives and applying micromechanical homogenization theory, cascading the macroscopic emotion feature vectors with the initial emotion recognition result, and inputting the cascaded features into a preset correction network to obtain a corrected emotion recognition result; Step 4, based on the corrected emotion recognition result, inferring the relationship between emotion expression and its causes in the form of a directed graph by reconstructing the contextual relations of clauses in the dialogue text to obtain an emotion cause analysis report, and combining the emotion recognition result with the emotion cause analysis report to obtain a corresponding business decision recommendation.

Furthermore, step 1 is preceded by collecting original multi-modal data during the customer service dialogue in real time through Internet-of-Things perception-layer equipment, wherein the original multi-modal data comprises at least the customer and customer-service voice signals, the dialogue video stream, and the dialogue text record.

Further, capturing global and local emotion-related features in the voice signal through preprocessing and feature extraction of the collected voice signal to obtain the deep voice emotion feature vector comprises: extracting logarithmic Mel spectrum features from each frame of the voice signal to obtain a two-dimensional time-frequency feature map.
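
For orientation, the speech branch of Step 1 (log-Mel map, convolutional feature extraction, coordinate attention, bidirectional LSTM and a fully connected projection, as detailed in claim 3) could look roughly like the following PyTorch sketch. The channel counts, the reduction ratio in the coordinate attention block and the 128-dimensional output are illustrative assumptions, not values taken from the patent.

```python
# Minimal PyTorch sketch of the claimed speech branch, assuming a log-Mel
# time-frequency map as input. Layer sizes are illustrative choices.
import torch
import torch.nn as nn

class CoordinateAttention(nn.Module):
    """Pool along the two coordinate axes separately, then derive
    direction-aware attention weights (the claimed coordinate attention step)."""
    def __init__(self, channels, reduction=8):
        super().__init__()
        mid = max(channels // reduction, 8)
        self.conv1 = nn.Conv2d(channels, mid, kernel_size=1)
        self.act = nn.ReLU(inplace=True)
        self.conv_h = nn.Conv2d(mid, channels, kernel_size=1)
        self.conv_w = nn.Conv2d(mid, channels, kernel_size=1)

    def forward(self, x):                                         # x: (B, C, H, W)
        b, c, h, w = x.shape
        pool_h = x.mean(dim=3, keepdim=True)                      # global average pooling along W
        pool_w = x.mean(dim=2, keepdim=True).permute(0, 1, 3, 2)  # global average pooling along H
        y = self.act(self.conv1(torch.cat([pool_h, pool_w], dim=2)))  # splice + convolution transform
        y_h, y_w = torch.split(y, [h, w], dim=2)
        att_h = torch.sigmoid(self.conv_h(y_h))                            # (B, C, H, 1)
        att_w = torch.sigmoid(self.conv_w(y_w.permute(0, 1, 3, 2)))        # (B, C, 1, W)
        return x * att_h * att_w                                  # weighted emotion feature tensor

class SpeechEmotionEncoder(nn.Module):
    def __init__(self, n_mels=64, hidden=128, out_dim=128):
        super().__init__()
        self.cnn = nn.Sequential(                                 # multi-layer convolution: local features
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
        )
        self.attn = CoordinateAttention(64)
        self.lstm = nn.LSTM(64 * n_mels, hidden, batch_first=True, bidirectional=True)
        self.fc = nn.Linear(2 * hidden, out_dim)                  # dimensionality reduction + mapping

    def forward(self, logmel):                                    # logmel: (B, 1, n_mels, T)
        feat = self.attn(self.cnn(logmel))
        b, c, f, t = feat.shape
        seq = feat.permute(0, 3, 1, 2).reshape(b, t, c * f)       # unfold along the time dimension
        out, _ = self.lstm(seq)                                   # BiLSTM: long-range dependencies
        return torch.tanh(self.fc(out[:, -1]))                    # deep voice emotion feature vector

x = torch.randn(2, 1, 64, 100)                                    # e.g. 100 frames of 64-band log-Mel
print(SpeechEmotionEncoder()(x).shape)                            # torch.Size([2, 128])
```

The two pooling operations in `CoordinateAttention` correspond to the claimed global average pooling along the horizontal and vertical coordinate directions, and the final fully connected layer performs the claimed dimensionality reduction and nonlinear mapping.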