CN-121997201-A - Intelligent AI agent emotion recognition system and method
Abstract
An intelligent AI agent emotion recognition system and method comprises: constructing a dominant emotion keyword library; performing feature extraction on high-recognition voice features, text data and interaction behavior data based on a solidified knowledge base and the dominant emotion keyword library to generate multi-modal features; constructing a sample database and extracting a dominant emotion annotation sample set from it; performing dominant emotion multi-modal feature contribution statistical analysis, dynamic weight rule base construction and dominant emotion recognition, and judging the user's dominant emotion according to the dominant emotion recognition result; or extracting implicit micro-features, semantic contradiction degree and interaction behavior features from the high-recognition voice features, the text data and the interaction behavior data, performing implicit emotion recognition and personalized calibration, and judging the user's implicit emotion or judging that the user has no implicit emotion. The invention preserves the efficiency of dominant emotion recognition, addresses the difficulty the prior art has in recognizing implicit emotions, and covers the full range of scenarios in which users express emotion.
Inventors
- Xiong Zhehui
Assignees
- 深圳市云客派科技有限公司
Dates
- Publication Date: 2026-05-08
- Application Date: 2026-01-09
Claims (10)
- 1. An intelligent AI agent emotion recognition method, characterized by comprising the following steps: Step S1, obtaining voice data, text data and interaction behavior data in real time during the interaction between a user and the AI agent, performing differentiated noise separation on the voice data to obtain clean voice data, constructing a solidified knowledge base, and performing general emotion-associated feature enhancement on the clean voice data according to the solidified knowledge base to obtain high-recognition voice features; Step S2, constructing a dominant emotion keyword library, performing feature extraction on the high-recognition voice features, the text data and the interaction behavior data based on the solidified knowledge base and the dominant emotion keyword library to generate multi-modal features, constructing a sample database, extracting a dominant emotion annotation sample set from the sample database, performing dominant emotion multi-modal feature contribution statistical analysis, dynamic weight rule base construction and dominant emotion recognition, and judging the user's dominant emotion according to the dominant emotion recognition result or executing Step S3; and Step S3, extracting implicit micro-features, semantic contradiction degree and interaction behavior features from the high-recognition voice features, the text data and the interaction behavior data to construct multi-modal features, performing implicit emotion recognition and personalized calibration on the multi-modal features, and judging the user's implicit emotion or judging that the user has no implicit emotion.
- 2. The intelligent AI agent emotion recognition method of claim 1, wherein the process of performing differentiated noise separation on the voice data comprises: constructing a noise spectrum feature library containing reference features of a plurality of noise types; framing the collected voice data, obtaining the short-time Fourier transform spectrum of each frame and constructing a spectrogram; extracting spectral features from the spectrogram; matching the spectral features against the reference features in the noise spectrum feature library by similarity to obtain the noise type of each spectral feature; performing differentiated noise separation on the spectral features according to their noise types; and reconstructing the noise-separated spectral features into clean voice data (an illustrative sketch follows the claims).
- 3. The intelligent AI agent emotion recognition method of claim 2, wherein the process of constructing the solidified knowledge base includes: constructing a multi-emotion voice sample library comprising voice samples of all target emotion types; extracting all acoustic features of each voice sample in the sample library; performing correlation strength analysis between each acoustic feature and the target emotion types to obtain the correlation coefficient of each acoustic feature with each target emotion type; presetting a strong correlation threshold and an irrelevance threshold; screening out the set of acoustic features whose correlation coefficient with any target emotion type has an absolute value greater than the strong correlation threshold and marking them as general emotion-associated features; screening out the set of acoustic features whose correlation coefficients with all target emotion types have absolute values smaller than the irrelevance threshold and marking them as irrelevant features; and constructing the solidified knowledge base from the general emotion-associated features and the irrelevant features (sketched after the claims).
- 4. The intelligent AI agent emotion recognition method of claim 3, wherein performing general emotion-associated feature enhancement on the clean voice data based on the solidified knowledge base comprises: extracting all acoustic features of the clean voice data and constructing an acoustic feature vector; matching each acoustic feature in the acoustic feature vector against the general emotion-associated features and the irrelevant features in the solidified knowledge base; if an acoustic feature belongs to the general emotion-associated features, assigning it a fixed enhancement coefficient, and if it belongs to the irrelevant features, assigning it a fixed attenuation coefficient; weighting each acoustic feature in the acoustic feature vector by its corresponding fixed enhancement or attenuation coefficient to obtain enhanced features; and performing PCA dimensionality reduction on the enhanced features to generate the high-recognition voice features (sketched after the claims).
- 5. The intelligent AI agent emotion recognition method of claim 4, wherein constructing the dominant emotion keyword library and performing feature extraction on the high-recognition voice features, the text data and the interaction behavior data based on the solidified knowledge base and the dominant emotion keyword library comprises: extracting from the solidified knowledge base the acoustic features whose correlation coefficient with any dominant emotion has an absolute value greater than the strong correlation threshold, marking them as dominant voice features, and extracting the dominant voice features from the high-recognition voice features; constructing the dominant emotion keyword library, which comprises keywords associated with each type of dominant emotion and the weight corresponding to each keyword; preprocessing the text data and performing statistical analysis on the preprocessed text data based on the dominant emotion keyword library to obtain the occurrence frequency of the keywords associated with each type of dominant emotion; obtaining a score for each type of dominant emotion from those occurrence frequencies and the corresponding keyword weights; performing sentence-pattern analysis on the preprocessed text data to obtain the ratio of each sentence type, and constructing text features from the dominant emotion scores and the sentence-pattern ratios (sketched after the claims); quantizing the interaction behavior data to obtain standardized feature values and constructing interaction behavior features from them; and constructing the multi-modal features from the dominant voice features, the text features and the interaction behavior features.
- 6. The intelligent AI agent emotion recognition method of claim 5, wherein constructing the sample database, extracting the dominant emotion annotation sample set from the sample database, and performing contribution statistical analysis of the dominant emotion multi-modal features, construction of the dynamic weight rule base and dominant emotion recognition include: constructing the sample database, which comprises multi-modal features under a plurality of dominant emotion scenarios and under implicit emotion scenarios, and extracting the multi-modal features under the dominant emotion scenarios from the sample database as the dominant emotion annotation sample set; performing contribution statistical analysis of the dominant emotion multi-modal features on the dominant emotion annotation sample set to obtain the contribution ratios of the three feature types in the multi-modal features at different intensity levels, and constructing the dynamic weight rule base from those contribution ratios; constructing a dominant emotion recognition model and training it with the dominant emotion annotation sample set to obtain a trained dominant emotion recognition model; obtaining the intensity levels of the three feature types in the current multi-modal features, setting dynamic weights for the three feature types according to the contribution ratios corresponding to those intensity levels and the dynamic weight rule base, inputting the dynamically weighted multi-modal features into the dominant emotion recognition model, and outputting the confidence of each type of dominant emotion; and presetting a dominant emotion confidence threshold, selecting the dominant emotion with the highest confidence and judging it to be the user's dominant emotion if its confidence is greater than the dominant emotion confidence threshold, or performing implicit emotion recognition if the confidences of all types of dominant emotion are below the threshold.
- 7. The intelligent AI agent emotion recognition method of claim 6, wherein performing contribution statistical analysis of the dominant emotion multi-modal features on the dominant emotion annotation sample set and constructing the dynamic weight rule base include: grouping the dominant emotion annotation sample set by dominant emotion type to obtain samples of each dominant emotion; normalizing the three feature types in the multi-modal features of each dominant emotion sample, mapping them to a standardized feature-value interval, and selecting threshold points within that interval to divide it into sub-intervals of different intensity levels; dividing the three feature types in each dominant emotion sample by intensity level to generate sub-samples at each intensity level of each feature type; performing emotion labeling and modal contribution labeling on the sub-samples at each intensity level of each feature type within each dominant emotion class to obtain the numbers of core-contribution samples and auxiliary-contribution samples of the three feature types at each intensity level; setting a core contribution weight coefficient and an auxiliary contribution weight coefficient, and obtaining the contribution ratios of the three feature types at each intensity level for each dominant emotion from the core-contribution sample numbers, the auxiliary-contribution sample numbers and the two weight coefficients (a sketch follows the claims); and setting a contribution ratio threshold and dynamic weight adjustment rules, comparing the contribution ratio of each feature type at each intensity level against the threshold, adding a dynamic weight adjustment rule for a feature type and intensity level whenever its contribution ratio exceeds the threshold, and constructing the dynamic weight rule base from the feature types and intensity levels for which rules were added.
- 8. The intelligent AI agent emotion recognition method of claim 7, wherein the process of Step S3 includes: extracting from the solidified knowledge base the acoustic features whose correlation coefficient with any implicit emotion has an absolute value greater than the strong correlation threshold, marking them as implicit micro-features, and extracting the implicit micro-features from the high-recognition voice features; comparing the text features of the user's current text data with the text features of historical text data to obtain the semantic contradiction degree (sketched after the claims); obtaining the interaction behavior features of the current interaction behavior data, and constructing multi-modal features from the implicit micro-features, the semantic contradiction degree and the interaction behavior features; constructing an implicit emotion recognition model, extracting the multi-modal features under a plurality of implicit emotion scenarios from the sample database as an implicit emotion annotation sample set, and training the implicit emotion recognition model with it to obtain a trained implicit emotion recognition model; and inputting the multi-modal features into the implicit emotion recognition model, outputting the confidence of each type of implicit emotion, presetting an implicit emotion confidence threshold, comparing the confidence of each implicit emotion against the threshold, judging that the user has no implicit emotion if every confidence is below the threshold, and performing personalized calibration on the confidence of any implicit emotion that exceeds the threshold.
- 9. The intelligent AI agent emotion recognition method of claim 8, wherein the process of performing personalized calibration comprises: constructing a user portrait library for storing the user's static portrait features; extracting the user's static portrait features from the user portrait library, extracting dynamic features from the interaction between the user and the AI agent, and constructing a user portrait from the static portrait features and the dynamic features; constructing a portrait association table comprising the association weights and association coefficients between different portrait features and each type of implicit emotion, and obtaining a calibration coefficient from the static portrait features and dynamic features in the user portrait together with the portrait association table (sketched after the claims); and calibrating the confidence of the implicit emotion with the calibration coefficient, and judging it to be the user's implicit emotion if the calibrated confidence is greater than the implicit emotion confidence threshold.
- 10. An intelligent AI agent emotion recognition system, applying the intelligent AI agent emotion recognition method of any one of claims 1 to 9, characterized by comprising a cloud end in communication connection with a data acquisition module, a dominant emotion recognition module and an implicit emotion recognition module; the data acquisition module is used for obtaining voice data, text data and interaction behavior data in real time during the interaction between the user and the AI agent, performing differentiated noise separation on the voice data to obtain clean voice data, constructing the solidified knowledge base, and performing general emotion-associated feature enhancement on the clean voice data according to the solidified knowledge base to obtain high-recognition voice features; the dominant emotion recognition module is used for constructing the dominant emotion keyword library, performing feature extraction on the high-recognition voice features, the text data and the interaction behavior data based on the solidified knowledge base and the dominant emotion keyword library to generate multi-modal features, constructing the sample database, extracting the dominant emotion annotation sample set from the sample database, performing dominant emotion multi-modal feature contribution statistical analysis, dynamic weight rule base construction and dominant emotion recognition, and judging the user's dominant emotion according to the dominant emotion recognition result or invoking the implicit emotion recognition module; and the implicit emotion recognition module is used for extracting implicit micro-features, semantic contradiction degree and interaction behavior features from the high-recognition voice features, the text data and the interaction behavior data to construct multi-modal features, performing implicit emotion recognition and personalized calibration on the multi-modal features, and judging the user's implicit emotion or judging that the user has no implicit emotion.
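The sketches below are illustrative readings of the claimed steps, not part of the claims themselves. First, a minimal sketch of the differentiated noise separation of claim 2, assuming a reference library that maps each noise type to a mean magnitude-spectrum profile plus an over-subtraction factor; the cosine-similarity matching and spectral-subtraction separation step are assumptions, since the claim fixes neither:

```python
import numpy as np
from scipy.signal import stft, istft

def separate_noise(signal, fs, noise_library, frame_len=512, floor=0.1):
    """Differentiated noise separation: classify each frame's magnitude
    spectrum against reference noise profiles, then apply spectral
    subtraction with the matched noise type's over-subtraction factor."""
    f, t, Z = stft(signal, fs=fs, nperseg=frame_len)
    mag, phase = np.abs(Z), np.angle(Z)
    for j in range(mag.shape[1]):
        frame = mag[:, j]
        # cosine similarity against every reference noise profile
        def sim(profile):
            return frame @ profile / (
                np.linalg.norm(frame) * np.linalg.norm(profile) + 1e-12)
        ntype = max(noise_library, key=lambda k: sim(noise_library[k][0]))
        profile, alpha = noise_library[ntype]
        # noise-type-specific subtraction with a spectral floor to avoid
        # negative magnitudes (a guard against musical noise)
        mag[:, j] = np.maximum(frame - alpha * profile, floor * frame)
    _, clean = istft(mag * np.exp(1j * phase), fs=fs, nperseg=frame_len)
    return clean
```

Each `noise_library` entry maps a noise type to a `(profile, alpha)` pair; a profile can be built offline by averaging STFT magnitudes over labelled recordings of that noise type.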
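Next, a sketch of the solidified knowledge base construction of claim 3, assuming one-vs-rest point-biserial correlation as the "correlation strength analysis" (the claim names no specific correlation measure, and both thresholds are placeholders):

```python
import numpy as np

def build_solidified_knowledge_base(X, y, emotion_types,
                                    strong_thr=0.6, irrelevant_thr=0.1):
    """X: (n_samples, n_features) acoustic features; y: emotion label per
    sample. A feature index is 'general emotion-associated' if |corr| with
    ANY target emotion (one-vs-rest) exceeds strong_thr, and 'irrelevant'
    if |corr| with EVERY target emotion stays below irrelevant_thr."""
    corr = np.zeros((X.shape[1], len(emotion_types)))
    for k, emo in enumerate(emotion_types):
        ind = (y == emo).astype(float)                    # one-vs-rest labels
        for i in range(X.shape[1]):
            corr[i, k] = np.corrcoef(X[:, i], ind)[0, 1]  # point-biserial
    peak = np.abs(corr).max(axis=1)
    return {"general": set(np.where(peak > strong_thr)[0]),
            "irrelevant": set(np.where(peak < irrelevant_thr)[0]),
            "corr": corr}
```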
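A sketch of the general emotion-associated feature enhancement of claim 4; the fixed enhancement/attenuation coefficients (1.5 and 0.5) and the PCA dimensionality are illustrative assumptions, and feature vectors are assumed to be float arrays indexed consistently with the knowledge base:

```python
import numpy as np
from sklearn.decomposition import PCA

def enhance(feature_vector, kb, gain=1.5, atten=0.5):
    """Apply the fixed enhancement coefficient to general emotion-associated
    features and the fixed attenuation coefficient to irrelevant features;
    features in neither set pass through with weight 1."""
    w = np.ones_like(feature_vector)
    for i in kb["general"]:
        w[i] = gain
    for i in kb["irrelevant"]:
        w[i] = atten
    return feature_vector * w

def to_high_recognition_features(enhanced_batch, n_components=16):
    """PCA dimensionality reduction over a batch of enhanced feature
    vectors (one row per frame or utterance)."""
    return PCA(n_components=n_components).fit_transform(enhanced_batch)
```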
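A sketch of the text-feature branch of claim 5. The keyword library is a toy stand-in, and reading the sentence-pattern analysis as ratios of declarative, exclamatory and interrogative terminators is an assumption:

```python
import re

# Toy dominant-emotion keyword library: emotion -> {keyword: weight} (assumed)
KEYWORDS = {
    "anger": {"unacceptable": 0.9, "complaint": 0.7, "ridiculous": 0.8},
    "joy":   {"great": 0.6, "thanks": 0.5, "perfect": 0.8},
}

def dominant_emotion_scores(text):
    """Score each dominant emotion as sum(keyword frequency x keyword weight)."""
    tokens = re.findall(r"[a-z']+", text.lower())         # minimal preprocessing
    return {emo: sum(tokens.count(k) * w for k, w in kw.items())
            for emo, kw in KEYWORDS.items()}

def sentence_pattern_ratios(text):
    """Share of each sentence type, judged by terminating punctuation."""
    marks = re.findall(r"[.!?]", text)
    n = max(len(marks), 1)
    return {"declarative": marks.count(".") / n,
            "exclamatory": marks.count("!") / n,
            "interrogative": marks.count("?") / n}

def text_features(text):
    """Concatenate the emotion scores and sentence-pattern ratios."""
    return (list(dominant_emotion_scores(text).values())
            + list(sentence_pattern_ratios(text).values()))
```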
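A sketch of the contribution-ratio statistics and dynamic weight rule base of claims 6 and 7. The combination formula here (core and auxiliary sample counts weighted, then normalized across the three modalities) is one plausible reading; the patent states the inputs but not the exact formula, and all thresholds are placeholders:

```python
def contribution_ratios(counts, w_core=1.0, w_aux=0.5):
    """counts: {modality: {level: (n_core, n_aux)}} from the labelled
    sub-samples of claim 7. Score each modality at a level as
    n_core*w_core + n_aux*w_aux, normalized across modalities."""
    levels = next(iter(counts.values()))
    out = {}
    for lvl in levels:
        score = {m: c[lvl][0] * w_core + c[lvl][1] * w_aux
                 for m, c in counts.items()}
        total = sum(score.values()) or 1.0
        out[lvl] = {m: s / total for m, s in score.items()}
    return out

def build_rule_base(ratios, ratio_thr=0.4):
    """Admit a rule only where a modality's contribution ratio at an
    intensity level exceeds the threshold (claim 7's admission test)."""
    return {(m, lvl): r for lvl, per_m in ratios.items()
            for m, r in per_m.items() if r > ratio_thr}

def dynamic_weights(levels, rule_base, default=1.0 / 3):
    """levels: {modality: intensity level} of the current multi-modal
    feature; fall back to equal weighting where no rule applies, then
    renormalize so the three modality weights sum to one."""
    w = {m: rule_base.get((m, lvl), default) for m, lvl in levels.items()}
    total = sum(w.values())
    return {m: v / total for m, v in w.items()}
```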
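A sketch of the semantic contradiction degree of claim 8 and the personalized calibration of claim 9. Cosine distance for the contradiction degree and a weighted blend for the calibration coefficient are assumptions; the patent fixes neither formula:

```python
import numpy as np

def semantic_contradiction(current_vec, history_vecs):
    """Contradiction degree between current text features and the mean of
    historical text features, taken here as cosine distance."""
    hist = np.mean(history_vecs, axis=0)
    cos = current_vec @ hist / (
        np.linalg.norm(current_vec) * np.linalg.norm(hist) + 1e-12)
    return 1.0 - cos

def calibrate_confidence(conf, portrait_features, assoc_table, emotion,
                         conf_thr=0.6):
    """Fold the association weight and association coefficient of each
    portrait feature for this implicit emotion into one calibration
    coefficient, rescale the confidence, and re-test the threshold."""
    coeff = 1.0
    for feat in portrait_features:
        weight, assoc = assoc_table.get((feat, emotion), (0.0, 1.0))
        coeff += weight * (assoc - 1.0)   # pull toward each association coeff.
    calibrated = min(conf * coeff, 1.0)
    return calibrated, calibrated > conf_thr
```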
Description
Intelligent AI agent emotion recognition system and method

Technical Field

The invention relates to the technical field of artificial intelligence and emotion analysis, and in particular to an intelligent AI agent emotion recognition system and method.

Background

Under the wave of digital transformation, the customer service industry is undergoing a profound revolution. The traditional human agent service model faces many challenges, such as high labor cost, limited service hours, and service quality that fluctuates with agent mood. Meanwhile, as artificial intelligence technologies such as natural language processing, speech recognition and machine learning have matured, intelligent AI agents have emerged and brought revolutionary change to customer service. An intelligent AI agent is an automated customer service system based on artificial intelligence technology that uses natural language processing (NLP) and machine learning to simulate a human customer service representative interacting with customers. Compared with a traditional human agent, an intelligent AI agent offers notable advantages such as uninterrupted 7×24 service, standardized process execution, and millisecond response times. However, existing intelligent AI agent emotion recognition technology has several shortcomings: voice features are prone to noise interference, leading to insufficient recognition; emotion is not recognized in separate dominant and implicit layers, making it difficult to balance recognition efficiency with the ability to capture implicit emotion; multi-modal features use fixed weight allocation and cannot adapt to differences across emotion types and feature intensity levels; and individual differences in users' emotion expression habits are ignored, causing misjudgment and missed detection of implicit emotions. Together these defects prevent the prior art from recognizing user emotion comprehensively and accurately, which limits the targeting of AI agent service strategies and the improvement of the user interaction experience.
Disclosure of Invention

In order to solve the above technical problems, the invention aims to provide an intelligent AI agent emotion recognition method comprising the following steps: Step S1, obtaining voice data, text data and interaction behavior data in real time during the interaction between a user and the AI agent, performing differentiated noise separation on the voice data to obtain clean voice data, constructing a solidified knowledge base, and performing general emotion-associated feature enhancement on the clean voice data according to the solidified knowledge base to obtain high-recognition voice features; Step S2, constructing a dominant emotion keyword library, performing feature extraction on the high-recognition voice features, the text data and the interaction behavior data based on the solidified knowledge base and the dominant emotion keyword library to generate multi-modal features, constructing a sample database, extracting a dominant emotion annotation sample set from the sample database, performing dominant emotion multi-modal feature contribution statistical analysis, dynamic weight rule base construction and dominant emotion recognition, and judging the user's dominant emotion according to the dominant emotion recognition result or executing Step S3; and Step S3, extracting implicit micro-features, semantic contradiction degree and interaction behavior features from the high-recognition voice features, the text data and the interaction behavior data to construct multi-modal features, performing implicit emotion recognition and personalized calibration on the multi-modal features, and judging the user's implicit emotion or judging that the user has no implicit emotion. Further, the process of performing differentiated noise separation on the voice data includes: constructing a noise spectrum feature library containing reference features of a plurality of noise types; framing the collected voice data, obtaining the short-time Fourier transform spectrum of each frame and constructing a spectrogram; extracting spectral features from the spectrogram; matching the spectral features against the reference features in the noise spectrum feature library by similarity to obtain the noise type of each spectral feature; performing differentiated noise separation on the spectral features according to their noise types; and reconstructing the noise-separated spectral features into clean voice data. Further, the process of constructing the solidified knowledge base includes: constructing a multi-emotion voice sample library comprising voice samples of all target emotion types, and extracting all acoustic features of each voice sample in the sample library.
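For orientation, a minimal sketch of the layered flow of Steps S1-S3: dominant emotion recognition runs first, and implicit recognition is attempted only when no dominant confidence clears its threshold. The model arguments are stand-ins returning per-emotion confidences (personalized calibration is assumed to happen inside the implicit model), and both thresholds are placeholders:

```python
def recognize_emotion(dominant_feats, implicit_feats,
                      dominant_model, implicit_model,
                      dom_thr=0.7, imp_thr=0.6):
    """Layered recognition: judge the dominant emotion when its best
    confidence exceeds dom_thr; otherwise fall through to implicit
    recognition, and report no implicit emotion below imp_thr."""
    dom = dominant_model(dominant_feats)          # {emotion: confidence}
    emo, conf = max(dom.items(), key=lambda kv: kv[1])
    if conf > dom_thr:
        return ("dominant", emo, conf)
    imp = implicit_model(implicit_feats)          # {emotion: confidence}
    emo, conf = max(imp.items(), key=lambda kv: kv[1])
    if conf > imp_thr:
        return ("implicit", emo, conf)
    return ("none", None, conf)                   # no implicit emotion found
```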