CN-122024147-A - Multi-mode emotion analysis method and system based on dynamic noise estimation and KAN
Abstract
The application discloses a multi-modal emotion analysis method and a multi-modal emotion analysis system based on dynamic noise estimation and KAN, belonging to the technical field of multi-modal large language models. The method comprises the steps of: inputting the extracted multi-modal feature vectors respectively into the dynamic noise estimators (DNEs) of the corresponding modal features of a machine learning model to generate noise confidence scores of the corresponding modal features; respectively calculating the regularization coefficient corresponding to the noise confidence score of each feature dimension; calculating a loss value of the loss function of the machine learning model based on the regularization coefficients; updating the weights and spline control coefficients of the NIB-KAN based on the loss value; and generating a first emotion analysis result of a video to be analyzed based on the feature encoders and NIB-KAN of the trained machine learning model. The application improves the robustness of the machine learning model when carrying out emotion analysis on video in a complex environment.
Inventors
- WU TIAN
- LU RUIKANG
- LIU CHUNNIAN
- XIAO HUAN
Assignees
- Nanchang University (南昌大学)
Dates
- Publication Date
- 20260512
- Application Date
- 20260415
Claims (10)
- 1. A multi-modal emotion analysis method based on dynamic noise estimation and KAN, characterized by comprising the following steps: during forward propagation of a training stage of a machine learning model, after feature extraction is carried out on training samples by a feature encoder of the machine learning model, inputting the extracted multi-modal feature vectors respectively into dynamic noise estimators (DNEs) of the corresponding modal features of the machine learning model, and generating noise confidence scores of the corresponding modal features; respectively calculating the regularization coefficient corresponding to the noise confidence score of each feature dimension; calculating a loss value of a loss function of the machine learning model based on the regularization coefficients; during back-propagation of the training stage, updating, based on the loss value, the weights and spline control coefficients of the noise-aware information bottleneck Kolmogorov-Arnold network (NIB-KAN) of the corresponding modal features of the machine learning model, wherein the NIB-KAN is used for carrying out modal alignment on the multi-modal features, and the updated weights and spline control coefficients are used for the next round of training of the machine learning model; after training of the machine learning model is completed, in response to receiving a video to be analyzed, generating a first emotion analysis result of the video to be analyzed based on the feature encoder and NIB-KAN of the trained machine learning model and an emotion analyzer.
- 2. The multi-modal emotion analysis method based on dynamic noise estimation and KAN according to claim 1, wherein the inputting the extracted multi-modal feature vectors respectively into the dynamic noise estimators (DNEs) of the corresponding modal features of the machine learning model to generate noise confidence scores of the corresponding modal features comprises: respectively carrying out feature compression and activation, and probability mapping and normalization, on the vectors of the corresponding modal features in the extracted multi-modal feature vectors, based on the DNE of the corresponding modal feature of the machine learning model, to generate the noise confidence scores of the corresponding modal features.
- 3. The multi-modal emotion analysis method based on dynamic noise estimation and KAN of claim 2, wherein the feature compression and activation are achieved by the following formula: h_m = σ(W1·x_m + b1), wherein x_m is the complete input feature vector of the corresponding modality m; W1 is the learnable weight matrix of a first stage, the first stage being a stage of training using coarse-grained samples; b1 is the bias vector of the first stage; σ(·) is the activation function; h_m is the hidden-space feature vector obtained by performing feature compression and activation on x_m; the probability mapping and normalization are realized through the following formula: s_m = Norm(W2·h_m + b2), wherein W2 is the learnable weight matrix of a second stage, the second stage being a fine-tuning stage using fine-grained samples; b2 is the bias scalar of the second stage; Norm(·) is the normalization function; s_m is the noise confidence score.
- 4. The multi-modal emotion analysis method based on dynamic noise estimation and KAN of claim 3, wherein, upon forward propagation of the training stage of the machine learning model, the method further comprises: performing modal alignment on the multi-modal features based on the NIB-KAN, and generating a corresponding token sequence; splicing the token sequence and the received text instruction vector to obtain a multi-modal input sequence; and generating a second emotion analysis result corresponding to the multi-modal input sequence based on the emotion analyzer of the machine learning model.
- 5. The multi-modal emotion analysis method based on dynamic noise estimation and KAN of claim 4, wherein said performing modal alignment on said multi-modal features based on said NIB-KAN and generating a corresponding token sequence is implemented by the following formula: z_j = Σ_{i=1}^{d_m} φ_{m,i,j}(x_{m,i}), wherein z_j is the feature value of the j-th token of the output token sequence; d_m is the overall feature dimension of the m-th modality feature; i is the index of the feature dimension of the multi-modal feature; x_{m,i} is the feature value of the i-th feature dimension of the m-th modality feature; φ_{m,i,j} is the mapping function, based on B-spline functions, from the i-th feature dimension to the j-th token under the m-th modality, whose expression is: φ_{m,i,j}(x) = w_{m,i,j}·SiLU(x) + Σ_{k=1}^{G} c_{m,i,j,k}·B_k(x), wherein B_k(x) is the output value of the k-th B-spline basis function at x; w_{m,i,j} is the learnable scaling weight for the i-th feature dimension and the j-th token of the m-th modality; c_{m,i,j,k} is the learnable control coefficient corresponding to the k-th B-spline basis function for the i-th feature dimension and the j-th token of the m-th modality; G is the number of B-spline grids; SiLU(·) is the Sigmoid linear unit activation function.
- 6. The method of claim 5, wherein the calculating a loss value of a loss function of the machine learning model based on the regularization coefficients comprises: calculating a loss value of a structured sparsity loss based on the regularization coefficients, the structured sparsity loss being used to apply an L1 regularization constraint to the regularization coefficients of each modality.
- 7. The multi-modal emotion analysis method based on dynamic noise estimation and KAN of claim 6, wherein the calculating of a loss value of the structured sparsity loss based on the regularization coefficients is realized by the following formula: L_sparse = Σ_m ||λ_m||_1, wherein λ_m is the regularization coefficient vector of the corresponding modality m; ||·||_1 represents the L1-norm calculation.
- 8. The method of claim 1, wherein the DNE is a multi-layer perceptron MLP comprising two fully connected layers.
- 9. A multi-modal emotion analysis system based on dynamic noise estimation and KAN, characterized in that the multi-modal emotion analysis system based on dynamic noise estimation and KAN comprises: a first generation module, configured to, during forward propagation of a training stage of a machine learning model and after feature extraction is carried out on training samples by a feature encoder of the machine learning model, input the extracted multi-modal feature vectors respectively into dynamic noise estimators (DNEs) of the corresponding modal features of the machine learning model, and generate noise confidence scores of the corresponding modal features; a first calculation module, configured to respectively calculate the regularization coefficient corresponding to the noise confidence score of each feature dimension; a second calculation module, configured to calculate a loss value of a loss function of the machine learning model based on the regularization coefficients; an updating module, configured to, during back-propagation of the training stage, update, based on the loss value, the weights and spline control coefficients of the noise-aware information bottleneck Kolmogorov-Arnold network (NIB-KAN) of the corresponding modal features of the machine learning model, wherein the NIB-KAN is used for carrying out modal alignment on the multi-modal features, and the updated weights and spline control coefficients are used for the next round of training of the machine learning model; and a second generation module, configured to, after training of the machine learning model is completed and in response to receiving a video to be analyzed, generate a first emotion analysis result of the video to be analyzed based on the feature encoder and NIB-KAN of the trained machine learning model and the emotion analyzer.
- 10. A readable storage medium, wherein a program or instructions are stored on the readable storage medium, which, when executed by a processor, implement the steps of the multi-modal emotion analysis method based on dynamic noise estimation and KAN of any one of claims 1-8.
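For illustration only (not part of the claimed text): claims 2, 3 and 8 together describe the DNE as a two-layer MLP, with a feature compression and activation stage followed by a probability mapping and normalization stage. The Python sketch below shows one way such a module could look; the choice of ReLU as the activation function, a sigmoid as the normalization function, and all layer sizes are assumptions of the sketch, not fixed by the claims.

```python
import numpy as np

def dne_score(x, W1, b1, W2, b2):
    """Dynamic noise estimator (DNE) sketch: a two-layer MLP.

    Stage 1 (feature compression and activation): h = sigma(W1 @ x + b1)
    Stage 2 (probability mapping and normalization): s = Norm(W2 @ h + b2)

    sigma = ReLU and Norm = sigmoid are assumptions of this sketch; the
    claims only require "an activation function" and "a normalization
    function".
    """
    h = np.maximum(0.0, W1 @ x + b1)          # feature compression + activation
    s = 1.0 / (1.0 + np.exp(-(W2 @ h + b2)))  # map to (0, 1) confidence scores
    return s

# Toy usage: a 6-dim modality feature compressed to 4 hidden units,
# then scored per feature dimension (6 outputs).
rng = np.random.default_rng(0)
x = rng.normal(size=6)
W1, b1 = rng.normal(size=(4, 6)), np.zeros(4)
W2, b2 = rng.normal(size=(6, 4)), 0.0
scores = dne_score(x, W1, b1, W2, b2)
print(scores.shape)  # (6,)
```

The per-dimension output is what allows the later claims to attach a regularization coefficient to each feature dimension rather than to a whole modality.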
Description
Multi-modal emotion analysis method and system based on dynamic noise estimation and KAN
Technical Field
The application belongs to the technical field of multi-modal large language models, and particularly relates to a multi-modal emotion analysis method and system based on dynamic noise estimation and KAN.
Background
At present, a multi-modal large language model generally adopts a linear projection layer as the modal alignment layer, so as to realize a unified mapping of visual, auditory and textual features when carrying out emotion analysis on video. However, this solution has the following significant drawbacks. Firstly, the linear alignment layer can only perform a global linear transformation: when the video data or audio data of a video is disturbed by physical noise, that noise passes through the upstream nonlinear feature extraction network and causes a complex nonlinear distribution shift in the feature space, so that noise features are transmitted directly downstream. Secondly, a fixed weight is used for all inputs, so when the signal quality of a certain modality is extremely poor, its high-noise features still interfere with the overall reasoning. Furthermore, conventional techniques typically make trade-offs at the granularity of whole modalities, discarding an entire modality and thereby losing part of its valid semantic information.
Disclosure of Invention
The embodiment of the application aims to provide a multi-modal emotion analysis method and a multi-modal emotion analysis system based on dynamic noise estimation and KAN, which can improve the robustness of a machine learning model in emotion analysis of videos in a complex environment.
In order to solve the technical problems, the application is realized as follows. In a first aspect, an embodiment of the present application provides a multi-modal emotion analysis method based on dynamic noise estimation and KAN, where the method includes: during forward propagation of a training stage of a machine learning model, after feature extraction is carried out on training samples by a feature encoder of the machine learning model, inputting the extracted multi-modal feature vectors respectively into dynamic noise estimators (DNEs) of the corresponding modal features of the machine learning model, and generating noise confidence scores of the corresponding modal features; respectively calculating the regularization coefficient corresponding to the noise confidence score of each feature dimension; calculating a loss value of a loss function of the machine learning model based on the regularization coefficients; during back-propagation of the training stage, updating, based on the loss value, the weights and spline control coefficients of the noise-aware information bottleneck Kolmogorov-Arnold network (NIB-KAN) of the corresponding modal features of the machine learning model, wherein the NIB-KAN is used for carrying out modal alignment on the multi-modal features, and the updated weights and spline control coefficients are used for the next round of training of the machine learning model; after training of the machine learning model is completed, in response to receiving a video to be analyzed, generating a first emotion analysis result of the video to be analyzed based on the feature encoder and NIB-KAN of the trained machine learning model and an emotion analyzer.
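As a concrete illustration of the loss step described above, the structured sparsity loss of claim 7 sums the L1 norms of the per-modality regularization coefficients. In the sketch below, the mapping from noise confidence scores to regularization coefficients (λ_m = 1 − s_m) is a purely hypothetical assumption, since the text does not fix its form; only the L1 summation itself comes from the claims.

```python
import numpy as np

def structured_sparsity_loss(reg_coeffs):
    """L_sparse = sum over modalities m of ||lambda_m||_1 (claim 7)."""
    return sum(np.abs(lam).sum() for lam in reg_coeffs.values())

# Hypothetical per-dimension noise confidence scores for two modalities.
scores = {"video": np.array([0.9, 0.2, 0.6]),
          "audio": np.array([0.5, 0.8])}

# lambda_m = 1 - s_m is an illustrative assumption, not the patented mapping.
reg = {m: 1.0 - s for m, s in scores.items()}

loss = structured_sparsity_loss(reg)
print(round(loss, 3))  # video: 0.1+0.8+0.4 = 1.3; audio: 0.5+0.2 = 0.7 -> 2.0
```

Under this assumed mapping, dimensions the DNE scores as low-confidence receive larger coefficients, so the L1 term pushes exactly those dimensions toward sparsity rather than discarding a whole modality.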
In a second aspect, an embodiment of the present application provides a multi-modal emotion analysis system based on dynamic noise estimation and KAN, where the system includes: a first generation module, configured to, during forward propagation of a training stage of a machine learning model and after feature extraction is carried out on training samples by a feature encoder of the machine learning model, input the extracted multi-modal feature vectors respectively into dynamic noise estimators (DNEs) of the corresponding modal features of the machine learning model, and generate noise confidence scores of the corresponding modal features; a first calculation module, configured to respectively calculate the regularization coefficient corresponding to the noise confidence score of each feature dimension; a second calculation module, configured to calculate a loss value of a loss function of the machine learning model based on the regularization coefficients; an updating module, configured to, during back-propagation of the training stage, update, based on the loss value, the weights and spline control coefficients of the noise-aware information bottleneck Kolmogorov-Arnold network (NIB-KAN) of the corresponding modal features of the machine learning model, wherein the NIB-KAN is used for carrying out modal alignment on the multi-modal features, and the updated weights and spline control coefficients are used for the next round of training of the machine learning model.
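For illustration, the per-edge mapping function of the NIB-KAN described in claim 5 combines a SiLU branch with a learnable B-spline branch. The sketch below uses degree-1 ("hat") B-spline basis functions for brevity, whereas practical KAN implementations typically use cubic splines; the grid size and all coefficient values are arbitrary choices of the sketch.

```python
import numpy as np

def silu(x):
    """Sigmoid linear unit: x * sigmoid(x)."""
    return x / (1.0 + np.exp(-x))

def bspline_basis(x, grid):
    """Degree-1 (piecewise-linear 'hat') B-spline basis values at x.

    A simplification of the claim's B-spline basis: each grid point
    carries a hat function that is 1 at the point and 0 at its neighbours.
    """
    out = np.zeros(len(grid))
    step = grid[1] - grid[0]
    for k, g in enumerate(grid):
        left = grid[k - 1] if k > 0 else g - step
        right = grid[k + 1] if k < len(grid) - 1 else g + step
        if left <= x <= g:
            out[k] = (x - left) / (g - left)
        elif g < x <= right:
            out[k] = (right - x) / (right - g)
    return out

def kan_edge(x, w, c, grid):
    """phi(x) = w * SiLU(x) + sum_k c_k * B_k(x)  (structure of claim 5)."""
    return w * silu(x) + c @ bspline_basis(x, grid)

grid = np.linspace(-1.0, 1.0, 5)              # G = 5 grid points (arbitrary)
c = np.array([0.1, -0.2, 0.3, 0.0, 0.5])      # learnable control coefficients
y = kan_edge(0.5, w=1.0, c=c, grid=grid)
print(round(float(y), 3))
```

During back-propagation, both w and the control coefficients c are the "weight and spline control coefficient" quantities that the updating module adjusts, which is what lets each feature-dimension-to-token edge learn its own nonlinear shape instead of a single global linear map.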