CN-121982763-A - Micro-expression recognition method based on progressive domain migration and multi-granularity feature learning


Abstract

The invention discloses a micro-expression recognition method based on progressive domain migration and multi-granularity feature learning. The method comprises: obtaining macro-expression and micro-expression video data sets containing category labels; extracting an optical flow map between the start frame and the peak frame from each video in the two data sets; constructing source domain, intermediate domain and target domain sample sets, with the optical flow map data domain of the macro-expression videos as the source domain, the optical flow map data domain of the micro-expression videos as the target domain, and an optical flow map data domain generated by a conditional denoising diffusion probabilistic model as the intermediate domain; constructing a micro-expression recognition model comprising a multi-granularity adaptive feature learning network, a classifier and a discriminator; training the model with a progressive domain adversarial training method; and performing micro-expression recognition with the trained model. The invention addresses the scarcity of micro-expression samples and the weakness of micro-expression features, and improves micro-expression recognition performance.

Inventors

LU GUANMING
LU LE

Assignees

Nanjing University of Posts and Telecommunications (南京邮电大学)

Dates

Publication Date: 20260505
Application Date: 20260209

Claims (10)

1. A micro-expression recognition method based on progressive domain migration and multi-granularity feature learning, characterized by comprising the following steps: S1, acquiring a macro-expression video data set and a micro-expression video data set that contain category labels, and extracting an optical flow map between the start frame and the peak frame from each video in the macro-expression video data set and the micro-expression video data set; S2, taking the optical flow map data domain of the macro-expression videos as a source domain, taking the optical flow map data domain of the micro-expression videos as a target domain, and taking an optical flow map data domain generated by a conditional denoising diffusion probabilistic model as an intermediate domain; S3, constructing a micro-expression recognition model comprising: a multi-granularity adaptive feature learning network for extracting and fusing multi-granularity features from the input optical flow map and outputting a fused feature vector; a classifier for classifying the fused feature vector and outputting micro-expression class probabilities; and a discriminator for determining the data domain of the optical flow map from which the fused feature vector was extracted; S4, training the micro-expression recognition model with a progressive domain adversarial training method; S5, performing micro-expression recognition with the trained micro-expression recognition model.
2. The micro-expression recognition method based on progressive domain migration and multi-granularity feature learning according to claim 1, wherein in step S1 the optical flow map between the start frame and the peak frame is extracted from a video as follows: performing face detection and 68-point facial landmark localization on each frame using the Dlib toolkit, calculating a similarity transformation matrix from the landmarks, aligning the face region and cropping it to a standard size of 224×224 pixels; for each video sequence, determining the start frame and the peak frame using the provided peak frame labels or an automatic detection algorithm based on optical flow accumulation; calculating the dense optical flow field from the start frame to the peak frame with the TV-L1 optical flow algorithm to obtain a horizontal displacement component and a vertical displacement component, and calculating an optical flow magnitude map from them; stacking the horizontal displacement component, the vertical displacement component and the optical flow magnitude map as three channels to form a three-channel optical flow map; and finally normalizing the pixel values of all optical flow maps to the interval [-1, 1].
3. The micro-expression recognition method based on progressive domain migration and multi-granularity feature learning according to claim 1, wherein in step S2 the optical flow map data domain of the macro-expression videos is taken as the source domain, the optical flow map data domain of the micro-expression videos is taken as the target domain, the optical flow map data domain generated by the conditional denoising diffusion probabilistic model is taken as the intermediate domain, and a source domain sample set, an intermediate domain sample set and a target domain sample set are constructed respectively, specifically comprising: defining the optical flow map between the start frame and the peak frame extracted from a video of the macro-expression video data set as a macro-expression optical flow map, and defining the optical flow map between the start frame and the peak frame extracted from a video of the micro-expression video data set as a micro-expression optical flow map; taking the optical flow map data domain of the macro-expression videos as the source domain and constructing the source domain sample set from the macro-expression optical flow maps; taking the optical flow map data domain of the micro-expression videos as the target domain and constructing the target domain sample set from the motion-enhanced micro-expression optical flow maps; training the conditional denoising diffusion probabilistic model with the micro-expression optical flow maps of the target domain as conditional input; synthesizing K intermediate domain sample sets with the trained conditional denoising diffusion probabilistic model by adjusting its condition control parameter, each intermediate domain sample set forming a progressive transition between the source domain and the target domain, and extracting the corresponding optical flow maps from each intermediate domain for domain-adaptive learning; and re-ordering all sample sets of the three domains by motion intensity from largest to smallest to construct a progressive domain sequence from macro-expression to micro-expression.
4. The micro-expression recognition method based on progressive domain migration and multi-granularity feature learning according to claim 3, wherein, when the target domain sample set is constructed, an adaptive Eulerian video magnification algorithm is used to enhance the motion of the micro-expression optical flow maps, and the target domain sample set is constructed from the motion-enhanced micro-expression optical flow maps, specifically comprising: first calculating statistical characteristics of the optical flow magnitude of a micro-expression optical flow map, including its mean and variance; dynamically determining an amplification factor from these statistical characteristics; using the amplification factor to enhance the optical flow magnitude of the micro-expression optical flow map so that it reaches a preset proportional range of the optical flow magnitude of the source domain macro-expression optical flow maps, thereby increasing the motion signal intensity of the micro-expression; and constructing the target domain sample set from the enhanced micro-expression optical flow maps.
5. The micro-expression recognition method based on progressive domain migration and multi-granularity feature learning according to claim 3, wherein, when the K intermediate domain sample sets are synthesized with the trained conditional denoising diffusion probabilistic model by adjusting the condition control parameter, the guidance scale parameter of classifier-free guidance is adjusted to control the motion intensity of the generated samples: higher guidance scale values tend to generate samples with stronger motion that are closer to the source domain, while lower values generate samples with weaker motion that are closer to the target domain, thereby constructing a sequence of intermediate domains with decreasing motion intensity between the source domain and the target domain.
6. The micro-expression recognition method based on progressive domain migration and multi-granularity feature learning according to claim 3, wherein the multi-granularity adaptive feature learning network of the micro-expression recognition model in step S3 comprises a hybrid backbone feature extraction module, a spatial adaptive focusing module and a graph-structured semantic fusion module; the hybrid backbone feature extraction module extracts a multi-scale feature pyramid from the input optical flow map and combines vision Transformer blocks based on a window attention mechanism with modernized CNN blocks, wherein the Transformer blocks capture the global motion patterns of macro-expressions and the CNN blocks extract the local detail features of micro-expressions; the spatial adaptive focusing module uses deformable convolution to dynamically sample the intermediate-layer features output by the hybrid backbone, and adaptively focuses on key regions under different motion intensities by learning spatial offsets; the graph-structured semantic fusion module models the cooperative relationships among facial action units by initializing node features from the spatially adaptive features according to predefined nodes, aggregating information with a multi-head graph attention network, and finally fusing the graph-level representation with global context features to obtain the multi-granularity fused feature vector.
7. The micro-expression recognition method based on progressive domain migration and multi-granularity feature learning according to claim 6, wherein the classifier consists of one or two fully connected layers and a Softmax output layer, and maps the fused feature vector to a probability distribution over the micro-expression classes.
8. The micro-expression recognition method based on progressive domain migration and multi-granularity feature learning according to claim 7, wherein the discriminator consists of K domain discriminators, each of which has the same structure and is a binary classifier, and each domain discriminator determines whether a fused feature vector comes from the previous domain or the current domain of the domain sequence.
9. The micro-expression recognition method based on progressive domain migration and multi-granularity feature learning according to claim 8, wherein training the micro-expression recognition model with the progressive domain adversarial training method in step S4 comprises the following steps: constructing a training framework containing the multi-granularity adaptive feature learning network, the classifier and the K domain discriminators; pre-training the multi-granularity adaptive feature learning network and the classifier on the source domain sample set; performing progressive multi-stage adversarial training, in which the domain discriminators are activated stage by stage in the order of the domain sequence and, at each stage, the data of the two adjacent sample sets is used to minimize the classification loss and the domain adversarial loss, driving the multi-granularity adaptive feature learning network to migrate the feature distribution smoothly from the previous domain to the next domain, so that the network transitions gradually from coarse-grained feature extraction suited to macro-expressions to fine-grained feature extraction suited to micro-expressions; and finally fine-tuning the multi-granularity adaptive feature learning network and the classifier on the target domain sample set to complete the training of the micro-expression recognition model.
10. The micro-expression recognition method based on progressive domain migration and multi-granularity feature learning according to claim 9, wherein, during the progressive multi-stage adversarial training, the weight coefficient of the domain adversarial loss at each stage is set to a dynamic value that is proportional to a measure of the difference between the feature distributions of the corresponding two adjacent sample sets.
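
The three-channel optical flow map of claim 2 can be illustrated with a minimal pure-Python sketch. It assumes the horizontal and vertical displacement fields have already been computed (for example by a TV-L1 implementation) and are passed in as nested lists. The function name `build_flow_map` and the normalization rule (dividing by the largest magnitude and rescaling the magnitude channel) are assumptions made for illustration; the claim only requires that values end up in [-1, 1].

```python
import math


def build_flow_map(u, v):
    """Stack horizontal flow, vertical flow and magnitude into an H x W x 3 map.

    u, v: H x W nested lists with the horizontal / vertical displacement.
    Returns an H x W nested list of (u, v, magnitude) tuples in [-1, 1].
    """
    # Per-pixel optical flow magnitude sqrt(u^2 + v^2).
    mag = [[math.hypot(a, b) for a, b in zip(row_u, row_v)]
           for row_u, row_v in zip(u, v)]

    # Assumed normalization: the largest magnitude bounds |u| and |v|,
    # so dividing by it keeps both displacement channels inside [-1, 1].
    peak = max(max(row) for row in mag)
    if peak == 0.0:
        peak = 1.0  # all-zero flow: avoid division by zero

    # The magnitude channel lies in [0, peak]; rescale it to [-1, 1].
    return [
        [(a / peak, b / peak, 2.0 * m / peak - 1.0)
         for a, b, m in zip(row_u, row_v, row_m)]
        for row_u, row_v, row_m in zip(u, v, mag)
    ]
```

Dividing by a shared peak preserves the direction of each displacement vector; a per-channel min-max rescaling would be an equally valid reading of the claim.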
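
Claim 4 determines the amplification factor dynamically from the mean and variance of the optical flow magnitude, without giving a formula. The sketch below shows one possible rule, stated purely as an assumption: the factor is chosen so that the level "mean plus one standard deviation" of the micro-expression magnitudes reaches a preset fraction of the mean source domain magnitude. The names `adaptive_amplification`, `target_ratio` and `max_factor` are hypothetical.

```python
import math
from statistics import fmean, pvariance


def adaptive_amplification(magnitudes, source_mean, target_ratio=0.5,
                           max_factor=20.0):
    """Return (factor, mean, variance) for one micro-expression flow map.

    magnitudes: flat list of optical flow magnitudes of the flow map.
    source_mean: mean optical flow magnitude of the source (macro) domain.
    target_ratio: assumed preset fraction of the source magnitude to reach.
    """
    mean = fmean(magnitudes)
    variance = pvariance(magnitudes)

    # Assumed rule: maps with a large spread are amplified less, because
    # their strongest motions already sit well above the mean.
    level = mean + math.sqrt(variance)
    if level <= 0.0:
        return 1.0, mean, variance  # no motion at all: leave unchanged

    factor = target_ratio * source_mean / level
    # Never attenuate, and cap the factor so noise is not blown up.
    factor = min(max(factor, 1.0), max_factor)
    return factor, mean, variance


def amplify(magnitudes, factor):
    """Scale every magnitude by the amplification factor."""
    return [m * factor for m in magnitudes]
```

For example, magnitudes `[1.0, 3.0]` have mean 2 and variance 1, so with `source_mean=12.0` and the default ratio the factor is 6 / 3 = 2.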
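
The stage schedule of claims 8 to 10 can be sketched as follows. The sketch pairs each domain in an ordered sequence with its predecessor (one pair per domain discriminator) and assigns each stage an adversarial loss weight proportional to a distribution difference measure. The claims do not name the measure; the Euclidean distance between feature means used here is an assumed stand-in, and `build_stages`, `mean_discrepancy` and `scale` are hypothetical names.

```python
import math


def mean_vector(features):
    """Column-wise mean of a list of equal-length feature vectors."""
    n = len(features)
    return [sum(column) / n for column in zip(*features)]


def mean_discrepancy(feats_a, feats_b):
    """Assumed difference measure: distance between the two feature means."""
    return math.dist(mean_vector(feats_a), mean_vector(feats_b))


def build_stages(domains, scale=1.0):
    """Build one adversarial training stage per adjacent pair of domains.

    domains: list of (name, features) ordered by decreasing motion intensity,
             starting with the source domain.
    Returns a list of dicts describing which two domains the stage's
    discriminator separates and the weight of its adversarial loss.
    """
    stages = []
    for k in range(1, len(domains)):
        prev_name, prev_feats = domains[k - 1]
        cur_name, cur_feats = domains[k]
        stages.append({
            "stage": k,
            "previous": prev_name,
            "current": cur_name,
            # Weight proportional to the discrepancy of the adjacent sets.
            "adv_weight": scale * mean_discrepancy(prev_feats, cur_feats),
        })
    return stages
```

With this rule a stage that bridges a large gap between adjacent domains receives a stronger alignment signal, while nearly identical neighbours contribute little adversarial pressure.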

Description

Micro-expression recognition method based on progressive domain migration and multi-granularity feature learning

Technical Field

The invention relates to the field of computer vision and artificial intelligence, and in particular to a micro-expression recognition method based on progressive domain migration and multi-granularity feature learning.

Background

Micro-expressions are spontaneous expressions that are extremely short in duration, weak in intensity and generally limited to local areas of the face; they are key clues for revealing hidden true feelings. Micro-expression recognition technology has important application value in fields such as national security, judicial interrogation, clinical diagnosis and human-computer interaction. The evolution of micro-expression recognition technology can be divided into three main stages, each of which has significant limitations.

Early methods based on hand-crafted features construct feature descriptors from expert knowledge and require little training data. However, hand-crafted features discriminate subtle muscle movement changes poorly and are extremely sensitive to interference factors such as illumination, head pose and individual differences; their generalization capability is seriously insufficient, which makes them difficult to apply in complex real-world scenes.

End-to-end methods based on deep learning use the strong hierarchical feature learning capability of convolutional neural networks (CNNs) to extract expression features from data. To alleviate the scarcity of micro-expression data, researchers generally adopt a transfer learning strategy: the model is first pre-trained on a macro-expression data set and then fine-tuned to adapt to the micro-expression task.
However, macro-expressions and micro-expressions differ substantially in muscle movement amplitude, duration, spatial extent and physiological basis. This difference results in a significant shift between the data distribution of the source domain (macro-expressions) and that of the target domain (micro-expressions). Simple fine-tuning often lets the model adapt to shallow texture features but cannot align the deep semantic feature distributions, which causes "negative transfer", in which the pre-trained knowledge harms the performance of the model on the target domain.

Methods based on domain adaptation are the current research frontier; they aim to reduce the distribution difference between the source domain and the target domain. Representative methods such as Domain-Adversarial Neural Networks (DANN) introduce a domain discriminator and train it adversarially against the feature extractor, forcing the feature extractor to learn domain-invariant features. Some recent patents further introduce, on this basis, an attention mechanism based on fixed regions that statically weights preset eye, nose and mouth regions. However, such a division of regions is fixed and defined a priori, and cannot adapt to the changes in important regions caused by occlusion or pose changes.

On analysis, the prior art still suffers from the following disadvantages:

Limitations of data augmentation methods: existing methods mainly rely on traditional image-level data augmentation, including geometric transformations, color transformations, or motion enhancement based on Eulerian video magnification. These methods operate in pixel space or at the level of simple motion, and can hardly generate, at the semantic level, micro-expression samples that conform to the physiological laws of facial muscle movement.
Existing data augmentation techniques also do not fully consider the combination rules of action units, so the diversity and physiological plausibility of the generated samples are limited and the scarcity of micro-expression samples is not effectively resolved.

Insufficient suitability of feature extraction networks: most existing research adopts convolutional neural networks designed for general image recognition tasks, such as ResNet and VGG, as the feature extraction backbone. These general-purpose networks are not optimized for the specific characteristics of micro-expressions, such as weak motion, localized distribution and spatio-temporal coupling. In the feature fusion stage, existing methods mostly adopt global average pooling or static weighting strategies based on fixed regions, and lack an explicit mechanism for modeling the dynamic cooperative relationships among multiple facial action units (AUs), which limits the ability of the model to recognize complex mixed micro-expressions.

Stability of domain-adaptive training: existing domain adversarial training methods typically employ a single-step domain alignment strategy in an attempt to di