CN-122024327-A - Gesture recognition method, system, equipment and storage medium

CN122024327A

Abstract

The invention provides a gesture recognition method, system, device, and storage medium, belonging to the technical field of deep learning. The method uses a gesture recognition model comprising a feature extraction module, a parallel attention module, a multi-scale feature aggregation module, and a detection head connected in sequence. A gesture image to be recognized is acquired and input into the trained gesture recognition model, which recognizes and localizes gesture targets at different scales. In complex scenes, such as the coexistence of multiple gestures, the method markedly improves the accuracy and robustness of gesture recognition and the ability to distinguish fine actions, and addresses the insufficient feature robustness and discriminative ability caused by complex backgrounds, illumination changes, and inter-class similarity.

Inventors

  • DING JIANGTAO
  • HU HUAKUI
  • LIU CHUANYANG
  • XU HUAJIE

Assignees

  • 池州学院 (Chizhou University)

Dates

Publication Date
2026-05-12
Application Date
2026-03-02

Claims (9)

  1. A gesture recognition method, comprising the steps of: acquiring a gesture image to be recognized; and inputting the image into a trained gesture recognition model to recognize and localize gesture targets at different scales, wherein the gesture recognition model comprises a feature extraction module, a parallel attention module, a multi-scale feature aggregation module, and a detection head connected in sequence; wherein the feature extraction module processes the image and extracts features stage by stage to obtain a plurality of feature maps at different scales; each feature map is input into the parallel attention module for feature enhancement; the enhanced multi-scale features are input into the multi-scale feature aggregation module, which performs channel concatenation and adaptive weight learning on features of different levels to obtain multi-scale fused features; and the fused features are input into the detection head to obtain gesture targets and their positions at different scales.
  2. The gesture recognition method according to claim 1, wherein the feature extraction module uses ResNet as the backbone feature extraction network, processes the input image, extracts features stage by stage using convolution layers, and generates a plurality of feature maps at different scales.
  3. The gesture recognition method according to claim 1, wherein the parallel attention module comprises a spatial attention sub-module and a channel attention sub-module; the spatial attention sub-module generates a spatial attention weight map by computing a similarity matrix between every pair of positions in the feature map; the channel attention sub-module generates a channel attention weight vector by computing correlations among channels; the spatial-attention and channel-attention enhanced features are fused and passed through a residually connected inverted MLP to obtain the enhanced multi-scale features.
  4. The gesture recognition method according to claim 1, wherein the multi-scale feature aggregation module performs channel concatenation and adaptive weight learning on features of different levels by sequentially applying global average pooling (GAP), a convolution layer, and an activation function.
  5. The gesture recognition method according to claim 1, further comprising introducing a WIoU (Wise-IoU) dynamic bounding-box loss function during training, in which the Wise-IoU loss is scaled by a dynamic adjustment factor computed from a focusing parameter and the intersection-over-union (IoU) between the predicted and ground-truth boxes.
  6. The gesture recognition method according to claim 1, wherein original gesture images are obtained from the HaGRID gesture dataset and downsampled, the category label and bounding-box information of each gesture instance are annotated with LabelImg to obtain a dataset, and the gesture recognition model is trained on the training data to obtain the trained gesture recognition model.
  7. A gesture recognition system, comprising: a data module for acquiring a gesture image to be recognized; and a gesture recognition module for inputting the image into a trained gesture recognition model to recognize and localize gesture targets at different scales, the model comprising a feature extraction module, a parallel attention module, a multi-scale feature aggregation module, and a detection head connected in sequence; wherein the feature extraction module processes the image and extracts features stage by stage to obtain a plurality of feature maps at different scales; each feature map is input into the parallel attention module for feature enhancement; the enhanced multi-scale features are input into the multi-scale feature aggregation module, which performs channel concatenation and adaptive weight learning on features of different levels to obtain multi-scale fused features; and the fused features are input into the detection head to obtain gesture targets and their positions at different scales.
  8. A computer device comprising a memory, a processor, and a computer program stored in the memory, wherein the processor executes the computer program to carry out the steps of the method according to any one of claims 1 to 6.
  9. A computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, carries out the steps of the method according to any one of claims 1 to 6.
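The parallel attention module of claim 3 can be illustrated with a minimal NumPy sketch. This is a hypothetical simplification, not the patent's exact design: the patent does not disclose its projection layers or the inverted-MLP details, so the sketch computes raw position-similarity and channel-correlation softmax attention and fuses the two branches with a simple residual sum.

```python
import numpy as np

def _softmax_rows(sim):
    """Row-wise softmax with max-subtraction for numerical stability."""
    sim = sim - sim.max(axis=1, keepdims=True)
    e = np.exp(sim)
    return e / e.sum(axis=1, keepdims=True)

def spatial_attention(feat):
    """Spatial branch: similarity matrix between every pair of positions.

    feat: (C, H, W) feature map; returns a spatially re-weighted map.
    """
    c, h, w = feat.shape
    x = feat.reshape(c, h * w)          # flatten spatial positions
    attn = _softmax_rows(x.T @ x)       # (HW, HW) position-to-position weights
    out = x @ attn.T                    # each position aggregates all others
    return out.reshape(c, h, w)

def channel_attention(feat):
    """Channel branch: correlation matrix between channels."""
    c, h, w = feat.shape
    x = feat.reshape(c, h * w)
    attn = _softmax_rows(x @ x.T)       # (C, C) channel-to-channel weights
    return (attn @ x).reshape(c, h, w)

def parallel_attention(feat):
    """Fuse both branches with a residual connection (simplified fusion)."""
    return feat + spatial_attention(feat) + channel_attention(feat)

f = np.random.rand(8, 4, 4).astype(np.float32)
enhanced = parallel_attention(f)
print(enhanced.shape)  # (8, 4, 4)
```

In the patent's module, the fused output additionally passes through a residually connected inverted MLP before being handed to the aggregation stage; that refinement step is omitted here.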

Description

Gesture recognition method, system, device and storage medium

Technical Field

The invention belongs to the technical field of deep learning, and in particular relates to a gesture recognition method, system, device, and storage medium.

Background

As human-computer interaction evolves from traditional command-line and graphical interfaces toward more natural somatosensory interaction, gesture recognition has emerged as an intuitive and efficient interaction mode. It is an important touchstone for advancing machine perception toward human-level ability and a powerful means of delivering technological benefits and promoting social inclusion, so gesture recognition research has great strategic value and practical significance. However, approaches based on expensive peripherals such as data gloves, which acquire the angle and position of the hand joints directly from built-in sensors, are accurate but costly and constrain the user, making them difficult to popularize. In existing gesture recognition research, multi-view skeleton feature fusion methods improve a model's discriminative ability by integrating hand information from different sources or viewpoints, or by separately extracting the spatial features of the hand skeleton under different viewpoints with convolutional neural networks.
These methods focus mainly on the local motion and spatial relations of the hand joints and lack integrated processing of the semantic information of gesture actions. As a result, when recognizing gestures in complex scenes with cluttered backgrounds or illumination changes, such models struggle to balance detection precision across targets of different scales, and their recognition ability is insufficient.

Disclosure of Invention

To overcome these shortcomings of existing deep learning approaches to gesture recognition, the invention provides a gesture recognition method, system, device, and storage medium. To achieve this object, the invention provides the following technical solution: a gesture recognition method comprising the steps of acquiring a gesture image to be recognized and inputting it into a trained gesture recognition model to recognize and localize gesture targets at different scales, wherein the model comprises a feature extraction module, a parallel attention module, a multi-scale feature aggregation module, and a detection head connected in sequence. The feature extraction module processes the image and extracts features stage by stage to obtain a plurality of feature maps at different scales; each feature map is input into the parallel attention module for feature enhancement; the enhanced multi-scale features are input into the multi-scale feature aggregation module, which performs channel concatenation and adaptive weight learning on features of different levels to obtain multi-scale fused features; and the fused features are input into the detection head to obtain gesture targets and their positions at different scales.
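The stage-by-stage extraction of feature maps at different scales can be sketched as follows. This is a stand-in, not the patented backbone: average pooling replaces ResNet's strided convolution stages (whose learned weights and channel widths the text does not specify), purely to show how one input yields a pyramid of successively halved feature maps.

```python
import numpy as np

def downsample2x(x):
    """2x2 average pooling, a stand-in for one strided backbone stage."""
    c, h, w = x.shape
    return x.reshape(c, h // 2, 2, w // 2, 2).mean(axis=(2, 4))

def extract_pyramid(img, levels=3):
    """Return feature maps at `levels` successively halved scales."""
    feats, x = [], img
    for _ in range(levels):
        x = downsample2x(x)
        feats.append(x)
    return feats

img = np.random.rand(3, 32, 32)
pyramid = extract_pyramid(img)
print([f.shape for f in pyramid])  # [(3, 16, 16), (3, 8, 8), (3, 4, 4)]
```

Each level of this pyramid would then be fed independently into the parallel attention module before aggregation.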
Preferably, the feature extraction module uses ResNet as the backbone feature extraction network, processes the input image, extracts features stage by stage using convolution layers, and generates a plurality of feature maps at different scales. The parallel attention module comprises a spatial attention sub-module and a channel attention sub-module: the spatial attention sub-module generates a spatial attention weight map by computing a similarity matrix between every pair of positions in the feature map, and the channel attention sub-module generates a channel attention weight vector by computing correlations among channels; the two branches enhance the features at each scale, their outputs are fused, and a residually connected inverted MLP produces the enhanced multi-scale features. Preferably, the multi-scale feature aggregation module sequentially applies global average pooling (GAP), a convolution layer, and an activation function to perform channel concatenation and adaptive weight learning on features of different levels. Preferably, a WIoU (Wise-IoU) dynamic bounding-box loss function is introduced during training of the gesture recognition model, in which the Wise-IoU loss is scaled by a dynamic adjustment factor computed from a focusing parameter and the intersection-over-union (IoU) between the predicted and ground-truth boxes.
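The GAP-convolution-activation weighting in the aggregation module can be sketched as below. This is a simplified, hypothetical version: it learns one adaptive weight per pyramid level (the patent's module operates on channel-concatenated features, and its convolution replaces the toy `w_fc` matrix here), and it assumes the level features have already been resized to a common spatial resolution.

```python
import numpy as np

def sigmoid(z):
    """Activation producing weights in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def aggregate(feats, w_fc):
    """GAP -> linear map (stand-in for the conv layer) -> sigmoid -> fusion.

    feats: list of L feature maps, each (C, H, W) at a common resolution.
    w_fc:  (L, L) hypothetical learned matrix mixing the pooled descriptors.
    """
    stacked = np.stack(feats)                  # (L, C, H, W)
    gap = stacked.mean(axis=(1, 2, 3))         # global average pooling per level
    weights = sigmoid(w_fc @ gap)              # adaptive per-level weights
    fused = (weights[:, None, None, None] * stacked).sum(axis=0)
    return fused, weights

feats = [np.ones((4, 8, 8)) * s for s in (1.0, 2.0, 3.0)]
w_fc = np.eye(3) * 0.1
fused, weights = aggregate(feats, w_fc)
print(fused.shape)  # (4, 8, 8)
```

The sigmoid keeps every learned weight strictly between 0 and 1, so each level contributes a bounded, data-dependent share to the fused feature handed to the detection head.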