CN-121982773-A - Human body posture recognition method based on multi-head graph attention network

Abstract

The invention discloses a human body posture recognition method based on a multi-head graph attention network. The method comprises the following steps: acquiring a two-dimensional image and extracting two-dimensional human body posture key point data; representing the key point data as a graph structure, with body joints as the nodes of the graph and the connections between joints as its edges; constructing a multi-head graph attention network model and inputting the key point data into it, wherein the graph attention convolution layer uses a multi-head attention mechanism to dynamically allocate importance weights among the joints, fuses the outputs of the multiple attention heads and outputs node-level features; performing, by the global pooling layer, a global average pooling operation that aggregates the node-level features into a single graph-level feature; and performing, by the fully connected layer, dimension reduction and feature refining on the graph-level feature to output a human body action classification result, thereby achieving human body action and posture recognition.

Inventors

Fan Yingyin
Chen Siyi
Ye Junfeng
Li Wanyi

Assignees

Guangdong University of Education (广东第二师范学院)

Dates

Publication Date: 2026-05-05
Application Date: 2026-01-28

Claims (10)

1. A human body posture recognition method based on a multi-head graph attention network, characterized by comprising the following steps: acquiring a two-dimensional image and extracting two-dimensional human body posture key point data, wherein the key point data are represented as a graph structure, with body joints as the nodes of the graph and the connections between joints as its edges; constructing a multi-head graph attention network model comprising a graph attention convolution layer, a global pooling layer and a fully connected layer connected in sequence; inputting the two-dimensional human body posture key point data into the multi-head graph attention network model, wherein the graph attention convolution layer uses a multi-head attention mechanism to dynamically allocate importance weights among joints, fuses the outputs of the multiple attention heads and outputs node-level features; the global pooling layer performs a global average pooling operation on the node-level features and aggregates them into a single graph-level feature; and the fully connected layer performs dimension reduction and feature refining on the graph-level feature and outputs a human body posture classification result.
2. The human body posture recognition method based on the multi-head graph attention network according to claim 1, wherein the graph attention convolution layer uses the multi-head attention mechanism to dynamically allocate importance weights among joints, specifically: for nodes $i$ and $j$, the attention coefficient is calculated as
$$e_{ij} = \mathrm{LeakyReLU}\left(\mathbf{a}^{\top}\left[\mathbf{W}\mathbf{h}_i \,\Vert\, \mathbf{W}\mathbf{h}_j\right]\right),$$
where $e_{ij}$ is the attention coefficient between node $i$ and node $j$; LeakyReLU is an activation function; $\mathbf{W}$ is a weight matrix; $\mathbf{a}$ is an attention weight vector; $\mathbf{h}_i$ and $\mathbf{h}_j$ are the feature representations of node $i$ and node $j$, respectively; and $\Vert$ denotes vector concatenation. The attention coefficients are then normalized with the Softmax function:
$$\alpha_{ij} = \mathrm{softmax}_j(e_{ij}) = \frac{\exp(e_{ij})}{\sum_{k \in \mathcal{N}_i} \exp(e_{ik})},$$
where $\alpha_{ij}$ is the normalized attention weight of node $i$ to node $j$; $e_{ik}$ is the attention coefficient between node $i$ and node $k$; and $\mathcal{N}_i$ is the set of neighbor nodes of node $i$.
3. The human body posture recognition method based on the multi-head graph attention network according to claim 2, wherein the outputs of the multiple attention heads are fused and node-level features are output, specifically: the feature representation of node $i$ is updated by a weighted aggregation of the features of its neighbor nodes:
$$\mathbf{h}_i' = \sigma\left(\sum_{j \in \mathcal{N}_i} \alpha_{ij}\,\mathbf{W}\mathbf{h}_j\right),$$
where $\mathbf{h}_i'$ is the updated feature of node $i$ and $\sigma$ is an activation function. For the multi-head attention mechanism, the output of each attention head $h$ is
$$\mathbf{h}_i^{(h)} = \sigma\left(\sum_{j \in \mathcal{N}_i} \alpha_{ij}^{(h)}\,\mathbf{W}^{(h)}\mathbf{h}_j\right),$$
where $\mathbf{h}_i^{(h)}$ is the feature of node $i$ on the $h$-th attention head. Finally, the representation of node $i$ is obtained by concatenating the outputs of all $H$ attention heads:
$$\mathbf{h}_i'' = \Big\Vert_{h=1}^{H} \mathbf{h}_i^{(h)},$$
where $\mathbf{h}_i''$ is the feature of node $i$ after multi-head attention fusion.
4. The human body posture recognition method based on the multi-head graph attention network according to claim 1, wherein the graph attention convolution layer comprises a first, a second, a third and a fourth graph attention convolution layer connected in sequence; the first graph attention convolution layer receives the two-dimensional human body posture key point data with an input feature dimension of 2, each of its attention heads converts the 2-dimensional features into 64-dimensional features, and 256-dimensional features are output after the 4 attention heads are concatenated; each attention head of the second graph attention convolution layer converts the 256-dimensional features into 128-dimensional features, and 512-dimensional features are output after the 4 attention heads are concatenated; each attention head of the third graph attention convolution layer converts the 512-dimensional features into 256-dimensional features, and 1024-dimensional features are output after the 4 attention heads are concatenated; each attention head of the fourth graph attention convolution layer converts the 1024-dimensional features into 512-dimensional features, and the outputs of the 4 attention heads are averaged to output 512-dimensional features.
5. The human body posture recognition method based on the multi-head graph attention network according to claim 4, wherein the fully connected layer comprises a first, a second and a third fully connected layer connected in sequence; the first fully connected layer compresses the 512-dimensional features into 256-dimensional features, the second fully connected layer compresses the 256-dimensional features into 128-dimensional features, and the third fully connected layer maps the 128-dimensional features to the number of posture action categories and outputs a classification prediction result.
6. The human body posture recognition method based on the multi-head graph attention network according to claim 5, wherein an ELU activation function is applied after each graph attention convolution layer, and a ReLU activation function is applied after the first fully connected layer and after the second fully connected layer.
7. The human body posture recognition method based on the multi-head graph attention network according to claim 6, wherein a global average pooling operation is performed after the fourth graph attention convolution layer, aggregating the node-level features into a single graph-level feature by computing the mean of all node features.
8. A human body posture recognition system based on a multi-head graph attention network, applying the human body posture recognition method based on the multi-head graph attention network of any one of claims 1 to 7, characterized by comprising: a data processing module, configured to acquire a two-dimensional image and extract two-dimensional human body posture key point data, wherein the key point data are represented as a graph structure, with body joints as the nodes of the graph and the connections between joints as its edges; a graph attention convolution module, configured to use a multi-head attention mechanism to dynamically allocate importance weights among joints, fuse the outputs of the multiple attention heads and output node-level features; a global pooling module, configured to perform a global average pooling operation on the node-level features and aggregate them into a single graph-level feature; and a fully connected layer module, configured to perform dimension reduction and feature refining on the graph-level feature and output a human body posture classification result.
9. An electronic device, comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor, when executing the computer program, implements the human body posture recognition method based on the multi-head graph attention network of any one of claims 1 to 7.
10. A storage medium storing a computer program, characterized in that the computer program, when executed by a processor, implements the human body posture recognition method based on the multi-head graph attention network of any one of claims 1 to 7.
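The graph representation in claim 1 can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes a hypothetical five-joint skeleton (the patent does not name its keypoint layout or keypoint detector) and adds self-loops so that each joint also attends to itself, which is a common graph attention convention rather than something the claims state. `JOINTS`, `BONES` and `build_pose_graph` are illustrative names.

```python
import numpy as np

# Hypothetical 5-joint skeleton for illustration only; the patent does not
# name a keypoint layout (e.g. COCO's 17 keypoints).
JOINTS = ["head", "neck", "left_hand", "right_hand", "hip"]
BONES = [(0, 1), (1, 2), (1, 3), (1, 4)]  # skeletal connections = graph edges


def build_pose_graph(keypoints_2d, bones, add_self_loops=True):
    """Turn 2-D keypoints into node features X (N x 2) and neighbor sets N(i)."""
    x = np.asarray(keypoints_2d, dtype=np.float64)  # one (x, y) row per joint
    neighbors = {i: set() for i in range(x.shape[0])}
    for i, j in bones:  # a bone is undirected, so link both directions
        neighbors[i].add(j)
        neighbors[j].add(i)
    if add_self_loops:  # assumption: common GAT practice, not stated in the claims
        for i in neighbors:
            neighbors[i].add(i)
    return x, neighbors


kps = [[0.50, 0.10], [0.50, 0.25], [0.30, 0.40], [0.70, 0.40], [0.50, 0.60]]
X, nbrs = build_pose_graph(kps, BONES)
print(X.shape)          # (5, 2)
print(sorted(nbrs[1]))  # [0, 1, 2, 3, 4]
```

The two input features per node are the raw (x, y) coordinates, which matches the input feature dimension of 2 stated in claim 4.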
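The attention coefficient and Softmax normalization of claim 2 can be sketched for a single head as below. The LeakyReLU negative slope of 0.2 is an assumption taken from the original GAT formulation, because the claims do not fix it. Features are stored as rows, so `H @ W` holds $\mathbf{W}\mathbf{h}_i$ in row $i$. The function names are illustrative.

```python
import numpy as np


def leaky_relu(z, slope=0.2):
    # Negative slope 0.2 is an assumption; claim 2 only names LeakyReLU.
    return np.where(z > 0, z, slope * z)


def attention_weights(H, W, a, neighbors):
    """Single-head attention weights alpha_ij for every j in N(i).

    H: (N, F) node features h_i as rows; W: (F, F') weight matrix;
    a: (2F',) attention vector; neighbors: dict i -> set of j.
    """
    Wh = H @ W  # row i holds W h_i (row-vector convention)
    alpha = {}
    for i, nbrs in neighbors.items():
        js = sorted(nbrs)
        # e_ij = LeakyReLU(a^T [W h_i || W h_j])
        e = np.array([leaky_relu(a @ np.concatenate([Wh[i], Wh[j]])) for j in js])
        w = np.exp(e - e.max())  # subtracting the max leaves the softmax unchanged
        alpha[i] = dict(zip(js, w / w.sum()))  # softmax over k in N(i)
    return alpha


rng = np.random.default_rng(0)
H = rng.normal(size=(3, 2))  # 3 joints with (x, y) features
W = rng.normal(size=(2, 4))  # F = 2 -> F' = 4
a = rng.normal(size=8)       # length 2F'
nbrs = {0: {0, 1}, 1: {0, 1, 2}, 2: {1, 2}}
alpha = attention_weights(H, W, a, nbrs)
print(round(float(sum(alpha[1].values())), 6))  # 1.0
```

Because the weights come from the features themselves, the same neighbor can receive a different weight for different postures, which is the "dynamic allocation" the claim describes.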
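A vectorized sketch of the neighbor aggregation and multi-head fusion in claim 3, using an adjacency mask that includes self-loops, is given below. Two points are assumptions: ELU with $\alpha = 1$ is used as $\sigma$ (claim 6 names ELU but not its parameter), and for the averaging variant used by the fourth layer in claim 4 the activation is applied after averaging, as in the original GAT output layer. Since ELU is element-wise, applying it after concatenation is the same as applying it per head. The helper names and the toy skeleton are hypothetical.

```python
import numpy as np


def leaky_relu(z, slope=0.2):
    return np.where(z > 0, z, slope * z)


def elu(z):
    # ELU with alpha = 1; np.minimum avoids overflow in the unused branch
    return np.where(z > 0, z, np.expm1(np.minimum(z, 0.0)))


def head_aggregate(H, W, a, adj):
    """One head before activation: sum_j alpha_ij W h_j, with adj an (N, N) 0/1 mask."""
    Wh = H @ W
    f = Wh.shape[1]
    # a^T [W h_i || W h_j] splits into a_left . W h_i + a_right . W h_j
    e = leaky_relu((Wh @ a[:f])[:, None] + (Wh @ a[f:])[None, :])
    e = np.where(adj > 0, e, -np.inf)  # keep only j in N(i); needs self-loops
    w = np.exp(e - e.max(axis=1, keepdims=True))
    alpha = w / w.sum(axis=1, keepdims=True)  # row-wise softmax
    return alpha @ Wh


def multi_head_layer(H, heads, adj, concat=True):
    """Concatenate the heads (claim 3) or average them (fourth layer, claim 4), then ELU."""
    outs = [head_aggregate(H, W, a, adj) for W, a in heads]
    fused = np.concatenate(outs, axis=1) if concat else np.mean(outs, axis=0)
    return elu(fused)


rng = np.random.default_rng(1)
N, F_in, F_head, K = 5, 2, 64, 4
adj = np.eye(N)  # self-loops
for i, j in [(0, 1), (1, 2), (1, 3), (1, 4)]:  # hypothetical skeleton
    adj[i, j] = adj[j, i] = 1
H = rng.normal(size=(N, F_in))
heads = [(rng.normal(size=(F_in, F_head)), rng.normal(size=2 * F_head)) for _ in range(K)]
print(multi_head_layer(H, heads, adj).shape)                # (5, 256)
print(multi_head_layer(H, heads, adj, concat=False).shape)  # (5, 64)
```

With 4 heads of width 64, concatenation yields the 256-dimensional output that claim 4 gives for the first layer.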
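The feature widths in claim 4 follow from simple arithmetic: a concatenating layer outputs (heads × per-head width) features, and an averaging layer outputs the per-head width. The short check below traces these widths and confirms that each layer's input matches the previous layer's output. The table layout and function name are illustrative only.

```python
# (input width, per-head output width, heads, concatenate?) for the four layers in claim 4
GAT_LAYERS = [
    (2, 64, 4, True),
    (256, 128, 4, True),
    (512, 256, 4, True),
    (1024, 512, 4, False),  # the last layer averages its 4 heads
]


def trace_widths(layers, input_width=2):
    """Return the output width of each layer, checking that consecutive layers line up."""
    width = input_width
    widths = []
    for f_in, f_head, heads, concat in layers:
        if f_in != width:
            raise ValueError(f"layer expects {f_in} features but receives {width}")
        width = f_head * heads if concat else f_head
        widths.append(width)
    return widths


print(trace_widths(GAT_LAYERS))  # [256, 512, 1024, 512]
```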
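The three fully connected layers of claim 5, together with the ReLU placement of claim 6, can be sketched as a plain forward pass. The number of posture classes is left open in the patent, so `NUM_CLASSES = 6` is a hypothetical value. The random weights only stand in for trained parameters, and the scaled-normal initialization is an illustrative choice.

```python
import numpy as np

NUM_CLASSES = 6  # hypothetical; the patent leaves the number of posture categories open


def relu(z):
    return np.maximum(z, 0.0)


def init_fc(rng, sizes):
    """Random (weight, bias) pairs for 512 -> 256 -> 128 -> NUM_CLASSES; real values come from training."""
    return [(rng.normal(scale=1 / np.sqrt(m), size=(m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]


def classify(g, fc):
    """Map one 512-d graph-level feature to class logits and a predicted class index."""
    (W1, b1), (W2, b2), (W3, b3) = fc
    z = relu(g @ W1 + b1)  # first FC layer: 512 -> 256, then ReLU (claim 6)
    z = relu(z @ W2 + b2)  # second FC layer: 256 -> 128, then ReLU (claim 6)
    logits = z @ W3 + b3   # third FC layer: 128 -> NUM_CLASSES, no activation
    return logits, int(np.argmax(logits))


rng = np.random.default_rng(2)
fc = init_fc(rng, [512, 256, 128, NUM_CLASSES])
logits, pred = classify(rng.normal(size=512), fc)
print(logits.shape, 0 <= pred < NUM_CLASSES)  # (6,) True
```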
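Claim 6 names the activation functions but not their parameters. For reference, their standard definitions are given below. The ELU scale $\alpha = 1$ is an assumption (it is, for example, the default of PyTorch's `torch.nn.ELU`), not a value stated in the patent.

```latex
\mathrm{ELU}(x) =
\begin{cases}
x, & x > 0,\\
\alpha\,(e^{x} - 1), & x \le 0,
\end{cases}
\qquad
\mathrm{ReLU}(x) = \max(0, x).
```

Unlike ReLU, ELU keeps a bounded negative output (down to $-\alpha$) for negative inputs. The original GAT formulation also applies ELU after its attention layers. The patent itself does not state why it pairs ELU with the graph layers and ReLU with the fully connected layers.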
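The global average pooling of claim 7 reduces an (N joints × 512) node-feature matrix to one 512-dimensional graph-level vector. The sketch below uses a toy 3 × 4 matrix so the result can be checked by hand. The function name is illustrative.

```python
import numpy as np


def global_mean_pool(node_feats):
    """Graph-level feature g = (1/N) * sum_i h_i over all N joints."""
    return np.asarray(node_feats, dtype=np.float64).mean(axis=0)


H4 = np.arange(12, dtype=np.float64).reshape(3, 4)  # toy: 3 joints x 4 features
print(global_mean_pool(H4))  # [4. 5. 6. 7.]
# The mean ignores joint order, so the same posture gives the same vector
# however the keypoints are listed.
print(np.allclose(global_mean_pool(H4[::-1]), global_mean_pool(H4)))  # True
```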

Description

Human body posture recognition method based on multi-head graph attention network

Technical Field

The invention relates to the technical field of posture recognition, and in particular to a human body posture recognition method based on a multi-head graph attention network.

Background

Human body posture recognition is a key task in computer vision and is widely applied in behavior analysis, action classification, intelligent surveillance, virtual reality, medical care and other fields. Unlike pose estimation, posture recognition focuses on classifying human postures or actions according to the spatial and temporal relationships between human joints. Posture recognition plays a vital role in understanding human motion and behavior, particularly in real-world scenarios, where the data are often represented as structured skeletal graphs.

The graph neural network (Graph Neural Network, GNN) has received much attention in recent years as an effective method for processing structured graph data, because it can model graph-represented data directly. In the human body posture recognition task, the human skeleton is generally represented as a graph with joints as nodes and skeletal connections as edges. Conventional graph neural networks generally use a graph convolutional network (Graph Convolutional Network, GCN) to aggregate the features of neighboring nodes. During the feature update, the neighbors of a node are usually given equal or fixed weights, which makes it difficult to distinguish how important different joints are for recognizing a posture. To overcome this deficiency, a graph attention convolution (Graph Attention Convolution, GATconv) unit with an attention mechanism is introduced into the graph neural network framework.
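The "fixed weight" limitation of a conventional GCN can be made concrete with the symmetric normalization $D^{-1/2}(A + I)D^{-1/2}$ of the widely used GCN formulation of Kipf and Welling. The description does not say which GCN variant it has in mind, so this choice is an illustrative assumption, as is the three-joint toy chain.

```python
import numpy as np


def gcn_propagation(adj):
    """D^-1/2 (A + I) D^-1/2: the fixed neighbor weights of a standard GCN layer."""
    a_hat = adj + np.eye(adj.shape[0])  # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return d_inv_sqrt[:, None] * a_hat * d_inv_sqrt[None, :]


# Toy chain of three joints, e.g. shoulder - elbow - wrist (hypothetical)
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=np.float64)
P = gcn_propagation(A)
print(np.round(P[1], 3))  # [0.408 0.333 0.408]
```

The matrix `P` depends only on the adjacency, so the elbow mixes shoulder and wrist features with the same weights whatever the posture. A graph attention layer instead recomputes these weights from the node features for every input.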
The graph attention convolution unit performs weighted modeling of adjacent nodes through the attention mechanism, so that the model can adaptively adjust the weight of each joint in feature aggregation as the posture changes, and can therefore represent the complex spatial associations between human joints more accurately. Compared with a graph neural network built on traditional graph convolution units, the graph attention convolution unit can significantly improve the model's ability to capture non-uniform spatial structural relationships, which gives it a clear advantage in tasks that require fine structural analysis, such as human body posture recognition. Although attention mechanisms perform well in posture recognition, two challenges remain: (1) how to make full use of the multi-head attention mechanism to capture both the local dependencies between joints and the global posture information; and (2) how to keep recognition stable in the presence of noise, self-occlusion of key points during motion, or viewpoint changes. Traditional posture recognition methods have been shown to perform poorly in capturing the complex spatial and temporal relationships in human motion. Although the above advances have improved posture recognition performance to some extent, existing methods still fail to adequately capture the global relationships between joints and the temporal dynamics of motion sequences.

Disclosure of Invention

The invention aims to overcome the defects and shortcomings of the prior art by providing a human body posture recognition method based on a multi-head graph attention network.
To achieve the above purpose, the invention adopts the following technical solution: a human body posture recognition method based on a multi-head graph attention network, comprising the following steps: acquiring a two-dimensional image and extracting two-dimensional human body posture key point data, wherein the key point data are represented as a graph structure, with body joints as the nodes of the graph and the connections between joints as its edges; constructing a multi-head graph attention network model comprising a graph attention convolution layer, a global pooling layer and a fully connected layer connected in sequence; inputting the two-dimensional human body posture key point data into the multi-head graph attention network model, wherein the graph attention convolution layer uses a multi-head attention mechanism to dynamically allocate importance weights among joints, fuses the outputs of the multiple attention heads and outputs node-level features; the global pooling layer performs a global average pooling operation on the node-level features and aggregates them into a single graph-level feature; the fully connected layer performs dimension reduction and feature refining on the graph-level feature