
CN-122024333-A - Gait recognition method under fusion of weight distribution mechanism and layered space-time

CN122024333A

Abstract

The invention relates to the technical field of gait recognition and discloses a gait recognition method that fuses a weight distribution mechanism with hierarchical spatio-temporal features. The method first acquires a gait sequence to be recognized, then inputs it into a target gait recognition model to obtain a gait feature vector, from which a gait recognition result is obtained. Within the target gait recognition model, a dual three-dimensional convolution module extracts initial features from the gait sequence to be recognized to obtain initial spatio-temporal features; a hierarchical spatio-temporal fusion module processes the initial spatio-temporal features to obtain fused spatio-temporal features; a temporal pooling layer pools the fused spatio-temporal features along the time dimension to obtain time-aggregated features; and a weighted generalized mean pooling layer performs weighted mapping and aggregation over the spatial dimension of the time-aggregated features to obtain the gait feature vector.
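
To relate the abstract to code, the following is a minimal, hedged sketch of the described pipeline in PyTorch-style Python. The function and parameter names (extract_gait_feature, dual_conv3d, st_fusion, temporal_pool, weighted_gem) are illustrative placeholders, not names from the patent, and the tensor layout (n, c, t, h, w) is an assumption.

```python
def extract_gait_feature(clip, dual_conv3d, st_fusion, temporal_pool, weighted_gem):
    """Hedged sketch of the pipeline in the abstract; every module is a placeholder.

    clip: silhouette sequence of shape (n, c, t, h, w) -- an assumed layout.
    """
    x = dual_conv3d(clip)      # dual 3D convolution -> initial spatio-temporal features
    x = st_fusion(x)           # hierarchical spatio-temporal fusion -> fused features
    x = temporal_pool(x)       # pooling along the time dimension -> time-aggregated features
    return weighted_gem(x)     # weighted GeM over space -> gait feature vector
```

Recognition would then compare this feature vector against gallery vectors, e.g. by Euclidean or cosine distance.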

Inventors

  • FU LIMEI
  • CHEN JUNHE
  • WANG AOQIAN
  • LUO JUNYOU
  • ZHANG JING

Assignees

  • 西安科技大学 (Xi'an University of Science and Technology)

Dates

Publication Date
2026-05-12
Application Date
2026-04-13

Claims (8)

  1. A gait recognition method under the fusion of a weight distribution mechanism and hierarchical spatio-temporal features, the method comprising: acquiring a gait sequence to be recognized; and inputting the gait sequence to be recognized into a target gait recognition model to obtain a gait feature vector, thereby obtaining a gait recognition result; wherein the target gait recognition model comprises a dual three-dimensional convolution module, a hierarchical spatio-temporal fusion module, a temporal pooling layer and a weighted generalized mean pooling layer, and specifically performs the following steps: extracting initial features of the gait sequence to be recognized through the dual three-dimensional convolution module to obtain initial spatio-temporal features; processing the initial spatio-temporal features through the hierarchical spatio-temporal fusion module to obtain fused spatio-temporal features, wherein the hierarchical spatio-temporal fusion module comprises a global convolution branch and a local convolution branch in parallel, the global convolution branch performing global feature learning on the initial spatio-temporal features and the local convolution branch performing local feature learning on the initial spatio-temporal features; pooling the fused spatio-temporal features along the time dimension through the temporal pooling layer to obtain time-aggregated features; and performing weighted mapping and aggregation over the spatial dimension of the time-aggregated features through the weighted generalized mean pooling layer to obtain the gait feature vector, thereby obtaining the gait recognition result.
  2. The method according to claim 1, wherein the hierarchical spatio-temporal fusion module is specifically configured to perform the following operations: receiving the initial spatio-temporal features output by the dual three-dimensional convolution module; performing a convolution operation and a spatio-temporal pooling operation on the initial spatio-temporal features through the global convolution branch to obtain target global features; performing a convolution operation and a spatio-temporal pooling operation on the initial spatio-temporal features through the local convolution branch to obtain target local features; and fusing the target global features and the target local features to obtain the fused spatio-temporal features.
  3. The method according to claim 1 or 2, wherein the global convolution branch comprises a first global convolution block, a first spatio-temporal pooling module, a second global convolution block and a third global convolution block connected in sequence, and the global convolution branch is configured to perform the following operations: performing a first-layer convolution operation on the initial spatio-temporal features through the first global convolution block to obtain first global features; performing a spatio-temporal pooling operation on the first global features through the first spatio-temporal pooling module to obtain second global features; performing a second-layer convolution operation on the second global features through the second global convolution block to obtain third global features; and performing a third-layer convolution operation on the third global features through the third global convolution block to obtain the target global features.
  4. The method according to claim 1 or 2, wherein the local convolution branch comprises a first local convolution block, a second spatio-temporal pooling module, a second local convolution block and a third local convolution block connected in sequence, and the local convolution branch is configured to perform the following operations: performing a first-layer convolution operation on the initial spatio-temporal features through the first local convolution block to obtain first local features; performing a spatio-temporal pooling operation on the first local features through the second spatio-temporal pooling module to obtain second local features; performing a second-layer convolution operation on the second local features through the second local convolution block to obtain third local features; and performing a third-layer convolution operation on the third local features through the third local convolution block to obtain the target local features.
  5. The method according to claim 1, wherein the temporal pooling layer is expressed by the formula $Y_{tp} = \mathrm{MaxPool}_{t}(Y_{f})$, with $Y_{tp} \in \mathbb{R}^{n \times c \times d \times h \times w}$; wherein $Y_{tp}$ denotes the time-aggregated features, $Y_{f}$ denotes the fused spatio-temporal features, $\mathrm{MaxPool}_{t}(\cdot)$ denotes a max pooling operation on the fused spatio-temporal features with window size $t$ along the time dimension, $\mathbb{R}^{n \times c \times d \times h \times w}$ denotes the dimensions of the time-aggregated features, $n$ denotes the number of samples, $c$ denotes the channel dimension, $d$ denotes the time dimension with $d = 1$, $h$ denotes the feature height, and $w$ denotes the feature width.
  6. The method according to claim 1, wherein the weighted generalized mean pooling layer is expressed by the formula $Y_{gait} = \left(\frac{1}{h\,w}\sum_{i=1}^{h \cdot w} w_{i}\, Y_{tp,i}^{\;p}\right)^{1/p}$; wherein $Y_{gait}$ denotes the gait feature vector, $Y_{tp,i}$ denotes the $i$-th element of the time-aggregated features, $h$ denotes the feature height, $w$ denotes the feature width, $w_{i}$ denotes the weight corresponding to the $i$-th element, and $p$ denotes the power parameter.
  7. The method according to claim 1, wherein the method further comprises: acquiring a first data set and a second data set, wherein the data quality of the first data set differs from that of the second data set; determining a first weight and a second weight corresponding to the first data set and the second data set, respectively, based on a preset weight distribution strategy; and training an initial gait recognition model with a combined loss function, the first weight and the second weight based on the first data set and the second data set until the initial gait recognition model converges, to obtain the target gait recognition model, wherein the combined loss function comprises a triplet loss function and a cross-entropy loss function.
  8. The method according to claim 7, wherein training the initial gait recognition model with the combined loss function, the first weight and the second weight based on the first data set and the second data set until the initial gait recognition model converges, to obtain the target gait recognition model, comprises: obtaining a first gait feature vector through the initial gait recognition model based on the first data set, and determining a first loss value corresponding to the first gait feature vector based on the combined loss function and the first weight; obtaining a second gait feature vector through the initial gait recognition model based on the second data set, and determining a second loss value corresponding to the second gait feature vector based on the combined loss function and the second weight; and updating parameters of the initial gait recognition model based on the first loss value and the second loss value until the initial gait recognition model converges, to obtain the target gait recognition model.
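
The following sketches are offered as illustrations rather than as the patented implementation. This first one shows one way to realize the modules of claims 1 and 2 in PyTorch; the class names, channel widths and the additive fusion are assumptions, since the claims specify only the module roles.

```python
import torch.nn as nn

class DualConv3D(nn.Module):
    """Assumed realization of the dual three-dimensional convolution module of
    claim 1: two stacked 3D convolution blocks applied to a silhouette clip."""
    def __init__(self, in_channels=1, channels=32):
        super().__init__()
        self.blocks = nn.Sequential(
            nn.Conv3d(in_channels, channels, kernel_size=3, padding=1),
            nn.LeakyReLU(inplace=True),
            nn.Conv3d(channels, channels, kernel_size=3, padding=1),
            nn.LeakyReLU(inplace=True),
        )

    def forward(self, x):          # x: (n, c, t, h, w)
        return self.blocks(x)      # initial spatio-temporal features


class HierarchicalSTFusion(nn.Module):
    """Assumed realization of claim 2: parallel global and local branches whose
    outputs are fused (here by element-wise addition, which is an assumption)."""
    def __init__(self, global_branch, local_branch):
        super().__init__()
        self.global_branch = global_branch
        self.local_branch = local_branch

    def forward(self, x):
        g = self.global_branch(x)  # target global features
        l = self.local_branch(x)   # target local features
        return g + l               # fused spatio-temporal features
```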
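
Claims 3 and 4 list the blocks inside each branch but not their kernel sizes or what "local feature learning" looks like concretely. The sketch below makes the common assumption that the local branch convolves horizontal strips of the feature map independently; every hyperparameter here is a guess for illustration only.

```python
import torch
import torch.nn as nn

class GlobalBranch(nn.Module):
    """Assumed global convolution branch (claim 3): conv block, spatio-temporal
    pooling, then two further conv blocks, in sequence."""
    def __init__(self, channels=32):
        super().__init__()
        self.conv1 = nn.Conv3d(channels, channels, kernel_size=3, padding=1)
        self.pool = nn.MaxPool3d(kernel_size=2)            # first spatio-temporal pooling module
        self.conv2 = nn.Conv3d(channels, 2 * channels, kernel_size=3, padding=1)
        self.conv3 = nn.Conv3d(2 * channels, 2 * channels, kernel_size=3, padding=1)

    def forward(self, x):
        x = self.conv1(x)          # first global features
        x = self.pool(x)           # second global features
        x = self.conv2(x)          # third global features
        return self.conv3(x)       # target global features


class LocalBranch(nn.Module):
    """Assumed local convolution branch (claim 4): the same block layout, but each
    convolution is applied to horizontal strips of the feature map separately."""
    def __init__(self, channels=32, num_parts=4):
        super().__init__()
        self.num_parts = num_parts
        self.conv1 = nn.Conv3d(channels, channels, kernel_size=3, padding=1)
        self.pool = nn.MaxPool3d(kernel_size=2)            # second spatio-temporal pooling module
        self.conv2 = nn.Conv3d(channels, 2 * channels, kernel_size=3, padding=1)
        self.conv3 = nn.Conv3d(2 * channels, 2 * channels, kernel_size=3, padding=1)

    def _per_part(self, conv, x):
        # Convolve each horizontal strip on its own so features stay local in height.
        parts = x.chunk(self.num_parts, dim=3)
        return torch.cat([conv(p) for p in parts], dim=3)

    def forward(self, x):
        x = self._per_part(self.conv1, x)     # first local features
        x = self.pool(x)                      # second local features
        x = self._per_part(self.conv2, x)     # third local features
        return self._per_part(self.conv3, x)  # target local features
```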
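
Claim 5's temporal pooling collapses the time axis of the fused features to d = 1. Under that reading, the layer reduces to a max over the time dimension, as in this small example (the tensor sizes are arbitrary).

```python
import torch

def temporal_max_pool(fused):                       # fused: (n, c, t, h, w)
    # Max pooling over the whole time dimension, keeping d = 1 as in claim 5.
    pooled, _ = fused.max(dim=2, keepdim=True)
    return pooled                                   # (n, c, 1, h, w)

fused = torch.randn(2, 64, 30, 16, 11)
print(temporal_max_pool(fused).shape)               # torch.Size([2, 64, 1, 16, 11])
```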
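
Claim 6 defines a weighted generalized mean over the spatial positions of the time-aggregated features. A minimal sketch, assuming externally supplied non-negative spatial weights and treating the power parameter p as a hyperparameter, could look like this.

```python
import torch

def weighted_gem(x, weights, p=6.5, eps=1e-6):
    """Weighted generalized-mean pooling over spatial positions (claim 6 sketch).

    x:       (n, c, h, w) time-aggregated features with the time axis squeezed out
    weights: (h * w,) weight w_i for each spatial position (assumed non-negative)
    p:       power parameter; 6.5 is only an illustrative value
    """
    n, c, h, w = x.shape
    powered = x.clamp(min=eps).pow(p).reshape(n, c, h * w)    # x_i^p per position
    pooled = (powered * weights).sum(dim=-1) / (h * w)        # weighted mean over h*w positions
    return pooled.pow(1.0 / p)                                # (n, c) gait feature vector

feat = torch.rand(2, 128, 8, 5)
w_i = torch.softmax(torch.randn(40), dim=0) * 40              # toy weights, mean 1
print(weighted_gem(feat, w_i).shape)                          # torch.Size([2, 128])
```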
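
Claims 7 and 8 train on two datasets of different quality, scaling each dataset's loss by a preset weight and summing a triplet loss with a cross-entropy loss. The training step below is a hedged sketch: the weights 1.0 and 0.5, the margin value and the batch layout are all assumptions.

```python
import torch.nn as nn

triplet = nn.TripletMarginLoss(margin=0.2)   # margin is an assumed value
cross_entropy = nn.CrossEntropyLoss()

def combined_loss(anchor, positive, negative, logits, labels):
    # Combined loss of claim 7: triplet loss on feature vectors + cross entropy on class logits.
    return triplet(anchor, positive, negative) + cross_entropy(logits, labels)

def train_step(optimizer, batch_a, batch_b, weight_a=1.0, weight_b=0.5):
    """One update following claim 8: a weighted loss per dataset, then a joint step.

    Each batch is assumed to be a dict with triplet features already mined
    ('anchor', 'positive', 'negative'), class 'logits' and integer 'labels';
    in practice these would come from model forward passes and a triplet miner.
    """
    loss_a = weight_a * combined_loss(**batch_a)   # first loss value (first data set)
    loss_b = weight_b * combined_loss(**batch_b)   # second loss value (second data set)
    loss = loss_a + loss_b
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return float(loss.detach())
```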

Description

Gait recognition method under fusion of weight distribution mechanism and layered space-time

Technical Field

The invention relates to the technical field of gait recognition, and in particular to a gait recognition method under the fusion of a weight distribution mechanism and hierarchical spatio-temporal features.

Background

Gait recognition aims to identify a person from the way they walk. It is contactless, works at a distance, is difficult to disguise, and allows unobtrusive recognition while a person is walking. In complex, changeable real-world scenes, however, factors such as carried objects, outerwear and differences in camera angle can cause large changes in a pedestrian's gait appearance. Conventional gait recognition methods mainly extract gait features from the human body as a whole, yet different body parts behave differently during walking, and the same part behaves differently at different moments. Spatial feature aggregation methods can extract gait features for individual body parts, but they have difficulty capturing motion patterns accurately, which leads to low recognition accuracy.

Disclosure of the Invention

In view of this, the embodiments of the invention provide a gait recognition method under the fusion of a weight distribution mechanism and hierarchical spatio-temporal features, which can improve recognition accuracy. The technical scheme of the embodiments is as follows.

The application provides a gait recognition method under the fusion of a weight distribution mechanism and hierarchical spatio-temporal features, comprising: acquiring a gait sequence to be recognized; and inputting the gait sequence to be recognized into a target gait recognition model to obtain a gait feature vector, thereby obtaining a gait recognition result. The target gait recognition model comprises a dual three-dimensional convolution module, a hierarchical spatio-temporal fusion module, a temporal pooling layer and a weighted generalized mean pooling layer, and specifically performs the following steps: extracting initial features of the gait sequence to be recognized through the dual three-dimensional convolution module to obtain initial spatio-temporal features; processing the initial spatio-temporal features through the hierarchical spatio-temporal fusion module to obtain fused spatio-temporal features, wherein the hierarchical spatio-temporal fusion module comprises a global convolution branch and a local convolution branch in parallel, the global convolution branch performing global feature learning on the initial spatio-temporal features and the local convolution branch performing local feature learning on the initial spatio-temporal features; pooling the fused spatio-temporal features along the time dimension through the temporal pooling layer to obtain time-aggregated features; and performing weighted mapping and aggregation over the spatial dimension of the time-aggregated features through the weighted generalized mean pooling layer to obtain the gait feature vector, thereby obtaining the gait recognition result.
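
To make the data flow of the summarized model concrete, here is a hedged end-to-end shape trace that wires together the illustrative classes and functions sketched after the claims (DualConv3D, HierarchicalSTFusion, GlobalBranch, LocalBranch, temporal_max_pool, weighted_gem); it assumes 64×44 silhouettes, 30 frames, and the placeholder hyperparameters used there.

```python
import torch

# Assumed input: a batch of 2 silhouette clips, 30 frames of 64x44 masks.
clip = torch.rand(2, 1, 30, 64, 44)

backbone = DualConv3D(in_channels=1, channels=32)
fusion = HierarchicalSTFusion(GlobalBranch(32), LocalBranch(32))

x = backbone(clip)                          # (2, 32, 30, 64, 44) initial spatio-temporal features
x = fusion(x)                               # (2, 64, 15, 32, 22) fused spatio-temporal features
x = temporal_max_pool(x).squeeze(2)         # (2, 64, 32, 22)     time-aggregated features
vec = weighted_gem(x, torch.ones(32 * 22))  # (2, 64)             gait feature vector
print(vec.shape)
```
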
In some embodiments, the hierarchical spatio-temporal fusion module is specifically configured to perform the following operations: receiving the initial spatio-temporal features output by the dual three-dimensional convolution module; performing a convolution operation and a spatio-temporal pooling operation on the initial spatio-temporal features through the global convolution branch to obtain target global features; performing a convolution operation and a spatio-temporal pooling operation on the initial spatio-temporal features through the local convolution branch to obtain target local features; and fusing the target global features and the target local features to obtain the fused spatio-temporal features.

In some embodiments, the global convolution branch comprises a first global convolution block, a first spatio-temporal pooling module, a second global convolution block and a third global convolution block connected in sequence, and the global convolution branch is configured to perform the following operations: performing a first-layer convolution operation on the initial spatio-temporal features through the first global convolution block to obtain first global features; performing a spatio-temporal pooling operation on the first global features through the first spatio-temporal pooling module to obtain second global features; performing a second-layer convolution operation on the second global features through the second global convolution block to obtain third global features; and performing a third-layer convolution operation on the third global features through the third global convolution block to obtain the target global features.

In some embodiments, the local convolution branch comprises a first local convolution block, a second spatio-temporal pooling module, a second local convolution block and a third local convolution block connected in sequence