CN-115797975-B - Gait recognition method
Abstract
The application belongs to the technical field of identity verification and recognition, and provides a gait recognition method comprising: acquiring a gait image sequence to be recognized of a target object, wherein the gait image sequence to be recognized comprises a plurality of gait images to be recognized; and inputting the gait image sequence to be recognized into an improved GaitPart model for gait recognition to obtain a gait recognition result of the target object. The improved GaitPart model comprises a comprehensive feature extraction module for extracting features of the gait image sequence to be recognized, a global time feature aggregation module for aggregating global time features, a horizontal pooling layer, a local time feature aggregation module for aggregating local time features, and a superposition module for outputting the gait recognition result of the target object. The application can improve the accuracy of gait recognition.
Inventors
- YU SHAOQIAN
- SONG SHUYUE
- LI QIANG
- CHEN XINYU
Assignees
- Hunan University of Technology and Business (湖南工商大学)
Dates
- Publication Date: 2026-05-12
- Application Date: 2022-12-13
Claims (6)
- 1. A gait recognition method, comprising: acquiring a gait image sequence to be recognized of a target object, wherein the gait image sequence to be recognized comprises a plurality of gait images to be recognized; inputting the gait image sequence to be recognized into an improved GaitPart model for gait recognition to obtain a gait recognition result of the target object; wherein the improved GaitPart model comprises a comprehensive feature extraction module for extracting local features and global features of the gait image sequence to be recognized, a global time feature aggregation module for processing the local features and the global features output by the comprehensive feature extraction module, a horizontal pooling layer, a local time feature aggregation module for aggregating local time features, and a superposition module for outputting the gait recognition result of the target object; the output end of the comprehensive feature extraction module is connected with the input end of the global time feature aggregation module and with the input end of the horizontal pooling layer, the output end of the horizontal pooling layer is connected with the input end of the local time feature aggregation module, and the output end of the global time feature aggregation module and the output end of the local time feature aggregation module are both connected with the input end of the superposition module; the comprehensive feature extraction module comprises a first convolution module, a first maximum pooling layer, a second convolution module, a second maximum pooling layer and a third convolution module which are connected in sequence, wherein the input end of the first convolution module is the input end of the comprehensive feature extraction module, and the output end of the third convolution module is the output end of the comprehensive feature extraction module; the first convolution module, the second convolution module and the third convolution module each comprise a first comprehensive convolution sub-module and a second comprehensive convolution sub-module which are connected in sequence; the first comprehensive convolution sub-module and the second comprehensive convolution sub-module each comprise a vertical block convolution unit, a horizontal block convolution unit, a common convolution unit and a first superposition unit, wherein the output end of the vertical block convolution unit, the output end of the horizontal block convolution unit and the output end of the common convolution unit are all connected with the input end of the first superposition unit; the input ends of the vertical block convolution unit, the horizontal block convolution unit and the common convolution unit of the first comprehensive convolution sub-module in the first convolution module are the input ends of the first convolution module, the output end of the first superposition unit of the second comprehensive convolution sub-module in the first convolution module is the output end of the first convolution module, the input ends of the vertical block convolution unit, the horizontal block convolution unit and the common convolution unit of the first comprehensive convolution sub-module in the second convolution module are the input ends of the second convolution module, the output end of the first superposition unit of the second comprehensive convolution sub-module in the second convolution module is the output end of the second convolution module, the input ends of the vertical block convolution unit, the horizontal block convolution unit and the common convolution unit of the first comprehensive convolution sub-module in the third convolution module are the input ends of the third convolution module, and the output end of the first superposition unit of the second comprehensive convolution sub-module in the third convolution module is the output end of the third convolution module.
- 2. The method of claim 1, wherein the vertical block convolution unit comprises a vertical cutting function, a plurality of first convolution layers, and a first cascade layer, wherein the output end of each of the plurality of first convolution layers is connected to the input end of the first cascade layer, and the output end of the first cascade layer is connected to the input end of the first superposition unit; the vertical cutting function is used for vertically dividing the data input into the vertical block convolution unit into a plurality of pieces of sub-data, and for inputting the plurality of pieces of sub-data into the plurality of first convolution layers in a one-to-one correspondence.
- 3. The method of claim 1, wherein the horizontal block convolution unit comprises a horizontal cutting function, a plurality of second convolution layers, and a second cascade layer, wherein the output end of each of the plurality of second convolution layers is connected to the input end of the second cascade layer, and the output end of the second cascade layer is connected to the input end of the first superposition unit; the horizontal cutting function is used for horizontally dividing the data input into the horizontal block convolution unit into a plurality of pieces of sub-data, and for inputting the plurality of pieces of sub-data into the plurality of second convolution layers in a one-to-one correspondence.
- 4. The method of claim 1, wherein the common convolution unit comprises a third convolution layer, the input end of the third convolution layer is the input end of the common convolution unit, and the output end of the third convolution layer is connected to the input end of the first superposition unit.
- 5. The method of claim 1, wherein the global time feature aggregation module comprises a first aggregation unit, a second aggregation unit and a second superposition unit, wherein the input end of the first aggregation unit and the input end of the second aggregation unit are connected with the output end of the comprehensive feature extraction module, the output end of the first aggregation unit and the output end of the second aggregation unit are connected with the input end of the second superposition unit, and the output end of the second superposition unit is connected with the input end of the superposition module.
- 6. The method of claim 5, wherein the first aggregation unit and the second aggregation unit each comprise a first three-dimensional convolution layer, a ReLU activation function, a second three-dimensional convolution layer and a Sigmoid activation function connected in sequence, and each further comprise a three-dimensional maximum pooling layer, a three-dimensional average pooling layer, a superimposed layer and a product layer; the input end of the first three-dimensional convolution layer, the input end of the three-dimensional maximum pooling layer and the input end of the three-dimensional average pooling layer are connected with the output end of the comprehensive feature extraction module, the output end of the three-dimensional maximum pooling layer and the output end of the three-dimensional average pooling layer are connected with the input end of the superimposed layer, and the output end of the superimposed layer and the output end of the Sigmoid activation function are connected with the input end of the product layer; and the output end of the product layer of the first aggregation unit and the output end of the product layer of the second aggregation unit are connected with the input end of the second superposition unit.
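Claims 1 through 4 describe one comprehensive convolution sub-module: three parallel branches (a vertical block branch that cuts the feature map along its width, a horizontal block branch that cuts along its height, and a common convolution branch over the whole map) whose outputs are element-wise summed by the first superposition unit. The following is a minimal shape-level sketch in NumPy, not the patented implementation: the random 1x1 channel projections merely stand in for learned convolution layers, and the channel count and number of strips are hypothetical, not taken from the patent.

```python
import numpy as np

def proj(x, out_ch, seed=0):
    """Stand-in for a learned convolution layer: a random 1x1 channel
    projection that preserves spatial size. x: (C, H, W)."""
    rng = np.random.default_rng(seed)
    w = rng.standard_normal((out_ch, x.shape[0])) / np.sqrt(x.shape[0])
    return np.einsum('oc,chw->ohw', w, x)

def vertical_cut(x, n):
    """Vertical cutting function (claim 2): split (C, H, W) along width."""
    return np.array_split(x, n, axis=2)

def horizontal_cut(x, n):
    """Horizontal cutting function (claim 3): split (C, H, W) along height."""
    return np.array_split(x, n, axis=1)

def comprehensive_conv_submodule(x, out_ch, n_strips=4):
    # vertical block convolution unit: per-strip convs, then cascade (concat)
    v = np.concatenate(
        [proj(p, out_ch, seed=i) for i, p in enumerate(vertical_cut(x, n_strips))],
        axis=2)
    # horizontal block convolution unit: same idea along the height axis
    h = np.concatenate(
        [proj(p, out_ch, seed=10 + i) for i, p in enumerate(horizontal_cut(x, n_strips))],
        axis=1)
    # common convolution unit (claim 4): one conv over the whole map
    g = proj(x, out_ch, seed=99)
    # first superposition unit: element-wise sum of the three branches
    return v + h + g

# example: a 1-channel 64x44 silhouette-sized feature map
y = comprehensive_conv_submodule(np.ones((1, 64, 44)), out_ch=8)
print(y.shape)  # (8, 64, 44)
```

Because all three branches preserve spatial size, their outputs can be superposed directly; per claim 1, two such sub-modules in sequence form each of the three convolution modules.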
Description
Gait recognition method

Technical Field
The application belongs to the technical field of identity verification and recognition, and particularly relates to a gait recognition method.

Background
With the development of society and the progress of science and technology, various biometric recognition technologies are maturing and are widely applied in fields such as smart cities, smart transportation and smart security. Common biometric recognition technologies based on faces, irises, fingerprints and the like are limited to short-range recognition, require the active cooperation of the person being tested, and rely on features that are carried openly and are easy to steal or forge. In contrast, gait recognition has significant advantages. Gait recognition aims to identify a person through his or her walking posture; it is contactless, works at a distance, is difficult to disguise, and can recognize people without their awareness while they walk. However, in complex and changeable real-world scenes, factors such as carried objects, worn coats and the camera angle can greatly change the gait appearance of a pedestrian, which poses challenges for gait recognition. Most conventional gait recognition methods extract gait features from the whole human body as a unit, yet different parts of the body exhibit different states while walking, and the same part can exhibit different states at different moments. The gait recognition model GaitPart notices this problem but focuses on extracting local features while ignoring global features, so its gait recognition accuracy is low.

Disclosure of Invention
The embodiment of the application provides a gait recognition method, which can solve the problem of low gait recognition accuracy.
The embodiment of the application provides a gait recognition method, which comprises the following steps: acquiring a gait image sequence to be recognized of a target object, wherein the gait image sequence to be recognized comprises a plurality of gait images to be recognized; and inputting the gait image sequence to be recognized into an improved GaitPart model for gait recognition to obtain a gait recognition result of the target object. The improved GaitPart model comprises a comprehensive feature extraction module for extracting features of the gait image sequence to be recognized, a global time feature aggregation module for aggregating global time features, a horizontal pooling layer, a local time feature aggregation module for aggregating local time features, and a superposition module for outputting the gait recognition result of the target object, wherein the output end of the comprehensive feature extraction module is connected with the input end of the global time feature aggregation module and with the input end of the horizontal pooling layer, the output end of the horizontal pooling layer is connected with the input end of the local time feature aggregation module, and the output end of the global time feature aggregation module and the output end of the local time feature aggregation module are both connected with the input end of the superposition module.
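At the top level, the model thus splits into a global branch (the global time feature aggregation module) and a local branch (the horizontal pooling layer followed by the local time feature aggregation module), both fed by the comprehensive feature extraction module. A minimal NumPy sketch of this dataflow follows; it is an illustration under stated assumptions, not the patented implementation. The feature extractor and the local time feature aggregation module are identity stand-ins, the aggregation unit follows the conv-ReLU-conv-Sigmoid gating of claim 6 with random 1x1x1 projections in place of learned 3D convolutions, all sizes are hypothetical, and because the patent leaves the final combination abstract, the superposition module is rendered here as a simple concatenation of the two branch embeddings.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def aggregation_unit(x, seed=0):
    """One aggregation unit per claim 6. x: (T, C, H, W) sequence features.
    Two stand-in 3D convolutions build a Sigmoid gate; the max- and
    average-pooled features are superposed (added) and gated by the
    product layer, then reduced over time."""
    t, c, h, w = x.shape
    rng = np.random.default_rng(seed)
    w1 = rng.standard_normal((c, c)) / np.sqrt(c)
    w2 = rng.standard_normal((c, c)) / np.sqrt(c)
    a = np.einsum('dc,tchw->tdhw', w1, x)    # first 3D conv (stand-in)
    a = np.maximum(a, 0.0)                   # ReLU activation
    a = np.einsum('dc,tchw->tdhw', w2, a)    # second 3D conv (stand-in)
    gate = sigmoid(a)                        # Sigmoid activation
    pooled = x.max(axis=0) + x.mean(axis=0)  # max pool + avg pool, superposed
    return (gate * pooled).mean(axis=0)      # product layer, time-reduced

def global_time_aggregation(x):
    """Global time feature aggregation module: the second superposition
    unit sums two parallel aggregation units (claim 5)."""
    return aggregation_unit(x, seed=0) + aggregation_unit(x, seed=1)

def horizontal_pooling(x, bins=4):
    """Horizontal pooling layer: pool each horizontal strip of the
    time-averaged map to one vector per strip (max + mean)."""
    parts = np.array_split(x.mean(axis=0), bins, axis=1)
    return np.stack([p.max(axis=(1, 2)) + p.mean(axis=(1, 2)) for p in parts],
                    axis=1)

def improved_gaitpart(seq):
    """Top-level dataflow. seq: (T, C, H, W). The comprehensive feature
    extractor and the local time feature aggregation module are identity
    stand-ins here."""
    feats = seq                          # comprehensive feature extraction
    g = global_time_aggregation(feats)   # global branch: (C, H, W)
    l = horizontal_pooling(feats)        # local branch: (C, bins)
    # superposition module rendered as concatenation of branch embeddings
    return np.concatenate([g.ravel(), l.ravel()])

emb = improved_gaitpart(np.ones((30, 4, 16, 8)))
print(emb.shape)  # (528,)
```

The point of the wiring is that the Sigmoid gate is computed per frame while the pooled features summarize the whole sequence, so the product layer re-weights a global summary by frame-level evidence before time reduction.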
Optionally, the comprehensive feature extraction module comprises a first convolution module, a first maximum pooling layer, a second convolution module, a second maximum pooling layer and a third convolution module which are connected in sequence, wherein the input end of the first convolution module is the input end of the comprehensive feature extraction module, and the output end of the third convolution module is the output end of the comprehensive feature extraction module; the first convolution module, the second convolution module and the third convolution module each comprise a first comprehensive convolution sub-module and a second comprehensive convolution sub-module which are connected in sequence; the first comprehensive convolution sub-module and the second comprehensive convolution sub-module each comprise a vertical block convolution unit, a horizontal block convolution unit, a common convolution unit and a first superposition unit, wherein the output end of the vertical block convolution unit, the output end of the horizontal block convolution unit and the output end of the common convolution unit are all connected with the input end of the first superposition unit; the input ends of the vertical block convolution unit, the horizontal block convolution unit and the common convolution unit of the first comprehensive convolution sub-module in the first convolution module are the input ends of the first convolution module, the output end of the first superposition unit of the second comprehensive convolution sub-module in the first convolution module is the output end of the first convolution module, the input ends of t