CN-115761879-B - Gait recognition method, gait recognition device, model training method, terminal and storage medium

CN115761879B

Abstract

The application provides a gait recognition method, a gait recognition model training method, a device, a terminal and a storage medium. The training method comprises: obtaining a training video stream annotated with the category of a target object; performing feature extraction on the fused image sequence of the training video stream through a teacher model to obtain feature information of the training video stream; determining, through a gait recognition model, a first prediction category and a prediction feature of the target object based on the mask image sequence corresponding to the training video stream; and iteratively training the gait recognition model based on a first error value between the annotated category and the first prediction category for the same training video stream and a second error value between the feature information and the prediction feature. Because the gait recognition model, operating only on the single-channel mask image sequence, learns to produce the same or similar category and features as the teacher model obtains from the multi-channel fused image sequence, recognition accuracy is improved and processing time is shortened.
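The training objective described above can be illustrated with a small distillation-style sketch. This is not the patent's implementation: cross-entropy for the first error value, mean squared error for the second, and the weight `alpha` are all assumptions made for illustration.

```python
import numpy as np

def softmax(logits):
    # Numerically stable softmax over a 1-D logit vector.
    e = np.exp(logits - logits.max())
    return e / e.sum()

def training_loss(student_logits, label, student_feat, teacher_feat, alpha=1.0):
    """First error value: cross-entropy between the annotated category and the
    student's first prediction category. Second error value: MSE (an assumption)
    between the teacher's feature information and the student's predicted feature."""
    first_error = -np.log(softmax(student_logits)[label] + 1e-12)
    second_error = float(np.mean((student_feat - teacher_feat) ** 2))
    return first_error + alpha * second_error
```

Minimizing the second term pushes the student's mask-sequence features toward the teacher's fused-sequence features, which is the mechanism the abstract credits for the accuracy gain.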

Inventors

  • GUI QING
  • YU SHENGQING
  • PAN HUADONG

Assignees

  • Zhejiang Dahua Technology Co., Ltd. (浙江大华技术股份有限公司)

Dates

Publication Date
2026-05-08
Application Date
2022-11-03

Claims (11)

  1. A gait recognition model training method, comprising: acquiring a training video stream, wherein the training video stream comprises a plurality of frames of sample images containing the same target object, the training video stream is associated with a corresponding mask image sequence and a fused image sequence, the fused image sequence is multi-channel data, and the mask image sequence is single-channel data; performing feature extraction on the fused image sequence of the training video stream through a teacher model to obtain feature information of the training video stream; determining a first prediction category and a prediction feature of the target object based on the mask image sequence corresponding to the training video stream through a gait recognition model, wherein the teacher model is a model that has completed training and performs the same task as the gait recognition model; and iteratively training the gait recognition model based on a first error value between the annotated category and the first prediction category corresponding to the same training video stream and a second error value between the feature information and the prediction feature; wherein the acquiring the training video stream further comprises: segmenting each sample image to obtain a mask image of the sample image; performing key point detection on each sample image to obtain position key point information corresponding to the sample image; generating a key point connection structure diagram corresponding to the sample image based on the position key point information of the sample image; generating a fused image of the sample image according to the mask image, the position key point information and the key point connection structure diagram corresponding to each sample image, wherein the fused image is multi-channel data; and arranging the fused images corresponding to all the sample images in the training video stream in time order to generate the fused image sequence corresponding to the training video stream; and wherein the training method of the teacher model comprises: obtaining a second prediction category of the target object based on the fused image sequence of the training video stream with the teacher model; and iteratively training the teacher model based on a third error value between the second prediction category and the annotated category corresponding to the training video stream.
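The sequence-construction steps of claim 1 (segment each frame, detect key points, draw the connection diagram, fuse, then stack in time order) might be sketched as follows. `segment`, `detect_keypoints` and `draw_skeleton` are stand-ins for real segmentation and pose models, and the three-plane fusion layout is an assumption:

```python
import numpy as np

def kps_to_map(kps, shape):
    # Rasterise (row, col) position key points into a single-channel map.
    m = np.zeros(shape, dtype=np.float32)
    for r, c in kps:
        m[r, c] = 1.0
    return m

def build_sequences(frames, segment, detect_keypoints, draw_skeleton):
    """Return the single-channel mask image sequence (T, H, W) and the
    multi-channel fused image sequence (T, 3, H, W), both in frame (time) order."""
    masks, fused = [], []
    for frame in frames:                            # original frame order = time order
        mask = segment(frame)                       # single-channel mask image
        kps = detect_keypoints(frame)               # position key point information
        skeleton = draw_skeleton(kps, mask.shape)   # key point connection diagram
        fused.append(np.stack([mask, kps_to_map(kps, mask.shape), skeleton], axis=0))
        masks.append(mask)
    return np.stack(masks), np.stack(fused)
```

The teacher consumes the fused sequence while the student consumes only the mask sequence, which is what makes student inference cheaper at deployment time.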
  2. The gait recognition model training method of claim 1, wherein the acquiring the training video stream comprises: arranging the mask images corresponding to all the sample images in the training video stream in time order to generate the mask image sequence corresponding to the training video stream.
  3. The gait recognition model training method of claim 2, wherein the fused image is three-channel data, and the generating a fused image of the sample image according to the mask image, the position key point information and the key point connection structure diagram corresponding to each sample image comprises: copying and splicing the single-channel mask image to obtain a three-channel mask image; copying and splicing the single-channel position key point information to obtain three-channel position key point information; and fusing the three-channel mask image, the three-channel position key point information and the key point connection structure diagram corresponding to the same sample image to generate the fused image of the sample image.
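The copy-and-splice step of claim 3 amounts to repeating a single-channel plane three times. A minimal numpy sketch, with the element-wise-maximum fusion operator as an assumption (the claim does not fix how the three inputs are fused):

```python
import numpy as np

def to_three_channel(single):
    # Copy and splice a single-channel (H, W) image into three-channel (3, H, W) data.
    return np.repeat(single[np.newaxis], 3, axis=0)

def fuse_three_channel(mask, kp_map, skeleton_rgb):
    """Fuse the three-channel mask, the three-channel position key point map
    and the (already three-channel) key point connection structure diagram.
    Element-wise maximum is an illustrative choice, not the patent's operator."""
    return np.maximum(np.maximum(to_three_channel(mask), to_three_channel(kp_map)),
                      skeleton_rgb)
```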
  4. The gait recognition model training method of claim 2, wherein the position key point information comprises position key points and the category and position of each position key point, and the generating a key point connection structure diagram corresponding to the sample image based on the position key point information of the sample image comprises: connecting the position key points with different types of line segments based on the categories and positions of the position key points to obtain the key point connection structure diagram corresponding to the sample image.
  5. The gait recognition model training method of claim 4, wherein the categories of the position key points include nose tip, left shoulder, right shoulder, left elbow, right elbow, left wrist, right wrist, left hip, right hip, left knee, right knee, left ankle and/or right ankle; and the connecting the position key points with different types of line segments based on the categories and positions of the position key points to obtain the key point connection structure diagram corresponding to the sample image comprises: connecting two position key points with a line segment of a first color in response to the category of one position key point being the left ankle and the category of the other being the left knee, and/or the category of one being the right ankle and the category of the other being the right knee; and/or connecting two position key points with a line segment of a second color in response to the category of one position key point being the left hip and the category of the other being the left knee, and/or the category of one being the right hip and the category of the other being the right knee; and/or connecting two position key points with a line segment of a third color in response to the category of one position key point being the left hip and the category of the other being the left shoulder, and/or the category of one being the right hip and the category of the other being the right shoulder; and/or connecting two position key points with a line segment of a fourth color in response to the category of one position key point being the left elbow and the category of the other being the left shoulder, and/or the category of one being the right elbow and the category of the other being the right shoulder; and/or connecting two position key points with a line segment of a fifth color in response to the category of one position key point being the left elbow and the category of the other being the left wrist, and/or the category of one being the right elbow and the category of the other being the right wrist; and/or connecting two position key points with a line segment of a sixth color in response to the category of one position key point being the right shoulder and the category of the other being the left shoulder, and/or the category of one being the right hip and the category of the other being the left hip; and/or connecting the position key point corresponding to the nose tip to the midpoint of the line segment between the left shoulder and the right shoulder with a line segment of a seventh color in response to the category of one position key point being the nose tip and the categories of the other two position key points being the left shoulder and the right shoulder respectively.
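The per-color connection rules of claim 5 can be tabulated and applied as below. This is an illustrative sketch only: the patent does not specify actual color values, so the RGB triples are placeholders, and `skeleton_edges` is a hypothetical helper name.

```python
# Placeholder RGB values for the seven line-segment colors (assumption).
LIMB_COLORS = {
    frozenset({"left_ankle", "left_knee"}):        (255, 0, 0),     # first color
    frozenset({"right_ankle", "right_knee"}):      (255, 0, 0),
    frozenset({"left_hip", "left_knee"}):          (0, 255, 0),     # second color
    frozenset({"right_hip", "right_knee"}):        (0, 255, 0),
    frozenset({"left_hip", "left_shoulder"}):      (0, 0, 255),     # third color
    frozenset({"right_hip", "right_shoulder"}):    (0, 0, 255),
    frozenset({"left_elbow", "left_shoulder"}):    (255, 255, 0),   # fourth color
    frozenset({"right_elbow", "right_shoulder"}):  (255, 255, 0),
    frozenset({"left_elbow", "left_wrist"}):       (255, 0, 255),   # fifth color
    frozenset({"right_elbow", "right_wrist"}):     (255, 0, 255),
    frozenset({"left_shoulder", "right_shoulder"}): (0, 255, 255),  # sixth color
    frozenset({"left_hip", "right_hip"}):          (0, 255, 255),
}

def skeleton_edges(keypoints):
    """keypoints: {category: (x, y)}. Returns (p1, p2, color) line segments,
    plus the seventh-color segment from the nose tip to the shoulder midpoint."""
    edges = []
    for pair, color in LIMB_COLORS.items():
        a, b = tuple(pair)
        if a in keypoints and b in keypoints:
            edges.append((keypoints[a], keypoints[b], color))
    if {"nose_tip", "left_shoulder", "right_shoulder"} <= keypoints.keys():
        ls, rs = keypoints["left_shoulder"], keypoints["right_shoulder"]
        mid = ((ls[0] + rs[0]) / 2, (ls[1] + rs[1]) / 2)
        edges.append((keypoints["nose_tip"], mid, (128, 128, 128)))  # seventh color
    return edges
```

Rendering the returned segments onto a blank canvas (e.g. with a drawing library) would produce the key point connection structure diagram.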
  6. The gait recognition model training method of claim 1, wherein the teacher model comprises a feature extraction module and a discrimination module, and the obtaining, by the teacher model, a second prediction category of the target object based on the fused image sequence of the training video stream comprises: performing feature extraction on the fused image sequence with the feature extraction module to obtain a feature vector of the target object, wherein the feature vector contains time sequence information; calculating the similarity between the feature vector and each preset feature vector with the discrimination module; and taking the category of the preset feature vector with the maximum similarity as the second prediction category of the target object.
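The discrimination step of claim 6 reduces to nearest-neighbor matching against preset feature vectors. A sketch assuming cosine similarity (the claim does not name the similarity measure):

```python
import numpy as np

def classify(feature, gallery):
    """gallery: {category: preset feature vector}. Returns the category whose
    preset vector has maximum cosine similarity to the query feature."""
    def cos(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))
    sims = {cat: cos(feature, v) for cat, v in gallery.items()}
    return max(sims, key=sims.get)
```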
  7. A gait recognition method, comprising: acquiring a video stream, wherein the video stream comprises a plurality of video frames containing the same target; segmenting each video frame to obtain a mask image sequence of the video stream; and determining the identity category of the target based on the mask image sequence corresponding to the video stream with a gait recognition model, wherein the gait recognition model is trained by the gait recognition model training method according to any one of claims 1-6.
  8. A gait recognition model training apparatus, comprising: an acquisition module, configured to acquire a training video stream, wherein the training video stream comprises a plurality of frames of sample images containing the same target object, the training video stream is associated with a corresponding mask image sequence and a fused image sequence, the fused image sequence is multi-channel data, the mask image sequence is single-channel data, and the training video stream is annotated with the category of the target object; the acquisition module is further configured to segment each sample image to obtain the mask image of the sample image, perform key point detection on each sample image to obtain position key point information corresponding to the sample image, generate a key point connection structure diagram corresponding to the sample image based on the position key point information of the sample image, and generate a fused image of the sample image according to the mask image, the position key point information and the key point connection structure diagram corresponding to each sample image, wherein the fused image is multi-channel data, and the fused images arranged in time order form the fused image sequence corresponding to the training video stream; a monitoring module, configured to perform feature extraction on the fused image sequence of the training video stream through a teacher model to obtain feature information of the training video stream; a processing module, configured to determine a first prediction category and a prediction feature of the target object based on the mask image sequence corresponding to the training video stream through a gait recognition model, wherein the teacher model is a model that has completed training and performs the same task as the gait recognition model; and a training module, configured to iteratively train the gait recognition model based on a first error value between the annotated category and the first prediction category corresponding to the same training video stream and a second error value between the feature information and the prediction feature.
  9. A gait recognition device, comprising: an information acquisition module, configured to acquire a video stream, wherein the video stream comprises a plurality of video frames containing the same target; a preprocessing module, configured to segment each video frame to obtain a mask image sequence of the video stream; and a determination module, configured to determine the identity category of the target with a gait recognition model based on the mask image sequence corresponding to the video stream, wherein the gait recognition model is trained by the gait recognition model training method according to any one of claims 1-6.
  10. A terminal, comprising a memory, a processor and a computer program stored in the memory and executable on the processor, wherein the processor is configured to execute the computer program to implement the steps of the gait recognition model training method according to any one of claims 1-6 or the gait recognition method according to claim 7.
  11. A computer-readable storage medium, on which a computer program is stored, wherein the computer program, when executed by a processor, implements the steps of the gait recognition model training method according to any one of claims 1-6 or the gait recognition method according to claim 7.

Description

Gait recognition method, gait recognition device, model training method, terminal and storage medium

Technical Field

The invention relates to the technical field of biometric feature recognition, and in particular to a gait recognition method, a model training method, a device, a terminal and a storage medium.

Background

Gait recognition is an emerging identity recognition technology with advantages such as being contactless and effective at long distance, and has become a focus of attention for researchers. Most current mainstream gait recognition algorithms perform parameter learning on pedestrian silhouette sequences with a statistical model or a deep learning model to obtain a feature extractor, and then perform feature matching with the extracted features to realize identity recognition. The pedestrian silhouette effectively removes the interference of color information from clothing, carried objects and the like; however, because silhouettes differ across different clothing and carried objects, the difference between silhouettes of the same person before and after a change of clothing may exceed the difference between silhouettes of different persons, which inevitably affects the accuracy of gait recognition.

Disclosure of Invention

The invention mainly solves the technical problems of low accuracy of recognition results and long processing time in the prior art by providing a gait recognition method, a model training method, a device, a terminal and a storage medium.
According to a first technical scheme adopted by the invention, a gait recognition model training method comprises: obtaining a training video stream, wherein the training video stream comprises a plurality of frames of sample images containing the same target object, the training video stream is associated with a corresponding mask image sequence and a fused image sequence, the fused image sequence is multi-channel data, the mask image sequence is single-channel data, and the training video stream is annotated with the category of the target object; performing feature extraction on the fused image sequence of the training video stream through a teacher model to obtain feature information of the training video stream; determining a first prediction category and a prediction feature of the target object through the gait recognition model based on the mask image sequence corresponding to the training video stream, wherein the teacher model is a model that has completed training and performs the same task as the gait recognition model; and iteratively training the gait recognition model based on a first error value between the annotated category and the first prediction category corresponding to the same training video stream and a second error value between the feature information and the prediction feature.

Obtaining the training video stream includes segmenting each sample image to obtain the mask image of the sample image, where the mask image is single-channel data, and arranging the mask images corresponding to all the sample images in the training video stream in time order to generate the mask image sequence corresponding to the training video stream.

Obtaining the training video stream further includes performing key point detection on each sample image to obtain position key point information corresponding to the sample image, generating a key point connection structure diagram corresponding to the sample image based on the position key point information of the sample image, generating a fused image of the sample image according to the mask image, the position key point information and the key point connection structure diagram corresponding to each sample image, where the fused image is multi-channel data, and arranging the fused images corresponding to all the sample images in the training video stream in time order to generate the fused image sequence corresponding to the training video stream.

The fused image may be three-channel data. Generating the fused image of the sample image according to the mask image, the position key point information and the key point connection structure diagram corresponding to each sample image includes copying and splicing the single-channel mask image to obtain a three-channel mask image, copying and splicing the single-channel position key point information to obtain three-channel position key point information, and fusing the three-channel mask image, the three-channel position key point information and the key point connection structure diagram corresponding to the same sample image to generate the fused image of the sample image.

Generating the key point connection structure diagram corresponding to the sample image based on the position key point information of the sample image includes connecting the position key points with different types of line segments based on the categories and positions of the position key points to obtain the key point connection structure diagram corresponding to the sample image.