CN-121838213-B - Method for re-identifying occluded pedestrians based on pose guidance and progressive feature restoration

Abstract

The invention discloses a method for re-identifying occluded pedestrians based on pose guidance and progressive feature restoration. The method comprises: obtaining an original pedestrian re-identification data set and applying occlusion simulation augmentation to it; extracting human body key points with a pre-trained pose estimation model and executing a many-to-one selection strategy; constructing a skeleton-silhouette image and generating visibility pseudo-labels; constructing a feature extraction backbone network based on the Vision Transformer architecture; applying attention guidance based on skeleton information; constructing a visibility estimation branch network comprising fully connected layers; constructing a mixed loss function comprising identity loss, triplet loss and visibility loss, and performing joint training under mixed supervision; and performing inference and feature restoration based on a progressive strategy to output the final pedestrian re-identification retrieval result. The method effectively addresses the problems of semantic interference and missing features in occlusion scenes, and improves recognition robustness and accuracy.

Inventors

ZHAO WENTAO
LIU YIXIU
ZHANG SHAOXIONG
WU YUE

Assignees

Hangzhou Dianzi University

Dates

Publication Date: 20260512
Application Date: 20260313

Claims (8)

1. A method for re-identifying occluded pedestrians based on pose guidance and progressive feature restoration, characterized by comprising the following steps: Step S1, acquiring an original pedestrian re-identification data set, performing occlusion simulation augmentation on the data set, and constructing an augmented training sample data set; Step S2, extracting human body key points from the images in the training sample data set using a pre-trained pose estimation model, executing a many-to-one selection strategy, and automatically screening the retrieval target in each image; Step S3, constructing a skeleton-silhouette image and generating visibility pseudo-labels; Step S4, constructing a feature extraction backbone network based on the Vision Transformer architecture; Step S5, applying attention guidance based on skeleton information, wherein step S5 comprises: Step S51, applying max pooling to the generated binarized skeleton-silhouette image to generate an attention mask sequence whose length matches that of the image patch token sequence; Step S52, executing two attention computation paths in parallel at the first layer of the Transformer encoder, wherein one path performs standard self-attention to capture global context information, and the other path uses the attention mask sequence to reject background interference and focus on the features of the target human body; Step S53, introducing a learnable balance parameter and fusing the outputs of the two attention computation paths by weighting to obtain the final first-layer output feature sequence; Step S6, constructing a visibility estimation branch network comprising fully connected layers; Step S7, constructing a mixed loss function comprising identity loss, triplet loss and visibility loss, and performing joint training under mixed supervision; Step S8, performing inference and feature restoration based on a progressive strategy, and outputting the final pedestrian re-identification retrieval result, wherein step S8 comprises: Step S81, obtaining the global and local feature representations and the visibility scores of the query image and the gallery images, and computing the standard Euclidean distance on the global features to obtain an initial K-nearest-neighbor sample set for the query image; Step S82, performing progressive feature restoration iterations, wherein in each iteration the features of the K nearest neighbors are aggregated, weighted by visibility score, to obtain a reference feature vector, the local features of the query image are restored by weighted fusion with the reference feature vector, a visibility-weighted Euclidean distance is computed from the restored local features, the gallery is searched again to update the K-nearest-neighbor sample set, and restoration continues in the next iteration until the iterations are complete; Step S83, computing the visibility-weighted Euclidean distance between the finally restored local features of the query image and the gallery images, ranking the gallery images by that distance, and outputting the final pedestrian re-identification retrieval result.
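By way of non-limiting illustration, steps S81 to S83 can be sketched in NumPy as follows. The claim does not give the fusion or distance formulas, so several choices here are assumptions: occluded query parts are replaced in proportion to `1 - q_vis`, restored query parts are treated as fully visible, and the per-part distance is averaged with gallery visibility as weights. The function names and the defaults `k=3` and `iters=2` are hypothetical.

```python
import numpy as np

EPS = 1e-8

def visibility_weighted_distance(q_local, g_local, g_vis):
    """Per-part Euclidean distance, averaged with gallery visibility as weights.

    q_local: (P, D) query part features; g_local: (N, P, D); g_vis: (N, P).
    Assumption: restored query parts are treated as fully visible.
    """
    part_dist = np.linalg.norm(g_local - q_local[None], axis=2)  # (N, P)
    return (g_vis * part_dist).sum(axis=1) / (g_vis.sum(axis=1) + EPS)

def progressive_restore(q_global, q_local, q_vis,
                        g_global, g_local, g_vis, k=3, iters=2):
    # S81: initial K nearest neighbours from plain Euclidean distance
    # on the global features.
    knn = np.argsort(np.linalg.norm(g_global - q_global[None], axis=1))[:k]
    restored = q_local.copy()
    for _ in range(iters):
        # S82: aggregate the neighbours' part features, weighted by visibility.
        w = g_vis[knn] / (g_vis[knn].sum(axis=0, keepdims=True) + EPS)  # (k, P)
        reference = (w[:, :, None] * g_local[knn]).sum(axis=0)          # (P, D)
        # S82: weighted fusion (assumed form); visible parts keep their features.
        restored = q_vis[:, None] * q_local + (1.0 - q_vis[:, None]) * reference
        # S82: search again with the visibility-weighted distance.
        dist = visibility_weighted_distance(restored, g_local, g_vis)
        knn = np.argsort(dist)[:k]
    # S83: final ranking of the whole gallery with the restored features.
    dist = visibility_weighted_distance(restored, g_local, g_vis)
    return np.argsort(dist), restored
```

In this sketch each iteration re-selects the neighbors with the restored features, so a poor initial neighbor set can be corrected before the final ranking.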
2. The method for re-identifying occluded pedestrians based on pose guidance and progressive feature restoration according to claim 1, wherein step S1 comprises: Step S11, acquiring an original pedestrian re-identification data set and, for each pedestrian image in the data set, selecting another pedestrian as the occlusion source image; Step S12, setting a probability threshold to implement probability-controlled mixed occlusion data augmentation; Step S13, merging the augmented images with the original, unaugmented images into the final training sample set for output.
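By way of non-limiting illustration, steps S11 to S13 can be sketched as follows. The claim does not specify how the occluding region is chosen, so the random rectangle pasted from the occlusion source image is an assumption, as are the names `occlusion_augment` and `build_training_set`, the default `prob=0.5`, and the requirement that all images share one shape.

```python
import numpy as np

def occlusion_augment(image, occluder, prob, rng):
    """Paste a random rectangle of `occluder` onto `image` with probability `prob`."""
    if rng.random() >= prob:  # S12: probability-controlled augmentation
        return image.copy(), False
    h, w = image.shape[:2]
    # Assumed patch size: between a quarter and a half of each dimension.
    ph = int(rng.integers(h // 4, h // 2 + 1))
    pw = int(rng.integers(w // 4, w // 2 + 1))
    top = int(rng.integers(0, h - ph + 1))
    left = int(rng.integers(0, w - pw + 1))
    out = image.copy()
    out[top:top + ph, left:left + pw] = occluder[top:top + ph, left:left + pw]
    return out, True

def build_training_set(images, prob=0.5, seed=0):
    """Return the original images plus the augmented copies (S13)."""
    rng = np.random.default_rng(seed)
    samples = [img.copy() for img in images]
    for i, img in enumerate(images):
        # S11: pick another pedestrian image as the occlusion source.
        others = [k for k in range(len(images)) if k != i]
        j = int(rng.choice(others))
        augmented, applied = occlusion_augment(img, images[j], prob, rng)
        if applied:
            samples.append(augmented)
    return samples
```

With `prob=1.0` every image contributes one augmented copy, and with `prob=0.0` the output equals the original set.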
3. The method for re-identifying occluded pedestrians based on pose guidance and progressive feature restoration according to claim 2, wherein step S2 comprises: a pose key point extraction stage, in which the input image is processed with the pre-trained pose estimation model, pose information is extracted for every pedestrian detected in the image, and a key point coordinate set is output for each pedestrian; a many-to-one selection stage, in which, for each detected pedestrian, the geometric centroid of all of that pedestrian's visible key points is computed, the absolute distance between the horizontal coordinate of the centroid and the horizontal center of the image is computed, a horizontal distance threshold is set, and the retrieval target is determined by a graded strategy according to how the detected pedestrians are distributed in the image; and, based on the screening result of the many-to-one selection algorithm, retaining only the pose key point data of the locked retrieval target and filtering out the key point information of pedestrians judged to be interfering.
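By way of non-limiting illustration, the many-to-one selection stage can be sketched as follows. The centroid and horizontal-offset computations follow the claim; the graded strategy itself is not spelled out in the claim, so the three cases below (one candidate within the threshold, several candidates, none) and the tie-break by number of visible key points are assumptions, as is the name `select_target`.

```python
import numpy as np

def select_target(people, image_width, dist_thresh):
    """Return the index of the retrieval target, or None if nobody is usable.

    people: list of (K, 3) arrays with rows (x, y, visible_flag).
    """
    center_x = image_width / 2.0
    offsets = []
    for kpts in people:
        visible = kpts[kpts[:, 2] > 0]
        if len(visible) == 0:
            offsets.append(np.inf)  # no visible key points: never selected
            continue
        centroid_x = visible[:, 0].mean()           # centroid of visible key points
        offsets.append(abs(centroid_x - center_x))  # distance to the horizontal center
    offsets = np.array(offsets, dtype=float)
    if offsets.size == 0 or not np.isfinite(offsets).any():
        return None
    near = np.flatnonzero(offsets <= dist_thresh)
    if len(near) == 1:
        # Case 1: exactly one pedestrian near the center.
        return int(near[0])
    if len(near) > 1:
        # Case 2 (assumed tie-break): prefer the most complete skeleton.
        counts = [int((people[i][:, 2] > 0).sum()) for i in near]
        return int(near[int(np.argmax(counts))])
    # Case 3 (assumed fallback): nobody within the threshold, take the closest.
    return int(np.argmin(offsets))
```

Only the key points of the returned pedestrian would be kept for the later steps; the other detections are discarded as interference.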
4. The method for re-identifying occluded pedestrians based on pose guidance and progressive feature restoration according to claim 3, wherein constructing the skeleton-silhouette image in step S3 comprises: based on the extracted key point coordinates, and using the augmented image as the canvas, connecting adjacent key points according to the topological structure of the human pose and drawing colored skeleton lines directly on the augmented image to generate an augmented image with skeleton information superimposed; performing a pixel-by-pixel difference between the superimposed image and the original augmented image, and summing the absolute differences over the three RGB channels at each pixel position to obtain a single-channel accumulated difference map; setting a binarization threshold and binarizing the accumulated difference map to obtain a preliminary mask; and finally processing the preliminary mask with a morphological dilation operation to generate the final binarized skeleton-silhouette image.
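By way of non-limiting illustration, the construction of the skeleton-silhouette image can be sketched as follows. To stay self-contained, the sketch rasterizes one-pixel-wide lines and dilates with a square window in plain NumPy; the line color, the threshold default and the dilation radius are assumptions, and the function names are hypothetical.

```python
import numpy as np

def draw_skeleton(image, keypoints, edges, color=(255, 0, 0)):
    """Draw straight lines between connected key points given as (x, y)."""
    out = image.copy()
    for a, b in edges:
        (x0, y0), (x1, y1) = keypoints[a], keypoints[b]
        n = int(max(abs(x1 - x0), abs(y1 - y0))) + 1
        xs = np.rint(np.linspace(x0, x1, n)).astype(int)
        ys = np.rint(np.linspace(y0, y1, n)).astype(int)
        out[ys, xs] = color
    return out

def dilate(mask, radius=1):
    """Binary dilation with a (2 * radius + 1) square structuring element."""
    h, w = mask.shape
    padded = np.pad(mask, radius)
    out = np.zeros_like(mask)
    for dy in range(2 * radius + 1):
        for dx in range(2 * radius + 1):
            out |= padded[dy:dy + h, dx:dx + w]
    return out

def skeleton_silhouette(image, keypoints, edges, thresh=0, radius=1):
    overlay = draw_skeleton(image, keypoints, edges)
    # Sum of absolute differences over the three RGB channels.
    diff = np.abs(overlay.astype(np.int32) - image.astype(np.int32)).sum(axis=2)
    mask = (diff > thresh).astype(np.uint8)  # preliminary binary mask
    return dilate(mask, radius)              # final skeleton-silhouette image
```

A pixel whose original color already equals the line color produces a zero difference and is missed by the mask, which is one reason the dilation step helps close small gaps.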
5. The method for re-identifying occluded pedestrians based on pose guidance and progressive feature restoration according to claim 4, wherein generating the visibility pseudo-labels in step S3 comprises: uniformly dividing the final binarized skeleton-silhouette image along the vertical direction into a plurality of horizontal stripe regions, and computing the pixel sum of each horizontal stripe; locating the occlusion boundary with a top-to-bottom scanning strategy by finding the first stripe whose pixel sum is 0 and taking it as the cut-off stripe; and, after the cut-off stripe is determined, generating the visibility pseudo-labels of all horizontal stripes according to the continuity prior of human body structure.
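By way of non-limiting illustration, the pseudo-label generation can be sketched as follows. The sketch assumes that the continuity prior means every stripe above the cut-off stripe is labeled visible (1) and the cut-off stripe and everything below it is labeled occluded (0); the number of stripes is left as a parameter, and the function name is hypothetical.

```python
import numpy as np

def visibility_pseudo_labels(ssi, num_stripes):
    """Return one 0/1 visibility label per horizontal stripe of a binary mask."""
    stripes = np.array_split(ssi, num_stripes, axis=0)
    sums = np.array([int(s.sum()) for s in stripes])
    # Top-to-bottom scan: the first empty stripe is the cut-off stripe.
    empty = np.flatnonzero(sums == 0)
    cutoff = int(empty[0]) if len(empty) else num_stripes
    labels = np.zeros(num_stripes, dtype=np.int64)
    labels[:cutoff] = 1  # assumed continuity prior: visible above the cut-off
    return labels
```

Under this assumption a stripe that contains skeleton pixels below an empty stripe is still labeled occluded, since the scan stops at the first gap.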
6. The method for re-identifying occluded pedestrians based on pose guidance and progressive feature restoration according to claim 5, wherein step S4 comprises: Step S41, dividing the input image into non-overlapping image patches of fixed size, and mapping each two-dimensional image patch into a one-dimensional embedding vector through a learnable linear projection layer to form the original image patch token sequence; Step S42, adding learnable position embeddings to the image patch token sequence, inputting the sequence into the Transformer encoder for interaction and extraction of global context information, and outputting a token sequence containing the classification token and the image content; Step S43, spatially rearranging the image patch token sequence output by the encoder according to the spatial layout of the original image, and restoring it into a two-dimensional feature map with the original spatial structure; Step S44, performing multi-granularity feature extraction on the two-dimensional feature map, extracting a global feature vector by global average pooling and extracting a plurality of local feature vectors with a horizontal uniform partition pooling strategy.
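By way of non-limiting illustration, steps S43 and S44 can be sketched as follows, taking the encoder output as given. The sketch assumes the classification token is the first row of the token sequence and that patch tokens are stored in row-major order; the function name and argument names are hypothetical.

```python
import numpy as np

def multi_granularity_features(tokens, grid_h, grid_w, num_parts):
    """tokens: (1 + grid_h * grid_w, D) encoder output, class token first."""
    patch_tokens = tokens[1:]                         # drop the class token
    # S43: restore the 2-D spatial layout of the patch tokens.
    fmap = patch_tokens.reshape(grid_h, grid_w, -1)   # (H, W, D)
    # S44: global average pooling over the whole map.
    global_feat = fmap.mean(axis=(0, 1))              # (D,)
    # S44: uniform horizontal partition, one average-pooled vector per stripe.
    stripes = np.array_split(fmap, num_parts, axis=0)
    local_feats = np.stack([s.mean(axis=(0, 1)) for s in stripes])  # (P, D)
    return global_feat, local_feats
```

Each local feature vector then corresponds to one horizontal band of the image, which is the same granularity at which the stripe-level visibility labels are defined.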
7. The method for re-identifying occluded pedestrians based on pose guidance and progressive feature restoration according to claim 6, wherein the visibility estimation branch network in step S6 comprises: a first fully connected layer, which receives the local feature vector as input and whose output is connected to a ReLU activation function that introduces a nonlinear transformation and enhances the discriminative capacity of the features; a second fully connected layer, which receives the feature vector output by the first fully connected layer and further reduces its dimension and maps it, and whose output is likewise connected to a ReLU activation function; a third fully connected layer, which serves as the classification output layer, receives the feature vector output by the second fully connected layer, and maps it into a classification prediction vector of dimension 2, the two dimensions of which correspond respectively to the two state categories "occluded" and "visible"; and a Softmax function, which normalizes the output vector of the third fully connected layer to obtain the predicted probability distribution of the local feature over the "occluded" and "visible" categories.
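By way of non-limiting illustration, the forward pass of the visibility estimation branch can be sketched in NumPy as follows. The layer widths (768, 256, 64) and the random initialization are assumptions made only so the example runs; the claim fixes just the three-layer structure, the ReLU activations and the two-way Softmax output.

```python
import numpy as np

class VisibilityBranch:
    """FC -> ReLU -> FC -> ReLU -> FC -> Softmax over (occluded, visible)."""

    def __init__(self, in_dim=768, hidden1=256, hidden2=64, seed=0):
        rng = np.random.default_rng(seed)
        dims = [in_dim, hidden1, hidden2, 2]
        self.weights = [rng.normal(0.0, 0.02, size=(a, b))
                        for a, b in zip(dims[:-1], dims[1:])]
        self.biases = [np.zeros(b) for b in dims[1:]]

    def __call__(self, local_feats):
        x = local_feats                          # (P, in_dim)
        for i, (w, b) in enumerate(zip(self.weights, self.biases)):
            x = x @ w + b
            if i < 2:                            # ReLU after the first two layers
                x = np.maximum(x, 0.0)
        x = x - x.max(axis=-1, keepdims=True)    # numerically stable Softmax
        e = np.exp(x)
        return e / e.sum(axis=-1, keepdims=True)  # (P, 2) probabilities
```

Applied to the local feature vectors of one image, the second column of the output can serve as the per-part visibility score.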
8. The method for re-identifying occluded pedestrians based on pose guidance and progressive feature restoration according to claim 7, wherein the mixed loss function in step S7 comprises: an identity recognition loss, in which a cross-entropy loss is computed for the global feature vector and the local feature vectors of each input image output by the backbone network, using the ground-truth pedestrian identity labels provided by the data set; a triplet metric loss, in which triplets are constructed in the feature space based on the identity labels, each triplet comprising an anchor sample, a positive sample of the same identity and a negative sample of a different identity, and the triplet loss is computed; and a visibility estimation loss, in which the generated fine-grained visibility pseudo-labels are used as the supervision ground truth and a cross-entropy loss is computed on the prediction scores output by the visibility estimation branch.
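By way of non-limiting illustration, the three loss terms can be sketched in NumPy as follows. The claim does not state how the terms are combined, so the weighted sum with weights `w_tri` and `w_vis`, the margin of 0.3 and the use of pre-built triplets are assumptions, and the function names are hypothetical.

```python
import numpy as np

def cross_entropy(logits, labels):
    """Mean cross-entropy of integer labels under Softmax(logits)."""
    z = logits - logits.max(axis=1, keepdims=True)
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return float(-log_probs[np.arange(len(labels)), labels].mean())

def triplet_loss(anchor, positive, negative, margin=0.3):
    """Hinge on (anchor-positive distance) minus (anchor-negative distance)."""
    d_ap = np.linalg.norm(anchor - positive, axis=1)
    d_an = np.linalg.norm(anchor - negative, axis=1)
    return float(np.maximum(d_ap - d_an + margin, 0.0).mean())

def mixed_loss(id_logits, id_labels, anchor, positive, negative,
               vis_logits, vis_labels, w_tri=1.0, w_vis=1.0):
    loss_id = cross_entropy(id_logits, id_labels)      # identity loss
    loss_tri = triplet_loss(anchor, positive, negative)  # triplet metric loss
    loss_vis = cross_entropy(vis_logits, vis_labels)   # visibility loss
    return loss_id + w_tri * loss_tri + w_vis * loss_vis
```

In a training loop the identity term would be evaluated for the global and each local classifier output and summed, which this sketch leaves to the caller.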

Description

Method for re-identifying occluded pedestrians based on pose guidance and progressive feature restoration

Technical Field

The invention relates to the technical field of computer graphics and deep learning, and in particular to a method for re-identifying occluded pedestrians, based on pose guidance and progressive feature restoration, for complex occlusion scenes.

Background

Conventional pedestrian re-identification methods generally assume that the whole body of the pedestrian is visible in the image. In real surveillance scenes, however, the target pedestrian is inevitably occluded by objects in the scene (such as vehicles, trees or walls) or by other pedestrians, so that the pedestrian's appearance information is lost or contaminated. Occluded pedestrian re-identification has therefore become a research hotspot. Common strategies for handling occlusion include augmenting the training data with artificially occluded images and extracting discriminative features of the visible region through feature alignment. These methods are generally effective for "Object-to-Person" (OTP) occlusion, in which the occluder is a static object such as a vehicle or a wall. However, for the more complex "Person-to-Person" (PTP) occlusion, in which the target pedestrian is partially occluded by another pedestrian, the recognition performance of the prior art is severely limited. On the one hand, PTP occlusion introduces strong semantic interference between the foreground occluding pedestrian and the target pedestrian (i.e. the primary retrieval target). Unlike static occluders such as vehicles and trees, the foreground pedestrian (the occluder) and the target pedestrian (the occluded person) are visually highly similar in semantics: both contain similar human body structures such as heads and limbs.
In the absence of an effective distinguishing mechanism, existing feature extraction models are prone to attention drift: they erroneously focus on the foreground occluding pedestrian rather than on the retrieval target in the background, so that the extracted feature representation does not match the true target. On the other hand, existing methods generally generate visibility labels only from the manually set overlap region of the PTP occlusion image in the data augmentation process, in order to guide the model to attend to the features of the unoccluded part. This way of generating labels is mechanical and uniform, and it ignores the natural PTP occlusion already present in the original data set; the resulting position-level visibility labels are inconsistent with the actual situation and cannot provide an accurate supervision signal for model training. In addition, existing feature restoration strategies fill the occluded region with gallery features by a single global weighted average. This coarse restoration not only tends to introduce irrelevant feature noise but also ignores prior knowledge of human body structure (for example, in a PTP scene the lower body of a pedestrian is usually considerably more likely to be occluded than the upper body), which makes it difficult for the model to achieve high-precision feature reconstruction while preserving fine-grained local details.

Disclosure of Invention

The invention aims to provide a method for re-identifying occluded pedestrians based on pose guidance and progressive feature restoration. The method uses the human pose estimation result as prior knowledge, accurately locks onto the primary target through a many-to-one selection strategy, guides the network to focus on the effective region through Skeleton-Silhouette Images (SSI), and finely restores the occluded features at the inference stage through a progressive feature restoration mechanism, thereby improving the retrieval performance of the model in complex occlusion scenes and solving the problems described in the Background.

The method for re-identifying occluded pedestrians based on pose guidance and progressive feature restoration comprises the following steps:

Step S1, acquiring an original pedestrian re-identification data set, performing occlusion simulation augmentation on the data set, and constructing an augmented training sample data set. Preferably, the specific implementation comprises: Step S11, acquiring an original pedestrian re-identification data set and, for each pedestrian image in the data set, selecting another pedestrian as the occlusion source image; Step S12, setting a probability threshold to implement probability-controlled mixed occlusion data augmentation; Step S13, merging the augmented images with the original, unaugmented images into the final training sample set for output.

Step S2, extracting human body key points from the images in the training sample data set by util