CN-122023847-A - Cloth-changing pedestrian re-identification method based on dual-memory dynamic contrastive learning
Abstract
The invention discloses a cloth-changing pedestrian re-identification method based on dual-memory dynamic contrastive learning. The method first obtains a pedestrian video sequence with identity tags and extracts a three-dimensional spatio-temporal feature map through a backbone network; it then extracts spatial aggregation features and temporal difference features from the feature map, projects the two kinds of features into an identity subspace and a temporal subspace respectively, and generates identity-enhanced features and temporal-enhanced features by attention-weighted aggregation over a dual memory bank. Positive and negative sample sets are constructed with similarity-priority and diversity-complement strategies, a weighted contrastive loss jointly optimizes the dual memory banks, and the memory prototypes are updated with a moving-average strategy with diversity regularization. Finally, the two enhanced features are adaptively fused to produce a discriminative feature representation, and a pedestrian matching list is returned by similarity ranking. By designing a dual memory bank and representing each pedestrian with a dynamic combination of several memory prototypes, the model is made to focus on stable, clothing-irrelevant identity characteristics.
Inventors
- ZHAO XUJING
- ZHANG YUNZUO
- CHEN SHUAI
- ZHANG XINCHENG
Assignees
- Shijiazhuang Tiedao University (石家庄铁道大学)
Dates
- Publication Date: 2026-05-12
- Application Date: 2026-02-03
Claims (7)
- 1. A cloth-changing pedestrian re-identification method based on dual-memory dynamic contrastive learning, characterized by comprising the following steps: S1, acquiring a pedestrian video sequence with an identity tag, and extracting a three-dimensional spatio-temporal feature map of the pedestrian video sequence with a backbone network; S2, extracting spatial aggregation features and temporal difference features, respectively, from the three-dimensional spatio-temporal feature map; S3, projecting the spatial aggregation features and the temporal difference features into identity and temporal subspaces, respectively, computing their similarity with the memory prototypes in the respective memory banks, and generating identity-enhanced features and temporal-enhanced features by similarity-based attention-weighted aggregation; S4, determining positive sample sets with similarity-priority and diversity-complement strategies and constructing corresponding negative sample sets; S5, computing weighted positive-sample pull-up losses and margin-based negative-sample push-down losses for the identity-enhanced and temporal-enhanced features, respectively, and combining them by weighted summation to jointly optimize the discriminative performance of the identity and temporal dual memory banks; S6, applying a moving-average update strategy with diversity regularization to the memory prototypes selected as positive samples, while maintaining each prototype's update count and last-update-step record; and S7, adaptively fusing the identity-enhanced and temporal-enhanced features obtained after the pedestrian video sequence to be identified is processed by the network model, obtaining the final discriminative feature representation through subsequent network processing, computing the similarity between the discriminative feature representation and the gallery features, and generating a pedestrian matching list ranked by similarity.
- 2. The cloth-changing pedestrian re-identification method based on dual-memory dynamic contrastive learning according to claim 1, characterized in that S2 comprises: first averaging the three-dimensional spatio-temporal feature map extracted by the backbone network from the pedestrian video sequence along the time dimension to generate spatial attention weights; multiplying the attention weights element-wise with the original spatio-temporal feature map; then applying average pooling over the temporal and spatial dimensions of the weighted feature map to obtain the spatial aggregation features; and, in parallel, constructing multi-scale temporal-change features on the three-dimensional spatio-temporal feature map through adjacent-frame differences, window differences and global differences, and obtaining the temporal difference features through convolution and weighted fusion.
- 3. The cloth-changing pedestrian re-identification method based on dual-memory dynamic contrastive learning according to claim 1, characterized in that S3 comprises: constructing an identity memory prototype bank and a temporal memory prototype bank, wherein the identity memory prototype bank stores identity memory prototypes of different appearances and the temporal memory prototype bank stores temporal memory prototypes of different patterns; in each training iteration, projecting the spatial aggregation features into the identity subspace, computing the similarity between the spatial aggregation features and each identity memory prototype in the identity memory prototype bank, selecting the identity memory prototypes meeting a preset similarity condition, and generating the identity-enhanced features through attention-weighted aggregation; projecting the temporal difference features into the temporal subspace, computing the similarity between the temporal difference features and each temporal memory prototype in the temporal memory prototype bank, introducing a temporal decay factor to boost the weights of recent temporal memory prototypes, selecting the temporal memory prototypes meeting a preset similarity condition, and generating the temporal-enhanced features through attention-weighted aggregation; and introducing a feature orthogonality constraint that computes the similarity between the identity-enhanced features and the temporal-enhanced features and takes the mean absolute value of that similarity as an orthogonality loss, thereby constraining the identity-enhanced features to be independent of temporal information and realizing stable decoupling of the identity-enhanced features from the temporal information.
- 4. The cloth-changing pedestrian re-identification method based on dual-memory dynamic contrastive learning as set forth in claim 3, wherein the identity memory prototype bank and the temporal memory prototype bank adopt a dual-memory-bank architecture, the identity memory prototype bank storing identity memory prototypes in a parameter matrix and the temporal memory prototype bank storing temporal memory prototypes in a parameter matrix; the two memory banks are initialized with a layered initialization strategy: the parameter matrix of the identity memory prototype bank is randomly initialized from a normal distribution with mean 0 and standard deviation 0.01, the parameter matrix of the temporal memory prototype bank is randomly initialized from a normal distribution with mean 0 and standard deviation 0.02, and the access statistics, including the update count and last update step of each memory prototype, are initialized to zero; each memory prototype carries no fixed identity label but dynamically forms a semantic cluster center during training according to its similarity relations with labeled samples, and each pedestrian is represented by a combination of several memory prototypes, so that efficient modeling of pedestrian representations and implicit decoupling of appearance change are realized under limited capacity.
- 5. The cloth-changing pedestrian re-identification method based on dual-memory dynamic contrastive learning according to claim 1, characterized in that S4 comprises: dynamically determining the number of positive samples in stages based on the identity similarity matrix and the temporal similarity matrix, and constructing the positive sample set with a similarity-priority and diversity-complement strategy, which first screens the memory prototypes whose similarity exceeds a threshold and, when these are insufficient, supplements the memory prototypes with the fewest updates to ensure diversity; and constructing the negative sample set by selecting, from the non-positive samples, the memory prototypes with the highest similarity, targeting a fixed multiple of the number of positive samples.
- 6. The cloth-changing pedestrian re-identification method based on dual-memory dynamic contrastive learning according to claim 1, wherein S5 comprises: computing a contrastive loss for the identity-enhanced features and the temporal-enhanced features, respectively, each memory bank contributing a positive-sample pull-up loss and a negative-sample push-down loss; the positive-sample pull-up loss is weighted by a weight matrix to strengthen the constraint on high-value positive samples, the negative-sample push-down loss adopts a margin-threshold mechanism to guarantee separation between positive and negative samples, and the identity memory bank loss and the temporal memory bank loss are combined by weighted summation with weight coefficients to jointly optimize the discriminative performance of the dual memory banks.
- 7. The cloth-changing pedestrian re-identification method based on dual-memory dynamic contrastive learning as claimed in claim 1, wherein S6 comprises: applying a moving-average update strategy with diversity regularization to the memory prototypes selected as positive samples, absorbing the current query features while retaining historical information, suppressing excessive aggregation by penalizing excessive similarity among prototypes, and maintaining the update count and last-update-step record of each memory prototype to support subsequent sample screening and diversity complementation.
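As an illustration only, the feature extraction of claim 2 (step S2) can be sketched in NumPy. The patent learns the fusion via convolution; here the learned convolution and fusion weights are replaced by fixed scalars, and the feature-map shape, window size, and weight values are all assumptions, not taken from the patent:

```python
import numpy as np

def extract_features(fmap, w1=0.4, w2=0.3, w3=0.3, window=2):
    """Sketch of S2 on a feature map of shape (T, C, H, W).

    Returns a spatial aggregation feature and a temporal difference
    feature, both of shape (C,)."""
    # Spatial attention: average over time and channels to score each
    # location, squash with a sigmoid, then reweight the original map.
    attn = 1.0 / (1.0 + np.exp(-fmap.mean(axis=(0, 1))))   # (H, W)
    spatial = (fmap * attn).mean(axis=(0, 2, 3))           # (C,)

    # Multi-scale temporal changes: adjacent-frame, windowed, and global
    # differences, fused with fixed (illustrative) weights.
    adj = np.abs(np.diff(fmap, axis=0)).mean(axis=(0, 2, 3))
    win = np.abs(fmap[window:] - fmap[:-window]).mean(axis=(0, 2, 3))
    glb = np.abs(fmap - fmap.mean(axis=0)).mean(axis=(0, 2, 3))
    temporal = w1 * adj + w2 * win + w3 * glb              # (C,)
    return spatial, temporal
```

In a real implementation these operations would run as differentiable PyTorch modules inside the backbone; the sketch only shows the data flow.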
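The memory lookup of claim 3 (step S3) amounts to attention over a prototype bank plus an orthogonality penalty between the two branches. A minimal NumPy sketch, where the temperature `tau`, the top-k rule standing in for the "preset similarity condition", and the optional `decay` vector are all assumptions:

```python
import numpy as np

def enhance(query, bank, tau=0.1, top_k=4, decay=None):
    """Attention-weighted aggregation of the top-k most similar prototypes.
    `decay` (optional) multiplies similarities, e.g. a temporal decay
    factor boosting recently updated temporal prototypes."""
    q = query / (np.linalg.norm(query) + 1e-12)
    P = bank / (np.linalg.norm(bank, axis=1, keepdims=True) + 1e-12)
    sim = P @ q                          # cosine similarity per prototype
    if decay is not None:
        sim = sim * decay
    top = np.argsort(sim)[-top_k:]       # stand-in for the similarity condition
    w = np.exp(sim[top] / tau)
    w /= w.sum()                         # attention weights
    return w @ bank[top], sim

def orthogonality_loss(id_feats, tmp_feats):
    """Mean absolute cosine similarity between paired identity- and
    temporal-enhanced features; minimizing it decouples the two branches."""
    a = id_feats / np.linalg.norm(id_feats, axis=1, keepdims=True)
    b = tmp_feats / np.linalg.norm(tmp_feats, axis=1, keepdims=True)
    return np.abs((a * b).sum(axis=1)).mean()
```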
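The layered initialization of claim 4 is concrete enough to state directly: normal initialization with std 0.01 for the identity bank and 0.02 for the temporal bank, and zeroed access statistics. Bank sizes and feature dimension below are illustrative; the patent does not fix them:

```python
import numpy as np

def init_dual_memory(n_id=256, n_tmp=128, dim=512, seed=0):
    """Layered initialization of the dual memory banks (claim 4):
    identity prototypes from N(0, 0.01^2), temporal prototypes from
    N(0, 0.02^2), and per-prototype update counts and last update
    steps initialized to zero."""
    rng = np.random.default_rng(seed)
    return {
        "id_bank": rng.normal(0.0, 0.01, (n_id, dim)),
        "tmp_bank": rng.normal(0.0, 0.02, (n_tmp, dim)),
        "id_count": np.zeros(n_id, dtype=np.int64),
        "id_last_step": np.zeros(n_id, dtype=np.int64),
        "tmp_count": np.zeros(n_tmp, dtype=np.int64),
        "tmp_last_step": np.zeros(n_tmp, dtype=np.int64),
    }
```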
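The similarity-priority and diversity-complement selection of claim 5 (step S4) can be sketched for a single query against one bank; the threshold, minimum positive count, and negative multiplier are assumed values, and the staged dynamic sizing described in the claim is simplified to a fixed `min_pos`:

```python
import numpy as np

def select_pos_neg(sim, update_count, thresh=0.5, min_pos=4, neg_mult=2):
    """Similarity-priority: prototypes whose similarity exceeds `thresh`
    become positives. Diversity-complement: if fewer than `min_pos`,
    the least-updated prototypes are added. Negatives: the most similar
    non-positives, up to neg_mult times the number of positives."""
    pos = list(np.where(sim > thresh)[0])
    if len(pos) < min_pos:                        # diversity complement
        for i in np.argsort(update_count, kind="stable"):
            if i not in pos:
                pos.append(int(i))
            if len(pos) == min_pos:
                break
    pos_set = set(pos)
    ranked = np.argsort(-sim, kind="stable")      # hardest negatives first
    neg = [int(i) for i in ranked if i not in pos_set][: neg_mult * len(pos)]
    return sorted(pos), neg
```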
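The loss of claim 6 (step S5) combines a weighted pull-up term on positives with a margin push-down term on negatives, per bank, then a weighted sum across banks. The exact functional forms are not given in this text, so the linear pull-up, hinge push-down, and weight values below are assumptions:

```python
import numpy as np

def bank_loss(sim, pos_idx, neg_idx, pos_w=None, margin=0.3):
    """One memory bank's loss: weighted pull-up on positive-prototype
    similarities plus a margin-based push-down on negatives (forms assumed)."""
    pos_sim = sim[pos_idx]
    if pos_w is None:
        pos_w = np.ones(len(pos_idx)) / len(pos_idx)   # uniform weights
    pull = float((pos_w * (1.0 - pos_sim)).sum())      # reward high positive sim
    push = float(np.maximum(sim[neg_idx] - margin, 0.0).mean())
    return pull + push

def total_loss(id_loss, tmp_loss, lam_id=1.0, lam_tmp=0.5):
    """Weighted sum jointly optimizing both banks (coefficients assumed)."""
    return lam_id * id_loss + lam_tmp * tmp_loss
```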
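Finally, the prototype update of claim 7 (step S6) is a moving average with a diversity penalty and bookkeeping. In this sketch the momentum, similarity cap, and repulsion weight are assumptions, and the penalty is realized as a one-shot subtraction of the nearest prototype rather than a regularization term in the loss:

```python
import numpy as np

def update_positive(bank, idx, query, step, count, last_step,
                    momentum=0.9, sim_cap=0.9, div_weight=0.1):
    """Moving-average update of a prototype selected as a positive sample,
    with a diversity penalty pushing it away from its nearest neighbour."""
    p = momentum * bank[idx] + (1.0 - momentum) * query   # absorb query, keep history
    others = np.delete(np.arange(len(bank)), idx)
    P = bank[others] / np.linalg.norm(bank[others], axis=1, keepdims=True)
    sims = P @ (p / np.linalg.norm(p))
    j = int(others[np.argmax(sims)])
    if sims.max() > sim_cap:                              # suppress over-aggregation
        p = p - div_weight * bank[j]
    bank[idx] = p
    count[idx] += 1                                       # bookkeeping used by S4
    last_step[idx] = step
```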
Description
Cloth-changing pedestrian re-identification method based on dual-memory dynamic contrastive learning
Technical Field
The invention relates to a cloth-changing pedestrian re-identification method based on dual-memory dynamic contrastive learning and belongs to the technical field of computer vision.
Background
Pedestrian re-identification (ReID) is an important research direction in computer vision whose goal is to match the identity of the same pedestrian across cameras or over long time spans. Traditional ReID methods are mainly based on the appearance of single-frame images, such as clothing color and texture, extracting features through deep convolutional networks and performing metric learning. In real scenes, however, pedestrians often change clothes or carried items, so the visual characteristics of the same person change significantly and the performance of traditional appearance-based ReID models degrades severely. To cope with appearance change, researchers have proposed two typical directions of improvement. Temporal feature modeling methods exploit the continuity between frames of a video sequence and model the temporal dynamics of pedestrians through 3D convolutions, RNNs, Transformers and similar models to extract identity-related motion representations. However, existing methods still have significant limitations in cloth-changing scenarios. They lack a stable memory mechanism for identity representation: model training relies only on in-batch samples for optimization, and features of the same pedestrian at different times and under different appearances are difficult to integrate continuously at a global scope, so a stable identity feature representation cannot be formed.
Meanwhile, existing methods make insufficient use of temporal information and do not fully mine dynamic changes at different time scales. In addition, the interference of appearance change with identity learning is not well resolved: clothing features and identity features are complexly coupled in cloth-changing scenes, explicit appearance-separation methods introduce additional supervision and noise, and undifferentiated processing is easily dominated by non-identity factors and causes recognition confusion, so realizing implicit decoupling of identity and appearance has become an urgent difficulty to overcome.
Disclosure of Invention
In order to solve these problems, the invention provides a cloth-changing pedestrian re-identification method based on dual-memory dynamic contrastive learning, comprising the following steps: S1, acquiring a pedestrian video sequence with an identity tag, and extracting a three-dimensional spatio-temporal feature map of the pedestrian video sequence with a backbone network; S2, extracting spatial aggregation features and temporal difference features, respectively, from the three-dimensional spatio-temporal feature map; S3, projecting the spatial aggregation features and the temporal difference features into identity and temporal subspaces, respectively, computing their similarity with the memory prototypes in the respective memory banks, and generating identity-enhanced features and temporal-enhanced features by similarity-based attention-weighted aggregation; S4, determining positive sample sets with similarity-priority and diversity-complement strategies and constructing corresponding negative sample sets; S5, computing weighted positive-sample pull-up losses and margin-based negative-sample push-down losses for the identity-enhanced and temporal-enhanced features, respectively, and combining them by weighted summation to jointly optimize the discriminative performance of the identity and temporal dual memory banks; S6, applying a moving-average update strategy with diversity regularization to the memory prototypes selected as positive samples, while maintaining each prototype's update count and last-update-step record; and S7, adaptively fusing the identity-enhanced and temporal-enhanced features obtained after the pedestrian video sequence to be identified is processed by the network model, obtaining the final discriminative feature representation through subsequent network processing, computing the similarity between the discriminative feature representation and the gallery features, and generating a pedestrian matching list ranked by similarity. Further, S2 comprises first averaging the three-dimensional spatio-temporal feature map extracted by the backbone network from the pedestrian video sequence along the time dimension to generate spatial atten