CN-122024310-A - Unsupervised visible light-infrared pedestrian re-identification method based on camera deviation evaluation and alternate graph matching


Abstract

The invention discloses an unsupervised visible light-infrared pedestrian re-identification method based on camera deviation evaluation and alternate graph matching. The method comprises: acquiring pedestrian images from video streams captured by a visible light camera and an infrared camera, and constructing a cross-modal pedestrian image set; extracting features of the pedestrian images and generating initial pseudo labels; counting the camera sources of each cluster through a camera distribution deviation evaluation module and correcting the pseudo labels; constructing a cross-modal similarity matrix weighted by camera deviation to achieve cross-modal cluster-level feature alignment; and performing cross-camera retrieval based on the aligned cross-modal features to achieve unsupervised visible light-infrared pedestrian re-identification. The invention can significantly improve the reliability of the pseudo labels and the accuracy of cross-modal matching.

Inventors

Niu Xiaoshuai
WANG JIN
FU HONGYANG

Assignees

Nantong University (南通大学)

Dates

Publication Date: 20260512
Application Date: 20260105

Claims (9)

1. An unsupervised visible light-infrared pedestrian re-identification method based on camera deviation evaluation and alternate graph matching, characterized by comprising the following steps: acquiring pedestrian images from video streams captured by a visible light camera and an infrared camera, and constructing a cross-modal pedestrian image set; extracting features of the pedestrian images, calculating the similarity among samples, and generating initial pseudo labels by a clustering algorithm; counting the camera sources of each cluster through a camera distribution deviation evaluation module, calculating a camera deviation rate according to the distribution proportion of images among different cameras, and identifying low-confidence clusters whose deviation rate exceeds a set threshold; splitting the low-confidence clusters along the camera dimension to obtain a plurality of sub-clusters, and structurally matching the sub-clusters with the high-confidence clusters to correct the pseudo labels; constructing a cross-modal similarity matrix weighted by camera deviation, constructing graph structures based on the visible light features and the infrared features respectively, and establishing a consistent identity correspondence between the graph structures through a bidirectional alternate graph matching module to realize cross-modal cluster-level feature alignment; and performing cross-camera retrieval based on the aligned cross-modal features, so as to realize unsupervised visible light-infrared pedestrian re-identification.
2. The method of claim 1, wherein the camera deviation rate is calculated from the number of samples from different cameras in each cluster, the deviation rate being determined by the ratio of the sum of the two largest per-camera sample counts to the total number of samples in the cluster, the expression of the deviation rate being: $R = \frac{n_1 + n_2}{N}$; wherein $n_1$ and $n_2$ respectively denote the sample counts of the two cameras with the most samples in the cluster, and $N$ is the sum of the sample counts of all cameras in the cluster.
3. The method of claim 2, wherein the camera deviation rate is obtained by comparing the number of samples from different camera sources in the cluster, and the cluster is determined to be a low-confidence cluster when the camera deviation rate is greater than a preset threshold of 0.5.
4. The method of claim 3, wherein when camera dimension splitting is performed on the low confidence clusters, images from different cameras are divided into independent sub-clusters, and a similarity cost matrix of the sub-clusters and the high confidence clusters is calculated according to a feature space structure.
5. The method of claim 4, wherein the pseudo labels are corrected by bidirectional alternate graph matching over the similarity cost matrix between the sub-clusters and the high-confidence clusters.
6. The method of claim 5, wherein the bidirectional alternate graph matching module alternates between matching the visible light modality to the infrared modality and matching the infrared modality to the visible light modality, and iteratively updates the correspondence between the modalities in each alternation until convergence.
7. The method of claim 6, wherein a camera deviation penalty factor λ_bias = 0.7 and the camera deviation rate are introduced when constructing the cross-modal similarity cost matrix; sample matches between cameras with a higher camera deviation rate are given a larger distance penalty weight, thereby suppressing the interference of such samples with the cross-modal alignment result and improving the stability and reliability of matching.
8. A computer-readable storage medium on which a computer program is stored, characterized in that the program, when executed by a processor, implements the steps of the unsupervised visible light-infrared pedestrian re-identification method based on camera deviation evaluation and alternate graph matching as claimed in any one of claims 1 to 7.
9. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the program, implements the steps of the unsupervised visible light-infrared pedestrian re-identification method based on camera deviation evaluation and alternate graph matching as claimed in any one of claims 1 to 7.
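
The camera deviation evaluation recited in claims 2 to 4 can be illustrated with a minimal Python sketch. It is not the applicant's implementation: the function names, the `{label: [(sample_id, camera_id), ...]}` input layout, and in particular the multiplicative form used in `penalized_cost` are assumptions made for illustration only. The claims fix the ratio of the two largest per-camera counts to the cluster size, the 0.5 threshold, the per-camera split, and λ_bias = 0.7, but they do not state how the penalty enters the cost matrix.

```python
from collections import Counter, defaultdict

BIAS_THRESHOLD = 0.5  # preset threshold stated in claim 3
LAMBDA_BIAS = 0.7     # penalty factor stated in claim 7


def camera_deviation_rate(cam_ids):
    """Sum of the two largest per-camera sample counts divided by cluster size."""
    counts = sorted(Counter(cam_ids).values(), reverse=True)
    return sum(counts[:2]) / len(cam_ids)


def split_by_camera(sample_ids, cam_ids):
    """Divide one cluster into independent per-camera sub-clusters (claim 4)."""
    sub_clusters = defaultdict(list)
    for sample, cam in zip(sample_ids, cam_ids):
        sub_clusters[cam].append(sample)
    return dict(sub_clusters)


def evaluate_clusters(clusters):
    """Separate clusters into high-confidence and (split) low-confidence ones.

    `clusters` maps a pseudo label to a list of (sample_id, camera_id) pairs;
    this layout is a hypothetical choice for the sketch.
    """
    high, low, rates = {}, {}, {}
    for label, members in clusters.items():
        samples = [s for s, _ in members]
        cams = [c for _, c in members]
        rates[label] = camera_deviation_rate(cams)
        if rates[label] > BIAS_THRESHOLD:
            low[label] = split_by_camera(samples, cams)
        else:
            high[label] = samples
    return high, low, rates


def penalized_cost(dist, rate_rows, rate_cols, lam=LAMBDA_BIAS):
    """ASSUMED weighting: scale each distance by 1 + lam * mean deviation rate.

    The claims only say that higher deviation rates receive a larger distance
    penalty; this particular formula is an illustrative guess.
    """
    return [
        [d * (1.0 + lam * 0.5 * (rate_rows[i] + rate_cols[j]))
         for j, d in enumerate(row)]
        for i, row in enumerate(dist)
    ]
```

For example, a cluster whose four images come from only two cameras has a deviation rate of 1.0 and is split into two sub-clusters, whereas a cluster with one image from each of six cameras has a rate of 2/6 and is kept as high-confidence.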

Description

Unsupervised visible light-infrared pedestrian re-identification method based on camera deviation evaluation and alternate graph matching

Technical Field

The invention belongs to the technical field of artificial intelligence and computer vision, and particularly relates to an unsupervised visible light-infrared pedestrian re-identification method based on camera deviation evaluation and alternate graph matching.

Background

Pedestrian re-identification aims to recognize the same target across different camera views, and is a core task in multi-camera association analysis systems, trajectory tracking systems and the like. With the development of deep learning, pedestrian re-identification based on visible light images has made significant progress on large labeled datasets. However, visible light images depend heavily on illumination conditions and texture. In complex environments such as night scenes, strong backlight, occlusion or low resolution, the appearance information is often insufficient to support stable recognition, which limits the applicability of traditional RGB pedestrian re-identification methods. In addition, imaging differences among multiple cameras further increase the inconsistency of cross-view features, limiting the generalization capability of the model.

To alleviate the performance bottleneck of visible light in low-light environments, researchers have introduced infrared imaging into the cross-view recognition task, forming visible-infrared pedestrian re-identification (VI-ReID). Infrared images still provide stable information such as the human body outline and body structure at night and in low-light scenes, and thus effectively complement the visible light modality.
However, there is a significant modality gap between visible light and infrared images, including large differences in texture and the loss of color information, so cross-modal features are difficult to align directly and recognition becomes harder. Moreover, establishing cross-modal identity associations through manual labeling is costly, so unsupervised visible light-infrared pedestrian re-identification has become a main direction of current research. This task trains the model using only unlabeled cross-modal images, through mechanisms such as clustering, pseudo-label generation and contrastive learning. Existing unsupervised VI-ReID methods generally rely on a clustering algorithm such as DBSCAN to generate pseudo labels. Because different cameras differ markedly in illumination, exposure, viewing angle and background, samples of the same identity look different under different cameras, and the clustering results are easily dominated by "camera deviation", producing a large number of erroneous pseudo labels. These erroneous labels accumulate during training and form low-quality clusters, which severely reduces the accuracy of subsequent cross-modal matching and is a main bottleneck limiting the performance of existing unsupervised ReID. Recent studies have attempted to improve the pseudo-label quality and cross-modal alignment capability of unsupervised VI-ReID from different angles.
For example, "Augmented dual-contrastive aggregation learning for unsupervised visible-infrared person re-identification" enhances cross-modal shared features through a dual-level contrastive learning and feature aggregation mechanism, which alleviates to a certain extent the shift caused by modality differences; "Robust Pseudo-label Learning with Neighbor Relation for Unsupervised Visible-Infrared Person Re-Identification" introduces neighborhood relation modeling to improve the local consistency among samples in the clustering result and reduce the influence of pseudo-label noise; and "Unsupervised Visible-Infrared Person Re-Identification via Progressive Graph Matching and Alternate Learning" uses a progressive graph matching strategy to infer the cross-modal correspondence among clusters and stabilizes the cross-modal supervision signal through an alternate contrastive learning strategy. These methods improve unsupervised training from the perspectives of feature aggregation, local structure modeling and cross-modal matching, and have advanced the development of VI-ReID. However, the above methods still have significant drawbacks. Firstly, the unsupervised clustering process readily produces clusters of uneven quality: in some clusters most samples come from the same camera and cross-camera diversity is lacking. Such clusters often correspond to erroneous labels, and existing methods lack an effective mechanism for identifying and correcting these low-confidence clusters. Secondly, cross-modal matching generally relies o