CN-119832589-B - Pedestrian re-recognition feature extraction model construction and pedestrian re-recognition method and device

Abstract

The invention relates to the field of electronic technology and discloses a method for constructing a pedestrian re-identification feature extraction model, together with a pedestrian re-identification method and device. The construction method performs intra-camera sample clustering to explore the latent relations between pedestrian identities more fully, and uses intra-camera sample clustering together with clustering over the samples of all cameras to build and associate pedestrian identity information across multiple cameras. Samples that remain camera-invariant are then selected for camera-invariance learning of the model. This reduces the model's sensitivity to changes in camera field of view, so that the pedestrian re-identification model focuses on changes in pedestrian identity rather than changes of camera. It thereby addresses the problem in the related art that excessive feature differences between samples from different cameras, caused by factors such as shooting angle, background, and camera parameters, mislead the model when it identifies pedestrians.
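
As a rough illustration of the two clustering passes the abstract describes, the sketch below groups samples by camera, clusters within each camera, and then clusters all samples together. The clustering rule (single-linkage on cosine distance with a cutoff `eps`), the function names, and the data layout are assumptions made for illustration; the text does not name a clustering algorithm, and feature vectors are assumed to be non-zero.

```python
import math
from collections import defaultdict


def cosine_distance(a, b):
    """1 - cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def threshold_cluster(ids, feats, eps):
    """Assumed stand-in clustering: samples closer than eps end up linked."""
    parent = {i: i for i in ids}

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for n, i in enumerate(ids):
        for j in ids[n + 1:]:
            if cosine_distance(feats[i], feats[j]) < eps:
                parent[find(i)] = find(j)

    clusters = defaultdict(set)
    for i in ids:
        clusters[find(i)].add(i)
    return list(clusters.values())


def cluster_per_camera_and_globally(feats, cameras, eps=0.1):
    """Return (intra-camera clusters per camera, clusters over all samples)."""
    by_camera = defaultdict(list)
    for sample_id, cam in cameras.items():
        by_camera[cam].append(sample_id)
    intra = {cam: threshold_cluster(ids, feats, eps)
             for cam, ids in by_camera.items()}
    overall = threshold_cluster(list(feats), feats, eps)
    return intra, overall
```

Each cluster is returned as a set of sample identifiers, so intra-camera and overall clusters can later be compared with ordinary set operations.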

Inventors

SUN WENBO
HU SHAOLONG
ZHOU YUAN
CHEN BAOJIAN

Assignees

天翼云科技有限公司

Dates

Publication Date: 20260508
Application Date: 20241204

Claims (10)

1. A method for constructing a pedestrian re-identification feature extraction model, characterized by comprising the following steps: acquiring a first feature vector and a second feature vector for each of a plurality of image samples of each of a plurality of different cameras, wherein the first feature vector is obtained by performing feature extraction on the corresponding image sample with a pre-constructed first feature extraction model, and the second feature vector is obtained by performing feature extraction on the corresponding image sample with a pre-constructed second feature extraction model; clustering the plurality of image samples of each camera based on the first feature vectors of that camera's image samples to obtain a plurality of first clusters for that camera; performing global clustering on the image samples of the different cameras based on their first feature vectors to obtain a plurality of second clusters, and performing global clustering on the image samples of the different cameras based on their second feature vectors to obtain a plurality of third clusters; calculating a camera variation difference value between each first cluster of each camera and each second cluster and each third cluster; constructing a camera invariance data set based on at least one target first cluster among the first clusters of the different cameras, wherein a target first cluster is a first cluster whose camera variation difference value is greater than a preset threshold; and training the first feature extraction model and the second feature extraction model using each item of image data in the camera invariance data set and its corresponding feature vectors until a preset condition is met, to obtain the pedestrian re-identification feature extraction model; wherein the camera variation difference values between each first cluster of each camera and each second cluster and each third cluster are calculated as follows: based on three pseudo-labelled data sets from different feature spaces [symbols not preserved in the source text], diversity information provided by the clustering results of the different feature spaces is used to calculate the camera variation difference between clusters [formula not preserved in the source text], wherein the symbols of the formula denote, respectively: the set of samples of cluster J obtained by intra-camera clustering for camera P_i; the set of samples of cluster K obtained by overall sample clustering with encoder F; the set of samples of cluster L obtained by overall sample clustering with encoder G; the number of samples in the set produced by the intersection operation; the number of samples in the set produced by the union operation; and the camera variation difference value between the sample set of cluster J of camera P_i and the two overall-clustering sample sets.
2. The method of claim 1, wherein the step of acquiring the first feature vector and the second feature vector for each of the plurality of image samples of each of the plurality of different cameras comprises: acquiring the plurality of image samples corresponding to each of the plurality of different cameras, together with the pre-constructed first feature extraction model and second feature extraction model; inputting the plurality of image samples of each camera into the first feature extraction model, so that the first feature extraction model outputs a first feature vector for each image sample of that camera; and inputting the plurality of image samples of each camera into the second feature extraction model, so that the second feature extraction model outputs a second feature vector for each image sample of that camera.
3. The method of claim 1, wherein the first feature extraction model and the second feature extraction model are constructed by: acquiring a source domain data set; dividing the source domain data set into a first training set and a second training set; training a first preset model with the first training set until its prediction accuracy meets a first preset requirement, to obtain the first feature extraction model; and training a second preset model with the second training set until its prediction accuracy meets a second preset requirement, to obtain the second feature extraction model.
4. The method of claim 3, wherein the step of training the first feature extraction model and the second feature extraction model, respectively, using each item of image data in the camera invariance data set and its corresponding feature vectors until a preset condition is met, to obtain the pedestrian re-identification feature extraction model, comprises: dividing the camera invariance data set into a third training set and a fourth training set based on the camera identifiers of its image samples; training the first feature extraction model with the third training set to obtain a third feature model; training the second feature extraction model with the third training set to obtain a fourth feature model; training the third feature model with the fourth training set until the model loss converges, to obtain a fifth feature model; training the fourth feature model with the fourth training set until the model loss converges, to obtain a sixth feature model; and taking the fifth feature model and/or the sixth feature model as the pedestrian re-identification feature extraction model.
5. A pedestrian re-identification method, comprising: acquiring image data to be identified and a pedestrian re-identification feature extraction model, wherein the pedestrian re-identification feature extraction model is constructed by the method for constructing a pedestrian re-identification feature extraction model according to any one of claims 1 to 4; inputting the image data to be identified into the pedestrian re-identification feature extraction model, so that the model outputs a feature vector of the image data to be identified; comparing the feature vector of the image data to be identified with feature vectors corresponding to different identities in a pre-constructed database to obtain a comparison result; and determining a pedestrian re-identification result for the image data to be identified based on the comparison result.
6. An apparatus for constructing a pedestrian re-identification feature extraction model, characterized in that the apparatus comprises: a first acquisition module, configured to acquire a first feature vector and a second feature vector for each of a plurality of image samples of each of a plurality of different cameras, wherein the first feature vector is obtained by performing feature extraction on the corresponding image sample with a pre-constructed first feature extraction model, and the second feature vector is obtained by performing feature extraction on the corresponding image sample with a pre-constructed second feature extraction model; a first clustering module, configured to cluster the plurality of image samples of each camera based on the first feature vectors of that camera's image samples to obtain a plurality of first clusters for that camera; a second clustering module, configured to perform global clustering on the image samples of the different cameras based on their first feature vectors to obtain a plurality of second clusters, and to perform global clustering on the image samples of the different cameras based on their second feature vectors to obtain a plurality of third clusters; a calculation module, configured to calculate a camera variation difference value between each first cluster of each camera and each second cluster and each third cluster; a construction module, configured to construct a camera invariance data set based on at least one target first cluster among the first clusters of the different cameras, wherein a target first cluster is a first cluster whose camera variation difference value is greater than a preset threshold; and a training module, configured to train the first feature extraction model and the second feature extraction model using each item of image data in the camera invariance data set and its corresponding feature vectors until a preset condition is met, to obtain the pedestrian re-identification feature extraction model; wherein the camera variation difference values between each first cluster of each camera and each second cluster and each third cluster are calculated as follows: based on three pseudo-labelled data sets from different feature spaces [symbols not preserved in the source text], diversity information provided by the clustering results of the different feature spaces is used to calculate the camera variation difference between clusters [formula not preserved in the source text], wherein the symbols of the formula denote, respectively: the set of samples of cluster J obtained by intra-camera clustering for camera P_i; the set of samples of cluster K obtained by overall sample clustering with encoder F; the set of samples of cluster L obtained by overall sample clustering with encoder G; the number of samples in the set produced by the intersection operation; the number of samples in the set produced by the union operation; and the camera variation difference value between the sample set of cluster J of camera P_i and the two overall-clustering sample sets.
7. A pedestrian re-identification device, comprising: a second acquisition module, configured to acquire image data to be identified and a pedestrian re-identification feature extraction model, wherein the pedestrian re-identification feature extraction model is constructed by the method for constructing a pedestrian re-identification feature extraction model according to any one of claims 1 to 4; a first determining module, configured to input the image data to be identified into the pedestrian re-identification feature extraction model, so that the model outputs a feature vector of the image data to be identified; a comparison module, configured to compare the feature vector of the image data to be identified with feature vectors corresponding to different identities in a pre-constructed database to obtain a comparison result; and a second determining module, configured to determine a pedestrian re-identification result for the image data to be identified based on the comparison result.
8. A computer device, comprising: a memory and a processor communicatively connected to each other, the memory storing computer instructions, and the processor executing the computer instructions to perform the method for constructing a pedestrian re-identification feature extraction model of any one of claims 1 to 4 or the pedestrian re-identification method of claim 5.
9. A computer-readable storage medium having stored thereon computer instructions for causing a computer to execute the method for constructing a pedestrian re-identification feature extraction model according to any one of claims 1 to 4 or the pedestrian re-identification method according to claim 5.
10. A computer program product comprising computer instructions for causing a computer to perform the method for constructing a pedestrian re-identification feature extraction model of any one of claims 1 to 4 or the pedestrian re-identification method of claim 5.
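
The camera variation difference in claims 1 and 6 is defined through intersection and union counts over three sample sets, but the formula itself does not survive in the claim text. The sketch below therefore assumes a three-way intersection-over-union as a stand-in, and assumes that a first cluster qualifies as a target when its best score over all (second cluster, third cluster) pairs exceeds the threshold. The function names and the keying of intra-camera clusters by `(camera, index)` are hypothetical.

```python
def camera_variation_difference(intra_cluster, cluster_f, cluster_g):
    """Assumed form: |A & B & C| / |A | B | C| over three sets of sample ids.

    This is an illustrative stand-in, not the formula from the claims.
    """
    inter = intra_cluster & cluster_f & cluster_g
    union = intra_cluster | cluster_f | cluster_g
    return len(inter) / len(union) if union else 0.0


def select_target_clusters(intra_clusters, clusters_f, clusters_g, threshold):
    """Keep each intra-camera cluster whose best score exceeds the threshold.

    intra_clusters maps (camera, cluster_index) -> set of sample ids;
    clusters_f / clusters_g are lists of sets from the two overall clusterings.
    """
    selected = []
    for key, samples in intra_clusters.items():
        best = max(
            (camera_variation_difference(samples, f, g)
             for f in clusters_f for g in clusters_g),
            default=0.0,
        )
        if best > threshold:
            selected.append(key)
    return selected
```

Under these assumptions, the image samples of the selected clusters would then form the camera invariance data set used for further training.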

Description

Pedestrian re-identification feature extraction model construction and pedestrian re-identification method and device

Technical Field

The invention relates to the field of electronic technology, and in particular to a method for constructing a pedestrian re-identification feature extraction model and a pedestrian re-identification method and device.

Background

As an important application of image retrieval, the pedestrian re-identification task aims to retrieve images of a given target pedestrian from images captured by multiple non-overlapping cameras. As urban surveillance systems become increasingly comprehensive, pedestrian re-identification can quickly retrieve and locate an individual pedestrian among a large number of surveillance images and accurately determine that pedestrian's identity. It provides strong technical support for fields such as public security and intelligent transportation.

At present, cross-domain pedestrian re-identification is implemented with clustering algorithms. For a model pre-trained on a source domain, a clustering algorithm assigns pseudo-labels to target-domain samples to serve as 'identity information', which then guides fine-tuning and updating of the model on the target domain, so that the model adapts to the style of the target-domain samples and a pedestrian feature extraction model is obtained. Clustering-based methods offer good flexibility and generalization. Although research on cross-domain pedestrian re-identification has improved how well models adapt to different data domains, inter-domain differences are not the only obstacle: camera migration (camera shift), caused by the differing distributions of individual cameras within the same data domain, also limits model performance.
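
The retrieval task described in the Background can be illustrated with a minimal ranking sketch. It assumes feature vectors have already been produced by some feature extraction model and ranks gallery identities by cosine similarity to the query. The similarity measure, the function names, and the dictionary layout of the gallery are assumptions for illustration, not details given in the text.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two non-zero feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def rank_gallery(query_feat, gallery):
    """Return gallery identities ordered from most to least similar.

    gallery maps an identity label to one stored feature vector.
    """
    scored = [(cosine_similarity(query_feat, feat), identity)
              for identity, feat in gallery.items()]
    scored.sort(reverse=True)
    return [identity for _, identity in scored]
```

The first entry of the returned list is the closest match. Camera migration matters here because features of the same pedestrian taken by different cameras may score lower than features of different pedestrians taken by the same camera.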
Because of camera migration, the model at inference time is more prone to assigning different pedestrians seen by the same camera to one identity than to matching the same pedestrian across different cameras. In the cross-domain pedestrian re-identification task, the excessive feature differences between samples from different cameras, caused by differing camera parameters, therefore lower the accuracy of the resulting pedestrian feature extraction model.

Disclosure of Invention

In view of the above, the invention provides a method for constructing a pedestrian re-identification feature extraction model, and a pedestrian re-identification method and device, to solve the problem that the accuracy of the resulting pedestrian feature extraction model is low because differing camera parameters cause excessive feature differences between samples from different cameras in the cross-domain pedestrian re-identification task.

The invention provides a method for constructing a pedestrian re-identification feature extraction model, comprising: acquiring a first feature vector and a second feature vector for each of a plurality of image samples of each of a plurality of different cameras, wherein the first feature vector is obtained by performing feature extraction on the corresponding image sample with a pre-constructed first feature extraction model, and the second feature vector is obtained by performing feature extraction on the corresponding image sample with a pre-constructed second feature extraction model; clustering the plurality of image samples of each camera based on the first feature vectors of that camera's image samples to obtain a plurality of first clusters for that camera; performing global clustering on the image samples of the different cameras based on their first feature vectors to obtain a plurality of second clusters; performing global clustering on the image samples of the different cameras based on their second feature vectors to obtain a plurality of third clusters; calculating camera variation difference values between each first cluster of each camera and each second cluster and each third cluster; constructing a camera invariance data set based on at least one target first cluster among the first clusters of the different cameras, wherein a target first cluster is a first cluster whose camera variation difference value is greater than a preset threshold; and training the first feature extraction model and the second feature extraction model, respectively, using the image data in the camera invariance data set until a preset condition is met, to obtain a pedestrian recognition feature extracti