CN-115188019-B - Pedestrian characteristic recognition method based on spatial relationship model of multi-region attention association

Abstract

The invention discloses a pedestrian feature recognition method based on a spatial relationship model of multi-region attention association, belonging to the technical field of feature extraction. The method comprises specific steps beginning with (1) extracting global features of pedestrian images. The invention can extract deeper spatial-dimension features of the human body, largely suppress the interference of background information with the human body, and thereby help pedestrian re-identification achieve better results.

Inventors

XIONG MINGFU
Xiong Jiefan
HE RUHAN

Assignees

Wuhan Textile University (武汉纺织大学)

Dates

Publication Date: 20260512
Application Date: 20220428

Claims (3)

1. A pedestrian feature recognition method based on a spatial relationship model of multi-region attention association, characterized by comprising the following specific steps:
(1) extracting global features of pedestrian images: the collected groups of pedestrian images are imported into a ResNet network, and global feature extraction is performed on each group of pedestrian images by the ResNet network;
(2) acquiring local features from the global features by a multi-channel equipartition method, constructing a spatial attention model, and enhancing each extracted group of local features by the spatial attention model;
(3) constructing a spatial adaptive graph convolution model, importing each group of local features into the spatial adaptive graph convolution model, then performing feature fusion on each group of local features, and performing spatial association on the fused local features;
(4) collecting the finally obtained local features and global features, equal in number to the original inputs, and performing metric calculation among different pedestrians based on each obtained group of features;
wherein the specific steps of the local feature extraction in step (1) are as follows:
Step one: performing coarse feature extraction on a pedestrian image by the ResNet network to obtain a W × H × C three-dimensional feature vector, where W and H respectively denote the width and the height of each local feature, and C denotes the number of channels of each local feature;
Step two: horizontally dividing the obtained feature map into a plurality of local feature regions according to different equal-division standards, and assigning the local feature regions obtained under the same standard to the same group;
the specific steps of the feature enhancement in step (2) are as follows:
Step one: collecting each group of local features after division, and importing each collected group of local features into the spatial attention model;
Step two: the spatial attention model learns an attention mask by constructing an adjacency matrix, uses an attention mechanism to extract the pedestrian region of interest, and trains and enhances each group of local features according to the extracted information;
the specific steps of the local feature fusion in step (3) are as follows:
S1.1, the spatial adaptive graph convolution model receives the attention-weighted local features and constructs an input feature set $V_l^{in} \in \mathbb{R}^{B \times C \times W \times h}$, where B denotes the number of local features in each group, and h denotes the height of the spatial level of the local features;
S1.2, converting each group of input feature sets through a global max pooling operation to obtain the most salient features in the original local information, and then inputting the converted input feature set into a submodule OVSR B times;
S1.3, enabling each local feature to initially learn information from the other local features: dividing the converted input feature set into a reference feature and the other neighbor features, performing a global average pooling operation on the other neighbor features, and then performing convolution operations on the upper and lower branches and concatenating the results;
S1.4, fusing the reference local feature with the concatenated association features, following the idea of a residual network, to obtain local features enhanced by shallow relations;
the specific steps of the spatial association in step (3) are as follows:
S2.1, performing dimension-reduction compression on each group of local features enhanced by shallow relations, processing the global pedestrian image through the ResNet network and the spatial attention model to obtain global features, and importing the global features into the spatial adaptive graph convolution model;
S2.2, performing global max pooling and dimension-reduction compression on the global features, taking each group of local feature information and the global feature information as nodes in a feature graph relation, and performing expansion processing on each group of feature information to obtain several groups of adjacency relation matrices;
S2.3, computing the difference between each group of adjacency relation matrices and the global information feature map, removing irrelevant interference according to the difference, updating the corresponding adjacency matrices, reducing the dimension of the adjacency matrices by taking absolute values, regularization, and full connection to obtain adaptive adjacency matrices, and introducing a predefined relation matrix to correct the adaptive adjacency matrices, the predefined relation matrix being given by formula (2), wherein the corresponding elements of the relation matrix are the relation information among different features;
S2.4, updating the weights of the relation matrix through back propagation, multiplying the adaptive adjacency matrix by the predefined adjacency matrix to generate a final weight matrix, at the same time obtaining a corresponding two-dimensional feature matrix by fusing the original local information with the weighted feature information, and performing dimension-raising processing on the two-dimensional feature matrix according to formulas (3) and (4), wherein the three mapping terms in those formulas denote three fully connected layers, and the two output terms denote, respectively, the final local features and the final global features output by the adaptive graph convolution module.
2. The pedestrian feature recognition method based on a spatial relationship model of multi-region attention association according to claim 1, wherein the attention mechanism enhances the local features in the second step according to the following formula: $W_l' = W_l \odot M$ (1), where $W_l$ is a local feature, $M$ is the mask matrix that assigns the corresponding attention weights, and $W_l'$ is the feature after attention enhancement.
3. The pedestrian feature recognition method based on a spatial relationship model of multi-region attention association according to claim 1, wherein the metric calculation in step (4) uses formula (5), in which the two loss terms denote the cross-entropy loss and the triplet loss of the i-th feature, respectively.

Description

Pedestrian feature recognition method based on a spatial relationship model of multi-region attention association

Technical Field

The invention relates to the technical field of feature extraction, and in particular to a pedestrian feature recognition method based on a spatial relationship model of multi-region attention association.

Background

In recent years, feature extraction methods that combine local and global features have become a popular trend and have received great attention in the field of pedestrian re-identification, with good results. However, the local features in these methods are almost always divided according to a single standard. Single-standard division deprives the local features of diversity across partition groups, so that different pedestrians with similar attributes are confused. Moreover, existing research shows that constructing spatial relationships among local features enhances the expressiveness of pedestrian features. A pedestrian feature recognition method based on a spatial relationship model of multi-region attention association is therefore proposed.

Disclosure of Invention

The invention aims to overcome the shortcomings of the prior art by providing a pedestrian feature recognition method based on a spatial relationship model of multi-region attention association.
To achieve the above purpose, the invention adopts the following technical scheme. The pedestrian feature recognition method based on a spatial relationship model of multi-region attention association comprises the following specific steps:
(1) extracting global features of pedestrian images: the collected groups of pedestrian images are imported into a ResNet network, and global feature extraction is performed on each group of pedestrian images by the ResNet network;
(2) acquiring local features from the global features by a multi-channel equipartition method, constructing a spatial attention model, and enhancing each extracted group of local features by the spatial attention model;
(3) constructing a spatial adaptive graph convolution model, importing each group of local features into the spatial adaptive graph convolution model, then performing feature fusion on each group of local features, and performing spatial association on the fused local features;
(4) performing metric calculation according to each group of feature information: collecting the finally obtained local features and global features, equal in number to the original inputs, and performing metric calculation among different pedestrians based on each obtained group of features.
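As an illustration of the metric calculation in step (4): claim 3 describes it in terms of a cross-entropy loss and a triplet loss for each feature, but formula (5) itself is not reproduced in this text. The sketch below is therefore hypothetical. It assumes NumPy, Euclidean distances, a margin of 0.3, and an unweighted sum over features; those choices and all function names are illustrative assumptions, not taken from the patent.

```python
import numpy as np

def cross_entropy(logits, label):
    """Cross-entropy of one sample from raw class scores (numerically stable)."""
    shifted = logits - np.max(logits)
    log_probs = shifted - np.log(np.sum(np.exp(shifted)))
    return float(-log_probs[label])

def triplet_loss(anchor, positive, negative, margin=0.3):
    """Hinge-style triplet loss with Euclidean distance (margin is an assumption)."""
    d_ap = np.linalg.norm(anchor - positive)  # same identity
    d_an = np.linalg.norm(anchor - negative)  # different identity
    return float(max(d_ap - d_an + margin, 0.0))

def total_loss(per_feature_terms):
    """Assumed combination: unweighted sum over features i of (CE_i + triplet_i)."""
    return sum(ce + tri for ce, tri in per_feature_terms)

# Toy usage with two features (one local, one global), random values only.
rng = np.random.default_rng(0)
terms = []
for _ in range(2):
    logits = rng.standard_normal(5)  # scores over 5 hypothetical identities
    anchor, positive, negative = rng.standard_normal((3, 16))
    terms.append((cross_entropy(logits, 2),
                  triplet_loss(anchor, positive, negative)))
loss = total_loss(terms)
```

Any weighting between the two loss terms, or between local and global features, would replace the plain sum in `total_loss`.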
As a further aspect of the present invention, the local feature extraction in step (1) specifically includes the following steps:
Step one: performing coarse feature extraction on a pedestrian image by the ResNet network to obtain a W × H × C three-dimensional feature vector, where W and H respectively denote the width and the height of each local feature, and C denotes the number of channels of each local feature;
Step two: horizontally dividing the obtained feature map into a plurality of local feature regions according to different equal-division standards, and assigning the local feature regions obtained under the same standard to the same group.
As a further aspect of the present invention, the specific steps of the feature enhancement in step (2) are as follows:
Step one: collecting each group of local features after division, and importing each collected group of local features into the spatial attention model;
Step two: the spatial attention model learns an attention mask by constructing an adjacency matrix, uses an attention mechanism to extract the pedestrian region of interest, and trains and enhances each group of local features according to the extracted information.
As a further aspect of the present invention, the attention mechanism enhances the local features in the second step according to the following formula:
$W_l' = W_l \odot M$ (1)
where $W_l$ is a local feature, $M$ is the mask matrix that assigns the corresponding attention weights, and $W_l'$ is the feature after attention enhancement.
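The horizontal equal division into groups and the element-wise enhancement of formula (1) can be sketched as follows. This is a minimal illustration under stated assumptions: NumPy arrays in a (C, H, W) layout, division standards of 2 and 3 stripes, and a constant placeholder mask standing in for the mask that the spatial attention model would learn. The function names are hypothetical.

```python
import numpy as np

def split_horizontal(feature_map, num_parts):
    """Split a (C, H, W) feature map into num_parts equal horizontal stripes."""
    channels, height, width = feature_map.shape
    if height % num_parts != 0:
        raise ValueError("height must be divisible by num_parts for an equal split")
    stripe = height // num_parts
    return [feature_map[:, k * stripe:(k + 1) * stripe, :] for k in range(num_parts)]

def group_by_standard(feature_map, standards=(2, 3)):
    """Stripes obtained under the same division standard form one group."""
    return {n: split_horizontal(feature_map, n) for n in standards}

def enhance(local_feature, mask):
    """Formula (1): W_l' = W_l ⊙ M, an element-wise (Hadamard) product."""
    if local_feature.shape != mask.shape:
        raise ValueError("mask must have the same shape as the local feature")
    return local_feature * mask

# Toy usage: random values stand in for a ResNet feature map (C=8, H=12, W=4).
rng = np.random.default_rng(0)
feature_map = rng.standard_normal((8, 12, 4))
groups = group_by_standard(feature_map)

# Placeholder mask; in the described method the mask is learned, not fixed.
mask = np.full(groups[3][0].shape, 0.5)
enhanced = enhance(groups[3][0], mask)
```

Keeping the groups in a dictionary keyed by the division standard mirrors the rule that regions obtained under the same standard belong to the same group.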
As a further aspect of the present invention, the local feature fusion in step (3) specifically includes the following steps:
S1.1, the spatial adaptive graph convolution model receives each group of attention-weighted local features and constructs an input feature set $V_l^{in} \in \mathbb{R}^{B \times C \times W \times h}$, where B denotes the number of local features in each group, and h denotes the height of the spatial level of the local features;
S1.2, converting each group of input feature sets through a global max pooling operation to obtain the most salient features in the original local