CN-115934982-B - Feature vector determining method for global image representation, image searching method and device

Abstract

The invention discloses a feature vector determining method for the global representation of an image, an image searching method, and a device. The gradients of the model parameters are used as features: a fourth-order second tensor formed by the gradients is determined; the weight value at each position of a weight matrix is determined from the gradients in the second-order tensor at the corresponding position of the second tensor; the weight matrix is aggregated into a first feature vector that globally represents the image at a convolution layer; and the first feature vectors of all convolution layers are concatenated to obtain a target feature vector that globally represents the target image, so that image searching can be performed on the basis of that feature vector.

Inventors

  • YU HAN
  • TANG BANGJIE
  • PAN HUADONG
  • YIN JUN

Assignees

  • Zhejiang Dahua Technology Co., Ltd. (浙江大华技术股份有限公司)

Dates

Publication Date
20260512
Application Date
20221214

Claims (11)

  1. A method for determining a feature vector for a global representation of an image, the method comprising: obtaining a target image, inputting the target image into a feature extraction model to obtain an output first feature map, converting to obtain a third-order first tensor according to the pixel value of each position in the first feature map, and inputting the first tensor into a network model pooling layer for pooling to obtain an output target vector; determining a target loss value according to the element value of each element in the target vector; for each convolution layer set in the feature extraction model, determining each gradient at the corresponding position in a fourth-order second tensor based on the target loss value and the position of the parameter in each row and each column of the parameter matrix of each channel of the convolution layer, determining the weight value at the corresponding position in a weight matrix based on the gradients in the second-order tensor at each position in the second tensor, and calculating the mean of each column of the weight matrix to obtain a first feature vector of the global representation at that convolution layer; and concatenating the first feature vectors corresponding to the convolution layers to obtain a target feature vector of the global representation of the target image; wherein the weight matrix is a second-order matrix, and the first feature vector and the target feature vector are row vectors; and wherein determining the weight value at the corresponding position in the weight matrix based on the gradients in the second-order tensor at each position in the second tensor comprises: if the row or column size of the second-order tensor is a preset value, determining the absolute value of the gradient in the second-order tensor as the weight value of the corresponding position; if the row or column size of the second-order tensor is not the preset value, determining the difference between the maximum gradient and the minimum gradient in the second-order tensor, and determining the sum of that difference and the standard deviation of the gradients as the weight value of the corresponding position; and forming the weight matrix from the weight values corresponding to the second-order tensors at all positions.
  2. The method of claim 1, wherein converting to obtain the third-order first tensor according to the pixel value of each position in the first feature map comprises: for each channel feature map in the first feature map, taking the pixel value of each pixel as the element value at the corresponding position in the matrix of that channel feature map, according to the position of the pixel in the channel feature map; and, following the order in which the channel feature maps are arranged in the first feature map, stacking the matrices corresponding to the channel feature maps in that order along the direction perpendicular to the plane of the matrices, thereby converting the first feature map into the third-order first tensor.
  3. The method of claim 1, wherein after converting to obtain the third-order first tensor according to the pixel value of each position in the first feature map, and before inputting the first tensor into the network model pooling layer for pooling to obtain the output target vector, the method further comprises: determining a sum matrix of the matrices in the first tensor, and binarizing the sum matrix to obtain a Boolean matrix; copying the Boolean matrix and stacking the copies along the direction perpendicular to the plane of the Boolean matrix to obtain a third-order third tensor of the same size as the first tensor; and multiplying the first tensor and the third tensor element-wise to obtain an updated first tensor.
  4. The method of claim 3, wherein inputting the first tensor into the network model pooling layer for pooling to obtain the output target vector comprises: inputting the updated first tensor into the network model pooling layer; performing global average pooling on the updated first tensor to obtain a first vector, and performing global max pooling on the updated first tensor to obtain a second vector; and concatenating the first vector and the second vector, and taking the concatenated vector as the target vector.
  5. The method of claim 4, wherein, after performing global average pooling on the updated first tensor to obtain the first vector and performing global max pooling on the updated first tensor to obtain the second vector, the method further comprises: determining the ratio of a second quantity to a first quantity, where the first quantity is the number of elements in the Boolean matrix whose value equals a set value and the second quantity is the number of elements contained in each channel feature map of the first feature map; and multiplying the first vector by the ratio to obtain an updated first vector.
  6. The method of claim 1, wherein before concatenating the first feature vectors corresponding to the convolution layers, the method further comprises: performing L2 normalization on the first feature vector corresponding to each convolution layer to obtain normalized first feature vectors, and concatenating the normalized first feature vectors.
  7. An image retrieval method based on the feature vector determination method for a global representation of an image according to any one of claims 1-6, the method comprising: performing the feature vector determination method on an image to be retrieved and on each image in a retrieval library, to obtain a second feature vector of the image to be retrieved and a third feature vector of each image in the library; and determining the similarity between the second feature vector and each third feature vector, and ranking the images in descending order of similarity.
  8. A feature vector determination apparatus for a global representation of an image, the apparatus comprising: an acquisition module, configured to acquire a target image and input the target image into a feature extraction model to obtain an output first feature map; and a determining module, configured to: convert to obtain a third-order first tensor according to the pixel value of each position in the first feature map, and input the first tensor into a network model pooling layer for pooling to obtain an output target vector; determine a target loss value according to the element value of each element in the target vector; for each convolution layer set in the feature extraction model, determine each gradient at the corresponding position in a fourth-order second tensor based on the target loss value and the position of the parameter in each row and each column of the parameter matrix of each channel of the convolution layer, determine the weight value at the corresponding position in a weight matrix based on the gradients in the second-order tensor at each position in the second tensor, and calculate the mean of each column of the weight matrix to obtain a first feature vector of the global representation at that convolution layer; and concatenate the first feature vectors corresponding to the convolution layers to obtain a target feature vector of the global representation of the target image; wherein the weight matrix is a second-order matrix, and the first feature vector and the target feature vector are row vectors; and wherein determining the weight value at the corresponding position in the weight matrix based on the gradients in the second-order tensor at each position in the second tensor comprises: if the row or column size of the second-order tensor is a preset value, determining the absolute value of the gradient in the second-order tensor as the weight value of the corresponding position; if the row or column size of the second-order tensor is not the preset value, determining the difference between the maximum gradient and the minimum gradient in the second-order tensor, and determining the sum of that difference and the standard deviation of the gradients as the weight value of the corresponding position; and forming the weight matrix from the weight values corresponding to the second-order tensors at all positions.
  9. An image retrieval apparatus, the apparatus comprising: a determining module, configured to perform the steps of the feature vector determination method for a global representation of an image according to any one of claims 1-6 on an image to be retrieved and on each image in a retrieval library, to obtain a second feature vector of the image to be retrieved and a third feature vector of each image in the library; and a retrieval module, configured to determine the similarity between the second feature vector and each third feature vector, and to rank the images in descending order of similarity.
  10. An electronic device, comprising a processor, a communication interface, a memory, and a communication bus, wherein the processor, the communication interface, and the memory communicate with one another through the communication bus; and the memory stores a computer program which, when executed by the processor, causes the processor to perform the steps of the feature vector determination method for a global representation of an image according to any one of claims 1-6, or the steps of the image retrieval method according to claim 7.
  11. A computer-readable storage medium storing a computer program executable by a processor, wherein the program, when run on the processor, causes the processor to perform the steps of the feature vector determination method for a global representation of an image according to any one of claims 1-6, or the steps of the image retrieval method according to claim 7.
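
As a rough illustration of the masking and pooling steps in claims 3 to 5, the following is a minimal sketch in plain Python, not the patented implementation. The function name masked_dual_pool and the threshold parameter are hypothetical. The claims do not say how the sum matrix is binarized or what the set value is, so the sketch assumes "greater than threshold" as the binarization rule and 1 as the set value.

```python
def masked_dual_pool(tensor, threshold=0.0):
    """Mask a third-order tensor, then pool it into one target vector.

    tensor: list of C channel matrices, each a list of H rows of W floats.
    threshold: ASSUMED binarization cut-off (not specified in the claims).
    """
    height, width = len(tensor[0]), len(tensor[0][0])

    # Sum matrix: element-wise sum of all channel matrices (claim 3).
    total = [[sum(ch[r][c] for ch in tensor) for c in range(width)]
             for r in range(height)]

    # Boolean matrix obtained by binarizing the sum matrix (rule is assumed).
    mask = [[1 if value > threshold else 0 for value in row] for row in total]

    # First quantity: elements equal to the set value (assumed to be 1).
    kept = sum(map(sum, mask))
    if kept == 0:
        raise ValueError("mask is empty; the ratio in claim 5 is undefined")
    # Second quantity / first quantity (claim 5).
    ratio = (height * width) / kept

    avg_vec, max_vec = [], []
    for ch in tensor:
        # Applying the same mask to every channel is equivalent to multiplying
        # by the stacked copies of the Boolean matrix (the third tensor).
        masked = [ch[r][c] * mask[r][c]
                  for r in range(height) for c in range(width)]
        avg_vec.append(ratio * sum(masked) / len(masked))  # rescaled average
        max_vec.append(max(masked))                        # global max pooling

    # Claim 4: concatenate the two pooled vectors into the target vector.
    return avg_vec + max_vec
```

Under these assumptions, multiplying the average by the ratio turns the mean over all H x W positions into a mean over the positions kept by the mask.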

Description

Feature vector determining method for global image representation, image searching method and device

Technical Field

The present invention relates to the field of image analysis and image retrieval technologies, and in particular to a feature vector determining method for global image representation, an image retrieval method, and an image retrieval device.

Background

Fine-grained image retrieval is widely used in academia and industry. For example, it can be applied to the protection of endangered wild animals and plants and to product search on e-commerce websites, and application scenarios such as face verification and pedestrian re-identification can be regarded as generalized fine-grained image retrieval problems.

However, image retrieval based on the feature vector of the global image representation determined by the prior art has low accuracy. How to determine a new feature vector for the global image representation, and thereby improve the accuracy of image retrieval, is therefore a technical problem to be solved.

Disclosure of Invention

The invention provides a feature vector determining method for global image representation, an image retrieval method, a device, equipment, and a medium, which address the low accuracy of image retrieval based on the feature vector of the global image representation determined in the prior art.
The invention provides a feature vector determining method for global image representation, which comprises the following steps: obtaining a target image, inputting the target image into a feature extraction model to obtain an output first feature map, converting to obtain a third-order first tensor according to the pixel value of each position in the first feature map, and inputting the first tensor into a network model pooling layer for pooling to obtain an output target vector; determining a target loss value according to the element value of each element in the target vector; for each convolution layer set in the feature extraction model, determining each gradient at the corresponding position in a fourth-order second tensor based on the target loss value and the position of the parameter in each row and each column of the parameter matrix of each channel of the convolution layer, determining the weight value at the corresponding position in a weight matrix based on the gradients in the second-order tensor at each position in the second tensor, and calculating the mean of each column of the weight matrix to obtain a first feature vector of the global representation at that convolution layer; and concatenating the first feature vectors corresponding to the convolution layers to obtain a target feature vector of the global representation of the target image.

Further, the weight matrix is a second-order matrix, and the first feature vector and the target feature vector are row vectors.
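
The aggregation and concatenation steps above can be sketched as follows. This is a minimal illustration in plain Python that takes the per-layer weight matrices as given; the function names layer_feature, l2_normalize, and global_feature and the normalize flag are hypothetical. The optional L2 normalization corresponds to claim 6 and is off by default.

```python
import math


def layer_feature(weights):
    """Column means of one layer's weight matrix, returned as a row vector.

    weights: second-order matrix given as a list of equally long rows.
    """
    rows = len(weights)
    return [sum(column) / rows for column in zip(*weights)]


def l2_normalize(vec):
    """Scale a vector to unit L2 norm; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else list(vec)


def global_feature(weight_matrices, normalize=False):
    """Concatenate the per-layer first feature vectors into one row vector.

    weight_matrices: one weight matrix per convolution layer, in layer order.
    normalize: if True, L2-normalize each layer's vector before concatenation.
    """
    target = []
    for weights in weight_matrices:
        feature = layer_feature(weights)
        target.extend(l2_normalize(feature) if normalize else feature)
    return target
```

The length of the resulting vector is the sum of the column counts of the weight matrices, so layers with different channel counts can be combined without padding.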
Further, determining the weight value at the corresponding position in the weight matrix based on the gradients in the second-order tensor at each position in the second tensor comprises: if the row or column size of the second-order tensor is a preset value, determining the absolute value of the gradient in the second-order tensor as the weight value of the corresponding position; if the row or column size of the second-order tensor is not the preset value, determining the difference between the maximum gradient and the minimum gradient in the second-order tensor, and determining the sum of that difference and the standard deviation of the gradients as the weight value of the corresponding position; and forming the weight matrix from the weight values corresponding to the second-order tensors at all positions.

Further, converting to obtain the third-order first tensor according to the pixel value of each position in the first feature map comprises: for each channel feature map in the first feature map, taking the pixel value of each pixel as the element value at the corresponding position in the matrix of that channel feature map, according to the position of the pixel in the channel feature map; and, following the order in which the channel feature maps are arranged in the first feature map, stacking the matrices corresponding to the channel feature maps in that order along the direction perpendicular to the plane of the matrices, thereby converting the first feature map into the third-order first tensor.
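
The weight rule described above (absolute value in the preset case, otherwise range plus standard deviation) can be sketched in plain Python as follows. This is one possible reading rather than the patented implementation: it assumes the preset value is 1, so the preset case is a second-order tensor holding a single gradient (for example a 1 x 1 kernel), and it assumes the population standard deviation, since the text does not say which variant is meant. The names position_weight and weight_matrix are hypothetical.

```python
from statistics import pstdev


def position_weight(kernel_grads):
    """Weight value for one position of the fourth-order gradient tensor.

    kernel_grads: second-order tensor of gradients, given as a list of rows.
    """
    flat = [g for row in kernel_grads for g in row]
    if len(flat) == 1:
        # ASSUMED preset case: a single gradient, so use its absolute value.
        return abs(flat[0])
    # Otherwise: (max gradient - min gradient) + standard deviation.
    # Population standard deviation is an assumption.
    return (max(flat) - min(flat)) + pstdev(flat)


def weight_matrix(grad_tensor):
    """Build the second-order weight matrix from a fourth-order tensor.

    grad_tensor[i][j] is the second-order gradient tensor at position (i, j).
    """
    return [[position_weight(kernel) for kernel in row] for row in grad_tensor]
```

For example, a 2 x 2 block of gradients [[1.0, 3.0], [1.0, 3.0]] has range 2.0 and population standard deviation 1.0, giving a weight of 3.0 under these assumptions.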
Further, after the third-order first tensor is obtained by conversion according to the pixel value of each position in the first feature map, and before the first tensor is input into the network model pooling layer for pooling to obtain the output target vector, the method further comprises: determining a sum matrix of the matrices in the first tensor, and performing binarization processing on the sum