CN-122023213-A - Data enhancement method, framework and application thereof in image retrieval
Abstract
The invention discloses a data enhancement method, a framework, and their application in image retrieval, belonging to the fields of computer vision and information retrieval. The method comprises: obtaining enhanced samples with quantifiable semantic loss by discarding part of the image information during the data enhancement stage; performing semantic retention estimation during the enhancement stage to obtain a retention estimation factor; and integrating the obtained retention estimation factor into the loss function, so that the constraint imposed by the metric loss on each enhanced sample is corrected according to its semantic retention. The method can automatically evaluate how much semantic content an image retains after an enhancement operation, correct the metric mismatch caused by enhancement, reduce interference from spurious features, and improve hash retrieval precision, while keeping the model structure simple and remaining widely applicable.
Inventors
- QIAN YURONG
- LU YI
- LIU HE
- FENG MEI
- LI XIAOHAN
- LI MENGQIAN
- ZHAO KAI
- SU GUOFANG
- Tohti Baike Tuohe Tuoxun
Assignees
- Xinjiang Research Institute of Huairou Laboratory (怀柔实验室新疆研究院)
Dates
- Publication Date
- 20260512
- Application Date
- 20260123
Claims (10)
- 1. A data enhancement method, comprising the following steps: step S01, obtaining enhanced samples with quantifiable semantic loss by discarding part of the image information in the data enhancement stage; step S02, performing semantic retention estimation in the data enhancement stage to obtain a retention estimation factor e; and step S03, integrating the obtained retention estimation factor e into the loss function, so that the constraint imposed by the metric loss on the enhanced samples is corrected according to their semantic retention.
- 2. The data enhancement method according to claim 1, wherein in step S01 a model-independent translation discarding method is adopted: a random translation is applied to the input image I, the pixels are translated as a whole, and the portion shifted beyond the image boundary is discarded.
- 3. The data enhancement method according to claim 2, characterized in that the width displacement distance Δx is controlled by sampled pixel-position coordinates (δx, δy) and a scaling coefficient σ_t: Δx = σ_t (2δx − 1) W, where σ_t lies in the (0, 1) range and W is the image width; the height displacement distance Δy is obtained in the same way from the image height H.
- 4. The data enhancement method according to claim 3, wherein in step S01, when the model-independent translation discarding method is adopted, the geometric area ratio is used as the retention estimation factor e, calculated as: e = ((W − |Δx|)(H − |Δy|)) / (W · H), where Δy is the height displacement distance and H is the image height.
- 5. The data enhancement method according to claim 1, wherein in step S01 a model-aware cutout discarding method is adopted: first, the saliency distribution Att of the image is computed from the existing model; the image is divided into an n×m matrix of patches, and several patches are randomly discarded in the high-saliency region according to the saliency distribution Att and a set saliency threshold, where n and m correspond respectively to the width and height dimensions of the feature map at the designated model layer.
- 6. The data enhancement method according to claim 5, wherein in the model-aware cutout discarding method the saliency distribution Att is generated as follows: if the backbone of the model is a convolutional neural network (CNN), the output feature maps of the designated network layer are summed directly to construct an approximate class activation map, which is then normalized with a Softmax function to obtain the saliency distribution Att perceived by the model over the different image regions; if the model is a ViT network, the dot-product similarity between each patch token and the class token (CLS token) is used directly to obtain patch-level attention, which is normalized with a Softmax function to obtain the saliency distribution Att.
- 7. The data enhancement method according to claim 5, wherein in step S01, when the model-aware cutout discarding method is adopted, the attention-weighted ratio is used as the retention estimation factor e, calculated as follows: let the saliency distribution Att be the patch-level attention matrix, let the set of discarded tokens be Att_cut and the retained set be Att_remain; then the retention estimation factor e is e = Σ_{i ∈ Att_remain} Att_i / (Σ_{i ∈ Att_remain} Att_i + Σ_{i ∈ Att_cut} Att_i).
- 8. The data enhancement method according to claim 1, wherein step S03 specifically comprises: let S_O be the similarity matrix computed from the original labels and B be the hash-code matrix; the likelihood term incorporating the retention estimation factor e is obtained with the softened similarity s̃_km^(a) = e_k^(a) · s_km, where s̃_km^(a) is the similarity of sample m to sample k after the a-th execution of data enhancement on sample k in the current computation, e_k^(a) is the retention estimation factor after the a-th execution of data enhancement on sample k in the current computation, s_km is the label similarity of sample k to sample m, and σ(·) is the Sigmoid activation function applied to the pairwise hash-code inner-product term; taking the negative log of the likelihood terms of the retention estimation factor e yields the semantically weighted soft Bernoulli likelihood loss function, which serves as the corrected metric target; when the deep hash model is trained, the corrected metric target guides the deep learning process.
- 9. A data enhancement framework for implementing the data enhancement method of any one of claims 1-8, comprising a data enhancement module, a semantic estimation module, and a loss correction module; the data enhancement module is used for obtaining enhanced samples with quantifiable semantic loss by discarding part of the image information in the data enhancement stage; the semantic estimation module is used for performing semantic retention estimation in the data enhancement stage to obtain a retention estimation factor e; and the loss correction module is used for integrating the obtained retention estimation factor e into the loss function, so that the constraint imposed by the metric loss on the enhanced samples is corrected according to their semantic retention.
- 10. Use of a data enhancement method as claimed in any one of claims 1-8 in image retrieval.
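Claims 2-4 describe the model-independent translation discarding augmentation and its geometric retention factor. The sketch below illustrates that path under stated assumptions: the image is a plain H×W grid, vacated positions are filled with 0, and the sampling interface (`rng`, `sigma_t`) is illustrative rather than fixed by the claims.

```python
import random

def translate_discard(image, sigma_t, rng=random.random):
    """Model-independent translation discarding (sketch of claims 2-4).

    `image` is an H x W grid (list of lists); pixels shifted beyond the
    boundary are discarded, and vacated positions are filled with 0
    (the fill value is an assumption, not stated in the claims).
    Returns the translated image and the retention estimation factor e.
    """
    H, W = len(image), len(image[0])
    dx_unit, dy_unit = rng(), rng()                # sampled coordinates (δx, δy) in [0, 1)
    dx = round(sigma_t * (2 * dx_unit - 1) * W)    # Δx = σ_t (2δx − 1) W
    dy = round(sigma_t * (2 * dy_unit - 1) * H)    # Δy analogous, using the height H
    out = [[0] * W for _ in range(H)]
    for y in range(H):
        for x in range(W):
            ty, tx = y + dy, x + dx
            if 0 <= ty < H and 0 <= tx < W:        # portion beyond the boundary is discarded
                out[ty][tx] = image[y][x]
    # geometric area ratio as retention estimation factor e (claim 4)
    e = (W - abs(dx)) * (H - abs(dy)) / (W * H)
    return out, e
```

Because σ_t is kept in (0, 1), the surviving area is always positive and e falls in (0, 1], shrinking as the translation discards more of the image.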
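Claims 5-7 describe the model-aware cutout path: patches are discarded in high-saliency regions and the attention-weighted ratio serves as the retention factor. A minimal sketch, assuming `att` is an already Softmax-normalized, flattened n·m patch-level saliency distribution; the discard count `k` and uniform sampling among candidates are illustrative assumptions.

```python
import random

def attention_cutout(att, threshold, k, rng=random):
    """Model-aware cutout discarding (sketch of claims 5-7).

    `att`: flattened patch-level saliency distribution Att (e.g. from a
    Softmax over an approximate class activation map or CLS attention).
    Patches whose saliency exceeds `threshold` are candidates, and up to
    `k` of them are randomly discarded (`k` is an assumed hyperparameter).
    Returns the discarded patch indices and the retention factor e.
    """
    candidates = [i for i, a in enumerate(att) if a > threshold]
    cut = rng.sample(candidates, min(k, len(candidates)))
    att_cut = sum(att[i] for i in cut)             # attention mass of Att_cut
    att_remain = sum(att) - att_cut                # attention mass of Att_remain
    # attention-weighted ratio as retention estimation factor e (claim 7)
    e = att_remain / (att_remain + att_cut)
    return cut, e
```

Discarding high-saliency patches deliberately removes semantically important content, so e drops well below 1 exactly when the cutout is aggressive, which is what lets the loss correction in claim 8 down-weight such samples.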
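Claim 8 folds the retention factor into a pairwise Bernoulli likelihood loss, but the formulas themselves are not reproduced in the text. The sketch below is therefore a hedged reconstruction following common pairwise deep-hash formulations: the softened target s̃_km = e_k · s_km and the inner-product term Ω_km = 0.5·⟨b_k, b_m⟩ are assumptions, not the patent's exact expressions.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def soft_bernoulli_loss(B, S, E):
    """Semantically weighted soft Bernoulli likelihood loss (sketch of claim 8).

    B: hash-code matrix (rows b_k with entries in {-1, +1});
    S: label similarity matrix S_O (s_km in {0, 1});
    E: retention estimation factors e_k for the current augmented views.
    Returns the mean negative log-likelihood over ordered pairs (k, m).
    """
    n = len(B)
    loss = 0.0
    for k in range(n):
        for m in range(n):
            if k == m:
                continue
            omega = 0.5 * sum(bk * bm for bk, bm in zip(B[k], B[m]))
            s_soft = E[k] * S[k][m]        # retention-corrected similarity target
            p = sigmoid(omega)
            loss -= s_soft * math.log(p) + (1 - s_soft) * math.log(1 - p)
    return loss / (n * (n - 1))
```

With e_k = 1 this reduces to the ordinary pairwise Bernoulli likelihood; as e_k shrinks, the positive-pair pull on heavily degraded augmentations weakens, which is the intended correction of the metric target.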
Description
Data enhancement method, framework and application thereof in image retrieval

Technical Field

The invention belongs to the field of computer vision and information retrieval, and particularly relates to a data enhancement method, a framework, and their application in image retrieval, in particular a data enhancement method, framework, and application based on semantic retention estimation.

Background

In the field of deep-learning image retrieval, deep hashing (Deep Hashing) has become the dominant technical route. It maps high-dimensional image features to compact binary hash codes through an end-to-end neural network model, thereby enabling efficient similar-image retrieval. To improve the robustness and generalization ability of the model, researchers commonly employ data enhancement (Data Augmentation) techniques such as rotation, scaling, and color perturbation. These enhancement methods alleviate model overfitting by generating variant samples that are semantically identical to the original sample, expanding the training data distribution. Current mainstream deep-hash image retrieval methods, at home and abroad, mainly rely on the following three types of data enhancement strategies. Single-sample enhancement (Single-sample Augmentation) includes random cropping, horizontal flipping, brightness adjustment, rotation, scaling, and the like, enhancing data diversity through geometric or color transformations of a single sample. Multi-sample hybrid enhancement (Multi-sample Augmentation), typified by methods such as Mixup and Manifold Mixup, fuses two images and their labels by linear interpolation, thereby introducing a continuous intermediate domain in the feature space.
Deep-model-based enhancement (Adversarial/Deep Augmentation), such as adversarial training and generative adversarial enhancement, generates perturbed image samples through a neural network to improve model robustness. Although these enhancement methods work well in tasks such as classification, they still have obvious shortcomings in image retrieval, especially in deep-hash retrieval tasks, specifically:

1. Semantic misalignment. Existing enhancement methods generally assume that the semantics of the enhanced image are completely consistent with the original image (i.e., the label remains unchanged). In an image retrieval scenario, however, this assumption often does not hold. For example, when an image is rotated or partially cropped, its visual features and semantic expression have changed, yet the model is still forced to learn that they are "semantically identical", leading to false constraints on the distance distribution between samples in hash space.

2. Metric target mismatch. The metric losses of deep hashing (e.g., Pairwise Loss, Triplet Loss) rely on a label similarity matrix to guide feature clustering. When data enhancement introduces semantic drift, the label matrix remains unchanged, so during optimization the model falsely pulls together semantically inconsistent samples, destroying the discriminability of the hash space and reducing retrieval precision.

3. Spurious-feature interference from enhancement. In large-scale training, the model easily learns the spurious features introduced by the enhancement process (such as rotation edges or scaling noise) as a retrieval basis, so that in actual retrieval the model tends to return images with similar spurious features rather than semantically similar ones, significantly reducing the reliability of retrieval results.

4. Difficulty quantifying semantic change. Traditional enhancement methods cannot evaluate the influence of each enhancement operation on the semantic retention of a sample, and lack a mechanism to judge how much of the original semantic information the enhanced sample still retains, so dynamic weight adjustment cannot be performed in the loss calculation.

In short, the data enhancement commonly used in existing image retrieval (especially deep hash) training assumes that enhancement does not change labels, but in retrieval tasks enhancement can cause semantic drift or mismatch, or introduce spurious features, so that the metric loss falsely pulls together semantically inconsistent samples, reducing retrieval precision.

Disclosure of Invention

In order to solve the above technical problems in the prior art, the invention provides a data enhancement method, a framework, and their application in image retrieval, aiming to solve the problems of semantic mismatch and spurious-feature bias caused by traditional enhancement in deep hash retrieval. The semantic retention of each enhanced sample is calculated by intentionally randomly discarding part of the image content during enhancement and introducing a semantic change estima