CN-121786886-B - Multi-source data privacy protection retrieval method based on lightweight self-adaptive mechanism

CN121786886B

Abstract

A multi-source data privacy protection retrieval method based on a lightweight self-adaptive mechanism relates to the technical field of image retrieval. Through a three-level architecture of visual-language semantic alignment, lightweight adapter tuning and privacy-preserving hash-code learning, the method achieves efficient cross-domain semantic migration and low-dimensional binary hash-code generation while balancing privacy security, retrieval accuracy and training efficiency. It addresses problems of existing cross-domain image retrieval technology, such as degraded retrieval performance caused by source-domain privacy disclosure and domain shift, and high model-adaptation cost.

Inventors

  • CUI HUI
  • ZHAO ZUOYU
  • HAN XIAOHUI
  • WANG PEIPEI

Assignees

  • Qilu University of Technology (Shandong Academy of Sciences)
  • Shandong Computer Science Center (National Supercomputing Center in Jinan)

Dates

Publication Date
2026-05-12
Application Date
2026-03-03

Claims (10)

  1. A multi-source data privacy protection retrieval method based on a lightweight self-adaptive mechanism, characterized by comprising the following steps: S1, acquiring a source-domain dataset and a target-domain dataset, the source-domain dataset containing a number of source-domain images and the target-domain dataset containing a number of target-domain images, each target-domain image having a corresponding category label; S2, dividing the target-domain dataset into a training set and a test set; S3, constructing a vision-language model composed of the image encoder of a VLMs model, the text encoder of a VLMs model, a classifier, an adapter and a hash encoder; S4, inputting each target-domain image of the training set into the image encoder of the VLMs model of the vision-language model to output its visual feature, and inputting the label corresponding to that image into the text encoder of the VLMs model of the vision-language model to output a class-label embedding for each of the categories of the target-domain dataset; S5, calculating an initial pseudo label from the visual feature and the class-label embeddings; S6, inputting the visual feature into the classifier of the vision-language model to obtain a classifier pseudo label; S7, calculating the confidence of each target-domain image from the initial pseudo label and the classifier pseudo label; S8, screening the training set according to the confidence to obtain a reliable subset; S9, inputting the target-domain images of the reliable subset into the adapter of the vision-language model, outputting adapter pseudo labels, and training the vision-language model; S10, pre-training the vision-language model with the source-domain images, and obtaining the image retrieval result using the hash encoder of the pre-trained vision-language model.
  2. The multi-source data privacy protection retrieval method based on the lightweight self-adaptive mechanism as claimed in claim 1, wherein in step S1 the source and target domains are chosen as follows: from the Office-31 dataset, one of the Amazon (A), Webcam (W) and DSLR (D) domains is selected as the source domain and the remaining two as target domains; from the Office-Home dataset, one of the Art (A), Clipart (C), Product (P) and Real-World (R) domains is selected as the source domain and the remaining three as target domains; and from the DomainNet-126 dataset, one of the Clipart (C), Painting (P), Real (R) and Sketch (S) domains is selected as the source domain and the remaining three as target domains.
  3. The multi-source data privacy protection retrieval method based on the lightweight self-adaptive mechanism as claimed in claim 1, wherein in step S2 the target-domain dataset is divided into the training set and the test set in a ratio of 9:1.
  4. The multi-source data privacy protection retrieval method based on the lightweight self-adaptive mechanism as claimed in claim 1, wherein step S5 comprises the following steps: S5-1, calculating, by the given formula, the probability that each target-domain image of the training set belongs to each category, the formula being based on cosine similarity and a temperature parameter; S5-2, selecting the category corresponding to the maximum of the probabilities over all categories as the initial pseudo label.
  5. The multi-source data privacy protection retrieval method based on the lightweight self-adaptive mechanism as claimed in claim 1, wherein step S6 comprises the following steps: S6-1, the classifier of the vision-language model is composed of linear layers in sequence; S6-2, inputting the visual feature into the classifier of the vision-language model to obtain the probability that each target-domain image belongs to each category; S6-3, selecting the category corresponding to the maximum of the probabilities over all categories as the classifier pseudo label.
  6. The multi-source data privacy protection retrieval method based on the lightweight self-adaptive mechanism as claimed in claim 5, wherein in step S7 the confidence of each target-domain image is calculated by the given formula from the initial pseudo label and the classifier pseudo label.
  7. The multi-source data privacy protection retrieval method based on the lightweight self-adaptive mechanism as claimed in claim 1, wherein step S8 comprises the following steps: S8-1, traversing the initial pseudo labels and the classifier pseudo labels and counting the number of predictions for each category; S8-2, calculating the class frequency of each category by the given formula; S8-3, calculating the number of samples to retain for each category by the given formula; S8-4, sorting all the confidences in the training set in descending order and selecting the target-domain images corresponding to the highest confidences to form the reliable subset, each target-domain image in the reliable subset carrying its corresponding category label.
  8. The multi-source data privacy protection retrieval method based on the lightweight self-adaptive mechanism as claimed in claim 1, wherein step S9 comprises the following steps: S9-1, the adapter of the vision-language model is composed of a first projection layer, a ReLU activation function, a second projection layer and a residual connection unit; S9-2, inputting each target-domain image of the reliable subset into the image encoder of the VLMs model of the vision-language model to output its visual feature; S9-3, inputting the visual feature into the first projection layer of the adapter to output a dimension-reduced feature; S9-4, inputting the dimension-reduced feature into the ReLU activation function of the adapter to output an activated feature; S9-5, inputting the activated feature into the second projection layer of the adapter to output a feature residual; S9-6, inputting the visual feature, the activated feature and the feature residual of the target-domain image into the residual connection unit of the adapter and calculating the modulated feature by the given formula; S9-7, calculating the cosine similarity between the modulated feature and the class-label embedding of each category, inputting it into a Softmax function to output the probability that the target-domain image belongs to each category, and selecting the category corresponding to the maximum probability as the pseudo label; S9-8, calculating the cross-entropy loss between the category probabilities and the corresponding category labels, and training the vision-language model with the cross-entropy loss using an Adam optimizer.
  9. The multi-source data privacy protection retrieval method based on the lightweight self-adaptive mechanism as claimed in claim 8, wherein step S10 comprises the following steps: S10-1, inputting the source-domain images into the image encoder of the VLMs model of the vision-language model to output visual features; S10-2, inputting the visual features into the classifier of the vision-language model to obtain the probability that each source-domain image belongs to each category, thereby completing the pre-training of the vision-language model; S10-3, taking the weight parameters of the linear layer of the classifier of the pre-trained vision-language model as classifier weights, inputting the classifier weights into the adapter of the pre-trained vision-language model, and taking the resulting modulated weights as the source-domain class centers; S10-4, the hash encoder of the vision-language model is composed, in sequence, of a first linear layer, a BN layer, a second linear layer and a Tanh activation function; inputting the modulated features into the hash encoder of the vision-language model to output a continuous hash representation, and binarizing the continuous hash representation to obtain the image retrieval result.
  10. The multi-source data privacy protection retrieval method based on the lightweight self-adaptive mechanism as claimed in claim 9, further comprising, after step S10, the following steps: S11-1, normalizing the source-domain class centers to obtain normalized source-domain class centers; S11-2, calculating the target class centers by the given formula, which involves an indicator function, and normalizing the target class centers to obtain normalized target class centers; S11-3, calculating the contrastive loss between the source-domain class centers and the target class centers, and calculating, respectively, the quantization loss and the semantic similarity loss of the continuous hash representation and the image retrieval result; S11-4, calculating the total loss by the given formula as a weighted combination of the losses, and training the vision-language model with the total loss using an Adam optimizer.
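As an illustrative sketch of the overall pseudo-labelling pipeline of claim 1 (steps S4 to S8), the following toy script wires together stand-ins for the frozen VLM encoders. All dimensions, the random "encoders", and the agreement-weighted confidence rule are assumptions for illustration; the patent's exact confidence formula is not reproduced.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins for the frozen VLM encoders (CLIP-like, 512-d).
def image_encoder(images):                       # (n, d_in) -> (n, 512)
    return images @ rng.standard_normal((images.shape[1], 512))

def text_encoder(num_classes):                   # one embedding per label
    return rng.standard_normal((num_classes, 512))

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)         # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def cosine(a, b):
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a @ b.T

num_classes, tau = 5, 0.07                       # tau: illustrative value
images = rng.standard_normal((20, 64))           # toy target-domain batch
feats = image_encoder(images)                    # S4: visual features
zs_probs = softmax(cosine(feats, text_encoder(num_classes)) / tau)
init_pseudo = zs_probs.argmax(axis=1)            # S5: initial pseudo labels
clf_pseudo = (feats @ rng.standard_normal((512, num_classes))).argmax(axis=1)  # S6

# S7-S8: confidence and reliable-subset screening. Agreement-weighted
# maximum probability is one plausible choice, not the patent's formula.
conf = zs_probs.max(axis=1) * (init_pseudo == clf_pseudo)
reliable = np.argsort(-conf)[: len(conf) // 2]   # S8: keep top half
```

The key design point shown here is that both pseudo-label sources (text-encoder similarity and classifier) vote, and only samples where they agree with high probability survive the screening.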
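The 9:1 train/test split of claim 3 amounts to a shuffled index partition; the seeded shuffle below is an illustrative choice, not specified by the patent.

```python
import numpy as np

def split_9_1(n, seed=0):
    """Shuffle sample indices and split them 90% train / 10% test
    (the 9:1 ratio of claim 3). The fixed seed is for reproducibility."""
    idx = np.random.default_rng(seed).permutation(n)
    cut = int(round(0.9 * n))
    return idx[:cut], idx[cut:]

train_idx, test_idx = split_9_1(1000)
```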
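The initial pseudo-labelling of claim 4 is a CLIP-style zero-shot classification: a temperature-scaled softmax over cosine similarities between image and class-text features. The value tau=0.01 below is illustrative, not taken from the patent.

```python
import numpy as np

def zero_shot_pseudo_labels(img_feats, text_feats, tau=0.01):
    """Claim 4 sketch: softmax over cosine similarities between image
    features and class-label embeddings, scaled by temperature tau
    (S5-1), then argmax over classes (S5-2)."""
    f = img_feats / np.linalg.norm(img_feats, axis=1, keepdims=True)
    t = text_feats / np.linalg.norm(text_feats, axis=1, keepdims=True)
    logits = (f @ t.T) / tau
    logits -= logits.max(axis=1, keepdims=True)      # numerical stability
    probs = np.exp(logits)
    probs /= probs.sum(axis=1, keepdims=True)
    return probs, probs.argmax(axis=1)

# Sanity check: if each "image" feature equals a class-text feature,
# every image should be labelled with its own class.
rng = np.random.default_rng(1)
text = rng.standard_normal((5, 512))
probs, labels = zero_shot_pseudo_labels(text.copy(), text)
```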
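The class-balanced screening of claim 7 can be sketched as a per-class budget followed by a confidence sort. The budget rule `max(1, ratio * count)` is an assumption; the patent's exact per-class sample-count formula is not reproduced.

```python
import numpy as np

def reliable_subset(pseudo_labels, confidence, ratio=0.5):
    """Claim 7 sketch: count predictions per class (S8-1/S8-2),
    allocate a per-class budget (S8-3, assumed rule), then keep the
    highest-confidence samples of each class (S8-4)."""
    keep = []
    for c in np.unique(pseudo_labels):
        idx = np.flatnonzero(pseudo_labels == c)
        budget = max(1, int(ratio * len(idx)))
        keep.extend(idx[np.argsort(-confidence[idx])[:budget]])
    return np.sort(np.asarray(keep))

pseudo = np.array([0, 0, 0, 0, 1, 1])
conf = np.array([0.9, 0.1, 0.8, 0.2, 0.5, 0.6])
subset = reliable_subset(pseudo, conf)
```

Balancing the budget per class rather than taking a single global top-k prevents a dominant class from crowding rare classes out of the reliable subset.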
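The adapter of claim 8 is a bottleneck module with a residual blend. A minimal forward-pass sketch follows; the blend weight `alpha`, the dimensions and the initialization scale are illustrative assumptions, since the patent's modulation formula is not reproduced here.

```python
import numpy as np

class Adapter:
    """Claim 8 sketch: first projection (S9-3), ReLU (S9-4), second
    projection producing a feature residual (S9-5), and a residual
    blend with the frozen visual feature (S9-6)."""
    def __init__(self, dim=512, bottleneck=64, alpha=0.2, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.standard_normal((dim, bottleneck)) * 0.02  # down-proj
        self.W2 = rng.standard_normal((bottleneck, dim)) * 0.02  # up-proj
        self.alpha = alpha

    def __call__(self, v):
        reduced = v @ self.W1                  # S9-3: dimension reduction
        activated = np.maximum(reduced, 0.0)   # S9-4: ReLU activation
        residual = activated @ self.W2         # S9-5: feature residual
        # S9-6: modulated feature as a residual blend (assumed form)
        return self.alpha * residual + (1.0 - self.alpha) * v

adapter = Adapter()
v = np.random.default_rng(2).standard_normal((8, 512))
modulated = adapter(v)
```

Only the two small projection matrices are trainable, which is what makes the adaptation "lightweight": the VLM encoders stay frozen.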
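The hash encoder of claim 9 (S10-4) is a linear / BN / linear / Tanh stack followed by sign binarization. The weights below are placeholders, and a trained encoder would use BN running statistics at inference rather than the batch statistics used here.

```python
import numpy as np

def hash_encode(feats, W1, W2, eps=1e-5):
    """Claim 9 sketch: first linear layer, BN layer, second linear
    layer, Tanh (continuous hash representation), then sign
    binarization to produce the retrieval code."""
    h = feats @ W1                                           # first linear
    h = (h - h.mean(axis=0)) / np.sqrt(h.var(axis=0) + eps)  # BN layer
    continuous = np.tanh(h @ W2)                             # continuous code
    binary = np.where(continuous >= 0, 1, -1)                # retrieval code
    return continuous, binary

rng = np.random.default_rng(3)
feats = rng.standard_normal((16, 512))
cont, code = hash_encode(feats,
                         rng.standard_normal((512, 128)),
                         rng.standard_normal((128, 32)))
```

The Tanh keeps the continuous representation in (-1, 1), so binarization is just a sign threshold and the quantization gap stays bounded.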
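For the loss terms of claim 10, a common form of the S11-3 quantization loss is the mean squared gap between the continuous hash representation and its binarized code, and S11-4 combines the losses as a weighted sum. Both the quantization form and the weights below are illustrative assumptions; the patent's exact formulas are not reproduced.

```python
import numpy as np

def quantization_loss(continuous, binary):
    """Assumed S11-3 quantization loss: mean squared gap between the
    continuous hash representation and its binarized retrieval code."""
    return float(np.mean((continuous - binary) ** 2))

def total_loss(l_contrast, l_quant, l_sem, w1=1.0, w2=0.1, w3=0.1):
    """S11-4 sketch: weighted sum of contrastive, quantization and
    semantic-similarity losses; weights w1..w3 are illustrative."""
    return w1 * l_contrast + w2 * l_quant + w3 * l_sem

cont = np.array([[0.9, -0.8], [0.5, -0.1]])
code = np.where(cont >= 0, 1.0, -1.0)
lq = quantization_loss(cont, code)
```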

Description

Multi-source data privacy protection retrieval method based on a lightweight self-adaptive mechanism

Technical Field

The invention relates to the technical field of image retrieval, in particular to a multi-source data privacy protection retrieval method based on a lightweight self-adaptive mechanism.

Background

In actual cross-domain image retrieval scenarios, the contradiction between data privacy protection and knowledge migration is increasingly prominent. In real retrieval applications, training data and query images often follow different distributions owing to factors such as image acquisition equipment, illumination conditions and shooting angles; even when they correspond to the same semantic category, image representations from different domains differ significantly. This domain-shift problem makes it difficult for traditional image retrieval methods, which assume identically distributed data, to achieve ideal results in cross-domain scenarios. In recent years, cross-domain hashing has gradually become an efficient image retrieval solution: by learning low-dimensional, discriminative binary hash codes, it can achieve cross-domain semantic alignment while greatly improving the storage and computation efficiency of large-scale retrieval tasks. However, existing cross-domain hashing methods generally require access to source-domain data or model parameters, which not only carries the risk of source-domain information disclosure but also limits their applicability in source-free or privacy-sensitive scenarios.
In sensitive applications such as medical imaging, source-domain data often contains privacy attributes such as patient identities and diagnostic labels that cannot be shared directly. More importantly, fine-tuning the source model on target-domain data may unintentionally reveal sensitive information encoded in the source model, leaving it vulnerable to inversion or inference attacks. Therefore, achieving cross-domain semantic migration and efficient retrieval without accessing source-domain data or exposing model parameters has become a key problem to be solved. Currently, privacy-preserving cross-domain image retrieval faces two core challenges. On the one hand, when the source-domain data cannot be accessed, the target domain lacks a direct alignment reference and domain-invariant semantics are difficult to learn. Although some approaches attempt to bridge this gap using structure encoded in the source model or statistical priors (for example, SFDAH-MA aligns features through architecture and semantic distribution), such indirect dependencies can still reveal domain-specific information, and establishing accurate semantic correspondence under strict data-and-model isolation constraints remains difficult. On the other hand, traditional domain-adaptation methods achieve knowledge migration by fine-tuning a pre-trained source model and assume unrestricted access to its parameters; in privacy-sensitive scenarios this strategy not only risks exposing source-domain sensitive patterns but also increases training cost, so developing lightweight, secure adaptation techniques that do not fine-tune the source model has become an important research direction. In terms of technical development, cross-domain image retrieval methods fall mainly into two categories: feature-space adaptation and image-domain conversion.
Feature-space adaptation methods project the features of the two domains into a shared subspace and combine metric learning with domain-alignment losses to reduce domain differences: for example, ProtoOT initializes class prototypes via K-means and performs contrastive alignment, while ACCA handles label noise through adaptive weighting and class-center updates. Image-domain conversion methods transform images from one domain into the style of another, turning the cross-domain task into a single-domain retrieval problem: for example, ZS-SBIR uses a variational autoencoder to embed sketches and photos in a shared subspace, and cGAN-SBIR generates realistic photos from sketches through a conditional generative adversarial network to support retrieval. As datasets have grown in scale, hash-based methods have been widely applied to cross-domain scenarios for their efficiency; methods such as DeDAHA, GTH, DANCE and CPH combine the advantages of binary coding and domain adaptation, but they generally depend on source-domain data for training and carry privacy-leakage risks. The source-free domain adaptation (SFDA) paradigm has received much attention for its utility in privacy-sensitive scenarios; it aims to adapt a pre-trained source model to an unlabeled target domain without a