
CN-121278775-B - Multi-modal training data desensitization method


Abstract

The application discloses a multi-modal training data desensitization method. The method first performs structural preprocessing on the original multi-modal data; then, through cross-modal feature extraction and collaborative attention fusion, generates text-weighted image features, image-weighted text features, and the key cross-modal attention map; performs entity identity prior detection by combining a knowledge graph, and uses Bayesian inference to adaptively and collaboratively predict sensitivity based on the attention map and a privacy policy library; and finally generates a refined desensitization plan through variational optimization to achieve accurate desensitization of the original data. The method overcomes the inconsistent and one-size-fits-all desensitization found in prior multi-modal data privacy protection: sensitive information is fully protected while the intrinsic value and training usability of the data are preserved to the greatest extent, providing a high-quality, compliant, and valuable training data set for large artificial intelligence models.

Inventors

  • Song Yifan
  • Liu Han
  • Yan Lijing
  • Li Dingding
  • Jiao Qidi
  • Yang Ying
  • Lun Di
  • Wang Zhiying
  • Zhu Ying

Assignees

  • State Grid Henan Electric Power Company Information and Communication Branch
  • State Grid Henan Electric Power Company

Dates

Publication Date
2026-05-12
Application Date
2025-10-24

Claims (5)

  1. A multi-modal training data desensitization method, comprising: acquiring original multi-modal data; performing structural preprocessing on the original multi-modal data to obtain an image block sequence and a text token sequence; performing cross-modal feature extraction and collaborative attention fusion on the image block sequence and the text token sequence to obtain a text-information-weighted image block feature sequence, an image-information-weighted text token feature sequence, and a cross-modal attention map; performing sensitivity collaborative prediction and desensitization policy generation on the text-information-weighted image block feature sequence and the image-information-weighted text token feature sequence based on the cross-modal attention map and a privacy policy library to obtain a desensitization plan; and desensitizing the original multi-modal data based on the desensitization plan to obtain desensitized multi-modal data; wherein performing cross-modal feature extraction and collaborative attention fusion comprises: inputting the image block sequence into an image encoder and the text token sequence into a text encoder to obtain an image embedding sequence and a text embedding sequence; taking the image embedding sequence as the query vectors and the text embedding sequence as the key and value vectors, and inputting them into a cross-modal attention module to obtain the text-information-weighted image block feature sequence, together with an attention weight matrix serving as the cross-modal attention map; and taking the text embedding sequence as the query vectors and the image embedding sequence as the key and value vectors, and inputting them into the cross-modal attention module to obtain the image-information-weighted text token feature sequence (a sketch of this co-attention step follows the claims); and wherein performing sensitivity collaborative prediction and desensitization policy generation comprises: extracting a set of entity pairs from the text-information-weighted image block feature sequence and the image-information-weighted text token feature sequence based on the cross-modal attention map; inputting the joint entity expression vector of each entity pair in the set into a knowledge graph embedding model to obtain a prior evidence set; inputting the text-information-weighted image block feature sequence and the image-information-weighted text token feature sequence into an image sensitivity prediction head and a text sensitivity prediction head, respectively, to obtain an initial sensitivity prediction set; performing Bayesian-inference-based posterior sensitivity evaluation on the initial sensitivity prediction set and the prior evidence set to obtain a posterior sensitivity score set; and performing variational-optimization-based desensitization policy generation on the posterior sensitivity score set based on the privacy policy library to obtain the desensitization plan.
  2. The method of claim 1, wherein performing structural preprocessing on the original multi-modal data to obtain the image block sequence and the text token sequence comprises: inputting image data in the original multi-modal data into an image processor for size standardization and image block segmentation to obtain the image block sequence; and inputting text in the original multi-modal data into a text processor for word segmentation and word embedding to obtain the text token sequence.
  3. The multi-modal training data desensitization method according to claim 2, wherein extracting the set of entity pairs from the text-information-weighted image block feature sequence and the image-information-weighted text token feature sequence based on the cross-modal attention map comprises: comparing the cross-modal attention map against a preset threshold to extract the set of matched text-information-weighted image block features and image-information-weighted text token features from the two feature sequences; and computing the position-wise mean vector of each matched text-information-weighted image block feature and its matched image-information-weighted text token feature to obtain the joint entity expression vector of each entity pair (see the pairing sketch following the claims).
  4. The method of claim 3, wherein performing Bayesian-inference-based posterior sensitivity assessment on the initial sensitivity prediction set and the prior evidence set to obtain the posterior sensitivity score set comprises applying, for each entity, the following formula: s_post = P(e|S)·s / (P(e|S)·s + P(e|¬S)·(1 − s)), wherein s is the initial sensitivity prediction, P(e|S) represents the probability that a truly sensitive entity is detected as having a public attribute, and P(e|¬S) represents the probability that a non-sensitive entity is detected as having a public attribute (a worked numeric example follows the claims).
  5. The multi-modal training data desensitization method of claim 1, wherein performing variational-optimization-based desensitization policy generation on the posterior sensitivity score set based on the privacy policy library to obtain the desensitization plan comprises applying, for each entity, a formula of the form: a* = argmax_{a ∈ A} [ s·E(a) − λ·L(a) ], wherein a* is the optimal desensitization operation selected for the entity; s is the entity's posterior sensitivity score computed in the preceding step; A is the set of all candidate desensitization operations; E(a) is the privacy protection efficacy of operation a, taking values between 0 and 1; L(a) is the data utility loss incurred by performing operation a; and λ is a tunable hyperparameter balancing privacy against utility (see the trade-off sketch following the claims).
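
The co-attention fusion recited in claim 1 can be pictured with a minimal NumPy sketch, assuming plain single-head scaled dot-product attention; the random embeddings and dimensions below are illustrative stand-ins rather than the patent's implementation:

    import numpy as np

    def softmax(x, axis=-1):
        # Numerically stable softmax over the given axis.
        x = x - x.max(axis=axis, keepdims=True)
        e = np.exp(x)
        return e / e.sum(axis=axis, keepdims=True)

    def cross_modal_attention(queries, keys, values):
        # Single-head scaled dot-product cross-attention: returns the
        # weighted feature sequence and the attention weight matrix.
        d = queries.shape[-1]
        attn = softmax(queries @ keys.T / np.sqrt(d), axis=-1)
        return attn @ values, attn

    rng = np.random.default_rng(0)
    img_emb = rng.normal(size=(16, 64))  # 16 image blocks, 64-dim embeddings
    txt_emb = rng.normal(size=(20, 64))  # 20 text tokens, 64-dim embeddings

    # Image queries over text keys/values: text-information-weighted
    # image block features plus the cross-modal attention map.
    img_feats, attn_map = cross_modal_attention(img_emb, txt_emb, txt_emb)
    # Text queries over image keys/values: image-information-weighted
    # text token features.
    txt_feats, _ = cross_modal_attention(txt_emb, img_emb, img_emb)

The (16, 20) attention map from the first call is the block-by-token weight matrix that claims 1 and 3 reuse for entity-pair extraction.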
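Claim 3's pairing step then reduces to thresholding that attention map and averaging the matched feature vectors. Continuing the sketch above (the 0.3 threshold is an arbitrary illustrative value, not taken from the patent):

    def extract_entity_pairs(attn_map, img_feats, txt_feats, threshold=0.3):
        # Keep (block, token) positions whose cross-modal attention
        # weight exceeds the preset threshold; represent each matched
        # pair by the position-wise mean of its two feature vectors.
        pairs = []
        for i, j in zip(*np.nonzero(attn_map > threshold)):
            joint = (img_feats[i] + txt_feats[j]) / 2.0
            pairs.append(((i, j), joint))
        return pairs

    entity_pairs = extract_entity_pairs(attn_map, img_feats, txt_feats)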
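The posterior evaluation in claim 4 is a standard Bayes update: the prediction heads supply the prior, and the knowledge-graph evidence supplies the two likelihoods named in the claim. A minimal numeric reading of the reconstructed formula:

    def posterior_sensitivity(p_init, p_pub_given_sens, p_pub_given_nonsens):
        # Bayes' rule: prior = initial sensitivity prediction;
        # likelihoods = probability of 'public attribute' evidence for
        # truly sensitive vs. non-sensitive entities.
        num = p_pub_given_sens * p_init
        return num / (num + p_pub_given_nonsens * (1.0 - p_init))

    # A public figure: a high initial prediction (0.9) is pulled down to
    # about 0.36 by strong public-attribute evidence.
    print(posterior_sensitivity(0.9, 0.05, 0.8))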
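Claim 5's selection step scores every candidate operation by the trade-off s·E(a) − λ·L(a) and keeps the maximizer. A sketch with hypothetical operation names and efficacy/loss values:

    def choose_operation(score, operations, lam=0.5):
        # operations maps name -> (efficacy E in [0, 1], utility loss L);
        # pick the operation maximizing score * E(a) - lam * L(a).
        return max(operations,
                   key=lambda a: score * operations[a][0] - lam * operations[a][1])

    ops = {"keep": (0.0, 0.0), "blur": (0.6, 0.2),
           "mask": (0.9, 0.6), "delete": (1.0, 1.0)}
    print(choose_operation(0.9, ops))  # high sensitivity -> "mask"
    print(choose_operation(0.1, ops))  # low sensitivity  -> "keep"

Raising λ makes the selector more conservative about destroying utility, which is exactly the privacy-versus-usability dial the claim describes.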

Description

Multi-modal training data desensitization method

Technical Field

The application relates to the field of intelligent management, and more particularly to a multi-modal training data desensitization method.

Background

With the rapid development of artificial intelligence technology, and in particular the rise of multi-modal large models, enterprises and research institutions increasingly gather and exploit multi-modal data such as images and text. This massive data often contains sensitive personal information or trade secrets, so maximizing the value of the data while guaranteeing its privacy and traceability has become a pressing problem, and constructing an efficient, reliable multi-modal training data desensitization scheme is particularly important.

In the prior art, however, privacy protection schemes for multi-modal data usually process each modality independently, developing a separate desensitization tool for each. This separated pipeline produces information silos and severe inconsistencies when desensitizing cross-modally correlated information. Specifically, when processing multi-modal data such as image-text pairs or video captions, one modality may be desensitized while the other still retains the sensitive information; an attacker can exploit this inconsistency for identity re-identification, creating a risk of cross-modal information leakage.

A further drawback is that the core processing logic of existing mechanisms is a context-blind, one-size-fits-all rule. It equates strong correlation between cross-modal features directly with sensitivity requiring desensitization, ignoring the public nature of the entity and the true intent of the information release. As a result, when handling public figures or information authorized for disclosure, the system desensitizes unnecessarily, erroneously treating public information as a privacy risk and severely undermining the intrinsic value and usability of the training data. In addition, because sensitivity is evaluated as a static attribute determined by the entity class, the mechanism cannot adapt dynamically to complex context changes and cannot distinguish the differing sensitivity of the same entity under different disclosure intents, leading to erroneous masking and over-protection at the expense of data value. An optimized multi-modal training data desensitization method is therefore desired.

Disclosure of the Invention

The present application has been made to solve the above technical problems. The embodiment of the application provides a multi-modal training data desensitization method that first performs structural preprocessing on the original multi-modal data; then, through cross-modal feature extraction and collaborative attention fusion, generates text-weighted image features, image-weighted text features, and the key cross-modal attention map; performs entity identity prior detection by combining a knowledge graph, and uses Bayesian inference to adaptively and collaboratively predict sensitivity based on the attention map and a privacy policy library; and finally generates a refined desensitization plan through variational optimization to achieve accurate desensitization of the original data. The method overcomes the inconsistent and one-size-fits-all desensitization found in prior multi-modal data privacy protection: sensitive information is fully protected while the intrinsic value and training usability of the data are preserved to the greatest extent, providing a high-quality, compliant, and valuable training data set for large artificial intelligence models.

According to one aspect of the present application, there is provided a multi-modal training data desensitization method comprising: acquiring original multi-modal data; performing structural preprocessing on the original multi-modal data to obtain an image block sequence and a text token sequence (illustrated in the sketch below); performing cross-modal feature extraction and collaborative attention fusion on the image block sequence and the text token sequence to obtain a text-information-weighted image block feature sequence, an image-information-weighted text token feature sequence, and a cross-modal attention map; performing sensitivity collaborative prediction and desensitization policy generation on the two weighted feature sequences based on the cross-modal attention map and the privacy policy library to obtain a desensitization plan; and desensitizing the original multi-modal data based on the desensitization plan to obtain desensitized multi-modal data.
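
The structural preprocessing recited above can be illustrated with a minimal sketch; the 16-pixel block size and whitespace tokenizer are stand-ins for whatever image processor and text processor an implementation actually uses:

    import numpy as np

    def patchify(image, patch=16):
        # Split an (H, W, C) array whose sides are multiples of `patch`
        # into a sequence of flattened image blocks.
        h, w, c = image.shape
        return (image.reshape(h // patch, patch, w // patch, patch, c)
                     .transpose(0, 2, 1, 3, 4)
                     .reshape(-1, patch * patch * c))

    def tokenize(text):
        # Whitespace stand-in for the text processor's word segmentation.
        return text.split()

    img = np.zeros((224, 224, 3))  # size-standardized image
    print(patchify(img).shape)     # (196, 768): 14 x 14 blocks of 16x16x3
    print(tokenize("employee badge photo with visible id"))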