CN-116842566-B - Personal image privacy protection method, system and electronic equipment
Abstract
The invention discloses a personal image privacy protection method, a personal image privacy protection system, and an electronic device. The method allows a user to specify, through a language expression, which person in an image should be protected. A lightweight deep neural network encodes the referring text and the personal image in parallel to generate multi-scale visual features in which image and text information are fully fused, and during decoding a multi-scale feature fusion and mask-positioning enhancement module generates a stable privacy-protection mask for the specified person. The invention addresses two problems of existing personal-image privacy-protection techniques: over-protection of image content, and pixel imbalance during training of the privacy-protection network.
Inventors
- Li Zhiyong
- Chen Jiajun
- Lin Jiacheng
- Xiao Zhiqiang
- Fu Haolong
- Wang Zian
- Liu Hanhao
- Guo Yihu
Assignees
- Hunan University (湖南大学)
Dates
- Publication Date: 2026-05-08
- Application Date: 2023-07-06
Claims (5)
- 1. A personal image privacy protection method, comprising the steps of: step one, constructing a privacy-protection data set of referring personal images, wherein each item in the data set comprises a person-related image, a corresponding referring text description, and a ground-truth annotation mask; step two, inputting the image and the referring text description into a text-aware visual encoding network and extracting multi-scale visual features with referred local information enhanced; step three, inputting the multi-scale visual features with referred local information enhanced into a multi-scale feature decoding network, outputting a high-quality image mask, and thereby obtaining the precise position of the person to be protected in the image; step four, training and optimizing the network based on a balanced binary cross-entropy loss function, which alleviates the pixel-imbalance problem during network training by introducing a balance-coefficient term into the binary cross-entropy; wherein, in step two, the visual encoding network comprises an image encoding module, a text encoding module, and an image-text multi-scale encoding module, wherein: the image encoding module is used for extracting multi-scale visual features; the text encoding module is used for encoding the input text into word vectors; and the image-text multi-scale encoding module is used for feeding the image-modality and text-modality features extracted from the raw data into a text-aware attention mechanism with a text-aware control gate, enhancing the visual local information referred to by the multi-scale visual features; and wherein, in step three, the multi-scale feature decoding network comprises a multi-scale feature fusion module and a mask-positioning enhancement module, wherein: the multi-scale feature fusion module is used for further filtering, supplementing, and fusing the multi-scale features through a multi-scale up-sampling network with a noise-information filtering gate; the mask-positioning enhancement module locates the privacy object based on the features extracted by the multi-scale feature fusion module; and the mask-positioning enhancement module comprises two successive steps: dual-attention feature enhancement, in which a dual attention mechanism with a channel self-attention branch and a spatial self-attention branch enhances local feature representation and mask-positioning features; and positioning-mask generation, in which a modified ASPP captures the privacy-protection target over multi-scale receptive fields at different dilation rates, generating a high-quality mask for subsequent privacy-protection post-processing operations.
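The dual-attention step in claim 1 can be sketched in plain NumPy. This is a minimal illustration, not the patent's implementation: it assumes a feature map of shape (C, H, W), computes a channel self-attention branch (C×C affinity between channel maps) and a spatial self-attention branch (N×N affinity between pixel positions), and sums both branch outputs with the input as a residual. Learned projections and scaling factors, which a real network would have, are omitted.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def dual_attention(feat):
    """Dual attention over a (C, H, W) feature map: a channel
    self-attention branch plus a spatial self-attention branch,
    both added to the input as residuals (illustrative sketch)."""
    C, H, W = feat.shape
    x = feat.reshape(C, H * W)                    # (C, N)

    # Channel branch: affinity between channel maps.
    chan_attn = softmax(x @ x.T, axis=-1)         # (C, C)
    chan_out = (chan_attn @ x).reshape(C, H, W)

    # Spatial branch: affinity between pixel positions.
    spat_attn = softmax(x.T @ x, axis=-1)         # (N, N)
    spat_out = (x @ spat_attn.T).reshape(C, H, W)

    return feat + chan_out + spat_out
```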
- 2. The method of claim 1, wherein in step one, the data set comprises a training set, a test set, and a validation set, the three being partitioned in a ratio of 8:1:1.
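The 8:1:1 partition of claim 2 amounts to a shuffled three-way split. A minimal sketch (the seed and the remainder-handling policy are assumptions, not specified by the patent):

```python
import random

def split_dataset(samples, ratios=(8, 1, 1), seed=0):
    """Shuffle samples and partition them into train/test/validation
    subsets according to the given ratio (8:1:1 by default).
    Any rounding remainder falls into the validation split."""
    samples = list(samples)
    random.Random(seed).shuffle(samples)
    total = sum(ratios)
    n_train = len(samples) * ratios[0] // total
    n_test = len(samples) * ratios[1] // total
    train = samples[:n_train]
    test = samples[n_train:n_train + n_test]
    val = samples[n_train + n_test:]
    return train, test, val
```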
- 3. The method according to claim 2, wherein in step one, the data set is divided into three classes according to the content of the referring text description, specifically comprising: a first class, with shorter text descriptions of fewer than 10 words that contain positional-information words; a second class, with text descriptions of intermediate length, from 10 to 15 words, that contain appearance words; and a third class, with longer, more descriptive texts that contain both positional-information words and appearance words.
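The three-class partition in claim 3 could be realized by a simple length-and-keyword rule. The keyword lists below are illustrative assumptions (the patent does not enumerate them); the word-count thresholds follow the claim.

```python
# Illustrative keyword lists; the patent does not enumerate them.
POSITION_WORDS = {"left", "right", "front", "behind", "middle", "top", "bottom"}
APPEARANCE_WORDS = {"red", "blue", "tall", "short", "hat", "glasses", "shirt"}

def classify_description(text):
    """Assign a referring text description to one of the three
    data-set classes of claim 3 by word count and word content.
    Returns 0 when no class matches (sketch only)."""
    words = [w.strip(".,") for w in text.lower().split()]
    has_pos = any(w in POSITION_WORDS for w in words)
    has_app = any(w in APPEARANCE_WORDS for w in words)
    if has_pos and has_app:
        return 3                      # longer text, both kinds of cues
    if len(words) < 10 and has_pos:
        return 1                      # short text, positional cues
    if 10 <= len(words) <= 15 and has_app:
        return 2                      # medium text, appearance cues
    return 0                          # unclassified
```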
- 4. A system for performing the personal image privacy protection method of any one of claims 1-3, comprising: a data construction module, for constructing a privacy-protection data set of referring personal images, wherein each item in the data set comprises a person-related image, a corresponding referring text description, and a ground-truth annotation mask; a feature extraction module, for inputting the image and the referring text description into a text-aware visual encoding network and extracting multi-scale visual features with referred local information enhanced; a feature decoding module, for inputting the multi-scale visual features with referred local information enhanced into a multi-scale feature decoding network, outputting a high-quality image mask, and obtaining the precise position of the person to be protected in the image; and a network training module, for training and optimizing the network based on the balanced binary cross-entropy loss function, which alleviates the pixel-imbalance problem during network training by introducing a balance-coefficient term into the binary cross-entropy.
- 5. An electronic device comprising a processor and a memory, the memory having stored therein computer program instructions that, when read and executed by the processor, perform the method of any one of claims 1-3.
Description
Personal image privacy protection method, system and electronic equipment

Technical Field

The application belongs to the technical field of multi-modal image segmentation and image privacy protection, and particularly relates to a personal image privacy protection method, system, and electronic device.

Background

The explosive proliferation of network platforms and social media has greatly enriched people's lives. However, as personal information is shared on social networks, large numbers of personal pictures may be transmitted and shared, which can lead to the disclosure of private information, especially identity information, and cause potential privacy infringement. Existing image privacy protection methods fall mainly into traditional methods and deep learning methods. Traditional privacy protection schemes typically employ automatic masking strategies, such as mosaicing or blurring the entire image, which inevitably results in serious information loss. In recent years, benefiting from the development of deep learning and from advances in face detection and human-pose recognition, detection methods can automatically detect all faces and bodies in an image and then blur them to filter out private information. While these methods can effectively protect personal privacy, when a user wishes to hide only one person or a subset of the people in an image, they face the problem of over-protection of content: people the user does not want masked are inevitably masked as well, causing unnecessary loss of image information. In such cases, existing methods generally require manual blurring of image regions, incurring additional manual operation costs.
Recently, with the progress of multi-modal technology, referring image segmentation has made it possible to locate and capture the image region corresponding to a complex language expression or audio input, providing a potential solution for locating the objects or private information to be protected in an image by means of text or voice. Audio and text input have two advantages. First, because text contains rich high-dimensional semantic information, it has proven to offer excellent summarization and efficient object localization in the fields of referring segmentation and semantic representation. Second, interacting with devices by voice is already part of daily life. Referring personal-image privacy protection therefore has a wide range of application scenarios, such as secure video conferencing. However, existing referring image segmentation methods rarely consider the training loss between referred and non-referred image pixels under binary cross-entropy (BCE); because the referred region occupies only a small proportion of the image, a training-pixel imbalance problem arises. This imbalance seriously impedes the development of referring-image privacy protection technology and can obstruct the optimization of the network, so that the referred object cannot be accurately located, severely affecting the practical performance of the method in privacy protection.

Disclosure of Invention

The invention discloses a personal image privacy protection method, system, and electronic device, which can effectively solve the technical problems described in the background above.
In order to achieve the above purpose, the technical scheme of the invention is as follows: a personal image privacy protection method comprising the steps of: step one, constructing a privacy-protection data set of referring personal images, wherein each item in the data set comprises a person-related image, a corresponding referring text description, and a ground-truth annotation mask; step two, inputting the image and the referring text description into a text-aware visual encoding network and extracting multi-scale visual features with referred local information enhanced; step three, inputting the multi-scale visual features with referred local information enhanced into a multi-scale feature decoding network, outputting a high-quality image mask, and obtaining the precise position of the person to be protected in the image; and step four, training and optimizing the network based on a balanced binary cross-entropy loss function, alleviating the pixel-imbalance problem during network training by introducing a balance-coefficient term into the binary cross-entropy. As a preferred improvement of the present invention, in step one, the data set includes a training set, a test set, and a validation set, divided according to a ratio of 8:1:1. As a preferred imp
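The balanced binary cross-entropy of step four can be sketched as ordinary BCE with a balance coefficient weighting the rare foreground (referred) pixels. The coefficient form below, beta = (# background pixels) / (# all pixels), is one common formulation; the patent's exact balance-coefficient term is not reproduced here and may differ.

```python
import math

def balanced_bce(preds, labels, eps=1e-7):
    """Balanced binary cross-entropy over per-pixel predictions:
    foreground terms are weighted by beta = n_background / n_total,
    background terms by (1 - beta), so the scarce referred pixels
    are not drowned out (one common formulation, not the patent's
    exact coefficient)."""
    n = len(labels)
    n_pos = sum(labels)
    beta = (n - n_pos) / n            # weight on foreground terms
    loss = 0.0
    for p, y in zip(preds, labels):
        p = min(max(p, eps), 1 - eps) # clamp to avoid log(0)
        loss += -(beta * y * math.log(p)
                  + (1 - beta) * (1 - y) * math.log(1 - p))
    return loss / n
```

With a heavily imbalanced label vector, confident correct predictions yield a lower loss than uninformative ones, while the beta weighting keeps the few positive pixels from being negligible in the gradient.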