CN-122019787-A - Privacy ontology construction method and system for social network

CN122019787A

Abstract

The invention discloses a method and a system for constructing a social-network-oriented privacy ontology, and relates to the technical field of security information processing. The method comprises: acquiring a social network data set and extracting candidate privacy attributes; calculating the information entropy of each candidate privacy attribute and its mutual information with an identity attribute or a target sensitive attribute, normalizing both, and weighting them to obtain a comprehensive sensitivity score used to screen a privacy attribute set; calculating the conditional mutual information of each privacy attribute pair under the constraint of a condition attribute set as the dependency strength; determining the inference direction based on a conditional entropy reduction criterion, and normalizing the dependency strength to obtain a relationship weight; constructing a privacy ontology comprising attribute nodes and weighted dependency edges; and receiving data batches by time slice, performing new-node detection, incremental statistics updating, and relationship evidence accumulation, and incrementally updating the ontology structure or weights. This technical scheme quantifies privacy attribute sensitivity and models inference relationships, improving the accuracy and interpretability of privacy risk identification.
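The sensitivity screening step summarized in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the function names, the `w_entropy` weight, the `eps` value, and the use of base-2 logarithms are all assumptions made for illustration. It estimates entropy and mutual information from empirical counts, applies min-max normalization with a small positive constant in the denominator, and combines the two normalized values with weights that sum to 1.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (bits) of the empirical distribution of `values`."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def mutual_information(xs, ys):
    """I(X;Y) = H(X) + H(Y) - H(X,Y), estimated from paired samples."""
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))


def min_max(scores, eps=1e-12):
    """Min-max normalization into [0, 1]; eps (assumed value) avoids a zero denominator."""
    lo, hi = min(scores.values()), max(scores.values())
    return {k: (v - lo) / (hi - lo + eps) for k, v in scores.items()}


def sensitivity_scores(records, candidates, target, w_entropy=0.5):
    """Weighted sum of normalized entropy and normalized MI with `target`.

    `records` is a list of dicts; `w_entropy` and (1 - w_entropy) are the
    two preset weights (hypothetical default of 0.5 each).
    """
    ys = [r[target] for r in records]
    h = {a: entropy([r[a] for r in records]) for a in candidates}
    mi = {a: mutual_information([r[a] for r in records], ys) for a in candidates}
    h_n, mi_n = min_max(h), min_max(mi)
    return {a: w_entropy * h_n[a] + (1 - w_entropy) * mi_n[a] for a in candidates}
```

Attributes whose score exceeds a chosen cut-off would then form the privacy attribute set; the cut-off itself is not specified here and would have to be chosen by the implementer.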

Inventors

  • ZHU NAFEI
  • ZHANG CHUNHUI
  • HE JINGSHA

Assignees

  • Beijing University of Technology (北京工业大学)

Dates

Publication Date
2026-05-12
Application Date
2026-01-22

Claims (10)

  1. A social-network-oriented privacy ontology construction method, characterized by comprising the following steps: acquiring a social network data set, wherein the social network data set comprises a plurality of user records, and each user record comprises a plurality of privacy attribute variables and their corresponding values, as well as a user identity attribute variable or a target sensitive attribute variable and its corresponding value; extracting candidate privacy attributes from the social network data set to form a candidate privacy attribute set; for each candidate privacy attribute in the candidate privacy attribute set, calculating its information entropy based on the probability distribution obtained from statistics over all user samples, and calculating the mutual information between the candidate privacy attribute and the user identity attribute variable or the target sensitive attribute variable; normalizing the information entropy and the mutual information respectively, constructing a comprehensive sensitivity score according to preset weights, and screening according to the comprehensive sensitivity score to obtain a privacy attribute set; for each privacy attribute pair in the privacy attribute set, calculating the conditional mutual information under the constraint of a preset condition attribute set, and taking the conditional mutual information as the dependency strength between the privacy attribute pair; constructing a privacy ontology based on the privacy attribute set, wherein the privacy ontology comprises attribute nodes and attribute dependency edges characterized by inference directions and relationship weights; dividing the social network data into a plurality of time slices according to a preset time granularity, and receiving the corresponding data batches in each time slice; for each data batch, performing new-node detection and node initialization for attribute nodes, incremental updating of node statistics, and accumulation of attribute relationship observation evidence; and, when a time slice ends, uniformly judging the candidate attribute dependency relationships accumulated in the time slice, and when the dependency strength corresponding to a candidate attribute dependency relationship exceeds a preset threshold, incrementally updating the structure or the relationship weights of the privacy ontology.
  2. The social-network-oriented privacy ontology construction method according to claim 1, wherein the normalization of the information entropy and the mutual information uses extremum (min-max) normalization over the candidate privacy attribute set, so that the normalized results fall within the interval [0,1], and a very small positive number is introduced to avoid a zero denominator; the comprehensive sensitivity score is the weighted sum of the normalized information entropy and the normalized mutual information according to the preset weights, and the preset weights sum to 1.
  3. The method according to claim 1, wherein the probability distributions used for calculating the information entropy and the mutual information are estimated from statistics over all user samples in the social network data set, and comprise at least the marginal probability distribution of each candidate privacy attribute and the joint probability distribution of the candidate privacy attribute and the user identity attribute variable or the target sensitive attribute variable.
  4. The method according to claim 1, wherein the preset condition attribute set is the set of privacy attribute variables other than the privacy attribute pair being analyzed, or a subset of condition attributes selected from the privacy attribute set based on relevance and comprehensive sensitivity score.
  5. The social-network-oriented privacy ontology construction method according to claim 1, wherein, when privacy attribute pairs are uniformly judged at the end of a time slice, a statistical significance test is performed on their conditional mutual information, and when the conditional mutual information exceeds the preset threshold and passes the statistical significance test, the privacy attribute pair is determined to have a stable attribute dependency relationship.
  6. The method of claim 1, wherein determining the inference direction based on the conditional entropy reduction criterion comprises: calculating a first conditional entropy reduction and a second conditional entropy reduction under the same preset condition attribute set, the first conditional entropy reduction characterizing the degree to which introducing the first privacy attribute reduces the uncertainty of the second privacy attribute when the condition attribute set is known, and the second conditional entropy reduction characterizing the degree to which introducing the second privacy attribute reduces the uncertainty of the first privacy attribute when the condition attribute set is known; determining the attribute that yields the larger conditional entropy reduction and passes a statistical significance test as the starting attribute of the inference direction; and establishing the corresponding directed attribute dependency edge in the privacy ontology.
  7. The social-network-oriented privacy ontology construction method according to claim 1, wherein the privacy ontology further comprises semantic extension information comprising at least fine-grained sub-attributes of privacy attributes, candidate synonyms, and hypernym and hyponym concepts; semantic support weights for the semantic extension information are calculated from concept co-occurrence statistics over a social network corpus, and the semantic support weights are used for attribute node description and attribute relationship interpretation in the privacy ontology.
  8. The social-network-oriented privacy ontology construction method according to claim 1, wherein performing new-node detection and node initialization for the data batch comprises: when the data batch is detected to contain a privacy attribute variable that does not yet appear in the privacy ontology, adding a corresponding attribute node to the privacy ontology and initializing the value space, statistical counts, and probability estimation parameters of the attribute node; and performing incremental updating of node statistics comprises: incrementally updating attribute value occurrence frequencies, marginal probability distributions, and joint statistics, and updating the statistical basis for online estimation of information entropy, mutual information, and conditional mutual information using a cumulative counting or sliding window mechanism.
  9. The social-network-oriented privacy ontology construction method according to claim 1, wherein accumulating attribute relationship observation evidence comprises: taking privacy attribute pairs that co-occur in a data batch as observation evidence and updating the corresponding joint statistics; calculating the conditional mutual information of a candidate attribute dependency relationship when the samples accumulated in the time slice meet a minimum sample size or stability requirement; and, when the time slice ends, if the candidate attribute dependency relationship meets the unified judgment condition, adding a new edge for an attribute dependency relationship that does not yet exist in the privacy ontology, and updating the relationship weight of an attribute dependency edge that already exists.
  10. A social-network-oriented privacy ontology construction system for implementing the social-network-oriented privacy ontology construction method according to any one of claims 1 to 9, comprising: a data acquisition module, used for acquiring a social network data set, wherein the social network data set comprises a plurality of user records, and each user record comprises a plurality of privacy attribute variables and their corresponding values, as well as a user identity attribute variable or a target sensitive attribute variable and its corresponding value; a privacy attribute extraction module, used for extracting candidate privacy attributes from the social network data set to form a candidate privacy attribute set; a sensitivity evaluation module, used for calculating, for each candidate privacy attribute in the candidate privacy attribute set, its information entropy based on the probability distribution obtained from statistics over all user samples, calculating the mutual information between the candidate privacy attribute and the user identity attribute variable or the target sensitive attribute variable, normalizing the information entropy and the mutual information respectively, constructing a comprehensive sensitivity score according to preset weights, and screening according to the comprehensive sensitivity score to obtain a privacy attribute set; an attribute dependency analysis module, used for calculating, for each privacy attribute pair in the privacy attribute set, the conditional mutual information under the constraint of a preset condition attribute set, and taking the conditional mutual information as the dependency strength between the privacy attribute pair; a privacy ontology construction module, used for constructing a privacy ontology based on the privacy attribute set, wherein the privacy ontology comprises attribute nodes and attribute dependency edges characterized by inference directions and relationship weights; and a time-slice incremental update module, used for dividing the social network data into a plurality of time slices according to a preset time granularity and receiving the corresponding data batches in each time slice, performing, for each data batch, new-node detection and node initialization for attribute nodes, incremental updating of node statistics, and accumulation of attribute relationship observation evidence, uniformly judging, when a time slice ends, the candidate attribute dependency relationships accumulated in the time slice, and incrementally updating the structure or the relationship weights of the privacy ontology when the dependency strength corresponding to a candidate attribute dependency relationship exceeds a preset threshold.
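The dependency analysis in claims 1 and 6 can be sketched in Python as follows. This is an illustrative reading under stated assumptions, not the claimed implementation. One point needs care: the absolute entropy reductions H(B|C) - H(B|A,C) and H(A|C) - H(A|B,C) are both equal to I(A;B|C), so they cannot distinguish a direction on their own. The sketch therefore assumes that the "reduction degree" is a relative reduction, i.e. the conditional mutual information divided by the baseline conditional entropy of the attribute being explained. The names `infer_direction`, `threshold`, and `eps` are hypothetical, and the statistical significance test is omitted.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (bits) of the empirical distribution of `values`."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def cond_entropy(xs, zs):
    """H(X|Z) = H(X,Z) - H(Z)."""
    return entropy(list(zip(xs, zs))) - entropy(zs)


def conditional_mi(xs, ys, zs):
    """I(X;Y|Z) = H(X|Z) - H(X|Y,Z)."""
    return cond_entropy(xs, zs) - cond_entropy(xs, list(zip(ys, zs)))


def infer_direction(records, a, b, cond, threshold=0.05, eps=1e-12):
    """Return (source, target, strength) or None if the pair is too weakly dependent.

    Assumption: direction is chosen by the larger *relative* entropy
    reduction; ties default to a -> b.
    """
    xs = [r[a] for r in records]
    ys = [r[b] for r in records]
    zs = [tuple(r[c] for c in cond) for r in records]
    strength = conditional_mi(xs, ys, zs)
    if strength <= threshold:
        return None
    reduction_of_b = strength / (cond_entropy(ys, zs) + eps)  # a explains b
    reduction_of_a = strength / (cond_entropy(xs, zs) + eps)  # b explains a
    if reduction_of_b >= reduction_of_a:
        return a, b, strength
    return b, a, strength
```

Conditioning is what separates direct dependency from indirect coupling: two attributes that are both copies of a third appear strongly dependent with an empty condition set, but their conditional mutual information drops to zero once the third attribute is included in `cond`.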

Description

Privacy ontology construction method and system for social network

Technical Field

The invention relates to the technical field of security information processing, in particular to a social-network-oriented privacy ontology construction method and a social-network-oriented privacy ontology construction system.

Background

Online social network platforms accumulate large amounts of user profile and interaction data. User records typically contain a variety of attribute variables and their values, and are associated with user identity attributes or other sensitive attributes. In privacy protection and risk assessment scenarios, the semantic concepts of privacy attributes and their interdependencies need to be expressed in a structured manner to support privacy disclosure path analysis, risk measurement and inference, and the formulation of handling policies.

Existing privacy ontology construction schemes mainly revolve around extracting privacy policy or compliance terms, modeling a predefined privacy concept system, and organizing rules or reasoning mechanisms, or they build an ontology structure from a privacy attribute list and relationship rules preset for a specific business scenario. The following disadvantages are common in practical deployment:

First, the sensitivity of a privacy attribute is judged by manual experience or by a single index, and a unified quantitative measurement framework over all user samples is lacking, so the sensitivity of different attributes is difficult to compare side by side and the screening criteria for privacy attributes are inconsistent.
Second, attribute dependencies are mostly characterized by correlations or rules. Quantitative judgment of the dependency strength under the constraint of condition attributes is absent, direct dependency and indirect coupling are difficult to distinguish, and unstable relationships are easily introduced.

Third, the directions of attribute relationships are usually set manually. Because a mechanism for determining the inference direction based on uncertainty reduction is lacking, it is difficult to form a directed dependency structure that can be used for inference path analysis.

Fourth, social network data is continuously updated, whereas existing ontologies are mostly built offline in a single pass. They lack a mechanism for incrementally updating statistics on batch data according to a time granularity and for uniformly judging and updating the structure and relationship weights at time slice boundaries, so the ontology is difficult to keep consistent and valid as the data evolves.

Therefore, there is a need for a social-network-oriented privacy ontology construction method that performs quantitative sensitivity screening of candidate privacy attributes based on statistics over all user samples, determines the dependency strength of privacy attribute pairs under condition constraints, determines the inference direction and relationship weight, and supports incremental updating of data batches by time slice with unified judgment and updating, thereby forming a privacy ontology structure that can evolve sustainably.
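The time-slice mechanism called for above (accumulating statistics batch by batch, then making one unified decision at the slice boundary) can be sketched in Python. This is a deliberately simplified illustration under assumptions: it uses plain mutual information instead of conditional mutual information, stores undirected edges, and omits the significance test. The class name `SliceAccumulator` and the parameters `threshold` and `min_samples` are hypothetical.

```python
import math
from collections import Counter, defaultdict


def mi_from_joint(joint, n):
    """Mutual information (bits) from a Counter of (value_a, value_b) counts."""
    pa, pb = Counter(), Counter()
    for (va, vb), c in joint.items():
        pa[va] += c
        pb[vb] += c
    return sum(
        (c / n) * math.log2(c * n / (pa[va] * pb[vb]))
        for (va, vb), c in joint.items()
    )


class SliceAccumulator:
    """Accumulates pairwise evidence within one time slice (simplified sketch)."""

    def __init__(self, threshold=0.1, min_samples=4):
        self.threshold = threshold      # assumed dependency-strength cut-off
        self.min_samples = min_samples  # assumed minimum sample size
        self.nodes = set()              # attribute nodes of the ontology
        self.edges = {}                 # (attr_a, attr_b) -> weight
        self._joint = defaultdict(Counter)

    def ingest(self, batch):
        """Process one data batch: detect new nodes and update joint counts."""
        for record in batch:
            attrs = sorted(record)
            self.nodes.update(attrs)  # new attribute nodes are added here
            for i, a in enumerate(attrs):
                for b in attrs[i + 1:]:
                    self._joint[(a, b)][(record[a], record[b])] += 1

    def close_slice(self):
        """Unified decision at the slice boundary: add or re-weight edges."""
        for pair, joint in self._joint.items():
            n = sum(joint.values())
            if n < self.min_samples:
                continue
            strength = mi_from_joint(joint, n)
            if strength > self.threshold:
                self.edges[pair] = strength
        self._joint.clear()
```

No edge changes while batches are being ingested; the ontology is only modified in `close_slice`, which mirrors the idea of deferring structural decisions to the time slice boundary. This sketch resets the counts after each slice, whereas a cumulative or sliding-window variant would keep them.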
Disclosure of Invention

In the social-network-oriented privacy ontology construction method and system, candidate privacy attributes are extracted from a social network data set; the information entropy of each attribute and its mutual information with the user identity attribute variable or the target sensitive attribute variable are calculated based on probability distributions estimated from all user samples; and a comprehensive sensitivity score is built by combining normalization and weighting. This achieves unified quantitative screening of privacy attributes and avoids the inconsistent criteria and poor comparability caused by relying only on manual experience or a single index.

The conditional mutual information of each privacy attribute pair is calculated under the constraint of a preset condition attribute set and used as the dependency strength; the inference direction is determined based on a conditional entropy reduction criterion; and the relationship weight is formed by normalizing the dependency strength. This yields attribute dependency edges with both direction and strength, reduces the risk of misjudging indirect coupling as direct dependency, and improves the stability and interpretability of inference path analysis.

Time slices are divided according to a time granularity; new-node detection, incremental statistics updating, and observation evidence accumulation are performed on batch data; and the ontology structure or relationship weights are uniformly judged and updated when each time slice ends, so that the privacy ontology can remain consistent and evolve over time slices, the stability and effectiveness of privacy attribute pair relationships can be continuously improved, and the priv