
CN-121984710-A - Government system-based data management and data analysis system and analysis method thereof


Abstract

The invention relates to the technical field of batch-type secure computation and discloses a government system-based data management and data analysis system and an analysis method thereof. The method comprises: acquiring real-time sensitive data streams from each department and generating distributed ciphertext streams through streaming secret sharing; computing associated feature fingerprints with homomorphic hashes and outputting a ciphertext feature mapping table; computing ciphertext graph node attributes in parallel through secure multiparty computation to generate a distributed ciphertext graph; inferring encrypted event propagation paths with a graph neural network reasoner; selectively decrypting high-risk paths through threshold decryption and differential privacy and outputting a desensitized early warning report; updating the ciphertext graph over a sliding time window and performing anomaly recognition and risk scoring through difference calculation; and balancing computational complexity against response time by adaptively adjusting the encryption granularity and update frequency to generate a delay-privacy-accuracy balance strategy. The method thereby achieves the triple objectives of privacy protection, real-time response and in-depth analysis.

Inventors

  • CAO JIANMING
  • LIN YIZHU
  • LEI JUN
  • LIN PINLE
  • LAI WEIRONG

Assignees

  • 福建省大数据集团三明有限公司

Dates

Publication Date
20260505
Application Date
20251226

Claims (10)

  1. A government system-based data management and data analysis method, characterized by comprising the following steps: acquiring real-time sensitive data streams of each department, normalizing numeric data to scale it into a range […] that fits a finite field […], and converting categorical data into integer identifiers by numerical encoding; executing a streaming secret sharing algorithm on each data record to generate a distributed ciphertext stream; calculating associated feature fingerprints of the ciphertext data based on a homomorphic hash function, wherein the homomorphic hash function is applied to each ciphertext fragment to calculate its feature fingerprint, potential data associations are identified by comparing the Euclidean distances between the hash values of different ciphertext fragments, and a ciphertext feature mapping table is output; calculating ciphertext graph node attributes in parallel using a secure multiparty computation protocol, wherein each department calculates graph node attributes for its local ciphertext fragments, including node identifiers, node types and node weights, the node types are converted into numerical vectors by one-hot encoding, the node weights are scaled into the range […] by Min-Max normalization, and edge relationship information is exchanged through an oblivious transfer protocol to generate a distributed ciphertext graph structure […]; inputting the ciphertext graph structure to a graph neural network reasoner based on secret sharing, wherein an input layer of the graph neural network receives the ciphertext graph structure and a node feature matrix and performs Z-score standardization, a plurality of graph convolution layers perform message-passing operations in the ciphertext domain and update node features through secure matrix multiplication under the BGV homomorphic encryption scheme and a secure ReLU activation function based on piecewise linear approximation, an output layer generates a risk score vector for each node, and a path extraction decoder with an attention mechanism iteratively generates a high-risk propagation path sequence; and selectively decrypting high-risk association paths based on a threshold decryption and differential privacy mechanism, wherein a path comprehensive risk score […] is calculated, threshold decryption is triggered when a preset threshold is exceeded and requires at least the threshold number […] of departments to provide their ciphertext fragments, the original data is restored using the Lagrange interpolation formula […], Laplace noise […] is added to the decrypted data through the differential privacy mechanism, and a desensitized real-time early warning report is generated; wherein the streaming secret sharing algorithm adopts a streaming variant of the Shamir secret sharing scheme to decompose each arriving data record into […] ciphertext fragments such that the original data can be restored only when no fewer than […] fragments are obtained; the homomorphic hash function satisfies a homomorphic property such that the hash values of ciphertext fragments corresponding to original data with the same feature attribute satisfy […]; and the graph neural network executes multi-hop propagation calculation in the ciphertext domain through secure matrix multiplication and secure activation functions.
  2. The government system-based data management and data analysis method of claim 1, wherein the streaming secret sharing algorithm comprises: step 2.1, receiving a single data record from the real-time data stream; step 2.2, generating random coefficients, each taken from a finite field whose modulus is a large prime number with a value range of […] and satisfying a condition […] that optimizes modular operation performance; step 2.3, constructing a polynomial of degree […] in which the constant term is the data record value and the remaining coefficients are the random coefficients; step 2.4, calculating the function values of the polynomial at the evaluation points […] as the ciphertext fragments, the number of ciphertext fragments being the number of participating departments; step 2.5, sending each ciphertext fragment to its corresponding department; and step 2.6, clearing the local polynomial coefficients to ensure forward security; wherein the threshold takes a value in the range […], being set to […] when the number of participating departments satisfies […] and to […] when it satisfies […].
  3. The government system-based data management and data analysis method of claim 1, wherein calculating the associated feature fingerprints of the ciphertext data based on the homomorphic hash function comprises: step 3.1, applying the homomorphic hash function to each ciphertext fragment to calculate its feature fingerprint, the input of the homomorphic hash function being a ciphertext fragment and the output being a fixed-length hash value; step 3.2, comparing the hash values of different ciphertext fragments by calculating the Euclidean distance […] between the hash values of two ciphertext fragments; step 3.3, determining that an association relationship exists when the Euclidean distance is less than a similarity threshold, the similarity threshold taking a value in the range […] and being selected by computing the hash-value distance distributions of truly associated and non-associated pairs over historical data so that the false positive rate is below 5% and the true positive rate is above 95%; and step 3.4, outputting a ciphertext feature mapping table that records the identifier pairs of ciphertext fragments having an association relationship; wherein the homomorphic hash function satisfies a homomorphic property, namely that for any two ciphertext fragments whose original data have the same feature attribute, […] holds, where […] takes a value in the range […] and the modulus of the hash function is […].
  4. The government system-based data management and data analysis method of claim 1, wherein calculating the ciphertext graph node attributes in parallel using the secure multiparty computation protocol comprises: step 4.1, each department calculating graph node attributes for its local ciphertext fragments, including a node identifier, a node type and a node weight; step 4.2, converting the node type into a numerical vector by one-hot encoding, wherein for a node type variable with a given number of categories a vector of that length is created, the position corresponding to the node's category index is set to 1, and the remaining positions are kept at 0; step 4.3, applying Min-Max normalization to the node weights, wherein the minimum and maximum of all weights in the current batch are computed and the normalization formula […] is applied to scale each weight value into the range […]; step 4.4, determining the associated edges between nodes by exchanging edge relationship information through an oblivious transfer protocol, whose input comprises the sender's data set and the receiver's selection index and whose output is that the receiver obtains the selected item without the sender learning the selection index; and step 4.5, outputting the distributedly stored ciphertext graph structure, which consists of a ciphertext node set and a ciphertext edge set.
  5. The government system-based data management and data analysis method of claim 1, wherein the secret sharing-based graph neural network reasoner comprises: an input layer that receives the ciphertext graph structure and the node feature matrix as input and performs Z-score standardization on the feature matrix, wherein for each feature dimension the mean […] and the standard deviation […] are calculated and the standardization formula […] is then applied to that dimension of each node's features; a plurality of graph convolution layers that execute message-passing operations in the ciphertext domain, the node feature update formula being […], whose terms are the feature vector of a node at a given layer, the neighbor set of the node, and the weight matrix of the layer; wherein the secure matrix multiplication adopts the BGV homomorphic encryption scheme, each matrix element being expressed as a polynomial […] and products being computed by polynomial multiplication […], and the secure ReLU activation function uses piecewise linear approximation in the ciphertext domain, with […] approximated as […] and the absolute value function approximated by the polynomial […]; an output layer that generates the risk score vector of each node, the risk score vector being converted into a probability distribution through a softmax function and evaluated according to a preset risk threshold matrix […]; and a path extraction decoder comprising an attention mechanism and a sequence generation module, the attention weight calculation formula being […], wherein a high-risk propagation path sequence is iteratively generated based on the attention weights.
  6. The government system-based data management and data analysis method of claim 1, wherein selectively decrypting the high-risk association paths based on the threshold decryption and differential privacy mechanism comprises: step 6.1, calculating a path comprehensive risk score […] for each encrypted event propagation path, the formula being defined over the path length and a term […] for each node on the path; step 6.2, triggering the threshold decryption operation when the path comprehensive risk score exceeds the preset threshold of 0.6 and the conditions […] and […] are met; step 6.3, performing threshold decryption, which requires at least the threshold number of departments to provide the ciphertext fragments they hold together with the corresponding index set, the original data being restored using the Lagrange interpolation formula […]; step 6.4, adding Laplace noise […] to the decrypted data through the differential privacy mechanism, the noise being determined by the global sensitivity of the query function and a privacy budget parameter whose value range is […]; and step 6.5, converting numerical identifiers into specific event names based on a predefined event type mapping table, converting node coordinates into specific address descriptions through a geographic information system, matching the corresponding emergency treatment suggestion template according to the risk level, and outputting a desensitized real-time early warning report; wherein the global sensitivity is calculated as follows: for the path risk score query, the risk score of a single node lies in the range […] and the path length is at most 10, so the global sensitivity is […].
  7. The government system-based data management and data analysis method according to any one of claims 1 to 6, further comprising: step 7.1, maintaining a sliding time window of length […] and calculating incremental changes for the ciphertext graph structure within the window, the time length taking a value in the range […], wherein natural disaster events adopt a time window of 45-60 minutes and emergency safety events adopt a time window of 5-15 minutes; step 7.2, comparing the two ciphertext graph structures at adjacent moments through difference calculation in the ciphertext domain and outputting a graph structure change set […]; step 7.3, performing pattern matching based on a predefined abnormal pattern library comprising 5 basic abnormal patterns, namely a node sudden-increase pattern, triggered when […]; an edge-weight drastic-change pattern, triggered when […]; a pattern triggered when newly added edges are concentrated on fewer than 5% of nodes; a cascade pattern, triggered when node deletion increases the number of connected components of the graph by more than 50%; and a periodic fluctuation pattern, triggered when the variation amplitude over 3 consecutive time windows shows a periodic regularity; step 7.4, calculating a real-time risk score and converting it into a discrete risk level through graded threshold judgment, wherein scores in the interval […] correspond to a green normal level, scores in the interval […] correspond to a yellow attention level, […]; and step 7.5, outputting a time-series risk score sequence and the corresponding risk level identifiers.
  8. The government system-based data management and data analysis method according to any one of claims 1 to 6, further comprising: step 8.1, monitoring the computational load, response delay and privacy protection strength of the system and calculating the normalized value of each index; step 8.2, normalizing each index to scale it into the range […], so as to eliminate the influence of dimensional differences on the balance calculation; step 8.3, performing optimization according to the delay-privacy-accuracy balance objective function […], in which […] and […], the optimization variable being the system configuration parameter vector […]; step 8.4, the delay loss function being […], where […] is the value at a given moment of […]; step 8.5, the privacy loss function being […], where […] is the information entropy of the ciphertext data at a given moment; step 8.6, the accuracy loss function being […]; step 8.7, adaptively adjusting the encryption granularity parameter, whose value range is […], and the ciphertext graph structure update frequency, whose value range is […]; and step 8.8, outputting the optimized system parameter configuration.
  9. The government system-based data management and data analysis method of claim 6, wherein generating the desensitized real-time early warning report comprises: step 9.1, converting the numeric identifier into a specific event name based on a predefined event type mapping table structured as key-value pairs […], the conversion being completed by the table look-up operation […]; step 9.2, using a geographic information system, inputting the longitude and latitude coordinates […] of the node to a reverse geocoding API and calling […] to obtain a specific address string; step 9.3, retrieving the corresponding emergency treatment suggestion text from a predefined template library according to the combination […] of risk level and event type, the matching formula being […]; and step 9.4, outputting a desensitized real-time early warning report containing the event type, the influence range and the recommended measures.
  10. A government system-based data management and data analysis system for executing the government system-based data management and data analysis method according to any one of claims 1 to 9, characterized by comprising: a data receiving module for receiving real-time sensitive data streams of the departments; a data preprocessing module for normalizing numeric data to scale it into the range […] and for numerically encoding categorical data; a ciphertext fragment generation module for executing the streaming secret sharing algorithm to generate a distributed ciphertext stream; an associated feature calculation module for calculating associated feature fingerprints of the ciphertext data based on the homomorphic hash function and identifying data associations through the Euclidean distance measure; a ciphertext graph construction module for calculating ciphertext graph node attributes in parallel using the secure multiparty computation protocol and exchanging edge relationship information through the oblivious transfer protocol to generate a distributed ciphertext graph structure […]; a graph neural network reasoning module for executing multi-hop propagation calculation in the ciphertext domain through secure matrix multiplication under the BGV homomorphic encryption scheme and a secure activation function based on piecewise linear approximation, and for identifying event propagation paths; a selective decryption module for selectively decrypting high-risk association paths based on the threshold decryption and differential privacy mechanism; an early warning report generation module for generating a desensitized real-time early warning report comprising an event type, an influence range, a risk level and an emergency treatment suggestion; and a system optimization module for monitoring system performance indexes and adaptively adjusting the system parameter configuration according to the delay-privacy-accuracy balance optimization objective function.
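
By way of non-limiting illustration, the per-record secret sharing of claim 2 and the Lagrange-interpolation recovery of step 6.3 may be sketched in Python as follows. The field modulus 2^61 - 1, the evaluation points 1..n, the polynomial degree of threshold minus one (the standard Shamir construction) and the names split_record and reconstruct are assumptions made for this sketch; the claims as reproduced here do not state these values.

```python
import secrets

# Assumed field modulus (the Mersenne prime 2^61 - 1); not taken from the claims.
PRIME = 2**61 - 1


def split_record(value, threshold, num_departments, prime=PRIME):
    """Split one data record into one Shamir share per department."""
    if not 1 <= threshold <= num_departments < prime:
        raise ValueError("need 1 <= threshold <= num_departments < prime")
    # Constant term is the record value; the rest are random field elements.
    coeffs = [value % prime]
    coeffs += [secrets.randbelow(prime) for _ in range(threshold - 1)]
    shares = []
    for x in range(1, num_departments + 1):  # assumed evaluation points 1..n
        y = 0
        for c in reversed(coeffs):  # Horner evaluation of the polynomial
            y = (y * x + c) % prime
        shares.append((x, y))
    # Best-effort analogue of step 2.6: drop the local coefficients.
    coeffs.clear()
    return shares


def reconstruct(shares, prime=PRIME):
    """Recover the record by Lagrange interpolation at x = 0."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = (num * (-xj)) % prime
                den = (den * (xi - xj)) % prime
        # pow(den, -1, prime) is the modular inverse (Python 3.8+).
        secret = (secret + yi * num * pow(den, -1, prime)) % prime
    return secret
```

For example, with a threshold of 3 and 5 departments, any 3 of the 5 shares returned by split_record recover the record, while fewer than 3 shares do not determine it.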
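
By way of non-limiting illustration, the preprocessing named in steps 4.2 and 4.3 and in the input layer of claim 5 may be sketched in plaintext Python as follows. The sketch uses the textbook forms of one-hot encoding, Min-Max normalization and Z-score standardization; the zero-based category index, the target range of 0 to 1, the use of the population standard deviation and the handling of constant batches are assumptions, and the claimed method performs these operations on ciphertext rather than on plain values.

```python
import math


def one_hot(category_index, num_categories):
    """Vector of length num_categories with a 1 at category_index (assumed 0-based)."""
    if not 0 <= category_index < num_categories:
        raise ValueError("category index out of range")
    vec = [0] * num_categories
    vec[category_index] = 1
    return vec


def min_max(weights):
    """Scale a batch of weights using the batch minimum and maximum."""
    lo, hi = min(weights), max(weights)
    if hi == lo:
        # Assumed convention for a constant batch: map every weight to 0.
        return [0.0 for _ in weights]
    return [(w - lo) / (hi - lo) for w in weights]


def z_score(column):
    """Standardize one feature dimension to zero mean and unit deviation."""
    mean = sum(column) / len(column)
    std = math.sqrt(sum((x - mean) ** 2 for x in column) / len(column))
    if std == 0:
        # Assumed convention for a constant column: map every value to 0.
        return [0.0 for _ in column]
    return [(x - mean) / std for x in column]
```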
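
By way of non-limiting illustration, the noise addition of step 6.4 may be sketched with the standard Laplace mechanism, in which the noise scale equals the global sensitivity divided by the privacy budget. The function names and the numeric values used in any call are hypothetical; the sensitivity and privacy budget ranges of claim 6 are not reproduced here, and the standard-library random generator used below is not cryptographically secure.

```python
import random


def laplace_noise(scale, rng=None):
    """Draw one Laplace(0, scale) sample as the difference of two exponentials."""
    rng = rng or random
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def privatize(value, sensitivity, epsilon, rng=None):
    """Add Laplace noise with scale sensitivity / epsilon to a decrypted value."""
    if sensitivity <= 0 or epsilon <= 0:
        raise ValueError("sensitivity and epsilon must be positive")
    return value + laplace_noise(sensitivity / epsilon, rng)
```

A smaller privacy budget gives a larger noise scale and therefore stronger privacy at the cost of accuracy, which is the trade-off that the delay-privacy-accuracy balance of claim 8 is concerned with.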
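
By way of non-limiting illustration, the comparison of two graph structures at adjacent moments in step 7.2 may be sketched on plaintext graphs as a set difference. The GraphSnapshot and ChangeSet types and the diff_snapshots function are hypothetical names introduced for this sketch; the claimed method performs the comparison in the ciphertext domain, and the exact form of its change set is not reproduced here.

```python
from typing import FrozenSet, NamedTuple, Tuple

Edge = Tuple[str, str]


class GraphSnapshot(NamedTuple):
    """Plaintext stand-in for the graph structure at one moment."""
    nodes: FrozenSet[str]
    edges: FrozenSet[Edge]


class ChangeSet(NamedTuple):
    """Nodes and edges that appeared or disappeared between two snapshots."""
    added_nodes: FrozenSet[str]
    removed_nodes: FrozenSet[str]
    added_edges: FrozenSet[Edge]
    removed_edges: FrozenSet[Edge]


def diff_snapshots(prev: GraphSnapshot, curr: GraphSnapshot) -> ChangeSet:
    """Compute the change set between two adjacent-window snapshots."""
    return ChangeSet(
        added_nodes=curr.nodes - prev.nodes,
        removed_nodes=prev.nodes - curr.nodes,
        added_edges=curr.edges - prev.edges,
        removed_edges=prev.edges - curr.edges,
    )
```

The resulting change set is the kind of input against which the abnormal patterns of step 7.3 could be matched.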

Description

Government system-based data management and data analysis system and analysis method thereof

Technical Field

The invention relates to the technical field of batch-type secure computation, and in particular to a government system-based data management and data analysis system and an analysis method thereof.

Background

In the field of urban emergency management, when an emergency occurs, the emergency management center needs to rapidly analyze data streams from multiple departments, such as public security, traffic, medical care and civil affairs, in order to identify the associated impact and the propagation paths of the event. However, the data of these departments contains a large amount of sensitive information, including personal privacy data such as personnel identity information, vehicle track records and medical records.

The prior art has the following problems. First, in-depth association analysis of cross-department data requires access to the original data, but sensitive data cannot be transmitted and processed among departments in plaintext, so the data island phenomenon is serious. Second, the traditional batch-type secure computation method must collect all the data before encrypting it, and therefore cannot meet the requirement of responding within seconds in an emergency scenario. Third, although a simple encryption method can protect data privacy, it destroys the association relationships between data, so that effective association analysis cannot be performed.

Disclosure of Invention

The invention provides a government system-based data management and data analysis system and an analysis method thereof, which solve the technical problem in the related art of privacy leakage risk caused by plaintext transmission.
The invention provides a government system-based data management and data analysis method, comprising the following steps: acquiring real-time sensitive data streams of each department, normalizing numeric data to scale it into a range […] that fits a finite field […], and converting categorical data into integer identifiers by numerical encoding; executing a streaming secret sharing algorithm on each data record to generate a distributed ciphertext stream; calculating associated feature fingerprints of the ciphertext data based on a homomorphic hash function, wherein the homomorphic hash function is applied to each ciphertext fragment to calculate its feature fingerprint, potential data associations are identified by comparing the Euclidean distances between the hash values of different ciphertext fragments, and a ciphertext feature mapping table is output; calculating ciphertext graph node attributes in parallel using a secure multiparty computation protocol, wherein each department calculates graph node attributes for its local ciphertext fragments, including node identifiers, node types and node weights, the node types are converted into numerical vectors by one-hot encoding, the node weights are scaled into the range […] by Min-Max normalization, and edge relationship information is exchanged through an oblivious transfer protocol to generate a distributed ciphertext graph structure […]; inputting the ciphertext graph structure to a graph neural network reasoner based on secret sharing, wherein an input layer of the graph neural network receives the ciphertext graph structure and a node feature matrix and performs Z-score standardization, a plurality of graph convolution layers perform message-passing operations in the ciphertext domain and update node features through secure matrix multiplication under the BGV homomorphic encryption scheme and a secure ReLU activation function based on piecewise linear approximation, an output layer generates a risk score vector for each node, and a path extraction decoder with an attention mechanism iteratively generates a high-risk propagation path sequence; and selectively decrypting high-risk association paths based on a threshold decryption and differential privacy mechanism, wherein a path comprehensive risk score […] is calculated, threshold decryption is triggered when a preset threshold is exceeded and requires at least the threshold number […] of departments to provide their ciphertext fragments, the original data is restored using the Lagrange interpolation formula […], Laplace noise […] is added to the decrypted data through the differential privacy mechanism, and a desensitized real-time early warning report is generated; wherein the streaming secret sharing algorithm adopts a streaming variant of the Shamir secret sharing scheme to decompose each arriving data record into […] ciphertext fragments such that the original data can be restored only when no fewer than […] fragments are obtained; and the homomorphic hash function satisfies a homomorphic property such that the hash values of ciphertext fragments corresponding to original data with the same feature attribute