CN-121980612-A - Public institution internal data sharing method based on multi-source data fusion

Abstract

The invention discloses a public institution internal data sharing method based on multi-source data fusion and relates to the technical field of data sharing. The method first divides multi-source fused "three-medical" structured data (medical care, medical insurance, and medicine) into four quadrants according to sensitive risk score and utility influence tolerance, then matches each quadrant with a differentiated noise addition strategy. It combines verification of three types of deviation (medical quality, medical insurance fund risk, and medicine supply and demand results) with iterative noise optimization, and finally encrypts the data and shares it through blockchain smart contracts. This addresses the rigid noise intensity of traditional differential privacy, which adapts poorly to data of different privacy levels, and achieves a dynamic balance between privacy protection strength and data availability. It also ensures the security and trustworthiness of the sharing process and, provided the privacy and security requirements are met, preserves as far as possible the practical value of multi-source heterogeneous public institution data for core services such as medical quality assessment, medical insurance supervision, and medicine supply and demand allocation.

Inventors

  • WANG JIAN
  • XIAO QINGYUN
  • ZHENG YUXIONG
  • WANG YING
  • REN SHAOJIE
  • CHEN JINYONG

Assignees

  • Shanghai Jiao Tong University Yunnan (Dali) Research Institute (上海交通大学云南(大理)研究院)

Dates

Publication Date
2026-05-05
Application Date
2026-01-23

Claims (10)

  1. A public institution internal data sharing method based on multi-source data fusion, characterized by comprising the following steps: determining a sensitive risk score and a utility influence tolerance for each structured data record generated from electronic medical record text and drug circulation records; determining a noise addition strategy for each structured data record based on the sensitive risk score and the utility influence tolerance, and generating a protected data set; calculating the medical quality deviation, the medical insurance fund risk deviation, and the medicine supply and demand result deviation of the protected data set in a simulated business scenario, and verifying whether the protected data set passes the utility check; if it does not, optimizing the noise addition strategy and regenerating the protected data set; and if it does, encrypting the protected data set and deploying the encrypted protected data set in a blockchain smart contract for sharing.
  2. The public institution internal data sharing method based on multi-source data fusion according to claim 1, wherein the sensitive risk score of each structured data record generated from the electronic medical record text and the drug circulation records is obtained as follows: calculating the reciprocal of the occurrence frequency of any structured data record in the whole data set, taking the logarithm of the reciprocal, and determining the standard score of that logarithmic value over the whole data set; determining the ratio of the number of medical entity types linked to the structured data record to the total number of possible entity types, and determining the percentile rank of that ratio in the whole data set; and multiplying the standard score by the percentile rank and then applying a compression mapping to obtain the sensitive risk score.
  3. The public institution internal data sharing method based on multi-source data fusion according to claim 1, wherein the utility influence tolerance of each structured data record generated from the electronic medical record text and the drug circulation records is obtained as follows: determining the analysis task corresponding to each numeric field of any structured data record; multiplying the standardized value of each numeric field by the weight coefficient of its corresponding analysis task and summing the products to obtain the importance value of the structured data record; and calculating the percentile rank of the importance value, and applying a linear inversion transformation to the percentile ranks of all structured data records to obtain the utility influence tolerance.
  4. The public institution internal data sharing method based on multi-source data fusion according to claim 1, wherein determining the noise addition strategy of each structured data record based on the sensitive risk score and the utility influence tolerance and generating the protected data set comprises: making a threshold decision based on the sensitive risk score and the utility influence tolerance to divide the structured data records into four quadrants; for each structured data record in the first quadrant, taking the ratio of its sensitive risk score to the maximum sensitive risk score as the privacy budget parameter of each numeric field in the record, and independently generating and adding calibrated noise to each numeric field using the Laplace mechanism; performing an aggregation query over all structured data records in the second quadrant according to the analysis task to obtain the corresponding global sensitivity, and adding calibrated noise to all of those records based on the global sensitivity and a preset global privacy budget parameter; for each structured data record in the third quadrant, randomly sampling its numeric fields with a preset probability and adding fixed noise of a set intensity to the sampled fields; for each structured data record in the fourth quadrant, identifying its sensitive fields according to a predefined sensitive field list and performing a local differential privacy noise addition operation on them; and performing privacy protection verification on the noised structured data records of each of the four quadrants, outputting the protected data set if the records pass verification, and performing noise enhancement if they do not.
  5. The public institution internal data sharing method based on multi-source data fusion according to claim 4, wherein making the threshold decision based on the sensitive risk score and the utility influence tolerance to divide the structured data records into four quadrants comprises: taking the 75th percentile of the sensitive risk score distribution of all structured data records as the risk threshold, and taking the 25th percentile of the utility influence tolerance distribution as the tolerance threshold; assigning structured data records whose score is greater than the risk threshold and whose tolerance is not greater than the tolerance threshold to the first quadrant; assigning structured data records whose score is greater than the risk threshold and whose tolerance is greater than the tolerance threshold to the second quadrant; assigning structured data records whose score is not greater than the risk threshold and whose tolerance is greater than the tolerance threshold to the third quadrant; and assigning structured data records whose score is not greater than the risk threshold and whose tolerance is not greater than the tolerance threshold to the fourth quadrant.
  6. The public institution internal data sharing method based on multi-source data fusion according to claim 4, wherein performing the aggregation query over all structured data records in the second quadrant according to the analysis task to obtain the corresponding global sensitivity comprises: if a data field in a structured data record is involved in a count query, defining the global sensitivity of that field as 1; and if the field is involved in a sum query over a cost amount or a drug quantity, defining the global sensitivity as the maximum value of that field across all historical structured data records.
  7. The public institution internal data sharing method based on multi-source data fusion according to claim 4, wherein the privacy protection verification of the noised structured data records in the four quadrants comprises: for the first quadrant and the fourth quadrant, using a k-nearest-neighbor-based re-identification attack simulation to verify whether the privacy protection strength of each noised structured data record meets the standard; for the second quadrant, using a membership inference attack simulation based on statistical differences to verify whether the privacy protection strength of the noised structured data records meets the standard; and for the third quadrant, counting, after noise addition, the other structured data records whose distance in feature space from the un-noised structured data record is smaller than a preset threshold, and judging that the anonymity of the record meets the standard if that count is greater than a preset count threshold.
  8. The public institution internal data sharing method based on multi-source data fusion according to claim 7, wherein, when privacy protection verification is not passed, noise enhancement is performed as follows: moving a second-quadrant structured data record that fails verification to the first quadrant and applying noise again; moving a third-quadrant structured data record that fails verification to the second quadrant and applying noise again; moving a fourth-quadrant structured data record that fails verification to the third quadrant and applying noise again; and, for a first-quadrant structured data record that fails verification, halving its privacy budget parameter and applying noise again.
  9. The public institution internal data sharing method based on multi-source data fusion according to claim 1, wherein calculating the medical quality deviation, the medical insurance fund risk deviation, and the medicine supply and demand result deviation of the protected data set in the simulated business scenario comprises: counting the original number of occurrences of each diagnostic code across all structured data records and the original mean of each examination and test result numeric field, and calculating their differences from the corresponding noised occurrence counts and noised means in the protected data set to obtain the medical quality deviation; summing each cost field across all structured data records and calculating the difference from the corresponding noised cost sum in the protected data set to obtain the medical insurance fund risk deviation; and summing the outbound quantity for each medicine identifier field across all structured data records and calculating the difference from the corresponding noised outbound quantity sum in the protected data set to obtain the medicine supply and demand result deviation.
  10. The public institution internal data sharing method based on multi-source data fusion according to claim 9, wherein verifying whether the protected data set passes the utility check comprises: if the medical quality deviation, the medical insurance fund risk deviation, and the medicine supply and demand result deviation are each not greater than their corresponding thresholds, judging that the protected data set passes the utility check; otherwise, judging that it does not pass.
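
The scoring and quadrant division described in claims 2, 3 and 5 can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the logistic function standing in for the unspecified "compression mapping", the nearest-rank percentile rule, the "less than or equal" percentile rank, and every function and parameter name are assumptions introduced here.

```python
import math
import statistics


def percentile_ranks(values):
    """Fraction of values less than or equal to each value (assumed rank definition)."""
    n = len(values)
    return [sum(1 for w in values if w <= v) / n for v in values]


def z_scores(values):
    """Standard scores over the whole data set (population standard deviation)."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [0.0 if std == 0 else (v - mean) / std for v in values]


def sensitive_risk_scores(frequencies, entity_type_counts, total_entity_types):
    """Claim 2: z-score of log(1/frequency) times percentile rank of entity coverage."""
    rarity = [math.log(1.0 / f) for f in frequencies]
    z = z_scores(rarity)
    coverage = [c / total_entity_types for c in entity_type_counts]
    ranks = percentile_ranks(coverage)
    # Assumption: a logistic function plays the role of the "compression mapping".
    return [1.0 / (1.0 + math.exp(-zi * ri)) for zi, ri in zip(z, ranks)]


def utility_tolerances(standardized_fields, weights):
    """Claim 3: weighted importance, percentile rank, then linear inversion (1 - rank)."""
    importance = [sum(w * x for w, x in zip(weights, row)) for row in standardized_fields]
    return [1.0 - r for r in percentile_ranks(importance)]


def percentile(values, q):
    """Nearest-rank q-th percentile (assumed; the claims do not fix a method)."""
    ordered = sorted(values)
    k = max(1, math.ceil(q / 100 * len(ordered)))
    return ordered[k - 1]


def assign_quadrants(risk, tolerance):
    """Claim 5: 75th-percentile risk threshold, 25th-percentile tolerance threshold."""
    risk_threshold = percentile(risk, 75)
    tolerance_threshold = percentile(tolerance, 25)
    quadrants = []
    for r, t in zip(risk, tolerance):
        if r > risk_threshold:
            quadrants.append(1 if t <= tolerance_threshold else 2)
        else:
            quadrants.append(3 if t > tolerance_threshold else 4)
    return quadrants
```

Because the risk threshold sits at the 75th percentile, at most about a quarter of the records can land in the first two quadrants, which is where the per-field and global-sensitivity Laplace strategies of claim 4 apply.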

Description

Public institution internal data sharing method based on multi-source data fusion

Technical Field

The invention relates to the technical field of data sharing, and in particular to a public institution internal data sharing method based on multi-source data fusion.

Background

In public institution internal data sharing with multi-source data fusion, accurately adapting the noise intensity of differential privacy is the key link in balancing privacy protection against data availability. However, real public institution data sharing scenarios such as the "three-medical" domain (medical care, medical insurance, and medicine) commonly involve many heterogeneous data sources, marked differences in privacy level, and diverse business utility requirements, and these expose the shortcomings of the prior art. First, the prior art mainly applies a uniform noise intensity: it does not distinguish data privacy levels accurately, does not take into account characteristics such as the sensitive risk score of the data and the importance of the business scenario, and therefore cannot give highly sensitive data strong noise protection while retaining low-sensitivity data with little noise. Second, existing schemes ignore field type differences in multi-source data, so the noise intensity is set without reference to the inherent attributes of the data. The noise then does not match the data characteristics, and either privacy leaks because the noise is insufficient or the statistical value of the data is destroyed because the noise is excessive. A differential privacy scheme capable of fine-grained privacy level distinction, deep adaptation to data characteristics, and closed-loop optimization is therefore needed to overcome this technical bottleneck and achieve a dynamic balance between privacy protection and data availability in multi-source data sharing within public institutions.

Disclosure of Invention

In view of the defects of the prior art, the invention provides a public institution internal data sharing method based on multi-source data fusion. It solves the problem that, when multi-source fused public institution internal data is shared, prior-art differential privacy techniques struggle to set the noise intensity for sensitive data accurately according to the data privacy level.

The public institution internal data sharing method based on multi-source data fusion comprises the following steps. A sensitive risk score and a utility influence tolerance are determined for each structured data record generated from electronic medical record text and medicine circulation records. A noise addition strategy for each structured data record is determined based on the sensitive risk score and the utility influence tolerance, and a protected data set is generated. The medical quality deviation, the medical insurance fund risk deviation, and the medicine supply and demand result deviation of the protected data set are calculated in a simulated business scenario, and the protected data set is checked against the utility check. If it does not pass, the noise addition strategy is optimized and the protected data set is regenerated. If it passes, the protected data set is encrypted and deployed in a blockchain smart contract for sharing.
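
The closed loop of noise addition, deviation checking, and strategy optimization can be sketched as below. It is a simplified illustration under stated assumptions: only sum-based deviations are computed, a single privacy budget `epsilon` is shared by all fields, and "optimizing the noise addition strategy" is modeled as doubling `epsilon` to reduce noise, which the source does not specify. The privacy verification, encryption, and smart contract deployment steps are omitted, and all names are hypothetical.

```python
import random


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the scaled difference of two Exp(1) draws."""
    return scale * (rng.expovariate(1.0) - rng.expovariate(1.0))


def add_noise(columns, sensitivity, epsilon, rng):
    """Laplace mechanism: per-value noise with scale = sensitivity / epsilon."""
    scale = sensitivity / epsilon
    return {
        name: [v + laplace_noise(scale, rng) for v in values]
        for name, values in columns.items()
    }


def sum_deviations(original, protected):
    """Absolute difference between original and noised field sums."""
    return {
        name: abs(sum(protected[name]) - sum(original[name]))
        for name in original
    }


def passes_utility_check(deviations, thresholds):
    """Pass only if every deviation is not greater than its threshold."""
    return all(deviations[name] <= thresholds[name] for name in thresholds)


def protect_until_useful(columns, thresholds, sensitivity=1.0,
                         epsilon=0.1, max_rounds=30, seed=0):
    """Regenerate the protected data set until the utility check passes."""
    rng = random.Random(seed)
    for _ in range(max_rounds):
        protected = add_noise(columns, sensitivity, epsilon, rng)
        if passes_utility_check(sum_deviations(columns, protected), thresholds):
            return protected, epsilon
        # Assumption: relax the budget (less noise) when utility is too low.
        epsilon *= 2.0
    raise RuntimeError("utility check not passed within max_rounds")
```

A call such as `protect_until_useful({"cost": [120.0, 80.0, 200.0]}, {"cost": 5.0})` returns the noised columns together with the budget at which the check passed. In a fuller pipeline, any relaxed budget would have to pass the privacy protection verification again before the data set is encrypted and shared.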

Compared with the prior art, the method has the following advantages. Multi-source fused "three-medical" structured data is first divided into four quadrants based on sensitive risk score and utility influence tolerance, and each quadrant is then matched with a differentiated noise addition strategy. Verification of three types of deviation (medical quality, medical insurance fund risk, and medicine supply and demand results) is combined with iterative noise optimization, and the data is finally encrypted and shared through a blockchain smart contract. This addresses the rigid noise intensity setting of traditional differential privacy, which adapts poorly to data of different privacy levels, and achieves a dynamic balance between privacy protection strength and data availability. It also ensures the security and trustworthiness of the data sharing process and, provided the privacy and security requirements are met, preserves as far as possible the practical value of multi-source heterogeneous public institution data for core services such as medical quality assessment, medical insurance supervision, and medicine supply and demand allocation.

Drawings

FIG. 1 is a flow chart of the public institution internal data sharing method based on multi-source data fusion.
FIG. 2 is a flow chart of obtaining the sensitive risk score in the public institution internal data sharing method based on multi-source data fusion.
FIG. 3 is a flow chart of obtaining the utility influence tolerance in the public institution internal data sharing method based on multi-source data fusion.

Detailed Description

The present invention will be described in detail below with reference to the