CN-121985296-A - Position data differential privacy protection and availability balance optimization method and system thereof
Abstract
The invention relates to the technical field of location privacy, and in particular to a location data differential privacy protection and availability balance optimization method and system. The method comprises the following steps: receiving a location statistics request; extracting the grid resolution, fence boundary, time window, statistic type and threshold to form a parameter set; calculating the fence area and time span to generate a fingerprint key; matching the key against a cache to reuse stored parameters or, failing that, calculating the sensitivity and determining the noise variance; adding noise to the result and outputting it; and checking the error to update the parameters. Queries of the same caliber are thereby stably identified and their parameters reused, which reduces the output fluctuation caused by repeated calibration. The query sensitivity is calculated from the fence area and the time span, so that the noise intensity matches the statistical scale. A non-negativity constraint and error-threshold calibration are applied to the perturbed result, which maintains the stability of results under multiple queries.
Inventors
- ZHAO YICHAO
- ZHAO LIMIN
Assignees
- 山东睿擘瑞励商贸有限公司
Dates
- Publication Date
- 20260505
- Application Date
- 20260203
Claims (10)
- 1. A position data differential privacy protection and availability balance optimization method, characterized by comprising the following steps: S1, receiving regional thermodynamic diagram, grid counting, stay point number and track segment frequency query requests, extracting the spatial grid resolution, geofence boundary, time window start and end moments, statistic type and statistical caliber threshold, and establishing a mapping between each request and its parameters to form a position statistics query parameter set; S2, based on the position statistics query parameter set, calculating the geofence area and the time window span, and sequentially encoding and concatenating the spatial grid resolution, geofence area, time window span, statistic type and statistical caliber threshold to generate a query caliber fingerprint index key; S3, based on the query caliber fingerprint index key, searching a sensitivity and noise parameter cache table, reusing (multiplexing) the cached query sensitivity and noise variance when the lookup hits a valid record, and, when the lookup misses or the record is invalid, recalculating the query sensitivity, determining the noise variance in combination with the privacy budget and writing it into the cache table, to obtain a query noise scaling parameter set; S4, based on the query noise scaling parameter set, applying noise perturbation to the statistical result count value to generate a differential privacy disturbance statistical output value; and S5, calculating an error metric value based on the differential privacy disturbance statistical output value, comparing the error metric value with an availability error threshold, updating the noise variance and writing it into the cache table when the threshold is exceeded, and keeping the parameters unchanged when the threshold is not exceeded, to form a noise parameter update record.
- 2. The method for optimizing position data differential privacy protection and availability balance according to claim 1, wherein the position statistics query parameter set comprises a spatial grid resolution parameter, a geofence boundary parameter, a time window start-stop time parameter, a statistic type parameter and a statistic caliber threshold parameter, the query caliber fingerprint index key comprises a resolution coding section, a fence area coding section, a time window span coding section, a statistic type coding section and a threshold coding section, the query noise scaling parameter set comprises a query sensitivity parameter, a noise variance parameter, a validity period information parameter and a multiplexing marking parameter, the differential privacy disturbance statistic output value comprises a disturbance synthesis count value, a zero value replaced count value and a non-negative correction count value, and the noise parameter update record comprises an error measurement value item, an availability error threshold item, a noise variance update value item and a cache writing identification item.
- 3. The method for optimizing position data differential privacy protection and availability balance according to claim 1, wherein the specific steps of S1 are as follows: S101, acquiring the regional thermodynamic diagram query request, grid counting query request, stay point number query request and track segment frequency query request received by a location service statistics query interface, acquiring the request identifier and request type mark corresponding to each query request, recording the access order of the different request identifiers and forming a request set number, to obtain a query request type mark value; S102, based on the query request type mark value, collecting the spatial grid resolution, geofence boundary, time window start and end moments and statistic type carried by the query request, verifying the parameter fields and checking their formats, integrating the spatial parameters and time parameters of the same request under its request set number, and recording the parameter combination, to obtain a query parameter combination quantity; S103, according to the query parameter combination quantity, collecting the statistical caliber threshold carried by the query request, judging whether the threshold under the request set number is consistent with the parameter combination quantity, retrieving the threshold, recording the correspondence between the threshold and the parameter combination quantity, and establishing a mapping record between the query request and all of its parameters, to obtain the position statistics query parameter set.
- 4. The method for optimizing position data differential privacy protection and availability balance according to claim 3, wherein the specific steps of S2 are as follows: S201, based on the position statistics query parameter set, collecting the geofence boundary coordinate sequence and coordinate unit, judging whether the first and last points of the boundary coordinate sequence coincide, eliminating repeated points, calculating the accumulated value of the areas spanned by the line segments connecting adjacent coordinate points, and recording an area calculation mark, to obtain a geofence area value; S202, retrieving the geofence area value, collecting the time window start and end moments and the time zone mark, calculating the difference between the end and start moments and converting it into a span measure in seconds, verifying that the span measure is non-negative, recording a time window number, and combining the area value and the time window number into a combined record, to obtain a time window span value; S203, according to the time window span value, retrieving the spatial grid resolution, geofence area value, statistic type and statistical caliber threshold, encoding the fields in sequence and concatenating them with separators, verifying that the length of the concatenated result is consistent with the number of fields, writing the concatenated result into an index key library, and establishing a query caliber field sequence mapping record, to generate the query caliber fingerprint index key.
- 5. The method for optimizing position data differential privacy protection and availability balance according to claim 4, wherein the specific steps of S3 are as follows: S301, according to the query caliber fingerprint index key, searching the sensitivity and noise parameter cache table to locate a matching record item, judging the cache hit state and validity period state and writing a state mark, reading the query sensitivity and noise variance and writing a multiplexing mark for the hit and valid state, and recording the correspondence between the index key, the state mark and the multiplexing mark, to generate a cache hit valid mark quantity; S302, based on the cache hit valid mark quantity, for the miss or invalid state, retrieving the position statistics query parameter set, reading the spatial grid resolution, geofence area, time window span, statistic type and statistical caliber threshold, calculating the ratio of the spatial grid resolution to the geofence area, calculating the product of the ratio and the time window span, writing a parameter record, and writing the combined code of the statistic type and statistical caliber threshold into the parameter record, to generate a query sensitivity value; S303, retrieving the query sensitivity value, collecting the global sensitivity and privacy budget parameters, writing a budget mark, calculating the ratio of the global sensitivity to the query sensitivity, calculating the quotient of the ratio and the privacy budget parameter to obtain the noise variance, and writing the noise variance and validity period information into the sensitivity and noise parameter cache table, to generate the query noise scaling parameter set.
- 6. The method for optimizing position data differential privacy protection and availability balance according to claim 5, wherein the specific steps of S4 are as follows: S401, retrieving the query noise scaling parameter set, reading the noise variance and the corresponding query request identifier, collecting the count value of the query request's statistical result and writing a count mark, verifying that both the noise variance and the count value hold valid values and writing a check identifier, recording the correspondence between the query request identifier, the noise variance and the count value, and establishing a mapping record index, to generate a noise count mapping quantity; S402, based on the noise count mapping quantity, calculating the square root of the noise variance and writing a scale mark, collecting a random sampling sequence, calculating the product of the random sampling sequence and the scale to obtain a noise sampling value and writing a sampling mark, and summing the noise sampling value with the statistical result count value and writing a synthesis mark, to generate a disturbance synthesis result; S403, judging whether the disturbance synthesis result is non-negative and writing a judgment mark, replacing any result judged negative with zero and writing a substitution mark, recording the correspondence between the query request identifier and the substitution mark, and writing to a differential privacy output value cache table, to generate the differential privacy disturbance statistical output value.
- 7. The method for optimizing position data differential privacy protection and availability balance according to claim 6, wherein the specific steps of S5 are as follows: S501, based on the differential privacy disturbance statistical output value, collecting the corresponding query request statistical result count value and writing a count mark, calculating the difference between the disturbance statistical output value and the statistical result count value and taking its absolute value, recording the correspondence between the difference and the query request identifier, and writing an error calculation mark, to generate an error metric value; S502, retrieving the error metric value, obtaining the availability error threshold and writing a threshold mark, calculating the difference between the error metric value and the availability error threshold and recording a comparison mark, determining whether the difference is positive or negative and writing a threshold judgment mark, and establishing the correspondence between the query request identifier and the threshold judgment mark, to generate a threshold overrun mark quantity; S503, according to the threshold overrun mark quantity, when the overrun mark is true, collecting the current noise variance, calculating an update step value, combining the noise variance with the update step and writing an update mark; when the overrun mark is false, recording a noise variance maintaining mark; and writing to the sensitivity and noise parameter cache table, to generate the noise parameter update record.
- 8. The method for optimizing position data differential privacy protection and availability balance according to claim 1, wherein: the location service statistics query interface is the interface in a location service system that receives location statistics query requests and returns statistical results externally, and is provided by the statistics query module of the location service system; the location statistics query request is any one of a regional thermodynamic diagram query request, a grid counting query request, a stay point number query request and a track segment frequency query request, and originates from a statistics query call initiated by an upper-layer service system or a client; the spatial grid resolution is the scale parameter for dividing the spatial grid, namely the side length, area or level code of a grid cell, and is a spatial statistics parameter carried by the query request; the geofence boundary is the boundary parameter of the statistical spatial range, namely boundary information represented by a polygon vertex coordinate sequence, a circle center point and radius, or an administrative region code, and is a fence parameter carried by the query request; the time window start and end moments are the start time and end time parameters of the statistical time range, and are time window parameters carried by the query request; the statistic type is the type identifier of the statistical result category, namely one or more of count, frequency, number of stay points, heat and similar categories, and is a statistic type field carried by the query request; the statistical caliber threshold is a threshold parameter of the statistical rule, namely one or a combination of thresholds such as a stay judgment threshold, a track segment division threshold and a minimum count threshold, and is a caliber configuration parameter carried by the query request; the position statistics query parameter set is a structured set of query request parameters, comprising at least the spatial grid resolution, the geofence boundary and/or geofence area, the time window start and end moments and/or time window span, the statistic type and the statistical caliber threshold, and is obtained by parsing, collecting and merging the query request parameters.
- 9. The method for optimizing position data differential privacy protection and availability balance according to claim 1, wherein: the geofence area is the area value calculated from the geofence boundary, obtained by performing an area calculation on the geofence boundary carried by the query request; the time window span is the duration calculated from the time window start and end moments, obtained as the difference between the end moment and the start moment; the query caliber fingerprint index key is an index key representing the statistical caliber of a query, namely a character string or hash value formed by encoding and concatenating the spatial grid resolution, geofence area, time window span, statistic type and statistical caliber threshold in a preset order, and is generated from the position statistics query parameter set; the sensitivity and noise parameter cache table is a cache storage structure, namely a table or key-value store that is indexed by the query caliber fingerprint index key and records the query sensitivity, noise variance and validity period information, and consists of the record items written by the cache miss or invalid branch; the query sensitivity is the sensitivity parameter of the differentially private statistical query, namely an upper bound on the change in the statistical result between adjacent data sets, and is calculated from the spatial grid resolution, geofence area, time window span, statistic type and statistical caliber threshold; the privacy budget parameter is the budget parameter of the differential privacy mechanism, namely either a single differential privacy budget parameter or a pair of differential privacy budget parameters, and is taken from the system privacy configuration policy or from a privacy configuration field carried by the query request; the noise variance is the variance parameter of the noise distribution or a scale parameter equivalent to the variance, and is calculated from the query sensitivity or global sensitivity together with the privacy budget parameter and written into the sensitivity and noise parameter cache table.
- 10. A position data differential privacy protection and availability balance optimization system for implementing the method of any one of claims 1-9, the system comprising: a location query access module for executing S1, namely acquiring the regional thermodynamic diagram query request, grid counting query request, stay point number query request and track segment frequency query request received by the location service statistics query interface, acquiring the spatial grid resolution, geofence boundary, time window start and end moments and statistic type carried by the query request, reading the statistical caliber threshold carried by the query request, and recording the correspondence between the query request and its parameters, to generate the position statistics query parameter set; a caliber fingerprint generation module for executing S2, namely calculating the geofence area and the time window span based on the position statistics query parameter set, and sequentially encoding and concatenating the spatial grid resolution, geofence area, time window span, statistic type and statistical caliber threshold, to generate the query caliber fingerprint index key; a noise calibration caching module for executing S3, namely searching the sensitivity and noise parameter cache table according to the query caliber fingerprint index key and judging the cache hit state and validity period state, wherein the cache table record items are formed by the parameter records written by the miss or invalid branch; when the lookup hits a valid record, reading the query sensitivity and noise variance from the record item and writing a multiplexing mark; when the lookup misses or the record is invalid, retrieving the position statistics query parameter set to read the spatial grid resolution, geofence area, time window span, statistic type and statistical caliber threshold, calculating the query sensitivity, determining the noise variance according to the global sensitivity and the privacy budget parameter, and writing the query sensitivity, noise variance and validity period information into the sensitivity and noise parameter cache table, to generate the query noise scaling parameter set; a privacy disturbance output module for executing S4, namely retrieving the query noise scaling parameter set, reading the noise variance and the statistical result count value corresponding to the query request, calculating a noise sampling value and perturbing the statistical result count value with it to obtain a disturbance synthesis result, judging whether the disturbance synthesis result is non-negative, and replacing a negative value with zero, to generate the differential privacy disturbance statistical output value; and an error constraint adjustment module for executing S5, namely calculating an error metric value based on the differential privacy disturbance statistical output value, acquiring the availability error threshold, comparing the error metric value with the availability error threshold, updating the noise variance and writing it into the sensitivity and noise parameter cache table when the threshold is exceeded, and keeping the noise variance unchanged when the threshold is not exceeded, to generate the noise parameter update record.
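The calculations recited in claims 4 to 7 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation. All function and class names, the `|` separator, the SHA-256 hash, the one-hour validity period, the use of standard Gaussian samples as the "random sampling sequence", and the rule of subtracting a fixed step from the variance when the error threshold is exceeded are hypothetical choices that the claims leave open. The sensitivity and variance formulas follow the wording of S302 and S303 literally.

```python
import hashlib
import math
import random
from dataclasses import dataclass, replace


def fence_area(vertices):
    """S201: shoelace area of a polygon fence; a repeated closing point is dropped."""
    pts = list(vertices)
    if len(pts) > 1 and pts[0] == pts[-1]:
        pts = pts[:-1]
    twice_area = 0.0
    for (x1, y1), (x2, y2) in zip(pts, pts[1:] + pts[:1]):
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0


def fingerprint_key(resolution, area, span_s, stat_type, threshold):
    """S203: encode the five fields in a fixed order, join them, hash the result."""
    fields = [f"{resolution:.6g}", f"{area:.6g}", str(int(span_s)),
              str(stat_type), f"{threshold:.6g}"]
    return hashlib.sha256("|".join(fields).encode("utf-8")).hexdigest()


@dataclass
class NoiseParams:
    sensitivity: float
    variance: float
    expires_at: float
    reused: bool = False  # the "multiplexing mark"


def calibrate(cache, key, resolution, area, span_s, global_sens, epsilon,
              now, ttl_s=3600.0):
    """S301-S303: reuse a valid cached entry, otherwise recompute and store it."""
    hit = cache.get(key)
    if hit is not None and hit.expires_at > now:
        return replace(hit, reused=True)
    if area <= 0 or span_s <= 0 or epsilon <= 0:
        raise ValueError("area, span and privacy budget must be positive")
    sensitivity = (resolution / area) * span_s        # S302, as worded in claim 5
    variance = (global_sens / sensitivity) / epsilon  # S303, as worded in claim 5
    cache[key] = NoiseParams(sensitivity, variance, now + ttl_s)
    return cache[key]


def perturb(count, variance, rng):
    """S402-S403: add sqrt(variance) * sample, then replace negatives with zero.

    Assumption: samples are standard Gaussian; the claims do not fix the distribution.
    """
    noisy = count + math.sqrt(variance) * rng.gauss(0.0, 1.0)
    return max(0.0, noisy)


def check_and_update(cache, key, count, output, error_threshold, step):
    """S501-S503: absolute error versus threshold; adjust the cached variance if exceeded.

    Assumption: the update subtracts `step`, floored at zero; claim 7 does not
    state the direction or form of the update.
    """
    error = abs(output - count)
    exceeded = error > error_threshold
    if exceeded:
        cache[key].variance = max(cache[key].variance - step, 0.0)
    return error, exceeded
```

In this sketch, two requests that differ only in caller identity but carry the same resolution, fence area, time span, statistic type and threshold map to the same key, so the second request reuses the cached parameters until `expires_at` passes. A caller would typically pass `time.time()` as `now` and a seeded `random.Random` instance as `rng`.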
Description
Position data differential privacy protection and availability balance optimization method and system thereof
Technical Field
The invention relates to the technical field of location privacy, and in particular to a location data differential privacy protection and availability balance optimization method and system.
Background
The technical field of location privacy mainly concerns the privacy protection of location information generated by the mobile internet, the internet of things and intelligent terminals during acquisition, transmission, storage, sharing and release. Its core matters include location data access authorization, track association risk assessment, desensitized release of location statistics, and data processing methods under privacy parameter constraints. A location data protection flow is generally built by perturbing statistics such as location counts, regional heat and the number of stay points before release.
A traditional location data differential privacy protection and availability balance optimization method adds random noise to the statistical output of multiple queries, according to a differential privacy budget allocation rule, in the course of providing location statistics or track frequency analysis results, and limits the accumulated privacy consumption so that the privacy parameter constraints are met. The technical question addressed is how to set the privacy budget and choose the noise injection mode so that statistical usability is preserved under multidimensional location statistics or multiple queries. The traditional approach generally adopts a fixed total privacy budget, distributes it evenly over the number of queries or according to the importance of the statistical tasks, injects noise into location counts, grid heat values or track frequency results using the Laplace mechanism or the Gaussian mechanism, and compares and adjusts the budget allocation and noise intensity using indexes such as absolute error, mean square error or confidence interval width.
The traditional approach has several shortcomings. First, the budget is allocated around the number of queries or the importance of tasks, and noise is added independently each time a count or heat value is output. At the interface layer, the statistical caliber is usually expressed only implicitly as a combination of parameters, without structured and deterministic identification of its elements. The same business caliber is therefore hard to recognize reliably as the same caliber across different callers, different time window granularities and slightly adjusted fences. Calibration and trial calculation are repeated, output fluctuation is amplified as queries accumulate, and genuine business trends become hard to distinguish from changes in privacy disturbance.
Second, the same statistic has markedly different sensitivity under different geographic ranges and time spans, yet traditional practice tends to drive noise intensity by fixed rules and lacks an explicit quantitative constraint based on fence area and time span. As a result, the error proportion is high in sparse-count scenarios with small fences and short time windows: local hot spots in a thermodynamic diagram can be smoothed away by noise, the number of stay points can fluctuate abnormally, and the ranking of track segment frequencies can drift unstably.
Third, count-type results released externally can become negative or semantically invalid after noise is superimposed, which requires extra correction or fault tolerance on the business side and introduces caliber inconsistency and interpretation disputes. For example, negative hot spots in regional heat or negative stay point counts directly damage the credibility of reports.
Fourth, error evaluation mostly stops at offline comparison and manual parameter tuning, and a dynamic threshold constraint mechanism linked to online output is lacking. Errors can stay outside the usable range for a given caliber for a long time without converging, or data value is still reduced by continued excessive disturbance even within the usable range. The eventual consequences include insufficient interface stability, rising cross-team collaboration costs and unpredictable service behaviour under multiple queries.
Disclosure of Invention
In order to solve the technical problems existing in the prior art, the embodiment of the invention provides a position data differential privacy protection and availability balance optimization method, which comprises the following steps: In