CN-121985327-A - Internet of vehicles track privacy protection method based on deep reinforcement learning and road network constraint


Abstract

The invention provides an Internet of Vehicles (IoV) trajectory privacy protection method based on deep reinforcement learning and road network constraints, in the technical field of IoV information security. The method constructs a directed weighted attribute graph that fuses urban road topology and node semantic attributes into a high-dimensional state space; trains an agent with the Twin Delayed Deep Deterministic Policy Gradient (TD3) algorithm, which outputs a hybrid decision consisting of a continuous privacy budget adjustment and a discrete path-planning choice; introduces a dynamic mask constraint on top of the geo-indistinguishability mechanism, screening a legal candidate node set by probability truncation and thereby updating the action space; and designs a multi-objective composite reward function that combines road-network distance perturbation strength, semantic transition probability difference and a data utility penalty, guiding the agent to adapt the privacy protection strength. The trajectory data produced by the invention strictly satisfy the differential privacy constraint, achieve an optimal balance among physical reachability, semantic confusion and data availability, and resist inference attacks based on spatio-temporal patterns and background knowledge.

Inventors

ZHU NAFEI
LIU JINGMIAO
HE JINGSHA
PENG MINGXU

Assignees

北京工业大学 (Beijing University of Technology)

Dates

Publication Date: 20260505
Application Date: 20260126

Claims (10)

1. An Internet of Vehicles trajectory privacy protection method based on deep reinforcement learning and road network constraints, characterized by comprising the following steps: abstracting the urban road network into a directed weighted attribute graph $G = (V, E, W, S)$, where $V$ is the set of road network nodes, $E$ is the set of road connection edges, $W$ is the set of edge weights representing the road network distance between nodes, and $S$ is the set of node semantic attributes; building the state space of a Markov decision process, in which the state of the agent at time step $t$ is defined as the multidimensional feature tuple $s_t = (v_t, v_{t-1}, \mathbf{c}_t, \epsilon_t^{\mathrm{rem}})$, where $v_t$ and $v_{t-1}$ are the node indices of the vehicle in $G$ at the current and previous time steps, $\mathbf{c}_t$ is the semantic attribute vector of the current node, and $\epsilon_t^{\mathrm{rem}}$ is the remaining privacy budget; constructing an agent based on the Twin Delayed Deep Deterministic Policy Gradient (TD3) algorithm, whose dual-head Actor policy network comprises a continuous budget action head and a discrete topology action head that together form a hybrid action space; receiving the privacy budget $\epsilon_t$ output by the continuous budget action head and, based on the geo-indistinguishability mechanism, computing the position perturbation probability distribution $P(\cdot \mid v_t)$ centered on the current true position $v_t$; introducing a probability cutoff threshold $\delta$ to screen an effective candidate set $\Phi_t$ from the road network nodes, and constructing a binary mask vector $\mathbf{M}_t$ whose entries are 1 at the positions of nodes belonging to $\Phi_t$ and 0 otherwise; using the discrete topology action head to output the raw score vector $\mathbf{z}_t$ of all road network nodes in $G$, filtering $\mathbf{z}_t$ with $\mathbf{M}_t$ to generate the probability distribution $\pi_t$, and selecting the node with the highest probability as the release position $\hat{v}_t$, giving the final action $a_t = (\epsilon_t, \hat{v}_t)$; setting the multi-objective composite reward function of the agent $R_t = \omega_1 R_{\mathrm{geo}} + \omega_2 R_{\mathrm{sem}} - \omega_3 R_{\mathrm{util}}$, where $R_{\mathrm{geo}}$ is the physical position perturbation reward based on the logarithm of the road network distance between the true position and the release position, $R_{\mathrm{sem}}$ is the semantic privacy gain reward obtained by maximizing the semantic mutual information between true and released positions, $R_{\mathrm{util}}$ is the utility penalty term based on distance deviation and connectivity, and $\omega_1$, $\omega_2$, $\omega_3$ are weight coefficients; and storing the samples generated by the interaction between the agent and the urban road network in an experience replay pool, and optimizing the parameters of the agent's dual-head Actor policy network through target Q-value computation, Critic network updates, delayed Actor network updates and soft target-network updates.
2. The Internet of Vehicles trajectory privacy protection method based on deep reinforcement learning and road network constraints according to claim 1, wherein abstracting the urban road network into the directed weighted attribute graph $G = (V, E, W, S)$ comprises: extracting the road network topology from open-source map data and building an adjacency matrix and an edge weight matrix, where each adjacency matrix element indicates whether a directly connected road exists between two nodes; mapping POI data to the nearest road network nodes with a spatial join algorithm, defining $K$ semantic categories, converting each node's semantic label into a semantic vector by one-hot encoding, and concatenating the node's latitude and longitude coordinates with its semantic vector to form the state information of that node.
3. The Internet of Vehicles trajectory privacy protection method based on deep reinforcement learning and road network constraints according to claim 1, wherein the continuous budget action head adaptively widens or narrows the range of the position perturbation probability according to the sensitivity of the current position, and outputs a scalar $a_t^{c} \in [-1, 1]$, which is transformed into the privacy budget $\epsilon_t \in [\epsilon_{\min}, \epsilon_{\max}]$, where $\epsilon_{\min}$ is the minimum privacy budget and $\epsilon_{\max}$ is the maximum privacy budget.
4. The Internet of Vehicles trajectory privacy protection method based on deep reinforcement learning and road network constraints according to claim 1, wherein the discrete topology action head outputs the raw score vector $\mathbf{z}_t \in \mathbb{R}^{|V|}$ of all road network nodes in the directed weighted attribute graph; the raw score vector reflects the agent's expected value of each road network node as a semantic cover point, and the higher a node's raw score, the more suitable the agent considers that node to be as a semantic disguise for the current position.
5. The Internet of Vehicles trajectory privacy protection method based on deep reinforcement learning and road network constraints according to claim 1, wherein computing, based on the geo-indistinguishability mechanism, the position perturbation probability distribution centered on the current true position $v_t$ comprises: following the generalization of the Laplace distribution to road network distance, $P(v_j \mid v_t) \propto \exp\big(-\epsilon_t \, d_N(v_t, v_j)\big)$, where $v_t$ and $v_j$ are the current true position and the target node, respectively, and $d_N(v_t, v_j)$ is the shortest road network distance between the true node $v_t$ and the target node $v_j$.
6. The Internet of Vehicles trajectory privacy protection method based on deep reinforcement learning and road network constraints according to claim 1, wherein the effective candidate set $\Phi_t$ is defined as $\Phi_t = \{ v_j \in V \mid P(v_j \mid v_t) > \delta \}$, where $v_t$ and $v_j$ are the current true position and the target node, respectively, $P(v_j \mid v_t)$ is the position perturbation probability distribution centered on $v_t$, and $\delta$ is the preset probability cutoff threshold; a road network node whose probability exceeds the cutoff threshold is selected into the effective candidate set at the current time step.
7. The Internet of Vehicles trajectory privacy protection method based on deep reinforcement learning and road network constraints according to claim 1, wherein the raw score vector of all road network nodes is filtered by the binary mask vector to generate the probability distribution $\pi_t$, computed as $\pi_t(v_j) = \dfrac{M_{t,j} \exp(z_{t,j})}{\sum_{k} M_{t,k} \exp(z_{t,k})}$, where $\mathbf{M}_t$ is the binary mask vector and $\mathbf{z}_t$ is the raw score vector.
8. The Internet of Vehicles trajectory privacy protection method based on deep reinforcement learning and road network constraints according to claim 1, wherein the physical position perturbation reward $R_{\mathrm{geo}}$ is computed from the logarithm of the road network distance $d_N(v_t, \hat{v}_t)$ between the true position and the release position, where $v_t$ is the node of the true position and $\hat{v}_t$ is the node of the release position; the larger the road network distance between the true position and the release position, the harder it is for an attacker to locate the true position.
9. The Internet of Vehicles trajectory privacy protection method based on deep reinforcement learning and road network constraints according to claim 1, wherein the semantic privacy gain reward $R_{\mathrm{sem}}$, obtained by maximizing the semantic mutual information between true and released positions, is computed from $c_t$ and $c_{t-1}$, the semantic types of the real trajectory at the current and previous time steps, and $\hat{c}_t$ and $\hat{c}_{t-1}$, the semantic types of the released trajectory at the current and previous time steps; the larger the difference between the semantic transition probability $P(\hat{c}_t \mid \hat{c}_{t-1})$ of the generated trajectory and $P(c_t \mid c_{t-1})$ of the real trajectory, the larger the positive reward, and the better the agent masks the user's true semantic intent.
10. The Internet of Vehicles trajectory privacy protection method based on deep reinforcement learning and road network constraints according to claim 1, wherein optimizing the parameters of the agent's dual-head Actor policy network comprises: computing the target Q value as $y = r_t + \gamma \, Q'_{\theta'}(s_{t+1}, a'_{t+1})$, where $r_t$ is the current reward, $\gamma$ is the discount factor and $Q'_{\theta'}$ is the output of the target Critic network (in TD3, the smaller of the two target Critic outputs); updating the Critic network by minimizing the mean squared error between its evaluation and the target value, with loss $L(\theta) = \frac{1}{N}\sum_{i=1}^{N} \big(y_i - Q_\theta(s_i, a_i)\big)^2$, where $N$ is the number of samples and $Q_\theta$ is the current Critic network output; updating the dual-head Actor policy network $\mu_\phi$ once for every two Critic network updates, adjusting its parameters $\phi$ to maximize the Critic's value estimate $J(\phi) = \frac{1}{N}\sum_{i=1}^{N} Q_\theta\big(s_i, \mu_\phi(s_i)\big)$; and soft-updating the parameters of all target networks by an exponential moving average so that they slowly track the current networks, $\theta' \leftarrow \tau\theta + (1-\tau)\theta'$, where $\tau$ is the soft-update rate.
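The release step of claims 2, 3 and 5 to 7 can be illustrated with a small Python sketch on a toy directed graph. It is an assumption-laden illustration, not the patented implementation: all function names are hypothetical, the linear mapping in `budget_from_action` is assumed (claim 3 only fixes the range $[\epsilon_{\min}, \epsilon_{\max}]$), the softmax normalization in `release_node` is one reading of "filtering" in claim 7, and the Actor's raw scores are passed in directly instead of coming from a trained network.

```python
import heapq
import math
from typing import Dict, List, Sequence, Tuple

Edge = Tuple[int, int, float]  # (from_node, to_node, road length); hypothetical edge format


def node_features(coords: Sequence[Tuple[float, float]],
                  categories: Sequence[int], k: int) -> List[List[float]]:
    """Claim 2 (sketch): [lat, lon] concatenated with a one-hot semantic vector of length k."""
    feats = []
    for (lat, lon), c in zip(coords, categories):
        one_hot = [0.0] * k
        one_hot[c] = 1.0
        feats.append([lat, lon] + one_hot)
    return feats


def road_distances(n: int, edges: Sequence[Edge], source: int) -> List[float]:
    """Dijkstra on the directed road graph; unreachable nodes keep distance inf."""
    adj: Dict[int, List[Tuple[int, float]]] = {i: [] for i in range(n)}
    for u, v, w in edges:
        adj[u].append((v, w))
    dist = [math.inf] * n
    dist[source] = 0.0
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in adj[u]:
            if d + w < dist[v]:
                dist[v] = d + w
                heapq.heappush(heap, (dist[v], v))
    return dist


def budget_from_action(a: float, eps_min: float, eps_max: float) -> float:
    """Claim 3 (assumed linear map): a in [-1, 1] -> eps in [eps_min, eps_max]."""
    a = max(-1.0, min(1.0, a))  # clip small numerical overshoot of the tanh head
    return eps_min + (a + 1.0) / 2.0 * (eps_max - eps_min)


def geo_ind_probs(dist: Sequence[float], eps: float) -> List[float]:
    """Claim 5: P(v_j | v_t) proportional to exp(-eps * d_N(v_t, v_j)); unreachable -> 0."""
    w = [math.exp(-eps * d) if math.isfinite(d) else 0.0 for d in dist]
    z = sum(w)
    return [x / z for x in w]


def release_node(scores: Sequence[float], probs: Sequence[float],
                 delta: float) -> Tuple[int, List[int], List[float]]:
    """Claims 6-7: threshold mask, masked softmax over Actor scores, argmax release node."""
    mask = [1 if p > delta else 0 for p in probs]
    if not any(mask):
        raise ValueError("delta removes every node; lower the threshold")
    top = max(s for s, m in zip(scores, mask) if m)  # subtract max for a stable exp()
    e = [math.exp(s - top) if m else 0.0 for s, m in zip(scores, mask)]
    z = sum(e)
    pi = [x / z for x in e]
    return max(range(len(pi)), key=pi.__getitem__), mask, pi
```

In this sketch the true node always carries the largest perturbation probability (its distance is zero), so the candidate set is non-empty whenever $\delta$ is below that value; it is the reward terms, not the mask, that discourage releasing the true node itself. Note also that a node with the highest raw score can still be rejected if it falls outside $\Phi_t$, which is exactly the dynamic-mask behavior the claims describe.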

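The reward terms of claims 1, 8 and 9 and the update rules of claim 10 can likewise be sketched as plain Python helpers. The exact reward formulas are not recoverable from this text, so their forms here are assumptions: `geo_reward` uses $\log(1 + d)$ so that a zero distance yields 0 rather than $-\infty$, `semantic_reward` uses the absolute difference of semantic transition probabilities described in claim 9, and `composite_reward` subtracts a non-negative utility penalty. The TD3 helpers follow the standard clipped double-Q target, mean squared Critic loss, delayed Actor update and soft update; the Actor and Critic networks themselves and TD3's target policy smoothing noise are omitted. All names are hypothetical.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, Hashable, List, Sequence, Tuple

TransitionTable = Dict[Hashable, Dict[Hashable, float]]


def geo_reward(d_road: float) -> float:
    """Claim 8 (assumed form): log(1 + d_N(v_t, v^_t)); 0 when the true node is released."""
    return math.log1p(d_road)


def transition_probs(seqs: Sequence[Sequence[Hashable]]) -> TransitionTable:
    """Estimate P(c_t | c_{t-1}) from historical sequences of semantic categories."""
    counts: Dict[Hashable, Counter] = defaultdict(Counter)
    for seq in seqs:
        for prev, cur in zip(seq, seq[1:]):
            counts[prev][cur] += 1
    probs: TransitionTable = {}
    for prev, cnt in counts.items():
        total = sum(cnt.values())
        probs[prev] = {cur: c / total for cur, c in cnt.items()}
    return probs


def semantic_reward(P: TransitionTable, real_prev: Hashable, real_cur: Hashable,
                    rel_prev: Hashable, rel_cur: Hashable) -> float:
    """Claim 9 (assumed form): |P(c_t | c_{t-1}) - P(c^_t | c^_{t-1})|; unseen pairs count as 0."""
    p_real = P.get(real_prev, {}).get(real_cur, 0.0)
    p_rel = P.get(rel_prev, {}).get(rel_cur, 0.0)
    return abs(p_real - p_rel)


def composite_reward(r_geo: float, r_sem: float, penalty: float,
                     w: Tuple[float, float, float] = (1.0, 1.0, 1.0)) -> float:
    """Claim 1: w1 * R_geo + w2 * R_sem - w3 * R_util (penalty assumed non-negative)."""
    return w[0] * r_geo + w[1] * r_sem - w[2] * penalty


def td3_target(r: float, gamma: float, q1_next: float, q2_next: float,
               done: bool = False) -> float:
    """Claim 10 target y; the min over twin target Critics is standard TD3 practice."""
    return r + (0.0 if done else gamma * min(q1_next, q2_next))


def critic_loss(targets: Sequence[float], preds: Sequence[float]) -> float:
    """Mean squared error between targets y_i and current Critic outputs Q(s_i, a_i)."""
    return sum((y - q) ** 2 for y, q in zip(targets, preds)) / len(targets)


def soft_update(target: Sequence[float], online: Sequence[float], tau: float) -> List[float]:
    """theta' <- tau * theta + (1 - tau) * theta' (exponential moving average)."""
    return [tau * th + (1.0 - tau) * th_t for th, th_t in zip(online, target)]


def actor_due(critic_step: int, policy_delay: int = 2) -> bool:
    """Delayed policy update: one Actor update per `policy_delay` Critic updates (1-indexed)."""
    return critic_step % policy_delay == 0
```

For example, with transitions estimated from `[["home", "work", "home"], ["home", "work", "shop"]]`, a real step home→work (probability 1.0) released as work→shop (probability 0.5) earns a semantic reward of 0.5 under this assumed form. The weights $\omega_1, \omega_2, \omega_3$ then set how strongly distance perturbation and semantic confusion are traded against utility.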
Description

Internet of vehicles track privacy protection method based on deep reinforcement learning and road network constraint

Technical Field

The invention relates to the technical field of Internet of Vehicles information security, and in particular to an Internet of Vehicles trajectory privacy protection method based on deep reinforcement learning and road network constraints.

Background

With the rapid development of the Internet of Vehicles (IoV) and Intelligent Transportation Systems (ITS), vehicles have become key sensing nodes in the mobile Internet. Massive amounts of vehicle trajectory data are collected in real time through V2X communication and uploaded to the cloud. These data contain rich spatio-temporal distribution patterns and behavioral characteristics, which makes them highly valuable for traffic flow prediction, road network planning and similar applications. However, vehicle trajectory data also contain sensitive information such as position, route and time; if released directly, they may expose private information such as a user's home address and daily habits. How to protect user privacy while still releasing the value of the data is therefore an urgent problem.

Current trajectory privacy protection techniques mainly include k-anonymity, encryption, differential privacy, location obfuscation, access control and location data desensitization. Differential privacy (DP) has become the mainstream standard in the privacy protection field because of its rigorous mathematical guarantees and its independence from the attacker's background knowledge.
In particular, the geo-indistinguishability framework adds random noise that follows a specific probability distribution (such as a Laplace or Gaussian distribution) to the true position, so that an attacker can hardly distinguish the true position from nearby positions, which protects the location. However, traditional differential privacy mechanisms suffer in practice from rigid privacy budget allocation: the budget cannot be allocated according to users' personalized requirements. To address the adaptive adjustment of privacy policies in dynamic environments, a growing body of research applies reinforcement learning (RL) to trajectory privacy protection. Existing schemes exploit the adaptive exploration capability of the agent to adjust the noise-adding strategy or privacy budget allocation dynamically according to the current trajectory state, seeking a balance between privacy protection strength and statistical data utility. Although combining reinforcement learning with differential privacy improves policy flexibility, existing schemes still have the following significant drawbacks when handling complex real-world IoV road network environments:

(1) Road network topology constraints are ignored, so the data lack physical realism. Most existing reinforcement learning models simplify the environment to an open two-dimensional Euclidean plane and seriously neglect real road network topology constraints (such as road connectivity).
As a result, during optimization the agent easily generates large numbers of 'illegal trajectories' that deviate from actual roads, cross buildings or even fall in impassable areas such as bodies of water. This seriously damages the physical realism of the data and makes the protected data nearly unusable in downstream road-network-based tasks.

(2) Semantic awareness is lacking, so the defense is poorly targeted. Most existing agents attend only to the geometric features of a trajectory (such as latitude and longitude coordinates) and ignore the semantic context behind the trajectory points. The agent cannot tell whether the current position is a highly sensitive 'hospital/residence' or a low-sensitivity 'public road', so it applies a one-size-fits-all or inappropriate perturbation strategy across different semantic scenes and cannot effectively resist inference attacks that exploit semantic background knowledge.

In view of these shortcomings of the prior art, the technical problem the invention aims to solve is to provide an Internet of Vehicles trajectory privacy protection method based on deep reinforcement learning that achieves an optimal balance between personalized privacy protection and data availability.

Disclosure of Invention

To address the problems in the background art, the invention provides an Internet of Vehicles trajectory privacy protection method based on deep reinforcement learning and road network constraints, which ensures that the generated trajectories strictly comply with physical traffic rules and realizes the optimal balance of