
CN-122027353-A - Internet of vehicles communication security situation assessment and decision optimization method based on reinforcement learning

CN 122027353 A

Abstract

The invention discloses a reinforcement-learning-based internet-of-vehicles communication security situation assessment and decision optimization method, applied to vehicle-to-vehicle and vehicle-to-cloud communication scenarios. The method comprises the following steps: 1) collecting multi-source internet-of-vehicles security data; 2) preprocessing the multi-source data and extracting feature vectors characterizing the internet-of-vehicles security situation; 3) inputting the feature vectors into a reinforcement learning model, which outputs defense actions according to the current security situation; 4) executing the defense actions and obtaining reward signals, which are generated from changes in the internet-of-vehicles security situation; and 5) updating the parameters of the reinforcement learning model according to the reward signals, optimizing internet-of-vehicles security situation assessment and defense decisions. The system adopts a federated-learning multi-agent cooperative mechanism to share defense knowledge between vehicle nodes and road infrastructure, improving cooperative defense capability. The method is adaptive and supports distributed coordination, and can effectively cope with diversified security threats in the internet of vehicles.

Inventors

  • HUA YUCHENG
  • ZHAO YUYU

Assignees

  • Southeast University (东南大学)

Dates

Publication Date
2026-05-12
Application Date
2026-04-09

Claims (10)

  1. A reinforcement-learning-based internet-of-vehicles communication security situation assessment and decision optimization method, characterized by comprising the following steps: Step 1, collect multi-source internet-of-vehicles security data, the multi-source data comprising network traffic data of vehicle-to-vehicle (V2V) and vehicle-to-cloud (V2C) communication, on-board system log information and on-board equipment state information; Step 2, preprocess the collected multi-source data and extract feature vectors characterizing the internet-of-vehicles security situation; Step 3, input the feature vectors into a reinforcement learning model, which outputs a defense action according to the current security situation; Step 4, execute the defense action and obtain a reward signal, the reward signal being generated from the change in the internet-of-vehicles security situation; and Step 5, update the reinforcement learning model parameters according to the reward signal, optimizing the internet-of-vehicles security situation assessment and defense decision strategy.
  2. The method according to claim 1, wherein the multi-source data in step 1 further comprises in-vehicle network data, security event alarm information, historical security policy data and external threat intelligence data; the in-vehicle network data is obtained from the communication of each vehicle control unit over the vehicle bus, the security event alarm information is generated by security devices such as intrusion detection systems and firewalls in the vehicle or infrastructure, the historical security policy data records the security policies applied in different time periods and their effects, and the external threat intelligence data is obtained in real time from security intelligence sources through a standard interface.
  3. The method according to claim 1, wherein step 2 comprises: removing outliers from the collected data with an outlier detection algorithm, filling missing data by missing-value interpolation to guarantee the integrity and reliability of the input data, and unifying the scales of the numerical data by min-max normalization to eliminate the influence of feature-scale differences between sources on model training:
x'_i = (x_i − min(x)) / (max(x) − min(x)),
where x_i is the value of a feature for the i-th sample in the collected raw data sequence, x'_i is the normalized value of the i-th sample, and x is the raw data set formed by that feature over the collected samples. Based on a set time sliding window, the d-dimensional normalized feature data of N consecutive time steps within the window is extracted to construct a feature matrix X, where N is the total number of sample sequences within the window and m is the index of the feature dimension, m = 1, …, d; X_{i,m} is the m-th normalized feature value of the i-th sample in X; the empirical mean of the m-th feature dimension over the window is
μ_m = (1/N) Σ_{i=1}^{N} X_{i,m},
and the centered value is X̃_{i,m} = X_{i,m} − μ_m; all the X̃_{i,m} form a centered feature matrix X̃. Based on X̃, the covariance between feature dimensions is computed to construct the covariance matrix
C = (1/(N − 1)) X̃ᵀ X̃,
and eigendecomposition of C yields the principal-component eigenvalues and eigenvectors:
C v_j = λ_j v_j,
where λ_j is the j-th eigenvalue and v_j is the corresponding eigenvector, satisfying ‖v_j‖ = 1. The d eigenvalues are arranged in descending order, λ_1 ≥ λ_2 ≥ … ≥ λ_d. Feature dimension reduction and time-series feature extraction use principal component analysis (PCA) combined with the time sliding window, retaining the smallest k principal components such that the cumulative variance contribution rate satisfies
(Σ_{j=1}^{k} λ_j) / (Σ_{j=1}^{d} λ_j) ≥ 0.95.
  4. The method according to claim 1, wherein the reinforcement learning model in step 3 is an attention-enhanced deep Q network (A-DQN): the model extracts local spatial features of the feature vectors through a convolutional neural network, computes the weight distribution over the feature dimensions through a self-attention mechanism, outputs value scores for the candidate defense actions through a fully connected network operating on the weighted feature vectors, and selects the optimal action; the defense actions comprise communication firewall rule updating, intrusion detection strategy adjustment and vulnerability remediation prioritization.
  5. The method according to claim 1, wherein the reward signal in step 4 is designed as a multidimensional weighted function, specifically comprising: positive rewards, namely the attack-traffic reduction proportion, the system vulnerability-risk reduction index and the normal-business traffic retention rate; and negative penalties, namely the resource-consumption cost of the defense action and the service-interruption duration caused by misjudgment. The reward function is expressed as R = α·ΔA + β·ΔV + γ·U − δ·(C + D), where α, β, γ and δ are weight coefficients, ΔA is the attack-traffic reduction rate, ΔV is the vulnerability-risk reduction rate, U is the availability retention rate, C is the resource-consumption index and D is the service-interruption duration.
  6. The method according to claim 1, wherein the parameter updating in step 5 adopts an improved policy-gradient algorithm combined with an experience replay buffer and a target-network technique: historical interaction data is stored in the experience replay buffer; the data in the buffer is uniformly sampled for batch training, reducing data correlation; and the main-network parameters are periodically synchronized to the target network, stabilizing the training of the Q-value function and improving the convergence rate of the model.
  7. The method according to claim 1, further comprising a dynamic defense strategy adjustment mechanism: after a defense action is executed, the internet-of-vehicles security situation is continuously evaluated by a real-time monitoring module that integrates an isolation forest anomaly detection algorithm and a time-series prediction model; when a new attack type is detected, or the security-risk index rises for three consecutive monitoring periods, steps 3 to 5 are triggered again, realizing adaptive adjustment of the defense strategy.
  8. The method according to claim 1, adopting a multi-agent cooperative mechanism under a federated learning framework, characterized in that: different vehicle nodes or roadside infrastructures are defined as independent agents, each agent maintaining a local reinforcement learning model; a central coordinator applies differential privacy techniques to protect the private data of each node when aggregating the global model parameters; and a collaborative reward function is designed that incorporates the cross-regional attack blocking rate into the reward calculation, promoting cooperative policy optimization among the agents.
  9. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor, when executing the program, implements the reinforcement-learning-based internet-of-vehicles security situation assessment and decision optimization method according to any one of claims 1 to 8.
  10. A computer-readable storage medium having stored thereon computer instructions which, when executed by a processor, implement the reinforcement-learning-based internet-of-vehicles security situation assessment and decision optimization method according to any one of claims 1 to 8.
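The preprocessing pipeline of claim 3 (min-max normalization over a time sliding window, followed by PCA retained up to a 0.95 cumulative variance contribution) can be sketched in Python. The function name and NumPy-based layout are illustrative, not part of the patent:

```python
import numpy as np

def preprocess_window(window, var_threshold=0.95):
    """Min-max normalize a (N, d) window, then PCA-reduce it.

    window: raw feature matrix of N time steps x d feature dimensions.
    Returns the window projected onto the top-k principal components,
    where k is the smallest count whose cumulative variance
    contribution reaches var_threshold.
    """
    # Min-max normalization per feature: x' = (x - min) / (max - min)
    mins, maxs = window.min(axis=0), window.max(axis=0)
    span = np.where(maxs > mins, maxs - mins, 1.0)  # guard constant columns
    X = (window - mins) / span

    # Center each feature dimension by its empirical mean over the window
    Xc = X - X.mean(axis=0)

    # Covariance matrix across feature dimensions, then eigendecomposition
    C = Xc.T @ Xc / (len(Xc) - 1)
    eigvals, eigvecs = np.linalg.eigh(C)    # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]       # re-sort descending
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]

    # Smallest k with cumulative variance contribution >= threshold
    ratios = np.cumsum(eigvals) / eigvals.sum()
    k = int(np.searchsorted(ratios, var_threshold) + 1)
    return Xc @ eigvecs[:, :k]
```

Constant feature columns are mapped to zero rather than dividing by zero, a common practical guard the claim does not spell out.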
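A toy forward pass in the spirit of claim 4's attention-enhanced DQN can be written without a deep learning framework: self-attention weights the feature dimensions and a linear head scores the candidate defense actions. The convolutional stage is omitted, and all shapes and names are illustrative, as the patent gives none:

```python
import numpy as np

def attention_q_values(features, w_q, w_k, w_v, w_out):
    """Score defense actions from attention-weighted features.

    features: (n, d) matrix of n feature tokens with d channels.
    w_q, w_k, w_v: (d, d_k) projection matrices for queries/keys/values.
    w_out: (d_k, n_actions) linear Q head.
    Returns a vector of Q-values, one per defense action.
    """
    q, k, v = features @ w_q, features @ w_k, features @ w_v
    scores = q @ k.T / np.sqrt(k.shape[1])               # scaled dot-product
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)         # row-wise softmax
    attended = weights @ v                                # weighted features
    return attended.mean(axis=0) @ w_out                  # Q-value per action
```

The action with the largest Q-value (e.g. firewall rule update, detection strategy adjustment, or remediation re-prioritization) would then be selected greedily or epsilon-greedily.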
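The multidimensional weighted reward of claim 5 can be illustrated with a small Python function. The source text does not reproduce the expression legibly, so the form below (positive terms weighted by α, β, γ, with a shared penalty weight δ on cost and interruption) is one plausible reading, and the default weights are purely illustrative:

```python
def reward(delta_a, delta_v, u, c, d,
           alpha=0.4, beta=0.3, gamma=0.2, delta=0.1):
    """One plausible form of the claim-5 reward (weights illustrative).

    delta_a: attack-traffic reduction rate      (positive term)
    delta_v: vulnerability-risk reduction rate  (positive term)
    u:       availability retention rate        (positive term)
    c:       resource-consumption index         (penalty)
    d:       service-interruption duration      (penalty)
    """
    return alpha * delta_a + beta * delta_v + gamma * u - delta * (c + d)
```

A defense action that halves attack traffic while keeping availability high thus scores well even if it consumes some resources, matching the balance the claim describes.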
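The experience-replay and target-network mechanics of claim 6 can be sketched as follows. The class and parameter names are hypothetical, and a real implementation would back `update_fn` with the attention-enhanced Q-network of claim 4 rather than a parameter dictionary:

```python
import random
from collections import deque

class ReplayTrainer:
    """Uniform experience replay with periodic target-network sync."""

    def __init__(self, capacity=10000, batch_size=32, sync_every=100):
        self.buffer = deque(maxlen=capacity)   # experience replay buffer
        self.batch_size = batch_size
        self.sync_every = sync_every
        self.steps = 0
        self.main_params = {"w": 0.0}          # stand-in for Q-net weights
        self.target_params = dict(self.main_params)

    def store(self, state, action, reward, next_state):
        """Append one interaction transition to the buffer."""
        self.buffer.append((state, action, reward, next_state))

    def train_step(self, update_fn):
        """Uniformly sample a batch (reducing data correlation) and
        update the main network; sync the target net periodically."""
        if len(self.buffer) < self.batch_size:
            return
        batch = random.sample(self.buffer, self.batch_size)
        self.main_params = update_fn(self.main_params, batch,
                                     self.target_params)
        self.steps += 1
        if self.steps % self.sync_every == 0:
            # Copy main-network parameters to the target network
            self.target_params = dict(self.main_params)
```

Keeping the target network frozen between syncs is what stabilizes the Q-value regression targets, which is the convergence benefit the claim cites.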
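The re-optimization trigger of claim 7 (a new attack type, or a security-risk index rising for three consecutive monitoring periods) reduces to simple bookkeeping. The class below is a hypothetical sketch; the isolation forest and time-series models that would feed it are abstracted away into its inputs:

```python
class DefenseMonitor:
    """Decide when steps 3-5 should be re-run (claim 7 sketch)."""

    def __init__(self, rise_periods=3):
        self.rise_periods = rise_periods
        self.known_attacks = set()
        self.prev_risk = None
        self.rising = 0          # consecutive rising periods seen so far

    def observe(self, attack_type, risk_index):
        """Return True when adaptive re-optimization should trigger.

        attack_type: detected attack label for this period, or None.
        risk_index: scalar security-risk score for this period.
        """
        trigger = False
        if attack_type is not None and attack_type not in self.known_attacks:
            self.known_attacks.add(attack_type)   # new attack type seen
            trigger = True
        if self.prev_risk is not None and risk_index > self.prev_risk:
            self.rising += 1
        else:
            self.rising = 0                       # streak broken
        self.prev_risk = risk_index
        if self.rising >= self.rise_periods:
            self.rising = 0
            trigger = True
        return trigger
```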
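The federated aggregation with differential privacy of claim 8 can be illustrated with a minimal FedAvg-style sketch, assuming simple Gaussian perturbation of each agent's shared parameters. The names and the noise mechanism are illustrative; the patent does not specify the privacy mechanism beyond "differential privacy technology":

```python
import random

def aggregate_with_dp(local_models, noise_std=0.01, rng=None):
    """Average per-agent parameter vectors, adding Gaussian noise to
    each contribution before aggregation (a simple DP stand-in).

    local_models: list of equal-length parameter lists, one per agent
    (vehicle node or roadside unit).
    """
    rng = rng or random.Random(0)
    n_agents = len(local_models)
    dim = len(local_models[0])
    global_model = [0.0] * dim
    for params in local_models:
        for j in range(dim):
            # Perturb each shared parameter so the coordinator never
            # sees an agent's exact local value
            global_model[j] += (params[j] + rng.gauss(0.0, noise_std)) / n_agents
    return global_model
```

A production system would calibrate the noise to a chosen (ε, δ) budget and clip parameter norms first; this sketch only shows where the perturbation sits in the aggregation loop.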

Description

Internet of vehicles communication security situation assessment and decision optimization method based on reinforcement learning

Technical Field

The invention relates to the technical field of internet-of-vehicles information security, and in particular to an internet-of-vehicles communication security situation assessment and decision optimization method based on reinforcement learning.

Background

With the development of intelligent connected vehicles and internet-of-vehicles (V2X) technology, communication between vehicles and between vehicles and the cloud is increasingly frequent. However, open wireless communication and vehicle internal buses expose the internet of vehicles to serious information security challenges. An attacker can exploit internet-of-vehicles vulnerabilities to mount various attacks, such as man-in-the-middle attacks that intercept and tamper with vehicle communication data, message forgery attacks that falsify emergency messages or identity information, GPS signal spoofing that disturbs vehicle positioning, and injection of malicious instructions on the vehicle CAN bus. These attacks can directly threaten traffic safety and network stability, and traditional security measures struggle to respond in a timely and effective manner.

Current internet-of-vehicles security defense faces the following core challenges.

Limitations of static strategies. Traditional internet-of-vehicles defense systems often rely on preset static security strategies (such as fixed communication authentication mechanisms and predefined filtering rules). In a rapidly changing in-vehicle network environment, defenses based on hand-written rules respond to novel, unknown attacks with a lag and struggle to cope with dynamically evolving attack techniques.
Insufficient single-dimension data perception. Traditional on-board security monitoring may attend only to certain types of data (for example, monitoring only V2X network traffic, or only on-board logs), and cannot fuse multi-dimensional information such as vehicle internal bus data, vehicle positioning information and external threat intelligence. This single viewpoint leads to a one-sided understanding of the security situation, makes combined attacks and complex threats hard to discover in time, and increases the risk of misjudgment and missed detection.

Coarse decisions and resource imbalance. Existing security decisions are mostly triggered by fixed rules or simple thresholds, lack intelligent optimization, and struggle to balance security protection against vehicle performance. In the internet of vehicles, an overly conservative defense strategy over-blocks normal communication, delaying critical traffic information and even preventing vehicle control instructions from being sent in time; conversely, an overly loose strategy lets attacks through.

Lack of online adaptability. Traditional machine learning models and rule-based systems cannot update through online learning as the environment changes, and lack a timely response mechanism for new attack techniques or system vulnerabilities in the internet of vehicles. The vehicle's mobile environment is highly dynamic; if the security policy cannot be optimized in real time against new attack scenarios, threats cannot be contained in time.
Insufficient distributed collaboration and privacy problems. Internet-of-vehicles nodes are scattered across different vehicles and roadside infrastructures. When a single vehicle defends itself in isolation, a data-island phenomenon arises: each vehicle makes decisions only from its own limited perception and cannot obtain the global threat situation. In addition, directly collecting all vehicle data to the cloud for unified analysis raises privacy-disclosure and bandwidth-burden problems.

Therefore, to solve the above problems, it is necessary to provide an intelligent security defense method that can adapt to the low-latency, highly dynamic and heterogeneously distributed environment of the internet of vehicles, so as to enhance the capability of internet-of-vehicles information security situation assessment and decision optimization. The invention provides a reinforcement-learning-based internet-of-vehicles communication security situation assessment and decision optimization method aimed at solving these technical problems.

Disclosure of Invention

The invention solves the above problems in internet-of-vehicles security protection through the following technical scheme, comprising the following steps: Step 1, collecting multi-source internet-of-vehicles security data. The