CN-122001685-A - Network attack and defense countermeasure test method and system based on deep reinforcement learning

Abstract

The invention provides a network attack and defense countermeasure test method and system based on deep reinforcement learning, relating to the technical field of network security. The method comprises: constructing a network topology adjacency matrix and performing spectral decomposition on it; calculating structural distances from the spectral coordinates of the nodes to divide the network into attack clusters and configure an attack agent for each cluster; having each attack agent execute attack actions and obtain rewards; recording node state changes, calculating the attack propagation speed, and identifying the dominant propagation path; encoding this propagation information as a feature vector and inputting it into a defense agent; having the defense agent execute defense actions and obtain rewards; and updating the network parameters of the agents through reinforcement learning using the rewards of both sides. The invention can automatically identify the critical path of attack propagation, realize dynamic and adaptive attack and defense countermeasure testing, and improve the fidelity and automation level of the test.

Inventors

FAN QINGQING

Assignees

北京精微致合测试技术有限公司

Dates

Publication Date: 20260508
Application Date: 20260408

Claims (10)

1. A network attack and defense countermeasure test method based on deep reinforcement learning, characterized by comprising the following steps: collecting the node connection relations and service types of a network, and constructing a network topology adjacency matrix; performing spectral decomposition on the network topology adjacency matrix, calculating eigenvalues and eigenvectors, taking the component magnitudes of the eigenvectors as the spectral coordinates of each node, and calculating the structural distance between nodes; clustering the network nodes according to the structural distance to determine attack clusters, and configuring an independent attack agent for each attack cluster; each attack agent selecting and executing attack actions according to the node connection relations and service types within its corresponding attack cluster, and acquiring attack reward feedback; recording the state change moments of the nodes in each attack cluster, calculating the attack propagation speed between adjacent attack clusters, constructing a directed graph to identify an attack-dominant propagation path, and encoding the attack-dominant propagation path into an attack-dominant propagation direction feature vector as the observation input of a defense agent; the defense agent selecting and executing defense actions according to the observation input containing the attack-dominant propagation direction feature vector, and acquiring defense reward feedback; updating the deep neural network parameters of each attack agent and each defense agent through reinforcement learning according to the attack reward feedback and the defense reward feedback, respectively; and outputting the penetration depth and the attack diffusion rate of each attack cluster.
2. The method of claim 1, wherein performing spectral decomposition on the network topology adjacency matrix, calculating eigenvalues and eigenvectors, taking the component magnitudes of the eigenvectors as the spectral coordinates of each node, and calculating the structural distance between nodes comprises: constructing a graph Laplacian matrix based on the network topology adjacency matrix, and performing spectral decomposition to obtain the eigenvalue spectrum and an orthogonal eigenvector basis; arranging the eigenvalues of the spectrum in ascending order, selecting the eigenvalues of the low-frequency part, and extracting the corresponding orthogonal eigenvectors, wherein the orthogonal eigenvectors carry the global topological skeleton and the multi-level community structure of the network; extracting the magnitude of each component of the orthogonal eigenvectors to obtain the magnitude projection of each node on each eigenvector dimension; assembling the magnitude projections of each node on the orthogonal eigenvectors into a vector to generate the coordinate representation of each node in a spectral-domain embedding space, wherein the spectral-domain embedding space maps the discrete graph topology into a continuous manifold geometry; and calculating the Euclidean distance between the coordinate representations of any two nodes, multiplying each dimension component of the Euclidean distance by the reciprocal of the corresponding eigenvalue and summing to obtain a weighted distance, and outputting the weighted distance as the structural distance between the nodes.
3. The method of claim 2, wherein assembling the magnitude projections of each node on the orthogonal eigenvectors into a vector to generate the coordinate representation of each node in the spectral-domain embedding space comprises: constructing a two-dimensional mapping table indexed by node and spectral dimension, the table recording the magnitude projection value of every node on each orthogonal eigenvector dimension; looking up the table row by row by node to obtain the magnitude projection sequence of a single node across all orthogonal eigenvector dimensions, and packing the sequence into a vector, with each magnitude projection value as one component, to construct the initial coordinate vector of the node; calculating the spectral energy contribution of the eigenvalue corresponding to each component of the initial coordinate vector, the spectral energy contribution being the ratio of that eigenvalue to the sum of the eigenvalues, and rescaling each component of the initial coordinate vector according to its spectral energy contribution; and taking the rescaled coordinate vector as the coordinate representation of the node in the spectral-domain embedding space.
4. The method of claim 1, wherein clustering the network nodes according to the structural distance to determine attack clusters and configuring an independent attack agent for each attack cluster comprises: extracting, from all inter-node structural distances, the set of structural distances from each node to the other nodes; arranging the structural distance sets of all nodes in ascending order, and selecting the structural distance value at a preset quantile of the sorted result as the clustering radius; taking any node as a seed node, searching for all neighboring nodes whose structural distance to the seed node is smaller than the clustering radius, and combining the seed node and all such neighboring nodes into a candidate attack cluster; identifying density peak nodes among the nodes of the candidate attack cluster, calculating the structural distance between each non-density-peak node and every density peak node in the candidate attack cluster, assigning each non-density-peak node to the density peak node at the smallest structural distance, and determining sub-attack clusters with each density peak node as a core; checking, in the original network topology adjacency matrix, whether a connecting path exists between any two nodes of each sub-attack cluster, supplementing relay nodes for disconnected sub-attack clusters to complete topology repair, and thereby determining the attack clusters; and instantiating an attack agent for each attack cluster, and determining the state space dimension and the action space structure of the attack agent according to the number of nodes in the attack cluster and the attack operation types, respectively.
5. The method of claim 4, wherein identifying density peak nodes among the nodes of the candidate attack cluster comprises: reading the coordinate representation of each node of the candidate attack cluster in the spectral-domain embedding space; traversing all nodes in the candidate attack cluster, taking each traversed node as a central node, calculating the Euclidean distance between the coordinate representation of the central node and those of the other nodes in the candidate attack cluster, counting the nodes whose Euclidean distance from the central node is smaller than a preset radius, and recording the count as the local density index of the central node; traversing all nodes in the candidate attack cluster, taking each traversed node as a node to be judged, and querying all neighboring nodes whose structural distance to the node to be judged is smaller than the clustering radius; if the local density index of the node to be judged is larger than the local density indices of all of these neighboring nodes, marking the node to be judged as a density peak node; and outputting all density peak nodes marked in the candidate attack cluster.
6. The method of claim 1, wherein recording the state change moments of the nodes in each attack cluster, calculating the attack propagation speed between adjacent attack clusters, constructing a directed graph to identify an attack-dominant propagation path, and encoding the attack-dominant propagation path into an attack-dominant propagation direction feature vector as the observation input of the defense agent comprises: recording the moment at which each node in each attack cluster changes from a normal state to a compromised state, and constructing a node state change moment sequence for each attack cluster; identifying the boundary nodes of adjacent attack clusters, extracting the earliest state change moment among the boundary nodes of each of the adjacent attack clusters, and calculating the ratio of the difference between these moments to the shortest path length between the boundary nodes as the attack propagation speed between the adjacent attack clusters; constructing a directed graph with the attack clusters as nodes and the attack propagation speed between adjacent attack clusters as the weight of the directed edge, the directed edge pointing from the attack cluster with the earlier state change moment to the attack cluster with the later state change moment; traversing all reachable paths in the directed graph, calculating the path propagation velocity gradient, identifying attack acceleration diffusion nodes and attack deceleration diffusion nodes, and selecting the attack-dominant propagation path according to a path attack diffusion liveness index; extracting the attack cluster node sequence and the edge weight sequence on the attack-dominant propagation path, and encoding them into the attack-dominant propagation direction feature vector; and concatenating the attack-dominant propagation direction feature vector with the current state vectors of the nodes in each attack cluster to form the observation vector of the defense agent, and inputting the observation vector into the neural network of the defense agent to generate a defense action decision.
7. The method of claim 6, wherein traversing all reachable paths in the directed graph, calculating the path propagation velocity gradient, identifying attack acceleration diffusion nodes and attack deceleration diffusion nodes, and selecting the attack-dominant propagation path according to the path attack diffusion liveness index comprises: identifying the attack clusters with zero in-degree in the directed graph as attack source nodes, and traversing all reachable paths of the directed graph starting from the attack source nodes; extracting the directed edge weight sequence on each reachable path, calculating the differences between adjacent weight elements of the sequence, and summing them to obtain the path propagation velocity gradient; constructing a time-series distribution of the path propagation velocity gradient by mapping the gradient onto the state change time sequence of the attack clusters the path passes through, and identifying the positions of the peaks and troughs of the gradient as it evolves over time; marking the attack cluster corresponding to a peak position in the time-series distribution as an attack acceleration diffusion node, and the attack cluster corresponding to a trough position as an attack deceleration diffusion node; counting the attack acceleration diffusion nodes and the attack deceleration diffusion nodes on each reachable path and calculating their ratio to determine the path attack diffusion liveness index; and selecting the reachable path with the largest path propagation velocity gradient and the largest path attack diffusion liveness index as the attack-dominant propagation path.
8. A network attack and defense countermeasure test system based on deep reinforcement learning, for implementing the method of any one of claims 1 to 7, comprising: a network topology unit, configured to collect the node connection relations and service types of a network and construct a network topology adjacency matrix; a spectral decomposition calculation unit, configured to perform spectral decomposition on the network topology adjacency matrix, calculate eigenvalues and eigenvectors, take the component magnitudes of the eigenvectors as the spectral coordinates of each node, and calculate the structural distance between nodes; an attack cluster division unit, configured to cluster the network nodes according to the structural distance to determine attack clusters, and configure an independent attack agent for each attack cluster; an attack execution unit, configured for each attack agent to select and execute attack actions according to the node connection relations and service types within its corresponding attack cluster and acquire attack reward feedback; an attack propagation unit, configured to record the state change moments of the nodes in each attack cluster, calculate the attack propagation speed between adjacent attack clusters, construct a directed graph to identify an attack-dominant propagation path, and encode the attack-dominant propagation path into an attack-dominant propagation direction feature vector as the observation input of a defense agent; a defense execution unit, configured for the defense agent to select and execute defense actions according to the observation input containing the attack-dominant propagation direction feature vector and acquire defense reward feedback; a reinforcement learning unit, configured to update the deep neural network parameters of each attack agent and each defense agent through reinforcement learning according to the attack reward feedback and the defense reward feedback, respectively; and an attack result unit, configured to output the penetration depth and the attack diffusion rate of each attack cluster.
9. An electronic device, comprising: a processor; and a memory for storing processor-executable instructions; wherein the processor is configured to invoke the instructions stored in the memory to perform the method of any one of claims 1 to 7.
10. A computer-readable storage medium having stored thereon computer program instructions which, when executed by a processor, implement the method of any one of claims 1 to 7.
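
As a minimal sketch of the spectral embedding and weighted structural distance recited in claim 2, the following Python snippet assumes NumPy, a symmetric unweighted adjacency matrix, and the unnormalized graph Laplacian L = D - A. The function name `spectral_structural_distance`, the parameter `k` (number of low-frequency eigenvalues kept), the tolerance `eps`, and the decision to skip the zero eigenvalue so that the reciprocal weights stay finite are illustrative assumptions, not details stated in the claims. The weighted distance is read here as the sum, over spectral dimensions, of the squared coordinate difference divided by the corresponding eigenvalue; other readings of the claim wording are possible.

```python
import numpy as np


def spectral_structural_distance(adj, k=2, eps=1e-9):
    """Return (coords, eigenvalues, distance matrix) for an undirected graph.

    adj : symmetric adjacency matrix (list of lists or ndarray)
    k   : number of low-frequency, non-zero eigenvalues to keep (assumption)
    eps : threshold below which an eigenvalue is treated as zero (assumption)
    """
    adj = np.asarray(adj, dtype=float)
    # Unnormalized graph Laplacian L = D - A.
    lap = np.diag(adj.sum(axis=1)) - adj
    # eigh returns eigenvalues in ascending order for symmetric matrices.
    vals, vecs = np.linalg.eigh(lap)
    # Keep the k smallest strictly positive eigenvalues (low-frequency part).
    keep = np.where(vals > eps)[0][:k]
    lam = vals[keep]
    # Component magnitudes of the selected eigenvectors are the coordinates.
    coords = np.abs(vecs[:, keep])
    # Pairwise per-dimension differences, weighted by 1 / eigenvalue, summed.
    diff = coords[:, None, :] - coords[None, :, :]
    dist = (diff ** 2 / lam).sum(axis=-1)
    return coords, lam, dist


# Hypothetical example: two triangles joined by a single bridge edge (2-3).
adj = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 1, 0, 0],
    [0, 0, 1, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
coords, lam, dist = spectral_structural_distance(adj, k=2)
print(dist.round(3))
```

For a connected graph the smallest Laplacian eigenvalue is zero, which is why the sketch filters it out before weighting by reciprocals. Because only component magnitudes are kept, nodes whose eigenvector components differ only in sign receive the same coordinate in that dimension.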

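The directed propagation graph and path gradient of claims 6 and 7 can be sketched roughly as follows, using only the Python standard library. All names (`build_speed_graph`, `source_to_sink_paths`, `path_gradient`) and the sample data are hypothetical. The sketch makes several simplifying assumptions: each cluster is reduced to one earliest compromise time; the edge weight follows the claim wording literally (time difference divided by shortest boundary path length), so swap the ratio if a distance-over-time speed is intended; "reachable paths" are taken to be paths from a zero in-degree cluster to a cluster with no outgoing edge; and the peak/trough analysis and the liveness index are omitted, so the path is chosen by gradient alone.

```python
from typing import Dict, List, Tuple

Graph = Dict[str, Dict[str, float]]


def build_speed_graph(first_hit: Dict[str, float],
                      hops: Dict[Tuple[str, str], int]) -> Graph:
    """Build directed edges from the earlier-hit cluster to the later-hit one.

    first_hit : earliest compromise time per cluster (simplifying assumption)
    hops      : shortest boundary path length per adjacent cluster pair
    """
    graph: Graph = {cluster: {} for cluster in first_hit}
    for (a, b), length in hops.items():
        if first_hit[a] == first_hit[b]:
            continue  # no time ordering, so no direction (assumption)
        src, dst = (a, b) if first_hit[a] < first_hit[b] else (b, a)
        # Literal claim wording: time difference / shortest path length.
        graph[src][dst] = (first_hit[dst] - first_hit[src]) / length
    return graph


def source_to_sink_paths(graph: Graph) -> List[List[str]]:
    """Enumerate paths from zero in-degree clusters to clusters with no exits."""
    indegree = {cluster: 0 for cluster in graph}
    for neighbours in graph.values():
        for dst in neighbours:
            indegree[dst] += 1
    paths: List[List[str]] = []

    def dfs(node: str, path: List[str]) -> None:
        if not graph[node]:
            if len(path) > 1:
                paths.append(list(path))
            return
        for nxt in graph[node]:
            path.append(nxt)
            dfs(nxt, path)
            path.pop()

    for source in (c for c, n in indegree.items() if n == 0):
        dfs(source, [source])
    return paths


def path_gradient(graph: Graph, path: List[str]) -> float:
    """Sum of differences between adjacent edge weights along the path."""
    weights = [graph[a][b] for a, b in zip(path, path[1:])]
    return sum(later - earlier for earlier, later in zip(weights, weights[1:]))


# Hypothetical data: four clusters and their earliest compromise times.
first_hit = {"A": 0.0, "B": 4.0, "C": 5.0, "D": 6.0}
hops = {("A", "B"): 2, ("B", "C"): 2, ("A", "D"): 3, ("C", "D"): 1}

graph = build_speed_graph(first_hit, hops)
paths = source_to_sink_paths(graph)
dominant = max(paths, key=lambda p: path_gradient(graph, p))
print(dominant)
```

Since every edge points from an earlier to a strictly later compromise time, the graph is acyclic and the depth-first enumeration terminates. Note that the sum of adjacent differences telescopes to the last edge weight minus the first, so a single-edge path always has a gradient of zero.
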
Description

Network attack and defense countermeasure test method and system based on deep reinforcement learning

Technical Field

The invention relates to the technical field of network security, and in particular to a network attack and defense countermeasure test method and system based on deep reinforcement learning.

Background

Network attack and defense countermeasure testing is an important means of evaluating the effectiveness of a network security defense system. As networks grow in scale and structural complexity, traditional testing methods based on fixed rules or static vulnerability scanning struggle to simulate dynamic, intelligent attack behaviors such as advanced persistent threats. In recent years, deep reinforcement learning has been introduced into the field of network attack and defense because of its strong sequential decision-making and policy optimization capabilities; it is used to train agents that can autonomously explore a network environment and launch simulated attacks, so as to generate attack paths and test scenarios that are closer to reality. The prior art generally focuses on building a single global attack agent, or a small number of agents, to explore the entire target network; the agents select attack actions according to an overall observation of the network state, such as node attributes, connection relations, and service vulnerability information. Executing an attack action changes the network state and generates a corresponding reward signal; the agent updates its policy network using the experience data collected through interaction with the environment, with the goal of maximizing the cumulative reward, thereby discovering effective attack sequences or key vulnerabilities.

However, because modern enterprise and infrastructure networks tend to be huge in scale and complex in structure, treating the whole network as a single observation and decision space leads to extremely high-dimensional state and action spaces and a severe curse of dimensionality, so that deep reinforcement learning algorithms train inefficiently, converge with difficulty, and struggle to find an effective test strategy within limited computing resources and time. More importantly, this global view ignores the inherent topology and functional partitioning of the network. Different areas of a network often differ significantly in connection density, service type, and defense strength, and the propagation modes and impact speeds of attacks differ from area to area. Because this structural heterogeneity is not effectively captured and exploited, the trained attack agent's policy may be too general to conduct refined, differentiated attack tests targeting the characteristics of a specific network area, which reduces the fidelity of the countermeasure test and its ability to discover local key weaknesses.

Disclosure of Invention

The embodiments of the invention provide a network attack and defense countermeasure test method and system based on deep reinforcement learning, which can solve the problems in the prior art.
In a first aspect of the embodiments of the present invention, a network attack and defense countermeasure test method based on deep reinforcement learning is provided, including: collecting the node connection relations and service types of a network, and constructing a network topology adjacency matrix; performing spectral decomposition on the network topology adjacency matrix, calculating eigenvalues and eigenvectors, taking the component magnitudes of the eigenvectors as the spectral coordinates of each node, and calculating the structural distance between nodes; clustering the network nodes according to the structural distance to determine attack clusters, and configuring an independent attack agent for each attack cluster; each attack agent selecting and executing attack actions according to the node connection relations and service types within its corresponding attack cluster, and acquiring attack reward feedback; recording the state change moments of the nodes in each attack cluster, calculating the attack propagation speed between adjacent attack clusters, constructing a directed graph to identify an attack-dominant propagation path, and encoding the attack-dominant propagation path into an attack-dominant propagation direction feature vector as the observation input of a defense agent; the defense agent selecting and executing defense actions according to the observation input containing the attack-dominant propagation direction feature vector, and acquiring defense reward feedback; according to the attack reward feedback and the defense reward feedback, respectively updating the deep neural network parameters of