CN-122020444-A - Cross-border data risk monitoring method and device based on rich information of the data flow path

Abstract

The invention discloses a cross-border data risk monitoring method and device based on rich information of the data flow path. The method dynamically collects cross-border data flow path information and constructs a heterogeneous multi-relation spatio-temporal hypergraph: data-related subjects are abstracted as nodes, multi-type edges carrying multi-dimensional attributes are established according to data behaviors, and the hypergraph is dynamically updated. Based on the rich information features extracted from the hypergraph, a collaborative behavior relation graph is constructed, collaborative behavior patterns are mined, and anomaly identification and determination are carried out, thereby monitoring multi-subject collaborative abnormal behavior. By fusing the rich information of data flow paths, the method can identify multi-subject path anomalies and collaborative anomaly patterns from a global view, improving the comprehensiveness and accuracy of cross-border data risk monitoring and helping to secure outbound data.

Inventors

  • SHI JINQIAO
  • ZHANG KAI
  • LIU SEN
  • FANG ZHOU
  • MA LELE
  • ZHANG LIANLONG
  • Liu Chutong

Assignees

  • Beijing University of Posts and Telecommunications (北京邮电大学)

Dates

Publication Date
2026-05-12
Application Date
2025-12-12

Claims (10)

  1. A cross-border data risk monitoring method based on rich information of a data flow path, characterized by comprising the following steps: dynamically and continuously collecting cross-border data flow path information, and constructing a heterogeneous multi-relation spatio-temporal hypergraph based on the cross-border data flow path information; extracting rich information features of the data flow path at four levels, namely nodes, edges, paths and the overall graph, based on the heterogeneous multi-relation spatio-temporal hypergraph; and constructing a collaborative behavior relation graph based on the rich information features, and monitoring multi-subject collaborative abnormal behavior by mining collaborative behavior patterns and performing anomaly identification and determination.
  2. The cross-border data risk monitoring method based on rich information of a data flow path according to claim 1, wherein the step of constructing a heterogeneous multi-relation spatio-temporal hypergraph based on the cross-border data flow path information comprises: abstracting data providers, processors and receivers as nodes in the graph, the nodes carrying subject type, industry category, home region and security level attributes; establishing multi-type edges according to data access and transmission behaviors, the edges carrying transmission behavior type, transmission timestamp, data volume, transmission source/destination IP, transmission mode, transmission protocol, data sensitivity and whether-encrypted attributes; and dynamically updating the heterogeneous multi-relation spatio-temporal hypergraph as cross-border data services execute and time passes.
  3. The cross-border data risk monitoring method based on rich information of a data flow path according to claim 1, wherein the step of extracting the rich information features of the data flow path at the node level comprises: analyzing the data processing frequency, data volume trend, trend in the number of associated subjects, transmission mode distribution and proportion of sensitive data of a node per unit time period, and constructing a time-window-based node behavior profile; and encoding the topological position and neighborhood information of the node in the network into a low-dimensional vector by applying a graph embedding technique; and the step of extracting the rich information features of the data flow path at the edge level comprises: extracting multi-dimensional interaction features comprising transmission behavior type, transmission timestamp, data volume, transmission source/destination IP, transmission mode, transmission protocol, data sensitivity and whether encryption is applied; calculating the relative position weight of the current edge within the whole path and its dependency relationships with surrounding edges; and extracting the flow performance of the edge and its temporal consistency with upstream and downstream nodes.
  4. The cross-border data risk monitoring method based on rich information of a data flow path according to claim 1, wherein the step of extracting the rich information features of the data flow path at the path level comprises: generating a candidate path set by limiting the search to a maximum path hop count threshold and using a BFS or DFS search algorithm, excluding loops and known whitelisted paths; constructing a path identifier based on node types, edge types and access policies, and defining typical path patterns; aggregating multiple flow records of the same path pattern in historical data, and counting the occurrence frequency, average transmitted data volume and transmission time period distribution; extracting path length, number of transit nodes, high-risk node coverage and path opacity indicators; and encoding the temporal order of the nodes and edges on the path using sequence modeling and time-aware embedding.
  5. The cross-border data risk monitoring method based on rich information of a data flow path according to claim 1, wherein the step of extracting the rich information features of the data flow path at the graph level comprises: counting the number of nodes, the number of edges, the average node degree, the degree of the maximum-degree node, and the number of hyperedges and relation types; analyzing graph density, average path length, network diameter and global clustering coefficient; extracting the distribution proportions of subject types and edge types, a relation diversity index and node attribute entropy; and analyzing the Gini coefficient, the PageRank skew distribution, the center of gravity of cross-border data outbound paths and the degree of information flow aggregation.
  6. The cross-border data risk monitoring method based on rich information of a data flow path according to claim 1, wherein the step of constructing a collaborative behavior relation graph based on the rich information features comprises: taking data subjects as nodes, and mining collaborative behavior edges based on path structure similarity, behavior temporal coupling and transmission purpose similarity, wherein the edge weight is determined by cross-path co-occurrence, behavior time window overlap, and semantic or business similarity; introducing a sliding time window mechanism, and aggregating flow events into graphs at different time scales; and supplementing the data flow links inherited from the original paths as structural connection edges.
  7. The cross-border data risk monitoring method based on rich information of a data flow path according to claim 1, wherein mining collaborative behavior patterns comprises: adopting a dense subgraph mining algorithm and a spectral clustering method to identify candidate collaborative behavior groups from the collaborative behavior relation graph; and structurally evaluating each collaborative subgraph by a synergy strength formula [formula not reproduced], in which α + β + γ = 1, the quantities combined are, in order, the average path structure similarity in the subgraph, the node attribute value, and the synergy of path initiation and arrival times, and α, β and γ are multi-dimensional feature fusion weights; and the anomaly identification and determination comprises: establishing a path anomaly scoring function [formula not reproduced], in which the quantities combined are, in order, the risk degree of path rule matching, the degree of deviation from the path's historical behavior, and the abnormality of the path structure, and w, y and z are feature weights; constructing an overall anomaly scoring function of the collaborative subgraph [formula not reproduced], wherein μ is the path rule risk weight, δ is the structural collaborative behavior risk weight, and P is the path set; and introducing a sliding time window mechanism, constructing a behavior evolution curve for a subject or a path, and identifying abnormal fluctuation patterns.
  8. The cross-border data risk monitoring method based on rich information of a data flow path according to claim 1, wherein each hyperedge comprises a node sequence, temporal attributes based on the flow events occurring at each node, and behavior attributes of each flow event; and the heterogeneous multi-relation spatio-temporal hypergraph is dynamically updated over time according to a formula [formula not reproduced] whose symbols denote the updated hypergraph, the hypergraph before the update, the newly added node set and the newly added edge set.
  9. The cross-border data risk monitoring method based on rich information of a data flow path according to claim 6, wherein the sliding time window comprises scales of 5 minutes, 30 minutes and 1 hour; and the weight of a collaborative behavior edge is given by a formula [formula not reproduced] that has three parameters tuned in a data-driven manner and combines the cross-path co-occurrence, the behavior time window overlap, and the semantic or business similarity.
  10. A cross-border data risk monitoring device based on rich information of a data flow path, which adopts the cross-border data risk monitoring method based on rich information of a data flow path according to any one of claims 1 to 9, characterized by comprising: a heterogeneous multi-relation spatio-temporal hypergraph construction module, configured to dynamically and continuously collect cross-border data flow path information and to construct a heterogeneous multi-relation spatio-temporal hypergraph based on the cross-border data flow path information; a rich information feature extraction module, configured to extract rich information features of the data flow path at four levels, namely nodes, edges, paths and the overall graph, based on the heterogeneous multi-relation spatio-temporal hypergraph; and an anomaly monitoring module, configured to construct a collaborative behavior relation graph based on the rich information features, and to monitor multi-subject collaborative abnormal behavior by mining collaborative behavior patterns and performing anomaly identification and determination.
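
The candidate path generation in claim 4 (hop-bounded search that excludes loops and whitelisted paths) might be sketched as a depth-first search. This is a minimal illustration, not the patented implementation: it assumes the hypergraph has been flattened to a directed adjacency map, and the names `enumerate_candidate_paths`, `adjacency`, `max_hops` and `whitelist` are hypothetical. Treating every path prefix as a candidate is also an assumption.

```python
def enumerate_candidate_paths(adjacency, sources, max_hops, whitelist=frozenset()):
    """Return loop-free paths of 1..max_hops edges starting at each source.

    adjacency: dict mapping a node id to an iterable of successor node ids
               (an assumed, simplified view of the hypergraph).
    whitelist: set of node-id tuples regarded as known-good paths.
    """
    candidates = []

    def dfs(path, visited):
        # Record the current path unless it is a single node or whitelisted.
        # A whitelisted path is skipped itself, but its extensions are still explored.
        if len(path) > 1 and tuple(path) not in whitelist:
            candidates.append(tuple(path))
        # Stop extending once the hop count threshold is reached.
        if len(path) - 1 == max_hops:
            return
        for nxt in adjacency.get(path[-1], ()):
            if nxt in visited:  # exclude loops: never revisit a node
                continue
            visited.add(nxt)
            path.append(nxt)
            dfs(path, visited)
            path.pop()
            visited.remove(nxt)

    for s in sources:
        dfs([s], {s})
    return candidates
```

A breadth-first variant would return the same set of paths in a different order; depth-first is used here only because it keeps the loop check to a single shared `visited` set.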

Description

Cross-border data risk monitoring method and device based on rich information of the data flow path

Technical Field

The invention belongs to the technical field of data monitoring, and in particular relates to a cross-border data risk monitoring method and device based on rich information of the data flow path.

Background

With the development of the global digital economy, cross-border data flow has become an important support for international trade and other fields. It also touches on core issues such as national security and personal privacy protection, so risk analysis matters both for enterprises implementing management regulations and for the state in keeping the risk situation under control.

In the prior art, some studies rely on static rule matching and similar methods. They can discover and control risks for local abnormal events in data outbound business, but struggle with risk events in complex interaction scenarios caused by frequent cross-border flows, multiple scenarios and complex business interactions. Other studies build risk analysis models from the flow and interaction relationships among subjects, which enables risk monitoring and identification; however, they consider only two types of data subjects and their interaction information, which leaves monitoring blind spots, and they cannot identify illegal outbound behaviors involving multiple associated subjects.

Analysis of the related art shows that existing schemes are effective to some extent for local violation verification, traffic data detection and the like, but share common limitations: complex global flow scenarios are difficult to identify, analysis of global abnormal behavior is lacking, the paths through other subjects during the flow are not visible, and anomaly identification is limited to particular scenarios.
The key problems of existing methods are that they focus on local anomaly detection and lack a global view, do not deeply model the dynamic interaction behavior among multiple nodes and multiple subjects, have limited anomaly identification capability, and do not use path information as a core feature for active risk identification and prediction. Security compliance of cross-border data flow has become a research hotspot, yet existing monitoring methods suffer from insufficient expression of path information, a lack of multi-subject collaborative behavior modeling, and no detection of hidden detour behaviors, and are therefore ill-suited to complex flow scenarios.

Disclosure of Invention

The present invention therefore aims to provide a cross-border data risk monitoring method and device based on rich information of the data flow path, so as to solve or partially solve the problems mentioned in the background.

To this end, in a first aspect, the present invention provides a cross-border data risk monitoring method based on rich information of the data flow path, comprising the following steps: dynamically and continuously collecting cross-border data flow path information, and constructing a heterogeneous multi-relation spatio-temporal hypergraph based on the cross-border data flow path information; extracting rich information features of the data flow path at four levels, namely nodes, edges, paths and the overall graph, based on the heterogeneous multi-relation spatio-temporal hypergraph; and constructing a collaborative behavior relation graph based on the rich information features, and monitoring multi-subject collaborative abnormal behavior by mining collaborative behavior patterns and performing anomaly identification and determination.
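
The first of these steps, building and incrementally updating the hypergraph, might look like the following Python sketch. It is illustrative only: the class and field names (`Node`, `FlowEvent`, `HyperEdge`, `SpatioTemporalHypergraph`, `update`) are hypothetical, the attribute lists follow the node and edge attributes named in claim 2, and the update is modeled as a plain union of newly observed nodes and hyperedges, which is an assumption because the source's update formula is not reproduced in this text.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Node:
    """A data subject: provider, processor or receiver."""
    node_id: str
    subject_type: str
    industry: str
    region: str
    security_level: int


@dataclass(frozen=True)
class FlowEvent:
    """One transmission between two consecutive nodes on a path."""
    behavior_type: str
    timestamp: float      # seconds since the epoch (assumed unit)
    data_volume: int      # bytes (assumed unit)
    src_ip: str
    dst_ip: str
    mode: str
    protocol: str
    sensitivity: int
    encrypted: bool


@dataclass(frozen=True)
class HyperEdge:
    """A flow path: an ordered node sequence with one event per hop."""
    node_ids: tuple
    events: tuple

    def __post_init__(self):
        if len(self.events) != len(self.node_ids) - 1:
            raise ValueError("expected exactly one event per hop")


@dataclass
class SpatioTemporalHypergraph:
    nodes: dict = field(default_factory=dict)
    hyperedges: list = field(default_factory=list)

    def update(self, new_nodes, new_edges):
        """Assumed update rule: merge new nodes and hyperedges into the graph."""
        for n in new_nodes:
            self.nodes[n.node_id] = n
        for e in new_edges:
            missing = [i for i in e.node_ids if i not in self.nodes]
            if missing:
                raise KeyError(f"unknown nodes: {missing}")
            self.hyperedges.append(e)
```

Storing each path as one hyperedge, rather than as separate pairwise edges, keeps the node order and the per-hop events together, which is what the path-level features in the later steps operate on.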
As a preferred scheme of the cross-border data risk monitoring method based on rich information of the data flow path, the step of constructing a heterogeneous multi-relation spatio-temporal hypergraph based on the cross-border data flow path information comprises: abstracting data providers, processors and receivers as nodes in the graph, the nodes carrying subject type, industry category, home region and security level attributes; establishing multi-type edges according to data access and transmission behaviors, the edges carrying transmission behavior type, transmission timestamp, data volume, transmission source/destination IP, transmission mode, transmission protocol, data sensitivity and whether-encrypted attributes; and dynamically updating the heterogeneous multi-relation spatio-temporal hypergraph as cross-border data services execute and time passes.

As a preferred scheme of the cross-border data risk monitoring method based on rich information of the data flow path, the step of extracting the rich information features of the data flow path at the node level comprises: analyzing data processing frequency,