CN-122027248-A - Data leakage attack tracing method, device, medium and product


Abstract

The application provides a data leakage attack tracing method, device, medium and product, and relates to the technical field of data security. With the leakage event node as the end point, the upstream event node is traced back, and the leakage entity node and the upstream entity node are determined to generate a data flow relation sub-graph. The data flow relation sub-graph is mapped into a dynamic Bayesian network model, and causal strength is calculated based on a preset multidimensional evidence fusion rule comprising time sequence rationality scores among event nodes, entity node vulnerability scores and event node abnormality scores. The posterior probability of each potential attack path is inferred in the dynamic Bayesian network model based on the causal strength, and each potential attack path whose posterior probability is greater than a preset probability threshold is determined as an attack tracing path. This solves the technical problems of low tracing efficiency and high false alarm rate when facing data leakage events.
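The upstream tracing step can be illustrated with a minimal sketch. It assumes the data flow relation map is held as a plain list of directed `(source, target)` edges that point in the direction of data flow; the node names and the `trace_upstream` helper are hypothetical and are not taken from the application.

```python
from collections import deque


def trace_upstream(edges, leak_event):
    """Return the nodes that can reach leak_event, plus the edges among them."""
    # Reverse adjacency map: target node -> set of source nodes.
    reverse = {}
    for src, dst in edges:
        reverse.setdefault(dst, set()).add(src)

    # Breadth-first search backwards from the leakage event node.
    visited = {leak_event}
    queue = deque([leak_event])
    while queue:
        node = queue.popleft()
        for parent in reverse.get(node, ()):
            if parent not in visited:
                visited.add(parent)
                queue.append(parent)

    # Keep only the edges whose endpoints both lie in the traced node set.
    sub_edges = [(s, d) for s, d in edges if s in visited and d in visited]
    return visited, sub_edges


# Hypothetical map: entity nodes and event nodes joined by directed edges.
edges = [
    ("entity:customer_db", "event:query"),
    ("event:query", "event:export"),
    ("entity:app_server", "event:export"),
    ("event:export", "event:leak"),
    ("entity:ext_host", "event:leak"),
    ("event:leak", "event:later_activity"),  # downstream of the leak, not traced
    ("entity:hr_db", "event:backup"),        # unrelated branch, not traced
]
nodes, sub_edges = trace_upstream(edges, "event:leak")
```

In this sketch the sub-graph contains the leakage event node, its upstream event nodes, and the entity nodes attached to them; nodes that are only downstream of the leak or on unrelated branches are left out.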

Inventors

Bi Wenchong
LU YONGQIANG
WANG XIANG

Assignees

北京赋乐科技有限公司

Dates

Publication Date: 20260512
Application Date: 20260126

Claims (10)

1. A data leakage attack tracing method, characterized by comprising the following steps: when a leakage event is detected, looking up a leakage event node corresponding to the leakage event in a data flow relation map, wherein the data flow relation map comprises a plurality of entity nodes, a plurality of event nodes and a plurality of edges, and the edges comprise directed edges connecting entity nodes and event nodes and directed edges connecting event nodes and event nodes; in the data flow relation map, tracing back an upstream event node of the leakage event node with the leakage event node as the end point, and determining a leakage entity node connected to the leakage event node and an upstream entity node connected to the upstream event node, to generate a data flow relation sub-graph; mapping the data flow relation sub-graph into a dynamic Bayesian network model, and calculating the causal strength between event nodes in the dynamic Bayesian network based on a preset multidimensional evidence fusion rule, wherein the multidimensional evidence fusion rule at least comprises calculation rules for time sequence rationality scores between event nodes, entity node vulnerability scores and event node abnormality scores; based on the causal strength, inferring, in the dynamic Bayesian network model, the posterior probability of each potential attack path in the data flow relation sub-graph; and determining each potential attack path whose posterior probability is greater than a preset probability threshold as an attack tracing path.
2. The method of claim 1, wherein before the step of looking up a leakage event node corresponding to the leakage event in a data flow relation map when the leakage event is detected, the method further comprises: collecting full network traffic data, host audit logs, database access logs and application layer API call records, and extracting a plurality of standard triples through protocol analysis and semantic analysis, wherein the standard triples comprise entities, relations and events; generating a plurality of entity nodes and event nodes according to the standard triples, and marking the entity nodes corresponding to preset sensitive data as core entity nodes; and constructing the data flow relation map according to the entity nodes associated with the core entity nodes and the event nodes associated with the core entity nodes.
3. The method according to claim 2, wherein the standard triples comprise entity nodes and fusion entity nodes, and generating the plurality of entity nodes according to the standard triples specifically comprises: for any two entity nodes, calculating an attribute similarity score and a structure similarity score of the two entity nodes; performing a weighted fusion calculation on the attribute similarity score and the structure similarity score according to a preset fusion weight coefficient to obtain a final similarity score; and when the final similarity score is greater than a preset similarity score threshold, merging the attribute lists of the two entity nodes to generate the fusion entity node.
4. The method of claim 1, wherein after the step of determining the potential attack path whose posterior probability is greater than a preset probability threshold as an attack tracing path, the method further comprises: determining an illegal operation event node in the attack tracing path based on a preset illegal operation judging rule; taking the illegal operation event node as a starting point, performing a multi-hop traversal in the data flow relation map, and determining the downstream entity nodes that are directly or indirectly associated with the illegal operation event node; and determining the number of influencing nodes and the propagation level of the illegal operation event node based on a preset data lineage pollution degree algorithm and the downstream entity nodes, and generating a data analysis report.
5. The method according to claim 3, wherein generating the plurality of entity nodes and event nodes according to the standard triples specifically further comprises: acquiring entity multidimensional features of the entity nodes or the fusion entity nodes, and performing a weighted aggregation calculation on the entity multidimensional features to obtain the vulnerability scores of the entity nodes, wherein the entity multidimensional features comprise sensitivity level parameters, exposure degree parameters and configuration state parameters; and acquiring event multidimensional features of the event nodes, and calculating the abnormality score of each event node according to the degree of deviation of its event multidimensional features from the historical entity multidimensional features, wherein the event multidimensional features comprise access time features, data volume features and behavior pattern features.
6. The method according to claim 1, wherein the causal strength between event nodes in the dynamic Bayesian network is calculated based on the preset multidimensional evidence fusion rule, specifically using a formula (the formula and its symbols are not reproduced in this text) whose terms represent, in order: the causal strength from one event node to another event node; the time sequence rationality score based on the timestamp difference; the event node abnormality score of the event node; the entity node vulnerability score of the entity node associated with the event node; the weight coefficient corresponding to the time sequence rationality score; the weight coefficient corresponding to the event node abnormality score; and the weight coefficient corresponding to the entity node vulnerability score.
7. The method according to claim 4, wherein determining the number of influencing nodes of the illegal operation event node based on the preset data lineage pollution degree algorithm and the downstream entity nodes specifically comprises: acquiring the initial sensitivity level parameter corresponding to the data operated on by the illegal operation event node; identifying the circulation path from the illegal operation event node to each downstream entity node, and determining the sensitivity attenuation coefficient of each downstream entity node according to the event operation types contained in the circulation path, wherein the event operation types comprise full replication, aggregation statistics or field extraction; calculating the sensitivity parameter value of each downstream entity node based on the initial sensitivity level parameter and the sensitivity attenuation coefficient; and screening out the downstream entity nodes whose sensitivity parameter value is greater than a preset pollution threshold, and counting them as the number of influencing nodes.
8. A data leakage attack-tracing device comprising one or more processors and a memory, the memory coupled to the one or more processors, the memory to store computer program code, the computer program code comprising computer instructions, the one or more processors to invoke the computer instructions to cause the data leakage attack-tracing device to perform the method of any of claims 1-7.
9. A computer readable storage medium comprising instructions which, when run on a data leakage attack-tracing device, cause the data leakage attack-tracing device to perform the method of any one of claims 1-7.
10. A computer program product, characterized in that the computer program product, when run on a data leakage attack-tracing device, causes the data leakage attack-tracing device to perform the method according to any one of claims 1-7.
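The evidence fusion of claims 1 and 6 can be sketched as follows. The formula of claim 6 is not legible in this text, so the sketch assumes a linear weighted sum, C = w_T·T + w_A·A + w_V·V, over the three scores the claim names. The exponential timing score, the weight values, and the product-and-normalize scoring of paths are also assumptions; the path scoring in particular is a simplified stand-in for inference in a dynamic Bayesian network, not the claimed inference itself.

```python
import math


def timing_score(dt_seconds, scale=3600.0):
    """Assumed time sequence rationality score from a timestamp difference."""
    # An effect that precedes its cause is treated as implausible.
    if dt_seconds < 0:
        return 0.0
    # Plausibility decays as the gap between the two events grows.
    return math.exp(-dt_seconds / scale)


def causal_strength(t_score, anomaly, vulnerability, weights=(0.4, 0.3, 0.3)):
    """Assumed linear fusion of the three evidence scores (weights hypothetical)."""
    w_t, w_a, w_v = weights
    return w_t * t_score + w_a * anomaly + w_v * vulnerability


def path_posteriors(paths):
    """Normalize the product of edge strengths over all candidate paths."""
    raw = {name: math.prod(strengths) for name, strengths in paths.items()}
    total = sum(raw.values())
    if total == 0:
        return {name: 0.0 for name in raw}
    return {name: value / total for name, value in raw.items()}


# Hypothetical candidate paths, each given as a list of edge causal strengths.
candidate_paths = {
    "via_app_server": [0.9, 0.8],
    "via_backup_job": [0.3, 0.2],
}
posteriors = path_posteriors(candidate_paths)
THRESHOLD = 0.5  # stands in for the preset probability threshold
attack_paths = [name for name, p in posteriors.items() if p > THRESHOLD]
```

With these illustrative numbers only `via_app_server` exceeds the threshold and would be reported as the attack tracing path.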
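The entity fusion of claim 3 can be sketched under stated assumptions. The claim does not say how the attribute and structure similarity scores are computed, so this sketch uses Jaccard similarity over attribute key-value pairs and over neighboring event nodes; the dictionary layout, weight and threshold values are hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity of two collections; 0.0 when both are empty."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def fuse_if_similar(e1, e2, attr_weight=0.5, threshold=0.6):
    """Return (merged_entity_or_None, final_similarity_score)."""
    # Attribute similarity: overlap of (key, value) pairs (assumed measure).
    attr_sim = jaccard(e1["attrs"].items(), e2["attrs"].items())
    # Structure similarity: overlap of neighboring event nodes (assumed measure).
    struct_sim = jaccard(e1["neighbors"], e2["neighbors"])
    # Weighted fusion with a preset fusion weight coefficient.
    final = attr_weight * attr_sim + (1 - attr_weight) * struct_sim
    if final <= threshold:
        return None, final
    merged = {
        "attrs": {**e1["attrs"], **e2["attrs"]},
        "neighbors": set(e1["neighbors"]) | set(e2["neighbors"]),
    }
    return merged, final


# Hypothetical entity records seen in two different log sources.
db_from_netflow = {
    "attrs": {"ip": "10.0.0.5", "host": "db01"},
    "neighbors": {"ev1", "ev2", "ev3"},
}
db_from_audit = {
    "attrs": {"ip": "10.0.0.5", "host": "db01", "os": "linux"},
    "neighbors": {"ev1", "ev2", "ev3", "ev4"},
}
other_host = {"attrs": {"ip": "10.0.0.9"}, "neighbors": {"ev9"}}

merged, score = fuse_if_similar(db_from_netflow, db_from_audit)
not_merged, low_score = fuse_if_similar(db_from_netflow, other_host)
```

Here the two database records are merged into one fusion entity node with the union of their attributes, while the unrelated host stays separate.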
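The counting step of claim 7 can be sketched as below. The claim names the operation types but gives no coefficient values and does not state how coefficients combine along a path, so the attenuation table and the multiplication along the circulation path are assumptions, as are the node names.

```python
# Hypothetical attenuation coefficients per event operation type.
ATTENUATION = {
    "full_replication": 1.0,   # a full copy keeps the original sensitivity
    "field_extraction": 0.6,   # only some fields are carried over
    "aggregation": 0.3,        # statistics reveal less than raw records
}


def node_sensitivity(initial_level, operations):
    """Attenuate the initial sensitivity along one circulation path."""
    value = initial_level
    for op in operations:
        value *= ATTENUATION[op]
    return value


def count_influenced(initial_level, paths, pollution_threshold):
    """Return the count and sorted names of nodes above the pollution threshold."""
    influenced = sorted(
        node
        for node, ops in paths.items()
        if node_sensitivity(initial_level, ops) > pollution_threshold
    )
    return len(influenced), influenced


# Hypothetical circulation paths from the illegal operation event node.
paths = {
    "backup_table": ["full_replication"],
    "monthly_report": ["full_replication", "aggregation"],
    "export_file": ["field_extraction"],
    "dashboard": ["aggregation", "aggregation"],
}
count, names = count_influenced(10.0, paths, pollution_threshold=2.5)
```

With an initial sensitivity of 10.0 the four nodes score roughly 10, 3, 6 and 0.9, so three of them exceed the threshold of 2.5 and are counted as influencing nodes.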

Description

Data leakage attack tracing method, device, medium and product

Technical Field

The application relates to the technical field of data security, and in particular to a data leakage attack tracing method, device, medium and product.

Background

In the current digital age, data has become a core asset for enterprises and organizations, and data security faces serious challenges. As business systems grow more complex, sensitive data flows frequently through links such as network transmission, application processing and database storage, and the risk of data leakage increases dramatically. Once a data leakage event occurs, quickly and accurately locating the leakage source and restoring the attack path is important for reducing losses and repairing vulnerabilities.

At present, to guarantee data security, the prior art mainly adopts boundary protection means such as firewalls and Intrusion Detection Systems (IDS) for passive defense, and focuses on blocking external illegal access. For tracing after a data leakage event has occurred, the prior art relies on manual, piecemeal analysis of various system logs. Specifically, security personnel need to pull the firewall logs, the Web application logs and the database logs separately and attempt to manually piece together the trajectory of the data flow by matching timestamps or IP addresses in order to find abnormal clues.

However, the prior art described above has significant limitations when facing massive data and complex data flow paths. First, firewalls and IDS mainly focus on network boundaries and lack the ability to perceive the details of internal data flows. Second, traditional log analysis fragments the data and cannot construct a full-flow data flow relation map, so the data flow relations among different nodes are severed.
The prior art cannot track the complete life cycle of sensitive data within a system in real time. When data undergoes multiple hops, transformations or aggregations in complex business logic, its logical relations are difficult to untangle by log analysis alone. As a result, tracing efficiency is low and the false alarm rate is high when facing data leakage events, which cannot meet the accuracy requirements of modern enterprises for data security emergency response.

Disclosure of Invention

The application provides a data leakage attack tracing method, device, medium and product, which are used for solving the technical problems of low tracing efficiency and high false alarm rate in the prior art when facing a data leakage event.

In a first aspect, the present application provides a data leakage attack tracing method, including:

when a leakage event is detected, looking up a leakage event node corresponding to the leakage event in a data flow relation map, wherein the data flow relation map comprises a plurality of entity nodes, a plurality of event nodes and a plurality of edges, and the edges comprise directed edges connecting entity nodes and event nodes and directed edges connecting event nodes and event nodes;

in the data flow relation map, tracing back an upstream event node of the leakage event node with the leakage event node as the end point, and determining a leakage entity node connected to the leakage event node and an upstream entity node connected to the upstream event node, to generate a data flow relation sub-graph;

mapping the data flow relation sub-graph into a dynamic Bayesian network model, and calculating the causal strength between event nodes in the dynamic Bayesian network based on a preset multidimensional evidence fusion rule, wherein the multidimensional evidence fusion rule at least comprises calculation rules for time sequence rationality scores between event nodes, entity node vulnerability scores and event node abnormality scores;

based on the causal strength, inferring, in the dynamic Bayesian network model, the posterior probability of each potential attack path in the data flow relation sub-graph;

and determining each potential attack path whose posterior probability is greater than the preset probability threshold as an attack tracing path.

Optionally, before the step of looking up a leakage event node corresponding to the leakage event in the data flow relation map when the leakage event is detected, the method further includes:

collecting full network traffic data, host audit logs, database access logs and application layer API call records, and extracting a plurality of standard triples through protocol analysis and semantic analysis, wherein the standard triples comprise entities, relations and events;

generating a plurality of entity nodes and event nodes according to the standard triples, and marking the entity nodes corresponding to preset sensitive data as core entity nodes;

and constr