CN-122001628-A - Risk identification method, apparatus, device and storage medium


Abstract

An embodiment of the present application provides a risk identification method, apparatus, device, and storage medium, and relates to the field of computer technologies. The method comprises: preprocessing alarm logs generated by a DLP system to generate initial event tables; aggregating the initial event tables according to predefined aggregation rules to obtain initial risk events; filtering the initial risk events based on a whitelist to obtain first target risk events; screening, from the first target risk events, second target risk events that meet preset alarm rules; performing risk level identification on each second target risk event using a large model to obtain at least one third target risk event with a high risk level; and pushing an alarm for the third target risk event. This achieves accurate noise reduction, so that genuinely high-risk events are identified and alerted on, and addresses the high false-alarm rate of traditional rule-based alerting.

Inventors

XU HAOZHOU
WANG SHEN
GUO JIUHUI

Assignees

度小满科技(北京)有限公司

Dates

Publication Date: 20260508
Application Date: 20260116

Claims (10)

1. A risk identification method, comprising: preprocessing each alarm log generated by a DLP system to generate each initial event table, and aggregating each initial event table according to a predefined aggregation rule to obtain each initial risk event; filtering each initial risk event based on a white list to obtain each first target risk event, and screening each second target risk event meeting a preset alarm rule from each first target risk event; and performing risk level identification on each second target risk event using a large model to obtain at least one third target risk event with a high risk level, and pushing an alarm for the at least one third target risk event.
2. The method of claim 1, wherein preprocessing each alarm log generated by the DLP system to generate each initial event table comprises: in response to a configuration operation triggered on a data access and preprocessing configuration interface, acquiring log access rule information and preprocessing rule information configured by a configuration object; configuring a data access and preprocessing system based on the log access rule information and the preprocessing rule information; acquiring, through the data access and preprocessing system, each alarm log that the DLP system has pushed to a Kafka message queue, and preprocessing each alarm log; and writing each preprocessed alarm log into a Metabase platform to generate each initial event table.
3. The method of claim 1, wherein the predefined aggregation rule includes an aggregation dimension and an aggregation index, and wherein aggregating each initial event table according to the predefined aggregation rule to obtain each initial risk event comprises: generating each initial risk event from the data of each initial event table according to the aggregation dimension and the aggregation index, wherein each initial risk event is a risk summary.
4. The method of claim 1, wherein the white list includes multi-dimensional trusted conditions, the multi-dimensional trusted conditions including a trusted Internet protocol address table, a trusted user account table, and a trusted business operation period table; and wherein filtering each initial risk event based on the white list to obtain each first target risk event comprises: determining, among the initial risk events, each false alarm risk event that matches any trusted condition in the white list; and filtering out the false alarm risk events from the initial risk events to obtain each first target risk event.
5. The method of claim 1, wherein performing risk level identification on each second target risk event using the large model to obtain at least one third target risk event with a high risk level comprises: for each second target risk event, respectively executing the following operations: constructing a prompt for one second target risk event, wherein the prompt comprises context information of the second target risk event and a research rule, and the research rule is used for judging whether data in the second target risk event is related to preset reference risk information; inputting the prompt into the large model to obtain a research result output by the large model; and when the similarity value contained in the research result is larger than a preset similarity threshold, determining that the risk level of the second target risk event is high risk, and taking the second target risk event as a third target risk event.
6. The method of claim 5, wherein the following steps are performed using the large model to obtain the research result output by the large model: judging whether the target file in the second target risk event is a personal file; if the target file is determined to be a personal file, determining that the similarity value is 0; and if the target file is determined to be a non-personal file, determining a similarity value between the context information of the second target risk event and the preset reference risk information.
7. The method of claim 6, wherein determining a similarity value between data to be detected in the one second target risk event and preset reference risk information comprises: determining a first matching degree between department information in the one second target risk event and department information in the reference risk information; determining a second matching degree between the filename in the one second target risk event and the filename in the reference risk information; determining a third matching degree between key information of the file summary in the second target risk event and risk characteristics in the reference risk information; determining a fourth matching degree between the application process in the second target risk event and the application process in the reference risk information; and determining the similarity value based on the first matching degree, the second matching degree, the third matching degree, and the fourth matching degree.
8. A risk identification device, comprising: an aggregation module, configured to preprocess each alarm log generated by the DLP system to generate each initial event table, and to aggregate each initial event table according to a predefined aggregation rule to obtain each initial risk event; a noise reduction module, configured to filter each initial risk event based on a white list to obtain each first target risk event, and to screen each second target risk event meeting a preset alarm rule from each first target risk event; and a large model analysis module, configured to perform risk level identification on each second target risk event using a large model to obtain at least one third target risk event with a high risk level, and to push an alarm for the at least one third target risk event.
9. An electronic device, comprising: a processor; and a memory in which a program is stored, wherein the program comprises instructions which, when executed by the processor, cause the processor to perform the method of any of claims 1-7.
10. A non-transitory computer readable storage medium storing computer instructions for causing a computer to perform the method of any one of claims 1-7.
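
The scoring logic of claims 5 to 7 can be illustrated with a small sketch. In the claims these steps are carried out by the large model itself; the plain-Python version below is only a deterministic stand-in that shows the shape of the computation. The class names, the matching functions (exact match, `difflib` ratio, Jaccard overlap), the equal weights, and the 0.6 threshold are all assumptions made for illustration; the claims do not specify how the four matching degrees are combined.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass
class EventContext:
    """Context of one second target risk event (hypothetical field names)."""
    department: str
    filename: str
    summary_keywords: frozenset
    process: str
    is_personal_file: bool


@dataclass
class ReferenceRisk:
    """Preset reference risk information (hypothetical field names)."""
    department: str
    filename: str
    risk_features: frozenset
    process: str


def text_match(a: str, b: str) -> float:
    """Case-insensitive string similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def keyword_match(keywords: frozenset, features: frozenset) -> float:
    """Jaccard overlap between summary keywords and risk features."""
    if not keywords or not features:
        return 0.0
    return len(keywords & features) / len(keywords | features)


def similarity(event, ref, weights=(0.25, 0.25, 0.25, 0.25)) -> float:
    # Claim 6: a personal file always yields a similarity value of 0.
    if event.is_personal_file:
        return 0.0
    # Claim 7: four matching degrees (department, filename, summary, process).
    degrees = (
        1.0 if event.department == ref.department else 0.0,
        text_match(event.filename, ref.filename),
        keyword_match(event.summary_keywords, ref.risk_features),
        1.0 if event.process == ref.process else 0.0,
    )
    # Assumed combination: a weighted average of the four degrees.
    return sum(w * d for w, d in zip(weights, degrees))


def is_high_risk(event, ref, threshold: float = 0.6) -> bool:
    # Claim 5: high risk when the similarity value exceeds the threshold.
    return similarity(event, ref) > threshold
```

An event whose four fields all agree with the reference scores 1.0 under these assumed weights, while any event flagged as a personal file scores 0 regardless of its other fields.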

Description

Risk identification method, apparatus, device and storage medium

Technical Field

The present application relates to the field of computer technologies, and in particular, to a risk identification method, apparatus, device, and storage medium.

Background

Currently, enterprises increasingly rely on data leakage prevention (DLP) systems as a key technical means of protecting core data assets. In daily operation, a DLP system can generate massive volumes of alarm logs (reaching the billion level per day). These alarm logs are highly real-time, complex in format, and contain a large number of false alarms. The security operations center therefore faces the significant challenge of accurately locating real threats within the alarm storm.

In the prior art, when risk study and judgment is carried out on the alarm logs, all the alarm logs are pushed to a security analyst, who then relies on personal experience to manually examine each alarm log and judge whether it represents a real threat.

However, with this approach the security analyst must manually study and judge a massive number of alarm logs one by one. This is inefficient and slow to respond, and may lead to misjudgment, so that real high-risk alarm events are buried and accurate risk identification is difficult to achieve.

Disclosure of Invention

The embodiments of the present application provide a risk identification method, apparatus, device, and storage medium, which are used to improve the accuracy and efficiency of high-risk alarm event identification.
In a first aspect, an embodiment of the present application provides a risk identification method, where the method includes: preprocessing each alarm log generated by a DLP system to generate each initial event table, and aggregating each initial event table according to a predefined aggregation rule to obtain each initial risk event; filtering each initial risk event based on a white list to obtain each first target risk event, and screening each second target risk event meeting a preset alarm rule from each first target risk event; and performing risk level identification on each second target risk event using a large model to obtain at least one third target risk event with a high risk level, and pushing an alarm for the at least one third target risk event.

In an alternative embodiment, preprocessing each alarm log generated by the DLP system to generate each initial event table includes: in response to a configuration operation triggered on a data access and preprocessing configuration interface, acquiring log access rule information and preprocessing rule information configured by a configuration object; configuring a data access and preprocessing system based on the log access rule information and the preprocessing rule information; acquiring, through the data access and preprocessing system, each alarm log that the DLP system has pushed to the Kafka message queue, and preprocessing each alarm log; and writing each preprocessed alarm log into the Metabase platform to generate each initial event table.
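
The noise-reduction stages of the first aspect (aggregation, white list filtering, and alarm rule screening) can be sketched as follows. This is a minimal illustration under stated assumptions, not the applicant's implementation: the row fields (`user`, `src_ip`, `hour`), the choice of `(user, src_ip)` as the aggregation dimension, the alarm count as the aggregation index, and the count-threshold alarm rule are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class RiskEvent:
    """Risk summary produced by aggregating rows of an initial event table."""
    user: str
    src_ip: str
    hour: int         # earliest hour of day seen in the group
    alarm_count: int  # aggregation index: number of alarm rows in the group


def aggregate(rows):
    """Group preprocessed alarm rows by (user, src_ip) and summarize each group."""
    groups = defaultdict(list)
    for row in rows:
        groups[(row["user"], row["src_ip"])].append(row)
    return [
        RiskEvent(user, ip, min(r["hour"] for r in grp), len(grp))
        for (user, ip), grp in groups.items()
    ]


def is_whitelisted(event, trusted_ips, trusted_users, trusted_hours):
    """An event counts as a false alarm if it matches ANY trusted condition."""
    return (
        event.src_ip in trusted_ips
        or event.user in trusted_users
        or event.hour in trusted_hours
    )


def screen(rows, trusted_ips, trusted_users, trusted_hours, min_alarms):
    """Return the second target risk events for a batch of preprocessed rows."""
    initial = aggregate(rows)
    first = [
        e for e in initial
        if not is_whitelisted(e, trusted_ips, trusted_users, trusted_hours)
    ]
    # Hypothetical alarm rule: keep events at or above an alarm-count threshold.
    return [e for e in first if e.alarm_count >= min_alarms]
```

Only the events returned by `screen` would be passed on to the large model, which is what keeps the number of model calls far below the raw alarm volume.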
In an alternative embodiment, the predefined aggregation rule comprises an aggregation dimension and an aggregation index, and aggregating each initial event table according to the predefined aggregation rule to obtain each initial risk event comprises: generating each initial risk event from the data of each initial event table according to the aggregation dimension and the aggregation index, wherein each initial risk event is a risk summary.

In an alternative embodiment, the white list comprises multi-dimensional trusted conditions, wherein the multi-dimensional trusted conditions comprise a trusted Internet protocol address table, a trusted user account table, and a trusted service operation time period table; filtering each initial risk event based on the white list to obtain each first target risk event includes: determining, among the initial risk events, each false alarm risk event that matches any one of the trusted conditions in the white list; and filtering out each false alarm risk event from the initial risk events to obtain each first target risk event.

In an alternative embodiment, performing risk level identification on each second target risk event using a large model to obtain at least one third target risk event with a high risk level includes: for each second target risk event, respectively executing the following operations: constructing a prompt for a second target risk event, wherein the prompt comprises context information of the second target risk event and a research rule, and the research rule is used for judging whether data in the second target risk event is related to preset reference risk information; inputting the