CN-121984758-A - Network security situation assessment method based on data mining

Abstract

The invention relates to the technical field of network security and discloses a network security situation assessment method based on data mining. The method comprises: S1, acquiring multi-source data related to network security; S2, preprocessing the multi-source data to obtain event data; S3, constructing feature data for situation assessment based on the event data; S4, inputting the feature data into a current network situation assessment model to generate a current network assessment result; S5, generating a candidate situation assessment model and, within a shadow assessment window, outputting a candidate assessment result based on the same batch of feature data; S6, calculating a quality score from the current network assessment result and the candidate assessment result, and comparing the quality score with a quality threshold; S7, when a rollback trigger condition is met, rolling back to a historical situation assessment model and starting a freezing window, then calculating a drift index within the freezing window and comparing it with a drift threshold. The method thereby makes updates to the assessment model controllable.
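The shadow comparison in S5 and S6 can be illustrated with a minimal Python sketch. The text here does not give formulas for the difference index, the fluctuation index or the quality score, so the definitions below (mean absolute difference, population standard deviation, and one minus a weighted penalty) are assumptions made purely for illustration, as are the function names, the weights and the threshold value. Situation values are assumed to be normalized to the range 0 to 1.

```python
from statistics import mean, pstdev


def difference_index(current, candidate):
    """Assumed definition: mean absolute difference between paired results."""
    if len(current) != len(candidate):
        raise ValueError("results must be aligned pair by pair")
    return mean(abs(c - k) for c, k in zip(current, candidate))


def fluctuation_index(candidate):
    """Assumed definition: population standard deviation of candidate results."""
    return pstdev(candidate)


def quality_score(current, candidate, w_diff=0.5, w_fluct=0.5):
    """Assumed scoring: 1 minus a weighted sum of both indices, clamped to [0, 1]."""
    penalty = (w_diff * difference_index(current, candidate)
               + w_fluct * fluctuation_index(candidate))
    return max(0.0, min(1.0, 1.0 - penalty))


def should_switch(current, candidate, quality_threshold=0.8):
    """Switch to the candidate model only if the score reaches the threshold."""
    return quality_score(current, candidate) >= quality_threshold


if __name__ == "__main__":
    # Both models score the same batch of feature data in the shadow window.
    current_results = [0.2, 0.4, 0.6]
    stable_candidate = [0.2, 0.4, 0.6]
    erratic_candidate = [0.9, 0.1, 0.9]
    print(should_switch(current_results, stable_candidate))
    print(should_switch(current_results, erratic_candidate))
```

Under these assumed definitions, a candidate that tracks the current model closely and varies little keeps a high score, while a candidate that diverges or oscillates is penalized on both indices and is not switched in.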

Inventors

  • Pei Jiaxi
  • Liu Xingcai
  • Wang Yilong

Assignees

  • 北京中创海晟科技有限公司

Dates

Publication Date
2026-05-05
Application Date
2026-02-05

Claims (10)

  1. A network security situation assessment method based on data mining, characterized by comprising the following steps: S1, acquiring multi-source data related to network security; S2, preprocessing the multi-source data to obtain event data, wherein the preprocessing comprises at least de-duplication and time alignment; S3, constructing feature data for situation assessment based on the event data; S4, inputting the feature data into a current network situation assessment model to generate a current network assessment result; S5, generating a candidate situation assessment model and, within a shadow assessment window, outputting a current network assessment result from the current network situation assessment model and a candidate assessment result from the candidate situation assessment model based on the same batch of feature data; S6, calculating a quality score from the current network assessment result and the candidate assessment result, and comparing the quality score with a quality threshold to determine whether to switch the candidate situation assessment model to be the current network situation assessment model, wherein the quality score is calculated based on at least one of a difference index and a fluctuation index; S7, when a rollback trigger condition is met, rolling back to a historical situation assessment model and starting a freezing window, prohibiting model updates within the freezing window, calculating a drift index, and comparing the drift index with a drift threshold to determine whether to release or continue the freezing window, wherein the rollback trigger condition includes at least the quality score being below the quality threshold.
  2. The network security situation assessment method based on data mining according to claim 1, wherein S1 comprises: acquiring network security data from preset data sources, wherein the data sources comprise a network traffic data source, a network security alarm data source, an authentication log data source, a host log data source, an asset data source and a vulnerability data source; unifying the timestamps of the acquired network security data and mapping the network security data to a unified time reference; and performing format verification on the acquired network security data and taking the data that passes verification as the multi-source data.
  3. The network security situation assessment method based on data mining according to claim 1, wherein S2 comprises: performing de-duplication on the multi-source data to obtain de-duplicated data; performing time alignment on the de-duplicated data by mapping it to a preset time window to obtain aligned data; and normalizing the aligned data and converting the normalized data into a unified event representation to obtain the event data.
  4. The network security situation assessment method based on data mining according to claim 1, wherein S3 comprises: extracting statistical features based on the event data, wherein the statistical features comprise at least one of event count, event frequency and event duration; extracting sequence features based on the event data, wherein the sequence features represent the order of events within a preset time window; and combining the statistical features with the sequence features to obtain the feature data.
  5. The network security situation assessment method based on data mining according to claim 1, wherein S4 comprises: inputting the feature data into the current network situation assessment model and outputting at least one of a situation value and a situation level as the current network assessment result; associating the current network assessment result with its corresponding time window and recording the association; and using the recorded current network assessment result in the subsequent calculation of the quality score.
  6. The network security situation assessment method based on data mining according to claim 1, wherein S5 comprises: setting a shadow assessment window, wherein the shadow assessment window is a time window of preset duration and/or a sample window of a preset number of samples; within the shadow assessment window, based on the same batch of feature data, outputting the current network assessment result from the current network situation assessment model and the candidate assessment result from the candidate situation assessment model; and aligning the current network assessment result with the candidate assessment result according to the time window corresponding to the shadow assessment window to obtain aligned assessment result pairs.
  7. The network security situation assessment method based on data mining according to claim 6, wherein aligning the current network assessment result with the candidate assessment result according to the time window corresponding to the shadow assessment window comprises: assigning window identifiers corresponding to the shadow assessment window to the current network assessment results and the candidate assessment results respectively; matching the current network assessment results with the candidate assessment results based on the window identifiers to form a plurality of assessment result pairs; and, when unmatched candidate assessment results or unmatched current network assessment results exist, processing the unmatched assessment results according to a preset completion rule to obtain the aligned assessment result pairs.
  8. The network security situation assessment method based on data mining according to claim 1, wherein S6 comprises: calculating a difference index based on the current network assessment result and the candidate assessment result, wherein the difference index represents the difference between the current network assessment result and the candidate assessment result; calculating a fluctuation index from the candidate assessment result, wherein the fluctuation index represents the degree of fluctuation of the candidate assessment result within the shadow assessment window; and calculating the quality score based on the difference index and the fluctuation index, and comparing the quality score with the quality threshold to determine whether to switch the candidate situation assessment model to be the current network situation assessment model.
  9. The network security situation assessment method based on data mining according to claim 1, wherein S7 comprises: determining that the rollback trigger condition is met when the quality score is below the quality threshold; when the rollback trigger condition is met, rolling back to a historical situation assessment model, starting a freezing window, and prohibiting model updates within the freezing window; and calculating a drift index within the freezing window and comparing it with a drift threshold to determine whether to release or continue the freezing window.
  10. The network security situation assessment method based on data mining according to claim 9, wherein rolling back to a historical situation assessment model comprises: acquiring a plurality of historical situation assessment models and their corresponding historical quality scores; sorting the plurality of historical situation assessment models according to the historical quality scores; and selecting, from the sorted result, a target historical situation assessment model that meets preset conditions as the historical situation assessment model; wherein the preset conditions include that the historical quality score of the target historical situation assessment model is not lower than a preset score threshold and is the highest among the historical situation assessment models that meet the preset score threshold.
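The rollback target selection of claim 10 and the freezing window decision of claim 9 can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the claimed implementation: the `HistoricalModel` record, the function names and the example scores are hypothetical, and the rule that the freezing window is released when the drift index is at or below the drift threshold is an assumption, since the claims only state that the two values are compared.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class HistoricalModel:
    name: str             # hypothetical model identifier
    quality_score: float  # historical quality score recorded for the model


def select_rollback_target(history: List[HistoricalModel],
                           score_threshold: float) -> Optional[HistoricalModel]:
    """Pick the highest-scoring historical model that meets the score threshold."""
    eligible = [m for m in history if m.quality_score >= score_threshold]
    # Sort by historical quality score, best first, as in claim 10.
    ranked = sorted(eligible, key=lambda m: m.quality_score, reverse=True)
    return ranked[0] if ranked else None


def freeze_window_action(drift_index: float, drift_threshold: float) -> str:
    """Assumed rule: release at or below the threshold, otherwise keep freezing."""
    return "release" if drift_index <= drift_threshold else "continue"


if __name__ == "__main__":
    history = [
        HistoricalModel("model_a", 0.72),
        HistoricalModel("model_b", 0.91),
        HistoricalModel("model_c", 0.85),
    ]
    target = select_rollback_target(history, score_threshold=0.8)
    print(target.name if target else "no eligible model")
    print(freeze_window_action(drift_index=0.5, drift_threshold=0.3))
```

Returning `None` when no historical model meets the score threshold is a design choice of this sketch; the claims do not say what happens in that case.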

Description

Network security situation assessment method based on data mining

Technical Field

The invention relates to the technical field of network security, in particular to a network security situation assessment method based on data mining.

Background

Network security situation assessment is a technology for comprehensively analyzing and characterizing the security state of a network and its information systems within a given time range. It generally forms an overall judgment of risk levels, attack activity and abnormal behavior based on data such as network traffic, alarm information, authentication logs, host logs, assets and vulnerabilities. Using data mining techniques such as data cleaning, feature extraction, association analysis and model calculation, it extracts representative behavioral features from massive, multi-source, heterogeneous security data and outputs assessment results such as situation values or situation levels to support security monitoring and management decisions. Existing data-mining-based network security situation assessment usually applies a pre-trained or continuously updated assessment model to the feature data, and in actual operation the model often has to be iterated or switched to adapt to changing threats.

However, in the current technology, when the situation assessment model is iterated or switched, there is often no process for verifying the candidate model's output by parallel comparison on the same batch of feature data. The switching decision mostly relies on empirical rules or a single assessment result, which easily causes deviation or fluctuation in the assessment results after switching and increases the uncertainty of the online update process.
Disclosure of Invention

In view of the shortcomings of the prior art, the invention provides a network security situation assessment method based on data mining. It addresses the problem that the situation assessment model lacks a quantifiable basis for verification and switching during iteration or switching, which easily causes deviation or fluctuation in the assessment results.

To achieve this purpose, the invention is realized by the following technical solution. A network security situation assessment method based on data mining comprises the following steps:

S1, acquiring multi-source data related to network security;

S2, preprocessing the multi-source data to obtain event data, wherein the preprocessing comprises at least de-duplication and time alignment;

S3, constructing feature data for situation assessment based on the event data;

S4, inputting the feature data into a current network situation assessment model to generate a current network assessment result;

S5, generating a candidate situation assessment model and, within a shadow assessment window, outputting a current network assessment result from the current network situation assessment model and a candidate assessment result from the candidate situation assessment model based on the same batch of feature data;

S6, calculating a quality score from the current network assessment result and the candidate assessment result, and comparing the quality score with a quality threshold to determine whether to switch the candidate situation assessment model to be the current network situation assessment model, wherein the quality score is calculated based on at least one of a difference index and a fluctuation index;

S7, when a rollback trigger condition is met, rolling back to a historical situation assessment model and starting a freezing window, prohibiting model updates within the freezing window, calculating a drift index, and comparing the drift index with a drift threshold to determine whether to release or continue the freezing window, wherein the rollback trigger condition includes at least the quality score being below the quality threshold.

Preferably, S1 comprises: acquiring network security data from preset data sources, wherein the data sources comprise a network traffic data source, a network security alarm data source, an authentication log data source, a host log data source, an asset data source and a vulnerability data source; unifying the timestamps of the acquired network security data and mapping the network security data to a unified time reference; and performing format verification on the acquired network security data and taking the data that passes verification as the multi-source data.

Preferably, S2 comprises: performing de-duplication on the multi-source data to obtain de-duplicated data; performing time alignment on the de-duplicated data by mapping it to a preset time window to obtain aligned data; and normalizing the aligned data, and converting the normal