CN-121997322-A - AI-driven malicious software real-time monitoring method in network space

Abstract

The invention relates to the technical field of malware monitoring and discloses an AI-driven method for real-time monitoring of malicious software in a network space. The method comprises three steps: multi-dimensional kernel behavior data acquisition, kernel behavior semantic conversion and model construction, and malicious software cluster analysis based on behavior features. A distributed monitoring engine integrates five sub-modules (process thread, registry, file system, network, and system service) to capture, in a targeted manner, the kernel behaviors of malicious software across its full life cycle, such as process creation, file tampering, and network communication. This avoids the omissions of single-dimension monitoring and enables comprehensive tracking of malicious behavior. The ambiguous kernel function call logs are converted into a structured model comprising four parts, BEHAVIOR, ARGBLOCK, TARGET, and HANDLE, in which behavior types are precisely defined by a category identifier and an operation identifier, providing a high-quality feature basis for subsequent clustering.

Inventors

TAN GONGQIANG

Assignees

新国脉数字文化股份有限公司

Dates

Publication Date: 20260508
Application Date: 20251118

Claims (8)

1. An AI-driven method for real-time monitoring of malicious software in a network space, characterized by comprising the following steps: Step one, multi-dimensional kernel behavior data acquisition: comprehensively capturing system kernel behavior logs through a distributed monitoring engine composed of a process thread monitoring sub-module, a registry monitoring sub-module, a file system monitoring sub-module, a network monitoring sub-module, and a system service monitoring sub-module, wherein the sub-modules respectively perform targeted monitoring of the process life cycle, registry operations, file operations, network interaction, and system service calls, so as to achieve multi-dimensional coverage of the core behaviors of malicious software; Step two, kernel behavior semantic conversion and model construction: performing semantic conversion on the raw kernel function call logs acquired in step one, converting the semantically aggregated function call sequences into semantically discrete, well-defined kernel behavior representations, and constructing a kernel behavior model comprising four parts, namely BEHAVIOR, ARGBLOCK, TARGET, and HANDLE; Step three, malicious software cluster analysis based on behavior features: preprocessing the kernel behavior sequences generated in step two by merging redundant behaviors that occur repeatedly in the sequence to reduce the data volume; extracting sliding-window features from the preprocessed behavior sequences with an n-gram model; calculating feature weights with the TF-IDF algorithm and mapping them to a multi-dimensional vector space; and clustering the malicious software behavior features in the vector space with a hierarchical clustering algorithm, wherein a complete-link algorithm is used to calculate inter-cluster distances, and the family attribution of the malicious software is determined according to distance in the vector space.
2. The AI-driven method for real-time monitoring of malicious software in a network space according to claim 1, characterized in that, in step one, the process thread monitoring sub-module registers a process creation callback function, a process deletion callback function, a thread start callback function, a thread end callback function, and an image load callback function, and thereby captures in real time the full life cycle events of a process from creation to deletion, thread state change events, and the path and timestamp information of image file loading.
3. The method according to claim 1, characterized in that, in step one, the file system monitoring sub-module uses a file system minifilter driver, registers filtering rules for file creation, file reading, file writing, file deletion, and file renaming with the system filter manager, captures file operation events on specified paths (including the system directory, user directories, and temporary directories) through callback functions, and records the ID of the initiating process, the operation type, and the operation result.
4. The method according to claim 1, characterized in that, in step one, the network monitoring sub-module uses a TDI filter driver to intercept the system's network IRP packets, and parses and extracts the source IP address, destination IP address, source port, destination port, and protocol type from network connection establishment events, the data length and direction from network data send/receive events, and the timestamp and status code from network disconnection events, thereby achieving comprehensive recording of the network interaction behavior of the malicious software.
5. The AI-driven method for real-time monitoring of malicious software in a network space according to claim 1, characterized in that, in step two, the BEHAVIOR part consists of a category identifier and an operation identifier, wherein the category identifier indicates the broad class to which the behavior belongs (including process operations, file operations, network operations, registry operations, and system service calls), and the operation identifier indicates the specific behavior type within that class, the two together precisely defining the specific behavior type of an API call.
6. The AI-driven method for real-time monitoring of malicious software in a network space according to claim 1, characterized in that, in step two, the ARGBLOCK part records the core parameters of a behavior, the parameters being common features that remain stable across malicious software of the same family, the common features including, but not limited to, the type of an image file, the system path in which the image file is located, the key path of a registry operation, and the destination port range of a network connection.
7. The AI-driven method for real-time monitoring of malicious software in a network space according to claim 1, characterized in that mapping the behavior sequence to the multi-dimensional vector space with the n-gram model and the TF-IDF algorithm in step three comprises: using the n-gram model to slide a window of preset size (n=2 or n=3) over the behavior sequence and extract behavior subsequences as feature units; using the TF-IDF algorithm to calculate the weight of each feature unit in the behavior sequence, the weight being positively correlated with the frequency of occurrence of the feature unit and negatively correlated with the breadth of its distribution across different malicious software samples; and finally generating a multi-dimensional vector that characterizes the behavior features of the malicious software.
8. The AI-driven method for real-time monitoring of malicious software in a network space according to claim 1, characterized in that the inter-cluster distance calculation of the hierarchical clustering in step three uses the complete-link algorithm, namely, for two sample clusters under consideration for merging, the Euclidean distance between every pair of data points drawn one from each cluster is calculated, and the maximum distance is taken as the inter-cluster distance between the two clusters; this distance measure enables accurate delineation of malicious software family boundaries.
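
The vectorization and clustering recited in claims 7 and 8 can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions, not the applicant's implementation: the behavior labels, the `threshold` stopping parameter, and the particular TF-IDF formula (term frequency times log(N/df)) are assumptions, since the claims fix only the window size, the direction of the correlations, and the use of the maximum pairwise Euclidean distance.

```python
import math
from collections import Counter
from itertools import combinations


def ngrams(seq, n):
    """Slide a window of size n over a behavior sequence."""
    return [tuple(seq[i:i + n]) for i in range(len(seq) - n + 1)]


def tfidf_vectors(sequences, n=2):
    """Map each behavior sequence to a TF-IDF weighted vector over its n-grams."""
    grams = [Counter(ngrams(s, n)) for s in sequences]
    vocab = sorted({g for c in grams for g in c})
    n_docs = len(sequences)
    # Document frequency: number of samples in which each n-gram occurs.
    df = Counter(g for c in grams for g in c)
    vectors = []
    for c in grams:
        total = sum(c.values()) or 1
        # Assumed weighting: tf * log(N / df); higher when frequent in this
        # sample, lower when spread across many samples.
        vectors.append([(c[g] / total) * math.log(n_docs / df[g]) for g in vocab])
    return vectors


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def complete_link(c1, c2, vectors):
    """Inter-cluster distance: maximum pairwise Euclidean distance."""
    return max(euclidean(vectors[i], vectors[j]) for i in c1 for j in c2)


def cluster(vectors, threshold):
    """Agglomerative clustering; stops when the closest pair of clusters is
    farther apart than `threshold` (a hypothetical stopping rule)."""
    clusters = [[i] for i in range(len(vectors))]
    while len(clusters) > 1:
        (a, b), d = min(
            (((a, b), complete_link(clusters[a], clusters[b], vectors))
             for a, b in combinations(range(len(clusters)), 2)),
            key=lambda item: item[1],
        )
        if d > threshold:
            break
        clusters[a].extend(clusters[b])
        del clusters[b]  # safe: combinations guarantees a < b
    return [sorted(c) for c in clusters]


# Hypothetical behavior sequences (category.operation labels are invented).
base_a = ["PROC.CREATE", "FILE.WRITE", "REG.SET", "NET.CONNECT"]
base_b = ["FILE.READ", "NET.SEND", "NET.CLOSE"]
samples = [base_a, base_a + ["FILE.DELETE"], base_b, base_b + ["PROC.EXIT"]]

families = cluster(tfidf_vectors(samples, n=2), threshold=0.55)
print(families)
```

In this toy input the first two samples share most of their bigrams, as do the last two, so each pair should end up in its own cluster. Complete linkage tends to produce compact clusters because a merge is only as cheap as its worst pair of members.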

Description

AI-driven malicious software real-time monitoring method in network space

Technical Field

The invention belongs to the technical field of malware monitoring, and particularly relates to an AI-driven method for real-time monitoring of malicious software in a network space.

Background

As attack and defense confrontation in network space intensifies, malicious software exhibits rapid variant iteration, diverse evasion techniques, and family-based propagation, posing a serious challenge to traditional monitoring technology. Traditional malicious software detection relies on static signature matching, but malicious software frequently evades signature library identification through techniques such as packing, code obfuscation, and dynamic behavior mutation, so detection lag is pronounced. Even existing schemes based on behavior analysis have three core defects:

1. Incomplete behavior data acquisition: most schemes monitor only a single dimension of behavior (such as API calls) and ignore key malicious behaviors such as process life cycle events, registry tampering, and network interaction, so behavior feature coverage is incomplete and detections are easily missed.
2. Ambiguous and insufficiently standardized behavior characterization: the raw kernel function call log is used directly as the feature, the log semantics are aggregated (for example, one malicious operation may correspond to several combinations of function calls), and structured modeling is lacking, so malicious and normal behavior are difficult to distinguish accurately and feature noise is high.
3. Low clustering efficiency and family identification accuracy: the behavior sequence is not de-duplicated, so repeated behaviors cause the data volume to surge, the dimensionality expands after vectorization, and similarity calculation takes too long; meanwhile, family clustering relies on a single behavior feature (such as call frequency) and ignores deeper features such as behavior parameters and operation objects, so family attribution errors for variant malicious software are large.

Improvement on the current state of the art is therefore needed.

Disclosure of Invention

In view of the above, and to overcome the defects of the prior art, the invention provides an AI-driven method for real-time monitoring of malicious software in a network space, which effectively solves the prior-art problems of incomplete behavior data acquisition, ambiguous and insufficiently standardized behavior characterization, and low clustering efficiency and family identification accuracy.

To achieve the above purpose, the invention provides the following technical solution: an AI-driven method for real-time monitoring of malicious software in a network space, characterized by comprising the following steps:
Step one, multi-dimensional kernel behavior data acquisition: comprehensively capturing system kernel behavior logs through a distributed monitoring engine composed of a process thread monitoring sub-module, a registry monitoring sub-module, a file system monitoring sub-module, a network monitoring sub-module, and a system service monitoring sub-module, wherein the sub-modules respectively perform targeted monitoring of the process life cycle, registry operations, file operations, network interaction, and system service calls, so as to achieve multi-dimensional coverage of the core behaviors of malicious software;
Step two, kernel behavior semantic conversion and model construction: performing semantic conversion on the raw kernel function call logs acquired in step one, converting the semantically aggregated function call sequences into semantically discrete, well-defined kernel behavior representations, and constructing a kernel behavior model comprising four parts, namely BEHAVIOR, ARGBLOCK, TARGET, and HANDLE;
Step three, malicious software cluster analysis based on behavior features: preprocessing the kernel behavior sequences generated in step two by merging redundant behaviors that occur repeatedly in the sequence to reduce the data volume; extracting sliding-window features from the preprocessed behavior sequences with an n-gram model; calculating feature weights with the TF-IDF algorithm and mapping them to a multi-dimensional vector space; and clustering the malicious software behavior features in the vector space with a hierarchical clustering algorithm, wherein a complete-link algorithm is used to calculate inter-cluster distances, and the family attribution of the malicious software is determined according to distance in the vector space.
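
The semantic conversion of step two and the redundancy merging at the start of step three can be sketched in Python as follows. This is a hedged illustration only. The lookup table, the raw-log record layout (`func`, `args`, `target`, `handle`), and the reading of TARGET as the object acted upon and HANDLE as the associated handle value are assumptions, as is the choice to treat "redundant" as consecutive identical behaviors; the text above names the four parts but does not define them further here.

```python
from dataclasses import dataclass
from itertools import groupby
from typing import Dict, List, Tuple

# Hypothetical mapping from raw kernel function names to
# (category identifier, operation identifier). Several raw calls may map to
# the same behavior, which is what removes the aggregation in the raw log.
CALL_TO_BEHAVIOR: Dict[str, Tuple[str, str]] = {
    "NtCreateUserProcess": ("PROC", "CREATE"),
    "NtCreateProcessEx": ("PROC", "CREATE"),
    "NtWriteFile": ("FILE", "WRITE"),
    "NtSetValueKey": ("REG", "SET_VALUE"),
}


@dataclass(frozen=True)
class KernelBehavior:
    behavior: Tuple[str, str]  # BEHAVIOR: category id + operation id
    argblock: Tuple[str, ...]  # ARGBLOCK: core parameters of the behavior
    target: str                # TARGET: assumed to be the object acted upon
    handle: int                # HANDLE: assumed to be the associated handle


def to_behavior(raw: dict) -> KernelBehavior:
    """Convert one raw call-log record into a structured behavior."""
    return KernelBehavior(
        behavior=CALL_TO_BEHAVIOR[raw["func"]],
        argblock=tuple(raw.get("args", ())),
        target=raw["target"],
        handle=raw.get("handle", 0),
    )


def merge_redundant(seq: List[KernelBehavior]) -> List[KernelBehavior]:
    """Collapse each run of consecutive identical behaviors into one entry."""
    return [key for key, _ in groupby(seq)]


# Invented raw log: one process creation, three identical writes, one
# registry write.
write = {"func": "NtWriteFile", "args": ["dll"],
         "target": "C:\\Temp\\payload.dll", "handle": 8}
raw_log = [
    {"func": "NtCreateUserProcess", "args": ["exe"],
     "target": "C:\\Temp\\sample.exe", "handle": 4},
    write, write, write,
    {"func": "NtSetValueKey", "args": ["Run"],
     "target": "HKLM\\Software\\Example\\Run", "handle": 12},
]

merged = merge_redundant([to_behavior(r) for r in raw_log])
print([b.behavior for b in merged])
```

Making the record a frozen dataclass gives value equality for free, so run-length merging reduces to `itertools.groupby`. A stricter or looser notion of redundancy, for example ignoring the handle value, would only change the equality definition.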
Preferably, in step one, the process thread monitoring sub-module captures in real time the full life cycle events of a process from creation to deletion, thread state change events, and the path and timestamp information of image file loading by registering the process creation callback function