CN-121997185-A - Malicious software family classification method based on entropy diagram visualization

Abstract

The invention relates to an entropy-diagram-based malware family classification method. Malware samples are transformed into entropy diagrams, and the entropy diagrams are classified with a deep learning model so that malware families can be classified accurately. The method is mainly applied to malware detection and classification in the field of network security. Its key feature is that a malware sample is converted into an entropy diagram, which is then classified by a convolutional neural network (CNN). The workflow is as follows. First, entropy features are extracted from the raw hexadecimal data of the input malware, and an entropy diagram is generated as the input of a feature extractor. Next, a CNN model in the feature extractor extracts features from the generated entropy diagram, and the output of the global average pooling layer is taken as the extracted feature vector. Finally, a random forest classifier is selected as the classifier, and it takes the extracted feature vector as input to produce the final classification result. The method provides a new solution for malware family classification.
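The abstract starts from the sample's raw hexadecimal data. As a minimal sketch of that first step, the function below turns a textual hex dump into the byte sequence on which entropy is later computed. The dump layout is an assumption, not something the patent specifies: each line holds an optional address column followed by two-digit hex byte tokens, and placeholder tokens such as `??` are skipped. The name `parse_hex_dump` is hypothetical.

```python
import string

HEX_DIGITS = frozenset(string.hexdigits)


def parse_hex_dump(text: str, has_address: bool = True) -> bytes:
    """Convert a textual hex dump into raw bytes.

    Assumed (hypothetical) layout: one optional address column per line,
    then whitespace-separated two-digit hex tokens. Tokens that are not
    valid hex bytes, such as "??" for unreadable bytes, are skipped.
    """
    out = bytearray()
    for line in text.splitlines():
        tokens = line.split()
        if has_address and tokens:
            tokens = tokens[1:]  # drop the leading address column
        for tok in tokens:
            # keep only well-formed two-digit hex tokens
            if len(tok) == 2 and all(ch in HEX_DIGITS for ch in tok):
                out.append(int(tok, 16))
    return bytes(out)


sample = "00401000 56 8D 44 24 ??\n00401005 FF"
print(parse_hex_dump(sample).hex())  # prints 568d4424ff
```

If the samples are already stored as binary files, this step reduces to reading the file's bytes directly.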

Inventors

FANG YONG
ZHANG QIANG
DENG HUAXIN
LIANG XIAN
YUAN LISHA

Assignees

Sichuan University (四川大学)

Dates

Publication Date: 20260508
Application Date: 20241105

Claims (3)

1. A method for classifying malware families based on entropy diagram visualization, the method comprising the steps of: A. acquiring a malware sample from a malware sample library and converting the malware sample into binary format; B. preprocessing the binary-format malware sample to generate a corresponding entropy diagram; C. constructing a training set for malware family classification from the generated entropy diagrams, and training a convolutional neural network (CNN) model; and D. classifying a new malware sample with the trained CNN model to identify the malware family to which the new sample belongs.
2. The method for classifying malware families based on entropy diagram visualization according to claim 1, wherein the preprocessing in step B comprises the steps of: B1. dividing the malware sample into blocks of fixed length and calculating the entropy value of each block; and B2. treating the entropy value of each block as a data point on the image and concatenating the entropy values of the successive parts of the malware sample into a continuous value stream, thereby generating an entropy image of size 300 × 1.
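Claim 2 leaves the block length, the entropy formula and the way a variable number of blocks becomes exactly 300 pixels unspecified. The sketch below fills these gaps with labeled assumptions: Shannon entropy H = -Σ p_i log2 p_i over the byte values in each block (ranging from 0 to 8 bits), a hypothetical block length of 256 bytes, linear interpolation of the entropy stream onto 300 points, and normalization to [0, 1] by dividing by 8. All function names are hypothetical.

```python
import math
from collections import Counter

# Assumptions (not specified in the patent): 256-byte blocks, Shannon
# entropy in bits, linear resampling to 300 points, scaling by 1/8.
BLOCK_SIZE = 256   # hypothetical fixed block length, in bytes
IMAGE_WIDTH = 300  # claim 2: entropy image of size 300 x 1


def block_entropy(block: bytes) -> float:
    """Shannon entropy of one block in bits per byte, from 0.0 to 8.0."""
    if not block:
        return 0.0
    n = len(block)
    # H = sum over byte values of p * log2(1 / p), with p = count / n
    return sum((c / n) * math.log2(n / c) for c in Counter(block).values())


def entropy_stream(data: bytes, block_size: int = BLOCK_SIZE) -> list[float]:
    """Step B1: split into fixed-length blocks (the last may be shorter)."""
    return [block_entropy(data[i:i + block_size])
            for i in range(0, len(data), block_size)]


def resample(values: list[float], width: int = IMAGE_WIDTH) -> list[float]:
    """Map a variable-length stream onto `width` points by linear interpolation."""
    if not values:
        return [0.0] * width
    if len(values) == 1:
        return [values[0]] * width
    scale = (len(values) - 1) / (width - 1)
    out = []
    for j in range(width):
        x = j * scale
        i = min(int(x), len(values) - 2)  # left neighbour index
        t = x - i                         # fractional position between i and i + 1
        out.append(values[i] * (1 - t) + values[i + 1] * t)
    return out


def entropy_image(data: bytes) -> list[list[float]]:
    """Step B2: a 300 x 1 image (300 rows, 1 column) with values in [0, 1]."""
    return [[e / 8.0] for e in resample(entropy_stream(data))]
```

For example, `entropy_image(bytes(range(256)) * 4)` yields 300 pixels equal to 1.0 (up to floating-point rounding), while a zero-filled input yields all zeros. Linear interpolation is only one reasonable choice here; nearest-neighbour resampling would also produce a 300 × 1 image.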
3. The method for classifying malware families based on entropy diagram visualization according to claim 1, wherein the training process in step C comprises the steps of: C1. using a global average pooling layer instead of a conventional fully connected layer as the final output layer of the network, averaging each channel of the feature maps, and combining the averages into a single global feature vector; and C2. feeding the 512-dimensional feature vector produced by the global average pooling layer into a random forest classifier to classify the malware family.
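Claim 3 replaces the fully connected head with global average pooling, so the feature vector has one entry per channel of the last convolutional feature maps; a 512-dimensional vector therefore implies 512 channels. The framework-free sketch below shows only the pooling arithmetic on nested lists in (channels, height, width) layout. The CNN that produces the feature maps is not shown, and `global_average_pool` and `build_feature_matrix` are hypothetical names.

```python
def global_average_pool(feature_maps: list[list[list[float]]]) -> list[float]:
    """Average each channel's H x W map to one value: (C, H, W) -> C-dim vector."""
    vector = []
    for channel in feature_maps:
        values = [v for row in channel for v in row]
        if not values:
            raise ValueError("empty feature map channel")
        vector.append(sum(values) / len(values))
    return vector


def build_feature_matrix(batch: list[list[list[list[float]]]]) -> list[list[float]]:
    """One pooled vector per sample; the rows would form the classifier input X."""
    return [global_average_pool(sample) for sample in batch]


# Toy input: 512 channels, each a 1 x 10 map filled with the channel index.
fmaps = [[[float(c)] * 10] for c in range(512)]
vec = global_average_pool(fmaps)
print(len(vec), vec[:3])  # prints 512 [0.0, 1.0, 2.0]
```

Because pooling has no trainable weights, it adds no parameters, unlike a fully connected layer. The rows from `build_feature_matrix` could then be passed, together with family labels, to a random forest such as scikit-learn's `RandomForestClassifier` via `fit(X, y)`; the patent does not name a library or give the forest's hyperparameters.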

Description

Malicious software family classification method based on entropy diagram visualization

Technical Field

The invention belongs to the field of network security and malware detection, and particularly relates to a malware family classification method based on entropy diagrams and a convolutional neural network. The method exploits the entropy features of malware samples and, by converting them into images and applying a deep learning model, accurately classifies different malware families.

Background

Malware began to spread soon after computers appeared. Early forms were mainly simple viruses, trojans, backdoors and worms. In the Internet era, with the rise of online games and communication applications, information-stealing trojans and remote-control backdoors aimed at personal computers became widespread. As nations and enterprises accumulated core data, attackers shifted their targets to large enterprises and institutions, and targeted APT (advanced persistent threat) attacks against these organizations developed further. To protect network assets, security researchers have proposed various measures for detecting malware attacks, but malware keeps evolving along with operating systems and platforms. In the current Internet environment, the volume of malware is growing at an unprecedented rate, which poses a major challenge to network security. Most malware is not written entirely from scratch; it is usually derived by modifying existing malware, so related variants can form a malware family that shares common features and behavior patterns.
Classifying malware effectively by family deepens the understanding of its behavior patterns and propagation mechanisms, greatly improves the efficiency of malware analysis, and provides important support for security defense and threat tracing. Malware developers typically do not write new malware from scratch; instead, they evade security detection by modifying and redeveloping existing malware. As a result, newly emerging malware is highly similar to existing malware in code structure and behavior patterns and can therefore be grouped into the same "family". As the commercialization of malware becomes increasingly common, this family-level similarity grows even stronger. The trend toward families not only reflects the increasingly patterned nature of malware development but also makes effective malware family classification more important than ever. Applying image-analysis-based visualization to malware analysis can better represent the behavior and characteristics of malware. However, conventional visualization methods have significant limitations: most rely on grayscale image features alone, so the representation captures a single type of feature and part of the information is easily lost. Such a single feature-extraction approach struggles to capture the inherent complexity and diversity of malware; key information is lost, which in turn reduces the accuracy and robustness of the classification model.
As malware techniques continue to evolve, traditional feature extraction methods find it increasingly difficult to keep up, and new methods that are more efficient and can extract richer features are urgently needed.

Disclosure of Invention

The invention discloses a malware family classification method based on entropy diagram visualization. It addresses the shortcomings of existing malware visualization techniques and aims to solve the problems of low accuracy and low efficiency in existing malware family classification. The invention proposes a novel entropy-diagram-based malware family classification method, which accurately classifies malware families by converting malware samples into entropy diagrams and classifying those diagrams with a deep learning model. The invention consists mainly of the following parts. First, the binary file of a malware sample is preprocessed to generate an entropy diagram that represents its characteristics: the binary file is divided into blocks, the entropy value of each block is calculated, and the entropy values are mapped into an entropy image, yielding the entropy diagram of the sample. The entropy diagram can fully reflect the internal structure and distribution characteristics of a malware sample