CN-118965354-B - Software type detection method and device, terminal equipment and storage medium

Abstract

The application belongs to the technical field of data processing and provides a software type detection method and device, terminal equipment, and a storage medium. The method comprises: obtaining software data of software to be tested; extracting software features of the software to be tested from the software data, wherein the software features comprise operation code (opcode) features, text features, authority (permission) features and image features; classifying the software to be tested according to the software features to obtain multiple classification results; determining composite software features of the software to be tested according to the multiple classification results; and determining the software type of the software to be tested according to the composite software features. Because it does not rely on a single data type, the method addresses the low detection accuracy of the related technology and helps improve the accuracy and robustness of detection. It also requires no feature library updates, which helps reduce maintenance cost.
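The fusion step summarized above (classify each feature type separately, then combine the results) can be sketched as follows. This is a minimal illustration rather than the patented implementation: the function name `splice_results`, the three-class probability vectors, and their values are all hypothetical, and the sketch stops at building the composite feature vector without training any downstream model.

```python
from typing import List, Sequence


def splice_results(*results: Sequence[float]) -> List[float]:
    """Concatenate per-classifier outputs into one composite feature vector."""
    composite: List[float] = []
    for probs in results:
        composite.extend(float(p) for p in probs)
    return composite


# Hypothetical class-probability outputs over three software types,
# one vector per base classifier (values are made up for illustration).
opcode_result = [0.7, 0.2, 0.1]
text_result = [0.6, 0.3, 0.1]
authority_result = [0.5, 0.4, 0.1]
image_result = [0.8, 0.1, 0.1]

# The composite vector keeps the classifier order fixed so that each
# position always carries the same meaning for the downstream model.
composite = splice_results(opcode_result, text_result, authority_result, image_result)
print(len(composite))  # 4 classifiers x 3 classes
```

Keeping the order of the four results fixed matters here, since a downstream model can only learn from the composite vector if each position consistently refers to the same classifier and class.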

Inventors

  • XIONG HANQING
  • LI YUANYUAN
  • DENG YUANGEN
  • QUE YUE
  • YOU JIE
  • TAN LINFENG
  • JIANG NAN

Assignees

  • East China Jiaotong University (华东交通大学)

Dates

Publication Date
20260512
Application Date
20241015

Claims (8)

  1. A software type detection method, the method comprising:
     acquiring software data of software to be tested;
     extracting software features of the software to be tested from the software data, wherein the software features comprise operation code features, text features, authority features and image features;
     classifying the software to be tested according to the software features to obtain a plurality of classification results of the software to be tested;
     determining composite software features of the software to be tested according to the plurality of classification results, wherein the plurality of classification results comprise a first classification result obtained by inputting the operation code features into a random forest algorithm for classification, a second classification result obtained by inputting the text features into a naive Bayes model for classification, a third classification result obtained by inputting the authority features into a multi-layer perceptron for classification, and a fourth classification result obtained by inputting the image features into a VGG16 model for classification; and
     determining the software type of the software to be tested according to the composite software features;
     wherein determining the composite software features of the software to be tested according to the plurality of classification results comprises: splicing the first classification result, the second classification result, the third classification result and the fourth classification result to obtain a feature vector corresponding to the software to be tested, and determining the feature vector as the composite software features;
     wherein the method further comprises: obtaining a composite software feature data set; dividing the composite software feature data set into a training set, a validation set and a test set; mapping the spliced composite software features into feature vectors through a fully connected layer; inputting the feature vectors into a meta-model KAN for training to obtain a final output model; and inputting the composite software features into the output model to obtain the software type output by the output model;
     wherein acquiring the software data of the software to be tested comprises: acquiring an application package file of the software to be tested through a preset data transmission protocol; and decompiling the application package file to obtain the software data of the software to be tested;
     wherein acquiring the application package file of the software to be tested comprises: acquiring an APK file of Android software by directly uploading the APK file, by downloading it from a URL, or by downloading it via a QR code; and decompiling the APK file to obtain an AndroidManifest file and operation code reflecting the internal structure of the original APK file, wherein the AndroidManifest file contains the basic information, permission information, components and configuration of the application;
     wherein extracting the operation code features comprises: extracting file operation code data of the software to be tested from the software data; performing redundancy elimination on the file operation code data to obtain pure file operation code data; and vectorizing the pure file operation code data to obtain the operation code features of the software to be tested.
  2. The method of claim 1, wherein extracting the software features of the software to be tested from the software data comprises:
     extracting text information of the software to be tested from the software data;
     performing word segmentation on the text information, and performing noise reduction on the segmented text information to obtain software keywords; and
     performing text vectorization on the software keywords to obtain the text features of the software to be tested.
  3. The method of claim 1, wherein extracting the software features of the software to be tested from the software data comprises:
     parsing a target format file in the software data with a software permission parsing library to obtain the application permissions of the software to be tested; and
     vectorizing the application permissions with a preset encoding scheme to obtain the authority (permission) features of the software to be tested.
  4. The method of claim 1, wherein extracting the software features of the software to be tested from the software data comprises:
     extracting a plurality of target pictures in a preset format from the software data;
     screening the target pictures according to the picture information of each target picture to obtain a plurality of retained target pictures;
     formatting the plurality of retained target pictures to obtain a plurality of standard pictures; and
     determining the plurality of standard pictures as the image features of the software to be tested.
  5. The method according to any one of claims 1-4, further comprising, after determining the software type of the software to be tested according to the composite software features:
     inputting the type probability distribution of the software to be tested into a preset large language model to obtain a detection report of the software to be tested; and
     integrating the type probability distribution and the detection report to obtain a comprehensive detection report of the software to be tested.
  6. A software type detection device, characterized in that the software type detection device comprises:
     an acquisition unit, configured to acquire software data of software to be tested;
     an extraction unit, configured to extract software features of the software to be tested from the software data, wherein the software features comprise operation code features, text features, authority features and image features;
     a classification unit, configured to classify the software to be tested according to the software features to obtain a plurality of classification results of the software to be tested, wherein the plurality of classification results comprise a first classification result obtained by inputting the operation code features into a random forest algorithm for classification, a second classification result obtained by inputting the text features into a naive Bayes model for classification, a third classification result obtained by inputting the authority features into a multi-layer perceptron for classification, and a fourth classification result obtained by inputting the image features into a VGG16 model for classification;
     a first determining unit, configured to determine composite software features of the software to be tested according to the plurality of classification results; and
     a second determining unit, configured to determine the software type of the software to be tested according to the composite software features;
     wherein the first determining unit is further configured to splice the first classification result, the second classification result, the third classification result and the fourth classification result to obtain a feature vector corresponding to the software to be tested, and to determine the feature vector as the composite software features;
     wherein the second determining unit is further configured to obtain a composite software feature data set, divide the composite software feature data set into a training set, a validation set and a test set, map the spliced composite software features into feature vectors through a fully connected layer, input the feature vectors into a meta-model KAN for training to obtain a final output model, and input the composite software features into the output model to obtain the software type output by the output model;
     wherein the acquisition unit is further configured to acquire an application package file of the software to be tested through a preset data transmission protocol, decompile the application package file to obtain the software data of the software to be tested, acquire an APK file of Android software by directly uploading the APK file, by downloading it from a URL, or by downloading it via a QR code, check the file extension of the uploaded file, and transmit the data over the HTTPS protocol once the extension indicates that the file is an APK file;
     wherein the extraction unit is further configured to extract file operation code data of the software to be tested from the software data, perform redundancy elimination on the file operation code data to obtain pure file operation code data, and vectorize the pure file operation code data to obtain the operation code features of the software to be tested.
  7. A terminal device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the method according to any one of claims 1 to 5 when executing the computer program.
  8. A computer readable storage medium storing a computer program, characterized in that the computer program, when executed by a processor, implements the method according to any one of claims 1 to 5.
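The operation code feature extraction recited in claim 1 (extract, eliminate redundancy, vectorize) can be illustrated with a short sketch. Several assumptions are made here that the claims do not state: the decompiled code is taken to be smali-style text, "redundancy elimination" is interpreted as discarding comments, directives, labels and operands so that only opcode mnemonics remain, and vectorization is done as a simple count over a fixed, hypothetical vocabulary. The names `purify`, `vectorize` and `OPCODE_VOCABULARY` are illustrative only.

```python
from collections import Counter
from typing import List

# Hypothetical opcode vocabulary; a real system would derive it from training data.
OPCODE_VOCABULARY = ["const", "invoke-virtual", "move-result", "return-void", "if-eqz"]


def purify(smali_lines: List[str]) -> List[str]:
    """Keep only opcode mnemonics, dropping blanks, comments (#),
    directives (.), labels (:) and all operands."""
    opcodes = []
    for line in smali_lines:
        line = line.strip()
        if not line or line[0] in "#.:":
            continue
        opcodes.append(line.split()[0])
    return opcodes


def vectorize(opcodes: List[str]) -> List[int]:
    """Count how often each vocabulary opcode occurs (a bag-of-opcodes vector)."""
    counts = Counter(opcodes)
    return [counts[op] for op in OPCODE_VOCABULARY]


# Made-up smali-like fragment used only to exercise the two functions.
sample = [
    ".method public run()V",
    "    # comment",
    "    const v0, 0x1",
    "    invoke-virtual {p0}, Lcom/example/A;->b()I",
    "    move-result v1",
    "    if-eqz v1, :cond_0",
    "    invoke-virtual {p0}, Lcom/example/A;->c()V",
    "    :cond_0",
    "    return-void",
    ".end method",
]
opcode_features = vectorize(purify(sample))
print(opcode_features)
```

A count vector is only one possible encoding; n-gram or sequence-based encodings would fit the same "vectorization" wording equally well.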

Description

Software type detection method and device, terminal equipment and storage medium

Technical Field

The present application relates to the field of data processing technologies, and in particular to a software type detection method and device, a terminal device, and a storage medium.

Background

In recent years, with the development of the internet industry, new software has emerged in an endless stream. Some software disguises itself as other types of software to mislead users; such software is highly confusing and deceptive and may harm users. At present, disguised software is generally detected with rule-based and signature-based methods. These methods rely heavily on a library of known malware signatures and detect software by matching its behavior against known features. As a result, classification is inaccurate, and the methods struggle to withstand zero-day attacks, that is, attacks that exploit vulnerabilities that have not yet been discovered or patched. In addition, the feature library must be updated continuously, which makes maintenance costly. How to improve the accuracy of software type detection while reducing maintenance cost is therefore a technical problem in urgent need of a solution.

Disclosure of Invention

The embodiments of the application provide a software type detection method and device, terminal equipment, and a storage medium, which can solve the technical problems of low software type detection accuracy and high subsequent maintenance cost in the prior art.
In a first aspect, an embodiment of the present application provides a software type detection method, including: acquiring software data of software to be tested; extracting software features of the software to be tested from the software data, wherein the software features comprise operation code features, text features, authority features and image features; classifying the software to be tested according to the software features to obtain a plurality of classification results of the software to be tested; determining composite software features of the software to be tested according to the plurality of classification results; and determining the software type of the software to be tested according to the composite software features.

Further, extracting the software features of the software to be tested from the software data includes: extracting file operation code data of the software to be tested from the software data; performing redundancy elimination on the file operation code data to obtain pure file operation code data; and vectorizing the pure file operation code data to obtain the operation code features of the software to be tested.

Further, extracting the software features of the software to be tested from the software data includes: extracting text information of the software to be tested from the software data; performing word segmentation on the text information, and performing noise reduction on the segmented text information to obtain software keywords; and performing text vectorization on the software keywords to obtain the text features of the software to be tested.
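The text feature steps just described (word segmentation, noise reduction, text vectorization) can be sketched as follows. The source does not name a segmentation tool, a noise-reduction rule, or a vectorization scheme, so this sketch assumes English text split by a regular expression, a small hypothetical stop-word list as the noise filter, and term counts over a hypothetical keyword vocabulary. Text in other languages, such as Chinese, would need a dedicated word segmenter instead of the regular expression.

```python
import re
from collections import Counter
from typing import List

# Hypothetical noise list and keyword vocabulary, for illustration only.
STOP_WORDS = {"the", "a", "to", "and", "your", "for"}
KEYWORD_VOCABULARY = ["loan", "game", "video", "cash", "play"]


def extract_keywords(text: str) -> List[str]:
    """Segment text into lowercase word tokens, then drop stop words
    and single-letter tokens as noise."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS and len(t) > 1]


def vectorize_text(keywords: List[str]) -> List[int]:
    """Turn the keyword list into a term-count vector over the vocabulary."""
    counts = Counter(keywords)
    return [counts[word] for word in KEYWORD_VOCABULARY]


# Made-up application description.
description = "Play the game and get cash: apply for a cash loan to play!"
keywords = extract_keywords(description)
text_features = vectorize_text(keywords)
print(keywords)
print(text_features)
```

Counts over a fixed vocabulary are a natural input for a naive Bayes classifier, which is the model the claims pair with the text features, though TF-IDF weighting would be an equally plausible reading of "text vectorization".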
Further, extracting the software features of the software to be tested from the software data includes: parsing a target format file in the software data with a software permission parsing library to obtain the application permissions of the software to be tested; and vectorizing the application permissions with a preset encoding scheme to obtain the authority (permission) features of the software to be tested.

Further, extracting the software features of the software to be tested from the software data includes: extracting a plurality of target pictures in a preset format from the software data; screening the target pictures according to the picture information of each target picture to obtain a plurality of retained target pictures; formatting the plurality of retained target pictures to obtain a plurality of standard pictures; and determining the plurality of standard pictures as the image features of the software to be tested.

Further, acquiring the software data of the software to be tested includes: acquiring an application package file of the software to be tested through a preset data transmission protocol; and decompiling the application package file to obtain the software data of the software to be tested.

Further, after determining the software type of the software to be tested according to the composite software features, the method further includes: inputting the type probability distribution of the software to be tested into a preset large language model to obtain a detection report of the software to be tested; and integrating the type probability distribution and the detection report to obtain a comprehensiv