CN-116647365-B - Model training method, encryption data processing method and related equipment

Abstract

The application discloses a model training method, an encryption data processing method, and related equipment, and relates to the technical field of computer security. The method comprises: collecting historical encrypted data, wherein the historical encrypted data is malicious encrypted data or non-malicious encrypted data; performing statistical analysis on the number of occurrences of a historical process, based on process information of the historical process that sent the historical encrypted data, to obtain statistical analysis data; and training a machine learning model using the historical encrypted data and the statistical analysis data as training data samples, so that the machine learning model is adapted to predict the prediction probability that the historical encrypted data was sent by the historical process and to output a judgment label indicating whether the historical encrypted data is malicious encrypted data. The application addresses the need to construct a simulation environment, and the resulting high resource consumption, when identifying malicious encrypted traffic.

Inventors

  • Wu Nianjing

Assignees

  • 阿里巴巴(中国)有限公司

Dates

Publication Date
20260508
Application Date
20230413

Claims (10)

  1. A model training method, comprising: collecting historical encrypted data, wherein the historical encrypted data is malicious encrypted data or non-malicious encrypted data; performing statistical analysis on the number of occurrences of a historical process, based on process information of the historical process that sent the historical encrypted data, to obtain statistical analysis data; and training a machine learning model using the historical encrypted data and the statistical analysis data as training data samples, so that the machine learning model is adapted to predict the prediction probability that the historical encrypted data was sent by the historical process and to output a judgment label indicating whether the historical encrypted data is malicious encrypted data; wherein the process information includes a plurality of process features, the process features including features describing first network communication information of the historical process and/or features describing host information of the historical process, and the step of obtaining the statistical analysis data includes: according to the association relationship between the historical encrypted data and the process information, performing statistical analysis on the number of occurrences of the historical process corresponding to the historical encrypted data and on the number of occurrences of the process features under the historical process corresponding to the historical encrypted data, to obtain the statistical analysis data; and wherein the step of training the machine learning model includes: training the machine learning model using the historical encrypted data and the statistical analysis data as training data samples, so that the machine learning model is adapted to predict the prediction probability that the historical encrypted data was sent by the historical process, and to output the process features of the historical process when outputting the judgment label indicating whether the historical encrypted data is malicious encrypted data.
  2. The method according to claim 1, wherein before performing statistical analysis on the number of occurrences of the historical process based on the process information of the historical process that sent the historical encrypted data, the method further comprises: collecting the process information of the historical process; and wherein the step of performing statistical analysis on the number of occurrences of the historical process based on the process information of the historical process that sent the historical encrypted data includes: associating the historical encrypted data from the historical process with the process information to obtain an association relationship between the historical encrypted data and the process information; and performing statistical analysis, according to the association relationship, on the number of occurrences of the historical process corresponding to the historical encrypted data.
  3. The method of claim 2, wherein the process information comprises first network communication information of the historical process, and wherein, when collecting the historical encrypted data, the method further comprises: determining second network communication information of the historical encrypted data; and wherein the step of associating the historical encrypted data from the historical process with the process information comprises: judging whether the historical encrypted data comes from the historical process according to the consistency of the second network communication information with the first network communication information, and if so, associating the historical encrypted data with the process information.
  4. The method of claim 1, wherein the process information includes a plurality of process features, and wherein the step of performing statistical analysis on the number of occurrences of the historical process corresponding to the historical encrypted data and on the number of occurrences of the process features under the historical process corresponding to the historical encrypted data includes: determining the occurrence number of one of the historical processes corresponding to the historical encrypted data, and determining, based on the occurrence number, the total occurrence number of a plurality of historical processes corresponding to the historical encrypted data; determining a feature number of the process feature under one of the historical processes corresponding to the historical encrypted data, and determining, based on the feature number, a total feature number of the process feature under a plurality of the historical processes corresponding to the historical encrypted data; and determining a process occurrence ratio of the historical process based on the occurrence number and the total occurrence number, and determining a feature occurrence ratio of the process feature based on the feature number and the total feature number, to obtain the statistical analysis data including the process occurrence ratio and the feature occurrence ratio; and wherein the step of training the machine learning model using the historical encrypted data and the statistical analysis data as training data samples includes: training the machine learning model using the historical encrypted data, the process occurrence ratio, and the feature occurrence ratio as the training data samples, so that the machine learning model is adapted to predict the prediction probability that the historical encrypted data was sent by the historical process, and to output the process features of the historical process when outputting the judgment label indicating whether the historical encrypted data is malicious encrypted data.
  5. The method of claim 4, wherein, when training the machine learning model using the historical encrypted data, the process occurrence ratio, and the feature occurrence ratio as the training data samples, the method further comprises: calculating, based on the prediction probability, a first weight of the historical process that sent the historical encrypted data, and calculating a second weight of each process feature under the historical process; and updating the prediction probability predicted by the machine learning model based on the first weight, the second weight, the process occurrence ratio, and the feature occurrence ratio.
  6. A method of processing encrypted data, comprising: inputting encrypted data into a machine learning model, predicting, through the machine learning model, the process that sent the encrypted data, and outputting a judgment label indicating whether the encrypted data is malicious encrypted data, wherein the machine learning model is a model trained according to the model training method of any one of claims 1 to 5; and based on the judgment label, counting the encrypted data that is malicious encrypted data to obtain statistical information, wherein during counting the encrypted data is identified by the predicted process name of the process.
  7. The method of claim 6, wherein the step of counting the encrypted data that is malicious encrypted data based on the judgment label further comprises: detecting a malicious score of the encrypted data and recording the malicious score of the encrypted data in the statistical information.
  8. A computer device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the method steps of any of claims 1 to 7 when executing the computer program.
  9. A computer-readable storage medium, characterized in that it stores a computer program which, when executed by a processor, implements the method steps of any of claims 1 to 7.
  10. A computer program product, characterized in that the computer program product comprises a computer program which, when executed by a processor, implements the method steps of any of claims 1 to 7.
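
The ratio computation recited in claim 4 can be illustrated with a minimal Python sketch. It rests on assumptions that the claims do not fix: each training sample is taken to be a (process name, list of process features) pair that has already been associated with a piece of encrypted data, the feature occurrence ratio is read as "count of a feature under one process divided by the count of that feature across all processes", and the function name occurrence_ratios and the sample feature strings are hypothetical.

```python
from collections import Counter


def occurrence_ratios(records):
    """Compute process and feature occurrence ratios.

    records: iterable of (process_name, [feature, ...]) pairs, one pair per
    encrypted-data sample already associated with its sending process.
    (This record shape is an assumption made for illustration.)
    """
    process_counts = Counter()   # occurrences of each process
    feature_counts = Counter()   # occurrences of a feature under one process
    feature_totals = Counter()   # occurrences of a feature across all processes

    for process, features in records:
        process_counts[process] += 1
        for feature in features:
            feature_counts[(process, feature)] += 1
            feature_totals[feature] += 1

    total_occurrences = sum(process_counts.values())

    # Process occurrence ratio: occurrence number / total occurrence number.
    process_ratio = {
        process: count / total_occurrences
        for process, count in process_counts.items()
    }
    # Feature occurrence ratio: feature number / total feature number.
    feature_ratio = {
        (process, feature): count / feature_totals[feature]
        for (process, feature), count in feature_counts.items()
    }
    return process_ratio, feature_ratio


if __name__ == "__main__":
    # Hypothetical sample data, not taken from the patent.
    sample = [
        ("browser.exe", ["port:443"]),
        ("browser.exe", ["port:443"]),
        ("dropper.exe", ["port:443", "user:SYSTEM"]),
        ("dropper.exe", ["port:8443"]),
    ]
    proc_ratio, feat_ratio = occurrence_ratios(sample)
    print(proc_ratio)
    print(feat_ratio)
```

Under this reading, the two dictionaries would be appended to the features of each encrypted-data sample before training; how the model consumes them is not specified in the claims.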

Description

Model training method, encryption data processing method and related equipment

Technical Field

The application relates to the technical field of computer security, and in particular to a model training method, an encryption data processing method, and related equipment.

Background

This section is intended to provide a background or context to the embodiments of the invention that are recited in the claims. It is not admitted to be prior art by inclusion of this description in this section.

As attack and defense on the internet continue to escalate, new network technologies are iterated continuously and network structures grow more complex. At the same time, the widespread use of encryption technology has led to explosive growth of encrypted traffic. Encryption protects user privacy, but it has also profoundly changed the form of network security threats: viruses, botnets, trojans, and other malware that use encryption can still spread widely, and traditional detection techniques are often ineffective against malicious encrypted traffic.

To better identify malicious encrypted traffic, the traditional scheme labels encrypted traffic with the help of a sandbox: normal or malicious executable files are uploaded, a suitable execution environment is selected to simulate the user's real environment so that the executable files cannot evade or counter the analysis, and the uploaded files are then executed to obtain the encrypted traffic data they would generate in the real environment.

However, this scheme requires constructing a simulation environment. Depending on the type, version, and execution environment of the executable files, environments must be built for different operating systems such as Windows, Windows Server, Linux, and Android, in different versions such as Windows XP, Windows 7, and Windows 10, with different software deployed, such as Adobe PDF, Adobe Flash, and Office Word. This consumes considerable manpower and material resources. Moreover, if the execution or triggering conditions of an executable file cannot be accurately predicted, the file's communication, including its encrypted or malicious encrypted traffic, may never be triggered, which directly affects the subsequent traffic analysis results.

Disclosure of Invention

The embodiments of the application provide a model training method, an encryption data processing method, and related equipment, which at least solve the prior-art problems of needing to construct a simulation environment and of high resource consumption when identifying malicious encrypted traffic.
According to an aspect of the present application, there is provided a model training method, including: collecting historical encrypted data, wherein the historical encrypted data is malicious encrypted data or non-malicious encrypted data; performing statistical analysis on the number of occurrences of a historical process, based on process information of the historical process that sent the historical encrypted data, to obtain statistical analysis data; and training a machine learning model using the historical encrypted data and the statistical analysis data as training data samples, so that the machine learning model is adapted to predict the prediction probability that the historical encrypted data was sent by the historical process and to output a judgment label indicating whether the historical encrypted data is malicious encrypted data.

In some embodiments, before performing statistical analysis on the number of occurrences of the historical process based on the process information of the historical process that sent the historical encrypted data, the method further includes: collecting the process information of the historical process. The step of performing statistical analysis on the number of occurrences of the historical process then includes: associating the historical encrypted data from the historical process with the process information to obtain an association relationship between the historical encrypted data and the process information; and performing statistical analysis, according to the association relationship, on the number of occurrences of the historical process corresponding to the historical encrypted data.
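
The association and counting steps described above can be sketched in Python as follows. The matching key is an assumption: the sketch pairs a captured flow with a process record when both carry the same connection tuple (local address and port, remote address and port, protocol), which is one plausible form of network communication information. The type names ConnKey, ProcessInfo, and Flow and both function names are hypothetical and do not come from the application.

```python
from collections import Counter
from typing import NamedTuple


class ConnKey(NamedTuple):
    """Hypothetical connection tuple used to match flows to processes."""
    local_ip: str
    local_port: int
    remote_ip: str
    remote_port: int
    protocol: str


class ProcessInfo(NamedTuple):
    """Process information collected on the host (illustrative fields)."""
    name: str
    conn: ConnKey


class Flow(NamedTuple):
    """One piece of historical encrypted data with its known label."""
    conn: ConnKey
    malicious: bool


def associate(flows, processes):
    """Pair each flow with the process whose connection tuple matches it.

    Flows with no matching process record are left out of the result.
    """
    by_conn = {proc.conn: proc for proc in processes}
    pairs = []
    for flow in flows:
        proc = by_conn.get(flow.conn)
        if proc is not None:
            pairs.append((flow, proc))
    return pairs


def count_process_occurrences(pairs):
    """Count how often each process appears among the associated flows."""
    return Counter(proc.name for _, proc in pairs)
```

A dictionary lookup keyed on the connection tuple keeps the association linear in the number of flows. A real collector would likely also need to account for timestamps and port reuse, which this sketch ignores.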
In some embodiments, the process information includes first network communication information of the historical process, and when collecting the historical encrypted data, the method further includes: determining second network communication information of the historical encrypted data; wherein the step of associating the historical encrypted data from the historical process with the process information comprises: judging whether the historical encrypted data comes from the historical process according to