CN-122027176-A - Webshell attack detection method and device and electronic equipment

Abstract

The invention provides a method, a device and electronic equipment for detecting webshell attacks. The method comprises: obtaining target attack traffic to be detected and extracting a target response body from the target attack traffic; generating a corresponding target feature vector according to the target response body; inputting the target feature vector into a trained classification model, wherein the classification model is used for distinguishing feature vectors belonging to successful webshell attack traffic from feature vectors belonging to failed webshell attack traffic; and determining, according to a classification result of the classification model, whether the target attack traffic belongs to successful webshell attack traffic. The method, the device and the electronic equipment can automatically and quickly screen out the data of successful webshell attacks, with high detection efficiency and assured detection accuracy.

Inventors

WEI JIADONG
WEI JINXIA
LONG CHUN
FU YUHAO
HUANG PAN
SUN DEGANG

Assignees

Computer Network Information Center, Chinese Academy of Sciences (中国科学院计算机网络信息中心)

Dates

Publication Date: 2026-05-12
Application Date: 2024-11-11

Claims (10)

1. A method for webshell attack detection, comprising: obtaining target attack traffic to be detected, and extracting a target response body from the target attack traffic, wherein the target attack traffic is webshell attack traffic; generating a corresponding target feature vector according to the target response body; inputting the target feature vector into a trained classification model, wherein the classification model is used for distinguishing feature vectors belonging to successful webshell attack traffic from feature vectors belonging to failed webshell attack traffic; and determining, according to a classification result of the classification model, whether the target attack traffic belongs to successful webshell attack traffic.
2. The method of claim 1, wherein generating a corresponding target feature vector according to the target response body comprises: performing word segmentation on the target response body to obtain a word segmentation result containing a plurality of target tokens; determining a word frequency feature of each target token; and generating the target feature vector corresponding to the target response body according to the word frequency features of all the target tokens, wherein the length of the target feature vector equals the size of a preset dictionary, and the position of each target token's word frequency feature in the target feature vector corresponds to the position of that target token in the preset dictionary.
3. The method of claim 2, wherein determining the word frequency feature of each target token comprises: determining the word frequency feature of each target token according to a training set for training the classification model, wherein the word frequency feature satisfies the following condition: [formula not reproduced in the source text] wherein w represents the target token; r represents the word segmentation result of the target response body; F(w, r) represents the word frequency feature of the target token; N_{w,r} represents the number of occurrences of the target token w in the word segmentation result r; N_r represents the total number of target tokens in the word segmentation result r; R represents the sample attack traffic in the training set for training the classification model; R_w represents the sample attack traffic in the training set that contains the target token w; Num(R) represents the number of sample attack traffic items R in the training set; and Num(R_w) represents the number of sample attack traffic items R_w in the training set that contain the target token w.
4. The method of claim 1, wherein the classification model is trained by: constructing a training set for training the classification model, wherein the training set comprises a plurality of sample attack traffic items, some of which are successful webshell attack traffic and the rest of which are failed webshell attack traffic; extracting a sample response body from each sample attack traffic item, and generating a corresponding sample feature vector according to the sample response body; and performing model training according to the sample feature vectors of the sample attack traffic to generate the classification model.
5. The method of claim 4, wherein constructing the training set for training the classification model comprises: building a network server based on Docker, and deploying at least one webshell attack script on the network server; simulating successful webshell attacks by using the at least one webshell attack script, collecting the corresponding successful webshell attack traffic, and taking the collected successful webshell attack traffic as one part of the sample attack traffic; and collecting failed webshell attack traffic in a real Internet backbone network environment, deduplicating the collected failed webshell attack traffic according to its response bodies, and taking the deduplicated failed webshell attack traffic as the other part of the sample attack traffic.
6. The method of claim 1, further comprising: displaying to a user a detection result indicating whether the target attack traffic belongs to successful webshell attack traffic, so as to prompt the user to verify the detection result; and acquiring a verification result fed back by the user, and generating, according to the verification result, a final detection result indicating whether the target attack traffic belongs to successful webshell attack traffic.
7. The method of claim 6, further comprising: adding the target attack traffic to the training set for training the classification model when the final detection result indicates that the target attack traffic belongs to successful webshell attack traffic; and iteratively updating the classification model according to the updated training set.
8. A webshell attack detection apparatus, comprising: an acquisition module, used for acquiring target attack traffic to be detected and extracting a target response body from the target attack traffic, wherein the target attack traffic is webshell attack traffic; a feature extraction module, used for generating a corresponding target feature vector according to the target response body; an input module, used for inputting the target feature vector into a trained classification model, wherein the classification model is used for distinguishing feature vectors belonging to successful webshell attack traffic from feature vectors belonging to failed webshell attack traffic; and a detection module, used for determining, according to a classification result of the classification model, whether the target attack traffic belongs to successful webshell attack traffic.
9. An electronic device, comprising: at least one processor; and a memory communicatively coupled to the at least one processor, wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the webshell attack detection method of any one of claims 1 to 7.
10. A computer-readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of the webshell attack detection method according to any one of claims 1 to 7.
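
The training, detection, user verification, and retraining steps recited in claims 1, 4, 6 and 7 can be illustrated with a minimal Python sketch. The claims do not name a model type, so the nearest-centroid classifier used here is only a stand-in chosen to keep the example dependency-free. The names `CentroidClassifier`, `detect_and_update`, and the `verify` callback are hypothetical, and the two-dimensional feature vectors are made-up illustration data, not results from the patent.

```python
from typing import Callable, List

Vector = List[float]


def centroid(vectors: List[Vector]) -> Vector:
    """Component-wise mean of a non-empty list of equal-length vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def squared_distance(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


class CentroidClassifier:
    """Stand-in model (assumption): one centroid per class, True = attack succeeded."""

    def fit(self, vectors: List[Vector], labels: List[bool]) -> "CentroidClassifier":
        self.success = centroid([v for v, y in zip(vectors, labels) if y])
        self.failure = centroid([v for v, y in zip(vectors, labels) if not y])
        return self

    def predict(self, vector: Vector) -> bool:
        return squared_distance(vector, self.success) < squared_distance(vector, self.failure)


def detect_and_update(
    model: CentroidClassifier,
    vector: Vector,
    train_x: List[Vector],
    train_y: List[bool],
    verify: Callable[[Vector, bool], bool],
) -> bool:
    """Classify, let a user confirm the result, and retrain on confirmed successes."""
    predicted = model.predict(vector)      # classification result (claim 1)
    final = verify(vector, predicted)      # user-verified final result (claim 6)
    if final:                              # confirmed success joins the training set (claim 7)
        train_x.append(vector)
        train_y.append(True)
        model.fit(train_x, train_y)
    return final


# Made-up sample feature vectors: first two are "success", last two are "failure".
train_x = [[0.9, 0.1], [0.8, 0.0], [0.1, 0.9], [0.0, 0.7]]
train_y = [True, True, False, False]
model = CentroidClassifier().fit(train_x, train_y)

# The lambda stands in for a human reviewer who accepts the model's prediction.
result = detect_and_update(model, [0.7, 0.2], train_x, train_y, verify=lambda v, p: p)
```

In this sketch only traffic confirmed as a successful attack is added back to the training set, mirroring claim 7; a real deployment would replace the lambda with an actual review step.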

Description

Webshell attack detection method and device and electronic equipment

Technical Field

The present invention relates to the field of traffic detection technologies, and in particular to a method, an apparatus, an electronic device, and a computer-readable storage medium for webshell attack detection.

Background

A webshell is a code execution environment accessed over a network. An attacker can upload it as a code file to serve as a website backdoor and thereby control the website server. A successful webshell attack can give the attacker complete control of the website, so the harm is great. Data on successful attacks, however, is scarce: in a real Internet backbone network environment, only one or two instances are observed per day. Existing methods for detecting successful webshell attacks are mostly file-based: they search the website server for suspicious webshell files, which requires running a detection program on the server. They are therefore unsuitable for scenarios in which attacks are judged by monitoring traffic in a real Internet backbone network environment.

Disclosure of Invention

To solve the above technical problems, embodiments of the present invention provide a method, an apparatus, an electronic device, and a computer-readable storage medium for webshell attack detection.
In a first aspect, an embodiment of the present invention provides a method for webshell attack detection, including: obtaining target attack traffic to be detected, and extracting a target response body from the target attack traffic, wherein the target attack traffic is webshell attack traffic; generating a corresponding target feature vector according to the target response body; inputting the target feature vector into a trained classification model, wherein the classification model is used for distinguishing feature vectors belonging to successful webshell attack traffic from feature vectors belonging to failed webshell attack traffic; and determining, according to a classification result of the classification model, whether the target attack traffic belongs to successful webshell attack traffic.

In some optional embodiments, generating a corresponding target feature vector according to the target response body includes: performing word segmentation on the target response body to obtain a word segmentation result containing a plurality of target tokens; determining a word frequency feature of each target token; and generating the target feature vector corresponding to the target response body according to the word frequency features of all the target tokens, wherein the length of the target feature vector equals the size of a preset dictionary, and the position of each target token's word frequency feature in the target feature vector corresponds to the position of that target token in the preset dictionary.
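
As a rough illustration of the feature-vector step, the following Python sketch segments a response body, weights each token, and writes the weight into the slot that the preset dictionary assigns to that token. Several things here are assumptions rather than details from the source: the regular-expression segmentation rule, the contents of `PRESET_DICTIONARY` and `TRAINING_TOKEN_SETS`, and the TF-IDF-style weight (token count divided by total tokens in the response body, multiplied by the logarithm of the number of training samples over the number of samples containing the token). The source text does not reproduce the exact formula, so this weight should be read as one plausible form only.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Set

# Hypothetical preset dictionary: token -> fixed position in the feature vector.
PRESET_DICTIONARY: Dict[str, int] = {
    "uid": 0, "gid": 1, "root": 2, "404": 3, "not": 4, "found": 5,
}


def tokenize(response_body: str) -> List[str]:
    """Assumed segmentation rule: lowercase runs of letters, digits, or underscores."""
    return re.findall(r"[a-z0-9_]+", response_body.lower())


def feature_vector(
    response_body: str,
    dictionary: Dict[str, int],
    training_token_sets: List[Set[str]],
) -> List[float]:
    """Build a vector as long as the dictionary, one weight per dictionary slot."""
    tokens = tokenize(response_body)
    vector = [0.0] * len(dictionary)
    num_samples = len(training_token_sets)
    for token, count in Counter(tokens).items():
        if token not in dictionary:
            continue  # tokens outside the preset dictionary have no slot
        containing = sum(1 for sample in training_token_sets if token in sample)
        if containing == 0:
            continue  # token never seen in training: leave its slot at zero
        tf = count / len(tokens)                   # share of this token in the body
        idf = math.log(num_samples / containing)   # assumed TF-IDF-style rarity term
        vector[dictionary[token]] = tf * idf
    return vector


# Made-up token sets standing in for the segmented training samples.
TRAINING_TOKEN_SETS: List[Set[str]] = [
    {"uid", "gid", "root"},
    {"uid", "root"},
    {"404", "not", "found"},
    {"404", "not", "found", "page"},
]

vec = feature_vector("uid=0(root) gid=0(root)", PRESET_DICTIONARY, TRAINING_TOKEN_SETS)
```

Because every vector has the dictionary's length and each token always maps to the same index, vectors built from different response bodies are directly comparable by the classification model.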
In some optional embodiments, determining the word frequency feature of each target token includes: determining the word frequency feature of each target token according to a training set for training the classification model, wherein the word frequency feature satisfies the following condition: [formula not reproduced in the source text] wherein w represents the target token; r represents the word segmentation result of the target response body; F(w, r) represents the word frequency feature of the target token; N_{w,r} represents the number of occurrences of the target token w in the word segmentation result r; N_r represents the total number of target tokens in the word segmentation result r; R represents the sample attack traffic in the training set for training the classification model; R_w represents the sample attack traffic in the training set that contains the target token w; Num(R) represents the number of sample attack traffic items R in the training set; and Num(R_w) represents the number of sample attack traffic items R_w in the training set that contain the target token w.

In some optional embodiments, the classification model is trained by: constructing a training set for training the classification model, wherein the training set comprises a plurality of sample attack traffic items, some of which are successful webshell attack traffic and the rest of which are failed webshell attack traffic; extracting a sample response body from each sample attack traffic item, and generating a corresponding sample feature vector according to the sample response body; and performing model training according to the sample feature vectors of the sample attack traffic to generate the classification model.

In some optional embodiments, constructing the training set for training the classification model includes: building a network server based on Docker, and deploying at least one webshell attack script on the network server; simulating successful webshell attacks by using the at least one webshell attack script, collecting correspondin