
CN-122027209-A - Network abnormal access identification method, system, electronic equipment and storage medium


Abstract

The invention relates to the technical field of network security and discloses a network abnormal access identification method, a system, electronic equipment and a storage medium. The method comprises: acquiring network access log data within a preset time period; extracting a plurality of predefined dynamic behavior features from the network access log data; clustering the network access log data with an unsupervised subspace clustering algorithm, based on the plurality of dynamic behavior features, to generate a plurality of log clustering results; calculating the intra-group feature density of each log clustering result based on the request sequence features and request time interval features in that log clustering result; and identifying abnormal access behaviors according to the distribution of the intra-group feature densities of the plurality of log clustering results. The invention addresses the problems that the prior art depends on static features and labeled data and has difficulty identifying disguised requests, and it improves identification accuracy and adaptability to dynamic attacks.
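The final step of the abstract, separating log clustering results by the distribution of their intra-group feature densities, can be illustrated with a minimal sketch. The text does not state which statistic is used, so the modified z-score (median and median absolute deviation), the `threshold` of 3.5, and the function name `flag_abnormal_clusters` are all assumptions made for illustration only.

```python
from statistics import median
from typing import Dict, List


def flag_abnormal_clusters(densities: Dict[str, float], threshold: float = 3.5) -> List[str]:
    """Return the ids of clusters whose density deviates strongly from the rest.

    Assumption: a cluster is treated as abnormal when its modified z-score,
    0.6745 * |x - median| / MAD, exceeds `threshold`. The source only says the
    split is made according to the statistical distribution of the densities.
    """
    values = list(densities.values())
    med = median(values)
    # Median absolute deviation: a spread estimate that outliers barely affect.
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        # All densities are (nearly) identical, so nothing stands out.
        return []
    return sorted(
        cid for cid, v in densities.items()
        if 0.6745 * abs(v - med) / mad > threshold
    )


if __name__ == "__main__":
    # Hypothetical densities for five log clustering results.
    example = {"c1": 0.21, "c2": 0.19, "c3": 0.20, "c4": 0.22, "c5": 0.95}
    print(flag_abnormal_clusters(example))  # prints ['c5']
```

A median-based rule is used here because a handful of crawler-dominated clusters would otherwise inflate a mean and standard deviation and mask themselves; any other distribution-based split would fit the description equally well.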

Inventors

  • DENG GAOQIANG

Assignees

  • 北京思特奇信息技术股份有限公司

Dates

Publication Date
20260512
Application Date
20251231

Claims (10)

  1. A network abnormal access identification method, comprising: acquiring network access log data within a preset time period; extracting a plurality of predefined dynamic behavior features from the network access log data; clustering the network access log data with an unsupervised subspace clustering algorithm, based on the plurality of dynamic behavior features, to generate a plurality of log clustering results; calculating the intra-group feature density of each log clustering result based on the request sequence features and request time interval features in that log clustering result; and identifying abnormal access behaviors according to the distribution of the intra-group feature densities of the plurality of log clustering results.
  2. The network abnormal access identification method according to claim 1, wherein the step of acquiring network access log data within a preset time period comprises: acquiring a network request log from a server; and selecting, from the network request log, the request information whose request time falls between the previous acquisition time and the current acquisition time, to obtain the network access log data within the preset time period.
  3. The network abnormal access identification method according to claim 1, wherein the step of extracting a plurality of predefined dynamic behavior features from the network access log data comprises: parsing the network access log data and grouping the network requests that belong to the same communication session into a request session group; and calculating a plurality of dynamic behavior features for each request session group, wherein the plurality of dynamic behavior features include the page depth standard deviation and the percentage of consecutive HTTP requests of the request session group.
  4. The network abnormal access identification method according to claim 3, wherein the step of clustering the network access log data with an unsupervised subspace clustering algorithm, based on the plurality of dynamic behavior features, to generate a plurality of log clustering results comprises: dividing the plurality of dynamic behavior features of each request session group into a plurality of feature groups according to feature attributes; clustering the feature data formed by the dynamic behavior features of all request session groups with a subspace clustering algorithm based on feature group weighting, wherein the clustering process comprises assigning an initial weight to each feature group and iteratively updating the cluster centers and the weight of each feature group for each cluster; and outputting the plurality of log clustering results when the iteration satisfies a preset stopping condition.
  5. The network abnormal access identification method according to claim 4, wherein the step of calculating the intra-group feature density of any log clustering result based on the request sequence features and request time interval features in that log clustering result comprises: calculating the request time interval features according to the temporal relationship among the network requests in the log clustering result; calculating the request sequence features according to the request sequence formed by the order of the network requests in the log clustering result; calculating a weighted frequency similarity based on the request sequence features; and calculating the intra-group feature density of the log clustering result according to the request time interval features, the request sequence features, and the weighted frequency similarity.
  6. The network abnormal access identification method according to claim 5, wherein the step of identifying abnormal access behaviors according to the distribution of the intra-group feature densities of the plurality of log clustering results comprises: determining the statistical distribution of the intra-group feature densities of the plurality of log clustering results; dividing the plurality of log clustering results into a normal access category and an abnormal access category according to the characteristics of the statistical distribution; and identifying the network access behavior corresponding to each log clustering result in the abnormal access category as abnormal access behavior.
  7. The network abnormal access identification method according to claim 5 or 6, wherein the step of calculating a weighted frequency similarity based on the request sequence features comprises: assigning a weight value to each subsequence of the request sequence according to how frequently that subsequence occurs across all log clustering results, wherein the weight value is inversely related to the occurrence frequency of the corresponding subsequence; weighting the frequency of each subsequence in the request sequence features by its weight value to obtain the weighted frequency of each subsequence; and normalizing the weighted frequencies of all subsequences to obtain the weighted frequency similarity.
  8. A network abnormal access identification system, comprising: an acquisition module for acquiring network access log data within a preset time period; an extraction module for extracting a plurality of predefined dynamic behavior features from the network access log data; a clustering module for clustering the network access log data with an unsupervised subspace clustering algorithm, based on the plurality of dynamic behavior features, to generate a plurality of log clustering results; a calculation module for calculating the intra-group feature density of each log clustering result based on the request sequence features and request time interval features in that log clustering result; and an identification module for identifying abnormal access behaviors according to the distribution of the intra-group feature densities of the plurality of log clustering results.
  9. An electronic device, comprising a processor coupled to a memory, wherein the memory stores at least one computer program that is loaded and executed by the processor to cause the electronic device to implement the network abnormal access identification method according to any one of claims 1 to 7.
  10. A computer-readable storage medium, storing at least one computer program which, when executed by a processor, implements the network abnormal access identification method according to any one of claims 1 to 7.
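The weighted frequency similarity of claim 7 can be sketched as follows. The claim only requires that the weight be inversely related to a subsequence's occurrence frequency across all log clustering results. Three things here are assumptions not found in the claims: using contiguous pairs of requests (`n = 2`) as the subsequences, using the reciprocal of the overall count as the weight, and the names `subsequences` and `weighted_frequency_similarity`.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple

Subseq = Tuple[str, ...]


def subsequences(requests: Sequence[str], n: int = 2) -> List[Subseq]:
    """Return the contiguous length-n subsequences of a request sequence."""
    return [tuple(requests[i:i + n]) for i in range(len(requests) - n + 1)]


def weighted_frequency_similarity(
    clusters: Dict[str, Sequence[str]], n: int = 2
) -> Dict[str, Dict[Subseq, float]]:
    """Map each cluster id to its normalized weighted subsequence frequencies."""
    # Frequency of each subsequence inside each log clustering result.
    local = {cid: Counter(subsequences(seq, n)) for cid, seq in clusters.items()}

    # Frequency of each subsequence across all log clustering results.
    overall: Counter = Counter()
    for counts in local.values():
        overall.update(counts)

    # Assumed weighting: reciprocal of the overall frequency, so common
    # subsequences get small weights and rare ones get large weights.
    weights = {s: 1.0 / c for s, c in overall.items()}

    result: Dict[str, Dict[Subseq, float]] = {}
    for cid, counts in local.items():
        # Weighted frequency = weight * frequency within this cluster.
        weighted = {s: weights[s] * k for s, k in counts.items()}
        total = sum(weighted.values())
        # Normalize so the weighted frequencies of one cluster sum to 1.
        result[cid] = {s: v / total for s, v in weighted.items()} if total else {}
    return result


if __name__ == "__main__":
    # Hypothetical request paths for two log clustering results.
    demo = {
        "a": ["/", "/list", "/item", "/list", "/item"],
        "b": ["/", "/list", "/cart"],
    }
    for cid, scores in weighted_frequency_similarity(demo).items():
        print(cid, scores)
```

With this weighting, a subsequence shared by many clusters, such as the step from the home page to a list page, contributes less than one that is specific to a single cluster, which is the effect the inverse relation in claim 7 describes.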

Description

Network abnormal access identification method, system, electronic equipment and storage medium

Technical Field

The present invention relates to the field of network security technologies, and in particular to a network abnormal access identification method, a system, an electronic device, and a storage medium.

Background

With the spread of internet applications and the rising value of data, web crawler technology is widely used. The abnormal access behavior of malicious crawlers, however, places a heavy load on website servers and causes security problems such as data leakage and degraded service quality. How to effectively identify and intercept abnormal access has therefore become a key technical challenge in the field of network security.

At present, techniques for recognizing abnormal network requests fall mainly into four types: real-time analysis based on network request data; machine learning methods based on log data, such as the online community crawler behavior recognition scheme disclosed in the invention patent with publication number CN117596081A, which collects access logs and user-generated content data, applies preprocessing and feature association analysis, and generates a recognition model by jointly training an autoencoder neural network and a deep learning model; methods that recognize abnormal behavior features, such as the network security protection method disclosed in the invention with publication number CN116962075A, which acquires the access data of an anti-crawler system in a unit period, obtains the similarity between non-crawler user features, and constructs a distribution index as second feature operation information; and analysis methods based on clustering algorithms, such as the data platform anti-crawler scheme disclosed in the invention with publication number CN116800526A, which applies a K-means algorithm to combinations of features such as page clicks and dwell time and combines the result with link similarity recognition.

However, these techniques still have two outstanding problems in practical application. First, existing interception mechanisms mainly analyze the request content passively at the moment a request occurs, while modern crawlers use countermeasures such as forging request information, simulating a real browser environment, and simulating human operation. As a result, a system has difficulty distinguishing disguised requests from normal access at the request level and cannot achieve comprehensive recognition and filtering. Second, recognition methods based on supervised learning depend on large-scale labeled data sets and a relatively static feature set, which limits their recognition performance. As crawler technology continues to evolve, static features are easily imitated, hidden, or bypassed, so recognition accuracy gradually declines and the methods struggle to adapt to dynamically changing network attacks.

Accordingly, there is a need for a solution to the above problems.

Disclosure of Invention

In order to solve the above technical problems, the invention provides a network abnormal access identification method, a system, electronic equipment and a storage medium.
In a first aspect, the present invention provides a network abnormal access identification method, with the following technical scheme: acquiring network access log data within a preset time period; extracting a plurality of predefined dynamic behavior features from the network access log data; clustering the network access log data with an unsupervised subspace clustering algorithm, based on the plurality of dynamic behavior features, to generate a plurality of log clustering results; calculating the intra-group feature density of each log clustering result based on the request sequence features and request time interval features in that log clustering result; and identifying abnormal access behaviors according to the distribution of the intra-group feature densities of the plurality of log clustering results.

The beneficial effects of the network abnormal access identification method are as follows: the method extracts dynamic behavior features from the network access log, applies an unsupervised subspace clustering algorithm, and calculates the intra-group feature density from the request sequence and time interval features to perform anomaly recognition. This addresses the problems that the prior art depends on static features and labeled data and has difficulty recognizing disguised requests, and it improves recognition accuracy and adaptability to dynamic attacks.

On the basis of the above scheme, the network abnormal access identification method can be further improved as follows. In an optional manner, the step of acquiring the network access log data within the pre