CN-121984784-A - Website fingerprint identification attack method, device and equipment

Abstract

The application provides a website fingerprint identification attack method, device and equipment. A bilateral feature representation module maps an original flow track to a structured feature matrix, with a standardized timeline divided into fixed windows. A multi-scale feature migration module extracts scale-invariant representations to handle cross-browser scale changes in the flow track. The module is pre-trained on reference flow data to learn the traffic pattern of each website under low latency, yielding a pre-trained model. The pre-trained model then undergoes few-sample training on limited and truncated Tor traffic, which achieves effective cross-domain knowledge migration and generates the final feature vectors, and a classifier performs website classification on these feature vectors. The analysis and identification performance for anonymous communication traffic is thereby significantly improved when samples are few and sample tracks are incomplete, and a systematic technical scheme and methodological support are provided for traffic analysis of the Tor anonymity system in real network environments.
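
The window-based mapping described in the abstract can be illustrated with a minimal Python sketch. It is not the applicant's implementation. The names `Packet` and `build_feature_matrix` are hypothetical, and several details are assumptions that the source does not state: timestamps are normalized to [0, 1] by the trace's own start and end, each matrix element holds a per-window packet count, and a positive signed size means an outgoing packet while a negative one means an incoming packet.

```python
from typing import List, Tuple

# A packet is (timestamp_seconds, signed_size); the sign carries the direction.
# Assumption: positive = outgoing, negative = incoming (not stated in the source).
Packet = Tuple[float, int]


def build_feature_matrix(trace: List[Packet], num_windows: int) -> List[List[int]]:
    """Map a flow track to a 2 x N matrix of per-window packet counts.

    Row 0 counts outgoing packets and row 1 counts incoming packets.
    Using counts as matrix elements is an illustrative assumption.
    """
    if num_windows <= 0:
        raise ValueError("num_windows must be positive")
    matrix = [[0] * num_windows for _ in range(2)]
    if not trace:
        return matrix

    start = min(t for t, _ in trace)
    duration = max(t for t, _ in trace) - start

    for t, signed_size in trace:
        if signed_size == 0:
            continue  # a zero size carries no direction information
        # Normalize the timestamp to [0, 1] so every trace shares one timeline.
        position = (t - start) / duration if duration > 0 else 0.0
        # Clamp so the final instant lands in the last window.
        window = min(int(position * num_windows), num_windows - 1)
        row = 0 if signed_size > 0 else 1
        matrix[row][window] += 1
    return matrix


# Five packets spread over one second, divided into four fixed windows.
trace = [(0.0, 512), (0.1, -1500), (0.2, -1500), (0.9, 512), (1.0, -600)]
matrix = build_feature_matrix(trace, num_windows=4)
print(matrix)
```

Because the timeline is normalized before windowing, a truncated or slow trace still yields a matrix of the same shape, which is what allows a fixed-size model input.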

Inventors

QU ZHE
GAO XIANGYU
SHI LEI

Assignees

Central South University (中南大学)

Dates

Publication Date: 20260505
Application Date: 20260403

Claims (10)

1. A website fingerprint identification attack method, characterized by comprising the following steps: obtaining flow data of the Tor browser to be processed; and processing the flow data of the Tor browser to be processed by using a trained website fingerprint identification attack model to obtain an identification result; wherein the website fingerprint identification attack model comprises a bilateral feature representation module for constructing a structured feature matrix corresponding to a flow track based on the representation of a data packet on a time sequence obtained by dividing the flow track, and a multi-scale feature migration module for performing multi-scale feature extraction and fusion on the structured feature matrix; the flow data comprises a plurality of flow tracks, and the data packet is represented by a timestamp and key network features; and the website fingerprint identification attack model is obtained through the following training process: a pre-training stage, in which the website fingerprint identification attack model is pre-trained using reference browser flow data; and a few-sample training stage, in which the website fingerprint identification attack model undergoes few-sample training using Tor browser flow data for training, and the parameters of the multi-scale feature migration module are frozen during the few-sample training stage.
2. The website fingerprint identification attack method of claim 1, wherein the key network features include a transmission direction, a packet size and a timestamp, and the flow track is expressed as: [formula not reproduced in the source text], wherein f represents a data packet in the flow track and g represents the length of the flow track; the data packet is expressed as: [formula not reproduced in the source text], where k represents the position of the data packet in the flow track, t represents the timestamp of the data packet and s represents the packet size, and the packet transmission direction is encoded as the sign of [symbol not reproduced]: [symbol not reproduced] represents an outgoing packet and [symbol not reproduced] represents an incoming packet; and the structured feature matrix M is expressed as: [formula not reproduced in the source text], where N is the number of time windows, [symbol not reproduced] denotes the elements of the structured feature matrix, and i is determined by the sign of [symbol not reproduced].
3. The website fingerprint identification attack method of claim 1, wherein the multi-scale feature migration module comprises: a multi-scale feature generation module for performing multi-scale feature extraction on the structured feature matrix; and a multi-scale feature fusion module for fusing the extracted multi-scale features.
4. The website fingerprint identification attack method of claim 3, wherein the multi-scale feature generation module comprises four cascaded, dimension-progressive convolution blocks that realize multi-scale abstraction of features and generate four hierarchical features.
5. The website fingerprint identification attack method of claim 4, wherein the convolution block comprises a first one-dimensional convolution layer, a first normalization layer, a first ReLU activation function layer, a second one-dimensional convolution layer, a second normalization layer, a second ReLU activation function layer, a maximum pooling layer and a Dropout layer which are sequentially connected.
6. The method of fingerprint identification attack of a website of claim 5, wherein the multi-scale feature fusion module comprises a third one-dimensional convolution layer, a fourth one-dimensional convolution layer, a fifth one-dimensional convolution layer, a sixth one-dimensional convolution layer, a first element-by-element addition layer, a second element-by-element addition layer, a third element-by-element addition layer, a fourth element-by-element addition layer, a first upsampling layer, a second upsampling layer, a third upsampling layer, a first lightweight convolution block, a second lightweight convolution block, a third lightweight convolution block, and a fourth lightweight convolution block, the third one-dimensional convolution layer, the fourth one-dimensional convolution layer, the fifth one-dimensional convolution layer, and the sixth one-dimensional convolution layer take each of the hierarchical features as inputs, an output of the third one-dimensional convolution layer and the first upsampling layer as an input of the first element-by-element addition layer, an output of the first element-by-element addition layer as an input of the first lightweight convolution block, the fourth one-dimensional convolution layer and the second element-by-element addition layer as an input of the second element-by-element addition layer, a fourth one-dimensional convolution layer and the fourth element-by-element addition layer as an input of the fourth one-dimensional convolution layer, a fourth element-by-element addition layer as an input of the fourth one-dimensional convolution layer, and the fourth element-by-addition layer as an input of the fourth element-by the fourth element addition layer, and the output of the fourth element layer as a layer.
7. The website fingerprint identification attack method of claim 1, wherein the few-sample training stage is trained using limited and truncated Tor browser flow data.
8. The website fingerprint identification attack method of claim 1, wherein the loss function L used for training is: [formula not reproduced in the source text], where B is the number of samples in the training round, C is the total number of website categories, [symbol not reproduced] is the true website label indicating whether the i-th sample belongs to category c, and [symbol not reproduced] is the probability, predicted by the model, that the i-th sample belongs to category c.
9. A website fingerprint identification attack apparatus, comprising: an acquisition unit for acquiring flow data of the Tor browser to be processed; and a training unit for processing the flow data of the Tor browser to be processed by using a trained website fingerprint identification attack model to obtain an identification result; wherein the website fingerprint identification attack model comprises a bilateral feature representation module for constructing a structured feature matrix corresponding to a flow track based on the representation of a data packet on a time sequence obtained by dividing the flow track, and a multi-scale feature migration module for performing multi-scale feature extraction and fusion on the structured feature matrix; the flow data comprises a plurality of flow tracks, and the data packet is represented by a timestamp and key network features; and the website fingerprint identification attack model is obtained through the following training process: a pre-training stage, in which the website fingerprint identification attack model is pre-trained using reference browser flow data; and a few-sample training stage, in which the website fingerprint identification attack model undergoes few-sample training using Tor browser flow data for training, and the parameters of the multi-scale feature migration module are frozen during the few-sample training stage.
10. An electronic device, comprising a processor and a memory coupled to the processor, wherein the memory is used for storing a computer program, and the processor is configured to execute the computer program stored in the memory to cause the electronic device to perform the website fingerprint identification attack method according to any one of claims 1-8.
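
The loss formula of claim 8 is not reproduced in this text. Its variable definitions (batch size B, category count C, a true label and a predicted probability per sample and category) match the standard categorical cross-entropy, L = -(1/B) Σ_i Σ_c y_ic · log(p_ic). The sketch below assumes that form, including the averaging over B, which the source does not confirm. The function name `cross_entropy` and the clamping constant are hypothetical.

```python
import math
from typing import List


def cross_entropy(labels: List[List[float]], probs: List[List[float]]) -> float:
    """Mean categorical cross-entropy over a batch of B samples and C categories.

    labels[i][c] is 1.0 if sample i belongs to category c, else 0.0 (one-hot).
    probs[i][c] is the predicted probability that sample i belongs to category c.
    Assumption: the loss is averaged over the batch; the source formula is missing.
    """
    if not labels or len(labels) != len(probs):
        raise ValueError("labels and probs must be non-empty and the same length")
    eps = 1e-12  # clamp to avoid log(0) for confident wrong predictions
    total = 0.0
    for y_row, p_row in zip(labels, probs):
        for y, p in zip(y_row, p_row):
            if y > 0:
                total -= y * math.log(max(p, eps))
    return total / len(labels)


# Two samples, three website categories; each true class is predicted with p = 0.5.
labels = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
probs = [[0.5, 0.25, 0.25], [0.25, 0.5, 0.25]]
loss = cross_entropy(labels, probs)
print(loss)
```

With one-hot labels only the probability assigned to the true website contributes, so the loss here equals -log(0.5) for each sample.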

Description

Website fingerprint identification attack method, device and equipment
Technical Field
The present application relates to the field of data processing technologies, and in particular to a website fingerprint identification attack method, device and equipment.
Background
The onion routing (Tor) browser is currently the most widely used anonymous communication system, with millions of active users per day. The system provides anonymous browsing to users through mechanisms such as random relay node selection and multi-layer encryption. However, Tor is significantly vulnerable to website fingerprint identification (Website Fingerprinting, WF) attacks. A website fingerprint identification attack uses machine learning to extract the traffic pattern features specific to a target website, and thereby effectively identifies the websites visited by a Tor user. Existing deep-learning-based website fingerprint identification attacks achieve high identification accuracy, exceeding 95% under ideal experimental conditions. However, they rely heavily on complete, clean traffic collected during page loading for training and analysis. In practice, background traffic is mixed in, so the attacker cannot observe the whole process of a website load; poor network conditions and website fingerprint defense techniques further prevent the attacker from collecting the complete, clean traffic of a page load, so attack performance on some websites drops markedly.
Moreover, website traffic characteristics change with factors such as website content updates, Tor browser version iterations and random network path selection, so an attacker must regularly collect a large amount of new training data to retrain the attack model, and the resulting cost is difficult to bear in practice. To be applicable in real environments, a website fingerprint identification attack must be trainable on incomplete flow tracks and must reduce the number of tracks required for training. However, existing deep-learning-based website fingerprint identification attacks do not work well in this setting. Owing to network conditions and the development of defense techniques, the Tor traffic available to an attacker is often incomplete, cluttered and structurally weak, which means that many of the features exploited by prior website fingerprint identification attacks can no longer be reliably observed. Training an effective classifier directly on Tor traffic therefore becomes increasingly difficult, especially when only limited traffic data can be collected. How to extract more effective traffic features and carry out accurate website fingerprint identification attacks on incomplete and small amounts of traffic data is thus a problem that needs to be studied.
In addition, patent document 1 (CN 118157963 A) proposes a method and a system for identifying the fingerprints of websites accessed by a user based on distribution calibration, which relate to the technical field of network supervision. A flow track captured while the user accesses a website is input into a website fingerprint identification model based on distribution calibration; the model comprises a feature extraction module, a distribution calibration module and a classification module. The feature extraction module extracts features from the input flow track. The distribution calibration module then calibrates the distribution of the extracted features: using auxiliary flow tracks of user accesses, it transfers the feature distribution statistics of the auxiliary flow tracks onto the features of the input flow track to generate new features. The new features are concatenated with the extracted original features and input into the classification module, which outputs the website identification result. Patent document 1 differs from the present application in the following respects: 1. The data that patent document 1 uses for training and for recognition are of the same type, so patent document 1 still has the problems mentioned above; it relies on a large number of labeled auxiliary flow tracks (i.e. source-domain data) during transfer learning or model training, and these data are still Tor flow tracks.
However, the anonymity of the Tor network creates multiple barriers to traffic data acquisition, namely limited node access, high data labeling cost and narrow sample acquisition channels, which limit the scalability and practicality of the technology. In addition, the technical scheme of patent document 1 depends on acquiring a completely loaded website flow track as the basis for analysis, a precondition that is often difficult to meet