CN-115964666-B - Street-level IPv6 geographic positioning method based on a graph neural network
Abstract
The invention provides a street-level IPv6 geographic positioning method based on a graph neural network, which addresses the coarse positioning granularity of current IPv6 geographic positioning methods. The method comprises: anonymizing the obtained IP addresses and converting the longitude and latitude of each landmark into an area number; using a graph neural network to convert the processed node information into an attribute feature graph; converting the feature information of the edges of the attribute feature graph into edge weights in a learning-based manner; feeding the feature information of the nodes into an improved GraphSAGE model, which prunes edges according to their weights, learns the information of adjacent nodes by convolution, and outputs a node attribute update matrix as the node representation; and passing the node attribute update matrix, together with the area numbers, to a hierarchical classification module that outputs the geographic position of the target. The invention outperforms current IPv6 geolocation algorithms in median error, average error and maximum error.
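The conversion of landmark coordinates into area numbers can be illustrated with a short Python sketch. It assumes the repeated bisection of an experimental region described in claim 4; the function name `area_number`, the bounding-box layout and the bit-string encoding (one latitude bit and one longitude bit per level) are hypothetical choices made for illustration, not details taken from the patent.

```python
def area_number(lat, lon, bounds, depth):
    """Return a bit-string area number for a landmark by repeated bisection.

    bounds is (lat_min, lat_max, lon_min, lon_max) of the experimental region
    (hypothetical layout). Each level halves the current box in latitude and
    in longitude and appends one bit for each half chosen.
    """
    lat_min, lat_max, lon_min, lon_max = bounds
    if not (lat_min <= lat <= lat_max and lon_min <= lon <= lon_max):
        raise ValueError("landmark lies outside the experimental region")
    bits = []
    for _ in range(depth):
        lat_mid = (lat_min + lat_max) / 2
        lon_mid = (lon_min + lon_max) / 2
        # Upper half in latitude -> "1", lower half -> "0".
        if lat >= lat_mid:
            bits.append("1")
            lat_min = lat_mid
        else:
            bits.append("0")
            lat_max = lat_mid
        # Eastern half in longitude -> "1", western half -> "0".
        if lon >= lon_mid:
            bits.append("1")
            lon_min = lon_mid
        else:
            bits.append("0")
            lon_max = lon_mid
    return "".join(bits)


# Illustrative region and landmark; the coordinates are made up.
region = (34.0, 35.0, 113.0, 114.0)
coarse = area_number(34.8, 113.2, region, depth=1)
fine = area_number(34.8, 113.2, region, depth=2)
```

With this encoding, the area number at a finer level extends the coarser one as a prefix, which is one way a city-to-area hierarchy could be exposed to a hierarchical classifier.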
Inventors
- MA ZHAORUI
- ZHOU SHIJIE
- YAN ZHEN
- ZHU FUBAO
- MA PEIKAI
- ZHANG SHICHENG
- LI NA
- HU XINHAO
- WANG HONGJIAN
- LI TIANAO
- DONG QILIN
- FENG HAO
- YIN YI
Assignees
- Zhengzhou University of Light Industry (郑州轻工业大学)
Dates
- Publication Date
- 20260505
- Application Date
- 20230206
Claims (8)
- 1. A street-level IPv6 geographic positioning method based on a graph neural network, characterized by comprising the following steps: step one, preprocessing: anonymizing the obtained IP addresses to remove anonymous nodes, and converting the longitude and latitude in the landmark information into area numbers; step two, pre-training: converting the feature information of the edges of the attribute feature graph into edge weights in a learning-based manner; step three: feeding the node feature information obtained in step one, which comprises the IPv6 addresses and the intermediate routing nodes, into an improved GraphSAGE model, pruning according to the edge weights, learning the information of adjacent nodes through convolution, aggregating and updating the node information, and outputting a node attribute update matrix as the node representation; step four: feeding the node attribute update matrix into a hierarchical classification module for classification, and outputting the area number of the target, namely the geographic position of the target; calculating the error between the geographic position output by the hierarchical classification model and the true geographic position, and adjusting the parameters of the hierarchical classification model through back propagation and gradient descent; the hierarchical classification model comprises a plurality of fully connected layers and classification layers, one fully connected layer being arranged after each layer; the root node of the tree is set to the city name and the child nodes to the area numbers belonging to it; a classification layer with a normalized exponential function (softmax) output layer outputs the corresponding classification probabilities, the classification result is input to the next level to select the classification model of the next level, and the area number of the target IP is finally output; the tree is traversed downwards from the root node by a greedy method, selecting at each layer only the node with the highest probability, and the traversal continues downwards until the probability of a node falls below a set threshold or a leaf node is reached; the classification layer uses the normalized exponential function to calculate the probability of node $v_i$ on the output nodes of layer $ic$: $P_{v_i}^{ic} = \mathrm{softmax}\left(z_{v_i}^{ic} + \mathrm{Bias1}\right)$; wherein $ic$ represents the number of cycles, $z_{v_i}^{ic}$ is the output of the preceding fully connected layer for node $v_i$, and Bias1 is a bias parameter; $P_{v_i}^{ic}$ represents the set of probabilities of node $v_i$ on the output nodes of layer $ic$; maximizing the probability $P_{v_i}^{ic}$ yields the class CH with the highest probability for node $v_i$; if $\max P_{v_i}^{ic}$ is greater than the threshold Q, the second-stage classification is performed; if it is smaller than Q, CH is set to 1 by the argmax function and the other values in the vector are set to 0, giving the one-hot encoded vector of the output nodes of each layer; the one-hot encoded vector is translated into an area number through the mapping between encoded vectors and numbers.
- 2. The street-level IPv6 geographic positioning method based on a graph neural network according to claim 1, wherein, in the anonymizing process, path information to a target IP address is obtained by a probe running Traceroute; the graph neural network encodes the information of the nodes and the topological structure to obtain the attribute feature graph; and the information of a node comprises the IP address V, the time delay D of the edges connecting it to adjacent nodes, the domain name query protocol information WH, the intermediate routing nodes mid_route and the host name Hostname.
- 3. The street-level IPv6 geographic positioning method based on a graph neural network according to claim 2, wherein the graph neural network encodes the information of the nodes and the topological structure as follows: the topological relation of the computer network is represented as an attribute feature graph $G = (V, E, VX, EX, A)$; wherein $V = \{v_1, v_2, \dots, v_n\}$ is the set of nodes and $v_i$ represents the $i$-th network node in the computer network; $E$ is the set of edges and $e_{ij}$ represents the edge between node $v_i$ and node $v_j$; $N(v_i)$ represents the set of nodes adjacent to node $v_i$; VX represents the features on each node and EX represents the features on the edges; $A$ represents the adjacency matrix: if node $v_i$ and node $v_j$ are connected, the element $A_{ij} = 1$, otherwise $A_{ij} = 0$; the adjacency matrix $A$ is a symmetric matrix.
- 4. The street-level IPv6 geographic positioning method based on a graph neural network according to claim 3, wherein, for nodes that fail to appear in the data over multiple measurements, the anonymizing processing method is as follows: (1) if the parent node and the child node of two paths are the same, the anonymous route or the multiple intermediate routes are treated as alias routes or load routes equivalent to route A of the shortest path, and the IPs of the anonymous route and of the multiple intermediate routes are replaced by the IP of route A of the shortest path; (2) probe I and probe II in the same area measure each other, where parent node I and child node II of the two paths are different, and child node I and parent node II are also different, but an alias detection algorithm determines that parent node I and child node II, and child node I and parent node II, are all alias routers; the region is divided using globally uniform longitude and latitude: the candidate landmarks collected in the experiments are divided by longitude and latitude region, an experimental region is divided by bisection, the resulting sub-regions are bisected again, and the longitude and latitude of each landmark are converted into the area number of the region that contains it.
- 5. The street-level IPv6 geographic positioning method based on a graph neural network according to claim 3 or 4, wherein the learning-based manner is implemented as follows: the feature EX of an edge comprises the time delay D between the two IP devices on the edge, the similarity P of the two IPs, the domain name query protocol similarity Q and the host name similarity HN, so that $EX = \{D, P, Q, HN\}$; the set D, composed of the time delays $d_{ij}$ between node $v_i$ and node $v_j$, is the original data; the elements of the similarity P of the two IPs on an edge are: $p_{ij} = \frac{s_{ij}}{h}$; wherein $s_{ij}$ represents the number of identical consecutive bits of the two IPs on edge $e_{ij}$, $h$ represents the total number of bits of the IP, $p_{ij}$ represents the similarity proportion of the two IPs, and $0 \le p_{ij} \le 1$; the elements of the domain name query protocol similarity Q are: $q_{ij} = \frac{c_{ij}}{g}$; wherein $c_{ij}$ represents the number of identical items of domain name query protocol information of the two IPs connected by edge $e_{ij}$, $g$ represents the total number of items of domain name query protocol information, $q_{ij}$ represents the domain name query protocol similarity of the two IPs on the edge, and $0 \le q_{ij} \le 1$; the elements of the host name similarity HN are: $hn_{ij} = \frac{b_{ij}}{a_{ij}}$; wherein $a_{ij}$ represents the length of the longest host name string of the two IPs connected by edge $e_{ij}$, $b_{ij}$ represents the length of the consecutive identical string shared by the two connected IPs, $hn_{ij}$ represents the host name similarity of edge $e_{ij}$, and $0 \le hn_{ij} \le 1$; a linear regression model is constructed: $w_{ij} = \theta_1 d_{ij} + \theta_2 p_{ij} + \theta_3 q_{ij} + \theta_4 hn_{ij} + \theta_0 \cdot 1 = \Theta^{T} X$; wherein $\theta_1, \theta_2, \theta_3, \theta_4$ are the weights corresponding to the elements $d_{ij}, p_{ij}, q_{ij}, hn_{ij}$ respectively, $\Theta$ represents the weight matrix, $X$ represents the feature vector, 1 is the bias constant, and $w_{ij}$, the output of the linear regression model, is the weight of the edge; the label $y_{ij}$ is set to 1 when the geographic distance between the two nodes is smaller than the threshold T km, and to 0 otherwise, T being a threshold parameter for the distance to the target IP; the $w_{ij}$ values are calculated over the complete experimental data set to obtain the weights of all edges of the experimental data set, $W = \{w_{ij}\}$, where $w_{ij}$ represents the linear regression model output for edge $e_{ij}$ and $e_{ij}$ represents the edge between node $v_i$ and its neighboring node $v_j$.
- 6. The street-level IPv6 geographic positioning method based on a graph neural network according to claim 5, wherein the improved GraphSAGE model performs the pruning operation directly on the original network: edges with small weights to surrounding neighbor nodes are deleted by changing the corresponding elements of the adjacency matrix A from 1 to 0, so that only the R edges with the largest weights are retained for each node; during inference, the features of all R neighbor nodes of a target node are aggregated onto the target node; the aggregation mode is random aggregation; the pruning operation is performed during aggregation, so that edges with lower weights do not participate in the aggregation and update operations; the retained weights $w_{ij}$ are normalized and a weight threshold $\tau$ is set; the sampling rule for neighbor nodes is: if some neighbor nodes are above the threshold $\tau$ and others are below it, only those above $\tau$ are selected; if all neighbor nodes are above $\tau$, all neighbor nodes are randomly sampled; if all neighbor nodes are below $\tau$, no sampling is performed; the aggregation converts a set of node vectors into a vector using an aggregation function: $h_{N(v_i)}^{k} = \mathrm{AGGREGATE}_k\left(\{h_{v_j}^{k-1} : v_j \in N(v_i)\}\right)$; wherein $k$ represents the number of aggregation layers, $h_{v_j}^{k-1}$ represents the node attribute update matrix of node $v_j$ entering the $k$-th layer, $\mathrm{AGGREGATE}_k$ represents the aggregation function applied in the information transfer from node $v_j$ to node $v_i$, and $h_{N(v_i)}^{k}$ represents the surrounding neighbor features of node $v_i$ aggregated at the $k$-th layer; after the neighbor node set $N(v_i)$ of node $v_i$ has been aggregated, an updated representation of node $v_i$ is created from the neighborhood aggregated representation of node $v_i$ and the previous representation of node $v_i$: $h_{v_i}^{k} = \mathrm{UPDATE}_k\left(h_{v_i}^{k-1}, h_{N(v_i)}^{k}\right)$; wherein $\mathrm{UPDATE}_k$ is a differentiable function, $h_{v_i}^{k}$ is the value output after the $k$-th layer update, $h_{N(v_i)}^{k}$ represents the neighbor attributes of node $v_i$ aggregated at the $k$-th layer, and $h_{v_i}^{k-1}$ represents the result of the convolution of the previous layer; in the improved GraphSAGE model, the update of node $v_i$ combines the neighbor information by concatenation: $h_{v_i}^{k} = \sigma\left(\mathrm{CONCAT}\left(W_{self}\, h_{v_i}^{k-1},\; W_{neigh}\, h_{N(v_i)}^{k}\right)\right)$; wherein $W_{self}$ and $W_{neigh}$ are the weight matrices of the linear transformations of the central node embedding and of the neighborhood aggregated message respectively, and $\sigma$ represents an activation function.
- 7. The street-level IPv6 geographic positioning method based on a graph neural network according to claim 6, wherein the aggregation function is a Mean aggregation operator, a pooling aggregation operator or a long short-term memory (LSTM) network aggregation operator; the Mean aggregation operator sums the elements and takes their average: $h_{N(v_i)}^{k} = \frac{1}{|N(v_i)|} \sum_{v_j \in N(v_i)} h_{v_j}^{k-1}$; the pooling aggregation operator transforms the neighbor nodes one by one through a nonlinear fully connected layer MLP and then performs one-dimensional pooling over the feature dimension, $\rho$ representing maximum pooling or average pooling: $h_{N(v_i)}^{k} = \rho\left(\{\mathrm{MLP}(h_{v_j}^{k-1}) : v_j \in N(v_i)\}\right)$; the LSTM aggregation operator takes the vector set of the adjacent nodes, in random order, as the input of the LSTM network, namely: $h_{N(v_i)}^{k} = \mathrm{LSTM}\left(\left[h_{v_j}^{k-1}\right]_{v_j \in \pi(N(v_i))}\right)$, where $\pi$ denotes a random ordering of the neighbor set.
- 8. The street-level IPv6 geographic positioning method based on a graph neural network according to claim 1 or 7, wherein a cross-entropy loss is calculated between the prediction of each level and the true value, and the final loss function is the sum of the losses of all levels: $L = \sum_{ic} \mathrm{CE}\left(Pr^{ic}, Y^{ic}\right)$; wherein $L$ indicates the final value of the loss function, $\mathrm{CE}$ represents the cross-entropy loss function, Pr is the predicted value and Y is the true value.
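The pruning and aggregation steps of claim 6 can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the function names (`prune_top_r`, `mean_aggregate`, `sage_layer`), the dictionary-based graph layout and the ReLU activation are hypothetical, the Mean operator of claim 7 stands in for the aggregation function, and the threshold-based random sampling rule is omitted.

```python
def prune_top_r(adj, weights, r):
    """Keep, for each node, only its r neighbours with the largest edge weight.

    adj maps node -> set of neighbours; weights maps frozenset({u, v}) -> float
    (hypothetical layout). Pruning is per node, so the result may be asymmetric.
    """
    pruned = {}
    for node, nbrs in adj.items():
        ranked = sorted(nbrs, key=lambda n: weights[frozenset((node, n))],
                        reverse=True)
        pruned[node] = ranked[:r]
    return pruned


def mean_aggregate(features, nbrs, dim):
    """Mean aggregation operator: element-wise average of neighbour vectors."""
    if not nbrs:
        return [0.0] * dim
    return [sum(features[n][d] for n in nbrs) / len(nbrs) for d in range(dim)]


def matvec(matrix, vec):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


def sage_layer(features, pruned, w_self, w_neigh):
    """One layer: ReLU(CONCAT(W_self h_v, W_neigh h_N(v))) for every node."""
    dim = len(next(iter(features.values())))
    updated = {}
    for node, h in features.items():
        agg = mean_aggregate(features, pruned[node], dim)
        # Concatenate the transformed self embedding and neighbourhood message.
        z = matvec(w_self, h) + matvec(w_neigh, agg)
        updated[node] = [max(0.0, v) for v in z]  # ReLU is an assumed choice
    return updated


# Tiny made-up example: node "a" keeps only its strongest edge (to "b").
adj = {"a": {"b", "c"}, "b": {"a"}, "c": {"a"}}
weights = {frozenset(("a", "b")): 0.9, frozenset(("a", "c")): 0.1}
feats = {"a": [1.0, 0.0], "b": [0.0, 2.0], "c": [4.0, 4.0]}
identity = [[1.0, 0.0], [0.0, 1.0]]
pruned = prune_top_r(adj, weights, r=1)
hidden = sage_layer(feats, pruned, identity, identity)
```

Because the low-weight edge from "a" to "c" is dropped before aggregation, the features of "c" never reach "a", which is the effect the claim attributes to pruning.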
Description
Street-level IPv6 geographic positioning method based on a graph neural network
Technical Field
The invention relates to the technical field of IPv6 geographic positioning, in particular to a street-level IPv6 geographic positioning method based on a graph neural network.
Background
IP geolocation, also referred to as network entity geolocation, is a technique for inferring the geographic location of a device from characteristics of its IP address (e.g., network measurements or queryable information). IP geolocation has many applications. In network security, it is used for verifying and checking user login information, detecting server intrusion, tracking the sources of illegal information transmission and of network attacks, limiting network crime, blocking certain areas of the network, and the like. In business marketing, it may be used to push targeted advertisements based on the geographic location and population density of users, or to provide geographic location information for automatic web page translation and other business areas.
Under non-collaborative conditions, existing IP geolocation algorithms can be divided by positioning granularity into two categories: city-level and street-level. City-level algorithms, such as Constraint-based Geolocation (CBG), Learning-based Geolocation (LBG) and Ranking Nodes-based Geolocation (RNBG), have median error distances within 100 km. Street-level algorithms, such as Street-Level Geolocation (SLG), Identification Routers and Local Delay-based Geolocation (IRLD), Geolocation of covert communication entity on the Internet for post-steganalysis (MLP-Geo) and Exploiting Leaked Identifiers in IPv6 for Street-Level Geolocation (IPvSeeYou), have median error distances within 10 km.
Among them, the CBG, LBG and RNBG algorithms are applicable to IPv4, while Latency Constraints and Neighbor Sequences-based Geolocation (LCNS) is applicable to IPv6 but has lower geolocation accuracy. Most existing IPv6 geolocation algorithms are still at the city level.
Global IPv6 deployment has entered an acceleration phase as IPv4 addresses are exhausted. However, IPv4 and IPv6 addresses differ in several ways, so existing IPv4 geolocation methods (such as the CBG, LBG and RNBG algorithms) are either not directly applicable to IPv6 networks or produce unsatisfactory results because of imperfect network routing and the sparsity of IPv6 addresses; the existing IPv6 CBG and LCNS algorithms do not achieve high geolocation accuracy and have large median error distances. The differences include: 1) the IPv6 address space is so large that exhaustive probing is not possible; 2) IPv6 addresses are sparse, which makes probing and measurement costly in time; 3) network space mapping and network space reverse mapping (CASM) techniques, used in network security defense, result in a large number of anonymous servers; 4) the storage cost of IPv6 addresses is higher and the connectivity of IPv6 networks is poor, resulting in a large number of routes with loops and high latency. In addition, the network topology changes quickly because new active IPs are continually being added. Learning-based methods such as machine learning and deep learning, including graph neural networks, are now widely used in the field of network security. The present invention combines a graph neural network with a computer network to address the limitations described above.
Disclosure of Invention
To address the technical problem that current IPv6 geographic positioning methods have coarse positioning granularity, the invention provides a street-level IPv6 geographic positioning method based on a graph neural network. All related routers serving an area are filtered in a learning-based manner, and a hierarchical classification method then narrows the area step by step so that the geographic granularity gradually converges, which improves the positioning granularity in IPv6 and brings the geographic positioning model close to street-level positions.
To achieve this purpose, the technical scheme of the invention is a street-level IPv6 geographic positioning method based on a graph neural network, comprising the following steps: step one, preprocessing: anonymizing the obtained IP addresses to remove anonymous nodes, and converting the longitude and latitude in the landmark information into area numbers; step two, pre-training: converting the feature information of the edges of the attribute feature graph into edge weights in a learning-based manner; step three: the node feature information obtained in step one, which comprises the IPv6 addresses and the intermediate routing nodes, is fed into an improved GraphSAGE model, pruning is carried out a