CN-116662852-B - Equipment type identification method based on flow sampling, terminal equipment and storage medium

Abstract

The invention discloses a device type identification method based on traffic sampling, a terminal device and a storage medium. The method does not require the complete device traffic to be captured; devices can be identified from traffic sampled at the gateway alone. Sampling leaves some traffic features missing. Traditional tensor completion algorithms handle missing entries by directly learning the embeddings that correspond to the rows, columns and depths of the tensor, so they cannot generalize to unseen rows, columns or depths, and they suffer from repeated retraining, slow completion of missing features and high cost. The invention provides a generalized tensor completion method that uses historical information to learn a function that generates the embeddings, so that device traffic features are completed quickly and effectively.

Inventors

WANG HAOXUAN
XIE KUN
WEN JIGANG

Assignees

Hunan University (湖南大学)

Dates

Publication Date: 20260505
Application Date: 20230529

Claims (9)

1. A device type identification method based on traffic sampling, characterized by comprising the following steps: S1, intermittently collecting data packets from Internet of Things devices; S2, extracting features from the collected data packets to obtain a feature vector for each data packet, stacking the feature vectors of all data packets into a third-order sparse tensor, and constructing a bipartite graph G = {Vm, Vw, E}, wherein the left node set Vm of the bipartite graph is the set of devices, the right node set Vw is the set of data packet measurement windows, and the edge set E between the left node set and the right node set is the set of features; S3, taking the bipartite graph as the input of a device identification model to obtain an updated device identification model; the device identification model includes: a first graph neural network message passing layer, used for performing a first round of graph message passing on the input bipartite graph and updating the vector representations of the nodes and edges in the bipartite graph; a second graph neural network message passing layer, used for updating the vector representations of the nodes and edges in the bipartite graph output by the first graph neural network message passing layer according to the 2-hop neighbor relations in that bipartite graph; a missing feature completion layer, used for estimating the missing feature edges from the node information in the bipartite graph output by the second graph neural network message passing layer and outputting a completed feature edge set; a convolution unit, used for performing convolution operations on the completed feature edge set; and a feature obtaining process, which comprises flattening the features output by the convolution unit and splicing the flattened features with the node vectors output by the second graph neural network message passing layer to obtain the spliced features.
2. The device type identification method based on traffic sampling according to claim 1, wherein step S1 is specifically implemented by: setting a plurality of measurement windows; and randomly selecting a portion of the measurement windows and collecting the device traffic within the selected measurement windows.
3. The device type identification method based on traffic sampling according to claim 1, wherein the first graph neural network message passing layer updates the vector representations of the nodes and edges in the bipartite graph as follows: constructing a neighbor aggregation vector for node v in the 1st graph message pass by a formula whose terms include: the set of neighbor nodes of node v; a neighbor node u of node v; the initial vector representation of the 1-hop neighbor node u of node v; the initial vector representation of edge uv; the splicing operation; a mean aggregation function; a nonlinear activation function; and the learnable parameter of the construction process in the 1st graph message pass; updating the node vector representation by a formula whose terms include: the vector representation of node v updated in the 1st graph message pass; the learnable parameter of the node representation update in the 1st graph message pass; and the initial vector representation of node v; and updating the edge vector representation by a formula whose terms include: the vector representation of edge uv updated in the 1st graph message pass; the learnable parameter of the edge representation update in the 1st graph message pass; and the vector representation of node u updated in the 2nd message pass.
4. The device type identification method based on traffic sampling according to claim 3, wherein the second graph neural network message passing layer updates the vector representations of the nodes and edges in the bipartite graph output by the first graph neural network message passing layer as follows: constructing a neighbor aggregation vector for node v in the 2nd graph message pass by a formula whose terms include: the vector representation of node u from the 1st graph message pass; and the learnable parameter of the construction process in the 2nd graph message pass; updating the node vector representation by a formula whose terms include: the vector representation of node v updated in the 2nd graph message pass; and the learnable parameter of the node representation update in the 2nd graph message pass; and updating the edge vector representation by a formula whose terms include: the vector representation of edge uv updated in the 2nd graph message pass; the learnable parameter of the edge representation update in the 2nd graph message pass; and the vector representation of node u updated in the 3rd message pass.
5. The device type identification method based on traffic sampling according to claim 1, wherein the missing feature completion layer obtains the completed feature edge set as follows: estimating the missing feature edges by a formula whose terms include: the estimate of the missing feature edge between nodes v and u, which is a K-dimensional feature vector; the vector representations of nodes v and u updated in the second graph neural network message passing layer; a learnable parameter; and a nonlinear activation function; and merging the estimates of all missing feature edges into the known feature edge set to obtain the completed feature edge set.
6. The method of claim 1, wherein the convolution unit comprises a plurality of convolution modules connected in series.
7. The device type identification method based on traffic sampling according to claim 1, wherein the linear output layer first maps the input spliced vector to a vector whose length equals the number of device types, each element of which is the score of one device type, and a softmax layer then converts this vector into a probability distribution, the device type with the maximum probability being the final device type.
8. A terminal device, comprising: one or more processors; and a memory having one or more programs stored thereon which, when executed by the one or more processors, cause the one or more processors to implement the steps of the method of any one of claims 1-7.
9. A computer-readable storage medium, characterized in that it stores a computer program which, when executed by a processor, implements the steps of the method according to any one of claims 1-7.
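
As a rough illustration of step S2 in claim 1, the sketch below turns a sparse device × window × feature tensor into the bipartite graph G = {Vm, Vw, E}. It assumes that unsampled cells are marked with NaN and that a (device, window) pair is observed only when all of its features are present; the function name `build_bipartite_graph` and the array layout are hypothetical and are not taken from the patent.

```python
import numpy as np

def build_bipartite_graph(tensor):
    """Build G = {Vm, Vw, E} from a (devices, windows, features) tensor.

    Assumption: an unsampled (device, window) cell is filled with NaN.
    """
    num_devices, num_windows, _ = tensor.shape
    vm = list(range(num_devices))   # left nodes: devices
    vw = list(range(num_windows))   # right nodes: measurement windows
    # A cell counts as observed when none of its K features is NaN.
    observed = ~np.isnan(tensor).any(axis=2)
    # Each observed cell becomes an edge carrying its K-dim feature vector.
    edges = {(int(m), int(w)): tensor[m, w]
             for m, w in zip(*np.nonzero(observed))}
    return vm, vw, edges

# Toy tensor: 2 devices, 3 windows, 2 features per window (values invented).
toy = np.full((2, 3, 2), np.nan)
toy[0, 0] = [0.5, 64.0]
toy[1, 2] = [1.5, 128.0]
vm, vw, edges = build_bipartite_graph(toy)
```

Only the two sampled cells become edges here; every other (device, window) pair is a missing feature edge that the model would later have to estimate.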
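
The random window selection of claim 2 could look roughly like the following. The sampling ratio, the seed and the name `select_measurement_windows` are illustrative assumptions; the claim only states that a portion of the windows is chosen at random.

```python
import random

def select_measurement_windows(num_windows, sample_ratio, seed=None):
    """Randomly pick a subset of measurement window indices."""
    if not 0.0 < sample_ratio <= 1.0:
        raise ValueError("sample_ratio must be in (0, 1]")
    rng = random.Random(seed)
    # Keep at least one window so that some traffic is always collected.
    k = max(1, round(num_windows * sample_ratio))
    return sorted(rng.sample(range(num_windows), k))

# Capture traffic only in a quarter of 20 windows (both numbers are assumed).
selected = select_measurement_windows(num_windows=20, sample_ratio=0.25, seed=0)
```

The gateway would then capture packets only while one of the selected windows is active and skip the rest.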
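
The formulas of claims 3 and 4 are not legible in this text, but the terms they list match a common graph message passing pattern: splice, transform, activate, then mean-aggregate. One possible reading for pass l (l = 1 for claim 3, l = 2 for claim 4) is sketched below. Every symbol name (n, h, e, P, Q, W, N(v), AGG, CONCAT, σ) is an assumption, as is the exact argument list of each splice. The sketch also assumes that the edge update uses the node vectors produced in the same pass, whereas the claim text numbers that pass differently.

```latex
\begin{align}
n_v^{(l)} &= \mathrm{AGG}\left( \sigma\left( P^{(l)} \cdot \mathrm{CONCAT}\left( h_u^{(l-1)},\, e_{uv}^{(l-1)} \right) \right) \;\middle|\; u \in N(v) \right) \\
h_v^{(l)} &= \sigma\left( Q^{(l)} \cdot \mathrm{CONCAT}\left( h_v^{(l-1)},\, n_v^{(l)} \right) \right) \\
e_{uv}^{(l)} &= \sigma\left( W^{(l)} \cdot \mathrm{CONCAT}\left( e_{uv}^{(l-1)},\, h_u^{(l)},\, h_v^{(l)} \right) \right)
\end{align}
```

Here the superscript (0) denotes the initial vector representations, AGG is the mean aggregation function and σ is the nonlinear activation function.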
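
To make the two message passing layers of claims 3 and 4 more concrete, here is a minimal NumPy sketch of one pass applied twice. It follows an assumed splice, transform, activate and mean-aggregate pattern; the weight names `P`, `Q`, `W`, the ReLU activation, the vector sizes and the all-ones initial node vectors are illustrative choices and do not come from the patent.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def message_pass(h, e, P, Q, W):
    """One round of message passing on an undirected graph.

    h: dict node -> vector; e: dict (u, v) -> edge vector, one entry per edge.
    P, Q, W: weight matrices (assumed names for the learnable parameters).
    """
    # Neighbor lists; each edge is visible from both of its endpoints.
    neighbors = {v: [] for v in h}
    for (u, v), feat in e.items():
        neighbors[u].append((v, feat))
        neighbors[v].append((u, feat))
    new_h = {}
    for v, nbrs in neighbors.items():
        # Splice each neighbor vector with the connecting edge vector,
        # transform, activate, then mean-aggregate over the neighbors.
        msgs = [relu(P @ np.concatenate([h[u], feat])) for u, feat in nbrs]
        n_v = np.mean(msgs, axis=0) if msgs else np.zeros(P.shape[0])
        new_h[v] = relu(Q @ np.concatenate([h[v], n_v]))
    # Edge update from the old edge vector and the updated endpoint vectors.
    new_e = {(u, v): relu(W @ np.concatenate([feat, new_h[u], new_h[v]]))
             for (u, v), feat in e.items()}
    return new_h, new_e

rng = np.random.default_rng(0)
d, k = 4, 3  # node and edge vector sizes (assumed)

def init_layer():
    shapes = [(d, d + k), (d, 2 * d), (k, k + 2 * d)]
    return tuple(rng.normal(scale=0.1, size=s) for s in shapes)

# Two devices that share one measurement window.
h0 = {"dev0": np.ones(d), "dev1": np.ones(d), "win0": np.ones(d)}
e0 = {("dev0", "win0"): rng.normal(size=k),
      ("dev1", "win0"): rng.normal(size=k)}
layer1, layer2 = init_layer(), init_layer()
h1, e1 = message_pass(h0, e0, *layer1)   # 1-hop information
h2, e2 = message_pass(h1, e1, *layer2)   # reaches 2-hop neighbors
```

After the first pass a device node has only seen its own windows. The second pass lets it draw on other devices that share those windows, which is the 2-hop neighbor relation mentioned in claim 1.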
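
The missing feature completion layer of claim 5 can be pictured as follows. The sketch assumes that the estimate is a nonlinear activation applied to a learnable matrix times the spliced node vectors; the sigmoid activation, the name `W_out` and the vector sizes are assumptions made for illustration only.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def complete_edges(h, known_edges, devices, windows, W_out):
    """Fill every missing (device, window) edge with a K-dim estimate."""
    completed = dict(known_edges)  # known feature edges are kept unchanged
    for m in devices:
        for w in windows:
            if (m, w) not in completed:
                # Estimate = activation(W_out @ splice(h_m, h_w)).
                spliced = np.concatenate([h[m], h[w]])
                completed[(m, w)] = sigmoid(W_out @ spliced)
    return completed

rng = np.random.default_rng(1)
d, K = 4, 3  # node vector size and feature count (assumed)
devices, windows = ["dev0", "dev1"], ["win0", "win1"]
h = {name: rng.normal(size=d) for name in devices + windows}
known = {("dev0", "win0"): np.array([0.1, 0.2, 0.3])}
W_out = rng.normal(scale=0.1, size=(K, 2 * d))
completed = complete_edges(h, known, devices, windows, W_out)
```

Because the estimate depends only on node vectors, it can be computed for any device and window pair that has a node representation, including pairs that were never sampled.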
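
The convolution unit of claims 1 and 6, followed by the flatten and splice step, might be sketched as below. The patent does not state the kernel shapes, the number of modules or the activation function in this text, so the 1-D kernels along the window axis, the two modules and the ReLU are all assumptions.

```python
import numpy as np

def conv_module(x, kernel):
    """Convolve each feature column along the window axis, then apply ReLU."""
    cols = [np.convolve(x[:, j], kernel, mode="same")
            for j in range(x.shape[1])]
    return np.maximum(np.stack(cols, axis=1), 0.0)

def conv_unit(x, kernels):
    """Several convolution modules connected in series."""
    for kernel in kernels:
        x = conv_module(x, kernel)
    return x

# Completed features of one device: 4 windows x 3 features (toy values).
features = np.arange(12, dtype=float).reshape(4, 3)
kernels = [np.array([0.25, 0.5, 0.25]), np.array([0.5, 0.5])]
conv_out = conv_unit(features, kernels)

# Flatten the convolution output and splice it with the device node vector.
node_vec = np.ones(4)  # stand-in for the second-layer node vector
spliced = np.concatenate([conv_out.ravel(), node_vec])
```

With `mode="same"` each module keeps the number of windows unchanged, so the modules can be chained without tracking shapes.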
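
The output stage of claim 7 is a linear map followed by softmax and an argmax. A minimal sketch, with hypothetical device type names and hand-picked weights, is:

```python
import numpy as np

def softmax(z):
    z = z - np.max(z)          # shift for numerical stability
    exp_z = np.exp(z)
    return exp_z / exp_z.sum()

def classify(spliced, weight, bias, type_names):
    """Map the spliced vector to one score per device type, then pick the best."""
    scores = weight @ spliced + bias    # length == number of device types
    probs = softmax(scores)             # scores -> probability distribution
    return type_names[int(np.argmax(probs))], probs

# Hypothetical device types and weights, for illustration only.
type_names = ["camera", "smart speaker", "smoke alarm"]
weight = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]])
bias = np.zeros(3)
label, probs = classify(np.array([0.2, 2.0]), weight, bias, type_names)
```

In this toy case the scores are 0.2, 2.0 and -2.2, so the second type receives the highest probability and `label` is "smart speaker".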

Description

Equipment type identification method based on flow sampling, terminal equipment and storage medium

Technical Field

The invention relates to the technical field of deep learning and device identification, and in particular to a device type identification method based on traffic sampling, a terminal device and a storage medium.

Background

With the development of embedded systems, wireless communication, cloud computing and artificial intelligence, the Internet of Things is widely used in industry, homes and offices. To ensure the security and service management of the Internet of Things, it is very important to identify the devices in the whole Internet of Things environment in a timely manner. For example, a network service provider may allocate higher bandwidth to devices identified as smart speakers to improve user experience and satisfaction, and may provide more reliable communication services to devices identified as smoke alarms or smart door locks to protect users' lives.

Device identification based on traffic features is an effective device identification method. It extracts features such as time intervals, communication traffic and protocols from the complete traffic of an Internet of Things device, and classifies the device's traffic behavior features with a machine learning multi-classifier such as a random forest, a support vector machine, a convolutional neural network or a long short-term memory network, thereby identifying the device. The method can be roughly divided into three steps: (1) collecting the complete traffic of the device; (2) extracting the device's traffic behavior features from the collected complete traffic; and (3) inputting the traffic features into a device identification model, namely a machine learning multi-classifier, to identify and classify the device type.
Although device identification methods based on traffic features achieve high identification accuracy, they still have technical problems that cannot be ignored: they rely on features extracted from the complete traffic flow, such as source/destination address, source/destination port, network protocol, mean/standard deviation of upstream and downstream packet length, duration and payload. Acquiring the complete data flow requires continuously capturing data packets at the gateway, which not only increases the computation burden and storage pressure of the gateway but may also affect network communication speed and stability, so these methods have poor robustness and practicality.

A time-series-based identification method for environmental protection equipment is also an effective device identification method. Although it does not identify devices from their traffic, it still requires the complete real-time waveform data of the environmental protection equipment. It uses a neural network to extract node features from gray images, uses the gray features to construct a two-stage fuzzy classification network and screens the fuzzy rules to achieve high identification accuracy. Even so, when the number of devices is large, it still incurs a large data acquisition overhead; that is, the overhead at the data acquisition source remains. In addition, two-dimensional classification networks may require more computing resources and time to process higher-dimensional, more complex data and may risk overfitting or underfitting, and modeling the data as gray images may lose some detail of the equipment waveform data and may be affected by noise or interference, resulting in poor identification accuracy.
An anomaly detection method for power Internet of Things devices based on a graph neural network has also been proposed. This method uses a graph neural network to detect anomalies in Internet of Things devices, but the traffic data and service data of the different devices in the Internet of Things must still be collected completely at the data acquisition source. When the number of devices connected to the network is large, this still incurs a large data acquisition overhead and places a heavy burden on the data acquisition source, such as a gateway, which further affects the basic functions of the network forwarding device.

Disclosure of Invention

The technical problem to be solved by the invention is, in view of the defects of the prior art, to provide a device type identification method based on traffic sampling, a terminal device and a storage medium, which solve the problems that the traditional device identification method based on device traffic features relies on collecting and storing compl