CN-122020222-A - Method, system, equipment and medium for discovering multiple network communities in noise environment


Abstract

The application provides a method, a system, equipment and a medium for multiple network community discovery for noisy environments. A first random binary mask matrix and a second random binary mask matrix are constructed, and a portion of the edges and a portion of the attribute features are randomly masked to produce an adjacency mask matrix and an attribute mask matrix. The model is thereby forced to learn stable node representations when the data are missing, incomplete, or even noisy. This keeps the GCN from depending excessively on the specific structure or features of a single view, as happens in the prior art, and mitigates the poor structural reliability caused by the noisy edges and redundant edges contained in real multiple networks. The node representations generated by different views are then constrained by a loss function in a unified representation space, so that the structural features and the attribute features remain consistent and comparable, the complementary information among the views is fully integrated, feature conflicts and modality bias are avoided, and more stable and more discriminative node representations are provided for community division.

Inventors

CHEN YAN
HAN CHUNLEI
Zhang Xiushe
LI SHUO
LUO XINYUE
Teng Xiangyi

Assignees

The 20th Research Institute of China Electronics Technology Group Corporation (中国电子科技集团公司第二十研究所)

Dates

Publication Date: 2026-05-12
Application Date: 2025-12-24

Claims (10)

1. A method of multiple network community discovery for noisy environments, comprising the steps of:
Step S1, generating node features of a plurality of node relation views in a multiple network by feature propagation, and processing the node features to obtain a refined adjacency matrix;
Step S2, constructing a first random binary mask matrix for the refined adjacency matrix, and obtaining an adjacency mask matrix of the refined adjacency matrix based on the first random binary mask matrix;
Step S3, constructing a second random binary mask matrix for the initial attribute matrix of the node features, and obtaining an attribute mask matrix of the initial attribute matrix based on the second random binary mask matrix;
Step S4, inputting the adjacency mask matrix and the attribute mask matrix into a multi-layer graph convolutional network (GCN) to obtain a node representation of each node relation view, and splicing the node representations of the node relation views to obtain a first fusion feature;
Step S5, splicing the initial attribute matrix of the node features into a second fusion matrix, and inputting the second fusion matrix into a multi-layer perceptron (MLP) to obtain a hidden layer representation;
Step S6, performing similarity measurement on the same unit hypersphere to obtain a similarity matrix of the node representation and the hidden layer representation, constructing an initial loss function for each node based on the similarity matrix, aggregating the initial loss functions to obtain an aggregate loss function, and aligning the node representation and the hidden layer representation in a unified feature space based on the aggregate loss function to obtain aligned node features;
Step S7, obtaining a community division result by applying a K-means algorithm to the aligned node features.
2. The method of multiple network community discovery for noisy environments according to claim 1, wherein obtaining the refined adjacency matrix in step S1 comprises:
generating node features $X^l$ for each node relation view by feature propagation: [formula not preserved in the source text]; where the formula uses an identity matrix $I$, the initial adjacency matrix of the $l$-th view, and the initial node attribute matrix;
obtaining a similarity matrix $H^l$ between nodes based on the node features $X^l$: $H^l_{ij} = \cos(X^l_i, X^l_j)$; where $i$ and $j$ both denote nodes and $\cos$ denotes cosine similarity;
after the similarity matrix $H^l$ is symmetrized, retaining the top $k$ neighbors of each node with a kNN algorithm and, after normalization, generating the refined adjacency matrix: [formula not preserved in the source text]; where the formula includes a normalization operation.
3. The method of multiple network community discovery for noisy environments according to claim 1, wherein in step S2 the adjacency mask matrix $\tilde{A}$ is: $\tilde{A} = M_1 \odot \hat{A}$; where $\odot$ denotes element-wise multiplication, $\hat{A} \in \mathbb{R}^{N \times N}$ is the refined adjacency matrix, $N$ is the number of nodes, and $M_1 \in \{0,1\}^{N \times N}$ is the first random binary mask matrix, each element of which is set to 0 with probability $p_1$ and to 1 with probability $1 - p_1$.
4. The method of multiple network community discovery for noisy environments according to claim 1, wherein in step S3 the attribute mask matrix $\tilde{X}$ is: $\tilde{X} = M_2 \odot X$; where $\odot$ denotes element-wise multiplication, $X \in \mathbb{R}^{N \times d}$ is the initial attribute matrix of the node features, $d$ is the feature dimension, and $M_2 \in \{0,1\}^{N \times d}$ is the second random binary mask matrix, each element of which is set to 0 with probability $p_2$ and to 1 with probability $1 - p_2$.
5. The method of multiple network community discovery for noisy environments according to claim 1, wherein the node representation $Z^l$ of each node relation view in step S4 is: [formula not preserved in the source text]; where the formula uses an identity matrix, the output of each GCN layer, a weight matrix, and the ReLU activation function; and the node representations $Z^l$ of the node relation views are longitudinally spliced to form the first fusion feature.
6. The method of multiple network community discovery for noisy environments according to claim 1, wherein in step S5 the initial attribute matrix of the node features is vertically spliced or longitudinally spliced to obtain the second fusion matrix, and the second fusion matrix is input into the multi-layer perceptron (MLP) to obtain the hidden layer representation: [formula not preserved in the source text].
7. The method of multiple network community discovery for noisy environments according to claim 1, wherein the similarity matrix in step S6 is: [formula not preserved in the source text]; where the node representation and the hidden layer representation are each L2-normalized before the similarity is computed;
an InfoNCE loss is constructed for each node based on the similarity matrix as the initial loss function: [formula not preserved in the source text]; where the terms are elements of the similarity matrix, $\tau$ is a temperature parameter, and $e$ is the natural constant, the base of the natural exponential function;
the aggregate loss function is: [formula not preserved in the source text]; where $N$ is the total number of network nodes;
in the unified feature space, the node representation and the hidden layer representation are aligned based on the aggregate loss function, and the aligned node features are obtained under a contrastive learning framework: [formula not preserved in the source text]; where argmin denotes the value obtained by optimization under the contrastive learning framework, such that the node representation and the hidden layer representation maintain maximum consistency in the embedding space.
8. A system of multiple network community discovery for noisy environments, characterized in that it implements the method of multiple network community discovery for noisy environments according to any one of claims 1-7 and comprises:
an initialization unit configured to generate node features of a plurality of node relation views in a multiple network by feature propagation, and to process the node features to obtain a refined adjacency matrix;
a first masking unit configured to construct a first random binary mask matrix for the refined adjacency matrix, and to obtain an adjacency mask matrix of the refined adjacency matrix based on the first random binary mask matrix;
a second masking unit configured to construct a second random binary mask matrix for the initial attribute matrix of the node features, and to obtain an attribute mask matrix of the initial attribute matrix based on the second random binary mask matrix;
a first fusion unit configured to input the adjacency mask matrix and the attribute mask matrix into a multi-layer graph convolutional network (GCN) to obtain a node representation of each node relation view, and to splice the node representations of the node relation views to obtain a first fusion feature;
a second fusion unit configured to splice the initial attribute matrix of the node features into a second fusion matrix, and to input the second fusion matrix into a multi-layer perceptron (MLP) to obtain a hidden layer representation;
an alignment unit configured to perform similarity measurement on the same unit hypersphere to obtain a similarity matrix of the node representation and the hidden layer representation, to construct an initial loss function for each node based on the similarity matrix, to aggregate the initial loss functions to obtain an aggregate loss function, and to align the node representation and the hidden layer representation in a unified feature space based on the aggregate loss function to obtain aligned node features; and
an output unit configured to obtain a community division result by applying a K-means algorithm to the aligned node features.
9. An electronic device, characterized in that the electronic device comprises: at least one processor; and a memory communicatively coupled to the at least one processor, wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of multiple network community discovery for noisy environments according to any one of claims 1 to 7.
10. A non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform the method of multiple network community discovery for noisy environments according to any one of claims 1 to 7.
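
The graph refinement described in claim 2 (cosine similarity, symmetrization, kNN retention, normalization) can be illustrated with a minimal Python sketch. It is not the applicant's implementation. The feature propagation formula is not preserved in this text, so the sketch starts from already-propagated node features. The function names, the averaging form of symmetrization, and the use of row normalization are assumptions made only for illustration.

```python
import math


def cosine_similarity_matrix(features):
    """Pairwise cosine similarity between the rows (nodes) of a feature matrix."""
    norms = [math.sqrt(sum(v * v for v in row)) or 1.0 for row in features]
    n = len(features)
    return [[sum(a * b for a, b in zip(features[i], features[j])) / (norms[i] * norms[j])
             for j in range(n)] for i in range(n)]


def symmetrize(matrix):
    """Average a matrix with its transpose (assumed form of the symmetry step)."""
    n = len(matrix)
    return [[(matrix[i][j] + matrix[j][i]) / 2 for j in range(n)] for i in range(n)]


def knn_sparsify(similarity, k):
    """Keep, for each node, only its k most similar other nodes."""
    n = len(similarity)
    kept = [[0.0] * n for _ in range(n)]
    for i in range(n):
        neighbors = sorted((j for j in range(n) if j != i),
                           key=lambda j: similarity[i][j], reverse=True)[:k]
        for j in neighbors:
            kept[i][j] = similarity[i][j]
    return kept


def row_normalize(matrix):
    """Scale each row to sum to 1 (assumed normalization; expects non-negative rows)."""
    normalized = []
    for row in matrix:
        total = sum(row)
        normalized.append([v / total if total else 0.0 for v in row])
    return normalized


# Toy node features for one view: nodes 0 and 1 are alike, nodes 2 and 3 are alike.
features = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
similarity = symmetrize(cosine_similarity_matrix(features))
refined = row_normalize(knn_sparsify(similarity, k=1))
```

With `k=1`, each node keeps a single edge to its most similar neighbor, so weak or spurious edges in the original view do not appear in `refined`.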
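
The random masking of claims 3 and 4 amounts to an element-wise product between a matrix and a 0/1 mask whose entries are drawn independently. The sketch below shows this under stated assumptions: the helper names, the drop probabilities 0.3 and 0.2, and the toy matrices are hypothetical, and the mask is assumed to have the same shape as the matrix it masks.

```python
import random


def random_binary_mask(rows, cols, drop_prob, rng):
    """Build a rows x cols 0/1 mask: each entry is 0 with probability drop_prob, else 1."""
    return [[0 if rng.random() < drop_prob else 1 for _ in range(cols)]
            for _ in range(rows)]


def apply_mask(matrix, mask):
    """Element-wise product of a matrix and a mask of the same shape."""
    return [[value * keep for value, keep in zip(matrix_row, mask_row)]
            for matrix_row, mask_row in zip(matrix, mask)]


rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Toy refined adjacency matrix (3 nodes) and attribute matrix (3 nodes x 2 features).
refined_adjacency = [[0.0, 0.5, 0.5],
                     [0.5, 0.0, 0.5],
                     [0.5, 0.5, 0.0]]
attributes = [[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]]

edge_mask = random_binary_mask(3, 3, 0.3, rng)       # first random binary mask
attribute_mask = random_binary_mask(3, 2, 0.2, rng)  # second random binary mask

adjacency_masked = apply_mask(refined_adjacency, edge_mask)
attributes_masked = apply_mask(attributes, attribute_mask)
```

Drawing a fresh mask at every training iteration would expose the encoder to a different subset of edges and attributes each time, which is the effect the abstract describes; whether the masks are redrawn per iteration is not stated in the claims.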
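
The alignment loss of claim 7 can be sketched in the commonly used InfoNCE form, in which each node's own pair (the diagonal entry of the similarity matrix) is the positive and all other entries in its row are negatives. The exact formulas of the application are not preserved in this text, so this form, the averaging over nodes, the function names, and the temperature value 0.5 are assumptions. The sketch also assumes that the node representation and the hidden layer representation have the same dimension.

```python
import math


def l2_normalize_rows(matrix):
    """Project each row onto the unit hypersphere."""
    normalized = []
    for row in matrix:
        norm = math.sqrt(sum(v * v for v in row)) or 1.0
        normalized.append([v / norm for v in row])
    return normalized


def similarity_matrix(z, h):
    """Dot products between L2-normalized rows of z and h."""
    z_hat, h_hat = l2_normalize_rows(z), l2_normalize_rows(h)
    return [[sum(a * b for a, b in zip(z_row, h_row)) for h_row in h_hat]
            for z_row in z_hat]


def info_nce_per_node(sim, tau):
    """Per-node loss: -log(exp(S_ii / tau) / sum_j exp(S_ij / tau))."""
    losses = []
    for i, row in enumerate(sim):
        logits = [s / tau for s in row]
        peak = max(logits)  # subtract the maximum for numerical stability
        log_denominator = peak + math.log(sum(math.exp(x - peak) for x in logits))
        losses.append(log_denominator - logits[i])
    return losses


def aggregate_loss(per_node_losses):
    """Mean of the per-node losses over all nodes."""
    return sum(per_node_losses) / len(per_node_losses)


# Toy representations for two nodes: h_aligned matches z, h_swapped pairs each node wrongly.
z = [[2.0, 0.0], [0.0, 3.0]]
h_aligned = [[1.0, 0.0], [0.0, 1.0]]
h_swapped = [[0.0, 1.0], [1.0, 0.0]]

loss_aligned = aggregate_loss(info_nce_per_node(similarity_matrix(z, h_aligned), tau=0.5))
loss_swapped = aggregate_loss(info_nce_per_node(similarity_matrix(z, h_swapped), tau=0.5))
```

Here `loss_aligned` is smaller than `loss_swapped`, which is the behavior an alignment objective needs: minimizing the aggregate loss pulls each node's two representations together on the unit hypersphere and pushes them away from the representations of other nodes.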

Description

Method, system, equipment and medium for discovering multiple network communities in noise environment

Technical Field

The invention belongs to the technical field of data processing, and particularly relates to a method, a system, equipment and a medium for multiple network community discovery for noisy environments.

Background

Multiple networks have been widely used to represent the variety of relationships between objects in the real world. They can be intuitively represented as a multi-layer structure in which the nodes and their attributes are shared among all layers, while each layer exhibits a different topology that reflects a different type of relationship between nodes. For example, in a social network, each node represents a person with personal attributes, and the edges of each layer represent a different kind of relationship. Clearly, the study of multiple networks is a valuable and challenging task. Community discovery is an important technique in multiple network analysis; it aims to find sets of nodes with dense internal connections and sparse external connections. For example, when dealing with label noise in image classification, community discovery can improve the performance of a deep model on noisy data, and in recommendation systems it can improve the accuracy and robustness of user recommendations. Multiple graphs, as a special type of heterogeneous graph, contain richer information, which presents challenges for graph representation learning. Unsupervised multiple graph learning is a powerful tool for multiple graph representation and is attracting attention because it can use information from different views together with graph neural networks and self-supervision techniques.
However, although each view of a multiple network contains complementary information, that information is not uniformly distributed, and achieving consistency and effective use of information during fusion remains a key challenge. Existing methods either splice the multi-view features directly, which easily introduces redundancy and even noise, or model each view independently, which ignores the inherent correlations among the views. In addition, the prior art often ignores the reliability of the graph structure when processing multiple networks; in real graph data many edges are irrelevant or noisy, which seriously degrades the effectiveness of these methods. Noise in multi-layer network data can be divided into label noise and feature noise: label noise refers to errors in the label information of entities or edges (such as entity attribute labeling errors in a knowledge graph), and feature noise refers to deviations in the feature vectors of nodes or edges (such as anomalies in user behavior data in a social network). These problems can significantly reduce the accuracy of multiple network community discovery.

Disclosure of Invention

In order to solve the problems of poor accuracy and instability of multiple network community discovery caused by noisy edges and redundant edges in noisy environments in the prior art, the invention provides a method, a system, equipment and a medium for multiple network community discovery for noisy environments.
In order to achieve the above purpose, the present invention provides the following technical solutions:

In a first aspect, embodiments of the present disclosure provide a method of multiple network community discovery for noisy environments, comprising the steps of:
Step S1, generating node features of a plurality of node relation views in a multiple network by feature propagation, and processing the node features to obtain a refined adjacency matrix;
Step S2, constructing a first random binary mask matrix for the refined adjacency matrix, and obtaining an adjacency mask matrix of the refined adjacency matrix based on the first random binary mask matrix;
Step S3, constructing a second random binary mask matrix for the initial attribute matrix of the node features, and obtaining an attribute mask matrix of the initial attribute matrix based on the second random binary mask matrix;
Step S4, inputting the adjacency mask matrix and the attribute mask matrix into a multi-layer graph convolutional network (GCN) to obtain a node representation of each node relation view, and splicing the node representations of the node relation views to obtain a first fusion feature;
Step S5, splicing the initial attribute matrix of the node features into a second fusion matrix, and inputting the second fusion matrix into a multi-layer perceptron (MLP) to obtain a hidden layer representation;
Step S6, performing similarity measurement on the same unit hypersphere to obtain a similarity matrix of the node representation and the hidden layer representation, constructing an initial loss function of each node ba