CN-122020229-A - Anchor-guided graph clustering network

Abstract

The embodiments of the application disclose an anchor-guided graph clustering network (AGCN) that operates in three steps: (1) generating pseudo-labels with global distribution characteristics by a feature analysis method; (2) learning, through a model, the mapping between sample features and the pseudo-labels; and (3) gradually aggregating samples with the same label in the embedding space, where the model dynamically corrects incorrect pseudo-labels during learning so that the affected samples are re-aggregated into semantically similar clusters. The application designs an anchor-guided learning strategy: anchors are selected automatically from the high-confidence samples farthest from the distribution boundary, and pseudo-labels are generated in the diffused feature space with the k-means++ algorithm to guide the learning of discriminative representations. AGCN further combines an anchor expansion strategy with a compatibility enhancement mechanism, which significantly improves the discriminability of the clustering and the accuracy of recommendation tasks.
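The pseudo-label and anchor-selection idea summarized above can be illustrated with a minimal sketch. It is not the applicant's implementation: the helper names (`kmeans_pp`, `one_hot`, `select_anchors`), the toy data, the use of the margin between the nearest and second-nearest centroid as the confidence score, and the fixed `ratio` of retained samples are all assumptions made for illustration.

```python
import math
import random

def assign(points, centers):
    """Index of the nearest center (Euclidean distance) for every point."""
    return [min(range(len(centers)), key=lambda j: math.dist(p, centers[j]))
            for p in points]

def kmeans_pp(points, k, iters=20, seed=0):
    """k-means with k-means++ seeding; returns (labels, centers)."""
    rng = random.Random(seed)
    centers = [list(rng.choice(points))]
    while len(centers) < k:
        # k-means++ seeding: draw the next center with probability
        # proportional to the squared distance to the nearest chosen center.
        d2 = [min(math.dist(p, c) ** 2 for c in centers) for p in points]
        centers.append(list(rng.choices(points, weights=d2, k=1)[0]))
    for _ in range(iters):
        labels = assign(points, centers)
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster becomes empty
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return assign(points, centers), centers

def one_hot(labels, k):
    """One-hot pseudo-label matrix (Y_k in the text)."""
    return [[1 if lab == j else 0 for j in range(k)] for lab in labels]

def select_anchors(points, centers, ratio=0.75):
    """ASSUMPTION: confidence = distance to second-nearest center minus
    distance to nearest center; the top `ratio` fraction become anchors."""
    margins = []
    for p in points:
        d = sorted(math.dist(p, c) for c in centers)
        margins.append(d[1] - d[0])
    order = sorted(range(len(points)), key=lambda i: margins[i], reverse=True)
    return sorted(order[: int(ratio * len(points))])

# Toy stand-in for the diffused feature matrix X_p (hypothetical data).
X_p = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1], [1.8, 1.9],
       [5.0, 5.0], [5.1, 5.2], [5.2, 5.1], [3.2, 3.1]]
labels, centers = kmeans_pp(X_p, k=2)
Y_k = one_hot(labels, k=2)
anchors = select_anchors(X_p, centers)
```

In this toy run the two samples lying between the groups (indices 3 and 7) have the smallest margins and are left out of the anchor set, which mirrors the idea of keeping only samples far from the boundary.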

Inventors

  • TANG XIANGYAN
  • ZENG FAQIANG
  • CHENG JIEREN

Assignees

  • Hainan University (海南大学)

Dates

Publication Date
20260512
Application Date
20251212

Claims (7)

  1. An anchor-guided graph clustering network, comprising: (1) generating pseudo-labels with global distribution characteristics by a feature analysis method; (2) learning, through a model, the mapping between sample features and the pseudo-labels; and (3) gradually aggregating samples with the same label in the embedding space, wherein the model dynamically corrects incorrect pseudo-labels during learning so that the affected samples are re-aggregated into semantically similar clusters.
  2. The anchor-guided graph clustering network of claim 1, wherein generating the pseudo-labels with global distribution characteristics by the feature analysis method comprises: clustering the diffusion attribute matrix X_p of the target undirected graph with k-means++ to obtain a one-hot pseudo-label matrix Y_k; extracting locally important features from the diffusion attribute matrix X_p to form an enhanced attribute matrix X_e, inputting X_e and the symmetrically normalized adjacency matrix into the model for training, and guiding the learning of the model with the one-hot matrix Y_k; and, after the model has been trained quickly and coarsely, selecting easily separable samples that are farthest from the separating surface and exhibit large differences as the anchor samples V_a of the next stage.
  3. The anchor-guided graph clustering network of claim 1, wherein learning, through the model, the mapping between the sample features and the pseudo-labels comprises: training the model with the mapping between the anchor samples V_a and the one-hot encoding Y_a of their pseudo-labels as a constraint; generating a target distribution P; minimizing the KL-divergence loss between the Y distribution and the P distribution; and gradually expanding the anchor sample set by means of a threshold update mechanism.
  4. The anchor-guided graph clustering network of claim 1, wherein gradually aggregating samples with the same label in the embedding space, with the model dynamically correcting incorrect pseudo-labels during learning so that the affected samples are re-aggregated into semantically similar clusters, comprises: stripping the samples close to the separating surface; generating C cluster centers from the diffusion attribute matrix X_p and the anchor samples of the corresponding clusters; calculating the spatial distance Dis between the samples V_r and each cluster center C_p, where C_p is obtained by averaging the anchor samples in the same cluster; finding, for each sample, the minimum distance to the cluster centers and assigning the sample to the cluster with that minimum distance; and extracting important clustering information by the model.
  5. The anchor-guided graph clustering network of claim 1, further comprising a loss function analysis and a complexity analysis.
  6. The anchor-guided graph clustering network of claim 5, wherein the loss function analysis comprises: integrating steps (1), (2) and (3) into the same framework, the total loss function is [formula missing], where [formula missing] = [formula missing] ∪ [formula missing] is the set of pseudo-labels of all difficult samples and anchor samples, [formula missing] = [formula missing] + [formula missing], and [formula missing] denotes the moment at which the anchor samples are selected.
  7. The anchor-guided graph clustering network of claim 5, wherein the complexity analysis comprises: the time complexity of the graph convolutional network layer is [formula missing]. Before [formula missing], in addition to the model itself there is a cross-entropy function whose computational complexity is [formula missing]. After [formula missing], the KL divergence is added as a loss function, with time complexity [formula missing]. The time complexity of anchor sample selection and of anchor sample updating is [formula missing] in both cases, and the time complexity of processing difficult samples is [formula missing]. Because [formula missing], the time complexity of the cross-entropy function at this stage is still [formula missing]. Because the period between [formula missing] is small compared with the whole training period, it can be ignored, so the time complexity of the whole model is finally [formula missing].
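The reassignment of boundary samples described in claim 4 (cluster centers taken as the mean of each cluster's anchor samples, then nearest-center assignment) can be sketched as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: the function names, the toy feature matrix, the choice of Euclidean distance for Dis, and the requirement that every cluster has at least one anchor are all hypothetical.

```python
import math

def cluster_centers(features, anchor_ids, anchor_labels, n_clusters):
    """Center C_p of each cluster = mean feature vector of its anchor samples."""
    dim = len(features[0])
    sums = [[0.0] * dim for _ in range(n_clusters)]
    counts = [0] * n_clusters
    for i, c in zip(anchor_ids, anchor_labels):
        counts[c] += 1
        for d in range(dim):
            sums[c][d] += features[i][d]
    # ASSUMPTION: every cluster contains at least one anchor sample.
    return [[s / counts[c] for s in sums[c]] for c in range(n_clusters)]

def reassign(features, sample_ids, centers):
    """Assign each listed sample to the cluster whose center is nearest.
    ASSUMPTION: the distance Dis is Euclidean."""
    new_labels = {}
    for i in sample_ids:
        dists = [math.dist(features[i], ctr) for ctr in centers]
        new_labels[i] = dists.index(min(dists))
    return new_labels

# Toy stand-in for X_p: samples 0-3 are anchors, samples 4-5 were stripped
# because they lie close to the separating surface (hypothetical data).
X_p = [[0.0, 0.0], [0.2, 0.0], [4.0, 4.0], [4.2, 4.0], [1.0, 1.0], [3.0, 3.5]]
centers = cluster_centers(X_p, anchor_ids=[0, 1, 2, 3],
                          anchor_labels=[0, 0, 1, 1], n_clusters=2)
new_labels = reassign(X_p, sample_ids=[4, 5], centers=centers)
print(new_labels)  # {4: 0, 5: 1}
```

Computing centers only from anchors keeps uncertain samples from pulling the centers toward the boundary, which is presumably why the claim averages anchor samples rather than all samples.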

Description

Anchor-guided graph clustering network

Technical Field

The application relates to the technical field of multi-view data processing, and in particular to an anchor-guided graph clustering network.

Background

In personalized service recommendation, clustering is a key technique: users with similar behavior and characteristics are divided into groups so that recommendations can be made more accurately. Deep graph clustering aims to mine latent structural patterns in the user behavior graph, so that users are grouped effectively by behavioral similarity and the efficiency of personalized service recommendation improves.

Existing graph clustering methods fall broadly into two categories: contrastive and generative. Contrastive graph clustering relies on cross-view contrastive learning between samples to obtain discriminative representations and can capture part of the clustering information. Generative graph clustering generally adopts an encoder-decoder structure and learns high-quality user representations in a self-supervised framework, which reduces the risk of information loss and yields more stable recommendation results.

However, conventional contrastive graph clustering methods may lose key information during feature dimensionality reduction, which causes representation collapse and degrades the recommendation effect. Generative graph clustering methods depend heavily on pre-training: the training process is sensitive to initialization and not very robust, and costly pre-training must be repeated whenever parameters are tuned or the model is migrated.

Disclosure of Invention

In order to solve the existing technical problems, the embodiments of the application provide an anchor-guided graph clustering network.
The technical scheme is as follows:

In a first aspect, an anchor-guided graph clustering network is provided, comprising: (1) generating pseudo-labels with global distribution characteristics by a feature analysis method; (2) learning, through a model, the mapping between sample features and the pseudo-labels; and (3) gradually aggregating samples with the same label in the embedding space, wherein the model dynamically corrects incorrect pseudo-labels during learning so that the affected samples are re-aggregated into semantically similar clusters.

Further, generating the pseudo-labels with global distribution characteristics by the feature analysis method includes: clustering the diffusion attribute matrix X_p of the target undirected graph with k-means++ to obtain a one-hot pseudo-label matrix Y_k; extracting locally important features from the diffusion attribute matrix X_p to form an enhanced attribute matrix X_e, inputting X_e and the symmetrically normalized adjacency matrix into the model for training, and guiding the learning of the model with the one-hot matrix Y_k; and, after the model has been trained quickly and coarsely, selecting easily separable samples that are farthest from the separating surface and exhibit large differences as the anchor samples V_a of the next stage.

Further, learning, through the model, the mapping between the sample features and the pseudo-labels includes: training the model with the mapping between the anchor samples V_a and the one-hot encoding Y_a of their pseudo-labels as a constraint; generating a target distribution P; minimizing the KL-divergence loss between the Y distribution and the P distribution; and gradually expanding the anchor sample set by means of a threshold update mechanism.
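The KL-divergence step above can be illustrated with a small sketch. The text does not state how the target distribution P is built, so the sharpening rule used here (squaring the soft assignments and normalizing by cluster frequency, as is common in deep embedded clustering) is an assumption, as are the direction KL(P || Y), the function names, and the toy values of Y.

```python
import math

def target_distribution(Y):
    """Sharpen soft assignments Y (each row sums to 1) into a target P.
    ASSUMPTION: p_ij is proportional to y_ij^2 / sum_i y_ij; the source
    text does not give the formula for P."""
    n_clusters = len(Y[0])
    freq = [sum(row[j] for row in Y) for j in range(n_clusters)]
    P = []
    for row in Y:
        w = [row[j] ** 2 / freq[j] for j in range(n_clusters)]
        total = sum(w)
        P.append([v / total for v in w])
    return P

def kl_divergence(P, Y, eps=1e-12):
    """KL(P || Y) summed over all samples; eps guards against log(0).
    ASSUMPTION: the loss is taken in this direction."""
    return sum(p * math.log((p + eps) / (y + eps))
               for p_row, y_row in zip(P, Y)
               for p, y in zip(p_row, y_row))

# Hypothetical soft cluster assignments produced by the model for 3 samples.
Y = [[0.7, 0.3], [0.6, 0.4], [0.2, 0.8]]
P = target_distribution(Y)
loss = kl_divergence(P, Y)
```

Under this rule P places more mass than Y on each sample's dominant cluster, so minimizing the loss pushes the model toward more confident assignments.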
Further, gradually aggregating samples with the same label in the embedding space, with the model dynamically correcting incorrect pseudo-labels during learning so that the affected samples are re-aggregated into semantically similar clusters, includes: stripping the samples close to the separating surface; generating C cluster centers from the diffusion attribute matrix X_p and the anchor samples of the corresponding clusters; calculating the spatial distance Dis between the samples V_r and each cluster center C_p, where C_p is obtained by averaging the anchor samples in the same cluster; finding, for each sample, the minimum distance to the cluster centers and assigning the sample to the cluster with that minimum distance; and extracting important clustering information by the model.

Further, a loss function analysis and a complexity analysis are included.

Further, the loss function analysis includes: integrating steps (1), (2) and (3) into the same framework, the total loss function is [formula missing], where [formula missing] = [formula missing] ∪ [formula missing] is the set of pseudo-labels of all difficult samples and anchor samples, [formula missing] = [formula missing] + [formula missing], and the moment at which the anchor samples are selected is s