CN-122020227-A - Asymmetric multi-modal clustering method based on hyperbolic uncertainty modeling

Abstract

The application belongs to the technical field of deep learning and data mining and discloses an asymmetric multi-modal clustering method based on hyperbolic uncertainty modeling. The method comprises: acquiring a multi-modal data set and preprocessing it to obtain a processed multi-modal data set; inputting the multi-modal data into an encoder network to obtain hidden-layer features and projecting them to obtain hyperbolic embedded features; calculating the distance between the hyperbolic embedded features and the clustering prototypes; calculating the prediction entropy; constructing an asymmetric modality alignment mechanism in which the low-entropy modality unidirectionally guides the high-entropy modality in representation alignment; fusing the multi-modal soft assignment probabilities to obtain a global consensus probability and pseudo-labels; updating the clustering prototypes; iteratively training the modality-specific autoencoder network to obtain a trained modality-specific autoencoder network; and outputting the clustering result of the multi-modal data. The method markedly improves the accuracy and robustness of multi-modal clustering in environments with noise and missing information.
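
The geometric core of the abstract (projection into hyperbolic space, distance to prototypes, prediction entropy) can be illustrated with a minimal NumPy sketch. It is not the applicant's implementation. The function names, the curvature parameter `c`, and the temperature `tau` are assumptions introduced for illustration, and a plain softmax over negative distances stands in for the Sinkhorn-Knopp assignment named in the claims.

```python
import numpy as np

def expmap0(v, c=1.0, eps=1e-7):
    """Exponential map at the origin of the Poincare ball of curvature -c:
    exp_0(v) = tanh(sqrt(c)*||v||) * v / (sqrt(c)*||v||)."""
    sqrt_c = np.sqrt(c)
    norm = np.maximum(np.linalg.norm(v, axis=-1, keepdims=True), eps)
    return np.tanh(sqrt_c * norm) * v / (sqrt_c * norm)

def mobius_add(x, y, c=1.0):
    """Mobius addition on the Poincare ball (broadcasts over leading axes)."""
    xy = np.sum(x * y, axis=-1, keepdims=True)
    x2 = np.sum(x * x, axis=-1, keepdims=True)
    y2 = np.sum(y * y, axis=-1, keepdims=True)
    num = (1 + 2 * c * xy + c * y2) * x + (1 - c * x2) * y
    den = 1 + 2 * c * xy + c ** 2 * x2 * y2
    return num / np.maximum(den, 1e-15)

def poincare_dist(x, y, c=1.0, eps=1e-7):
    """Geodesic distance: (2/sqrt(c)) * artanh(sqrt(c) * ||(-x) (+) y||)."""
    sqrt_c = np.sqrt(c)
    diff_norm = np.linalg.norm(mobius_add(-x, y, c), axis=-1)
    arg = np.clip(sqrt_c * diff_norm, 0.0, 1.0 - eps)  # keep artanh finite
    return 2.0 / sqrt_c * np.arctanh(arg)

def soft_assign(z, prototypes, c=1.0, tau=1.0):
    """Softmax over negative hyperbolic distances (stand-in for Sinkhorn-Knopp).
    z: (N, d) embeddings, prototypes: (K, d); returns (N, K) probabilities."""
    d = poincare_dist(z[:, None, :], prototypes[None, :, :], c)
    logits = -d / tau
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    p = np.exp(logits)
    return p / p.sum(axis=1, keepdims=True)

def prediction_entropy(p, eps=1e-12):
    """Shannon entropy of each row of p; higher means more uncertain."""
    return -np.sum(p * np.log(p + eps), axis=1)
```

With this parameterization, a tangent vector of norm r lands at hyperbolic distance 2r from the origin, which gives a quick sanity check of the two maps against each other.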

Inventors

  • WANG YIMING
  • XIANG HAO
  • DING YINAN
  • WANG HONGYI
  • XIAO FU
  • CUI KAIYAN
  • LI QUN

Assignees

  • Nanjing University of Posts and Telecommunications (南京邮电大学)

Dates

Publication Date
2026-05-12
Application Date
2026-04-13

Claims (7)

  1. An asymmetric multi-modal clustering method based on hyperbolic uncertainty modeling, characterized by comprising the following steps:
     Step 1, acquiring a multi-modal data set and preprocessing it to obtain a processed multi-modal data set;
     Step 2, inputting the multi-modal data of the multi-modal data set into a pre-established modality-specific autoencoder network to obtain hidden-layer features, taking the hidden-layer features as vectors in the tangent space at the origin of the Poincaré ball, projecting these vectors into hyperbolic space through the exponential map, and outputting hyperbolic embedded features;
     Step 3, obtaining an initial clustering result based on the hyperbolic embedded features output in Step 2, extracting clustering prototypes, calculating the hyperbolic distance between the hyperbolic embedded features and the clustering prototypes, obtaining soft assignment probabilities, calculating the prediction entropy to quantify uncertainty, constructing a dynamic gating mechanism based on the prediction entropy, and generating asymmetric guidance weights;
     Step 4, calculating uncertainty-aware weights based on the prediction entropy of each modality, fusing the multi-modal soft assignment probabilities to obtain a global consensus probability and pseudo-labels, screening reliable samples with a confidence-based masking mechanism, calculating candidate clustering prototypes using the distances of the reliable samples in the tangent space at the origin of the Poincaré ball, and updating the clustering prototypes with a momentum strategy;
     Step 5, taking the obtained global consensus probability and pseudo-labels as the clustering prediction result, iteratively training the modality-specific autoencoder network with a reconstruction loss, an asymmetric alignment loss, and a hyperbolic cross-entropy clustering loss to obtain a trained modality-specific autoencoder network, and outputting the clustering result of the multi-modal data.
  2. The asymmetric multi-modal clustering method based on hyperbolic uncertainty modeling as set forth in claim 1, wherein Step 2 specifically includes the following steps:
     Step 2.1, inputting the multi-modal data of the multi-modal data set into the pre-established modality-specific encoder network and outputting the hidden-layer features;
     Step 2.2, taking the hidden-layer features output in Step 2.1 as vectors in the tangent space at the origin of the Poincaré ball, where the accompanying symbols denote, respectively, the dimension of the hyperbolic space, the number of samples, and the modality index;
     Step 2.3, using the exponential map to project the vectors in the tangent space at the origin of the Poincaré ball into a hyperbolic space of negative curvature, and outputting the hyperbolic embedded features [formula not reproduced in this text]; to ensure numerical stability, a tangent-space decoding strategy is adopted, namely the vectors in the tangent space at the origin of the Poincaré ball are fed to the modality-specific decoder network to output the reconstructed multi-modal data, while the hyperbolic embedded features are used only for clustering.
  3. The asymmetric multi-modal clustering method based on hyperbolic uncertainty modeling as claimed in claim 2, wherein Step 3 specifically includes the following steps:
     Step 3.1, obtaining an initial clustering result from the hyperbolic embedded features with the K-means algorithm and extracting the clustering prototypes; computing the hyperbolic distance between the hyperbolic embedded features and the clustering prototypes [formula not reproduced in this text]; and obtaining the soft assignment probabilities with the Sinkhorn-Knopp algorithm, from which the prediction entropy is calculated to quantify the uncertainty;
     Step 3.2, based on the soft assignment probabilities of Step 3.1, calculating the prediction entropy of each sample in each modality [formula not reproduced in this text], where the accompanying symbols denote the total number of categories in the multi-modal data set and the category index;
     Step 3.3, constructing a dynamic gating mechanism according to the obtained prediction entropy.
  4. The asymmetric multi-modal clustering method based on hyperbolic uncertainty modeling of claim 3, wherein in Step 3.3 the dynamic gating mechanism includes asymmetric guidance weights and an asymmetric alignment loss function, specifically including the following steps:
     Step 3.3.1, first calculating the asymmetric guidance weight: when the prediction entropy of the source modality is lower than that of the target modality, the weight takes one value, and otherwise it takes another [both values not reproduced in this text], where the entropy term denotes the prediction entropy of a given sample in a given modality;
     Step 3.3.2, weighting the alignment loss with the asymmetric guidance weights, the weighted asymmetric alignment loss function being [formula not reproduced in this text], where the accompanying symbols denote, respectively: the total number of samples in the multi-modal data set; the total number of modalities in the multi-modal data set; the hyperbolic embedded feature of a given sample in one modality; the hyperbolic embedded feature of the same sample in another modality; the hyperbolic distance; and the stop-gradient (gradient truncation) operation used in neural network training.
  5. The asymmetric multi-modal clustering method based on hyperbolic uncertainty modeling as claimed in claim 3, wherein Step 4 specifically includes the following steps:
     Step 4.1, calculating the uncertainty-aware weights based on the prediction entropy of each modality [formula not reproduced in this text], and fusing the soft assignment probabilities to output the global consensus probability and the pseudo-labels [formulas not reproduced in this text];
     Step 4.2, based on the global consensus probability, selecting the 50% of all samples in the multi-modal data set with the highest confidence to participate in the clustering loss calculation, the masked hyperbolic cross-entropy clustering loss being [formula not reproduced in this text], where the binary mask vector is 1 for the top 50% of samples ranked by confidence and 0 otherwise, and the remaining symbols denote the clustering prototype closest to the embedded feature and the clustering prototype of a given index;
     Step 4.3, calculating candidate clustering prototypes in the tangent space at the origin of the Poincaré ball by applying the K-means algorithm to the top 50% of samples ranked by confidence, and using the candidate clustering prototypes to update, with a momentum strategy, the tangent-space clustering prototypes corresponding to the hyperbolic space, the momentum update formula being [formula not reproduced in this text], where the accompanying symbols denote the tangent-space clustering prototypes of a given index at two successive iterations and the momentum update coefficient, which sets the proportion of the prototype updated at each iteration;
     Step 4.4, re-projecting the tangent-space clustering prototypes back into hyperbolic space through the exponential map to serve as the final clustering prototypes.
  6. The asymmetric multi-modal clustering method based on hyperbolic uncertainty modeling of claim 5, wherein in Step 5 the modality-specific autoencoder network is iteratively trained with the reconstruction loss, the asymmetric alignment loss function, and the hyperbolic cross-entropy clustering loss, specifically including the following steps:
     Step 5.1, completing an initial warm-up of the modality-specific autoencoder network by minimizing the reconstruction loss, so that the tangent-space features, namely the vectors in the tangent space at the origin of the Poincaré ball, preliminarily capture the intrinsic structure of the multi-modal data;
     Step 5.2, entering a joint training stage in which the reconstruction loss, the asymmetric alignment loss function under the hyperbolic distance, and the hyperbolic cross-entropy clustering loss are jointly optimized, the pre-established modality-specific autoencoder network being iteratively trained on the total loss [formula not reproduced in this text]; after iterative training, the trained modality-specific autoencoder network is obtained, and finally the clustering result of the multi-modal data is output.
  7. The asymmetric multi-modal clustering method based on hyperbolic uncertainty modeling of claim 1, wherein the method is implemented by an asymmetric multi-modal clustering system comprising:
     a preprocessing module, used for acquiring the multi-modal data set and preprocessing it to obtain a processed multi-modal data set;
     a feature extraction and projection module, used for inputting the multi-modal data into a pre-established modality-specific encoder network to obtain hidden-layer features, taking the hidden-layer features as vectors in the tangent space at the origin of the Poincaré ball, projecting the tangent-space features into hyperbolic space through the exponential map, and outputting hyperbolic embedded features;
     an alignment calculation module, used for calculating the hyperbolic distance between the hyperbolic embedded features and the clustering prototypes, obtaining soft assignment probabilities, calculating the prediction entropy to quantify uncertainty, constructing a dynamic gating mechanism based on the prediction entropy, and generating asymmetric alignment weights;
     a prototype updating module, which calculates uncertainty-aware weights based on the prediction entropy of each modality, fuses the multi-modal soft assignment probabilities to obtain a global consensus probability and pseudo-labels, screens reliable samples with a confidence-based masking mechanism, calculates candidate clustering prototypes using the distances of the reliable samples in the tangent space at the origin of the Poincaré ball, and updates the clustering prototypes with a momentum strategy;
     a training and clustering module, used for taking the obtained global consensus probability and pseudo-labels as the clustering prediction result, iteratively training the modality-specific autoencoder network with a reconstruction loss, an asymmetric alignment loss function, and a hyperbolic cross-entropy clustering loss function to obtain a trained modality-specific autoencoder network, and outputting the clustering result of the multi-modal data.
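
The entropy-driven parts of claims 4 and 5 (asymmetric guidance weights, uncertainty-aware fusion, confidence masking, momentum prototype update) can be sketched in NumPy as follows. The exact formulas are not preserved in this text, so every concrete choice here is an assumption made for illustration: a 0/1 guidance weight, a softmax of negative entropy as the fusion weight, per-pseudo-label means in place of K-means for the candidate prototypes, and the convention that the momentum coefficient `m` multiplies the previous prototype. All function and parameter names are hypothetical.

```python
import numpy as np

def asymmetric_weights(H):
    """H: (V, N) prediction entropy per modality and sample.
    Returns W of shape (V, V, N) with W[u, v, i] = 1 when source modality u
    is more certain (lower entropy) than target modality v on sample i.
    The 0/1 values are an assumption; in training, the source embedding
    would additionally be wrapped in a stop-gradient."""
    return (H[:, None, :] < H[None, :, :]).astype(float)

def fuse(P, H):
    """P: (V, N, K) soft assignments, H: (V, N) entropies.
    Fusion weights are a softmax of -H over modalities (assumed form)."""
    a = np.exp(-(H - H.min(axis=0, keepdims=True)))
    a = a / a.sum(axis=0, keepdims=True)          # (V, N), sums to 1 over V
    consensus = np.einsum("vn,vnk->nk", a, P)     # (N, K) global consensus
    return consensus, consensus.argmax(axis=1)    # probabilities, pseudo-labels

def confidence_mask(consensus, keep_ratio=0.5):
    """Boolean mask selecting the top keep_ratio of samples by confidence."""
    conf = consensus.max(axis=1)
    n_keep = int(np.ceil(keep_ratio * len(conf)))
    mask = np.zeros(len(conf), dtype=bool)
    mask[np.argsort(-conf, kind="stable")[:n_keep]] = True
    return mask

def momentum_update(protos_tan, h, labels, mask, m=0.9):
    """Update tangent-space prototypes (K, d) from reliable tangent features h (N, d).
    Candidate prototype = mean of reliable samples sharing a pseudo-label
    (a simplification of the K-means step); empty clusters keep their prototype."""
    new = protos_tan.copy()
    for k in range(protos_tan.shape[0]):
        sel = mask & (labels == k)
        if sel.any():
            new[k] = m * protos_tan[k] + (1 - m) * h[sel].mean(axis=0)
    return new
```

The updated tangent-space prototypes would then be mapped back to the Poincaré ball with the exponential map at the origin, as Step 4.4 describes.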

Description

Asymmetric multi-modal clustering method based on hyperbolic uncertainty modeling

Technical Field

The application belongs to the technical field of deep learning and data mining, and particularly relates to an asymmetric multi-modal clustering method based on hyperbolic uncertainty modeling.

Background

With the rapid development of multi-sensor and multi-modal data acquisition technologies, multi-modal data has become a major form in which real-world information exists. Multi-modal clustering aims to extract a unified semantic consensus from heterogeneous, multi-source data without supervision, so as to group the data and discover patterns automatically. However, existing deep multi-modal clustering methods face two fundamental bottlenecks.

First, geometric mismatch. Real-world multi-modal data typically contains a complex hierarchy and varying degrees of uncertainty, while most existing approaches map multi-modal features into Euclidean space, which has zero curvature. The volume of a ball in Euclidean space grows only polynomially with its radius, so the space lacks the boundary capacity needed to separate highly discriminative clusters and cannot geometrically isolate certain data from ambiguous data in a natural way; ambiguous and noisy features therefore easily interfere with the overall cluster structure.

Second, noise propagation caused by rigid alignment. Conventional multi-modal contrastive learning or alignment strategies typically minimize a symmetric distance. When one modality is rich in information and the other is severely contaminated or missing, this rigid symmetric alignment forces the features of the high-quality modality to drift toward the noisy modality, severely degrading the representation learned by the model.
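
The geometric-mismatch argument rests on how the volume of a ball scales with its radius. As standard background that is not taken from the application, the comparison can be written as follows, where d is the dimension (d ≥ 2), c > 0 is an assumed symbol for the magnitude of the negative curvature, and ω_{d-1} is the surface area of the unit sphere:

```latex
% Euclidean ball of radius r in R^d: polynomial growth in r
\operatorname{Vol}_{\mathbb{E}^d}(r)
  = \frac{\pi^{d/2}}{\Gamma\!\left(\tfrac{d}{2}+1\right)}\, r^{d}

% Geodesic ball of radius r in hyperbolic space of curvature -c:
% exponential growth in r
\operatorname{Vol}_{\mathbb{H}^d_c}(r)
  = \omega_{d-1} \int_0^r \left(\frac{\sinh(\sqrt{c}\,t)}{\sqrt{c}}\right)^{d-1} dt
  \;\sim\; \frac{\omega_{d-1}}{(d-1)\,2^{d-1}\,c^{d/2}}\; e^{(d-1)\sqrt{c}\,r}
  \qquad (r \to \infty),
\qquad
\omega_{d-1} = \frac{2\pi^{d/2}}{\Gamma\!\left(\tfrac{d}{2}\right)}
```

The exponential term is what gives hyperbolic space more room near the boundary than a Euclidean space of the same dimension.
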
Therefore, a new geometric mapping model and alignment mechanism are needed to adaptively perceive uncertainty in multi-modal data and to cut off the negative impact of noisy modalities on high-quality modalities.

Disclosure of Invention

To solve the above technical problems, the application provides an asymmetric multi-modal clustering method based on hyperbolic uncertainty modeling, which replaces the conventional Euclidean space with the Poincaré ball manifold and proposes an uncertainty-aware, unidirectional asymmetric alignment mechanism, achieving robust fusion and accurate clustering of complex multi-source data.

To achieve the above purpose, the application is realized by the following technical scheme. The application relates to an asymmetric multi-modal clustering method based on hyperbolic uncertainty modeling, which specifically comprises the following steps:

Step 1, acquiring a multi-modal data set and preprocessing it to obtain a processed multi-modal data set;

Step 2, inputting the multi-modal data of the multi-modal data set into a pre-established modality-specific autoencoder network to obtain hidden-layer features, taking the hidden-layer features as vectors in the tangent space at the origin of the Poincaré ball, projecting these vectors into hyperbolic space through the exponential map, and outputting hyperbolic embedded features;

Step 3, obtaining an initial clustering result based on the hyperbolic embedded features obtained in Step 2, extracting clustering prototypes, calculating the hyperbolic distance between the hyperbolic embedded features and the clustering prototypes, obtaining soft assignment probabilities, calculating the prediction entropy to quantify uncertainty, constructing a dynamic gating mechanism based on the prediction entropy, and generating asymmetric guidance weights;

Step 4, calculating uncertainty-aware weights based on the prediction entropy of each modality, fusing the multi-modal soft assignment probabilities to obtain a global consensus probability and pseudo-labels, screening reliable samples with a confidence-based masking mechanism, calculating candidate clustering prototypes using the distances of the reliable samples in the tangent space at the origin of the Poincaré ball, and updating the clustering prototypes with a momentum strategy;

Step 5, taking the obtained global consensus probability and pseudo-labels as the clustering prediction result, iteratively training the modality-specific autoencoder network with a reconstruction loss, an asymmetric alignment loss, and a hyperbolic cross-entropy clustering loss to obtain a trained modality-specific autoencoder network, and outputting the clustering result of the multi-modal data.

As a further improvement of the application, Step 2 specifically comprises the following steps: Step 2.1, inputting the multi-modal data into the pre-established modality-specific encoder network and outputting the hidden-layer features; Step 2.