CN-122020220-A - Multi-view data clustering method based on neighborhood correction and VMF hybrid model

Abstract

The invention discloses a multi-view data clustering method based on neighborhood correction and a VMF mixture model. The method first collects a data set containing multi-view samples and extracts consistency features from the multi-view data using a cross-view consistency loss function. It then builds a neighborhood-driven feature correction module to correct the consistency features, and fuses the corrected view features to generate a target view. Next, von Mises-Fisher (VMF) mixture models are used to model the cluster distributions of the target view and of the individual feature views. Finally, a probability kernel function quantifies the similarity of the VMF mixture distributions corresponding to each view, and a cross-view distribution alignment constraint aligns the cluster distributions of the same sample in different views, yielding high-precision cluster labels. Verification on several public benchmark data sets indicates that the method is effective and generates cluster structures with strong intra-class compactness and good inter-class separation. Compared with existing multi-view clustering schemes, the method substantially improves the accuracy and reliability of the clustering results.

Inventors

CUI JINRONG
Yao Liulong
XU JINHUI
Rao Chunyang

Assignees

South China Agricultural University (华南农业大学)

Dates

Publication Date: 20260512
Application Date: 20251230

Claims (5)

1. A multi-view data clustering method based on neighborhood correction and a VMF mixture model, characterized by comprising the following steps: S1, inputting the data of each view into a symmetric autoencoder and extracting a latent feature representation of each view; S2, designing a cross-view consistency loss function to obtain a preliminary consistency feature representation of each view; S3, constructing a neighborhood-driven feature correction module, and using it to guide the preliminary consistency features of each view to gather in high-density regions of the feature space, obtaining corrected features of each view; S4, modeling the multiple views with a VMF mixture model: fusing the corrected view features to form a new fused view, inputting the fused feature representation and the single-view corrected feature representations obtained in step S3 into the VMF mixture distribution model respectively, and completing cluster distribution modeling with the VMF mixture distribution model to obtain a preliminary cluster distribution; S5, introducing a probability kernel function to measure the similarity of the VMF mixture distributions, constructing a cross-view distribution consistency constraint, and optimizing the cluster distribution of the fused view; and S6, iteratively optimizing the cross-view consistency loss and, in combination with the modeling optimization of the VMF mixture distribution model and the reinforcement of the cross-view distribution consistency constraint, finally generating high-precision cluster labels.
2. The multi-view data clustering method based on neighborhood correction and a VMF mixture model according to claim 1, wherein constructing the cross-view consistency loss function comprises: S21) mapping the latent features obtained for each view to a high-dimensional space, expressed as: […]; where […] denotes the i-th sample of the v-th view, […] denotes the trainable parameters of the high-level encoder of view v, MLP denotes a multi-layer perceptron network, and […] denotes the autoencoder of S1; S22) the cross-view consistency loss function is expressed as: […]; where N denotes the number of samples, […] denotes the temperature coefficient, fixed at 0.5, which adjusts the gradient magnitude, and sim(·) denotes cosine similarity.
3. The multi-view data clustering method based on neighborhood correction and a VMF mixture model according to claim 1, wherein the corrected features obtained in step S3 are expressed as: […]; where […] denotes the i-th sample after correction, […] denotes the i-th sample before correction, […] is fixed at 0.5, […] denotes the number of clusters, z denotes the feature before correction, and […] denotes the neighborhood sample set of sample i, expressed as: […].
4. The multi-view data clustering method based on neighborhood correction and a VMF mixture model according to claim 1, wherein completing cluster distribution modeling with the VMF mixture distribution model in step S4 to obtain the preliminary cluster distribution comprises: S41) modeling with a VMF mixture distribution of the form: […]; where μ is the cluster center direction vector, κ is the concentration parameter, M denotes the number of density centers, which equals the number of clusters, and […] is the normalization constant, in which […] denotes the modified Bessel function of the first kind and d denotes the dimension; S42) based on the VMF mixture model, the probability that the i-th sample […] belongs to the m-th cluster in the v-th view is expressed as: […]; where […] denotes the mixing coefficients of the VMF components, estimated independently for each view, […] denotes the mean direction vector of the m-th cluster of the v-th view, and […] denotes the concentration parameter of the m-th cluster of the v-th view; μ and κ are the model parameters of the VMF mixture model; S43) the joint maximum likelihood estimation function of the VMF model to be optimized is: […].
5. The multi-view data clustering method based on neighborhood correction and a VMF mixture model according to claim 1, wherein the cross-view distribution consistency constraint constructed in step S5 is expressed as: […]; where […] measures the distribution similarity between views and thereby realizes cross-view structure alignment. For the difference in distribution structure of sample i between the fused view and the v-th view, a probability kernel similarity function of the following form is used: […]; where D denotes the feature dimension; […] denotes the mean vector component in dimension d; […] denotes the concentration (κ) in dimension d; […] denotes the feature representation of the i-th sample after fusion of all views; and […] denotes the corrected feature representation of the i-th sample in the v-th view.
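
The formulas of claims 2 and 3 are not reproduced in this text, so the following Python sketch is an illustration under stated assumptions and not the claimed formulas. It assumes an InfoNCE-style form for the cross-view consistency loss (cosine similarity, temperature 0.5, samples with the same index treated as positives) and a correction that blends each feature with the mean of its nearest neighbors using a weight of 0.5. The function names, the parameter `num_neighbors`, and both formula shapes are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def consistency_loss(view_a, view_b, tau=0.5):
    """Assumed InfoNCE-style loss: sample i of view_a should match sample i of view_b."""
    n = len(view_a)
    total = 0.0
    for i in range(n):
        logits = [cosine(view_a[i], view_b[j]) / tau for j in range(n)]
        top = max(logits)  # subtract the maximum for a stable log-sum-exp
        log_denominator = top + math.log(sum(math.exp(v - top) for v in logits))
        total += log_denominator - logits[i]
    return total / n


def neighborhood_correct(features, num_neighbors=1, alpha=0.5):
    """Assumed correction: blend each feature with the mean of its nearest neighbors."""
    corrected = []
    for i, z_i in enumerate(features):
        # Rank all other samples by cosine similarity to sample i.
        others = [j for j in range(len(features)) if j != i]
        others.sort(key=lambda j: cosine(z_i, features[j]), reverse=True)
        neighbors = others[:num_neighbors]
        mean = [
            sum(features[j][d] for j in neighbors) / len(neighbors)
            for d in range(len(z_i))
        ]
        corrected.append([(1 - alpha) * a + alpha * m for a, m in zip(z_i, mean)])
    return corrected
```

In this sketch, pulling each feature halfway toward its neighborhood mean moves samples toward locally dense regions, which is the behavior step S3 describes; the neighborhood definition actually claimed may differ.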

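The mixture modeling of claim 4 can be illustrated with the standard von Mises-Fisher density on the unit sphere, f(x; μ, κ) = C_d(κ) exp(κ μᵀx) with C_d(κ) = κ^(d/2−1) / ((2π)^(d/2) I_(d/2−1)(κ)). The Python sketch below assumes that standard form, since the claimed formulas are not reproduced in this text. It evaluates the Bessel function with a truncated power series, which is adequate only for moderate κ, and all function names are hypothetical.

```python
import math


def log_bessel_i(nu, x, terms=200):
    """log of the modified Bessel function of the first kind, I_nu(x), for x > 0.

    Uses the power series sum_k (x/2)^(2k+nu) / (k! * Gamma(k+nu+1)) in log space.
    """
    logs = [
        (2 * k + nu) * math.log(x / 2) - math.lgamma(k + 1) - math.lgamma(k + nu + 1)
        for k in range(terms)
    ]
    top = max(logs)
    return top + math.log(sum(math.exp(v - top) for v in logs))


def vmf_log_density(x, mu, kappa):
    """Log density of a VMF distribution; x and mu are unit vectors, kappa > 0."""
    d = len(x)
    log_norm = (
        (d / 2 - 1) * math.log(kappa)
        - (d / 2) * math.log(2 * math.pi)
        - log_bessel_i(d / 2 - 1, kappa)
    )
    return log_norm + kappa * sum(m * v for m, v in zip(mu, x))


def responsibilities(x, weights, mus, kappas):
    """Posterior cluster probabilities of x and its mixture log-likelihood."""
    logs = [
        math.log(w) + vmf_log_density(x, mu, kappa)
        for w, mu, kappa in zip(weights, mus, kappas)
    ]
    top = max(logs)
    exps = [math.exp(v - top) for v in logs]
    total = sum(exps)
    return [e / total for e in exps], top + math.log(total)
```

Summing the returned log-likelihood over all samples, and over all views if a joint objective is wanted, gives one possible maximum likelihood objective for step S43; the exact joint form claimed is not shown in the source.
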
Description

Multi-view data clustering method based on neighborhood correction and VMF hybrid model

Technical Field

The invention relates to the field of data mining and machine learning, and in particular to a multi-view data clustering method based on neighborhood correction and a von Mises-Fisher (VMF) mixture model. It aims to achieve effective cluster analysis of high-dimensional, multi-source, multi-modal data by fusing multi-view features and combining spatial neighborhood information, thereby improving the accuracy and robustness of clustering.

Background

Multi-view clustering is a core technique in unsupervised learning. It aims to fully mine the complementary and consistent information in multiple data-source views, without relying on label information, so as to group samples accurately. With the rapid development of data acquisition and storage technology, real-world data often have multi-view characteristics: image data include views such as color and texture, and social network data cover dimensions such as social relations and behavior habits. Such multi-view data have important application value in data mining, image segmentation, recommendation systems, and other fields. In recent years, deep learning has been widely applied to multi-view clustering by virtue of its strong feature representation capability; jointly training multi-view encoders while accounting for both the shared information and the distinctive features of the views has markedly improved clustering results.

Existing deep multi-view clustering methods still have clear limitations. Most focus heavily on feature-level representation and fusion while neglecting the distribution characteristics within each view, which makes it difficult to fully mine the latent structural information of each view. In addition, multi-view data often suffer from inconsistent feature distributions: an unreasonable feature distribution within a single view, or an excessive distribution deviation between views, can degrade the cluster structure, leading to scattered intra-class samples and blurred inter-class boundaries and seriously harming clustering performance.

Disclosure of Invention

The invention provides a multi-view data clustering method based on neighborhood correction and a VMF mixture model. The method obtains robust features of each view through a view-consistency feature extraction module, and optimizes the local structural representation of the features through a non-parametric neighborhood feature correction module to improve intra-class compactness. It also introduces a probability kernel function based on the VMF-distribution clustering model to measure the distribution similarity between different views, and designs a cross-view distribution alignment constraint to align the multi-view distribution structure effectively and so improve overall clustering performance. Verification on multiple public data sets shows that the method generates compact and well-separated clustering results.

1. A multi-view data clustering method based on neighborhood correction and a VMF mixture model, characterized by comprising the following steps: S1, collecting a multi-view public data set, and inputting the data of each view into a symmetric autoencoder; S2, inputting the latent features obtained in step S1 into a feature encoder, and outputting high-dimensional feature representations of all views; S3, constructing a neighborhood-driven feature correction module, and using it to guide the preliminary consistency features of each view to gather in high-density regions of the feature space, obtaining corrected features of each view; S4, fusing the corrected features of each view obtained in step S3 to obtain a unified fused feature representation, inputting the fused feature representation and the corrected feature representation of each view from step S3 into a VMF mixture distribution model respectively, and completing cluster distribution modeling with the VMF mixture distribution model to obtain a preliminary cluster distribution; S5, measuring the similarity of the VMF mixture distributions through a probability kernel function, constructing a cross-view distribution consistency constraint, and aligning the VMF mixture distributions of the same sample in different views; S6, finally outputting high-precision cluster labels through iterative optimization of the cross-view consistency loss, the VMF mixture distribution model, and the cross-view distribution consistency constraint.

In one embodiment, in a specific implementation in which S1 further includes an edge detection module, the encoder may be expressed as: […] The decoder may be expressed as: […] where […] As an or