CN-121983278-A - Multi-view subspace clustering cancer subtype identification method based on self-reinforcement learning
Abstract
The invention provides a multi-view subspace clustering cancer subtype identification method based on self-reinforcement learning. First, a latent feature learning module extracts a latent feature representation of each view from multi-omics data. A self-expression learning module then groups similar samples, using initial graph information as a supervision signal to construct a self-expression coefficient matrix. Next, this matrix is fed into a view graph fusion unit, which fuses the multi-view information into a consensus graph. To further suppress noise in the multi-omics data, a self-reinforcement back-propagation unit is introduced. It refines the self-expression coefficients to generate a confidence matrix, and this matrix guides the back-propagation of the fusion loss, so that the quality of the self-expression coefficients and of the consensus graph improves iteratively. Finally, a spectral clustering algorithm is applied to the optimized consensus graph to identify the cancer subtypes. By introducing this self-reinforcement learning strategy, the method effectively reduces the interference of noise in capturing relationships between samples and markedly improves clustering performance.
Inventors
- YANG DAN
- LIN SULI
- XIAN JIAJUN
- LIU MING
- LIU CHENG
- Xiao Qingyao
- CHEN ZIJUN
- SUN JIAWEN
- YE YIXUAN
- ZHOU CAN
- ZHANG WEI
- WANG CHENG
Assignees
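- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China (电子科技大学长三角研究院(衢州))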
Dates
- Publication Date
- 2026-05-05
- Application Date
- 2025-12-31
Claims (9)
- 1. A multi-view subspace clustering cancer subtype identification method based on self-reinforcement learning, characterized by comprising the following steps: Step one, acquiring multi-omics data, initializing the network parameters of a pre-trained encoder and decoder based on the reconstruction loss and view specificity, setting training-related hyperparameters including the number of training epochs and the learning rate, and setting model-related hyperparameters including regularization parameters, loss-balancing parameters, neighbor screening parameters and confidence parameters; Step two, inputting the multi-omics data into an encoder-decoder structure to learn a latent feature representation of each view, wherein the encoder network projects the original features into a low-dimensional space and the decoder network generates reconstructed features, the encoder-decoder structure being trained with a reconstruction loss function so that the autoencoder learns discriminative representations of the data; Step three, based on the latent feature representations from step two, learning the self-expression coefficient matrix of each view with a self-expression loss function that builds a block diagonal structure by minimizing the Frobenius norm of the self-expression coefficients, while introducing a local graph guidance strategy to guide the self-expression learning so that the matrix represents the relationships between data points in the latent feature space; Step four, taking the self-expression coefficient matrix as input, obtaining a highly reliable self-expression coefficient matrix through the good-neighbor optimization subunit of the self-reinforcement back-propagation unit within the self-reinforcement view graph fusion module, constructing a confidence matrix through the confidence gating subunit, and introducing the confidence matrix into the view graph fusion unit within the module to guide the learning of the consensus graph; back-propagating the view graph fusion loss to guide the latent feature learning and the self-expression learning, iteratively updating the self-expression coefficient matrix and the confidence matrix, and optimizing the consensus graph based on the fusion loss; and finally applying a spectral clustering algorithm to the iteratively optimized consensus graph to obtain the final cancer subtype identification result.
- 2. The multi-view subspace clustering cancer subtype identification method based on self-reinforcement learning according to claim 1, wherein step two is implemented by the following substeps: (2.1) building a view-specific encoder network, parameterized by trainable encoder parameters, that learns the latent features of each view from its raw data; (2.2) building a corresponding view-specific decoder network, parameterized by trainable decoder parameters, that generates reconstructed features from the latent features; (2.3) to strengthen the ability of the latent features to recover the view-specific information of each omics type, defining a reconstruction loss between the original features and the corresponding reconstructed features, accumulated over the total number of omics views. The reconstruction loss quantifies the difference between the input data and the reconstructed data, driving the autoencoder network to learn discriminative representations of the data and finally yielding reliable latent features.
- 3. The multi-view subspace clustering cancer subtype identification method based on self-reinforcement learning according to claim 1, wherein step three is implemented by the following substeps: (3.1) under the assumption that the subspaces are independent of each other, the self-expression coefficient matrix of the k-th omics view is solved from the latent feature matrix of the k-th omics view, output by the pre-trained model of step two, by minimizing the self-expression loss function under constraints that include a zero diagonal; the objective uses the Frobenius norm, and the zero-diagonal constraint ensures that no sample is represented by itself. Minimizing this objective expresses the latent features of each sample in the k-th omics view as a linear combination of the latent features of the other samples while encouraging the coefficient matrix to form a block diagonal structure, finally yielding the self-expression coefficient matrix; (3.2) to learn a reliable self-expression matrix that accurately represents the relationships between data points in the latent feature space, the self-expression loss function is defined with a regularization parameter that balances the self-expression term against the sparsity of the coefficients: the first term drives the self-expression matrix to accurately represent the data points in the latent feature space, and the second term enhances the sparsity of the self-expression coefficients so that the self-expression matrix exhibits a block diagonal structure; (3.3) a local graph guidance strategy is introduced, in which the initial graph information obtained from the original features serves as a supervision signal guiding the view-specific self-expression learning, through a loss function based on the Kullback-Leibler divergence. The divergence term draws the self-expression coefficient matrix toward the neighborhood information of the initial graph; at the same time, to keep the model from falling into a local minimum, the initial graph information serves as a reliable starting point for the optimization, helping the model explore the solution space more effectively and obtain accurate self-expression coefficients.
- 4. The multi-view subspace clustering cancer subtype identification method based on self-reinforcement learning according to claim 1, wherein step four is implemented by the following substeps: (4.1) inputting the self-expression coefficient matrix output in step three into the good-neighbor optimization subunit of the self-reinforcement back-propagation unit, which screens reliable association relationships between samples and refines the self-expression coefficient matrix to construct a highly reliable self-expression coefficient matrix; (4.2) inputting the highly reliable self-expression coefficient matrix into the confidence gating subunit, which constructs a confidence matrix through a threshold screening mechanism. The confidence matrix takes part in the back-propagation optimization of the model through the view graph fusion loss, each of its elements being given by an indicator function of the corresponding element of the highly reliable matrix; a preset confidence threshold distinguishes the reliability of the elements of the highly reliable self-expression coefficient matrix and serves as the decision boundary for whether an element of the confidence matrix takes the value 0 or 1; (4.3) inputting the confidence matrix and the self-expression matrices into the self-expression fusion subunit, which constructs a consensus graph by weighted fusion of the multi-view coefficients; through the view graph fusion loss function, which involves an element-wise product, back-propagation redirects the latent feature extraction of step two and the self-expression coefficient learning of step three; (4.4) repeating steps (4.1) to (4.3) to continuously update the consensus graph until the number of training epochs reaches the preset maximum, then stopping the iteration, outputting the final optimized consensus graph, and inputting it into the spectral clustering module to obtain the cancer subtype identification result.
- 5. The multi-view subspace clustering cancer subtype identification method based on self-reinforcement learning according to claim 4, wherein step (4.1) specifically comprises: first, based on the absolute values of the elements of the self-expression coefficient matrix, selecting for each sample a preset number of most similar samples to form its neighbor set, the samples being the patient samples to be clustered; the element in the i-th row and j-th column of the self-expression coefficient matrix is the coefficient weight with which sample j contributes to the linear representation of sample i. If the number of samples in the neighbor set of sample j that also belong to the neighbor set of sample i reaches a preset threshold, samples i and j are determined to be mutual good neighbors. Next, the good-neighbor set of each of the samples is determined, its size being governed by a predefined number of good neighbors. Finally, the initial self-expression coefficient matrix is corrected by weighting based on the good-neighbor sets, yielding an estimate of the highly reliable self-expression coefficient matrix, whose element in the i-th row and j-th column is obtained from this correction.
- 6. The multi-view subspace clustering cancer subtype identification method based on self-reinforcement learning according to claim 4, wherein inputting the final optimized consensus graph into the spectral clustering module in step (4.4) to obtain the cancer subtype identification result specifically comprises: performing subspace partitioning on the consensus graph obtained by multi-view fusion, namely computing the Laplacian matrix of the consensus graph, performing eigendecomposition to extract its eigenvalues and eigenvectors, constructing a low-dimensional embedding space from them, and partitioning the samples in the low-dimensional embedding matrix into classes with the K-means clustering algorithm to finally obtain the cancer subtype identification result.
- 7. A multi-view subspace clustering cancer subtype identification system based on self-reinforcement learning, characterized in that the system is implemented based on the multi-view subspace clustering cancer subtype identification method based on self-reinforcement learning according to any one of claims 1 to 6, and specifically comprises the following modules: an initialization parameter module, which acquires multi-omics data, initializes the network parameters of a pre-trained encoder and decoder based on the reconstruction loss and view specificity, sets training-related hyperparameters including the number of training epochs and the learning rate, and sets model-related hyperparameters including regularization parameters, loss-balancing parameters, neighbor screening parameters and confidence parameters; a latent feature learning module, which inputs the multi-omics data into an encoder-decoder structure and learns a latent feature representation of each view, wherein the encoder network projects the original features into a low-dimensional space and the decoder network generates reconstructed features, the encoder-decoder structure being trained with a reconstruction loss function so that the autoencoder learns discriminative representations of the data; a self-expression learning module, which, based on the latent feature representations output by the latent feature learning module, learns the self-expression coefficient matrix of each view with a self-expression loss function that builds a block diagonal structure by minimizing the Frobenius norm of the self-expression coefficients, while a local graph guidance strategy guides the self-expression learning so that the matrix represents the relationships between data points in the latent feature space; and a self-reinforcement view graph fusion module, which takes the self-expression coefficient matrix from the self-expression learning module as input, obtains a highly reliable self-expression coefficient matrix through the good-neighbor optimization subunit of the self-reinforcement back-propagation unit within the module, constructs a confidence matrix through the confidence gating subunit, introduces the confidence matrix into the view graph fusion unit within the module to guide the learning of the consensus graph, back-propagates the view graph fusion loss to guide the latent feature learning and the self-expression learning, iteratively updates the self-expression coefficient matrix and the confidence matrix, optimizes the consensus graph based on the fusion loss, and finally applies a spectral clustering algorithm to the iteratively optimized consensus graph to obtain the final cancer subtype identification result.
- 8. An electronic device, comprising: one or more processors; and a memory for storing one or more programs; wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the multi-view subspace clustering cancer subtype identification method based on self-reinforcement learning according to any one of claims 1-7.
- 9. A computer-readable storage medium having computer instructions stored thereon which, when executed by a processor, implement the steps of the multi-view subspace clustering cancer subtype identification method based on self-reinforcement learning according to any one of claims 1-7.
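Claim 2 describes the view-specific encoder, decoder and reconstruction loss in words. The Python sketch below illustrates how such a loss can be accumulated over omics views; it is a minimal sketch, not the patented implementation. It assumes, for illustration only, that each encoder and decoder is a single linear map without activation or bias, and that the loss is the squared reconstruction error summed over all views; `linear_map` and `reconstruction_loss` are hypothetical helper names.

```python
def linear_map(X, W):
    """Multiply an (n x d_in) matrix X by a (d_in x d_out) weight matrix W."""
    return [[sum(row[k] * W[k][j] for k in range(len(W))) for j in range(len(W[0]))]
            for row in X]


def reconstruction_loss(views, encoders, decoders):
    """Sum of squared reconstruction errors over all omics views.

    Assumption: each view-specific encoder/decoder is a single linear map;
    the networks in the patent may be deeper and non-linear.
    """
    total = 0.0
    for X, W_enc, W_dec in zip(views, encoders, decoders):
        Z = linear_map(X, W_enc)        # latent features of this view
        X_hat = linear_map(Z, W_dec)    # reconstructed features
        total += sum((x - x_hat) ** 2
                     for row, row_hat in zip(X, X_hat)
                     for x, x_hat in zip(row, row_hat))
    return total


# Toy example: two views, each projected to a 1-D latent space.
W_enc = [[1.0], [0.0]]              # keep only the first feature
W_dec = [[1.0, 0.0]]                # map it back to the first coordinate
view_a = [[1.0, 0.0], [3.0, 0.0]]   # lies in the latent subspace: exact reconstruction
view_b = [[0.0, 1.0]]               # orthogonal to it: squared error of 1.0
loss = reconstruction_loss([view_a, view_b], [W_enc, W_enc], [W_dec, W_dec])
print(loss)  # 1.0
```

In a trained model the encoder and decoder weights would be learned by gradient descent on this loss; they are fixed here so the arithmetic is easy to follow.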
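Claim 3 combines a Frobenius-norm self-expression objective, a zero-diagonal constraint, and a Kullback-Leibler term that pulls the coefficient matrix toward an initial graph. A minimal Python sketch of both terms follows, under assumptions the claim does not fix: rows of `Z` are samples and sample i is approximated by the sum over j of `C[i][j] * Z[j]` (matching the row/column reading in claim 5), the regularizer is the squared Frobenius norm weighted by `lam`, and the KL divergence is taken row-wise between row-normalized absolute values, in the direction from the initial graph toward the coefficients. All function names are hypothetical.

```python
import math


def self_expression_loss(Z, C, lam):
    """||Z - C Z||_F^2 + lam * ||C||_F^2 with the rows of Z as samples.

    Assumption: sample i is reconstructed as sum_j C[i][j] * Z[j]; the
    zero-diagonal constraint is enforced by rejecting any C that violates it.
    """
    n, d = len(Z), len(Z[0])
    if any(C[i][i] != 0 for i in range(n)):
        raise ValueError("diag(C) must be zero: no sample may represent itself")
    residual = 0.0
    for i in range(n):
        for f in range(d):
            approx = sum(C[i][j] * Z[j][f] for j in range(n))
            residual += (Z[i][f] - approx) ** 2
    penalty = sum(c * c for row in C for c in row)
    return residual + lam * penalty


def _row_distributions(M, eps=1e-12):
    """Turn each row of |M| into a probability distribution."""
    out = []
    for row in M:
        s = sum(abs(x) for x in row) + eps
        out.append([abs(x) / s for x in row])
    return out


def graph_guidance_kl(A, C, eps=1e-12):
    """Row-wise KL(P_A || P_C) between the initial graph A and coefficients C."""
    P, Q = _row_distributions(A), _row_distributions(C)
    return sum(p * math.log((p + eps) / (q + eps))
               for p_row, q_row in zip(P, Q)
               for p, q in zip(p_row, q_row) if p > 0)


# Two pairs of identical samples: each sample is reproduced exactly by its twin.
Z = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
C = [[0, 1, 0, 0], [1, 0, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]]
print(self_expression_loss(Z, C, lam=0.1))  # residual 0 plus 0.1 * 4 = 0.4
print(graph_guidance_kl(C, C))              # 0.0 when C matches the initial graph
```

Because the Frobenius penalty shrinks all coefficients uniformly, the reconstruction term is what keeps the twin links large here; in the patent the KL term additionally anchors the solution to the initial graph.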
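Claim 5 screens good neighbors by counting shared members of neighbor sets ranked by the absolute self-expression coefficients. The Python sketch below is one possible reading of that procedure, with clearly hypothetical details: a pair (i, j) is tested only when j is already in the neighbor set of i, the optional `max_good` cap keeps the strongest links (ties broken by index), and the weighted correction is modeled as keeping |C[i][j]| for good neighbors and zeroing every other entry. The correction formula in the patent may differ, and all names are illustrative.

```python
def neighbour_sets(C, k):
    """For each sample i, the k samples j != i with the largest |C[i][j]|."""
    n = len(C)
    sets = []
    for i in range(n):
        ranked = sorted((j for j in range(n) if j != i), key=lambda j: -abs(C[i][j]))
        sets.append(set(ranked[:k]))
    return sets


def good_neighbours(C, k, min_shared, max_good=None):
    """Mutual good neighbors via a shared-neighbor count.

    Assumption: a pair (i, j) with j in N(i) is accepted when |N(i) & N(j)|
    reaches min_shared; the optional max_good cap keeps the strongest links.
    """
    nbrs = neighbour_sets(C, k)
    good = [set() for _ in C]
    for i, n_i in enumerate(nbrs):
        for j in n_i:
            if len(n_i & nbrs[j]) >= min_shared:
                good[i].add(j)
                good[j].add(i)   # the relation is mutual
    if max_good is not None:
        good = [set(sorted(g, key=lambda j: (-abs(C[i][j]), j))[:max_good])
                for i, g in enumerate(good)]
    return good


def reliable_coefficients(C, good):
    """Hypothetical correction: keep |C[i][j]| for good neighbors, zero elsewhere."""
    return [[abs(C[i][j]) if j in good[i] else 0.0 for j in range(len(C))]
            for i in range(len(C))]


# Two blocks of four samples; sample 0 also has one spurious strong link to sample 4.
n = 8
C = [[0.0 if i == j else (0.9 if (i < 4) == (j < 4) else 0.05) for j in range(n)]
     for i in range(n)]
C[0][4] = 0.95
good = good_neighbours(C, k=3, min_shared=1)
print(sorted(good[0]))                        # [1, 2, 3]: the noisy link to 4 is dropped
print(reliable_coefficients(C, good)[0][4])   # 0.0
```

The spurious link survives the plain top-k step but fails the shared-neighbor test, because samples 0 and 4 have no neighbors in common; this is the kind of noise suppression the good-neighbor subunit is described as providing.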
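Claim 4 gates the highly reliable coefficients with an indicator function and a threshold, fuses the views into a consensus graph, and uses an element-wise product in the fusion loss. The sketch below shows one plausible shape of those three pieces in plain Python. The fixed fusion weights and the masked squared-error form of `fusion_loss` are assumptions, not the claimed formulas, and the function names are hypothetical. In a trainable model `S` and the per-view matrices would be tensors in an automatic-differentiation framework so that the loss could be back-propagated to steps two and three.

```python
def confidence_matrix(C_hat, tau):
    """Indicator gating: 1 where the reliable coefficient reaches tau, else 0."""
    return [[1 if c >= tau else 0 for c in row] for row in C_hat]


def consensus_graph(Cs, weights):
    """Weighted average of the per-view self-expression matrices (fixed weights assumed)."""
    total = sum(weights)
    n = len(Cs[0])
    return [[sum(w * C[i][j] for w, C in zip(weights, Cs)) / total for j in range(n)]
            for i in range(n)]


def fusion_loss(S, Cs, M):
    """Hypothetical confidence-masked loss: sum over views of ||M * (S - C_v)||_F^2,
    where * is the element-wise product."""
    n = len(S)
    return sum((M[i][j] * (S[i][j] - C[i][j])) ** 2
               for C in Cs for i in range(n) for j in range(n))


C_hat = [[0.0, 0.8], [0.3, 0.0]]
M = confidence_matrix(C_hat, tau=0.5)            # [[0, 1], [0, 0]]
views = [[[0.0, 1.0], [1.0, 0.0]], [[0.0, 0.0], [0.0, 0.0]]]
S = consensus_graph(views, weights=[1.0, 1.0])   # [[0.0, 0.5], [0.5, 0.0]]
print(fusion_loss(S, views, M))                  # only entry (0, 1) counts: 0.25 + 0.25 = 0.5
```

The mask means that only entries the gating step trusts contribute to the loss, so low-confidence coefficients receive no gradient from the fusion term.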
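Claim 6 describes the final partitioning step: eigendecomposition of the Laplacian of the consensus graph, a low-dimensional embedding, and K-means. The dependency-free Python sketch below illustrates that pipeline under several assumptions the claim leaves open: the consensus graph `S` is symmetrized as (|S| + |S|^T)/2, the symmetric normalized Laplacian is used, embedding rows are normalized in the Ng-Jordan-Weiss style, and K-means uses farthest-point initialization. A cyclic Jacobi eigensolver stands in for a library routine such as `numpy.linalg.eigh`, which a practical implementation would more likely use; all function names are illustrative.

```python
import math


def jacobi_eigh(A, tol=1e-12, max_sweeps=100):
    """Eigendecomposition of a symmetric matrix by cyclic Jacobi rotations.

    Returns (eigenvalues, V) where column k of V is the k-th eigenvector.
    """
    n = len(A)
    a = [row[:] for row in A]
    v = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(max_sweeps):
        if sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j) < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-15:
                    continue
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):          # A <- A P (columns p, q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):          # A <- P^T A (rows p, q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(n):          # V <- V P accumulates the eigenvectors
                    vkp, vkq = v[k][p], v[k][q]
                    v[k][p], v[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    return [a[i][i] for i in range(n)], v


def _dist2(x, y):
    return sum((a - b) ** 2 for a, b in zip(x, y))


def kmeans(points, k, iters=100):
    """Lloyd's K-means with deterministic farthest-point initialization."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(_dist2(p, c) for c in centers)))
    labels = None
    for _ in range(iters):
        new_labels = [min(range(k), key=lambda j: _dist2(p, centers[j])) for p in points]
        if new_labels == labels:
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def spectral_clustering(S, n_clusters):
    """Symmetrize |S|, embed with the normalized Laplacian, then run K-means."""
    n = len(S)
    W = [[(abs(S[i][j]) + abs(S[j][i])) / 2.0 for j in range(n)] for i in range(n)]
    inv_sqrt_deg = [1.0 / math.sqrt(d) if d > 0 else 0.0 for d in map(sum, W)]
    L = [[float(i == j) - inv_sqrt_deg[i] * W[i][j] * inv_sqrt_deg[j] for j in range(n)]
         for i in range(n)]
    eigvals, V = jacobi_eigh(L)
    order = sorted(range(n), key=lambda k: eigvals[k])[:n_clusters]
    embedding = []
    for i in range(n):
        row = [V[i][k] for k in order]
        norm = math.sqrt(sum(x * x for x in row)) or 1.0   # row-normalize the embedding
        embedding.append([x / norm for x in row])
    return kmeans(embedding, n_clusters)


# Consensus graph with two dense blocks {0, 1, 2} and {3, 4, 5} joined by weak links.
S = [[0.0 if i == j else (1.0 if (i < 3) == (j < 3) else 0.01) for j in range(6)]
     for i in range(6)]
labels = spectral_clustering(S, n_clusters=2)
print(labels)  # samples 0-2 share one label, samples 3-5 the other
```

The eigenvectors of the smallest Laplacian eigenvalues are nearly constant within each dense block, which is why K-means separates the blocks cleanly in the embedding even though the raw graph is connected.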
Description
Multi-view subspace clustering cancer subtype identification method based on self-reinforcement learning
Technical Field
The invention relates to the field of bioinformatics, and in particular to a multi-view subspace clustering cancer subtype identification method based on self-reinforcement learning.
Background
Cancer is a major disease that seriously threatens human health. Cancer subtype identification is a core component of precision medicine: its theories and methods provide a key basis for early diagnosis, treatment selection and prognosis evaluation, and are of great significance for improving the quality of life and cure rate of cancer patients. Methods for cancer subtype identification have evolved continually from traditional learning methods to modern deep learning techniques. With the rapid development of high-throughput experimental technology, integrating multiple types of omics data, such as genomics, transcriptomics and proteomics, for cancer subtype identification has become a research trend in the field, providing a rich data basis and new research possibilities for accurately identifying cancer subtypes from heterogeneous data sets. Recently, with the help of deep learning techniques, a variety of cancer subtyping methods have emerged.
These methods collect multi-omics data and use deep neural networks to extract valuable feature representations from them, improving the accuracy of subtype identification. For example, researchers have combined a consensus Gaussian mixture model with a generative adversarial network to propose a new deep-learning-based integrative cancer subtyping method; Deep Subspace Mutual Learning (DSML) uses deep neural networks to learn the subspace structures both within single-omics data and across the whole multi-omics data; and the decoupled contrastive clustering method for cancer subtyping (SDCC) integrates multi-omics data through decoupled contrastive clustering, incorporating the ideas of contrastive learning into a deep neural network to learn clustering-friendly representations for cancer subtype identification. In summary, existing integrative cancer subtyping methods have shown promising results. However, because omics data are highly noisy, effectively integrating and clustering multi-omics data remains challenging, which limits the capture of accurate relationships between samples. Moreover, deep neural network methods are still unstable in capturing sample relationships and obtaining accurate clustering results; in particular, deep subspace clustering models easily fall into poor local minima and cannot meet diverse data-analysis requirements.
Disclosure of Invention
The invention aims to overcome the defects of the prior art by providing a deep multi-view subspace clustering cancer subtype identification method based on self-reinforcement learning. The aim of the invention is achieved by the following technical solution: a multi-view subspace clustering cancer subtype identification method based on self-reinforcement learning, the identification method comprising: Step one, acquiring multi-omics data, initializing the network parameters of a pre-trained encoder and decoder based on the reconstruction loss and view specificity, setting training-related hyperparameters including the number of training epochs and the learning rate, and setting model-related hyperparameters including regularization parameters, loss-balancing parameters, neighbor screening parameters and confidence parameters; Step two, inputting the multi-omics data into an encoder-decoder structure to learn a latent feature representation of each view, wherein the encoder network projects the original features into a low-dimensional space and the decoder network generates reconstructed features, the encoder-decoder structure being trained with a reconstruction loss function so that the autoencoder learns discriminative representations of the data; Step three, based on the latent feature representations from step two, learning the self-expression coefficient matrix of each view with a self-expression loss function that builds a block diagonal structure by minimizing the Frobenius norm of the self-expression coefficients, while introducing a local graph guidance strategy to guide the self-expression learning so that the matrix represents the relationships between data points in the latent feature space; Step four, taking the self-expression coefficient matrix as input and obtaining a highly reliable self-expression coefficient matrix through the good-neighbor optimization subunit of the in-module self-reinforcement back-propagation