
US-12626824-B2 - Method and system of molecular typing and subtyping classifier for immune-related diseases

US 12626824 B2

Abstract

A method and a system of molecular typing and subtyping classifier for immune-related diseases are provided. The method includes: conducting molecular typing via a clustering algorithm in a training set to obtain a plurality of subtypes stably appearing in the training set and a marker gene for each subtype; conducting enrichment analysis on marker genes for the plurality of subtypes, conducting immune cell infiltration evaluation on the plurality of subtypes, and dividing the plurality of subtypes into a plurality of subtype classes according to results of the analysis and the evaluation; comparing treatment response rates of different subtype classes through a comparison set to determine a subtype class to be identified; constructing a support vector machine model with feature genes screened and an optimal parameter combination; and determining whether immune-related disease data to be classified is the subtype class to be identified.
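The comparison of treatment response rates across subtype classes can be pictured with a minimal Python sketch. It only tabulates a per-class response rate; the function name `response_rates`, the class labels `class_A` and `class_B`, and the records are hypothetical placeholders, not data or a procedure taken from the disclosure, and no rule for choosing the class to be identified is implied.

```python
from collections import defaultdict


def response_rates(records):
    """Return {subtype_class: responders / total} for (class, responded) pairs."""
    totals = defaultdict(int)
    responders = defaultdict(int)
    for subtype_class, responded in records:
        totals[subtype_class] += 1
        responders[subtype_class] += int(bool(responded))
    return {c: responders[c] / totals[c] for c in totals}


# Hypothetical comparison set: (subtype class, responded to treatment?)
comparison_set = [
    ("class_A", True), ("class_A", True), ("class_A", False), ("class_A", True),
    ("class_B", False), ("class_B", True), ("class_B", False), ("class_B", False),
]

rates = response_rates(comparison_set)
# List the classes from highest to lowest response rate.
for subtype_class, rate in sorted(rates.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{subtype_class}: {rate:.2f}")
```

In practice a statistical test on the per-class counts would normally accompany such a table; that step is left out here because the text above does not name one.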

Inventors

  • Jie Liu
  • Feifei LUO
  • Shaocong MO
  • Huan Song

Assignees

  • Huashan Hospital, Fudan University

Dates

Publication Date
20260512
Application Date
20220812
Priority Date
20211029

Claims (5)

  1. A method of molecular typing and subtyping classifications for immune-related diseases, comprising:
     a data acquisition step: acquiring an immune-related disease microarray dataset and dividing the immune-related disease microarray dataset into a training set, a validation set, and a comparison set;
     a molecular typing step: conducting the molecular typing via a clustering algorithm in the training set to obtain a plurality of subtypes stably appearing in the training set and a marker gene for each of the plurality of subtypes, and verifying a stability of molecular typing results through the validation set;
     an analysis and evaluation step: conducting an enrichment analysis on the marker gene for each of the plurality of subtypes, conducting an immune cell infiltration evaluation on the plurality of subtypes, and dividing the plurality of subtypes into a plurality of subtype classes according to results of the enrichment analysis and the immune cell infiltration evaluation;
     a comparison step: comparing treatment response rates of different subtype classes through the comparison set to determine a subtype class to be identified;
     a classifier construction step: constructing a support vector machine model with feature genes selected from the marker genes and an optimal parameter combination for a support vector machine, wherein the support vector machine model is configured to identify an immune homeostasis like (IHL) class within an immune-related disease data;
     a classification step: inputting the immune-related disease data to be classified into the support vector machine model and determining that the immune-related disease data to be classified is of the IHL class; and
     a treatment step: treating an immune-related disease identified as being of the IHL class based on the classification step with a biological agent therapy of ustekinumab;
     wherein the immune-related disease microarray dataset is acquired from the gene expression omnibus (GEO) database, and the immune-related disease microarray dataset comprises an ulcerative colitis (UC) microarray dataset or a Crohn's disease (CD) microarray dataset;
     wherein the immune homeostasis like (IHL) class is a subtype class determined based on immune infiltration evaluation results, in which the immune cell infiltration evaluation comprises single-sample gene set enrichment analysis (ssGSEA), and ssGSEA-based immune cell enrichment scores of the IHL class are lower than corresponding ssGSEA-based immune cell enrichment scores of each other subtype class identified in the molecular typing step.
  2. The method of molecular typing and subtyping classifications for immune-related diseases according to claim 1, wherein the clustering algorithm is a CrossICC algorithm that is an interactive consensus clustering framework for cross-platform data analysis, the enrichment analysis is performed using a clusterProfiler software package, and the immune cell infiltration evaluation is performed using cell-type identification by estimating relative subsets of ribonucleic acid transcripts (CIBERSORT) and single-sample gene set enrichment analysis (ssGSEA); and the plurality of subtype classes comprise an innate immune activation (IIA) class, a whole immune activation (WIA) class, and an immune homeostasis like (IHL) class, or an IHL class, an IIA class, and an intermediate class.
  3. The method of molecular typing and subtyping classifications for immune-related diseases according to claim 1, wherein a method for selecting the feature genes comprises: setting a maximum number of runs and a number of trees for the marker gene of each of the plurality of subtypes by a random forest method, and inputting marker genes left after screening into Lasso regression of 10-fold cross-validation to leave marker genes with non-zero parameters as the feature genes.
  4. The method of molecular typing and subtyping classifications for immune-related diseases according to claim 1, further comprising: conducting a prediction and an evaluation with a constructed support vector machine model in the training set and the validation set, and evaluating performance of classifying by a confusion matrix, wherein: an accuracy = samples correctly classified / total samples; a sensitivity = a number of positive samples correctly classified / a total number of positive samples; a specificity = a number of negative samples correctly classified / a total number of negative samples; a false positive rate = negative samples determined to be positive / the total number of negative samples; and a false negative rate = positive samples determined to be negative / the total number of positive samples.
  5. The method of molecular typing and subtyping classifications for immune-related diseases according to claim 1, wherein a gamma value and a cost value are selected based on the feature genes to obtain the optimal parameter combination.
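
The confusion-matrix metrics defined in claim 4 can be illustrated with a minimal Python sketch. The function name `confusion_metrics`, the use of "IHL" as the positive label, and the sample labels are assumptions made for illustration only; they are not results or code from the patent.

```python
def confusion_metrics(y_true, y_pred, positive="IHL"):
    """Compute the five claim-4 metrics for a positive-vs-rest labelling."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)  # true positives
    tn = sum(t != positive and p != positive for t, p in pairs)  # true negatives
    fp = sum(t != positive and p == positive for t, p in pairs)  # false positives
    fn = sum(t == positive and p != positive for t, p in pairs)  # false negatives
    total_pos, total_neg = tp + fn, tn + fp
    if total_pos == 0 or total_neg == 0:
        raise ValueError("both positive and negative samples are required")
    return {
        "accuracy": (tp + tn) / (total_pos + total_neg),
        "sensitivity": tp / total_pos,
        "specificity": tn / total_neg,
        "false_positive_rate": fp / total_neg,
        "false_negative_rate": fn / total_pos,
    }


# Hypothetical true and predicted classes for eight samples.
y_true = ["IHL", "IHL", "IHL", "other", "other", "other", "other", "other"]
y_pred = ["IHL", "IHL", "other", "other", "other", "other", "IHL", "other"]
metrics = confusion_metrics(y_true, y_pred)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

By these definitions, sensitivity and false negative rate sum to 1, as do specificity and false positive rate, which gives a quick consistency check on any reported figures.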

Description

CROSS REFERENCE TO THE RELATED APPLICATIONS

This application is the national phase entry of International Application No. PCT/CN2022/112157, filed on Aug. 12, 2022, which is based upon and claims priority to Chinese Patent Application No. 202111276527.3, filed on Oct. 29, 2021, the entire contents of which are incorporated herein by reference.

TECHNICAL FIELD

The present disclosure relates to the technical field of precision medicine, and specifically to a method and a system of molecular typing and subtyping classifier for immune-related diseases based on machine learning of artificial intelligence.

BACKGROUND

Immune-related diseases are caused by an imbalance in the immune regulation of an organism. There are many treatments for immune-related diseases, and biological agents such as monoclonal antibodies in particular are increasingly used. However, patients with the same immune-related disease have different prognoses in clinical practice, indicating that the immune states of patients with the same disease are highly heterogeneous and can hardly be distinguished by clinical manifestations. Therefore, there is an urgent need to accurately type the immune characteristics of patients with immune-related diseases at the molecular level to facilitate clinical prognosis and treatment.

For example, ulcerative colitis (UC), a typical immune-related disease, is characterized by chronic inflammation extending from the rectum to the proximal colon, and places a heavy burden on global medical care. Drugs for treating UC include 5-aminosalicylic acid (5-ASA) drugs, glucocorticoids, azathioprine, anti-tumor necrosis factor (TNF) drugs, anti-integrins, Janus kinase inhibitors, and the like. Currently, in clinical practice, 5-ASA treatment is mainly adopted for patients with mild disease, and glucocorticoids and anti-TNF drugs are often adopted for the remission treatment of patients with moderate to severe disease, but patient prognoses are limited by drug resistance, adverse drug reactions, and high drug prices.

From the perspective of treatment mechanisms, the disruption of intestinal homeostasis, the dysfunction of the intestinal barrier, and an inflammatory response are pathological characteristics of UC patients. The balance between negative regulators of inflammation and pro-inflammatory factors in the intestinal epithelium of a UC patient is disrupted, and the activation of neutrophils and lymphocytes and various cytokines such as interleukin-9 (IL-9), interleukin-13 (IL-13), interleukin-23 (IL-23), and interleukin-36 (IL-36) are involved in the intestinal inflammation of a UC patient. It can be concluded that the disruption of intestinal immune homeostasis is the essence of UC onset, which suggests that poor medication effects in some UC patients are related to the heterogeneity of local immune infiltration in lesions.

Due to the lack of molecular typing for UC, clinical medication for UC is mainly based on the severity of UC pathology. The patent document CN110993099A discloses a method and a system for evaluating the severity of UC based on deep learning, where a UC severity evaluation model is used to output score prediction results of vascular typing, spontaneous bleeding, and erosive ulcer characteristics under a Mayo endoscope, and these score prediction results are then accumulated to obtain an activity index score of UC under the endoscope. In the prior art, a single therapy solution or a combined therapy solution is adopted according to the severity evaluation result. For example, glucocorticoids and immunobiologic agents are often used for the remission treatment of patients with moderate to severe disease. The glucocorticoids have broad-spectrum effects, but have large side effects. The immunobiologic agents often target specific immune targets, but have limited effects and high prices, resulting in poor accuracy and a heavy economic burden.

In addition to UC, many immune-related diseases, such as Crohn's disease (CD), systemic lupus erythematosus, and rheumatoid arthritis, face the above clinical problems. The treatments of these diseases have one thing in common: they involve the use of drugs to dampen the autoimmune response directed against the body itself. The most common drugs are glucocorticoids such as prednisone, hydrocortisone, and dexamethasone. Immunosuppressive drugs share a major adverse effect, namely that they impair the anti-infection and anti-tumor immune functions of the body to varying degrees. Therefore, molecular typing for immune-related diseases is of great significance for understanding the heterogeneity of these diseases, allowing personalized treatments and avoiding over-treatment. However, according to existing reports, there are few studies on accurate and high-quality molecular typing for immune-related diseases. In the present disclosure, UC and CD are taken as two examples to illustrat