CN-122024819-A - Method for screening atopic dermatitis key autophagy genes based on transcriptome and machine learning
Abstract
The invention discloses a method for screening key autophagy genes for atopic dermatitis (AD) based on transcriptome and machine learning. The method comprises collecting AD transcriptome and single-cell data; screening differentially expressed genes after preprocessing; obtaining candidate genes through a three-stage screening that combines WGCNA analysis, intersection with autophagy genes and filtering by differentially expressed genes; building models with 8 machine learning algorithms and selecting preferred models by a dual criterion of residual error and AUC; obtaining key autophagy genes from the intersection of the core genes of the preferred models; performing four-dimensional verification through diagnostic performance, immune infiltration, single-cell localization and causality; and predicting and verifying targeted drugs. The invention identifies RELB, TP53INP2, TNFSF10 and PRKCB as key genes, with an average validation-set AUC of 0.872, and matches 4 high-affinity drugs. It addresses the low screening accuracy, missing validation system and poor clinical translation of the prior art, and provides technical support for precise diagnosis and treatment of AD.
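The average validation-set AUC quoted above can be checked against the per-gene AUC values that claim 6 lists for validation set GSE121212. The following is a minimal sketch of that arithmetic, not part of the patented method. The variable and function names are illustrative, and half-up rounding to three decimals is an assumption about how the reported figure was rounded.

```python
from decimal import Decimal, ROUND_HALF_UP

# Per-gene AUC values on validation set GSE121212, as listed in claim 6.
validation_auc = {
    "RELB": Decimal("0.956"),
    "TP53INP2": Decimal("0.763"),
    "TNFSF10": Decimal("0.909"),
    "PRKCB": Decimal("0.858"),
}


def mean_auc(aucs):
    """Arithmetic mean of the AUC values, computed exactly with Decimal."""
    values = list(aucs.values())
    return sum(values, Decimal("0")) / len(values)


def round_half_up(value, places="0.001"):
    """Round to three decimals using half-up rounding (an assumption)."""
    return value.quantize(Decimal(places), rounding=ROUND_HALF_UP)


if __name__ == "__main__":
    average = mean_auc(validation_auc)
    print(average, round_half_up(average))
```

The four values sum to 3.486, so the mean is 0.8715, which rounds half-up to the stated 0.872.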
Inventors
- CHEN TAO
- YANG XING
Assignees
- The First People's Hospital of Changde City (常德市第一人民医院)
Dates
- Publication Date
- 20260512
- Application Date
- 20260127
Claims (10)
- 1. A method for screening key autophagy genes for atopic dermatitis (AD) based on transcriptome and machine learning, comprising the steps of: Step S1, data collection and preprocessing, namely acquiring AD-related transcriptome datasets from the GEO database, comprising a training set GSE193309, a validation set GSE121212 and a single-cell RNA sequencing dataset GSE269981; acquiring 371 autophagy-related genes from the HADb database; screening the transcriptome datasets for differentially expressed genes (DEGs) with a screening threshold of |log2FC| > 0.5 and P < 0.05; and performing quality control and normalization on the single-cell data to screen highly variable genes; Step S2, screening AD autophagy candidate genes by a three-stage intersection, namely screening hub genes of AD-related modules through weighted gene co-expression network analysis (WGCNA), the hub genes being defined as genes with module membership MM > 0.8, gene significance GS > 0.1 and weight > 0.1, to obtain a hub gene set; intersecting the hub genes with the autophagy genes to obtain autophagy-related hub genes; and intersecting the autophagy-related hub genes with the DEGs to obtain an AD autophagy candidate gene set; Step S3, screening key autophagy genes by multiple algorithms and a dual criterion, namely constructing models with 8 machine learning algorithms, selecting preferred models with residual error < 0.5 and AUC > 0.9, and taking the intersection of the core genes of the preferred models to obtain the AD key autophagy genes; Step S4, four-dimensional verification, namely confirming the reliability of the key autophagy genes by diagnostic performance verification, immune infiltration association verification, single-cell localization verification and causal relationship verification; and Step S5, predicting drugs targeting the key genes, namely screening candidate small-molecule drugs related to the key autophagy genes through the DSigDB database, acquiring the three-dimensional structures of the candidate drugs from the PubChem database, performing molecular docking simulation with AutoDock Vina software, taking a binding free energy < -4 kcal/mol as the criterion for effective binding, and analyzing the interaction mode between each drug and its target protein.
- 2. The method for screening key autophagy genes for atopic dermatitis based on transcriptome and machine learning according to claim 1, wherein in step S1, the training set GSE193309 comprises 111 AD samples and 112 normal control samples, the validation set GSE121212 comprises 21 AD samples and 38 normal control samples, the single-cell RNA sequencing dataset GSE269981 comprises samples from 5 AD patients and 4 normal controls, and the criterion for screening highly variable genes is an average expression level > 0.1 and expression in not less than 10 cells.
- 3. The method for screening key autophagy genes for atopic dermatitis based on transcriptome and machine learning according to claim 1, wherein the WGCNA parameters in step S2 are soft threshold power = 6, minimum module size = 30 and module cut height = 0.25, and the AD autophagy candidate gene set in step S2 contains 15 genes; preferably, the 8 machine learning algorithms in step S3 are decision tree, gradient boosting machine, generalized linear model, k-nearest neighbors, LASSO regression, neural network, random forest and support vector machine, and the preferred models are LASSO regression, gradient boosting machine, generalized linear model and support vector machine.
- 4. The method for screening key autophagy genes for atopic dermatitis based on transcriptome and machine learning according to claim 1, wherein the AD key autophagy genes in step S3 are RELB, TP53INP2, TNFSF10 and PRKCB.
- 5. The method for screening key autophagy genes for atopic dermatitis based on transcriptome and machine learning according to claim 1, wherein the diagnostic performance verification in step S4 specifically comprises analyzing the diagnostic efficacy of the key autophagy genes by ROC curves in the training set GSE193309 and the validation set GSE121212, with AUC > 0.7 as the criterion for an effective diagnostic marker.
- 6. The method of claim 1, wherein in the diagnostic performance verification in step S4, on the validation set GSE121212, AUC = 0.956 for RELB, AUC = 0.763 for TP53INP2, AUC = 0.909 for TNFSF10, and AUC = 0.858 for PRKCB.
- 7. The method for screening key autophagy genes for atopic dermatitis based on transcriptome and machine learning according to claim 1, wherein the immune infiltration association verification in step S4 specifically comprises quantifying the infiltration proportions of 22 immune cell types in AD samples and normal samples by the CIBERSORT algorithm, and investigating the correlation of the key autophagy genes with the differentially infiltrating immune cells by Spearman correlation analysis (P < 0.05).
- 8. The method according to claim 1, wherein in the immune infiltration association verification in step S4, RELB, TP53INP2 and TNFSF10 are positively correlated with activated NK cells, M0 macrophages and resting mast cells (P < 0.05), and PRKCB is negatively correlated with said cells (P < 0.05).
- 9. The method for screening key autophagy genes for atopic dermatitis based on transcriptome and machine learning according to claim 1, wherein the single-cell localization verification in step S4 specifically comprises performing UMAP dimensionality reduction and Louvain clustering on the GSE269981 dataset, annotating cell types with known cell-type marker genes, and analyzing the expression distribution of the key autophagy genes in each cell type; the cell types comprise 9 types, namely keratinocytes, fibroblasts, T cells, endothelial cells, macrophages, smooth muscle cells, NK cells, dendritic cells (DC) and epithelial cells; and in the single-cell localization verification, RELB and TNFSF10 are mainly highly expressed in keratinocytes, and TP53INP2 is mainly highly expressed in smooth muscle cells.
- 10. The method according to claim 1, wherein the causal relationship verification in step S4 specifically comprises a two-sample Mendelian randomization (MR) analysis with the key autophagy gene as the exposure variable and AD as the outcome variable, the causal relationship being evaluated by the inverse variance weighted (IVW) method and the robustness of the result being verified by MR-Egger regression, Cochran's Q test and leave-one-out analysis; wherein in the causal relationship verification in step S4, PRKCB is negatively associated with AD, with OR = 0.84 (95% CI: 0.71-0.99) and P = 0.037, and the MR-Egger regression and Cochran's Q test results show no significant heterogeneity and no horizontal pleiotropy; and wherein the drugs with binding free energy < -4 kcal/mol in step S5 include PYRVINIUM (RELB), NALOXONE HYDROCHLORIDE (TP53INP2), DIGITOXIN (TNFSF10) and BISINDOLYLMALEIMIDE I (PRKCB).
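The filtering logic of steps S1 to S3 in claim 1 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation used by the applicants: the function names, the input data layouts and all toy values in the demo are hypothetical. Only the thresholds (|log2FC| > 0.5, P < 0.05, residual error < 0.5, AUC > 0.9) are taken from the claims.

```python
from typing import Dict, Iterable, List, Set, Tuple


def filter_degs(stats: Dict[str, Tuple[float, float]],
                min_abs_log2fc: float = 0.5,
                max_p: float = 0.05) -> Set[str]:
    """Step S1 filter: keep genes with |log2FC| > 0.5 and P < 0.05.

    `stats` maps gene -> (log2FC, p_value); this layout is an assumption.
    """
    return {gene for gene, (log2fc, p) in stats.items()
            if abs(log2fc) > min_abs_log2fc and p < max_p}


def three_stage_intersection(hub_genes: Iterable[str],
                             autophagy_genes: Iterable[str],
                             degs: Iterable[str]) -> Set[str]:
    """Step S2: hub genes, then autophagy-related hub genes, then candidates."""
    autophagy_hubs = set(hub_genes) & set(autophagy_genes)
    return autophagy_hubs & set(degs)


def select_preferred_models(metrics: Dict[str, Dict[str, float]],
                            max_residual: float = 0.5,
                            min_auc: float = 0.9) -> List[str]:
    """Step S3 dual criterion: residual error < 0.5 and AUC > 0.9."""
    return sorted(name for name, m in metrics.items()
                  if m["residual"] < max_residual and m["auc"] > min_auc)


def intersect_core_genes(core_genes: Dict[str, Iterable[str]],
                         preferred: List[str]) -> Set[str]:
    """Step S3: genes shared by the core gene sets of all preferred models."""
    gene_sets = [set(core_genes[name]) for name in preferred]
    return set.intersection(*gene_sets) if gene_sets else set()


if __name__ == "__main__":
    # Hypothetical toy inputs, for illustration only.
    stats = {"RELB": (1.1, 0.001), "PRKCB": (-0.9, 0.02),
             "GENE_A": (0.2, 0.001), "GENE_B": (1.5, 0.30)}
    degs = filter_degs(stats)
    candidates = three_stage_intersection(
        hub_genes={"RELB", "PRKCB", "GENE_B"},
        autophagy_genes={"RELB", "PRKCB", "GENE_A"},
        degs=degs,
    )
    metrics = {"lasso": {"residual": 0.30, "auc": 0.95},
               "knn": {"residual": 0.60, "auc": 0.95}}
    preferred = select_preferred_models(metrics)
    print(candidates, preferred)
```

All comparisons are strict, matching the inequalities as written in the claims. Sorting the preferred model names only makes the output deterministic and has no bearing on the resulting gene intersection.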
Description
Method for screening atopic dermatitis key autophagy genes based on transcriptome and machine learning
Technical Field
The invention relates to the technical field of bioinformatics and dermatological diagnosis and treatment, in particular to a method for screening key autophagy genes for atopic dermatitis based on transcriptome and machine learning.
Background
Atopic dermatitis (AD) is a common chronic inflammatory skin disease. Its characteristic symptoms include eczematous skin lesions, chronic itching and an impaired skin barrier, which severely affect the patient's quality of life. Prior studies show that abnormal interaction between epithelial cells and dermal immune cells is a core trigger of AD progression, and that autophagy dysfunction plays a key role in the immune imbalance and epidermal barrier defects of AD: an autophagy defect can lead to CEBPB protein accumulation and inhibit JAK1-STAT6 pathway activation, while autophagy activation can relieve IL-4/IL-13-mediated skin barrier damage. Screening AD-related key autophagy genes is therefore important for research into the disease mechanism and for clinical translation. Currently, screening of AD-related key autophagy genes relies mainly on differential expression analysis of a single transcriptome dataset, or on simple validation with only a few bioinformatics tools, and has significant limitations.
On the one hand, traditional methods depend on a single dataset with a limited sample size and are easily affected by batch effects, so the screened genes have poor stability and a high false positive rate. On the other hand, the prior art lacks systematic analysis of gene co-expression networks, cell-specific expression and causal relationships; a direct association between genes and AD cannot be ensured by bioinformatics prediction alone, and no complete technical chain of gene screening, functional verification and drug matching has been formed, so the results are difficult to apply directly to clinical translation. With the development of high-throughput sequencing technology, multi-omics data such as transcriptome sequencing and single-cell RNA sequencing data are increasingly abundant, but how to efficiently integrate these heterogeneous data and mine core functional genes has become a technical bottleneck. Existing applications of machine learning to gene screening are limited to 1-2 algorithms, and the lack of cross-validation across multiple algorithms makes the results insufficiently reliable. Meanwhile, most methods are not combined with verification means such as immune infiltration analysis and Mendelian randomization, and so cannot clarify the role and causal relationship of genes in the AD immune microenvironment, which further limits the clinical value of the screening results. The prior art therefore cannot solve the problems of screening accuracy, systematic verification and translational continuity for AD key autophagy genes, and there is a need in the art for a method for screening key autophagy genes for atopic dermatitis based on multi-omics integration, multi-algorithm validation and multi-dimensional functional verification.
Disclosure of Invention
The invention aims to overcome the defects of the prior art, namely a single screening dimension, an incomplete verification system and a disconnect from clinical translation, and provides a method for screening AD key autophagy genes that combines three-stage screening, four-dimensional verification and drug matching, achieving high-precision screening of key genes, multi-dimensional functional confirmation and a closed loop to clinical translation. To achieve this aim, the invention adopts the following technical scheme: a method for screening key autophagy genes for atopic dermatitis based on transcriptome and machine learning, comprising the following steps: Step S1, data collection and preprocessing, namely acquiring AD-related transcriptome datasets from the GEO database, comprising a training set GSE193309, a validation set GSE121212 and a single-cell RNA sequencing dataset GSE269981; acquiring 371 autophagy-related genes from the HADb database; screening the transcriptome datasets for differentially expressed genes (DEGs) with a screening threshold of |log2FC| > 0.5 and P < 0.05; and performing quality control and normalization on the single-cell data to screen highly variable genes; Step S2, screening AD autophagy candidate genes by a three-stage intersection, namely screening hub genes of AD-related modules through weighted gene co-expression network analysis (WGCNA), the hub genes being defined as genes with module membership MM > 0.8, gene significance GS > 0.1 and weight > 0.1, to obtain a hub gene set; intersecting the hub genes with the autophagy genes to obtain autophagy-related hub genes; and intersecting the autophagy-related hub genes with the DEGs to obtain AD autophagy c