CN-121983132-A - Screening method of key transcription factors of metabolic-related fatty liver disease


Abstract

The invention discloses a screening method of key transcription factors of metabolic-related fatty liver disease (MAFLD), comprising the following steps: S1, obtaining transcriptome sequencing data of liver tissue from MAFLD patients and of normal control liver tissue, standardizing the data and correcting batch effects to construct a unified expression matrix, and performing differential expression analysis to screen out a candidate gene set; S2, extracting the transcription factors in the candidate gene set, inferring the regulatory relationships between the transcription factors and their target genes separately with an information-theoretic mutual information algorithm and a tree-based gradient boosting algorithm, and constructing a multi-algorithm fused gene regulatory network; S3, comprehensively scoring the transcription factors in the gene regulatory network and selecting candidate key transcription factors according to their composite score ranking. By combining the ARACNe-AP and GRNBoost algorithms, the method is well suited to capturing complex co-regulation by multiple factors.
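The S2 fusion step can be illustrated with a minimal Python sketch. It assumes the ARACNe-AP mutual information (MI) values and the GRNBoost2 importance scores have already been computed and exported as plain Python structures; the function names (`dpi_prune`, `top_k_targets`, `fuse_networks`), the provenance labels, and the use of a simple edge union as the fusion rule are hypothetical illustrations, not the disclosed implementation. The DPI pruning and the per-TF top-k cut mirror claims 4 and 3 below.

```python
from collections import defaultdict

# Hypothetical sketch of S2: prune MI edges with the data processing
# inequality (DPI), keep the top-k boosting targets per TF, and merge both
# edge sets into one TF -> target network (union is an assumed fusion rule).


def dpi_prune(mi):
    """Drop the weakest edge of every fully connected gene triangle.

    `mi` maps frozenset({gene_a, gene_b}) -> MI value (undirected edges).
    Edge (a, b) is treated as indirect when some gene c is linked to both
    a and b with a strictly larger MI on each side. All decisions use the
    original MI values, so the result does not depend on iteration order.
    """
    neighbours = defaultdict(dict)
    for edge, value in mi.items():
        a, b = tuple(edge)
        neighbours[a][b] = value
        neighbours[b][a] = value

    kept = {}
    for edge, value in mi.items():
        a, b = tuple(edge)
        shared = neighbours[a].keys() & neighbours[b].keys()
        indirect = any(
            value < min(neighbours[a][c], neighbours[b][c]) for c in shared
        )
        if not indirect:
            kept[edge] = value
    return kept


def top_k_targets(importances, k):
    """Keep the k highest-importance targets for each TF.

    `importances` is an iterable of (tf, target, importance) triples,
    e.g. rows of a GRNBoost2-style adjacency table.
    """
    by_tf = defaultdict(list)
    for tf, target, score in importances:
        by_tf[tf].append((score, target))
    kept = set()
    for tf, scored in by_tf.items():
        scored.sort(key=lambda pair: pair[0], reverse=True)
        kept.update((tf, target) for _, target in scored[:k])
    return kept


def fuse_networks(mi_edges, boost_edges, tfs):
    """Union of both edge sets as directed TF -> target pairs.

    Each fused edge records which method(s) support it.
    """
    fused = defaultdict(set)
    for edge in mi_edges:
        a, b = tuple(edge)
        # MI edges are undirected; orient them from whichever end is a TF.
        for tf, target in ((a, b), (b, a)):
            if tf in tfs:
                fused[(tf, target)].add("ARACNe-AP")
    for tf, target in boost_edges:
        fused[(tf, target)].add("GRNBoost2")
    return dict(fused)


# Toy example (invented values, for illustration only).
mi = {
    frozenset({"TF1", "G1"}): 0.9,
    frozenset({"G1", "G2"}): 0.8,
    frozenset({"TF1", "G2"}): 0.3,  # weakest side of triangle TF1-G1-G2
    frozenset({"TF2", "G3"}): 0.5,
}
importances = [("TF1", "G1", 5.0), ("TF1", "G4", 3.0),
               ("TF1", "G5", 1.0), ("TF2", "G3", 2.0)]

network = fuse_networks(dpi_prune(mi), top_k_targets(importances, k=2),
                        tfs={"TF1", "TF2"})
for (tf, target), sources in sorted(network.items()):
    print(tf, "->", target, sorted(sources))
```

In practice k would lie in the 100 to 500 range stated in claim 3; k = 2 only keeps the toy example small. A union keeps edges found by either algorithm while recording which method supports each one; an intersection would be the stricter alternative if only consensus edges are wanted.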

Inventors

LI JIAYAN
ZHU CHANGTAI

Assignees

Shanghai Ocean University (上海海洋大学)
Shanghai Sixth People's Hospital (上海市第六人民医院)

Dates

Publication Date: 2026-05-05
Application Date: 2026-03-30

Claims (10)

1. A screening method of key transcription factors of metabolic-related fatty liver disease, characterized by comprising the following steps: S1, acquiring transcriptome sequencing data of liver tissue from MAFLD patients and of normal control liver tissue, standardizing the transcriptome sequencing data and correcting batch effects, and constructing a unified expression matrix; S2, extracting the transcription factors in the candidate gene set, inferring the regulatory relationships between the transcription factors and target genes separately by an information-theoretic mutual information algorithm and by a tree-based gradient boosting algorithm, and fusing the inference results of the two algorithms to construct a multi-algorithm fused gene regulatory network; S3, comprehensively scoring the transcription factors in the gene regulatory network, and selecting candidate key transcription factors according to the composite score ranking.
2. The method according to claim 1, wherein in step S2, the information-theoretic mutual information algorithm is the ARACNe-AP algorithm, and the tree-based gradient boosting algorithm is GRNBoost2.
3. The method according to claim 2, wherein in step S2, when the GRNBoost2 algorithm is used, only the 100 to 500 target genes with the highest importance scores are retained for each transcription factor.
4. The method according to claim 3, wherein in step S2, when the ARACNe-AP algorithm is used, indirect regulatory relationships are removed using the data processing inequality (DPI).
5. The method according to claim 2, wherein in step S2, the inference results of the two algorithms are merged to form the multi-algorithm fused gene regulatory network.
6. The method according to claim 1, wherein in step S3, the composite score is calculated by computing statistical indices for each transcription factor, taking the negative logarithm of each index and summing the results, the statistical indices comprising: a linear correlation P value, namely the significance P value of the linear regression between transcription factor and target gene expression; a mutual information significance P value, namely the statistical significance P value of the mutual information strength between the transcription factor and its target genes; and an enrichment analysis P value, namely the significance P value, calculated by Fisher's exact test, for enrichment of the transcription factor's target gene set in a known MAFLD disease gene set.
7. The method according to claim 1, wherein in step S1, batch effect correction is performed with the ComBat algorithm, and the screening criteria for the differential expression analysis are |log2 fold change| > 2 and FDR < 0.05.
8. The method according to claim 1, further comprising step S4: constructing an in vitro MAFLD cell model; knocking down each candidate key transcription factor selected in step S3; detecting the expression changes of MAFLD-related pathway genes after knockdown; and eliminating a candidate key transcription factor if its knockdown does not significantly change the expression of MAFLD-related pathway genes.
9. The method for screening key transcription factors for metabolic-related fatty liver disease according to claim 8, wherein the key transcription factors obtained by screening and verification comprise MMP14, GLIS2, SOX9, HEYL, ZNF462, and HAND2.
10. The screening method of key transcription factors of metabolic-related fatty liver disease according to claim 9, wherein, in detecting the expression changes of MAFLD-related pathway genes after knockdown, specific primers are used to detect the expression levels of the key transcription factors, the primer sequences being shown in SEQ ID NOs: 1-12.
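The composite score of claim 6 can be sketched in Python as follows. The sketch assumes the linear-correlation and mutual-information P values are computed upstream and simply passed in. It computes the Fisher's exact enrichment P value directly as the one-sided hypergeometric upper tail (the usual over-representation form; the claim does not state the sidedness) and uses base-10 logarithms, which is also an assumption since the claim only says "negative logarithm". The function names and the `floor` parameter are hypothetical.

```python
import math


def fisher_enrichment_p(targets, disease_genes, universe):
    """One-sided Fisher's exact test for over-representation.

    Returns P(overlap >= observed) under the hypergeometric null, i.e. the
    chance that a random target set of the same size drawn from `universe`
    contains at least as many known MAFLD disease genes.
    """
    universe = set(universe)
    targets = set(targets) & universe
    disease = set(disease_genes) & universe
    n_all, n_tgt, n_dis = len(universe), len(targets), len(disease)
    observed = len(targets & disease)
    # math.comb(n, k) returns 0 when k > n, so impossible terms vanish.
    tail = sum(
        math.comb(n_dis, x) * math.comb(n_all - n_dis, n_tgt - x)
        for x in range(observed, min(n_tgt, n_dis) + 1)
    )
    return tail / math.comb(n_all, n_tgt)


def composite_score(p_values, floor=1e-300):
    """Sum of -log10(p); p is floored so that p == 0 does not raise."""
    return sum(-math.log10(max(p, floor)) for p in p_values)


def rank_tfs(p_values_by_tf):
    """Return TF names sorted by descending composite score."""
    return sorted(p_values_by_tf,
                  key=lambda tf: composite_score(p_values_by_tf[tf]),
                  reverse=True)


# Toy example: 10-gene universe, 3 known disease genes, 3 targets of TF_A.
genes = [f"G{i}" for i in range(10)]
p_enrich = fisher_enrichment_p({"G0", "G1", "G5"}, {"G0", "G1", "G2"}, genes)
stats = {
    "TF_A": [0.01, 0.001, p_enrich],  # [correlation P, MI P, enrichment P]
    "TF_B": [0.2, 0.05, 0.5],
}
print(round(p_enrich, 4), rank_tfs(stats))
```

Because each index enters as -log10(p), a TF ranks high only when its P values are small overall, and a weak index can be offset by strong ones. In a real pipeline the enrichment P value would more likely come from an established routine such as `scipy.stats.fisher_exact` with `alternative="greater"`.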

Description

Screening method of key transcription factors of metabolic-related fatty liver disease

Technical Field

The invention relates to the technical field of medicine, and in particular to a screening method of key transcription factors of metabolic-related fatty liver disease.

Background

Metabolic-related fatty liver disease (Metabolic Associated Fatty Liver Disease, MAFLD) is a complex chronic liver disease caused by metabolic disorders; its pathological process involves several stages, including hepatic fat accumulation, inflammation and fibrosis. Because its etiology is closely tied to metabolism, the term MAFLD has replaced the traditional nomenclature of nonalcoholic fatty liver disease (NAFLD), emphasizing the disease's association with metabolic syndrome. Transcription factors (TFs) are key gene regulators in hepatic metabolism. For example, SREBP1c regulates lipid synthesis and promotes fat accumulation in the liver, and LXR, a nuclear receptor TF, regulates cholesterol, fatty acid and glucose metabolism and plays an important role in the progression of liver metabolic diseases. However, systematic screening studies of key MAFLD transcription factors remain limited. Traditional approaches rely mainly on a single transcriptome dataset or on known candidate genes and may therefore miss new regulators. Existing key TF screening methods each have strengths and weaknesses. Some studies have used co-expression network analysis (e.g., WGCNA) to find disease-related modules and core gene pathways, but such analyses often identify only related gene modules, making it difficult to pinpoint specific regulators. For example, metabolic network analyses have suggested SREBF1, HNF4A, KLF and others as possible key TFs in NAFLD, some of which have been partially validated in mouse models. In addition, some methods incorporate machine learning feature selection to identify disease marker genes.
For example, Lu et al. combined WGCNA with algorithms such as random forest to screen transcriptome data and identified two characteristic genes closely related to MAFLD that were indicative of disease diagnosis and progression. As another example, Lei et al. applied several machine learning methods to inflammatory cell death (apoptosis and necrosis)-related genes in NAFLD patients, screened out TIRAP and GSDMD as diagnostic markers, and built a high-accuracy prediction model (AUC ≈ 0.996 for the combined features; AUC = 0.825 in an external validation dataset). Such methods can mine potential key genes from large datasets for disease prediction, but they have several shortcomings: prediction accuracy is limited by a single algorithm or a single dataset; cross-validation across multi-source data is lacking; without multi-algorithm integration, models are prone to the biases of any single method; and biological validation is relatively scarce, since many screened candidate factors have not been verified by complete functional experiments, so their actual regulatory effects and reliability are hard to guarantee. In summary, there is currently no key transcription factor screening strategy that integrates multiple biological datasets and multiple algorithms and is supported by systematic biological validation. To address these shortcomings, the invention provides a new screening method that improves network prediction accuracy and the reliability of candidate factors, verifies candidate function through cell experiments, and is expected to offer new insight into MAFLD mechanisms and new intervention targets.
Disclosure of Invention

To address the defects and shortcomings of the prior art, the invention provides a screening method of key transcription factors of metabolic-related fatty liver disease. The method reduces bias by fusing multi-cohort data, improves network inference accuracy by combining complementary algorithms based on information theory and on tree models, and thereby significantly improves the accuracy and biological reliability of the screening results. The object of the invention is achieved by the following technical solution: A screening method of key transcription factors of metabolic-related fatty liver disease, comprising the following steps: S1, acquiring transcriptome sequencing data of liver tissue from MAFLD patients and of normal control liver tissue, standardizing the transcriptome sequencing data and correcting batch effects, and constructing a unified expression matrix; S2, extracting the transcription factors in the candidate gene set, inferring the regulatory relationships between the transcription factors and target genes separately by using a mutual information algorithm based on information theory and a gradient boosting algorithm based