CN-121999871-A - Method for detecting viral load based on single cell transcriptome sequencing

Abstract

The invention relates to the technical fields of single-cell transcriptomics and microbiology, and in particular to a method for detecting viral load based on single-cell transcriptome sequencing. The method comprises: extracting single-cell transcriptome sequencing data to form a single-cell gene expression matrix, and extracting from the host alignment file the reads that did not align to the host; dynamically translating those reads to generate amino acid sequences; aligning the amino acid sequences against the viral protein sequences in a virus database, scoring the alignments, and applying a distance weight correction to obtain weighted alignment scores, the weighted alignment scores and the reads together forming a virus sequence-cell matrix; and inputting the viral reads data, a host cell immune response index, and a cell type sensitivity score into a cell infection tendency index formula, the cell infection probability being obtained through a logistic regression model. The invention overcomes the technical problem of poor sensitivity to RNA viruses with high mutation rates, achieves quantitative estimation of the infection probability of single cells, and is suitable for detecting distantly related viruses or novel viruses not yet recorded in databases.
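The translation step in the abstract can be illustrated with a minimal sketch. It assumes that "dynamic translation" means translating each read in all three reading frames on both strands (six-frame translation); the source does not spell out the procedure, so this is an interpretation, and the function names are hypothetical. The codon table is the standard genetic code (NCBI translation table 1).

```python
# Minimal six-frame translation sketch (an assumed reading of the
# "dynamic translation" step; not the patent's stated implementation).

# Standard genetic code (NCBI translation table 1), codons in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def translate_frame(seq, offset):
    """Translate seq starting at offset (0, 1 or 2).

    Codons containing ambiguous bases are emitted as 'X';
    stop codons are emitted as '*'.
    """
    seq = seq.upper()
    codons = (seq[i:i + 3] for i in range(offset, len(seq) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def six_frame_translation(read):
    """Translate a read in three frames on each strand.

    Keys are '+1', '+2', '+3' for the forward strand and
    '-1', '-2', '-3' for the reverse-complement strand.
    """
    frames = {}
    for strand, seq in (("+", read), ("-", reverse_complement(read))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate_frame(seq, offset)
    return frames
```

Each of the six peptide strings could then be passed to a protein-level aligner; which frame is kept, and how stop codons are handled, are design choices the source leaves open.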

Inventors

ZHAO JIANJUN
LI SHANGTONG
DENG SHUHAN
LIU QING
MA WEI

Assignees

格致博雅生物科技(嘉兴)有限公司
河北农业大学

Dates

Publication Date: 2026-05-08
Application Date: 2026-04-10

Claims (10)

1. A method for detecting viral load based on single-cell transcriptome sequencing, comprising the steps of: S1, extracting single-cell transcriptome sequencing data to form a single-cell gene expression matrix, reading from the host alignment file the reads that did not align to the host reference genome, recording the cell barcodes and unique molecular identifiers of those reads, and extracting the corresponding reads data; S2, dynamically translating the reads data extracted in step S1 to generate amino acid sequences; S3, aligning the amino acid sequences against the viral protein sequences in a virus database, scoring the alignments, and applying a distance weight correction to obtain weighted alignment scores, wherein the weighted alignment scores and the reads data from step S1 form a virus sequence-cell matrix; S4, obtaining the viral reads data V in each single cell from the virus sequence-cell matrix of step S3, obtaining a host cell immune response index H from IFN pathway gene sets based on the single-cell gene expression matrix of step S1, and assigning a cell type sensitivity score C according to cell type; S5, inputting the viral reads data V, the host cell immune response index H, and the cell type sensitivity score C obtained in step S4 into an infection tendency index quantitative model to obtain the cell infection probability.
2. The method of claim 1, wherein in step S3 the distance weight correction is calculated by a formula [not reproduced in the source text], in which the terms are, in order: the alignment score corrected by the distance weight; the base alignment score obtained from the alignment; the evolutionary distance between two nodes in the phylogenetic tree constructed from the viral protein reference database; and the evolutionary distance weight function [definition not reproduced in the source text].
3. The method of claim 2, wherein a value [symbol not reproduced in the source text] of less than 0.05 is judged to indicate an unknown virus.
4. The method for detecting viral load based on single cell transcriptome sequencing of claim 1, wherein in step S4, the cell type sensitivity score C is in the range of 0 to 1.
5. The method of claim 1, wherein in step S5 the infection tendency index quantitative model is given by a formula [not reproduced in the source text] built on a sigmoid function; the model parameters a, C, d, e are, respectively, a parameter estimated by maximum likelihood, a supervised learning parameter, an EM algorithm parameter, and a fixed empirical parameter; V CPM is the CPM normalization of the V viral reads, reflecting the absolute expression level of viral RNA; H is the normalized host immune index, capturing the host immune response; and C is the normalized cell type sensitivity, representing differences in susceptibility between cell types.
6. The method of claim 5, wherein the ILI value is in the range of 0 to 1.
7. A device for detecting viral load based on single-cell transcriptome sequencing, comprising: a positive- and negative-strand dynamic translation module, configured to dynamically translate the reading frames of the reads data on the positive and negative strands to generate amino acid sequences; a viral protein alignment module based on evolutionary distance weighting, configured to calculate the phylogenetic distances between viral proteins, align the amino acid sequences using alignment software to obtain alignment scores, and set weighting coefficients according to the distances, thereby identifying viral sequences and obtaining a virus sequence-cell matrix; and an ILI value calculation module, configured to obtain the viral load feature V from the virus sequence-cell matrix, together with the host immune response index H and the cell type susceptibility score C, and to obtain the ILI value of each cell according to the infection tendency index quantitative model.
8. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the program, implements the method of detecting viral load based on single cell transcriptome sequencing of any one of claims 1 to 6.
9. A non-transitory computer readable storage medium having stored thereon a computer program, wherein the computer program when executed by a processor implements the method of detecting viral load based on single cell transcriptome sequencing according to any one of claims 1 to 6.
10. A computer program product comprising a computer program which, when executed by a processor, implements the method of detecting viral load based on single cell transcriptome sequencing according to any one of claims 1 to 6.
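The scoring step in claims 5 and 6 can be sketched as follows. The actual formula is not reproduced in the source, so the linear combination inside the sigmoid, the log transform of the CPM value, and the weights `w_v`, `w_h`, `w_c` and `bias` are all hypothetical placeholders chosen for illustration; only the use of CPM-normalized viral reads, the inputs H and C, the sigmoid, and the 0 to 1 output range come from the claims.

```python
import math


def cpm(viral_reads, total_reads):
    """Counts per million: viral reads scaled by the cell's total read count."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return viral_reads / total_reads * 1_000_000


def sigmoid(x):
    """Numerically stable logistic function mapping any real x into (0, 1)."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def infection_index(v_cpm, h, c, w_v=1.0, w_h=1.0, w_c=1.0, bias=0.0):
    """Hypothetical infection tendency index in the 0 to 1 range.

    ASSUMPTION: a linear combination of log1p(V_CPM), H and C passed
    through a sigmoid. The weights and bias are illustrative defaults,
    not the fitted parameters described in claim 5.
    """
    score = w_v * math.log1p(v_cpm) + w_h * h + w_c * c + bias
    return sigmoid(score)
```

For example, `infection_index(cpm(5, 10_000), h=0.5, c=0.5)` scores a cell with 5 viral reads out of 10,000. In practice the weights would have to be estimated from labeled data, and the sign of the H term (whether a stronger interferon response raises or lowers the index) is not something the source settles.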

Description

Method for detecting viral load based on single-cell transcriptome sequencing

Technical Field

The invention relates to the technical fields of single-cell transcriptomics and microbiology, and in particular to a method for detecting viral load based on single-cell transcriptome sequencing.

Background

With the rapid development of high-throughput sequencing technology, single-cell transcriptome sequencing (single-cell RNA sequencing, scRNA-seq) has become an important tool for resolving cellular heterogeneity, immune responses, the composition of multicellular systems, and disease progression. Compared with traditional bulk tissue transcriptome sequencing, single-cell transcriptome sequencing can profile the expression of each individual cell and reveal differences between cells, and it is widely applied in research areas such as the tumor microenvironment, immune system development, infectious disease models, and multi-tissue atlas construction. In recent years, a growing number of studies have shown that scRNA-seq data contain not only host transcription information but may also carry transcript fragments derived from exogenous nucleic acids such as viruses. In research on viral infection, scRNA-seq can identify virus-infected cells at single-cell resolution, analyze how viruses are distributed among different cell types, and study the changes in host gene expression caused by infection, which gives it considerable scientific significance and application potential. As single-cell transcriptome sequencing becomes more widespread, identifying viral infection at the single-cell level has become an important requirement both for research on host-pathogen interactions and for the monitoring of novel viruses. However, current scRNA-seq virus detection technology still has a number of limitations.
Traditional virus detection methods rely mainly on the reference genomes of known viruses: sequencing data are matched to the reference sequences by nucleotide alignment in order to identify infection events. However, such methods cannot discover unknown or distantly related viruses, are insensitive to mutated viruses, and are computationally intensive and slow. For example, patent CN115512767A discloses a method for detecting and analyzing viral expression in single-cell transcriptome sequencing data. That scheme constructs a composite "host-virus" reference genome by downloading the reference genome sequences of a host and a specific virus and adding the viral genome to the host genome as a specific chromosome. The single-cell sequencing reads are then aligned to the combined reference genome to obtain the coverage and expression levels of the different viral genes, and the expression correlation between viral genes and host genes is calculated to analyze virus-host interactions. Although this technique enables quantitative analysis of transcripts of known viruses at the single-cell level, it relies on the complete reference genome of a known virus; it can neither identify viruses that lack a reference sequence nor discover unknown or novel viral sequences in single-cell sequencing data. The approach is therefore significantly limited when dealing with unknown pathogens, mixed infections, or novel viruses in complex tissue microenvironments. Patent CN116758988A, directed to a method and apparatus for analyzing microbial information in single-cell transcriptome data, likewise determines microbial information by reference genome alignment.

In summary, the existing methods have the following problems:

a) The prior art generally relies on the reference genomes or virus-specific gene signatures of known viruses, so it can detect only viruses that are already recorded in a database. It cannot monitor novel viruses, unknown variant strains, or distantly related viruses that differ substantially from known viruses.

b) The prior art is insensitive to RNA viruses with high mutation rates. Existing nucleotide alignment methods adapt poorly to the high mutation rates of RNA viruses: base changes readily cause alignment to fail, so infected cells may be misjudged as uninfected, which reduces detection accuracy and leaves virus identification insufficiently robust.

c) Existing methods only judge whether infection is present. They lack a model that integrates information such as viral load, cell type sensitivity, and multi-virus co-infection, so the degree or probability of infection cannot be accurately inferred.

Therefore, a new technology that does not rely on a complete viral reference genome, tolerates viral mutations, and can rapidly identify viral infection at single-cell resolution is urgently needed in the fields of virology and single-cell sequencing.

Disclosure of Inv