CN-122024847-A - Genital tract microorganism and drug resistance gene analysis method based on targeted nanopore sequencing data
Abstract
The invention discloses a method for analyzing genital tract microorganisms and drug resistance genes based on targeted nanopore sequencing data. The method comprises: constructing a microorganism reference database; obtaining the raw targeted nanopore sequencing data of a sample; performing quality control on the raw sequencing data to obtain a set of high-quality valid sequences, and, after primer alignment, performing targeted sequence identification to extract the targeted sequences; aligning the extracted targeted sequences against the microorganism reference database and performing abundance estimation, calculation of species-unique aligned reads, and quality and batch contamination control, then outputting a microorganism detection result; and removing human reads from the extracted targeted sequences, aligning them against the drug resistance gene reference database, detecting and annotating mutations, filtering the results, and combining them with the microorganism detection result. Data analysis with the method takes only 6 hours, markedly less than second-generation metagenomic sequencing and nanopore metagenomic sequencing, so the method is rapid and efficient.
Inventors
- GAO PENG
- SHEN DAN
- LIN DECHUN
- SHI HONG
- NIE LI
- ZHOU DONGQIN
Assignees
- 杭州迪安生物技术有限公司
Dates
- Publication Date
- 20260512
- Application Date
- 20260119
Claims (10)
- 1. A method for analyzing genital tract microorganisms and drug resistance genes based on targeted nanopore sequencing data, characterized by comprising the following steps: S1, constructing a microorganism reference database, wherein the microorganism reference database comprises a microorganism alignment database, a microorganism annotation database, a drug resistance gene alignment database and a drug resistance gene annotation database; S2, obtaining the raw targeted nanopore sequencing data of a sample; S3, preprocessing the raw sequencing data, namely performing quality control on the raw sequencing data to obtain a set of high-quality valid sequences, and, after primer alignment, performing targeted sequence identification to extract the targeted sequences; S4, using the targeted sequences extracted in step S2 and the microorganism reference database of step S1, performing species alignment and abundance estimation, calculation of species-unique aligned reads, and quality and batch contamination control for the microorganisms, and then outputting a microorganism detection result; S5, after removing human reads from the targeted sequences extracted in step S2, aligning them against the drug resistance gene reference database of step S1, performing mutation detection and annotation, filtering the results, and combining them with the microorganism detection result of step S4 to output a final result.
- 2. The method for analyzing genital tract microorganisms and drug resistance genes based on targeted nanopore sequencing data according to claim 1, wherein the targeted sequence identification in step S3 is performed as follows: S3-1, setting a sequence similarity threshold of ≥75%, aligning the sequencing reads against the primer sequences, and retaining only the best match for each read; S3-2, requiring a primer coverage of ≥90%, no more than 3 bp of primer mismatches, primer match sites located within 15 bp of the two ends of the amplified fragment, and an amplified fragment length of 200-2000 bp, and retaining as targeted sequences only those reads that meet the orientation and position requirements.
- 3. The method of claim 1, wherein the threshold abundance in the abundance estimation in step S4 is set to 0.1%.
- 4. The method for analyzing genital tract microorganisms and drug resistance genes based on targeted nanopore sequencing data according to claim 1, wherein in step S4 the species-unique aligned reads are calculated as follows: S4-1, extracting the candidate alignments of each read and calculating the alignment scores; S4-2, selecting the best and second-best alignments among the candidates, and judging the read to be uniquely assigned to the species if the ratio of the second-best score to the best score is less than 0.9; S4-3, also marking the read as uniquely aligned if only a single trusted alignment exists or no competing species exists; S4-4, counting the number of uniquely aligned reads for each species and using it for species abundance correction and threshold judgment; S4-5, setting species-positive thresholds on the number of uniquely aligned reads for different pathogenic microorganisms, namely ≥10 for bacteria, ≥30 for parasites, ≥30 for fungi and ≥3 for viruses, wherein a species is judged positive only when the detection result meets the threshold, and results below the threshold are not reported.
- 5. The method for analyzing genital tract microorganisms and drug resistance genes based on targeted nanopore sequencing data according to claim 1, wherein in the quality and batch contamination control of step S4, the quality standard is set to Q20 > 85% and Q30 > 75% with a proportion of valid microbial sequences of not less than 10%, and the batch standard is set such that if the reads of a species in a single sample account for <1% of the total reads of that species in the batch, they are judged to be contamination or background and removed.
- 6. The method according to claim 1, wherein in step S4 the microorganism detection result includes the species name, the number of uniquely aligned reads, the relative abundance, the batch filtering status and clinical comments.
- 7. The method for analyzing genital tract microorganisms and drug resistance genes based on targeted nanopore sequencing data according to claim 1, characterized in that the drug resistance gene sequence alignment in step S5 must satisfy a sequence identity of ≥90%, a coverage of ≥40% and an effective alignment length of ≥450 bp.
- 8. The method for analyzing genital tract microorganisms and drug resistance genes based on targeted nanopore sequencing data according to claim 7, characterized in that the result filtering in step S5 adopts a two-level filtering standard: at the gene presence/absence level, only positive results supported by ≥5 reads are retained; at the SNP level, a mutation site must have a coverage depth of ≥10 and a mutation frequency of ≥25%.
- 9. The method for analyzing genital tract microorganisms and drug resistance genes based on targeted nanopore sequencing data according to claim 1, wherein in the final output of step S5, a drug resistance gene and its drug resistance SNP sites are reported only if the corresponding host microorganism is detected in the sample; if the corresponding host microorganism is not detected, they are not included in the report.
- 10. The method for analyzing genital tract microorganisms and drug resistance genes based on targeted nanopore sequencing data according to claim 1, wherein in step S5 the drug resistance gene detection result includes the gene name, the mutation site, the normalized read count, the coverage depth, the mutation frequency and the drug resistance determination.
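For illustration only, the targeted-sequence criteria of claim 2 can be read as a simple per-read predicate. The following Python sketch is not the patented implementation. It assumes that a primer aligner has already reported the best forward and reverse primer hit for each read. The `PrimerHit` record, its field names, and the reading of the "orientation" requirement as "the two primers lie on opposite strands" are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PrimerHit:
    """Hypothetical record for the best primer match at one end of a read."""
    identity: float    # percent identity of the primer alignment
    coverage: float    # percent of the primer covered by the alignment
    mismatches: int    # mismatched bases within the primer
    end_offset: int    # distance in bp from the primer match to the nearest fragment end
    strand: str        # '+' or '-'


def hit_passes(hit: PrimerHit) -> bool:
    """Apply the per-primer thresholds of S3-1 and S3-2."""
    return (
        hit.identity >= 75.0
        and hit.coverage >= 90.0
        and hit.mismatches <= 3
        and hit.end_offset <= 15
    )


def is_target_read(fwd: PrimerHit, rev: PrimerHit, amplicon_len: int) -> bool:
    """Keep a read only if both primers pass and the fragment length is 200-2000 bp."""
    # Assumption: a valid amplicon has its two primers on opposite strands.
    opposite_strands = fwd.strand != rev.strand
    return (
        hit_passes(fwd)
        and hit_passes(rev)
        and opposite_strands
        and 200 <= amplicon_len <= 2000
    )
```

A read is retained as a targeted sequence only when `is_target_read` returns `True`; everything else is discarded before species alignment.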
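The species-unique read logic of claim 4 and the batch rule of claim 5 can be sketched as follows. This is a minimal sketch under stated assumptions, not the claimed implementation: alignment scores are assumed to be positive with higher meaning better, the "second-best" alignment is taken to be the best alignment to a different species, and the function names and input layout (a list of `(species, score)` pairs per read) are hypothetical.

```python
from collections import Counter

# Positive-call thresholds on uniquely aligned reads, as listed in claim 4 (S4-5).
MIN_UNIQUE_READS = {"bacteria": 10, "parasite": 30, "fungus": 30, "virus": 3}


def unique_species(alignments, ratio_cutoff=0.9):
    """Return the species a read is uniquely assigned to, or None.

    `alignments` is a list of (species, score) pairs for a single read.
    """
    best = {}
    for species, score in alignments:
        # Keep only the best score per species (assumption: higher is better).
        best[species] = max(score, best.get(species, float("-inf")))
    if not best:
        return None
    ranked = sorted(best.items(), key=lambda kv: kv[1], reverse=True)
    if len(ranked) == 1:
        # S4-3: a single alignment or no competing species counts as unique.
        return ranked[0][0]
    (top_species, top_score), (_, second_score) = ranked[0], ranked[1]
    # S4-2: unique if second-best / best < 0.9.
    if top_score > 0 and second_score / top_score < ratio_cutoff:
        return top_species
    return None


def call_species(reads, kingdom_of):
    """Count unique reads per species (S4-4) and apply the thresholds (S4-5)."""
    counts = Counter()
    for alignments in reads:
        species = unique_species(alignments)
        if species is not None:
            counts[species] += 1
    return {
        species: n
        for species, n in counts.items()
        if n >= MIN_UNIQUE_READS[kingdom_of[species]]
    }


def batch_filter(sample_counts, batch_totals, min_fraction=0.01):
    """Claim 5: drop a species whose reads are <1% of its total reads in the batch."""
    return {
        species: n
        for species, n in sample_counts.items()
        if n / batch_totals[species] >= min_fraction
    }
```

Collapsing alignments to one best score per species before ranking is a design choice of this sketch: it makes several alignments to the same species behave as "no competing species", which matches S4-3.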
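The drug resistance filters of claims 7 to 9 reduce to a few threshold checks plus a gate on the detected host species. The sketch below is illustrative only. It assumes identity and coverage are given in percent and that mutation frequency is the fraction of reads at a site carrying the variant. The dictionary keys and the gene and species names in the usage note are hypothetical placeholders.

```python
def alignment_passes(identity, coverage, aligned_len):
    """Claim 7: identity >= 90%, coverage >= 40%, effective alignment length >= 450 bp."""
    return identity >= 90.0 and coverage >= 40.0 and aligned_len >= 450


def gene_passes(supporting_reads):
    """Claim 8, gene presence/absence level: at least 5 supporting reads."""
    return supporting_reads >= 5


def snp_passes(depth, variant_reads):
    """Claim 8, SNP level: depth >= 10 and mutation frequency >= 25%.

    Assumption: mutation frequency = variant_reads / depth.
    """
    return depth >= 10 and variant_reads / depth >= 0.25


def report_resistance(gene_hits, detected_species):
    """Claim 9: report a gene only if its host microorganism was detected.

    `gene_hits` is a list of dicts with the hypothetical keys
    'gene', 'host' and 'supporting_reads'.
    """
    return [
        hit["gene"]
        for hit in gene_hits
        if gene_passes(hit["supporting_reads"]) and hit["host"] in detected_species
    ]
```

For example, with hits for `geneX` (host "Species A", 12 reads) and `geneY` (host "Species B", 50 reads), calling `report_resistance(hits, {"Species A"})` keeps only `geneX`, because "Species B" was not detected in the sample.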
Description
Genital tract microorganism and drug resistance gene analysis method based on targeted nanopore sequencing data
Technical Field
The invention belongs to the technical field of microorganism detection, and particularly relates to a method for analyzing genital tract microorganisms and drug resistance genes based on targeted nanopore sequencing data.
Background
Genital tract infections involve bacteria, fungi, viruses and other pathogenic microorganisms, and their rapid and accurate diagnosis is important for clinical treatment and prognosis. Although conventional culture is the gold standard for etiological diagnosis, it is limited by culture conditions and turnaround time, and often cannot reflect the composition of pathogenic microorganisms in a timely and comprehensive manner. Molecular detection methods such as PCR are widely used to detect specific pathogenic microorganisms and some drug resistance genes, but their detection range is narrow and fixed, so they cope poorly with mixed and rare infections. In addition, pathogenic microorganisms and drug resistance genes are tested separately, so both results cannot be obtained in a single test.
Pathogen metagenomic sequencing based on second-generation sequencing can detect thousands of pathogenic microorganisms at a time, which greatly expands the detection range. In clinical practice, however, single-end 50 bp or 75 bp sequencing is usually used, and such short reads cannot effectively distinguish homologous sequences, so closely related species are difficult to tell apart in clinical samples, particularly sensitive samples, and detection specificity is insufficient. Third-generation nanopore metagenomic sequencing greatly improves analytical specificity owing to its long read lengths.
However, both second-generation and third-generation metagenomic sequencing are random, unbiased sequencing, and usually more than 95% of the reads detected in a sample are host sequences and sequences of non-pathogenic colonizing bacteria, which wastes a great deal of data and is costly. Therefore, a data analysis method with low cost, short detection time, higher sensitivity and good specificity is needed.
Disclosure of Invention
In order to solve at least one of the above problems, the invention provides a method for analyzing genital tract microorganisms and drug resistance genes based on targeted nanopore sequencing data.
In order to achieve the above purpose, the invention adopts the following technical means:
The first aspect of the invention provides a method for analyzing genital tract microorganisms and drug resistance genes based on targeted nanopore sequencing data, comprising the following steps:
S1, constructing a microorganism reference database, wherein the microorganism reference database comprises a microorganism alignment database, a microorganism annotation database, a drug resistance gene alignment database and a drug resistance gene annotation database;
S2, obtaining the raw targeted nanopore sequencing data of a sample;
S3, preprocessing the raw sequencing data, namely performing quality control on the raw sequencing data to obtain a set of high-quality valid sequences, and, after primer alignment, performing targeted sequence identification to extract the targeted sequences;
S4, using the targeted sequences extracted in step S2 and the microorganism reference database of step S1, performing species alignment and abundance estimation, calculation of species-unique aligned reads, and quality and batch contamination control for the microorganisms, and then outputting a microorganism detection result;
S5, after removing human reads from the targeted sequences extracted in step S2, aligning them against the drug resistance gene reference database of step S1, performing mutation detection and annotation, filtering the results, and combining them with the microorganism detection result of step S4 to output a final result.
In some embodiments of the invention, the microorganism alignment database comprises:
(a) A bacterial 16S database, used for bacterial classification and species identification, derived mainly from NCBI nt (ftp://ftp.ncbi.nlm.nih.gov/blast/db/FASTA/nt.gz) and RefSeq.
(b) A fungal ITS database, used for rapid identification of fungal species, with data from UNITE (ftp://ftp.unite.ut.ee) and NCBI nt.
(c) A genome library of specific species, comprising high-quality strain sequences for genital tract related species or genes, derived from the complete genome or target gene sequences published in GenBank (ftp://ftp.ncbi.nlm.nih.gov/genomes).
During construction, species names are mapped to NCBI Taxonomy, and unmatched species are labeled with the taxid of the parent genus. And redundant