CN-122024852-A - Genome identification, quantification and visualization method for drug-resistant pathogenic bacteria in environmental sample

Abstract

The invention provides a method for genome-level identification, quantification and visualization of drug-resistant pathogenic bacteria in environmental samples, suitable for identifying high-risk drug-resistant pathogenic bacteria in a variety of environments. The method comprises: (1) obtaining high-quality bacterial genomes from large metagenomic sequencing data sets by quality control, assembly and binning; (2) identifying drug-resistant bacterial genomes that carry antibiotic resistance genes, then annotating virulence factors and comparing the genomes against an authoritative pathogen list to accurately identify drug-resistant pathogenic bacteria; (3) mapping metagenomic reads onto the drug-resistant pathogenic bacteria reference genomes to quantify their abundance, and visualizing the gene neighborhood structure of the resistance genes they carry, including the associated mobile genetic elements and other neighboring genes. The method does not depend on isolation and culture, enables full-spectrum analysis of drug-resistant pathogenic bacteria in an environmental sample, and provides technical support for accurate, scientifically grounded assessment of the antibiotic resistance risk of environmental bacteria.

Inventors

  • MA LIPING
  • LIU HUAFENG

Assignees

  • East China Normal University (华东师范大学)

Dates

Publication Date
20260512
Application Date
20260202

Claims (6)

  1. A genome identification, quantification and visualization method for drug-resistant pathogenic bacteria in an environmental sample, characterized by comprising the following steps:
     Step 1, acquiring the total microbial DNA of an environmental sample, performing high-throughput metagenomic sequencing, and obtaining high-quality metagenomic data (clean reads) after sequence quality control and host sequence removal;
     Step 2, assembling the clean reads obtained in step 1 with assembly software to obtain contigs, and then clustering the contigs with a plurality of binning algorithms to reconstruct metagenome-assembled genomes (MAGs);
     Step 3, performing quality evaluation and screening on the MAGs obtained in step 2 to obtain a high-quality MAG set;
     Step 4, performing open reading frame (ORF) prediction on the high-quality MAGs screened in step 3, comparing the predicted protein sequences with an antibiotic resistance gene (ARG) database, identifying the MAGs that carry antibiotic resistance genes, and designating them as resistance gene host MAGs;
     Step 5, for the resistance gene host MAGs determined in step 4, further comparing the predicted protein sequences with a virulence factor gene (VFG) database, and screening the MAGs that carry both antibiotic resistance genes and virulence factor genes as potential drug-resistant pathogenic bacteria genomes;
     Step 6, comparing the potential drug-resistant pathogenic bacteria genomes obtained in step 5 with the authoritative pathogen list of the World Health Organization (WHO), and confirming the drug-resistant pathogenic bacteria genomes in the environmental sample;
     Step 7, mapping the clean reads obtained in step 1 onto the drug-resistant pathogenic bacteria genomes confirmed in step 6 by using quantitative analysis software, and calculating the abundance of the drug-resistant pathogenic bacteria in the environmental sample; and
     Step 8, for the drug-resistant pathogenic bacteria genomes of step 7, extracting the genome regions containing antibiotic resistance genes and/or virulence factor genes, annotating the mobile genetic elements (MGEs) around those regions, and generating the corresponding gene neighborhood structure visualization map.
  2. The method according to claim 1, wherein step 3 specifically comprises:
     Step 3-1, evaluating the completeness and contamination of each MAG with CheckM software, based on lineage-specific marker gene sets;
     Step 3-2, screening MAGs whose completeness is greater than a preset threshold and whose contamination is less than a preset threshold as high-quality MAGs, wherein the completeness threshold is 70% or 90% and the contamination threshold is 10%.
  3. The method according to claim 1, wherein step 4 specifically comprises:
     Step 4-1, performing open reading frame prediction on the screened MAGs by using Prodigal;
     Step 4-2, comparing the predicted protein sequences with a specialized antibiotic resistance gene database (SARG, CARD or ResFinder) by using the BLAST or DIAMOND tool;
     Step 4-3, setting the screening thresholds to e-value ≤ 1 × 10^-10, sequence similarity ≥ 80% and coverage ≥ 75%, and, based on the comparison and screening results, identifying the MAGs that carry antibiotic resistance genes and designating them as resistance gene host MAGs.
  4. The method according to claim 1, wherein step 5 specifically comprises:
     Step 5-1, comparing the protein sequences of the resistance gene host MAGs determined in step 4 with the virulence factor database VFDB by using the BLAST or DIAMOND tool;
     Step 5-2, setting the screening thresholds to e-value ≤ 1 × 10^-10, sequence similarity ≥ 70% and coverage ≥ 70%, and screening the MAGs that carry both antibiotic resistance genes and virulence factor genes as potential drug-resistant pathogenic bacteria genomes.
  5. The method according to claim 1, wherein the authoritative pathogen list comprises the list of key or critical pathogens published by the World Health Organization (WHO).
  6. The method according to claim 1, wherein step 7 specifically comprises:
     Step 7-1, mapping the high-quality metagenomic clean reads obtained in step 1 onto the target drug-resistant pathogenic bacteria MAGs confirmed in step 6 by using CoverM or Salmon quantitative analysis software;
     Step 7-2, calculating the relative abundance of each target MAG and normalizing the result to the number of mapped reads per kilobase per million, namely the RPKM value, calculated as:
       RPKM = (Mapped reads × 10^9) / (Total mapped reads × Length)
     wherein Mapped reads is the number of reads mapped to the target MAG, Total mapped reads is the total number of reads mapped to all MAGs, and Length is the genome length of the target MAG (in base pairs, as implied by the per-kilobase, per-million normalization); the RPKM value represents the relative abundance of the target drug-resistant pathogenic bacteria in the environmental microbial community and can be used to quantitatively analyze the environmental distribution of drug-resistant pathogenic bacteria.
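
The screening thresholds of claims 2 to 4 and the RPKM normalization of claim 6 can be illustrated with a minimal Python sketch. It is not the applicants' implementation: the Mag and Hit records, the function names and the example values are all hypothetical, genome length is assumed to be in base pairs, and the inputs are assumed to have already been parsed from the outputs of the tools named in the claims (CheckM, BLAST or DIAMOND, CoverM or Salmon). The WHO list comparison of step 6 needs taxonomic assignments and is not covered.

```python
from dataclasses import dataclass


@dataclass
class Mag:
    """Hypothetical per-MAG record (values assumed to be parsed upstream)."""
    name: str
    completeness: float   # percent
    contamination: float  # percent
    length_bp: int        # assumption: genome length in base pairs
    mapped_reads: int     # clean reads mapped to this MAG


@dataclass
class Hit:
    """Hypothetical protein alignment hit against an ARG or VFG database."""
    mag: str
    evalue: float
    identity: float  # percent sequence similarity
    coverage: float  # percent coverage


def high_quality(mags, min_completeness=70.0, max_contamination=10.0):
    """Claim 2: completeness above and contamination below the thresholds."""
    return [m for m in mags
            if m.completeness > min_completeness
            and m.contamination < max_contamination]


def hosts(hits, max_evalue, min_identity, min_coverage):
    """Claims 3 and 4: names of MAGs with a hit passing all three thresholds."""
    return {h.mag for h in hits
            if h.evalue <= max_evalue
            and h.identity >= min_identity
            and h.coverage >= min_coverage}


def rpkm(mapped_reads, total_mapped_reads, length_bp):
    """Claim 6: mapped reads per kilobase of genome per million mapped reads."""
    return mapped_reads * 1e9 / (total_mapped_reads * length_bp)


# Made-up example data, for illustration only.
mags = [
    Mag("bin1", 95.0, 2.0, 5_000_000, 40_000),
    Mag("bin2", 60.0, 1.0, 3_000_000, 10_000),
    Mag("bin3", 85.0, 4.0, 4_000_000, 50_000),
]
arg_hits = [Hit("bin1", 1e-50, 92.0, 90.0), Hit("bin3", 1e-20, 78.0, 95.0)]
vfg_hits = [Hit("bin1", 1e-30, 75.0, 80.0), Hit("bin3", 1e-40, 90.0, 90.0)]

hq = high_quality(mags)
hq_names = {m.name for m in hq}
arg_hosts = hosts(arg_hits, 1e-10, 80.0, 75.0)  # claim 3 thresholds
vfg_hosts = hosts(vfg_hits, 1e-10, 70.0, 70.0)  # claim 4 thresholds

# Potential drug-resistant pathogen genomes: high quality, ARG and VFG carriers.
candidates = hq_names & arg_hosts & vfg_hosts

total_mapped = sum(m.mapped_reads for m in mags)
abundance = {m.name: rpkm(m.mapped_reads, total_mapped, m.length_bp)
             for m in mags if m.name in candidates}
print(candidates, abundance)
```

In this sketch the total mapped read count is summed over every MAG in the example. Whether the claimed "all MAGs" means all reconstructed MAGs or only the high-quality set is not stated in the claims, so that choice is an assumption.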

Description

Genome identification, quantification and visualization method for drug-resistant pathogenic bacteria in environmental sample

Technical Field

The invention belongs to the technical field of environmental microbiology and antibiotic resistance risk assessment, and particularly relates to an integrated analysis method for the identification, abundance quantification and genome structure visualization of drug-resistant pathogenic bacteria in environmental samples, based on metagenome-assembled genomes (MAGs). The method is suitable for a variety of environmental samples such as water, air and soil, and enables systematic analysis and risk assessment of high-risk drug-resistant pathogenic bacteria in the environment by linking antibiotic resistance, pathogenicity and genetic structure information on a genome scale.

Background

The continued spread of antibiotic resistance (AR) has become one of the major issues threatening public health worldwide. Environmental systems, including river water, sewage treatment plants, farmland soil and drinking water distribution networks, are widely recognized as important media for the emergence, enrichment and transmission of antibiotic resistance genes (ARGs). In environmental media closely related to human health, microbial communities are complex in composition and diverse in origin. Some of these microorganisms may carry both antibiotic resistance genes and pathogenicity-related factors, and once they enter the human body they can pose a potential threat to public health. Accurate identification, quantification and transmission risk assessment of high-risk drug-resistant pathogenic bacteria in environmental samples is therefore a key scientific and technical problem in environmental resistance research and public health monitoring.
At present, research on drug-resistant pathogenic bacteria relies mainly on traditional isolation and culture or on short-read metagenomic data analysis, which suffer from low detection throughput and poor accuracy respectively, and therefore cannot meet the need for high-throughput, accurate screening of high-risk drug-resistant pathogenic bacteria in the complex communities of environmental samples:
1) Insufficient association between ARGs and their hosts. ARGs are annotated and their abundance is counted at the level of sequencing reads (reads-level) or short assembled fragments (contigs-level), which makes it difficult to link ARGs reliably to specific microbial hosts on a genome scale. The specific microorganisms carrying resistance genes cannot be clearly identified, which limits accurate identification of the carriers of resistance risk.
2) No systematic criteria for screening drug-resistant pathogenic bacteria. Only the presence or abundance of ARGs is considered, and ARG host information is difficult to integrate with authoritative pathogen lists. It is therefore impossible to tell whether resistance genes come from low-risk environmental background bacteria or from high-risk human pathogens, which leaves environmental resistance risk assessments with substantial uncertainty.
3) No genome-scale means of risk quantification. ARG abundance is used as the evaluation index, and there is no method for quantifying the relative abundance of the high-risk unit itself, namely a pathogen genome carrying resistance genes. The true exposure level of drug-resistant pathogenic bacteria in an environmental sample is therefore difficult to capture.
4) Insufficient capability to assess genetic transmission risk. Systematic analysis of the genome structure of drug-resistant pathogenic bacteria is lacking, in particular characterization of the mobile genetic elements (MGEs) adjacent to ARGs, so the potential for resistance genes to spread by horizontal gene transfer, and the genomic hot spots of that risk, are difficult to judge.
5) No standardized integrated workflow. No method system has been established that completes genome reconstruction, combined resistance and pathogenicity evaluation, pathogen quantification and genetic structure visualization within a single workflow, which limits the general applicability of the technology across different environmental samples and its practical use in public health monitoring.
Therefore, there is a need for an environmental drug-resistant pathogen analysis method that uses the genome as its core analysis unit and that can achieve accurate identification, effective quantification and visual assessment of the potential transmission risk of drug-resistant pathogens in environmental samples by integrating antibiotic resistance, pathogenicity and genetic structure information on the ge