A bioinformatic analysis of Mycobacterium tuberculosis and host genomic data

J Phelan; (2018) A bioinformatic analysis of Mycobacterium tuberculosis and host genomic data. PhD (research paper style) thesis, London School of Hygiene & Tropical Medicine. DOI: 10.17037/PUBS.04646310

Human tuberculosis disease (TB) is caused by bacteria within the Mycobacterium tuberculosis complex, including M. tuberculosis (Mtb). Genetic variation within the pathogen can lead to drug resistance and affect virulence and transmissibility. I have analysed Mtb whole genome sequence data to improve the understanding of global genetic variation, and the resulting insights could ultimately assist the development of TB control measures. Whole genome sequencing platforms are being used to infer drug resistance profiles, and thereby could assist clinical management. I investigated the reproducibility of sequence data from two platforms (Illumina MiSeq, Ion Torrent PGM™) and two rapid analytic pipelines (TBProfiler, Mykrobe predictor). DNA replicates from the reference strain (H37Rv) and 10 drug-resistant strains were sequenced, and inferred drug resistance genotypes were compared to drug susceptibility testing phenotypes. Genome-wide association studies (GWAS) can be used to detect mutations associated with Mtb drug resistance. A first GWAS (n=127) attempted to identify mutations associated with minimum inhibitory concentrations for first-line anti-tuberculosis drugs. A second GWAS was applied to a large global set (n>6400) to identify mutations associated with first- and second-line drug resistance. M. aurum is an environmental mycobacterium that has been proposed as a model for the development of anti-TB drugs. I have assembled and annotated its draft genome, and identified copy number variants in known drug resistance targets. Approximately 10% of the Mtb genome consists of two gene families (pe/ppe) that are poorly characterised, and are hypothesised to be important virulence factors. Using a de novo assembly approach, I characterised these genes and their diversity across a global collection of clinical isolates with high depth short-read sequence data (n=518). A follow-up study using a long-read sequence technology (n=18, diverse strain types) confirmed the findings.
This work also generated new annotated reference genomes and characterised methylation sites, which may affect transmissibility, pathogenicity and virulence. A future direction of the TB genomics field is to identify genetic checkpoints in host-pathogen interactions using both human and Mtb genotypes. I analysed the genomes of ~720 TB case–Mtb pairs and identified susceptibility markers, which are promising targets for future control measures.


2018_ITD_PhD_Phelan_J.pdf
Accepted Version
Available under Creative Commons: NC-ND 3.0
