SNPPar: identifying convergent evolution and other homoplasies from microbial whole-genome alignments

David J Edwards; Sebastián Duchêne; Bernard Pope; Kathryn E Holt (2020) SNPPar: identifying convergent evolution and other homoplasies from microbial whole-genome alignments. bioRxiv preprint. DOI: 10.1101/2020.07.08.194480

<jats:title>Abstract</jats:title><jats:p>Homoplasic single nucleotide polymorphisms (SNPs) are considered important signatures of strong (positive) selective pressure, and hence of adaptive evolution for clinically relevant traits such as antibiotic resistance and virulence. Here we present a new tool, SNPPar, for efficient detection and analysis of homoplasic SNPs from large whole-genome sequencing (WGS) datasets (&gt;1,000 isolates and/or &gt;100,000 SNPs). SNPPar takes as input a SNP alignment, a tree and an annotated reference genome, and uses a combination of simple monophyly tests and ancestral state reconstruction (ASR, via TreeTime) to assign mutation events to branches and identify homoplasies. Mutations are annotated at the level of codon and gene, to facilitate analysis of convergent evolution.</jats:p><jats:p>Testing on simulated data (120 <jats:italic>Mycobacterium tuberculosis</jats:italic> alignments representing local and global samples) showed SNPPar can detect homoplasic SNPs with very high specificity (zero false positives in all tests) and high sensitivity (zero false negatives in 89% of tests). SNPPar analysis of three empirically sampled datasets (<jats:italic>E. anophelis</jats:italic>, <jats:italic>B. dolosa</jats:italic> and <jats:italic>M. tuberculosis</jats:italic>) produced results that were in concordance with previous studies, in terms of both individual homoplasies and evidence of convergence at the codon and gene levels. SNPPar analysis of a simulated alignment of ∼64,000 genome-wide SNPs from 2,000 <jats:italic>M. tuberculosis</jats:italic> genomes took ∼23 minutes and ∼2.6 GB of RAM to generate complete annotated results on a laptop. This analysis required ASR to be conducted for only 1.25% of SNPs, and the ASR step took ∼23 seconds and 0.4 GB of RAM.</jats:p><jats:p>SNPPar automates the detection and annotation of homoplasic SNPs efficiently and accurately from large SNP alignments.
As demonstrated by the examples included here, this information can be readily used to explore the role of homoplasy in parallel and/or convergent evolution at the level of nucleotide, codon and/or gene.</jats:p><jats:sec><jats:title>Impact statement</jats:title><jats:p>DNA sequences of bacterial pathogens are mutating all the time; most changes are deleterious or neutral, but sometimes a mutation leads to a functional change that allows the pathogen to evade a potential threat. These random mutational changes (single nucleotide polymorphisms, or SNPs) are so rarely beneficial that when the same change arises in parallel in distantly related isolates (a homoplasic SNP), this indicates that the change may be positively selected because it confers an adaptive advantage on the bacteria.</jats:p><jats:p>Finding homoplasic SNPs in large sets of bacterial genomes is challenging, as current tools require substantial time and computational resources to run. Here we present SNPPar, a software program to efficiently and accurately automate the detection and annotation of homoplasic SNPs from large whole-genome sequence data sets.
We use simulated data to demonstrate the accuracy of the program, and re-analyse published datasets using SNPPar to illustrate how the results can be used to gain insights into the evolution of antibiotic resistance and other traits.</jats:p><jats:p>We envisage SNPPar will facilitate long-term, real-time surveillance of bacterial pathogens and of their adaptive evolutionary response to interventions and control measures such as new drugs or vaccines.</jats:p></jats:sec><jats:sec><jats:title>Data summary</jats:title><jats:p>The authors confirm all supporting data, code and protocols have been provided within the article, through supplementary data files or other online sources as indicated in the article.</jats:p><jats:p>New content generated for this paper is:</jats:p><jats:list list-type="order"><jats:list-item><jats:p>SNPPar code is available from <jats:ext-link xmlns:xlink="http://www.w3.org/1999/xlink" ext-link-type="uri" xlink:href="https://github.com/d-j-e/SNPPar">https://github.com/d-j-e/SNPPar</jats:ext-link>. The version described here is v1.0.</jats:p></jats:list-item><jats:list-item><jats:p>A GitHub repository containing the full protocol, ‘in-house’ code and data used to carry out the validation and performance testing is available at <jats:ext-link xmlns:xlink="http://www.w3.org/1999/xlink" ext-link-type="uri" xlink:href="https://github.com/d-j-e/SNPPar_test">https://github.com/d-j-e/SNPPar_test</jats:ext-link>. This repository includes all the simulated and real data sets used here.</jats:p></jats:list-item></jats:list></jats:sec><jats:sec><jats:title>Data statement</jats:title><jats:p>The authors confirm all supporting data, code and protocols have been provided within the article, through supplementary data files or other online sources as indicated in the article.</jats:p></jats:sec>
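
The abstract describes using simple monophyly tests so that ASR is needed for only a small fraction of SNPs. The sketch below illustrates that general idea and is not SNPPar's actual code: the nested-tuple tree encoding and the names `clades` and `needs_asr` are assumptions made for this example. For a biallelic SNP on a rooted tree, if the isolates carrying one allele form a clade, a single mutation event on the branch leading to that clade explains the site, and only the remaining sites need ancestral state reconstruction.

```python
from typing import Dict, FrozenSet, List, Union

# Hypothetical encoding: a tip is a string, an internal node is a tuple of subtrees.
Tree = Union[str, tuple]


def clades(tree: Tree) -> List[FrozenSet[str]]:
    """Return the set of tip names below every node (tips included)."""
    found: List[FrozenSet[str]] = []

    def visit(node: Tree) -> FrozenSet[str]:
        if isinstance(node, str):
            tips = frozenset([node])
        else:
            tips = frozenset().union(*(visit(child) for child in node))
        found.append(tips)
        return tips

    visit(tree)
    return found


def needs_asr(tree: Tree, calls: Dict[str, str]) -> bool:
    """Return True if the site cannot be explained by one change via a monophyly test.

    `calls` maps each tip name to its allele at one SNP position.
    Sites with more than two alleles are conservatively deferred to ASR here;
    that is a simplification of this sketch, not a statement about SNPPar.
    """
    alleles = set(calls.values())
    if len(alleles) > 2:
        return True
    all_clades = set(clades(tree))
    for allele in alleles:
        carriers = frozenset(tip for tip, a in calls.items() if a == allele)
        if carriers in all_clades:
            return False  # one allele is monophyletic: single mutation event
    return True  # neither allele is monophyletic: candidate homoplasy


# Toy rooted tree with five isolates.
tree = ((("A", "B"), "C"), ("D", "E"))
single_origin = {"A": "T", "B": "T", "C": "C", "D": "C", "E": "C"}
parallel = {"A": "T", "B": "C", "C": "C", "D": "T", "E": "C"}
print(needs_asr(tree, single_origin), needs_asr(tree, parallel))
```

In the toy tree, the `T` carriers in `single_origin` are the clade (A, B), so the site is resolved without ASR, whereas in `parallel` neither allele is monophyletic and the site would be passed on for reconstruction.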


2020.07.08.194480v1.full.pdf
Published Version
Available under Creative Commons: NC-ND 3.0
