Completing bacterial genome assemblies with multiplex MinION sequencing
<jats:title>Abstract</jats:title><jats:p>Illumina sequencing platforms have enabled widespread bacterial whole genome sequencing. While Illumina data is appropriate for many analyses, its short read length limits its ability to resolve genomic structure. This has major implications for tracking the spread of mobile genetic elements, including those which carry antimicrobial resistance determinants. Fully resolving a bacterial genome requires long-read sequencing such as those generated by Oxford Nanopore Technologies (ONT) platforms. Here we describe our use of the ONT MinION to sequence 12 isolates of<jats:italic>Klebsiella pneumoniae</jats:italic>on a single flow cell. We assembled each genome using a combination of ONT reads and previously available Illumina reads, and little to no manual intervention was needed to achieve fully resolved assemblies using the Unicycler hybrid assembler. Assembling only ONT reads with Canu was less effective, resulting in fewer resolved genomes and higher error rates even following error correction with Nanopolish. We demonstrate that multiplexed ONT sequencing is a valuable tool for high-throughput bacterial genome finishing. Specifically, we advocate the use of Illumina sequencing as a first analysis step, followed by ONT reads as needed to resolve genomic structure.</jats:p><jats:sec><jats:title>Data summary</jats:title><jats:p><jats:list list-type="order"><jats:list-item><jats:p>Sequence read files for all 12 isolates have been deposited in SRA, accessible through these NCBI BioSample accession numbers: SAMEA3357010, SAMEA3357043, SAMN07211279, SAMN07211280, SAMEA3357223, SAMEA3357193, SAMEA3357346, SAMEA3357374, SAMEA3357320, SAMN07211281, SAMN07211282, SAMEA3357405.</jats:p></jats:list-item><jats:list-item><jats:p>A full list of SRA run accession numbers (both Illumina reads and ONT reads) for these samples are available in<jats:bold>Table S1</jats:bold>.</jats:p></jats:list-item><jats:list-item><jats:p>Assemblies and sequencing reads corresponding to each stage of processing and analysis are provided in the following figshare project:<jats:ext-link xmlns:xlink="http://www.w3.org/1999/xlink" ext-link-type="uri" xlink:href="https://figshare.com/projects/Completing_bacterial_genome_assemblies_with_multiplex_MinION_sequencing/23068">https://figshare.com/projects/Completing_bacterial_genome_assemblies_with_multiplex_MinION_sequencing/23068</jats:ext-link></jats:p></jats:list-item><jats:list-item><jats:p>Source code is provided in the following public GitHub repositories:<jats:ext-link xmlns:xlink="http://www.w3.org/1999/xlink" ext-link-type="uri" xlink:href="https://github.com/rrwick/Bacterial-genome-assemblies-with-multiplex-MinION-sequencing">https://github.com/rrwick/Bacterial-genome-assemblies-with-multiplex-MinION-sequencing</jats:ext-link><jats:ext-link xmlns:xlink="http://www.w3.org/1999/xlink" ext-link-type="uri" xlink:href="https://github.com/rrwick/Porechop">https://github.com/rrwick/Porechop</jats:ext-link><jats:ext-link xmlns:xlink="http://www.w3.org/1999/xlink" ext-link-type="uri" xlink:href="https://github.com/rrwick/Fast5-to-Fastq">https://github.com/rrwick/Fast5-to-Fastq</jats:ext-link></jats:p></jats:list-item></jats:list></jats:p></jats:sec><jats:sec><jats:title>Impact Statement</jats:title><jats:p>Like many research and public health laboratories, we frequently perform large-scale bacterial comparative genomics studies using Illumina sequencing, which assays gene content and provides the high-confidence variant calls needed for phylogenomics and transmission studies. However, problems often arise with resolving genome assemblies, particularly around regions that matter most to our research, such as mobile genetic elements encoding antibiotic resistance or virulence genes. These complexities can often be resolved by long sequence reads generated with PacBio or Oxford Nanopore Technologies (ONT) platforms. While effective, this has proven difficult to scale, due to the relatively high costs of generating long reads and the manual intervention required for assembly. Here we demonstrate the use of barcoded ONT libraries sequenced in multiplex on a single ONT MinION flow cell, coupled with hybrid assembly using Unicycler, to resolve 12 large bacterial genomes. Minor manual intervention was required to fully resolve small plasmids in five isolates, which we found to be underrepresented in ONT data. Cost per sample for the ONT sequencing was equivalent to Illumina sequencing, and there is potential for significant savings by multiplexing more samples on the ONT run. This approach paves the way for high-throughput and cost-effective generation of completely resolved bacterial genomes to become widely accessible.</jats:p></jats:sec>
Item Type | Article |
---|