This paper represents the collective work of the Ion Torrent R&D Team over the past three years. This is a team of over 300 people, in which many people have made important contributions. For a full list of members of this team, please see the Addendum at the end of this manuscript.
Progress in Ion Torrent semiconductor chip based sequencing
Version of Record online: 4 DEC 2012
© 2012 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim
Special Issue: Next Generation Sequencing and Genotyping
Volume 33, Issue 23, pages 3397–3417, December 2012
How to Cite
Merriman, B., R&D Team, I. T. and Rothberg, J. M. (2012), Progress in Ion Torrent semiconductor chip based sequencing. ELECTROPHORESIS, 33: 3397–3417. doi: 10.1002/elps.201200424
- Issue online: 4 DEC 2012
- Manuscript Accepted: 18 SEP 2012
- Manuscript Revised: 17 SEP 2012
- Manuscript Received: 3 AUG 2012
Vol. 34, Issue 4, 619, Version of Record online: 18 FEB 2013
Disclaimer: Supplementary materials have been peer-reviewed but not copyedited.
Figure S1. Throughput Scaling Roadmap for Chips. This shows the general roadmap for achieving different amounts of sequencing throughput per run, for the 3-series and Proton series chips. The timeline shows approximate dates for chip availability. Each chip icon is centered on its target throughput at introduction, and the arrows beside the icons indicate the range of throughput to be achieved over the subsequent life of the chip. For the 3-series, the blue arrows indicate the R&D performance range to date (July 1, 2012). Red arrows are future projections for the Proton series.
Figure S2. Progress in scaling of well size for bead-in-well-based sequencing technologies. This figure shows the relative size of the Ion 3-series wells (upper left inset, 3 micron diameter), the Ion Proton I wells (upper right inset, 1.2 micron diameter), and, for reference, the wells of the 454 Titanium sequencer (main figure, 30 micron diameter). Note that the 454 well image also shows the 454 primary DNA carrier (largest), loading (medium) and reagent (smallest) beads.
Figure S3. Maintaining sequencing performance in the presence of scaling. The figure shows a typical 200-base AQ20 (99% aligned accuracy) read from a Proton I run. At left are the normalized incorporation signals, and at right are the associated base calls, aligned against the reference genome (lower sequence tract). The 203-base aligned read contains 2 sequencing errors, indicated by misalignment (at base 97 and base 112), as expected from the definition of a 200-base AQ20 read.
Figure S4. Example read length distributions from runs on the 314 chip. Shown are read length histograms for AQ17 reads (aligned accuracy of 98%) resulting from four different 314 chip runs, which illustrate the evolution toward long read length capability. The figures show runs with modal AQ17 read lengths of 110 bases, 225 bases, 325 bases, and 400 bases. Note that the step pattern in the histograms is an artifact of the quantized definition of AQ17 accuracy, which bins all reads by 1 error in 50 bases, 2 errors in 100 bases, 3 errors in 150 bases, etc.
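The quantized accuracy definition above can be illustrated with a short sketch. This is not the Torrent Suite implementation; the function name and the `bases_per_error` parameter are hypothetical, and the rule assumed here is simply that a read prefix qualifies when it contains at most one error per 50 bases (AQ17) or per 100 bases (AQ20).

```python
def aq_read_length(error_positions, read_length, bases_per_error):
    """Longest read prefix whose error count stays within the allowance.

    error_positions: 1-based positions of alignment errors in the read.
    bases_per_error: 50 for AQ17 (98% accuracy), 100 for AQ20 (99%).
    This is an illustrative assumption, not the production definition.
    """
    errors = set(error_positions)
    best = 0
    count = 0
    for length in range(1, read_length + 1):
        if length in errors:
            count += 1
        # Integer form of: count / length <= 1 / bases_per_error
        if count * bases_per_error <= length:
            best = length
    return best


# Figure S3 example: 203 aligned bases, errors at bases 97 and 112, AQ20.
print(aq_read_length([97, 112], 203, bases_per_error=100))
```

Because an additional error only becomes allowable at multiples of `bases_per_error`, qualifying lengths tend to pile up just past those multiples, which is consistent with the step pattern noted in the caption.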
Figure S5. Example 525 base perfect read. This shows an example of a 525 base perfect read, produced as the longest perfect read from a 314 chip run in which the modal AQ17 read length was nearly 500 bases. At left is the normalized series of incorporation signals from the 940 trial flows, and at right is the corresponding 525 base sequence, shown perfectly aligned to the reference genome (E. coli DH10B). This illustrates the potential for achieving contiguous read lengths on the 500-base scale.
Figure S6. Typical raw signals from a 3-series pHFET sensor. This figure shows the raw data from a 314 pHFET sensor, sampled at 100 Hz, with the signal voltage (the fundamental signal type) converted to pH units here for presentation. Shown in blue is the trace from a flow in which a single base incorporation occurred, whereas red shows a superimposed trace from another sensor in which no incorporation occurred, and the ∼0.2 pH change is due solely to pH changes in the solution flowing onto the chip. To isolate the incorporation signal explicitly, this background change in pH must be modeled and subtracted.
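As a rough illustration of the subtraction step, the sketch below estimates the background as the per-time-point mean of traces from sensors with no incorporation and subtracts it from the trace of interest. This is a deliberately simplified assumption; the caption states only that the background must be modeled and subtracted, and the function name and sample values here are hypothetical.

```python
from statistics import fmean


def subtract_background(well_trace, reference_traces):
    """Return well_trace minus the mean of the reference (no-incorporation) traces.

    All traces are equal-length lists of samples (e.g. taken at 100 Hz).
    A simple mean stands in for a real background model.
    """
    background = [fmean(samples) for samples in zip(*reference_traces)]
    return [w - b for w, b in zip(well_trace, background)]


# Hypothetical traces in pH units: a shared ramp plus a small bump in the well.
well = [0.0, 1.0, 2.5, 2.0]
references = [[0.0, 1.0, 2.0, 2.0], [0.0, 1.0, 2.0, 2.0]]
print(subtract_background(well, references))
```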
Figure S7. Raw signals and incorporation signals. This figure shows an example of output from a sensor on a 3-series chip. The upper panel shows the raw voltages (mV units) sampled from the sensor during the course of 12 nucleotide flows, indicated by the labels on each bump. The bottom trace shows the background-subtracted signals, which make clear that 5 positive incorporation events occurred during these 12 flows. The upper traces are dominated by the background pH changes due to the flow of buffers that differ slightly in pH from the wash buffer that separates each nucleotide flow (data are not collected during the entire wash cycle, the start of which is indicated by the drop to 0 at the end of each bump). The bottom figure shows the residual incorporation signals after subtraction of the background model; here the inferred sequence would be TCAGC, based on the observed signal spikes relative to the flows above.
Figure S8. Raw signal traces across the chip. This figure shows a collection of time traces from a single nucleotide flow, from four regions across a 314 chip, superimposed, with traces encoded as red if they come from a well that is empty (no bead) and as blue if the well contains a bead. These views highlight the difference in traces that results from the presence of a bead and the incorporation process. Highlighted are the signal bumps that come from the H+ release of base incorporation. Note that wells with beads generally have a different signal response than wells without beads, and that there is overall a smooth background pH ramp that must be subtracted to isolate the incorporation component of the signal. The nature of this background and signal varies smoothly across the chip, depending on the properties of the laminar flow and chemical diffusion within the flow cell. In general, processes ramp up fast near the inlet, due to the sharp front in nucleotide and other chemical gradients, whereas by the outlet, diffusion has spread out all chemical gradients, resulting in slower signal ramps.
Figure S9. Basecalling Process: Signal normalization and dephasing. In (a), the raw incorporation signals at left are transformed to a fully normalized form at right, which is then input to basecalling based on signal thresholds. Panel (b) shows a high-level map of this signal normalization process. It relies on an iterative process in which the key signals from the 5-mer adapter sequence at the start of every fragment read are used to set the scale of the signals for the well in question, and then to estimate the major parameters that characterize the dephasing of the clonal templates: CF (carry forward of previous flow nucleotides), IE (incomplete extension of templates), and Droop (the decay of net signal strength). These parameters are iteratively fit to the data to properly deconvolve the signals from out-of-phase templates, resulting in the final normalized signals ready for basecalling. Typical magnitudes for these parameters are IE ∼ 1%, CF ∼ 0.5%, and Droop ∼ 0.1%. The very low observed Droop (loss of net signal strength) is critical to enabling long reads.
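The final thresholding step can be sketched as follows, assuming the signals have already been normalized and dephased so that a value near 1 means one incorporated base, near 2 a two-base homopolymer, and so on. The repeating TACG flow order and the signal values are assumptions chosen to reproduce the TCAGC example of Figure S7 (5 incorporations in 12 flows); the function name is hypothetical and this is not the Torrent Suite basecaller.

```python
import math


def call_bases(normalized_signals, flow_order="TACG"):
    """Round each normalized flow signal to a homopolymer length and emit bases.

    flow_order is repeated cyclically across the flows; "TACG" is an
    assumption made for this illustration.
    """
    bases = []
    for i, signal in enumerate(normalized_signals):
        nucleotide = flow_order[i % len(flow_order)]
        # Threshold at half-integers: 0.5 -> 1 base, 1.5 -> 2 bases, ...
        count = max(0, math.floor(signal + 0.5))
        bases.append(nucleotide * count)
    return "".join(bases)


# Twelve hypothetical normalized flow signals with five incorporation events.
signals = [1.02, 0.05, 0.97, 0.03, 0.08, 1.01, 0.04, 0.95, 0.02, 0.06, 1.04, 0.01]
print(call_bases(signals))  # TCAGC
```

Residual dephasing or Droop would shift signals away from integer values, which is why the CF, IE, and Droop corrections described above precede this step.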
Figure S10. Overview of the Ion Torrent workflow. The current workflow supports same day sequencing, with library preparation, automation for template bead preparation, followed by sequencing on the PGM or Proton instrument, and automated plug-in analysis on the Torrent server for primary data analysis.
Figure S11. Details of automated template preparation. (A) illustrates the automated in-line emulsion PCR process performed on the OneTouch™ instrument, which takes in library DNA and (1) creates the emulsion of templates with beads and PCR master mix, (2) forces the emulsion through a thermocycling block that executes the required PCR thermal processing program, and (3) performs a centrifugally driven mechano-chemical droplet breaking, bead recovery, and washing process, resulting in a tube containing purified, washed beads from the primary PCR reaction. (B) shows the ES™ enrichment station, which is used to selectively capture the templated beads for input into subsequent sequencing. The enrichment is performed as indicated, by binding a biotinylated oligo to the distal template adapter (which would only be present on beads that have undergone successful PCR-based templating) and then capturing these oligos and the conjugated beads via magnetic streptavidin beads. The templated beads are then released into a collection tube for sequencing. Note that future versions of the OneTouch™ will use Peltier-device-driven thermocycling instead of the serpentine flow between two fixed temperature reservoirs shown in (A-2), to support more flexible PCR processes. Figure courtesy of Andrea B. Kohn, Tatiana P. Moroz, Jeffrey P. Barnes, Mandy Netherton, and Leonid L. Moroz.
Figure S12. Details of the Informatics pipeline. From the perspective of the informatics workflow, the Torrent Server performs the off-instrument signal processing, basecalling, and alignment to reference genomes. The Torrent Browser provides a dashboard from which to monitor these processes and the production of basic data files (SFF, BAM, FASTQ) for downstream tertiary analysis. Tertiary analysis is performed either through plug-in programs, which are supplied with the system, available from the plug-in store, or created by users with a developer kit, or through various commercial third-party software running in desktop or cloud-based modalities. This includes the Ion Torrent IonReporter™ cloud-based solution, which supports human genome analyses.
Figure S13. Community Software development resources. The Torrent Suite Software Development Kit (SDK), which provides the framework and documentation for developing new plug-in components, is available at the sites listed in the panel at left. A virtual Torrent Server machine can be instantiated as a test bed for plug-in performance, either at a cloud site or run locally, using the virtual machines available at the virtualization sites indicated in the panel at right. Grants of free cloud time on the Amazon Web Services Cloud are available to developers undertaking specific plug-in development projects; details are available at http://ioncommunity.iontorrent.com/docs/DOC-2479
Figure S14. Overview of the AmpliSeq™ multiplex PCR technology. Genomic DNA is input to the custom-designed and manufactured AmpliSeq™ PCR primer pool. For a typical capture target consisting of the coding regions of tens to hundreds of genes, 2–4 AmpliSeq™ primer pools are created, each of which requires a PCR reaction in a separate tube. A standardized PCR reaction is performed, with cycling parameters set according to the complexity of the primer pool and the source of the DNA (high molecular weight versus FFPE-derived or otherwise degraded). Primers are partially digested to eliminate the sequencing of most primer-related product. Sequencing adapters are ligated onto products, optionally including sample barcode identifiers if multiple samples are to be pooled into one sequencing run. Adapterized fragments are then input into standard template preparation procedures (OneTouch™), for templating onto beads for sequencing. Input DNA requirements are just 10 ng per PCR tube reaction (20–40 ng total). Individual AmpliSeq™ PCR reactions can target 12- to 6000-plex in a single tube, and barcoding supports post-PCR pooling of up to 96 samples. PCR is performed in standard thermocyclers.
Figure S15. Specifications of the AmpliSeq™ Comprehensive Cancer Panel (CCP). This panel was designed to target the coding sequence of 409 genes known to be mutated in cancer, curated from literature and public databases. It also includes the entire content of the AmpliSeq™ 46 gene Cancer Mutation Hotspot Panel, which consists of 46 genes with mutation hotspots in their coding sequence, and which are considered to be potentially actionable in diagnosis or treatment. The CCP targets > 16 000 amplicons. These are realized as four AmpliSeq™ primer pools, requiring a 4-tube PCR reaction, and this is embodied in a commercially available kit. When combined with sequencing on a 318 chip, this would typically result in a 250x mean depth of coverage for a single sample.
Figure S16. Performance Summary for the AmpliSeq™ Comprehensive Cancer Panel. This shows performance for a set of 7 samples run on the standard CCP assay, with sequencing on a 318 chip. Note that 5 of the samples are from FFPE tumor samples, and two are High Molecular Weight DNA (HMW). Depth shows the mean depth of coverage across the target bases, for a 318 chip run. The mean across all samples is 360-fold depth of coverage. Uniformity shows the fraction of all bases that are covered to at least 20% of the overall mean depth (e.g., for a mean depth of 350x coverage, uniformity is the fraction of bases with at least 70x coverage). The mean across the seven samples is 94% of bases covered to at least 20% of the respective mean depth. Specificity is the percentage of sequenced bases that map to the target region. Because of the PCR-based targeting, the on-target percentage is very high, with a mean of 98% across all samples. This reduces waste of sequencing capacity on reading off-target sequence.
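The uniformity and specificity metrics defined above can be computed as in the following sketch. The function names and the toy input values are hypothetical; only the definitions (fraction of target bases covered to at least 20% of the mean depth, and fraction of sequenced bases mapping on target) come from the caption.

```python
def coverage_uniformity(depths, fraction_of_mean=0.2):
    """Fraction of target bases covered to at least fraction_of_mean x mean depth."""
    mean_depth = sum(depths) / len(depths)
    threshold = fraction_of_mean * mean_depth
    covered = sum(1 for d in depths if d >= threshold)
    return covered / len(depths)


def specificity(on_target_bases, total_sequenced_bases):
    """Fraction of sequenced bases that map to the target region."""
    return on_target_bases / total_sequenced_bases


# Toy per-base depths for a five-base target: mean 80x, threshold 16x.
print(coverage_uniformity([100, 100, 100, 100, 0]))  # 0.8
print(specificity(98, 100))                           # 0.98
```

For the worked example in the caption, a mean depth of 350x gives a threshold of 0.2 × 350 = 70x.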
Figure S17. Sequencing performance of the AmpliSeq™ Comprehensive Cancer Panel. The figure shows read length and accuracy properties of the CCP assay. The upper right panel shows the read length distribution, which is strongly related to the sizes of the underlying amplicons. Most reads are in the 90–150 base range, and the median read length is 130 bases, which reflects a short amplicon design target that makes the assay well suited to material from FFPE, or otherwise degraded, tumor tissue samples. The lower right panel shows the gross accuracy statistics for reads passing the standard quality filter, with a gross mean accuracy of 99.2%. The panel at left shows the Q score distribution of bases out to a read length of 150 bases, showing that the majority of bases are at Q30 predicted quality levels. This dataset is available publicly at http://lifetech-it.hosted.jivesoftware.com/docs/DOC-3012
Figure S18. Specifications of the AmpliSeq™ Inherited Disease Panel (IDP). This panel was designed to target the coding sequence of 325 genes known to cause the most frequently occurring or high impact Mendelian disorders. The genes were chosen based on the frequency of occurrence of the disorder in the US population; collectively they account for disorders impacting over 2% of US births, and for over 95% of all Mendelian disease of known genetic cause in the US. In addition, the panel includes genes for rare metabolic disorders that are commonly tested in newborn screening programs. Collectively, the genes are causal for over 700 disorders listed in the NIH ClinVar database (http://www.ncbi.nlm.nih.gov/clinvar/). The IDP targets > 10 500 amplicons, with an average amplicon size of 197 bases. These are realized as three AmpliSeq™ primer pools, requiring a 3-tube PCR reaction, and this is embodied in a commercially available kit. The figure shows the breakdown of the different classes of disorders, and the number of genes.
Figure S19. Performance Summary for the AmpliSeq™ Inherited Disease Panel. This shows performance for a set of 5 samples run on the standard IDP assay, with sequencing on a 316 chip. Depth shows the mean depth of coverage across the target bases, for a 316 chip run. The mean across all samples is 275-fold depth of coverage. Uniformity shows the fraction of all bases that are covered to at least 20% of the overall mean depth (e.g., for a mean depth of coverage of 275x, uniformity is the fraction of bases with at least 55x coverage). The mean across the five samples is 88% of bases covered to at least 20% of the respective mean depth. Specificity is the percentage of sequenced bases that map to the target region. Because of the PCR-based targeting, the on-target percentage is very high, with a mean of 98% across all samples. This reduces waste of sequencing capacity on reading off-target sequence.
Figure S20. Sequencing performance of the AmpliSeq™ Inherited Disease Panel. The figure shows read length and accuracy properties of the IDP assay. The upper right panel shows the read length distribution, which is strongly related to the sizes of the underlying amplicons. Most reads are in the 100–190 base range, and the median read length is 180 bases, which reflects the underlying set of exons being targeted. The lower right panel shows the gross accuracy statistics for reads passing the standard quality filter, with a gross mean accuracy of 99.1%, and 99.5% over the first 100 bases. The panel at left shows the Q score distribution of bases out to a read length of 150 bases, showing that the majority of bases are at Q30 predicted quality levels. This dataset is available publicly on the Ion Community site at http://lifetech-it.hosted.jivesoftware.com/docs/DOC-3011.
Figure S21. Performance Properties of the AmpliSeq™ Designer Custom Design Tool. AmpliSeq Designer is a web-based tool and portal for creating custom AmpliSeq™ gene panel designs, and ordering the associated primer pools and reagents. Version 1.2 (May 2012) supports the performance specifications shown. Number of pools is the number of AmpliSeq™ physical primer pools required, which implies the corresponding number of tubes for the PCR reaction, and input DNA requirements of 10 ng per tube. Typical designs utilize just 2 pools, with 24–3072 primers per pool (i.e., 12 to 1536 amplicons). The total target sequence can be up to 1 Mb per design. The Design rate is the typical fraction of a requested target that can be targeted by the automated primer design process; presently, > 85% of the desired target is typically capturable by a standard, automated design. Actual experimental capture is then typically ∼95% of this theoretically capturable target. Coverage uniformity is the fraction of bases that are covered to at least 20% of the mean coverage, presently > 85%, and on-target bases are the fraction of sequenced bases that derive from the target sequence, typically > 80%. Oligo synthesis and primer pool reagent delivery require approximately 4 weeks.
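The arithmetic implied by these specifications can be made explicit with a small sketch. The function name and dictionary keys are hypothetical; the defaults use the caption's lower-bound design rate (85%) and typical experimental capture (∼95%) purely as illustrative values, and each amplicon is assumed to use two primers.

```python
def design_summary(n_pools, amplicons_per_pool,
                   design_rate=0.85, capture_rate=0.95, dna_ng_per_tube=10):
    """Back-of-envelope numbers for a custom panel design (illustrative only)."""
    return {
        "pcr_tubes": n_pools,                       # one tube per primer pool
        "input_dna_ng": n_pools * dna_ng_per_tube,  # 10 ng per tube
        "primers_per_pool": 2 * amplicons_per_pool, # two primers per amplicon
        # Fraction of the requested target expected to be captured in practice
        "expected_captured_fraction": design_rate * capture_rate,
    }


print(design_summary(n_pools=2, amplicons_per_pool=1536))
```

With these values, roughly 0.85 × 0.95 ≈ 81% of a requested target would be expected to be captured experimentally.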
Figure S22. Ion Torrent development community portal activities and rates of growth. Shown in the left panel, the Ion Community Portal, located at http://ioncommunity.lifetechnologies.com, is the reference web site for organizing engagement between Ion Torrent and the user and developer communities. The dashboard at right shows tracking statistics for the major elements of community activity. The Ion Community panel shows the growth of registered users, and their related discussions, documents, applications and datasets. The Grand Challenges panel shows the total number of participants attempting to solve Grand Challenge problems of making a two-fold improvement in sequencing speed, error rate or throughput. The Run RecognitION and Ion Community Datasets panels show the number of user-contributed datasets for 314, 316 and 318 chip runs, and the record user-contributed runs in terms of throughput. The Ion Community Developer Tools
Figure S23. Growth of Ion Community Participation. This figure shows the rate of growth of various aspects of the Ion Community. Blue bars indicate status after the first 14 months of the community portal (Nov 2010-Dec 2011). Red bars indicate the current totals (“Today” = July 1, 2012). This illustrates the rapid growth of users and developers of the platform.
Figure S24. Examples of Initiatives underway within the Ion Community: Upper left: The Torrent Circuit is a space within the Ion Community portal where users and developers can share and rate plugins developed by Ion Torrent and other users or developers; all such plugins are freely distributed. Upper Right: Plugin Developer Challenge: an opportunity to win compensatory prizes for developing new plug-ins. Lower right: Grants of free cloud time available to developers proposing specific plug-in development projects. Lower Left: The Ion Torrent Grand Challenges, which offer a US$1 Million prize to anyone who provides a process/software that makes a 2-fold improvement over the current champion results for throughput, accuracy or run time/prep time. More information on Grand Challenges is available through the following sites:
• Register at lifetechnologies/grandchallenges
• Grand Challenge @ ioncommunity is tinyurl.com/6c7hjoc
• Facebook at facebook.com/lifegrandchallenges
• Twitter at twitter.com/#!/Grand_Challenge
• BlogTalk Radio blogtalkradio.com/lifetechnologies
• Ion community, ioncommunity.lifetechnologies.com
Figure S25. Crowdsourcing Initiatives. Crowdsourcing initiatives have been successfully undertaken to improve key elements of the system software. These efforts make use of the TopCoder crowdsourcing provider. The left side shows the crowdsourcing dashboard at TopCoder for a challenge to improve the speed of the Smith-Waterman local aligner used within the Ion Torrent TMAP aligner. The dashboard shows 75 total competitors, and the current top performers on the challenge metric. The lower box summarizes the challenge result, a 5.5-fold speedup of the alignment rate. The figure at right shows the crowdsourced improvements made in 4 such efforts; the three others, summarized in the boxes at lower right, focus on data compression and accelerating the basecaller code. Blue bars indicate the Ion reference code performance metric, and red bars indicate the reductions (in file size, or basecaller run time) achieved by the winning entrants of the TopCoder competitions, which offer $2000–$5000 prizes to the winners.
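For readers unfamiliar with the algorithm targeted by the alignment challenge, the following is a minimal textbook sketch of Smith-Waterman local alignment scoring. It is not the TMAP implementation or any contest entry; the linear gap penalty and the match, mismatch, and gap scores are assumptions chosen for illustration.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score between sequences a and b.

    Uses the standard dynamic-programming recurrence with a linear gap
    penalty, keeping only two rows of the score matrix in memory.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        cur = [0]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: scores are floored at 0 so alignments can restart.
            cur.append(max(0, diag, prev[j] + gap, cur[j - 1] + gap))
            best = max(best, cur[j])
        prev = cur
    return best


print(smith_waterman_score("TTACGTTT", "GGACGGG"))  # 6: the shared "ACG"
```

The run time grows with the product of the two sequence lengths, which is why this inner loop is a natural target for the kind of speed optimization described in the caption.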
Figure S26. All Ion Torrent system software and standard plugins are available as open source software, including all source code, residing at the Github software repository site: https://github.com/iontorrent/. This includes the TMAP alignment suite, the TS Torrent Suite plugin and system software, and the Variant Caller plugins and the Torrent Scout tools for exploring signal data from runs. Software releases are pushed to Github quarterly, in sync with the commercial deployment releases of the system software.