UNIT 15.3 Genotyping in the Cloud with Crossbow

  1. James Gurtowski (1),
  2. Michael C. Schatz (1),
  3. Ben Langmead (2)

Published Online: 1 SEP 2012

DOI: 10.1002/0471250953.bi1503s39

Current Protocols in Bioinformatics

How to Cite

Gurtowski, J., Schatz, M. C. and Langmead, B. 2012. Genotyping in the Cloud with Crossbow. Current Protocols in Bioinformatics. 39:15.3:15.3.1–15.3.15.

Author Information

  1. Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, New York

  2. Department of Computer Science, Johns Hopkins Bloomberg School of Public Health, Baltimore, Maryland

Abstract

Crossbow is a scalable, portable, and automatic cloud computing tool for identifying SNPs from high-coverage, short-read resequencing data. It is built on Apache Hadoop, an implementation of the MapReduce software framework, which allows Crossbow to distribute read alignment and SNP calling subtasks over a cluster of commodity computers. Two robust tools, Bowtie and SOAPsnp, implement the fundamental alignment and variant calling operations, respectively; running within Crossbow, they have been shown to analyze approximately one billion short reads per hour on a commodity Hadoop cluster with 320 cores. Through protocol examples, this unit demonstrates the use of Crossbow for identifying variations in three operating modes: on a Hadoop cluster, on a single computer, and on the Amazon Elastic MapReduce cloud computing service. Curr. Protoc. Bioinform. 39:15.3.1-15.3.15. © 2012 by John Wiley & Sons, Inc.
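To make the division of labor described above concrete, the following is a minimal, single-process sketch of the MapReduce pattern: a map step that aligns each read independently, a shuffle step that groups aligned bases by genomic partition, and a reduce step that calls variants within each partition. It is an illustration under stated assumptions, not Crossbow's code. The toy reference, the reads, the partition size, and every function name are hypothetical; the brute-force `align` function merely stands in for Bowtie, and the majority-vote `reduce_phase` merely stands in for SOAPsnp.

```python
from collections import Counter, defaultdict

# Toy inputs: everything below is hypothetical and chosen for illustration.
REFERENCE = "ACGTACGTTAGCCGATAGGCTTAACGGATCCA"
PARTITION_SIZE = 8  # reference bases per genomic partition (toy value)

READS = [
    "TAGCTGAT",  # covers reference position 12 with a T instead of C
    "GCTGATAG",  # covers reference position 12 with a T instead of C
    "GTTAGCTG",  # covers reference position 12 with a T instead of C
    "TTAACGGA",  # matches the reference exactly
]


def align(read, reference=REFERENCE, max_mismatches=1):
    """Stand-in aligner: best ungapped placement by Hamming distance.

    Returns the 0-based reference offset, or None if no placement has
    at most max_mismatches mismatches. Ties go to the leftmost offset.
    """
    best_pos, best_mm = None, max_mismatches + 1
    for pos in range(len(reference) - len(read) + 1):
        window = reference[pos:pos + len(read)]
        mm = sum(a != b for a, b in zip(read, window))
        if mm < best_mm:
            best_pos, best_mm = pos, mm
    return best_pos


def map_phase(reads):
    """Map: align each read on its own; emit (partition, (ref_pos, base))."""
    for read in reads:
        pos = align(read)
        if pos is None:
            continue  # unaligned reads contribute nothing
        for offset, base in enumerate(read):
            ref_pos = pos + offset
            yield ref_pos // PARTITION_SIZE, (ref_pos, base)


def shuffle_phase(pairs):
    """Shuffle: group the emitted values by partition key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(records, min_depth=2):
    """Reduce: stand-in caller using a majority vote at each position."""
    piles = defaultdict(list)
    for ref_pos, base in records:
        piles[ref_pos].append(base)
    calls = []
    for ref_pos in sorted(piles):
        base, count = Counter(piles[ref_pos]).most_common(1)[0]
        if base != REFERENCE[ref_pos] and count >= min_depth:
            calls.append((ref_pos, REFERENCE[ref_pos], base, count))
    return calls


def run_pipeline(reads):
    """Run map, shuffle, and reduce in order; return all variant calls."""
    groups = shuffle_phase(map_phase(reads))
    calls = []
    for partition in sorted(groups):
        calls.extend(reduce_phase(groups[partition]))
    return calls


if __name__ == "__main__":
    for ref_pos, ref_base, alt_base, depth in run_pipeline(READS):
        print(ref_pos, ref_base, alt_base, depth)
```

Because each read is aligned without reference to any other read, and each partition is genotyped without reference to any other partition, both steps can in principle be spread across many machines; in this sketch they simply run one after another in a single process.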

Keywords:

  • short reads;
  • read alignment;
  • SNP calling;
  • cloud computing;
  • Hadoop;
  • software package