29. Beyond the Pipelines: Cloud Computing Facilitates Management, Distribution, Security, and Analysis of High-Speed Sequencer Data

Editor(s):

  1. Dr. Matthias Harbers3,4 and
  2. Prof. Dr. Günter Kahl5,6,7

Author(s):

  1. Boris Umylny1 and
  2. Richard S. J. Weisburd2

Published Online: 23 JAN 2012

DOI: 10.1002/9783527644582.ch29

Tag-Based Next Generation Sequencing

How to Cite

Umylny, B. and Weisburd, R. S. J. (2011) Beyond the Pipelines: Cloud Computing Facilitates Management, Distribution, Security, and Analysis of High-Speed Sequencer Data, in Tag-Based Next Generation Sequencing (eds M. Harbers and G. Kahl), Wiley-VCH Verlag GmbH & Co. KGaA, Weinheim, Germany. doi: 10.1002/9783527644582.ch29

Editor Information

  3. 4-2-6 Nishihara, Kashiwa-Shi, Chiba 277-0885, Japan

  4. DNAFORM Inc., Leading Venture Plaza 2, 75-1 Ono-cho, Tsurumi-ku, Yokohama, Kanagawa 230-0046, Japan

  5. Mohrmühlgasse 3, 63500 Seligenstadt, Germany

  6. University of Frankfurt am Main Biocenter, Max-von-Lauestraße 9, 60439 Frankfurt am Main, Germany

  7. Frankfurt Biotechnology Innovation Center (FIZ), GenXPro Ltd, Altenhöferallee 3, 60438 Frankfurt am Main, Germany

Author Information

  1. Japan Bioinformatics KK, Yoyogiekimae Building 401, 1-36-6 Yoyogi, Shibuya-ku, Tokyo 151-0053, Japan

  2. ELSS Inc., 2504-3 Saiki, Tsukuba, Ibaraki 305-0028, Japan

Publication History

  1. Published Online: 23 JAN 2012
  2. Published Print: 14 DEC 2011

ISBN Information

Print ISBN: 9783527328192

Online ISBN: 9783527644582

Keywords:

  • cloud computing facilitates management;
  • high-speed sequencer data;
  • data management;
  • distribution;
  • analysis;
  • security

Summary

As molecular data have grown in quantity and complexity, bioinformatics has gained prominence and importance. Substantial resources have been invested in bioinformatics research, and substantial gains have been realized, particularly in three areas:

  • data analysis algorithms,
  • data repositories, and
  • data visualization.

Tools to analyze various types of biological data have been developed, described, and distributed; giant public repositories have been set up; and graphical visualization tools can be used either directly from public servers or after being downloaded to the user's computer. Until now, these have proved sufficient. However, as new technologies, in particular high-speed sequencers (HSSs), greatly increase the quantity of molecular data, existing approaches in these three areas, while still vital, are proving insufficient. In this chapter, we describe the additional tools and techniques needed to handle the giant datasets that are already being generated, how these datasets should be managed in the modern distributed research environment, and, perhaps most importantly, how the complexity of handling them can be reduced through effective integration of new and existing tools, techniques, and algorithms.

We expect this chapter to be most useful to individuals tasked with establishing the computational environments needed to support research that uses HSSs and other high-throughput devices capable of generating large quantities of molecular data. It is intended to provide sufficient detail to enable the development of appropriate requests for proposals and to assist in evaluating competing proposals. Organizations looking to develop an HSS support system internally should find it useful for developing appropriate requirements and high-level design specifications.