2. DeepCAGE: Genome-Wide Mapping of Transcription Start Sites

Editors

  1. Dr. Matthias Harbers (3, 4) and
  2. Prof. Dr. Günter Kahl (5, 6, 7)

Authors

  1. Dr. Matthias Harbers (3, 4),
  2. Mitchell S. Dushay (1) and
  3. Piero Carninci (2)

Published Online: 23 JAN 2012

DOI: 10.1002/9783527644582.ch2

Tag-Based Next Generation Sequencing

How to Cite

Harbers, M., Dushay, M. S. and Carninci, P. (2011) DeepCAGE: Genome-Wide Mapping of Transcription Start Sites, in Tag-Based Next Generation Sequencing (eds M. Harbers and G. Kahl), Wiley-VCH Verlag GmbH & Co. KGaA, Weinheim, Germany. doi: 10.1002/9783527644582.ch2

Editor Information

  3. 4-2-6 Nishihara, Kashiwa-Shi, Chiba 277-0885, Japan
  4. DNAFORM Inc., Leading Venture Plaza 2, 75-1 Ono-cho, Tsurumi-ku, Yokohama, Kanagawa 230-0046, Japan
  5. Mohrmühlgasse 3, 63500 Seligenstadt, Germany
  6. University of Frankfurt am Main Biocenter, Max-von-Lauestraße 9, 60439 Frankfurt am Main, Germany
  7. Frankfurt Biotechnology Innovation Center (FIZ), GenXPro Ltd, Altenhöferallee 3, 60438 Frankfurt am Main, Germany

Author Information

  1. Illinois Institute of Technology, Division of Biology, Life Sciences Building, 3101 South Dearborn Street, Chicago, IL 60616, USA
  2. RIKEN Yokohama Institute, Omics Science Center, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa 230-0045, Japan
  3. 4-2-6 Nishihara, Kashiwa-Shi, Chiba 277-0885, Japan
  4. DNAFORM Inc., Leading Venture Plaza 2, 75-1 Ono-cho, Tsurumi-ku, Yokohama, Kanagawa 230-0046, Japan

Publication History

  1. Published Online: 23 JAN 2012
  2. Published Print: 14 DEC 2011

ISBN Information

Print ISBN: 9783527328192

Online ISBN: 9783527644582



Keywords

  • DeepCAGE;
  • genome-wide mapping;
  • transcription start sites;
  • methods;
  • applications


Gene expression is tightly controlled by regulatory elements within promoter regions. Genome-wide identification of promoters and measurement of their activities are essential for understanding gene expression and its regulation in a biological context. Cap analysis gene expression (CAGE) is a method for isolating short sequencing tags from the 5′ ends of mRNA transcripts; the tags are then sequenced at high throughput by next-generation sequencing methods. Mapping the tags back to a reference genome allows reliable identification of transcription start sites (TSS) on a genome-wide scale. Hence, CAGE can be used for promoter and transcript identification, where the number of CAGE tags found per TSS is a quantitative measure of transcription from each site. CAGE makes use of the 5′-end-specific cap structure of eukaryotic mRNA. During cap selection by cap trapping, the cap structure is selectively biotinylated and the biotinylated RNA/cDNA hybrids are enriched on streptavidin-coated beads. Owing to the high selectivity of the cap trapper step, CAGE libraries can be prepared directly from total RNA and do not require any mRNA purification. Moreover, cDNA synthesis is driven by random primers, so the method also captures nonpolyadenylated mRNAs that are commonly missed by methods relying on oligo(dT) priming or mRNA purification. Here, we provide the latest version of our DeepCAGE protocol for preparing CAGE tags for direct sequencing on an Illumina sequencing platform. Compared to the original CAGE protocol, we now use EcoP15I to obtain longer tags of 27 bp, omit the concatenation step needed for capillary sequencing, introduce barcoding for multiplex sequencing, and simplify the purification steps during library preparation to allow for high-throughput library production. The new protocol also reduces RNA requirements and now starts from only 5 µg of total RNA.
In recent years, CAGE has been the underlying method for many promoter and gene network projects, including work for the NIH ENCODE and the RIKEN FANTOM projects. We believe that our new protocol will make CAGE very attractive to a large number of researchers.
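
The idea that the number of CAGE tags per TSS is a quantitative measure of transcription can be illustrated with a minimal sketch. It is not part of the DeepCAGE protocol or of any particular analysis pipeline. It assumes that tags have already been mapped to a reference genome and are available as hypothetical (chromosome, start, end, strand) tuples in 0-based, half-open coordinates; the function names and the tags-per-million normalization are illustrative choices.

```python
from collections import Counter


def count_tss_tags(alignments):
    """Count mapped CAGE tags per candidate TSS.

    `alignments` is an iterable of (chrom, start, end, strand) tuples in
    0-based, half-open coordinates (an assumed input format).
    """
    counts = Counter()
    for chrom, start, end, strand in alignments:
        # The 5' end of the tag marks the TSS: the leftmost base on the
        # plus strand, the rightmost base on the minus strand.
        tss = start if strand == "+" else end - 1
        counts[(chrom, tss, strand)] += 1
    return counts


def tags_per_million(counts):
    """Scale raw tag counts to tags per million mapped tags."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {site: n * 1_000_000 / total for site, n in counts.items()}


# Toy input: four hypothetical 27-bp tags.
tags = [
    ("chr1", 100, 127, "+"),
    ("chr1", 100, 127, "+"),
    ("chr1", 102, 129, "+"),
    ("chr2", 500, 527, "-"),
]
counts = count_tss_tags(tags)
tpm = tags_per_million(counts)
```

Scaling to tags per million is one common way to compare libraries of different sequencing depth. Real analyses typically also cluster nearby TSS positions into promoters, which this sketch leaves out.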