Abstract
Background: Phylogenetic footprinting has revealed that cis-regulatory enhancers consist of conserved DNA sequence clusters (CSCs). Currently, there is no systematic approach for enhancer discovery and analysis that takes full advantage of the sequence information within enhancer CSCs. Results: We have generated a Drosophila genome-wide database of conserved DNA consisting of >100,000 CSCs derived from EvoPrints spanning over 90% of the genome. cis-Decoder database search and alignment algorithms enable the discovery of functionally related enhancers. The program first identifies conserved repeat elements within an input enhancer and then searches the database for CSCs that score highly against the input CSC. Scoring is based on shared repeats as well as uniquely shared matches, and includes measures of the balance of shared elements, a diagnostic that has proven useful in predicting cis-regulatory function. To demonstrate the utility of these tools, a temporally restricted CNS neuroblast enhancer was used to identify other functionally related enhancers and to analyze their structural organization. Conclusions: cis-Decoder reveals that co-regulating enhancers consist of combinations of overlapping shared sequence elements, providing insights into the mode of integration of multiple regulating transcription factors. The database and accompanying algorithms should prove useful in the discovery and analysis of enhancers involved in any developmental process. Developmental Dynamics 241:169–189, 2012. © 2011 Wiley Periodicals, Inc.
INTRODUCTION
Understanding the mechanisms of dynamic gene expression remains a major goal of developmental biology. Previous studies have shown that many of the different spatial-temporal aspects of gene regulation are controlled by multiple, functionally independent cis-regulatory modules or enhancers (reviewed by Bulger and Groudine, 2011). These studies have also identified several key characteristics of enhancers, including their ability to act at some distance from the genes that they regulate, their positional independence relative to the transcription direction of the regulated gene, and their ability to function from within transcribed sequences (reviewed by Davidson, 2001). Functional analysis of in vivo characterized enhancers has also revealed that they typically span 300 to 2,000 bp and contain clusters of binding sites for sequence-specific DNA-binding transcription factors (reviewed by Alonso et al., 2009). More recent studies indicate that some enhancers are regulated by chromatin DNA modifications and/or alterations in higher-order chromatin structure (reviewed by Suganuma and Workman, 2011).
The availability of genomic sequences from evolutionarily related species allows for the comparison of orthologous DNAs, via phylogenetic footprinting, to identify functionally important conserved sequences within enhancers (reviewed by Visel et al., 2007; King et al., 2007; Meireles-Filho and Stark, 2009; Alonso et al., 2009). The complexity of the conserved sequences within enhancers suggests that enhancers integrate multiple regulatory inputs via different sequence-specific DNA-binding factors (Kuo et al., 1998; Berman et al., 2004; Brody et al., 2007). One of the hallmarks of developmental enhancers is the presence of repeated DNA-binding sites for essential transcription factors (Small et al., 1992; Davidson, 1999; Berman et al., 2002, 2004; Gaul, 2010). For example, multiple conserved DNA-binding sites for Hunchback have been identified within Drosophila segmentation enhancers (Papatsenko et al., 2009), multiple bHLH DNA-binding sites are found within neural precursor cell enhancers (Brody et al., 2007; Kuzin et al., 2009), and repeated binding sites are likewise found within Runt-, Ets-, and Smad-responsive enhancers in mammals (Bowers et al., 2010; Babayeva et al., 2010; Nakahiro et al., 2010). Studies have also shown that altering the copy number of transcription factor docking sites by adding or deleting multi-copy sequence motifs can alter enhancer behavior. This suggests that such repeat motifs are not necessarily redundant, and that each conserved copy may have an integral role in enhancer function (Kuzin et al., 2011). In addition, studies on sequentially arrayed or clustered Drosophila enhancers have shown that individual enhancers are flanked by sequences referred to as spacers (Small et al., 1993).
Comparative genome analysis of spacer regions, termed here inter-clustal regions (ICRs), reveals that they exhibit a higher level of interspecies sequence length variability than do the less-conserved sequences within enhancer conserved sequence clusters (CSCs) (Kuzin et al., 2009). This difference in length variability provides a useful method for delimiting the boundaries of enhancers.
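The use of length variability to flag candidate enhancer boundaries can be illustrated with a minimal sketch. It assumes that the lengths of each gap between adjacent conserved blocks have already been measured in several species; the coefficient of variation, the function names, and the 0.3 threshold are illustrative assumptions, not the measure or cutoff used by Kuzin et al. (2009).

```python
from statistics import mean, pstdev

def length_variability(lengths):
    """Coefficient of variation of one gap's length across species.

    Assumption: this statistic is a stand-in for "interspecies sequence
    length variability"; the source does not specify a formula.
    """
    average = mean(lengths)
    return pstdev(lengths) / average if average else 0.0

def candidate_icrs(gap_lengths, threshold=0.3):
    """Return indices of gaps whose length varies strongly between species.

    gap_lengths: one list per gap (in genomic order), each holding the
    length of that gap in every species compared.
    threshold: hypothetical cutoff chosen only for this example.
    """
    return [
        index
        for index, lengths in enumerate(gap_lengths)
        if length_variability(lengths) > threshold
    ]

# Hypothetical gap lengths (bp) in four species for three consecutive gaps.
gaps = [
    [12, 12, 13, 12],    # nearly constant spacing: likely inside a CSC
    [40, 95, 10, 150],   # highly variable spacing: candidate ICR
    [8, 8, 8, 9],        # nearly constant spacing: likely inside a CSC
]
boundaries = candidate_icrs(gaps)
```

In this toy input only the second gap exceeds the cutoff, so it would be treated as a candidate boundary between two clusters.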
Our previous work has described EvoPrinter, a phylogenetic footprinting tool for discovering conserved sequences that are shared among orthologous DNAs (Odenwald et al., 2005; Yavatkar et al., 2008). The output of EvoPrinter, an evolutionary gene print or EvoPrint, portrays in a single readout the conserved DNA within a species of interest, thus highlighting conservation in a continuous gap-free sequence that facilitates the further comparative analysis of enhancer sub-structural organization as well as the discovery of novel enhancers (see below). We have also developed a set of integrated alignment algorithms, collectively known as cis-Decoder, that identify multi-copy and unique elements within CSCs that are shared with other CSCs (Brody et al., 2007, 2008).
To increase our understanding of enhancer sub-structure and to identify families of functionally related enhancers via comparative analysis, we constructed a web-accessible genome-wide database of Drosophila CSCs that also includes the CSCs within most in vivo characterized enhancers. We also describe additional cis-Decoder search algorithms that facilitate the discovery of database CSCs related to any input enhancer. Once the user inputs an EvoPrinted enhancer, cis-Decoder algorithms scan the database to detect structurally related CSCs using a three-step protocol: the initial search identifies database CSCs that share conserved multi-copy elements with the input sequence; the program then identifies unique elements shared between the input enhancer and database CSCs; and finally the copy number of shared elements is evaluated to generate ranked similarity scores that relate the input enhancer to the database CSCs. To demonstrate the efficacy of this approach, which makes no assumptions about the function of individual sequence elements, we have utilized an enhancer of castor (cas), a late temporal neuroblast (NB) determinant (Mellerick et al., 1992; Cui and Doe, 1992; Kambadur et al., 1998), to identify previously uncharacterized late NB enhancers. We also show how cis-Decoder searches can identify multiple previously characterized cellular gap enhancers based on their shared sequence motifs and also identify shared overlapping transcription factor–binding sites.
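The three-step protocol can be sketched in a few lines of Python. This is a simplified illustration and not the cis-Decoder implementation: it assumes that sequence elements are fixed-length k-mers counted without regard to strand, that a "repeat" is any element present more than once in the input, and that balance is the mean ratio of the smaller to the larger copy number over all shared elements. All function names, the value of k, and the ranking key are hypothetical.

```python
from collections import Counter

_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def canonical(kmer):
    """Return the smaller of a k-mer and its reverse complement (strand-neutral)."""
    return min(kmer, kmer.translate(_COMPLEMENT)[::-1])

def element_counts(blocks, k=6):
    """Count strand-neutral k-mers across the conserved blocks of one CSC."""
    counts = Counter()
    for block in blocks:
        block = block.upper()
        for start in range(len(block) - k + 1):
            counts[canonical(block[start:start + k])] += 1
    return counts

def compare(query_blocks, target_blocks, k=6):
    """Relate an input CSC to one database CSC."""
    query = element_counts(query_blocks, k)
    target = element_counts(target_blocks, k)
    shared = query.keys() & target.keys()
    # Step 1: multi-copy elements of the input that the database CSC shares.
    repeats = sorted(e for e in shared if query[e] > 1)
    # Step 2: single-copy elements of the input that the database CSC shares.
    uniques = sorted(e for e in shared if query[e] == 1)
    # Step 3: copy-number balance (assumed definition, 1.0 = identical counts).
    balance = (
        sum(min(query[e], target[e]) / max(query[e], target[e]) for e in shared)
        / len(shared)
        if shared else 0.0
    )
    return {"repeats": repeats, "uniques": uniques, "balance": balance}

def rank(query_blocks, database, k=6):
    """Rank database CSCs that share at least one repeat with the input."""
    hits = []
    for name, blocks in database.items():
        result = compare(query_blocks, blocks, k)
        if result["repeats"]:
            hits.append((name, result))
    # Hypothetical ranking key: shared repeats, then shared uniques, then balance.
    hits.sort(
        key=lambda hit: (len(hit[1]["repeats"]), len(hit[1]["uniques"]), hit[1]["balance"]),
        reverse=True,
    )
    return hits

# Hypothetical conserved blocks; the sequences are made up for illustration.
query = ["CAGCTGAACAGCTG", "TTTGACC"]
database = {
    "related": ["GGCAGCTGCCCAGCTGTT", "GGTCAAA"],
    "single_copy": ["CAGCTG"],
    "unrelated": ["ATATATATAT"],
}
ranked = rank(query, database)
```

Counting elements on both strands reflects the observation that shared elements need not occur in a fixed orientation. In this toy database the `unrelated` entry is dropped at the first step because it shares no repeat with the input, and `single_copy` scores a lower balance than `related` because it carries one copy of an element that occurs twice in the input.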
Our comparative analysis of enhancers also reveals that there is no single combination of DNA-binding sites of known regulators or novel conserved sequence elements that can accurately predict enhancer regulatory behavior. However, enhancers that have a balance in copy number of shared sequence elements are more likely to exhibit similar regulatory activities. Although enhancers with similar regulatory behaviors share both multi-copy sequence motifs and unique conserved sequence elements that are balanced in copy number, the arrangement of these shared elements differs between enhancers. Our studies also demonstrate that many enhancers are multifunctional; they regulate gene expression during different temporal phases of development. No other comparative alignment program allows the user to generate an inventory of conserved repeat and unique sequences that are shared between CSCs, an essential step in the analysis of their structure. Since the database includes most of the genomic repertoire of CSCs, these tools should aid the further analysis of other novel functionally important sequences and the discovery of enhancers that drive gene expression during any developmental process or biological event. To our knowledge, this is the first systematic catalog of conserved DNA sequences within any phylogenetic group.
CONCLUSIONS
We have generated a Drosophila genome-wide database of evolutionarily conserved DNA sequences that allows for discovery of functionally related enhancers. A cis-Decoder search identifies database CSCs that share balanced conserved sequence elements with an input enhancer. No prior information about the functional significance of DNA sequences within enhancers is required to identify other related enhancers. The database provides an inventory of conserved repeat sequences within CSCs and enables comparison between input and database CSCs by various metrics that allow the user to judge CSC similarity. Starting with a temporally restricted NB enhancer, we have shown that cis-Decoder can successfully identify other similarly regulating enhancers, and we also demonstrate how other functionally distinct enhancer families can be identified.
Our comparative analyses of the enhancers described in this report and of an additional 60 enhancers have yielded the following observations concerning enhancer structure and behavior: (1) Functionally related enhancers can be identified based on their balanced copy numbers of shared conserved repeat elements. (2) Enhancers that have extensive shared conserved sequence elements (often >60%), but do not have balanced shared repeat copy numbers, may display significantly different regulatory behaviors. (3) Shared repeat and unique elements between functionally related enhancers are not found in any fixed order or orientation. (4) Similarly regulating families of enhancers need not share specific sets of conserved sequence elements, since different enhancers can accomplish the same regulatory behavior with different but overlapping sets of conserved elements. (5) Enhancers that share conserved repeat elements and perform related cis-regulatory functions also contain unique sets of repeat elements that are only partially shared with other related enhancers.
Our observations have revealed that Drosophila CNS developmental enhancers are highly complex, based on their conserved sequence composition, and many have proven to be multifunctional. The observed complexity of enhancers, specifically with regard to multi-copy repeat motifs, also suggests that enhancer function is realized through a complex process involving combinatorial interactions among many factors and cannot be easily explained by single activator/repressor transcription factor switches. In addition, the fact that functionally diverse enhancers can display such extensive overlap in their conserved sequences underscores the combinatorial complexity of cis-regulation (also see Southall and Brand, 2009). Because of the lack of fixed order and orientation of shared elements between related enhancers, only the alignment flexibility of the cis-Decoder CSB aligner can rapidly detect the extent and makeup of shared conserved sequences between different enhancers. Until now, enhancer boundaries have, for the most part, been resolved by reporter transgene deletion analysis. The addition of evolutionary clustering of conserved sequences to this identification process will aid in enhancer identification and allow for an assessment of their structure and spatial constraints. cis-Decoder algorithms also allow one to generate libraries of conserved sequence elements that are shared among enhancers; this dataset will be useful for understanding the combinatorial complexity of tissue-specific gene regulation.
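A library of elements shared among enhancers could be assembled along the following lines. The sketch assumes that elements are fixed-length k-mers compared without regard to strand or position, which mirrors the lack of fixed order and orientation noted above; the function names, the enhancer names, the sequences, and the parameters `k` and `min_enhancers` are all hypothetical and do not describe the cis-Decoder algorithms themselves.

```python
from collections import defaultdict

_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def strand_neutral(kmer):
    """Return the smaller of a k-mer and its reverse complement."""
    return min(kmer, kmer.translate(_COMPLEMENT)[::-1])

def shared_element_library(enhancers, k=6, min_enhancers=2):
    """Map each element to the enhancers that contain it.

    enhancers: dict of enhancer name -> list of conserved blocks.
    Only elements found in at least `min_enhancers` enhancers are kept.
    """
    carriers = defaultdict(set)
    for name, blocks in enhancers.items():
        for block in blocks:
            block = block.upper()
            for start in range(len(block) - k + 1):
                carriers[strand_neutral(block[start:start + k])].add(name)
    return {
        element: sorted(names)
        for element, names in carriers.items()
        if len(names) >= min_enhancers
    }

# Made-up blocks: enhB carries the reverse complement of the enhA block.
enhancers = {
    "enhA": ["GGCAGCTGAT"],
    "enhB": ["TTATCAGCTGCC"],
    "enhC": ["AAAAAAAAAA"],
}
library = shared_element_library(enhancers)
```

Because matching is strand-neutral and ignores position, the elements of `enhA` are recovered in `enhB` even though they occur there in the opposite orientation, whereas `enhC` contributes nothing to the library.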