Drs. Gómez-Skarmeta and Lenhard contributed equally to this work.
Reviews–A Peer Reviewed Forum
New technologies, new findings, and new concepts in the study of vertebrate cis-regulatory sequences
Article first published online: 4 JAN 2006
Copyright © 2006 Wiley-Liss, Inc.
Special Issue: Campos-Ortega Special Focus
Volume 235, Issue 4, pages 870–885, April 2006
How to Cite
Gómez-Skarmeta, J. L., Lenhard, B. and Becker, T. S. (2006), New technologies, new findings, and new concepts in the study of vertebrate cis-regulatory sequences. Dev. Dyn., 235: 870–885. doi: 10.1002/dvdy.20659
- Issue published online: 10 MAR 2006
- Manuscript Accepted: 7 NOV 2005
- Research Council of Norway
- Sars Centre
- University of Bergen
- European Commission. Grant Number: LSHG-CT-2003-503469
- Spanish Ministry of Education and Science. Grant Number: BFU2004-00310
- Junta de Andalucía
- Pharmacia Corporation
- Swedish National Research Council
- enhancer detection;
- phylogenetic footprinting;
- gene desert;
- human genome;
- highly conserved noncoding regions;
- hox cluster
All vertebrates share a similar early embryonic body plan and use the same regulatory genes for their development. The availability of numerous sequenced vertebrate genomes and significant advances in bioinformatics have resulted in the finding that the genomic regions of many of these developmental regulatory genes also contain highly conserved noncoding sequence. In silico discovery of conserved noncoding regions and of transcription factor binding sites as well as the development of methods for high throughput transgenesis in Xenopus and zebrafish are dramatically increasing the speed with which regulatory elements can be discovered, characterized, and tested in the context of whole live embryos. We review here some of the recent technological developments that will likely lead to a surge in research on how vertebrate genomes encode regulation of transcriptional activity, how regulatory sequences constrain genomic architecture, and ultimately how vertebrate form has evolved. Developmental Dynamics 235:870–885, 2006. © 2006 Wiley-Liss, Inc.
The control of gene expression during embryonic development is a key area of research in understanding the evolution of animal diversity (Davidson,2001). A large part of our knowledge of how metazoan gene regulation is achieved at the transcriptional level comes from extensive and elegant experimentation in Drosophila and the sea urchin, where the aim of inquiry has reached the resolution of entire gene regulatory networks (Levine and Tjian,2003; Levine and Davidson,2005). Despite several significant studies in mouse and frog (e.g., Ghislain et al.,2003; Koide et al.,2005), vertebrates are still far below this level of sophistication. This difference is to a large part due to their greater genome size and complexity and to the fact that cis-regulatory sequences can be very far from their corresponding transcriptional unit(s). While traditional ways to discover cis-regulatory sequences relied on experimental approaches where a sequence suspected to contain regulatory activity was placed in the context of a basal promoter driving a reporter gene, recently there have also been substantial efforts using the comparison of multiple genome sequences to search for noncoding, but nevertheless conserved, sequences. For instance, the ENCODE project has as its aim the identification of all functional elements in the human genome and represents an international, multidisciplinary endeavor led by the National Human Genome Research Institute (The ENCODE Consortium, 2004). Apart from exonic sequence, which comprises approximately 3.7% of the human genome, there are an additional 1–2% single copy conserved nongenic sequences recognizable by human-mouse comparisons for the vast majority of which a function has yet to be assigned (Dermitzakis et al.,2005). In publications that have appeared over the past year, these conserved sequences have been extracted with different conservation criteria with respect to sequence level conservation and evolutionary depth. 
Depending on the publication and, to a lesser extent, on the conservation level, they have been referred to as ultraconserved regions (UCR; Bejerano et al.,2004; Sandelin et al.,2004b), conserved noncoding elements (CNE; Woolfe et al.,2005), conserved nongenic sequences (CNG; Dermitzakis et al.,2005), or highly conserved noncoding regions (HCNRs; de la Calle-Mustienes et al.,2005). Bejerano et al. (2004) extracted 481 elements with perfect conservation over 200 or more base pairs between human and mouse, whereas Sandelin et al. (2004b) identified 3,583 segments of 50 or more bp with >95% conservation between human and mouse, which had detectable similarity in fugu and had no evidence of being transcribed into mature RNAs. The latter, less stringent but evolutionarily deeper, selection revealed more clearly the genome-level organization of these conserved sequences into arrays that span the loci of their presumptive target genes (Fig. 1). Woolfe et al. (2005) chose non-RNA–overlapping sequences of 100 bp or more that align between human and fugu with Megablast, resulting in 1,373 unique segments. Here, we refer to these regions as HCNRs, except when we specifically mean the extremely conserved elements defined previously (Bejerano et al.,2004), which we refer to as UCRs.
In transgenic assays in mice, frogs, and zebrafish, several of these regions have been shown to act as enhancers (Muller et al.,2002; Nobrega et al.,2003; de la Calle-Mustienes et al.,2005; Goode et al.,2005; Poulin et al.,2005; Woolfe et al.,2005). Often these regions are scattered over large distances in vertebrate genomes, but cluster in regions of low gene density around genes encoding transcription factors or, more rarely, signaling molecules, such as growth factors, or their receptors and binding partners (Sandelin et al.,2004b; Woolfe et al.,2005). Moreover, genomic areas of the human genome containing these conserved regions and the associated developmental genes show significant synteny, not only to rodents but all the way to teleosts (Woods et al.,2000; Goode et al.,2005). However, not all regulatory sequences are highly conserved, but their activity can be detected, at least in teleost models, by random insertion of reporter vectors (Ellingsen et al.,2005). We will argue here that it is this type of nongenic sequences, including equivalent elements with lower conservation, that often impose conservation of genomic architecture on all vertebrates. The systematic discovery and characterization of cis-regulatory sequences are the subjects of this study, and we review here recent methodologies that we believe will accelerate this process.
Computational Approaches to Characterization of cis-Regulatory Elements
The discovery of developmental regulatory elements in silico can be approached on multiple levels. The fundamental problems in genome- or even gene-locus wide searches for regulatory elements are (1) the right choice of genomic region around the target gene, and (2) locating functional regulatory elements in them. The choice of genomic region depends on the knowledge of precise location of the gene within it. One of the first prerequisites for the assignment and annotation of regulatory elements is a complete and reliable annotation of the genome with the positions of full-size transcripts. This is no longer a problem for human and mouse genes, where large-scale transcriptome sequencing efforts have resulted in a high coverage of transcript sequences from which it is possible to deduce positions of 5′ ends as well as different alternative transcripts. For some other model organisms (e.g., fish or insect species with sequenced genomes), the transcriptome data are much less complete. For fish genomes, it is not straightforward to remap complete transcripts of other vertebrates. Further large-scale transcriptome analysis efforts might amend this soon.
In Caenorhabditis elegans and Drosophila, as well as most of the tissue-specific genes of vertebrates, the first place to look for regulatory elements has been in proximal regions just upstream of transcription start sites, although enhancer elements in introns or downstream of a transcriptional unit are not uncommon, especially in the fly. For a comprehensive overview of Drosophila and C. elegans noncoding sequence and genome architecture, see Nelson et al. (2004). Precise localization of transcription start sites is important for the accurate determination of core and proximal promoter elements; for the detection of distal enhancers, they are of lesser importance. The genomic organization of conserved elements around vertebrate developmental genes (Sandelin et al.,2004b) reveals that they do not show preference for locations at 5′ ends of genes but span the entire gene locus and are found 5′ and 3′ to the gene and in its introns (Fig. 1). In addition, the type and composition of the core promoter is likely related to its responsiveness to long-range enhancers (see below).
Sequence Motif Detection
The ultimate goal of in silico searches is to determine the positions of binding sites for transcription factors (or other types of sequence elements) that regulate the expression pattern of the gene(s) of interest. The simplest approach to search for individual transcription factor binding sites is by using known motif models. This strategy is computationally straightforward to implement, but relies critically on the availability and quality of models of binding sites. The binding sites can be represented as consensus strings or regular expressions, or as more informative position-specific score matrices (PSSM) (Stormo,2000); an example of different representations of the mammalian carbohydrate response element is given in Figure 2. For an extensive review of transcription factor binding site detection methods and associated tools, see Wasserman and Sandelin (2004).
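To make the matrix representation concrete, the following Python sketch builds a log-odds position-specific score matrix from a handful of aligned sites and slides it along a sequence. The example sites, the pseudocount, the uniform background frequency of 0.25, and the score threshold are all assumptions chosen for illustration; they are not taken from Figure 2 or from any published matrix.

```python
import math

def build_pssm(sites, pseudocount=1.0, background=0.25):
    """Build a log2-odds matrix (one dict per position) from equal-length aligned sites."""
    width = len(sites[0])
    total = len(sites) + 4 * pseudocount
    pssm = []
    for pos in range(width):
        column = [site[pos] for site in sites]
        pssm.append({
            base: math.log2(((column.count(base) + pseudocount) / total) / background)
            for base in "ACGT"
        })
    return pssm

def scan(sequence, pssm, threshold):
    """Return (offset, word, score) for every window scoring at or above threshold."""
    width = len(pssm)
    sequence = sequence.upper()
    hits = []
    for i in range(len(sequence) - width + 1):
        word = sequence[i:i + width]
        if any(base not in "ACGT" for base in word):
            continue  # skip windows containing ambiguous characters
        score = sum(pssm[j][base] for j, base in enumerate(word))
        if score >= threshold:
            hits.append((i, word, score))
    return hits

# Hypothetical aligned sites (invented for this example).
toy_sites = ["CACGTG", "CACGTG", "CACGCG", "CACGTG"]
toy_pssm = build_pssm(toy_sites)
toy_hits = scan("TTTCACGTGAAA", toy_pssm, threshold=5.0)
```

Unlike a consensus string, the matrix scores a near-match such as CACGCG only slightly lower than the most common site, which is why it is the more informative representation; the choice of threshold then controls the trade-off between sensitivity and the number of spurious hits.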
Using known motif models to scan for putative transcription factor binding sites works reasonably well for short individual regulatory regions in which one more or less knows what to look for. In the absence of this information, exploratory searches for transcription factor binding sites have been plagued by a low signal-to-noise ratio (Wasserman and Sandelin,2004) and the large size of noncoding regions, which can potentially harbor regulatory determinants in higher eukaryotes (Bondarenko et al.,2003). The low binding specificity of individual eukaryotic transcription factors precludes the application of such models to genome-wide analysis (Stormo,2000). The only available biologically meaningful transcription factor binding site data come from still relatively rare experimental analyses of inferred regulatory regions (e.g., Koide et al.,2005; von Bubnoff et al.,2005). However, apart from the fact that many vertebrate genes have a discernible 5′ proximal promoter region with a TATA-box, a CAAT-box, and a GC-box (Bucher,1990), which is by no means universal (Smale and Kadonaga,2003), the information on other promoter regulatory elements has been scarce (Fickett and Wasserman,2000). This was exacerbated by evidence that the results obtained in experimental systems for studying regulatory elements, such as in vitro binding assays (Tronche et al.,1997) and sometimes even in vivo reporter gene constructs, do not always correlate well with the expression pattern of the studied regulatory genes in their original genomic context (e.g., Woolfe et al.,2005). Because these studies often use heterologous promoters, it is possible that not all proximal promoters interact with all enhancers in the same way.
De Novo Sequence Motif Discovery
If one wants to search for novel transcription factor binding sites in silico, the prerequisite is a collection of regulatory sequences that are thought to contain one or more types of binding sites in common, either by virtue of sharing a common regulatory mechanism or by physically binding the same protein in an experimental system. For example, de novo motif discovery on regulatory elements of genes up-regulated in the yeast amino acid starvation response will reveal a common motif that binds the GCN4 protein (Fig. 3), a known regulator of the response (Natarajan et al.,2001). Novel motifs are detected by one of many motif discovery algorithms. The best-known ones were compared and benchmarked by Tompa et al. (2005), and a newer method is presented by Down and Hubbard (2005). Motif discovery algorithms do not rely on pre-existing models and can in principle find novel, previously unknown motifs. However, the algorithms are typically computationally intensive and prone to “motif drowning” if the sequences are too long compared with the motif strength and density.
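As a deliberately simple illustration of pattern discovery without a prior model, the Python sketch below enumerates all words of a fixed length and ranks them by the number of input sequences in which they occur. Exhaustive word counting is an assumption made here for brevity; it is not one of the benchmarked algorithms, which typically model degenerate motifs probabilistically. The input sequences are invented.

```python
from collections import Counter

def rank_shared_kmers(sequences, k=7):
    """Rank k-mers by the number of sequences that contain them at least once."""
    support = Counter()
    for seq in sequences:
        seq = seq.upper()
        # A set ensures each sequence votes once per k-mer, however often it occurs.
        kmers = {seq[i:i + k] for i in range(len(seq) - k + 1)}
        support.update(kmers)
    # Sort by decreasing support, then alphabetically for a stable order.
    return sorted(support.items(), key=lambda item: (-item[1], item[0]))

# Hypothetical upstream fragments sharing one planted word.
toy_promoters = [
    "AAGTGACTCATT",
    "CCTGACTCAGGA",
    "GTTTGACTCAAC",
    "TGACTCACCGTA",
]
best_word, n_supporting = rank_shared_kmers(toy_promoters)[0]
```

With short inputs the planted word stands out clearly. As the sequences grow longer, unrelated words recur by chance in more of them, which is one simple way to picture the "motif drowning" effect.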
Modeling and Discovery of cis-Regulatory Modules
Some progress was made upon the observation that some tissue-specific genes are regulated by cis-regulatory modules (CRM). A CRM is defined as a cluster of transcription factor binding sites of related physiological function spanning a short region in the neighborhood or an intron of the target gene. CRMs enabled the building of the first successful predictive models for tissue-specificity of mammalian genes (Wasserman and Fickett,1998; Krivan and Wasserman,2001), followed by even simpler models for the developmental genes of Drosophila (Berman et al.,2002; Rajewsky et al.,2002). All these approaches relied on prior knowledge of the transcription factor binding site composition of several experimentally characterized CRMs, which served to construct the predictive models. However, it is believed that many other modes of context-specific expression are controlled by CRMs, but experimental evidence of the actual binding sites that define them is unavailable. In such cases, the computational challenge is to predict similar clusters of transcription factor binding sites that a set of sequences has in common. Several recently developed tools such as MSCAN (Alkema et al.,2004) and EMCMODULE (Gupta and Liu,2005) are able to perform this “CRM discovery”, i.e., detect clusters of co-occurring transcription factor sites without previous biological knowledge.
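The notion of a module as a local cluster of sites can be sketched in a few lines of Python. The function below chains predicted sites that lie close together and reports groups containing several distinct factors. The gap of 100 bp, the minimum of two factors, and the factor labels are assumptions for illustration; this heuristic does not reproduce the statistical models of the cited tools.

```python
def find_site_clusters(hits, max_gap=100, min_factors=2):
    """Group predicted binding sites into candidate modules.

    hits: iterable of (position, factor_name) pairs on one sequence.
    Consecutive sites no more than max_gap bp apart are chained into one group;
    a group is reported if it contains at least min_factors distinct factors.
    Returns a list of (first_position, last_position, sorted_factor_names).
    """
    groups = []
    for pos, name in sorted(hits):
        if groups and pos - groups[-1][-1][0] <= max_gap:
            groups[-1].append((pos, name))   # extend the current chain
        else:
            groups.append([(pos, name)])     # start a new chain
    clusters = []
    for group in groups:
        factors = sorted({name for _, name in group})
        if len(factors) >= min_factors:
            clusters.append((group[0][0], group[-1][0], factors))
    return clusters

# Hypothetical site predictions; TF_A, TF_B, TF_C are placeholder labels.
toy_hits = [
    (100, "TF_A"), (150, "TF_B"), (180, "TF_C"),
    (5000, "TF_A"),
    (9000, "TF_B"), (9050, "TF_B"),
]
toy_clusters = find_site_clusters(toy_hits)
```

Only the first group qualifies here: the isolated site and the pair of sites for a single factor are discarded, reflecting the idea that combinations of different factors, rather than individual sites, carry the predictive signal.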
A major advance was to use cross-species comparisons in search of regulatory sequences (Wasserman et al.,2000). The idea behind phylogenetic footprinting is that functionally significant parts of the genomic sequences evolve more slowly than their nonfunctional neighborhood. Given two or more orthologous noncoding sequences, their alignment and localization of the most conserved regions will narrow down the range in which to search for active regulatory elements (Fig. 4). Phylogenetic footprinting itself is not a motif discovery tool but a method to reduce the sequence search space in a biologically meaningful manner. It is commonly used in combination with motif detection and discovery, where it has been shown to increase the signal-to-noise ratio in transcription factor binding site detection by an order of magnitude (see Fig. 5, for an example), at the expense of only a modest decrease in sensitivity (Lenhard et al.,2003). Phylogenetic filtering has been combined successfully with other detection methods such as CRM detection (Krivan and Wasserman,2001) and de novo motif discovery (Blanchette and Tompa,2002), resulting in a significant increase in the specificity of predictions. Indeed, as shown for some cases below, phylogenetic footprinting has been extremely useful to select candidate regions that were functionally tested by transgenic assays (e.g., Muller et al.,2002; Nobrega et al.,2003; de la Calle-Mustienes et al.,2005; Goode et al.,2005; Poulin et al.,2005; Woolfe et al.,2005).
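The filtering step of phylogenetic footprinting can be sketched as a sliding-window identity scan over a pairwise alignment. The Python example below assumes the two sequences have already been aligned (gaps written as "-") and returns the alignment intervals covered by windows that meet an identity cutoff. The function name and the default window of 50 columns at 95% identity are illustrative assumptions, and the short test sequences are invented.

```python
def conserved_windows(aligned_a, aligned_b, window=50, min_identity=0.95):
    """Return merged (start, end) column intervals, end exclusive, covered by
    at least one window whose fraction of identical, ungapped columns is
    at or above min_identity."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    a, b = aligned_a.upper(), aligned_b.upper()
    n = len(a)
    match = [1 if x == y and x != "-" else 0 for x, y in zip(a, b)]

    flagged = [False] * n
    running = sum(match[:window])  # matches in the first window
    for start in range(n - window + 1):
        if start > 0:
            # Slide by one column: add the entering column, drop the leaving one.
            running += match[start + window - 1] - match[start - 1]
        if running / window >= min_identity:
            for i in range(start, start + window):
                flagged[i] = True

    # Merge runs of flagged columns into intervals.
    intervals = []
    start = None
    for i, is_conserved in enumerate(flagged + [False]):
        if is_conserved and start is None:
            start = i
        elif not is_conserved and start is not None:
            intervals.append((start, i))
            start = None
    return intervals

toy_a = "AAAAACCCCCGGGGGTTTTT"
toy_b = "AAAAATTTTTGGGGGTTTTT"
toy_strict = conserved_windows(toy_a, toy_b, window=5, min_identity=1.0)
toy_relaxed = conserved_windows(toy_a, toy_b, window=5, min_identity=0.8)
```

Motif scanning or motif discovery would then be restricted to the returned intervals; relaxing the identity cutoff widens them, which trades specificity for sensitivity.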
Comparison across multiple species can further increase the signal-to-noise ratio in phylogenetic footprinting, enabling true genome-wide detection and discovery of regulatory motifs in yeast and mammalian genomes (Thomas et al.,2003; Harbison et al.,2004; Xie et al.,2005). It must be noted, however, that a minority of functional transcription factor binding sites do not appear to be conserved across evolutionary distances commonly used in phylogenetic footprinting (some examples are given in Moses et al.,2004). Many of them are still conserved across more closely related species, but there the background conservation is often too high to identify those elements reliably against the nonfunctional neighborhood. When orthologous sequences from multiple closely related species are available, an alternative approach is to superimpose the (relatively rare) differences across multiple sequences to identify regions that mutate more rapidly and, therefore, have no selective pressure acting on them, leaving the remaining, slowly changing regions as candidate functional elements. This complementary approach is known as phylogenetic shadowing, and it has been applied for the identification of primate-specific cis-regulatory elements (Boffelli et al.,2003) and the identification of pre-microRNA genes (Berezikov et al.,2005). Another efficient approach to identify evolutionarily conserved regions not detectable by simple pairwise sequence comparisons is the targeted sequencing of defined genomic regions in diverse vertebrate species, as was done for a genomic fragment of human chromosome 7, including the gene mutated in cystic fibrosis (Thomas et al.,2003). The disadvantage of these methods is that the sequences of a relatively large number of species are required for effective filtering, and those sequences are at present not available for most genes.
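The superimposition of differences used in phylogenetic shadowing can be illustrated with a per-column variation profile over a multiple alignment. The Python sketch below is a simplification under stated assumptions: every species is weighted equally, gaps are treated as ordinary characters, and no phylogenetic tree or substitution model is used, unlike in the published method. The alignment is invented.

```python
from collections import Counter

def column_variation(alignment):
    """For each alignment column, return the fraction of sequences that differ
    from the most common character in that column."""
    n = len(alignment)
    profile = []
    for column in zip(*alignment):
        majority_count = Counter(column).most_common(1)[0][1]
        profile.append((n - majority_count) / n)
    return profile

def windowed_variation(profile, window):
    """Average the per-column variation over sliding windows."""
    return [
        sum(profile[i:i + window]) / window
        for i in range(len(profile) - window + 1)
    ]

# Hypothetical alignment of four closely related sequences.
toy_alignment = ["ACGTAC", "ACGTAC", "ACGAAC", "ACGCAC"]
toy_profile = column_variation(toy_alignment)
toy_smoothed = windowed_variation(toy_profile, window=3)
```

Windows with the lowest averaged variation are the candidates for elements under constraint. With only a few species most columns show no differences at all, which illustrates why a relatively large number of sequences is needed before the profile becomes informative.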
Methods for Detection of Transcription Factor Binding Sites Shared by Coregulated Sets of Genes
The detected and discovered putative transcription factor binding sites are of potentially great utility in detecting elements common to long-range enhancers if combined with suitable descriptors of their spatiotemporal expression patterns. The simplest reliable descriptor that is in wide use is microarray profiling data. For tracking the expression of developmental genes and their enhancers during embryonic development, more structured ontology data might be preferred, broken into spatial (anatomical) and temporal (developmental stage) categories. Also, high throughput in situ hybridizations, as exemplified in the zebrafish (Thisse et al.,2001; see www.zfin.org), in combination with the Sanger zebrafish assembly, will allow detection of coregulated genes. Once we have sets of reliable coregulated genes, we are still faced with the choice of whether we want to look for novel motifs (i.e., pattern discovery) or just detect overrepresented motifs matching known genes. The latter is impaired by the large number of motifs we test for overrepresentation, where we encounter the standard statistical problem of multiple testing, correction for which can seriously decrease computed confidence levels for motif overrepresentation. One approach that works well for exploratory purposes is to use a limited number of familial binding site models (Sandelin and Wasserman,2004), which are generalized position-specific score matrices that capture core DNA binding properties of whole structural classes of transcription factors. Among the tools available for detecting transcription factor binding sites or predefined combinations thereof in a set of sequences, recent ones such as oPOSSUM (Ho Sui et al.,2005) and SynoR (Ovcharenko and Nobrega,2005) were successfully used to detect overrepresented binding sites common to sets of coregulated vertebrate genes from previously published expression profiling studies.
A further powerful technique to identify transcription factor binding sites in situ, termed serial analysis of chromatin occupancy (Impey et al.,2004), complements the in silico approaches, but this as well as related approaches are beyond the scope of this study.
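The multiple testing problem that arises when many motifs are tested for overrepresentation can be made concrete with a small Python sketch. It computes a one-sided hypergeometric probability for each motif and applies a Bonferroni correction. Both choices are assumptions made for illustration and are not necessarily the statistics implemented by the tools cited above; the motif names and p-values in the example are invented.

```python
from math import comb

def hypergeom_tail(k, K, n, N):
    """P(X >= k): probability of seeing at least k motif-bearing genes in a
    coregulated set of n genes, drawn from N genes of which K carry the motif."""
    total = comb(N, n)
    upper = min(K, n)
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / total

def bonferroni(p_values):
    """Multiply each p-value by the number of tests, capping at 1.0."""
    m = len(p_values)
    return {name: min(1.0, p * m) for name, p in p_values.items()}

# Hypothetical example: 10 genes in total, 3 carry the motif, and all 3 fall
# within a coregulated set of 3 genes.
toy_p = hypergeom_tail(k=3, K=3, n=3, N=10)

# Hypothetical raw p-values for two motif models tested on the same gene set.
toy_adjusted = bonferroni({"motif_1": 0.01, "motif_2": 0.6})
```

Because the corrected value scales with the number of motifs tested, a result that looks convincing for a single motif can lose significance when hundreds of models are screened, which is the motivation for working with a limited number of familial binding site models.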
Extracting Conserved Noncoding Regions and Determination of Their Genomic Organization
The aforementioned highly conserved elements implicated in developmental regulation can be detected solely on the basis of their conservation across different vertebrate genomes (Bejerano et al.,2004; Sandelin et al.,2004b; Woolfe et al.,2005), and suitably chosen conservation criteria can also reveal their overall genomic organization (Sandelin et al.,2004b). There are several methods for extracting conserved noncoding regions in model organisms (see, e.g., Loots and Ovcharenko,2005); for more advanced scenarios, the reader is referred to two recently published protocols with reviews of available methods (Bejerano et al.,2005; Papatsenko and Levine,2005). If the analysis is performed on the genomic sequence of an organism with an available genome assembly, one is best advised to manipulate all detected regions as genomic coordinates. The UCSC Genome Browser (Karolchik et al.,2003; Web site: http://genome.ucsc.edu) recently has become equipped with a batch conversion tool to remap the coordinates between different versions of the genome assemblies, as well as between closely related species. For HCNRs, remapping functions well across the genomes of different mammals.
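Handling detected regions as genomic coordinates makes operations such as discarding conserved segments that overlap annotated exons straightforward. The Python sketch below assumes a simple (chromosome, start, end) tuple with 0-based, half-open intervals in the style of BED files; the function names and the example coordinates are hypothetical.

```python
def overlaps(region_a, region_b):
    """True if two (chrom, start, end) half-open intervals share at least one base."""
    chrom_a, start_a, end_a = region_a
    chrom_b, start_b, end_b = region_b
    return chrom_a == chrom_b and start_a < end_b and start_b < end_a

def noncoding_only(conserved_regions, exons):
    """Keep conserved regions that do not overlap any annotated exon."""
    return [
        region for region in conserved_regions
        if not any(overlaps(region, exon) for exon in exons)
    ]

# Hypothetical coordinates, for illustration only.
toy_conserved = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 100, 200)]
toy_exons = [("chr1", 150, 300)]
toy_noncoding = noncoding_only(toy_conserved, toy_exons)
```

The nested scan is quadratic in the number of regions, which is adequate for a single locus; a genome-wide analysis would normally sort the intervals or use an interval index instead.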
Pattern Composition of Highly Conserved Noncoding Regions
While the conservation levels of HCNRs cannot be explained solely by the selective pressure to retain transcription factor binding specificity, their sequence does have distinguishable properties. HCNRs are more AT-rich than average noncoding sequence and many have a high overrepresentation of motifs resembling homeobox binding sites and some other motifs implicated in embryonic development and differentiation (Sheng and Lenhard, manuscript in preparation). Based on that and their demonstrated enhancer activity, it is likely that many HCNRs serve as regulatory input sites that harbor clusters of binding sites that are bound by different transcription factors in a combinatorial manner to produce complex spatiotemporal expression patterns. Therefore, one of the immediate goals of the analysis of these elements will be to determine those binding sites and the transcription factors that bind them (Vavouri and Elgar,2005). The computational approaches described here will play an essential role in that endeavor.
Transgenic Tools for Testing of cis-Regulatory Sequences
Once potential cis-regulatory elements are identified, they have to be verified by experiment, and this is usually done by placing the sequence into a reporter construct that is then used for transfection into tissue culture cells or to test for expression in embryos, either as transient assays or in stable transgenes. Although much has been learned about enhancers and responsive elements in cell culture, gene transfer into embryos will ultimately be the only way to resolve tissue-specific spatiotemporal expression in the context of a whole developing organism. Also of concern here is the resolution in space and time at which gene expression can be observed. The vertebrate system in which the cis-regulatory activity of candidate sequences has most often been tested to date is the mouse. The reason for this choice is that mice are mammals, are experimentally amenable, and are relatively close to humans from an evolutionary point of view. Additionally, there are established, reliable methods for transgenesis as well as the possibility of generating site-specific deletions by homologous recombination. However, manipulating the mouse genome is labor intensive and costly, and observation of embryonic development cannot be done outside of the mother. Frog and zebrafish/medaka develop externally and are relatively easy to manipulate. The fish systems have the additional advantage of embryo transparency and rapid development and lend themselves to live whole embryo imaging using fluorescent protein markers (Megason and Fraser,2003), thus allowing testing of enhancer activity at single cell resolution. Electroporation into chick embryos also has been used to test for potential enhancers, a method that is rapid although limited largely to genes expressed in the neural tube (Ghislain et al.,2003; Uchikawa et al.,2003).
Although targeted deletions are not yet possible in either frogs or fish, several technological developments suggest that both of these systems can be put to use to complement the mouse in the quest to understand vertebrate cis-regulation. Efficient transgenesis protocols have been established for frogs (Amaya and Kroll,1999), where sperm can be made transgenic and used to fertilize oocytes, giving rise to a transgenic embryo in which expression of a reporter, usually green fluorescent protein (GFP), can be readily observed in real-time (e.g., Hartley et al.,2001; Latinkic et al.,2004; Martynova et al.,2004; de la Calle-Mustienes et al.,2005; Khokha and Loots,2005). This is a powerful system, which allows testing a putative sequence in a matter of days rather than the months it would take using the mouse. Although this methodology is not available in the fish model systems, there are several methods for high throughput transgenesis (reviewed in Amsterdam and Becker,2005). The embryos of these species are transparent and allow observation of GFP within the embryo. Testing of reporter constructs can be done transiently, that is, in a nontransgenic fish embryo that was injected and in which reporter expression is observed in a mosaic manner. In a pioneering study, this technique was used to identify enhancer elements in the sonic hedgehog locus (Muller et al.,1999). Thereafter, it has been used extensively for enhancer analysis in zebrafish (reviewed in Muller et al.,2002; see also Dickmeis et al.,2004). Recently, this strategy has been used to test several elements conserved between human and fish (e.g., de la Calle-Mustienes et al.,2005; Goode et al.,2005; Woolfe et al.,2005). Although this approach is rapid, a drawback is that expression is not observed in all cells that would activate reporter expression in a stable transgene; hence, expression patterns have to be compiled from many injected embryos.
In addition, observation of expression patterns over several days becomes extremely tedious. The alternative is the generation of transgenic fish, which recently has become a straightforward and efficient affair, mainly by using plasmid DNA, retroviruses or transposons, which integrate into the fish germ line (reviewed in Shafizadeh et al.,2002; Kawakami,2004; Amsterdam and Becker,2005). Of help might also be the use of the tol2 transposon system, which promotes high frequency of integration of DNA presumably at very early stages and promotes more consistent patterns of enhancer activity in injected zebrafish embryos (Kawakami,2004; J.L.G-S., unpublished observations). However, whether this effect is a result of higher numbers of insertions or through a few very early integrations is currently not clear, as this system has not been used extensively yet in the testing of cis-regulatory sequences.
A further technology in the zebrafish is enhancer detection, a technique where a basal promoter driving a fluorescent protein cassette is inserted randomly into the fish genome with the idea that, if such a construct inserts near an endogenous enhancer, it will be activated and fluorescence can be observed with the spatiotemporal specificity of cis-regulatory sequence(s) (Balciunas et al.,2004; Parinov et al.,2004; Ellingsen et al.,2005). This approach can be used both for detection of activating sequences by using a basal promoter with no endogenous activity or by using a ubiquitous promoter and then selecting for restriction of expression pattern (reviewed in Amsterdam and Becker,2005). Provided large numbers of transgenic lines can be produced and characterized, these technologies can be used to probe the genome for local cis-regulatory information. Once an expression pattern is recovered, the retroviral or transposon insertion can be located by isolating sequence flanking the insertion and mapping it to the zebrafish genome sequence (Amsterdam and Becker,2005; Ellingsen et al.,2005). Although this technique in itself would not allow identification of regulatory sequences, because, as mentioned above, these sequences can be very far from the gene they regulate, the combination with phylogenetic footprinting/shadowing makes this a powerful approach complementing other available tools. For instance, insertions in and around Hox clusters allow defining the cis-regulatory activity in a precise genomic location and, thus, permit measuring the extent of the regulatory domain of a given enhancer (Ellingsen et al.,2005; and T.S. Becker, unpublished observations). 
We will give examples in the following sections on how genomic information gleaned from large-scale enhancer detection efforts can be used to complement comparative genomics, to annotate cis-regulatory sequences to the genes they regulate, and to gain insight into the influence of cis-regulatory elements on vertebrate genome architecture.
Enhancers in Vertebrates Often Correlate with Conserved Noncoding Regions
The vertebrate neurogenin genes are required for the induction of neuronal differentiation within the vertebrate nervous system, and they also appear to impart identity to particular subtypes of neurons (reviewed in Bertrand et al.,2002). In the mouse, both Ngn1 and Ngn2 expression patterns are the results of the activity of specific cis-regulatory elements scattered along the Ngn locus and both 5′ and 3′ of the Ngn genes (Scardigli et al.,2001; Nakada et al.,2004). Similarly, for the zebrafish ngn1 gene, two independent regions were found to act as distant enhancers that activate expression in lateral primary neurons and in the anterior neural plate, respectively (Blader et al.,2003). Moreover, sequence comparisons of the zebrafish enhancers with the mammalian Ngn1 loci show high conservation between these species in several genomic regions that are in equivalent positions (Blader et al.,2003). The regions conserved in mouse correspond to some of those exhibiting enhancer activity (Nakada et al.,2004). As expected, homologous zebrafish and mouse enhancers promote expression in equivalent territories (Nakada et al.,2004). However, not all mouse Ngn1 enhancers correspond to zebrafish conserved regions, indicating that certain enhancer elements have been conserved in all vertebrates, whereas others might have been added in the course of evolution. This is also the case for the Sox2 locus, which harbors territory-specific enhancers distributed along a 50-kb interval surrounding the Sox2 transcriptional unit (Uchikawa et al.,2003). All except one of these enhancers coincide with regions of high conservation when comparing genomic sequences from the chick, mouse, and human Sox2 loci. However, not all blocks of conserved sequences between these species seem to show enhancer activity in chick electroporation assays (Uchikawa et al.,2003). 
A potential caveat here is that these elements were only tested for their activity in the neural tube, although they could act as enhancers that function in other tissues or at other time points not targeted in this study. Regardless, enhancer activity is not strictly associated with interspecies conservation, but there seems to be a preponderance toward conservation around early developmental regulatory genes (Sandelin et al.,2004b; Plessy et al.,2005).
In addition, there are many cases of genes controlled by multiple cis-regulatory elements that are redundant. One such example is the homeobox-containing gene Otx2. Mouse transgenic experiments using Otx2 genomic regions directing reporter gene expression allowed detection of enhancers for specific Otx2 expression domains (Kimura et al.,1997,2000; Kurokawa et al.,2004a,b). Interestingly, mouse deletions of some of these enhancers demonstrated their contribution to Otx2 regulation and the presence of redundant elements within the Otx2 locus (Kurokawa et al.,2004a,b). Moreover, many of the Otx2 enhancers isolated correspond to noncoding regions conserved in other vertebrates (Kimura-Yoshida et al.,2004; Kurokawa et al.,2004a,b). Fugu Otx2 genomic regions orthologous to some of the mouse Otx2 enhancers activate expression in equivalent domains (Kimura-Yoshida et al.,2004; Kurokawa et al.,2004a). However, within the Otx2 locus, many genomic regions tested that harbor conserved sequence did not show enhancer activity. Conversely, several fugu Otx2 genomic regions examined in similar mouse transgenic assays were able to activate expression in Otx2 subdomains while not containing conserved sequences (Kimura-Yoshida et al.,2004), again stressing the point that there is not a one-to-one correlation between enhancer activity and sequence conservation. With these results in mind, it seems unlikely that the extreme conservation of some of these genomic noncoding regions with no enhancer activity (at least as demonstrated in transgenic assays) is due solely to multiple overlapping transcription factor binding sites. Rather, it indicates that they must have additional roles, such as suppressive activity or the regulation of chromatin structure. In some cases, the same sequence might contain both structural and enhancer activities.
It should be expected that similar expression domains for homologous genes in two related species is the consequence of the action of a combination of common transcription factors. However, if the DNA binding sites for these factors in number, distribution, or orientation were not located similarly in the genomes of those species, no homology would be detected. A recent example is the regulation of Otx2 expression patterns in two species of ascidians: Halocynthia roretzi and Ciona intestinalis (Oda-Ishii et al.,2005).
In contrast to these cases, where enhancers were detected by testing the activity of genomic fragments followed by examination of sequence conservation, there are several recent reports that have initially compared genomes and have then analyzed HCNRs for enhancer activity. Of approximately 1,500 HCNRs conserved from fish to humans, 25 such regions were tested in transient transgenics in zebrafish (Woolfe et al.,2005). These regions were selected because they were in close association with known developmental genes (Shh, Pax6, Sox21, and Hlxb9). Of these elements, 23 showed enhancer activity, many of them active in subdomains of the expression patterns of the proximal genes, whereas in some cases, expression was detected in tissues where the genes that these sequences are close to are not expressed. However, as seen in enhancer detection screens, random insertions in the zebrafish genome do not always assume the expression pattern of the closest gene but sometimes that of a gene further away (Ellingsen et al.,2005). It is not always straightforward, therefore, to assign a given conserved element to a gene, and several elements might be necessary to produce the correct pattern, for example with activating as well as restricting activities. Just like their extreme level of conservation, the significance of the distance of a cis-regulatory sequence from its target gene(s) is not understood. A recent report showed that a fugu element associated with the DACH gene was completely identical, in a core region, over 144 bp with human and mouse and could drive reporter expression in transgenic mouse embryos. However, the introduction of two different 16-bp insertions did not affect this expression pattern. It is possible that subtle changes in cis-regulatory elements lead to relatively subtle yet important differences in expression and that the resolution of the reporter assay in this case was insufficient.
For instance, for zebrafish enhancer detection insertions near regulatory genes that are expressed throughout the life of the animal, embryonic expression patterns often mimic the pattern of the endogenous gene fairly closely, whereas the expression pattern in the adult brain can be different (Birgit Adolf, Laure Bally-Cuif, and T.S. Becker, unpublished results). This might hint at temporal control being an important issue, and this cannot be tested if one looks at only one time point during the development of an embryo. It is likely, therefore, that the elements necessary for early embryonic development will be relatively easy to characterize, but more subtle aspects, or later temporal regulation, might escape detection, adding a further layer of complication to the characterization of cis-regulatory elements.
Gene Complexes and Regulatory Landscapes
The extent of the regulatory domain, or the “striking distance” of a given enhancer, or the limit thereof, appears to be a signature of that enhancer, although how this characteristic comes about is currently unknown. It is becoming clear, however, that this individual feature of enhancers imposes evolutionary constraints on the chromosomal neighborhood within their reach.
Gene complexes are sets of genes performing similar functions located in one region of the chromosome. These genes are often members of the same gene family. Gene complexes were first discovered by genetic means in Drosophila, the best known perhaps being the bithorax complex BX-C (Lewis,1978), also the first Hox complex described. The Hox complexes are conserved throughout bilaterian evolution and at least partially owe their specific arrangement along the chromosome to cis-regulatory regions shared by more than one gene, thus constraining against rearrangements during evolution. The correlation between the position of mutations (chromosomal breakpoints and deletions) within the BX-C and the phenotypes associated with these mutations predicted the presence of cis-regulatory elements in different regions of the complex. These elements were postulated to be responsible for the expression of the BX-C genes in precise antero/posterior (A/P) regions. Several enhancers were isolated within the predicted regions by means of transgenic experiments where genomic regions were driving the expression of the LacZ reporter gene (Simon et al.,1990). The mouse Hox clusters are well researched in terms of gene regulation and a large body of recent work has revealed that Hox genes share not only enhancers but also promoters and, sometimes, exons, emphasizing their dependence on evolutionary Hox cluster maintenance (Sharpe et al.,1998; reviewed in Kmita and Duboule,2003).
In vertebrates the Hox gene clusters play an essential role in A/P patterning as well but they are also required for the proper formation of the limbs (reviewed in Krumlauf,1994). Four paralogous clusters of Hox genes are present in mammals. In each cluster, Hox genes are numbered from 13 to 1 according to their location, from 5′ to 3′, in the cluster. In the HoxD cluster, HoxD10 to 13, the four genes located at the 5′ end of the cluster, are required for proper specification of the limb buds. These genes are coexpressed in the prospective digit territory, although at different levels. Genes situated further 5′ are expressed more strongly than those further 3′. Therefore, HoxD13 shows the strongest and HoxD10 the weakest expression. This finding suggests that HoxD10 to 13 genes share a common cis-regulatory element that activates the expression of each of these genes at individual levels. This enhancer, also termed a locus control region (LCR), is located outside of the HoxD cluster, 250 kb upstream of HoxD13, with a further two genes that lie in between the LCR and the HoxD cluster, forming what has been imaginatively termed a genomic regulatory landscape (Spitz et al.,2003). Of interest, these two unrelated genes, evx2 and lunaparc, show expression patterns in the limbs similar to HoxD10-13, indicating that they are also under regulation of the LCR. In the mouse Ulnaless inversion, the LCR is relocated from the 5′ HoxD genes to a position 700 kb distal to the 3′ HoxD genes. As a result, these 3′ genes do not respond to the LCR, suggesting that its range of activity is approximately 500 kb (250 kb in each direction; Spitz et al.,2003). Highlighting how enhancers may have individual domains of activity, a neural enhancer was found adjacent to the limb LCR, which activates evx2 and lunaparc but not the 5′ HoxD genes, indicating that its range of activity is shorter than that of the limb LCR (Spitz et al.,2003).
Why the LCR enhancer is not capable of activating the four 5′ HoxD genes at the same level was addressed in an elegant series of deletions and duplications of HoxD genes, combined, in some cases, with inactivation of some of them. The results suggest that HoxD promoters compete for this distant LCR, with closer genes being induced more strongly (Kmita et al.,2002). Moreover, the preference of the limb enhancer for closer HoxD genes depends on a conserved noncoding region placed 5′ to HoxD13. This sequence facilitates the interaction of the distal cis-regulatory element with the HoxD genes and preferentially targets most of the enhancer activity to the 5′ extreme of the complex (Kmita et al.,2002). This action causes a gradual decrease in the activation of HoxD genes further 3′ up to the limit of the enhancer domain. Hence, the limb-specific activity of HoxD genes is imparted by an enhancer outside of the cluster, and this process imposes constraints on relocation of all genes for which its activity is crucial.
Spitz et al. (2005) constructed a mouse strain in which the HoxD cluster was separated into two pieces by inversion of a part of chromosome 2 that resulted in disruption of the HoxD locus between HoxD10 and HoxD11. The resulting changes in the expression of the corresponding HoxD genes revealed a partition of regulatory influences of the corresponding enhancers, further confirming that the expression of HoxD genes is a product of a complex cis-regulatory network that spans the whole genomic region and whose disruption during vertebrate evolution was likely strongly selected against.
These examples show how genetics is necessary to map, predict, and confirm localized enhancer elements. Furthermore, these cases also illustrate how shared enhancers impose genome architecture: two or more genes regulated together by the same sequence cannot separate by translocation or inversion without severely affecting their spatiotemporal expression pattern.
A further way in which cis-regulatory sequences impose constraints on genomic architecture is by being positioned at a certain distance from their target genes. If such a distant element lies inside an intron of an unrelated gene (with a different expression pattern), it will in effect “interlock” these two genes (or, in the case of multiple elements, sometimes more), resulting in the evolutionary conservation of a genomic neighborhood throughout all vertebrate genomes from humans to teleosts (Mackenzie et al.,2004; Goode et al.,2005; Kleinjan and van Heyningen,2005).
Sonic hedgehog (Shh) is a secreted protein required for the generation of polarity within the developing limbs. Shh is expressed in a restricted pattern in the posterior mesenchyme of the limb bud, in a region defined as the zone of polarizing activity (ZPA; reviewed in Johnson and Tabin,1997). The expression of Shh in the ZPA is mediated by an enhancer located 1 Mb away in intron 5 of the unrelated Lmbr1 locus, with two further unrelated genes in between (Lettice et al.,2003). Of interest, preaxial polydactyly (PPD), a human congenital anomaly defined by mirror-image digit duplications, as well as two mouse mutations with similar phenotypic effects, Hx and M100081, are found to be caused by single base substitutions within this enhancer (Lettice et al.,2003; Sagai et al.,2004). In these mouse mutants Shh is ectopically expressed in the anterior margin of the limb bud mesenchyme, suggesting a functional alteration of the Shh ZPA enhancer (Masuya et al.,1995). That this single base-pair change is the direct cause of ectopic expression of Shh was elegantly shown in transgenic mice where the mutated enhancer drives similar ectopic expression of a lacZ reporter in the limb buds (Maas and Fallon,2005). Further demonstration of the requirement of this enhancer for Shh ZPA expression comes from mice in which this enhancer specifically has been deleted (Sagai et al.,2005). In these mutants, Shh expression in the ZPA is absent with the phenotypic consequence of deletion of distal elements of the limbs, a phenotype also observed in homozygous Shh− mice (Sagai et al.,2005). This enhancer is also highly conserved and similarly positioned in all vertebrates except snakes and limbless newts (Sagai et al.,2005). 
It directs reporter expression in the appropriate pattern when placed upstream of a basal promoter driving lacZ but fails to terminate expression at the right time point, suggesting that either spatial or temporal aspects of an expression pattern might be regulated by enhancer–promoter distance, placing possible constraints on the genomic region in between. The region around the shh gene forms the largest syntenic block conserved between the human and fish genomes, extending over a total of 16 genes (Goode et al.,2005), suggesting that other constraints must be acting on this region as well. Thus, syntenic blocks of gene neighborhoods might indicate the presence of long-range enhancers acting on one or more genes in the region. Enhancer detection in zebrafish can uncover such regions, as it is sometimes found that an insertion in the zebrafish genome exhibits the expression pattern of a gene at a large distance from the actual insertion (see below). In many cases so far where human or mouse heritable defects have been found to be directly caused by mutations or chromosomal rearrangements outside of the responsible gene, the corresponding region is syntenic to the fugu and zebrafish genomes. Examples are the murine gene pairs formin and gremlin (Zuniga et al.,2004) and human elp4 and pax6 (Kleinjan and van Heyningen,2005). A recent example is sclerostin and MEOX1 (Loots et al.,2005), a gene neighborhood also conserved in the human and zebrafish genomes.
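As a rough illustration of what detecting a conserved syntenic block involves, the following minimal sketch finds the longest run of genes that are contiguous and collinear in two gene-order lists, also allowing for an inverted block. The function name and the gene labels in the usage note are hypothetical, and the approach (exact contiguous matching) is a simplification; it is not the method used in the studies cited above, which must also tolerate gene loss, duplication, and uncertain orthology.

```python
def longest_shared_block(order_a, order_b):
    """Return the longest run of genes contiguous in both orders.

    order_a, order_b: lists of gene identifiers along a chromosome.
    A block counts if it appears in the same order or fully inverted in order_b.
    """
    def longest_common_run(a, b):
        # Classic longest-common-substring dynamic programming over gene lists.
        best = []
        prev = [0] * (len(b) + 1)
        for i in range(1, len(a) + 1):
            cur = [0] * (len(b) + 1)
            for j in range(1, len(b) + 1):
                if a[i - 1] == b[j - 1]:
                    cur[j] = prev[j - 1] + 1
                    if cur[j] > len(best):
                        best = a[i - cur[j]:i]
            prev = cur
        return best

    forward = longest_common_run(order_a, order_b)
    inverted = longest_common_run(order_a, order_b[::-1])
    # The block is reported in the orientation of order_a.
    return forward if len(forward) >= len(inverted) else inverted
```

With hypothetical labels, `longest_shared_block(["g1", "g2", "g3", "g4", "g5"], ["x", "g2", "g3", "g4", "y"])` returns the three-gene block `["g2", "g3", "g4"]`. The same block is recovered if it is inverted in the second genome.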
We have seen in the last section that the extent of an enhancer domain can include several genes, which may or may not be coregulated by the enhancer, and that these regions tend to be evolutionarily conserved from humans to fish. Many important developmental genes are not surrounded by any genes but rather are in, or abut, so-called gene deserts (GDs), large intergenic regions devoid of confirmed transcriptional units, often extending hundreds of kilobases or more in the human genome (Ovcharenko et al.,2005). The gene deserts within the human genome have been classified into stable and variable, depending on whether or not they are found to be syntenic with the chicken genome and, hence, are evolutionarily conserved (Ovcharenko et al.,2005). This is an important point as evolutionary conservation, once again, suggests likely cis-regulatory function. Interestingly, the presence of a gene desert associated with a specific gene is usually also an evolutionarily conserved characteristic. In many cases, HCNRs have been found in gene deserts (e.g., Sandelin et al.,2004b). It is, therefore, likely that regulatory elements necessary for the expression of the developmental regulatory genes lie within these gene deserts. Searching for HCNRs, therefore, becomes a reasonable strategy to look for enhancers within GDs and test for their activity. This strategy was used to search for cis-regulatory elements close to the DACH gene (Nobrega et al.,2003). This gene, which is expressed in numerous tissues including retina and ear during development, is flanked by two gene deserts of 1.3 and 0.87 Mb in mammals (see Fig. 1). In this pioneering work, 32 HCNRs conserved from human to fish were identified, and of the 9 that were tested in mouse transgenic assays, 7 were able to activate expression. Moreover, these HCNRs promoted expression in domains that partially recapitulate several aspects of the endogenous DACH expression pattern (Nobrega et al.,2003).
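The idea of searching a gene desert for HCNRs can be sketched computationally. The following minimal example, which assumes that a pairwise alignment of two orthologous sequences is already available as two equal-length gapped strings, slides a window along the alignment and reports merged intervals whose percent identity reaches a threshold. The function name, the window size, and the identity cutoff are illustrative assumptions and do not reproduce the parameters of any study cited here.

```python
def conserved_windows(aln_a, aln_b, window=50, min_identity=0.9):
    """Return merged (start, end) alignment intervals with high identity.

    aln_a, aln_b: aligned sequences of equal length, gaps written as '-'.
    Coordinates are alignment columns, end-exclusive.
    """
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    n = len(aln_a)
    if n < window:
        return []
    # 1 where both sequences carry the same base; gaps never count as matches.
    match = [1 if a == b and a != "-" else 0
             for a, b in zip(aln_a.upper(), aln_b.upper())]
    hits = []
    count = sum(match[:window])
    for start in range(n - window + 1):
        if start > 0:
            # Slide the window by one column, updating the match count.
            count += match[start + window - 1] - match[start - 1]
        if count / window >= min_identity:
            end = start + window
            if hits and start <= hits[-1][1]:
                hits[-1] = (hits[-1][0], end)  # extend the previous interval
            else:
                hits.append((start, end))
    return hits
```

Candidate intervals found this way would still need to be filtered against annotated exons and, as the text stresses, tested in transgenic assays, because conservation alone does not establish enhancer activity.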
The first exhaustive analysis of the activity of multiple HCNRs located in a gene desert was done with those flanking the vertebrate Iroquois (Irx) genes. The Irx genes are organized in two large clusters (1–2 Mb in mammals), IrxA and IrxB, with three Irx genes each. These genes participate in many processes during development, such as subdivision of the vertebrate developing neuroectoderm in the A/P and dorsoventral axes (reviewed in Gómez-Skarmeta and Modolell,2002). In this study, approximately 50 HCNRs from the zebrafish and/or Xenopus clusters were analyzed for enhancer activity in zebrafish and Xenopus transgenic assays (de la Calle-Mustienes et al.,2005). The majority of these elements were found to be active in subdomains of endogenous Irx expression, and many are candidates for enhancers shared by neighboring genes. This finding would explain the highly similar expression pattern of these genes and the evolutionary conservation of Irx gene clustering, including maintenance of the associated gene deserts (see Fig. 6 for an interpretation of these results). Furthermore, HCNRs present in tetrapod IrxB but not in fish seem to be responsible for novel Irx expression domains that appeared after their divergence. Among the conserved regions tested, a more detailed analysis was done for two IrxB ultraconserved noncoding regions (UCRs) duplicated in IrxA clusters in similar relative positions. These four regions share a core region of approximately 400 bp highly conserved between all of them, and drive expression in similar domains. However, interspecies conserved sequences surrounding the core, specific for each of these UCRs, are able to modulate their expression. Therefore, these results indicate that the genomic context influences enhancer activity (de la Calle-Mustienes et al.,2005). In addition, this work confirms that searching for vertebrate conserved noncoding regions may be used as a guide for enhancers within large intergenic regions. 
For example, a comparative analysis of the sequences conserved among vertebrates in the HoxA clusters indicated that many of the HCNRs detected correspond to previously identified enhancers (Santini et al.,2003). In these cases, an initial computational approach was essential to identify candidate regions that may contain enhancer elements. The functional activity of some of these regions was then demonstrated by transgenic experiments, demonstrating the power of combining in silico and in vivo approaches. The opposite situation is exemplified by screening for random insertions in zebrafish (Ellingsen et al.,2005). Here, the initial detection of an expression pattern associated with an activated insertion prompts further genomic analysis around the genomic insertion site. Therefore, in these cases, in vivo studies are strongly complemented by further in silico analysis. For instance, a gene desert upstream of sox11 in the human genome was found to contain a HCNR (Sandelin et al.,2004b). In the zebrafish, this gene desert has been duplicated including the HCNR, and in the gene desert upstream of sox11b, 3 enhancer detection insertions (up to 132 kb from the sox11b start site) exhibit the expression pattern of the gene, whereas a fourth, 222 kb away, and closer to an unrelated gene on the other side of the gene desert has a much weaker and restricted expression pattern (see Fig. 7; Ellingsen et al.,2005). Importantly, the enhancer detection insertions in this case do not cluster around the gene, but rather around the HCNR in the gene desert, suggesting that the spatial pattern of expression is not affected by distance from an enhancer in this case. The same can be found in insertions in syntenic blocks, where sometimes an insertion maps far from the gene whose expression pattern it assumes, but close to the regulatory element driving this pattern (T.S. Becker, unpublished results).
Even though it has been shown that long-range developmental enhancers can exert control of transcription at the locus level, transcription profiles of genes in a region within the domain of a particular enhancer or an array of enhancers show that there are genes that apparently ignore all cues of the locus-exerted control. This finding is particularly true of some tissue-specific genes. Recent studies might provide an explanation: there are fundamentally different core promoter architectures predominantly associated with tissue-specific and broader expression patterns. Most tissue-specific contexts are predominantly associated with TATA box-containing promoters, while the broadly and ubiquitously expressed genes (a.k.a. housekeeping genes) are more likely to be started from a TATA-less promoter overlapping a CpG island (Butler and Kadonaga,2002; Schug et al.,2005; Yamashita et al.,2005). In hybrid cases, TATA-box directed initiation of transcription is often associated with context-specific expression, whereas the overlapping CpG island is responsible for low levels of system-wide expression (Carninci et al., manuscript submitted for publication; Ponjavic et al., manuscript submitted for publication). The most studied example for such a dual mechanism of transcription initiation is the mammalian α-globin gene (Cuadrado et al.,2001). However, TATA-less core promoters have been described that are not necessarily CpG island overlapping and that have multiple transcription start sites and lack an ATG start codon (Lee et al.,2005).
Fundamental developmental regulators are a notable exception to the rule that tightly controlled expression is associated with TATA-box promoters; even though they have elaborate spatiotemporal expression patterns, their core promoters are generally TATA-less and overlap (sometimes very large) CpG islands. For instance, the core promoters of Hox genes are usually TATA-less (Sharpe et al.,1998); indeed, most genes for transcription factors spanned by the HCNR arrays described by Sandelin et al. (2004b) have such TATA-less promoters. Of interest, in zebrafish enhancer detection, which uncovered several genes associated with HCNRs as well as other developmentally important genes (Ellingsen et al.,2005), the reporter construct also used a TATA-less promoter, that of the GATA2 gene, which is itself spanned by HCNRs (Sandelin et al.,2004b), suggesting that the choice of promoter can have an influence on the range of enhancers detected.
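The distinction between TATA-box and CpG island promoters can be illustrated with a deliberately simple sequence classifier. This sketch rests on two assumptions that are not taken from the studies cited above: a simplified TATA consensus (TATA[AT]A[AT]) searched anywhere in the sequence rather than a position-specific weight matrix, and the commonly used CpG island thresholds of at least 200 bp, GC content above 50%, and an observed/expected CpG ratio above 0.6. All function names are hypothetical.

```python
import re

# Simplified TATA-box consensus; an assumption, not a weight-matrix model.
TATA_RE = re.compile(r"TATA[AT]A[AT]")


def cpg_stats(seq):
    """Return (GC fraction, observed/expected CpG ratio) for a DNA string."""
    seq = seq.upper()
    n = len(seq)
    c, g = seq.count("C"), seq.count("G")
    cpg = seq.count("CG")
    gc_fraction = (c + g) / n if n else 0.0
    expected = c * g / n if n else 0.0
    obs_exp = cpg / expected if expected else 0.0
    return gc_fraction, obs_exp


def classify_core_promoter(seq, min_len=200, min_gc=0.5, min_obs_exp=0.6):
    """Label a sequence 'TATA', 'CpG', 'hybrid', or 'other'."""
    seq = seq.upper()
    has_tata = TATA_RE.search(seq) is not None
    gc_fraction, obs_exp = cpg_stats(seq)
    cpg_like = (len(seq) >= min_len
                and gc_fraction > min_gc
                and obs_exp > min_obs_exp)
    if has_tata and cpg_like:
        return "hybrid"
    if has_tata:
        return "TATA"
    if cpg_like:
        return "CpG"
    return "other"
```

A real analysis would restrict the TATA search to the expected position upstream of an experimentally mapped transcription start site; the sketch only shows how the two sequence features can be scored independently, which is what allows a "hybrid" class to be recognized.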
Although the proposition and demonstration of insulator elements that give rise to functional gene expression domains has provided explanations in some cases, the demonstration that some genes within a regulatory domain nevertheless have distinct expression patterns suggests that other mechanisms must exist to ensure maintenance of independence of a gene promoter from the influence of cis-regulatory sequences dominating the genomic region in question (Dillon and Sabbattini,2000). As we currently have no way of classifying enhancers other than perhaps by their distance of action and their relative “strength” or their tissue specificity, the central problem will be which different classes of promoters will be discernible in vertebrate genomes.
The search for vertebrate cis-regulatory sequences has only begun. Comparative genomics has provided a cornucopia of candidate genes and conserved noncoding regions, which ultimately will have to be confirmed by experiments involving transgenesis in fish, frogs, chicken, or mouse as well as through targeted deletion in the mouse. In most cases, more than one organism will be required for testing a conserved noncoding region, because small differences in the sequences between species, as well as the genomic context of these sequences, may alter their enhancer activity. Recent developments in transgenesis have made possible the large-scale detection of cis-regulatory activity in the zebrafish genome and frog transgenesis has proven a rapid way of testing candidate sequences. We are beginning to see how cis-regulatory sequences exert their influence on genomic architecture. However, many questions remain open. How cis-regulatory sequences interact with promoters currently is not understood completely, nor do we know how specific elements interact with specific promoters. Why are some enhancers intermingled with unrelated genes but still manage to activate a specific target, whereas others are found in genomic regions devoid of other genes? Why and how do these elements interact at sometimes exceedingly large distances when placing them in the context of a reporter construct yields similar results? A further problem will be how activated and repressed states of chromatin are maintained.
An important aspect of cis-regulation that remains largely unresolved is the identification and characterization of restrictive/silencing elements. The transgenic assays described here are very efficient at detecting positive regulatory elements but do not address the issue of isolating silencer regions. However, Balciunas et al. (2004) used a ubiquitously active promoter, ef1 alpha, to generate random insertions in the zebrafish genome and found that, in certain insertions, this pattern would be restricted in a tissue-specific manner. A possible way, therefore, to test candidate silencer elements would be to place these regions (e.g., HCNRs with no enhancer activity) in the context of such a moderate ubiquitous promoter driving a reporter like enhanced GFP. In embryos transgenic for these constructs, such restrictive elements with regional specificities would cause a lack of fluorescence in specific domains. In contrast, silencers that repress transcription in a territory-independent manner would down-regulate expression in a general way. This latter case, however, would be difficult to interpret. As with enhancer elements, promoter specificity and genomic context may be expected to be of importance with silencer regions as well.
The coming years will bring intense activity in identification and characterization of cis-regulatory elements, and input on the functional annotation of these elements will come from diverse disciplines such as bioinformatics, the investigation of human genetic disorders, and experimental approaches in mouse, chicken, frogs, and fish. At least in Drosophila and sea urchin, cis-regulatory sequences are thought to consist of multiple transcription factor binding sites (Davidson,2001; Levine and Davidson,2005). However, there are recent examples in vertebrates where no transcription factor binding sites could be identified in sequence that did act as an enhancer (e.g., Sagai et al.,2005). Although it is almost certain that not all transcription factor binding sites are known in vertebrates, some of the UCRs are identical between human and mouse over several hundred base pairs, a finding that is difficult to explain through conservation of binding sites alone. The jury is still out on how to solve this conundrum.
We thank Julien Ghislain and two anonymous reviewers for thoughtful comments on the manuscript. This work is funded by the Functional Genomics Programme (FUGE) in the Research Council of Norway and by the European Commission as part of the ZF-Models integrated project in the 6th framework programme to T.S.B. J.L.G-S. was funded by the Spanish Ministry of Education and Science and Junta de Andalucía. B.L. was funded in part by the Pharmacia Corporation (now Pfizer) and by the Swedish National Research Council.
- 2004. MSCAN: identification of functional clusters of transcription factor binding sites. Nucleic Acids Res 32: W195–198.
- 1999. A method for generating transgenic frog embryos. Methods Mol Biol 97: 393–414.
- 2005. Transgenes as screening tools to probe and manipulate the zebrafish genome. Dev Dyn (in press).
- 2004. Enhancer trapping in zebrafish using the Sleeping Beauty transposon. BMC Genomics 5: 62.
- 2004. Ultraconserved elements in the human genome. Science 304: 1321–1325.
- 2005. Computational screening of conserved genomic DNA in search of functional noncoding elements. Nat Methods 2: 535–545.
- 2005. Phylogenetic shadowing and computational identification of human microRNA genes. Cell 120: 21–24.
- 2002. Exploiting transcription factor binding site clustering to identify cis-regulatory modules involved in pattern formation in the Drosophila genome. Proc Natl Acad Sci U S A 99: 757–762.
- 2002. Proneural genes and the specification of neural cell types. Nat Rev Neurosci 3: 517–530.
- 2003. Multiple regulatory elements with spatially and temporally distinct activities control neurogenin1 expression in primary neurons of the zebrafish embryo. Mech Dev 120: 211–218.
- 2002. Discovery of regulatory elements by a computational method for phylogenetic footprinting. Genome Res 12: 739–748.
- 2003. Phylogenetic shadowing of primate sequences to find functional regions of the human genome. Science 299: 1391–1394.
- 2003. Communication over a large distance: enhancers and insulators. Biochem Cell Biol 81: 241–251.
- 1990. Weight matrix descriptions of four eukaryotic RNA polymerase II promoter elements derived from 502 unrelated promoter sequences. J Mol Biol 212: 563–578.
- 2002. The RNA polymerase II core promoter: a key component in the regulation of gene expression. Genes Dev 16: 2583–2592.
- 2001. Species-specific organization of CpG island promoters at mammalian homologous genes. EMBO Rep 2: 586–592.
- 2001. Genomic regulatory systems: development and evolution. San Diego: Academic Press.
- 2005. A functional survey of the enhancer activity of conserved non-coding sequences from vertebrate iroquois cluster gene deserts. Genome Res 15: 1061–1072.
- 2005. Conserved non-genic sequences - an unexpected feature of mammalian genomes. Nat Rev Genet 6: 151–157.
- 2004. Expression profiling and comparative genomics identify a conserved regulatory region controlling midline expression in the zebrafish embryo. Genome Res 14: 228–233.
- 2000. Functional gene expression domains: defining the functional unit of eukaryotic gene regulation. Bioessays 22: 657–665.
- 2005. NestedMICA: sensitive inference of over-represented motifs in nucleic acid sequence. Nucleic Acids Res 33: 1445–1453.
- 2005. Large-scale enhancer detection in the Zebrafish genome. Development 132: 3799–3811.
- 2000. Discovery and modeling of transcriptional regulatory regions. Curr Opin Biotechnol 11: 19–24.
- 2003. Neural crest patterning: autoregulatory and crest-specific elements co-operate for Krox20 transcriptional control. Development 130: 941–953.
- 2002. iroquois genes: genomic organization and function in vertebrate neural development. Curr Opin Genet Dev 12: 403–408.
- 2005. Highly conserved regulatory elements around the SHH gene may contribute to the maintenance of conserved synteny across human chromosome 7q36.3. Genomics 86: 172–181.
- 2005. De novo cis-regulatory module elicitation for eukaryotic genomes. Proc Natl Acad Sci U S A 102: 7079–7084.
- 2004. Transcriptional regulatory code of a eukaryotic genome. Nature 431: 99–104.
- 2001. Transgenic Xenopus embryos reveal that anterior neural development requires continued suppression of BMP signaling after gastrulation. Dev Biol 238: 168–184.
- 2005. oPOSSUM: identification of over-represented transcription factor binding sites in co-expressed genes. Nucleic Acids Res 33: 3154–3164.
- 2000. Computational identification of cis-regulatory elements associated with groups of functionally related genes in Saccharomyces cerevisiae. J Mol Biol 296: 1205–1214.
- 2004. Defining the CREB regulon: a genome-wide analysis of transcription factor regulatory regions. Cell 119: 1041–1054.
- 1997. Molecular models for vertebrate limb development. Cell 90: 979–990.
- 2003. The UCSC Genome Browser Database. Nucleic Acids Res 31: 51–54.
- 2004. Transgenesis and gene trap methods in zebrafish by using the Tol2 transposable element. Methods Cell Biol 77: 201–222.
- 2005. Strategies for characterising cis-regulatory elements in Xenopus. Brief Funct Genomic Proteomic 4: 58–68.
- 1997. Cis-acting elements conserved between mouse and pufferfish Otx2 genes govern the expression in mesencephalic neural crest cells. Development 124: 3929–3941.
- 2000. Visceral endoderm mediates forebrain development by suppressing posteriorizing signals. Dev Biol 225: 304–321.
- 2004. Characterization of the pufferfish Otx2 cis-regulators reveals evolutionarily conserved genetic mechanisms for vertebrate head specification. Development 131: 57–71.
- 2005. Long-range control of gene expression: emerging mechanisms and disruption in disease. Am J Hum Genet 76: 8–32.
- 2003. Organizing axes in time and space; 25 years of colinear tinkering. Science 301: 331–333.
- 2002. Serial deletions and duplications suggest a mechanism for the collinearity of Hoxd genes in limbs. Nature 420: 145–150.
- 2005. Xenopus as a model system to study transcriptional regulatory networks. Proc Natl Acad Sci U S A 102: 4943–4948.
- 2001. A predictive model for regulatory sequences directing liver-specific transcription. Genome Res 11: 1559–1566.
- 1994. Hox genes in vertebrate development. Cell 78: 191–201.
- 2004a. Regulation of Otx2 expression and its functions in mouse forebrain and midbrain. Development 131: 3319–3331.
- 2004b. Regulation of Otx2 expression and its functions in mouse epiblast and anterior neuroectoderm. Development 131: 3307–3317.
- 2004. Transcriptional regulation of the cardiac-specific MLC2 gene during Xenopus embryonic development. Development 131: 669–679.
- 2005. ATG deserts define a novel core promoter subclass. Genome Res 15: 1189–1197.
- 2003. Identification of conserved regulatory elements by comparative genome analysis. J Biol 2: 13.
- 2003. A long-range Shh enhancer regulates expression in the developing limb and fin and is associated with preaxial polydactyly. Hum Mol Genet 12: 1725–1735.
- 2005. Gene regulatory networks for development. Proc Natl Acad Sci U S A 102: 4936–4942.
- 2003. Transcription regulation and animal diversity. Nature 424: 147–151.
- 1978. A gene complex controlling segmentation in Drosophila. Nature 276: 565–570.
- 2005. Dcode.org anthology of comparative genomic tools. Nucleic Acids Res 33: W56–W64.
- 2005. Genomic deletion of a long-range bone enhancer misregulates sclerostin in Van Buchem disease. Genome Res 15: 928–935.
- 2005. Single base pair change in the long-range sonic hedgehog limb-specific enhancer is a genetic basis for preaxial polydactyly. Dev Dyn 232: 345–348.
- 2004. Is there a functional link between gene interdigitation and multi-species conservation of synteny blocks? Bioessays 26: 1217–1224.
- 2004. Patterning the forebrain: FoxA4a/Pintallavis and Xvent2 determine the posterior limit of Xanf1 expression in the neural plate. Development 131: 2329–2338.
- 1995. A duplicated zone of polarizing activity in polydactylous mouse mutants. Genes Dev 9: 1645–1653. , , , , .
- 2003. TRANSFAC: transcriptional regulation, from patterns to profiles. Nucleic Acids Res 31: 374–378. , , , , , , , , , , , , , , , , , , , , .
- 2003. Digitizing life at the level of the cell: high-performance laser-scanning microscopy and image analysis for in toto imaging of development. Mech Dev 120: 1407–1420. , .
- 2004. MONKEY: identifying conserved transcription-factor binding sites in multiple alignments using a binding site-specific evolutionary model. Genome Biol 5: R98. , , , , .
- 1999. Intronic enhancers control expression of zebrafish sonic hedgehog in floor plate and notochord. Development 126: 2103–2116. , , , , , .
- 2002. Search for enhancers: teleost models in comparative genomic and transgenic analysis of cis regulatory elements. Bioessays 24: 564–572. , , .
- 2004. Separable enhancer sequences regulate the expression of the neural bHLH transcription factor neurogenin 1. Dev Biol 271: 479–487. , , , , .
- 2001. Transcriptional profiling shows that Gcn4p is a master regulator of gene expression during amino acid starvation in yeast. Mol Cell Biol 21: 4347–4368. , , , , , , .
- 2004. The regulatory content of intergenic DNA shapes genome architecture. Genome Biol 5: R25. , , .
- 2003. Scanning human gene deserts for long-range enhancers. Science 302: 413. , , , .
- 2005. Making very similar embryos with divergent genomes: conservation of regulatory mechanisms of Otx between the ascidians Halocynthia roretzi and Ciona intestinalis. Development 132: 1663–1674. , , , , .
- 2005. Identifying synonymous regulatory elements in vertebrate genomes. Nucleic Acids Res 33: W403–W407. , .
- 2005. Evolution and functional classification of vertebrate gene deserts. Genome Res 15: 137–145. , , , , , .
- 2005. Computational identification of regulatory DNAs underlying animal development. Nat Methods 2: 529–534. , .
- 2004. Tol2 transposon-mediated enhancer trap to identify developmentally regulated zebrafish genes in vivo. Dev Dyn 231: 449–459. , , , .
- 2005. Enhancer sequence conservation between vertebrates is favoured in developmental regulator genes. Trends Genet 21: 207–210. , , , .
- 2005. In vivo characterization of a vertebrate ultraconserved enhancer. Genomics 85: 774–781. , , , , , , .
- 2002. Computational detection of genomic cis-regulatory modules applied to body patterning in the early Drosophila embryo. BMC Bioinformatics 3: 30. , , , .
- 2004. Phylogenetic conservation of a limb-specific, cis-acting regulator of Sonic hedgehog (Shh). Mamm Genome 15: 23–34. , , , , , , , , .
- 2005. Elimination of a long-range cis-regulatory module causes complete loss of limb-specific Shh expression and truncation of the mouse limb. Development 132: 797–803. , , , , .
- 2004. Constrained binding site diversity within families of transcription factors enhances pattern discovery bioinformatics. J Mol Biol 338: 207–215. , .
- 2004a. JASPAR: an open-access database for eukaryotic transcription factor binding profiles. Nucleic Acids Res 32: D91–D94. , , , , .
- 2004b. Arrays of ultraconserved non-coding regions span the loci of key developmental genes in vertebrate genomes. BMC Genomics 5: 99. , , , , , , , .
- 2004c. ConSite: web-based prediction of regulatory elements using cross-species comparison. Nucleic Acids Res 32: W249–252. , , .
- 2003. Evolutionary conservation of regulatory elements in vertebrate Hox gene clusters. Genome Res 13: 1111–1122. , , .
- 2001. Cross regulation between Neurogenin2 and pathways specifying neuronal identity in the spinal cord. Neuron 31: 203–217. , , , .
- 2005. Promoter features related to tissue specificity as measured by Shannon entropy. Genome Biol 6: R33. , , , , ,
- 2002. Transgenic zebrafish expressing green fluorescent protein. Methods Mol Biol 183: 225–233. , , .
- 1998. Selectivity, sharing and competitive interactions in the regulation of Hoxb genes. EMBO J 17: 1788–1798. , , , , .
- 1990. Regulatory elements of the bithorax complex that control expression along the antero-posterior axis. EMBO J 9 3945–3956. , , .
- 2003. The RNA polymerase II core promoter. Annu Rev Biochem 72: 449–479. , .
- 2003. A global control region defines a chromosomal regulatory landscape containing the HoxD cluster. Cell 113: 405–417. , , .
- 2005. Inversion-induced disruption of the Hoxd cluster leads to the partition of regulatory landscapes. Nat Genet 37: 889–893. , , , .
- 2000. DNA binding sites: representation and discovery. Bioinformatics 16: 16–23. .
- 2001. Expression of the zebrafish genome during embryogenesis. ZFIN on-line publication. , , , , , , , , , , .
- 2003. Comparative analyses of multi-species sequences from targeted genomic regions. Nature 424: 788–793. , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , .
- 2005. Assessing computational tools for the discovery of transcription factor binding sites. Nat Biotechnol 23: 137–144. , , , , , , , , , , , , , , , , , , , , , , , , .
- 1997. Analysis of the distribution of binding sites for a tissue-specific transcription factor in the vertebrate genome. J Mol Biol 266: 231–245. , , , , .
- 2003. Functional analysis of chicken Sox2 enhancers highlights an array of diverse regulatory elements that are conserved in mammals. Dev Cell 4: 509–519. , , , , .
- 2005. Prediction of cis-regulatory elements using binding site matrices - the successes, the failures and the reasons for both. Curr Opin Genet Dev 15: 395–402. , .
- 2005. Phylogenetic footprinting and genome scanning identify vertebrate BMP response elements and new target genes. Dev Biol 281: 210–226. , , , , , , , .
- 2004. Applied bioinformatics for the identification of regulatory elements. Nat Rev Genet 5: 276–287. , .
- 1998. Identification of regulatory regions which confer muscle-specific gene expression. J Mol Biol 278: 167–181. , .
- 2000. Human-mouse genome comparisons to locate regulatory sites. Nat Genet 26: 225–228. , , , , .
- 2000. A comparative map of the zebrafish genome. Genome Res 10: 1903–1914. , , , , , , , .
- 2005. Highly conserved non-coding sequences are associated with vertebrate development. PLoS Biol 3: e7. , , , , , , , , , , , , , , , .
- 2005. Systematic discovery of regulatory motifs in human promoters and 3′ UTRs by comparison of several mammals. Nature 434: 338–345. , , , , , , , .
- 2005. Genome-wide analysis reveals strong correlation between CpG islands with nearby transcription start sites of genes and their tissue specificity. Gene 350: 129–136. , , , .
- 2004. Mouse limb deformity mutations disrupt a global control region within the large regulatory landscape required for Gremlin expression. Genes Dev 18: 1553–1564. , , , , , , , , , , , , .