Multiple layers of complexity in cis-regulatory regions of developmental genes


  • Nicolás Frankel

    Corresponding author
    1. Departamento de Ecología, Genética y Evolución, IEGEBA-CONICET, Facultad de Ciencias Exactas y Naturales, Universidad de Buenos Aires, Buenos Aires, Argentina
    • Departamento de Ecología, Genética y Evolución, IEGEBA-CONICET, Facultad de Ciencias Exactas y Naturales, Universidad de Buenos Aires, Ciudad Universitaria, Pabellón 2, Buenos Aires C1428EHA, Argentina


Genomes contain the necessary information to ensure that genes are expressed in the right place, at the right time, and at the proper rate. Metazoan developmental genes often possess long stretches of DNA flanking their coding sequences and/or large introns which contain elements that influence gene expression. Most of these regulatory elements are relatively small and can be studied in isolation. For example, transcriptional enhancers, the elements that generate the expression pattern of a gene, have been traditionally studied with reporter constructs in transgenic animals. These studies have provided and will provide invaluable insights into enhancer evolution and function. However, this experimental approach has its limits; often, enhancer elements do not faithfully recapitulate native expression patterns. This fact suggests that additional information in cis-regulatory regions modulates the activity of enhancers and other regulatory elements. Indeed, recent studies have revealed novel functional aspects at the level of whole cis-regulatory regions. First, the discovery of “shadow enhancers.” Second, the ubiquitous interactions between cis-regulatory elements. Third, the notion that some cis-regulatory regions may not function in a modular manner. Last, the effect of chromatin conformation on cis-regulatory activity. In this article, I describe these recent findings and discuss open questions in the field. Developmental Dynamics, 2012. © 2012 Wiley Periodicals, Inc.


The precise control of gene transcription is crucial for the development of an organism. The presence of abnormal mRNA levels during development can modify the adult phenotype and have detrimental fitness effects. The genome contains the necessary information to ensure that genes are expressed in the right place, at the right time, and at the proper rate. Developmental genes often possess long stretches of DNA flanking their coding sequences and/or large introns which contain elements that influence gene expression. Most of these regulatory elements are relatively small and can be studied in isolation (separated from their respective coding sequences). Certainly, the modular nature of most regulatory elements (i.e., their ability to function in isolation) paved the way for their discovery and study. To date, four types of elements have been described: transcriptional enhancers (Banerji et al., 1981), transcriptional silencers (Laimins et al., 1986), enhancer-blocking insulators (Udvardy et al., 1985), and promoter-tethering elements (Calhoun et al., 2002). Transcriptional enhancers are, by far, the most studied among the four element types (Levine, 2010). When placed upstream of a minimal promoter and a reporter gene in a transgene, enhancers recapitulate part of the native expression pattern of a gene. Enhancer activity is generated by the binding of transcription factors to short DNA motifs (6–20 base-pairs) present in their sequence. In vivo, chromatin loops mediate the interaction between enhancers and their target core promoters (Akbari et al., 2006). Silencers repress gene expression through interactions with the core promoter and/or enhancer elements (Ogbourne and Antalis, 1998). Insulators isolate enhancers from nearby genes, restricting their influence to the target core promoter(s) (Bell et al., 2001). Last but not least, promoter-tethering elements facilitate the interaction between enhancers and target core promoters (Akbari et al., 2008).

The collection of noncoding regulatory elements of a gene constitutes its cis-regulatory information. Unlike with coding sequences, there is no simple way to demarcate the physical limits of cis-regulatory regions in metazoan genomes; even when flanking and intronic DNA has been explored in detail, it is difficult to rule out the presence of additional distant regulatory elements. For example, the gene sonic hedgehog (shh), which is involved in vertebrate limb development, is activated by an enhancer located approximately 1 megabase upstream of its coding sequence (Lettice et al., 2003). Despite this challenging issue, studies of reporter expression with large DNA fragments allow us to define discrete functional units which seem to fully recapitulate the expression pattern of a gene. Thus, for some developmental genes, the apparent boundaries of their cis-regulatory DNA have been established (Fujioka et al., 2002; Kim and Lauderdale, 2006; Venken et al., 2009).

It has been shown that genes involved in embryonic development, cell differentiation, and pattern specification in D. melanogaster and C. elegans have significantly more flanking DNA than housekeeping genes (Nelson et al., 2004). However, there is high variance in flanking DNA content among developmental genes (Nelson et al., 2004). This probably means that the regulatory information of a developmental gene can have different degrees of compactness. Indeed, this has been observed within and between genomes. For instance, the regulatory information needed for embryonic expression of S. purpuratus endo16 appears to be circumscribed to 2,300 base-pairs (Yuh et al., 2001). In contrast, several developmental genes in metazoan genomes have cis-regulatory elements scattered over tens or hundreds of kilobases (Maeda and Karch, 2009; Frankel et al., 2011; Montavon et al., 2011; Visser et al., 2012).

Traditionally, the study of cis-regulation was tackled with functional analyses of individual cis-regulatory elements (mostly enhancers). Over many years, these analyses have shed light on the logic of cis-regulation (Istrail and Davidson, 2005). Furthermore, they have provided invaluable insights into the structural rules (Erives and Levine, 2004; Panne et al., 2007; Swanson et al., 2010) and evolutionary histories (Ludwig and Kreitman, 1995; Crocker et al., 2008) of enhancers. Nevertheless, it is widely recognized in the cis-regulation community that, often, small enhancer elements do not faithfully recapitulate native expression patterns (Barolo, 2012). Frequently, reporter constructs drive expression in the wrong cells, a fact known as “ectopic expression” (Summerbell et al., 2000; Chao et al., 2010; Prazak et al., 2010; Frankel et al., 2011; Perry et al., 2011). It is also common to observe reporter expression which does not coincide temporally with that of the native gene, a phenomenon often referred to as “heterochronic expression” (Adachi et al., 2003; Lin et al., 2010; Prazak et al., 2010; Frankel et al., 2011; Ludwig et al., 2011). This probably means that additional information is necessary to generate the native expression pattern of a gene. Also, it is known that transgenes are usually subject to position effects; their expression changes depending on the genomic context in which they are located. This suggests that chromatin conformation within and around regulatory elements is vital for their proper function. It is important to clarify that none of these known issues undermines the work with isolated cis-regulatory elements (these studies will continue to be useful). Instead, they make us think that additional levels of analysis are necessary to fully comprehend the complexity of cis-regulatory regions. In effect, comprehensive studies and methodological breakthroughs have started to expose this complexity.
In recent years, numerous investigations have uncovered new aspects of functional complexity at the level of whole cis-regulatory regions. First, the discovery that one gene can have multiple enhancers driving similar expression patterns. Second, the ubiquitous interactions between cis-regulatory elements. Third, the notion that some cis-regulatory regions may not function in a modular fashion. Last, the effect of epigenetic marks and chromatin conformation on the activity of regulatory elements, and the consequences for gene expression. In this article, I describe these recent findings in detail and evaluate the extent of the discoveries. In addition, I discuss open questions in the field and the state-of-the-art methodological approaches that will facilitate research on cis-regulation.

Genes Have Multiple Enhancers With Overlapping or Very Similar Expression Patterns

In 2007, a study of cis-regulation in D. melanogaster embryos exposed a surprising fact. Whole-genome ChIP-chip analyses for three transcription factors involved in dorso-ventral patterning revealed that the target gene vnd is regulated by two enhancers separated by many kilobases, which drive remarkably similar expression patterns in presumptive neurogenic ectoderm (Zeitlinger et al., 2007). The same arrangement was later observed for more genes involved in dorso-ventral patterning. The Drosophila genes sog and brinker also possess two enhancers, separated by many kilobases, with similar embryonic activity (Hong et al., 2008). It was suggested that at least one-third of all Dorsal target genes might have this type of regulation (Hong et al., 2008). In the latter study, the authors coined the term “shadow enhancers.” In my opinion, this term is more attractive than descriptive. However, it is difficult to propose an alternative term that is short and descriptive at the same time. Thus, “shadow enhancers” has permeated the literature and the scientific community.

Given this interesting phenomenon, there was an obvious question to answer: why do some genes have enhancers with apparently redundant activities? Is one enhancer sufficient for accurate gene function? The widespread occurrence of “shadow enhancers” suggested that there was a functional explanation for their presence in genomes. A plausible answer to the above questions came from experimental manipulations of cis-regulatory regions. The gene shavenbaby (svb) encodes a master transcription factor that regulates the formation of trichomes (hair-like structures) in the cuticle of insects (Payre et al., 1999). This gene also contains multiple enhancers which drive expression in the same group of cells (Fig. 1A). The expression patterns of two distant svb enhancers named Z and DG2 overlap with the patterns of enhancers A and E (Fig. 1A). All four of these enhancers are active at the same time in dorso-lateral cells of the embryonic epidermis (Frankel et al., 2010). What happens, then, if the distant enhancers are removed from the native locus through a deletion? If enhancers Z and DG2 are deleted and embryos develop at the optimal growth temperature, the larval cuticle has a wild type appearance (Fig. 1B). However, if embryos develop under extreme temperatures, the lack of enhancers Z and DG2 causes a significant loss of cuticular trichomes (Fig. 1B; Frankel et al., 2010). A similar effect is observed with a “stressful genetic background.” Wingless is a known regulator of svb; if the two alleles of this gene are wild type, the absence of the distant enhancers has no phenotypic effect. In contrast, if one copy of wingless is mutated, the lack of Z and DG2 produces a significant loss of trichomes (Frankel et al., 2010). Hence, the activity of the distant enhancers is dispensable under optimal growth conditions but vital under stressful conditions. Thus, we can conclude that enhancers Z and DG2 confer robustness to the phenotype (i.e., canalize the phenotype; Swami, 2010).
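
The logic of this robustness argument can be made concrete with a deliberately simplified toy model. The sketch below assumes that svb expression in a cell is the plain sum of the outputs of its active enhancers, that stress (extreme temperature or a sensitized genetic background) scales every output down by a common factor, and that a trichome forms only when total expression reaches a threshold. The additive rule, the threshold, all numerical values, and the function names are hypothetical choices made for illustration; they are not measurements or a model taken from Frankel et al. (2010), and whether enhancer activities actually combine additively remains an open question.

```python
# Toy model (hypothetical): additive enhancer output with a threshold readout.
# Enhancer strengths, stress factors and the threshold are made-up numbers.

ENHANCER_OUTPUT = {"DG2": 1.0, "Z": 1.0, "A": 1.0, "E": 1.0}  # arbitrary units
THRESHOLD = 1.5  # assumed expression level needed to form a trichome


def total_expression(active_enhancers, stress_factor):
    """Sum the outputs of the active enhancers, scaled by a stress factor.

    stress_factor = 1.0 stands for optimal conditions; smaller values stand
    for extreme temperature or a sensitized genetic background.
    """
    return stress_factor * sum(ENHANCER_OUTPUT[name] for name in active_enhancers)


def makes_trichome(active_enhancers, stress_factor):
    """Return True if summed expression reaches the assumed threshold."""
    return total_expression(active_enhancers, stress_factor) >= THRESHOLD


wild_type = ["DG2", "Z", "A", "E"]
deletion = ["A", "E"]  # enhancers Z and DG2 removed

for label, factor in [("optimal", 1.0), ("stress", 0.6)]:
    print(label,
          "| wild type:", makes_trichome(wild_type, factor),
          "| deletion:", makes_trichome(deletion, factor))
```

With these assumed values, the deletion still reaches the threshold under optimal conditions but falls below it under stress, whereas the wild-type locus reaches it in both cases. This mirrors the qualitative pattern described above without making any claim about the real quantities involved.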

Figure 1.

The function of shavenbaby “shadow enhancers.” A: (above) Drawing from the lateral perspective of a D. melanogaster first instar larva. The pattern of trichomes (hair-like structures) is depicted in black. The domain producing quaternary trichomes on the fifth abdominal segment is enclosed in a black outline. (below) Diagram of the region upstream of the shavenbaby (svb) transcription start site, showing the positions of the enhancers (black rectangles) for this locus. The overlapping expression driven by enhancers DG2, Z, A, and E6 is shown in red in the diagrams of the quaternary domain (below each enhancer). B: Larvae carrying a deletion of enhancers Z and DG2 in the svb gene show a wild type number of quaternary trichomes if embryos develop at 25°C, the optimal growth temperature. In contrast, larvae with the deletion display a diminished number of quaternary trichomes in the areas where DG2 and Z drive expression (indicated by black arrows) when embryos are reared at extreme temperatures (17°C or 32°C). Larvae with a wild-type shavenbaby gene develop very similar numbers of quaternary trichomes when grown at different temperatures. Thus, the activity of DG2 and Z confers robustness to the phenotype.

A similar set of experiments was also applied to the gene snail in D. melanogaster (Perry et al., 2010). However, in this case, the experimental manipulations were not made in the native locus but in a BAC (bacterial artificial chromosome) containing the cis-regulatory region of the gene. Snail is activated by two enhancers (one proximal and one distal to the transcription start site) with similar activities in the early embryo. Under normal growth conditions, removal of the proximal or the distal enhancer did not affect snail function. On the other hand, if the embryo develops at high temperatures or with just one copy of Dorsal (a gene that activates snail), the lack of one enhancer causes gastrulation defects (Perry et al., 2010). Intriguingly, a recent report with a very similar experimental approach postulated that the distal enhancer of snail is essential for gastrulation under normal growth conditions (Dunipace et al., 2011), contesting the previous claim about the dispensability of this enhancer (Perry et al., 2010).

In recent years, numerous cis-regulatory regions of animal genes have been shown to contain multiple enhancers with overlapping or very similar expression patterns (Jeong et al., 2006; Werner et al., 2007; Hong et al., 2008; McGaughey et al., 2008; Corbo et al., 2010; Frankel et al., 2010; Kalay and Wittkopp, 2010; L'Honore et al., 2010; Naranjo et al., 2010; Perry et al., 2010, 2011; Franchini et al., 2011; Galindo et al., 2011; Ghiasvand et al., 2011; Lee et al., 2011; McBride et al., 2011; Watts et al., 2011; Pauls et al., 2012). Yet, the first observations of overlapping expression from different enhancers date back to the 1990s (Hoch et al., 1990; Kassis, 1990; Tolias and Kafatos, 1990). In addition to these experimentally validated cases, a computational analysis has predicted the occurrence of “shadow enhancers” in many D. melanogaster genes that act in the segmentation cascade (Kazemian et al., 2010). The presence of two enhancers with similar activities, however, does not automatically mean that one of them is dispensable and serves only to canalize the phenotype. For example, a natural deletion of a distal enhancer of ATOH7 (a gene that has a seemingly redundant proximal enhancer) causes a congenital eye disease in humans without any environmental or genetic perturbation (Ghiasvand et al., 2011). Thus, experimental manipulations or analyses of natural variation are required to establish the dispensability of an enhancer.

The existence of multiple enhancers in cis-regulatory regions of developmental genes may be a widespread buffering mechanism against environmental or genetic variation. This type of cis-regulatory architecture could act in concert with other buffering mechanisms, such as the microRNA pathway (Hornstein and Shomron, 2006; Li et al., 2009).

Interactions Between Regulatory Elements

In general, small DNA fragments, isolated from their flanking DNA, are tested for cis-regulatory activity using reporter constructs. These reporter constructs are either injected into embryos to generate stable transformants (Venken and Bellen, 2007; Tasic et al., 2011; Frokjaer-Jensen et al., 2012), or electroporated (Funahashi and Nakamura, 2008; Vierra and Irvine, 2012), to observe the transient effect of the construct. In most cases, these small DNA fragments are sufficient to provide coherent functions in vivo. Undoubtedly, modularity is present to some extent in non-coding DNA and this characteristic can be exploited to further dissect the function of regulatory elements. Nonetheless, the fact that cis-regulatory elements perform discrete functions which can be tested with isolated DNA fragments does not mean that these elements act alone in their native loci. In fact, several recent reports suggest that interactions between cis-regulatory elements are commonplace, and may play relevant roles for the function of cis-regulatory regions.

The HoxD cluster of vertebrates is active during limb development. The precise spatial and temporal expression of HoxD genes in limb buds is vital for appendage formation. This cluster is known to be activated by two proximal enhancer regions named Prox and GCR (Gonzalez et al., 2007). Located toward the centromere, an enigmatic gene desert of 600 kilobases flanks the HoxD cluster. Recent work that investigated physical contacts between putative regulatory elements in the gene desert and HoxD genes uncovered a surprisingly high number of new regulatory elements and molecular interactions (Montavon et al., 2011). At least five regulatory elements dispersed in the gene desert interacted with the gene HoxD13 during digit development. Moreover, these regulatory elements interacted strongly among themselves during the same developmental period. Four of these noncoding elements, analyzed with lacZ reporter constructs in transgenic mice, showed enhancer activity in the developing distal limb. Therefore, in limb development, interactions between many regulatory elements seem to be essential for proper digit patterning. These physical interactions between regulatory elements are not a particular feature of the HoxD cluster. In T cells, the TNF gene is activated by two enhancers separated by 12 kilobases. These two enhancers physically interact when T cells become activated, generating a chromatin loop that promotes transcriptional activation (Tsytsykova et al., 2007). In the human β-globin gene, a locus control region (LCR) modulates gene expression. Within this control region there are several elements named MAREs, which contain binding sites for the transcription factors Maf and Bach1. Using atomic force microscopy, investigators showed that distant MAREs physically interact in vitro when Maf and Bach1 are present (Yoshida et al., 1999). These interactions between MAREs might be critical for chromatin structure and gene expression (Yoshida et al., 1999). 
Although not directly related to the control of gene expression, the case of the Igh locus provides another interesting example of interactions between noncoding elements (Guo et al., 2011). This genomic region contains multiple CTCF insulator elements whose interactions are necessary to produce correct V(D)J rearrangements (Guo et al., 2011).

In several cis-regulatory regions, active in a wide variety of developing organs, synergistic interactions between cis-regulatory elements have been documented. The combined effect of two or more regulatory elements can generate expression in new groups of cells, boost or down-regulate the expression of the target gene, or change the temporal patterns of transcription. A nice example of these phenomena comes from D. melanogaster leg development. In this organ, part of the expression pattern of the gene Distalless (Dll) is controlled by two distant enhancers named LT and M. These two enhancers drive very different expression patterns, and neither of them alone recapitulates Dll native expression in the leg disc (Estella et al., 2008). However, when these two elements are combined in a lacZ reporter construct, the enhancer duo drives expression in all Dll-expressing cells (Estella et al., 2008). A similar behavior has been observed in the gene sloppy-paired-1 during Drosophila embryonic development (Prazak et al., 2010). This gene has two enhancers, named DESE and PESE, that drive incomplete or ectopic patterns when analyzed individually with a reporter construct. Again, a composite reporter containing both elements faithfully recapitulates sloppy-paired-1 native expression (Prazak et al., 2010). Finally, a relevant case is provided by the gene snail, whose structure was mentioned before to illustrate the function of “shadow enhancers.” As described above, this gene contains a proximal and a distal enhancer with similar activities in the early Drosophila embryo. Interestingly, it has been shown that these two enhancers work in a nonadditive manner; the proximal enhancer reduces the activity of the distal enhancer to fine-tune expression levels (Dunipace et al., 2011).

The Unknown Spatial Distribution of cis-Regulatory Information

The physical limits of cis-regulatory regions are difficult to demarcate. Currently, it is possible to define regions that contain regulatory elements that affect the expression of a particular gene or group of genes. However, for a developmental gene with a complex expression pattern it would be dubious to state that “the cis-regulatory region of gene X starts in nucleotide A and ends in nucleotide B.” Developmental genes bear substantial regulatory information, and, so far, we do not know whether the spatial distribution of regulatory elements follows a particular set of rules. In this regard, recent reports show that some transcriptional enhancers maintain their positions in cis-regulatory regions of distant species (Hare et al., 2008; Cande et al., 2009; Frankel, Wang, and Stern, unpublished). These data suggest the existence of constraints in the architecture of cis-regulatory regions, although not much more is known about the large-scale organization of cis-regulatory regions. If we zoom in on smaller regions, we realize that the physical limits of individual cis-regulatory elements are difficult to define as well. In particular, transcriptional enhancers can have diverse architectures. These elements can have a compact configuration, where transcription factor binding sites are close together. Alternatively, transcription factor binding sites can be spread over larger regions, forming separate functional units. These two models have been named the “enhanceosome” and the “billboard,” respectively (Arnosti and Kulkarni, 2005). Certainly, these models represent different points on a continuum of enhancer configurations. However, not all enhancers may have a modular architecture of this kind. In theory, different functional units, all necessary to drive coherent expression, could be separated by many kilobases. This could be the potential architecture of a “femur enhancer” acting on the D. melanogaster gene Ultrabithorax (Davis et al., 2007). In the latter study, a comprehensive survey of a large genomic region failed to isolate an enhancer of Ultrabithorax, suggesting that this enhancer might be structurally complex (Davis et al., 2007). In the case of the “stripes enhancer” of the Drosophila runt gene, a large element is necessary to drive a coherent expression pattern (Klingler et al., 1996). Specifically, part of the embryonic expression of runt is regulated by a ∼5 Kb enhancer, which cannot be subdivided into smaller functional sub-elements (Klingler et al., 1996). It is conceivable that more of these complex enhancers exist in animal genomes. Hence, in some instances, the inability to isolate enhancers could be due to their complex nature. Unfortunately, “negative results” in the search for enhancers are difficult to interpret and will very likely remain unpublished.

A detailed study of the cis-regulatory region of the gene Distalless (Dll) nicely exemplifies the topic discussed in the above paragraph (Fig. 2). As mentioned in previous sections, this gene contains cis-regulatory elements, able to work in isolation, that are active in D. melanogaster leg discs. These regulatory elements are separated by 12 Kb of DNA without enhancer activity in leg discs (Estella et al., 2008). A recent study revealed that at least four different regions within these 12 Kb harbor functional binding sites for the transcription factor GAF (Fig. 2A; Agelopoulos et al., 2012). GAF (the GAGA factor, encoded by the gene trl) is a protein involved in modifying chromatin structure, among other things (Adkins et al., 2006). It was also shown that the whole Dll cis-regulatory region appears to be in a compact state in thoracic leg primordia, where the gene is active, whereas the same region is in a loose chromatin state in abdominal cells, where Dll is repressed by Hox proteins (Fig. 2B; Agelopoulos et al., 2012). The compact conformation of the Dll cis-regulatory region in thoracic leg primordia suggests that this large region acts as a single functional unit (Agelopoulos et al., 2012). Certainly, the cis-regulatory region of Dll turned out to be much more complex than previously thought. Initially, its function was explained by the activity of a couple of distant regulatory elements. However, a deeper examination of the locus, performed with a methodology that involves cell-type-specific chromatin immunoprecipitation (cgChIP; Agelopoulos et al., 2012), demonstrated that it is a large functional unit with cis-regulatory information dispersed over 14 Kb. The insights into Dll regulation also highlight the relevance of chromatin conformation in the control of gene expression. This topic is the theme of the following section.

Figure 2.

The architecture of the Distalless (Dll) locus in D. melanogaster. A: Linear view of the Distalless locus. Black boxes demarcate enhancer regions (LT/304 and M). Yellow boxes indicate regulatory elements without enhancer activity (I1, I2, I3, and I4). The GAF protein, represented here by a red circle, can bind to all six of these regulatory regions. B: (Above) In the thoracic cells that will give rise to legs, the chromatin of the Dll locus is in a compact state, likely due to GAF activity. In this conformation the gene is active. (Below) In contrast, Dll is inactive in the homologous abdominal cells, which do not form legs. In these cells, the chromatin is in an extended conformation. The stoichiometry of GAFs in relation to DNA elements is purely hypothetical. Redrawn with permission from Agelopoulos et al. (2012).

Epigenetic Marks, Chromatin Conformation, and cis-Regulatory Activity

Until a few years ago, most studies of cis-regulation were “epigenetics free.” Isolated DNA fragments were analyzed with reporter constructs present in episomes or integrated in random genome locations. Consequently, the chromatin structure of the native locus was not considered as a variable affecting the activity of a particular regulatory element. This bias is probably due to historical and technical reasons; a decade ago, epigenetics was not as omnipresent and easy to tackle experimentally as it is today. It is now clear that the chromatin state of both regulatory elements and regulatory-element-flanking DNA is important for the regulation of gene expression. Hence, the activity of a regulatory element should ideally be analyzed in the proper genomic context. It could be argued, though, that isolated regulatory elements are subject to a certain degree of epigenetic regulation. However, this potential regulation could be very different from that occurring at the native location, and, as mentioned before, it does not have the influence of flanking chromatin.

The study of epigenetics has experienced exponential growth in recent years, making the field a rapidly changing ground. The current prominence of the field has motivated the inclusion of epigenetic aspects in some experiments of gene regulation. Moreover, recent technical advances have augmented the resolution of analyses, allowing researchers to obtain deeper insights into epigenetic regulation. In the following paragraphs, I summarize some of the numerous epigenetic processes that impact cis-regulatory activity.

The posttranslational modifications of nucleosomal proteins constitute a vital part of gene regulation. Histones are subjected to a large number of posttranslational modifications, with methylation and acetylation being the most common epigenetic marks. The action of histone acetyltransferases (HATs), deacetylases (HDACs), methyl-transferases (HMTs), and demethylases (HDMs) generates part of the complex epigenetic landscape, also known as the histone code. Histone marks and chromatin remodeling complexes modify chromatin conformation, regulating the accessibility of cis-regulatory regions to transcription factors. Some epigenetic marks have been associated with specific groups of regulatory elements; in humans (Heintzman et al., 2009), zebrafish (Aday et al., 2011), and flies (Negre et al., 2011) the mono-methylation of histone H3 lysine 4 (H3K4me1) is usually found in active enhancer regions, close to areas of open chromatin. A recent study showed that the activity of the protein LSD1 (a demethylase of H3K4 and H3K9) is essential for differentiation of mouse embryonic stem cells (Whyte et al., 2012). With the progression of embryogenesis, active enhancers in embryonic stem cells experience a decrease in histone acetylation. In turn, this low level of acetylated histones triggers LSD1-mediated H3K4 demethylation. This modification silences enhancers of genes that need to be shut down for cell differentiation (Whyte et al., 2012). In addition to the action of histone marks, local chromatin conformation can be influenced by the composition of nucleosomes (Rangasamy et al., 2003; Raisner et al., 2005). It has been reported that histone variants H2A.Z and H3.3 are associated with active enhancers and insulators in the human genome (Jin et al., 2009).

The activity of regulatory elements can be modulated by local chromatin modifications (exerted by nearby nucleosomes) or global chromatin changes, which affect the structure of large regions. In metazoans, GATA and Polycomb proteins interact with specific DNA motifs in cis-regulatory regions and modify the conformation of chromatin. Proteins of the Polycomb group regulate and interact with epigenetic marks, controlling gene expression programs through cell divisions (Schwartz and Pirrotta, 2008). It has been shown that Polycomb-repressive complexes mediate chromatin compaction in flies and mice (Grau et al., 2011). This large-scale chromatin change represses the expression of many developmental genes (Isono et al., 2005; Boyer et al., 2006; Lee et al., 2006; Lanzuolo et al., 2007; Sing et al., 2009). Another group of proteins, the GATA factors, regulate the conformation of chromatin, influencing gene expression (Vakoc et al., 2005). For example, at the Kit locus, the exchange of GATA proteins causes the formation of different chromatin loops that affect transcriptional activity (Jing et al., 2008). In immature erythroid cells, a long chromatin loop mediated by GATA-2 promotes the interaction of an enhancer with the basal promoter. Upon maturation, GATA-1 replaces GATA-2, favoring a repressive loop that down-regulates gene expression (Jing et al., 2008).

DNA methylation also has a big impact on gene expression. The phenomenon of imprinting, where an organism expresses only the maternal or paternal allele, is caused by differential DNA methylation (Feng et al., 2010). Mammalian clusters of imprinted genes contain cis-regulatory elements called imprinting control regions (ICRs) which have parental-specific DNA methylation (Barlow, 2011). Besides its role in imprinting, DNA methylation regulates gene expression in lineages of somatic cells during embryogenesis (Oda et al., 2006; Illingworth et al., 2008). Methylation of cytosines in CpG dinucleotides is a common epigenetic mark in vertebrate cis-regulatory regions (Deaton and Bird, 2011). Currently, it is known that CpG methylation changes chromatin architecture, affecting the transcriptional potential of a gene (Blackledge and Klose, 2011).

The mechanisms described in the paragraphs above represent only a part of epigenetic regulation (for a more complete picture, see Gibney and Nolan, 2010). Unquestionably, epigenetic processes are complex and dynamic. Furthermore, there seems to be significant interplay between different epigenetic marks and epigenetic regulators; methylated CpGs, histone modifications, and Polycomb complexes cross-talk in cis-regulation (Brinkman et al., 2012; Cedar and Bergman, 2012; Ong and Corces, 2012). Accordingly, it has become clear that developmental genes undergo rapid chromatin changes, orchestrated by several interacting players (Hunkapiller et al., 2012).


I hope I have convinced the reader that cis-regulatory regions are more than a collection of modular cis-regulatory elements in a string of DNA. It has become evident that the genetic architecture and the epigenetic landscape of cis-regulatory regions comprise multiple interrelated pieces. Part of this complex functionality has been described in previous sections, but much remains to be learned. Indeed, we lack the “big picture” of cis-regulation; there are many separate pieces that need to be put together. Here are some relevant questions that need answers: (i) Is the activity of shadow enhancers additive or interactive? Are shadow enhancers always regulated by the same transcription factors, as in the case of Dorsal target genes? (ii) Are there cis-regulatory elements with new functions yet to be discovered? Are there genetic elements and/or epigenetic marks that delimit cis-regulatory units? (iii) Are there architectural rules in cis-regulatory regions? What are the constraints that shape cis-regulatory evolution? (iv) What are the molecular mechanisms underlying interactions between regulatory elements? (v) Can we measure changes in regulatory-element activity during development? Can we track chromatin conformation dynamics within cis-regulatory regions? Can we integrate these data into dynamic models of cis-regulation?

Fortunately, new techniques can help answer some of these questions. A map of interactions between distant elements can be obtained using 3C (chromosome conformation capture; Miele and Dekker, 2009) and Hi-C (Lieberman-Aiden et al., 2009). Moreover, these techniques can aid in the discovery of new regulatory elements and, thus, provide a better picture of the physical limits of cis-regulatory regions.
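As a rough illustration of what such an interaction map looks like as data, the following minimal Python sketch bins pairs of interacting genomic positions into a symmetric contact matrix for a single region. The function name, the toy coordinates, and the bin size are hypothetical and chosen only for illustration; real 3C and Hi-C analyses also involve read mapping, filtering, and normalization steps that are not shown here.

```python
def contact_matrix(pairs, region_start, region_end, bin_size):
    """Count interacting position pairs per pair of bins within one region.

    pairs: iterable of (pos_a, pos_b) genomic coordinates (hypothetical input,
    e.g. the two ends of a ligation product mapped to the same chromosome).
    Returns a symmetric list-of-lists matrix of raw counts.
    """
    # Number of bins needed to cover the region (last bin may be partial).
    n_bins = (region_end - region_start + bin_size - 1) // bin_size
    matrix = [[0] * n_bins for _ in range(n_bins)]
    for pos_a, pos_b in pairs:
        # Ignore pairs with an end outside the region of interest.
        if not (region_start <= pos_a < region_end
                and region_start <= pos_b < region_end):
            continue
        i = (pos_a - region_start) // bin_size
        j = (pos_b - region_start) // bin_size
        matrix[i][j] += 1
        if i != j:
            # Keep the matrix symmetric: a contact has no direction.
            matrix[j][i] += 1
    return matrix


# Toy data (invented): two pairs link the first and last 10-kb bins,
# one pair falls within a single bin, and one lies outside the region.
toy_pairs = [(1200, 48500), (1900, 47000), (10500, 12000), (60000, 1000)]
toy_matrix = contact_matrix(toy_pairs, 0, 50000, 10000)
for row in toy_matrix:
    print(row)
```

In this toy example, an elevated count between two distant bins would be the kind of signal that suggests a long-range contact, for instance between an enhancer and a promoter, although interpreting real data requires appropriate controls.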

To avoid biological noise when analyzing chromatin states or interactions, it is very important to examine a homogeneous group of cells (only those cells in the embryo where the gene in question is active). An interesting approach to achieve this is to FACS-sort embryonic cells expressing the right markers (Spencer et al., 2011; Bonn et al., 2012; Zhou and Pu, 2012). In this way, it is possible to obtain accurate temporal maps of chromatin changes.

Another methodological approach is the use of BACs (bacterial artificial chromosomes) containing putative cis-regulatory regions. BACs accommodate large DNA regions that can be modified in many ways through BAC recombineering (Warming et al., 2005). Among other modifications, it is easy to add a reporter gene to a BAC. In this way, it is possible to monitor the expression driven by large DNA segments and to compare this expression with that of the native gene. Furthermore, it is feasible to mutate known regulatory elements or regions of unknown function in the context of a large DNA region.

The techniques mentioned above (combined with traditional methods such as the functional dissection of regulatory regions with reporter assays) are fundamental for obtaining a comprehensive view of cis-regulatory activity. Surely, these methodologies, coupled with careful experimental designs, will generate major discoveries in the field. However, current techniques are not sufficient to obtain a dynamic view of cis-regulatory activity in live embryos. Thus, a major breakthrough would be the development of a robust assay to monitor cis-regulatory interactions in real time.


I thank Ella Preger, David Stern, Justin Crocker, and two anonymous reviewers for helpful comments on the manuscript. My lab is supported by reinstallation grants from The Pew Charitable Trusts and Fundación Bunge y Born. I am a career investigator of the Argentine National Research Council (CONICET).