Methyl-CpG-binding proteins

Targeting specific gene repression


E. Ballestar, Department of Biochemistry and Molecular Biology, University of Valencia, C/Dr Moliner, 50. 46100 Valencia, Spain. Fax: + 34 963 864 635, Tel.: + 34 963 864 385, E-mail:


CpG methylation, the most common epigenetic modification of vertebrate genomes, is primarily associated with transcriptional repression. MeCP2, MBD1, MBD2, MBD3 and MBD4 constitute a family of vertebrate proteins that share the methyl-CpG-binding domain (MBD). The MBD, consisting of about 70 residues, possesses a unique α/β-sandwich structure with characteristic loops, and is able to bind single methylated CpG pairs as a monomer. All MBDs except MBD4, an endonuclease that forms a complex with the DNA mismatch-repair protein MLH1, form complexes with histone deacetylase. It has been established that MeCP2, MBD1 and MBD2 are involved in histone deacetylase-dependent repression and it is likely that this is also the case for MBD3. The current model proposes that MBD proteins are involved in recruiting histone deacetylases to methyl CpG-enriched regions in the genome to repress transcription. The lack of selectivity for MBD association with particular DNA sequences indicates that other mechanisms account for their recruitment to particular regions in the genome.


Abbreviations: MBD, methyl-CpG-binding domain; GFP, green fluorescent protein; TRD, transcriptional repression domain.

Methylation at CpG dinucleotides, the most abundant epigenetic modification in vertebrate genomes, plays an essential part in the control of gene expression. In vertebrates, somatic genomes are globally methylated, with the exception of CpG islands. CpG islands are GC-rich regions of DNA, stretching for an average of about 1 kb, which are coincident with the promoters of ≈ 60% of human RNA polymerase II-transcribed genes. Aberrant or accidental methylation of CpG islands in the promoter of many cancer-related genes results in silencing of their expression.
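The CpG richness that distinguishes islands from bulk DNA can be estimated directly from sequence. The following is a minimal sketch rather than a method taken from this review: it reports GC content and the observed/expected CpG ratio, a commonly used measure that is assumed here for illustration. The function name `cpg_stats` and the toy sequence are hypothetical, and no island threshold is implied.

```python
def cpg_stats(seq):
    """Return (gc_fraction, cpg_count, obs_exp_ratio) for a DNA sequence.

    Illustrative only: obs/exp CpG is a common measure of CpG richness,
    not a criterion stated in the text.
    """
    s = seq.upper()
    n = len(s)
    if n == 0:
        raise ValueError("empty sequence")
    c = s.count("C")
    g = s.count("G")
    # "CG" cannot overlap with itself, so str.count gives the exact number
    # of CpG dinucleotides on this strand.
    cpg = s.count("CG")
    gc_fraction = (c + g) / n
    # Expected CpG count if C and G were placed independently is c * g / n.
    obs_exp = (cpg * n) / (c * g) if c and g else 0.0
    return gc_fraction, cpg, obs_exp


if __name__ == "__main__":
    toy = "CGCGCGATAT"  # hypothetical toy sequence, not genomic data
    print(cpg_stats(toy))
```

Applied in a sliding window along a genomic sequence, such statistics would be expected to rise over CpG islands relative to the CpG-depleted bulk genome.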

The way by which nuclear factors interpret the information encoded by a particular pattern of methylation to regulate gene expression has been studied for over 30 years. Early analysis of the role of methylation using tissue-specific genes introduced into mammalian cells by transfection led to a general consensus that DNA methylation directs the formation of nuclease-resistant chromatin, leading to repression of gene activity [1–4]. In native chromatin, it has been demonstrated that DNA sequences enriched in 5-methylcytosine are refractory to digestion by micrococcal nuclease and other nucleases that can cleave CpG [5,6]. A model that could accommodate both gene repression and altered chromatin structure hypothesises the existence of nuclear factors that bind differently to a methylated gene, leading to an altered chromatin structure, which would in turn deny access to the transcription machinery. This model prompted a search for proteins with selective affinity for methylated DNA.

In the early 1990s, two activities with affinity for methylated DNA were found, namely MeCP1 and MeCP2. MeCP1 was originally identified as a nuclear factor that could discriminate between methylated and unmethylated DNA using band-shift assays [7]. Radioactive DNA molecules that contained 12 or more symmetrically methylated CpG pairs specifically complexed with MeCP1 leading to a shift in electrophoretic mobility. MeCP1 appeared to be a large multisubunit complex, the composition of which has remained controversial [8,9]. MeCP2 was the first true member of the family of proteins that selectively recognize methylated CpG. It is a chromatin-associated nuclear protein of molecular mass ≈ 55 kDa which, in contrast with the behaviour originally reported for MeCP1, is able to bind DNA that contains a single symmetrically methylated CpG pair [10]. Also, MeCP2 was found to bind to chromosomes at sites known to contain methylated DNA [11]. In particular, in mouse chromosomes this is visualized as prominent binding to the highly methylated major satellite located just proximal to the centromeres.

MeCP2 as the archetypical methyl-CpG-binding protein

Bird and colleagues [10] used deletion analysis to characterize the minimal region of MeCP2 required for binding to methylated DNA, thereby defining the so-called methyl-CpG-binding domain (MBD). A short region of MeCP2 containing about 70 residues located within its N-terminal third retained the ability to bind methylated DNA selectively. Subsequent work has shown that the MBD of MeCP2 binds DNA with significantly reduced affinity compared with the full-length protein. The amino-acid sequence of the MBD contains no recognizable DNA-binding motifs, suggesting that the MBD may itself constitute a DNA-binding domain. Intact MeCP2, however, contains several short motifs that have been seen in proteins that bind to the minor groove of AT-rich DNA sequences. These motifs, which are excluded from the MBD, may contribute to non-specific binding. In fact, the selectivity for methylated DNA increases as regions containing those motifs are deleted [10].

The fact that the MBD recognizes a symmetrically methylated CpG dinucleotide suggested that dimerization may be required for the binding of MeCP2. However, Nan et al. [10] showed that no MBD heterodimer complex is formed when MBDs of distinguishable size are mixed with probe DNA. In addition, the ability to use the South-Western assay for MeCP2, in which proteins are immobilized after SDS/PAGE, also argues that dimerization is not required for binding to methylated sites.

A family of methyl-CpG-binding proteins

A database search for sequences homologous to the MBD led to the identification of a protein containing an MBD-like motif located at its N-terminus [8]. This protein, originally called protein containing MBD (PCM1), was renamed MBD1. MBD1 was shown to bind methylated DNA and to repress transcription from a methylated promoter in vitro. It was initially believed to be a component of the MeCP1 complex [8]. An additional search of an EST database found three more genes in mammalian cells that encode proteins containing putative MBDs, namely Mbd2, Mbd3 and Mbd4 [12]. Alignment of the MBD-like regions from the murine MBD1 to MBD4 and MeCP2 proteins showed that two subgroups could be established. The MBD of MBD4 is most similar to that of MeCP2 in primary sequence, while the MBDs of MBD1, MBD2 and MBD3 are more similar to each other than to those of either MBD4 or MeCP2 (Fig. 1). The presence of an intron located at a conserved position in all five genes [12] indicates that the MBDs of these proteins are evolutionarily related.

Figure 1.

Sequence alignment of the MBD of human MeCP2, MBD1, MBD2, MBD3, Xenopus MBD3 and human MBD4. Positions of β-strands (arrows), loops (thick lines), and the α-helix (rectangle) defined by the solution structures of MeCP2 [21] and MBD1 [18] are indicated above the alignment. MeCP2 numbering is located above the human MeCP2 sequence. General numbering for MBDs is located below all the sequences. Conserved residues are shaded and those essential for binding to methylated DNA are indicated by an asterisk. Four Rett syndrome mutants occurring in the MBD are indicated by grey circles.

Analysis of MBD expression in numerous murine tissues showed that mbd genes are expressed in all samples tested [12]. Only embryonic stem cells, where DNA methylation is known to be dispensable [13], seem to have low levels of mbd1 and mbd2 transcripts.

The sequence similarity between these five proteins is largely limited to their MBD, although MBD2 and MBD3 are similar and share about 70% overall identity over most of their length. The greatest divergence occurs at the C-terminus, where MBD3 has 12 consecutive glutamic acid residues encoded by an imperfect trinucleotide repeat. This characteristic is conserved in the rat and human MBD3. The absence of any other known sequence motifs within MBD2 or MBD3 provides no clues as to the activities of these two proteins.

MBD1, a 70-kDa protein and the largest member of the MBD family, has its MBD on the N-terminus and two or three cysteine-rich regions (CxxC motifs) that are related to those in DNA methyltransferase protein 1 and the mammalian trithorax-like protein HRX [8]. The exact number of these motifs present in MBD1 depends on alternative splicing. There is also another alternative splicing event, detected in cDNA derived from mouse brain, where the third exon (encoding the C-terminal half of the MBD) is removed [12]. Furthermore, there is a third alternative splicing event resulting in the replacement of the terminal 20 amino acids with an alternative 44 amino-acid terminus. The physiological relevance of these variants remains to be explored.

There are two potential forms of MBD2, corresponding to initiation of translation at either the first (MBD2a, 43.5 kDa) or second (MBD2b, 29.1 kDa) methionine codon. The mbd2 gene is also subject to alternative splicing events that can produce nonsense transcripts [12]. The shortest form of MBD2, MBD2b, has been proposed to be a DNA demethylase. Szyf and colleagues [14] have reported that MBD2b has the ability to demethylate 5-methylcytosine after the in vitro translation of mRNA derived from an expressed sequence tag cDNA. The appealing biochemistry and enzymatic activities of this protein have led to subsequent tests of demethylase activity by other groups. To date, attempts to reproduce the results of Bhattacharya et al. [14] have been unsuccessful [9,15].

MBD3 also has variants produced by alternative splicing. The most abundant is a 32-kDa protein which shares high homology to MBD2b (80% similar, 72% identical). The second variant contains an insertion of a small exon (20 amino acids) in the MBD, with the rest of its sequence being identical with that of the short form of MBD3. These two MBD3 variants have been detected in human, mouse and Xenopus systems [15,16]. Considering its high similarity to MBD2b, demethylase activity has also been tested for MBD3 [15]. No demethylase activity was detected.
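Identity figures such as those quoted above are derived from pairwise alignments. As a minimal sketch of the arithmetic, assuming identity is counted over columns in which neither sequence has a gap (other conventions exist, and the review does not state which was used), percent identity can be computed as follows. The function name `percent_identity` and the toy peptides are hypothetical and are not real MBD sequences.

```python
def percent_identity(a, b, gap="-"):
    """Percent identity between two pre-aligned sequences of equal length.

    Assumption: only columns where both sequences have a residue are counted.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    # Keep only the columns in which neither sequence carries a gap.
    aligned = [(x, y) for x, y in zip(a, b) if x != gap and y != gap]
    if not aligned:
        return 0.0
    matches = sum(1 for x, y in aligned if x == y)
    return 100.0 * matches / len(aligned)


if __name__ == "__main__":
    # Hypothetical toy alignment: 5 gap-free columns, 4 identical residues.
    print(percent_identity("MKR-WE", "MKKAWE"))
```

Percent similarity would additionally count conservative substitutions as matches, which requires a substitution matrix and is not shown here.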

MBD4 is a 62-kDa protein with the MBD located close to its N-terminus. There is also evidence of alternative splicing events in MBD4, although none of them affect the MBD region. Database searches of the C-terminal region of MBD4 showed homology with bacterial DNA-repair enzymes. Among the related proteins are the 8-oxoG·A mispair-specific adenine glycosylase MutY of Escherichia coli, the G·T mismatch-specific glycosylase Mig of Methanobacterium thermoautotrophicum, the thymine glycol glycosylase EndoIII of E. coli, and the photodimer-specific UV-endonuclease of Micrococcus luteus. In fact, it has recently been shown that MBD4 can efficiently remove thymine or uracil from a mismatched CpG site in vitro, and the combined specificities of binding and catalysis indicate that this enzyme may function to minimize mutation at methyl-CpG [17].

MBDs bind to methylated DNA

As the MBD proteins were identified because they contained a domain homologous to the MBD of MeCP2, the immediate property to test was their ability to recognize methylated DNA selectively. Hendrich & Bird [12] used gel retardation assays to demonstrate that MBD1, MBD2 and MBD4 are able to form specific complexes with methylated DNA in vitro. In contrast, specific complexes were not reported to form between MBD3 and methylated DNA. These results were complemented by transfection studies with green fluorescent protein (GFP) constructs of the MBD proteins. Overexpressed MBD1–GFP, MBD2–GFP, and MBD4–GFP preferentially localized to regions of the genome known to be highly methylated in vivo, such as major satellite DNA. This observation is consistent with their ability to bind methylated DNA in band-shift assays. The MBD3–GFP fusion protein was shown to accumulate in many nuclear foci in cells expressing large amounts of the fusion protein. These foci do not coincide with major satellite DNA, indicating that MBD3 does not preferentially associate with the highly methylated major satellite DNA in mouse cells. The inability of MBD3 to recognize methylated DNA is somewhat surprising considering its high similarity to MBD2b, especially within their MBDs.

Wade et al. [15] have identified a Xenopus homologue of the mammalian MBD3. Under conditions in which mammalian MBD3 has little specificity for methylated DNA, Xenopus MBD3 is able to bind with high selectivity to methylated DNA, as demonstrated by both South-Western and mobility-shift assays [15]. There are two regions where Xenopus and mammalian MBD3 are divergent. Xenopus MBD3 has a smaller number of acidic residues in its C-terminus than the highly acidic C-terminal tail of mammalian MBD3. This difference may explain the reduced affinity of the mammalian MBD3 for DNA. However, deletion of the C-terminus of mammalian MBD3 did not change the distribution of MBD3–GFP constructs [12]. Another explanation for the different behaviour between mammalian and Xenopus MBD3 may be point substitutions in highly conserved residues within the MBD. Ohki et al. [18] have established that point mutations in highly conserved residues within the MBD dramatically impair the ability to recognize methylated DNA selectively. Moreover, Amir et al. [19] described the presence of point mutations within the MBD of MeCP2 in patients with Rett syndrome, a childhood neurodevelopmental disorder. Most of these mutations completely abolish the binding of MeCP2 to methylated DNA [20]. These results support the hypothesis that point substitutions in the MBD of mammalian MBD3 may impair recognition of methylated DNA. There are two residues in the MBD of mammalian MBD3 that have been replaced by similar residues, which may account for a reduction in affinity for methylated DNA. One is residue 30, which is a lysine or arginine in the rest of the MBD family, including Xenopus MBD3. Mammalian MBD3 has a histidine instead. The second is located at position 34. This is a tyrosine for each member of the MBD family, except for mammalian MBD3. Although these two substitutions are quite conservative, they may well explain a significant decrease in selectivity for methylated DNA. This possibility remains to be tested.

The recent determination of the solution structure of the MBD has shed light on the type of interactions between the MBD motif and a methylated CpG pair. Ohki et al. [18] have characterized the structure of the MBD of the human methylation-dependent repressor MBD1. Independently, Wakefield et al. [21] solved the solution structure of the MBD of MeCP2. NMR data of the MBD and the results of chemical-shift perturbation with DNA led to similar models of the structure of the domain and its interaction with methylated DNA. Both groups have described a very similar wedge-shaped structure in which four antiparallel β-strands constitute one face of the wedge. The two longer β-strands are proposed to interact with the major groove of the DNA, where a methyl group would be located (Fig. 2). Although the sequences from MBD1 and MeCP2 show only moderate homology, alignment can be easily made with a number of conserved residues throughout the MBD. The current model proposes that the interaction between MBD and methylated DNA takes place along the major groove of a standard B-form DNA (Fig. 2). The residues between β4 and α1 also seem to establish contacts with the phosphate backbone. A number of residues in the MBD are conserved throughout the entire family of MBD proteins, and it is likely that the structure and type of interaction with methylated DNA are essentially the same.

Figure 2.

Model for the interaction between MBD and methylated DNA. Co-ordinates for the MeCP2 MBD and the CpG helix were used to construct this model (accession numbers 1qk9 and 329G, respectively). Four Rett MBD mutations are shown. Reprinted, with permission, from [20]. Copyright 2000 American Chemical Society.

When the MBD of MeCP2 was delineated by deletion analysis, Nan et al. [10] observed a strong non-specific binding activity when the flanking regions to the MBD were added. Attempts to find any sequence specificity in the binding of MBDs to DNA have been unsuccessful to date. If MBDs are the only driving force targeting histone deacetylase activities to methyl-enriched regions, it is difficult to envisage what makes them specific to certain regions, taking into account that all the MBDs are able to bind to a single methyl CpG. Originally, MeCP1 was reported to require at least 10 CpGs to bind methylated DNA, although MBD2, the MBD component of MeCP1, is able to bind a single CpG [12]. It is possible that different requirements with regard to the density of methyl-CpGs may account for differences in the targeting to a particular gene. It has been observed that different sequences containing different distributions of CpG seem to have different affinities for MBD proteins (our unpublished work); however, there is no direct evidence that the sequence around a CpG influences binding of an MBD protein. It is also possible that differences between the MBDs in the dissociation constants of their complexes with methylated DNA distinguish their binding to regions of different CpG density.
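The suggestion that dissociation constants and methyl-CpG density could jointly determine targeting can be illustrated with a simple equilibrium calculation. This is a minimal sketch under strong assumptions that are not taken from the text: each methyl-CpG is treated as an independent, identical binding site, cooperativity and chromatin context are ignored, and the concentrations and Kd values are hypothetical placeholders rather than measured constants.

```python
def site_occupancy(protein_conc, kd):
    """Fractional occupancy of a single site: theta = [P] / ([P] + Kd)."""
    if protein_conc < 0 or kd <= 0:
        raise ValueError("protein_conc must be >= 0 and kd must be > 0")
    return protein_conc / (protein_conc + kd)


def prob_any_bound(protein_conc, kd, n_sites):
    """Probability that at least one of n independent sites is occupied."""
    if n_sites < 0:
        raise ValueError("n_sites must be >= 0")
    theta = site_occupancy(protein_conc, kd)
    return 1.0 - (1.0 - theta) ** n_sites


if __name__ == "__main__":
    conc = 10.0  # hypothetical free protein concentration (arbitrary units)
    # Compare a hypothetical tight binder and a weak binder (same units as conc).
    for kd in (10.0, 100.0):
        for n in (1, 12):
            p = prob_any_bound(conc, kd, n)
            print(f"Kd={kd:>5} n_sites={n:>2} P(bound)={p:.3f}")
```

In this toy model a weak binder is rarely found on an isolated methyl-CpG but is frequently bound to a densely methylated region, whereas a tight binder occupies both, which is one way the density dependence discussed above could arise.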

Early biochemical studies (salt extraction and nuclease treatment) demonstrated that MeCP1 and MeCP2 are associated with chromatin [22]. Staining with antibodies to MeCP2 has shown that the distribution of MeCP2 along the chromosomes parallels that of methyl-CpG. In mouse, for example, MeCP2 is concentrated in pericentromeric chromatin, which contains a large fraction (about 40%) of all genomic 5-methylcytosine. Also, MBD1 localizes to the hypermethylated region of chromosome 1q12 as well as to 4′,6-diamidino-2-phenylindole (DAPI)-bright regions in the nucleus of human cells [23].

Several factors that bind DNA are unable to access their binding sites when nucleosomes are present [24–27]. Evidence suggests that the removal or remodelling of nucleosomes near a promoter is a prerequisite for assembly of a functional transcription complex (reviewed by Struhl [28]), presumably because this removes a block to factor binding. MeCP2 does not appear to require prior disruption of nucleosomal chromatin to bind to the genome. At a gross level, Nan et al. [29] observed that large numbers of extra MeCP2 molecules produced during transient expression found appropriate binding sites in mouse heterochromatin. The simplest explanation for this is that MeCP2 can directly access sites in chromatin. In fact, it has also been shown that MeCP2 is able to bind nucleosomal DNA to form discrete complexes [30]. These observations provide a molecular mechanism by which MeCP2, and presumably the other MBDs, can gain access to chromatin in order to target corepressor or coactivator complexes that further modify chromatin structure.

MBDs repress transcription

A key aspect to understanding the function of MBD proteins is their ability to repress transcription. As mentioned above, methylated DNA is associated with transcriptional repression and inactive chromatin. One of the early models constructed to interpret how methylated DNA affects transcriptional status hypothesized the existence of factors that bind differently to methylated and non-methylated DNA. The binding of these factors could then lead to an altered chromatin structure, which would deny access to the transcriptional machinery.

One could see the finding of MeCP2 as a transcriptional repressor in chromatin as a two-step breakthrough. A first approach showed that MeCP2 represses transcription in vitro and in vivo from methylated promoters but does not repress non-methylated promoters [11,31]. Nan et al. [11] were able to identify a region capable of long-range repression in vivo, the transcriptional repression domain (TRD). Kaludov & Wolffe [31] found that the TRD of MeCP2 could repress transcription at least in part through interactions with a key component of the basal transcriptional machinery, TFIIB.

In a second step, the TRD was found to be associated with a corepressor complex containing the transcriptional repressor Sin3 and a histone deacetylase [32,33]. Silencing conferred by MeCP2 and methylated DNA can be relieved by inhibition of histone deacetylase with trichostatin A. This finding finally established a direct causal relationship between DNA methylation-dependent transcriptional silencing and the modification of chromatin.

Once the remaining members of the MBD family were discovered, similar approaches were followed to test their properties. In fact, MBD1, MBD2 and MBD3 are also associated with histone deacetylases [9,15,16,34].

MBD1 was initially reported to be a component of the MeCP1 repressor complex [8]. The MeCP1 complex also contains the deacetylases HDAC1 and HDAC2, and the RbAp48/46 histone-binding protein. In a later report, MBD2 [9] was shown to be the actual MBD protein contained in the MeCP1 complex. The explanation for this discrepancy is that the antibody to MBD1 that supershifted the MeCP1 complex was a polyclonal antibody raised against full-length MBD1. This antibody cross-reacted with MBD2 [34]. The use of highly specific antibodies against MBD1 showed not only that MBD1 does not belong to the MeCP1 complex, but also that it is associated with other deacetylases and represses transcription in a deacetylase-dependent manner. The TRD of MBD1 is located at its C-terminus, and it is enriched in hydrophobic residues, unlike the highly basic TRD of MeCP2 [11]. Although deacetylases seem to be involved in the mechanism of repression through MBD1, neither HDAC1 nor Sin3 is likely to be responsible. The deacetylase-dependent pathway could be different from that utilized by the methylation-dependent repressors MeCP1 and MeCP2.

MBD1 represses transcription through the co-operation of the MBD, CxxC motifs and TRD [35]. One of the three CxxC motifs has DNA-binding capability, regardless of the methylation status. For this reason, the MBD1 variant containing this particular CxxC can also affect transcription from unmethylated or hypomethylated promoters.

MBD2, on the other hand, is the MBD component of the MeCP1 complex [9]. Highly specific antibodies to MBD2, which do not cross-react with MBD3, were used to detect MBD2 as a part of the MeCP1 complex. MBD2b was also shown to repress transcription in a deacetylase-dependent manner.

Similarly, MBD3 has also been identified as a component of a deacetylase complex with a nucleosome remodelling activity, known as the Mi-2/NuRD complex [15,16,36]. The Mi-2 complex contains six putative subunits including the known histone deacetylase Rpd3, RbAp48/p46, Mi-2, an MTA1-like protein and MBD3. Mi-2 is a Snf2 superfamily member with nucleosome-dependent ATPase activity. In the Mi-2 complex, two disparate classes of chromatin regulators, namely a histone deacetylase and a Snf2 superfamily ATPase, are associated. It has not yet been shown that MBD3 represses transcription through mechanisms similar to those described for MeCP2, MBD1 or MBD2, but it is likely that this is the case. Association of MBD3 with histone deacetylases suggests similar behaviour to that observed for MBD2 and the MeCP1 complex.

Tatematsu et al. [37] have recently reported that MBD2 can form heterodimers with MBD3. Interestingly this MBD2–MBD3 complex seems to bind hemimethylated DNA and recruit histone deacetylases as well as DNA methyltransferase protein 1. This association would ensure the stable maintenance of a repressive state of chromatin.

In contrast, MBD4 is the only member of the MBD family known to be involved in a different class of processes. Recently it has been shown that it can efficiently remove thymine or uracil from a mismatched CpG site in vitro. The MBD of MBD4 binds preferentially to m5CpG·TpG mismatches, the primary product of deamination at methyl-CpG, suggesting that this enzyme may function to minimize mutation at methyl-CpG [17].

The physiological relevance of the existence of four different corepressor complexes involving an MBD protein remains unclear. The most straightforward explanation is that each complex is targeted to a different subset of genes. A summary of the different situations that MBDs could interpret is shown in Fig. 3. Differences in affinity constants for each MBD, distinctive requirements of methyl CpG distribution or density (Fig. 3B–D), and the existence of regulatory subunits with DNA-binding or histone-binding activities may well explain differences in the association of each complex with different sequences. For instance, MeCP2 has the character of a structural component of the chromosome that ensures long-term silencing of methylated sequences. MBD2/MeCP1, on the other hand, is released from nuclei by low salt, suggesting that it is not stably complexed with DNA. MeCP2 binds to a single methylated CpG, whereas MeCP1 requires densely methylated DNA. MBD1, in turn, can also affect transcription from unmethylated and hypomethylated promoters (Fig. 3A). The structure of the chromatin (Fig. 3E) may also regulate MBD recognition. The reasons for a specific role for each MBD-containing complex are still to be explored. The use of chromatin immunoprecipitation with specific antibodies to each MBD and immunocytological analysis should provide specific and powerful tools for identifying the set of genes controlled by each MBD-containing complex and thereby an approach to understanding the specific role of each of these complexes.

Figure 3.

MBDs target corepressor complexes to methylated DNA. On the left side are different situations that the MBDs may recognize. Empty circles represent unmethylated CpGs, whereas full circles correspond to their methylated status. (A) corresponds to hypomethylated DNA with occasional hemi-methylated CpGs. (B) corresponds to fully methylated sequence with a low density of CpGs. (C) and (D) are two sequences with a high number of CpGs but with different organizations that may be recognized by different MBDs. (E) includes the structure of the chromatin.


We thank Dr Luis Franco and Dr Luis Aragon-Alcaide for helpful comments on the manuscript.