The MLL1 gene encodes a large protein of 3969 amino acid residues that contains several conserved domains with functions implicated in chromatin-mediated transcriptional regulation  (Fig. 1). Domains include DNA-binding AT hooks, a cysteine-rich CXXC domain with homology to DNA methyltransferases, plant homeodomain (PHD) finger motifs, a bromo domain (BD), a transactivation domain (TAD), a nuclear receptor interaction motif (NR box), a WDR5 interaction or Win motif and a C-terminal SET domain, which is responsible for MLL1’s histone methyltransferase (HMT) activity [6,12,13]. Upon normal expression of the MLL1 gene, the full-length protein is proteolytically processed into two fragments, MLL-N and MLL-C, which associate to form a complex in vivo (Fig. 1A) [14,15]. The mature protein assembles with numerous regulatory proteins into multimolecular complexes important for MLL1’s transcriptional co-activator activity [12,16–21].
Figure 1. Schematic representation showing the domain architecture of the MLL1 protein. (A) The full-length MLL1 protein is rapidly processed by the Taspase 1 enzyme into MLL-N and MLL-C fragments, which reassociate through FYRN and FYRC motifs to form a stable complex. This mature protein then assembles with a number of proteins to form MLL1 complexes in the cell. (B) Known 3D structures of conserved MLL1 domains (colored green in each image). On the top, from left to right is the CXXC domain (PDB code: 2j2s) and the C-terminal SET domain (PDB code: 2w5z). On the bottom, from left to right is the MLL1 TAD domain (green) bound to the CBP : c-Myb complex (orange and blue, respectively; PDB code: 2agh); and the MLL1 Win motif (green) bound to the WDR5 protein (purple; PDB code: 3eg6).
Because of its large size, full-length MLL1 protein has thus far proven refractory to structural analysis. However, the modular nature of MLL1 has allowed structural analysis of some individual domains alone or in complex with functionally relevant ligands (Fig. 1B). Structures that have been determined include the MLL1 CXXC domain , a portion of the MLL1 TAD bound to the KIX domain of the cAMP response element-binding (CREB) binding protein (CBP) , a peptide from the Win motif of MLL1 bound to the WD40 repeat protein, WDR5 [24,25] and the C-terminal SET domain in the presence and absence of histone peptides and the cofactor product, S-adenosyl-homocysteine (AdoHcy)  (Fig. 1B). These structures provide clues as to how MLL1 is targeted to MLL1-dependent genes and how MLL1’s enzymatic activity is regulated.
The molecular mechanisms by which the MLL1 protein is recruited to specific target genes are poorly understood. The CXXC domain of MLL1 binds selectively to nonmethyl CpG DNA , and is essential for target gene recognition, transactivation and myeloid transformation in MLL1 fusion proteins . Because the promoters of active genes in vertebrates are generally hypomethylated , the CXXC domain of MLL1 may play a role in targeting MLL1 to active genes. To identify the molecular basis of DNA recognition by the MLL1 CXXC domain, Allen et al.  determined the solution structure of the MLL1 CXXC domain consisting of amino acid residues 1146–1214, and used chemical-shift mapping and site-directed mutagenesis to identify residues involved in DNA recognition. The overall structure adopts an extended crescent-like shape that coordinates two zinc ions using the two conserved CGXCXXC motifs (Fig. 2A). The zinc ions are required for the structural integrity of the protein, as mutation of any of the cysteine residues involved in zinc coordination results in protein unfolding . The structure contains a positively charged surface groove containing a number of residues that were shown using chemical-shift mapping and site-directed mutagenesis to be important for DNA binding (Fig. 2A). The MLL1 CXXC domain binds to unmethylated CpG DNA with a dissociation constant of ∼ 4 μM, as measured by isothermal titration calorimetry , but does not bind to similar DNA containing methyl-CpG dinucleotides, consistent with previous observations [27,28]. These studies suggest a model in which the phospho-backbone of DNA binds to the positively charged groove on the CXXC domain, whereas residues from the extended loop insert into the major groove to interact with the CpG dinucleotide . It is hypothesized that methylation of the CpG prevents the extended loop from interacting with the CpG dinucleotide, resulting in reduced affinity for DNA.
Figure 2. The CXXC and TAD domains of MLL1 help recruit MLL1 to target loci. (A) Transparent surface representation of the MLL1 CXXC domain (purple) determined by heteronuclear NMR spectroscopy (PDB code: 2j2s). A cartoon of the protein backbone is shown with zinc ions represented as spheres. The surfaces of amino acid residues perturbed by DNA binding in chemical shift and mutagenesis experiments are indicated in blue. The location of the extended loop is indicated with an arrow. (B) The CBP–KIX domain : c-Myb binary complex. The CBP–KIX domain is shown in orange and the c-Myb transactivation domain is shown in blue (drawn from PDB code: 1sb0). Positions of E665 and E666 of the CBP–KIX domain, and residues K291 and R294 of the c-Myb transactivation domain are indicated. (C) The CBP–KIX:c-Myb:MLL1 TAD ternary complex (drawn from PDB code: 2AGH). The MLL1 TAD is shown in green and the colors for the CBP–KIX:c-Myb are as in (B). Upon formation of the ternary complex, residues E665 and E666 of the CBP–KIX domain become ordered and interact with the c-Myb transactivation domain (indicated with the arrow).
Although recognition of unmethylated CpG dinucleotides by the CXXC domain of MLL1 likely contributes to MLL1 targeting, as previously noted , several genes that are not regulated by MLL1 also contain unmethylated CpG dinucleotides in their promoters, indicating that other mechanisms contribute to target gene recognition by MLL1. A more recent structure of the TAD of MLL1 bound to the CBP protein describes one such additional mechanism that could also be involved in targeting MLL1 to specific loci.
The CBP protein and its homolog p300 are general transcriptional co-activators that contain histone and transcription factor acetylation activities . In addition, CBP contains a number of protein-binding domains that mediate transcription factor recruitment. The MLL1 TAD interaction with CBP was originally identified in a yeast three-hybrid screen using the CREB–CBP complex as bait , and was shown to be important for MLL1-mediated transcriptional activation . Domain mapping experiments localize MLL1’s interaction to the KIX or CREB-binding domain of CBP . The KIX domain of CBP is a structural platform that is capable of binding several different families of transcriptional activators , and evidence indicates that the KIX domain has the ability to simultaneously interact with at least two different polypeptides in a cooperative manner [31,32]. To identify the molecular basis of cooperative transcription factor binding by CBP, De Guzman et al.  determined the solution structure of a peptide derived from the MLL1 TAD bound to the KIX domain:c-Myb binary complex.
The overall structure of the c-Myb:KIX binary complex resembles a four-helix bundle in which the c-Myb peptide adopts a helical conformation that binds to helices α1 and α3 of KIX (Fig. 2B) . When the MLL1 TAD peptide is added to the binary complex, the TAD peptide adopts a helical conformation in which the conserved residues of MLL1 TAD (residues 2845–2853) bind in a hydrophobic groove on the opposite side of the KIX domain between helices α2 and α3 (Fig. 2C) . No direct interaction between the c-Myb and MLL1 peptides is observed when they are bound to the KIX domain, suggesting that the mechanism of cooperative transcription factor binding is transmitted through subtle conformational changes in the KIX domain . Consistent with allosteric binding, residues of the α3 helix of KIX that are disordered in the binary complex become ordered when MLL1 binds (see arrow in Fig. 2C). This conformational change results in the placement of conserved KIX domain amino acids E665 and E666 into positions for optimal electrostatic interactions with conserved residues R294 and K291 of the c-Myb transactivation domain, respectively. Thermodynamic binding experiments show that interaction of MLL1 with the KIX domain increases CBP’s affinity for the c-Myb transactivation peptide by approximately two-fold .
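To put the approximately two-fold affinity increase in energetic terms, the following minimal sketch converts a fold change in affinity into a coupling free energy using ΔΔG = −RT ln(fold change). The temperature (298.15 K) and the function name are assumptions made for illustration; the experimental temperature is not stated in the text above.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def coupling_free_energy(fold_change: float, temperature_k: float = 298.15) -> float:
    """Return the free-energy change (kcal/mol) for a fold increase in affinity.

    A fold_change of 2 means the dissociation constant is halved,
    so the result is -RT * ln(2). The default temperature is an assumption.
    """
    if fold_change <= 0:
        raise ValueError("fold_change must be positive")
    return -R_KCAL * temperature_k * math.log(fold_change)


# Two-fold cooperative enhancement of c-Myb binding when the MLL1 TAD is bound
ddg = coupling_free_energy(2.0)
print(f"ddG = {ddg:.2f} kcal/mol")
```

Under these assumptions the coupling energy is well under 1 kcal/mol, which fits the description of the allosteric effect as arising from subtle conformational changes in the KIX domain.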
These experiments begin to provide a picture of how the recruitment of MLL1 can increase the binding of other important transcriptional activators that ultimately could result in the synergistic activation of gene transcription. In addition, cooperative transcription factor binding through CBP could provide a mechanism to help MLL1 recognize its target genes. MLL1 recruitment to chromatin results in the methylation of H3K4 by the SET domain of MLL1, an activity that is regulated in part by a core complex of proteins that includes WDR5, RbBP5, Ash2L and DPY-30 [26,34–37]. H3K4 methylation is an epigenetic mark correlated with transcriptionally active forms of chromatin . Several recent investigations have provided structural and functional information that describe how the HMT activity of MLL1 is regulated.
MLL1 contains an evolutionarily conserved SET domain which is found in a number of chromatin-associated proteins with diverse transcriptional activities . The SET domain is a HMT motif named for its presence in Drosophila chromatin regulators Su(var)3-9, E(z), and Trx . SET domain proteins can be classified into several families that differ with respect to substrate specificity, processivity and the presence of associated domains, and include the SUV39, SET1, SET2, E(z), Riz, SMYD and SUV4-20 families . MLL1 belongs to the SET1 family of SET domain proteins, which are found in conserved multisubunit complexes that regulate cellular H3K4 methylation levels [9,41]. Because of the role of H3K4 methylation in diverse cellular processes ranging from stem cell differentiation to metazoan development and cancer, there has been an intense interest in understanding how SET1 family enzymes regulate H3K4 methylation.
To understand the structural basis of H3K4 methylation by the MLL1 SET domain, Southall et al.  determined the X-ray crystal structures of a minimal MLL1 SET domain fragment in complex with its cofactor product AdoHcy in the presence and absence of a peptide mimicking the methylated histone H3 N-terminal tail (Fig. 3). Much like other SET domains for which structures have been determined , the overall structure of the MLL1 SET domain contains two canonical conserved regions, SET-N and SET-C, that are separated by a less conserved insert region (SET-I) (expanded region in Fig. 3A). The MLL1 SET domain is flanked on the C-terminus by a 22-amino acid post-SET domain, which provides several conserved residues that coordinate a zinc atom that is required for enzymatic activity (A Patel & MS Cosgrove, unpublished observation). In the ternary complex, the histone H3 peptide binds in a deep channel that divides a pair of acidic lobes, one of which is composed of residues from the SET-I region and the other of residues from the SET-C and post-SET regions. Lysine 4 of histone H3 is inserted into a channel, at the end of which is the AdoHcy binding site, which is composed of residues from SET-N, SET-C and the post-SET domain (Fig. 3A).
Figure 3. X-Ray crystal structure of the C-terminal MLL1 SET domain bound to AdoHcy (yellow) and histone H3 peptide (purple) (PDB code: 2W5Z). (A) At the top is a schematic representation of the full-length MLL1 protein and blown up is the construct used for crystallization of the MLL1 SET domain (residues 3785–3969). The SET-N, SET-I and SET-C sub-domains are colored in blue, yellow and green, respectively. The post-SET domain is colored in grey, and the N-flanking region is colored white. The positions of histone H3 and AdoHcy are indicated. (B) Crystal packing constrains the MLL1 SET domain into an open conformation. Surface representation of the MLL1 SET domain (grey) shown with a symmetry related molecule in red. The N-terminus of the symmetry related molecule interacts extensively with the SET-I region – constraining the MLL1 SET domain in an open conformation.
In published 3D structures of other SET domain proteins that also contain the canonical post-SET domain [43–46], formation of the ternary complex results in ordering of the post-SET domain, so that the two lobes that flank the peptide-binding site close around the peptide, presumably to exclude solvent from the active site. However, comparison of the binary and ternary complexes of the MLL1 SET domain crystal structures reveals that the two lobes remain in a relatively open conformation, which is not optimal for catalysis . It has been suggested on the basis of this observation that proteins that interact with the SET domain are required to induce the correct conformation of the active site , which is consistent with the poor catalytic activity of the isolated MLL1 SET domain. However, an analysis of crystal packing forces suggests that the SET-I lobe may be constrained in an unnatural conformation in the crystalline state by residues from the N-terminus of a symmetry related molecule (Fig. 3B). It therefore remains to be determined to what extent the observed conformation of the isolated MLL1 SET domain in the crystal structure represents the range of possible conformations that may exist in solution.
Consistent with the conformational change hypothesis, Southall et al.  observed that addition of other components of the MLL1 core complex, namely WDR5, RbBP5, Ash2L and DPY-30, increases H3K4 methylation by ∼ 20-fold compared with that of the isolated MLL1 SET domain. However, the extent to which this 20-fold increase in enzymatic activity is due to a conformational change in the MLL1 SET domain is unclear at present. This is because the construct used to determine the structure of the MLL1 SET domain lacks the evolutionarily conserved Win motif in the region flanking the N-terminus of the SET domain , which has been shown to be essential for the assembly and dimethyltransferase activity of the MLL1 core complex [24,25,36]. In addition, recent work from our laboratory indicates that the non-SET domain components of the MLL1 core complex possess a previously unrecognized H3K4 methyltransferase activity that is independent of the MLL1 SET domain  (see below). It is therefore possible that the increase in H3K4 methylation activity observed by Southall et al.  is due, at least in part, to the independent activities of the MLL1 SET domain and the sub-complex containing WDR5, RbBP5, Ash2L and DPY-30, which do not significantly interact in the absence of the MLL1 Win motif [25,36].
The WD40 repeat protein WDR5 is a conserved component of SET1 family complexes ranging from yeast to humans and has been shown to be important for H3K4 methylation and HOX gene expression in hematopoiesis and development . Recent studies have shown that WDR5 interacts directly with MLL1 or other SET1 family members and functions to bridge interactions between MLL1 and other components of the MLL1 core complex [20,48]. It has also been suggested that WDR5 functions within the MLL1 core complex as a histone-binding module that presents histone H3 for further methylation by MLL1 [47,49]. In an effort to identify the WDR5-binding surface in MLL1, two independent groups mapped the WDR5-binding site in MLL1 to a short six-residue conserved sequence in the N-flanking region of the MLL1 SET domain [25,36]. This sequence, called the Win or WDR5 interaction motif, is highly conserved among metazoan MLL1 orthologs and other SET1 family members . To determine the structural basis for the interaction between MLL1 and WDR5, two groups independently determined high-resolution crystal structures of WDR5 bound to peptides derived from the MLL1 Win motif [24,25]. Surprisingly, the structures reveal that the MLL1 Win motif forms a 3₁₀-helix that binds to the central opening of WDR5, the same site that was previously suggested to bind histone H3 (Fig. 4). Conserved arginine 3765 of the MLL1 Win motif inserts into the central opening and is stabilized by a number of hydrogen bond, cation–Pi and Pi–Pi interactions with conserved residues from WDR5. Consistent with a central role for the MLL1 Win motif in the interaction with WDR5, substitution of arginine 3765 with alanine in MLL1 abolishes the interaction between MLL1 and WDR5 [25,36].
Furthermore, the same amino acid substitution, or a synthetic peptide derived from the MLL1 Win motif abolishes the interaction between MLL1 and the WDR5–RbBP5–Ash2L sub-complex, which also results in loss of the H3K4 dimethylation activity of the MLL1 core complex . These results have led to a model in which the conserved Win motif of MLL1 and other metazoan SET1 family members functions to bind the WDR5 component of the WDR5–RbBP5–Ash2L sub-complex, which is required for the assembly and H3K4 dimethylation activity of the MLL1 core complex . These results also suggest that Win motif peptides or related compounds could have therapeutic value as inhibitors of SET1 family complexes.
Figure 4. X-Ray crystal structure of the MLL1 Win motif peptide in complex with WDR5. At top the domain architecture of full-length MLL1 is shown. The blown up portion shows a cut-away view of the MLL1 Win motif (green) bound to the central opening of WDR5 (PDB code: 3EG6). The position of the conserved Arg 3765 is indicated. On the left, a stick representation is used to show the position of the MLL1 Win motif residues 3762–3770 (green) bound to the central opening of WDR5. MLL1 Win motif residue numbers are indicated.
Binding of the MLL1 Win motif to the central arginine-binding pocket of WDR5 raises questions about the proposed role of WDR5 in binding histone H3, at least while WDR5 is incorporated into the MLL1 core complex. This is because structure–function studies show that histone H3 and MLL1 compete for the same binding site on WDR5 (for a review see ). To reconcile these models, it has been suggested that the WDR5–MLL1 interaction in the MLL1 core complex may be displaced by the mono- or dimethylated H3K4 product of the MLL1 core complex in a potential feedback mechanism . Indeed, it has been demonstrated that H3 peptides that are mono- or dimethylated at H3K4 more efficiently disrupt the interaction between MLL1 and WDR5 than similar peptides that are unmodified or trimethylated at H3K4 . Because WDR5 is required for assembly of the MLL1 core complex [34,36], this model predicts that the mono- and dimethylated forms of H3K4 could potentially regulate assembly of the MLL1 core complex at specific loci . However, this hypothesis is difficult to reconcile with the contrast between the high-affinity interaction of WDR5 with MLL1 (estimated at 120 nM, as measured by analytical ultracentrifugation)  and the relatively weaker binding of the mono- and dimethyl H3K4 peptides to WDR5, for which a broad range of estimated dissociation constants has been reported in solution (∼ 7–115 μM for H3K4me1 and ∼ 5–77 μM for H3K4me2, as measured by isothermal titration calorimetry [50,51]). It remains to be determined if the H3K4me1 and H3K4me2 peptides can displace the WDR5–MLL1 interaction within the context of the holo-MLL1 core complex.
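The affinity mismatch described above can be illustrated with a simple single-site competitive binding model. The sketch below is only a rough estimate under stated assumptions: the Win motif and the H3 peptide compete for one site on WDR5, free concentrations approximate total concentrations, and the free MLL1 concentration (1 μM) is a hypothetical value chosen for illustration. The function names are likewise illustrative. It does not account for MLL1 and WDR5 being held together within the assembled core complex.

```python
def fraction_bound_to_mll1(mll1_um, kd_mll1_um, peptide_um, kd_peptide_um):
    """Fraction of WDR5 occupied by MLL1 when an H3 peptide competes for the same site."""
    x = mll1_um / kd_mll1_um        # MLL1 concentration in units of its Kd
    y = peptide_um / kd_peptide_um  # peptide concentration in units of its Kd
    return x / (1.0 + x + y)


def peptide_for_half_displacement(mll1_um, kd_mll1_um, kd_peptide_um):
    """Peptide concentration (uM) that halves MLL1 occupancy relative to no peptide."""
    return kd_peptide_um * (1.0 + mll1_um / kd_mll1_um)


KD_WIN_UM = 0.12     # 120 nM WDR5-MLL1 estimate quoted in the text
MLL1_FREE_UM = 1.0   # hypothetical free MLL1 concentration (assumption)

# Reported ranges of peptide dissociation constants (uM) quoted in the text
peptide_kds = {"H3K4me1": (7.0, 115.0), "H3K4me2": (5.0, 77.0)}

for name, (kd_low, kd_high) in peptide_kds.items():
    low = peptide_for_half_displacement(MLL1_FREE_UM, KD_WIN_UM, kd_low)
    high = peptide_for_half_displacement(MLL1_FREE_UM, KD_WIN_UM, kd_high)
    print(f"{name}: {low:.0f} to {high:.0f} uM peptide needed to halve MLL1 occupancy")
```

In this model the required peptide concentration scales as the peptide dissociation constant multiplied by (1 + [MLL1]/Kd), so a tighter WDR5–MLL1 interaction raises the amount of methylated peptide needed for displacement.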
Mechanism of multiple lysine methylation by MLL1
SET domain enzymes differ in their ability to add one, two or three methyl groups to the epsilon amino group of a lysine side chain, a phenomenon that has been termed ‘product specificity’ . Structure–function studies have demonstrated that product specificity of SET domain enzymes is determined by the presence of a phenylalanine or tyrosine at a key position in the SET domain active site, called the Phe/Tyr switch position [44,52–55]. Enzymes with a Phe at the switch position have a relatively larger active site volume that can accommodate the addition of more than one methyl group to the lysine side chain. By contrast, SET domain enzymes with a tyrosine at the switch position have a relatively smaller active site volume and are predominantly monomethyltransferases. Although site-directed mutagenesis studies have validated the Phe/Tyr switch hypothesis for a number of SET domain enzymes [44,52], SET1 family enzymes appear to contradict this rule . This is because SET1 family enzymes are predicted to be monomethyltransferases based on the presence of a conserved tyrosine at the Phe/Tyr switch position. However, mono-, di- and trimethylation activities have been attributed to SET1 family complexes in vivo and in vitro . To resolve this paradox, it has been proposed that the product specificity of SET1 family enzymes is regulated by proteins that bind to and alter the conformation of the SET domain active site [26,56].
To test the conformational change hypothesis, we have developed an in vitro system to examine the enzymatic activity and product specificity of the MLL1 SET domain in the presence and absence of MLL1-interacting proteins WDR5, RbBP5, Ash2L and DPY-30 . This analysis reveals that the isolated MLL1 SET domain is a relatively slow H3K4 monomethyltransferase, which is consistent with the predictions of the Phe/Tyr switch hypothesis . Substitution of Tyr 3942 with phenylalanine in MLL1 converts MLL1 into a mono-, di- and trimethyltransferase , suggesting that Tyr 3942 largely limits the product specificity of wild-type MLL1 to that of a monomethyltransferase. By contrast, when WDR5, RbBP5, Ash2L and DPY-30 are added to the MLL1 SET domain, enzymatic activity increases ∼ 600-fold, but only to the dimethyl form of histone H3 , suggesting that the product specificity of the MLL1 core complex is that of a dimethyltransferase. Contrary to expectations, kinetic experiments suggest that the mechanism of multiple lysine methylation is distinct from that expected from a conformational change in the SET domain active site . To test the alternative hypothesis that one of the other components of the MLL1 core complex catalyzes dimethylation of H3K4, we assembled the MLL1 core complex with a catalytically inactive MLL1 SET domain variant, and discovered that the non-SET domain components of the MLL1 core complex possess a previously unrecognized HMT activity that catalyzes H3K4 dimethylation within the MLL1 core complex . In addition, it was shown that the non-SET domain components of the MLL1 core complex [WDR5, RbBP5, Ash2L and DPY-30 (WRAD)] possess an H3K4 monomethyltransferase activity in the absence of the MLL1 SET domain . Because the WRAD components lack homology to a conserved SET or DOT1-like methyltransferase fold (Fig. 5A), WRAD represents a new class of non-SET domain HMTs.
These results suggest that the mechanism of multiple lysine methylation by the MLL1 core complex involves the sequential addition of two methyl groups at two distinct active sites within the same complex (Fig. 5B).
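The qualitative behavior expected from a sequential two-site mechanism can be sketched as two consecutive irreversible first-order steps (unmodified → monomethyl → dimethyl H3K4). This is a simplified illustration, not the kinetic analysis used in the cited work: the rate constants and names below are hypothetical, and the transfer of the monomethylated peptide between active sites is not modeled explicitly.

```python
import math


def methylation_fractions(t, k1, k2):
    """Fractions of unmodified, mono- and dimethylated H3K4 at time t.

    Models me0 -> me1 -> me2 as two irreversible first-order steps with
    rate constants k1 (first methylation) and k2 (second methylation).
    """
    me0 = math.exp(-k1 * t)
    if math.isclose(k1, k2):
        # Limiting form of the intermediate when both rate constants are equal
        me1 = k1 * t * math.exp(-k1 * t)
    else:
        me1 = k1 / (k2 - k1) * (math.exp(-k1 * t) - math.exp(-k2 * t))
    me2 = 1.0 - me0 - me1
    return me0, me1, me2


K_FIRST = 0.5   # hypothetical rate constant for monomethylation (per min)
K_SECOND = 0.2  # hypothetical rate constant for dimethylation (per min)

for t in (0, 2, 5, 10, 30):
    me0, me1, me2 = methylation_fractions(t, K_FIRST, K_SECOND)
    print(f"t={t:>2} min  me0={me0:.2f}  me1={me1:.2f}  me2={me2:.2f}")
```

In such a scheme the monomethylated species accumulates transiently before being converted to the dimethylated product, which is the signature that distinguishes a sequential mechanism from a single processive active site.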
Figure 5. New model for the mechanism of multiple lysine methylation by the MLL1 core complex (adapted from Patel ). (A) The MLL1 core complex is composed of two distinct H3K4 methyltransferases each possessing H3K4 monomethylation activity on their own. The dashed oval on the WDR5–RbBP5–Ash2L–DPY-30 sub-complex indicates that the catalytic motif is currently unknown, and may be shared between subunits. (B) WDR5’s recognition of the MLL1 Win motif results in the assembly of the MLL1 core complex, which possesses H3K4 dimethyltransferase activity. We suggest that the MLL1 SET domain catalyzes monomethylation of histone H3 at lysine 4, which is followed by transfer of the monomethylated histone H3 to a second active site on the WRAD sub-complex, where H3K4 dimethylation occurs. We propose that mechanisms that control the assembly of the MLL1 core complex will be important for the regulation of H3K4 methylation states in the cell.
The lack of H3K4 trimethylation by the in vitro assembled MLL1 core complex is surprising . This observation is in contrast to previous results suggesting that an insect cell immunoprecipitated complex containing MLL1, WDR5, RbBP5 and Ash2L represents the minimal complex required for H3K4 trimethylation activity [34,37]. A possible reason for this discrepancy could be the different assays used to quantitate the degree of H3K4 methylation in enzymatic reactions . In previous investigations [34,37], the degree of H3K4 methylation was monitored with methylation-state-specific antibodies, which can sometimes provide misleading results because of antibody cross-reactivity . Indeed, we and others  have observed significant cross-reactivity of α-H3K4me3 antibodies with H3K4me2 epitopes in enzymatic assays. By contrast, in the investigation by Patel et al. , MALDI-TOF MS was used to quantitate the degree of H3K4 methylation, and it showed an accumulation of the dimethyl form of H3K4 with little evidence for H3K4 trimethylation under the assayed conditions. These results suggest that an additional unidentified protein or post-translational modification may be required for H3K4 trimethylation by the MLL1 core complex . The possibility that an additional enzyme is required for H3K4 trimethylation is strengthened by the existence of a SET domain enzyme [PRDM9 (Meisetz)] that can trimethylate H3K4, but not mono- or dimethylate H3K4 . Further experimentation with more quantitative techniques to assess the degree of H3K4 methylation will be required to understand how H3K4 trimethylation is regulated by the MLL1 core complex.