Calmodulins and related potential calcium sensors of Arabidopsis



Author for correspondence: Janet Braam Tel: +1 713 348 5287 Fax: +1 713 348 5157 Email: braam@bioc.rice.edu

Summary

  • Calcium (Ca2+) signaling is thought to orchestrate responses to cellular stimuli. The efficacy of Ca2+ signaling requires mediation by Ca2+-binding proteins.
  • The determination of the Arabidopsis genome sequence enables the identification of genes encoding potential Ca2+ sensors.
  • Six Arabidopsis loci are defined as calmodulin (CAM) genes. Fifty additional genes are CAM-like (CML) genes. They encode proteins that are composed mostly of EF-hand Ca2+-binding motifs, have no other identifiable functional domains, and are at least 16% identical to CaM. The number and structural diversity of the EF hands are evaluated. Intron/exon boundaries, phylogenetic tree and chromosomal distribution data for the CAMs and CMLs are presented.
  • Arabidopsis has six CAM genes, which encode only three isoforms. Maintenance of these genes suggests that they are unlikely to be fully redundant in function. Furthermore, the repeated EF hand motif is incorporated into at least 50 additional loci. The CaM relatives have altered EF hand number, organization, and predicted functional capacity. Additional structural differences and expression behaviors also indicate that the CML family has likely evolved roles distinct from those of the CAMs.

Introduction

Calcium (Ca2+) at extracellular levels (∼10⁻³ M) is a predicament for cells because it is toxic to their phosphate-based energy system. As a consequence, Ca2+ is actively pumped from the cytosol into extracellular spaces or intracellular compartments, such as the endoplasmic reticulum and vacuole. This keeps cytosolic Ca2+ at low (∼10⁻⁷ M) resting levels. Remarkably, evolution has turned this deadly ion into an essential signaling molecule. Cells capitalize on the steep Ca2+ gradient created by removing Ca2+ from the cytosol: gating of Ca2+ channels can cause rapid and dramatic (10- to 100-fold) increases in local intracellular [Ca2+] ([Ca2+]i). Because extracellular events are linked to Ca2+ influx, [Ca2+]i changes can act as second messengers that trigger physiological changes in response to external stimuli (Clapham, 1995).
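
The size of this gradient can be checked with simple arithmetic. The Python sketch below uses only the approximate concentrations quoted above (∼10⁻³ M outside the cell, ∼10⁻⁷ M at rest) and the 10- to 100-fold rise. The constant and function names are illustrative and are not taken from any published tool.

```python
# Approximate Ca2+ concentrations quoted in the text (molar).
CA_EXTRACELLULAR = 1e-3
CA_CYTOSOL_REST = 1e-7


def fold_difference(high: float, low: float) -> float:
    """How many times larger `high` is than `low`."""
    return high / low


def stimulated_level(rest: float, fold_increase: float) -> float:
    """Cytosolic [Ca2+] after channel gating raises it by `fold_increase`."""
    return rest * fold_increase


if __name__ == "__main__":
    gradient = fold_difference(CA_EXTRACELLULAR, CA_CYTOSOL_REST)
    print(f"resting gradient: ~{gradient:.0f}-fold")
    for fold in (10, 100):
        peak = stimulated_level(CA_CYTOSOL_REST, fold)
        remaining = fold_difference(CA_EXTRACELLULAR, peak)
        print(f"{fold}-fold rise: [Ca2+]i ~{peak:.0e} M, still ~{remaining:.0f}-fold below outside")
```

At rest the gradient is about 10,000-fold. Even after a 100-fold rise, cytosolic Ca2+ (about 10⁻⁵ M) is still roughly 100-fold below the extracellular level, so the inward gradient persists during a strong signal.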

In part because of the energy efficiency of this built-in signal source, Ca2+ may be one of the most widely used second messengers in eukaryotic cells. Undoubtedly, Ca2+ signaling has numerous, diverse and essential functions in plants, as it does in animals (Berridge et al., 1998). Ca2+ signals are generated, modified, propagated and perceived in part through the action of proteins that bind Ca2+ (Roberts & Harmon, 1992; Vogel, 1994; Ehlers & Augustine, 1999). The quintessential eukaryotic Ca2+-binding protein is calmodulin (CaM). The central role of CaM in eukaryotic biology is reflected in its conservation (van Eldik & Watterson, 1998). For example, all known vertebrate CaMs are identical in amino acid sequence and share 91% amino acid identity with plant CaMs.

CaM is an unusual protein because it harbors no intrinsic activities of its own. It is a 148 amino acid protein with four repeating units, called EF hands, and each EF hand binds a single Ca2+ ion (Strynadka & James, 1989). Cooperative binding among these sites enables CaM to act in clean on/off states (Strynadka & James, 1989; Nelson & Chazin, 1998b). This characteristic benefits cells and organisms that live in environments capable of rapid change. Ca2+ binding alters CaM structure (Babu et al., 1985; Wriggers et al., 1998). The structural changes expose hydrophobic surfaces that interact with target proteins and alter their activities in a Ca2+-dependent manner. CaM is therefore an exceptional protein, theoretically simple in form and biochemical function. Yet because it can mediate Ca2+-dependent regulation of multiple targets (Roberts & Harmon, 1992; Ohya & Botstein, 1994; Vogel, 1994; Snedden & Fromm, 2001), CaM can affect diverse cellular pathways. Furthermore, in addition to the highly conserved CaM, organisms also harbor CaM-like proteins. These share the EF hand structure of CaM but differ in ways that likely affect function, such as target specificity, subcellular localization and Ca2+ affinities (Roberts & Harmon, 1992; Zielinski, 1998; Braunewell & Gundelfinger, 1999; Haeseleer et al., 2002; Luan et al., 2002; Zielinski, 2002). These CaM-like proteins may have evolved to contribute to the diverse roles of Ca2+ signaling.

The full sequencing of the Arabidopsis genome reveals a striking complexity of CAM-like genes (The Arabidopsis Genome Initiative, 2000). Recently, 250 EF-hand encoding genes have been identified in the Arabidopsis genome and grouped into 6 classes (Day et al., 2002). Here, we further characterize Group IV and V members, which we define as typical CAMs and CaM-like (CML) genes because they encode primarily EF hands. Analyses and comparisons of protein relatedness, Ca2+ binding potential, gene structures, chromosomal locations, and expression characteristics shed light on the evolutionary relationships among these CAM and CML genes and the potential functions of the encoded proteins. Genomic analyses provide the foundation for further studies aimed at defining the biochemical and physiological functions of the gene products.

Materials and Methods

Construction of alignments and trees

Sequences of CML proteins were downloaded from The Arabidopsis Information Resource (TAIR) (http://www.arabidopsis.org) and subjected to phylogenetic analysis. Alignments were constructed using the multiple sequence alignment mode of ClustalX (Thompson et al., 1997). Alignments were then viewed using SeqVu1.0.1 (Garvan Institute of Medical Research, Sydney, Australia) and shaded at positions where more than 65% of the aligned sequences share an identical residue. Protein trees were constructed using the neighbor-joining method (Saitou & Nei, 1987) as implemented in ClustalX. Bootstrap analysis was performed with 200 replicates of tree building, varying the seed of the random number generator.
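
ClustalX performs the neighbor-joining and bootstrap steps internally. To illustrate what those steps do (this is not ClustalX's code), the Python sketch below applies the Saitou & Nei joining criterion to a precomputed distance matrix and draws one bootstrap replicate by resampling alignment columns with replacement. It assumes the distances are already supplied (for example, 1 minus fractional identity). ClustalX's optional distance corrections and its gap handling are not reproduced, and the function names are hypothetical.

```python
import random
from itertools import combinations


def neighbor_joining(names, dist):
    """Return an unrooted tree in Newick format built by neighbor joining.

    names: taxon labels; dist: {(a, b): distance} for every unordered pair.
    """
    d = {frozenset(pair): value for pair, value in dist.items()}
    nodes = list(names)
    while len(nodes) > 2:
        n = len(nodes)
        # Net divergence of each node from all other current nodes.
        r = {i: sum(d[frozenset((i, k))] for k in nodes if k != i) for i in nodes}
        # Join the pair minimising Q(i, j) = (n - 2) d(i, j) - r(i) - r(j).
        i, j = min(combinations(nodes, 2),
                   key=lambda p: (n - 2) * d[frozenset(p)] - r[p[0]] - r[p[1]])
        dij = d[frozenset((i, j))]
        li = 0.5 * dij + (r[i] - r[j]) / (2 * (n - 2))  # branch length to i
        lj = dij - li                                   # branch length to j
        u = f"({i}:{li:.3f},{j}:{lj:.3f})"
        # Distances from the new internal node to every remaining node.
        for k in nodes:
            if k not in (i, j):
                d[frozenset((u, k))] = 0.5 * (d[frozenset((i, k))] + d[frozenset((j, k))] - dij)
        nodes = [k for k in nodes if k not in (i, j)] + [u]
    a, b = nodes
    return f"({a},{b}:{d[frozenset((a, b))]:.3f});"


def bootstrap_replicate(alignment, rng):
    """Resample alignment columns with replacement to make one bootstrap replicate."""
    length = len(next(iter(alignment.values())))
    columns = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[c] for c in columns) for name, seq in alignment.items()}


if __name__ == "__main__":
    toy = {("A", "B"): 5, ("A", "C"): 9, ("A", "D"): 9,
           ("B", "C"): 10, ("B", "D"): 10, ("C", "D"): 8}
    print(neighbor_joining(["A", "B", "C", "D"], toy))
    print(bootstrap_replicate({"x": "ACDEFG", "y": "ACDEFH"}, random.Random(1)))
```

For a 200-replicate analysis like the one described above, each replicate alignment would be converted to distances and rebuilt into a tree. The support for each group is then the fraction of replicate trees that contain it.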

Determination of amino acid percent identity among proteins

To determine the percentage identity between pairs of proteins, an alignment was performed in ClustalX (Thompson et al., 1997) independently of other CaM or CML sequences. The number of identical residues throughout the alignment was summed and divided by the total number of amino acids in the shorter of the proteins being compared. This value was expressed as a percentage. This method emphasizes the total percentage identity between two proteins.
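
The identity measure described above can be sketched as follows. The sketch assumes the pairwise alignment has already been produced (for example by ClustalX) and is supplied as two equal-length strings with '-' marking gaps. The function name and the example alignment are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Identical aligned residues divided by the ungapped length of the shorter protein."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    identical = sum(1 for x, y in zip(aligned_a, aligned_b) if x == y and x != gap)
    shorter = min(len(aligned_a.replace(gap, "")), len(aligned_b.replace(gap, "")))
    return 100.0 * identical / shorter


if __name__ == "__main__":
    # Hypothetical pairwise alignment: 8 identical positions, shorter protein has 9 residues.
    print(f"{percent_identity('MKTAYIA-KQ', 'MKTGYIAEKQ'):.1f}%")
```

Because the denominator is the shorter protein, a short protein whose residues all match part of a longer one scores 100%. This is why the value is best read as overall identity rather than as coverage of both sequences.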

Determination of CAM/CML gene structure and nucleotide percent identity among CAM coding sequences

BAC clone ID number, genomic nucleotide sequences, identified strand of DNA used for transcription, and predicted intron/exon boundaries were determined by searching the Locus History available at the TAIR (http://www.arabidopsis.org). The obtained predicted intron/exon boundaries were used to construct scaled models of each open reading frame illustrating locations of introns and EF hand encoding nucleotides. Prior to phylogenetic analysis of CAM sequences, intron sequences were manually removed as indicated by the predicted intron splice sites from the GenBank database (http://www.ncbi.nlm.nih.gov) and TAIR (http://www.arabidopsis.org). CAM coding sequences were aligned in ClustalX (Thompson et al., 1997). The number of identical nucleotides was summed, divided by the total number of nucleotides, and expressed as a percentage.
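
The two steps described above, removing introns at predicted splice sites and then scoring nucleotide identity, can be sketched in Python. This sketch assumes that exon coordinates are 1-based and inclusive, that the genomic sequence is already on the coding strand (Crick-strand genes would first need reverse complementing, which is not shown), and that the coding sequences are aligned without gaps. All names and sequences are hypothetical.

```python
def splice(genomic: str, exons) -> str:
    """Join exons given as 1-based, inclusive (start, end) coordinates on the coding strand."""
    return "".join(genomic[start - 1:end] for start, end in sorted(exons))


def nucleotide_identity(cds_a: str, cds_b: str) -> float:
    """Percent identical positions between two aligned, equal-length coding sequences."""
    if len(cds_a) != len(cds_b):
        raise ValueError("coding sequences must be aligned to the same length")
    same = sum(a == b for a, b in zip(cds_a.upper(), cds_b.upper()))
    return 100.0 * same / len(cds_a)


if __name__ == "__main__":
    # Hypothetical gene: exon 1, a 12-nt intron (lower case), exon 2.
    genomic = "ATGGCT" + "gtaagtttgcag" + "GATCAA"
    cds = splice(genomic, [(1, 6), (19, 24)])
    print(cds, f"{nucleotide_identity(cds, 'ATGGCGGACCAA'):.1f}%")
```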

Chromosomal distribution and segmental duplications

Approximate gene locations were determined by searching for the appropriate open reading frames at TAIR (http://www.arabidopsis.org). Approximate locations of segmental duplications were estimated by scaling the map of segmental duplications (The Arabidopsis Genome Initiative, 2000) to match the map of open reading frame locations. Inclusion of a CML within a segmental duplication was verified by comparing open reading frame names with a comprehensive list of segmentally duplicated regions (The Arabidopsis Genome Initiative, 2000).
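
The scaling and verification steps described above amount to a linear rescaling followed by two simple checks. The Python sketch below is a minimal illustration under stated assumptions: both maps are treated as linear along each chromosome, and the coordinates, block boundaries and ORF list are hypothetical placeholders, not values from the published duplication map.

```python
def rescale(position: float, source_length: float, target_length: float) -> float:
    """Linearly map a coordinate from one chromosome drawing onto another of different length."""
    return position * target_length / source_length


def in_duplicated_block(position: float, blocks) -> bool:
    """True if a (rescaled) gene position falls inside any (start, end) duplicated block."""
    return any(start <= position <= end for start, end in blocks)


def listed_as_duplicated(orf: str, duplicated_orfs) -> bool:
    """Name-based confirmation against a list of ORFs in duplicated regions (case-insensitive)."""
    return orf.upper() in {name.upper() for name in duplicated_orfs}


if __name__ == "__main__":
    # Hypothetical: a block drawn at 50 units on a 200-unit figure of a 30 Mb chromosome.
    pos = rescale(50, 200, 30_000_000)
    print(pos, in_duplicated_block(pos, [(7_000_000, 8_000_000)]))
    print(listed_as_duplicated("At5g37780", ["AT5G37780", "AT1G66410"]))
```

The position-based check only gives an approximate placement. The name-based lookup corresponds to the verification step against the published list of duplicated regions.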

Expressed Sequence Tags

ESTs corresponding to CAM/CML genes were identified by performing a Locus History search at TAIR (http://www.arabidopsis.org) and by searching at The Institute for Genomic Research Arabidopsis Gene Index (http://www.tigr.org/tdb/tgi/agi/). Characteristics of CAM/CML expression were determined based on the types of libraries from which ESTs were derived.
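
Inferring expression characteristics from EST origins amounts to tallying ESTs per gene by library type. The sketch below assumes the search results have already been reduced to (gene, library type) pairs. The records shown are hypothetical placeholders, not EST data from TAIR or TIGR.

```python
from collections import Counter, defaultdict


def library_profile(est_records):
    """Tally ESTs per gene by the type of cDNA library they came from.

    est_records: iterable of (gene, library_type) pairs.
    """
    profile = defaultdict(Counter)
    for gene, library_type in est_records:
        profile[gene][library_type] += 1
    return profile


if __name__ == "__main__":
    # Hypothetical records for illustration only.
    records = [("CML24", "root"), ("CML24", "flower"), ("CML24", "root"), ("CML12", "silique")]
    for gene, counts in library_profile(records).items():
        print(gene, sum(counts.values()), dict(counts))
```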

Results and Discussion

Defining true Arabidopsis CAMs

There are six typical CAMs in Arabidopsis. To date, there has been confusion in the literature regarding gene names and corresponding protein identities. For example, CAM6 has been used to name a cDNA for which no corresponding genomic locus has yet been identified. CAM5 has been mistakenly used as an alternative name for CAM2 (first reported as TCH1), CAM4 has been used as the name for two different genes, and CaM8–CaM14 have been used as names for proteins that do not meet strict criteria for being true CaMs (Chandra & Upadhyaya, 1993; Gawienowski et al., 1993; Ito et al., 1995; Luan et al., 2002). To reconcile these discrepancies, Tables 1 and 2 link the gene identification numbers to the CAM nomenclature. Part of the naming confusion is likely attributable to the high degree of nucleotide identity among the genes, and to the fact that the six distinct genomic loci encode only three distinct protein isoforms. CAM2, CAM3 and CAM5 encode identical gene products, and CaM7 differs from these by one amino acid. CaM1 and CaM4 are identical to each other and differ from CaM7 by four amino acids.

Table 1.  Characteristics of CaM/CML proteins
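Open reading frame name | CaM/CML name | Group number defined by tree | Number of amino acids | Number of EF hands | Percentage methionine | Presence of cysteine 27 | Presence of lysine 115 | Potential myristoylation site | % amino acid identity to CaM2
At5g37780 | CaM1 | 1 | 149 | 4 | 6.0% | + | + |  | 96.6%
At2g41110 | CaM2 | 1 | 149 | 4 | 6.0% | + | + |  | 100.0%
At3g56800 | CaM3 | 1 | 149 | 4 | 6.0% | + | + |  | 100.0%
At1g66410 | CaM4 | 1 | 149 | 4 | 6.0% | + | + |  | 96.6%
At2g27030 | CaM5 | 1 | 149 | 4 | 6.0% | + | + |  | 100.0%
At3g43810 | CaM7 | 1 | 149 | 4 | 6.0% | + | + |  | 99.3%
At3g59450 | CML1 | 6 | 148 | 1 | 2.7% |  |  |  | 21.6%
At4g12860 | CML2 | 6 | 152 | 4 | 8.6% |  | + |  | 38.2%
At3g07490 | CML3 | 6 | 153 | 4 | 5.9% |  | + |  | 39.6%
At3g59440 | CML4 | 6 | 195 | 4 | 5.6% |  | + |  | 39.6%
At2g43290 | CML5 | 6 | 215 | 4 | 4.7% |  | + |  | 41.6%
At4g03290 | CML6 | 6 | 154 | 4 | 5.8% |  | + |  | 44.2%
At1g05990 | CML7 | 6 | 150 | 4 | 5.3% |  | + | + | 44.2%
At4g14640 | CML8 | 2 | 151 | 4 | 4.0% | + | + |  | 73.2%
At3g51920 | CML9 | 2 | 151 | 4 | 7.9% |  | + |  | 49.6%
At2g41090 | CML10 | 2 | 191 | 4 | 3.7% |  | + |  | 65.1%
At3g22930 | CML11 | 2 | 173 | 4 | 5.2% | + | + |  | 74.5%
At2g41100 | CML12 | 2 | 324 | 6 | 4.9% |  |  |  | 62.4%
At1g12310 | CML13 | 3 | 148 | 3 | 4.0% |  |  |  | 50.3%
At1g62820 | CML14 | 3 | 148 | 3 | 4.1% |  | + |  | 50.0%
At1g18530 | CML15 | 4 | 157 | 4 | 4.5% |  |  |  | 39.6%
At3g25600 | CML16 | 4 | 161 | 4 | 5.0% |  |  |  | 39.5%
At1g32250 | CML17 | 4 | 166 | 4 | 1.8% |  |  |  | 43.6%
At3g03000 | CML18 | 4 | 165 | 4 | 2.4% |  |  |  | 42.9%
At4g37010 | CML19 | 5 | 167 | 4 | 6.6% |  |  |  | 42.3%
At3g50360 | CML20 | 5 | 169 | 4 | 7.1% |  |  |  | 45.0%
At4g26470 | CML21 | 5 | 248 | 4 | 3.6% |  |  | + | 27.5%
At3g24110 | CML22 | 5 | 229 | 4 | 3.9% |  |  |  | 24.1%
At1g66400 | CML23 | 8 | 157 | 4 | 5.7% |  | + |  | 40.9%
At5g37770 | CML24 | 8 | 161 | 4 | 5.0% |  | + | + | 40.3%
At1g24620 | CML25 | 8 | 186 | 4 | 4.3% |  |  | + | 43.6%
At1g73630 | CML26 | 8 | 163 | 4 | 4.3% |  |  |  | 38.2%
At1g18210 | CML27 | 8 | 170 | 4 | 4.1% |  |  |  | 39.6%
At3g03430 | CML28 | 8 | 83 | 2 | 4.8% |  |  |  | 34.9%
At5g17480 | CML29 | 8 | 83 | 2 | 4.8% |  |  |  | 32.5%
At2g15680 | CML30 | 8 | 187 | 4 | 4.8% |  |  |  | 34.9%
At2g36180 | CML31 | 8 | 144 | 4 | 6.3% |  | + |  | 37.5%
At5g17470 | CML32 | 8 | 146 | 4 | 4.8% |  | + |  | 32.9%
At3g03400 | CML33 | 8 | 137 | 3 | 4.4% |  | + |  | 36.5%
At3g03410 | CML34 | 8 | 131 | 4 | 3.1% |  | + |  | 35.9%
At2g41410 | CML35 | 9 | 216 | 4 | 3.2% |  |  |  | 34.2%
At3g10190 | CML36 | 9 | 209 | 4 | 2.4% |  |  |  | 36.9%
At5g42380 | CML37 | 6 | 185 | 3 | 4.3% |  |  |  | 34.2%
At1g76650 | CML38 | 6 | 177 | 3 | 5.6% |  |  |  | 28.8%
At1g76640 | CML39 | 6 | 159 | 4 | 8.2% |  |  |  | 26.1%
At3g01830 | CML40 | 6 | 146 | 2 | 3.4% |  |  |  | 23.2%
At3g50770 | CML41 | 6 | 205 | 4 | 3.4% |  |  |  | 36.2%
At4g20780 | CML42 | 7 | 191 | 3 | 2.6% |  |  |  | 34.9%
At5g44460 | CML43 | 7 | 181 | 3 | 2.2% |  |  |  | 33.6%
At1g21550 | CML44 | 7 | 155 | 3 | 2.6% |  |  |  | 29.5%
At3g29000 | CML45 | 7 | 194 | 2 | 2.6% |  | + |  | 29.5%
At5g39670 | CML46 | 7 | 204 | 2 | 2.9% |  | + |  | 28.9%
At3g47480 | CML47 | 7 | 183 | 2 | 3.8% |  |  |  | 30.2%
At2g27480 | CML48 | 7 | 186 | 2 | 2.7% |  |  |  | 16.1%
At3g10300 | CML49 | 7 | 330 | 2 | 0.9% |  |  |  | 22.8%
At5g04170 | CML50 | 7 | 315 | 2 | 1.3% |  |  |  | 22.8%
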
Table 2.  Characteristics of CAM/CML genes
Open reading frame nameCML nameOther nameLiterature referenceNumber of ESTs identifiedTotal number of nucleotideStrand of DNA: Watson (W) or Crick (C)BAC clone ID number
At5g37780 CAM141101348CK22F20.20
At2g41110 TCH1/CAM241,4216  813WT3K9.12
At3g56800 CAM34017  939WT20P8.8
At1g66410 CAM420241354CT27F4.1
At2g27030 CAM520  5  991CT8M16.130
At3g43810 CAM719231736CT28A8.100
At3g59450CML1    02083WF25L23.310
At4g12860CML2    0  458CT20K18.210
At3g07490CML3    0  461WF21O3.20
At3g59440CML4    2  587WF25L23.300
At2g43290CML5  12  647CT1O24.3
At4g03290CML6    2  464WF4C21.22
At1g05990CML7    1  452WT21E18.4
At4g14640CML8CAM817  22196WDL336OW
At3g51920CML9CAM917  61136CF4F15.30
At2g41090CML10CABP22/CAM103910  795WT3K9.14
At3g22930CML11CAM1116  31549CF5N5.10
At2g41100CML12TCH3/CAM1242191275WT3K9.13
At1g12310CML13CAM131616  447CF5011.35
At1g62820CML14CAM1416  21801CF23N19.25
At1g18530CML15    0  473WF25I16.13
At3g25600CML16    2  485WT5M7.5
At1g32250CML17    0  500WF5D14.1
At3g03000CML18    5  497WF13E7.5
At4g37010CML19    7  959WAP22.11
At3g50360CML20Centrin51  11081WF11C1.200
At4g26470CML21    3  966WM3E9
At3g24110CML22    0  984CMUJ8.1
At1g66400CML23    6  473CT27F4.15
At5g37770CML24TCH242  9  485CK22F20.10
At1g24620CML25    0  560CF21J9.36
At1g73630CML26    2  491WF25P22.4
At1g18210CML27  18  512CT10O22.19
At3g03430CML28    0  252WT21P5.15
At5g17480CML29APC152  0  251WK3M16.50
At2g15680CML30    1  563WF9O13.23
At2g36180CML31    0  434CF9C22.11
At5g17470CML32    0  440CK3M16.40
At3g03400CML33    0  413CT21P5.18
At3g03410CML34    0  395CT21P5.17
At2g41410CML35PM1295326  650CF13H10.4
At3g10190CML36    2  629WF14P13.21
At5g42380CML37    7  557CMDH9.7
At1g76650CML38    4  533CF28O16.2
At1g76640CML39    0  479CF28O16.1
At3g01830CML40    4  440WF28J7.16
At3g50770CML41    4  617WF18B3.50
At4g20780CML42    1  575CF21C20.130
At5g44460CML43    1  545WMFC16.12
At1g21550CML44    5  467CF24J8.15
At3g29000CML45    1  584WK5K13.13
At5g39670CML46  10  614WM1J24.17
At3g47480CML47    1  551CF1P2.30
At2g27480CML48    3  933WF10A12.16
At3g10300CML49  121639WF14P13.10
At5g04170CML50   141842WF21E1.90

Primary sequence comparisons among species (Fig. 1) lead to the prediction that the Arabidopsis CaM isoforms function as typical CaMs. The EF hands have the canonical 12-residue Ca2+ binding loop (Fig. 1). Ca2+ is bound in a pentagonal bipyramidal geometry with seven sites of coordination occurring through interactions with six amino acids, those in positions 1, 3, 5, 7, 9 and 12 (alternatively called X, Y, Z, #, -X and -Z) (Strynadka & James, 1989; Nelson & Chazin, 1998b). All of these amino acids interact with Ca2+ through side chain oxygens, except residue seven, which acts through its main chain oxygen. Chelation by residue nine sometimes occurs indirectly through a hydrogen-bonded water molecule. Thus, there are strong preferences for specific amino acids within the Ca2+-binding loop. The X position is almost exclusively filled with aspartate (D); Y is usually aspartate (D) or asparagine (N); Z is aspartate (D), asparagine (N), or serine (S); the # position tolerates a variety of amino acids; -X also varies, but is usually aspartate (D), asparagine (N), or serine (S); -Z, which contributes two coordination sites, is nearly invariably glutamate (E). Glycine (G) at position 6 is highly conserved and is thought to provide the ability for a sharp turn within the loop. Finally, position 8 is most often isoleucine (I), which can form hydrogen bonds with the other EF loop in a pair. The cysteine (C) residue in position 7 of the first EF hand is common among plant CaMs (Zielinski, 1998), but uncommon in nonplant CaMs.
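
The residue preferences listed above can be turned into a simple screen for noncanonical loops. The Python sketch below is a simplification: it checks only the positions the text describes as strongly constrained (1, 3, 5, 6 and 12). It leaves out the more tolerant positions 7, 8 and 9, and its allowed-residue sets condense the text's "almost exclusively" and "usually" into hard rules. The function name and example loops are illustrative.

```python
# Loop positions (1-based) with the strong preferences described in the text.
PREFERRED_RESIDUES = {
    1: "D",    # X: almost exclusively aspartate
    3: "DN",   # Y
    5: "DNS",  # Z
    6: "G",    # conserved glycine that permits the sharp turn
    12: "E",   # -Z: glutamate contributing two coordination sites
}


def loop_deviations(loop: str):
    """List (position, residue) pairs where a 12-residue Ca2+-binding loop departs
    from the preferred residues; an empty list means the loop looks canonical."""
    if len(loop) != 12:
        raise ValueError("EF-hand Ca2+-binding loops are 12 residues long")
    loop = loop.upper()
    return [(pos, loop[pos - 1])
            for pos, allowed in sorted(PREFERRED_RESIDUES.items())
            if loop[pos - 1] not in allowed]


if __name__ == "__main__":
    # Illustrative loops: canonical with Cys at position 7, E-to-D at 12, and G at 3.
    for loop in ("DKDGDGCITTKE", "DKDGDGCITTKD", "DKGGDGTITTKE"):
        print(loop, loop_deviations(loop) or "canonical")
```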

Figure 1.

CaM isoform similarities among diverse species. The three CaM isoforms encoded by the 6 Arabidopsis thaliana CAM genes are aligned with CaMs predicted from other species’ DNA sequences. Amino acid numberings are indicated at left and right. Note that the initiator methionine (M) is likely removed from the mature protein such that most of the mature proteins are 148 amino acids long. Sequences are positioned such that the helix-loop-helix portions of the first and third and the second and fourth EF hands, respectively, are aligned for comparison. The regions corresponding to the E helices, the Ca2+-binding loops and the F helices are indicated by the black, gray, and black bars, respectively. The consensus sequences for these regions are indicated beneath the relevant sequences. ‘E’ stands for glutamic acid, ‘h’ for hydrophobic amino acid; ‘*’ for any amino acid and ‘X, Y, Z, G, #, I, -X, -Z’ are defined in the text. Amino acid sequence identities are shaded.

The E helix generally starts with a glutamate (E); both the E and F helices flanking the Ca2+-binding loop are generally each 9 amino acids long. There is a regular distribution of hydrophobic amino acids in the E helices with a pattern of ‘h**hh**h’ where ‘h’ represents hydrophobic amino acids and ‘*’ represents any amino acid. The pattern is similar for the F helices of hands 1 and 3, but diverges slightly in hands 2 and 4 (Fig. 1).
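
The 'h**hh**h' pattern can be checked mechanically. The Python sketch below relies on two labelled assumptions that the text does not specify. First, the set of residues counted as hydrophobic is chosen for illustration. Second, the 8-character pattern is applied to the residues immediately following the helix's leading glutamate. The example helices are illustrative inputs.

```python
# Assumed hydrophobic set; the text does not specify which residues count as 'h'.
HYDROPHOBIC = set("AVLIMFWC")


def fits_e_helix_pattern(helix: str, pattern: str = "h**hh**h") -> bool:
    """Test an E helix (starting with its leading E) against the hydrophobic pattern.

    Assumption: the pattern is applied to the residues right after the leading E.
    """
    window = helix.upper()[1:1 + len(pattern)]
    if len(window) < len(pattern):
        return False
    return all(p == "*" or residue in HYDROPHOBIC
               for p, residue in zip(pattern, window))


if __name__ == "__main__":
    for helix in ("EFKEAFSLF", "EKKEAFSLF"):
        print(helix, fits_e_helix_pattern(helix))
```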

Based on the conservation of the consensus EF hand sequence motifs among the three Arabidopsis CaM isoforms, these CaMs are predicted to have Ca2+-binding behaviors similar to those of CaMs that have been extensively characterized (Fig. 1).

Sequence conservation of CaMs, including the three Arabidopsis CaM isoforms, is not restricted to the EF hand structures. Lysine (K) encoded at position 116, between the 3rd and 4th EF hands, is a potential site for trimethylation and is found in all but the yeast CaMs. (In mature CaMs from which the initiator methionine (M) has been removed, this K is at position 115.) Expression of a mutant CaM with an arginine (R) at position 115 results in transgenic tobacco plants with enhanced production of reactive oxygen species (Harding et al., 1997). This indicates a role for this conserved amino acid in normal CaM function. In addition, the three Arabidopsis CaM isoforms are methionine (M)-rich proteins (Table 1). The average methionine content of proteins is 1.4%, whereas CaMs generally have approx. 6% methionine (Rose et al., 1985; Nelson & Chazin, 1998b). The unusual flexibility and polarizability of the methionine side chains are thought to contribute to CaM structure and function in two ways. First, when CaM binds Ca2+, it undergoes structural alterations that generate the so-called open conformation, in which hydrophobic regions become exposed. The properties of the methionine side chains make this configuration more energetically stable and are thought to enable the open CaM structure to adapt to both buried and solvent-exposed environments (Nelson & Chazin, 1998a). Second, methionine residues form strong van der Waals interactions with numerous targets of distinct structural properties (O’Neil & DeGrado, 1990; Vogel & Zhang, 1995).
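
Two small calculations underlie this paragraph: converting residue numbers between precursor and mature numbering, and computing methionine content. Table 1 lists the CaMs as 149 residues (initiator Met included) with 6.0% methionine, which corresponds to nine methionines (9/149 ≈ 6.0%). The Python sketch below uses hypothetical function names and a stand-in sequence rather than a real CaM.

```python
def methionine_percent(protein: str) -> float:
    """Percentage of residues that are methionine."""
    protein = protein.upper()
    return 100.0 * protein.count("M") / len(protein)


def mature_position(position: int, initiator_removed: bool = True) -> int:
    """Convert a residue number counted from the initiator Met to mature-protein numbering."""
    return position - 1 if initiator_removed else position


if __name__ == "__main__":
    print(mature_position(116))  # trimethylation-site lysine in mature numbering
    # Stand-in sequence: 9 Met among 149 residues, matching the Table 1 CaM values.
    print(f"{methionine_percent('M' * 9 + 'A' * 140):.1f}% Met")
```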

Because of this high degree of sequence similarity of the CaM1, CaM2, CaM3, CaM4, CaM5 and CaM7 proteins to known CaMs of other species, we consider these true CaMs. Further experimentation will be required to determine whether the three Arabidopsis isoforms of CaM have distinct functions or regulation.

CaM-like proteins of Arabidopsis

Using the databases (The Arabidopsis Genome Initiative, 2000), we developed a classification of genes whose members we call CaM-likes or CMLs. The 50 CML genes (Tables 1 and 2 and Figs 2 and 3) encode proteins that are composed mostly, if not entirely, of EF hands (like CaM), have no other identifiable functional domains, and share at least 16% overall amino acid identity with CaM. All but one (CML1) have at least two identifiable EF hand motifs.
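
The three classification criteria can be written as a single predicate. In the Python sketch below, "composed mostly of EF hands" is read as at least half of the residues lying within EF-hand motifs. This threshold is an assumption, as are the record fields and the example proteins, all of which are hypothetical. True CaMs would also satisfy these criteria; the text separates them on the basis of near-identity to known CaMs.

```python
from dataclasses import dataclass


@dataclass
class EFProtein:
    name: str
    ef_hand_coverage: float  # fraction of residues within EF-hand motifs (hypothetical measure)
    other_domains: int       # identifiable non-EF-hand functional domains
    identity_to_cam: float   # percent identity to CaM


def is_cml(protein: EFProtein, min_identity: float = 16.0, min_coverage: float = 0.5) -> bool:
    """Apply the three CML criteria; `min_coverage` is an assumed reading of 'composed mostly'."""
    return (protein.ef_hand_coverage >= min_coverage
            and protein.other_domains == 0
            and protein.identity_to_cam >= min_identity)


if __name__ == "__main__":
    candidates = [
        EFProtein("hypothetical EF-only protein", 0.8, 0, 35.0),
        EFProtein("hypothetical kinase with EF hands", 0.3, 1, 30.0),
        EFProtein("hypothetical divergent EF protein", 0.7, 0, 12.0),
    ]
    for c in candidates:
        print(c.name, is_cml(c))
```

The second candidate illustrates why proteins such as CDPKs fall outside the CML class: they carry an additional functional domain.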

Figure 2.

Neighbor-joining tree based on amino acid similarities segregates the CaMs and CMLs into nine groups. The amino acid sequences of the predicted CaMs and CMLs were analyzed as described in the Materials and Methods. The groupings referred to in the text and in Table 1 are indicated at the right. Both the gene identifier number and CaM/CML names are shown. The scale bar ‘0.1’ corresponds to a distance of 0.1 (10% sequence divergence) as calculated by ClustalX (Thompson et al., 1997).

Figure 3.

Amino acid sequence identities among proteins of distinct groups. The amino acid sequences of each member of the CaM group (group 1) and the 8 CML groups (groups 2–9) were compared. The values indicate the percent amino acid sequence identities between each given pair of proteins. The group numbers are shown in the lower left corner of each comparison table.

This CML class does not include a similarly large number of Arabidopsis proteins that have EF hand motifs and additional known and/or potential functional domains, such as Ca2+-dependent protein kinases (CDPKs) (Roberts & Harmon, 1992), calcineurin B-like proteins (CBLs) (Luan et al., 2002), and SUB1 and related SUL1/2 genes (Guo et al., 2001).

As summarized in Table 1, the CMLs are predicted to be relatively small proteins, ranging from 83 to 330 amino acids. Most of the CMLs (31/50) have four predicted EF hands; only one, CML12, has more than four hands (Table 1).
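
The summary figures in this paragraph can be recomputed directly from Table 1. The Python sketch below transcribes the amino acid lengths and EF-hand counts for CML1 to CML50 from the table and tallies them. The variable names are illustrative.

```python
from collections import Counter

# (number of amino acids, number of EF hands) for CML1-CML50, transcribed from Table 1.
CML_SIZE_AND_HANDS = {
    1: (148, 1), 2: (152, 4), 3: (153, 4), 4: (195, 4), 5: (215, 4),
    6: (154, 4), 7: (150, 4), 8: (151, 4), 9: (151, 4), 10: (191, 4),
    11: (173, 4), 12: (324, 6), 13: (148, 3), 14: (148, 3), 15: (157, 4),
    16: (161, 4), 17: (166, 4), 18: (165, 4), 19: (167, 4), 20: (169, 4),
    21: (248, 4), 22: (229, 4), 23: (157, 4), 24: (161, 4), 25: (186, 4),
    26: (163, 4), 27: (170, 4), 28: (83, 2), 29: (83, 2), 30: (187, 4),
    31: (144, 4), 32: (146, 4), 33: (137, 3), 34: (131, 4), 35: (216, 4),
    36: (209, 4), 37: (185, 3), 38: (177, 3), 39: (159, 4), 40: (146, 2),
    41: (205, 4), 42: (191, 3), 43: (181, 3), 44: (155, 3), 45: (194, 2),
    46: (204, 2), 47: (183, 2), 48: (186, 2), 49: (330, 2), 50: (315, 2),
}

lengths = [aa for aa, _ in CML_SIZE_AND_HANDS.values()]
hand_counts = Counter(hands for _, hands in CML_SIZE_AND_HANDS.values())
more_than_four = [n for n, (_, hands) in CML_SIZE_AND_HANDS.items() if hands > 4]
fewer_than_two = [n for n, (_, hands) in CML_SIZE_AND_HANDS.items() if hands < 2]

if __name__ == "__main__":
    print(f"size range: {min(lengths)}-{max(lengths)} amino acids")
    print("CMLs per EF-hand count:", dict(sorted(hand_counts.items())))
    print("more than four hands:", [f"CML{n}" for n in more_than_four])
    print("fewer than two hands:", [f"CML{n}" for n in fewer_than_two])
```

Run against the transcribed values, the tally gives a size range of 83 to 330 residues and 31 four-handed CMLs. It also shows CML12 as the only CML with more than four hands and CML1 as the only one with fewer than two.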

CML8 and CML9 have previously been called CaM8 and CaM9, respectively, in the literature (Köhler & Neuhaus, 2000; Zielinski, 2002). However, because the encoded proteins share only 73.2% and 49.6% amino acid identity with CaM2 (Table 1), they are likely to have functions distinct from those of CaM. Indeed, the proteins have been shown to differ from CaM in their binding activities and in how efficiently they complement a yeast CaM null mutant (Köhler & Neuhaus, 2000; Zielinski, 2002). These results reinforce the rationale for classifying CMLs separately from CaMs. CML8 is one of the CMLs most closely related in overall sequence to CaM (73.2% identity), yet it has been shown to be functionally distinct from CaM. Thus, their sequence divergence from CaMs suggests that the CML proteins are unlikely to be true CaMs and may therefore have unique, as yet undiscovered, functions.

We used neighbor-joining analysis (Saitou & Nei, 1987) to generate a bootstrapped phylogenetic tree based on amino acid sequence similarity of the CaMs and CMLs. This analysis enables us to separate the CaM/CML family into nine groups based on apparent divergence from the typical CaMs (Fig. 2). Divergence reflects overall sequence identity to CaM2 (identical to CaM3 and CaM5) (Table 1), at least for those groups most closely aligned with CaM. For example, the CaMs that fall into group 1 share between 96.6% and 100% sequence identity with CaM2. Group 2 amino acid identities to CaM2 range from 50% to 75%, group 3 has 50% identity and group 4 ranges from 40% to 44% identity. Groups 5, 6, 7, 8 and 9 have average sequence identities to CaM2 of 35%, 35%, 28%, 37% and 36%, respectively. Groups 5, 6 and 7 have the most divergent members, with identities to CaM2 as low as 24%, 22% and 16%, respectively. Although many family members lie far from CaM in the tree, there is an overall maintenance of CaM sequence similarity that may reflect the conservation of EF hand sequences (Fig. 4).
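
The group statistics can be reproduced from the identity column of Table 1. The Python sketch below transcribes each member's percent identity to CaM2, grouped as in Table 1. It prints the mean and range per group, which reproduces the group averages for groups 5 to 9 and shows that group 1 spans 96.6% to 100%.

```python
from statistics import mean

# Percent identity to CaM2 for each member, grouped as in Table 1.
IDENTITY_TO_CAM2 = {
    1: [96.6, 100.0, 100.0, 96.6, 100.0, 99.3],                   # CaM1-CaM5, CaM7
    2: [73.2, 49.6, 65.1, 74.5, 62.4],                            # CML8-CML12
    3: [50.3, 50.0],                                              # CML13, CML14
    4: [39.6, 39.5, 43.6, 42.9],                                  # CML15-CML18
    5: [42.3, 45.0, 27.5, 24.1],                                  # CML19-CML22
    6: [21.6, 38.2, 39.6, 39.6, 41.6, 44.2, 44.2,                 # CML1-CML7
        34.2, 28.8, 26.1, 23.2, 36.2],                            # CML37-CML41
    7: [34.9, 33.6, 29.5, 29.5, 28.9, 30.2, 16.1, 22.8, 22.8],    # CML42-CML50
    8: [40.9, 40.3, 43.6, 38.2, 39.6, 34.9, 32.5, 34.9,           # CML23-CML30
        37.5, 32.9, 36.5, 35.9],                                  # CML31-CML34
    9: [34.2, 36.9],                                              # CML35, CML36
}

if __name__ == "__main__":
    for group, values in IDENTITY_TO_CAM2.items():
        print(f"group {group}: n={len(values)}, mean {mean(values):.1f}%, "
              f"range {min(values)}-{max(values)}%")
```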

Figure 4.

Amino acid composition of the Ca2+-binding loop in CML EF hands. The 172 predicted EF hands in the 50 CML proteins were examined. Position 1 through 12 of the Ca2+-binding loop are shown at top; the single letter abbreviation for each amino acid is shown at left. The frequency at which an amino acid residue is found in each position is in the appropriate column and row. The amino acids found most frequently are shown below as a ‘consensus’ sequence. A capital letter is used for those amino acids that were found at that position in at least 50% of the EF hands; a lower case letter is used for those amino acids that were found at that position between 25% and 50% frequency. At right, under ‘TOT’ (total), are the total number of times an amino acid was found within the 172 Ca2+-binding loops.

There are 12 examples of highly related (> 70% identity) pairs of proteins illustrated by close branches on the tree (Fig. 2) and high sequence identity (Fig. 3). The most highly related pair is composed of CML13 and CML14 that are 94.6% identical at the amino acid level (Fig. 3). These related gene pairs suggest relatively recent gene duplications.

We aligned the amino acids that make up the 172 Ca2+-binding loops found in the EF hands of the 50 CML proteins (additional data files). We then tabulated the frequency at which each amino acid is found at each site in the binding loop (Fig. 4). As in true CaMs, there are strong preferences at positions 1, 3, 5, 6, 9 and 12. This suggests that the majority of EF hand motifs in the CMLs are likely to be functional, high-affinity Ca2+-binding sites. A consensus sequence is shown at the bottom of Fig. 4. Positions 1, 3, 5, 6, 9 and 12 are occupied by the amino acids that most frequently fill those positions in the true CaMs. However, a subset of individual CMLs has significant sequence divergence in the Ca2+-binding loops. One such change is the substitution of aspartate (D) for glutamate (E) at the 12th position of the Ca2+-binding loop. This substitution has been shown to increase the binding of Mg2+ by EF hands (Houdusse & Cohen, 1996; Cates et al., 2002). At least one hand of each of 11 CML proteins (CML9, CML13, CML14, CML19, CML20, CML22, CML35, CML36, CML40, CML41 and CML46) has this alteration. For CML19 and CML20, this change in the second of four EF hands is the only significant alteration in the Ca2+-binding loops. These proteins are therefore predicted to retain Ca2+ binding but may also have increased affinity for Mg2+. The substitution occurs at the same position in both proteins, and CML19 and CML20 share 67% amino acid identity. Together, these observations suggest that the E to D substitution occurred once, before the duplication that gave rise to these two genes (Fig. 2). The shared noncanonical sequences in CML13 and CML14 likewise imply a common progenitor that evolved these changes. CML13 and CML14 carry E to D substitutions at the 12th position of the 2nd and 3rd EF hands. The 3rd binding loop also has a glycine (G) at position 3 instead of an aspartate (D) or asparagine (N).
These changes leave only the first Ca2+-binding loop with canonical amino acids. Thus, CML13 and CML14 are predicted to be at least somewhat impaired in Ca2+-binding. Similarly, the E to D substitution in the first Ca2+-binding loops of CML35 and CML36 likely occurred before a gene duplication that gave rise to these two genes. CML35 also has E to D changes in the 2nd and 4th Ca2+-loops and a loss of a conserved D in the third position of the second Ca2+-binding loop; therefore, CML35 is predicted to have three of four hands with reduced Ca2+ affinities. Noncanonical amino acids in sites that are generally conserved among functional EF hands leave CML1, CML22, CML40, CML41, CML46 and CML48 with fewer than two sites likely to be fully functional in Ca2+ binding. The relevance of Ca2+ binding to the functioning of these proteins remains to be determined.
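
The tabulation behind Fig. 4 and the E-to-D screen discussed above can be sketched in a few lines of Python. This sketch assumes the loops have already been extracted as aligned 12-residue strings. It applies the consensus rule from the Fig. 4 legend (upper case at 50% or more, lower case between 25% and 50%) and marks positions below 25% with '-', which is an assumption for display only. The four input loops are illustrative CaM-type loops, not the 172 CML loops.

```python
from collections import Counter


def position_counts(loops):
    """Counter of residues at each of the 12 loop positions (the body of Fig. 4)."""
    return [Counter(column) for column in zip(*loops)]


def loop_consensus(loops):
    """Consensus using the Fig. 4 rule: upper case if the commonest residue occurs in
    at least 50% of loops, lower case if in 25-50%, '-' otherwise (assumed)."""
    consensus = []
    for counts in position_counts(loops):
        residue, count = counts.most_common(1)[0]
        share = count / len(loops)
        if share >= 0.5:
            consensus.append(residue.upper())
        elif share >= 0.25:
            consensus.append(residue.lower())
        else:
            consensus.append("-")
    return "".join(consensus)


def mg_candidates(loops):
    """Indices of loops with Asp instead of Glu at position 12 (the E-to-D change)."""
    return [i for i, loop in enumerate(loops) if loop[11] == "D"]


if __name__ == "__main__":
    loops = ["DKDGDGTITTKE", "DADGNGTIDFPE", "DKDGNGYISAAE", "DIDGDGQVNYEE"]
    print(loop_consensus(loops))
    print(mg_candidates(loops + ["DKDGDGTITTKD"]))
```

Where two residues tie for most common, `Counter.most_common` reports the one encountered first. A careful analysis might flag such positions instead.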

Other features of CaM are seen in subsets of the CML proteins. Nineteen of the CML genes encode the conserved lysine (K) corresponding to position 115 of CaM (Table 1), which is a potential site for trimethylation. Only the two proteins that are most closely related to typical CaM, CML8 and CML11, contain cysteine (C) residues in the first EF hand. However, nine other CMLs (CML4, CML5, CML6, CML7, CML19, CML23, CML24, CML35 and CML41) have C at the 7th position of other EF hands. All but three CMLs (CML17, CML49 and CML50) have more than 2% methionine (M) (Table 1). The paucity of methionine residues in CML49 and CML50 reflects the low overall sequence relatedness of these proteins to CaM. Surprisingly, however, CML17 has over 43% sequence identity with CaM but only 1.8% methionine. One prediction is that these methionine-poor proteins do not act as sensor proteins and may not undergo significant conformational changes upon Ca2+ binding. Alternatively, they may bind only one or a few target proteins and thus may not require the flexibility of methionine side chains for ligand interactions.
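
Both counts in this paragraph can be checked against Table 1. The Python sketch below transcribes the methionine percentages and the lysine-115 column for CML1 to CML50 from the table. It then lists the CMLs at or below 2% methionine and counts the lysine-positive proteins. The names are illustrative.

```python
# Percentage methionine for CML1-CML50, transcribed from Table 1.
MET_PERCENT = {
    1: 2.7, 2: 8.6, 3: 5.9, 4: 5.6, 5: 4.7, 6: 5.8, 7: 5.3, 8: 4.0, 9: 7.9, 10: 3.7,
    11: 5.2, 12: 4.9, 13: 4.0, 14: 4.1, 15: 4.5, 16: 5.0, 17: 1.8, 18: 2.4, 19: 6.6, 20: 7.1,
    21: 3.6, 22: 3.9, 23: 5.7, 24: 5.0, 25: 4.3, 26: 4.3, 27: 4.1, 28: 4.8, 29: 4.8, 30: 4.8,
    31: 6.3, 32: 4.8, 33: 4.4, 34: 3.1, 35: 3.2, 36: 2.4, 37: 4.3, 38: 5.6, 39: 8.2, 40: 3.4,
    41: 3.4, 42: 2.6, 43: 2.2, 44: 2.6, 45: 2.6, 46: 2.9, 47: 3.8, 48: 2.7, 49: 0.9, 50: 1.3,
}

# CMLs marked '+' for the lysine equivalent to CaM K115 in Table 1.
LYS115_CMLS = {2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 14, 23, 24, 31, 32, 33, 34, 45, 46}


def methionine_poor(threshold: float = 2.0):
    """CML numbers whose methionine content does not exceed `threshold` percent."""
    return sorted(n for n, pct in MET_PERCENT.items() if pct <= threshold)


if __name__ == "__main__":
    print("methionine-poor:", [f"CML{n}" for n in methionine_poor()])
    print("CMLs with the K115 equivalent:", len(LYS115_CMLS))
```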

Comparative modeling of the three-dimensional structure of CML24 (also called TCH2) suggests that the cysteines (C) at positions 126 and 131 may be close enough in space to form a disulfide bond (Khan et al., 1997). This potential post-translational modification raises the possibility of regulation by the redox state of the cell. It is also predicted to affect the ability of CML24 to undergo conformational changes upon loss of Ca2+ or target binding. Interestingly, CML23, CML25, CML26, CML27, CML33, CML35, CML36 and CML37 also have pairs of cysteines similarly close to each other in the primary sequence. These CMLs may therefore also be able to form disulfide bonds that could affect their structural properties and function.
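
A primary-sequence screen for cysteine pairs like those mentioned above can be sketched as follows. The Python sketch below uses a default maximum separation of five residues, an assumption chosen to match the CML24 spacing (126 and 131). It is only a first filter: whether two cysteines can actually form a disulfide depends on their three-dimensional proximity, which the text assessed by comparative modeling. The function name and test sequence are hypothetical.

```python
def nearby_cysteine_pairs(protein: str, max_separation: int = 5):
    """1-based positions of Cys pairs at most `max_separation` residues apart in sequence.

    Primary-sequence screen only; disulfide formation requires 3-D proximity.
    """
    cys = [i + 1 for i, aa in enumerate(protein.upper()) if aa == "C"]
    return [(a, b) for k, a in enumerate(cys) for b in cys[k + 1:] if b - a <= max_separation]


if __name__ == "__main__":
    toy = "A" * 125 + "C" + "AAAA" + "C" + "A" * 30  # hypothetical: Cys at 126 and 131
    print(nearby_cysteine_pairs(toy))
```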

A number of EF hand-containing proteins involved in synaptic activity and in the regulation of visual signaling undergo cotranslational covalent attachment of myristate, which affects their subcellular localization (Ames et al., 1997). CML sequences were scanned for the potential myristoylation consensus sequence G-{EDRKHPFYW}-X-X-[STAGCN]-{P} using the ProfileScan Server (http://hits.isb-sib.ch/cgi-bin/PFSCAN). Amino acids within {} are excluded from the indicated position, residues within [] are allowed at the indicated position, and X represents any amino acid. Only potential sites beginning within the first 20 amino acids were considered significant. Only one CML, CML21, has the strict consensus sequence for potential amino-terminal myristoylation. Others, however, have potentially myristoylated glycines that are near, but not directly at, the N-terminus. If these proteins (CML7, CML24 and CML25) are proteolytically processed, these internal glycines could possibly be recognized for modification (http://www.cbs.dtu.dk/services/TargetP/). None of the CMLs has the carboxyl-terminal CAAX motif for prenylation. TargetP (http://www.cbs.dtu.dk/services/TargetP/; Emanuelsson et al., 2000) analysis suggests that the hydrophobic-rich amino-terminal extensions of CML46 and CML47 may function as endoplasmic reticulum signal sequences. Similarly, the amino-terminal sequence of CML41 may direct this protein into chloroplasts. However, the great majority of the CaMs and CMLs are predicted to be cytosolic or possibly nuclear.
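
The myristoylation pattern quoted above maps directly onto a regular expression. The Python sketch below is a simplified local scan, not the ProfileScan server. A lookahead is used so that overlapping candidates are all reported, and matches count only when they begin within the first 20 residues, as in the text. The CAAX check is a crude simplification (Cys, two aliphatic residues, any final residue), and its residue classes are an assumption. The function names and sequences are hypothetical.

```python
import re

# N-myristoylation consensus from the text: G-{EDRKHPFYW}-X-X-[STAGCN]-{P}.
MYRISTOYLATION = re.compile(r"(?=G[^EDRKHPFYW]..[STAGCN][^P])")
# Simplified C-terminal CaaX box (assumed residue classes).
CAAX = re.compile(r"C[AVILM][AVILM][A-Z]$")


def myristoylation_sites(protein: str, window: int = 20):
    """1-based start positions of motif matches beginning within the first `window` residues."""
    return [m.start() + 1 for m in MYRISTOYLATION.finditer(protein.upper()) if m.start() < window]


def has_caax(protein: str) -> bool:
    """True if the protein ends in the simplified CaaX pattern."""
    return CAAX.search(protein.upper()) is not None


if __name__ == "__main__":
    for seq in ("MGNCLSKAEELK", "MGNCLSPAEELK", "MKLEECVIM"):  # hypothetical sequences
        print(seq, myristoylation_sites(seq), has_caax(seq))
```

Note that only a glycine at position 2 (after removal of the initiator Met) is normally myristoylated. The 20-residue window is the looser criterion used above, which is why internal glycines can also be reported.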

CAM and CML gene structures

The nucleotide sequence variation among the CAMs (83 to 91% identity) is greater than the amino acid sequence variability (Table 1). That multiple genes with divergent nucleotide sequences encode identical proteins suggests selective pressure for strict maintenance of the amino acid sequence. Arabidopsis CaMs therefore show sequence conservation similar to that seen in vertebrates. Vertebrates also have three CAM genes that, like CAM2, CAM3 and CAM5, encode identical CaM proteins. Although this conservation of sequence could be an example of genomic redundancy, it is difficult to explain how natural selection can act to keep the protein sequences identical. If multiple CAM genes were truly redundant, one would expect some sequence divergence, at least among the genes of distinct vertebrate species (Toutenhoofd & Strehler, 2000). One possibility is that CAM genes are differentially expressed, so that their products function with spatial or temporal specificity (Toutenhoofd & Strehler, 2000).
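
The key point, that distinct coding sequences can specify identical proteins, can be demonstrated directly. The Python sketch below uses a deliberately tiny codon table containing only the standard-code codons needed for the example. The two coding sequences are hypothetical and differ only at synonymous positions.

```python
# Only the codons used in the example (standard genetic code); not a full table.
CODON_TABLE = {
    "ATG": "M", "GCT": "A", "GCC": "A", "GAT": "D", "GAC": "D",
    "CAA": "Q", "CAG": "Q", "TTG": "L", "CTT": "L",
}


def translate(cds: str) -> str:
    """Translate a coding sequence codon by codon using CODON_TABLE."""
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, len(cds) - 2, 3))


def identity(a: str, b: str) -> float:
    """Percent identical positions between two equal-length sequences."""
    return 100.0 * sum(x == y for x, y in zip(a, b)) / len(a)


if __name__ == "__main__":
    gene_a = "ATGGCTGATCAATTG"  # hypothetical
    gene_b = "ATGGCCGACCAGCTT"  # differs only at synonymous sites
    print(translate(gene_a), translate(gene_b))
    print(f"nucleotide identity {identity(gene_a, gene_b):.1f}%, "
          f"protein identity {identity(translate(gene_a), translate(gene_b)):.1f}%")
```

The two sequences are about 67% identical at the nucleotide level yet encode the same five-residue peptide. The CAM genes show the same pattern on a larger scale.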

Each of the six Arabidopsis CAM genes has a single intron, which interrupts the coding sequence for the first EF hand at the codon for glycine (G) 26 (Fig. 5). Only 13 of the 50 CMLs are interrupted by introns. In five of these (CML8, CML9, CML10, CML12 and CML22), an intron lies at a position comparable to the CAM G26 intron (Fig. 5). Interestingly, the 2nd and 3rd introns of CML12 are also in comparable positions. This suggests that the gene region encoding the second and third pairs of hands was derived from the 5′ end of the gene encoding the first pair of hands. Indeed, sequence similarity comparisons of EF domains are consistent with this idea (additional data files).
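
Comparing intron positions across genes starts with locating each intron relative to the codons. The Python sketch below assumes the number of coding nucleotides upstream of an intron is known, and returns the codon involved together with the intron phase. The text does not state the phase of the CAM G26 intron, so the example values for codon 26 are illustrative. Deciding whether positions in different genes are "comparable" would additionally require mapping codons through a protein alignment, which is not shown.

```python
def intron_codon_and_phase(coding_nt_upstream: int):
    """Locate an intron by the number of coding nucleotides that precede it.

    Returns (codon, phase): phase 0 means the intron lies just before `codon`;
    phase 1 or 2 means it interrupts `codon` after its first or second base.
    """
    codon = coding_nt_upstream // 3 + 1
    phase = coding_nt_upstream % 3
    return codon, phase


if __name__ == "__main__":
    for upstream in (75, 76, 77):  # illustrative positions around codon 26
        print(upstream, intron_codon_and_phase(upstream))
```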

Figure 5.

Predicted positions of introns, exons and EF hand-coding sequences in the Arabidopsis CAMs and CMLs. Intron and exon boundaries were determined by comparing genomic DNA with cDNAs or were predicted from genomic sequences. EF hands were identified by the presence of canonical sequences (see Figs 1 and 4) and by alignment with related CaMs and CMLs as described in the text. Thin lines represent introns, thick bars represent exons and gray regions indicate regions encoding EF hands. The size marker at the bottom represents 100 bases.

Chromosomal distribution

The family members are distributed across all five chromosomes (Fig. 6). Thirty-five of the 56 CAMs and CMLs are in regions of the genome that are thought to be derived from segmental duplications (The Arabidopsis Genome Initiative, 2000). These regions include pairs of genes that encode identical CaMs (CAM1 and CAM4; CAM2 and CAM3) and highly related CMLs that pair in the phylogenetic tree (Fig. 3), including CML4/CML5, CML8/CML11, CML13/CML14, CML23/CML24, CML26/CML27, CML28/CML29, CML42/CML43 and CML49/CML50. These CML pairs encode proteins that share between 72% and 95% amino acid identity. There are also six chromosomal sites of tandemly arranged CAMs and/or CMLs (Fig. 6). Four of these pairs cluster closely in the phylogenetic tree (Fig. 3): CML1/CML4, CML10/CML12, CML33/CML34 and CML38/CML39. CML24 and CAM1 are adjacent on chromosome 5, and CML23, the gene most related to CML24, lies in tandem with CAM4, the gene most related to CAM1 (CML23 and CML24 are 77% identical; CaM1 and CaM4 are 100% identical; Fig. 3). Thus, most likely, a local duplication and divergence was followed by a segmental duplication. Another CAM gene, CAM2, lies in a tandem group with CML12 and CML10 on chromosome 2; CML10 is the closest relative of CML12.
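Tandem arrangements like those described above can be screened for from the AGI locus codes listed in Table 3. This sketch uses a heuristic that is an assumption, not the method used here: genes on the same chromosome whose locus numbers differ by at most 10 are treated as neighbors, because AGI codes were originally assigned in steps of 10. Locus numbers are only a proxy for physical adjacency, so candidates should be confirmed against genome coordinates. The function name is hypothetical; the example data are a subset of the loci in Table 3.

```python
import re
from collections import defaultdict

# Nuclear AGI locus codes, e.g. at5g37780 (chromosome 5, locus number 37780).
AGI_RE = re.compile(r"^at(\d)g(\d{5})$", re.IGNORECASE)


def tandem_clusters(loci: dict[str, str], max_gap: int = 10) -> list[list[str]]:
    """Group genes on the same chromosome whose locus numbers differ by <= max_gap."""
    by_chrom = defaultdict(list)
    for name, agi in loci.items():
        m = AGI_RE.match(agi)
        if m is None:
            raise ValueError(f"not a nuclear AGI locus code: {agi}")
        by_chrom[int(m.group(1))].append((int(m.group(2)), name))

    clusters = []
    for chrom in sorted(by_chrom):
        genes = sorted(by_chrom[chrom])
        current = [genes[0]]
        for prev, nxt in zip(genes, genes[1:]):
            if nxt[0] - prev[0] <= max_gap:
                current.append(nxt)  # still within the same putative tandem array
            else:
                if len(current) > 1:
                    clusters.append([n for _, n in current])
                current = [nxt]
        if len(current) > 1:
            clusters.append([n for _, n in current])
    return clusters


# Subset of ORF names from Table 3.
TABLE3_SUBSET = {
    "CAM1": "at5g37780", "CML24": "at5g37770", "CAM4": "at1g66410",
    "CML23": "at1g66400", "CAM2": "at2g41110", "CML10": "at2g41090",
    "CML12": "at2g41100", "CML1": "at3g59450", "CML4": "at3g59440",
    "CAM3": "at3g56800",
}
print(tandem_clusters(TABLE3_SUBSET))
```

On this subset the heuristic recovers CML23/CAM4, CML10/CML12/CAM2, CML4/CML1 and CML24/CAM1, consistent with the arrangements described in the text, and leaves CAM3 ungrouped.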

Figure 6.

The CAMs and CMLs are distributed among the 5 Arabidopsis chromosomes. The arms of the 5 Arabidopsis chromosomes are indicated as rounded gray bars; centromeric sites dividing arms are represented by thin connecting lines. The positions of the CAM and CML genes are indicated. Regions of predicted segmental duplication (http://www.arabidopsis.org) are indicated by color-specific shading. The CAM and CML names and numbers are shaded for those genes found within predicted duplication regions. Genes with names at left of chromosomes are on the Watson strand; genes with names at right of chromosomes are on the Crick strand (see also Table 2).

CAM and CML expression

Identification of expressed sequence tags (ESTs) corresponding to the CAM/CML genes provides evidence that these genes are expressed. In addition, because many of the EST libraries were made from RNA of distinct tissues or organs, or from plants treated with stresses or hormones, some characteristics of CAM/CML expression can be inferred. These data are summarized in Tables 2 and 3. ESTs have been found for 42 of the 56 CAM/CML genes. The EST data suggest that some of the CAM/CML genes may be upregulated by stress and/or hormones, because the greatest numbers of their ESTs were found in libraries generated from plants subjected to stress or treated with hormones. For example, CML10, CML12 and CML35 expression is likely to be induced by stress and/or hormones.
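The reasoning that a gene may be stress- or hormone-induced when most of its ESTs come from stress- or hormone-treated libraries can be made explicit with a small sketch. Here, counts are first divided by the total number of ESTs sequenced from each category of library, an assumption added because a larger library yields more ESTs for every gene. All gene names, counts and library sizes are hypothetical and are not the values in Table 3; with EST counts this low, the result only flags candidates for direct RNA analysis.

```python
# Hypothetical EST counts per library category (NOT the values in Table 3).
EST_COUNTS = {
    "geneA": {"roots": 1, "flowers": 0, "rosette": 1, "stress_hormone": 9},
    "geneB": {"roots": 4, "flowers": 3, "rosette": 2, "stress_hormone": 1},
    "geneC": {},  # no ESTs identified
}
# Hypothetical total number of ESTs sequenced from each category of library.
LIBRARY_SIZES = {"roots": 10_000, "flowers": 10_000, "rosette": 10_000, "stress_hormone": 20_000}


def top_category(counts: dict[str, int], sizes: dict[str, int]) -> str:
    """Category with the highest EST frequency after normalizing by library size."""
    freq = {cat: counts.get(cat, 0) / size for cat, size in sizes.items()}
    return max(freq, key=freq.get)


def candidate_induced(est_counts, sizes, category="stress_hormone") -> list[str]:
    """Genes whose ESTs are most frequent in `category` libraries; genes without ESTs are skipped."""
    return sorted(
        gene for gene, counts in est_counts.items()
        if sum(counts.values()) > 0 and top_category(counts, sizes) == category
    )


print(candidate_induced(EST_COUNTS, LIBRARY_SIZES))
```

Only geneA is flagged: its normalized frequency is highest in the stress/hormone libraries, whereas geneB's ESTs are most frequent in root libraries and geneC has no ESTs to evaluate.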

Table 3.  Evidence for CAM/CML expression
ORF name | CAM/CML | Roots | Flowers and/or siliques | Seed | Rosette | Liquid-cultured seedlings | Stress and/or hormones
  1. Numbers indicate frequency of EST identification (www.tigr.org/tdb/tgi/agi/).

at5g37780CAM11 11   4
at2g41110CAM252     5
at3g56800CAM3 1    
at1g66410CAM4 6     5
at2g27030CAM5211    4
at3g43810CAM723     4
at3g59450CML1      
at4g12860CML2      
at3g07490CML3      
at3g59440CML4       2
at2g43290CML54 11   3
at4g03290CML6 2    
at1g05990CML7      
at4g14640CML8      
at3g51920CML9    1  2
at2g41090CML10   1   7
at3g22930CML11       1
at2g41100CML122   213
at1g12310CML13 221   5
at1g62820CML14       1
at1g18530CML15      
at3g25600CML1612    
at1g32250CML17      
at3g03000CML18       2
at4g37010CML191     
at3g50360CML20   1  
at4g26470CML211     
at3g24110CML22      
at1g66400CML23   11  2
at5g37770CML241 2    2
at1g24620CML25      
at1g73630CML26      
at1g18210CML2755     4
at3g03430CML28      
at5g17480CML29      
at2g15680CML30       2
at2g36180CML31      
at5g17470CML32      
at3g03400CML33      
at3g03410CML34      
at2g41410CML3532 1 10
at3g10190CML362     
at5g42380CML37       4
at1g76650CML382      2
at1g76640CML39      
at3g01830CML40 2  2 
at3g50770CML41   1   3
at4g20780CML42       1
at5g44460CML43      
at1g21550CML44       2
at3g29000CML45      
at5g39670CML463      4
at3g47480CML47       1
at2g27480CML48 2     1
at3g10300CML4922     1
at5g04170CML50323    2

Three of the genes for which there is no EST evidence of expression (CML1, CML17 and CML22) encode proteins that lack specific features of true CaMs: CML1 and CML22 are predicted to have fewer than two functional EF hands, and CML17 contains only 1.8% methionine. Two gene pairs that encode highly related proteins and fall within the same segmental duplication (CML28/CML29 and CML33/CML34) also have, as yet, no evidence of expression. However, the absence of ESTs to date is not strong evidence that a gene is not expressed; more sensitive methods of RNA detection will be required to determine whether these genes are active.
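Methionine content, cited above as one of the features of true CaMs, is simply the percentage of residues that are methionine; CaM is methionine-rich, and its methionine side chains contribute to the hydrophobic surfaces that bind target proteins. The sketch below computes the percentage of any residue in a protein sequence; the function name and the ten-residue test sequence are hypothetical.

```python
def residue_percent(seq: str, residue: str = "M") -> float:
    """Percentage of positions in a protein sequence occupied by `residue`."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return 100.0 * seq.count(residue.upper()) / len(seq)


# Hypothetical ten-residue peptide with a single methionine.
print(residue_percent("MADQLTEEQI"))
```

Applied to full-length CaM and CML sequences, the same calculation gives the methionine percentages on which comparisons such as the 1.8% value for CML17 rest.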

Zielinski and colleagues have monitored the expression of several of the CAM and CML genes (Ling et al., 1991; Perera & Zielinski, 1992; Gawienowski et al., 1993; Ling & Zielinski, 1993; Zielinski, 2002). For example, CAM1-CAM4 transcripts are present in siliques and leaves, and CAM1 mRNAs are also detected in roots. CML7, CML8 and CML9 expression is detectable in leaves, flowers and developing siliques, and CML10 RNAs are present in leaves.

An unusual expression feature of some plant CAMs and CMLs is that they are rapidly and dramatically upregulated by mechanical force, such as simple touch stimulation (Braam & Davis, 1990). Furthermore, CAM2, CML12 and CML24 expression is upregulated in response to darkness, cold shock, heat shock and phytohormones (Braam & Davis, 1990; Braam, 1992; Sistrunk et al., 1994; Antosiewicz et al., 1995; Polisensky & Braam, 1996). CML12 expression has been analyzed in detail by direct RNA analyses (Sistrunk et al., 1994), by the activity patterns of a CML12::GUS reporter gene fusion in planta (Sistrunk et al., 1994) and by immunolocalization (Antosiewicz et al., 1995). Together, these data indicate that CML12 expression correlates with sites predicted to be under mechanical stress, such as branch points, and with other sites where cells are undergoing expansion.

The data accumulated to date provide the first indications of where the CAM/CML genes of Arabidopsis may function. Complete characterization of expression by direct RNA analyses and reporter gene activity patterns, together with the phenotypic consequences of gene knockouts, will be invaluable next steps toward understanding the functional significance of this unexpectedly large family of genes encoding CaMs and CaM-like proteins in Arabidopsis.

Acknowledgements

This material is based upon work supported by the National Science Foundation (grant no. IBN9982654 to J.B.) and, in part, by the National Institutes of Health (Biotechnology Training Grant no. T32-GM08362 to E.M.).

Supplementary material

The following material is available to download from http://www.blackwellpublishing.com/products/journals/suppmat/NPH/NPH845/NPH845sm.htm

Fig. S1 The alignments used to construct the tree in Fig. 2.

Fig. S2 The alignments used to tabulate amino acid frequency in the Ca2+-binding loops of the CMLs shown in Fig. 4.

Fig. S3 The alignments used to compare CML12 EF hands to CaM.

Fig. S4–S12 The alignments showing intragroup amino acid sequence identities.
