Dual-specificity protein phosphatases (DSPs) are important regulators of a wide variety of protein kinase signaling cascades in animals, fungi and plants. We previously identified a cluster of putative DSPs in Arabidopsis (including At3g52180 and At3g01510) in which the phosphatase domain is related to that of laforin, the human protein mutated in Lafora epilepsy. In animal and fungal systems, the laforin DSP and the beta-regulatory subunits of AMP-regulated protein kinase (AMPK) and Snf-1 have all been demonstrated to bind to glycogen by a glycogen-binding domain (GBD). We present a bioinformatic analysis which shows that these DSPs from Arabidopsis, together with other related plant DSPs, share with the above animal and fungal proteins a widespread and ancient carbohydrate-binding domain. We demonstrate that DSP At3g52180 binds to purified starch through its predicted carbohydrate-binding region, and that mutation of key conserved residues reduces this binding. Consistent with its ability to bind exogenous starch, DSP At3g52180 was found associated with starch purified from Arabidopsis plants and suspension cells. Immunolocalization experiments revealed a co-localization with chlorophyll, placing DSP At3g52180 in the chloroplast. Gene-expression data from different stages of the light–dark cycle and across a wide variety of tissues show a strong correlation between the pattern displayed by transcripts of the At3g52180 locus and that of genes encoding key starch degradative enzymes. Taken together, these data suggest the hypothesis that plant DSPs could be part of a protein assemblage at the starch granule, where they would be ideally situated to regulate starch metabolism through reversible phosphorylation events.
As a recent review (Pawson and Nash, 2003) points out, the sequencing of genomes produces extensive lists of proteins, many of which are in search of functions. The detection of conserved domains, modules or cassettes in many of these sequences provides the first potential clues to cellular function. An important frontier of genomics is the elucidation of interaction modules that allow proteins to associate with each other, and with various ligands, to produce functional assemblies and signaling networks.
In an earlier report (Kerk et al., 2002) we presented the sequences of the protein phosphatases encoded by the genome of Arabidopsis. As reversible protein phosphorylation mediates a variety of important cellular processes, such as developmental events, metabolism and responses to hormone and other environmental stimuli (Luan, 2003), it was anticipated that knowledge of protein phosphatases in Arabidopsis would open a window onto a variety of phenomena of general importance to the plant research community. In that survey we found 18 sequences encoding dual-specificity phosphatases (DSPs) potentially capable of acting on substrate Ser/Thr and Tyr residues, many of them with no obvious clues to function.
Working independently, Fordham-Skelton et al. (2002) discovered the sequences of a small set of DSPs, including At3g52180, its tomato orthologue Le14970762, and At3g01510. They showed that these sequences contain a non-phosphatase domain (kinase interaction sequence, KIS) that can bind to the Snf-1-related protein kinase (SnRK) AKIN11.
We have found a rice orthologue of At3g01510, and have investigated this non-phosphatase domain in this set of DSPs. We report here that this domain is phylogenetically widespread, and encodes an ancient carbohydrate-binding domain. We present data showing that At3g52180 binds to starch, and that it is localized in chloroplasts. This raises the possibility that this set of plant DSPs is ideally situated to participate in regulatory events important to starch metabolism.
Plant DSPs contain a widespread and ancient non-phosphatase domain
In our previous genome-scale survey of protein phosphatase-encoding genes in Arabidopsis, we classified three sequences (At3g52180, At3g01510, At3g10940) as DSPs based on their strong conservation of critical catalytic residues and spacing patterns. We also noted that these sequences clustered with moderate support with the animal laforin DSPs (Kerk et al., 2002; Figures 4 and 6). This phosphatase domain similarity was noted independently by Fordham-Skelton et al. (2002). They also examined the non-phosphatase regions of these sequences, and detected the presence of a KIS in At3g52180 and At3g01510, C terminal to the phosphatase domain. Wang et al. (2002) reported that the human laforin DSP contains a domain with structural similarity to solved carbohydrate-binding proteins. They demonstrated the association of laforin with glycogen particles in vivo, and binding to purified glycogen in vitro.
We built models of this candidate domain based on sets of both aligned (profile method of Gribskov and Veretnik, 1996) and unaligned sequences (MEME; Bailey and Elkan, 1995), and used these to conduct database searches. We found an additional Arabidopsis sequence with this domain: At5g39790. Subsequent analysis showed that this sequence contains no DSP domain, consistent with the failure of our previous search (Kerk et al., 2002) to find it.
A blastp search of the TIGR rice genome database with sequences At3g52180, Le14970762 and At3g01510 yielded a strong hit with the latter (sequence 50945153, E = 1e–160). When this sequence is placed in the alignment of 169 DSPs from our previous study, it has all the conserved DSP catalytic residues with the proper spacing. Phylogenetic tree analysis with the alignment containing this rice sequence shows that it clusters together with At3g01510 with a very high level of support, to the exclusion of tomato and other Arabidopsis sequences (data not shown). We infer that this sequence (referred to here as Os50945153) represents the rice orthologue of At3g01510.
We built a multiple-sequence alignment to examine in more detail the conservation of sequence features within this candidate domain in a comprehensive sequence set including plant, fungal and animal proteins, the animal laforin DSPs, the Arabidopsis DSPs, and various solved carbohydrate-binding sequences of the CBM 20 family (carbohydrate-active enzymes, CAZy: http://afmb.cnrs-mrs.fr/CAZY/acc.html). The alignment is presented in Figure 1 with a representation of sequence conservation. The best conserved positions are G (14), W (19), G (32), R/K (37). Slightly weaker conservation is shown at three other positions. At position 42 most sequences have a G; at position 44 most sequences have a W; and at position 58 most sequences have a G. Wang et al. (2002), in their analysis of human laforin, performed homology modeling against the solved structure 2DIJ (cyclodextrin glycosyltransferase), and predicted contact with a carbohydrate ligand at two conserved tryptophan residues (corresponding to positions W19 and W44 in our alignment), and a lysine (corresponding to position K37 in our alignment). Mutagenesis at the first tryptophan position or at the lysine position abolished glycogen binding. A similar analysis by Polekhina et al. (2003) showed that the beta-1 subunit of human AMPK (HsAMPKβ1 in our alignment) can also bind glycogen. They performed mutagenesis studies and noted that alteration of positions W19 or K37 abolished glycogen binding, whereas change at positions W44 or G58 reduced binding. More recently, Polekhina et al. (2005) have solved the structure of HsAMPKβ1. The residues that correspond to our W19 and K37 are carbohydrate-contact residues. Inspection of the alignment in Figure 1 shows that the Arabidopsis DSPs, like most other members of the sequence set, share these functionally important residues.
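The notion of a "best conserved position" used above can be illustrated with a minimal sketch that reports, for each alignment column, the most frequent residue and the fraction of non-gap sequences carrying it. The function name and the toy alignment are hypothetical and are not taken from Figure 1. The sketch also treats every residue as distinct, so a combined R/K position such as position 37 would need an explicit equivalence class.

```python
from collections import Counter


def column_conservation(alignment, gap="-"):
    """Return (most common residue, fraction of non-gap sequences) per column."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    result = []
    for col in range(length):
        # Ignore gap characters when measuring conservation in this column.
        residues = [seq[col] for seq in alignment if seq[col] != gap]
        if not residues:
            result.append((gap, 0.0))
            continue
        residue, count = Counter(residues).most_common(1)[0]
        result.append((residue, count / len(residues)))
    return result


# Toy alignment of four hypothetical sequences (illustration only).
toy = ["GAWK", "GSWR", "G-WK", "GTFK"]
profile = column_conservation(toy)
```

In this toy case the first column is invariant, while the third and fourth columns are conserved in three of four sequences, which is the kind of distinction drawn between the best conserved and the slightly weaker positions.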
To examine the clustering pattern of sequences inherent in this multiple sequence alignment, we utilized several methods of phylogenetic tree inference: neighbor joining (Saitou and Nei, 1987); maximum parsimony (Felsenstein, 1996); maximum likelihood, as implemented in the ‘quartet puzzling’ approach (Schmidt et al., 2002); and Bayesian analysis (Ronquist and Huelsenbeck, 2003). The results of this analysis are summarized in Figure 2 and Table 1. There were slight differences between tree topologies: the maximum parsimony pattern is presented, along with the percentage of support for various critical nodes using the several inference methods. It can be seen that major clades are formed by yeast Snf-1 kinase beta-regulatory subunits (Yeast, node 1); the plant Snf1-related kinase (SnRK) beta-regulatory subunits (Plantβ, node 2); animal AMP-regulated protein kinase (AMPK) beta-regulatory subunits (Animal AMPKβ, node 4); plant SnRK beta-gamma regulatory subunits (Plantβγ, node 5); and solved carbohydrate-binding proteins of the CBM20 family (node 7). The animal laforin DSPs are the sister group to the carbohydrate-binding sequences (CBM20 + laforins, 4/4 methods, node 9). The clustering of the Yeast and Plantβ groups attains >50% support in all four methods (node 3). The clustering of the Animal AMPKβ and Plantβγ groups attains >50% support in three out of four methods (node 6). The DSPs previously identified by Fordham-Skelton et al. (2002) (At3g52180, Le14970762), here designated ‘Old DSPs’, clustered with the CBM20 + laforin group in all four methods. In three of the four methods, they form a sister group (Figure 2) with support >50% (node 11). Sequence At1g27070 was identified by Fordham-Skelton et al. (2002) as containing a KIS domain, but with no DSP domain. We identified sequence At5g39790 in our database searching, and it has the same characteristics. These two sequences cluster together in all four methods (node 12), and are here designated ‘AtKIS’. 
The placement of this pair could not be well resolved in our analysis. Node 13 represents the topology in maximum parsimony, but this was variable across the different methods and had low support. Sequence At3g01510 was previously shown by Fordham-Skelton et al. (2002) to contain a KIS domain, and we found sequence Os50945153 in our database searching. These sequences cluster together in all methods of analysis (node 14) and are here designated ‘New DSP’. The placement of this pair is also not well resolved in our analysis. The topology of the maximum parsimony tree (node 15) is shown in Figure 2. This is the position of the relevant node in three of the four methods of analysis, but support is low, not reaching 50% in any method.
Table 1. Support for internal nodes of phylogenetic tree of carbohydrate-binding domain-containing sequences
Approximately 1000 alternative trees were sampled for each phylogenetic tree method. A majority-rule consensus tree was then constructed for the output of each method, and the tabulated numbers represent the percentage of replicate trees presenting the topology indicated at each node in Figure 2.
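The support values described in this note can be sketched as follows, assuming each sampled tree is represented simply as a set of clades (each clade being a frozenset of leaf labels). This representation, the function names and the toy trees are illustrative assumptions; the tabulated values themselves come from the output of the respective phylogenetic programs, not from code like this.

```python
def clade_support(sampled_trees, clade):
    """Percentage of sampled trees in which `clade` appears as a group."""
    target = frozenset(clade)
    hits = sum(1 for tree in sampled_trees if target in tree)
    return 100.0 * hits / len(sampled_trees)


def majority_rule_clades(sampled_trees):
    """Clades present in more than 50% of the sampled trees."""
    all_clades = set().union(*sampled_trees)
    return {c for c in all_clades if clade_support(sampled_trees, c) > 50.0}


# Four hypothetical sampled trees over leaves A-D (illustration only).
trees = [
    {frozenset({"A", "B"}), frozenset({"A", "B", "C"})},
    {frozenset({"A", "B"}), frozenset({"C", "D"})},
    {frozenset({"A", "C"}), frozenset({"A", "B", "C"})},
    {frozenset({"A", "B"}), frozenset({"A", "B", "D"})},
]
support_ab = clade_support(trees, {"A", "B"})
consensus = majority_rule_clades(trees)
```

Here the grouping of A with B is recovered in three of the four toy trees, so it would be retained in a majority-rule consensus, whereas groupings seen in half or fewer of the trees would not.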
The results of the primary sequence analysis indicated a general similarity between the various sequences and solved carbohydrate-binding structures, particularly with regard to conservation of functionally important residues. To explore this possibility further, we subjected all sequences to fold-compatibility analysis (threading). The results are summarized in Table 2. In all cases there was a top-scoring hit with a known carbohydrate-binding structure. In most instances the top hit was to 1Z0M, the recently solved structure of the human AMPKβ1 subunit (Polekhina et al., 2005). The Z scores are quite high: the FFAS03 method has 97% specificity at scores >9.5. This indicates a high degree of structural similarity among this sequence set.
Table 2. Fold-compatibility (threading) analysis of carbohydrate-binding domain sequences
NCBI gi Number
The sequences of putative carbohydrate-binding domains were subjected to fold-compatibility (threading) analysis by the FFAS03 method. The highest-scoring structure hit, and the accompanying Z score, are presented. %ID, percentage identity between the query sequence and the sequence of the solved structure. The specificity of the FFAS03 method is >97% when Z > 9.5.
1Z0M, glycogen-binding domain (GBD) of Homo sapiens AMPKβ1.
1KUM, granular starch-binding domain of Aspergillus niger glucoamylase.
aThe sequence contained in the database for AtSNF4 (gi: 9965729) is truncated at the 5′ end. The amino acid sequence was obtained by translation of the upstream genomic sequence as described by Fordham-Skelton et al. (2002).
bThe NCBI entry for At5g39790 (gi: 18421911) was recently discontinued and replaced by gi: 42568201. The new sequence has an N terminal extension of 131 amino acids not present in the old sequence.
FOG1 protein Kluyveromyces marxianus var. lactis
Glucose repression protein Gal83p Saccharomyces cerevisiae
Snf1 interacting protein Sip2p Saccharomyces cerevisiae
AMP-activated protein kinase Schizosaccharomyces pombe
AKIN beta1 Arabidopsis thaliana
SNF1-related kinase complex anchoring protein Sip1 Lycopersicon esculentum
Gal83 protein Solanum tuberosum
AKIN beta2 Arabidopsis thaliana
AMP-activated protein kinase beta-2 subunit Homo sapiens
AMP-activated protein kinase beta-1 subunit Homo sapiens
AMP-activated protein kinase beta-1 subunit Mus musculus
AMP-activated protein kinase beta-1 subunit Sus scrofa
G8057-PA Drosophila melanogaster
ENSANGP00000022004 Anopheles gambiae str. PEST
Protein kinase AKINbetagamma-1 Zea mays
Protein kinase AKINbetagamma-2 Zea mays
Putative activator subunit of SNF1-related protein kinase SNF4 Arabidopsis thaliana
DSP, homologue of At3g01510 Oryza sativa (japonica cultivar-group)
DSP Arabidopsis thaliana
Expressed protein Arabidopsis thaliana
Expressed protein Arabidopsis thaliana
DSP, homologue of At3g52180 Lycopersicon esculentum
Plant sequences with the putative carbohydrate-binding domain are predicted to be localized to chloroplasts
Plant sequences were examined for the presence of putative chloroplast transit signals using a number of methods. Several sequences yielded positive predictions of transit signals (Figure 2); the other plant sequences proved negative. Details of the server predictions are presented in Table 3. The strongest evidence for the presence of such a signal is in DSP sequence Os50945153, which is positive in all four prediction methods with high scores/probabilities. The evidence is positive, although not as strong, for two other DSPs (At3g52180, At3g01510) and the AtKIS sequences (At1g27070 and At5g39790): positive in three of four methods, with lower scores/probabilities. The evidence by this criterion is more equivocal for the DSP Le14970762, which is positive according to two methods, and negative for two. Transit signal predictions were followed up by an assessment of the molecular architecture of the N terminus of each sequence. It is expected that a genuine chloroplast transit signal will lie in a region upstream of other established domain homology. This criterion is satisfied for all the above sequences, and is consistent with these positive predictions being accurate.
Table 3. Predictions of chloroplast transit peptides in carbohydrate-binding domain proteins
As detailed in Experimental procedures, sequences were evaluated by the respective web servers. Other plant sequences in the data set were evaluated and found to be negative.
aThis method presents no single numerical estimate of prediction strength.
bA score in excess of the threshold score of 0.42 is considered positive.
cThe reported score is the probability of a plastid transit peptide being present.
dThe method obtains scores for presence of a chloroplast, mitochondrial or signal peptide, or for some other cellular localization. The winning prediction is placed in a ‘reliability class’ (RC) according to the difference between the winning score and the next highest score. The classes, in descending order of reliability, are RC1–RC5.
The Arabidopsis DSP At3g52180 is a carbohydrate-binding protein
To address whether At3g52180 can bind carbohydrate, we cloned and expressed this protein. Glutathione S-transferase (GST) fusions of full-length At3g52180 and of the N- and C terminal fragments, containing the DSP domain and the carbohydrate-binding domain alone, respectively (Figure 3a), were expressed in bacteria and purified (Figure 3b). The fusions are predicted to yield 64 kDa (WT), 51 kDa (N terminal) and 42 kDa (C terminal) proteins. SDS–PAGE analysis (Figure 3b) shows highly enriched bands for these fusions migrating with apparent masses of 66 kDa (WT), 48 kDa (N terminal) and 41 kDa (C terminal), which were confirmed by Western blot analysis to be the GST fusions (Figure 3c). Having purified these recombinant proteins, we tested the ability of each to bind starch in a co-sedimentation assay. As illustrated in Figure 3(d), only the wild-type, full-length protein and the putative carbohydrate-binding domain readily pellet with starch. In contrast, the DSP domain fused to GST remains entirely in the supernatant fraction, confirming that co-sedimentation of WT DSP is due to binding through the predicted carbohydrate-binding region. Having established that the DSP can bind starch through its carbohydrate-binding domain, we then examined its ability to interact with the polysaccharides amylopectin, amylose and pullulan. As shown in Figure 3(e), the C terminal (CT) carbohydrate-binding domain readily bound these three polysaccharides. Sequence analysis and comparisons to solved carbohydrate-binding domain structures predicted that W278 and K307 (W19 and K37 in our alignment) might be critical for binding to starch. We mutated these residues to test this idea. Figure 3(f) illustrates that changing tryptophan 278 to glycine reduces binding, whereas changing lysine 307 to alanine also reduces binding, but to a lesser extent. The double mutant also displayed reduced binding, but was still able to interact with starch to a limited extent.
The DSP At3g52180 is bound to endogenous starch
Knowing that the enzyme could bind complex polysaccharides, we wanted to examine if this DSP was bound to endogenous starch. An antibody was generated against the CT domain expressed as a GST-fusion protein. A preliminary blot with diluted serum showed a strong cross-reaction with as little as 5 ng DSP–CT GST, and a weak interaction with 50 ng GST or another GST-fusion protein (data not shown). The antibodies that recognize GST were depleted from the serum by incubation with GST–Sepharose. The remaining antibodies were shown to be specific for the DSP–CT (Figure 4a). Another preliminary blot using crude extracts from Arabidopsis, an isolated starch fraction and diluted crude serum that was depleted of GST antibodies revealed a strong cross-reaction with proteins of approximately 34 and 40 kDa, and a weaker band that migrated at approximately 76 kDa. The approximately 34-kDa band was also detected with the rabbit pre-immune serum. To increase antibody specificity, we then affinity-purified the antibodies using GST–DSP–CT coupled to CH-Sepharose. As shown in Figure 4(b), these affinity-pure antibodies could detect as little as 0.1 ng recombinant DSP–CT. These antibodies were then used to probe an Arabidopsis crude extract and starch samples isolated from Arabidopsis suspension cells and seedlings (Figure 4c). The affinity-purified antibodies robustly detect the approximately 40-kDa protein. The approximately 76-kDa protein was weakly detected in the Arabidopsis crude extract and starch sample isolated from Arabidopsis suspension cells, but not the seedlings. No immunoreactive bands were detected when secondary antibody was used alone (data not shown). Incubation of affinity-purified antibodies with an excess of DSP–CT was able to block the signal observed in Figure 4(c) for the 40-kDa protein, but not the approximately 76-kDa protein (Figure 4d). 
Although it migrates at a higher apparent mass than predicted (40 versus 36 kDa), it is likely that the 40-kDa protein is DSP At3g52180, which indicates that the DSP resides on endogenous starch. It is interesting to note that the glycogen-binding mammalian AMPK beta-subunit has a predicted mass of 33 kDa and runs at 38 kDa during SDS–PAGE. As a further control, the same starch and crude samples were examined for the chloroplast-localized protein PII (Smith et al., 2003, 2004a). This protein was robustly detected in the crude fraction but not on the isolated starch, indicating that the association of DSP At3g52180 with starch is specific. To examine the DSP location in cells, the same affinity-purified antibody was used for immunological staining of fixed cells. Figure 4(f) shows that the protein is localized exclusively to a small organelle in the cell, and chlorophyll fluorescence reveals this location to be the chloroplast (Figure 4g,h). The staining observed in Figure 4(f) can be blocked with an excess of recombinant DSP, as in Figure 4(d), indicating that the signal is derived from DSP At3g52180. A further control shows no signal from secondary antibody alone (Figure 4j).
Arabidopsis DSP gene expression is correlated with that of starch metabolic enzymes
Gene-expression data were mined from public databases for the two Arabidopsis DSPs and a set of Arabidopsis chloroplast-localized proteins involved in starch metabolism. A convenient tabulation of starch metabolic proteins is given by Smith et al. (2004b) (Table 1). Data from a developmental survey of Arabidopsis (Schmid et al., 2005), obtained using the Affymetrix ATH1 Arabidopsis Genome Array, are presented in Figure 5. There is a strong correlation between the expression pattern of transcripts encoded by the DSP locus At3g52180 and that of transcripts for the glucan-water dikinase protein (GWD1) (locus At1g10760) and transcripts encoding the related GWD3 protein (also known as phosphoglucan-water dikinase, PWD; locus At5g26570). The correlation coefficients for single-gene comparisons are At1g10760/At5g26570 (0.916); At3g52180/At1g10760 (0.880); At3g52180/At5g26570 (0.884). A weaker correlation can be observed between the expression patterns of transcripts of the Arabidopsis DSP locus At3g01510 and the locus At1g03310, which encodes the starch-debranching enzyme ISA2 (correlation coefficient = 0.716). As a control, five genes from the ATH1 gene array were chosen at random. All 10 pairwise comparisons were made between the expression patterns of these genes, and the mean correlation coefficient was 0.024. Comparisons were then made between the pattern of expression of each Arabidopsis DSP and that of each of these five random genes. The mean correlation coefficients for those comparisons are At3g52180/random (0.012); At3g01510/random (−0.072). We analysed additional expression data sets available from the Nottingham Arabidopsis Stock Centre (NASC) collection via the Arabidopsis Membrane Protein Library (AMPL) website (‘Search Expression’ option; data not shown). 
Across an additional 40 experimental conditions, the mean correlation coefficients for single-gene comparisons are At1g10760/At5g26570 (0.886); At3g52180/At1g10760 (0.901); At3g52180/At5g26570 (0.887); At3g01510/At1g03310 (0.758).
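The single-gene comparisons and the random-gene control described above can be sketched as follows. The text does not state which correlation coefficient was used, so the Pearson coefficient here is an assumption, and the function names and toy profiles are hypothetical rather than values from the array data.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two expression profiles."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("profiles must have equal length of at least 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return sxy / sqrt(sxx * syy)


def mean_pairwise_correlation(profiles):
    """Mean correlation over all unordered pairs of genes in `profiles`."""
    pairs = list(combinations(sorted(profiles), 2))
    total = sum(pearson(profiles[a], profiles[b]) for a, b in pairs)
    return total / len(pairs), len(pairs)


# Five hypothetical 'random' genes measured under four conditions.
random_genes = {
    "gene_1": [1.0, 2.0, 3.0, 4.0],
    "gene_2": [4.0, 1.0, 3.0, 2.0],
    "gene_3": [2.0, 2.5, 1.0, 3.0],
    "gene_4": [3.0, 1.0, 4.0, 1.5],
    "gene_5": [1.5, 3.5, 2.0, 2.5],
}
mean_r, n_pairs = mean_pairwise_correlation(random_genes)
```

With five genes, the number of unordered pairs is 10, which matches the 10 pairwise comparisons made for the control set.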
The present results demonstrate that the C terminal non-phosphatase domain of DSP At3g52180 can bind to starch and its component polymers amylose and amylopectin, as well as the polysaccharide pullulan. Furthermore, mutation of two residues (W278 and K307) reduces binding to starch. This is consistent with our phylogenetic analysis and with other recently reported experimental work. The related human laforin DSP possesses a similar domain at its N terminus, which can bind to glycogen and starch but not monomeric sugars (Chan et al., 2004; Fernandez-Sanchez et al., 2003; Ganesh et al., 2004; Wang and Roach, 2004; Wang et al., 2002). Reports by Polekhina et al. (2003) and Hudson et al. (2003) demonstrate that a related domain, which mediates binding to glycogen, is found in the beta-subunit of the human AMPK. Wiatrowski et al. (2004) showed that a similar domain is present in the beta-regulatory subunit of the yeast Snf-1 kinase. Mutations in the W and K residues in the corresponding positions of these proteins (which represent the highly conserved positions W19 and K37 in our alignment) either reduce or abolish polymer binding. Carbohydrate binding has now been demonstrated by proteins widely dispersed in our phylogenetic tree (Figure 2). The high support for the deep nodes of this tree suggests that this is a property derived from a common ancestral sequence. This, combined with the high degree of sequence conservation for critical binding residues demonstrated in Figure 1, suggests that carbohydrate binding might be a property shared by nearly all the sequences in this data set. This is supported by the results of our fold-compatibility (threading) analysis, where all sequences produced a high-scoring hit to a solved carbohydrate-binding protein. These data are consistent with the recent findings of Polekhina et al. (2005) with the solved carbohydrate-binding structure of HsAMPKβ1. 
Based on modeling work, they predicted that similar plant proteins should be able to bind starch. Taken together, this body of evidence suggests the widespread distribution of an ancient, generalized carbohydrate polymer-binding domain.
The Western blotting results presented here show that the DSP At3g52180 is a starch-binding protein and thus is predicted to be localized to the chloroplast. The chloroplast protein complement in Arabidopsis has recently come under intense and systematic study. Richly and Leister (2004) evaluated the accuracy of four bioinformatic chloroplast transit peptide prediction methods against a large set of proteins of known subcellular localization. They determined that a ‘three of four’ criterion (positive prediction in three of four methods) correlated best with known chloroplast protein localization. Using this criterion, they then projected that the Arabidopsis chloroplast should contain approximately 2000 proteins encoded by the nuclear genome. Kleffmann et al. (2004) used a combination of tandem mass spectrometry and parallel RNA profiling to determine the Arabidopsis chloroplast proteome directly. The set of 690 proteins they found is clearly a fraction of the predicted number, and is consistent with a methodology that would preferentially identify more abundant chloroplast proteins. Our data are consistent with these reports. Five of the six proteins in Table 3 achieve positive chloroplast transit peptide predictions in at least three of the four methods evaluated by Richly and Leister (2004). The exception, the tomato DSP Le14970762 (two out of four positive predictions), may still represent a true positive given that all the other DSPs have stronger positive predictions. In addition, none of the Arabidopsis sequences in Table 2 was present in the positive data set of Kleffmann et al. (2004). This would be the expected pattern of results for low-abundance chloroplast proteins engaged in regulatory functions. Finally, the immunolocalization experiments we present with affinity-purified antibodies show conclusively that the DSP At3g52180 is localized to chloroplasts.
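The ‘three of four’ criterion applied in this paragraph can be written as a short sketch. The counts of positive predictions per sequence are those stated in the text; the per-method scores are not reproduced, and the function and variable names are hypothetical.

```python
# Number of positive transit-peptide predictions (out of four methods)
# for each sequence, as stated in the text.
POSITIVE_CALLS = {
    "Os50945153": 4,
    "At3g52180": 3,
    "At3g01510": 3,
    "At1g27070": 3,
    "At5g39790": 3,
    "Le14970762": 2,
}


def meets_criterion(n_positive, n_methods=4, required=3):
    """True if at least `required` of `n_methods` predictions are positive."""
    if not 0 <= n_positive <= n_methods:
        raise ValueError("n_positive must lie between 0 and n_methods")
    return n_positive >= required


passing = sorted(s for s, n in POSITIVE_CALLS.items() if meets_criterion(n))
failing = sorted(s for s, n in POSITIVE_CALLS.items() if not meets_criterion(n))
```

Under this rule five of the six sequences pass, and only the tomato DSP Le14970762 falls below the threshold.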
A regulatory role for protein phosphorylation in the chloroplast has been well established (Tetlow et al., 2004; Zer and Ohad, 2003). Protein phosphorylation and the control of plastid-localized metabolic enzymes have been highlighted by the identification of 14-3-3 proteins in the plastid and the identification of several target phospho-proteins (Moorhead et al., 1999; Sehnke et al., 2001). 14-3-3 proteins recognize and bind target proteins at a conserved motif in a phosphorylation-dependent manner. Using anti-sense technology, Sehnke et al. (2001) demonstrated that 14-3-3 proteins play a direct role in starch accumulation. Furthermore, they showed that 14-3-3s associate with the starch granule, and the likely target for starch bound 14-3-3 is starch synthase III. More recently, Tetlow et al. (2004) demonstrated that two starch synthases and two starch branching enzymes, which reside on the starch granule, are phosphorylated. Due to the prevalence of protein phosphorylation as a regulatory mechanism, it is likely that additional metabolic steps in the synthesis and degradation of starch will be regulated in this fashion, but to date no protein phosphatases have been identified that are localized to the starch granule or that regulate starch metabolic enzymes.
Patterns of gene expression strongly support a role for the Arabidopsis DSP At3g52180 in the regulation of starch metabolism. Smith et al. (2004b) noted a strong diurnal pattern in the expression of transcripts of several Arabidopsis genes involved in starch degradation. Among these were the loci At1g10760 and At5g26570. The former encodes the protein GWD1 (glucan-water dikinase) (Ritte et al., 2002; Yu et al., 2001), and the latter GWD3 (phosphoglucan-water dikinase) (Kotting et al., 2005). These chloroplast-localized enzymes add phosphate groups to the amylopectin polymer, which may allow it to be more effectively attacked by other, as yet uncharacterized, degradative enzymes (Smith et al., 2005). Reduction in the level of transcripts of these loci, through either mutation or experimental RNAi manipulation, results in a starch-excess phenotype in which starch accumulated during the light cycle in chloroplasts is not effectively degraded in the subsequent dark cycle. Smith et al. (2004b) also noted several other Arabidopsis genes with transcripts that showed an identical marked diurnal fluctuation – among these was the DSP locus At3g52180. We have mined an extensive set of publicly available gene-expression data spanning a variety of tissues and growth conditions, which shows an additional strong correlation between the expression patterns of this gene set. This argues that the transcription of these genes is co-regulated and that the proteins constitute part of a common metabolic pathway. In addition, we present weaker data indicating the possibility of a correlation between the patterns of expression of the DSP locus At3g01510 and the locus At1g03310, which encodes a starch-debranching enzyme (ISA2). This protein is not required for starch degradation, but is thought to be essential for starch synthesis (Smith et al., 2005). Smith et al. (2004b) presented data (Figure 5a) indicating that transcripts of the locus At1g03310 show a moderate diurnal pattern, with maximal and minimal values less extreme than for the GWD/At3g52180 transcripts. We have noted that transcripts of the locus At3g01510 show a similar pattern (data not shown).
The data linking protein phosphatases and starch metabolic enzymes are consistent with the importance of protein phosphorylation changes to chloroplast energy metabolism in general. Smith et al. (2004b) emphasize the importance of post-translational changes to the control of starch metabolism, as protein levels of several critical enzymes display little change throughout the light–dark cycle, despite the large diurnal transcriptional changes that they and others (Yu et al., 2001) have observed. Finally, Smith et al. (2005) cite the discovery that the sex4 locus, another gene in which mutations produce a starch-excess phenotype characterized by reduced starch degradation, encodes a protein phosphatase.
Polekhina et al. (2003), in their summary of the potential significance of the presence of a GBD in the beta-regulatory subunit of AMPK, presented a model of glycogen acting as a molecular scaffold or micro-compartment, retaining various metabolic enzymes at its surface. It has been established for many years that the protein kinases and phosphatases that regulate glycogen metabolism in mammals reside, at least in part, on the surface of glycogen, where their substrates are located (Hubbard and Cohen, 1993). It has also been known for some time that glycogen synthase (GS) is an excellent substrate for AMPK in vitro, and it now appears the beta-subunit of AMPK targets this protein to this carbohydrate reserve, potentially to regulate GS (Hardie et al., 2003). The type 1 protein phosphatase (PP1) is targeted to mammalian glycogen by a variety of regulatory glycogen-binding proteins including GM, GL and PTG (R5) (Cohen, 2002; Moorhead et al., 1995). Evidence suggests that, at least in some tissues, the dual-specificity protein phosphatase laforin may participate in glycogen surface-bound regulatory events. The evidence linking laforin to glycogen is much the same as that for the AMPK beta-subunit, with which it shares a glycogen-binding domain. Hardie et al. (2003) suggest that the localization of AMPK to the glycogen granule through the GBD on the beta subunit may be part of a mechanism sensing glycogen levels in the cell. Thus a picture is emerging of a macromolecular complex in which a community of critical regulatory and enzymatic proteins is localized to the glycogen particle.
We have performed experiments to examine the percentage of DSP At3g52180 bound to endogenous starch, and this value is quite low (data not shown). It is likely that this result is, in part, an artifact of the tissue source and preparation methods. Arabidopsis suspension cells accumulate relatively little starch, and this fraction was washed extensively to ensure purity, which probably contributed to loss of both starch and bound DSP. However, as we argue below, the association between the DSP and starch is probably a regulatory interaction. This might be expected to consume only a minor fraction of the available DSP in the chloroplast. Finally, circumstantial evidence from glycogen-associated enzymes is consistent with this result. AMPK, only recently shown to bind glycogen, was originally purified as a soluble protein from a liver extract. Glycogen phosphorylase is bound to glycogen particles, yet much of it in a muscle or liver extract is in the soluble fraction. Finally, phosphorylase kinase, although capable of glycogen binding, is readily released from glycogen by dilution and slight pH shift.
Our results raise the interesting possibility that a number of enzymes may reside at the starch granule in plant cells. It is well known that a variety of proteins are bound to the starch granule (Kossmann and Lloyd, 2000; Tetlow et al., 2004). In particular, the starch-degradative enzymes GWD1 and GWD3 are starch granule-bound (Kotting et al., 2005; Lorberth et al., 1998; Ritte et al., 2000). Our phylogenetic analysis shows that proteins present in Arabidopsis, tomato and rice contain a domain closely related to those previously characterized in animal proteins as glycogen-binding. Two of these Arabidopsis proteins were previously classified as DSPs and shown to be similar to the animal protein laforin, based on primary sequence characteristics (Kerk et al., 2002). Our data show that the DSP At3g52180 is localized in chloroplasts, and our bioinformatic analysis suggests the presence of probable chloroplast transit signals on several other plant DSPs. Our biochemical results show directly that the putative carbohydrate-binding domain of At3g52180 can indeed bind starch. A wide assortment of gene-expression data support the co-regulation of DSP At3g52180 and the starch-degradative enzymes GWD1 and GWD3. In summary, all these results suggest the hypothesis that dual-specificity protein phosphatases with a widespread plant distribution might be able to gain access to plastids and bind starch, where they would be well situated to participate in critical interactions regulating its metabolism.
Multiple sequence alignment and phylogenetic tree inference
ClustalX and T-Coffee (Notredame et al., 2000) were used to generate small multiple sequence alignments from related sequence sets. The effect of scoring matrix and gap-penalty settings was examined using a preliminary data set, and default settings were found to be optimal and used thereafter. T-Coffee was used to evaluate alignments, which were then edited by hand and re-evaluated. The composition of small sequence sets was chosen to optimize the final evaluation score. Small sequence alignments were then combined using the profile alignment option of ClustalX. Alignments were edited to remove poorly aligned regions as revealed by T-Coffee analysis. The original approximately 100-amino-acid conserved domain yielded an alignment of approximately 65 residues after this procedure. Phylogenetic trees were inferred by the neighbor-joining (Saitou and Nei, 1987) component of ClustalX, or by maximum parsimony as implemented in the PHYLIP package (Felsenstein, 1996). Maximum-likelihood trees were inferred by the method of quartet puzzling as implemented in the program TREE-PUZZLE (Schmidt et al., 2002). Bayesian analysis was performed as implemented in the program MrBayes (Ronquist and Huelsenbeck, 2003).
Mining of gene-expression data
The gene-expression data generated with the ATH1 Arabidopsis Genome Array and cited by Smith et al. (2004a,b) were obtained from the NASC (European Arabidopsis Stock Center) website (http://affymetrix.arabidopsis.info/narrays/experimentpage.pl?experimentid=60). Additional Affymetrix gene-chip data were accessed at the AMPL website (Ward, 2001; http://www.cbs.umn.edu/arabidopsis). Arabidopsis gene numbers can be entered into the ‘search expression’ feature of this site, which then returns ‘panels’ of bar-graph data from a large set of experiments using the Affymetrix chip. Data were collected and analysed for the DSP genes At3g52180 and At3g01510, and a set of starch metabolic genes. Microsoft Excel files of numeric data corresponding to these graphic panels were kindly supplied by Dr John Ward, University of Minnesota.
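One simple way to quantify how closely the expression pattern of a DSP gene tracks that of a starch metabolic gene across a set of array experiments is a Pearson correlation coefficient. The sketch below is illustrative only. The choice of statistic is an assumption, because the text does not state how expression patterns were compared, and the signal values are hypothetical placeholders rather than data from the arrays.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length profiles."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("a constant profile has no defined correlation")
    return cov / sqrt(var_x * var_y)

# Hypothetical placeholder signals for six experiments (NOT real array data).
dsp_profile = [120.0, 180.0, 260.0, 310.0, 220.0, 140.0]
starch_gene_profile = [300.0, 420.0, 610.0, 700.0, 520.0, 330.0]

r = pearson(dsp_profile, starch_gene_profile)
print(f"Pearson r = {r:.3f}")
```

A value of r close to 1 would indicate that the two transcripts rise and fall together across the experiments. It says nothing about the mechanism of co-regulation.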
Cloning, expression and purification of DSP fragments
Three protein-expression clones were constructed. The first, designated DSP-WT, included the complete coding sequence of the Arabidopsis thaliana At3g52180 gene, plus the authentic stop codon, fused in-frame with an N-terminal GST tag. Primers for PCR amplification and engineering of the gene were designed based on the sequence of a full-length cDNA clone (U14967; GenBank Accession AY143878) (Yamada et al., 2003) obtained from the Arabidopsis Biological Resource Center (ABRC, Ohio State University). The sequence of the forward primer [GW-DSP(WT)-FOR] was 5′-GGGGACAAGTTTGTACAAAAAAGCAGGCTCTATGAATTGTCTTCAGAATC-3′. This primer included 31 bases designed to introduce an attB1 recombination site at the 5′ end of the PCR product for use in the Gateway plasmid recombination system (Invitrogen, Carlsbad, CA, USA). All primers were synthesized by Sigma-Genosys (The Woodlands, TX, USA). The reverse primer [GW-DSP(WT)-REV] was 5′-GGGGACCACTTTGTACAAGAAAGCTGGGTTTCAAACTTCTGCCTCAGAAC-3′, and included 30 bases designed to introduce an attB2 recombination site at the 3′ end of the PCR product for cloning. This primer pair was used in PCR with 4 ng plasmid template DNA and 25 μl 2× Easy-A Taq polymerase (Invitrogen) in a 50 μl reaction according to the manufacturer's protocol. The thermal cycler was programmed for one cycle at 94°C (2 min) followed by 20 cycles of: 94°C (30 sec), 56°C (30 sec), and 72°C (90 sec), followed by one cycle at 72°C (4 min), after which the reaction was kept at 4°C for short-term storage. A 5-μl sample of the PCR reaction was checked by standard methods (Sambrook and Russell, 2001) on 1% (w/v) agarose gel in 0.5× TBE buffer, followed by ethidium bromide staining to visualize the DNA.
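The structure of the DSP-WT primer pair can be checked with a short script. The sketch below splits each primer into its recombination adaptor and its gene-specific part, using the adaptor lengths given in the text (31 bases for the forward primer, 30 for the reverse). It then checks that the gene-specific part of the forward primer begins with a start codon, and that the gene-specific part of the reverse primer, read on the sense strand, ends with a stop codon. The function names are hypothetical, and treating the adaptor as a fixed-length prefix is a simplifying assumption.

```python
# Complement table for unambiguous DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(COMPLEMENT)[::-1]

def split_primer(primer, adaptor_len):
    """Split a primer into (adaptor, gene-specific part) at a fixed length."""
    return primer[:adaptor_len], primer[adaptor_len:]

# Primer sequences copied from the text (5' to 3').
GW_DSP_WT_FOR = "GGGGACAAGTTTGTACAAAAAAGCAGGCTCTATGAATTGTCTTCAGAATC"
GW_DSP_WT_REV = "GGGGACCACTTTGTACAAGAAAGCTGGGTTTCAAACTTCTGCCTCAGAAC"

fwd_adaptor, fwd_gene = split_primer(GW_DSP_WT_FOR, 31)
rev_adaptor, rev_gene = split_primer(GW_DSP_WT_REV, 30)

# The reverse primer anneals to the sense strand, so its reverse complement
# shows the 3' end of the coding sequence as it appears in the cDNA.
rev_gene_sense = reverse_complement(rev_gene)

print("forward gene-specific part:", fwd_gene)
print("3' end of coding sequence: ", rev_gene_sense)
print("starts with ATG:", fwd_gene.startswith("ATG"))
print("ends with stop codon:", rev_gene_sense[-3:] in ("TAA", "TAG", "TGA"))
```

The same check does not transfer directly to the truncation primers, where a new start or stop codon is engineered at the junction with the adaptor.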
Fusion proteins between the N-terminal GST tag and N- and C-terminal fragments of the DSP protein were generated by PCR and subcloned as follows. To produce the clone encoding the N-terminal 212 residues (lacking the CBD), designated DSP-N212, a new reverse primer was designed. This primer [GW-DSP(N212)-REV] had the sequence 5′-GGGGACCACTTTGTACAAGAAAGCTGGGTTAGTGTATGTCAACGCAACAG-3′ and was designed to introduce an attB2 recombination site at the 3′ end of the PCR product. This primer also engineered a new stop codon into the clone and was used in PCR, as described above, with the GW-DSP(WT)-FOR primer. To produce the clone encoding the C-terminal 126 residues (which included the CBD), designated DSP-C126, a new forward primer was designed. This primer [GW-DSP(C126)-FOR] had the sequence 5′-GGGGACAAGTTTGTACAAAAAAGCAGGCTCTATGAGGAAGACTGTTACTC-3′ and introduced an attB1 recombination site at the 5′ end of the PCR product, along with a new start codon. This primer was used in PCR, as described above, with the GW-DSP(WT)-REV primer.
Following PCR, the product DNA was purified from unincorporated primers and template DNA using spin columns (Ultraclean, MoBio Laboratories, Inc., Carlsbad, CA, USA) according to the manufacturer's protocol. Approximately 100 ng of each PCR product was reacted with 450 ng pDONR 221 vector (Invitrogen) in a BP clonase reaction according to the manufacturer's protocol. The insert in each of the resulting donor plasmids was transferred separately into the GST fusion vector pDEST 15 (Invitrogen) in an LR clonase reaction according to the manufacturer's protocol, to produce in-frame fusions between the GST and DSP sequences.
The plasmids derived from pDEST 15 (designated pGST–DSP-WT; pGST–DSP-N212; pGST–DSP-C126) were used for expression of GST fusion proteins by transforming into BL21-AI One-Shot competent cells (Invitrogen). Protein was expressed by growing cells in Luria-Bertani medium at 28°C to an OD600 of 0.3, followed by induction with 0.2% L-arabinose (Sigma) for 22 h. Cells were pelleted by centrifugation at 4500 g for 15 min, then resuspended in PBS plus 0.1 mM PMSF and 0.5 mM benzamidine, and snap-frozen. Cells were thawed and passed once through a French press at 15 000 psi, followed by centrifugation at 140 000 g to clarify. Extracts were mixed end-over-end for 1 h at 4°C with 1 ml glutathione Sepharose 4B (Amersham Biosciences, Piscataway, NJ, USA) previously equilibrated with PBS. After binding, the matrix was washed with 50 ml PBS, followed by 150 ml PBS plus 0.5 M NaCl, and finally with 25 ml PBS. Proteins were eluted with 10 mM glutathione in 50 mM Tris pH 8, dialysed into PBS, concentrated to <0.5 ml in a Centriprep 30 (Millipore, Bedford, MA, USA), then dialysed into PBS plus 50% glycerol for storage at −20°C. Proteins were electrophoresed on 10% SDS–PAGE and blotted to nitrocellulose and visualized by Ponceau S staining, or probed with a monoclonal anti-GST antibody diluted to 1 μg ml−1.
Mutagenesis was performed on pGST–DSP-C126 to produce single and double mutants. To make the W278G single mutation (where 278 refers to the amino acid position in the full-length protein) an oligonucleotide pair with sequences 5′-CTGGCCTTGACATTGGAGGAGGACAGAGGATACCTC-3′ and 5′-GAGGTATCCACTGTCCTCCTCCAATGTCAAGGCCAG-3′ was used in order to convert a UGG codon to GGA (note the corresponding underlined residues in the oligonucleotides) to produce the desired W to G substitution in the expressed protein. A QuikChange II XL site-directed mutagenesis kit (Stratagene, La Jolla, CA, USA) was used according to the protocol provided by the manufacturer. This reaction produced the plasmid designated pGST–DSP-C126(W278G). To make the K307A single mutation, an oligonucleotide pair with sequences 5′-CCTGAAGGACAGTTTGAATATGCTTACATCATAGATGGTGAATGG-3′ and 5′-CCATTCACCATCTATGATGTAAGCATATTCAAACTGTCCTTCAGG-3′ was used in order to convert an AAA codon to GCU (note the corresponding underlined residues in the oligonucleotides) to produce the desired K to A mutation. This plasmid was designated pGST–DSP-C126(K307A). The single-mutation plasmid pGST–DSP-C126(W278G) was then used as the template, in a new reaction with the K307A oligonucleotide pair, to produce the double mutant, designated pGST–DSP-C126(W278G/K307A). All three plasmids were sequenced to confirm the presence of the desired mutations and the absence of undesired mutations, and were used to express recombinant proteins as described above.
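Mutagenic oligonucleotide pairs of this kind are usually designed as exact reverse complements, with the altered codon near the middle of the sequence. As an illustration, the sketch below tests both properties for the K307A pair as printed above. The reading frame of the oligonucleotide is not given in the text, so the alanine codon (GCT in DNA, GCU in the transcript) is located by a plain substring search, which is an assumption. The helper name is hypothetical.

```python
# Complement table for unambiguous DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(COMPLEMENT)[::-1]

# K307A oligonucleotide pair copied from the text (5' to 3').
K307A_SENSE = "CCTGAAGGACAGTTTGAATATGCTTACATCATAGATGGTGAATGG"
K307A_ANTISENSE = "CCATTCACCATCTATGATGTAAGCATATTCAAACTGTCCTTCAGG"

# The two oligonucleotides should anneal to each other over their full length.
is_complementary = reverse_complement(K307A_ANTISENSE) == K307A_SENSE

# Locate the introduced alanine codon and measure the flanking sequence.
codon_start = K307A_SENSE.find("GCT")
left_flank = codon_start
right_flank = len(K307A_SENSE) - (codon_start + 3)

print("pair is fully complementary:", is_complementary)
print("GCT occurrences:", K307A_SENSE.count("GCT"))
print("flanking bases (5', 3'):", left_flank, right_flank)
```

Flanks of equal length on either side of the changed codon are consistent with the usual design of such primers, in which the mismatch sits centrally so that both arms anneal to the template.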
The mutated DSP–CT GST fusion proteins were expressed and purified as for the wild-type protein and shown to be pure based on SDS–PAGE analysis (data not shown). GST was expressed as described by Bridges et al. (2005) and purified as for the GST–DSP fusion proteins.
Non-hydrolysed corn starch (Sigma) was made to 200 mg ml−1 in PBS and mixed with DSP fragments (approximately 0.5 μg each) in PBS so that the final concentration of starch was 10 mg ml−1. As the WT and N-terminal fragment (DSP domain) were not pure, as judged by SDS–PAGE and Ponceau S staining (Figure 3b), the following amount of each DSP preparation was added to each binding assay to give an equivalent amount of DSP fragment, based on the degree of purity [WT, 1 μg; N-terminal, 5 μg; C-terminal, 0.5 μg]. Binding was performed at room temperature by mixing end-over-end for 30 min. Starch was pelleted by centrifugation at 14 000 rpm for 2 min, and the supernatant removed and boiled in SDS cocktail. The pellets were washed five times with 500 μl PBS by gentle vortexing and centrifugation, as above. The starch pellets were boiled in SDS cocktail to obtain a volume equal to the supernatants for SDS–PAGE and blotting analysis.
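The quantities in this assay follow from two simple relations: the dilution rule C1·V1 = C2·V2 for the starch stock, and a purity correction for the protein load. The sketch below works through both. The 200 μl assay volume is a hypothetical value chosen for illustration, since the text does not state one. The purity fractions are not reported in the text either. They are only what the stated loads would imply if each assay was meant to contain about 0.5 μg of DSP fragment.

```python
def stock_volume(final_conc, final_volume, stock_conc):
    """Volume of stock needed, from C1 * V1 = C2 * V2."""
    return final_conc * final_volume / stock_conc

def implied_purity(target_ug, loaded_ug):
    """Fraction of the loaded protein that must be the DSP fragment."""
    return target_ug / loaded_ug

# Starch: 200 mg/ml stock diluted to 10 mg/ml final (a 20-fold dilution).
ASSAY_VOLUME_UL = 200.0  # hypothetical assay volume, not stated in the text
starch_ul = stock_volume(10.0, ASSAY_VOLUME_UL, 200.0)

# Protein: total amount of each preparation loaded per assay, from the text.
TARGET_UG = 0.5
loaded_ug = {"WT": 1.0, "N-terminal": 5.0, "C-terminal": 0.5}
purity = {name: implied_purity(TARGET_UG, ug) for name, ug in loaded_ug.items()}

print(f"starch stock per {ASSAY_VOLUME_UL:.0f} ul assay: {starch_ul:.1f} ul")
for name, fraction in purity.items():
    print(f"{name}: implied purity about {fraction:.0%}")
```

Under these assumptions the loads correspond to preparations that are roughly 50%, 10% and 100% DSP fragment for the WT, N-terminal and C-terminal proteins, respectively.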
Amylose- (type III, Sigma), amylopectin- (Fluka, St. Louis, MO, USA) and pullulan- (Sigma) binding experiments were performed exactly as described for starch, except that the pullulan was pelleted by centrifugation at 75 000 rpm for 30 min at 4°C in a Beckman tabletop centrifuge and the pellet washed 1× with PBS.
DSP–CT antibody production and affinity purification
The DSP–CT GST fusion was expressed and purified as above, and used to make antibodies in a rabbit using standard procedures (Smith et al., 2004a). The diluted crude immune serum readily detected 5 ng DSP–CT GST fusion protein, and weakly detected 50 ng GST alone or another GST fusion protein (Figure 4a). To remove any antibodies that recognize GST, 5 ml crude serum was diluted twofold with 50 mM Tris–Cl pH 7.5, 0.2% (v/v) Triton X-100, and incubated for 4 h end-over-end with 1 mg GST protein previously coupled to 1 ml CH-Sepharose (Amersham-Pharmacia, Piscataway, NJ, USA). After this incubation, the slurry was poured into a column and the breakthrough fraction collected and used for Western analysis. The rabbit pre-immune serum was prepared and used in an identical fashion. For affinity purification of antibodies, 1 mg DSP–CT was coupled to 1 ml CH-Sepharose using the manufacturer's instructions. Crude serum (5 ml), previously depleted of GST antibodies (above), was incubated with the matrix for 2 h end-over-end, then poured into a column and washed with 10 mM Tris–Cl pH 7.5 plus 0.5 M NaCl. Antibodies were eluted from the matrix with 100 mM glycine pH 1.8, followed by immediate neutralization. Protein was determined with the Bradford assay using BSA as standard. Affinity-purified antibodies were diluted to 0.1 μg ml−1 and secondary antibodies 3000-fold in Western blocking buffer containing 5% (w/v) milk powder.
Immunofluorescence experiments using DSP–CT antibodies
Immunological staining of fixed cell samples was performed as follows. A. thaliana suspension cells, grown as described by Smith et al. (2003), were fixed by adding a 4% (w/v) formaldehyde solution made in Arabidopsis cell culture media. Cells were fixed for 15 min before being washed with PBS (3 × 1 ml). The cell wall was partially digested with 500 μl of a 0.1% (w/v) pectolyase Y-23 (Seishia Pharmaceuticals, Tokyo, Japan) solution made in PBS, for 15 min at 30°C. The cell membrane was permeabilized by incubation with 1% (v/v) Triton X-100 for 5 min at room temperature. The cells were then washed with PBS (3 × 1 ml), blocked with 5% (w/v) milk powder in PBS for 10 min, then incubated overnight at 4°C in affinity-purified anti-DSP–CT antibodies (0.1 μg ml−1) or affinity-purified antibodies that were blocked with a 10-fold excess DSP–CT. The following day, cells were washed with PBS (3 × 1 ml) and incubated for 1 h at room temperature with anti-mouse secondary antibody conjugated to the fluorophore Alexa-488 (Molecular Probes, Eugene, OR, USA) and diluted in blocking buffer. The samples were washed again in PBS (5 × 1 ml) before being analysed with a fluorescence microscope (Leica DMR, Wetzlar, Germany) fitted with the FITC filter set. For the observation of chloroplasts, chlorophyll fluorescence was visualized using the UV filter set. Images were captured using a cooled CCD camera (Retiga 1350 EX; Qimaging, Burnaby, Canada), and image enhancement and deconvolution confocal algorithm manipulations were performed using the Openlab software package (ver. 3.0; Improvision, Lexington, MA, USA).
Isolation of starch particles
Wild-type A. thaliana (ecotype Columbia) plants were grown in growth chambers at 24°C under 24-h light at 150 μE. Arabidopsis plants were seeded into soil mix PM-05 Arabidopsis growing medium (Lehle Seeds, Round Rock, TX, USA) and plants were fertilized once a week with 20-19-18 Peters professional fertilizer. One month after the seeding date, Arabidopsis plants were harvested, weighed, and immediately frozen in liquid nitrogen. Plants were then stored at −80°C. Starch was isolated from Arabidopsis plants using an established procedure (Zeeman et al., 1998). Briefly, plants were homogenized in a blender with 2 vol extraction buffer (50 mM Tris–Cl pH 7.5, 1 mM EDTA pH 7.5, 1 mM EGTA pH 7.4, 0.1% (v/v) 2-mercaptoethanol). The extract was filtered through two layers of Miracloth and the supernatant was then filtered through a 37-μm nylon mesh. The insoluble material remaining on the mesh was washed with extraction buffer to remove any residual starch. This wash was combined with the initial supernatant filtrate and starch was pelleted by centrifugation for 10 min at 4500 rpm. The white starch pellet from the spin step was resuspended in extraction buffer, filtered through the nylon mesh, washed as above, and re-pelleted by centrifugation at 4500 rpm for 15 min at 4°C. This step was repeated twice to obtain a purified starch pellet, which was resuspended and boiled in SDS sample buffer. This sample is denoted the seedling starch fraction. The same procedure was performed to isolate starch from Arabidopsis suspension cells, except the cells were broken by one pass through a French pressure cell (Smith et al., 2003).
D.K. was supported by the National Science Foundation (NSF ROA DBI-9975808; MCB-0209686). T.R.C. was supported by the National Science Foundation (MCB-0209686). G.M. is funded by the Natural Sciences and Engineering Research Council of Canada. ABRC (Ohio State University) provided cDNA clone U14967. The authors gratefully acknowledge Dr John Ward of the University of Minnesota for his kind assistance with gene-expression data.