Address correspondence and reprint requests to Seth G. N. Grant, The Wellcome Trust Sanger Institute, Hinxton, CB10 1SA, UK. E-mail: email@example.com
Characterization of the composition of the postsynaptic proteome (PSP) provides a framework for understanding the overall organization and function of the synapse in normal and pathological conditions. We have identified 698 proteins from the postsynaptic terminal of mouse CNS synapses using a series of purification strategies and analysis by liquid chromatography tandem mass spectrometry and large-scale immunoblotting. Some 620 proteins were found in purified postsynaptic densities (PSDs), nine in AMPA-receptor immuno-purifications, 100 in isolates using an antibody against the NMDA receptor subunit NR1, and 170 by peptide-affinity purification of complexes with the C-terminus of NR2B. Together, the NR1 and NR2B complexes contain 186 proteins, collectively referred to as membrane-associated guanylate kinase-associated signalling complexes. We extracted data from six other synapse proteome experiments and combined these with our data to provide a consensus on the composition of the PSP. In total, 1124 proteins are present in the PSP, of which 466 were validated by their detection in two or more studies, forming what we have designated the Consensus PSD. These synapse proteome data sets offer a basis for future research in synaptic biology and will provide useful information in brain disease and mental disorder studies.
An important goal of cognitive science is to identify neural mechanisms for processing information. The synapse both transmits information between cells and processes it by detecting specific patterns of neural activity and converting this electrical activity into intracellular biochemical events that change the properties of the neuron (Greengard 2001; Kandel 2001). In recent years it has become clear that synapses, like other cell–cell interactions in metazoans, utilize signal transduction complexes and pathways with a high degree of molecular complexity and cross-talk (Pawson and Nash 2003; Sheng and Kim 2002). A major challenge is to devise strategies that extract emergent physiological properties and simple biological principles from highly complex genomic and proteomic data sets, and shed light on both mechanisms and disease processes. The synapse proteome is a suitable prototype to explore these general issues for several reasons. First, it contains a highly localized set of proteins found in dendritic spines. Second, an important role for signalling complexes and pathways has been established. Third, signalling can be studied using patterns of action potentials and, finally, genetic and pharmacological perturbations result in behavioural changes. However, the molecular complexity and global organization of the synapse proteome are not well understood.
The neurotransmitter glutamate induces synaptic plasticity primarily via ionotropic NMDA receptors and metabotropic glutamate receptors (mGluRs). This leads to an increased Ca2+ level in the dendritic spine and signal transduction to α-amino-3-hydroxy-5-methylisoxazole-4-propionate (AMPA) receptors and other effector mechanisms. The cytoplasmic C-termini of NMDA receptor subunits (NR2A/ɛ1, NR2B/ɛ2) bind PDZ domains of PSD-95, a membrane-associated guanylate kinase (MAGUK) protein. Previous proteomic analysis of NMDA receptor–PSD-95 complexes identified 77 proteins, including mGluRs, whereas AMPA receptors were found in different complexes (Husi et al. 2000). These complexes are embedded, together with other postsynaptic proteins, in the postsynaptic terminal of excitatory synapses.
Here we present a large-scale proteomic analysis of the postsynaptic proteome (PSP) of mouse brain. We have performed an MS-based analysis of MAGUK-associated signalling complexes (MASCs), AMPA receptor complexes (ARCs) and isolated PSDs, and validated a set of MASC and PSD proteins by large-scale immunoblotting. In addition to generating novel data, we have systematically compiled data from publicly available sources to produce a comprehensive data set of the composition of the PSP. This includes data on the PSD and various subcomponents, namely NMDA receptor complexes (NRCs), ARCs and MASCs.
MASCs were isolated using a peptide affinity method modeled on the PDZ-binding C-terminus of NR2 subunits, as reported elsewhere (Husi et al. 2000). ARCs were isolated by immuno-precipitation of AMPA-receptor subunits with an anti-GluR2 antibody (BD Biosciences, Oxford, UK). PSD fractions were prepared as described previously (Carlin et al. 1980). Protein samples were separated by sodium dodecyl sulfate–polyacrylamide gel electrophoresis (SDS–PAGE) and stained with colloidal Coomassie blue. The entire gel lane was cut into 42 slices, each of which was reduced, alkylated and digested with trypsin.
The resulting peptide mixtures were analysed by on-line liquid chromatography tandem MS (LC–MS/MS) to generate peptide sequence information (Link et al. 1999). Chromatographic separations of the peptide mixture were performed on a 180-µm (intensely stained bands) or 75-µm (weakly stained bands) PepMap column using an Ultimate LC system (LC Packings, Amsterdam, The Netherlands) delivering a gradient of formic acid (0.05%) and acetonitrile. The LC column was connected directly to a micro-electrospray interface on a Q-TOF mass spectrometer (Micromass Ltd, Manchester, UK). The mass spectrometer operated in automated function-switching mode, selecting precursor ions based on intensity for peptide sequencing by tandem MS. Several hundred MS/MS spectra could be generated per run, allowing the analysis of complex mixtures without any prior interpretation. The search engine used in this work was Mascot (http://www.matrixscience.com/). Data were reduced to MassLynx ‘PKL’ format peak lists before searching. Proteins were identified by matching the MS/MS data with mass values calculated for selected fragment-ion series of candidate peptides. The searches on an in-house non-identical protein database were performed without applying any constraints on molecular weight or species of origin. Most proteins were identified with several peptide matches, although a few proteins were assigned on the basis of a single peptide provided that a near-complete peptide sequence had been obtained.
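To illustrate what "matching MS/MS data with mass values calculated for selected ion series" involves, the sketch below computes singly charged b- and y-ion m/z values for candidate peptides and counts how many observed peaks fall within a tolerance. This is a deliberately simplified peak-counting illustration, not Mascot's probability-based scoring. The 0.1-Da tolerance, the assumption that cysteines are carbamidomethylated (iodoacetamide alkylation), and the function names are all hypothetical choices for illustration.

```python
# Monoisotopic residue masses (Da). Cysteine is ASSUMED to carry a
# carbamidomethyl group (+57.02146) from iodoacetamide alkylation.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 160.03065, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON = 1.007276   # proton mass (Da)
WATER = 18.010565   # monoisotopic mass of H2O (Da)


def fragment_ions(peptide):
    """Return singly charged b- and y-ion m/z values for a peptide."""
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    b_ions, y_ions = [], []
    total = 0.0
    for m in masses[:-1]:            # b1 ... b(n-1): N-terminal fragments
        total += m
        b_ions.append(total + PROTON)
    total = 0.0
    for m in reversed(masses[1:]):   # y1 ... y(n-1): C-terminal fragments
        total += m
        y_ions.append(total + WATER + PROTON)
    return b_ions, y_ions


def count_matched_peaks(peptide, observed_mz, tol=0.1):
    """Count observed peaks within `tol` Da of any predicted b or y ion."""
    b_ions, y_ions = fragment_ions(peptide)
    predicted = b_ions + y_ions
    return sum(any(abs(mz - p) <= tol for p in predicted) for mz in observed_mz)


def best_candidate(candidates, observed_mz, tol=0.1):
    """Hypothetical helper: the candidate explaining the most observed peaks."""
    return max(candidates, key=lambda pep: count_matched_peaks(pep, observed_mz, tol))


print(best_candidate(["GK", "AK"], [58.03, 147.11]))  # GK
```

In a real search the candidates would come from an in silico tryptic digest of the sequence database, and the score would weigh the chance that a match occurs at random rather than simply counting peaks.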
Protein samples were subjected to reducing SDS–PAGE and transferred to polyvinylidene difluoride (PVDF) membrane (Bio-Rad, Hercules, CA, USA) at 4°C for 90 min at 75 V in 10% (v/v) methanol, 10 mM 3-(cyclohexylamino)-1-propanesulfonic acid (CAPS), pH 11.0. Primary antibodies were diluted between 1 : 100 and 1 : 1000, depending on the quality of the IgGs. Signals were detected using peroxidase-linked secondary IgGs and enhanced chemiluminescence.
Composition of the PSP
We isolated MASCs using a previously described peptide affinity method (Husi and Grant 2001). We used a peptide corresponding to the six C-terminal residues of NR2B, which binds MAGUK proteins (including PSD-95), as a specific affinity ligand to isolate NMDA receptor–MAGUK-containing complexes. Purified complexes were separated by SDS–PAGE, and bands were excised, digested and analysed by MS. Western blotting of peptide-purified complexes for candidate proteins was also performed. These complexes are similar in composition to NRCs and MAGUK complexes isolated with antibodies (Husi and Grant 2001; Husi et al. 2000). Of the 100 proteins previously identified in the NRC by analysis of anti-NR1 immuno-purifications, 84 were also found by the peptide approach. In addition, 86 new proteins were identified by the peptide approach, many of which were in the molecular weight range obscured by the contaminating antibody chains in the immuno-purification strategy. Because of the large overlap, we combined the data sets of proteins found in both the NRC and the MAGUK-pep (NR2B peptide-affinity purification) under the term MASCs. Ninety-five MASC proteins were detected by western blotting of purified complexes and 35 of these were validated by MS data. The MASC set represents a comprehensive account of NMDA receptor–MAGUK complexes identified by large-scale western blotting and MS-based analyses of both antibody and peptide affinity-based purifications.
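The size of the combined MASC set follows from inclusion–exclusion on the two purifications: the 100 NRC proteins plus the 170 MAGUK-pep proteins (84 shared with the NRC and 86 new), minus the 84 proteins that would otherwise be counted twice. A minimal arithmetic check, using only the counts given in the text (the function name is illustrative):

```python
def union_size(n_a, n_b, n_shared):
    """|A ∪ B| = |A| + |B| - |A ∩ B| (inclusion–exclusion)."""
    if n_shared > min(n_a, n_b):
        raise ValueError("the overlap cannot exceed either set")
    return n_a + n_b - n_shared


nrc = 100                 # anti-NR1 immuno-purification
maguk_pep = 84 + 86       # shared with the NRC + new by the peptide approach
masc = union_size(nrc, maguk_pep, 84)
print(maguk_pep, masc)    # 170 186
```

With actual identifier sets the same quantity is simply `len(nrc_ids | pep_ids)`; the count-based form is shown because only the counts are reported here.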
We performed immuno-precipitation of the ARC with an antibody directed against the GluR2 subunit and found a much smaller complex of nine proteins by MS analysis, which included all four AMPA receptor subunits.
In order to gain an overall view of the PSP and to validate the components of the MASCs and ARCs, we performed a systematic analysis of purified PSDs. PSD proteins were separated by SDS–PAGE (Fig. 1) and 37 gel slices were analysed by LC–MS/MS. Some 7402 peptides were identified, corresponding to 620 non-redundant protein identifications (hereafter referred to as the G2C PSD data set), with an average of 34 protein identifications per gel slice (an average of 16.7 unique proteins per slice) (Supplementary Table 1). These data sets are available at http://www.genes2cognition.org/science/proteomics/synapse.html.
Integration of PSP data sets
To evaluate the coverage of our PSD data set, we compiled data from six published proteomic studies of the PSD and 119 individual papers reporting proteins localized to the PSD. A number of different sequence databases were used in these studies for searching peptide mass spectra and, as a result, we decided to use UniProt (http://www.expasy.uniprot.org/) as our standard protein identifier. In a few cases protein identifiers could not be mapped to UniProt, so their original identifiers were retained. We observed redundancy in the data sets, with the same protein accession numbers present numerous times, as well as many accession numbers corresponding to the same gene product. We then mapped UniProt accession numbers to Unigene (http://www.ncbi.nlm.nih.gov/) via EMBL and Entrez Gene to cluster the merged Total PSD data set and remove redundancy. This process grouped numerous entries into Unigene clusters and reduced the data set to 1124 non-redundant entries (Supplementary Tables 2 and 4). An evaluation of the relative coverage of each component PSD data set is presented in Supplementary Table 3. This matrix of percentage coverage shows that the G2C data set has the highest coverage of each data set in the group, indicating that this set of proteins is the most representative and complete single data set.
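The redundancy-removal and coverage steps can be sketched as follows. The sketch assumes a precomputed accession-to-cluster lookup (here a plain dictionary standing in for the UniProt to Unigene mapping), keeps unmapped identifiers as they are, and defines coverage of data set j by data set i as the percentage of j's entries also present in i; that definition, the identifiers and the function names are hypothetical illustrations.

```python
def collapse_to_clusters(accessions, to_cluster):
    """Map accessions to cluster IDs; unmapped accessions keep their original ID."""
    return {to_cluster.get(acc, acc) for acc in accessions}


def coverage_matrix(datasets):
    """coverage[i][j] = % of data set j's entries that are also in data set i."""
    return {
        i: {j: 100.0 * len(a & b) / len(b) for j, b in datasets.items() if b}
        for i, a in datasets.items()
    }


# Hypothetical accession -> cluster mapping (two accessions share one gene).
mapping = {"ACC1": "CL_A", "ACC1b": "CL_A", "ACC2": "CL_B", "ACC3": "CL_C"}

study_x = collapse_to_clusters(["ACC1", "ACC1b", "ACC2", "ACC9"], mapping)
study_y = collapse_to_clusters(["ACC1", "ACC3"], mapping)
total = study_x | study_y          # merged, non-redundant "Total" set

cov = coverage_matrix({"X": study_x, "Y": study_y})
print(len(total), cov["X"]["Y"])   # 4 50.0
```

Collapsing to clusters before taking unions is what prevents two accessions for the same gene product from being counted as two proteins.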
We addressed the issue of bias arising from the use of different methodologies by comparing the distribution of protein identifications. The analytical approaches used in these six studies ranged from one- and two-dimensional electrophoresis for protein separation and isotope-coded affinity tags (ICAT) for sample simplification, to a number of peptide chromatographic and MS methods (Supplementary Table 3). We compared our data set (one-dimensional SDS–PAGE and LC–MS/MS), the data set of Yoshimura et al. (2004) (multidimensional liquid chromatography–MS/MS) and the data set of Li et al. (2004) (two-dimensional gel/ICAT with matrix-assisted laser desorption ionization–time of flight and LC–MS/MS) by generating a two-dimensional scatter plot of log Mr versus pI (Supplementary Fig. 1). The two-dimensional LC–MS/MS approach used by Yoshimura et al. (2004) identifies many low molecular weight proteins, because it does not include a gel electrophoresis step in which these proteins would be lost. Our data set is biased towards integral membrane proteins, which represent 24% of our data set. Both two-dimensional LC and two-dimensional gel electrophoresis are poorer approaches for identifying membrane proteins, yielding 17 and 7% transmembrane proteins, respectively.
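Each point on a log Mr versus pI plot is derived from the protein sequence. The sketch below computes an average molecular mass from standard average residue masses and estimates pI by bisecting for the pH at which the Henderson–Hasselbalch net charge crosses zero. The pKa set (EMBOSS-style values) is an assumption; different tools use different pKa tables, so computed pI values are approximate and tool-dependent, and post-translational modifications are ignored.

```python
import math

# Average residue masses (Da) and average mass of water.
AVG_RESIDUE = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594,
    "N": 114.1038, "D": 115.0886, "Q": 128.1307, "K": 128.1741,
    "E": 129.1155, "M": 131.1926, "H": 137.1411, "F": 147.1766,
    "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
AVG_WATER = 18.01524

# ASSUMED pKa set (EMBOSS-style); other tools differ.
PKA_BASIC = {"K": 10.8, "R": 12.5, "H": 6.5}
PKA_ACIDIC = {"D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}
PKA_N_TERM, PKA_C_TERM = 8.6, 3.6


def average_mass(seq):
    """Average molecular mass (Da) of an unmodified polypeptide."""
    return sum(AVG_RESIDUE[aa] for aa in seq) + AVG_WATER


def net_charge(seq, ph):
    """Henderson–Hasselbalch net charge of a polypeptide at a given pH."""
    positive = 1 / (1 + 10 ** (ph - PKA_N_TERM))
    positive += sum(seq.count(aa) / (1 + 10 ** (ph - pka)) for aa, pka in PKA_BASIC.items())
    negative = 1 / (1 + 10 ** (PKA_C_TERM - ph))
    negative += sum(seq.count(aa) / (1 + 10 ** (pka - ph)) for aa, pka in PKA_ACIDIC.items())
    return positive - negative


def isoelectric_point(seq, tol=1e-4):
    """Bisect for the pH of zero net charge (net charge falls as pH rises)."""
    low, high = 0.0, 14.0
    while high - low > tol:
        mid = (low + high) / 2
        if net_charge(seq, mid) > 0:
            low = mid
        else:
            high = mid
    return (low + high) / 2


def scatter_point(seq):
    """(pI, log10 Mr) coordinates for one protein on a log Mr versus pI plot."""
    return isoelectric_point(seq), math.log10(average_mass(seq))
```

Bisection is sufficient here because the net charge decreases monotonically with pH, so there is exactly one crossing between pH 0 and 14.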
The distribution of protein identifications in the Total PSD data set shows that the majority of proteins were detected only once (58%) (Fig. 2a) and that 198 proteins (18%) were detected twice. Just 72 proteins (6%) were detected five times or more, although this number is limited by the smaller data sets included in the Total PSD data set. In order to define a set of higher-confidence proteins, we placed proteins that were identified two or more times (466 proteins) in a group termed the Consensus PSD (cPSD). We reasoned that the majority of these proteins are abundant, and so more likely to be detected in multiple analyses, whereas proteins detected only once are perhaps of lower abundance or may reflect differences in sample preparation. The coverage of each individual data set with respect to the consensus and Total PSD data sets is illustrated in Fig. 2(b). Some 363 proteins in our data set have been validated by their presence in one or more of the other data sets, representing 78% coverage of the cPSD. In addition, our data set contains 257 unique proteins, representing 39% of proteins detected once.
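The consensus definition can be expressed directly: after collapsing identifiers to one entry per gene, count how many data sets report each protein and keep those reported at least twice. The sketch below uses hypothetical toy identifiers rather than real PSD data, and the function names are illustrative.

```python
from collections import Counter


def detection_counts(datasets):
    """For each (cluster-collapsed) protein, count the data sets reporting it."""
    counts = Counter()
    for proteins in datasets.values():
        counts.update(set(proteins))  # count each protein at most once per study
    return counts


def consensus_set(datasets, min_studies=2):
    """Proteins reported by at least `min_studies` data sets."""
    counts = detection_counts(datasets)
    return {p for p, n in counts.items() if n >= min_studies}


def unique_to(name, datasets):
    """Proteins reported only by the data set called `name`."""
    others = set().union(*(s for k, s in datasets.items() if k != name))
    return set(datasets[name]) - others


# Toy data with hypothetical identifiers, not real PSD data.
studies = {
    "A": {"p1", "p2", "p3"},
    "B": {"p2", "p3", "p4"},
    "C": {"p3", "p5"},
}
cpsd = consensus_set(studies)
histogram = Counter(detection_counts(studies).values())
print(sorted(cpsd), dict(sorted(histogram.items())), sorted(unique_to("A", studies)))
# ['p2', 'p3'] {1: 3, 2: 1, 3: 1} ['p1']
```

Applied to the figures above, 363 of the 466 cPSD proteins gives 78% coverage, and 257 of the 658 (1124 − 466) proteins detected once gives 39%.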
Verification of PSP proteins
We sought to verify the subcellular location of a selected set of PSD and MASC proteins by large-scale immunoblotting of whole extract, synaptosomes and PSDs from mouse forebrains. Forty-eight commercially available antibodies were tested on these three fractions to demonstrate the presence and enrichment of postsynaptic proteins (Supplementary Table 5). Overall, we found that 21 proteins (44%) were clearly enriched, eight (17%) showed similar abundance, 13 (27%) were present but not enriched and six (12%) were undetectable in isolated PSDs compared with other brain fractions (Supplementary Figs 2 and 3). Thirty-three of the 48 proteins tested are cPSD proteins and 15 of these displayed enrichment in the PSD fraction. Proteins in the MASC data set were identified in purifications from whole-brain extract and, although 144 of 186 have been verified by their identification in the PSD fraction, the remainder of this data set requires validation. Fifteen antibodies against MASC proteins were used to confirm their presence in the PSD; of these 15 proteins, eight had not been detected in the PSD by MS and seven had been detected in the PSD by MS in only one study. We confirmed the PSD localization of 14 of the 15 MASC proteins tested (Supplementary Figs 2 and 3), indicating that the majority of MASC proteins are probably bona fide PSD proteins.
In addition to verification by immunoblotting, we performed extensive searching in PubMed for PSD localization data for these 1124 proteins (Supplementary Table 2). We found that 119 (10.6%) of these proteins have been previously reported to be localized to the PSD. As shown in Supplementary Table 3, our analysis of the PSD identified more known PSD proteins (66% of PSD proteins from the literature) than did other similar analyses, indicating the high coverage of our data set.
Classification of PSP proteins
The initial approach to characterizing the organization of the PSP was to classify the proteins into categories that describe their known functional properties (Table 1). Within the overall PSP, we observed many classes of proteins representing a broad range of cell biological functions: membrane-bound receptors, adhesion proteins and channels; a plethora of signalling proteins and adaptors; and proteins involved in the regulation of transport, RNA metabolism, translation and transcription. In most classes the cPSD contains numbers of proteins similar to those in our PSD data set, with the exception of channels and receptors, signalling proteins and synaptic vesicle proteins. This becomes evident when the cPSD is compared with the Total PSD; the reduced presence of these protein classes in the cPSD is most probably due to the lower abundance of these proteins, and thus a decreased likelihood of detection. This supports the idea that the cPSD represents a consensus of protein identifications which, on one hand, decreases individual data set bias for contaminating proteins (nuclear or mitochondrial) but, on the other, has limited coverage of some protein classes because of abundance issues. Notably, the G2C PSD data set contributed all three tyrosine kinases found in the PSD, an indication of the sensitivity of our approach, as tyrosine kinases are of relatively low abundance. The G2C PSD data set also covers most protein classes more extensively than the individual data sets that constitute the Total PSD.
Table 1. Summary of PSD protein classifications
Proteins found in the Total PSD data set, our PSD data set and the cPSD data set are classified functionally to show protein class enrichment. The distributions of our PSD data set and cPSD protein classes are similar except for signalling molecules and enzymes, which are particularly enriched in our PSD data set. In addition, the main contribution of low-abundance proteins such as tyrosine kinases and phosphatases to the Total PSD data set is from our PSD data set. In fact, our study was the only one to identify tyrosine kinases in the PSD, further supporting the increased sensitivity of our approach. Details for individual proteins and further subclassifications are found in Supplementary Table 2.
Channels and receptors
G proteins and modulators
Signalling molecules and enzymes
Transcription and translation
Cytoskeletal and cell adhesion molecules
Synaptic vesicles and protein transport
Protein domain analysis
Do synapse proteomes contain particular functional subsets of proteins? We examined their protein domain profiles using the InterPro resource (http://www.ebi.ac.uk/interpro). All domains in three PSD data sets [G2C PSD (594 proteins with 1709 domains), the Total PSD (1032 proteins with 2786 domains) and the cPSD (447 proteins with 1298 domains)] were identified, ranked by frequency (% of proteins in each PSD data set with a specific domain) and compared with the UniProt mouse proteome (http://www.ebi.ac.uk/integr8). The enrichment of the top 10 domains found in the cPSD was compared with that in the Total PSD and G2C PSD data sets (Supplementary Fig. 4). The greatest enrichment in these data sets is for protein interaction domains and, in particular, the cPSD has a striking enrichment of PDZ and SH3 domains (7.7- and 7.2-fold, respectively). This agrees with the notion that the cPSD contains abundant proteins such as synaptic scaffolding proteins containing PDZ and SH3 domains. The most common domains in all three data sets are associated with kinase function, and the enrichment profile for these domains is similar across these data sets. This enrichment in kinase domains correlates with enrichment in other domains commonly found in protein kinases that allow them to interact in networks (e.g. SH3 and PDZ interaction domains; Manning et al. 2002).
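The fold-enrichment figures are ratios of two frequencies: the fraction of proteins in a PSD data set carrying a domain, divided by the fraction of proteins in the background proteome carrying it. A minimal sketch of that calculation follows; the protein names, domain assignments and background frequencies are hypothetical toy values, not InterPro output.

```python
from collections import Counter


def domain_frequencies(domains_by_protein):
    """Fraction of proteins carrying each domain (a protein counts once per domain type)."""
    n_proteins = len(domains_by_protein)
    counts = Counter(d for domains in domains_by_protein.values() for d in set(domains))
    return {d: c / n_proteins for d, c in counts.items()}


def fold_enrichment(subset_freq, background_freq):
    """Domain frequency in the subset divided by its frequency in the background."""
    return {
        d: f / background_freq[d]
        for d, f in subset_freq.items()
        if background_freq.get(d, 0) > 0  # skip domains absent from the background
    }


# Hypothetical toy input: protein -> domain names.
psd = {
    "protA": {"PDZ", "SH3"},
    "protB": {"PDZ"},
    "protC": {"Protein kinase"},
    "protD": set(),   # proteins with no annotated domain still count in the denominator
}
# Hypothetical background frequencies; real values would come from the mouse proteome.
background = {"PDZ": 0.05, "SH3": 0.05, "Protein kinase": 0.025}

enrichment = fold_enrichment(domain_frequencies(psd), background)
for domain, fold in sorted(enrichment.items(), key=lambda kv: -kv[1]):
    print(f"{domain}: {fold:.1f}-fold")
```

Computing both frequencies per protein (rather than per domain occurrence) keeps proteins with repeated domains, such as multiple PDZ domains, from inflating the ratio.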
Importantly, domains involved in Ca2+-dependent signalling, a major feature of postsynaptic signal transduction, were also highly abundant (C2, C2 calcium/lipid-binding). Similarly, members of the Ras GTPase superfamily were enriched by over 5-fold in the Total PSD data set. Conversely, some domains that are highly abundant in the mouse proteome were absent from all PSD data sets, such as olfactory receptor and homeobox domains (the second and 19th most abundant domains, respectively). The same signalling domains that are enriched at the synapse are known to have expanded in the genome with the evolutionary step from single-celled to multicellular organisms (Manning et al. 2002). Some 62% of domains found in proteins contained in the Total PSD are present in Saccharomyces cerevisiae, indicating that a large proportion of the basic building blocks of mammalian synaptic proteins are conserved from before the metazoan expansion. The complement of these domains increases after this expansion, with Caenorhabditis elegans, Danio rerio and Drosophila melanogaster having 85, 86 and 87% of Total PSD protein domains, respectively. This indicates that after this expansion there was an increase of the order of 20–30% in the appearance of new domains, as well as domain shuffling to produce new proteins.
Multiprotein complexes in the PSD
Previous proteomic studies showed that NMDA receptor–PSD-95 complexes were 2–3-MDa particles, which included mGluR receptors, and that AMPA receptors were in different complexes (Husi and Grant 2001; Garry et al. 2003). The mGluR5 receptor complex has recently been characterized in a similar fashion to the MASC and was found to contain 76 proteins, including the NMDA receptor subunit NR2A (Farr et al. 2004). The majority of proteins in all three glutamate receptor complexes have been independently identified as PSD proteins (Fig. 3). These complexes also contain proteins that have not yet been detected in PSD analyses, presumably owing to abundance issues in analyses of PSD preparations compared with enriched protein complexes. We have validated the presence of seven such proteins in the PSD by immunoblotting and this has been taken into account in the numbers shown in Fig. 3. It is interesting to note that the majority of the components of these complexes have been validated as PSD proteins and that overlap between these complexes occurs in the cPSD (Fig. 3), supporting the notion that the cPSD is an important subset of the Total PSD data set. This is consistent with a postsynaptic organization in which the MASC is connected to multiple cell biological effector mechanisms organized into their respective complexes. In addition to MASCs and ARCs, components of other complexes, such as cell adhesion, growth factor, cytoskeletal, transport and ribosomal complexes, were detected.
We have presented a comprehensive analysis of isolated PSDs and embedded multiprotein complexes associated with NMDA and AMPA receptors. We have combined existing synapse proteomic data with that reported here to provide the most complete picture of the PSP to date. To our knowledge, the PSP is one of the most complicated subcellular structures in the eukaryotic cell but, unlike other subcellular structures, the PSP appears to hold a high degree of cellular autonomy. The composition of the PSP in terms of functional protein classes is diverse, with the necessary molecular machinery for not only classical synaptic processes and a myriad of signalling pathways, but for a host of supporting activities such as protein synthesis and degradation, transport and metabolism. Moreover, 10% of PSP proteins are novel and may participate in synaptic processes that are currently unknown.
It is clear that there are a large number of proteins at the postsynaptic membrane that coordinate a wide variety of cellular processes. Is there evidence of functional organization at the postsynaptic membrane? We have observed organization at many levels: from the classes of proteins present at the PSD and the enrichment of scaffolding and signalling domains in PSD proteins, to macromolecular organization in terms of multiprotein complexes. These levels of organization contribute to, or are dependent on, protein–protein interactions, so mapping and understanding protein–protein interactions in the PSP will be very useful. Protein interactions are essential to the functionality of proteomes because proteins rarely act alone, but rather in complexes. Large-scale mapping of protein interactions in the yeast proteome reveals highly complex networks of protein interactions (Fromont-Racine et al. 1997; Gavin et al. 2002). To facilitate analysis of the organization of the PSP at the level of protein interactions we have constructed an in-house database (http://www.PPID.org). This database is a mammalian (mouse, rat, human) Protein–Protein Interaction Database (PPID) describing approximately 8000 biochemically defined protein interactions for more than 2000 proteins. So far we have performed literature mining for PSP protein interactions and have found 650 protein–protein interactions for 281 PSP proteins (of which the majority are MASC proteins). We are using these data to perform network analyses to investigate protein interaction network architecture in the PSP (Grant 2003a). In addition to literature mining, we are performing immuno-affinity purifications on PSP proteins to map additional multiprotein complexes at the synapse.
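A first step in the kind of network analysis described above is to turn a list of literature-curated binary interactions into an undirected graph and examine node degree, which highlights hub proteins. The sketch below is a hypothetical illustration; it does not reflect the PPID schema, and the example pairs are a small illustrative set rather than PPID records.

```python
from collections import defaultdict


def build_network(interactions):
    """Undirected adjacency sets from (protein_a, protein_b) pairs.

    Reversed duplicates collapse into one edge and self-interactions are dropped.
    """
    adj = defaultdict(set)
    for a, b in interactions:
        if a == b:
            continue
        adj[a].add(b)
        adj[b].add(a)
    return adj


def degree_summary(adj):
    """Return (number of edges, mean degree, proteins sorted by degree)."""
    degrees = {p: len(nbrs) for p, nbrs in adj.items()}
    n_edges = sum(degrees.values()) // 2
    mean_degree = 2 * n_edges / len(degrees) if degrees else 0.0
    hubs = sorted(degrees, key=degrees.get, reverse=True)
    return n_edges, mean_degree, hubs


# Illustrative pairs only (not PPID records); the last pair repeats the first.
pairs = [
    ("NR2B", "PSD-95"), ("PSD-95", "SynGAP"), ("PSD-95", "nNOS"),
    ("NR2B", "CaMKII"), ("PSD-95", "NR2B"),
]
n_edges, mean_degree, hubs = degree_summary(build_network(pairs))
print(n_edges, mean_degree, hubs[0])  # 4 1.6 PSD-95
```

Degree is only the simplest architectural measure; the same adjacency structure supports clustering coefficients, path lengths and module detection.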
The large number of proteins found at the PSD, and the presence of many ribosomal, mitochondrial and nuclear proteins, has prompted speculation over the purity of isolated PSD fractions used in proteomic analyses. However, there are numerous examples of such proteins being validated as true PSD proteins. For example, voltage-dependent anion channel 1, originally found in the outer mitochondrial membrane, is present in the PSD (Moon et al. 1999). Nuclear proteins such as AIDA-1 A, mKIAA0417 and hnRNP localize to both the nucleus and the PSD (Jordan et al. 2004). It is apparent that many proteins do not exist in discrete locations, nor do they necessarily possess the same functions in different cellular contexts.
The data presented and compiled in this study represent a draft of an average isolated PSD. The starting materials for the analyses incorporated into our draft PSP came from different species (mouse and rat) and from different brain regions (forebrain and whole brain), and encompass hundreds of neuronal cell types. In addition, it is probable that some components of the MASC interact with NMDA receptors during their trafficking, localization and internalization. It is now becoming feasible to compare different types of synapse proteomes. This could be achieved by subtractive proteomics (Andersen et al. 2003), a quantitative approach whereby synaptosomes and isolated PSDs could be isotopically tagged (using ICAT or iTRAQ, isobaric tags for relative and absolute quantitation), and the resulting quantitative MS data would indicate enrichment or depletion of proteins in the PSD compared with synaptosomes. This type of analysis would provide confident assignment of true PSD proteins, highlight contaminating proteins and validate the PSD localization of proteins currently considered to be restricted to other cellular compartments.
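In such a subtractive experiment, each protein would yield a PSD and a synaptosome intensity (for example, ICAT peak areas or iTRAQ reporter-ion intensities), and the log2 ratio would place it as enriched, depleted or unchanged in the PSD. The sketch below assumes intensities are already normalized between channels and uses a hypothetical two-fold (log2 = 1) cut-off; a real analysis would also need replicate measurements and statistical testing. The input values and names are illustrative only.

```python
import math


def classify_enrichment(intensities, threshold=1.0):
    """Label proteins by log2(PSD / synaptosome) intensity ratio.

    `intensities` maps protein -> (psd_intensity, synaptosome_intensity),
    ASSUMED to be normalized between channels. `threshold` is a hypothetical
    cut-off in log2 units (1.0 corresponds to a two-fold difference).
    """
    labels = {}
    for protein, (psd, syn) in intensities.items():
        if psd <= 0 or syn <= 0:
            labels[protein] = "not quantified"  # ratio undefined without both signals
            continue
        log2_ratio = math.log2(psd / syn)
        if log2_ratio >= threshold:
            labels[protein] = "enriched"
        elif log2_ratio <= -threshold:
            labels[protein] = "depleted"
        else:
            labels[protein] = "unchanged"
    return labels


# Hypothetical normalized intensities: (PSD, synaptosome).
example = {"A": (8.0, 2.0), "B": (1.0, 4.0), "C": (3.0, 2.5), "D": (0.0, 1.0)}
print(classify_enrichment(example))
```

Proteins labelled "depleted" in this scheme would be candidate contaminants, whereas "enriched" proteins would support a genuine PSD assignment.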
In addition, isolated PSDs or protein complexes from different brain regions or cultured cell types could be compared quantitatively using isotopic labelling. Alternatively, peptides derived from SILAC (stable isotope labelling with amino acids in cell culture)-labelled cells could be used to compare the expression of proteins quantitatively across many tissue samples (Ishihama et al. 2005). Using these approaches, proteins enriched in a particular cell type or brain region would be identified as being potentially more relevant to the physiology of that particular cell or tissue type. At this stage, systematic immuno-cytochemistry of these enriched proteins would become feasible and would provide complementary information regarding their specific localization at the PSD and throughout the cell type in question. Current projects such as the Gene Expression Nervous System Atlas (GENSAT), in which large-scale in situ hybridization and bacterial artificial chromosome (BAC) transgenic reporter genes are being used to map gene expression in neuronal cell types, will provide complementary information to the proteomic experiments suggested here.
This draft PSP data set constitutes a candidate gene list for the Genes to Cognition research programme in which the functions of a large number of these genes are being studied by systematic targeted mutations in mice (Grant 2003b). Comprehensive phenotypic analysis of these mutants in terms of biochemistry, electrophysiology, cell biology and behaviour is allowing functional annotation of PSP components. This approach, together with single-nucleotide polymorphism screening in humans for all genes in the PSP, will produce systematic data relevant to synaptic physiology and disease. Establishment of this PSP data set (available at http://www.genes2cognition.org/science/proteomics/synapse.html) is essential for the progression of synapse proteomics and, with new quantitative proteomic strategies that may be applied to activated or diseased states of the synapse, will also provide a benchmark for analysis of dynamic aspects of the synapse proteome.
This work was supported by the Genes to Cognition programme (Wellcome Trust) (to MOC, HH, LY, JMB, CNGA, JSC and SGNG) and the Biotechnology and Biological Sciences Research Council (to MOC). For detailed contributions of authors to this work see http://www.genes2cognition.org.