The application of proteomic approaches to the study of mammalian spermatogenesis and sperm function



Spermatogenesis is the process by which terminally differentiated sperm are produced from male germline stem cells. This complex developmental process requires the coordination of both somatic and germ cells through phases of proliferation, meiosis, and morphological differentiation, to produce the cell responsible for the delivery of the paternal genome. With infertility affecting ~ 15% of all couples, furthering our understanding of spermatogenesis and sperm function is vital for improving the diagnosis and treatment of male factor infertility. The emerging use of proteomic technologies has played an instrumental role in our understanding of spermatogenesis by providing information regarding the genes involved. This article reviews existing proteomic literature regarding spermatogenesis and sperm function, including the proteomic characterization of spermatogenic cell types, subcellular proteomics, post-translational modifications, interactomes, and clinical studies. Future directions in the application of proteomics to the study of spermatogenesis and sperm function are also discussed.


Abbreviations: 2DE, two-dimensional electrophoresis; AP-MS, affinity purification MS; HSP, heat shock protein; IMAC, immobilized metal ion affinity chromatography; MRM, multiple reaction monitoring; PPP1, phosphoprotein phosphatase 1; PTM, post-translational modification; SYCP, synaptonemal complex protein; TAP, tandem affinity purification


Mammalian spermatogenesis is a precisely regulated biological process resulting in the production of spermatozoa, one of the most highly differentiated cell types. Spermatogenesis consists of three distinct phases within the seminiferous epithelium, all of which are associated with the somatic Sertoli cells. The first phase, the proliferative phase, refers to the mitotic division of spermatogonia, which serves to provide an increased number of germ cells for differentiation and to repopulate the stem cell niche. Next is the meiotic phase, in which primary spermatocytes, which are diploid but carry a replicated (4C) DNA content, undergo two meiotic divisions to produce haploid spermatids. The final phase is the differentiation phase, known as spermiogenesis, wherein the spermatids undergo a series of dramatic morphological changes, leading to functional sperm. Although the stages of spermatogenesis are well characterized at the cellular level, the precise biological mechanisms regulating this process are not entirely understood. Enhancing our understanding of spermatogenesis will prove useful in improving the diagnosis and treatment of male factor infertility, a condition that negatively affects the quality of life for over 100 million couples worldwide.

To better understand the process of spermatogenesis, we must uncover which genes are involved, what roles they play, and how they are regulated. To date, targeted mutagenesis studies have produced ~ 400 different knockout mouse models with reproductive defects [1], not limited to the ~ 4% of all mouse genes revealed by transcriptome analysis to be specifically expressed in the postmeiotic male spermatogenic cells [2]; both testis-specific and ubiquitously expressed genes can be found in the list of targeted mutations that only affect spermatogenesis, the latter probably reflecting functional redundancy in most tissues of paralogous genes. This collection of data underscores the complex nature of spermatogenesis in mammals and our need for an increased understanding of the process. In contrast, numerous attempts over the past 10 years, since the publication of the human genome sequence, to identify mutations linked to male infertility affecting ~ 5% of men have been unsuccessful in uncovering good candidates for clinical genetic screening; the only tests routinely used in andrology clinics are for Y microdeletions, chromosome abnormalities, and cystic fibrosis transmembrane conductance regulator mutations, which affect ~ 10% of patients [3, 4]. Emerging proteomic technologies can provide a number of useful tools for studying mammalian spermatogenesis. The use of proteomics is particularly important for spermatogenesis, because the semiquantitative correlation between RNA and protein expression is lower in the testis than in other tissues [5], indicating that oligonucleotide microarray and genomic studies are less informative in this context. One factor that complicates the comparison between RNA and protein expression in the testis is the transcriptional silencing found late in spermatogenesis, which necessitates the storage of earlier-produced transcripts for later use. 
This was illustrated in a recent isobaric tags for relative and absolute quantitation (iTRAQ)-based quantitative proteomics study that identified a large number of proteins for which this is the case, especially during the spermatocyte to round spermatid transition [6]. Compounding these difficulties further is the abundance of tissue-specific alternative splicing observed in the testis [7], one prominent example being the Ppp1cc gene, which is essential for the completion of spermatogenesis and encodes both the ubiquitous PPP1CC1 and testis-specific PPP1CC2 isoforms [8]. Despite the importance of proteomics in this context, we have only recently begun to scratch the surface of its potential application to spermatogenesis. As an increasing number of researchers have made use of such technologies, it is not possible to discuss all of the excellent research in the space available. Likewise, it is not our aim to cover the technical details of proteomic methodologies and data analysis. Many of the studies described in this review have utilized model organisms, most prominently the mouse and rat. The precise level of conservation in the testis/sperm proteomes between these species and humans remains unknown, because complete proteome coverage has not been accomplished. However, comparative studies have revealed that, for many genes, there are spermatogenesis-associated homologs that have similar expression patterns even over large evolutionary distances [9], and, in general, conservation throughout mammals is considered to be high. Comparative studies of mouse and human reproductive proteins found good overall correlation between the two species, but also that proteins arising from the seminal vesicles showed a higher rate of divergence [10]. Another comparison of published sperm proteome datasets from a number of species revealed that a number of functionally linked protein groups were conserved throughout mammals [11]. 
This review will highlight key studies that demonstrate the potential of proteomic research in a number of different contexts – including the proteomic characterization of different spermatogenic cell types and subcellular components, post-translational modifications (PTMs), clinical studies, and protein–protein interaction networks. Proteomic studies of spermatogenesis serve as an important line of inquiry that complements genomic, transcriptomic and epigenetic studies, which are beyond the scope of this review but, together, hold the key to understanding gene regulation during this process. Furthermore, we hope that, by drawing attention to the wide range of currently available datasets, this review will serve as a useful resource for researchers interested in the process of spermatogenesis.

Proteomic characterization of spermatogenic cells

Spermatogenesis includes a number of different cell types, many of which are in close contact in the seminiferous epithelium, the site of spermatogenesis within the testis (Fig. 1). Each spermatogenic cell type represents a step towards the production of sperm, and thus, by characterizing the proteome of the different cell types, we can gain insights into the genes and proteins involved in each step. Current estimates suggest that the human sperm proteome contains approximately 2500–3000 proteins [11]; however, less differentiated spermatogenic cells may contain a much higher number [6], as much of the cytoplasmic material, including organelles, is removed during the final stages of spermiogenesis in order to streamline the cell for motility and fertilization.

Figure 1.

The testis is a complex and dynamic tissue. A cross-sectional view is shown of a single mouse testis visualized by light microscopy with periodic acid–Schiff and hematoxylin staining. Left: a cross-sectional view showing several seminiferous tubules, each with different complements of developing spermatogenic cells. Spermatogenesis progresses in a wave-like pattern along the length of the seminiferous tubule, meaning that different segments of the tubules (cross-section) show different stages of spermatogenic cells. Right: a closer look at a single seminiferous tubule, showing spermatogenic cells at various stages in development. The complex architecture and mixture of cell types at various stages of development make the testis a challenging tissue to analyze.

A number of studies have utilized whole testis protein extracts to examine protein expression throughout the entire testis in a variety of species. In the mouse, both fetal [12] and sexually mature whole testis extracts [13] have been examined by two-dimensional electrophoresis (2DE) followed by MS. At least three studies have examined the human testis proteome, with varying methodologies [14-16]. Two of these studies are from the laboratory of Sha and colleagues, where, in 2008, with SDS/PAGE followed by LC-MS/MS, 1430 testis proteins were identified [14], and in 2010, 2DE followed by MALDI-TOF MS analysis identified 462 unique proteins [15]. In a more recent study, Li et al. used 2DE and MALDI-TOF MS analysis to identify 725 unique proteins in human testis protein extracts [16]. Although these studies represent some of the most extensive human testis proteome studies to date, < 200 annotated proteins have been found in all three datasets. This suggests not only that experimental variation can result in the generation of very different datasets, but that we are probably very far from reliably characterizing the entire testis proteome, as these studies clearly do not approach the required level of coverage. This highlights key limitations to 2DE-based approaches. Owing to more stringent protein size and dynamic range constraints, and lower resolving power, they are less suitable for the identification of large and diverse sets of proteins than LC-MS/MS-based techniques, and are subject to a greater degree of experimental variation. Consequently, most laboratories are now using LC-MS/MS-based methods for such studies, often in conjunction with a suite of prefractionation and/or differential labeling strategies [6, 17].

Although studies of the whole testis proteome can yield a considerable amount of data, they tell us little about the roles of specific proteins in spermatogenesis, as no information regarding the spatiotemporal pattern of expression is obtained. To gain such information, different approaches are needed, such as following changes in expression during the first wave of spermatogenesis at the onset of puberty, which is synchronous, affording the opportunity to investigate relatively homogeneous cell populations. In contrast, in the adult testis, spermatogenesis has a wave-like pattern along the seminiferous tubules, such that all spermatogenic cell types are represented simultaneously in the testis. Thus, by examining gene expression patterns at different time points, we can gain spatiotemporal information and insights into the potential roles of genes in different aspects of spermatogenesis. Several groups have used such an approach in a variety of different species. One recent study, by Huang et al., used 2DE to analyze changes in testis protein expression patterns in boar spermatogenesis at three time points – 1 week (Sertoli cells and spermatogonia only), 3 months (onset of spermatogenesis), and 1 year (maturity) [18]. The authors were then able to identify 90 differentially expressed proteins via MS. Several studies have used a similar approach in mice, including one that utilized 2DE followed by MALDI-TOF/TOF MS to identify 257 proteins that were differentially expressed between six different time points in the first wave of mouse spermatogenesis (0, 7, 14, 21, 28, and 60 days) [19]. The authors then applied clustering analysis to their data, and found six distinct expression patterns that were each enriched for cellular processes specific for particular stages of spermatogenesis (stem cell properties, mitosis, meiosis, spermiogenesis, and fertilization). 
This analysis allowed the authors to link a number of proteins with unknown functions to specific stages of spermatogenesis, such as heat shock protein (HSP)27 (meiosis) and peroxiredoxin-4 (spermiogenesis), which would not have been possible if they had been looking at sperm alone. The results of these studies show how the nature of the first wave of mammalian spermatogenesis can be used to gain information regarding a protein's possible function, and allow us to infer potential cell type-specific/enriched expression patterns. However, to truly characterize specific spermatogenic cell types, it is necessary to analyze them in isolation.
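
The idea of grouping proteins by their temporal expression pattern can be illustrated with a minimal Python sketch. This is not the clustering method used in [19]; it assumes a much simpler stand-in, in which each protein is assigned to the time point at which its abundance peaks. The protein names and abundance values are hypothetical, and only the six sampling days are taken from the study described above.

```python
from collections import defaultdict

# Sampling days from the first wave of mouse spermatogenesis described in [19].
TIME_POINTS = [0, 7, 14, 21, 28, 60]


def peak_time_point(profile, time_points=TIME_POINTS):
    """Return the time point at which a protein's abundance is highest."""
    if len(profile) != len(time_points):
        raise ValueError("profile length must match the number of time points")
    peak_index = max(range(len(profile)), key=lambda i: profile[i])
    return time_points[peak_index]


def group_by_peak(profiles, time_points=TIME_POINTS):
    """Group protein names by the time point of peak abundance."""
    groups = defaultdict(list)
    for name, profile in profiles.items():
        groups[peak_time_point(profile, time_points)].append(name)
    return {day: sorted(names) for day, names in groups.items()}


# Hypothetical relative abundances (arbitrary units), one value per time point.
example_profiles = {
    "protein_A": [5.0, 4.0, 2.0, 1.0, 1.0, 0.5],
    "protein_B": [0.2, 0.5, 3.0, 6.0, 2.0, 1.0],
    "protein_C": [0.1, 0.1, 0.4, 1.0, 4.0, 7.0],
    "protein_D": [0.3, 0.6, 2.5, 5.5, 1.5, 0.9],
}

groups = group_by_peak(example_profiles)
```

In this toy example, proteins peaking at the same day end up in the same group, which is the kind of grouping that could then be tested for enrichment of stage-specific processes. A real analysis would normally use a proper clustering algorithm on normalized, replicated measurements.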

To date, there have been a number of studies that have sought to characterize the proteomes of isolated germ cell populations. This is done by separating cell suspensions on the basis of size or DNA content with a number of methods – fluorescence-activated cell sorting (DNA content) [20], gravity sedimentation in a StaPut apparatus [21], or centrifugal elutriation [22]. All of the major types of spermatogenic cells – spermatogonia, spermatocytes, spermatids, and sperm – have been isolated and subjected to analysis in order to characterize their proteomes. Some experiments have examined even narrower classes of spermatogenic cells, such as those described by Delbes et al., who performed a proteomic analysis of elongated spermatids [23]. Many of these datasets are publicly available, and provide a valuable resource for other research groups. The first spermatogenic cells to appear in the mammalian testis, the spermatogonia, have, in fact, the least characterized proteome, probably because of technical difficulties in the isolation of purified cell populations. Two early studies from the Pineau laboratory identified 53 and 102 nonredundant proteins from StaPut-isolated rat spermatogonia, with 19 proteins in common between the two datasets [24, 25]. Another study performed a proteomic analysis on isolated spermatogonial stem cells from adult mice that had been cultured in conditions designed to maintain stem cell-like behavior; although the number of proteins identified was limited, the authors observed minimal differences in the proteomes of the two cell types, which they hypothesized reflects their similar developmental competence [26]. In spermatocytes, the largest proteomic dataset currently available was produced by Guo et al. [27], who identified 3625 unique proteins (3427 unique Entrez genes) in fluorescence-activated cell sorting-isolated primary spermatocytes, including almost 400 testis-specific proteins and 172 proteins associated with meiosis. 
These included 28 different proteins that had previously been identified as being essential for completion of male meiosis, including the prominent synaptonemal complex proteins (SYCPs) SYCP1, SYCP2, and SYCP3. Further analysis revealed a large number of proteins known to be involved in DNA repair and transcription, corresponding to the peak in transcriptional activity that is known to occur in pachytene spermatocytes. The same group used a similar approach to identify 2116 spermatid proteins mapping to 1924 unique genes, with ~ 300 testis-specific proteins represented [28]. The spermatid proteome was found to contain a large number of vesicle-related proteins, reflective of the development of the acrosome in these cells, and the authors identified a novel protein, vesicle-associated membrane protein 4, that they linked to this process. As mentioned above, one group isolated a more specific spermatid population – elongated spermatids – and were able to confidently identify 632 proteins with two or more unique peptides [23]. Recently, one highly informative study featured a quantitative proteomic comparison of isolated mouse spermatogonia, pachytene spermatocytes, round spermatids, and elongating spermatids [6], which identified 2008 different proteins, over half of which belonged to one of four expression pattern clusters reflecting important aspects of spermatogenesis, such as mitotic proliferation, meiosis, and spermiogenesis. Found in the cluster of proteins with higher expression in haploid spermatogenic cells were protein phosphatases and kinases, including the testis-specific phosphatase phosphoprotein phosphatase 1 (PPP1)CC2, which plays key roles in spermatogenesis. 
The authors' comparison of proteomic changes and transcriptomic data further classified genes into five different regulatory mechanisms, including prominent post-transcriptional regulation of gene expression in the testis, illustrating the benefit of considering these types of data side-by-side. This study provides a wealth of information regarding mechanisms for regulation of gene expression during spermatogenesis in general, as well as the ability to see how many individual genes are regulated. In addition, this study represents the largest spermatogonia proteomic dataset published to date [6]. Considering all of these studies together, we can see that different spermatogenic cell types contain different proteomic complements, reflecting the different biological processes involved at different times in germ cell maturation.

In contrast to immature spermatogenic cells, mature sperm do not require any specialized isolation procedures, and thus can easily be collected even from human subjects. As a consequence, there have been significantly more studies of the proteome of mature sperm than of the proteomes of other spermatogenic cell types. The largest number of human sperm proteins identified in a single dataset was 1760, published by Johnson et al. in 2005 [29]; however, the authors did not make their dataset publicly available. The largest publicly available human sperm proteome datasets consist of 1056 proteins comprising Triton-X-soluble and insoluble fractions [30], and 1429 proteins in dissociated head and tail fractions [31]. Similarly, the same group has published high-quality mature sperm proteomic datasets for mouse [32] and rat [33] that represent excellent resources for those researching spermatogenesis. Subsequent bioinformatic analysis of these three sperm proteomes revealed considerable overlap, despite the fact that the sperm proteome has yet to be fully covered [11]. In other species, one very large dataset has identified thousands of proteins in mature bull sperm [34]. However, when high-fertility and low-fertility groups were compared, only 20% overlap was observed, a strikingly low number for such a large dataset. A second examination of this dataset by Baker and Aitken has shown that a large number of the identified proteins are represented by only a single unique peptide, which can lead to a high incidence of false positives, bringing the total number of identified proteins into question [11]. Nonetheless, the dataset can still be a useful resource, provided that certain caveats are considered during analysis. The issue of single-peptide identifications is key when examining proteomic data, and one should always look deeper into the data before interpreting any protein identification as absolute.
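
The caveat about single-peptide identifications can be made concrete with a short Python sketch. It assumes a hypothetical input format, a mapping from protein accession to the peptide sequences matched to that protein, and applies the "two or more unique peptides" criterion mentioned above for the elongated spermatid dataset [23]. The accessions and peptide sequences are invented for illustration only.

```python
def filter_by_unique_peptides(identifications, min_unique_peptides=2):
    """Keep proteins supported by at least `min_unique_peptides` distinct peptides."""
    return {
        protein: peptides
        for protein, peptides in identifications.items()
        if len(set(peptides)) >= min_unique_peptides
    }


def single_peptide_fraction(identifications):
    """Fraction of identified proteins supported by exactly one distinct peptide."""
    if not identifications:
        return 0.0
    singles = sum(
        1 for peptides in identifications.values() if len(set(peptides)) == 1
    )
    return singles / len(identifications)


# Hypothetical identifications: accession -> matched peptide sequences.
example_ids = {
    "P_0001": ["AAGLK", "TTVER", "AAGLK"],  # two distinct peptides
    "P_0002": ["LLSPR"],                    # single-peptide identification
    "P_0003": ["VNDEK", "GHTYR", "MQLSK"],  # three distinct peptides
    "P_0004": ["EEAFK", "EEAFK"],           # one peptide observed twice
}

confident = filter_by_unique_peptides(example_ids)
fraction_single = single_peptide_fraction(example_ids)
```

Reporting the single-peptide fraction alongside the total protein count gives a quick indication of how much of a dataset rests on weaker evidence. The threshold of two peptides is a common convention rather than a guarantee of correctness, and repeated observations of the same peptide are deliberately not counted as independent support here.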

The utility of proteomic analysis of whole testes or isolated germ cell populations goes beyond simply cataloging the proteome of the cell types. Proteomic approaches have also been used to ask specific biological questions by an increasing number of groups. For example, the elongating spermatid proteome investigation by Delbes et al. described above compared the proteomic profiles of wild-type and Paip2a/Paip2b double-knockout mice, and identified 29 differentially expressed proteins, several of which were subsequently verified by western blot analysis [23]. As the PAIP2 proteins are important translational repressors in spermatogenesis, differential expression of proteins in this knockout background revealed genes that are regulated post-transcriptionally by this pathway. To examine the effects of Sertoli cell conditional Dicer1 deletion on protein expression throughout the early postnatal testis, Papaioannou et al. performed a quantitative analysis of 130 testis proteins, and found that a large proportion (~ 38%) were upregulated [35], although this study would have been more informative had the authors compared conditional Sertoli-specific Dicer null testes, which completely lack germ cells, owing to Sertoli cell dysfunction, with wild-type testes from mice treated with busulfan, a germ cell toxin that eliminates all germ cells except the stem cells; both are Sertoli cell-only testes, but the former lack Dicer-mediated transcriptional and translational regulation. Another group used a proteomic analysis of enriched mouse spermatocytes to test the effect of androgen deprivation and replacement, and identified 88 differentially regulated proteins, including several with known roles in meiosis, such as HSPA2 [36]. 
Comparative proteomics has been used to identify proteins that are differentially regulated throughout spermatogenesis by comparing the proteomes of isolated spermatogonia, pachytene spermatocytes and early spermatids via 2D difference gel electrophoresis. Subsequently, 123 proteins that were differentially expressed between the three stages of rat spermatogenesis were identified [37], including the protein phosphatase PPP1CC2, which is known to be essential for completion of spermatogenesis in mice [8]. These four studies are representative examples of how proteomic analysis of spermatogenic cells has the potential to go far beyond the generation of ‘protein lists’, to answer biological questions regarding male fertility.

Subcellular proteomics in spermatogenesis

Despite significant advances in the resolution of proteomic technologies, to date no one protein extraction/identification method can identify all proteins in the cell, owing to technological limitations, and differences in dynamic range and solubility. Furthermore, the presence of a protein in a cell type does not necessarily tell us much about its function. By breaking the cell down into its constituent parts (be they fractions, structural elements, or organelles), better coverage of the proteome and the subcellular localization of proteins within the spermatogenic cell types can be obtained, which helps to provide functional insights. In fact, the sperm are uniquely suited to this form of analysis. Being arguably the most differentiated cell type in the body, they contain a number of different component parts that can be readily isolated and subjected to proteomic analysis.

Several reproductive biology research groups have taken a subcellular proteomic approach to the study of spermatogenesis. The bulk of this research has focused on sperm, because of their relative ease of isolation and unique morphology (see Table 1 for a list of studies that have examined subcellular sperm fractions). The sperm surface has been one subcellular locale of great interest for proteomic analysis, because of its dynamic nature during epididymal maturation and capacitation, and its importance to fertilization [38-43]. Recently, two groups have published studies that have used different biochemical approaches to isolate sperm membrane fractions containing lipid rafts – dynamic, sterol-enriched and sphingolipid-enriched microdomains that are believed to play key roles in regulating sperm function at the membrane. Asano et al. used a detergent-free approach to isolate three distinct types of mouse sperm membrane raft, as well as a ‘nonraft’ membrane fraction [38]. Following gel-based separation, in-gel trypsin digestion, and LC-MS/MS analysis, they were able to identify 190 proteins between the three sperm raft subtypes, as well as a number of additional ‘nonraft’ proteins on the sperm surface. Nixon et al., conversely, used a mild detergent-based protocol to isolate proteins from the detergent-resistant membrane fraction from capacitated human [42] and mouse [41] sperm, which were then subjected to 2DE and LC-MS/MS analysis. From this analysis, 124 and 100 proteins, respectively, were identified. Interestingly, there were more proteins in common between the two mouse datasets with different methodologies (35) than between the human and mouse datasets (14) that were obtained with the same mild detergent-based method for detergent-resistant membrane isolation. 
This discrepancy is interesting, but perhaps not surprising, as interspecies proteomic studies have shown that sperm surface proteins are subject to more rapid evolutionary change than other sperm proteins [44]. These studies all identified known zona pellucida-binding proteins, as well as novel cell adhesion and signaling molecules that may play important roles in sperm function and fertilization. Also, the comparison of these datasets illustrates that different strategies for preparation of subcellular fractions produce different pools of proteins – each with their own contaminants and losses. Although they may have considerable overlap, these differences must be taken into account when proteomic data from different sources are analyzed. Furthermore, datasets such as these do not account for quantitative differences between subcellular fractions, or differences in PTMs between fractions.

Table 1. Recent examples of subcellular proteomic studies of sperm. AKAP4, A-kinase anchor protein 4; IZUMO1, Izumo sperm–egg fusion 1; MFGE8, milk fat globule epidermal growth factor 8; ODF1, outer dense fiber 1; PRM2, protamine 2; SPACA1, sperm acrosome-associated 1; TSSK6, testis-specific serine kinase 6; ZPBP1, zona pellucida-binding protein 1
Sperm subcellular fraction | Species | Protein separation method | MS/MS method | Number of proteins identified | Example protein and biological function | Reference
Membrane | Mouse | 1D SDS/PAGE | LC-MS/MS | 190 | MFGE8, fertilization | Asano et al. [38]
Membrane | Human | 2D SDS/PAGE | LC-MS/MS | 124 | ZPBP1, binding of sperm to zona pellucida | Nixon et al. [42]
Membrane | Mouse | 2D SDS/PAGE | MALDI-TOF + nanoLC-MS/MS | 100 | IZUMO1, fusion of sperm to egg plasma membrane | Nixon et al. [41]
Tail | Human | 1D SDS/PAGE | LC-MS/MS | 1049 | ODF1, spermatogenesis | Amaral et al. [46]
Tail | Human | 1D SDS/PAGE | LC-MS/MS | 901 | AKAP4, sperm motility | Baker et al. [31]
Head | Human | 1D SDS/PAGE | LC-MS/MS | 704 | TSSK6, sperm chromatin condensation | Baker et al. [31]
Nucleus | Human | 1D SDS/PAGE, 2D SDS/PAGE | LC-MS/MS | 403 | PRM2, DNA packaging | de Mateo et al. [49]
Acrosomal matrix | Mouse | 1D SDS/PAGE | LC-MS/MS | 1026 | SPACA1, acrosome assembly | Guyonnet et al. [45]

In addition to the sperm surface, a number of other subcellular fractions of sperm have been examined with proteomic methodologies, such as the acrosomal matrix, a critical subcellular compartment during fertilization in which > 1000 proteins were identified [45]. This number is surprisingly large, which could indicate a much more complicated subcellular compartment than initially thought, or conversely, a high degree of false-positive identification/contamination during fractionation. The sperm tail proteome has been investigated intensively in two recent studies. Amaral et al. produced the largest dataset, with a total of 1049 proteins identified in the human sperm tail [46], including a surprisingly high number of peroxisomal proteins, given that the peroxisome is an organelle that is thought to be largely absent in mature sperm. Concurrently, Baker et al. published a study wherein the head and tail portions of human sperm were separated, and each fraction was subjected to proteomic analysis [31]. The resultant dataset identified 901 proteins in the sperm tail and 704 proteins in the sperm head, with just 159 proteins being found in both subcellular fractions. This relatively low degree of overlap points to the specialized nature of the sperm subcellular structures, each with a specific role in the transmission of the paternal genome. Almost half (46%) of the sperm tail proteins identified in the Baker et al. dataset were also identified by Amaral et al. Although this is clearly a significant amount of overlap, it is lower than one might expect, given the size of these datasets in relation to the hypothesized size of the entire sperm proteome, signifying that a substantial portion of the proteome has been missed in all of the experiments conducted to date.

The sperm nucleus is another subcellular structure of great interest in reproductive biology research [47]. During spermiogenesis, the spermatid nucleus undergoes dramatic chromatin condensation. This process is regulated by proteins in and around the nucleus, and is commonly found to be disrupted by gene deletions, resulting in male infertility [48]. It is not surprising that several groups have turned to proteomics to analyze the nuclear content of sperm. One recent study isolated human sperm nuclei to 99.9% purity, and, following gel fractionation of the protein content, identified 403 proteins by LC-MS/MS analysis [49]. Strikingly, more than 50% of the identified proteins had never been reported in human sperm, despite the availability of a number of such datasets (see above). This illustrates one of the benefits of subcellular fractionation prior to proteomic analysis – simplification of the sample leads to an increased depth of coverage and, thus, a greater amount of useful data. To look even closer at the proteins involved in DNA packaging during spermatogenesis, Govin et al. [50] devised a strategy to extract proteins with either the potential to bind DNA (basic proteins) or capable of binding basic proteins (acidic proteins) from isolated mouse stage 12–16 spermatid nuclei. This analysis identified 70 proteins, which were putative DNA-packaging proteins and their chaperones in spermatogenesis. Furthermore, this study highlighted a clear link between proteomics and epigenomics, as the proteins identified provide a clear link to epigenetic regulation of the postmeiotic nucleus. A comparison of five different subcellular human sperm proteomic datasets reveals that a large number of proteins appear to be unique to each individual dataset (Fig. 2). This demonstrates the increased proteome coverage that can be achieved when individual fractions are analyzed, as well as the specialized nature of sperm components. 
However, as a caveat, some of this interstudy variation could result from differences in analytical strategies, especially with regard to LC-MS/MS analysis and peptide database searching. How much of this variation arises from true proteomic differences between subcellular compartments, and how much from experimental error, remains an open question that future studies should strive to address. On the basis of our analysis, only five proteins were common to all human sperm subcellular fractions (Table 2; see Tables S1 and S2 for full data on overlap between datasets, and a list of proteins common to three or more human sperm datasets). The fact that these proteins are found in all five subcellular proteome datasets does not necessarily imply that they have a role in spermatogenesis or sperm function, and they could simply represent prominent housekeeping genes. Conversely, a protein that is restricted to only one highly specialized structure may have a specific and essential role in the formation or function of that structure. Collectively, these studies account for > 2200 proteins, approaching the estimated size of the sperm proteome. It is clear from the examples presented above that there is a wealth of information to be obtained when subcellular fractionation is combined with proteomic analysis. Not only can localization information be gained, but increased depth of coverage can also be achieved by simplifying the sample and decreasing the dynamic range. Despite the progress to date, there are still other structures and cell types to be analyzed, and even among those for which datasets exist, much of the subcellular proteome remains undiscovered. For example, many male infertility mutations show similar phenotypes, despite their range of gene ontology functions, often starting at the round spermatid stage, which is characterized by the evolving development of the acrosome. These mouse mutations often mimic the testicular failure phenotypes observed in infertile men. 
Proteomic analysis of the developing acrosome would be a useful addition to the repertoire.

Table 2. Proteins common to five human sperm subcellular proteome datasets. Testis/sperm specificity data were compiled from, and information regarding the existence of infertile mouse models was compiled from
Official gene symbol | UniProt accession | Name | Testis/sperm-specific? | Infertile mouse model?
RAB2A | B2R5W8 | RAB2A, a member of the RAS oncogene family | No | No
HSPA2 | Q9UE78 | Heat shock 70-kDa protein 2 | No | Yes
VCP | Q969G7 | Valosin-containing protein | No | No
SPANXB1 | Q5JYZ7 | SPANX family, member B1 | Yes | No
LTF | Q8IU92 | Lactotransferrin | No | No
Figure 2. Overlap in protein identification between subcellular fractions of human sperm. A Venn diagram is shown, depicting the number of proteins that are unique and shared between the datasets from human sperm listed in Table 1. The diagram shows that a considerable number of proteins appear to be unique to each fraction, whereas a few proteins are identified in all five datasets. The Venn diagram was generated with the VIB/UGent Bioinformatics & Evolutionary Genomics web-based tool available at
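The overlap analysis behind a comparison of this kind can be sketched in a few lines of Python. This is a minimal illustration only, not the web-based tool used to draw the figure: the function name `overlap_summary`, the fraction labels, and the protein identifiers are hypothetical placeholders, and real datasets would first need their accessions mapped to a common identifier scheme.

```python
from collections import Counter


def overlap_summary(datasets, min_shared=3):
    """Summarize overlap between named sets of protein identifiers.

    datasets: dict mapping a dataset name to an iterable of identifiers.
    min_shared: report proteins found in at least this many datasets.
    """
    # Count in how many datasets each protein appears (once per dataset).
    counts = Counter(p for proteins in datasets.values() for p in set(proteins))
    n_datasets = len(datasets)
    return {
        "total": len(counts),
        "common_to_all": {p for p, c in counts.items() if c == n_datasets},
        "shared_by_min": {p for p, c in counts.items() if c >= min_shared},
        "unique": {
            name: {p for p in set(proteins) if counts[p] == 1}
            for name, proteins in datasets.items()
        },
    }


# Hypothetical toy input: three fractions with placeholder identifiers.
toy = {
    "fraction_1": {"PROT_A", "PROT_B", "PROT_C", "PROT_D"},
    "fraction_2": {"PROT_A", "PROT_B", "PROT_E"},
    "fraction_3": {"PROT_A", "PROT_C", "PROT_F"},
}
summary = overlap_summary(toy, min_shared=2)
print(summary["common_to_all"])  # proteins present in every fraction
print(summary["unique"])         # proteins seen in only one fraction
```

Counting dataset membership per protein, rather than intersecting sets pairwise, makes it straightforward to report both the proteins common to all fractions and those shared by any chosen minimum number of datasets.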

PTMs in spermatogenic cells

Another area of proteomics that has garnered increased interest over the past several years is the study of PTMs. Once translated in the cell, proteins can be covalently modified in a number of different ways, which can govern the activity of proteins and signaling networks. Identifying PTMs of testis proteins provides a more detailed picture of how those proteins exist in the tissue, and can offer clues regarding their functions. Also, by characterizing changes in PTM status in response to different stimuli or at different time points in spermatogenesis, we can gain even more insights into the regulation of a protein's activity. The analysis of PTMs is especially relevant in mature sperm, as they are both transcriptionally and translationally silent [51], meaning that the bulk of cellular functions must be governed by post-translational modulation of protein function.

Regulated by the opposing activity of kinases and phosphatases, phosphorylation is by far the most studied PTM in spermatogenesis. Following recent technological advances in the isolation of phosphorylated peptides, such as those by Larsen et al. [52], the field of phosphoproteomics has experienced significant growth over the past several years. The study of spermatogenesis has been no exception to this, as an increasing number of groups have performed phosphoproteome analysis in sperm and spermatogenic cells. However, the largest existing testis phosphoproteomic datasets were not produced from laboratories focusing directly on spermatogenesis, but from large-scale studies aiming to map phosphorylation sites across multiple tissues. One large-scale analysis of nine tissues from 3-week-old mice identified > 2500 phosphoproteins in the testis corresponding to > 10 000 phosphorylation sites, following immobilized metal ion affinity chromatography (IMAC) phosphopeptide enrichment [53]. Another phosphoproteomic study utilized titanium dioxide (TiO2) phosphopeptide enrichment on 14 different tissues and organs in the rat, and also identified > 10 000 phosphorylation sites in the testis, including over 200 testis-specific phosphorylation sites, and provided quantitative data by the use of extracted ion chromatograms [54]. These studies, although not directly focused on spermatogenesis, have provided a wealth of data to the research community, and insights into the post-translational regulation of spermatogenesis. As an example, both studies found that the testis was second only to the brain in the number of tissue-specific phosphorylation sites (17% of all identified sites in one experiment [53]), which suggests the possibility that protein phosphorylation may be particularly important in spermatogenesis and the regulation of sperm function.

Aside from these large-scale analyses of the mammalian testis, most groups have chosen to focus on sperm for phosphoproteomic analyses, owing to their ease of retrieval and their transcriptionally and translationally silent status. Baker et al. have published two separate examinations of changes in protein phosphorylation during rat epididymal maturation, using TiO2 phosphopeptide enrichment, and have identified 53 and 22 differentially phosphorylated proteins [55, 56]. Perhaps the most extensively studied aspect of sperm maturation via phosphoproteomic analysis is capacitation, which has been examined in the mouse [57], rat [58], and human [59]. Interestingly, no common phosphorylation event was found between species, which points to key methodological differences, as well as the probability that a large portion of the phosphoproteome remains unexamined. This may also reflect evolutionary divergence; analysis of phosphosites, both predicted and known, across a range of species may yield insights into those sites that are biologically relevant (see, for example, Another investigation even performed phosphoproteomic analysis of clinical samples, comparing sperm from fertile individuals with those suffering from asthenozoospermia (reduced sperm motility), and identified 66 differentially phosphorylated peptides [60]. In contrast to those groups examining sperm, our group has focused on the response of developing spermatogenic cells in the testis to the loss of the protein phosphatase PPP1CC, and identified 10 proteins that are hyperphosphorylated in response to this deletion [61]. Presently, we are undertaking a larger phosphoproteomic analysis of the developing mouse testis, which has identified > 700 phosphorylated proteins, with the ultimate aim of identifying candidate substrates of PPP1CC2, the testis-specific isoform (G. MacLeod, P. Taylor, L. Mastropaolo, S. Varmuza, in preparation). 
However, a key challenge in such experiments is differentiating between changes resulting from perturbation of direct phosphorylation/dephosphorylation events, and indirect effects such as compensatory changes and/or downstream signaling events.

In addition to the growing number of phosphoproteomic studies relating to spermatogenesis and sperm function, a series of other types of PTM have also been catalogued (Table 3). Similarly to the aforementioned phosphoproteomic study, Lundby et al. recently quantified > 15 000 lysine acetylation sites on > 4500 proteins across a number of rat tissues, including almost 2000 proteins in the testis [62]. Lysine acetylation is of particular interest in spermatogenesis, owing to its well-characterized involvement in the modification of histones, which have a central role in spermatogenesis. Protein glycosylation in the mouse testis has been examined in at least two large-scale studies, which identified 239 and 634 unique glycoproteins [63, 64]. One of these studies identified four glycoproteins that were dominantly expressed in the testis over any other tissue – dipeptidase 3, zona pellucida 3 receptor, TEX101, and Dickkopf-like 1 [63]. Other PTMs that have been studied in human sperm include S-nitrosylation, by Lefièvre et al. (240 modified proteins) [65] and, most recently, SUMOylation by Vigodner et al. (55 modified proteins) [66]. Another PTM of interest is ubiquitination; however, to our knowledge, no large-scale analysis of the ubiquitin-modified proteome has been conducted in the mouse testis or sperm, and a recently published survey of several mouse tissues analyzed only liver, kidney, heart, muscle, and brain [67]. However, the spermatogenesis defects in mice lacking functional ubiquitin-conjugating enzyme E2B are a clear sign that this PTM is critical to germline development [68].

Table 3. Recent examples of large-scale studies characterizing PTMs in mammalian testes or sperm. Studies listed are only those that published a complete list of mapped PTMs (i.e. not only those showing significant change). AAL, Aleuria aurantia lectin; ConA, concanavalin A; RCA120, Ricinus communis agglutinin-120; SPEG, solid-phase extraction of N-linked glycopeptides
PTM | Cell/tissue | Species | Enrichment method | No. of modified testis proteins | Reference
Phosphorylation | Testis | Mouse | IMAC | 2714 | Huttlin et al. [53]
Phosphorylation | Testis | Rat | TiO2 | 3430 | Lundby et al. [54]
Phosphorylation | Testis | Mouse | IMAC + TiO2 | 755 | MacLeod, Taylor, Mastropaolo and Varmuza (unpublished results)
Phosphorylation | Sperm | Human | TiO2 | 120 | Baker et al. [58]
Lysine acetylation | Testis | Rat | Anti-acetyl-lysine immunoprecipitation | 1941 | Lundby et al. [62]
N-Glycosylation | Testis | Mouse | SPEG | 239 | Tian et al. [63]
N-Glycosylation | Testis | Mouse | Lectin columns (ConA, RCA120, AAL) | 634 | Kaji et al. [64]
S-Nitrosylation | Sperm | Human | Biotin-switch assay | 240 | Lefièvre et al. [65]
SUMOylation | Sperm | Human | Anti-SUMO immunoprecipitation | 55 | Vigodner et al. [66]

It is clear from these studies that the nature of a protein is not fully determined when it is translated – PTMs confer an extremely high diversity of protein forms within a given cell type, a complexity that must be better understood if a protein's function is to be deciphered. Fortunately, technological improvements in MS instrumentation and protein enrichment are moving us closer to this goal.

Protein–protein interaction networks (interactomes) in spermatogenesis

Proteins rarely, if ever, function in isolation – it is the interaction between numerous protein species that results in functions being performed. Thus, to fully understand a process as complex as spermatogenesis, we must look not only at what proteins are present in a given space and how they are modified, but at which proteins they interact with – the interactome. By characterizing interactomes, i.e. the full complement of protein–protein interactions for a given protein, we have a much better chance of understanding a protein's function than if we look at each in isolation. In the past, the most common approach to studying interactomes was the yeast two-hybrid assay. Although this approach has been quite successful in identifying protein–protein interactions, even with regard to genes that are important to spermatogenesis (e.g. [69-71]), it has several prominent limitations, owing to the artificial nature of the system (e.g. the absence of species/tissue-specific PTMs) and the fact that it looks only at binary interactions. Currently, the yeast two-hybrid system has been largely replaced by affinity purification/MS (AP-MS)-based approaches, featuring either single or tandem affinity purification (TAP) tags. Other commonly employed techniques for uncovering protein–protein interactions, such as coimmunoprecipitation, size-exclusion chromatography, and immunohistochemistry, have been successfully used in the testis; however, they are less amenable to large-scale and high-throughput analysis than AP-MS. AP-MS approaches are typically applied to mammalian tissue culture systems, and can simultaneously identify a large number of protein–protein interactions, including multiprotein complexes. AP-MS, when applied with the appropriate controls, also results in the identification of fewer false-positive interactions. 
Additionally, the use of a mammalian system is, for obvious reasons, preferable to the use of a yeast-based system (see [72] for an overview of such systems). Although tissue culture-based AP-MS systems have been used with some success to examine the interactomes of proteins involved in spermatogenesis [73], they still suffer limitations, owing to their artificial nature. This limitation is particularly important in the study of spermatogenesis.
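To make the role of control purifications in AP-MS concrete, the sketch below filters the proteins recovered with a tagged bait against those recovered in a matched control purification. It is a deliberately simple illustration under stated assumptions: the function name, the spectral-count inputs, the pseudocount, and the fold-enrichment and minimum-count thresholds are all hypothetical, and published scoring schemes are considerably more sophisticated.

```python
def filter_interactors(bait_counts, control_counts, min_fold=5.0, min_count=2):
    """Keep preys enriched in the bait purification relative to the control.

    bait_counts / control_counts: dicts mapping a prey identifier to its
    spectral count in the bait and control purifications, respectively.
    Returns a dict mapping each retained prey to its fold enrichment.
    """
    kept = {}
    for prey, count in bait_counts.items():
        background = control_counts.get(prey, 0)
        # A pseudocount of 1 avoids division by zero for preys that are
        # absent from the control (an assumption of this sketch).
        fold = count / (background + 1)
        if count >= min_count and fold >= min_fold:
            kept[prey] = fold
    return kept


# Hypothetical spectral counts with placeholder prey names.
bait = {"PREY_A": 40, "PREY_B": 12, "PREY_C": 30, "PREY_D": 1}
control = {"PREY_B": 1, "PREY_C": 25}
candidates = filter_interactors(bait, control)
print(candidates)
```

In this toy input, a prey that is abundant in both purifications (PREY_C) is discarded as likely background, which is the type of false positive that cannot be recognized when no control experiment is performed.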

As outlined in the preceding sections, the testis is particularly abundant in tissue-specific protein expression as well as PTMs, and the complex architecture of the testis cannot be modeled in culture. Thus, if the interactome of a protein involved in spermatogenesis is defined by the use of tissue culture, it is likely that a large amount of information will be missed. For this reason, it is important to conduct interactome studies directly in the testis when possible. This prospect is significantly more demanding than tissue culture-based approaches, because of the need to have either highly specific antibodies or the ability to generate a transgenic line with an affinity-tagged version of the gene of interest. Despite these technological demands, a number of groups have successfully performed interactome studies in the testis. In 2009, Chen et al. used immunoprecipitation followed by gel-free LC-MS/MS to characterize the interactomes of MIWI and MILI in the mouse testis [74]. The authors then went on to perform a reciprocal experiment using one of their identified interaction partners, TDRKH, and characterized a multiprotein interaction network in the testis. Similarly, another experiment used a modified immunoprecipitation and MS approach to identify an additional member of the CATSPER complex in the mouse testis [75]. The authors of this study had undertaken this approach because of their inability to successfully express the CATSPER complex in any other system, which underscores the importance of examining protein interactions in their natural environment. Other groups have generated transgenic mouse lines in order to perform TAP directly in the testis. To identify novel interactors involved in Bardet–Biedl syndrome, a common feature of which is male infertility, Seo et al. generated a mouse line expressing a green fluorescent protein-coupled and S-tag-coupled Bardet–Biedl syndrome 4 construct, and performed TAP in transgenic testes, successfully identifying a novel protein complex member [76]. Another TAP experiment in the testis using a mouse expressing TAP-tagged 14-3-3ζ identified a large number of novel protein–protein interactions, although the lack of appropriate control experiments leaves the total number of true interactors in question [77].

The generation of transgenic mouse lines expressing affinity-tagged genes is not a trivial process, and requires a significant amount of time for design and production. For this reason, systems have been developed to aid in the production of affinity-tagged transgenic lines in a more efficient manner. For example, the Floxin vector system can be used to derive knock-in embryonic stem cell lines from gene-trap lines in just a few relatively basic steps [78]. This system has the added benefit of allowing for the introduction of the transgene into the endogenous locus to minimize any overexpression/mis-expression-related artefacts. Our laboratory has used the Floxin vector backbone to generate a streptavidin-binding peptide–3 × FLAG tandem affinity tagged knock-in vector that can be used to produce N-terminally tagged knock-ins of any gene [73]. This system was used to create an embryonic stem cell line expressing streptavidin-binding peptide–3 × FLAG–PPP1CC2, which was used to identify a novel interacting protein, dolichyl-diphosphooligosaccharide–protein glycosyltransferase, with a proposed role in spermatogenesis. However, this system has not yet been used to successfully generate a transgenic mouse for interactome studies. Although culture-based AP-MS strategies remain a powerful tool for interactome discovery, the full power of such approaches can only be harnessed when they are used in a more biologically relevant context. With technological advances in the generation of transgenic animals and AP-MS methodologies, such approaches will probably become more common in the coming years.

The utilization of proteomics in clinical studies of male infertility

The ultimate aim in furthering our understanding of spermatogenesis is to improve the ability to diagnose and treat male infertility. In recent years, a number of investigators have applied proteomic analysis to clinical samples in an effort to go beyond basic research. By analyzing the genital tract proteomes of infertile men, it may be possible to detect aberrations that explain impairments and evaluate the feasibility and course of future treatment. Although it is difficult to obtain a sufficient amount of starting material from spermatogenic cells in the testes of infertile men, ejaculated sperm and seminal fluid are much easier to obtain, and have thus been the subject of the bulk of the studies to date.

The proteome of sperm from infertile donors has been examined by a number of groups, to date primarily by 2DE followed by identification of differentially expressed protein spots via MS. Experiments have identified proteins that are differentially expressed in patients showing generalized infertility [49, 79], low sperm count and motility [80], asthenozoospermia [81-83], Sertoli cell-only syndrome [84], in vitro fertilization failure [85], diabetes, and obesity [86]. Collectively, these studies have been able to identify a large number of candidate biomarkers for sperm defects that could be clinically relevant in the coming years. However, these types of study have had limited impact at the clinical level, in part because of the heterogeneity of the disease paradigms, and because of the difficulty in parsing datasets from tissues that are catastrophically different from controls.

The seminal plasma provides a protective and facilitative environment for sperm transit that is critical in a number of ways for proper sperm function. Therefore, a number of clinical studies have applied proteomic analysis in the hopes of finding useful biomarkers for defects of the male reproductive system. According to Batruch et al. [87], the seminal plasma is an excellent fluid for clinical research, given its ease of collection and the fact that it contains secreted and shed proteins originating from several different tissues, which can offer clues to the origin of clinical defects. Many mouse mutations resulting in testicular failure are characterized by the exfoliation of immature germ cells into the seminiferous lumen, which would then contribute breakdown products to the seminal fluid as they transit the epididymis. The same feature can be seen in infertile men whose ejaculate contains immature germ cells, which are sometimes used for intracytoplasmic sperm injection in the treatment of male factor infertility. Although examinations of seminal plasma proteins have been performed for decades, in recent years the characterization of the fluid has reached new heights, owing to the increased use of LC-MS/MS technologies, leading to the identification of ~ 3000 seminal plasma proteins. In 2006, Pilch and Mann, using Fourier transform MS, identified > 900 proteins in seminal plasma from a single individual, representing, at the time, the largest catalog of proteins in that fluid [88]. Since then, a number of groups have used proteomic approaches to identify novel biomarkers for male infertility [87, 89-93]. One investigation, by Wang et al., utilized one-dimensional SDS/PAGE followed by LC-MS/MS to identify 741 seminal plasma proteins from normal fertile donors and those suffering from asthenozoospermia, including 101 that were differentially expressed between the two groups [93]. 
The most comprehensive proteomic analyses of the seminal plasma published to date are represented by a series of papers from Jarvi and colleagues, which include label-free quantitative analysis [87, 89, 91, 94]. Collectively, among five published accounts, > 2500 seminal plasma proteins have been identified in samples from fertile donors, postvasectomy patients, and those suffering from nonobstructive azoospermia and prostatitis. From these studies, a large number of potential biomarkers have been identified, including those useful in discrimination between obstructive and nonobstructive azoospermia, which provide a noninvasive diagnostic alternative to the current practices of testis biopsy and histology. Another study incorporated gene expression data from publicly available databases alongside seminal plasma proteomic data to identify biomarkers for pathologies of the male genital tract [95]. Similar proteomic approaches are being contemplated for the diagnosis of other male reproductive tract disorders and diseases, such as prostate cancer. The flip side to these clinical needs is the development of a nonsteroidogenic male contraceptive; one promising compound interferes reversibly with the action of the testis-specific chromatin regulator BRDT to temporarily interrupt spermatogenesis [47].

One factor that requires further investigation in clinical studies such as those mentioned above is the extent of interindividual variability in the proteomic content of sperm and/or seminal plasma. One study by Milardi et al., which examined the seminal plasma proteomes of five different fertile men, found only 83 proteins in common between all individuals in a study that identified between 919 and 1487 proteins per sample [92]. This number seems surprisingly low, and perhaps points to interexperiment variability as much as interindividual variability. The fact remains, however, that until we better understand what ‘normal’ variation in proteomic content constitutes, it will be difficult to identify a full complement of phenotype-related abnormalities. Despite this limitation, the routine use of proteomic analysis in a clinical setting may be practical in the near future. Alternatively, experiments such as those listed above can be used as discovery tools to produce smaller biomarker panels that are simpler to use and interpret, less expensive, and thus easier to institute in a clinical setting [94].
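A short calculation shows why the shared set reported above appears so small. The sketch uses only the figures quoted in the text (83 shared proteins; between 919 and 1487 proteins per sample); the function name is hypothetical, and because the individual per-sample counts are not given here, only the two extremes of the range are evaluated.

```python
def shared_fraction(n_shared, n_identified):
    """Fraction of one sample's identifications that all samples share."""
    return n_shared / n_identified


# Figures quoted in the text: 83 proteins common to all five men,
# with 919 to 1487 proteins identified per sample.
high = shared_fraction(83, 919)    # smallest sample gives the largest fraction
low = shared_fraction(83, 1487)    # largest sample gives the smallest fraction
print(f"{low:.1%} to {high:.1%} of each sample's proteins were shared by all")
```

On these numbers, fewer than one in ten of the proteins identified in any single sample were detected in all five, which is consistent with the suggestion that run-to-run sampling differences contribute alongside true biological variation.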

Emerging proteomic applications and the study of spermatogenesis

In the preceding sections, future applications of spermatogenesis research have been highlighted for different proteomic subdisciplines. If information from all of these types of study is considered together, a great deal can be learned about a protein's function in spermatogenesis. As currently existing technologies improve and new ones emerge, their application to the study of spermatogenesis will follow. Other methods, such as antibody microarrays, have the potential to generate useful information when applied to the study of spermatogenesis, although this approach has yet to gain purchase in the field (see, for example, [96]). One emerging proteomic technology that will probably be beneficial to the study of spermatogenesis constitutes the interface between imaging and proteomics – MALDI imaging MS. This technique allows for the identification and detection of proteins directly in tissue slices, which offers a wealth of spatiotemporal information. This technique has been applied to both the rat testis [97] and the mouse epididymis [98], in both cases allowing the mapping of a number of different proteins throughout those tissues. Despite a high amount of promise, this technique has been held back by a series of technological hurdles; once these have been resolved, this technique could become an extremely powerful new tool.

Technological advances in label-free quantitative proteomics, such as selected reaction monitoring, multiple reaction monitoring, and MS1 filtering, could possibly constitute the most important development in the proteomic study of spermatogenesis in the near future. These technologies give researchers the ability not only to determine which proteins are present, but also to accurately quantitate them in a variety of biological contexts at a resolution far exceeding that obtained with spectral counting-based approaches. Furthermore, label-free quantitative approaches are more amenable for use on tissue samples than labeling strategies such as stable isotope labeling with amino acids in cell culture, and thus may find more widespread use in reproductive biology research. The development of cross-platform, open-source software packages such as Skyline for label-free quantitative analysis should allow more researchers to have access to these quantitative proteomic methods, as well as allow for increased comparison of results across multiple laboratories [99]. In fact, our laboratory has recently used this software in a quantitative phosphoproteomic study of the mouse testis (G. MacLeod, P. Taylor, L. Mastropaolo, S. Varmuza, in preparation). Quantitative proteomics should allow for more accurate assessment of sperm and seminal plasma samples in cases of male infertility as well, either in global proteomic analysis of these samples to identify quantitative changes, or, perhaps more likely, to validate potential biomarkers identified in larger studies. For example, Drabovich et al. [94] used selected reaction monitoring with labeled internal standards to reanalyze 20 candidate biomarkers that discriminate between fertile, postvasectomy and nonobstructive azoospermia patients. These approaches could facilitate pilot studies to identify more specific subsets of proteins that could then be quantitated on larger samples with less expensive technologies such as ELISA.
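The principle behind intensity-based label-free quantitation can be illustrated with a toy calculation: the area under a peptide's extracted ion chromatogram is taken as a proxy for its abundance, and areas are compared between samples. The sketch below is an illustration only and does not reproduce the algorithm of any particular software package; the function name, the trapezoidal integration, and the chromatogram values are assumptions made for the example.

```python
import math


def peak_area(times, intensities):
    """Integrate an extracted ion chromatogram with the trapezoidal rule."""
    if len(times) != len(intensities):
        raise ValueError("times and intensities must have the same length")
    area = 0.0
    for i in range(1, len(times)):
        width = times[i] - times[i - 1]
        area += 0.5 * (intensities[i] + intensities[i - 1]) * width
    return area


# Hypothetical chromatograms for one peptide in two samples
# (retention time in minutes, intensity in arbitrary units).
times = [0.0, 1.0, 2.0, 3.0, 4.0]
sample_a = [0.0, 10.0, 20.0, 10.0, 0.0]
sample_b = [0.0, 20.0, 40.0, 20.0, 0.0]

area_a = peak_area(times, sample_a)
area_b = peak_area(times, sample_b)
log2_fold_change = math.log2(area_b / area_a)
print(area_a, area_b, log2_fold_change)
```

In practice, such comparisons also depend on retention-time alignment, normalization between runs, and replicate measurements, none of which are modeled here.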

As the technologies discussed in this article continue to improve, data are being generated at an increasing rate. Thus, a pressing issue in the future of proteomics is how to integrate the data from a variety of sources, including genetic studies. A need exists for improved methods for turning ‘protein lists’ into testable hypotheses. Also, questions of the accuracy of much of the data in existing proteomic databases have been raised; a particularly contentious issue is the field of PTM site assignment. However, despite these questions, it is clear that the use of proteomic technologies to study biological processes such as spermatogenesis and sperm function is becoming more and more prevalent and powerful. The field of reproductive biology as a whole will benefit from this, and, as our understanding of these processes improves, our ability to diagnose and treat male infertility will be greatly enhanced.