Glycosylation is the most structurally complicated and diverse type of protein modification. Protein glycosylation has long been recognized to play fundamental roles in many biological processes, as well as in disease genesis and progression. Glycoproteomics focuses on characterization of proteins modified by carbohydrates. Glycoproteomic studies normally include strategies to enrich glycoproteins containing particular carbohydrate structures from protein mixtures followed by quantitative proteomic analysis. These glycoproteomic studies determine which proteins are glycosylated, the glycosylation sites, the carbohydrate structures, as well as the abundance and function of the glycoproteins in different biological and pathological processes. Here we review recent developments in the methods used in glycoproteomic analysis. These techniques are essential for elucidating the relationships between protein glycosylation and disease states. We also review the clinical applications of different glycoproteomic methods.
Glycoproteomics is the study of proteins containing carbohydrate modifications. Rather than separating glycan and protein analyses into glycomics and proteomics, glycoproteomics often starts with enrichment of glycoproteins containing particular glycan structures from a protein mixture, using innate properties of certain carbohydrates attached to proteins, followed by quantitative proteomic analysis using MS 1. Protein glycosylation is the most abundant protein modification and ∼50% of all proteins are estimated to be glycosylated 2, 3. Glycosylation plays fundamental roles in cell functions, such as cell adhesion, migration, and signal transduction 4–7, as well as in disease genesis and progression 8, 9. Aberrant glycosylation has been implicated in many diseases, including immune deficiencies, neurodegenerative diseases, hereditary disorders, cardiovascular diseases, and cancer 10. Therefore, glycoproteomics has emerged as a critical branch of proteomic study and results from these analyses have great potential for clinical applications. To examine disease-related changes in glycoproteins, efforts have been made to develop sensitive and robust analytical methods for glycoprotein analysis. The purpose of this review is to summarize current strategies and methods used in glycoproteomics and recent achievements toward clinical application of these methods.
2 Glycoproteomics: The strategies and methods
Two types of protein glycosylation are most common: N-glycosylation, where glycans are attached to asparagines in a consensus sequence N-X-S/T (X can be any amino acid except proline) via an N-acetylglucosamine (GlcNAc) residue, and O-glycosylation, where the glycans are attached to serine or threonine through O-glycosidic linkages. The O-glycosylated proteins include proteins modified by oligosaccharides and proteins that are modified by O-GlcNAc. O-GlcNAc modification of proteins was originally discovered in Gerald Hart's laboratory, and the recent comprehensive review from Wang and Hart covers the methods and recently developed technologies for specific enrichment, detection, and quantification of O-GlcNAc sites 11. Here we will not review the analysis of O-GlcNAc glycoproteins.
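The N-X-S/T consensus rule lends itself to a simple sequence scan. The following Python sketch lists candidate N-glycosylation positions in a protein sequence. The function name `find_sequons` and the example sequence are illustrative assumptions and are not taken from any of the cited studies. A match indicates only a potential site; whether a sequon is actually occupied has to be determined experimentally.

```python
import re

# Asparagine followed by any residue except proline, then serine or threonine.
# The lookahead consumes only the N, so overlapping sequons (e.g. "NNTS")
# are all reported.
SEQUON = re.compile(r"N(?=[^P][ST])")


def find_sequons(sequence):
    """Return 1-based positions of asparagines in N-X-S/T sequons (X != P)."""
    return [match.start() + 1 for match in SEQUON.finditer(sequence.upper())]


if __name__ == "__main__":
    example = "MNGSANPTNKTQ"  # hypothetical sequence, for illustration only
    print(find_sequons(example))
```

In the example sequence, the asparagine in N-P-T is skipped because proline occupies the X position.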
N- and O-linked glycosylations are mediated through different biosynthetic pathways and are likely to have different functions 12. For example, cell–cell interactions are involved in processes underlying cancer growth and metastasis, such as cell differentiation, angiogenesis, migration, and invasion. Precise glycosylation of macromolecules is required for these biological processes. Reviewing the roles of O-linked and N-linked glycosylation in the structure and function of glycoprotein hormones, Fares concluded that O-linked glycosylation plays a minor role in in vitro bioactivity and receptor binding, but is critical for in vivo bioactivity and half-life of glycoprotein hormones 13. In contrast, N-linked glycosylation plays an important role in bioactivity of glycoprotein hormones.
Since protein glycosylation plays fundamental roles in disease development, numerous attempts have been made to develop methods to identify glycoprotein changes related to diseases. So far, analytical approaches can broadly be categorized as glycoprotein-based analysis or glycopeptide-based analysis. The workflows of the two approaches are illustrated in Fig. 1. The former begins with enrichment of glycoproteins in an attempt to identify which proteins are glycosylated. Various separations, such as size exclusion, ion exchange, affinity chromatography, and chemical immobilization, have been applied to enrich protein fractions in glycoproteins. This approach can provide characterization of the primary structure of the glycoproteins but may not be able to identify the glycosylation site. In glycopeptide-based approaches, the glycoproteins are digested enzymatically and/or chemically. The resulting mixture is enriched in glycopeptides, and the glycopeptides are deglycosylated and identified by MS analysis. This strategy has been widely used for identification of glycoproteins and their glycosylation sites. It is essential to enrich glycoproteins or glycopeptides present in complex biological samples prior to MS analysis. Several of the most commonly used strategies will be reviewed here; other methods have been reviewed recently 14, 15.
2.1 Enrichment by lectin chromatography
Various lectins have been used for enrichment of glycoproteins or glycopeptides through affinity chromatography 16, 17. Because lectins bind to distinct oligosaccharide epitopes 18, lectins immobilized on appropriate matrices like membranes, agarose, or magnetic beads can be used to isolate and fractionate glycoproteins on the basis of the specific affinities of lectins for different glycan structures 19, 20. However, because individual lectins show unique binding specificity, separation with a particular lectin will enrich only the fraction of glycoproteins or glycopeptides that bind to that lectin 21–23. Therefore, most lectin affinity chromatography methods use lectins with relatively broad specificity 24, 25. Kaji et al. presented one of the earliest works identifying N-linked glycoproteins using isotope-coded tagging and a Con A affinity column 26.
The limitation of selective capture of a subset of glycoproteins by a given lectin may be overcome by a technique that involves double-lectin chromatography prior to identification with MS 22. Recently, a multi-lectin column has been developed that allows for an almost complete enrichment of glycoproteins from biological fluids 27, 28. Enrichment with certain lectins may be very useful for isolation of glycoproteins or glycopeptides with particular glycan structures. In addition, lectin microcolumns have been generated for high-pressure analytical schemes and can be directly coupled on-line to ESI-MS to enable highly sensitive, semi-automated profiling of glycoproteins 29, 30.
2.2 Enrichment with hydrazide chemistry
Glycopeptide isolation can also be performed by chemical methods. Zhang et al. developed a method that is based on the conjugation of glycoproteins/glycopeptides to a solid support with hydrazide chemistry after oxidation of the carbohydrates; the N-linked glycopeptides are released with PNGase F 31, 32. The advantage of this method is its high specificity; over 90% of isolated peptides are identified as glycopeptides 33. The method simplifies the peptide mixture by isolating and analyzing an average of only one to two deglycosylated glycopeptides from each glycoprotein; as a consequence, glycoproteins whose glycopeptides are not detectable by MS are not identified. However, the glycoprotein coverage can be improved by using multiple enzymes for digestion, such as trypsin, pepsin, and thermolysin, prior to capturing 30, 31, 34. Using hydrazide chemistry, both complex N-linked and O-linked glycoproteins/glycopeptides are conjugated to the solid support via a covalent linkage. However, in contrast to N-linked glycopeptides, O-linked glycopeptides cannot be specifically released from the solid support due to the lack of an efficient enzyme. Chemical approaches for the removal of O-linked glycosylation in complex samples, such as β-elimination, have been explored with limited success.
Compared with the lectin affinity approach, the hydrazide chemistry enrichment of glycoproteins is based on a covalent reaction, which results in less non-specific binding. However, the glycan structure information is lost because all glycopeptides are captured. The lectin affinity approach is relatively simple and flexible, allowing the use of a single lectin or a combination or series of lectins. However, non-specific binding of non-glycoproteins to lectin affinity columns often occurs because the capture relies on non-covalent affinity binding. The outcome of the two methods was evaluated using human cerebrospinal fluid with respect to the number of identifications and capture specificity 35. The basic conclusions were: (i) many glycoproteins were observed with only one of the methods owing to the methods' different mechanisms and to the structure and complexity of the specific protein; (ii) the hydrazide chemistry method has higher specificity (81%) than the lectin method (69%). A similar observation was reported by McDonald et al. 36. In addition, the different populations of glycoproteins enriched by the lectin and hydrazide chemistry methods were illustrated by Lee et al. in an analysis of rat liver membrane glycoproteins 37. Proteins enriched by lectins were mostly of high molecular weight and had more potential for involvement in signal transduction and cell adhesion. Conversely, proteins enriched by hydrazide chemistry were mostly of low molecular weight and appear to function as enzymes. These results indicated that the two analytical approaches, lectin affinity capture and hydrazide chemistry, may be complementary to each other for glycoprotein and N-glycosylation site identification.
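The two comparison criteria used above, capture specificity and the overlap of identifications between methods, can be sketched in a few lines of Python. This is a minimal illustration that assumes specificity is defined as the fraction of all identifications that are glycosylated; the exact definition used in the cited comparison may differ. The function names, protein identifiers, and counts are hypothetical placeholders, not data from those studies.

```python
def capture_specificity(glyco_identifications, total_identifications):
    """Fraction of identifications that are glycosylated (assumed definition)."""
    if total_identifications <= 0:
        raise ValueError("total_identifications must be positive")
    return glyco_identifications / total_identifications


def compare_methods(lectin_ids, hydrazide_ids):
    """Split identifications into shared and method-specific sets."""
    lectin, hydrazide = set(lectin_ids), set(hydrazide_ids)
    return {
        "shared": lectin & hydrazide,
        "lectin_only": lectin - hydrazide,
        "hydrazide_only": hydrazide - lectin,
    }


if __name__ == "__main__":
    # Placeholder counts and identifiers chosen only to show the arithmetic.
    print(capture_specificity(81, 100))
    overlap = compare_methods(["P1", "P2", "P3"], ["P2", "P3", "P4", "P5"])
    for group, members in overlap.items():
        print(group, sorted(members))
```

Large method-specific sets relative to the shared set would be the numerical signature of the complementarity described in the text.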
2.3 Enrichment with boronic acid
Glycoprotein enrichment can also be achieved by reaction with boronic acid 38, 39. The principle of this method is that the cyclic boronic diesters formed by the reaction of cis-diols on the glycans with boronic acid are stable under basic conditions. Recently, Sparbier et al. reported using this method to detect low-abundance glycoproteins in human blood samples 39. Xu et al. synthesized a novel diboronic acid-functionalized mesoporous silica (FDU-12-GA) and successfully applied it for specific glycopeptide enrichment 40.
2.4 Enrichment with other methods
Hydrophilic interaction using solid-phase extraction has been used for glycopeptide enrichment 41. In addition, Alvarez-Manilla et al. demonstrated that size-exclusion chromatography enriches N-linked glycopeptides relative to non-glycosylated peptides. This method is based on the fact that glycopeptides have increased mass relative to non-glycosylated peptides 42.
3 Clinical applications of glycoproteomics
Glycoproteins may have a range of glycoforms. Potential glycosylation sites may or may not be occupied by different glycans, and different forms may have different biological functions or result in clinical aberrations. There are several reasons that glycoproteins are of clinical interest: (i) Protein glycosylation not only affects properties of the proteins but also regulates diverse biological functions through particular protein–carbohydrate recognition 43–47. (ii) The aberrant glycosylation of glycoproteins is a fundamental characteristic of disease genesis and progression 48–52. (iii) Extracellular proteins are usually glycosylated 53 and have the potential to enter the blood stream; these proteins may provide biomarkers for certain disease states. (iv) By focusing on the glycoprotein subproteome, glycoproteomic analysis greatly reduces sample complexity and thereby significantly increases the detection sensitivity for low abundance proteins. A number of publications have reported the clinical applications of glycoproteomic analyses and demonstrated the utility of different glycoproteomic methods. Here, we include a few representative examples.
3.1 Plasma biomarker discovery
Human blood has been the subject of disease-related biomarker discovery because blood sampling is relatively non-invasive and blood contains valuable markers of physiological and pathological conditions. A good example of a plasma biomarker for disease diagnosis is the prostate-specific antigen (PSA) used for prostate cancer screening and treatment monitoring. PSA is expressed specifically in prostate tissue and its level in blood is associated with disease progression, which makes it clinically useful.
Plasma proteomics is challenged by detection of low abundance proteins. This is due to both the high dynamic concentration range of plasma proteins (12 orders of magnitude) and the fact that the plasma proteome is dominated by a few highly abundant proteins and their diverse forms. To overcome this challenge and increase our ability to analyze low abundance proteins that are relevant to certain diseases, depletion has been used to remove highly abundant proteins from sera or plasma. However, the most abundant proteins are not easily removed completely. One feasible approach to improve the limit of detection for the plasma proteome is to combine glycopeptide isolation methods with other analytical techniques to focus on a particular subset of the proteome that is of interest. In general, glycopeptides are present in relatively low abundance (2–5%) in peptide mixtures compared to non-glycopeptides. The number of identified low abundance proteins is dramatically increased if glycoprotein enrichment is applied to sera. Glycoproteomics shows promise in plasma biomarker discovery, not only because of the reduced complexity of the analyses, but also because this subproteome is rich in disease-related information. Aberrant glycosylation has been implicated in many diseases, including various cancers. Therefore, glycoproteins secreted from tumors can serve as potential targets for disease diagnosis. So far, many clinical biomarkers are glycoproteins, such as Her2/neu for breast cancer, PSA for prostate cancer, and CA 125 for ovarian cancer.
A number of recent studies have identified disease-related glycoproteins directly from sera or plasma. Different glycosylation profiles were observed in human normal and lung adenocarcinoma sera by Hongsachart et al. 54. Using wheat germ agglutinin lectin, this group identified 39 differentially expressed glycoproteins in lung adenocarcinoma serum samples, including 27 up-regulated and 12 down-regulated proteins. In addition, the correlations with lung cancer development of three up-regulated glycoproteins (adiponectin, ceruloplasmin, and glycosylphosphatidyl-inositol-80) and two down-regulated glycoproteins (cyclin H and Fyn) were validated by western blot analysis. These particular glycoproteins may be useful for the early detection of lung cancer and for monitoring disease progression. In another example, a mixed column of Jacalin, Con A, and wheat germ agglutinin was used to perform a comparative glycoproteomic analysis of sera from breast cancer patients 27. Some low abundance proteins, such as neuropilin-1 and pregnancy zone protein, were identified, and a number of proteins associated with lipid transport and cell growth showed changes in expression. Soltermann et al. applied hydrazide chemistry to capture the glycopeptides from malignant pleural effusions of patients with lung cancer and controls 55. They were able to analyze glycoproteins in the low protein concentration range (μg/mL to ng/mL) and identified several proteins associated with tumor progression or metastasis, e.g. CD44, CD166, CA-125, and lysosome-associated membrane glycoprotein 2 (LAMP-2).
To further reduce sample complexity, glycoprotein capture has been applied to depleted plasma or serum. Qiu et al. identified several potential markers that distinguish colorectal cancer from adenoma and normal cells using lectin affinity chromatography after immuno-depletion of the most abundant plasma proteins 56. Using a glycoprotein-based strategy, Ueda et al. demonstrated that lectin-coupled ProteinChip technology enabled high-throughput and specific recognition of cancer-related aberrant glycosylations 57. In this study, a three-step approach was performed: (i) depletion of 14 abundant proteins from the serum sample, (ii) enrichment of glycoproteins with lectin-coupled ProteinChip arrays, and (iii) MS analysis using an acidic glycoprotein-compatible matrix. Liu et al. applied subsequent chemical fractionation based on cysteinyl peptide and N-linked glycopeptide isolation to serum after immuno-depletion of abundant serum proteins 58. This approach identified 2910 unique N-glycopeptides, corresponding to 662 N-glycoproteins and 1553 N-glycosylated sites. In this study, numerous low-abundance plasma components, such as 78 cytokines and cytokine receptors and 136 human cell differentiation molecules, were identified as well.
Hepatocellular carcinoma (HCC) is the third leading cause of cancer-related deaths in the world, partially as a result of late diagnosis. In an attempt to discover serum markers for early detection, Block et al. identified proteins with aberrant fucosylation in this disease state 59. To focus on the low abundance proteins, Comunale et al. depleted HCC serum of the 12 most abundant serum proteins and then identified fucosylated proteins that were changed in HCC using targeted lectin extraction 60. Analysis of 300 patient samples by lectin-ELISA demonstrated that these proteins could be used in HCC diagnosis. In their study, fucosylated hemopexin showed an excellent correlation with diagnosis, with an AUROC (area under the receiver operating characteristic curve) of 0.95, achieving 92% sensitivity and 92% specificity (Fig. 2).
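For readers less familiar with these diagnostic metrics, the following Python sketch shows how AUROC, sensitivity, and specificity can be computed from marker measurements in cases and controls. It is a generic illustration, not the analysis pipeline of the cited study; the function names and the marker values are hypothetical. The AUROC is computed here through its rank interpretation: the probability that a randomly chosen case has a higher marker level than a randomly chosen control, with ties counted as one half.

```python
def auroc(case_values, control_values):
    """AUROC as the fraction of case/control pairs ranked correctly."""
    wins = 0.0
    for case in case_values:
        for control in control_values:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(case_values) * len(control_values))


def sensitivity_specificity(case_values, control_values, cutoff):
    """Call a sample positive when its marker level is >= cutoff."""
    true_positives = sum(value >= cutoff for value in case_values)
    true_negatives = sum(value < cutoff for value in control_values)
    return (true_positives / len(case_values),
            true_negatives / len(control_values))


if __name__ == "__main__":
    # Hypothetical lectin-ELISA readouts in arbitrary units.
    cases = [0.9, 0.8, 0.4]
    controls = [0.5, 0.3, 0.2]
    print(auroc(cases, controls))
    print(sensitivity_specificity(cases, controls, cutoff=0.45))
```

Sensitivity and specificity depend on the chosen cutoff, whereas the AUROC summarizes performance across all possible cutoffs.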
3.2 Mapping glycosylation sites on proteins of interest
Protein glycosylation is heterogeneous, as proteins may have various potential glycosylation sites and a number of carbohydrate modifications at each site. Analysis of protein glycosylation has taken two main directions: one is to analyze proteins and glycans from glycoprotein mixtures; the other is to analyze the occupancy of glycosylation sites or determine the expression level of proteins with particular glycosylation sites. Both are thought to correlate with disease states.
Disturbances in glycoconjugate biosynthesis result in diseases with heterogeneous biochemical and clinical characteristics 61. Congenital disorders of glycosylation (CDGs), a family of N-linked glycosylation defects, are associated with severe clinical features. To investigate the correlation between the degree of N-glycosylation site occupancy and the severity of the disease, Hulsmeier et al. quantified the extent of under-glycosylation in CDG patients and healthy controls using isotopically labeled standard peptides and multiple reaction monitoring (MRM). In healthy controls, the glycoproteins analyzed, transferrin and α1-antitrypsin, had 98–100% occupancy of all N-glycosylation sites, whereas the level of glycosylation site occupancy in CDG samples was decreased to a variable extent for each individual N-glycosylation site, and the extent of the decrease was correlated with the severity of the disease 62.
Glycosylation of platelet proteins is critical for cell–cell recognition, ligand binding, pathogen binding, and antigen presentation. Therefore, analysis of platelet protein glycosylation will further our understanding of platelet biology. To elucidate the occupancy of N-glycosylation sites of human platelet proteins, Lewandrowski et al. enriched glycopeptides using both lectin affinity chromatography and hydrazide beads followed by MS analysis 63. Over 70 glycosylation sites from 41 unique proteins were identified; the majority had not been identified in previous studies. Among those proteins, immunoglobulin receptor G6f, a protein known to be involved in downstream signaling of the Ras-mitogen-activated protein kinase pathway in the immune system, was identified in human platelets for the first time.
The HIV-1 envelope protein (Env) plays a key role in mediating viral entry and fusion to host cells and is thus a major target for HIV vaccine development. Glycosylation contributes critically to HIV pathogenesis by impacting folding of the envelope spike and therefore affecting the protein's antigenicity and immunogenicity 64–68. Furthermore, glycosylation is involved in HIV immune-evasive mechanisms by conformationally masking epitopes and through glycan shielding 69–73. The HIV-1 envelope has at least 24 potential N-linked glycosylation sites and its transmembrane subunit gp41 has four or five potential N-linked glycosylation sites located in the extra-viral domain 74. The gain or loss of glycosylation at these sites can significantly change the biological activity of the envelope spike. Attempts have been made to define the glycosylation pattern of gp120 of several HIV strains 75–80. Recently, Go et al. characterized the glycosylation pattern of the HIV-1 envelope by elucidating the occupied glycosylation sites and the type of glycan modification on each glycosylation site 81. The glycosylation patterns of an rVV-derived synthetic Env immunogen, CON-S, and a wild-type clade B Env protein, JR-FL, were determined (Table 1). This study revealed that not all potential glycosylation sites in Env are utilized; these data may guide the design of effective immunogens against HIV.
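The arithmetic behind a site-occupancy estimate from isotope-dilution MRM data can be sketched as follows. This is a simplified illustration under stated assumptions and not necessarily the calculation used in the cited study: it assumes that the unoccupied form of the site-containing peptide and a non-glycosylated reference peptide from the same protein are each quantified against a spiked heavy-isotope standard, and that occupancy is one minus the unoccupied fraction. The function names and the peak areas are hypothetical.

```python
def amount_from_ratio(light_area, heavy_area, heavy_spiked_fmol):
    """Endogenous (light) peptide amount from the light/heavy peak-area ratio."""
    if heavy_area <= 0:
        raise ValueError("heavy_area must be positive")
    return light_area / heavy_area * heavy_spiked_fmol


def site_occupancy(unoccupied_fmol, total_protein_fmol):
    """Fraction of protein molecules carrying a glycan at the site (assumed model)."""
    if total_protein_fmol <= 0:
        raise ValueError("total_protein_fmol must be positive")
    return 1.0 - unoccupied_fmol / total_protein_fmol


if __name__ == "__main__":
    # Hypothetical peak areas and spike amounts, for illustration only.
    unoccupied = amount_from_ratio(2000.0, 10000.0, heavy_spiked_fmol=50.0)
    total = amount_from_ratio(10000.0, 10000.0, heavy_spiked_fmol=100.0)
    print(site_occupancy(unoccupied, total))
```

Under this model, a sample in which the unoccupied peptide makes up a larger share of the protein gives a lower occupancy value, which is the quantity compared between CDG samples and healthy controls.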
Table 1. Glycosylation sites detected in CON-S and JR-FL 81
4 The future direction of glycoproteomics in clinical applications
Changes in protein glycosylation are clearly a feature of disease progression. These modifications have important biological functions and have potential value as targets for the diagnosis and treatment of various human diseases. In the future, glycoproteomics will move toward functional study of glycoproteins. Glycoproteomics will play a key role in analysis of the function of glycosylation in pathological processes, such as tumor metastasis and inflammation. The challenge posed by the dynamic range of detection needs to be addressed for the plasma proteome. In addition, analytical methods are needed to achieve more accurate and reproducible quantitation.
The authors have declared no conflict of interest.