Plasma proteome plus site‐specific N‐glycoprofiling for hepatobiliary carcinomas

Abstract Hepatobiliary cancer is the third leading cause of cancer death worldwide. Appropriate markers for early diagnosis, monitoring of disease progression, and prediction of postsurgical outcome are still lacking. As the majority of circulating N‐glycoproteins originate from the hepatobiliary system, we sought to explore new markers by assessing the dynamics of the N‐glycoproteome in plasma samples from patients with hepatocellular carcinoma (HCC), cholangiocarcinoma (CCA), or combined HCC and CCA (cHCC‐CCA). Using a mass spectrometry‐based quantitative proteomic approach, we found that 57 of 5358 identified plasma proteins were differentially expressed in hepatobiliary cancers. The levels of four essential proteins, including complement C3 and apolipoprotein C‐III in HCC, galectin‐3‐binding protein in CCA, and 72 kDa inositol polyphosphate 5‐phosphatase in cHCC‐CCA, were highly correlated with tumor stage, tumor grade, recurrence‐free survival, and overall survival. Postproteomic site‐specific N‐glycan analyses showed that human complement C3 bears high‐mannose and hybrid glycoforms rather than complex glycoforms at Asn85. The abundance of complement C3 with mannose‐5 or mannose‐6 glycoform at Asn85 was associated with HCC tumor grade. Furthermore, stepwise Cox regression analyses revealed that HCC patients with a hybrid glycoform at Asn85 of complement C3 had a lower postsurgery tumor recurrence rate or mortality rate than those with a low amount of complement C3 protein. In conclusion, our data show that particular plasma N‐glycoproteins with specific N‐glycan compositions could be potential noninvasive markers to evaluate oncological status and prognosis of hepatobiliary cancers.


Introduction
Hepatobiliary cancer ranks sixth in the world among all malignancies and is the third leading cause of cancer mortality. Hepatocellular carcinoma (HCC) is the most common primary hepatic malignancy, with an average survival period between 6 and 20 months [1]. Risk factors for HCC include chronic hepatitis B or C virus infection, alcoholic liver disease, steatohepatitis, and liver cirrhosis. Cholangiocarcinoma (CCA), appearing as an intrahepatic type, a perihilar type (also known as Klatskin tumor), or a distal extrahepatic type, is the second most common liver cancer [2]. In contrast to the high prevalence of HCC (more than 700 000 new cases diagnosed every year globally), CCA has an annual incidence rate of approximately 2 per 100 000 people in western countries and 5 per 100 000 people in northeastern Asia [3][4][5]. Nevertheless, the overall incidence of CCA has increased over the past four decades. Risk factors for CCA include primary sclerosing cholangitis, liver fluke infection (Opisthorchis viverrini), chronic ulcerative colitis, biliary malformation (choledochal cysts or Caroli's disease), and exposure to Thorotrast [6].
Diagnosing hepatobiliary cancers at an early stage remains a challenge owing to their 'silent' clinical characteristics (most patients with early stage disease are asymptomatic), their difficult-to-access anatomical location, and their highly desmoplastic phenotype [2]. Currently, surgery works better than chemotherapy, immunotherapy, and radiotherapy for HCC and CCA. However, only a small group of patients are amenable to resection or liver transplantation. To improve early diagnosis, disease progression monitoring, and postmedication evaluation of these aggressive tumors, the development of new tests has become imperative.
The hepatobiliary system synthesizes the majority of plasma N-glycoproteins. Studies have shown that an aberrant serum/plasma N-glycome during liver cirrhosis [7][8][9][10] or HCC [11][12][13][14][15][16][17] reflects an unhealthy status of the liver. The clinical implications of glycoscience in oncology have become clear, and their impact has been significant. It is reasonable to assume that the delineation of the glycosylation pattern at a single protein-single site level may not only reveal the features of tumors with higher sensitivity than the total protein N-glycome but also hold great specificity for distinguishing different hepatobiliary cancer types that are hard to pinpoint in the initial stage. Therefore, we executed a quantitative proteomic investigation with site-specific glyco-profiling to identify noninvasive N-glycoprotein/N-glycoform markers from plasma samples of patients with HCC, CCA, or combined HCC and CCA (cHCC-CCA). From this has grown the hope that oncomedicine based on glycoproteins in liquid biopsies can be tailored in addition to conventional medical imaging.

Study design and patients
This study was approved by the Institutional Review Board of National Cheng Kung University Hospital (NCKUH) (No. B-ER-103-133). Plasma samples, clinical data, laboratory data, TNM tumor stage, and tumor differentiation grade of patients with CCA (n = 60), HCC (n = 148), and cHCC-CCA (n = 12) were obtained from the Tissue Bank, Research Center of Clinical Medicine, NCKUH. All the patients were anonymized. Participants in the control group (n = 95), who were negative for hepatobiliary diseases, were enrolled from the Health Examination Center of NCKUH. Informed consent was obtained from each subject of the control group. All plasma samples were stored at −80 °C until they were used.

Albumin and IgG depletion, protein trypsinization, and N-glycan removal
Five microliters of plasma in 100 μl of 1× phosphate-buffered saline were incubated with 100 μl of CaptureSelect™ Human Albumin Affinity Matrix (Life Technologies, Carlsbad, CA, USA) and 50 μl of Protein G-sepharose beads (GE Healthcare, Piscataway, NJ, USA) at room temperature for 2 h with gentle inversion. After centrifugation, the unbound proteins in supernatants were collected and kept on ice. The beads were washed with 200 μl of 1× phosphate-buffered saline three times. All washes and the unbound proteins were combined as the albumin-IgG-depleted fraction. Proteins that bound to the beads were eluted using 0.1 M glycine-HCl (pH 2.8) at room temperature for 10 min with vigorous vortexing as the albumin-IgG-enriched fraction. Both protein fractions were denatured using 10% sodium dodecyl sulfate plus 10 mM dithiothreitol at 95 °C for 10 min and alkylated with 10 mM iodoacetamide at 37 °C in the dark for 1 h. Salt removal and protein concentration were conducted using an Amicon Ultra-0.5 ml centrifugal filter device (molecular weight cutoff 3000 Da; Merck Millipore, Darmstadt, Germany). Devices were washed with 500 μl of deionized water three times. Concentrated proteins were quantified and half of them were treated with Peptide-N-Glycosidase F (PNGase F; New England Biolabs, Ipswich, MA, USA) at 37 °C overnight to remove N-glycans. Proteins with or without N-glycans were then digested with sequencing grade trypsin (Promega, Fitchburg, WI, USA) at an enzyme-to-substrate ratio of 1:50 at 37 °C overnight. The tryptic peptides were vacuum dried and stored at −80 °C until they were used.

Liquid chromatography-tandem mass spectrometry analysis
Peptides from 1.5 μg of protein samples were analyzed using an Ultimate 3000 RSLC system (Dionex, Sunnyvale, CA, USA) coupled with a Q-Exactive mass spectrometer (Thermo Fisher Scientific, Waltham, MA, USA). Mobile phase A was 0.1% fluoroacetic acid and mobile phase B was 0.1% fluoroacetic acid in 95% acetonitrile. The liquid chromatography (LC) separation was performed on a C18 column (Acclaim PepMap RSLC; Thermo Fisher Scientific) at 250 nl/min with a gradient consisting of (1) a linear increase from 1% to 25% B over 45 min, (2) a linear increase from 25% to 60% B over 10 min, and finally (3) an isocratic elution at 80% B for 10 min. A full mass spectrometry (MS) scan was performed over a mass-to-charge ratio range of 300 to 2000 with a mass resolution of 140 000. The 10 most intense ions from each MS scan were subjected to fragmentation for MS/MS analysis.
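The precursor selection described above (the 10 most intense ions within m/z 300 to 2000) can be illustrated with a minimal Python sketch. The function name `select_precursors` and the representation of a scan as `(m/z, intensity)` pairs are assumptions made for illustration only; the instrument's actual acquisition logic also involves settings that are not described in this section.

```python
import heapq

def select_precursors(peaks, top_n=10, mz_min=300.0, mz_max=2000.0):
    """Pick the top_n most intense peaks inside the scan range.

    peaks: iterable of (mz, intensity) tuples (hypothetical scan format).
    Returns the selected peaks ordered from most to least intense.
    """
    # Keep only peaks within the full-scan m/z window quoted in the text.
    in_range = [(mz, inten) for mz, inten in peaks if mz_min <= mz <= mz_max]
    # nlargest returns the peaks sorted by descending intensity.
    return heapq.nlargest(top_n, in_range, key=lambda p: p[1])

if __name__ == "__main__":
    scan = [(250.0, 1e9), (400.0, 5.0), (500.0, 9.0), (2100.0, 1e9), (600.0, 7.0)]
    print(select_precursors(scan, top_n=2))
```

Peaks outside the scan range are discarded before ranking, so an intense ion at m/z 250 or 2100 is never selected in this sketch.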

Bioinformatic analysis of the glycoproteome
For protein identification, the raw LC-MS/MS data were processed into peak lists by Proteome Discoverer 1.4 for a Mascot (version 2.4.1, Matrix Science Ltd., London, UK) search against the Swiss-Prot_2015_07 database (548 872 sequences; 195 617 763 residues) with the following parameters: enzyme, trypsin; missed cleavages, 1; peptide mass tolerance, 10 ppm; fragment mass tolerance, 0.05 Da; fixed modification, carbamidomethyl (C); variable modifications, oxidation (M) and deamidated (NQ). The algorithm for protein quantification from large-scale identification data by LC-MS/MS has been previously described [18,19]. In brief, the exponentially modified protein abundance index (emPAI), which was calculated from the number of sequenced peptides per protein, was used to estimate the relative amount of each protein in a database search. Then, the percentage of each emPAI relative to the sum of the emPAI values for all of the identified proteins was used to calculate the content of each protein. For the proteins that were detected in both albumin-IgG-enriched and albumin-IgG-depleted fractions, the higher value of protein content and the associated fraction were selected. Regarding the postproteomic N-glycan analysis, amino acid residues located before and after peptide sequences were merged to avoid missing an identification of the consensus motif for protein N-glycosylation (Asn-Xxx-Ser/Thr, where Xxx can be any amino acid except proline) after protein trypsinization. N-Glycopeptides were verified by the presence of a deamidation reaction of the Asn residue in this consensus motif. GlycoPeptideSearch was first used to assign glycopeptides [20]. Oxonium ions in the collision-induced dissociation MS/MS spectra, which displayed a specific set of Y-ions consisting of intact peptides with various attached glycan moieties, were used for the identification of glycopeptides. The GlycomeDB database was then applied to confirm the glycan structures attached to the glycopeptides by searching the molecular weights of intact glycopeptides that were consistent with MS/MS spectra. All the results of site-specific glycan analyses were checked manually. Analysts were blinded to any information about the subjects.
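The consensus-motif step can be sketched in Python as follows. This is a minimal illustration rather than the pipeline used in the study: the function names, the toy sequence, and the 10 ppm default are assumptions, and the mass shift is the standard monoisotopic value for Asn-to-Asp deamidation (+0.984016 Da). The sketch shows why residues flanking a tryptic peptide matter: a sequon that straddles a cleavage site is missed when the peptide is scanned alone.

```python
import re

# Lookahead pattern so that overlapping sequons are all reported.
# N-X-S/T where X is any residue except proline.
SEQUON = re.compile(r"(?=N[^P][ST])")

# Monoisotopic mass shift of Asn -> Asp deamidation, in Da.
DEAMIDATION_SHIFT = 0.984016

def find_sequons(seq):
    """Return 1-based positions of Asn residues that start an N-X-S/T sequon."""
    return [m.start() + 1 for m in SEQUON.finditer(seq)]

def glycosites_in_peptide(protein_seq, pep_start, pep_end):
    """Sequon Asn positions (1-based, protein numbering) inside a peptide.

    The peptide covers protein_seq[pep_start:pep_end] (0-based, end exclusive).
    The motif is evaluated on the full protein, so residues just after the
    peptide still count as context.
    """
    return [p for p in find_sequons(protein_seq) if pep_start < p <= pep_end]

def is_deamidated(observed_mass, unmodified_mass, tol_ppm=10.0):
    """True if the observed mass matches the unmodified mass plus one deamidation."""
    error = observed_mass - (unmodified_mass + DEAMIDATION_SHIFT)
    return abs(error) / unmodified_mass * 1e6 <= tol_ppm

if __name__ == "__main__":
    protein = "MAGNKSLPRANPTWK"            # toy sequence, not a real protein
    print(find_sequons("MAGNK"))            # peptide scanned in isolation
    print(glycosites_in_peptide(protein, 0, 5))
```

In the toy sequence, trypsin cleaves after Lys5, so the peptide MAGNK on its own shows no motif, while the protein context reveals Asn4 in an N-K-S sequon. The N-P-T stretch is correctly rejected because proline occupies the middle position.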

Enzyme-linked immunosorbent assay
Levels of complement C3 and galectin-3 binding protein in plasma were measured using the Human Complement C3 ELISA kit (ab108823; Abcam, Cambridge, UK) and the Human Galectin-3BP ELISA kit (ab213784; Abcam), respectively. The level of apolipoprotein C-III in plasma was measured by a direct ELISA method as previously described [21].

Statistical analysis
SPSS 18.0 for Windows (International Business Machines Corporation, Armonk, NY, USA) was used for most statistical analyses. Continuous variables were compared using Mann-Whitney U tests for two independent groups or Kruskal-Wallis tests with Dunn's post hoc tests for three or more groups. Nominal variables were compared using Fisher's exact tests or Pearson Chi-square tests. The Pearson correlation coefficient (r) was used to evaluate the relationship between two factors. The analyses and Venn diagrams for proteins and peptides were obtained using InteractiVenn (http://bioinfogp.cnb.csic.es/tools/venny/) [22]. Receiver operating characteristic (ROC) curves were used to identify proteins differentially expressed in hepatobiliary cancers (area under the ROC curve >0.7 and p < 0.00001). Kaplan-Meier analyses and log-rank tests were used to assess the significance of proteins for recurrence-free survival and overall survival. Stepwise Cox regression analyses were used to identify factors that were associated with tumor recurrence and mortality of the patients. Significance was defined as p < 0.05. All p values were two-tailed.
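The ROC screening criterion can be illustrated with a short Python sketch. It relies on the standard identity that the area under the ROC curve equals the Mann-Whitney U statistic divided by the product of the two group sizes. The function names and the example values are hypothetical, and taking max(AUC, 1 − AUC) for downregulated proteins is an assumption about how direction was handled, not a detail stated in the text. The p value is supplied by the caller because the test used to obtain it is not specified here.

```python
def roc_auc(cases, controls):
    """Area under the ROC curve via pairwise comparison (U / (n1 * n2)).

    Counts the fraction of case-control pairs in which the case value is
    higher, with ties counted as one half.
    """
    wins = 0.0
    for x in cases:
        for y in controls:
            if x > y:
                wins += 1.0
            elif x == y:
                wins += 0.5
    return wins / (len(cases) * len(controls))

def passes_screen(auc, p_value, auc_cutoff=0.7, p_cutoff=1e-5):
    """Apply the thresholds quoted in the text (AUC > 0.7 and p < 0.00001).

    Assumption: a downregulated protein (AUC < 0.5) is judged by 1 - AUC.
    """
    return max(auc, 1.0 - auc) > auc_cutoff and p_value < p_cutoff

if __name__ == "__main__":
    patients = [3.0, 4.0, 5.0]   # hypothetical protein contents
    controls = [1.0, 2.0, 3.0]
    auc = roc_auc(patients, controls)
    print(round(auc, 4), passes_screen(auc, p_value=1e-6))
```

The quadratic pairwise loop is chosen for clarity; a rank-based computation gives the same value and scales better for large cohorts.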

Characteristics of the patients
There was no gender difference between each patient group and the control group (Table 1); however, a male-predominant gender distribution was found in the HCC group when compared with the CCA group (see supplementary material, Table S1). The three groups of patients did not differ in age from each other (see supplementary material, Table S1) but they were all older than the controls (Table 1). Patients with HCC or cHCC-CCA had abnormal alanine transaminase (ALT) and aspartate aminotransferase (AST) levels. Moreover, all groups of patients had a higher level of alkaline phosphatase (Alk-P) and a lower level of albumin than the control group. Hematological tests revealed that all patients had a higher white blood cell count and patients with HCC or CCA had lower levels of red blood cells and hemoglobin than the controls. In addition, the α-fetoprotein level was abnormal in patients with HCC or cHCC-CCA while the carbohydrate antigen 19-9 (CA 19-9) level was abnormal in patients with CCA or cHCC-CCA. More than 80% of the patients with HCC or cHCC-CCA had hepatitis B or C virus infection and more than one-fifth of them had fatty liver. Approximately 60% of the patients with HCC had been diagnosed with liver cirrhosis. The percentages of patients with tumor stage greater than 3 were 24% in HCC, 45% in CCA, and 42% in cHCC-CCA. More than 70% of the patients with HCC or cHCC-CCA had recurrent tumors within a 5-year posthepatectomy follow-up. Five-year survival rates in the three groups of patients were all lower than 40%.

Plasma N-glycoproteome profiles in hepatobiliary cancers
A flowchart of this study is shown in supplementary material, Figure S1. A total of 43 236 peptides (Figure 1A, upper left panel) derived from 5358 proteins (Figure 1A, upper right panel), of which 721 were commonly expressed, were identified in plasma from all participants. A total of 1555 proteins were detected in both the albumin-IgG-enriched and albumin-IgG-depleted fractions. The number of common proteins was 2015 between the control and the HCC groups (Figure 1A, upper right panel; the intersection of two sets; n = 387 + 865 + 721 + 42), 1861 (865 + 721 + 58 + 217) between the control and the CCA groups, and 2092 (446 + 865 + 721 + 60) between the HCC and the CCA groups. However, the number of [...] (Figure 1A, lower left panel) originated from 1152 proteins (Figure 1A, lower right panel) were N-glycosylated. There were 246, 172, 17, and 180 N-glycoproteins that were uniquely detected in the HCC patients, the CCA patients, the cHCC-CCA patients, and the controls, respectively. Looking at the proteome and peptidome profiles in each group, the patients with HCC had the lowest median number of proteins (303) and peptides (2086) in plasma (see supplementary material, Figure S2). However, they had a higher percentage of N-glycoproteins (49.8%) than did the controls (47.8%, p < 0.001) and the patients with CCA (48.0%, p < 0.001).
Statistical analyses (area under the ROC curve >0.7 and p < 0.00001) of protein contents revealed a remarkable upregulation of 24 proteins and a downregulation of 33 proteins in hepatobiliary cancers (Table 2). The fold change of 12 differential proteins was greater than 10 when compared to the controls (Figure 1B and see supplementary material, Table S2). Overall, the transcript levels of these genes in cancerous tissues corresponded with their protein contents in plasma (see supplementary material, Table S3). Thirty-one differential proteins were detected with one or more N-glycosylation sites. The contents of 12 and 9 proteins were particularly high and low in HCC, respectively (see supplementary material, Table S4). Moreover, the content of eight proteins in intrahepatic CCA differed from that in perihilar CCA (see supplementary material, Table S5). Only two of these differential proteins were slightly influenced by age (see supplementary material, Table S6). Proteins that were affected by hepatitis B or C virus infection, liver cirrhosis, or steatosis in HCC are also illustrated (see supplementary material, Table S7).
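The pairwise overlaps quoted above are sums of Venn regions from Figure 1A, and the arithmetic can be checked with a few lines of Python. The dictionary layout and variable names are illustrative assumptions; the region counts and reported totals are taken directly from the text, and no attempt is made to say which region each individual count represents beyond what the text states.

```python
# Venn region counts quoted in the text for each pair of groups.
pairwise_regions = {
    ("control", "HCC"): [387, 865, 721, 42],
    ("control", "CCA"): [865, 721, 58, 217],
    ("HCC", "CCA"): [446, 865, 721, 60],
}

# Totals reported in the text for the same pairs.
reported_totals = {
    ("control", "HCC"): 2015,
    ("control", "CCA"): 1861,
    ("HCC", "CCA"): 2092,
}

# Sum the regions of each pairwise intersection.
computed_totals = {pair: sum(regions) for pair, regions in pairwise_regions.items()}

if __name__ == "__main__":
    for pair, total in computed_totals.items():
        status = "matches" if total == reported_totals[pair] else "differs from"
        print(f"{pair[0]} and {pair[1]}: {total} ({status} the reported value)")
```

The count of 721 appears in every sum because it is the number of proteins common to all groups.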

Essential factors for tumor stage, differentiation, and prognosis
We next assessed the clinical relevance of the protein contents (emPAI % values) of these 57 candidates in different hepatobiliary cancers. Proteins that were associated with the tumor stage, tumor grade, recurrence-free survival, and overall survival of HCC, CCA, and cHCC-CCA are shown in Figure 2A and in supplementary material, Figures S3 and S4. Of note, the emPAI % values of complement C3 and apolipoprotein C-III were associated with the tumor progression and prognosis of HCC, as were those of galectin-3-binding protein in CCA and 72 kDa inositol polyphosphate 5-phosphatase in cHCC-CCA. Strong correlations between emPAI % values and actual amounts of these essential proteins were observed (see supplementary material, Figure S5).

Site-specific N-glycan profiling
Apolipoprotein C-III and 72 kDa inositol polyphosphate 5-phosphatase were not N-glycosylated. An N-glycan structure analysis was performed only for the glycopeptides containing Asn85 of complement C3; glycopeptides containing Asn939 of complement C3 and glycopeptides corresponding to galectin-3-binding protein were excluded because of the weak signal intensities of these fragments in the mass spectra. In regard to the glycosylation pattern of complement C3 Asn85, the patients with HCC or cHCC-CCA had a higher proportion of the Hex7HexNAc2 (mannose-7; Man7) glycoform than the patients with CCA and the controls (see supplementary material, Table S8). Furthermore, when compared with the controls, all the patient groups had lower proportions of the Hex5HexNAc2 (mannose-5; Man5) and Hex6HexNAc3SA1 (hybrid) glycoforms. We next analyzed the clinical relevance of each glycoform at Asn85 of complement C3 in HCC. The proportion of each glycovariant in the patients with HCC is shown in supplementary material, Table S9. The concentration of complement C3 with the Man5, Man6, or Man7 glycoform at Asn85 was closely linked to the tumor grade (Figure 2B), and the association was stronger than that of α-fetoprotein, a renowned HCC biomarker (see supplementary material, Figure S6). The glycoprofile of complement C3 Asn85 was independent of age (see supplementary material, Table S10). Results from Kaplan-Meier analyses showed that levels of total complement C3 protein and certain C3 glycovariants were associated with the recurrence rate and the mortality rate of HCC (Figure 3A,B). Stepwise Cox regression analyses revealed that tumor stage, AST, complement C3 with the Man5 glycoform, and complement C3 with the hybrid glycoform were independent factors for recurrent HCC (Table 3). Furthermore, tumor stage, albumin, liver cirrhosis, and complement C3 with the hybrid glycoform were associated with the mortality rate of HCC. The correlation of complement C3 bearing the Man5 or hybrid glycoform with the postsurgery prognosis of HCC was stronger than that of the total complement C3 protein level (Table 3).
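The glycan compositions named above translate into mass increments on the Asn85 glycopeptide, which can be sketched in Python. The residue masses are standard monoisotopic values; the function name is hypothetical, and treating "SA" as N-acetylneuraminic acid is an assumption, since the text does not specify the sialic acid species. This is an illustration of the bookkeeping, not the search procedure used in the study.

```python
import re

# Monoisotopic residue masses (Da) of monosaccharides within a glycan chain.
RESIDUE_MASS = {
    "Hex": 162.052824,     # hexose, e.g. mannose
    "HexNAc": 203.079373,  # N-acetylhexosamine
    "SA": 291.095417,      # assumption: SA = N-acetylneuraminic acid
}

# Matches a residue name followed by its count, e.g. "Hex5" or "HexNAc2".
TOKEN = re.compile(r"([A-Za-z]+)(\d+)")

def glycan_residue_mass(composition):
    """Mass added to a peptide by a glycan such as 'Hex5HexNAc2'.

    The sum of residue masses is the increment on the glycopeptide; a free
    glycan would carry one additional water (18.010565 Da).
    """
    total = 0.0
    for name, count in TOKEN.findall(composition):
        total += RESIDUE_MASS[name] * int(count)
    return total

if __name__ == "__main__":
    for label, comp in [("Man5", "Hex5HexNAc2"), ("Man6", "Hex6HexNAc2"),
                        ("Man7", "Hex7HexNAc2"), ("hybrid", "Hex6HexNAc3SA1")]:
        print(f"{label:>6} {comp:<16} {glycan_residue_mass(comp):.4f} Da")
```

Successive high-mannose glycoforms differ by one hexose (162.0528 Da), which is the spacing one would expect between the Man5, Man6, and Man7 glycopeptide masses.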

Discussion
Hepatobiliary cancer is highly progressive. Despite a wide array of tumor markers and treatment options, the prognosis of hepatobiliary cancer remains poor. Recent advances in glycan-detection approaches have accelerated interest in clinical glycoproteomics for the discovery of detection markers or therapeutic targets for chronic disease and cancer. Here, we identified circulating N-glycoprotein/N-glycoform markers to help early diagnosis, monitoring of disease progression, and

T-T Chang et al
One major challenge in mass spectrometry-based clinical glycoproteomics is to quantify native proteins in specimens in a label-free manner. Several algorithms, including the spectral count and its derivatives, have been proposed to estimate protein abundance and are accompanied by large-scale validation results [23][24][25]. Rappsilber et al first proposed the PAI method [26], which evaluates the number of peptides observed from a protein relative to the total number of observable peptides. However, factors such as the length and amino acid composition of peptides and their ionization efficiency may affect the observability of peptide fragments by the mass spectrometer. Later reported by Ishihama et al, emPAI estimated protein amount in proteomics by the number of sequenced peptides per protein [18] and showed a satisfactory correlation with the actual protein amount in complex mixtures. In 2010, Shinoda et al presented emPAI %, a powerful and accurate calculation method for acquiring the relative content of individual proteins [19].
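The relationship among PAI, emPAI, and emPAI % can be made concrete with a short Python sketch. It uses the published definitions, PAI = observed peptides / observable peptides, emPAI = 10^PAI − 1, and emPAI % = emPAI divided by the sum of emPAI over all identified proteins, times 100. The function names and the peptide counts in the example are hypothetical and do not come from the study data.

```python
def empai(n_observed, n_observable):
    """Exponentially modified protein abundance index: 10**PAI - 1."""
    pai = n_observed / n_observable
    return 10 ** pai - 1

def empai_percent(peptide_counts):
    """Relative protein content (%) from emPAI values.

    peptide_counts maps a protein name to a tuple
    (observed peptides, observable peptides).
    """
    values = {name: empai(obs, total) for name, (obs, total) in peptide_counts.items()}
    total_empai = sum(values.values())
    return {name: 100.0 * value / total_empai for name, value in values.items()}

if __name__ == "__main__":
    # Hypothetical counts for three proteins in one sample.
    counts = {"protein_A": (10, 10), "protein_B": (5, 10), "protein_C": (1, 20)}
    for name, pct in empai_percent(counts).items():
        print(f"{name}: {pct:.2f}%")
```

Because emPAI grows exponentially with the fraction of observable peptides that are detected, a protein with full peptide coverage dominates the percentage even when its raw peptide count is only modestly higher.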

Herein, complement C3, apolipoprotein C-III, and galectin-3-binding protein were selected under this algorithm and their emPAI % values showed a high correspondence to the actual protein concentrations. Using emPAI % as a screening platform, though it may result in a loss of target detection, holds great potential for the application of the plasma proteome to routine laboratory tests, especially as it is not currently possible to quantify whole proteins in specimens. We identified 57 differential proteins in hepatobiliary cancers. It is easy to understand the downregulation of proteins produced by the liver, such as albumin and serotransferrin, as a result of impaired liver function during hepatocarcinogenesis. We also observed the upregulation of two glycosylation-related enzymes, UGT8 and UGGT2, in the plasma samples of the patients. UGT8 promotes the biosynthesis of galactocerebrosides, which are abundant sphingolipids of the myelin membrane of the central and peripheral nervous systems. UGGT2 transfers a glucose monomer to misfolded glycoproteins, thus providing quality control for protein transport out of the endoplasmic reticulum. Currently, there is no evidence of a relationship between hepatobiliary cancers and UGT8 or UGGT2. However, indirect effects of UGT8 and UGGT2 upregulation on the liver microenvironment and on tumorigenic events in hepatocytes or cholangiocytes might be suspected.
Four proteins closely related to tumor progression and prognosis of hepatobiliary malignancies were found. Galectin-3-binding protein, also named MAC-2-binding protein, is known to mediate cell-to-cell adhesion and initiate pathologic, proinflammatory responses [27][28][29]. Enhanced galectin-3-binding protein expression has been linked to poor prognosis in colorectal cancer, diffuse large B-cell lymphoma, and lung cancer [30][31][32][33]. It has also been reported to be an accurate diagnostic marker for CCA [34], which is consistent with our finding. A novel marker we identified for cHCC-CCA is 72 kDa inositol polyphosphate 5-phosphatase, which is involved in intracellular calcium mobilization, insulin-related signal transduction, and glucose homeostasis. The pathological roles and detailed mechanisms of 72 kDa inositol polyphosphate 5-phosphatase in cHCC-CCA need to be further addressed. Regarding HCC, the first marker, apolipoprotein C-III, is a major structural component of very-low-density lipoprotein but is also present in chylomicrons and high-density lipoprotein. It inhibits lipoprotein lipase and hepatic lipase and promotes the assembly and secretion of very-low-density lipoprotein particles from hepatic cells [35]. Apolipoprotein C-III has attracted much attention owing to its relationship with hyperlipoproteinemia and fatty liver disease but has not yet been directly linked to HCC. The other HCC marker, complement C3, is a central component of the classical, alternative, and lectin pathways of the complement system. Cleaved complement C3 triggers activation of the complement cascade, which augments host immune functions including lysis of bacteria and cells by forming the membrane-attack complex, opsonization, and chemotaxis of leukocytes [36]. It is not surprising that downregulation of complement C3 was detected in patients with HCC [37,38] because of their compromised immune system during hepatocarcinogenesis. However, beyond empirical speculation, we found that the level of complement C3 correlated positively with poor differentiation of tumor cells and an unfavorable prognosis of HCC. A growing body of evidence supports roles for activated components of the complement system in various aspects of carcinogenesis, including chronic inflammation, tumor immunoescape, tumor cell proliferation, angiogenesis, and tumor invasion [39][40][41]. Moreover, complement inhibition-related therapeutic strategies for cancer treatment have been designed [42,43]. Given the above, it could be expected that complement C3-targeted inhibitors, such as APL-2 and compstatin, may find application in cancer pharmaceutics in the future.
The postproteomic glycan analysis herein primarily focused on Asn85 of complement C3 because apolipoprotein C-III and 72 kDa inositol polyphosphate 5-phosphatase were not N-glycoproteins. Moreover, other glycopeptide fragments belonging to complement C3 and galectin-3-binding protein did not possess enough signal intensity for high-resolution glycan analyses in the context of a whole plasma proteome. Our data are akin to previous reports showing that mainly high-mannose sugar chains cover human complement C3 protein; Man5 or Man6 on Asn85 and Man8 or Man9 on Asn [44][45][46]. One can easily understand that the C3 gene is highly conserved among species owing to its importance in the immune system [47]. Nonetheless, it is intriguing to detect human complement C3 proteins that are equipped with high-mannose glycans since high-mannose type structures on mature proteins are usually present in lower eukaryotes but are rarely found in higher eukaryotes, except as precursor oligosaccharides during glycan biosynthesis. More studies are needed to comprehend why high-mannose N-glycans are retained in the human complement C3 protein. In addition, a secondary structural model proposes that all three Asn residues on complement C3 are part of reverse turns [48]. Accordingly, it is plausible to assume that alteration of glycan composition at Asn85 in patients with HCC has a profound effect on the interaction of complement C3 with other factors, thereby contributing to hepatocellular carcinogenesis.
Taken together, our findings, in the context of HCC, CCA, and cHCC-CCA, may enable new insight and foresight on the diagnosis, monitoring of tumor progression, and prognosis of different hepatobiliary cancers. With the continued emergence of new biotechnologies beyond the realm of the glycoproteome, validation cohorts, and even clinical application, of inhibitors or antagonists of these tumor markers for the treatment of hepatobiliary cancers may be developed soon.
quantification. C-HH was responsible for the experimental performance, data analyses, and manuscript writing.

Availability of data and material
The mass spectrometry proteomics data have been deposited to the ProteomeXchange Consortium via the PRIDE partner repository with the dataset identifier (Accession number: PXD013629; Username: reviewer19947@ebi.ac.uk; Password: 7Lmoj8AB). Other data are available from the corresponding author on reasonable request.
Figure S1. Flowchart of the study design
Table S1. Comparisons of characteristics of the patients with different types of hepatobiliary cancer
Table S2. Comparison of differential protein content between patients with hepatobiliary cancers and controls
Table S3. Changes in mRNA levels of differential proteins in hepatobiliary cancers
Table S4. Comparison of differential protein content among different hepatobiliary cancers
Table S5. Comparison of differential protein content between patients with intrahepatic CCA and patients with perihilar CCA
Table S6. Relationship between age and differential protein content in patients with hepatobiliary cancers