A proof-of-principle gel-free proteomics strategy for the identification of predictive biomarkers for the onset of pre-eclampsia

Authors


Dr RT Blankley, Manchester Interdisciplinary Biocentre, University of Manchester, 131 Princess Street, Manchester M1 7DN, UK. Email richard.blankley-2@manchester.ac.uk

Abstract

Objective  Progress in the prevention and treatment of women at risk of pre-eclampsia (PE) remains hindered by the lack of clinical screening tools that can accurately predict which mothers are at risk. The identification and validation of predictive biomarkers is therefore seen as a critical milestone towards improved healthcare provision and the clinical testing of new therapeutic strategies. Gel-free proteomic technologies offer the capability of analysing hundreds of plasma proteins simultaneously, but as yet these methods have not been applied to pregnancy complications. To assess the feasibility of such an approach to plasma biomarker research in pregnancy, we have applied the technique to compare samples from women with PE with those from gestation-matched controls.

Sample  Pooled plasma samples taken at time of disease from women with PE (n = 23) and gestation-matched controls (n = 23).

Methods  Proteomics strategy for relative quantification of proteins using mass spectrometry.

Results  We identified several differences, including elevated levels of endoglin, PAPP-A and PSG1 in PE plasma. Increased levels of endoglin were validated using immunoassay analysis of individual plasma samples.

Conclusions  Although at a relatively early stage, this mass spectrometry-based approach shows promise as a tool to identify global protein changes in plasma. The application of these methods to pre-disease samples is the next step in the identification of clinically useful biomarkers.

Introduction

Pre-eclampsia (PE) is a severe pregnancy-associated disorder affecting 3–5% of pregnancies.1 To date the only effective clinical intervention is to induce delivery of the fetus and placenta. The testing of potential therapeutic agents has been hindered by the lack of a sensitive early pregnancy screening test capable of discriminating affected from unaffected pregnancies. The availability of a test to identify a cohort of women at high risk of developing PE would allow a targeted clinical trial of potential therapies, rather than assessing such prevention in a low-risk population. Changes in feto-placental function have been documented in early pregnancy in women who go on to develop PE; it is therefore perceived that a screening test would be most useful at a gestation where intervention is likely to be of benefit, i.e. before 20 weeks.

Although several proteins have been shown to be altered in PE, some prior to the onset of clinical disease, none of these has yet been found to have the sensitivity necessary to be useful in a clinical setting.2 Despite the demonstration of statistically significant differences in several candidate markers, their use as effective screening tools has been impeded by considerable overlap in the reference ranges between women with normal and abnormal outcome. Clinically useful predictive tests for PE are most likely to come from combined measurements of several markers; these markers may be peptides/proteins, nucleic acids or cellular metabolites. Mass spectrometry-based methods facilitate an unbiased analysis of multiple proteins simultaneously;3,4 their place alongside hypothesis-driven research could therefore be invaluable in identifying clinically useful biomarkers in addition to providing novel insights into the pathogenesis of PE.

To our knowledge, the potential of gel-free proteomics methods has not yet been explored in the context of PE; here, we describe a preliminary study to test the applicability of such methods to the identification of differentially expressed proteins in this condition.

Proteomics-based biomarker discovery

The identification and quantification of changes in either protein abundance or protein structure between case and control samples is fundamental to biomarker discovery. A significant proportion of biomarker discovery papers in the current literature use two dimensional (2D) gel-based proteomics approaches. These methods are conceptually simple; proteins are separated by pI (isoelectric point) in the first dimension and molecular weight in the second. When gels from different samples are compared it is possible to identify gel spots that are different between samples. A related technique allows fluorescent labelling of proteins such that two samples can be mixed and compared on a single gel (DIGE).5 Gel spots of interest are excised and trypsin digested before being subjected to mass spectrometry analysis to determine the identity of the protein(s) which compose the spots. A clear advantage to this technique is that mass spectrometry analysis is targeted towards a small number of differentially expressed proteins. However, the technique does have inherent problems in that it is labour intensive, has poor reproducibility and is limited to the detection of highly abundant plasma proteins. It remains to be seen whether this class of plasma proteins will have the sensitivity for early disease detection and the disease selectivity that are required for clinically useful biomarkers.

A family of closely related gel-free proteomics strategies represents state-of-the-art mass spectrometry technology. In these strategies, extensive protein/peptide fractionation occurs using liquid chromatography (LC) columns such that the peptides/proteins always remain in solution. Removal (immunodepletion) of abundant plasma proteins followed by extensive fractionation/separation steps is a vital step in the biomarker discovery process because the exceptionally complex plasma proteome spans at least ten orders of magnitude in concentration; this massive dynamic range is the major challenge to mass spectrometry-based biomarker discovery.6 While the depletion of proteins bound to albumin is undesired, the extra proteome penetration achieved following abundant protein removal more than compensates for these losses.

Gel-free proteomics approaches analogous to the gel-based DIGE technique are now routinely applied to relative quantification of proteins across multiple samples. Here, we have employed a popular labelling chemistry called iTRAQ (isobaric tagging for relative and absolute quantification) which allows the simultaneous relative quantification of proteins in up to eight samples using a tandem mass spectrometer.7,8 In contrast to the 2D gel methods, the labelling and fractionation occur at the peptide level rather than the protein level; that is to say, the mixture of plasma proteins is digested with trypsin protease before the LC fractionation. Although this increases the complexity of the analytical mixture, the performance of modern mass spectrometers is such that these ‘shotgun proteomics’ approaches allow a plasma proteome penetration much deeper than that achieved using gel-based techniques. In this study, labelled peptides were separated using LC and spotted directly onto steel target plates prior to their analysis using a MALDI-TOF/TOF (matrix-assisted laser desorption and ionisation – time-of-flight/time-of-flight) mass spectrometer. This LC-MALDI approach has allowed us to identify and quantify plasma proteins that are well below the level of detection using 2D gels.

Here, we have used iTRAQ, a commercially available peptide labelling kit, for the relative quantification of plasma proteins from patients diagnosed with PE compared with matched controls. Pooled, time-of-disease samples have been used in this preliminary study to ensure large protein differences between disease and control samples and to facilitate the analysis of a significant number of case and control samples (n = 23 each).

Materials and methods

Plasma samples

Study group

Local Research Ethics Committees gave approval for this work and written informed consent was obtained from all women. Patients were recruited as part of a longitudinal study as previously described9 and women with a pre-existing medical disease were not included. Blood samples were taken at diagnosis from women who developed PE (n = 23) and from normotensive controls (n = 23), matched for gestation at sampling and parity. PE was diagnosed using standard definitions from the International Society for the Study of Hypertension in Pregnancy.10

Plasma samples

Blood samples, collected in pre-cooled EDTA vials, were centrifuged for 15 minutes at 3000 g at 4°C and the plasma removed. Aliquots of pooled plasma from the PE and control groups were created and stored at −80°C.

Sample preparation

Pooled plasma samples were immunodepleted of the 12 most abundant plasma proteins using an IgY-LC2 HPLC column (Beckman) following the manufacturer’s protocol. The immunodepleted plasma was concentrated using a 5 kDa MWCO spin filter and each sample passed down the IgY-LC2 column a second time to remove all traces of the 12 abundant proteins. The immunodepleted plasma samples were concentrated and buffer exchanged into 0.5 M triethylammonium bicarbonate buffer, pH 8.5, using a 5 kDa MWCO spin filter unit (Agilent, Stockport, UK). Protein concentration was then determined using a Bradford assay (Pierce, Cramlington, UK).

iTRAQ labelling

A schematic illustrating the iTRAQ experimental workflow is shown in Figure 1. The normal pregnancy (NP) and PE complicated pregnancy (PE) plasma samples were labelled in duplicate. Eighty micrograms of each protein sample was reduced, alkylated and trypsin digested exactly as directed in the iTRAQ kit protocol. The pooled NP sample was divided in half and labelled with iTRAQ reagents 114 and 116, whereas the PE sample was split and labelled with the 115 and 117 reagents. The four samples were then mixed and dried in a Speed-Vac concentrator (Eppendorf, Cambridge, UK).

Figure 1.

 iTRAQ workflow and example tandem mass spectrum. (A) Schematic of the iTRAQ workflow. (B) iTRAQ isobaric tag structure showing the reporter (114–117) and balance (28–31) segments of the isobaric tags which are covalently attached to peptides via an amine reactive group. (C) An example tandem MS spectrum from the data set with zoomed low mass iTRAQ reporter region. Peptide has been assigned to a unique peptide from endoglin (GEVTYTTSQVSK), the theoretical Y and B series fragment ions for this sequence are marked on the spectrum.
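As a rough illustration of why the four tags in panel (B) are isobaric, the nominal reporter and balance masses can be paired so that every tag has the same total. The pairing order (lightest reporter with heaviest balance group) and the variable names are assumptions made for this sketch, and only nominal integer masses are used.

```python
# Nominal masses (Da) of the iTRAQ 4-plex reporter and balance groups,
# taken from the ranges quoted in the Figure 1 caption.
# Assumption: the lightest reporter pairs with the heaviest balance group.
reporters = [114, 115, 116, 117]
balances = [31, 30, 29, 28]

# Total nominal mass contributed by each complete tag.
tag_totals = {r: r + b for r, b in zip(reporters, balances)}

# If every tag adds the same mass, a peptide labelled with any of the
# four tags appears as a single precursor; only the reporter ions
# released on fragmentation distinguish the samples.
is_isobaric = len(set(tag_totals.values())) == 1
print(tag_totals, is_isobaric)
```

Under these assumptions each tag contributes the same nominal mass, which is why quantification happens in the low-mass reporter region of the tandem MS spectrum rather than at the precursor level.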

Peptide fractionation

Labelled peptides were initially fractionated by off-line strong cation exchange chromatography (SCX) using a Polysulfoethyl column (PolyLC, Columbia, MD, USA) with the following mobile phases: Buffer A, 20% acetonitrile (ACN), 0.1% formic acid (FA); Buffer B, 20% ACN, 0.1% FA, 1 M KCl. A 60-minute linear gradient from 0% to 60% B was used to elute the peptides, during which 90-second fractions were collected. Thirty-two SCX fractions were collected and dried using a Speed-Vac concentrator before being stored at −20°C.

LC-MALDI workflow

Each SCX fraction was separated by reverse-phase LC using an Ultimate 3000 nano-LC system (Dionex, Sunnyvale, CA, USA) with a Pepmap 100 C18 column (15 cm, 180 μm i.d., 100 Å particle size; Dionex). The eluent from the nano-LC was connected to a Probot fraction collector/target spotting robot (LC Packings, San Francisco, CA, USA). MALDI matrix (5 mg/ml HCCA in 70% ACN, 0.1% TFA) was delivered from the Probot syringe driver at 0.7 μl/minute and mixed with the nano-LC eluent. Samples were spotted out at a density of 1664 sample spots per target plate with 30-second elution times per spot. Using this pattern, eight SCX fractions were spotted per target plate.
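The spotting figures above imply a fixed amount of LC elution time per SCX fraction. The short calculation below works this out; it assumes that every spot on a plate holds sample and that spots are divided evenly between the eight fractions, neither of which is stated explicitly in the text.

```python
SPOTS_PER_PLATE = 1664        # spot density quoted in the text
SECONDS_PER_SPOT = 30         # elution time collected on each spot
SCX_FRACTIONS_PER_PLATE = 8   # fractions spotted on one target plate
TOTAL_SCX_FRACTIONS = 32      # fractions collected in the SCX step

# Assumption: spots are shared equally between the fractions on a plate.
spots_per_fraction = SPOTS_PER_PLATE // SCX_FRACTIONS_PER_PLATE

# LC elution time captured on the plate for each SCX fraction.
minutes_spotted_per_fraction = spots_per_fraction * SECONDS_PER_SPOT / 60

# Ceiling division: number of target plates needed for all fractions.
plates_needed = -(-TOTAL_SCX_FRACTIONS // SCX_FRACTIONS_PER_PLATE)

print(spots_per_fraction, minutes_spotted_per_fraction, plates_needed)
```

On these assumptions each fraction occupies 208 spots, corresponding to 104 minutes of spotted eluent, and the full experiment fills four target plates.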

Mass spectrometry

Mass spectrometry was performed using an ABI4800 MALDI-TOF/TOF (Applied Biosystems, Foster City, CA, USA). MALDI target plates were calibrated using a standard peptide mix; additionally, an internal MS calibration was performed on a per-spot basis using peptide standards spiked into the MALDI matrix (GluFib, m/z = 1570.6 and Insulin beta-chain oxidised, m/z = 3494.6, both from Sigma, Poole, UK). Laser power and tuning parameters were optimised for each sample plate.

Data analysis

The mass spectrometry data were analysed using ProteinPilot v2.0 (Applied Biosystems); peptide identifications were made using the Paragon algorithm11 searching against the Uniprot human protein database. We used a 99% confidence interval cut-off for significant peptide identifications and allowed single-peptide protein identifications. The false positive protein identification rate was determined by searching all peptide data against a decoy database containing both forward and reversed protein sequences. Relative quantification was carried out using ProteinPilot v2.0 software (Applied Biosystems, Warrington, UK).
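The decoy search described above can be illustrated with a generic target-decoy estimate. This is a minimal sketch of a commonly used estimator for a concatenated forward plus reversed database, not a description of how ProteinPilot computes its reported value; the function name, the "REV_" prefix used to mark reversed sequences and the example accession list are all hypothetical.

```python
def estimate_fdr(accessions, decoy_prefix="REV_"):
    """Estimate the false discovery rate from a concatenated search.

    Assumption: hits to reversed (decoy) sequences are marked by a
    prefix, and false hits land on forward and reversed sequences with
    equal probability, so false hits are roughly 2 x decoy hits.
    """
    total = len(accessions)
    if total == 0:
        return 0.0
    decoys = sum(1 for acc in accessions if acc.startswith(decoy_prefix))
    return 2 * decoys / total


# Hypothetical result list: 198 forward hits and 2 decoy hits.
hits = ["P%05d" % i for i in range(198)] + ["REV_P00001", "REV_P00002"]
print(f"Estimated FDR: {estimate_fdr(hits):.1%}")
```

The factor of two reflects the assumption that each decoy hit stands in for one undetected false hit among the forward sequences.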

ELISA immunoassays

A human endoglin ELISA kit (R&D Systems Europe, UK) was used to determine the plasma levels of endoglin protein. Individual plasma samples were diluted 1:10 with dilution buffer and 50 μl used for each reading. In all ELISA assays, the experimental scientist was blinded to the clinical outcome of the individual plasma samples during all stages of analysis. ELISAs for human clusterin (Adipogen, Seoul, Korea) and PAPP-A (R&D Systems Europe, UK) were also performed using the provided protocols.

Results

Patient characteristics

Women who developed PE had significantly elevated blood pressures (by definition) and delivered lower birthweight babies (see Table 1). Women with PE were delivered at 36 (27–41) weeks’ gestation and plasma samples were obtained 5 days (range −7 to 8) from the diagnosis of PE being made. Control samples (n = 23) were obtained from women with uncomplicated pregnancies who delivered at term (>37 weeks) and who were matched for gestation at sampling (±2 weeks) and parity.

Table 1.   Demographic data for sample cohort

| | Pre-eclampsia plasma samples (n = 23) | Normal outcome plasma samples (n = 23) |
| --- | --- | --- |
| Max (S) BP (mmHg) | 152* (130–199) | 124 (96–140) |
| Max (D) BP (mmHg) | 103* (84–126) | 78 (60–90) |
| Delivery gestation (weeks + days) | 36+6* (26+5–40+5) | 40+5 (38+4–41+6) |
| Birthweight (g) | 2285* (590–4780) | 3480 (2670–4380) |
| IBR** (centile) | 16* (1–100) | 38 (12–99) |
| Parity | 0 (0–2) | 0 (0–2) |

*P < 0.05 Mann Whitney U test.
**Individualised birthweight ratio.

Relative quantification of plasma proteins

The ProteinPilot software was used for the identification and relative quantification of iTRAQ labelled peptides. The ProteinPilot software calculated the aggregate false discovery rate (estimated % of misidentified proteins) to be 2.01%. This allowed us to identify proteins that were significantly (P < 0.05) over- or under-represented in the PE pooled plasma compared with the NP control pool (Table 2). Most of the proteins in which a significant difference in abundance was recorded are high abundance plasma proteins, including complement cascade proteins, which exist at μg/ml to mg/ml levels. Examples of differentially expressed proteins included vitronectin, inter-alpha inhibitor and clusterin. Vitamin D binding protein is found on the lists of both over- and under-represented proteins, with different protein database accession numbers. A careful manual analysis of the peptide data from this protein revealed that the majority of peptides were under-represented in the PE plasma, whereas three peptides matching to a different allele (GC2, T420K) were at relatively higher abundance in the PE plasma.

Table 2.   Proteins under- or over-represented in the pooled PE plasma

| Under-represented in PE plasma: Uniprot | Name | Over-represented in PE plasma: Uniprot | Name |
| --- | --- | --- | --- |
| P01023 | Alpha-2-macroglobulin | Q7Z600 | Apolipoprotein B |
| Q6GTG1 | Vitamin D binding protein | P19827 | Inter-alpha inhibitor H1 |
| P04004 | Vitronectin precursor | P19823 | Inter-alpha inhibitor H2 |
| P43652 | Afamin precursor (Alpha-albumin) | Q3B7H5 | Inter-alpha inhibitor H3 |
| Q59EH1 | Fibronectin 1 | P13671 | Complement C6 |
| P07477 | Trypsin-1 | P10643 | Complement C7 |
| Q5T5G4 | Extracellular matrix protein 1 | Q5T7J4 | PAPP-A |
| P02747 | Complement C1q | P22891 | Vitamin K-dependent protein Z |
| P05155 | Plasma protease C1 inhibitor | P09871 | Complement C1s |
| P02765 | Fetuin-A | P04278 | Sex hormone-binding globulin |
| O43345 | Zinc finger protein 208 | P10909 | Clusterin |
| Q6U2E9 | Complement C4B | P00742 | Coagulation factor X |
| | | Q6UPU6 | Coagulation factor V |
| | | P35858 | Insulin-like growth factor binding protein complex acid labile chain precursor (ALS) |
| | | A55181 | Pregnancy-specific B-1 glycoprotein 11 |
| | | Q6ICR4 | Pregnancy-specific B-1 glycoprotein 1 |
| | | P02774 | Vitamin D binding protein |
| | | P02743 | Serum amyloid P-component |
| | | Q53GZ8 | Complement C2 |
| | | Q6LEU7 | Pregnancy-specific glycoprotein 9 |
| | | Q6B0J6 | Paraoxonase 1 |
| | | P32119 | Peroxiredoxin-2 |
| | | P15169 | Carboxypeptidase N catalytic chain |

Table of proteins which show a statistically significant (P < 0.05) difference in their abundance between the pooled normal pregnancy plasma and the pooled pre-eclampsia (PE) plasma. Uniprot (http://www.uniprot.org) accession numbers are provided for each protein identification. Significance level calculated using ProteinPilot 2.0 software. Some proteins of interest for which there are too few peptide identifications to achieve the 5% significance level are shown in Figure 2.

Several pregnancy-specific or pregnancy-enriched proteins were identified as showing a statistically significant increase in PE plasma compared with the NP plasma (Table 2). These included PAPP-A (pregnancy-associated plasma protein-A), PSG1 (pregnancy-specific β-1 glycoprotein 1) and the closely related PSG9 (pregnancy-specific β-1 glycoprotein 9). Gene ontology tools were used to address whether any pathways or molecular functions were enriched in the lists of differentially regulated proteins, but they were not informative. In our experience, these tools (closely related to KEGG enzyme classifications) are more useful for interrogating lists of intra-cellular proteins.

Modest differences in the relative levels of highly abundant proteins were statistically significant because of the large number of data points (peptides) and the reproducibility of the technique.12 An inspection of the total relative quantification data set (>25 000 peptides) revealed some lower abundance proteins of biological interest whose expression changes did not reach the P < 0.05 significance threshold. In some cases, there was evidence for a relative abundance difference, but not enough unique peptide identifications to reach the calculated significance threshold. Examples of this from our data set included endoglin (two unique peptide assignments) and placental lactogen (three unique peptide assignments). Both proteins showed an increase in the PE plasma compared with the NP plasma (Figure 2). Pregnancy zone protein (PZP) is an example of a protein that was under-expressed in the PE plasma compared with NP plasma (Figure 2).

Figure 2.

 Log plot showing relative abundance levels of peptides of interest in the pooled plasma. Log2 plot of iTRAQ reporter ion ratios (115:114 and 117:116) for unique peptides (99% confidence level) from eight proteins of interest. For clarity a maximum of ten peptides (20 data points) are shown per protein. Inset shows a partial sequence alignment of two common alleles of Vitamin D binding protein; the T436K substitution introduces a new trypsin cleavage site. The peptide unique to the GC2 allele was found to be over-represented in the PE plasma, whereas the GC1 peptide had the opposite expression profile. P.Lactogen – Placental lactogen, PBP – Platelet basic protein, Vit D BP – Vitamin D binding protein, PSG1 – Pregnancy-specific β-1 glycoprotein, PZP – Pregnancy zone protein.
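The two ratios plotted for each peptide follow from the labelling design, in which the NP pool carries the 114 and 116 reporters and the PE pool carries 115 and 117. The sketch below shows how one peptide spectrum yields two log2 data points; the function name and the reporter peak areas are hypothetical and chosen only for illustration.

```python
import math


def log2_pe_over_np(reporter_areas):
    """Return the two duplicate log2(PE/NP) ratios for one peptide.

    reporter_areas maps reporter mass (114-117) to its peak area.
    NP was labelled with 114 and 116, PE with 115 and 117.
    """
    return (
        math.log2(reporter_areas[115] / reporter_areas[114]),
        math.log2(reporter_areas[117] / reporter_areas[116]),
    )


# Hypothetical reporter ion peak areas for a single tandem MS spectrum.
example_spectrum = {114: 1000.0, 115: 2000.0, 116: 1200.0, 117: 2400.0}
ratios = log2_pe_over_np(example_spectrum)
print(ratios)
```

On a log2 scale a value of 0 means no difference between the pools, positive values indicate over-representation in PE plasma, and agreement between the two values of a pair gives a sense of the technical reproducibility of the duplicate labelling.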

Validation of iTRAQ results

As a means to validate the iTRAQ relative quantification results, immunoassays were performed for two proteins: endoglin, for which there was evidence for a relative increase but a paucity of peptide data, and clusterin, for which we measured a statistically significant abundance difference. ELISA assays were performed on a subset of the individual plasma samples that contributed towards the pool (n, PE = 9, NP = 11).

The plasma levels of endoglin were significantly higher (P = 0.0014) in the individual PE samples compared with the NP samples; the median and interquartile range measured using the ELISA were 41.8 (27.9–50.8) ng/ml versus 16.0 (7.9–24.3) ng/ml (Figure 3). The technical variability was determined by analysing the same plasma sample four times; the resulting coefficient of variation (CV) was 9.4%.
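For readers unfamiliar with the CV as a measure of technical variability, the sketch below shows one common way to calculate it from replicate readings. The replicate values are hypothetical and are not the measurements behind the figure quoted above; the use of the sample standard deviation (n − 1 denominator) is also an assumption, as the text does not state which form was used.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Return the CV (%) as sample standard deviation over the mean."""
    return 100 * stdev(values) / mean(values)


# Hypothetical replicate readings (ng/ml) of one plasma sample.
replicates_ng_per_ml = [10.0, 11.0, 9.0, 10.0]
cv = coefficient_of_variation(replicates_ng_per_ml)
print(f"CV = {cv:.1f}%")
```

A lower CV indicates tighter agreement between repeated measurements of the same sample, which is what allows between-group differences to be distinguished from assay noise.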

Figure 3.

 Immunoassay analysis of endoglin levels in individual patient samples. Plasma levels of endoglin (CD105) were determined for 20 individual samples that contributed towards the pooled plasma samples analysed using iTRAQ. Horizontal bars indicate median values for each sample set (n, NP = 11, PE = 9). There is a statistically significant (P < 0.01) difference in endoglin levels between the two sample sets (Mann–Whitney test).

Using commercial ELISA kits for both PAPP-A and clusterin, we recorded higher median values for protein levels in the PE samples compared with NP samples, but the differences were not significant (P > 0.05) (data not shown). Following the manufacturer’s protocols, we were unable to obtain acceptable sample replicate CV values (31% clusterin and 24% PAPP-A).

Discussion

Relative quantification using iTRAQ and LC-MALDI mass spectrometry

The aim of this proof-of-principle study was to apply a proteomics technique to identify and quantify protein differences between pooled plasma samples from women with established PE and matched controls. Time-of-disease samples were chosen for analysis as there should be significant changes between the plasma proteome of the PE and NP samples. Identification and validation of these differences gives us the confidence to apply the same strategy to early pregnancy samples in which the changes are likely to be much more subtle.

The gel-free proteomics strategy applied here has several advantages over 2-D gel-based methods: (i) it allows a deeper plasma proteome penetration by identifying proteins which occur at lower concentrations, (ii) the identification and quantification of proteins is achieved simultaneously, and (iii) the methods are more robust and simpler for new users to adopt. Although not exploited here, the same iTRAQ strategy can now be used for the relative quantification of up to eight samples simultaneously.8 For the purposes of this preliminary study, we used pooled samples run in duplicate, which allowed the analysis of a larger number of patients in a practical timeframe and also provided an important assessment of the technical variability. However, the multiplex nature of the new isobaric tagging strategies (e.g. iTRAQ, ExacTag, TMT) offers the opportunity to analyse individual samples where pooling is not possible or desired; this opportunity is not as readily afforded by any gel-based proteomics strategies.

Using the iTRAQ strategy, we identified significant (P < 0.05) differences in the abundance of >20 proteins. Changes in complement cascade and immune activation proteins are perhaps unsurprising given the severe and multisystemic nature of PE.13 An important caveat to this finding is that the distinction between statistically significant and biologically significant changes is one which is beyond the scope of any proteomics experiment.

Increased levels of PAPP-A (a zinc-binding metalloprotease) in maternal plasma in women suffering PE have been identified previously.14,15 Interestingly, in one study, serum PAPP-A levels were measured as being reduced in the first trimester in women who go on to develop PE in comparison with controls who develop no complications.16 Pregnancy-specific glycoproteins have also been described in the literature as markers for poor pregnancy outcomes, including PE.17

Many of the proteins with significant abundance differences between the NP and PE plasma are classic plasma proteins that exist in relatively high concentration in the blood. This class of proteins could probably have been identified using an exhaustive gel-based relative quantification strategy. For example clusterin, identified here as a protein over-represented in PE plasma, was the only protein identified and validated in a gel-based analysis of pooled NP plasma versus PE plasma.18 The advantage of the iTRAQ approach is that lower abundance proteins are more likely to be identified; this is demonstrated here by the identification of endoglin and placental lactogen. However, the sheer complexity of plasma means that the number of lower abundance proteins (<μg/ml) identified using iTRAQ is still far from ideal. The continued experimental challenge is to increase the enrichment or sequence coverage of these types of proteins so as to obtain better relative quantification data on them. These improvements are likely to come from refinements of the sample pre-fractionation strategy and the use of faster acquisition mass spectrometers.

Proteomics as a hypothesis-generating tool

The observation of potential allele-specific differences in Vitamin D binding protein levels between the PE and control plasma is included here to demonstrate the power of mass spectrometry as a hypothesis generating tool. Interestingly, this data supports some observations from a disc electrophoresis comparison of PE and control plasma.19 This observation also demonstrates the potential of this proteomics workflow to detect differences in isoform expression as well as the potential pitfall of interpreting isoform differences as relative abundance changes if the data is not carefully scrutinised.

Endoglin is over-represented in PE plasma

Several candidate markers for predicting PE exist in the literature, all of which were identified using hypothesis-driven experiments.2 The identification and relative quantification of endoglin in our global proteomics analysis is therefore very encouraging; it is evidence that we are penetrating the plasma proteome to a level at which biologically interesting molecules can be quantified. Several studies have recently used immunoassays to measure increased protein levels of endoglin in maternal serum from pregnancies complicated by PE compared with controls.20–23 It should be noted, however, that other putative markers for PE such as sFlt, PP13 and inhibin A were not identified at all in our proteomics screen. Given that there are tens of thousands of different proteins in plasma, it is clear that all current strategies are massively under-sampling the total proteome; in this regard it is no surprise that selected low abundance proteins were not identified.

The ELISA measurements of endoglin in individual plasma samples validate our proteomics approach in several ways. First, the relatively modest change in endoglin levels that we detected in pooled plasma using the iTRAQ quantification was validated using an independent quantification method. Secondly, the decision to pool plasma samples based on phenotype (detailed pregnancy outcome data) is validated through the result that altered endoglin levels in the pooled plasma were reflected in individual sample analysis.

The ELISA measurements of clusterin and PAPP-A levels highlight the potential for discordance between our mass spectrometry-based quantification strategy and antibody-based quantification methods. The relatively high sample replicate CV values for the clusterin and PAPP-A ELISAs and low numbers of available samples hampered our efforts to demonstrate significant differences in protein levels. In a previous study in which clusterin was identified as a marker for PE,18 the validation was performed using a monoclonal Ab assay, whereas our ELISA used a polyclonal Ab. The larger sample sizes (n = 80) and lower technical variability in that study allowed the demonstration of significant differences in protein levels between the NP and PE groups at time of disease.

While there was some overlap in the plasma endoglin levels between the PE and NP groups, we have shown there to be a significant difference in median levels; despite extensive efforts this was not the case for clusterin or PAPP-A. The overlapping reference ranges in all proteins (and indeed almost all potential biomarker candidates in the literature) demonstrate the drawbacks in trying to find a single biomarker that is predictive or diagnostic for PE.

Proteomics as a tool for clinical biomarker identification

To our knowledge, this is the first study in which a gel-free proteomics strategy has been applied to the identification of proteins which are differentially expressed in PE. Despite global attention, the field of proteomics-based biomarker discovery has thus far failed to deliver on expectations in terms of the identification and validation of markers that have made the transition into the clinic. While we have demonstrated some encouraging findings which provide grounds for optimism, this work also highlights the need for improvements in the proteomics instruments and workflows that will be required to deliver success.

Disclosure of interest

None of the authors have any conflicting interests to declare.

Contribution to authorship

All of the named authors satisfy the criteria for authorship and have approved the final version of the manuscript.

Details of ethics approval

Ethical approval was obtained from the Manchester Local Research Ethics Committees (Ref 03/TG/140).

Funding

This work was supported by a University of Manchester Stepping Stones award to JEM and Tommy’s the Baby Charity.

Acknowledgements

We thank Maureen Macleod who recruited patients and collected all plasma samples, and the patients at Ninewells Hospital, Dundee who participated in this study. Particular thanks go to the staff at University of Dundee and University of Manchester who aided in sample collection and processing. This work was supported by the NIHR Biomedical Research Centre.
