Challenges and opportunities for discovery of disease biomarkers using urine proteomics


  • Alex Kentsis

    1. Division of Hematology/Oncology, Children's Hospital Boston, and Department of Pediatric Oncology, Dana-Farber Cancer Institute, Harvard Medical School, Boston, Massachusetts, USA

Alex Kentsis, MD, PhD, 44 Binney St, Boston, MA 02115, USA. Email:


Modern medicine has experienced a tremendous explosion in knowledge about disease pathophysiology, gained largely from understanding the molecular biology of human disease. Recent advances in mass spectrometry and proteomics now allow for simultaneous identification and quantification of thousands of unique proteins and peptides in complex biological tissues and fluids. In particular, proteomic studies of urine benefit from urine's less complex composition as compared to serum and tissues, and have been used successfully to discover novel markers of a variety of infectious, autoimmune, oncological, and surgical conditions. This perspective discusses the challenges of such studies that stem from the compositional variability and complexity of human urine, as well as instrumental sampling limitations and the effects of noise and selection bias. Strategies for the design of observational clinical trials, physical and chemical fractionation of urine specimens, mass spectrometry analysis, and functional data annotation are outlined. Rigorous translational investigations using urine proteomics are likely to discover novel and accurate markers of both rare and common diseases. This should aid the diagnosis, improve stratification of therapy, and identify novel therapeutic targets for a variety of childhood and adult diseases, all of which will be essential for the development of personalized and predictive medicine of the future.

Modern medicine has experienced a tremendous explosion in knowledge about disease pathophysiology, gained largely from deeper understanding of the molecular biology of human disease. However, this progress has been limited for many common and rare diseases that affect adults and children alike. For example, acute appendicitis is a common disease of children and the most common surgical emergency that continues to escape correct diagnosis and timely treatment, even by most seasoned physicians, leading to unnecessary appendectomies in cases of false positive diagnosis and ruptures in cases of delayed or false negative diagnosis.1 Likewise, Kawasaki disease is a rare systemic vasculitis that, in spite of intensive study, remains a syndrome without specific diagnostic markers or pathophysiologically targeted therapies.2 The fundamental premise of the recent application of proteomics to the study of human disease is that in-depth investigation of the protein composition of human tissues and fluids will enable the discovery of novel and accurate markers of disease that will aid diagnosis, allow for therapy stratification and monitoring for effects of treatment, and ultimately identify new therapeutic targets.

Why urine?

By virtue of tissue perfusion, blood serum is the most generally useful material for the discovery of biomarkers. However, the relatively high concentration of serum proteins, as well as their wide concentration range, spanning at least nine orders of magnitude, often limits the study of serum biomarkers.3 Of the human body fluids amenable to routine clinical evaluation, urine has the advantage of being obtained frequently and non-invasively. It is relatively abundant and, as a result of being a filtrate of serum, relatively simple in composition.4,5 Furthermore, studies of urine may permit detection of species that are rapidly eliminated from circulation by virtue of their biological properties, and are therefore difficult to detect in serum, such as hormones and cytokines (vide infra).

Mass spectrometry proteomics

Recently, several related approaches using mass spectrometry proteomics have been used to investigate the composition of human urine with ever increasing accuracy and depth. Though protein-based mass spectrometry (so-called top-down MS) is far advanced, studies of human urine have been carried out using peptide (bottom-up) MS that takes advantage of peptide identification using tandem MS and automated statistical database matching; for a recent review see Mallick and Kuster.6

One approach to measure the urine proteome involves physicochemical methods to capture and fractionate urinary proteins, and liquid chromatography (LC) and tandem mass spectrometry (MS/MS) to sequence peptides for protein identification using searches of known human proteins. Using ultracentrifugation to fractionate urine proteins based on molecular weight and LC-MS/MS for protein identification, Knepper and colleagues identified 295 proteins in urinary exosomes, and more than 1000 proteins in total.7,8 Using ultrafiltration and denaturing electrophoresis for protein fractionation and LC-MS/MS using an LTQ-Orbitrap hybrid spectrometer, Mann and colleagues identified more than 1500 proteins.9 Zeng and colleagues used protein precipitation and ion exchange chromatography and LTQ-Orbitrap LC-MS/MS to identify 1300 proteins.10 By combining ultracentrifugation, protein precipitation, and ion exchange chromatography for protein capture and fractionation and LTQ-Orbitrap LC-MS/MS for peptide sequencing, we identified more than 2300 unique proteins in routinely collected clinical urine specimens.11

Another approach involves identification of candidate markers using MS or differential electrophoresis (DIGE), and subsequent identification of candidates using LC-MS/MS sequencing. DIGE-based investigations identified hundreds of protein spots in urine,12 though most of them remain unidentified.13 Recently, Mischak and colleagues used a combination of ultrafiltration and capillary electrophoresis MS to detect several thousand peptides in clinical urine specimens.14 Likewise, Zucht and colleagues applied a differential display approach and LC-MS/MS to detect thousands of peptides in human urine.15

Notably, the Human Proteome Organization (HUPO) has established the Human Kidney and Urine Proteome Project (HKUPP) to advance these approaches. Indeed, much work remains to be done. In particular, the human urinary proteome needs to be defined with respect to its size, tissue origin, chemical modifications, such as proteolysis and phosphorylation, and physical components, such as exosomes, not to mention their relationships to physiological and disease states.

Urine proteomics of human disease

Notwithstanding these questions, methods to profile urine with increasing depth, reaching several thousands of unique peptides and proteins, have been applied to discover markers of a variety of common and rare diseases, involving both the urogenital tract and distant organs whose proteins are filtered from serum by the glomeruli. Most of these studies concern urogenital disease, which reflects the fact that about 90% of urinary proteins are expressed in the urogenital tract, and therefore likely originate in the kidney and bladder.11 Specific investigations have uncovered novel markers of renal transplant rejection,16 acute kidney injury,17 nephritis,18 nephrotic syndrome,19 diabetic nephropathy,20 and ureteropelvic junction obstruction.21 Likewise, urine proteomic studies have described markers of bladder carcinoma,22 cystitis,23 and prostate carcinoma.24

Urine proteomics has also been applied to the discovery of markers of non-urogenital disease, including coronary artery disease,25 ovarian cancer,26 pre-eclampsia,27 and pancreatitis.28 Indeed, about 10% of proteins detected by deep profiling of the human urinary proteome using physicochemical protein capture and LC-MS/MS have no reported expression in the urogenital tract, suggesting that they are likely found in urine as a result of glomerular filtration from serum.11

In our study, these filtered proteins were found to be uniquely expressed in the nervous system, heart and vasculature, lung, blood and bone marrow, intestine, liver and other abdominal visceral organs. A cross-referenced list of urinary proteins and their patterns of tissue expression is available online. In particular, we applied this approach to investigate the composition of urine of children suspected to have acute appendicitis, and discovered several apparently specific urine protein markers, including leucine-rich α-2-glycoprotein (LRG).29 LRG appears to be produced by neutrophils in inflamed appendices, and though it can be detected circulating in the serum, its enrichment in patients with appendicitis appears to be far greater in urine than in serum, suggesting that it is rapidly cleared from circulation.29 These results suggest that the combination of biochemical and biophysical methods to fractionate and capture urinary (as opposed to blood serum) proteins and high accuracy mass spectrometry to identify them is a promising approach to discover clinically useful biomarkers for a variety of diseases.

Sampling limitations of mass spectrometry

In spite of these promising results, specific features of current urine proteomics approaches also present particular challenges to rigorous investigation. First, protein identification using mass spectrometry is dependent on tandem MS measurements, which are fundamentally limited by the instrumental (electronic) duty cycle (currently about 10 Hz, corresponding to 10 spectra per second). Consequently, measurements of complex protein mixtures, such as urine (containing hundreds of thousands to millions of peptides usually resolved over a few hours, e.g. 100–1000 peptides per second), are routinely under-sampled, and at best performed in the low sampling regime. A frequently used approach to overcome this limitation is to perform repeat measurements of the same sample.30 However, as many as eight technical repeats are often required to saturate sampling, which is time-consuming and preferentially yields measurements of the more abundant proteins, obscuring the detection of rarer and potentially more biologically informative species.
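The scale of this under-sampling can be illustrated with simple arithmetic. The following is a minimal sketch, assuming that each tandem MS spectrum sequences at most one peptide and that peptides elute at a constant rate; the function name is hypothetical and the rates are those quoted above.

```python
def sampled_fraction(acquisition_rate_hz: float, elution_rate_per_s: float) -> float:
    """Upper bound on the fraction of eluting peptides that can be sequenced.

    Assumes (as an idealization) one peptide per MS/MS spectrum and a
    constant elution rate over the whole LC gradient.
    """
    if acquisition_rate_hz <= 0 or elution_rate_per_s <= 0:
        raise ValueError("rates must be positive")
    return min(1.0, acquisition_rate_hz / elution_rate_per_s)


if __name__ == "__main__":
    # 10 spectra per second against 100 and 1000 eluting peptides per second.
    for rate in (100.0, 1000.0):
        fraction = sampled_fraction(10.0, rate)
        print(f"{rate:.0f} peptides/s: at most {fraction:.1%} sequenced per run")
```

Under these assumptions a single run sequences at most 1–10% of the eluting peptides, which is why repeat measurements or fractionation are needed.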

A more efficient approach is to take advantage of data-dependent MS acquisition by using inclusion or exclusion lists to reduce the fraction of time that the mass spectrometer may spend sequencing abundant or generally present proteins, such as albumin and uromodulin.30,31 However, this requires a priori knowledge of the proteins of interest. Alternatively, abundant proteins can be depleted using immunoaffinity purification.32,33 This too is not without limitations, and in particular carries the risk of depleting physically associated proteins that may be of interest.34 Another approach to improve sampling is to reduce mixture complexity by fractionation.35 This too increases the required instrument time but by separating more abundant proteins has a more substantial effect on identification yields overall and with regard to rarer proteins in particular.
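The logic of an exclusion list can be sketched as follows. This is an illustration only and not the acquisition software of any particular instrument: the function names, the 10 ppm tolerance, and all m/z and intensity values are hypothetical.

```python
from typing import List, Sequence, Tuple

Precursor = Tuple[float, float]  # (m/z, intensity)


def within_ppm(mz: float, target: float, tol_ppm: float) -> bool:
    """True if mz lies within tol_ppm parts per million of target."""
    return abs(mz - target) <= target * tol_ppm * 1e-6


def select_precursors(
    precursors: Sequence[Precursor],
    exclusion_mz: Sequence[float],
    top_n: int = 10,
    tol_ppm: float = 10.0,
) -> List[Precursor]:
    """Pick the top_n most intense precursors that are not on the exclusion list."""
    allowed = [
        p for p in precursors
        if not any(within_ppm(p[0], t, tol_ppm) for t in exclusion_mz)
    ]
    allowed.sort(key=lambda p: p[1], reverse=True)
    return allowed[:top_n]


if __name__ == "__main__":
    # Hypothetical survey scan: one very intense ion from an abundant protein
    # and two weaker ions that would otherwise compete for sequencing time.
    scan = [(500.2500, 9e6), (623.3100, 4e5), (750.4000, 2e5)]
    print(select_precursors(scan, exclusion_mz=[500.2500], top_n=2))
```

An inclusion list works the same way with the test inverted, so that only listed precursors are selected.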

For example, we and others have used ultracentrifugation to separate urinary proteins and physical complexes, such as exosomes, by molecular weight, thereby separating abundant urinary proteins, such as uromodulin, which forms large molecular weight polymers, from the rest of the mixture.7,11 Subsequent fractionation using sodium dodecyl sulfate polyacrylamide gel electrophoresis (SDS-PAGE) and cation exchange chromatography enabled the most comprehensive characterization of the human urinary proteome to date (2363 unique proteins), including rare hormones that are biomedically significant and would not be detected otherwise. Among these are circulating hormones, such as hepcidin and chromogranin, that function in iron homeostasis and neuroendocrine regulation, and are useful as diagnostic markers of iron deficiency anemia and neuroendocrine tumors, respectively.11

Compositional complexity and variability of urine

Urine proteomic measurements are subject to the physiological variability in urine concentration and composition. Among the physiological factors known to affect urine composition are age, sex, diurnal variations in hormone and urine production, and orthostasis or exercise-induced proteinuria. For example, orthostatic or postural proteinuria is a frequent finding in adolescents that causes transient increases in urinary protein excretion in response to exercise and orthostasis.36 It is thought to be due to changes in renal hemodynamics, occasionally due to positional renal vein entrapment, and partly due to alterations in glomerular filtration due to angiotensin II and norepinephrine release.37 There is some evidence that orthostatic or exercise-induced proteinuria is due to exaggerated normal diurnal variability found in urine protein concentration.38 These effects appear to be more pronounced in male subjects as compared to female subjects, and to decrease with age. However, these effects are incompletely explained by changes in glomerular filtration rates, suggesting the existence of intrinsic renal tubular factors that are currently not well understood. In addition, there is emerging evidence that excretion of abundant urinary proteins is not uniform. It is well established that urine production is diurnal, partly due to diurnal hormone fluctuations, fluid consumption, and postural hemodynamics. In addition, there appears to be no correlation in the diurnal fluctuations in the excretion of albumin and β2-microglobulin, the latter being invariant but dependent on urine pH.39 These factors require that apparent abundance of urinary proteins measured in proteomic studies be normalized.

A variety of normalization factors have been used, such as urine concentrations of creatinine, albumin, and β2-microglobulin.40 In addition, a number of mass spectrometry parameters have been used for measurements of protein abundance, including spectral counts, ion currents, and isotopic labeling.41 Currently, quantitation using spectral counts appears to offer the most robust performance, but that is largely due to limitations of current LC alignment algorithms required for accurate measurements of peptide ion currents, and isotopic labeling of whole proteomes.42 In our studies to date of children without renal disease, albumin normalization using spectral counts appears to be most robust.29 However, this may not apply to studies of patients with abnormal renal physiology, such as patients with nephrotic syndrome, nephropathy, nephritis, shock, or acidosis. Thus, these physiological factors need to be considered explicitly, and experimental protocols tailored accordingly, with particular emphasis on incorporating improved MS quantitative measures.
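Normalization of spectral counts to a reference protein can be sketched in a few lines. This is a minimal illustration, assuming that spectral counts are available per protein for each specimen; the gene-symbol keys and the counts are invented for the example and are not data from the studies cited.

```python
from typing import Dict


def normalize_to_reference(counts: Dict[str, int], reference: str) -> Dict[str, float]:
    """Express each protein's spectral count relative to a reference protein.

    The reference (for example albumin) must have been detected in the
    specimen, otherwise the ratio is undefined.
    """
    ref_count = counts.get(reference, 0)
    if ref_count <= 0:
        raise ValueError(f"reference protein {reference!r} was not detected")
    return {
        protein: count / ref_count
        for protein, count in counts.items()
        if protein != reference
    }


if __name__ == "__main__":
    # Hypothetical spectral counts for one urine specimen.
    specimen = {"ALB": 200, "UMOD": 150, "LRG1": 10}
    print(normalize_to_reference(specimen, reference="ALB"))
```

The same function applies to other reference quantities, provided they are expressed on a comparable scale; as noted above, albumin is unlikely to be a suitable reference when renal physiology is abnormal.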

Noise and bias of proteomic clinical studies

Moreover, because of the depth of profiling achieved, urine proteomics, like other highly dimensioned methods, such as gene expression profiling, may be susceptible to noise and selection bias.43 This could be due to effects on the urine proteome of individual variables, such as age and gender, as well as patient features that may confound the observed differences. It is therefore crucial to compare specimens of interest with appropriate biological or pathophysiological controls, for example, urine from patients with a specific infection compared with that from control febrile patients, who may have non-specific alterations in renal hemodynamics or acute-phase reactants. Likewise, rigorous matching of cases and controls is required in order to minimize the contribution of sex- or age-specific factors that affect urine production or protein composition.

One approach to minimize these potential problems is to include intra-personal controls in the proteomic comparisons, thereby minimizing individual differences in age, sex, physiological state, or genetic variation. For example, our recent study of biomarkers of acute appendicitis compared urine proteomes of patients with acute appendicitis to those without, as well as the same patients after they recovered from appendectomies.29 This procedure significantly improved the candidate marker ranking and reduced the number of candidate markers from 273 to 57 by eliminating a variety of proteins that were unlikely to be related to the appendicitis response. This suggests that individually variant factors, noise, and selection bias may significantly affect biomarker discovery studies, in some cases requiring the inclusion of intra-personal control groups.
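One way such an intra-personal filter could work is sketched below. It is not the procedure used in the cited study: the fold-change threshold, the pseudocount, the required fraction of patients, and all names and values are assumptions chosen for illustration.

```python
from typing import Dict, Iterable, List

Abundance = Dict[str, Dict[str, float]]  # patient id -> protein -> abundance


def paired_filter(
    candidates: Iterable[str],
    during: Abundance,
    after: Abundance,
    min_fold: float = 2.0,
    min_fraction: float = 0.5,
    pseudocount: float = 1.0,
) -> List[str]:
    """Keep candidates that fall after recovery in the same patients.

    A candidate is kept if, in at least min_fraction of the patients with
    both specimens, its abundance during disease is at least min_fold times
    its abundance after recovery (a pseudocount avoids division by zero).
    """
    patients = sorted(set(during) & set(after))
    if not patients:
        raise ValueError("no patient has both specimens")
    kept = []
    for protein in candidates:
        supporting = 0
        for pid in patients:
            d = during[pid].get(protein, 0.0) + pseudocount
            a = after[pid].get(protein, 0.0) + pseudocount
            if d / a >= min_fold:
                supporting += 1
        if supporting / len(patients) >= min_fraction:
            kept.append(protein)
    return kept


if __name__ == "__main__":
    # Hypothetical abundances: protein "A" drops after recovery, "B" does not.
    during = {"p1": {"A": 9, "B": 5}, "p2": {"A": 7, "B": 4}}
    after = {"p1": {"A": 1, "B": 5}, "p2": {"A": 0, "B": 3}}
    print(paired_filter(["A", "B"], during, after))
```

Proteins that differ between cases and controls but do not change within the same individual are removed, which is the intent of the intra-personal comparison described above.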

Analysis of highly dimensioned proteomics data

Analysis of highly dimensioned peptide mass spectrometry data also presents specific challenges relevant to biomarker studies. Current methods for de novo identification of peptides using mass spectrometry are not yet robust enough to be used generally. Instead, a variety of statistical approaches are used to match measured spectra to known protein sequences.44 These methods have significant rates of false positive identification, which, when combined with under-sampled specimens, can severely compromise accuracy. One approach to reduce false identifications is to adjust search stringency through the use of decoy databases to assess false discovery rates (FDR).45 For proteins identified on the basis of multiple unique (independent) peptides, this can lead to essentially error-free identification, since the protein (aggregate) FDR is geometrically related to the peptide (individual) FDR in the large number limit. For example, for a protein identified on the basis of 10 unique peptides (median for our current urine proteomic measurements) detected at an individual FDR of 1%, the estimated protein FDR is close to $0.01^{10} = 10^{-20}$. However, for proteins or peptides identified from few detected peptides, false discovery rates can be significant,46 and additional measurements are necessary to confirm their presence. Finally, only a fraction of measured spectra are identified accurately using current analytical methods, and in particular, proteins and peptides with post-translational modifications are often not identified. Consequently, it may be useful to make spectra available in public databases, so that they can be re-analyzed in the future.
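The two quantities in this argument can be written out explicitly. The sketch below assumes the common decoy-to-target ratio as the FDR estimate and, for the protein-level value, fully independent peptide identifications; both are simplifications, and the function names are hypothetical.

```python
def decoy_fdr(n_target_hits: int, n_decoy_hits: int) -> float:
    """Estimate the FDR among target matches above a score threshold.

    Uses the ratio of decoy to target matches, which assumes that false
    matches are equally likely to hit the decoy and target databases.
    """
    if n_target_hits <= 0:
        raise ValueError("at least one target hit is required")
    return min(1.0, n_decoy_hits / n_target_hits)


def protein_fdr(peptide_fdr: float, n_unique_peptides: int) -> float:
    """Protein-level FDR if all unique peptides are independent evidence."""
    if not 0.0 <= peptide_fdr <= 1.0 or n_unique_peptides < 1:
        raise ValueError("invalid peptide FDR or peptide count")
    return peptide_fdr ** n_unique_peptides


if __name__ == "__main__":
    print(decoy_fdr(n_target_hits=1000, n_decoy_hits=10))      # peptide-level estimate
    print(protein_fdr(peptide_fdr=0.01, n_unique_peptides=10))  # ten-peptide protein
    print(protein_fdr(peptide_fdr=0.01, n_unique_peptides=1))   # single-peptide protein
```

The contrast between the last two calls is the point made above: a protein supported by a single peptide inherits the full peptide-level error rate.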

Lastly, given these considerations, rigorous statistics need to be applied to identify candidate markers that are significantly enriched in examined proteomes. The highly dimensioned nature of proteomics data and limited number of replicates limit the usefulness and applicability of conventional metric-based approaches. An alternative approach is to use Bayesian statistics that calculate the ratio of likelihoods for differential detection of each protein based on distributions of spectral count and MS intensity data.47 Likewise, dimensionality reduction and machine learning algorithms are also particularly well suited for analysis of urine proteomic data; for an overview, see Hilario and Kalousis.48 For example, we applied support vector machine learning and cross-validation for identification of urine biomarkers, as implemented in BDVAL. The multi-dimensional nature of proteomic data requires correction for multiple hypothesis testing, as well as explicit consideration of over-fitting that continues to plague biomarker studies. Finally, in spite of these stringent procedures, biomarkers identified using urine proteomics require rigorous validation, ideally using blinded testing of independent cohorts.49
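Correction for multiple hypothesis testing can be illustrated with the Benjamini-Hochberg procedure, one widely used method that is chosen here as an example and is not named in the text. The sketch assumes one p-value per protein; the values are invented.

```python
from typing import List, Sequence


def benjamini_hochberg(p_values: Sequence[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    # Hypothetical p-values for four candidate proteins.
    raw = [0.01, 0.04, 0.03, 0.005]
    for p, q in zip(raw, benjamini_hochberg(raw)):
        print(f"p = {p:.3f}  adjusted = {q:.3f}")
```

Candidates are then called significant if the adjusted value falls below the chosen FDR, which controls the expected proportion of false discoveries across all proteins tested rather than the error of each test separately.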

Even when statistically significant biomarkers can be rigorously identified, their sheer number presents a challenge in discerning pathophysiologically or biologically informative ones. In addition to using a variety of tools for functional gene and protein annotation, such as the ones provided by DAVID and BioMart, efforts are ongoing to develop disease-specific annotations. For example, a number of approaches have been developed for identifying protein–disease associations based on the human protein–protein interaction networks, known gene–disease associations, and protein functional information: PhenoPred,50 Proteome BioKnowledge Library,51 and UniProt Knowledgebase.52 We used Twease to mine the text of abstracts published in Medline to create a specific urine proteomic database of protein–disease associations for over 500 human diseases, including those listed in the Online Mendelian Inheritance in Man (OMIM), Medical Subject Headings (MeSH), and 27 common conditions.

Such annotations can provide a useful resource for the study of human pathophysiology and discovery of candidate disease biomarkers. For example, serum levels of platelet-activating factor acetylhydrolase have recently been shown to be predictive of severity of allergic reactions.53 Its detection in urine suggests that more accessible and convenient diagnostic tests may be devised, for example, tests that measure therapeutic control of allergic asthma. Even for rare diseases that do not involve the kidneys, such as Niemann–Pick disease, detection of urinary acid sphingomyelinase may enable the development of diagnostic tests based on easily accessible urine specimens.54

Future directions

Recent technological developments in mass spectrometry have allowed for an unprecedented combination of accuracy and sensitivity in identification and measurement of proteins in complex biological mixtures. As a result, proteomic measurements of human body fluids, such as urine, are now capable of producing deep profiles with simultaneous measurements of thousands of unique proteins and peptides. Recent advances in instrumentation and data analysis are expected to increase both the identification depth and quantitative accuracy of these measurements, presumably to the genomic scale. In addition, they may offer biologically and pathophysiologically relevant insights given the ability to profile functional protein modifications. Combined with rigorously designed clinical trials, these methods can be used to study disease pathophysiology and discover clinically needed biomarkers. This can be used to improve the diagnosis of a wide variety of common and rare diseases, such as pneumonia, where antibiotic treatment is largely empiric and excessive, and Kawasaki disease, which remains without a definitive diagnostic test, compromising timely treatment. For many diseases, and cancers most notably, treatment efficacy remains difficult to assess early in disease or treatment course, as a result of limited measures available to stratify patients prospectively. Here, urine proteomics can offer improved therapy stratification, and markers of residual disease that can be used to alter therapy for non-responders. Finally, deep proteomic profiling, and consequent illumination of disease pathophysiology, has the potential to discover novel therapeutic targets. In the context of translational clinical trials, urine proteomics promises to make significant contributions to the development of personalized and predictive medicine of the future.


I thank Hanno Steen for comments on the manuscript.