Candidate-based proteomics in the search for biomarkers of cardiovascular disease

Authors


Corresponding author L. Anderson: PO Box 53450, Washington, DC 20009-3450, USA. Email: leighanderson@plasmaproteome.org

Abstract

The key concept of proteomics (looking at many proteins at once) opens new avenues in the search for clinically useful biomarkers of disease, treatment response and ageing. As the number of proteins that can be detected in plasma or serum (the primary clinical diagnostic samples) increases towards 1000, a paradoxical decline has occurred in the number of new protein markers approved for diagnostic use in clinical laboratories. This review explores the limitations of current proteomics protein discovery platforms, and proposes an alternative approach, applicable to a range of biological/physiological problems, in which quantitative mass spectrometric methods developed for analytical chemistry are employed to measure limited sets of candidate markers in large sets of clinical samples. A set of 177 candidate biomarker proteins with reported associations to cardiovascular disease and stroke is presented as a starting point for such a ‘directed proteomics’ approach.

Proteomics has been defined from the biochemist's viewpoint (in a remark by Kenneth Mann) as the study of more than one protein at a time, a perspective that recognizes the importance of complex relationships between the functional parts of living systems while resisting the temptation to insist on a genome style complete (and perhaps unattainable) description at the protein level. While the technologies of proteomics have made rapid strides in recent years, providing tools that have been applied to many disease processes, there is a conspicuous lack of important disease markers discovered through proteomics and now established in the clinic. In fact if the rate of new plasma diagnostic protein markers is examined over the last decade, it has actually declined from one to two per year to near zero today (Anderson & Anderson, 2002). The reasons behind this paradox deserve study because the potential importance of accessible protein biomarkers of both normal and abnormal physiology is so great, particularly if we can believe the attractive but unproven hypothesis that all abnormal physiological states leave some specific fingerprint in the composition of circulating proteins. Evidence for this hypothesis, most recently in the field of cancer detection (Petricoin et al. 2002), has been accumulating for many years in the related fields of metabolite analysis (Jellum et al. 1981) and clinical chemistry (Robertson et al. 1980). These studies add support to the general statistical argument that a panel of independent disease-related proteins considered in the aggregate should be less prone to the influence of genetic and environmental ‘noise’ than is the level of a single marker protein. The heterogeneity of disease processes, and the genetic differences between individuals in the human population, both tend to obscure what might otherwise be clear disease associations. 
However, if there are multiple markers affected by the disease which are not strongly correlated with one another, then a composite index combining these markers may provide a much more robust indication of disease. In measuring the acute phase response, for example, a composite index summarizing a panel of weak acute phase reactants (Doherty et al. 1998) can provide a more robust indicator of inflammation than a single marker (e.g. C-reactive protein (CRP) or serum amyloid A). Similarly, the relative risk of coronary heart disease is better predicted (Rifai & Ridker, 2003) by CRP and low-density lipoprotein (LDL)-cholesterol together than by either alone (Fig. 1, replotted from published data: Rifai & Ridker, 2003). More sophisticated multiplex panels have emerged from work with microarrays. One such example is the Netherlands breast cancer study (van't Veer et al. 2002), which sought to distinguish between patients with the same stage of disease but different response to treatment and overall outcome. The success of this initial study motivated a more extensive independent follow-up study involving 295 patients (van de Vijver et al. 2002), which led to a nationwide clinical trial in the Netherlands in which gene expression profiles for 70 classifier genes are being collected on all breast cancer patients and used as an adjunct to classical clinical staging. The belief that this phenomenon will be general for both proteins and mRNA, and that combinations of markers can be found that will identify and stage a wide range of diseases with useful specificity and sensitivity, is among the most important hypotheses of current biomedical research.
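The statistical argument for panels can be illustrated with a minimal sketch. It assumes that each marker is expressed as a z-score against a reference population and that the marker-specific noise terms are independent with equal spread; both are simplifying assumptions made here for illustration, not properties reported by the cited studies. The function names are hypothetical.

```python
import math
from statistics import mean


def composite_index(z_scores):
    """Unweighted mean of per-marker z-scores (a deliberately simple composite)."""
    return mean(z_scores)


def composite_noise_sd(n_markers, per_marker_sd=1.0):
    """Noise SD of the mean of n independent markers that share the same noise SD."""
    return per_marker_sd / math.sqrt(n_markers)


if __name__ == "__main__":
    # Noise in the composite shrinks with the square root of the panel size.
    for n in (1, 4, 9):
        print(n, round(composite_noise_sd(n), 3))
```

Under these assumptions a panel of four independent markers halves the noise of a single marker. Correlated markers would give a smaller benefit, which is why the text stresses markers that are not strongly correlated with one another.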

Figure 1.


Data replotted from Rifai & Ridker, 2003 showing the improved discrimination of relative cardiovascular disease risk when two different markers (in this case LDL-cholesterol and C-reactive protein) are considered jointly.

The difficulty of finding and using new biomarkers in the blood, even given the impressive advances in proteomics technologies, becomes clear when we compare the characteristics of the plasma proteome with the capabilities of current proteome analysis strategies and technology platforms. An exploration of this juxtaposition, set out in the following sections, provides the basis for an alternative candidate-based (targeted) approach proposed in the remainder of the paper.

Challenges of the plasma proteome

Plasma, which (together with its close cognate serum) is the primary biochemically useful clinical specimen, comprises the largest and deepest version of the human proteome. This makes it the most difficult sample to work with in proteomics, despite the relatively good behaviour (i.e. solubility) of its protein components. The daunting size of the plasma proteome is a reflection of the sheer number of different proteins to be detected. A rough calculation of this number can be made as follows. (1) Assume that 10% of the ∼30 000 genes encode secreted proteins, that each of these is made in an average of three splice forms, that two cleaved versions of each exist, and that there are an average of five post-translational modifications for each protein (a low estimate given the extreme carbohydrate microheterogeneity of most major plasma proteins). Since all these events can occur independently of the others, we obtain 3000 × 3 × 2 × 5 = 90 000 different secreted molecules. (2) Assume that all the non-secreted human proteins and their various modified forms are released into plasma at some low level as a result of cell turnover in the tissues. Using levels of modification similar to the secreted proteins, we obtain a further 810 000 protein species present at low levels. (3) Finally, assume that there are ∼10 000 000 distinct clonal immunoglobulin sequences present in plasma reflecting the immune history of the individual. The sum of these admittedly rough estimates is > 10⁶ different molecules representing products of all ∼30 000 genes: in other words, plasma is the largest version of the human proteome in one sample. Proteomics technologies can typically resolve ∼100 different species per dimension of separation, indicating that 3 or more perfectly independent separative dimensions would be required, or more probably 4–5 dimensions of realistically implementable technology.
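The rough calculation above can be reproduced in a few lines. This is only a sketch of the arithmetic in the text: the constants are the assumptions stated in the paragraph, and the function names are hypothetical.

```python
import math

# Assumptions taken directly from the estimate in the text.
GENES = 30_000
SECRETED_FRACTION = 0.10
SPLICE_FORMS = 3
CLEAVED_VERSIONS = 2
PTM_FORMS = 5
IMMUNOGLOBULIN_CLONES = 10_000_000
SPECIES_PER_DIMENSION = 100


def plasma_proteome_estimate():
    """Return (secreted, non_secreted, total) numbers of distinct molecular species."""
    forms_per_gene = SPLICE_FORMS * CLEAVED_VERSIONS * PTM_FORMS
    secreted_genes = round(GENES * SECRETED_FRACTION)
    secreted = secreted_genes * forms_per_gene
    non_secreted = (GENES - secreted_genes) * forms_per_gene
    total = secreted + non_secreted + IMMUNOGLOBULIN_CLONES
    return secreted, non_secreted, total


def ideal_dimensions(n_species, per_dimension=SPECIES_PER_DIMENSION):
    """Perfectly independent separation dimensions whose combined capacity equals n_species."""
    return math.log(n_species) / math.log(per_dimension)


if __name__ == "__main__":
    secreted, non_secreted, total = plasma_proteome_estimate()
    print(secreted, non_secreted, total)
    print(round(ideal_dimensions(total), 2))
```

With these inputs the immunoglobulin term dominates the total, and the ideal number of dimensions falls between three and four, in line with the statement that three or more perfectly independent dimensions would be needed.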

The enormous ‘depth’ of the plasma proteome is a reflection of the dynamic range (difference between the highest and lowest concentration) over which proteins must be detected. Approximately half of the total protein mass in plasma is accounted for by one protein (albumin, present at ∼55 000 000 000 pg ml−1), while roughly 10 proteins together make up 90% of the total. At the other end of the concentration histogram are the cytokines, such as interleukin-6 (IL-6), which is normally present at 1–5 pg ml−1. The difference in concentration between albumin and IL-6 is thus ∼10¹⁰. This range, of course, covers the proteins we know and consider useful as markers today, and ignores potentially valuable markers to be found in the future at even lower concentrations. The fact that we know anything about the concentrations of these proteins, and hence have been able to use them as biomarkers, is due to the power of specific protein tests, typically immunoassays of one protein at a time, and not to proteomics as currently defined, where current technology is limited to a dynamic range of 10³–10⁴ (see Fig. 2).
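The gap between the concentration span of plasma and the dynamic range of current platforms can be made explicit with a short calculation. The sketch below uses only the albumin and IL-6 concentrations and the platform range quoted in the text; the function name is hypothetical.

```python
import math

ALBUMIN_PG_PER_ML = 55_000_000_000   # albumin concentration quoted in the text
IL6_PG_PER_ML = (1, 5)               # normal IL-6 range quoted in the text
PLATFORM_ORDERS = (3, 4)             # dynamic range of current proteomics, in logs


def orders_of_magnitude(high, low):
    """Concentration span between two values, expressed in powers of ten."""
    return math.log10(high / low)


if __name__ == "__main__":
    narrowest = orders_of_magnitude(ALBUMIN_PG_PER_ML, max(IL6_PG_PER_ML))
    widest = orders_of_magnitude(ALBUMIN_PG_PER_ML, min(IL6_PG_PER_ML))
    print(round(narrowest, 2), round(widest, 2))
    # Logs of concentration left uncovered even by the better platform figure.
    print(round(narrowest - max(PLATFORM_ORDERS), 2))
```

Even taking the upper end of the IL-6 range and the better platform figure, roughly six orders of magnitude remain out of reach.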

Figure 2.


A plot in which normal plasma concentrations for 115 proteins from Table 1 (distributed along the X-axis but unlabelled because of legibility limitations) are plotted on a log scale (pg ml−1 along the Y-axis). The proteins are sorted by abundance to reveal the smooth distribution across > 10 logs of concentration. Each protein is represented by a symbol that indicates in how many of three proteomics datasets (see text) it was detected.

Proteomic strategies for the discovery and validation of biomarkers in plasma

Given the analytical challenges inherent in the plasma proteome, what practical strategies exist for finding and confirming protein biomarkers? The problem can be approached from two opposite directions: (1) complete analysis (to see all differences) and (2) targeted analysis (to measure one or more hypothesis-generated candidates). The advantages of complete analysis, if it is possible, are substantial. Complete analysis would allow the direct selection of optimal biomarker proteins at the outset, thus skipping over what is currently a very long and laborious iterative process. Not surprisingly, progress towards complete analysis has been the focus of most proteomics research for the past decade. The number of proteins detectable in plasma has risen from 40 in 1975 (Anderson & Anderson, 1977) to 300–1000 reported in various recent studies (Adkins et al. 2002; Pieper et al. 2003a; Tirumalai et al. 2003). The latter datasets have been combined (Anderson et al. 2004b) to generate a non-redundant set of 1173 proteins, which revealed surprisingly small commonality between the results of these three different proteomics platforms (respectively multidimensional chromatography of proteins followed by 2-D electrophoresis and mass spectrometry (MS) identification of resolved proteins; tryptic digestion and multidimensional chromatography of peptides followed by MS identification; and tryptic digestion and multidimensional chromatography of peptides from low-molecular weight plasma components followed by MS identification). When these datasets are searched for a group of candidate disease markers (the cardiovascular candidates described below) for which plasma concentration normal values exist, the result illustrates the limited sensitivity of the platforms as a means of complete plasma proteome analysis (Fig. 2). 
Most proteins in the top 3 logs of the concentration distribution are detected by two or three of the three platforms, a fair proportion of the proteins in the middle two logs are seen by at least one of the platforms, and very few of the proteins in the bottom 5 logs are detected by any of the three. Thus it appears that current proteomics technology is unlikely to be able to provide a complete analysis of the most relevant diagnostic samples (e.g. serum and plasma). An additional important feature of this plot is that the candidate proteins show a smooth distribution between 10 and 10⁹ pg ml−1, demonstrating that presumed disease relationships appear to occur independently of a protein's plasma concentration. In particular there does not seem to be a bias towards either very low abundance proteins (e.g. cytokines) or high abundance molecules. Since plasma concentration was not a criterion in the selection of these proteins (just a relationship with cardiovascular disease or stroke), this observation is probably meaningful.

Targeted analysis, which emerged as a means of searching for disease marker associations in the 1950s (in the form of enzyme assays), has a longer history than proteomics (which emerged in the form of 2-D electrophoresis in the mid-1970s), and has produced most of the protein markers now in diagnostic use. Typically a researcher interested in a specific protein develops hypotheses regarding a specific disease, and arranges to apply a lab bench assay to sets of samples from patients and controls. The specific assays involved are usually immunoassays, which, because of the great specificity of antibodies, are often able to detect proteins in plasma at much lower concentrations than current proteomics platforms. While this approach adheres to the conventions of hypothesis-driven research (and is thus fundable through grants), it has a substantial weakness in the poor probability of success when one marker is tested in one disease at a time: there are, as indicated, at least tens of thousands of candidate protein forms, and at least hundreds of disease entities. Even if it were the case that there is one protein capable of serving as a robust marker of each disease state, this method will take a long time to find them, and unfortunately it will take as much effort to find the last such marker as it took to find the first. More discouraging yet is the fact that any disease state in which several markers need to be considered together to produce an accurate result (i.e. a multiplex panel) would represent an enormous combinatorial discovery problem: since the experimental assays are typically developed in separate laboratories, bringing them together for application as a prototype panel is an organizational challenge, compounded by the increased sample requirement of multiple separate assays.

Thus both the complete and targeted analytical approaches have important limitations (sensitivity and mono-analyte focus, respectively) that diminish the output of novel disease marker proteins. This situation has led in recent years to consideration of hybrid approaches, in which a set of preselected proteins could be measured at high sensitivity. By focusing on a limited number of candidate biomarker proteins, assay technologies providing higher sensitivity and dynamic range than current proteomics could be used. By looking at multiple proteins, instead of one, the odds of finding useful disease associations, and effective panels, would be increased. The odds can be further improved through intelligent selection of candidate markers: here there is an opportunity to make use of information from many sources in addition to proteomics, including expression microarray data suggesting tissue-specific or disease-altered synthesis of specific proteins, relationships of proteins to disease pathways, and classical biochemistry. Such a hybrid approach, combining the multiprotein view of proteomics and the advantages of targeted specific assays, can be termed targeted proteomics.

Technology platforms for targeted proteomics of candidate markers

The central technical issue in targeted proteomics is how best to measure a limited set of proteins in complex samples such as plasma. Two broad strategies are developing: miniaturized, multiplexed immunoassays and quantitative mass spectrometry. The former approach, which includes antibody arrays in both planar and particle suspension formats (recently reviewed by Joos (Joos et al. 2002) and a review in this series), has the advantage that immunoassays are well-understood, sensitive and specific. Antibody arrays are limited, however, by the availability of suitable antibodies, and this has proved to be a critical bottleneck for the development of immunoassays for new marker content. While a single research immunoassay costs less to assemble than the $2–4 million required for a commercial diagnostic test, each additional new marker assay costs the same again as the first, typically requiring development of two different high-affinity antibodies. It thus appears that substantial time will be required to generate large sets of new immunoassays to candidate markers, and that an alternative approach based on quantitative mass spectrometry may serve to evaluate candidate biomarkers prior to investment in immunoassays. Here I focus on the emerging MS methodologies for specific protein quantification.

Mass spectrometry is widely used for the quantitative measurement of specific small molecules (e.g. drugs (Streit et al. 2002, 2004), drug metabolites (Kostiainen et al. 2003), hormones (Tai et al. 2004), and pesticides (Sannino et al. 2004)), with excellent precision (Tai et al. 2004) and very high throughput (Bakhtiar et al. 2002; Deng et al. 2002). In these methods, a sample is typically subjected to some form of high-throughput prefractionation (e.g. solid phase extraction; SPE) followed by a rapid reversed-phase chromatography separation, and the resulting output stream is introduced through an ionizing spray interface into a triple-quadrupole MS (TQMS). Within the MS, the first mass analyser (MS1) is set to pass the parent molecule (the ‘analyte’), rejecting components of other mass-to-charge ratios (m/z). The analyte is then fragmented in a collision chamber and passed to a second mass analyser (MS2) set to pass a known specific fragment. This two-stage selection of parent and fragment ions (selected reaction monitoring: SRM) affords great specificity, with the result that the detected signal usually traces a peak in the chromatogram at the expected retention time corresponding to the selected analyte. Integrating this peak gives a measure of the quantity of the analyte. Figure 3 presents an example of this approach in which a specific tryptic peptide of the coagulation protease prothrombin is measured in a tryptic digest of plasma. This measurement, based on precise molecular characteristics of the peptide, is in fact more specific for prothrombin than a typical immunoassay (in which lack of perfect antibody specificity is usually overlooked). An internal standard is often spiked into the sample to provide a reference signal to which the analyte is compared for absolute quantification. Lower limits of quantification (LLOQ) of 5–25 ng ml−1 (∼20 nM) can be obtained for drug metabolites (Zhang et al. 2000a), and < 10 ng ml−1 for pesticides in vegetable samples (Sannino et al. 2004). In serum and plasma, methods based on two-stage mass spectrometry (MS/MS) quantify the drugs mycophenolic acid (Streit et al. 2004) (0.5 ng ml−1) and sirolimus (Streit et al. 2002) (0.25 ng ml−1), as well as hormones and metabolites such as thyroid hormone T3 (Tai et al. 2004) (a reference method with coefficient of variation (CV) < 3%), homocysteine (Magera et al. 1999; Arndt et al. 2004; Stabler & Allen, 2004), S-adenosylmethionine and S-adenosylhomocysteine (Struys et al. 2000) (LLOQs of 3 and 1 ng ml−1, respectively, with CV < 8%).
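The two-stage selection and peak integration described above can be sketched as a toy model. In a real triple-quadrupole instrument the selection happens physically in MS1 and MS2; here it is modelled as a filter over hypothetical ion records. The `IonRecord` structure, the function names and the demonstration data are all assumptions made for illustration, and the transition values are those quoted in the Figure 3 legend.

```python
from dataclasses import dataclass


@dataclass
class IonRecord:
    """Hypothetical record of one detected parent/fragment ion pair."""
    retention_min: float
    parent_mz: float
    fragment_mz: float
    intensity: float


def srm_trace(records, parent_mz, fragment_mz, window=1.0):
    """Keep only records inside the mass windows of both the parent and the fragment."""
    half = window / 2
    return sorted(
        (r.retention_min, r.intensity)
        for r in records
        if abs(r.parent_mz - parent_mz) <= half
        and abs(r.fragment_mz - fragment_mz) <= half
    )


def peak_area(trace, start_min, end_min):
    """Trapezoidal integration of the trace between two retention times."""
    pts = [p for p in trace if start_min <= p[0] <= end_min]
    return sum((t1 - t0) * (i0 + i1) / 2 for (t0, i0), (t1, i1) in zip(pts, pts[1:]))


if __name__ == "__main__":
    demo = [
        IonRecord(10.0, 781.4, 909.7, 0.0),
        IonRecord(10.1, 781.4, 909.7, 100.0),
        IonRecord(10.2, 781.4, 909.7, 0.0),
        # Interfering ion: same parent mass, different fragment, so it is rejected.
        IonRecord(10.1, 781.4, 500.0, 1000.0),
    ]
    trace = srm_trace(demo, parent_mz=781.4, fragment_mz=909.7)
    print(len(trace), peak_area(trace, 9.5, 10.5))
```

The interfering record shares the parent mass but fails the fragment filter, which is the source of the specificity attributed to SRM in the text.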

Figure 3.


An example showing MS/MS detection of a prothrombin peptide (TATSEYQTFFNPR) in a tryptic digest of unfractionated plasma, using the SRM transition 781.4/909.7 (parent/fragment masses). Prothrombin is present in normal plasma at 100 µg ml−1, and the peptide is detected at a signal-to-noise ratio (S/N, smoothed peak height/3σ background) of 85. In the figure, the arrow in panel MS1 shows the peak in the peptide MS spectrum selected as the parent, the arrow in panel MS2 shows the fragment chosen from the MS/MS spectrum (the y7 ion), and panel MRM shows the ion current detected at this parent/fragment SRM transition (with unit mass windows) over the entire course of a 3 h LC run. The MS/MS spectrum in MS2 unambiguously identifies the prothrombin peptide by sequence, providing absolute specificity better than immunoassay.

The SPE–LC–MS/MS approach (where LC is liquid chromatography) has also been successfully applied to peptides, which typically have higher masses than the small molecules discussed above. Peptides yield specific fragments suitable for MS/MS measurement, and suitable internal standard peptides can be prepared by chemical synthesis. Small amounts (picomoles) of neuropeptides (enkephalins (Desiderio & Kai, 1983), endorphins (Dass et al. 1989), substance P (Lisek et al. 1989)) were detected by MS/MS and measured against stable isotope-labelled standards in the 1980s. More recently this approach has been used in standardized assays for larger peptides in serum such as 3 kDa thymosin α1 (LLOQ 0.5 ng ml−1 (Tuthill et al. 2000) CV < 10%) and for small proteins like the 10 kDa recombinant protein rK5 (LOQ 100 ng ml−1 in monkey serum (Ji et al. 2003) and later 10 ng ml−1 in human serum (Ji et al. 2004), CV of 3%). The structural specificity of MS/MS allows better analyte discrimination than immunoassays: particular forms of insulin and its fragments can be selectively detected (Kippen et al. 1997), and in fact MS/MS is now used as a reference against which to standardize different immunoassays for C-peptide (Fierens et al. 2003).

However, the method as described above is not generally useful for proteins larger than about 10 kDa, whose higher mass is not as well resolved by current MS or LC systems as peptides, which do not fragment efficiently into a few discrete pieces, and for which labelled internal standards are significantly more expensive. MS analysis of whole proteins from plasma is typically restricted to non-quantitative applications in which an available high affinity antibody is used to capture the protein, which is then eluted and analysed by MS (Kiernan et al. 2003; Nepomuceno et al. 2004), or digested to peptides that can be subjected to MS/MS for structural analysis (Labugger et al. 2003; Nedelkov et al. 2004). Such methods are useful for detecting protein sequence variants and post-translational modifications (PTMs), and can be quantitative in the rare cases where a purified cross-reacting homologue from another species is available to serve as an internal standard (e.g. the assay of 7.6 kDa IGF1 (Nelson et al. 2004) at ∼100 pg ml−1).

Thus in order to extend the successful methods of LC–MS/MS quantification to proteins in a sample such as plasma, one must ‘disassemble’ each protein quantitatively into its constituent peptides by complete chemical or enzymatic cleavage. Within this digest one can select a monitor peptide to serve as a quantitative surrogate for the protein, and achieve accurate quantification by spiking with a stable isotope-labelled version of the same peptide as internal standard (Stemmann et al. 2001). Such ‘postdigest’ assays have been generated for some higher-abundance plasma proteins such as ApoA-I lipoprotein (Barr et al. 1996) (CV < 4%) and Hb A1C (Jeppsson et al. 2002) (an International Federation of Clinical Chemistry reference method in which a glycated peptide is measured with interlaboratory CVs of 1.4–2.3%). Attempts to assay the 26 kDa cancer marker prostate-specific antigen (PSA) (Barnidge et al. 2004) using a standard LC–MS/MS system yielded a detection limit of 4.5 µg ml−1 (0.17 µg ml−1 of the monitored peptide, a level ∼1000 times higher than the clinically relevant level), while measurement of CRP (after a molecular weight enrichment by SDS gel) yielded quantitative measurements at < 1 µg ml−1 (Kuhn et al. 2004).
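Quantification against a spiked stable isotope-labelled peptide reduces to a ratio of peak areas, followed by a conversion from peptide to protein mass. The sketch below assumes complete digestion, one copy of the monitor peptide per protein molecule, and equal detector response for the labelled and unlabelled peptides. The function names are hypothetical, and the peptide mass used in the example (about 982 Da) is back-calculated here from the protein and peptide figures quoted for PSA rather than taken from the cited paper.

```python
def analyte_concentration(analyte_area, standard_area, standard_concentration):
    """Peptide concentration from the peak-area ratio to a spiked labelled standard."""
    return analyte_area / standard_area * standard_concentration


def protein_from_peptide(peptide_concentration, protein_mass_da, peptide_mass_da):
    """Protein mass concentration equivalent to a peptide mass concentration.

    Assumes complete digestion and one monitor peptide per protein molecule.
    Units of the result match the units of peptide_concentration.
    """
    return peptide_concentration * protein_mass_da / peptide_mass_da


if __name__ == "__main__":
    # Analyte peak half the size of a standard spiked at 10 units gives 5 units.
    print(analyte_concentration(50.0, 100.0, 10.0))
    # PSA figures from the text: 0.17 of peptide corresponds to about 4.5 of a 26 kDa protein.
    print(round(protein_from_peptide(0.17, 26_000, 982), 2))
```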

While individual analytes within each class of molecule vary, the published data lead us to conclude that serum concentrations in the order of 1 ng ml−1 for drugs, 1–10 ng ml−1 for plasma peptides, and ∼100 ng ml−1 for peptides in a complex plasma digest can be measured by existing LC–MS/MS-based assay methods. On average, proteins in plasma are ∼34 times as large as the roughly 10 amino acid-long monitor peptides chosen to represent them, and thus the protein detection limit (measuring a peptide in a digest) would be expected to be roughly 3 µg ml−1.
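The scaling from a peptide detection limit to a protein detection limit is a single multiplication, shown here as a sketch using the two figures given in the paragraph (the peptide limit in a plasma digest and the average protein-to-peptide size ratio). The function name is hypothetical.

```python
PEPTIDE_LIMIT_NG_PER_ML = 100        # peptides in a complex plasma digest
PROTEIN_TO_PEPTIDE_SIZE_RATIO = 34   # average protein size relative to a monitor peptide


def protein_limit_ng_per_ml(peptide_limit=PEPTIDE_LIMIT_NG_PER_ML,
                            size_ratio=PROTEIN_TO_PEPTIDE_SIZE_RATIO):
    """Protein mass per ml that yields the detectable mass of one monitor peptide."""
    return peptide_limit * size_ratio


if __name__ == "__main__":
    print(protein_limit_ng_per_ml())
```

The result is a few thousand ng ml−1 of protein, which is the baseline that any additional fractionation or enrichment step has to improve on.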

Two additional elements are required to enable quantitative MS/MS for targeted proteomics: the capability to assay many proteins at a time and a means to extend sensitivity downwards to the level of low abundance biomarkers such as cytokines (∼10 pg ml−1).

Multi-analyte methods are implemented in TQMS by rapidly switching between pairs of MS/MS parameters during the LC run. Published methods have measured up to 29 pesticides in one run (Barr et al. 2002), and prototype studies of up to 200 multiple-reaction monitoring (MRM) analytes have been performed. Sensitivity of MS assays can be increased by additional stages of fractionation prior to LC–MS/MS. Two such methods of particular promise involve the subtraction of specific high-abundance plasma proteins (e.g. albumin, transferrin, Igs, haptoglobin, etc.) using specific antibody columns (Pieper et al. 2003b), and the specific enrichment of selected monitor peptides through binding and release from antipeptide antibody columns (Anderson et al. 2004a). The former method provides a 10-fold improvement in sensitivity (by subtracting 90% of the mass of protein in plasma), while the latter method yields an additional 100-fold average improvement using relatively crude rabbit polyclonal antibodies. These extensions provide a reasonable basis for the expectation that panels of 20–50 protein analytes taken from the top 6 or 7 (of 10) orders of magnitude plasma concentration should be accessible for routine MS/MS measurement.
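The effect of the two enrichment steps on the accessible concentration range can be estimated as follows. This sketch assumes that the fold improvements multiply, that the unenriched protein detection limit is the one implied earlier in the text (a 100 ng ml−1 peptide limit and a 34-fold size ratio), and that albumin marks the top of the range. The function names are hypothetical.

```python
import math

BASELINE_LIMIT_PG_PER_ML = 100 * 34 * 1000   # 100 ng/ml peptide limit x 34, in pg/ml
DEPLETION_GAIN = 10                          # subtraction of high-abundance proteins
PEPTIDE_ENRICHMENT_GAIN = 100                # antipeptide antibody enrichment
ALBUMIN_PG_PER_ML = 55_000_000_000           # top of the plasma concentration range


def improved_limit(baseline, *gains):
    """Detection limit after applying successive fold improvements."""
    limit = baseline
    for gain in gains:
        limit /= gain
    return limit


def accessible_orders(top, limit):
    """Orders of magnitude between the most abundant protein and the detection limit."""
    return math.log10(top / limit)


if __name__ == "__main__":
    limit = improved_limit(BASELINE_LIMIT_PG_PER_ML, DEPLETION_GAIN, PEPTIDE_ENRICHMENT_GAIN)
    print(limit)
    print(round(accessible_orders(ALBUMIN_PG_PER_ML, limit), 1))
```

Under these assumptions the limit falls to a few thousand pg ml−1 and about seven orders of magnitude below albumin become accessible, consistent with the range quoted in the text.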

Candidate markers of cardiovascular disease

Given a technology platform for measuring a limited number of identified proteins, intelligent candidate selection is a high priority. As an example of a set of candidates to start with, I present here a table of proteins reported to have some connection with cardiovascular disease (here considered in a broad sense, and including heart disease, stroke, vascular disease, hyper- and hypo-coagulation) from literature and other sources (Table 1).

Table 1.  A table of 177 candidate markers of cardiovascular disease (CVD) and stroke, assembled through literature search
 
The common name, Swissprot sequence accession number, normal plasma concentration (and source of concentration measurement), justification for inclusion, and membership in one of three general CVD-related groups (coagulation pathway, lipid transport, and acute phase reactants) are tabulated. Concentrations are mean values where given, or a geometric average of high and low normal values where a range was given. Blanks occur where the search has not yet found reliable published values. Some entries have multiple accessions (multiple subunits separated by +, or lack of sufficient information to select among homologues separated by commas), and in some cases multiple candidates share a single accession (when different processed forms of one protein are considered separately).

| No. | Name | Accession | Normal concentration (pg ml−1) | Source for concentration | Reason | Coagulation | Lipoprotein | Acute phase |
|---|---|---|---|---|---|---|---|---|
| 1 | activin A | P08476 | 6.0E+02 | (Eldar-Geva et al. 2001) | Released by heparin from vascular endothelium (Phillips et al. 2000) | | | |
| 2 | adiponectin (ADPN) | Q15848 | 4.8E+06 | (Mallamaci et al. 2002) | Higher levels in essential hypertensives (Mallamaci et al. 2002) | | | |
| 3 | albumin | P02768 | 4.1E+10 | (Specialty Laboratories, 2001) | Negative acute phase reactant, lower levels associated with increased risk of cardiovascular mortality (Shaper et al. 2004) | | | + |
| 4 | aldolase C | P09972 | 4.0E+03 | (Asaka et al. 1990) | A more specific and sensitive marker of cerebrovascular diseases than aldolase A (Asaka et al. 1990) | | | |
| 5 | alpha 2 antiplasmin (alpha 2 AP) | P08697 | 7.0E+07 | Progen test insert | An important regulator of the fibrinolytic system | + | | |
| 6 | alpha 2 macroglobulin (alpha 2 m) | P01023 | 1.8E+09 | (Specialty Laboratories, 2001) | Major plasma protease inhibitor | | | |
| 7 | alpha(1)-antichymotrypsin (ACT) | P01011 | 4.2E+07 | (Putnam, 1975) | Major plasma protease inhibitor | | | + |
| 8 | alpha1 acid-glycoprotein (AAG) | P02763 | 6.9E+08 | (Specialty Laboratories, 2001) | Acute phase reactant | | | + |
| 9 | alpha1-antitrypsin (AAT) | P01009 | 1.4E+09 | (Specialty Laboratories, 2001) | Major plasma protease inhibitor | | | |
| 10 | angiotensin-converting enzyme (ACE) | P12821 | | | Lower in stroke patients than controls (Catto et al. 1996) | | | |
| 11 | angiotensinogen | P01019 | 1.5E+06 | (Bloem et al. 1995) | Precursor of major blood pressure control peptide | | | |
| 12 | antithrombin III (AT III) | P01008 | 2.0E+08 | (Kalafatis et al. 1997) | Major inhibitor of thrombin | + | | |
| 13 | apolipoprotein A-I | P02647 | 1.4E+09 | (Glowinska et al. 2003) | Low level associated with mortality and myocardial infarction five years after CABG (Skinner et al. 1999) | | + | + |
| 14 | apolipoprotein A-II | P02652 | 2.4E+08 | (Luo & Liu, 1994) | Lipoprotein | | + | |
| 15 | apolipoprotein A-IV | P06727 | 1.6E+08 | (Kondo et al. 1989) | A relatively independent risk factor for CHD (Warner et al. 2001) | | + | |
| 16 | apolipoprotein B | P04114 | 7.3E+08 | (Glowinska et al. 2003) | Major component of LDL | | + | |
| 17 | apolipoprotein C-I | P02654 | 6.1E+07 | (Riesen & Sturzenegger, 1986) | Lipoprotein | | + | |
| 18 | apolipoprotein C-II | P02655 | 3.3E+07 | (Bury et al. 1986) | Lipoprotein | | + | |
| 19 | apolipoprotein CIII | P02656 | 1.2E+08 | (Onat et al. 2003) | Marker of CHD independent of cholesterol (Onat et al. 2003) | | + | |
| 20 | apolipoprotein D | P05090 | | | Lipoprotein | | + | |
| 21 | apolipoprotein E | P02649 | 4.0E+07 | | Presence of epsilon4 allele a strong independent predictor of adverse events (Brscic et al. 2000) | | + | |
| 22 | apolipoprotein L1 | O14791 | | | Lipoprotein | | + | |
| 23 | aspartate aminotransferase, mitochondrial (m-type) | P00505 | | | Diagnostic for early detection of myocardial infarction (Yoneda et al. 1992) | | | |
| 24 | basic fibroblast growth factor (bFGF) | P09038 | 6.0E+03 | (Song et al. 2002) | sICAM-1 level increases in acute cerebral infarction (Song et al. 2002) | | | |
| 25 | beta(2)-glycoprotein I, nicked | P02749 | | | May control extrinsic fibrinolysis via a negative feedback pathway loop (Yasuda et al. 2004) | + | | |
| 26 | B-type neurotrophic growth factor (BNGF) | P01138 | 7.0E+02 | (Reynolds et al. 2003) | Candidate stroke marker (Reynolds et al. 2003) | | | |
| 27 | cathepsin B | P07858 | 2.1E+03 | (Kos et al. 1998) | Potential biomarker for vulnerable plaques (Chen et al. 2002) | | | |
| 28 | CD105 (endoglin) | P17813 | 3.4E+04 | (Takahashi et al. 2001) | Potential myocardial infarction and stroke marker (Li et al. 1998) | | | |
| 29 | CD40 ligand, soluble (sCD40L) (= CD154) | P29965 | 2.9E+03 | (Schonbeck et al. 2001) | Patients with unstable angina have elevated plasma levels of soluble CD40L (Schonbeck et al. 2001) | | | |
| 30 | ceruloplasmin | P00450 | 2.8E+08 | (Kim et al. 2002) | Ceruloplasmin reported to be an independent risk factor for cardiovascular disease (Kim et al. 2002) | | | + |
| 31 | chitotriosidase | Q13231 | | | Significantly increased in individuals suffering from atherosclerosis disease (Artieda et al. 2003) | | | |
| 32 | cholesteryl ester transfer protein (CETP) | P11597 | 1.9E+06 | (Sasai et al. 1998) | Alleles affect CVD (Blankenberg et al. 2003) | | + | |
| 33 | chromogranin A | P10645 | 1.1E+05 | (Ceconi et al. 2002) | Increased in chronic heart failure (Ceconi et al. 2002) | | | |
| 34 | clusterin | P10909 | 3.7E+08 | (Hogasen et al. 1993) | Induced in media and neointima after vascular injury (Miyata et al. 2001) | | + | |
| 35 | coagulation Factor IX | P00740 | 5.1E+06 | (Kalafatis et al. 1997) | Coagulation | + | | |
| 36 | coagulation Factor V | P12259 | 6.6E+06 | (Kalafatis et al. 1997) | Most common genetic CVD risk factor to date is a single point mutation (FV Leiden) (Dahlback, 2003) | + | | |
| 37 | coagulation Factor VII | P08709 | 5.0E+05 | (Kalafatis et al. 1997) | Coagulation | + | | |
| 38 | coagulation Factor VII-activating protease | Q14520 | 7.5E+06 | (Romisch et al. 1999) | Coagulation | + | | |
| 39 | coagulation Factor VIII | P00451 | 2.0E+05 | (Kalafatis et al. 1997) | Coagulation | + | | |
| 40 | coagulation Factor X | P00742 | 1.0E+07 | (Kalafatis et al. 1997) | Target for novel antithrombotic agents | + | | |
| 41 | coagulation Factor XI | P03951 | 4.8E+06 | (Kalafatis et al. 1997) | Coagulation | + | | |
| 42 | coagulation Factor XII | P00748 | 3.0E+07 | (Kalafatis et al. 1997) | Coagulation | + | | |
| 43 | coagulation Factor XIIa | P00748 | 2.0E+03 | (McLaren et al. 2002) | Levels of 2 ng ml−1 or more have an increased risk of CHD (McLaren et al. 2002) | + | | |
| 44 | coagulation Factor XIII | P00488, P05160 | 1.0E+07 | (Katona et al. 2000) | Coagulation | + | | |
| 45 | collagen I degradation byproduct (ICTP) | 0 | | | Altered in hypertrophic cardiomyopathy (Lombardi et al. 2003) | | | |
| 46 | collagen I synthesis byproduct (PICP) | 0 | | | Altered in hypertrophic cardiomyopathy (Lombardi et al. 2003) | | | |
| 47 | collagen I synthesis byproduct (PINP) | 0 | | | Altered in hypertrophic cardiomyopathy (Lombardi et al. 2003) | | | |
| 48 | collagen I synthesis byproduct (PIP) | 0 | 1.0E+05 | (Lopez et al. 2001) | May be useful to assess the cardioreparative properties of antihypertensive treatment in hypertensives (Lopez et al. 2001) | | | |
| 49 | collagen III propeptide (PIIIP) | 0 | | | (Nomura et al. 2003) | | | |
| 50 | collagen III synthesis byproduct (PIIINP) | 0 | 5.0E+03 | (Poulsen et al. 2000) | Correlates with infarct size in MI (Poulsen et al. 2000) | | | |
| 51 | complement C1 inactivator | P05155 | 3.0E+08 | (Oshitani et al. 1988) | Can preserve ischaemic myocardium from reperfusion injury (Buerke et al. 1995) | | | |
| 52 | complement C3 | P01024 | 1.3E+09 | (Specialty Laboratories, 2001) | C3 is more strongly associated with previous myocardial infarction than other risk factors (Muscari et al. 2000) | | | + |
| 53 | complement C4 | P01028 | 2.7E+08 | (Specialty Laboratories, 2001) | Associated with previous myocardial infarction (Muscari et al. 1995) | | | + |
| 54 | C-reactive protein (CRP) | P02741 | 2.3E+06 | (Menon et al. 2003) | CRP levels strongly predict cardiovascular death (Park et al. 2002) | | | + |
| 55 | creatine kinase-MB | P12277, P06732 | | | Specific biochemical marker of myocardial injury (Ay et al. 2002) | | | |
| 56 | endothelial cell protein C receptor (EPCR) | Q9UNN8 | 1.0E+05 | (Kurosawa et al. 1997) | Protein C activation is augmented by EPCR (Esmon, 2003) | | | |
| 57 | endothelial leucocyte adhesion molecule 1 (ELAM-1) | P16581 | 9.2E+02 | (Carson et al. 1993) | Stroke caused an initial transient increase of sELAM-1 (Fassbender et al. 1995) | | | |
| 58 | endothelin-1 (ET-1) | P05305 | 3.6E+00 | (Tsutamoto et al. 1995) | ET-1 levels are elevated in acute MI (Monge, 1998) | | | |
| 59 | endothelin-1, Big | P05305 | 1.2E+01 | (Erbas et al. 2000) | Elevated Big endothelin-1 is a strong predictor of atrial fibrillation (Masson et al. 2000) | | | |
| 60 | enolase, beta, skeletal muscle | P13929 | | | Concentrations significantly increased in acute MI (Nomura et al. 1987) | | | |
| 61 | enolase, gamma, neurone-specific | P09104 | 9.6E+03 | (Oh et al. 2002) | May be a useful marker for severity in acute ischaemic stroke (Oh et al. 2002) | | | |
| 62 | erythropoietin (EPO) | P01588 | 2.6E+02 | (Masaki et al. 1992) | Protects neurones from hypoxic/ischaemic injury (Ehrenreich et al. 2002) | | | |
| 63 | E-selectin, soluble | P16581 | 1.5E+04 | (Galvani et al. 2000) | sE-selectin significantly elevated in the acute stage of ischaemic stroke (Frijns et al. 1997) | | | |
| 64 | Fas, soluble (APO-1/CD95) | P25445 | 2.0E+03 | (Ohtsuka et al. 1999) | Increased plasma sFas levels are predictive of future CVD (Troyanov et al. 2003) | | | |
65fatty acid-binding protein, heart-type (H-FABP)P054132.0E +03(Glatz et al. 1998)Performs as well as myoglobin as a marker of cardiac reperfusion (de Groot et al. 2001) 
66ferritinP02792 + P027948.2E +04(Zuyderhoudt et al. 1978)Possible relationship with carotid atherosclerosis potentiated by LDL cholesterol (Wolff et al. 2004) +
67fibrinogenP02671 + P02675 + P026792.5E +09(Glowinska et al. 2003)Strongly related to cardiovascular risk (Koenig, 2003)+ +
68fibrinopeptide AP026719.0E +02(Cronlund et al. 1976)iIncreased in patients with ACS and is associated with adverse outcome (Ottani & Galvani, 2001) 
69fibrinopeptide B beta 1–42P02675 May be predictive of recurrent ischaemia (Scharfstein et al. 1996) 
70fibrinopeptide B beta 15–42P02675 Candidate haemostasis marker (Fareed et al. 1998) 
71fibronectinP027511.4E +06(Castellanos et al. 2004)Cellular fibronectin may be a marker protein for endothelial cell activation (Kanters et al. 2001) 
72follistatinP198836.0E +02(Eldar-Geva et al. 2001)Released by heparin from vascular endothelium (Phillips et al. 2000) 
73gamma- glutamyltransferase (GGT)P19440 Marker of liver dysfunction, alcohol intake and stroke (Whitfield, 2001) 
74glial fibrillary acidic protein (GFAP)P141364.5E +02(van Geel et al. 2002)Marker of brain damage (Herrmann et al. 2000) 
75glycogen phosphorylase BB, cardiacP112163.0E +03(Hofmann et al. 1989)Classical cardiac marker 
76GMP-140 (soluble P-selectin)P161092.0E +05(Facer & Theodoridou, 1994)Elevated in elderly hypertensives (Li et al. 2001b) 
77gp130, soluble (sgp130)P401892.7E +05(Li et al. 2001a)Correlated with variables reflecting deranged haemodynamic status (Aukrust et al. 1999) 
78GPIIb/IIIa, solubleP08514 Implicated in the pathogenesis of acute coronary syndromes (Wagner et al. 1998) 
79growth hormone (GH)P012412.0E +02(Krassas et al. 2003)Associated with an increased incidence of cardiovascular disease (Vahl et al. 1999) 
80haptoglobinP007376.2E +08(Specialty Laboratories, 2001)Subjects with Hp 2–2 had significantly higher serum total and free cholesterol concentration (Braeckman et al. 1999) 
81haemopexinP027907.6E +08(Jakob, 2002)Acute phase protein 
82heparin cofactor II (HCII)P05546 Protein inhibitor of coalgulation (Mann et al. 2003)+ 
83hepatocyte growth factor (HGF)P142102.0E +02(Matsumori et al. 2000)Reflects the clinical course in patients with acute MI (Sato et al. 1997) 
84hexosaminidase AP06865 Subjects in the 95–100%ile showed significantly increased frequency of myocardial infarction of their fathers and of stroke in their mothers (Hultberg et al. 1994) 
85hydroxybutyrate dehydrogenase (HBDH)Q023381.3E +05(Akenzua et al. 1992)Mitochondrial enzyme useful for estimation of infarct size in MI (van der Laarse et al. 1984). 
86immunoglobulin G09.8E +09(Specialty Laboratories, 2001)Acute phase protein 
87insulinP013082.0E +03(Green et al. 1976)Serum insulin quantitatively associated with cardiovascular risk factors (Chen et al. 1999) 
88insulin C-peptideP013081.7E +03(Donatelli et al. 1991)C-peptide quantitatively associated with cardiovascular risk factors (Chen et al. 1999) 
89insulin precursor (proinsulin)P013084.3E +01(Burtis & Ashwood, 1999)Increased concentrations predict death and morbidity caused by CHD over a period of 27 years, independent of other major cardiovascular risk factors (Zethelius et al. 2002) 
90insulin-like growth factor binding protein-1 (IGFBP-1)P088336.0E +04(Wacharasindhu et al. 2002)Correlated negatively with several established cardiovascular factors (Heald et al. 2001) 
91insulin-like growth factor-1 (IGF-1)P013431.9E +05(Oh et al. 2004)May be a risk factor for certain cardiac disorders (Ren et al. 1999) 
92intercellular adhesion molecule 1, soluble (sICAM-1)P053625.3E +05(Song et al. 2003)sICAM-1 related to the estimated risk of coronary heart disease (Witte et al. 2003) 
93interleukin-1 beta (IL-1 beta)P015841.2E +00(Lu et al. 2004)Higher in MI group or UA (Wang et al. 2004) 
94interleukin-1 receptor antagonist (IL-1Ra)P18510 Plasma levels appear to be a valuable independent predictive factor of major adverse cardiac events in unselected patients undergoing PCI (Patti et al. 2002) 
95interleukin-1 receptor family member, ST2Q01638 Increased in the serum 1 day after myocardial infarction (Weinberg et al. 2002) 
96interleukin-10 (IL-10)P22301 Increased serum levels detected in stroke patients (Dziedzic et al. 2002) 
97interleukin-18 (IL-18)Q141165.9E +01(Blankenberg et al. 2002)Significantly increased in unstable angina and MI (Mallat et al. 2002) 
98interleukin-2 (IL-2)P605685.1E +01(Mizia-Stec et al. 2003)Significantly higher in patients with MI (Mizia-Stec et al. 2003) 
99interleukin-6 (IL-6)P05231 Increased serum level was a significant predictor of death or new heart failure episodes (Orus et al. 2000) 
100interleukin-6 receptor, soluble (sIL-6R)P088871.0E +05(Disthabanchong et al. 2002)Increased in MI and UA (Bossowska et al. 2003) 
101interleukin-8 (IL-8)P101451.7E +00(Zhang et al. 2003)Level higher in UA (Romuk et al. 2002) 
102leptinP41159 Patients with advanced CHF show elevated serum levels (Schulze et al. 2003) 
103leptin receptor, solubleP483572.3E +04(Schulze et al. 2003)Patients with advanced CHF show elevated serum levels (Schulze et al. 2003) 
104lipoprotein lipase (LPL)P068582.8E +05(Dugi et al. 2002)Significant association between the LPL protein mass and NYHA class (Kastelein et al. 2000) + 
105lipoprotein receptor-related protein 1, soluble (sLRP1) (alpha-2- macroglobulin receptor)Q079546.0E +06(Quinn et al. 1997)May antagonize the clearance of ligands by cell bound LRP perturbing lipid metabolism (Quinn et al. 1997) + 
106lipoprotein(a) (Lp(a))P085191.4E +08(Glowinska et al. 2003)An index of atherosclerosis risk (Malaguarnera et al. 1996) + 
107lipoprotein- associated phospholipase A2 (Lp-PLA2)P040541.5E +03(Kugiyama et al. 1999)Potential biomarker of coronary heart disease, plays a proinflammatory role in the progression of atherosclerosis (Dada et al. 2002) + 
108 l-selectin, soluble (sL-selectin) (CD62L)P141516.7E +05(Atalar et al. 2002)CD62L expression increased during cardiopulmonary bypass (Hambsch et al. 2002) 
109macrophage colony-stimulating factor (MCSF)P096036.8E +02(Saitoh et al. 2000)Mean concentration in patients with coronary events was significantly higher than controls (Saitoh et al. 2000) 
110matrix metalloproteinase-1 (MMP-1)P03956 Patients with atrial fibrillation (AF) had lower levels of MMP-1 (Marin et al. 2003) 
111matrix metalloproteinase-2 (MMP-2)P082538.1E +05(Noji et al. 2004)Higher in hypertrophic cardiomyopathy than controls (Lombardi et al. 2003). 
112matrix metalloproteinase-3 (MMP-3)P082548.0E +03(Sangiorgi et al. 2001)Levels are strongly associated with carotid lesions (Beaudeux et al. 2003) 
113matrix metalloproteinase-9 (MMP-9)P147809.0E +03(Sangiorgi et al. 2001)Predicts haemorrhagic transformation in acute ischaemic stroke (Castellanos et al. 2003) 
114monocyte chemoattractant protein-1 (MCP-1)P135001.6E +02(de Lemos et al. 2003)Appears to play a crucial role at multiple stages of atherosclerosis (de Lemos et al. 2003) 
115myelin basic protein (MBP)P026862.5E +03 Marker of cerebral damage (Zhou et al. 1992) 
116myeloperoxidase (MPO)P05164 Predicts increased risk for subsequent cardiovascular events (Baldus et al. 2003) 
117myoglobin, cardiac (Mb)P021444.2E +04(Burtis & Ashwood, 1999)Cardiac muscle damage marker 
118myosin heavy chain, cardiacP13533, P12883 Cardiac muscle damage marker 
119myosin light chain I, cardiacP085901.0E +03(Uji et al. 1991)Cardiac muscle damage marker 
120myosin light chain II, cardiacP109162.0E +03(Hirayama et al. 1990)Cardiac muscle damage marker 
121natriuretic peptide, atrial, C-terminal (C-ANP)P01160 Diagnostic utility in detecting left ventricular dysfunction (Lee et al. 2002) 
122natriuretic peptide, atrial (ANP)P011605.6E +01(Goto et al. 2002)Diagnostic utility in detecting left ventricular dysfunction (Lee et al. 2002) 
123natriuretic peptide, atrial, N-terminal (N-ANP)P01160 Diagnostic utility in detecting left ventricular dysfunction (Lee et al. 2002) 
124natriuretic peptide, atrial, propeptide (31–67)P01160 Increased moderately with primary pulmonary hypertension (Goetze et al. 2004) 
125natriuretic peptide, brain (BNP)P168601.9E +02(Goto et al. 2002)Diagnostic utility in detecting left ventricular dysfunction (Lee et al. 2002) 
126natriuretic peptide, brain, N-terminal (NT-BNP)P16860 Diagnostic utility in detecting left ventricular dysfunction 
127natriuretic peptide, brain, pro-form (proBNP)P16860 40-fold increase in primary pulmonary hypertension (Goetze et al. 2004) 
128neurone-specific enolase (NSE)P091048.0E +01(Oh et al. 2002)Significantly elevated in patients with acute cerebral infarction (Oh et al. 2002) 
129neutral endopeptidase 24.11 (NEP)P084732.5E +02(Zhang et al. 1994)A target for ACE-inhibitor-like drugs 
130neutrophil gelatinase-associated lipocalin (NGAL)P801888.7E +04(Elneihoum et al. 1997)Levels higher in stroke (Falke et al. 2000) 
131neutrophil protease-4 (NP4)P241582.3E +04(Elneihoum et al. 1997)Levels higher in stroke (Elneihoum et al. 1996) 
132osteoprotegerin (OPG)O003002.3E +02(Browner et al. 2001)Serum levels associated with cardiovascular mortality, may be a marker for vascular calcification (Browner et al. 2001) 
133paraoxonase (PON1, 2, 3)(P27169, Q15165, Q15166)5.9E +07(Kujiraoka et al. 2000)Plasma levels influence the risk of developing cardiovascular disease (Getz & Reardon, 2004). + 
134phosphoglycerate mutase (PGM) B-typeP18669 Novel marker for diagnosis of cerebral stroke and its severity (Hayashi & Matuo, 2001) 
135plasminogenP007471.0E +08(Marchal et al. 1996)Major enzyme of thrombolysis+ 
136plasminogen activator inhibitor (PAI)-1-antigenP051214.2E +04(Glowinska et al. 2003)High plasma levels reported in coronary artery disease and stroke (Diamantopoulos et al. 2003)+ +
137platelet endothelial cell adhesion molecule-1, soluble (sPECAM-1)P162846.6E +03(Zeisler et al. 2001)Stroke patients displayed statistically significant higher levels of sPECAM-1 in sera (Zaremba & Losy, 2002) 
138platelet factor 4P027767.7E +03(Cella et al. 1983)Elevated in brain lacunar infarctions with long-lasting signs (Oishi et al. 1999) 
139platelet-activating factor (PAF) acetylhydrolaseQ13093 Deficiency associated with stroke, myocardial infarction, brain haemorrhage, and non-familial cardiomyopathy (Tjoelker & Stafforini, 2000) 
140platelet-derived growth factor (PDGF)P04085 + P011271.7E +02(Cimminiello et al. 1994)iIncreased levels in chronic arterial obstructive disease (Cimminiello et al. 1994) 
141pregnancy-associated plasma protein A (PAPP-A)Q13219 Elevated in acute coronary syndromes (Bayes-Genis et al. 2001) 
142proreninP007973.7E +01(Sealey, 1991)Involved in blood pressure regulation 
143protein CP040703.7E +06(Yan & Dhainaut, 2001)Major regulator of haemostasis (Yan & Dhainaut, 2001)+ 
144protein C inhibitor (PCI)P051545.3E +06(Laurell et al. 1992)Inhibitor of key component of natural anticoagulant pathway+ 
145protein C, activated (APC)P040702.0E +03(Yan & Dhainaut, 2001)Key component of natural anticoagulant pathway+ 
146protein SP072252.1E +07(Kalafatis et al. 1997)Deficiency of protein S constitutes a major risk factor of venous thrombosis (Dahlback, 2004)+ 
147protein ZP22891 In the context of juvenile stroke, high plasma levels may represent a prothrombotic condition (Lichy et al. 2004)+ 
148prothrombinP007341.0E +08(Kalafatis et al. 1997)Coagulation+ 
149prothrombin fragment 1.2P007341.2E +03(McKenzie et al. 1999)Stroke patients had higher values than controls (Soncini et al. 2000)+ 
150P-selectin glycoprotein ligand-1 (PSGL-1)Q14242 Serum levels decreased during CV surgery (Osmancik et al. 2002) 
151P-selectin, soluble (GMP-140)P161094.7E +04(Carter et al. 2003)Significantly elevated in the acute stage of ischaemic stroke (Frijns et al. 1997) 
152resistinQ9HD891.5E +04(Fujinami et al. 2004)Concentrations of adipocytokines such as resistin and adiponectin determine inflammation status of vasculature, and in turn the progress of Atherosclerosis (Kawanami et al. 2004) 
153S-100betaP04271 A promising early biochemical marker for cerebral injury following cardiac surgery (Farsak et al. 2003) 
154serum amyloid A protein (SAA)P02735 Classical inflammation marker (with CRP) +
155serum placenta growth factorP49763 Associated with the occurrence of subsequent preeclampsia (Su et al. 2001) 
156sex hormone-binding globulin (SHBG)P04278 A biological marker for insulin resistance, which is linked to cardiovascular risk in African-American women (Sherif et al. 1998) 
157smooth muscle myosin heavy chainP35749 Intracoronary level may be a biochemical marker for the prediction of restenosis (Tsuchio et al. 2000) 
158tau proteinP10636 Correlated with brain infarct volume and disability after 3 months (Bitsch et al. 2002) 
159thrombin activatable fibrinolysis inhibitor (TAFI)Q9P2Y63.5E +06(Wada et al. 2002)Indirectly affects clot stability (Mann et al. 2003)+ 
160thrombomodulin, soluble (sTM)P072044.5E +04(Blann et al. 1997)Strong, graded, inverse association with incident coronary heart disease (Salomaa et al. 1999)+ 
161thrombospondin-1P079962.0E +05(Hayden et al. 2000)Might function as an alternative substrate for thrombus formation (Jurk et al. 2003)+ 
162tissue factor (TF)P137262.8E +02(Zemanova et al. 2003)Good predictor of cardiac allograft vasculopathy (CAV) (Yen et al. 2002)+ 
163tissue factor pathway inhibitor (TFPI)P106462.3E +04(Nomura et al. 2003)Significantly higher in acute MI (He et al. 2002)+ 
164tissue inhibitor of metalloproteinases-1 (TIMP-1)P010339.5E +04(Noji et al. 2001)Significantly higher in HCM patients than in control subjects (Noji et al. 2004) 
165tissue inhibitor of metalloproteinases-2 (TIMP-2)P160353.4E +04(Noji et al. 2004)Significantly higher in patients with HCM accompanied by systolic dysfunction (Noji et al. 2004) 
166tissue plasminogen activator (t-PA)P007507.3E +03(Glowinska et al. 2003)Predicted coronary events during a very long-term follow-up (Niessner et al. 2003)+ 
167transforming growth factor-beta (TGF-beta)P011374.5E +03(Shariat et al. 2001)Concentrations decreased in patients with coronary artery disease (CAD) (Tashiro et al. 2002) 
168tropomyosin 1 alpha chainP094932.0E +03(Cummins et al. 1981)Elevated ∼50-fold in MI (Cummins et al. 1981) 
169troponin I, cardiacP194291.0E +03(Kini et al. 2004)A clinical marker of cardiac muscle damage 
170troponin T, cardiacP453793.0E +00(Xue et al. 2003)A clinical marker of cardiac muscle damage 
171tumour necrosis factor receptor I, soluble (sTNF-RI)P194388.9E +02(Weiss et al. 1996)Significant independent predictor of cardiovascular mortality (Falke et al. 2000) 
172tumour necrosis factor receptor II, soluble (sTNF-RII)P203331.7E +03(Weiss et al. 1996)Increased in patients with CHF (Nowak et al. 2002) 
173tumour necrosis factor-alpha (TNF-alpha)P013758.3E +00(Mizia-Stec et al. 2003)Levels were elevated in all CAD groups (Mizia-Stec et al. 2003) 
174vascular endothelial growth factor (VEGF)P156923.2E +01(Lavie et al. 2002)Levels increased in patients with peripheral artery disease (PAD) (Makin et al. 2003) 
175vitronectinP040042.6E +08(Hogasen et al. 1993)A cofactor for rapid inhibition of activated protein C by plasminogen activator inhibitor-1 (Gechtman & Shaltiel, 1997) 
176von Willebrand Factor (vWF)P04275 Elevated plasma concentrations are increasingly recognized as a cardiovascular risk factor (Vischer et al. 1997)+ 
177von Willebrand Factor, propeptide (vWf:AgII)P042757.0E +05(Vischer et al. 1997)Could provide a sensitive plasma marker of acute endothelial secretion (Vischer et al. 1997)+ 

Cardiovascular disease (CVD) is the leading cause of death in the United States (∼40% of all deaths), and a major economic burden ($227 billion in direct medical costs this year) (2003). In 2001, there were more than 4 million visits to emergency departments with a primary diagnosis of CVD, and more than 6 million inpatient cardiovascular operations and procedures were performed (American Heart Association, 2003).

Cardiovascular disease includes a range of phenomena differing markedly in timescale, physical size, and relative effects of genes and environment. It includes slow processes such as atherosclerosis, which can evolve over decades, and very rapid events such as myocardial infarction, which can be lethal in a matter of minutes. It involves subtle changes at the molecular level, as coagulation enzymes are activated at the site of a ruptured arterial plaque, and large-scale physical consequences, when a blood clot physically plugs a major coronary artery. Genetic factors (e.g. familial hypercholesterolaemia or levels of lipoprotein(a) [Lp(a)]) are strongly involved, as are environmental and lifestyle factors, the most obvious of which are lipid intake and smoking. Largely on account of this breadth of causes and effects, and the diversity of treatment strategies that this makes possible, major progress has been made in the development of life-saving interventions. Damaged hearts can be repaired physically, by coronary artery bypass grafting (CABG) or percutaneous coronary intervention (PCI), or enzymatically, by administering recombinant human tissue plasminogen activator (tPA) to digest a clot; elevated blood pressure can be controlled by several different classes of drugs; and coagulation can be enhanced (in treatment of haemophilia by replacement of missing clotting factor proteins) or diminished (with aspirin, heparins, and platelet GP IIb/IIIa receptor antagonists).

A major challenge in medicine is thus deciding when, and upon whom, these effective interventions should be carried out. A patient presenting with chest pain may have an acute myocardial infarction (MI) requiring immediate PCI or tPA treatment, stable angina requiring nitroglycerine, oesophageal spasm with no cardiovascular consequences, etc. Given the urgency of this issue, the cardiology community has promulgated detailed guidelines concerning triage of chest-pain patients (Ryan et al. 1996; Braunwald et al. 2000). Perhaps most importantly, there is a window of opportunity, while conditions such as atherosclerosis and hypertension gradually worsen, in which the ability to anticipate an imminent acute event (e.g. MI or stroke) can have immense benefit. Where causal molecules or telltale molecular fingerprints can be identified, objective and reproducible laboratory tests can be created, helping to implement best medical practices at institutions large and small. Such tests are typically inexpensive in relation to drug treatment or surgical intervention, providing a major health economic benefit. And they can be fast, providing critical results in < 15 min when implemented in automated instruments near the patient.

History of protein markers in CVD

Cardiovascular disease is the most likely area in the spectrum of human disease to yield protein markers in plasma. Most pathologies of the cardiovascular system involve plasma proteins directly (e.g. the coagulation cascade with its positive and negative modulators (> 29 proteins), or proteins of lipid transport involved in atherosclerosis (> 16 proteins)), or proteins that interact with vessel walls, platelets, or both. In addition to these, numerous inflammatory modulators transported in the blood have direct and indirect relationships to cardiovascular disease, while release of proteins from the heart itself provides evidence of cardiac damage.

Consistent with this expectation, a number of very successful protein diagnostics have emerged in cardiovascular medicine. The most definitive of these is cardiac troponin I (TnI, or the alternative TnT, both muscle contractile proteins) as a primary indicator of myocardial infarction (Jaffe, 2001), often in combination with the cardiac isozyme of creatine kinase (CK-MB) and myoglobin. In this case, the diagnosis of MI typically includes a finding of elevated cardiac marker (e.g. TnI > 1 ng ml−1), leading to initiation of reperfusion treatment based on the knowledge that the marker signals destruction of cardiac muscle tissue surrounding an infarct. Brain-type natriuretic peptide (Maeda et al. 1998) (BNP or NTproBNP), a molecule produced in and released by the left ventricle, has recently been adopted as an effective test for congestive heart failure. Because of the clinical importance of these tests, they are performed in very large numbers: ∼85 million troponin assays and ∼10 million BNP assays are performed each year. Similarly the levels of inflammation markers like C-reactive protein (Ridker et al. 1998) (CRP), lipoprotein(a) (Agewall & Fagerberg, 2002), fibrinogen (Kannel et al. 1992), and the apportionment of cholesterol between high- and low-density lipoproteins (Luria et al. 1991) (usually distinguished in assays by their protein components) all serve as valuable measures of cardiovascular risk.

In fact, many proteins in plasma show changes associated with cardiovascular disease states. Thus the strategy of seeking single-protein tests (each with a defined reference interval, or normal range, outside of which a patient value is clearly diseased) has been vigorously pursued. Unfortunately, in most cases these changes are not sufficiently specific to provide a test of useful predictive value: the change may be real but too small in relation to genetic and environmental ‘noise’, or it occurs with other diseases as well. Where useful biomarkers have emerged, the discovery and development of each test was the result of efforts over a number of years. The appearance of cardiac troponin in plasma in MI was reported in 1987 (Cummins et al. 1987), the test was introduced commercially in 1995, and it emerged as the core parameter for MI diagnosis in 2000 (Alpert et al. 2000; Braunwald et al. 2000). BNP, probably the most rapidly adopted new diagnostic test in CVD, was shown to be diagnostic for congestive heart failure (CHF) in 1996 (Yamamoto et al. 1996) and introduced as a commercial test in 2002. However, most markers have been under investigation for many years: myoglobin since 1977 (Rosano et al. 1977), cardiac fatty acid-binding protein (FABP) since 1992 (Kleine et al. 1992) and cardiac myosin light chain 1 since 1994 (Uchino et al. 1994). On average, there appears to be a delay of approximately 10 years between discovery of a CVD marker and its commercial implementation in a form that can benefit clinical medicine (assuming it is specific and sensitive). Reducing this time lag while maintaining the rigor of clinical validation is a high priority.

Collection of candidate CVD markers

Table 1 presents a set of proteins that are confirmed or potential plasma markers of some aspect of cardiovascular disease (in the heart, vessels or brain). To my knowledge, no comparable list of proteins associated with a specific disease area has been assembled and published. Results from several sources were pooled to generate this list. A large set (> 2000) of papers was selected through keyword searches on cardiovascular disease and stroke; these were classified and clustered using the RefViz program, and titles and abstracts were scanned for protein names. A table of these proteins was constructed in an Excel spreadsheet, to which were added further ‘pathway’-derived potential markers drawn from a literature survey of the protein components of the coagulation and thrombolysis pathways, as well as acute phase reactants and known inflammatory markers. The resulting list comprised 177 protein targets, some of which were composed of multiple subunits, and some of which were different fragments of a single protein. Where possible, the normal plasma concentration was extracted from the literature references, or, in the case of existing clinical markers, from the normal range values used in test interpretation. These values are of critical importance in developing strategies for measurement: the 50 most abundant candidates are likely to be measurable by MS/MS (as in Fig. 3) without additional enrichment steps, while the others may require more elaborate sample preparation or fractionation prior to quantification.
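The abundance-based triage described above can be sketched in a few lines of Python. This is an illustration only: the seven entries are copied from Table 1, the units are assumed to be pg/mL (consistent with the Factor XIIa entry, where 2.0E+03 sits alongside a 2 ng ml−1 threshold), and the function name `split_by_abundance` and the cut-off of three are hypothetical choices made to keep the example small (the text itself suggests a cut-off of 50 for the full list).

```python
# (protein name, normal plasma concentration) copied from Table 1.
# Units are assumed to be pg/mL.
CANDIDATES = [
    ("troponin I, cardiac", 1.0e3),
    ("fibrinogen", 2.5e9),
    ("interleukin-1 beta (IL-1 beta)", 1.2e0),
    ("C-reactive protein (CRP)", 2.3e6),
    ("complement C3", 1.3e9),
    ("natriuretic peptide, brain (BNP)", 1.9e2),
    ("haptoglobin", 6.2e8),
]


def split_by_abundance(candidates, n_direct):
    """Rank candidates by concentration (highest first) and split them into
    the n_direct most abundant and the remainder."""
    ranked = sorted(candidates, key=lambda entry: entry[1], reverse=True)
    return ranked[:n_direct], ranked[n_direct:]


# Hypothetical cut-off of 3 for this small subset.
direct, needs_enrichment = split_by_abundance(CANDIDATES, n_direct=3)

print("Likely measurable without enrichment:")
for name, conc in direct:
    print(f"  {name}: {conc:.1e}")
print("May need enrichment or fractionation:")
for name, conc in needs_enrichment:
    print(f"  {name}: {conc:.1e}")
```

Ranking on the tabulated normal concentration is a simplification; in practice the cut-off would depend on the instrument, the sample preparation and the peptide chosen for each protein.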

While almost all of these candidates have been evaluated in some form of CVD or stroke, none has been surveyed across all forms of these diseases, and very few have been investigated jointly in the same sample sets. Thus these candidates include many proteins that have disease relationships that are significant (though not definitive enough to provide a specific single protein test): precisely the kinds of candidates from which multiplex panels of great specificity might be drawn.

Table 2 presents 28 additional known or candidate biomarkers of CVD that are not individual proteins. These include specific protein complexes, protein modifications, antibodies against specific proteins and smaller molecules (typically metabolites). While these markers are not directly accessible to the MS-based approach outlined here, they can be measured by immunoassay or by alternative MS-based methodologies.

Table 2. Other candidate CVD markers
Twenty-eight candidate markers of other types relevant to cardiovascular disease and stroke. These occur in four categories: protein complexes (where the amount of protein in heteromultimer complexes provides separate information from the concentrations of individual components); protein modifications (where the amount of specifically modified protein is relevant); antibodies (where the corresponding antigen is specified); and smaller molecules (which are not proteins, but rather metabolites). The first three categories are ultimately accessible to modified proteomics approaches. A citation is provided for each, illustrative of the connection to cardiovascular disease or stroke.

Protein complexes
fibrinogen D-dimer | (Ince et al. 1999)
plasmin-alpha(2)-antiplasmin complex (PAP) | (Sakkinen et al. 1999)
thrombin-antithrombin III complex (TAT) | (Brodin et al. 2004)
tissue factor pathway inhibitor-factor Xa (TFPI-Xa) complex | (Ohkura et al. 1999)
tissue plasminogen activator (tPA)-plasminogen activator inhibitor-1 (PAI-1) complex (tPA/PAI-1 complex) | (Johansson et al. 2000)

Protein modifications
haemoglobin, glycated (HbA1c) | (Schillinger et al. 2003)
lipoprotein(a), glycated | (Zhang et al. 2000b)

Antibodies to:
angiotensin II receptor (AT1) | (Fu et al. 2000)
beta 2-glycoprotein I (beta2-GPI) | (Ebeling et al. 2003)
cardiac actin | (Dangas et al. 2000)
cardiac myosin | (Ebeling et al. 2003)
cardiolipin (aCL) | (Dangas et al. 2000)
chlamydial LPS | (Lowe, 2001)
heat shock protein 65 | (Birnie et al. 1998)
oxidized LDL | (Ogawa et al. 2001)
phospholipid [lupus anticoagulant (LA)] | (Guerin et al. 1998)
prothrombin | (Guerin et al. 1998)

Smaller molecules
asymmetric dimethylarginine (ADMA) | (Tarnow et al. 2004)
dehydroepiandrosterone sulphate (DHEAS) | (Jansson et al. 1998)
folate | (Riddell et al. 2000)
homocysteine (HCY) | (Abbate et al. 2003)
kallidin (a tissue kinin) | (Wagner et al. 2002)
malonyldialdehyde (MDA) | (Belboul et al. 2001)
marinobufagenin (MBG) | (Fridman et al. 2002)
melatonin | (Grote, 2004)
N-acetyl-aspartate | (Stevens et al. 1999)
oxidized phosphatidylcholine (OxPC, formed in OxLDL) | (Itabe, 2002)
uric acid | (Leyva et al. 1998)

Discussion

This paper makes an argument for a candidate-based approach to protein biomarker development, supplementing the methods of classical proteomics that seek a complete analysis of a target proteome. Specific features of the plasma proteome, including its complexity and dynamic range, make it resistant to complete analysis in the near future. A targeted proteomics approach, aimed at selected candidates, can provide greater sensitivity and thus greater coverage of markers across the 10 orders of magnitude spanning known markers.
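As a rough check on the dynamic range quoted above, the span can be computed as the base-10 logarithm of the ratio between the highest and lowest normal concentrations. The sketch below uses only two entries taken from Table 1 (immunoglobulin G and interleukin-1 beta), with units assumed to be pg/mL; the helper name `orders_of_magnitude` is hypothetical, and other entries in the full table may widen the span slightly.

```python
import math


def orders_of_magnitude(concentrations):
    """Return log10(max/min) for a collection of positive concentrations."""
    values = list(concentrations)
    return math.log10(max(values) / min(values))


# Two entries from Table 1; units assumed to be pg/mL.
table1_extremes = {
    "immunoglobulin G": 9.8e9,
    "interleukin-1 beta (IL-1 beta)": 1.2e0,
}

span = orders_of_magnitude(table1_extremes.values())
print(f"{span:.1f} orders of magnitude")
```

With these two values the span comes out just under 10, in line with the figure given in the text.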

The fact that a non-exhaustive search for candidates related to CVD and stroke produced 177 different proteins (and protein forms) is revealing. A great deal of exploratory work has already been done, providing a targeted approach with an excellent starting point. The fact that most of these proteins have not yet become stand-alone clinical markers does not prevent them from providing incremental statistical improvement to multiprotein panels yielding improved specificity.
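The statistical intuition behind multiprotein panels can be illustrated with a deliberately idealized calculation. Assume (these are assumptions of the sketch, not claims from the studies cited) that each of k markers is normally distributed with equal variance in diseased and healthy subjects, that the disease shifts each marker by the same number of standard deviations d, and that the markers are statistically independent. The unweighted mean of the k markers then has an effect size of d·√k, and the area under the ROC curve for an effect size d is Φ(d/√2). The function names below are hypothetical.

```python
import math


def panel_effect_size(d_single, k):
    """Effect size (in SD units) of the unweighted mean of k independent,
    equal-variance markers that are each shifted by d_single SDs."""
    return d_single * math.sqrt(k)


def auc_from_effect_size(d):
    """Area under the ROC curve for two equal-variance normal distributions
    whose means differ by d SDs: Phi(d / sqrt(2)) = 0.5 * (1 + erf(d / 2))."""
    return 0.5 * (1.0 + math.erf(d / 2.0))


# A weak single marker (0.5 SD shift) versus panels of 4 and 16 such markers.
for k in (1, 4, 16):
    d_panel = panel_effect_size(0.5, k)
    print(f"k={k:2d}  effect size={d_panel:.2f}  AUC={auc_from_effect_size(d_panel):.3f}")
```

Real plasma proteins are correlated (acute phase reactants in particular tend to move together), so the gain from a real panel will be smaller than this independent-marker bound suggests; the sketch only shows why individually weak candidates can still be worth measuring jointly.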

Two other factors also motivate a targeted approach. In the limiting case, the number of human genes is relatively small (∼25 000), and it might be reasonable to design specific MS-based assays (and ultimately antibodies for immunoassays) for all of these. Quantifying a major form of each human protein as a candidate disease marker is an attractive goal, though obviously far less comprehensive than the complete analysis goal (all forms of all proteins) implicit in the aims commonly expressed in proteomics.

A second and more practical factor favouring targeted assays is quantification itself. Most of the methods currently employed in proteomics can detect many proteins, but generally with poor quantitative accuracy. In particular when aiming for greatest sensitivity, proteome surveys of plasma detect quite variable subsets of proteins, even in repeat runs on the same sample. This makes it very difficult to assemble a coherent analytical dataset, since proteins are typically detected in one run but not the next: the dataset is filled with holes. This is acceptable when one is looking for hints as to the involvement of individual proteins in specific processes, but it is a major disadvantage when trying to develop a statistical case associating a protein with a disease in the human population. In this case accurate determinations of a protein in each sample are needed, as one obtains from specific assays.
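The 'holes' problem can be made concrete with a small sketch. The detection table below is entirely hypothetical (invented for illustration, not taken from any survey); it records whether each protein was detected in each of four repeat runs, and the two helper functions, also hypothetical, report how complete the data are.

```python
# Hypothetical detection results: protein -> detected (True/False) in each of
# four repeat survey runs on the same sample. Values are invented.
runs = {
    "fibrinogen": [True, True, True, True],
    "CRP": [True, True, False, True],
    "troponin I": [False, True, False, False],
    "BNP": [False, False, True, False],
}


def completeness(detections):
    """Fraction of runs in which a protein was detected."""
    return sum(detections) / len(detections)


def complete_cases(table):
    """Proteins observed in every run, i.e. usable without imputation."""
    return [name for name, detections in table.items() if all(detections)]


for name, detections in runs.items():
    print(f"{name}: detected in {completeness(detections):.0%} of runs")
print("Complete in all runs:", complete_cases(runs))
```

In a survey dataset of this kind only the proteins with complete detection can be compared across all samples without imputing missing values, whereas a specific assay is designed to return a value (or a defined limit of detection) for every sample.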

By fusing the approaches taken by proteomics, analytical chemistry and clinical chemistry, hybrid methods should emerge capable of rapidly expanding the range of biomarkers for the study of disease, ageing and physiology.

Appendix

Acknowledgements

I wish to thank my collaborator Dr Christie Hunter of Applied Biosystems (Foster City, CA, USA) for generating Fig. 3.
