Liquid chromatography–tandem mass spectrometry applications in endocrinology


  • Mark M. Kushnir,

    Corresponding author
    1. ARUP Institute for Clinical and Experimental Pathology, Salt Lake City, UT 84108
    2. Division of Analytical Chemistry, Department of Physical and Analytical Chemistry, Uppsala University, Uppsala, Sweden
    • ARUP Institute for Clinical and Experimental Pathology, 500 Chipeta Way, Salt Lake City, UT 84108.
  • Alan L. Rockwood,

    1. ARUP Institute for Clinical and Experimental Pathology, Salt Lake City, UT 84108
    2. Department of Pathology, University of Utah, Salt Lake City, UT,
  • Jonas Bergquist

    1. Division of Analytical Chemistry, Department of Physical and Analytical Chemistry, Uppsala University, Uppsala, Sweden


Liquid chromatography–tandem mass spectrometry (LC–MS/MS) has been recognized as a primary methodology for the accurate analysis of endogenous steroid hormones in biological samples. This review focuses on the use of LC–MS/MS in clinical laboratories to assist with the diagnosis of diverse groups of endocrine and metabolic diseases. Described analytical methods use on-line and off-line sample preparation and analytical derivatization to enhance analytical sensitivity, specificity, and clinical utility. Advantages of LC–MS/MS as an analytical technique include high specificity, possibility to simultaneously measure multiple analytes, and the ability to assess the specificity of the analysis in every sample. All described analytical methods were extensively validated, utilized in routine diagnostic practice, and were applied in a number of clinical and epidemiological studies, including a study of the steroidogenesis in ovarian follicles. © 2009 Wiley Periodicals, Inc. Mass Spec Rev 29:480-502, 2010


Clinical chemistry became a field of science at the end of the 18th century (Berzelius, 1812; Rosenfeld, 2002). Many fundamental discoveries and developments made during the last two centuries enabled the use of analytical chemistry to detect diseases, and as a result, clinical laboratories have become an essential part of the practice of medicine. It has been estimated that medical laboratory testing plays a significant role in 60–70% of all decisions related to establishing patients' diagnoses and to the selection and monitoring of treatments (, 2008, accessed February 5, 2008). In part, this is because many diseases have a similar clinical presentation; testing often helps to differentiate between them and leads to more efficient treatments and better outcomes.


The effectiveness of diagnostic testing depends on the appropriateness of the disease markers used. Successful markers are features that can be objectively measured and evaluated as indicators of biological and pathological processes (Biomarkers Definitions Working Group, 2001; Sahab, Semaan, & Sang, 2007). Biomarkers can be anatomic, physiologic, or biochemical in nature and must be associated with a disease. To be medically useful, a biomarker must be detectable and measurable with objective techniques such as physical examination, imaging, or an analytical measurement. Biochemical markers are endogenous compounds that are either not present in a normal physiological state (e.g., certain tumor markers) or present within a certain range of concentrations (e.g., intermediates and products of metabolic pathways). Biomarkers are important because accurate diagnoses and treatment monitoring are the foundation for successful outcomes. Biomarkers might serve early diagnostic needs, act as indicators of the severity of a disease, the response to a treatment, or the recurrence of the disease, or help to determine a patient's prognosis (Fig. 1).

Figure 1.

Example of diagnostic and prognostic use of methylmalonic acid, a biomarker of vitamin B12 deficiency. *Diagnostic marker and **prognostic marker.

The biomarkers utilized in contemporary human diagnostics range from DNA and RNA to proteins, lipids, polysaccharides, and small molecules. One of the main tasks of genetics is to predict differences in biological systems based on genetic variations. A single gene is transcribed into messenger RNA, which can encode multiple proteins. The proteome is more complex than the genome, because protein expression (with over 200 known post-translational modifications) might differ between different cells, at different times, and under different physiological conditions. One example is the biomarker alkaline phosphatase. There are at least four different alkaline phosphatase genes in humans: tissue-non-specific, intestinal, placental, and germ cell. The tissue-non-specific alkaline phosphatase occurs in at least three forms (liver, bone, kidney), which differ in carbohydrate content and are produced by post-translational processing of the same gene product. The relative distribution of the bone and liver alkaline phosphatases is age-dependent, and concentrations of the different forms of alkaline phosphatase correlate with a number of normal and pathologic conditions, including pregnancy, altered liver function, bone disorders, and certain tumors (Roberts et al., 2005).

Knowledge of gene and protein expression is very useful, but alone it cannot detect and explain all phenotypic variations and changes. As a result of protein expression, many biochemical interactions and changes occur in living organisms and produce physiological functions that lead to the formation of active and inactive metabolites. These metabolites, along with proteins, often serve as markers of diseases and physiological conditions. Genetic and protein analyses provide information about systemic regulation on higher levels, whereas metabolites and intermediates of biochemical pathways represent a physiological response to the regulation.

The Role of Mass Spectrometry in Clinical Diagnostic Laboratories

Modern clinical laboratories use diverse techniques and instrumentation that vary in reliability and specificity. Since its introduction in clinical laboratories, tandem mass spectrometry has proved to be one of the most specific analytical techniques available for clinical diagnostics.

Clinical applications of tandem mass spectrometry could be sub-divided into two groups: screening and target analysis. Typically, screening methods are intended to detect multiple markers of diseases, drugs, or toxins (e.g., newborn screening for metabolic diseases, toxicology screening). In these applications, the goal is to achieve high throughput of testing and a low false-negative rate. In target analysis, the main focus is on accurate and precise quantitation and the assurance of the analyte identity.

In all analytical applications measurement accuracy is important. Errors encountered in the clinical diagnostics, however, are especially costly compared to other fields, because they might lead to a misdiagnosis, mistreatment, patient injury, and even to the loss of life (Plebani and Carraro, 1997; Witte et al., 1997; Plebani, 2006). Therefore, highly specific methods are required for clinical diagnostic testing, and strict guidelines must be followed with respect to the quality control (QC) and management of pre- and post-analytic variables.

The challenges of clinical diagnostic testing are related to the complexity of the biological samples, the large diversity of classes of molecules present in the samples, variability of the sample matrices among individuals, and the wide range of concentrations of the constituents in the samples. Figure 2 (based on Roberts et al., 2005) shows some of the clinically useful diagnostic biomarkers along with their biologically relevant concentrations, which span over 10 orders of magnitude. Such a diversity of concentrations suggests a need for highly sensitive and specific instruments to enable an accurate measurement of minor sample constituents in the presence of excessive amounts of other endogenous substances.

Figure 2.

Distribution of concentrations (medians and ranges) of endogenous small molecules in serum of adults (based on data from Roberts et al., 2005).

The main emphasis of this review is on mass spectrometry-based diagnostic methods, which were developed in our laboratories for the measurement of biochemical markers of endocrine and metabolic diseases.



The majority of the liquid chromatography–mass spectrometry (LC–MS) methods for quantitative target analysis use triple quadrupole mass spectrometry (MS/MS) (Paul & Steinwedel, 1953). Data collected during the LC–MS/MS analysis contain three dimensions: retention time, m/z of the precursor ion, and m/z of the product ions. The masses of the precursor and the product ions represent fundamental properties of the molecules: their molecular weight and the structure. Reliance on these fundamental properties is the basis for the high degree of analytical specificity of LC–MS/MS.

Quantitative analysis with tandem mass spectrometry detection is typically performed in multiple-reaction monitoring (MRM) mode, with both mass analyzers fixed on transmission of the compound-specific precursor and product ions (Busch, Glish, & McLuckey, 1988). The MRM mode of operation is highly selective and sensitive because the mass analyzers transmit only ions characteristic of the target analyte and remove most of the chemical noise. The combination of LC with MS/MS in MRM mode is one of the most specific and sensitive analytical techniques available in clinical diagnostic laboratories.

Ionization techniques commonly used for the analysis of steroids with LC–MS/MS include atmospheric pressure chemical ionization (APCI), atmospheric pressure photoionization (APPI), and electrospray ionization (ESI). In the past, APCI was the preferred type of ionization in methods for the analysis of steroids, because it ionizes non-polar and poorly ionizable molecules better than ESI. APPI is a newer technique that offers a gain in sensitivity for non-polar and poorly ionizable molecules through more selective ionization (Robb, Covey, & Bruins, 2000; Hanold et al., 2004). In an APPI ion source, UV light initiates the ionization process, as opposed to a corona discharge in APCI. The vaporized effluent of the high-performance liquid chromatography (HPLC) column is irradiated with UV light inside the ion source, which initiates a cascade of gas-phase reactions leading to ionization of the targeted analyte. Comparison of the relative sensitivity of the APPI ion source with APCI for the measurement of steroids in biological matrices showed a three- to fivefold enhancement in the sensitivity (Alary, 2001). One of the advantages of APCI and APPI over ESI is that they are less prone to ion suppression and matrix effects. On some occasions none of the above ion sources provides the sensitivity required for the targeted application; in such cases chemical derivatization could be an alternative approach to enhance the method's performance.

Sample Preparation

One of the challenges in the analysis of biological samples is related to the complexity of the sample matrices. Sample preparation reduces the complexity of the samples and is especially important for the methods intended for routine use to improve the ruggedness and the specificity of the methods. Tasks commonly performed during sample preparation include removal of the harmful matrix components, pre-concentration of the analyte, and changing sample solvent. Sample preparation usually becomes more critical when low detection limits are required, or when potentially interfering substances are present in the samples.

The most common technique to separate analytes from sample matrix is liquid–liquid extraction. Principles of the liquid–liquid extraction are based on the solubility, acid–base equilibrium, and a partitioning of the analyte between the aqueous and organic phases (Wells, 2003). A variety of organic solvents, immiscible with biological fluids, are used for liquid–liquid extraction (e.g., hexane, heptane, ethyl acetate, ethers). The choice of a solvent might have an effect on the specificity and efficiency of the extraction. The choice of the solution pH during the extraction depends on the pKa of the analytes and should be made with regard to shifting the equilibrium in solution to the non-ionized form of the analyte, because it is predominantly the non-ionized form that will be partitioned into the less polar organic phase.
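The dependence of the extraction on pH and pKa described above can be illustrated with the Henderson–Hasselbalch relationship. The following is a minimal sketch; the function name and the example pKa values are hypothetical and are not taken from any of the reviewed methods.

```python
def fraction_non_ionized(ph: float, pka: float, kind: str) -> float:
    """Fraction of an analyte present in its non-ionized form.

    Uses the Henderson-Hasselbalch relationship for a single ionizable group.
    kind = "acid" for HA <-> H+ + A-  (non-ionized form is HA)
    kind = "base" for BH+ <-> H+ + B  (non-ionized form is B)
    """
    if kind == "acid":
        # [A-]/[HA] = 10**(pH - pKa)
        return 1.0 / (1.0 + 10 ** (ph - pka))
    if kind == "base":
        # [BH+]/[B] = 10**(pKa - pH)
        return 1.0 / (1.0 + 10 ** (pka - ph))
    raise ValueError("kind must be 'acid' or 'base'")


if __name__ == "__main__":
    # Hypothetical weak acid with pKa 4.0: compare extraction at pH 2 and pH 7.
    for ph in (2.0, 7.0):
        print(ph, round(fraction_non_ionized(ph, 4.0, "acid"), 4))
```

For an acidic analyte, adjusting the pH about two units below the pKa leaves roughly 99% of the analyte in the non-ionized form that partitions into the organic phase; for a basic analyte the pH would be raised above the pKa instead.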

Solid-phase extraction (SPE) is another commonly used method to separate analytes from the sample matrix. In SPE, an analyte is adsorbed by a stationary phase, whereas impurities are either unretained or weakly retained by the adsorbent and need to be removed as much as possible with weak solvents. After the wash, the analyte is eluted from the adsorbent with a strong solvent. SPE is based on the principle of frontal chromatography (Rhoads & Friedberg, 1993), where partitioning of sample components between the liquid and the stationary phases takes place while the sample is continuously applied onto an SPE column packed with an adsorbent. In contrast to traditional chromatography, where a discrete sample is injected into a column, in frontal chromatography the sample is continuously applied. Frontal separations typically have a relatively low efficiency; however, well-designed SPE methods might remove many of the harmful matrix constituents and make samples more amenable to LC–MS/MS analysis.

Traditionally, SPE is performed off-line, and it is time-consuming and labor-intensive. The same principle as in traditional SPE is used in the on-line separation methods (Barcelo and Hennion, 1997; Lacorte et al., 1998; Morr et al., 2006; Kushnir et al., 2008). The on-line separation usually leads to a reduced complexity of the sample preparation, reduced labor, and improvement in the detection limits.

Analytical derivatization is another technique that might be used during sample preparation for LC–MS analysis. Derivatization is often used to make analytes more amenable to the analysis (Kushnir et al., 1999; Gao, Zhang, & Karnes, 2005). Chemical properties of the analytes are the most important factor affecting sensitivity and specificity in LC–MS/MS analysis. The type of ionization, presence of ionized or ionizable groups, polarity of the molecules, surface activity, composition of the mobile phase, and fragmentation pattern all have an effect on the sensitivity of LC–MS/MS methods. Generally, ESI provides greater sensitivity for molecules that can form ionic species in solution, whereas APCI and APPI work better for molecules with low-to-medium polarity that contain high proton-affinity atoms. Derivatization changes the physical and chemical properties of the molecules, resulting in changes in the efficiency of the ionization, fragmentation, chromatographic retention, potential for ion suppression, and matrix effects. Most of the derivatization techniques used for ESI incorporate permanently charged or easily ionizable groups into the structure of the molecules, whereas the derivatization techniques for APCI aim to introduce groups with high proton or electron affinity (Higashi, Awada, & Shimada, 2001).

Derivatization in methods for the analysis of steroids could be beneficial because of the challenges involved in measuring endogenous concentrations of many of the steroids. Functional groups commonly targeted for the derivatization of endogenous steroids are keto and hydroxyl groups (Anari et al., 2002; Xu et al., 2004; Kushnir et al., 2006a,b, 2008), and the gain in sensitivity in methods that use chemical derivatization can be up to a few orders of magnitude.

Introduction of an ionizable moiety in the structure of steroids can be achieved by reaction with derivatizing reagents containing nitrogen atoms, especially an amine group. Reaction with hydroxylamine and its derivatives, and with Diels–Alder reagents, was shown to improve the sensitivity for keto-steroids (Kushnir et al., 2006a,b); dansyl derivatives significantly improve the sensitivity for steroids that contain hydroxyl or phenol groups (Anari et al., 2002; Nelson et al., 2004; Kushnir et al., 2008). Derivatization with mono(dimethylaminoethyl)succinyl imidazolide was shown to significantly improve the sensitivity for cholesterol and dehydrocholesterol (Johnson, ten Brink, & Jakobs, 2001).

Introduction of high proton-affinity atoms, such as oxygen, has been shown to improve the sensitivity of detection with APCI; the improvement is likely attributable to more efficient proton transfer in gas-phase reactions in the ion source (Novak & Yuan, 2000). High sensitivity can also be achieved with APCI for derivatives made with Cookson-type reagents, 4-[4-(6-methoxy-2-benzoxazolyl) phenyl]-1,2,4-triazoline-3,5-dione or 4-[2-(6,7-dimethoxy-4-methyl-3-oxo-3,4-dihydroquinoxalyl)ethyl]-1,2,4-triazoline-3,5-dione (Higashi, Awada, & Shimada, 2001). This reaction increased the sensitivity of detection of the vitamin D metabolites to the femtomole range.

Introduction of electron-capturing groups in the molecules can also enhance the sensitivity. Reaction of alcohols with pentafluorobenzyl bromide made it possible to achieve attomole levels of sensitivity for steroids and prostaglandins (Singh et al., 2000).

Analytical derivatization in LC–MS/MS applications is likely to gain wider use in the future, as the demand for high-sensitivity methods increases. Derivatization can be warranted when the required sensitivity of detection is unachievable with conventional ion sources or when a very limited sample is available for analysis. In addition to lower detection limits, greater sensitivity reduces the volume of the sample required for the analysis, and consequently the volume of blood drawn from the patients. Reduction in the volume of blood required for testing is important for all groups of the population, but is especially beneficial in pediatric testing and in testing samples from critically ill patients.

Some issues that need to be considered in conjunction with the use of derivatization are the increased complexity of the sample preparation, the necessity of using high-purity reagents, non-specific fragmentation of the derivatives, and in some cases the need for additional sample clean-up after the derivatization. When derivatives lack analyte-specific distinct product ions because of the loss of the moiety added by the derivatization (as is the case for the dansyl derivatives), it is necessary to rely on more extensive chromatographic separation to achieve the required specificity of the analysis. Sample clean-up after the derivatization can be adequately addressed through the use of on-line SPE or multidimensional chromatographic separation.

Derivatization was shown to be useful to increase the molecular weight of a precursor ion and to improve ionization efficiency (Chace et al., 1993; Johnson, 1999, 2001; Gao, Zhang, & Karnes, 2005; Kushnir et al., 2006a,b), to eliminate the interferences (Kushnir et al., 2001) and to change the polarity of the detection (Johnson 1999; Kushnir et al., 2001), to improve the fragmentation patterns and to reduce the matrix effects (Johnson, 1999; Kushnir et al., 2001). Derivatization proved to be useful in the high-throughput screening of acylcarnitines (Millington et al., 1989), amino acids (Chace et al., 1993), and organic acids (Johnson, 1999); high-throughput quantitative analysis of methylmalonic acid (MMA) (Kushnir et al., 2001; Kushnir, Rockwood, & Nelson, 2004); to reduce the sample volume required for the analysis of endogenous steroids (Kushnir et al., 2006a,b, 2008); and in cases when very limited samples are available for analysis (Kushnir et al., 2009).


Evaluation of the Methods' Performance

Due to the nature of clinical diagnostic testing, analytical methods must demonstrate acceptable performance before they can be used in clinical practice. The goals for the method's performance characteristics must be established prior to the development of a new method. Method development usually includes an optimization of all individual steps involved in the sample preparation, chromatographic separation, ionization, and mass spectrometric detection, followed by an optimization of the entire process of the analysis along with selection of the laboratory equipment, reagents, standards, and reference materials. Validation experiments for mass spectrometry-based diagnostic tests commonly include an evaluation of the precision, sensitivity, linearity, accuracy, and analytical specificity (Dharan, 1977; Shultz, Aliferis, & Aronsky, 2005). A flowchart of the sequence of the experiments is shown in Figure 3.

Figure 3.

Sequence of the experiments for validation of mass spectrometry-based methods intended for use in clinical diagnostics.

To avoid systematic errors during the validation, it is preferable to use calibration and control materials from a commercial source (ideally traceable to a certified reference material), and to prepare standards and controls in amounts sufficient for the length of the entire validation. As shown in Figure 2, physiological concentrations and the degree of the biological variation vary extensively between the markers. Thus, targeted values for sensitivity, accuracy, precision, and the analytical measurement range (AMR) should be based on the clinical needs of each individual marker (Ricos et al., 1999;, 2008). Greater precision and accuracy are usually required for the methods intended to measure the markers with a narrow distribution of physiological concentrations (e.g., free T4). Commonly used requirements for LC–MS/MS methods intended for use in clinical diagnostics are listed in Table 1.

Table 1. Commonly used criteria for the performance characteristics of LC–MS/MS methods intended for use in clinical diagnostics
  • LOQ, limit of quantitation; LOD, limit of detection; ULOL, upper limit of linearity; AMR, analytical measurement range.

    Precision is one of the main performance characteristics of an analytical method and represents the reproducibility of a repeat analysis. Precision is evaluated with replicate analysis, typically within-run and between-run; the total precision of the measurements is then calculated from these values. Precision is usually evaluated with at least three samples prepared at concentrations around, below, and above the clinical decision level (the concentration of an analyte at which the diagnosis could be established or management of patients needs to be changed). The samples should be prepared in a matrix similar to the targeted clinical samples, and analyzed in replicates of three to five, on at least five separate occasions. Figure 4 shows an example of the results of the evaluation of precision of an assay (Kushnir et al., 2008).

    Figure 4.

    Results of the evaluation of the precision of LC–MS/MS assay for estrone (based on data from Kushnir et al., 2008).
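One way to combine within-run and between-run imprecision into a total value is a one-way variance-component calculation. The sketch below assumes the same number of replicates in every run; the function name and the variance-component approach are illustrative assumptions, not the procedure used to generate Figure 4.

```python
from math import sqrt
from statistics import mean, variance


def precision_components(runs):
    """Within-run, between-run, and total CV (%) from replicate measurements.

    runs: list of runs, each a list of replicate results for one sample
    (equal number of replicates per run is assumed).
    """
    n = len(runs[0])
    if n < 2 or len(runs) < 2 or any(len(r) != n for r in runs):
        raise ValueError("need >= 2 runs with the same number (>= 2) of replicates")

    run_means = [mean(r) for r in runs]
    grand_mean = mean(run_means)

    # Pooled within-run variance (equal replicates, so a simple average).
    within_var = mean(variance(r) for r in runs)
    # Variance of run means contains within_var / n; remove it, floor at zero.
    between_var = max(0.0, variance(run_means) - within_var / n)
    total_var = within_var + between_var

    def to_cv(v):
        return 100.0 * sqrt(v) / grand_mean

    return {
        "within_cv": to_cv(within_var),
        "between_cv": to_cv(between_var),
        "total_cv": to_cv(total_var),
    }


if __name__ == "__main__":
    # Hypothetical data: 5 runs x 3 replicates at one concentration level.
    data = [[10.1, 9.8, 10.3], [10.6, 10.4, 10.9], [9.7, 9.9, 9.6],
            [10.2, 10.0, 10.4], [10.5, 10.1, 10.3]]
    print(precision_components(data))
```

The calculation would be repeated for each of the concentration levels (below, around, and above the clinical decision level).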

    The lowest concentration that can be reliably detected by a method is the limit of detection (LOD). At the LOD an analyte must be qualitatively identified, and a typical criterion is that the signal-to-noise ratio (s/n) should be >5. The limit of quantitation (LOQ) is the lowest concentration at which an analyte can be qualitatively identified and the quantitation is accurate. The upper limit of linearity (ULOL) is the highest concentration at which quantitation is accurate. The AMR represents the concentrations between the LOQ and the ULOL of the method; quantitation should not be performed outside of the AMR.
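As an illustration of how an LOQ might be read off a dilution series, the sketch below walks down from the highest concentration and stops at the first level that fails the acceptance criteria. The thresholds used here (CV and bias within 20%) are hypothetical placeholders, since the criteria of Table 1 are not reproduced in the text; the function name is likewise an assumption.

```python
from statistics import mean, stdev


def estimate_loq(series, max_cv_pct=20.0, max_bias_pct=20.0):
    """Lowest nominal concentration that still meets the acceptance criteria.

    series: dict mapping nominal concentration -> list of replicate results.
    Thresholds are hypothetical; set them from the clinical requirements.
    Returns None if even the highest level fails.
    """
    loq = None
    # Walk from the highest level downward; stop at the first failing level.
    for nominal in sorted(series, reverse=True):
        values = series[nominal]
        avg = mean(values)
        cv_pct = 100.0 * stdev(values) / avg
        bias_pct = 100.0 * (avg - nominal) / nominal
        if cv_pct > max_cv_pct or abs(bias_pct) > max_bias_pct:
            break
        loq = nominal
    return loq


if __name__ == "__main__":
    # Hypothetical serial dilution, three replicates per level.
    dilution = {
        10.0: [10.1, 9.9, 10.0],
        5.0: [5.2, 4.9, 5.0],
        2.5: [2.6, 2.4, 2.5],
        1.0: [1.6, 0.7, 1.2],
    }
    print(estimate_loq(dilution))
```

The same loop run upward from the lowest passing level could be used to locate the ULOL, and the pair of values then delimits the AMR.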

    Samples for evaluation of the sensitivity are often prepared from a patient sample that contains a measurable concentration of the analyte. The sample can be serially diluted with an analyte-free matrix and analyzed in replicate. Problems with this approach could be encountered while analyzing endogenous compounds, because it might be difficult or impossible to obtain an analyte-free physiological matrix. If an analyte-free matrix is not available, one might resort to selectively removing the analyte from the matrix (e.g., by dialysis or charcoal stripping) or to using an artificial matrix (e.g., bovine serum albumin in phosphate-buffered saline). One of the pitfalls associated with the removal of the analyte by charcoal stripping is the residual amount of fine particles remaining in the purified matrix. These particles may adsorb analyte added to the calibration standards, which would lead to a positive bias in the assay.

    Samples for the evaluation of the linearity are often prepared from two patient samples, one with a low concentration of the target analyte and another with a highly elevated concentration. The samples need to be mixed in different proportions to obtain five to seven samples with concentrations that overlap the range of interest, and analyzed in replicates.
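The expected concentration of each mixture follows from the mixing proportions, and the measured values can then be expressed as percent recovery. The sketch below is a minimal illustration; the function names and the example concentrations are hypothetical.

```python
def expected_mixture_concentrations(low, high, high_fractions):
    """Expected concentration when a fraction f of the high sample is mixed
    with (1 - f) of the low sample (volume fractions)."""
    return [f * high + (1.0 - f) * low for f in high_fractions]


def percent_recovery(measured, expected):
    """Measured concentration as a percentage of the expected value."""
    return [100.0 * m / e for m, e in zip(measured, expected)]


if __name__ == "__main__":
    # Hypothetical low (10) and high (110) patient pools, five mixtures.
    fractions = [0.0, 0.25, 0.5, 0.75, 1.0]
    expected = expected_mixture_concentrations(10.0, 110.0, fractions)
    measured = [10.0, 34.1, 61.5, 86.0, 110.0]  # hypothetical results
    for e, r in zip(expected, percent_recovery(measured, expected)):
        print(f"expected {e:6.1f}  recovery {r:5.1f}%")
```

Recoveries that drift systematically away from 100% toward one end of the range suggest that the response is not linear there.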

    Before a mass spectrometry-based assay is implemented in diagnostic use, it should be compared to other available assays by analyzing patient samples. Concentrations of the analyte in the samples used for the method comparison should be evenly distributed over the measurement range and should overlap the physiologically relevant range and the clinical decision levels. The use of patient samples is preferred (compared to analyte-supplemented matrix) because it makes it possible to identify potential problems of the assay, especially those related to endogenous and exogenous metabolites and the matrix effects. When the variances of the methods under comparison are well defined, the agreement between the methods should be assessed with a Deming regression (Cornbleet & Gochman, 1979). When the variances of the methods are not known, the evaluation should be performed with a Passing–Bablok regression (Passing & Bablok, 1983). The Passing–Bablok regression does not require knowledge of the type of the distribution of the results and the measurement errors. The agreement between the methods is considered acceptable when the slope of the regression line is not statistically different from 1.0, the intercept is not statistically different from 0, and the correlation coefficient is greater than 0.95. Agreement between the methods can also be evaluated visually with the Bland–Altman approach (Altman & Bland, 1983). The Bland–Altman plot represents the agreement between the methods better than linear regression, because the correlation coefficient in linear regression depends not only on the degree of agreement between the measured values but also on the range of the concentrations in the samples included in the comparison. The Bland–Altman plot is free of this flaw; it is constructed by plotting the mean of the two measurements as the abscissa and the difference between the two values as the ordinate.
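The method-comparison calculations can be sketched as follows. The Deming slope uses the standard closed-form expression with a user-supplied ratio of the error variances (y-method to x-method); the Bland–Altman summary reports the mean difference and the conventional limits of agreement at 1.96 standard deviations. Function names and the default variance ratio of 1 are assumptions for illustration, and confidence intervals for the slope and intercept are not computed here.

```python
from math import sqrt
from statistics import mean, stdev


def deming_regression(x, y, variance_ratio=1.0):
    """Deming regression slope and intercept.

    variance_ratio = (error variance of y-method) / (error variance of x-method).
    """
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("x and y must have the same length (>= 3)")
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x) / (n - 1)
    syy = sum((yi - my) ** 2 for yi in y) / (n - 1)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / (n - 1)
    if sxy == 0:
        raise ValueError("x and y are uncorrelated; slope is undefined")
    d = variance_ratio
    slope = (syy - d * sxx + sqrt((syy - d * sxx) ** 2 + 4 * d * sxy ** 2)) / (2 * sxy)
    intercept = my - slope * mx
    return slope, intercept


def bland_altman(x, y):
    """Plot coordinates and summary statistics for a Bland-Altman plot.

    Returns (means, differences, bias, lower_limit, upper_limit), where the
    limits of agreement are bias +/- 1.96 SD of the differences.
    """
    means = [(xi + yi) / 2.0 for xi, yi in zip(x, y)]      # abscissa
    diffs = [yi - xi for xi, yi in zip(x, y)]              # ordinate
    bias = mean(diffs)
    sd = stdev(diffs)
    return means, diffs, bias, bias - 1.96 * sd, bias + 1.96 * sd


if __name__ == "__main__":
    # Hypothetical paired results: comparative method (x) vs. LC-MS/MS (y).
    x = [12.0, 25.0, 40.0, 58.0, 75.0, 96.0]
    y = [11.5, 24.0, 41.0, 56.5, 73.0, 95.0]
    print(deming_regression(x, y))
    print(bland_altman(x, y)[2:])
```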

    Constant or proportional biases, poor precision, or interfering substances that affect one or both methods might cause disagreement between the methods. In cases when the methods do not agree, the same samples could be re-tested with the evaluated method (to reduce the imprecision of the measurements); and the means of two measurements should be compared with the comparative method. If the original comparative method is suspected of causing the disagreement, then an alternative comparative method must be selected. To further investigate the problem, discrepant samples could be re-tested with more efficient chromatographic separation to evaluate whether interfering peaks were present under the peaks of the analyte or the internal standard (IS). In some cases, agreement among methods cannot be reached; for example, it is not unusual for mass spectrometric methods to produce lower values compared to immunoassay (IA)-based methods; often as a result of the cross-reactivity in the IAs (Taieb et al., 2003).

    When working with biological samples, handling and storage conditions are important aspects of the analysis. Important factors that must be determined during the method development are (a) type of collection tubes that should be used for the samples, (b) sample types suitable for the test (e.g., serum, plasma, urine, cerebrospinal fluid, saliva), (c) acceptable anticoagulants and preservatives, and (d) stability of the analyte during storage. Samples that were improperly collected or stored might be compromised and this may lead to erroneous test results.

    Assessment of Analytical Specificity

    In the past, IAs were the most common methodology used for analyzing endogenous steroids. In many cases, IAs have adequate sensitivity for measurement of clinically relevant concentrations of steroids but are not sufficiently specific (Taieb et al., 2002; Lee et al., 2006). Well-developed and validated tandem mass spectrometry-based methods are more specific than most, if not all, other analytical techniques. One advantage of tandem mass spectrometry-based methods is the ability to analyze multiple analytes simultaneously and to assess the specificity of the analysis in every sample. The specificity could be assessed by monitoring multiple mass transitions of the analytes, or by evaluating full product ion mass spectra of the precursor ions. In Kushnir et al. (2005), various approaches and acceptability criteria for assessing the specificity of analysis in methods using tandem mass spectrometry detection were evaluated.
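Monitoring multiple mass transitions lends itself to a simple per-sample check: the ratio of the peak areas of two transitions in a patient sample is compared with the ratio observed in calibrators. The sketch below is a minimal illustration; the 20% tolerance and the function names are hypothetical and are not the acceptability criteria evaluated in Kushnir et al. (2005).

```python
def transition_ratio(quantifier_area: float, qualifier_area: float) -> float:
    """Ratio of the qualifier to the quantifier transition peak areas."""
    if quantifier_area <= 0:
        raise ValueError("quantifier area must be positive")
    return qualifier_area / quantifier_area


def passes_ratio_check(sample_ratio: float, reference_ratio: float,
                       tolerance_pct: float = 20.0) -> bool:
    """True if the sample ratio is within +/- tolerance_pct of the reference.

    tolerance_pct is a hypothetical placeholder; set it from validation data.
    """
    allowed = reference_ratio * tolerance_pct / 100.0
    return abs(sample_ratio - reference_ratio) <= allowed


if __name__ == "__main__":
    reference = transition_ratio(100_000, 50_000)   # from calibrators
    sample = transition_ratio(42_000, 33_600)       # hypothetical patient peak
    # A failed check suggests a co-eluting interference in one transition.
    print(passes_ratio_check(sample, reference))
```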

    Compounds that potentially interfere with tandem mass spectrometry detection could be isobars of the target analyte [e.g., cortisone/prednisolone, cortisol/tetrahydroprednisolone, testosterone (Te)/dehydroepiandrosterone (DHEA), morphine/norcodeine, cortisol/fenofibrate, MMA/succinic acid]; isotopic ions of the molecules with lower m/z, and adducts of the impurities, which are isobaric to the analyte or the IS.

    Common types of interference among steroids are A + 2 isotopes [e.g., cortisone and cortisol; estrone (E1) and estradiol (E2); prednisone and prednisolone; prednisolone and cortisol; cortisol and fenofibrate; Te and DHT]. One pitfall in target analysis with LC–MS/MS is associated with inappropriate selection of isotopically labeled ISs (Urry et al., 1996). If an isotopically labeled IS has fewer than three isotope-labeled atoms, natural-isotope ions of the target analyte would contribute to the intensity of the molecular ion of the IS. This type of interference reduces the linear dynamic range of an assay; the effect is more pronounced for analytes with higher molecular weight. Interference in methods using MS/MS detection may also be caused by conjugates of the analytes of interest (e.g., glucuronide, sulfate). Exposure of the conjugates to elevated temperature in the ion source could result in hydrolysis of the conjugates and formation of the free analyte, which would artificially elevate the measured concentration (when peaks of the conjugates and free analytes are not chromatographically resolved).
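The size of the natural-isotope contribution can be roughly estimated from the number of carbon atoms in the analyte. The sketch below considers carbon-13 only (natural abundance taken as about 1.07%) and ignores hydrogen and oxygen isotopes, so it is an approximation under stated assumptions rather than a full isotope-pattern calculation; the function name is hypothetical.

```python
from math import comb

C13_ABUNDANCE = 0.0107  # approximate natural abundance of carbon-13


def relative_isotope_peak(n_carbons: int, k: int, p: float = C13_ABUNDANCE) -> float:
    """Intensity of the M+k peak relative to the monoisotopic peak (M),
    counting only molecules with exactly k carbon-13 atoms (binomial model)."""
    return comb(n_carbons, k) * (p / (1.0 - p)) ** k


if __name__ == "__main__":
    # A C19 steroid such as testosterone (C19H28O2), carbon isotopes only.
    for k in (1, 2, 3):
        print(f"M+{k}: {100 * relative_isotope_peak(19, k):.2f}% of M")
```

For a C19 steroid this model puts the M+2 peak at roughly 2% of the monoisotopic peak and the M+3 peak about an order of magnitude lower, which illustrates why an IS with only two labeled atoms is affected at high analyte concentrations while an IS with three or more labeled atoms is affected much less.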

    A substance interferes with an analyte of interest if their peaks co-elute and they have the same characteristic precursor and product ions (Kushnir et al., 2001, 2003; Meikle et al., 2003). The degree of interference in MS-based methods is determined by factors such as the efficiency of ionization, the relative intensity of the parent and product ions, and the relative concentration of the analytes. The molar response of the instrument to the interfering compound may be small compared to the target compound, but when the interfering substance is present in samples at high concentration, the interference may be significant. A list of potentially interfering substances for newly developed methods could be compiled by searching mass spectral databases for isobars of the target analytes, for molecules whose adducts might be isobars, and for molecules that might fragment in the ion source and produce isobars of the target analytes. The experiments to evaluate the interference potential must include isobars of the target analyte, compounds structurally related to the analytes, common endogenous sample constituents, drugs that might be administered at the targeted physiological condition, commonly prescribed drugs, and known drug metabolites (Meikle et al., 2003; Kushnir et al., 2005). Mass transitions used in a method should be extensively evaluated to assure that only the target analyte is measured and that the method does not suffer from interferences. As part of the evaluation of the specificity and robustness of the methods, a large number of patient samples should be analyzed, and results should be evaluated for signs of interference.

    Other types of commonly recognized interferences that affect LC–MS methods are impurities introduced with solvents and reagents, and ion suppression (Annesley, 2003; Matuszewski, Constanzer, & Chavez-Eng, 2003). Some of the impurities present in the solvents and reagents might cause changes in the ionization efficiency, chemical noise, loss of sensitivity, and even degradation of the analytes. Therefore, high-purity reagents and solvents are preferred for all mass spectrometric methods (Annesley, 2007).

    Several methods could be used to determine whether the sample matrix affects the performance of a method (Bonfiglio et al., 1999; King et al., 2000; Matuszewski, Constanzer, & Chavez-Eng, 2003; Kaufmann & Butcher, 2005). A common approach to evaluate ion suppression is post-column infusion of the target analyte into the effluent from the chromatographic column while patient samples are analyzed (Kaufmann & Butcher, 2005). Negative peaks on a chromatogram at the retention time of the analyte of interest are signs of ion suppression. Another approach to evaluate matrix effects is a direct comparison of the signal intensity of the analyte measured in samples without matrix and in different matrices (Annesley, 2003; Kaufmann & Butcher, 2005).
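
    The direct-comparison approach reduces to a simple ratio. The following minimal sketch assumes that the same amount of analyte was measured in a matrix-free (neat) solution and in extracts of several matrices; the function name and the peak areas are hypothetical and serve only to illustrate the arithmetic.

```python
def matrix_effect_percent(area_in_matrix, area_neat):
    """Signal in matrix as a percentage of the signal in matrix-free solution.

    Values below 100 suggest ion suppression; values above 100 suggest enhancement.
    """
    if area_neat <= 0:
        raise ValueError("area_neat must be positive")
    return 100.0 * area_in_matrix / area_neat


# Hypothetical peak areas for the same spiked amount of analyte.
neat_area = 12000.0
matrix_areas = {"serum lot A": 10800.0, "serum lot B": 9000.0, "plasma": 12600.0}

for name, area in matrix_areas.items():
    print(f"{name}: {matrix_effect_percent(area, neat_area):.0f}% of neat signal")
```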

    Quality Management

    To maintain high-quality diagnostic testing, it is necessary to monitor the entire process of the analysis, including the pre-analytical, analytical, and post-analytical stages. This could be achieved by quality assurance (QA) and QC programs (Westgard & Klee, 2005). The goals of QA are to establish the policies and practices necessary for reliable performance of the entire testing process. QC represents a set of techniques and procedures that aim to monitor assay performance and to recognize potential problems related to the analytical and post-analytical phases of testing. The major tasks of a QC program are to provide information on method performance and to assure acceptable quality of the results. To assess the performance of a method, QC samples with a known concentration of the analyte are analyzed with every set of patient samples. The acceptability of the results of QC samples should always be evaluated prior to acceptance of the results of the clinical samples (Westgard & Klee, 2005).
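
    As an illustration of how a QC result can be evaluated before patient results are released, the sketch below flags a QC value by its distance from the established mean in units of the standard deviation. The 2 SD warning and 3 SD rejection limits are commonly used control rules, but they are an assumption here and are not taken from the methods described in this review; the function name and the example values are hypothetical.

```python
def classify_qc(result, target_mean, target_sd, warn_at=2.0, reject_at=3.0):
    """Classify one QC result by its z-score against the established QC mean and SD."""
    if target_sd <= 0:
        raise ValueError("target_sd must be positive")
    z = abs(result - target_mean) / target_sd
    if z > reject_at:
        return "reject"   # run should not be released without investigation
    if z > warn_at:
        return "warning"  # inspect the run and other QC levels more closely
    return "accept"


# Hypothetical QC material with an established mean of 10.0 and SD of 0.5.
for value in (10.5, 11.2, 12.0):
    print(value, classify_qc(value, target_mean=10.0, target_sd=0.5))
```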

    Reference Intervals

    Results of diagnostic testing are not clinically useful unless they are related to appropriate reference values. The reference values could be concentrations of biomarkers observed in samples from a general population of healthy people, people with a specific disease, or earlier test results of the same individual. The reference values help to establish the basis for an accurate clinical interpretation of the test results and to determine the clinical decision levels (Alström et al., 1993; Gräsbeck, 2004; Ritchie & Palomaki, 2004; Solberg, 2004; Sikaris et al., 2005; Solberg, 2005). Population-based reference values are generally obtained from a well-defined group of individuals who resemble the targeted population in all aspects except the presence of the disease or condition that the testing is intended to detect. The conditions under which samples for establishing reference values are obtained, collected, and processed must be analogous to the targeted clinical application. All the testing should be performed with standardized methods and appropriate QC. In cases when the distribution of the concentrations is Gaussian, reference intervals are usually determined using a parametric method that is based on estimates of the population parameters, the mean and the standard deviation. If the reference distribution is not Gaussian and cannot be transformed to a Gaussian (as is often the case for endogenous biomarkers), non-parametric or robust statistical methods can be used (Solberg, 2004). Non-parametric methods are generally considered the gold standard for reference interval determination because they make no assumptions about the functional form of the probability distribution in the reference population. The number of samples required for establishing reference intervals needs to be greater than 100, and a larger number of samples is recommended if the distribution of the reference values is skewed (Linnet, 1987). 
If the reference values are skewed, it is possible to transform the data mathematically to obtain a distribution that approximates a Gaussian, calculate the 2.5 and 97.5 percentiles, and then transform the values back to the original measurement scale. In cases when the distribution is skewed to the sub-zero values, one-sided reference intervals are used.
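
    The two approaches described above can be sketched as follows. This is a minimal illustration, not the procedure used in the cited studies: the reference values are simulated from a log-normal distribution, the logarithm is assumed to be a suitable transformation, and the function names are hypothetical.

```python
import math
import random
import statistics


def nonparametric_interval(values):
    """Central 95% interval taken directly from the 2.5 and 97.5 percentiles."""
    cuts = statistics.quantiles(values, n=40, method="inclusive")
    return cuts[0], cuts[-1]  # cut points at 2.5% and 97.5%


def log_parametric_interval(values):
    """Log-transform, take mean +/- 1.96 SD, and transform back to the original scale."""
    logs = [math.log(v) for v in values]
    mean = statistics.mean(logs)
    sd = statistics.stdev(logs)
    return math.exp(mean - 1.96 * sd), math.exp(mean + 1.96 * sd)


def coverage(values, interval):
    """Fraction of the values that fall inside the interval."""
    low, high = interval
    return sum(low <= v <= high for v in values) / len(values)


# Simulated, right-skewed reference values (hypothetical data).
random.seed(1)
reference_values = [random.lognormvariate(0.0, 0.5) for _ in range(240)]

np_interval = nonparametric_interval(reference_values)
lp_interval = log_parametric_interval(reference_values)
print("non-parametric:", np_interval, coverage(reference_values, np_interval))
print("log-parametric:", lp_interval, coverage(reference_values, lp_interval))
```

    For data that are well described by the chosen transformation, the two intervals should be similar; a large disagreement is a hint that the parametric assumption does not hold.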

    Validated and well-characterized mass spectrometry-based methods are often more specific than other commonly used techniques. Because of this, the implementation of these methods in diagnostic practice requires establishing new reference intervals, which in many cases differ from the reference intervals used with older techniques (Herold & Fitzgerald, 2003; Taieb et al., 2003; Kushnir et al., 2006a,b; Sikaris et al., 2005). Reference intervals could be specific to age, gender, ethnic origin, physiological state, and collection time, and they need to be established prior to the use of a method in clinical diagnostics. Other important values that should be determined prior to the diagnostic use of a method are the clinical decision levels. The clinical decision levels are concentrations of an analyte at which a diagnosis could be established or the management of patients needs to be changed. The clinical decision levels are based on the reference intervals, clinical information, and epidemiological studies.

    The usefulness of diagnostic biomarkers depends on the ability to distinguish samples of people who have disease, from samples of healthy individuals. Some of the factors that affect the usefulness of the biomarkers are natural variation of the concentrations within and between individuals, and the magnitude of the difference of concentrations in populations with and without the disease. The diagnostic tests have better clinical value for the markers, for which there is no (or minimal) overlap in the reference intervals between healthy individuals and the affected patients. In cases when the distributions partially overlap, other factors (clinical symptoms, complementary diagnostic testing, medical risk related to a false-positive result relative to the risk of a false-negative result, etc.) must be taken into consideration. In many cases an initial diagnosis is established based on the symptoms, patient presentation, and history; and laboratory testing is performed for confirming the diagnosis (Carel & Léger, 2008).

    The utility of diagnostic tests can be evaluated with receiver-operating characteristic (ROC) curves (Beck & Shultz, 1986; Lasko et al., 2005). The ROC curves are based on the clinical sensitivity and the clinical specificity of biomarkers, and are plotted as the clinical sensitivity versus (1 − clinical specificity). The clinical sensitivity is the probability that a marker correctly predicts the presence of a condition in individuals having the disease, and the clinical specificity is the probability that the measured value correctly predicts that the condition is not present in healthy individuals. The area under the ROC curve represents the probability that a test would correctly distinguish between the affected and non-affected individuals. A biomarker with no predictive ability produces a curve that follows the diagonal of the grid [area under the curve (AUC) = 0.5]. For a biomarker with perfect sensitivity and specificity, the ROC curve passes through the point (0, 1) on the graph, and the AUC is 1.0. The closer the ROC curve comes to this ideal point, the better the discriminating ability of a biomarker.
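
    A short sketch of how an ROC curve and its AUC can be computed from measured concentrations is given below. It assumes that higher concentrations indicate disease and places the false-positive fraction (1 − clinical specificity) on the horizontal axis, which is consistent with the ideal point (0, 1) mentioned above. The function names and the example values are hypothetical.

```python
def roc_points(diseased, healthy):
    """Return (1 - specificity, sensitivity) pairs for every possible cutoff.

    A result is called positive when the value is >= the cutoff.
    """
    cutoffs = sorted(set(diseased) | set(healthy), reverse=True)
    points = [(0.0, 0.0)]
    for cutoff in cutoffs:
        sensitivity = sum(v >= cutoff for v in diseased) / len(diseased)
        false_positive = sum(v >= cutoff for v in healthy) / len(healthy)
        points.append((false_positive, sensitivity))
    return points


def area_under_curve(points):
    """Trapezoidal area under the ROC curve."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Hypothetical biomarker concentrations in affected and healthy individuals.
affected = [4.1, 5.0, 5.6, 6.3, 7.2]
unaffected = [2.0, 2.8, 3.5, 4.4, 5.2]
print("AUC =", area_under_curve(roc_points(affected, unaffected)))
```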


    Biosynthesis of Steroids and Related Diseases

    Steroid hormones are synthesized from cholesterol (Fig. 5) through a series of enzyme-controlled reactions (Miller, 1988; Payne & Hales, 2004). The rate-limiting step in the biosynthesis is the conversion of cholesterol to pregnenolone (Pregn). Biosynthesis of steroids occurs in the mitochondria and smooth endoplasmic reticulum of cells, and all the conversions between steroids in the pathway are controlled by enzymes (Payne & Hales, 2004). The enzymes involved in this pathway belong to two classes: the P450 enzymes (membrane-bound proteins CYP11, CYP17, CYP19, CYP21, etc.) and the hydroxysteroid dehydrogenases (HSDs, short-chain alcohol dehydrogenase reductases, 3βHSD, 11βHSD, 17βHSD, etc.). The difference between the P450 enzymes and the HSDs is that a single gene produces each of the P450 enzymes, whereas all the isoforms of the HSDs are products of separate genes, which are expressed in a cell- and tissue-specific manner (Payne & Hales, 2004). The isoforms and isoenzymes vary in tissue distribution, sub-cellular localization, catalytic activity, substrate specificity, and co-factor specificity. Because enzyme distribution differs among tissues, steroids have different primary sites of production, and many of the steroids can be synthesized in more than one tissue.

    Figure 5.

    Biosynthesis of steroid hormones of cholesterol pathway.

    The physiologic effect of steroid hormones is initiated when they enter target cells and bind to steroid receptors, which act as transcriptional activators of steroid-responsive genes. Depending on the type of receptor to which they bind, steroid hormones are classified into five groups: mineralocorticoids, progestins, glucocorticoids, androgens, and estrogens. After secretion from the cells, a major fraction of steroids binds to carrier plasma proteins. Some of these proteins, like albumin, have a low binding affinity (e.g., for Te, Kd = 2.5 × 10⁻⁵), whereas other carrier proteins, such as sex hormone binding globulin (SHBG), have a high binding affinity (e.g., for Te, Kd = 1 × 10⁻⁹). Binding affinity affects the steroids' half-life, availability for the target tissues, and elimination rate. Free and non-specifically bound steroids are physiologically available, whereas steroids bound to the binding globulins have more limited physiological activity. In the past, protein-bound steroid hormones were considered inactive, but recently it was shown that SHBG-bound androgens and estrogens play an important role in the development and maturation of reproductive organs (Hammes et al., 2005; Kahn et al., 2008).

    For diagnostic needs, it is important to be able to determine not only the total concentration of a steroid, but also the free and non-specifically bound fractions. The free fraction of steroids can be measured with dialysis or ultrafiltration of the samples (Sinha-Hikim et al., 1998; Van Uytfanghe et al., 2004), followed by instrumental analysis; alternatively, it could be calculated using methods based on the dissociation constants and the law of mass action (Vermeulen et al., 1999; Ly & Handelsman, 2005).
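
    A mass-action calculation of this kind can be sketched for Te as follows. This is a simplified illustration, not a reproduction of any cited algorithm. It assumes that the dissociation constants quoted above are in mol/L, that albumin binding sites are in large excess, that each SHBG molecule binds one Te molecule, and that competition from other steroids can be ignored. The default albumin concentration (6.5 × 10⁻⁴ mol/L, roughly 43 g/L) and the function names are hypothetical. Writing the mass balance as total Te = free Te + albumin-bound Te + SHBG-bound Te gives a quadratic equation in the free Te concentration, which is solved below.

```python
import math

KD_ALBUMIN = 2.5e-5  # mol/L (assumed unit), low-affinity binding of Te to albumin
KD_SHBG = 1e-9       # mol/L (assumed unit), high-affinity binding of Te to SHBG


def free_testosterone(total_te, shbg, albumin=6.5e-4):
    """Free Te (mol/L) from total Te, SHBG, and albumin concentrations (all mol/L)."""
    k_alb = 1.0 / KD_ALBUMIN       # association constant for albumin
    k_shbg = 1.0 / KD_SHBG         # association constant for SHBG
    n = 1.0 + k_alb * albumin      # free + albumin-bound Te per unit of free Te
    a = n * k_shbg
    b = n + k_shbg * (shbg - total_te)
    # Positive root of a*x**2 + b*x - total_te = 0, written in a form that
    # avoids subtracting two nearly equal numbers.
    return 2.0 * total_te / (b + math.sqrt(b * b + 4.0 * a * total_te))


def bioavailable_testosterone(total_te, shbg, albumin=6.5e-4):
    """Free plus albumin-bound Te (mol/L)."""
    n = 1.0 + albumin / KD_ALBUMIN
    return n * free_testosterone(total_te, shbg, albumin)


# Hypothetical sample: total Te 1 nmol/L, SHBG 50 nmol/L.
ft = free_testosterone(1e-9, 50e-9)
print(f"free Te = {ft * 1e12:.1f} pmol/L ({ft / 1e-9:.1%} of total)")
```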

    High-Sensitivity Methods for Analysis of Endogenous Steroids

    Adrenal Steroids

    The adrenal cortex is the main site for biosynthesis of a major fraction of the steroid hormones; especially important are glucocorticoids, mineralocorticoids, and DHEA sulfate, a precursor of sex steroids. The major sites of production of the sex hormones are the ovaries in women and the testes in men, although the adrenal cortex also produces relatively small amounts of sex steroids. Excessive biosynthesis of sex hormones in the adrenal cortex is characteristic of a group of diseases known as congenital adrenal hyperplasia (CAH). Depending on the specific type of mutation that caused the disease, symptoms of CAH could range from life-threatening conditions like salt-wasting crisis (a condition that can result in sudden death if not treated promptly) and adrenal insufficiency, to hermaphroditism, precocious puberty, infertility, and hypo- and hypertension.

    Congenital adrenal hyperplasia (CAH) is caused by a deficiency in one of the four enzymes (Fig. 5) required for the biosynthesis of glucocorticoids, mineralocorticoids, and sex hormones (Summers, Herold, & Seely, 1996; Moran, Potter, & Reyna, 1999; Pang, 2001; Grumbach, Hughes, & Conte, 2002; Lacey et al., 2004). Of all forms of CAH, the most common type is 21-hydroxylase deficiency, which is the result of mutations in the gene CYP21. The other cases of CAH are caused by a deficiency of one of the enzymes 11β-hydroxylase, 17α-hydroxylase, or 3βHSD. The evaluation of the concentrations of the intermediates of the pathway [Pregn, 17-hydroxyprogesterone (17OHP), 17-hydroxypregnenolone (17OHPregn), and 11-deoxycortisol (11DC)] allows the detection of these four enzyme defects. If a person is deficient in one of these enzymes, the precursors of the deficient enzyme accumulate, which results in a decreased concentration of the products below the blockage, whereas the excess precursors lead to an overproduction of other steroids in the adjacent branches of the pathway (Fig. 5). The testing for steroids also allows differentiation between late-onset CAH and polycystic ovary syndrome (PCOS), conditions that have many clinical features in common (Asuncion et al., 2000).

    Because the majority of the steroids are normally present in blood at low concentrations in a complex sample matrix, the analytical methods for their measurement must be sensitive and specific. Until recently, IAs were the most common methodology used for analyzing 11DC, Pregn, and 17OHPregn. Poor clinical sensitivity of the IAs for 17OHPregn and Pregn was observed in a number of studies (Wong et al., 1992; Speiser, 2004), and the need for development of more specific methods for the analysis of these steroids was suggested.

    The difficulties in the analysis of 11DC, Pregn, and 17OHPregn with LC–MS/MS methods in the past were related to the poor ionizability of these molecules and their non-specific collision-induced dissociation (CID) mass spectra. The approach that we used to enhance the sensitivity of the mass spectrometric detection and to improve the fragmentation patterns of these steroids was to incorporate into the structure a functional group that promotes ionization (Higashi & Shimada, 2004; Kushnir et al., 2006a, 2008). The reaction of choice for keto-steroids in this method was oximation with hydroxylamine (1). Hydroxylamine reacts with keto-groups of steroids to form oxime derivatives. The derivatization enhances the ESI of the keto-steroids through an improvement of the solvation of the molecules, and an introduction in the structure of a tertiary amine moiety. An especially significant enhancement in the sensitivity was observed for 17OHPregn and Pregn; this is likely related to the position of the double bond in the structure of these molecules.

    (1) [Reaction scheme: oximation of keto-steroids with hydroxylamine]

    Chromatograms of Pregn, 17OHPregn, 17OHP, and 11DC extracted from a human serum sample are shown in Figure 6, and the method's performance characteristics are listed in Table 2.

    Figure 6.

    MRM chromatograms (two characteristic mass transitions per compound) of steroids extracted from human serum samples: (A) cortisol, (B) cortisone, (C) 11-deoxycortisol, (D) pregnenolone, (E) 17 hydroxypregnenolone, (F) 17 hydroxyprogesterone, (G) estrone, (H) estradiol, (I) estriol, and (J) testosterone.

    Table 2. Summary of the methods' performance characteristics

    To diagnose abnormalities in the biosynthesis of adrenal steroids, accurate reference intervals for children and adults are required. Reference intervals were determined for healthy adults and children of different Tanner stages (TS) and ages (Kushnir et al., 2006a,b; Meikle et al., 2007).

    Reference intervals for Pregn, 17OHPregn, 17OHP, and 11DC were found to be age- and gender-specific, with different trends observed between boys and girls in relationship to the stage of the physiological development. In adults, the median observed concentrations of Pregn and 17OHP were comparable between men and women; whereas the median concentrations of 17OHPregn and 11DC were up to twice as high in men compared to women.


    Cortisol plays an important role in human physiology and is commonly analyzed to diagnose Cushing's disease and adrenal insufficiency. An abnormal metabolism of cortisol might also be related to insulin resistance, obesity, hypertension, glucose intolerance, type-2 diabetes mellitus, and apparent mineralocorticoid excess syndrome (Edwards et al., 1983; Andrews & Walker, 1999; Stewart et al., 1999; Reinke, 2000). Cortisol is produced and secreted by the adrenal gland, and its concentration in tissues is controlled by the relative activity of 11βHSD type I and II, which are responsible for the inter-conversion between cortisol and cortisone (Fig. 5), and maintenance of the physiologically available cortisol (Edwards et al., 1983; Stewart, 1996). The relative activity of the 11βHSD type I and II enzymes can be evaluated through a simultaneous measurement of cortisol and cortisone in blood (Taylor, Machacek, & Singh, 2002; Kushnir et al., 2004).

    The method for the simultaneous measurement of cortisol and cortisone (Kushnir et al., 2004) used a soft-ionization technique, APPI. The APPI ion source improved the signal-to-noise ratio for cortisol and cortisone in this method by a factor of 3 compared to APCI, a technique that was primarily used in earlier published LC–MS methods for the measurement of cortisol. The chromatograms of cortisol and cortisone extracted from a human serum sample analyzed with this method are shown in Figure 6, and the method's performance characteristics are listed in Table 2. No statistically significant differences between men and women were observed in the reference intervals for cortisol, cortisone, and the cortisone/cortisol ratio (Kushnir et al., 2004).

    Cortisone is an inactive metabolite of cortisol, and its measurement alone currently does not have clinical utility; however, the cortisone/cortisol ratio represents the activity of the enzymes 11βHSD type I and II. A deficiency of the type I enzyme causes an inability to convert cortisone to cortisol. This enzyme is also required for the conversion of the synthetic steroid prednisone to the active glucocorticoid prednisolone; because of this, people with 11βHSD type I deficiency do not respond to treatment with prednisone. The deficiency of the type II enzyme is the cause of a congenital form of hypertension (apparent mineralocorticoid excess syndrome), which manifests as an inability to convert cortisol to cortisone.

    Changes in the cortisol/cortisone ratio were found to be associated with various physiological states and diseases (Stewart, 1996; Stewart et al., 1999). A study of the relationship between the cortisol/cortisone ratio and cardiovascular disease (Anderson et al., 2007) demonstrated an association between the cortisone/cortisol ratio and the outcome of a myocardial infarction in patients with type II diabetes (P = 0.01).


    Testosterone (Te) is the major androgen in males (Fig. 5) and the predominant bioactive androgen in women. Te plays an important role in the development and maintenance of the male phenotype, and it is also important for many non-gender-specific functions. An accurate measurement of concentrations of Te in women and children is important to assist with the diagnosis and follow-up of steroid hormone-related diseases. In women, Te is measured as part of the investigation of alopecia, acne, hirsutism, osteoporosis, libido disorders, detection of androgen-secreting tumors, late-onset CAH, PCOS, and other endocrine and reproductive diseases (Haymond & Gronowski, 2005). In children, Te is analyzed for gender assignment of infants with ambiguous genitalia, follow-up of children with precocious or delayed puberty, and CAH (Summers, Herold, & Seely, 1996; Grumbach, Hughes, & Conte, 2002). In men, measurement of low concentrations of Te is needed during the treatment of hypogonadism, and to monitor androgen-suppression therapy during prostate cancer treatment.

    Prior to 2005, IAs were the predominant methodology to analyze Te in samples from all population groups. IAs for Te have an acceptable performance at concentrations characteristic of healthy men, but suffer from a lack of specificity at concentrations characteristic of women and children (Herold & Fitzgerald, 2003; Taieb et al., 2003; Matsumoto & Bremner, 2004). The concentrations of Te in women and children are 90–95% lower than in men. An evaluation of 10 commercially available automated IAs for Te (Taieb et al., 2003) showed that none of these assays had adequate specificity to measure Te in samples from women. Because of the poor agreement among different IAs, the follow-up of patients over time was difficult and required use of the same laboratories/methods for follow-up testing. The studies (Taieb et al., 2003; Matsumoto & Bremner, 2004; Wang et al., 2004) suggested the need for the development of high-sensitivity/high-specificity methods for the measurement of Te in samples from women and children. The need to standardize measurements of Te has been emphasized in a recently published statement of The Endocrine Society (Rosner et al., 2007).

    As for other keto-steroids (Kushnir et al., 2006b), the sensitivity of detection of Te can be enhanced with the use of an oxime derivative (Kushnir et al., 2006a). A chromatogram of Te extracted from the serum of a healthy woman is shown in Figure 6, and the method's performance characteristics are listed in Table 2. Considering the analytical sensitivity and the small volume of sample used (100 µL), this method is one of the most sensitive of the published LC–MS/MS methods for the measurement of Te (Cawood et al., 2005; Vicente et al., 2006; Borrey et al., 2007; Thienpont et al., 2008). To address the issue of poor performance of commercial IAs for Te, it was suggested to use the isotope-dilution LC–MS/MS method for the routine measurement of Te in samples from women and children (Rosner et al., 2007). Currently, a majority of the testing for Te in samples from women and children in commercial clinical laboratories in developed countries is performed with LC–MS/MS methods (Thienpont et al., 2008). To enable the use of this method in clinical diagnostic practice, reference intervals were established using the LC–MS/MS method with samples from healthy volunteers for total Te (free and bound to carrier proteins), bioavailable Te (free and bound to albumin), and free Te in the serum of women, men, and children (Kushnir et al., 2006a).

    The method (Kushnir et al., 2006a) was used in epidemiological studies of Te-replacement therapy in HIV-infected women (Choi et al., 2005) and healthy post-menopausal women (Singh et al., 2006). It has been reported (Coodley & Coodley, 1997; Grinspoon et al., 1997; Kong & Edmonds, 2002) that an increase of Te concentrations into the upper end of the normal (for female) range was associated with improvements in the muscle strength and well-being of women. The objective of the first study (Choi et al., 2005) was to assess whether Te replacement therapy increases weight and muscle strength in HIV-infected women with an androgen deficiency. At the doses of Te used in the study, the Te concentration in the blood of the HIV-infected women was increased to upper-normal (for females) levels, but the increased Te concentrations did not significantly increase the patients' body weight or muscle strength. The objective of the second study (Singh et al., 2006) was to determine the time-course profile of serum Te concentrations in post-menopausal women during treatment with different doses of Te gel and to assess whether Te treatment affects endogenous concentrations of E2. The Te supplementation increased the concentrations of Te in blood to the targeted concentrations, but did not affect the biosynthesis of E2 (although Te is an immediate precursor of E2).


    Estrogens have their highest biological activity in the 17β-hydroxy configuration, and the reductive function of 17βHSD is essential for their biosynthesis (Fig. 5). Eleven 17βHSD iso-enzymes have been identified, which participate in the inter-conversion between E1 and E2 in various tissues. The 17βHSDs differ in tissue distribution, specificity, sub-cellular localization, and mechanism of regulation (Payne & Hales, 2004). The inter-conversion of estrogens in different tissues is responsible for their tissue-specific action and contributes to the concentration of E2 in circulating blood.

    In females, E2 is responsible for the development and maintenance of secondary gender characteristics and reproductive function. Low concentrations of E2 in women are associated with disturbed puberty, oligoamenorrhea, estrogen deficiency, and menopause (Naessen et al., 1992; Iughetti et al., 2000; Grumbach, Hughes, & Conte, 2002; Writing Group for the Women's Health Initiative, 2002; Kol, 2003; Haymond & Gronowski, 2005; Naessen & Rodriguez-Macias, 2006). Recent studies suggest that low concentrations of E2 in both genders are associated with osteoporosis and with cardiovascular, cognitive, and neurological diseases (Green et al., 2000; Dubey & Jackson, 2001; Napoli et al., 2005; Singh, Dykens, & Simpkins, 2006). An accurate measurement of low concentrations of estrogens is important for the diagnosis of the above conditions and for clinical and epidemiological studies.

    Challenges in the measurement of estrogens in the blood of post-menopausal women, men, and children are related to the low physiologic concentrations and the presence in the samples of endogenous and exogenous interfering substances. The analysis of estrogens in biological samples is commonly performed with IAs (Leung et al., 1997; Sinicco et al., 2000; Lee et al., 2006). Recently, it was shown that commercial IAs for E2 do not accurately measure low endogenous concentrations (Lee et al., 2006). Earlier published LC–MS/MS methods for estrogens (Anari et al., 2002; Nelson et al., 2004) also lack the sensitivity and the specificity needed to measure low concentrations of estrogens. One of the approaches for enhancing the sensitivity was the use of negative ion mode ESI on a more sensitive instrument, the API5000 (Applied Biosystems/Sciex) (Guo et al., 2008). Additional enhancement in the sensitivity and the specificity of the measurement of estrogens was achieved through derivatization with dansyl chloride, an amine-containing sulfonyl halide (2), in combination with a multidimensional chromatographic separation (Kushnir et al., 2008). Advantages of analyzing estrogens as dansyl derivatives include a significant gain in the sensitivity (through the introduction of a tertiary amine in the structure), mild reaction conditions, and quantitative yield of the products.

    (2) [Reaction scheme: derivatization of estrogens with dansyl chloride]

    As the sensitivity of the detection improved through the use of the derivatization, the specificity of the analysis was enhanced with a two-dimensional (2D) chromatographic separation. The best selectivity for the 2D separation was observed using separation on a C1 column for the first dimension, and a phenyl column (retention based on π–π interactions) as the second dimension. The derivatization and the 2D separation used in this method produced synergistic improvement and enhanced the sensitivity and the specificity of the analysis (Kushnir et al., 2008). Chromatograms of E1, E2, and estriol (E3) extracted from a human serum sample are shown in Figure 6, and that method's performance characteristics are listed in Table 2. Method comparisons for E1 and E2 showed good agreement with LC–MS/MS assays that were performed at another clinical laboratory. Comparison with commercial IAs for E1 and E2 showed a discrepancy between the methods that was likely related to the cross-reactivity of the antibodies used in the IAs (Kushnir et al., 2008).

    Because sensitive and specific methods to measure estrogens were not available in the past, limited information has been published on pediatric reference intervals of estrogens (Klein et al., 1994; Norjavaara, Ankarberg, & Albertsson-Wikland, 1996). The LC–MS/MS method (Kushnir et al., 2008) was used to establish the reference intervals of estrogens in children and adults.

    Considering the low concentrations of Te and estrogens in children and elderly adults, high-sensitivity/high-specificity methods are required in clinical and epidemiological studies that involve measurements of sex steroids in these population groups. The methods for estrogens (Kushnir et al., 2008) and Te (Kushnir et al., 2006a) were used in an epidemiological study (Meier et al., 2008) of the relationship between sex steroids and bone health in elderly men. In that study it was found that in men over the age of 60, low serum Te concentrations were associated with an increased risk of osteoporotic fractures. These results were in agreement with the observations of Mellström et al. (2006), but contrary to earlier studies (Green & Simpkins, 2000; Goderie-Plomp et al., 2004), where the concentration of E2 was linked to osteoporotic fractures. The difference in the findings is likely related to the greater specificity of the LC–MS/MS methods used in this study, compared to the IAs used in all earlier studies. These observations suggest that measurement of sex hormones in blood might provide useful clinical information to assess the fracture risk in elderly men. For the same reason, high-sensitivity/high-specificity methods are also required for the routine diagnostic measurement of sex steroids in these population groups.

    Liquid chromatography–tandem mass spectrometry (LC–MS/MS) methods to analyze endogenous steroids described in this review have high sensitivity and high specificity. In part, this is related to the inherent specificity of tandem mass spectrometry, but the sensitivity and the specificity were also enhanced by exploiting the chemical properties of the measured analytes. Despite the high specificity of MS/MS detection, this technique is not immune to all interferences; because of this, LC–MS/MS methods used for clinical diagnostics must utilize strategies aimed at assessing the specificity of the analysis and assuring the quality of the results in every sample (Kushnir et al., 2005).
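
    One common way to assess specificity in every sample, when two mass transitions are monitored per compound, is to compare the ratio of their peak areas with the ratio observed in calibration standards. The sketch below illustrates the idea only; the 20% tolerance, the function name, and the peak areas are hypothetical and are not the acceptance criteria of the methods described in this review.

```python
def ion_ratio_ok(quantifier_area, qualifier_area, expected_ratio, tolerance=0.20):
    """Check whether the qualifier/quantifier area ratio is within a relative
    tolerance of the ratio expected from calibration standards."""
    if quantifier_area <= 0 or expected_ratio <= 0:
        return False
    ratio = qualifier_area / quantifier_area
    return abs(ratio - expected_ratio) <= tolerance * expected_ratio


# Hypothetical peak areas; the expected ratio of 0.50 would come from standards.
print(ion_ratio_ok(1000.0, 480.0, expected_ratio=0.50))  # ratio close to expected
print(ion_ratio_ok(1000.0, 800.0, expected_ratio=0.50))  # possible interference
```

    A ratio outside the tolerance suggests that a co-eluting substance contributes to one of the transitions, and such a result would be reviewed before it is reported.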

    Ovarian Steroidogenesis

    Steroids in Ovarian Follicles of Healthy Women

    In women of fertile age, ovarian follicles are the site of biosynthesis for a major fraction of the estrogens and androgens present in circulation. Follicular steroids are secreted by granulosa and theca cells under the control of gonadotropins, and the hormonal microenvironment affects the development of the follicles and viability of the oocytes (Speroff, Glass, & Kase, 1999). In normal ovulatory cycles, a higher concentration of E2 in follicular fluid (FF) is associated with healthy follicles that contain oocytes capable of meiosis, and higher concentrations of androgens are indicative of atretic changes (degeneration and subsequent resorption of the follicles) (Greenwald & Roy, 1994; Speroff, Glass, & Kase, 1999). The majority of earlier studies of steroids in ovarian follicles (McNatty et al., 1979; Dehennin, Jondet, & Scholler, 1987a,b; Itskovitz et al., 1991) were undertaken to obtain prognostic parameters for the likelihood of a successful implantation during in vitro fertilization (IVF); however, the relationship between steroid hormones and follicular development in regularly menstruating (RM) healthy women was not well studied. In part, this is related to the very small sample size of FF that could be obtained from follicles of healthy women during the follicular stage of the menstrual cycle, and to the absence of sensitive and specific methods for simultaneous quantitation of multiple steroids in small samples. Using LC–MS/MS methods (Kushnir et al., 2004, 2006a,b, 2008), concentrations of endogenous steroids and the patterns of distribution of the steroids in FF were determined in healthy women and in women after ovarian stimulation for IVF treatment (Kushnir et al., 2009). 
This study differed from earlier published studies (Dehennin, 1990; De Sutter et al., 1991; Andersen, 1993; Smitz et al., 2007) in its use of more sensitive and specific methods, which allowed simultaneous measurement of a large number of bioactive steroids and the intermediates of their biosynthesis in samples from individual follicles. The majority of earlier studies of ovarian steroidogenesis used IAs to determine concentrations, and each study focused on the effect of one or a few of the steroids on follicular development. The analysis of a large number of steroids in FF during early stages of follicular development was not possible in the past because IAs would require at least a few milliliters of FF for such measurements, a sample size that is unrealistic for follicles at early developmental stages. One of the pitfalls associated with the use of IAs for the analysis of steroids in FF samples is related to the significantly different distribution of the concentrations of the steroids in FF compared to serum. The difference in the sample matrix and the substantial difference in the concentrations of the steroids and their metabolites in FF compared to serum result in changes in the specificity of the IAs and in cross-reactivity that is not observed in serum samples (for which IAs are developed and validated). Such changes in the specificity and in the types of interferences would not be relevant to well-developed and validated mass spectrometry-based methods. The use of highly sensitive and specific LC–MS/MS methods (Kushnir et al., 2004, 2006a,b, 2008) and the ability to simultaneously measure multiple steroids allowed the analysis of 15 steroids in 40 µL of FF. FF samples from RM women and women undergoing IVF treatment were analyzed using the above methods (Kushnir et al., 2009), and concentrations of 15 steroids were measured in individual samples. 
Concentrations of androgens, 17OHP, and estrogens in the ovarian FF of RM women were 200- to 1,000-fold greater than in serum. Compared to RM women, women who underwent ovarian stimulation had significantly higher concentrations of E2, Pregn, 17OHP, and cortisol; significantly higher ratios of cortisol/cortisone, E2/E1, and E2/Te; and significantly lower concentrations of 17OHPregn, 11DC, cortisone, DHEA, androstenedione (A4), and Te. A pie diagram of the distribution of the median concentrations of the steroids in FF of RM women is shown in Figure 7 (Kushnir et al., 2009). These data will be useful to better understand ovarian physiology and the effects of ovarian stimulation on steroidogenesis in FF.

    Figure 7.

    Pie diagrams of the distribution of median concentrations of steroids in FF of (A) RM women and (B) women diagnosed with PCOS (reprinted from Kushnir et al., 2009, with permission from Clinical Chemistry, copyright 2009).

    PCOS and Steroids in Follicular Fluid of PCOS Patients

    Polycystic ovary syndrome (PCOS) is one of the most common endocrine disorders, affecting 4–7% of reproductive-age women (Stein & Leventhal, 1935; Clayton et al., 1992; Franks, 1995). In PCOS patients, the chronic absence of ovulation results in the accumulation in the ovaries of a large number of atretic follicles, which produce an excess of androgens. In addition to reproductive abnormalities and hyperandrogenism, symptoms that are characteristic of PCOS include obesity, hyperinsulinemia, type II diabetes, and dyslipidemia. Among PCOS patients there is also a high incidence of cardiovascular disease, and a greater predisposition to endometrial and breast cancers (Clayton et al., 1992; Dahlgren et al., 1992; Franks, 1995; Talbott et al., 1995). The etiology and the mechanisms that underlie PCOS are not fully understood; however, it is known that an imbalance of insulin, abnormalities in the enzymes involved in biotransformation between steroid hormones, and genetic predisposition play a role in PCOS (Legro et al., 1998; Urbanek et al., 1999).

    The accumulation of a large number of follicles in the ovaries and an excess of androgens are characteristic, but not specific, markers of PCOS (Clayton et al., 1992; Franks, 1995). Because of the absence of specific markers, PCOS is considered a diagnosis of exclusion, meaning that the diagnosis is made after excluding all other diseases that cause similar symptoms. It is common practice to base a PCOS diagnosis on the patient's history, physical examination, and semi-specific laboratory tests (e.g., LH/FSH ratio, free and total androgens) (Luppa et al., 1995; Koskinen et al., 1996; Turhan et al., 1999; The Rotterdam ESHRE/ASRM-Sponsored PCOS Consensus Workshop Group, 2004).

    In an attempt to identify potential biomarkers of PCOS, steroid profiles in ovarian FF of women diagnosed with PCOS were studied and compared to the profiles observed in FF of RM women. FF samples from 27 women with PCOS and 21 women without PCOS (control group) were used in this study. The FF samples were analyzed with LC–MS/MS methods (Kushnir et al., 2004, 2006a,b, 2008). Figure 7 shows pie diagrams of the distribution of the median concentrations of the steroids in FF samples of the control group and of women diagnosed with PCOS. Androgens were the dominant class of steroids in FF of the PCOS patients; progestins, pregnenolones, and glucocorticoids were present at lower concentrations; and estrogens were the least-abundant class of steroids. The median concentrations of all measured androgens in the PCOS group were 30–50% greater, and the median concentrations of the estrogens were 40–70% lower, than in the control group. A significant difference between the PCOS and the control groups was also observed in the ratios of the concentrations of 17OHPregn/Pregn (P < 0.0001) and total estrogens/total androgens (P = 0.028). Figure 8 shows ROC curves for the potential biomarkers that were identified in the study (only markers with AUC >0.75 are shown). The identified differences in the concentrations of the steroids, and in the ratios of their concentrations in FF samples (representative of enzyme activities), might potentially serve as biomarkers of PCOS.

    Figure 8.

    ROC curves for identified potential biomarkers of PCOS in FF samples.
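
    The ROC analysis of concentration ratios described above can be illustrated with a minimal sketch. It is not the statistical procedure of the cited study: the function names (concentration_ratio, roc_auc) and all numeric values are hypothetical, and the AUC is computed with the standard rank-based (Mann-Whitney) formulation, in which the AUC equals the probability that a randomly chosen case has a higher marker value than a randomly chosen control, with ties counted as one half.

```python
from typing import Sequence


def concentration_ratio(product: float, precursor: float) -> float:
    """Product/precursor concentration ratio (both in the same units)."""
    if precursor <= 0:
        raise ValueError("precursor concentration must be positive")
    return product / precursor


def roc_auc(cases: Sequence[float], controls: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison.

    Returns P(case > control) + 0.5 * P(case == control).
    """
    if len(cases) == 0 or len(controls) == 0:
        raise ValueError("both groups must contain at least one value")
    score = 0.0
    for case_value in cases:
        for control_value in controls:
            if case_value > control_value:
                score += 1.0
            elif case_value == control_value:
                score += 0.5
    return score / (len(cases) * len(controls))


if __name__ == "__main__":
    # Hypothetical marker values for illustration only; NOT study data.
    pcos_group = [0.9, 1.4, 1.1, 2.0]
    control_group = [0.5, 1.0, 0.7, 1.2]
    auc = roc_auc(pcos_group, control_group)
    # 0.75 mirrors the AUC cutoff used for the markers shown in Figure 8.
    print(f"AUC = {auc:.4f}; exceeds 0.75: {auc > 0.75}")
```

    With these illustrative values, 13 of the 16 case-control pairs are ordered correctly, so the sketch reports an AUC of 0.8125. A marker that is lower in cases than in controls (as the estrogens were in the PCOS group) gives an AUC below 0.5 under this convention; its discriminating power is then 1 minus the reported value.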


    This review describes the use of LC–MS/MS for diagnosing endocrine and metabolic diseases. All the methods described in the review were extensively validated to assure their accuracy and diagnostic utility, and were used to establish reference intervals for the biomarkers in serum samples of adults and children. Since its introduction in clinical laboratories, tandem mass spectrometry has proved to be one of the most specific analytical techniques available for clinical diagnostics.

    Compared to older techniques, the advantages of tandem mass spectrometry include enhanced specificity, the ability to simultaneously measure multiple analytes in highly complex biological fluids, and a reduced amount of sample needed for analysis, thereby enhancing diagnostic capabilities and enabling better patient care. The effectiveness of diagnostic testing also depends on the reliability of the methods and on knowledge of the association of biomarker concentrations with diseases and pathologic conditions.

    The majority of the LC–MS/MS methods for targeted analysis in clinical laboratories use triple quadrupole mass spectrometry and the MRM mode of acquisition. Sample preparation is typically required prior to LC–MS/MS analysis to reduce the complexity of the samples and to make the analytes more amenable to chromatographic separation and detection. Common techniques to separate analytes from the sample matrix include liquid–liquid extraction, off-line and on-line SPE, and multidimensional chromatographic separation. Analytical derivatization is another technique that has proved useful in cases where conventional types of ionization and the fragmentation patterns of the targeted analytes are not adequate for acceptable method performance. Derivatization changes the physical and chemical properties of the molecules, resulting in changes in ionization efficiency, fragmentation, chromatographic retention, and matrix effects.

    Despite enhanced specificity, tandem mass spectrometry-based methods are not totally free of interference from isobaric molecules (Kushnir et al., 2003; Meikle et al., 2003). To assure adequate quality of the results of diagnostic testing, tandem mass spectrometry-based methods used in clinical laboratories should be thoroughly validated to assure their accuracy and diagnostic utility and should use strategies for the assessment of the specificity of analysis in every sample (Kushnir et al., 2005).
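
    One widely used strategy for assessing specificity in every sample is to monitor two mass transitions per analyte and to compare the ratio of their peak areas with the ratio expected from calibrators; a co-eluting isobaric interference typically distorts this ratio. The sketch below assumes that approach. The function name, the peak areas, and the 20% relative tolerance are illustrative assumptions and are not taken from the cited methods.

```python
def ion_ratio_acceptable(
    quantifier_area: float,
    qualifier_area: float,
    expected_ratio: float,
    relative_tolerance: float = 0.20,  # assumed acceptance window, not a cited value
) -> bool:
    """Check whether the qualifier/quantifier peak-area ratio of a sample
    lies within a relative tolerance of the ratio expected from calibrators."""
    if quantifier_area <= 0:
        raise ValueError("quantifier peak area must be positive")
    if expected_ratio <= 0:
        raise ValueError("expected ratio must be positive")
    observed_ratio = qualifier_area / quantifier_area
    allowed_deviation = relative_tolerance * expected_ratio
    return abs(observed_ratio - expected_ratio) <= allowed_deviation


if __name__ == "__main__":
    # Hypothetical peak areas; the expected ratio of 0.50 is an assumption.
    print(ion_ratio_acceptable(1000.0, 480.0, expected_ratio=0.50))  # within window
    print(ion_ratio_acceptable(1000.0, 700.0, expected_ratio=0.50))  # flagged
```

    In this sketch, a sample that fails the check would be flagged for review rather than reported, because the quantifier transition may then include signal from an interfering compound.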

    Because tandem mass spectrometry-based methods are typically more specific than older techniques, reference intervals determined with older methods are often not transferable to the mass spectrometry-based methods, and new reference intervals typically need to be established with LC–MS/MS methods to replace the reference intervals established with older, less specific methodologies.

    Highly sensitive and specific LC–MS/MS methods open new possibilities for the early diagnosis and monitoring of a diverse group of endocrine and metabolic diseases. The methods described in the review are used for routine diagnostic testing and were used in a number of epidemiological and clinical studies (Kushnir et al., 2002, 2009; Meikle et al., 2003; Choi et al., 2005; Miller et al., 2006; Singh et al., 2006; Anderson et al., 2007; Meier et al., 2008). The data on the steroid concentrations in FF samples and on the association between the precursors and the products of the steroid biosynthesis pathway (Kushnir et al., 2009) could be useful for a better understanding of ovarian physiology and pathology, and could open new possibilities for their use in diagnostic tests.

    Great progress has been made in clinical mass spectrometry, and many new developments will come in the future. During the last 15 years, mass spectrometry-based methods have revolutionized approaches for diagnosing congenital metabolic disorders in newborns (Millington et al., 1989; Chace et al., 1993); current and future developments in mass spectrometry will likely have a major impact on diagnostic practices in endocrinology and other fields of medicine. Future applications of LC–MS in endocrinology will likely focus on the measurement of intermediates, metabolites, and final products of hormone biosynthesis pathways, and on the quantitative analysis of peptide and protein hormones.



    ADF, androgen-dominant follicles
    AMR, analytical measurement range
    APCI, atmospheric pressure chemical ionization
    APPI, atmospheric pressure photo ionization
    AUC, area under the curve
    CAH, congenital adrenal hyperplasia
    CID, collision-induced dissociation
    CV, coefficient of variation
    CYP, P450 enzyme
    EDF, estrogen-dominant follicles
    ESI, electrospray ionization
    FF, follicular fluid
    GC–MS, gas chromatography–mass spectrometry
    HPLC, high-performance liquid chromatography
    HSD, hydroxysteroid dehydrogenase
    IS, internal standard
    IVF, in vitro fertilization
    LC–MS/MS, liquid chromatography–tandem mass spectrometry
    LOD, limit of detection
    LOQ, limit of quantitation
    MMA, methylmalonic acid
    MRM, multiple-reaction monitoring
    m/z, mass-to-charge ratio
    PCOS, polycystic ovary syndrome
    RM, regularly menstruating
    ROC, receiver-operating characteristic
    SA, succinic acid
    SD, standard deviation
    SHBG, sex hormone binding globulin
    S/N, signal-to-noise ratio
    SPE, solid-phase extraction
    TS, Tanner stages
    QA, quality assurance
    QC, quality control
    ULOL, upper limit of linearity






    We thank Dr. A. Wayne Meikle and Dr. William L. Roberts (Department of Medicine and Pathology, University of Utah, Salt Lake City, UT, USA) and Dr. Tord Naessen (Department of Obstetrics and Gynecology, Uppsala University Hospital, Uppsala, Sweden) for discussions and suggestions. We thank the ARUP Institute for Clinical and Experimental Pathology® (M.M.K. and A.L.R.) and Swedish Research Council (Grant 629-2002-6821, 621-2005-5379, J.B.) for financial support.