Variability in vitamin D assays impairs clinical assessment of vitamin D status


  • J. K. C. Lai,

    Corresponding author
    1. National Centre for Epidemiology and Population Health, The Australian National University, Canberra, Australian Capital Territory
  • R. M. Lucas,

    1. National Centre for Epidemiology and Population Health, The Australian National University, Canberra, Australian Capital Territory
  • E. Banks,

    1. National Centre for Epidemiology and Population Health, The Australian National University, Canberra, Australian Capital Territory
  • A.-L. Ponsonby,

    1. Murdoch Childrens Research Institute, Royal Children's Hospital, Melbourne, Victoria, Australia
  • Ausimmune Investigator Group

    • The Ausimmune Investigator Group includes: Dr Caron Chapman, Professor Alan Coulthard, A/Professor Keith Dear, Professor Terry Dwyer, Professor Trevor Kilpatrick, A/Professor Robyn Lucas, Professor Tony McMichael, Professor Michael P Pender, Professor Anne-Louise Ponsonby, A/Professor Bruce Taylor, Dr Patricia Valery, Dr Ingrid van der Mei, Dr David Williams.

  • Funding: Funding for the Ausimmune Study was provided by the National Multiple Sclerosis Society of the United States of America, the National Health and Medical Research Council of Australia and Multiple Sclerosis Research Australia. A/Professor Lucas is supported by a Multiple Sclerosis Research Australia Fellowship and the Royal Australasian College of Physicians Cottrell Fellowship. Professor Banks is supported by an Australian National Health and Medical Research Council Senior Research Fellowship.

  • Conflict of interest: None.

Jeffrey K. C. Lai, National Centre for Epidemiology and Population Health, The Australian National University, Canberra, ACT 0200, Australia. Email:


Background:  Measuring serum 25(OH)D concentration is common in clinical practice despite the questionable reliability of assays.

Aims:  The aim of the present study was to examine agreement in 25(OH)D concentrations measured by different assays and laboratories, and consider related clinical implications.

Methods:  Serum samples from 813 participants in the Australian Multicentre Study of Environment and Immune Function (the Ausimmune Study) were assayed for 25(OH)D concentration. Duplicate samples from subsets of subjects were sent to different laboratories, two using DiaSorin Liaison (Laboratory A and B) and one using Liquid Chromatography-Tandem Mass Spectrometry (LC-MS/MS – selected here as the nominal gold standard). Pairwise within-assay (both within-laboratory and between-laboratories) and between-assay agreement was examined using Deming regression and Bland-Altman plots. Common 25(OH)D cut-points for classification of vitamin D deficiency were used to compare the different assays.

Results:  25(OH)D concentrations measured using Liaison were substantially lower at Laboratory A than at Laboratory B (mean bias −11.60 nmol/L, 95% limits of agreement −46.39, 23.18). Both Liaison assays returned much lower 25(OH)D concentrations than LC-MS/MS (up to 26.05 nmol/L lower on average; 95% limits of agreement for the LC-MS/MS minus Liaison difference −13.21, 65.31). For Laboratory A participants, 46% (355/765) were classified as vitamin D deficient (25(OH)D <50 nmol/L) using Liaison compared with 17% (128/765) using LC-MS/MS. For Laboratory B participants, the respective figures were 36% (76/209) and 20% (41/209). Hence, between 1-in-5 and 1-in-3 participants were misclassified as ‘deficient’.

Conclusion:  Bias and variability in 25(OH)D measurements sufficient to affect significantly clinical decision-making were found both between-laboratories and between-assays. The adoption of common standards to allow assay calibration is required urgently.


The direct consequence of vitamin D deficiency is osteomalacia in adults and rickets in children. The assumed role of vitamin D in bone health has led to several guidelines recommending supplementation for at-risk populations.1,2 Recently, vitamin D insufficiency has also been proposed as a risk factor for other conditions, including various cancers, type 1 and 2 diabetes, hypertension and multiple sclerosis.3

Measurement of serum 25-hydroxyvitamin D (25(OH)D) concentration is widely used in clinical practice to assess vitamin D status and to guide supplementation requirements. Thresholds to define vitamin D sufficiency are debated although in a recent International Osteoporosis Foundation position statement, the majority of Working Group members nominated 75 nmol/L as the appropriate target concentration for older individuals.4 The remaining members opted for a lower target of 50–75 nmol/L.4 Others have argued for even higher thresholds for ‘optimum’ vitamin D status despite the long-term health effects of sustained high concentrations of 25(OH)D being unknown.5 However, the adoption and interpretation of thresholds will remain problematic so long as measurement uncertainty is present.

Several assays are used to measure 25(OH)D concentrations, with the most common in Australia being the DiaSorin Liaison, a commercial immunoassay.6 Liquid Chromatography-Tandem Mass Spectrometry (LC-MS/MS) methods are considered by some to be the gold standard7 and, despite their high cost and limited capacity for high-volume throughput, are becoming more common in commercial laboratories. The relative strengths of the LC-MS/MS method were noted in a recent UK Food Standards Agency Workshop Consensus, leading to the selection of this assay as the preferred method for measurement of 25(OH)D concentrations in the UK National Diet and Nutrition Survey.8

Limitations of vitamin D assays are well documented, with significant variability in results between assay methods and laboratories noted in the literature.9–11 One small study has previously noted marked variation in both accuracy and precision of vitamin D assays in Australia.12 For the clinician, this calls into question whether results from any given laboratory are adequate to assess a patient's true vitamin D status, and therefore supplementation requirements. Furthermore, if there is considerable variation between assays in determination of 25(OH)D concentration, defining vitamin D status according to a single universal cut-off point may be inappropriate; assay-specific definitions may be required.

Here we investigate the within-assay and between-assay variability of measurements of 25(OH)D concentration using DiaSorin Liaison and LC-MS/MS assays at three different Australian laboratories (two hospital settings and one research laboratory). We highlight the clinical implications of this variability by quantifying how often vitamin D status would be misclassified as deficient, based on commonly used cut-points. Finally, we investigate the additional use of relative concentrations of 25(OH)D, to consider the implications of measurement error in the research setting.


Samples, study participants and assay methods

Serum samples from 813 participants recruited as part of the Australian Multicentre Study of Environment and Immune Function (the Ausimmune Study) were collected from January 2004 through July 2007. The Ausimmune Study is a multicentre matched case–control study examining environmental risk factors for the onset of central nervous system demyelinating disease. Participants were aged 18–59 years and were recruited in four specified geographical regions in Australia (Brisbane, Newcastle, Geelong and Tasmania). Ethics approval was obtained from the nine Human Research Ethics Committees of the participating institutions. Further details of the study design are outlined in Lucas et al.13

Venous blood samples of 15 mL were taken from both cases and controls. Serum was stored in 1 mL aliquots at −80°C. Separate aliquots (from the same original sample) were analysed for serum 25(OH)D concentrations at three laboratories using two different assay methods, DiaSorin Liaison TOTAL (Laboratories A and B) and LC-MS/MS (Laboratory C).

DiaSorin Liaison TOTAL immunoassay uses a two-step incubation process with human serum calibrators. The manufacturer claims an analytical range of 10–375 nmol/L, within-run precision (CV) of 2.9–5.5% and total precision (CV) of 6.3–12.9%. Cross-reactivity is 104% to 25(OH)D2 and 100% to 25(OH)D3.

LC-MS/MS determination of 25(OH)D2 and 25(OH)D3 concentrations in human serum samples was carried out using a liquid/liquid radio-isotope dilution assay run on an Applied Biosystems 4000 Q TRAP with solvent delivery from a Shimadzu Prominence HPLC system. The assay performed at Laboratory C is based on the methodology published by Maunsell et al.14 Partial validation of the assay gave the following precision data from n = 5 individual experiments. For 25(OH)D2, intra-batch precision ranged from 2.6–5.5% CV (QCL), 1.6–5.6% CV (QCM) and 2.3–5.7% CV (QCH); inter-batch precision was 3.9%, 3.5% and 3.6% CV respectively. For 25(OH)D3, intra-batch precision ranged from 4.6–7.8% CV (QCL), 1.6–5.6% CV (QCM) and 0.8–4.9% CV (QCH); inter-batch precision was 6.5%, 2.6% and 2.8% CV respectively. Further details of the method are outlined in Appendix I.

Of the 813 samples, 207 were analysed at both Laboratory A and B, by Liaison, 765 were analysed at Laboratory A (Liaison) and Laboratory C (LC-MS/MS), and 209 were analysed at Laboratory B (Liaison) and Laboratory C (LC-MS/MS).

Data and statistical analysis

Measurement agreement within- and between-assays was investigated using pairwise comparisons of Liaison (Laboratories A and B) and LC-MS/MS (Laboratory C), using LC-MS/MS as the nominal gold standard. Assay agreement was formally assessed using Deming regression and Bland-Altman plots.

In the Deming regression, residual variances in both assays were assumed to be equal, and 95% confidence intervals (CI) for the slope coefficient and intercept were obtained from 1000 bootstrap samples. For reference, a slope coefficient of 1 and an intercept of 0 would correspond to perfect assay agreement. LC-MS/MS was selected as the nominal gold standard reference on the horizontal axis where possible.
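As a rough illustration of the regression step described above, the following Python sketch fits a Deming regression with equal error variances and derives percentile bootstrap intervals. It is not the authors' R code: the function names, the percentile method for the interval and the example values are all assumptions made for illustration.

```python
import math
import random


def deming_fit(x, y, delta=1.0):
    """Return (intercept, slope) for a Deming regression of y on x.

    delta is the assumed ratio of error variances (y over x); 1.0 means
    equal variances. Assumes the sample covariance is non-zero.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x) / (n - 1)
    syy = sum((b - mean_y) ** 2 for b in y) / (n - 1)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)
    gap = syy - delta * sxx
    slope = (gap + math.sqrt(gap ** 2 + 4 * delta * sxy ** 2)) / (2 * sxy)
    return mean_y - slope * mean_x, slope


def bootstrap_ci(x, y, n_boot=1000, seed=0):
    """Percentile 95% CIs for (intercept, slope) from resampled pairs."""
    rng = random.Random(seed)
    n = len(x)
    intercepts, slopes = [], []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        if len(set(idx)) < 2:  # skip degenerate resamples of one repeated pair
            continue
        b0, b1 = deming_fit([x[i] for i in idx], [y[i] for i in idx])
        intercepts.append(b0)
        slopes.append(b1)

    def percentile_interval(values):
        ordered = sorted(values)
        return ordered[int(0.025 * len(ordered))], ordered[int(0.975 * len(ordered)) - 1]

    return percentile_interval(intercepts), percentile_interval(slopes)


# Hypothetical paired results in nmol/L (not study data).
lcms = [32.0, 48.0, 61.0, 75.0, 90.0, 104.0, 121.0, 140.0]
liaison = [20.0, 30.0, 41.0, 52.0, 60.0, 75.0, 82.0, 101.0]
print(deming_fit(lcms, liaison))
print(bootstrap_ci(lcms, liaison))
```

With the reference assay on the horizontal axis, an interval for the slope that excludes 1 or an interval for the intercept that excludes 0 points to proportional or constant bias respectively.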

Bland-Altman plots were used to identify mean bias (the average of the differences between measurements obtained from the two compared assays) and 95% limits of agreement between methods. We report the average of the absolute differences between duplicate measures for within-laboratory comparisons.
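The quantities read off a Bland-Altman plot can be sketched as follows. This assumes the conventional definition of the limits of agreement as the mean difference plus or minus 1.96 standard deviations of the differences; the paper does not spell out its exact calculation, and the function names and sample values are hypothetical.

```python
import statistics


def bland_altman(method_a, method_b):
    """Return (mean bias, lower limit, upper limit) for paired measurements.

    Differences are taken as method_a minus method_b, so the sign of the
    bias depends on the order of the arguments.
    """
    diffs = [a - b for a, b in zip(method_a, method_b)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


def mean_absolute_difference(first, second):
    """Average absolute difference between duplicate measurements."""
    return statistics.mean(abs(a - b) for a, b in zip(first, second))


# Hypothetical paired results in nmol/L (not study data).
liaison = [40.0, 52.0, 61.0, 70.0, 88.0]
lcms = [55.0, 80.0, 79.0, 101.0, 130.0]
print(bland_altman(liaison, lcms))
print(mean_absolute_difference(liaison, lcms))
```

Because the bias is signed, the direction of subtraction should be reported alongside the limits; swapping the two methods flips the sign of the bias and of both limits.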

For each assay we determined the proportion of the sample with vitamin D ‘deficiency’ defined as serum 25(OH)D concentration below targets commonly used in clinical practice, i.e. 50 nmol/L and 75 nmol/L.4 Agreement in classification of results between laboratories was assessed using Cohen's Kappa (agreement: <0.4, poor; 0.4–0.75, fair to good; >0.75, excellent15) and the correlation between laboratories compared using Pearson's correlation coefficient.
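Cohen's Kappa for a two-way classification table can be computed as in the sketch below. The function name is an assumption; the counts are the 50 nmol/L cross-classifications reported in Table 2 for the two Liaison versus LC-MS/MS comparisons, with rows taken as LC-MS/MS and columns as Liaison.

```python
def cohens_kappa(table):
    """Cohen's Kappa for a square table of counts (rows vs columns)."""
    size = len(table)
    total = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(size)) for j in range(size)]
    observed = sum(table[i][i] for i in range(size)) / total
    expected = sum(row_totals[i] * col_totals[i] for i in range(size)) / total ** 2
    return (observed - expected) / (1 - expected)


# Counts at the 50 nmol/L cut-point: [<50, >=50] by [<50, >=50].
lab_a_vs_lcms = [[124, 4], [231, 406]]
lab_b_vs_lcms = [[41, 0], [35, 133]]

print(round(cohens_kappa(lab_a_vs_lcms), 2))  # 0.35
print(round(cohens_kappa(lab_b_vs_lcms), 2))  # 0.6
```

Both values match the Kappa figures reported in Table 2 (0.35 and 0.60), which fall in the "poor" and "fair to good" bands respectively.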

Relative vitamin D concentrations were obtained by ordering the data into quartiles, with agreement between assays assessed using the Weighted Kappa coefficient with squared weights. The Weighted Kappa coefficient includes partial agreements when samples are classified in adjacent quartiles. All analyses were undertaken using the R computing package (version 2.5.1; 2007, available at


The mean measurements, standard deviations and correlation of each pairwise comparison are shown in Table 1.

Table 1.  Comparison of mean (SD) 25(OH)D concentrations (nmol/L) from different assays and laboratories

Sample size for      DiaSorin Liaison    DiaSorin Liaison    LC-MS/MS          Pearson correlation
comparison bloods    (Laboratory A)      (Laboratory B)      (Laboratory C)    coefficient
n = 765              54.3 (26.0)         -                   80.4 (30.9)       0.77
n = 209              -                   64.6 (30.9)         76.2 (30.1)       0.86
n = 207              53.5 (26.7)         65.1 (31.3)         -                 0.83

Within-assay comparison

Differences in serum 25(OH)D concentrations were observed even within duplicate samples assayed at a single laboratory (duplicate samples were not included for the Liaison assay at Laboratory B). In the samples assayed at Laboratory A (Liaison), 14 samples with two measurements and two samples with three measurements had a mean absolute difference of 10.96 nmol/L (95% CI (6.05, 15.87)). For the gold standard LC-MS/MS, there were 37 duplicate samples and a mean absolute difference of 9.79 nmol/L (95% CI (6.88, 12.70)). The small sample sizes did not allow for a formal comparison of within-laboratory performance across laboratories.

Between-assay comparison

The serum 25(OH)D concentrations of different aliquots of the same blood sample measured using Liaison at Laboratory A were on average 26.05 nmol/L lower than those measured using LC-MS/MS (95% limits of agreement: −13.21, 65.31) (Fig. 1A). The magnitude and variation in these differences increased at higher 25(OH)D concentrations (Fig. 2A). There was considerable misclassification of subjects into deficient or not, according to commonly used cut-points (see Table 2a), with the frequency of misclassification slightly greater for a cut-point of 75 nmol/L compared with 50 nmol/L. Using a threshold concentration of 25(OH)D of <50 nmol/L to define ‘vitamin D deficiency’, based on the results of the Liaison assay (Laboratory A) 46% (355/765) of participants would be classified as deficient, compared with 17% (128/765) according to the results from the LC-MS/MS assay (Table 2a). Nevertheless, comparison of quartiles of vitamin D concentrations showed that there was good agreement for this relative measure (Weighted Kappa = 0.73).

Figure 1.

Bland-Altman plots of (A) LC-MS/MS vs DiaSorin Liaison (Laboratory A); (B) LC-MS/MS vs DiaSorin Liaison (Laboratory B); and (C) DiaSorin Liaison (Laboratory A vs B). The solid lines indicate the mean bias (middle line) and 95% limits of agreement (top and bottom lines). All measurements in nmol/L.

Figure 2.

Deming Regression of (A) LC-MS/MS vs DiaSorin Liaison (Laboratory A), intercept −10.09 (95% CI (−13.74, −6.26)) and slope 0.80 (95% CI (0.74, 0.85)); (B) LC-MS/MS vs DiaSorin Liaison (Laboratory B), intercept −13.87 (95% CI (−22.74, 4.33)) and slope 1.03 (95% CI (0.89, 1.16)); and (C) DiaSorin Liaison (Laboratory A vs B), intercept 0.20 (95% CI (−7.58, 7.71)) and slope 1.21 (95% CI (1.06, 1.36)). All measurements in nmol/L. Dotted lines show perfect agreement between assays (intercept of 0 and slope of 1).

Table 2.  Comparison of the number and percentage of subjects classified as above and below (i) 50 nmol/L and (ii) 75 nmol/L in assays conducted at different laboratories

(a) n = 765. Rows: LC-MS/MS (Laboratory C). Columns: Laboratory A (Liaison).

                 Laboratory A (Liaison)                          Laboratory A (Liaison)
                 <50 nmol/L    ≥50 nmol/L                        <75 nmol/L    ≥75 nmol/L
<50 nmol/L       124 (16%)     4 (1%)           <75 nmol/L       331 (43%)     5 (1%)
≥50 nmol/L       231 (30%)     406 (53%)        ≥75 nmol/L       294 (38%)     135 (18%)
Cohen's Kappa    0.35                           Cohen's Kappa    0.27

(b) n = 209. Rows: LC-MS/MS (Laboratory C). Columns: Laboratory B (Liaison).

                 Laboratory B (Liaison)                          Laboratory B (Liaison)
                 <50 nmol/L    ≥50 nmol/L                        <75 nmol/L    ≥75 nmol/L
<50 nmol/L       41 (19%)      0 (0%)           <75 nmol/L       109 (52%)     4 (2%)
≥50 nmol/L       35 (17%)      133 (64%)        ≥75 nmol/L       41 (20%)      55 (26%)
Cohen's Kappa    0.60                           Cohen's Kappa    0.55

(c) n = 207. Rows: Laboratory A (Liaison). Columns: Laboratory B (Liaison).

                 Laboratory B (Liaison)                          Laboratory B (Liaison)
                 <50 nmol/L    ≥50 nmol/L                        <75 nmol/L    ≥75 nmol/L
<50 nmol/L       68 (33%)      31 (15%)         <75 nmol/L       139 (67%)     33 (16%)
≥50 nmol/L       6 (3%)        102 (49%)        ≥75 nmol/L       8 (4%)        27 (13%)
Cohen's Kappa    0.63                           Cohen's Kappa    0.45

  1. Off-diagonal cells (shown in bold in the original) represent misclassification according to the relevant cut-point.

Comparing the results from the Liaison assay at Laboratory B with the results of the LC-MS/MS assay (Laboratory C), 25(OH)D concentrations using Liaison were on average 11.61 nmol/L lower than those using LC-MS/MS (95% limits of agreement: −21.11, 44.33) (Fig. 1B). Here the magnitude of the differences was consistent across a range of 25(OH)D concentrations (Fig. 2B), although there was increasing variation in the differences at higher 25(OH)D concentrations. There was considerable misclassification using common cut-points for deficiency (see Table 2b). Measurements using Liaison at Laboratory B showed that 36% (76/209) of participants had 25(OH)D concentrations of <50 nmol/L, compared with 20% (41/209) using the LC-MS/MS assay (Table 2b). Again, the frequency of misclassification was slightly greater if the threshold was set at a 25(OH)D concentration of 75 nmol/L (Table 2b). The Weighted Kappa for quartiles was 0.79, indicating excellent agreement.

Therefore, the results from both of the Liaison assays yielded lower 25(OH)D concentrations than from the LC-MS/MS assay. Assay concordance was better between LC-MS/MS and the Laboratory B Liaison assay, compared with LC-MS/MS and the Laboratory A Liaison assay. The limits of agreement were clinically significant in all cases, suggesting limited assay agreement.

Between-laboratory comparison (DiaSorin Liaison (Laboratory A vs B))

Serum 25(OH)D concentrations in different aliquots of the same sample assayed with Liaison at Laboratory A were on average 11.60 nmol/L lower than those from the Liaison at Laboratory B (mean bias −11.60 nmol/L, 95% limits of agreement −46.39, 23.18) (Fig. 1C). Both the magnitude and variability of differences were greater at higher 25(OH)D concentrations (Fig. 2C). Even between these two assays using the same methodology, there was considerable misclassification (see Table 2c). As in the previous comparisons using relative 25(OH)D concentrations (quartiles), there was excellent agreement in the results from the two Liaison assays (Weighted Kappa of 0.77).
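The quartile comparisons reported in this section rely on a Weighted Kappa with squared weights. A minimal Python sketch of that calculation follows, assuming quartile cut points are computed separately for each assay; the helper names and the sample values are hypothetical, and the quartile rule used by the authors' R code may differ.

```python
import bisect
import statistics


def quartile_labels(values):
    """Assign each value a quartile index 0-3 using the sample's own cut points."""
    cuts = statistics.quantiles(values, n=4)  # three cut points
    return [bisect.bisect_left(cuts, v) for v in values]


def cross_tabulate(labels_a, labels_b, size=4):
    """Build a size x size table of counts from two lists of category labels."""
    table = [[0] * size for _ in range(size)]
    for a, b in zip(labels_a, labels_b):
        table[a][b] += 1
    return table


def weighted_kappa(table):
    """Weighted Kappa with squared (quadratic) disagreement weights."""
    size = len(table)
    total = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(size)) for j in range(size)]
    observed = expected = 0.0
    for i in range(size):
        for j in range(size):
            weight = ((i - j) / (size - 1)) ** 2  # 0 on the diagonal, 1 at the extremes
            observed += weight * table[i][j] / total
            expected += weight * row_totals[i] * col_totals[j] / total ** 2
    return 1 - observed / expected


# Hypothetical paired results in nmol/L (not study data).
lcms = [35.0, 48.0, 52.0, 66.0, 71.0, 83.0, 95.0, 120.0]
liaison = [22.0, 40.0, 31.0, 50.0, 58.0, 49.0, 80.0, 92.0]
table = cross_tabulate(quartile_labels(lcms), quartile_labels(liaison))
print(weighted_kappa(table))
```

Because quartiles depend only on rank within each assay, a constant offset between two assays leaves the quartile labels unchanged, which is one reason relative measures can agree well even when absolute concentrations do not.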


Our results suggest that commonly used assays for vitamin D status are not reliable for detecting vitamin D deficiency, in terms of repeatability of measures, and agreement of results between laboratories (for the same type of assay) or between different assay methods.

This study compares the performance of DiaSorin Liaison and LC-MS/MS assays for 25(OH)D concentration at three laboratories using 813 samples. The DiaSorin Liaison TOTAL is the most commonly used commercial platform among laboratories participating in the International External Quality Assessment Scheme for Vitamin D metabolites (DEQAS)6 and LC-MS/MS is considered by many to be the gold standard for measurement of 25(OH)D concentration, although its performance remains heavily user-dependent.16–18

Substantial variation in the measured 25(OH)D concentrations, using simple repeated measures of the same assay at the same laboratory, was observed for both Liaison and LC-MS/MS, despite the latter being widely considered the gold standard.7 In addition, there was a significant difference between results from two different laboratories using the same assay type, and a negative bias in the immunoassay (DiaSorin Liaison) compared with the LC-MS/MS assay. The implications for patients are apparent at cut-off points for clinical decision-making: up to 1-in-3 samples were misclassified when the results from a Liaison assay were compared with those from the LC-MS/MS assay. If ‘deficiency’ resulted in treatment, 36–46% of participants in this study whose blood was assayed using a Liaison assay would be treated, compared with only 17–20% of those whose blood was assayed using the LC-MS/MS technique. That is, treatment decisions are likely to be influenced substantively by the 25(OH)D assay being used.

Previous studies have noted significant within-assay and between-assay variability in measurements of 25(OH)D concentration, but generally with smaller sample sizes than reported here.9–12,19,20 As for several of these studies, we found that, when agreement between laboratories and/or assays was assessed using correlation coefficients, the values were seemingly high, ranging from 0.77 to 0.86. However, this study highlights that even seemingly strong correlations may obscure significant misclassification, with important implications for clinical practice.21 As such the use of Bland-Altman plots with measures of bias and limits of agreement is preferable.22
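The point that a strong correlation can coexist with substantial misclassification can be illustrated with a small Python sketch. The values are hypothetical: the second assay is simply the first minus a constant 26 nmol/L, chosen to be close to the largest mean bias reported here, and the function names are assumptions.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient for two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def count_discordant(x, y, cut_point=50.0):
    """Number of pairs classified differently relative to the cut-point."""
    return sum((a < cut_point) != (b < cut_point) for a, b in zip(x, y))


# Hypothetical values in nmol/L (not study data).
reference = [40.0, 55.0, 60.0, 70.0, 85.0, 100.0]
biased = [value - 26.0 for value in reference]  # constant offset only

print(pearson_r(reference, biased))         # perfectly correlated
print(count_discordant(reference, biased))  # 3 of 6 pairs disagree at 50 nmol/L
```

A constant offset leaves the correlation untouched while moving half of these hypothetical samples across the 50 nmol/L threshold, which is why bias and limits of agreement are more informative than a correlation coefficient for this purpose.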

The reasons for the negative bias in the Liaison assay compared with LC-MS/MS are not clear. However, one possible explanation is the under-extraction of 25(OH)D3 by the Liaison assay compared with LC-MS/MS. This was noted in a previous study and was more evident at higher concentrations (>75 nmol/L).11

A strength of the current study is the comparison of results from two of the major 25(OH)D assays currently in use for clinical testing. The DiaSorin Liaison is widely used because of its capacity for high throughput, while advances in technology will continue to enable more common use of LC-MS/MS methods outside of traditional research settings. This is the largest study to date to compare the performance of these two assays, using duplicate serum measurements. Samples were collected, stored and analysed using routine methods and sample sizes were sufficiently large to obtain reliable estimates of bias and variability.

Although the 25(OH)D concentration results from a large number of pairwise samples were available, the study was somewhat limited by the small number of laboratories tested and having only a few repeat measurements for within-laboratory comparisons. As such, the results may not be generalisable to other laboratories using the same assay methods. However, given the degree of variability observed in other studies11,12,19,20 it is unlikely that the results we report are atypical of those commonly found in clinical practice. Moreover, the single estimates used are highly relevant to clinical and research practice.

These results are relevant to clinicians and researchers. Clinically, vitamin D testing has grown rapidly in Australia23 and other western countries where it is increasingly being used as a screening tool for at-risk populations. However, this study demonstrates that without reliable assay performance, the adoption of universal cut-points to define vitamin D status and to guide supplementation requirements remains problematic.

Although the concentration below which vitamin D supplementation is recommended varies, most laboratories use a threshold of between 30 and 50 nmol/L.16 Here, almost one-third of subjects (231/765, 30%) would have been classified as deficient, and possibly received vitamin D supplementation (assuming a threshold of 50 nmol/L), if their blood was analysed using DiaSorin Liaison (Laboratory A), but would not have been so classified using LC-MS/MS. Alternatively, a patient being monitored for vitamin D deficiency over time could have a change in apparent clinical status merely from variability in measurements at the same laboratory or from having their serum analysed using a different assay method or laboratory.

For research studies, it is important that vitamin D status is measured in a consistent way, i.e. using the same assay across study groups, with careful assessment of assay reliability. For example, data analysis within the Ausimmune Study uses only the 25(OH)D results from the LC-MS/MS assay, completed in one laboratory with insertion of duplicates to confirm assay precision.24 The good to excellent category agreement (Weighted Kappa = 0.73–0.79) between assays for the quartile measures supports the use of relative concentrations in population research studies. However, absolute 25(OH)D concentrations should be interpreted with caution.

This report highlights the need for common standards to allow serum 25(OH)D assay calibration. The National Institute of Standards and Technology (NIST), working in conjunction with the National Institutes of Health's Office of Dietary Supplements, has developed a standard reference material (SRM) to aid in vitamin D analysis. SRM 972 Vitamin D in human serum consists of four pools of human serum with known analyte values for vitamin D metabolites.25 This can serve as a reproducible standard of comparison for laboratories. The use of common standards has been shown to improve 25(OH)D LC-MS/MS assay comparability and their widespread adoption should improve serum 25(OH)D assay performance.17 The UK Food Standards Agency consensus workshop has recently recommended the use of LC-MS/MS with concurrent standardisation using SRM 972 as the preferred method to measure 25(OH)D concentrations in the UK National Diet and Nutrition Survey.8


In the present study we found poor agreement in measurements of 25(OH)D concentration, both between different laboratories using the same assay method and between two different assay methods. Clinicians should recognise the limitations in the accuracy and precision of current assays for measurement of serum 25(OH)D concentration. They should further exercise caution when interpreting isolated measurements of 25(OH)D concentration, particularly in applying universal cut-off points to define vitamin D deficiency, insufficiency or adequacy. We advocate the urgent adoption of common standards to improve the interpretability of results.


We thank the Ausimmune Study participants. We wish to acknowledge the outstanding contribution to the Ausimmune Study of the research nurses who undertook all data collection: Susan Agland, Barbara Alexander, Zoe Dunlop, Anne Wright, Rosalie Scott, Jannie Selvidge, Marie Steele, Katherine Turner, Brenda Wood and the study project officers, Jane Gresham, Helen Rodgers and Camilla Jozwick.


Appendix I

LC-MS/MS method

The LC-MS/MS method involves taking a 100 µL aliquot of serum test sample, calibration sample or quality control sample, adding 100 µL of an ice-cooled internal standard solution consisting of 60 nM 26, 27-hexadeuterium-25-hydroxy Vitamin D3 (Synthetica; Oslo, Norway) in 20% propan-2-ol in methanol, and vortexing briefly. Then 500 µL of ice-cooled 1% propan-2-ol in hexane is added and the mixture is vortexed for 20 s, then centrifuged at 5000 rcf for 5 min at 4°C. Next, 400 µL of the organic layer is transferred to a fresh microfuge tube and allowed to dry overnight in a fume hood, protected from light. The dried sample is reconstituted in 200 µL of 70% methanol and analysed using LC-MS/MS, with the separation occurring over a standard C18 column. In each batch a calibration curve was constructed in which the 25(OH)D2/25(OH)D3 calibration serum standard (Cat# 38033, Chromsystems; Munich, Germany) was diluted in saline to give a range of 25(OH)D2 (1.066–93.2 nM) and 25(OH)D3 (2–175 nM) calibrators. UTAK Laboratories Tri-Level Vitamin D Plus Serum Controls were used as the quality control samples in the assay. Quality control samples were prepared in triplicate and analysed in the same batch as the test samples. Runs were accepted or failed on the basis of the performance of the quality control samples.