Potential conflict of interest: Nothing to report.
We conducted a study to characterize the variability in the upper limit of normal (ULN) for alanine aminotransferase (ALT) across different laboratories (labs) in Indiana and to understand factors leading to such variability. A survey was mailed to all eligible labs (n = 108) in Indiana, and the response rate was 62%. The survey queried for ALT ULN, the type of chemical analyzer used, five College of American Pathologists (CAP) sample results, and methods used to establish the reference interval. There was a wide variability in the ALT ULN for both men and women. Eighty-five percent of labs used chemical analyzers belonging to one of the four brands. For all five CAP samples, there was a statistically significant difference in ALT values measured by different analyzers (P < 0.0001), but these differences were not clinically significant. The majority of labs used the manufacturers' recommendations for establishing their ALT ULN rather than in-house healthy volunteer testing (only 17%). When healthy volunteers were tested, the process for testing was haphazard in terms of the number of individuals tested, frequency of testing, and criteria for choosing the reference population. After controlling for chemical analyzer type, there was no significant relationship between ALT ULN values and the method used for its establishment. Conclusion: Wide variability in ALT ULN across different labs is more likely due to variable reference intervals of different chemical analyzers. It may be possible to minimize variability in ALT ULN by (1) each lab solely following the manufacturers' recommendations and (2) manufacturers of different analyzers following consistent and rigorous methodology in establishing the reference range. Alternatively, studies should be undertaken to identify outcome-based reference intervals for ALT. (HEPATOLOGY 2009.)
Serum alanine aminotransferase (ALT) is one of the most commonly obtained biochemical tests in clinical practice, and it is regarded as a reliable and sensitive marker of liver disease.1 It is clear that there is a strong relationship between abnormal ALT levels and subsequent mortality from liver disease.1 Importantly, recent studies suggest that elevated ALT levels may predict overall mortality rate in the general population, independent of liver disease.2–4 A recent position paper from the American Association for the Study of Liver Diseases examined the available data and opined that serum ALT is a good indicator of overall health and meets most of the accepted criteria for a screening test.5 Unfortunately, there is not a standard reference interval for serum ALT that all laboratories (labs) accept and report.
The reference ranges for routine laboratory tests are established based either on values obtained from healthy individuals or on health outcomes.1 When reference intervals are established based on healthy volunteer testing, they typically represent the central 95% of values obtained from healthy individuals. For some tests, such as fasting glucose and low-density lipoprotein cholesterol, reference intervals are defined by health outcomes. The use of outcome-based reference intervals requires a high degree of standardization across different labs and a precise relationship between an adverse health outcome and a discrete level of the lab value. The reference interval for ALT is established based on healthy individuals, but it is our anecdotal experience that there is a wide variability in the reference interval reported by different labs. For example, three clinical laboratories on our medical school campus report three different upper limits of normal (ULN) for ALT. This wide variability in ALT ULN poses problems not only for patient management but also for research subject recruitment and monitoring of hepatic safety in clinical trials.
We conducted a survey-based study to systematically characterize the degree of variability in ALT ULN across different labs in Indiana and to understand factors that may lead to such wide variability.
ALT, alanine aminotransferase; CAP, College of American Pathologists; CLIA, Clinical Laboratory Improvement Amendments; CLSI, Clinical and Laboratory Standards Institute; labs, laboratories; ULN, upper limit of normal.
Materials and Methods
The survey was self-administered and conducted by mail. This was a cross-sectional survey of eligible clinical laboratories in the State of Indiana. The study was reviewed and approved by our Institutional Review Board. A standardized questionnaire consisting of four sections was developed by the authors in conjunction with the Liver Research Group at the Indiana University School of Medicine (see Supporting Information). In brief, section I queried for basic information of the respondents, including the name of the lab and its director, number of years in operation, and accreditation details. Section II queried about the type of chemical analyzer used and the reference range for ALT. Section III queried about each lab's performance on 2006 College of American Pathologists (CAP) ALT specimens (n = 5), labeled as CAP-01 through CAP-05. These samples are used in the accreditation process of clinical labs by the CAP. The participating labs analyze the aliquots of the same but unknown sample sent by the CAP and report back the results. Section IV queried about the method used by each lab for establishing the reference interval for ALT—specifically, whether the lab followed the manufacturers' recommendations or conducted in-house testing on healthy volunteers (or both) or used other methods. For those who reported testing healthy volunteers, additional details (such as the number of volunteers, frequency of testing, and any exclusionary criteria for accepting volunteers) were requested.
The questionnaire, along with a Blockbuster gift card for $25, and a cover letter that assured confidentiality were initially mailed in July 2007 to all eligible labs. Each questionnaire was coded to identify nonresponders. Recipients were instructed to return the completed survey in an enclosed business reply envelope. Eight weeks later, the same questionnaire was sent to nonresponders without any additional incentive. Nonresponders after two attempts were not contacted further by mail or by telephone.
A comprehensive list of hospitals and labs in Indiana was obtained from the Indiana State Department of Health. Hospitals without associated clinical labs (for example, mental health hospitals and rehabilitation facilities) were excluded. Hospitals that shared a central lab were considered one entity, and only the main lab was contacted. Based on these criteria, 108 clinical labs in Indiana were deemed eligible for the survey.
Data were collected from all survey responders and compiled into an Excel-based worksheet. Descriptive statistics such as mean with standard deviation, median with range, and percentage were used to describe the study results. To examine the reported differences in CAP specimens across chemical analyzer models used, a Kruskal-Wallis test was applied. Chemical analyzers were divided into five main categories: Beckman, Dade, Roche, Ortho, and “Other.” Chemical analyzers belonging to the “Other” category were not included in group comparisons. To determine which models differed significantly, a pair-wise Mann-Whitney test was employed. To adjust for multiple comparisons, the Bonferroni method was applied, and P values less than 0.0083 (0.05 divided by the six pair-wise comparisons among the four analyzer groups) were considered significant. To determine whether the ULN differed between men and women, a Wilcoxon signed-rank test was used. Analyses were conducted separately by sex. To investigate the relationship between ALT ULN values and the method used to establish the ALT reference range, we conducted two-way analysis of variance after adjustment for the type of chemical analyzers used. Methods used for establishing the reference interval were categorized into three groups: (1) testing healthy volunteers only, (2) manufacturer's recommendation only, and (3) both healthy volunteer testing and manufacturer's recommendation. To investigate the relationship between ALT ULN values and the type of chemical analyzer used, we conducted two-way analysis of variance after adjustment for the method used for establishing the ALT reference range. For multiple comparisons, the Tukey-Kramer method was employed. Statistical analyses were performed using SAS software, and P < 0.05 was considered statistically significant.
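The Bonferroni-adjusted threshold of 0.0083 described above can be reproduced with a short calculation. The following is a minimal sketch in Python, not the authors' SAS code; the function and variable names are hypothetical and chosen only for illustration. It assumes that the four named analyzer groups are compared pair-wise and that the family-wise significance level is 0.05.

```python
from itertools import combinations

# Analyzer groups entering the pair-wise comparisons; the "Other"
# category is excluded, as stated in the Methods.
ANALYZER_GROUPS = ["Beckman", "Dade", "Roche", "Ortho"]
FAMILY_ALPHA = 0.05  # assumed family-wise significance level


def bonferroni_threshold(groups, family_alpha=FAMILY_ALPHA):
    """Return all unordered pairs of groups and the per-comparison alpha."""
    if len(groups) < 2:
        raise ValueError("at least two groups are needed for a pair-wise comparison")
    pairs = list(combinations(groups, 2))
    # Bonferroni: divide the family-wise alpha by the number of comparisons.
    return pairs, family_alpha / len(pairs)


pairs, threshold = bonferroni_threshold(ANALYZER_GROUPS)
for first, second in pairs:
    print(f"{first} vs {second}")
print(f"{len(pairs)} comparisons, per-comparison alpha = {threshold:.4f}")
```

Four groups yield six unordered pairs, so each Mann-Whitney comparison would be judged against 0.05/6, which rounds to 0.0083.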
Out of 108 labs contacted, we received 67 survey responses—a response rate of 62%. The responses received were representative of a wide geographical area in Indiana that included all major cities, towns, and counties. Their median length of operation was 50 years (range, 2–102 years), and all labs were accredited by one or more agencies (such as the CAP and the Clinical Laboratory Improvement Amendments [CLIA]).
As expected, there was a wide range for ALT ULN across different labs in Indiana, with values ranging from 31 to 72 U/L (Fig. 1). In men, the mean ± standard deviation of ALT ULN values was 56.9 ± 11.9 U/L, and the median (range) was 63 (32–72) U/L, with lower and upper quartiles of 45 and 65 U/L. In women, the mean ± standard deviation of ALT ULN values was 51.5 ± 10.3 U/L, and the median (range) was 52 (31–72) U/L, with lower and upper quartiles of 41 and 58 U/L. The degree of variability for ALT ULN was similar between men and women (P = 0.25). Not unexpectedly, respondents reported using different chemical analyzers for measuring ALT, but the majority used chemical analyzers sold by one of four manufacturers (Beckman 18, Ortho 18, Dade 15, and Roche 8).
Table 1 shows the methods that different responding labs used to establish the ALT ULN. Of the responding labs, 40% established their ALT ULN based on the manufacturer's recommendation for a particular chemical analyzer plus testing healthy volunteers, 38.5% followed the manufacturer's recommendation without healthy volunteer testing, 17% tested healthy volunteers, and 8% used medical textbooks.
Table 1. Methods Used to Establish ALT Reference Interval by Different Laboratories in Indiana
The total is greater than 100% because some labs belonged to more than one category.
Based on manufacturer's recommendation and healthy volunteer testing
Based on manufacturer's recommendation but not healthy volunteer testing
Based on testing healthy volunteers but not manufacturer's recommendation
Based on a textbook
Based on CAP or CLIA recommendations
There were 37 responding labs that tested healthy volunteers to establish their ALT ULN. There was no consistency in the process by which testing on healthy volunteers was conducted. The median number of healthy volunteers included was 25 (range, 20–447), generally split equally between men and women. Four labs reported that they conducted testing on healthy volunteers at yearly intervals, and five labs reported 3- to 5-year intervals; the remaining labs reported conducting healthy volunteer testing only when there was a new machine or a new assay. In terms of criteria defining healthy volunteers, nine labs had no specific criteria to exclude volunteers based on their medical history, whereas 18 labs had wide-ranging criteria (Table 2).
Table 2. Reported Exclusionary Criteria for Testing on Healthy Volunteers (n = 27)
The total exceeds 27 because some labs used more than one criterion.
Any liver disease
Overweight (body mass index >25 kg/m2)
Obese (body mass index >30 kg/m2)
History of blood transfusions
History of intravenous drug abuse
Table 3 shows the results of five CAP samples reported by different labs, stratified according to the manufacturer of the chemical analyzer used. For each CAP sample, there was a statistically significant difference among different manufacturers (P < 0.0001 for all five CAP samples), but these numerical differences were not clinically significant (Table 3). For all five CAP samples, ALT levels measured by Ortho were significantly higher than Beckman values. For four CAP samples, Dade values were significantly higher than Beckman values, and Ortho values were higher than both Dade and Roche values. Dade values were also significantly higher than Roche in two blood samples, and Roche values were higher than Beckman in only one blood sample. In general, the median ALT values measured by Ortho were the highest, followed by Dade, Roche, and Beckman analyzers. For all five CAP samples, the mean ALT results reported by Indiana labs were not significantly different from the national ALT mean values (P > 0.05 for all comparisons by single-sample t test) (Table 4).
Table 3. ALT Values on 2006 CAP Samples Measured by Different Labs, Stratified According to Chemical Analyzer
Median ALT (Range)
The results of CAP samples were reported in the survey response by only 45 labs.
Ortho (n = 14)
Roche (n = 6)
Dade (n = 8)
Beckman (n = 12)
Other (n = 5)
Table 4. Results of CAP Samples Measured by Indiana Labs Compared with National Data
The national data are expressed as the mean and standard deviation for each instrument. The national means and standard deviations are the weighted means and standard deviations of each manufacturer's instruments. None of the Indiana ALT mean values significantly differed from the national ALT mean values by single-sample t test (all P > 0.05).
Abbreviation: SD, standard deviation.
To assess the relative importance of the chemical analyzer versus the method employed to establish the ALT ULN, we conducted a two-way analysis of variance in men and women separately. The ALT ULN was significantly different across different chemical analyzers, after controlling for the method used (P < 0.01) (Tables 5 and 6). However, after controlling for the chemical analyzer type, the relationship between the method used and ALT ULN was not statistically significant (Tables 7 and 8).
Table 5. ALT ULN for Labs Using Different Chemical Analyzers After Controlling for Methods Used
ALT ULN, Mean (95% CI)
Methods used to establish the ULN include manufacturer's recommendation, testing healthy volunteers, or both.
Abbreviation: CI, confidence interval.
Table 6. Differences in ALT ULN Between Different Analyzers, After Controlling for Method Used
Difference Between Mean (95% CI) of ALT ULN
Methods used to establish the ULN include manufacturer's recommendation, testing healthy volunteers, or both.
Table 7. The ALT ULN for Different Labs Using Different Methods to Establish Their ULN, After Controlling for the Type of Chemical Analyzer
Method Used to Establish ULN
ULN for ALT, Mean (95% CI)
Abbreviation: CI, confidence interval.
Based on manufacturer recommendation only
Based on testing healthy volunteers
Table 8. Differences in ALT ULN Depending on the Method Used to Establish the ULN After Controlling for Analyzer
Methods Used for Establishing ULN
Difference Between Mean Values (95% CI) of ALT ULN*
Abbreviation: CI, confidence interval.
P > 0.05.
Testing healthy volunteer versus manufacturer's recommendation
Testing healthy volunteer versus both testing healthy volunteer and manufacturer's recommendation
Both testing healthy volunteer and manufacturer's recommendation versus manufacturer's recommendation alone
The Clinical and Laboratory Standards Institute (CLSI) document “Defining, Establishing, and Verifying Reference Intervals in the Clinical Laboratory” offers a protocol for the determination of reference intervals that meet the minimum requirements for reliability and usefulness.6 The CLSI (formerly the National Committee on Clinical Laboratory Standards) recommends that the best method for establishing a reference interval is to “collect samples from a sufficient number of qualified reference individuals to yield a minimum of 120 samples for analysis, by nonparametric means, for each partition (eg, sex, age range).”6 However, it is well recognized that few small laboratories and even chemical analyzer manufacturers perform such rigorous analyses. Therefore, the CLSI recommends that individual laboratories verify reference intervals established elsewhere. This can be done in two ways: (1) a laboratory with a previously established reference interval can verify that interval by transference using the CLSI document EP09 protocol and completely avoid collecting samples from reference individuals and (2) a laboratory can verify a reference interval by collecting “as few as 20 samples from qualified reference population.”6
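To make the nonparametric approach concrete, the following minimal Python sketch estimates the central 95% of a set of reference values for one partition and enforces the 120-sample minimum quoted above. It is an illustration under stated assumptions, not a reproduction of the CLSI protocol: the function name is hypothetical, and the rank positions 0.025(n + 1) and 0.975(n + 1) with linear interpolation between neighboring ordered values are assumed here as one common rank-based convention.

```python
def nonparametric_reference_interval(values, min_samples=120):
    """Estimate the central 95% interval of reference values by rank.

    Assumption: the limits sit at fractional ranks 0.025*(n + 1) and
    0.975*(n + 1) of the ordered values, interpolated linearly.
    """
    n = len(values)
    if n < min_samples:
        raise ValueError(f"need at least {min_samples} samples, got {n}")
    ordered = sorted(values)

    def value_at_rank(rank):
        # Keep the 1-based fractional rank inside the observed data.
        rank = max(1.0, min(rank, float(n)))
        lower = int(rank)            # whole part of the rank
        fraction = rank - lower      # distance toward the next ordered value
        upper = min(lower + 1, n)
        low_value = ordered[lower - 1]
        return low_value + fraction * (ordered[upper - 1] - low_value)

    return value_at_rank(0.025 * (n + 1)), value_at_rank(0.975 * (n + 1))


# Synthetic placeholder data (the integers 1..120), not real ALT measurements.
example_values = list(range(1, 121))
lower_limit, upper_limit = nonparametric_reference_interval(example_values)
print(f"lower limit = {lower_limit:.3f}, upper limit = {upper_limit:.3f}")
```

In this sketch the upper limit returned for a partition would play the role of the ULN. A lab with only 20 to 25 volunteers, as many respondents reported, would be rejected by the sample-size check and could at most verify an existing interval.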
Our study makes several interesting and clinically important observations. First, it confirms that there is wide variability for ALT ULN across different clinical labs. Second, for a given CAP sample, there is a statistically significant analyzer-to-analyzer variability, but these differences were not clinically significant. Third, the majority of labs established their ULN based on the manufacturers' recommendations, rather than by testing healthy volunteers as is generally believed. Even in the labs that tested healthy volunteers, the process was quite haphazard in terms of the number of individuals tested, frequency of testing, and criteria for choosing healthy volunteers. Fourth, after controlling for the type of chemical analyzer, we did not find a significant relationship between the methods used to establish ULN and the ALT ULN values. Contrary to popular belief, our findings do not support the notion that suboptimal healthy volunteer testing is the major reason why ALT ULN is highly variable among different clinical labs. Our study shows that not many labs test healthy volunteers to establish ALT ULN.
Neuschwander-Tetri et al.7 surveyed 11 academic labs and reported that the ULN for ALT was quite variable, suggesting that the primary factor contributing to the widely divergent ALT ULN values is not analyzer-to-analyzer variability but must be related to the characteristics of the healthy volunteers tested by each lab. Although this may be the case at tertiary care academic labs, it appears that the majority of community labs establish their ALT ULN based on manufacturers' recommendations for different analyzers. Thus, the problem of widely variable ALT ULN could be related to variable reference intervals established by different manufacturers. This leads us to speculate that the primary problem lies in the methodology by which different manufacturers establish their reference intervals. This information is not available publicly, and our efforts to obtain such details were unsuccessful, except for Ortho, which provided us with a copy of their instruction manual. Ortho lists a reference range for ALT of 21–73 U/L for adult males and 9–52 U/L for adult females, and these limits represent the central 95% of results from an internal study of 2,444 apparently healthy adults (547 females and 1,897 males) (Vitros Chemistry Products, ALT Slides, Instructions for Use, version 3.0).
One potential limitation of this study is our inability to consistently obtain from all labs surveyed the specific model of chemical analyzer used to determine ALT values. Because there is a statistically significant analyzer-to-analyzer variability for a given CAP sample, it is possible that the specific model used would further add to the variability of ALT values reported. However, given the relatively small number of total laboratories included in this survey, we could not perform analyses separately on different models of the same manufacturer. It should also be noted that our survey question about the frequency of reference interval evaluation might be misleading. Laboratories typically “establish” a reference interval at the start of using a method, and then periodically “validate” that the reference interval is still valid (in fact, this is required by CLIA regulations). It may be that the different answers about frequency of testing healthy individuals reflected uncertainty about what was being asked of the respondents.
Given that ALT is one of the most important and most widely ordered laboratory tests, it is unacceptable to have such a wide variability in its ULN, regardless of the origin of the problem. Based on our findings, it may be possible to minimize ALT ULN variability by (1) asking each lab to follow manufacturers' recommendations and (2) assuring that manufacturers of different analyzers establish the reference interval based on rigorous criteria and a strictly defined reference population. We recommend that a reference population should be healthy with normal body weight and should have no underlying acute or chronic illnesses, no significant alcohol consumption, and no intake of prescription or nonprescription medicines, herbal compounds, or dietary supplements. Furthermore, experts and various agencies should start exploring outcome-based reference intervals for ALT, rather than population-based reference values as currently practiced.
We thank Dr. Brent Neuschwander-Tetri (Saint Louis University) for his thoughtful comments.