Parametric Versus Nonparametric Statistical Tests: The Length of Stay Example


  • Munirih Qualls MD, MPH,

    1. From the Department of Emergency Medicine, Brigham and Women’s Hospital (MQ, DJP, JDS), Boston, MA; Harvard Medical School (DJP, JDS), Boston, MA; and the Division of Emergency Medicine, Children’s Hospital Boston (DJP), Boston, MA.
  • Daniel J. Pallin MD, MPH,

  • Jeremiah D. Schuur MD, MHS


  • Presented at the Society for Academic Emergency Medicine annual meeting, New Orleans, LA, May 14–17, 2009.

  • Disclosures: Part of Dr. Schuur’s time is supported by a Jahnigen Career Development Award, funded by the Atlantic Philanthropies and the Hartford Foundation.

  • Supervising Editor: Gary Gaddis, MD, PhD.

Address for correspondence: Munirih Qualls, MD, MPH; e-mail: Reprints will not be available.


Objectives:  This study examined selected effects of the proper use of nonparametric inferential statistical methods for analysis of nonnormally distributed data, as exemplified by emergency department length of stay (ED LOS). The hypothesis was that parametric methods have been used inappropriately for evaluation of ED LOS in most recent studies in leading emergency medicine (EM) journals. To illustrate why such a methodologic flaw should be avoided, a demonstration, using data from the National Hospital Ambulatory Medical Care Survey (NHAMCS), is presented. The demonstration shows how inappropriate analysis of ED LOS increases the probability of type II errors.

Methods:  Five major EM journals were reviewed, January 1, 2004, through December 31, 2007, and all studies with ED LOS as one of the reported outcomes were reviewed. The authors determined whether ED LOS was analyzed correctly by ascertaining whether nonparametric tests were used when indicated. An illustrative analysis of ED LOS was constructed using 2006 NHAMCS data, to demonstrate how inferential testing for statistical significance can deliver differing conclusions, depending on whether nonparametric methods are used when indicated.

Results:  Forty-nine articles were identified that studied ED LOS; 80% did not perform a test of normality on the ED LOS data. Data were not normally distributed in all 10 of the studies that did perform such tests. Overall, 43% failed to use appropriate nonparametric methods. Analysis of NHAMCS data confirmed that failure to use nonparametric bivariate tests results in type II statistical error and in multivariate models with less explanatory power (a smaller R² value).

Conclusions:  ED LOS, a key ED operational metric, is frequently analyzed incorrectly in the EM literature. Applying parametric statistical tests to such nonnormally distributed data reduces power and increases the probability of a type II error, which is the failure to find true associations. Appropriate use of nonparametric statistics should be a core component of statistical literacy because such use increases the validity of ED research and quality improvement projects.

ACADEMIC EMERGENCY MEDICINE 2010; 17:1113–1121 © 2010 by the Society for Academic Emergency Medicine

Statistical literacy is a critical skill for users of the medical literature, for clinicians and researchers alike. A key premise of evidence-based medicine (EBM) is that clinicians should depend on primary medical literature to inform patient care decisions.1 To practice EBM well requires the ability to understand and recognize sources of bias in the medical literature. Biased studies are more likely to derive incorrect conclusions that can mislead practitioners of EBM. Bias can occur due to faulty study design and/or methodology, as well as inappropriate choice of inferential statistical tests. Consequently, biostatistical training has become more emphasized in medical schools2 and emergency medicine (EM) residency programs.3

Despite this recent increased emphasis on statistical literacy, many physicians cannot demonstrate competence with basic biostatistical concepts.1,4,5 This leads to mistakes in statistical analysis6 and reluctance to conduct research.7,8 This limitation is particularly germane to EM, which has been highly self-critical regarding research methodology.9,10 Although the rigor of EM research has increased with the maturation of the specialty, and the creation of dedicated research training activities such as research fellowships,11 there is still room for improvement. Appropriate use of nonparametric statistical analyses for nonnormally distributed data represents just such an opportunity for improvement.6,12 EM researchers must understand how to choose appropriate statistical methods, and clinicians and peer reviewers who read their studies should be able to recognize basic errors.

Emergency department length of stay (ED LOS) is a continuously distributed interval variable with a nonnormal distribution due to its frequent high degree of skew, as illustrated in Figure 1. ED LOS is an important, frequently studied outcome variable in EM operations research because it is a key indicator of operational efficiency. Parametric statistical tests are usually inappropriate13–17 for analysis of ED LOS. Since the Institute of Medicine identified ED crowding as an obstacle to high-quality emergency care, and a series of studies have linked crowding to adverse safety and quality outcomes, ED LOS has become a proxy for quality-of-care processes.18–23 The National Quality Forum (NQF) has endorsed median ED LOS as an indicator of safety and efficiency,24 and the Joint Commission announced that it will include the NQF’s ED LOS measures in its 2010 hospital specifications manuals.25 These are the first steps toward inclusion of ED LOS in mandatory hospital quality measures, making it likely that within a few years every hospital’s ED LOS will be available for the public to view on the Medicare website, Hospital Compare.26

Figure 1.

 Distribution of ED LOS in the NHAMCS (n = 26,618). The graph is truncated at ED LOS ≤ 1,800 minutes. ED LOS exceeded 1,800 minutes at 123 visits. LOS = length of stay; NHAMCS = National Hospital Ambulatory Medical Care Survey.

Our study has two goals. The first goal is to test the hypothesis that inferential statistical analysis of ED LOS in published studies is usually performed inappropriately, because of a suspicion that ED LOS is usually analyzed using parametric methods. The second goal is to demonstrate, by example, that inappropriate use of parametric methods increases the probability of committing a type II error. We present several analyses of ED LOS, using data from the National Hospital Ambulatory Medical Care Survey (NHAMCS). We test the theory that type II errors, and not type I errors, predominate when inappropriate parametric tests are used.17


Study Design

We hand-reviewed all original research articles published between January 1, 2004, and December 31, 2007, in Academic Emergency Medicine, the American Journal of Emergency Medicine, Annals of Emergency Medicine, the Canadian Journal of Emergency Medicine, and Emergency Medicine Journal. The first three journals are the three highest ranked U.S. EM journals by impact factor.27 The others are the two leading English language non-U.S. EM journals. Using the 2006 NHAMCS database, we conducted illustrative analyses of ED LOS using both parametric and nonparametric methods.

Data Sources

Our literature review included any original research article using ED LOS as a primary or secondary outcome. We used the ED Benchmarking Alliance consensus definition of ED LOS: time from ED arrival to discharge, admission, or death in the ED.28 For our illustrative analyses, we used the ED component of the 2006 NHAMCS. This is a multistage probability survey of U.S. ED visits in institutional, general, and short-stay hospitals, whose methods have been detailed previously.29 NHAMCS defines ED LOS as “length of visit, calculated from arrival time to time of discharge from the ED.”29

Study Protocol

We determined standards used to describe or test nonnormally distributed data and then compared the selected articles against these standards (Table 1). Descriptor of central tendency was categorized as mean, median, or both. For all other criteria, we determined a rating of “Yes” or “No” based on the statistical techniques used. To assess whether a test of normality was performed, we recorded whether the authors stated that they performed a test of normality on the ED LOS data or if they explicitly mentioned the distribution of ED LOS. For articles that did not describe the distribution of ED LOS data, we considered ED LOS to have a nonnormal distribution, unless a normal distribution was reported or unless sufficient data were provided for us to determine that ED LOS was normally distributed. When ED LOS was not normal, we deemed the study to have conducted appropriate nonparametric analysis if it met three criteria: 1) reported a median as the description of central tendency, 2) used appropriate nonparametric bivariate tests of significance (e.g., Mann-Whitney, Wilcoxon, or Kruskal-Wallis),30 and 3) conducted appropriate log-transformations of ED LOS data prior to regression analyses.31

Table 1.
Standards Used to Describe or Test Nonnormally Distributed Data

| Criterion | Appropriate Standard |
|---|---|
| Test of normality | LOS data must be evaluated for nonnormal distribution. |
| Descriptor of central tendency | LOS data must be described using median, or median and mean. |
| Aggregation of data | Aggregate LOS data, such as weekly mean ED LOS, should not be used in statistical analyses. |
| Bivariate statistical tests | Nonparametric tests of significance (Mann-Whitney, Wilcoxon, or Kruskal-Wallis) should be used to analyze data with a nonnormal distribution, rather than parametric tests such as Student’s t-test. |
| Transformation of LOS in regression analysis | Nonnormal LOS data must be appropriately transformed prior to regression analysis (e.g., log-transformation). |

  1. LOS = length of stay.

In contrast, we deemed a study with nonnormal ED LOS data not to have used appropriate methods if it: 1) reported only the mean as a measure of central tendency for nonnormally distributed ED LOS data, 2) used inappropriate bivariate tests of significance (such as Student’s t-test), and 3) failed to perform a log-transformation prior to linear regression analysis.

In our NHAMCS analyses, we included all adult (age ≥ 18 years) ED visits in the NHAMCS 2006 data set and evaluated associations between ED LOS and three categories of variables, which were hospital type, visit events, and patient demographic characteristics. To evaluate bivariate predictors of ED LOS, we selected 10 dichotomous variables for which there existed a clinical or administrative rationale to suggest an association with ED LOS. The hospital type variable was urban hospital defined by location in a metropolitan statistical area (Yes/No). The visit event variables were diagnostic tests performed (Yes/No), was imaging performed (Yes/No), was care by midlevel provider (nurse practitioner or a physician assistant; Yes/No), and arrival by ambulance (Yes/No). The patient demographic characteristics were Hispanic ethnicity (Yes/No), race (white vs. nonwhite), sex (male/female), age (18–65 years, > 65 years), and presenting level of pain (dichotomized to severe/not severe [none/mild/moderate]).

For the multivariate analysis, we included the 10 variables listed above, plus five additional categorical variables that were not eligible for bivariate analysis because they could not easily be dichotomized. The additional variables were as follows:

  1. Hospital ownership (proprietary; voluntary nonprofit; government, nonfederal).
  2. Hospital region (Northeast, Midwest, South, West).
  3. Number of medications given in the ED.
  4. Day of the week.
  5. Number of procedures performed (e.g., wound care or IV fluids) during the visit.

Outcome Measures

The primary outcome measure in the literature review portion of this manuscript was the proportion of published articles that inappropriately employed parametric methods for description or analysis of ED LOS data.

Data Analysis

We analyzed the results of the literature review using descriptive statistics. We tested the NHAMCS data for normality using three methods. The Anderson-Darling test is a formal test of normality; a p-value less than 0.01 signifies a nonnormal distribution.32 The second method is “mean-median difference,” which represents the degree of skewness by calculating the difference between the median and the mean as a percentage of the mean. A small percentage difference (i.e., 1% to 5%) suggests that the mean and the median are close to each other, and the data are likely to be normally distributed. A larger difference suggests that the mean and the median are far from each other and the data are not normally distributed.33 The third method is the “standard deviation to mean ratio.” If the standard deviation (SD) is more than half of the mean, the distribution is likely to be nonnormal.33
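The two summary-statistic screens described above reduce to simple arithmetic. The following is a minimal Python sketch, not the SAS code used in the study. The function names and the `looks_nonnormal` helper are hypothetical, the 5% and 0.5 cutoffs are the rules of thumb stated in the text, and the inputs are the rounded summary values reported for adult ED LOS in the 2006 NHAMCS (mean 229, median 164, SD 257 minutes).

```python
def mean_median_difference_pct(mean, median):
    """Gap between mean and median, expressed as a percentage of the mean."""
    return abs(mean - median) / abs(mean) * 100.0


def sd_to_mean_ratio(sd, mean):
    """Standard deviation divided by the mean."""
    return sd / abs(mean)


def looks_nonnormal(mean, median, sd, max_gap_pct=5.0, max_sd_ratio=0.5):
    """Hypothetical helper: flag likely nonnormality if either screen trips.

    The default cutoffs are the rules of thumb quoted in the text, not
    formal significance thresholds.
    """
    gap_too_large = mean_median_difference_pct(mean, median) > max_gap_pct
    sd_too_large = sd_to_mean_ratio(sd, mean) > max_sd_ratio
    return gap_too_large or sd_too_large


# Rounded summary statistics reported for adult ED LOS (minutes).
gap_pct = mean_median_difference_pct(229, 164)
ratio = sd_to_mean_ratio(257, 229)
flag = looks_nonnormal(229, 164, 257)

print(f"mean-median difference: {gap_pct:.1f}% of the mean")
print(f"SD-to-mean ratio: {ratio:.2f}")
print(f"likely nonnormal: {flag}")
```

Both screens need only published summary statistics, which is why they can be applied to articles that report no formal test. A formal test such as Anderson-Darling requires the raw observations; in Python, `scipy.stats.anderson` is one option if SciPy is available, though it reports a test statistic with critical values rather than a p-value.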

We examined the bivariate relationship between NHAMCS ED LOS and the 10 dichotomizable covariates with parametric (t-test) and nonparametric (Wilcoxon rank sum test) bivariate tests.33
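To make the contrast between the two bivariate approaches concrete, the sketch below compares a test on means with a rank-based test on a small, entirely hypothetical data set in which one group contains a single very long stay. It is not the study's analysis: for a standard-library-only example, the comparison of means uses a Welch-type statistic with a normal approximation to the p-value rather than Student's t distribution, and the rank sum test uses the usual normal approximation with a tie correction. All names and data values are assumptions made for illustration.

```python
import math
from statistics import mean, variance


def mean_difference_test(a, b):
    """Welch-type z statistic for a difference in means (normal approximation)."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    z = (mean(a) - mean(b)) / se
    return z, math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value


def rank_sum_test(a, b):
    """Wilcoxon rank sum (Mann-Whitney U) test, tie-corrected normal approximation."""
    pooled = list(a) + list(b)
    n1, n2, n = len(a), len(b), len(a) + len(b)
    order = sorted(range(n), key=lambda i: pooled[i])
    ranks = [0.0] * n
    tie_term = 0
    i = 0
    while i < n:
        j = i
        # Extend j over a run of tied values.
        while j + 1 < n and pooled[order[j + 1]] == pooled[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        t = j - i + 1
        tie_term += t**3 - t
        i = j + 1
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1))))
    z = (u1 - mu) / sigma
    return z, math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value


# Hypothetical ED LOS values in minutes. Group A is typically faster,
# but includes one patient who boarded for 3,000 minutes.
group_a = list(range(100, 195, 5)) + [3000]
group_b = list(range(150, 250, 5))

z_mean, p_mean = mean_difference_test(group_a, group_b)
z_rank, p_rank = rank_sum_test(group_a, group_b)

print(f"difference in means: z = {z_mean:.2f}, p = {p_mean:.3f}")
print(f"rank sum test:       z = {z_rank:.2f}, p = {p_rank:.5f}")
```

In this constructed example the single outlier inflates both the mean and the variance of group A, so the comparison of means is far from significant, while the rank-based test detects that group A's values are generally lower. With SciPy available, `scipy.stats.ttest_ind` and `scipy.stats.mannwhitneyu` would be the more usual choices.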

We created a multivariate regression model from the 15 independent variables (10 dichotomous and five nondichotomous, as described above) using NHAMCS data. We constructed one model with raw ED LOS as the dependent variable and another with log-transformed ED LOS. The R² values, the F-values, and the p-values of the two models were compared. The meaning of these parameters is defined elsewhere.33 In brief, the R² is a measure of the total amount of variability described by the equation constituting a multivariate linear regression. The F-value represents the proportion of the variance explained by each predictor variable. The p-value is a means of interpreting the F-value in terms of statistical significance. For purposes of the modeling exercise, we have focused on the change in R² and F-values that occurs when one log-transforms the dependent variable (ED LOS) to better normalize its distribution, without addressing issues of collinearity, residual analysis, and outliers.
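The effect of log-transforming a right-skewed dependent variable on R² can be sketched with a toy example. The code below is an illustration under stated assumptions, not the study's 15-variable model: it fits a one-predictor least-squares line to a small synthetic data set whose outcome is log-normal by construction (the predictor levels and noise values are invented), and compares R² on the raw and log scales. The function name `r_squared` is hypothetical.

```python
import math


def r_squared(x, y):
    """R^2 of a simple least-squares line y = b0 + b1 * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy * sxy / (sxx * syy)


# Synthetic data: log(y) = level + noise, so y is right-skewed and its
# spread grows with the predictor. Values are assumptions for illustration.
x, y = [], []
for level in range(4):
    for noise in (-1.0, 0.0, 1.0):
        x.append(level)
        y.append(math.exp(level + noise))

r2_raw = r_squared(x, y)
r2_log = r_squared(x, [math.log(value) for value in y])

print(f"R^2 with raw outcome:             {r2_raw:.2f}")
print(f"R^2 with log-transformed outcome: {r2_log:.2f}")
```

Because the relationship is linear only on the log scale, the raw-scale fit leaves more variance unexplained, which mirrors the direction of the change reported for the NHAMCS models. R² values from models with differently scaled outcomes are compared here only in the descriptive sense used in the text.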

We considered a two-sided p < 0.01 to be significant, as recommended by the National Center for Health Statistics for analysis of NHAMCS data, due to the large size of the data set and the frequency with which it is queried.34 We performed bivariate parametric tests and multivariate linear regression models using both standard and weighted survey techniques that account for the design characteristics of NHAMCS. We performed nonparametric tests using standard (unweighted) techniques only, because standard nonparametric tests cannot account for survey weights. All statistical analyses were performed with SAS 9.1 software (SAS Institute, Cary, NC).


Literature Review

We identified 49 articles with ED LOS as a primary or secondary outcome. Ten of the 49 (20%) articles included a test of normality on the ED LOS data; all 10 of these articles reported that the data were not normally distributed. Of the 39 articles that did not perform a test of normality, 17 reported sufficient data to allow calculation of the distribution of the ED LOS data set using two of the summary methods described above. Ten of these 17 articles (59%) had a nonnormal distribution of ED LOS. The ED LOS of the remaining 22 articles was assumed to be nonnormally distributed, consistent with the methodology described above.

Sixteen of 49 articles (33%) appropriately accounted for the nonnormal distribution of ED LOS by reporting median ED LOS and analyzing ED LOS using a nonparametric bivariate test or transforming ED LOS prior to multivariate regression. Twenty-one of the 49 articles (43%) failed to account for the nonnormal distribution of ED LOS and used only parametric methods of analysis. The remaining 12 articles used a combination of parametric and nonparametric methods and descriptive statistics.

Among all 49 articles with ED LOS as an outcome, 47% exclusively reported mean LOS as the description of central tendency. Of the 32 articles that performed bivariate tests of significance, 14 (44%) used only the parametric Student’s t-test. Of the 14 articles that created multivariate regression models with ED LOS as the dependent variable, 10 (71%) used only raw ED LOS data.

Two articles (4%) used other appropriate techniques to account for skewed data: one included a histogram, a graphic representation of normality, and another reported trimming the data of outliers that contributed to a rightward skew of the data. Five articles (10%) performed statistical analyses on aggregate mean LOS data. Three used daily mean, one used weekly mean, and one used monthly mean. This is an inappropriate approach, as it does not change the underlying nonnormal distribution of ED LOS data and does not have the power advantages of nonparametric statistical techniques.

NHAMCS Analysis

Mean ± SD adult ED LOS for U.S. EDs was 229 ± 257 minutes, and median ED LOS was 164 minutes (interquartile range [IQR] = 93–272 minutes). Adult ED LOS was nonnormally distributed in the 2006 NHAMCS data set by all three tests of normality. The Anderson-Darling test revealed nonnormality, with p < 0.005. Mean-median difference was 30% of the value of the mean. The mean to SD ratio was 0.9 (equivalently, an SD-to-mean ratio of about 1.1, well above the threshold of 0.5). A histogram graphically illustrates the rightward skew of adult ED LOS compared to the normal distribution (Figure 1). The distribution of ED LOS closely fits a log-normal distribution.

In our bivariate analysis, we found that three of the 10 variables evaluated as predictors of ED LOS did not meet our a priori threshold for statistical significance using the parametric Student’s t-test (Table 2). In contrast, all 10 variables were significant (p < 0.001) using the nonparametric Wilcoxon rank sum test. The three variables that were significant in nonparametric analysis and not significant in parametric analysis were: sex (p = 0.806), care by a midlevel provider (p = 0.021), and pain (p = 0.091).

Table 2.
Results of Bivariate Parametric and Nonparametric Analysis of ED LOS From the NHAMCS*

| Variable | Mean (95% CI), min | t-test p-value | Median (IQR), min | Wilcoxon rank sum test p-value |
|---|---|---|---|---|
| Urban hospital | 239 (236–243) | <0.001 | 174 (100–286) | <0.001 |
| Nonurban hospital | 157 (149–166) | | 109 (62–173) | |
| No diagnostic tests performed | 133 (128–138) | <0.001 | 88 (51–154) | <0.001 |
| Diagnostic tests performed | 251 (247–255) | | 183 (111–294) | |
| No imaging performed | 197 (193–202) | <0.001 | 127 (70–222) | <0.001 |
| Imaging performed | 264 (259–268) | | 205 (129–315) | |
| No care by midlevel provider | 230 (227–233) | 0.021† | 165 (94–273) | 0.006 |
| Care by midlevel provider | 216 (205–228) | | 154 (89–259) | |
| No arrival in ambulance | 208 (207–213) | <0.001 | 150 (105–355) | <0.001 |
| Arrival in ambulance | 291 (283–300) | | 214 (137–334) | |
| Hispanic ethnicity | 257 (248–267) | <0.001 | 184 (102–307) | <0.001 |
| Non-Hispanic ethnicity | 225 (222–228) | | 161 (91–268) | |
| Nonwhite race | 248 (242–254) | <0.001 | 180 (104–304) | <0.001 |
| White race | 222 (218–226) | | 158 (90–261) | |
| Female | 229 (225–234) | 0.806† | 167 (95–276) | 0.002 |
| Male | 228 (224–234) | | 159 (90–266) | |
| Age > 65 yr | 256 (249–263) | <0.001 | 200 (125–306) | <0.001 |
| Age < 65 yr | 223 (219–226) | | 155 (88–262) | |
| High pain | 225 (220–231) | 0.091† | 170 (100–280) | <0.001 |
| Low pain | 220 (216–224) | | 159 (91–260) | |

  1. IQR = interquartile range; LOS = length of stay; NHAMCS = National Hospital Ambulatory Medical Care Survey.

  2. * Based on 26,618 adult ED visits in the 2006 NHAMCS. Some variables were calculated based on smaller sample sizes, as missing values were excluded. Statistical calculations are based on unweighted techniques.

  3. † Variables that differed significantly between parametric and nonparametric test.

In our multivariate linear regression analysis, we found that log-transforming ED LOS resulted in a better fitting model than raw ED LOS, as determined by larger F-values (with the exception of “Hospital Ownership”) for the predictor variables, and a larger R² value for the regression model (Table 3). Only one variable was significant in the transformed ED LOS model and not significant in the raw ED LOS model (sex, p = 0.049). In both our bivariate and our multivariate analyses, we found no type I errors resulting from inappropriate use of parametric tests.

Table 3.
ED LOS Regression Models Using Raw and Transformed ED LOS Data From the 2006 NHAMCS*

| Predictor Variable | Raw ED LOS: F-value | Raw ED LOS: p-value | Log-transformed ED LOS: F-value | Log-transformed ED LOS: p-value |
|---|---|---|---|---|
| Urban hospital | 132 | <0.001 | 543 | <0.001 |
| Hospital ownership | 72.6 | <0.001 | 44.4 | <0.001 |
| Hospital region | 28.1 | <0.001 | 39.6 | <0.001 |
| Imaging performed | 87.9 | <0.001 | 407 | <0.001 |
| Diagnostic tests performed | 148 | <0.001 | 817 | <0.001 |
| Total number of procedures | 18.7 | <0.001 | 61.2 | <0.001 |
| Day of week (weekday/weekend) | 10.3 | <0.001 | 37.8 | <0.001 |
| Care by midlevel provider (Yes/No) | 0.390 | 0.531 | 2.40 | 0.121 |
| Number of medications administered | 712 | <0.001 | 1163 | <0.001 |
| Arrival by ambulance (Yes/No) | 126 | <0.001 | 127 | <0.001 |
| Race (white/nonwhite) | 89.2 | <0.001 | 106 | <0.001 |
| Ethnicity (Hispanic/non-Hispanic) | 31.9 | <0.001 | 63.4 | <0.001 |
| Sex (male/female) | 3.85 | 0.049† | 8.66 | 0.003 |
| Age, yr | 69.8 | <0.001 | 187 | <0.001 |
| Pain (high/low) | 0.81 | 0.369 | 3.27 | 0.071 |

R² for regression model: raw ED LOS, 0.12; log-transformed ED LOS, 0.23.

  1. LOS = length of stay; NHAMCS = National Hospital Ambulatory Medical Care Survey.

  2. * Models based on 23,842 of 26,618 adult ED visits in the 2006 NHAMCS with complete data for all predictor variables. Statistical calculations are based on unweighted techniques.

  3. † Variables which differed in significance between parametric and nonparametric test.

Sensitivity Analysis: Weighted Survey Techniques

As the NHAMCS is a multistage probability sample, it is recommended that users account for the visit weights when drawing inferences from the data. Although we used the data as an example of a nonparametric data set and are not proposing conclusions based on relationships between variables, we performed bivariate tests and multivariate linear regression using both survey and unweighted techniques, as readers familiar with NHAMCS will expect. When the weighted techniques were used for the bivariate tests (t-tests), one more variable was nonsignificant with the t-test (Hispanic ethnicity, p = 0.02), but all other directions of effect and conclusions were unchanged from the results reported above. When the multivariate models were run with survey weights applied, two more variables were nonsignificant in the raw ED LOS model (hospital region, p = 0.02; and ownership, p = 0.08) and one in the log-transformed ED LOS model (hospital ownership, p = 0.12). Most importantly, no examples of type I error were identified when data were analyzed with weighted survey techniques.


Statistical literacy is important for emergency physicians engaged in research or quality improvement or interested in practicing EBM. To recognize sources of bias in research, readers should be able to identify biases associated with inappropriate experimental design or inappropriate statistical methods. “Parameter” and “parametric” are among the most frequently misunderstood words in clinical research. A parameter is defined as the unknown value of a variable in an entire population, which is derived by estimating the value of that variable from a random sample derived from that population. The value of the variable in the sample is called the “sample value” or the “point estimate.”

The accuracy and validity of a “nonparametric” inferential statistical test result do not rely on an assumption that the outcome and predictor variables are distributed normally in the source population. Nonparametric tests, unlike parametric tests, do not have to be “robust” against violations of the inherent mathematical assumptions of the test. An assumption of parametric tests is normality of the distribution of the data analyzed by the inferential test. Nonparametric statistical tests do not assume that the parameter is distributed normally, and thus they are appropriate and accurate (robust) even when the sample value is not distributed normally.

The easiest example to illustrate the difference between parametric and nonparametric is a comparison of mean versus median. The mean is inferior to the median as a summary of the central tendency of the data because the mean is a misleading indicator of central tendency when the data are skewed. In contrast, the median is said to be more robust, because it remains an accurate description of the central tendency (50th percentile) of the sample, even when the values of the individual data are skewed. In the case of income, for example, there is rightward skew, meaning that there are a few values that are very high. The presence of one billionaire in a less-developed country will change mean income significantly, without affecting median income. The mean is accurate as a summary statistic of central tendency only when the distribution is normal, or bell-shaped. On the other hand, the median is determined by identifying the middle value in the list of incomes and is an accurate description of the central tendency regardless of whether the data are skewed. In our NHAMCS example, sex provides a clear illustration of this principle. The mean LOS is not statistically different between men and women (228.6 vs. 229.5), while the median LOS differs significantly (159 vs. 167). This becomes apparent examining the distribution of ED LOS by percentile between men and women. Men had shorter LOS until approximately the 85th percentile, above which their ED LOS exceeded that of women. Thus the rightward skew of these outliers influences the mean enough to cause the populations to acquire a similar measure of central tendency when comparing means. However, by comparing ranks, it becomes clear that for the majority of the population (approximately 85% of patients), men have a shorter LOS than women.
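The income illustration above can be reproduced in a few lines. This is a minimal sketch with invented numbers: the income values and the size of the population are assumptions, chosen only to show how one extreme value moves the mean while leaving the median almost untouched.

```python
from statistics import mean, median

# Hypothetical annual incomes for 1,001 residents, evenly spread from 1,000 to 2,000.
incomes = list(range(1000, 2001))

# The same population after one billionaire moves in.
with_billionaire = incomes + [1_000_000_000]

mean_before, median_before = mean(incomes), median(incomes)
mean_after, median_after = mean(with_billionaire), median(with_billionaire)

print(f"mean:   {mean_before:,.1f} -> {mean_after:,.1f}")
print(f"median: {median_before:,.1f} -> {median_after:,.1f}")
```

The mean increases by several hundredfold, while the median shifts by half a unit. The same mechanism operates in ED LOS data, where a few very long stays dominate the mean.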

Length of stay in health care settings tends to have a rightward skew due to the minority of patients who have disproportionately long stays (Figure 1).13,14,35 Health care expenditures, utilization of health services, and consumption of unhealthy commodities are other common skewed variables.15 Distributions of data with heavy tails, extreme skews, or unknown population characteristics should be analyzed with nonparametric tests.17 Table 4 displays common statistical tests and explains why each is parametric or nonparametric.

Table 4.
Commonly Used Parametric and Nonparametric Statistical Tests

| Statistical Test | What Is Being Compared | Why Parametric or Nonparametric |
|---|---|---|
| Parametric statistical tests | | |
| Student’s t-test | Whether the mean is larger in one group vs. the other. | Only accurate if the values are distributed normally. |
| Linear regression | Slope, which is a mean. | Only accurate if four assumptions are met: normal distribution of dependent variable, homoscedasticity, lack of autocorrelation, and linearity. |
| Nonparametric statistical tests | | |
| Wilcoxon rank sum, Mann-Whitney U, and Kruskal-Wallis tests (inferential tests for ordinal data) | Whether values tend to be larger in one group vs. the other. | Result is accurate whether data are distributed normally or not. |
| Chi-square test (inferential test for nominal data) | Chance of falling into one group or another. | Result is accurate whether data are distributed normally or not. |

When authors and readers of the medical literature express concern about inappropriate use of parametric statistical tests, they often assume that inappropriate use of an inferential statistical test can result in an erroneously low p-value or an erroneously narrow 95% confidence interval (CI). This can potentiate reaching a conclusion that a difference between groups exists in the source population, when in reality no difference exists. In other words, it is widely and incorrectly assumed that inappropriate use of inferential tests potentiates making a type I error. In fact, the true problem with parametric tests is that they lack power when their assumptions are violated, as occurs when they are applied to analyze nonparametric data.17 When analyzing nonnormally distributed data with parametric tests, the analyst is more likely to fail to detect a difference in the source population when one truly exists. This is the definition of a type II error.17 Choice of the wrong statistical test is unlikely to result in type I error.

Our demonstration elucidates this principle. The 2006 NHAMCS ED LOS data had a strong rightward skew. If we had used the Student’s t-test to analyze the covariates, we would have concluded that care by a midlevel provider, and patient sex, were not associated with ED LOS. However, using the Wilcoxon rank sum test, we found that these variables were significantly associated with ED LOS. If applied to actual ED LOS analyses, this error could affect quality improvement projects or interhospital comparisons.

Failure to transform a skewed dependent variable during linear regression is similar to using parametric tests on a nonnormally distributed variable. The statistical methodology underlying linear regression depends on four assumptions, one of which is a normal distribution of the dependent variable. The violation of this assumption jeopardizes the fit of the model,36 resulting in a decreased R² value. This can affect the result of a study because an ill-fitting model underestimates the relationship between the dependent variable and the independent variables. It may be that the dependent variable is very strongly correlated with the set of independent variables, but if the dependent variable is highly skewed, the linear regression will not completely capture the magnitude of the relationship. The solution is to transform the nonnormal variable so that it approximates the normal distribution or use a nonlinear model equation that better fits the data. Figure 1 visually illustrates how a log-normal distribution closely fits the NHAMCS data, while a normal distribution does not. For highly skewed data, one can log-transform the dependent variable or use variations of multiple regression that better fit skewed data, such as Poisson regression and negative binomial regression.31

Statistical techniques that maximize power, therefore minimizing type II error, allow researchers to find significant differences between groups with smaller sample sizes.15,37 This is particularly salient for ED LOS, which is often analyzed in small single-site quality improvement projects. In these small-scale projects, the inherently lower power of any selected statistical test, due to the predictable effect of the likely small sample size, is likely to affect the results and conclusions. If a study examines an operational intervention aimed to reduce median ED LOS, comparison of means can cause the investigator to miss a true reduction in ED LOS. For example, a reduction in lab test turnaround time that accounts for 5 minutes per patient may not appear significant if a small number of patients are boarding in the ED with ED LOS of greater than 24 hours, whereas it may actually affect the median ED LOS and represent improved ED operations and care. Our NHAMCS demonstration had an extremely large sample (n = 26,618) and was therefore highly powered. Nonetheless, we were able to demonstrate type II error in both bivariate and multivariate tests conducted with inappropriate parametric methods. This effect would be amplified in a smaller study, which is inherently lower powered, due to the smaller sample size, and thus likely to yield p-values closer to the traditional “cutoff” of 0.05.

In our NHAMCS demonstration, we analyzed ED LOS in the 2006 NHAMCS data set, a chart review survey designed to allow national estimates of ED visits by multistage probability sampling. When analyzing NHAMCS to determine accurate estimates of national visits, it is important to use weighted survey techniques in statistical software. We performed analyses with both weighted survey statistics and unweighted statistics. We did not find any material differences between these techniques, so we report the unweighted results. To answer our study question, whether applying a parametric test to a nonparametric data set leads to divergent results, we believe that the unweighted statistics provide a sufficiently accurate and informative answer. First, we used NHAMCS as an illustrative data set to show the relationship between nonparametric and parametric tests when applied to nonparametric data. We are not proposing conclusions about the relationships between the actual variables (for example, patient race and ED LOS). Alternatively, we could have generated a simulated data set for the same analysis,17 but as ED LOS in NHAMCS is nonnormally distributed, it is a reasonable data set in which to test this hypothesis. Second, the unweighted results are more easily comparable between techniques, as there are no straightforward nonparametric tests that account for survey weights. By design, nonparametric tests operate on each observation’s rank, not its value, and so are not directly applicable to weighted observations. Statistical models have been developed to apply nonparametric statistics to weighted samples, but they are not in common use.38–40 Focusing on the use of survey weights risks “losing the forest for the trees,” as we found no material difference in results between standard techniques and complex survey techniques.
The major finding of our analysis is that parametric and nonparametric statistical tests will produce material differences in statistical results if applied to nonparametric data.
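
The statement that nonparametric tests depend on ranks rather than values can be illustrated with a minimal sketch of the Mann-Whitney U statistic. The helper functions and the LOS values are hypothetical and written for this illustration; they are not the software used in our analysis.

```python
def midranks(values):
    """Return 1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    return ranks


def mann_whitney_u(a, b):
    """U statistic for group a, computed from the pooled ranks."""
    ranks = midranks(list(a) + list(b))
    rank_sum_a = sum(ranks[:len(a)])
    return rank_sum_a - len(a) * (len(a) + 1) / 2


# Invented LOS values in minutes; group_a contains one very long stay.
group_a = [95, 110, 130, 150, 1500]
group_b = [100, 120, 140, 160, 180]
# Same data, but the longest stay is made ten times longer.
group_a_extreme = [95, 110, 130, 150, 15000]

u_original = mann_whitney_u(group_a, group_b)
u_extreme = mann_whitney_u(group_a_extreme, group_b)
mean_original = sum(group_a) / len(group_a)
mean_extreme = sum(group_a_extreme) / len(group_a_extreme)
print(u_original, u_extreme)        # identical: the ranks did not change
print(mean_original, mean_extreme)  # very different: the mean uses the values
```

Because the statistic is built entirely from ranks, lengthening the longest stay leaves U unchanged while the group mean changes greatly. The same property is what makes it unclear how a per-observation survey weight should enter such a test.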


In analyzing the results of our literature review, we assumed that ED LOS data are nonnormally distributed and that the use of nonparametric methods might have changed the result of the studies in some cases. We cannot definitively prove this assumption without access to the data sets of all articles. However, LOS in health care settings is known to be skewed,13,14 and our NHAMCS demonstration shows that the use of nonparametric methods can quantitatively affect results at alpha levels commonly reported in medical journals. Additionally, all articles that performed tests of normality on their data elected to use nonparametric tests, which implies a widespread belief among these authors that formally assessing normality before choosing a statistical test does affect the results of a study. Our review is also likely affected by publication bias, since studies with a negative parametric result may not have reached publication. Our study investigates the effect of data distributions on hypothesis testing as opposed to parameter estimates. Hypothesis testing is frequently used in EM literature and quality improvement projects, and an understanding of the strengths and limitations inherent in different statistical methods, and their influence on type I and type II error rates, is important. An alternative strategy that is sometimes useful is graphical illustration of data (e.g., histograms) that will show distribution differences.41 However, this approach does not provide a standard way for readers to draw inferences from the data displayed and is still relatively uncommon.


Understanding the descriptive and analytic statistics that underlie EM research is an essential part of the practice of EM.1,4,5 Emergency department length of stay has been identified as a key process metric of ED function, and numerous studies have shown an association between ED length of stay and patient satisfaction,21,22 quality of care,19,20 and departmental revenue.23 With the recent National Quality Forum approval and Joint Commission implementation of ED length of stay as a national metric of quality care, there will be pressure to reduce ED LOS, as there has previously been on door-to-balloon time in ST-elevation myocardial infarction.25 However, in the current EM literature, most studies analyzing ED length of stay use inappropriate statistical methods that were designed for parametric data. This can bias conclusions regarding quality and efficiency and may affect reimbursement in the future. We cannot afford to miscalculate our own outcomes.

Contrary to popular belief, inappropriate use of parametric tests results in an increased probability of a type II error, not a type I error. If a study reports a statistically significant finding after using inappropriate parametric statistics, the finding would also be statistically significant if derived via nonparametric methods. It should therefore be considered valid, because the result would be expected to have a lower p-value, which is to say a lower likelihood of having occurred by chance alone, if the data were reanalyzed with appropriate nonparametric methods.17
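
The loss of power from applying a parametric test to skewed data can be sketched with a small simulation. This is an illustration under stated assumptions, not a reproduction of our NHAMCS analysis: both groups are drawn from hypothetical log-normal distributions, the group size, effect size, and seed are arbitrary, and both test statistics are compared with the large-sample normal critical value of 1.96 rather than exact reference distributions.

```python
import math
import random


def welch_statistic(a, b):
    """Difference in means divided by its unequal-variance standard error."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    var_a = sum((v - mean_a) ** 2 for v in a) / (len(a) - 1)
    var_b = sum((v - mean_b) ** 2 for v in b) / (len(b) - 1)
    return (mean_a - mean_b) / math.sqrt(var_a / len(a) + var_b / len(b))


def rank_sum_statistic(a, b):
    """Mann-Whitney U standardized by its null mean and SD (assumes no ties)."""
    pooled = sorted([(v, 0) for v in a] + [(v, 1) for v in b])
    rank_sum_a = sum(r for r, (_, g) in enumerate(pooled, start=1) if g == 0)
    n1, n2 = len(a), len(b)
    u = rank_sum_a - n1 * (n1 + 1) / 2
    return (u - n1 * n2 / 2) / math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)


def estimate_power(n_per_group=30, log_shift=0.6, sigma=1.0, reps=500, seed=1):
    """Share of simulated studies in which each test rejects at about alpha = 0.05."""
    rng = random.Random(seed)
    hits_parametric = hits_rank = 0
    for _ in range(reps):
        # A true difference exists: group b is shorter on the log scale.
        a = [rng.lognormvariate(5.0, sigma) for _ in range(n_per_group)]
        b = [rng.lognormvariate(5.0 - log_shift, sigma) for _ in range(n_per_group)]
        hits_parametric += abs(welch_statistic(a, b)) > 1.96
        hits_rank += abs(rank_sum_statistic(a, b)) > 1.96
    return hits_parametric / reps, hits_rank / reps


power_parametric, power_rank = estimate_power()
print(f"Estimated power, comparison of means: {power_parametric:.2f}")
print(f"Estimated power, rank-based test: {power_rank:.2f}")
```

Under these assumptions, the rank-based test should detect the true difference in a larger share of simulated studies than the comparison of means, which corresponds to a lower type II error rate.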