Statistical presentation and analysis of ordered categorical outcome data in rheumatology journals


  • Michael P. LaValley

    Corresponding author
    1. Boston University School of Public Health, Boston University School of Medicine, Boston, Massachusetts
    • Boston University Arthritis Center, 715 Albany Street, A203, Boston, MA 02118
  • David T. Felson

    1. Boston University School of Public Health, Boston University School of Medicine, Boston Medical Center, Boston, Massachusetts



To assess the appropriateness of presentation of summary measures and analysis of ordered categorical (ordinal) data in three rheumatology journals in 1999, and to consider differences between basic and clinical science articles.


Six hundred forty-four full-length articles from the 1999 editions of 3 rheumatology journals were evaluated for inclusion of an ordinal outcome. Articles were classified as basic or clinical science, and the appropriateness of presentation and analysis of the ordinal outcome was assessed. Chi-square tests were used to evaluate differences in percentages.


Ordinal outcomes were identified in 175 (27.2%) of 644 articles. Only 69 (39.4%) had appropriate data presentation, and 111 (63.4%) had appropriate data analysis. Appropriate presentation was less common in basic science articles than in clinical science articles, but no significant difference was seen in the occurrence of appropriate analysis.


Ordinal data are common in rheumatology articles, but presentation usually does not conform to recommended guidelines.

Ordinal data are generated when observations are placed into ordered categories. Such data are often generated by scoring radiographs or histologic slides, or from evaluating questionnaire responses. Ordinal data contain more information than categorical data without ordering (nominal data), but do not contain as much information as continuously measured data. This makes presentation of summary measures and hypothesis testing with ordinal data challenging.

Previous analyses of medical research articles have suggested that ordinal outcome data are often presented or analyzed in ways that do not account for either the ordering or the categorical structure of the data (1–3). This can lead to biased estimates and reduced ability (low power) to detect important effects. Ordinal variables may be dichotomized as being above or below a fixed cut-off value and treated as binary (0/1), but this combines different levels and can sacrifice information from the original scale (1). Contingency table methods that are appropriate for unordered categorical data do not take advantage of ordering in the data, resulting in loss of information and difficulty in interpretation (1). Methods for continuous data, such as the mean, standard deviation, Student's t-test, and F test, make several assumptions (e.g., consistent spacing, symmetry, and normality of the data distribution) that are generally not satisfied by ordinal data. As noted by Altman and Bland, “Although some statistical methods, such as the t-test, are not sensitive to moderate departures from normality, it is generally preferable not to rely on this feature” (4).

To use the order information in ordinal data, but to avoid unnecessary assumptions, biostatistics textbooks (5) and journal articles (1, 3, 4, 6–9) have recommended that nonparametric methods based on ranking the data be used. These methods include use of percentiles, the median, range, and interquartile range for presentation of summary measures (9), and the Wilcoxon (1) and Kruskal-Wallis (3) tests for using the data to test hypotheses.
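The rank-based summaries and tests named above can be sketched in a few lines. The following is a minimal illustration, not the procedure used in any of the cited articles: the scores are hypothetical, the `percentile` helper uses one of several possible conventions (it always returns an observed category rather than interpolating), and the Wilcoxon rank-sum test is computed with a normal approximation and tie correction, which is only rough for samples this small.

```python
import math
from collections import Counter


def percentile(values, p):
    """Smallest observed value with at least a fraction p of the data at or below it."""
    ordered = sorted(values)
    index = max(0, math.ceil(p * len(ordered)) - 1)
    return ordered[index]


def midranks(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    return ranks


def rank_sum_test(x, y):
    """Wilcoxon rank-sum test: normal approximation, tie-corrected, two-sided."""
    n1, n2 = len(x), len(y)
    n = n1 + n2
    pooled = list(x) + list(y)
    w = sum(midranks(pooled)[:n1])  # rank sum of the first group
    mean_w = n1 * (n + 1) / 2
    tie_term = sum(t ** 3 - t for t in Counter(pooled).values())
    var_w = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_w == 0:
        raise ValueError("all observations are tied; the test is undefined")
    z = (w - mean_w) / math.sqrt(var_w)
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return w, z, p_value


# Hypothetical scores on a 0-4 ordinal scale (not data from the article).
group_a = [0, 1, 1, 2, 2, 2, 3, 3]
group_b = [1, 2, 3, 3, 3, 4, 4, 4]

for name, scores in (("A", group_a), ("B", group_b)):
    q1, med, q3 = (percentile(scores, p) for p in (0.25, 0.5, 0.75))
    print(f"group {name}: median {med}, interquartile range {q1} to {q3}")

w, z, p_value = rank_sum_test(group_a, group_b)
print(f"rank sum W = {w}, z = {z:.2f}, two-sided p = {p_value:.3f}")
```

Because every summary value is an observed category and the test uses only ranks, nothing in this sketch depends on the spacing between categories.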

To evaluate whether inappropriate presentation and hypothesis testing with ordinal data are a current problem in the rheumatology literature, we examined articles published in 1999 in 3 rheumatology journals. Our objectives were to assess the percentage of articles that use ordinal outcomes in rheumatology journals, to estimate the percentage of articles in which the presentation of summary measures and the analysis of ordinal data are appropriate, and to determine whether the percentage of articles with appropriate presentation or analysis differs between basic and clinical science articles. To simplify data collection and analysis, we focused on ordinal variables used as study outcomes, and excluded ordinal variables used solely as predictors of an outcome.


To assess the current use of statistics for ordinal data in rheumatology research publications, we evaluated three journals: Arthritis & Rheumatism (A&R), Journal of Rheumatology (JR), and Arthritis Care & Research (AC&R). All 1999 issues of these journals were hand searched for full-length research articles. Editorials, case reports, and letters were excluded from consideration. A statistician (ML) with a standardized extraction form evaluated articles for inclusion of an ordinal outcome (yes/no). A variable was considered to be an outcome if summary statistics were presented for the variable, if it was compared between groups, or if it was predicted by other variables. All other variables were considered to be predictors. Articles with an ordinal outcome were then evaluated for appropriateness of presentation of summary measures (yes/no), and appropriateness of analysis (yes/no). Articles were also classified as basic or clinical science, either according to the journal subheading (A&R) or by a rheumatologist (DF). Clinical science articles were defined as those reporting research in which the whole patient was studied.

Two types of statistical methods were considered: 1) presentation of summary measures (called descriptive statistics), and 2) statistical analysis (called inferential statistics). Appropriate methods for the presentation of summary measures were defined to be any of those listed in the presentation category in Table 1. Listing the mean and standard deviation was not considered to be adequate if it was never stated in the article that normality had been assessed for the outcome. If normality was tested and present, use of methods for continuous normally distributed data was considered appropriate. If both appropriate and inappropriate presentations for an outcome were listed, the method was classified as appropriate. What is termed analysis in this article consists mainly of hypothesis testing, but also includes measures of association and confidence intervals. Appropriate analysis of ordinal outcomes was defined to be any of the methods listed in the analysis category of Table 1.

Table 1. Appropriate assessment of presentation and analysis of ordinal outcomes
Presentation
 Percentage within each category
 Median and range or interquartile range
 Mean and standard deviation after assessment of normality
Analysis
 Nonparametric test, Spearman correlation, or ordinal logistic regression
 Pearson correlation, t-test, linear regression after assessment of normality
 Logistic regression if dichotomization justified on clinical or scientific grounds
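The classification rule for presentation described above can be written out explicitly. This is only a sketch of the stated rule: the string labels and the function name are hypothetical and were not part of the extraction form used in the study.

```python
# Hypothetical labels for the summary measures an article might report.
ALWAYS_APPROPRIATE = {"percentages", "median_range", "median_iqr"}
APPROPRIATE_IF_NORMAL = {"mean_sd"}


def presentation_is_appropriate(measures, normality_confirmed):
    """Apply the Table 1 rule for presentation to one article.

    measures: set of labels for the summary measures reported for the outcome.
    normality_confirmed: True only if the article states that normality
        was assessed and found to hold for the outcome.
    """
    for measure in measures:
        if measure in ALWAYS_APPROPRIATE:
            return True
        if measure in APPROPRIATE_IF_NORMAL and normality_confirmed:
            return True
    return False


# An article reporting both an appropriate and an inappropriate summary
# is classified as appropriate, as stated in the text.
print(presentation_is_appropriate({"mean_sd", "percentages"}, False))
print(presentation_is_appropriate({"mean_sd"}, False))
```

The first call returns True because one appropriate measure is enough; the second returns False because the mean and standard deviation appear without any stated assessment of normality.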

Percentages are used as summary measures in the analyses of these data. Associations of article type and of journal with the percentages of appropriate presentation and analysis were tested with chi-square tests at the 0.05 level of significance. Statistical analysis was performed with SAS version 8 (SAS Institute, Cary, NC).
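As an illustration of the chi-square comparison of percentages, the sketch below computes the Pearson chi-square statistic for a contingency table and compares it with the standard 0.05 critical values for 1 and 2 degrees of freedom. It assumes the uncorrected Pearson statistic (no continuity correction), and the counts are hypothetical rather than taken from this study, which used SAS.

```python
def pearson_chi_square(table):
    """Return (statistic, degrees of freedom) for a table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / total
            statistic += (observed - expected) ** 2 / expected
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, df


# Upper 5% points of the chi-square distribution.
CRITICAL_VALUE_05 = {1: 3.841, 2: 5.991}

# Hypothetical counts: rows are two article types,
# columns are appropriate / not appropriate.
hypothetical_table = [[30, 20], [10, 40]]

statistic, df = pearson_chi_square(hypothetical_table)
significant = statistic > CRITICAL_VALUE_05[df]
print(f"chi-square = {statistic:.2f}, df = {df}, significant at 0.05: {significant}")
```

A two-group comparison of a yes/no classification has 1 degree of freedom, and a three-group comparison has 2, which is why only those two critical values are tabulated here.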


A total of 644 articles were evaluated (282 A&R, 322 JR, and 40 AC&R), of which 175 (27.2%) were identified as having an ordinal outcome. The percentage of articles with ordinal outcomes varied between the journals (A&R 16.0%, JR 31.1%, AC&R 75.0%). Of these 175 articles with an ordinal outcome, 145 (82.9%) were clinical science articles and 30 (17.1%) were basic science articles. Ordinal outcomes used in articles in this sample included the Kellgren/Lawrence score (10), staining intensity (11), histologic score (12), Rodnan skin score (13), severity of Lyme arthritis (14), erosion score (15), Larsen score (16), pain measured on a Likert scale (17), and questionnaire response (18).

Only 69 (39.4%) of 175 articles had appropriate presentation of summary measures for ordinal outcomes (Table 2). The percentage of articles using an appropriate presentation was higher for clinical science than for basic science articles (chi-square = 6.9, P = 0.0085), although the rate in both groups was low. In the 106 articles without appropriate presentation of summary results, means and standard deviations were used without assessment of whether the data were normally distributed.

Table 2. Appropriate presentation and analysis in articles with an ordinal outcome
Articles            n     % Appropriate presentation   % Appropriate analysis
Clinical science    145   44.3                         61.4
Basic science       30    20.0                         71.4

Overall, 111 (63.4%) of the 175 articles with an ordinal outcome used appropriate analysis of the ordinal outcome (Table 2). There was no significant difference in the percentages of appropriate analysis between basic and clinical science articles (chi-square = 1.21, P = 0.2719), although the percentage in basic science articles was higher. Of the 64 articles without appropriate analysis, 63 used procedures appropriate for normally distributed data (usually the t-test) without asserting that normality had been assessed, and one dichotomized the outcome without justification of the cut-off value.

When the data were analyzed for each journal, there were no significant differences in the percentage of articles with appropriate presentation of summary measures for ordinal outcomes (chi-square = 4.00, P = 0.1350), or for percentage with appropriate testing of an ordinal outcome (chi-square = 3.31, P = 0.1912).
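The reported P values can be checked against the reported chi-square statistics. The sketch below assumes 1 degree of freedom for the comparisons between the two article types and 2 degrees of freedom for the comparisons among the three journals; the article does not state the degrees of freedom, so these are inferred from the design. For these two cases the chi-square tail probability has a closed form, so no statistical library is needed.

```python
import math


def chi_square_p_value(statistic, df):
    """Upper-tail probability of a chi-square statistic for df = 1 or 2."""
    if df == 1:
        # A chi-square variable with 1 df is a squared standard normal.
        return math.erfc(math.sqrt(statistic / 2))
    if df == 2:
        # With 2 df the distribution is exponential with mean 2.
        return math.exp(-statistic / 2)
    raise ValueError("this sketch handles only df = 1 or df = 2")


# (comparison, reported chi-square, assumed df, reported P value)
reported = [
    ("presentation by article type", 6.9, 1, 0.0085),
    ("analysis by article type", 1.21, 1, 0.2719),
    ("presentation by journal", 4.00, 2, 0.1350),
    ("analysis by journal", 3.31, 2, 0.1912),
]

for label, statistic, df, p_reported in reported:
    p = chi_square_p_value(statistic, df)
    print(f"{label}: chi-square = {statistic}, df = {df}, "
          f"P = {p:.4f} (reported {p_reported})")
```

Under these assumptions the recomputed values agree with the reported ones to within about 0.001, which is the size of discrepancy expected when the statistics are rounded before publication.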


Ordinal outcome data are common in these 3 rheumatology journals, appearing in more than one-quarter (27%) of all research articles. Appropriate presentation of summary results for ordinal data was uncommon, occurring in only about 40% of articles, and was less frequent in basic science articles than in clinical science articles. A majority of articles used appropriate hypothesis tests with ordinal outcomes, and this result did not vary significantly between article types. However, in general there is room for improvement in presentation and analysis of ordinal data.

Several assessments of the use of ordinal data in medical research articles have been performed in the past. Moses et al (1) surveyed articles from the New England Journal of Medicine for the first six months of 1982 and found 18% (32 of 168) of these used ordinal data. They found inappropriate analysis due to dichotomizing the outcome in 8 out of 27 analyses (30%) and use of contingency tables that ignored the ordering in 9 analyses (33%). Avram et al (2) examined 243 articles from two anesthesia journals in 1981 and 1983 for errors in statistical methods for all outcome types. They found that the most common error in presentation of summary measures was description of ordinal data by means and standard deviations, with 60 instances out of the 65 major presentation errors discovered. Of 308 analysis errors discovered, 24 were due to analyzing ordinal data as if continuous. Forrest and Anderson analyzed 175 papers with ordinal outcomes published in 1982 in 12 major medical journals (3). Of 188 presentations of summary measures, only 49 (26.1%) were appropriate; of 336 hypothesis tests, 116 (35%) were done appropriately.

Our results are not directly comparable to the results of these previous assessments of methods for ordinal data in the medical literature. Unlike Moses et al (1), we examined the use of methods for presentation of summary measures. We have reported the percentage of articles that used an inappropriate method rather than the percentage of errors that are of a certain type (as in Avram et al [2]) or the percentage of presentations or analyses that were inappropriate (as in Forrest and Anderson [3]). However, we found much lower rates of dichotomization and of contingency table analyses that ignore ordering than were reported by Moses et al. In addition, if we assume that the percentages of inappropriate presentations and tests found by Forrest and Anderson are similar to the percentages of articles with inappropriate presentations and tests in their sample, then we found higher percentages of articles with appropriate presentation of summary measures and analysis than they did. Any such improvement could be due to secular improvements in the use of statistical methods in medical research or to differences in the journals sampled.

There is a tradition of defending the use of tests designed for continuous normally distributed data for the analysis of ordinal data that originates in psychology and the social sciences (19–21), and is found to a lesser extent in the medical literature (22, 23). The defense has centered on the empirical observation that the significance level for some tests designed for continuous data (mainly the t-test and the F test) is approximately correct when used for ordinal data (20, 23). However, these empirical observations do not necessarily extend outside the particular data distributions considered in these papers. Also, there is the issue of test validity: if a test designed for continuous normally distributed data is statistically significant on ordinal data, and a rank-based test is not, which test result should be used? Use of a test that may achieve significance by drawing on assumptions known not to be true (e.g., data normality) over one that does not use these assumptions seems questionable. Finally, there is a misconception that the t-test will be more powerful statistically for ordinal data than a nonparametric test (22, 24). The t-test is more powerful for normally distributed data, but the Wilcoxon test has been shown to be more powerful on a variety of real-world continuous data distributions that are not normally distributed (25). Ordinal data are also not normally distributed.

Unlike the defense of parametric analysis noted above, there is no tradition of defending the presentation of means and standard deviations as summary measures for ordinal variables. Even apologists for analyzing ordinal data with methods designed for continuous data suggest that these methods of presentation are incorrect (20, 24). The main rationale given in introductory statistics textbooks for providing the mean and standard deviation as summary statistics is that for the normal distribution the central 95% of the data fall within 2 standard deviations of the mean (26). This rationale does not hold if the data distribution is not symmetric. In the absence of a symmetric distribution, medians and percentiles are more informative as descriptive statistics than means and standard deviations (9). Therefore, the high levels of inappropriate presentation found in our study and in the studies by Avram et al (2) and Forrest and Anderson (3) point to an ongoing concern in the medical literature.

Although we have classified as appropriate the use of testing designed for normally distributed data following assessment of normality, we do not feel that this approach should be used for ordinal outcomes because those data are not normally distributed. Similarly, we would not recommend the use of means and standard deviations following assessment of normality, although this was classified as appropriate for presentation. If means and standard deviations are presented to aid interpretation or make explicit comparisons with previous research, the median and range or percentages should also be presented. In addition, we have allowed the use of any type of ordinal logistic regression to be counted as appropriate, and this is also generous. For all of these models, the assumptions behind them need to be validated for the data under consideration before placing trust in the analysis results (27–30).

Our recommendations for methods of presentation and analysis of ordinal outcomes are listed in Table 3. These recommendations are based on the literature cited in the references and on our experiences in presentation and analysis of rheumatologic and medical outcomes. This is not intended to be an exhaustive list, but is intended to provide guidance as to the most commonly used methods. Recursive-partitioning (31), Rasch or Item-Response (32), or Bayesian (33) analyses provide appropriate alternative approaches for ordinal outcomes.

Table 3. Recommendations for presentation and analysis of ordinal outcomes
Presentation
 ≤4 categories, use percentiles
 ≥5 categories, use median and interquartile range
Analysis
 Group comparisons, use Wilcoxon or Kruskal-Wallis tests
 Correlation, use Spearman's rho or Kendall's tau
 Regression analysis, use ordinal logistic regression with appropriate model validation
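Of the correlation measures recommended in Table 3, Kendall's tau is simple enough to compute directly from pairs of observations. The sketch below uses the tau-b variant, which adjusts for the ties that are unavoidable with few ordered categories; the choice of tau-b and the example scores are assumptions made for illustration and do not come from the article.

```python
import math
from itertools import combinations


def kendall_tau_b(x, y):
    """Kendall's tau-b for two equally long sequences of ordinal scores."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    concordant = discordant = tied_x = tied_y = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        dx, dy = x1 - x2, y1 - y2
        if dx == 0:
            tied_x += 1  # pair tied on x (possibly on y as well)
        if dy == 0:
            tied_y += 1  # pair tied on y (possibly on x as well)
        if dx * dy > 0:
            concordant += 1
        elif dx * dy < 0:
            discordant += 1
    n_pairs = len(x) * (len(x) - 1) // 2
    denominator = math.sqrt((n_pairs - tied_x) * (n_pairs - tied_y))
    if denominator == 0:
        raise ValueError("tau-b is undefined when one variable is constant")
    return (concordant - discordant) / denominator


# Hypothetical paired scores for five subjects (not data from the article).
radiographic_score = [0, 1, 1, 2, 3]
pain_score = [1, 1, 2, 2, 3]

print(f"Kendall's tau-b = {kendall_tau_b(radiographic_score, pain_score):.3f}")
```

The function compares only the direction of differences within each pair, so recoding the categories with any other increasing set of numbers leaves the result unchanged.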

In summary, we found that the majority of current articles in rheumatology journals presenting summary measures for ordinal data do not conform to recommendations from journal articles and biostatistics textbooks. To a lesser extent, analysis of ordinal data also does not conform to recommendations. Standard statistical software implements the recommended methods, so there are few barriers to the appropriate presentation and analysis of ordinal data.