Keywords:

  • biostatistics;
  • statistics;
  • epidemiology;
  • evaluation;
  • assessment

Abstract

Introduction

Although regression is widely used in the medical literature that researchers must read and contribute to, no instrument was previously available to assess students' understanding of it. The goal of this study was to design and assess such an instrument for graduate students in Clinical and Translational Science and Public Health.

Methods

A 27-item REsearch on Global Regression Expectations in StatisticS (REGRESS) quiz was developed through an iterative process. Consenting students taking a course on linear regression in a Clinical and Translational Science program completed the quiz pre- and postcourse. Student results were compared to those of practicing statisticians with a master's or doctoral degree in statistics or a closely related field.

Results

Fifty-two students responded precourse, 59 responded postcourse, and 22 practicing statisticians completed the quiz. The mean (SD) score was 9.3 (4.3) for students precourse and 19.0 (3.5) postcourse (P < 0.001). Postcourse students had results similar to those of practicing statisticians (mean (SD) 20.1 (3.5); P = 0.21). Students also showed significant improvement pre- to postcourse in each of six domain areas (P < 0.001). The REGRESS quiz was internally reliable (Cronbach's alpha 0.89).

Conclusion

The initial validation is promising, with statistically significant and meaningful differences across time and study populations. Further work is needed to validate the quiz across multiple institutions.


Introduction

Research training has enabled academic clinicians to contribute significantly to the body of medical research literature. Biostatistics represents a critical methodological skill for such researchers, as statistical methods are increasingly a necessary part of medical research.[1, 2] Both bivariate tests and complex analyses are also used much more frequently in the New England Journal of Medicine than in Nature, a publication focusing on bench science.[3] Training programs are in place across the United States to equip physicians and other health professionals with the research skills to succeed in this publication environment. However, statistical training is plagued by a history of being a difficult subject that is often poorly taught.[4] Enders (2011)[5] identified a gap in the literature: no validated knowledge and skills assessment is available that is appropriate for graduate-level biostatistics in academic medical research. Such an instrument would be a critical tool upon which to base research on assessing and improving knowledge and retention of statistics. In particular, although some tools are available to assess physicians' ability to read the medical literature[6, 7] and to assess undergraduate introductory statistics coursework,[8] no instruments are available to assess understanding of linear regression.

Population of Interest

Clinical and Translational Science (CTS) is a field of growing importance. In the past, scientific advances were often made in the laboratory setting and then not tested or implemented among patients.[9] The National Institutes of Health have created a series of grant awards to promote translation of scientific advances from bench to bedside to population. The most recent and largest of these are the Clinical and Translational Science Awards (CTSA).[9] A criterion of the CTSA is that part of the funding be used to train physicians and other health professionals in research methods. As such, the CTSA national consortium has added to existing Public Health (PH) programs, expanding the graduate research training designed specifically for health professionals. Enders (2011)[5] has shown that CTS[10] and PH[11] have similar competencies for statistics education and has created a unified set of these competencies.

Reading the medical literature

Statistics competency is based in part on the ability to read the medical literature. Published original medical research papers now nearly always include statistics, and the complexity of the statistical methods has increased over time.[2, 12] West and Ficalora (2007) found that only 17% of physicians felt they had adequate training in biostatistics.[13] The need for an objective assessment instrument is underscored by Young, Glasziou, and Ward (2002),[14] who found that physicians' self-assessment of statistical ability was usually inappropriately optimistic.

Publishing in the medical literature

Statistics competencies also arise from the need for trainees to develop the skills necessary for publishing their research. Guidelines for publishing randomized trials and observational studies[15-19] require high-level competency in analyzing and communicating complex interrelationships among variables, including adjustment for confounding and assessment of interaction. These guidelines reflect the shift in the medical literature towards more complex statistical methods: 70% of papers in the New England Journal of Medicine used advanced analyses in 2004.[3] Furthermore, trainees are typically expected to work in a multidisciplinary environment to achieve their research goals. This requires both the ability to communicate effectively with a statistician and the ability to communicate the results of complicated analyses effectively. Trainees may not need to perform advanced statistical methods themselves, but they certainly need to understand and interpret the results.

To target improvements in training and ensure that research trainees are adequately prepared, an assessment instrument is needed for statistics. The goal of this work is to provide initial validation of a new 27-item instrument to assess mastery of linear regression for graduate-level medical researchers.

Developing the Instrument

A firm foundation in linear regression provides a gateway to more advanced topics in statistics. By nature of its easily visualized results, linear regression is often the foundation upon which courses in logistic regression and survival analysis are taught. Once a student understands how to handle different types of predictor variables and how multiple predictor variables may act in concert, these concepts translate directly to other forms of regression. Complex topics such as confounding and interaction are typically included in courses on linear regression.[20-22] These topics were included in the assessment instrument because of their intrinsic importance for clinical and translational research.

Question development

Instrument items were initially identified from past exam questions written by the author that had performed well and did not require excessive time. The CTS and PH statistical competencies were reviewed to ensure that topics appropriate for a linear regression course were included in the draft instrument. The answer option "I don't know" was included with each question to minimize random guessing. The questions were organized into six domains: Interpreting and Using the Regression Equation (a broad topic split into two separate but related domains for simple and multiple linear regression, SLR and MLR); Modeling and Statistical Significance; Assessing Assumptions; Confounding and Colinearity; and Interaction. The instrument was titled the REsearch on Global Regression Expectations in StatisticS (REGRESS) quiz. The mnemonic refers to the intended future use of the REGRESS quiz as an outcome measure for research on students' overall understanding of regression.

Statistical review

The draft REGRESS quiz was sent to a set of 11 statisticians who both teach and work with clinical and translational scientists. The majority of the reviewers were identified through the CTSA national consortium's Biostatistics, Epidemiology, and Research Design group. Numerous changes were made to the questions and answer options in response to feedback. In addition, questions were added to address simpler concepts. Several questions which duplicated topics were dropped to minimize overall length. Questions were reworded to include more description of the variables and settings. Finally, discussion of how regression may be taught either as a stand-alone course or as one topic within a larger course led to the creation of an a priori subset of questions on SLR to assess the simpler constructs achievable in the latter setting.

Draft implementation

The draft quiz was implemented at the end of a CTSA course on linear regression. After students completed the quiz, a review session before the final exam was devoted to reviewing questions and solutions from the REGRESS. During the lengthy discussion of the questions, no issues were raised with the question wording or clarity, even after direct prompting. When taken together with the frequency of “I don't know” responses which correlated with anticipated question difficulty, this was considered adequate assessment of the instrument's wording. As the instrument was not further modified based on the draft implementation, these responses are included in the results below. The REGRESS quiz used in this study was version 5.1.

Domain descriptions

For a copy of the original instrument, please contact the author at Enders.Felicity@mayo.edu.

Interpreting and using SLR equation

Understanding SLR is critical for understanding multivariate analysis. This domain is designed to assess whether a student can apply the SLR equation and describe what the various components mean. There are also two global items contained within this domain to assess understanding of the equation as a whole, namely that association does not mean causation and that one should only use the results of the regression equation within the bounds of the data.
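
For reference, a generic form of the SLR equation underlying this domain can be written as follows (the notation is illustrative, not an item from the quiz):

    E[Y \mid X] = \beta_0 + \beta_1 X

Here the intercept is the mean of Y when X = 0, and the slope is the change in the mean of Y per one-unit increase in X; the equation describes the mean of Y at a given X, not individual observations.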

Interpreting and using MLR equation

Correctly interpreting the MLR equation includes understanding that the results are always adjusted for the other factors in the model. The domain also includes issues relating to what the coefficients for different types of predictors mean when used together in the same model. These concepts were often assessed by demonstrating the ability to link a figure with the appropriate equation. Assessing understanding within the context of interaction was specifically excluded from this domain.
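
As an illustrative example (again, not an item from the quiz), an MLR model with a continuous predictor X and a binary predictor Z can be written:

    E[Y \mid X, Z] = \beta_0 + \beta_1 X + \beta_2 Z

With no interaction term, the coefficient of X is the change in the mean of Y per one-unit increase in X holding Z fixed (i.e., adjusted for Z), and the coefficient of Z is the difference in mean Y between the two Z groups at any fixed value of X; in a figure this corresponds to two parallel lines.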

Modeling and statistical significance

To appropriately read and communicate results from statistical software, students need the ability to understand and assess statistical significance within the SLR and MLR settings. This includes both assessing statistical significance within the results for a single model and assessing statistical significance across a series of “nested” models. Statistical significance includes the use of both P values and confidence intervals.
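
For example (illustrative notation only), assessing significance across nested models amounts to comparing a reduced and a full model that differ by the term of interest:

    \text{Reduced: } E[Y \mid X_1] = \beta_0 + \beta_1 X_1
    \text{Full: }    E[Y \mid X_1, X_2] = \beta_0 + \beta_1 X_1 + \beta_2 X_2

Testing whether the added coefficient equals zero, via its reported P value or confidence interval (or a partial F-test when the added predictor contributes several terms), determines whether the larger model is a statistically significant improvement.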

Assessing assumptions

This domain was designed to assess the assumptions of linear regression and the students' ability to identify violations of those assumptions. A few additional items not included within the domain are arguably associated with understanding of assumptions. These include the concepts that interpretation of regression coefficients refers to the mean value rather than individual data, that association is not causation, and that one should only assess within the range of the data. These items were excluded from this domain so that the domain scores would be based upon mutually exclusive items.
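
As a minimal, hypothetical sketch (not part of the quiz or the course materials) of how the standard checks behind this domain might be run in Python with statsmodels and SciPy, using simulated data:

    import numpy as np
    import statsmodels.api as sm
    from scipy import stats
    from statsmodels.stats.diagnostic import het_breuschpagan

    # Simulated data standing in for an outcome Y and a single predictor X
    rng = np.random.default_rng(0)
    x = rng.uniform(0, 10, 100)
    y = 2.0 + 1.5 * x + rng.normal(0, 1, 100)

    X = sm.add_constant(x)            # design matrix with intercept
    fit = sm.OLS(y, X).fit()          # ordinary least squares fit
    resid = fit.resid                 # residuals approximate the error terms

    # Normality: test the residuals, not a histogram of the raw outcome
    normality_p = stats.shapiro(resid).pvalue

    # Homoscedasticity: Breusch-Pagan test of residual variance against the predictors
    _, homoscedasticity_p, _, _ = het_breuschpagan(resid, X)

    # Outliers: flag observations with large studentized residuals
    studentized = fit.get_influence().resid_studentized_internal
    outlier_flags = np.abs(studentized) > 3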

Confounding and colinearity

The manuscript guidelines, in particular those for observational studies, make it clear that an understanding of confounding is critical to a researcher's success in publishing and reading the medical literature.[15-19] Colinearity was combined with confounding because both represent potential problems within the regression which may easily be missed, with negative consequences. These two concepts are represented within the domain both in terms of identifying scenarios under which they are likely to occur and observing that they have occurred. The particular terms "confounding" and "colinearity" are deliberately excluded from some questions to address a deeper level of understanding.
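
As an illustrative (hypothetical) framing of the kind of comparison behind these items, confounding by a variable C can be seen by contrasting crude and adjusted models for a predictor X:

    \text{Crude: }    E[Y \mid X]    = \alpha_0 + \beta_1^{\text{crude}} X
    \text{Adjusted: } E[Y \mid X, C] = \gamma_0 + \beta_1^{\text{adj}} X + \gamma_1 C

A substantial change between the crude and adjusted coefficients for X when C is added is a common signal of confounding, whereas colinearity between predictors typically appears as inflated standard errors and unstable coefficient estimates rather than a change in the fitted mean.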

Interaction

Also called effect modification, understanding interaction is critical for a thorough understanding of regression. It is also discussed in the manuscript guidelines[15-19] and in both the CTS[10] and PH[11] competency documents.[5] The student's ability to identify that interaction has occurred is assessed. Interpretation of interaction is assessed by a student's ability to select the correct equation associated with a figure. This requires detailed knowledge of how an interaction variable behaves within the model when different types of predictor variables are involved.
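
For illustration (again, not an item from the quiz), a model for a continuous predictor X and a binary predictor Z with their interaction can be written:

    E[Y \mid X, Z] = \beta_0 + \beta_1 X + \beta_2 Z + \beta_3 (X \times Z)

The slope of Y on X is \beta_1 when Z = 0 and \beta_1 + \beta_3 when Z = 1, so a nonzero interaction coefficient corresponds to non-parallel lines in the associated figure, which is what the linking-figure-to-equation items ask students to recognize.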

Methods

Target audience

The REGRESS quiz was designed for students and scientists in CTS and PH. For the initial validation, students in a CTS course on linear regression were invited to participate. Practicing statisticians who work with researchers in CTS were also invited to complete the quiz. Only respondents who reported holding an MS or PhD in statistics or a related field were included. The practicing statistician group had no overlap with the statistical reviewers who assessed the draft instrument.

The course used for the pre/post assessment covered topics including: assessing correlation; interpreting regression parameters for continuous, binary, and categorical predictors alone and in a MLR setting; assessing statistical significance for continuous, binary, and categorical predictors; understanding and assessing assumptions; model building; confounding; interaction; colinearity; and an introduction to logistic and Cox regression. To enroll in the course, students were required to complete a prior course on introductory statistical methods.

To achieve the target recruitment, multiple offerings of the same course were included in the sample (2010–2012). The lectures, computer laboratory assignments, and homework assignments for the course remained largely unchanged during the study.

Survey administration

The REGRESS quiz is housed online through FreeOnlineSurveys.com. Participants were asked to follow a link to the survey from an email invitation. One reminder was sent at each time point. Participants had the opportunity to complete the quiz more than once, but only the initial response from consenting respondents was included in the analysis sample.

Statistical methods

The sample of student data was assessed among those who completed the entire survey, defined as answering at least one of the last five questions. Practicing statisticians with a master's or doctoral degree in statistics or a related field served as the positive control. Nonresponse or a response of "I don't know" was coded as incorrect. Demographic data were presented with the number and percentage in each category. The overall REGRESS score and SLR score were presented with the mean and standard deviation for each group and time. Among students, pre/post data were compared with a two-sample t-test. Paired analyses were not used because the subset of paired data was fairly small and an unpaired analysis was a conservative option. The data from student respondents at each time were compared with results from practicing statisticians with a two-sample t-test. The impact of using data from multiple years was assessed by predicting the student REGRESS score with both time (pre/post) and year. Each REGRESS domain was presented with the median and range of the score for each group and time. Student pre/post domain scores were compared with the Wilcoxon rank-sum test, which was also used to compare student data at each time to the practicing statisticians' results. Individual items were compared pre/post and between students and practicing statisticians with Fisher's exact test. Internal reliability of each score was calculated with Cronbach's alpha using responses from all groups and times. Reliability was further assessed for each score by excluding each item in turn to determine the impact on Cronbach's alpha. Two-sided 5% type I error rates were used throughout. Analyses were performed with the SAS statistical software package (version 9.2; SAS Institute, Cary, NC, USA).
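
For readers who want to reproduce comparable analyses outside of SAS, the following is a minimal, hypothetical sketch in Python (using simulated scores, not the study data) of the main comparisons described above: an unpaired two-sample t-test, the Wilcoxon rank-sum test, Fisher's exact test for a single item, and Cronbach's alpha.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(42)

    # Simulated overall REGRESS scores (illustrative only, not the study data)
    pre_scores = rng.normal(9.3, 4.3, 52).clip(0, 27)
    post_scores = rng.normal(19.0, 3.5, 59).clip(0, 27)

    # Unpaired two-sample t-test for pre- versus postcourse scores
    t_stat, t_p = stats.ttest_ind(pre_scores, post_scores)

    # Wilcoxon rank-sum test for a domain score (e.g., an 8-item domain)
    pre_domain = rng.integers(0, 9, 52)
    post_domain = rng.integers(3, 9, 59)
    w_stat, w_p = stats.ranksums(pre_domain, post_domain)

    # Fisher's exact test for a single item (correct vs. incorrect, pre vs. post)
    table = [[44, 52 - 44],   # precourse: correct, incorrect
             [56, 59 - 56]]   # postcourse: correct, incorrect
    odds_ratio, fisher_p = stats.fisher_exact(table)

    def cronbach_alpha(items):
        """Cronbach's alpha for a respondents-by-items matrix of 0/1 scores."""
        items = np.asarray(items, dtype=float)
        k = items.shape[1]
        item_vars = items.var(axis=0, ddof=1)
        total_var = items.sum(axis=1).var(ddof=1)
        return k / (k - 1.0) * (1.0 - item_vars.sum() / total_var)

    # Simulated item-level responses: rows are respondents, columns are 27 items
    responses = rng.integers(0, 2, size=(133, 27))
    alpha = cronbach_alpha(responses)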

Results

Respondents

Responses were collected from 52 students precourse and 59 students postcourse out of 101 enrolled students, giving response rates of 51% and 58%, respectively. An additional 3 and 19 students, respectively, submitted responses but either did not consent or did not complete the quiz. Responses were also collected from 22 practicing statisticians.

Twenty (47%) of the students were male (one student did not provide gender). The majority of students (89%) had never studied linear regression. Those who had studied regression reported that it was one topic among many in a single course, most often completed more than two years previously. Eleven (50%) of the practicing statisticians were male. Seventeen (77%) had a master's degree in statistics or a closely related field; the remainder held doctoral degrees. Five (23%) of the practicing statisticians had completed their degree within the past two years, 7 (32%) had completed their degrees six to 10 years previously, and the remainder (46%) completed their degrees more than 10 years previously. All the practicing statisticians were actively engaged in research.

Summary and domain scores

The mean (SD) overall REGRESS score (out of 27 questions) was 9.3 (4.3) for students precourse and 19.0 (3.5) postcourse (P < 0.001). Precourse students had significantly lower scores than practicing statisticians, who had a mean (SD) of 20.1 (3.5) (P < 0.001). By postcourse, the students were similar to the practicing statisticians (P = 0.21). Figure 1 shows the distribution of the overall REGRESS score by group and time, whereas Table 1 shows summary statistics for the comparisons of interest. The REGRESS scores did not differ significantly or meaningfully by year of the course offering (data not shown).

Table 1. Summary and domain scores by group and time

|                                              | Students precourse (N = 52) | Students postcourse (N = 59) | P value | Practicing statisticians (N = 22) | P value versus pre* | P value versus post† |
| Summary scores                               | Mean (SD)                   | Mean (SD)                    |         | Mean (SD)                         |                     |                      |
| REGRESS score (of 27)                        | 9.3 (4.3)                   | 19.0 (3.5)                   | <0.001  | 20.1 (3.5)                        | <0.001              | 0.21                 |
| SLR score (of 11)                            | 6.3 (2.5)                   | 8.3 (1.6)                    | <0.001  | 9.8 (1.2)                         | <0.001              | <0.001               |
| Domain scores                                | Median (range)              | Median (range)               |         | Median (range)                    |                     |                      |
| Interpreting and using SLR equation (of 8)   | 4 (0–8)                     | 6 (3–8)                      | <0.001  | 7 (5–8)                           | <0.001              | 0.001                |
| Interpreting and using MLR equation (of 4)   | 1 (0–4)                     | 3 (0–4)                      | <0.001  | 3 (2–4)                           | <0.001              | 0.37                 |
| Modeling and statistical significance (of 4) | 2 (0–4)                     | 3 (2–4)                      | <0.001  | 3 (2–4)                           | <0.001              | 0.86                 |
| Assessing assumptions (of 4)                 | 1 (0–4)                     | 2 (1–4)                      | <0.001  | 2 (0–4)                           | <0.001              | 0.84                 |
| Confounding and colinearity (of 4)           | 0 (0–3)                     | 3 (0–4)                      | <0.001  | 3 (0–4)                           | <0.001              | 0.53                 |
| Interaction (of 3)                           | 0 (0–3)                     | 2 (0–3)                      | <0.001  | 2 (0–3)                           | <0.001              | 0.82                 |

* Comparison of practicing statisticians and precourse students.
† Comparison of practicing statisticians and postcourse students.
Figure 1. Distribution of REGRESS scores by group and time.

The mean SLR score (out of 11 questions) also differed for students pre- and postcourse (means of 6.3 and 8.3, respectively; P < 0.001) and for students precourse versus practicing statisticians, who had a mean SLR score of 9.8 (P < 0.001). For the SLR score, however, despite postcourse improvement the student group remained statistically different from the practicing statistician group (P < 0.001), though the difference of 1.5 additional correct questions was small. Further information is provided in Table 1.

For all the domain scores, the precourse student group was significantly different from the postcourse group (P < 0.001) and from practicing statisticians (P < 0.001). As the domains included between three and eight questions, these differences tended to be small. For all but one domain, the postcourse group was not significantly different from the practicing statisticians. For the domain of Interpreting and Using the SLR Equation, the postcourse students scored significantly lower than the practicing statisticians (medians of 6 and 7, respectively; P = 0.001). Details are provided in Table 1. The result for the Interpreting and Using the SLR Equation domain is similar to that for the SLR score, as these included many of the same questions (see Table 2).

Table 2. Percent correct per item by group and time

| Item number and description | Students precourse, n (%) of 52 | Students postcourse, n (%) of 59 | P value | Practicing statisticians, n (%) of 22 | P value versus pre† | P value versus post‡ |

Interpreting and using SLR equation
| 1 Find slope from graph*                                          | 44 (85) | 56 (95) | 0.11   | 21 (96)  | 0.27   | 1.0    |
| 2 Find y-intercept from graph*                                    | 8 (15)  | 29 (49) | <0.001 | 14 (64)  | <0.001 | 0.32   |
| 3 Link graph to equation*                                         | 18 (35) | 28 (48) | 0.18   | 16 (73)  | 0.004  | 0.05   |
| 4 Change in mean Y given X*                                       | 22 (42) | 57 (97) | <0.001 | 19 (86)  | 0.001  | 0.12   |
| 6 Predicting beyond data*                                         | 31 (60) | 25 (42) | 0.09   | 18 (82)  | 0.11   | 0.002  |
| 7 Interpret slope, unadjusted*                                    | 32 (62) | 57 (97) | <0.001 | 21 (96)  | 0.004  | 1.0    |
| 8 Association is not causation*                                   | 38 (73) | 51 (86) | 0.10   | 20 (91)  | 0.13   | 0.72   |
| 10 Assess strength of correlation*                                | 30 (58) | 37 (63) | 0.70   | 22 (100) | <0.001 | <0.001 |

Interpreting and using MLR equation
| 12 Link graph to equation, continuous and categorical X's         | 13 (25) | 32 (54) | 0.002  | 14 (64)  | 0.003  | 0.62   |
| 14 Link graph to equation, >1 continuous and binary X's           | 10 (19) | 36 (61) | <0.001 | 14 (64)  | <0.001 | 1.0    |
| 21 Predict Y, categorical X                                       | 24 (46) | 52 (88) | <0.001 | 22 (100) | <0.001 | 0.18   |
| 22 Interpret slope, adjusted                                      | 14 (28) | 51 (91) | <0.001 | 20 (91)  | <0.001 | 0.72   |

Modeling and statistical significance
| 5 How to test association*                                        | 26 (50) | 58 (98) | <0.001 | 22 (100) | <0.001 | 1.0    |
| 23 Select nested model, not statistically significant             | 3 (6)   | 30 (51) | <0.001 | 11 (50)  | <0.001 | 1.0    |
| 26 Select nested model, statistically significant                 | 11 (21) | 49 (83) | <0.001 | 16 (73)  | <0.001 | 0.35   |
| 9 Relationship between sample size and statistical significance*  | 45 (87) | 54 (92) | 0.54   | 22 (100) | 0.10   | 0.32   |

Assessing assumptions
| 11 Presence of outliers*                                          | 35 (67) | 39 (66) | 1.0    | 20 (91)  | 0.04   | 0.03   |
| 16 Assess independence                                            | 12 (23) | 13 (22) | 1.0    | 5 (23)   | 1.0    | 1.0    |
| 17 Assess homoscedasticity                                        | 13 (25) | 57 (97) | <0.001 | 20 (91)  | <0.001 | 0.30   |
| 18 Assess normality of error terms                                | 5 (10)  | 30 (51) | <0.001 | 5 (23)   | 0.15   | 0.03   |

Confounding and colinearity
| 19 Predict confounding                                            | 8 (15)  | 49 (83) | <0.001 | 16 (73)  | <0.001 | 0.35   |
| 27 Diagnose confounding                                           | 6 (12)  | 49 (83) | <0.001 | 13 (59)  | <0.001 | 0.04   |
| 20 Predict colinearity                                            | 8 (15)  | 39 (66) | <0.001 | 13 (59)  | <0.001 | 0.61   |
| 24 Diagnose colinearity                                           | 9 (17)  | 14 (24) | 0.49   | 10 (46)  | 0.02   | 0.10   |

Interaction
| 13 Link graph to equation, continuous and binary interaction      | 3 (6)   | 34 (58) | <0.001 | 12 (55)  | <0.001 | 0.81   |
| 15 Link graph to equation, continuous interaction                 | 5 (10)  | 39 (66) | <0.001 | 19 (86)  | <0.001 | 0.10   |
| 25 Diagnose interaction                                           | 9 (18)  | 55 (93) | <0.001 | 17 (77)  | <0.001 | 0.06   |

* Items included in SLR score.
† Comparison of practicing statisticians and precourse students.
‡ Comparison of practicing statisticians and postcourse students.

Comparison of individual items

For many individual items, there remained a statistically significant difference from precourse to postcourse among the students and between precourse students and practicing statisticians; fewer differences were seen between postcourse students and practicing statisticians (see Table 2). There were several items for which the postcourse students remained significantly inferior to the practicing statisticians (items 3, 6, 10, and 11). The topics of these items included linking the graph to the equation, for which the issue seemed to be finding the Y-intercept when the X-axis did not extend to zero; predicting beyond the data; knowing that a strong negative correlation is still strong; and detecting the presence of outliers as opposed to another assumption violation. In contrast, there were also some items for which the postcourse students were significantly superior to the practicing statisticians (items 18 and 27). These topics included diagnosing confounding and selecting the figure in which the normality assumption was most clearly violated. For the latter, many practicing statisticians erroneously chose a histogram of the outcome variable, which fails to show the distribution around the regression line but appears similar to a histogram of error terms if one neglects to read the axis label. There were also a few items which appeared to be easy for this study sample (items 8 and 9) and thus lacked significance for the pre/postcourse comparison. These items covered important concepts also taught in the previous course: that association is not causation, and the nature of the relationship between sample size and statistical significance.

There were a few somewhat surprising results within Table 2. Students performed better precourse than postcourse for item 6 (predicting beyond data), perhaps because the precourse survey was given at the very start of the course rather than before the course and thus overlapped with a lecture on this topic. All respondents performed much better on assessing statistical significance when the results were statistically significant vs. not (items 26 and 23), suggesting that the ability to identify a small P value may be deeply ingrained. There were two topics for which the practicing statisticians performed quite poorly (items 16 and 18). Item 18, on the topic of selecting a figure in which the normality assumption was violated, was discussed in the previous paragraph. For item 16, all groups and all times actually performed equally poorly (22% to 23% correct). This item provides a scenario which happens to include matched data and respondents are asked to select the best interpretation of the results. As such, it mimics a research setting in which an incorrect analysis has been performed. The vast majority of respondents chose to interpret the results rather than choosing the response stating that the model is inappropriate because of study design issues.

The significant difference between postcourse students and practicing statisticians for the SLR score and the Interpreting and Using SLR Equation domain from Table 1 may be observed in greater detail in Table 2. The items in the SLR score are marked with a * in Table 2. There are items contained within both these scores for which postcourse students performed slightly worse than practicing statisticians (items 2, 3, 6, and 10). Together, these likely correspond to the overall differences seen in Table 1.

Internal reliability

Cronbach's alpha for the overall REGRESS score was 0.89 across the full range of groups and times; for the SLR score this value was 0.69 (Table 3). Reliability for the domain scores was reasonable (between 0.61 and 0.75) for all domains except for Assessing Assumptions, for which Cronbach's alpha was 0.28. Excluding each item in turn from the various question sets showed no opportunity for increasing reliability by eliminating individual items.
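
As an illustration of the item-deletion check described above, a short, hypothetical Python sketch (assuming a respondents-by-items matrix of 0/1 responses, as in the Methods sketch) might look like this:

    import numpy as np

    def cronbach_alpha(items):
        """Cronbach's alpha for a respondents-by-items matrix of 0/1 scores."""
        items = np.asarray(items, dtype=float)
        k = items.shape[1]
        return k / (k - 1.0) * (1.0 - items.var(axis=0, ddof=1).sum()
                                / items.sum(axis=1).var(ddof=1))

    def alpha_if_item_deleted(items):
        """Recompute alpha with each item removed in turn to gauge its impact."""
        items = np.asarray(items, dtype=float)
        return [cronbach_alpha(np.delete(items, j, axis=1))
                for j in range(items.shape[1])]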

Table 3. Internal reliability of summary and domain scores

|                                            | Cronbach's alpha* |
| Summary scores                             |                   |
| REGRESS score (of 27)                      | 0.89              |
| SLR score (of 11)                          | 0.69              |
| Domain scores                              |                   |
| Interpreting & Using SLR Equation (of 8)   | 0.62              |
| Interpreting & Using MLR Equation (of 4)   | 0.67              |
| Modeling & Statistical Significance (of 4) | 0.61              |
| Assessing Assumptions (of 4)               | 0.28              |
| Confounding & Colinearity (of 4)           | 0.72              |
| Interaction (of 3)                         | 0.75              |

* Internal reliability was calculated for all respondents.

Discussion

Scores

The REGRESS quiz is able to distinguish significant and meaningful improvement in understanding over a course in linear regression in this study population. Student scores improved from 9.3 to 19.0 on average (P < 0.001), so that by the end of the course students showed a level of understanding similar to that of practicing statisticians for the topic areas assessed.

Domains and items

The six domains all showed statistically significant improvement from pre- to postcourse assessment. For most domains, by postcourse the students demonstrated understanding similar to that of practicing statisticians. A caveat to these results, however, is that two questions in the Assessing Assumptions domain proved quite challenging, so that even practicing statisticians did not score well. One of these two questions (item 16) was on assessing independence, a topic vital to performing analyses appropriate for the study design. Similarly, practicing statisticians performed relatively poorly on several items in the Confounding and Colinearity domain (items 20, 24, and 27). Recognizing situations in which confounding or colinearity might occur, or observing that they have occurred, is critical for reading the literature and for performing appropriate analyses, especially for observational studies.[15-19]

The individual REGRESS items showed a wide range of difficulty. Even within the same domain, some questions proved much more challenging than others. In general, students performed poorly on questions in which graphs were used, likely because of the complexity of these questions. The graph-based questions 12, 14, 13, and 15 were designed to assess students' understanding of the meaning of different types of predictor variables with and without the presence of interaction. For instance, question 12 assesses students' understanding that a continuous predictor creates a single line, whereas adding a categorical predictor with three levels breaks that one line into three lines (see the illustrative equation below). This mechanism was used to shorten the quiz, as the author felt these questions could be assessed more quickly than questions on interpretation in which the wording in multiple answer options would require detailed comparison. Some of the incorrect postcourse responses may be attributable to question complexity. It should be noted, however, that students on the whole performed well on question 1, in which they identified the slope used in a graph for a single line. Question 2 (finding the Y-intercept) was deliberately more challenging because the X-axis did not extend to 0, and question 3 relied upon the intercept and slope found in the previous two questions.
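
As an illustrative (hypothetical) example of the kind of equation underlying such a figure, a model with a continuous predictor X and a three-level categorical predictor coded by two indicator variables can be written:

    E[Y \mid X, \text{group}] = \beta_0 + \beta_1 X + \beta_2 I(\text{group} = B) + \beta_3 I(\text{group} = C)

With no interaction term, this corresponds to three lines sharing the slope \beta_1, with intercepts \beta_0, \beta_0 + \beta_2, and \beta_0 + \beta_3 for groups A, B, and C, respectively.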

In addition, some topics appear simply to be more challenging, as demonstrated by the difficulty practicing statisticians had with them; their results demonstrate the compelling nature of some of the distractors. For instance, question 16 (on assessing independence) is worded as a question on interpretation in a study which happens to be matched. The answer options include the interpretation that would be correct for an unmatched study, incorrect interpretations, and the correct answer, namely that the model is inappropriate because of study design issues. The question was deliberately designed to appear to be about interpretation, so as not to cue respondents beyond the way such a question would arise in research or in reading the literature. Similarly, for question 18 (assessing normality of the error terms), the primary distractor shows a histogram of the outcome variable; a respondent reading quickly might mistake that for a histogram of the error terms themselves. Overall, it is hoped that the practicing statisticians' responses were often incorrect simply because they gave this quiz much less thought than they would give a study whose methods they discuss thoroughly with an investigator.

Limitations and future directions

This study included CTS students from only one institution, assessed before and after a single course over 3 years. As such, this should be considered an initial validation. A major limitation is that the author both taught the course and designed the REGRESS quiz. The REGRESS quiz did not use any course materials, and the course did not change after introduction of the quiz, in an effort to avoid "teaching to the test." Nevertheless, because of this limitation it is anticipated that other student groups may not show as much improvement as was observed in this cohort. Data collection with students from other institutions is under way; readers interested in using the REGRESS quiz should contact the author.

Future studies should incorporate multiple institutions and instructors and provide incentives for quiz completion to increase response rates. Future work should also address long-term understanding following coursework among students who participate in research activities and/or read the medical literature. It is also possible that restricting the practicing statistician data to those with doctoral degrees would yield improved scores for some of the challenging items.

After a version of the REGRESS quiz has been fully validated, future research should focus upon using this and other tools as outcome measures for research on interventions to improve statistics education. One worry is that regression may be taught as just a mathematical model and not a scientific tool. The scientist needs to fully evaluate the applicability of the regression model and its results, rather than simply running a model within a software package. The REGRESS quiz can be a way to assess whether students have gained the skills needed to evaluate assumptions and model results, as well as whether issues such as confounding, colinearity, and interaction are appropriately addressed.

Uses

There are a number of potential uses for the REGRESS quiz beyond the intended target population of students in CTS and PH statistics coursework.

Outcome assessment for research

The primary goal of the REGRESS quiz is to serve as an outcome measure for future research studies on interventions to improve teaching, learning, and retention for regression and related topics. As statistics is a critical skill for medical research, interventions are indicated to identify methods to improve teaching, learning, and retention. In order for such interventions to have broad impact, interinstitutional assessment with validated tools is needed.

Measure competency

In programs which train future researchers, the REGRESS quiz could be used to measure and document competency on matriculation and completion of the program. When used at the start of the program, students could identify domains in which they particularly need to improve their skills. At the end of the program, the change in scores could help demonstrate students' synthesis of coursework, thus documenting competency. Lee et al. (2012)[23] proposed an instrument described in Mullikin et al. (2007)[24] as a tool for evaluating CTSA programs. Although the Mullikin instrument assesses twelve steps toward research independence, it is based upon self-reported competency. As shown by Young, Glasziou, and Ward (2002),[14] physicians may be poorly prepared to evaluate their own ability in statistics.

In the effort to measure statistical competency, timing may be critical. It is expected that students just completing a course on regression will perform well on the REGRESS quiz. However, a competency assessment might be more relevant upon program completion or among program graduates. As time is quite limited for such potential respondents, there may be a need to develop a shorter version of the REGRESS quiz focusing on critical advanced topics.

Self assessment by instructors

Observing students’ areas of weakness may give instructors feedback on topics to improve or expand in their course materials. Many instructors rely on course evaluations to identify areas for improvement, but measurement of students’ skills by domain might provide more objective data. If used before the course, the domain scores could also help instructors focus course material to target student needs.

Self assessment by physicians

There may also be interest in an instrument such as this for physicians not engaged in a formal training program. Because of the need for annual continuing medical education (CME), many physicians have begun engaging with such coursework from CTSA programs. Much shorter than traditional courses for students with dedicated time, CME offers short, a la carte training in topics selected by the physician. When physicians apply for a CME course in statistics or a related field, they could be asked to complete an assessment tool first. Weaker domain scores on the REGRESS could then be linked to recommended CME coursework to help fill gaps.

Expand to other populations

In addition to graduate education preparing students to perform and publish research, there are two related populations for which the REGRESS might be of use. Evidence Based Medicine uses a series of techniques and tools designed to help physicians accurately assess the evidence in the medical literature. Although Evidence Based Medicine does not include performing research, the level of understanding needed to read the medical literature is quite comprehensive. The REGRESS or a subset of REGRESS questions might prove a useful tool to assess Evidence Based Medicine scholars. Similarly, college-level statistics coursework often includes regression. Although much work has been done on assessing understanding for the first course in statistics,[8] regression is often covered in a second course. No instruments designed to assess students’ understanding of regression for undergraduate statistics have yet been developed. Finally, graduate students in biostatistics are expected to develop competency to work with colleagues in CTS. It might be useful for students nearing completion of their graduate work to assess their skills with the REGRESS to measure their understanding on the same level expected of their future colleagues.

Conclusion

The REGRESS quiz has proved a strong instrument with success on initial validation. Students show demonstrable improvement following a course on linear regression and achieve a level of skill similar to that of practicing statisticians. Further work is required to assess the REGRESS quiz in a more varied sample to establish normative scores for medical research trainees.

Acknowledgments

Many thanks to Jill Killian for performing much of the statistical analysis.

This publication was supported by Grant Number UL1 TR000135 from the National Center for Advancing Translational Sciences (NCATS). Its contents are solely the responsibility of the author and do not necessarily represent the official views of the NIH.

References

  1. Horton NJ, Switzer SS. Statistical methods in the journal. N Engl J Med. 2005; 353(18): 1977–1979.
  2. Hellems MA, Gurka MJ, Hayden GF. Statistical literacy for readers of Pediatrics: a moving target. Pediatrics. 2007; 119(6): 1083–1088.
  3. Strasak AM, Zaman Q, Marinell G, Pfeiffer KP, Ulmer H. The use of statistics in medical research: a comparison of the New England Journal of Medicine and Nature Medicine. Am Stat. 2007; 61(1): 47–56.
  4. Wild CJ, Pfannkuch M, Regan M. Towards more accessible conceptions of statistical inference. J R Stat Soc Ser A Stat Soc. 2011; 174(2): 247–295.
  5. Enders F. Evaluating mastery of biostatistics for medical researchers: need for a new assessment tool. Clin Transl Sci. 2011; 4(6): 448–454.
  6. Windish DM, Huot SJ, Green ML. Medicine residents' understanding of the biostatistics and results in the medical literature. J Am Med Assoc. 2007; 298(9): 1010–1022.
  7. Novack L, Jotkowitz A, Knyazer B, Novack V. Evidence-based medicine: assessment of knowledge of basic epidemiological and research methods among medical doctors. Postgrad Med J. 2006; 82: 817–822.
  8. delMas R, Garfield J, Ooms A, Chance B. Assessing students' conceptual understanding after a first course in statistics. Stat Educ Res J. 2007; 6(2): 28–58.
  9. Kon AA. The Clinical and Translational Science Award (CTSA) Consortium and the translational research model. Am J Bioeth. 2008; 8(3): 58–60.
  10. Workgroup CECC. Core Competencies in Clinical and Translational Science for Master's Candidates. 2009. https://www.ctsacentral.org/documents/CTSA%20Core%20Competencies_%20final%202011.pdf, last accessed March 27, 2013.
  11. Calhoun JG, Ramiah K, Weist EM, Shortell SM. Development of a core competency model for the Master of Public Health degree. Am J Public Health. 2008; 98(9): 1598–1607.
  12. Reed JF 3rd, Salen P, Bagher P. Methodological and statistical techniques: what do residents really need to know about statistics? J Med Syst. 2003; 27(3): 233–238.
  13. West CP, Ficalora RD. Clinician attitudes toward biostatistics. Mayo Clin Proc. 2007; 82(8): 939–943.
  14. Young JM, Glasziou P, Ward JE. General practitioners' self ratings of skills in evidence based medicine: validation study. BMJ. 2002; 324(7343): 950–951.
  15. von Elm E, Altman DG, Egger M, Pocock SJ, Gotzsche PC, Vandenbroucke JP. Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) statement: guidelines for reporting observational studies. BMJ. 2007; 335(7624): 806–808.
  16. Vandenbroucke JP, von Elm E, Altman DG, Gøtzsche PC, Mulrow CD, Pocock SJ, Poole C, Schlesselman JJ, Egger M. Strengthening the Reporting of Observational Studies in Epidemiology (STROBE): explanation and elaboration. PLoS Med. 2007; 4(10): 1628–1654.
  17. Moher D, Hopewell S, Schulz KF, Montori V, Gøtzsche PC, Devereaux PJ, Elbourne D, Egger M, Altman DG. CONSORT 2010 explanation and elaboration: updated guidelines for reporting parallel group randomised trials. BMJ. 2010; 340: c869.
  18. Armstrong R, Waters E, Moore L, Riggs E, Cuervo LG, Lumbiganon P, Hawe P. Improving the reporting of public health intervention research: advancing TREND and CONSORT. J Public Health (Oxf). 2008; 30(1): 103–109.
  19. Des Jarlais D, Lyles C, Crepaz N, TREND Group. Improving the reporting quality of nonrandomized evaluations of behavioral and public health interventions: the TREND statement. Am J Public Health. 2004; 94(3): 361–366.
  20. Kirkwood BR, Sterne JAC. Essentials of Medical Statistics. 2nd ed. Malden, MA: Blackwell Science, Ltd; 2003.
  21. Katz MH. Multivariable Analysis: A Practical Guide for Clinicians. Cambridge, UK: Cambridge University Press; 1999.
  22. Kleinbaum DG, Kupper LL, Nizam A, Muller KE. Applied Regression Analysis and Other Multivariable Methods. 4th ed. Belmont, CA: Thomson Higher Education; 2008.
  23. Lee LS, Pusek SN, McCormack WT, Helitzer DL, Martina CA, Dozier AM, Ahluwalia JS, Schwartz LS, McManus LM, Reynolds BD, et al. Clinical and translational scientist career success: metrics for evaluation. Clin Transl Sci. 2012; 5: 400–407.
  24. Mullikin EA, Bakken LL, Betz NE. Assessing research self-efficacy in physician-scientists: the clinical research appraisal inventory. J Career Assess. 2007; 15(3): 367–387.