The standard of much medical research is poor because of methodological and statistical weaknesses: ‘Huge sums of money are spent annually on research that is seriously flawed through the use of inappropriate designs, unrepresentative samples, small samples, incorrect methods of analysis, and faulty interpretation.’ (Altman 1994). The misuse of statistics is unethical: it wastes patients' time and may expose them to unjustified risk, it wastes research resources, and the publication of misleading results may initiate unnecessary further research (Altman 1991, pp. 477–504). While the quality of research papers is primarily the responsibility of authors, journal editors can help, in particular by strengthening the peer review process (Altman 2002). During 1999 the JAN Management Team introduced statistical reviewers for all appropriate papers in addition to the use of two expert subject reviewers for each article. By January 2000, two statisticians were regularly reviewing manuscripts; by 2002 the statistical workload had increased to such an extent that 10 reviewers were in place to meet the demand. As one of these reviewers, I became aware of the extent of poorly designed research submitted to JAN.

The design of a study is of fundamental importance to its validity. I will consider in detail a paper in this issue, which illustrates several aspects of good design (Duaso & Cheung 2002). This report, on patients' views of health promotion, is typical of many studies submitted to JAN which are surveys based on interviews or questionnaires.

Study population

Firstly, the population studied was clearly defined: all 3612 patients registered in early 1999 with a specific general practice in North-east England. It would have been time-consuming and costly to survey all these patients, so the authors carefully chose a representative group (Altman 1991, section 4·3, p. 50).

Sample size

Duaso and Cheung were interested in estimating the proportion of patients who would like more health advice. However, proportions estimated from a sample will not be exactly the same as the proportions in the entire study population. The larger the sample, the closer the estimates are likely to be – but the more expensive the study. Duaso and Cheung therefore first estimated the size of sample required to give them estimates of proportions which were likely to be reasonably close to the actual proportions in the entire study population.
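The calculation that usually lies behind such a sample size estimate can be sketched in a few lines. This is an illustration only: Duaso and Cheung's actual inputs are not reproduced here, so the worst-case guess p = 0.5 and a 5% margin of error are assumptions.

```python
import math

def sample_size_for_proportion(p=0.5, margin=0.05, z=1.96):
    """Sample size needed so that a 95% confidence interval for a
    proportion has half-width `margin` (normal approximation).
    p = 0.5 is the conservative, worst-case guess for the unknown proportion.
    """
    return math.ceil(z ** 2 * p * (1 - p) / margin ** 2)

# With the assumed inputs, about 385 patients are needed:
print(sample_size_for_proportion())             # 385
# Accepting a wider margin of error needs far fewer patients:
print(sample_size_for_proportion(margin=0.10))  # 97
```

Note how quickly the required size falls as the acceptable margin widens: precision, not population size, drives the calculation.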

Representative sample

Duaso and Cheung then selected a systematic sample: they obtained a list of everyone in the study population, sorted it by age and selected every seventh patient into their sample. This ensured that, as regards age, the sample was representative of all patients registered with the practice. An alternative, frequently used method of obtaining a representative group is to select a random sample. Most statistical packages will select a random sample – based on the generation of random numbers – from a complete list of the study population, if you specify the percentage of the population to be included in the sample. Unfortunately, many studies are invalid because they use inappropriate sampling techniques, which result in unrepresentative, biased samples of the study population. For example, a ‘convenience sample’ of people attending the surgery on a specific day would have over-represented those who were ill more often, and these patients' views on the health advice they had received from the practice might not have been typical. ‘Convenience samples’ are invalid samples; regrettably, many papers submitted to JAN rely on them. Use of incorrect methods can be difficult to stop, as researchers copy each other's mistakes (Altman 2002).
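Both sampling schemes are easy to sketch. The code below is illustrative: the patient register and ages are invented, and `step=7` simply mirrors the every-seventh-patient rule described above.

```python
import random

def systematic_sample(population, step=7):
    """Sort the register by age, then take every `step`-th patient,
    starting from a randomly chosen position among the first `step`."""
    ordered = sorted(population, key=lambda patient: patient["age"])
    start = random.randrange(step)
    return ordered[start::step]

def simple_random_sample(population, fraction):
    """Select a simple random sample containing `fraction` of the register."""
    k = round(len(population) * fraction)
    return random.sample(population, k)

# A toy register the size of the study population (ages are made up):
register = [{"id": i, "age": random.randint(16, 95)} for i in range(3612)]
sample = systematic_sample(register)
print(len(sample))  # 516 patients, i.e. one in seven
```

Either approach yields a sample that is representative with respect to the sorting variable (here, age); a convenience sample offers no such guarantee.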

Piloting of questionnaire

Before sending the questionnaire out to the patients in the sample, Duaso and Cheung piloted it on a different group of patients. This was a test run to check whether any unforeseen problems arose, for example with the wording of the questions. Reports of pilot studies are sometimes submitted to JAN without the authors appreciating that a pilot study is a preliminary to refinement of the design for a full-scale study.

Response

Duaso and Cheung wanted as many patients in the sample as possible to complete and return the questionnaire. One of the doctors in the practice signed a covering letter that was sent out with the questionnaire and which aimed to encourage a good response. Nevertheless, only 46% of those in the sample returned the questionnaire initially. Duaso and Cheung therefore sent it out again to those who had not returned it and hence achieved a better response rate of 64%. It is basic good practice to send out a follow-up mailing to non-respondents and to try to contact remaining non-respondents by telephone.

But is a response rate of 64% good enough? It means that about one person in three did not reply. If these non-respondents were typical of the sample, then the non-response does not matter much. But if they were not, then the response was biased, which could make a big difference to the findings. For example, Duaso and Cheung report that 33% of the sample wanted more advice on stress management (see Duaso & Cheung, Figure 1). If all the 179 non-respondents also wanted advice on stress management, then 58% of the sample actually wanted such advice; on the other hand, if none of the non-respondents wanted advice on stress management, then only 21% of the sample wanted such advice. So the actual estimate of the proportion wanting advice is somewhere between 21% and 58% – and there is further imprecision which comes from using a sample rather than the entire study population. To avoid such wide imprecision, it is imperative that you do everything possible to get a high response rate.
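This worst-case reasoning is simple arithmetic and can be made explicit. The counts below are rough reconstructions from the published percentages (not the paper's exact tabulated figures), so the bounds come out near, but not exactly at, the 21% and 58% quoted above.

```python
def response_bounds(n_yes, n_respondents, n_nonrespondents):
    """Best- and worst-case proportions in the whole sample, treating every
    non-respondent first as a 'no', then as a 'yes'."""
    n_total = n_respondents + n_nonrespondents
    lower = n_yes / n_total                        # no non-respondent says yes
    upper = (n_yes + n_nonrespondents) / n_total   # every non-respondent says yes
    return lower, upper

# Roughly a third of 337 respondents wanted stress-management advice,
# with 179 non-respondents (approximate, reconstructed counts):
low, high = response_bounds(n_yes=111, n_respondents=337, n_nonrespondents=179)
print(f"between {low:.0%} and {high:.0%}")
```

The width of this interval depends only on the non-response fraction, which is why a high response rate matters so much.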

Unfortunately, some of the studies submitted to JAN do not even report response rates, so it is very difficult to assess the validity of the results they report.

Characteristics of non-respondents

How can you find out if non-response has biased your results? You may be able to find out if the non-respondents were different from the respondents in any way that was likely to have affected their replies. Often some of the characteristics of the non-respondents are known. Duaso and Cheung found that respondents were significantly more likely than non-respondents to be female and to be older. Such differences should be kept in mind when interpreting the results. How do you think the over-representation of women among the respondents would have affected the estimate of interest in advice on stress management?

Confidence intervals

Duaso and Cheung present confidence intervals on their main results. What do these mean? Remember that Duaso and Cheung did not survey every patient registered with the practice – they surveyed a sample. If they had repeated the survey in the same practice, but choosing a different representative sample, they would have obtained slightly different results. If they had repeated the survey 100 times, with 100 different samples, they would have got a series of 100 different results. However, 95% of these results would probably have been inside the confidence intervals which they present. So the confidence intervals give us an idea of the range of uncertainty which is due to surveying a sample rather than the entire study population.
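For a single proportion, the usual 95% confidence interval is simply the estimate plus or minus 1.96 standard errors. A minimal sketch using the normal approximation follows; the counts are illustrative, not Duaso and Cheung's exact figures.

```python
import math

def proportion_ci(successes, n, z=1.96):
    """Approximate 95% confidence interval for a proportion,
    using the normal approximation p +/- z * sqrt(p(1-p)/n)."""
    p = successes / n
    se = math.sqrt(p * (1 - p) / n)
    return p - z * se, p + z * se

# e.g. 111 of 337 respondents giving a particular answer:
lo, hi = proportion_ci(111, 337)
print(f"{111/337:.0%} (95% CI {lo:.0%} to {hi:.0%})")  # 33% (95% CI 28% to 38%)
```

Because the standard error shrinks with the square root of n, halving the width of the interval requires roughly four times as many respondents.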

External validity

The choice of a representative sample and the minimizing of bias in response contribute to the internal validity of a study – whether the findings give us a true picture of the study population. External validity refers to whether the findings can be applied to other, different populations. If the study population has not been clearly defined, it is impossible to assess the external validity of a study.

For example, are the results of Duaso and Cheung's study likely to be relevant to general practices in other regions of the United Kingdom – or in other countries? Duaso and Cheung tried to give us some idea of whether the findings might be applicable to other practices by describing the range of staff and medical services provided by the practice they studied and the area it covered: a town and nine villages in North-east England, containing both deprived and affluent localities but without extremes of either poverty or wealth. Do you think the results of this study would be applicable to a practice in Inner London?

Finally, how could the study have been improved? The study would have been much more informative if more than one practice had been included, for example if a random sample of practices within the region had been surveyed. Such a design would have given information about variation between practices and would have allowed inferences to be made about practices throughout the region. The limitation to one practice means that the results cannot be extrapolated with confidence beyond the practice studied, as there is no information whatsoever on variation between practices.

Cluster randomized designs

A survey of several practices would have required a more complex design, such as a cluster randomized design (Donner & Klar 1999), in which clusters (in this case, practices) are selected at random from all those in a defined area such as a region. This is often the appropriate design in nursing research: subjects (usually patients or nurses) are often naturally grouped within clusters (practices, hospitals or care homes). Subjects within a cluster are likely to be more similar to each other than to subjects in a different cluster; for example, patients within a practice may give similar responses because they have experience of the same nurse. However, cluster randomized studies are often analysed without allowing for clustering (Altman 2002). If the clustering is ignored, the resulting P-values will be artificially extreme and confidence intervals artificially narrow, so effects may appear to be statistically significant when actually they are not.
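The cost of ignoring clustering is often quantified by the design effect, 1 + (m − 1) × ICC, where m is the cluster size and ICC the intracluster correlation. The sketch below uses assumed, purely illustrative numbers.

```python
def design_effect(cluster_size, icc):
    """Factor by which clustering inflates the variance of an estimate,
    compared with a simple random sample of the same total size."""
    return 1 + (cluster_size - 1) * icc

def effective_sample_size(n, cluster_size, icc):
    """Number of independent observations the clustered sample is worth."""
    return n / design_effect(cluster_size, icc)

# e.g. 20 practices of 50 patients each, with a modest assumed
# within-practice correlation of 0.05:
print(design_effect(50, 0.05))                # about 3.45
print(effective_sample_size(1000, 50, 0.05))  # about 290 patients
```

Even a small intracluster correlation can mean that 1000 clustered patients carry the information of only a few hundred independent ones – which is exactly why an analysis that ignores clustering overstates its own precision.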

Analyses

Duaso and Cheung did not carry out any sophisticated analyses. They simply reported mean values and proportions and checked if there were significant differences between groups using t-tests and chi-squared tests. This is acceptable – and has the merit of being understandable to most readers.
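For a 2 × 2 table, the chi-squared test they used reduces to a single formula. The counts below are hypothetical (for example, respondents versus non-respondents split by sex), not the paper's actual data.

```python
def chi_squared_2x2(a, b, c, d):
    """Pearson chi-squared statistic for the 2x2 table [[a, b], [c, d]],
    e.g. rows = respondent/non-respondent, columns = female/male."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

# Hypothetical counts: 210 female and 127 male respondents,
# 85 female and 94 male non-respondents:
stat = chi_squared_2x2(210, 127, 85, 94)
print(round(stat, 1))  # 10.5 -- well above 3.84, the 5% critical value on 1 df
```

A statistic this large would indicate a genuine difference in sex composition between respondents and non-respondents, of the kind discussed earlier.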

Interpretation

Finally, while Duaso and Cheung acknowledged the limitations of their study design, it would have been relevant and interesting to know how they thought these affected the interpretation of their results. No study is perfect: despite the best laid plans, you may realize after you have started your study that the design could have been better. If so, it is important not only to report the limitations of the design but also to discuss how they may have affected your findings.

Summary

Most of the issues discussed above concern design. You can get advice on this in several textbooks (for example, Altman 1991, Bland 1995, Campbell 2002). The assessment form used by JAN statistical referees has been posted on the JAN website (http://www.blackwell-science.com/jan/default.asp?page=authors&file=referee8) – read this to know how your paper will be evaluated. But ideally, you should discuss the design of your proposed study with a statistician before you start (Altman et al. 2002). If you make mistakes in your analysis, they can always be corrected: the statistical referee may ask you to revise your paper. But if you choose an inappropriate study design, this cannot be rectified without repeating the entire study: the statistical referee may have no option but to recommend rejection of your paper.

Sophisticated analysis is not always necessary. Good design is of fundamental importance.

References

  • Altman D.G. (1991) Practical Statistics for Medical Research. Chapman & Hall, London.
  • Altman D.G. (1994) The scandal of poor medical research. BMJ 308, 283–284.
  • Altman D.G. (2002) Poor-quality medical research: What can journals do? JAMA 287, 2765–2767.
  • Altman D.G., Goodman S.N. & Schroter S. (2002) How statistical expertise is used in medical research. JAMA 287, 2817–2820.
  • Bland M. (1995) An Introduction to Medical Statistics, 2nd edn. Oxford University Press, Oxford.
  • Campbell M. (2002) Statistics at Square One, 10th edn. BMJ Books, London.
  • Donner A. & Klar N.S. (1999) Design and Analysis of Cluster Randomisation Trials in Health Research. Arnold, London.
  • Duaso M.J. & Cheung P. (2002) Health promotion and lifestyle advice in a general practice: what do patients think? Journal of Advanced Nursing 39, 472–479.