# JAN Forum: your views and letters

Article first published online: 11 DEC 2002

DOI: 10.1046/j.1365-2648.2002.02442.x

#### How to Cite

Watson, R., Deary, I. J. and Dickinson, H. O. (2002), JAN Forum: your views and letters. Journal of Advanced Nursing, 40: 649–650. doi: 10.1046/j.1365-2648.2002.02442.x

#### Publication History

- Issue published online: 11 DEC 2002

### Response to: ‘Editorial: Use and abuse of statistics in nursing research’ by H.O. Dickinson (2002) *Journal of Advanced Nursing* 39, 405–407

Heather Dickinson's parting shot from *JAN* is a valuable resource for those in nursing research and beyond who wish to use statistics to analyse data and justify their conclusions. However, it does not paint a complete picture, and it is misleading in its unqualified assertion that ‘“Convenience samples” are invalid samples’ (p. 405). As two authors who have suffered at the hands of statistical reviewers, including those of *JAN*, on this very issue when submitting papers based on factorial analysis, we wish to correct this assertion. In doing so we draw heavily on the words of Thurstone (1947), the inventor of multiple factor analysis.

Factorial analysis, which includes a range of techniques, is designed to look for patterns in the correlations among variables, usually in multiple-item questionnaires, before applying the emergent latent traits as measurement instruments to the general population. As such ‘the identification of a factor or parameter does not presuppose the experimental population to be in any sense representative of any sort of hypothetical general population’ (Thurstone 1947, p. 324). In fact, according to Thurstone, ‘the best procedure is not to select a random group of subjects but rather to select the subjects so that their attributes are as diverse as possible in the domain to be studied’ (pp. 324–325). Thurstone asserted that ‘Two or three freaks in the characteristic of interest are worth more than 50 average subjects’ (p. 325).

Once a factor has been identified, it can be applied to a general population in order to measure the range of scores on that factor. At this point in the process, issues of representativeness become important. Before that, however, while the dimensionality of the domain under study is being discovered, ‘No assumption of normality of the distribution is involved in factorial analysis’ (Thurstone 1947, p. 325).

Finally, to quote Thurstone (1947, p. 325) at length:

‘The question of whether the correlations between (variables) in an experimental population are in any sense representative of a general population is irrelevant. Nor does it matter whether the correlations for one experimental group differed markedly from the correlations of the same (variables) in another experimental group. If the factorial analyses are made independently, the same factors should be identified, but the numerical values would, of course, be different if the groups differed widely. The correlations for each group are determined by the factors involved in the (questionnaires) and by the correlations of the factors in each experimental group.’

Good science requires the appropriate sample for the question being studied.
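Thurstone's claim that two quite different groups should yield the same factors, while showing different numerical correlations, can be illustrated with a small calculation. This is a sketch under stated assumptions, not the authors' method: it assumes a single common factor, hypothetical loadings and uniquenesses for four items, and two hypothetical groups that differ only in the variance of that factor. The check used is the classical tetrad difference, which is zero when one common factor accounts for the correlations.

```python
import math

def one_factor_correlations(loadings, uniquenesses, factor_var):
    """Population correlation matrix implied by a one-factor model
    x_i = lambda_i * f + e_i, with Var(f) = factor_var and Var(e_i) = psi_i."""
    p = len(loadings)
    cov = [[loadings[i] * loadings[j] * factor_var + (uniquenesses[i] if i == j else 0.0)
            for j in range(p)] for i in range(p)]
    sd = [math.sqrt(cov[i][i]) for i in range(p)]
    return [[cov[i][j] / (sd[i] * sd[j]) for j in range(p)] for i in range(p)]

def tetrad(r, i, j, k, l):
    """Tetrad difference r_ij * r_kl - r_ik * r_jl; it vanishes for every set
    of four variables when a single common factor explains the correlations."""
    return r[i][j] * r[k][l] - r[i][k] * r[j][l]

# Hypothetical loadings and uniquenesses for four questionnaire items.
LOADINGS = [0.8, 0.7, 0.6, 0.5]
UNIQUENESSES = [0.36, 0.51, 0.64, 0.75]

# Hypothetical groups: B is more diverse on the factor than A.
group_a = one_factor_correlations(LOADINGS, UNIQUENESSES, factor_var=1.0)
group_b = one_factor_correlations(LOADINGS, UNIQUENESSES, factor_var=4.0)

for name, r in (("A", group_a), ("B", group_b)):
    print(f"group {name}: r12 = {r[0][1]:.3f}, tetrad = {tetrad(r, 0, 1, 2, 3):.1e}")
```

With these values the correlation between items 1 and 2 is 0.56 in group A and noticeably higher in group B, yet the tetrad difference is zero (up to floating-point error) in both groups, so the same single factor underlies each set of correlations.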

### Note

Where Thurstone used ‘test’, we have substituted ‘variables’ or ‘questionnaires’. Thurstone was concerned with the correlations between different psychometric tests, but the principles are precisely the same.

### Response to Watson and Deary's critique

I would like to thank Professor Watson and Professor Deary for giving me the opportunity to explain in more detail why ‘convenience samples’ are inappropriate.

In medical statistics, we are usually using information about a sample of individuals to make some inference about a wider population. As explained in basic textbooks written by distinguished medical statisticians of the present day (e.g. Altman 1991 [especially section 1·3, pp. 5–8; section 4·3, p. 50; sections 5·1–5·3, pp. 74–78; section 5·5·5, pp. 82–83]; Armitage *et al*. 2001 [section 4·1, pp. 83–92]; Bland 2000 [chapters 2 and 3, pp. 5–46]), if the sample is not representative of the population of interest, the findings from the (usually small) sample cannot be extrapolated to any wider group and are therefore of little interest.
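A small simulation shows why extrapolating from an unrepresentative sample can mislead. It is a sketch with entirely hypothetical numbers: a population whose score rises with age, a simple random sample, and a ‘convenience’ sample that happens to contain only people aged over 70 (an assumption standing in for whatever makes a sample easy to obtain).

```python
import random
import statistics

random.seed(42)  # fixed seed so the illustration is reproducible

# Hypothetical population of 10,000 people whose score rises with age.
population = []
for _ in range(10_000):
    age = random.uniform(20, 90)
    score = 50 + 0.3 * (age - 55) + random.gauss(0, 5)
    population.append((age, score))

population_mean = statistics.mean(score for _, score in population)

# Probability sample: every member has the same chance of selection.
random_sample = random.sample(population, 200)

# Hypothetical convenience sample: the first 200 people aged over 70.
convenience_sample = [person for person in population if person[0] > 70][:200]

random_mean = statistics.mean(score for _, score in random_sample)
convenience_mean = statistics.mean(score for _, score in convenience_sample)

print(f"population mean:         {population_mean:.1f}")
print(f"random sample mean:      {random_mean:.1f}")
print(f"convenience sample mean: {convenience_mean:.1f}")
```

The random-sample mean lands close to the population mean, whereas the convenience-sample mean is several points too high; enlarging the convenience sample from the same group would make the estimate more precise but would not remove that bias.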

It is sometimes argued that unrepresentative (‘convenience’) samples are valid in pilot studies. The objective of a pilot study is to refine the research design before carrying out a full-scale study. While a pilot study based on an unrepresentative sample may help the researcher to test – and hence improve – the study design, it is questionable whether a journal would wish to publish a report of such preliminary refinements.

Professors Watson and Deary argue that factor analysis does not require the use of representative samples, quoting extensively from a textbook written over half a century ago to support their case (Thurstone 1947). Factor analysis is a technique which seeks to identify a small number of ‘factors’ which can reconstruct the distribution of the original variables and so give a more parsimonious description of the data. The factors identified depend on the correlations between the original variables. These factors are subjective (technically we say that the equations defining the factors are indeterminate and have an infinite number of possible solutions, as reflected by different rotations resulting in different factor loadings). Factor analysis is widely used in psychometric research, where it may throw light on the factors apparently affecting test scores. The place of factor analysis in other scientific fields has been questioned (Armitage *et al*. 2001, p. 464).
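The indeterminacy mentioned above can be checked numerically. An orthogonal rotation R turns a loading matrix L into LR but leaves L Lᵀ unchanged, because R Rᵀ = I, so the correlations the model reproduces are identical while the loadings differ. The sketch assumes hypothetical loadings for six items on two factors and an arbitrary 30-degree rotation; uniquenesses are omitted because rotation does not affect them, and oblique rotations are not shown.

```python
import math

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def transpose(a):
    return [list(row) for row in zip(*a)]

def rotation(theta):
    """2 x 2 orthogonal rotation matrix for an angle theta in radians."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]

# Hypothetical loadings of six items on two factors (rows = items).
loadings = [[0.8, 0.1], [0.7, 0.2], [0.6, 0.0],
            [0.1, 0.7], [0.2, 0.8], [0.0, 0.6]]
rotated = matmul(loadings, rotation(math.radians(30)))

# Common part of the implied correlation matrix (L times its transpose),
# before and after rotation.
implied_before = matmul(loadings, transpose(loadings))
implied_after = matmul(rotated, transpose(rotated))

largest_change = max(abs(x - y)
                     for row_b, row_a in zip(implied_before, implied_after)
                     for x, y in zip(row_b, row_a))

print("item 1 loadings before:", loadings[0])
print("item 1 loadings after: ", [round(v, 3) for v in rotated[0]])
print("largest change in implied correlations:", largest_change)
```

Both sets of loadings fit the data equally well; choosing between them (for example by seeking ‘simple structure’) is a judgement rather than something the correlations themselves determine.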

In particular, Professors Watson and Deary argue that in factor analysis, representative samples are not only unnecessary but ‘a few freaks are more useful than 50 average subjects’. This is incorrect: selecting unrepresentative samples – such as those with excess variation in the characteristics of interest or those with only the extremes of the distribution – runs a very serious risk of bias, as the correlation structure within the unrepresentative sample may be untypical of a wider population and hence the factors identified may not be relevant to any wider population.
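The risk described here can be illustrated by simulation. The sketch assumes a hypothetical bivariate normal population with correlation 0.5 and compares the correlation in the whole population with that in two selected samples: one keeping only the extremes of x and one keeping only its middle range. The cut-offs of 1.5 and 0.5 standard deviations are arbitrary choices.

```python
import math
import random

def pearson(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def corr_of(pairs):
    xs, ys = zip(*pairs)
    return pearson(xs, ys)

random.seed(2002)  # fixed seed for reproducibility
RHO = 0.5          # hypothetical population correlation between x and y

# Simulated bivariate normal population with correlation RHO.
population = []
for _ in range(20_000):
    x = random.gauss(0, 1)
    y = RHO * x + math.sqrt(1 - RHO ** 2) * random.gauss(0, 1)
    population.append((x, y))

r_all = corr_of(population)
# Selection keeping only the extremes of x (excess variation).
r_extremes = corr_of([p for p in population if abs(p[0]) > 1.5])
# Selection keeping only the middle of the distribution of x (restricted range).
r_middle = corr_of([p for p in population if abs(p[0]) < 0.5])

print(f"whole population: r = {r_all:.2f}")
print(f"extremes only:    r = {r_extremes:.2f}")
print(f"middle only:      r = {r_middle:.2f}")
```

The extremes-only correlation should come out at roughly 0.75 and the middle-only correlation at roughly 0.16, close to the values expected theoretically for these cut-offs, so a sample selected on the characteristic of interest can report correlations quite different from those in the population.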

They also assert that ‘No assumption of normality of the distribution is involved in factor analysis’. In fact, present-day maximum likelihood methods for estimating factors (which are implemented in standard statistical packages) are based on the assumption that the factors are multivariate normal (Basilevsky 1994, section 6·2, pp. 353–361). Hence an assumption of normality is *critical* to the validity of the estimates of the factors.
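For reference, the fit function minimized in maximum likelihood factor analysis is obtained from the multivariate normal log-likelihood of the observed variables, which is where the normality assumption enters. The notation here is the standard textbook one and is assumed rather than taken from the letter: $S$ is the sample covariance matrix of the $p$ observed variables, $\Lambda$ the $p \times k$ matrix of loadings and $\Psi$ the diagonal matrix of uniquenesses.

$$
\Sigma = \Lambda\Lambda^{\top} + \Psi, \qquad
F_{\mathrm{ML}}(\Lambda, \Psi) = \ln\lvert\Sigma\rvert + \operatorname{tr}\!\left(S\,\Sigma^{-1}\right) - \ln\lvert S\rvert - p .
$$

Minimizing $F_{\mathrm{ML}}$ over $\Lambda$ and $\Psi$ is equivalent to maximizing the normal likelihood, and $F_{\mathrm{ML}} = 0$ exactly when the model reproduces $S$.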

In the earlier part of the last century, Thurstone – and some other statisticians – believed that representative samples and assumptions of normality in factor analysis were unnecessary. Thurstone was writing at a time when statistical inference was a new discipline (Morrison 1967, p. 260). However, statistical theory has moved on since 1947, in both factor analysis (Kleinbaum *et al*. 1988, Basilevsky 1994) and sampling (Konijn 1973, Barnett 1974, Thompson 1997).

### References

- Altman D.G. (1991) Practical Statistics for Medical Research. Chapman & Hall, London.
- Armitage P., Berry G. & Matthews J.N.S. (2001) Statistical Methods in Medical Research, 4th edn. Blackwell Science, Oxford.
- Barnett V. (1974) Elements of Sampling Theory. English Universities Press, London.
- Basilevsky A. (1994) Statistical Factor Analysis and Related Methods: Theory and Applications. Wiley, New York.
- Bland M. (2000) An Introduction to Medical Statistics, 3rd edn. Oxford University Press, Oxford.
- Kleinbaum D.G., Kupper L.L. & Muller K.E. (1988) Applied Regression Analysis and Other Multivariable Methods, 2nd edn. PWS-Kent Publications Co., Boston.
- Konijn H.S. (1973) Statistical Theory of Sample Survey Design and Analysis. North-Holland Publications Co., Amsterdam.
- Morrison D.F. (1967) Multivariate Statistical Methods. McGraw-Hill, New York.
- Thompson M.E. (1997) Theory of Sample Surveys. Chapman & Hall, London.
- Thurstone L.L. (1947) Multiple-factor Analysis. University of Chicago Press, Chicago.