Correcting for regression dilution bias: comparison of methods for a single predictor variable


Chris Frost Medical Statistics Unit, London School of Hygiene and Tropical Medicine, Keppel Street, London, WC1E 7HT,UKC.FROST@LSHTM.AC.UK


In an epidemiological study the regression slope between a response and predictor variable is underestimated when the predictor variable is measured imprecisely. Repeat measurements of the predictor in individuals in a subset of the study or in a separate study can be used to estimate a multiplicative factor to correct for this ‘regression dilution bias’. In applied statistics publications various methods have been used to estimate this correction factor. Here we compare six different estimation methods and explain how they fall into two categories, namely regression and correlation-based methods. We provide new asymptotic variance formulae for the optimal correction factors in each category, when these are estimated from the repeat measurements subset alone, and show analytically and by simulation that the correlation method of choice gives uniformly lower variance. The simulations also show that, when the correction factor is not much greater than 1, this correlation method gives a correction factor which is closer to the true value than that from the best regression method on up to 80% of occasions. We also provide a variance formula for a modified correlation method which uses the standard deviation of the predictor variable in the main study; this shows further improved performance provided that the correction factor is not too extreme. A confidence interval for a corrected regression slope in an epidemiological study should reflect the imprecision of both the uncorrected slope and the estimated correction factor. We provide formulae for this and show that, particularly when the correction factor is large and the size of the subset of repeat measures is small, the effect of allowing for imprecision in the estimated correction factor can be substantial.