Assessing similarity of DNA profiles

Authors


Graham Hepworth, Statistical Consulting Centre, University of Melbourne, Victoria 3010, Australia.
E-mail: g.hepworth@ms.unimelb.edu.au

Abstract

Summary.  The genetic similarity of strains of a pathogen can be assessed by using a matrix of dissimilarities that is derived from bands in their DNA profile which are present or absent. The dependence between elements of the dissimilarity matrix, if not accounted for, results in underestimation of the variance in comparisons between groups of strains which are differentiated according to the possession of an attribute. We examine a previously proposed statistic for determining whether a group of strains is more similar than expected. We show the limitations of this statistic and propose a new statistic which better addresses the hypotheses that are usually considered in this field of study. The statistic proposed is based on similarity between strains within the group of interest and with those outside. This statistic also needs to account for the dependence in the raw data, and we use the correlation between elements of the dissimilarity matrix to investigate how this dependence affects the underestimation of the variance. Using examples involving the pathogenic yeast Candida, we show how permutation tests can be applied to the differentiation of groups of strains.
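As a rough illustration of the permutation idea described above, the following minimal sketch relabels whole strains rather than individual elements of the dissimilarity matrix, so the dependence between elements is carried through every permutation. It is not the statistic proposed in the paper, whose exact form is not given in this summary. The Jaccard dissimilarity, the contrast used in `group_statistic` (mean within-group dissimilarity minus mean dissimilarity between the group and the remaining strains), the band profiles and all function names are assumptions made only for this example.

```python
import random
from itertools import combinations

def jaccard_dissimilarity(a, b):
    """1 - (bands shared) / (bands present in either profile). Assumed measure."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return 0.0 if either == 0 else 1.0 - both / either

def dissimilarity_matrix(profiles):
    """Symmetric matrix of pairwise dissimilarities between band profiles."""
    n = len(profiles)
    d = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        d[i][j] = d[j][i] = jaccard_dissimilarity(profiles[i], profiles[j])
    return d

def group_statistic(d, in_group):
    """Hypothetical contrast: mean within-group dissimilarity minus mean
    dissimilarity between group members and strains outside the group.
    Needs at least two strains inside the group and one outside."""
    inside = [i for i, g in enumerate(in_group) if g]
    outside = [i for i, g in enumerate(in_group) if not g]
    within = [d[i][j] for i, j in combinations(inside, 2)]
    between = [d[i][j] for i in inside for j in outside]
    return sum(within) / len(within) - sum(between) / len(between)

def permutation_p_value(d, in_group, n_perm=2000, seed=0):
    """One-sided Monte Carlo permutation test. Group labels are shuffled
    across strains, so each permuted matrix keeps its dependence structure."""
    rng = random.Random(seed)
    observed = group_statistic(d, in_group)
    labels = list(in_group)
    as_extreme = 0
    for _ in range(n_perm):
        rng.shuffle(labels)
        if group_statistic(d, labels) <= observed:
            as_extreme += 1
    # Add one to numerator and denominator so the estimate is never zero.
    return observed, (as_extreme + 1) / (n_perm + 1)

# Made-up presence/absence band profiles for six strains (illustration only).
profiles = [
    [1, 1, 0, 1, 0, 0],
    [1, 1, 0, 1, 0, 1],
    [1, 1, 0, 1, 1, 0],
    [0, 0, 1, 0, 1, 1],
    [0, 1, 1, 0, 0, 1],
    [1, 0, 1, 0, 1, 0],
]
in_group = [True, True, True, False, False, False]  # strains with the attribute

d = dissimilarity_matrix(profiles)
observed, p_value = permutation_p_value(d, in_group)
print(observed, p_value)
```

A negative observed value indicates that strains in the group are more similar to each other than to strains outside it. With so few strains the number of distinct relabellings is small, so the attainable p-values are coarse.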
