The influence of family groups on inferences made with the program Structure


Eric C. Anderson, Fax: (831) 420-3977; E-mail:


Unsupervised clustering algorithms, such as the one implemented in the program Structure, are increasingly used to infer the presence of population structure from a sample of genotyped individuals. We evaluate the extent to which the presence of related individuals can lead such algorithms to the false inference that there is population structure. First, we demonstrate this problem using a real data set from a rainbow trout (Oncorhynchus mykiss) population. Then we perform an extensive series of simulations involving the program Structure. Our simulations encompass both a simple scenario with fixed numbers of full- and half-siblings in the sample, and a more complicated scenario in which we investigate 360 combinations of population divergence, fraction of population sampled, variance in family size, mating system and number of loci. We find that the inclusion of family members in a sample may produce very strong evidence of population structure, even when population structure is absent. This problem becomes more pronounced when more loci are genotyped, and it is particularly likely in studies of monogamous species, especially if variance in family size is high and a large fraction of a small population has been sampled. Researchers working in such situations should test observed clusters for the presence of family members to distinguish family-induced structure from real population structure. Additionally, this work shows that several factors may influence Structure's ability to estimate the number of subpopulations, so such estimates should be interpreted guardedly.