Minding the gap: frequency of indels in mtDNA control region sequence data and influence on population genetic analyses



    1. Institute of Arctic Biology and Department of Biology and Wildlife, University of Alaska, Fairbanks, AK 99775 and Alaska Science Center, US Geological Survey, Anchorage, AK 99503

John M. Pearce, Fax: 907-786-3636; E-mail: john_pearce@usgs.gov


Insertions and deletions (indels) result in sequences of various lengths when homologous gene regions are compared among individuals or species. Although indels are typically phylogenetically informative, the occurrence and incorporation of these characters as gaps in intraspecific population genetic data sets are rarely discussed. Moreover, the impact of gaps on estimates of fixation indices, such as FST, has not been reviewed. Here, I summarize the occurrence and population genetic signal of indels among 60 published studies that involved alignments of multiple sequences from the mitochondrial DNA (mtDNA) control region of vertebrate taxa. Among the 30 studies that observed indels, gap characters made up an average of 12% of both variable and parsimony-informative sites. There was no consistent relationship between levels of population differentiation and the number of gap characters in a data block. Across all studies, the average influence of gap characters on estimates of ΦST was small: including them explained only an additional 1.8% of among-population variance (range 0.0–8.0%). Studies most likely to observe an increase in ΦST with the inclusion of gap characters were those with < 20 variable sites, but a nearly equal number of studies with few variable sites did not show an increase. In contrast to studies at interspecific levels, the influence of indels on intraspecific population genetic analyses of control region DNA appears small, dependent upon the total number of variable sites in the data block, and related to species-specific characteristics and the spatial distribution of mtDNA lineages that contain indels.
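
As a rough illustration of the kind of tally summarized above, the following Python sketch counts the variable sites in an aligned data block and the share of those sites that contain a gap character. It is not the procedure used in the reviewed studies: the function name `site_summary`, the toy alignment, and the choice to treat a gap (`-`) as a fifth character state are assumptions made for this example only.

```python
from typing import List, Tuple


def site_summary(alignment: List[str], gap: str = "-") -> Tuple[int, int]:
    """Return (variable_sites, variable_sites_with_gaps) for an aligned block.

    Assumption for this sketch: a gap is scored as a fifth character state,
    so a column holding one nucleotide plus gaps counts as variable.
    """
    if len({len(seq) for seq in alignment}) != 1:
        raise ValueError("sequences must be aligned to equal length")

    variable = 0
    gapped = 0
    # zip(*alignment) walks the alignment column by column.
    for column in zip(*alignment):
        states = set(column)
        if len(states) > 1:
            variable += 1
            if gap in states:
                gapped += 1
    return variable, gapped


# Hypothetical four-sequence alignment, invented for illustration.
toy_alignment = [
    "ACGT-ACGTA",
    "ACGTTACGTA",
    "ACGT-ACCTA",
    "ACATTACCTA",
]

variable, gapped = site_summary(toy_alignment)
gap_fraction = gapped / variable  # share of variable sites that are indels
print(variable, gapped, round(gap_fraction, 2))
```

In this toy block, three columns vary and one of them contains a gap, so gaps account for one third of the variable sites. Real control region alignments would need additional decisions that this sketch ignores, such as whether a multi-base indel is scored as one event or as several independent sites.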