Incorporation of gap characters and lineage-specific regions into phylogenetic analyses of gene families from divergent clades: an example from the kinesin superfamily across eukaryotes




We used the kinesin superfamily across eukaryotes to examine how gene-tree inference for a gene family as a whole is affected by (1) incorporating gap characters scored from conserved regions shared by all members of the family and (2) incorporating amino acid and gap characters scored from lineage-specific regions. We addressed these two questions in the context of two densities of sequence sampling, four alignment programs, and two methods of tree construction. Taken together, our findings suggest the following. First, gap characters should be incorporated into gene-tree inference, even for divergent sequences. Second, gene regions that are not conserved among all or most of the sequences sampled should not be discarded automatically, without first evaluating the phylogenetic signal that their gap and/or sequence characters may contain. Third, among the four alignment programs evaluated using their default alignment parameters, Clustal may be expected to output alignments that result in the greatest gene-tree resolution and support. Yet these high levels of resolution and support should be regarded as optimistic, rather than conservative, estimates. Fourth, the same conclusion regarding resolution and support holds for Bayesian gene-tree analyses relative to parsimony-jackknife gene-tree analyses. We suggest that a more conservative approach, such as aligning the sequences using DIALIGN-T or MAFFT, analyzing the appropriate characters using parsimony, and assessing branch support using the jackknife, is more appropriate for inferring gene trees of divergent gene families.

© The Willi Hennig Society 2007.