## Introduction

Felsenstein (1985) proposed using the general statistical procedure of repeated sampling with replacement from a data set (bootstrapping) to estimate confidence limits for the internal branches of a phylogenetic tree. Although the interpretation of bootstrap results has been debated (e.g. Felsenstein & Kishino, 1993; Hillis & Bull, 1993), this is still the commonest method for assessing confidence in phylogenies. However, the nonparametric bootstrap does not allow *a priori* hypotheses about the phylogeny of a group to be tested. The alternative, the parametric bootstrap, does (Efron, 1982; Felsenstein, 1988; Bull *et al*., 1993; Hillis *et al*., 1996; Huelsenbeck *et al*., 1996a, b). In contrast to the nonparametric method, in which replicate character matrices are generated by randomly resampling the original data, the parametric method creates replicates by numerical simulation. A model of evolution is assumed, and its parameters are estimated from the empirical data. Then, using the same model and the estimated parameters, as many replicate data sets as required, each the same size as the original, are simulated. Phylogenies constructed from these replicate character matrices are used to generate a null distribution of difference measures between the maximum likelihood and *a priori* phylogenetic hypotheses. Huelsenbeck *et al*. (1996b) advocated likelihood-based phylogenetic reconstruction for generating this distribution, but because of the heavy computation required, a parsimony-based approach is an acceptable alternative with only a slightly lower level of discrimination (Hillis *et al*., 1996).
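The simulation-and-comparison loop described above can be illustrated with a deliberately tiny sketch. It is not the procedure used in this study: it assumes four taxa (so all three unrooted topologies can be scored exhaustively), the Jukes–Cantor (JC69) model, Fitch parsimony as the tree score, and branch lengths that are simply supplied rather than estimated from the data on the constrained tree. The difference measure δ is the number of extra steps the constrained (*a priori*) topology needs relative to the best topology. All function names (`simulate`, `fitch_steps`, `delta`, `parametric_bootstrap`) are hypothetical.

```python
import math
import random

BASES = "ACGT"
# The three possible unrooted topologies for four taxa A-D, written as splits.
TOPOLOGIES = {
    "AB|CD": (("A", "B"), ("C", "D")),
    "AC|BD": (("A", "C"), ("B", "D")),
    "AD|BC": (("A", "D"), ("B", "C")),
}


def evolve(base, t, rng):
    """Evolve one base along a branch of length t under JC69."""
    # Probability of ending in a different base after t substitutions/site.
    p_change = 0.75 * (1.0 - math.exp(-4.0 * t / 3.0))
    if rng.random() < p_change:
        return rng.choice([b for b in BASES if b != base])
    return base


def simulate(split, blen, n_sites, rng):
    """Simulate an alignment on the quartet `split`.

    blen maps each taxon name and "internal" to a branch length (assumed known).
    """
    (x, y), (z, w) = split
    seqs = {name: [] for name in "ABCD"}
    for _ in range(n_sites):
        u = rng.choice(BASES)                 # stationary state at one internal node
        v = evolve(u, blen["internal"], rng)  # state at the other internal node
        for leaf in (x, y):
            seqs[leaf].append(evolve(u, blen[leaf], rng))
        for leaf in (z, w):
            seqs[leaf].append(evolve(v, blen[leaf], rng))
    return {k: "".join(s) for k, s in seqs.items()}


def fitch_steps(seqs, split):
    """Fitch parsimony length of the alignment on one quartet topology."""
    (x, y), (z, w) = split
    steps = 0
    for i in range(len(seqs["A"])):
        left = {seqs[x][i]} & {seqs[y][i]}
        if not left:
            left = {seqs[x][i], seqs[y][i]}
            steps += 1
        right = {seqs[z][i]} & {seqs[w][i]}
        if not right:
            right = {seqs[z][i], seqs[w][i]}
            steps += 1
        if not left & right:  # join the two halves across the internal branch
            steps += 1
    return steps


def delta(seqs, constrained):
    """Extra steps required by the constrained topology over the best one."""
    scores = {name: fitch_steps(seqs, s) for name, s in TOPOLOGIES.items()}
    return scores[constrained] - min(scores.values())


def parametric_bootstrap(observed, constrained, blen, n_reps=200, seed=1):
    """Compare the observed delta with deltas simulated on the constrained tree."""
    rng = random.Random(seed)
    n_sites = len(observed["A"])
    d_obs = delta(observed, constrained)
    null = [delta(simulate(TOPOLOGIES[constrained], blen, n_sites, rng), constrained)
            for _ in range(n_reps)]
    # One-sided p-value: how often the null yields a delta at least as large.
    p = sum(d >= d_obs for d in null) / n_reps
    return d_obs, null, p


if __name__ == "__main__":
    rng = random.Random(0)
    blen = {"A": 0.1, "B": 0.1, "C": 0.1, "D": 0.1, "internal": 0.05}
    # Hypothetical "observed" data, generated on AB|CD rather than the null tree.
    observed = simulate(TOPOLOGIES["AB|CD"], blen, 500, rng)
    d_obs, null, p = parametric_bootstrap(observed, "AC|BD", blen)
    print(d_obs, max(null), p)
```

A small p indicates that the observed excess of steps for the *a priori* topology is larger than would be expected if that topology were true. In a real analysis the model parameters and branch lengths would be estimated from the empirical data on the constrained tree, and tree scores would come from a heuristic search rather than exhaustive enumeration.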

Many recent molecular studies have used the parametric bootstrap to test *a priori* hypotheses against molecular phylogenies (Mallatt & Sullivan, 1998; Ruedi *et al.*, 1998; Flook *et al*., 1999; Jackman *et al*., 1999; Oakley & Phillips, 1999). Other studies have used variants of the method to examine whether long-branch attraction (Felsenstein, 1978; Hendy & Penny, 1989; Huelsenbeck, 1997) may be causing erroneous topologies (Maddison *et al.*, 1999; Tang *et al*., 1999), to estimate the sequence length required to resolve a particular phylogenetic tree (Halanych, 1998; Flook *et al*., 1999), to evaluate the efficiency of different methods of phylogenetic reconstruction (Bull *et al*., 1993; Hwang *et al*., 1998), and to test the adequacy of models of DNA sequence evolution (Goldman, 1993; Yang *et al.*, 1994).

A critical first step of the parametric bootstrap is specifying a particular substitutional model, whose parameters are estimated from the empirical data and which is subsequently used to generate the simulated replicate DNA sequences. Hillis *et al*. (1996) stated that, in limited studies, the method appeared to be robust to changes in the model of evolution. However, Huelsenbeck *et al*. (1996a) suggested that as realistic a model of DNA substitution as possible should be used to reduce Type I error (false rejection of a true null hypothesis), implying that parameter-rich models may perform better. Zhang (1999) has shown that if the substitution model is inadequate, the estimates of the substitution parameters will be biased. If biased parameter estimates are used, the simulated sequence data will deviate from the empirical data, and the null distribution of tree score differences will be distorted.
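The effect of an inadequate model on parameter estimates can be seen in a standard textbook case, sketched below; this is not the analysis of Zhang (1999). It assumes sequences diverge under the Kimura two-parameter model (K80) with transition/transversion ratio κ, computes the expected proportions of transitional (P) and transversional (Q) differences, and then estimates the distance with both K80 and the simpler Jukes–Cantor model (JC69). The function names are illustrative only.

```python
import math


def k80_expected_p_q(d, kappa):
    """Expected transition (P) and transversion (Q) proportions under K80
    for distance d (substitutions/site) and transition/transversion ratio kappa."""
    beta_t = d / (kappa + 2.0)   # per-type transversion rate x time
    alpha_t = kappa * beta_t     # transition rate x time
    P = 0.25 + 0.25 * math.exp(-4.0 * beta_t) - 0.5 * math.exp(-2.0 * (alpha_t + beta_t))
    Q = 0.5 - 0.5 * math.exp(-4.0 * beta_t)
    return P, Q


def k80_distance(P, Q):
    """K80 distance estimate from transition and transversion proportions."""
    return -0.5 * math.log(1.0 - 2.0 * P - Q) - 0.25 * math.log(1.0 - 2.0 * Q)


def jc69_distance(p):
    """JC69 distance estimate from the total proportion of differing sites."""
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


for kappa in (1.0, 10.0):
    P, Q = k80_expected_p_q(1.0, kappa)  # true distance fixed at 1.0
    print(kappa, round(k80_distance(P, Q), 3), round(jc69_distance(P + Q), 3))
```

With κ = 1 the two models coincide and both recover the true distance. With a strong transition bias (κ = 10) the K80 estimate still recovers it, whereas the JC69 estimate falls to roughly 0.8: the misspecified model systematically underestimates the amount of change, so data simulated from its estimates would not resemble the empirical data.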

In this study we empirically evaluate the effect of the choice of substitutional model when testing hypotheses of monophyly with DNA sequence data. We have chosen three gene regions (two mitochondrial and one nuclear) commonly used in phylogenetic analyses: cytochrome oxidase subunits I and II (COI and COII), the large mitochondrial ribosomal subunit gene (16S), and the nuclear ribosomal internal transcribed spacer 2 (ITS2). For COI and COII we have used published sequence data for Macaronesian *Calathus* beetles (Carabidae) (Emerson *et al*., 1999, 2000), and for 16S and ITS2 we have used published sequence data for European *Timarcha* beetles (Chrysomelidae) (Gómez-Zurita *et al*., 2000a, b). For each of these data sets we also examine the comparative performance of the nonparametric Kishino–Hasegawa (KH) test (Kishino & Hasegawa, 1989), which compares competing phylogenetic hypotheses under the maximum likelihood (ML) optimality criterion. This test assesses whether the difference in log-likelihood between competing tree topologies is statistically significant. All three data sets offer clear *a priori* hypotheses of monophyly for testing against empirical DNA sequence data.
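The core calculation of the KH test can be sketched as follows, assuming the per-site log-likelihoods of the two competing trees are already available (for example, exported from an ML program). The statistic is the summed per-site difference δ; its variance is estimated from the variation of the per-site differences as n/(n − 1) Σ(dᵢ − d̄)², and a two-sided p-value is taken from the normal approximation. The function name `kh_test` and the input values in the usage line are hypothetical.

```python
import math


def kh_test(site_lnl_1, site_lnl_2):
    """Kishino-Hasegawa test from per-site log-likelihoods of two trees.

    Returns (delta, variance, z, two-sided p) under the normal approximation.
    """
    n = len(site_lnl_1)
    if n != len(site_lnl_2) or n < 2:
        raise ValueError("need two equal-length lists with at least two sites")
    diffs = [a - b for a, b in zip(site_lnl_1, site_lnl_2)]
    delta = sum(diffs)  # total log-likelihood difference between the trees
    mean = delta / n
    # Variance of the summed difference, estimated from site-to-site variation.
    var = n / (n - 1) * sum((x - mean) ** 2 for x in diffs)
    if var == 0.0:
        # Degenerate case: every site favours the two trees by exactly the same amount.
        return delta, var, (math.copysign(math.inf, delta) if delta else 0.0), (0.0 if delta else 1.0)
    z = delta / math.sqrt(var)
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p-value
    return delta, var, z, p


print(kh_test([-10.0, -12.0, -9.5, -11.0], [-10.5, -12.2, -9.4, -11.8]))
```

For these made-up values δ is 1.4 with an estimated variance of 0.6, giving z of about 1.81 and p of about 0.07, so the two topologies would not be judged significantly different at the 5% level. Real alignments have hundreds or thousands of sites, which is what makes the normal approximation reasonable.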