Multilocus analysis of variation using a large empirical data set: phenylpropanoid pathway genes in Arabidopsis thaliana


Sebastian E. Ramos-Onsins, Fax: +34 93 4034420


Detecting the signature of adaptation on nucleotide variation is often difficult in species that, like Arabidopsis thaliana, might have a complex demographic history. Recent re-sequencing surveys in this species have provided genome-wide information that would mainly reflect its demographic history. We have used a large empirical data set (LED) as well as multilocus coalescent simulations to analyse sequence variation at loci involved in the phenylpropanoid pathway of this species. We surveyed DNA sequence variation at nine of these loci (about 19.7 kb) in 23 accessions of A. thaliana and one accession of the closely related species Arabidopsis lyrata. Nucleotide variation was lower at nonsynonymous sites than at silent sites in all loci, indicating generalized functional constraint at the protein level. No association between variation and position in the metabolic pathway was detected. When the data were contrasted against the standard neutral model, significant deviations for silent variation were detected with the multilocus versions of Tajima's D, Fu's FS and Fay and Wu's H test statistics. These deviations were in the same direction as in previous large-scale multilocus analyses, suggesting a genome-wide effect. When the nine-locus data set was contrasted against the large empirical data set, neither the level (Watterson's θ) nor the pattern (Tajima's D) of variation detected at these loci deviated from the corresponding empirical distributions, at either the single-locus or the multilocus level. These results support an important role of the demographic history of A. thaliana in shaping nucleotide variation at the nine studied phenylpropanoid loci. The potential and limitations of the empirical distribution approach are discussed.
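
For readers unfamiliar with the two summary statistics contrasted against the empirical distributions, the following is a minimal sketch of how Watterson's θ and Tajima's D can be computed for a single locus. It is illustrative only and is not the software used in this study: the function name `tajima_summary` and the toy alignment are hypothetical, the input is assumed to be a gap-free alignment of equal-length sequences, and the formulas are the standard ones from Tajima (1989). It does not implement the multilocus tests or the coalescent simulations.

```python
from itertools import combinations
from math import sqrt, nan


def tajima_summary(seqs):
    """Return (S, theta_w, pi, D) for a gap-free alignment.

    Hypothetical helper for illustration; assumes no missing data
    and that every sequence has the same length.
    """
    n = len(seqs)
    if n < 2 or len({len(s) for s in seqs}) != 1:
        raise ValueError("need at least two sequences of equal length")

    # S: number of segregating (polymorphic) sites in the alignment.
    S = sum(1 for column in zip(*seqs) if len(set(column)) > 1)

    # pi: mean number of pairwise differences between sequences.
    n_pairs = n * (n - 1) / 2
    pi = sum(
        sum(a != b for a, b in zip(x, y))
        for x, y in combinations(seqs, 2)
    ) / n_pairs

    # Watterson's estimator: theta_w = S / a1 (per locus, not per site).
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i**2 for i in range(1, n))
    theta_w = S / a1

    # Constants for the variance of (pi - theta_w), Tajima (1989).
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n**2 + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)

    variance = e1 * S + e2 * S * (S - 1)
    # D is undefined when there is no variation (or the variance is zero).
    D = (pi - theta_w) / sqrt(variance) if variance > 0 else nan
    return S, theta_w, pi, D


# Toy alignment of four sequences (made-up data, not from the study).
toy = ["AAAA", "AAAT", "AATT", "ATTT"]
S, theta_w, pi, D = tajima_summary(toy)
print(S, round(theta_w, 4), round(pi, 4))
```

Under this sketch, a locus would be compared with the empirical distribution by computing the same statistic at each locus of the large data set and asking where the observed value falls among them. Dividing θ and π by the number of sites analysed gives the per-site values usually reported.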