Understanding the genetic basis of adaptation remains a key goal of evolutionary genetics (Orr 2005; Hoekstra and Coyne 2007). Although adaptive change is usually associated with complex changes in morphological phenotype (Blows 2007), few genetic investigations have been conducted on adaptations that involve sets of high-dimensional traits (Albert et al. 2008). This is a particularly important limitation of evolutionary studies, as a full understanding of the evolution of a focal trait is unlikely to be gained without knowledge of how it interacts with the wider phenome, which ultimately comprises a very large number of traits (Houle 2010). Pleiotropic genetic associations among multiple traits can cause focal traits to respond to selection in the direction opposite to that favored by selection (Walsh and Blows 2009), or can stop traits from evolving in the presence of genetic variation and ongoing selection (Hine et al. 2011). The genetic independence of a focal trait from other traits, and the genetic independence among different sets of traits, which is often referred to as modularity (Cheverud 1996; Hansen 2003), are important determinants of whether, and how, adaptation will occur in a particular circumstance.
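As a purely illustrative worked example of how pleiotropy can reverse a response to selection, consider the standard multivariate response-to-selection equation, in which the change in trait means equals the genetic variance-covariance matrix multiplied by the vector of directional selection gradients. The two-trait values of G and β below are hypothetical numbers chosen for illustration and are not estimates from any study cited here.

```latex
\Delta\bar{\mathbf{z}} = \mathbf{G}\boldsymbol{\beta}, \qquad
\mathbf{G} = \begin{pmatrix} 1.0 & -0.9 \\ -0.9 & 1.0 \end{pmatrix}, \quad
\boldsymbol{\beta} = \begin{pmatrix} 0.1 \\ 0.5 \end{pmatrix}
\;\Rightarrow\;
\Delta\bar{\mathbf{z}}
= \begin{pmatrix} 1.0(0.1) - 0.9(0.5) \\ -0.9(0.1) + 1.0(0.5) \end{pmatrix}
= \begin{pmatrix} -0.35 \\ 0.41 \end{pmatrix}
```

Selection favors an increase in the first trait (its gradient is positive), yet its mean decreases, because the strong negative genetic covariance with the more strongly selected second trait outweighs the direct response.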
The investigation of pleiotropic associations among phenotypes is relatively uncommon in the application of high-throughput genomic technologies that generate vast amounts of data. Although marker-based QTL mapping approaches have been successful in identifying discrete regions of the genome that underlie divergence between populations in individual traits, they often do not directly consider pleiotropic relationships among multiple traits (Xu et al. 2005; Biswas et al. 2008), as distinct from mapping single traits and searching for nonoverlap of confidence regions to reject the hypothesis of pleiotropy. Similarly, microarrays have supplied high-dimensional descriptions of transcript abundance, which have traditionally been analyzed by the phenotypic identification of co-expressed networks of gene transcripts based on various clustering approaches. However, the genetic analysis of these high-dimensional expression phenotypes, as distinct from the phenotypic clustering of co-expressed transcripts, is more problematic (Kadarmideen et al. 2006), and has tended to concentrate on individual transcripts, rather than the genetic control of sets of co-expressed transcripts (Biswas et al. 2008). What has been lacking are statistical approaches that allow the multivariate analysis of the abundance of a large number of transcripts measured from classical genetic experimental designs to determine the extent of shared genetic control of gene expression.
The importance of addressing how pleiotropic effects of genes influence the evolution of high-dimensional expression phenotypes has been highlighted by studies that have found a very large number of expression profiles that differ between the sexes (Ranz et al. 2003; Gibson et al. 2004), among developmental stages (White et al. 1999), and between strains and populations adapted to different environments (Franchini and Egli 2006; Ronald and Akey 2007; Lai et al. 2008; St-Cyr et al. 2008). It is difficult to reconcile changes of expression in such a large number of transcripts with the relatively modest number of QTL that are often found to underlie adaptations in single traits, even after taking into account the underpowered nature of QTL mapping. The large number of transcripts exhibiting a change in expression is therefore likely to be a consequence, to some unknown extent, of pleiotropic regulation of expression or physical linkage (Chesler et al. 2005; Kadarmideen et al. 2006; Biswas et al. 2008; Gilad et al. 2008; Litvin et al. 2009; Skelly et al. 2009). A recent genetic analysis of transcript abundance within a population of Drosophila melanogaster indicated that the large number of differences in transcript abundance among genotypes were likely to be a consequence of a smaller number of modules of pleiotropically related expression phenotypes (Ayroles et al. 2009). This indicates that although phenotypic change resulting from adaptation may result in large-scale changes in gene expression, such changes may be accomplished through a modest number of regulatory genes that influence these pleiotropic networks.
The development of genetic analyses for high-dimensional phenotypes, particularly in the extreme case of large numbers of transcript abundances in systems genetics, has lagged behind our ability to generate these large datasets. High-dimensional genetic analysis of transcript abundances can be approached in at least two ways. First, transcript abundance phenotypes can be subjected to an ordination procedure to generate “eigentraits,” linear combinations of the large number of expression traits that covary together (Biswas et al. 2008). This allows coregulation of phenotypes to be inferred, and the discovery of eQTL that are associated with large-scale regulatory changes when suitable markers have also been obtained. The extent to which the multivariate analysis of expression phenotypes in this manner will reflect the underlying genetic patterns of coregulation will depend on the contribution of environmental covariance among the phenotypes in question. For example, if the magnitude of an environmental correlation is much stronger (or weaker) than the genetic correlation between two transcripts, or, in the more extreme case, if the two correlations are of opposite sign, a misleading picture of the genetic coregulation of transcript abundance will be given by the multivariate analysis of phenotypes. In contrast, multivariate genetic analysis of standard metric traits (Mezey and Houle 2005; Hine and Blows 2006; Meyer and Kirkpatrick 2008) explicitly removes the confounding influence of environmental covariance to directly model the multivariate genetic relationships among traits.
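The contrast between phenotypic ordination and genetic analysis can be illustrated with a minimal simulation sketch. It assumes a balanced design of replicate individuals within inbred lines, two hypothetical transcripts whose genetic correlation is negative but whose environmental correlation is strongly positive, and a simple method-of-moments (MANOVA-style) partitioning. The function names, parameter values, and estimator are illustrative assumptions, not the analysis used in any of the studies cited above.

```python
import numpy as np


def simulate_lines(G, E, n_lines, n_reps, rng):
    """Simulate trait values for replicate individuals within inbred lines."""
    p = G.shape[0]
    # One genetic (among-line) effect per line, shared by all its replicates.
    line_effects = rng.multivariate_normal(np.zeros(p), G, size=n_lines)
    # Independent environmental (within-line) deviations for every individual.
    noise = rng.multivariate_normal(np.zeros(p), E, size=(n_lines, n_reps))
    return line_effects[:, None, :] + noise  # shape (n_lines, n_reps, p)


def manova_covariances(y):
    """Method-of-moments estimates of among-line and within-line covariance."""
    n_lines, n_reps, p = y.shape
    line_means = y.mean(axis=1)
    resid = (y - line_means[:, None, :]).reshape(-1, p)
    E_hat = resid.T @ resid / (n_lines * (n_reps - 1))      # within-line mean squares
    centered = line_means - line_means.mean(axis=0)
    ms_among = n_reps * centered.T @ centered / (n_lines - 1)  # among-line mean squares
    G_hat = (ms_among - E_hat) / n_reps                     # E[ms_among] = E + n_reps * G
    return G_hat, E_hat


def leading_axis(cov):
    """Eigenvector associated with the largest eigenvalue of a symmetric matrix."""
    vals, vecs = np.linalg.eigh(cov)  # eigenvalues are returned in ascending order
    return vecs[:, -1]


rng = np.random.default_rng(0)
# Hypothetical parameters: genetic correlation -0.8, environmental correlation +0.9.
G_true = 0.5 * np.array([[1.0, -0.8], [-0.8, 1.0]])
E_true = 2.0 * np.array([[1.0, 0.9], [0.9, 1.0]])

y = simulate_lines(G_true, E_true, n_lines=400, n_reps=5, rng=rng)
G_hat, E_hat = manova_covariances(y)
P_hat = np.cov(y.reshape(-1, 2), rowvar=False)  # phenotypic covariance of individuals

print("phenotypic covariance:", P_hat[0, 1])
print("estimated genetic covariance:", G_hat[0, 1])
print("leading phenotypic axis ('eigentrait'):", leading_axis(P_hat))
print("leading genetic axis:", leading_axis(G_hat))
```

In this sketch the leading phenotypic axis loads both transcripts in the same direction, whereas the leading axis of the estimated among-line covariance loads them in opposite directions, which is the kind of misleading picture described above.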
A second approach is to explicitly consider the partitioning of environmental and genetic covariance among expression phenotypes to remove the confounding influence of the environment on transcript abundance. This is a challenging task because the high-dimensional analysis needs to incorporate an experimental design that is more complex than the measurement of multiple phenotypes on individuals. In a recent example of such an approach, the genetic coregulation of gene expression among inbred lines of Drosophila was inferred from patterns among bivariate genetic correlations that had been estimated by partitioning out the influence of environmental covariance among transcripts. These genetic correlations were arranged in a distance matrix, from which genetic modules of coexpressed genes were identified using clustering (Ayroles et al. 2009; Stone and Ayroles 2009).
Although the removal of the confounding influence of environmental covariance among transcripts in this way is a major advantage over the multivariate analysis of transcript abundance phenotypes, two issues remain to be addressed before high-dimensional genetic analysis of transcript abundance can be implemented in a framework that shares all the advantages of standard multivariate genetic analysis. First, clustering is explicitly exploratory, lacking a hypothesis-testing framework that can be readily adapted to experimental designs with the hierarchical levels that are often required to partition phenotypic variation into genetic and environmental sources. Second, conversion of the data to a network composed of vertices and edges based on a distance matrix of absolute pairwise genetic correlations was used to approximate the genetic covariance structure among multiple traits (Ayroles et al. 2009; Stone and Ayroles 2009), in place of the true genetic variance–covariance (G) matrix, the multivariate extension of bivariate genetic correlations that is modeled in standard multivariate quantitative genetics (Mezey and Houle 2005; Hine and Blows 2006; Meyer and Kirkpatrick 2008). Genetic information from the sign of bivariate genetic correlations, and hence the exact nature of how transcripts are coregulated, was lost as a consequence of this transformation. This precludes a formal determination of the genetic independence of expression across multiple transcripts, based on the true eigenstructure of the multivariate genetic relationships among transcripts. Ideally, the G matrix among a large number of transcript abundances needs to be directly estimated within an established hypothesis-testing framework so that the multivariate genetic relationships among transcripts can be fully characterized.
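The loss of information caused by taking absolute values can be seen in a small numerical sketch. The two three-transcript genetic correlation matrices below are hypothetical; they differ only in the sign of one correlation, so any distance built from absolute correlations (here assumed, for illustration, to be 1 minus the absolute correlation) treats them as identical, even though their eigenstructures differ.

```python
import numpy as np

r = 0.4  # hypothetical magnitude shared by every pairwise genetic correlation

# All three transcripts positively correlated.
R_pos = np.array([[1.0, r, r],
                  [r, 1.0, r],
                  [r, r, 1.0]])

# Same magnitudes, but transcripts 2 and 3 are negatively correlated.
R_mixed = np.array([[1.0, r, r],
                    [r, 1.0, -r],
                    [r, -r, 1.0]])


def abs_correlation_distance(R):
    """Illustrative distance that discards the sign of each correlation."""
    return 1.0 - np.abs(R)


# The two matrices are indistinguishable once signs are discarded.
same_distance = np.array_equal(abs_correlation_distance(R_pos),
                               abs_correlation_distance(R_mixed))

# Eigenvalues of the signed matrices, sorted from largest to smallest.
eig_pos = np.linalg.eigvalsh(R_pos)[::-1]
eig_mixed = np.linalg.eigvalsh(R_mixed)[::-1]

print("identical distance matrices:", same_distance)
print("eigenvalues, all positive:", eig_pos)
print("eigenvalues, mixed signs:", eig_mixed)
print("share of variance on leading axis:",
      eig_pos[0] / eig_pos.sum(), "vs", eig_mixed[0] / eig_mixed.sum())
```

With all correlations positive, a single axis accounts for most of the standardized genetic variance; with one sign reversed, the variance is spread over two equally important axes. A clustering based on absolute correlations cannot separate these two cases, whereas an analysis of the signed matrix can.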
Using a well-characterized example of adaptation, we demonstrate how high-dimensional genetic analysis of gene expression can be accomplished by determining the modularity of the effect space of the among-genotype (genetic) variance in a multivariate linear model (Hine and Blows 2006). Reproductive character displacement is an adaptation that occurs when individuals from two different species that coexist encounter each other during mate choice, and suffer a fitness cost if they perceive an individual from the other species as a potential mate (Brown and Wilson 1956; Howard 1993). In this situation, reinforcing selection acts on the traits that are used by individuals in mate choice so that they evolve to avoid making such mistakes. Species of the Drosophila serrata complex use contact pheromones, composed of cuticular hydrocarbons (CHCs), to identify potential mates. The CHCs of male D. serrata are under strong sexual selection as a consequence of female choice (Blows et al. 2004; Higgie and Blows 2008) and display reproductive character displacement in field populations where the closely related D. birchii is sympatric with D. serrata (Higgie et al. 2000; Higgie and Blows 2007). The reproductive character displacement evolves in experimental sympatry under laboratory conditions (Higgie et al. 2000), demonstrating that reinforcing selection is responsible for the divergence in CHCs between sympatric and allopatric D. serrata populations. Enzymes that are involved in the production of Drosophila CHCs have been shown to have very high rates of evolution in gene expression between the sexes (Shirangi et al. 2009), suggesting that gene regulation may play a major role in the response to selection of these traits.
We present the results from a series of three genetic analyses. First, using a panel of recombinant inbred lines (RILs) generated from two populations of D. serrata that have diverged in response to reinforcing selection, we show that the evolutionary response to selection was associated with changes in expression of a large number of gene transcripts, but that these changes were explained by a much smaller number of genetically independent changes in regulation. Second, we show that the two major genetic modules identified by the high-dimensional genetic analysis were genetically correlated with the morphological traits under reinforcing selection, suggesting that changes in transcript expression underlie the adaptive changes in morphology. Finally, we use quantitative reverse transcription PCR (qRT-PCR) to provide independent experimental validation of the genetic association between transcript abundance and CHC phenotypes. The expression of three candidate genes, identified by the multivariate genetic analysis as playing a major role in the two important genetic modules, was genetically correlated with CHC expression.