Exploratory analysis of multi‐trait coadaptations in light of population history

Abstract During the process of range expansion, populations encounter a variety of environments. They respond to the local environments by modifying their mutually interacting traits. Common approaches of landscape analysis include first focusing on the genes that undergo diversifying selection or directional selection in response to environmental variation. To understand the whole history of populations, it is ideal to capture the history of their range expansion with reference to the series of surrounding environments and to infer the multitrait coadaptation. To this end, we propose a complementary approach; it is an exploratory analysis using up‐to‐date methods that integrate population genetic features and features of selection on multiple traits. First, we conduct correspondence analysis of site frequency spectra, traits, and environments with auxiliary information of population‐specific fixation index (F ST). This visualizes the structure and the ages of populations and helps infer the history of range expansion, encountered environmental changes, and selection on multiple traits. Next, we further investigate the inferred history using an admixture graph that describes the population split and admixture. Finally, principal component analysis of the selection on edge‐by‐trait (SET) matrix identifies multitrait coadaptation and the associated edges of the admixture graph. We introduce a newly defined factor loadings of environmental variables in order to identify the environmental factors that caused the coadaptation. A numerical simulation of one‐dimensional stepping‐stone population expansion showed that the exploratory analysis reconstructed the pattern of the environmental selection that was missed by analysis of individual traits. Analysis of a public dataset of natural populations of black cottonwood in northwestern America identified the first principal component (PC) coadaptation of photosynthesis‐ vs growth‐related traits responding to the geographical clines of temperature and day length. The second PC coadaptation of volume‐related traits suggested that soil condition was a limiting factor for aboveground environmental selection.


| INTRODUC TI ON
Populations adapt to new environments through selection on preexisting alleles and/or new mutations in adaptation-related loci in the genomes (Barrett & Schluter, 2008). Therefore, adaptation of populations of a species to novel environments changes allele frequencies of loci under selection. Environmental adaptation processes can also create significant differences in phenotypes and traits among populations of a species. When correlated with variation in environmental factors over local subpopulations (hereafter, populations), such variation in traits and phenotypes may reflect phenotypic plasticity or genetic adaptation of the populations. Coop et al. (2010) proposed that significant correlations could be detected between single nucleotide polymorphism (SNP) allele frequencies and environmental variables, bypassing trait variables. Through the annotation of identified SNPs, it may be possible to characterize the type of adaptation.
Adaptation to environmental factors can change traits and phenotypes of a species, thereby creating population structure underpinned by functional loci. Geographical isolation, which can lead to reproductive isolation and consequent differences in allele frequencies of neutral loci, also contributes to population structuring (Wright, 1965). Divergent selection in an environmental gradient may affect genome-wide population structure (Nosil et al., 2009;Orsini et al., 2013). Empirical studies showed that aridity gradients caused geographically structured populations of Poaceae characterized by cytotype segregation of diploids and allotetraploids (Manzaneda et al., 2012). Geographic distance and habitat differences between populations impacted population structure of marine species (Bradbury & Bentzen, 2007;Jorde et al., 2015;Kitada et al., 2017). Therefore, population structure needs to be considered when analyzing correlations among genes, traits, and environmental factors across population samples taken from a wide range of geographical regions.
To identify genetic adaptations, the common landscape genomics approach identifies SNPs that are diversified among populations.
Adaptations to a specific environment may be determined by identifying among-population variation of allele frequencies that are associated with the variation of environmental values. Genome-wide association study (GWAS) results and annotation are used to characterize the environmental adaptation of SNPs. Another approach is to infer the positive selection on a trait during its evolutionary history by estimating the change in allele frequencies of associated SNPs beyond the level of genetic drift. So-called "genome scan methods" consider geographically structured populations and detect SNPs related to environmental variables, traits, and phenotypes (De Mita et al., 2013;De Villemereuil et al., 2014). For example, BayeScan (Foll & Gaggiotti, 2008) measures the significance of an SNP's locusspecific global F ST value, which represents the amount of genetic variation among populations, in a Bayesian framework. Genotypeenvironment associations analyze the allele frequencies of SNPs in sampling locations and test their associations with the environmental variables (Capblancq et al., 2020). Bayenv (Coop et al., 2010) and the latent factor mixed model (Frichot et al., 2013) can detect SNPs that are highly correlated with environmental factors and traits on the basis of allele frequencies. Notably, these methods essentially do not require phenotypic data. Hence, they are valid, especially when the history is complex and cannot be appropriately measured by a few trait variables or the environmental selection on the phenotypes are not characterized (Capblancq et al., 2020). During the evolutionary history of range expansion, the frequencies of existing and derived alleles in a population vary stochastically, and various pressures of environmental selection affect the allele frequencies of related genes and phenotypes. Systematic information on the associations between traits and SNPs in some species, such as humans (Watanabe et al., 2019) and Arabidopsis (Togninalli et al., 2019), enabled mapping of adaptive evolution of polygenic traits on admixture graphs (Racimo et al., 2018).
However, wild populations change their distributions gradually or abruptly over generations and encounter a variety of environments.
They adapt to the local environments by modifying, in balance, multiple traits that are mutually interrelated. In Bristol Bay, early spawners of sockeye salmon lay eggs in late July to mid-August in small streams that have cold temperatures until hatching but spawners in late August to October in large rivers/lakes that experience much warmer temperature, where the average egg size also reflects the habitat-specific incubation environment and tradeoffs between egg size and number of eggs are population-specific (Hilborn et al., 2003). Such reproductive traits adapted to specific environment might be controlled by related genes. Gonadotropin-releasing hormone increases in adult salmon brains during homing migration and controls gonadal maturation during the final phases of upstream migration to spawn (Ueda, 2019). Moreover, populations of walking stick insects diverged in body size, shape, host preference, and behavior in parallel with the divergence of their host plant species (Nosil et al., 2002). Many plant species experience regular wildfire and have evolved sets of traits to adapt to this natural selection pressure. By using phylogenetic independent contrasts, Schwilk and Ackerly (2001) tested whether pines improved their traits for a survival/avoidance strategy or a fire-embracing strategy. Phylogenetic analysis of 38 pine species demonstrated that much variation in trait evolution occurred along a fire-surviving/fire-embracing axis.
Here, we propose a complementary approach to identify multitrait coadaptation to a set of environmental changes that populations encountered in the history of their geographical distribution. This is an exploratory analysis that integrates features of population genetic information and trait-specific positive selection.

T A X O N O M Y C L A S S I F I C A T I O N
Evolutionary ecology; Population genetics; Theorectical ecology These features are generated by up-to-date inference methods.
Correspondence analysis visualizes the population structure with reference to the traits and environmental variables; the colors of the populations represent their ages and help infer their history of range expansion, the environmental selection pressures on the populations, and change in their traits in response. To characterize the environmental selection on the traits, we conducted principal component analysis (PCA) on a selection on edge-by-trait (SET) matrix.
The biplot identifies the principal multitrait coadaptation, and the relevant edges of the admixture graph describe the history of population splits and admixture. To identify the environmental factors that caused the coadaptation, we newly define the factor loadings of the environmental variables based on the factor loadings of the traits and the current among-population correlations, which are overlaid on the biplot. Once we gain insight into the pattern of multitrait coadaptation and their driving force of environmental factors, we in turn confirm the SNPs controlling the traits to examine the genetic changes that enabled the adaptations. The R codes for our representation method and simulations of population colonization used in this study are available in the Supporting Information. This approach also accepts SNP genotype data and reads Genepop format (Raymond & Rousset, 1995;Rousset, 2008).

| Correspondence analysis: chronologically ordered population structure in the map of traits and environmental variables
To first gain insight into the history of a population's range expansion and the environmental adaptation of their traits, we applied correspondence analysis to the data of the SNP allele frequencies, the environmental variables, and the mean values of the traits in each population. It characterizes the populations based on genetic, environmental, and trait variables together. Since SNP site frequencies comprise the majority of the data, the locations of the populations in the plot of correspondence analysis reflect the population structure.
It provides a cue on the traits and environmental variables, and SNPs (if any), which are unique to some populations. By adding a crude measure of their ages as below, the plot provides a crude sketch on the history of range expansion of the populations, the environmental change they experienced and the associated traits adaptation. For SNP allele frequencies, we used the frequencies of derived alleles to infer signatures of environmental adaptation. However, it is difficult to know which allele is derived in data without information on the states in closely related species. In this paper, we adopted an ad hoc approach of using the frequencies of minor alleles as a substitute and expected that the frequencies of most derived alleles would still be low. This simple assignment of derived alleles may have errors, but we hoped that it would capture the SNPs with enhanced allele frequencies that facilitated adaptation to the local environments.
To distinguish the association on the basis of positive and negative  (Kitada et al., 2021). We extended the population-specific F ST estimator to overall loci as where M i W,l is the unbiased within-population matching of two distinct alleles of locus l (l = 1 ∼ L) in population i (i = 1 ∼ K), and M B l is the between-population-pair matching average over population pairs (Buckleton et al., 2016). To interpret the adaptation of the populations during their range expansion, we identified the significant correlations between the genes and the environmental variables (Appendix S1).

| PCA of SET matrix: scores of edges of the admixture graph, and loadings of traits and environments
Visual inspection of multitrait coadaptation of the populations was investigated further using the admixture graph that describes the history of population splits and admixture. We conducted PCA of the SET matrix, which is explained below.
To obtain the input data required for PolyGraph, we conducted

| Simulation of range expansion and adaptation
To illustrate how the overview generated by the exploratory analysis can be interpreted, we conducted a simulation of a simple scenario were under the selection of E 2 . The values of the traits of each type of environmental factor were not the result of pleiotropy of shared genomic loci but experienced the same selection pressure. In this study, we needed to simulate the dynamics of phenotypic traits and the allele frequencies at the loci that contribute to the traits. It was not computationally practical to specify the selection coefficients on the loci directly and to carry out individual-based simulation.
As a result, we simulated the dynamics of the population allele frequencies and mean traits. Each trait was polygenic, generated by summing 10 latent traits that are monogenic and affected by the environment. Adaptation to an environmental stress is often accompanied by the cost of reduced activity in the normal environment (e.g., Baucom & Mauricio, 2004); therefore, the derived alleles can adapt to the severe environment at the expense of cost in the normal environment. Our simulated derived adaptive allele at the genetic locus contributing to each latent trait had the selection coefficient of s = 0.01 in the environment E = 1 and 1 1 + s − 1 = − 0.01 in the environment E = 0 (see equations A4 and A5 in Appendix S3). Although the preexisting alleles and de novo mutations contribute to environmental adaptations, in this simulation, for simplicity we assumed that the relevant preexisting loci were already monomorphic in the ancestral population.
For the ancestral population, the allele frequencies at polymorphic loci were set to the theoretical equilibrium distribution, (Wright, 1931). In the history of population expansion, populations randomly gave birth to neutral and adaptive Neutral mutations were generated at random over generations and populations. A few adaptive mutations were generated every generation. Although many of these generated polymorphic loci became monomorphic before the present, which is the 260th generation, we kept loci that retained their polymorphism to this time. As a simplified procedure that mimics the SNP discovery process, we randomly selected a prespecified number of SNPs. In this simulation, we selected observed SNPs of 10,000 initial neutral loci, 500 newly derived neutral loci, and one newly derived environmentally adaptive locus for each latent trait. Then, we calculated the allele frequencies of the sample, which consisted of 50 individuals from each population.

| Application to natural populations of black cottonwood data in North America
As an empirical example, we analyzed publicly available data that included genetic and trait information of 441 individuals of from natural populations of black cottonwood (Populus trichocarpa), which were collected from various regions over a range of 2500 km near the Canadian-US border at a latitude of 44′-59′ N, a longitude of 121′-138′ W, and an altitude of 0-800 m (Geraldes et al., 2013;McKown, Guy, Quamme, et al., 2014). density (ADd), abaxial stomata density (ABd), average of two measurements of leaf rust disease morbidity (DP), 14 phenology traits, 12 biomass traits, and 16 ecophysiology traits (see Table S1). Each sampling location of a population was described by nine environmental/geographical variables: altitude (ALT), longest yearly day length (photoperiod) (DAY), frost-free days (FFD), mean annual temperature (MAT), mean warmest month temperature (MWMT), mean annual precipitation (MAP), mean summer precipitation (MSP), annual heat-moisture index (AHM, ~MAT/MAP, an indicator of drought), and summer heat-moisture index (SHM, ~MWMT/ MSP). The day length and temperature have a north-south cline, whereas temperature, rainfall, and drought have an east-west (coastal to inland) cline . In addition, 18 soil conditions, including the ratio of clay, silt, sand, and gravel, soil depth, bulk density, cation exchange capacity, organic carbon, and pH, each of which were observed in topsoil and subsoil, were obtained from the Unified North American Soil Map (Liu et al., 2013) and used as environmental values of the sampling locations (see Table S2).

| Correspondence analysis
The plot of correspondence and correlation analysis provided a sketch of the global structure of genetic differentiation and adaptation ( Figure 3a). The colors of the populations, which represented the population-specific F ST values (Figure 3b), indicated distribution expansion in three directions; that is, the expansion from inland (s27, i15, i16) to the coast (s20-s26) and to northern (n1-n12) and southern areas (o29 and o30) (Figures 1 and 3a,b). Population s27, which had the lowest population-specific F ST values, may be better labeled as "si27" because it is located inland.
The color gradient of population-specific F ST values on the population labels indicates that the ancestral population might inhabit the inland area (SBC s27, and IBC i15 and i16). s27 is characterized by warm temperature (MWMT), whereas i15 and i16 are character- were correlated with genes associated with drought and osmotic regulation: CBF4 (response to drought and cold stress; Haake et al., 2002;Hussain et al., 2018), XERICO (response to osmotic stress, response to salt stress; Ko et al., 2006), SAL1 (response to water deprivation and salt stress; Wilson et al., 2009), MYB85 (cell wall biogenesis responding water deprivation and salinity; Winter et al., 2007), and APX1 (water deficit; Zandalinas et al., 2016). This indicates that poplar might have initially adapted to the inland area.
Slightly larger population-specific F ST values than those of IBC

Populations in NBC had large population-specific F ST values
and little genetic diversity, suggesting that they are young and that natural populations of black cottonwood expanded to the northern area. This area is characterized by long day length in summer (+DAY), which means that day length varies greatly from season to season, and low temperatures (−MAT, −MWMT, and −FFD), as described in Figure 3a. These variables were correlated with adaxial stomata density and leaf rust disease (DP). This finding supports the preceding knowledge that the adaxial stomata compensates for reduced photosynthetic efficiency in the northern area; however, there is a risk of pathogen invasion (Melotto et al., 2006). 3.2.2 | PCA and multitrait coadaptation mapped on the admixture graph When we set the population i15 as a root, admixture events into populations i15 and i16 were identified from population n13 in NBC ( Figure S3). This suggested that the small population-specific F ST values and high genetic diversity observed in i15 and i16 were due to admixture. Therefore, population s27 was tentatively set as a root in the following analysis. We estimated the history of population splits and admixture by applying TreeMix to the population-specific genotype frequencies of the 34,131 SNPs. The populations expanded their distribution to the coastal region (s20-s26) and extended the range to northward (s17-19). Some extended further northward (n1-n12) and others extended southward (o29, o30) and toward inland (i15, i16) (Figures 1 and 4a).  Table S4) and mapped the positive selection parameters on the admixture graph ( Figure S4, Table S5).
To further investigate the history of multitrait coadaptation inferred by correspondence analysis, we performed PCA of the SET matrix and obtained the biplot of edges of the admixture graph and trait variables. The factor loadings of the environmental variables were calculated from the factor loadings of the trait variables and the trait-environment correlations (Table S6)  constraints (Bingham, 2001). The soil of the first PC included bulk and clay at higher latitudes with a cold and dry climate, and organic carbon and gravel at lower latitudes with a warm climate and more precipitation. However, the soil of the second PC was sand, gravel, and organic carbon at higher latitudes, and clay and silt at lower latitudes (Figure 4c, right).

| DISCUSS ION
A population evolves in space and time and responds to variable environments. From its birth, a population may continuously change its distribution range, and the initial localities may have occasionally been exposed to unprecedented environmental stress. In these localities, individuals and populations can acclimate to such environmental stresses by phenotypic plasticity in a short amount of time, and the populations can adapt by changing their geographical distribution or genomes over a long amount of time. We focused on the latter and aimed at understanding the whole history of the populations by conducting an exploratory analysis of multitrait coadaptations. The whole scheme of the analysis has multiple layers. In the first layer, we generated the features that are used as inputs for the second layer of analysis. The features are population-wide SNP site frequency spectra, genome-wide association with traits and environments, and polygenic adaptations mapped on the admixture graph. These features were obtained by ever-evolving population genetic and quantitative genetic procedures. The exploratory analysis is the second layer of analysis incorporating these features.
To reveal the whole scenario of historical change of geographical distribution and adaptation to the new environments, we conducted correspondence analysis that visualizes the population structure in relation with the SNPs, environmental variables, and trait variables. We aimed at showing, through the numerical simulation and analysis of empirical data, that the complexity of populations' life history can be interpreted well solely by integrating the information of among-population genetic difference and genome-wide association with multiple environments and multiple traits. A multilayered approach may be a practical choice. However, our approach is still in its infancy. One direction of future study is to include latent variables that are interpreted as key elements of environmental selection and adaptation. Another direction is quantifying the pattern of adaptation. A natural framework is a Bayesian approach that uses the features provided by the first layer analysis as the prior information.
As a final remark, we note that our analysis uses populations as units of analysis. However, populations are often defined by  (Table S5). For each trait, the vector of positive selection parameters that characterize the predicted increase/decrease of the trait value on the admixture graph was estimated using PolyGraph. Edges were expressed by their connecting node pairs. To understand the coadaptation with reference to the local environments, the loadings of the environments were calculated as the sum of products of loadings of the traits and the among-population correlations between the traits and environments (see Materials and Methods). The labels colored blue (populations) and green (environments) represent edges of the admixture graph and environments. The labels colored red represent traits. Labels of traits and environments are explained in Tables S1 and S2

ACK N OWLED G M ENTS
We appreciate the essential comments made by the reviewers of Molecular Ecology Resources, which significantly improved the manuscript. This study was supported by Japan Society for the Promotion of Science Grants-in-Aid for Scientific Research KAKENHI nos. 16H02788 and 19H04070 to H.K. We thank Mallory Eckstut, PhD, from Edanz for editing a draft of this manuscript.

CO N FLI C T O F I NTE R E S T
The authors declare that they have no conflict of interest.

DATA AVA I L A B I L I T Y S TAT E M E N T
The authors affirm that all data necessary for confirming the conclu- Hirohisa Kishino https://orcid.org/0000-0002-3244-359X