Length polymorphisms at two candidate genes explain variation of migratory behaviors in blackpoll warblers (Setophaga striata)

Abstract Migratory behaviors such as the timing and duration of migration are genetically inherited and can be under strong natural selection, yet we still know very little about the specific genes or molecular pathways that control these behaviors. Studies in candidate genes Clock and Adcyap1 have revealed that both of these loci can be significantly correlated with migratory behaviors in birds, though observed relationships appear to vary across species. We investigated geographic genetic structure of Clock and Adcyap1 in four populations of blackpoll warblers (Setophaga striata), a Neotropical–Nearctic migrant that exhibits geographic variation in migratory timing and duration across its boreal breeding distribution. Further, we used data on migratory timing and duration, obtained from light‐level geolocator trackers to investigate candidate genotype–phenotype relationships at the individual level. While we found no geographic structure in either candidate gene, we did find evidence that candidate gene lengths are correlated with five of the six migratory traits. Maximum Clock allele length was significantly and negatively associated with spring arrival date. Minimum Adcyap1 allele length was significantly and negatively associated with spring departure date and positively associated with fall arrival date at the wintering grounds. Additionally, we found a significant interaction between Clock and Adcyap1 allele lengths on both spring and fall migratory duration. Adcyap1 heterozygotes also had significantly shorter migration duration in both spring and fall compared to homozygotes. Our results support the growing body of evidence that Clock and Adcyap1 allele lengths are correlated with migratory behaviors in birds.

Investigating the genetic control of migratory behaviors in a diversity of bird taxa will therefore allow us to better understand the evolution of migration within birds, as well as ongoing natural selection on these behaviors in response to environmental change (Pulido & Berthold, 2010).
Several studies have shown a correlation between length polymorphisms in Clock and Adcyap1 and migratory traits in birds (Table 1). For Clock, individuals with a greater number of glutamine repeats in the 3′ polyglutamine tail (i.e., longer alleles) exhibit delayed migratory and breeding phenology Bourret & Garant, 2015;Caprioli et al., 2012;Liedvogel, Szulkin, & Knowles, 2009;Saino et al., 2015) and longer migratory distance (Peterson et al., 2013), relative to individuals with shorter alleles. For Adcyap1, longer alleles have been shown to be associated with greater migratory restlessness (Mueller et al., 2011;Peterson et al., 2013), earlier spring arrival dates (Mettler, Segelbacher, & Schaefer, 2015), and earlier postnatal dispersal (Chakarov, Jonker, Boerner, Hoffman, & Krüger, 2013). Additionally, in one species, more northerly breeding populations had longer Adcyap1 alleles on average than southerly populations (Bazzi et al., 2016), which may reflect geographic variation in migratory strategies or phenological schedules resulting from local adaptation to environmental cues (Johnsen et al., 2007). However, the relationships between candidate genes and migratory phenotypes may also be influenced by local environmental factors such as temperature, photoperiod, and breeding density (Bourret & Garant, 2015), potentially complicating interpretations of geographic patterns within species. Further complicating the study of candidate genes and migratory traits, an increasing number of studies in birds have found no correlation between Clock or Adcyap1 allele length and migratory behavior (Table 1; Bazzi et al., 2016Bazzi et al., , 2017Contina, Bridge, Ross, Shipley, & Kelly, 2018;Dor et al., 2012). The interspecific variation in genotype-phenotype correlations for these candidate migratory-phenology genes highlights the challenge thus far of generalizing the expected relationship between length polymorphism and migration timing, as well as identifying the mechanism by which length variants affect migratory behaviors.
Given the potential evolutionary and ecological importance of these migratory candidate genes, and the variation in observed patterns across species, it is valuable to continue to build evidence to either support or refute a role of these genes in migratory behavior in different avian species, particularly to help understand why patterns vary across species. Here, we contribute to this growing number of studies by examining relationship between Adcyap1 and Clock and migratory behavior in the blackpoll warbler (Setophaga striata), a small (12 g) long distance Neotropical-Nearctic migrant Birds depart the wintering grounds in mid-April to mid-May and travel through the Greater Antilles and continental United States to their northern breeding territories, a trip that can vary in duration considerably depending on breeding location (mean 34 days, range 17-49 days; DeLuca et al., 2019). After departing the breeding grounds in August through October, blackpoll warblers first migrate southeastward to the Atlantic coast of United States and Canada, then migrate south over the Atlantic Ocean to their South American wintering grounds, a nonstop trans-oceanic flight of up to 3,000 km that may take up to three days (DeLuca et al., 2015(DeLuca et al., , 2019
The Alaska population included samples from Nome (n = 10), Denali National Park (n = 2), and southwest Alaska (n = 2). Four sampling locations were grouped together into the Eastern population: Adirondack and Catskill Mountains, New York (n = 8); Cape Breton, Nova Scotia (n = 11); western Newfoundland (n = 5); and southeastern Labrador (n = 5). All samples from Yukon are from Whitehorse, Yukon Territories, and all samples from Manitoba are from Churchill, Manitoba. Previous analysis of neutral loci revealed significant genetic structure in mitochondrial DNA marker between Newfoundland and other Eastern subpopulations (Ralston & Kirchman, 2012), but no significant structure and a large number of shared alleles at neutral microsatellite markers (Ralston & Kirchman, 2013). Due to the small sample size of Newfoundland in the current study, we group these samples within the Eastern population.
We used the package hierfstat (Goudet, 2004)  We further investigated geographic structure at these loci using the program STRUCTURE version 2.3.4 (Pritchard, Stephens, & Donnelly, 2000), which uses a Bayesian clustering analysis to determine the most likely number of populations (K). We used an admixture model with an assumption of correlated allele frequencies among populations, and population of each sample as a prior. We ran 10 iterations for each K ranging from 1 to 4 with 1,000,000 MCMC steps following a burn-in of 100,000 steps, and used mean natural log probability to determine the most likely number of populations.
We used the program DISTRUCT (Rosenberg, 2004) to visualize population assignment of each individual from the STRUCTURE run with the greatest log probability for each value of K. Additionally, we performed a principal components analysis (PCA) implemented in the R package ade4 (Dray & Dufour, 2007). We visualized clustering of genotypes in ordination space using principal components with eigenvalues greater than 1.0.
While the above genetic analyses assess variation in the frequencies of alleles across populations, they do not directly assess patterns in allele length per se. In other words, in the above analyses, alleles are treated as categorical, while we are also interested in allele length as a continuous variable across geography. We therefore also assessed variation in allele lengths across populations, as well as across latitude and longitude. For these population-level analyses, we consider latitude and longitude as proxies of migratory behaviors, specifically migratory distance, with the understanding that populations that breed further north and west migrate further each year. We used a series of general linear models (GLMs) with longitude and latitude as the predictor variables to determine whether allele length varied across either sampling latitude or longitude. It is unknown whether there are any allelic interactions or dominance effects at these loci relative to the migratory traits of interest, though previous studies have suggested dominance of the longer allele in Clock (Saino et al., 2015) and other genes with 3′ polyglutamine repeats (Ross, 2002). Due to small sample sizes per genotype, and high variability in Adcyap1, we could not directly compare genotypes for dominance effects (as in Liedvogel et al., 2009;Saino et al., 2015).
However, we did run separate GLMs using either minimum allele, maximum allele, or mean allele length for each individual as the dependent variable. If longer Clock alleles are dominant, we would expect to see stronger relationships between individuals' maximum allele length and migratory traits.

| RE SULTS
We successfully genotyped 72 Blackpoll Warblers at candidate genes Clock and Adcyap1. We observed four Clock alleles ranging in size from 186 to 195 base pairs (Table 2). We found relatively greater variation in Adcyap1; we detected 13 Adcyap1 alleles ranging from 153 to 169 base pairs, five of which were found in single individuals ( with K = 1 as the most likely number of groups (Figure 3a). For all values of K, the assignment probability of each individual was roughly equal for all populations (i.e., the probability of any individual belonging to any of K populations ≈ 1/K; Figure 3b). The first two principal components from a PCA on genotype data had eigenvalues greater than 1.0 (1.61 and 1.26, respectively) and together explained 71.68% of the variation in genotype data. A PCA plot on these axes revealed no clustering by population. All populations overlapped in ordination space (Figure 3c). We found no significant relationship between allele length for either locus and either longitude or latitude, regardless of whether individual minimum, maximum, or mean allele lengths were considered (all p > 0.05, all R 2 values <0.05; Appendix 2). From our GLMs, we found a significant correlation between population and spring and fall departure, and spring and fall duration, but not with spring and fall arrival. These results are not explored further here, as population variation in migratory timing is more thoroughly discussed in DeLuca et al. (2019).
Despite finding no evidence of geographic variation in candidate genes, we did find evidence that both Clock and Adcyap1 allele length were correlated with migratory behaviors at an individual level (Table 4) (Table 4). Results were consistent when using longitude and latitude as covariates instead of population.
We also found an interaction effect between Clock and Adcyap1 allele lengths on migratory duration ( Figure 6). When we used mean This Clock × Adcyap1 interaction effect was weaker when using either maximum Clock allele length or mean Adcyap1 allele length (Table 5).
Lastly, we tested for differences in each of the migratory traits between homozygotes and heterozygotes of each locus. We found individuals that were heterozygous at Adcyap1 had significantly  Our results showed longer minimum Adcyap1 alleles were associated with earlier spring departure and later fall arrival on the wintering grounds. These results are generally consistent with work previously published on other species (Table 1). While no other studies to our knowledge have shown a significant correlation between Adcyap1 and arrival date on the wintering grounds as we show here, these results are consistent with an effect of Adcyap1 on longer fall migration duration (Mueller et al., 2011;Peterson et al., 2013). The strongest relationships between migratory traits and Adcyap1 were with an individual's minimum allele length (Appendix 3). This may suggest dominance of the shortest Adcyap1 allele, a finding not reported in other species. However, the one individual in our sample that was homozygous for the longest Adcyap1 allele (163 bp) had by far the earliest spring departure, the longest spring and fall duration, and the latest fall arrival. This may support that the effects of long Adcyap1 alleles on migratory behavior are additive instead of dominant, as was previously suggested by Bourret and Garant (2015).

| D ISCUSS I ON
Future studies with larger sample sizes may be able to more directly assess dominance by comparing migratory traits across genotypes at the individual level (Liedvogel et al., 2009;Saino et al., 2015). Our results that a single individual with a rare genotype differs greatly in migratory behavior are also consistent with studies published on other species that show a few individuals with rare genotypes can show significantly different migratory traits compared to the rest of the population . For example, a single individual barn swallow with the longest observed Clock allele had significantly later migration in both spring and fall compared to the rest of the population .
We found evidence of a significant interaction between Clock and Adcyap1 allele lengths on migratory duration, the first such finding in studies of migratory birds to our knowledge. Previous studies that have investigated the effects of both Clock and Adcyap1 either did not find a significant interaction between genes or did not test for one (Bazzi et al., 2016(Bazzi et al., , 2017Bourret & Garant, 2015;Contina et al., 2018;Peterson et al., 2013;Saino et al., 2015). In this interaction in our study, Adcyap1 length appears to increase migration duration when corresponding Clock allele length is short, especially in the fall.
The peptide product of Adcyap1, PACAP, regulates the expression of Clock in chicken (Gallus gallus domesticus) pineal glands (Nagy & Csernus, 2007), which may suggest the interaction effect we observe is the result of differential expression of Clock as determined , and a minimum convex polygon was drawn around each population. All population overlap with no clustering based on population, indicating no geographic structure departure date than arrival date, perhaps due to stronger selection on arrival date especially in the spring (Nilsson et al., 2013). Our data also showed a strong positive correlation between spring and fall duration (r = 0.688), and both of these characters were associated with a significant interaction between Clock and Adcyap1 ( Figure 6).
Whether these genes influence spring and fall duration via a common mechanism, perhaps through increased migratory restlessness (Mueller et al., 2011;Peterson et al., 2013), or whether these characters are independent requires further investigation. Nilsson et al. (2013) examined timing and speed of fall and spring migration in published tracking studies and found evidence for stronger selection on spring migratory phenology compared to fall, potentially suggesting evolutionary independence of these phenotypic characters.
Part of the answer to this question about evolutionary independence depends on the control of these characters at a proximate molecular level. Are there separate molecular pathways that control departure dates and duration, or that control spring and fall duration, or are these all proximately linked? PACAP is known to have broad influence on the physiology and behavior of organisms, acting in the brain and throughout peripheral organs (Mueller et al., 2011;Vaudry et al., 2009). It is therefore plausible that this gene could influence similar behaviors in separate seasons via independent molecular pathways. Conversely, it is perhaps equally plausible that a common molecular pathway is triggered by environmental cues in multiple seasons. This again highlights the need for studies that further investigate the functional role of candidate genes, how they influence migratory behaviors throughout the annual cycle, both independently and interactively with other factors (e.g., age, sex, environmental conditions), and specifically how variation in allele length influences the expression level, structure, and molecule functioning of gene products.
Although we found relationships between candidate genes and migratory traits at the individual level, we found no geographic structure in candidate genes. One possible explanation for this is that geographic variation in migratory behavior is explained by variation in environmental cues and not local adaptation in candidate genes. Geographic variation in behavior may be the result of plastic responses to variable environments, independent of the effects of those environments on selection in candidate genes (Foster, 1999).
Therefore, to the extent environmental cues vary across breeding  Note: Allele indicates whether an individual's minimum, maximum, or mean allele length was used for each analysis. Slope (β), p-value, and partial R 2 values are given for allele length and each migratory trait. Population was also used as a predictor variable, though the slopes and p-values for this factor are not provided. Only models for which allele length was significant (p ≤ 0.05) are shown. Results for all general linear models provided in Appendix 3. *Significant single gene effects that are not further discussed because of a significant genetic interaction effect on this migratory trait (see Table 5).

F I G U R E 4 Relationship between individual maximum
Clock allele length and spring arrival date residuals after accounting for population. Line represents slope from the general linear models with population and maximum Clock allele as independent variables. Individuals with longer maximum Clock allele lengths arrived earlier for their population compared to individuals with shorter maximum Clock alleles Further, Bourret and Garant (2015) point out that gene-by-environment interactions are underappreciated in the study of candidate genes. In their study of breeding phenology in tree swallows (Tachycineta bicolor), they found most candidate gene genotypephenotype relationships were affected by environmental variables such as breeding density, latitude (a proxy of photoperiod), and temperature (Bourret & Garant, 2015). If genotype-phenotype relationships are influenced by environment, again we may observe behavioral differences across populations without underlying population genetic differences. Alternatively, it might simply be that any interpopulation variation in allele frequencies or lengths are undetectable given high levels of intrapopulation variation and our relatively small sample size; however, our result of no geographic structure of candidate genes is consistent with studies in other bird species (Bazzi et al., 2016;Dor et al., 2012Dor et al., , 2011Johnsen et al., 2007). It is unlikely that the lack of geographic structure in candidate genes in blackpoll warblers is due solely to admixture from gene flow between locally adapted populations. Ralston and Kirchman (2012) found genetic structure due to isolation by distance at the continental scale in this species, despite a relatively high number of shared alleles among populations (Ralston & Kirchman, 2012. Future studies may benefit from comparing geographic patterns in candidate and neutral loci (Contina et al., 2018).
We observed greater allelic diversity in blackpoll warblers at Adcyap1 than at Clock, a pattern that appears to be typical across birds (Bazzi et al., 2016;Contina et al., 2018;Peterson et al., 2013; Bazzi et al., 2015Bazzi et al., , 2016Chakarov et al., 2013;Dor et al., 2011). Low heterozygosity at microsatellite loci may be the result of stabilizing selection or loss of diversity due to inbreeding and therefore useful in studies of individual fitness (Chapman, Nakagawa, Coltman, Slate, & Sheldon, 2009;Dor et al., 2011). While a single microsatellite locus is not sufficient in estimating genome-wide heterozygosity or inbreeding (Miller et al., 2014), individual heterozygosity at a small number of microsatellite loci can be useful in the study of avian life history traits and fitness (Forstmeier, Schielzeth, Mueller, Ellegren, & Kempenaers, 2012;Lens et al., 2000), especially if those loci are functionally important and under selection (Chapman et al., 2009). Adcyap1 heterozygosity in Eurasian blackcaps was significantly associated with earlier spring arrival (Mettler et al., 2015), perhaps suggesting these individuals were of greater migratory fitness and able to migrate more quickly.
We found similar results in blackpoll warblers in that heterozygotes at Adcyap1 had shorter spring and fall duration than homozygotes.
We found no associations between heterozygosity at Clock and any migratory trait. Few studies of migratory candidate genes have examined the effect of individual heterozygosity, but our results and those of Mettler et al. (2015) suggest this is a potentially fruitful avenue of further investigation.
The results of our individual-level analyses demonstrate the value of studies that reveal individual migratory phenotypes, for example, by using light-level geolocators (McKinnon & Love, 2018).
The combination of geolocator data with candidate gene analysis has been an important advancement in the study of migration Contina et al., 2018;Saino et al., 2017). For example, in the study of barn swallows, individual-level analyses revealed that rare Clock genotypes can have a significant impact on migratory phenology  and that degree of DNA methylation at the Clock gene can explain individual variation in migratory behavior (Saino et al., 2017). Both of these insights would likely have been missed with analysis of population-level genetic and migratory variation. Increasing the number of species and individuals with linked genotype-phenotype information will allow richer investigations of migratory candidate genes.
Understanding variation in candidate genes across species is especially important for behaviors that are evolutionarily labile and may have arisen independently in multiple lineages. We therefore encourage future studies of candidate migratory genes to investigate species that are codistributed, share a biogeographic history, or are in the same family as those species that have already been studied. This will allow a better understanding of the influences of environment and history on selection at candidate genes, as well as the degree to which these patterns are conserved within lineages.

ACK N OWLED G M ENTS
We thank Christian Artuso, Yousif Attia, John Brett, Jukka Jantunen, Tissue and blood sampling locations, population assignments for the current study, and original tissue sources (population, sample location, source, n)

Population (n) Source
Source n

Catalog numbers Geolocator numbers
Alaska (

APPENDIX 3
General linear model results for individual-level analyses of blackpoll warbler candidate gene alleles and migratory traits. Allele indicates whether an individual's minimum, maximum, or mean allele length was used for each analysis. Slope (β) with standard error (SE), p-value, and partial R 2 are given for the relationship between allele length and each migratory trait. Population was also used as a predictor variable, though the slope and p-values for this factor are not provided. Models with a significant slope for allele length (p ≤ 0.05) are shown with bold text.