Recent advances in conservation and population genomics data analysis

Abstract New computational methods and next‐generation sequencing (NGS) approaches have enabled the use of thousands or hundreds of thousands of genetic markers to address previously intractable questions. The methods and massive marker sets present both new data analysis challenges and opportunities to visualize, understand, and apply population and conservation genomic data in novel ways. The large scale and complexity of NGS data also increases the expertise and effort required to thoroughly and thoughtfully analyze and interpret data. To aid in this endeavor, a recent workshop entitled “Population Genomic Data Analysis,” also known as “ConGen 2017,” was held at the University of Montana. The ConGen workshop brought 15 instructors together with knowledge in a wide range of topics including NGS data filtering, genome assembly, genomic monitoring of effective population size, migration modeling, detecting adaptive genomic variation, genomewide association analysis, inbreeding depression, and landscape genomics. Here, we summarize the major themes of the workshop and the important take‐home points that were offered to students throughout. We emphasize increasing participation by women in population and conservation genomics as a vital step for the advancement of science. Some important themes that emerged during the workshop included the need for data visualization and its importance in finding problematic data, the effects of data filtering choices on downstream population genomic analyses, the increasing availability of whole‐genome sequencing, and the new challenges it presents. Our goal here is to help motivate and educate a worldwide audience to improve population genomic data analysis and interpretation, and thereby advance the contribution of genomics to molecular ecology, evolutionary biology, and especially to the conservation of biodiversity.


| INTRODUCTION
Today, conservation and evolutionary geneticists can employ the power of genomic tools to answer questions in conservation that could not be answered using traditional genetics approaches (Allendorf, Hohenlohe, & Luikart, 2010; Bernatchez et al., 2017; Garner et al., 2016; Harrisson, Pavlova, Telonis-Scott, & Sunnucks, 2014; McMahon, Teeling, & Höglund, 2014; Shafer et al., 2015a, 2015b). Technological and analytical advances now allow us to use many thousands of loci, gene expression, or epigenetics to address basic questions of relevance for conservation, such as identifying loci associated with local adaptation or adaptive potential in species facing changing environments (Bernatchez, 2016; Flanagan, Forester, Latch, Aitken, & Hoban, 2017; Harrisson et al., 2014; Hoban et al., 2016; Hoffmann et al., 2015; Jensen, Foll, & Bernatchez, 2016; Le Luyer et al., 2017; Wade et al., 2016). As conservation genomics matures, new challenges are arising. It is essential for researchers to keep up with the rapidly changing methods in appropriate study design, data quality assessment, and selecting appropriate analyses to obtain accurate results for conservation and management decisions (Benestan et al., 2016).
To address these emerging challenges, 15 experts from diverse areas of genomic data analysis came together to teach and exchange ideas about cutting-edge approaches for population genomic data analysis and interpretation. Students, postdocs, faculty, and agency researchers (e.g., museums, agency biologists) originating from 15 countries brought an assortment of data to work through various computational analyses. Of 31 students, 23 had restriction-site associated DNA (RAD) or genotyping by sequencing (GBS) data, four had exon capture data, and four students had whole-genome sequencing (WGS) data. Interestingly, of the 30 attendees at ConGen just 4 years ago, only a few students had RAD-seq data, only one had sequence capture data, and none had WGS data. The main focus of the 15 experts was on narrow-sense conservation genomics applications, which require use of conceptually novel approaches.
The week-long workshop, held at the University of Montana's Flathead Lake Biological Station, provided training in theory as well as empirical applications of NGS data production and analyses.
Lectures, discussions, hands-on analysis of empirical data, and one-on-one assistance from instructors improved students' knowledge of conservation and evolutionary genomic projects. Many participants in the past have taken the knowledge and resources (PowerPoint slides, worksheets, video recorded lectures) acquired during the workshop and disseminated them to others in their laboratories, further extending the educational reach of ConGen among population genomic researchers (http://www.umt.edu/sell/cps/congen2017/).
In the opening keynote lecture, L. Bernatchez discussed several mechanisms that may enhance the maintenance of genetic variation and evolutionary potential in the face of a changing environment.
Among these mechanisms that have been overlooked and should be considered in future theoretical development and predictive models, he discussed the prevalence of soft sweeps, the polygenic basis of adaptation, balancing selection, and transient polymorphisms, as well as epigenetic variation. A key message was that adaptive evolution in nature rarely involves the fixation of beneficial alleles. Instead, adaptation apparently proceeds most commonly by soft sweeps entailing shifts in frequencies of alleles being shared between differentially adapted populations. Finally, L. Bernatchez argued that a new paradox seems to be emerging from recent studies whereby populations of highly reduced effective population sizes (Ne) and impoverished genetic diversity can sometimes retain their adaptive potential, and that epigenetic variation could account for this apparent contradiction (Bernatchez, 2016).
The remaining lectures focused mainly on approaches for data production or analysis. We discuss highlights from these lectures with the goal of motivating and educating a worldwide audience to improve population genomic data analysis and thereby advance the role of genomics in molecular ecology, evolutionary biology, and conservation. We describe (a) issues regarding recruiting and retaining a diverse workforce in conservation genomics, (b) impacts of genotyping error and data quality, and (c) improvements to downstream population genomic analyses.

KEYWORDS
bioinformatics pipeline, conservation genomics workshop, diversity in STEM, landscape genomics, population genomics

| INCREASING CONTRIBUTIONS BY WOMEN (SARAH HENDRICKS AND BRENNA FORESTER)
Following the productive trend at recent ecology and evolutionary biology conferences, issues of gender bias were discussed at ConGen. When this important topic is not widely and openly examined, it can inhibit the advancement of science generally, and conservation and population genomics specifically. Diversity leads to better problem-solving, expands the talent pool, and promotes full inclusion of excellence across the social spectrum (Blackburn, 2017;Nielsen et al., 2017). Among the plethora of topics regarding increasing diversity in STEM fields (Blackburn, 2017;Wellenreuther & Otto, 2015), here we focus on overcoming the biases against women in computer sciences and the persistence of unconscious gender stereotypes that influence both male and female researchers.
Gender biases in computer science training may limit the effectiveness of efforts to attract and retain the best and most diverse workforce in conservation genomics. As of 2014, just 18.1% of computer science bachelor's degrees were awarded to women, and this proportion has declined by 10% over the last 10 years, further widening the gender gap (NCSES, 2016). This deficit in female computer scientists has been attributed to a lower sense of belonging by women than men due to a predominately male culture in the field (Cheryan, Ziegler, Montoya, & Jiang, 2017). There is also evidence of gender gaps in self-efficacy that may be due to a lack of sufficient early education in computer programming (Cheryan et al., 2017).
Although not reported, these issues likely persist in bioinformatics and genomics. Efforts to maximize gender inclusion in computer science may benefit from changing masculine cultures in technological fields and providing early experiences for all students that signal a sense of belonging and ability to succeed in these fields. Efforts led by women, such as "Girls Who Code" (https://girlswhocode.com/) and "Learn to Code with Me" (https://learntocodewith.me/posts/13-places-women-learn-code/), aim to decrease the gender gap by targeting coding courses and workshops to girls and women.
Likewise, short courses such as ConGen, which teach basics in Linux, Bash, and R scripting, act to support an inclusive community and address limitations due to gendered perceptions in the genomics era.
Unconscious stereotypes persist in the minds of male and female researchers, as evident in the studies of reference letters for postdoctoral fellowships and other academic positions (Dutt, Pfaff, Bernstein, Dillard, & Block, 2016;Madera, Hebl, & Martin, 2009;Trix & Psenka, 2003). One study of recommendation letters for medical faculty positions found that letters written on behalf of females differed from those written on behalf of men in length, negative language, and gender-linked terms. Overall, the study found that the letters, regardless of the gender of the recommender, reinforced stereotypes that portray men as researchers and professionals and women as teachers and students (Trix & Psenka, 2003). Another study found that men, more than women, were described as having agentic leadership traits, such as being in control of subordinates, speaking assertively, working independently and competitively, and initiating tasks (Madera et al., 2009). Furthermore, women were described as having more communal characteristics, which had a negative association for women with employment decisions (Madera et al., 2009). Letters of recommendation have been shown to greatly affect hireability ratings of applicants (Madera et al., 2009). On the level of personal action, we suggest recommenders edit their own letters to avoid gender bias (http://www.csw.arizona.edu/LORbias).
Despite similar proportions of women and men awarded doctoral degrees in science and engineering disciplines, women are less likely to obtain tenure-track positions in academia than their male counterparts. Although there are many reasons for this "leaky pipeline" (Gasser & Shaffer, 2014;Goulden, Mason, & Frasch, 2011;Holmes, OConnell, & Dutt, 2015), increasing training and avoiding biases in reference letters may benefit not only women, but also the greater scientific community by promoting innovation through diversity and inclusion. Further, there are many topics such as referee opportunity bias (Lerback & Hanson, 2017), the childcare-conference conundrum (Calisi & A Working Group of Mothers in Science, 2018), and misconceptions around hiring preferences (Williams & Ceci, 2015) that should also be addressed to reduce disadvantages to women. With the brief mention of this topic, we hope to stimulate future studies of gatekeeping practices in the field of conservation, so institutions can develop initiatives to recruit, retain, and advance women in STEM fields as mentorship will be essential for eliminating gender bias in computer science, bioinformatics, and by extension, conservation biology. We ask our readers to initiate discussions regarding the persistence of stereotypes and how these stereotypes affect excellence across our community. We wonder: Can the active and intentional cultivation of inclusivity help to expand the role of genomics in molecular ecology, population genomics, and nature conservation?

| GENOTYPING ERROR AND IMPROVING DATA QUALITY
On a more technical level, several authors discussed ways to assess and prevent genotyping errors and improve data quality. We discuss several of these here. Prince et al., 2017; Puritz, Gold, & Portnoy, 2016; Ravinet et al., 2016; Swaegers et al., 2015). While a few had low genotyping error rates (<5%), in others, allelic dropout, low read depth, PCR duplicates, erroneous assembly, and/or poor filtering resulted in much higher estimated error rates, with between 5% and 72% of heterozygotes apparently being miscalled as homozygotes. Although some of these apparent high error rates could reflect true heterozygote deficiencies due to the Wahlund effect or other factors, in all cases the samples were thought to be from a single population. Hence, this provides a cautionary note that it is good practice to visualize your data to ascertain if more homozygotes are called than expected under Hardy-Weinberg equilibrium.
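The check recommended above can be sketched in a few lines. The example below is a minimal illustration, not the procedure used in any of the cited studies: it assumes biallelic SNPs with genotypes coded as alternate-allele counts (0, 1, 2), and the function name `homozygosity_summary` and the example genotypes are hypothetical.

```python
from collections import Counter


def homozygosity_summary(genotypes):
    """Return (expected, observed) homozygote frequency for one biallelic SNP.

    `genotypes` is a list of alternate-allele counts per individual (0, 1, or 2).
    """
    n = len(genotypes)
    counts = Counter(genotypes)
    # Alternate-allele frequency estimated from the called genotypes.
    q = (counts[1] + 2 * counts[2]) / (2 * n)
    p = 1.0 - q
    expected_hom = p * p + q * q  # Hardy-Weinberg expectation
    observed_hom = (counts[0] + counts[2]) / n
    return expected_hom, observed_hom


# Hypothetical SNP in 10 individuals with few called heterozygotes.
called = [0, 0, 0, 0, 2, 2, 2, 2, 1, 1]
exp_hom, obs_hom = homozygosity_summary(called)
excess = obs_hom - exp_hom  # positive values indicate a homozygote excess
```

Plotting observed against expected homozygote frequency across all SNPs, with points sitting well above the y = x line, is one simple way to flag a dataset for closer inspection.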

| Probabilistic genotype calling
Probabilistic genotype calling, as conducted by the software program ANGSD (Korneliussen, Albrechtsen, & Nielsen, 2014), is a principled method for dealing with low-coverage sequencing data; however, it should be applied carefully. With low-coverage sequencing, because there is so little information at any individual site, the statistical model and the prior distributions are relatively more influential than they are with high-read-depth data. A good example can be seen in a recent paper by Prince et al. (2017) which features lower-depth sampling than many other contemporary RAD-seq studies. In analyses of their RAD-seq data, Prince et al. used ANGSD to integrate over the genotype uncertainty rather than directly calling genotypes.
Even more importantly, when they were able to, they were careful to use population-specific allele frequency-based genotype priors for their analyses rather than a simple uniform prior distribution on genotypes. The choice of prior is important: If one uses ANGSD to call genotypes from the Prince et al. data using the uniform prior on genotypes, the result shows a strong tendency to incorrectly infer heterozygotes as homozygotes (Figure 1c). This is not simply a consequence of forcing ANGSD to call genotypes. Rather, the posterior probabilities, themselves, of the genotypes carry extra weight on the homozygote classes, because the uniform prior does not use allele frequency information to help infer the genotypes.
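The influence of the prior at low depth can be illustrated with a toy calculation. This is a simplified sketch and not ANGSD's actual genotype likelihood model: it assumes a single biallelic site, a fixed per-read error rate, and a Hardy-Weinberg prior built from a known allele frequency. The function names and the example values are hypothetical.

```python
def genotype_posteriors(n_ref, n_alt, prior, error=0.01):
    """Posterior over genotypes (0, 1, 2 alternate alleles) given read counts."""
    # Probability that a single read shows the alternate allele, per genotype.
    p_alt = [error, 0.5, 1.0 - error]
    likelihoods = [(pa ** n_alt) * ((1.0 - pa) ** n_ref) for pa in p_alt]
    unnormalized = [pr * lk for pr, lk in zip(prior, likelihoods)]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


def hwe_prior(alt_freq):
    """Genotype prior under Hardy-Weinberg proportions for a given allele frequency."""
    q = alt_freq
    p = 1.0 - q
    return [p * p, 2 * p * q, q * q]


# One read carrying the reference allele, and nothing else, at this site.
uniform = [1 / 3, 1 / 3, 1 / 3]
post_uniform = genotype_posteriors(1, 0, uniform)
post_hwe = genotype_posteriors(1, 0, hwe_prior(0.5))
```

With a single reference read, the posterior probability of the heterozygote is about one third under the uniform prior but one half under the allele frequency-based prior (for an allele frequency of 0.5), which shows how a uniform prior shifts weight toward the homozygote classes when read depth is low.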
FIGURE 1 Observed (y-axis) versus expected (x-axis) homozygote frequencies at SNPs in three RAD studies of Chinook salmon. The solid black line is at y = x, and the dotted lines show the maximum and minimum possible observed values given the expected values computed from the observed allele frequencies. n is number of individuals, L is number of SNPs, and HMR is the heterozygote miscall rate estimated from the dataset. (a) Korukluk River, Western Alaska (Larson et al., 2014): a carefully filtered dataset showing almost no distortions from HWE and with a low estimated HMR of 0.02. (b) Johnson Creek (Hecht et al., 2015): Most of the points lie above the y = x line and HMR is estimated to be 0.17. (c) Low-read-depth data from mature-migrating Umpqua river Chinook (Prince et al., 2017). Genotypes were called using ANGSD's doGeno option assuming a uniform prior on genotypes. Profound homozygote excesses are observed with HMR = 0.52.
Increasingly, recent publications have suggested that probabilistic genotyping obviates the need for high mean depth of coverage (>10 to 20×). For example, Prince et al. (2017) found that PCA applied to their full dataset yielded a first principal component driven largely by variation in read depth (M. Miller, personal communication, February 7, 2018). Randomly subsampling reads from each individual to the same depth eliminated that technical variation, and, though it led them to discard almost 70% of their sequencing reads, with probabilistic genotyping they were still able to recover meaningful population structure. To evaluate how effectively probabilistic genotype calling can retrieve the same inference with ever-smaller amounts of sequencing, Anderson presented an analysis using subsampled versions of a high-depth RAD dataset.
He first performed PCA using SNPRelate (Zheng et al., 2012) to resolve population structure of a North American songbird using SNPs called from high-quality, high-read-depth RAD data using a GATK pipeline (mean read depth at 105,000 SNPs across 175 individuals was 36×). He then used ANGSD and ngsCovar (Fumagalli, Vieira, Linderoth, & Nielsen, 2014), a probabilistic genotyping approach to PCA, on the BAM files for the same 175 birds after subsampling so that the mean read depth at each of those 105,000 loci was expected to be 0.65×, 1×, 2×, 5×, or 10×. ANGSD was not restricted to using only the previously discovered 105,000 SNPs and, in fact, called between 29,331 SNPs at 0.65× and 898,320 SNPs at 10×. Figure 2 shows that clusters in the first two principal components from SNPRelate on the high-read-depth data resolve subspecies and show structure within subspecies that corresponds to state of origin.
Remarkably, at 0.65×, ngsCovar identifies roughly similar groupings, albeit with looser clustering. However, at all other read depths, ngsCovar identifies clusters that are clearly inconsistent with subspecies designations and become dominated by Lissajous curves (Novembre & Stephens, 2008).
Overall, the results suggest that some probabilistic methods developed for low-coverage data might behave unpredictably when provided with high-quality, high-read-depth RAD data. However, new methods based on probabilistic genotyping are continually emerging. For example, the ANGSD methods PCAngsd and PCA_MDS are both reported to outperform ngsCovar with variable sequencing depth (see http://www.popgen.dk/angsd/index.php/PCA). Probabilistic inference from next-generation sequencing data is an important advance; however, one should not assume that it will automatically overcome shortcomings in sequence data caused by unsatisfactory sample quality, poor library preparation, or insufficient sequencing. As with many approaches for next-generation sequencing, user-specified settings of models, priors, and filtering can have strong effects on the results.
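The depth equalization step described in this section, randomly subsampling each individual's reads to a common depth, can be sketched as follows. This is only an illustration of the idea on lists of read identifiers; in practice the subsampling would be applied to alignment files with dedicated tools. The function name `subsample_reads` and the read counts are hypothetical.

```python
import random


def subsample_reads(read_ids, target, seed=0):
    """Randomly keep at most `target` reads from one individual's read list."""
    if len(read_ids) <= target:
        return list(read_ids)
    rng = random.Random(seed)  # fixed seed so the subsample is reproducible
    return rng.sample(read_ids, target)


# Hypothetical numbers of reads for three individuals.
reads = {
    "ind1": list(range(1000)),
    "ind2": list(range(300)),
    "ind3": list(range(5000)),
}
# Equalize every individual to the depth of the shallowest one.
target = min(len(r) for r in reads.values())
kept = {ind: subsample_reads(r, target) for ind, r in reads.items()}
total = sum(len(r) for r in reads.values())
discarded_fraction = 1 - sum(len(k) for k in kept.values()) / total
```

The discarded fraction makes the cost of equalizing depth explicit, which is the trade-off noted above for the Prince et al. (2017) data.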

| Relatedness
FIGURE 2 Plots of the first two principal components from PCA of unpublished RAD data showing population structure among four subspecies of a North American passerine. Each point is an individual bird. Top left panel shows the result obtained in the original study, which used 105,000 SNPs called with an average read depth of 36× across 175 birds analyzed with SNPRelate (Zheng et al., 2012). Remaining panels show results obtained by subsampling the original dataset to depths of 0.65×, 1×, 2×, 5×, and 10×, and analysis with ANGSD (Korneliussen et al., 2014) and ngsCovar (Fumagalli et al., 2014). Subspecific structure in the 0.65× data is much less distinct than in the full dataset, but is generally concordant with it. However, at higher read depths the clustering is clearly inconsistent with subspecies affiliation.
Many researchers have concluded that it is important to remove putative siblings from population genetics datasets before conducting downstream analyses (Corlett, 2017; Johnson et al., 2016), but there are several good reasons why this can create more problems than it solves (Waples & Anderson, 2017). First, siblings occur in all natural populations, at frequencies that are inversely related to effective population size; therefore, removing siblings erases signals characteristic of small populations and makes the populations appear to be larger. Second, removing individuals reduces sample size and decreases statistical power, perhaps greatly, so any benefits must be large to offset this cost. Third, methods for sibling inference are not infallible, so it is important to consider the consequences of imperfect pedigree reconstruction. Finally, sibling removal cannot be used to convert a nonrandom sample into a random sample, unless one has independent information about the degree to which the proportion of siblings in the sample exceeds the random expectation.
An alternative to removing individuals is to use a best linear unbiased estimator approach (BLUE; McPeek, Wu, & Ober, 2004), which gives each individual a weight that reflects its degree of relatedness to others in the sample. As shown by Waples and Anderson (2017), however, performance of the BLUE also depends on having accurate pedigree information. When sample identification is not reliable, the use of the full dataset outperforms BLUE. Because of these potential adverse effects, researchers should be cautious about adjusting their datasets for putative siblings unless they have a good reason to believe that doing so will not actually make things worse.
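To make the weighting idea concrete, the sketch below computes BLUE-style weights for an allele frequency estimate as the normalized solution of K w = 1, where K is a pairwise relatedness matrix. It is a minimal illustration under stated assumptions, not the implementation used by McPeek et al. (2004) or Waples and Anderson (2017): the relatedness matrix is assumed to be known without error, and all names and values are hypothetical.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def blue_weights(relatedness):
    """Weights proportional to K^-1 1, normalized to sum to one."""
    x = solve_linear(relatedness, [1.0] * len(relatedness))
    total = sum(x)
    return [v / total for v in x]


# Hypothetical sample: individuals 0 and 1 are full siblings, individual 2 is unrelated.
relatedness = [
    [1.0, 0.5, 0.0],
    [0.5, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
weights = blue_weights(relatedness)
alt_allele_counts = [2, 2, 0]  # copies of the alternate allele per individual
weighted_freq = sum(w * c / 2 for w, c in zip(weights, alt_allele_counts))
naive_freq = sum(alt_allele_counts) / (2 * len(alt_allele_counts))
```

Here the two siblings each receive a weight of 2/7 and the unrelated individual 3/7, so the weighted allele frequency (4/7) is lower than the unweighted estimate (2/3). If the relatedness matrix were wrong, the weights would be wrong too, which is the sensitivity to pedigree accuracy described above.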

| Effects of filtering on downstream analyses (Paul Hohenlohe and Tiago Antao)
Methods for producing reduced representation libraries, such as RAD-seq, are rapidly evolving, and more than 15 methods exist with variations in data quality, genotyping errors, cost, and the number of loci discovered (reviewed in Andrews, Good, Miller, Luikart, & Hohenlohe, 2016). Furthermore, filtering choices (see figure 2 in Benestan et al., 2016) can greatly influence downstream summary statistics. A recent study testing the impact of data processing on population genetic inferences using RAD-seq data observed large differences between reference-based and de novo approaches in population genetic summary statistics, particularly those based on the site frequency spectrum (Shafer et al., 2016). In addition, the recent debate over the effectiveness of RAD-seq for discov-

| Retaining haplotypes in amplicon and RAD datasets (Eric Anderson)
Common approaches for dealing with multiple SNPs across an amplicon or RAD locus can result in low power or incorrect inference in subsequent analyses. When multiple SNPs are detected, either they are treated as unlinked (which is likely untrue) or only one of the SNPs is used in downstream analyses. However, retaining each haplotypic combination as an allele can increase power for relationship inference and pedigree reconstruction (Baetscher, Clemento, Ng, Anderson, & Garza, 2017). Further, haplotype calling allows for the retention of low-frequency variants, which may be useful for population structure assessment in recently diverged populations. Rare alleles (or haplotypes) reveal recombination events that generated alternative sequences of ancestry and thereby identify fine-scale structure that would be missed when using independent marker approaches (Lawson, Hellenthal, Myers, & Falush, 2012).
The software microhaplot (https://github.com/ngthomas/microhaplot) takes a variant file and designates nucleotides that occur together on the same read as "microhaplotypes" and allows for the visualization, filtering, and exporting of the data. The Stacks software package (Catchen et al., 2013) can also export multi-SNP haplotypes from RAD-seq data. Unlike single SNP assays, the microhaplotype data collection method uses assays designed with multi-allelic loci and can yield useful data for nontarget species phylogenies and for genealogical inference (Sunnucks, 2000).
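The core idea of treating alleles that co-occur on a read as a single multi-SNP allele can be sketched as below. This is not the microhaplot or Stacks implementation or API; it assumes reads are already aligned to the same start position at a locus and are given as plain strings, and the function name and data are hypothetical.

```python
from collections import Counter


def call_microhaplotypes(reads, snp_positions):
    """Count the allele combinations seen together on each read."""
    haplotypes = Counter()
    for read in reads:
        # Skip reads that do not span every SNP position at the locus.
        if len(read) <= max(snp_positions):
            continue
        haplotypes["".join(read[pos] for pos in snp_positions)] += 1
    return haplotypes


# Hypothetical reads from one locus with SNPs at 0-based positions 1 and 4.
reads = ["ACGTAC", "ACGTAC", "ATGTGC", "ATGTGC", "ACGTGC", "ACG"]
counts = call_microhaplotypes(reads, snp_positions=[1, 4])
```

Two biallelic SNPs scored separately give two alleles each, whereas the read-backed combinations here yield three distinct haplotypes (CA, TG, and CG), illustrating how the approach produces multi-allelic loci.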

| Draft genomes to improve data analyses (Ben Koop)
Some molecular biologists have claimed that we are in the postgenomic era (Wu, 2001); however, only a very small proportion of reference genomes are assembled to the chromosomal level.
Despite this, having even a draft genome (in 1000s of scaffolds) can help improve data analyses in many ways including the follow- There are a growing number of approaches for genome assembly using "single molecule real-time" sequencing (SMRT-seq) or "syn-

| Experimental design: which method to choose (Paul Hohenlohe)
The  Lowry et al., 2016;McKinney et al., 2017). The primary criticism raised by Lowry et al. (2016) is that RAD loci, depending on the choice of restriction enzyme(s) and the specific protocol used (Andrews et al., 2016), may be sparsely distributed across the genome, so that selected loci may lie some distance away from the nearest genotyped RAD marker. By definition, all reduced representation approaches face this issue, although RADseq approaches are more limited than other techniques (such as sequence capture) in their ability to specifically target previously identified candidate loci. In a RADseq study (and most other marker-based population genomic studies), the key factor is linkage disequilibrium (LD), which determines the extent to which genotypes at a genetic marker are correlated with those of a functionally important locus, and therefore, the signal of selection that can be detected from marker data.
If the scale of LD is larger than the distance between markers, a RAD-seq study has a high probability of identifying functionally important loci across the genome. The extent of LD can be directly estimated if a reference genome is available, and it is recommended that LD should be estimated whenever possible in population genomic studies. Moreover, many conservation and population genomic questions can be answered without exhaustive sampling of the genome or detection of all functionally important loci, and alternative techniques such as WGS may impose substantial costs and other trade-offs. In particular, increasing the density of markers may necessitate reducing the number of individuals or populations sampled, and choosing methods that target candidate loci can bias against detecting selection at previously unknown loci. Overall, there is no universally applicable genomic method, and the biological question and details of the study system should drive the choice of technique.
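As a concrete reference for what estimating LD involves, the sketch below computes the standard r² statistic between two biallelic loci. It assumes phased haplotypes coded 0/1, which is a simplification: with unphased genotype data, haplotype frequencies would first have to be estimated. The function name and example data are hypothetical.

```python
def ld_r_squared(haplotypes):
    """r^2 between two biallelic loci from phased haplotypes.

    `haplotypes` is a list of (allele_at_locus_A, allele_at_locus_B) pairs coded 0/1.
    """
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n
    p_b = sum(b for _, b in haplotypes) / n
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b  # coefficient of linkage disequilibrium
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("r^2 is undefined when a locus is monomorphic")
    return d * d / denom


# Hypothetical haplotype samples: complete association versus no association.
perfect = [(1, 1)] * 5 + [(0, 0)] * 5
independent = [(1, 1), (1, 0), (0, 1), (0, 0)] * 3
r2_perfect = ld_r_squared(perfect)
r2_independent = ld_r_squared(independent)
```

Computing r² for marker pairs and examining how it declines with physical distance along a reference genome gives the scale of LD that can then be compared with the spacing between markers.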
Given that many studies have used only shallow pedigrees or few DNA markers, it is possible that power to detect inbreeding depression has been low; therefore, inbreeding depression could be more common, widespread, and severe than previously thought.
Analyses of runs of homozygosity (ROH) can also be used to understand the genetic basis of inbreeding depression. Candidate regions for loci contributing to inbreeding depression can be identified as chromosome segments containing fewer ROH in a sample of individuals than expected by chance (Kardos et al., 2018; Pemberton et al., 2012). Homozygosity mapping (Charlier et al., 2008) and association analyses based on the correlation of phenotype with the presence/absence of ROH in particular genome regions (Keller et al., 2012; Pryce, Haile-Mariam, Goddard, & Hayes, 2014) can be used to identify loci affecting inbreeding depression. Genomic approaches have the potential to greatly advance our understanding of the strength and genetic basis of inbreeding depression in natural populations.
Analyses of identity-by-descent (IBD) can also be used to infer historical effective population size (Ne). Differences in historical Ne among populations can be qualitatively inferred by analyzing the abundance of ROH. The abundance of very short ROH is informative of Ne in distant history, while long ROH is informative of more recent Ne (Kardos, Qvarnström, & Ellegren, 2017; Kirin et al., 2010; Pemberton et al., 2012). A limitation of this approach is that it is only qualitative and requires data on multiple populations to be informative.
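A bare-bones version of ROH detection is sketched below to show what is being counted. It is deliberately simplistic and assumes error-free genotypes ordered along one chromosome: established ROH callers also account for physical distance, marker density, missing data, and occasional miscalled heterozygotes. The function name `find_roh` and the genotype vector are hypothetical.

```python
def find_roh(genotypes, min_snps):
    """Return (start, end) index pairs of homozygous runs of at least `min_snps` SNPs.

    `genotypes` holds alternate-allele counts (0, 1, 2) along one chromosome.
    """
    runs = []
    start = None
    for i, g in enumerate(genotypes):
        if g != 1:
            # Homozygous call: open a new run if none is in progress.
            if start is None:
                start = i
        else:
            # Heterozygous call: close the current run and keep it if long enough.
            if start is not None and i - start >= min_snps:
                runs.append((start, i - 1))
            start = None
    if start is not None and len(genotypes) - start >= min_snps:
        runs.append((start, len(genotypes) - 1))
    return runs


# Hypothetical genotypes for one individual along one chromosome.
genos = [0, 0, 2, 0, 1, 0, 2, 1, 2, 2, 2, 2, 0]
runs = find_roh(genos, min_snps=4)
run_lengths = [end - start + 1 for start, end in runs]
```

Tabulating run lengths across individuals and populations gives the abundance of short versus long ROH that the qualitative comparison of historical Ne relies on.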
A particularly exciting new approach for studies of recent demographic history in natural populations is to explicitly estimate a time series of recent Ne using inference of IBD. The program IBDSeq (Browning & Browning, 2013) searches the genomes of all pairs of individuals to identify chromosome segments of shared ancestry between individuals. The program IBDNe (Browning & Browning, 2015) then uses the inferred pairwise IBD segments to find the most likely recent time series of Ne given the IBD data. A limitation of this approach for most natural populations is that it requires a minimum of approximately 100 individuals and the genetic mapping locations (i.e., on a linkage map) of at least several hundred thousand SNPs (Browning & Browning, 2015). However, the approach has great potential to infer recent demographic history (i.e., to test for and quantify recent population bottlenecks and expansions) in natural populations where it would be difficult or impossible to evaluate recent Ne otherwise (Kardos et al., 2017).

| Genomewide association studies (Marty Kardos)
Genomewide association studies (GWAS) have recently identified loci with large effects on several ecologically important phenotypic traits. For example, single loci have explained a large fraction of the variance in age of maturation in Atlantic salmon (Barson et al., 2015) and horn development in free-ranging Soay sheep (Johnston et al., 2011, 2013). Clearly, some traits are governed largely by variation at individual loci, but these are likely rare among all traits of interest to evolutionary biologists. Many adaptive traits are likely driven by a large number of loci with small effect sizes, low minor allele frequency, and/or epistatic interactions (Visscher et al., 2017). GWAS of complex traits will therefore often fail to identify enough genotype-phenotype associations to explain a useful frac-
Fortunately, GWAS that fail to explain a large fraction of the heritability with loci showing statistically significant genotype-phenotype associations are still highly useful. It is arguably more important in ecological and conservation genetics to understand the heritability of a trait than to identify some of the loci responsible for heritable variation in the trait, as it is the heritability of a trait that determines the magnitude of the expected response to selection. The additive genetic variance and heritability can readily be estimated using linear mixed effects models (Rönnegård et al., 2016; Santure et al., 2013; Yang, Lee, Goddard, & Visscher, 2011) in GWAS, even in cases where no individual loci pass the stringent thresholds of statistical significance. In addition, heritability can be partitioned among chromosomes to determine whether the trait of interest is likely to be polygenic (i.e., affected by a very large number of loci), in which case chromosome-specific heritability is expected to increase with the number of genes on a chromosome (Santure et al., 2013).
Participants at ConGen used the R package RepeatABEL (Rönnegård et al., 2016) to test for loci associated with clutch size using previously published data from a long-term study of collared flycatchers (Ficedula albicollis; Husby et al., 2015). This helped to familiarize students with data structures, available software, and interpretation of results from GWAS. In addition, analyzing the collared flycatcher data allowed students to consider the importance of accounting for repeated phenotypic measurements when conducting a GWAS. Students were encouraged to critically evaluate effect size estimates from GWAS in light of the Beavis effect (Beavis, 1998) and the "winner's curse" (Kraft, 2008), which state that the effect sizes of loci passing stringent statistical significance thresholds in QTL mapping or GWAS analyses are often upwardly biased, particularly in studies with low statistical power.
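The upward bias described above can be reproduced with a small simulation. This is a toy sketch and not an analysis from the workshop: it assumes every locus has the same small true effect, that estimates are normally distributed around it with a known standard error, and that significance is declared from a simple z-score threshold. All parameter values and names are hypothetical.

```python
import random


def simulate_winners_curse(true_effect=0.1, se=0.1, n_loci=10000,
                           z_threshold=3.0, seed=1):
    """Mean absolute effect estimate among loci that pass a significance threshold."""
    rng = random.Random(seed)
    # Each locus has the same true effect; estimates differ only by sampling noise.
    estimates = [rng.gauss(true_effect, se) for _ in range(n_loci)]
    significant = [abs(e) for e in estimates if abs(e) / se > z_threshold]
    mean_significant = sum(significant) / len(significant)
    return mean_significant, len(significant)


mean_sig, n_sig = simulate_winners_curse()
```

Because only estimates that happen to be large clear the threshold, the mean estimate among the significant loci exceeds the true effect of 0.1. Lowering the standard error (that is, raising statistical power) in this sketch shrinks the bias, consistent with the point that the problem is most severe in low-powered studies.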

| Landscape genomics (Brenna Forester)
Landscape genomics is an emerging analytical framework that investigates how environmental and spatial processes structure the amount and distribution of neutral and adaptive genetic variation among populations (Balkenhol et al., 2017). Landscape genomics is sometimes conflated with genotype-environment association (GEA) analysis, which includes a wide variety of statistical approaches for identifying candidate adaptive loci that covary with environmental predictors (Rellstab, Gugerli, Eckert, Hancock, & Holderegger, 2015). However, landscape genomics includes many other techniques for identifying and analyzing spatially structured, selection-driven variation, including GWAS across multiple environments, simulation studies, experimental approaches such as environmentally stratified common gardens, epigenetic and transcriptomic studies, and innovative approaches that combine analytical techniques (Berg & Coop, 2014;Lasky, Forester, & Reimherr, 2018;Storfer, Antolin, Manel, Epperson, & Scribner, 2015).
Most importantly, landscape genomics is not just the application of these statistical techniques to identify candidate adaptive variation, but is an approach with a developing theoretical framework linking genomic variation, spatial complexity, environmental heterogeneity, and evolutionary processes (Balkenhol, Cushman, Waits, & Storfer, 2015). The wide range of ecological and evolutionary questions and management issues that can be addressed through this framework was highlighted with recently published examples (Brauer, Hammer, & Beheregaray, 2016; Creech et al., 2017; Lasky et al., 2015; Manthey & Moyle, 2015; Razgour et al., 2017; Swaegers et al., 2015).
With this introduction to landscape genomics, ConGen participants worked on applications of GEA analysis, currently the most widely used landscape genomic technique (Balkenhol et al., 2017). Such analyses provide a powerful tool for investigating the genetic basis of local adaptation and informing management actions to conserve evolutionary potential (Flanagan et al., 2017; Harrisson et al., 2014; Hoffmann et al., 2015). Finally, participants were encouraged to move beyond simply documenting candidate adaptive loci in their datasets, and instead focus on the ecological, evolutionary, and management-relevant questions that can be addressed by more fully integrating a landscape genomic analytical framework.
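The core idea behind GEA analysis, identifying loci whose allele frequencies covary with environmental predictors, can be sketched in a few lines. This is a deliberately simplified illustration and not the method used at the workshop: it computes a univariate Pearson correlation per locus and flags loci above an arbitrary threshold, whereas real GEA analyses typically also account for neutral population structure and multiple testing. The function names, the threshold, and the toy data are all hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # no variation, so no detectable association
    return cov / math.sqrt(var_x * var_y)


def candidate_loci(allele_freqs, environment, r_threshold=0.9):
    """Return loci whose allele frequencies covary strongly with environment.

    allele_freqs maps a locus name to a list of population allele
    frequencies, in the same population order as `environment`.
    """
    hits = {}
    for locus, freqs in allele_freqs.items():
        r = pearson_r(freqs, environment)
        if abs(r) >= r_threshold:
            hits[locus] = r
    return hits


# Hypothetical toy data: five populations along a temperature gradient.
temperature = [4.0, 7.0, 10.0, 13.0, 16.0]
allele_freqs = {
    "locus_A": [0.10, 0.25, 0.45, 0.70, 0.90],  # frequency tracks temperature
    "locus_B": [0.50, 0.48, 0.53, 0.49, 0.51],  # no clear trend
}

hits = candidate_loci(allele_freqs, temperature)
print(sorted(hits))
```

In this toy example only the locus whose frequency rises along the gradient is flagged. A strong correlation alone does not demonstrate selection, because shared demographic history can produce similar clines, which is one reason the broader landscape genomic framework described above matters.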

| Ancestral demography with migration (Arun Sethuraman)
Estimation of ancestral demography, particularly under an Isolation with Migration (IM) model (Nielsen, 2001), is useful for many molec-  Pritchard et al., 2000) can prove to be useful means to bridge genomics and conservation in particular.

| BROAD RECOMMENDATIONS AND CONCLUSIONS
Common advice among instructors was to gain extensive experience in computer programming. Students were encouraged to seek out online resources and to work in interdisciplinary teams, where through mentorship and close collaboration they can learn the basics in an applied setting. A key theme was the importance of continuing to develop and teach programming at all levels (e.g., elementary through graduate), with a specific focus on better integrating bioinformatics instruction into undergraduate life sciences education.
The advent of "big data" presents a critical challenge in the fields of population and conservation genomics. Interdisciplinary collaboration is key as it becomes more difficult for researchers to be experts in both data production (e.g., field work, biological sampling) and bioinformatics or mathematical modeling. Koop acknowledges that he fills his team with bioinformaticians as well as biologists, but "when you find the rare individual who understands both the population genomics and the bioinformatics, you do everything you can to hold onto them." Furthermore, the "Ten Simple Rules for a Successful Cross Disciplinary Collaboration" by Knapp et al. (2015) is a useful resource for gaining skills for a successful, synergistic collaboration.
In conclusion, the genomic era presents both new data analysis challenges and opportunities to visualize, understand, and apply population genomic data to conservation in novel ways. Here, we emphasize visualizing data to find problematic datasets, the possible effects of filtering on downstream analyses, and ways to improve downstream computational analyses to avoid drawing erroneous conclusions. The experts at ConGen instructed students to understand and use reliable biological models and to develop clear questions and hypotheses rooted in evolutionary and ecological theory.
In summary, ConGen and this article present problems and solutions with the goal of improving the use of genomics in the fields of population genomics, molecular ecology, and conservation biology.

ACKNOWLEDGEMENTS
We thank the additional ConGen instructors and organizers: Michael Studies). PAH received support from NSF grant DEB-1655809.

CONFLICT OF INTEREST
None declared.