A dynamic web resource for robust and reproducible genomics in nonmodel species: marineomics.io

Genomic methods are becoming increasingly valuable and established in ecological research, particularly in nonmodel species. Supporting their progress and adoption requires investment in resources that promote (i) reproducibility of genomic analyses, (ii) accessibility of learning tools and (iii) keeping pace with rapidly developing methods and principles. We introduce marineomics.io, an open‐source, living document to disseminate tutorials, reproducibility tools and best principles for ecological genomic research in marine and nonmodel systems. The website's existing content spans population and functional genomics, including current recommendations for whole‐genome sequencing, RAD‐seq, Pool‐seq and RNA‐seq. With the goal to facilitate the development of new, similar resources, we describe our process for aggregating and synthesizing methodological principles from the ecological genomics community to inform website content. We also detail steps for authorship and submission of new website content, as well as protocols for providing feedback and topic requests from the community. These web resources were constructed with guidance for doing rigorous, reproducible science. Collaboration and contributions to the website are encouraged from scientists of all skill sets and levels of expertise.


| GOALS & MOTIVATION
Genomic approaches have arisen as a cornerstone of research in ecology and evolution, and the marine sciences are no exception (Ekblom & Galindo, 2010; Ellegren, 2014; Fonseca et al., 2016). Population and functional genomic methods, particularly among nonmodel organisms, are driving advancements in our understanding of population dynamics, physiology and the evolution of populations and species (Matz, 2018). As sequencing technologies and bioinformatic analyses progress at an astonishing rate, it is essential to evaluate, update and guide their use in a manner that recognizes the unique contexts and obstacles of the systems in which these tools are applied. Indeed, the speed of method and analytical development at times outpaces unification with the underlying theories of evolution and available learning resources (Allendorf, 2017; Gain et al., 2023; Waples, 2015).
Simultaneously, as technology and analysis tools become more accessible to the larger scientific community, disciplines that traditionally would not have included genomics are now integrating these methods in new and exciting ways (Christiansen et al., 2021; Younger et al., 2017). In short, there is not only a need to keep the field of ecological genomics updated on best methodological principles, but also an opportunity to welcome an emerging cohort of students and scientists who could benefit from community and educational resources.
In 2020, the MarineOmics working group was formed under the NSF-funded Research Coordination Network for Evolution in Changing Seas (RCN-ECS), with the mission to promote reproducible and robust genomics research in nonmodel species and marine science.
Here, we use the term genomics to represent various methodologies (e.g. genomics, transcriptomics and proteomics) that are used to investigate genomic variation, structure or regulation. The main activities of the working group have been the formulation of recommended practices for statistical and experimental soundness, as well as methodological transparency. These recommendations are applicable to a wide range of nonmodel taxa, although example data sets are often chosen from marine systems in order to raise the profile of marine evolution research. In some cases, particular attention is paid towards the unique features and obstacles of genomic research in marine systems; these include the difficulty of sequencing larvae and plankton that are too small to be sequenced individually (Oleksiak & Rajora, 2020), challenges in genome assembly and annotation due to highly repetitive or diverse genomes (Vinson et al., 2005) and impediments to obtaining relevant environmental data for studying adaptation (Costello et al., 2010). Furthermore, training in the ocean and environmental sciences often emphasizes ecology in undergraduate programmes instead of evolutionary thinking, highlighting the need for more targeted evolution and bioinformatics training for marine scientists. Improving the accessibility and standardization of recommendations in marine genomics research has the additional benefit of catalysing the synthesis and discovery of how marine populations will evolve in response to global change (Munday et al., 2013).
The primary method for disseminating MarineOmics recommendations is an online, open-source website, freely available at marineomics.io and referred to as the MarineOmics Website or MarineOmics from here forward. This website hosts a growing and evolving collection of guidelines, tutorials, recommended readings, panel discussions and method evaluations for various analyses used in ecological genomics studies of nonmodel systems (particularly marine species), paying particular attention to topics for which openly available guidelines are scarce. Instead of providing rigid 'best practices', the website promotes 'best principles', defined as a guiding set of values and goals that can be tailored to one's hypotheses and guided by one's data, as opposed to a specific set of instructions. 'Best principles' are designed to encourage data exploration and critical thinking during analysis and evaluation instead of ticking off boxes on a protocol or step-list (Box 1). This web resource is inspired by other community-maintained web resources for evolutionary biology, such as popgen.nescent.org, which specializes in R vignettes (Kamvar et al., 2017), the Galaxy Training Network (https://training.galaxyproject.org/) and genomics modules on The Carpentries (https://carpentries.org/). The MarineOmics site differs in that it is not specific to any coding language and is focussed on best principles in addition to method tutorials. Built with GitHub Pages, the MarineOmics Website is an open-source platform utilizing HTML-rendered Rmarkdown notebooks to streamline community collaboration and contributions with version control. Our process going forward is to improve existing resources and build new ones with a diverse array of participants, as outlined in the section Vision for the Future and the right-hand panel of Figure 2.
Here, we outline the status of the MarineOmics Website at the time of publication. We first provide a description of the current resources on the website, which include a guide to different sequencing approaches, pipelines and tutorials for analysing sequencing data, and a collection of recorded seminars on MarineOmics approaches. Next, we present our synthesis methods and process for creating, reviewing and publishing content on the MarineOmics Website. Finally, we share our vision for the future to further improve the website, seek additional contributions and maintain the website as a living resource.

| DESCRIPTION OF CURRENT RESOURCES
The backbone underlying the content on the website comes from a general set of 'best principles' that should be applied in genomics studies, irrespective of the specific sequencing method used. These principles are motivated by rigour and reproducibility, and are outlined in Box 1. The format of content on the website varies by topic and includes detailed bioinformatics tutorials (Figure 1), perspectives that synthesize the existing literature, methods comparisons and links to external resources that follow best principles. The current topics on the website are listed in Table 1 and were developed based on the expertise of the original contributors, the surveyed interests of the RCN-ECS and topics that contributors saw a particular need for based on existing resources. The website currently organizes the tutorials into two major approaches: Population Genomics and Functional Genomics. The Population Genomics section focusses on high-throughput DNA sequencing approaches designed to identify genomic patterns across populations and/or treatment groups. The Functional Genomics section covers approaches investigating the regulation and expression of genomes, such as RNA-seq.

| Population genomics
The appropriate population genomics approach for sequencing samples depends on the study questions and genomic resources available. On the website, a summary is provided on the pros and cons of different approaches to help readers choose the approach that works best for their study. If readers are interested in high-throughput genome sequencing, the website provides advice on how to choose between Whole Genome Resequencing (high vs. low coverage), Reduced Representation Sequencing (e.g. RAD-seq) and Pooled Sequencing. There are multiple tutorials for high- and low-coverage whole-genome analysis, with detailed bioinformatic pipelines that take raw Illumina sequencing reads as input and generate called genotypes or genotype likelihoods (e.g. in VCF format). For Reduced Representation Sequencing, there is a comprehensive tutorial that can be used for data from multiple different types of restriction digest methods (e.g. Restriction-site-Associated DNA sequencing [RAD-seq], Genotype-by-Sequencing and ddRAD). These RAD-seq tutorials provide detailed considerations during library preparation in the laboratory, bioinformatic processing through assembly and/or mapping to a reference and then filtering of SNPs and individuals for quality. We provide a screenshot of the RAD-seq pipeline as an example of the structure for the tutorial (Figure 1). Finally, the PoolSeq tutorial includes considerations in the experimental design phase to ensure rigorous interpretation of downstream analyses, bioinformatic processing through assembly, mapping to a reference genome and calculating allele frequencies for each pool.

BOX 1 Best principles in genomics research

Rigour
1. Understand the characteristics of your chosen sequencing approach. Take these characteristics into account when designing a study and during data analysis.
Goals of study should be chosen before choosing the best sequencing approach, which will inform the total number of samples and sequencing coverage needed: For example, PoolSeq requires larger sample sizes and deeper coverage given the lack of individual genotyping (Guirao-Rico & González, 2021).
2. Plot your data early and often. Get to know it in both its raw and processed forms.
Deepen the interpretation of results and flag sources of error throughout a workflow by plotting data such as (i) read-quality metrics pre- and post-filtering, (ii) sequence coverage across a reference and across samples, (iii) principal component analysis of replicates pre- and post-filtering and (iv) results and predictions of statistical tests.
3. All models and pipelines introduce some type and magnitude of error. Compare models' nuances to find the best approach given your data.
This issue is particularly acute in nonmodel species. Some quantitative approaches towards evaluating methods include (i) comparing the performance of different methods or parameter choices using simulated data (Lotterhos et al., 2022), (ii) measuring their predictive strengths using model selection statistics (Hooten & Hobbs, 2015; Johnson & Omland, 2004) and (iii) observed-predicted plots from model outputs. A basic understanding of the sensitivity of inference in different analyses will be helpful for determining how robust the results are to nuanced decisions, especially for nonmodel organisms or unique experimental designs.

Reproducibility
4. Wherever your sequencing data go, their associated metadata go with them.
Any and all metadata that can be reported should accompany sequence data in databases such as NCBI or SRA. Data on Dryad or GitHub should crosslink to NCBI/SRA.
5. Take detailed records on all analysis decisions you make, including for preliminary analyses and errors that occurred, so you remember what you did and can reproduce your own work.
Use text-annotated code notebooks for bioinformatic analyses (e.g. Rmarkdown and Jupyter).
6. Provide a reproducible text-annotated code notebook for all final analyses, containing computing environment information (i.e. software versions) so that these methods could be reproduced by someone else.
Provide these notebooks (in Rmarkdown or Jupyter) in a publicly accessible format on services such as GitHub, GitLab, Dryad, Figshare or Zenodo.
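Principle 6 of Box 1 asks that notebooks report computing environment information such as software versions. As a minimal sketch of one way to do this in a Python-based notebook (the helper name `environment_record` and the package names passed to it are hypothetical placeholders, not part of the MarineOmics tutorials), the interpreter, operating system and package versions can be collected with the standard library and printed in the notebook's final cell:

```python
import json
import platform
from importlib import metadata


def environment_record(packages):
    """Collect interpreter, OS and package versions into a plain dict.

    `packages` is a list of distribution names chosen by the analyst
    (hypothetical examples are used below).
    """
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Record the absence rather than failing, so the notebook still renders
            versions[name] = "not installed"
    return {
        "python": platform.python_version(),
        "platform": platform.platform(),
        "packages": versions,
    }


# Placeholder package names; replace with the packages the analysis imports
record = environment_record(["numpy", "pandas"])
print(json.dumps(record, indent=2))
```

In an Rmarkdown notebook, the built-in `sessionInfo()` function serves the same purpose.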
Once sequencing data are in hand and processed into the desired format, the appropriate bioinformatics approach for population genomic inference depends on the type of sequencing data as well as the study questions. For example, with PoolSeq data that give allele frequencies for a pool of samples, a Cochran-Mantel-Haenszel test can be used to identify consistent allele frequency changes across replicates of pooled individuals. For sequencing approaches that give SNP genotypes, the Population Genomics section includes a tutorial for Redundancy Analysis Trait Prediction (Lotterhos, 2023). This tutorial shows how to apply a novel extension of redundancy analysis to predict individual multivariate traits from genotype and environmental data. The approach is useful for understanding adaptation to multivariate environments when the genetic basis of adaptation is not accurately known. For low-coverage whole-genome sequencing, well-annotated scripts for some fundamental population genomic analyses (e.g. PCA, admixture analysis and FST) based on genotype likelihoods are provided.
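The logic of the Cochran-Mantel-Haenszel (CMH) test mentioned above can be illustrated with a short, self-contained sketch. It is not taken from the MarineOmics PoolSeq tutorial: the function name `cmh_test`, the table layout and the read counts are hypothetical and chosen only for illustration. Each replicate contributes one 2x2 table of allele read counts (two pools by two alleles), and the test asks whether the allele frequency difference between pools is consistent across replicates.

```python
import math


def cmh_test(tables):
    """Cochran-Mantel-Haenszel test across K 2x2 tables.

    Each table is ((a, b), (c, d)): rows are the two pools compared within
    one replicate (e.g. control, treatment); columns are read counts of the
    reference and alternate allele.
    Returns (continuity-corrected chi-square, p-value, common odds ratio).
    """
    sum_a = sum_expected = sum_var = 0.0
    or_num = or_den = 0.0
    for (a, b), (c, d) in tables:
        n = a + b + c + d
        if n < 2:
            continue  # a replicate with fewer than two reads is uninformative
        row1, row2 = a + b, c + d
        col1, col2 = a + c, b + d
        sum_a += a
        sum_expected += row1 * col1 / n  # expected count of cell a
        sum_var += row1 * row2 * col1 * col2 / (n * n * (n - 1))
        or_num += a * d / n
        or_den += b * c / n
    if sum_var == 0:
        raise ValueError("no informative replicate")
    chi2 = max(abs(sum_a - sum_expected) - 0.5, 0.0) ** 2 / sum_var
    # Survival function of a chi-square distribution with 1 degree of freedom
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    odds_ratio = or_num / or_den if or_den > 0 else math.inf
    return chi2, p_value, odds_ratio


# Hypothetical read counts for one SNP in three replicate pool pairs
replicates = [
    ((60, 40), (40, 60)),
    ((55, 45), (38, 62)),
    ((62, 38), (45, 55)),
]
chi2, p_value, odds_ratio = cmh_test(replicates)
print(f"chi2={chi2:.2f}, p={p_value:.2g}, common OR={odds_ratio:.2f}")
```

For real analyses, an established implementation such as `mantelhaen.test` in R is preferable. Read counts from pooled samples also carry sampling noise from both pooling and sequencing, which this sketch ignores.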

| Functional genomics
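The functional genomics portion of the website currently covers differential expression (DE) analysis for RNA sequencing (RNA-seq). This page compares the assumptions and outputs of popular DE packages. It provides custom R scripts to supplement packages where they are unable to accommodate tests or experimental designs common in ecology and evolution. The page also introduces DE workflows by providing examples of quality checks on read count data and determining whether they meet packages' statistical assumptions.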

| Panel seminars
To facilitate discussions about practices and pitfalls in applying genomics approaches to nonmodel systems, a series of Question & Answer seminars was hosted in May-June 2021 with experts on topics including RAD-seq, population genomics with WGS and transcriptomics.
For each seminar, the panellists included 3-5 experts from their field who answered questions from the working group and audience in a round-table style. The discussion topics ranged from best principles (e.g. advice for filtering raw and processed data) to the perceived future utility of different sequencing approaches. The recordings can be found on the MarineOmics Website and on YouTube.

| Discussion forum
To facilitate community engagement on existing and proposed website content, a Discussion Forum is hosted on GitHub Discussions.
This forum allows anyone with a GitHub account to post ideas for new content, suggest ways to extend existing content or provide additional feedback and constructive criticism on the website. There is also a discussion topic category to request and provide project-specific advice, similar to www.Biostars.org.

| PROCESS FOR DEVELOPING CONTENT
While conceiving and creating content for the MarineOmics Website, we identified and prioritized topics lacking in resources such as comprehensive tutorials, method comparisons and perspectives.
We used an open science approach that promoted accessibility, reproducibility and collaboration. This process consisted of four steps: (i) seeking recommendations for priority areas from the ecological genomics community, (ii) aggregating practices used by the community related to these priority areas, (iii) synthesizing these practices as 'best principles' that guided tutorials and methods published on the website and (iv) soliciting feedback, edits and contributions (Figure 2).
This approach enabled us to create a living resource that both aids and keeps pace with the rapid rate of progress in ecological genomics.

| Aggregating methods and principles
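MarineOmics contributors sought advice on genomics topics of general interest from the ecological genomics community by writing and releasing a poll via social media (e.g. Twitter) and professional email listservs (e.g. EvolDir). Briefly, this poll asked respondents to rank data types, pipelines and analyses within population and functional genomics according to their importance to the responder's research and/or their perceived need in the field for related content. Contributors to MarineOmics were among the respondents and interpreted the results of the poll in the context of their own expertise. The questions used in this poll are included in Appendix S1. Poll respondents helped direct the focus of the online panel series related to ecological genomics in nonmodel species described above in Description of Current Resources. While each panel was chiefly intended to be a singular resource for dispensing and discussing best practices, the panels also provided an opportunity to aggregate practices and ideas from experts, which informed the best principles described on MarineOmics. During panels, experts were prompted with prepared questions sourced from website contributors and survey feedback in addition to questions from audience members.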

| Synthesis and content creation
The initial content developed for MarineOmics was determined by contributor expertise and the results of the survey. Each of the website's pages incorporates best principles, often with accompanying bioinformatics tutorials, which aim to serve as a framework for readers to make independent, robust choices about their own data or methods. Each best principle is the result of synthesis across recommendations from the literature, website contributors and expert panellists. Syntheses identified practices for which there is community consensus (or a lack thereof), as well as emergent, novel recommendations. Below, we provide two examples of our content creation and synthesis processes that yielded best principles representing consensus or emergent recommendations (Figure 2).
A critical decision when designing a population genomics experiment is the choice of sequencing method, that is whether to use reduced representation or whole-genome sequencing (Matz, 2018).
Our survey respondents expressed a clear desire for more guidance in this area, with 73.1% ranking 'Guidelines for choosing [population genetic] data types…' over 'Guidelines for best practices in [population genetic] analyses…'. Therefore, the expert panel discussions included some questions focussed on the benefits and downsides of different data types. For example, the RAD-seq panellists discussed how RAD-seq provides robust estimates of some population-level parameters but often generates erroneous predictions for genotype-level parameters. These strengths and weaknesses of RAD-seq are explicitly addressed in MarineOmics' 'Choosing a Population Genetics Approach' perspective and 'RADseq' tutorial (Figure 1).
As a second example of our approach for determining best principles, an emergent recommendation was reached after synthesizing guidance for modelling gene expression data. As RNA-seq costs fall, studies frequently employ multiple levels of a continuous environmental variable as opposed to a contrast between two groups.
Continuous predictors enable the fitting of non-linear effects that can be biologically important (Rivera et al., 2021). Popular differential expression packages can accommodate non-linear effects, but RNA-seq panellists and MarineOmics contributors agreed that most make assumptions about expression data that pose challenges to interpreting their reported effects. Indeed, we found that packages often incorrectly inferred significant non-linear effects for transcripts whose expression changed linearly across an environment (a type of false positive). To overcome this issue, we developed a custom R script to compare the probability of linear versus non-linear effects with likelihood ratio tests during differential expression analysis.
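The idea of comparing linear and non-linear effects with a likelihood ratio test can be sketched in a few lines. This is not the custom R script described above. It is a simplified Python illustration that assumes normally distributed errors on log-transformed expression for a single transcript, whereas differential expression packages typically fit count-based models. The function names and the expression values are hypothetical. A linear model and a quadratic model are fitted by least squares, and twice the difference in their maximized log-likelihoods, n ln(RSS_linear / RSS_quadratic), is compared with a chi-square distribution with one degree of freedom.

```python
import math


def fit_rss(x, y, degree):
    """Least-squares polynomial fit; returns the residual sum of squares."""
    k = degree + 1
    # Augmented normal equations: (X^T X | X^T y)
    m = [[sum(xi ** (i + j) for xi in x) for j in range(k)]
         + [sum(yi * xi ** i for xi, yi in zip(x, y))]
         for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(k):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    beta = [m[i][k] / m[i][i] for i in range(k)]
    return sum((yi - sum(b * xi ** p for p, b in enumerate(beta))) ** 2
               for xi, yi in zip(x, y))


def lrt_linear_vs_quadratic(x, y):
    """Likelihood ratio test of a linear against a quadratic model.

    Assumes Gaussian errors, so the statistic is n * ln(RSS_lin / RSS_quad),
    referred to a chi-square distribution with 1 degree of freedom.
    """
    n = len(y)
    stat = n * math.log(fit_rss(x, y, 1) / fit_rss(x, y, 2))
    stat = max(stat, 0.0)  # guard against tiny negative values from rounding
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# Hypothetical data: ten levels of an environmental variable and fixed "noise"
env = list(range(10))
noise = [0.3, -0.2, 0.1, -0.4, 0.2, -0.1, 0.4, -0.3, 0.0, 0.1]
linear_expr = [2 + 0.5 * x + e for x, e in zip(env, noise)]
curved_expr = [2 + 0.5 * x - 0.1 * x ** 2 + e for x, e in zip(env, noise)]

stat_lin, p_lin = lrt_linear_vs_quadratic(env, linear_expr)
stat_cur, p_cur = lrt_linear_vs_quadratic(env, curved_expr)
print(f"linear transcript: LRT={stat_lin:.2f}, p={p_lin:.3f}")
print(f"curved transcript: LRT={stat_cur:.2f}, p={p_cur:.2g}")
```

Under these assumptions, a transcript whose expression changes linearly should not favour the quadratic model, which is the false-positive check motivating the comparison. With few samples the chi-square approximation is rough, so the p-values here are indicative only.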
The technologies and associated analyses available to evolutionary genomic research will continue to change, but what will not change is the importance of community collaboration to build tools that guide the best principles for doing research. The MarineOmics framework is built accordingly, positioning it to remain a contemporary repository of resources for genomic studies in marine and nonmodel systems into the future. To this end, pages will be continuously updated with new versions or frameworks for genomic methods. If a page has become dated and unrepresentative of contemporary approaches, it will be removed from the website's main menu and preserved in the site's archive.

| VISION FOR FUTURE
The motivation, current materials and process for building the website were developed with the broader conservation, marine and nonmodel genomics community in mind. The website will act as an evolving repository for reproducible genomic analysis, with content based on community contributions and incorporation of expertise within the field. For those interested in editing or authoring content on MarineOmics, we describe our resources supporting open-source contributions and our perspectives on how we anticipate the website evolving.

| Seeking contributions
As genomics continues to improve our understanding of nonmodel marine systems, the field will inevitably move to address new and complex questions and applications (van Oppen & Coleman, 2022).
As such, the site will benefit from contributors of various backgrounds, expertise and experience. To date, the tutorials, guidelines and discussions on the MarineOmics Website have been authored and led by graduate students and postdoctoral researchers with contributions from scientists at all levels, including undergraduates and faculty members. Building on this foundation, we are actively seeking contributors to aid in the development of new material, provide intellectual contributions through the Discussion Forum, review newly created content and participate in webinars. The hope is that through continued contributions from a diverse range of perspectives and backgrounds, these tutorials will be more inclusive, informative and useful for everyone in the marine and nonmodel genomics community. After initial feedback, active contributors will be notified of the proposed content and invited to review future drafts. Prospective contributors will then meet with active contributors as needed to develop and edit content. Final drafts will be reviewed by the website manager, working group co-leaders and at least one other contributor. In the event that reviewers experience disagreement during any step of content creation that is not resolved with discussion, a majority vote will be opened to all contributors to determine action regarding review decisions. Once again, the working group will prioritize revising contributions over their rejection. In addition to new web pages, we welcome ideas and requests for new resources such as panel seminars, workshops, tutorials and information pages (Figure 2).

| Future developments
New technology and associated analyses are emerging in the field rapidly, making it increasingly important that we aggregate and synthesize best principles for utilizing various methods (Kamvar et al., 2017). Future developments for the website will leverage community contributions to track areas of investigation in marine genomics that are increasingly significant for understanding the diversity and evolution of marine organisms. Several new resources are currently in progress, including information pages and tutorials for: filtering different data types and running basic population genomic analyses, functional enrichment analysis, bisulfite sequence data to study DNA methylation patterns and Quantitative Trait Locus (QTL) analyses. Going forward, we anticipate the addition of new pages that capture developments in marine and nonmodel genomics such as genome assembly, metabarcoding, epigenetics, quantitative comparisons of different data types, genome annotation and automated workflows (i.e. Snakemake and Nextflow). Due to the rapid progression of bioinformatics software, Singularity images and conda YAML files will be integrated into code-based tutorials to facilitate reproducibility.

| CONCLUSION
A field as dynamic as genomics requires equally dynamic resources for its navigation. The focus of the MarineOmics working group is to bring people with a variety of backgrounds and expertise together to make genomic tools more broadly accessible, particularly in the framework of nonmodel systems in marine science. The group collaborates to identify gaps in available resources and fill them via expert panel seminars, informational web pages and tutorials. These resources are constructed with guidance for doing rigorous, reproducible science. Collaboration is invited and encouraged from scientists of all skill sets and levels of expertise.

KEYWORDS: git, marine science, online resources, open access

TABLE 1 Summary of content available on MarineOmics Website at the time of publication.
Topics | Content
Intro | Guidelines, background and goals/aims of the MarineOmics working group
Panel Seminars | Recorded Q & A panel seminars with experts on topics ranging from RAD-seq, population genomics with WGS, & transcriptomics
Population Genomics | Summary for choosing a population genomics approach with tutorials for WGS (low and high coverage), reduced representation seq, PoolSeq and RDA trait predictions
Functional Genomics | Summary and tutorials for fitting multifactorial models of differential expression
Discussion Forum | Platform to suggest ideas for future topics, provide feedback on current resources, request project advice from the community and provide links to additional references/tutorials

FIGURE 1 Screenshot of one web page concerning a RAD-seq pipeline serving as an example of tutorial structure.

FIGURE 2 Processes for content creation and open-source collaboration. Steps during the aggregation, synthesis and publishing of best principles disseminated on the MarineOmics site are depicted on the left of the figure. Workflows for open-source contributions and collaboration are depicted on the right.