Not just for programmers: How GitHub can accelerate collaborative and reproducible research in ecology and evolution

Researchers in ecology and evolutionary biology are increasingly dependent on computational code to conduct research. Hence, the use of efficient methods to share, reproduce, and collaborate on code as well as document research is fundamental. GitHub is an online, cloud‐based service that can help researchers track, organize, discuss, share, and collaborate on software and other materials related to research production, including data, code for analyses, and protocols. Despite these benefits, the use of GitHub in ecology and evolution is not widespread. To help researchers in ecology and evolution adopt useful features from GitHub to improve their research workflows, we review 12 practical ways to use the platform. We outline features ranging from low to high technical difficulty, including storing code, managing projects, coding collaboratively, conducting peer review, writing a manuscript, and using automated and continuous integration to streamline analyses. Given that members of a research team may have different technical skills and responsibilities, we describe how the optimal use of GitHub features may vary among members of a research collaboration. As more ecologists and evolutionary biologists establish their workflows using GitHub, the field can continue to push the boundaries of collaborative, transparent, and open research.

efficiently develop and collaborate on projects (Git-A Short History of Git, n.d.). Since its launch in 2005, Git has become the leading version control system in software development and in other disciplines that require collaboration and community contributions, such as in scientific research (Spinellis, 2012). To understand how GitHub keeps track of changes to files and folders, it is recommended to have knowledge of basic concepts of Git (such as commit, push, pull, and checkout; see Box 1). However, the GitHub web-based platform and its integrated development environments (such as GitHub Desktop) allow users to perform most repository and data management operations without using the command console, making these functionalities available even to users who are less familiar with software development.
Version control involves tracking the state of the files and directories stored in a "repository" (see Box 1). A typical workflow using Git and GitHub is to: (i) create a remote repository that is synchronized with files and directories stored locally; (ii) modify these files, either locally or remotely; (iii) frequently "commit" (or record) changes to these files (see Box 1) along with a description of the modifications; and (iv) synchronize commits with GitHub (see "push" and "pull" in Box 1) so that the repository on the web and the local repositories are up to date. The repository, which contains the files, their modifications, and the descriptions of their changes, can then be accessed by chosen collaborators or, whenever applicable, the public, who can easily download and synchronize them to their own computers (see "clone" in Box 1). Commits act like snapshots, allowing users to view or even revert the state of the project to any previous commit. If the modified files are plain text, Git can store successive versions compactly (in effect, little more than the differences between them), allowing frequent commits without causing the size of the project to grow excessively. This provides a safe and less cluttered alternative to frequently making full copies of documents at different points in their evolution (e.g. analysis.R, analysis_v2.R, analysis_FINAL.R). While we do not focus on technical details about the use of Git and GitHub in this study, we recommend users explore available resources to become more familiar with version control features (see Blischak et al., [...]).
The use of GitHub has become increasingly popular in recent years due to the expansive GitHub user community and numerous GitHub resources (Bryan, 2018; Happy Git with R, n.d.; Our Coding Club, n.d.; Perez-Riverol et al., 2016). Nevertheless, although multiple articles have encouraged researchers in ecology and evolution to adopt GitHub as part of their research process (Lowndes et al., 2017; Perkel, 2016), its use is still not widespread. Among many other factors, this may be because first-time users without formal training in information technology can face a steep learning curve, as GitHub and its features have been centred on software development (Leibzon, 2016). Furthermore, there are few domain-specific resources providing tractable examples and practical guidance for researchers in ecology and evolutionary biology (EEB) on how to use GitHub (but see Kim et al., 2022; Openscapes, n.d.; Our Coding Club, n.d.). Widespread adoption of GitHub for collaborative research tasks can ultimately enable EEB researchers to save time on creating novel processes for collaboration and focus more on their research (Briney et al., 2020). More importantly, expanding the availability of data and code management standards, of which GitHub is an increasingly important component, makes research more reproducible and collaborative (Alston & Rick, 2021; Gomes et al., 2022).

BOX 1 Glossary
Repository: A repository (commonly shortened to "repo") is a collection of files (e.g. a directory) tracked by Git. Repositories are managed by an owner and can be made either "public", to be visible to all GitHub users, or "private", to be visible only to users selected by the owner. Repositories can be either "local" and saved on an individual's computer or "remote" and stored on the cloud via GitHub's web platform.
Fork: A fork is a copy of a repository hosted on GitHub. If a repository is public, then anyone can make a fork. Even if they do not have access to push to the original repository, they can make a fork and edit it independently. Forks are linked to the original GitHub repository and "upstream" changes (i.e. those in the original repository) can be merged to keep the fork up-to-date with the original project. Changes made in the fork can be integrated into the original project via pull requests.
Clone: Cloning a repository is a way of making a local copy (i.e. on your computer) of a GitHub repository. If you have access to push to a repository, this can be a first step to contributing to a project.
Branch: Git workflow timelines or repositories are analogous to trees, with a main working project and diverging branches that are pointers to changes during the development process. A Git branch is an alternative line of development for a project (repository).
Commit: Commits are snapshots of the development of a project. In Git, versions of files and directories are uniquely identified as "commits", allowing one to identify and track modifications line-by-line. Commits can include changes in multiple files and must include a brief commit message describing the changes made. A typical workflow is to make some related changes in files, add a commit message (e.g. "Generate and include results figure"), and after several commits push those commits to the remote (i.e., cloud-based) GitHub repository.
Push and pull: When commits are made in a project locally, they must be synced with the remote GitHub repository by pushing them. Changes on a GitHub repository can then be pulled to keep your local version of the project up to date with the remote.
Pull request: A pull request is a request for changes made on an individual's branch in the repository or in a user's fork to be merged to the repository. Pull requests contain a description of the changes alongside all code required for testing and review by other users prior to being merged into the repository.
Merge: Combining commits from two different branches together into one branch.
Release: At any point, a release can be made on GitHub to mark a significant milestone in the progression of a repository. While this GitHub feature is designed with releases of new versions of code in mind (e.g. v1.0.0), it can also be used to create a snapshot of a repository at significant stages like pre-print, submission, revision, and acceptance of an associated manuscript.
Community: A forum where GitHub users can ask for advice, offer solutions to questions, and share ideas (https://github.community/).
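
To make the "Commit" entry more concrete, the following minimal Python sketch shows the kind of line-by-line comparison between two versions of a file that a commit history lets one inspect. It uses only Python's standard `difflib` module and is not how Git itself is implemented; the file name `analysis.R` and both versions of its contents are invented for illustration.

```python
import difflib


def line_diff(old_text, new_text, name="analysis.R"):
    """Return a unified diff (list of lines) between two versions of a text file."""
    return list(
        difflib.unified_diff(
            old_text.splitlines(),
            new_text.splitlines(),
            fromfile=f"{name} (previous commit)",
            tofile=f"{name} (new commit)",
            lineterm="",
        )
    )


# Two hypothetical versions of the same analysis script
version_1 = "data <- read.csv('plots.csv')\nmodel <- lm(height ~ age, data)\n"
version_2 = "data <- read.csv('plots.csv')\nmodel <- lm(height ~ age + site, data)\n"

changes = line_diff(version_1, version_2)
for line in changes:
    print(line)
```

Lines prefixed with `-` were removed and lines prefixed with `+` were added, which is the same convention GitHub uses when displaying the changes introduced by a commit or pull request.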

| Storing and sharing research compendia
An EEB research compendium includes all computational materials related to research production, including data, code for analyses and protocols. Safely storing these files is essential to protect against accidental modifications or deletions. Many researchers begin using GitHub to back up their research compendium (Marwick et al., 2018) to a centralized, readily available remote server (see Box 1). This practice has the advantages of facilitating collaboration, integrating data and code archiving, allowing file versions to be accessed and restored, and further contributing to open science practices (Borghi & Van Gulick, 2022).
Changes made to files in version-controlled repositories are accompanied by authored descriptions of modifications (Box 1). Later, the entire history of commits and their commit messages is viewable and can be audited similarly to physical laboratory notebooks (Ram, 2013).

| Project continuity
Projects in ecology and evolution often involve research professionals holding limited-term positions, such as graduate students, research assistants and post-doctoral fellows (Fehr et al., 2021).
Without clear plans on project continuity, research code and data management upkeep tends to fall off as researchers move on to new projects or other institutions. Additionally, code and data can be difficult to access and recover when kept only on personal devices (Vines et al., 2014).
GitHub can facilitate project continuity in research by making code and data handover between users easier (Fehr et al., 2021;Ram, 2013). Through version control, the history of code and data from projects in ecology and evolution becomes accessible to future laboratory members and collaborators (Lowndes et al., 2017).
Repositories and organizations can have designated data and code owners (or more appropriately, "data stewards"; see About code owners, n.d.; Hampton et al., 2015), who can also change through time, allowing for the transition of code between research cohorts (see also "Organizing and managing teams"). Other project collaborators can contribute to repository design and development, and their active involvement can aid both the authors' ability to act as guarantors and the clarity and reproducibility of the project for future users. In Figure 1, we highlight several elements of recommended repository structure, and the various ways that contributors may interact with them.
Software compatibility during the analysis and reanalysis of project data can be ensured by storing information about software dependencies and their versions within the same project repository.
With more advanced practices, one can remotely instal and execute scripts using specific versions of software within GitHub's project automation tools, GitHub Actions (see below).
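
As a minimal sketch of what storing information about software dependencies and their versions can look like, the Python example below records the installed versions of a list of packages in a plain-text form that could be committed alongside the analysis code. The package names are placeholders, and the helper functions `installed_versions` and `format_requirements` are hypothetical names introduced here for illustration; in R, `sessionInfo()` serves a similar purpose.

```python
from importlib import metadata


def installed_versions(package_names):
    """Map each installed package name to its version; skip packages that are not installed."""
    versions = {}
    for name in package_names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            continue
    return versions


def format_requirements(versions):
    """Format a {name: version} mapping as sorted 'name==version' lines."""
    return "".join(f"{name}=={version}\n" for name, version in sorted(versions.items()))


# Hypothetical list of packages used by an analysis
used = ["numpy", "pandas"]
print(format_requirements(installed_versions(used)), end="")
```

Writing this output to a file in the repository and committing it with the analysis ties each version of the code to the software versions it was run with.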

| Project management
Modern research in ecology and evolution is highly collaborative, bringing together multidisciplinary teams from various institutions (Goring et al., 2014). On GitHub, collaborators can share feedback, brainstorm ideas, and troubleshoot problems (Figure 1). Project management can happen via three GitHub repository features: "Issues", "Discussions" and "Projects" (Box 1). GitHub Issues allow for discrete tasks and sub-tasks to be identified, assigned to team members, and categorized with custom labels.

| Educational materials
Finally, instructors can host and assign student work to be submitted collaboratively or individually as code and text files, and even build autograding tests using the GitHub Classroom tool (https://classroom.github.com). Although time-consuming, adopting these features in classrooms can integrate the learning of version control and GitHub practices with the learning of course contents, and thus boost students' feelings of self-efficacy and confidence (Trujillo & Tanner, 2014).

| Hosting a website
Personal or laboratory websites can improve the sharing of research findings, build online presence, and increase coordination of research efforts (Smaglik, 2007).

| Archiving citable code and data
Governments, funding agencies, and publishers exercise rigorous open-access data policies and mandates (Nugroho et al., 2015; Tenopir et al., 2020). However, code and data sharing may be met by individual reluctance, delayed by temporary embargoes, or partially prevented for privacy and confidentiality reasons (Figueiredo, 2017; Tenopir et al., 2015; Wicherts et al., 2011). Still, depositing data and ensuring their availability can amplify the outreach of published studies (Pronk et al., 2015), increase citation rates (Piwowar et al., 2007), and, among many other reasons, enable the reproducibility and robustness of scientific advances (Baker, 2016; Mislan et al., 2016; "On Data Availability, Reproducibility and Reuse," Nature Cell Biology, 2017). While public repositories on GitHub make it easy to store and share data files, they are not considered long-term repositories for research materials. This is because GitHub, a for-profit [...] GitHub repositories with a DOI helps research become findable and properly cited, and can ensure long-term stability (Hampton et al., 2015). This strategy has been increasingly adopted in numerous studies in ecology and evolution (e.g. the Zenodo repositories [...])

| Collaborative and asynchronous code editing
GitHub can serve as a platform for everyone working with research (e.g. supervisors or advisors, graduate students, postdoctoral fellows, and collaborators) to share in-progress work, and flag specific challenges or questions for each other (Table 2). Periodic code, data and text reviews are useful for identifying errors early in the research process (Song et al., 2020), and informing further training and mentorship to fill gaps in skills. This is facilitated by a group of core [...]

TABLE 2 A non-exhaustive collection of ideas for how various GitHub features could be utilized for a research project. Here we have categorized contributors/collaborators into five roles. A Project Manager owns the GitHub repository for a project, and leads the academic project (e.g. lead author of a manuscript). A co-author contributes to writing and other aspects of research, but may have limited or no experience with programming, Git, and/or GitHub. A code contributor writes or edits analysis code for the project. A code reviewer could be a project collaborator or a peer reviewer who reviews project code. They are familiar with coding, but not necessarily with Git or GitHub (but they are willing to learn). Finally, community members could be other researchers or non-researchers interested in reproducing results, re-using code or data, or communicating with researchers involved in the project. These roles are not mutually exclusive; a co-author could also be a code contributor and code reviewer, for example. For definitions of the GitHub features, see Box 1.

By enabling more comprehensive remote collaboration, GitHub encourages the exchange of ideas among researchers at different institutions and in different countries, which can serve to improve the quality of the research itself by providing open access to data and code.

| Writing a manuscript
Beyond supporting collaborative code development, GitHub can be used for writing manuscripts. Writing a manuscript and storing its associated data and code in GitHub increases scientific reproducibility because text, code, and data can be found in one place. Although it may involve more initial time investment for setup, GitHub has many features that support a powerful collaborative workflow when writing manuscripts (Ram, 2013). Text documents stored and versioned in  (Table 1).
We wrote this manuscript using Manubot, a modifiable workflow implemented in GitHub to automatically render manuscripts and automate bibliographical tasks (Himmelstein et al., 2019). Manubot uses GitHub's automation workflow, GitHub Actions, to combine and convert individual Markdown files into a single LaTeX document, which can then be converted to a Word or PDF document, and displayed as a webpage. Citations and bibliographic references are automatically managed with citable persistent identifiers (e.g. DOI, PubMed ID, ISBN, URL). The resulting manuscript can be rendered with document templates and citation style language formatting to meet journal formatting requirements. Every change made to the manuscript triggers its rendering, so that updates are readily displayed and made publicly available. Additional GitHub Actions can be integrated with Manubot, such as ones creating figures or generating tables (e.g. https://github.com/SORTEE-Github-Hackathon/manuscript/tree/main/.github/workflows).
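
The step of combining individual Markdown files into a single document can be sketched in a few lines of Python. This is an illustration of the general idea only, not Manubot's implementation: the function name `combine_sections`, the numeric file-name prefixes used to order sections, and the example section contents are all assumptions made for this sketch.

```python
import tempfile
from pathlib import Path


def combine_sections(content_dir):
    """Concatenate all .md files in content_dir, ordered by file name, into one string."""
    parts = []
    for path in sorted(Path(content_dir).glob("*.md")):
        parts.append(path.read_text(encoding="utf-8").strip())
    return "\n\n".join(parts) + "\n"


# Demonstration with two hypothetical section files in a temporary directory
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "02.methods.md").write_text("## Methods\n\nWe sampled plots.\n", encoding="utf-8")
    Path(tmp, "01.abstract.md").write_text("## Abstract\n\nA short summary.\n", encoding="utf-8")
    manuscript = combine_sections(tmp)

print(manuscript)
```

Keeping each section in its own file means that co-authors editing different sections rarely touch the same lines, which reduces conflicts when their changes are merged.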

| Peer review
Peer review is the standard process for assessing whether research done in ecology and evolution should be published in a scientific journal. GitHub provides an open and transparent platform that can be used for either directly providing feedback on research products or addressing changes recommended by reviewers. GitHub Issues can be used to organize and discuss reviewer suggestions and to assign them to co-authors (e.g. https://github.com/SORTEE-Github-Hackathon/manuscript/issues?q=label%3A%22Reviewer+Comment%22). When reviewer comments are posted as separate issues, authors can comment on the issues to discuss possible changes and assign co-authors who will address the issue. Co-authors can then integrate their edits and responses to reviewers using pull requests, which can be directly linked to the issues they address.
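
One way a pull request is linked to the issue it addresses is by writing a closing keyword followed by the issue number (e.g. "Fixes #12") in the pull request description. The Python sketch below extracts such references from a description so that a team could, for instance, check which reviewer comments a set of edits claims to address. The regular expression is a simplified approximation of the keywords GitHub recognises, and the function name `linked_issues`, the example description, and the issue numbers are hypothetical.

```python
import re

# Simplified pattern for closing keywords (close, fix, resolve and their variants)
CLOSING_KEYWORDS = r"(?:close[sd]?|fix(?:e[sd])?|resolve[sd]?)"
PATTERN = re.compile(rf"\b{CLOSING_KEYWORDS}:?\s+#(\d+)", re.IGNORECASE)


def linked_issues(pr_description):
    """Return the sorted, de-duplicated issue numbers referenced with a closing keyword."""
    return sorted({int(n) for n in PATTERN.findall(pr_description)})


# Hypothetical pull request description responding to reviewer comments
description = "Rewrites the Methods section. Fixes #12 and resolves #15. See also #20."
print(linked_issues(description))
```

A plain mention such as "See also #20" cross-references the issue without marking it as addressed, so it is deliberately not returned by this sketch.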
GitHub can also assist reviewers during the peer review process.
If the code associated with a manuscript is made available at the time of submission (e.g. as a link to a GitHub repository within the Data Availability Statement), peer reviewers may be able to offer more comprehensive suggestions on the code and written materials, potentially recognizing errors before publication. Certain journals or software development communities require submitted work or research code to be hosted on GitHub and their review processes make use of GitHub Issues (e.g. rOpenSci https://ropensci.org/software-review/, Journal of Open Source Software https://joss.readthedocs.io/en/latest/submitting.html).

| Open science discussion
Scientific publications often omit part of their intellectual and computational workflows, including the treatment of raw data and analytical steps (e.g. model assumption testing). Publishing data and reproducible workflows along with manuscripts can provide readers with all details about analytical steps and enable reproducing research experiments and results (Culina et al., 2020). In addition to storing data and code, GitHub repositories can provide a timestamped (version controlled) preregistration of research plans and hypotheses.
Conventional research practices typically separate tasks among collaborators (i.e. data entry, analysis, writing). It is common that co-authors discuss, but do not actively verify, edit or execute research tasks that are not their main responsibility. GitHub can serve as a tool for open and tractable research development. Collaborators can directly interact with code and data, inspect for errors and potentially identify scientific misconduct prior to manuscript submission (e.g. Kozlov, 2022; Viglione, 2020; https://ecologyforthemasses.com/2020/02/04/pruittdata-and-the-ethics-of-data-in-science).
Collaborators and readers are better positioned to discover erroneous or questionable findings if they have complete and transparent access to projects. This transparency can be extended beyond co-authors to the entire scientific community and to the public. Supplying code for novel or currently used methods reduces barriers to knowledge, improving the ability of others to build on existing work.
This practice results in greater proliferation and accessibility for a broader audience. Projects can make use of GitHub Discussions (https://docs.github.com/en/discussions) to communicate among repository members (collaborators) and to engage with other scientists and the general public. Moreover, researchers can also use the GitHub Community (https://github.community/) forum to share expertise or request help from others on their analyses and ideas (Table 2).
The desire or need for privacy during the developmental stages of a manuscript or of a larger research project is common in EEB, and this is often perceived as a major barrier to doing science openly.
Because GitHub repositories can be made private or public at any time, there is no need to choose privacy over open science or vice versa. Repositories can be kept private until their contents are ready to be shared publicly, as might occur when a research article is published or when an embargo is lifted.

| Automation
Automation has strong potential to expand the scale and pace of research in ecology and evolution (Keitt & Abelson, 2021). Automation frameworks can streamline many stages of the scientific process, including data collection and data validation (e.g. Micheletti et al., 2021; Yenni et al., 2019), data analysis (e.g. Beaulieu-Jones & Greene, 2017), unit testing of research code (e.g. Sarma et al., 2016), archiving and deployment of data, code and reports (e.g. this manuscript, White et al., 2018), and the interpretation, integration and usage of data and software across different sources (see Pasquier et al., 2017). In this context, small modifications to code and data can be frequently committed and automatically tested, as in continuous integration and continuous deployment practices (Meyer, 2014). This allows for early detection and correction of errors, potentially improving confidence in scientific development by minimizing software errors (see Soergel, 2015). In addition to increasing scientific rigour and confidence in ecological software (Scheller et al., 2010), automation can help researchers share ecological data more rapidly and ensure that the data are of high quality (Dietze et al., 2018). Integrating automation workflows has been highly encouraged in areas of EEB, including pre-
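
As a minimal sketch of the kind of data validation that could be run automatically each time new data are committed, the Python function below checks field records for missing species names and implausible body masses. The record schema (`species`, `mass_g`), the plausible mass range, and the example records are all assumptions made for illustration, not taken from any of the cited studies.

```python
def validate_records(records, min_mass=0.0, max_mass=500.0):
    """Return a list of human-readable problems found in field records.

    Each record is a dict with 'species' and 'mass_g' keys (hypothetical schema).
    """
    problems = []
    for i, rec in enumerate(records):
        species = rec.get("species")
        if not species or not str(species).strip():
            problems.append(f"row {i}: missing species")
        mass = rec.get("mass_g")
        if not isinstance(mass, (int, float)):
            problems.append(f"row {i}: mass_g is not a number")
        elif not (min_mass < mass <= max_mass):
            problems.append(f"row {i}: mass_g {mass} outside ({min_mass}, {max_mass}]")
    return problems


# Hypothetical records: the second lacks a species, the third has a negative mass
records = [
    {"species": "Dipodomys merriami", "mass_g": 42.0},
    {"species": "", "mass_g": 40.0},
    {"species": "Chaetodipus baileyi", "mass_g": -3.0},
]
for problem in validate_records(records):
    print(problem)
```

In a continuous integration setting, a check like this would be run by the automation tool on every commit, and a non-empty list of problems would cause the run to fail so that errors are caught before they propagate into analyses.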

| Organizing and managing teams
GitHub Organizations are shared virtual spaces that allow teams to work in different repositories, while remaining tied together under a larger group, such as a laboratory, department, or project involving several teams. Organizations allow larger projects with many steps or moving parts to be constrained to one virtual space, where outputs and sub-projects can be easily accessed and located without relying on individuals. Because the repositories are grouped, members can reference and contribute to each other's work without necessarily being part of the same repository, broadening the accessibility and longevity of code and writing contributions.
Contributors can be assembled into teams within an organization, which allows administrators to assign roles, tasks, and repository modification permissions to organization members. Whereas access to repositories is usually assigned to individual contributors, organizations facilitate the management of access permissions by allowing teams to be granted access to specific repositories. This ensures repositories with sensitive information remain as restricted as needed, while others stay open and accessible to selected member groups. The organization structure also allows for issue tracking and discussions related to research content and progress.
As an example, GitHub Organizations are particularly well-suited to host documents and projects within a laboratory, such as research compendia, codes of conduct, protocols, training documents and other relevant documents that evolve collaboratively over time.
In this way, teams have full ownership of repositories within an organization, while ensuring that these materials stay accessible to the laboratory after people have moved on or when locally stored data are lost. This application extends to research centres, which may include several distinct projects that remain linked to institutions (e.g. the German Centre for Integrative Biodiversity Research [...])

We expect that situating the main uses of GitHub in EEB alongside examples in this paper will be useful to the EEB community. The 12 use cases we described here can leverage GitHub to enable more transparent and collaborative research in ecology and evolution (Figure 2).
2. Consider taking free courses, such as those from Software Carpentry (Munk et al., 2019), and sharing them with your lab members or colleagues.
3. Take advantage of GitHub as an asynchronous working tool for team-based projects. See the repository for this paper (https://github.com/SORTEE-Github-Hackathon/manuscript) as an example of a collaboratively authored manuscript that used the GitHub Discussions, Issues, Pages, and Actions features.
4. Use the interactive courses from the GitHub Skills page (https://skills.github.com/), which allow you to learn GitHub basics through short projects and tasks with step-by-step guides.
5. Learn Markdown and use cheatsheets (e.g. http://markdownguide.org/basic-syntax) so you can write clear metadata README files for your repositories.
6. Consult online resources. The Jenny Bryan Universe of GitHub material, for example, provides a thorough and accessible introduction to a multitude of research-related uses for GitHub and includes a book (Hester & the STAT 545 TAs, n.d.), statistics course (Bryan & TAs, n.d.) and academic article (Bryan, 2018).
7. Do not be afraid of trial and error, which is one of the best ways to learn GitHub. Learning from your own mistakes can be a valuable way to master your GitHub abilities, and if you do make mistakes, version control allows you to revert any steps you wish.
8. If you are an educator, include lectures on reproducibility and tools for creating reproducible workflows in the curricula. Some graduate programs include coursework on R Markdown and GitHub. Getting students started with these tools earlier will prevent the resistance that comes from working with a less reproducible workflow for a longer period of time (see example https://github.com/rmcelreath/stat_rethinking_2022).

| Other platforms for collaboration
Despite GitHub's strong collaborative potential, we describe two use cases where it falls short of supporting highly collaborative work.

| Why aren't more EEB researchers using GitHub?
Although GitHub has been available as a platform for more than a decade, its uptake among EEB researchers, especially as a tool for collaboration, has been slow. Here, we discuss five potential barriers to GitHub use in EEB: First, there may be hesitation to independently adopt and learn a new tool. Institutional encouragement and instructional resources focused on researchers in ecology and evolution may be limited. EEB researchers may take the view that GitHub is a platform that only needs to be used by individuals writing code, and may silo those aspects of projects to a single individual. These assumptions may obscure the utility of GitHub for tasks other than traditional data analysis and code development. However, we emphasize that there are opportunities for collaboration using GitHub by researchers of all skill levels or time constraints (Table 2). For example, project stakeholders can provide a list of use-cases or highlight important conceptual components of a project using GitHub Issues or Discussions features.

FIGURE 2 A summary of ways GitHub can be used showing technical difficulty and degree of collaboration for each. Activities higher on the vertical axis require usage knowledge of more GitHub features than activities lower on the axis. On the horizontal axis, each activity spans a region representing who is potentially involved with or benefits from each activity. For example, storing data and code mainly benefits individual researchers or members of a laboratory, while making data and code citable and reproducible benefits other labs and the larger community as well. Independently of one's knowledge of GitHub features, there are ways to use GitHub that allow tapping into one of the strongest benefits of the platform: facilitating and enhancing collaboration. For information on the methods and the data used to create this figure, see Appendix S1.1, Appendix S1.2, and Tables S1.1 and S1.2.
A third barrier to the use of GitHub may come from general reluctance to share data and code publicly, or technical and logistical issues (Gomes et al., 2022). GitHub is, by default, a public and open platform, which may add pressure to students and scientists learning to use it. Moreover, additional tools may be needed to fully integrate project files and GitHub repositories (e.g. Connect GitHub to a Project-OSF Support, n.d.). Other scientists may simply lack the time or incentives to document and version control their code if the code is unlikely to be reused beyond their analysis. However, we (and others, e.g. Gomes et al., 2022) argue that for open science and collaboration to be successful, code owners should document and version control their code, despite uncertainty about future use.
A fourth barrier to EEB researchers is the lack of language-specific resources for non-English-speaking researchers working in ecology and evolution. Language is a well-known obstacle to international collaborative research progress and to widespread scientific knowledge (see Khelifa et al., 2022). Without language-inclusive content, non-English-speaking EEB researchers can miss opportunities to fully integrate version control, reproducibility, and other benefits of GitHub.
Fifth and lastly, when projects require a high degree of collaboration, teams may need to pay for certain GitHub features, such as branch protections, multiple reviewers of pull requests and time in its automation tools. Fortunately, GitHub offers education packs (https://education.github.com/) to students and academics, which extend some paid features to the free plan. However, the acquisition of GitHub by Microsoft has raised concerns over the future of free plans, causing several biodiversity data managers to shift to alternative Git hosting services, such as Bitbucket and GitLab.

| CONCLUSION
We provide 12 practical ways that ecologists and evolutionary biologists can use GitHub to improve their research workflow and make it more open, reproducible and transparent. We provide definitions (Box 1) and types of users (Figure 1) to help researchers identify and prioritize the skills and tools to learn and apply. We highlight tools ranging from those with high collaborative potential (e.g. open science discussion, collaborative code editing) to those that are more individually focused (e.g. storing code and data, building a website). We argue that the tools readily available in GitHub have the potential to make ecology and evolution more open, reproducible and transparent. With this comprehensive review of how EEB researchers can use GitHub, we encourage researchers at any career stage to adopt GitHub as a platform for sharing and collaboration.

AUTHOR CONTRIBUTIONS
We indicate author contributions using the CRediT Taxonomy.

ACKNOWLEDGEMENTS
We thank Aaron Ellison and two reviewers for their careful and valuable remarks, which greatly improved our manuscript. This manuscript arose from a hackathon at the Society for Open, Reliable,

CONFLICT OF INTEREST STATEMENT
On July 15, 2022, RCO was offered a position to work at GitHub and became an employee on August 23, 2022. Initial discussion of publishing this manuscript began in July 2021 and all work on the manuscript prior to the first revision was done while RCO was an employee at Lawrence Berkeley National Laboratory. The authors declare that there is no conflict of interest that could be perceived as prejudicing the impartiality of the research reported.

PEER REVIEW
The peer review history for this article is available at https://www.webofscience.com/api/gateway/wos/peer-review/10.1111/2041-210X.14108.

DATA AVAILABILITY STATEMENT
The preprint and the entire source code containing the manuscript and author contributions are available in the Open Science Framework repository (accessible at https://osf.io/bypfm/) and the GitHub repository (accessible at https://github.com/SORTEE-Github-Hackathon/manuscript) for this study.