HeFPipe: a complete analytical pipeline for heterozygosity-fitness correlation studies

As the body of heterozygosity-fitness correlation (HFC) research grows, a typical HFC analysis has come to include a growing number of increasingly complex tests (Chapman et al. 2009). Currently, no software is available both to convert between the file formats required for all of these tests and to conduct the main regression analyses at the core of all HFC studies. Heterozygosity-Fitness Pipeline (HeFPipe) is a Python script that accomplishes both of these tasks for studies based on microsatellite data. HeFPipe is designed to be run from the command-line terminal and will run on any Mac OS X computer. The script takes as input allele reports from either of the genotype-calling programs GeneMapper or GeneMarker and reconfigures the data into GENEPOP (Raymond & Rousset 1995), Rhh (Alho et al. 2010), RMES (David et al. 2007) and GEPHAST (Amos & Acevedo-Whitehouse 2009) formats. The script can also reformat the output from GENEPOP on the Web (option 5) and from Rhh into csv spreadsheets that can be incorporated into downstream analyses. HeFPipe accepts user-provided lists of samples and markers to include in or exclude from analyses. It can build generalized linear models (GLMs) from both the main data set and subsets of the data. Finally, HeFPipe allows users to explore single-marker effects and conduct correlation analyses. The script, a comprehensive manual, a link to a series of video tutorials, and an example data set are available from GitHub (http://github.com/Atticus29/HeFPipe_rpos).
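To illustrate the kind of reformatting described above, the following is a minimal sketch, not HeFPipe's own code, of converting genotype calls into a single-population GENEPOP input file. It assumes a hypothetical input of (sample, marker, allele1, allele2) records rather than the actual GeneMapper or GeneMarker report layout. It encodes each allele as a three-digit size and writes a missing genotype as 000000. The function name `to_genepop` is hypothetical.

```python
from collections import defaultdict


def to_genepop(rows, title="HeFPipe sketch"):
    """Build single-population GENEPOP text from genotype records.

    Assumption: `rows` is an iterable of (sample, marker, allele1, allele2)
    tuples, a hypothetical simplification of an allele report. Alleles are
    written as three-digit sizes (so sizes must be below 1000), and any
    marker not called for a sample is written as missing (000000).
    """
    markers = []                   # marker names in first-seen order
    genotypes = defaultdict(dict)  # sample -> {marker: (allele1, allele2)}
    for sample, marker, a1, a2 in rows:
        if marker not in markers:
            markers.append(marker)
        genotypes[sample][marker] = (a1, a2)

    # GENEPOP layout: title line, one locus name per line, then "Pop".
    lines = [title, *markers, "Pop"]
    for sample, calls in genotypes.items():
        codes = []
        for m in markers:
            a1, a2 = calls.get(m, (None, None))
            if a1 is None or a2 is None:
                codes.append("000000")  # missing genotype
            else:
                codes.append(f"{int(a1):03d}{int(a2):03d}")
        # Each individual line: "<name> , <genotype> <genotype> ..."
        lines.append(f"{sample} , " + " ".join(codes))
    return "\n".join(lines) + "\n"


rows = [
    ("ind1", "Loc1", 152, 156),
    ("ind1", "Loc2", 98, 98),
    ("ind2", "Loc1", 152, 152),  # ind2 was not called at Loc2
]
print(to_genepop(rows, title="test"), end="")
```

Because `ind2` has no call at `Loc2`, that genotype is written as `000000`. Locus order follows the order in which markers first appear in the input.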