An important priority for conservation biology is to understand what makes a species or population more likely to become extinct. A popular and appealing answer comes from comparative analyses that explore the links between species' vulnerability to extinction and either intrinsic ecological and life-history traits (Purvis et al. 2000; Fisher & Owens 2004; Cardillo et al. 2008; Fritz, Bininda-Emonds & Purvis 2009; Pinsky et al. 2011) or extrinsic factors (Kerr & Currie 1995; Forester & Machlis 1996; Cardillo et al. 2004). This approach requires large databases describing species traits (or extrinsic factors) in a format suitable for comparative analyses. Compiling such databases takes considerable effort from multiple dedicated researchers, who ideally make their complete databases publicly available to enable future research (Jones et al. 2009). However, any effort to gather information is limited by the fact that data are not available for all species in all locations, as other authors have previously recognized (Fisher, Blomberg & Owens 2003; Luck 2007; Nakagawa & Freckleton 2008; Matthews et al. 2011). As a result, gathered data represent only a subset of species and locations, which traditional comparative analyses implicitly assume to be a representative sample of the taxon or group of interest (but see Matthews et al. 2011). Our study challenges this assumption by testing the hypothesis that studied species, those for which data are available, are not a random sample of global biodiversity and that this bias affects the results of comparative studies.
In particular, we address three objectives: (i) to describe existing biases associated with the number and type of data available in a mammalian comparative data set; (ii) to test the hypothesis that life-history, ecological and behavioural traits are associated with greater data availability, because some traits can facilitate, or complicate, research and make species more or less appealing as study subjects (Matthews et al. 2011); and (iii) to investigate whether the existing biases affect the results and conclusions of comparative analyses linking intrinsic species' traits and vulnerability to extinction. Specifically, we compare results from standard phylogenetically informed comparative analyses based on different subsets of species, some of which attempt to control for these biases.
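Describing the number and type of data available, as in objective (i), amounts to simple bookkeeping over a species-by-trait table. The sketch below is hypothetical: the species names, trait names and values are placeholders (not PanTHERIA records), and missing entries are assumed to be stored as `None`; the missing-value code of a real data set should be checked against its documentation. It reports, per species, how many traits are recorded; per trait, the fraction of species with a value; and which species would remain after a complete-case filter on a chosen set of predictors.

```python
from typing import Dict, List, Optional

# Hypothetical species-by-trait table: species -> trait -> value (None = not recorded).
TraitTable = Dict[str, Dict[str, Optional[float]]]


def species_completeness(table: TraitTable) -> Dict[str, int]:
    """Number of traits with a recorded value for each species."""
    return {sp: sum(v is not None for v in traits.values())
            for sp, traits in table.items()}


def trait_coverage(table: TraitTable) -> Dict[str, float]:
    """Fraction of species with a recorded value for each trait."""
    traits = sorted({t for row in table.values() for t in row})
    n = len(table)
    return {t: sum(table[sp].get(t) is not None for sp in table) / n
            for t in traits}


def complete_cases(table: TraitTable, predictors: List[str]) -> List[str]:
    """Species retained by a method that needs every predictor observed."""
    return [sp for sp, row in table.items()
            if all(row.get(p) is not None for p in predictors)]


# Placeholder values for illustration only; these are not real records.
toy: TraitTable = {
    "species_A": {"body_mass_g": 5200.0, "gestation_d": 90.0, "range_km2": 1.2e6},
    "species_B": {"body_mass_g": 14.0, "gestation_d": None, "range_km2": 3.0e4},
    "species_C": {"body_mass_g": 22.0, "gestation_d": None, "range_km2": None},
    "species_D": {"body_mass_g": 800.0, "gestation_d": 45.0, "range_km2": 2.5e5},
}

print(species_completeness(toy))
print(trait_coverage(toy))
print(complete_cases(toy, ["body_mass_g", "gestation_d", "range_km2"]))
```

Even in this toy table, requiring all three predictors keeps only half of the species, and which half is retained depends on which traits happen to be recorded.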
Currently available tools for comparative analyses can be broadly classified into phylogenetic and non-phylogenetic regressions (Bielby et al. 2010). Phylogenetic methods are more commonly used and include regressions using phylogenetic independent contrasts (Felsenstein 1985), a popular approach despite its unrealistic assumption of Brownian motion trait evolution (Blomberg, Garland & Ives 2003), and generalized regressions, such as phylogenetic generalized least-squares models (PGLS; Martins & Hansen 1997), which provide a flexible alternative with fewer assumptions. Non-phylogenetic methods include regression trees (Breiman 1984), which have fewer data requirements but can be unstable and do not account for phylogenetic relationships (Bielby et al. 2010). All of these tools are limited by data availability because they generally require complete data for all predictors. Therefore, exploring patterns with multiple predictors requires either interpolating missing data, which may introduce biases if data are not missing at random (Little & Rubin 2002), or eliminating all species with any missing data, which can bias estimates and considerably reduces the sample size (Nakagawa & Freckleton 2008). For example, a well-cited study by Cardillo et al. (2006) drew inferences from <20% of extant species in some analyses, and some of its analyses were limited to <12% of the species of interest. If the species with available data are not a random sample, conclusions may not apply to the broader group of interest, and inferences need to be made carefully.
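To make the difference between ordinary and phylogenetic regression concrete, here is a minimal pure-Python sketch of a PGLS fit with one predictor. It assumes Brownian motion evolution, so that the expected covariance between two species is proportional to the branch length they share from the root (with each species' root-to-tip length on the diagonal), and computes the generalized least-squares estimate β̂ = (XᵀC⁻¹X)⁻¹XᵀC⁻¹y. The four-species tree, trait values and function names are hypothetical. In practice one would use an established implementation (for example `pgls()` in the R package caper), which can also estimate branch-length transformations such as Pagel's λ that this sketch omits.

```python
from typing import List

Matrix = List[List[float]]


def solve(a: Matrix, b: Matrix) -> Matrix:
    """Solve A X = B by Gauss-Jordan elimination with partial pivoting."""
    n, m = len(a), len(b[0])
    aug = [a[i][:] + b[i][:] for i in range(n)]  # augmented matrix [A | B]
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:n + m] for row in aug]  # left block is now the identity


def pgls(x: List[float], y: List[float], cov: Matrix) -> List[float]:
    """Return [intercept, slope] = (X^T C^-1 X)^-1 X^T C^-1 y with X = [1, x]."""
    n = len(x)
    design = [[1.0, xi] for xi in x]
    cinv_x = solve(cov, design)              # C^-1 X, n x 2
    cinv_y = solve(cov, [[yi] for yi in y])  # C^-1 y, n x 1
    xtcx = [[sum(design[k][i] * cinv_x[k][j] for k in range(n)) for j in range(2)]
            for i in range(2)]
    xtcy = [[sum(design[k][i] * cinv_y[k][0] for k in range(n))] for i in range(2)]
    beta = solve(xtcx, xtcy)
    return [beta[0][0], beta[1][0]]


# Hypothetical ultrametric tree ((A:1,B:1):1,(C:1,D:1):1) under Brownian motion:
# diagonal = root-to-tip length (2); off-diagonal = branch length shared from
# the root (1 for the sister pairs A-B and C-D, 0 otherwise).
cov = [[2.0, 1.0, 0.0, 0.0],
       [1.0, 2.0, 0.0, 0.0],
       [0.0, 0.0, 2.0, 1.0],
       [0.0, 0.0, 1.0, 2.0]]
x = [1.0, 2.0, 3.0, 5.0]  # hypothetical predictor, e.g. log body mass
y = [0.9, 1.4, 2.6, 3.1]  # hypothetical response, e.g. an extinction-risk score
print(pgls(x, y, cov))
```

With an identity covariance matrix (a star phylogeny with equal branch lengths, i.e. no shared evolutionary history) the same function returns the ordinary least-squares fit, which shows that OLS is the special case of PGLS that ignores phylogeny. Like the methods discussed above, the sketch needs a complete x and y for every species in the tree.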
In recent years, many authors have contributed to developing large databases suitable for comparative analyses, which describe life-history traits in diverse taxa including birds, mammals, amphibians, fish and angiosperms (Froese & Pauly 2000; Sekercioglu, Daily & Ehrlich 2004; Bielby et al. 2008; Sodhi et al. 2008; Jones et al. 2009). For this study, we focused on mammals for several reasons. First, mammals are arguably the best-studied group, with many species of conservation, economic and social interest. Second, the links between species traits and vulnerability have been extensively investigated in mammals, with multiple published comparative studies showing how traits such as adult body mass, geographic range area, gestation length and population density are linked to vulnerability to extinction (Purvis et al. 2000; Fagan et al. 2001; Brashares 2003; Cardillo 2003; Cardillo et al. 2004, 2005, 2006, 2008; Davidson et al. 2009; Fritz, Bininda-Emonds & Purvis 2009). Finally, we had free online access (http://www.utheria.org/) to a large mammalian life-history data set, PanTHERIA (Jones et al. 2009), which has also been used in several recent comparative studies (e.g., Bininda-Emonds et al. 2007; Cardillo et al. 2008; Davies et al. 2008; Fritz, Bininda-Emonds & Purvis 2009).
In this study, we show that species' ecology, life history and morphology explain variation in the quantity of data collected. Data appear not to be missing at random, and thus applying imputation techniques to fill data gaps may be problematic. Moreover, existing biases affect estimates obtained from comparative analyses, suggesting that the predictive ability of currently used models may be limited. Although our results are limited to mammalian species, data biases that can affect comparative analyses are likely common in other taxa. Overall, these findings highlight the importance of explicitly considering data biases in comparative analyses and, ultimately, the need to gather and publish basic natural history data, even if such work is currently deemed 'old-fashioned'.
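Why data that are not missing at random complicate both complete-case analysis and simple imputation can be illustrated with a small simulation. This is a hypothetical sketch, not an analysis from this study: a standardized trait (think log body mass) is drawn for simulated species, and the probability that a value is recorded is either constant (missing completely at random, MCAR) or rises with the trait value itself (missing not at random, MNAR), as would happen if, for instance, larger species were more likely to be studied. The logistic link, its slope of 2 and the function name `simulate` are arbitrary choices made for illustration.

```python
import math
import random
from statistics import mean


def simulate(n: int, mnar: bool, seed: int = 1):
    """Return (true mean, complete-case mean, mean-imputed mean) of a simulated trait."""
    rng = random.Random(seed)
    # Hypothetical standardized trait, e.g. log body mass.
    trait = [rng.gauss(0.0, 1.0) for _ in range(n)]
    observed = []
    for t in trait:
        # MNAR: the chance of being recorded rises with the trait value itself.
        # MCAR: every species has the same 50% chance of being recorded.
        p = 1.0 / (1.0 + math.exp(-2.0 * t)) if mnar else 0.5
        observed.append(t if rng.random() < p else None)
    recorded = [v for v in observed if v is not None]
    cc_mean = mean(recorded)  # complete-case estimate
    # Naive mean imputation: fill every gap with the observed mean.
    imputed = [v if v is not None else cc_mean for v in observed]
    return mean(trait), cc_mean, mean(imputed)


for label, mnar in (("MCAR", False), ("MNAR", True)):
    true_m, cc_m, imp_m = simulate(20000, mnar)
    print(f"{label}: true={true_m:.3f} complete-case={cc_m:.3f} mean-imputed={imp_m:.3f}")
```

Under MCAR the complete-case mean stays close to the true mean, whereas under MNAR it is shifted towards the over-recorded values. Filling gaps with the observed mean leaves the estimate exactly where the complete-case analysis put it, so it does not remove the bias; more sophisticated imputation methods also depend on assumptions about the missingness mechanism (Little & Rubin 2002).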