1. Comparative analyses are used to address the key question of what makes a species more prone to extinction by exploring the links between vulnerability and intrinsic species’ traits and/or extrinsic factors. This approach requires comprehensive species data but information is rarely available for all species of interest. As a result comparative analyses often rely on subsets of relatively few species that are assumed to be representative samples of the overall studied group.
2. Our study challenges this assumption and quantifies the taxonomic, spatial, and data type biases associated with the quantity of data available for 5415 mammalian species using the freely available life-history database PanTHERIA.
3. Moreover, we explore how existing biases influence results of comparative analyses of extinction risk by using subsets of data that attempt to correct for detected biases. In particular, we focus on links between four species’ traits commonly linked to vulnerability (distribution range area, adult body mass, population density and gestation length) and conduct univariate and multivariate analyses to understand how biases affect model predictions.
4. Our results show important biases in data availability with c.22% of mammals completely lacking data. Missing data, which appear to be not missing at random, occur frequently in all traits (14–99% of cases missing). Data availability is explained by intrinsic traits, with larger mammals occupying bigger range areas being the best studied. Importantly, we find that existing biases affect the results of comparative analyses by overestimating the risk of extinction and changing which traits are identified as important predictors.
5. Our results raise concerns over our ability to draw general conclusions regarding what makes a species more prone to extinction. Missing data represent a prevalent problem in comparative analyses, and unfortunately, because data are not missing at random, conventional approaches to fill data gaps, are not valid or present important challenges. These results show the importance of making appropriate inferences from comparative analyses by focusing on the subset of species for which data are available. Ultimately, addressing the data bias problem requires greater investment in data collection and dissemination, as well as the development of methodological approaches to effectively correct existing biases.