Generalized nonlinear models can solve the prediction problem for data from species‐stratified use‐availability designs

Habitat suitability modelling methods for presence‐only species data are limited in their ability for making true predictions and are therefore often misused in ecological applications. A use‐availability design—also known as a case–control design with contaminated controls—combines presence‐only species data with a background sample of covariates where the species presence/absence is unknown. Assuming a log link function for the true probability of presence/absence, the use‐availability data then can be analysed as a logistic regression model with a biased estimate of the intercept. Due to the biased intercept, the model is unable to make true predictions. Instead, ranking the “pseudo‐predictions” from the model with biased intercept provides a viable alternative for making predictive inference in single‐species models. We show the ranks are no longer conserved across species when such a single‐species model is extended to multiple species, limiting predictive inference.


| INTRODUC TI ON
Habitat suitability and species distribution modelling are broadly about quantifying where in space species/organisms are present versus absent and describing relationships between habitat data and the presence/absence status of a species/organism at a particular location. Statistical models for species presence/absence are models for binary response data, like logistic regression or some more general variant of it like generalized additive models (Wood, 2017) or classification trees (Breiman et al., 1984). Sometimes, the species presence/absence values are aggregated over some larger space to produce species counts that could be analysed using a Poisson log-linear model (McCullagh & Nelder, 1989) or some more general variant.
Frequently, species presence/absence suffers from some form of measurement error. For example, a bird identified by its song may go undetected because it did not sing while the study plot was being measured. The species is present, but marked as absent. Alternatively, its home range might not overlap exactly with the study plot so it goes undetected because it was someplace else during measurement. The species is absent and marked as absent, but it might make more sense in some projects to mark the species as present.
Both these scenarios could be considered issues with prospective studies. That is, studies where the species presence/absence status is unknown in each study plot and measurements are taken to determine their status. It is also common to find retrospective studies of species distributions, where species presence/absence values already known and records are sampled conditional on the presence/ absence values of the species. This is known as a case-control design, where cases are presence values and controls are absence values (Hosmer et al., 2013). Data from case-control studies can also be subject to measurement error. For example, records housed in herbaria, museum and databases may only contain locations where species were found to be present and no location records where species were found to be absent. This is known as presence-only data. That is, data where species presence/absence values are only available when the species is present. If the study design allowed for a census of all organisms of the species of interest, then presence-only data are equivalent to presence/absence data as any study plot missing the species status is by deduction equivalent to a record of species absence. When a census does not occur, we call location records contaminated controls when the habitat data are available, but the species/organism presence/absence status is unknown.
In the ecological literature, species distribution data in a retrospective study collected according to a case-control design with contaminated controls are called a use-availability design, where the use sample is a sample of cases and the availability sample is a sample of contaminated controls. Sometimes the availability sample is also called a background sample or quadrature points. A number of the technical details for analysing use-availability designs for single species are provided by Lancaster and Imbens (1996). A good review of the limitations of use-availability design for single species and how to address these limitations are provided by Keating and Cherry (2004).
The goal of this work is to layout an updated framework for analysing data collected according to use-availability designs for both single-and multiple-species models. To do this, we will update the model of Lancaster and Imbens (1996) so that it can be implemented as a generalized nonlinear model in both maximum likelihood and Bayesian frameworks. We will extend the updated model from one to multiple species for use-availability designs stratified (or blocked) by species. Along the way, we will update some of the inferential limitations described by Keating and Cherry (2004) of models for data from single-species use-availability designs. In particular, we will focus on limitation of estimating and predicting the probability of species presence/absence. Most notably, we will show a set of modelling choices that ensures cross species comparisons of estimated and predicted probabilities are interpretable, while highlighting another that leads to uninterpretable cross species comparisons.
The multiple-species case we consider should be referred to as a multiple-species use-availability design stratified (or blocked) by species, which is to say a list of species of interest is generated, then a separate use-availability sampling design is done for each species in series, and the model is for the combined data. For example, there may be a database with records for multiple species and a useavailability design is used to collect records for each species one at a time.
The remainder of the paper is organized as follows: First, we will describe the technical details of the use-availability design, some common modelling choices made, how they generalize from single to multiple species and their limitations. Next, we will discuss how to implement use-availability models as a generalized nonlinear model in software. We will then illustrate these methods on a real dataset from Riordan et al. (2018) of plant data collected in California, USA.
We conclude with a discussion.

| Technical formulation of the use-availability design and model
The multiple-species case we are considering is a use-availability design stratified (or blocked) by species. Which is to say it is a series of single-species use-availability sampling designs implemented sequentially for each species of interest.
In the single-species case, we would consider a species s on a bounded landscape of interest (e.g. the state of California) represented by the union of a large finite number of point locations where an organism of species s could exist. Note: this implies the approach is a discrete-space approximation to a continuous-space problem. At each point-location i, there is an associated vector of habitat data x si = (x si1 , x si2 , …, x siK ), composed of K habitat variables/covariates, and species presence/absence data y si , such that y si = 1 when the species s is present, and y si = 0 when the species is absent. We assume at each point location the species presence/absence was determined by a Bernoulli distribution with a probability of presence P(y si = 1 | x si ) = p si that is conditional on the habitat data. For the multiple-species case, we would have these same assumptions hold true for species s = 1, 2, …, S.

| Sampling scheme and useavailability likelihood
In an ideal world, we would have a random sample of point locations stratified by species. That is, a random sample of (y si , x si ) for each of the S species where we observe true species presence/absence.
We could then proceed with analysis using some method Bernoulli response data, like logistic regression. We would then proceed with inference about β s,0 , β s and p si as is pertinent to the project at hand.
Under use-availability sampling of point locations stratified by species, we take a random sample of size n s1 of point locations where species s is present (or assume the data collected is a random sample), called use data, and a random sample of size n s0 of point locations from the whole population of habit data (i.e., y si could be 1 or 0), called availability data. The total sample size is N s = n s1 + n s0 . Let C si = 1 when point-location i is in our use sample for species s and let C si = 0 when point-location i is in our availability sample for species s. Mathematically, the use sample is a sample from the distribution of the habitat given the species is present f X s |y s =1 (x s | y s = 1) which by Bayes rule can be rewritten as P(y s = 1 | x s )f X (x s )∕P(y s = 1). The availability sample is a sample from the marginal distribution of the habitat f X (x). Using these quantities, Lancaster and Imbens (1996) show the likelihood for use-availability sampling has the following form: where q s = P(y si = 1) and h s = P(C si = 1). Let us define contains a justification for parameterization in terms of s . For convenience, we will decompose the likelihood into its parts: where and where the likelihood L C|X has the form of the joint likelihood for Bernoulli distributed data, conditional on the habitat covariates, while the likelihood of L X has form dependent on the distribution of the habitat covariates f X .

| Practical details of estimation and inference
Convention is to use the conditional likelihood L C|X for inference as Bernoulli likelihoods are well studied with lots of tools for inference. Indeed, Lancaster and Imbens (1996) show that when we set , generalized method of moments estimates and constrained maximum likelihood estimates from L C|X will be the same as from L C,X . For large N, the methods of moments and maximum likelihood estimators from L C|X are asymptotically equivalent. They are asymptotically unbiased and follow a normal distribution where the asymptotic covariance is the sandwich estimator from L C|X . If the full likelihood was used, the asymptotic covariance would be the inverse Fisher information from L C,X . The sandwich covariance produces larger variance estimates than the inverse Fisher information, which is the cost to using the conditional instead of the full likelihood.
In many practical settings, obtaining a large sample size N effectively means obtaining a large availability sample sizes n s0 for all S species. For example, when habitat data are generated from GIS layers of some landscape dataset. We show in Appendix B that for large availability samples, the joint likelihood L C,X is well approximated by L C|X . As a result, the sandwich estimator of the covariance is unnecessary under this condition when using a maximum likelihood approach because the sandwich covariance collapses to the inverse Fisher information. Additionally, these derivations focus on a form L C|X parameterized in terms of , as in Equation (3). Computer optimizers only need to solve two equations instead of the three of Lancaster and Imbens (1996) because it does not need to keep tract of the constraint.
An alternative to an asymptotic maximum likelihood approach to estimation and inference is taking a Bayesian approach. We will propose two Bayesian approaches which are easily implemented in R computer programming language (R Core Team, 2019). The first is an asymptotic Bayesian approach for large availability samples that, like the maximum likelihood approach, uses the conditional likelihood L C|X parameterized in terms of instead of the full likelihood L C,X . The second is a fully Bayesian approach that uses the full joint likelihood of L C,X , including L X . The fully Bayesian approach may be more appropriate for small-scale useavailability studies, rather than large, landscape-scale use-availability studies which motivated this work. The details of these Bayesian approaches are substantial. We will discuss the asymptotic Bayesian approach more in later sections and both Bayesian approaches in Appendix C.

| The prediction problem in useavailability designs
Thus far, we have not discussed the form of p si when in use-availability designs. However, choice of p si is key to solving the prediction model.
Under large availability samples, the model using L C|X for inference can be written as follows: (1) L C,X (C, X | 1:S,0 , 1:S , h 1:S , q 1: (2) L C,X (C, X| 1:S,0 , 1:S , 1:S , q 1:S ) = L C|X (C|X, 1:S,0 , 1:S , 1:S ) × L X (X| 1:S,0 , 1:S , 1:S , q 1:S or h 1:S ), This is a nonlinear model. However, we can make it linearizable by choosing p si to be the exponential probability function: This approach is equivalent to logistic regression of C si on x si where only a biased intercept * s,0 = − log s + s,0 is identifiable. This is a very popular approach in practice because logistic regression is very easy to do in most statistical software languages, but it has a major issue. Specifically, without being able to identify s,0 , it is difficult to return exact inference on p si or other functions of s,0 without very strong prior assumptions about the value of q s . This is known as the prediction problem in use-availability designs-biased estimates of p si when the probability function is exponential. Keating and Cherry (2004) provide a nice review for single-species case.
One option to partially solve the prediction problem is to rank habitat. Because p si is a monotonic function of p si , the ranks of p si will conserve the ordering of the ranks of p si . However, this breaks down when multiple species are considered as each set of pseudoprobabilities p si is biased by a different amount − log s relative to the true probabilities p si for each species. This is illustrated in Figure A1.
Thus, the ranks of the pseudo-probabilities are only meaningful within species and not across.
Instead, we propose treating the use-availability model as the generalized nonlinear model that it is. By carefully picking the form of p si such that all the model parameters are identifiable, we can recover inference on all model parameters and functions of them like p si . This approach was suggested by Keating and Cherry (2004), but now software has advanced to the point that implementing generalized nonlinear models is much easier. In the next section, we will outline one such approach.

| A generalized nonlinear model for useavailability designs
Consider now the case where p si = logistic( s,0 + x si s ), then a generalized nonlinear model that can be used easily in software has the following form: Introducing si , the linear predictor of the original logistic model, should simplify nonlinear model parameter specification in a lot of software packages. It also clarifies how to extend the model to include predictors that are nonlinear (with known or unknown forms): change the right-hand side of the equation to whatever form is relevant to your research question. Parameterizing log s as s,0 is useful because s,0 is an unbounded quantity, whereas s is bounded below by 0. Unbounded quantities tend to have more stable estimates when fitting nonlinear models as it is impossible during optimization to either accidentally explore an area of the likelihood that is out of bounds or find an optimum there. It may be tempting to let log s also have covariate information on the right side of the equation, for example z ′ si s . We caution against it without careful consideration as its interpretation implies the sampling design is not a use-availability design as we have outlined. A full exploration of the implications is beyond the scope of this work, but we do provide some additional details in Appendix D. For example, it could be appropriate to include covariate information such as date of sample or any further stratification such as repeatedly measured organisms within species.

| Illustration of the prediction problem on plant data from California
We use the data of Riordan et al. (2018Riordan et al. ( , 2021. It is of common dominant shrubs and subshrubs in southern California chaparral, coastal sage scrub and alluvial scrub plant communities. For a full description of sampling protocols, data cleaning and data sources, see their work. In this dataset, there are multiple varieties and subspecies (infraspecies) from the same species. We treat each of these infraspecies as a separate 'species' for purposes of modelling. Taxonomic information is available in Table A2. Their use data were collected from herbarium records and field surveys.  Table A3. Availability data were randomly sampled from an area within 100 km of known species or infraspecies occurrences.
All analyses were done in R version 3.5.3 R Core Team (2019).
We used the brm function in the brms package version 2.9.0 to generate samples from the posterior distribution. We ran four chains in parallel on four cores each for 4,000 iterations with 500 iterations of warm-up (aka burn-in). We set the max tree depth to 15 (default is 10) as recommended by the function to reduce some minor computational warnings during the warm-up phase of the NUTS algorithm.
Some example pseudocode is available in Appendix E in Figure A2.
We placed normal priors on s,0 and s,0 with mean zero and standard deviation 5 and 2.5, respectively, and scaled the covariates to have mean 0 and standard deviation 1. This is a slightly more informative version of the normal prior in Ghosh et al. (2018). We opted for the more informative prior as the resulting posterior had better mixing in the NUTS sampler, reducing computational times from about 95 hr to about 40 hr on a Windows 10 PC with an Intel(R) Xenon(R) CPU E3-1245 v5 @ 3.50 GHz 3.50 GHz processor with 64 GB of RAM. We placed a normal priors with mean −2.05 and standard deviation 3.19 on each s,0 . As described in

| RE SULTS
Posterior means of the species presence/absence probabili- However, for such large sample sizes prior smoothing should be limited.
F I G U R E 1 Posterior mean species habitat use probabilities from Model (6): p logis si . Black circles are observed species presences F I G U R E 2 Difference of rank species habitat use pseudo-probability and probability from Models (5) and (6), respectively, scaled by the maximum absolute observed difference. Black circles are observed species presences. Red indicates over-ranking, blue indicates under-ranking, and grey indicates minimal over-/under-ranking F I G U R E 3 Within species difference of rank species habitat use pseudoprobability and probability from Models (5) and (6), respectively, scaled by the within species maximum absolute observed difference. Black circles are observed species presences. Red indicates over-ranking, blue indicates underranking, and grey indicates minimal over-/ under-ranking Additional results can be found in Appendix F. These results include forest plots of the covariate effects in Figure A3 and Bayes factors in Table A4.

| D ISCUSS I ON
Our major contributions in this work are as follows: (a) identifying that some models for use-availability designs as described in Lancaster and Imbens (1996) can be implemented as generalized nonlinear models for binary response data and doing so allows us to make direct inference on presence/absence probabilities, (b) generalizing these models from one to multiple species when the sampling design is stratified by species, (c) identifying that ranked pseudo-probabilities from models for data collected from use-availability designs for multiple species does not conserve the order of the ranked probabilities from the presence/absence model and therefore can not be used for cross species habitat ranking, (d) developing an unrestricted parameterization of useavailability models to use in maximum likelihood and Bayesian inference, (e) developing approximate and fully Bayesian approaches to models for data from use-availability designs, (f) providing some guidance for prior selection when taking a Bayesian approach, and (g) showing how this can all be done with widely available software for statistical analysis.
We used our approach to highlight important differences in habitat suitability ranking between and within foundational shrub species. Inference on predicted habitat probabilities could be used to assess how species distributions will change under different land management scenarios, climate change scenarios, and other natural or interventionist changes to land and climate. It can also be used to select habitat variables that are important to explaining or predicting habitat use probabilities, as well as compare how the impact of said variables is different across species.
There are some pitfalls we have yet to mention. First is that this approach really relies on having informative habitat data. If none of the habitat data contain any information about y si , then p si = q s and the likelihood is degenerate. A second pitfall is having availability samples that are too small to contain some species presence values for rare species. When the availability sample is really a sample of controls f X|Y=0 (x s | y s = 0), rather than contaminated controls f X (x), then the derivations of the joint likelihood L C,X are not the same and the conditional likelihood L C|X becomes misspecified.
While the approaches we have described are for multiplespecies use-availability designs, species is an arbitrary label and any stratification or blocking variable could be used its place. In our application, we used infraspecies. For organisms that move, the stratification variable could be organism ID in a single-species model. Depending on how sampling is done, it could be appropriate to stratify by database if multiple databases are used to generate the presence-only samples. It is important for the user to pay close attention to how the data were gathered when making these decisions.

ACK N OWLED G M ENTS
We would like to thank Jim Baldwin (USDA Forest Service, retired),

PE E R R E V I E W
The peer review history for this article is available at https://publo ns.com/publo n/10.1111/ddi.13384.

AVA I L A B I LIT Y D E S I G N S
This section of the appendix and Appendices B and C form a kind of paper within a paper as they contain detailed justification for the methods we used in the main body of the manuscript. We will reintroduce the model from a more technical perspective, provide an alternative parameterization of the model that eases implementation in some software programs and theoretically justify these methods for data with large availability/background samples.

A.1 | Use-availability design, model and likelihood
Consider a binary response variable y where y i = 1 when an organism of some species is present in location i and y i = 0 when the species is absent. At each location i, there are also habi- which follows a Bernoulli distribution with probability parameter where p is a known (i.e., chosen) probability function and is vector parameters (e.g. an intercept and K linear regression coefficients). Suppose we have complete habitat data for all i, but incomplete data on y i . In particular, we only have data where y i = 1, also known as presence-only data. In these situations, we can use a use-availability design to make inference on p(x i , ) and its model parameters. A use-availability design is a sequence of Bernoulli trials such that when C = 1 a sample is taken from f(x | y = 1), called the "use," "presence-only" or "case" sample, and when C = 0, a sample is taken from f(x), called the "availability," "background," or "contaminated control" sample. Note, this implies f(x | C = 0) = f(x). We will use this result later in a derivation. The technical details for such a model were originally described by Lancaster and Imbens (1996). The single observation likelihood for a use-availability design is as follows: Let us define p(x i , ) = P(y i = 1 | x i ), h = P(C i = 1) and q = P(y i = 1).
We can rearrange the likelihood and then decompose it to have the following form: The likelihood L C|X corresponds to a Bernoulli likelihood of C i conditional on x i with parameter P(C i = 1 | x i , , q, h) =p(x i , , q, h), the likelihood of L X appears to be determined by the form of p(x i , ), the values of q and h, and the distribution f(x i ). Lancaster and Imbens (1996) considered the implications of when L X is the likelihood of multinomial distribution. Such a strong assumption is unnecessary.
All the information about β is contained within the conditional distribution of Y | X, and all the information about h and q is the distribution of C and Y, respectively. This implies that f X (x i ) is a proportionality constant in L X .
In the next subsection, we will walk through an updated version of the technique of Lancaster and Imbens (1996) for constrained maximum likelihood estimation. After that, we will provide an alternative parameterization of L C|X that is unconstrained while still following the same constraint as in Lancaster and Imbens (1996).

A.2 | Constrained maximum likelihood inference on model parameters
In this section, we will describe the approach of Lancaster and Imbens (1996) for constrained maximum likelihood estimation of ( , q, h)from the likelihood L C|X . They show that as h is the probability associated with a sequence of Bernoulli trials, independent of (y, x), it can be maximized separately and plugged into L C|X when that function is maximized for ( , q). Specifically, the maximum likelihood estimator for h is ĥ = N − 1 ∑ N i = 1 C i = n 1 (n 1 + n 0 ) − 1 , where n 1 is the size of the use sample and n 0 is the size of the availability sample. Additionally, the maximum likelihood estimator of q must satisfy the following restriction q = ∫ p(x, ) dF(x). This restriction is equivalent to requiring h = ∫p(x, , q, h) dF(x), which in practice ̂ ,q,ĥ). Rearranged, it has the following form: The log likelihood would be: And the set of equations we would need to solve to find maximum likelihood estimates are the following: After plugging in ĥ into these three equations, we can see that both ℒ C|X ∕ q and ℒ C|X ∕ h are proportional to the constraint needed to ensure values ̂ and q are appropriately constrained.
Therefore, any values of (̂ ,q) that solve this set of equations with ĥ plugged into h will be guaranteed to have the appropriate constraint and maximize the likelihood.

A.3 | An alternative way to parameterize the L
Suppose we parameterize L C,X so that = q(1 − h)h − 1 and L X so that q = q(q + ) − 1 = h. The likelihood then has the following form: where its components have form: Are the values ̂ and ̂ that maximize L C|X appropriately constrained? As before, they need to satisfy the following restriction: The log likelihood ℒ C|X is as follows: And the set of equations needed to find the values ̂ and ̂ that maximize L 1 are as follows: The second equation ℒ C|X ∕ is proportional to the restriction necessary to guarantee that any ̂ and ̂ that maximize L C|X are appropriately constrained. This approach absolves us needing to use plug-in estimator ĥ . Thus, it allows us to perform constrained maximum likelihood with an unconstrained likelihood. It is equivalent to maximum likelihood estimation of parameters from a generalized nonlinear model with Bernoulli distributed response C i , nonlinear mean/probability function p(x i , , ) and identity link function. It is still possible to find q from this model using the definition of its re-

SA M PLE S IZE N 0 I S L A RG E
Lancaster and Imbens (1996) showed that the asymptotic distribution of maximum likelihood estimators (̂ ,q,ĥ) from L C|X is asymptotically F I G U R E A 2 Example R code for implementing Model (6) using the brm function in the brms package on the California plant data when environmental covariates have been centred and scaled to mean 0 and standard deviation 1 unbiased and normally distributed with a sandwich estimator for the covariance as N goes to infinity. We are interested in a slightly different asymptotic result: where the size of the availability sample n 0 = N − n 1 goes to infinity. When n 0 goes to infinity, goes to infinity. Thus, it is sufficient to check the behaviour of the log likelihood ℒ C,X at the latter limit. First, we will focus on approximating the limit of ℒ X . Next, we will use this result to show ℒ C,X is approximately a Poisson likelihood.
Proof. Let ℒ X (C, X | p, ) denote the joint log likelihood of X from data collected according to a use-availability design with arbitrary probability function (e.g. logistic). It can be written as follows: From the Taylor expansions, we get log(p i + ψ) = log ψ + p i ψ −1 + O (ψ −2 ) and log (q + ) = log + q − 1 + O( − 2 ) for | p i | < | | and for | q | < | |, which leads us to: Let us define some sample averages of p i : We then have the following: Now, we plug-in = q(N − n 1 )n − 1 1 , to make explicit how goes to infinity as N goes to infinity: From here, we take the limit as N goes to infinity for fixed n 1 to get: By the law of large numbers p 0 → q, therefore: With this result, we can justify approximating ℒ C,X by ℒ C|X for large availability samples. We can create a similar approximation to ℒ C|X by a Poisson likelihood for large availability samples.

Proposition 2. A conditional Poisson likelihood
, approximates the Bernoulli condi- Proof. Let ℒ C|X,Bern (C, X | p, ) denote the joint likelihood of C conditional on X from data collected according to a use-availability design with arbitrary probability function (e.g. logistic). It can be written as follows: From the Taylor expansions log[ ( for | p i | < | |, we get: Finally, these two propositions can be combined to get an approximation for ℒ C,X by ℒ C|X,Poi for large availability samples.

Proposition 3. The Poisson likelihood
Proof. Follows directly from Proposition (1) and Proposition (2), as The implication of these propositions is that generalized nonlinear models for Bernoulli or Poisson distributed binary response data can be used for large background samples as an alternative to methods that involve the full likelihood L C,X . (X | p, ).

F I G U R E A 4
Species habitat use pseudo-probabilities from Model (5): p exp si . Black circles are observed species presences A PPE N D I X C

BAY E S I A N I N FE R E N CE
A fully Bayesian approach would use the full likelihood L C,X , as both L C|X and L X contain information about the parameters β, h and q.
When the size of the background sample is large, Proposition (1) allows us to take an approximate Bayesian approach that only relies on L C|X . This approximate Bayesian approach can be framed as a Bayesian generalized nonlinear model and implemented such as the brms package (Bürkner, 2017(Bürkner, , 2018 in R (R Core Team, 2019). The generalized nonlinear model approach is discussed in the main body of the text. In the next subsections will give some brief details of how to implement the fully Bayesian approach and propose some possible prior distribution for to be used in the approximate Bayesian analysis.

C.1 | A Markov chain Monte Carlo algorithm for fully Bayesian inference
The fully Bayesian approach uses the full joint likelihood L C,X . As noted previously, the likelihood is overparameterized. So instead of sampling from the conditional posterior of q during MCMC, we set q to its constrained value at each iteration. A rough outline of a Metropolistype algorithm for the fully Bayesian approach is as follows: Algorithm 1 First choose initial conditions h (0) , q (0) and (0) , and number of MCMC iterations T. Then, iterate the following: where the conditional posterior distributions are defined as follows: and where f(h) and f( ) are the prior distribution on h and β, respectively.

C.2 | Prior selection in
F I G U R E A 5 Scaled rank species habitat use pseudo-probabilities from Model (5) Their recommendation is to choose a Cauchy(0, 10) prior for 0,s , as F I G U R E A 6 By species scaled rank species habitat use pseudo-probabilities from Model (5)   with expected value n s,0 n − 1 s,1 where n s,0 and n s,1 are the sizes of the availability and use samples, respectively, for species s. This implies a kind of practical upper bound on what we would expect 0,s to be as log q s is strictly negative. As a result, some choices of symmetric real valued prior distributions centred at zero (e.g. Cauchy(0, 10) may end up putting too much prior probability on support above log n s,0 − log n s,1 .
To remedy this, we attempted to derive the implied prior distribution on 0,s given the rules recommended by Gelman et al. (2008): that the log-odds should follow a Cauchy(0, c) distribution. We reparameterized 0,s such that q s and h s were a function of their logodds s,q and s,h , respectively, to get the following: The prior distribution of s,0 is not closed form when s,q , s,h ∼ Cauchy(0, c) nor is it for Normal(0, c) or T df=7 (0, c). We could parameterize Model (6) in terms of s,q and s,h and let the Hamiltonian Monte Carlo algorithm used by the brm function perform the integration for us. This would mean the chains for s,q and s,h would not necessarily converge, but the chain for the computed quantity 0,s would. A faster alternative is to do Monte Carlo integration upfront to find an approximate prior distribution in terms of one we already know a bit about. When s,q and s,h are Cauchy, normal or T df=7 (0, c), then the s,0 is approximately Cauchy, normal or T df=7 (0, c), respectively. See Table A1 for the approximate priors based on these distributions with values for c = 2.5, 5, 10. Approximation based on 10 million Monte Carlo samples.

A PPE N D I X D OTH E R FO R M S O F s
As software easily allows for more complicated specification of log s than we have considered, we want to briefly discuss the implications of allowing log s to be a function of covariate information beyond species-specific intercept s,0 . Or rather, what alterations could we to make to the sampling scheme of a use-availability design for it to (7 make sense that log s should be a function of covariate information beyond species. As s is already composed of three quantities, q s , h s and (1 − h s ), we will first look at each individually then seek to combine when appropriate. We should note that fully developing these concepts is beyond the scope of this paper, but are of interest.
Lancaster and Imbens (1996) show that the distribution of the availability sample is simply f(x si ), while the distribution of the use sample is f(x si | y si = 1) which according to Bayes rule is P(y si = 1 | x si )f(x si )∕P(y si = 1) = f(x si )p si q − 1 s . From here, we can see that q s makes it into the model implicitly as a quantity that has been marginalized over x si . Therefore, it is by definition a quantity that should not be a function of environmental covariate information that is integrated over in this manner. Besides species, other design induced stratification such as temporal effects, block effects or organism specific effects if an organism within species was measured repeatedly (as might be the case for mobile species) could be included if appropriate under the study design.
As for h s , an often overlooked assumption of use-availability models is that for each species s the C s 's are a sequence of Bernoulli trials that determine whether to take a sample from the use data f(x s | y s = 1) or the availability data f(x s ). We have dropped the i subscripts for a reason-because the locations i are what is being sampled in these two scenarios. Thus, one cannot use spatially dependent environmental covariate information x to determine h s as the spatial locations are determined after the value of C s . Any covariate information used to determine h s would need to be linked to the spatial location sampled after the sampling occurs and cannot be spatially located a priori. The list of possibilities overlaps quite a bit with q s : stratification by species, organism within species, or block, or temporal effects, possibly others.
However, h s cannot be a function of treatment effects as they are not by definition a stratified variable (if they were, they would be called blocks).
Putting this all together, we can have our design induced covariate information and the associated location sampled z si = (z si1 , z si2 , . . . , z si1 ). We can define the following relationship: where s,0 is the species intercept and s = ( s1 , s2 , . . . , sK 2 ) are the species-specific covariates for the other effects.

A PPE N D I X E E X A M PLE G EN ER A LIZED N O N LI N E A R M O D EL CO D E
Example code we used to implement Model (6) on the California plant data using the brm function in the brms package is available in Figure A2.

DATA
In this section, we include some additional information about the analysis of plant data beyond prediction.

F.1 | Forest plots of species-specific habitat effects
Pairwise comparisons of environmental covariate across species were computed using the hypothesis function in brms package. In order for the hypothesis function to return Bayes factors, the sam-ple_prior = "yes" option has to be specified.
One way to compare parameter estimates across species is to use forest plots of the posterior means and 95% credible intervals.
The forest plots of s,0 , s,0 and s for each of the infraspecies are in Figure A3. Alternatively, credible intervals and Bayes factors could be computed for post hoc comparisons as shown in Table A4 for the species Acmispon glaber. In this example, the Bayes factors are comparing the hypothesis that the two varieties of Acmispon glaber share the same covariate effect versus the hypothesis that each of these varieties has its own effect. In this case, Bayes factors above 1 favour the first hypothesis and Bayes factors below 1 favour the second hypothesis. The Bayes factors can be converted to probabilities where values above 0.5 favour the first hypothesis and values below 0.5 favour the second. The only covariate for Acmispon glaber where the evidence favours no difference is minimum winter temperature (djf) with a posterior probability of 0.67. The evidence for the remaining covariates favours the hypothesis of variety specific covariate effects. For three of these covariates, precipitation of the warmest quarter (bio18), precipitation of the coldest quarter (bio19) and maximum summer temperature (jja), the estimated probability that the varieties share the same effect is 0.00. Figure A4 contains the estimated pseudo-probabilities p exp si for Model (5). Notice they do a very poor job spanning the full range of probabilities 0 to 1. This is due to the biased estimate for the intercept. In single species, we could rank and scale the pseudoprobabilities by the max rank so they are between 0 and 1 as in Figure A5. Notice that for taxa such as Eriogonum fasciculatum var.

F.2 | Pseudo habitat use probabilities and their ranks
polifolium the scaled ranks are high over the entire region. This is an artefact of inappropriately ranking across all taxa when the ranks are not comparable across taxa. Instead, it is better to both rank pseudo-probabilities and scale the ranks within species as in Figure A6. This gives a more realistic picture of which habitat is suitable within taxa, but not across taxa.