#### Estimating dispersal distance

Data were compiled from the literature for median natal dispersal distance and body mass for a range of bird species. We chose to use natal dispersal rather than breeding dispersal because natal dispersal distances tend to be longer and are therefore more likely to be of interest in metapopulation and fragmentation studies (Greenwood & Harvey 1982; Dawideit *et al.* 2009). In total, data were collected for 84 bird species across 12 orders from five studies (see Table S1, Supporting Information). Where multiple estimates were available for a species (including sex-specific estimates), these were recorded individually. Body mass estimates were taken from the study by Dunning (1993) or, where quoted, Sutherland *et al.* (2000).

In addition to body mass, it is likely that dispersal ability will be influenced by wingspan or length (Lurz *et al.* 2002; Dawideit *et al.* 2009). It is assumed that, on average, birds with a larger wingspan will be able to travel further. However, wingspan, *S*, is likely to scale with body mass, *m*, such that, if varying isometrically, body mass should increase with the cube of wingspan (equivalently, wingspan should increase with body mass raised to the power of one-third) (Schmidt-Nielsen 1984). As such, we incorporated wingspan into a ‘shape’ parameter (*sh*) to investigate the influence of variation in wingspan not accounted for by changes in body mass. Wingspan estimates were collated from published data sources (Table S1, Supporting Information). Where wing length (*L*) but not wingspan estimates were available, wingspan was estimated from a regression of wingspan as a function of wing length for species for which both were provided (*r*^{2} = 0·98):

We expected dispersal distance to be greater for females than males (Greenwood & Harvey 1982), so sex (male, female or combined when the data were pooled across sexes) was included as an explanatory variable. Where possible, sex-specific and combined-sex natal dispersal estimates were matched to corresponding male, female or adult-combined information on body mass and wingspan.

Feeding guild is known to influence territory size and may therefore affect dispersal distance (Schoener 1968). The assignment of feeding guild can be subjective and, as such, random effects about a central mean for feeding guild were considered for the intercept as well as slope interactions with body mass and shape. Where the classification of feeding guild is uncertain, the common mean can be used. Each species in the modelling data set was classified as a vertivore (carnivore that eats vertebrates), insectivore (carnivore that eats invertebrates), herbivore or omnivore.

We considered a subset of possible models, focussing on the three regression parameters (sex, mass and shape) and potential variation in the latter two because of feeding guild. To account for further variation in natal dispersal distances, we examined random effects on the intercept for species and taxonomic order (Table 1).

Table 1. Candidate models of the relationship between median natal dispersal distance, *D*, and explanatory variables sex (*s*), body mass (*m*) and wing shape (*sh*). ln(*D*) is the natural logarithm of the median natal dispersal distance, α is the intercept of the linear predictor, β_{S}, β_{M} and β_{W} are the coefficients for the effect of sex, body mass and wing shape, respectively, and ε_{sp}, ε_{ord} and ε_{g} are random effects for species, taxonomic order and feeding guild. Square brackets indicate a categorical variable, and parentheses indicate a continuous variable. Evaluation statistics after 200 000 iterations are also presented. The best model is highlighted in bold. DIC is the deviance information criterion (pD refers to the effective number of parameters estimated by each model), *r*^{2} is the proportion of variation in *D* explained by each model using the model-building data set, and *r* is the Pearson correlation between the predicted and observed dispersal distances using an independent test data set.

| Model | DIC (pD) | Fitted relationship (model-building data): slope, intercept, *r*^{2} | Predicted relationship (test data): *r* |
|---|---|---|---|
| 1. ln(*D*) = α + β_{S}[*s*] + β_{M} ln(*m*) + β_{W} ln(*sh*) + ε_{sp} | 729·3 (59·7) | 1·08, 0·18, 0·30 | 0·58 |
| **2. ln(*D*) = α[ε_{g}] + β_{S}[*s*] + β_{M} ln(*m*) + β_{W} ln(*sh*) + ε_{sp}** | **693·4 (20·2)** | **1·00, −0·02, 0·43** | **0·70** |
| 3. ln(*D*) = α + β_{S}[*s*] + β_{M}[ε_{g}] ln(*m*) + β_{W}[ε_{g}] ln(*sh*) + ε_{sp} | 712·3 (48·6) | 1·06, −0·14, 0·40 | 0·54 |
| 4. ln(*D*) = α + β_{S}[*s*] + β_{M} ln(*m*) + β_{W} ln(*sh*) + ε_{sp} + ε_{ord} | 731·4 (57·2) | 1·52, 1·31, 0·27 | 0·60 |

#### The dispersal model

Median natal dispersal distances were assumed to be log-normally distributed and related to the explanatory variables according to a linear regression function with the general form:

$$\ln(D_i) = \alpha + \sum_{k=1}^{K} \beta_k x_k + \sum_{n=1}^{N} \varepsilon_n$$

where *D*_{i} is the *i*th recorded median natal dispersal distance, α is the intercept, β_{k} are the regression coefficients for *K* explanatory variables *x*_{k}, and ε_{n} are *N* random effects. The full set of candidate models is presented in Table 1. The general form of the model was modified to allow for a random effect of guild on the slopes (Table 1, Model 3). The explanatory variables body mass and shape were transformed logarithmically to improve linearity.
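As an illustration only, the linear predictor of Model 2 in Table 1 can be sketched in Python as follows. All coefficient values are hypothetical placeholders rather than the fitted estimates, the function and dictionary names are ours, the species random effect is set to zero (as it would be for a species absent from the modelling data), and the units of mass and distance are assumptions.

```python
import math

def predict_ln_dispersal(mass, shape, sex, guild, coef):
    """Log-scale linear predictor with a guild-specific intercept (Model 2 style)."""
    return (
        coef["alpha"]
        + coef["guild_effect"][guild]          # random intercept for feeding guild
        + coef["beta_sex"][sex]                # categorical effect of sex
        + coef["beta_mass"] * math.log(mass)   # continuous effect of ln(body mass)
        + coef["beta_shape"] * math.log(shape) # continuous effect of ln(shape)
    )

# Hypothetical coefficients, for illustration only (NOT the fitted values).
coef = {
    "alpha": 0.5,
    "guild_effect": {"vertivore": 0.8, "insectivore": -0.2,
                     "herbivore": 0.1, "omnivore": -0.7},
    "beta_sex": {"combined": 0.0, "male": -0.1, "female": 0.2},
    "beta_mass": 0.3,
    "beta_shape": 1.2,
}

ln_d = predict_ln_dispersal(mass=100.0, shape=2.0, sex="female",
                            guild="insectivore", coef=coef)
# The median of a log-normal variable is exp(mean of its logarithm).
median_d = math.exp(ln_d)
```

Because the model is fitted on the log scale, exponentiating the linear predictor returns a median rather than a mean dispersal distance.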

Models were run in OpenBUGS version 3.1.0, a freely available statistical software package for conducting Bayesian analyses using Markov chain Monte Carlo (MCMC) methods (Lunn *et al.* 2009). We used vague prior distributions for α, β_{k} and ε_{n} to ensure that the posterior distributions for these parameters were dominated by the data. Prior distributions for parameters α and β_{k} were specified as normal with a mean of zero and a standard deviation of 1000. Prior distributions for species and taxonomic order random effects were specified as normal with a mean of zero and a standard deviation to be estimated from the data. The prior distributions for the standard deviation of the species and order random effects were uniform with a minimum of zero and maximum of 100.

Guild random effects were assumed to vary around a common mean drawn from a normal distribution with a mean of zero and a standard deviation to be estimated from the data. The U(0,100) priors specified for the standard deviation amongst species and taxonomic orders can lead to heavy right tails, and therefore overestimates, in the posterior distribution for the standard deviation where the number of groups is small. As such, we used a weakly informative half-Cauchy prior for the standard deviation amongst feeding guilds, of which there were only four (Gelman 2006). The full model description and code can be found in the Supporting Information. To assess convergence, we ran two MCMC chains. In all cases, models had converged within 10 000 MCMC samples, and posterior estimates were taken from 200 000 MCMC samples after discarding the first 10 000.
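The contrast between the two priors for a standard deviation can be sketched by simulation. This is a minimal illustration, not the model code from the Supporting Information: the half-Cauchy scale of 25 is an assumption made here (the scale used in the analysis is not stated in this section), and the function names are ours.

```python
import math
import random
import statistics

rng = random.Random(1)  # fixed seed so the sketch is reproducible

def draw_uniform_sd(upper=100.0):
    """Draw a standard deviation from the U(0, upper) prior."""
    return rng.uniform(0.0, upper)

def draw_half_cauchy_sd(scale):
    """Draw from a half-Cauchy(0, scale) prior by inverting its CDF,
    F(x) = (2 / pi) * atan(x / scale)."""
    u = rng.random()  # u lies in [0, 1), so the tangent stays finite
    return scale * math.tan(math.pi * u / 2.0)

ASSUMED_SCALE = 25.0  # hypothetical scale, chosen for illustration only

uniform_draws = [draw_uniform_sd() for _ in range(20000)]
half_cauchy_draws = [draw_half_cauchy_sd(ASSUMED_SCALE) for _ in range(20000)]

median_uniform = statistics.median(uniform_draws)
median_half_cauchy = statistics.median(half_cauchy_draws)
```

Under these assumptions, the half-Cauchy places half of its prior mass below its scale, whereas U(0,100) places half below 50, so with few groups the half-Cauchy pulls less strongly towards large standard deviations.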

Candidate models were assessed using the deviance information criterion (DIC: Spiegelhalter *et al.* 2002), by comparing the fitted and observed dispersal distances, and by comparing the predictions with median dispersal distances from an independent test data set, compiled from a second search for dispersal information after the initial modelling had taken place. This included data that were excluded from the original data set because wingspan information was missing but where we subsequently found these data. Body mass estimates for the test set were taken from the study by Dunning (1993), and wingspan and diet information were collected in an internet search (see Table S2, Supporting Information). The test set comprised 22 observations across 15 species and included representatives from the four feeding guilds identified above. We compared the observed natal dispersal distances with those predicted by each of the candidate models.
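The evaluation statistics reported in Table 1 can be sketched as follows. This is an illustration under stated assumptions: the observed and predicted values are hypothetical, the choice to regress observed on predicted values is ours (the direction is not stated in this section), and the helper names are not from the original analysis. The DIC identity (mean posterior deviance plus pD) follows Spiegelhalter *et al.* (2002).

```python
import math

def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def ols_line(x, y):
    """Slope and intercept of the least-squares line of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return slope, my - slope * mx

def dic(mean_deviance, p_d):
    """DIC = posterior mean deviance + effective number of parameters."""
    return mean_deviance + p_d

# Hypothetical ln(dispersal distance) values, for illustration only.
predicted = [0.5, 1.2, 2.0, 2.9, 3.5]
observed = [0.7, 1.0, 2.4, 2.6, 3.9]

slope, intercept = ols_line(predicted, observed)
r = pearson_r(predicted, observed)
r_squared = r ** 2
```

A slope near one and an intercept near zero indicate that predictions are unbiased across the range of dispersal distances, which is how the fitted relationship for Model 2 in Table 1 can be read.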

#### Demonstrating the utility of prior information obtained from the general model

Estimates of median natal dispersal obtained from the general model may be used as *a priori* information in a number of ways. They can be formally incorporated into a Bayesian analysis in the form of an informative prior; in this case as a prior distribution for the median dispersal distance of a species, the parameters of which are estimated by the general model. Alternatively, the general dispersal model can be used to produce *a priori* estimates of median natal dispersal distance for species for which no other dispersal data are available, for use as a covariate for modelling species occupancy or responses to environmental changes. This is equivalent to a Bayesian analysis in which we have no covariate data, only prior information. We demonstrate both uses here.

To demonstrate the use of the predicted dispersal distances as informative Bayesian priors, information (raw data or probability distribution parameters) on median natal dispersal distances is required. We used the data in Paradis *et al.* (1998) to represent the range of body masses, wingspans, dispersal distance estimates, variances and sample sizes that would exist in typical data sets. These were used to demonstrate the influence of informative priors on natal dispersal distances. We chose four species (European turtle dove *Streptopelia turtur*, northern goshawk *Accipiter gentilis*, willow tit *Parus montanus* and grey wagtail *Motacilla cinerea*) whose original dispersal estimates were obtained from a range of sample sizes (*n* = 4, 9, 14 and 20, respectively). For each species, we had information on the arithmetic mean (AM) and standard deviation (SD) of the natal dispersal distances. When the data and the prior are distributed normally, the posterior will also have a normal distribution. In this case, it is relatively straightforward to calculate the mean and the variance of the posterior distribution based on estimates of the mean and variance of the data and prior (McCarthy 2007). We assumed that natal dispersal distances (*d*) were distributed log-normally, so ln(*d*) had a normal distribution. The mean, μ_{data}, and standard deviation, σ_{data}, of the corresponding normal distribution are equal to ln(AM) − 0·5 ln(*c*) and √ln(*c*), respectively, where *c* is equal to SD^{2}/AM^{2} + 1. We used WinBUGS to estimate μ_{prior} and σ_{prior}, the mean and standard deviation of the corresponding normal distribution of the predicted natural logarithm of median natal dispersal distance, for each of the four species. We then estimated the mean and variance of the posterior distribution (μ_{post} and σ^{2}_{post}) according to McCarthy (2007):
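The two steps described above can be sketched in Python. The moment matching uses the standard log-normal identities, σ² = ln(1 + SD²/AM²) and μ = ln(AM) − σ²/2. The update is the standard precision-weighted combination of two normal distributions; treating σ_{data} as the standard deviation of individual log-distances, so that the data precision is *n*/σ²_{data}, is an assumption made here. All numerical inputs are hypothetical and the function names are ours.

```python
import math

def lognormal_params(am, sd):
    """Mean and SD of ln(d), given the arithmetic mean and SD of d."""
    c = 1.0 + (sd / am) ** 2
    sigma = math.sqrt(math.log(c))
    mu = math.log(am) - 0.5 * math.log(c)
    return mu, sigma

def normal_posterior(mu_prior, sd_prior, mu_data, sd_data, n):
    """Precision-weighted normal-normal update (assumes data precision n / sd_data^2)."""
    prec_prior = 1.0 / sd_prior ** 2
    prec_data = n / sd_data ** 2
    var_post = 1.0 / (prec_prior + prec_data)
    mu_post = var_post * (mu_prior * prec_prior + mu_data * prec_data)
    return mu_post, var_post

# Hypothetical inputs, for illustration only.
mu_data, sd_data = lognormal_params(am=10.0, sd=12.0)
mu_post, var_post = normal_posterior(mu_prior=1.5, sd_prior=0.79,
                                     mu_data=mu_data, sd_data=sd_data, n=9)
# Back-transform the posterior mean of ln(d) to a median distance.
median_post = math.exp(mu_post)
```

Under this form, the posterior mean always lies between the prior mean and the data mean, and the posterior variance is smaller than either the prior variance or the sampling variance of the data mean.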

The posterior distributions were then back-transformed to be expressed in units of km. The posterior distribution is a weighted average of the data and the prior, and so including predictions of dispersal distance from the general model as informative priors will reduce the variance (increase the precision) of the posterior distribution. The degree by which precision increases depends on how informative the prior is relative to the information content of the data.

To demonstrate the use of predicted dispersal distances as *a priori* estimates where no other dispersal information is available, we investigated the relationship between predicted dispersal distance and response to habitat fragmentation in woodland bird species in northern Victoria, Australia. Radford & Bennett (2007) investigated the effect of landscape change on the incidence of woodland bird species in 24 agricultural landscapes. Their study provides a good opportunity to evaluate the application of predicted priors for investigating relationships between dispersal distance and response to habitat fragmentation. In a study area covering 20 500 km^{2} of agricultural–woodland mosaic, they selected 24 landscapes, each 10 × 10 km, to represent a gradient in remnant tree cover and to contrast landscapes in which tree cover was ‘aggregated’ with those in which tree cover was ‘dispersed’ (Radford & Bennett 2007). Pairs of landscapes were chosen that had similar tree cover but contrasting aggregation. In each landscape, 10 survey sites were established in remnant wooded vegetation. Three sites were allocated to riparian vegetation and the remaining seven distributed amongst large (>40 ha) remnants, small (<40 ha) remnants, roadside vegetation and scattered farmland trees according to the proportional representation of each category in the landscape (Radford & Bennett 2007). Species presence was recorded during four 30-min bird surveys conducted along a 400 m line-transect at every site. All species heard or seen during the allocated survey time were recorded as present. Each of the 240 sites was surveyed twice in the breeding season and twice in the nonbreeding season (Radford & Bennett 2007).

We constructed prevalence models for 57 bird species considered to be woodland dependent. As in the original study, each landscape represents a single sampling unit (*n* = 24) and the incidence of each species in each landscape is the response variable. In this case, the incidence of each species (the number of surveys in which the species was present) is the realization of 40 Bernoulli trials, each with a probability, *p*, the proportion of sites in the landscape where the species is observed, which we refer to as prevalence. Our aim was not to build the best possible prevalence model; rather, we wanted to build a model whose outcome would allow us to assess any relationship between predicted dispersal ability and response to fragmentation. Habitat aggregation is more likely to capture the difference in distance between patches than habitat cover *per se* and was therefore the best choice of variable available to us. We constructed a model that relates *p*_{ij}, the prevalence of species *i* in landscape *j*, to the aggregation (*agg*_{j}) of tree cover using the logit link (Agresti 1996):

$$Y_{ij} \sim \mathrm{Binomial}(40,\ p_{ij}), \qquad \mathrm{logit}(p_{ij}) = \kappa_i + \gamma_i\, agg_j + \eta_j + \varphi_{ij}$$

where κ_{i} and γ_{i} are the intercept and regression coefficient for species *i*, and *Y*_{ij} is the observed number of presences of species *i* in landscape *j*, and η_{j} and φ_{ij} are random effects; η_{j} represents additional variation between landscapes and φ_{ij} extra-binomial variation between species and landscapes. The parameters η_{j}, φ_{ij} and κ_{i} were each assumed to have been drawn from a normal distribution with a mean and standard deviation to be estimated. The prior distribution for each mean was specified as normal with a mean of zero and a standard deviation of 1000. Prior distributions for the standard deviations were specified as uniform, ranging between 0 and 100.
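A minimal Python sketch of this likelihood follows, assuming the additive logit form implied by the definitions above (species intercept, species-specific aggregation slope and the two random effects) and 40 surveys per landscape. The parameter values are hypothetical and the function names are ours.

```python
import math

N_SURVEYS = 40  # 10 sites per landscape, each surveyed four times

def inv_logit(x):
    """Inverse of the logit link."""
    return 1.0 / (1.0 + math.exp(-x))

def prevalence(kappa_i, gamma_i, agg_j, eta_j=0.0, phi_ij=0.0):
    """Prevalence of species i in landscape j on the probability scale."""
    return inv_logit(kappa_i + gamma_i * agg_j + eta_j + phi_ij)

def binomial_log_lik(y, n_trials, p):
    """Log-probability of y presences in n_trials surveys, for 0 < p < 1."""
    return (math.log(math.comb(n_trials, y))
            + y * math.log(p)
            + (n_trials - y) * math.log(1.0 - p))

# Hypothetical parameter values, for illustration only.
p_ij = prevalence(kappa_i=-1.0, gamma_i=0.8, agg_j=0.5, eta_j=0.1, phi_ij=-0.05)
log_lik = binomial_log_lik(y=12, n_trials=N_SURVEYS, p=p_ij)
```

A positive γ_{i} raises the prevalence of species *i* as tree cover becomes more aggregated, and a negative value lowers it.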

Estimates of γ_{i} can be used to infer the strength of the influence of tree cover aggregation on the presence of each species in the study area. Values of greater magnitude indicate a stronger influence than those values close to zero. Comparison of values of γ_{i} and predicted dispersal distance for each species allows inference about the relationship between predicted dispersal ability and sensitivity to fragmentation of tree cover for woodland-dependent bird species in northern Victoria. We constructed a hierarchical model in which the value of γ_{i} depends on the predicted dispersal distance of species *i*:

$$\gamma_i = \theta + \delta \hat{D}_i + \zeta_i$$

where θ is the intercept, $\hat{D}_i$ is the predicted median dispersal distance for species *i*, ζ_{i} is a random effect term describing variation in the response of species *i* to aggregation, γ_{i}, not explained by $\hat{D}_i$, and δ is the slope of the relationship between median dispersal distance and response to increasing aggregation of tree cover in the landscape. Uninformative normal priors (mean = 0 and standard deviation = 1000) were specified for δ and θ. The prior for ζ_{i} was specified as normal with a mean of zero and standard deviation to be estimated from the data. Dispersal distances were predicted using the best dispersal model, parameterized on a combined data set including the initial modelling data as well as the test data. The standard deviation of posterior estimates of median dispersal distance from the dispersal model was constant across species (average = 0·79). To include uncertainty in dispersal estimates in this analysis, $\hat{D}_i$ was drawn from a normal distribution with a mean equal to the predicted median dispersal distance for species *i* and a common standard deviation of 0·79. Sex was unspecified in dispersal predictions.
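The propagation of prediction uncertainty into γ_{i} can be sketched by simulation. This is an illustration only: the linear form (intercept plus slope times predicted dispersal plus a species-level random effect) follows the definitions above, the common standard deviation of 0·79 is taken from the text, and the values of θ, δ and the predicted distance are hypothetical. The function names are ours.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the sketch is reproducible
PRED_SD = 0.79           # common SD of the dispersal predictions, from the text

def draw_predicted_dispersal(pred_mean, n_draws):
    """Draw the uncertain dispersal covariate from Normal(pred_mean, PRED_SD)."""
    return [rng.gauss(pred_mean, PRED_SD) for _ in range(n_draws)]

def gamma_given_dispersal(theta, delta, d_hat, zeta=0.0):
    """Species response to aggregation as a linear function of predicted dispersal."""
    return theta + delta * d_hat + zeta

# Hypothetical values, for illustration only.
theta, delta, pred_mean = 0.4, -0.15, 2.0

d_draws = draw_predicted_dispersal(pred_mean, n_draws=10000)
gamma_draws = [gamma_given_dispersal(theta, delta, d) for d in d_draws]

gamma_mean = statistics.mean(gamma_draws)
gamma_sd = statistics.stdev(gamma_draws)  # close to abs(delta) * PRED_SD
```

Treating the covariate as a distribution rather than a point value widens the uncertainty in γ_{i} in proportion to the magnitude of δ.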

When compared with the global data set, the precision of these median dispersal distance predictions can be expressed in terms of the effective sample size, *n*. If the coefficient of variation in the data is expressed as:

and the coefficient of variation of the predicted median dispersal distances is expressed as

then, assuming a common mean and standard deviation,

and

In our study, the coefficient of variation in the data and predictions was greater for shorter dispersal distances. The relationship was such that the natural log of the coefficient of variation and the natural log of the median dispersal distance were negatively, linearly correlated. This means that the effective sample size of the priors is highest for short dispersal species and lower for long distance dispersers. The slope of this relationship was different for our predictions and the published data. Allowing for the different relationships between CV and dispersal distance, we modelled the effective sample size provided by the Bayesian priors for a range of dispersal distances in OpenBUGS and estimated the effective number of observations across all 57 species. Detailed workings and code are provided in the Supporting Information.
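One way to read the effective sample size argument is sketched below. It is an assumption made here, not necessarily the exact formulation in the Supporting Information, that the published data and the predictions share a mean μ and standard deviation σ, and that a prediction behaves like the mean of *n* independent observations. The subscripts 'data' and 'pred' are our notation.

$$\mathrm{CV}_{\mathrm{data}} = \frac{\sigma}{\mu}, \qquad \mathrm{CV}_{\mathrm{pred}} = \frac{\sigma/\sqrt{n}}{\mu} = \frac{\mathrm{CV}_{\mathrm{data}}}{\sqrt{n}} \quad\Longrightarrow\quad n = \left(\frac{\mathrm{CV}_{\mathrm{data}}}{\mathrm{CV}_{\mathrm{pred}}}\right)^{2}$$

Under this reading, a prediction whose coefficient of variation is half that of the raw data would carry the information of about four observations.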