## Introduction

Effective population size (*N*_{e}) is widely regarded as one of the most important parameters in both evolutionary biology (Charlesworth 2009) and conservation biology (Nunney and Elam 1994; Frankham 2005), but it is notoriously difficult to estimate in nature. Logistical challenges that constrain the ability to collect enough demographic data to calculate *N*_{e} directly have spurred interest in genetic methods that can provide estimates of this key parameter, based on measurements of genetic indices that are affected by *N*_{e} (reviewed by Wang 2005). Although some early proponents suggested that indirect genetic estimates of *N*_{e} would only be useful in cases where the natural population was so large it could not be counted effectively, it was subsequently pointed out that these methods have much greater power if population size is small. Indeed, the rapid increase in applications in recent years has been fueled largely by those interested in conservation issues or the study of evolutionary processes in local populations that often are small (Schwartz et al. 1999, 2007; Leberg 2005; Palstra and Ruzzante 2008).

Estimates of contemporary effective size (roughly, the *N*_{e} that applies to the time period encompassed by the sampling effort) can be based on either a single sample (Hill 1981; Pudovkin et al. 1996) or two samples (Krimbas and Tsakas 1971; Nei and Tajima 1981). The two-sample (temporal) method, which depends on random changes in allele frequency over time, has been by far the most widely applied, and it was the only method considered in a recent meta-analysis of genetic estimates of *N*_{e} in natural populations (Palstra and Ruzzante 2008). This is curious, given that every temporal estimate requires at least two samples, each of which could be used to provide a separate single-sample estimate of *N*_{e}. Furthermore, whereas the amount of data used by the temporal method increases linearly with the number of loci (*L*) or alleles (*K*), the amount of data used by the most powerful single-sample estimators increases with the square of *L* and *K*. This suggests that, given the large numbers of highly polymorphic molecular markers currently available, there is a large, untapped (or at least underutilized) resource that could be exploited more effectively to extract information about effective size in nature.
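The linear versus quadratic scaling can be illustrated by counting independent comparisons. The sketch below assumes one common convention: a temporal estimator uses (*K* − 1) independent allele comparisons per locus, whereas an LD estimator uses (*K*_{i} − 1)(*K*_{j} − 1) per pair of loci. The function names `temporal_df` and `ld_df` are hypothetical and chosen only for illustration.

```python
from itertools import combinations


def temporal_df(alleles_per_locus):
    """Independent comparisons for a temporal estimator (assumed convention):
    sum over loci of (K_i - 1)."""
    return sum(k - 1 for k in alleles_per_locus)


def ld_df(alleles_per_locus):
    """Independent comparisons for an LD estimator (assumed convention):
    sum over all pairs of loci of (K_i - 1)(K_j - 1)."""
    return sum((ki - 1) * (kj - 1)
               for ki, kj in combinations(alleles_per_locus, 2))


# Compare a baseline panel with panels that double L or double K.
for n_loci, n_alleles in [(10, 10), (20, 10), (10, 20)]:
    panel = [n_alleles] * n_loci
    print(n_loci, n_alleles, temporal_df(panel), ld_df(panel))
# 10 10 90 3645
# 20 10 180 15390
# 10 20 190 16245
```

Doubling either *L* or *K* roughly doubles the temporal count but roughly quadruples the LD count, which is the pattern described in the paragraph above.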

Toward that end, in this study we evaluate precision and bias of the original single-sample method for estimating *N*_{e}: the method based on random linkage disequilibrium (LD) that arises by chance each generation in finite populations (Laurie-Ahlberg and Weir 1979; Hill 1981). In the moment-based LD method, accuracy depends on deriving an accurate expression for the expectation of a measure of LD (*r*^{2}) as a function of *N*_{e}. Because *r*^{2} is a ratio, deriving its expected value is challenging, and the original derivation, which ignored second-order terms, was subsequently shown to lead to substantial biases in some circumstances (England et al. 2006). An empirically derived adjustment to *E*(*r*^{2}) (Waples 2006) has addressed the bias problem, but that correction was based on simulated data for diallelic gene loci and did not consider precision in any detail. Although *r*^{2} is a standardized measure of LD, the standardization does not completely remove the effects of allele frequency (Maruyama 1982; Hudson 1985; Hedrick 1987). It is therefore necessary to evaluate the LD method more rigorously using simulated data for highly polymorphic markers (now in widespread use), which include many alleles that can drift to low frequencies. Specifically, we ask the following questions:

- How is precision affected by factors under the control of the investigator (*L*, *K*, number of individuals sampled) and by those that are not [true (unknown) *N*_{e}]?
- What effect do rare alleles have on precision and bias?
- What practical guidelines can help balance tradeoffs between precision and bias?
- Under what conditions can the LD method provide useful information for practical applications? If *N*_{e} is small, how often does the method mistakenly estimate a large *N*_{e}? If *N*_{e} is large, how often does the method mistakenly estimate a small *N*_{e}?
- What kind of performance can we expect when data consist of a very large number of diallelic, single-nucleotide-polymorphism (SNP) markers?
- How does performance of the LD method compare to that of other methods for estimating contemporary *N*_{e}?
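To make the quantities behind the LD method concrete, the sketch below computes *r*^{2} for two diallelic loci from a haplotype frequency and the allele frequencies, shows that the largest attainable *r*^{2} still depends on allele frequencies, and inverts the first-order approximation *E*(*r*^{2}) ≈ 1/(3*N*_{e}) + 1/*S* (unlinked loci, random mating, sample size *S*). This first-order inversion is the version that ignores second-order terms and is known to be biased; the Waples (2006) adjustment is not reproduced here. The sketch also assumes phased haplotype data, whereas real applications typically work from unphased genotypes with composite LD measures. All function names are hypothetical.

```python
import math


def r_squared(p_ab, p_a, p_b):
    """r^2 between two diallelic loci from the AB haplotype frequency (p_ab)
    and the frequencies of alleles A (p_a) and B (p_b)."""
    d = p_ab - p_a * p_b  # linkage disequilibrium coefficient D
    return d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))


def max_r_squared(p_a, p_b):
    """Largest r^2 attainable with positive D when allele frequencies are fixed;
    it falls below 1 whenever p_a != p_b."""
    d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    return d_max ** 2 / (p_a * (1 - p_a) * p_b * (1 - p_b))


def naive_ld_ne(mean_r2, sample_size):
    """First-order moment estimate from E(r^2) ~ 1/(3 Ne) + 1/S.
    Illustrative only: this is the biased, uncorrected form."""
    drift_r2 = mean_r2 - 1.0 / sample_size  # subtract expected sampling contribution
    if drift_r2 <= 0:
        return math.inf  # no LD signal beyond sampling noise
    return 1.0 / (3.0 * drift_r2)


print(r_squared(0.5, 0.5, 0.5))   # complete LD with equal frequencies
print(max_r_squared(0.5, 0.1))    # ceiling well below 1 for unequal frequencies
print(naive_ld_ne(0.03, 50))      # roughly 33 under the first-order approximation
```

The second call illustrates why *r*^{2} is only partly standardized: with allele frequencies of 0.5 and 0.1, even maximal disequilibrium yields *r*^{2} of about 0.11.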