## Introduction

The coalescent process can be decomposed into two independent processes: the jump chain of the coalescent process, that is, the topology of the gene genealogy; and the sequential process of intercoalescence times (Kingman, 1982). The latter is well studied for the standard coalescent process in a constant-sized population with infinite-site mutations. It is known that the waiting times between coalescent events are independent and exponentially distributed, and that mutations occurring along the genealogy follow a Poisson process. For gene genealogies in a population with temporally varying size, the intercoalescence times are no longer mutually independent. Griffiths & Tavare (1994a) provided a general equation for time-varying populations and proposed an importance sampling algorithm for calculating the likelihood function for haplotypes sampled therein. Wooding & Rogers (2002) and Polanski et al. (2003) derived analytical equations for the marginal distributions of coalescence times in populations of time-varying sizes. Statistical methods based on their results were developed and applied to infer population bottlenecks and growth rates.
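As a minimal illustration of the constant-size case described above, the following Python sketch draws intercoalescence times for a sample of *n* lineages under the standard coalescent. It assumes time is measured in coalescent units, so that each pair of lineages coalesces at rate 1 and the waiting time while *k* lineages remain is exponential with rate *k*(*k* − 1)/2. The function names and the sample size are illustrative choices, not notation from the cited papers.

```python
import random


def intercoalescence_times(n, rng):
    """Draw waiting times while k = n, n-1, ..., 2 lineages remain.

    Assumes the standard coalescent in a constant-sized population, with
    time in coalescent units so that each pair coalesces at rate 1.
    """
    times = []
    for k in range(n, 1, -1):
        rate = k * (k - 1) / 2.0  # number of lineage pairs
        times.append(rng.expovariate(rate))
    return times


def mean_tmrca(n, replicates, seed=1):
    """Monte Carlo estimate of the expected time to the MRCA."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(replicates):
        total += sum(intercoalescence_times(n, rng))
    return total / replicates


if __name__ == "__main__":
    n = 10
    print("simulated E[TMRCA]:", mean_tmrca(n, 20000))
    print("theoretical value: ", 2.0 * (1.0 - 1.0 / n))
```

Under these assumptions the simulated mean time to the MRCA should be close to 2(1 − 1/*n*), which gives a quick sanity check on the sampler.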

The gene genealogy discussed in this paper differs from those in previous studies. In the above studies, gene genealogies of current haplotypes, when traced back in time, eventually reach their most recent common ancestor (MRCA). However, consider the scenario in which a population or demographic event occurred at some time *T* in the past: by time *T*, the lineages may or may not have reached their common ancestor. When the lineages have not reached their common ancestor by time *T*, the gene genealogy is “incomplete,” and is referred to as a “truncated genealogy” (TG) in this paper. For a TG, the distribution of its intercoalescence times conditional on the truncation time *T* differs from the unconditional distribution for complete genealogies in classic coalescent theory, but has seldom been investigated in the literature. Blum & Rosenberg (2007) derived the conditional distribution of , and used a rejection sampling algorithm developed from the distribution to infer ancient lineage number. Chen (2012) derived the marginal distribution of intercoalescence times conditional on a TG for populations of constant size and for populations whose size changes only instantaneously, the so-called *n*-epoch model. The derived marginal distribution was subsequently used to derive analytically the allele frequency spectrum (AFS, also known as the site frequency spectrum in the literature) and the joint allele frequency spectrum (JAFS). However, the marginal distribution of intercoalescence times was not addressed for a population whose size varies deterministically and continuously over time, which is more realistic than a population whose size jumps only at discrete time points. For example, the simple exponential growth model is a commonly used approximation for modern human demography (Slatkin & Hudson, 1991; Marjoram & Donnelly, 1994; Di Rienzo et al., 1998; Wall & Przeworski, 2000; Adams & Hudson, 2004; Voight & Pritchard, 2005). To infer demographic history without the error of model misspecification, it is necessary to incorporate populations of temporally varying sizes into the model.
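To make the notion of a truncated genealogy concrete, the sketch below simulates coalescence times under exponential growth and counts how many lineages remain at the truncation time *T*. It is a simulation sketch only, not the analytical distribution derived in this paper. It assumes that the relative population size *t* units back in time is exp(−*rt*) with *r* > 0, that time is scaled by the present-day population size, and that the usual time-change argument applies (a cumulative time τ in the constant-size coalescent corresponds to *t* = ln(1 + *r*τ)/*r*). The names `growth_rate` and `trunc_time` and all parameter values are hypothetical.

```python
import math
import random


def coalescence_times_growth(n, growth_rate, rng):
    """Cumulative coalescence times (looking backward) under exponential growth.

    Assumption: relative population size t units in the past is
    exp(-growth_rate * t), with growth_rate > 0 and time scaled by the
    present-day population size.
    """
    tau = 0.0  # cumulative time in the constant-size (standard) coalescent
    times = []
    for k in range(n, 1, -1):
        tau += rng.expovariate(k * (k - 1) / 2.0)
        # Time change: map standard-coalescent time to real time.
        times.append(math.log1p(growth_rate * tau) / growth_rate)
    return times


def lineages_at(times, trunc_time, n):
    """Number of ancestral lineages still present at trunc_time."""
    events_before = sum(1 for t in times if t <= trunc_time)
    return n - events_before


def truncated_fraction(n, growth_rate, trunc_time, replicates, seed=1):
    """Fraction of simulated genealogies still incomplete at trunc_time."""
    rng = random.Random(seed)
    incomplete = 0
    for _ in range(replicates):
        times = coalescence_times_growth(n, growth_rate, rng)
        if lineages_at(times, trunc_time, n) > 1:
            incomplete += 1
    return incomplete / replicates


if __name__ == "__main__":
    # Hypothetical values chosen only to exercise the functions.
    print(truncated_fraction(n=20, growth_rate=5.0, trunc_time=0.2, replicates=5000))
```

A genealogy counts as truncated whenever more than one lineage survives to `trunc_time`; the per-replicate value returned by `lineages_at` is the ancient lineage number that the inference methods discussed here aim to estimate.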

The work presented in this paper follows the theoretical framework of Chen (2012), but generalizes it to populations with deterministically time-varying sizes, specifically, populations under exponential growth. This is an essential step toward deriving the AFS, which summarizes the pattern of genetic polymorphism and can afterwards be used to construct various methods for statistical inference in nonequilibrium populations. After obtaining the distribution and expectation of the time lengths of truncated genealogies in temporally varying populations, we emphasize the potential applications of the theoretical results to population genetic inference in nonequilibrium populations. We present two applications that use the intercoalescence time distribution: inferring the rate and onset time of exponential population growth; and inferring the number of ancient lineages or founding lineages of samples from the current population at a specific ancient time.

Human populations have experienced frequent bottlenecks and growth. In recent years, with the advent of genomic sequencing data from various populations, reconstructing demographic history from genetic data has become an active research area (Novembre et al., 2008; Gutenkunst et al., 2009; HUGO Pan-Asian SNP Consortium, 2009; Reich et al., 2009; Tishkoff et al., 2009; Li & Durbin, 2011; Gronau et al., 2011; Reich et al., 2012). The results presented in this paper can be a useful component for constructing a coalescent likelihood to infer demographic history with a temporally varying population size. As an illustration, we elaborate on a three-parameter model for a single population undergoing two demographic stages: a period of constant population size followed by exponential growth. The three parameters of interest are the onset time, the rate of exponential growth, and the ancient population size. More complicated demographic models for multiple populations can also be built to derive the joint allele frequency spectrum, as was done in Chen (2012), and the resulting JAFS can then be used to infer the demographic history of multiple populations on a fine scale (Gutenkunst et al., 2009; Lukic et al., 2011; Chen, 2012).
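One way to write the two-stage model described above is as a piecewise population-size function. The notation is an assumption made here for illustration: $t$ is time measured backward from the present, $T$ is the onset time of growth, $r$ is the exponential growth rate, and $N_A$ is the ancient (pre-growth) population size.

$$
N(t) =
\begin{cases}
N_A \, e^{r (T - t)}, & 0 \le t < T, \\
N_A, & t \ge T.
\end{cases}
$$

Under this parameterization the present-day size is $N(0) = N_A e^{rT}$, so it is determined by the three parameters rather than being a fourth free parameter.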

Inferring the number of ancient lineages of a contemporary sample or population is another topic of great interest in population genetics and ecological studies (Hey, 2005; Anderson & Slatkin, 2007; Blum & Rosenberg, 2007; Leblois & Slatkin, 2007). Estimating the founding lineage number at a specific ancient time, especially during a bottleneck or founding event, helps to elucidate ancient admixture, founder effects, their impact on the genetic diversity of modern populations, and the presence of Mendelian diseases in isolated populations (Risch et al., 2003), and helps to understand species invasion in ecological studies (Dlugosch & Parker, 2007). The existing methods are developed under the coalescent framework and require computationally intensive approaches, such as importance sampling (Anderson & Slatkin, 2007), coalescent simulation (Leblois & Slatkin, 2007) and rejection sampling (Blum & Rosenberg, 2007). Compared with the existing methods, the new method for inferring lineage number presented in this paper benefits from the analytical form of the intercoalescence time distribution, and is thus more computationally efficient.

We further applied the methods to two real data sets: (1) resequencing data of 20 European haplotypes from the NIEHS Environmental Genome Project (EGP), to infer the rate and onset time of population growth in West Europeans; and (2) 31 Tibetan mitochondrial genomes, to infer the number of ancient lineages in the late Paleolithic age. We expect the methods developed in this paper to be useful tools for population genetic inference with the coming influx of large-scale genomic data from human populations and other species.