The implications of small stem cell niche sizes and the distribution of fitness effects of new mutations in aging and tumorigenesis

Abstract Somatic tissue evolves over a vertebrate's lifetime due to the accumulation of mutations in stem cell populations. Mutations may alter cellular fitness and contribute to tumorigenesis or aging. The distribution of mutational effects within somatic cells is not known. Given the unique regulatory regime of somatic cell division, we hypothesize that mutational effects in somatic tissue fall into a different framework than whole organisms; one in which there are more mutations of large effect. Through simulation analysis, we investigate the fit of tumor incidence curves generated using exponential and power‐law distributions of fitness effects (DFE) to known tumorigenesis incidence. Modeling considerations include the architecture of stem cell populations, that is, a large number of very small populations, and mutations that do and do not fix neutrally in the stem cell niche. We find that the typically quantified DFE in whole organisms is sufficient to explain tumorigenesis incidence. Further, deleterious mutations are predicted to accumulate via genetic drift, resulting in reduced tissue maintenance. Thus, despite there being a large number of stem cells throughout the intestine, its compartmental architecture leads to the accumulation of deleterious mutations and significant aging, making the intestinal stem cell niche a prime example of Muller's Ratchet.


Evolution in somatic tissue
The epithelial tissues within many animals are continually replenished by populations of stem cells that divide throughout the organism's lifetime. For instance, the epithelial lining of the intestinal tract is replaced weekly by millions of independent populations of stem cells located in intestinal crypts [reviewed in Barker (2014)]. This continual division provides an opportunity for mutation, resulting in the accumulation of mutant lineages and somatic evolution (Lynch 2010). Stem cell lineages with decreased fitness, or a diminished ability to divide and survive, will represent a failure in this tissue renewal process and the aging of tissues and multicellular organisms as a whole (L opez-Ot ın et al. 2013;Moskalev et al. 2013). Lineages with increased fitness, or faster division rates and an increased propensity to survive, will result in the accumulation of cells and neoplasia (Merlo et al. 2006). Although considered premalignant at the onset, the accumulation of cells into a polyp, in which cells continually divide and accumulate subsequent mutations, can develop a cancerous phenotype over time (Winawer 1999).

Distribution of fitness effects
The effect that a new mutation will have on an individual's fitness can be characterized by a distribution of fitness effects (DFE). The DFE of several organisms have been experimentally estimated using mutation accumulation experiments or directed mutagenesis experiments in the laboratory (Eyre-Walker and Keightley 2007;Halligan and Keightley 2009). The majority of random mutations to a genome that affect fitness have a deleterious effect on fitness, while a small subset increase fitness (Eyre-Walker and Keightley 2007). Additionally, many mutations that affect fitness have a small effect, while few have a large effect. In general, both beneficial (Imhof and Schlotterer 2001;Orr 2003;Kassen and Bataillon 2006) and deleterious (Elena et al. 1998) mutational fitness effects can be described well using an exponential distribution. We note that certain beneficial DFE may not be exponentially distributed, and are better classified as having either a bounded or heavier-than-exponential tail (Rokyta et al. 2008;Bank et al. 2014;Levy et al. 2015), and compound distributions or distributions with more parameters may better fit empirical measures of DFE (Sanju an et al. 2004).
By understanding the mutational DFE in somatic tissue, we can predict the evolutionary trajectories of tissues within multicellular organisms as they age. Absolute fitness is typically measured as the reproductive success of a genetically identical lineage, which can be measured empirically as the growth rate of a population and interpreted ecologically as the death rate of individuals in a population subtracted from the birth rate. Within stem cell populations, this is analogous to the differentiation rate of the stem cell lineage subtracted from the division rate. However, the total growth rate of the healthy stem cell population is necessarily zero to insure tissue homeostasis. As we describe in the next section, the stem cells exist in two populations: a static niche population with cells that are undergoing division and migrating into the second population, containing cells that are undergoing division and differentiation. Therefore, although only mutations to stem cell division rate would confer a change in selection pressure within the stem cell niche, fixed mutations to both division and differentiation rate will alter the rate of growth of the total stem cell lineage, the expected size of the stem cell population as a whole, and contribute to the probability of a tumorigenesis event. Hence, mutations to division and differentiation rate affect the reproductive success of stem cell lineages, that is, fitness, and we consider distributions of mutational effects on these two rates in this work. Although there has been no direct measurement of the distribution of fitness effects in somatic tissue [but see Vermeulen et al. (2013), Snippert et al. (2014) for estimations of the selective advantage for some known cancer drivers], the evolution of cancer progression has been previously modeled using discrete (Beerenwinkel et al. 2007;Bozic et al. 2010;McFarland et al. 2013) and continuous (Foo et al. 2011) fitness effects. Here, we differ from these previous models by investigating different mutational effect frameworks using parameters derived from whole organisms to explore mutation accumulation in crypts initialized at their measured healthy size in mice and humans and quantify both aging and tumorigenesis.
When quantifying tumor incidence, we are concerned with the moment that the regulatory regime in the intestinal crypt breaks down: when the stem cell division rate exceeds its differentiation rate. We call this point the tumorigenesis threshold. The resulting population will accumulate stem cells without bound, which is thought to be the cause of crypt fission and the main mechanism of polyp or adenoma growth (Loeffler and Grossmann 1991;Wong et al. 2002). We investigate the full spectrum of deleterious and beneficial mutational effects on the progression of a healthy crypt to tumor initiation using empirically measured rates of division.
The evolution of multicellularity has necessitated the evolution of regulatory systems that hold somatic stem cells at a relatively low fitness (when compared to their maximum potential) in order to ensure the cooperation of the different cellular systems constituting a whole organism. As such, in addition to the beneficial mutations that would be accounted for by a typical DFE for whole organisms [which are commonly assumed to already be highly fit (Orr 2010)], we expect mutations of large effect in somatic tissue as regulatory processes become dysfunctional, such as the deactivation of tumor suppressor genes or the activation of oncogenes. It is reasonable to hypothesize that a heavy-tailed distribution could better classify mutational effects that have a beneficial effect in somatic stem cells by capturing both the mutations of small effect and also having a nontrivial probability of capturing the mutations of large effect often associated with cancer.
We evaluate whether or not the DFE estimated in whole organisms can explain known tumor incidence in the intestine. Further, we explore whether or not tumor incidence is better explained by a heavy-tailed distribution for mutations beneficial to fitness. Thus, we create a model of an evolving intestinal stem cell pool and implement alternate DFE and compare the resultant incidence curves to known tumor incidence curves.

Crypt population structure
The base of each intestinal crypt harbors a population of symmetrically dividing cells expressing markers associated with the stem cell phenotype (Lopez-Garcia et al. 2010;Snippert et al. 2010). Within this population, there exists a subpopulation niche that is responsible for maintaining tissue homeostasis (Kozar et al. 2013;Vermeulen et al. 2013). We model the stem cells of the intestinal crypt as two populations of cells, the first being this stem cell niche, which consists of a fixed population of stem cells, N, and the second consisting of the stem cells displaced from this niche but not yet committed to differentiation. The sum of these populations represent the total number of stem cells within the crypt, N T . In our model, cells within the niche divide at rate k and displace their neighbors through overcrowding, as proposed by Lopez-Garcia et al. (2010) and revealed by in vivo live imaging by Ritsma et al. (2014). This population of cells experiences genetic drift and selection; cells that have a higher division rate are more likely to push their neighbors out of the niche [as demonstrated by Snippert et al. (2014)] and cells with lower division rates are more likely to be displaced. Mutations may occur at division with mutation rate l and result in either a lineage with a new division rate or a lineage with a new rate of committing to differentiation. Displaced stem cells divide at the rate of their progenitor cells in the niche and commit to differentiation at rate m, hereafter referred to as the differentiation rate. We assume that once a lineage commits to differentiation, it is destined to be expelled from the crypt. We define tumorigenesis in the crypt as the moment a lineage of stem cells with a division rate greater than its differentiation rate has become fixed in the niche, resulting in exponential population growth. We note that, although a stem cell's propensity to commit to differentiation in healthy tissue is partially dependent on external signaling queues, such as Wnt signals from Paneth cells in the small intestinal crypt stem cell niche (Clevers 2013), the ability of a stem cell to interpret and respond to, or even gain independence from, external signals is an intrinsic and heritable property of the stem cell (Reya and Clevers 2005). The parameters k, N, and N T have been previously estimated (Kozar et al. 2013;Vermeulen et al. 2013), and we calculate m according to eqn 4 in the Appendix, where k and y is the average number of stem cells outside of the niche.

Distribution of fitness effects
We first describe our model of mutations that affect the division rate of stem cells and address mutations that affect differentiation rate later in section "Mutations that alter the differentiation rate of stem cells result in rapid aging and tumorigenesis" When mutations occur, the new division rate is greater than the previous rate with probability P B , and the mean positive change of rate is s þ . We consider positive and negative changes that are exponentially distributed for deleterious effects and exponentially or Pareto distributed for beneficial effects, see the Appendix. The mean negative change is s À . We define the exponential DFE in eqn 1 and the power-law DFE in eqn 2.
The power-law distribution is well defined if a > 1 and is considered to be heavy-tailed (having infinite variance) if 1 < a < 3.

Selection assumptions
We are concerned with the mutations that arise and reach fixation within the stem cell niche. Due to drift, all stem cells with the same division rate as the background population have an equal probability of reaching fixation, commonly referred to as neutral drift dynamics (Lopez-Garcia et al. 2010;Snippert et al. 2010). Following Wodarz and Komarova (2005), we use a Moran model to estimate the probability that a mutant lineage fixes in the stem cell niche: where N is the number of cells in the niche. The mutation rate is low relative to the division rate, so we assume that there are at most two competing division rates at any given time.
Using the above formula (3), we can use Bayes' theorem to compute the probability density Uðkjk old Þ of a new fixed division rate k given that the previous division rate is k old : As described above, tumorigenesis occurs when the division rate k is greater than differentiation rate m and we define the point at which this happens to be the tumorigenesis threshold. In our modeling framework, each new fixed mutation presents a new possibility that the division rate exceeds the threshold for tumorigenesis. From (4), we can iteratively derive the sequence of functions ff n g that represent the density of the distribution of the stem cell division rates conditioned that n mutations have fixed in the stem cell niche and tumorigenesis has not occurred as of mutation nÀ1. If we let k 0 denote the initial stem cell division rate, then f 1 ðkÞ ¼ Uðkjk old ¼ k 0 Þ. For each n, let p n denote the probability that tumorigenesis occurs due to the nth mutation (given that n mutations have occurred). Then, From there, we can write the recursive formulae From this, we have a recursive formula for the probabilities, fq n g, that tumorigenesis has not occurred given n fixed mutations: q 1 ¼ 1 À p 1 and To translate this result to an individual's lifetime, we model the time-dependent arrival of new mutations as a Poisson process with fixed rate parameter l mutations per cell division. We keep track of the time-dependent number M(t) of mutations that fix in the stem cell niches by time t. Then, using T to denote the time that tumorigenesis occurs in a given crypt, we can write the probability that tumorigenesis has not occurred as of time t by the equation Depending on the species, an individual has hundreds of thousands or even millions of crypts. The probability that an individual has at least one crypt that has undergone tumorigenesis can be calculated by considering the distribution of fixed mutations that have accumulated among the individual's crypts and the probability that these mutations result in tumorigenesis. This can then be extrapolated to the incidence rate of tumors among a population of individuals ( Fig. 1). Let T represent the time that tumorigenesis first occurs in any of an individual's crypts. We use the following estimate to calculate tumorigenesis incidence data reported in the Results section. In the Supporting Information, we describe the full calculation and a few simplifying assumptions we make to develop a computationally tractable model.
In the above, C is the number of crypts in the length of intestine being investigated, n max is the maximum number of mutations simulated, and l ¼ Nlk 0 R 1 0 p fix ðk; k 0 Þmðk; k 0 Þdk, where, as above, N is the number of cells in the stem cell niche, l is the mutation rate per cell division, and p fix is defined by eqn 3.

Parameter choices
Some estimates of crypt dynamics parameters have shifted over time, for example, the stem cell division rate in the mouse was formerly thought to be once every 1-1.5 days (Lopez-Garcia et al. 2010), but more recent estimates indicate they divide once every 3-10 days (Kozar et al. 2013). Kozar et al. (2013) demonstrated that the division rate of stem cells in the stem cell niche of mice varied from approximately 0.1 to 0.2 to 0.3 divisions per day along the proximal small intestine, distal small intestine, and colon, respectively. Likewise the estimated number of stem cells within the mouse stem cell niche varies from approximately five to six to seven, respectively. The total number of cells in crypts expressing stem cell markers has been reported to be 14-16 in mice (Lopez-Garcia et al. 2010;Snippert et al. 2010;Clevers 2013). For the analysis of our mouse model, we chose the middle value of these parameter ranges, a crypt with 15 total cells expressing stem cell markers, with 6 of the cells constituting the stem cell niche dividing 0.20 times per stem cell per day. To estimate the differentiation rate of stem cells outside the stem cell niche, we used a continuous time Markov chain, described in the Appendix. According to this model, in order for the total stem cells in the crypt of a mouse to stay at a constant population size, the differentiation rate of stem cells outside of the stem cell niche must be 0.333 per stem cell per day. The parameters associated with crypt dynamics in mice have been well described; however, we were unable to obtain any data on population incidence of intestinal polyps or tumors in wild type mice. On the other hand, while crypt dynamics in humans have not been as well studied, there exist incidence data for large intestine polyps (Chapman 1963). To parameterize the human colon crypt system, we considered a few sources. Nicolas et al. (2007) analyzed the methylation patterns within the human colon crypt and their Bayesian analysis suggests a posterior density mode between 15 and 20 stem cells maintaining homeostasis and constituting the stem cell niche within the crypt. Their posterior density provides more support for numbers of stem cells larger than this mode than for numbers smaller, so we chose 20 as an initial value for the number of stem cells within the stem cell niche. Bravo and Axelrod (2013) report an average of 35.7 quiescent stem cells within the human colon crypt through a staining experiment, so we assume there are 36 total stem cells within the human colon crypt. Human colon stem cells divide about once every 7 days (Potten et al. 2003), which would mean they would have to differentiate at a rate of about 0.321 per day to maintain homeostasis at the assumed initial parameters.
We parameterized the initial DFE based on those measured in whole organisms to evaluate whether they can account for known tumorigenesis incidence. The distribution of fitness effects has been estimated in mutation accumulation experiments and directed mutagenesis experiments. We consider the DFE proposed by Joseph and Hall (2004) in a mutation accumulation study because they report the expected effect size of deleterious and beneficial mutations, as well as the mutation rate and the proportion of mutations that were beneficial in a diploid eukaryotic system (Saccharomyces cerevisiae). They report an average beneficial heterozygous fitness effect of 0.061, which is slightly lower but within an order of magnitude of the effect of average beneficial mutation measured for vesicular stomatitis virus of 0.07 (Sanju an et al. 2004) and E.coli of 0.087 (Kassen and Bataillon 2006). They found that 5.75% of accumulated mutations were beneficial and that the overall mutation rate to alleles that alter fitness was 6:3 Â 10 À5 mutations per haploid genome per generation. This would result in a diploid beneficial mutation rate of 2 Â 6:3 Â 10 À5 Â 0:0575 ¼ 7:245 Â 10 À6 : This is within an order of magnitude of the beneficial mutation rate reported for E. coli (Wiser et al. 2013).
Mutation accumulation experiments may not capture the true distribution of fitness effects because they rely on observing the mutations of lineages that survive and persist in a population. Because of this, they are biased against mutations of large deleterious effect. Additionally, the random passaging of individuals to repopulate new generations may result in drastically different estimates of average mutational effect size for the same species. For instance, average deleterious effect of mutations in Saccharomyces cerevisiae has been estimated to be 0.061 (Joseph and Hall 2004), 0.086 (Wloch et al. 2001), and 0.217 (Zeyl and DeVisser 2001). Directed mutagenesis of random genome targets in an RNA virus revealed an average nonlethal deleterious fitness effect of 0.244 (Sanju an et al. 2004). It is likely that the inherent average effect size of a mutation of deleterious effect would be better reflected by the larger estimates because mutations of large deleterious effect may be lost in mutation accumulation experiments. After building our model with DFE parameters estimated from whole organisms, we describe overall patterns of mutation accumulation and risk of tumorigenesis and then we utilized least squares analysis to explore the best fit among a series of plausible choices of l and the expected value of s þ for the human incidence curves and compared to data from Chapman (1963) (best fit figures available as Figs S1-S3). For the division rate scenario, both for the exponential and for power-law DFE, we vary the expected value of s þ from 0.041 to 0.07 and l from 2:5 Â 10 À5 to 7 Â 10 À4 . For the differentiation rate scenario, we vary s þ from 0.041 to 0.07 and l from 2:5 Â 10 À7 to 7 Â 10 À4 .
The model described above was executed using R version 3.1.1. R scripts developed for this study are available at https://github.com/vcannataro/Somatic-Evo-DFE.

Mutations result in both aging and tumorigenesis within the intestine
Because stem cell niche populations are small, it is possible for mutant lineages with a fitness disadvantage to fix in the niche. This, coupled with the fact that the vast majority of mutations that occur will have a deleterious effect on stem cell fitness, results in the expected value of the probability density describing the new division rates to move away from the tumorigenesis threshold with subsequent fixed mutations ( Fig. 2A,C,E). In general, the accumulation of fixed mutations within crypts results in impaired stem cell maintenance and lower stem cell production, contributing to the aging of the tissue and organism.
The probability that a particular fixed mutation will result in tumorigenesis in the crypt, p n (eqn 5), is equal to the area under these densities that crosses the tumorigenesis threshold (Fig. 2B,D). For the initial parameterization in mice and humans, this increases at first, but then decreases with subsequent fixed mutations as the probability densities describing division rate move away from the tumorigenesis threshold.
Predicted incidence curves in mice and humans using DFE derived from a whole organism Using the model described in "Selection-assumptions", we determined the cumulative probability distribution of tumorigenesis within a population of crypts in an individual organism. For mice, using the initial parameters in Table 1 and exponentially distributed beneficial fitness effects, we find that the incidence of tumorigenesis is predicted to increase linearly with age, with close to nine percent of mice experiencing tumorigenesis at 3 years of age (Fig. 3A). Human tumorigenesis incidence in the large intestine is pre-dicted to be approximately 36% at 80 years of age (Fig. 3B), using an exponentially distributed beneficial fitness effects and the initial null parameters from Table 1. The only incidence data for early tumors or polyps were found for the large intestine in humans. The predicted incidence curve derived from an exponentially distributed DFE follows the same qualitative dynamics as the tumor incidence data. Incidence curves that are derived from a power-law distribution using the initial parameters in Table 1 predict nearly 100% tumorigenesis by 80 years of age and do not follow the incidence data dynamics. Hence, we performed a least squares analysis, varying parameters that have not been characterized for human somatic tissue, to find the parameter set in our exploratory space with the best fit to the observed incidence curve to the data.
Altering the expected beneficial fitness effects and the mutation rate provides better fits for both exponential and power-law derived incidence curves The expected mean fitness effects (s þ ,s À ) of the DFE and the mutation rate (l) per division of a mutation that alters the stem cell fitness were inferred from whole organisms as an initial parameter choice (Table 1). A parameter space around the initial choices was explored, and a least squares analysis was performed to find a better fit to the data (additional information in the Appendix, Figs S1-S3). Just the mutation rate (l) and expected beneficial fitness effect (s þ ) are presented because changes to the expected deleterious fitness effect (s À ) had little effect on the resultant tumorigenesis incidence curves. We found that both the exponential and power-law scenarios can provide similarly good fits to the data, however, with distinctly different parameters. The exponential DFE derived curve provided the best fit with the same mutation rate as our initial choice (Table 1), with a slightly larger expected beneficial fitness effect (E½s þ ¼ 0:064, Fig. 4A, red dashed line). Interestingly, assuming the same expected beneficial fitness effect as in Table 1 and varying the mutation rate provides a reasonable fit with a slightly larger mutation rate (l ¼ 1:75 Â 10 À4 , 4A, blue dashed line). The power-law DFE provided a similarly good fit to the incidence curve, but for a parameter space that assumes a much smaller expected beneficial fitness effect and a large mutation rate (E½s þ ¼ 0:044, l ¼ 5 Â 10 À4 , Fig. 4B red dashed line).

Mutations that alter the differentiation rate of stem cells result in rapid aging and tumorigenesis
Mutations affecting differentiation rate influence the lifetime of a stem cell lineage. Mutations that increase differentiation rate will decrease the fitness of the lineage, while mutations that decrease differentiation rate increase fitness.  Table 1 for the mouse. The first density is a green dashed line. Each probability density represents the division rate of a fixed lineage after n fixed mutations, with n indicated by an arrow. (B) Zooming in on the tumorigenesis threshold, we see that the area of the division rate density that is over the tumorigenesis threshold increases at first and then decreases with subsequent mutation. There is a change in slope of the densities at the tumorigenesis threshold because subsequent densities are calculated from the previous density which has had the area to the right of the tumorigenesis threshold removed and the area to the left renormalized to 1. Mutations affecting differentiation rate neutrally drift to fixation in the stem cell niche because the differentiation phenotype is not expressed in the niche, hence all cells divide at the same rate. Thus, the probability of fixation of mutations to differentiation rate is (1/N), regardless of mutational effect. We only considered an exponential mutational effect distribution because the distinction between exponential and power-law distributions is only significant in prevalence of large deviations from the mean, and beneficial mutational effects in this scenario exist between m 0 and zero. Because all mutations that solely affect differentiation rate drift neutrally, and the majority of mutations decrease fitness (by increasing differentiation rate), the majority of fixed mutations move stem cell pools away from the tumorigenesis threshold (Fig. 5).
Mutational effects are typically described as a proportion of the phenotype they are affecting, and as such, the same DFE applied to a larger rate will have a larger absolute expected effect. The differentiation rate of stem cells displaced from the niche is necessarily larger than the intrinsic division rate because only a subpopulation of the entire stem cell population is exposed to committing to differentiation; however, all cells are dividing (Ritsma et al. 2014), and the stem cell population is maintained at a steady-state equilibrium. Thus, mutations affecting differentiation rate in our model have a larger absolute effect for the same proportional change in rate when compared to the previous analysis on mutations to division rate. Hence, given a fixed mutation, we see a high incidence of tumorigenesis when the mutation affects differentiation rate (Fig. 6A,B). Fitting   Table 1. The solid red line connects large intestine polyp incidence data found during autopsy (Chapman 1963).
analyses along a range of plausible parameter space revealed a poorer fit when compared to mutations that alter division rate because mutations that alter differentiation rate will always result in large tumor incidence at early age.

Whole organism DFE are sufficient to explain tumorigenesis
We hypothesized that mutations in somatic tissues would differ in their distribution, compared to unicellular whole organisms, because of the regulatory processes that control cell division and differentiation rates in multicellular organisms. However, we found that whole organism DFE were sufficient to account for patterns of tumorigenesis in the intestines. This suggests that somatic evolution is not unique, but instead is based on the same patterns of mutation that we see in whole organisms. Hence, the differences in evolutionary patterns between somatic tissues and whole organisms, such as the tendency of tissues to age via the accumulation of deleterious mutations while populations of whole organisms instead evolve to greater mean fitness in benign environments, arise as a consequence of the small populations of stem cells within multicellular organisms and the asexual nature of cell division. Somatic aging via mutation is thus akin to the action of Muller's ratchet, the accumulation of deleterious mutations in organisms that cannot eliminate them via recombination. Indeed, the ratchet acts more strongly than in populations of organisms as a result of the relative importance of drift versus selection in very small stem cell populations (i.e., niches). This raises the interesting question of why somatic tissues are organized in this way and whether small stem cell pools predominate to minimize tumorigenesis at the expense of aging, as has been suggested by Michor et al. (2003). Given the role of well-known large effect mutations in cancer, it is tempting, from a mathematical modeling point of view, to adopt a heavy-tailed (infinite variance) distribution for the DFE. In contrast to the DFE employed for modeling populations of whole organisms (e.g., an exponential distribution), which tend to exhibit small incremental changes, a heavy-tailed regime enables a significant contribution from "one-shot" large mutations. To probe this possibility, we included in our simulations a powerlaw (Pareto) distribution which, through its shape parameter a, can be either heavy-tailed (1 < a ≤ 3) or not (a > 3). It is noteworthy, then, that the best fit parameters were very far from the heavy-tailed regime (a % 16). The prevalence of large outlier mutations for such a distribution is comparable to what would be seen from an exponential distribution, meaning that the heavy-tailed regime is not an appropriate modeling framework to explain the data.

Small populations and genetic drift lead to aging
One of our primary findings is that mutation effects drive crypt aging as much, if not more so, than tumorigenesis.   Table 1), which had l ¼ 1:75 Â 10 À4 . (B) Incidence curves derived from the assumption of a power-law beneficial DFE. All parameters are the same as in Table 1, except E[s þ ] = 0.044 for each curve and, ranging from top to bottom, l ranges from 4:5 Â 10 À4 to 5:5 Â 10 À4 by 2:5 Â 10 À5 , with 5 Â 10 À4 providing the best fit. Figure 5 The accumulation of probability densities describing stem cell differentiation rate. (A) Exponentially distributed fitness effects on differentiation rate using the parameters in Table 1 for the mouse. The first density is a green dashed line. Each probability density represents the differentiation rate of a fixed lineage after n fixed mutations, with subsequent mutations traveling away from the original differentiation rate. (B) Zooming in on the tumorigenesis threshold, we see that the area of the differentiation rate density that is over the tumorigenesis threshold decreases with subsequent mutation. There is a change in slope of the densities at the tumorigenesis threshold because subsequent densities are calculated from the previous density which has had the area to the left of the tumorigenesis threshold removed and the area to the right renormalized to 1.  (Rossi et al. 2007), competitively exclude healthy stem cells (Nijnik et al. 2007), and self-renew or differentiate (Jones and Rando 2011;Moskalev et al. 2013). These effects on stem cell dynamics would decrease the number of functional stem and nonstem cells in tissues, thus resulting in tissue aging, as defined in aging reviews and experimental work (above) and previous mathematical models (Wodarz 2007). As mutations become fixed in the intestinal stem cell niche, the expected value of the probability density describing new stem cell lineage division rates decreases when we consider mutations that affect division rate, and the expected value for differentiation rate increases when we consider mutations that affect differentiation rate, and thus, crypts are predominately aging. The intestinal stem cell niche is maintained at a population size smaller than the effective population sizes of whole organisms and our findings derive directly from this population structure. Our study, which emphasizes small healthy crypt populations, contrasts with previous studies that have investigated the accumulation of deleterious mutations in somatic tissue. These studies have looked at larger initial population sizes and in effect model hyperplasia or growing tumors. (C) Figure 6 The tumorigenesis incidence resulting from stem cell mutational effects on differentiation rate. Calculations presented for tumor incidence in (A) mice, (B) humans, and (C) Best fit incidence curve in red; expected beneficial fitness effect of 0.057 and mutation rate of 2.5 Â10 À6 . The other curves have the same mutation rate but vary around the expected beneficial fitness effect by increments of 0.001.
based on estimates from hyperplasia in mice 2 weeks after APC deletion. Similarly, McFarland et al. (2014) investigated mutation accumulation in models of hyperplasia and growing cancers. Datta et al. (2013) modeled deleterious mutations in housekeeping genes in an exponentially growing tumor initialized at 1 Â 10 6 cells. Beckman and Loeb (2005) assumed their population of cells was sufficiently large to ensure a deleterious mutation of any strength could not become fixed. These approaches are useful to describe tumor growth in initiated tumors but fail to capture the relative importance of drift in evolving stem cell niches and the process of tumorigenesis from healthy stem cell niches, which exist as very small populations. We find that crypts with fixed mutations are distributed along a range of both aging and tumor formation. The expected value of the division rate density moving away from the tumorigenesis threshold causes the probability of tumorigenesis per fixed mutation to eventually decrease. For example, in mice, there is a smaller probability that the fourth fixed mutation in a stem cell niche will result in tumorigenesis when compared to the third mutation in the exponential beneficial DFE scenario. The human intestinal crypt stem cell niche consists of a larger number of stem cells so drift plays a smaller role in the evolutionary trajectory of these crypts. Nonetheless, the mode of the distribution of division rate still moves away from the tumorigenesis threshold, albeit at a slower rate.
Although our model assumes that the size of the stem cell niche (N) remains constant and mutations only change the division rate or differentiation rate of lineages, it is possible that mutations could alter the niche size. If mutations altered the size of a crypt's stem cell niche, they would change the probability of fixation of subsequent mutations.
Mutations that only affect differentiation rate do not match incidence data curves Analyses of colon cancer genomes from different individuals reveals that a small number of genes, associated with large fitness advantage, are commonly mutated among cancers (Wood et al. 2007). For instance, many colon cancers contain cells that have mutations in genes involved in the Wnt-signaling cascade responsible for maintaining "stemness" (Clevers and Nusse 2012). A study by Smith et al. (2002) found that 56% of 106 sequenced tumors had mutations in the APC gene, which, when nonfunctional, results in the activation of the Wnt cascade (Reya and Clevers 2005). Additionally, cancers that have a mutation in the APC gene tend to have the mutation distributed throughout the tumor, suggesting the mutations occurred early in tumor growth (Sottoriva et al. 2015). Because the Wnt-signaling cascade is involved with maintaining a stem cell phenotype, mutations in this cascade would influence the propensity for stem cells to differentiate. Additionally, Smith et al. (2002) also found that 61.3% of colorectal cancers had mutations in p53, involved in regulating apoptosis, and 27.4% of colorectal cancers had mutations in Kras, thought to drive cancer growth by accelerating stem cell division and leading to enhanced crypt fission .
Our modeling scenario of mutations only having an effect on the differentiation rate of stem cells and having effect sizes equal to those measured in whole organisms results in rapid tumorigenesis, with nearly 100% of human individuals having a polyp in their large intestine at young age. Indeed, individuals with familial adenomatous polyposis (FAP), who already have a germline mutation in one copy of their APC gene and only need one mutational hit on the other to form an adenoma, regularly develop adenomas as teenagers (Bozic et al. 2010). In our model, the large tumorigenesis incidence associated with mutations solely affecting differentiation rate is due to both the mutations having a larger absolute effect toward the tumorigenesis threshold and there being a higher overall probability of fixation of new mutations among all the crypts due to the mutations fixing through neutral drift. Even when we decrease the expected beneficial mutational effect size and decrease the mutation rate in an attempt to better fit the tumorigenesis incidence data, we find that mutations only affecting differentiation rate still result in more tumorigenesis than predicted at young age. However, the data were derived from autopsies on individuals >10 years of age, so data for tumorigenesis are lacking in this age group. Additionally, we modeled scenarios where mutations only affect division or differentiation, nature is certainly more complex, and mutation in both differentiation and division rates are likely to co-occur within a crypt population. Indeed, the APC protein discussed above contributes directly or indirectly to cellular division, differentiation, migration, cell orientation, and apoptosis (Dikovskaya et al. 2007;McCartney and N€ athke 2008).
We model stem cell dynamics and mutational effects on those dynamics as a property that is controlled by an individual stem cell's genome, that is, a stem cell's heritable ability to produce or respond to internal signals, or respond to external signals, to divide or differentiate. The external signals regulating stem cell phenotype, such as Wnt signals produced by Paneth cells in the small intestine (Clevers 2013), are produced by cells differentiated from stem cells. Mutations to the stem cell genome may eventually influence the production of these signals in daughter cells. These mutations would drift neutrally in the niche, as they are not expressed until after stem cell differentiation, and, unless the lineage harboring the mutation reaches fixation in the niche, would eventually be lost from the crypt because Paneth cells die in approximately 20 days (Bry et al. 1994). Thus, mutations that result in differential signaling output by daughter cells can be modeled as neutrally fixed mutations acting intrinsically in the stem cells.

The influence of organism specific factors on somatic evolution
We find less tumor incidence in mice than humans throughout their respective lifetimes using the same DFE parameters. Mice only live a few years and have an order of magnitude fewer crypts in their entire intestine than humans have in just their large intestine (Potten et al. 2003). They also have smaller numbers of stem cells within their crypts, although those stem cells are dividing at a faster rate than human stem cells. Overall, this results in a lower chance of mutant lineages reaching fixation within crypts during the shorter mouse lifetime, and therefore, a reduction in the overall number of crypts with fixed mutations is lower. For instance, using the distribution of fixed mutations derived in the Appendix, at 2 years old, a mouse is expected to have about 75 crypts with two mutations, and only about 28% of mice will have a single crypt with three mutations. At 85 years old, a human is expected to have about 44 crypts with five mutations, one crypt with six mutations, and about four percent of humans at 85 years old will have a crypt with seven fixed mutations. As humans age they experience more fixed mutations, each of which confers a higher probability of tumorigenesis than the previous, whereas mice are expected to experience the accumulation of fewer mutations, possibly explaining the near linearity of the mouse incidence curve and the upwards curvature of the human incidence curve. Of note, given that a tumorigenesis event has occurred, it is likely the product of one mutation in the mouse model, whereas multiple mutations may contribute to the initiation of a tumor in the human model (the Appendix, Fig. S4).
The incidence of polyps at autopsy reported by Chapman (1963) was based on visual observations of discernible elevations of the mucosa in the entire large intestine during autopsy. It would take time for an initiated tumor to grow to a visible mass, so the true tumorigenesis incidence curve may lie in front of the data recorded in this study, with a lag time of growth before the tumor is visible. This lag time would be a function of the individual mutational spectrum of the initiated tumor and the tumor's environment.
Overall, we have shown that small homeostatic populations of stem cells, typical of somatic tissues in multicellular organisms, accumulate mutations that affect cellular fitness, contributing both to aging and tumorigenesis over an organism's lifetime. We show that the evolution of intestinal stem cell populations under the assumption of an organismal DFE, as opposed to the assumption of a heavy-tailed beneficial DFE, best predicted early tumor formation. However, aging, rather than tumorigenesis, predominated among crypts in the intestine. Our modeling approach emphasizes tumorigenesis in the context of aging, and vice versa, and demonstrates the importance of mutational processes within very small populations in both these phenomena.

The intervals between mutations that fix in the stem cell niche
The DFEs used in this work are both considered in terms of percentage increase or decrease, rather than in terms of absolute quantities of change. In mathematical terms, this means that the densities can be expressed in terms of the ratio k=k old . A remarkable consequence of this assumption is that the probability of a new lineage fixing in the niche is independent of the prevailing division rate k old . To see this, consider the probability of that a new lineage fixes after a mutation drawn from the exponential DFE. Recalling eqn 3, we note that the probability of fixation formula can be written in terms of the ratio of the new to the old division rate, r ¼ k=k old , We can then write which is independent of the choice of value k old . A similar result holds for the power-law DFE. Generally, this property holds for any DFE that can be expressed in terms of the ratio k=k old . It follows that the number of mutations that must occur in order for a new division rate to fix is distributed Geometrically with success probabilityp. By standard properties of CTMC, we can then say that the time between the arrivals of "successful" mutations is exponentially distributed with rate parameter lpk old N.

Population dynamics outside the stem cell niche
Once outside the niche, a stem cell can either divide (at rate k), or it can differentiate into transient amplifying cells (at rate m). For the purposes of this model, we consider differentiated cells to be dead. There is a chance that the lineage of a stem cell outside the niche can undergo sufficiently many mutations to cause tumorigenesis, but we found by way of numerical investigations that this does not significantly contribute to overall incidence of these cancers. As such, let Y(t) denote the number of stem cells outside the niche that have not yet differentiated. Assuming, for the moment, that all members of the stem cell niche have a division rate k, the CTMC Y(t) is defined by the transition rates YðtÞ ! YðtÞ þ 1 at rate ðN þ YðtÞÞk YðtÞ ! YðtÞ À 1 at rate YðtÞm: The form of the rate of increase follows from the observation that Y(t) increases anytime a stem cell divides, whether that stem cell is in the crypt or not. On the other hand, because stem cells in the niche are assumed to not differentiate, the total rate of decrease is proportional to the number of stem cells outside the niche. Because the population size is so small, there is high variability and we note that Y(t) can regularly hit the value zero. Because the niche is protected by unrelated biological processes, this does not constitute extinction of the full stem cell population. As soon as another stem cell in the niche divides, the population outside the niche is renewed again. A typical trace for Y(t) can be seen in Fig. S5.
The law of this CTMC, y n ðtÞ ¼ P YðtÞ ¼ n f g , satisfies the system of master equations d dt y n ðtÞ ¼ ðN þ ðn À 1ÞÞky nÀ1 ðtÞ1 n ! 1 ðnÞ þ ðn þ 1Þmy nþ1 ðtÞ À À ðN þ nÞk þ nm Á y n ðtÞ: One can then show that the mean yðtÞ ¼ P 1 n¼1 ny n ðtÞ satisfies the ODE d dt yðtÞ ¼ Nk þ ðk À mÞ yðtÞ: If k < m, this ODE converges to a steady-state value Nk/(mÀk). Otherwise the mean diverges to infinity with exponential growth. For this reason, we consider this threshold to be the initiation of tumorigenesis.
An alternate way to view the dynamics is to note that each time a stem cell in the niche divides it creates a new independent lineage outside the crypt. Let Y j ðtÞ be the number of living stem cells outside the crypt that are descended from (and include) the product of the jth stem cell division in the niche. As such YðtÞ ¼ P 1 j¼1 Y j ðtÞ. Each process Y j ðtÞ can be understood as a branching process, with an offspring distribution that is geometrically distributed with "success probability" q = m/(m + k). (The number of offspring is determined by the number of times the cell divides before differentiating. This is a sequence of independent trials where the probability of having another offspring, rather than differentiating, is k/(m + k).) As long as the mean of this offspring distribution is less than or equal to one, these lineages will eventually go extinct. Therefore, the critical stem cell division rate corresponds to when the mean of the offspring distribution (which can be shown to be (1 À q)/q = k/m) is less than one. In other words, the critical division rate k Ã is simply k Ã ¼ m.

Population dynamics in the crypt
Of course, tumorigenesis in a given crypt is exceedingly unlikely, even over the lifetime of an individual. We model the colon as a collection of C % 10 7 individual crypts that are mathematically identical and independent. The number of fixed mutations in the ith crypt at time t is denoted M i ðtÞ, and let fk i 0 ; k i 1 . . .g denote the sequence of division rates that become fixed in the ith crypt at times f0; s i 1 ; s i 2 . . .g, respectively. It follows that the inter-arrival times are independent and distributed as Whether tumorigenesis has occurred in the ith crypt will be tracked by the function v i ðtÞ, defined by That is to say, v i ðtÞ ¼ 1 if tumorigenesis has occurred before time t. It follows that the time of first tumorigenesis in the colon is given by the time The per capita population incidence curves are then just the cumulative distribution function of the random variable T , which can be expressed in terms of the individual crypt dynamics as follows: P T [ t f g¼ P v i ðtÞ ¼ 0 for all i 2 f1; . . .; Cg È É : To prepare for our numerical approximation of this quantity, we introduce one last bit of notation, fN m ðtÞg 1 m¼0 , which represents the number of crypts that have seen the arrival of m new fixed lineages as of time t. Then P T [ tjN m ðtÞ ¼ n m for all m f g These dynamics can be simulated by Gillespie's method (Gillespie 1977), but such an approach is computationally intensive. For this reason, we introduced a few simplifying assumptions. For example, we model the arrival rates of new fixed lineages in the crypts as being constant over time [having fixed ratel ¼plk 0 N, rather than a sequence of rates given in eqn (13)]. This allows us to assume that the number of mutations in each crypt at time t is Poisson distributed with meanlt. With such a tremendously large number of crypts in the colon, it is in turn reasonable to assume that the number of crypts takes the form n m % CP MðtÞ ¼ m f g% Ce Àlt ðltÞ m =m!. To complete the derivation of eqn (8) in the main text, we truncate the infinite product in eqn (15) and note that in the notation of the main text, P vðtÞ ¼ 0jMðtÞ ¼

DFE equations
To define the parameters of the system, we specified the probability P B of a beneficial (versus deleterious) mutation and the respective means s þ and s À of the DFE conditioned on the event that the mutation is beneficial or deleterious. The form of the mean of the DFE depends on whether it is exponential or heavy tailed. The exponential form DFE has mean Eðk new jk old Þ exp ¼ k old 1 þ P B a À ð1 À P B Þ b while the heavy-tailed DFE has mean Eðk new j k old Þ Pareto ¼ P B a À 1 a À 2 k old þ ð1 À P B Þ k old À k old b : The conditional means have the form s þ :¼ Eðk new j k old ; beneficialÞ exp ¼ k old 1 þ 1 a s À :¼ Eðk new j k old ; deleteriousÞ exp ¼ k old 1 À 1 b for the exponential DFE, and s þ :¼ Eðk new j k old ; beneficialÞ Pareto ¼ k old a À 1 a À 2 s À :¼ Eðk new j k old ; deleteriousÞ Pareto ¼ k old 1 À 1 b for the heavy-tailed DFE.

Least squares analysis
We generated tumor incidence curves and used a least squares analysis to determine which set of these parameters best fit the tumor incidence data described in Chapman (1963). The best fit has the smallest sum of squared residuals of the parameter space explored in Figs S1-S3.
Tumor mutational profile Figure S4 contains the probabilities that each individual mutational profile was the culprit in tumorigenesis given that tumorigenesis occurred in a single crypt. They were calculated using Bayes' theorem to compute the probability that a certain mutational load fixed in the crypt given that a tumorigenesis event happened, where we recall that M(t) is the number of fixed mutations as of time t, T is the precise time that tumorigenesis occurs in the crypt, and f T refers to density of the random variable T. The quantity PðMðTÞ ¼ nÞ is the probability that tumorigenesis occurs exactly on the nth mutation, a quantity we defined earlier as p n and gave a recursive formula for in eqn 5 in the main text. To compute the quantity f T ðt j MðTÞ ¼ nÞ, note that because the arrival time of the nth mutation is independent of the event that it causes tumorigenesis, we have that f T ðt j MðTÞ ¼ nÞ ¼ f s n ðtÞ, where s n is the arrival time of the nth mutation. By hypothesis, s n is Poisson distributed with meanlt withl being defined after eqn 8 in the main text. The distributions of mutational profiles given the tumorigenesis event occurred at a certain point in time throughout an organism's lifetime are given in Fig. S4.

Supporting Information
Additional Supporting Information may be found in the online version of this article: Figure S1. Heat map depicting values for the least squares analysis of predicted tumor incidence and human tumor incidence data for the exponential beneficial DFE on division rate scenario. Figure S2. Heat map depicting values for the least squares analysis of predicted tumor incidence and human tumor incidence data for the power-law beneficial DFE on division rate scenario. Figure S3. Heat map depicting values for the least squares analysis of predicted tumor incidence and human tumor incidence data for the mutations affecting differentiation rate scenario. Figure S4. Mutation profiles of a tumor at the onset of tumorigenesis. Figure S5. The simulated stem cell dynamics within a human crypt for both the displaced stem cells (A,B) and the stem cell niche (C,D) showing a fixation event of a mutant lineage.