• Clare A. Rebbeck,

    Corresponding author
    1. Department of Biology, Imperial College London, Silwood Park, Ascot, Berks., SL5 7PY, United Kingdom
  • Rachael Thomas,

    1. Department of Molecular Biomedical Sciences, College of Veterinary Medicine, North Carolina State University, 4700 Hillsborough St., Raleigh North Carolina 27606
    2. Center for Comparative Medicine and Translational Research, North Carolina State University, 4700 Hillsborough St., Raleigh North Carolina 27606
  • Matthew Breen,

    1. Department of Molecular Biomedical Sciences, College of Veterinary Medicine, North Carolina State University, 4700 Hillsborough St., Raleigh North Carolina 27606
    2. Center for Comparative Medicine and Translational Research, North Carolina State University, 4700 Hillsborough St., Raleigh North Carolina 27606
  • Armand M. Leroi,

    1. Department of Biology, Imperial College London, Silwood Park, Ascot, Berks., SL5 7PY, United Kingdom
  • Austin Burt

    1. Department of Biology, Imperial College London, Silwood Park, Ascot, Berks., SL5 7PY, United Kingdom

Current address: MRC, Physiological Genomics, Imperial College London, Hammersmith Hospital, London, W12 0NN, United Kingdom.


Canine transmissible venereal tumor (CTVT) is an infectious disease of dogs. Remarkably, the infectious agent is the cancerous cell itself. To investigate its origin and spread, we collected 37 tumor samples from four continents and determined their evolutionary relationships using microsatellite length differences and microarray-based comparative genomic hybridization (aCGH). The different tumors show very little microsatellite variation, and the pattern of variation that does exist is consistent with a purely asexual mode of transmission. Approximately one quarter of the loci scored by aCGH show copy number variation relative to normal dogs, again with little variation among different tumor samples. Sequence analysis of the RPPH1 gene indicates an origin from either dogs or wolves, and microsatellite analysis indicates that the tumor is more than 6000 years old, and perhaps originated when dogs were first domesticated. By contrast, the common ancestor of extant tumors lived within the last few hundred years, long after the first tumor. The genetic and genomic patterns we observe are typical of those expected of asexual pathogens, and the extended time since first origin may explain the many remarkable adaptations that have enabled this mammalian cell lineage to live as a unicellular pathogen.

Canine transmissible venereal tumor (CTVT) is an infectious disease of dogs caused by a pathogenic lineage of cancerous cells (Das and Das 2000). Cells are transmitted directly from one dog to another, usually during coitus, and once on a new host they reproduce over a period of two to six months to form a tumor-like growth, usually around the genitals. In the absence of treatment, tumors often regress after one to three months, and if regression is complete then the host is immune to subsequent reinfection. CTVT can be found in many parts of the world, particularly in feral populations, and in some regions is the most common dog tumor.

Previous work on the genetics of CTVT has revealed a number of distinctive features that together suggest that the tumor originated once and subsequently spread worldwide. First, although the karyotype of the normal domestic dog has 2n= 76 acrocentric autosomes plus submetacentric X and Y sex chromosomes, the chromosome number in CTVT is consistently reduced to 2n= 57–59, thought to be largely due to a series of fusion events between smaller chromosomes to form 16–18 bi-armed derivatives (Makino 1963; Murray et al. 1969; Adams et al. 1981; Vermooten 1987; Fujinaga et al. 1989). As the total number of chromosome arms is grossly comparable to that of the normal dog genome, it has been proposed that the karyotypic rearrangement in CTVT is not associated with any significant change in DNA content (e.g., Makino 1963; Weber et al. 1965). Second, CTVT cases of diverse geographic origin all share a LINE transposable element insert upstream of the c-myc gene that is not found in the normal dog genome (Liao et al. 2003). It is tempting to speculate that this insertion may have been causally involved in the origin of the tumor, but this has yet to be experimentally tested. Finally, a recent survey of microsatellite length variation found relatively few differences among extant tumors, again indicating a single origin, and estimated the time back to the common ancestor of extant tumors to be about 250–2500 years (Murgia et al. 2006).

By any criterion, the change from free-living canid to pathogenic cell line is a drastic one that might be expected to have had profound effects on the genotype and phenotype. In this study, we have combined analyses of microsatellite length variation, DNA copy number variation, and DNA sequence variation to better understand the origin of CTVT and the subsequent changes in its genome, including an estimate of the age of CTVT and the fraction of the genome that has undergone duplication and deletion.

Materials and Methods


Fresh tissue samples were cut so that linear dimensions were no greater than 1 cm, then suspended for at least 48 h in at least nine times their own volume of 70% ethanol, transferred to twice their volume of 95% ethanol, and stored refrigerated. Where possible tissue from the host dog was also collected. DNA was extracted using Qiagen DNeasy tissue kits (Qiagen, Hilden, Germany) following the manufacturer's protocol, eluted with 200 μl of buffer, and then diluted 10-fold with water for analysis. We confirmed the tissue composition of all CTVT samples by quantitative PCR using primers matching a tumor-specific LINE insert upstream of the c-myc gene (Liao et al. 2003).


Di-nucleotide microsatellite markers were chosen from those previously used to study dog breeds (Parker et al. 2004). Forward primers were redesigned to include a 19 base M13 sequence (5′-AGG AAA CAG CTA TGA CCA T-3′) on the 5′ end. Amplicons were labeled by the addition of 0.8 μl (10 pmol/μl) of an M13 primer of the same sequence as above tagged with 6FAM, HEX (MWG-Biotech AG, Ebersberg, Germany) or NED (Applied Biosystems, Warrington, Cheshire, UK) dyes for each reaction, and analyzed on an AB 3700 genotyper. A sample of dog DNA from Parker et al. (2004) was also analyzed, to ensure comparability across studies. Where matched host samples were unavailable and the alleles present were consistent with alleles from other samples, these alleles were taken as representing CTVT, with the one or two other peaks present (usually smaller) attributed to host contamination. For marker 77, it was difficult to distinguish CTVT and dog alleles, and this marker was excluded from further analysis. Two additional loci (nos. 8 and 18) were excluded because they appeared to be triploid, and one other (no. 19) because allele sizes differed by an odd number of nucleotides, and hence was difficult to interpret in terms of di-nucleotide repeat length variation.

To calculate the divergence of two microsatellite genotypes, we make use of the stepwise mutation model, according to which the expectation for the square of the difference in number of repeats between two alleles (s²) is equal to the mutation rate (u) multiplied by twice the time back to the common ancestor of the two alleles (t): E(s²) = 2tu (Goldstein et al. 1995; Slatkin 1995). For two diploid genotypes with microsatellite lengths (a, b) and (c, d), we therefore calculate the average divergence of the alleles in the two individuals as Δ = ((a − c)² + (a − d)² + (b − c)² + (b − d)²)/4. This is equivalent to the D1 statistic of Goldstein et al. (1995), replacing populations with individuals, and is expected to increase linearly with the average divergence time of the alleles in the two individuals. Mean divergence within a group of individuals (e.g., wolves) or between two groups of individuals (e.g., CTVT vs. wolves) was calculated by averaging Δ first across all pairs of individuals and then across loci. To compare, say, CTVT-wolf divergence to wolf-wolf divergence, we calculated Δ_CTVT-wolf − Δ_wolf-wolf separately for each locus and then the average of these differences on 10,000 bootstrap samples.
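As an illustration of the divergence statistic described above, the following minimal Python sketch computes the mean squared difference in repeat number across the four allele pairings of two diploid genotypes. The function names and the repeat counts are hypothetical and chosen only for illustration; they are not data or code from this study.

```python
from itertools import product

def pair_divergence(genotype_1, genotype_2):
    """Mean squared repeat-number difference over the four allele pairings
    of two diploid genotypes, each given as a pair of repeat counts."""
    return sum((x - y) ** 2 for x, y in product(genotype_1, genotype_2)) / 4

def mean_group_divergence(group_1, group_2):
    """Average pair_divergence over all cross-group pairs at a single locus."""
    values = [pair_divergence(g1, g2) for g1 in group_1 for g2 in group_2]
    return sum(values) / len(values)

# Hypothetical repeat counts at one locus (illustration only).
tumor = (12, 15)
wolf = (10, 11)
delta = pair_divergence(tumor, wolf)  # (4 + 1 + 25 + 16) / 4
print(delta)
```

Averaging the per-locus values across loci would then give the mean divergence between the two groups.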


Array-based comparative genomic hybridization analysis was performed using a genomic microarray consisting of dog bacterial artificial chromosome (BAC) clones distributed at ∼10 Mb intervals along each dog autosome (Thomas et al. 2007). The average insert size of the BAC clones is estimated at 170–175 kb. Microarray analysis was performed as described previously (Thomas et al. 2005) using a reference sample composed of an equimolar mix of DNA isolated from the peripheral lymphocytes of 10 clinically healthy male dogs of mixed breed. DNA samples were fluorescently labeled by random priming, resulting in labeled fragments predominantly 100 bp–2 kb in length. Equal amounts of CTVT and normal dog DNA were cohybridized to the array and the ratio of fluorescent hybridization signals from the two samples was recorded (hereafter referred to as the “hybridization ratio”). Arrays had two replicate spots of each BAC clone, and data were discarded if the standard deviation of hybridization ratios among the replicate spots was greater than 0.2. For simplicity, we analyzed untransformed hybridization scores, so that gains and losses of a single copy are expected to have symmetrical effects on the hybridization score. The CTVT samples analyzed by aCGH included four that had been collected directly from a host dog and one that had been propagated in the laboratory as a xenograft on mice (Harmelin et al. 2001). The CTVT samples collected directly from dogs also contained normal noncancerous dog cells, and therefore their hybridization scores fluctuated with lower amplitude around a value of 1 (i.e., genomic balance) than did the tumor propagated as a xenograft. 
To remove the effect of contaminating dog cells, we standardized the hybridization scores from the dog-derived samples as follows: standardized score = (original score − 1) × (x/y) + 1, where x is the average across BAC clones of the absolute value of the hybridization score minus 1 for the mouse xenograft sample, and y is the same for the particular dog-derived CTVT sample. The consequence of this standardization is that all samples fluctuate around a hybridization score of 1 with the same average amplitude as the xenograft sample. The ratio y/x is an inferred estimate of the fraction of cancerous cells in the four dog-derived samples, and ranged from 52% to 81%.
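The standardization can be sketched in a few lines of Python. This is only an illustration of the formula given above; the function name and the example scores are hypothetical and are not data from this study.

```python
def standardize(scores, xenograft_scores):
    """Rescale a dog-derived sample's hybridization scores so that they
    fluctuate around 1 with the same mean amplitude as the xenograft.

    Returns the standardized scores and y/x, the inferred fraction of
    cancerous cells in the dog-derived sample.
    """
    x = sum(abs(s - 1) for s in xenograft_scores) / len(xenograft_scores)
    y = sum(abs(s - 1) for s in scores) / len(scores)
    standardized = [(s - 1) * (x / y) + 1 for s in scores]
    return standardized, y / x

# Hypothetical scores for four BAC clones (illustration only).
xenograft = [0.6, 1.4, 1.0, 0.8]
dog_derived = [0.8, 1.2, 1.0, 0.9]
standardized, tumor_fraction = standardize(dog_derived, xenograft)
print(standardized, tumor_fraction)
```

In this toy example the dog-derived sample fluctuates with half the amplitude of the xenograft, so the inferred tumor fraction is about 0.5 and the standardized scores coincide with the xenograft scores.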


RPPH1 was amplified and sequenced using the primers 5′-CCCGCGGTATTTGCATGCTGCT-3′ and 5′-CTGTCTCTCAGGAATAAATAAGGGCT-3′ (Bardeleben et al. 2005). Sequences were aligned as previously described (Bardeleben et al. 2005) and maximum parsimony trees were constructed using PAUP* 4.0b10 (Swofford 2002).

Results and Discussion


To study the population genetics and evolution of CTVT, we collected samples of 37 tumors from seven countries (Mexico, Greece, Israel, Kenya, South Africa, Thailand, and Malaysia) on four continents (Table S1). Noncancerous host tissue was available for 15 of the samples. All 37 CTVT samples were genotyped at 26 di-nucleotide microsatellite loci scattered throughout the genome. These loci have previously been shown to be highly polymorphic among different dogs (Parker et al. 2004). An additional four microsatellite loci were excluded from analysis (see methods), and 14 loci did not amplify consistently from the tumor samples, although they did amplify from control DNA from a single dog (Table S2). Consistent with previous studies (Murgia et al. 2006), the tumor had a different genotype from its host in all 15 cases for which host tissue was available. In addition, the different tumor samples showed strikingly little variation: of the 26 microsatellite loci scored, 21 were monomorphic in all CTVT samples and only five loci were variable, with a maximum of four alleles (Tables S3 and S4). All of the tumors were genetically more similar to each other than any of the 414 dogs studied by Parker et al. (2004) were to each other, including members of the same breed (Fig. 1A). These results are fully consistent with those of Murgia et al. (2006) from an independent collection of tumors, and support their conclusion that CTVT is a single neoplastic cell lineage that is transmitted among dogs as an allograft.

Figure 1.

Microsatellite analysis of CTVT samples. (A) Frequency distribution of the proportion of alleles shared between CTVT samples, between dogs of the same breed, and between dogs of different breeds. (B) Unrooted genealogy of 37 CTVT samples, as determined by five variable microsatellite loci. Branch lengths from a maximum parsimony analysis using a stepwise mutation model; consistency index = 1.00. Black star represents the midpoint of the genealogy, used to infer the ancestral genotype. † indicates samples with missing data, and whose position is therefore uncertain.

Multilocus genotype data can be useful for distinguishing sexual from asexual reproduction (Burt et al. 1996; Vorburger et al. 2003). It scarcely seems possible that CTVT could have a conventional meiotic sexual cycle, but perhaps there is some less regular form of parasexual gene transfer. To investigate this issue, we performed a phylogenetic analysis of the five variable microsatellite loci. At least in normal dogs, these are each located on a different autosome. In an asexual population the genealogy of any set of individuals can be represented as a bifurcating tree, and the whole genome shows the same pattern of ancestry, whereas in a sexual population the ancestry of a set of individuals is a net or web, and different regions of the genome show different genealogical relationships. Our analysis reveals a single most parsimonious genealogy, with a consistency index of 1 and no evidence of the homoplasy or conflicts between loci expected with recombination (Fig. 1B). This pattern is consistent with an evolutionary history for CTVT that is purely asexual, without any transfer of nuclear genes among tumors, though with only five loci we cannot rule out a low frequency of genetic exchange. The very low level of variation among tumors is also consistent with a clonal mode of reproduction, because a selective sweep at any one locus will homogenize the whole genome (Barraclough et al. 2003).

The genealogy also shows substantial geographical structure, with samples from the same location tending to have more similar genotypes than those from different locations. This is the pattern expected if dispersal is usually local. Nevertheless, tumors that are identical at all 26 loci can be found on different continents, indicating relatively recent intercontinental movement.


In our microsatellite analysis there were 14 loci that did not amplify consistently from the tumor samples despite successful amplification from dog DNA. In the normal dog genome these 14 loci are found on 12 different autosomes. There are many possible explanations for PCR failure, including small-scale changes in the primer-binding sites; chromosomal rearrangements that separate the primer-binding sites or change their relative orientation; and homozygous deletion. Previous studies have shown that CTVT has a substantially different karyotype compared to the normal dog (Oshimura et al. 1973), though it is not clear whether this has been associated with changes in DNA copy number (Makino 1963; Weber et al. 1965).

To investigate this issue more fully, we used microarray-based comparative genomic hybridization (aCGH) between CTVT DNA and a reference sample of DNA from clinically healthy dogs to identify sites of genomic DNA copy number variation at 10 Mb resolution (Thomas et al. 2007). Two rounds of aCGH experiments were performed: in the first, we analyzed CTVT sample 6A, which has been propagated in the laboratory as a mouse xenograft (Harmelin et al. 2002), and in the second, we analyzed the same tumor plus four others collected directly from dogs, chosen to be spread across the microsatellite tree (29M, 82K, 57C, and 72C). Thus, we performed a total of six hybridization experiments. Any locus for which there were missing data in more than one of the six hybridization experiments was excluded, resulting in a total of 244 loci available for analysis.

For each locus we calculated the median ratio of tumor:reference fluorescence across the six experiments and the probability that the mean is different from 1 (i.e., genomic balance) by a t-test. These show the expected “funnel” relationship, with statistical significance increasing as the median deviates in either direction from 1 (Fig. 2A). About half the loci (121/244) have P-values less than 5%, indicating very substantial genomic differences between the CTVT samples and the control pool of DNA from clinically healthy dogs that was used as the reference sample in aCGH. For many of the loci that show statistically significant deviations, the magnitude of the deviation is small (e.g., median hybridization ratios as close to 1 as 0.97 or 1.03), and it is not clear in these cases whether the underlying difference between CTVTs and dogs is a simple unit change in copy number of the entire locus or some other difference (e.g., copy number variation that involves only part of the locus, sequence divergence of repeats due to concerted evolution, etc.). To determine a nonarbitrary criterion for identifying copy number differences, we plotted the frequency distribution of median hybridization scores. This distribution is bimodal, with a secondary peak at 0.7–0.75 (Fig. 2B). This extra peak is even more apparent if one considers only the most reliable medians (standard deviation across experiments < 0.1; Fig. 2C). We therefore take median ratios < 0.75 (and, by symmetry, > 1.25) as conservative thresholds for identifying changes in copy number.
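The threshold rule can be expressed as a short Python sketch. The function name and the example ratios are hypothetical; only the thresholds of 0.75 and 1.25 and the use of the median across experiments come from the text.

```python
from statistics import median

def classify_locus(median_ratio, loss_threshold=0.75, gain_threshold=1.25):
    """Label a locus from its median tumor:reference hybridization ratio."""
    if median_ratio < loss_threshold:
        return "loss"
    if median_ratio > gain_threshold:
        return "gain"
    return "balanced"

# Hypothetical ratios for one BAC clone across six experiments.
ratios = [0.70, 0.72, 0.68, 0.74, 0.71, 0.73]
label = classify_locus(median(ratios))
print(label)
```

A ratio exactly at a threshold is treated as balanced here, which matches the strict inequalities in the text.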

Figure 2.

aCGH analysis of CTVT samples. (A) Statistical significance of difference from 1 as a function of median hybridization ratio. Frequency distributions of hybridization ratios for (B) medians of the six experiments; and (C) medians where the standard deviation among experiments is less than 0.1. (D) Hybridization ratios across the genome. Data are shown in order for the 38 autosomes of the normal dog karyotype, with alternating colors between chromosomes for clarity. Lines indicate 95% confidence limits on the hybridization ratio; thick lines indicate BAC clones for which the median value (indicated by a circle) is less than 0.75 or greater than 1.25.

By this data-derived criterion, 57 of the 244 loci (23 ± 2.7% [SE]) have a different copy number in CTVT than in normal dogs, with slightly more losses than gains (36 vs. 21). For all but one of these loci the 95% confidence limits on the hybridization ratios do not include 1. Loci with variant copy numbers are well distributed along the genome, involving 28 of the 38 autosomes (Fig. 2D). In no case do all loci on a chromosome show concordant gains or losses, suggesting that duplications and deletions are restricted to parts of chromosomes. Indeed, with some exceptions (e.g., chromosomes 3, 7, and 10), the gain or loss of a locus is not usually shared with its neighbors. This pattern presumably reflects the relatively wide spacing of the BAC clones used in the array (ca. 10 Mb), and implies that a denser array is likely to detect even more differences in copy number. Nevertheless, assuming the loci scored in our aCGH are a random sample from the genome, the overall fraction of the genome showing losses and gains should remain about the same in a larger sample. Interestingly, we get approximately the same results if we use a threshold of P < 0.001 for identifying gains and losses: 53/244 (22%) of loci have an altered copy number, again with more losses than gains (39 vs. 14).
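The reported proportion and its standard error can be checked with a short Python sketch. It assumes the standard error is the usual binomial one, sqrt(p(1 − p)/n); the text does not state how the SE was computed, so this is an assumption, and the function name is hypothetical.

```python
from math import sqrt

def proportion_with_se(k, n):
    """Proportion k/n and its binomial standard error sqrt(p * (1 - p) / n)."""
    p = k / n
    return p, sqrt(p * (1 - p) / n)

# 57 of 244 loci exceed the copy-number thresholds.
p_altered, se_altered = proportion_with_se(57, 244)
print(f"{100 * p_altered:.0f} ± {100 * se_altered:.1f}%")
```

Under that assumption the sketch gives about 23% with a standard error of about 2.7%, in line with the values quoted above.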

It is not possible from these data to determine whether the secondary peak at hybridization ratios of 0.7–0.75 represents loss of one or both copies of the genomic region. Starting from a diploid ancestor, loss or gain of a single copy might be expected to give relative hybridization scores of 0.5 and 1.5, respectively; however, cross-hybridization to other regions of the genome means that the fluctuations will be less than this simple theoretical expectation (Snijders et al. 2001; Fiegler et al. 2003; Thomas et al. 2003). The proportion of the genome estimated to have lost one or both copies is 36/244 = 15 ± 2.3%, which is less than the fraction of microsatellites that failed to amplify. This difference is expected, because our threshold is probably conservative, and there are more ways to cause PCR failure than simple complete deletion of a BAC-sized genomic region.

These analyses indicate substantial DNA copy number differences between CTVT genomes and normal dog genomes. If we now compare the different CTVT samples with each other, the differences are much less. Comparing the five samples analyzed in the second round of experiments to the mouse xenograft sample analyzed in the first round, the average difference in hybridization values between the four samples collected directly from dogs and the xenograft is no greater than the difference between the two replicate experiments on the same xenograft (average absolute value of difference, 0.106 ± 0.008 vs. 0.110). Moreover, the number of BAC clones that differ in hybridization score by more than our threshold of 0.25 when comparing dog and xenograft samples is no greater than when comparing the replicate xenograft experiments. Finally, if we compare the average of the two Cape Town samples (57C and 72C) to the average of the other three (which are on the opposite side of the deepest split in the microsatellite tree), there is no hint of the bimodal distribution seen in the hybridization values themselves (not shown). We conclude that the differences in genome content between dog and CTVT are substantially greater than those between different CTVT samples.


One possible explanation for the substantial differences between CTVT and dog genomes is that CTVT originated in a more distantly related species and then transferred to dogs. Though an origin from dogs or wolves is commonly assumed, we are not aware of any phylogenetic analysis exploring the possibility that it originated elsewhere. In the laboratory, the tumor can be experimentally transferred to a wide range of hosts, including species as distantly related as foxes (Cockrill and Beasley 1979; Cohen 1985). To determine more confidently the source species of CTVT, we sequenced 574 bp of the RPPH1 gene from two tumor samples (6A and 21M), and compared them to previously published sequences for 12 species of Canidae (accession numbers in Table S5). This gene has previously been shown to be a useful phylogenetic marker for resolving canid relationships (Bardeleben et al. 2005). The two CTVT sequences were identical to each other and to sequences from the gray wolf and a dog. All wolf, dog, and CTVT sequences formed a strongly supported cluster, distinct from any other sequence (Fig. 3). This cluster is defined by a 6 bp insertion and an A → T nucleotide substitution, both in the 5′ flanking region of the RPPH1 gene. These changes are found in all wolf, dog, and CTVT sequences, but not in the next closest relative on the tree, the golden jackal (Canis aureus). Coyotes (Canis latrans) are more closely related to dogs and wolves than are golden jackals, but unfortunately RPPH1 sequences for them are unavailable. We conclude that CTVT did indeed arise from a dog or wolf (or possibly a coyote), rather than from a more distantly related species.

Figure 3.

Consensus phylogeny of canids based on the RPPH1 locus (n= 574 nucleotides). The position of the CTVT samples is indicated in red. Numbers are bootstrap percentages from 1000 replicates.


Another possible explanation for the substantial differences between CTVT and dog genomes is that CTVT is relatively ancient. Murgia et al. (2006) estimated that CTVT originated recently, 250–2500 years ago, but this was based only on the divergence of extant tumors from each other, and so estimates the age of the most recent common ancestor, not the time of origin per se (Fig. 4A). The Murgia et al. estimate is a lower bound on the age of CTVT, but the actual age could be much greater. In the present study, the aCGH data showed little variation among tumors—as was also observed with the microsatellite data—but considerable differences between tumor and dog. This pattern is consistent with an origin of CTVT that long predates the common ancestor of extant tumors.

Figure 4.

(A) The age of the common ancestor of extant tumors and the age of the progenitor tumor are not the same. (B) Average divergence among different classes of individuals; error bars are 95% confidence limits obtained by bootstrapping across n= 26 loci.

To investigate this issue more closely, we returned to our microsatellite dataset, first to estimate the age of the most recent common ancestor of our own independent sample of tumors. To do so, we first inferred the ancestral genotype from the midpoint of the genealogy (star in Fig. 1B). The average number of mutational steps separating the extant tumors from the inferred ancestor is 0.047 mutations per locus (95% CI: 0.002–0.12, calculated by bootstrapping across loci). To estimate the time required to generate this number of mutations, we need to know the mutation rate. Microsatellite mutation rates in mammals are typically about 10⁻³ per generation (Ellegren 2000), and the generation time of dogs is about 4 years (Neff et al. 2004). If we assume that the mutation rate per cell division in CTVT is the same as that in the germline of the dog, and that the number of cell divisions separating CTVT samples taken 4 years apart is about the same as the average number of cell divisions in the male and female germlines of the dog, then the mutation rate for CTVT would be about 0.25 × 10⁻³ per year. To allow for uncertainty in this estimate, we use a range from 10⁻⁴ to 10⁻³. This is the same range of mutation rates used by Murgia et al. Using this estimate, we calculate that the most recent common ancestor of our tumors existed only 47–470 years ago (95% CI from bootstrapping across loci of 2–120 and 20–1200 years for mutation rates of 10⁻³ and 10⁻⁴, respectively). The estimated time back to the common ancestor is even more recent under most alternative scenarios: if the most common genotype is the ancestral genotype; if an infinite alleles model of mutation is used; if multistep mutations occur; or if the mutation rate per cell division is higher for the tumor than for dogs. Our estimates are more recent than those of Murgia et al., although the ranges overlap.
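The arithmetic behind the 47–470 year range can be reproduced with a minimal Python sketch: the time is the observed number of mutational steps per locus divided by the assumed mutation rate per locus per year. The function and variable names are hypothetical; the numbers are those quoted in the text.

```python
def years_to_common_ancestor(steps_per_locus, rate_per_locus_per_year):
    """Time needed to accumulate the observed mutational steps at a given rate."""
    return steps_per_locus / rate_per_locus_per_year

# Per-year rate implied by about 1e-3 per generation and a 4-year generation time.
rate_per_year = 1e-3 / 4  # 0.25e-3

observed_steps = 0.047   # mean steps per locus from the inferred ancestor
fast, slow = 1e-3, 1e-4  # range of rates used in the text
age_range = (
    years_to_common_ancestor(observed_steps, fast),
    years_to_common_ancestor(observed_steps, slow),
)
print([round(age) for age in age_range])
```

Applying the same function to the bootstrap limits of 0.002 and 0.12 steps per locus gives the confidence intervals quoted for each mutation rate.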

Clustering analysis suggests that wolves are somewhat more likely ancestors for CTVT than dogs (Murgia et al. 2006 and our unpublished analyses). To estimate when CTVT first originated, we compare our CTVT genotypes to wolf genotypes for the same 26 microsatellite loci as reported by Parker et al. (2004). To ensure comparability between the studies, we re-analyzed microsatellite lengths from one of the DNA samples used in their study. There are eight wolf genotypes in their dataset, from a worldwide sample (Canada [Quebec], US [Alaska], Mexico, Sweden, Italy, Oman, Iran, China). For each locus, we calculated the divergence between two individuals as the mean squared difference in repeat number between the two alleles in one individual and the two alleles in the other. Under the stepwise mutation model, this measure is expected to increase linearly with the average divergence time of the alleles in the two individuals (Goldstein et al. 1995; Slatkin 1995). We assume initially that the progenitor wolf was as divergent from other wolves alive at the time as our eight recently sampled wolves are from each other. Under this assumption, if CTVT had originated very recently from wolves, then CTVT-wolf divergence should be approximately equal to the divergence of two wolves. In fact it is significantly greater (24 ± 5.9 (SE) vs. 11 ± 2.8 mutations per locus, n = 26 loci, P = 0.006; Fig. 4B). The difference between these values is Δ_CTVT-wolf − Δ_wolf-wolf = 12.9 ± 6.1 mutations per locus (95% CI: 2.3–25.6). The expected number of mutations separating extant CTVT samples from the progenitor tumor is half this difference, or 6.5 mutations per locus, which is 6.5/0.047 = 137 times larger than the estimated number of mutations separating extant tumors from their common ancestor (inferred above). These results are again consistent with an origin for CTVT that long predates the common ancestor of extant tumors.
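A confidence interval obtained by bootstrapping across loci can be sketched as follows. This is a generic percentile bootstrap, which is an assumption: the text does not specify which bootstrap interval was used. The function name and the per-locus differences are hypothetical, not data from this study.

```python
import random

def bootstrap_mean_ci(per_locus_values, n_boot=10_000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the mean, resampling loci with replacement."""
    rng = random.Random(seed)
    n = len(per_locus_values)
    means = sorted(
        sum(rng.choices(per_locus_values, k=n)) / n for _ in range(n_boot)
    )
    lower = means[int((alpha / 2) * n_boot)]
    upper = means[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper

# Hypothetical per-locus differences in divergence (illustration only).
differences = [1.0, 30.5, 4.0, 12.0, 0.5, 22.0, 8.5, 15.0]
low, high = bootstrap_mean_ci(differences)
print(low, high)
```

Each bootstrap replicate resamples whole loci, so the interval reflects variation among loci rather than among individuals.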

To estimate the time of origin, we again use the stepwise mutation model, under which Δ_CTVT-wolf − Δ_wolf-wolf has an expectation equal to twice the time back to the origin of CTVT (t) multiplied by the mutation rate (u): E[Δ_CTVT-wolf − Δ_wolf-wolf] = 2tu (by analogy with Goldstein et al. 1995; Slatkin 1995). If we use the same mutation rate as previously (i.e., 10⁻⁴ to 10⁻³), then the estimated time of origin of CTVT is 6500–65,000 years ago.
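Written out as a worked calculation, using the difference of 12.9 mutations per locus reported in the text and the two bounding mutation rates (u in mutations per locus per year), the estimate follows directly from rearranging the expectation for t:

```latex
\begin{align*}
E\left[\Delta_{\text{CTVT-wolf}} - \Delta_{\text{wolf-wolf}}\right] &= 2tu \\
t &= \frac{\Delta_{\text{CTVT-wolf}} - \Delta_{\text{wolf-wolf}}}{2u} \\
t &\approx \frac{12.9}{2 \times 10^{-3}} \approx 6.5 \times 10^{3}\ \text{years}
  \quad (u = 10^{-3}) \\
t &\approx \frac{12.9}{2 \times 10^{-4}} \approx 6.5 \times 10^{4}\ \text{years}
  \quad (u = 10^{-4})
\end{align*}
```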

Apart from uncertainties in the mutation rate, several potential sources of error in these estimates are worth discussing. First, it is possible that CTVT originated more recently, but from a progenitor that was significantly more divergent from other wolves than the eight recently sampled wolves are from each other. Given the diverse geographical origins of these eight wolves, we do not think this very likely, although more extensive sampling of wolf diversity would be useful. CTVT may have originated from a dog, but we have repeated the analysis for each of the five “ancient” breeds of dog that cluster genetically with wolves and are possible progenitors (Akita, Basenji, Chow Chow, Shar Pei, and Shiba Inu; Parker et al. 2004; Murgia et al. 2006), and we get similar results: in each case divergence between CTVT and dog is greater than dog–dog divergence (statistically significant in four of five cases; Table S6). Moreover, the average divergence of CTVT from wolves and dogs is greater than the divergence of wolves and dogs from each other (23 ± 5.2 vs. 16 ± 4.0, P = 0.04). Second, there are likely to have been some deviations from a strict stepwise mutation model. To the extent that there have been gains or losses of more than one repeat at a time, then our times are overestimates (this should apply equally to the time back to the common ancestor and the time back to the progenitor, so their ratio should be unaffected). On the other hand, to the extent that there are bounds on allowable microsatellite lengths and divergences between CTVT and wolves have saturated, then our times are underestimates. Finally, we are assuming that the microsatellites are neutral, and not functionally involved in the evolution of CTVT. As CTVT is clonal, the microsatellites will be linked to loci under selection, but this linkage should not affect their rate of divergence or our estimated times.

An additional source of information on the age of CTVT comes from the difference in repeat number between the two alleles in each individual. In a sexual population, this difference will evolve to an equilibrium between the opposing effects of mutation and drift, but in an asexual lineage drift will have no effect on allelic variation within individuals, and the two copies will diverge over time, a phenomenon sometimes referred to as the “Meselson effect” (Birky 1996; Mark Welch and Meselson 2000). For every locus, we calculated the squared difference in the number of repeats between the two alleles in a tumor, averaged across the tumors. Again, this measure is expected to increase linearly over time, assuming a stepwise mutation model (Coltman et al. 1998; Coulson et al. 1998). Because we cannot distinguish homozygotes from haploid deletions, we only included heterozygous loci. The mean divergence of alleles within a tumor is 15.6 ± 6.4 (SE, n= 14 loci). Sequence analysis of the MHC (a.k.a. DLA) loci in CTVT indicates very similar alleles, suggesting the progenitor was relatively inbred (Murgia et al. 2006). If we assume, as a lower bound, that the progenitor was completely homozygous (initial divergence = 0), then an upper bound on the average divergence of each allele from the ancestor is 15.6/2 = 7.8 mutations per locus, or 7800–78,000 years.
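The within-tumor calculation can be sketched in Python as follows. The genotypes are hypothetical, as are the function names, and treating a locus as heterozygous when any tumor carries two different alleles is an assumption made here for illustration; the text does not spell out that rule.

```python
def within_tumor_divergence(genotypes):
    """Mean squared repeat-number difference between the two alleles,
    averaged across tumors, for a single locus."""
    return sum((a - b) ** 2 for a, b in genotypes) / len(genotypes)

def mean_heterozygous_divergence(loci):
    """Average the per-locus divergence over loci that are heterozygous
    in at least one tumor (assumed rule; homozygous loci are skipped)."""
    per_locus = [
        within_tumor_divergence(genotypes)
        for genotypes in loci.values()
        if any(a != b for a, b in genotypes)
    ]
    return sum(per_locus) / len(per_locus)

# Hypothetical repeat counts for two tumors at three loci (illustration only).
loci = {
    "locus_A": [(10, 14), (10, 14)],
    "locus_B": [(8, 8), (8, 8)],  # homozygous, excluded
    "locus_C": [(5, 7), (5, 9)],
}
divergence = mean_heterozygous_divergence(loci)
per_allele = divergence / 2  # upper bound if the progenitor was homozygous
print(divergence, per_allele)
```

Dividing the per-allele value by the assumed mutation rate per year converts it to a time, which is how 7.8 mutations per locus becomes 7800–78,000 years in the text.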


In this study, we have used microsatellite and aCGH analyses to analyze the origin and evolution of CTVT. These two types of molecular marker give highly concordant results, both showing little variation among extant tumors and considerable divergence from the progenitor (confirmed by sequence analysis to be dog or wolf rather than a more distantly related species). This pattern suggests that the origin of CTVT long predates the common ancestor of extant tumors; indeed, the microsatellite data suggest the difference in age is a factor of about 140, although the confidence limits are large. The estimated time of origin now includes the time when dogs were first domesticated, thought to be about 10,000–15,000 years ago (Savolainen et al. 2002). The fact that CTVT–wolf and CTVT–dog divergence is greater than wolf–dog divergence might suggest that the origin of CTVT even predates the time of domestication. However, it is also possible that continued interbreeding of wolves and dogs since the time of first domestication has made them less divergent than they otherwise would be. We speculate that domestication may initially have been associated with a genetic bottleneck, creating a genetic homogeneity that facilitated the early spread of the tumor (just as genetic homogeneity has been implicated in the spread of transmissible tumors in hamsters [Banfield et al. 1965] and Tasmanian devils [Pearse and Swift 2006]). As dogs then slowly diverged into the numerous breeds we see today, the cancer too would have adapted to allow transmission among genetically diverse hosts. By contrast, the common ancestor of extant tumors is estimated to have existed only a couple of hundred years ago. This short time-span back to the common ancestor is not due to a recent origin, but rather to recent intercontinental spread of a particularly successful variant. A consequence of being asexual is that such a spread homogenizes the entire genome.

The aCGH analyses indicate that CTVT differs not only in chromosome number and morphology from the normal domestic dog karyotype, but also in the copy number of many genomic regions. Some fraction of the very many duplications and deletions observed in the CTVT genome have probably been directly selected, whereas others have been tolerated because the genomic region in question is no longer important. It will be fascinating to investigate which precise regions of the genome have changed in copy number, by higher-resolution aCGH using oligonucleotide arrays, and eventually by whole-genome resequencing.

Our analyses considerably extend the estimated age of this oldest known cell line. This increased time-span increases the scope for adaptation by CTVT to its new way of life as a sexually transmitted pathogen. These adaptations can be expected to include more sophisticated versions of adaptations seen in conventional cancers. It is clear, for example, that CTVT has evolved a number of powerful means for escaping the host immune system (Liu et al. 2008). There should also be adaptations not seen in other cancers, for transmission between hosts. The cycle of infection currently takes about six months; if this has been true for, say, 15,000 years, then there have been 30,000 transmission events back in time, plenty of opportunity for adaptation by natural selection. Parasites are often observed to show adaptations for manipulating their hosts in such a way as to increase their probability of transmission. As CTVT is a sexually transmitted disease, we should therefore expect infected dogs to show greater proceptivity, receptivity, and perhaps even attractiveness for sexual encounters.

Associate Editor: A. Read


We thank the following for providing tissue samples for this study: E. C. Yeoh, M. Dagli, F. Gartner, J. Gilchrist, A. Ortega-Pacheco, B. Rivera, N. Trakrantungsie, and H. Ververidis. We would also like to thank C. Isacke and A. Ashworth for useful discussions, and Dr. H. Parker and Dr. E. Ostrander for contributing their data. This work was supported by the Natural Environment Research Council (NERC) and the Breakthrough Breast Cancer Research Centre, part of the Institute of Cancer Research (ICR). Canine cancer genetics work at NCSU is generously supported by funding from the AKC Canine Health Foundation and the Morris Animal Foundation.