How selective sweeps in domestic animals provide new insight into biological mechanisms


  • L. Andersson

    1. From the Science for Life Laboratory, Department of Medical Biochemistry and Microbiology, Uppsala University; and Department of Animal Breeding and Genetics, Swedish University of Agricultural Sciences; Uppsala, Sweden

Leif Andersson, Department of Medical Biochemistry and Microbiology, Uppsala University, Box 582, SE-75123 Uppsala, Sweden.
(fax: +46 18 4714673; e-mail:


Abstract.  Andersson L (Uppsala University, Uppsala; and Swedish University of Agricultural Sciences, Uppsala; Sweden). How selective sweeps in domestic animals provide new insight into biological mechanisms (Review). J Intern Med 2012; 271: 1–14.

Genetic studies of domestic animals are of general interest because there is more phenotypic diversity to explore in these species than in any experimental organism. Some mutations with favourable phenotypic effects have been highly enriched and gone through selective sweeps during the process of domestication and selective breeding. Three such selective sweeps are described in this review. All three mutations are intronic and constitute cis-acting regulatory mutations. Two of the mutations constitute structural changes (one duplication and one copy number expansion). These examples illustrate a general trend that noncoding mutations and structural changes have both contributed significantly to the evolution of phenotypic diversity in domestic animals. How the molecular characterization of trait loci in domestic animals can provide new basic knowledge of relevance for human medicine is discussed.


Genome research in domestic animals is of considerable importance, both for potential applications in animal breeding and as a model to advance basic knowledge of the genetics underlying phenotypic variation. Humans have lived in close contact with domestic animals for thousands of years. Dogs were the first animals to be domesticated about 15 000 years ago while humans were still hunter–gatherers. Cattle, sheep, goats and pigs were then domesticated about 10 000 years ago at the time when humans became agriculturists and started to cultivate crops and keep animals. Horses and chickens were domesticated a few thousand years later. Domestication of the rabbit is assumed to have been a more recent event, occurring only about 1500 years ago [1].

Since the dawn of domestication, domestic animals have gradually changed phenotype owing to a combined effect of natural selection (the ability to survive and reproduce in captivity) and human-controlled selective breeding. This has created huge phenotypic diversity within domestic animals and genetic adaptation to a variety of climatic conditions and to different production systems. There is more phenotypic diversity to explore in domestic animals than in any experimental organism. For instance, it is hard to believe that all the various dog breeds that exist today have evolved from a single ancestral species (the wolf) within a fairly short time period. The domestication of animals and crops is clearly the most extensive genetic experiment that has been carried out by humans to change the frequency of mutations affecting phenotypic traits. This genetic screen also differs from mutation screens in experimental organisms, as it has primarily resulted in a collection of nondeleterious variants. Therefore, domestication is an excellent model for evolutionary biology. A consequence of the long history of domestication is that we are able to study evolution in action, as alleles evolve by accumulating several consecutive mutations. For example, dominant white colour in pigs is caused by the combined effect of at least two different mutations, a 450-kb duplication involving the entire KIT gene [2] and a splice mutation in one of the two KIT copies that leads to exon skipping [3]. It is very likely that haplotypes associated with complex disorders in humans may also differ by multiple substitutions with functional effects, which will certainly complicate the dissection of genotype–phenotype relationships.

During the course of domestication, humans have cherry-picked mutants with visible effects, while alleles with more subtle effects have been enriched by selective breeding, a process that has been carried out with increasingly sophisticated methods. During the last 50 years, animal breeding based on the theory of quantitative genetics has resulted in remarkable progress in terms of increased productivity. Domestic animals provide unique opportunities to explore genotype–phenotype relationships owing to this long history of phenotypic selection. The primary strength of domestic animals is not as a model for inherited disorders, as there is strong purifying selection against deleterious mutations in most populations, but rather their rich phenotypic diversity, which can now be studied in great detail owing to the development of powerful tools for genomic analysis. Genetic studies of domestic animals are facilitated by (i) the ability to collect very large pedigrees, (ii) the ability to conduct specific breeding experiments, (iii) the fact that the genetic heterogeneity is not as extensive as in humans and (iv) the fact that it is often possible to collect tissue samples of high quality for experimental studies.

The aim of this review is to provide some examples of how the molecular characterization of mutations that have gone through selective sweeps in domestic animals can provide new insights into biological mechanisms of relevance for human medicine.

The example of the white horse

Almost all white horses in the world carry the same dominant mutation causing the coat colour variant named ‘greying with age’ or simply ‘Grey’. Grey horses are born any colour (brown, black, chestnut, etc.) but already during their first year of age they start to grey, and usually by the age of 6–8 years, they are completely white (Fig. 1a). To refer to a white horse as grey is a bit confusing but reflects this gradual loss of hair pigmentation. The white horse is considered charismatic, and this phenotype has had a huge impact on human culture all over the world. The white horse has represented outstanding beauty and nobility. It has often been used in fiction (e.g. the prince on the white horse) and portrayed in art (Fig. 1b), and numerous restaurants and inns have been named ‘The White Horse’. We assume that Grey is an old mutation that occurred thousands of years ago. The oldest written record of white horses we have found is from the Greek historian Herodotus, who noted that the Persian emperor Xerxes kept sacred white horses when he invaded Greece about 2500 years ago. It is easy to imagine the huge impression a herd of white horses would make if you had never seen such animals before. Herodotus’ record suggests that white horses were rare in Europe at that time, but paintings and written documentation show that they were widespread in Europe during mediaeval times, and we know that they were present in the Nordic countries around the year 1000 because Icelandic sagas document that the Vikings brought white horses with them when they colonized Iceland.

Figure 1.

 Greying with age in horses. (a) A Grey horse that has become almost completely white. Photograph: Shutterstock images. (b) Entrance of the knight. Painting by Axel Wallert, 1925. (c) Gene content of the region on chromosome 25 harbouring the Grey locus. The 352-kb region showing complete association with the Grey mutation is indicated by a box. The duplicated region is marked by a vertical arrow. The Grey mutation is associated with the upregulated expression of the NR4A3 transcript as well as two splice forms of STX17 (indicated by horizontal arrows). This figure is adapted from the study of Rosengren Pielberg et al. [6].

The Grey mutation has been spread widely owing to its spectacular phenotype combined with tissue-specific effects on pigmentation with no known negative pleiotropic effects on other tissues. A Grey horse is as strong and as fast as a horse of any other colour; Grey horses perform well on the race track, which is a challenging test of general fitness. The mutation causes premature hair greying but leaves skin and eye pigmentation unaffected. Thus, a white horse has dark skin under its white coat. It is interesting that the white colour is associated with a very high incidence of melanomas approaching 100% in old Grey horses. These melanomas are usually benign, and a Grey horse can live with many nodular melanomas for years. However, in some cases, these develop into a widespread melanoma that leads to premature death of the horse [4]. It is very likely that the same mutation is the cause of both hair greying and predisposition to melanoma because this association is consistent across all breeds in which the Grey allele is present. Unlike fair-skinned humans who are sensitive to damaging ultraviolet (UV) light, Grey horses are not sensitive to UV light as their very dark skin provides good protection against UV light. In fact, Arabian horses are often white and appear well adapted to a hot climate with intense sun light; their white coat is reflective, and their dark skin provides UV protection.

We sought to identify the gene and mutation causing greying with age because we reasoned that the characterization of a dominant mutation causing premature hair greying and predisposition to melanoma development is likely to reveal an interesting biological mechanism. Initially, we used classical linkage mapping based on two large paternal half-sib families in which the two sires were heterozygous Grey (G/g) and all dams were homozygous non-Grey (g/g). This resulted in the assignment of the Grey locus to a 6.9-Mb region on the horse chromosome 25 corresponding to a region on the human chromosome 9 [5]. At this time (in 2005), no horse genome assembly was available so we predicted the gene content of this region on the basis of conserved synteny to the corresponding region in the human genome. No obvious candidate gene with a previously described role in pigmentation biology was identified, suggesting that greying with age is caused by a mutation in a novel gene.

We assumed that all Grey chromosomes were derived from a single ancestral individual, which implied that all contemporary Grey chromosomes share a segment in the vicinity of the Grey mutation that is identical by descent. We identified such an identical-by-descent region comprising 352 kb by genotyping a large number of single nucleotide polymorphisms (SNPs) from this region across a panel of Grey horses [6]. The region contained four genes (NR4A3, STX17, TXNDC4 and INVS; Fig. 1c). None of these four genes had previously been associated with a function in melanocytes. We reasoned that a gene not expressed in Grey melanoma tissue is unlikely to cause greying with age, but expression analysis revealed that all four genes were expressed in Grey melanomas. We then sequenced all coding sequences of the four genes but did not identify any sequence polymorphism that was unique to Grey horses. However, Southern blot analysis combined with PCR analysis and sequencing revealed a 4.6-kb tandem duplication located in intron 6 of STX17 (Fig. 1c). A screen comprising 1000 horses representing 14 breeds demonstrated that this duplication was only found in Grey horses, and all tested Grey horses were either heterozygous or homozygous for the duplication. In a subsequent study (Sundström et al. in preparation), we resequenced the whole 352-kb region from Grey and non-Grey horses, and the duplication was the only unique difference between Grey and non-Grey chromosomes, providing conclusive evidence that this intronic duplication is causing greying with age.
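The identical-by-descent reasoning above can be illustrated with a minimal sketch. It assumes phased haplotypes represented as equal-length strings of alleles at ordered SNP positions; the function name `shared_segment` and the toy data are hypothetical and are not taken from the original study. Note that allele sharing at a single marker may reflect identity by state rather than by descent, so a real analysis would rely on many markers and on allele frequencies in non-Grey horses.

```python
def shared_segment(haplotypes, focal):
    """Return (start, end) marker indices of the contiguous run around
    `focal` in which every haplotype carries the same allele, or None
    if the haplotypes already differ at the focal marker."""
    def identical(i):
        # True when all haplotypes share one allele at marker i
        return len({h[i] for h in haplotypes}) == 1

    if not identical(focal):
        return None
    start = focal
    while start > 0 and identical(start - 1):
        start -= 1          # extend the shared run to the left
    end = focal
    while end < len(haplotypes[0]) - 1 and identical(end + 1):
        end += 1            # extend the shared run to the right
    return start, end


# Toy example: three hypothetical Grey chromosomes typed at eight SNPs
grey = ["AGCTTAGC", "TGCTTAGA", "AACTTAGC"]
print(shared_segment(grey, focal=4))  # (2, 6)
```

Historical recombination erodes the shared segment from both sides, which is why typing more Grey chromosomes tends to narrow the interval.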

The identification of the causative mutation established for the first time a diagnostic DNA test for the Grey locus. The DNA test confirmed our assumption that all Grey horses share a single causative mutation. This is therefore a classic example of a selective sweep where humans must have identified one or a few horses that turned white and used these for breeding. Today, perhaps 10% of the world’s horse population is Grey, and some such as Lipizzaner horses, used by the famous Spanish riding school in Vienna, are characterized by their white colour.

We used the diagnostic test to study genotype–phenotype relationships in a cohort comprising about 800 Lipizzaner horses with detailed phenotypic records. We demonstrated that Grey homozygotes become grey faster than heterozygotes and also show a higher incidence of melanoma [6]. An interesting feature of Grey horses is that many show speckling, which involves a huge number of small pigmented spots that escape the greying process. Such horses are often referred to as flea- or fly-bitten Grey, depending on the size of the pigmented spots. This is remarkable, as the Grey allele shows 100% penetrance. So, why should it leave pigmented spots in this way? Our data revealed that essentially only Grey heterozygotes show these peculiar spots, implying that they represent somatic changes (genetic or epigenetic), which inactivate the causative mutation.

How can we explain that an intronic duplication in a poorly studied gene, with no previously described mutation associated with a phenotype in any other organism, has such a dramatic phenotypic effect as greying with age? STX17 encodes syntaxin 17, a member of the syntaxin family of proteins involved in vesicle transport. Expression analysis using melanoma tissue from Grey heterozygotes revealed that this intronic duplication is a cis-acting regulatory mutation that upregulates two different STX17 transcripts (one long transcript including the entire coding sequence and one short transcript initiated just downstream of the duplication in intron 6) as well as the neighbouring NR4A3 transcript (Fig. 1c). This result was obtained by quantifying the relative expression from the Grey and non-Grey chromosomes in Grey heterozygotes [6]. NR4A3 encodes a member of the NR4A group of the nuclear hormone receptor superfamily [7]; NR4A3, NR4A1 and NR4A2 constitute early response genes that are induced by serum, growth factors and receptor engagement and are implicated in cell mitogenic responses [8]. It is possible that the phenotypic effect of this duplication is primarily mediated by its impact on NR4A3 expression despite the fact that the mutation is located in an intron of STX17.

The expression data indicated that the duplication may act as an enhancer with long-range effects. To test this hypothesis, we developed transgenic zebrafish with a reporter construct containing one or two copies of the Grey duplication, fused to a minimal promoter and a LacZ reporter [9]. This experiment provided a remarkable replication of the characteristics of the Grey mutation. Weak reporter expression was observed using the construct containing a single copy of the Grey duplication, whereas the construct containing two copies mimicking the Grey allele was associated with strong expression in neural crest-derived cells that subsequently developed into melanophores, which correspond to melanocytes in other vertebrates. The results show that the duplication transforms a weak enhancer to a strong melanocyte-specific enhancer in perfect agreement with the horse phenotype. Further insight into the functional role of the duplicated region was obtained using transfection experiments of reporter constructs with both mouse melan-a melanocyte cells and mouse C2C12 myoblasts [9]. This confirmed that the duplicated region contains one or several enhancer elements with tissue-specific effects. The fragment with the strongest enhancer effect contains two sequences with a perfect match to the consensus binding motif for microphthalmia-associated transcription factor (MITF). MITF is a key regulator of transcription in melanocytes, and inactivation of MITF completely abolishes the development of melanocytes [10]. Deletion of the MITF-binding sites in the reporter construct abolished the melanocyte-specific enhancer effect, and inactivation of MITF expression in the transgenic zebrafish silenced the reporter expression [9]. These results now provide a reasonable explanation of why this intronic duplication can have such a tissue-specific effect in melanocytes. 
However, it is still unclear whether the phenotypic effect is primarily caused by the upregulation of STX17 or of NR4A3 or whether it is a combined effect of the two genes. In fact, at the present time, we cannot even rule out the possibility that the duplication of this enhancer element may influence gene expression in other parts of the genome.

Ever since the first description of melanomas in Grey horses in 1903 [11], why the same dominant mutation is associated with both premature hair greying and susceptibility to melanoma has remained an enigma. Our results now provide a plausible explanation. Grey horses are born with normal pigmentation but start to grey during their first year of life. Every time a new hair develops, the melanocytes producing the pigment for the growing hair are recruited from a pool of melanocyte stem cells [12]. This implies that the Grey phenotype is probably caused by a defect in the maintenance of the melanocyte stem cell pool. We propose that the Grey mutation is an activating mutation that causes over-recruitment of stem cells during hair growth, leading to the premature depletion of melanocyte stem cells. This hypothesis is consistent with a characteristic feature of Grey horses, namely that their coat colour darkens before they start to grey. This cyclic process does not occur in dermal melanocytes, and we propose that the activating mutation stimulates melanocyte expansion and therefore predisposes to development of melanoma.

Structural variants contribute significantly to phenotypic diversity in domestic animals

Greying with age in horses is one example of a structural variant causing a phenotype in domestic animals. Another example is the Pea-comb phenotype in chickens. The fully dominant Pea-comb allele drastically reduces the size of the comb and wattles in chickens. Pea-comb is a classic monogenic trait in chickens and was in fact one of the traits used by Bateson in his seminal paper from 1902 when he first demonstrated the principles of Mendelian inheritance in animals [13]. We have identified the causative mutation for Pea-comb using a similar strategy as described for the Grey horse. Pea-comb is caused by a massive expansion of a duplicated sequence located in intron 1 of the gene for the transcription factor SOX5 [14]. The duplicated sequence is 3.2 kb in size and is present in two copies in the wild-type allele, whereas the Pea-comb allele contains about 30 copies. SOX5 is an important transcription factor in all vertebrates and, together with SOX6, has a critical role in the development of cartilage and bone [15]. Sox5 knockout mice die soon after birth owing to developmental defects [16]. The SOX5 protein is highly conserved among vertebrates, and the SOX5 genomic region contains many evolutionarily conserved, noncoding sequences [14], suggesting complex regulation.

We used immunohistochemistry and in situ hybridization to investigate how the Pea-comb mutation affects SOX5 expression during development [14]. SOX5 showed a transient ectopic expression in Pea-comb birds during only a few days of development, around days 6–12, confined to cell layers in which the comb and wattles subsequently develop (Fig. 2). This demonstrates that Pea-comb is a tissue- and temporal-specific, cis-acting regulatory mutation. This ectopic expression of a critical transcription factor, which occurs during a limited period of development, alters the shape of the comb for the entire lifetime of the bird. This finding illustrates how challenging it can be to determine the consequences of a regulatory mutation. It is critical to sample the right tissue at the right time, and it may not be sufficient to carry out RT-PCR analysis of mRNA expression, because the spatial resolution provided by immunohistochemistry or in situ hybridization may be essential for revealing the altered expression in a small subset of cells (see Fig. 2). Why ectopic expression of SOX5 alters the development of comb and wattles remains unclear, but it is plausible that this potent transcription factor affects some cells that have a critical role in the normal development of the comb and wattles. We also do not know how a massive copy number expansion in the large (>200 kb) first intron of SOX5 leads to this ectopic expression, but there are several possible mechanisms. First, this intron contains a large number of evolutionarily conserved sequences, and there are two such elements – conserved among mammals, birds and fish – within 10 kb of the duplicated region [14]. Therefore, it is possible that the massive copy number expansion disturbs the function of these elements, for instance by epigenetic silencing of the region; it is well established that stretches of repeated sequences tend to attract epigenetic silencing [17]. A second possibility is that the duplicated sequence may contain a very weak enhancer that becomes a strong enhancer owing to the copy number expansion, which would thus mimic the mechanism established for the Grey horse. Considering the phenotypic effects in Pea-comb chickens, it is tempting to speculate that the timing of expression for important regulators of development – such as SOX5 – underlies the huge phenotypic difference in facial appearance (shape and size of ears and nose, distance between eyes, etc.) in humans.

Figure 2.

SOX5 immunostaining in Pea-comb and wild-type embryonic heads. (a, d) Schematic drawings of a sagittal view and cross section of an embryonic day (E) 7 chick head. Green indicates SOX5 immunostaining, which is identical in wild-type and Pea-comb birds, and red indicates SOX5 staining in the Pea-comb alone. The planes of the drawings are shown as shaded lines. Scale bar, 1 mm. Fluorescence micrographs of the wattle and comb regions with SOX5 immunostaining and DAPI nuclear staining of E7 wild-type (b, e, i) and Pea-comb (c, f, j) birds. (g, h) Bright-field micrographs of cRNA in situ hybridization for SOX5 mRNA in wild-type and Pea-comb. The positions of the comb and wattle regions shown in panels b, c and e–j are in boxes in the schematic drawings. Scale bars, 100 μm. (k–p) SOX5 immunostaining and DAPI nuclear staining in the comb region of E6, E9 and E12 wild-type and Pea-comb chickens. Insets show schematic drawings of the comb-ridge shapes in wild-type and Pea-comb birds. The positions of the corresponding fluorescence micrographs are in boxes. Scale bar, 100 μm. Ect, ectoderm; e, eye; l, lumen of nostril; m, Meckel’s cartilage; me, mesenchyme; nr, neural retina; o, optic lobe; s, interorbital septum; st, stage according to Hamburger and Hamilton [47]; t, tongue; te, telencephalon; wt, wild-type. From Wright et al. [14].

The Grey horse and Pea-comb chicken are two examples of structural mutations underlying phenotypic traits in domestic animals, and it now appears that structural variants (duplications, deletions, inversions and insertions) contribute significantly to phenotypic diversity among these animals. Table 1 shows a number of structural changes, larger than 100 bp in size, that have been associated with phenotypes in domestic animals. The phenotypic effects of these mutants are probably due to altered gene expression. Deletion of most of the coding sequence of the poorly characterized SH3RF2 gene, associated with growth in chicken, is probably a classical loss-of-function mutation. Several of the mutations listed in Table 1– including Grey and Tobiano in horses, Pea-comb and Dark brown in chicken, Polled in goat and the combined duplication and copy number expansion associated with familial Shar-Pei fever in dogs – constitute structural changes that involve only intronic or intergenic regions. Other mutations, including Dominant white in pigs and sheep and Ridge in dogs, are large duplications including entire genes with flanking noncoding regions. These may represent simple dosage effects, but it is more likely that altered gene regulation is caused by a change in the configuration of regulatory elements in the region. Finally, the common form of chondrodysplasia (short legs) in many dog breeds including dachshund, corgi and basset hound is caused by the expression of a recently acquired insertion of a retrogene encoding fibroblast growth factor 4 (FGF4) [18]. It is interesting that the same gene is also part of the 133-kb duplication causing the hair ridge in Ridgeback dogs [19].

Table 1.   Examples of phenotypic traits in domestic animals in which the causative mutation constitutes a chromosomal rearrangement

Species   Phenotype                 Mutation                     Gene(s) [reference]
Cattle    Still-birth               110-kb deletion              PEG3 [39]
Chicken   Pea-comb                  Copy number variation (a)    SOX5 [14]
Chicken   QTL for growth            19.0-kb deletion             SH3RF2 [27]
Chicken   Dark brown colour         8.3-kb deletion              SOX10 [40]
Dog       Hair ridge                133-kb duplication           FGF3, FGF4, FGF18, ORAOV1 [19]
Dog       Chondrodysplasia          Retrogene insertion          FGF4 [18]
Dog       Wrinkles (b)              16.1-kb duplication          HAS2 [41]
Dog       Startle disease           4.2-kb deletion              SLC6A5 [42]
Goat      Polled (c)                11.7-kb deletion             PISRT1, FOXL2 [43]
Horse     Greying with age (d)      4.6-kb duplication           STX17 [6]
Horse     Tobiano white spotting    ∼40-Mb inversion             KIT [44]
Pig       Dominant white colour     450-kb duplication           KIT [2, 45]
Sheep     White colour              190-kb duplication           ASIP, AHCY [46]

(a) Massive expansion of a duplicated sequence.
(b) This mutation also predisposes to familial Shar-Pei fever, which is a periodic fever syndrome. The duplication shows copy number expansion.
(c) Lack of horn, also associated with intersexuality in males.
(d) This mutation also predisposes to melanoma development.

The structural changes associated with phenotypes in domestic animals show some unique features when compared to the many copy number variations (CNVs) that have been documented in humans. Most of the mutations listed in Table 1 represent changes in single copy sequences, whereas many of the CNVs in humans represent expansions and contractions of duplicated sequences. An important difference is that a majority of common CNVs in a mammalian genome is expected to be selectively neutral and represent regions of the genome in which CNV is tolerated. By contrast, the mutations listed in Table 1 have all been identified based on their association with phenotypic effects. Those CNVs that have been firmly associated with phenotypes in humans (often autism or mental disorders) are large duplications and deletions (often several megabases) with deleterious effects, whereas most of the changes listed in Table 1 are smaller structural changes that alter the phenotype without causing obvious deleterious effects. The emerging picture in domestic animals that structural changes cause phenotypic effects by reorganizing the configuration of regulatory elements strongly suggests that structural changes play a similarly important role during evolution via natural selection and that these rearrangements, more so than the more common expansion and contraction of duplicated sequences (CNVs), may also be relevant for human disease. Much more will be learned about the importance of structural changes in domestic animals and other organisms in the coming years owing to the extensive use of new technologies such as high-throughput sequencing.

Detection and analysis of a quantitative trait locus affecting muscle growth

In humans, family-based linkage analysis has had low power to find genes underlying multifactorial disorders, such as diabetes, whereas it has been very successful at mapping monogenic disease loci. Therefore, genome-wide association analysis has become the favoured approach for complex disorders. By contrast, family-based linkage analysis is usually a powerful tool for mapping quantitative trait loci (QTLs) in domestic animals because one can take advantage of very large family sizes and the possibility to construct specific pedigrees for gene mapping. The major problem with QTL mapping in domestic animals is not the reliable detection of true marker–trait associations but finding the causal mutation following identification of a linked marker. This is attributable to poor map resolution in the absence of a simple one-to-one genotype–phenotype relationship. However, one of the major success stories in this field is the identification of the causal mutation for a QTL affecting body composition (muscle growth versus fat deposition) in the domestic pig.

In the late 1980s, we crossed the European wild boar with large white domestic pigs to generate a three-generation pedigree that could be used both to establish a primary linkage map for the pig and to attempt to identify genes that explain the many striking phenotypic differences between a wild boar and a domestic pig. The phenotypic diversity in the F2 generation was extensive with regard to both monogenic traits such as coat colour [20, 21] and multifactorial traits such as growth and body composition [22]. We noticed that in general the F2 animals became more obese than purebred domestic pigs. The explanation for this lay in the major selection goal in pig breeding during the last 60 years, which has been to increase muscle growth and reduce fat deposition. As a consequence, the proportion of fat in pork chops has been markedly reduced in modern domestic pigs (Fig. 3a). We mapped a QTL with a major effect on body composition to the distal tip of pig chromosome 2 [23]. The allele originating from the domestic pig increased muscle growth, reduced subcutaneous fat depth and increased the size of the heart. This QTL allele has gone through a selective sweep, and the majority of pigs used for commercial meat production in the Western world carry the QTL allele favouring muscle growth.

Figure 3.

 The IGF2 locus in pig and the discovery of ZBED6. (a) Pork chops with marked differences in the relative proportion of fat and muscle tissue. Photograph: Ingemar Hansson. (b) Schematic illustration of the single nucleotide polymorphism in intron 3 of IGF2 underlying a major quantitative trait locus (QTL) effect on body composition in pigs. The mutation disrupts the interaction with the ZBED6 protein. (c) The genomic region on mouse chromosome 1 including the Zc3h11a and Zbed6 genes. This figure is adapted from the study of Markljung et al. [26]. (d) Phylogenetic tree of ZBED6 sequences. Different types of placental mammals are included in this figure and represent a subset of those species for which full or partial genome sequence data are available. This figure is adapted from the study of Andersson et al. [25].

Interestingly, the pedigree data demonstrated that the QTL was imprinted and showed paternal expression. The fact that the QTL was co-localized with the gene for insulin-like growth factor 2 (IGF2), which was the only known paternally expressed gene from this chromosomal region, immediately identified IGF2 as the prime candidate gene for the QTL. Family segregation analysis was used to classify chromosomes as carrying the wild-type or mutant allele at this QTL [24]. Approximately 28 kb from the Insulin–IGF2 region was resequenced from five mutant chromosomes and 10 wild-type chromosomes. The data showed that the five mutant chromosomes were identical by descent as they demonstrated complete sequence identity for a 20-kb region covering most of the IGF2 gene. The data provided conclusive genetic evidence that a single-nucleotide substitution in a CpG island located in intron 3 of IGF2 was the causal mutation for this major QTL [24] and thus constituted a quantitative trait nucleotide (QTN). Causality was proven because we were able to find an ancestral haplotype that was identical to the mutant haplotypes for the entire critical region except at the mutated site. The CpG island harbouring the causal mutation is well conserved, and 16 base pairs including the mutated site show 100% sequence identity among all placental mammals for which sequence data are available [25]. Northern blot and real-time PCR analysis showed that the mutation is a cis-acting regulatory mutation with tissue- and temporal-specific effects. The mutation upregulates IGF2 expression in postnatal skeletal and cardiac muscle but not in prenatal tissue or in postnatal liver. Furthermore, gel-shift analysis indicated that the interaction between an unknown nuclear protein and the mutated site is abolished by the mutation (Fig. 3b).
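The logic of narrowing a resequenced region down to a single candidate site can be sketched as follows. This is only an illustration under simplifying assumptions: haplotypes are aligned, equal-length strings, and the function name `candidate_sites` and the short toy sequences are hypothetical rather than data from the study. A site qualifies when all mutant chromosomes share one allele that is absent from every wild-type chromosome.

```python
def candidate_sites(mutant, wild_type):
    """Return alignment positions where all mutant haplotypes share a
    single allele that no wild-type haplotype carries."""
    sites = []
    for i in range(len(mutant[0])):
        mutant_alleles = {h[i] for h in mutant}
        wild_alleles = {h[i] for h in wild_type}
        # Fixed among mutants and never seen on a wild-type chromosome
        if len(mutant_alleles) == 1 and mutant_alleles.isdisjoint(wild_alleles):
            sites.append(i)
    return sites


# Toy alignment: two mutant and three wild-type chromosomes
mutant = ["GCTCAC", "GCTCAC"]
wild_type = ["GCTCGC", "GCTCGC", "ACTCGC"]
print(candidate_sites(mutant, wild_type))  # [4]
```

An ancestral wild-type haplotype that matches the mutant haplotype everywhere except at one position is the most informative comparison, because it leaves exactly one candidate.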

We used biotin-labelled oligonucleotides, corresponding to the wild-type and mutant sequence, combined with the stable isotope labelling of amino acids in culture (SILAC) method to capture the nuclear protein interacting with the IGF2 QTN [26]. It was surprising that the protein identified was a previously unknown protein unique to placental mammals. It had been overlooked during the annotation of mammalian genomes, including the human genome, as all coding sequences for this unknown protein were located in an intron of another gene denoted ZC3H11A (Fig. 3c), which itself encoded a poorly characterized zinc finger protein. The single exon encodes a protein of about 970 amino acids, and the new protein belongs to the hAT superfamily of DNA transposases, named after hobo from Drosophila, Activator from maize and Tam3 from snapdragon. It contains two N-terminal BED DNA-binding domains and a carboxyterminal hATC dimerization domain. We named the protein ZBED6 because it is the sixth human protein that contains one or more BED domains. An evolutionary scenario emerges whereby ZBED6 has evolved from a domesticated DNA transposon that must have integrated in an ancestral species before the split between monotremes and other mammals, because we find traces of ZBED6, but no conserved open reading frame, within the same intron of ZC3H11A in the platypus and opossum genomes. ZBED6 is conserved among all sequenced placental mammals, and the BED domains show 100% sequence identity at the protein level among species. Thus, ZBED6 must have evolved a critical function after the divergence of marsupials and eutherians but before the radiation of different families of placental mammals (Fig. 3d). Therefore, ZBED6 is an innovation in placental mammals that may have contributed to the evolution of this group.

Luciferase assays, using a construct containing the IGF2 P3 promoter and the ZBED6-binding site from the QTN region, in conjunction with siRNA-based silencing of ZBED6, provided conclusive functional evidence that ZBED6 binds the QTN region and can repress transcription from the IGF2 P3 promoter [26]. Chromatin immunoprecipitation combined with high-throughput sequencing using an anti-ZBED6 antibody and mouse C2C12 myoblast cells revealed ∼2500 putative binding sites for ZBED6. One of these sites overlapped the sequence orthologous to the QTN region in the porcine IGF2 gene. A motif search based on these 2500 putative binding sites (excluding IGF2) resulted in a consensus motif of 5′-GCTCGC-3′, which constituted a perfect match to the mutated site in pigs (GCTCGC mutated to GCTCAC). ZBED6-binding sites are preferentially located within CpG islands and in the vicinity of transcription start sites. Approximately 1200 genes had one or more ZBED6-binding sites located within 5 kb of their start of transcription. A gene ontology analysis showed that this set of 1200 genes was clearly a nonrandom grouping, as they were highly enriched for developmental genes and transcription factors. The results suggested that ZBED6 may have a crucial regulatory role in placental mammals. A major topic for future research will be to further unravel the functional significance of ZBED6 and the mechanism whereby it regulates transcriptional activity.
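The relationship between the consensus motif and the porcine mutation can be illustrated with a minimal sketch. It scans both strands of a sequence for exact matches to 5′-GCTCGC-3′ and shows that the G to A substitution removes the match. The flanking bases and the function names are hypothetical and chosen only for illustration; this is not the motif-search procedure used in [26].

```python
# Consensus ZBED6 motif as reported in the text; everything else is illustrative.
MOTIF = "GCTCGC"

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def find_motif(seq: str, motif: str = MOTIF) -> list[tuple[int, str]]:
    """Return (0-based start, strand) for each exact motif match on either strand."""
    seq = seq.upper()
    rc_motif = reverse_complement(motif)
    hits = []
    for i in range(len(seq) - len(motif) + 1):
        window = seq[i:i + len(motif)]
        if window == motif:
            hits.append((i, "+"))
        # Skip the minus-strand check for palindromic motifs to avoid double counting.
        if window == rc_motif and rc_motif != motif:
            hits.append((i, "-"))
    return hits


# Hypothetical flanking bases; NOT the real porcine IGF2 intron 3 sequence.
wild_type = "AAT" + "GCTCGC" + "TTA"
mutant = "AAT" + "GCTCAC" + "TTA"   # G -> A at the fifth motif position

print(find_motif(wild_type))  # [(3, '+')]
print(find_motif(mutant))     # []
```

An exact-match scan is the simplest possible choice; a position weight matrix would be more appropriate for a degenerate motif, but the text reports only the six-base consensus.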

Next-generation sequencing will unleash the full potential of animal genomics

The three examples presented here all concern favourable alleles that have been highly enriched by strong phenotypic selection (selective sweeps). We reasoned that it should be possible to find such sweeps by whole-genome resequencing based on next-generation sequencing technology. For instance, whole-genome resequencing of pigs used for meat production in Europe would reveal the lack of genetic variation around the IGF2 locus, or resequencing of white horses would reveal an identical haplotype block including the STX17 gene. The classical approach for whole-genome sequencing is to sequence each individual to a depth of between 15× and 30× and then determine the genotype at each polymorphic locus with confidence, because truly heterozygous positions are expected to generate an equal number of reads of both alleles, whereas sequencing errors are expected to be associated with a significant deviation from the expected 1 : 1 ratio for true SNPs. However, to use this approach for a population sample would require a huge amount of sequence data. Assuming that we generate 20× coverage for each individual, use 10 individuals from 10 populations and that the genome size is about 1 Gbp (as in the domestic chicken), this would require the generation of a total of 2000 Gbp of sequence. We therefore proposed an alternative approach in which pools of samples are sequenced and the allele frequency is determined directly from the pooled data [27]. Using this approach and a sequence depth of 5× per pooled sample, we were able to generate reliable estimates of the level of homozygosity in 40-kb sliding windows throughout the chicken genome. Thus, the total amount of sequence data required is reduced about 40-fold. The approach is highly efficient for finding alleles with striking frequency differences between populations, such as those alleles that have gone through selective sweeps, but inefficient for finding rare alleles associated with rare diseases.
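The sequencing budget in this comparison can be checked with a few lines of arithmetic. The sketch below uses only the figures given in the text and assumes one pooled sample per population; the function name is hypothetical.

```python
def total_sequence_gbp(coverage: float, samples_per_population: int,
                       n_populations: int, genome_size_gbp: float) -> float:
    """Total sequence required, in Gbp, for a simple resequencing design."""
    return coverage * samples_per_population * n_populations * genome_size_gbp


# Individual-based design: 20x coverage, 10 individuals from each of 10 populations.
individual_design = total_sequence_gbp(20, 10, 10, 1.0)

# Pooled design: 5x coverage, one pooled sample per population (assumption).
pooled_design = total_sequence_gbp(5, 1, 10, 1.0)

print(individual_design)                  # 2000.0
print(pooled_design)                      # 50.0
print(individual_design / pooled_design)  # 40.0
```

The 40-fold saving comes from two factors: a fourfold lower depth per sample and a tenfold reduction in the number of libraries sequenced per population.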

In a pilot experiment, we sequenced four different populations of broilers (chickens selected for muscle growth), four different layer populations (chickens selected for egg production) and the red junglefowl (the wild ancestral species of the domestic chicken) (Fig. 4). Before 1900, chicken breeds were multipurpose and were thus used for both meat and egg production, but in the early 20th century, specialized broiler and layer breeds were established to avoid the conflict between selection for growth (meat) and selection for reproduction (egg) in the same birds. We generated about 5 Gbp of sequence from each population and identified more than 7 million SNPs, a major advance in chicken genomics [27]. We then estimated the degree of homozygosity in 40-kb sliding windows for each population. It was assumed that these eight populations of domestic chickens were homozygous for the recessive allele causing the yellow skin phenotype. This was apparent from the sequence data, which showed high homozygosity across domestic chickens that peaked just at the BCDO2 locus, which is known to cause the yellow skin phenotype [28]. This provided proof-of-principle evidence that our approach has the power to reveal regions of homozygosity associated with selective sweeps.
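The idea of scoring homozygosity in sliding windows from pooled reads can be sketched as follows. The statistic used here, a pooled heterozygosity computed from the summed major-allele and minor-allele read counts in a window, is one common formulation and is an assumption; the text does not state the exact formula used in [27]. The 20-kb step size, the function names and the read counts are likewise hypothetical.

```python
from typing import Iterable


def pooled_heterozygosity(counts: Iterable[tuple[int, int]]) -> float:
    """Hp = 2 * N_major * N_minor / (N_major + N_minor)**2 over all SNPs in a window.

    Each item is the pair of read counts for the two alleles at one SNP.
    Values near 0 indicate a window that is close to fixation.
    """
    n_major = 0
    n_minor = 0
    for a, b in counts:
        n_major += max(a, b)
        n_minor += min(a, b)
    total = n_major + n_minor
    if total == 0:
        return float("nan")
    return 2 * n_major * n_minor / total ** 2


def sliding_windows(snps, window=40_000, step=20_000):
    """Score windows along one chromosome.

    snps: list of (position, reads_allele_1, reads_allele_2), sorted by position.
    Returns a list of (start, end, Hp) for windows containing at least one SNP.
    """
    if not snps:
        return []
    last_position = snps[-1][0]
    results = []
    start = 0
    while start <= last_position:
        end = start + window
        in_window = [(a, b) for pos, a, b in snps if start <= pos < end]
        if in_window:
            results.append((start, end, pooled_heterozygosity(in_window)))
        start += step
    return results


# Made-up read counts at four SNPs, for illustration only.
example_snps = [(1_000, 5, 0), (15_000, 4, 1), (50_000, 3, 2), (70_000, 3, 3)]
for start, end, hp in sliding_windows(example_snps):
    print(start, end, round(hp, 3))
```

A candidate sweep would then appear as a run of windows with Hp close to zero in the selected populations; how such windows are thresholded is a separate design choice not covered by this sketch.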

Figure 4.

 Experimental design of a whole-genome resequencing experiment in the chicken [27]. CB-1 and CB-2 are two different lines of commercial broilers.

In total, we identified 38 loci in the chicken genome with convincing support for the presence of selective sweeps; 23 of these were shared among all domestic populations, nine were unique to broilers, and six were found only in layers [27]. The most striking selective sweep coincided with the TSHR locus on chromosome 5 encoding the thyroid stimulating hormone receptor. A bioinformatic analysis of the TSHR sequences revealed a nonconservative missense mutation, glycine to arginine at residue 558, associated with the sweep haplotype present in domestic chickens but rare in the red junglefowl. All TSHR sequences from other vertebrates contain glycine at this position. We then screened 271 birds representing 36 populations of chickens with geographic origins ranging from Iceland to China for this missense mutation. Every domestic chicken tested carried at least one copy of the derived allele; seven birds were heterozygous for the mutation, whereas all others were mutant homozygotes. TSHR could thus be termed a domestication locus in chickens. No phenotypic effect associated with the TSHR haplotype has yet been established. However, the finding of a strong selective sweep at the TSHR locus is very interesting because the thyroid system has a crucial role in metabolic regulation, and in particular, TSHR signalling in the brain of both birds and mammals regulates photoperiod control of reproduction [29]. Altered genetic control of reproduction is a characteristic feature of domesticated animals. The functional implications of the TSHR sweep in chickens are currently being investigated.
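The genotype counts from the TSHR screen translate directly into an allele frequency. The short calculation below assumes that all 271 screened birds were domestic chickens and that the seven heterozygotes are the only carriers of the ancestral allele, as the text implies; the function name is hypothetical.

```python
def derived_allele_frequency(n_hom_derived: int, n_het: int,
                             n_hom_ancestral: int = 0) -> float:
    """Frequency of the derived allele from diploid genotype counts."""
    n_birds = n_hom_derived + n_het + n_hom_ancestral
    if n_birds == 0:
        raise ValueError("at least one genotyped bird is required")
    return (2 * n_hom_derived + n_het) / (2 * n_birds)


n_screened = 271           # birds screened, from the text
n_heterozygous = 7         # heterozygous birds, from the text
n_homozygous_mutant = n_screened - n_heterozygous  # 264, assuming all birds are domestic

frequency = derived_allele_frequency(n_homozygous_mutant, n_heterozygous)
print(round(frequency, 3))  # 0.987
```

Under these assumptions, 535 of the 542 sampled chromosomes carry the derived allele, which is consistent with a sweep that is close to fixation across geographically diverse domestic populations.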

We also used the resequencing data to screen for structural changes and identified about 1300 deletions that occurred at high frequency in at least one of the tested populations [27]. A striking feature of this set of deletions was the absence of large deletions; only 16 were larger than 6.4 kb in size. It is very likely that large deletions are kept at low frequency as a result of strong purifying selection in chickens selected for growth or egg production. Only seven of the 1300 deletions clearly affected coding sequences. One of these removed all but the first exon of a novel gene SH3RF2 (SH3 domain containing ring finger 2) that is evolutionarily conserved between birds and mammals; however, the function of this gene is completely unknown. The deletion was fixed in the high growth line, a line selected for high body weight at 8 weeks of age, and was co-localized with a growth QTL detected in an intercross between the high growth and low growth selection lines (see Fig. 4). Further analysis of segregation data in this intercross as well as expression data from the hypothalamus provided strong support for the hypothesis that the deletion is in fact the causal mutation for this QTL on chicken chromosome 20 [27]. These lines of chickens therefore provide a unique animal model to explore the function of SH3RF2 and test the possibility that SH3RF2 affects growth via appetite regulation, which shows dramatic differences between the two lines.

The new sequencing technologies will revolutionize the field of animal genomics, as well as essentially all fields of biology, including human medicine. These methods will be used to produce extensive collections of genome sequences, transcriptome data and epigenetic data. Therefore, we can expect to see a steady stream of new discoveries to explain the genotype–phenotype relationships in domestic animals, similar to the examples presented in this review.

Concluding remarks

Animal genomics will inform human medicine by contributing to a better understanding of genotype–phenotype relationships and biological mechanisms as illustrated in this review. The strength of animal genomics is the very powerful genetic studies that can be carried out to elucidate causal relationships between sequence changes and phenotypic variation, which is very challenging for multifactorial disorders in humans. As illustrated in Table 2, data from different domestic animals are informative for particular human diseases. An increasing number of monogenic disorders in domestic animals have been characterized at the molecular level. The reason for this is that genetic studies are becoming increasingly powerful as SNP chips and next-generation sequencing have become available. Such gene identifications may establish large animal models for human disease that can be used to explore better treatments. As an example, we identified that leucocyte adhesion deficiency (LAD) in dogs is caused by a missense mutation in the gene for integrin beta 2 (ITGB2), the same gene that was previously associated with LAD in humans [30]. This dog model has since been used to develop an efficient gene therapy procedure for this lethal disease [31].

Table 2.   Examples of domestic animals that are particularly informative for various disorders of major clinical importance
Type of disease          Species
Monogenic disorders      All
Metabolic disorders      Chicken, pig, cattle
Cardiovascular disease   Dog, pig
Inflammatory disorders   Dog
Neurological disorders   Dog
Melanoma                 Grey horse, pig, dog
Infectious diseases      Chicken

The domestic animals used for large-scale production of animal protein (cattle, pigs and chickens) are all interesting models for metabolic disorders because body composition and metabolic regulation have been markedly changed by the selection schemes applied in these species. The IGF2 locus in pigs is an excellent example of a major discovery that became possible owing to the massive selective sweep that has occurred at this locus as a consequence of selection for altered body composition (more muscle, less fat). A major advantage with these species is the extensive phenotyping of a large number of animals that is carried out as part of the normal breeding evaluation. The fact that SNP chips and whole-genome sequencing are frequently used in commercial populations provides emerging opportunities for powerful genetic studies using these resources.

The pig is primarily a production animal, but it is also used as an experimental organism for biomedical research [32]. For instance, a line of pigs that develop atherosclerosis owing to a missense mutation in the low-density lipoprotein receptor gene (LDLR) has been established as a model of cardiovascular disease [33]. There is also a very interesting melanoma model in pigs [34]. Essentially, all pigs in this line develop melanoma, but almost 100% of the affected pigs recover from the disease and extensive loss of pigmentation occurs in the skin surrounding the tumour, implying an immune attack on both melanoma cells and normal melanocytes.

The dog is in many respects an ideal model for human disease [35, 36]. It shares many environmental factors with humans, it lives long enough to suffer from chronic disorders, and canine illnesses are often relevant to human clinical conditions. Therefore, there are now extensive ongoing genetic programmes, for instance the European LUPA project [37], that utilize the dog as a model for a variety of human disorders, including monogenic, cardiovascular, inflammatory and neurological disorders as well as malignancies. However, the dog also has its limitations as a genetic model. Some of the most interesting phenotypic variations in dogs occur between breeds rather than within breeds. It would therefore be very interesting to carry out experimental crosses to find out the genetic basis for breed differences, for instance in behaviour. However, it is difficult to carry out such breeding experiments because the dog is not well accepted as an experimental animal. Another limitation with regard to the dog as a model organism is that it is difficult to collect tissue samples from internal organs for functional studies because most dogs are privately owned.

Susceptibility to infectious disease and parasites is of profound importance in domestic animals as such infections cause suffering for the animals and sometimes severe financial loss, in particular in tropical countries. Therefore, the new genomic tools are of considerable interest to establish associations between genetic markers and disease susceptibility and to apply this information within breeding programmes [38]. The domestic chicken is particularly useful for this type of research owing to the very large population sizes and the possibility of conducting pathogen challenge tests in experimental units. This research may lead to new insights into the biological mechanisms underlying disease resistance and susceptibility.

Conflict of interest statement

No conflicts of interest were declared.


Acknowledgements

Sincere thanks are due to Jennifer Meadows for valuable comments on the manuscript. The author’s research programme is funded by the Swedish Foundation for Strategic Research, the Swedish Research Council, the Swedish Cancer Society and Formas.