Genetic diversity in farm animals – a review


S. Weigend, Institute of Farm Animal Genetics, Friedrich-Loeffler-Institut, Hoeltystr. 10, 31535 Neustadt, Germany.


Domestication of livestock species and a long history of migrations, selection and adaptation have created an enormous variety of breeds. Conservation of these genetic resources relies on demographic characterization, recording of production environments and effective data management. In addition, molecular genetic studies allow a comparison of genetic diversity within and across breeds and a reconstruction of the history of breeds and ancestral populations. This has been summarized for cattle, yak, water buffalo, sheep, goats, camelids, pigs, horses, and chickens. Further progress is expected to benefit from advances in molecular technology.


Domestication of animals was an essential step in human demographic and cultural development. Together with the domestication of plant species it laid the foundation of agriculture as we know it today (Diamond 2002). During the subsequent history of livestock, the main evolutionary forces of mutation, selective breeding, adaptation, isolation and genetic drift have created an enormous diversity of local populations. In the last centuries, this has culminated in the formation of many well-defined breeds used for a variety of purposes with differing levels of performance. During the last decades, development of and increased focus on more efficient selection programmes have accelerated genetic improvement in a number of breeds. Artificial insemination and embryo transfer have facilitated the dissemination of genetic material. In addition, progress in feed technology has allowed optimal nutrition, while enhanced transport and communication systems have led to uniform and strictly controlled production environments. As a result, highly productive breeds have replaced local ones across the world. This development has led to growing concerns about the erosion of genetic resources (Food and Agriculture Organization of the United Nations, FAO 2007b). As the genetic diversity of low-production breeds is likely to contribute to current or future traits of interest (Notter 1999; Bruford et al. 2003; Toro et al. 2008), they are considered essential for maintaining future breeding options. According to the FAO, 20% of the roughly 7600 breeds reported worldwide, belonging to 18 mammalian species and 16 avian species, are at risk, and 62 breeds became extinct within the first 6 years of this century (FAO 2007b).

Effective management of farm animal genetic resources (FAnGR) requires comprehensive knowledge of the breeds’ characteristics, including data on population size and structure, geographical distribution, the production environment, and within- and between-breed genetic diversity. Integration of these different types of data will result in the most complete representation possible of biological diversity within and among breeds, and will thus facilitate effective management of FAnGR. These objectives are addressed under one of the four Strategic Priority Areas of the Global Plan of Action for Animal Genetic Resources adopted by 109 countries at the first International Technical Conference on Animal Genetic Resources, held in Interlaken, Switzerland in 2007, and endorsed by the FAO Conference (FAO 2007a).

It is widely accepted that detailed molecular data on within- and between-breed diversity are essential for effective management of FAnGR (e.g. Weitzman 1993; Hall & Bradley 1995; Barker 1999; Ruane 2000; Bruford et al. 2003; Simianer 2005; Toro & Caballero 2005; Toro et al. 2008). However, to date molecular methods only provide a fraction of the data needed to make informed management decisions. Many mechanisms controlling biological diversity are not understood. For instance, the link between functional diversity and diversity as assessed by neutral markers is not clear. Data on the environment in which breeds are raised may be informative regarding their adaptations and facilitate comparisons of their performance levels. Furthermore, demographic data, compiled across political borders, are needed to assess a breed’s risk status (FAO 2007b).
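To make the distinction between within- and between-breed diversity concrete, the following Python sketch computes two textbook summary statistics from allele frequencies at a single neutral marker: expected heterozygosity within each breed, and Nei's G_ST as the between-breed share of total diversity. Neither statistic is named in the text above; they are offered only as one common way of quantifying such diversity. The allele frequencies and the equal weighting of breeds are assumptions made purely for illustration.

```python
from typing import Sequence


def expected_heterozygosity(freqs: Sequence[float]) -> float:
    """Within-breed diversity at one locus: He = 1 - sum(p_i^2)."""
    if abs(sum(freqs) - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    return 1.0 - sum(p * p for p in freqs)


def gst(breeds: Sequence[Sequence[float]]) -> float:
    """Nei's G_ST = (Ht - Hs) / Ht, giving every breed equal weight."""
    n = len(breeds)
    # Hs: mean expected heterozygosity within breeds
    hs = sum(expected_heterozygosity(b) for b in breeds) / n
    # Ht: expected heterozygosity of the pooled (averaged) allele frequencies
    pooled = [sum(b[i] for b in breeds) / n for i in range(len(breeds[0]))]
    ht = expected_heterozygosity(pooled)
    return 0.0 if ht == 0 else (ht - hs) / ht


# Hypothetical two-allele marker: breed A is variable, breed B is fixed.
breed_a = [0.5, 0.5]
breed_b = [1.0, 0.0]
he_a = expected_heterozygosity(breed_a)
he_b = expected_heterozygosity(breed_b)
between_breed_share = gst([breed_a, breed_b])
```

In this invented example a third of the total diversity lies between the two breeds. Real studies average over many loci and usually correct for sample size.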

Here, we review the current state of knowledge regarding the evaluation of biological diversity of the main farm animal species. The sections on demographic characterization and production environment recording focus on data requirements and outline the current state of the availability of these data. This is followed by a review of breed description databases, which briefly describes the available infrastructure for data management and summarizes the types of publicly accessible phenotypic and demographic data. The section on genetic characterization in livestock briefly reviews current knowledge regarding domestication processes and breed diversity at the global and local level for cattle, water buffalo, goats, sheep, horses, pigs, camelids, yak and chickens. The available databases for storage and management of molecular data are the subject of a subsequent section. We conclude with an assessment of the adequacy of the infrastructure necessary for comprehensive analyses of global livestock diversity and outline prospects for the future.

Demographic characterization

Demographic data are fundamental to the assessment of the risk status of livestock breeds – a key step in the strategic planning of FAnGR management. Risk status depends on several factors. First, it is linked to the size and structure of the population. Effective population size (Ne) is the preferred measure for the assessment of risk status (FAO 1992; Gandini et al. 2004); it is approximated on the basis of the size of both the female and the male breeding populations. Knowing the Ne allows the rate of inbreeding, and hence the loss of genetic diversity within the population, to be inferred. Second, risk status depends on current and predicted future population trends. For instance, a rapid downward trend indicates a high level of risk. The third relevant factor is the geographical distribution of the population. A more concentrated population is more vulnerable to localized disasters, such as disease epidemics, than a widespread population. Demographic data obtained at the national level need to be considered in the context of the global demographics of the breeds in question. A breed that is common in other countries is likely to be a lower priority for national conservation. A basic requirement is to know whether a given national breed is genetically distinct or whether it is part of a larger population spread across several countries. In a recently developed classification (FAO 2007b), breeds present in only one country are termed ‘local breeds’ and those present in more than one country are termed ‘transboundary breeds’, the latter being further differentiated into ‘regional’ and ‘international’ transboundary breeds depending on the extent of their distribution. In 2008, 7040 local breeds, 500 regional transboundary breeds and 551 international transboundary breeds were recorded in FAO’s Domestic Animal Diversity Information System (DAD-IS; FAO 2009).
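The text does not state how Ne is approximated from the numbers of breeding males and females. A minimal sketch, assuming the classical sex-ratio approximation Ne = 4·Nm·Nf / (Nm + Nf) and the corresponding expected rate of inbreeding per generation ΔF = 1 / (2·Ne), is given below in Python. Both formulas assume random mating and equal family sizes, so real populations will often have a lower Ne than this calculation suggests. The function names and the herd sizes are hypothetical.

```python
def effective_population_size(n_males: float, n_females: float) -> float:
    """Sex-ratio approximation: Ne = 4 * Nm * Nf / (Nm + Nf)."""
    if n_males <= 0 or n_females <= 0:
        raise ValueError("both sexes need at least one breeding animal")
    return 4.0 * n_males * n_females / (n_males + n_females)


def inbreeding_rate(ne: float) -> float:
    """Expected increase in inbreeding per generation: dF = 1 / (2 * Ne)."""
    if ne <= 0:
        raise ValueError("Ne must be positive")
    return 1.0 / (2.0 * ne)


# Hypothetical breed with 10 breeding males and 200 breeding females.
ne_skewed = effective_population_size(10, 200)
# Hypothetical breed with the sexes balanced at 25 animals each.
ne_balanced = effective_population_size(25, 25)
delta_f_balanced = inbreeding_rate(ne_balanced)
```

The first example illustrates why both sexes have to be counted: 210 breeding animals with a strongly skewed sex ratio give an Ne of only about 38, because the rarer sex limits the result.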

The country, species and breed coverage of DAD-IS is described below. In the case of demographic data, much remains to be done to improve coverage. For approximately 53% of avian national breed populations and 48% of mammalian national breed populations recorded in DAD-IS the data necessary to provide even a basic assessment of risk status are unavailable.

Monitoring of trends in population size and structure is hampered by a lack of regular updates of demographic data. To allow effective monitoring, data should be collected at least once per generation of the species in question, particularly for breeds classified as at risk: about 8 years for horses and donkeys, 5 years for cattle, buffalo, sheep and goats, 3 years for pigs and 2 years for poultry species. The required frequency is also affected by the reproductive technology being used, which should be recorded as part of the monitoring process. For many breeds, particularly in developing countries, even if demographic data are available, they have not recently been updated. The methods used to collect the data affect their reliability, but consideration also needs to be given to the costs involved. Analysis of population data in DAD-IS shows that 87% of entries are based on a census or survey at breed level, while 11% are estimates based on a census at species level.
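The once-per-generation rule can be sketched as a simple staleness check. The intervals in the following Python example are the approximate figures given in the paragraph above. The species keys, the function name and the example years are hypothetical, and the sketch ignores the effect of reproductive technology on the required frequency.

```python
# Approximate update intervals in years, as listed in the text above.
MAX_UPDATE_INTERVAL_YEARS = {
    "horse": 8,
    "donkey": 8,
    "cattle": 5,
    "buffalo": 5,
    "sheep": 5,
    "goat": 5,
    "pig": 3,
    "poultry": 2,
}


def census_overdue(species: str, last_update_year: int, current_year: int) -> bool:
    """Return True if more than one update interval has passed since the last record."""
    interval = MAX_UPDATE_INTERVAL_YEARS[species]
    return current_year - last_update_year > interval


# Hypothetical records last updated in 2004 and checked in 2008.
pig_overdue = census_overdue("pig", 2004, 2008)
cattle_overdue = census_overdue("cattle", 2004, 2008)
```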

Data on the geographical distribution of breed populations are also limited. However, efforts to improve the situation are underway: textual data describing breed distribution that have been entered into DAD-IS are being converted into georeferenced coordinates. More comprehensive georeferencing is regarded as a priority as part of the implementation of a set of production environment descriptors (see below) within DAD-IS (FAO/WAAP 2008).

Key challenges for the future include the development of methods for representative sampling of national animal populations to estimate their total population sizes and other demographic data in a cost-effective manner. Another problem is the lack of measures that capture the genetic dilution caused by crossbreeding (FAO 2007b). It is not always clear whether, and to what degree, historic or recent interactions between breeds have affected their uniqueness. This applies especially to so-called non-descript local populations, which often merge gradually into neighbouring populations. Molecular characterization studies help to unravel such relationships, but need to be better coordinated and the results better combined.

Production environment recording

Descriptions of breeds’ production environments are important for many aspects of FAnGR management. They can be used to make inferences regarding the breeds’ characteristics, based on the assumption that being exposed to particular climates, feed resources and pathogens will over time have led to genetic differences in adaptation to environmental conditions. A comprehensive description of the production environment is also vital for meaningful evaluation and comparison of the performance of different breeds. More broadly, a deeper understanding of production environments – including socio-economic aspects such as markets – can help in the planning of the future use and development of the breeds.

While descriptions of the production environments of individual breeds – varying in their focus and level of detail – can be found, comparisons are difficult; and too often breeds are considered in isolation from their production environments. Efforts have therefore been made to develop a recognized set of ‘production environment descriptors’ to be used throughout the world as a common framework for describing production environments and to provide a basis for recording more detailed production environment data within DAD-IS (FAO 1998; FAO/WAAP 2008). Under the proposed framework, a production environment is divided into two main domains, the management environment and the natural environment. These domains are further broken down into a hierarchy of criteria. Most of the measures required for the natural environment domain (with the exception of the distribution of diseases and parasites) are now available on global high-resolution maps. Overlaying these data with georeferenced breed distributions will allow more comprehensive descriptions and analyses of the production environments. As noted above, georeferencing breed distributions is therefore a priority.

Breed description databases

Creating awareness through information dissemination is considered an important component in conservation and utilization of genetic resources. Accordingly, a number of websites try to address this issue, often from different perspectives. Three groups of databases can be identified:

First, breed societies maintain websites to describe their populations, with the intention of advertising their own genetic resources. Perceived strong points of a breed are emphasized, although not always substantiated through facts and figures. The websites nevertheless give a useful and informative overview of a certain set of breeds, usually including images, although their outreach may be limited by the use of the national language. The website of the Devon Cattle Breeders’ Society may serve as an example. It hosts information on the breed’s history and the breed society, and lists perceived strong points of the breed such as ‘High Daily Weight Gains’ or ‘Longevity’. Interested readers find contact addresses for further information.

Second, after the ‘Convention on Biological Diversity’ was adopted at the United Nations Conference on Environment and Development held in Rio de Janeiro in 1992, national websites were put in place by each country, with complete coverage of those breeds considered to be part of their national heritage. An English version or at least an English introduction is sometimes available. Two examples are the German ‘Central Documentation of Animal Genetic Resources’ (TGRDEU) and the French ‘Bureau des Ressources Genetiques’ (BGR). Again, visually appealing presentation may get more emphasis than inclusion of hard facts.

Third, at the international level, only a few websites are available. The ‘Breeds of Livestock’ website run by the University of Oklahoma describes a respectable number of breeds of livestock, including cattle, goats, horses, sheep, pigs, buffalo, camelids and poultry, with differing degrees of detail. A similar website, solely for cattle, has been compiled by a South African company. It contains phenotypic descriptions for about 140 of 950 listed breeds. As both websites are in English, they are useful for a wide audience.

While the above websites tend to address only within-country biodiversity with little or no factual data on performances and census data, the EAAP (European Association for Animal Production) database – initiated in the 1980s – is based on a questionnaire and contains a large number of factual data items on breeds from all over Europe (Simon 1990). It was the basis of FAO’s DAD-IS, which was redesigned to become FABISnet, a worldwide network consisting of communicating national and regional databases (Groeneveld et al. 2006).

These information systems target true global coverage, as all FAO member countries have agreed to report their breed data to DAD-IS, now the FAO node of FABISnet, through their officially appointed National Coordinators for the Management of Animal Genetic Resources. In contrast to other databases, factual information is stored in more than 200 clearly defined fields, allowing targeted database searches.

Furthermore, the multiple language capability and networking of FABISnet allow the setup of national databases, while ensuring seamless integration into the worldwide network headed at the FAO in Rome. Currently, a network of 13 national systems (Austria, Cyprus, Georgia, Estonia, Iceland, Ireland, Italy, The Netherlands, Poland, Slovakia, Slovenia, Switzerland and the United Kingdom) all over Europe is linked to the European EFABIS node, which in turn is connected to FAO’s DAD-IS. This regional setup can serve as a model for other regions of the world, and FABISnet will likely expand in the near future.

Compared with others, the FABISnet databases are the most comprehensive, with data from 198 countries and territories for more than 14 000 populations from 37 species, including descriptions of morphology, performance, reproduction and demographic data. A unique feature is that the degree of endangerment is automatically computed from the number of male and female breeding animals, or if this is not available, from the total population size. While a large number of breeds have been entered, the completeness of the information still needs improvement.

FABISnet goes well beyond breed descriptions as it is also a repository of documents related to the breeds, their conservation and utilization. While being far from exhaustive in all aspects, these websites provide a wealth of information on the breeds of the world.

Genetic characterization in livestock


Cattle

Traditionally, taurine cattle (Bos taurus) and zebu (Bos indicus) are considered as separate species despite their complete interfertility (Lenstra & Bradley 1999). One of the first contributions of DNA research to a reconstruction of the domestication of cattle was a comparison of the mitochondrial DNA (mtDNA) of taurine and indicine cattle (Bradley et al. 1996). The divergence of their control regions implied separate domestications, which most likely started c. 8000 BC in Southwestern Asia and the Indus valley respectively (Zeder et al. 2006).

Zebus were probably imported into Africa after the Arabian invasions in the 7th century (Bradley et al. 1998). Interestingly, the discovery that African zebus carry taurine mtDNA implies that African zebus were the result of crossing zebu bulls with taurine cows (Bradley et al. 1998). The resulting distribution of taurine, indicine and mixed phenotypes correlates with the Y-chromosomal INRA124 microsatellite alleles (Hanotte et al. 2000), satellite DNA polymorphism and AFLP patterns (Nijman et al. 1999). Microsatellite genotypes allowed a reconstruction of zebu migration routes (Hanotte et al. 2002). In West Africa, zebu introgression is counteracted by the tsetse resistance of the native taurine breeds (Freeman et al. 2004, 2006b; Ibeagha-Awemu et al. 2004).

A comparison of European, Southwest-Asian and Indian cattle reveals a gradual autosomal indicine-taurine cline from India to Anatolia and a sharper cline of the mtDNA and Y-chromosomal markers (Loftus et al. 1999; Troy et al. 2001; Kumar et al. 2003; Edwards et al. 2007a). A meta-analysis of different microsatellite datasets revealed patterns of diversity and taurine–zebu admixture over Europe, South-West Asia and Africa (Freeman et al. 2006a).

In Asia, zebu and taurine cattle dominate in the south and the north respectively. This again established central hybrid zones in China (Cai et al. 2006, 2007; Lai et al. 2006; Zhang et al. 2007a) and Central Asia (Kantanen et al. 2009). More to the north, indicine mtDNA was found in Mongolia (20%), but Japanese and Korean cattle are completely taurine (Mannen et al. 2004). Kikkawa et al. (2003) described male taurine introgression in zebus from Bangladesh and Nepal. Interestingly, of six Nepalese zebus, five carried the expected zebu mtDNA, but one animal originated via the maternal lineage from yak (Bos grunniens).

Following the European discovery of America in 1492, cattle were brought over from Spain and Portugal. Later, Indian zebu cattle were imported to Central and South America because of their adaptation to hot and dry conditions. Because mainly bulls were imported and crossed with Creole cattle, the Brahman zebu breed carries taurine mtDNA, while Brazilian Nellore and Gir carry both taurine and indicine haplotypes (Meirelles et al. 1999). The humpless Creole cattle are thought to be descendants of Iberian imports, but depending on the breed 40–100% of the bulls harbour the zebu Y-chromosome (Giovambattista et al. 2000; Ginja et al. 2010). For Argentinean and Bolivian Creole cattle, autosomal microsatellites indicate 2–5% zebu admixture (Liron et al. 2006b). A network analysis of microsatellite-based genetic distances and model based clustering showed an intermediate position of five Brazilian Creole breeds between modern taurine and Brazilian zebu breeds with 10–20% zebu introgression (Egito et al. 2007).
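Admixture percentages such as those quoted above are, in the cited studies, derived from multi-locus microsatellite data and model-based clustering. As a much simpler illustration of the underlying idea, the following Python sketch uses the classical single-locus estimator m = (p_hybrid − p_1) / (p_2 − p_1), where p_1 and p_2 are the frequencies of an allele in the two parental groups. This is not the method used in the cited papers, and the allele frequencies are invented for the example.

```python
def admixture_proportion(p_hybrid: float, p_parent1: float, p_parent2: float) -> float:
    """Estimate the share of parent-2 ancestry from one allele's frequencies.

    m = (p_hybrid - p_parent1) / (p_parent2 - p_parent1), clipped to [0, 1].
    """
    if p_parent1 == p_parent2:
        raise ValueError("the allele must differ in frequency between parental groups")
    m = (p_hybrid - p_parent1) / (p_parent2 - p_parent1)
    # Sampling error can push the raw estimate outside the valid range.
    return min(1.0, max(0.0, m))


# Hypothetical allele: rare in taurine cattle (0.05), common in zebu (0.85),
# and observed at a frequency of 0.09 in a Creole population.
zebu_share = admixture_proportion(p_hybrid=0.09, p_parent1=0.05, p_parent2=0.85)
```

With these invented frequencies the estimated zebu contribution is about 5%. A single locus is highly sensitive to drift and sampling error, which is one reason published estimates combine many markers.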

Crosses of zebu and taurine with banteng (Bos javanicus), which are wild cattle from Southeast Asia, yield fertile female and sterile male offspring (Lenstra & Bradley 1999). Domestic cattle in Southeast Asia and Indonesia are thought to be of hybrid origin via crossing of zebu with Bali cattle, which is a domestic form of the banteng. Indeed, Kikkawa et al. (2003) and Mohamad et al. (2009) found banteng mtDNA in Indonesian zebus, most notably in the Madura (56%) and Galekan (94%) breeds. The mixed species origin of Indonesian zebus was confirmed by microsatellite analysis (Mohamad et al. 2009). Analysis of mtDNA, Y-chromosomal DNA and microsatellites indicated a purely banteng origin of Indonesian Bali cattle. However, mtDNA and nuclear DNA in a Bali cattle population kept in Malaysia was of mixed zebu-banteng origin (Nijman et al. 2003).

The wild ox or aurochs (Bos primigenius), which is the ancestor of both taurine and indicine cattle, lived in the European forests until its extinction in 1627, so hybridization with domestic cattle originating from Southwestern Asia (Troy et al. 2001) is an obvious possibility. In 59 fossil aurochs bones, Edwards et al. (2007b) found one mtDNA haplogroup (P) in all except one sample, which had a different haplotype (E). Both P and E are distinct from the taurine haplogroup T. This would exclude a recruitment of aurochs cows for use as livestock, but exceptions seem to confirm the rule: the P haplotype is present in less than 0.1% of modern cattle samples (Achilli et al. 2009; Stock et al. 2009), while the related Q and R haplotypes are also found sporadically (Achilli et al. 2008, 2009).

However, the extent to which aurochs contributed to modern cattle via male introgression is not yet clear. Götherström et al. (2005) defined a Y1 haplotype in most North-European breeds and a Y2 haplotype in most other European cattle and in Southwest Asia. Y1 was also found in fossil aurochs remains, but this was not in agreement with later findings (Svensson & Götherström 2008). Bollongino et al. (2008) found Y2 haplotypes in several European samples for which the aurochs origin was verified via the mtDNA P-haplotype, which raises the possibility that Y2 carrying bulls have also descended from aurochs bulls.

Mitochondrial DNA, as well as nuclear polymorphisms, have revealed several other aspects of the early differentiation of taurine cattle. The predominance of one taurine mtDNA haplogroup (T1) in Africa (Troy et al. 2001) and a new haplogroup in Eastern Asia (T4: Mannen et al. 2004; Kantanen et al. 2009) suggested two other regions of domestication. However, complete mtDNA sequences showed that T1 and T4 are closely related to the major T3 haplogroup, so their predominance probably reflects founder effects in Africa and Eastern Asia respectively (Achilli et al. 2009).

The T3 mtDNA haplogroup is predominant in most European breeds and Northern Asia (Kantanen et al. 2009) and is one of the four major haplogroups (T, T1, T2 and T3) in Southwestern Asia. By contrast, haplogroup T1, which is rare in Southwestern Asia, is dominant in African taurine cattle. These observations are in line with a Southwest-Asian origin of European cattle, confirming the paleontological evidence of a gradual introduction of domestic cattle in Europe from Southwestern Asia (Zeder et al. 2006). There are two interesting exceptions to the T3 dominance in Europe. First, four ancient breeds from Tuscany have almost the same mtDNA diversity as found in Southwestern Asia, suggesting an ancient maternal origin and a direct link between Tuscan and Western-Asian cattle (Pellecchia et al. 2007). For the Chianina breed this was confirmed by microsatellite data (European Cattle Genetic Diversity Consortium, unpublished results). Microsatellites also indicated that two other Tuscan breeds, the Maremmana in the south and the Cabannina in the north, have been subject to Podolian and Brown Mountain breed introgression respectively. Cattle east of the Apennines and in Sicily are of the Podolian type and were most likely introduced during the Middle Ages (Felius 1995).

Second, the T1 haplogroup has appreciable frequencies in several Spanish and Portuguese breeds (Cymbron et al. 1999; Miretti et al. 2004; Beja-Pereira et al. 2006; Cortés et al. 2008; Ginja et al. 2010), indicating migration from Africa to the north. This may have occurred either during the Neolithic movement of cattle or later, for instance during the Islamic occupation. Importation of Iberian cattle into the newly discovered American continent explains the relatively high frequency of the T1 haplogroup in Caribbean and South American cattle (Magee et al. 2002; Carvajal-Carmona et al. 2003; Mirol et al. 2003; Miretti et al. 2004; Liron et al. 2006a,b; Ginja et al. 2010).

Autosomal protein polymorphisms (Medjugorac et al. 1994), microsatellite data (Cymbron et al. 2005; Li et al. 2007; Medugorac et al. 2009) and AFLP fingerprinting (Negrini et al. 2007) are in line with a demic expansion of agriculture from southeastern to northwestern Europe. Cymbron et al. (2005) observed that the correlations between genetic and geographical distances are different for Mediterranean and Northern breeds; it is proposed that this reflects the separate Neolithic migrations along the Mediterranean coasts and the Danube respectively. A larger set of microsatellite data (Lenstra et al. 2006b; Lenstra 2008) indeed indicates a separate position of Mediterranean cattle, but divides the Transalpine cattle into two different clusters of breeds: Central-European (Alpine, Southern-French) and Northern European. The separate position of Central-European cattle was also indicated by AFLP data (Negrini et al. 2007). Strikingly, the Northern-European cluster largely coincides with a high diversity of milk protein genes (Beja-Pereira et al. 2003), the distribution of the human lactase persistence alleles and the location of Neolithic cattle farming sites. This led to the suggestion of a gene-culture co-evolution between cattle and humans (Beja-Pereira et al. 2003).

Predictably, SNP data (e.g. Gautier et al. 2007; Svensson et al. 2007; McKay et al. 2008; The Bovine HapMap Consortium et al. 2009) will reveal more about the history of European cattle. AFLP polymorphisms, as proxy for SNP diversity, suggested that relative to microsatellites SNPs emphasize the zebu-taurine divergence and hence also the difference between Podolian and other European cattle (Negrini et al. 2007). Large-scale SNP analysis (Gautier et al. 2007; The Bovine HapMap Consortium et al. 2009) indicated that in several breeds linkage disequilibrium (LD) extends further than in humans, but is hardly detectable at distances over 200 kb. These data also suggested a rapid recent decrease of the effective population size of domestic cattle. Also promising is the differentiation of several Y1 and Y2 haplotypes that as markers of paternal lineages will be informative for introgression and upgrading (Svensson & Götherström 2008; Ginja et al. 2009; Ginja et al. 2010; Kantanen et al. 2009).

Molecular data have also generated information on the history of individual breeds. A major determinant of the genetic constitution of a breed is its degree of isolation from other breeds. For instance, the Jersey is a typical island breed that has been kept isolated since 1789. This has led to a limited degree of inbreeding (Chikhi et al. 2004), but has also preserved unique features. Inbreeding has gone further in two Balearic Island breeds, in a Betizu subpopulation (Martin-Burriel et al. 2007), and in the Spanish Lidia (fighting cattle: Cañón et al. 2008). The most extreme inbreeding has been observed in English Chillingham cattle, which have become almost completely homogeneous by strict isolation of one herd for hundreds of years (Visscher et al. 2001). Often, but not always, genetic isolation has led to phenotypic uniqueness. This has also been the case for the Italian Chianina (see above) and is an obvious argument for conservation.

At the other end of the scale are the several breeds that have been shaped by gene flow from other breeds. For instance, Northern-Russian cattle have been influenced heavily by modern commercial cattle (Li et al. 2007; Kantanen et al. 2009). On a comparable scale, several Scandinavian breeds have been upgraded by the Scottish Ayrshire (Tapio et al. 2006a). A rustic Spanish breed, Serrana de Teruel, was clearly influenced by brown mountain cattle (Martin-Burriel et al. 2007). Several other introgressions have been indicated by a Europe-wide microsatellite dataset (European Cattle Genetic Diversity Consortium, unpublished results). Introgression has been rather extreme for the Portuguese Minhota, which has been upgraded with German Yellow bulls (Felius 1995) to the point that it has become virtually identical to the German breed.

Genotypes from 30 microsatellites for 69 European breeds were used for testing formal criteria for conservation (Lenstra & the European Cattle Genetic Diversity Consortium 2006a). The popular Weitzman method, based on genetic distances, favours highly inbred populations even if these have been derived recently from other populations. Ranking of conservation priorities on the basis of marker-estimated kinships was less influenced by inbreeding, and favoured Mediterranean breeds (Lenstra & the European Cattle Genetic Diversity Consortium 2006a). These breeds indeed have a relatively high degree of molecular diversity, which next to phenotypic uniqueness is an obvious argument for conservation. Moreover, the Busa and Anatolian breeds were considered to be valuable genetic resources on the basis of their high genetic diversity (Medugorac et al. 2009). Conservation priorities of Nordic cattle were analysed by Bennewitz et al. (2006) and Tapio et al. (2006a).


Yak

Yak (Poephagus grunniens) is a bovine species that can hybridize with taurine and zebu cattle and produce fertile females but sterile males (Lenstra & Bradley 1999). It is a unique livestock species on the Qinghai-Tibetan Plateau of western China, in the Mongolian and Russian steppes and in other Himalayan countries (Wiener et al. 2003; Wiener & Jianlin 2005). The state of development of molecular markers and genetic research on the yak was reviewed by Jianlin (2003).

Recently, the genetic diversity of yak has been examined. mtDNA cytochrome b and D-loop sequences revealed two haplogroups within domestic yak, which diverged at least 100 000 years ago (Guo et al. 2006; Lai et al. 2007). Haplotypes of both groups were found in a single, small, wild yak population, thus indicating that the domestic Chinese yak were derived from a single wild gene pool. A domestication event was estimated to have taken place around the early Holocene, within 10 000 years before present (YBP) in Qinghai and Tibet. No pattern of phylogeographical distribution of major clades in Chinese yak sampled from different localities in south-western and north-western China was found (Guo et al. 2006; Lai et al. 2007). A study with intensive sampling of domestic yak from all the yak-keeping countries, including China, Bhutan, Nepal, India, Pakistan, Kyrgyzstan, Mongolia and Russia, revealed a third, less frequent, haplogroup (Qi et al. 2008). Geographical clines in the haplogroup diversity indicated that a single domestication on the Eastern Qinghai-Tibetan Plateau was followed by a westward migration passing through the Himalayan and Kunlun mountain ranges, and northward migration through South Gobi and the Gobi Altai mountains to Mongolia and Siberia.

Cross-species amplification of 136 bovine microsatellite markers revealed a high success rate of up to 95% (Minqiang et al. 2003; Nguyen et al. 2005; Xuebin et al. 2005). Several of these are included in the list of markers recommended by the ISAG/FAO working group for yak (Hoffmann et al. 2004). Using 15 microsatellites, Xuebin et al. (2005) found high genetic diversity within the Mongolian and Russian yak populations. The Gobi Altai, south Gobi and north Hangai populations in Mongolia are closely related, as are the Hovsgol and the Buryatia populations in Mongolia and Russia respectively. These groups of populations should therefore be considered as distinct genetic entities for conservation and breeding programmes.

Cross-species amplification of bovine Y-chromosome specific markers now allows the analysis of paternal lineages (Xuebin et al. 2002). In addition, a complete yak mtDNA genome sequence (Gu et al. 2007) and several bovine SNPs that are also polymorphic in yak will contribute further to the understanding of the genetic constitution of yak populations.

Water buffalo

The domestic water buffalo Bubalus bubalis is thought to have been domesticated in the Indus and Yangtze valley civilizations 5000 years ago (Cockrill 1981). Domestication was also proposed to have occurred in China as early as 7000 years ago (Chen & Li 1989). However, this was not in agreement with mtDNA sequences of ancient remains of the endogenous Bubalus mephistopheles, which did not establish a link with the modern domestic water buffalo (Yang et al. 2008). Representations of buffalo appear on seals of the Indus valley and Mesopotamia from the third millennium BC (Zeuner 1963). The ancestral wild water buffalo Bubalus arnee was common across the Indian subcontinent, but numbers have decreased because of environmental pressures and hybridization with domestic populations. The wild form is now listed as endangered and is thought to survive only in a few areas of India, Nepal, Bhutan and Thailand (Scherf 2000).

Water buffalo have historically been divided into swamp and river buffalo based on morphological, behavioural and geographical criteria. The two types also differ in chromosome number: swamp 2n = 48, river 2n = 50 (Ulbrich & Fischer 1967; Fischer & Ulbrich 1968), because of a telomere-centromere tandem fusion between two chromosomes in river buffalo (Di Berardino & Iannuzzi 1981). River and swamp buffalo will only mate if reared together from calfhood, and while first-generation hybrids are fertile, it has not been confirmed whether fertility persists in subsequent generations (Fischer & Ulbrich 1968). They are sometimes referred to as different subspecies: river as Bubalus bubalis bubalis and swamp as Bubalus bubalis carabanesis. Swamp buffalo bear a closer morphological resemblance to wild buffalo than do river buffalo.

Swamp buffalo are found throughout Southeast Asia and China. There are no recognized breeds, although some geographical populations have local names and have been shown to differ in morphology and environmental adaptation (Chen & Zu 2004). River buffalo are mainly found in the Indian subcontinent and westwards through Southwestern Asia and Mediterranean countries. Buffalo have recently been introduced to Africa, South America and Australia. Well-recognized and morphologically defined river buffalo breeds exist in India and Pakistan, but 70% of river buffalo do not belong to any named breed and are classified as non-descript (Arora et al. 2004). The geographical ranges of river and swamp buffalo overlap in East India and Bangladesh. Sri Lankan buffalo are morphologically similar to swamp buffalo but analyses of chromosome number, microsatellites and mtDNA identify them as river buffalo (Barker et al. 1997b; Lau et al. 1998). Genetic differentiation of both river and swamp populations is of the same order of magnitude as that between well-recognized breeds of other domestic species (Barker et al. 1997b).

Estimates of the time of divergence of river and swamp buffalo vary widely, but all predate the domestication of buffalo. The estimates range between 10 000–15 000 YBP (Barker et al. 1997a), 28 000–87 000 YBP (Lau et al. 1998), more than 700 000 YBP (Tanaka et al. 1995), 1 million YBP (Amano et al. 1994) and 1.7 million YBP (Tanaka et al. 1996). A study of 30 microsatellites found a river-swamp differentiation of 30.2% (Zhang et al. 2007b), and studies of mtDNA found an average sequence divergence of 8.6% for D-loop and 2.6% for cytochrome b (Kumar et al. 2007a), of the same order as the differentiation between Bos taurus and Bos indicus in cattle.
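As an illustration of what an average sequence divergence such as the D-loop and cytochrome b percentages above measures, the following minimal sketch computes the uncorrected proportion of differing sites (p-distance) between two aligned sequences. The function name and the short example sequences are hypothetical, and the cited studies may have applied substitution-model corrections that are not reproduced here.

```python
def p_distance(seq_a, seq_b, skip="-N"):
    """Uncorrected proportion of differing sites between two aligned sequences.

    Sites where either sequence has a gap ('-') or an ambiguous base ('N')
    are left out of the comparison.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differences = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in skip or b in skip:
            continue
        compared += 1
        if a != b:
            differences += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differences / compared


# Hypothetical 10-bp aligned fragments, for illustration only
river = "ACGTACGTAC"
swamp = "ACGTTCGTAA"
divergence = p_distance(river, swamp)
print(f"p-distance: {divergence:.1%}")
```

Averaging this value over all river-swamp sequence pairs would give a between-group divergence of the kind quoted above, under the stated simplifications.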

Initial analyses of a short region of the mitochondrial D-loop found haplotypes shared between river and swamp buffalo, which is consistent with a single domestication event (Lau et al. 1998; Kierstein et al. 2004). However, studies of longer regions of the D-loop and of cytochrome b all support the hypothesis of separate domestications of river and swamp buffalo, probably in the Indus and Yangtze valley civilizations in the second millennium BC (Kumar et al. 2007b; Lei et al. 2007). Research into Chinese swamp buffalo populations revealed two maternal lineages: Swamp A and B. The more common lineage A was found in 81.5% of samples, but lineage B was present in five of the seven populations sampled, with no clear geographical pattern of A and B distribution. The estimated time of divergence of the two lineages was 18 000 YBP, and while both show indications of population expansion, lineage A appears to be a more recent expansion (Lei et al. 2007).

Microsatellite analyses in buffalo have focused on the defined river breeds of India and local swamp populations of China. Most genetic diversity in buffalo lies within breeds, and estimates of the percentage of diversity between populations vary between 2.8% in Chinese swamp populations (Zhang et al. 2007b), 3.4–9.69% in Indian river breeds and local populations (Kumar et al. 2006; Vijh et al. 2008), and 5.7% in Italian, Greek and Egyptian river breeds (Moioli et al. 2001). Most of these values are low compared with other species [7.11% for cattle (MacHugh et al. 1998), 8% in horses (Cañón et al. 2000), 13% in pigs (Martinez et al. 2000)]. This may be because buffalo have not undergone the same degree of isolation and rigorous selection and widespread use of artificial insemination in the creation of established breeds. Mean expected heterozygosity also varies between studies; it was 0.535 in Chinese swamp populations (Zhang et al. 2007b), 0.506 in Southeast-Asian swamp populations (Barker et al. 1997a), 0.71–0.78 and 0.63–0.73 in Indian river populations (Kumar et al. 2006; Vijh et al. 2008), and 0.577–0.605 in river buffalo of Mediterranean countries (Moioli et al. 2001).
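The mean expected heterozygosity values compared above are based on allele frequencies at each locus. A minimal sketch of the basic calculation, gene diversity as one minus the sum of squared allele frequencies and then averaged over loci, is given below. The allele sizes are invented for illustration, and the small-sample correction that published estimates often include is deliberately omitted.

```python
from collections import Counter


def expected_heterozygosity(alleles):
    """Gene diversity at one locus: 1 - sum(p_i^2), without sample-size correction."""
    counts = Counter(alleles)
    n = sum(counts.values())
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def mean_expected_heterozygosity(loci):
    """Average the single-locus values over all loci."""
    return sum(expected_heterozygosity(a) for a in loci) / len(loci)


# Hypothetical allele calls (two per diploid animal) at two microsatellite loci
locus_1 = [120, 120, 122, 124, 120, 122, 124, 124]
locus_2 = [88, 88, 88, 90, 90, 88, 88, 90]

he_1 = expected_heterozygosity(locus_1)
he_2 = expected_heterozygosity(locus_2)
he_mean = mean_expected_heterozygosity([locus_1, locus_2])
print(he_1, he_2, he_mean)
```

Because the value depends on the number and variability of the markers chosen, comparisons between studies that used different microsatellite panels should be read with that in mind.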

The previous systems of grouping buffalo breeds based on morphology and geography (Cockrill 1981) do not correlate well with genetic diversity patterns. AMOVA analysis of microsatellite data found that, when Indian breeds were divided by either geography or morphology, <1% of the genetic diversity lay between groups (Kumar et al. 2006). However, a DA distance tree and principal component and STRUCTURE analyses of microsatellite genotypes in Chinese swamp buffalo populations revealed several geographical clusters: in the upper and middle reaches of the Yangtze valley, in the lower reaches of the valley, in Southern China and in Southwestern China. The first two components of the principal component analysis also divided populations on North/South and East/West axes (Zhang et al. 2007b). The genetic distances between Chinese populations also correlated with the geographical distance between them; one study of Indian populations found no such correlation and another only found it after the removal of a population and several loci that were out of Hardy–Weinberg equilibrium (Kumar et al. 2006; Zhang et al. 2007b; Vijh et al. 2008).
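A correlation between genetic and geographical distance matrices is commonly assessed with a Mantel permutation test. The sketch below is one simple way to do this and is not necessarily the procedure used in the studies cited above. The function names, the number of permutations and the five hypothetical populations placed along a line are all assumptions made for illustration.

```python
import random
from itertools import combinations


def pearson(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def mantel_test(genetic, geographic, permutations=999, seed=0):
    """Return (r, one-sided p-value) for two symmetric distance matrices.

    Population labels of the genetic matrix are shuffled to build the
    null distribution of the correlation coefficient.
    """
    n = len(genetic)
    pairs = list(combinations(range(n), 2))
    geo_vec = [geographic[i][j] for i, j in pairs]
    observed = pearson([genetic[i][j] for i, j in pairs], geo_vec)
    rng = random.Random(seed)
    order = list(range(n))
    at_least_as_large = 1  # the observed labelling counts as one permutation
    for _ in range(permutations):
        rng.shuffle(order)
        permuted = [genetic[order[i]][order[j]] for i, j in pairs]
        if pearson(permuted, geo_vec) >= observed:
            at_least_as_large += 1
    return observed, at_least_as_large / (permutations + 1)


# Hypothetical populations along a transect; genetic distance is made
# exactly proportional to geographical distance for the illustration.
positions_km = [0, 100, 300, 600, 1000]
geographic = [[abs(a - b) for b in positions_km] for a in positions_km]
genetic = [[0.0001 * d for d in row] for row in geographic]

r, p_value = mantel_test(genetic, geographic)
print(r, p_value)
```

With real data the correlation would be well below one, and removing populations or loci, as described for one of the Indian studies, can change whether it reaches significance.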

Most analyses of river buffalo have focused on the minority that form recognized breeds. Inclusion of two local non-descript populations in a microsatellite study showed similar levels of within- and between-population diversity as the recognized breeds (Vijh et al. 2008). Such local populations may be valuable reservoirs of genetic diversity, which is threatened by modern breeding practices. Murrah buffalo are a popular breed, and increased use of Murrah sperm for artificial insemination is decreasing the genetic diversity between buffalo populations (Sethi 2001). In a multidimensional scaling analysis of microsatellite data, Murrah buffalo cluster with several other breeds of northern, central and western India, possibly because of this ongoing admixture. The Toda breed is reared by the Toda tribe in the Nilgiri hills of South India and is both culturally and religiously significant to the tribe, and is also endangered as a result of its low numbers. Microsatellite and mtDNA studies identified the Toda breed as genetically distinct from other recognized breeds and in need of conservation, but Vijh et al. (2008) found that the geographically close local population of Kalasthi buffalo cluster with the Toda breed, demonstrating the importance of considering local populations as well as breeds when deciding on conservation priorities.


Sheep

Sheep (Ovis aries) were domesticated in Southwestern Asia about 12 000 years ago and thus represent one of the earliest livestock animals (Zeder et al. 2006). As with other domestic animals (Bruford et al. 2003), relationships with ancestor species have been investigated via comparison of mtDNA data. Hiendleder et al. (2002) found two haplogroups, A and B, both of which differed from the sequences in any extant Ovis species. The European mouflon (Ovis musimon) carries haplogroup B, but this is a feral form of early European domesticates. Most likely, sheep descend from one or more Asiatic mouflon (Ovis orientalis) populations (Hiendleder et al. 2002).

Several reports have further analysed the geographical distribution of the haplogroups. The most relevant information has been summarized by Meadows et al. (2007). The main haplogroups A and B are both found in Asia, while B dominates in Europe (see also Bruford et al. 2003; Meadows et al. 2005). A high frequency of A in New Zealand resulted from early imports of Indian animals into Australia (Hiendleder et al. 2002). Haplogroup C is less frequent, but has been found in Portugal, Turkey, the Caucasus and China (Tapio et al. 2006b). Haplogroup D, present in Rumanian Karachai and Caucasian animals, is possibly related to haplogroup A. Haplogroup E, which is intermediate between A and C, is also rare and has only been found in two Turkish animals.

This mtDNA diversity with distinct haplogroups is comparable with what is observed in goats and cattle, although the divergence of sheep haplogroups is less pronounced than the taurine–zebu divergence (Bruford et al. 2003). Furthermore, in contrast to the taurine cattle haplotypes, the sheep haplogroups hardly correlate with geographical origin. Different lineages might reflect multiple regions of origin, but another obvious possibility is a coexistence of different maternal lineages in the predomestic population.

By contrast, little variation has been observed in the paternal lineage. One mutation, a SNP in the Y-chromosomal SRY region, has a high frequency in European breeds (Meadows et al. 2004) and is probably of European origin. The microsatellite SRYM18 defines other haplotypes (Meadows et al. 2006), but except for the major haplotypes, these were of low frequency and dispersed over different continents.

A recent study of retrovirus integrations (Chessa et al. 2009) has provided additional information on the introduction of sheep into Europe. A high frequency of one integration or the lack of other integrations indicated an early arrival of the primitive sheep populations (European mouflons, North-Atlantic Island breeds). Another informative retrovirus copy is present in most other European breeds and probably indicates the later arrival of wool-producing sheep. This study also indicated an interesting genetic link of English Jacob sheep with Asian or African populations.

Although several groups have studied the diversity of sheep as revealed by microsatellites, this has provided relatively little insight into the relationship between breeds. One drawback is the unfortunate use of different microsatellite panels, which precludes the combination of datasets. Another drawback is that there is little phylogeographical structure; this is in contrast to the clear correlation of genetic and geographical structure observed in cattle and goats. In a study of 20 European breeds, AMOVA analysis showed that only 1% of the variation is between regions and less than 3% is between seven types of breed (Lawson Handley et al. 2007). In Baltic breeds, Tapio et al. (2005a) found a general lack of differentiation at the breed level.

On the other hand, with eight microsatellites, Buchanan et al. (1994) observed a clustering of three English breeds relative to Merino-type breeds and to Awassi. So far, most data on phylogeographical relationships of breeds came from the EU Econogene project, which analysed 57 breeds with 31 microsatellites (Peter et al. 2007). Separate positions were observed for three clusters of breeds: Southwest-Asian, Southeast-European and Central- and Western-European. Within the last group, there was a weak differentiation of Merino and Alpine breeds. There was also a clear decline of the heterozygosity and allelic richness from Southwestern Asia and Southeastern Europe to the west and the north-west (Peter et al. 2007), reflecting repeated founder effects during the gradual introduction of domestic sheep into Europe.

In another study, independent coordination analysis suggested a separate position of Northern-European short-tailed sheep, which could be divided into a north-western, a north-eastern and a heterogeneous Swedish-Norwegian cluster (Tapio et al. 2005b). Santos-Silva et al. (2008) studied the relationships of Portuguese sheep, which were clearly different from the imported Assaf breed. Cinkulov et al. (2008) analysed genetic differentiation of the Pramenka, an indigenous mountain sheep breed of the Balkans. Furthermore, Gizaw et al. (2007) observed a partial differentiation of three breed groups that had been successively introduced to Ethiopia: thin-tailed, short- and long-fat-tailed and thick-rumped breeds.

The differentiation of European and Asian sheep and the weak geographical structure of European sheep were confirmed by analysis of a 1536-SNP dataset (Kijas et al. 2009). This study also showed a difference between Asian and African populations and a separate position of the North-Atlantic Soay sheep.

The diversity pattern of European sheep breeds, which is clearly more panmictic than observed for cattle and goats, probably reflects a history of cross-breeding promoted by commercial interests (Lenstra 2005). From the 17th century onwards, Merino sheep from Spain were exported to several European countries (Wood & Orel 2001), while English or Texel rams were also popular sires.


Goats

Goats (Capra hircus) were domesticated about 10 000 years ago in Southwestern Asia, thus in the same period and in the same region as sheep. Although the species are of a similar size, goats found their own use because of their adaptation to marginal conditions. Goats most likely descend from the wild bezoar, Capra aegagrus (Naderi et al. 2007, 2008). The information available on mtDNA haplogroups has been summarized by Naderi et al. (2007). More than 90% of goats worldwide carry haplogroup A. Haplogroup B has so far been found mainly in Asia and South Africa, C in Southern Europe, D in Asia, F only in the Sicilian Girgentana breed, and G in Southwestern Asia and Northern Africa. Subgroup B1 is restricted to China and Mongolia. Another subgroup of B is reported to be specific to the Canary Islands, which is possibly due to their genetic isolation since their arrival 3000 years ago (Amills et al. 2004). Data on African goats are relatively scarce.

Haplogroups A–G are all present in the bezoar goat (Naderi et al. 2008). The distribution of the haplogroups suggested that eastern Anatolia and possibly Northern and Central Zagros were the most important domestication centres. The diversity of the C-haplogroup indicated a second domestication on the Central Iranian plateau and in the Southern Zagros, but this domestication centre probably did not contribute significantly to the current domestic goat gene pool.

Mitochondrial DNA haplotypes suggested a genetic link between Southwest-Asian and Iberian goats (Pereira et al. in press) and between Southern/Central American goats and Canarian goats (Amills et al. 2009), both via maritime transport.

The prevalent notion that the geographical structure of goats is weaker than for cattle and sheep (Luikart et al. 2001) rests mainly on the worldwide prevalence of haplogroup A. However, the dispersal of A haplotypes seems to be predomestic, and Y-chromosomal data show considerable geographical partitioning. Three Y-chromosomal haplotypes belong to two haplogroups, Y1 and Y2 (Lenstra 2005; Pereira et al. in press). Y2 has not been found in Switzerland and Germany and is scarce in Italy, while it is predominant elsewhere.

Microsatellites also reveal a high degree of geographical structuring, although incompatibility of datasets again limits the scope of most studies to the regional scale. Barker et al. (2001) found a clear correlation of tree topology and genetic distance for Southeast Asian goats. The largest dataset described so far (Cañón et al. 2006) comprises 45 breeds from Europe and the Middle East. Four discrete groups were found: Middle East, central Mediterranean, western Mediterranean and central/northern Europe. Again there was a decline in allelic richness from south-east to north-west, presumably the result of founder effects that also explain the distribution of Y-chromosomal alleles (Cañón et al. 2006). Geographical structuring of microsatellite genotypes was also reported for goat populations from Burkina Faso (Traoré et al. 2009), India (Rout et al. 2008) and northern Vietnam (Berthouly et al. 2009). Conservation value of Swiss goat breeds on the basis of microsatellite diversity was explored by Glowatzki-Mullis et al. (2008).

The clear phylogeographical structure of European goats probably reflects the style of husbandry. In contrast to the situation of sheep and cattle and with the exception of the widespread use of Swiss dairy animals, goats are of more limited economic importance, and breeding has remained largely a local affair.

In the Econogene dataset, Western Europe was only partially represented. Comparison with Asian and African breeds will probably define additional clusters of breeds. We conclude that further molecular analyses of autosomal and Y-chromosomal diversity of goats offer excellent perspectives to retrieve the history of their domestication and subsequent migrations.


Camelids

The Camelidae family comprises four domesticated species belonging to three genera. The Bactrian camel (Camelus bactrianus) is found throughout Central Asia, and the distribution of the dromedary (Camelus dromedarius) ranges from Central Asia and Southwestern Asia to Northern Africa. The llama (Lama glama) and alpaca (Vicugna pacos) are found in the Andean mountains in South America (Jianlin 2005a,b). All species of the family have the same conservative karyotype (2n = 74) and can produce fertile hybrids between species, both within and even between genera (Skidmore et al. 1999; Potts 2004; Mengoni Goñalons & Yacobaccio 2006; Wheeler et al. 2006). mtDNA sequences and nuclear microsatellite markers support a clear genetic differentiation of wild guanaco (Lama guanicoe) from wild vicuña (Vicugna vicugna) (Stanley et al. 1994; Kadwell et al. 2001; Palma et al. 2002). They also provide genetic evidence for two geographically isolated wild subspecies of the guanaco (L. g. cacsilensis and L. g. guanicoe) (Palma et al. 2002; Gonzalez et al. 2006) and the vicuña (V. v. vicugna and V. v. mensalis) (Palma et al. 2002; Marín et al. 2007). Independent domestications of the llama from L. g. cacsilensis and the alpaca from V. v. mensalis have subsequently been demonstrated (Palma et al. 2002; Wheeler et al. 2006). These occurred 4000–4500 YBP for llama in the South-Central Andes (Mengoni Goñalons & Yacobaccio 2006) or 6000–7000 YBP for both alpaca and llama in the Central Andes (Wheeler et al. 2006). mtDNA analyses recognize the extant wild Bactrian camel as a separate lineage (Jianlin et al. 1999; Ji et al. 2009). Combined mtDNA and microsatellite data further support the recognition of the wild Bactrian camel as a separate subspecies (Camelus gobi or Camelus bactrianus gobi), and suggest different ancestors and separate domestication events for the dromedary and the Bactrian camel (H. Jianlin et al. unpublished data). For the Bactrian camel, domestication took place 4000 YBP in the eastern part of Central Asia (Mason 1984; Peters & von den Driesch 1997; FAO 2007b). For the dromedary, it occurred 4500–5000 YBP in the Southern Arabian Peninsula (Mason 1984; Peters 1997).

Jianlin (2005a) has reviewed the development of camelid microsatellite markers. Recently, Maté et al. (2005) reported an additional four microsatellite markers. Twenty-five markers are included in the current list of markers recommended by the ISAG/FAO working group for both the New and Old World camelids (Hoffmann et al. 2004). So far, these markers have been used only for studies with a regional scope. Jianlin et al. (2004) suggested that the domestic Bactrian camels from China and Mongolia should be considered as distinct populations in conservation and breeding programmes. Nolte et al. (2005) found no evidence for loss of genetic diversity within, and a very low differentiation among, 16 southern African dromedary populations. Mburu et al. (2003) identified two separate genetic entities present in Kenyan dromedaries, namely the Somali dromedary and a group including the Gabbra, Rendille and Turkana populations. Vijh et al. (2007) indicated that there were two distinct genetic clusters in the Indian dromedaries, with the Mewari breed being differentiated from the Bikaneri, Kutchi and the Jaisalmeri breeds.

For the New World camelids, Rieder et al. (2000) found high genetic variation at six microsatellite loci within Swiss New World camelid breeds. Sarno et al. (2001) observed much less variation in an island guanaco population than in the mainland population and a significant genetic differentiation between the two populations in southern Chile. Bustamante et al. (2002) and Maté et al. (2005) reported a high level of genetic diversity in Argentine llamas and guanacos, indicating the Patagonian guanaco to be an important genetic resource for conservation or economic utilization programmes. Sarno et al. (2004) detected higher levels of microsatellite allelic diversity in V. v. mensalis than in V. v. vicugna in Bolivia and Chile.

The structure and organization of the D-loop region of four South American camelid species in Argentina were reported by Maté et al. (2004, 2007), and a high degree of heteroplasmy was found. Complete mtDNA sequences and structure are available for alpacas (Ursing et al. 2000; Arnason et al. 2004), a dromedary (16 643 bp) and domestic and wild Bactrian camels (Cui et al. 2007). Genomic sequence data with 2× coverage (O’Brien et al. 2008) and the identification of 1516 microsatellite loci (Reed & Chaves 2008) as well as 750 000 SNP markers of alpaca will facilitate further studies of the diversity of camelids.


Pigs

Molecular data have shed light on pig domestication by tracing mtDNA. Initial mtDNA studies showed that European and Chinese pigs were domesticated independently from European and Asian subspecies of wild boar (Giuffra et al. 2000), but later studies suggested at least seven domestication events across Eurasia (Larson et al. 2005) and East Asia (Wu et al. 2007). These studies also suggested the occurrence of introgression of Asian domestic pigs into some European breeds during the 18th and 19th centuries. Larson et al. (2007) demonstrated that domestic pigs of Near Eastern ancestry were introduced into Europe during the Neolithic, and that the European wild boar was also domesticated by this time. Once domesticated, European pigs rapidly replaced the introduced domestic pigs of Near Eastern origin throughout Europe.

Y-chromosomal variation demonstrated the existence of two highly divergent and ancient lineages, with an estimated divergence time of c. 0.33 Myr, i.e. in the order of the species age (O. Ramírez, personal communication). A recent study (Ramírez et al. 2009) based on microsatellite, mtDNA and Y-chromosomal data has confirmed the divergence of East-Asian and European pigs. In both regions, wild and domestic populations were found to be related to each other. All three marker types showed that Southwest Asian, African and American pigs were most closely related to the European population, but East-Asian mtDNA and Y-chromosomal haplotypes occurred in East-African and Nicaraguan populations. Anglo-Saxon and African local pigs, and especially the international breeds (e.g. Large White, Landrace and Pietrain), are of mixed European-Asian origin. The almost complete predominance of the HY1 Y-chromosomal haplotype in Europe, including in the international breeds, and in Southwestern Asia, argues against male-mediated introgression and suggests that Chinese introgression in British breeds was mainly maternal.

Fang et al. (2009) investigated genetic variation in the melanocortin 1 receptor (MC1R) gene among 15 wild and 68 domestic pigs from both Europe and Asia to address why coat colour is so much more variable in domestic animals than in their wild ancestors. They found that all mutations were silent in wild animals, suggesting purifying selection, but nine of ten mutations found in domestic pigs resulted in altered protein sequence, suggesting that early farmers intentionally selected for novel coat colour.

Across the world, nearly 400 breeds have been exploited, the largest number of breeds being found in Asia and Europe. In a collaborative EU project (PigBioDiv1), 58 European populations, including local breeds, national varieties of international breeds, privately owned commercial populations, and the Chinese Meishan breed as an outgroup were genotyped for 50 microsatellites and 148 AFLP markers. Data from 11 breeds included in the PiGMaP study (Laval et al. 2000) were also included. The microsatellite data showed that the individual breed contributions to between-breed diversity ranged from 0.04% to 3.94% of the total European between-breed diversity, and that the local breeds accounted for 56% of the total, followed by commercial lines and international breeds (Ollivier et al. 2005). They also applied a cryopreservation potential criterion as proposed by Weitzman (1993), taking into account the risks of extinction. SanCristobal et al. (2006a), analysing the same data, showed a clear structure of the European pig breeds with an FST value of 0.21. With the exception of five local breeds, the between-breed general structure exhibited a star-like tree with no visible phylogenetic relationship between the local and the main international breeds. Even the inclusion of the Chinese Meishan breed as an outgroup did not allow the tree of European breeds to be rooted. SanCristobal et al. (2006b) proposed that AFLPs produce diversity patterns similar to microsatellites and can be combined with microsatellite data. However, Foulley et al. (2006) highlighted the problems arising in the analysis of these types of markers and suggested that AFLPs are more sensitive than microsatellites to selection and/or other forces.
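An FST value such as the one reported above expresses the fraction of total gene diversity that lies between populations. The following minimal sketch uses the simple definition FST = (HT - HS) / HT at a single locus with equally weighted populations. It is an illustration under those assumptions and not the estimator applied by SanCristobal et al. (2006a). The function names and allele frequencies are hypothetical.

```python
def gene_diversity(freqs):
    """Expected heterozygosity from a list of allele frequencies."""
    return 1.0 - sum(p * p for p in freqs)


def fst(pop_freqs):
    """FST = (HT - HS) / HT for one locus.

    pop_freqs holds one list of allele frequencies per population,
    with alleles in the same order in every list. Populations are
    weighted equally and no sample-size correction is applied.
    """
    k = len(pop_freqs)
    hs = sum(gene_diversity(f) for f in pop_freqs) / k
    n_alleles = len(pop_freqs[0])
    mean_freqs = [sum(f[a] for f in pop_freqs) / k for a in range(n_alleles)]
    ht = gene_diversity(mean_freqs)
    return (ht - hs) / ht


# Hypothetical biallelic locus in two breeds with contrasting frequencies
breed_a = [0.2, 0.8]
breed_b = [0.8, 0.2]
fst_value = fst([breed_a, breed_b])
print(round(fst_value, 2))
```

In this toy case the within-breed diversity is 0.32 and the total diversity 0.5, so just over a third of the diversity lies between the two breeds.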

Amaral et al. (2008) evaluated LD and haplotype block structure in 15–25 individuals from each one of 10 European and 10 Chinese breeds genotyped for 1536 SNPs in three genomic regions. The LD extends up to 2 cM in Europe and up to 0.05 cM in China. The authors suggest two possible explanations: either European ancestral stock has a higher level of LD, or modern breeding programmes have increased the extent of LD in Europe. The haplotypic diversity has also been studied in other material, focusing on the IGF2 gene (Ojeda et al. 2008).
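The extent of LD between two SNPs is often summarized by the squared correlation r² between alleles. A minimal sketch of that calculation from phased two-locus haplotypes follows. The assumption of known phase, the 0/1 allele coding, the function name and the example haplotypes are all illustrative and do not reproduce the analysis of Amaral et al. (2008).

```python
def ld_r_squared(haplotypes):
    """Squared allelic correlation r^2 between two biallelic SNPs.

    haplotypes is a list of (a, b) tuples, one per phased chromosome,
    with alleles coded 0 or 1 at the first and second SNP. Both SNPs
    must be polymorphic in the sample.
    """
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n
    p_b = sum(b for _, b in haplotypes) / n
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b  # coefficient of linkage disequilibrium D
    return d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))


# Hypothetical sample of ten phased chromosomes
sample = [(1, 1)] * 4 + [(1, 0)] + [(0, 1)] + [(0, 0)] * 4
r2 = ld_r_squared(sample)
print(round(r2, 2))
```

Plotting such pairwise values against the map distance between markers shows how quickly LD decays, which is the comparison drawn above between European and Chinese breeds.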

The ongoing project PigBioDiv2 covers 50 Chinese breeds and mtDNA and Y-chromosomal regions in addition to the microsatellite data of the European breeds. Trait gene loci and markers will be analysed to seek insight into the functional differences between breeds. The first results on microsatellites using pooled DNA samples indicate that the Chinese breeds show a higher degree of genetic variability than the European breeds both within and between breeds (Megens et al. 2008).

Detailed studies have also been carried out for local breeds in several countries. The important local Iberian breed was analysed using microsatellite markers by Fabuel et al. (2004), who compared different methods, such as Weitzman, cluster analysis or optimal contributions, to establish conservation priorities. Moreover, Alves et al. (2003) showed that, unlike other European breeds, the Iberian breed has not been introgressed with Asian mtDNA. Finally, an allelic richness analysis indicated that the desirable integration of allelic richness into the diversity theory currently poses some unsolved difficulties (Rodrigáñez et al. 2008).


Horses

A total of over 100 distinct equine mtDNA haplotypes have been described in multiple studies focusing on the domestication of horses in general, or on the origin of specific breeds (Bowling et al. 2000; Vila et al. 2001; Jansen et al. 2002; Kavar et al. 2002; reviewed in Kavar & Dovc 2008). Joint analyses showed that, in contrast to the double broomstick topology of mtDNA networks of the principal livestock species (Troy et al. 2001; Bruford et al. 2003), the equine mtDNA network shows a typical star-like branching structure (Jansen et al. 2002; Kavar & Dovc 2008). In a dataset comprising extant horse breeds, as well as wild horses from 12 000 to 28 000 years ago, an unexpectedly high genetic divergence between clades was found (Vila et al. 2001). Analyses of DNA from horses of Chinese tombs from the 3rd century BC (Keyser-Tracqui et al. 2005) and from the Bronze Age (Lei et al. 2009) showed that the high mtDNA diversity is of ancient origin. Clearly, the divergence of horse mtDNA must have predated domestication, which on the basis of archaeological evidence has been dated at around 6000 YBP in a broad region of the Eurasian Steppe. This is consistent with the notion that capture and exploitation of wild mares took place independently in multiple locations over a broad time span (Lister 2001; Vila et al. 2001; Hill et al. 2002; Kavar & Dovc 2008). Apparently, the know-how required for domestication, rather than the domestic animals themselves, spread from one region to the next, challenging the suggestion that the domestication process was confined to a restricted area. However, the horse domestication scenario has recently been complicated by the analysis of matrilines from Lusitano and Sorraia populations, which suggests a role of the Iberian Peninsula as a glacial refugium and a possible second centre of horse domestication (Lopes et al. 2005). Interestingly, analysis of fossil remains showed that domestication of horses from 5000 YBP onward was followed by the spread of mutations resulting in a large variety of coat colours (Ludwig et al. 2009).

AMOVA analysis of 72 populations from Europe, Southwestern Asia, Eastern Asia and Africa revealed a non-random distribution of diversity among populations and a clear, although weak, geographical partitioning of mtDNA variation (McGahern et al. 2006). In a few instances, mtDNA has provided evidence for the origins of specific horse breeds. Luis et al. (2006) found Iberian haplotypes in New World breeds with a high frequency, which is in line with historic evidence for the origin of American horses, while Yang et al. (2002) identified Mongolian haplotypes in the Korean Cheju breed. The association of haplogroup F with Eastern Asia was proposed as an argument for a Chinese domestication of the haplogroup (Lei et al. 2009).

Analysis of Y-chromosomal data supported a strong sex-bias in the domestication process. Lindgren et al. (2004) screened 14.3 kb of non-coding Y chromosome sequence in 52 male horses of 15 different breeds and did not identify a single segregating site. Even though their observations cannot exclude the possibility that Y-chromosomal variation was low before domestication took place, their results strongly suggest that only a few stallions have contributed genetically to the domestic horse.

Several studies have compared horse breeds or assessed the genetic structure of single breeds on the basis of microsatellites. Most of these targeted local breeds and used their own marker panel, meaning that data from different studies cannot be compared directly. Thus, for many breeds, data on genetic diversity are available, but insights into breed relationships are still fragmentary.

The so far unrealized potential of a standardized microsatellite panel for the elucidation of breed relationships is illustrated by three well-supported clusters of two riding breeds (Arabian, Hanoverian), two ‘primitive’ breeds (Exmoor and Sorraia) and six German cold-blooded breeds (Aberle et al. 2004). Similarly, Bigi et al. (2007) found, using only 12 markers, significant clustering of the Thoroughbred and Anglo-Arabian breeds and of Haflinger, Italian heavy draught and Bodaglino. Based on 17 protein and 12 microsatellite markers, Luis et al. (2007) reported eight breed groups among 33 breeds, of which four groups were well supported (Andalusian with Lusitano; Friesian with two pony breeds; Morgan, Standardbred, Rocky Mountain and American Saddlebred; Irish Draught, Quarter Horse, Hanoverian, Holsteiner and Thoroughbred). Microsatellites have also been used to assess possible origins of specific horse breeds. For instance, Kakoi et al. (2007) found evidence for a Mongolian origin of Japanese breeds. Evidence for a relationship of Mongolian and Norwegian breeds on the basis of 26 microsatellites was only incomplete (Bjornstad et al. 2003), but is consistent with the morphological appearance of the Nordic breeds.


Among poultry species, chickens are the most important and provide a major source of human food. The red jungle fowl (Gallus gallus) is believed to be the progenitor of the domesticated chicken and has its widest distribution in East Asia, from Pakistan through China, Eastern India, Burma, most of Indo-China, and on the islands of Sumatra, Java and Bali (Crawford 1990). As in other livestock species, sequence variation in mtDNA, in particular in the highly polymorphic control region, has been used to study domestication events and relationships in the chicken. First results with representatives of each of the four wild Gallus species, domestic chickens from Indonesia and two commercial breeds suggested that domestic chickens descend from only one species, Gallus gallus, and that a single domestication event took place in Thailand and its adjacent regions (Fumihito et al. 1996). Subsequent studies of samples from various regions in Europe and Asia suggested multiple origins of domestic chickens in South and Southeast Asia, which is consistent with archaeological data (West & Zhou 1988; Liu et al. 2006; Oka et al. 2007). Moreover, whole mtDNA sequences and two nuclear markers revealed that, besides Gallus gallus, Gallus sonneratii and Gallus lafayettii might have also contributed to the genetic make-up of contemporary domesticated chickens, although to a lesser extent (Nishibori et al. 2005). Recently, Eriksson et al. (2008) provided further evidence of a hybrid origin of the domestic chicken. They studied sequence variation of the BCDO2 gene in domestic chickens and closely related wild species. BCDO2 encodes beta-carotene dioxygenase 2, which cleaves colourful carotenoids to colourless apocarotenoids and is an obvious candidate gene for skin colour.
Sequence comparison revealed that yellow skin, a common feature of many breeds of domestic chicken, does not originate from the red jungle fowl (Gallus gallus), but is most likely from the grey jungle fowl (Gallus sonneratii), a wild relative of domestic fowl found in India. A study of African domestic chickens revealed the presence of two maternal lineages among Zimbabwean, Sudanese and Malawian chickens, one of Southeast Asian and the other of presumably Indian origin (Muchadeyi et al. 2008). mtDNA analyses also showed that modern Chilean breeds, presumed to be of Polynesian origin (pre-Columbian), are actually of Indo-European and Asian origin. Ancient mtDNA haplotypes found in pre-Columbian archaeological chicken remains on Easter Island support the theory of early Polynesian/Pacific chicken transport. Either these haplotypes never reached South America, or they were subsequently displaced by new introductions (Gongora et al. 2008).

Since domestication, chickens have been distributed throughout various countries, continents and cultures. As a result of many years of adaptation and breeding, a wide range of chicken breeds exist today. These encompass more or less unselected indigenous chickens and ecotypes from various regions in the world, standardized fancy breeds selected for morphological traits and maintained for leisure activities, and experimental and commercial lines. An increasing number of local chicken breeds are under threat of extinction, and valuable genotypes and traits may be at risk of being lost (Blackburn 2006).

Insight into the extent of diversity of chicken breeds worldwide has been gained using microsatellites in numerous studies (Wimmers et al. 2000; Berthouly et al. 2008; Chen et al. 2008), including the European research project AVIANDIV and follow-up studies (Rosenberg et al. 2001; Hillel et al. 2003; Granevitze et al. 2007, 2009). Overall, results suggest that Jungle Fowl populations and traditional unselected breeds are widely heterogeneous populations, which include a large portion of the total genetic diversity. Within commercial chickens, broiler lines were slightly more polymorphic than layers. Among the layers, the white layers were less polymorphic than the brown layers. In recent years, there has been concern about reduced genetic variability in commercial white egg layers that have originated from a sole breed, the Single Comb White Leghorn. Although findings of the AVIANDIV project support this concern to some extent, commercial lines still exhibit a considerable amount of variation at microsatellite loci.

Hillel et al. (2007) undertook a large-scale analysis of 2000 individuals from 65 populations representing different chicken types from various geographical regions. Individuals were genotyped at 29 microsatellite loci. Model-based clustering (as implemented in Structure; Pritchard et al. 2000) indicated that the 65 populations split into groups corresponding to their geographical origin and cultivation history, i.e. Asia, Europe and Africa (Hillel et al. 2007). Using the same dataset, Granevitze et al. (2007) showed that the degree of polymorphism varies between clusters. The relatively low genetic diversity observed in the native European breeds, mainly standardized fancy breeds, presumably resulted from positive assortative mating and small effective flock size. By contrast, native populations from Africa and Asia had high genetic diversity and did not show a typical population structure. Differentiation was only observed between populations from distant areas and countries (Muchadeyi et al. 2007; Mwacharo et al. 2007; Berthouly et al. 2008; Chen et al. 2008). Rosenberg et al. (2001) demonstrated that it was possible to assign individuals to their correct breeds with 90% efficiency based on only 12 microsatellite markers genotyped in 30 animals from each of 20 diverse chicken populations. By increasing the number of loci used to 24, accuracy was close to 97% (Rosenberg et al. 2001). Furthermore, small-scale studies analysing only a few local Italian or Japanese fancy breeds showed that these breeds can be genetically identified and that they generally display low genetic diversity (Tadano et al. 2007, 2008; Zanetti et al. 2007).
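The idea of assigning an individual to a breed from its multilocus genotype can be sketched with a simple frequency-based likelihood. This is not the model-based clustering of Structure, nor necessarily the procedure used by Rosenberg et al. (2001); it is a simplified illustration that assumes Hardy-Weinberg proportions within breeds, independence between loci, and an arbitrary floor frequency for alleles not seen in a reference breed. All names, allele sizes and breeds below are hypothetical.

```python
import math
from collections import Counter


def allele_frequencies(genotypes):
    """Per-locus allele frequencies from a list of individuals.

    Each individual is a list of (allele, allele) tuples, one per locus.
    """
    n_loci = len(genotypes[0])
    freqs = []
    for locus in range(n_loci):
        counts = Counter(a for ind in genotypes for a in ind[locus])
        total = sum(counts.values())
        freqs.append({a: c / total for a, c in counts.items()})
    return freqs


def log_likelihood(individual, freqs, floor=0.01):
    """Log-likelihood of a genotype under Hardy-Weinberg proportions.

    `floor` is an assumed frequency for alleles absent from the reference.
    """
    ll = 0.0
    for (a1, a2), f in zip(individual, freqs):
        p, q = f.get(a1, floor), f.get(a2, floor)
        ll += math.log(p * p if a1 == a2 else 2 * p * q)
    return ll


def assign(individual, reference):
    """Return the reference breed with the highest genotype likelihood."""
    freqs = {breed: allele_frequencies(g) for breed, g in reference.items()}
    return max(freqs, key=lambda b: log_likelihood(individual, freqs[b]))


# Hypothetical reference genotypes at two microsatellite loci.
reference = {
    "breed_A": [[(100, 100), (200, 202)], [(100, 102), (200, 200)]],
    "breed_B": [[(104, 104), (206, 206)], [(104, 106), (206, 208)]],
}
print(assign([(100, 100), (200, 200)], reference))
```

With more loci, the per-locus log-likelihoods accumulate, which is one intuitive reason why assignment accuracy tends to rise as markers are added.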

For 28 of the 30 FAO-recommended microsatellite markers, data for around 100 breeds are currently published. Additional studies, including of West African, South African and Vietnamese breeds, are underway (M. Tixier-Boichard and H. Jianlin, personal communication; S. Weigend, unpublished data). Although merging microsatellite datasets generated in different laboratories is often problematic, Berthouly et al. (2008) succeeded, after calibrating 14 of 22 markers, in combining genotypes from different laboratories.

SNPs have now also become a well-established genetic marker system. In the AVIANDIV project, one SNP per 50 bp was found on average in a subset of ten highly diverse chicken populations (10 individuals per population, Schmid et al. 2005). This frequency is higher than that found by comparing different domestic breeds (Wong et al. 2004). The high frequency found in the AVIANDIV project presumably reflects the wide genetic spectrum of chicken breeds collected. SNP arrays with over 2500 informative SNPs in commercial chicken lines and other resource populations indicated that individual commercial breeding lines have lost 50% or more of their genetic diversity. Only a limited fraction of this loss can be recovered by combining all stocks of commercial poultry (Muir et al. 2008). However, it appears that modern breeding was not the primary source of this loss of alleles, and that many alleles were lost prior to the formation of the current industry.

Andreescu et al. (2007) assessed the extent of LD in nine commercial broiler breeding populations using genotype data for 959 and 398 SNPs on chromosomes 1 and 4 respectively. Results showed that in these lines LD did not extend much beyond approximately 0.5 cM, which is shorter than previously reported for other livestock species. However, it seems to be much larger in White Leghorn-based breeds. Within 1 cM, LD tended to be consistent across related populations. Calculating the correlation of LD between neighbouring SNPs within and between populations closely matched the line relationships based on marker allele frequencies. Thus, there are indications that this approach is equivalent to estimating kinship coefficients, and it might also be of interest for other livestock species. As in other farm animal species, initiatives are underway to develop genomic 60 K SNP arrays for chickens.
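As a reminder of how pairwise LD is commonly quantified, the sketch below computes the squared correlation r² between two biallelic SNPs. It assumes phased haplotypes coded 0/1, which is a simplification: the cited study worked from genotype data, and its exact estimator is not reproduced here. The function name and toy haplotypes are hypothetical.

```python
def r_squared(haplotypes):
    """Squared correlation (r^2) between two biallelic SNPs.

    `haplotypes` is a list of (allele_at_snp1, allele_at_snp2) tuples,
    with alleles coded 0 or 1. Phase is assumed to be known.
    """
    n = len(haplotypes)
    p_a = sum(h[0] for h in haplotypes) / n          # frequency of allele 1 at SNP 1
    p_b = sum(h[1] for h in haplotypes) / n          # frequency of allele 1 at SNP 2
    p_ab = sum(1 for h in haplotypes if tuple(h) == (1, 1)) / n
    d = p_ab - p_a * p_b                             # disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both SNPs must be polymorphic")
    return d * d / denom


# Hypothetical toy haplotypes.
complete_ld = [(1, 1), (1, 1), (0, 0), (0, 0)]
no_ld = [(1, 1), (1, 0), (0, 1), (0, 0)]

print(r_squared(complete_ld))
print(r_squared(no_ld))
```

Plotting such r² values against the map distance between SNP pairs gives the decay curves from which statements like "LD did not extend much beyond 0.5 cM" are derived.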

Molecular databases

Sequence data generated by individual laboratories or large-scale sequencing projects are usually deposited in one of three major databases: GenBank (NCBI, National Center for Biotechnology Information), EMBL-Bank (European Molecular Biology Laboratory-Bank) or DDBJ (DNA Data Bank of Japan). These publicly accessible databases are synchronized on a daily basis, so that the data become available at all three sites (Benson et al. 2008; Cochrane et al. 2008; Sugawara et al. 2008). A great amount of data from livestock species is included in these databases. However, most of the records represent data on the respective genomes of the species and not single individuals, which would be required for assessment of genetic diversity. Despite this, the framework of the NCBI databases does allow for the submission of individual, even redundant, sequences, including microsatellites and SNPs (both stored in dbSNP; Wheeler et al. 2005). In general, the numbers of records for each species, which are bound to change their order of magnitude soon, reflect the agricultural importance of the species and/or the current state of progress of the genome projects. Furthermore, NCBI hosts organism-specific genome resource pages, which include links to resources found within and outside NCBI.

A multitude of other smaller publicly accessible databases, often with more specific purposes, are available besides the previously mentioned databases. For individual livestock records, three databases were identified: AVIANDIV for chickens, CaDBase for cattle and PigDBase for pigs.

The AVIANDIV database includes: (i) genotypes for 20 microsatellite loci from DNA pools of 52 European breeds and commercial lines (Hillel et al. 2003); (ii) individual data from 600 individuals representing 20 breeds for 27 microsatellites (Rosenberg et al. 2001); and (iii) SNPs from 13 random non-coding DNA fragments typed in 100 individuals belonging to 10 breeds/populations. The last update of the database was in 1999 (AVIANDIV 1999).

PigDBase contains the data from the EU PigBioDiv project. The data consist of 118 188 microsatellite and 349 348 AFLP genotypes from 60 distinct populations and 50 microsatellite and 148 AFLP markers. Unfortunately, the database is password-protected (Russell et al. 2003).

CaDBase (access to some areas of this system requires username/password authorisation) contains data on 134 breeds and 30 microsatellite markers that are recommended by the FAO (Williams 2002). However, as it does not contain data from the most recent large projects (Li et al. 2007; Martin-Burriel et al. 2007; Lenstra 2008), data for all 30 markers are listed for only a few breeds. Furthermore, allele sizes within CaDBase are not consistent across breeds.

Besides the three major sequence databases and those containing individual livestock records, numerous databases on livestock genomics are available (Table 1). The contents range from genome maps including annotations, SNPs, QTL data, whole genome shotgun libraries and microsatellites to extensive link lists. Even if several are no longer updated, these databases remain valuable resources for the development of markers as well as for fundamental research on livestock animals.

Table 1.   Non-exhaustive list of publicly accessible livestock genome resources. Database or project name and the respective URL are given. For pigs, cattle, goats, sheep, chicken and any additional species (other sp.), databases are scored according to presence (y) or absence (n) of data for the respective species. Non-available data are denoted by n/a.
Database/project name | URL | Pigs | Cattle | Goats | Sheep | Chicken | Other sp. | Type of data included | Last updated | References
AVIANDIV | | | | | | | | genotypes, SNPs | 6 September 1999 | AVIANDIV (1999)
CaDBase | | | | | | | | genotypes | n/a | Williams (2002)
PigDBase | | | | | | | | and AFLP genotypes | n/a | Russell et al. (2003)
PEDE: Pig Expression Data Explorer | | | | | | | | cDNA clones and ESTs | 21 October 2006 | Uenishi et al. (2007)
AnimalQTLdb | | | | | | | | Trait Loci data | Last release: 10 30 December 2009 | Hu et al. (2007)
ARKdb | | | | | | | | mapping data | © 2007–2009 | Hu et al. (2001)
ChickVD: Chicken Variation Database | | | | | | | | variation map, primarily SNPs | 16 December 2006 | Wang et al. (2005)
Bovine Genome Project | | | | | | | | and Whole Genome Shotgun (WGS) libraries | © 1998–2010 | n/a
Cattle Genome Database | | | | | | | | mapping, QTL | 10 December 2002 | n/a
LivestockGenomics | | | | | | | | Browsers (maps) and SNPs | © 2001–2010 | n/a
Pig Genomic Informatics System | | | | | | | | annotations, maps, SNPs | © 2006 | Ruan et al. (2007)
Bovine Genome Database | | | | | | | | analysis and annotation, QTL | n/a | n/a
US Poultry Genome Project | | | | | | | | BACs linked to genes, primers | 4 November 2008 | n/a
COMRAD – comparative radiation hybrid mapping | | | | | | | | genome radiation hybrid map | © 2002 | n/a
Gallus Genome Gbrowse | | | | | | | | array of chicken genome data | 27 January 2007 | Schmidt et al. (2008)
Geisha – gallus expression in situ hybridization analysis | | | | | | | | situ hybridization information | © 2008 | Darnell et al. (2007)
AgBase | | | | | | | | functional annotations | 13 January 2010 | McCarthy et al. (2006, 2007)
BBSRC ChickEST Database | | | | | | | | | addition September 2004 | Hubbard et al. (2005)
Chicken full-length cDNA Database | | | | | | | | | been updated in 2007 | Wang et al. (2007)
e!Ensembl | | | | | | | | genome information system, genome annotation | Ensembl Release 57 – March 2010 | Flicek et al. (2008)
s! Public Sigenae Contig browser | | | | | | | | based on Ensembl, Release 40, August 2006 | © 2010 | n/a
Sheep Genomics server at the University of Sydney | | | | | | | | maps | n/a | n/a
EuMicroSatdb (Eukaryotic Microsatellite database) | | | | | | | | | visible activity 2007 | Aishwarya et al. (2007)
UgMicroSatdb (Unigene MicroSatellite database) | | | | | | | | from various eukaryotic genomes, unigenes | n/a | Aishwarya & Sharma (2008)
OXGRID – The Oxford Grid Project | | | | | | | | maps (oxford grids) | © 2005 | n/a
International Sheep Genomics Consortium | | | | | | | | library, SNPs | © 2002–2010 | e.g. Oddy et al. (2007)
A database of Avian genes and genomes | | | | | | | | | © 2010 | n/a
AvianNET – The avian information network | | | | | | | | | © 2003, 2004 | n/a
US Livestock Species Genome Projects | | | | | | | | resources, mostly links | 21 March 2010 | n/a
BOVMAP | | | | | | | | resources, incl. microsatellites, SNPs, maps | n/a | Gas et al. (1996)
Origin and diversity of North European sheep breeds | | | | | | | | primers, allele lengths per marker | n/a | n/a


Genetic variation in traits of interest is the basis for future breeding programmes. Variation is displayed by genetic differences between individuals, families and populations within a given species. Pronounced erosion of these genetic resources across all farm animal species within the last century has been ascertained. In the last two decades, this erosion has been counteracted by efforts directed at conservation. This has resulted in considerable progress on two fronts. First, there is a growing amount of systematically collected information on livestock breeds and their environment. Second, in most species of livestock many breeds have been the subjects of molecular analyses. This has resulted in a great number of publications ranging from descriptions of local populations to more systematic assessments of global diversity patterns. The results of both phenotypic and molecular approaches are, or should be, accessible in databases with the ultimate objective of providing an integrative platform for scientific analysis and decision-making. However, the current state of the databases leaves much to be desired. Furthermore, documentation of phenotypic data and molecular genotyping, both having their own merits, still have to converge on consistent and plausible valuations of specific breeds for conservation.

Our understanding of breed diversity has been deepened significantly by technological progress in molecular genetics. Blood groups, enzyme polymorphisms, transplantation antigens and RFLPs have been succeeded by mtDNA and Y-chromosomal haplotypes and autosomal microsatellites. For all domestic species, mtDNA data have allowed the elucidation of the relationships with wild ancestor species, and for most species they are also informative at the intercontinental level. In combination with archaeological data, it has been shown that the most important areas for domestication events of the main livestock species and chickens are found in Asia and Europe, with the South American camelids representing an exception. There is evidence of multiple domestication events for most species, often involving more than one ancestor species or subspecies and repeated introgression events of closely related ancestor species. Sheep, goats, and taurine cattle (Bos taurus) are presumed to have been domesticated in Southwestern Asia. The Indus valley has been proposed to be the site of domestication of indicine cattle and the river type of water buffalo, while the swamp type of water buffalo is thought to have originated in the Yangtze valley. The domestication of pigs is considered to have happened across Eurasia and Eastern Asia in at least seven separate events involving both European and Asian subspecies of boar. The yak is presumed to be the result of a single domestication event in China/Tibet with at least three maternal lineages contributing to the ancestral yak gene pool. Domestic chickens are thought to be the result of multiple domestication events, predominantly of red jungle fowl (Gallus gallus) in Southeastern Asia and possibly also involving Gallus sonneratii and maybe Gallus lafayettii. Horses were domesticated in a broad area across the Eurasian steppe, and in this species the husbandry style has left considerable signatures.
It is presumed that mares were domesticated numerous times, but that only a few stallions contributed to the genetic make-up of the domestic horse. The last finding illustrates the use of Y-chromosomal haplotypes as a marker for mammalian patrilines. This is still limited by the identification of haplotypes, but probably has the same potential as in human population genetics.

A consistent finding with all molecular markers is that genetic variability declines with increasing distance from the domestication centres. This has been shown for pigs, sheep, goats, cattle and chickens. Within breeds, autosomal microsatellite markers allow parameters such as expected heterozygosity and allelic richness to be calculated and may reveal effects of genetic isolation, inbreeding, population bottlenecks, introgression and subdivision. Relationships between breeds can often be represented schematically via trees, networks, coordination plots or clustering diagrams. Most rewarding are then inferences regarding the history of livestock, such as evidence for more recent events like migrations, introgressions, expansions and/or selection.
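For a single microsatellite locus, the within-breed parameters mentioned above can be illustrated with a short sketch. It computes Nei's unbiased expected heterozygosity and the raw number of alleles observed. Note that allelic richness in the strict sense is usually standardized to a common sample size by rarefaction, which this sketch does not do. The function name and the genotypes are hypothetical.

```python
from collections import Counter


def locus_diversity(genotypes):
    """Return (expected heterozygosity, number of alleles) for one locus.

    `genotypes` is a list of (allele, allele) tuples for diploid individuals.
    Expected heterozygosity uses the small-sample correction n / (n - 1),
    where n is the number of gene copies sampled.
    """
    alleles = [a for genotype in genotypes for a in genotype]
    n = len(alleles)
    if n < 2:
        raise ValueError("at least one diploid individual is required")
    counts = Counter(alleles)
    sum_p2 = sum((c / n) ** 2 for c in counts.values())
    he = (n / (n - 1)) * (1 - sum_p2)
    return he, len(counts)


# Hypothetical allele sizes (in bp) at one locus.
polymorphic = [(100, 102), (100, 102)]
monomorphic = [(100, 100), (100, 100)]

print(locus_diversity(polymorphic))
print(locus_diversity(monomorphic))
```

Averaging such per-locus values over a marker panel gives the breed-level figures that are typically compared across populations.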

The content of the molecular project-specific databases described in this article is rather outdated. Updating of both main and project-specific databases is often neglected. A second problem caused by the coexistence of numerous project-specific databases is that searches across different projects are not possible, as each database has its own architecture and thus different report formats and export functions. Thus, it is very tedious to combine data retrieved from different databases. This problem could be overcome either by storing all data within one database or by setting up a database search engine that can execute searches across a number of individual databases. This is only feasible if the project-specific databases fulfil certain structural requirements that the scientific community would have to decide on and adhere to when setting up such databases. This setup, which returns search results from a number of individual databases, has been implemented successfully on a national level for plant genetic resources (Harrer et al. 2002).

Molecular datasets have the additional problem that only a minority of the research institutes use the FAO-standardized microsatellite markers (see for surveys). This seriously hinders a comparison of breeds from different datasets, although a meta-analytic approach may be feasible.

The greatest value of breed description databases is that they present the large variation that exists among livestock breeds from around the world. Hence, a rough idea of the number of breeds is available, often with estimates on population sizes. Nonetheless, for a large number of breeds contained in the most comprehensive and detailed database, the FAO Global Databank for Animal Genetic Resources, very little information other than the name and country of origin is available. Here, it should be noted that the breed concept is less useful in characterizing livestock variability in developing countries than it is in developed countries. Performance figures, if available at all, rarely have a reference point, e.g. the production system. Census data on population sizes are very often lacking and where available tend not to be up to date and may be inaccurate. This, together with a delay in reporting, does not allow for real-time monitoring of the status of species endangerment.

Currently, the breed is the unit of conservation. However, breeds are also social entities with a role in the national or regional identity, which leaves room for subjective perceptions of their uniqueness. Breed uniqueness is also not immediately obvious from molecular data. These show invariably that most of the variation is shared by the breeds, most of which harbour a considerable part of the total diversity of the species. In other words, most of the genetic diversity is present within a breed and not between breeds. This is analogous to what was found by Rosenberg et al. (2002) for humans. Furthermore, the variation displayed by current microsatellite panels of 10–30 markers only partially reflects the diversity of the animal genomes, and it remains unknown how the variation of these selectively neutral and quickly evolving markers relates to other parts of the genome. However, as illustrated by several examples cited above, this does allow a reconstruction of the history of breeds.
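The statement that most diversity lies within rather than between breeds can be expressed with a fixation index. The sketch below computes a Nei-type G_ST, (H_T - H_S) / H_T, for one locus from per-breed allele frequencies. It weights all breeds equally and applies no sample-size correction, both simplifying assumptions; the function name and the frequencies are hypothetical.

```python
def fixation_index(breed_freqs):
    """Nei-type G_ST for one locus.

    `breed_freqs` is a list of dicts mapping allele -> frequency,
    one dict per breed. Breeds are weighted equally (an assumption).
    """
    k = len(breed_freqs)
    alleles = set().union(*breed_freqs)
    # Mean expected heterozygosity within breeds.
    h_s = sum(1 - sum(p ** 2 for p in f.values()) for f in breed_freqs) / k
    # Expected heterozygosity of the pooled (total) population.
    mean_p = {a: sum(f.get(a, 0.0) for f in breed_freqs) / k for a in alleles}
    h_t = 1 - sum(p ** 2 for p in mean_p.values())
    if h_t == 0:
        raise ValueError("locus is monomorphic across all breeds")
    return (h_t - h_s) / h_t


# Hypothetical frequencies: two breeds sharing both alleles.
shared = [{"A": 0.6, "B": 0.4}, {"A": 0.4, "B": 0.6}]
print(round(fixation_index(shared), 4))
```

In this toy case only about 4% of the total diversity is attributable to differences between the two breeds, with the remainder found within them, which mirrors the qualitative pattern described above.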

In the near future, new technologies such as high throughput SNP typing or even whole-genome sequencing are likely to revolutionize our insight into the diversity and uniqueness of breeds, with the ultimate objective of gaining a fuller understanding of the molecular basis of functional diversity.


The authors thank L. Ollivier, M. SanCristobal, L. Silió and M. Pérez-Enciso for their comments. The authors state that there is no conflict of interest regarding the material discussed in the manuscript. Action GLOBALDIV AGRI GEN RES 067 receives financial support from the European Commission, Directorate-General for Agriculture and Rural Development, under Council Regulation (EC) No 870/2004.