Harnessing the genetic toolbox for the benefit of the racing Thoroughbred


email: peter.webbon@aht.org.uk


The understanding and application of genetics have grown extremely quickly since it became possible to sequence the whole genome of an organism. A first draft of the human genome sequence was published in 2001 and that of the horse in 2007. The significance of this is that it makes it more feasible to explain how both genetically simple and complex traits are transmitted from one generation to the next and, therefore, to make informed breeding decisions, modify how horses are managed and trained to minimise the risk of disease and injury, and improve methods of prevention, diagnosis and treatment of many conditions. The science of genetics/genomics will continue to grow internationally, limited only by the funds available. The application of the science to man, horses and other species raises very complex moral and commercial issues. Thoroughbred breeders are perceived by some as resistant to change, but their apparent intransigence is often based on a genuine concern for the integrity of the breed. By taking control of the application of the advances in genetics, the Thoroughbred industry potentially has the opportunity to improve both the health and performance of Thoroughbreds. If, however, the science is applied in an uncoordinated manner, driven by commercial interests with no underlying concern for the horses themselves, there is a very real risk that breeders, the Thoroughbred breed and individual horses will all suffer as a consequence.


Although the original studies in what we now call genetics were conducted by an Augustinian monk, Gregor Mendel, and published in 1865, it was only in the early 20th century, after his death, that his work was revisited and its importance understood. The term ‘gene’, representing the particle responsible for the transmission of inherited characteristics, was coined in 1889, but it was not until 1944 that DNA was recognised as the cellular nuclear component in which genes were located.

The first gene was sequenced in 1972 and the first genome of a free living organism, Haemophilus influenzae, in 1995. The explosion of knowledge over the last 15 years has been remarkable so that, to date, the genomes of 180 organisms have been sequenced, including 25 mammalian species from the mouse to the elephant. This has led to opportunities to predict and prevent diseases, to improve diagnosis and to develop new and improved medicines.

The horse has not been neglected during this period of genetic expansion and 3 main factors have contributed to this. The first was the establishment of 3 horse reference families: 2 half-sib families, one in Uppsala and the other the International Horse Reference Family Panel (IHRFP), and one 3 generation full-sib family, established as a collaboration between the Animal Health Trust (AHT) and the Equine Fertility Unit (EFU). The second driver of equine genetic progress has been the international collaboration nurtured, since 1995, by regular Havemeyer Equine Genetics Workshops, most recently in 2010 hosted in the UK by the AHT. The third essential component has been financial support, provided in the UK by the racing industry, via the Horserace Betting Levy Board (HBLB) and the Thoroughbred Breeders Association.

The genotype of an animal is the internally coded, inheritable information that, together with the animal's environment, controls all aspects of the physical manifestation of an animal, its phenotype, from its appearance to its metabolic pathways and its behaviour. It therefore follows that variation among individuals, for example Thoroughbred horses, may be due to genetic and/or environmental factors. The horse genome consists of about 2.7 billion bases. Sections of the genome, comprising several thousand base pairs (bp), can be identified as genes. An allele is one of 2 or more forms of the DNA sequence of a particular gene. Diploid organisms carry 2 copies of each autosomal gene, one allele on each chromosome of a homologous pair. If both alleles are the same, the animal is homozygous for that gene, whereas if the alleles are different, it is heterozygous. Monogenic traits are those that are controlled by a single gene, of which each animal carries 2 alleles, one inherited from each parent. With dominant genetic traits, such as hyperkalaemic periodic paralysis and grey coat colour, only one copy of the causal allele is required for the horse to exhibit the trait. With recessive disorders, horses can be carriers that have only one copy of the defective allele and do not suffer from its effects. Horses expressing recessive disorders (for example glycogen branching enzyme deficiency) must have 2 copies of the defective allele (homozygous), so both mare and stallion must be carriers or have the disorder. Polygenic (complex) traits are those that rely on a number of genes, each exerting a specific effect, possibly on other genes, usually modified by the effect of the animal's environment (Fig 1).

Figure 1.

An example of a polygenic characteristic (courtesy of Sarah Blott).
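
The monogenic inheritance rules described before Fig 1 can be illustrated with a short Python sketch. It is not taken from any of the cited studies: the function name `offspring_genotypes` and the allele labels `A` (normal) and `a` (defective, recessive) are assumptions made for illustration, and the model assumes a single autosomal locus with each parental allele transmitted with equal probability.

```python
from collections import Counter
from itertools import product


def offspring_genotypes(sire, dam):
    """Return expected genotype proportions at one autosomal locus.

    Each parent is given as a 2-letter genotype, e.g. "Aa".
    Assumes each parental allele is passed on with probability 1/2.
    """
    # Enumerate the 4 equally likely allele combinations (a Punnett square).
    combos = Counter("".join(sorted(pair)) for pair in product(sire, dam))
    total = sum(combos.values())
    return {genotype: n / total for genotype, n in combos.items()}


# Hypothetical labels: "A" = normal allele, "a" = defective recessive allele.
carrier_x_carrier = offspring_genotypes("Aa", "Aa")
carrier_x_clear = offspring_genotypes("Aa", "AA")

print(carrier_x_carrier)  # proportions of AA, Aa and aa foals
print(carrier_x_clear)    # no aa (affected) foals are expected
```

Under these assumptions, a carrier to carrier mating is expected to produce one affected (aa) foal in 4, whereas a carrier to non-carrier mating produces carriers but no affected foals.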

Heritability is the proportion of phenotypic variation in a population that is attributable to genetic variation among individuals, and can be calculated for any attribute provided that the attribute is measurable and the pedigrees of animals in the population are accurately recorded.
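
As a minimal numerical sketch of this definition, the Python fragment below computes heritability as the ratio of genetic variance to total phenotypic variance. It assumes the simplest possible model, in which phenotypic variance is the sum of a genetic and an environmental component with no interaction between them. The function name and the variance values are hypothetical; in practice the variance components would be estimated from recorded phenotypes and pedigrees.

```python
def heritability(var_genetic, var_environment):
    """Proportion of phenotypic variance attributable to genetic variance.

    Assumes phenotypic variance = genetic variance + environmental variance
    (no genotype x environment interaction), which is a simplification.
    """
    var_phenotypic = var_genetic + var_environment
    if var_phenotypic <= 0:
        raise ValueError("total phenotypic variance must be positive")
    return var_genetic / var_phenotypic


# Hypothetical variance components for a measurable trait.
h2 = heritability(var_genetic=3.5, var_environment=6.5)
print(f"heritability = {h2:.2f}")
```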

Genetics research is only likely to be of value if it is directed at characteristics that both have a relatively high heritability and are important to the industry or individuals associated with the population to be studied. Coat colour may be highly heritable and rated as highly important by breeders of pedigree cats, but it is low on the priority list of racehorse breeders compared to musculoskeletal integrity or laryngeal function.

The development of an equine genetic toolbox

The development of an equine genetic toolbox has been comprehensively reviewed by Chowdhary and Raudsepp [1] in a paper appropriately titled ‘The Horse Genome Derby: Racing from map to whole genome sequence’. Much of what follows is drawn from that review. The first few furlongs of the Equine Genome Derby in the UK were represented by the development of the 3 generation full-sib family, by the EFU and AHT, referred to above and supported by the HBLB. In 1998, Marti and Binns described the first steps in mapping the horse genome by identifying microsatellite markers that are evenly distributed over all the chromosomes – a microsatellite is a segment of DNA consisting of numerous tandem repeats, e.g. GAGAGAGAGAGAGAGA (GA8) [2]. By 2008, approximately 4300 polymorphic markers had been generated, of which 742 (734 microsatellites and 8 gene based markers) were genotyped on the AHT reference family [3]. This led to the identification of a single linkage group for each autosome and the X chromosome. Concurrently, a whole genome map was prepared using the IHRFP and 776 markers [4]. More than half of the markers were common to both AHT and IHRFP maps, but over 200 were unique to each map.
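
To make the definition of a microsatellite concrete, the following Python sketch scans a DNA string for tandem dinucleotide repeats such as the GA8 example above. It is a toy illustration only, not the marker discovery method used in the studies cited: the function name, the minimum repeat threshold and the flanking sequence are assumptions.

```python
import re


def find_dinucleotide_repeats(sequence, min_repeats=5):
    """Return (start, unit, repeat_count) for each tandem dinucleotide repeat.

    A repeat is reported only if the 2-base unit occurs at least
    `min_repeats` times in a row; runs of a single base are ignored.
    """
    pattern = re.compile(r"([ACGT]{2})\1{%d,}" % (min_repeats - 1))
    hits = []
    for match in pattern.finditer(sequence.upper()):
        unit = match.group(1)
        if unit[0] == unit[1]:
            continue  # e.g. AAAAAAAAAA is a homopolymer, not a dinucleotide repeat
        hits.append((match.start(), unit, len(match.group(0)) // 2))
    return hits


# Hypothetical flanking sequence around the GA8 repeat quoted in the text.
example = "TTCAC" + "GAGAGAGAGAGAGAGA" + "CTTAC"
repeats = find_dinucleotide_repeats(example)
print(repeats)  # [(5, 'GA', 8)]
```

A microsatellite is useful as a marker when the number of repeats varies between individuals, so that the repeat count at a given position can be followed through a reference family.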

Another technique that has been used to map the horse genome is radiation hybrid (RH) mapping, where horse/hamster hybrid cells are irradiated by x-rays, which fragment the chromosomal DNA [5]. The distance between the DNA fragments on the chromosome influences the likelihood that they will be separated when the chromosome is irradiated. This technique was used to generate medium and high density maps of the autosomes. The overall resolution of the high density map was one marker every 720 kilobases (kb). The map integrated genetic linkage data from 907 markers shared with the AHT and IHRFP maps. By comparing 1902 equine loci with sequence information from 8 other species, it was possible to identify candidate genes for equine conditions such as recurrent airway obstruction [6] and glycogen storage disease IV [7]. Comparative mapping, together with cytogenetic and BAC contiguous maps, has contributed to acquiring adequate coverage of the horse genome.

The male specific Y chromosome consists of 2 parts, the pseudoautosomal region and the male specific region. Both regions of this chromosome have been mapped in the horse in more detail than in other species with the exception of man/chimpanzee and the mouse. The high resolution genetic maps, together with BAC contiguous genomic maps, laid the foundations for complete sequencing of the euchromatic (gene rich) region of the Y chromosome. Further research on the genes that have been identified on the male specific region of the Y chromosome may be important in identifying causes of stallion infertility [8].

Since 1995, when the first bacterial genome sequence was published, an aim of genetic researchers has been to sequence the human genome and those of other mammals. The first draft of the human genome was published in 2001 and the completed sequence in 2006. The horse genome, which is intermediate in size between those of dogs and man, was sequenced by the Broad Institute and Harvard in collaboration with the international Equine Genome Sequencing Consortium, which was conceived at the 6th Dorothy Russell Havemeyer Foundation International Equine Genome Mapping Workshop in Dublin in 2005. Its first iteration was published in 2007, and was constructed using DNA from a Thoroughbred mare, Twilight. The 2.69 Gb genome had a polymorphism rate of about 1/1500 bp. Over 90% of the sequence was positioned on the chromosomes using available linkage maps, high density RH maps and FISH mapping data. In November 2009, a more detailed, high quality draft assembly was published with over 95% of the sequence anchored to the 64 (2N) chromosomes [9]. The similarity between the horse and human genomes is greater than for many other animal species. For example, comparison of the horse chromosomes with human revealed that 17 horse chromosomes (53%) consist of material present in a single human chromosome. In practical terms, this means that the horse could be a good model for human diseases.

Assembly of the horse genome makes it possible to identify and validate large numbers of single nucleotide polymorphisms (SNPs) across the entire genome. An SNP is a nucleotide (or ‘base’) in a DNA sequence that is variable within a species. For example, at a certain position in a DNA sequence there may be a C (cytosine) present in some individuals but a T (thymine) present in others. SNPs represent the most basic form of genetic polymorphism and some can be used as genetic markers. The Thoroughbred mare (Twilight) genome has been compared with low pass sequencing reads from each of 7 breeds, Akhal Teke, Icelandic, Arabian, Andalusian, Quarter Horse, Thoroughbred and Standardbred, representing both ancient and recent populations. This has produced an SNP map of over one million markers at an average density of one SNP per 2 kb [9]. Haplotype diversity and linkage disequilibrium have been determined by resequencing or genotyping targeted regions from 24 representatives of 10 different horse breeds, as a result of which a commercial 50 K SNP array was produced (Illumina Equine SNP50 BeadChip) and has been followed by a 70 K array (Illumina Equine SNP70 BeadChip).
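
The idea of an SNP as a variable position in otherwise identical sequence can be shown with a small Python sketch. It assumes that the sequences have already been aligned to the same coordinates and are free of sequencing errors, which is a considerable simplification of real SNP discovery. The function name and the example sequences are hypothetical.

```python
def find_snps(aligned_sequences):
    """Return {position: sorted alleles} for every variable position.

    All sequences must be the same length and already aligned.
    """
    if len({len(seq) for seq in aligned_sequences}) != 1:
        raise ValueError("sequences must be aligned to the same length")
    snps = {}
    for position, bases in enumerate(zip(*aligned_sequences)):
        alleles = set(bases)
        if len(alleles) > 1:  # more than one base observed at this position
            snps[position] = sorted(alleles)
    return snps


# Hypothetical aligned sequences from 3 individuals; position 4 is C or T.
individuals = ["ATGCCGTA", "ATGCTGTA", "ATGCCGTA"]
snps = find_snps(individuals)
print(snps)  # {4: ['C', 'T']}
```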

Single nucleotide polymorphisms have been identified that are associated with genes that may influence diseases, coat colour, fertility, temperament, disease resistance and performance in the horse. Once the Illumina Equine SNP50 BeadChip was available, whole genome scanning using SNP markers became feasible.

A genome-wide association study (GWAS) examines all or most of the genome of different individuals to identify variations between individuals. Different variations may then be associated with different traits, such as diseases. A recent example of a GWAS, funded by the HBLB, has been completed at the AHT in conjunction with the Royal Veterinary College, both working under the banner of Equine Genetics Research Limited. Two-hundred-and-seventy-six cases with fatal fractures and 290 control samples were genotyped using the Equine SNP50 BeadChip. Although the results are as yet unpublished, 2 genomic regions were found to be significantly associated at genome-wide level, each containing a number of candidate genes, which, based on their function, could have a role in the pathogenesis of fracture. A similar approach has been taken in an investigation of susceptibility to inflammatory airway disease that is also supported by the HBLB.
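
The logic of a case-control association test at a single SNP can be sketched in Python as follows. This is an illustration under stated assumptions, not the analysis used in the fracture study: it applies a simple allelic chi-squared test (1 degree of freedom) to a 2 x 2 table of allele counts, and uses a Bonferroni correction for about 50,000 markers as one possible genome-wide threshold. The allele counts are invented; only the numbers of cases (276) and controls (290), and hence the totals of 552 and 580 alleles, come from the text.

```python
import math


def allelic_chi_square(case_alt, case_ref, control_alt, control_ref):
    """Chi-squared test (1 df) on a 2 x 2 table of allele counts.

    Returns (chi2, p_value). No continuity correction is applied.
    """
    table = [[case_alt, case_ref], [control_alt, control_ref]]
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    if 0 in row_totals or 0 in col_totals:
        raise ValueError("every row and column must contain observations")
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected
    # Survival function of the chi-squared distribution with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical allele counts at one SNP: 276 cases and 290 controls,
# each horse contributing 2 alleles.
chi2, p_value = allelic_chi_square(200, 352, 150, 430)

# Assumed multiple-testing threshold: Bonferroni over ~50,000 SNPs.
genome_wide_threshold = 0.05 / 50_000
print(f"chi2 = {chi2:.2f}, p = {p_value:.2e}")
print("genome-wide significant:", p_value < genome_wide_threshold)
```

With these invented counts the SNP would look significant if tested in isolation, but it does not pass the genome-wide threshold, which illustrates why testing tens of thousands of markers demands much stronger evidence at each one.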

For many polygenic conditions, very large sample sizes are needed to find significant associations with specific genes, because the contribution of each gene to a complex condition may be relatively small. However, for many equine diseases we are not sure about the genetic architecture, that is, the number of genes that contribute to a condition and the size of their effects. This is an area where further information is needed.

Application of genetics and genomics research

So far, this review has concentrated on the development of the tools for genetic research in the horse. This should be unsurprising, because this technology has developed at a remarkable speed and it is much to the credit of those with an interest in the horse as a subject for research that the basic essential work has not been neglected, with the result that it is, in fact, now one of the better defined animal species. The support of the HBLB for the early mapping work, the formation of Equine Genetics Research Ltd (EGR) by the British Horseracing Board (BHB) and AHT and the funding of the EGR projects by the HBLB when the BHB funding was in doubt, are the reasons why geneticists working in the UK have been, and arguably remain, at the forefront of equine genetics research.

Geneticists and the British Thoroughbred industry now stand together at a crossroads. Just as the development of genomic resources has gathered pace over the last few years, so will the application of those resources in the next decade. To obtain a flavour of what may be in store, we need to look at the application of genetics research to date.

Monogenic diseases are those that are controlled by a single, usually recessive, autosomal gene. The gene mutations responsible for some monogenic diseases have already been identified (Table 1). For others, whole genome scanning using the currently available Illumina Equine BeadChip should allow the region likely to contain the causal gene to be identified relatively quickly, although the fine mapping necessary to demonstrate homozygosity for the causal gene, and the expression studies required to link the causal gene and its effect, may take significantly longer.

Table 1. Equine monogenic disorders/traits with known causative genes and/or available genetic tests (based on [1])

Disorder/trait                               Gene                   Chromosome   Reference
Agouti colour (bay)                          ASIP                   ECA22q       [14]
Cremello colour                              MATP (SLC45A2)         ECA21q       [15,16]
Extension (red/black colour)                 MC1R                   ECA3p        [17]
Glycogen storage disease IV                  GBE1                   ECA26q       [7]
Grey colour                                  STX17?                 ECA25q       [18–20]
Hereditary equine regional dermal asthenia   PPIB                   ECA1         [21]
Herlitz junctional epidermolysis bullosa     LAMC2                  ECA5p        [22]
Hyperkalaemic periodic paralysis             SCN4A                  ECA11p       [23]
Overo lethal white foal syndrome             EDNRB                  ECA17q       [24–26]
Polysaccharide storage myopathy              GYS1                   ECA10        [27,28]
Sabino white spotting                        KIT                    ECA3q        [29]
Severe combined immunodeficiency             PRKDC                  ECA9p        [30]
Silver colour                                PMEL17                 ECA6q23      [31,32]
Tobiano white spotting                       Chromosomal inversion  ECA3q        [33,34]
Malignant hyperthermia                       RYR1                   ECA10        [35]
Glycogen branching enzyme deficiency         GBE1                   ECA26        [7,36]
Lavender foal syndrome                       MYO5A                  ECA1         [37]
Body mass and sprinting ability              MSTN                   ECA18        [10]

Before whole genome scanning became feasible, candidate gene association studies were more frequently used, based on success in human genetic studies. Candidate genes were selected on the basis of their association with traits in other species. Microsatellite markers for the genes were then identified and, provided they were polymorphic, used in studies to associate alleles of the candidate gene with a particular condition. Candidate gene association studies were somewhat superseded by whole genome scanning, but SNPs, rather than known biological activity, have recently been used to identify possible candidate genes for a number of conditions as well as indicating their chromosomal location. An example of a recent candidate gene study was published by Hill et al. [10] who, having observed that myostatin gene (MSTN) variants contribute to muscular hypertrophy in other species (most notably in double muscled cattle such as the Belgian Blue), investigated sequence variations in Thoroughbred horses and showed that one MSTN allele is associated with athletic performance over sprint distances and with increased body mass to height ratio: the typical sprinter phenotype.

In the horse, few of the diseases/traits of widespread importance are likely to be monogenic; most are likely to be complex conditions resulting from the interaction of a number of genes and several aspects of the horse's environment. In other species, including farm animals and pedigree dogs, quantitative genetic techniques are used to estimate the heritability of these complex conditions (which may be either beneficial or deleterious) and then to calculate, as estimated breeding values (EBVs), the genetic merit of individual animals to reduce or enhance the trait under consideration. EBVs have been routinely used in livestock breeding to improve breed performance and increase disease resistance, and are now being applied on a routine basis to pedigree dog breeding to reduce the prevalence of undesirable characteristics in a breed, such as syringomyelia in Cavalier King Charles Spaniels. A recent editorial, written by an eminent group of geneticists, including the first author on the 2009 high quality draft sequence of the equine genome, stated ‘This type of collaboration [i.e. between quantitative geneticists and clinicians] should become standard practice for any clinician dealing with a disorder for which pedigree information is available’[11].

Looking ahead, whole genome scanning may reveal regions on different chromosomes that are significantly associated with the complex condition under study, and fine mapping of those regions may narrow down the search, or even locate the gene, whereas quantitative genetics will ascribe to each region the contribution that it makes to development of the disease/trait.

For multifactorial traits, genomic estimated breeding values (GEBVs) combine phenotypic, genotypic and pedigree information to produce rankings for all animals in a population. GEBVs are calculated as the sum of the effects of dense genetic markers across the entire genome, potentially capturing all the quantitative trait loci (QTL) that contribute to variation in a trait. The QTL effects, inferred from either haplotypes or SNP markers, are first estimated in a large reference population with phenotypic variation. In subsequent generations only marker information is needed to calculate GEBVs. While these techniques are in their infancy, the equine industry will be able to learn from the experience of breeders of other livestock, who are enthusiastic about gaining the advantages offered by the use of EBVs and GEBVs.
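
The calculation of a GEBV as a sum of marker effects can be sketched in a few lines of Python. The sketch assumes that each SNP genotype is coded as the number of copies (0, 1 or 2) of one allele and that the effect of each marker has already been estimated in a reference population. The marker effects, the animal names and the function name `gebv` are all hypothetical, and a real evaluation would involve tens of thousands of markers rather than 3.

```python
def gebv(allele_dosages, marker_effects):
    """Sum of marker effects weighted by allele dosage (0, 1 or 2 copies)."""
    if len(allele_dosages) != len(marker_effects):
        raise ValueError("one dosage is required per marker effect")
    return sum(d * e for d, e in zip(allele_dosages, marker_effects))


# Hypothetical marker effects estimated in a reference population.
marker_effects = [0.5, -0.2, 0.1]

# Hypothetical genotypes for 3 animals at the same 3 markers.
animals = {
    "horse_a": [2, 0, 1],
    "horse_b": [0, 2, 1],
    "horse_c": [1, 1, 0],
}

scores = {name: gebv(dosages, marker_effects) for name, dosages in animals.items()}
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)  # ['horse_a', 'horse_c', 'horse_b']
```

Once the marker effects are fixed, ranking a new animal requires only its genotype, which is the sense in which only marker information is needed in subsequent generations.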

In practice, therefore, in the medium term, future breeders, owners and trainers are likely to use genetic testing in 2 possible ways. The first will be to employ tests for monogenic disorders. Once the causal gene of an undesirable monogenic condition has been identified it is possible, both in theory and in practice, to avoid producing any more affected (homozygous) animals by avoiding carrier to carrier matings. The practical applications of this approach have recently been described by Brosnahan et al. [12]. However, if the causal gene is closely linked to a gene for another, desirable, characteristic, care must be taken not to lose that desirable characteristic while breeding out the undesirable trait or condition. This implies that the practical application of acquired genetic knowledge is likely to be extremely beneficial but not always straightforward. An alternative may be to test for a panel of genes that, together, increase the risk of the manifestation of a disorder, for example racecourse fracture. This type of testing could be used either when making breeding selections or, more likely, to modify the training regime of a horse that is known to have an increased risk of fracture. It is important to realise that the test may have little use in making decisions about which animals to buy or retain, as it is quite possible that horses with a reduced risk of fracture may also turn out to be slow. Buyers will need to rely on their usual criteria to choose horses, but then to ensure that they realise their full potential by training them appropriately.

The second, and more far reaching practical use of genetic technology will be through GEBVs, as described above. In farm animals, EBVs are used to identify sires that produce offspring which, for example, grow more quickly, have leaner carcasses or have better mothering characteristics. It is not difficult to envisage the application of the same techniques to horses, for example to increase the number of times that horses race in a season, to reduce their risk of fracture or lower airway disease, or to influence up or down the age at their first race.

Genomic studies of pathogens

Although the genomes of 25 mammals have been sequenced, a great deal of attention has also been focused on the genomics of microorganisms, particularly those that are pathogenic to animals and man. The aims of this research, some of which has been funded by the HBLB, are to understand how each organism overcomes the host defences and exerts its pathogenic influence and then to identify possible targets for novel vaccine production.

A related area of increasing interest is likely to be the nonspecific and specific reactions of a host to an antigen, either a pathogen or a vaccine. It is well established, by both epidemiological studies and repeated observation, that, in any population, there will be those who do not mount an effective immune response after vaccination, and individuals or breeds that appear to be more susceptible to particular infections. Genetic epidemiology, including twin studies, provides robust evidence that genetic variation in human populations contributes to susceptibility to infectious disease. One of the major limitations of studies that attempt to identify the genes and mechanisms that underlie this susceptibility has been lack of power caused by small sample size. The novel technologies that facilitate the study of the genetics of complex diseases will make it possible to take a fresh look at the epidemiological evidence that supports a role for genetics in susceptibility to infectious disease. This will, potentially, lead to translational data in the future and result in improvements in the health of groups and individuals [13].


Most of the research on equine genetics to date has focused on the development of technologies that will have significant applications in the next decade. If the twentieth century was characterised by the accumulation of knowledge about diseases and injuries, how they are caused and how a host animal reacts, the early part of the present century will be characterised by biological research that explains why individual animals react to disease and injury in the way that they do, and what differentiates them from other individuals in the same group. The application of genetic technology to Thoroughbred horses will proceed rapidly, not least because the Thoroughbred is an ideal population in which to conduct genetic research. One early impact of genetics in the Thoroughbred is likely to be in tailoring management regimes for individual horses to reduce the risk of injury and disease to which they are susceptible, and therefore to maximise their racing and breeding potential. There will be some (probably a small number of) monogenic autosomal recessive conditions that can be reduced in the Thoroughbred population by identifying carriers and avoiding carrier to carrier matings. The selection of matings that are most likely to produce healthy, durable and successful racehorses will rely on the use of EBVs and GEBVs. Although a test is available that indicates the type of race that is best suited to a horse, no test is currently available to predict an animal's athletic ability. Since the heritability of performance is about 35%, other, nongenetic, factors play a bigger part in any horse's racecourse performance. The Thoroughbred industry, and those conducting research on its behalf, will need to be clear on the primary breeding objective. This may be for racing performance or the reduction of disease and injury. With well directed research, these 2 aims can be complementary rather than exclusive.

Conflicts of interest

The Veterinary Advisory Committee of HBLB commissioned and sponsored this article as part of a series summarising progress made in areas relating to their priorities for research funding. The author and other workers at his Institute hold current and previous research grants funded by the HBLB.

Equine Veterinary Journal is delighted to publish HBLB's Advances in Equine Veterinary Science and Practice Review Series in recognition of the major contribution that HBLB research and educational funding has made to the health and welfare of the Thoroughbred.