Summary
One hundred and eighty-nine Thoroughbred horses that had won Graded Stakes races in North America were genotyped with the Illumina Equine SNP50 bead chip. Association tests using PLINK to determine whether any SNPs were associated with optimum racing distance (7 furlongs and under compared to 8–10 furlongs) identified a locus on ECA18 that was statistically significant (−log10 EMP2 = 1.63) at the genome-wide level following permutation analysis (10 000 permutations). Bioinformatic analysis revealed that the two ECA18 SNPs with the highest statistical significance spanned the MSTN (myostatin) locus. Mutations in myostatin in several mammalian species have been associated with increased muscling, with a preferential increase in fast glycolytic type IIB fibres, which would increase power potential. Thoroughbred horses that race over sprint distances, which are 5–7 furlongs, are often characterized by impressive hind quarter musculature, strongly suggesting that the association observed between the ECA18 SNPs and optimum race distance is mediated through MSTN.
Introduction
The Thoroughbred horse breed was established in England in the early 1700s based on crosses between stallions of Arabian origin and a poorly defined, possibly indigenous, group of mares. The founder population was small; all contemporary males trace back to one of three stallions, the Godolphin Arabian, the Byerley Turk and the Darley Arabian, whilst on the female side about seventy foundation mares have been identified (Willett 1970). A stud book for Thoroughbred horses was initiated in 1791, and pedigree records for the breed, which now numbers about 500 000 horses, are maintained by Thoroughbred registries worldwide.
Initially, the new breed raced over extended distances; match races in which two horses competed against each other over 4–6 miles were common, and horses frequently had to run multiple heats in a day to determine the winner. Today’s English Classic races, the St. Leger (run over 1¾ miles), the Derby and Oaks (run over one and a half miles) and the 2000 and 1000 Guineas (run over 1 mile), were established in 1776, 1780, 1779, 1809 and 1814, respectively. The decreasing distances over time suggest that there was already a trend towards racing over shorter distances by 1800, and the fashion has continued to the point that, in North America, races at one and a half miles and over are now frequently referred to as ‘marathons’ and are relatively uncommon. In North America, whilst the classic Triple Crown races, the Kentucky Derby, the Preakness and the Belmont, are run over one and a quarter, one and three-sixteenths and one and a half miles, respectively, there are important races at 7 furlongs and under, and there has been an emphasis on selecting for speed through the fast pace at which most US races are run when compared to European races. In general, contemporary Thoroughbred horses are classified by optimum distance into sprinters, middle-distance horses and stayers, and distinct pedigrees specific for each distance have been developed by breeders.
The biochemistry of this distance specialization has remained obscure. An examination of the muscle fibre types present in sprinters and stayers and the response of muscle to different training regimes has been hampered by the difficulty of reliably sampling the same muscle location in different individuals (Snow & Guy 1980; Sewell et al. 1994).
This study seeks to investigate whether there are any specific loci in the equine genome associated with distance specialization in the Thoroughbred horse using a whole genome association approach.
Results
One hundred and eighty-nine horses that had won Graded Stakes races were classified into those that had won races at seven furlongs or less (n = 125), those that had won at eight to ten furlongs (n = 44) and those that had won races at more than 10 furlongs (n = 20).
Whole genome SNP genotypes were determined using the Illumina Equine SNP50 bead chip system, which comprises 54 602 SNPs uniformly distributed across the equine genome. The ASSOC option in PLINK was run to compare the genotypes of the 7 furlongs and under group with those of the 8–10 furlong group. The results are presented in Fig. 1a, where it can be seen that the SNPs with the most significant P values (smallest P value = 2.77E-07) are located on ECA18. To confirm the significance of this association, a permutation analysis with 10 000 permutations was carried out using the MPERM option in PLINK. The results of the permutation analysis, reported as max(T) empirical P values (EMP2), are presented in Fig. 1b. Genome-wide significance for the horse is estimated to be −log10 EMP2 = 1.3 (equivalent to EMP2 ≈ 0.05). Two markers on ECA18, BIEC2-417274 (located at 65 868 604 bp) and BIEC2-417495 (located at 67 186 093 bp), reach genome-wide significance with −log10 EMP2 values of 1.3829 and 1.6345, respectively. The location of these markers on ECA18, together with other SNPs in the immediate region that show association but do not reach genome-wide significance, is shown in Fig. 1c.
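The logic of a max(T) permutation test can be sketched as follows. This is a minimal illustration on synthetic genotypes, not the study's data and not PLINK's implementation. The function names, group sizes, allele frequencies, number of SNPs and number of permutations are all assumptions chosen to keep the example small. The per-SNP statistic is a simple allelic chi-square, and the add-one correction in the empirical P value is likewise an assumption of this sketch.

```python
import math
import random


def allelic_chi2(alt_a, n_a, alt_b, n_b):
    """1-df chi-square on a 2x2 table of allele counts for two groups.

    alt_a / alt_b are alternate-allele counts; n_a / n_b are numbers of
    horses (each horse carries two alleles).
    """
    a, b = alt_a, 2 * n_a - alt_a
    c, d = alt_b, 2 * n_b - alt_b
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:  # monomorphic SNP or empty group: no evidence either way
        return 0.0
    return (a + b + c + d) * (a * d - b * c) ** 2 / denom


def snp_stats(genotypes, labels):
    """Chi-square per SNP; genotypes[i][j] is the 0/1/2 allele count."""
    group_a = [g for g, lab in zip(genotypes, labels) if lab == 1]
    group_b = [g for g, lab in zip(genotypes, labels) if lab == 0]
    return [
        allelic_chi2(sum(g[j] for g in group_a), len(group_a),
                     sum(g[j] for g in group_b), len(group_b))
        for j in range(len(genotypes[0]))
    ]


def max_t_emp2(genotypes, labels, n_perm, rng):
    """Empirical P per SNP against the permutation distribution of the
    genome-wide maximum statistic, which corrects for multiple testing."""
    observed = snp_stats(genotypes, labels)
    exceed = [0] * len(observed)
    shuffled = list(labels)
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any genotype/phenotype link
        max_stat = max(snp_stats(genotypes, shuffled))
        for j, obs in enumerate(observed):
            if max_stat >= obs:
                exceed[j] += 1
    return [(e + 1) / (n_perm + 1) for e in exceed]


# Synthetic data (hypothetical): 60 "sprinters" (label 1), 40 others (label 0),
# 20 SNPs, of which only SNP 0 differs in allele frequency between groups.
rng = random.Random(0)
labels = [1] * 60 + [0] * 40


def draw(freq):
    """Draw a 0/1/2 genotype as two independent alleles."""
    return (rng.random() < freq) + (rng.random() < freq)


genotypes = [
    [draw(0.8 if lab == 1 else 0.3)] + [draw(0.5) for _ in range(19)]
    for lab in labels
]

emp2 = max_t_emp2(genotypes, labels, n_perm=200, rng=rng)
neg_log10 = [-math.log10(p) for p in emp2]
print(f"SNP 0: EMP2 = {emp2[0]:.4f}, -log10 EMP2 = {neg_log10[0]:.2f}")
```

Because each observed statistic is compared with the largest statistic across all SNPs in every permutation, a SNP that clears the 0.05 threshold does so after accounting for the number of markers tested.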
Fig. 1b also shows a locus on ECA17 that, whilst not reaching genome-wide significance (−log10 EMP2 = 0.9179), gives a signal high enough to suggest that a second locus may contribute to optimum racing distance.
A second analysis compared the 7 furlongs and under group of horses (n = 125) with the group that had won at over 10 furlongs (n = 20). Whilst the same region on ECA18 was again identified, the limited number of samples in the longer-distance category left this analysis with insufficient power to reach genome-wide significance (highest −log10 EMP2 = 0.261).
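For readers more used to plain P values, the −log10 EMP2 figures reported in this section can be converted back with a few lines of arithmetic. The sketch below uses only the values stated in the text; the dictionary labels and function name are illustrative assumptions.

```python
# Genome-wide threshold stated in the text, on the -log10 scale.
THRESHOLD = 1.3

# -log10 EMP2 values as reported in the Results (labels are illustrative).
reported = {
    "BIEC2-417495 (ECA18)": 1.6345,
    "BIEC2-417274 (ECA18)": 1.3829,
    "ECA17 locus": 0.9179,
    "ECA18, sprint vs >10 furlongs": 0.261,
}


def to_emp2(neg_log10_value):
    """Invert the -log10 transform to recover the empirical P value."""
    return 10 ** (-neg_log10_value)


# Map each label to (EMP2, reaches genome-wide significance?).
summary = {
    name: (to_emp2(value), value >= THRESHOLD)
    for name, value in reported.items()
}

for name, (p_value, significant) in summary.items():
    print(f"{name}: EMP2 = {p_value:.4f}, genome-wide significant: {significant}")
```

On this scale the two ECA18 markers correspond to empirical P values of roughly 0.023 and 0.041, the ECA17 locus to roughly 0.12, and the best marker in the smaller second analysis to roughly 0.55.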
The genotype and allele frequencies for BIEC2-417495 (the SNP with the highest statistical significance) were estimated from a set of 464 Thoroughbred horses that had not been selected on the basis of distance. The A allele that is associated with a longer optimum distance was present at a frequency of 0.473, whilst the G allele associated with sprinting was present at a frequency of 0.527. The genotype frequencies for the AA:AG:GG genotypes in this population were 0.189:0.567:0.244, respectively.
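The allele frequencies quoted above follow directly from the genotype frequencies, as the short calculation below illustrates. It uses only the three rounded genotype frequencies given in the text; the function names are assumptions. The Hardy-Weinberg expectation is included as a conventional point of comparison and is not an analysis reported in the paper.

```python
def allele_freqs(f_aa, f_ag, f_gg):
    """Allele frequencies from genotype frequencies (normalised to sum to 1)."""
    total = f_aa + f_ag + f_gg
    p_a = (f_aa + 0.5 * f_ag) / total  # each heterozygote carries one A
    return p_a, 1.0 - p_a


def hardy_weinberg(p_a):
    """Expected AA, AG, GG genotype frequencies under random mating."""
    p_g = 1.0 - p_a
    return p_a ** 2, 2 * p_a * p_g, p_g ** 2


# Genotype frequencies for BIEC2-417495 as reported (AA : AG : GG).
p_a, p_g = allele_freqs(0.189, 0.567, 0.244)
exp_aa, exp_ag, exp_gg = hardy_weinberg(p_a)

print(f"A = {p_a:.4f}, G = {p_g:.4f}")
print(f"Expected AA:AG:GG = {exp_aa:.3f}:{exp_ag:.3f}:{exp_gg:.3f}")
```

The computed allele frequencies agree with the reported 0.473 and 0.527 to within rounding of the published genotype frequencies. Under these assumptions the expected heterozygote frequency is close to 0.50, against the observed 0.567.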
Discussion
The association analysis comparing Thoroughbred horses that won Graded Stakes races at a distance of 7 furlongs or less with those that won Graded Stakes races over a distance of 8–10 furlongs identified a single locus on ECA18 that achieved genome-wide significance following permutation analysis.
Whilst a clear association peak involving a large number of SNPs towards the distal end of ECA18 is evident in Fig. 1c, only two SNPs, BIEC2-417274 (located at 65 868 604 bp) and BIEC2-417495 (located at 67 186 093 bp), reach genome-wide significance with −log10 EMP2 values of 1.3829 and 1.6345, respectively. When the genes in this region of ECA18 were examined using the Ensembl genome browser, it was noted that the MSTN (myostatin) gene is located between the two significant SNPs at 66 490 208–66 495 180 bp.
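The positional relationship between the two significant SNPs and MSTN can be checked with simple coordinate arithmetic. The sketch below uses only the base-pair positions quoted above; the variable and function names are illustrative assumptions.

```python
# Coordinates on ECA18 (bp), as quoted in the text.
SNP_LEFT = ("BIEC2-417274", 65_868_604)
SNP_RIGHT = ("BIEC2-417495", 67_186_093)
MSTN_START, MSTN_END = 66_490_208, 66_495_180


def is_flanked(left_pos, right_pos, gene_start, gene_end):
    """True if the gene lies wholly between the two marker positions."""
    return left_pos < gene_start and gene_end < right_pos


flanked = is_flanked(SNP_LEFT[1], SNP_RIGHT[1], MSTN_START, MSTN_END)
dist_left = MSTN_START - SNP_LEFT[1]     # proximal SNP to gene start
dist_right = SNP_RIGHT[1] - MSTN_END     # gene end to distal SNP
interval = SNP_RIGHT[1] - SNP_LEFT[1]    # distance between the two SNPs

print(f"MSTN flanked by both SNPs: {flanked}")
print(f"{SNP_LEFT[0]} to MSTN start: {dist_left:,} bp")
print(f"MSTN end to {SNP_RIGHT[0]}: {dist_right:,} bp")
print(f"Interval between SNPs: {interval:,} bp")
```

On these coordinates the gene sits roughly 0.62 Mb from the first SNP and 0.69 Mb from the second, within an interval of about 1.32 Mb.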
Traditionally, the Thoroughbred horse industry has regarded horses that race over less than seven furlongs, between seven and twelve furlongs, and longer than twelve furlongs as distinct types, referring to them as sprinters, middle-distance horses and stayers, respectively. The pedigrees of horses racing over these different distances tend to differ, and breeders have frequently tried to produce middle-distance horses by breeding sprinters to stayers. The horses in the different distance groups also often possess distinct body morphologies, with the sprinters being heavily muscled, whilst the middle-distance and staying horses have a lighter musculature. This phenotypic difference suggests that myostatin might play a role in the distance preference of Thoroughbred horses.
The increased muscling seen with myostatin mutations does not affect all muscle fibre types evenly, and there is a preferential increase in fast glycolytic type IIB fibres (Deveaux et al. 2001; Hennebry et al. 2009), which is consistent with improved sprinting ability.
The identification of SNP markers linked to the myostatin locus that are associated with athletic ability in the Thoroughbred horse points to the first gene influencing athletic ability to be identified in horses.
Whilst this paper was in preparation, Hill et al. (2010) published research showing that a polymorphism in the myostatin gene is strongly associated with best race distance in elite Thoroughbred horses. In their study, myostatin was selected as a candidate gene, and whilst the intronic mutation used is in the myostatin gene, they state that the results do not preclude the functional variant being in a neighbouring gene. They state that there are no other plausible candidates within 2 Mb upstream or downstream of the myostatin gene, although it is unclear whether the functional mutation could be further away than this. In addition, they do not appear to have controlled for the possibility of population stratification between the sprinting and staying populations by testing association with other genes unlikely to be involved in muscle physiology.
In this study, following a genome-wide association analysis strategy, we identify a single locus on ECA18 that is significant at the genome-wide level following permutation analysis. In our analysis, population stratification between sprinters and middle-distance horses is internally controlled for by the remaining 54 600 SNPs uniformly distributed across the genome. No other region of the genome showed a genome-wide statistically significant difference, which argues against population stratification as an explanation for the association.
The results clearly have implications for the Thoroughbred racing and breeding industries, although it is important to stress that the markers identified do not in any way define the likely racing ‘class’ of the individual, merely over what distance the horse is likely to be most effective. It would be wrong to directly equate the preference for sprint distances with speed, and myostatin does not represent ‘the speed gene’. There will be many horses that, whilst homozygous for the sprint genotype, are poor racehorses and would be beaten over sprint distances by better racehorses with one of the other two genotypes.