Accuracies of genomic prediction for twenty economically important traits in Chinese Simmental beef cattle

Summary Genomic prediction has been widely utilized to estimate genomic breeding values (GEBVs) in farm animals. In this study, we conducted genomic prediction for 20 economically important traits including growth, carcass and meat quality traits in Chinese Simmental beef cattle. Five approaches (GBLUP, BayesA, BayesB, BayesCπ and BayesR) were used to estimate the genomic breeding values. The predictive accuracies ranged from 0.159 (lean meat percentage estimated by BayesCπ) to 0.518 (striploin weight estimated by BayesR). Moreover, we found that the average predictive accuracies across 20 traits were 0.361, 0.361, 0.367, 0.367 and 0.378, and the averaged regression coefficients were 0.89, 0.86, 0.89, 0.94 and 0.95 for GBLUP, BayesA, BayesB, BayesCπ and BayesR respectively. The genomic prediction accuracies were mostly moderate and high for growth and carcass traits, whereas meat quality traits showed relatively low accuracies. We concluded that Bayesian regression approaches, especially for BayesR and BayesCπ, were slightly superior to GBLUP for most traits. Increasing with the sizes of reference population, these two approaches are feasible for future application of genomic selection in Chinese beef cattle.


Introduction
Genomic prediction has been widely utilized to estimate genomic breeding values (GEBVs) for quantitative traits in breeding program of farm animals (Hayes et al. 2009;Goddard et al. 2010). The application of genomic selection is considered an important revolution for the theory of animal breeding over the past two decades (Hayes et al. 2009;Heffner et al. 2009;de Los Campos et al. 2013;Spelman et al. 2013). With the advances of genomic selection technologies, this strategy has led to dramatic increases in genetic progress in farm animals (Goddard et al. 2016;Kumar & Hedges 2016). For instance, rates of genetic gain per year for US Holstein increased from about 50% to 100% for yield traits and from threefold to fourfold for lowly heritable traits (Garcia-Ruiz et al. 2016).
In beef cattle, genomic prediction offers great promise to predict genetic merits of selection candidates, especially for traits that are difficult or expensive to measure, such as carcass merit traits. The success of genomic selection depends on the accuracies of GEBVs, which are largely affected by the predictive approaches, the size of reference population, trait heritability and the extent of the linkage disequilibrium between SNPs and QTL (Hayes et al. 2009;VanRaden et al. 2009). Many studies have assessed the predictive accuracies of GEBVs for economically important traits in different beef cattle populations using the BovineSNP50 BeadChip (Saatchi et al. 2011(Saatchi et al. , 2012Todd et al. 2014;Chen et al. 2015), and their results show varying degrees of accuracy for GEBVs. For instance, genomic prediction for growth, meat quality and reproduction traits in US Limousin and Simmental beef cattle revealed accuracies of GEBVs ranging from 0.39 to 0.76 and 0.29 to 0.65 respectively (Saatchi et al. 2012). Accuracies of GEBVs for US Angus ranged from 0.22 to 0.69 (Saatchi et al. 2011). Using genomic best linear unbiased prediction (GBLUP) and BayesB methods, the accuracies of genomic prediction for carcass traits in Canadian Angus and Charolais cattle varied from 0.16 to 0.6 (Chen et al. 2015). Using simulation studies of carcass traits in UK Limousin, terminal index accuracy of GEBVs varied from 0.18 to 0.73 (Todd et al. 2014). Only a few studies have evaluated the predictive accuracies of multiple methods using the Illumina BovineHD chip. For example, Neves et al. (2014) and Fernandes Jr. et al. (2016) evaluated genomic prediction for growth and carcass traits of Nellore cattle using GBLUP, BayesC and Bayesian Lasso methods. Their findings showed that the predictive accuracies varied among traits using different approaches.
Chinese Simmental is one of the predominant beef cattle in China (representing approximately 70% of beef market), which is particularly renowned for rapid growth rate and palatable meat quality (Niu et al. 2016). Previous studies have comprehensively investigated the molecular mechanisms underlying important traits, such as foreshank weight, triglyceride levels and shear force, using genomewide association studies (Wu et al. 2014;Xia et al. 2016;Zhang et al. 2016). Based on this population, genomic prediction using Bayesian regression methods with variable degrees of freedom and scale parameters have been evaluated for live weight and tenderloin weight . However, until now, no studies have evaluated the accuracies of genomic prediction for economically important traits such as growth, carcass (especially for retail beef cuts) and meat quality traits using multiple methods in Chinese Simmental cattle. The objective of this study was to estimate and compare the predictive accuracies and abilities of GEBVs for 20 traits, including growth, carcass and meat quality, using five methods (GBLUP, BayesA, BayesB, BayesCp and BayesR) in Chinese Simmental beef cattle.

Ethics statement
All animals were treated following the guidelines for the experimental animals established by the Council of China. Protocols of the experiments were approved by the Science Research Department of the Institute of Animal Science, Chinese Academy of Agricultural Sciences (CAAS) (Beijing, China).

Animals and phenotypes
Animals originated from five farms in Ulgai Grassland, Xilingole League, Inner Mongolia of China. All animals were born between 2008 and 2013 and were weaned at approximately six months of age. After weaning, animals were moved to Beijing Jinweifuren farm for fattening and raised under the same feeding conditions. Animal were measured for growth traits every six or 12 months until slaughter. Live weight was measured after 24 h of fasting. Slaughter age ranged from 18 to 24 months. After slaughter, carcass traits and meat quality traits were measured according to the Institutional Meat Purchase Specification for fresh beef guidelines and GB/T 27643-2011. In this study, a total of 20 traits were measured and analyzed (Table 1): (i) growth traits: average daily gain (ADG; kg) was calculated by subtracting the entering farm weight from the live weight and dividing by the number of days spent in the farm, and live weight (LW; kg) was measured before slaughter with fasting 24 h; (ii) carcass traits: hot carcass weight (CW; kg), dressing percentage (DP; %), lean meat percentage (LMP; %), back fat thickness (BFT; mm) and retail beef cuts including striploin (ST; kg), spencer roll (SR; kg), chuck roll (CR; kg), tenderloin (TD; kg), fore shank (FS; kg), conical muscle (CM; kg), outside (OU; kg), Silverside (SI; kg), knuckle (KN; kg), inside cap off (ICO; kg), hind shank (HS; kg) and retail meat weight (RMW; kg); and (iii) meat quality traits: potential of hydrogen (pH) and shear force (SF, kg).

Genotype and quality control
A total of 1302 Simmental beef cattle were genotyped with the Illumina BovineHD SNP array. Missing SNPs were imputed using BEAGLE v3.3.1 software (Browning & Browning 2007). Prior to statistical analysis, genotypes were edited using PLINK v1.07 software (Purcell et al. 2007) for the following: (i) minor allele frequency (>0.05), (ii) proportion of missing genotypes (<0.05) and (iii) Hardy-Weinberg equilibrium (P > 10 À6 ). SNPs satisfying these criteria were used to interrogate the linkage disequilibrium with syntenic SNPs located with a window of 100 neighboring markers, which resulted in only one SNP form each pair of highly correlated SNPs (r 2 > 0.995) remaining in the SNP dataset. After these quality controls, the final data consisted of 1217 individuals and 459 268 filtered SNPs in the autosomes. Genotype data are available from the Dryad Digital Repository (https://doi.org/10. 5061/dryad.4qc06).

Statistical model and genetic analyses
Heritabilities were calculated using a restricted maximum likelihood method with an animal model in ASREML v3.0 software (Gilmour et al. 2009). Relationships between animals were estimated using a G matrix, where G was the genomic relationship matrix and inferred from SNP markers, as suggested by VanRaden (2008). The animal model included random additive polygenic effects, fixed effects and residual for all traits. The additive polygenic effects were treated as random and assumed to be mutually independent. Fixed effects in the model included gender, farm and year of measurement. In addition, animals' age at slaughter were considered covariates in the model except for average daily gain traits.
In all cases, the statistical model used was the following: GBLUP For GBLUP, y is an N 9 1 (N = number of observations) vector of phenotype in Equation 1, X is an incidence matrix of the fixed effects, Z is the incidence matrix allocating records to GEBVs, g is the vector of GEBVs and e is a vector of residuals. It is assumed that g follows a normal distribution Nð0; Gr 2 g Þ. Given b and g, y is conditionally independent and distributed as: ðyjb; g; r 2 e Þ $ NðXb þ Zg; Ir 2 e Þ

BayesA, BayesB and BayesCp
The Bayesian model used the same equation as in (1), where y, X, b and e were defined as before, but g represented an M 9 1 vector of SNP marker effects; Z is an N 9 M matrix of SNPs, coded with values 0, 1 or 2 for genotypes 11, 12 and 22 respectively; and Z ij denotes marker j of individuals i.
The prior for g j depends on the variance r 2 j and the prior probability p. In BayesA, all SNPs have effects, i.e. r 2 j $ x À2 ðv; vS 2 a Þ, and p is equal to 0. r 2 j denotes that SNP j has its own variance, with the parameters of v and S 2 a . In BayesB, the two-component mixture, with one component being tð0; v; S 2 a Þ and the other component being a spike at 0, are provided as: a j jr 2 j $ ðiddÞ 0 x0026; with probability p Nð0; r 2 j Þ x0026; with probability ð1 À pÞ where j = 1, . . .. . . , P Here, p represents the proportion of SNPs with no genetic effects on the trait of interest. S 2 a is derived using following equation, where v is 4.2, as reported by Meuwissen et al. (2001). In BayesCp, the SNP effects have a common variance, r 2 j ¼ r 2 g ðj ¼ 1; . . .; PÞ, and r 2 g follows a scaled inverse v 2 prior with parameters v and S 2 a . As a result, the SNP effects with probability (1 À p) follows a mixture of multivariate Student's t-distributions tð0; v; IS 2 a Þ. The p parameter is treated as unknown with a uniform (0,1) prior distribution.

BayesR
BayesR is an extension of BayesCp, where SNP effects are assumed to be sampled from a mixture of normal distributions (Erbe et al. 2012;Bolormaa et al. 2013). The variance of each component of the mixture is fixed (0, 0.01%, 0.1% or 1% of the genetic variance). The number of SNPs belonging to each component of the mixture is assumed to come from a multinomial distribution with proportions p i (i = 1, 2, 3 or 4) in which the p i is drawn from a Dirichlet distribution (a multivariate generalization of a beta distribution) with pseudo-counts of 1 for each component of the mixture. Thus, the prior assumes that the four components of the mixture are equally probable but with minimal prior knowledge of these probabilities.

SNP effects estimation
SNP effects were estimated using the Markov chain Monte Carlo sampling algorithm in BayesA, BayesB, BayesCp and BayesR. Markov chains were run for 50 000 cycles of Gibbs sampling. The first 10 000 were discarded as burn-in. Then GEBVs were calculated as GEBV i = ∑Z ij a j .

Validation of the models
To assess the predictive accuracies for 20 economically important traits, we used a five-fold cross-validation method (Luan et al. 2009). Overall, 1217 individuals were divided into reference and validation populations. The genotyped individuals were randomly divided into five groups, whereas phenotypes of animals in validation set were masked to be unknown. Thus, 974 individuals were randomly sampled as the reference set, and the remaining 243 individuals as the validation set. The whole procedure was repeated 10 times.

Comparison criteria
Three methods were utilized to evaluate the predictive ability based on comparison of GEBVs with corrected phenotypes of animals in the validation population: The correlations between GEBV and corrected phenotype were calculated to evaluate the predictive ability (r GEBV;ŷ ). To remove the influence of heritability on the predictive ability, Pearson's correlation between GEBV and corrected phenotype was divided by the square root of heritability (r GEBV;ŷ ffiffiffiffiffi h 2 p ), whereŷ was the corrected phenotype. This value is approximate to the correlation between the true breeding value and GEBVs (Pryce et al. 2012). The slope of the regression ofŷ on GEBV for animals in the validation population (bŷ ;GEBV ) was calculate to measure the degree of inflation or deflation of genomic prediction. Estimates of bŷ ;GEBV close to 1 are indicative of predictions that are similar to that of corrected phenotype on scale.
The mean squared error of prediction (MSE) betweenŷ and GEBV in the validation population was used to measure the overall fit of model, and the computation equation was ðGEBV ÀŷÞ, where N is the number of individuals. A large estimated value of predictive accuracy is indicative of reliable prediction, and a low MSE value means a better overall fit.

Heritability estimates
Heritability estimates for 20 traits ranged from 0.04 to 0.62. We found that 10 traits showed relatively high heritabilities:

Predictive ability of five methods
We evaluated the predictive abilities for these 20 traits using different methods based on 5-fold cross-validation. The summary of predictive results is presented in Table 3. Overall, we observed that the predictive abilities (r GEBV;ŷ ) ranged from 0.059 (for LMP estimated by BayesCp) to 0.376 (for HS estimated by BayesB; Table 3). The average predictive abilities across 20 traits were 0.216, 0.216, 0.221, 0.220 and 0.225 for GBLUP, BayesA, BayesB, BayesCp and BayesR respectively. To investigate the relationship between predictive ability and trait heritability, we next estimated the regression coefficient of predictive ability on heritability for these methods. Our results showed that the predictive abilities were linearly correlated with trait heritability. The strongest correlation (0.5) was observed across 20 traits using GBLUP compared with Bayesian methods (Fig. 1). Among Bayesian methods, we found the highest regression coefficient of predictive ability on heritability was 0.47 in BayesR.

Scale of genomic predictions and mean squared prediction error
The regression coefficient of corrected phenotype on GEBV was calculated as a measurement of the bias for the prediction. In this study, we observed that predictions of GEBV using both GBLUP and Bayesian regression methods were inflated for most traits, whereas for traits ST, TD and pH, the predictions from GBLUP and Bayesian regression methods tended to be slightly deflated (Table 4). We found that predictions of GEBV using GBLUP, BayesA and BayesB were inflated for LW and RMW, whereas those from BayesCp and BayesR were slightly deflated. The average regression coefficients across traits were 0.89, 0.86, 0.89, 0.94 and 0.95 for GBLUP, BayesA, BayesB, BayesCp and BayesR respectively.
For most traits, we found GBLUP generally outperformed Bayesian regression methods based on the MSE (Table 4). However, for ADG, LMP, TD, FS, BI, KN, HS and RMW, lower estimates of MSE were obtained for Bayesian methods.

Discussion
Previous studies have been conducted for genomic prediction in multiple breeds including Angus, Limousin, Simmental, Charolais, Hereford, Japanese Black, Nellore and other crossbreds using BovineSNP50 and BovineHD SNP arrays (Saatchi et al. 2011(Saatchi et al. , 2012Bolormaa et al. 2013;Akanno et al. 2014;Gunia et al. 2014;Hulsman Hanna et al. 2014;Onogi et al. 2014;Todd et al. 2014;Rolf et al. 2015;Fernandes Jr. et al. 2016). Su et al. (2012) have investigated the difference of predictive accuracies between the BovineHD array and BovineSNP50 using the GBLUP method, and they found that the reliability of GEBV for protein, fertility and udder health traits using the BovineHD array was higher (0.5-1%) than that of 54K array in Holstein.
We previously investigated the pattern of linkage disequilibrium using the BovineHD SNP array in Chinese Simmental cattle, and our findings suggested that the high density SNP array was sufficient to achieve high accuracy for genomic prediction in Chinese Simmental population (Niu et al. 2016). To our knowledge, this study is the first attempt to investigate the performance of genomic prediction for 20 economically important traits using BovineHD SNP arrays in Chinese Simmental beef cattle.

Comparisons of genomic prediction methods
In the current study, we found that the Bayesian regression approaches performed better than did GBLUP for most traits. Based on the estimations of predictive accuracies and regression coefficients, Bayesian regression methods were superior to GBLUP, whereas GBLUP had smaller MSE compared to Bayesian regression approaches.
Previous studies suggested the superiority of Bayesian regression approaches over GBLUP when the number of SNPs is larger than the genotyped animals, i.e. several simulation studies revealed that Bayesian regression approaches have higher accuracies than does GBLUP (Meuwissen et al. 2001;Habier et al. 2007;Solberg et al. 2008;Clark et al. 2011). These findings were also consistent with many previous studies using real data (Erbe et al. 2012;Pryce et al. 2012;Gunia et al. 2014;Neves et al. 2014;Fernandes Jr. et al. 2016). In this study, we found that genomic predictions using Bayesian regression approaches were superior to that of GBLUP (Table 3). This can be explained by the fact that the assumption of Bayesian approaches is more suitable for fitting the genetic architecture of quantitative trait (Rolf et al. 2015). However, most of the Bayesian regression approaches, except for BayesA, showed high prediction accuracies for these 20 traits. A large number of SNPs with small effects in BayesA are likely to cause noise for the estimation of the GEBVs (Habier et al. 2011). In contrast, Bayesian approaches like BayesB, BayesCp and BayesR assume that only a small proportion of markers have effects, which may avoid the potential bias caused by linkage disequilibrium (Erbe et al. 2012).
Moreover, we observed higher predictive accuracies using BayesR for most of the traits, which implies the segregation of genes with larger effects for them. For traits with mutations of moderate effect segregating and a high number of significant SNPs, a recent study showed that the accuracy of GEBVs with BayesR was higher than with GBLUP (Bolormaa et al. 2013). In our study, the average predictive accuracies of 20 traits for BayesR increased~1.7% compared to GBLUP. Among 20 traits, the highest increase in accuracy of BayesR over GBLUP was observed for ADG (2.9%) and ST (3.1%). We also identified several SNPs with large effects for ADG using BayesR. These SNPs had been previously identified as significant associated SNPs in the gene NCAPG and can explain~4.01% of the phenotypic variances .

Predictive abilities and accuracies
The accuracy of GEBVs can be affected by trait heritabilities, size of training population, and breed and statistical method (Bolormaa et al. 2013). For example, traits with high heritability (h 2 ) and a large training population (n = T) give higher accuracies, which can be expected from the theory that Th 2 is a critical parameter (de Roos et al. 2008). In our study, we found that Bayesian regression approaches outperformed GBLUP in predictive accuracies for most traits. Moreover, the accuracy of GEBVs for several important traits varied in other cattle populations. For instance, Neves et al. (2014) presented the results of implementation of genomic prediction for weight and carcass traits, gestation length and scrotal circumference traits in 685 Nellore cattle. They found that the average accuracy was 0.39 for GBLUP and 0.44 for both BayesC and Bayesian LASSO methods, which was higher than in our study. Under datasplitting strategies by birth year, Chen et al. (2015) accessed the predictive accuracies for hot carcass weight via PBLUP (0.33 for Angus, 0.42 for Charolais), GBLUP (0.34 for Angus, 0.18 for Charolais) and BayesB (0.34 for Angus, 0.21 for Charolais) using the BovineSNP50 Beadchip. Using five-fold cross-validation, we found that the predictive accuracy of carcass weight with Bovine HD SNP array was higher than those in Chen et al.'s (2015) study. Also, Bolormaa et al. (2013) found that the accuracy of genomic prediction for ADG and CW using the GBLUP approach in multiple beef cattle populations were 0.21 and 0.27. Their results also suggested relatively low accuracy compared to the results in our study (ADG, 0.312 and CW, 0.40). Using K-means clustering validation, the predictive abilities of CW, BFT and SF were 0.59, 0.29 and 0.53 in 2703 registered Simmental beef cattle (Saatchi et al. 2012). Their results revealed higher predictive ability for CW, BFT and SF compared with our study, and this finding is likely to be explained by the large population utilized in their study. In addition, other studies have performed genomic prediction for the ADG, CW, SF and BFT traits in diverse populations (Akanno et al. 2014;Rolf et al. 2015;Fernandes Jr. et al. 2016). However, no study has been reported for genomic prediction of DP, LMP and primal cuts in beef cattle.

Scale of genomic predictions
The scale of predictions is an important factor in determining whether GEBVs can be used for genetic evaluation. For instance, the results of one previous study suggested that overestimation of the genetic merit may cause potential exaggeration of GEBVs compared with traditional EBVs when both progeny-tested and genomic selection were used for selecting candidates (Vitezica et al. 2011). In our study, the average regression coefficients across traits were 0.89, 0.86, 0.89, 0.94 and 0.95 for GBLUP, BayesA, BayesB, BayesCp and BayesR respectively (Table 4). This finding indicates that the BayesR and BayesCp approaches generate more reliable predictions for these traits in Chinese Simmental population.
The estimation of scale of genomic predictions may vary with population, genetic inherent of the studied trait and the statistical approaches (Neves et al. 2014). Among our studied traits in Chinese Simmental beef cattle, 13 traits generated inflated prediction using GBLUP and Bayesian approaches. Many studies showed a similar inflation trend of genomic prediction in Nellore, Nellore-Angus crossbred, Holstein and Jersey populations using Bayesian approaches (Duchemin et al. 2012;Erbe et al. 2012;Hulsman Hanna et al. 2014), whereas other studies revealed opposite trends in American Angus using GBLUP and in French Holstein and Montbeliarde populations using Bayesian approaches (Saatchi et al. 2011;Colombani et al. 2013).

Conclusions
Using multiple methods (GBLUP, BayesA, BayesB, BayesCp and BayesR), we conducted genomic prediction for economically important traits including growth, carcass (especially on retail beef cuts) and meat quality traits in Chinese Simmental cattle. Bayesian regression approaches, especially BayesR and BayesCp, were superior to GBLUP for most traits. Thus, it may be feasible to apply these approaches for genomic prediction of these economically important traits in Chinese Simmental beef cattle. Further improvements are required to enlarge population size and reduce inflation of predictions. In addition, our findings provide valuable insights for further implementation of genomic selection in the commercial beef industry.