SASHAYDIALL: A SAS Program for Hayman's Diallel Analysis

Dan Makumbi,* Gregorio Alvarado, José Crossa, and Juan Burgueño

ABSTRACT
Different methods of diallel crossing are commonly used in plant breeding. The diallel cross analysis method proposed by Hayman is particularly useful because it provides information, among others, on additive and dominance effects of genes, average degree of dominance, proportion of dominance, direction of dominance, distribution of genes, maternal and reciprocal effects, number of groups of genes that control a trait and exhibit dominance, ratio of dominant to recessive alleles in all the parents, and broad-sense and narrow-sense heritability. In this paper, we fully describe a SAS-based software SASHAYDIALL for performing a complete diallel cross analysis based on Hayman's model with or without reciprocals. We demonstrate the use of SASHAYDIALL with two data sets; one is a published diallel cross data set with reciprocals in cabbage (Brassica oleracea L.), and the second is a data set from a multilocation diallel cross trial in maize (Zea mays L.) without reciprocals. With SASHAYDIALL, diallel experiments conducted in single sites can be analyzed to estimate various genetic parameters, and this analysis is extended over locations or environments to assess genetic effect × environment interaction. SASHAYDIALL is user-friendly software that provides detailed genetic information from diallel crosses involving any number of parents and locations.

The diallel cross, made by crossing a set of genotypes in all possible combinations, is one of the most popular mating designs used in plant breeding. There are variations of the diallel depending on whether the parents and reciprocals are evaluated together with the $F_1$s. Different methods of diallel cross analysis were developed by Jinks and Hayman (1953), Hayman (1954a, 1954b), Griffing (1956), and Gardner and Eberhart (1966). The most commonly used method of diallel analysis is the one developed by Griffing (1956), in which four different methods of analyses were proposed. Griffing's (1956) methods of analysis provide estimates

Theory of Hayman's Diallel Analysis
Considering that an "array" refers to all the crosses involving a particular parent, the steps required to carry out Hayman's diallel analysis are (i) ANOVA to detect genetic variation among the genotypes, (ii) creation of the array of means in the diallel table, (iii) calculation of the expected variance ($V_{ri}$) and parent-offspring covariance ($W_{ri}$) of individual arrays, (iv) calculation of the mean variance ($\bar{V}_r$) and covariance ($\bar{W}_r$) over all arrays, (v) calculation of the variance of the array means ($V_{\bar{r}}$), (vi) testing the validity of the additive-dominance model, (vii) generation of the $W_r$ − $V_r$ graph, and (viii) estimation of the genetic components.
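Steps (ii) to (v) can be sketched in SAS/IML as follows. This is a minimal illustration under stated assumptions, not an excerpt from SASHAYDIALL: the 4 × 4 table of means is invented, the matrix names (Y, Ysym, Vr, Wr, and so on) are hypothetical, each array is assumed to include the parental entry on the diagonal, and reciprocal crosses are assumed to be averaged into family means. The COV, VAR, and MEAN functions require a reasonably recent SAS/IML release.

```sas
proc iml;
  /* Hypothetical diallel table of means: rows = female parent,      */
  /* columns = male parent, diagonal = parental means (made-up data) */
  Y = {10 12 14 13,
       12 16 15 17,
       14 15 11 12,
       13 17 12 18};
  n    = nrow(Y);
  Ysym = (Y + t(Y)) / 2;      /* average each cross with its reciprocal */
  P    = vecdiag(Ysym);       /* parental means as an n x 1 vector      */

  Vr = j(n, 1, .);
  Wr = j(n, 1, .);
  do r = 1 to n;
    arr   = t(Ysym[r, ]);     /* rth array: all families sharing parent r    */
    C     = cov(arr || P);    /* 2 x 2 covariance matrix of (array, parents) */
    Vr[r] = C[1, 1];          /* variance within the rth array               */
    Wr[r] = C[1, 2];          /* covariance with the nonrecurrent parents    */
  end;

  Vp      = var(P);           /* variance among parents      */
  meanVr  = mean(Vr);         /* mean of Vr over all arrays  */
  meanWr  = mean(Wr);         /* mean of Wr over all arrays  */
  VarMean = var(Ysym[ , :]);  /* variance of the array means */

  print Vr Wr, Vp meanVr meanWr VarMean;
quit;
```

The six printed quantities correspond to the variances and covariances that Hayman's method derives from the diallel table; the environmental component and replication structure are deliberately left out of this sketch.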
In Hayman's method of diallel cross analysis, six variances and covariances are calculated from the data set. These are the variance among parents ($V_P$); the variance among family ($F_1$ + reciprocal) means within an array ($V_r$); the mean value of $V_r$ over all arrays ($\bar{V}_r$); the variance among the means of the arrays ($V_{\bar{r}}$); the covariance between families within the ith array and their nonrecurrent parent ($W_r$); and the mean value of $W_r$ over all arrays ($\bar{W}_r$). According to the theory of Hayman, the parameters for additive ($D$) and dominance ($H_1$ and $H_2$) gene effects and the distribution of genes ($F$) are defined as:

$$D = \sum 4uvd^2, \qquad H_1 = \sum 4uvh^2, \qquad H_2 = \sum 16u^2v^2h^2, \qquad F = \sum 8uv(u - v)dh$$

in which the sums are taken over loci, $u$ and $v$ are the frequencies of increasing and decreasing alleles, respectively, with $u + v = 1$, $d$ represents the gene's contribution to the fixable or additive genetic variation, and $h$ is the difference between the heterozygote and the mid-homozygote values (Hayman, 1954a, 1954b). In Hayman's diallel analysis, four second-degree statistics are calculated from the parents and $F_1$ progeny. These are $V_{0L0}$ (variance of parents), $V_{1L1}$ (mean variance of arrays), $V_{0L1}$ (variance of the array means), and $W_{0L01}$ (mean covariance of array means) (Hayman, 1954a, 1954b). These second-degree statistics are related to the genetic components of the variation as shown below, written here without the environmental component $E$ (Hayman, 1954b):

$$V_{0L0} = D$$
$$W_{0L01} = \tfrac{1}{2}D - \tfrac{1}{4}F$$
$$V_{1L1} = \tfrac{1}{4}D + \tfrac{1}{4}H_1 - \tfrac{1}{4}F$$
$$V_{0L1} = \tfrac{1}{4}D + \tfrac{1}{4}H_1 - \tfrac{1}{4}H_2 - \tfrac{1}{4}F$$

The ANOVA of a diallel table according to Hayman includes main effects denoted a (additive genetic effects), b (dominance genetic effects), c (average maternal effects of each parental line), and d (variation in the reciprocal differences not attributed to c) (Hayman, 1954a). The main effect b is further partitioned into three effects, namely, $b_1$ (test of mean deviation of $F_1$ from their mid-parental values), $b_2$ (test of whether mean dominance deviation of the $F_1$ from their mid-parental values within each array differs over arrays), and $b_3$ (test of dominance deviation that is unique to each $F_1$) (Hayman, 1954b; Mather and Jinks, 1971). The analysis developed by Hayman is related to that of Griffing (1956), but the two methods differ in the genetic assumptions and interpretations. Griffing's GCA, SCA, and reciprocal effects components are equivalent to Hayman's a, b, and (c + d) components, respectively (Mather and Poysa, 1983). The analysis proposed by Hayman required a complete diallel, but Morley Jones (1965) extended Hayman's analysis for the half-diallel. Hayman's method of diallel analysis also includes graphical analysis, whereby $W_r$ is plotted against $V_r$. In the $W_r$ − $V_r$ graph, the dominance order of the parents can be inferred from the relative position of the array points along the regression line of $W_r$ on $V_r$. The intercept of the regression line on $W_r$ provides information on the degree of dominance in the genetic material under question.

PROGRAM DESCRIPTION
The SASHAYDIALL program was written in SAS/IML (SAS Institute, 2013) and runs in SAS (SAS Institute, 2014). SASHAYDIALL consists of code that corresponds to the steps necessary to execute diallel cross analysis according to Hayman (1954a, 1954b). The linear model for Hayman's diallel analysis implemented in the SASHAYDIALL program is shown in the equations below:

$$y_{rs} = m + j_r + j_s + l + l_r + l_s + l_{rs} + k_r - k_s + k_{rs} \quad (r \neq s) \tag{1}$$
$$y_r = m + 2j_r - (n - 1)l - (n - 2)l_r \quad (r = s) \tag{2}$$

where $y_{rs}$ is the entry in the rth row (female parents) and sth column (male parents), $m$ is the grand mean of the diallel table, $j_r$ is the mean deviation from the grand mean due to the rth parent, $l$ is the mean dominance deviation, $l_r$ is the further dominance deviation due to the rth parent, $l_{rs}$ is the remaining discrepancy in the rsth reciprocal sum, $k_r$ is the average maternal effect of each parental line, and $k_{rs}$ is the variation in the rsth reciprocal differences (Hayman, 1954a). The parameters in the model measure different sources of variation whereby $j_r$ = a (variation due to additive genes), $l$ = $b_1$ (mean dominance deviation), $l_r$ = $b_2$ (further dominance deviation due to the rth parental line), $l_{rs}$ = $b_3$ (residual dominance variation), $k_r$ = c (average maternal effects of each parental line), and $k_{rs}$ = d (variation in the reciprocal differences not due to c) (Hayman, 1954a).

The SASHAYDIALL program is based on the mathematical derivations presented in Hayman (1954a, 1954b) and Mather and Jinks (1971). For the analysis to proceed, the SASHAYDIALL program requires an input data file in "csv" format that contains the replication (REP), Parent 1, Parent 2, and the trait of interest for a single-site experiment. The data are arranged as REP P1 P2 VAR, where VAR is the name of the trait to be analyzed. For data obtained from multiple environments, the user should include a variable for the locations, and the arrangement could be REP P1 P2 ENV VAR, but the columns can be provided in any order. The user can include another variable, "ENTRY", but this is not required by SASHAYDIALL for analysis. It is important that locations or environments are numbered sequentially in the "csv" file. The file with data to be analyzed can be saved in any directory. The user is only required to specify the location and name of the data file to be analyzed in the SASHAYDIALL program. The program automatically detects the number of parents and the presence or absence of reciprocals in the diallel cross data set. Before analysis can proceed, the user is required to provide responses in two input windows: one window for variable information, and the other to indicate whether the data to be analyzed are from single or multiple environments (Fig. 1).

As a first step, a general ANOVA of the data using both PROC MIXED and PROC GLM (for single and across locations or environments) is executed to detect differences among genotypes. The SASHAYDIALL program will then run the ANOVA for the diallel table in the presence or absence of reciprocals. The main effects in the ANOVA are tested for significance using both their interaction with replication and the residual as the error terms, and output for both cases is provided. The main effects can be tested against the residual if the error variances are homogeneous (Mather and Jinks, 1971). The SASHAYDIALL program will then execute various computations as described by Hayman (1954b), including creation of an array of variances ($V_{ri}$) and covariances ($W_{ri}$), calculation of the four second-degree statistics, and testing the adequacy of the additive-dominance model using a t test. Estimates of genetic parameters including $D$, $H_1$, $H_2$, $F$, $H^2$, $h^2$, and the average degree of dominance, among others, are computed by SASHAYDIALL. The heritability estimates ($H^2$ and $h^2$) are computed using the formulae given by Mather and Jinks (1971). Finally, regression analysis for $W_r$ on $V_r$ is computed and a $W_r$ − $V_r$ graph is generated by SASHAYDIALL. A $W_r$ + $V_r$ vs. $Y_r$ (the mean parental value) graph is also plotted. These computations are performed for a single site and across locations or environments, depending on the data set. The SASHAYDIALL program is not computationally intensive.

Below, we provide limited parts of the SASHAYDIALL program code to show different steps followed in Hayman's (1954b) method of diallel analysis. Brief comments are provided to guide readers on the functions of some of the SAS statements in the code. We also provide part of a maize diallel data set from multiple environments to show data arrangement for analysis using SASHAYDIALL (see Supplemental Table S1).

    %let dir = D:\ /*specify location of file with data to be analyzed*/ ;
    %let FileName = MAIZEDIALLEL /*name of data file to be analyzed*/ ;
    PROC IMPORT datafile="&dir\&FileName..csv" /*to import the data file*/
        out=DIALLEL dbms=csv replace;
        getnames=yes;
    RUN;
    Data DIALLEL;
        set DIALLEL;
        dsid=open('DIALLEL');
        if varnum(dsid, 'Env')=0 then Env=1; /*no Env column: treat data as a single environment*/
        rc=close(dsid);
        drop dsid rc;
    RUN;
    %global Y Parent1 Parent2 replication genotype Env;
    %macro testVAR(var);
        %let dsidvar=%sysfunc(open(&var));
        %let nvars=%sysfunc(attrn(&dsidvar,nvars));

Fig. 1. Input windows for SASHAYDIALL with different data arrangements. SASHAYDIALL will read the variables in the data set, but the user is required to give the corresponding codes, specify the response variable to be analyzed, and indicate whether to analyze single or multiple environment data.
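The REP P1 P2 ENV VAR arrangement described in this section can be illustrated with a small SAS data step. The sketch below is not part of SASHAYDIALL: the values are invented, the trait column is given the hypothetical name DTA, and rows with P1 = P2 are assumed to denote the parental lines. The PROC MEANS step then averages over replications to give the cell means that would populate a diallel table.

```sas
/* Invented values: 3-parent half diallel, 2 replications, 1 environment */
data DIALLEL_EXAMPLE;
  input REP P1 P2 ENV DTA;
  datalines;
1 1 1 1 70.2
1 1 2 1 64.5
1 1 3 1 65.1
1 2 2 1 72.8
1 2 3 1 66.0
1 3 3 1 71.4
2 1 1 1 69.6
2 1 2 1 63.9
2 1 3 1 64.7
2 2 2 1 73.1
2 2 3 1 66.8
2 3 3 1 70.9
;
run;

/* Mean of each P1 x P2 cell over replications: the entries of the diallel table */
proc means data=DIALLEL_EXAMPLE noprint nway;
  class P1 P2;
  var DTA;
  output out=CELLMEANS(drop=_type_ _freq_) mean=MEAN_DTA;
run;

proc print data=CELLMEANS noobs;
run;
```

With three parents and no reciprocals, the output has six rows (three parents plus three crosses), which matches the n(n + 1)/2 entries of a half diallel that includes the parents.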

Examples of Hayman's Diallel Analysis using SASHAYDIALL
To demonstrate usage and show key output from SASHAYDIALL, we reanalyzed data from a cabbage diallel experiment with reciprocals (Tanaka and Niikura, 2006) and also analyzed data from a multilocation maize diallel without reciprocals.

Example 1: Cabbage Diallel
Details of the seven-by-seven cabbage diallel are found in Tanaka and Niikura (2006). We reanalyzed data for two traits: width of the 15th wrapper leaf (W15) and leaf shape index of the 15th wrapper leaf (LSI15). In the analysis of the cabbage data, SASHAYDIALL first performs the general ANOVA (Supplemental Fig. S1), which tests for significant differences among genotypes before further analysis proceeds. For the cabbage diallel study, there were highly significant differences (P < 0.0001) among the genotypes, and based on this result, the user can proceed with interpretation of results from the other analyses proposed by Hayman (1954b). For this data set, the SASHAYDIALL program detects the presence of reciprocals, and hence it computes the ANOVA with items a, b (and its components), c, and d. The components a, b, c, and d are tested for significance using both their respective interaction with replication and the residual as the error term in the ANOVA. The user has to decide which output to use for interpretation, although Hayman (1954b) and Mather and Jinks (1971) recommended testing the significance of components a and b using their respective interaction with block as the error term. The ANOVA output for the two traits, W15 and LSI15 (Supplemental Fig. S1), generated by SASHAYDIALL is similar to that presented in Table 3 of Tanaka and Niikura (2006), except for minor differences in estimation of some parameters for LSI15. The genetic parameters generated by SASHAYDIALL for the two traits in cabbage (Table 1) are nearly identical to those given in Table 4 of Tanaka and Niikura (2006). The genetic components of variation ($D$, $H_1$, $H_2$, $F$, and $h^2$) and their SEs are computed by SASHAYDIALL to allow for a test of significance. The $W_r$ − $V_r$ graphs plotted by SASHAYDIALL (Supplemental Fig. S2) show the distribution of dominant and recessive genes among the parents, and these graphs are similar to those presented by Tanaka and Niikura (2006).
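The regression behind a $W_r$ − $V_r$ graph can also be reproduced outside SASHAYDIALL from the array statistics. The sketch below is an illustration only: the seven pairs of values are invented (they are not the cabbage data), and the data set and label names (WRVR, UnitSlope) are hypothetical. Under the additive-dominance model the regression of $W_r$ on $V_r$ is expected to have unit slope (Hayman, 1954b), so the TEST statement checks the slope against 1.

```sas
/* Invented array statistics for seven parents (illustration only) */
data WRVR;
  input Parent $ Vr Wr;
  datalines;
P1 2.10 1.35
P2 3.40 2.60
P3 1.20 0.55
P4 4.80 4.10
P5 2.90 2.05
P6 3.90 3.00
P7 1.80 1.10
;
run;

/* Regress Wr on Vr and test whether the slope differs from 1 */
proc reg data=WRVR;
  model Wr = Vr;
  UnitSlope: test Vr = 1;
run;
quit;

/* Scatter of the array points with the fitted regression line */
proc sgplot data=WRVR;
  scatter x=Vr y=Wr / datalabel=Parent;
  reg x=Vr y=Wr / nomarkers;
run;
```

In such a plot, arrays closest to the origin are usually read as carrying the most dominant alleles and those farthest from it the most recessive alleles.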

Example 2: Maize Diallel Data from Multiple Locations
Hybrids without their reciprocals from a 13-by-13 maize diallel cross, together with their parental inbred lines, were evaluated at three locations in Kenya. The hybrid trial with 78 diallel hybrids and two check hybrids was laid out as an 8-by-10 α (0,1) lattice, whereas the parental trial with 13 parents and two check inbred lines was laid out as a three-by-five α (0,1) lattice with two replications. Days to anthesis (DTA, days from planting to when 50% of the plants had shed pollen) were recorded for the hybrids and inbred lines, and the data were analyzed using SASHAYDIALL. Results of the general and genetic effects ANOVA by location and across locations are presented in Table 2. There were significant differences among genotypes, and therefore further analysis according to Hayman (1954b) is valid. Without reciprocals, SASHAYDIALL only computes components a and b. In this example, both a and b gene effects were highly significant (P < 0.001) for DTA at each location. Significance of component b indicates the presence of dominance for this trait. The genetic parameters for DTA are estimated by SASHAYDIALL across locations (Table 3) and for individual locations (Supplemental Table S2). In this example, genetic components $D$, $H_1$, $H_2$, and $h^2$ were all significant, and dominance genetic variance was larger than additive genetic variance across locations. Significance of both the $D$ and $H$ components suggests that DTA is controlled by both additive and dominance effects. Furthermore, an estimate of the number of groups of genes that control DTA and exhibit dominance, and estimates of heritability (broad and narrow sense), are provided among others (Table 3, Supplemental Table S2). The relationship between the covariance of parental inbred lines and hybrids ($W_r$) and the variance of the $F_1$ hybrids ($V_r$) is shown in the $W_r$ − $V_r$ graph, which gives the ranking of inbred lines for frequency of dominant alleles for DTA across locations (Fig. 2) and individual locations (Supplemental Fig. S3a and S3b). In addition, the $W_r$ − $V_r$ graph plotted by SASHAYDIALL shows the proportions of dominant to recessive genes (75:25, 50:50, and 25:75%) (Fig. 2, Supplemental Fig. S3a and S3b). The SASHAYDIALL program also plots a graph of $W_r$ + $V_r$ against $Y_r$ (Supplemental Fig. S4).
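Several of the derived quantities reported alongside the genetic components are simple ratios of them. The forms below are those commonly given for Hayman's analysis (Hayman, 1954b; Mather and Jinks, 1971). The exact expressions coded in SASHAYDIALL are not printed in this paper, so reading these as the program's own formulas is an assumption.

```latex
\begin{aligned}
\text{average degree of dominance} &= \sqrt{H_1 / D} \\
\text{proportion of genes with positive and negative effects} &= \frac{H_2}{4H_1} \\
\text{ratio of dominant to recessive alleles in the parents} &= \frac{\sqrt{4DH_1} + F}{\sqrt{4DH_1} - F} \\
\text{number of groups of genes exhibiting dominance} &= \frac{h^2}{H_2}
\end{aligned}
```

Here $h^2$ is the genetic component (overall mean dominance effect of heterozygous loci), not narrow-sense heritability. A value of $\sqrt{H_1/D}$ below 1 points to partial dominance and a value above 1 to overdominance, and $H_2/(4H_1)$ reaches its maximum of 0.25 when increasing and decreasing alleles are equally frequent ($u = v = 0.5$) at all loci.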

DISCUSSION
Diallel mating designs provide important genetic information useful in a breeding program. The SASHAYDIALL program described in this paper is based on models for analysis of a diallel developed by Jinks and Hayman (1953) and Hayman (1954a, 1954b), and extended by Morley Jones (1965) for the half-diallel, which is frequently used. This method of analysis has been used to analyze diallel cross data sets with and without reciprocals in many crops. Several genetic components estimated by Hayman's method are related to the components in the method proposed by Griffing (1956), and hence output from the two methods of analysis can be compared by the breeder.
The SASHAYDIALL program is user friendly, as the user only needs to specify the location and name of the file with data to be analyzed, provide the variables required for the analysis, and indicate whether to analyze data from single or multiple locations. Breeders typically evaluate progenies from diallel crosses in multiple locations, and SASHAYDIALL can handle analysis of such data easily. The analytical procedure in SASHAYDIALL is automated, as the program automatically detects the number of parents in the diallel cross data set and the presence or absence of reciprocals and computes the number of genotypes. The significance of genetic components a, b, c, and d is tested using both the pooled error and block interaction as the error terms (Mather and Jinks, 1971). The user should decide which output to use for interpretation, although Hayman (1954b) and Mather and Jinks (1971) provided recommendations that can be followed by the user. The genetic ($D$, $H_1$, $H_2$, $F$, and $h^2$) and environmental ($E$) components are computed for single and multiple locations together with their SEs, which enables a test for their significance.
Estimates of $h^2$ and $H^2$, mean degree of dominance, and the number of groups of genes that control the trait of interest are computed to aid the breeder in interpreting the inheritance of a trait. The program generates the $W_r$ − $V_r$ graph, which provides insight into the order of dominance and an estimate of the proportions of dominant to recessive alleles among the parents. The $W_r$ and $V_r$ output can be used in other software to generate the $W_r$ − $V_r$ graph with the limiting parabola, which is not provided for in this program. The plot of $W_r$ + $V_r$ against $Y_r$ generated by SASHAYDIALL can give an indication of the effect of dominant or recessive alleles on expression of a trait.
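For readers who want to draw the limiting parabola in other software, the curve follows from the fact that a covariance cannot exceed the product of the two standard deviations, so that $W_r^2 \le V_P V_r$ for every array. The expressions below are the standard forms from Hayman's theory, written without the environmental component $E$; they are offered as a guide and are not taken from SASHAYDIALL output.

```latex
W_r^2 = V_P \, V_r
\quad \text{(limiting parabola)}
\qquad\qquad
W_r = V_r + \tfrac{1}{4}\,(D - H_1)
\quad \text{(expected line of unit slope)}
```

Plotting $W_r = \sqrt{V_P V_r}$ over the observed range of $V_r$ gives the upper branch of the parabola. The array points are expected to fall inside it, and the intercept $\tfrac{1}{4}(D - H_1)$ is positive under partial dominance, zero under complete dominance, and negative under overdominance.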
In summary, the SASHAYDIALL program was written to provide user-friendly, freely available analytical software for Hayman's diallel analysis. SASHAYDIALL has the advantages that the user does not have to indicate the number of parents, as this is automatically detected, and that the program can analyze data from multiple environments. The SASHAYDIALL program generates comprehensive output that is easy to understand for proper genetic interpretation of the inheritance of a trait by the breeder. Because of its ease of use, this program should revive interest in the application of Hayman's (1954b) method of diallel analysis. The SASHAYDIALL program runs in SAS (SAS Institute, 2014), which is among the most powerful and widely used software for statistical analysis. An interested user does not need any knowledge of the SAS/IML language to analyze data with this program. The SASHAYDIALL program is not computationally intensive and should therefore run on slower computers. Users are advised against making any changes to the program code.

AVAILABILITY
The SASHAYDIALL program described in this paper was developed and tested in SAS version 9.4 (SAS Institute, 2014), but it should work with SAS version 9.0 and upward. The SASHAYDIALL program is freely available to interested users from the corresponding author or from the CIMMYT Biometrics and Statistics Unit software repository (http://hdl.handle.net/11529/10548045). The data set used for the maize diallel example and corresponding output will be provided on request.

Conflict of Interest
The authors declare that there is no conflict of interest.

Narrow-sense heritability ($h^2$): 0.364
*** Significant at the 0.001 probability level.
† $D$, component of variation due to additive effect of genes; $H_1$, component of variation due to dominance effects of genes; $H_2$, dominance component indicating asymmetry of positive and negative effects of genes; $h^2$, overall mean dominance effect of heterozygous loci; $F$, relative frequency of dominant and recessive alleles in the parents; $E$, environmental variation; $W_r$, covariance between families within the ith array and their nonrecurrent parent; $V_r$, the variance among family ($F_1$ + reciprocal) means within an array; $Y_r$, mean parental value.
‡ Parameter estimates are presented with more accuracy for purposes of illustration only.

Fig. 2. Hayman's $W_r$ − $V_r$ graph for days to anthesis in a 13-by-13 maize diallel across three locations in Kenya plotted using SASHAYDIALL. $V_r$ is the variance among family ($F_1$ + reciprocal) means within an array, and $W_r$ is the covariance between families within the ith array and their nonrecurrent parent.

Supplemental Material Available
Supplemental material for this article is available online.