Large‐scale GWAS in sorghum reveals common genetic control of grain size among cereals

Summary

Grain size is a key yield component of cereal crops and a major quality attribute. It is determined by a genotype’s genetic potential and its capacity to fill the grains. This study aims to dissect the genetic architecture of grain size in sorghum. An integrated genome‐wide association study (GWAS) was conducted using a diversity panel (n = 837) and a BC‐NAM population (n = 1421). To isolate genetic effects associated with the genetic potential of grain size, rather than the genotype’s capacity to fill the grains, a treatment of removing half of the panicle was imposed during flowering. Extensive and highly heritable variation in grain size was observed in both populations in 5 field trials, and 81 grain size QTL were identified in subsequent GWAS. These QTL were enriched for orthologues of known grain size genes in rice and maize, and had significant overlap with SNPs associated with grain size in rice and maize, supporting common genetic control of this trait among cereals. Grain size genes with opposite effects on grain number were less likely to overlap with the grain size QTL from this study, indicating that the treatment facilitated identification of genetic regions related to the genetic potential of grain size. These results enhance understanding of the genetic architecture of grain size in cereals, and pave the way for exploration of underlying molecular mechanisms and manipulation of this trait in breeding practices.


(Table S1) and a BC-NAM consisting of 30 interrelated families (n = 1421) (Figure S1A; Table S2). The diversity panel has around 225 genotypes in common

were planted between November and February, using a row-column design with partial replication. Each plot consisted of two 6 m rows with a row spacing of 0.75 m. Differences in plant height within a plot were minimal. Different numbers of genotypes were grown in each trial due to seed availability (Table S3). Standard agronomic and pest-control practices were applied.

A treatment of removing half of each of two panicles in each plot was imposed when each panicle commenced flowering, before significant grain development had occurred (Figure 1A). Once physiological maturity was reached, the remaining half panicle (hereafter referred to as the half head) was hand-harvested and threshed using a mechanical threshing machine. In DPGAT16, a full head panicle in each plot was also harvested and threshed for comparison with the half head panicles from the same plot. An aspirator was used to remove any debris from each sample before measurements were taken. Although sorghum grain typically tends toward spherical, considerable phenotypic variation in length, width, and thickness does exist. Therefore, grain size parameters, including thousand kernel weight (TKW) and the length, width, thickness, and volume of grains, were measured using a Seedcount SC5000 (Next Instruments, Condell Park, NSW, Australia) and a digital balance.

Statistical analysis

Due to the large number of genotypes included in this study, partial replication was used in all trial designs (Cullis et al., 2006), with 30% of the genotypes replicated two or more times and the remaining 70% represented by single plots. The total number of plots in each trial ranged from 880 to 1521, with the total number of genotypes planted in each trial ranging from 658 to 1164 (Table S3). A customized design was used to minimise spatial error effects within each trial. The concurrence of genotypes and populations across the two seasons allowed the DP and BC-NAM trials to be analysed as two multi-environment trials (METs) for each of the five measured grain size parameters, comprising two trials for the DP and

Individual SNP markers with >50% missing data were removed from further analysis and the remaining missing values were phased and imputed using Beagle v4.1 (Browning & Browning, 2016). An average imputation accuracy of 96% was achieved across both

Eighty-five grain weight QTL were collated from previously published studies (Table S5).

To be conservative, QTL with a confidence interval > 20 cM were excluded during this step.

Candidate gene analysis

Candidate genes for grain size in sorghum were identified using the methods described by

(Table S6). Subsequently, corresponding sorghum orthologues of grain size genes in rice and maize were identified using a combination of syntenic and bidirectional best hit (BBH) approaches. Syntenic gene sets among rice, maize and sorghum were downloaded from PGDD (http://chibba.agtec.uga.edu/duplication/) to identify syntenic orthologues of known grain size genes from rice and maize. Local BLAST was performed to identify best BLAST hits of pairs of genes from two genomes, using BLASTP.
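The bidirectional best hit step can be illustrated with a minimal sketch. It assumes the BLASTP results have already been parsed into (query, subject, bit score) tuples; the function names, gene identifiers and scores below are hypothetical and are not taken from this study.

```python
from typing import Dict, Iterable, List, Tuple

Hit = Tuple[str, str, float]  # (query gene, subject gene, bit score)


def best_hits(hits: Iterable[Hit]) -> Dict[str, str]:
    """Return, for each query gene, the subject gene with the highest bit score.

    Ties are resolved in favour of the hit seen first (an assumption of this sketch).
    """
    best: Dict[str, Tuple[str, float]] = {}
    for query, subject, bitscore in hits:
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return {query: subject for query, (subject, _) in best.items()}


def bidirectional_best_hits(a_vs_b: Iterable[Hit],
                            b_vs_a: Iterable[Hit]) -> List[Tuple[str, str]]:
    """Keep gene pairs (a, b) where b is the best hit of a AND a is the best hit of b."""
    forward = best_hits(a_vs_b)
    reverse = best_hits(b_vs_a)
    return sorted((a, b) for a, b in forward.items() if reverse.get(b) == a)


# Hypothetical gene IDs and scores, for illustration only.
rice_vs_sorghum = [
    ("riceA", "sbi1", 500.0),
    ("riceA", "sbi2", 300.0),
    ("riceB", "sbi2", 410.0),
    ("riceC", "sbi1", 200.0),
]
sorghum_vs_rice = [
    ("sbi1", "riceA", 495.0),
    ("sbi1", "riceC", 210.0),
    ("sbi2", "riceB", 400.0),
]

pairs = bidirectional_best_hits(rice_vs_sorghum, sorghum_vs_rice)
print(pairs)
```

In this toy input, riceC has sbi1 as its best hit, but sbi1's best hit is riceA, so the riceC pair is not retained. This reciprocity requirement is what makes the BBH approach stricter than a one-directional best-hit search.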

Minimizing environmental impact on grain size through the removal of half of the panicle during flowering time

In the DPGAT16 trial, a significant increase in average grain size of 8.54% was observed in half heads compared to full heads (Figure 1B). The magnitude of this increase varied

(Tables 1 and 2). Hence, the combined cross-trial BLUPs were used for subsequent analyses of these parameters. The high cross-trial correlations of the individual grain size parameters were consistent with their medium to high heritability (Tables 1 and 2).

All five grain size parameters displayed near normal distributions in both populations (Figure 2), which in combination with the medium to high heritability and low G×E observed,

To investigate possible correlations between racial groups and grain size parameters in sorghum, structure analysis of the diversity panel was conducted, which revealed 5 groups that corresponded to different racial groups of sorghum (Figure S4B, Table S4). All grain size parameters showed significant differences among these racial groups (ANOVA, p-value < 0.05), with representatives of the caudatum and guinea racial groups having the largest and heaviest grains and east-African durras the smallest and lightest (Table S7).

Figure S4A). Thus, a 1 cM window was used to cluster these 71 SNPs into 54 QTL. The majority (>80%) of these 54 QTL were significantly associated with multiple grain size parameters (Table S9)

Table S10). Because the extent of LD was greater in the BC-NAM than in the diversity panel (Figure S1C), a 2 cM window was used to cluster these associated SNPs into 66 QTL regions. Similar to observations for the diversity panel, the vast majority (~85%) of the QTL were significantly associated with multiple grain size parameters, and the phenotypic variation explained by individual QTL was generally small, with the majority of QTL explaining < 5% of the phenotypic variation of a given grain size parameter (Table S11). Within the BC-NAM, it was possible to observe the distribution of allelic effects of the exotic parents compared to the recurrent parent (Figure 5). In general, alleles with both larger and smaller effects than that of the elite parent were observed, which supports the presence of multiple alleles at the majority of QTL.

(Table S12).
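The window-based grouping of associated SNPs into QTL (1 cM for the diversity panel, 2 cM for the BC-NAM) can be sketched as follows. The exact clustering rule is not spelled out in the text, so this sketch assumes a simple chaining rule: a SNP joins the current cluster if it lies on the same chromosome and within the window of the previous SNP. The function name and the SNP positions are hypothetical.

```python
from typing import Dict, List, Tuple, Union

Snp = Tuple[int, float]  # (chromosome, genetic position in cM)
Cluster = Dict[str, Union[int, float]]


def cluster_snps(snps: List[Snp], window_cm: float) -> List[Cluster]:
    """Group SNPs into QTL regions by chaining neighbours within window_cm.

    Assumption of this sketch: the distance is measured to the previous SNP
    in the cluster, so a cluster can span more than window_cm in total.
    """
    clusters: List[Cluster] = []
    for chrom, pos in sorted(snps):
        last = clusters[-1] if clusters else None
        if (last is not None and last["chrom"] == chrom
                and pos - last["end"] <= window_cm):
            # Close enough to the previous SNP: extend the current QTL region.
            last["end"] = pos
            last["n_snps"] += 1
        else:
            # New chromosome or gap larger than the window: start a new region.
            clusters.append({"chrom": chrom, "start": pos, "end": pos, "n_snps": 1})
    return clusters


# Hypothetical SNP positions, for illustration only.
example_snps = [(1, 10.0), (1, 10.6), (1, 11.4), (1, 15.0), (2, 10.2)]
qtl = cluster_snps(example_snps, window_cm=1.0)
for region in qtl:
    print(region)
```

A wider window merges more SNPs into fewer regions, which is consistent with using a larger window in the population with more extensive LD.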

The genetic basis of grain size has been the focus of 18 previous studies in sorghum, which used 21 bi-parental populations and reported 85 grain weight QTL (Table S5). Nearly three quarters (62) of these previously reported QTL co-located with QTL identified in this study.

(Table S13).

(Table S6; Table S14). The candidate genes were enriched in the grain size QTL; 16 of the 111 candidate genes were found within a 0.1 cM window of a grain size QTL (p-value < 0.05, χ² test), and 36 were identified within a 1 cM window (p-value < 0.05, χ² test) (Table S15). Some of the negative correlations between grain number and grain size are likely due to variation in the genotypes' capacity to fill the grain, as observed in this study. Given the treatment imposed in the current study to minimise variation due to grain filling capacity, we would expect that the previously reported candidate genes that have been shown to exert opposite effects on grain size and grain number would be less represented within our set of QTL. To explore this hypothesis, the candidate genes were divided into 3 groups based on whether they had been reported to exert opposite effects on grain number and grain size in the species in which they were cloned (i.e., whether the gene was associated with bigger grains and lower grain number). This approach identified 21 genes that showed opposite effects on grain size and grain number, 37 genes that did not show opposite effects, and 53 genes for which this information was not reported (Table S15). The candidate genes reported to have opposite effects on grain number and grain size were underrepresented in the overlap with the QTL from this study (Table S15).
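The kind of χ² enrichment test reported above can be sketched as a one-degree-of-freedom goodness-of-fit test. This is an illustration under stated assumptions, not the authors' exact procedure: the null expectation is taken to be the fraction of the genome (or gene space) covered by the QTL windows, a value that is not given in the text. The `window_fraction` used below is therefore a purely hypothetical placeholder, and the function name is also hypothetical.

```python
import math
from typing import Tuple


def enrichment_chi2(n_genes: int, n_in_window: int,
                    window_fraction: float) -> Tuple[float, float]:
    """Chi-square goodness-of-fit test (1 df) for candidate genes inside QTL windows.

    window_fraction is the ASSUMED probability that a gene falls inside a QTL
    window by chance (e.g. the share of the genome covered by the windows).
    Returns (chi-square statistic, p-value).
    """
    expected_in = n_genes * window_fraction
    expected_out = n_genes * (1.0 - window_fraction)
    observed_out = n_genes - n_in_window
    stat = ((n_in_window - expected_in) ** 2 / expected_in
            + (observed_out - expected_out) ** 2 / expected_out)
    # For 1 degree of freedom, the chi-square survival function is erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# 16 of 111 candidate genes fell within the 0.1 cM windows (from the text).
# window_fraction=0.05 is a HYPOTHETICAL value chosen only to make the sketch run.
stat, p_value = enrichment_chi2(n_genes=111, n_in_window=16, window_fraction=0.05)
print(f"chi2 = {stat:.2f}, p = {p_value:.2g}")
```

The same function can be applied to a subset of genes, such as those with opposite effects on grain size and grain number, to check whether their overlap with the QTL departs from the chance expectation.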

Grain size is one of the most critical traits of cereal crops due to its direct contribution as a yield component, its importance as a quality attribute, and its contribution to fitness stemming from its impact on reproductive rates, emergence and establishment (Tao et al.,

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Chr indicates the chromosome on which the QTL is located. P-start provides the physical coordinate, based on v3.1.1 of the sorghum genome assembly, of the start of the QTL CI. P-end provides the

Correspondence between numbers on x-axis and QTL identity is provided in Table S16.