A sorghum NAC gene is associated with variation in biomass properties and yield potential

Abstract Sorghum bicolor is a C4 grass widely cultivated for grain, forage, sugar, and biomass. The sorghum Dry Stalk (D) locus controls a qualitative difference between juicy green (dd) and dry white (D‐) stalks and midribs, and co‐localizes with a quantitative trait locus for sugar yield. Here, we apply fine‐mapping and genome‐wide association study (GWAS) to identify a candidate gene underlying D, and use nearly isogenic lines (NILs) to characterize the transcriptional, compositional, and agronomic effects of variation at the D locus. The D locus was fine‐mapped to a 36 kb interval containing four genes. One of these genes is a NAC transcription factor that contains a stop codon in the NAC domain in the recessive (dd) parent. Allelic variation at D affects grain yield, sugar yield, and biomass composition in NILs. Green midrib (dd) NILs show reductions in lignin in stalk tissue and produce higher sugar and grain yields under well‐watered field conditions. Increased yield potential in dd NILs is associated with increased stalk mass and moisture, higher biomass digestibility, and an extended period of grain filling. Transcriptome profiling of midrib tissue at the 4–6 leaf stages, when NILs first become phenotypically distinct, reveals that dd NILs have increased expression of a miniature zinc finger (MIF) gene. MIF genes dimerize with and suppress zinc finger homeodomain (ZF‐HD) transcription factors, and a ZF‐HD gene is associated with midrib color variation in a GWAS analysis across 1,694 diverse sorghum inbreds. A premature stop codon in a NAC gene is the most likely candidate polymorphism underlying the sorghum D locus. More detailed understanding of the sorghum D locus could help improve agronomic potential in cereals.

Vegetative biomass properties are largely determined by the architecture of the vascular system, xylem, and phloem, which provide mechanical support, allow water and nutrient acquisition, and mediate transport of photoassimilates and signaling molecules from source to sink tissues (Bihmidine, Hunter, Johns, Koch, & Braun, 2013). Grass leaves and inflorescences are major source and sink organs, and the genetic control of their architecture has been intensely studied (Kellogg, 2007;. However, the architecture of the vascular system also plays a crucial role in determining cereal grain yields. During grain filling, continued acquisition of water, sugars, and nitrogen is accompanied by re-mobilization of transient storage reserves, and stalk strength must be maintained throughout senescence to protect the drying grain from lodging (Peiffer et al., 2013). The genetic control of vascular system architecture has been studied primarily in model systems (Handakumbura & Hazen, 2012), where genetic effects on agronomic performance may not be easily detected.
The large family of NAC transcription factors (TFs), named after founding members NAM, ATAF, and CUC (Fang, You, Xie, Xie, & Xiong, 2008), plays a critical role in plant vascular development (Ko, Jeon, Kim, Kim, & Han, 2014;Nakano, Yamaguchi, Endo, Rejab, & Ohtani, 2015). A subset known as the secondary wall NACs (SWNs) act as master regulators of secondary cell wall development in vascular tissues (Handakumbura & Hazen, 2012;Zhong, Demura, & Ye, 2006). Conservation of SWN function in moss hydroids and stereids, which, respectively, conduct water and lend mechanical support to the moss gametophyte, provides evidence that these structures are homologous to sporophytic vascular systems in higher plants (Xu et al., 2014). SWNs activate a subset of MYB transcription factors that contain secondary wall NAC binding elements (SNBEs) in their promoters, inducing secondary cell wall thickening and deposition of cellulose, hemicellulose and lignin (Zhong, Lee, & Ye, 2010). NAC TFs also act downstream of MYB TFs. During sieve element maturation, for example, the Altered Phloem Development (APL) gene both drives the expression of NAC045 and NAC086 and is itself regulated by NAC020 (Bonke, Thitamadee, Mahonen, Hauser, & Helariutta, 2003;Furuta et al., 2014). The extreme complexity of the secondary cell wall gene regulatory network in Arabidopsis is driven by functional redundancy, feed-back and feed-forward loops, and combinatorial control enabling functional fine-tuning (Taylor-Teeples et al., 2014).
The Dry Stalk (D) locus in sorghum conditions a difference between dry, pithy white stems and midribs (D−), and juicy green stems and midribs (dd) (Smith & Frederiksen, 2000). A previous genome-wide association study (GWAS) in sweet sorghum mapped large-effect QTL for sugar yield, juice volume, and stalk moisture to the D locus (Burks, Kaiser, Hawkins, & Brown, 2015). In this study, we fine-map the D locus to a 36 kb region, identify a premature stop codon in a NAC transcription factor as a candidate polymorphism, and show that nearly isogenic lines (NILs) segregating for the premature stop codon differ significantly in sugar yield, grain yield,

| Field experiments
For sugar yield measurements, paired rows of NILs were planted in six replicates at a single location in Urbana, IL in summer 2014, and phenotyped as previously described (Burks et al., 2015). For grain and stalk dry matter and moisture measurements, paired 2-row plots of NILs were planted in four replicates at each of two locations (Urbana and Savoy, IL), and samples were pooled from 2-6 individual representative plants per row for each time point. Immature grain weight was estimated by clipping individual panicle branches from the rachis, stalk weight was measured after stripping off leaf blades, and moisture was estimated by weighing samples before and after oven-drying at 45°C for 5 days. Midrib color in the GWAS panel was phenotyped as a binary trait (green vs white, Supporting information Table S7) in the youngest leaf at~45 days after planting. All field experiments were machine planted in rows 10′ long with 30″ row spacing. Seeds were treated with Apron fungicide (Syngenta, USA) and Concep II seed safener (Syngenta, USA) before planting, and weeds were controlled through use of a pre-emergent herbicide (Bicep). All raw agronomic data (Supporting information Tables S5 and S8) and raw compositional data (Supporting information

| Fine-mapping and GWAS
Two MITE-based indel markers (Supporting information Table S10) were used to screen 1,132 BC 3 and BC 2 F 2 individuals. Subsequent genotyping-by-sequencing was performed in both putative recombinants and a GWAS panel of 1,624 diverse sorghum accessions using the two-enzyme GBS protocol with PstI-HF and HinP1I enzymes (Poland, Brown, Sorrells, & Jannink, 2012), followed by alignment to v3 of the reference genome (www.phytozome.org) using bowtie2 (Langmead & Salzberg, 2012) and SNP calling using the TASSEL5 GBSv2 pipeline (Glaubitz et al., 2014). Imputation of biparental recombinants was performed using FSFHap (Swarts et al., 2014) in TASSEL, and imputation of the GWAS panel was performed using Beagle4 (Browning & Browning, 2007). Midrib color GWAS was performed using the mixed linear model implemented in GAPIT, using model selection to choose the optimal number of principal components, which was zero. A minor allele frequency cutoff of 5% resulted in 50,899 SNPs for testing, and only associations significant at a false discovery rate of less than 5% (q < 0.05) are reported.
GBS data for this project have been deposited at the Illinois Data Bank (https://databank.illinois.edu/).

| Lignocellulosic compositional analysis
Lyophilized tissue samples were ground to a fine powder using three 5 mm metal balls in 2 ml plastic tubes (Retsch Ball mill, 2 times at 25 Hz for 2.5 min) and washed sequentially with 70% ethanol, 1:1 (v: Neutral sugars were separated via a CarboPac PA20 column, while a CarboPac PA200 was used to separate uronic acids. Three distinct programs were used to resolve the sugars of interest. Samples were run at a flow rate of 0.4 ml/min and gradients consisted of (a) 2 mM NaOH for 20 min followed by a 5 min 100 mM flush and subsequent 5 min at 2 mM (neutral sugar separation 1; excludes xylose and mannose); (b) 18 mM NaOH for 15 min followed by a 5 min 100 mM flush and subsequent 7 min at 18 mM (neutral sugar separation 2; excludes rhamnose and arabinose); (c) 0.1 M NaOH with a gradient of 50-200 mM sodium acetate from 0 to 10 min followed by a 2 min 200 mM sodium acetate flush returning to 50 mM for 2.9 min (uronic acid separation). Crystalline cellulose was measured using the Updegraf method and the released glucose was measured using the anthrone assay (Updegraff, 1969;Laurentin & Edwards, 2003). Lignin content and lignin composition were measured using the ultra-violet acetyl bromide lignin method and a thioacidolysis procedure, respectively, according to Foster, Martin and Pauly (2010). Saccharification yield of cell wall materials was determined after enzymatic treatment.
In brief, 1 mg of destached cell wall was incubated with 0.5 μl Accellerase 1,500 enzyme mix (Gencor) in 1 ml of 50 mM citrate buffer (pH 4.5), shaking at 250 rpm at 50°C for 20 hr. Solubilized glucose and xylose were detected on a Bio-Rad HPX-87H, 300 × 7.8 mm column in an Shimadzu UFLC chromatography system. The elution profile encompassed 0.01 N sulfuric acid in 15 min at 0.6 ml/min and column temperature is 50°C.

| Gene expression analysis
Total RNA was extracted from greenhouse-grown midrib tissue of the youngest fully expanded leaf of DD and dd NILs at the four-leaf and six-leaf stages using Spectrum ™ Plant Total RNA Kit (Sigma-Aldrich, USA), following the manufacturer's protocol. All RNA samples were digested with DNase I (New England Biolabs, USA), and rtPCR was performed using M-MuLV reverse transcriptase (New England Biolabs, USA). RNAseq libraries were prepared and sequenced at Roy J. Carver Biotechnology Center at the University of Illinois using single-end, 100 bp reads on a HiSeq2500 instrument. Reads were processed using HISAT2 and StringTie, and the stattest function in the R package "Ballgown" was used to test for differential gene expression between DD and dd NILs using 3 biological replicates at each developmental stage, controlling the false discovery rate at 0.05. The R package "WGCNA" was used to construct co-expression modules using the arguments power = 8, minModuleSize = 20, and mergeCutHeight = 0.05. RNAseq reads for this project have been deposited at the Illinois Data Bank (https://databank.illinois.edu/).  becoming wider and more distinct in successive leaves (Figure 1a).

| Creation and characterization of nearly isogenic lines for the D locus
At anthesis, DD stalks are visibly drier and pithier than dd stalks (Figure 1b). All sampling times and tissues presented in this study are summarized in Supporting information Table S1.  Table S3). This interval contains four predicted genes, of which two are expressed in midrib tissue at either the fourth-or sixth-leaf stage (Supporting information Table S4): a NAC transcription factor (Sobic.006G147400) and a threonine aldolase (Sobic.006G147600). The two most significant hits in the GWAS analysis are the two closest flanking SNPs to the NAC gene. Moreover, the NAC gene is the only gene in the interval with a different annotated exon-intron structure compared to its closest homologs in other cereals, which is relevant because the sorghum reference genome is derived from Tx623, a recessive dd mutant. Reverse-transcriptase PCR and cDNA sequencing confirms that the annotated first intron in Sobic.006G147400 does not exist, and that a T/C SNP in this region produces a stop codon in the d allele from Tx623 but not the contrasting D allele (Figure 2c). The premature stop codon in Tx623 eliminates most of the NAC domain.

| Phylogeny and identification of homologs
The full-length NAC protein, derived from an allele of Sobic.006G147400 lacking the premature stop codon, was used to search for homologous proteins in four grasses (Oryza sativa subsp. japonica, Sorghum bicolor, Zea mays, Setaria italica) and two dicots (Arabidopsis thaliana and Glycine max), which were aligned using MUSCLE in MEGA7 and used to construct a neighbor-joining tree using UPGMA clustering (Figure 3). Syntenic regions on Oryza chromosome 4 and Setaria chromosome 7 each contain single orthologous copies of our sorghum NAC candidate, and co-syntenic regions on Zea chromosomes 2 and 10 each contain intact homeologues derived from segmental duplication. These proteins are part of the NAC1 sub-clade (Peng et al., 2015), and contain five conserved motifs in the N-terminal NAC domains whereas the C-termini are highly variable (Supporting information Figure S1). The closest Arabidopsis homolog is NAC074 (At4G28530), and we hereafter refer to the three co-orthologous grass clades as NAC074a, NAC074b, and NAC074c, and to Sobic.006G147400 as SbNAC074a.

| dd NILs show increases in stalk dry matter, soluble sugar yield, and grain yield
To confirm the previously-reported association between the D locus and sugar yield, we grew NILs in six replications of paired rows, extracted total sugar from 1 meter (m) of row 30 days after anthesis  Table S5). Sugar yield per meter of row (g/m; Figure 4a) is a function of juice volume (ml/m; Figure 4b) and the concentration of simple sugars in the juice (degrees Brix; Figure 4c), and we observe that increased yield in dd NILs is driven by increased juice volume, which in turn is associated with higher vegetative weight (kg/m; Figure 4d). Given the dramatic effect of the D locus on sugar yield, we next conducted a time series experiment to quantify its effects on the rate and extent of grain filling. Paired, two-row plots of NILs were grown in four replicates at each of two locations, and dry matter and moisture of stalk and grain were monitored at 2-week intervals beginning at anthesis.
Overall, this grain filling period is characterized by movement of dry matter from the stalk to the grain, and by decreases in grain moisture relative to stalk moisture ( Figure 5). Stalk dry matter is significantly higher in dd NILs at all stages, most notably at 4 weeks after anthesis when it is >58% higher (p = 0.004; Figure 5a). dd NILs also have higher grain moisture at 4 weeks after anthesis (p = 0.042; Figure 5d), and higher grain yield at 6 weeks after anthesis (p = 0.025; Figure 5b).

| Compositional differences drive increased biomass digestibility of dd NILs
Nearly-isogenic lines stalk tissue was subjected to detailed lignocellulosic compositional analysis at two growth stages: developing internodes from a 1 cm plug of tissue immediately below the  shoot apical meristem at 6 weeks after planting (WAP), and the third internode below the inflorescence at 9 WAP, which coincided with the boot stage shortly before flowering. While tissues show relatively few compositional differences between NILs at 6 WAP, by 9 WAP dd NILs show reduced lignin (p = 0.007; Figure 6a) and increased glucose release following enzymatic saccharification (p = 0.049; Figure 6b), while their decrease in crystalline cellulose is not significant at p < 0.05 (p = 0.055; Figure 6c).
dd NILs also show a lower ratio of syringyl to guaiacyl lignin monomers at 9 WAP (p = 0.029), and differences in acid-hydrolyzed lignocellulosic monosaccharides at both stages (Supporting information Figure S2).

| A miniature zinc finger (MIF) gene is upregulated in dd NILs
Transcriptome profiling was performed on mRNA from NIL midrib tissue at the four-leaf and six-leaf stages, just before and during the first appearance of phenotypic differences between the NILs (Figure 1a). Three independent biological replicates were obtained from pooled tissue from three separate greenhouse plantings, for a total of 12 samples across the two NILs and two stages. Co-expression analysis using WGCNA's step by step network construction and module detection (Langfelder & Horvath, 2008) shows that  Table S6) that includes SbNAC074c but not SbNAC074b as well as two NACs associated with xylem development in Arabidopsis, Xylem NAC Domain 1 (XND1) and NAC075 Zhao, Avci, Grant, Haigler, & Beers, 2007). However, using the HiSAT-Stringtie-Ballgown pipeline (Pertea, Kim, Pertea, Leek, & Salzberg, 2016), only a single gene (Sobic.008G020700) is differentially expressed between DD and dd NILs (q < 0.01), showing no expression in DD NILs and mean expression of 21 and 35 FPKM in dd NIL midribs at the four-leaf and six-leaf stages, respectively. Sobic.008G020700 is annotated as a MIF gene, which are seed plant-specific, truncated versions of ZF-HD transcription factors (Hu & Ma, 2006) that dimerize with ZF-HDs and suppress their transcriptional activation activity (Hong, Kim, Kim, Yang, & Park, 2011). Intriguingly, the only significant GWAS hit for midrib color other than our candidate NAC gene falls near a ZF-HD transcription factor on chromosome 1 (Supporting information Table S2).
However, this ZF-HD transcription factor (Sobic.001G112500) shows no expression in any of the DD or dd samples.

| DISCUSSION
In this study, we fine-map the sorghum D locus to a four-gene interval that includes two genes expressed in midrib tissue: a NAC transcription factor with a premature stop codon in the dd sorghum reference genome (Sobic.006G147400), and a threonine aldolase (Sobic.006G147600). Threonine aldolase controls the catabolism of threonine to glycine and acetaldehyde, and Arabidopsis homologs THA1 and THA2 are both expressed in vascular tissue. tha1-2 mutants show dramatic increases in seed threonine content, whereas tha2-1 mutants have a lethal albino seedling phenotype (Joshi, Laubengayer, Schauer, Fernie, & Jander, 2006 (Zhao, 2005), and is one of many  (Zhao et al., 2007).The authors suggest that XND1 may promote vessel elongation by repressing their terminal differentiation. Overexpression of NAC075 results in ectopic formation of xylem vessel elements  and rescues the pendent stem phenotype of nst1-nst3 double mutants, which results from complete loss of secondary cell wall deposition in xylem fibers.
Lignocellulosic compositional data are consistent with a role for