OsSYL2 AA, an allele identified by gene‐based association, increases style length in rice (Oryza sativa L.)

SUMMARY Stigma characteristics are important factors affecting the seed yield of hybrid rice per unit area. Natural variation of stigma characteristics has been reported in rice, but the genetic basis for this variation is largely unknown. We performed a genome‐wide association study on three stigma characteristics in six environments using 1.3 million single‐nucleotide polymorphism (SNPs) characterized in 353 diverse accessions of Oryza sativa. An abundance of phenotypic variation was present in the three stigma characteristics of these collections. We identified four significant SNPs associated with stigma length, 20 SNPs with style length (SYL), and 17 SNPs with the sum of stigma and style length, which were detected repeatedly in more than four environments. Of these SNPs, 28 were novel. We identified two causal gene loci for SYL, OsSYL3 and OsSYL2; OsSYL3 was co‐localized with the grain size gene GS3. The SYL of accessions carrying allele OsSYL3AA was significantly longer than that of those carrying allele OsSYL3CC. We also demonstrated that the outcrossing rate of female parents carrying allele OsSYL2AA increased by 5.71% compared with that of the isogenic line carrying allele OsSYL2CC in an F1 hybrid seed production field. The allele frequencies of OsSYL3AA and OsSYL2AA decreased gradually with an increase in latitude in the Northern Hemisphere. Our results should facilitate the improvement in stigma characteristics of parents of hybrid rice.


INTRODUCTION
Rice is cultivated on approximately 160 million hectares of land distributed across 117 countries, with an average annual yield of 4.4 tonnes per hectare (GRiSP (Global Rice Science Partnership), 2013), and feeds more than 3.5 billion people worldwide. As the human population increases and the area of arable land decreases, increasing the rice grain yield per unit area per unit time is a necessity. The utilization of F 1 heterosis is an effective strategy to enhance rice grain yield; however, this entails the production of F 1 hybrid seeds every year. The yield of F 1 hybrid rice seeds is determined by the outcrossing seed setting rate for a given number of spikelets per unit area. The outcrossing seed setting rate in rice is mainly affected by the stigma exertion percentage, which is largely determined by stigma length (STL), style length (SYL), and the sum of stigma and style length (TSSL) (Figure 1a).
Hybrid rice has been cultivated commercially in China for 43 years. In the past 20 years, the average seed production yield has remained stagnant at 2.25 t ha À1 . Increasing the TSSL, STL, and SYL to improve the rate of stigma exertion will be beneficial to increase the seed production yield of hybrid rice. In this study, we investigated the TSSL, STL, and SYL of 353 rice accessions at two locations over 3 years. By combining the TSSL, STL, and SYL with singlenucleotide polymorphism (SNP) data, we performed a genome-wide association study (GWAS) and identified significant SNP loci. Further, using the method of gene-based association we identified a novel causative gene OsSYL2 for SYL. The function of OsSYL2 AA was validated by transgenic complementation testing and evaluation of F 1 seed production in the paddy field. These results filled in the gap from gene cloning and functional analysis of stigma characteristics. The gene-based association method used in this study resulted in the ability to access the causative gene/SNP for OsSYL2. The favorable alleles identified herein could facilitate improvement in stigma characteristics of the parents of hybrid rice.

Phenotypic variation of the stigma characteristics in the natural population
According to the definitions of the three stigma characteristics (STL, SYL, and TSSL) (Figure 1a), the phenotypic values of STL, SYL, and TSSL were investigated in 353 accessions containing indica and japonica subspecies ( Table S1 in the online Supporting Information). Partial photos of pistils are shown in Figure 1(b) and (c). The distributions of global averages over the six environments for STL, SYL, and TSSL in each subspecies are shown in Figure 1(d). In the 353 accessions, the mean value of STL was calculated per environment, ranging from 1.16 AE 0.16 mm to 1.19 AE 0.18 mm, with coefficients of variation (CV) across the six environments from 13.79% to 15.38% ( Figure S1, Table S2); the mean value of SYL was calculated per environment, ranging from 0.60 AE 0.11 mm to 0.63 AE 0.43 mm, with CV from 18.03% to 19.35% ( Figure S1, Table S2); and the mean value of TSSL was calculated per environment, ranging from 1.77 AE 0.22 mm to 1.80 AE 0.24 mm ( Figure S1, Table S2), with CV from 12.43% to 13.33%. The average CV of TSSL across the six environments was 12.96%, which was lower than that for STL (14.67%) and SYL (18.91%) (Table S2). These results indicate that there was abundant phenotypic variation in the natural populations studied. Compared with indica rice, the japonica rice population had lower values for the three stigma characteristics ( Figure 1d). The results of joint analysis of variance for each trait showed significant differences among genotypes, but not among the environments and the interactions of genotypes with environments (Table S3), indicating that the environment had little effect on the stigma characteristics and the abundant phenotypic variation of the stigma characteristics was mainly attributable to variation in genotype.

Genomic variation at the SNP level in the 353 rice accessions
A total of 1.90 billion paired-end reads of length 150 bp were obtained from 353 rice accessions resequenced using the Illumina resequencing platform, with an average coverage depth of 4.36-fold for each accession. After mapping against the Nipponbare reference genome sequence, we identified 1 326 094 SNPs after excluding the sites with missing data for more than 18% of all the accessions. We observed 463 740 SNPs in the genic regions: 48 054 synonymous, 52 283 non-synonymous, 270 622 intronic, 62 181 3 0 -untranslated region (UTR), and 30 600 5 0 -UTR SNPs.
Based on the SNP data, the SNP density and nucleotide diversity (p) showed great variation along chromosomes ( Figure S2a,b). Some chromosome regions of the indica and japonica groups had high values of F ST (the population differentiation index, also called the fixation index; F ST > 0.5), including a total length of 1.6 Mb in japonica rice and 0.7 Mb in indica rice, indicating that they contain gene loci that may be involved in geographic adaptation ( Figure S2c). The F ST value between the indica group and japonica group was 0.52. These results suggest a high level of genetic differentiation between indica and japonica rice. Within cultivars, the p level of indica rice (6.75 9 10 À4 ) was higher than that of japonica rice (4.34 9 10 À4 ). These results indicate the presence of rich genomic diversity at the SNP level among the 353 accessions.

Population genetic structure and linkage disequilibrium
The Bayesian model-based population structure analysis provided evidence that there is a significant population structure in the 353 rice accessions. As the log-likelihood values increased with an increase in the K value (Figure S3a), we used the DK value to determine a suitable number of subgroups, K. The DK value was highest at K = 2 ( Figure S3b). Therefore, the entire population was divided into two subgroups, named the indica subgroup and japonica subgroup ( Figure S3c). We defined an accession as a non-admixed accession when its Q value was larger than 0.85. The number of non-admixed accessions in the indica and japonica subgroups was 166 and 165, respectively, and the remaining 22 accessions were assigned to the admixture group. The results of the population structure analysis based on the Bayesian model were further confirmed by principal component analysis ( Figure S3d) and neighbor-joining tree analysis based on Nei's genetic distances (Nei et al., 1983) (Figures S3e  and S4).
We further analyzed linkage disequilibrium (LD; expressed as r 2 ) in the whole rice population, the indica subgroup, and the japonica subgroup. The extent of LD was measured by the chromosomal distance at which r 2 decreased to half its maximum value. The LD decay distances in the whole rice population, the indica subgroup, and the japonica subgroup were 177 kb (r 2 = 0.26), 57 kb (r 2 = 0.26), and 214 kb (r 2 = 0.28), respectively ( Figure S3f). We believe that the faster LD decay in the indica rice subgroup than in the japonica rice subgroup may be attributable to the frequent artificial crossing of indica rice during the breeding process, because two to three crops of indica rice are grown per year in areas where the latitude is below 30°N, whereas only one crop of japonica rice is grown per year in area were the latitude is above 30°N. The LD decay distance of the indica rice subgroup was slightly lower than that reported by Huang et al. (2010) (123 kb). The LD decay distance of the japonica rice subgroup was slightly higher than that reported by Huang et al. (2010) (167 kb).

Genome-wide association mapping
Using the mixed linear model with correction of kinship bias, we conducted GWAS between stigma characteristics and SNPs [minimum allele frequency (MAF) > 0.05] in the 353 rice accessions. In this population, 41 significantly associated SNP loci were detected in the 34 LD regions (Table 1). These SNPs were located on chromosomes 1-4, 6, 7, 9, 10, and 12. In addition, these significantly associated SNP loci were repeatedly detected in at least four environments, which showed that the SNP-trait associations were stable (Table S4).
For TSSL, 17 significant SNP loci were identified on chromosomes 3, 6, 9, and 12, with R 2 from 3.68% to 5.33% (Table 1, Figure S7). Four of these SNP loci were detected across six environments and five loci were detected in five environments ( Figure S7, Table S4). The SNP locus (16 733 441 bp) was associated with both SYL and TSSL traits simultaneously. Next, we analyzed the major SNP loci relevant to SYL with a significant peak, present in chromosomes 3 and 2, respectively.

Allele OsSYL3 AA increases style length
For the association signal (chromosome 3: 16 733 441) in the 16.69-16.87 Mb region, there were 15 candidate genes associated with the significant SNP loci (Figure 2a,b). In this region, 9 of the 15 genes contained non-synonymous SNPs (Tables S5 and S6). Only one non-synonymous SNP in Os03g0407400 was found to be significantly associated with the SYL and TSSL traits (Àlog 10 P ≥ 5.5). Hereafter, the gene Os03g0407400 is referred to as OsSYL3. OsSYL3 was classified into two haplotypes based on three SNPs in its cDNA containing one SNP in an UTR, one in an intron and one missense SNP in the coding region ( Figure 2c). This missense SNP causes a base change from base C to base A at nucleotide (nt) 165 in the coding sequence, resulting in an amino acid change from cysteine (C) to a stop codon. The average TSSL and SYL values of 70 accessions carrying the allele OsSYL3 AA were 1.97 AE 0.23 mm and 0.76 AE 0.28 mm, respectively. The average TSSL and SYL values of 260 accessions carrying the allele OsSYL3 CC were 1.72 AE 0.16 mm and 0.58 AE 0.25 mm, respectively. The differences in TSSL and SYL values between the OsSYL3 AA and OsSYL3 CC genotypes were highly significant (Welch's t-test, P = 2.20 9 10 -16 ) ( Figure 2d).
The quantitative (q) RT-PCR results showed that the expression of OsSYL3 AA was higher than that of OsSYL3 CC in young panicles at differentiation stages 6, 7, and 8, but no significant difference was found at stage 5 (Figures 2e and S8). The expression of OsSYL3 AA was the highest in young panicles at stage 8, of the four stages investigated, whereas the expression of OsSYL3 CC did not peak in panicles among the four stages. We further performed qRT-PCR analysis of pistils at stage 8, sampled from three accessions (Xiangxiandao 10hao, Nongxiang 25, and Fengyouwan 8hao) with longer TSSL and SYL and three accessions (Nipponbare, Huajing 5hao, and Yujing 6hao) with shorter TSSL and SYL. The results showed that the expression of OsSYL3 AA in each of the three accessions with longer TSSL and SYL was significantly higher than that of OsSYL3 CC in each of the three accessions with shorter TSSL and SYL. By searching the China Rice Data Center (http://www.ricedata.cn/gene/) websites, we found that the gene locus Os03g0407400 was identical to Grain Size 3 (GS3) reported by Fan et al. (2006). The allele GS3 AA increased grain length and STL (Takano-Kai et al., 2009. Therefore, we will not study the function of OsSY-L3 AA further.

Introduction of the allele OsSYL2 AA increases SYL
For SNPs associated with SYL, a significant peak appeared in chromosome 2 and 33 candidate genes were detected in the candidate region of 30.45-30.65 Mb (200 kb) (Figure 3a,b). For SNPs in this candidate region, 10 of the 33 genes contain non-synonymous SNPs (Tables S7 and S8). However, only one non-synonymous SNP was significantly associated with SYL (Àlog 10 P ≥ 5.5); it was located within the gene locus Os02g0733900 (MSU ID LOC_Os02g50110). Hereafter, gene Os02g0733900 is referred to as OsSYL2. The full length of OsSYL2 is 602 bp, including one exon and no introns. Gene OsSYL2 encodes an 80-amino-acid protein. No putative conserved domains have been detected for OsSYL2. OsSYL2 was classified into two haplotypes based on three SNPs in its cDNA containing one missense SNP in the coding region and two SNPs in the UTR (Figure 3c). The missense SNP causes a base change from base C to base A at nt 126 in the coding sequence, which results in an amino acid change from histidine (H) to asparagine (N) at amino acid 42 ( Figure 3c). The average SYL value of 26 accessions carrying the allele OsSYL2 AA was 0.77 AE 0.18 mm. The average SYL value of 258 accessions carrying the allele OsSYL2 CC was 0.58 AE 0.21 mm. The difference in SYL value between the OsSYL2 AA and OsSYL2 CC genotypes was highly significant (Welch's t-test, P = 9.06 9 10 -5 ) ( Figure 3d).
The qRT-PCR results showed that the expression of OsSYL2 AA was higher than that of OsSYL2 CC in young panicles at differentiation stage 8, but no significant differences were found at stages 5, 6, and 7 (Figure 3e). We further performed qRT-PCR analysis using pistils at stage 8, sampled from the aforementioned six accessions, and  found that the expression of OsSYL2 AA in each of the three accessions with longer SYL was significantly higher than that of OsSYL2 CC in each of the three accessions with shorter SYL. These results suggested that enhanced expression of OsSYL2 AA might increase SYL. Based on the results of GWAS, no SNP loci found in the promoter region of OsSYL2 were associated with SYL. Further, we searched the website of promoter functional elements (http://bioinformatics.psb.ugent.be/webtools/plantca re/html/#opennewwindow), but found no SNP loci in the cis-element regulatory region. We speculated that phenotypic variation between the accessions with the AA allele and those with the CC allele was caused by SNP loci in the coding sequence region. Thus, we conducted transformation of OsSYL2 gene to confirm our speculation.
We introduced the genome sequence of the allele OsSYL2 AA and empty vector into Nipponbare, respectively. Compared with the plants of Nipponbare genome, plants transformed with the allele OsSYL2 AA had a longer SYL whereas those transformed with the empty vector showed no phenotypic change (Figure 3f,g). These results showed that OsSYL2 was the causal gene for SYL on chromosome 2.
To further evaluate the potential of the OsSYL2 AA allele in hybrid rice seed production, we performed a field experiment using the two combinations, Nipponbare (OsSYL2 CC ) 9 purple rice accession and Nipponbare (OsSYL2 AA ) 9 purple rice accession. The potential of OsSYL2 AA for hybrid rice seed production was evaluated by calculating the percentage of purple seedlings in the germination experiment with the two F 1 populations. After investigating the frequency of purple seedlings, we found that in the combination of Nipponbare (OsSYL2 CC ) 9 purple rice, the percentage of purple seedlings, which represents the outcrossing seed setting rate, was 9.47% (Figure 4). In the combination of Nipponbare (OsSYL2 AA ) 9 purple rice, the outcrossing seed setting rate was 15.18%, an extra 5.71% compared with the former combination.

Allele frequency distribution and stigma characteristic performance
To elucidate the allele types of OsSYL2 and OsSYL3 loci in Oryza rufipogon and Oryza nivara, we sequenced eight accessions of O. rufipogon and four accessions of O. nivara at 5.64-fold coverage. The information for 12 wild rice accessions is presented in Figure 5 and Table S9. The sequencing results showed that the alleles of both OsSYL2 and OsSYL3 loci were CC in both O. rufipogon and O. nivara, which was similar to OsSYL2 and OsSYL3 loci in japonica rice. We investigated the stigma characteristics of wild rice and found that STL was longer than SYL. The average SYL of japonica rice was 0.62 AE 0.12 mm, which was not significantly different from that of O. rufipogon (0.50 AE 0.08 mm) and O. nivara (0.53 AE 0.07 mm) (Figure 5). Therefore, we defined the CC alleles as wild type and the others as mutant alleles.
We investigated the regional differentiation of diverse alleles on OsSYL2 and OsSYL3 gene loci. The OsSYL2 AA mutant allele (longer SYL) was found to be mainly distributed in accessions collected from low-latitude regions, such as southern and central China, which is consistent with the grain length distribution of accessions collected in these areas ( Figure S9). A similar situation was observed for OsSYL3, in which the mutated allele OsSYL3 AA was mainly distributed in accessions collected from southern China and southeastern Asia ( Figure S9). These results suggested that the longer SYL accession with a mutated allele was naturally selected during the process of domesticating indica rice and that the shorter SYL accession with a wild-type allele was naturally retained during the domestication of japonica rice.

DISCUSSION
In this study we investigated the phenotypic data for three stigma characteristics in 353 rice accessions and identified a rich variety of phenotypic variances. The CV for the three traits ranged from 12.43% (TSSL in E4) to 19.35% (SYL in E2, E3, and E6) ( Table S2). The results of a joint variance analysis showed that variations in these three stigma characteristics were the main contribution to diverse genotypes, and no interactions between genotypes and environments were detected. These facts mentioned above provide a basis for the discovery of the favorable alleles of stigma characteristics. We also investigated the grain length of 353 rice accessions and calculated the Pearson's correlation coefficients (Table S9). The results showed that TSSL and SYL are positively correlated with grain length, which is consistent with the results reported by Virmani and Athwal (1973) and Zhou et al. (2017). In the comparison of stigma characteristics between cultivated rice plants and the wild rice plants O. rufipogon and O. nivara, we found that TSSL in the wild rice plants was longer than that in the cultivated rice plants ( Figure 5, Table S2), which provided a partial explanation for the higher outcrossing rate of wild rice than the cultivated rice plants. Therefore, increasing STL or SYL, or both, to lengthen TSSL is an effective strategy to enhance the outcrossing rate in F 1 hybrid rice seed production. We detected 41 SNP loci that were significantly associated with stigma characteristics; these were located in 34 LD regions (Table 1). Based on the information in the Gramene website (http://www.gramene.org/markers/) and the China Rice Data Center database (http://www.ricedata.c n/gene/), local LD regions harboring the 13 associated SNP sites were overlapped with the flanking regions of two QTLs (qSYL3 and qSYL6) and two genes (GL7 and GS3) reported previously (Uga et al., 2003;Fan et al., 2006;Wang et al., 2015;Zhou et al., 2017) (Table S10), and the remaining 28 associated SNP loci were newly identified in this study.
We identified two GWAS signals significantly associated with SYL to nearly single-gene resolution. Gene OsSYL3 coincided with the location of the grain size gene GS3. Takano-Kai et al. (2011) reported that GS3 could also increase STL. Zhou et al. (2017) confirmed that GS3 had a large effect on SYL and no significant effect on STL. In the present study we further confirmed that OsSYL3 (GS3) could regulate SYL. The allele OsSYL3 AA increased SYL and the allele OsSYL3 CC decreased SYL.
Gene OsSYL2 is a gene newly identified in this study. The full length of OsSYL2 is 602 bp, including one exon and no introns. Gene OsSYL2 encodes an 80-amino-acid protein. We have demonstrated that a base C-to-A non-synonymous mutation at nt 126 in the coding sequence of OsSYL2 caused the long SYL phenotype by qRT-PCR, the complementation test and evaluation of F 1 seed production potential in the paddy field. By searching the STRING website, we found that 10 proteins interacted with OsSYL2 (http://string-db.org/cgi/ network.pl?taskId=jf8rY202mAHU) ( Figure S10). Of these, five are jasmonate ZIM-domain (JAZ) proteins participating in jasmonic acid (JA) signal transduction, one is a member of the basic helix-loop-helix transcription factor family, one is a member of the protein kinase superfamily, and the other three are uncharacterized proteins. Considering that no putative conserved domains have been detected for OsSYL2, we speculated that the OsSYL2 gene may regulate SYL through participation in the JA signal transduction pathway. Yang et al. (2012) reported that there was an antagonistic effect between JA and gibberellic acid (GA). Gibberellic acid causes elongation of the cells, lengthening SYL, as reported by Zhang (2020). Therefore, we speculated that the OsSYL2 CC protein (containing histidine) interacted with the JAZ protein to inhibit GA, resulting in a shorter SYL. The OsSYL2 AA protein, containing asparagine, interacted with JAZ protein weakly or not at all, allowing us to deduce that the inhibition of GA was weakened or relieved and SYL was lengthened. These results provide the basis for the further functional research into the OsSYL2 gene.
In this study we found that neither the OsSYL2 AA allele nor the OsSYL3 AA allele, which increase SYL, are found in wild rice. This suggests that the genetic factors driving TSSL length and increased outcrossing rate in wild species are different from those mapped here. We speculated that the OsSYL2 AA allele or the OsSYL3 AA allele were naturally occurring mutations and were selected for during rice domestication.
We also found that SYL from indica rice (at low latitude) was longer than that of temperate japonica rice (at high latitude). Collectively, we inferred that there may be selective hitchhiking between SYL and rough rice grain length because the latter became shorter from indica to temperate japonica rice, which is a consequence of artificial selection. Thus, the accessions with the two alleles, OsSYL2 AA and OsSYL3 AA , can be used to increase SYL in the maintainer lines (pollen parents used for multiplying the cytoplasmic male sterile lines) of hybrid japonica rice by crossing and marker-assisted selection breeding methods.

EXPERIMENTAL PROCEDURES
Accession sampling, field planting, and phenotypic identification Accession sampling. To reflect geographical distribution (different latitudes) and phenotypic differences in stigma characteristics, we selected the 353 rice accessions, including 134 landraces from China, 206 modern improved cultivars, and 13 javanica accessions. The 206 modern improved cultivars were from China (170), Vietnam (21), Japan (7), the Philippines (6), Indonesia (1), and Malaysia (1). The 13 javanica accessions were from Indonesia (9) and China (4). Information regarding the accessions, including the variety name, country of origin, latitude, and longitude, is listed in Table S1.
Field planting. The seeds of 353 germplasms were collected, stored, and supplied by the State Key Laboratory of Crop Genetics and Germplasm Enhancement at Nanjing Agricultural University. All 353 accessions were grown during the normal season (May to October) across six different environments, over 3 years (2014)(2015)(2016) and two locations. The two locations were Nanjing (32°07 0 N, 118°64 0 E) in Jiangsu province and Yuanyang (35°05 0 N, 113°96 0 E) in Henan province. In each environment, the field trials were conducted with two replicates, using a completely randomized block design. Each plot contained 40 plants with five rows. The plants were spaced at 20 cm 9 17 cm and managed in accordance with routine agricultural management practices.
Phenotypic evaluation. In the rice growing season we investigated three rice stigma characteristics, that is, STL, SYL, and TSSL. At the full-bloom stage, 10 flowering spikelets were collected from the highest panicle on each individual plant and placed in an Eppendorf tube containing tap water. We used the tap water to keep the spikelets fresh and retain the original appearance of the pistil. Next, on the same day, the pistils were taken out from the glume and photographed under a stereomicroscope (109, MC50, Guangzhou Micro-shot Technology Co., Ltd, http://mshot.qianyan.biz/). The STL, SYL, and TSSL were measured with a micrometer (Figure 1a). For each replication, the average of 10 STLs (two stigmas per pistil) was used as the mean value for each plant. The average of 10 SYLs was used as the mean value for each plant and the average of 10 TSSLs was used as the mean value for each plant. Five plants were evaluated in each accession in each replicate.
Library construction and sequencing. For each of the 191 accessions to be sequenced, two blades from a single plant were collected at tillering stage (1 month after seedling transplanting) for extraction of genomic DNA using the standard cetyltrimethylammonium bromide protocol (Murray and Thompson, 1980). In accordance with the manufacturer's instructions (Illumina, https:// www.illumina.com/), 5 lg of genomic DNA from each accession was used to construct paired-end sequencing libraries, with insert sizes of approximately 350 bp. Paired-end 150 bp reads were obtained using the Illumina HiSeq X10 platform, and the raw sequences were further processed by removing the adaptor contamination and low-quality reads, yielding a total of 0.532 Tb of genomic sequence data, with an average of 5.48-fold genomic coverage for each of the 191 accessions. All library construction, sequencing, and sequence cleaning was performed by Mega Genomics-Beijing (http://www.megagenomics.cn/mobile.php/).
For the 162 sequenced accessions, the raw Illumina sequencing data generated by Chen et al. (2014) (151) and Huang et al. (2012) (11) were downloaded from the NCBI Sequence Read Archive using the accession number PRJNA171289 and from EBI European Nucleotide Archive using the accession number ERP000106, respectively. This information is listed in Table S1.
Annotation. The software snpEff (Cingolani et al., 2012) was used for SNP annotation of the Nipponbare genome sequence. Exonic regions, splicing sites (within 2 bp of a splicing junction), 5 0 -and 3 0 -UTRs, intronic regions, upstream and downstream regions (within a 5-kb region upstream or downstream from the transcription start site), and intergenic regions were categorized. The SNPs in the coding exons were of two types, synonymous and non-synonymous; the former does not cause amino acid changes whereas the latter does. The cases in which base substitutions cause a stop gain and stop loss are non-synonymous SNPs. Indels in the exonic regions were classified according to whether there were frameshift (3 bp insertion or deletion) mutations.

Population genetic analysis
Based on the SNP matrix of the 353 rice accessions, we calculated the simple matching coefficient for all SNPs as the genetic distance using the software SSAHA (Ning et al., 2001). The software PHYLIP 3.52 (Felsenstein, 1993) was used to construct the neighbor-joining tree, and MEGA 5.0 software (Tamura et al., 2011) was used to display the tree. We performed principal component analysis using the smartpca program of the gcta64 software (Price et al., 2006), and the first two principal components were plotted in two dimensions. The population structure was analyzed using STRUCTURE 2.3.4 software (Pritchard et al., 2000). We set the parameter for admixture and no linkage to run STRUCTURE 2.3.4 software with a burn-in of 50 000 replicates and 100 000 Markov chain Monte Carlo iterations. For each K value (K = 1-10), 10 repeats were performed. We analyzed the results by the EVANNO method with STRUCTURE HARVESTER (Earl and vonHoldt, 2012) and used CLUMPP (version 1.1.2) (Jakobsson and Rosenberg, 2007) to permute run clusters. If the mean log-likelihood value increased with an increase in K value, DK values were calculated to determine a suitable K value. DISTRUCT (Rosenberg, 2004) was used to plot the results. We determined the non-admixed individuals in each genetic subpopulation with Q-matrix assignment of above 0.85.
The LD was estimated using the r 2 value (Mather et al., 2007) for all inter-and intra-chromosomal SNP pairs and was calculated using PopLDdecay 3.40 software (Zhang et al., 2018), with default parameters. The LD decay was calculated by grouping the SNP pairs into 1-kb bins, and averaging the r 2 within bins; the average r 2 values were then used to plot the fitting curve line for the whole population, and the indica and japonica subpopulations. The LD heatmaps surrounding peaks in the GWAS were constructed using the R package 'LDheatmap' (Shin et al., 2006).
Across the rice genome, the nucleotide diversity (p) of indica rice and japonica rice in each 100 kb was calculated by the program ANGSD (version 0.613) (Korneliussen et al., 2014). We calculated the F ST between the indica subpopulation and the japonica subpopulation by VCFtools (version 0.1.12b) (Danecek et al., 2011) based on the Weir and Cockerham method (Weir and Cockerham, 1984).

Genome-wide association mapping
We carried out a GWAS on a total of 18 sets of phenotypic data (six environments 9 three traits) and the genotypic data of 1 326 094 common SNPs (MAF > 0.05) with a mixed linear model (MLM) program by GAPIT (version 2.12) (Lipka et al., 2012). The false discovery rate (FDR) was calculated for significant associations using the Benjamini and Hochberg (1995) correction method, with 1.0 9 10 À5 as the threshold. The Manhattan and quantilequantile plots were generated by the qqman (Turner, 2014) package in R. The candidate genes in these associated loci were identified by BLAST query of the Nipponbare genome to obtain coordinates and were confirmed as significantly associated with the corresponding traits.

Identification of the candidate genes in the GWASassociated loci
We took the following steps to narrow the candidate gene region. First, based on the associated loci identified, the candidate region was estimated by pairwise LD correlation (Shin et al., 2006). Second, according to the reference genome sequence of Nipponbare, the SNP types located in the candidate region were analyzed. We focused on the associated non-synonymous SNPs which could induce amino acid changes and were significantly associated with the traits in the GWAS result. Third, the different expression of candidate genes between three samples with shorter SYL and three samples with longer SYL was used to narrow the candidate genes. We then conducted the gene-based association analysis to classify the samples into distinct alleles. Finally, to confirm the causal SNPs per gene, the difference between phenotypes with distinct alleles was calculated and the significance was tested by a two-tailed Welch's t-test.
The RNA extraction procedure and quantitative real-time PCR (qRT-PCR) Using an ultrapure RNA kit (Omega Bio-tek, https://www.omegab iotek.com), total RNA was extracted from young panicles at development stages 5-8 (as per the criterion described by Itoh et al., 2005) and young pistils at stage 8, respectively, sampled from the six accessions (three accessions with shorter SYL and three accessions with longer SYL). We used RNase-free DNase I treatment to remove any contamination from genomic DNA and HiScript II Q RT SuperMix (Vazyme, http://www.vazyme.com) to perform the first-strand cDNA synthesis by reverse transcription from 1 µg of RNA. The 18S rRNA gene was used as an internal control. Realtime quantitative RT-PCR was conducted in a 96-well thermocycler (Roche Applied Science LightCycler 480, https://lifescience.roche.c om/) using SYBR Green (Vazyme). We set the following cycling conditions: first, denaturation, at 95°C for 5 min; followed by an amplification and quantification program, 40 cycles of 95°C for 10 sec, 60°C for 30 sec, and 72°C for 60 sec with a single fluorescence measurement; and third, the melting curve (60-95°C with a heating rate of 0.1°C sec -1 and continuous fluorescence measurement); and finally, a cooling step to 40°C. Three independent replicates were performed. The sequences of the primers used for qRT-PCR are listed in Table S11. Relative gene expression of the target gene was calculated following the equation: exp = 2 ÀDCt , where DCt = Ct target gene ÀCt 18S rRNA .

Generation of Os02g0733900 transgenic plants
The full-length genomic DNA of Os02g0733900 was PCR amplified from Xiangxiandao 10hao. The PCR product was cloned into the pBWA(V)HII vector. The primer sets used for PCR are listed in Table S11. The construct was introduced into Agrobacterium tumefaciens (EHA105) and transferred into Nipponbare. The corresponding empty vector was also transformed into Nipponbare as a control. Twenty-five independent T 1 seedlings obtained were grown in a paddy field under natural conditions. The T 2 seeds harvested from T 1 plants at the maturity stage were grown in the paddy field in the next rice growing season (May to October). At the tillering stage, the three allele genotypes (AA, AC, CC) on the Os02g0733900 locus were determined using the primers listed in  Evaluation of F 1 hybrid seed production potential for OsSYL2 in the paddy field To evaluate the potential of the OsSYL2 gene in F 1 hybrid rice seed production we selected Nipponbare (short stigma), transgenic complementary line with OsSYL2 AA (long stigma), as the female parent, and a purple rice accession as the male parent. The female and male lines were grown in a 4:2 row ratio. The male line was planted on both sides in four rows each, and the female material was planted in the middle in two rows. During the flowering stage, artificial supplementary pollination was performed twice per day. After 30 days, the seeds of the female lines were harvested. The potential of OsSYL2 for hybrid rice seed production was evaluated by detecting the percentage of purple seedlings in the germination experiment in the F 1 populations. Ten thousand 'F 1 ' seeds were sampled from each F 1 combination and sown in plastic plates (39 cm 9 39 cm 9 6 cm) to investigate the frequency of purple seedlings. Two replicates were conducted for each combination.

AUTHOR CONTRIBUTIONS
DH and XD designed the experiments and managed the project. XD, YZ, XC, ZF, QL, JJ, DL, YL, BF, ZW, EL, XH, SZ, DS, HW, YL, and SC conducted field planting, phenotyping, and DNA sampling. XD and YZ prepared DNA samples, qRT-PCR and transformation analysis, and performed the test evaluating the potential of the OsSYL2 gene in hybrid rice seed production. XD and YY performed the data analysis. XD and YY wrote the manuscript draft, which was revised by YW and DH.

CONFLICT OF INTEREST
The authors declare no competing financial interests.

DATA AVAILABILITY STATEMENT
Sequencing data that support the findings of this study have been deposited in the Sequence Read Archive (SRA), NCBI with the accession code PRJNA554986.

SUPPORTING INFORMATION
Additional Supporting Information may be found in the online version of this article. Figure S1. Phenotypic distribution of stigma length, style length and the sum of stigma and style length traits in the germplasm collection in six environments. Figure S2. Genetic diversity across 12 chromosomes. Figure S3. Population structure analysis of 353 rice accessions and the decay of linkage disequilibrium. Figure S4. Neighbor-joining tree with accession ID. Figure S5. Manhattan plots and quantile-quantile plots depicting the results of genome-wide association study for the stigma length trait using a mixed line model in the 353 cultivated rice accessions in each environment. Figure S6. Manhattan plots and quantile-quantile plots depicting the results of genome-wide association study for the style length trait using a mixed line model in the 353 cultivated rice accessions in each environment. Figure S7. Manhattan plots and quantile-quantile plots depicting the results of genome-wide association study for the sum of stigma and style length trait using a mixed line model in the 353 cultivated rice accessions in each environment. Figure S8. The morphology of young panicles in different development stages. Figure S9. The gene allele frequency differences at the causal polymorphisms of OsSYL2 and OsSYL3 in five geographic groups. Figure S10. Protein networks interacting with OsSYL2. Network nodes represent proteins and edges represent protein-protein association. Table S1. Names and origins of 353 rice accessions used for association mapping and the corresponding Q-values calculated by STRUCTURE software. Table S2. Basic statistics of the three stigma characteristics in each environment. Table S3. The results of joint analysis of variance for the three stigma characteristics. Table S4. The distribution of the significant association single-nucleotide polymorphism loci detected in the rice population composed of 353 accessions in the six environments. Table S5. The single-nucleotide polymorphism information in the 16.69-16.87 Mb candidate region for style length and the sum of stigma and style length traits. Table S6. Candidate gene annotation in the linkage disequilibrium region 16.69-16.87 Mb associated with style length and the sum of stigma and style length traits. Table S7. The single-nucleotide polymorphism information in the 30.45-30.65 Mb candidate region for style length trait. Table S8. Candidate gene annotation in the linkage disequilibrium region 30.45-30.65 Mb associated with style length trait. Table S9. Correlation coefficients between grain length trait and stigma characteristics. Table S10. The results of significantly associated single-nucleotide polymorphism loci detected in this study overlapped with the quantitative trait loci/genes of rice stigma characteristics reported previously. Table S11. Primers used in this study.