Resequencing of 301 ramie accessions identifies genetic loci and breeding selection for fibre yield traits

Summary Ramie is an important fibre‐producing crop in China; however, the genetic basis of its agronomic traits remains poorly understood. We produced a comprehensive map of genomic variation in ramie based on resequencing of 301 landraces and cultivars. Genetic analysis produced 129 signals significantly associated with six fibre yield‐related traits, and several genes were identified as candidate genes for respective traits. Furthermore, we found that natural variations in the promoter region of Bnt14G019616 were associated with extremely low fibre abundance, providing the first evidence for the role of pectin methylesterase in fibre growth of plants. Additionally, nucleotide diversity analysis revealed that breeding selection has been markedly focussed on chromosome 9 in which ~ 39.6% sequence underwent selection, where one gibberellin‐signalling‐repressed DELLA gene showed distinct selection signatures in the cultivars. This study provides insights into the genetic architecture and breeding history of fibre yield traits in ramie. Moreover, the identification of fibre yield‐related genetic loci and large‐scale genomic variation represent valuable resources for genomics‐assisted breeding of this crop.


Introduction
Bast fibres and silks are two of the most popular sources of textile fabric in Chinese history. Ramie, commonly known as China grass, is one of the most important bast fibre crops in ancient China, and its fibres have been used as textiles for over 4700 years, according to archaeological evidence (Liao et al., 2014). Presently, ramie is the second most important fibre crop in China, and its growth acreage and quantity of fibre production are second only to those of cotton (Liu et al., 2014a). Currently, ramie fibre is a popular natural fibre in the textile industry because of its ability to retain shape, reduce wrinkling and introduce a silky lustre to the appearance of a fabric (Kadolph and Langford, 2001).
Even though ramie fibres have been commonly used for thousands of years, the breeding history of ramie in China spans no more than 60 years as it began in the 1960s (Xiong, 2008). In recent years, despite the development of breeding studies, little progress has been made in improving fibre yield and quality. Focusing on the developmental process will improve ramie fibre traits; therefore, in the past ten years, hundreds of genes potentially associated with fibre development have been identified based on homology and/or expression analyses (Chen et al., 2014;Liu et al., , 2014bTang et al., 2019). However, none of these genes were confirmed to be involved in fibre development. Recent genetic studies identified some genetic loci associated with fibre traits Liu et al., 2014cLiu et al., , 2017Zeng et al., 2019); however, these loci showed limited benefits for breeding improvement based on marker-assisted selection because the examined population originates from only a few parent individuals, which provided only a limited number of excellent alleles.
Comprehensive genomic variation maps are essential for breeding studies, because of its plentiful elite alleles. Genomewide association studies (GWASs) based on whole-genomic variation maps are powerful tools for identifying genes or quantitative trait loci (QTLs) underlying complex traits, as was demonstrated in rice (Huang et al., 2010), maize (Tian et al., 2011), cotton (Du et al., 2018), peach  and melon . Here, we present a comprehensive genomic variation map of ramie by resequencing 301 landrace and cultivar accessions. Furthermore, to elucidate the genetic architecture of fibre yield traits, which is a major objective of this study, we performed a GWAS with respect to six fibre yield-related traits. We also characterized and compared population diversities of landraces and cultivars, thereby providing genomic insights into the breeding history of ramie.

Genomic variation
In total, 301 ramie accessions including 254 landraces, 13 cultivated materials and 34 improved cultivars were used in this study (Table S1). Sequencing of these 301 accessions using an Illumina HiSeq platform (Illumina, San Diego, CA, USA) generated 11.4 billion paired-end reads of 150 bp in length (1.7 Tb of sequences), at 19.5-fold coverage. After aligning the produced reads to the ramie reference genome (downloaded from NCBI, accession no. PRJNA663427), we identified 5 951 826 single nucleotide polymorphisms (SNPs) and 781 484 small insertions or deletions (indels; ≤5 bp), with an average of 22.0 SNPs and 2.9 indels per kb (Tables S2 and S3). Among these variants, 522 587 SNPs and 18 237 indels were within coding regions; 245 951 of these SNPs caused a change in amino acids in the encoded protein, and 11 315 indels led to gene transcriptional frameshifts (Tables S2 and S3), suggesting that they may affect the function of the respective gene. Taken together, these genomic variations provide a valuable resource for studies on biology and breeding in ramie.

Association analyses
The genetic structure of a population can markedly influence the power of a GWAS (Korte and Farlow, 2013), thus, before performing the GWAS, we inferred the population structure using a subset of 1 433 616 SNPs without close linkages. A Bayesian clustering analysis and a principal component analysis (PCA) indicated no distinct grouping in the populations (Figures S1 and S2), suggesting that our population was suitable for a GWAS. We also analysed the linkage disequilibrium (LD), and discovered that the LD decay reached half its maximum average correlation coefficient (r 2 ) at a distance of 0.65 kb in the landrace and of 1.3 kb in the cultivar ( Figure S3). The fibre yield of ramie is determined mainly by three stem bark traits (bark weight, bark thickness, and fibre output ratio) and two stem morphological traits (stem length and diameter). Phenotypic investigation indicated substantial variation among fibre yield traits and five associated traits in the population, with the coefficient of variation ranging from 16.1% to 45.3% (Table S4). These six traits showed significant correlations with each other in the three environments (P < 0.01), except for fibre output ratio and bark thickness in 2019 (Table S5). We then performed an association analysis for fibre yield traits and five related traits in this population. In total, 129 significant association signals involved in 70 genomic regions were detected, nine of which were associated with two traits (Figures  S4-S9, Tables S6 and S7). Twenty-one fibre yield-associated loci were identified, eight of which showed an overlap with loci of related traits (within 30 kb). Notably, the SNP on chromosome 10 (position 14,843,234) was strongly associated with the traits bark weight and fibre yield, and it showed the strongest signal associated with fibre yield (P = 2.4 9 10 À9 ) among all association SNPs.

Candidate genes of fibre yield and stem bark traits
We examined genes containing SNPs associated with traits and found that 58 fibre yield-associated SNPs were located in 21 genes (including the genic and upstream/downstream regions within 2 kb), and 47 SNPs associated with three stem bark traits (bark weight, bark thickness and fibre output ratio) occurred in 27 genes (Table S6). Because fibre yield and three stem bark traits are mainly determined by fibre growth in the bast bark, we subsequently investigated the expression profile of bark at different stages of fibre development to identify candidate genes. Our results revealed that, among those genes containing SNPs associated with these four traits, seven genes showed distinct expression differences in fibre-developing bast bark of stems, compared to stem bark that did not show initiation of fibre secondary wall growth (Table 1).
A peak strongly associated with bark thickness, identified on chromosome 9, was located in Bnt09G014694 ( Figure S10), which is an SBP-box transcription factor-encoding gene, and its expression was considerably downregulated in fibre-developing bast bark ( Figure S10). Similarly, expression of the cotton SBP transcript TC253516 was gradually reduced from the elongation stage to the stage of secondary wall thickening; TC253516 is a target of small RNA Gb-miRNA156/157, and suppression of the activity of miRNA156/157 not only caused SBP transcript accumulation but also resulted in inhibition of fibre elongation, indicating that the SBP transcript TC253516 negatively modulates initiation of production and elongation of cotton beer (Liu et al., 2014). Therefore, our results suggest that Bnt09G014694 is a candidate gene responsible for bark thickness.
Among the 14 signals associated with fibre output ratio, one was located in the promoter region of Bnt03G004997 (Figure 1a). Bnt03G004997 is the homologue of Arabidopsis VND4 and VND5, which are two master switches eliciting secondary wall biosynthesis ( Figure 1b; Zhou et al., 2014), and overexpression of Bnt03G004997 caused an increase in the number of fibres in transgenic Arabidopsis (Figure 1c). Additionally, the rice COBRA-Like gene Brittle Culm1 can modulate cellulose assembly by interacting with cellulose and affecting microfibril crystallinity; thus, this gene is critical for secondary wall cellulose biosynthesis (Liu et al., 2013a, b). We here identified an associated signallocated 265-bp downstream of the COBRA-like gene Bnt10G014913; expression analysis of Bnt10G014913 revealed a distinct expression difference between fibre-developing bast bark and stem bark whose fibres did not show initiation of secondary wall thickening ( Figure S11). Taken together, our results indicate that Bnt03G004997 and Bnt10G014913 are two candidate genes associated with fibre output ratio. One peak on chromosome 10 was strongly associated with both bark weight and fibre yield. This peak was produced by the peroxidase-encoding gene Bnt10G015830 (Figure 2a). Peroxidases play key roles in the biosynthesis of secondary walls and are involved in the polymerization of lignin (Herrero et al., 2013;Shigeto et al., 2013Shigeto et al., , 2014. Our expression analysis showed significant downregulation in the expression of Bnt10G015830 in fibre-developing bast bark ( Figure 2b). Notably, we identified a single-base insertion in the fourth exon of Bnt10G015830 in six accessions, which caused a frameshift change in the encoded protein ( Figure 2c). Phenotypic investigation showed that more stem bark and fibre could be harvested from these six mutants than from the others (Figure 2d). Therefore, our results suggested that Bnt10G015830 is a pleiotropic candidate gene responsible for bark weight and fibre yield traits.
Natural variation in Bnt14G019616 associated with extremely low fibre content We identified a germplasm, '1380', which showed extremely thin bark (~0.33 mm), low bark weight (~5.2 g/stem), and low fibre content (~0.64 g/stem), thereby causing brittleness of the stem (Figure 3a). To identify QTLs responsible for thin bark and low fibre content in '1380', we developed an F 2 segregation population derived from a crossing of '1380' and the elite cultivar Xiangzhu 1 and then produced a high-density genetic map ( Figure S12). Genetic mapping for bark thickness and fibre yield traits identified a major-effect QTL on chromosome 14 ( Figure 3b; Figure S13). We further mapped this QTL by resequencing 40 F 2 progenies with extremely thin stem bark (<0.28 mm), and delimited the QTL into a 168-kb region (from position 6 783 728 to 6 952 260 of chromosome 14; Figure 3c) Figure 1 Candidate gene for fibre output ratio trait on chromosome 5. (a) A significant signal was associated with fibre output ratio on chromosome 3, as assessed using EMMAX, and it was in Bnt03G004997. The graph shows genomic positions (x axis) and the respective significance expressed as -log10 Pvalue (y axis). The genomic position spans~20 kb on either side of the peak SNP, indicated by a grey dashed vertical line. The red diamond indicates the associated SNP, and the lighter colours of the remaining markers reflect their successively lower r 2 values (indicated in the legend on the top right). The yellow and purple boxes under the association graph indicate the coding and untranslated regions of annotated candidate genes, respectively. in two recombinant individuals. At this interval, we identified an SNP (position 6 934 389) that was strongly associated with the two traits bark weight and fibre yield ( Figure 3d). This associated SNP was found in Bnt14G019616, which is a pectin methylesterase-encoding gene.
Pectin is one of the main components of the plant primary cell wall, and pectin methylesterase is a key enzyme for catalysing the demethoxylation of pectin, thereby affecting the assembly of pectin for cell wall biosynthesis (Jolie et al., 2010). Notably, our expression analysis showed significantly lower abundance of Bnt14G019616 mRNA and protein levels in bast bark showing fibre growth (Figure 3e), indicating that Bnt14G019616 may exert negative effects on fibre development. We performed an overexpression analysis of Bnt14G019616 in Arabidopsis, which produced transgenic plants with low fibre content in the stems ( Figure S14). Moreover, investigation of pectin content in the bark of '1380' and the other nine cultivars revealed substantially more pectin in '1380' than in the other cultivars ( Figure 3f). We observed a significant negative correlation between pectin content and fibre yield in the bark of the ten investigated accessions (P = 0.0186, r 2 = À0.721). Taken together, our results strongly suggest that the pectin methylesterase gene Bnt14G019616 was a candidate gene affecting the fibre yield QTL mapped on chromosome 14.
We further investigated natural variation in Bnt14G019616 by comparing the gene sequences between '1380' and cultivar Xiangzhu 1 using Sanger sequencing. Our results revealed only three nonsynonymous mutations in the coding sequence region. However, substantial variations were identified in the promoter region of Bnt14G019616 in the '1380' genome, including four deletions and 148 SNPs (within~2.5 kb; Figure 3g; Figure S15). After comparing the two parents' expression levels of Bnt14G019616, we observed markedly more transcripts in the bark of '1380' than in that of Xiangzhu 1 (Figure 3h). Furthermore, Bnt14G019616 expression levels showed a significant positive correlation with the pectin content of bark (P = 0.0262, r 2 = 0.693), but a negative correlation with fibre yield (P = 0.0277, r 2 = À0.590), among ten investigated accessions ( Figure 3h). Therefore, our result indicated that natural variations from promoter region of Bnt14G019616 were likely associated with the extremely low fibre content in the bark of germplasm '1380.'

Breeding selection for increasing fibre yield
To acquire insights with respect to selection for increasing fibre yield during ramie breeding history, we scanned genomic regions that showed markedly decreased nucleotide diversity by comparing the cultivar group with the landrace group using 50-kb windows. We discovered 173 putative selection sweeps in ramie (i.e. with p l /p c value in the top 5%), covering 6.48% of the ramie genome (17.5 Mb) and comprising 1,128 predicted genes ( Figure 4a; Tables S8 and S9). The largest region reflecting selection (925 kb) was identified on chromosome 9. Notably, 40.5% (7.08 Mb) of the selection sweeps were identified on chromosome 9, accounting for 39.6% of the length of this chromosome; this indicates that chromosome 9 was frequently selected for the improvement of ramie during its breeding history.
Stem length is one of the most important factors determining fibre yield and has received substantial attention from breeders. We identified one stem length-associated signal that overlapped with the selective sweep on chromosome 9 ( Figure S16). Gibberellins (GAs) play key roles in promoting stem elongation (Yamaguchi, 2008), and we identified four gibberellin-related genes from the selection sweeps, including two gibberellin 2beta-dioxygenase genes (Bnt13G019022 and Bnt13G019023), one gibberellin receptor GID1C gene (Bnt02G003447), and one DELLA gene (Bnt09G013717). DELLA protein is a key regulator of plant growth because of its role in GA-signalling repression (Daviere et al., 2008). We investigated the allele frequency of the DELLA gene Bnt09G013717 in 301 germplasms and identified a total of eight haplotypes with respect to this gene (Figure 4b).
Our results revealed that four haplotypes appeared only in Figure 3 Candidate gene for fibre growth on chromosome 14. (a) Morphological comparisons of stems and stem bark between germplasm '1380' and the cultivar Xiangzhu 1. Because of abundant fibre in the bark, the stem of Xiangzhu 1 is difficult to break (outer), whereas germplasm '1380' has a brittle stem (inner). Additionally, the thickness of twenty barks is shown for germplasm '1380' (left) and cultivar Xiangzhu 1 (right). (b) A major-effect quantitative trait locus (QTL) for the bark thickness trait was identified on chromosome 14, based on an F 2 population derived from crossing '1380' with Xiangzhu 1. The broken line indicates the genome-wide significance LOD threshold. (c) Two recombinant individuals from 40 F 2 progenies with extremely thin stem bark delimited the QTL into a 168-kb region (from 6 783 728 to 6 952 260 nt). (d) The significant signal was associated with fibre with respect to the traits yield (upper) and bark weight (lower) on chromosome 14, as assessed using EMMAX, and it was in Bnt14G019616. The graph shows the genomic position (x axis) against its significance expressed as -log10 P-value (y axis). The genomic position spans~20 kb on either side of the peak SNP, indicated by a grey dashed vertical line. The red diamond indicates the associated SNP, and the lighter colours of the remaining markers reflect their successively lower r 2 values (indicated in the legend on the top right). The yellow and purple boxes between two association graphs indicates the coding and untranslated regions of the annotated candidate gene, respectively. landraces, including Hap3 and Hap7, and two haplotypes (Hap4 and Hap6) were present mainly in the cultivars. Only one variation (a C/T substitution at the 7 774 070th nucleotide site on chromosome 9) was identified between Hap7 and Hap4, and one (a C/G substitution at the 7 774 544th nucleotide site on chromosome 9) between Hap3 and Hap4, indicating that these two variations of Bnt09G013717 were frequently the focus of selection during ramie breeding history, thereby resulting in two selection signatures in this DELLA gene of the breeding cultivars.
The improved cultivar showed better performance regarding fibre yield and its related traits, compared to the landraces (Figure 4c; P < 0.0001). To understand the selection of genes underlying these agronomic traits during ramie improvement, expression levels of genes from the sweeps were investigated. Our results revealed that 156 genes, which had undergone putative selection, showed significant differences in the abundance of transcripts and/or peptides in fibre-developing bast bark (Tables S10 and S11). Fibber development mainly involves in the biosynthesis of the secondary wall, which is mediated by an NAC-MYB-based regulatory network (Zhong and Ye, 2015). Among these differentially expressed genes, one NAC gene (Bnt05G007257) and four MYB genes (Bnt04G006700, Bnt07G011845, Bnt09G014274 and Bnt09G014294) were identified (Figure 4d). Bnt05G007257 is an orthologue of Arabidopsis SND2, a key regulator of secondary wall biosynthesis (Figure 1b), and Bnt07G011845 is an orthologue of AtMYB103 that is involved in fine-tuning the transcriptional regulation of secondary wall biosynthesis (Figure 4e; Zhong et al., 2008). Moreover, among the signals associated with fibre yield and three bast bark traits, four were located in the sweeps (Figure 4a), indicating that these associated loci were subjected to selection during the improvement history. Bnt05G008345 is a peroxidase-encoding gene, near a signal associated with bark weight, which was located in a sweep on chromosome 5 ( Figure S17). Allele frequency investigation identified six haplotypes of Bnt05G008345 in the 301 germplasms, and only two of them Figure 4 Breeding selection for the fibre yield trait in ramie improvement. (a) Selection indicators in 14 chromosomes. In total, 173 regions with the top 5% of pw/pc values were considered as candidate domestication sweeps. Horizontal dashed lines indicate the genome-wide thresholds of the selection signals. Candidate genes and association signal for fibre growth identified in this study that overlapped with selective sweeps are marked. (b) Haplotype type for the DELLA gene Bnt09G013717, and these haplotypes' network relationship and their distribution in landraces and cultivars. The short line in the line linked two haplotypes indicates the number of variations between them, and the green and blue part in each circle indicates the ratio of landrace and cultivar that carry the respective haplotype. (c) Phenotypic comparison between landrace and cultivar groups revealed a significant difference in six fibre yield-related traits (P < 0.0001). The blue and orange box indicate the trait value of landrace and cultivar group, respectively (d) A heatmap for five genes, which had potentially undergone selection showed distinct differential expression in the fibre development barks (MPS). (e) Phylogenetic relationships showed that Bnt07G011845 is an orthologue of Arabidopsis fibre developmental regulator AtMYB103.  (Figure 5), further suggesting that this gene had undergone selection for improvement. Expression analysis showed a distinct decrease in the abundance of transcripts and peptides in fibre-developing bark ( Figure S18). Because peroxidases play a role in the biosynthesis of secondary walls (Herrero et al., 2013;Shigeto et al., 2013Shigeto et al., , 2014, we deduced that Bnt05G008345 is a candidate gene for bark weight and that it had undergone selection for ramie improvement.

Discussion
In this study, we characterized the population structure and elucidated the genetic architecture associated with fibre yield and its related traits in ramie based on whole genome variations. Our population analysis revealed that the LD of ramie decayed to half-maximum within only 0.65-1.3 kb, which is lower than that in other small LD-extent species reported previously, such as tea (5 kb; Xia et al., 2020) and wild European grapevines (2.9 kb; Liang et al., 2019). Therefore, a comprehensive map of genomic variations is essential to identify more markers/genes associated with traits from the GWAS analysis in ramie. Two previous studies performed association analyses of agronomic traits using limited single sequence repeats and SNPs from specific-locus amplified fragment sequencing technology in ramie Luan et al., 2017), and only few markers have been identified to be associated with traits at a very low statistical level (P < 0.01 and P < 0.05, respectively). The current study showed large-scale variations in the ramie genome (~24.9 markers/kb), which constitutes a comprehensive genomic variation map for this crop. Identification of these variations facilitates comprehensive association analyses of agronomic traits in ramie, we, therefore, identified hundreds of SNPs associated with fibre yield-related traits and identified several genes as candidates for respective traits. Fibre yield was significantly correlated with its related traits, especially with bark weight (r 2 = 0.875-0.911). Notably, genetic analysis showed that 8 of 21 fibre yield-associated genomic regions showed a pleiotropy. This result indicated that these pleiotropic regions represent an important genetic basis for the correlation of fibre yield and its related traits. Taken together, the results of the present study provide new insights into the genetic architecture of fibre yield of ramie, and genome-wide variations are valuable resources for future breeding studies on this fibre crop.
Pectin is one of the main components of the primary cell wall of plants, and pectin secreted from the Golgi complex occurs in highly methylesterified forms. For cell wall biosynthesis, these forms must be further modified by pectinases such as pectin methylesterases, which catalyse demethylesterification of acidic pectin and methanol (Micheli, 2001). Therefore, pectin methylesterases are essential for cell wall growth. Fibres are cells with a thickened secondary cellular wall, and the role of pectin methylesterase in the biosynthesis of secondary cellular walls is largely unknown. The results of the current study strongly suggest that the pectin methylesterase-encoding gene Bnt14G019616 was negatively associated with fibre growth in bast bark of ramie, thereby providing the first evidence for the role of pectin methylesterase in bast fibre growth in plants. However, due to the lack of a practical and highly efficient method of genetic transformation of ramie, it is challenging to verify the function of Bnt14G019616 in ramie. The exact mechanism of fibre growthinhibition by Bnt14G019616 thus requires further clarification.
A large number of landraces has been generated in China throughout the cultivation history over thousands of years (Ni et al., 2018). Since the 1960s, ramie breeding studies have been performed in China using elite landraces as breeding materials, which yielded dozens of cultivars (Xiong, 2008). The genomic basis underlying the selection of traits during the breeding history of ramie is poorly understood. This study discovered evidence for substantially focussed selection on chromosome 9 (including a DELLA gene) by breeders. DELLA protein acts as a negative regulator of GA responses and plays a crucial role in determining plant height (Sun, 2010). There were two DELLA genes in the ramie genome (Bnt04G005980 and Bnt09G013717), one of which, Bnt09G013717, had undergone selection for improvement of ramie. It is likely that the focus of Bnt09G013717 is based on its contribution to stem length and that this gene was, therefore, selected by breeders. Although five genomic regions associated with fibre yield-related traits were identified, most of the fibre yield-related loci had not undergone selection in ramie breeding. This is likely due to its short breeding history.
Exploring and utilizing favourable alleles is an effective method for improving crop traits (Guo et al., 2020). Substantial allelic variation was identified in ramie germplasms with respect to the fibre yield-associated candidate genes identified in the current study, and different haplotypes showed considerable differences in fibre yield ( Figure 5). However, our study revealed that most haplotypes of these genes commonly occurred in cultivated accessions, indicating that they were rarely selected during breeding. This is probably due to the short history of ramie breeding. Notably, some favourable alleles such as Hap_6 rarely occurred in the cultivars, suggesting large potential for the improvement of fibre yield in this crop. Therefore, pyramiding these favourable alleles may, in theory, substantially improve the fibre yield of ramie. Taken together, this study not only provides insights into the genomic basis underlying fibre yield improvement in ramie breeding history but also provides an important basis for further improvement of traits in the future breeding of this fibre crop.

Experimental materials and phenotype collecting
In total, 301 ramie accessions were collected and grown on the experimental farm of the Institute of Bast Fiber Crops (IBFC) in 2016. Information on the accessions, including names, collection sites, categories, and resequencing data summary, is provided in Table S1. For each accession, two replications with six individuals were planted, according to Zeng et al. (2019). Fibber yield-related traits were investigated each year in June, from 2018 to 2020. Stem length was measured from the bottom of the stem to the shoot. Stem diameter and bark thickness were measured in the middle of the stem using a Vernier calliper. Subsequently, stem bark was harvested individually and was weighed; bark weight was calculated as mean bark weight per stem. Fibres were extracted from these stem barks and were dried, and fibre yield was calculated as the mean weight of fibre per stem. The fibre output ratio was estimated for each accession as the ratio of fibre yield to bark weight. Pectin content in stem bark of ten accessions was assessed according to the following method: 5 g dry sample was placed in 150 mL ammonium oxalate solution with a concentration of 5 g/L, which was boiled for 3 h. The sample was then removed from the solution and was dried and weighed; the dry weight divided by total weight (i.e. 5 g) was considered the pectin content.

Resequencing and variation calling
Young leaves of all 301 accessions were individually sampled, and DNA was extracted. The DNA extracts were separately used to construct Illumina sequencing libraries with~350-bp inserts, according to the manufacturer's instructions. Paired-end sequencing of the libraries was performed using an Illumina HiSeq X Ten platform (Illumina). Raw sequencing reads were filtered to produce clean reads, which were aligned to the ramie genome (downloaded from NCBI; accession no. PRJNA663427) using BWA (version 0.7.15; Li and Durbin, 2010) and allowing no more than 4% mismatches and one gap. The alignment results were converted to bam format using SAMtools (version: 1.9; Li et al., 2009), and duplicated reads were removed using the Picard package. GATK (version 3.8.1;McKenna et al., 2010) was used to identify SNPs and small indels. Briefly, GATK local realignment was performed to refine the read mapping in the presence of the variants, thereby generating a gVCF file. Thereafter, SNP calling was performed using GenotypeGVCFs software, and SNPs were filtered out using the following parameters: call quality divided by depth (QD), 2.

Population structure and LD analysis
All SNPs from ramie accessions were filtered using the PLINK software (version: 1.9; parameters -indep-pairwise 50 5 0.5;  Purcell et al., 2007), generating a subset of SNPs without close linkages. Investigation of population structures was carried out using ADMIXTURE software (version: 1.3.0; Alexander et al., 2009), and 10 times for each K value was run with varying random seeds; the Q-matrices were aligned using pong software (Behr et al., 2016) and were clustered based on similarity. Thereafter, the matrices from the largest cluster were averaged to produce a final matrix of admixture proportions. A PCA was performed using the smartPCA program of EIGENSOFT software (version: 6.1.4; Price et al., 2006). To measure LD levels in groups of landraces and the bred cultivars, r 2 of each allele was estimated using PopLDdecay (version: 3.34; Zhang et al., 2019) with the following parameters: -MAF 0.05 -Miss 0.2 -MaxDist 300. Average r 2 values were calculated for each distance.

GWAS
All high-quality SNPs (MAF ≥ 0.05, missing rate ≤0.2) identified from 301 accessions were used to perform the GWAS. The efficient mixed model association expedited (EMMAX) program (Kang et al., 2010) was used to carry out the GWAS analysis for the six traits in the three environments. Population stratification and hidden relatedness were determined using a kinship (K) matrix generated using the FaST-LMM program (Lippert et al., 2011). The P-value thresholds for suggestive associations were set to 10 À6 .

Identification of domestication sweeps
To detect genomic regions that had potentially undergone selection during ramie breeding, we calculated the nucleotide diversity (p) in the landrace and bred cultivar groups and then compared them, based on 50-kb sliding windows in 25-kb steps, using VCFtools (version: 0.17; Danecek et al., 2011). Windows with the top 5% of the p ratios (p l /p c > 1.318) were selected and were merged as candidate selective-sweep regions.

QTL analysis from the F 2 population
To develop a segregated population, a crossing between germplasm '1380' and the elite cultivar Xiangzhu 1 was produced, resulting in a population comprising 528 F 2 progenies. All F 2 progenies and two parents were grown in the field of IBFC in April, 2019, and bark thickness and weight of all individuals were investigated in June 2020, as described above. A subpopulation comprising 88 F 2 progenies was randomly chosen, and 40 F 2 individuals with extremely thin bark (<0.28 mm) were selected from the subpopulation. These 128 progenies and two parents underwent resequencing using genotyping-by-sequencing (GBS) technology. Briefly, GBS libraries of each plant were constructed using the restriction enzymes PstI-HF and MspI, according to the manual of the GBS Kit, and the GBS libraries were subjected to Illumina sequencing. After filtering, clean reads of F 2 progenies and two parents were aligned with the ramie genome (NCBI accession number PRJNA663427) using bowtie2 (Langmead and Salzberg, 2012). Each plant's SNPs were called using Haplo-typeCaller in GATK (version 3.8.1;McKenna et al., 2010), and those of different individuals were merged using the GATK GenotypeGVCFs program. Low-quality SNPs were removed using the following parameters: MAF > 0.01, DP > 4, and Miss < 20.0. In total, 22 321 high-quality SNPs were obtained from these F 2 progenies, 6035 of which were homozygote in two parents (aa 9 bb), which were used for constructing a genetic map of the subpopulation ( Figure S12) using joinmap5 (Van Ooijen, 2006). QTL analysis of traits in the population consisting of 88 F 2 lines was performed using MapQTL6 (Van Ooijen, 2011), and the experimental LOD threshold significance level was determined by computing 1,000 permutations (P < 0.05). To delimit QTL to an exact genome region, recombinant individuals in the QTL interval were screened from the 40 progenies with extremely thin bark, and the region in which SNPs with the '1380' allele showed cosegregation with genes responsible for thin bark traits was determined based on the genotypes of recombinant individuals.

Expression analysis
Bark tissue samples from the top and middle parts of the stems, where the secondary walls of fibres do not initiate growth or thickening (Chen et al., 2014), were collected from the elite variety Zhongzhu 1 and were used to analyse expression differences of genes/proteins, thereby identifying fibre growthrelated genes and proteins. Transcriptome comparison between the two tissue types was performed in our previous study, and 1758 differentially expressed genes were identified (Wang et al., 2021). To detect differentially expressed proteins between them, the two tissue types (using three replicates of each) were subjected to proteome analysis. Briefly, protein extraction was carried out using a Tris-saturated phenol method, followed by digestion and labelling using tandem mass tags (TMT). After fractionation, the peptides were subjected to proteomic analysis using an LC-MS/MS system. The MS/MS data of the proteome analysis were processed using MaxQuant software (v.1.5.2.8; Tyanova et al., 2016). Tandem mass spectra were searched against the protein sequences annotated according to the ramie genome. Subsequently, normalization of protein abundance was performed by TMT, and proteins with more than 1.5-fold change were considered significantly differentially expressed (at P < 0.001). To compare the expression levels of Bnt14G019616 in stem barks of ten accessions, bark samples from stem of 50 day-old ramie were collected, and RNA was extracted. After reverse transcription, qRT-PCR of Bnt14G019616 was performed using the 18S ribosomal RNA gene as an internal control (primer sequences are listed in Table S12). The relative expression levels were determined as previously described (Livak and Schmittgen, 2001).

Functional characterization of candidate genes
To characterize the function of candidate genes, amplification of its full-length cDNA was carried out using a high-fidelity thermostable DNA polymerase (primer pair listed in Table S12). The amplified cDNA was ligated to the downstream CaMV 35S promoter in a PBI121 vector. The produced vector was introduced into an Agrobacterium tumefaciens GV3101 strain using the heat shock method, and the resulting Agrobacterium strain was infiltrated into Arabidopsis using through floral dipping (Zhang et al., 2006). The middle parts of stems of transgenic and wildtype plants were used for cell observation. Briefly, transverse sections (~13 lm thick) of stems were prepared from 40-days-old plants, were stained using Safranin O-Fast Green resulting in red coloration of lignified cell walls, and were then observed using light microscopy.

Gene sequence analysis and comparison
To compare the sequences of Bnt14G019616 between '1380' and Xiangzhu 1, the genome sequences of the genic and promoter regions were amplified from these two accessions using six primer pairs (Table S12). All amplified fragments were ligated into plasmids through TA cloning, and Sanger sequencing was performed. Sequences were compared using the SnapGene program. The haplotype type and statistics of candidate genes in different populations were reconstructed using the R packages pegas (Paradis, 2010) and VcfR (Knaus and Gr€ unwald, 2017), and a haplotype network of Bnt09G013717 was constructed using popart (Leigh and Bryant, 2015). In total, 13 NAC proteins and 16 MYB proteins have previously been identified to be involved in the biosynthesis of secondary cellular walls in Arabidopsis (Zhong and Ye, 2015). To discover the homologs of our candidates from Arabidopsis fibre growth-related MYB/NAC proteins, a protein alignment was carried out using the Clustal program (Thompson et al., 1997), and an unrooted phylogenetic tree was constructed using MEGA 5 with the neighbour-joining method and a bootstrap test with 1000 replicates (Tamura et al., 2011).

Supporting information
Additional supporting information may be found online in the Supporting Information section at the end of the article.

Figure S1
Principal component analysis plots of the first three components of 301 landrace and cultivar accessions. Figure S2 Model-based clustering analysis with different cluster numbers (k = 2-7). The y axis quantifies cluster membership, and the x axis lists the different accessions. Figure S3 Decay of linkage disequilibrium measured by r 2 for the landrace and cultivar group. Figure S4 Whole genome association analysis of the fibre yield trait in 301 accessions in three environments. Figure S5 Whole genome association analysis of the bark weight trait in 301 accessions in three environments.. Figure S6 Whole genome association analysis of the fibre output ratio trait in 301 accessions in three environments. Figure S7 Whole genome association analysis of the bark thickness trait in 301 accessions in three environments. Figure S8 Whole genome association analysis of the stem length trait in 301 accessions in three environments. Figure S9 Whole genome association analysis of the stem diameter trait in 301 accessions in three environments. Figure S10 The significant signal was associated with bark thickness on chromosome 9 using the EMMAX, and it was in Bnt09G014694. The graph shows the genomic position (x axis) against its significance expressed as -log10 P-value (y axis). The genomic position spans~40 kb on either side of the peak SNP, indicated by a grey dashed vertical line. The red diamond indicates the associated SNP, and the lighter colours of the remaining markers indicate their successively lower r 2 values (indicated in the legend on the top right). The yellow and purple boxes under the association graph indicate the coding and untranslated region of annotated candidate gene. Figure S11 (a) Significant signal was associated with fibre output ratio on chromosome 10 using the EMMAX, and it was located into the downstream of Bnt10G014913, with a 265-bp distance. The graph plots genomic position (x axis) against its significance expressed as -log10 P-value (y axis). Genomic position spans 20 kb on either side of the peak SNP, indicated by a grey dashed vertical line. The red diamond indicates the associated SNP, and the lighter colours of the remaining markers reflect their successively lower r 2 values (indicated in the legend on the top right). The yellow boxes under the association graph represented the coding region of annotated candidate gene. (b) Expression of Bnt10G014913 in the bark from the top part of the stem (TPS) and the middle part of the stem (MPS), where the fibres do not initiate growth or are growing, respectively. FPKM, fragments per kilobase per million reads. Figure S12 Genotyping-by-sequencing-based high-density genetic map based on the ramie F 2 population derived from the crossing of '1380' and Xiangzhu 1. Figure S13 Quantitative trait locus (QTL) analysis for fibre yield trait using a F 2 population derived from the cross of '1380' and Xiangzhu 1. A major-effect QTL was identified on chromosome 14. The broken line indicates the genome-wide significance LOD threshold. Figure S14 Microscopic examination of the transected stems of Bnt14G019616-overexpressing (OE) and wild Arabidopsis (WT), at 100-fold magnification. The red arrow indicates fibre cells, and the scale bar indicates 200 lm. Figure S15 Three deletion in the promoter region of Bnt14G019616 in '1380' were identified at the 561 bp, 681 bp and 778 bp site of gene upstream, from Sanger sequencing. Figure S16 Association signal for stem length trait identified in this studythatoverlappedwithselectivesweepsonchromosome9.
Figure S17 Significant signal was associated with the bark weight trait on chromosome 5 using the EMMAX, and it was located into the upstream of Bnt05G008345, with a 5466-bp distance. The graph plots genomic position (x axis) against its significance expressed as -log10 P-value (y axis). Genomic position spans 50 kb on either side of the peak SNP, indicated by a grey dashed vertical line. The red diamond indicates the associated SNP, and the lighter colours of the remaining markers reflect their successively lower r 2 values (indicated in the legend on the top right). The purple boxes under the association graph represented the coding region of annotated candidate gene. Figure S18 Expression of mRNA (left) and protein (right) for Bnt05G008345 in the bark from the top part of the stem (TPS) and the middle part of the stem (MPS), where the fibres do not initiate growth or are growing, respectively. The y axis in the left and right graph represented the FPKM value (i.e. fragments per kilobase per million reads) and normalized protein abundance.  Table S7 Genomic regions associated with fibre yield and its related traits in ramie Table S8 Genomic region with top 5% p w /p c value Table S9 Genes located in the sweep region Table S10 Genes from the sweeps with expressed difference between the barks of the top part of the stem (TPS) and the middle part of the stem (MPS) Table S11 Genes from the sweeps with differential protein expression between bark of the top part of the stem (TPS) and the middle part of the stem (MPS)