High‐density genetic variation maps reveal the correlation between asymmetric interspecific introgressions and improvement of agronomic traits in Upland and Pima cotton varieties developed in Xinjiang, China

SUMMARY The two new world tetraploid cottons, Gossypium hirsutum and Gossypium barbadense, are cultivated worldwide and are characterised by a high yield and superior fibre quality, respectively. Historical genetic introgression has been reported between them; however, the existence of introgression and its genetic effects on agronomic traits remain unclear with regard to independent breeding of G. hirsutum (Upland cotton) and G. barbadense (Pima cotton) elite cultivars. We collected 159 G. hirsutum and 70 G. barbadense cultivars developed in Xinjiang, China, along with 30 semi‐wild accessions of G. hirsutum, to perform interspecific introgression tests, intraspecific selection analyses and genome‐wide association studies (GWAS) with fibre quality and yield component traits in multiple environments. In total, we identified seven interspecific introgression events and 52 selective sweep loci in G. hirsutum, as well as 17 interspecific introgression events and 19 selective sweep loci in G. barbadense. Correlation tests between agronomic traits and introgressions showed that introgression loci were mutually beneficial for the improvement of fibre quality and yield traits in both species. In addition, the phenotypic effects of four interspecific introgression events could be detected by intraspecific GWAS, with Gb_INT13 significantly improving fibre yield in G. barbadense. The present study describes the landscape of genetic introgression and selection between the two species, and highlights the genetic effects of introgression among populations, which can be used for future improvement of fibre yield and quality in G. barbadense and G. hirsutum, respectively.


INTRODUCTION
Six new world tetraploid cottons are derived from a single hybridisation between the A diploid and D diploid genome species (Wendel and Grover, 2015). Two of these species, Gossypium hirsutum and Gossypium barbadense, have been domesticated and cultivated in the world. Abundant germplasm resources for these species have been collected and applied globally (Campbell et al., 2010). G. hirsutum (Upland cotton) is superior in terms of yield and accounts for more than 95% of world cotton production, whereas G. barbadense (Pima cotton) is superior for its fibre quality and accounts for about 2% of cotton production (Chen et al., 2007). In addition, these two species differ with respect to other traits such as resistance to Verticillium wilt (Zhao et al., 2018;Li et al., 2019) and Fusarium wilt (Ulloa et al., 2016). To take full advantage of natural genetic variations within gene pools, introducing beneficial alleles through interspecific hybridisation has been a strategy for broadening the genetic basis of cultivated cotton (Shim et al., 2018). This strategy involves the invasion of a foreign genetic fragment into the host genome and is known as introgression.
During the modern breeding and research process, manmade introgression lines derived from crosses between G. hirsutum and G. barbadense have improved fibre quality and resistance to certain pathogens (Fang et al., 2014;Song et al., 2017). In addition, natural and ancestral introgressions were reported between G. hirsutum and G. barbadense as early as 1990 and 1992, respectively (Percy and Wendel, 1990;Wendel et al., 1992). Brubaker et al. (1993) documented asymmetry and bidirection in cytoplasmic introgression between G. hirsutum and G. barbadense. These introgression events have also been detected in modern cotton cultivars (Wang et al., 2015;Fang et al., 2017a). However, the genetic effects of introgression on agronomic traits have not been well studied in modern cotton cultivars. China is one of the major cotton producing countries. In China, varieties grown in early period  were directly introduced from foreign countries, whereas independent breeding started in 1960s. During this independent breeding era, man-made crosses and selection have been a driving force for improving fibre quality and resistant traits by retaining favourable alleles of important loci. Thus, there is a need to understand the existence and genetic effects of selection and mutual introgression signatures in cultivars between or within G. hirsutum and G. barbadense.
The genetic study of populations is a main approach for analysing crop evolution and domestication (Schreiber et al., 2018). The published cotton reference genomes of G. hirsutum and G. barbadense (Hu et al., 2019;Wang et al., 2019) have significantly contributed to our knowledge of cotton population genetics, and have allowed for the dissection of complex traits, in addition to uncovering domestication signatures and large-scale genetic variations (Wang et al., 2015;Wang et al., 2017;Ma et al., 2018;Wen et al., 2018;Wen et al., 2019). Introgression is common between species and is often related to adaptive traits (Whitney et al., 2006;Clarkson et al., 2014;Racimo et al., 2015;Akpertey et al., 2018). Detection of naturally mutual introgression is important for uncovering population history and structure (Reich et al., 2009;Fontaine et al., 2015) and multiple methods have been developed to achieve this, such as TREE-MIX software for inferring the population splits and mixtures (Pickrell and Pritchard, 2012;Karlsson et al., 2014;Martin et al., 2015;Pease and Hahn, 2015;Elworth et al., 2018).
In the present study, we collected an interspecific panel consisting of 159 G. hirsutum and 70 G. barbadense cultivars developed in Xinjiang, the biggest cotton growing region in China, and the cultivars were planted in multiple environments to collect phenotypic data. We also included 30 G. hirsutum races (e.g. yucatanensis, richmondi, morrilli, etc.) in the study to determine (i) selection loci for fibre traits and a collinear relationship between G. hirsutum and G. barbadense cultivars; (ii) the existence and genomic location of introgression in modern cultivars; and (iii) the genetic and phenotypic effects of introgression loci on fibre traits during independent breeding in modern cultivars.

RESULTS
High-density genetic variation map and population structure in the interspecific panel In the interspecific panel, 159 G. hirsutum cultivars (hereafter defined as Gh) and 70 G. barbadense cultivars (hereafter defined as Gb) from introduced and independent breeding cultivars in Xinjiang, China ( Figure S1; Table S1) were re-sequenced with 109 genomic coverage. Thirty G. hirsutum races (hereafter defined as Gh-race) were collected from published data (Fang et al., 2017a) (Table S1). A high-density and -quality genetic variation map was then constructed from 6 318 993 single nucleotide polymorphisms (SNPs) in the interspecific panel, and a total of 1 034 682 SNPs in Gh, 1 507 705 SNPs in Gh and Gh-race, and 3 465 402 fixed interspecific SNPs between Gh and Gb were identified (Table 1), which suggests extreme interspecific differentiation. The high-density genetic variation map in four levels (fixed interspecific SNPs, Gh&Gb SNPs, Gb SNPs and Gh SNPs) indicated that the genetic variations were not uniformly distributed across chromosomes and regions ( Figure 1). For example, the fixed interspecific SNPs on chromosome A01 showed the lowest density, whereas chromosome D02 showed the highest density ( Figure 1b; Table 1).
To demonstrate genetic relationships among three groups of cotton accessions (Gh, Gb and Gh-race) (Table S1), population structure analyses were performed. Principal component analysis showed that the first eigenvector (PC1) clearly separated Gb from Gh and Gh-race, while the second eigenvector (PC2) distinguished Gh from Gh-race ( Figure 2a). Two distinct branches of the phylogenetic tree support the hypothesis that G. hirsutum and G. barbadense species have been remarkably diversified over the course of evolutionary history (Figure 2b). The population structure collaboratively indicated that a significant population stratification exists between G. hirsutum and G. barbadense species (K = 2), and the Gh, Gh-race and Gb groups were generally distinguished from each other (K = 3) ( Figure 2c). Interestingly, Figure 2(c) shows that the population structure between G. hirsutum and G. barbadense species is slightly mixed, which may be a result of introgression.

Estimation of interspecific differentiation and introgression
To explain the uneven distribution of SNPs in the interspecific panel ( Figure 1) and precisely estimate interspecific differentiation, genome-wide population divergence with Gb, Gh and Gh-race groups was tested. The fixation index (F ST ) showed that a notable interspecific differentiation (F ST = 0.911) existed between the Gb and Gh groups, whereas the intraspecific differentiation (F ST = 0.209) between Gh and Gh-race groups was lower. A lower interspecific differentiation between Gb and Gh-race (F ST = 0.859) was also observed compared to Gb and Gh differentiation (F ST = 0.911) ( Figure 3a). Phenotypic data in multiple environments also collectively revealed a significant interspecific difference (Table S2) for nine fibre traits (LP, lint percentage; SCW, seed cotton weight; LW, lint weight; FL, fibre length; FS, fibre strength; MV, micronaire value; FU, fibre uniformity; SFC, short fibre content; FE, fibre elongation) in six environments. Analysing with BLUP (Best Linear Unbiased Prediction) phenotype data revealed that the Gh group had a significantly higher fibre yield (P < 0.01) compared to the Gb group in traits of LP, SCW and LW; however, the Gb group had significantly better fibre quality (P < 0.01) than the Gh group in traits of FL, FS, FU and SFC ( Figure S2). In general, the correlation and heritability of most fibre traits showed a higher value in the Gh and Gb panel than in the Gh or Gb panel and this may have resulted from the dramatic interspecific difference of phenotype; in the Gh and Gb panels, the correlation indicated that FL remains stable in different environments and FL has a higher heritability, whereas LP has a relative low correlation and heritability (Table S3). The fixation index at the chromosome level revealed an accurate intraspecific and interspecific differentiation map ( Figure S3). Additionally, most SNPs harboured a high F ST value between Gh and Gb, although abundant loci with relatively lower differentiation also existed on the chromosomes ( Figure S3b).
To determine whether the lower differentiation loci in the interspecific map belonged to introgressed or undifferentiated loci, 'three population statistics' and phylogenetic distance analysis were performed using a 1-Mb chromosome scale. The results showed that history introgression events occurred between the Gh and Gb groups (f 3 = À0.006 and Z score = À73.3585) (Table S4), providing evidence that interspecific introgression has been retained in modern breeding cultivars. A genome-wide scan of introgression with a phylogenetic tree analysis indicated that a total of seven introgression events, including 10 introgression loci, were detected in G. hirsutum cultivars. In addition, 17 introgression events, including 32 introgression loci, were detected in G. barbadense cultivars (Figure 3b; Table S5), which demonstrates that asymmetrical introgression events flowed more frequently from G. hirsutum to G. barbadense than from G. barbadense to G. hirsutum. The phylogenetic trees can clearly distinguish the non-introgression ( Figure S4a) or introgression region ( Figure S4b) between Gh and Gb, and it can be observed that Gb accessions were mixed in Gh branch in A06~87 Mb ( Figure S4b). Interestingly, a bidirectional introgression locus (Gh_INT1 and Gb_INT1) was found on chromosome A01 (Figure 3b), indicating that G. hirsutum and G. barbadense cultivars mixed in both branches of the phylogenetic tree ( Figure S4c). This bidirectional introgression region showed a lower population differentiation (Figure S3b). In the Gh_INT1 event, differentiation tests between introgression and non-introgression groups showed that one stable effective locus existed in this region; for Gb_INT1, differentiation tests showed that three stable effective loci existed in this region (Table S6). Introgression analyses indicated that 87 of 159 G. hirsutum cultivars harboured genomic fragments from G. barbadense species, although all 70 G. barbadense cultivars harboured uneven numbers of genomic fragments from G. hirsutum species. Both core Gh breeding parent cultivars (e.g. Y-231 and Y-232) and core Gb breeding parent cultivars (e.g. Y-217, Y-218 and Y-219) had introgression events, which suggests that most of introgression events occurred before the Xinjiang independent breeding era. Interestingly, the fixed value (F ST ) between Gh and Gb was increased after excluding the genetic introgression loci (Figure 3a), suggesting that introgression decreased population divergence.
To determine whether introgression has a genetic effect on the nine fibre traits, we performed linear analysis between introgression loci and these traits in multiple environments. The results showed that accumulated introgression loci could generally improve fibre traits for both the Gh and Gb panels ( Figures S5 and S6). In the Gh panel, SCW, LW, FU in E1, FS and FE in E2, SCW, LW and FL in E3 and LW in E5 and E6 had a significant correlation between introgression loci and fibre traits (P < 0.05); for the Gb panel, SCW, LW and FS in E1, LP in E5 and FE in E6 had a significant correlation between introgression loci and fibre traits (P < 0.05). Remarkably, when nine fibre traits in six environments were compared between introgression and non-introgression groups in each introgression locus, three stable effective loci were detected in the Gh panel and 17 stable effective loci were detected in the Gb panel. Notably, the introgression group harboured a higher significant value of fibre traits than the non-introgression group in abundant introgression loci (Table S6). Hence, the benefit of introgression events has been retained in breeding populations to improve fibre traits.
Genetic introgression signature during intraspecific selection as revealed by high-density genetic variation maps and genome-wide association studies (GWAS) We utilised multiple bioinformatics software to construct three high-density maps with three kinds of genetic variations, SNP, insertion-deletion (Indel) and structural variation (SV) based on re-sequencing data of Xinjiang cultivars. In total, 3 876 899 SNPs, 756 666 indels and 39 363 SVs were detected in the Gh panel (Table S7; (Table S8; Figures S7b, S8b and S9b). Three maps of genetic variations collectively revealed that introgression loci had more genetic variation and diversity (pi) for both Gh and Gb panels (Tables S7 and S8). In particular, this phenomenon was obvious on chromosome A01, which included a long bidirectional introgression fragment ( Figures S7-S9). Therefore, introgression events significantly increased genetic variation and diversity in intraspecific populations.
During introduction and independent breeding, genomic loci associated with important agronomic traits were selected an d showed an unbalanced allele frequency in the population. In this study, 52 selective sweep loci were identified in the Gh panel, as well as 19 selective sweep loci in the Gb panel ( Figure 3c; Table S9). Collinear analysis between the interspecific selective sweep loci, which were aligned to the G. hirsutum reference genome, identified five co-selected loci between the Gh and Gb panels (Figure 3c). In comparison with the published quantitative trait locus (QTL) by GWAS Fang et al., 2017b;Ma et al., 2018), 66 reported QTL overlapped in 16 selective sweeps for the Gh panel (Table S10). These results indicate that the independent breeding cultivars in Xinjiang have retained an introgression signature.
Additionally, we performed GWAS of nine fibre traits with G. hirsutum and G. barbadense populations, and estimated linkage disequilibrium (LD) for association mapping resolution (Table S11; Figures S10 and S11). We collected phenotype data of nine fibre traits from six environments, predicted one BLUP phenotype data and performed genome-wide association analysis with the intraspecific Gh and Gb panels. Association mapping revealed a total of 40 and 63 QTL in Gh and Gb, respectively (Tables S12 and S13; Figures S12 and S13). Within the associated QTL, we found that q-A01-FE overlapped in Gb_INT1; q-A08-FU-2 overlapped in Gb_INT8; and q-D05-SFC overlapped in Gb_INT17; importantly, two QTL, q-D03-SCW and q-D03-LW ( Figure S13a,b), which were located in the region of introgression event Gb_INT13 (Table S13), also significantly affected fibre quality traits (e.g. FL, FS and FU) (P < 0.05) (Table S6).
Gene expression pattern in introgression QTL q-D03-SCW and q-D03-LW is illustrated by a chromosome segment substitution line (CSSL) A Manhattan plot showed that SNPs were significantly associated with the yield traits SCW and LW on chromosome D03 [Àlog 10 (P-value) > 6.6] (Figure 4a). These SNPs overlapped with the Gb_INT13 introgression, and a long segment of LD block was identified in this region (Figure S14). Compared with the non-introgression group, the introgression group harboured traits favouring a higher yield (Figure 4b). Furthermore, we found an introgression line, N139, which has been reported to harbour an introgression locus (D03: 0-2.8 Mb in G. hirsutum) from G. barbadense (Figure 5a) ). An analysis of fibre traits revealed that the introgression line showed a better fibre quality compared to that of its parent (N178, G. hirsutum cv. E22), indicating that this region was also beneficial for improving fibre traits in the genetic background of G. hirsutum (Figure 5b). Gene expression data for the introgression locus indicated that 191 genes were preferentially expressed in 10-day post-anthesis fibre, and these genes showed a higher expression in the CSSL (N139) compared to N178 (Figure 5c). Therefore, an introgression event may change expression patterns in fibre tissues.

DISCUSSION
Dramatic diversification exists between G. hirsutum and G. barbadense cultivars during parallel breeding The present study reflects dramatic genetic and phenotypic diversification between G. hirsutum and G. barbadense cultivars (Figures 1 and 3a; Figure S2). As reported, these two cultivated species, G. hirsutum and G. barbadense, have a short evolutionary history of approximately 1-2 million years (Wendel and Grover, 2015). A selective sweep has been shown to strengthen subspecies differentiation between japonica and indica rice cultivars (Yuan et al., 2017); we found that the independent breeding of G. hirsutum and G. barbadense cultivars also strengthened population diversification of interspecific cultivars (Figure 3a). In terms of population structure, Gh and Gb were clustered, respectively, whereas Gh and Gh-race groups were in a continue distribution, which is consistent with the domestication of G. hirsutum process. One expected Gb dot was located away from the Gb cluster (Figure 2a,b), which may be attributed to the complex and obscure origin of G. barbadense cultivars. Southern Mexico and Central America are assumed to be the regions of origin for G. hirsutum, whereas the Andean region of Peru, Ecuador and Colombia are the assumed regions of origin for G. barbadense (Richmond, 1951). An independent domestication of approximately 5000 years has also been predicted (Brubaker and Wendel, 1994;Westengen et al., 2005;Gross and Olsen, 2010), especially for G. hirsutum, which spans the wild-to-domesticated continuum (wild, dooryard, land race and cultivated types) (Rapp et al., 2010). The combined factors of natural mutation, selection and human domestication may drive the divergence and evolution of these two species (Figure 3). Because of the highly homologous relationship between G. hirsutum and G. barbadense , genomic loci containing functional genes associated with agronomic traits may undergo parallel domestication and selection (Wendel and Grover, 2015). Presently, few studies have demonstrated the co-domestication and selection phenomenon between G. hirsutum and G. barbadense. Domestication and selection are common in plants (e.g. rice, tomato and barely), and positive selection rates can be approximately 7.6% in maize (Hufford et al., 2012). In the present study, of the 52 and 19 selective loci identified in G. hirsutum and G. barbadense cultivars, respectively, only five were found to be co-selected (Figure 3c; Table S9). This finding suggests that the breeding of G. hirsutum and G. barbadense belongs to an independent process, whereas the co-selected regions may result from a co-linear functional genomic relationship between the cotton species. The co-selected loci in the selection map also showed a greater amount of selection in the At subgenome than in the Dt subgenome, which is consistent with asymmetric selection in G. hirsutum populations .

Asymmetric mutual introgression exists between G. hirsutum and G. barbadense cultivars
The two globally distributed cultivated cottons, G. hirsutum and G. barbadense, could be crossed without a reproductive barrier, which may be an important factor for inducing interspecific introgressions. The availability of a high-quality reference genome allowed us to detect  (Tables S4 and S5), collectively showed that interspecific introgression exists in cotton. The F ST indicated a lower differentiation value in the introgression loci; however, a number of lower differentiation regions existed without significant signature of an introgression event ( Figure S3). These regions may result from the complex of a polyploid cotton genome or less divergence existed between the G. hirsutum and G. barbadense genomes in history. Therefore, these regions were not inferred from the phylogenetic tree, and a number of introgressions may have not been detected as a result of the limited scanning window. These ancestral introgressions may act in a role opposite to that of reproductive isolation; alternatively, regions with lower interspecific differentiation may exist at a genome-wide level. As early as 1994, bidirectional nuclear introgressions between G. hirsutum and G. barbadense were detected in an asymmetrical distribution (Brubaker and Wendel, 1994). The present study identified 17 introgression events that flowed from Gh to Gb, wheeras only seven introgression events flowed from Gb to Gh, which is consistent with previous studies (Brubaker et al., 1993).
Potential mechanisms may exist in this asymmetrical introgression phenomenon. Introgression is commonplace in animals and plants and is related to environmental adaptation and trait improvement (Lucek et al., 2014;Ai et al., 2015;Clark et al., 2015;Zhang et al., 2016;Zou et al., 2019). Therefore, we hypothesised that the asymmetrical introgression mechanism in cotton species may relate to adaptability. In comparison with that of G. hirsutum, the G. barbadense cytoplasm background may present a better compatibility, although no reproduction barrier existed between G. hirsutum and G. barbadense. A previous study reported that different maternal cytoplasmic environments (G. hirsutum and G. barbadense) have significant effects on reproductive traits such as infertility and seed production (Dai et al., 2016); on the other hand, G. hirsutum displays a better yield and adaption and is cultivated worldwide, which may assist transfer of the G. hirsutum nuclear genome to G. barbadense. Fang et al. (2017a) also found that the genes in the introgression events were enriched in the reproduction, epithelial cell development and cell proliferation processes. Therefore, the asymmetrical introgression between G. hirsutum and G. barbadense may result from these two aspects.
There is strong evidence to suggest that introgression occurred in the early breeding era (Richmond, 1951;Wang et al., 1995). We show here that introgression loci existed in both core parents introduced from foreign countries and breeding cultivars, whereas no introgression events occurred during the independent breeding of Xinjiang elite cultivars. Taken together, these results suggest that the retained introgression event in modern cultivars may derive from the early breeding era. At present, introgression lines with single or very few genomic fragments have been developed to improve single traits. However, the present study indicates that a better strategy for improving modern cultivars involves the introduction of multiple beneficial introgression loci. The introgressions produce significant genetic and phenotypic effects In the present study, bidirectional introgression events enriched the density of SNPs, indels and SVs within intraspecific populations (Figures S7 and S8). This broadens the gene pools and also the functional diversity in G. hirsutum and G. barbadense. Artificial introgression lines have been developed and studied with agronomic traits in rice (Ma et al., 2016), maize (Liu et al., 2016), wheat (Ali et al., 2016) and cotton . In the present study, the relationship between the number of introgression loci and fibre traits indicated that introgressions are beneficial for the improvement of fibre traits within intraspecific species (Figures S5 and S6). A Student's t-test conducted between introgression and non-introgression groups showed three stable effective loci in the Gh panel and 17 stable effective loci in the Gb panel (Table S6). Certain loci have less of an effect on fibre traits, and these loci may instead be important for resistance to diseases or for other traits. Furthermore, the phenotypic effects of three introgression events could be detected by GWAS, and Gb_INT13 significantly improved fibre yield in G. barbadense (Figure 4). Intriguingly, introgression in a CSSL with the same locus in both G. barbadense and G. hirsutum affected fibre characteristics and also modified the gene expression pattern, indicating transgressive gene expression in this region ( Figure 5). The transgressive expression model has also been found in hybrid sunflower species (Lai et al., 2006). This suggests that exploiting and applying transgressive genetic effect loci with respect to interspecific introgression lines is important for future breeding programmes. In summary, beneficial reciprocal genetic introgression events in G. hirsutum and G. barbadense cultivars are retained from the evolution and independent breeding of cotton. Using a high-density interspecific SNP map and three types of genetic variation maps of intraspecific species for G. hirsutum and G. barbadense cultivars, we identified 52 G. hirsutum and 19 G. barbadense selective sweep loci in the population, as well as five co-selected loci in both populations. We also uncovered asymmetrical interspecific introgression between G. hirsutum and G. barbadense; 17 interspecific introgression events occurred in G. barbadense, whereas seven events occurred in G. hirsutum. We fine-mapped introgression loci with significant effects on fibre traits and, in particular, an important introgression event, Gb_INT13, was identified with the GWAS method, and we explored its transgressive expression pattern with a CSSL. The findings of the present study should increase our understanding of genetic introgression and help to advance interspecific molecular breeding.

Population re-sequencing and genetic variation calling
For each G. hirsutum and G. barbadense cultivar, a young leaf was collected from the plant, and genomic DNA was extracted for construction of a paired-end sequencing library to perform 109 genomic coverage re-sequencing with the HiSeq 2000 platform (Illumina, Inc., San Diego, CA, USA). The clean genomic data were generated and deposited in NCBI Sequence Read Archive (SRA) (https://www.ncbi.nlm.nih.gov/sra) under accession number PRJNA473334. Thirty Gh-race accessions were downloaded from the NCBI databank (https://www.ncbi.nlm.nih.gov) under accession numner PRJNA257154 (Table S1).
The processed Binary Alignment/Map (BAM) files were applied to call three types of genetic variations, including SNPs, Indels and SVs. The HaplotypeCaller module of GATK, version 3.1.1 (McKenna et al., 2010) was applied to produce GVCF files of each accession; GenotypeGVCFs module in the GATK toolkit was applied to merge all individual GVCF files together, and genetic variations of SNPs and Indels were obtained in intraspecific (Gh or Gb) or interspecific panels (Gh, Gb and Gh-race). BCFTOOLS, version 1.3 (Li et al., 2009) was applied to filter the SNPs with parameters QD < 2.0 || MQ < 40.0 || FS> 60.0 || MQRankSum < À12.5 || ReadPosRankSum < À8.0, and the Indels were filtered with parameters QD < 2.0 || FS> 200.0 || ReadPosRankSum < À20.0. BEAGLE, version 4.1 (Browning and Browning, 2007) was applied to impute the missing genotype. LUMPY, version 0.2.13 (Layer et al., 2014) was applied to call the SVs with the BAM file, and then genotyping was performed with SVTyper (https://github.com/hall-lab/sv typer). VCFTOOLS, version 0.1.14 (Danecek et al., 2011) was applied to filter the genotype data with parameter MinQ > 200 and merge the SVs of all the intraspecific individuals together. The genetic variation maps were drawn using the R package (CMplot).

Population structure, introgression and differentiation analyses
In total, 6 318 993 high quality SNPs in the interspecific species panel were applied to conduct the population structure analysis and perform a genome-wide scanning of the interspecific population introgression. ADMIXTURE, version 1.3 (Alexander et al., 2009) was applied to impute the population structure number from 1 to 9, and TASSEL, version 5.0 (Bradbury et al., 2007) was used to perform principal component analysis and phylogenetic analysis. Based on the allele frequency in the interspecific species panel, the fixed species-specific SNPs (Gh alternative allele < 6 and Gb alternative allele > 134) were quantified using VCFTOOLS, version 0.1.14.
The introgression between three population groups was detected by f 3 statistic of TREEMIX, version 1.13 (Pickrell and Pritchard, 2012). To reveal fine-scaled introgression loci and introgression materials, a phylogenetic tree was imputed by SNPs in a 1-Mb sliding window scale using TASSEL, version 5.0 and ITOL (https:// itol.embl.de) was applied to draw the phylogenetic tree. To quantify the number of introgression cultivars, we designated only the significant divergent region as the introgression locus, although Gb and Gh were mixed together in phylogenetic tree.
The population differentiation among Gb, Gh and Gh-race was calculated by VCFTOOLS (parameter: --fst-window-size 2 000 000 -fst-window-step 50 000); to explore the introgression effect on population differentiation, we excluded the SNPs located in the introgression region, and calculated the population differentiation with the same parameter by VCFTOOLS again. The scatter plot map of population differentiation was plotted using CIRCOS, version 0.67 (Krzywinski et al., 2009).

Genome-wide selective sweeps and association analyses in two intraspecific populations
The high-density SNPs within intraspecific populations (Gh and Gb) were applied to impute the selective sweeps using SWEED, version 3.2.1 (Nielsen et al., 2005). Depending on the domestication and selection of crop genomes, the top 5% value of the composite likelihood value of the SNPs was set as the selective sweep SNPs. The threshold values of Gh and GB were therefore set as 228 and 869, respectively. The selective sweep SNPs located within the distance of LD decay were set as the same select sweep locus.
High-quality SNPs (minor allele frequency > 0.05) of intraspecific populations (Gh and Gb) were applied to associate with nine fibre traits (SCW, LW, LP, FL, FS, MV, FU, SFC and FE) in multiple environments. The association mapping was performed with the LMM model of GEMMA, version 0.97 (Zhou and Stephens, 2012). The threshold of respective panels was determined by the significant P value, which was calculated using GEC, version 0.2 (Li et al., 2012).

RNA-sequencing data analyses of a chromosome segment substitution line (CSSL)
The RNA-sequencing data of 10-day post-anthesis fibres of a CSSL (N139) and its parents (G. hirsutum cv. E22, N178; G. barbadense acc. 3-79, N179) was downloaded from PRJNA433615 in the NCBI databank . TOPHAT, version 2.0.13 (Trapnell et al., 2009) was applied to align the clean reads to the TM-1 reference genome  and HTSEQ, version 0.8.0 (Anders et al., 2015) was used to calculate gene expression. Six fibre traits (FL, FS, MV, FU, SFC and FE) of N139, N178 and N179 were collected from previously published phenotype data .  Figure S2. BLUP phenotype data of nine fibre traits between Gb and Gh groups. LP, lint percentage; SCW, seed cotton weight; LW, lint weight; FL, fibre length; FS, fibre strength; MV, micronaire value; FU, fibre uniformity; SFC, short fibre content; FE, fibre elongation. cultivars; Gb, Gossypium barbadense cultivars; Gh-race, G. hirsutum races. Figure S4. Phylogenetic tree based on genetic variations in the A06~3 Mb (a), A06~87 Mb (b), and A01~50 Mb (c). Gh, Gossypium hirsutum cultivars; Gb, Gossypium barbadense cultivars; Gh-race, G. hirsutum races. Figure S5. The collinear relationship between nine fibre traits and number of introgression loci in the Gh panel. The relationship in E1 (a), E2 (b), E3 (c), E4 (d), E5 (e) and E6 (f). LP, lint percentage; SCW, seed cotton weight; LW, lint weight; FL, fibre length; FS, fibre strength; MV, micronaire value; FU, fibre uniformity; SFC, short fibre content; FE, fibre elongation; Gh, Gossypium hirsutum cultivars. Figure S6. The collinear relationship between nine fibre traits and the number of introgression loci in the Gb panel. The relationship in E1 (a), E2 (b), E3 (c), E4 (d), E5 (e) and E6 (f). LP, lint percentage; SCW, seed cotton weight; LW, lint weight; FL, fibre length; FS, fibre strength; MV, micronaire value; FU, fibre uniformity; SFC, short fibre content; FE, fibre elongation; Gb, Gossypium barbadense cultivars. Figure S7. Genome-wide single nucleotide polymorphisms (SNPs) in the Gh and Gb panels. Genome-wide SNPs in the Gh panel (a) and the Gb panel (b). The number of SNPs was within a 1-Mb window size. Gh, Gossypium hirsutum cultivars; Gb, Gossypium barbadense cultivars; the red box indicates bidirectional introgression on chromosome A01. Figure S8. Genome-wide insertion-deletions (Indels) in the Gh and Gb panels. Genome-wide Indels in the Gh panel (a) and the Gb panel (b). The number of Indels was within a 1-Mb window size. Gh, Gossypium hirsutum cultivars; Gb, Gossypium barbadense cultivars; the red box indicates bidirectional introgression on chromosome A01. Figure S9. Genome-wide structure variations (SVs) in the Gh and Gb panels. Genome-wide SVs in the Gh panel (a) and the Gb panel (b). The number of SVs was within a 0.1-Mb window size. Gh, Gossypium hirsutum cultivars; Gb, Gossypium barbadense cultivars; the red box indicates bidirectional introgression on chromosome A01. Figure S10. Linkage disequilibrium (LD) of At subgenome in the Gh and Gb panels. LD of At subgenome in the Gh panel (a) and the Gb panel (b). Gh, Gossypium hirsutum cultivars; Gb, Gossypium barbadense cultivars. Figure S11. Linkage disequilibrium (LD) of Dt subgenome in the Gh and Gb panels. LD of Dt subgenome in the Gh panel (a) and the Gb panel (b). Gh, Gossypium hirsutum cultivars; Gb, Gossypium barbadense cultivars. SCW, seed cotton weight; LW, lint weight; FL, fibre length; FS, fibre strength; MV, micronaire value; FU, fibre uniformity; SFC, short fibre content; FE, fibre elongation; Gb, Gossypium barbadense cultivars. Figure S14. Linkage disequilibrium in the introgression event of Gb_INT13. The red triangle indicates the LD block. Gh, Gossypium hirsutum cultivars; Gb, Gossypium barbadense cultivars. Table S1. Information on the collected germplasm resources. Table S2. Statistics of nine fibre traits of the Gh and Gb panels in six environments. Table S3. Correlation between six environments and heritability of nine fibre traits. Table S4. Introgression test by three population statistics methods. Table S5. Introgression events on a genome-wide scale between the Gh and Gb groups. Table S6. Significance test of nine fibre traits between the introgression and non-introgression groups in the Gh and Gb panels. Table S7. Genetic variation and diversity in the Gh panel. Table S8. Genetic variation and diversity in the Gb panel. Table S9. Selective sweeps in the Gh and Gb panels. Table S10. Reported QTL overlapped with selective sweeps in the Gh panel. Table S11. Linkage disequilibrium (LD) in the Gh and Gb panels. Table S12. Genome-wide association mapping in the Gh panel.