The celery genome sequence reveals sequential paleo‐polyploidizations, karyotype evolution and resistance gene reduction in apiales

Summary Celery (Apium graveolens L. 2n = 2x = 22), a member of the Apiaceae family, is among the most important and globally grown vegetables. Here, we report a high‐quality genome sequence assembly, anchored to 11 chromosomes, with total length of 3.33 Gb and N50 scaffold length of 289.78 Mb. Most (92.91%) of the genome is composed of repetitive sequences, with 62.12% of 31 326 annotated genes confined to the terminal 20% of chromosomes. Simultaneous bursts of shared long‐terminal repeats (LTRs) in different Apiaceae plants suggest inter‐specific exchanges. Two ancestral polyploidizations were inferred, one shared by Apiales taxa and the other confined to Apiaceae. We reconstructed 8 Apiales proto‐chromosomes, inferring their evolutionary trajectories from the eudicot common ancestor to extant plants. Transcriptome sequencing in three tissues (roots, leaves and petioles), and varieties with different‐coloured petioles, revealed 4 and 2 key genes in pathways regulating anthocyanin and coumarin biosynthesis, respectively. A remarkable paucity of NBS disease‐resistant genes in celery (62) and other Apiales was explained by extensive loss and limited production of these genes during the last ~10 million years, raising questions about their biotic defence mechanisms and motivating research into effects of chemicals, for example coumarins, that give off distinctive odours. Celery genome sequencing and annotation facilitates further research into important gene functions and breeding, and comparative genomic analyses in Apiales.


54
We obtained the high quality clean data after a series of strict filtering. Then we 134 summarized the sequencing output data features, including read quantity, data yield, error 135 rate, Q20, Q30, and GC content (Supplementary Table 1). 136

Data evaluation and conclusion 137
Original sequencing data of celery is 181.27 Gb in total. The sequencing data was of 138 high quality (Q20 ≥90%, Q30 ≥85%), and sequencing error rate was rather low (<0.05%). 139 Nucleotide library comparison revealed there was no contamination in the sample. 140

K-mer analysis 141
We adopted K-mer to estimate the celery genome size and hybridization rate, that is, 142 from a continuous sequence to iteratively select the length of K base sequence. If the 143 length of each sequence is L, the k-mer length is K, we can get the L-K+1 k-mer. Here 144 we took k =17 to perform the analysis. 145 According to the survey analysis, the main peak is near depth =22 (Supplementary 146 The process of error correction firstly established a K-mer frequency table with 153 sequencing data. After setting cutoff, the K-mers can be divided into high frequency and 154 low frequency ones. For reads with low-frequency K-mers, we made the K-mers of the 155 entire reads high by changing some bases. Then we corrected potential errors possibly 156 caused by sequencing. The large segments do not need to be used in this error correction 157 process, therefore data correction is usually performed on small segment library data. The 158 genome error correction was conducted using second and third sequencing data by Pilon 159 (https://github.com/broadinstitute/pilon/wiki) and Quiver software with the default 160 parameters, respectively (Chin et al., 2013;Walker et al., 2014). 161

Genomic base composition 209
The ratio of GC is 35.68%, and the ratio of N is 0.81%, which was an acceptable 210 range (<10%) (Supplementary Table 5). 211

Sequence consistency assessment 213
To evaluate the accuracy of the genome assembly, the small fragment library reads 214 were mapped to the assembled celery genome using BWA software (http://bio-bwa.sour 215 ceforge.net/) (Jo and Koh, 2015

Introduction 247
The Hi-C technology was further used to assist celery genome assembly. The 248 libraries were sequenced using Illumina HiSeq 4000. The analyses mainly contained the 249 data quality control, mapping the genomes, clustering, sorting, orientation, accuracy 250 assessment for the genome. 251

Hi-C biotin labeling 253
Chromatin was digested for 16 h with 400 U HindIII restriction enzyme (NEB) at 254 37 °C. DNA ends were labeled with biotin and incubated at 37 °C for 45 min, and the 255 enzyme was inactivated with 20% SDS solution. The specific steps as follows. 256 (1) Using cell cross-linking agent paraformaldehyde to make DNA and cell 257 combined; 258 (2) Using the restriction enzyme to deal with the cross-linked DNA; 259 (3) Adding biotin label at the end of oligonucleotide; 260 (4) Using nucleic acid ligase to make the adjacent DNA fragments linked; 261 (5) The protease digests the protein at the junction to de-crosslink protein and DNA. 262 DNA was extracted and randomly broken into fragments of 350 bp by Covaris crusher.

Library Check 274
Using Qubit 2.0, we performed preliminary quantification, and the library was 275 diluted to 1 ng/µl. Then we tested the insert size of library followed by Agilent 2100. If 276 the insert size was as expected, starting accurate quantification to the effective 277 concentration of the library by Q-PCR (the library effective concentration >2 nM). 278

Sequencing 279
Different libraries were pooled according to the effective concentration and the 280 target data volume, and then using Illumina HiSeq X Ten to sequence. 281

Bioinformatics analysis 282
The steps of Hi-C are mainly as follows: 283 (1) Quality control of raw data to obtain clean data; 284 (2) Mapping the clean data to the celery genome; 285 (3) Clustering, sorting, orienting, and assisting genome to anchor the chromosome.

Sequencing data quality assessment 292
The total of sequencing data for Hi-C is 378.06 Gb is with the high sequencing 293 quality (Q20 ≥90%, Q30 ≥85%). The GC distribution is normal, and the sample is not 294 contaminated (Supplementary Table 10). The Hi-C construction library has a relative 295 high quality. The finally valid read pairs is 3,000,276, and the average data effect rate is 296 34.89% (Supplementary Table 11). 297

Hi-C technology assisted genome assembly 298
Hi-C analysis produced spatially connected DNA fragments, showing interactions 299 between distantly located DNA fragments. According to whether the interaction 300 probability inside the chromosome is higher than that of between two chromosomes, and 301 the contig or scaffold were divided into different chromosomes. According to the 302 interaction probability decreases with the increase of the interaction distance on the same 303 chromosome, sorting and orienting the contig or scaffold of the same chromosome was 304 performed (Fig. 1). 305 Hi-C assisted genome assembly using the software LACHESIS 306

Comparison with draft genome 310
The high-quality sequencing data was mapped to the draft celery genome by BWA 311 software. The repeat data and no paired data were removed by SAMTOOLS (parameter: 312 rmdup), and the high quality data was obtained (Etherington et al., 2015;Li et al., 2009). 313 Meanwhile, we extracted the reads near cleavage site for assisted genome assembly. The 314 sample alignment rate reflected the similarity between sequencing data and reference 315 genome. 316

Clustering 317
Short reads were compared to the draft genome, and the reads were compared to 318 contigs or scaffolds. If reads pairs were captured by Hi-C on two contigs, an interaction 319 between two contigs was inferred. The more reads that two contigs share, the stronger the 320 interaction is, and the more likely they were grouped together. Contigs were clustered 321 according to the interactions number, and chromosomes were then divided and inferred. 322

Sorting and Orientation 323
The positions of the strengths of each pair of two contig interactions and the 324 interaction reads were sorted and oriented. 325

Assembly result statistics 326
Finally, a total of 3.047 Gb, accounting for 91.44% of the assembled celery genome, 327 was anchored onto 11 chromosomes by Hi-C (Supplementary program was used to detect tandem repeat in celery genome (Benson, 1999

Tandem repeat analyses 391
Usually, repeat sequences were divided by that whether the repeat unit were 392 clustered or not in a chromosome region, we defined that the clustered ones as the tandem 393 repeats (TR), while the scattered ones dislocated in one whole chromosome were 394 so-called transposons. The former ones can be divided into microsatelites, minisatelites, 395 macrosatelites based on their repeat times. The latter ones can be grouped into more 396 specific ones, like SINE, LINE and others (Gemayel et al., 2012;Mayer et al., 2010). In 397 the celery genome, we detected 158.15Mb tandem repeat sequences using TRF, which 398 accounted up to 4.75% of the whole genome (Supplementary Table 14). 399 According to the calculation of repeat type from single (mono-) to triple (tri-) repeat 400 bases, we regarded the appearance of the repeat unit "A" or "C" as the Mononucleotide. 401 Considering the complementary strand of "A" and "C" are "T" and "G", separately, we 402 totally unified the "A" or "T" as "A", and took the "C" or "G" as "C". Likely in the two 403 repeat unit, Dinucleotide represent "AT" (including AT and TA), "GC" (including "GC" 404 and "CG"), "AC" (including "AC", "CA", "TG" and "GT") and "AG" (including "AG", 405 "GA", "TC" and "CT"). With more repeat types, the repeat unit became more 406 complicated and we here only calculated the former three types from the 407 "Mononucleotide" to "Trinucleotide". 408 With the calculation of tandem repeat sequences, we found the range of repeat unit 409 from one single nucleotide to 2000 nucleotides, and drew the distributions of the repeat 410 regions of tandem repeat and the density of different scale tandem repeats 411 (Supplementary Table 16). We showed the distribution of the smaller tandem repeats 412 with repeat units less than 10. We found the distribution of tandem repeat times were 413 accompanied by the distribution of tandem density, and the peak of tandem density 414  Table  419 16; Supplementary Fig. 4b). With all kinds of tandem repeats, the distribution of tandem 420 repeat density and the tandem repeat regions are diverse when the repeat units are fewer 421 than about 180 and 112 (Supplementary Table 16; Supplementary Fig. 5). 422 We specifically calculated the three types of tandem repeats, which mainly included 423 the information about their repeat units, repeat regions, repeat copies, repeat bases, and 424 also the bases within the limited regions (Supplementary Table 17). From the type of 425 mononucleotide, the "A" ("A" or "T", with 2225 regions, 0.15Mb bases and the 426 maximum region(s) including 217 units) apparently was dominant compared with the "C" 427 ("C" or "G", with 20 regions, 0.75Kb bases and the maximum region containing 91 428 bases). Considering the dinucleotide, "AT" ("AT" and "TA") took the most in all 429 calculation levels compared to other three sub-types "AC", "AG" and "GC", and "GC" 430 ("GC" or "CG", with only one repeat region) was barely appeared. Repeat type with three 431 nucleotides named trinucleotide, contained ten kinds of sub-types, within which "AAT" 432 took the most percent in the trinucleotide type including about 5,980 repeat regions and 433 0.32 Mb repeat bases. 434

Centromeres and telomeres prediction 435
In this study, we predicted the centromeres and telomeres of celery  Fig. 6, Supplementary Table S18). 455 The telomere sequences for each chromosome were identified using the sequence 456 repeat finder (SERF) analysis platform (bioserf.org) (Somanathan and Baysdorfer, 2018). 457 Both of two telomeres were predicted for 9 chromosomes, while only one telomere was 458 detected in chromosomes Agr3 and Agr10 (Supplementary Table S18). 459

Gene structure annotation 460
We conducted de novo prediction of gene structure using Augustus, Genscan, 461 GlimmerHMM, Geneid, and SNAP. The homologous species include C. sativus, D. 462 Carota, L. sativa, and A. thaliana. A total of 31,326 genes were predicted in celery 463 genome, and the support of each evidence for gene set were also shown (Supplementary 464 Fig. 7; Supplementary Table 19). We further conducted the analyses of genes in celery 465 and above mentioned species. Celery has fewer genes than Arabidopsis, coriander, carrot, 466 and lettuce (Supplementary Table 19). 467

Library construction 491
Using magnetic beads with Oligo (dT) to enrich the mRNA by base A-T pairing and 492 the combination of mRNA ploy A tail, then, breaking mRNA into short fragments by 493 adding fragmentation buffer, a single-strand cDNA was synthesized by random hexamers 494 using mRNA as a template. The double-stranded cDNA was synthesized by adding 495 buffer, DNA polymerase I, and dNTPs. We purified double-stranded cDNA using 496 AMPure XP beads. Choosing the size of fragments using AM Pure XP beads after adding 497 tail A and connecting the sequencing linker, finally, PCR enrichment was performed to 498 obtain the cDNA library. 499

Library inspection 500
We performed preliminary quantification by using Qubit 2.0, and the library was 501 diluted until 1 ng/ul. Then, we detected the insert size of the library using Agilent 2100. 502 Finally, we did accurate quantification for the effective concentration of the library 503 (effective concentration >2 nM) using Q-PCR to ensure the quality. 504

Sequencing 505
We used HiSeq sequencing for the different libraries according to the effective 506 concentration and target data volume. 507

Original sequences data 509
The original image data files were obtained by Illumina HiSeq TM transformed the 510 original sequencing sequences by CASAVA Base Calling. We called it Raw Data or Raw 511 Reads, and the results were stored in FASTQ format. Error rate of each base sequencing was obtained by Phred score (Qphred= 515 -10log10(e)). Phred value was obtained by a rate model during base calling process. 516

Check A/T/G/C content 517
The GC content distribution was used to detect the phenomenon whether there exists 518 the separation between AT and GC. 519

Sequencing data filtering 520
The original sequencing sequence from sequencing contained low-quality reads with 521 connectors. In order to ensure the quality of information analysis, we filtered the raw 522 reads to gain clean reads. Secondly, to see directly the homology within and between genomes, homologous 558 gene dotplots were produced using MCScanX toolket . Dotplots were 559 used to facilitate identification of homologous blocks produced by different 560 polyploidization events (Supplementary Fig. 12). Ks values were estimated between 561 colinear homologous genes, by using the YN00 program in the PAML (v4.9h) package 562 with the Nei-Gojobori approach (Yang, 2007), and the median Ks of colinear homologs 563 in each block was shown in the constructed dotplots to help group blocks produced by 564 different events. This would found paralogous blocks and genes produced by each WGT 565 or WGDs in each Apiaceae plants, and orthologous genes between different plants. With 566 each grape chromosome, its 4X duplicated celery regions were inferred, and pinched into 567 four sets of pseudo-chromosomes by checking whether two blocks were neighboring to 568 one another as to the reference chromosome ( Supplementary Fig. 13). Each set of 569 reconstructed pseudo-chromosomes is assumed to form the corresponding subgenome 570 produced by the recursive polyploidizations. Similar is with each of the other Apiaceae 571 plants. Taken celery as an example, the (colinear) paralogs produced by each WGT or 572 WGDs were used to infer the evolutionary dates of the related events; and the 573 celery-coriander (colinear) orthologs were used to date their divergence. 574 Thirdly, the probability density distribution curve for Ks was estimated by MATLAB 575 with the kernel smoothing density function (ksdensity, bandwidth was set to 0.025, 576 typical value). Then, multi-peak fitting of the curve was performed using the Gaussian 577 approximation function in the curve fitting toolbox cftool within MATLAB. The 578 coefficient of determination (R-squared) was required to be at least 0.95 (Supplementary 579 Fig. 14). 580 Fourthly, in that we have diverged evolutionary rates among Apiaceae plants and 581 others, to have a common evolutionary rate to perform a reasonable dating, we performed 582 a correction of evolutionary rates (Supplementary Figs. 14,15). Here, different from 583 previous practice (Wang et al., 2017b; Wang et al., 2016c), we performed a two-step rate 584 correction. Based on the fact that celery, carrot, and coriander shared two extra 585 polyploidizations after the split with lettuce, and the different evolutionary rates of these 586 two polyploidizations, we conducted two rounds of rate correction. In the first step, we 587 managed to correct evolutionary rate by aligning the Ks distributions of celery, coriander, 588 lettuce and carrot γ duplicates to that of grape γ duplicates, which have the smallest Ks 589 values. Then, according to the result that celery with the slower rate during both the two 590 extra polyploidizations, we re-corrected the evolutionary rates of celery α produced 591 duplicates with coriander as the reference. The follows as details. 592 We estimated the evolutionary rates of γ-produced duplicated genes, corrected 593 according to our report (Wang et al., 2019). The maximum likelihood estimated from 594 inferred Ks median of γ-produced duplicated genes were aligned to have the same value 595 of those of grape. Supposing a grape duplicated gene pair to have a Ks value that is a 596 random variable, and for a duplicated gene pair in another genome the Ks to be 597 . 598 We also performed the Ks correction analysis to distinguish the order of each 599 polyploidization events with the method applied in previous study (Wang et al., 2015).
After its divergence from the other studied plants, grape has not been affected by 618 polyploidization any more, we assumed that the evolutionary rate of grape genes is 619 relatively stable and, therefore, set 1 Vv l = . 620 Finally, for each species i , the correction coefficient ratio should be calculated by 622 Specially, due to the rapid evolution rate of goldfish and rice, it requires multiple 625 corrections, and the recent doubling event has not been corrected again. 626 After correction, the Ks peak for ω is basically similar, however, the ks peak for α 627 has significant deviations. It shows that the rate of evolution of carrots, coriander, and 628 celery is significantly different after the most recent divergence. Based on this, we have 629 Eventually, to construct the table with the grape genome as a reference, all grape 635 genes were listed in the first column. Each grape gene may have two additional colinear 636 genes in its genome due to WGT event, and two other columns in the table listed this  637 information. For a grape gene, when there was a corresponding colinear gene in an 638 expected location, a gene ID was filled in a cell of the corresponding column in the table. 639 When it was missing, often due to gene loss or translocation in the genome, the cell 640 contained a dot. For the lettuce genome, with whole-genome triplication (WGT), we 641 assigned three columns. For the carrot, coriander or celery genome, each affected by two 642 paleo-polyploidization events, we assigned four columns. Therefore, the table had 48 643 columns, reflecting layers of tripled and then fourfold homology due to recursive 644 polyploidies across the genomes. 645

Reconstruction of ancestral karyotypes of Apiales plants 646
The colinearity of compared genomes could reflect the karyotype change and even to 647 uncover the trajectories of the formations of their ancestors. Based on the homologous 648 dot-plots, we selected the four compared genomes presented in the phylogenetic locations 649 and deduced their ancestral chromosomes at the important evolutionary periods, eg. 650 before the divergent nodes and the periods before or after different polyploidizations. 651 With the potential existent theory showed in the dotplots of two compared genomes, the 652 extant chromosomes came from the interaction of ancestral chromosomes, which usually 653 include the following cases, the "crossover" appeared in the arms of two interacted 654 chromosomes, the "end to end joint" appeared in the end of chromosomes' arms, also 655 "nested chromosome fusion" showed in one chromosome inserted into another one 656 completely. Most extant chromosome suffered more than one kind of interaction within 657 their evolutionary history, especially after once or more rounds of polyploidizations. 658

Gene colinearity within and among genomes 660
Homologous colinearity of existing genomes is an important clue to reveal the evolution 661 of complex genomes. Using ColinearScan (Wang et al., 2006), we inferred colinear genes 662 within and between celery and other reference genomes, which provides a function for 663 evaluating the statistical significance of blocks of colinear genes (Supplementary Table  664 25). For the blocks with four or more colinear genes, we found 22,433 duplicated genes 665 pairs in celery. For the colinear regions containing more than 10 gene pairs, celery (9,834 666 pairs reside in 394 blocks) has larger number than grape, which has 7,275 pairs residing 667 in 286 blocks (Supplementary Table 25). 668 In addition, we indicated that the colinearity between genomes is much better than 669 within each genome (Supplementary Table 25 Asteraceae-common WGT event was between the two paleo-polyploidizations events of 706 Apiaceae. In addition, the celery-coriander and celery-carrot splits were inferred to have 707 occurred 11-13 Mya, 20-22 Mya, respectively (Fig. 2). The estimated time was 708 consistent with estimation by MCMCtree in PAML software (Supplementary Fig. 16). 709 The Apiaceae species split from lettuce at 82-93 Mya ( Fig. 2; Supplementary Fig. 15). 710

Multiple alignment 711
With the grape genome as a reference, we produced a table to store inter-and 712 intra-genomic homology information (Supplementary Tables 26-30). First, we filled in all 713 grape gene IDs in the first column of the table, then added gene IDs from celery and other 714 genome column by column, species by species according to the colinearity inferred by 715 above alignments. As noted above, if no gene lost, a grape gene would have 3 716 orthologous genes in lettuce, and 4 in each of an Apiaceae plant (celery, coriander, and 717 carrot) genome. When a species contained a gene showing colinearity with a grape gene, 718 a gene ID was filled into an appropriate cell in the table. When a species did not have an 719 expected colinear gene, often due to gene loss, translocation or insufficient assembly, a 720 dot (signifying missing) was filled into the appropriate cell. For grape, lettuce, carrot, 721 coriander, and celery there were allocated 16 (1+3+4x3) columns in the table. Moreover, 722 due to their shared the WGT (γ), each chromosomal segment would repeat three times in 723 each genome. Based on homology inferred in grape, we therefore extended the table to 48 724 columns ( Supplementary Fig. 13). Eventually, we constructed a table of celery and other 725 plant genes reflecting three polyploidizations and all salient speciation. In summary, the 726 table summarized results of multiple-genome and event-related alignment, reflecting 727 layers of tripled and/or doubled homology due to recursive polyploidizations. 728

Alignment analysis 750
We used the software HISAT to perform genomic positioning analysis for the 751 filtered sequences . The total mapped rates of 3 tissues were more than 752 95%, and the uniquely mapped rates were more than 90% (Supplementary Table 38 The correlation of gene expression between samples is an important indicator to test 765 the accuracy of the experiment. The closer the correlation coefficient is to 1, the higher 766 the similarity in expression patterns between samples. We required that the biological 767 repeat sample relative coefficient R 2 to be at least greater than 0.8 ( Supplementary Fig.  768   19). 769 6.2.5.5 Differentially expressed genes (DEGs) 770 The differential expression analysis was mainly divided into the following three 771 parts.   Fig. 26f; Supplementary Table 42). 819

Celery chromosomes representing the Apiaceae proto-chromosomes 820
We reconstructed the Apiaceae proto-chromosomes and their evolutionary 821 trajectories to extant chromosomes (Fig. 3). Actually, we found that the Apiaceae 822 proto-chromosomes could be represented by the celery chromosomes. 823 Using homologous gene dotplots, we characterized the correspondence between 824 genomes of Apiaceae plants and grape ( Supplementary Fig. 12). The undisturbed 825 integrity of celery chromosomes Ag1-5 and Ag8 could be evidenced by each of them 826 having complete correspondence to one of carrot chromosomes (Supplementary Fig. 12a). 827 Therefore, they could be used to represent the Apiaceae proto-chromosomes, at least with 828 the information so far. 829 The proto-integrity of the other celery chromosomes is supported by homology with 830 grape chromosomes ( Fig. 3a; Supplementary Fig. 12b). Taking celery chromosome Ag10 831 as an example, ignoring permuted correspondence due to reciprocal DNA inversions, to its 832 ~3/4 length Ag10 shared orthology with grape Vv13, at the meantime paralogous to Vv6 833 and Vv8 due to the γ WGT ( Supplementary Fig. 12b). In contrast, the same Ag10 region 834 corresponds to different regions in Dc3, Dc4, and Dc6 ( Supplementary Fig. 12a). These 835 showed that the Ag10 most likely preserved much the proto-chromosome structure, while 836 the Dc3, Dc4, and Dc6 were reconstructed chromosomes after their split. The remaining 837 part of Ag10, merged from Vv16 ( Supplementary Fig. 12b), was shared with the other 838 Apiaceae ( Supplementary Fig. 12c-e). Putting together, Ag10 could represent an Apiaceae 839 proto-chromosome. 840 Formation of carrot chromosomes. Continuingly exploiting the orthologous 841 correspondence between genomes, we managed to reconstruct the ancestral karyotypes 842 on key evolutionary nodes and evolutionary trajectories to produce extant chromosomes 843 (Fig. 3a). Firstly, starting from the 11 Apiaceae proto-chromosomes, renamed as R1-11, 844 corresponding to Ag1-9 orderly, we inferred how the carrot and coriander chromosomes 845 formed. We found that Dc7-9 preserved the integrity of proto-chromosomes, R1, R5, and 846 R8, ignoring some intra-chromosome inversions. The other five carrot chromosomes 847 were each reconstructed after its split from the other Apiaceae plants. Specifically with D3I, which was produced by a crossing-over between R2 and R4 to form D5. 861 During the end-end joining, D3 and a satellite chromosome S2 was produced. Grossly, 862 during the formation of carrot chromosomes, two putative satellite or B chromosomes 863 (S1-2), each formed mainly two telomeres, might have produced but lost, resulting in 864 chromosome number reduction. 865 Formation of coriander chromosomes. The trajectories to form coriander 866 chromosomes were showed in Fig. 3. By checking carrot and celery chromosome 867 orthology, we inferred the Apiaceae proto-chromosomes R1-10 (Ag1-10). C5 and C7 868 were completely succeeded from their ancestral chromosomes R8 and R5. With the 869 homologous gene dotplot between coriander and celery, we managed to deduce the 870 formation of the other extant 9 coriander chromosomes ( Fig. 3c; Supplementary Fig. 12). 871 R4 and R11 crossed-over to produce two intermediate of C1I and C10I. C10I then 872 crossed over with R1 to produce to C9I and C3I. C9I crossed over R6 to produce C9 and 873 C11. C3I crossed over with a mediate C3II, which was by-produced in the formation 874 process C4 by the crossover between R3 and R7, to produce C3III and C10II. C10II 875 crossed over R9 to generate C10III and C6I. C6I and C3III crossed over to produce C6 876 and an intermediate C3IV. C3IV crossed over C8I, which was generated by the 877 cross-over between R2 and R10 to produce C2, to generate C8 and C3. C10III combined 878 with C1I to form C10 and C1. 879 The Apiaceae proto-chromosomes R1-10 were compared to grape chromosomes to 880 reconstruct karyotypes before and after ω and Apiaceae-α polyploidizations (Fig. 3a). 881 Nineteen grape chromosomes could be used to reconstruct 21 proto-chromosomes of early 882 eudicot plants (A1-A7; B1-B7; C1-C7), tripled from seven pre-ECH proto-chromosomes: 883 E1-E7 (Fig. 3a). Repetitive co-occurrence of the 21 post-ECH chromosomes (represented 884 by grape chromosomes) in the celery chromosomes permitted deductions about the timing 885 of rearrangements. That is, if two or more grape chromosomes showed corresponding 886 homology four times to celery chromosomes, they most likely had merged before the ω 887 (Fig. 3d). In contrast, if two or more grape chromosomes showed corresponding homology 888 only two times in celery chromosomes, they most likely had merged after the ω but before 889 the Apiaceae-α. For example, the post-ECH chromosomes A5, A1, and A2 coincided in 890 each of Ag1, Ag5, Ag6, and Ag8, which could be explained by their fusion into a 891 proto-chromosome P1 before the ω (Fig. 3d). A segment of A5 unexpectedly appearing in 892 Ag9 but not in Ag6 as part of a P1 duplicate could be explained by accidental crossing-over 893 between the P1 duplicate and a P5 duplicate, mainly formed by A6 and the part of B5 (Fig.  894 3e,f). In contrast, A7 appeared twice in homologies with Ag5 (or R5) and Ag8 (or R8), but 895 not in Ag1 (or R1) or Ag6 (or R6), which implied that after ω as part of another 896 proto-chromosome P7, A7 fused with P1, and formed a relatively recent chromosome Q2 897 before Apiaceae-α (Fig. 3d,f). After the Apiaceae-α, Q2 duplicated to produce Q2a and 898 Q2b, with the former crossing-over with an intermediate chromosome R4I to produce R4, 899 and with the latter crossing-over with Q9b (formed by steps of fusion or crossing-over) to 900 make R3 (Ag8) and R8 (Ag8) (Fig. 3g). 901 By checking the homologous dotplot between grape and celery, we managed to 902 deduce the karyotype and proto-chromosome formation before the Apiales 903 whole-genome duplication. Actually, we inferred 8 chromosomes at node P, and found 904 14 step of changes along with their formation. The core eudicot had 21 chromosomes at 905 node H after the whole-genome triplication shared by major eudicots, originated from the 906 ancestral 7 haploid chromosomes at node E (Fig. 3e). After then, Apiales underwent a 907 polyploidization closely and its ancestral genome reorganization significantly from the 908 dotplot between grape and its extant genome (Supplementary Fig. 12e; Supplementary 909 Table 43), from which we could traced back to the details of the formation of the 910 ancestral chromosomes at different significant evolutionary nodes and we finally got their 911 trajectories of its formation of their karyotypes (Fig. 3e-g). During the trajectory from 912 node H to P, the reorganization within this period mainly included 13 times of "end to 913 end joint" signed with "EJ" and two times of crossover signed with a cross and an arrow. 914 A1 and A5 jointed from end to end and formed into the mediate chromosome P1I at step 915 one, which then jointed to A2 triplicated from E2 and got the P1 at the node P at step two. 916 Likely, C3 and C7 also jointed into one mediate and then jointed C2 got an P6VI at step 917 three, which would be used to combined another mediate chromosome and finally formed 918 P6 at node P at step four and eight. Continually, C6 and C5 interacted into two mediate 919 chromosomes (P6I and P6II) and separately attended into two breaches at step five, 920 within which the latter one then jointed with C4 and formed another mediate P6III. While 921 P6I jointed another mediate P2I originated from the crossover between B7 and A3, and 922 they then jointed into P2 at step 8. The by-produced P5I finally jointed with the former 923 P6III and got P6IV, which then joined with B6 and P6VI and finally formed its P6 at step 924 eleven. B3 successively jointed B1, C1 and B4 after three times of end to end joint, and 925 finally formed P3. The left P5 was simply formed by the joint between A6 and B5. 926 Then, during the process from node P to node Q, the ancestral genome changed from 927 8 to 10 and fairly included 6 times of end to end joint during its 8 main steps of 928 reorganizations (Fig. 3f). Likely, we deduced the trajectory from the homologous dotplot 929 between grape and celery, and the homologous blocks showed the clues to reflect their 930 shared homologous parts within Apiaceae-α or ω. After ω, the ancestral chromosome 931 doubled into "a" and "b" right after. From the trajectory from node P to Q, P1a jointed 932 P2a and P8a and formed into Q1 at node Q along with step one and step two. Likely, Q2, 933 Q9, Q10 and Q 3 all generated from two ancestral chromosomes simply end to end joint, 934 while the left ones just completely inherited from their ancestral chromosomes (Fig. 3f). 935 The trajectory from node Q with 10 chromosomes to R with 11 chromosomes was 936 exclusively depicted in Fig. 3g. We totally deduced 19 steps of reorganizations and 937 mainly included 13 times of crossover and 5 times of end to end joint signed with "EJ". 938 Followed with the former trajectory from node P to Q, the genome doubled its 939 chromosomes signed with "a" and "b" after Apiaceae-α. At step one in the trajectory, 940 Q9A and Q10a interacted with each other and formed into R2 and a mediate R4I, which 941 lately got crossover with Q2a and formed into R4 at node R and another mediate R5I. 942 R5I then jointed with Q8a and Q4a and generated R5 at node R along with step three to 943 six. Likely, the following trajectories of the formation of each chromosome were showed 944 at Fig. 3g along with the left steps, and finally formed the genomes appeared at node R. 945 Eventually, we inferred the formation from ECH chromosomes of 8 P chromosomes 946 before the ω, about 10 Q chromosomes after the diploidization following ω and before the 947