The symmetry spectrum in a hybridising, tropical group of rhododendrons

Summary Many diverse plant clades possess bilaterally symmetrical flowers and specialised pollination syndromes, suggesting that these traits may promote diversification. We examined the evolution of diverse floral morphologies in a species‐rich tropical radiation of Rhododendron. We used restriction‐site associated DNA sequencing on 114 taxa from Rhododendron sect. Schistanthe to reconstruct phylogenetic relationships and examine hybridisation. We then captured and quantified floral variation using geometric morphometric analyses, which we interpreted in a phylogenetic context. We uncovered phylogenetic conflict and uncertainty caused by introgression within and between clades. Morphometric analyses revealed flower symmetry to be a morphological continuum without clear transitions between radial and bilateral symmetry. Tropical Rhododendron species that began diversifying into New Guinea c. 6 million years ago expanded into novel floral morphological space. Our results showed that the evolution of tropical Rhododendron is characterised by recent speciation, recurrent hybridisation and the origin of floral novelty. Floral variation evolved via changes to multiple components of the corolla that are only recognised in geometric morphometrics with both front and side views of flowers.


Fig. S10 Heatmap of Dmin-statistics in Rhododendron sect. Schistanthe.
D-statistics were calculated for all possible trios without assuming any knowledge of relationships; then, the lowest D-statistic for each trio was used as a conservative estimate. P2 and P3 taxa are arranged along the vertical and horizontal axes. Clade designations and numbers are indicated along taxon names: Pse = Pseudovireya, Dis = Discovireya, Mal = Malayovireya, EuA = Euvireya A, EuB = Euvireya B, EuC = Euvireya C, and EuD = Euvireya D. The most significant D-statistic found for two species across all possible P1 taxa is indicated by heatmap cells. Red cells indicate higher D-statistics (presence of introgression), blue cells indicate lower D-statistics (absence of introgression), increasing color saturation indicates greater significance based on P-values, and white cells indicate lack of information.       Methods S1 Library preparation Samples with insufficient initial DNA concentrations were re-extracted as follows. We increased incubation of the sample in Qiagen P3 buffer to 30 min and omitted binding of the DNA to the DNeasy Mini spin column. Instead, we precipitated the cleared lysate from the Qia-shredder column using isopropanol, followed by a second precipitation using ethanol and sodium acetate. We then estimated DNA quantity using a Qubit ® 2.0 Fluorometer (Invitrogen ™ , Life Technologies ™ , Carlsbad, CA, USA).
Low concentration DNAs were digested in multiple reactions, then combined and concentrated using Agencourt ® AMPure XP (Beckman Coulter, Indianapolis, IN, USA). P1 adapter-ligated DNA was pooled and sheared using a Bioruptor ® Pico (Diagenode Inc, Denville, NJ, USA) according to the manufacturer's protocol. Sheared DNA was size selected for 300-to 500-bp fragments by electrophoresis on 1% 1X TAE agarose. After RAD tag enrichment, DNA was size selected again for 300-to 500-bp fragments and verified by qPCR using the KAPA Universal Library Quantification Kit (Roche Sequencing, Pleasanton, CA, USA).

Methods S2 Data processing
We used a number of bioinformatic tools to further process the restriction-site associated DNA (RAD) data. To determine the optimal clustering threshold to use in ipyrad, we followed the method of Ilut et al. (2014). SEED (Bao et al., 2011) was used to remove redundant reads from each sample's data. Custom Python scripts removed sequences with less than four reads. SlideSort v2 (Shimizu & Tsuda, 2010) found matching pairs of reads at distance thresholds from one to 20 under default settings. Transitive clusters were then generated using scripts kindly provided by D. Ilut (Cornell University, Ithaca, NY, USA) and visualized in R v4.0.3 (R Core Team, 2020) using ggplot2 (Wickham, 2016). The optimal clustering threshold was determined based on maximization of clusters with two haplotypes and minimization of clusters with one haplotype (Supporting Information Fig. S1).
Next, we filtered reads again using ipyrad with level two strictness for filtering adapters. We then conducted a reference-based assembly of loci within ipyrad under default settings, a clustering threshold of 0.91, as determined above, and a minimum depth of ten (Ilut et al., 2014) for statistical base calling and majority-rule base calling. To determine the optimum values for maximum number of indels, maximum number of SNPs, and maximum shared heterozygosity allowed per locus, we followed the method of Nieto-Montes de Oca et al. (2017) and visualized our results in R v4.0.3. Based on these results (Supporting Information Fig. S2ac), we allowed a maximum of eight indels, 34 SNPs, and 0.3 shared heterozygosity per locus to obtain alignments for phylogenetic inference.

Methods S3 Phylogenetic analyses
First, we performed a maximum likelihood (ML) topology search in RAxML v8.2.11 (Stamatakis, 2014) using the concatenated alignment of all loci from ipyrad for each of the four datasets (Supporting Information Table S2). We specified Kalmia and Kalmiopsis as outgroups and ran a rapid bootstrap analysis and search for the best-scoring ML tree under the GTRCAT model, using the extended majority-rule consensus tree bootstopping criterion (Pattengale et al., 2009).
Second, we reconstructed topologies for each dataset using the multispecies coalescent model in SVDQuartets (Chifman & Kubatko, 2014, 2015 on a concatenated alignment of one SNP sampled per locus from ipyrad (Supporting Information Table S2). For these analyses, we only included close outgroups from subg. Rhododendron. We conducted analyses in PAUP* v4.0a159 or 4.0a161 (Swofford, 2003) with 100 standard bootstrap replicates, evaluating all possible quartets, building trees using the Quartet FM (Reaz et al., 2014) quartet assembly, using the multispecies coalescent tree model, and distributing ambiguities. We then summarized topologies using Consense v3.6.6 (Felsenstein, 2005) on the CIPRES Science Gateway v3.3 (Miller et al., 2010) to compute a majority rule extended consensus tree.
To assess our two phylogenetic reconstruction methods on each of the four datasets, we used both bootstrap (bs) support and Quartet Sampling v1.3.1 (Pease et al., 2018). First, we calculated the percent of bootstrap values for each topology that had support greater than or equal to 70% to find the highest supported topology from each reconstruction method. We then used Quartet Sampling on each of the four topologies resulting from the four datasets for each reconstruction method. We used the concatenated alignment of loci for each dataset with its respective topologies to obtain enough information for Quartet Informativeness (QI). We conducted Quartet Sampling with 500 replicates under default settings using RAxML-NG v0.9.0 (Kozlov et al., 2019) as the engine. We visualized bootstrap and Quartet Sampling scores on topologies in R v4.0.3 using treeio v1.12.0 (Wang et al., 2020) and ggtree v2.2.4 (Yu, 2020).

Methods S4 Introgression analyses
Reconstructed species relationships within clades varied among datasets and among phylogenetic reconstruction methods. Therefore, we assessed whether admixture could be a factor for incongruences using the D-statistic (ABBA-BABA test) (Green et al., 2010;Durand et al., 2011) as implemented in Dsuite v0.3r21 (Malinsky et al., 2020). We used Dtrios on all SNPs resulting from ipyrad to calculate the D-statistic using three methods under default settings: (1) computing the D-statistic for all possible trios of species, without assuming any a priori knowledge of relationships, then using the lowest D-statistic (Dmin) for each trio as a conservative estimate, (2) inferring species relationships using the frequency of BBAA patterns, then calculating discordant site patterns with the D-statistics (DBBAA), and (3) using the topology from our reconstruction methods, with the most support for clades and their relationships, as a guide for relationships in computing the D-statistic (Dtree) (Malinsky et al., 2018(Malinsky et al., , 2020.

Methods S5 Molecular dating
We used fossil calibration points for Erica, Kalmia, and Rhododendron and a maximum age for the root: 5.33, 15.97, 56, and 89.8 mya, respectively (Collinson & Crane, 1978;Van Der Burgh, 1987;Nixon & Crepet, 1993;Mai, 2001). We first conducted a priming analysis to determine the best optimization parameters, followed by a random subsample and replicate cross validation to determine the optimum smoothing value. We then conducted a thorough analysis in treePL on the bootstrap replicates generated above under default settings, but with a smoothing value of 0.1, gradient based optimizer of 2, auto-differentiation based optimizer of 2, and autodifferentiation cross validation based optimizer of 1. We summarized the dated bootstrap replicates with TreeAnnotator v2.6.0 (Bouckaert et al., 2019) to obtain confidence intervals. However, the confidence intervals were very narrow, so we just report the mean ages in the chronogram.

Methods S6 Morphometric analyses
For the front of the corolla, most outlines were generated by reflecting the half of the corolla that was in best shape or easiest to extract. For the side of the corolla, we attempted outlines either closed (i.e., with a line connecting petal lobes to form an unbroken outline) or open at the mouth. After comparing both, we determined that open outlines preserved the whole side better for some species with reflexed petals. For grouping taxa in morphospaces by corolla color, color was assigned to taxa based on the three categories used by Stevens (1976) to describe floral types in sect. Schistanthe: red, yellow, and white. In cases where a taxon was polymorphic in color, we used the corolla color represented in our sampled taxon.