Source pools and disharmony of the world’s island floras

Island disharmony refers to the biased representation of higher taxa on islands compared to their mainland source regions and represents a central concept in island biology. Here, we develop a generalizable framework for approximating these source regions and conduct the first global assessment of island disharmony and its underlying drivers. We compiled vascular plant species lists for 178 oceanic islands and 735 mainland regions. Using mainland data only, we modelled species turnover as a function of environmental and geographic distance and predicted the proportion of shared species between each island and mainland region. We then quantified the over- or under-representation of families on individual islands (representational disharmony) by contrasting the observed number of species against a null model of random colonization from the mainland source pool, and analysed the effects of six family-level functional traits on the resulting measure. Furthermore, we aggregated the values of representational disharmony per island to characterize overall taxonomic bias of a given flora (compositional disharmony), and analysed this second measure as a function of four island biogeographical variables. Our results indicate considerable variation in representational disharmony both within and among plant families. Examples of generally over-represented families include Urticaceae, Convolvulaceae and almost all pteridophyte families. Other families such as Asteraceae and Orchidaceae were generally under-represented, with local peaks of over-representation in known radiation hotspots. Abiotic pollination and a lack of dispersal specialization were most strongly associated with an insular over-representation of families, whereas other family-level traits showed minor effects. With respect to compositional disharmony, large, high-elevation islands tended to have the most disharmonic floras.
Our results provide important insights into the taxon- and island-specific drivers of disharmony. The proposed framework overcomes the limitations of previous approaches and provides a quantitative basis for incorporating functional and phylogenetic approaches into future studies of island disharmony.


Introduction
Oceanic islands offer unique opportunities to study the assembly of plant communities. Emerging as sterile landmasses in the open sea, oceanic islands are colonized exclusively by means of long-distance dispersal (Carlquist 1966). In combination with the geographic isolation from other terrestrial ecosystems, this results in a wide range of remarkable biotic features such as high levels of endemism (Kier et al. 2009), species radiations (Givnish et al. 2009), relictual taxa (Cronk 1997), and peculiar shifts in species' ecological strategies (Carlquist 1965). Another well-known expression of the unique conditions under which oceanic island floras assemble is their disharmonic taxonomic composition, i.e. the systematic over- or under-representation of certain taxonomic groups compared to mainland source regions (Carlquist 1965, 1974). Prominent examples of disharmony are the over-representation of ferns and fern allies (Kreft et al. 2010) and the scarcity of orchids (Taylor et al. 2019) on oceanic islands.
Disharmony is generally considered to be the result of selective assembly mechanisms (dispersal filtering, environmental filtering and biotic filtering; Carlquist 1974, Keddy 1992, Weiher et al. 2011, Kraft et al. 2015) that permit only a subset of the mainland flora to successfully colonize an island. In situ diversification in some clades may further accentuate disharmony (Gillespie 2007, Weigelt et al. 2015). Consequently, disharmony relates to key concepts from island biogeography, evolutionary biology and functional ecology, providing a strong foundation for gaining a deeper understanding of assembly processes on islands. While the theoretical underpinnings of disharmony are well established (Carlquist 1974), the practical application of the concept has often been vague, anecdotal and largely non-quantitative (Midway and Hodge 2012). In fact, disharmony is frequently used as an umbrella term to describe any deviation of island assemblages from what is subjectively expected (Ono 1991, Meyer 2004, Francisco-Ortega et al. 2010), but rarely as a predictive framework for testing island biogeographical theory. Transferring the concept of disharmony into modern biogeographical research poses several challenges. First, the traditional focus of disharmony on taxonomic groups needs to be complemented with ecologically more informative classifications based on functional and phylogenetic characteristics to facilitate robust inferences about assembly mechanisms on oceanic islands. Second, the disharmony of an island needs to be viewed relative to its source regions, while excluding other regions that are unlikely sources of colonization. The identification of island-specific source regions is usually based on qualitative (Bernardello et al. 2006) or quantitative (Papadopulos et al. 2011) comparisons between a focal island and a set of predefined mainland regions, but a generalizable, scalable method for macroecological applications is still lacking.
Third, in order to make the concept amenable to statistical inference, a quantitative measure of disharmony is required. Such a measure should ideally not only reflect the differences in taxon representation relative to the source pool, but also the uncertainties around these differences. Finally, identifying general statistical patterns in the geographic, taxonomic and functional distribution of disharmony requires a comprehensive floristic dataset, allowing for the analysis of a wide range of taxa under a wide range of conditions.
Here, we outline a framework that overcomes the abovementioned challenges and enables us to predict island-specific probabilistic source pools and conduct the first global-scale analysis of island disharmony. We define two aspects of disharmony: the over- or under-representation of individual taxa on an island (representational disharmony) and the overall bias of an island flora (compositional disharmony) relative to the source pool. Given the explicit focus of the classical disharmony concept sensu Carlquist (1965, 1974) on higher taxonomic groups, we chose families as the focal taxonomic unit. We analyse representational disharmony as a function of six family-level functional traits reflecting dispersal capacity, pollination syndrome and life history strategy. Furthermore, we analyse compositional disharmony as a function of island area, elevation, distance to the mainland and geological origin. If dispersal filtering is the dominant driver underlying disharmony (Carlquist 1967, 1974), we expect strong effects of dispersal traits on our measure of representational disharmony as well as a positive relationship between compositional disharmony and distance to the mainland. Alternatively, if environmental and biotic filtering play important roles in structuring plant assemblages on islands, we expect strong effects of the corresponding life-history and pollination traits. Finally, if in situ diversification strongly affects patterns of disharmony, we expect higher values of compositional disharmony on large and isolated islands that provide various habitats and arenas for adaptive radiations (Givnish 2010).

Data collection
Data for this study were predominantly sourced from the Global Inventory of Floras and Traits (GIFT) database (Weigelt et al. 2020). GIFT is an effort to mobilize and integrate distributional data from various Floras and checklists with a wide range of information at the level of taxa (e.g. functional traits, phylogenetic relationships) and geographical units (e.g. climate, topography). A detailed account of the structure and workflows underlying GIFT, including an assessment of data coverage and a description of the procedures related to taxonomic standardization and functional trait harmonization, is available in Weigelt et al. (2020).
We extracted plant species checklists from GIFT and evaluated checklist completeness based on the reference type (e.g. multi-volume Floras being more reliable than rapid assessments), specific comments included in the reference (e.g. statements regarding sampling effort, timeframe or use of additional data sources), and general properties of the species list (e.g. plausible number of species for the given area and biome, species-to-genus ratio, presence of regionally important taxa). Checklists with considerable deficits in any of these categories were excluded. We then combined checklists referring to the same geographical unit and, to reduce potential sources of bias in the dataset, excluded geographical units with (1) a combined checklist that does not cover all divisions of vascular plants, (2) fewer than 30 species or (3) an area of less than 1 km². We also excluded land-bridge islands and continental fragments because their floras are mostly the result of vicariance rather than colonization by long-distance dispersal (Duryea et al. 2015). The final dataset contained native vascular plant checklists for 735 mainland regions and 178 oceanic islands (Supporting information Fig. 1).
Family-level functional traits (Table 1 and Supporting information 4) were derived from the botanical literature (Kubitzki 1990-2014, Vamosi and Vamosi 2010, Hawkins et al. 2011, Hintze et al. 2013) or aggregated from species-level information available in GIFT (Weigelt et al. 2020) and the TRY database (Kattge et al. 2011). For categorical traits (woodiness, pollination syndrome, dispersal syndrome), we prioritized the botanical literature and assigned a value if all literature resources indicated the same predominant trait syndrome for a family. If information from the literature was unavailable or conflicting, we assigned a value based on the most frequent species-level trait syndrome within a family. For numerical traits (seed mass, plant height, specific leaf area), we calculated the median of all available species-level trait values per family. To assess the amount of trait variation at the family level and confirm the validity of our approach, we performed comprehensive supplementary analyses (Supporting text 1, Supporting information 3).
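The aggregation rules described above can be illustrated with a minimal Python sketch. It is not the code used in the study (the analyses were run in R); the function names `family_categorical` and `family_numerical` and the example values are hypothetical, and ties among species-level syndromes are resolved arbitrarily here because the text does not specify a tie rule.

```python
from collections import Counter
from statistics import median


def family_categorical(literature_values, species_values):
    """Assign a categorical family-level trait (hypothetical helper).

    Literature takes priority, but only if all sources agree on one
    predominant syndrome; otherwise fall back to the most frequent
    species-level value within the family.
    """
    distinct = set(literature_values)
    if len(distinct) == 1:
        return next(iter(distinct))
    if not species_values:
        return None  # no usable information for this family
    # Assumption: ties are broken by first occurrence in the input.
    return Counter(species_values).most_common(1)[0][0]


def family_numerical(species_values):
    """Median of all available species-level values for a family."""
    return median(species_values) if species_values else None


# Literature sources agree, so the species-level data are ignored.
woodiness = family_categorical(["woody", "woody"], ["herbaceous"])
# Literature sources conflict, so the most frequent species value is used.
pollination = family_categorical(["biotic", "abiotic"],
                                 ["biotic", "biotic", "abiotic"])
seed_mass = family_numerical([1.0, 3.0, 10.0])
```

Using the median rather than the mean keeps a few extreme species-level values from dominating the family-level estimate.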
For the characterization of climatic conditions, we extracted mean annual temperature, mean annual precipitation, temperature seasonality and precipitation seasonality from CHELSA climate layers (Karger et al. 2017). These variables are strong predictors of large-scale patterns in plant diversity (Kreft and Jetz 2007, König et al. 2017) and are widely established measures of energy and water availability. We calculated median values of each variable for all 913 geographical units and, additionally, for all cells in a global equal-area grid (6495 cells at 23 300 km² each) created with the 'dggridR' R-package (Barnes 2018). For islands only, we calculated area and distance to the nearest mainland based on spatial polygons of floristic units in GIFT. Island elevation was extracted from a global digital elevation model (GMTED2010, Danielson and Gesch 2011) and geological origin (volcanic, tectonic uplift, atoll) was researched based on pertinent literature.

Source pool estimation
We based our method for estimating source regions on the fact that geographical distance and environmental gradients produce distinctive and predictable patterns in species turnover (Fitzpatrick et al. 2013, König et al. 2017). Species turnover is a richness-insensitive measure of compositional similarity that quantifies the proportion of shared species between assemblages (Baselga 2010). This makes turnover a crucial concept for delineating biogeographical species pools and source regions (Carstensen et al. 2013).
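As an illustration of why turnover is insensitive to richness differences, the following Python sketch computes the Simpson dissimilarity index for two species lists, using the standard formulation min(b, c) / (a + min(b, c)), where a is the number of shared species and b and c are the numbers of species unique to each list. The function name `beta_sim` and the toy floras are assumptions made for this example.

```python
def beta_sim(flora_a, flora_b):
    """Simpson dissimilarity (species turnover) between two species lists."""
    set_a, set_b = set(flora_a), set(flora_b)
    shared = len(set_a & set_b)      # a: species present in both floras
    unique_a = len(set_a - set_b)    # b: species only in the first flora
    unique_b = len(set_b - set_a)    # c: species only in the second flora
    smaller_unique = min(unique_a, unique_b)
    denominator = shared + smaller_unique
    if denominator == 0:
        raise ValueError("at least one flora is empty")
    return smaller_unique / denominator


# A small flora nested within a larger one shows no turnover,
# although the two floras differ strongly in richness.
nested = beta_sim({"sp1", "sp2", "sp3"},
                  {"sp1", "sp2", "sp3", "sp4", "sp5"})
# Floras without shared species show complete turnover.
disjoint = beta_sim({"sp1", "sp2"}, {"sp3", "sp4"})
```

Because the index is scaled by the smaller set of unique species, 1 minus the index can be read as the proportion of shared species relative to the poorer flora.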
In contrast to existing approaches for the reconstruction of biogeographical source regions (Graves and Rahbek 2005, Papadopulos et al. 2011), our method is based on statistical predictions (rather than pairwise calculations) of species turnover. Consequently, our framework does not require floristic data for the focal area or any of the potential source regions, but only a fitted model of species turnover, which may be calibrated using readily available data, and a set of environmental predictor variables. We used generalized dissimilarity modelling (GDM, Ferrier et al. 2007) to model species turnover ($\beta_{sim}$, Koleff et al. 2003) as a function of geographical distance and differences in mean annual temperature, mean annual precipitation, temperature seasonality and precipitation seasonality (Fig. 1a). We fitted the model using species checklists of mainland regions only (deviance explained = 80.5%), because island floras exhibit strong imprints of ecological filtering, which would mask the very effects we aim to quantify in this study. The calibrated model was then used to predict species turnover between each island and all 6495 mainland grid cells based on geographical and environmental information only (Fig. 1b). Based on the model predictions, we calculated the expected proportion of shared species between each island $i$ and mainland region $j$ as the area-weighted mean of grid cell values, $\tilde{s}_{i,j} = \sum_{k} (1 - \beta_{i,k}) A_k / \sum_{k} A_k$, where $\beta_{i,k}$ is the predicted species turnover between island $i$ and grid cell $k$, and $A_k$ is the area of grid cell $k$ intersecting with mainland region $j$ (Fig. 1c). Mainland regions with very low values, i.e. highly improbable sources of colonization for a given island, were excluded from further calculations (see Supporting text 1 for details).
For each island, we then normalized the predicted values by their total sum to obtain probabilities summing to 1, $s_{i,j} = \tilde{s}_{i,j} / \sum_{j} \tilde{s}_{i,j}$. These normalized probabilities can be interpreted as the likelihood of mainland region $j$ being the source of colonization for a random species on island $i$, and they constitute the probabilistic source pool of island $i$. The expected proportion of a given taxon $t$ on island $i$ can then be calculated as $p_{t,i} = \sum_{j} s_{i,j} \, p_{t,j}$, where $s_{i,j}$ is the source region probability of mainland region $j$ and $p_{t,j}$ the relative proportion of $t$ in $j$.
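The three steps from predicted turnover to the expected proportion of a taxon can be sketched in Python as follows. This is an illustrative sketch rather than the study's implementation: it assumes that the proportion of shared species in a grid cell is one minus the predicted turnover, and the function names, the `threshold` argument (the actual exclusion rule is described in the supporting material) and all numbers are hypothetical.

```python
def shared_proportion(cells):
    """Area-weighted mean of (1 - turnover) over the grid cells of a region.

    cells: list of (predicted_turnover, intersecting_area) tuples.
    """
    total_area = sum(area for _, area in cells)
    return sum((1.0 - beta) * area for beta, area in cells) / total_area


def source_pool(raw_values, threshold=0.0):
    """Drop improbable regions and normalize the remainder to sum to 1.

    threshold is a placeholder for the exclusion rule, not the value
    used in the study.
    """
    kept = {region: v for region, v in raw_values.items() if v > threshold}
    total = sum(kept.values())
    return {region: v / total for region, v in kept.items()}


def expected_proportion(pool, taxon_share):
    """Source-pool-weighted proportion of a taxon across mainland regions."""
    return sum(prob * taxon_share.get(region, 0.0)
               for region, prob in pool.items())


# Two made-up mainland regions for one island.
raw = {
    "region_A": shared_proportion([(0.2, 100.0), (0.4, 300.0)]),
    "region_B": shared_proportion([(0.9, 200.0)]),
}
pool = source_pool(raw)
# The family makes up 10% of the flora of region A and 25% of region B.
p_family = expected_proportion(pool, {"region_A": 0.10, "region_B": 0.25})
```

In this toy case region A receives most of the probability mass, so the expected proportion of the family (about 0.12) lies much closer to its share in region A than to its share in region B.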
To assess the performance of our source region predictions, we compared the results to empirical source region reconstructions from the literature. We found six studies that derive quantitative estimates of island source regions from floristic or phylogenetic relationships (Strahm 1993, Schaefer 2002, Papadopulos et al. 2011, Igea et al. 2015, Carvajal-Endara et al. 2017, Price and Wagner 2018). These studies differed widely in their methodologies and geographical precision, which precluded quantitative comparisons with our estimates. Instead, we performed qualitative comparisons between our predicted source pools and the empirical reconstructions from the literature.

Figure 1. Proposed workflow for constructing island-specific probabilistic source pools and assessing representational and compositional disharmony. Probabilistic source pools were based on predictions of species turnover derived from a generalized dissimilarity model (Ferrier et al. 2007), fitted with geographical and climatic characteristics of 735 mainland floras worldwide (a-c). We then calculated representational disharmony (d-e) for each family on each island as the probability ($D_{t,i}$) of observing the recorded number of species ($n_{t,i}$) given a binomial null distribution $N_{t,i}$ parametrized with the total number of species on the island (number of draws, $n_i$) and the relative proportion of the focal family in the island-specific source pool (probability of success, $p_{t,i}$). Compositional disharmony (e) was aggregated from all values of representational disharmony per island, as the median absolute deviation from the null expectation ($D_{t,i} = 0.5$). Representational and compositional disharmony were then analysed as a function of functional and biogeographical variables, respectively (f).

Quantifying representational and compositional disharmony
Based on the island-specific probabilistic source pools, we modelled the number of species per family on an island as the outcome of a binomial process, $N_{t,i} \sim \mathrm{Binomial}(n_i, p_{t,i})$, where $n_i$ (the number of draws) is the species richness of island $i$ and $p_{t,i}$ (the probability of success) is the relative proportion of $t$ in the probabilistic source pool of $i$. Thus, $N_{t,i}$ is a null distribution reflecting the expected number of species in taxon $t$ on island $i$ if there were no differences in colonization success among taxa (Fig. 1d).
To assess representational disharmony, i.e. whether a given taxon is over- or under-represented relative to the source pool, we contrasted the observed number of species with the statistical null distribution: $D_{t,i} = P(N_{t,i} < n_{t,i}) + P(N_{t,i} = n_{t,i}) / 2$. Our measure of representational disharmony $D$ is a modification of the cumulative probability of $N_{t,i}$ at $n_{t,i}$ (see also Fig. 1), i.e. the probability of observing fewer than $n_{t,i}$ species of taxon $t$ on island $i$ plus half of the probability of observing exactly $n_{t,i}$ species. We divided the probability mass at $n_{t,i}$ by two because, due to the discrete nature of the binomial distribution, $D$ would otherwise be increasingly biased towards 1 as $p_{t,i}$ gets smaller. Moreover, we excluded instances where the expected species richness of a taxon on an island, $E(N_{t,i})$, was lower than 1 to avoid an inflation of our measure by families that are neither expected ($E(N_{t,i}) \approx 0$) nor present ($n_{t,i} = 0$) on an island. Consequently, many rare plant families dropped out of the analyses because they did not reach this threshold on any island (261 out of 474 families). While this seems like a drawback from an ecological point of view, it reflects the uncertainty associated with small sample sizes. Put simply, it is statistically not decidable whether the absence of a family on an island deviates from the null expectation when the null expectation is close to zero. The proposed disharmony metric is similar to a p-value in frequentist hypothesis testing and should be understood as a measure of certainty rather than effect size. A value of $D = 0.5$ indicates a harmonic representation of a taxon relative to the source pool, whereas higher and lower values respectively indicate an over- and under-representation with increasing certainty. A supplementary analysis of the metric's sensitivity to variation in sample size is provided in Supporting text 3.
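The metric described above can be sketched in a few lines of Python. This is an illustrative reimplementation under the stated definitions (binomial null distribution, half of the probability mass at the observed count, exclusion when the expected richness is below 1), not the code used for the analyses; the function names and input values are hypothetical.

```python
from math import exp, lgamma, log, log1p


def binomial_pmf(k, n, p):
    """Binomial probability mass, computed in log space so that
    species-rich islands do not cause numerical overflow."""
    log_coefficient = lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
    return exp(log_coefficient + k * log(p) + (n - k) * log1p(-p))


def representational_disharmony(n_observed, n_island, p_pool):
    """P(N < n_observed) + 0.5 * P(N = n_observed) under the binomial null.

    Returns None when the expected richness n_island * p_pool is below 1,
    mirroring the exclusion rule described in the text.
    """
    if not 0.0 < p_pool < 1.0:
        raise ValueError("p_pool must lie strictly between 0 and 1")
    if n_island * p_pool < 1.0:
        return None
    below = sum(binomial_pmf(k, n_island, p_pool) for k in range(n_observed))
    return below + 0.5 * binomial_pmf(n_observed, n_island, p_pool)


# A family observed exactly as often as expected is harmonic (D = 0.5).
d_harmonic = representational_disharmony(5, 10, 0.5)
# A family that is expected but absent approaches D = 0.
d_absent = representational_disharmony(0, 10, 0.5)
# Expected richness of 0.5 species: excluded from the analysis.
d_excluded = representational_disharmony(0, 10, 0.05)
```

Halving the mass at the observed count makes the measure symmetric around 0.5, which is why the first example returns the harmonic value.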
Compositional disharmony, i.e. the disharmony of an island flora as a whole, was calculated as the median absolute deviation from the null expectation ($D_{t,i} = 0.5$) across all families (Fig. 1e). Thus, the measure of compositional disharmony ranges from 0 to 0.5, where 0 means a perfectly proportional (harmonic) and 0.5 an extremely biased (disharmonic) representation of taxa relative to the source pool.
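As a worked illustration, the aggregation step can be written as a short Python function. The function name and the example values are hypothetical, and excluded families are assumed to be represented by `None`.

```python
from statistics import median


def compositional_disharmony(d_values):
    """Median absolute deviation of family-level D values from 0.5.

    Families excluded from the analysis (None) are skipped.
    """
    valid = [d for d in d_values if d is not None]
    if not valid:
        raise ValueError("no family-level values available for this island")
    return median(abs(d - 0.5) for d in valid)


harmonic_flora = compositional_disharmony([0.5, 0.5, 0.5])
biased_flora = compositional_disharmony([0.0, 1.0, 1.0])
mixed_flora = compositional_disharmony([0.1, 0.5, 0.9, None])
```

Because absolute deviations are used, strongly over- and under-represented families contribute equally, so the measure captures the magnitude of the taxonomic bias but not its direction.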

Statistical analysis
For the analysis of representational disharmony, we log10-transformed and standardized numerical trait variables (seed mass, plant height and specific leaf area) and fitted a beta regression model with representational disharmony as the response and the six family-level functional traits as predictors. Families with missing data in any of these traits (which included all pteridophyte families due to the inapplicability of seed mass) were dropped during model fitting. We considered only additive effects, i.e. we did not investigate interactions among predictors, to maintain a direct relationship between the results and our hypotheses. To model the drivers of compositional disharmony, we log10-transformed island area and standardized island area, distance to the nearest mainland and elevation. We then fitted a linear model to evaluate the effects of these three variables and the geological origin of the islands on compositional disharmony.
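The predictor preprocessing can be illustrated with a small Python sketch. It assumes that standardization means centring on the mean and dividing by the sample standard deviation (as R's `scale()` does by default), which the text does not state explicitly; the function name and input values are hypothetical.

```python
from math import log10
from statistics import mean, stdev


def log10_standardize(values):
    """Log10-transform positive values, then centre and scale them.

    Assumption: scaling uses the sample standard deviation (n - 1).
    """
    logged = [log10(v) for v in values]
    centre, spread = mean(logged), stdev(logged)
    return [(x - centre) / spread for x in logged]


# Seed masses spanning two orders of magnitude become evenly spaced z-scores.
z_scores = log10_standardize([1.0, 10.0, 100.0])
```

Putting all numerical predictors on a common scale is what allows the regression coefficients to be compared directly as effect sizes.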
Models were fitted within a Bayesian framework using the 'brms' R-package (Bürkner 2017). We evaluated model convergence based on parameter trace plots and quantified model fit using a Bayesian version of the R² metric, which can be interpreted as the proportion of variance explained for new data (Gelman et al. 2019). The effect size of predictor variables was assessed based on standardized regression coefficients. All analyses were performed in the R statistical programming language, ver. 3.6.2 (R Core Team).

Source pool estimates
Our source pool estimates showed a strong agreement with empirical source region reconstructions (Fig. 2). Accordingly, most island floras are derived from a limited set of nearby and climatically similar mainland regions. The estimated source regions for La Réunion, for instance, are concentrated in Madagascar and East Africa, which corresponds closely to the account given by Strahm (1993) (Fig. 2b). Similarly, the most likely source regions for the flora of Lord Howe or Cocos Island are restricted to a few regions in Australasia and the Neotropics, respectively (Fig. 2d-e). With increasing isolation from the mainland, however, the distribution of island source regions becomes more diffuse in both the statistical and empirical reconstructions (Fig. 2a, c, f). For example, a wide, circum-Pacific distribution of source regions for the Hawaiian flora emerges in our model predictions as well as in the phylogenetic reconstruction of Price and Wagner (2018) (Fig. 2a). While the accuracy of our estimates seems to decrease slightly for very isolated islands (Fig. 2f), the overall congruency of statistically estimated source pools with empirical reconstructions demonstrates the robustness of our method.

Island disharmony
We estimated representational disharmony for a total of 7048 instances of 213 families on 178 islands (see Supporting information 2 for global maps for each family). We found pteridophyte families to be strongly and consistently over-represented on islands, whereas the representation of angiosperms and gymnosperms was more heterogeneous both within and among families (Fig. 3a). For example, Asteraceae were generally under-represented on islands relative to their source pools, but nonetheless present on about 92% of the investigated islands (Fig. 3a). Notable exceptions in Asteraceae disharmony occurred on the Canary Islands and Cabo Verde Islands, where the family was strongly over-represented (Fig. 3a). Orchidaceae (present on 64% of islands) exhibited a similar pattern of general under-representation with local peaks of over-representation, e.g. on the Mascarenes or in the Gulf of Guinea (Fig. 3a). Other large angiosperm families that tended to be globally under-represented on islands include Fabaceae, Rosaceae, Bignoniaceae and Araceae. In contrast, numerous large families showed a consistent over-representation on islands, including Urticaceae, Convolvulaceae, Cyperaceae and Primulaceae (Fig. 3a). The representational disharmony of most families, however, varied significantly among islands, sometimes exhibiting striking geographical patterns (e.g. Amaranthaceae in Fig. 3a). Gymnosperms were largely absent from oceanic islands, yet often too rare in the source pool to draw statistically reliable conclusions about their over- or under-representation (e.g. Pinaceae in Fig. 3a).
Compositional disharmony, i.e. the median absolute deviation from the null expectation across families on an island, ranged from 0.18 on Rodrigues to 0.49 on New Caledonia (Fig. 3b), with most islands obtaining relatively disharmonic values around 0.35. The geographical distribution of compositional disharmony, however, did not exhibit any obvious patterns associated with geographical gradients.
We found clear statistical relationships between several functional traits and representational disharmony (Fig. 4a). In the multi-predictor model, which did not include the highly over-represented pteridophyte group, two variables emerged as particularly important. First, plant families with a predominantly biotic pollination syndrome were under-represented on islands relative to families with predominantly abiotic pollination. Second, and more unexpectedly, families without specialized dispersal syndrome were over-represented relative to other families, especially to families with predominantly hydrochorous and anemochorous dispersal syndromes. Minor effects on representational disharmony were found for woodiness (positive), seed mass (negative), specific leaf area (positive) and plant height (positive). Owing to the large within-family variation in representational disharmony across islands (Fig. 3a, Supporting information Fig. 2), the full model explained only a small fraction of the total variation in the data (Bayesian R² = 0.022 ± 0.033).
Looking at the level of entire assemblages, island area and elevation were positively associated with compositional disharmony whereas distance to the nearest mainland did not show a clear effect (Fig. 4b). Geological origin did not show a clear effect either, although islands formed by tectonic uplift seemed to have elevated values. The amount of variance in compositional disharmony that could be explained with the four considered variables was moderate (Bayesian R² = 0.168 ± 0.044).

Discussion
This study introduced a general framework for approximating island-specific probabilistic species pools and contrasting the taxonomic composition of island floras against statistical null distributions derived from these species pools. We applied this framework to conduct the first global quantitative assessment of island disharmony and showed that the representation of families on islands is related to family-specific functional traits. Moreover, our results at the level of entire island assemblages suggest that geographical island characteristics and in situ diversification are important co-drivers of global patterns in disharmony.

Taxon- and island-specific drivers of disharmony
Dispersal filtering is commonly regarded as the predominant process in the assembly of island biotas, and therefore as the main driver of disharmony (Carlquist 1966, 1967, Howe and Smallwood 1982). Our results suggest that this explanation is not sufficient to understand the differential representation of taxa on islands. While we did find an under-representation of poor dispersers and over-representation of good dispersers in some prominent cases (e.g. gymnosperms and pteridophytes), the effects of dispersal traits on our measure of representational disharmony were generally inconclusive (Fig. 4a). The tendency of large-seeded families and, especially, of families without specialized dispersal syndrome to be over-represented relative to other families suggests that classical dispersal traits are either less important for colonization success than previously thought, or imprecise proxies of dispersal capacity, particularly at the family level. Indeed, while a negative relationship between dispersal distance and seed mass is evident at small spatial scales up to a few kilometres (Tackenberg et al. 2003), the high stochasticity associated with long-distance dispersal and the many non-standard ways in which propagules can arrive on an island may reduce the strength of this relationship (Higgins et al. 2003, Nathan 2006, Nogales et al. 2012). The significance of dispersal syndromes for island colonization is also under increasing scrutiny (Heleno and Vargas 2015, Carvajal-Endara et al. 2017). Finally, considerable interactions among these traits are to be expected, e.g. some relatively large-seeded species being capable of effective long-distance dispersal by birds or seawater. Such interactions might also explain the slightly negative effect of seed mass on representational disharmony (Fig. 4a), i.e. the tendency of large-seeded families to be slightly over-represented on islands.
According to our results, the importance of biotic filtering equals, if not exceeds, that of dispersal filtering in driving global patterns of representational disharmony. Specifically, we found a strong over-representation of families with abiotic pollination syndromes. Indeed, pollination is increasingly recognized as a critical factor for the colonization of islands (Olesen et al. 2010, Alsos et al. 2015, Grossenbacher et al. 2017, Razanajatovo et al. 2018). Given the general scarcity of animal pollinators on islands, abiotic pollination syndromes and the ability to self-pollinate should provide an advantage over biotic pollination or strict outcrossing (Baker 1955). It is noteworthy that pteridophytes, whose over-representation on islands is often attributed to their long-distance dispersal, are also generally independent of biotic pollinator agents and often capable of selfing (Mehltreter et al. 2010, Groot et al. 2012). An in-depth analysis of how biotic interactions may impact the distribution of taxa on islands has recently been given by Taylor et al. (2019), who argue that the global under-representation of Orchidaceae on islands is possibly due to pollinator limitations and the absence of appropriate strains of mycorrhizal fungi on many oceanic islands.
Traits related to resource acquisition (SLA) and life history (woodiness, plant height) showed overall weak effects. These traits are most likely affected by climatic filtering and interspecific competition, both of which show attenuated levels on islands compared to the mainland (Gillespie and Clague 2009, Weigelt et al. 2013). Consequently, oceanic islands should impose little directional filtering upon these traits and accommodate a wide range of ecological strategies.
While filtering is a subtractive process that reduces the pool of potential colonizers, in situ diversification is an additive process that increases the representation of a taxon subsequent to colonization. Indeed, the exceptional over-representation of Orchidaceae on the Mascarenes, Campanulaceae on Hawaii and Asteraceae on the Canary Islands seems to reflect the signal of diversification. These families underwent striking radiations on the respective archipelagos (Micheneau et al. 2008, Givnish et al. 2009, Juan et al. 2000). We also found positive effects of island area and elevation on our measure of compositional disharmony, which provides further evidence that in situ diversification tends to produce more disharmonic floras on large, environmentally heterogeneous archipelagos, where speciation is expected to occur more often.

Turnover-based source pool estimation
Existing methods for reconstructing biogeographical source regions are typically based on taxonomic or phylogenetic relationships between the focal region and a set of potential source regions (Schaefer 2002, Papadopulos et al. 2011, Price and Wagner 2018). The broad geographical scope of the floristic literature underlying such comparative analyses only allows for the delineation of relatively coarse source regions such as continents, biogeographical regions or countries. However, a more fine-grained understanding of potential source regions is often needed. More advanced methods therefore derive the compositional structure of a given location by means of so-called assemblage-dispersion fields, which are stacked geographical distributions of all species occurring in a focal region (Graves and Rahbek 2005, Carstensen et al. 2013). Such approaches follow the 'predict first, assemble later' strategy outlined by Ferrier and Guisan (2006), where the model quantity relates to the species (e.g. its distribution or environmental niche) while assemblage-level metrics (e.g. species turnover) are derived later from the set of species-level predictions. This modelling strategy is not suitable in our case. On the one hand, it requires large amounts of high-quality data on the distribution of all examined species. This is still beyond reach for many regions and taxa (Hortal et al. 2015, Cornwell et al. 2019) and certainly impractical for global analyses involving tens of thousands of species. On the other hand, there is no straightforward way of dealing with the high rates of endemism encountered on many oceanic islands (Kier et al. 2009) because, by definition, endemic species are absent from all potential source regions.
In contrast, the approach outlined here follows an 'assemble first, predict later' strategy (Ferrier and Guisan 2006), as we first calculate an assemblage-level metric (species turnover) and then model it directly as a function of environmental predictors. While the quality of predictions still depends on how well the model and the input data capture general trends underlying species turnover, comparative studies have shown that assemblage-level approaches consistently outperform species-level approaches when predicting alpha and beta diversity (Zhang et al. 2019). Moreover, predicting source regions from a model of species turnover does not require floristic data for all of the investigated geographical regions, which makes it less data-intensive than species-level approaches (Graves and Rahbek 2005) while offering much finer spatial resolutions than purely checklist-based methods (Papadopulos et al. 2011). The high congruency between our source pool predictions and empirical reconstructions (Fig. 2) confirms the general utility of our approach. However, it must be noted that these predictions are a statistical abstraction of the source pool that does not necessarily match the biogeographical affinities of any actual flora. Our method should therefore not be understood as a replacement for detailed empirical reconstructions based on floristic or phylogenetic relationships (Price and Wagner 2018), but rather as a robust and scalable approximation for macroecological applications.
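As a rough illustration of the 'assemble first, predict later' idea, the following sketch fits a simple log-linear distance-decay model to pairwise mainland data and then predicts the proportion of shared species for island-mainland pairs. The model form, the function names (`fit_distance_decay`, `predict_shared`) and the synthetic data are assumptions made for illustration only; they are not the turnover model or the data used in this study.

```python
import numpy as np

def fit_distance_decay(geo_dist, env_dist, shared_prop):
    """Least-squares fit of log(shared proportion) ~ b0 + b1*geo + b2*env.

    Hypothetical stand-in for a turnover model fitted on mainland pairs only.
    """
    X = np.column_stack([np.ones_like(geo_dist), geo_dist, env_dist])
    y = np.log(np.clip(shared_prop, 1e-6, 1.0))  # avoid log(0)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

def predict_shared(coef, geo_dist, env_dist):
    """Predict the proportion of shared species for new pairs of regions."""
    X = np.column_stack([np.ones_like(geo_dist), geo_dist, env_dist])
    return np.clip(np.exp(X @ coef), 0.0, 1.0)

# Synthetic mainland-mainland pairs (assumed, noise-free for clarity).
rng = np.random.default_rng(0)
geo = rng.uniform(0, 10, 200)   # geographic distance, arbitrary units
env = rng.uniform(0, 5, 200)    # environmental distance, arbitrary units
shared = np.exp(-0.1 - 0.3 * geo - 0.2 * env)

coef = fit_distance_decay(geo, env, shared)

# Predict for one island against a near and a far mainland region,
# then normalise the predictions into relative source pool weights.
island_pred = predict_shared(coef, np.array([1.0, 8.0]), np.array([0.5, 0.5]))
weights = island_pred / island_pred.sum()
```

Because the model is fitted at the assemblage level, island endemics pose no problem at the prediction step: only the distances between the island and each mainland region are needed.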

Methodological and conceptual limitations
The accuracy of source pool estimates is subject to various sources of uncertainty. The assembly of an island flora takes place over millions of years, during which climate, habitat distribution, position, size and shape of both islands (Whittaker et al. 2008, Weigelt et al. 2016) and source regions (Galley and Linder 2006, Pokorny et al. 2015) may change considerably. Another problem is that the effective isolation of an island is difficult to quantify and depends not only on the distance to the mainland, but also on the availability of stepping stones and the direction of predominant sea and wind currents (Cook and Crisp 2005, Weigelt and Kreft 2013). Even different habitats or elevational zones within an island may recruit from distinct source regions on the mainland, and thus vary in their degree of isolation (Steinbauer et al. 2012). We expect such effects of environmental heterogeneity on the accuracy of source region estimates to be most pronounced on large, high-elevation islands, which may partly explain their generally more disharmonic floras (Fig. 4b). More highly resolved floristic input data and additional predictor variables may help to better represent such variation and increase the resolution at which source pools can be meaningfully predicted.
We acknowledge that inaccurate source pool predictions potentially compromise our measure of disharmony. However, both the validation of predicted source pools (Fig. 2) and our results attest to the reliability of the proposed framework. For example, the known over-representation of pteridophytes (Kreft et al. 2010) and under-representation of orchids (Taylor et al. 2019) on islands were clearly reflected in our results. The expected over-representation resulting from in situ diversification could also be detected (Supporting information 2). The problem of asymmetric sensitivity for detecting over- and under-representation was largely solved by excluding families with a very small number of expected species on a given island.
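The logic of contrasting observed family richness against random colonization, and of excluding families with very few expected species, can be sketched as follows. This is a minimal sketch under stated assumptions: the multinomial null model, the quantile-style score, the threshold `min_expected` and the function name `representational_disharmony` are hypothetical choices for illustration and may differ from the implementation used in this study.

```python
import numpy as np

def representational_disharmony(observed, pool_share, n_draws=2000,
                                min_expected=1.0, seed=0):
    """Score each family between 0 (under-) and 1 (over-represented).

    observed   : species counts per family on the island
    pool_share : relative share of each family in the source pool
    Families with fewer than `min_expected` expected species get NaN,
    since under-representation cannot be detected reliably for them.
    """
    rng = np.random.default_rng(seed)
    observed = np.asarray(observed)
    pool_share = np.asarray(pool_share, dtype=float)
    pool_share = pool_share / pool_share.sum()
    n_species = int(observed.sum())

    # Null model: island richness drawn at random from the source pool.
    null = rng.multinomial(n_species, pool_share, size=n_draws)

    # Fraction of null draws below the observed count, ties counted half.
    score = (null < observed).mean(axis=0) + 0.5 * (null == observed).mean(axis=0)

    expected = n_species * pool_share
    return np.where(expected >= min_expected, score, np.nan)

# Toy island with four families (made-up numbers).
scores = representational_disharmony(
    observed=[60, 30, 10, 0],
    pool_share=[0.2, 0.3, 0.495, 0.005],
)
```

In this toy example the first family is strongly over-represented, the third strongly under-represented, and the fourth is excluded because only 0.5 species would be expected under random colonization.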
If our measure of representational disharmony is a reliable approximation of the true over- or under-representation of taxa, why do functional traits explain so little of its variation? First, we quantified the disharmony of families at the level of individual islands (Fig. 3a), which introduced within-family variation in representational disharmony that could not be explained by global family-level traits. We think that this variation is an important and interesting aspect of disharmony, and future work could explore ways to mobilize additional trait data to reflect geographic variation in family-level traits. Second, we characterized numerous families based on a relatively small number of species-level records (Table 1 and Supporting information 3). Missing data is a common problem in trait-based ecology (Penone et al. 2014, Cornwell et al. 2019) and a major source of uncertainty in macroecological studies (Hortal et al. 2015). We reduced the problem of missing species-level data by incorporating traits from botanical descriptions of plant families, but that was not feasible in all cases. The functional characterization of some families was consequently incomplete and potentially inaccurate. Third, and most importantly, our analyses demonstrate that the explicit taxonomic focus of the disharmony concept sensu Carlquist (1965, 1974) is in itself fundamentally limited. The degree to which taxa are consistently over- or under-represented on islands depends on their uniformity in terms of colonization success and, thus, in terms of dispersal abilities, environmental tolerances and degree of biotic specialization. Our supplementary analysis of within-family trait variation shows that these parameters vary considerably in some families, while being highly conserved in others (Supporting text 1, Supporting information 3).
Thus, taxonomy is an unreliable framework for understanding the trait-mediated processes underlying island disharmony, and it is not surprising that examples of disharmonic elements in the scientific literature range from small genera (e.g. Metrosideros in Carlquist 1966) to major taxonomic groups (e.g. pteridophytes in Braithwaite 1975). While our principal aim was to test the classic, taxon-focussed disharmony concept sensu Carlquist (1965, 1974) in a macroecological framework, we argue that future studies should move away from using taxonomic groups as proxies of colonization success.

Towards a more differentiated picture of island disharmony
Our approach can be easily adapted to other research questions and facets of biodiversity. For example, contrasting the distribution of functional traits rather than taxonomic groups against a statistical null expectation derived from the source pool would help to evaluate the relative importance of ecological filters during island colonization more directly. Moreover, such assessments of functional disharmony could be used to better understand the global prevalence of prominent island syndromes such as insular woodiness or the loss of dispersibility (Burns 2019). Sampling approaches would also be easy to implement, e.g. to contrast the phylogenetic structure of island floras with random samples from the potential source pool (Weigelt et al. 2015). In all these cases, the specification of island source regions is key to unbiased comparisons, because most variables of interest change drastically along biogeographical gradients, e.g. pollinator-specificity (Ollerton and Cranmer 2002), seed mass and growth form (König et al. 2019), wood density (Swenson and Enquist 2007) or plant height (Moles et al. 2009). Comparative studies in island biogeography are therefore often either very specific, focusing on a single archipelago with known source regions (Carvajal-Endara et al. 2017), or very general, calculating global averages across many island and mainland assemblages (Grossenbacher et al. 2017, Taylor et al. 2019). The framework we presented here is applicable in global-scale analyses while considering the unique biogeographical setting of individual islands, and could therefore facilitate important additional insights into the assembly of island biotas.
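One conceivable way to quantify such functional disharmony is a standardized effect size that compares the mean trait value of an island flora with random, weighted samples from its source pool. The sketch below is an assumption-laden illustration: the function name `functional_ses`, the use of the trait mean as test statistic, and the toy data are hypothetical and not part of the analyses reported here.

```python
import numpy as np

def functional_ses(island_traits, pool_traits, pool_weights,
                   n_draws=999, seed=0):
    """Standardized effect size of the island's mean trait value.

    Null communities hold as many species as the island and are drawn
    without replacement from the source pool, with sampling probability
    proportional to `pool_weights` (e.g. predicted source pool weights).
    """
    rng = np.random.default_rng(seed)
    pool_traits = np.asarray(pool_traits, dtype=float)
    p = np.asarray(pool_weights, dtype=float)
    p = p / p.sum()
    k = len(island_traits)

    null_means = np.array([
        pool_traits[rng.choice(len(pool_traits), size=k, replace=False, p=p)].mean()
        for _ in range(n_draws)
    ])
    return (np.mean(island_traits) - null_means.mean()) / null_means.std(ddof=1)

# Toy source pool of 500 species with a scaled trait and equal weights.
pool = np.linspace(0.0, 1.0, 500)
w = np.ones(500)

ses_high = functional_ses(pool[-30:], pool, w)  # island holds the largest values
ses_low = functional_ses(pool[:30], pool, w)    # island holds the smallest values
```

Positive values indicate that the island flora has higher trait values than expected from its source pool, negative values the opposite. The same scheme would apply to phylogenetic metrics by swapping the test statistic.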
In conclusion, we demonstrated how representational and compositional disharmony of island floras can be studied within a macroecological framework. While our results provide important insights into the taxon- and island-specific drivers of disharmony, they also highlight the limitations of taxonomic groups to capture the complexity of ecological processes mediated by functional traits. However, the proposed framework can be adapted in various ways, e.g. for quantifying the over- or under-representation of functional rather than taxonomic groups. This may provide a crucial step towards a more quantitative understanding of assembly mechanisms on oceanic islands and other insular systems.