Biogeographical network analysis of plant species distribution in the Mediterranean region

Abstract The delimitation of bioregions helps to understand historical and ecological drivers of species distribution. In this work, we performed a network analysis of the spatial distribution patterns of plants in south of France (Languedoc‐Roussillon and Provence‐Alpes‐Côte d'Azur) to analyze the biogeographical structure of the French Mediterranean flora at different scales. We used a network approach to identify and characterize biogeographical regions, based on a large database containing 2.5 million of geolocalized plant records corresponding to more than 3,500 plant species. This methodology is performed following five steps, from the biogeographical bipartite network construction to the identification of biogeographical regions under the form of spatial network communities, the analysis of their interactions, and the identification of clusters of plant species based on the species contribution to the biogeographical regions. First, we identified two sub‐networks that distinguish Mediterranean and temperate biota. Then, we separated eight statistically significant bioregions that present a complex spatial structure. Some of them are spatially well delimited and match with particular geological entities. On the other hand, fuzzy transitions arise between adjacent bioregions that share a common geological setting, but are spread along a climatic gradient. The proposed network approach illustrates the biogeographical structure of the flora in southern France and provides precise insights into the relationships between bioregions. This approach sheds light on ecological drivers shaping the distribution of Mediterranean biota: The interplay between a climatic gradient and geological substrate shapes biodiversity patterns. Finally, this work exemplifies why fragmented distributions are common in the Mediterranean region, isolating groups of species that share a similar eco‐evolutionary history.


INTRODUCTION
The delimitation of biogeographical regions or bioregions based on the analysis of their biota has been a founding theme in biogeography, from the pioneer work of Wallace (1876), Murray (1866) or Wahlenberg (1812) to the most recent advances of Cheruvelil et al. (2017); Ficetola et al. (2017).Describing spatial patterns of biodiversity has appeared fundamental to understand the historical diversification of biota, and gain a better understanding of ecological factors that imprint spatial patterns of biodiversity (Graham & Hijmans, 2006;Ricklefs, 2004).Additionally, it has become a key element in the identification of spatial conservation strategies (Funk et al., 2002;Mikolajczak et al., 2015;Rushton et al., 2004).To divide a given territory into meaningful and coherent bioregions, the overall aim is to minimize the heterogeneity in taxonomic composition within regions, while maximizing differences between them (Kreft & Jetz, 2010;Stoddart, 1992).Although such delineation of bioregions has been based for a long time on expert knowledge of qualitative data collection the increasing availability of species-level distribution data and recent techno-logical advances have allowed for the development of more rigorous frameworks (Kreft & Jetz, 2010).Multivariate methods, such as hierarchical clustering algorithms, have thus been successfully applied in a wide range of studies focused on a variety of organisms, under very different spatial scale (from regional to worldwide perspective).Yet, the production of detailed cartographic outputs portraying the differentiation of vegetation into distinct homogeneous bioregions remains difficult, especially where spatial heterogeneity of assemblages is associated with complex environmental gradients (Mikolajczak et al., 2015).Besides, the identification of meaningful and coherent bioregions represents only one step of the biogeographical regionalizations (Morrone, 2018).It is also crucial to propose new metrics to quantify the relationship between bioregions and to analyze species and spatial relationships.Some regions of the world oppose inherent difficulties due to their highly diversified biota, reflecting complex eco-evolutionary processes.The Mediterranean basin is one of the largest and most important biodiversity hotspots in the world (Blondel et al., 2010;Myers et al., 2000).This region hosts about 25,000 plant species representing 10% of the world's total floristic richness concentrated on only 1% of the world's surface (Greuter, 1991).Additionally, a high level of narrow endemism is a major feature of this biome (Thompson, 2005).Endemism and richness re-arXiv:1803.05275v4[q-bio.PE] 8 Feb 2024 sult in a very heterogeneous region, whose comprehension of spatial patterns of plant distribution is clue to get better insights into past and actual processes shaping biodiversity (Quézel, 1999).The onset of the Mediterranean climate during the Pliocene and the diverse glacial periods of the Pleistocene (Quézel & Médail, 2004) have shaped the most important phases of plant evolution since the Tertiary (Thompson, 2005).Additionally, due to a long history of human presence, contemporary flora has been widely influenced by human-mediated dispersal, land-use and other pressures (Dahlin et al., 2014;Fenu et al., 2014).The French Mediterranean area stretches from the Pyrenees in the south-west to the slopes of the Maritime and Ligurian Alps in the east.It encompasses three zones highlighted as glacial refugia (Médail & Diadema, 2009), and the eastern sector represents one of the ten main biodiversity hotspots in the Mediterranean area (Médail & Quézel, 1997).This area represents the northern limit of the Mediterranean climate in the western basin, and thus constitutes a climatic transition from a Mediterranean zone that has a summer drought to a temperate zone less prone to summer drought (Walter & Breckle, 19911994).On a finer scale, the climate is more complex with several subtypes and intricated boundaries (Joly et al., 2010;Tassin, 2017).Several works have tried to map the distribution of biogeographical entities.To date, no statistical analysis had been ran to tackle those expert-based maps with up-to-date plant records, in order to test their reliability.
In order to depict spatial structure in such a complex regional flora, a large dataset is required.While the level of diversity and complexity of such dataset may appear overwhelming at first glance, the emergence of network-based approaches has opened new paths for identifying and delimiting bioregions where the presence-absence matrix is represented by a bipartite network.For example, Kougioumoutzis et al. (2014) applied the NetCarto algorithm (Guimerà & Nunes Amaral, 2005) in order to identify biogeographical modules within the phytogeographical area of the Cyclades.Similarly, Vilhena & Antonelli (2015) proposed a network approach for delimiting biogeographical region based on the InfoMap algorithm (Rosvall & Bergstrom, 2008).By embedding species distributional data into complex networks, these methods have the great advantage to be generic, flexible and to incorporate several scales in the analysis.Most importantly, these methods integrate species community and spatial units within a single framework, which allow to test the relative contribution of each taxa to bioregions depicted, and to represent the relationship between those bioregions based on those contributions.
In this study, we present a biogeographical network analysis of plant species distribution in the French Mediterranean area at different scales.The French Mediterranean territory represents an interesting study area to test new approaches, given the excellent knowledge of the spatial distribution of the plant species revealed by botanical inventories (Tison & Foucault, 2014;Tison et al., 2014) and the detailed databases compiled by the French National Botanic Conservatory of Porquerolles and the Alpine National Botanic Conservatory.The objective of this work is to delineate bioregions, identify groups of species and analyse the relationships between the two entities.

Dataset and study area
The study area, situated in southern France, encompasses the former Languedoc-Roussillon region (five departments of the current Occitanie region: Pyrénées-Orientales, Aude, Hérault, Gard and Lozère) and the whole Provence-Alpes-Côte d'Azur region.It extends around the entire Mediterranean coastline of mainland France and inland, comprising almost all the Mediterranean hinterland, totalling 558, 776 km 2 (Figure 1).The topography is structured by three major mountain ranges, the Pyrenees in the southwest, the Massif central in the northwest and the Maritime Alps in the north-east.Inbetween, the landscape is mostly hilly with some lowlands around rivers that flow into lagoons or marshy deltas such as the Camargue.The Rhône is the main structuring river and delimitates western and eastern subregions.Acidic substrates and silicate soils are mainly found in the aforementioned mountain ranges and in the smaller Maures-Estérel range in southern Provence.The remaining part of the territory is dominated by calcareous or marly substrates (principally Cretaceous and Jurassic), with some significant alluvial zones and small volcanic areas.
The SILENE database1 , has been created in 2006, and is the reference botanical database in the study area.It contains historical data gathered from the scientific literature and herbaria along with more recent data coming from public studies, partnerships, local amateur botanist networks and professional botanists of the Botanical Conservatory.Our analysis is based on a 5 × 5 km 2 grid cells.We decided to only retain data whose georeferencement precision is below 10 meters.While the SILENE database contained nearly five million observations at the date of the export (June 2016), we deleted several taxa whose distribution is still insufficiently known and could distort the results (e.g.apomictic taxa such as Rubus or Hieracium).For the same reason, we also aggregated all sub-taxa at the species level.The final dataset results in 4, 263, 734 vegetation plant samples corresponding to 3, 697 plant species.We divided the study area using a UTM grid composed of 2, 607 squares of lateral size l = 5 km.In order to assess the impact of the spatial resolution on the results (Divíšek et al., 2016;Lennon et al., 2001), we also applied the aforementioned biogeographical network analysis with a grid composed of squares of lateral size l = 10 km (see Figure S2 and Table S1 in Appendix for more details).
Biogeographical network analysis 1. Biogeographical bipartite network.Delineating bioregions requires a link between the species studied and their spatial environment.This link is usually identified with presence-absence matrices where each row represents a grid cell and each column a species.The region of interest is usually divided into grid cells, the resolution of which depends mostly on the size of the study area, the taxonomic group under study and the accuracy of the data.According to the type and quality of data, but also to the research question, the species relevé can be aggregated both spatially or by group of species.Another way of formalizing complex interactions between species and grid cells is to build a biogeographical bipartite network.This bipartite network enables us to model relations between two disjoint sets of nodes, grid cells and species (in our case), which are linked by the presence of a species (or a group of species) in a given grid cell during a certain time window (Step 1 in Figure 2).This way of understanding complex interactions makes it possible to visualize and analyze complex spatio-ecological systems as a whole from individual interactions to local and global biogeographical properties.

Delineating bioregions.
To identify bioregions we projected our biogeographical bipartite network on a spatial template (Step 2 in Figure 2), by defining a metric to measure the similarity of species composition between grid cells.Several measures based on beta diversity have been proposed to quantify the degree of (dis)similarity between grid cells, typically taking into account the number of shared species between grid cells (Koleff et al., 2003;Wilson & Shmida, 1984).These measures are mostly based on presenceabsence data and aim at quantifying species turnover and species nestedness among grid cells, together or separately (Baselga, 2012).Although this indicator may be influenced by gradients in species richness (Baselga, 2012;Dapporto et al., 2015;Lennon et al., 2001), results obtained with the Jaccard index were more spatially coherent in our case.
The resulting network is a weighted undirected spatial network whose intensity of links between grid cells range from 0, absence of a link (no species in common) to 1 (identical species composition).The detection

test-value Grid cells
Plant species Steps of the biogeographical network analysis.1. Biogeographical bipartite network where grid cells and species are linked by the presence of a species (or a group of species) in a given grid cell during a certain time window.Note that there is no link between nodes belonging to the same set.2. The bipartite network is then spatially projected by using a similarity measure of species composition between grid cells.Bioregions are then identified with a network community detection algorithm.3. The test-value matrix based on the contribution of species to bioregions is computed.4.Then, a network of similarity between species is built, based on the test-value matrix.Groups of species sharing similar spatial features are identified using a community detection algorithm.5. Finally, a coarse-grained biogeographical network unveiling the biogeographical structure of the studied area and the relationship between bioregions is obtained.
of community structure in biogeographical networks is an interesting alternative approach to delineating bioregions (Kougioumoutzis et al., 2014;Vilhena & Antonelli, 2015).Community structure is indeed an important feature, revealing both the network internal organization and similarity patterns among its individual elements.In this study we used the Order Statistics Local Optimization Method (OSLOM) (Lancichinetti et al., 2011).OSLOM uses an iterative process to detect statistically significant communities with respect to a global null model (i.e.random graph without community structure).The main characteristic of OSLOM is that it is based on a score used to quantify the statistical significance of a cluster in the network (Lancichinetti et al., 2010).The score is defined as the probability of finding the cluster in a random null model.The random null model used in OSLOM is the configuration model (Molloy & Reed, 1995) that generates random graphs while preserving an essential property of the network: the distribution of the number of neighbors of a node (i.e. the degree distribution).Therefore, the output of OSLOM consists in a collection of clusters that are unlikely to be found in an equivalent random network with the same degree sequence.This algorithm is nonparametric in the sense that it identifies the statistically significant partition, without defining the number of communities a priori.However, the tolerance value that determines whether a cluster is significant or not might play an important role for the determination of the clusters found by OSLOM.The influence of this value, fixed initially, is however relevant only when the community structure of the network is not pronounced.When communities are well defined, as it is usually the case in biogeography, the results of OSLOM do not depend on the particular choice of tolerance value (Lancichinetti et al., 2011).See Lancichinetti et al. (2011) for a comparison between OSLOM and other community detection algorithms.

Test-value matrix.
To analyse the bioregions and their species composition, we rely on test-values measuring the under-or over-representation of species in a bioregion.Let us consider a studied area divided into n grid cells, a species i present in n i grid cells and a biogeographical region j composed of n j grid cells.The test-value compares the actual number of grid cells n ij , located in biogeographical region j and supporting species i, with the average number n i n j /n that would be expected if the species were uniformly distributed over the whole studied area.Since this quantity depends on n i and n j it is normalized by the standard deviation associated with the average expected number of grid cells (Lebart et al., 2000).
The test value ρ ij is then defined as, (1) The test value ρ ij is negative if the species i is under-represented in region j, equal to 0 if the species i is present in region j in the same proportion as in the whole study area or positive if the species i is over-represented in region j.In the latter case we consider that the species i contribute positively to region j and the level of contribution depends of the ρ ij value.Additionally, we consider that a plant species contribute positively and significantly to a bioregion j if ρ ij is higher than a predetermined significance threshold δ.
Hence, The test-value matrix ρ can be used to highlight set of species which better characterize the bioregions.The test-values are easy to interpret by specialists and represent an user-friendly way of ranking species according to their relevance.
4. Groups of species.The next step is to identify how similarities between species are spatially distributed across the study area.Here also we build a network in which the similarity s ii ′ between two species i and i ′ is equal to, This similarity metric is based on the Euclidean distance between test-values for each pair of species.Again, the community detection algorithm OSLOM is used to detect significant groups of species sharing the same spatial features (Step 4 in Figure 2).This step produces a preliminary delimitation of the relationships between bioregions by identifying how the groups of species contributes to one or several bioregions.

Coarse-grained biogeographical network.
To quantitatively characterize relationships between bioregions, we retained only the positive and significant species contributions by considering only testvalues higher than δ = 1.96 (5% significance level of a Gaussian distribution).
Then, since we are interested in interactions between bioregions we focused on the way species contributions are distributed among regions by normalizing ρ + by row (Equation 4).
We then determined for each bioregions j how the set of species A j = {i | ρ ij > 1.96} that contributes to this biogeographical region are specific to it or also contribute to other regions (Equation 5).
λ jj ′ represents therefore the average fraction of contribution to cluster j ′ of species that contribute significantly to cluster j.The specificity of a biogeographical region is therefore measured with λ jj , while the relationships with other regions is given by λ jj ′ .It is important to note that for a given region j the vector λ j. sum to one and can be expressed in percentage.
At the end of the process, we obtain a coarsegrained biogeographical network summarizing the biogeographical structure of the study area.This network is composed of the bioregions and the species groups (Step 5 in Figure 2).All the metrics used to measure the similarity between the different bioregions are derived from the matrix of test-value ρ.

Biogeographical bipartite network
The bipartite network extracted from the database is composed of 2, 607 5 × 5 km 2 grid cells and 3, 697 plant species, where the links represent the occurrence of plant species in the grid cells.Two network degree distributions can be associated to this network: the number of species per grid cell and the number of cells covered by each species.The probability density functions of these two distributions are displayed in Figure 3.The spatial component of the network is very dense.Most of the grid cells host between 200 and 500 plant species, with an average of 360 species per cell (i.e.∼ 15 species/km 2 ).For species side, the situation is different; the majority of plant species cover less than 10% of the study area, which highlight the importance of range restricted taxa.Nevertheless, the distribution exhibits a long tail with a non-negligible number of widespread species.

Delineating bioregions
We identified eight statistically significant bioregions reflecting the biogeographical structure of the French Mediterranean area based on plant species distribution (Figure 4).Clusters size vary from 120 to 807 square cells.Clusters are spatially coherent, ex-hibiting a connectivity measure always higher than 0.5 (i.e.ratio between the number of grid cells in the largest patch and the total number of grid cells (Turner et al., 2001)).Results obtained are not scale sensitive, and the spatial coherence of each cluster according to the scale (l = 5 and 10 km) can be found in Table S1 in Appendix.It also important to note that this step can also be performed with standard hierarchical clustering methods.The results obtained with Ward's clustering are available in Appendix.

Groups of plant species
The test-value matrix can be used to identify plant species that contribute positively and significantly to one or more bioregions.It is worth noting that the number of contributions and their intensity vary among species.Indeed, some species contribute very little to only one region while other species contribute significantly to three or more regions.The number of species contributing to a given number of regions depends on the significance threshold δ.A very small and negative value of δ will imply that almost all plant species contribute significantly to the 8 bioregions.In contrast, a very high value of δ will result in all species contributing to no regions.In order to get a better understanding of species contribution mechanisms and to assess the influence of δ, we plot in Figure 5 the fraction of species contributing positively to a given number of bioregions as a function of a significance threshold value.If we consider the default threshold δ = 1.96, that corresponds to a 2.5% significance level of a Gaussian distribution, we observe that the vast majority of plant species contributes positively to one or two regions representing 35% and 45% of species, respectively.There is also 20% percent of plant species that contribute to three or more bioregions.If we increase the minimum level of contribution necessary to claim that a species contributes to a region, we see that the fraction of species contributing to two or more bioregions dramatically decreases while the fraction of species with no contribution increases.However, it is interesting to note that the fraction of species contributing to one region to increases until reaching a plateau.This demonstrates that 50% of plant species are strongly connected to a single region.The similarities between plant species' contribution to the 8 regions allowed us to identify 20 groups of species, and their contribution to each bioregion is displayed in Figure 7.We observed different patterns of contributions in terms of shape and intensity.This allows for the identification of groups of species sharing similar spatial features and highlights relationships between bioregions through the way plant species contribute to different group of regions.

Relationships between bioregions
This leads us to the study of relationships between bioregions.The network of interactions λ derived from the test-value matrix is plotted in Figure 6.We found that, globally, plant species contributing significantly to a region contribute mostly to this region, with an average specificity of 51% across the eight bioregions.It must be pointed out however that some regions are more specific than others with λ jj values ranging from 40% to 65%.
Analysis of how bioregions connect with each other showed that there is no isolated region in the sense that every region is connected with at least one other region with a λ jj ′ value varying from 1 to 28%.Moreover, for all regions, the maximal λ jj ′ value is always higher than 10%.Although it is generally the case, it is also worth mentioning that the relationships are not necessarily symmetric, which represents an interesting way of detecting hierarchical relationships.A table displaying all λ jj ′ values is available in Table S4 in Appendix.λ jj ′ , expressed here in percentage, represents the average fraction of contribution to cluster j ′ of species that contribute significantly to cluster j.Only links with a value λ jj ′ higher than 10% are shown.

DISCUSSION
In this study we delineate spatial bioregions in southern France, a transition area between a mediterranean and temperate climate.The present analysis represents to our knowledge one of the largest network-based studies published to date, relying on a database containing more than four million data points across a territory of about 558, 776 km 2 .While this territory has been divided into bioregions on expert knowledge, we confront those approaches to datadriven classification, and discuss the coherence of the  S3 in Appendix.
different perspectives.We delineated eight statistically significant bioregions, which we will first present in relation to previously published work, and emphasize their specificity regarding associated groups of species.We discuss the observed spatial patterns in terms of ecological and historical drivers, to provide insights into mechanisms driving the assemblage of vegetation communities.

Bioregions
The clustering approach identified eight statistically significant spatial clusters, that represents coherent territories detailed below.
Regions are presented from Mediterranean toward temperate and mountainous climates.
1. Gulf of Lion coast is a bioregion that extends west of the Rhône, penetrating more inland around the wetland of the Rhône Delta.The latter, along with the Languedoc lagoons, is frequently used as an example of azonal vegetation (Ozenda, 1994), and the originality of the flora and the vegetation of these areas has long been recognized (Molinier & Tallon, 1970).Some subdivisions have been suggested separating, even at a coarse scale, the sand-dune complex, the halophytic vegetation and the salt meadows (Bohn et al., 2000), but were not found here probably due to the size of the cells we used.From a geological point of view, this bioregion is essentially made of sand dunes, lagoon sediments and modern alluvium.It is entirely situated under a Mediterranean climate, in the mesomediterranean climatic belt, with a dry season of two or three months in the summer (Rivas-Martínez et al., 2004a).Taxa specific to this cluster exhibit a distribution following the Mediterranean coastal area, extending in some cases towards other coastal areas or to arid inland zones.They are mostly encountered in halophytic communities and surprisingly not that much into dunes, suggesting that the key factor defining this bioregion might be the saline soils rather than the coastal position alone.
Several narrow-endemics rely on those habitats, especially in the genus Limonium whose rapid radiation is typical of mediterranean neoendemics (Lledó et al., 2005).
2. Cork oak zone encompasses the Maures-Estérel range and neighbouring areas.West of the Rhône, it is fragmented with cells in the eastern tip of the Pyrenees (low Albères and the Roussillon lowlands), plus a few more sparsely dispersed zones in Languedoc.The Provence and Albères areas have been identified by phytogeographers (Ozenda, 1994;Ozenda & Lucas, 1987) as the "Cork oak zone", a silicicolous warm mesomediterranean area.Indeed, climatic data show a clear summer dry period of one to two months.Almost all of the cells contain acidic soils over a variety of substrates (granites, gneiss, schists, sandstones, alluvial deposits, etc.).Species most linked to the "Cork oak zone" have a Mediterranean distribution, with some extending towards the Atlantic area.Characteristic species have ecological preferences for acid soils, and belong to various vegetation stages (forest, scrub or grassland formations).

3.
Mediterranean lowlands bioregion covers the hinterland of the Gulf of Lion from the Roussillon to western Provence.Several authors have individualized an arc shaped mesomediterrean zone (Dupias & Rey, 1985;Ozenda, 1994) but their limits do not fit exactly ours.The closest match is the catalonian-provençal mesomediterranean holm oak forests unit of the European natural vegetation map (Bohn et al., 2000).The area is principally composed of sedimentary rocks (mostly limestones and marls) and alluvium.Its climate is Mediterranean (Rivas-Martínez et al., 2004a), with a summer dry period of one to three months.With few exceptions, species most linked to this bioregion have a distribution included in the Mediterranean region (Rivas-Martínez et al., 2004b).Most of them belongs to communities of the Quercetea ilicis or of the former Thero-Brachypodietea, i.e. the matorral / forest and grasslands communities making up the landscape locally called "garrigues".The other part of these taxa is usually found in disturbed communities, showing the strong incidence of human activities in this area.
4. Mediterranean border is a bioregion whose northern edge roughly follows the limit of the Mediterranean world as it is usually depicted (Dupias & Rey, 1985;Quézel & Médail, 2004).It broadly coincide to what has been called a supramediterranean belt (Ozenda, 1994) or a submediterranean zone (Bólos, 1961), and fits quite well with four mapping units of the Map of the natural vegetation of Europe (Bohn et al., 2000); namely the catalonian-provençal supramediterranean holm oak forests and three types of downy oak forests (ligurian-middle apennine, languedocian and those extending from the southern Pyrenees to the southwest pre-Alps).The substratum of this area is mainly calcareous and marly.This area has a short (one month) summer drought period with the exception of some Var and Alpes-Maritimes places where the summer drought is more pronounced (two months).Species most linked to this bioregion present a western eury-mediterranean distribution, and share a common ecology, occurring frequently in communities belonging to the Helianthemo italici-Aphyllanthion monspeliensis and to a lesser extent to the Ononidetalia striatae (Gaultier, 1989;Rivas-Martínez et al., 2002), i.e. dry dwarf scrubs and their associated grasslands on calcareous and marly eroded soils (Mucina et al., 2016).
5. Cévennes sensu lato is a bioregion to which most of the cells are situated in the Cévennes areas, while the remainder is scattered over the eastern Pyrenees piedmont and the Montagne Noire (southern limit of the Massif Central).This spatial cluster overlays four zones of the phyto-ecological regions (Dupias & Rey, 1985): the lower Cévennes, the "warm" Cévennes valleys, the Aspres and the chestnut zone of the southern edge of the Montagne Noire.The Cévennes proper part of this cluster has also been identified by other authors (Braun-Blanquet, 1923;Ozenda, 1994) and putative glacial refugia has been positioned there (Médail & Diadema, 2009).This area is not subject to a summer drought and covers siliceous substrata such as schists, granites or gneiss.Taxa exhibiting the strongest link to this biogeographical region are either Cévennes endemics, subendemics (Dupont, 2015;Lavergne et al., 2004) or plants with a more or less Atlantic distribution (Dupont, 2015), but no clear ecological pattern is emerging among these taxa.

Subatlantic mountains
The largest area covered by cells of this biogeographical region is the northern part of the Lozère department.The remaining cells are mostly distributed in the Massif Central and in the Pyrenees.These areas belong to the beech (Fagus sylvatica L.) montane belt (Bohn et al., 2000;Ozenda, 1994) with a few exceptions where Scots pines (Pinus sylvestris L.) dominate.It corresponds to the predominantly siliceous subatlantic type (Ozenda & Lucas, 1987), where the climate is rather wet, with precipitations frequently exceeding 1,000 mm per year and no dry period.Thus, wetlands and bogs are not rare, and the substratum is made of igneous rocks which explain the acidic nature of the soils.The majority of the taxa most linked to this spatial cluster are generally distributed all over the eurosiberian region or the western part of this region, corresponding to a subatlantic distribution (Dupont, 2015;Rivas-Martínez et al., 2004b).Interestingly, most of those plants grow in wetlands habitats, a trend already noticed in the Massif Central (Braun-Blanquet, 1923).
7. Pre-Alps and other medium mountains represent a bioregion whose cells are disseminated through the lower parts of the eastern Pyrenees including almost all the Pyrenean part of the Aude department, through the highest areas of the Causses, around the Mont Ventoux and through the most eastern part of the Pre-Alps.This area has rarely been individualised in such a way even if at a European scale it can be related to several more or less calcicolous beech or fir-beech forest belts (Bohn et al., 2000) (Abies alba L. and Fagus sylvatica L.), or more specifically, for the Var department, to a pre-alpine district (Lavagne, 2008).Most of the rock underlying this area is calcareous.Climatically, we are outside of the Mediterranean climate as there is no dry period.The distribution of taxa most linked to this biogeographical region is basically holarctic, avoiding the Mediterranean parts of Europe.Some of these taxa also avoid the most Atlantic part of the continent.Their ecology is varied, pertaining to different stages (grasslands, shrubs, forests) of mountain vegetation series, often (but not systematically) calcicolous.

High mountains
This bioregion regroups the highest part of the Alps and the Pyrenees.If most authors agree on individualizing the upper vegetation belts of these mountain ranges, its unity and the common points are less often identified (Ozenda, 2002).Both calcareous and acidic soils are to be found in this area.Cells of this region are the coldest of our study area, and there is no dry period: the climate is relatively harsh and the vegetation period is reduced (Ozenda, 2002) compared to the other clusters.Taxa most linked to this region are mainly European mountains endemics, venturing also in the Arctic.They belong to grasslands or snowbeds communities, which is consistent with their occurrence on the highest ranges.

Defining the Mediterranean region
At a global scale, the delimitation of the Mediterranean border has been a long running question (Latini et al., 2017), and the mismatch of the numerous attempts attest to the difficulties .In France, the first attempt goes back to third edition of the Flore Française by Lamarck & Candolle (1805), as shown in Ebach & Goujet (2006) followed by several other works such as Flahault & Durand (1887), who considered the distribution limit of the olive tree (Olea europaea L.) as a marker of the Mediterranean biome.This was later generalized to the evergreen oak belt (Quézel, 1999), but it appeared that the situation was more complex (Quézel & Médail, 2004).Thus, variability in results has not lead to a comprehensive framework yet.This has several implications regarding conservation programs, as the delimitation mentioned by European legislation has been used as a reference to delimit the distribution of several protected habitat2 .In this study, the network approach allowed to discriminate two "sub-networks" with little exchange regarding species composition and different relative contribution to each area, which globally relate to a temperate and a Mediterranean sub-groups.Several earlier bioregionalizations in the mediterranean basin have failed to separate mediterranean from eurosiberian ensembles, suggesting this boundary would be highly permeable (García-Barros et al., 2002;Saiz et al., 1998) and easily crossed by species.Here, the use of a precise dataset coupled with a network analyse has proven to be relevant to depict such spatial transition, which reinforce the need to gather coherent dataset to characterize complex and intricate spatial structures.This biogeographical boundary has been linked to a change in the annual distribution of precipitation, which induces a prolonged summer drought and a stronger climatic seasonality in the mediterranean (Antonelli, 2017).At a finer scale, the three mediterranean clusters present a high spatial coherence, and closely fit to the mesomediterranean thermoclimatic belt (Rivas-Martínez et al., 2004b) (see Figure S11 in Appendix).The high congruence between climatic model (Rivas-Martínez et al., 2004b) and biogeographic entities has never been pictured by previous bioregionalization works (see Appendix for maps), as most of them presented a wider definition of the mediterranean biome, extending northward.Then, the absence of orogenic barriers along this climate-based distinction is likely to produce shallow boundaries typical of transition areas (Antonelli, 2017;Ficetola et al., 2017) exemplified here by the cluster "Mediterranean border" that contains all historical attempt to delimitate the mediterranean biome.West of the Rhone, this region is relatively thin and fence around the mesomediterranean ensemble; east of the Rhone, it occupies a wide area on the Alpine piedmont.Thus, instead of drawing a single line (Cox, 2001), we propose to identify a transition area (Droissart et al., 2017;Latini et al., 2017) with an upper boundary as the limit of the Mediterranean biome (Antonelli, 2017).

Vicariance and fragmentation among bioregions
The relationship between bioregions can be seen through the understanding of species relative importance in each area.First, the regions "Gulf of Lion coast", "Cork oak zone" and "Mediterranean lowlands", all included within the same bioclimatic belt (Rivas-Martínez et al., 2004a), differ mostly on substratum, i.e. calcareous (bioregion 3), siliceous (bioregion 2) or quaternary deposits (bioregion 1).Thus, they are well defined and little uncertainty exists concerning their spatial configuration (Figure S15 in Appendix); those three entities can be seen as climatic vicariant bioregions which have conjointly developed 04/07/2018) on different geological substrates, or "islands".As a result, they share an important pool of species, and present the highest complementarity in the network, as they are the only three clusters all related to each other.In contrast, the relationship between the "Cork oak zone" and the "Cévennes" exemplify the opposite process: those two areas share a similar bedrock (mainly acidic substrate) but are located at each extreme of the Mediterranean climatic gradient.While the "Cork oak zone" is present under hot and dry mesomediterranean climate (some coastal cells even belonging to a thermomediterranean belt), the "Cévennes" present a higher impluvium and a very weak summer drought.Consequently, they share a common set of species which interestingly are typical of the "Cévennes" cluster, and extend into the "Cork oak zone".Noteworthy point, those population can constitute relictual rear edge populations, which often retain particular interest for conservation (Hampe & Petit, 2005;Lavergne et al., 2006).
Finally, the "Pre-Alps" and "High mountains" bioregions are both present within the three mountain chains, and occupy climatic conditions with no dry period at all, and especially harsh prolonged winter for the second.Several species groups are highly informative for both of those bioregions, which signify that they share an important group of species globally adapted to mountain environment."High mountains" present the highest percentage of typical species.Yet, within the numerous plant species groups characterizing those entities (5 groups in Figure 7), the relative contribution of each toward one or the other bioregion might differ slightly, sometimes in association with another bioregion such as the "Mediterranean border" (Figure 7).This illustrates that groups of taxa are unevenly important across these two regions, probably reflecting the complex geological substrate.Thus, while our analysis reflect an overall homogeneity of mountain flora mainly driven by climate, it is likely that finer divisions based on a more precise study could be expected.This has been pinpointed by Bohn et al. (2000) who pictured a high local heterogeneity due to steep altitudinal gradients and geological diversity, despite some vegetation groups shared between the Alps and the Pyrenees.Therefore, a comparative analysis including a broader spatial perspective on those massif could improve our understanding of the spatial structure of mountain flora in western Europe.

Eco-evolutionary factors driving the spatial organisation of plant diversity
The spatial distribution and species relative importance for each bioregion can help us to better understand processes that have shaped Mediterranean biota in the south of France.The regional species pool results from several waves of colonization following glacial cycles, constrained by ecological filters that allowed taxa to persist and ultimately shaped lo-cal communities (Ricklefs, 1987).Indeed, our study area is at the crossroad of recolonization routes out of two major refugia, i.e. the Iberic and Italian peninsulas (Hewitt, 2000), and represents an admixture zone for several mediterranean taxa (Lumaret et al., 2002).Joint action of colonization-retraction sequences and long term persistence within microrefugia has been suspected to generate fragmented distribution.Thus, one particular feature of such climatic transition area is the high proportion of population isolated at the periphery of their main range (Thompson, 2005), either at the rear or at the leading edge of their distribution (Hampe & Petit, 2005).However, spatial patterns alone do not inform on the evolutionary isolation of such populations, could it be of recent dispersal following Last Glacial Maximum (Lumaret et al., 2002), or long term persistence in a given refugia (Médail & Diadema, 2009;Papuga et al., 2015).Thus, integrating phylogenies within bioregionalization would prove informative to analyse historical events that have shaped current spatial patterns of biodiversity (Nieto Feliner, 2014), and capture the evolutionary relationship among bioregions (Holt et al., 2013).
Nevertheless, analysing the spatial organization of flora can help us to understand ecological factors that shape such bioregions.Orographic barriers and past tectonic movement are expected to have little impact on our study area, as no such events have occurred there since the onset of the Mediterranean climate in the Pliocene (Rosenbaum et al., 2002).In our analysis, spatial structuration relies principally on two elements.On the one hand, a climatic gradient from Mediterranean to temperate climate creates fuzzy spatial limits among adjacent groups, and increases uncertainty when delimitating groups (Figure S15 in Appendix).This is exemplified by the spatial imbrication of "Mediterranean lowlands" and "Mediterranean border".On the other hand, geological variations can form sharp transitions creating important species turnover between places close apart.This is exemplified by the "Cork oak zone" whose spatial delimitation is very clear, due to the presence of an acidic substrate surrounded by places dominated by calcareous-based rock.Interestingly, this area still shares an important part of its biota with other places in the Mediterranean basin probably inherited from times where such geological islands formed a single ensemble, before the separation and later migration of these islands (Médail & Quézel, 1997;Rosenbaum et al., 2002).The joint action of these two ecological factors has already been highlighted in previous bioregionalization of the Mediterranean basin (Buira et al., 2017).As a result, complex geo-climatic variation have played a key role in shaping island-like territories which have fragmented species distributions, a factor that has strong influence on populations characteristics both genetically and demographically (Pironon et al., 2017).
The flora of the Mediterranean basin shows recurrent patterns of narrow endemism, species turnover and highly disjunct distributions (Thompson, 2005).
While allopatric isolation has been suspected to be the main mechanism explaining the differentiation of taxa, the shared significance of different ecological variables (namely climate and geology) points out the combined importance of spatial isolation and heterogeneous selective pressures (Anacker & Strauss, 2014;Thompson, 2005).Additionally, recent studies have shown that this can be enhanced by small scale changes of the ecological niche (Lavergne et al., 2004;Papuga et al., 2018;Thompson et al., 2005).Contrary to other mediterranean biomes (e.g.South-Africa and Australia), the mediterranean basin is marked by an active speciation, which has led to the high observed proportion of neoendemic species (Rundel et al., 2016).If evidences have accumulated concerning cryptic microrefugia for temperate trees (Stewart & Lister, 2001), little is known regarding mediterranean taxa, especially those that exhibit little dispersal capacities, a shared trait among mediterranean endemics (Lavergne et al., 2004).Thus, this bioregionalization set the scene to investigate the shared phylogeographic legacy of the Mediterranean biota (Puşcaş & Choler, 2012), and measure the evolutionary isolation of such communities that separate peripheral isolates from newly differentiated species (Crawford, 2010).

CONCLUSION
The quality of a bioregionalization is dependent on the data and the method used.To our knowledge, the present analysis constitutes the densest speciescells network analysed in a bioregionalization study, at such a high spatial resolution.Therefore, results of this study demonstrate that new statistical methods based on network analysis can bring solutions to manage and analyse large databases, and provide efficient bioregionalization at different scales.New perspectives for bioregionalization will integrate community structure across different scales, in order to understand how deterministic (i.e.niche based) processes and stochastic events (dispersal, random extinction, ecological drift) interact to shape plant communities, from regional species pool to local assemblages (Chase & Myers, 2011).

Comparison of OSLOM to standard clustering methods
In this work, bioregions are delineating using the community detection algorithm OSLOM applied on a weighted undirected spatial network whose intensity of links between grid cells are measured with the Jaccard similarity coefficient.This algorithm is nonparametric in the sense that it identifies statistically significant communities with respect to a global null model, and therefore the number of communities does not need to be defined a priori.In order to assess the accuracy of the method, we compared the results obtained with OSLOM with the ones obtained with standard hierarchical clustering methods.Not that these standard methods cannot be directly applied on the spatial network described above, we first need to transform the network into a dissimilarity matrix.Three different agglomeration methods have been tested: average (UPGMA), mcquitty (WPGMA) and Ward1 .To choose the number of clusters, we used the average silhouette index S (Rousseeuw, 1987).For each cell g, we can compute a(g) the average dissimilarity of g (based on the Jaccard index in our case) with all the other cells in the cluster to which g belongs.In the same way, we can compute the average dissimilarities of g to the other clusters and define b(g) as the lowest average dissimilarity among them.Using these two quantities, we compute the silhouette index s(g) defined as, s(g) = b(g) − a(g) max{a(g), b(g)} (1) which measures how well clustered g is.This measure is comprised between −1 for a very poor clustering quality and 1 for an appropriately clustered g.We choose the number of clusters that maximize the average silhouette over all the grid cells S = ∑ n g=1 s(g)/n.UPGMA and WPGMA failed to detect any coherent partitions, most of the grid cells were gathered in a giant cluster component even increasing significantly the number of clusters.Better results were obtained with Ward's method.The average Silhouette index as a function of the number of clusters is shown in Figure S3.Two optimal partitions have been detected with the average Silhouette index.It is interesting to note that the number of clusters of the second partition is the same that the one automatically detected with OSLOM (Figure S3).
A map of the eight optimal bioregions obtained with Ward's method is display in Figure S4.In order to compare the two partitions a contingency between the partitions obtained with Ward's method and OSLOM is shown in Table S2.Table S2.Contingency tables between the partitions obtained with Ward (in row) and OSLOM (in column).(Ebach & Goujet, 2006;Lamarck & Candolle, 1805).

Bohn (2000)
Figure S9.Comparison of the results obtained with OSLOM (l = 5 km) with Bohn's mediterranean/supramediterranean limit (Bohn et al., 2000).  .Uncertainty map (l = 5 km).For a given cell, J1 represents the average Jaccard similarity index between this cell and all the cells that belong to its cluster, and J2 represents the average Jaccard similarity index between this cell and all the cells belonging to the second closest cluster (based on the Jaccard similarity).

Figure 1 .
Figure1.Distribution of the number of species per grid cell (l = 5 km).The inset shows a map of France including the studied area colored in red.An altitude map of the studied area is available in Appendix.
Figure 3. Degree distributions of the biogeographical bipartite network.Probability density functions of the number of plant species per grid cell (in blue) and the number of cells covered per plant species (in red).Similar figures showing histograms instead of densities are available in Figure S13 in Appendix.

Figure 4 .
Figure 4. Bioregions based on similarity in plant species (l = 5 km).Eight bioregions have been identified.1. Gulf of Lion coast in red.2. Cork oak zone in orange.3. Mediterranean lowlands in light green.4. Mediterranean border in dark green. 5. Cévennes sensu lato in purple.6. Subatlantic mountains in pink.7. Prealps and other medium mountains in yellow.8.High mountains in brown.The inset shows a map of France including the studied areas colored in red.An altitude map of the studied area is available in Appendix (FigureS14).

Figure 5 .
Figure 5. Fraction of species contributing positively and significantly to a given number of bioregions (from 0 to 5 or more) as a function of the significance threshold.The vertical line represents the significance threshold δ = 1.96.

Figure 6 .
Figure6.Network of interactions between bioregions.λ jj ′ , expressed here in percentage, represents the average fraction of contribution to cluster j ′ of species that contribute significantly to cluster j.Only links with a value λ jj ′ higher than 10% are shown.

Figure 7 .
Figure 7. Description of the groups of plant species.Boxplot of test-values according to the bioregions and the plant species groups.The horizontal line represents the significance threshold δ = 1.96.The number of plant species per group is available in TableS3in Appendix.

Figure S3 .
Figure S3.Average Silhouette as a function of the number of clusters obtained with Ward's clustering (in blue) and OSLOM (in red).

Figure S4 .
Figure S4.Biogeographical regions based on similarity in plant species obtained with Ward's clustering (l = 5 km).Eight biogeographical regions have been identified.
Figure S15.Uncertainty map (l = 5 km).For a given cell, J1 represents the average Jaccard similarity index between this cell and all the cells that belong to its cluster, and J2 represents the average Jaccard similarity index between this cell and all the cells belonging to the second closest cluster (based on the Jaccard similarity).