Climate conditions are primary predictors of the regional‐scale spatial diversity patterns of ectomycorrhizal fungi

Spatial models are valuable for revealing biodiversity patterns but are less commonly applied to soil microbes than to aboveground macroorganisms. Ectomycorrhizal (EM) fungi are symbiotic microbes with high taxonomic and functional diversity that are associated with forest trees. We aimed to predict regional‐scale spatial patterns of EM fungal richness and community composition.


| INTRODUC TI ON
Soil microbes play fundamental roles in regulating key ecological processes such as nutrient cycling and aboveground trophic interactions (Bardgett & van der Putten, 2014).Regarding their high taxonomic and functional diversity, the biogeography of soil microbes is crucial to achieving a full understanding of biological diversity patterns.However, the geographic distribution of soil microbes has been poorly documented in comparison to the well-studied diversity patterns of some aboveground macroorganisms (Cameron et al., 2019;Guerra et al., 2021).
The advancement of molecular tools has allowed ecologists to explore the diversity patterns of soil microbes at various spatial scales and locations (Lilleskov et al., 2004;Talbot et al., 2014;Tedersoo et al., 2014).Global-scale studies have suggested that contrasting biogeographic patterns may exist between microorganisms and macroorganisms, and among different taxonomic and functional groups of soil microbes (Bahram et al., 2018;Kivlin et al., 2017;Tedersoo et al., 2014).For example, the diversity of many macroorganisms, particularly plants, tends to decrease from the equator towards the poles, a pattern known as a latitudinal diversity gradient (LDG; Hillebrand, 2004).Similar trends have been reported in the taxonomic diversity of total soil fungi (Bahram et al., 2018) and arbuscular mycorrhizal fungi (Davison et al., 2015).In contrast to the LDG, diversity may increase with latitude and elevation or peak at mid-latitudes for ectomycorrhizal (EM) fungi (Kivlin et al., 2017;Tedersoo et al., 2012Tedersoo et al., , 2014) ) and soil bacteria (Bahram et al., 2018).Substantial variation is evident among studies examining microbial diversity patterns, and the fundamental patterns of microbial biogeography remain largely unknown (Thompson et al., 2017).Large among-site variability is common in microbial studies, implying that detailed studies at fine spatial scales are crucial to improving the accuracy of microbial biogeography.
EM fungi are among the most diverse groups of root symbionts that are widespread in forest ecosystems and have the capacity to improve the nutrient uptake and stress resistance of host plants (Smith & Read, 2008).EM fungal community structure is influenced by multiple environmental factors and dispersal limitations.Climate, soil, and vegetation type are considered important environmental factors that impact EM fungi, with varying degrees of importance in different regions (Talbot et al., 2014;van der Linde et al., 2018).For example, the variability in EM fungal community composition is predominantly explained by host identity and soil conditions in western Europe (van der Linde et al., 2018), geographic distance in North America (Talbot et al., 2014), and climate in eastern Asia (Miyamoto, Terashima, & Nara, 2018).Climate factors tend to be the primary drivers of EM fungal community structure at the global scale (Tedersoo et al., 2012(Tedersoo et al., , 2014;;Větrovský et al., 2019).Because previous studies have had differing focal regions and sampling strategies, generalization or direct comparison among studies is difficult.Identifying the potential drivers that shape EM fungal communities is crucial to understand their spatial patterns in a target region.Statistical models are often used to estimate the potential spatial patterns of biodiversity because information on species and communities is generally sporadic and incomplete.To clarify the spatial patterns of EM fungi, systematic sampling across a range of environmental gradients and interpolation across target areas using statistical models is required.However, few studies have yet produced spatial maps of ecologically important EM fungi (Steidinger et al., 2020;Tedersoo et al., 2020).
Here, we compiled data from multiple studies with the aim of mapping the spatial patterns of EM fungi in Japan.The Japanese Archipelago stretches over 3000 km in a northeast-southwest direction and has an elevation range of more than 3000 m, providing high habitat diversity along sharp climatic gradients within a narrow region.This geographic setting is ideal for examining biodiversityenvironment relationships in EM fungi with little influence from historical dispersal limitation (Miyamoto, Narimatsu, & Nara, 2018).In this study, we used geographic information including climate, soil, and ecosystem productivity data for spatial modelling, taking advantage of extensive datasets that are publicly available for ecological research.Although host associations are important drivers of EM fungal community structure (e.g.van der Linde et al., 2018), there is no reliable tree distribution dataset that is applicable to spatial modelling.Therefore, we did not include host information as a predictor in our spatial modelling, but instead used on-site records to evaluate the relative importance of host richness.
This study provides the first detailed characterization of the regional-scale spatial patterns of EM fungal communities in Japan.Specifically, we tested the following hypotheses.First, it has been reported that EM fungal taxa display narrow temperature niches in cold habitats (Kivlin et al., 2017;Miyamoto, Terashima, & Nara, 2018;Větrovský et al., 2019) and that diversity peaks at mean annual temperatures of around 0 and 10°C (Tedersoo et al., 2012).Thus, we expected that colder habitats, such as high mountains and northern regions, would harbour higher EM fungal diversity than warmer habitats (Hypothesis 1).Second, EM fungi are taxonomically and functionally diverse and often exhibit varying sensitivities to environmental factors related to differences in their life strategies and physiological optima, resulting in niche differentiation (Hobbie & Högberg, 2012;van der Linde et al., 2018).Therefore, we expected distinct spatial zoning among taxonomic groups (Hypothesis 2).Finally, given the expected niche differentiation among taxonomic groups and the steep geographic and climatic gradients in this region, we expected that community composition would shift across the landscape from north to south and from high to low elevations (Hypothesis 3).

| Sampling and processing
To provide a baseline of EM fungal diversity across Japan, we compiled EM fungal community data that were collected using comparable sampling techniques (Koizumi & Nara, 2020;Janowski & Nara, 2021;Murata et al., 2017;Miyamoto, Terashima, & Nara, 2018 and references therein; Figure 1; Table S1).The study sites cover a wide range of climate conditions, representing the typical vegetation zones of the Japanese Archipelago.Sampling sites were closed-canopy mature forests and subalpine ecosystems that were predominated by EM host plants.Site size averaged 1.6 ha (0.2-6.8 ha), which depended on the continuity of uniform vegetation and the availability of EM host plants.At each site, >20 soil cores were randomly collected from the surface layer (5 cm × 5 cm × 10 cm depth) at an interval of >5 m, as autocorrelation in EM fungal communities diminishes after 3-4 m (Lilleskov et al., 2004;Pickles et al., 2012).In total, 1631 soil cores were collected at 39 sites.
We followed previously published protocols for root sampling and molecular analysis (Murata et al., 2013).In brief, woody plant roots in each soil core were carefully collected and washed with tap water.EM fungal root tips were differentiated into morphotypes according to the morphological characteristics of Agerer (2001), including colour, branching pattern, and the presence/absence of emanating hyphae and rhizomorphs under a dissecting microscope.One to five visually healthy EM root tips of each morphotype per soil core were collected for molecular identification.Each root tip sample was stored in a microtube containing cetyl trimethyl ammonium bromide (CTAB) solution.Genomic DNA from individual EM roots was extracted using the CTAB method (Nara et al., 2003).We conducted Sanger DNA sequencing of the internal transcribed spacer (ITS) region of individual EM fungal root tips to achieve high resolution taxonomic identification.The full ITS region (ITS1-5.8S-ITS2) of nrDNA was amplified primarily using the forward primers ITS5 and ITS1F with the reverse primer ITS4.The basic PCR conditions were: one cycle of 98°C for 30 s followed by 35 cycles of 98°C for 30 s, 56°C for 15 s, and 70°C for 30 s, with modification according to the primers and DNA polymerases used in each reaction.Additional primers (e.g.ITS0F, LA-W, LB-W, ITS4B and LR22) were applied depending on amplification success.Amplicons were purified using ExoSap-IT and sequenced using BigDye3.1 on an Applied Biosystems 3730xl DNA Analyser (Applied Biosystems) with the primers ITS1, ITS4 or both.
In mixed forests, the host identification of each root sample was confirmed through molecular analysis at the genus level.Briefly, the trnL region of chloroplast DNA was amplified using the trnL_c-trnL_d primer pair and digested using the restriction enzyme HinfI.
Restriction fragment length polymorphism (RFLP) patterns were compared with DNA taken from the leaves of EM host plants at a study site.Other primer pairs (e.g.trnS-trnG, trnL_e-trnL_f) were used secondarily depending on amplification success.Samples with unclear RFLP patterns were subjected to direct sequencing.

| Bioinformatics
All sequencing chromatograms were visually checked for base-pair call quality, and sequence reads with multiple peaks or low base-call quality were discarded using ATGC software (GENETYX Corp).ITS sequences were obtained for individual root tips and sequences of sufficiently high quality (>300 bp long) from all sites were assembled into operational taxonomic units (OTUs) at a sequence similarity threshold of 98.5% using VSEARCH v2.22.1 (Rognes et al., 2016).
Fungal taxonomy and nomenclature were determined using the usearch_global tool in VSEARCH with the UNITE species hypothesis database v.9.0 (Abarenkov et al., 2022), with an 80% identity threshold.EM status was determined from the literature and the UNITE database (Kõljalg et al., 2013;Tedersoo et al., 2010;Tedersoo & Smith, 2013).Cenococcum geophilum, which has distinct morphological features, was identified based solely on morphotypes and its sequence was unavailable at some sites.Additionally, the ITS region provides insufficient taxonomic resolution for C. geophilum (Douhan & Rizzo, 2005).Thus, C. geophilum-morphotypes and sequences identified as Cenococcum were assigned to a Cenococcum-complex OTU.After the removal of non-EM and unknown guild sequences, we retained 8195 sequences from 1507 soil cores.

| Explanatory variables
All statistical analyses were conducted using R ver.4.2.0 (R Development Core Team, 2021).Environmental variables were prepared at a resolution of 1-km 2 grid cells ('3rd mesh' defined by Ministry of Land, Infrastructure, Transport, and Tourism, Japan).Climate F I G U R E 1 Locations of study sites.Inset plot shows elevation with latitude.Details of the study sites are provided in Table S1.
records  were obtained from The Japan Meteorological Agency (2014).Monthly climate records included temperature, precipitation, snowfall and solar radiation.Seasonal variability (across four seasons) was calculated for each variable, and warm and cold indices were calculated from temperature records (Table S2).To examine ecosystem productivity, annual net primary productivity (NPP, for the period 2000-2014) was determined from MODIS satellite (MOD17A2/A3) data, available at 1-km resolution (Running & Zhao, 2015).To examine soil factors, the soil organic matter concentration (SOM [g/kg]), total nitrogen concentration (TN [g/kg]), carbon to nitrogen ratio (CN), and pH in the top 20 cm of the soil horizon were obtained from the WISE30sec database, which estimates soil properties at 1-km resolution based on the combination of climate zone and soil type (Batjes, 2016).
We conducted two principal component analyses (PCAs), one for climate variables and the other for soil variables, after standardizing and centring all 40 variables.For subsequent analyses, we used the first four principal components for climate and the first three for soil, because these components accounted for more than 93% of the total variation in each analysis (Figure S1).The principal components (PCs) were interpreted based on the loadings of the corresponding environmental variables.For climate, PC1 was mainly explained by temperature (mean annual and seasonal temperatures, cold and warm indices); PC2 by precipitation (annual and seasonal total precipitation); PC3 by winter conditions (solar radiation from December to May, precipitation from December to February, snow); and PC4 by summer radiation (June to August) (Table S2).For soils, PC1 was mainly explained by SOM and TN; PC2 by CN; and PC3 by pH (Table S3).Eight variables (four climate PCs, three soil PCs, NPP) were used as predictors for spatial mapping.Multicollinearity was not found among the selected variables (Pearson's correlation r < 0.56, variance inflation factor < 3.2).The range of environmental niche space for each variable is shown in Figure S2.
Additionally, we examined the relative influences of on-site variables (i.e.sample year, host genus richness and number of soil cores) on fungal community structure, but these factors were not included in the spatial model because gridded datasets were unavailable.Onsite variables were standardized prior to analysis.

| Richness
We calculated the alpha diversity (i.e.OTU richness) of EM fungi at two spatial levels, namely the site-and core levels.At the core level, we recorded the presence of individual EM fungal OTUs in soil cores of consistent volume (250 cm 3 ).Richness data were log-transformed prior to analysis.At the site level, because different numbers of soil cores were collected among sites, we removed the effect of sampling effort by standardizing the OTU richness of each site to an equal number of soil cores.We computed species-area curves using the iNEXT package of R (Hsieh et al., 2016).Because this programme recommends extrapolation up to twice the minimum sample size, OTU richness from 32 soil cores was used as the site-level richness for spatial modelling.
Maps were constructed using geographic grid cells with a resolution of 1 km 2 in accordance with the resolution of the environmental data.Grid cells where soil data were unavailable and forest cover was <5% (e.g.urban, alpine and crop areas) were excluded from spatial modelling (Figure S3).We used random forest (Breiman, 2001) to model regional-scale patterns of EM fungal richness.Random forest is a machine learning algorithm that makes few statistical assumptions and can reveal nonlinear associations and interactions among explanatory variables while mitigating the issue of overfitting (Crisci et al., 2012).This analysis was conducted using the randomForest function in the randomForest package of R (Liaw & Wiener, 2002) with 500 regression trees.The number of randomly selected variables at each node (the 'mtry' parameter) was tuned between 2 and the total number of explanatory variables (i.e.8) using 10-time repeated fivefold cross-validation (CV).The root mean square error (RMSE) was used to select the optimal model based on the smallest value.The importance of explanatory variables was measured based on '%incMSE' that indicates the contribution of each variable to the model's predictive performance.We computed the median %incMSE using 100 permutations of each predictor variable.Variables that produced negative %incMSE values were excluded from spatial prediction.
To evaluate model performance, CV was conducted by randomly selecting sites or cores in each iteration.Training was performed on 80% of the dataset and testing on the remaining 20%.
This procedure was repeated 50 times and model performance was quantified using RMSE and the coefficient of determination (R 2 ; the squared Pearson's correlation between observed and predicted values from all repetitions of CV).Spatial autocorrelation in residuals of each model was assessed by calculating and testing (twotailed test) Moran's I statistic.To obtain real distances between sampling sites, planar rectangular coordinates were obtained using an online survey-site calculator (Geospatial Information Authority of Japan, 2013).
We identified areas where spatial modelling had low applicability because predictions were made outside the environmental niche space of the training dataset.'Area of applicability' (AOA) analysis was applied following the approach described by Meyer and Pebesma (2021).We calculated the dissimilarity index (DI) that measures how much a prediction grid cell differs from the training data in multidimensional space.Standardized explanatory variables were weighted by multiplying the %incMSE values that was derived from the random forest.CV (50 rounds of iteration with 80% of data for training and 20% of data for testing) was performed to determine the threshold DI value to define the AOA.The DI was computed for each target grid cell, and cells with DI above the threshold (i.e. the 75th percentile plus 1.5 times the interquartile range of DI for the CV training data) were considered to be outside the environmental space and thus not applicable for spatial modelling.

| EM fungal lineages
To compare the spatial patterns of EM fungal taxonomic groups, the relative richness of EM fungal lineages was determined at each site.Lineage is a monophyletic taxonomic unit defined based on molecular sequence data that represents the independent evolutionary gains of EM symbiosis by different fungal groups (Tedersoo et al., 2010).We evaluated the stability of models with respect to sampling effort by generating subsets of fungal communities containing an equal number of soil cores per site.We randomly selected 22 cores and the relative richness of lineages was computed for each site.We excluded two sites represented by <22 cores, resulting in 37 sites for this analysis.Random subsampling was conducted 20 times, and the spatial modelling procedures described above were applied to the entire dataset and each subset.

| Community composition
To examine the dissimilarity of community composition among sites, we generated a community composition matrix of site-by-OTU abundance, measured as the numbers of soil cores where a focal OTU was present.After excluding singleton OTUs (i.e.recorded in one soil core in the entire dataset), community composition data were Hellinger-transformed and pairwise Bray-Curtis distances were calculated.We performed nonmetric multidimensional scaling (NMDS) ordination with 999 permutations using the metaMDS function in the vegan package of R. Dissimilarity in the community composition and species scores of the OTU-rich EM fungal lineages was visualized in a two-dimensional (2D) NMDS space.An environmental fitting test (envfit function of vegan) was conducted to visualize the linear relationships between NMDS axes and explanatory variables on the 2D plot.The standard error among fungal lineages was depicted using the ordiellipse function in the vegan package.NMDS ordination was used to visualize the geographic pattern of EM fungal community composition dissimilarity following the approach described by Kreft and Jetz (2010) with modifications.Colours were assigned to grid cells according to their position in the 2D ordination space.The colours red, yellow, green and blue were assigned to the four corners of the NMDS plot and then interpolated across the grid cell in steps of 100, resulting in 10,000 grid cells with unique gradient colours.
Random forest was used to assess associations between NMDS axes and eight environmental variables and to estimate a colour in each geographic grid cell.This procedure was repeated for the 20 subsampled community datasets that were generated for evaluating the effect of sampling effort.The extent of variation was measured as the pairwise Euclidean distance between predicted positions on a 2D NMDS plot.The median distance obtained from the 20 subsets was computed for each geographic cell.Large pairwise distances indicate high variability on the map associated with sampling effort (Figure S4a).
These records represent the realized niches of individual OTUs in the surface soil (top 10 cm).

| Richness
Host genus richness had high importance in explaining fungal OTU richness variation (Figure 2a,b).Spatial models explained 32%-38% of the variation in OTU richness (Table 1).Spatial patterns were predicted largely by climate PC1, climate PC4, and soil PC3 at the site level, and by climate PC1, NPP and climate PC4 at the core level (Figure 2; Figure S5).The residuals of each model indicated no positive spatial autocorrelation at <50 km (Figure S6).The spatial prediction map revealed higher OTU richness in the northeastern region and mid elevations than in the southwestern region and low elevations (Figure 3a,b).The strongest discrepancies between the two models were found at high elevations, notably in mountain areas of central Honshu.Specifically, richness decreased at the highest elevations in the site-level analysis while remaining high in the core-level analysis, mainly caused by the different response patterns to climate PC1 at the coldest extreme (Figure S5).Site-level analysis also identified areas of low applicability for spatial prediction in the lowlands along the Japan Sea coast of northern Honshu as well as the lowlands and coasts of western regions (Figure 3a).

| EM fungal lineages
Spatial patterns were predicted for 5 of the 10 EM fungal lineages that showed relatively high model performance (Table 1; Figure 2c-l).The importance of predictor variables in determining relative richness varied among lineages (Figure 2; Figure S5).For example, the most important predictor variable was climate PC2 for /tomentella-thelephora and climate PC1 for /cortinarius.NPP had the largest importance, with climate and soil PCs being secondary for /russula-lactarius.Significant spatial autocorrelation in the model residuals was detected for /tomentella-thelephora and /suillus-rhizopogon, indicating possible biases in the model predictions (Figure S6).Spatial predictions revealed contrasting richness patterns among lineages (Figure 3c-f).
Richness hotspots were detected at high elevations for /cortinarius and in the southern islands for /boletus.In contrast, high richness was widespread at low to mid elevations for /tomentella-thelephora and in southwestern regions for /russula-lactarius.The relative richness of /suillus-rhizopogon was high at high elevations (Figure S7), but most of the OTUs in this lineage were represented by Pinus pumila in subalpine ecosystems.In the analysis testing the stability of predicted spatial patterns through subsampling, model performance showed large variation between the entire dataset and 20 subsets (Figure S8a,b).
However, Pearson's correlation coefficients between the spatial predictions using the entire dataset and subsets were high (r > 0.93; except for /boletus with r = 0.86; Figure S8c), and the standard deviations of predictions were very small (<0.047%), implying that the modelled spatial patterns were generally robust (Figure S9).

| Community composition
The final ordination stress in the NMDS was 0.230.The important predictor variables were climate PC1 and PC2 for NMDS1 and climate PC1 and PC3 for NMDS2 (Figure 2m,n; Figure S5).Host genus richness and the number of soil cores were relatively important for NMDS2.The predicted spatial map showed a steep shift in community composition dissimilarity with elevation and a gradual shift from north to south (Figure 4).The AOA result was similar to that of site-level richness, but also included the lowlands of Hokkaido.The variation in estimates associated with subsampling was larger in the mountains of northern Honshu and the lowlands of Hokkaido than in other areas.However, the extent of this variation was small, and the predicted spatial patterns showed high similarity between the entire dataset and subsets (Figures S4 and S8c).

| Richness
Our regional analysis revealed that EM fungal OTU richness was higher in the northeastern and montane regions than in the southwestern regions and low-elevation plains.The depicted pattern agrees with global patterns (Tedersoo et al., 2012) and regional patterns in Japan (Matsuoka et al., 2019;Shigyo & Hirao, 2021;Toju et al., 2018) for EM fungal richness that has been suggested to increase in colder habitats, supporting the first hypothesis.
However, richness appeared to be influenced by analysis level in the coldest habitats.Specifically, EM fungal richness decreased at the site level while remaining high at the core level at the lowest temperatures (Figure S5).This caused the greatest discrepancy in richness predictions in high mountain areas of central Honshu (Figure 2a,b).High core-level richness may be maintained by microhabitat heterogeneity in the coldest regions, where the complex soil structure (with thick litter and organic layers) provides high substrate availability for EM fungal colonization at a fine-scale.In contrast, overall EM fungal richness may be reduced by the middomain effect (Miyamoto et al., 2014), soil pH (Figure S5a), or other unmeasured factors such as vegetation and soil parameters that are reported to be critical in cold regions at high elevations and latitudes (Arraiano-Castilho et al., 2021;Matsuoka et al., 2020;Miyamoto et al., 2022).

| EM fungal lineages
The EM fungal lineages examined in this study are species-rich groups commonly found in temperate and boreal forests (van der Linde et al., 2018).Contrasting spatial patterns and habitat separation were found among the four lineages (Figures 3 and 4b), implying unique niche preferences and competitive abilities under certain environmental conditions.This finding implies that lineage-level zoning can be applied for an overview of spatial patterns, supporting the second hypothesis.Specifically, /tomentella-thelephora and /russula-lactarius showed differing spatial patterns, notably along the Japan Sea and Pacific sides in the western region, which may be strongly influenced by seasonal precipitation (Figure 3c,d).Similarly, /cortinarius and /boletus clearly showed opposite patterns along the temperature gradient (Figure S5e,f).These two lineages also exhibited diversity hotspots unlike /tomentella-thelephora and /russulalactarius.These results imply that taxon-specific spatial diversity patterns are helpful for identifying taxa with diversity hotspots.A statistical issue was the spatial autocorrelation in the model residuals for /tomentella-thelephora, which may induce biases in model predictions (Dormann, 2007).Additionally, sampling effort appeared to influence model performance in some cases (Figure 2; Figure S8).
Variability in the estimates tended to be larger in areas with higher richness for all lineages; however, the extent of this variability was very small (<0.05% of relative richness, Figure S9) and the predicted spatial patterns were highly robust (Figure S8c).We showed that the approach used in this study may be applicable to taxonomic groups with high OTU richness and strong environmental signals.

| Community composition
NMDS ordination is useful for illustrating continuous transitions in community dissimilarity (Kreft & Jetz, 2010).Supporting the third hypothesis, the community composition pattern was structured geographically, shifting from low to high elevations and from northeast to southwest.These shifts were mainly influenced by climate (Figure 2) and were likely related to niche differentiation among lineages along environmental gradients (Figures 3 and 4b).
High R 2 values in the random forest models imply that the potential effect of environmental predictors was well presented in the 2D NMDS plot (Table 1).However, the relatively low ordination stress value of 0.23, which is albeit typical in soil microbial studies (e.g.Geml et al., 2022;Větrovský et al., 2019;Wang et al., 2021), indicates the limitation of presenting in 2D space.Thus, the overall information contained in the distance matrix was only partially examined and presented.Nevertheless, taken together with the predicted spatial patterns of EM fungal richness and lineages, the map should represent an overview of community composition patterns.Soil conditions, particularly pH, are important factors influencing EM fungal richness in many ecosystems (Geml et al., 2022;Glassman et al., 2017;Tedersoo et al., 2020).In this study, soil pH had high importance in determining OTU richness (Figure 2; Figure S5), consistent with previous reports.While soil variables had varying degrees of importance, their roles were generally secondary to climate variables, particularly in community composition.The stronger influence of climate factors compared with soil factors on EM fungal communities agrees with the findings of global studies (Větrovský et al., 2019) and local-scale studies conducted in Japan (Shigyo & Hirao, 2021;Sugiyama et al., 2021).Moreover, in the mountains of central Japan, soil properties have been reported to be more important drivers of the richness patterns of saprotrophic fungi than of EM fungi (Ogwu et al., 2019;Shigyo & Hirao, 2021).Note that we used global-scale estimates of soil parameters with inherently large uncertainties (Batjes, 2016).Additionally, the mapping resolution of 1 km may be too large to fully represent soil chemical properties that can vary largely at fine spatial scales.Therefore, the potential roles of soil properties may be better represented using locally relevant data at fine spatial scales.

| Methodological considerations
Overall, the model performance was 32%-65% (Table 1), indicating that a large amount of variability remained unexplained and that other predictors may need to be considered.In particular, host genus richness had high importance in many models but was not included as a predictor for spatial modelling due to the lack of an applicable gridded dataset.Thus, organizing host distributions into a geographic information system would be valuable for spatial modelling of root-associated fungi.Host identity is particularly important for fungal taxa that require specific hosts, such as suilloid fungi on Pinaceae (Molina et al., 1999) and Alnicola and Alpova on Alnus (Kennedy & Hill, 2010;Moreau et al., 2006).Thus, the distribution of associated hosts is valuable for improving the prediction accuracy of host specialist EM fungi (Pickles et al., 2015;Pietras et al., 2018).
Conversely, the dominant host trees in Japan are mostly associated with diverse EM fungal taxa that show low levels of host specificity.
Additionally, tree distributions are generally associated with climate and geography in Japan (Kira, 1991;Noriyuki et al., 2021).Thus, the model should consider the interplay of environmental and biological interactions in future research.Other factors, such as soil nutrients (e.g.Mg, Ca, K) (Arraiano-Castilho et al., 2021;Wang et al., 2019) and geological history (Marin et al., 2023;Pither et al., 2018) at a scale relevant to regional analysis, may also help to explain the residuals.
Identifying areas of low applicability of spatial mapping is necessary because prediction grids may be located outside the environmental niche space of the training data (Meyer & Pebesma, 2021).
Generally, our sampling sites cover the range of environmental conditions in the focal region (Figure S2).Thus, the resulting predictions include only a small degree of extrapolation of the tested environmental variables.When considering multidimensionality, low model applicability was identified along the Japan Sea coast of northern Honshu as well as some lowland and coastal areas of the southwest, indicating that these areas require additional sampling to improve the spatial maps of site-level richness and community composition.

Low model applicability was also identified patchily in northern
Honshu for core-level richness and /russula-lactarius, likely driven by soil properties (i.e.soil PC2 and PC3).AOA analysis can be used to weight the relative contributions of predictor variables and consider multidimensional environmental spaces (Meyer & Pebesma, 2021).
Thus, the method is valuable for detecting model-specific uncertainties to help prioritize sampling areas for future investigations.
A technical bias that should be considered is that EM host plants were not uniformly distributed within a site, which led to variation in

F
Relative importance of environmental and on-site variables on EM fungal response variables.(a,b) Richness; (c-l) relative richness of lineages; (m,n) community composition NMDS axes.Points represent the median relative importance of a predictor across 100 simulations.The bars connect the 25th and the 75th percentiles of the relative importance distribution across runs.Numbers in the bottom corner of each panel represent the variance explained by the model (%, left) and R 2 from cross-validation statistics (%, right).

F
I G U R E 3 Spatial richness patterns predicted from eight environmental variables.(a,b) Richness (scaled); (c-f) relative richness of lineages.Lineages with high model performance are shown.Inset plot shows elevation with latitude.Black indicates areas with low applicability for spatial modelling based on AOA analysis.Note that the AOA results are not clearly shown in (c) due to their small area.Grey indicates non-forested areas.

F
I G U R E 4 (a) Dissimilarity in EM fungal community composition as revealed by NMDS ordination analysis.Black arrows indicate associations between predictor variables and NMDS axes based on the environmental fitting test.cPC and sPC represent climate PC and soil PC, respectively.(b) NMDS species scores for the top 10 OTU-rich EM fungal lineages.Ellipses indicate the standard error for compositional differences of fungal lineages./tom., /tomentella-thelephora; /rus., /russula-lactarius.Note that colours are assigned arbitrarily and do not correspond to (a) or (c), but NMDS positions correspond to (a).(c) Spatial prediction of EM fungal community composition dissimilarity derived from the 2D NMDS plot for the entire dataset.Black indicates areas with low applicability for spatial modelling based on AOA analysis.Grey indicates non-forested areas.
Evaluation of the random forest using NPP and climate and soil factors as predictor variables for spatial modelling.