Untargeted metabolic profiling reveals geography as the strongest predictor of metabolic phenotypes of a cosmopolitan weed

Abstract Plants produce a multitude of metabolites that contribute to their fitness and survival and play a role in local adaptation to environmental conditions. The effects of environmental variation are particularly well studied within the genus Plantago; however, previous studies have largely focused on targeting specific metabolites. Studies exploring metabolome‐wide changes are lacking, and the effects of natural environmental variation and herbivory on the metabolomes of plants growing in situ remain unknown. An untargeted metabolomic approach using ultra‐high‐performance liquid chromatography–mass spectrometry, coupled with variation partitioning, general linear mixed modeling, and network analysis was used to detect differences in metabolic phenotypes of Plantago major in fifteen natural populations across Denmark. Geographic region, distance, habitat type, phenological stage, soil parameters, light levels, and leaf area were investigated for their relative contributions to explaining differences in foliar metabolomes. Herbivory effects were further investigated by comparing metabolomes from damaged and undamaged leaves from each plant. Geographic region explained the greatest number of significant metabolic differences. Soil pH had the second largest effect, followed by habitat and leaf area, while phenological stage had no effect. No evidence of the induction of metabolic features was found between leaves damaged by herbivores compared to undamaged leaves on the same plant. Differences in metabolic phenotypes explained by geographic factors are attributed to genotypic variation and/or unmeasured environmental factors that differ at the regional level in Denmark. A small number of specialized features in the metabolome may be involved in facilitating the success of a widespread species such as Plantago major into such wide range of environmental conditions, although overall resilience in the metabolome was found in response to environmental parameters tested. Untargeted metabolomic approaches have great potential to improve our understanding of how specialized plant metabolites respond to environmental change and assist in adaptation to local conditions.


| INTRODUC TI ON
Plants produce a multitude of specialized metabolites that contribute to their fitness and survival, and play a role in their ability to adapt to local environmental conditions. Specialized metabolites play a role in defense against herbivores and pathogens (Bowers & Stamp, 1992), provide protection from harmful ultra-violet (UV) radiation (Murai, Takemura, Takeda, Kitajima, & Iwashina, 2009), and are essential for fostering mutualistic relationships with other organisms such as pollinators and mycorrhizal fungi (Schweiger, Baier, Persicke, & Muller, 2014). Specialized metabolites also are the source of many of our plant-based medicines and therefore have value to human health and well-being (Briskin, 2000). Much attention has therefore been placed on environmental and geographic factors influencing specialized metabolite production in plants, particularly in crop species and in model plant systems (Agrawal, Conner, Johnson, & Wallsgrove, 2002;Asai, Matsukawa, & Kajiyamal, 2016;Carrari et al., 2006;Dan et al., 2016;Hirai et al., 2004;Lasky et al., 2012;Tarczynski, Jensen, & Bohnert, 1993;Riedelsheimer et al., 2012).
However, the importance of natural variation in the environment in explaining metabolite variation within plant species remains little understood (Maldonado et al., 2017;Moore, Andrew, Külheim, & Foley, 2014).
Out of the circa 250 species that comprise the genus Plantago (Rahn, 1996;Rønsted, Chase, Albach, & Bello, 2002), the specialized chemistry of two species, Plantago lanceolata L. and P. major, has been the focus of a vast body of chemical ecology research. Due to these species' widespread and nearly global distributions in nature, their medicinal value (Samuelsen, 2000), and because both can be easily grown under controlled conditions, P. lanceolata and P. major L. serve as excellent model organisms to explore biochemical responses and improve understanding of local adaptation (Fuchs & Bowers, 2004;Pankoke, Buschmann, & Mueller, 2013;Sutter & Muller, 2011).
When investigating the complex and often interrelated effects that environmental factors have on plant metabolic phenotypes, it is becoming increasingly popular to use untargeted metabolomic approaches, as fewer a priori assumptions are made, allowing for the detection of metabolic responses that were overlooked using targeted approaches (Pankoke et al., 2013;Schweiger et al., 2014;Sedio, Rojas Echeverri, Boya, & Wright, 2017;Sutter & Muller, 2011).

Despite the extensive chemical ecology literature that exists for
Plantago, still little is understood about metabolome-wide responses (but see Sutter & Muller, 2011;Pankoke et al., 2013Pankoke et al., , 2015and Schweiger et al., 2014). As is the case for most model plant systems, the regulation and expression of plant metabolites have traditionally been assessed under controlled conditions and focus on the response of a handful of targeted (specific) metabolites to simplified environmental factors. In Plantago, it is the iridoid glycosides aucubin and catapol, and the caffeoyl phenylethanoid glycosides verbascoside and plantamajoside, that have been most widely studied as chemical response variables, particularly due to their antiherbivore and medicinal activity Mølgaard, 1986;Reudler et al., 2013;Rønsted, Franzyk, Mølgaard, Jaroszewski, & Jensen, 2003).
There is a growing need to investigate how varying environmental conditions affect plant metabolomes in situ. The majority of metabolite investigations, even for the well-studied Plantago species, have been conducted on plants grown under controlled conditions, and therefore, much of the environmental complexity is hidden given the simplicity of controlled conditions. Variation in environmental factors tested under controlled conditions may vary at levels not occurring under natural conditions and may not reflect responses that are shaped by multiple interacting factors found in situ (Maldonado et al., 2017). Given that metabolic phenotypes are a product of the interaction of a number of factors, including genetic and environmental factors, a plant's metabolome can be regarded as the ultimate response of biological systems (Fiehn, 2002;Lopez-Alvarez et al., 2017). Thus, screening the metabolomes of locally adapted plants that have been exposed to natural environmental conditions throughout their lifetimes is an essential step in improving our understanding of the role that a plant's metabolome plays in local K E Y W O R D S cosmopolitan weed, environmental conditions, geographic location, geographic patterns, herbivory, local adaptation, metabolic phenotype, phenotypic plasticity, soil pH adaptation to environmental conditions. Significant advances in the field of metabolomics, specifically with today's high-throughput analytical instruments and bioinformatic tools (Lankadurai, Nagato, & Simpson, 2013;Sedio et al., 2017), now allow us to more thoroughly explore the influence of environmental factors on the variation in metabolic phenotypes using complex multivariate data.
The aim of this study was to use an untargeted metabolomic approach to investigate intraspecific variation in metabolic features and disentangle the relative effects of environmental and geographic factors measured at different spatial scales on the metabolome using the widespread weed, Plantago major, as a model. Highly plastic species such as P. major, with large geographic ranges, have the ability to exhibit higher intraspecific variation in physiology and morphology and serve as good models to study local and regional adaptations (Soolanayakanahally, Guy, Silim, Drewes, & Schroeder, 2009). We therefore investigate the extent to which environmental factors such as differences in habitat type, light levels, phenological stage, leaf area, and soil conditions are important drivers of metabolic phenotypes of plants sampled in situ from 15 populations across Denmark. Given the complexity of assessing the effects of environmental factors measured under natural conditions, we first use variation partitioning to allocate variation of these explanatory variables on the foliar metabolomes, based on methanolic extracts.
We subsequently applied a novel and highly conservative mixed linear modeling approach to identify statistically significant metabolic features associated with environmental and geographic variation.
We further apply a networking analysis on co-occurring metabolites to identify patterns in metabolic phenotypes across Denmark.
Lastly, we test whether the induction of nonvolatile, polar metabolites can be detected in response to localized leaf damage caused by herbivory in situ, by comparing undamaged and damaged leaves collected from the same individuals.

| Plant population sampling
Plantago major is a short-lived perennial herb belonging to the genus Plantago (Plantaginaceae family). Plantago major was selected for this study because of its widespread distribution, its ability to grow in a wider range of natural and semi-natural conditions compared to its congener, P. lanceolata, and the presence of previous studies using Danish and Dutch populations which provides background information (Haeck, van der Aart, Dorenbosch, Van der Maarel, & Van Tongeren, 1982;Mølgaard, 1986;Kuiper & Bos, 1992). In addition to its ability to adapt to a wide range of habitat types and environmental conditions, P. major exhibits high phenotypic plasticity and can vary extensively in morphology based on growth conditions, even within populations (Mølgaard, 1986;Samuelsen, 2000;Warwick & Briggs, 1980). Two subspecies are recognized in Denmark, P. major subsp. major, and P. major subsp. intermedia (Gilib.) Lange (syn.  (Mølgaard, 1986). Plants are selffertile, wind pollinated, but nonrhizomatous, and although hundreds of individuals can persist in a small area, low genetic diversity is expected within populations (van Dijk & van Delden, 1981;Mølgaard, 1986;Rahn, 1996;).
Fifteen naturally occurring populations of Plantago major were selected across Denmark to capture a range of different habitat types (woodland, meadow, agricultural, and parkland), geographic regions and geological conditions known from the various islands and mainland in Denmark (Table 1). Given the low expected genetic diversity between individuals of P. major at a population level in Denmark, we selected three plants to serve as biological replicates in our population-level sampling. Two fully expanded leaves of similar age (based on their position in the rosette) and similar size were sampled from each plant; one leaf that showed no visible signs of herbivore damage ("undamaged"), and one leaf that did ("damaged").
This resulted in a paired dataset consisting of an undamaged and damaged leaf from each plant individual. Young leaves (i.e., those that had not fully unfurled) as well as older leaves (i.e., those showing changes in coloration or senescence) were avoided in our sampling. In total, 90 leaves (45 undamaged leaves and 45 damaged) were collected from 45 plants in the 15 different natural populations ( Figure 1). In order to minimize effects caused by temporal variation, field sampling was conducted over a span of 5 days, in July 2015. Photographs were taken of each plant, and the phenological stage was recorded (vegetative, immature flowers, or flowering).
Meta-data is presented as Supplementary Information, Table S1. In addition, one herbarium voucher was collected from each population and deposited at Herbarium C at the Natural History Museum of Denmark in Copenhagen. Environmental conditions (qualitative light level and soil samples), as well as associated species, were recorded for each population (Table 1 and Figure 1).

| Leaf area and localized damage by herbivory assessment
To assess the role that leaf area and localized damage caused by her-

| Analyses of soil samples
To explore variation in edaphic conditions within and between populations, three soil samples were taken at each sampling location, each collected adjacently to the three plants sampled, using stainless steel soil sampling rings (100 cm 3 in volume, Eijkelkamp, the Netherlands). Soil samples were loosely covered and air-dried until completely dry, before being stored in airtight containers. Soil samples (n = 45) were analyzed for pH, electrical conductivity (EC), and carbon to nitrogen ratio (C/N). Samples were gently ground using a mortar and pestle and passed through a 2-mm sieve. To measure pH, 10 g of sieved soil was transferred to a 50 mL tube and 25 ml of distilled water was added (1:2.5 w/v). Tubes were shaken for 1 min and then every 10 min for 50 min. pH was measured in the upper part of the colloid suspension with a Metrohm 914 pH/Conductometer (Metrohm, Switzerland), calibrated beforehand using three buffers (pH 4, 7, and 9). To measure EC, five grams of sieved soil was dispersed in 25 ml of distilled water (1:5 w/v). Tubes were shaken every 10 min for an hour, then let to rest for 10 min. EC was measured in the upper layer using a Metrohm 914 pH/Conductometer after calibration with one conductivity standard (Reagecon TM, NIST 20 μS/ cm, Fisher Scientific, USA). For C/N, a 15 g subsample of sieved soil was ground to a fine powder using a custom-designed ball mill.
One-hundred milligrams of milled soil was combined with the same weight of tungsten. Soil C/N was measured by Dumas combustion on GmbH, Hanau, Germany). Sulfanilamide was used as a standard in three different weights, and acetanilide was used to calibrate drift after every 15 samples.

| Spatial variables describing geographic variation
A Euclidean distance matrix was created between sampling locations using the vegdist function the R package "vegan" (Oksanen et al., 2017). Geographic variation between populations was then incorporated into downstream analyses by constructing principal components of neighborhood matrices (PCNM) using the pcnm function, also in "vegan," and PCNM one and two are used as explanatory variables to describe spatial structure and are referred to as geographic distance herein (Borcard & Legendre, 2002).
In addition, three broad geographic regions in Denmark were designated to investigate geographic effects at a higher scale. These

| Targeted analyses of known metabolites
Screening for metabolic features in a subset of the overall metabolome offers an advantage when seeking to identify changes in metabolic phenotypes in response to environmental factors.
However, to test the response of metabolites known for their antiherbivory effects in Plantago (Rønsted et al., 2000(Rønsted et al., , 2003, 10 standards were ran using the same UPLC analyses described above and a baseline for how variable these compounds are in ixoroside, 10-acetoxymajoroside (isolated in previous studies by Rønsted et al., 2000Rønsted et al., , 2003 and provided for this study by Søren Rosendal Jensen, Danish Technical University). A standard for plantamajoside was unfortunately not available for this study, which is a metabolite that has previously been found to differ across Danish populations of P. major (Mølgaard, 1986).

| Variation partitioning analysis of metabolic phenotypes
All 197 features were analyzed using variation partitioning (Borcard, Legendre, & Drapeau, 1992) to explore the explanatory power that

| Conditional log-normal model analysis
In order to conduct a statistically valid analysis with power sufficient enough to detect significant differences in metabolic features associated with environmental and geographic variation and address low sampling numbers, we use the two-step model proposed by  In all analyses, we corrected for leaf area (logarithmic scale) and soil pH (quadratic scale). Soil EC and C/N were found to be correlated with soil pH based on Spearman's Rank coefficient (EC, ρ = 0.49, p < 0.001) (C/N, ρ = 0.39, p = 0.001), and therefore, soil pH was used to represent edaphic factors in our modeling.
Conditions set out to run the models (detailed in Supplementary   Information, Appendix S1) resulted in a slightly different number of metabolic features suitable for the modeling of each of the seven explanatory variables. Table 2 lists the number of features included in the analyses for each factor. The number of features tested for each feature was recorded, and Type I errors corrected to q-values using the false discovery rate (FDR, Benjamini & Hochberg, 1995).
The q-values below 20%, 10%, and 5% were recorded, and Tukey post hoc tests were performed on all metabolic features found to have a q-value of less than 20% (FDR q-value < 0.20).
A Venn Diagram with the overlap in significant metabolic features identified (FDR q-value < 0.20) for five of the seven environmental and geographic factors tested was constructed in the package "VennDiagram" in R (Chen, 2016).

| Network analysis of co-occurrence
Co-occurrence networks have been implemented within community ecology using high-throughput sequencing data, providing insight into Here we use this approach on metabolic feature data, using features that were shown to differ significantly with geographic region (30 features in total). Spearman's rank correlations were calculated in a matrix using the "vegan" package of R (R Core Team, 2016; Oksanen et al., 2016), with a Spearman's correlation coefficient (ρ) < 0.372, which was equivalent to a p-value < 0.01 (Junker & Schreiber, 2008) was considered valid co-occurrence between features. A network of features ("nodes") and co-occurrences ("edges") between nodes was created with qgraph in R and visualized using the program Cytoscape (Shannon et al., 2003;Eskamp et al., 2012). Nodes were set as pie charts represent average feature abundance for each geographic region. The network analysis was then used to identify regionally distributed clusters of co-occurring features, which could serve as a metabolic fingerprint.

| Testing for differences in known antiherbivory compounds
Mann-Whitney U tests were performed in R on the output from TargetLynx to test for significant differences between the intensities recorded for the known antiherbivory compounds between undamaged and damaged leaf treatment groups.  being explained by geographic region and distance (Table 2).

| RE SULTS
Geographic region (as defined by three groupings: eastern Jutland, western Jutland, and the southern islands, Figure 1) was found to have the greatest magnitude of effect on differences in the metabolome; thirty of the 193 metabolic features tested in the model were found to differ statistically after FDR correction (q-values < 0.20).
Nine metabolic features of the 193 investigated where found to differ significantly based on geographic distance (as measured by principal coordinates of neighbor matrices), and all but one of these were shared with the features found to differ significantly in modeling of the variable geographic region.
Second to the geographic factors, soil parameters had the next largest effect on differences in metabolic features (Table 2). Eight features were found to differ significantly according to pH. Four features were found to differ significantly in response to habitat type, and three metabolic features were found to differ significantly for leaf area. Light levels (i.e., whether the plants were growing in populations that had either full sun, part shade or full shade), phenological stage (vegetative, immature flowers, or flowering), and damage caused by herbivory did not explain any significant differences between metabolite features (Table 2) Table 3). Seven significant features were found to be shared between the two geographic factors tested, only one feature (SF14) was found to overlap between leaf area and habitat type, and one feature (SF26) was found to overlap between habitat type and the geographic factors. All eight features found to differ significantly under soil pH were not shared with any of other explanatory factors tested (Figure 4).

| Network correlation analysis with features varying by geographic region
To investigate the degree of co-occurrence of the 30 significant features explained by geographic region (Table 3) Residuals: 90% 1 2 TA B L E 3 Significant metabolic features and their q-values listed after FDR correction (q-value < 0.2). A conditional log-normal model was run separately for each of the seven explanatory variables tested and no significant features were found for herbivory or phenology; therefore, these variables are not included in the

| Targeted analyses
Six of the ten known metabolites for which chemical standards were available (aucubin, gardoside, geniposidic acid, melittoside, verbascoside, and 10-acetoxymajoroside) were detected in the unfiltered metabolic feature data, based on the retention times and mass spectra of these metabolites; however, the intensities of these six metabolites did not differ significantly between undamaged and damaged leaf treatment groups based on Mann-Whitney U tests (data not shown). Two of the known metabolites, aucubin and melittoside, were also detected in the filtered dataset of 197 metabolic features, yet were not among the 44 significant features which were identified in our conditional log-normal model.

| Environmental and geographic factors regulating the metabolome
We investigated variation in the foliar metabolome of Plantago major growing under natural environmental conditions across Denmark and identified metabolic features, beyond the traditionally studied antiherbivore metabolites, that differed in response to environmental and geographic factors, at varying spatial scales. Due to its high phenotypic plasticity and ability to adapt to a wide range of environmental conditions (Lotz & Blom, 1986;Rahn, 1996;Wolff, 1991), we hypothesized that diversity in the metabolome could be a factor involved in the plant's success and therefore expected that metabolic features would vary in response to environmental conditions such as differing habitat types, light levels, and soil conditions (Gratani, 2014 This study most notably demonstrated the importance of geographic scaling in detecting differences in metabolic phenotypes, where broad geographic regions (eastern Jutland, western Jutland, and the islands) were found to explain the highest percentage of variation of the metabolome and the largest number of individual features. Geographic factors, including distance, isolation, latitudinal, and longitudinal gradients, have previously been shown to be important in explaining intraspecific variation in targeted metabolites for vascular plants, reflecting possible adaptations to local conditions (Davey, Burrell, Woodward, & Quick, 2008;Moles et al., 2011;Züst et al., 2012). In performing networking analysis on untargeted metabolomic data, the current work also demonstrates unique patterns of co-occurrence, such that many of the metabolite features co-occur in plants sampled within the same geographic region.
These metabolic phenotypes could be a useful approach in identifying features and metabolic pathways associated with factors such as herbivory, other environmental stressors, or genotypic diversity.
In line with our findings, variation in metabolic traits due to geographic factors has previously been associated with genotypic variation in Plantago lanceolata (i.e., Reudler & Elzinga, 2015), and in wild populations of Arabidosis thaliana (L.) Heynh. (Lasky et al., 2012), (Davey et al., 2008). These associations may reflect that plant genotypic variation increases with increasing geographic distance (Nybom, 2004) and highlight that there is an interdependence between genotype and metabolic phenotype (Fiehn, 2002). Significant differences found in metabolic features associated with geographic region in this study could therefore reflect upon underlying genetic variation between populations. However, in widespread weedy species such as P. major, with likely few barriers to its dispersal and a low expected genetic variation within populations (Verhoeven, Macel, Wolff, & Biere, 2010;Wolff & Morgan-Richards, 1998), greater spatial distances are likely required to detect genetic variation Incorporating genetics in future studies may test this assumption.
Climatic and geological conditions, as well as biological factors including herbivore communities and pathogens are also likely to differ across the three geographic regions, thus, it cannot be ruled out that these and/or other unmeasured factors are driving the significant differences in metabolic features we observed between regions. For example, variation in local and regional herbivore communities has been shown to drive differences in metabolic profiles (Agrawal, 2011;Mølgaard, 1986;Züst et al., 2012). Similarly, climatic factors have been shown to explain variation in metabolic profiles across broad geographic regions (Laskey et al., 2012;Reudler & Elzinga, 2015). To further elucidate the effects of spatial scale from other factors, future research should include a more comprehensive suite of biotic and abiotic variables, including climate and genetic data, with a greater emphasis on quantitative data.
Although to a lesser degree, we also found natural variation in soil pH to be an important factor in explaining differences in metabolic phenotypes in P. major. This demonstrates that differences in edaphic conditions within a given habitat may result in differences in the expression of specialized metabolites in the metabolome, even with low population sampling numbers. Nutrient levels and nutrient availability are influenced by and often correlated with soil pH (Sims & Patrick, 1977), and changes in both pH and nutrient availability have been shown to influence the concentrations and composition of secondary metabolites produced in Plantago species (e.g., Darrow & Bowers, 1999;Jarzomski et al., 2000;Miehe-Steier et al., 2015;Pankoke et al., 2015). In the current study, pH significantly accounts for changes in eight unique metabolic features, and no overlap was found with the significant metabolic features explained by geographic or other factors, which suggests these features are associated with local adaptation independent of geographic and genotypic variation (Latzel, Janecek, Dolezal, Klimesova, & Bossdorf, 2014;Metlen, Aschehoug, & Callaway, 2009).
Phenological stage, which has been well documented to influence specialized metabolites in model Plantago species (i.e., Barton, 2008;Hanley et al., 2013;Pankoke et al., 2013;Sutter & Muller, 2011), was not found to be an important factor in explaining metabolic variation in situ. Any differences due to phenological stage, were explained by differences in geographic region, which is a particularly important finding considering the difficultly in standardizing phenological stage when sampling plants growing in situ across great geographic distances.

| Metabolic responses to herbivory
To the best of the authors' knowledge, this is the first study on metabolic responses to localized herbivory, where a paired study design was adopted to investigate undamaged and damaged leaves from the same plant, growing under natural environmental conditions. Even although none of the metabolic features detected in this study were found to differ significantly between undamaged and damaged leaves, metabolites specific to herbivory could be expressed systematically (i.e., a whole plant response), or could simply have been undetectable, and we therefore cannot conclude that polar metabolites in Plantago major are not induced in response to herbivory. Past experiments on target metabolite induction in Plantago have found induction to be short-lived and therefore difficult to detect (i.e., Bowers & Stamp, 1993;Darrow & Bowers, 1999;Fuchs & Bowers, 2004). The timing of the metabolomic screening in our study, as well as in other studies where no or insignificant differences were found (i.e., Jarzomski et al., 2000;Stamp & Bowers, 1996;Sutter & Muller, 2011), could limit detection of induced metabolites. Additional work should aim to investigate important factors such as the time and type of herbivore damage under in situ conditions.

| Model validation
The statistical approach developed here allowed for the conservative testing between metabolic and environmental data that did not follow normal distributions and allowed us to address a metabolomic dataset with many zero values, which typically constrain the use of more traditional statistical analyses for ecological and biological data matrices, and further addressed issues in statistical power with low sampling numbers (Supplementary Information). Rigorous filtering steps were used in our metabolic feature data which substantially reduced the number of features available for statistical analyses.
Although this conservative filtering prior to statistical analyses could result in missing important metabolic features, we believe that our conservative methods, including using FDR correction for multiple testing, improved our confidence levels in identifying significant metabolic differences and removed a high level of possible Type I error, which was particularly important in the absence of controlled experiments. The statistical approach developed here could also be useful in future studies of similar complex ecological data in other systems.

| CON CLUS IONS
A surprisingly limited number of metabolic features were found to be regulated by environmental factors in this study. The greatest differences in metabolic phenotypes were detected across broad geographic regions, and this finding is largely explained by geographic separation, which in turn could reflect genotypic differentiation or variation in other unmeasured climatic and biological factors. This work also provides evidence that a small number of metabolic features are associated with local adaptation to soil pH and are independent of the effects of geography.
No effects of herbivore damage were found between undamaged and damaged leaves of the same plants; however, further studies are required to investigate the induction of specialized metabolites in response to herbivory in situ and to characterize herbivore communities over broad geographic regions. In addition, future metabolomic studies conducted in situ would benefit from greater number of individuals sampled from each population, as well as controlled greenhouse studies being run in tandem, allowing for more rigorous hypothesis testing based of the patterns observed in situ. ArticMass. Søren Rosendal Jensen is thanked for providing standards of melittoside, gardoside, salidroside, arborescoside, geniposidic acid, asperuloside, ixoroside, and 10-acetoxymajoroside.

ACK N OWLED G M ENTS
Birgitte Boje Rasmussen is thanked for running the C/N soil analyses. Per Mølgaard is thanked for initial input to the sampling area design.

CO N FLI C T O F I NTE R E S T
None declared.

AUTH O R S' CO NTR I B UTI O N S
CJB, NIA, and NR conceptualized the study. NIA, NHR, and CJB conducted field sampling. NIA and NHR prepared samples for analyses UPLC analyses, and NIA and HCBH prepared and conducted the soil analyses. FE, and MT produced the metabolic feature data and NIA analyzed the data. BM, CJB, and NIA designed and undertook statistical analyses. NIA wrote the manuscript with CJB, BM, FE, and NR.
All authors read and commented on the manuscript and approved the final version. Authors declare no conflict of interests.

DATA ACCE SS I B I LIT Y
Metabolic feature data and environmental and geographic factor data generated for this study is available from the Dryad Digital Repository : https://doi.org/10.5061/dryad.1q0s4gf.