Identifying microbial drivers in biological phenotypes with a Bayesian network regression model

Abstract In Bayesian Network Regression models, networks are considered the predictors of continuous responses. These models have been successfully used in brain research to identify regions in the brain that are associated with specific human traits, yet their potential to elucidate microbial drivers in biological phenotypes for microbiome research remains unknown. In particular, microbial networks are challenging due to their high dimension and high sparsity compared to brain networks. Furthermore, unlike in brain connectome research, in microbiome research, it is usually expected that the presence of microbes has an effect on the response (main effects), not just the interactions. Here, we develop the first thorough investigation of whether Bayesian Network Regression models are suitable for microbial datasets on a variety of synthetic and real data under diverse biological scenarios. We test whether the Bayesian Network Regression model that accounts only for interaction effects (edges in the network) is able to identify key drivers (microbes) in phenotypic variability. We show that this model is indeed able to identify influential nodes and edges in the microbial networks that drive changes in the phenotype for most biological settings, but we also identify scenarios where this method performs poorly which allows us to provide practical advice for domain scientists aiming to apply these tools to their datasets. BNR models provide a framework for microbiome researchers to identify connections between microbes and measured phenotypes. We allow the use of this statistical model by providing an easy‐to‐use implementation which is publicly available Julia package at https://github.com/solislemuslab/BayesianNetworkRegression.jl.


Introduction
Microbial communities are among the main driving forces of biogeochemical processes in the biosphere.For one, many critical soil processes such as mineral weathering and soil cycling of mineral-sorbed organic matter are governed by mineral-associated microbes [1,2,3,4,5].Additionally, plant and soil microbiome drive phenotypic variation related to plant health and crop production [6,7,8,9].Lastly, the human gut microbiome plays a key role in the regulation of human health and behaviour [10,11,12] and similar host-microbe associations have been studied for lung [13,14] or skin microbiome [15].Understanding the composition of microbial communities and how these compositions shape specific biological phenotypes is crucial to comprehend complex biological processes in soil, plants and humans alike.
Standard approaches to study the connection between microbial communities and biological phenotypes rely on abundance matrices to represent the microbial compositions [16,17,18,19,20,21,22,23]. Different experimental settings are defined and then microbial compositions are measured (as abundances) on each experimental setting.Next, the abundance matrices are used as input in a regression-type (or machine-learning) analysis to relate the microbial community to phenotypes of interest.This standard pipeline, however, has limitations to find real connections between microbes and phenotypes.Many times, these standard approaches focus on the relationship between a single microbial OTU and the phenotype, adjusting for possible confounders like soil mineral characteristics.This univariate procedure has multiple assumptions that are violated by the complexity of the microbiome and can lead to elevated type-I error rates as well as reduced power.For example, univariate analyses of individual OTUs ignore correlations and interactions among the microbial communities, which could lead to power loss if the tested OTU is weakly correlated with an unmeasured relevant OTU.In contrast, high-dimensional regression models allow the inclusion of multiple microbial OTUs simultaneously [24,25,26,27,28], yet these models can be complicated by multicollinearity [29] or instability due to overfitting.Furthermore, standard regression analyses rarely account for interactions among microbes or ignore potential epistasis [30,31,32,33] among genes across the microbiome that can reduce power if not properly modeled [34,35].Finally, microbial OTUs are usually represented as relative abundances (compositional data) which is restricted to sum to 1 and this affects how proportions behave in different experimental settings (e.g.changes in proportions in the microbial composition does not necessarily reflect actual biological changes in the interactions [36]).
Given that relative abundances only provide a snapshot of the composition of the community at the specific time of sampling and do not account for correlations between microbes, microbial interaction networks have been recently preferred to represent microbial communities [37,38,39,40].Yet models to connect a microbial network to a biological phenotype remain unknown.On one side, recent years have seen an explosion of methods to infer microbial networks from a variety of data types [41,42,43,44].However, these methods aim to reconstruct one microbial network and do not attempt to connect this network to any biological phenotypes.On the other side, novel statistical theory has been developed to study samples of networks [45,46,47], yet again, these methods do not aim to understand the connection between the networks and a phenotype of interest.There has only been a handful of new methods that aim to identify associations between a sample of networks (predictors) and a phenotype (response) via a regression framework [48,49,50].These methods, however, have only been studied for brain connectome networks which, unlike microbial networks, are intrinsically dense.In conclusion, methods to find associations between a sample of microbial networks and a biological phenotype remain unknown.
In this paper, we introduce a Bayesian Network Regression (BNR) model that uses the microbial network as the predictor of a biological phenotype (Fig. 1).This model intrinsically accounts for the interactions among microbes and is able to identify influential edges (interactions) and influential nodes (microbes) that drive the phenotypic variability.While the model itself is not new [48,49,50], it has only been studied for brain connectome networks, and thus, its applicability to microbial networks which are inherently more high-dimensional and sparser has not been studied.Here, we test the BNR model on a variety of simulated scenarios with varying degrees of sparsity and effect sizes, as well as different biological assumptions on the effect of the microbes on the phenotype such as additive effects, interactions effects or functional redundancy [51].We show that this model is able to identify influential nodes and edges in the microbial networks that drive changes in the phenotype for most biological settings, but we also identify scenarios where this method performs poorly which allows us to provide practical advice for domain scientists aiming to apply these tools to their datasets.In addition, we implement the method in an open-source publicly available and easy-to-use new Julia package (BayesianNetworkRegression.jl) with online documentation and step-by-step tutorial which will allow scientists to easily apply this model on their own data.The computational speed and efficiency of the package makes it suitable to meet the needs of large datasets.
Main contributions.The Bayesian Network Regression (BNR) model is not new [48,49,50], yet its applicability to microbial datasets has never been explored.Here, we develop the first thorough investigation of whether BNR models are suitable for microbial datasets on a variety of synthetic data that was generated under realistic biological scenarios.Microbial datasets are challenging due to the high-dimension and high sparsity.In addition, we introduce a novel Julia package (BayesianNetworkRegression.jl) with extensive documentation that implements the BNR model and has broad applicability for the microbiome research community.

Model and priors
We use the Bayesian Network Regression model initially defined in [50] to elucidate associations between microbial drivers and biological phenotypes.We repeat below the theoretical details of the model and priors for the sake of completeness.
Let y i denote the scalar continuous phenotype for sample i and let N i be the network that represents the microbial community in sample i.In this work, we assume that these networks have been estimated already and will be assumed to be known without error.That is, at this stage, we do not propagate statistical error in the inference of the microbial network (but see Discussion).For each microbial network, we can compute its adjacency matrix A i ∈ R V ×V where V represents the number of nodes in the microbial network.While [50] assume that all networks must have the same nodes, here we allow different networks to have different nodes so that V represents the total number of nodes that appear in at least one network.
The network regression model is then defined as is the symmetric network coefficient matrix, and • F represents the Frobenius inner product.
Given the symmetric structure of the predictor A i , we can rewrite this model with a design matrix X where the ith row of X is set to be the upper triangle of the ith adjacency matrix A i so that X ∈ R n×q with q = V (V −1)

2
: The responses y ∈ R n then follow a Normal distribution y ∼ N µ + Xγ, τ 2 I n with an overall mean µ ∈ R, regression coefficients γ ∈ R q and error variance τ 2 ∈ R. Here, I n represents the identity matrix of dimension n.We note that the regression coefficients of this model (γ kl ) represent the effect of the edge in the microbial network between node k and node l (interaction effects) in the response and they are connected to the original regression coefficient matrix B as b ij = γ ij /2.The model implicitly assumes that there are no main effects for the presence of the microbes in the sample.While this could be a reasonable assumption for brain research where the model originated [50], it is not appropriate for microbial research where the phenotype is expected to be affected by both the presence of the microbes and their interactions.At this stage, we implement the model without main effects to test its applicability to microbial datasets as is.We will, however, extend the model to the inclusion of main effects as future work (see the Discussion).
The prior for the regression coefficients γ ∈ R q is given by where u 1 , . . ., u V ∈ R R are latent variables corresponding to each of the V nodes, Λ ∈ R R×R is equal to diag(λ 1 , ..., λ R ) for λ i ∈ {−1, 0, 1} and s kl ∈ R is a scale parameter.We note that the effect on the response of the interaction between node k and node l is positive if ).The dimension of the latent variable R is chosen by the user and we find in our simulations that it has a strong effect in the floating-point stability of the implementation (see Simulations).
The matrix Λ governs which entries in the latent variables u k ∈ R R are informative and we set the following prior: with probability π1r 1 with probability π2r −1 with probability π3r for r = 1, . . ., R and with hyper prior (π 1r , π2r , π3r ) ∼ Dirichlet(r η , 1, 1) for η > 1.Note that the probability of 0 in the Dirichlet is governed by the index r (and η) which is meant to bias inference towards lower dimensions.These π parameters control the sparsity of the regression coefficient matrix B. It is traditionally assumed that only a subset of microbes in the sample are key drivers of the phenotype.
To determine which nodes are influential (non-zero effect on the response), we set a spike-and-slab prior [52]: where δ 0 is the Dirac-delta function at 0, M ∈ R R×R is a covariance matrix, 0 is an R-dimensional vector of zeros, and ξ ∈ {0, 1} V is a column vector of dimension V where each value denotes whether or not that node is influential on the response.We assume that ξ k ∼ Bernoulli(∆) with hyper priors Lastly, the prior for the scale parameters (s ∈ R q ) is given by s kl ∼ Exp(θ/2) with hyper prior θ ∼ Gamma(ζ, ι) for shape ζ ∈ R and rate ι ∈ R, and the prior for the overall mean (µ) and error variance (τ 2 ) is assumed to be non-informative π(µ, τ 2 ) ∝ 1 τ 2 .Table 4 in the Appendix contains all parameters in the model and their descriptions.

Posteriors
The posterior distribution of the overall mean and the error variance are given by where 1 n is an n-dimensional vector of ones, W ∈ R q is a vector given by and D ∈ R q×q is a diagonal matrix with the vector of scale parameters s in the diagonal.
The posterior distributions for the scale parameters (s kl ) and their hyper parameter (θ) are given by The posterior distribution for the regression coefficients (γ) is given by Next, for the auxiliary variables, the posterior distribution of the latent variables (u k ) is given by where for γ k = (γ 1k , ..., γ k−1,k , γ k,k+1 , ..., γ kV ) ∈ R q , and N (x|m, v) corresponding to the Gaussian PDF evaluated on x for mean m and covariance v, ∆ given by ×R , 0 is the (V − 1)-dimensional vector of zeros, and matrix M sampled from the posterior distribution: where 1(•) is the indicator function and I R is the identity matrix of dimension R.
In addition, the posterior mean (m u k ) and posterior covariance matrix (Σ u k ) are defined as The posterior probability of the vector ξ is given by with the same definition of w u k as in Eq. 8.
The posterior distribution of the λ r values is given by with probability p 2r −1 with probability p 3r (12) with where N (x|m, v) corresponds to the Gaussian PDF evaluated at x for mean m and covariance v and Lastly, the posterior distribution of the hyper parameters (π 1r , π2r , π3r ) is given by where again 1(•) represents the indicator function.
The main parameters of interest are the regression coefficients γ which represent the effect of the interactions among microbes on the response and the parameters ξ that represent whether each of the V microbes are influential on the response or not.We obtain the posterior probability that a node is influential by taking the mean of ξ over the samples.Samples of the posterior distributions are obtained using Gibbs sampling as described in the next section.

Open-source software
We have released a package in the Julia programming language to perform the sampling scheme which provides posterior estimates and convergence statistics for the Bayesian Network Regression model available as BayesianNetworkRegression.jl at the GitHub repository https://github.com/solislemuslab/BayesianNetworkRegression.jl.In addition, we provide all reproducible scripts for the simulation study (described in the next section) in the GitHub repository https://github.com/samozm/bayesian_network_regression_imp.

Simulations
One of the main objectives of this manuscript is to test the applicability of the Bayesian Network Regression model when facing sparse data that is ubiquitous in microbiome research.There are two main sources of sparsity: 1) the matrix of regression coefficients B is assumed to be sparse (sparsity controlled by π) which means that there are few microbial drivers that affect the phenotype and 2) the data matrix is sparse (represented by the adjacency matrix A i with sparsity controlled by the number of sampled microbes k) because we do not have complete sampling of all microbes.We test different levels of sparsity both on the regression coefficient matrix B (π = 0.3, 0.8) and in the adjacency matrices A i (k = 8, 15, 22 sampled microbes out of 30 total).In addition, we test two levels of effect sizes (µ = 0.8, 1.6) so that the entries in the regression coefficient matrix B are distributed N (µ, σ = 1.0).
We split the simulations into two scenarios: 1) theoretical simulations (graphical description in Fig. 2) and 2) realistic simulations (graphical description in Fig. 3).We describe both scenarios next.We simulate a 30-taxon phylogenetic tree as the representation of the true microbial community, and then select k microbes per sample with which to build an adjacency matrix (A i ) per sample.The phenotype y i is then computed as the Frobenius product of the sample adjacency matrix (A i ) and the true matrix of regression coefficients B plus Gaussian noise.
Theoretical simulations.We simulate a 30-taxon phylogenetic tree using the rtree function from the R package ape [54] that randomly splits edges until the desired number of leaves is attained.This tree represents the true community of microbes.We generate the true matrix of regression coefficients (B ∈ R 30×30 ) by flipping a biased coin for every entry b ij to determine if the edge connecting nodes i and j is an influential edge.We vary the probability of influential node as π = 0.3, 0.8 as already mentioned above.If the edge is indeed set as influential, the entry b ij is sampled from a Normal distribution with mean µ = 0.3 or 0.8 and variance equal to 1.0.
For each sample, we randomly select k (set as 8, 15, or 22) microbes out of the 30 total microbes.We build the adjacency matrix for that sample using the phylogenetic distance between microbes.That is, the entry a ij is equal to the inverse of the phylogenetic distance between microbe i and microbe j if both microbes are present in the sample (and zero otherwise).Note that we are not estimating the phylogenetic tree for a given sample given that we are not simulating sequences, and thus, we are ignoring estimation error in the phylogenetic pipeline at this point.Future work will incorporate this type of error to assess its implications downstream (see the Discussion).
For each sample, we calculate the phenotype y i as the Frobenius product between B and the adjacency matrix for that sample A i plus a Gaussian random error with mean zero and variance of 1.0.Because the generation of the phenotype follows the same model as the Bayesian Network Regression, we denote this scenario as "theoretical".
Lastly, we vary the sample size as n = 100, 500.The whole simulation process is illustrated in Fig. 2 and the mathematical details are described in the Appendix.
Figure 3: Description of realistic simulations.We simulate a 30-taxon phylogenetic tree as the true microbial community, and then select k microbes per sample with which to build an adjacency matrix (A i ) per sample.The phenotype y i is computed under three models: additive, interaction or functional redundancy.Within each model, there are two options for the generation of the true microbial effects: randomly sampled independently of other microbes or phylogenetically-informed in which related microbes have similar effects on the phenotype.
Realistic simulations.Again, we simulate a 30-taxon phylogenetic tree which represents the true community of microbes using the rtree function from the R package ape [54] that randomly splits edges until the desired number of leaves is attained.For each microbe, we simulate its true effect on the phenotype under two settings: 1) random effect in which each microbe has an effect b i that is distributed Normal with mean µ b = 0.8 or 1.6 and variance of 1.0 independently of other microbes, and 2) phylogenetic effect in which we simulate the whole vector of microbial effects b as a Brownian motion on the phylogenetic tree using the Julia package PhyloNetworks [55].That is, b ∼ N (µ b 1 30 , Σ) where µ b = 0.8 or 1.6, 1 30 is a 30-dimensional vector of ones, and Σ is the covariance matrix imposed by the phylogenetic tree.In this last setting, microbes that are closely related have similar effects on the phenotype.
The generation of the sample adjacency matrices (A i ) is the same as in the theoretical scenario.Namely, for each sample, we randomly select k (set as 8 or 22) microbes out of the 30 total microbes.We build the adjacency matrix for that sample using the phylogenetic distance between microbes.That is, the entry a ij is equal to the inverse of the phylogenetic distance between microbe i and microbe j if both microbes are present in the sample (and zero otherwise).
For the computation of the phenotype, we test three settings: Additive model.The phenotype y i is computed as the sum of effects b i for the influential microbes that are present in the sample plus Gaussian noise.As before, microbe is considered influential with probability π = 0.3, or 0.8.
Interaction model.In addition to the main effects already described in the additive model, the phenotype also contains interaction terms a jl × b jl , with b jl ∼ N (0.4, 1.0) if microbes j and l are both influential and a jl as the inverse of phylogenetic distance between microbes j and l if both are in the sample.Since the interaction term is positive, this model is sometimes referred to as a super-additive model.
Functional redundancy model.Here, we assume that the effect of the microbes on the phenotype is not unbounded.That is, different microbes can have the same function, and thus, the phenotype is not affected by both microbes at the same time.We model this mathematically by imposing a threshold L on the phenotype after it was computed following the interaction model.In this setting, we cannot impose the same threshold on all combinations of µ b and π because the phenotype values will have different ranges.Therefore, we utilize different thresholds for each setting to try to guarantee that not all response values will be capped: L = 3 for the π = 0.3, µ = 0.8 case, L = 7 for the π = 0.3, µ = 1.6 case, L = 22 for the π = 0.8, µ = 0.8 and L = 30 for π = 0.8, µ = 1.6.
Because the phenotype is not computed using the Frobenius product directly (as in the theoretical simulations), but instead it is generated based on biologically reasonable settings (additive, interaction and functional redundancy models), we denote these scenarios as "realistic".
Lastly, we vary the sample size as n = 500, 1000.The whole simulation process is illustrated in Fig. 3 and the mathematical details are described in the Appendix.

MCMC convergence
For each simulation setting, we run three MCMC chains and assess convergence using the R convergence criterion proposed in [53].We consider convergence to have been achieved if R ≤ 1.2 for all of the γ and ξ variables.For most simulation settings, 30,000 burn-in followed by 20,000 samples (50,000 total generations) is enough to guarantee convergence.In eleven cases (Tables 1 and 2), longer chains are needed to achieve convergence.See Results for information on computing times.The number of MCMC samples post burn-in is 20,000 for all cases.All other simulation cases required 30,000 burn-in (50,000 total) generations to achieve convergence.The number of MCMC samples post burn-in is 20,000 for all cases.All other simulation cases required 30,000 burn-in (50,000 total) generations to achieve convergence.

Theoretical Simulations
Fig. 4 shows the posterior probability of being an influential node (key microbe) for different sample sizes (n = 100, 500), number of samples microbes (k = 8, 22), effect sizes (µ = 0.8, 1.6) and sparsity (π = 0.3, 0.8) in the regression coefficient matrix B. Each bar corresponds to one node (microbe) and the bars are colored depending on whether the node is truly influential (dark) or not influential (light).For smaller sample size (n = 100), the effect sizes need to be larger (µ = 1.6) for the nodes to be accurately detected as influential (tall dark bars).For a larger sample size (n = 500), the model has a high PP for truly influential nodes (tall dark bars) and a low PP for non-influential nodes (short light bars) regardless of the values of k, µ, π.There seem to be no major differences in the performance of the method in terms of regression coefficient sparsity (π) or adjacency matrix sparsity (k) with the exception that for smaller effect sizes (µ = 0.8) in small sample size setting (n = 100), less sparsity in A (k = 22) improves the detection of influential nodes compared to k = 8.
We further compare the performance when changing the latent dimension (R).In Fig. 4, we have a latent dimension of R = 7 which produced more accurate results than R = 5 (Fig. 20 in the Appendix).Latent dimension of R = 9 (Fig. 21 in the Appendix) produces slightly better accuracy compared to R = 7, but it also creates floating-point instability in the code.As [50] suggest, we aim to find the smallest R value that produces good performance to guarantee floating-point stability.In our case, we choose a latent dimension of R = 7 for all the remaining of the simulations.Lastly, results for k = 15 sampled microbes can be found in the Fig. 22 in the Appendix with no considerable differences with respect to R.
We conclude that the method is able to detect influential nodes (microbes) for a sufficiently large sample size (n = 500 in here) regardless of the effect size (µ) and sparsity (π) of the regression coefficient B and regardless of the sparsity of the adjacency matrix (k).For smaller sample size (n = 100), either larger effect sizes are needed (µ = 1.6) or less sparsity in the adjacency matrix A (k = 22).The color of the intervals depends on whether it intersects zero (light) and hence estimated to be non-influential or does not intersect zero (dark) and hence estimated to be influential by the model.These panels allow us to visualize false positives (dark intervals on the top panel) or false negatives (light intervals on the bottom panel) for all simulation settings (n, k, π, µ).Smaller samples size (top n = 100) has considerably more false negatives compared to larger sample size (bottom n = 500) as evidenced by the many light intervals in the "True influential edges" panels.This is true especially for the cases of high sparsity in B (π = 0.3 for both µ = 0.8 and 1.6) and low effect size with less sparsity in B (π = 0.8, µ = 0.8).Overall, all simulation settings show controlled false positive rate as evidenced by few dark intervals on the "True non-influential edges" panels, regardless of sample size (n), effect size (µ) and sparsity in B (π). Fig. 6 shows the same plot for k = 22 sampled microbes instead of k = 8.The conclusions are the same which provides evidence that the identification of influential edges does not depend on the number of microbes in the samples.We plot the credible intervals for the regression coefficients per edge ordered depending on whether they are truly non-influential edges (top of each panel) or truly influential edges (bottom of each panel).The color of the intervals depends on whether it intersects zero (light) and hence estimated to be non-influential or does not intersect zero (dark) and hence estimated to be influential by the model.These panels allow us to visualize false positives (dark intervals on the top panel) or false negatives (light intervals on the bottom panel).We find similar conclusions as Fig. 5: controlled false positive rate for all settings and improved false negative rate with increased sample size (bottom n = 500).
for smaller sample size (n = 100).The actual bij coefficients appear to be slightly overestimated based on the range (0-6 or 0-8 for the MAP and 0-4 for the true B).
Fig. 8 shows the false positive and false negative rates for influential edges and nodes for different simulation settings in terms of n, k, π, µ.With increased sample size (n = 500), the false positive rates for edges and for nodes are zero as well as the false negative rate for nodes.Only the false negative rate for edges remains high for n = 500 for the case of small effect size (µ = 0.8).The number of sampled microbes (k) does not appear to have any effect in the rates, at least for the case of n = 500.The estimated B matrix has a similar heatmap to the true B in all settings of k, π, µ for large sample size (n = 500) and for large effect size (µ = 1.6) for smaller sample size (n = 100).The actual bij coefficients appear to be slightly overestimated based on the range (0-6 for the MAP and 0-4 for the true B).
Lastly, Fig. 23 (Appendix) shows the mean square error for the estimated coefficients and the responses.The MSE of the coefficients seems to be better when the edge effects are weaker (µ = 0.8) or when there is more sparsity (π = 0.3) whereas the MSE of the responses is similar for all values of π and µ.With increased sample size (n = 500), there are no more false positives for edges and for nodes and there are no false negatives for nodes.Only the false negative rate for edges remains high for n = 500 for the case of small effect size (µ = 0.8).

Realistic simulations
Additive model Fig. 9 and Fig. 24 (Appendix) show the posterior probability of influential nodes for random coefficients and phylogenetic coefficients respectively.Both types of coefficients produce similar results.The model performs poorly for k = 8 sampled nodes as it is unable to identify influential nodes regardless of sample size (n) and characteristics of B (π, µ).For k = 22 sampled microbes, the model behaves better for increased sample size (n = 1000) except for high sparsity and low effect size (π = 0.3, µ = 0.8) for the case of random coefficients and high sparsity (π = 0.3) for the case of phylogenetic coefficients.That is, if the biological phenotype is generated under an additive model with random or phylogenetic coefficients, the BNR model is able to successfully detect influential nodes only under large sample sizes (n = 1000), large number of sampled microbes (k = 22) and low sparsity in the regression coefficient matrix (π = 0.8).
Fig. 10 shows the credible intervals for edge effects under the additive model with random coefficients for k = 8 sampled microbes (see Fig. 25 in the Appendix for the case of phylogenetic coefficients which has very similar results).
Given that the additive model does not have interaction (edge) effects, these panels allow us to visualize false positives (dark intervals).The false positive rate does not appear to be affected by sample size (top vs bottom) with higher rate for the cases of larger effect sizes (µ = 1.6) or less sparsity (π = 0.8).Note that the parameter µ in the additive simulations corresponds to the effect of the nodes, not the effect of the edges (as in the theoretical simulations).So, here, we can conclude that larger main (node) effects also drive a higher false positive rate in the edge effects.Fig. 11 shows the credible intervals for edge effects under the additive model with random coefficients for k = 22 sampled microbes (see Fig. 26 in the Appendix for the case of phylogenetic coefficients which has very similar results).We observe a reduced false positive rate in all settings compared to k = 8 sampled microbes (Fig. 10).That is, if the biological phenotype is generated under an additive model with random or phylogenetic coefficients, the BNR model is able to identify that there are indeed no interaction (edge) effects only when sufficiently many microbes have been sampled (k = 22).

Interaction model
Fig. 12 (for random coefficients) and Fig. 13 (for phylogenetic coefficients) show the posterior probabilities of influential nodes under the interaction model.Both types of coefficients show similar performance when there is low sparsity in the B matrix (π = 0.8).Namely, the method estimates a high PP for truly influential nodes (tall dark bars) and a low PP for non-influential nodes (short light bars) for all cases of low sparsity in B (π = 0.8) regardless of sample size (n), number of sampled microbes (k) or effect size (µ).For scenarios of high sparsity in B (π = 0.3), both types of coefficients behave differently with phylogenetic coefficients showing more signal to detect influential nodes Each bar corresponds to one node (microbe) and the bar are colored depending on whether the node is truly influential (dark) or not influential (light).The model performs poorly for k = 8 sampled nodes as it is unable to identify influential nodes regardless of sample size (n) and characteristics of B (π, µ).For k = 22 sampled microbes, the model behaves better for increased sample size (n = 1000) except for high sparsity and low effect size (π = 0.3, µ = 0.8).Phylogenetic coefficients (Fig. 24) have similar performance.
For random coefficients, sample size needs to be n = 1000 for influential nodes to be detected under k = 22 sampled microbes (Fig. 12).Sampling fewer nodes (k = 8) produces worse results for both types of coefficients with high false positive rate for random coefficients for all sample sizes (Fig. 12) and high false positive rate for phylogenetic coefficients only under the µ = 1.6, n = 1000 setting (Fig. 13).It is interesting to note that under the phylogenetic coefficients with k = 8 sampled microbes, the false positive rate is worse for n = 1000 than for n = 500 when µ = 1.6.This implies that if few microbes are sampled, larger sample sizes produce a larger false positive rate when the effect of the nodes is large (µ = 1.6).
To conclude, when the biological phenotype is generated under an interaction model, the BNR method is able to accurately identify influential microbes under low sparsity settings (π = 0.8) regardless of the type of coefficients (random vs phylogenetic), regardless of number of sampled microbes (k = 8, 22) and regardless of sample size (n = 500, 1000).For high sparsity settings (π = 0.3), phylogenetic coefficients have more signal to identify influential nodes regardless of sample size (n), number of microbes sampled (k), or effect sizes (µ), yet there is an inflated false positive rate with larger sample sizes (n = 1000) for the case of large effect size (µ = 1.6) with few sampled microbes (k = 8).Random coefficients provide no strong signal to detect the influential nodes (high false positive rate) under high sparsity (π = 0.3) unless there are k = 22 microbes sampled with large sample size (n = 1000).We plot the credible intervals for the regression coefficients per edge.In the additive model, all edges are non-influential.The color of the intervals depends on whether it intersects zero (light) and hence estimated to be non-influential or does not intersect zero (dark) and hence estimated to be influential by the model.Given that the additive model does not have interaction (edge) effects, these panels allow us to visualize false positives (dark intervals).The false positive rate does not appear to be affected by sample size (top vs bottom) with higher rate for the cases of larger effect sizes (µ = 1.6) or less sparsity (π = 0.8).
false positive rate as evidenced by few dark intervals on the "True non-influential edges" panels for all sample sizes (n), sparsity levels (π) and effect sizes (µ).We highlight that it is expected that there will be few differences when comparing the two effect sizes (µ = 0.8, 1.6) as these quantities refer to the main (node) effects, not the interaction (edge) effects which was set as 0.4 for all simulations.The purpose of these simulations is to test if changes in the main (node) effects biased the performance of the BNR model to detect influential edges.It appears from these figures that there is no such bias.That is, when the biological phenotype is generated under the interaction model, the BNR has good performance to identify the influential edges regardless of the sample size (n), the number of microbes sampled  We plot the credible intervals for the regression coefficients per edge.In the additive model, all edges are non-influential.The color of the intervals depends on whether it intersects zero (light) and hence estimated to be non-influential or does not intersect zero (dark) and hence estimated to be influential by the model.Given that the additive model does not have interaction (edge) effects, these panels allow us to visualize false positives (dark intervals).We observe a decreased false positive rate when compared to k = 8 sampled microbes (Fig. 10).
(k), the sparsity level (π), node effect sizes (µ), and type of coefficient (random vs phylogenetic) with controlled false positive and false negative rates in all settings.

Functional redundancy model
Fig. 15 shows the posterior probability of influential nodes under the functional redundancy model with random coefficients.We observe that there is a high false positive rate in all settings (tall light bars) except for low sparsity (π = 0.8) with few sampled nodes (k = 8) or small effect size with high sparsity (µ = 0.8, π = 0.3) and more sampled nodes (k = 22).This behavior is similar with phylogenetic coefficients (see Fig. 30 in the Appendix).This implies that ) and both sample sizes (n = 500, 1000).In this setting (π = 0.3), there is a high false negative rate (short dark bars) for k = 22, n = 500 which is improved with larger sample size (k = 22, n = 1000).Interestingly, all cases of high sparsity (π = 0.3) for k = 22, n = 500 have good performance with phylogenetic coefficients (Fig. 13).
the BNR model is unable to identify influential microbes under a model of functional redundancy unless there are very few sampled microbes (k = 8), but many expected to be influential (π = 0.8).This result could be explained by the fact that multiple sampled microbes could cause the phenotype to reach the threshold more easily and thus, there is less information on the variability of the response to estimate the effects and influential probabilities.
Fig. 16 shows the credible intervals for edge effects under the functional redundancy model with random coefficients for k = 8 sampled nodes.The results are very similar for phylogenetic coefficients (Fig. 31 for k = 8 in the Appendix).Namely, there is controlled false positive rate (dark intervals in the "True non-influential edges" top panels) for the case of small sample size (top: n = 500) which strangely worsens slightly for larger sample size (bottom: n = 1000) for all cases of π, µ.False negative rate (light intervals in the "True influential edges" top panels) appears unaffected by n, π, µ.When k = 22 microbes are sampled instead, the model performs worse in all settings.Fig. 17 shows the credible intervals for edge effects under the functional redundancy model with random coefficients for k = 22 sampled nodes.The results are very similar for phylogenetic coefficients (Fig. 32 for k = 22 in the Appendix).Namely, inflated false positive and false negative rates for all settings of n, π, µ.Similarly to the identification of influential nodes, it seems that the BNR model is unable to accurately identify influential edges under a functional redundancy model when many nodes are sampled (k = 22).That is, when the biological phenotype is generated under the functional redundancy Each bar corresponds to one node (microbe) and the bar are colored depending on whether the node is truly influential (dark) or not influential (light).Just as in the case of random coefficients (Fig. 13), the model has a high PP for truly influential nodes (tall dark bars) a low PP for non-influential nodes (short light bars) for all cases of low sparsity in B (π = 0.8) regardless of sample size (n), number of sampled microbes (k), or effect size (µ).For high sparsity in B (π = 0.3), there is a high false positive rate (tall light bars) when k = 8 microbes are sampled (top) only for large effect size (µ = 1.6) and large sample size (n = 1000).Unlike with random coefficients (Fig. 12), the method with phylogenetic coefficients shows good performance in high sparsity settings (π = 0.3) when k = 22 microbes are sampled.
model, the BNR has good performance to identify influential nodes and influential edges only when there are few microbes sampled (k = 8) and low sparsity in B (π = 0.8).
False positive and negative rates Fig. 18 shows the false positive and false negative rates for edges and nodes for different simulation settings in terms of n, k, π, µ for additive (top), interaction (middle) and functional redundancy (bottom) models with random coefficients.An interaction model with low sparsity in B (π = 0.8) shows the best performance in terms of controlled false positive and false negative rates for all settings of n and µ.Under the interaction model, there is inflated false negative rate of edges which could be due to the fact that the model does not include main (node) effects.Overall, sampling more nodes  We plot the credible intervals for the regression coefficients per edge ordered depending on whether they are truly non-influential edges (top of each panel) or truly influential edges (bottom of each panel).The color of the intervals depends on whether it intersects zero (light) and hence estimated to be non-influential or does not intersect zero (dark) and hence estimated to be influential by the model.These panels allow us to visualize false positives (dark intervals on the top panel) or false negatives (light intervals on the bottom panel).The model has a low false positive rate of influential edges for all settings of n, k, π, µ, except for n = 500, π = 0.3, µ = 1.6.This behavior is similar for k = 22 sampled microbes (Fig. 27) and for phylogenetic coefficients (Fig. 28 for k = 8 and Fig. 29 for k = 22).Accuracy to detect influential edges (dark intervals on the lower "True influential edges" panels) is also similar across all simulation settings.Each bar corresponds to one node (microbe) and the bar are colored depending on whether the node is truly influential (dark) or not influential (light).The model shows high false positive rate in all settings except for low sparsity (π = 0.8) in small number of sampled microbes (k = 8), or high sparsity with small effect (π = 0.3, µ = 0.8) with large number of sampled nodes (k = 22).This behavior is similar with phylogenetic coefficients (Fig. 30).
In general, the BNR model with random coefficients is able to accurately detect influential nodes and edges when there are truly interactions effects (Fig. 18 middle: interaction model), especially when there is low sparsity of the coefficient matrix B (π = 0.8).Under the additive model, the BNR suffers from high false positive rate of nodes Fig. 19 shows the false positive and false negative rates for edges and nodes for different simulation settings in terms of n, k, π, µ for additive (top), interaction (middle) and functional redundancy (bottom) models with phylogenetic coefficients.Again, an interaction model shows the best performance in terms of controlled false positive and false negative rates for all settings of π, n and µ.Under the interaction model (middle), there is inflated false negative rate of edges which could be due to the fact that the model does not include main (node) effects.Furthermore, there is better controlled false positive rate of nodes (dark purple bars) in the π = 0.3 case than in the random coefficients case (Fig. 18) which means that when the effects of microbes are expected to be phylogenetically-informed, the BNR model is able to accurately identify the influential nodes compared to effects of microbes that are randomly assigned.Under the additive model (top), there is controlled false positive rate of nodes in all settings of k = 22 which is not true when coefficients are random (Fig. 18) The results for the functional redundancy model (bottom) are very similar to those with the random coefficients (Fig. 18).Namely, inflated false positive and false negative rates on all settings of k = 22, and only controlled rates under the k = 8 sampled microbes with low sparsity π = 0.8.Each panel corresponds to a scenario of π = 0.3, 0.8 (which controls the sparsity of the regression coefficient matrix B) and µ = 0.8, 1.6.We plot the credible intervals for the regression coefficients per edge ordered depending on whether they are truly non-influential edges (top of each panel) or truly influential edges (bottom of each panel).The color of the intervals depends on whether it intersects zero (light) and hence estimated to be non-influential or does not intersect zero (dark) and hence estimated to be influential by the model.These panels allow us to visualize false positives (dark intervals on the top panel) or false negatives (light intervals on the bottom panel).

Convergence checks
As mentioned, for each simulation setting, we run three MCMC chains and assess convergence using the R convergence criterion proposed in [53].We consider convergence to have been achieved if R ≤ 1.2 for all of the γ and ξ parameters.
For the theoretical simulations, the maximum R is 1.18 for ξ and 1.19 for γ.For the realistic simulations, the maximum R value is 1.20 for γ and 1.13 for ξ.Maximum R values for each realistic simulation type are given in Table 3.

Discussion
In this work, we present the first deep investigation of the applicability of the Bayesian Network Regression (BNR) model on microbiome data.In addition, we introduce the first user-friendly implementation of the BNR model in an open source well-documented Julia package BayesianNetworkRegression.jl available on GitHub https: //github.com/solislemuslab/BayesianNetworkRegression.jl.The model performs well under a variety of settings of data sparsity (sparsity in network adjacency matrix A) and sparsity of influential drivers (sparsity in the coefficient regression matrix B) when the model generating the simulated data matches the BNR model (denoted theoretical simulations).Indeed, the model is able to identify influential nodes (microbes) and influential edges (interactions of microbes) regardless of the sparsity of A and B for sample sizes of n = 500, and more surprisingly, in spite of not including main (node) effects.For smaller sample sizes (n = 100), influential nodes and edges can still be detected as long as the effect sizes are large (µ = 1.6).It is important to note that the model has a low false positive rate in all settings which prevents the identification of spurious microbial drivers in the phenotypes of interest.
When the model generated the simulated data did not match the BNR model (realistic simulations), the conclusions varied depending on the specific generating model.For example, if the biological phenotype is generated under an additive model with random or phylogenetic coefficients, the BNR model is able to successfully detect influential nodes only under large sample sizes (n = 1000), large number of sampled microbes (k = 22) and low sparsity in the regression coefficient matrix (π = 0.8).Under this same additive model, there are no influential edges (interactions of microbes) and the BNR model is able to control the false positive rate of edges only when sufficiently microbes have been sampled (k = 22).This result is important as it highlights that the model is able to identify influential nodes even in the absence of main (node) effects.
When the biological phenotype is generated under an interaction model, the BNR method is able to accurately identify influential microbes under low sparsity settings (π = 0.8) regardless of the type of coefficients (random vs phylogenetic), regardless of number of sampled microbes (k = 8, 22) and regardless of sample size (n = 500, 1000).For high sparsity settings (π = 0.3), phylogenetic coefficients appear to have more signal than random coefficients to identify influential nodes regardless of sample size (n), number of microbes sampled (k), or effect sizes (µ), yet there is an inflated false positive rate with larger sample sizes (n = 1000) for the case of large effect size (µ = 1.6) with few sampled microbes (k = 8).Random coefficients provide no strong signal to detect the influential nodes (high false positive rate) under high sparsity (π = 0.3) unless there are k = 22 microbes sampled with large sample size (n = 1000).In addition, the BNR has good performance to identify the influential edges regardless of the sample size (n), the number of microbes sampled (k), the sparsity level (π), node effect sizes (µ), and type of coefficient (random vs phylogenetic) with controlled false positive and false negative rates in all settings.
Lastly, under the functional redundancy model, the BNR model is unable to identify influential microbes unless there are very few sampled microbes (k = 8), but many expected to be influential (π = 0.8).This result could be explained by the fact that multiple sampled microbes could cause the phenotype to reach the threshold more easily and thus, there is less information on the variability of the response to estimate the effects and influential probabilities.Similarly, the BNR model has good performance to identify influential nodes and influential edges only when there are few microbes sampled (k = 8) and low sparsity in B (π = 0.8).
The main take-home message is that the current version of the BNR model (Equation 1), even without including main (node) effects, can accurately identify influential nodes (microbes) and influential edges (interactions among microbes) under most realistic biological settings, but it requires large sample sizes (n = 1000 here).Future work will involve the extension of the model to incorporate main (node) effects as well as to incorporate downstream estimation error of the adjacency matrix which is currently taken as perfectly reconstructed from the phylogenetic tree.
Where is the real data analysis?In this work, we are only able to test the BNR model on simulations, but not on real data, which is unusual.We discovered that the real data necessary to fit this model was not readily available in most data repositories.For example, in [56], they have 50 samples of microbial relative abundances data divided among 6 treatments, and 10 measured phenotypes.A single network was reconstructed from all 50 samples, but then subnetworks were extracted for each of the 6 treatments.The microbial networks could be used as predictors of the 10 different phenotypes, however, under this setup, there are only 6 samples (pairs of network predictor and phenotype value), not enough to fit a BNR model based on our simulations.While the lack of real data to apply the model could suggest that the model is not satisfying any real need in the microbiome community, we believe that this is a case of "build it and they will come".Until now, real data analyses of microbiome data involved one matrix of relative abundances which is used to estimate one co-occurrence microbial network.The assumption behind this one co-occurrence matrix is that correlations represent interactions, and these interactions are global.That is, the interactions will appear in all contexts and all samples.
In recent years, microbiome researchers believe that the interactions are context-dependent and that there will be different interactions on different environmental conditions.These different interactions would produce different microbial networks, each in turn associated with a specific biological phenotype of interest.BNR is the ideal model to represent this setting as it requires a sample of networks with edge variability (which violates the global interactions assumption of most public microbiome datasets) and it requires each of the microbial networks to be associated with a phenotype value.The downside is that the model requires hundreds of these network-phenotype samples.While these large sample conditions appear to be outside the norm of current observational microbiome research, we believe that the BNR model could be quite useful to identify key microbial drivers of biological phenotypes in experimental settings.Indeed, scientists can design experiments with k microbes and then, measure the phenotype of interest.Different samples would correspond to different replicates under the same set (or different) of microbes.This setup actually aligns with our simulation study where we have k microbes per sample and the true phenotype value is computed only using those k microbes.Future work will incorporate other covariates into the model corresponding to replicate, experimental or environmental conditions.Furthermore, in conjunction with data augmentation techniques, the BNR model could be applied on a variety of real datasets, as this will be another line of future work.Despite its limitations, we view the BNR model as one novel tool that microbiome researchers could utilize to identify key microbiome drivers in biological phenotypes of interest given its robustness and accuracy under a variety of real-life conditions.
• Σ = I 30 (case of random coefficients) • Σ governed by the phylogenetic tree using the Julia package PhyloNetworks [55] (case of phylogenetic coefficients) (c) b ij ∼ Normal(0.4,1) 3.For every sample i = 1, . . ., n calculate the adjacency matrix (A i ) as follows (a) Select k microbes randomly K * = {j 1 , j 2 , . . ., j k } (b) For p, q ∈ K * , let d p,q be the phylogenetic distance between microbes p and q so that the (p, q) entry in the adjacency matrix is given by . For every sample i = 1, . . ., n, calculate the response value as follows: where 1(•) is the indicator function.
At the end of the simulating algorithm, we have a collection of pairs (y 1 , A 1 ), . . ., (y n , A n ) to be used as data in the BNR implementation.

A.3.3 Functional Redundancy Simulations
1. Simulate a 30-taxon phylogenetic tree using the rtree function from the R package ape [54] 2. Simulate whether a specific node is influential, its true main effect for i = 1, . . ., 30 and all interaction effects j = 1, . . ., 30: • Σ governed by the phylogenetic tree using the Julia package PhyloNetworks [55] (case of phylogenetic coefficients) (c) b ij ∼ Normal(0.4,1) 3.For every sample i = 1, . . ., n calculate the adjacency matrix (A i ) as follows (a) Select k microbes randomly K * = {j 1 , j 2 , . . ., j k } (b) For p, q ∈ K * , let d p,q be the phylogenetic distance between microbes p and q so that the (p, q) entry in the adjacency matrix is given by . For every sample i = 1, . . ., n, calculate the response value as follows: where 1(•) is the indicator function.
At the end of the simulating algorithm, we have a collection of pairs (y 1 , A 1 ), . . ., (y n , A n ) to be used as data in the BNR implementation.with best performance for n = 500, or for high coefficients µ = 1.6 for n = 100.We also notice here that a higher R (R = 9) helps identify influential microbes on smaller effect sizes (µ = 0.8, k = 22, n = 100) compared to the same setting in Fig. 20.The downside of a larger latent dimension is less floating-point stability.Each bar corresponds to one node (microbe) and the bar are colored depending on whether the node is truly influential (dark) or not influential (light).As with random coefficients (Fig. 9), the model performs poorly for k = 8 sampled nodes as it is unable to identify influential nodes regardless of sample size (n) and characteristics of B (π, µ).For k = 22 sampled microbes, the model behaves better for increased sample size (n = 1000) except for high sparsity and small effects (π = 0.3, µ = 0.8).with small number of sampled nodes (k = 8).This behavior is similar with random coefficients (Fig. 15).

Figure 1 :
Figure 1: Graphical abstract.Samples contain a measured phenotype (e.g.height in plants) y i and a microbial network as predictor which is converted into its adjacency matrix A i .The Bayesian Network Regression model infers the regression coefficient matrix B with the maximum a posteriori (MAP) and the posterior probability of being an influential node for every node.

Figure 2 :
Figure2: Description of theoretical simulations.We simulate a 30-taxon phylogenetic tree as the representation of the true microbial community, and then select k microbes per sample with which to build an adjacency matrix (A i ) per sample.The phenotype y i is then computed as the Frobenius product of the sample adjacency matrix (A i ) and the true matrix of regression coefficients B plus Gaussian noise.

Figure 4 :
Figure 4: Posterior probability of influential nodes (theoretical simulations).Different panels represent different number of sampled microbes (k = 8, 22) which controls the sparsity of the adjacency matrix and different sample sizes (n = 100, 500).Within each panel, we have four plots corresponding to the two values of edge effect size (µ = 0.8, 1.6) and two values of probability of influential node (π = 0.3, 0.8) which controls the sparsity of the regression coefficient matrix (B).Each bar corresponds to one node (microbe) and the bars are colored depending on whether the node is truly influential (dark) or not influential (light).As expected, the model has a high PP for truly influential nodes (tall dark bars) and a low PP for non-influential nodes (short light bars) for the case of n = 500 regardless of the values of k, µ, π.

Fig. 5
Fig.5shows the credible intervals for the regression coefficients per edge ordered depending on whether they are truly non-influential edges (top of each panel) or truly influential edges (bottom of each panel) for k = 8 sampled microbes.The color of the intervals depends on whether it intersects zero (light) and hence estimated to be non-influential or does not intersect zero (dark) and hence estimated to be influential by the model.These panels allow us to visualize false positives (dark intervals on the top panel) or false negatives (light intervals on the bottom panel) for all simulation settings (n, k, π, µ).Smaller samples size (top n = 100) has considerably more false negatives compared to larger sample size (bottom n = 500) as evidenced by the many light intervals in the "True influential edges" panels.This is true especially for the cases of high sparsity in B (π = 0.3 for both µ = 0.8 and 1.6) and low effect size with less sparsity in B (π = 0.8, µ = 0.8).Overall, all simulation settings show controlled false positive rate as evidenced by few

Figure 5 :
Figure5: Credible intervals for edge effects for k = 8 sampled nodes (theoretical simulations).Top: Sample size of n = 100.Bottom: Sample size of n = 500.Each panel corresponds to a scenario of π = 0.3, 0.8 (which controls the sparsity of the regression coefficient matrix B) and µ = 0.8, 1.6.We plot the credible intervals for the regression coefficients per edge ordered depending on whether they are truly non-influential edges (top of each panel) or truly influential edges (bottom of each panel).The color of the intervals depends on whether it intersects zero (light) and hence estimated to be non-influential or does not intersect zero (dark) and hence estimated to be influential by the model.These panels allow us to visualize false positives (dark intervals on the top panel) or false negatives (light intervals on the bottom panel).Smaller samples size (top n = 100) has considerably more false negatives compared to larger sample size (bottom n = 500) especially for the cases of high sparsity in B (π = 0.3 for both µ = 0.8 and 1.6) and low effect size with less sparsity in B (π = 0.8, µ = 0.8).

Fig. 7
Fig.7shows the heatmaps of MAP estimates of B compared to the true heatmap of B. The estimated B matrix has a similar heatmap to the true B in all settings of k, π, µ for large sample size (n = 500) and for large effect size (µ = 1.6)

Figure 6 :
Figure6: Credible intervals for edge effects for k = 22 sampled nodes (theoretical simulations).Top: Sample size of n = 100.Bottom: Sample size of n = 500.Each panel corresponds to a scenario of π = 0.3, 0.8 (which controls the sparsity of the regression coefficient matrix B) and µ = 0.8, 1.6.We plot the credible intervals for the regression coefficients per edge ordered depending on whether they are truly non-influential edges (top of each panel) or truly influential edges (bottom of each panel).The color of the intervals depends on whether it intersects zero (light) and hence estimated to be non-influential or does not intersect zero (dark) and hence estimated to be influential by the model.These panels allow us to visualize false positives (dark intervals on the top panel) or false negatives (light intervals on the bottom panel).We find similar conclusions as Fig.5: controlled false positive rate for all settings and improved false negative rate with increased sample size (bottom n = 500).

Figure 7 :
Figure 7: Heatmap of MAP estimates for B (theoretical simulations).Different panels represent different number of sampled microbes (k = 8, 22) which controls the sparsity of the adjacency matrix and different sample sizes (n = 100, 500).Within each panel, we have four plots corresponding to the two values of edge effect size (µ = 0.8, 1.6) and two values of probability of influential node (π = 0.3, 0.8) which controls the sparsity of the regression coefficient matrix (B).Each panel has a heatmap of the MAP estimates for B and on the bottom, we have the true values of B.The estimated B matrix has a similar heatmap to the true B in all settings of k, π, µ for large sample size (n = 500) and for large effect size (µ = 1.6) for smaller sample size (n = 100).The actual bij coefficients appear to be slightly overestimated based on the range (0-6 for the MAP and 0-4 for the true B).

Figure 8 :
Figure8: False positive and false negative rates for influential edges and nodes (theoretical simulations).X axis corresponds to the number of sampled nodes (microbes) which relates to the sparsity of the adjacency matrices A i .Y axis corresponds to false positive or false negative rates for edges or nodes (depending on the color of the bar).Within each panel, we have four plots corresponding to the two values of edge effect size (µ = 0.8, 1.6) and two values of probability of influential node (π = 0.3, 0.8) which controls the sparsity of the regression coefficient matrix (B).Each bar corresponds to one rate: false positive rate and false negative rate for edges (dark and light blue) and false positive rate and false negative rate for nodes (dark and light purple).With increased sample size (n = 500), there are no more false positives for edges and for nodes and there are no false negatives for nodes.Only the false negative rate for edges remains high for n = 500 for the case of small effect size (µ = 0.8).

Figure 9 :
Figure 9: Posterior probability of influential nodes (additive model with random coefficients).Different panels represent different number of sampled microbes (k = 8, 22) which controls the sparsity of the adjacency matrix and different sample sizes (n = 100, 500).Within each panel, we have four plots corresponding to the two values of edge effect size (µ = 0.8, 1.6) and two values of probability of influential node (π = 0.3, 0.8) which controls the sparsity of the regression coefficient matrix (B).Each bar corresponds to one node (microbe) and the bar are colored depending on whether the node is truly influential (dark) or not influential (light).The model performs poorly for k = 8 sampled nodes as it is unable to identify influential nodes regardless of sample size (n) and characteristics of B (π, µ).For k = 22 sampled microbes, the model behaves better for increased sample size (n = 1000) except for high sparsity and low effect size (π = 0.3, µ = 0.8).Phylogenetic coefficients (Fig.24) have similar performance.

Fig. 14
Fig.14shows the credible intervals for edge effects under the interaction model with random coefficients for k = 8 sampled nodes.The results are very similar for k = 22 sampled nodes (Fig.27in the Appendix) and for phylogenetic coefficients (Fig.28for k = 8 and Fig.29for k = 22, both in the Appendix).Namely, the model displays a low

Figure 10 :
Figure 10: Credible intervals for edge effects (additive model with random coefficients) with k = 8 sampled nodes.Top: Sample size of n = 500.Bottom: Sample size of n = 1000.Each panel corresponds to a scenario of π = 0.3, 0.8 (which controls the sparsity of the regression coefficient matrix B) and µ = 0.8, 1.6.We plot the credible intervals for the regression coefficients per edge.In the additive model, all edges are non-influential.The color of the intervals depends on whether it intersects zero (light) and hence estimated to be non-influential or does not intersect zero (dark) and hence estimated to be influential by the model.Given that the additive model does not have interaction (edge) effects, these panels allow us to visualize false positives (dark intervals).The false positive rate does not appear to be affected by sample size (top vs bottom) with higher rate for the cases of larger effect sizes (µ = 1.6) or less sparsity (π = 0.8).

Figure 11 :
Figure 11: Credible intervals for edge effects (additive model with random coefficients) with k = 22 sampled nodes.Top: Sample size of n = 500.Bottom: Sample size of n = 1000.Each panel corresponds to a scenario of π = 0.3, 0.8 (which controls the sparsity of the regression coefficient matrix B) and µ = 0.8, 1.6.We plot the credible intervals for the regression coefficients per edge.In the additive model, all edges are non-influential.The color of the intervals depends on whether it intersects zero (light) and hence estimated to be non-influential or does not intersect zero (dark) and hence estimated to be influential by the model.Given that the additive model does not have interaction (edge) effects, these panels allow us to visualize false positives (dark intervals).We observe a decreased false positive rate when compared to k = 8 sampled microbes (Fig.10).

Figure 12 :
Figure12: Posterior probability of influential nodes (interaction model with random coefficients).Different panels represent different number of sampled microbes (k = 8, 22) which controls the sparsity of the adjacency matrix and different sample sizes (n = 100, 500).Within each panel, we have four plots corresponding to the two values of edge effect size (µ = 0.8, 1.6) and two values of probability of influential edge (π = 0.3, 0.8) which controls the sparsity of the regression coefficient matrix (B).Each bar corresponds to one node (microbe) and the bar are colored depending on whether the node is truly influential (dark) or not influential (light).The model has a high PP for truly influential nodes (tall dark bars) a low PP for non-influential nodes (short light bars) for all cases of low sparsity in B (π = 0.8) regardless of sample size (n), number of sampled microbes (k) or effect size (µ).For high sparsity in B (π = 0.3), there is a high false positive rate (tall light bars) when k = 8 microbes are sampled (top) for both values of effect size (µ = 0.8, 1.6) and both sample sizes (n = 500, 1000).In this setting (π = 0.3), there is a high false negative rate (short dark bars) for k = 22, n = 500 which is improved with larger sample size (k = 22, n = 1000).Interestingly, all cases of high sparsity (π = 0.3) for k = 22, n = 500 have good performance with phylogenetic coefficients (Fig.13).

Figure 13 :
Figure13: Posterior probability of influential nodes (interaction model with phylogenetic coefficients).Different panels represent different number of sampled microbes (k = 8, 22) which controls the sparsity of the adjacency matrix and different sample sizes (n = 100, 500).Within each panel, we have four plots corresponding to the two values of edge effect size (µ = 0.8, 1.6) and two values of probability of influential edge (π = 0.3, 0.8) which controls the sparsity of the regression coefficient matrix (B).Each bar corresponds to one node (microbe) and the bar are colored depending on whether the node is truly influential (dark) or not influential (light).Just as in the case of random coefficients (Fig.13), the model has a high PP for truly influential nodes (tall dark bars) a low PP for non-influential nodes (short light bars) for all cases of low sparsity in B (π = 0.8) regardless of sample size (n), number of sampled microbes (k), or effect size (µ).For high sparsity in B (π = 0.3), there is a high false positive rate (tall light bars) when k = 8 microbes are sampled (top) only for large effect size (µ = 1.6) and large sample size (n = 1000).Unlike with random coefficients (Fig.12), the method with phylogenetic coefficients shows good performance in high sparsity settings (π = 0.3) when k = 22 microbes are sampled.

(k = 22 )
Fig.18shows the false positive and false negative rates for edges and nodes for different simulation settings in terms of n, k, π, µ for additive (top), interaction (middle) and functional redundancy (bottom) models with random coefficients.An interaction model with low sparsity in B (π = 0.8) shows the best performance in terms of controlled false positive and false negative rates for all settings of n and µ.Under the interaction model, there is inflated false negative rate of edges which could be due to the fact that the model does not include main (node) effects.Overall, sampling more nodes (k = 22) shows lower false positive rates than fewer sampled nodes (k = 8), except for the functional redundancy model.Sample size (n = 500, 1000) does not appear to have an influence given that both columns show similar rate patterns.False positive rates for edges (dark blue bars) seems to be controlled in all simulation settings showing that the BNR model is accurate in identifying truly non-influential edges.Under the additive model, false positive rate of nodes (dark purple bars) appear to be the concern when there are few nodes sampled (k = 8).High false positive rates of nodes are evident in most setting of the functional redundancy model except for π = 0.8, k = 8 and π = 0.3, µ = 0.8, k = 22.

Figure 14 :
Figure14: Credible intervals for edge effects (interaction model with random coefficients) with k = 8 sampled nodes.Top: Sample size of n = 500.Bottom: Sample size of n = 1000.Each panel corresponds to a scenario of π = 0.3, 0.8 (which controls the sparsity of the regression coefficient matrix B) and µ = 0.8, 1.6.We plot the credible intervals for the regression coefficients per edge ordered depending on whether they are truly non-influential edges (top of each panel) or truly influential edges (bottom of each panel).The color of the intervals depends on whether it intersects zero (light) and hence estimated to be non-influential or does not intersect zero (dark) and hence estimated to be influential by the model.These panels allow us to visualize false positives (dark intervals on the top panel) or false negatives (light intervals on the bottom panel).The model has a low false positive rate of influential edges for all settings of n, k, π, µ, except for n = 500, π = 0.3, µ = 1.6.This behavior is similar for k = 22 sampled microbes (Fig.27) and for phylogenetic coefficients (Fig.28for k = 8 and Fig.29for k = 22).Accuracy to detect influential edges (dark intervals on the lower "True influential edges" panels) is also similar across all simulation settings.

Figure 15 :
Figure 15: Posterior probability of influential nodes (functional redundancy model with random coefficients).Different panels represent different number of sampled microbes (k = 8, 22) which controls the sparsity of the adjacency matrix and different sample sizes (n = 100, 500).Within each panel, we have four plots corresponding to the two values of edge effect size (µ = 0.8, 1.6) and two values of probability of influential edge (π = 0.3, 0.8) which controls the sparsity of the regression coefficient matrix (B).Each bar corresponds to one node (microbe) and the bar are colored depending on whether the node is truly influential (dark) or not influential (light).The model shows high false positive rate in all settings except for low sparsity (π = 0.8) in small number of sampled microbes (k = 8), or high sparsity with small effect (π = 0.3, µ = 0.8) with large number of sampled nodes (k = 22).This behavior is similar with phylogenetic coefficients (Fig.30).

Figure 16 :
Figure16: Credible intervals for edge effects (functional redundancy model with random coefficients) with k = 8 sampled nodes.Top: Sample size of n = 500.Bottom: Sample size of n = 1000.Each panel corresponds to a scenario of π = 0.3, 0.8 (which controls the sparsity of the regression coefficient matrix B) and µ = 0.8, 1.6.We plot the credible intervals for the regression coefficients per edge ordered depending on whether they are truly non-influential edges (top of each panel) or truly influential edges (bottom of each panel).The color of the intervals depends on whether it intersects zero (light) and hence estimated to be non-influential or does not intersect zero (dark) and hence estimated to be influential by the model.These panels allow us to visualize false positives (dark intervals on the top panel) or false negatives (light intervals on the bottom panel).

Figure 18 :
Figure18: False positive and false negative rates for influential edges and nodes for additive (top), interaction (middle) and functional redundancy (bottom) models with random coefficients.X axis corresponds to the number of sampled nodes (microbes) which relates to the sparsity of the adjacency matrix X.Within each panel, we have four plots corresponding to the two values of edge effect size (µ = 0.8, 1.6) and two values of probability of influential edge (π = 0.3, 0.8) which controls the sparsity of the regression coefficient matrix (B).Each bar corresponds to one rate: false positive rate and false negative rate for edges (dark and light blue) and false positive rate and false negative rate for nodes (dark and light purple).An interaction model with low sparsity in B (π = 0.8) shows the best performance in terms of controlled false positive and false negative rates for all setting of n and µ.

Figure 19 :
Figure19: False positive and false negative rates for influential edges and nodes for additive (top), interaction (middle) and functional redundancy (bottom) models with phylogenetic coefficients.X axis corresponds to the number of sampled nodes (microbes) which relates to the sparsity of the adjacency matrix X.Within each panel, we have four plots corresponding to the two values of edge effect size (µ = 0.8, 1.6) and two values of probability of influential edge (π = 0.3, 0.8) which controls the sparsity of the regression coefficient matrix (B).Each bar corresponds to one rate: false positive rate and false negative rate for edges (dark and light blue) and false positive rate and false negative rate for nodes (dark and light purple).Again, an interaction model shows the best performance in terms of controlled false positive and false negative rates for all settings of π, n and µ.

A. 4
Simulation plots: theoretical case

Figure 20 :
Figure 20: Posterior probability of influential nodes (theoretical simulations R = 5).Different panels represent different number of sampled microbes (k = 8, 22) which controls the sparsity of the adjacency matrix and different sample sizes (n = 100, 500).Latent dimension of R = 5.Within each panel, we have four plots corresponding to the two values of edge effect size (µ = 0.8, 1.6) and two values of probability of influential node (π = 0.3, 0.8) which controls the sparsity of the regression coefficient matrix (B).Each bar corresponds to one node (microbe) and the bar are colored depending on whether the node is truly influential (dark) or not influential (light).As expected, the model has a high PP for truly influential nodes (tall dark bars) and a low PP for non-influential nodes (short light bars) with best performance for n = 500, or for high coefficients µ = 1.6 for n = 100.

Figure 21 :
Figure 21: Posterior probability of influential nodes (theoretical simulations R = 9).Different panels represent different number of sampled microbes (k = 8, 22) which controls the sparsity of the adjacency matrix and different sample sizes (n = 100, 500).Latent dimension of R = 9.Within each panel, we have four plots corresponding to the two values of edge effect size (µ = 0.8, 1.6) and two values of probability of influential node (π = 0.3, 0.8) which controls the sparsity of the regression coefficient matrix (B).Each bar corresponds to one node (microbe) and the bar are colored depending on whether the node is truly influential (dark) or not influential (light).As expected, the model has a high PP for truly influential nodes (tall dark bars) and a low PP for non-influential nodes (short light bars) with best performance for n = 500, or for high coefficients µ = 1.6 for n = 100.We also notice here that a higher R (R = 9) helps identify influential microbes on smaller effect sizes (µ = 0.8, k = 22, n = 100) compared to the same setting in Fig.20.The downside of a larger latent dimension is less floating-point stability.

Figure 22 :
Figure 22: Posterior probability of influential nodes (theoretical simulations k = 15, R = 5, 9).Top: Latent dimension of R = 5.Middle: Latent dimension of R = 7. Bottom: Latent dimension of R = 9.Different panels represent different number of sampled microbes (k = 15) which controls the sparsity of the adjacency matrix and different sample sizes (n = 100, 500).Within each panel, we have four plots corresponding to the two values of edge effect size (µ = 0.8, 1.6) and two values of probability of influential node (π = 0.3, 0.8) which controls the sparsity of the regression coefficient matrix (B).Each bar corresponds to one node (microbe) and the bar are colored depending on whether the node is truly influential (dark) or not influential (light).As expected, the model has a high PP for truly influential nodes (tall dark bars) and a low PP for non-influential nodes (short light bars) with best performance for n = 500, or for high coefficients µ = 1.6 or high sparsity π = 0.3 for n = 100.There appear to be no differences by the latent dimension R.

Figure 23 :
Figure23: Mean Square Error for regression coefficients and response.X axis corresponds to the number of sampled nodes (microbes) which relates to the sparsity of the adjacency matrix X. Dashed lines correspond to different values of the true mean for edge effects (µ = 0.8, 1.6) and different colors correspond to different sparsity levels on the regression coefficient matrix B (π = 0.3, 0.8).The MSE does not seem to be affected by the sampling proportion, and it is more impacted in the case of the regression coefficients to the value of π.

Figure 24 :
Figure 24: Posterior probability of influential nodes (additive model with phylogenetic coefficients).Different panels represent different number of sampled microbes (k = 8, 22) which controls the sparsity of the adjacency matrix and different sample sizes (n = 500, 1000).Within each panel, we have four plots corresponding to the two values of edge effect size (µ = 0.8, 1.6) and two values of probability of influential node (π = 0.3, 0.8) which controls the sparsity of the regression coefficient matrix (B).Each bar corresponds to one node (microbe) and the bar are colored depending on whether the node is truly influential (dark) or not influential (light).As with random coefficients (Fig.9), the model performs poorly for k = 8 sampled nodes as it is unable to identify influential nodes regardless of sample size (n) and characteristics of B (π, µ).For k = 22 sampled microbes, the model behaves better for increased sample size (n = 1000) except for high sparsity and small effects (π = 0.3, µ = 0.8).

Figure 30 :
Figure 30: Posterior probability of influential nodes (functional redundancy model with phylogenetic coefficients).Different panels represent different number of sampled microbes (k = 8, 22) which controls the sparsity of the adjacency matrix and different sample sizes (n = 100, 500).Within each panel, we have four plots corresponding to the two values of edge effect size (µ = 0.8, 1.6) and two values of probability of influential edge (π = 0.3, 0.8) which controls the sparsity of the regression coefficient matrix (B).Each bar corresponds to one node (microbe) and the bar are colored depending on whether the node is truly influential (dark) or not influential (light).The model has a high PP for truly influential nodes (tall dark bars) a low PP for non-influential nodes (short light bars) only when n = 1000 and π = 0.3 (high sparsity in B).The model shows high false positive rate in all settings except for low sparsity (π = 0.8)with small number of sampled nodes (k = 8).This behavior is similar with random coefficients (Fig.15).
to assess convergence.Because γ and ξ are multivariate, we consider each to have converged if its maximum R value is less than 1.2.

Table 1 :
Theoretical simulation cases with more than 30,000 burn-in to achieve convergence.

Table 2 :
Realistic simulation cases with more than 120,000 burn-in to achieve convergence.

Table 3 :
Convergence for realistic simulations.