From a statistical point of view, the data on daily taxa abundances can be regarded as a multivariate time series. Potential interactions between taxa are expected to be reflected in the correlations between taxa, but some temporal correlation is also expected to be present in the data. In other words, if Y = (Yit) denotes the n × T matrix whose entry Yit is the number of sequences of taxon i = 1,…, n found in the sample collected on day t = 1,…, T for a given individual, its rows and columns exhibit correlation structures of different natures. Abundances in a given row are likely to be affected by temporal correlation, whereas values in a given column may be subject to the correlations generated by the underlying interactions between taxa. To model both correlation structures simultaneously, we applied a Bayesian hierarchical model to the follow-up data for each individual.
Our model specification is as follows. Let Yt = (Y1t,…, Ynt)′ be the taxonomic distribution of sequences on day t. Our model first assumes that Yt follows a multinomial distribution,
$$Y_t \mid N_t, \pi_t \sim \mathrm{Multinomial}(N_t, \pi_t),$$
where Nt is the total number of sequences on day t and πt = (π1t,…, πnt)′, with πit the unknown proportion in which taxon i is present in the community on day t. The proportions πit are in turn decomposed, on the log-odds scale, into the sum of three terms, αi + νit + εit,
where αi is a taxon-specific intercept that captures the average relative abundance of taxon i over the T = 15 days, and νit and εit are random effects intended to pick up time-structured and unstructured variation, respectively. To this end, we chose a normal prior distribution for εit and a multivariate random walk of order one for νt = (ν1t,…, νnt)′, t = 1,…, T,
$$\nu_t \mid \nu_{t-1} \sim N_n(\nu_{t-1}, \Sigma),$$
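where Σ is the n × n variance-covariance matrix between taxa abundances. For convenience, we take ν0 to be the n × 1 zero vector. This conditional specification is a particular case of the intrinsic multivariate conditional autoregressive (MCAR) models (Kim et al., 2001; Gelfand & Vounatsou, 2003), for which the full conditional distribution, writing t′ ∼ t when time points t′ and t are adjacent and m_t for the number of neighbours of t, is
$$\nu_t \mid \nu_{-t} \sim N_n\!\left(\frac{1}{m_t}\sum_{t' \sim t}\nu_{t'},\ \frac{1}{m_t}\,\Sigma\right),$$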
that is, νt follows a multivariate normal distribution centred on the average of its temporal neighbours, with a variance-covariance matrix inversely proportional to the number of neighbours. The joint distribution of ν = (ν11,…,νn1,ν12,…,νn2,…,ν1T,…,νnT)′ is a zero-mean multivariate normal distribution with precision matrix Ω = (D − W) ⊗ Σ⁻¹, where W is a T × T matrix with Wtt′ = 1 if time points t and t′ are adjacent and Wtt′ = 0 otherwise, D is a T × T diagonal matrix with Dtt equal to the number of neighbours of time point t (i.e. D11 = DTT = 1 and Dtt = 2 ∀ t = 2,…,T − 1) and ⊗ represents the Kronecker product for matrices. The matrix D − W is singular, so Ω is singular as well and this joint distribution is improper. However, with our choice of W and D, Ω satisfies the so-called symmetry condition that ensures propriety of the posterior. In practice, the impropriety is overcome by using the proper full conditionals for νt and imposing n sum-to-zero constraints, one per taxon. See, for example, Banerjee et al. (2004, pp. 247–251) for further details.
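The structure of Ω can be checked numerically. The following is a minimal sketch in Python with NumPy, not the WinBUGS code used for the analysis. The function names `rw1_structure` and `mcar_precision`, the choice n = 3, and the arbitrary positive-definite Σ are assumptions made purely for illustration. It builds D and W for T = 15 time points, forms Ω = (D − W) ⊗ Σ⁻¹, confirms that Ω is rank-deficient by n, and recovers the full conditional of an interior νt directly from the joint precision.

```python
import numpy as np


def rw1_structure(T):
    """Return (D, W) for a first-order random walk over T time points.

    W[t, s] = 1 when time points t and s are adjacent, 0 otherwise;
    D is diagonal with the number of neighbours of each time point.
    """
    W = np.zeros((T, T))
    idx = np.arange(T - 1)
    W[idx, idx + 1] = 1.0
    W[idx + 1, idx] = 1.0
    D = np.diag(W.sum(axis=1))
    return D, W


def mcar_precision(T, Sigma):
    """Joint precision (D - W) kron Sigma^{-1}, for nu stacked time by time."""
    D, W = rw1_structure(T)
    return np.kron(D - W, np.linalg.inv(Sigma))


# Illustrative sizes: T = 15 days as in the text, n = 3 taxa (assumed).
T, n = 15, 3
rng = np.random.default_rng(0)
A = rng.normal(size=(n, n))
Sigma = A @ A.T + n * np.eye(n)  # arbitrary symmetric positive-definite matrix

D, W = rw1_structure(T)
Omega = mcar_precision(T, Sigma)

# D - W has rank T - 1, so Omega has rank (T - 1) * n: the joint is improper.
rank = np.linalg.matrix_rank(Omega)

# Full conditional of nu_t for an interior time point (zero-based index 4).
t = 4
nu = rng.normal(size=(T, n))      # arbitrary values for the other time points
nu_flat = nu.reshape(-1)          # taxa within time, matching the ordering of nu
block = slice(t * n, (t + 1) * n)
Q_tt = Omega[block, block]        # equals D[t, t] * Sigma^{-1}
cond_cov = np.linalg.inv(Q_tt)    # equals Sigma / D[t, t]
# Contribution of all other time points: sum over s != t of Q_ts nu_s.
others = Omega[block, :] @ nu_flat - Q_tt @ nu_flat[block]
cond_mean = -cond_cov @ others    # equals the average of the two neighbours
```

For an interior time point the conditional covariance comes out as Σ/2 and the conditional mean as (νt−1 + νt+1)/2, which matches the verbal description of the full conditional above.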
We fitted our model using Markov chain Monte Carlo (MCMC) simulation techniques as implemented in the WinBUGS software (Lunn et al., 2000) and the R2WinBUGS package (Sturtz et al., 2005) for the R statistical software (R Development Core Team, 2010). We ran two chains of 50 000 iterations each, discarded the first 10 000 iterations of each chain as burn-in and kept every 40th of the remaining iterations to reduce autocorrelation in the chains. This leaves 1000 draws per chain, so inference for each parameter is based on a thinned sample of size 2000 from its posterior distribution.
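The size of the thinned sample follows from simple arithmetic, sketched below in Python. The helper name `retained_draws` is hypothetical, and the sketch assumes that thinning is applied to the post-burn-in iterations only. The exact iterations that WinBUGS retains may be indexed differently, but the count is unaffected.

```python
def retained_draws(n_chains, n_iter, burn_in, thin):
    """Number of posterior draws kept after burn-in and thinning.

    Assumes thinning is applied to the post-burn-in iterations of each chain.
    """
    per_chain = len(range(burn_in, n_iter, thin))
    return n_chains * per_chain


# Settings reported in the text: 2 chains, 50 000 iterations,
# 10 000 burn-in, every 40th iteration kept.
total = retained_draws(n_chains=2, n_iter=50_000, burn_in=10_000, thin=40)
print(total)  # 2000, i.e. 1000 draws per chain
```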