Population dynamics of species-rich ecosystems: the mixture of matrix population models approach


Correspondence author. E-mail: frederic.mortier@cirad.fr


  1. Matrix population models are widely used to predict population dynamics, but when applied to species-rich ecosystems with many rare species, the small population sample sizes hinder a good fit of species-specific models. This issue can be overcome by assigning species to groups to increase the size of the calibration data sets. However, the species classification is often disconnected from the matrix modelling and from the estimation of matrix parameters, thus bringing species groups that may not be optimal with respect to the predicted community dynamics.
  2. We proposed here a method that jointly classified species into groups and fit the matrix models in an integrated way. The model was a special case of mixture with unknown number of components and was cast in a Bayesian framework. An MCMC algorithm was developed to infer the unknown parameters: the number of groups, the group of each species and the dynamics parameters.
  3. We applied the method to simulated data and showed that the algorithm efficiently recovered the model parameters.
  4. We applied the method to a data set from a tropical rain forest in French Guiana. The mixture matrix model classified tree species into well-differentiated groups with clear ecological interpretations. It also accurately predicted the forest dynamics over the 16-year observation period.
  5. Our model and algorithm can straightforwardly be adapted to any type of matrix model, using the life cycle diagram. It can be used as an unsupervised classification technique to group species with similar population dynamics.


The conservation of animal and plant species and their biological control require models to understand and predict population dynamics (Fieberg & Ellner 2001; Buongiorno & Gilless 2003; Demyanov, Wood & Kedwards 2006). Among population dynamics models, projection matrix models have been widely used to investigate the dynamics of age-, stage- or size-structured populations (Caswell 2001; Stott et al. 2010). They provide a simple way of integrating vital rate information such as recruitment, birth, growth or ageing, and mortality (Crone et al. 2011). Matrix models have been used to model population demography in the context of species invasion (Hooten et al. 2007; Sebert-Cuvillier et al. 2007), species extinction or conservation of endangered species (Cropper & Loudermilk 2006), and the sustainable management of exploited species (Hauser, Cooch & Lebreton 2006). Recent improvements in matrix models targeted the estimation of demographic parameters, in particular for animal populations using capture–recapture methods (Besbeas et al. 2002).

In species-rich ecosystems like tropical rain forests, tropical marine fish or coral reefs, high diversity implies that the sample size for most species is limited. The small sample size hinders a good fit of species-specific dynamics models, including matrix population models. To address this problem, modellers usually cluster species into groups. A variety of methods has been used to group species, favouring either ecological interpretation or the accuracy of predictions. Groups of species can be derived from functional characteristics (Steneck & Dethier 1994), ecomorphology (Bellwood & Wainwright 2001) or ecological subjective strategy (Swaine & Whitmore 1988; Favrichon 1994; Gitay & Noble 1997). These methods do not rely on a strong statistical methodology, thus they do not ensure that the within-group similarity is maximum, or that the number of groups is optimal. Gourlet-Fleury et al. (2005) described two other strategies applied in tropical rain forests: the ecological data-driven strategy (Phillips et al. 2002) and the dynamic process strategy, in which ‘process’ refers to the components of forest dynamics (recruitment, growth or mortality) (Gourlet-Fleury & Houllier 2000; Picard et al. 2010). These strategies rely on statistical unsupervised classification methods, such as hierarchical cluster analysis, to group species with similar traits. Moreover, species classification is most often disconnected from the matrix modelling and from the estimation of the matrix parameters, thus bringing species groups that may not be optimal with respect to the predicted community dynamics. To cluster the species while ensuring optimality for predicting community dynamics, we need to rely on the mixture model framework.

Mixture models are based on the assumption that observation data arise from several unobserved groups (McLachlan & Peel 2000). A model is associated to each group. Each observation contributes to the fitting of the model for a given group with a weight that represents its probability to belong to this group. These weights can eventually be used to classify observations among groups. Thus, mixture modelling simultaneously fits models and classifies observations, and the clustering step is closely linked to the calibration step. This favours the similarity of species response within groups rather than the similarity of species traits (Dunstan, Foster & Darnell 2011). Mixture modelling has mainly been developed for observations with a normal distribution (e.g. mixture regressions). The use of mixture models has recently been proposed to model the presence/absence of species (Dunstan, Foster & Darnell 2011), the species richness in a species assemblage (Mao, Colwell & Chang 2005) or the heterogeneity of capture and survival probabilities in free-ranging populations (Pledger, Pollock & Norris 2010).

This study aims at extending mixture modelling to matrix population models. The mixture of matrix population models simultaneously addresses two issues: fitting matrix models for species-rich ecosystems with many rare species, and classifying species into groups. As proposed in population genetics (Pritchard, Stephens & Donnelly 2000; Corander, Waldmann & Sillanpaa 2003; Guillot et al. 2005), the strategy consists of a probabilistic model-based clustering method expressed in terms of matrix population mixture models with an unknown number of components (Richardson & Green 1997; Dunson 2000; Marin, Mengersen & Robert 2005). The number of groups and the parameters of the matrix population models associated with each group are the unknown quantities. We propose to use a Bayesian framework to infer these unknown quantities. The Bayesian framework has several advantages over frequentist methods. First, it yields credible intervals that are valid for finite population sizes, whereas frequentist methods provide asymptotic confidence intervals. Secondly, with the use of prior distributions, strong biological or ecological knowledge can be integrated in the model.

The mixture of matrix models is defined in the next section. An inference method is then outlined, and tested using simulated data. The mixture matrix model is finally applied to a data set from the Paracou tropical rain forest in French Guiana. The tree species groups obtained had consistent ecological behaviours with contrasted functional traits, and compared favourably to other groups obtained by a standard classification technique.

Materials and methods

Mixture of matrix population models

When fitting a base model to some observations, it is assumed that the set of observations is homogeneous, in the sense that all observations share a common distribution (e.g. the normal distribution for the residuals of a linear model). When dealing with a heterogeneous set of observations composed of K assumedly homogeneous subsets, mixture modelling is a relevant framework to extend this base model (McLachlan & Peel 2000). A mixture model assumes that the distribution of observations is a mixture of K base distributions, with mixing weights that represent the probability for an observation to belong to each of the homogeneous subsets. Conditionally on an observation belonging to a subset, the model reduces to the base model, while the distribution of the mixture includes the uncertainty on which subset an observation belongs to.

A mixture of matrix population models results from the application of the mixture framework to matrix population models. In matrix population models, individuals are classified into stage, size or age classes, and the population dynamics is described by transition rates among classes (Caswell 2001). At the individual level, these transitions can be interpreted as the transitions of a Markov chain, which defines a distribution of the population-level numbers of individuals that switched between two classes. Assuming that the individuals of a population follow any one of K such dynamics distributions defines a mixture of K matrix population models. A specificity of the mixture of matrix models is that one observation corresponds to one population (more specifically, it is the vector of all numbers of individual transitions between classes), and the set of observations is the community-level set of populations. Hence, mixtures of matrix models are relevant to model the dynamics of a community when assuming that its constituent species can be assigned to K homogeneous groups of species.

Hereafter, we detail the mathematical expression of the mixture of matrix models for a specific type of matrix population models, namely the Usher model. This framework readily extends to any type of matrix models on the basis of individual transitions among classes.

Mixture of Usher matrix models

The Usher matrix model applies to size-structured populations (Usher 1966, 1969). It describes the change of the population through a vector $\mathbf{N}_t = (N_{1,t}, \ldots, N_{L,t})$ containing the numbers Nl,t of individuals in L ordered size classes (l = 1,…, L) at discrete time t. Let $N_t = \sum_{l=1}^{L} N_{l,t}$ be the total number of individuals at time t. Like any other matrix population model, the Usher model can be interpreted as the expectation of Nt independent Markov chains (Fig. 1). The relationship between $\mathbf{N}_t$ and $\mathbf{N}_{t-1}$ is described by an L × L transition matrix U, called the Usher matrix:

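$$\mathbf{N}_t = \mathbf{U}\,\mathbf{N}_{t-1} \qquad \text{(eqn 1)}$$

where

$$\mathbf{U} = \begin{pmatrix} p_1 + f & f & \cdots & f \\ q_1 & p_2 & & 0 \\ & \ddots & \ddots & \\ 0 & & q_{L-1} & p_L \end{pmatrix} \qquad \text{(eqn 2)}$$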

and pl is the probability for an individual to stay in class l, ql the probability to move up from class l to l + 1 and f the average fecundity. ql and pl take values in [0, 1], whereas f takes positive real values. The probability to die for an individual in class l is given by ml = 1 − pl − ql. Let $\mathbf{d} = (d_1, \ldots, d_L)$ be the class distribution of the population, such that dl denotes the probability for a randomly chosen individual to belong to class l ($\sum_{l=1}^{L} d_l = 1$). Let Nl,l,t denote the number of individuals staying in class l between t − 1 and t, Nl,l+1,t the number of individuals moving up from class l to l + 1 between t − 1 and t, and Nl,†,t the number of individuals dying in class l between t − 1 and t. Let Rt be the number of recruits between t − 1 and t, assumed to be a Poisson random variable with parameter f Nt−1. The vector of observations for the population, denoted $\mathbf{X}$, gathers these counts. The likelihood of the joint individual Markov transitions, and thus of the Usher matrix model, is:

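$$L(\mathbf{X} \mid \theta) = \mathcal{M}\left(N_{1,t-1}, \ldots, N_{L,t-1} \mid N_{t-1}, \mathbf{d}\right) \times \prod_{l=1}^{L} \mathcal{M}\left(N_{l,l,t}, N_{l,l+1,t}, N_{l,\dagger,t} \mid N_{l,t-1}, p_l, q_l, m_l\right) \times \mathcal{P}\left(R_t \mid f N_{t-1}\right) \qquad \text{(eqn 3)}$$

where $\mathbf{X}$ is the vector of observations, $\mathcal{M}$ denotes the multinomial distribution, $\mathcal{P}$ the Poisson distribution and $\theta = (\mathbf{p}, \mathbf{q}, \mathbf{m}, f, \mathbf{d})$ is the vector of parameters with $\mathbf{p} = (p_1, \ldots, p_L)$, $\mathbf{q} = (q_1, \ldots, q_{L-1})$ and $\mathbf{m} = (m_1, \ldots, m_L)$; for the last class, no individual moves up ($q_L = 0$). Equation 1 is the deterministic version of the Usher projection model, while eqn 3 accounts for the demographic stochasticity and is useful when the population size gets small (Caswell 2001).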
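As an illustration of how such a likelihood can be evaluated, the following minimal sketch computes a log-likelihood made of a multinomial term for the initial class distribution, one multinomial term per class for the fates (stay, move up, die), and a Poisson term for the recruits. The function and argument names are hypothetical, and the sketch assumes that `q` is padded with a final 0 for the last class; it is not the authors' R implementation.

```python
import math

def log_multinomial(counts, probs):
    """Log-probability of a multinomial outcome (zero counts contribute nothing)."""
    out = math.lgamma(sum(counts) + 1)
    for c, pr in zip(counts, probs):
        out -= math.lgamma(c + 1)
        if c > 0:
            out += c * math.log(pr)
    return out

def log_poisson(k, rate):
    """Log-probability of k events under a Poisson law with a positive rate."""
    return k * math.log(rate) - rate - math.lgamma(k + 1)

def usher_loglik(stay, up, dead, recruits, p, q, m, f, d):
    """Log-likelihood of one population; all per-class lists have length L."""
    n_classes = len(stay)
    # Number of individuals in each class at time t - 1
    start = [stay[l] + up[l] + dead[l] for l in range(n_classes)]
    ll = log_multinomial(start, d)
    for l in range(n_classes):
        ll += log_multinomial([stay[l], up[l], dead[l]], [p[l], q[l], m[l]])
    ll += log_poisson(recruits, f * sum(start))
    return ll

# Toy population with a single class: one survivor, one death, no recruit
ll = usher_loglik([1], [0], [1], 0, p=[0.5], q=[0.0], m=[0.5], f=0.5, d=[1.0])
```

Working on the log scale avoids numerical underflow when the counts are large.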

Figure 1.

Life cycle representation of the Usher projection matrix model, where pl is the probability for an individual to stay in class l, ql is the probability to move up from class l to l + 1, ml is the probability of dying and f is the average fecundity.
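The life cycle of Fig. 1 can be turned into a projection matrix in a few lines. The sketch below is a minimal illustration with made-up parameter values and hypothetical function names: it places the stay probabilities on the diagonal, the move-up probabilities on the sub-diagonal and the fecundity on the first row (recruits enter the first class), then projects a population over one time step.

```python
def usher_matrix(p, q, f):
    """Build the L x L Usher matrix from stay (p), move-up (q) and fecundity (f)."""
    n_classes = len(p)
    if len(q) != n_classes - 1:
        raise ValueError("q must have one entry fewer than p")
    U = [[0.0] * n_classes for _ in range(n_classes)]
    for l in range(n_classes):
        U[l][l] = p[l]              # stay in class l
        U[0][l] += f                # recruits enter the first class
        if l < n_classes - 1:
            U[l + 1][l] = q[l]      # move up from class l to l + 1
    return U

def project(U, n):
    """One deterministic time step: matrix-vector product U n."""
    return [sum(U[i][j] * n[j] for j in range(len(n))) for i in range(len(U))]

# Illustrative values only (three size classes)
U = usher_matrix(p=[0.80, 0.85, 0.90], q=[0.15, 0.10], f=0.02)
n_next = project(U, [100.0, 50.0, 20.0])
```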

Suppose now that the modelled population arises from K unobserved groups of species such that each group is modelled by a Usher matrix model. Thus, there are K Usher matrices U1,…, UK. Because the group the population belongs to is not known a priori, one can define a random latent variable C that identifies the group of the species. For example, if the species belongs to the third group: conditionally on C = 3, the prediction of the dynamics is given by eqn (eqn 1), with U being replaced by U3. Accounting for the uncertainty on C brings:

$$\mathbf{N}_t = \sum_{k=1}^{K} \pi_k\, \mathbf{U}_k\, \mathbf{N}_{t-1} \qquad \text{(eqn 4)}$$

where πk is the posterior probability that C equals k. Equation (eqn 4) defines the mixture of Usher matrix models, whose likelihood is:

$$L(\mathbf{X} \mid \boldsymbol{\theta}, \boldsymbol{\pi}) = \sum_{k=1}^{K} \pi_k\, L(\mathbf{X} \mid \theta_k) \qquad \text{(eqn 5)}$$

where $\mathbf{X}$ is the vector of observations for the species, $\boldsymbol{\theta} = (\theta_1, \ldots, \theta_K)$ is the vector of all parameters associated with the K matrix models, $\boldsymbol{\pi} = (\pi_1, \ldots, \pi_K)$ is the vector of all posterior probabilities, and $L(\mathbf{X} \mid \theta_k)$ is given by eqn 3. The species can be a posteriori classified by assigning it to the group g with the maximum posterior probability: $g = \arg\max_{k} \pi_k$. Hence, the mixture of matrix models jointly defines K matrix models (and implicitly provides us with a way to estimate $\boldsymbol{\theta}$) and classifies the species into K groups (i.e. provides an estimate of $\boldsymbol{\pi}$).
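To make the mixture computation concrete, here is a minimal sketch that combines per-group log-likelihoods with mixing weights. It uses the standard mixture identity (membership probability proportional to weight times group likelihood) for fixed parameter values; this is an illustrative assumption and not necessarily how the membership probabilities are obtained within the authors' MCMC algorithm. All names are hypothetical.

```python
import math

def mixture_summary(loglik_by_group, weights):
    """Return the mixture log-likelihood, membership probabilities and best group."""
    terms = [math.log(w) + ll for w, ll in zip(weights, loglik_by_group)]
    top = max(terms)
    # Log-sum-exp keeps the sum stable when log-likelihoods are very negative
    log_mix = top + math.log(sum(math.exp(t - top) for t in terms))
    membership = [math.exp(t - log_mix) for t in terms]
    best = max(range(len(terms)), key=membership.__getitem__)
    return log_mix, membership, best

# One species evaluated under two candidate groups with equal weights
log_mix, membership, best = mixture_summary([-10.0, -12.0], [0.5, 0.5])
```

The group index returned in `best` corresponds to the maximum-probability assignment described above.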

Mixture model inference

The parameters $\boldsymbol{\theta}$ and $\boldsymbol{\pi}$ of the mixture matrix model can be estimated in a frequentist context by maximizing the likelihood (eqn 5) of the mixture model. Inference can be achieved using an EM algorithm (McLachlan & Krishnan 2008). However, we used the Bayesian inference framework to have the opportunity to integrate biological knowledge into the model through the prior distribution of the parameters. Based on the directed acyclic graph of the mixture matrix model (Fig. 2), a Markov chain Monte Carlo (MCMC) inference algorithm was implemented: a long sequence of parameter values was randomly drawn from the posterior distribution, and the parameter estimates were extracted from this sample by computing its mode or its mean (Gilks, Richardson & Spiegelhalter 1996). Details on the Bayesian inference, including the choice of the priors, are given in Appendix A. Annotated R code (R Core Team 2012) for the algorithm and a first tentative version of the MPMM package are available in the Supporting Information.

Figure 2.

Directed acyclic graph of the mixture of Usher projection matrix models. Double dotted arrows indicate deterministic links, dotted lines indicate direct links, circles indicate random nodes and frames indicate deterministic nodes.

Fitting a mixture model also requires estimating the number K of groups. Classically, different mixture models with different numbers of groups are independently fitted, and an information criterion is then used to select among these competing models (Biernacki, Celeux & Govaert 2000; see also Cubaynes et al. 2012 in a capture-recapture context). An MCMC algorithm for a fixed K was developed with this aim in view. Alternatively, we also developed an inference algorithm that considered K as unknown and jointly estimated it with the other parameters. This involved using a reversible jump MCMC approach when the number of groups changed (Richardson & Green 1997). With this latter approach, posterior probabilities for each value of K were obtained, thus enabling one to choose the most likely K while assessing the reliability of this choice.
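When K is sampled along the chain, its posterior probabilities can be approximated by the relative frequency of each visited value. The following minimal sketch assumes a hypothetical list `trace` of sampled K values (after burn-in); the numbers are invented for illustration.

```python
from collections import Counter

def posterior_of_k(k_samples):
    """Relative frequency of each sampled number of groups."""
    counts = Counter(k_samples)
    total = len(k_samples)
    return {k: c / total for k, c in sorted(counts.items())}

# Hypothetical post-burn-in trace of the number of groups
trace = [5] * 97 + [6] * 2 + [7]
post = posterior_of_k(trace)
k_hat = max(post, key=post.get)   # most probable number of groups
```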

Because the posterior distribution for the number K of groups may be sensitive to changes in the prior distribution of the parameters when using a reversible jump MCMC algorithm (Richardson & Green 1997), a sensitivity analysis to the priors was performed. Details on the different priors that were tested are given in Appendix A.


Data were simulated to assess the ability of the algorithm to correctly classify species into groups, according to different levels of differentiation between groups and different numbers of groups. Simulated data were composed of 100 species distributed across eight diameter classes. Numerical experiments tested the combinations of three factors: (i) the number of groups, which was equal to 1, 5 or 10 (three modalities), and will be referred to as the true number of groups; (ii) the number of individuals per species, which was equal to 100 or 1000 (two modalities); and (iii) hyper-priors for parameters ($\mathbf{d}$, $\mathbf{p}$, $\mathbf{q}$, $\mathbf{m}$, f), which took the values given in Table 1 (five modalities).

Table 1. Hyper-prior distributions of the parameters used for simulations. $\mathcal{D}$ is the Dirichlet distribution, $\mathcal{G}$ is the gamma distribution. ‘Var’ is the variance of dl, of pl, ql, ml, and of f respectively

| Differentiation level | Diameter $\mathbf{d}$ | Var | Transition (pl, ql, ml) | Var | Fecundity f | Var |
|---|---|---|---|---|---|---|
| Ldiff1 | $\mathcal{D}$(1,1,1,1,1,1,1,1) | 0·0121 | $\mathcal{D}$(1,1,1) | 0·055 | $\mathcal{G}$(10, 1000) | 10^−5 |
| Ldiff2 | $\mathcal{D}$(3,3,3,3,3,3,3,3) | 0·0044 | $\mathcal{D}$(3,3,3) | 0·022 | $\mathcal{G}$(10, 2000) | 2·5 × 10^−6 |
| Ldiff3 | $\mathcal{D}$(5,5,5,5,5,5,5,5) | 0·0027 | $\mathcal{D}$(5,5,5) | 0·014 | $\mathcal{G}$(10, 3000) | 1·1 × 10^−6 |
| Ldiff4 | $\mathcal{D}$(7,7,7,7,7,7,7,7) | 0·0019 | $\mathcal{D}$(7,7,7) | 0·010 | $\mathcal{G}$(10, 4000) | 6·25 × 10^−7 |
| Ldiff5 | $\mathcal{D}$(9,9,9,9,9,9,9,9) | 0·0015 | $\mathcal{D}$(9,9,9) | 0·008 | $\mathcal{G}$(10, 5000) | 4 × 10^−7 |
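The variances reported in Table 1 can be checked from the closed-form variances of the symmetric Dirichlet and gamma distributions. The sketch below assumes that the gamma distribution is parameterized by shape and rate, which is the reading consistent with the tabulated variances; function names are hypothetical.

```python
def dirichlet_variance(alpha, n_classes):
    """Variance of one component of a symmetric Dirichlet(alpha, ..., alpha)."""
    a0 = alpha * n_classes
    return alpha * (a0 - alpha) / (a0 ** 2 * (a0 + 1))

def gamma_variance(shape, rate):
    """Variance of a gamma distribution in the shape/rate parameterization."""
    return shape / rate ** 2

# Ldiff1 and Ldiff5 rows of Table 1
var_d_1 = dirichlet_variance(1, 8)       # diameter classes, about 0.0121
var_t_1 = dirichlet_variance(1, 3)       # transition parameters, about 0.055
var_d_5 = dirichlet_variance(9, 8)       # about 0.0015
var_f_1 = gamma_variance(10, 1000)       # 1e-5
```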

The five different hyper-priors for the parameters corresponded to five levels of differentiation between groups. Indeed, the expectation of the diameter class or transition parameters was constant (E(dl) = 1/8 and E(pl) = E(ql) = E(ml) = 1/3 for all the hyper-priors in Table 1), but their variances decreased from 0·012 to 0·0015 for dl and from 0·055 to 0·0079 for the transition parameters. As this variance corresponded to the between-group variance, the lower it was, the more similar the groups were. We denote by Ldiff1,…, Ldiff5 the five decreasing differentiation levels of the hyper-parameters. When the number of groups was one, only the level Ldiff1 was used for hyper-priors. In total, there were thus 2 × 1 + 2 × 2 × 5 = 22 combinations of factors in the numerical experiments. For each combination, 50 replications were simulated. For each replication, the 100 species were randomly assigned to groups. This simulated classification was the reference against which the estimated classification was compared and is referred to as the ‘true classification’. Then, for each group, the diameter class parameters, the transition parameters and the fecundity parameter were randomly drawn according to their hyper-prior distributions (Table 1). Finally, for each species, the prescribed number of individuals was drawn according to the law defined by eqn 3 using the parameters of the group to which the species belonged.
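The simulation procedure can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the gamma hyper-prior is taken in shape/rate form, the move-up probability of the largest class is folded into its stay probability (the text does not say how the last class was handled), and all function names are hypothetical.

```python
import math
import random

def draw_dirichlet(alphas, rng):
    """Draw one Dirichlet vector by normalising independent gamma variates."""
    g = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(g)
    return [x / total for x in g]

def draw_poisson(rate, rng):
    """Knuth's multiplication method; adequate for the small rates used here."""
    limit, k, prod = math.exp(-rate), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

def draw_group(alpha, gamma_rate, n_classes, rng):
    """Draw (d, transitions, f) for one group from the hyper-priors."""
    d = draw_dirichlet([alpha] * n_classes, rng)
    trans = [draw_dirichlet([alpha] * 3, rng) for _ in range(n_classes)]
    p, q, m = trans[-1]
    trans[-1] = [p + q, 0.0, m]   # assumption: no move-up from the last class
    f = rng.gammavariate(10.0, 1.0 / gamma_rate)   # second argument is a scale
    return d, trans, f

def simulate_species(n_ind, d, trans, f, rng):
    """Per-class counts of (stay, move up, die) and the number of recruits."""
    counts = [[0, 0, 0] for _ in d]
    for l in rng.choices(range(len(d)), weights=d, k=n_ind):
        fate = rng.choices(range(3), weights=trans[l])[0]
        counts[l][fate] += 1
    return counts, draw_poisson(f * n_ind, rng)

rng = random.Random(1)
groups = [draw_group(alpha=1, gamma_rate=1000, n_classes=8, rng=rng) for _ in range(5)]
labels = [rng.randrange(5) for _ in range(100)]   # the 'true classification'
data = [simulate_species(100, *groups[g], rng=rng) for g in labels]
```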

To assess the performance of the method, we compared the estimated number $\hat{K}$ of groups with the true number K used to simulate data sets, and we compared the estimated classification with the true classification using two set matching indices I1 and I2. These indices are based on the K × $\hat{K}$ contingency table T = (Tij) with i = 1, …, K and j = 1, …, $\hat{K}$ that cross-tabulates the species according to the true and the estimated classifications:

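$$I_1 = \frac{1}{S} \sum_{i=1}^{K} \max_{j} T_{ij}, \qquad I_2 = \frac{1}{S} \sum_{j=1}^{\hat{K}} \max_{i} T_{ij}$$

where S is the number of species.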

These indices vary between 1/S and 1, and the higher they are, the better the agreement between the two classifications (Meilă 2007). They jointly reflect how groups were merged or split: I1 = 1 and I2 = 1 means that both classifications were identical; I1 = 1 and I2 < 1 means that the number of groups was underestimated and one or more groups were merged; I1 < 1 and I2 = 1 means that the number of groups was overestimated and one or more groups were split; I1 < 1 and I2 < 1 means that several set operations are needed to move from one classification to the other.
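One pair of set matching indices that reproduces the behaviour described above is sketched below. It is an assumption made for illustration: I1 sums, over true groups, the largest overlap with any estimated group, and I2 sums, over estimated groups, the largest overlap with any true group, both divided by the number of species. The function name is hypothetical.

```python
def set_matching_indices(true_labels, est_labels):
    """Return (I1, I2) computed from the true-by-estimated contingency table."""
    n_species = len(true_labels)
    table = {}
    for t, e in zip(true_labels, est_labels):
        table[(t, e)] = table.get((t, e), 0) + 1
    best_per_true, best_per_est = {}, {}
    for (t, e), count in table.items():
        best_per_true[t] = max(best_per_true.get(t, 0), count)
        best_per_est[e] = max(best_per_est.get(e, 0), count)
    i1 = sum(best_per_true.values()) / n_species
    i2 = sum(best_per_est.values()) / n_species
    return i1, i2

identical = set_matching_indices([0, 0, 1, 1], [1, 1, 0, 0])   # relabelled only
merged = set_matching_indices([0, 0, 1, 1], [0, 0, 0, 0])      # two groups merged
split = set_matching_indices([0, 0, 0, 0], [0, 0, 1, 1])       # one group split
```

With these toy inputs, merging two true groups leaves I1 at 1 and lowers I2, while splitting one true group does the opposite, in line with the interpretation given above.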

Tropical forest data

Data on the tropical rain forest were collected at the Paracou experimental site (5°18′N, 52°53′W), French Guiana. The site is located in an undisturbed terra firme forest under equatorial climate. Three 250 × 250 m permanent sample plots (18·75 ha in total) were established in 1984 and left as controls of the undisturbed forest dynamics. All trees greater than 10 cm d.b.h have been identified and georeferenced. Girth at breast height, standing deaths, treefalls and newly recruited trees greater than 10 cm d.b.h have been monitored either annually or every 2 years since 1984 (Gourlet-Fleury, Guehl & Laroussinie 2004). Because the Paracou forest is a mature undisturbed forest, the diameter distribution in those control plots could be considered to be at quasi-equilibrium. Two data sets were extracted from the Paracou data base: one training data set to infer the mixture of Usher models, and one validation data set. A data set gave the species, the diameter class at year t and the diameter class at year t + 2 for n trees. Trees that died between years t and t + 2, and trees whose diameter exceeded the inventory threshold of 10 cm between years t and t + 2 (recruited individuals) were included in the data set.

The training data set consisted of the data collected in 1993 and 1995 on the three control plots. One hundred and eighty-one species were identified in these three control plots (Fig. 3), illustrating both the high species richness and the relative scarcity of most species of the Guianan forest. The mean number of individuals per species was 64·54 (total on the three control plots of the training data set), with a minimum of 1 and a maximum of 980. The median number of individuals per species was 22, with a first quartile of 8 and a third quartile of 61·25. Although it would be possible to include species with few individuals in the analysis, we decided to leave out species with fewer than 20 individuals in the control plots in 1993. A preliminary analysis (data not shown) showed little difference between the classification based on all species and the classification restricted to species having at least 20 individuals; in the former case, the algorithm took longer to converge and rare species were not well classified, behaving like noise with respect to the estimation of groups. Moreover, from an ecological point of view, it makes little sense to assign species to groups when they are represented by few individuals. It is ecologically more meaningful to assign rare species a posteriori to existing groups, using expert knowledge of the species' autecology. We were left with 93 species that included at least 20 trees monitored in the three control plots. This training data set contained 10 756 trees. The validation data set consisted of the data collected in 2009 on the same three control plots.

Figure 3.

Rank-abundance diagram in the control plots at Paracou in 1993.

A classification of tree species into five groups was defined at Paracou by Favrichon (1994) using multivariate analysis and k-means clustering of species attributes (including size summary statistics, growth and recruitment). On the basis of these groups, Favrichon (1998) then fitted a Usher matrix model to predict forest dynamics. Hence, Favrichon's approach is illustrative of a two-step approach with a species classification that is disconnected from the matrix population model. We compared Favrichon's species classification with the one obtained by the mixture matrix model using the likelihood (5) of the training data set. Because there were missing observations between 1995 and 2009, the same computation was intractable for the validation data set. Nevertheless, considering that the undisturbed forest was close to equilibrium, we also compared the likelihoods of the validation data set given the asymptotic diameter distributions according to the two classifications. For a given population with Usher transition matrix U (eqn 2), the asymptotic diameter distribution is the normalized eigenvector of U associated to its dominant eigenvalue (Caswell 2001).
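The asymptotic diameter distribution can be approximated numerically by power iteration, which converges to the dominant eigenvector of a primitive non-negative matrix. The sketch below is a minimal illustration with made-up matrix entries and a hypothetical function name; it returns the eigenvector normalized to sum to one together with the estimated dominant eigenvalue.

```python
def stable_distribution(U, n_iter=1000, tol=1e-12):
    """Power iteration: normalized dominant eigenvector and eigenvalue of U."""
    size = len(U)
    v = [1.0 / size] * size
    growth = 1.0
    for _ in range(n_iter):
        w = [sum(U[i][j] * v[j] for j in range(size)) for i in range(size)]
        growth = sum(w)              # v sums to one, so this estimates the eigenvalue
        w = [x / growth for x in w]
        converged = max(abs(a - b) for a, b in zip(w, v)) < tol
        v = w
        if converged:
            break
    return v, growth

# Illustrative three-class Usher matrix (first row includes fecundity)
U_demo = [[0.82, 0.02, 0.02],
          [0.15, 0.85, 0.00],
          [0.00, 0.10, 0.90]]
dist, lam = stable_distribution(U_demo)
```

A dominant eigenvalue close to one corresponds to the quasi-equilibrium assumption made for the undisturbed forest.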


Recovery of simulated classifications

Simulation results were similar whether we used a uniform or a truncated Poisson distribution as a prior for K. Hence, only the results with the latter prior (which was the default one) are reported here. For 1000 individuals per species, the estimated classification perfectly matched the true simulated classification for all differentiation levels: I1 and I2 were always equal to one.

For 100 individuals per species, the results depended on the differentiation levels and on the number of groups (Table 2). When the true number of groups was one, the algorithm always found one group. For five groups, we correctly estimated the number of groups in 100%, 100%, 96%, 76% and 52% of the cases for the five decreasing levels of differentiation respectively. When the number of groups was wrongly estimated, it was systematically underestimated: I1 was very close to 1 and I2 always remained lower than I1. The classification method tended to merge different species groups into one group, and to move very few species of a given group into another group. The same results were found with stronger evidence in the case of 10 groups. At the fourth level of differentiation, the number of groups was correctly estimated in about 80% of the cases, and more than 95% of the species were classified into the correct groups.

Table 2. Comparison between simulated and estimated classifications: mean of (I1, I2) over the 50 simulations for 100 individuals per species, depending on the differentiation level of the hyper-priors. The levels Ldiff1 to Ldiff5 are defined in Table 1. n.d., not defined

| Differentiation level | One group | Five groups | Ten groups |
|---|---|---|---|
| Ldiff1 | (1, 1) | (1, 1) | (1, 1) |
| Ldiff2 | n.d. | (0·996, 0·996) | (0·998, 0·988) |
| Ldiff3 | n.d. | (0·996, 0·989) | (0·978, 0·889) |
| Ldiff4 | n.d. | (0·983, 0·933) | (0·929, 0·686) |
| Ldiff5 | n.d. | (0·964, 0·865) | (0·899, 0·574) |

Tropical rain forest tree species classification

The 93 tree species at Paracou were classified using the mixture of matrix models, based on eight diameter classes (≤15 cm, 15–20, 20–25, 25–30, 30–40, 40–50, 50–60, ≥60 cm). Based on 50 different chains, and 20 000 iterations after a burn-in of 10 000 iterations, five groups were obtained 48 times and six groups twice. Groups remained globally the same for all chains. We kept the chain with the highest log-likelihood. For this chain, the posterior probabilities for K = 5, 6, 7 or 35 groups were equal to 0·99, 5·3 × 10−3, 9·3 × 10−4 and 6·7 × 10−5 respectively.

The sensitivity analysis to the prior distributions showed that the estimate of K was fairly insensitive to the specification of the prior distributions for the parameters. For all priors except one, the algorithm again found five groups of species. The exception corresponded to α = β = 10 for the priors of the transition and diameter class parameters, to be compared to α = β = 1 for the default prior (Appendix A). In that case, K was estimated at three groups (with former groups 2 and 3 merged into a single one, and former groups 4 and 5 merged into a single one). Because α and β can be interpreted as pseudo-counts of individuals in diameter classes, large values of α and β tend to decrease the impact of observations on the classification, in particular for the largest diameter classes, which have few observations. Hence, the sensitivity of K to α and β expresses the sensitivity of the species classification to differences between species in the largest diameter classes.

To help interpret the five species groups, five demographic and biological attributes were computed for each group: growth rate, mortality rate, fecundity rate, upper bound for diameter and turnover. Direct estimates of these attributes were computed from the training data set, and compared to the indirect estimates obtained from the estimated transition and diameter class parameters of the mixture matrix model (see the Supporting Information for the estimates of all mixture matrix model parameters). The direct estimate of growth was the mean diameter increment between 1993 and 1995 of all trees that belonged to the group, while its indirect estimate was math formula, where δi is the width of the ith diameter class. The direct estimate of the mortality was the ratio of the number of dead trees in the group between 1993 and 1995 over the number of trees in the group in 1993, while its indirect estimate was math formula. The direct estimate of the fecundity was the ratio of the number of recruited trees in the group between 1993 and 1995 over the number of trees in the group in 1993, while its indirect estimate was f. The direct estimate of the upper bound for diameter was the 95% quantile of diameters in 1995, while its indirect estimate was interpolated from math formula assuming that the diameter distribution was uniform within each class. Finally, the turnover was computed as half the sum of the mortality rate and of the fecundity rate. The direct and indirect estimates of these attributes were not expected to be strictly equal since they did not derive from the same estimators; yet, their values were quite close and showed the same differences between groups (Table 3).
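The direct estimates of mortality, fecundity and turnover reduce to simple ratios of counts. The following minimal sketch uses invented counts and a hypothetical function name to show the computation for one group.

```python
def direct_rates(n_start, n_dead, n_recruits):
    """Two-year mortality, fecundity and turnover of a group from raw counts."""
    mortality = n_dead / n_start        # dead trees over trees present in 1993
    fecundity = n_recruits / n_start    # recruited trees over trees present in 1993
    turnover = 0.5 * (mortality + fecundity)
    return mortality, fecundity, turnover

# Invented example: 200 trees in 1993, 4 deaths and 6 recruits by 1995
mortality, fecundity, turnover = direct_rates(200, 4, 6)
```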

Table 3. Observed vital rates of groups (Obs.) and average vital rates computed from the estimated transition rates (Est.): 2-year d.b.h increment (ΔDBH), 2-year mortality rate, 2-year fecundity rate, upper bound of diameters (DBH95) and 2-year turnover of the five groups obtained using matrix population mixture model classification. The observed ΔDBH for group i was math formula, where math formula was the d.b.h of individual j at year t, and ki the number of individuals in group i
Group | ΔDBH (cm) | Mortality (%) | Fecundity (%) | DBH95 (cm) | Turnover (%)

Groups were labelled by decreasing order of growth (Table 3). The gradients of maximum size and turnover perfectly paralleled this gradient of growth, with the fastest growing group 1 having the greatest maximum size and the lowest turnover rate. Group 1 was composed of emergent mid-tolerant species, i.e. species that need to settle in the upper strata and sometimes above the forest canopy to complete their whole life cycle. Group 2 was composed of a mix of shade-tolerant (mostly) and light-demanding (to a lesser extent) canopy species. Group 3 was composed of shade-tolerant species, with a mix of canopy (mostly) and understorey (to a lesser extent) species. As a consequence, its growth rate and maximum size were lower than for group 2, but higher than for group 4. The two small-sized groups 4 and 5 were composed of understorey shade-tolerant species, although group 4 also included a few pioneer species. As a consequence, the growth rate of group 4 was higher than that of group 5.

Because mixture of matrix models jointly classifies species and fits matrix models, we also compared the predicted and the observed number of individuals in each diameter class and each group in 2009, to check the validity of the matrix model. The mixture matrix population model correctly predicted both the number of trees 16 years later and their size distribution (Fig. 4).

Figure 4.

Predicted (boxplot) and observed (black dot) number of individuals in each diameter class and each species group in the control plots at Paracou in 2009.

The log-likelihood of the training data set was −2722·7 for the Bayesian classification and −3351·7 for Favrichon's classification. The log-likelihood of the validation data set given the asymptotic diameter distribution was −2007·7 for the Bayesian classification and −2874·3 for Favrichon's classification. Hence, both criteria largely favoured the Bayesian classification to the detriment of Favrichon's classification.


Mixture modelling can deal with matrix population models, and can jointly classify species and fit matrix models. Mixture of matrix population models can be addressed in the frequentist or in the Bayesian context. The algorithm that we developed in the Bayesian context performed well on simulated data with known groups, even when the differentiation between groups was low. Classification was correctly predicted when between-group variances were higher than 0·0019 for diameter parameters (math formula) and 0·010 for transition parameters (math formula and fk), corresponding to the fourth level of differentiation (see Table 1). A specificity of the Bayesian method presented here is that it estimated the number K of groups together with the other parameters. This is original, as mixture modelling generally operates conditionally on K and then uses an information criterion to select K (Biernacki, Celeux & Govaert 2000). Moreover, the Bayesian approach allowed us to construct prior distributions taking into account ecological expert knowledge. For example, we assumed that the prior diameter distribution was a Dirichlet distribution where all parameters were equal to one, meaning that the diameter distribution was uniform across diameter classes. Nevertheless, using the Bayesian paradigm, it is straightforward to change the prior distribution to model expert knowledge, assuming for example that the diameter distribution is decreasing from the first to the last diameter class.

The method that we developed for the mixture of Usher matrix models could straightforwardly be adapted to other types of matrix projection models, such as Leslie or Lefkovitch matrix models for age- and stage-structured populations respectively. Starting from the life cycle representation of the matrix model (Fig. 1), one simply has to translate the probabilities associated with each transition into a distribution law for an observation (eqn 3).
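As an illustration of this translation, the following minimal sketch writes a log-likelihood for one group under an Usher model. It is not a transcription of eqn 3: it assumes, based on the quantities listed in Appendix A, that the initial diameter distribution is multinomial, that the fate of individuals in each class (stay, move up, die) is multinomial, and that the number of recruits is Poisson with mean equal to the fecundity times the total number of individuals. All function and argument names are hypothetical.

```python
import math
import numpy as np

def log_multinomial(counts, probs):
    """Log-probability of `counts` under a multinomial with cell probabilities `probs`."""
    counts = np.asarray(counts, dtype=float)
    probs = np.asarray(probs, dtype=float)
    log_coef = math.lgamma(counts.sum() + 1) - sum(math.lgamma(c + 1) for c in counts)
    mask = counts > 0  # empty cells contribute nothing, even when their probability is zero
    return log_coef + float(np.sum(counts[mask] * np.log(probs[mask])))

def log_poisson(count, mean):
    """Log-probability of `count` under a Poisson distribution with the given mean."""
    return count * math.log(mean) - mean - math.lgamma(count + 1)

def usher_log_likelihood(initial, stay, move_up, die, recruits,
                         pi, p_stay, p_up, fecundity):
    """Assumed likelihood structure for one group (hypothetical names).

    initial[l]               : individuals in diameter class l at time t
    stay/move_up/die[l]      : fates of these individuals at the next census
    recruits                 : new individuals entering the first class
    pi, p_stay, p_up, fecundity : model parameters
    """
    ll = log_multinomial(initial, pi)  # initial diameter distribution
    for l in range(len(initial)):
        p_die = 1.0 - p_stay[l] - p_up[l]
        ll += log_multinomial([stay[l], move_up[l], die[l]],
                              [p_stay[l], p_up[l], p_die])
    ll += log_poisson(recruits, fecundity * sum(initial))  # recruitment
    return ll

# Toy data with three diameter classes (invented numbers, for illustration only).
ll = usher_log_likelihood(
    initial=[60, 30, 10], stay=[50, 26, 9], move_up=[6, 2, 0], die=[4, 2, 1],
    recruits=1, pi=[0.6, 0.3, 0.1], p_stay=[0.85, 0.88, 0.90],
    p_up=[0.10, 0.07, 0.0], fecundity=0.01)
print(ll)
```

Adapting the sketch to a Leslie or Lefkovitch model would amount to changing the set of fates in the inner multinomial to match the arrows of the corresponding life cycle diagram.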

When applied to a tropical rain forest at Paracou, the mixture of Usher matrix models was able to jointly classify species and make reliable predictions. Predictions were better with the mixture model than with Favrichon's two-step approach, which illustrates that a classification disconnected from the matrix model may not be optimal for predicting community dynamics. The characteristics of the tree species groups formed at Paracou were consistent with known ecological behaviour (Lieberman et al. 1985; Nascimento et al. 2005; Delcamp et al. 2008; Poorter et al. 2008): small-sized species (with the exception of pioneers) tend to grow slowly and to have high recruitment and mortality rates (i.e. high turnover rates), whereas large-sized species that reach the forest canopy tend to grow rapidly and have low turnover rates. The mixture of Usher matrix models classified species according to both their growth rate and their maximum size (Picard et al. 2012). When plotting species along these two axes, species groups were clearly separated (Fig. 5). Because these two axes can be used to order species along a continuum of ecological strategies (Turner 2001; Alder et al. 2002), this means that the mixture of Usher matrix models was also able to classify species in a way that is consistent with their autecology.

Figure 5.

Upper bound of diameters (95% quantile of d.b.h. in 1995, in cm) vs. mean diameter increment between 1993 and 1995 (cm) for 93 species at Paracou, French Guiana. The five different symbols correspond to the five groups defined by the mixture matrix model.

The heterogeneity in light requirement found in groups 2 and 4 can be easily understood given the environmental conditions prevailing in the control plots. These plots are largely undisturbed, with only small gaps occurring at a rate of approximately three per year (Gourlet-Fleury, Guehl & Laroussinie 2004). Such conditions favour neither the growth of light-demanding species nor the growth and survival of pioneer species. Because these species do not express their growth potential, they tended to be gathered with slower growing species in groups 2 and 4. This, in addition to the fact that few pioneer species can survive in these plots, explains why no pioneer group was identified by our procedure, whereas such a group is usually the first one to be isolated in a classification because of its particular behaviour (Swaine & Whitmore 1988). Applying the mixture of matrix models to disturbed plots would probably have yielded a different classification that better accounts for the variety of potential species behaviours.

In the Paracou example, the distribution of individuals across diameter classes in 1993 was taken into account in the mixture of matrix models: the likelihood (eqn 3) depended on the vector of parameters math formula. This means that the shape of the initial diameter distribution influenced the outcome of the species classification. This made sense for the Paracou control plots because these plots were established in undisturbed forest, whose state in 1993 could be considered as close to equilibrium. The vector math formula was thus representative of the equilibrium state of the forest. Indeed, we checked (results not shown here) that the asymptotic growth rate of the matrix models was close to one, and the associated eigenvectors close to math formula. In other situations where the forest is far from equilibrium, it might not be advisable to account for the initial diameter distribution math formula in the species classification. Computing the conditional likelihood given math formula would make it possible to drop math formula from the expression of the likelihood (eqn 3). Apart from this, the mixture of matrix models would be unchanged.
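The equilibrium check mentioned above can be sketched as follows. The code builds an Usher projection matrix and returns its dominant eigenvalue (the asymptotic growth rate) together with the associated eigenvector, normalised to sum to one so that it can be compared with an observed diameter distribution. The matrix layout, in particular a single fecundity applied to every class and feeding the first class, is an assumption made for illustration, as are the function names and the toy parameter values.

```python
import numpy as np

def usher_matrix(p_stay, p_up, fecundity):
    """Usher matrix: stay on the diagonal, move up on the sub-diagonal,
    recruitment into the first class (assumed identical for all classes)."""
    n_classes = len(p_stay)
    U = np.zeros((n_classes, n_classes))
    for l in range(n_classes):
        U[l, l] = p_stay[l]
        if l + 1 < n_classes:
            U[l + 1, l] = p_up[l]
    U[0, :] += fecundity
    return U

def asymptotic_growth(U):
    """Dominant eigenvalue and stable class distribution of a projection matrix."""
    eigvals, eigvecs = np.linalg.eig(U)
    i = np.argmax(eigvals.real)          # Perron root of a non-negative matrix
    stable = np.abs(eigvecs[:, i].real)  # remove the arbitrary sign
    return eigvals[i].real, stable / stable.sum()

# Toy parameters (invented, not estimates from Paracou).
U = usher_matrix(p_stay=[0.85, 0.88, 0.90], p_up=[0.10, 0.07, 0.0], fecundity=0.01)
growth_rate, stable_distribution = asymptotic_growth(U)
print(growth_rate, stable_distribution)
```

A growth rate near one and a stable distribution near the observed diameter distribution would support treating the initial state as close to equilibrium.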


This study is part of the GUYASIM project (31032, operational program FEDER 2007–2013), with financial support from European structural funds. This work also has benefited from an ‘Investissement d'Avenir’ grant managed by the Agence Nationale de la Recherche (CEBA, ref. ANR-10-LABX-0025). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Appendix A

Bayesian inference

Let S be the number of species in the calibration data set. Using the same notation as in the section ‘Mixture of Usher matrix models’ with the additional superscript s, let math formula,…, math formula, math formula, math formula) be the vector of observations for species s = 1, …, S and let math formula be the vector of observations for all species. Let math formula be the latent vector that gives the group of each species. Considering K as unknown, the posterior probability πk follows from the posterior density distribution of the mixture model:

display math(6)

where math formula is given by eqn 3, and math formula, math formula and math formula are the prior densities associated with the class latent random variables, the parameters of each matrix model and the number of groups respectively. For full Bayesian inference of the model, we set the following priors on the unknown quantities math formula, math formula and K.

We assumed that the prior distribution for the number K was a Poisson distribution with mean one, truncated to strictly positive values: math formula. This prior distribution was suggested by Nobile (2005) as being more parsimonious than a uniform distribution. For the sensitivity analysis, a uniform distribution between one and S was also used as a prior for K.
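For concreteness, a zero-truncated Poisson prior with mean parameter one assigns probability e⁻¹ / (K! (1 − e⁻¹)) to each K ≥ 1. The short sketch below evaluates this log-prior and the uniform alternative used in the sensitivity analysis; the function names are hypothetical.

```python
import math

def log_prior_truncated_poisson(K, mean=1.0):
    """Log-probability of K under a Poisson(mean) truncated to K >= 1."""
    if K < 1:
        return -math.inf
    log_poisson = K * math.log(mean) - mean - math.lgamma(K + 1)
    log_norm = math.log1p(-math.exp(-mean))  # log P(K >= 1) under the untruncated law
    return log_poisson - log_norm

def log_prior_uniform(K, n_species):
    """Log-probability of K under a uniform prior on 1, ..., n_species."""
    return -math.log(n_species) if 1 <= K <= n_species else -math.inf

# The truncated Poisson prior puts most of its mass on small K.
for K in range(1, 6):
    print(K, math.exp(log_prior_truncated_poisson(K)))
```

The rapid decay of the truncated Poisson probabilities is what makes this prior favour classifications with few groups.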

The parameters associated with the matrix population model for group k are math formula and math formula. The prior for the parameters math formula of the K matrix population models assumed that the parameters of the different classes and groups were independent:

display math

Because the Dirichlet distribution (denoted math formula) is the conjugate prior of the multinomial distribution, we used the Dirichlet distribution as a prior for all transition parameters and all diameter class parameters: math formula, math formula and math formula, where α and β are hyper-parameters that can be interpreted as pseudo-counts of individuals. The default priors used α = β = 1. For the sensitivity analysis, we also tested α = β = 0·5, which corresponds to the non-informative Jeffreys prior (Jeffreys 1946; Atwood 1996), and α = β = 10. Because the gamma distribution (denoted math formula) is the conjugate prior of the Poisson distribution, we used the gamma distribution as a prior for the fecundity parameter: math formula, where δ and γ are hyper-parameters. The default prior used γ = 0·01 and δ = 1, which expresses the expert's knowledge that the recruitment rate in undisturbed natural rain forest is around 1%. For the sensitivity analysis, we also tested γ = 0·5 and δ = 1, 10⁻¹ or 10⁻¹⁰ (the Jeffreys prior, which corresponds to γ = 0·5 and δ = 0, could not be used because it is improper).
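Conjugacy means that the posterior of each parameter block is obtained by adding the observed counts to the prior pseudo-counts. The sketch below draws one set of group parameters from such posteriors. Several choices are assumptions made for illustration: α is attached to the diameter distribution and β to the transition probabilities, γ is treated as the shape and δ as the rate of the gamma prior (so that the default prior mean γ/δ equals 1%), recruits are Poisson with mean equal to the fecundity times the number of individuals, and individuals in the last class can only stay or die. Names are hypothetical.

```python
import numpy as np

def sample_group_parameters(initial, stay, move_up, die, recruits, rng,
                            alpha=1.0, beta=1.0, gamma=0.01, delta=1.0):
    """Draw (pi, p_stay, p_up, fecundity) from assumed conjugate posteriors."""
    initial = np.asarray(initial, dtype=float)
    n_classes = len(initial)
    # Diameter distribution: Dirichlet prior + multinomial counts.
    pi = rng.dirichlet(alpha + initial)
    p_stay = np.zeros(n_classes)
    p_up = np.zeros(n_classes)
    for l in range(n_classes):
        if l + 1 < n_classes:
            fates = np.array([stay[l], move_up[l], die[l]], dtype=float)
            q = rng.dirichlet(beta + fates)
            p_stay[l], p_up[l] = q[0], q[1]
        else:
            # Last class: no upward transition (assumption).
            q = rng.dirichlet(beta + np.array([stay[l], die[l]], dtype=float))
            p_stay[l] = q[0]
    # Fecundity: gamma prior (shape gamma, rate delta) + Poisson recruits.
    fecundity = rng.gamma(shape=gamma + recruits,
                          scale=1.0 / (delta + initial.sum()))
    return pi, p_stay, p_up, fecundity

rng = np.random.default_rng(0)
pi, p_stay, p_up, fecundity = sample_group_parameters(
    initial=[60, 30, 10], stay=[50, 26, 9], move_up=[6, 2, 0], die=[4, 2, 1],
    recruits=1, rng=rng)
print(pi, p_stay, p_up, fecundity)
```

With the default hyper-parameters, each prior contributes the equivalent of one pseudo-individual per cell, so the data dominate as soon as a group contains more than a few individuals.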

The prior for the class vector math formula assumed that, given the number of groups, each species could equally and independently of the other species be in any group: math formula where math formula is a uniform distribution on the number of groups: math formula.

The inference of parameters was made through the investigation of the posterior distribution math formula defined by eqn (6). As the number of groups was unknown, the posterior distribution was not available in an analytic form. Hence, a specific Metropolis-within-Gibbs MCMC algorithm was developed. The algorithm consisted of three moves: increasing the number of groups (birth case); decreasing the number of groups (death case); keeping the same number of groups, but potentially changing one species assignment (no jump case). In the first two cases, the number of parameters was not constant, so a reversible jump MCMC approach was used (Richardson & Green 1997), whereas in the third case, a Gibbs step could be used. At each iteration, one of the three moves was selected with probability 1/3.

In the following, we detail the proposal step for the three moves and the selection step for the birth and death cases.

1. Proposal step. Let |k| denote the number of species in group k, for k = 1, …, K. Let K* denote the number of groups of the proposal and math formula denote the latent class vector of the proposal.

● No jump case: K* = K. The proposal math formula for the latent class vector is drawn in two steps:

(a) randomly choose one species s among the groups that include two or more species;

(b) the new assignment math formula for species s is sampled from a multinomial distribution math formula, whereas math formula for t ≠ s. The coefficients wk are equal to math formula where math formula is given by eqn 3.

● Birth case: K* = K + 1. The proposal for the latent class vector is obtained by splitting one group into two subgroups:

(a) randomly choose one group k among the groups that include two or more species; this group will form two subgroups labelled k1 and k2;

(b) choose the number |k1| of species that will compose group k1 following a uniform distribution: math formula

(c) sample |k1| species among the |k| species in group k and allocate them to the first subgroup k1. The others are allocated to the second subgroup k2. Let D denote the resulting allocation vector of the |k| species between k1 and k2.

Let math formula denote the new classification that results from math formula through steps (a)–(c). Then, the conditional probability distribution of the new classification into K + 1 groups given the old one into K groups, math formula, is defined by:

display math

● Death case: K* = K − 1. The proposal for the latent class vector is obtained by merging two groups into a single one: randomly choose two groups among K and merge them into one group. Let k1 and k2 be the two selected groups and let math formula be the new classification that results from math formula by merging k1 and k2. Then, the conditional probability distribution of the new classification into K − 1 groups given the old one into K groups, math formula, is defined by:

display math
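The three proposals described above can be sketched as follows. The sketch follows the verbal steps only and does not reproduce the displayed proposal probabilities: the range of the uniform draw for |k1| is assumed to be 1, …, |k| − 1 so that both subgroups are non-empty, the no-jump weights are assumed to be supplied as a matrix of log-weights per species and group, and groups are labelled 0, …, K − 1. All names are hypothetical.

```python
import math
import numpy as np

def reassign_one_species(z, K, log_w, rng):
    """No jump case: move one species taken from a group of size >= 2.
    log_w[s, k] is the (assumed) log-weight of species s in group k."""
    z = np.asarray(z)
    sizes = np.bincount(z, minlength=K)
    movable = np.flatnonzero(sizes[z] >= 2)
    z_new = z.copy()
    if movable.size > 0:
        s = rng.choice(movable)
        w = np.exp(log_w[s] - log_w[s].max())  # stabilised weights
        z_new[s] = rng.choice(K, p=w / w.sum())
    return z_new

def propose_split(z, K, rng):
    """Birth case: split one group of size >= 2; returns (z_new, log_q) or None."""
    z = np.asarray(z)
    sizes = np.bincount(z, minlength=K)
    eligible = np.flatnonzero(sizes >= 2)
    if eligible.size == 0:
        return None
    k = rng.choice(eligible)                    # step (a)
    members = np.flatnonzero(z == k)
    n1 = int(rng.integers(1, members.size))     # step (b): uniform on 1..|k|-1 (assumed)
    moved = rng.choice(members, size=n1, replace=False)  # step (c)
    z_new = z.copy()
    z_new[moved] = K                            # subgroup k1 receives the new label
    log_q = -(math.log(eligible.size) + math.log(members.size - 1)
              + math.log(math.comb(int(members.size), n1)))
    return z_new, log_q

def propose_merge(z, K, rng):
    """Death case: merge two groups chosen uniformly; returns (z_new, log_q) or None."""
    if K < 2:
        return None
    k1, k2 = sorted(int(k) for k in rng.choice(K, size=2, replace=False))
    z_new = np.asarray(z).copy()
    z_new[z_new == k2] = k1
    z_new[z_new > k2] -= 1                      # keep labels contiguous
    return z_new, -math.log(math.comb(K, 2))

rng = np.random.default_rng(1)
z = np.array([0, 0, 0, 1, 1, 2])                # six species in three groups
z_split, log_q_split = propose_split(z, 3, rng)
z_merge, log_q_merge = propose_merge(z, 3, rng)
z_same = reassign_one_species(z, 3, np.zeros((6, 3)), rng)
print(z_split, z_merge, z_same)
```

The returned log-probabilities are those of the labelled allocation produced by the steps as coded here; they are meant to show how each step contributes a factor, not to replace the expressions given in the text.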

2. Selection step. Given math formula and K, the vector of new parameters math formula, math formula, math formula, f, math formula is sampled from its marginal posterior distribution math formula. This marginal posterior distribution (not given here to save space) is known in an analytical form since multinomial/Dirichlet and Poisson/gamma distributions are conjugate distributions (Robert & Casella 2005).

The following equations give the expression of the Metropolis–Hastings ratio in the death case, as an example. Let the current number of groups be K, and the new state be K* = K − 1. Let us assume that two groups k1 and k2 have been chosen and merged into a single group k. Then,

display math

Moreover, math formula is the ratio of marginal posterior distributions of math formula and is equal to

display math

where Nk is the set of observations belonging to all species classified in group k. math formula is broken down as follows:

display math


display math

where nllk, nl(l + 1)k and nl†k are the number of individuals in group k that respectively stay in class l, move from class l to l + 1 or die;

display math

where nlk is the number of individuals of group k in class l at initial time t; and finally,

display math

where nk is the total number of individuals in group k at initial time t and n01k is the number of recruits in group k. Given this, the calculation of prior distribution as well as likelihood ratios is straightforward. As the matrix population model parameters are sampled from their posterior distributions, the canonical reversible transition function is the identity function. Hence, its Jacobian is equal to one and does not appear in the Metropolis–Hastings ratios.
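To give a feel for how conjugacy makes such ratios tractable, the sketch below computes one possible ingredient of a merge ratio: the log-ratio of marginal likelihoods of the merged group against the two separate groups, using the standard closed forms for the Dirichlet-multinomial and gamma-Poisson pairs. This is not a reconstruction of the displayed equations. For brevity it covers only the diameter counts and the recruits (the transition counts would add one Dirichlet term per class), it omits the priors on K and on the class vector as well as the proposal probabilities, and it drops combinatorial constants that do not depend on the classification. Names and toy counts are hypothetical.

```python
import math
import numpy as np

def log_marginal_dirichlet(counts, a=1.0):
    """Log marginal likelihood of category counts under a symmetric Dirichlet(a) prior."""
    counts = np.asarray(counts, dtype=float)
    a_vec = np.full(counts.shape, a)
    out = math.lgamma(a_vec.sum()) - math.lgamma(a_vec.sum() + counts.sum())
    out += sum(math.lgamma(ai + ci) - math.lgamma(ai) for ai, ci in zip(a_vec, counts))
    return out

def log_marginal_gamma_poisson(recruits, n, gamma=0.01, delta=1.0):
    """Log marginal likelihood of recruits ~ Poisson(f * n), f ~ gamma(shape gamma, rate delta)."""
    return (gamma * math.log(delta) - math.lgamma(gamma)
            + math.lgamma(gamma + recruits)
            - (gamma + recruits) * math.log(delta + n))

def log_marginal_group(initial, recruits):
    """Diameter counts plus recruits for one group (transition terms omitted)."""
    initial = np.asarray(initial, dtype=float)
    return (log_marginal_dirichlet(initial)
            + log_marginal_gamma_poisson(recruits, initial.sum()))

def merge_log_ratio(group1, group2):
    """log [ m(merged) / (m(group1) * m(group2)) ], each group given as (initial, recruits)."""
    (c1, r1), (c2, r2) = group1, group2
    merged = np.asarray(c1, dtype=float) + np.asarray(c2, dtype=float)
    return (log_marginal_group(merged, r1 + r2)
            - log_marginal_group(c1, r1) - log_marginal_group(c2, r2))

def accept(log_ratio, rng):
    """Metropolis-Hastings acceptance with probability min(1, exp(log_ratio))."""
    return rng.random() < math.exp(min(0.0, log_ratio))

rng = np.random.default_rng(2)
similar = merge_log_ratio(([60, 30, 10], 1), ([58, 31, 11], 1))
different = merge_log_ratio(([60, 30, 10], 1), ([10, 30, 60], 1))
print(similar, different, accept(similar, rng))
```

Groups with similar diameter structures yield a larger ratio than groups with contrasting structures, which is the behaviour a merge move relies on.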