Bayesian multivariate longitudinal model for immune responses to Leishmania: A tick-borne co-infection study

While many Bayesian state-space models for infectious disease processes focus on population infection dynamics (eg, compartmental models), in this work we examine the evolution of infection processes and the complexities of the immune responses within the host using these techniques. We present a joint Bayesian state-space model to better understand how the immune system contributes to the control of Leishmania infantum infections over the disease course. We use longitudinal molecular diagnostic and clinical data of a cohort of dogs to describe population progression rates and present evidence for important drivers of clinical disease. Among these results, we find evidence for the importance of co-infection in disease progression. We also show that as dogs progress through the infection, parasite load is influenced by their age, ectoparasiticide treatment status, and serology. Furthermore, we present evidence that pathogen load information from an earlier point in time influences its future value and that the size of this effect varies depending on the clinical stage of the dog. In addition to characterizing the processes driving disease progression, we predict individual and aggregate patterns of Canine Leishmaniasis progression. Both our findings and the application to individual-level predictions are of direct clinical relevance, presenting possible opportunities for application in veterinary practice and motivating lines of additional investigation to better understand and predict disease progression. Finally, as an important zoonotic human pathogen, these results may support future efforts to prevent and treat human Leishmaniosis.


Bayesian Model
For N = 50 dogs and T = 7 time points, the proposed model is structured as follows i,t , π D i,t+1 ∼ Multinomial 1; f D (D i,t , P i,t , A i,t , X i ) A i,t+1 ∼ N f A (D i,t , P i,t , A i,t , X i ), σ 2 A P i,t+1 ∼ N f P (D i,t , P i,t , A i,t , X i ), σ 2 for k = 1, 2, 3 in the probability terms, and i = 1, . . ., N and t = 1, . . ., T .

Complete Data Likelihood
Using the model definition as described by Equations 2 and 3, the joint likelihood can be defined by the product of each probability density (or probability mass) function corresponding to each model component.In this case, we have two continuous outcomes (pathogen load and antibody levels) and one categorical (disease status).To facilitate the definition of the likelihood presented in Equation 4, let us define θ P = (β P , α P , σ 2 P ) as the set of vectors associated with pathogen load.Similarly, we can define the set of parameters associated with the antibody levels and disease status as θ A = (β A , α A , σ 2  A ) and θ D = (β D α D ) , respectively.
Here we have that the mean expressions for each normal density distribution are defined as η P i,t−1 = M P i,t β P +X i α P , and η A i,t−1 = M A i,t β A +X i α A , respectively.The expressions for the π (k) i,t−1 are presented in Equation 2, and where i,t−1 = 0 otherwise, which are defined based in Equation 1.Note that a proportionality notation is used in the second line of the likelihood, which helps in factoring out all of the constants or fixed quantities.In addition, notice that we defined our baseline category in the multinomial distribution as

Full Conditionals (Model Components)
Since pathogen load, antibodies level, and disease status were not observed for some of the time points, which are known as latent variables, then we estimated the corresponding components of the model for those time points.Therefore, the full conditional for the three main components of the model are given below.

Disease Status
1.5 Full Conditionals (Parameters) If ω 1 is equal to one of the parameters in {β P , α P , σ 2 P }, then the full conditional is given by If ω 2 is equals to one of the parameters in {β A , α A , σ 2 A }, then the full conditional is given by Finally, if ω 3 is equals to one of the parameters in {β In Equations 8-10, π(ω 1 ), π(ω 2 ), and π(ω 3 ) denote the prior distribution of the parameters associated with the specific model component.A complete list of these prior specifications is provided in the main text (Section 3.5 Computation and Model Diagnostics).

Simulation Study
In this section, we present a simulation study for the development and implementation of the withinhost Bayesian modeling of Leishmania infection, but the model presented here is a more straightforward case than the one presented in the main article.In order to develop an effective protective immune response against pathogens, it is necessary to have some specific and complex coordinated immunological events happen over a specific period of time.These events involve a variety of immune cells, within different tissues or organs, and the interplay between the pathogen load in the host's system, the clinical signs of the subject, and these immune responses.

Infection States
The first step in this development is to propose a simple organizing principle to describe infection state as consisting of three high-level categories: regulatory and inflammatory responses from the immune system, and pathogen load.The regulatory and inflammatory processes have multiple components subdivided into innate and adaptive immune processes and potential phenomena related to particular cell types.While immunological assays are uniformly more granular than this proposed aggregate framework, we propose that these categories represent a meaningful infection-immune state and provide a useful way to describe the progression of these states.The pathogen state within a host is defined by the amount and composition of the infectious organisms.

Model Specification
The model for this exploratory work considers the fitting of longitudinal data on individual hosts and their clinical status.For one, we define t i,j to be the time index j of subject i, with the baseline value for an individual host denoted as t i,1 , where the time is modeled discretely.Let P i,j be a scalar in the single pathogen model context (pathogen burden of Canine Leishmaniasis), denoting the pathogen load for the corresponding indexes.We measure the inflammatory state of the adaptive immune response by characterizing CD4 cells and associated cytokines, including the concentration of T H 1 cells, as well as the level of IFN-γ being expressed.We also include measures of antibody expression by B-cells.We denote the latent inflammatory state of an immune response as column vector I i,j , with elements varying by the pathogen.For example, we might define I i,j to be a two column-vector where the first component, I i,j,1 , captures the T-cell-related inflammatory response, and the second, I i,j,2 , captures the B-cell related antibody response.The regulatory response R i,j is closely coupled with the inflammatory parts of the adaptive immune system.For example, we can focus on the CD4 cells: serum concentration of T H 1-reg cells, IL-10 expression, and cell-surface PD-1 expression.The latent regulatory state of an immune response might be a two column-vector as well, R i,j , for the appropriate indices, and similar to the inflammatory specification, it could comprise T-cell regulatory response and B-cell related antibody regulatory response.
For all of the described state components, we assume that the future state is expected to depend on the current state, as well as possibly on the case history.In order to capture this temporal dependence, we introduce the arrays R i,t<j+1 , I i,t<j+1 , and P i,t<j+1 to denote the latent regulatory, inflammatory, and pathogen state vectors for subject i from time t i,1 to t i,j .In Equation 11, we observe three unknown update functions for the regulatory (R) and inflammatory (I) responses of the immune system as well as for pathogen load (P ), which describe the expected state at the next time point for each of the three main model components.In addition, these update equations describe individualspecific error components, capturing measurement error and individual heterogeneity.In general, the R and I components are allowed to depend on the entire response history of the host but change in pathogen burden P depends only on the current immunopathogenic state.The chosen form of these functions depends on pathogen-specific concerns and can range in complexity from straightforward linear functions of state components to involving interaction terms, and even to complex nonlinear expansions of the state vector.
For the simulation study, we define the following notations of model components, • I i,j,1 to denote the proportion of (interferon) IFNγ+ CD4+ T cells, • I i,j,2 the proportion of proliferative CD4+ T cells, • R i,j,1 the proportion of (interleukin) IL10+ CD19+ B cells, • R i,j,2 the proportion of (interleukin) IL10+ CD4+ T cells, and • P ij the number of parasites per 10 6 peripheral blood mononuclear cell (PBMC) In real datasets, these measurements could be obtained from different diagnostic assays from the laboratory, such as the enzyme-linked immunoassay (ELISA), real-time quantitative polymerase chain reaction (RT-qPCR), SNAP 4Dx Plus test, and among others.

Assumptions
In most cases when sample extraction is performed, it is not usually known the exact amount of pathogen load that is initially eliminated by the innate immune system.Therefore, we have addressed this by assuming that the pathogen load measured at the first time point is the pathogen load after the innate immune system has eliminated a certain amount of the pathogen genetic material, and it is not a measure of the pathogen burden that was introduced initially by the infectious agent in the host.

State Transitions and Rules
To define clinical status and progression, we defined three categories of disease state, based primarily on pathogenic information.We say that a subject could be healthy (H) or in a silent state; be asymptomatic or in an acute state where inflammatory responses (I) are mainly activated; or in a symptomatic or chronic state, where regulatory (R) activation occurs.Therefore, to determine the transitions between each of the disease states, we used the work from Esch et al. [2] as a basis to define the following transition rules for disease state: Based on this set of rules, we can now define the transition functions and probabilities of state transitions, which are the basis for the construction of the simulated data set.In general, it is common to see an increase in pathogen load earlier in the course of an infection.Once the inflammatory responses of the immune system are activated by some specific cells, we can possibly observe a slight decline in the pathogen load, which is subject-specific.If treatment is not considered during the course of infection and the inflammatory response is not effectively fighting the disease, the pathogen load is expected to increase again, causing progression to the worst state, previously denoted as the chronic state.Figure 1 of Petersen [3], shows a temporal trend for parasite load as well as for CD4+ T cells, and productive versus non-productive antibodies.Using this information on the temporal behavior of a common trend in pathogen load, we were able to define transition functions for the state components of the CanL infection that can provide this behavior for pathogen load.

Example: Random Walk
As an example, we run a random walk simulation by using all of the information explained in previous sections to show how subjects can transition among disease state space based on immunopathogenic information.For instance, a random walk for the interplay of VL pathogen load and clinical state for a single individual is shown in Figure 1, where the threshold values used to define the transition rules are being plotted as horizontal dashed lines.The plot shows how disease classification changes as a function of pathogen load.Notice that for this particular artificial subject, we were able to identify the disease states over time.The subject started in the healthy state but it transition to the second state once it crossed the first cutoff value as presented in the second rule as presented in Subsection 2.2.2.Usually, when the inflammatory response is activated, we can see a slight decline in the pathogen load, which in Figure 1 starts from time point 2.Then, if treatment is not considered, as we assumed in this simulation setting, and if the inflammatory response is not effectively eliminating the disease material, then pathogen load is expected to increase causing a progression to the worst state, the regulatory state, as we observed for this particular subject towards the end of the temporal trend.
For the case of the inflammatory and regulatory responses, we used previous works from the literature to simulate corresponding values.For instance, Schaut et al. [4] provided that approximate percentages of IFNγ+ CD4+ T cells was fluctuating from 10% to 35% during the progression from a healthy to a regulatory state.The same authors also provided guided values for the percentage of proliferative CD4+ T cells ranging from 15% to 50%, the percentage of IL10+ CD19+ B cells ranging from 20% to 65%, and the percentage of IL10+ CD4+ T cells ranging from 3% to 25%.On the other hand, guided values for pathogen load (Canine Leishmaniasis) were provided by Esch et al. [2], where the number of parasites/10 6 PBMCs greater than 100 corresponds to a subject in a chronic state.If we replicate the random walk process for several subjects, we obtain the temporal trends in Figure 2, where the red line represents the mean pathogen load over time.In this setting, we assume that all data was observed, but in real studies, this is usually not the case.

Bayesian Model
Now that we have discussed the initial stages of the exploratory work, which was the simulation of the data set, we can now define the Bayesian model that is being proposed.We now assume that the generating process of the simulated data is not known, in particular, we are assuming that the nonlinear functions that were used in simulating the data are not known as well.The Bayesian model is defined as in Equation 12.In this specification, g A (A = {P, I, R}) represents a basis-spline expansion over the range of the latent variables, and C ij (optionally) denotes a relevant individual specific covariate vector.We also include interactions between the operands (Int).The terms collected into X 1 ij and X 2 ij form the overall design vectors for each component for subject i at time j.This highly flexible specification is designed to allow us to learn previously unknown nonlinear effects of the current state on future developments, including effects like plateaus, polynomial-like relationships, and possible interactions.   , σ 2 p ) where :

Posterior Results and Diagnostics
We fit the model using the rjags package [5] in R [6,7].For the purpose of this simulation, we considered 25 hosts and 10 points in time.Three chains of 11,000 iterations with 1,000 as a burn-in period were used, for a total of 10,000 on each chain.Independent standard normal distributions were considered for the beta coefficients as well as for the between and within host variability.The gamma distribution is used for the precision parameter since the normal distribution in rjags is defined in terms of precision, which is the inverse of the variance.Posterior results of selected parameters are shown in Table 1, which are related to the pathogen component of the model.Notice here that all posterior means are positive, which indicates that the corresponding variable produces an increasing effect on the pathogen load in a future state.The column of the potential scale reduction factor, R, tells us that selected parameters reached a level of convergence.The posterior density and trace plots are shown in Figure 3.
Parameter  In Figure 4, we present the (simulated) observed pathogen load trajectories with the posterior predicted pathogen load and corresponding 95% credible bands.From this plot, we observe how close the predicted data is to the observed curve.We estimated that the model explains 87% variability of the pathogen component, indicating good performance of the model.

Discussion
In the main article, we expand this model specification, and we repeat the analysis by using real longitudinal data, including clinical and molecular diagnostic data of a cohort of dogs that are naturally exposed to L. infantum.In addition, we incorporate individual-specific covariates such as age, clinical conditions such as indicators for co-infection, and treatment assignment.In addition to characterizing evidence for the processes driving disease progression, we predict individual and aggregate patterns of CanL progression, by using posterior predictive trajectories based on posterior distributions of the model parameters.

Figure 1 :
Figure 1: Random Walk.Pathogen load trajectory for a single individual where disease progression is classified based on transition rules as defined in Subsection 2.2.2.

Figure 2 :
Figure 2: Random Walk Replication.Pathogen load trajectories for 50 individuals where disease progression is classified based on transition rules defined in Subsection 2.2.2.The red line indicates the group means over time.

Figure 3 :
Figure 3: Posterior Plots.Posterior density and trace plots for the selected parameters associated with pathogen load.

Figure 4 :
Figure 4: Estimation.Estimated pathogen load for a particular subject.
is in healthy state (H) if pathogen load (P ) is less than 10 parasites/10 6 PBMCs.
Remain in H if P < 10 2. A subject transitions to inflammatory state (I) if pathogen load (P ) is greater or equals than 10 parasites/10 6 PBMCs, but less than 100 parasites/10 6 PBMCs.Transition from H to I if 10 ≤ P < 100 3. A subject transition to regulatory state (R) if pathogen load (P ) is greater or equals than 100 parasites/10 6 PBMCs.Transition from I to R if P ≥ 100

Table 1 :
Results.Posterior summary of the 3 MCMC chains for some selected model parameters.In this summary, posterior means, standard deviation, 95% credible interval are provided, and measures for convergence diagnostic are provided.Selected parameters are associated with pathogen load.