Qualifying drug dosing regimens in pediatrics using Gaussian processes

Drug development commonly studies an adult population first and then the pediatric population. The knowledge from the adult population is taken advantage of for the design of the pediatric trials. Adjusted drug doses for these are often derived from adult pharmacokinetic (PK) models which are extrapolated to patients with smaller body size. This extrapolation is based on scaling physiologic model parameters with a body size measure accounting for organ size differences. The inherent assumption is that children are merely small adults. However, children can be subject to additional effects such as organ maturation. These effects are not present in the adult population and are possibly overlooked at the design stage of the pediatric trials. It is thus crucial to qualify the extrapolation assumptions once the pediatric trial data are available. In this work, we propose a model based on a non‐parametric regression method called Gaussian process (GP) to detect deviations from the made extrapolation assumptions. We introduce the theoretical background of this model and compare its performance to a parametric expansion of the adult model. The comparison includes simulations and a clinical study data example. The results show that the GP approach can reliably detect maturation trends from sparse pediatric data.

• We compare the proposed non-parametric method to a parametric maturation function approach. The comparison is done in a simulation study and a real data example. The simulation study consists of four simulation studies with realistic and sparse data designs. Different simulated maturation trends are explored (no maturation, relevant maturation, and bi-phasic maturation). At last, a case study from a real pediatric study is evaluated.
This article is structured as follows. Section 2 describes pharmacokinetic modeling as used for sparse pediatric data and connects it to drug responses and dosing. Section 3 describes the proposed approach for modeling maturation effects. Section 4 shows the simulated case studies that show the advantages and limitations of the proposed approach. In Section 5, we compare the proposed approach to existing parametric approaches on real pediatric data. The article concludes with a discussion in Section 6.

PHARMACOKINETIC MODELING IN SPARSE DATA CONTEXTS
Pharmacokinetic (PK) models are nonlinear mixed-effect models that aim to describe the ADME processes of a drug. 14 The aim is to predict the concentration of a given drug in the blood as a function of time for an individual patient. The prediction is based on the dosing history and details of the patient, like body weight. As each patient differs from one another, most of the parameters in a PK model are specific to the patient and vary around typical parameter values. Thus, most of the parameters are modeled in a hierarchical approach, where each patient have their own set of parameter values, which are modeled as normally distributed random effects centered at the typical population estimates. The typical population estimates are modeled as a function of the patient's individual traits such as body weight. PK models are trained with data that consist of patients, their dosing history and drug concentration measurements in blood. Kallen 14 provides a good introduction on the theory and derivations of the most used PK models and their efficient computation. This work concentrates on sparse data designs as they are common in pediatric trials. Pediatric trials usually have a very limited sampling design of the blood concentrations. This leads to restrictions as to what can be inferred from the data. Due practical issues, blood concentrations are usually only measured just before the next dose is given. In this case, it is only possible to learn how patients reach and stay at the steady-state conditions when subject to continuous and regular dosing administration schemes. As no detailed information on the absorption process is measured by such a design, it is possible to only consider a very simple one-compartment model. In this work, we concentrate on a 1st order absorption process corresponding to an oral route of administration. Physiologically this model assumes that the drug is administrated through mouth, from which it transfers through stomach to the central blood circulation and is then eliminated. Assuming that patient i ∈ [1, … , N] has a dosing history {D j , t j } d (i) j=1 for doses D j at times t j , this model can be described with the following first-order process: , a (i) 1 (0) = 0, and a (i) where a (i) 1 (t) is the mass (mg) of the drug in absorption compartment (stomach), a (i) 2 (t) is the mass (mg) of the drug in central compartment (blood circulation), k (i) a is the absorption rate constant (1/h), Cl (i) is the clearance (l/h), and V (i) is the apparent volume of blood (l). This system of differential equations has an analytic solution for an arbitrary dosing regimen. 14 Some drugs violate the assumption of a first-order absorption process. One such example is Everolimus, which is used in the real clinical data example of Section 5. To avoid the use of more complex models, which lack an analytic solution, an alternative approach is to use a relative bioavailability. Given a dose D j , relative bioavailability is modeled as From modeling perspective, relative bioavailability scales the administered doses and non-zero f is used to add non-linearity to the absorption process.
The dynamic system in Equation (1) contains patient dependent clearance, apparent volume and absorption parameters. The clearance for an individual i is modeled as a nonlinear mixed-effect model log(Cl (i) ) = log ( Cl ref ) + log ( A Cl (weight (i) ) ) + log ( M Cl (age (i) ) ) where the reference clearance Cl ref is the population average value, the allometric scaling function A Cl (weight (i) ) depends on body weight, the maturation function M Cl (age (i) ) varies with age, and random effect Cl accounts for the heterogeneity between individuals not explained by other factors. The age used in this manuscript always refers to the post-menstrual age, which on average is 42 weeks at birth. The apparent volume of an individual is modeled with a similar function, although in this work we do not use the maturation function, since we are mainly interested in the clearance which is relevant for characterizing steady-state conditions. The absorption rate also varies between the individuals, but the between individual heterogeneity is not identifiable from a common pediatric study design. Because of this, this work models the absorption rate only using the reference value, or population average. The described 1-compartment oral PK model is illustrated in Figure 1.
In pediatric drug trials, the drugs are usually administrated at a fixed frequency and with a constant dose. This means that the doses are given at equidistant times, at interval , such that a steady-state is attained after a number of doses. The steady-state is characterized formally as a 2 (t + ) = a 2 (t) for any t after the steady-state has been achieved. The average concentration at steady-state is fully determined by the clearance (and relative bioavailability) of the drug. Because of this, we wish to learn the values of the clearance as accurately as possible. As the emphasis of this article is in modeling the maturation effect, it is important to understand how the clearance differs between the pediatric and the adult populations. The potential deviation observed in the clearance parameters will be considered as a maturation effect. Should a significant maturation effect be detected from the pediatric data, the pediatric dosing regimen may need to be adjusted by using the detected maturation effect as a scaling factor for the dose.
The parameters of a PK model are learned from data that are obtained from clinical trials. These data consist of details of patients, their dosing histories and drug concentration measurements. Since the measurements y (i) j are drug concentrations c (i) (t j ) rather than drug amounts, the solution to Equation (1) must be rewritten in terms of the drug concentration c (i) (t j ). This can be done by using apparent volume V (i) as a divider. The error of a concentrations measurement y (i) j is modeled on the log scale as, where log y (i) j is the noisy log concentration measurement of patient i at time-point t j , c (i) (t j ) is the corresponding concentration, 2 c is the constant part of the noise, and 2 p is the noise term proportional to the predicted concentration. In this article, we model the problem using the Bayesian framework. Weakly informative priors are applied for each parameter when fitting the adult dataset and the resulting (adult) posterior distributions are used as informative priors for the pediatric dataset. The details of these are presented with the respective examples below.

MODELING MATURATION WITH GAUSSIAN PROCESSES
In this section, we first present how a GP can be used to model a maturation function before showing how prior assumptions about the maturation function can be incorporated into a GP. Finally, we describe how the performances of different models will be compared.

F I G U R E 1
One-compartment model with oral administration and the first order process that describes the dynamics of the process

Model specification
GPs are a powerful non-parametric regression method dating back to the 1940s 15 and were first used in geostatistics, 16 where they are better known as Kriging. The modern Bayesian interpretation of the GP framework was proposed in the 1970s. 17,18 Mathematically, GPs are a class of stochastic processes which define prior distributions over unobservable latent functions. GPs provide a flexible framework for solving regression and classification problems and they have become a popular go-to-method in many disciplines. A comprehensive description of GPs is available from Rasmussen and Williams, 13 which provides a good introduction for the interested reader. Key advantages of Bayesian GPs relevant to the analysis at pediatric trials include: (a) they avoid bias due to their non-parametric formulation, (b) they allow for uncertainty in their learned functional shape, and most importantly, (c) they enable one to include domain knowledge about the target function. To our knowledge, GPs have not yet been used in the context of pediatric PK models.
We use GPs to model the maturation function and the inter-patient variation log ( M Cl (age) ) + Cl of Equation (3). To do so, let us assume N 1 estimates of the maturation function {age (i) , log ( M Cl (age (i) ) ) . In order to simplify the notation, let g (i) ∶= log ( M Cl (age (i) ) ) ) denote the logarithm of the maturation function (which cannot be observed directly) and letm (i) CL = g(age (i) ) + (i) Cl denote the corresponding estimates of these individual logarithmic maturation function values. We use a GP to combine our prior knowledge about g with estimatesm CL which are measured with error. A GP is used to define a probability distribution for the logarithmic maturation function values which depend on age. Mathematically, a function follows a GP distribution if the values of that function follow a multivariate Gaussian distribution across any finite set of points. The covariance matrix of this multivariate Gaussian distribution is chosen to represent what is known about the form of the function being approximated. In our case, a GP prior is specified for the logarithmic maturation function with prior assumptions encoded in the covariance function k(age (1) , age (2) ), which specifies the covariance of two logarithmic maturation function values g(age (1) ) and g(age (2) ). A zero mean Gaussian process prior is chosen, where K is the covariance matrix between N logarithmic maturation values g = [ g (1) , … , g (N) ] T at A = [ age (1) , … , age (N) ] T such that K ij = cov ( g (i) , g (j) ) = k(age (i) , age (j) ). To connect the estimatesm CL = We use the Bayes' theorem to derive the predictive distribution for N * logarithmic maturation function values ] T for previously unseen patients with ages A * = [ age (1) * , … , age (N * ) * ] T which under the assumptions described above, is given by where Here the covariance between the latent values at the training and test inputs is cov ( g, g * ) = K * and the covariance matrix of the latent values at the test inputs is cov ( g * , g * ) = K * * . As all distributions in Equation (6) are Gaussian, the predictive distribution has an analytic form and thus the predictive distribution of the logarithmic maturation function can be written explicitly for any age. Each element of the covariance matrices K * and K ** is computed with a positive definite covariance function, which is also known as kernel function. As the chosen kernel function encodes the assumed covariance between the latent maturation function values, it is important to choose this so that it is aligned with prior assumptions. Different kernels can be used to capture different prior assumptions about the function smoothness or shape. See Duvenaud 19 for the most commonly used kernels. The kernel we will use is the product of a linear and a squared exponential kernel: The squared exponential kernel determines the overall smoothness, that is, how much the logarithmic maturation function is allowed to change with the age. The linear kernel acts as a multiplier: it allows the maturation function to change more quickly with age for younger patients, which may be appropriate because we assume younger patients to mature at a faster rate than older patients. Within the linear kernel b determines the covariance of at input c, while v defines the overall variability of the linear part. For the squared exponential kernel, defines the average distance of the function from the mean and length-scale defines how smooth the function is. To make the product of these two kernels identifiable, either v ∕ b or has to be fixed.

Including domain knowledge in the Gaussian process model to handle sparsity
As pediatric trials generate typically very sparse data, relevant leveraging prior information can improve model fitting. Even though the exact parametric form of the logarithmic maturation function is unknown, prior physiological knowledge can be incorporated into the GP model. The logarithmic maturation effect, log , should be 0 for mature children. This is because once pediatric patients reach a certain age, they will have fully working organs just like adults. Moreover, it is reasonable to assume that the maturation function is smooth when transitioning to the adult model. This can be guaranteed by assuming that the derivative of the maturation function is 0 for mature children. The term "mature child" is with depend on the drug under investigation, since the maturation curves of different drugs can be very different. It is also reasonable to assume that the younger the pediatric patient is, the greater the maturation effect is such that the logarithmic maturation function will take smaller values. Consequently, the logarithmic maturation function can be assumed to increase monotonically. If a function is monotonically increasing within interval, its derivative values are greater than or equal to zero at all points within the interval.
These assumptions can be incorporated into the GP model by encoding them as "hypothetical data." This approach is also referred to as "virtual observations" in the GP literature. 20 For instance, to guarantee, guaranteeing continuity between an adult regime where no maturation occurs and a regime where maturation can occur, we can add an unobserved logarithmic maturation function value of 0 for an age that corresponds to a mature patient (eg, 18 years of age). Ensuring constraints on the functional shape of the logarithmic function, such as monotonicity, is also possible by adding unobserved derivative observations of the logarithmic maturation function to the GP. Since differentiation is a linear operator, the partial derivative of a (mean-square differentiable) GP remains a Gaussian process. 21 As covariance functions are bilinear mappings with respect to their derivative, cov ( g (i) age (i) , the covariance matrices in Equations (4) and (7) can be extended to include partial derivatives either as observations or as values to be predicted. In addition to the covariance function, to completely define the unobserved data we needs an observation model to connect the assumption to the latent maturation function. For unobserved data on the function itself, it is natural to stripulate a normal likelihood similar to the one assumed in Equation (5) for the actual observations. However, for the monotonicity information, the connection is not as straightforward. Following Riihimäki and Vehtari, 20 we denote bẏ g s ∈ {−1, 1} the partial derivative sign value of g at age "age." Then, the probability of observing a positive or a negative derivative value is assumed to follow a probit likelihood with a control parameter , In practice, the probit likelihood, Φ(⋅), is often used in classification tasks and results in high likelihoods for latent derivatives of same sign. The control parameter can be used to adjust the steepness of the likelihood function which helps to ensure robustness to erroneous monotonicity information. To encode monotonicity, virtual derivative observations have to be placed in a dense enough grid to force the function to be monotonic across some interval. 20 Just how many are needed will depend on the assumed length scale of the latent logarithmic maturation function.
The unobserved data representing prior information are added as additional terms to the GP model likelihood as follows. Let us assume that we have N log maturation observationsm Cl at A = [ age (1) , … , age (N) ] T , N v virtual log maturation observationsm v at A v , N d virtual log derivative observationsġ at A d and N s virtual log derivative sign observationsġ s at A s . Furthermore, assume normal likelihoods with variances 2 Cl , 2 v , 2 d for estimates, hypothetical estimates, and observations on derivative values, respectively, and a probit likelihood for the derivative sign observations. Then the posterior predictive distribution p(g * | A * , A + ,m Cl,+ ) of the latent values g * at ages A * is given by with normalizing constant Z, is similar to Equation (7), but the cross covariances between the latent function and the latent function derivatives (eg, cov can be computed using Equation (9) and the cross covariance between two latent function derivatives (eg, cov ) can be computed with Equation (10). The partial derivatives of the covariance function needed for this can be computed using the product rule. The needed partial derivatives for the squared exponential can be found from Riihimäki and Vehtari, 20 but the derivatives of the linear kernel are omitted for simplicity. Since the likelihood of the derivative sign observations (Equation (11)) is not Gaussian, the posterior (Equation (12)) is not analytically tractable, but can be approximated using random draws. As the posterior distribution of the PK model is intractable, the fact that the logarithmic maturation function is intractable does not increase the computational burden of fitting the model.
In addition to these virtual observations, we also know that younger patients mature faster than older patients. This makes it natural to use the logarithm of age as an input for GPs. As logarithmic function grows faster for smaller values, we can use the logarithm to stretch the relative differences between young patients. This way the GP model can have relatively larger variability for younger patients. In practice, we replace A in Equation (4) by This allows for larger changes in maturation for smaller kids without making the function too non-smooth for old adolescents. Since the logarithm of age is used, ages close to birth (or premature births) would cause numerical issues. Instead we use the logarithm of the postmenstrual age (pma), which is the time from the first day of the last menstrual period the mother had before giving birth. This can be approximated by adding 0.7 years to the patients age.

Model comparison
To quantitatively compare the proposed non-parametric model with other popular parametric alternatives, we score models based on their ability to predict data from previously unseen patients. We use the fully Bayesian measure of leave-one-patient-out expected log predictive density (loo-elpd). This measure captures the predictive power of the model as it reflects how well the model is able to predict the drug concentration of unseen pediatric patients. This choice deviates from the oftentimes used Bayesian information criteria (BIC) 22 for model selection when using frequentist inference.
The reasons is that BIC relies on point estimates and as such does not fully account for the uncertainty represented by the posterior distribution. Nonetheless, BIC targets the approximation of the Bayes factor which is also the case for loo-elpd. Hence both metrics target the same quantity of interest, but in a fully Bayesian framework the loo-elpd is expected to more accurately approximate the Bayes factor. For details on leave-one-patient-out cross validation, please refer to Vehtari et al. 23 The loo-elpd measure is defined as where p(Θ | y −i ) is the posterior distribution of the model parameters Θ given measurements of all other patients but i and p(y i | Θ) is the posterior predictive distribution of an unseen patient i. Another way to compare models is to see how strongly individual clearance estimates (i) Cl are shrunk toward the population mean. This can be done by computing a pooling factor 24 which is the Bayesian equivalent of the shrinkage estimate for Frequentist models. 25,26 The key difference between a pooling factor and the shrinkage estimate is that the former is based on the entire posterior distribution while the latter uses empirical point estimates of the random effects. The pooling factor varies between 0 and 1 and is defined as Values close to 1 indicate that the individual estimates are shrunk toward the population mean, or that the population mean explains a lot of the variability of the individual values. In contrast, values close to 0 indicate that the individual estimates are further away from the population mean. In the context of maturation models, if the maturation model is wrong and the pooling factor is high, the estimated individual estimates will be biased. On the other hand, if the pooling factor is high, this can also be an indication of overlearning for the non-parametric models like GPs.

SIMULATION STUDIES
In this section, we present the results of four simulation studies conducted to compare the performance of the proposed approach with and without additional prior information about monotonicity with that of the most commonly used parametric modeling approaches. First we give implementation details, before going through three different simulated data scenarios which were each represented by 50 random dataset. Finally, we show how the GP model behaves under different levels of data sparsity.

Implementation details
The purpose of the simulation studies is to mimic a realistic pediatric trial design when the adult PK model is accurately known. The scenario is that the pediatric trial is planned under the assumption that the adult PK model is correct also for the pediatric patients. This means that maturation effects are assumed to be absent in the pediatric population. Hence, the dosing regimen of the pediatric trial is planned with dosing adjusted only for the size of the patients (with allometric scaling), but without taking the age of the patients into account (with maturation effects). In contrast, the pediatric data are simulated from three different models which deviate from this assumption. By analyzing the simulated pediatric data, we aim to understand how well the proposed method works in realistic scenarios where the ground truth, the model used to generate the pediatric data, is known. In all three cases, the data are generated using a one-compartment model with oral administration. Allometric scaling which is a popular choice for the maturation model (henceforth referred to as the Hill model). The second model is a GP, which is implemented as described in Section 3 without including any virtual derivative sign observations to ensure monotonicity ( subsequently referred to as the naive GP model). The third model is a GP with 8 virtual derivative sign observations placed log linearly between 0 and 3 years, and one virtual observation for maturation and its derivative for a 5-year-old child (labeled as the GP model from this on). The parameter values of all three models (Hill model, naive GP model and GP model) are described in Table 2. Priors for the rest of the model parameters can be found from on Table 1. In all simulation studies, the data are generated with 20 patients, with linearly spaced ages between 0 and 5 years, and weights taken from a population curve without additional noise. This choice for ages mimics an idealized situation, where the pediatric data are evenly spread across the pediatric age range of the drug of interest. Each patient is treated with a dose regimen tailored to their body weight. The dosing regimen for a 75 kg reference patient includes first a loading dose of 1500 mg followed by 10 more doses administered daily every 24 hours after the previous dose. Measurements are always taken before a new dose is given, excluding the first dose. Each patients dose is determined using allometric scaling based on body weight with the same scaling factor as is used for the clearance parameter (dose i = dose ref 3 4 ). The simulation models and data scenario are chosen so as to be comparable to real clinical data situations. As the simulation model includes random parameters, and as the drug concentration measurements are assumed to be stochastic, we simulate 50 trials for each scenario, to better take into account the aleatoric uncertainty. Simulations are performed using Stan 2.18.2 28 and ggplot2 29 is used to visualize the results. The source code is build from native R and Stan and is not dependent on any other population PK software. The source code is openly available at https://github.com/esiivola/admegp.

Hill maturation
In the first example, data are generated from a Hill type maturation model (Equation (16)). Parameters are equal to the values reported for immunosuppressant Sirolimus (h = 2.94, MA50 = 53 52 ), 27 a drug used after transplantation in order to avoid organ rejection. In this scenario, using the Hill model for the maturation function means there is no model misspecification. Figure 2 illustrates the exponent of the maturation function (M Cl (⋅)) for clearance as estimated by the Hill and GP models. Note that unlike in Equation (3), results are not plotted on the logarithmic scale so that the figure shows the factor (between 0 and 1) by which the adult clearance (and doses) must be multiplied in order to obtain the pediatric clearance (and doses). The figure shows that all models manage to capture the true maturation function well. The Hill model results in a narrower uncertainty estimate for the function than either of the GP models and manages to model the individual maturation parameters without large biases. The widths of the posterior predictive intervals for the GP models better reflect the sparsity of the data and also increase for younger patients. The tendency of the GP model estimates to bend toward the model prior (exp(0)) can be seen for very young patients where the data are also sparse. This trend is more obvious for the naive GP model. Table 3 summarizes the means and standard deviations of the loo-elpd differences between the three different models and the baseline (adult model) computed from the results of 50 different dataset initializations. Since the data are sparse, the random initialization affects the results substantially and the standard deviation across all realizations is large. To be able to compare different models, the table also summarizes their ranks for the same data realization. The table shows that the Hill model (labeled as parametric model) outperforms all other models when the data are simulated according to the Hill model. Table 4 summarizes the pooling factors estimated by the different models for the Hill data averaged over 50 random dataset realizations. The results for the Hill model and both the GP models are similar in this simulation study.

No maturation
In the second example, the data are generated without a maturation effect, that is, log(M(age)) = 0. In other words, the data are simulated from the adult model. This maturation function means the Hill model is misspecified. Figure 3 shows the maturation function detected by the Hill and GP models. All models manage to model the maturation function reasonably well. The Hill model results in small values for MA50 and large values for h so that the nonzero maturation can not only be detected for very young patients. The GP model learns very small values for the lengthscale so that the derivative is slightly negative where the virtual derivative observations are, but the detected maturation still stays reasonably close to zero. The naive GP model results in a smoother approximation to the maturation function due to larger lengthscale values, but it still overlearns. Table 3 summarizes the means and standard deviations of the loo-elpd differences between different models and the baseline (adult model) computed from the results of 50 different dataset realizations. The table shows that the model without maturation (labeled as the adult model) outperforms the other models, but both of the GP models perform slightly better than the Hill model. This result is to be expected, as the adult model is the model used to generate the TA B L E 3 Model comparison for three simulated datasets (with known maturation function) and for Everolimus data (with unknown maturation function) For the simulated datasets, the table also shows the mean ranks (and standard deviations) of the loo-elpd values (second row). Ranks are computed so that the highest loo-elpd gets rank 1 and lowest gets rank 4. In addition to this, the probability of each model being the best is reported on the third row by computing the proportion of times the model has the highest loo-elpd value for each dataset. The best model for each experiment is on bold. data. Table 4 summarizes the pooling factors of the different models for 50 different datasets. Again, the results for the Hill model and GP models are similar.

Double Hill maturation
In the third example, the data are generated from a double Hill type maturation model: with p 1 = 0.5, h 1 = 0.7, MA50 1 = 6.0, h 2 = 3.5, and MA50 2 = 7.0. Figure 4 shows the maturation function detected by the Hill and GP models. In this case, the Hill model is unable to capture the true function and the resulting estimates are overconfident, especially for kids of ages 1 to 3 years. Both GP models are able to capture the true maturation function more accurately and unlike for the Hill model, the widths of the posterior credible intervals are not unduly narrow for ages 1 to 2.5 years. Table 3 summarizes the means and standard deviations of the loo-elpd differences between different models and the baseline (adult model) computed from the results of 50 different dataset realizations. The table shows that the GP model outperforms naive GP and Hill models as it is more flexible than the Hill model but more constrained (monotonic) than the naive GP. Table 4 summarizes the means and standard deviations of the pooling factors estimated by the different models for the double Hill maturation model averaged over 50 random dataset realizations. The pooling factors for the GP models are higher than for the Hill model, which indicates that the Hill model is not flexible enough to fit the data. Figure 4 shows that the Hill model does not fit the data well, which can also been seen in the small value of the pooling factor.

Effects of sparsity on the model performance
On average, the more patients and measurements there are, the better the estimates for the maturation are. But how does data sparsity affect the results? We repeat the simulations described above, still using the Hill type maturation model (see Section 4.2) but this time varying the number of patients and the number of observations per patient. We then compute the average estimation error as whereM(age) is median of the estimated maturation and {age (i) * } N i=1 are 100 ages selected linearly between 0 and 5 years. Figure 5 visualizes the average error for 0-to-5-year-old patients for the Hill and the GP models as a function of the number of patients and concentration measurements per patient. The results are averaged over 50 dataset realizations for each plotted data point. Other parameters and prior distributions of the model stay unchanged. The Hill model has smaller average errors than the GP models, as this is the true data generating model.
The figure shows that the average error drops as the number of patients increases. However, there is no clear trend when the number of measurements per patient increases. The figure also shows that in the case of a fixed budget of concentration measurements, within the scope of the grid, it is better to select more patients with fewer measurements per patient rather than fewer patients with more measurements. Certainly these findings are specific to the chosen scenario for which the within-patient error is relatively small as compared to the between patient heterogeneity. While this is a common real data situation, the results may differ in other circumstances. A consequence of this is that only few measurements per patient are needed to accurately estimate the individual maturation function values.

ANALYSIS OF PEDIATRIC CLINICAL DATA OF EVEROLIMUS
Everolimus is an immunosuppressant that is used to prevent the rejection of organ transplants. As an example to demonstrate our approach, we re-analyze a pediatric trial investigating Everolimus used in liver transplantation. The pediatric trial and the corresponding adult study have been modeled with a one-compartment model with oral administration and dose-dependent bioavailability (see Section 4.1). Kovarik et al 30 presented the PK model for adults and Dumortier and Looby 31 presented the study design for the pediatric trial. No allometric scaling has been used in the adult PK model, but for the pediatric trial data, the clearance has been modeled using the weight dependent function This modeling approach for the pediatric trial data empirically accounts for size differences that were not accounted for in the adult model. Therefore, we consider this term as a deviation from the adult model and treat it as a maturation function. The difference of this maturation function to the ones considered earlier is that it is a function of body weight instead of age. The pediatric dataset consists of 21 patients with 6.2 concentration observations on average per patient. The model parameters and their informative prior distributions learned from the adult data can be found in Table 5. Prior distributions of the model parameters are listed in Table 6.
A standard approach to assess if the pediatric data are consistent with the adult model is to perform a visual predictive check (VPC). This entails using the adult model to predict conditional on the pediatric trial data (baseline covariates and dosing history) the data as collected in the pediatric trial.
If the pediatric data are aligned with the predictions from the adult model, then this is interpreted as confirmation that the adult model is applicable and correctly specified for the pediatric population as well. Figure 6 illustrates the predictions generated with the adult PK model for the patients of the real pediatric data and the actual measurements of those patients. This visual predictive check (VPC) compares at the the predicted lower 10%, median, and 90% quantile with the quantiles of the actual data. The figure shows that the median of the data appears to be adequately predicted by the model, while the outer quantiles are difficult to conclude on. The figure also illustrates the limitations of the VPC in the context of model checking for pediatric data. First, in realistic data, some patients have a longer record of measurements and the data become more sparse the more time goes on. This can be seen in the figure as the visualized bins become wider on the right even though they have an equal amount of data. This also distracts the reader from the illustration as points on the right get more visual weight. Second, realistic pediatric data are often very sparse in the number of patients and number of measurements per patient. Sparse data limit the number of bins as a small number of data points in each bin makes estimating the percentiles rather inaccurate, especially if they are far from the median (eg, see the 10% and 90% percentiles in the figure). Furthermore, different age ranges can differ vastly in their baseline covariates, such as  body size, leading to generally wider prediction bands (colored bands in the figure) when comparing to VPC plots that are generated for adult data. Splitting the pediatric dataset into smaller age ranges amplifies the sparsity problem even more. These factors make reading the VPC plot harder and concluding whether or not the model is correct becomes more prone to subjectivity. A VPC is still a valuable tool since it allows to evaluate the model on its original scale, which is often helpful. However, very sparse data can limit its usefulness. Hence it should only be used together with other analysis methods to conclude if the model is correct. Figure 7 illustrates the maturation function for clearance detected by the GP models and the parametric maturation model (Equation (18)). To make the parametric and GP methods comparable, the weights in the parametric model are transferred to ages with a standard population curve. The figure shows that the GP model is very close to value 1 for the whole age range. This results suggests only very moderate or no maturation effect. Table 3 summarizes the loo-elpd differences between the different models and the baseline (adult model). The table shows that there is not much proof in the data that any maturation function would be needed as the loo-elpd values are very close to each other. The adult model slightly outperforms other models, but the differences are small. Table 4 summarizes the pooling factors of the individual clearance estimates for the different models. The table shows little difference between the models, but the pooling factor for the naive GP model is higher than for the rest, which may indicate overfitting of the naive GP to the sparse data. This suggests that the naive GP model also learns variation between individuals and not only the maturation function.
To summarize, the proposed GP approach does not detect any noticeable deviation from the adult model. No maturation function or any other allometric effect is apparent from the Everolimus pediatric trial data. The clinical data also show a clear shortcoming of the frequently used parametric approach chosen here, which is the implied discontinuity.
Recall that the adult model holds for subjects with age beyond 18 years. At the boundary age of 18 years, the two different models would imply a discontinuous change in the apparent clearance. This discontinuity is avoided with the GP modeling approach which is anchored at the age of 18 years to have no deviation from the adult model.

SUMMARY AND DISCUSSIONS
The choice of the dosing regimen for pediatric trials is in most cases based on extrapolation of an adult PK model in the absence of actual pediatric trial data. Once the pediatric trial data becomes available it is essential to confirm the appropriateness of the dosing regimen used in the trial. In this article, we proposed the use of a constrained non-parametric GP to detect deviations from the original extrapolation model. The GP model approach is a data-driven approach that is used to estimate possible model deviations. We evaluated our approach in four simulation studies using realistically sparse datasets. The evaluation showed that our approach is capable of correctly (not) detecting an unexpected maturation function if (not) present. The results are summarized in Table 3. In addition to this, we were able to reproduce similar results and performance on real pediatric clinical data when comparing to the results of the parametric model published earlier.
Extreme data sparsity is a problem to non-parametric approaches like GPs. By using virtual data to add prior assumptions like monotonicity to the model, sparsity becomes less of a problem. GPs were shown to behave unsteadily if the data and prior assumptions disagree. In this case, the virtual observations introduce extra bias to the model, but not as much as a non-realistic parametric model. If the prior assumptions agree with the data, virtual observations were shown to solve the problems caused by the sparsity.
A key advantage of a non-parametric GP approach is the lower risk of model misspecification when comparing to the parametric approaches. The GP learns the functional relationship of maturation and age only locally in ranges supported by the data. In contrast, a parametric approach will be informed from data in a local age range, but result in a comparatively confident global relationship across large age ranges. Thus, the use of a GP protects against over-confident conclusions that are far outside the domain of the data. For regions close to the domain of collected data or sparsely sampled data regions, the GP will infer a smooth functional relationship and preserve the continuous differentiability of the maturation function. Thus, discontinuous changes in the maturation function are avoided.
In summary, we have shown that a non-parametric GP approach can reliably detect whether dosing regimen adjustments are needed or if the chosen dosing regimen is adequate. In comparison to model checking approaches, which rely on possibly subjective evaluation of model diagnostics, the proposed GP approach is data-driven and thus less subjective. In addition to this, GPs avoid bias and overconfident conclusions when compared to parametric approaches. Overall, the presented GP approach can lead to better decisions in pediatric drug development.