Truncated two‐parameter Poisson–Dirichlet approximation for Pitman–Yor process hierarchical models

In this paper, we construct an approximation to the Pitman–Yor process by truncating its two-parameter Poisson–Dirichlet representation. The truncation is based on a decreasing sequence of random weights and therefore has a lower approximation error than the popular truncated stick-breaking process. We develop an exact simulation algorithm to sample from the approximating process and provide an alternative MCMC algorithm for the parameter regime where exact simulation becomes slow. The effectiveness of the simulation algorithms is demonstrated by estimating functionals of a Pitman–Yor process. We then adapt the approximating process into a Pitman–Yor process mixture model and devise a blocked Gibbs sampler for posterior inference.


INTRODUCTION
The Pitman-Yor process is a rich and flexible class of random probability measures that has been widely used in Bayesian nonparametric statistics. Research on the Pitman-Yor process was initiated by Pitman and Yor (1992), where it was used to study the ranked lengths of excursions of a Markov process. Later on, Perman et al. (1992) demonstrated the connection between the random weights of a Pitman-Yor process and the normalized jumps of a stable process. Pitman (1995) introduced the partially exchangeable probability function and derived a sampling formula for the Pitman-Yor process. Pitman (2006) designed a generative approach for sampling from the Pitman-Yor process, known as the two-parameter Chinese restaurant process. The Pitman-Yor process has found applications in many areas. For example, Favaro et al. (2009) derived a species sampling formula from the Pitman-Yor process and constructed a Bayesian nonparametric methodology for the prediction issues arising in species sampling problems; Jara et al. (2010) introduced a linear-dependent model based on the Pitman-Yor process and used it to analyze a dataset generated by a dental longitudinal study; Carmona et al. (2019) developed a Bayesian nonparametric mixture model based on the Pitman-Yor process prior and studied its posterior inference method. Other applications can be found in bioinformatics and computational biology (Lijoi et al., 2007, 2008), image segmentation (Sudderth & Jordan, 2008), curve estimation (Jara et al., 2010), gene network inference (Ni et al., 2018), and econometrics (Bassetti et al., 2014). The Pitman-Yor process is a random probability measure of the form

P(·) = ∑_{i≥1} p̃_i δ_{K_i}(·),   (1)

where δ_{K_i}(·)
is a point mass at K_i, {K_i}_{i≥1} are independent and identically distributed random variables with the distribution H on a Polish space 𝕏, and {p̃_i}_{i≥1} are random weights such that p̃_i ≥ 0 and ∑_{i≥1} p̃_i = 1, independent of {K_i}_{i≥1}. For simplicity, we will omit the hyperparameters α and θ where possible. The random weights admit the stick-breaking representation

p̃_1 = V_1,  p̃_i = V_i ∏_{j=1}^{i−1} (1 − V_j) for i ≥ 2,  V_i ∼ Beta(1 − α, θ + iα) independently,   (2)

with the discount parameter 0 ≤ α < 1 and the concentration parameter θ > −α. By setting α = 0, we revert to the stick-breaking representation of a Dirichlet process. Throughout this paper, we use the tilde notation to emphasize that the sequence {p̃_i}_{i≥1} is presented in its original order; in other words, it follows Equation (2) without any reordering. Based on the stick-breaking representation, several approximation methods have been proposed for the Pitman-Yor process. Ishwaran and James (2001) introduced the truncated stick-breaking process P̃_N with a fixed truncation level N, where p̃_1, …, p̃_N are defined as in (2), ẽ_N ∶= 1 − p̃_1 − ⋯ − p̃_N so that P̃_N is a valid random probability measure, and K_0 has the distribution H independently. As N → ∞, P̃_N converges to P̃_∞ in total variation distance almost surely (see, e.g., Sec. 4.3.3 of Ghosal & van der Vaart, 2017), hence a finite approximation for the distribution of the Pitman-Yor process is obtained. From the definition, it is clear that the approximation error of this method is characterized by the tail probability ẽ_N. A study of the expectation of the tail probability can be found in Ishwaran and James (2001). On the other hand, Arbel et al. (2019) introduced a random stopping rule which achieves an almost sure control of the approximation error in total variation distance. Specifically, they used the random truncation level N(ε) ∶= min{n ≥ 1 ∶ ẽ_n < ε}.
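As a concrete illustration, the first N stick-breaking weights and the tail probability ẽ_N can be sampled directly from the representation (2). The following is a minimal Python sketch; the function name and interface are ours, not from the paper:

```python
import random

def stick_breaking_weights(alpha, theta, N, rng=random):
    """Draw the first N stick-breaking weights of a PY(alpha, theta) process.

    V_i ~ Beta(1 - alpha, theta + i*alpha);  p~_i = V_i * prod_{j<i} (1 - V_j).
    Returns (weights, tail), where tail = 1 - sum(weights) is e~_N.
    """
    weights, stick = [], 1.0
    for i in range(1, N + 1):
        v = rng.betavariate(1.0 - alpha, theta + i * alpha)
        weights.append(stick * v)   # p~_i = V_i * remaining stick length
        stick *= 1.0 - v            # break off V_i of the remaining stick
    return weights, stick
```

Setting `alpha=0` recovers the Dirichlet-process stick-breaking scheme.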
The ε-Pitman-Yor process generalizes the ε-Dirichlet process proposed by Muliere and Tardella (1998). It follows from the definition that the random stopping rule controls the approximation error according to the total variation bound almost surely (see Prop. 4.20 of Ghosal & van der Vaart, 2017). The construction also guarantees the almost sure convergence of the measurable functionals of the ε-Pitman-Yor process to those of P̃_∞ as ε → 0. A numerical implementation of the functional estimation can be found in Arbel et al. (2019). Finally, Al Labadi and Zarepour (2014) proposed a different approximation method for the Pitman-Yor process based on the decreasing sequences of random weights of a gamma process and a stable process. They also provided a simulation scheme for the approximation.
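The random stopping rule of Arbel et al. (2019) can be sketched in the same style: keep stick-breaking until the tail probability drops below ε. A minimal Python sketch (function name and the `max_iter` safety cap are ours):

```python
import random

def epsilon_truncation(alpha, theta, eps, rng=random, max_iter=100000):
    """Stick-break until the tail probability drops below eps.

    Implements the random truncation level N(eps) = min{n >= 1 : e~_n < eps}.
    Returns (weights, tail) with tail < eps (barring the max_iter cap).
    """
    weights, stick, i = [], 1.0, 0
    while stick >= eps and i < max_iter:
        i += 1
        v = rng.betavariate(1.0 - alpha, theta + i * alpha)
        weights.append(stick * v)
        stick *= 1.0 - v
    return weights, stick
```

The returned truncation level `len(weights)` is random, in contrast to the fixed-N truncation of Ishwaran and James (2001).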
In this paper, we propose a new random probability measure called the truncated two-parameter Poisson-Dirichlet process and use it to approximate the distribution of the Pitman-Yor process. Our construction is based on the ranked random weights of the Pitman-Yor process. Since {K_i}_{i≥1} are i.i.d., it is clear that the Pitman-Yor process is invariant in distribution to a reordering of {p̃_i}_{i≥1}. In particular, let p_1 > p_2 > ⋯ be the ranked values of {p̃_i}_{i≥1}; then the random probability measure

P(·) = ∑_{i≥1} p_i δ_{K_i}(·)   (4)

is identical in distribution to the Pitman-Yor process (1). We call (4) a two-parameter Poisson-Dirichlet process, as the ranked random weights {p_i}_{i≥1} follow the two-parameter Poisson-Dirichlet distribution (see Pitman & Yor, 1997). In fact, the original sequence {p̃_i}_{i≥1} is a size-biased permutation of {p_i}_{i≥1}; see McCloskey (1965) for more details. To derive a finite approximation to the Pitman-Yor process, we truncate the two-parameter Poisson-Dirichlet process at the fixed truncation level N. Then we get the truncated two-parameter Poisson-Dirichlet process

P_N(·) = ∑_{i=1}^N p_i δ_{K_i}(·) + e_N δ_{K_0}(·),

where e_N ∶= ∑_{i=N+1}^∞ p_i. It is easy to see that P_N has a lower approximation error than P̃_N because the tail probability e_N is almost surely no larger than ẽ_N for the same truncation level N. It follows that sup_B |P_N(B) − P(B)| ≤ e_N ≤ ẽ_N almost surely. Thus, we expect the truncated two-parameter Poisson-Dirichlet process to provide a better approximation to the Pitman-Yor process and its functionals. This will be examined by a comparison study based on the truncation error and estimation accuracy.
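The advantage of ranking can be checked numerically on a single sample path: keeping the N largest of the same weights can only reduce the tail mass. A hedged Python sketch (names ours; a long-but-finite stick-breaking run of length `total` stands in for the infinite sequence):

```python
import random

def tail_comparison(alpha, theta, N, total=2000, seed=1):
    """Compare the tail mass left by truncating the weights in stick-breaking
    order (e~_N) versus decreasing, ranked order (e_N) on one sample path.

    Ranked truncation keeps the N largest weights, so e_N <= e~_N always.
    """
    rng = random.Random(seed)
    weights, stick = [], 1.0
    for i in range(1, total + 1):
        v = rng.betavariate(1.0 - alpha, theta + i * alpha)
        weights.append(stick * v)
        stick *= 1.0 - v
    tail_sb = 1.0 - sum(weights[:N])                            # e~_N
    tail_ranked = 1.0 - sum(sorted(weights, reverse=True)[:N])  # e_N
    return tail_ranked, tail_sb
```

On every path the first returned value is bounded by the second, which is the pathwise inequality e_N ≤ ẽ_N used above.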
Our approximation method is similar to the Ferguson-Klass representation of a random probability measure. Recall that a completely random measure (CRM) induced by the rate measure ν(dw) can be written as

Θ(·) = ∑_{i≥1} J_i δ_{K_i}(·),  J_i ∶= ν̄^{−1}(Γ_i),

where ν̄(x) ∶= ν([x, ∞)) is the tail mass of the rate measure, Γ_i ∶= E_1 + ⋯ + E_i, and E_j ∼ Exp(1) are i.i.d. exponential random variables. We can then derive a random probability measure from the CRM by normalization, that is, by considering Θ∕Θ(𝕏). Examples of random probability measures derived from the normalized CRM include the Dirichlet process, the normalized inverse-Gaussian process and the generalized gamma process. Since the sequence {J_i}_{i≥1} is decreasing, the Ferguson-Klass representation is based on a decreasing sequence of random weights. This means the truncated random probability measure Θ_N∕Θ(𝕏), where Θ_N(·) ∶= ∑_{i=1}^N J_i δ_{K_i}(·), has the lowest truncation error compared to other approximations. See Campbell et al. (2019) for further discussion. Although the Pitman-Yor process is not a normalized CRM, its connection to the stable process CRM has been revealed by a change of measure in Pitman and Yor (1997). Thus, we will construct the truncated Ferguson-Klass representation of a stable process and derive the truncated two-parameter Poisson-Dirichlet process from it via the change of measure.
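For a stable rate measure of the form ν(dw) = αw^{−α−1}dw (an assumption matching the stable process used later), the tail mass ν̄(x) = x^{−α} inverts in closed form, so the Ferguson-Klass jumps are available directly. A minimal Python sketch (function name ours):

```python
import random

def ferguson_klass_stable(alpha, N, seed=0):
    """Ferguson-Klass jumps of a stable CRM with Levy measure
    nu(dw) = alpha * w**(-alpha - 1) dw on (0, inf).

    The tail mass is nubar(x) = x**(-alpha), so the i-th ranked jump is
    J_i = nubar^{-1}(Gamma_i) = Gamma_i**(-1/alpha), where Gamma_i is the
    i-th arrival time of a unit-rate Poisson process.
    """
    rng = random.Random(seed)
    jumps, gamma = [], 0.0
    for _ in range(N):
        gamma += rng.expovariate(1.0)        # Gamma_i = E_1 + ... + E_i
        jumps.append(gamma ** (-1.0 / alpha))
    return jumps                              # decreasing by construction
```

Because the arrival times Γ_i are strictly increasing, the jump sequence is strictly decreasing, which is the key property exploited by the truncation.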
A typical application of the finite approximation of the Pitman-Yor process is the posterior inference of the Pitman-Yor process mixture models (PYMM). Consider the mixture model

X_i | Y_i ∼ π(X_i | Y_i),  Y_i | P ∼ P,  P ∼ PY(α, θ; H),   (5)

where X_1, …, X_n represent the observations, assumed independent conditional on the latent variables Y_i; π(X_i | Y_i) denotes the conditional distribution of X_i given Y_i; the latent variables Y_i are i.i.d. with the distribution P; and P is a sample of the Pitman-Yor process prior.
The joint distribution of Y = (Y_1, …, Y_n) is characterized by the predictive distribution of the Pitman-Yor process

P(Y_{i+1} ∈ · | Y_1, …, Y_i) = (θ + α m_i)∕(θ + i) H(·) + ∑_{j=1}^{m_i} (n_j − α)∕(θ + i) δ_{K*_j}(·),   (6)

where m_i is the number of distinct values K*_j observed in the first i draws, and n_j is the number of latent variables taking the value K*_j, so that ∑_{j=1}^{m_i} n_j = i. The predictive distribution implies a Pólya urn scheme for sampling from the posterior of the Pitman-Yor process, which leads to the marginalization method for the posterior inference of PYMM; see Ishwaran and James (2001) for the details. The marginalization method is easy to use, but it suffers from slow mixing of the Markov chain and restricts inference to the latent variables. To avoid these limitations, Ishwaran and James (2001) replaced the Pitman-Yor process prior P̃_∞ in model (5) by a truncated stick-breaking process P̃_N and developed a blocked Gibbs sampler for posterior inference. They showed that the truncation method provides a good approximation to the PYMM. In this paper, we will provide a new approximation to the PYMM using the truncated two-parameter Poisson-Dirichlet process. Based on a lower-error approximation, our approach is expected to have better inferential quality. We will verify this by a comparison study.
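The predictive distribution corresponds to the two-parameter Chinese restaurant process mentioned above: each new draw joins an existing cluster with probability proportional to its size minus α, or opens a new cluster with probability proportional to θ + α times the current number of clusters. A minimal Python sketch (function name ours):

```python
import random

def two_param_crp(n, alpha, theta, seed=5):
    """Sample cluster sizes of a partition of n items from the two-parameter
    Chinese restaurant process: item i+1 joins an existing cluster j with
    probability (n_j - alpha)/(theta + i) and opens a new cluster with
    probability (theta + alpha * m_i)/(theta + i), where m_i is the number
    of occupied clusters."""
    rng = random.Random(seed)
    counts = []                               # n_j for each occupied cluster
    for i in range(n):
        u = rng.random() * (theta + i)
        if u < theta + alpha * len(counts):
            counts.append(1)                  # open a new cluster
        else:
            u -= theta + alpha * len(counts)
            for j, nj in enumerate(counts):
                u -= nj - alpha               # walk through existing clusters
                if u < 0:
                    counts[j] += 1
                    break
            else:                             # guard against float round-off
                counts[-1] += 1
    return counts
```

The two branch probabilities sum to (θ + αm_i) + (i − αm_i) = θ + i, so the scheme is a valid categorical draw at every step.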
The rest of the paper is organized as follows. Section 2 constructs the truncated two-parameter Poisson-Dirichlet process and studies its truncation error. Section 3 designs the simulation algorithms for the truncated two-parameter Poisson-Dirichlet process and uses the algorithms to estimate the functionals of the Pitman-Yor process. Section 4 adapts the truncated two-parameter Poisson-Dirichlet process into the Pitman-Yor process mixture model and develops a posterior inference scheme. Section 5 presents some numerical results. Section 6 summarizes the paper with a discussion of open questions. The supplementary materials contain the derivations of the main results.

CONSTRUCTION AND DISTRIBUTIONAL PROPERTIES
In this section, we construct the truncated two-parameter Poisson-Dirichlet process and study its distributional properties. For a positive integer N, a truncated two-parameter Poisson-Dirichlet process is a random probability measure defined as

P_N(·) ∶= ∑_{i=1}^N p_i δ_{K_i}(·) + e_N δ_{K_0}(·),

where 0 < α < 1, θ > −α, {p_i}_{i≥1} denotes the components of the two-parameter Poisson-Dirichlet distribution PD(α, θ), e_N = ∑_{i=N+1}^∞ p_i, δ_{K_i}(·) is a point mass at K_i, and {K_i}_{i≥0} are i.i.d. random variables with the distribution H. From the definition, it is clear that the construction of the truncated two-parameter Poisson-Dirichlet process hinges on the components of the two-parameter Poisson-Dirichlet distribution. To obtain these components, recall that Pitman and Yor (1997) gave a connection between the PD(α, 0) distribution and the stable process. Consider a stable process ξ_s with the Lévy measure ν(dw) = αw^{−α−1} 1_{{0<w<∞}} dw. The Lévy-Khintchine representation of ξ_s is

E(e^{−λξ_s}) = exp(−s ∫_0^∞ (1 − e^{−λw}) ν(dw)) = exp(−s Γ(1 − α) λ^α).
Let {J_i}_{i≥1} be the ranked jumps of ξ_s on the time interval [0, t], such that J_1 ≥ J_2 ≥ ⋯. Then the normalized jumps (J_1∕ξ_t, J_2∕ξ_t, …) follow the PD(α, 0) distribution. Thus, we get a special case of the two-parameter Poisson-Dirichlet process by taking p_i = J_i∕ξ_t, and a truncation can be made based on this representation. To generalize this method to any θ > −α, recall that Pitman and Yor (1997) provided a change of measure between the components of the PD(α, 0) and PD(α, θ) distributions. Next, we use this method to construct the truncated two-parameter Poisson-Dirichlet process.
To facilitate the construction, we first prepare some preliminary results about the jumps of a stable process. Note that to construct a random probability measure, we do not need the whole time history of the stable process, so any compact interval will suffice, and we pick [0, 1]. Let {J_i}_{i≥1} be the ranked jumps of a stable process on the time interval [0, 1]. Denote by ξ_1^{(N)} the N-trimmed subordinator (Ipsen & Maller, 2017) derived from ξ_1, that is, the process obtained by removing the N largest jumps from ξ_1, such that

ξ_1^{(N)} ∶= ξ_1 − ∑_{i=1}^N J_i = ∑_{i=N+1}^∞ J_i.

The following lemma derives the conditional density of ξ_1^{(N)}.
Lemma 1 (Conditional density of the N-trimmed stable process). Conditioning on the N-th largest jump J_N of the stable process ξ_1, the density of the N-trimmed stable process ξ_1^{(N)} is given by an explicit formula in terms of the function g_Z(z; α, t).

To illustrate the result of Lemma 1, we plot the density of ξ_1^{(N)} with different parameters in Figure 1. We also apply the concentrated matrix-exponential functions method (CME, see Horváth et al., 2020) to invert the Laplace transform of ξ_1^{(N)} numerically at some fixed points and plot the results as stars. The figures show that both methods produce similar results. Next, we use this lemma to derive the joint density of the N largest jumps and the sum of the smaller jumps of a stable process.
Lemma 2. Let J_1 > J_2 > ⋯ be the ranked jumps of a stable process on the time interval [0, 1]. Denote by R_k ∶= J_{k+1}∕J_k the ratio between the (k + 1)th and kth largest jumps; then the joint density of (J_1, R_1, …, R_{N−1}, ξ_1^{(N)}) admits an explicit form.

FIGURE 1  Density plot for the N-trimmed stable process ξ_1^{(N)} using Lemma 1 (solid line) and the numerical inverse Laplace transform results using the CME method (stars). The parameter values are α ∈ {0.2, 0.4, 0.6, 0.8}.

Using Lemma 2, we can construct the truncated Ferguson-Klass representation of the CRM induced by the stable process, and hence the truncated two-parameter Poisson-Dirichlet process with θ = 0. The following theorem generalizes this method to an arbitrary θ > −α. It also provides a representation in law for the random weights of the truncated two-parameter Poisson-Dirichlet process, which can be used in the numerical implementation.

Theorem 1. The N largest components of a two-parameter Poisson-Dirichlet distribution have the representation
for i = 1, …, N (Equation 8), and the sum of the smaller components can be represented by Equation (9), in terms of the random vector (J_1, R_1, …, R_{N−1}, ξ_1^{(N)}).

Next, we investigate the truncation error e_N of the truncated two-parameter Poisson-Dirichlet process. Since {p_i}_{i≥1} are the ranked values of the stick-breaking random weights {p̃_i}_{i≥1}, we have e_N ≤ ẽ_N a.s. Thus, the truncated two-parameter Poisson-Dirichlet process has a lower truncation error than the truncated stick-breaking process almost surely. To further investigate the truncation error, we use the representation (9) to express the tail probability P(e_N > y), where y ∈ (0, 1), conditionally on (J_1, R_1, …, R_{N−1}). Using the joint density of J_1, R_1, …, R_{N−1}, it is possible to integrate out the condition to derive the explicit distribution of e_N. However, the integral is too complicated to calculate explicitly. Handa (2009) made a significant contribution to this problem by deriving the joint density of the random weights (p_1, …, p_N), and the probability P(e_N > y) could be obtained by integrating the joint density over the region {p_1 + ⋯ + p_N < 1 − y}, while the explicit expression of the probability remains an open question. For this reason, we focus on the expectation of the truncation error. From Prop. 17 of Pitman and Yor (1997), the expectation of the tail probability E(e_N) can be expressed in closed form. On the other hand, the expectation of the tail probability of the stick-breaking process is given by

E(ẽ_N) = ∏_{i=1}^N (θ + iα)∕(θ + 1 + (i − 1)α).

We plot the difference between the expectations of the tail probabilities, that is, E(ẽ_N) − E(e_N), in Figure 2. The figures illustrate the improvement of the prior approximation quality by using the ranked random weights. We can see that the improvement is more significant for small N and large α. On the other hand, we plot the mean truncation error with different parameters in Figure 3. The figures show that the truncated stick-breaking process requires more terms to achieve the same prior approximation accuracy as the truncated two-parameter Poisson-Dirichlet process.
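For the stick-breaking side, the expectation E(ẽ_N) = ∏_{i=1}^N E(1 − V_i) follows from the independence of the V_i, with V_i ∼ Beta(1 − α, θ + iα) and hence E(1 − V_i) = (θ + iα)∕(θ + 1 + (i − 1)α) (Ishwaran & James, 2001). A small Python helper (name ours) computes this product:

```python
def expected_sb_tail(alpha, theta, N):
    """E(e~_N) for the truncated stick-breaking PY(alpha, theta) process:

        E(e~_N) = prod_{i=1}^N (theta + i*alpha) / (theta + 1 + (i-1)*alpha),

    since E(1 - V_i) with V_i ~ Beta(1 - alpha, theta + i*alpha) gives
    the i-th factor and the V_i are independent."""
    out = 1.0
    for i in range(1, N + 1):
        out *= (theta + i * alpha) / (theta + 1 + (i - 1) * alpha)
    return out
```

For α = 0 this reduces to the familiar Dirichlet-process rate (θ∕(θ + 1))^N, and each factor is below 1, so the expected tail decreases in N.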

PRIOR SIMULATION ALGORITHMS
In this section, we design two simulation algorithms for the truncated two-parameter Poisson-Dirichlet process and use them to estimate the functionals of the Pitman-Yor process numerically. It follows from Theorem 1 that to sample the random weights of the truncated two-parameter Poisson-Dirichlet process, we only need to simulate the random vector (J_1, R_1, …, R_{N−1}, ξ_1^{(N)}). Next, we develop an algorithm for this purpose. The algorithm involves the simulation of the N-trimmed stable process ξ_1^{(N)}, which can be done via Alg. 4.3 of Dassios et al. (2020). We denote this algorithm by Algorithm TS(·, ·) and attach the full steps in the Appendix S1.

Sample from an N-trimmed stable process
Algorithm 1 uses the acceptance-rejection method, so we need to investigate its acceptance rate. From the derivation of Algorithm 1 (see Appendix S1), we know the expected number of iterations for an acceptance is M = M(α, θ) ∶= Γ(θ + 1)Γ(1 − α)∕ ⋯ . It follows that the efficiency of the algorithm depends only weakly on the truncation level N (although a large N requires more beta random numbers) but strongly on α and θ. Since M(α, θ) is increasing in both variables, Algorithm 1 works better with small α and θ. When these parameters are large, the algorithm becomes computationally expensive. To avoid this pitfall, we provide an alternative algorithm using the MCMC method. This algorithm simply runs Algorithm 4, which will be explained later, iteratively with an empty set of observations (n = 0). The idea is to draw from the joint density of (J_1, R_1, …, R_{N−1}, ξ_1^{(N)}) using the MCMC method. This approach is less sensitive to the values of the parameters. Next, we use Algorithm 1 and Algorithm 4 to sample the five largest components of the two-parameter Poisson-Dirichlet distribution. To achieve a fair comparison, we run both algorithms for a fixed time duration of 30 s. When θ = 10, the exact simulation algorithm is too slow, and we only use the MCMC algorithm. The sample averages are recorded in Table 1. The table shows that both algorithms can achieve a reasonable level of Monte Carlo error. We remark that the MCMC algorithm complements the study in Dassios and Zhang (2021), where large values of θ were considered only when θ∕α is an integer.
The truncated two-parameter Poisson-Dirichlet process and its simulation algorithms can be used to estimate the functionals of the Pitman-Yor process. We demonstrate this method with the cumulative distribution function F, estimated by both the truncated stick-breaking process and the truncated two-parameter Poisson-Dirichlet process. On the other hand, the exact distribution of F has been derived in James et al. (2010). In particular, if α = 0.5 and H is a uniform distribution on [0, 1], then F(1∕2) ∼ Beta(θ + 1∕2, θ + 1∕2), and the density of F(1∕3) is also available in explicit form. In Figure 4, we use Algorithm 1 and Algorithm 4 to draw from F_N(1∕2) to illustrate that the estimation becomes more accurate as the truncation level grows. As for F(1∕3), we record the Kolmogorov distance between the exact and estimated distributions, together with their quantiles, in Table 2. The results show a lower Kolmogorov distance for the truncated two-parameter Poisson-Dirichlet process than for the truncated stick-breaking process.
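The functional estimation for F(1∕2) can be sketched with the truncated stick-breaking process alone (a hedged Python sketch; function name ours). Since H is uniform on [0, 1], each atom lands in [0, 1∕2] with probability 1∕2, so E[F_N(1∕2)] = 1∕2, matching the mean of the Beta(θ + 1∕2, θ + 1∕2) law at α = 0.5:

```python
import random

def estimate_F_half(alpha, theta, N, n_samples=1000, seed=7):
    """Monte Carlo estimate of E[F_N(1/2)], where F_N is the random cdf of a
    truncated stick-breaking approximation with H = Uniform(0, 1).

    Each sample builds one realization of F_N(1/2) = sum of the weights
    whose atoms fall in [0, 1/2], with the residual mass e~_N attached to
    an extra uniform atom K_0."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        stick, f_half = 1.0, 0.0
        for i in range(1, N + 1):
            v = rng.betavariate(1.0 - alpha, theta + i * alpha)
            if rng.random() <= 0.5:     # atom K_i falls in [0, 1/2]
                f_half += stick * v
            stick *= 1.0 - v
        if rng.random() <= 0.5:         # residual atom K_0 falls in [0, 1/2]
            f_half += stick
        total += f_half
    return total / n_samples
```

Drawing many such realizations and comparing their empirical distribution with the exact Beta law is exactly the Kolmogorov-distance experiment summarized in Table 2, here shown only for the mean.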
TABLE 2  Summary statistics for F(1∕3) using Algorithm 1 (PD, θ = 1, 2), Algorithm 4 (PD, θ = 10, 20) and the truncated stick-breaking process (SB, using Equation 3) to approximate the Pitman-Yor process.

Recall that the same experiments were carried out by Arbel et al. (2019) for the ε-Pitman-Yor process, and we can compare our numerical results to Fig. 2 and Table 3 of Arbel et al. (2019). For example, when α = 0.5, θ = 1 and N = 50, the Kolmogorov distance between the exact distribution of F(1∕3) and the estimation F_N(1∕3) is 0.00512, while the distance between the exact distribution and the estimated curve derived from the ε-Pitman-Yor process with ε = 0.01 is 0.00570. Thus, the truncation level of 50 and running time of 30 s give us an approximation for F(1∕3) whose accuracy is slightly better than that of the ε-Pitman-Yor process with ε = 0.01.

POSTERIOR INFERENCE SCHEME
In this section, we develop a posterior inference scheme for the truncated two-parameter Poisson-Dirichlet process and illustrate its usage in the Pitman-Yor process mixture models. Call Y = {Y_1, …, Y_n} a sample of the Pitman-Yor process if

Y_i | P ∼ P i.i.d.,  P ∼ PY(α, θ; H);   (13)

then the joint distribution of Y_1, …, Y_n can be characterized by the Pólya urn scheme (6). To approximate the posterior distribution of model (13), we replace the Pitman-Yor process prior with the truncated two-parameter Poisson-Dirichlet process. Then we consider a sequence of observations Y = {Y_1, …, Y_n} from P_N. Denote by n_j the number of observations taking the value K_j, such that ∑_{j=0}^N n_j = n; then the observations can be written as Y = (n_1, …, n_N, n_0). To make inference about the random weights P ∶= (p_1, …, p_N, e_N), we need to sample from the posterior P(P|Y). But to determine P, it is sufficient to find the values of (J_1, R_1, …, R_{N−1}, ξ_1^{(N)}) in the representations (8) and (9). To facilitate the posterior inference scheme, we re-express the representations in terms of the variables (r_1, …, r_{N−1}, z, y). On the other hand, the likelihood of the truncated two-parameter Poisson-Dirichlet process is given by P(Y|P) ∼ Multi(n_1, …, n_N, n_0; p_1, …, p_N, e_N). Thus, we can express the posterior of the truncated two-parameter Poisson-Dirichlet process as in (16). To sample from (16), we develop a blocked Gibbs sampler to draw iteratively from P(r_1, …, r_{N−1}|z, y, Y) and P(z, y|r_1, …, r_{N−1}, Y). We use the Hamiltonian Monte Carlo method (HMC; see Neal, 2011; see also Sect. 7.2 of Caron & Fox, 2017 for a similar application in posterior sampling) to sample from P(r_1, …, r_{N−1}|z, y, Y). The HMC method is based on the computation of the gradient of the log-posterior after the change of variables η_i ∶= tan(π(r_i − 0.5)). The derivation of the gradient is straightforward; we attach the details in the Appendix S1.
Algorithm 2 (Hamiltonian Monte Carlo). Let L ≥ 1 be the number of leapfrog steps and ε > 0 be the step size. The HMC algorithm for P(r_1, …, r_{N−1}|z, y, Y) is given as follows.
Next, we use the Metropolis-Hastings algorithm to sample from P(z, y | r_1, …, r_{N−1}, Y).

Sample from a truncated stable process
accept the candidates and output (z̃, ỹ). Otherwise, output (z, y).
We can now formulate the blocked Gibbs sampler as follows.
Algorithm 4 (Blocked Gibbs sampler). Posterior inference for the truncated two-parameter Poisson-Dirichlet process.

We obtain the posterior values of (p_1, …, p_N, e_N) by running Algorithm 4 iteratively. If we input an empty set of observations, that is, n = 0, the posterior (16) reverts to the prior distribution, and the blocked Gibbs sampler draws directly from the prior. This gives us the MCMC algorithm for the truncated two-parameter Poisson-Dirichlet process prior. Its performance has been demonstrated in Table 1. As explained in the previous section, this algorithm is particularly useful when θ is large.
Note that we have used fixed α and θ in the blocked Gibbs sampler. It is also possible to put priors on these parameters and estimate them in the posterior inference scheme. The existing literature has considered various priors. For example, Lijoi et al. (2008) suggested uniform priors with finite and fixed support; Jara et al. (2010) used a pair of beta and truncated normal priors; Carmona et al. (2019) applied a pair of beta and truncated gamma priors. Other approaches include the use of fixed values (see Ishwaran & James, 2001), empirical Bayes specification (see Lijoi et al., 2007), comparison between different combinations of fixed values (see Navarrete et al., 2008) and learning these parameters from data (see Cereda et al., 2023; Favaro et al., 2009).
Next, we adapt the truncated two-parameter Poisson-Dirichlet process into a Pitman-Yor process mixture model. We replace the Pitman-Yor process prior P̃_∞ in model (5) by a truncated two-parameter Poisson-Dirichlet process P_N. To achieve an efficient Markov chain Monte Carlo sampling scheme, we also recast the model completely in terms of random variables, introducing for each observation X_i a classification variable Z_i indicating its mixture component. Model (5) is then approximated by model (17). Using the blocked Gibbs sampler for the truncated two-parameter Poisson-Dirichlet process, we can devise a posterior inference scheme for model (17). The sampler draws iteratively from the conditional distributions P(P|K), P(Z|K, X) and P(K|P, Z, X), thus producing the posterior values from P(P, Z, K|X).
We illustrate the usage of the posterior inference scheme within a normal mean mixture model. The model is in the format of (5) with π(X_i | Y_i) ∼ N(X_i | μ_i, σ_i). We choose the priors σ^{−1} ∼ Ga(a_0, b_0) and μ | σ ∼ N(μ_0, 5). Using (17), we approximate the normal mean mixture model by the finite mixture ∑_{j=1}^N p_j φ_j(·) + e_N φ_0(·), where φ_j denotes the normal kernel associated with the jth component, as in model (18). We develop a blocked Gibbs sampler for the posterior of model (18) in the following algorithm. Note that the derivation of Steps 2, 3, and 4 of the algorithm can be found in Ishwaran and James (2002).
Here n_j is the number of times K*_j occurs in K. When it comes to a Pitman-Yor process prior, the slice-efficient sampler could still be extremely slow or even infeasible due to the huge number of random variables generated, in particular when the discount parameter α is large; see Canale et al. (2022) for further discussion. To facilitate the posterior inference of a Pitman-Yor process mixture model, Canale et al. (2022) proposed the importance conditional sampler (ICS), which combines the appealing features of both conditional and marginal methods while avoiding their weaknesses. We will use these methods to analyze synthetic data generated by a bimodal mixture and a leptokurtic mixture. The bimodal mixture assumes that f(x_i) = 0.5 N(−1, 0.5²) + 0.5 N(1, 0.5²), and the leptokurtic mixture assumes that f(x_i) = (2∕3) N(0, 1) + (1∕3) N(0.3, 0.25²). We will also consider the inference problem of the galaxy velocity data, which is widely used in Bayesian nonparametric statistics. The analysis is carried out in MATLAB 2023a on a 64-bit Windows desktop with an Intel i9-10900 processor and 64 GB RAM.
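The two synthetic benchmarks can be generated as follows (a minimal Python sketch of the data-generating mixtures; the paper's own experiments are run in MATLAB, and the function names are ours):

```python
import random

def sample_bimodal(n, seed=2):
    """Draw n samples from the bimodal benchmark mixture
    0.5*N(-1, 0.5^2) + 0.5*N(1, 0.5^2)."""
    rng = random.Random(seed)
    out = []
    for _ in range(n):
        mu = -1.0 if rng.random() < 0.5 else 1.0  # pick a component
        out.append(rng.gauss(mu, 0.5))            # gauss takes std dev
    return out

def sample_leptokurtic(n, seed=2):
    """Draw n samples from (2/3)*N(0, 1) + (1/3)*N(0.3, 0.25^2)."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) if rng.random() < 2 / 3
            else rng.gauss(0.3, 0.25) for _ in range(n)]
```

Note that `random.gauss` is parameterized by the standard deviation, so the variances 0.5² and 0.25² in the text correspond to the arguments 0.5 and 0.25.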
The first part of our numerical study checks the posterior inference results. We draw n = 100 independent samples from the bimodal mixture and use Algorithm 5 to estimate the model posterior. We choose the concentration parameter θ = 1, discount parameter α = 0.3, truncation level N = 100 and hyperparameters a_0 = 2, b_0 = 1, μ_0 = 0. In the HMC step, we use L = 10 leapfrog steps and adjust the step size to obtain an acceptance rate of around 0.6. The numerical results are presented in Figure 5. From the figures we can see that the predictive density induces two modes for the observations at −1 and 1. We also count the number of occupied clusters in each iteration and record their proportions. The results suggest that the posterior distribution has at least two clusters, and it is unlikely to have more than 12 clusters. Clearly, the number of occupied clusters is overestimated. To achieve a more concentrated posterior estimation, we try a smaller concentration parameter θ = 0.3. The numerical results are given in Figure 6. We can see that the posterior induces a similar predictive density as before but with fewer occupied clusters.
In the second part of the numerical study, we compare the performance of the different posterior inference schemes. We draw n = 100 independent samples from the leptokurtic mixture and analyze the data using different methods. The numerical results are recorded in Table 3 and Table 4. In these tables, "Slice (Dep.)" and "Slice (Ind.)" represent the dependent and independent slice-efficient samplers, and "Truncation" stands for the truncated stick-breaking process. We would like to remind the reader of the remarkable BNPmix package developed by Corradin et al. (2021), which contains the ICS and slice-efficient samplers. On the other hand, the "Truncation" method is obtained by replacing Step 1 of Algorithm 5 with the posterior of the truncated stick-breaking process, which can be found in Suppl. A of Jara et al. (2010). We also include a concise derivation of the posterior in the supplementary materials. The running time is set to 300 s for all the posterior inference schemes to achieve a fair comparison. The inferential quality is monitored by four quantities: the number of occupied clusters, the deviance of the estimated density, the sum of squared errors (SSE) and the sum of standardized absolute errors (SSAE). For each iteration r of the blocked Gibbs sampler, denote by K^{(r)} the number of occupied clusters and n_j^{(r)} the size of each occupied cluster, such that ∑_j n_j^{(r)} = n. The deviance D^{(r)} is a function of all the estimated parameters. The SSE denotes the sum of the squared differences between the observation x_i and the predictive mean E(X_i|data). The SSAE stands for the sum of the standardized absolute errors |x_i − E(X_i|data)|∕√Var(X_i|data). These quantities have been used in the previous comparison studies of Neal (2000), Papaspiliopoulos and Roberts (2008), Fall and Barat (2012), Kalli et al. (2011), Canale et al. (2022), and Argiento et al.
(2016). The algorithm efficiency can be evaluated by calculating the integrated autocorrelation time (IAT) and the effective sample size (ESS) of K^{(r)} and D^{(r)}. The IAT of a variable is defined by Sokal (1997) as τ ∶= 0.5 + ∑_{l=1}^∞ ρ_l, where ρ_l is the autocorrelation at lag l. It quantifies the statistical error of the target function in Monte Carlo estimation. The difficulty of calculating τ arises from the covariances between the states, which are needed to evaluate ρ_l. Sokal (1997) suggested the estimator τ̂ = 0.5 + ∑_{l=1}^{C−1} ρ̂_l for τ, where ρ̂_l is the estimated autocorrelation at lag l, and C is a cut-off point to be selected by the user. We will use the same cut-off point as Kalli et al. (2011), namely C ∶= min{l ∶ |ρ̂_l| < 2∕√M}, where M is the number of iterations. This makes the cut-off point C the smallest lag for which we would not reject the null hypothesis H_0 ∶ ρ_l = 0; see Kalli et al. (2011) for more details. On the other hand, the ESS measures how many posterior samples are effective. Due to the autocorrelation, the number of effective samples is smaller than the length of the Markov chain, and a higher ESS implies a better sequence of posterior samples. In practice, the ESS can be computed with the CODA package. We refer the reader to Plummer et al. (2006) and Canale et al. (2022) for more details.
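The IAT estimator with this cut-off can be sketched directly from the definitions above (a minimal Python sketch; the function name is ours):

```python
def estimated_iat(chain):
    """Estimate the integrated autocorrelation time

        tau_hat = 0.5 + sum_{l=1}^{C-1} rho_hat_l,

    with the cut-off C = min{l : |rho_hat_l| < 2/sqrt(M)}, i.e. the smallest
    lag at which H0: rho_l = 0 would not be rejected (Kalli et al., 2011)."""
    M = len(chain)
    mean = sum(chain) / M
    var = sum((x - mean) ** 2 for x in chain) / M
    tau, band = 0.5, 2.0 / M ** 0.5
    for lag in range(1, M):
        cov = sum((chain[i] - mean) * (chain[i + lag] - mean)
                  for i in range(M - lag)) / M   # biased autocovariance
        rho = cov / var
        if abs(rho) < band:                      # cut-off C reached
            break
        tau += rho
    return tau
```

For a nearly independent chain the estimator stays close to 0.5, while strongly autocorrelated chains (such as a slowly mixing sampler's trace) produce much larger values.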
From the IAT results, we can see that the slice samplers are less efficient than the other methods. Algorithm 5 and the ICS method have similar efficiency, and they are more efficient than the truncated stick-breaking method, with a reduction in IAT of 10 to 20 per cent. On the other hand, we notice that Algorithm 5 provides the smallest SSE. It is known that the SSE is an index favoring complex models and leading to better values when the data set is over-fitted (see, e.g., Argiento et al., 2016), and Algorithm 5 is preferable according to the SSE criterion. Finally, we notice that Algorithm 5 generates fewer samples in the given time compared to the ICS and the truncated stick-breaking process. This is caused by the simulation of the truncated stable process, which is time-consuming; see Dassios et al. (2020) for a further discussion of the simulation efficiency. We conduct another numerical experiment based on the bimodal mixture and provide the results in the Appendix S1. The findings from the bimodal mixture are similar: Algorithm 5 is more efficient than the truncated stick-breaking process method and leads to the lowest SSE. In addition, we use Algorithm 5 to analyze the galaxy velocity data to demonstrate its usage on real-world datasets. The posterior mean density estimation results are provided in the Appendix S1.

DISCUSSION
In this paper, we have studied the finite approximation of the Pitman-Yor process by truncating its two-parameter Poisson-Dirichlet representation. We call the approximating process a truncated two-parameter Poisson-Dirichlet process. We have developed two simulation algorithms for the truncated two-parameter Poisson-Dirichlet process and used them to estimate the functionals of the Pitman-Yor process. The simulation results in Section 3 show that, for the same running time, our method provides a better approximation to the functionals than the truncated stick-breaking process. We have also adapted the truncated two-parameter Poisson-Dirichlet process into a Pitman-Yor process mixture model and designed the posterior inference scheme for the model. Numerical implementations suggest that our posterior inference scheme is more efficient than the truncated stick-breaking process method and leads to a lower SSE than the other methods.
The construction of the truncated two-parameter Poisson-Dirichlet process is based on the ranked sequence of the stick-breaking random weights. Thus, the existing research on the truncation error of the stick-breaking process provides an upper bound for the truncation error of our process. In this paper, we have compared the truncation errors of different approximation methods. However, we did not give a rule for selecting the truncation level, beyond simply examining the expectation of the truncation error. In fact, our construction method needs to determine all the random weights simultaneously, so a predefined truncation level is required. It is nevertheless possible to use a random truncation level based on the ratio between two consecutive random weights. For example, the existing literature has considered a random stopping rule M of the form J_M / (J_1 + ⋯ + J_M) < ε for a completely random measure (see, e.g., Arbel & Prünster, 2017; Gelfand & Kottas, 2002). We can extend this idea to the current work by adding the condition (R_1 ⋯ R_{M−1}) / (1 + R_1 + ⋯ + R_1 ⋯ R_{M−1}) < ε to Step 1 of Algorithm 1. Such a random stopping rule truncates the sequence when the newly sampled random weight is sufficiently small compared to the previous ones.
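The random stopping rule above can be sketched as follows. This is a minimal illustration only: the Beta(θ + mα, 1 − α) law used for the ratios R_m is a placeholder assumption standing in for whatever distribution Algorithm 1 actually specifies, and `adaptive_truncation` is a hypothetical name.

```python
import numpy as np

def adaptive_truncation(alpha, theta, eps, rng=None, max_level=10_000):
    """Draw ratios R_m until (R_1...R_{M-1}) / (1 + R_1 + ... + R_1...R_{M-1}) < eps.

    Illustrative sketch: R_m ~ Beta(theta + m*alpha, 1 - alpha) is an
    assumption, not the distribution specified in Algorithm 1.
    """
    rng = rng or np.random.default_rng()
    products = []   # running products R_1 ... R_m
    prod = 1.0
    for m in range(1, max_level + 1):
        prod *= rng.beta(theta + m * alpha, 1.0 - alpha)
        products.append(prod)
        # stop once the newest product is small relative to all earlier ones
        if prod / (1.0 + sum(products)) < eps:
            return m + 1, np.array(products)  # truncation level M and products
    return max_level, np.array(products)
```

Because each R_m lies in (0, 1), the products decrease monotonically, so the rule fires as soon as the most recent (smallest) weight is negligible relative to the accumulated mass.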
We find an application of the truncated two-parameter Poisson-Dirichlet process in the approximation of Pitman-Yor process mixture models. Numerical implementations suggest a reasonable estimation quality when using the approximation. We also find that the simulation of the truncated stable process slows down the posterior inference scheme. Specifically, Step 3 of Algorithm 3 simulates a truncated stable process at time t := Γ(1 − α)z. When t is large, the exact simulation algorithm becomes less efficient. One potential improvement is to split the time as t = Σ_{i=1}^{⌊t⌋} 1 + (t − ⌊t⌋), where ⌊t⌋ denotes the largest integer not exceeding t. We can then simulate ⌊t⌋ truncated stable processes, each at time 1, in parallel, together with one truncated stable process at time t − ⌊t⌋. Their sum gives a sample of the truncated stable process at time t. Alternatively, we could replace the exact simulation algorithm with a nonexact but faster sampler. This problem will be further investigated in future work.
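The time-splitting idea relies only on the process having independent, additive increments, so it can be sketched with a stand-in sampler. In the sketch below, `stable_increment` is a hypothetical placeholder (a Gamma(αt, 1) draw, chosen purely because it is additive in t) for the exact truncated-stable sampler of Dassios et al. (2020).

```python
import math
import numpy as np

def stable_increment(t, alpha, rng):
    """Hypothetical stand-in for the exact truncated-stable sampler:
    a Gamma(alpha * t, 1) draw, used only because it shares the
    additive-in-time property that the splitting scheme relies on."""
    return rng.gamma(alpha * t, 1.0)

def sample_by_splitting(t, alpha, seed=0):
    """Simulate X(t) as the sum of floor(t) independent unit-time
    increments plus one increment of length t - floor(t).  The pieces
    are independent, so in practice they can be drawn in parallel."""
    n = math.floor(t)
    total = 0.0
    for i in range(n):
        total += stable_increment(1.0, alpha, np.random.default_rng(seed + i))
    frac = t - n
    if frac > 0:
        total += stable_increment(frac, alpha, np.random.default_rng(seed + n))
    return total
```

Each unit-time piece uses its own random stream, which is what makes the parallel version straightforward: the ⌊t⌋ unit draws and the fractional draw can be dispatched to separate workers and summed afterwards.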

FIGURE Posterior inference for the bimodal mixture with n = 100 observations using Algorithm 5. The parameter values are α = 0.3, θ = 1, and N = 100. The plots are based on a sample of 10,000 iterations following an initial burn-in of 10,000 iterations. (a) Posterior mean values. The observations X_i and the corresponding true mean values are denoted by green crosses and red pluses, respectively. (b) Twenty-five randomly selected predictive densities evaluated over the same partition. The observations are represented by the histogram, and the true density curve of the bimodal mixture is plotted in green. (c) Proportions of the number of occupied clusters.

FIGURE Posterior inference for the bimodal mixture with n = 100 observations using Algorithm 5. The parameter values are α = 0.3, θ = 0.3, and N = 100. (a) Posterior mean values; (b) predictive densities; (c) proportions of occupied clusters.

TABLE 3 Posterior of the leptokurtic mixture, sample size n = 100, α = 0.3, θ = 1.0; the running time is 300 s for all the posterior inference schemes.
Abbreviations: D, deviance; ESS, effective sample size (rounded to the nearest integer); K, number of occupied clusters; SSAE, sum of standardized absolute errors; SSE, sum of squared errors; τ, estimated IAT.

TABLE Posterior of the leptokurtic mixture, sample size n = 100, α = 0.3, θ = 5.0; the running time is 300 s for all the posterior inference schemes.