Improving risk classification and ratemaking using mixture-of-experts models with random effects

In the underwriting and pricing of nonlife insurance products, it is essential for the insurer to utilize both policyholder information and claim history to ensure profitability and proper risk management. In this paper, we apply a flexible regression model with random effects, called the Mixed Logit-weighted Reduced Mixture-of-Experts, which leverages both policyholder information and their claim history, to categorize policyholders into groups with similar risk profiles, and to determine a premium that accurately captures the unobserved risks. Estimates of model parameters and the posterior distribution of random effects can be obtained by a stochastic variational algorithm, which is numerically efficient and scalable to large insurance portfolios. Our proposed framework is shown to outperform the classical benchmark models (Logistic and Lognormal GL(M)M) in terms of goodness-of-fit to data, while offering intuitive and interpretable characterization of policyholders' risk profiles to adequately reflect their claim history.

In the underwriting and pricing of nonlife insurance products, policyholders' information, or covariates, is typically a useful indicator of their risk level. Using automobile insurance as an example, driver's age has been empirically observed as an important factor in both accident rates and frailties (see, e.g., Kelly & Nielson, 2006; Zhang et al., 1998). Similarly, there usually exists a certain dependence structure between claim frequency/severity and a driver's annual mileage (the average distance driven per year); see, for example, Bailey and Simon (1960), Vickrey (1968), Edlin (1999), and Lemaire et al. (2016). Such information is often leveraged by the insurer to make better decisions in risk classification and ratemaking: the former categorizes policyholders into relatively homogeneous groups with similar risk profiles, while the latter determines a premium to be charged for insurance protection.
For new policyholders, risk classification and ratemaking are usually done on an a priori basis, whereby only policyholder covariates are utilized. A widely used approach is to incorporate the covariates as regressors in the Generalized Linear Models (GLMs) for modeling claim frequency and/or severity; see, for example, McCullagh and Nelder (1989), De Jong and Heller (2008), and Ohlsson and Johansson (2010). However, a priori risk classification and ratemaking may fail to capture certain unmeasurable or unobserved risk factors, for example, the aggressiveness when driving, which cannot be reflected by covariates alone (see, e.g., Antonio & Beirlant, 2007; Antonio & Valdez, 2012; Denuit et al., 2007). These unmeasurable or unobserved risk factors are an additional source of heterogeneity among seemingly homogeneous policyholders who have very similar (or even exactly the same) covariate information. Still, one may reasonably assume that such latent risk factors can be reflected by and observed from policyholders' claim history, that is, riskier policyholders tend to have a higher number of claims and/or larger severities. As time goes by, the insurer gains additional, up-to-date insights into the policyholder's risk profile by observing their claim history, including both frequency and severity. At policy renewal, the insurer may decide to update their decision of risk classification and ratemaking on an a posteriori basis, whereby both policyholder covariates and their individual claim history are utilized. A widely known and used approach for a posteriori risk classification and ratemaking is the Bonus-Malus System (BMS), which categorizes policyholders into risk groups with appropriate premia based on their claim counts in the previous policy year (see, e.g., Denuit et al., 2007; Lemaire, 1995, for an introduction to BMS).
For the sake of profitability and risk management, it is essential for the insurer to design a good framework for a posteriori risk classification and ratemaking. As mentioned above, a priori information alone may be insufficient to accurately capture latent, heterogeneous risks of individual policyholders, which could lead to mispricing (i.e., overpricing safer policyholders while underpricing riskier ones) and potentially large losses for the insurer. By combining information from both covariates and claim history for a posteriori risk classification and ratemaking, the insurer may be able to offer competitive pricing for low-risk policyholders, while also appropriately charging high-risk policyholders so that their claims are expected to be covered to ensure the insurer's profitability in the long run. Besides, on a portfolio level, the insurer gains additional insights into the risk segmentation and the categorization of policyholders with similar risk profiles, which may be particularly helpful for risk management purposes such as identifying and ceding losses from high-risk policyholders to avoid tail risks (see, e.g., Chapados et al., 2008).
Consequently, the problem of a posteriori risk classification and ratemaking has led to an abundance of literature from actuarial researchers (see Section 2 for a detailed review). Due to the inherent differences between a priori information and latent risk factors (e.g., observable vs. unobservable, typically fixed vs. potentially time-varying, etc.), they are usually analyzed and modeled [...] discussion on the economic and business implications of applying the Mixed LRMoE model in practice. Besides addressing the problem of a posteriori risk classification and ratemaking from a practical perspective, we also provide technical details of a stochastic variational Expectation-Conditional-Maximization (ECM) algorithm for the simultaneous estimation of model parameters and inference of the posterior distribution of random effects (see Section 4), which complements the theoretical development in Fung and Tseung (2022). The algorithm is also numerically efficient and scalable, which greatly boosts the applicability of our proposed Mixed LRMoE framework to large, multiyear insurance portfolios, as well as to more general modeling problems with one or multiple random effects.
The remainder of this paper is organized as follows. Section 2 contains a short literature review for previous works on a posteriori risk classification and ratemaking, as well as estimation methods for random effects models. Section 3 provides an overview of the LRMoE model and introduces the Mixed LRMoE in the general formulation and its adaptation for the problem of a posteriori risk classification and ratemaking. Then, Section 4 develops a stochastic variational ECM algorithm for estimating model parameters and inferring the posterior distribution of random effects. Next, Section 5 contains an application of our proposed framework on a real insurance data set. Finally, Section 6 concludes with a brief discussion and outlook for future research directions. In the Supporting Information, Appendix A contains technical details of the stochastic variational ECM algorithm, and Appendix B presents two simulation studies which aim to numerically illustrate and examine the proposed estimation algorithm.

2 | LITERATURE REVIEW
In this section, we review the existing literature on two fronts: the methodological development on a posteriori risk classification and ratemaking, and various algorithms for parameter estimation in the presence of random effects. We also briefly address how the present paper relates to and differs from previous works.

2.1 | A posteriori risk classification and ratemaking
The use of claim history for a posteriori risk classification and ratemaking is a classical problem which has been studied in depth in the actuarial literature. Early works in credibility theory, such as Bühlmann (1967), Norberg (1979), and Bühlmann and Gisler (2005), assume some common parameters underlying the distribution of insurance losses. One uses the observed claim history to infer the posterior distribution of the parameters, which then yields the posterior distribution of future losses given the history. As for the widely used BMS mentioned in Section 1, a comprehensive introduction can be found in, for example, Lemaire (1995) and Denuit et al. (2007). On the basis of the claim history (typically the number of claims in the year before policy renewal), policyholders are (re)classified into one of a number of prespecified risk classes according to certain transition rules, whereby each risk class corresponds to a premium relativity which reflects the level of risk. However, classical formulations of credibility theory (e.g., greatest accuracy credibility) and BMS (e.g., rate tables based solely on the claim counts in the previous policy year) do not consider covariate information, which is usually deemed an important indicator of policyholders' risk characteristics. To this end, there has been an abundance of literature that aims to apply more sophisticated statistical models, which typically involve a regression component, to the problem of a posteriori risk classification and ratemaking. Most notably, random effects have been a popular choice for modeling the temporal dependence between past and future claim behavior. For example, many authors have considered adding random effects in GLM, which results in GLMM; see, for example, Dionne and Vanasse (1989), Dionne and Vanasse (1992), Pinquet (1998), Frangos and Vrontos (2001), Boucher and Denuit (2006), and Antonio and Beirlant (2007), whereby the posterior distribution of random effects given claim history is used for prediction. Another important consideration is the dependence structure between multiple coverages which is common in automobile insurance; see, for example, Pinquet (1998), Gómez-Déniz et al. (2008), Boucher et al. (2009), Gómez-Déniz (2016), and Tzougas and di Cerchiara (2021) for using shared random effects to model such dependence. Other researchers have also investigated the potential dependence of random effects on policyholder covariates (e.g., Boucher & Denuit, 2006) or dynamic random effects (e.g., Bolancé et al., 2007). Besides, while some works mainly focus on claim frequency alone, many researchers have also attempted to incorporate claim severity and its dependence structure with frequency, for example, Ni et al. (2014), Park et al. (2018), Oh et al. (2020), and Oh et al. (2022). Furthermore, to overcome certain restrictive assumptions in GLM, finite mixture models have recently become popular in a posteriori risk classification and ratemaking for more flexible and accurate modeling of claim frequency and severity, as used in Bermúdez and Karlis (2012), Tzougas et al. (2014), Tzougas et al. (2018), and Tzougas and di Cerchiara (2021).
Similar to many papers cited above, the Mixed LRMoE model uses policyholder covariates as fixed effects in a regression framework. The addition of random effects introduces dependence between observations across multiple policy years of the same policyholder, from which the posterior distribution of random effects is inferred and then utilized for a posteriori risk classification and ratemaking. Our work also intersects with mixture model-based approaches, such as Tzougas and di Cerchiara (2021), in that the Mixed LRMoE model allows for more flexible and accurate modeling of the loss distribution compared with classical regression models, such as GLM. In the broader class of general mixture-of-experts (MoE) models, our work is closely related to Yau et al. (2003), Ng and McLachlan (2007), and Ng and McLachlan (2014), where random effects are also incorporated to account for heterogeneity observed in real data. However, the Mixed LRMoE presented in this paper has an arguably simpler model structure.

2.2 | Estimation algorithms
Under certain assumptions such as the classical conjugate pairs of prior-posterior distributions, there exist closed-form solutions for model parameters and the posterior distribution of random effects, which also yield nice, closed-form results for a posteriori premium; see, for example, Chap. 13 of Klugman et al. (2012) for the Gamma-Poisson case, and Denuit and Lu (2021) for the Wishart-Gamma case. However, in many moderately complex regression modeling frameworks with random effects, parameter estimation and inference may be challenging due to typically intractable likelihood functions. As a classical approach, one may consider applying the Best Linear Unbiased Predictor (BLUP) procedure for obtaining the realization of random effects, combined with Restricted/Residual Maximum Likelihood for estimating the model parameters; see, for example, Henderson (1973), Henderson (1975), McLean et al. (1991) for Linear Mixed Models, McGilchrist (1994) and McGilchrist and Yau (1995) for GLMMs, and Yau et al. (2003) and Ng and McLachlan (2007) for MoE models. Alternatively, one may choose to estimate the parameters from the marginal likelihood by numerically integrating out the random effects using, for example, the Gauss-Hermite Quadrature (e.g., Pechon et al., 2019; Pinheiro & Bates, 1995) or the Laplace approximation (e.g., Breslow & Clayton, 1993; Raudenbush et al., 2000). One may also apply Markov Chain Monte Carlo (MCMC) methods (e.g., Booth & Hobert, 1999; Brooks et al., 2011; Zeger & Karim, 1991) for generating samples of random effects from their posterior distribution given the observed data, based on which the posterior of model parameters can also be obtained. A comparison of these methods for models with random effects can be found in Browne and Draper (2006). However, the aforementioned methods may not be suitable for the problem of a posteriori risk classification and ratemaking. For example, when working with large insurance portfolios, it is desirable to develop an algorithm which scales with the number of random effects and the size of data sets, which may be difficult for numerical integration or MCMC methods. Also, it is desirable to obtain posterior distributions, rather than point estimates, of certain quantities of interest (e.g., a posteriori premium based on different premium principles), which are not produced by either BLUP or numerical integration methods.
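To make the closed-form conjugate case mentioned above concrete, the following is a minimal sketch of the Gamma-Poisson update for claim frequency. It assumes a Gamma prior in the shape-rate parameterization and one observed claim count per policy year; the function names and the prior values are hypothetical and chosen only for illustration.

```python
def gamma_poisson_posterior(prior_shape, prior_rate, claim_counts):
    """Posterior Gamma(shape, rate) of a Poisson intensity.

    Assumes intensity ~ Gamma(prior_shape, prior_rate) (rate parameterization)
    and independent Poisson(intensity) claim counts, one per policy year.
    """
    post_shape = prior_shape + sum(claim_counts)
    post_rate = prior_rate + len(claim_counts)
    return post_shape, post_rate


def posterior_mean_frequency(prior_shape, prior_rate, claim_counts):
    """A posteriori expected claim frequency (mean of the posterior Gamma)."""
    shape, rate = gamma_poisson_posterior(prior_shape, prior_rate, claim_counts)
    return shape / rate


# Hypothetical prior with mean 2 / 20 = 0.1 claims per year.
prior_shape, prior_rate = 2.0, 20.0
clean = posterior_mean_frequency(prior_shape, prior_rate, [0, 0, 0])
risky = posterior_mean_frequency(prior_shape, prior_rate, [1, 0, 2])
```

A claim-free history pulls the expected frequency below the prior mean, while a history with claims pushes it above, which is the basic mechanism behind a posteriori ratemaking.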
Therefore, in place of these classical methods, we opt to use variational inference (VI) primarily for its superior speed and scalability for large insurance portfolios. Besides estimating model parameters with computational efficiency, our stochastic variational ECM algorithm also directly produces the approximated posterior distribution of random effects for each individual policyholder, which is key for a posteriori risk classification and ratemaking for future policy years. Further, while VI methods have been widely used in the machine learning community as an alternative to computationally more expensive methods, such as MCMC (Blei et al., 2017), there have been few use cases of VI in the actuarial literature (see, e.g., Gomes et al., 2021; Kim et al., 2022; Kuo, 2020). We hope our paper serves as another example to showcase the potential of VI methods for analyzing the ever-growing amount of data available for insurance applications.

3 | MODELING FRAMEWORK
In this section, we first give an overview of the LRMoE modeling framework, including model formulation, theoretical properties, implementation, and application in actuarial contexts. Then, we extend the LRMoE model with random effects to account for the temporal dependence across different policy years. Finally, we provide some discussion on the Mixed LRMoE specifically for the application of a posteriori risk classification and ratemaking.

3.1 | Overview of LRMoE
The LRMoE model first introduced in Fung et al. (2019b) is formulated as follows. Let $x_i$ denote a $P$-dimensional vector of covariates of policyholder $i$, such as demographic information and vehicle specification. Given $x_i$, the policyholder is classified into one of $g$ latent risk classes by the logit gating function

$$\pi_j(x_i; \alpha) = \frac{\exp(\alpha_j^T x_i)}{\sum_{j'=1}^{g} \exp(\alpha_{j'}^T x_i)}, \quad j = 1, 2, \ldots, g, \tag{1}$$

where $\alpha_j$ is a vector of regression coefficients for latent class $j$. Within each latent class $j$, a $D$-dimensional vector of response variable(s) $y_i$ such as claim frequency and severity is modeled by an expert function $f_j(y_i; \psi_j)$, where $\psi_j$ denotes the parameters of the expert function. Consequently, the likelihood function for a portfolio of $n$ policyholders is given by

$$L(\alpha, \Psi; X, Y) = \prod_{i=1}^{n} \left\{ \sum_{j=1}^{g} \pi_j(x_i; \alpha) f_j(y_i; \psi_j) \right\}, \tag{2}$$

where $\alpha = (\alpha_1, \alpha_2, \ldots, \alpha_g)$ and $\Psi = (\psi_1, \psi_2, \ldots, \psi_g)$ are the model parameters to estimate given the observed data $(X, Y) = \{(x_i, y_i) : i = 1, 2, \ldots, n\}$. We assume conditional independence among all dimensions in $y_i$ given the latent class $j$ such that $f_j(y_i; \psi_j) = \prod_{d=1}^{D} f_{jd}(y_{id}; \psi_{jd})$, where $y_{id}$ is the $d$th dimension in $y_i$ and $f_{jd}$ is the expert function for $y_{id}$ with parameters $\psi_{jd}$.
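The formulation above can be sketched in a few lines of code. This is only an illustration under simplifying assumptions, not the implementation in the R or Julia packages: it uses a single response dimension ($D = 1$) with one Poisson expert per latent class, and all function names and example values are hypothetical.

```python
import math


def logit_gating(x, alphas):
    """Latent class probabilities pi_j(x; alpha): softmax of alpha_j^T x."""
    scores = [sum(a * xi for a, xi in zip(alpha_j, x)) for alpha_j in alphas]
    top = max(scores)  # subtract the maximum for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [v / total for v in weights]


def poisson_pmf(y, lam):
    """Poisson probability mass function, used here as the expert f_j."""
    return math.exp(-lam + y * math.log(lam) - math.lgamma(y + 1))


def lrmoe_loglik(X, Y, alphas, lambdas):
    """Logarithm of the likelihood in Equation (2) with Poisson experts."""
    total = 0.0
    for x, y in zip(X, Y):
        pi = logit_gating(x, alphas)
        total += math.log(sum(p * poisson_pmf(y, lam) for p, lam in zip(pi, lambdas)))
    return total


# Hypothetical portfolio: intercept plus one standardized covariate, g = 2 classes.
X = [[1.0, 0.3], [1.0, -1.2], [1.0, 0.8]]
Y = [0, 2, 0]
alphas = [[0.5, -1.0], [0.0, 0.0]]
lambdas = [0.05, 0.40]  # expert parameters do not depend on covariates
loglik = lrmoe_loglik(X, Y, alphas, lambdas)
```

Note that the covariates enter only through `logit_gating`, which mirrors the reduced structure of the model: the experts are shared by all policyholders in the same latent class.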
The LRMoE model can be viewed as a simplification of the general MoE model (see, e.g., Jordan & Jacobs, 1994), whereby the gating function is restricted to multiple logistic functions and the regression on covariates in the expert functions is eliminated. It is shown in Fung et al. (2019b) that such simplification will not reduce modeling flexibility, provided the expert functions satisfy some mild conditions. In other words, the LRMoE model is capable of achieving the same level of goodness-of-fit as the general MoE with a much simpler model structure. In the meantime, the simplified model structure of LRMoE provides the following intuitive model interpretation in insurance contexts. On the basis of covariates $x_i$ which are indicative of individual risk profiles, policyholders are classified into latent risk groups by a commonly used function for classification problems. Within the same latent group $j$, the individual risk profiles are naturally assumed to be homogeneous by sharing the same expert function $f_j(y_i; \psi_j)$ whose parameters are independent of policyholder information.
Thanks to its flexibility and interpretability, the LRMoE model has been applied to many actuarial modeling problems. Fung et al. (2019a) used it for modeling correlated claim frequencies of two types of automobile insurance coverage, where the LRMoE mixture of Erlang Count experts is shown to outperform the negative binomial GLM (with and without zero inflation). Fung et al. (2022) discussed fitting LRMoE to censored and truncated data which are commonly encountered when modeling claim severity or reporting delays. The extended model is applied to insurance pricing with policy deductibles and prediction of incurred but not reported claims. In Fung et al. (2022), the LRMoE is further extended to include composite or slicing expert functions which account for multimodal and heavy-tailed distributions. For the implementation of LRMoE, software packages written in R (Tseung et al., 2020) and in Julia (Tseung et al., 2021) are readily available for use, which offer a wide selection of expert functions commonly used for actuarial modeling and utility functions for predictive analysis and model visualization.
As with many mixture models, parameter estimation for LRMoE is done using the ECM algorithm (see, e.g., Dempster et al., 1977; McLachlan & Peel, 2004). Details of the ECM algorithm for LRMoE can be found in the papers cited above. For Mixed LRMoE, we combine the same ECM algorithm with VI methods to deal with intractable marginal likelihood due to the presence of random effects, which will be presented in Section 4.

3.2 | Formulation of Mixed LRMoE
In the context of a posteriori risk classification and ratemaking, it is important to utilize information about policyholders' claim history to make predictions for the upcoming policy years. In effect, one takes advantage of the dependence structure in the claim history across different policy years generated by the same policyholder. Note that such dependence structure has not been accounted for by the LRMoE model, due to the assumption of independence between observations $(x_i, y_i)$ as indicated by the likelihood function in Equation (2). To incorporate dependence between observations across different policy years, we propose to add policyholder-level random effects to the LRMoE model, which results in the Mixed LRMoE model. In this subsection, we first formulate the Mixed LRMoE in a general setting following Fung and Tseung (2022), and then discuss the special case with only policyholder-level random effects.
Assume each observation $(x_i, y_i)$ is equipped with a vector of random effects $w_i = (w_{i1}, w_{i2}, \ldots, w_{iL})^T$, where $L$ is the total number of levels of different random effects. For the $l$th level of random effect, $l = 1, 2, \ldots, L$, we assume there are in total $S_l$ factors $\{w_l^{(s)}\}_{s=1,2,\ldots,S_l}$, and each observation $i$ is mapped into one of these factors by a known function $c_l(\cdot)$, such that $w_{il} = w_l^{(s)}$ if $c_l(i) = s$, for $s = 1, 2, \ldots, S_l$. Equivalently, the mapping function $c_l(i)$ can be represented by an $S_l$-vector $t_{il}$ where exactly the $c_l(i)$th element is one and the others are zero (see also Figure 2 for an example). Let $w = \{w_l^{(s)}\}_{l=1,2,\ldots,L;\, s=1,2,\ldots,S_l}$ denote the collection of random effects across all levels and all factors, which are assumed to be independent across $l$ and $s$. We also assume their distribution and density functions are prespecified by $\Phi(\cdot)$ and $\phi(\cdot)$ with no extra parameters such that

$$\Phi(w) = \prod_{l=1}^{L} \prod_{s=1}^{S_l} \Phi_l(w_l^{(s)}), \qquad \phi(w) = \prod_{l=1}^{L} \prod_{s=1}^{S_l} \phi_l(w_l^{(s)}). \tag{3}$$

Similar to the covariates $x_i$, we assume the random effects $w_i$ influence only the gating function. In addition, we assume there are coefficients $\beta_j$, $j = 1, 2, \ldots, g$, multiplied to the random effects, which serve as scaling factors that also affect the gating functions and add to the modeling flexibility by compensating for the lack of parameters in $\Phi(\cdot)$. Consequently, for the Mixed LRMoE model, the gating function, given covariates $x_i$, realization of random effects $w_i$, and parameters $(\alpha, \beta)$, is specified by

$$\pi_j(x_i, w_i; \alpha, \beta) = \frac{\exp(\alpha_j^T x_i + \beta_j^T w_i)}{\sum_{j'=1}^{g} \exp(\alpha_{j'}^T x_i + \beta_{j'}^T w_i)}, \quad j = 1, 2, \ldots, g. \tag{4}$$

Unlike the gating functions, the expert functions are assumed to be independent of both the covariates $x_i$ and the random effects $w_i$, as illustrated in Figure 1. Note this is the same assumption used in the LRMoE model without random effects. Consequently, given the realization of random effects $w$, the likelihood function of Mixed LRMoE is

$$L(\alpha, \beta, \Psi; X, Y \mid w) = \prod_{i=1}^{n} \left\{ \sum_{j=1}^{g} \pi_j(x_i, w_i; \alpha, \beta) f_j(y_i; \psi_j) \right\}, \tag{5}$$

while the likelihood with random effects integrated out is given by

$$L(\alpha, \beta, \Psi; X, Y) = E_w\left[ L(\alpha, \beta, \Psi; X, Y \mid w) \right] = \int L(\alpha, \beta, \Psi; X, Y \mid w)\, \phi(w)\, dw, \tag{6}$$

where the subscript of the expectation operator indicates the expectation is calculated by integrating out $w$ with respect to $\phi(\cdot)$.

We conclude the introduction of Mixed LRMoE with a remark on its formulation and a comparison with previous works which attempt to incorporate random effects in the general MoE framework. In the statistical literature, Yau et al. (2003) propose a two-component MoE with random effects in both the logit gating function and normal experts. Ng and McLachlan (2007) consider a similar framework but use Bernoulli experts for a classification problem, while Ng and McLachlan (2014) add random effects only to the expert functions. For the application in insurance contexts, we focus on a special subclass of Mixed MoE model where random effects only influence the latent class probabilities through the gating function, while the expert functions are kept independent of covariates and random effects. Besides possessing the same level of modeling flexibility (see Section 3.3), this simplified model structure leads to an easier implementation of parameter estimation. As will be evident in Section 4, since the estimation procedures of gating and expert functions can be separated to some extent, the Mixed LRMoE model actually allows for more flexible choices and combinations of expert functions which are customized to different modeling problems (see also Section 6). By restricting the random effects to only the gating functions, we are able to develop a unified estimation algorithm which caters for different choices and combinations of expert functions.
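As an illustration of the gating function with random effects and of the marginal likelihood with random effects integrated out, the sketch below considers one policyholder with a single scalar random effect shared across two policy years. It is a plain Monte Carlo approximation written for exposition, not the estimation algorithm of Section 4. The standard normal distribution for the random effect, the Poisson experts, and all names and values are assumptions made for this example.

```python
import math
import random


def poisson_pmf(y, lam):
    """Poisson probability mass function, used here as the expert f_j."""
    return math.exp(-lam + y * math.log(lam) - math.lgamma(y + 1))


def mixed_gating(x, w, alphas, betas):
    """Gating probabilities: softmax of alpha_j^T x + beta_j * w (scalar w)."""
    scores = [sum(a * xi for a, xi in zip(alpha_j, x)) + beta_j * w
              for alpha_j, beta_j in zip(alphas, betas)]
    top = max(scores)
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [v / total for v in weights]


def conditional_lik(rows, w, alphas, betas, lambdas):
    """Conditional likelihood for one policyholder's rows, given the shared w."""
    lik = 1.0
    for x, y in rows:
        pi = mixed_gating(x, w, alphas, betas)
        lik *= sum(p * poisson_pmf(y, lam) for p, lam in zip(pi, lambdas))
    return lik


def marginal_lik_mc(rows, alphas, betas, lambdas, n_draws=5000, seed=0):
    """Monte Carlo estimate of the marginal likelihood, with w ~ N(0, 1)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_draws):
        w = rng.gauss(0.0, 1.0)  # the same draw is shared by every policy year
        total += conditional_lik(rows, w, alphas, betas, lambdas)
    return total / n_draws


# Hypothetical policyholder: covariates (intercept, age) and claim counts.
rows = [([1.0, 30.0], 0), ([1.0, 31.0], 2)]
alphas = [[0.2, -0.01], [0.0, 0.0]]
betas = [1.5, 0.0]
lambdas = [0.05, 0.80]
estimate = marginal_lik_mc(rows, alphas, betas, lambdas)
```

Because the same draw of the random effect enters every row of the policyholder, the two policy years are dependent after integration, even though they are independent given the random effect.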

3.3 | Denseness property of the Mixed LRMoE
The most important property of the Mixed LRMoE is the denseness property, which justifies the flexibility of the proposed model in capturing a broad range of complex multilevel data characteristics. While the theoretical result has been rigorously developed by Fung and Tseung (2022), we hereby briefly describe and interpret the result without extensive mathematical treatments.
Let $F(Y \mid X; \alpha, \beta, \Psi)$ be the joint distribution function of $Y$ given $X$ under the proposed Mixed LRMoE model, which is given by

$$F(Y \mid X; \alpha, \beta, \Psi) = E_w\left[ \prod_{i=1}^{n} \left\{ \sum_{j=1}^{g} \pi_j(x_i, w_i; \alpha, \beta) F_j(y_i; \psi_j) \right\} \right], \tag{7}$$

where $F_j(y_i; \psi_j)$ is the distribution function of the $j$th expert. Denote $H(Y \mid X)$ as the joint distribution of $Y$ given $X$ under an arbitrary mixed effects model. Under some mild regularity conditions, Fung and Tseung (2022) prove that for any target mixed effects model $H(Y \mid X)$, there exists a sequence of model parameters $\{(\alpha^{(k)}, \beta^{(k)}, \Psi^{(k)})\}_{k=1,2,\ldots}$ such that

$$F(Y \mid X; \alpha^{(k)}, \beta^{(k)}, \Psi^{(k)}) \to H(Y \mid X) \quad \text{as } k \to \infty. \tag{8}$$

Note that $H(Y \mid X)$ may carry very complicated model characteristics, including but not limited to the joint loss distribution (e.g., distributional multimodality and dependence across business lines), the regression link (e.g., nonlinear or interactive influence of policyholder attributes to the losses), the random intercept (e.g., latent impacts to each policyholder), and the random slope (e.g., random effects interact with policyholder attributes). As a result, the denseness theorem justifies the versatility of the proposed Mixed LRMoE in simultaneously capturing all these features to an arbitrary degree of accuracy. Moreover, the denseness theorem only requires that $\Phi(\cdot)$ is continuous. Hence, one has the freedom to choose any continuous distributions for the random effects without impeding the flexibility of the Mixed LRMoE. Motivated by the computational convenience (see Section 4), we select $\Phi_l(\cdot)$ (Equation 3) to be a standard normal distribution, such that $\Phi(\cdot)$ follows a multivariate standard normal distribution.

3.4 | A posteriori risk classification and ratemaking
In Section 3.2, the Mixed LRMoE modeling framework has been introduced in its most general form. The application of a posteriori risk classification and ratemaking is a special case with $L = 1$, that is, there are only policyholder-level random effects $\{w_1^{(s)}\}_{s=1,2,\ldots,N_0}$. Consequently, the sample size $n$ is the total number of policy-year observations out of $N_0$ unique policyholders, such that each factor $w_1^{(s)}$, $s = 1, 2, \ldots, N_0$, represents the individual risk of one unique policyholder. An illustration for one such policyholder with 3 years of claim history is shown in Figure 2. Suppose we would like to conduct a posteriori risk classification and ratemaking for year 3 based on the previous 2 years. The claim history is represented by two rows in the data set, that is, $(x_1, w_1, y_1)$ and $(x_2, w_2, y_2)$, while the future claim to be predicted is represented by yet another row of data $(x_3, w_3, y_3)$, with $w_1 = w_2 = w_3 = (w_1^{(1)})$, assuming this individual is encoded as the first unique policyholder in the portfolio (thus the superscript for $w_1^{(1)}$).
Similar to many previous works such as those cited in Section 1, our paper also utilizes random effects for modeling temporal dependence among different policy years of the same policyholder, but we have done so in a slightly different fashion. Many previous papers have proposed mixed models whereby certain model parameters are shared across different observations. For example, one may assume the claim frequency $N_{it}$ of policyholder $i$ in the $t$th year follows $\text{Poisson}(\theta_{it})$, and then use the observed data $\{N_{it} : t = 1, 2, \ldots\}$ to infer the posterior of the intensity parameter. In contrast, our formulation of the Mixed LRMoE treats the random effects $w$ in a similar way as the fixed effects $x_i$, which essentially serve as a regressor in the gating function. Rather than imposing certain changing dynamics on model parameters, the formulation of Mixed LRMoE actually resembles, to a large extent, classical approaches of longitudinal data modeling with random effects; see, for example, Diggle et al. (2002) and Fitzmaurice et al. (2012).
While the denseness property guarantees the modeling flexibility of Mixed LRMoE, some previous works in a posteriori risk classification and ratemaking have investigated other formulations and assumptions of random effects; for example, Boucher and Denuit (2006) considered the potential dependence of random effects on the covariates, and Bolancé et al. (2007) imposed temporal dynamics on the random effects. In contrast, our proposed framework makes more simplified assumptions, that is, independence between random effects and covariates, as well as the same realization of random effects over different policy years. Relaxing these assumptions will lead to different interpretations of the model at various degrees of complexity, which consequently may (or may not) result in significantly different risk classification and ratemaking decisions. In this paper, we will focus on the formulation of Mixed LRMoE presented in Section 3.4 and leave these model extensions for future research.
In practice, a framework for a posteriori risk classification and ratemaking should account for time-varying covariates, such as the policyholder's age. Consider the example in Figure 2 for a policyholder with 2 years of history whereby their age is increasing annually (say, 30 and 31 years old). In the Mixed LRMoE, this policyholder's experience is represented by two separate rows with covariates $x_1, x_2$ and responses $y_1, y_2$. The values for age in $x_1, x_2$ are correspondingly filled with 30 and 31. However, these two rows of data are not independent, because they are describing the same policyholder (thus the same factor in the policyholder-level random effect). In particular, the corresponding random effects $w_1 = w_2 = (w_1^{(1)})$ are the same 1-length vector of random effect. This can be equivalently described by the mapping function $c_1(1) = c_1(2) = 1$, where $w_1^{(1)}$ indicates the unobserved risks of this policyholder. When making predictions for the upcoming policy year, we would represent the same policyholder by yet another row of data, say with covariates $x_3$ and random effects $w_3$. The age in $x_3$ will be valued at 32, while $w_3 = (w_1^{(1)})$ remains the same for the same policyholder, which creates dependency between the past and the future. In short, our framework treats time-varying covariates as regular ones, whose fixed effects are reflected by the corresponding entries in the regression coefficients $\alpha$, while the temporal dependence among claim experiences is accounted for by the random effects $w$.

Figure 2. Example of a Mixed LRMoE model with $L = 1$ level of random effects on $N_0$ unique policyholders. A policyholder with 3 years of claim history is represented by three separate yet dependent observations in the data set, where the dependence is modeled by sharing the same factor in the random effect $w$. Assuming this individual is encoded as the first factor $w_1^{(1)}$, the mapping vectors are $t_{11} = t_{21} = t_{31} = (1, 0, \ldots, 0)^T$.
Another practical issue in a posteriori risk classification and ratemaking is the varying length of available claim history per policyholder, in that an insurer's portfolio rarely remains unchanged as policyholders come in and out of the insured population. Similar to standard mixed effects models, our framework is able to handle imbalanced data, that is, policyholders with varying lengths of claim history. As detailed in the preceding example, policyholders with a longer claim history will have more rows of $(x_i, y_i)$ to represent their claim history, all of which share the same policyholder-level random effects. Naturally, a longer claim history is desirable for obtaining more accurate posterior inference on the random effects, which may yield better results for a posteriori risk classification and ratemaking.
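The data layout described in this subsection, with one row per policy year and a shared policyholder-level factor, can be sketched as follows. The helper names and the policyholder identifiers are hypothetical; the sketch only shows how the mapping function $c_1(\cdot)$ and the one-hot mapping vectors could be built for an imbalanced portfolio with $L = 1$.

```python
def mapping_function(policyholder_ids):
    """Return c_1(i) for each row i (1-based factor index) and the count N_0.

    Factors are numbered in order of first appearance, so the first unique
    policyholder in the data set is encoded as factor 1.
    """
    factor_of = {}
    c = []
    for pid in policyholder_ids:
        if pid not in factor_of:
            factor_of[pid] = len(factor_of) + 1
        c.append(factor_of[pid])
    return c, len(factor_of)


def mapping_vectors(c, n_factors):
    """One-hot vectors t_i1 of length N_0 with a one in position c_1(i)."""
    return [[1 if s == ci else 0 for s in range(1, n_factors + 1)] for ci in c]


# Hypothetical imbalanced portfolio: policyholder "A" has three policy years
# and policyholder "B" has two; each entry is one row (one policy year).
ids = ["A", "A", "B", "A", "B"]
c, n_factors = mapping_function(ids)
t = mapping_vectors(c, n_factors)
```

Rows that map to the same factor share one random effect, so a policyholder with a longer history simply contributes more rows pointing to the same position.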

4 | PARAMETER ESTIMATION
In this section, we develop a stochastic variational ECM algorithm for estimating model parameters and for inferring the posterior distribution of random effects for Mixed LRMoE. We first present an overview of VI methods in general, and then provide details of the implementation for Mixed LRMoE with one single type of random effect. Discussion on model identifiability, model selection, and generalization of this algorithm is given at the end of this section.

4.1 | Overview of variational inference
In this subsection, we first provide an overview and motivation of VI methods. We start with the exact posterior distribution of random effects $w$, which may be complicated due to the dependence on both the model parameters $(\alpha, \beta, \Psi)$ and the observed data $(X, Y)$. To circumvent this numerical challenge, we assume the exact posterior can be reasonably approximated by a variational distribution $q(w; \Theta)$, where $\Theta$ denotes the variational parameters, which are assumed to be independent of the model parameters and observed data. This produces a numerically more tractable lower bound of the marginal likelihood in Equation (6), also known as the Evidence Lower Bound (ELBO) in the VI literature. More specifically, by taking the logarithm of Equation (6), utilizing the variational distribution, and applying Jensen's inequality, we obtain the following ELBO of the marginal loglikelihood:

$$\log L(\alpha, \beta, \Psi; X, Y) = \log E_{q}\left[ L(\alpha, \beta, \Psi; X, Y \mid w) \frac{\phi(w)}{q(w; \Theta)} \right] \geq E_{q}\left[ \log L(\alpha, \beta, \Psi; X, Y \mid w) \right] - \text{KL}[q(w; \Theta) \,\|\, \phi(w)] =: \ell(\alpha, \beta, \Psi, \Theta; X, Y), \tag{9}$$
where $\mathrm{KL}[q(w; \Theta) \,\|\, \phi(w)]$ is the Kullback-Leibler (KL) divergence between the variational posterior $q(w; \Theta)$ and the prior $\phi(w)$ of random effects. Instead of directly maximizing the marginal likelihood in Equation (6), we aim to maximize the ELBO $\ell(\alpha, \beta, \Psi, \Theta; X, Y)$ in Equation (9), hoping that the optimal parameters which maximize this lower bound are close to the true optimal parameters which maximize the actual loglikelihood. The main advantage is the tractability of the approximate posterior of random effects $w$, which is essentially specified by parameters $\Theta$ independent of all the other model parameters and observed data. As will be evident in Section 4.2, sampling from the approximated posterior is easier and faster than MCMC methods, since the latter works with a more complex exact posterior and typically requires a burn-in period. This may offer significant numerical efficiency, especially in high-dimensional cases where there are many types of random effects and each type of random effect has many levels. Meanwhile, the obvious trade-off is that we obtain only approximated solutions for the estimated model parameters and approximated posterior distributions of the random effects. While the goodness of approximation and convergence properties for VI remain an open problem (see, e.g., Blei et al., 2017), our numerical simulations in Supporting Information Appendix B and real data analysis in Section 5 show promising results. This may serve as empirical evidence for applying VI methods to insurance problems where an approximated solution may be acceptable in the presence of large data sets.
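For readers who want the bound written out, the derivation described above (logarithm, variational distribution, Jensen's inequality) leads to a lower bound of the following standard form. This is a sketch rather than a reproduction of Equation (9): the symbol $L(\alpha, \beta, \Psi; X, Y \mid w)$ for the likelihood conditional on the random effects is an assumed notation, since Equation (6) is not shown in this section.

```latex
\begin{aligned}
\log L(\alpha, \beta, \Psi; X, Y)
  &= \log \int L(\alpha, \beta, \Psi; X, Y \mid w)\, \phi(w)\, dw \\
  &= \log \mathbb{E}_{q(w;\Theta)}\!\left[
       \frac{L(\alpha, \beta, \Psi; X, Y \mid w)\, \phi(w)}{q(w;\Theta)} \right] \\
  &\ge \mathbb{E}_{q(w;\Theta)}\!\left[ \log L(\alpha, \beta, \Psi; X, Y \mid w) \right]
       - \mathrm{KL}\!\left[ q(w;\Theta) \,\|\, \phi(w) \right] \\
  &=: \ell(\alpha, \beta, \Psi, \Theta; X, Y).
\end{aligned}
```

The inequality is Jensen's inequality applied to the concave logarithm, and the gap between the two sides equals the KL divergence between $q(w; \Theta)$ and the exact posterior of $w$.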
For VI, one needs to specify a family of parametric distributions for the approximated posterior $q(w; \Theta)$. In this paper, we follow standard practices and use the mean-field variational family, whereby the posterior of latent variables, that is, random effects $w$, is a factorized multivariate normal distribution. More specifically, we assume the posterior of $w_l^{(s)}$ is a normal distribution with mean $\mu_l^{(s)}$ and standard deviation $\sigma_l^{(s)}$ for $s = 1, 2, \ldots, S_l$ and $l = 1, 2, \ldots, L$, which are independent across all levels $l$ and all factors $s$. Mathematically,

$$q(w; \Theta) = \prod_{l=1}^{L} \prod_{s=1}^{S_l} \mathcal{N}\big(w_l^{(s)};\, \mu_l^{(s)},\, (\sigma_l^{(s)})^2\big).$$

For notational convenience, we write $\Theta = \{(\mu_l, \Sigma_l)\}_{l=1,2,\ldots,L}$, where $\mu_l = (\mu_l^{(1)}, \mu_l^{(2)}, \ldots, \mu_l^{(S_l)})$ is the posterior mean vector and $\Sigma_l = \mathrm{diag}\big((\sigma_l^{(1)})^2, (\sigma_l^{(2)})^2, \ldots, (\sigma_l^{(S_l)})^2\big)$ the diagonal covariance matrix for the $l$th level of random effect.
When $L = 1$, given the factorization of likelihood across $s = 1, 2, \ldots, S_1$, different factors of the same level of random effect are in fact independent, both in the prior and the posterior distribution. Hence, in our application of the Mixed LRMoE with only policyholder-level random effects, the only source of error of VI is the approximation of the exact posterior by a normal distribution. However, when there are multiple types of random effects, especially in the case of certain dependence structures (e.g., multiple crossed random effects), the independence assumption in the mean-field variational family may create an additional source of approximation error.
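Two properties make the mean-field normal family convenient in practice: draws can be generated by shifting and scaling standard normal noise, and the KL divergence against a standard normal prior has a closed form. The following is a minimal sketch of both, assuming a standard normal prior as used in this paper; the function names and the numerical values of the means and standard deviations are hypothetical.

```python
import numpy as np

def kl_diag_normal_vs_standard(mu, sigma):
    """KL[N(mu, diag(sigma^2)) || N(0, I)], summed over all coordinates."""
    mu = np.asarray(mu, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    return 0.5 * np.sum(sigma**2 + mu**2 - 1.0 - 2.0 * np.log(sigma))

def sample_mean_field(mu, sigma, n_draws, rng):
    """Draw w = mu + sigma * eps with eps ~ N(0, I); each row is one draw."""
    mu = np.asarray(mu, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    eps = rng.standard_normal((n_draws, mu.size))
    return mu + sigma * eps

rng = np.random.default_rng(0)
mu = np.array([0.5, -1.0, 0.0])    # hypothetical posterior means
sigma = np.array([0.8, 0.3, 1.0])  # hypothetical posterior standard deviations

draws = sample_mean_field(mu, sigma, 10000, rng)
kl = kl_diag_normal_vs_standard(mu, sigma)
print("sample means:", draws.mean(axis=0))
print("KL to the standard normal prior:", kl)
```

Because the coordinates are independent, the KL term is a sum over levels, so it can be evaluated for portfolios with many policyholders without any numerical integration.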

| A stochastic variational ECM algorithm
With the approach of VI and the choice of the mean-field variational family $q(w; \Theta)$, we now develop a stochastic variational ECM algorithm for estimating the model parameters $(\alpha, \beta, \Psi)$, as well as inferring the posterior of random effects $w$ represented by the variational parameters $\Theta = \{(\mu_l, \Sigma_l)\}_{l=1,2,\ldots,L}$. At a high level, our estimation algorithm proceeds in an iterative manner which seeks to conditionally maximize the ELBO in Equation (9) with respect to one set of parameters while keeping others fixed. Consequently, the algorithm will ultimately arrive at a local optimum for the ELBO of the marginal loglikelihood. First, we initialize the model parameters $(\alpha, \beta, \Psi)$ using the clusterized method of moments, similar to, for example, Gui et al. (2018). Meanwhile, the variational parameters $\Theta$ can be initialized such that $\mu_l = 0$ and $\Sigma_l = I$ for $l = 1, 2, \ldots, L$ (i.e., assuming a multivariate standard normal distribution), which is consistent with standard practices in the VI literature. Then, our algorithm iterates through the following steps until convergence. A detailed description of these steps can be found in Supporting Information Appendix A.1.
In addition to the estimated model parameters $(\hat{\alpha}, \hat{\beta}, \hat{\Psi})$, our algorithm also yields the variational parameters $\hat{\Theta} = \{(\hat{\mu}_l, \hat{\Sigma}_l)\}_{l=1,2,\ldots,L}$, which completely specify the approximated posterior distribution of random effects $w$. For illustration purposes, Supporting Information Appendix B contains two simulation studies which show our proposed algorithm can recover both model parameters and the realizations of random effects to a reasonable degree. For applications such as a posteriori risk classification and ratemaking, despite no closed-form formulas for various quantities of interest such as the posterior mean of response $y_i$ (see also Section 5), their approximated values can be efficiently calculated by sampling from the variational posterior distribution, which is assumed to be multivariate normal. Note the approximated posterior distribution of $y_i$ still retains a similar mixture structure as Equation (7), whereby the integration is now with respect to the approximated posterior $q(w)$ rather than the prior $\phi(w)$.
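The sampling-based approximation mentioned above can be sketched as follows for a single policyholder with one scalar random effect. This is an illustration under stated assumptions, not the authors' implementation: the gating logits are taken to be $\alpha_j^\top x + \beta_j w$ with the last class as reference, the experts are zero-inflated Lognormal distributions, and every numerical value (covariates, coefficients, expert parameters, variational mean and standard deviation) is hypothetical.

```python
import numpy as np

def softmax(z):
    """Row-wise softmax with max-subtraction for numerical stability."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def approx_posterior_mean(x, alpha, beta, p_zero, mu_ln, sigma_ln,
                          w_mean, w_sd, n_draws=20000, seed=0):
    """Monte Carlo estimate of E[y] under the variational posterior of w."""
    rng = np.random.default_rng(seed)
    # Draw the random effect from N(w_mean, w_sd^2).
    w = w_mean + w_sd * rng.standard_normal(n_draws)
    # Gating logits per draw and class: alpha_j . x + beta_j * w.
    logits = alpha @ x + np.outer(w, beta)
    class_probs = softmax(logits)
    # Mean of each zero-inflated Lognormal expert.
    expert_mean = (1.0 - p_zero) * np.exp(mu_ln + 0.5 * sigma_ln**2)
    # Average the mixture mean over the posterior draws.
    return float(np.mean(class_probs @ expert_mean))

# Hypothetical three-class model; the last class is the reference class.
x = np.array([1.0, 0.3])
alpha = np.array([[-2.0, 0.5], [-0.5, 0.2], [0.0, 0.0]])
beta = np.array([1.0, 0.5, 0.0])
p_zero = np.array([0.90, 0.97, 0.99])
mu_ln = np.array([9.0, 8.0, 7.0])
sigma_ln = np.array([1.0, 0.8, 0.5])

premium_low = approx_posterior_mean(x, alpha, beta, p_zero, mu_ln, sigma_ln,
                                    w_mean=-0.5, w_sd=0.4)
premium_high = approx_posterior_mean(x, alpha, beta, p_zero, mu_ln, sigma_ln,
                                     w_mean=1.5, w_sd=0.4)
print(premium_low, premium_high)
```

In this toy setting the riskiest class has the largest coefficient on the random effect, so a larger posterior mean of the random effect shifts weight toward that class and raises the approximated premium.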

| Model identifiability and selection
As with many mixture models, certain restrictions are imposed for the Mixed LRMoE to be identifiable when conducting parameter estimation. To avoid label-switching between latent components (see, e.g., Fung et al., 2019a; Jiang & Tanner, 1999), we fix $\alpha_g = 0$ and $\beta_g = 0$ as vectors of zeros, so the last latent class serves as a reference class. In addition, we fix $\beta_1 = 1$ as a vector of ones to avoid arbitrary scaling of magnitude and switching of positive and negative signs of the random effects $w$. Consequently, we need to estimate the coefficients $\beta$ multiplied to the random effects only when there are at least three latent classes (see the examples in Supporting Information Appendix B).
Model selection when parameters are estimated using VI remains an open problem in general. One may accept the ELBO as a good approximation of the marginal likelihood and use it as the basis of model selection, but this has not been justified in theory (Blei et al., 2017). Other approaches include sequential selection (Sato, 2001), cross validation (Nott et al., 2012), and Generalized Evidence Bounds (Chen et al., 2018). For the purpose of this paper, we take a more practical approach by using the standard train-test split and examining the approximated loglikelihood and ELBO on the test set to obtain a conservative gauge of goodness-of-fit. Examples are given in the real data analysis in Section 5.

| REAL DATA ANALYSIS
In this section, we apply the Mixed LRMoE model to a real automobile insurance data set for a posteriori risk classification and ratemaking, and then compare its performance with a number of benchmark models. More specifically, we will investigate whether the Mixed LRMoE model can outperform benchmark models like Logistic and Lognormal GL(M)M and LRMoE without random effects in terms of goodness-of-fit. We will also investigate whether the Mixed LRMoE produces reasonable results for a posteriori risk classification and ratemaking, that is, policyholders who made claims in the past should generally be considered riskier and should be assigned a higher a posteriori premium.

| Description of data
The data set contains the Bodily Injury (BI) claim history of 76,049 unique policyholders from policy years 2014 to 2019 (330,781 records in total) of a major North American automobile insurer.
Since we are only working with a one-dimensional response, it will be represented by $y_i$ in this section. The description of available covariates $x_i$ and the summary statistics of the response $y_i$ are given in Table 1. We observe the loss distribution has significant zero inflation and a heavy tail in certain policy years. There also seems to be an increasing trend of claim severity over the years, in addition to varying degrees of skewness and kurtosis. The empirical distribution of positive losses is also plotted in Figure 3, which shows slightly different shapes across policy years.
Within the period of 2014-2019, the lengths of policyholders' available claim history vary from 1 year (new contracts) to 6 years (multiple renewals). For the purpose of this section, we will limit ourselves to a subset of policyholders with at least 3 years of claim history, whereby the last available year will be used as a holdout testing set and all preceding years will be used as a training set for model fitting. This filtering step will ensure all policyholders have at least some history (2 years at a minimum) for inferring the distribution of random effects. In a preliminary analysis whereby we use a train-test split based on calendar years (i.e., 2014-2017/2018 for training and 2018/2019 for testing), the differences in loss distributions (both frequency and severity) across the years make it very difficult to gauge and compare model performances. Our train-test split based on policyholder-level history will also smooth out some of the distributional differences across calendar years. Furthermore, 20% of the policyholders from the training period are randomly selected as the validation set for selecting the number of components in the (Mixed) LRMoE model. In the end, there are 203,579/50,831/75,742

TABLE 1 Overview of real data set.

Covariate    Range            Description
$x_{i0}$     1                Intercept. Baseline for Female drivers and Rural region.
$x_{i4}$     [9.35, 12.82]    Natural logarithm of vehicle price. Mean is 10.28 and median is 10.28.
…            …                Indicator for policies issued in the Capital. Mean is 0.09.

For illustration purposes, we will model the total amount of loss per year. As a benchmark, we will consider various combinations of GLM and GLMM against which we compare the proposed Mixed LRMoE model. For these benchmark models, we assume independence between claim frequency and severity. We use a probability mass $\delta_i(x_i)$ at zero for no occurrence of claims and a continuous distribution $g_i(y_i; x_i)$ for the total loss amount given there is at least one claim. Consequently, using $I\{y_i = 0\}$ and $I\{y_i > 0\}$ as indicators for the occurrence of claims, the distribution of total loss of policyholder $i$ is given by

$$f(y_i; x_i) = \delta_i(x_i)\, I\{y_i = 0\} + \big(1 - \delta_i(x_i)\big)\, g_i(y_i; x_i)\, I\{y_i > 0\},$$

where both $\delta_i(x_i)$ and $g_i(y_i; x_i)$ may be modeled by either GLM or GLMM. In the case of GLMM, we will add policyholder-level random effects with 60,594 levels, which corresponds to the number of unique policyholders in the training data set. For the claim probability, we use the standard Logistic GL(M)M with the logit link function. For the claim severity, we choose the Lognormal GL(M)M with the log link function, which shows the best fit to data after some initial experimentation.
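The two-part benchmark described above can be sketched in a few lines. This is a minimal illustration assuming a logistic model for the claim probability and a Lognormal density for positive losses; the function names and all numerical values are hypothetical, and in the GLMM variants the linear predictors would additionally contain a policyholder-level random intercept.

```python
import math

def claim_probability(eta):
    """Inverse logit link: probability of at least one claim."""
    return 1.0 / (1.0 + math.exp(-eta))

def lognormal_log_pdf(y, mu, sigma):
    """Log density of a Lognormal(mu, sigma) distribution at y > 0."""
    return (-math.log(y) - math.log(sigma) - 0.5 * math.log(2.0 * math.pi)
            - (math.log(y) - mu) ** 2 / (2.0 * sigma ** 2))

def two_part_log_density(y, eta, mu, sigma):
    """Log density of total loss: point mass at zero plus Lognormal severity."""
    p_claim = claim_probability(eta)
    if y == 0:
        return math.log(1.0 - p_claim)
    return math.log(p_claim) + lognormal_log_pdf(y, mu, sigma)

def two_part_mean(eta, mu, sigma):
    """Expected total loss: claim probability times mean positive loss."""
    return claim_probability(eta) * math.exp(mu + 0.5 * sigma ** 2)

# Hypothetical linear predictors for one policy year.
eta, mu, sigma = -3.6, 8.0, 1.2
print(two_part_log_density(0.0, eta, mu, sigma))
print(two_part_log_density(2500.0, eta, mu, sigma))
print(two_part_mean(eta, mu, sigma))
```

Summing `two_part_log_density` over all policy years gives the loglikelihood of one benchmark combination, which is the quantity compared across models in the goodness-of-fit tables.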
For the models to investigate, we will consider (Mixed) LRMoE with zero-inflated (ZI) Lognormal expert functions. With the expert functions fixed, we only need to select the number of latent components for both LRMoE and Mixed LRMoE. We have selected a five-component LRMoE and a five-component Mixed LRMoE based on the Akaike Information Criterion (AIC) calculated on the validation data set.

| Goodness-of-fit
The fitted loglikelihood values of all benchmark models are summarized in Table 2. As expected, on the training set, the GLMM-GLMM model produces the highest loglikelihood since the policyholder-level random effects are used twice. The combinations of GLMM-GLM and GLM-GLMM offer a worse fit to data, followed by the GLM-GLM model without any random effects. Meanwhile, all benchmark models perform very similarly on the testing set. Table 3 summarizes the fitting results of the (Mixed) LRMoE models. We see that the Mixed

Besides loglikelihood values, we also look at how each model candidate fits the probability of claim and the distribution of positive losses. For the probability of claim, all model candidates offer very similar fitting performance. On the training period, all models are able to fit the observed claim probability 0.0255 to the fourth decimal place. However, on the testing period where the observed claim probability is 0.0196, all model candidates have produced a slightly higher prediction, ranging from 0.0248 to 0.0253 (or +26% to +29% of relative error), which can be attributed particularly to the lower claim frequency in year 2019 as observed in Table 1. Meanwhile, the (Mixed) LRMoE models have provided a better fit to the distribution of positive losses, as indicated by Figure 3, which compares the fitted densities against the empirical distribution. Most notably, the (Mixed) LRMoE models have successfully captured the multimodality in the distribution of positive losses, while GLM and GLMM only fit a unimodal density to the entire distribution of positive losses, and the LRMoE model without random effects fits slightly worse on smaller claims. Even though our data processing procedures have mixed up different calendar years for the testing period, note there still seems to be some distributional shift from the training to the testing period (most notably due to year 2019), so the estimated density curves from all model candidates appear to be slightly off to the left.

| Risk classification and ratemaking
For insurance pricing purposes, it is crucial that policyholders' claim history is adequately incorporated in the calculation of premium at policy renewal. In short, higher risks, as reflected by the occurrence of claim and/or higher claim amounts, should lead to a higher a posteriori premium. In this subsection, we compare the model performance in terms of a posteriori risk classification and ratemaking.
For risk classification, the latent classes in (Mixed) LRMoE models can be naturally interpreted as different clusters of policyholders based on their risk profile. To compare how risk classification is affected by claim history, we categorize all policyholders into two groups, those with at least one claim and those without any claim during the training period, and summarize their latent class probabilities in Table 4. Most notably, with the addition of random effects, the Mixed LRMoE models are able to strongly distinguish risky policyholders who have at least one claim in the past, by assigning a much higher probability to the riskiest latent class. Meanwhile, the LRMoE model without random effects only suggests a slight increase in the risky class probability based solely on covariate information, given the independence assumption for observations across different policy years.
Different decisions in a posteriori risk classification will also lead to differences in ratemaking. For a posteriori ratemaking, we calculate the premium for policy renewals in the testing period based on the posterior distribution given the claim history in the training period. For illustration purposes, we only consider the pure premium, which is equal to the probability of claim multiplied by the expected positive loss amount. At a high level, we investigate all policyholders based on the same grouping (with and without claims in the training period). The distributions of the predicted posterior premium are shown in Figure 4 for all model candidates. For models without random effects, that is, GLM-GLM and LRMoE, the predicted distributions of posterior premium for the two groups appear to be highly overlapping, which indicates that fixed effects alone cannot distinguish policyholders based on claim history. For benchmark models with random effects, namely, GLM-GLMM, GLMM-GLM, and GLMM-GLMM, there appear to be some differences between the two groups, whereby some policyholders with claim history will have a higher predicted premium. Most notably, the Mixed LRMoE model shows much larger differences between the distributions of predicted premium, which better captures the riskiness of policyholders reflected by their claim history.
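To make the pure premium calculation concrete, the sketch below decomposes it for a mixture of zero-inflated Lognormal experts, given a vector of latent class probabilities. It is an illustration only: the class probabilities for the two groups and all expert parameters are hypothetical and are not taken from the fitted models or from Table 4.

```python
import math

def pure_premium(class_probs, p_zero, mu_ln, sigma_ln):
    """Return (claim probability, mean positive loss, pure premium)."""
    # Probability of a positive loss, mixing over latent classes.
    p_claim = sum(pi * (1.0 - p0) for pi, p0 in zip(class_probs, p_zero))
    # Unconditional expected loss of the mixture.
    total = sum(pi * (1.0 - p0) * math.exp(m + 0.5 * s * s)
                for pi, p0, m, s in zip(class_probs, p_zero, mu_ln, sigma_ln))
    # Expected loss given that it is positive.
    mean_positive = total / p_claim
    return p_claim, mean_positive, p_claim * mean_positive

# Hypothetical experts, ordered from riskiest to safest.
p_zero = [0.90, 0.97, 0.99]
mu_ln = [9.0, 8.0, 7.0]
sigma_ln = [1.0, 0.8, 0.5]

# Hypothetical posterior class probabilities for the two groups.
probs_without_claims = [0.02, 0.18, 0.80]
probs_with_claims = [0.40, 0.30, 0.30]

result_without = pure_premium(probs_without_claims, p_zero, mu_ln, sigma_ln)
result_with = pure_premium(probs_with_claims, p_zero, mu_ln, sigma_ln)
print(result_without)
print(result_with)
```

Shifting class probability toward the riskiest latent class raises both the claim probability and the pure premium, which is the mechanism behind the separation of the two groups described above.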
On a more detailed level, Table 5 summarizes the predicted posterior premium, based on the two groups above in addition to the relative size of incurred total losses. We observe that the Mixed LRMoE model, as well as benchmark GLMM-GLM and GLMM-GLMM, heavily penalizes policyholders who have at least one claim, as shown by the additional premium loadings.
For both a posteriori risk classification and ratemaking discussed above, we have primarily focused on differentiating policyholders based on the occurrence of claims and the claim sizes when applicable, whereby the Mixed LRMoE model is shown to have effectively incorporated such information. However, we can still observe the effects of a priori information, that is, policyholder covariates, when determining the a posteriori premium. Most notably, in Figure 4, there is a good level of overlap between the histograms of the predicted premium for people with and without claim history, even for all model candidates with random effects. For example, certain policyholders with claim history (lower end of the orange histogram) would still be charged a lower premium than some policyholders without claim history (upper end of

| Gini Index
Next, we examine the model performance using the Gini Index as a measurement of adequacy for insurance risk scoring (see, e.g., Frees et al., 2011). We first plot the Ordered Lorenz Curve in Figure 5 for both the training and testing sets, where the x-axis represents the cumulative percentage of premium and the y-axis represents the cumulative percentage of the incurred losses during the training or testing period. In the training set, the Mixed LRMoE model produces an Ordered Lorenz Curve farthest away from the 45° Line of Equality, which represents a null model where all policyholders are assigned the same premium. This indicates the Mixed LRMoE yields the highest degree of differentiation of policyholders based on their relative riskiness. Meanwhile, in the testing set, it may be difficult to visually compare the model candidates. Hence, we rely on the Gini Index, calculated as twice the area between the Ordered Lorenz Curve and the Line of Equality, as a measurement of model performance for comparison.
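The calculation described above can be sketched as follows. This is a minimal version assuming policyholders are ordered by the ratio of a model's predicted premium to a base premium, with a constant base premium (the null model) as the default; the function name and the toy inputs are hypothetical.

```python
import numpy as np

def ordered_lorenz_gini(score, loss, base_premium=None):
    """Ordered Lorenz Curve points (x, y) and the corresponding Gini Index."""
    score = np.asarray(score, dtype=float)
    loss = np.asarray(loss, dtype=float)
    if base_premium is None:
        base = np.ones_like(score)  # null model: same premium for everyone
    else:
        base = np.asarray(base_premium, dtype=float)
    # Order policyholders from lowest to highest relative premium.
    order = np.argsort(score / base, kind="stable")
    # Cumulative shares of base premium (x) and incurred losses (y).
    x = np.concatenate(([0.0], np.cumsum(base[order]) / base.sum()))
    y = np.concatenate(([0.0], np.cumsum(loss[order]) / loss.sum()))
    # Trapezoidal area under the curve; the Line of Equality has area 0.5.
    area_under = np.sum((x[1:] - x[:-1]) * (y[1:] + y[:-1]) / 2.0)
    return x, y, 2.0 * (0.5 - area_under)

# Toy example: the only loss falls on the policyholder with the highest score.
_, _, gini_informative = ordered_lorenz_gini([1.0, 2.0, 3.0, 4.0],
                                             [0.0, 0.0, 0.0, 10.0])
# Losses that are equal across policyholders lie on the Line of Equality.
_, _, gini_flat = ordered_lorenz_gini([1.0, 2.0, 3.0, 4.0],
                                      [1.0, 1.0, 1.0, 1.0])
print(gini_informative, gini_flat)
```

A curve that bows further below the Line of Equality means the lowest-priced policyholders account for a smaller share of losses, hence a larger Gini Index.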
We implement a bootstrapping procedure to obtain the exact distributions and comparisons of Gini Index values (see also Corollary 3 of Frees et al., 2011, for an asymptotic version). More specifically, we obtain 10,000 bootstrapped samples of the training and testing sets, from which the Ordered Lorenz Curves are produced and the corresponding Gini Index values are calculated. We note the potential correlation of Gini Index values produced by all model candidates, since they are all regression models based on the same set of covariates. Consequently, broadly similar policyholders (e.g., low-risk vs. high-risk) will be assigned similar premium values. The difference in performance will stem from a finer differentiation within broadly similar policyholders (e.g., high-risk vs. higher-risk), by better capturing the nonlinear regression relationship and/or the latent, unobserved risks. Table 6 summarizes the bootstrapped Gini Index values of all model candidates, as well as the pairwise comparisons of their differences based on both two-sided and one-sided tests against zero.
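A paired bootstrap of this kind can be sketched as below. Both models are evaluated on the same resampled policyholders, so the correlation between their Gini Index values is preserved in the differences. The data are simulated and the two score vectors are hypothetical stand-ins for model premiums; only 200 resamples are drawn here to keep the example fast, whereas the analysis in the text uses 10,000.

```python
import numpy as np

def gini_index(score, loss):
    """Gini Index of the Ordered Lorenz Curve against a constant base premium."""
    order = np.argsort(score, kind="stable")
    n = len(score)
    x = np.arange(n + 1) / n
    y = np.concatenate(([0.0], np.cumsum(loss[order]) / loss.sum()))
    area_under = np.sum((x[1:] - x[:-1]) * (y[1:] + y[:-1]) / 2.0)
    return 2.0 * (0.5 - area_under)

def bootstrap_gini_difference(score_a, score_b, loss, n_boot, rng):
    """Bootstrap distribution of Gini(model A) minus Gini(model B)."""
    n = len(loss)
    diffs = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample policyholders with replacement
        diffs[b] = (gini_index(score_a[idx], loss[idx])
                    - gini_index(score_b[idx], loss[idx]))
    return diffs

rng = np.random.default_rng(42)
n = 1000
risk = rng.lognormal(0.0, 1.0, n)      # simulated latent risk level
loss = risk * rng.exponential(1.0, n)  # simulated incurred losses
score_a = risk                         # hypothetical model that captures the risk
score_b = rng.uniform(size=n)          # hypothetical uninformative model

diffs = bootstrap_gini_difference(score_a, score_b, loss, 200, rng)
share_a_better = float(np.mean(diffs > 0))           # one-sided comparison
interval = np.percentile(diffs, [2.5, 97.5])         # two-sided comparison
print(share_a_better, interval)
```

The share of resamples with a positive difference corresponds to the one-sided comparison, while checking whether the percentile interval contains zero corresponds to the two-sided one.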
In the training set, all model candidates are significantly better than the null model, which is not surprising. The GLMM-GLMM model is the best among the benchmark models, and it also outperforms the LRMoE model by capturing unobserved policyholder-level risks with random effects. The proposed Mixed LRMoE model performs the best, showing a significant margin of outperformance against all other model candidates.
In the testing set, all model candidates are also better than the null. However, when comparing against each other, they perform quite similarly, whereby most of the two-sided tests yield insignificant results at p = 0.20. If one is interested in determining the outperformance of one model against another, a one-sided test of the positivity of the difference in Gini Index values may also be appropriate. Under this test, the GLMM-GLM model is the best among all benchmark models, while the Mixed LRMoE model may still be considered better than all others by producing a higher Gini Index at least 86% of the time.
A data drift in year 2019, which comprises 64% of the testing set, has been previously noted in Table 1 and Figure 3. To this end, we further compare the model performance by bootstrapping the testing set separately for policy years 2016-2018 and for year 2019. In the former experiment, the proposed Mixed LRMoE offers superior outperformance, assuming the testing set is generated from a distribution similar to years 2016-2018. In the latter, the margin of outperformance by Mixed LRMoE becomes less significant due to the sudden change of loss distribution. However, such unprecedented data drift is outside the scope of what statistical and predictive models can address based on historical data only.

| Comparison of individual policyholders
Apart from the analysis on a portfolio level, we also investigate the model performance on a policyholder level. In particular, we examine pairs of policyholders with similar covariates but different claim experiences, to investigate how the model candidates determine the a posteriori premium for individual policyholders. For brevity, we only retain GLMM-GLMM as the benchmark and compare it with the (Mixed) LRMoE models. We consider three pairs of policyholders, (A₁, A₂), (B₁, B₂), and (C₁, C₂), whereby each pair shares the exact same covariates but has different claim experiences, and all of them have 6 years of full history from 2014 to 2019. A₁ and A₂ are both 65-year-old males who drive a 7-year-old vehicle worth $40,100 with a collision rating of 33, and purchased their policies in the Urban region. B₁ and B₂ are both 35-year-old females who drive a 6-year-old vehicle worth $29,400 with a collision rating of 32, and purchased their policies in the Urban region. C₁ and C₂ are both 40-year-old males who drive an 8-year-old vehicle worth $24,800 with a collision rating of 29, and purchased their policies in the Urban region. As for the claim history during 2014-2018, A₁, B₁, and C₁ have no claims, while A₂, B₂, and C₂ have a total claim amount of $850, $1950, and $5704, respectively. Given that 97.9% of policyholders have no claims at all (see Table 1), these positive claims lie at the very tail of the overall loss distribution, with C₂ being close to the 99th percentile. From an a posteriori perspective, A₂, B₂, and C₂ should be considered increasingly riskier than their counterparts.

TABLE 6 Summary of Gini Index of all model candidates and their differences.
For these selected policyholders, Table 7 summarizes their a posteriori pure premium values. Since the LRMoE model does not incorporate claim history, each pair, (A₁, A₂), (B₁, B₂), and (C₁, C₂), is given the same premium value, which is not reasonable. In contrast, all other models with random effects have produced a higher a posteriori premium for A₂, B₂, and C₂, since their claim experiences during the training period are indicative of latent heterogeneous risks unobservable from covariates alone. Most notably, the Mixed LRMoE model has imposed very large penalties on policyholders B₂ and C₂, whose a posteriori premium is more than double that of a comparable policyholder without any claim history.
In addition, Figure 6 illustrates the a posteriori predictive distribution for the positive losses of these selected policyholders. For A₂, B₂, and C₂, who have made claims in the past and should be considered riskier, the Mixed LRMoE model has assigned more probability mass to the positive losses, as indicated by the elevated density functions compared with their safer counterparts. Most notably, the Mixed LRMoE model has produced much heavier tails for B₂ and C₂ than those produced by other model candidates, which contributes to the drastic increase in the corresponding a posteriori pure premium compared with B₁ and C₁.

| Economic and business implications
In the preceding subsections, we have illustrated how the proposed Mixed LRMoE outperforms the benchmark models by providing a superior fit to data, producing reasonable results for a posteriori risk classification and ratemaking, and adequately differentiating riskier policyholders from safer ones based on their claim history. Now we briefly discuss the economic and business implications of the potential application of the Mixed LRMoE model in practice.

TABLE 7 A posteriori pure premium values for sample policyholders.
As mentioned in Section 1, a well-designed framework for a posteriori risk classification and ratemaking is crucial for the insurer's profitability and risk management. With a better fit to empirically observed data shown in Section 5.2, the Mixed LRMoE can provide a more accurate description of the overall loss distribution, which lays the foundation for risk classification and ratemaking. Compared with benchmark models such as GLM and GLMM, our proposed model is flexible enough to capture complex data structures such as multimodality and heavy tails, which is particularly helpful for modeling extreme losses generated by risky policyholders. This is also illustrated by the a posteriori risk classification and ratemaking results in Sections 5.3 and 5.5, whereby riskier policyholders with large claims in the past are subject to a much higher posterior premium at policy renewal, while some safer policyholders are rewarded by a lower premium. Consequently, by capturing latent risks manifested in the claim history, policyholders are more appropriately priced (rather than mispriced) according to the Mixed LRMoE model, which results in better risk segmentation as indicated by the improved Gini Index values in Section 5.4. All these advantages of the Mixed LRMoE model will help increase the insurer's profitability and ensure better risk management.
However, in Section 5.3, certain risky policyholders with large claims in the past are very aggressively penalized by the Mixed LRMoE model, as illustrated by the drastic increase in the a posteriori pure premium. From a practical perspective, the insurer can undoubtedly expect nonrenewal of insurance policies from some of these riskier policyholders. While such nonrenewals will lead to a decrease in premium income (all else held constant), they also come with the advantage of reduced risk exposures, especially in the tail. In the meantime, safer policyholders are rewarded with potential decreases in renewal premium, which increases the likelihood of customer retention and could contribute further to the insurer's profitability, since these policyholders are less likely to incur losses after all. Consequently, this may lead to long-term changes in the composition of the insurer's portfolio, as the proportions of safe and risky policyholders are likely to change after a few years, assuming risky policyholders with claim history gradually drop out. While the insurer should constantly monitor their portfolio structure, especially after implementing a new model (whether the Mixed LRMoE or any model in general), we leave the detailed investigation and discussion on such long-term impacts for future research. We also recognize that ours is only an illustrative application of the Mixed LRMoE model for research purposes. In practice, another potential challenge is to properly communicate the a posteriori premium values to policyholders and other stakeholders, especially when policyholders have similar covariates but different claim experiences, as shown by the examples in Section 5.5. This may also have legal and regulatory implications, as well as prompt interesting academic discussions on the fairness of insurance pricing, but we will leave these issues for future research.

| CONCLUSION
In this paper, we have proposed to incorporate policyholder-level random effects in a flexible regression framework, called the Mixed LRMoE, which is then applied to the problem of a posteriori risk classification and ratemaking. Although the addition of random effects has resulted in an intractable marginal likelihood function of the model, we have developed a stochastic variational ECM algorithm for efficient estimation of model parameters and inference of the posterior of random effects, which are crucial for updating policyholders' risk profile based on their claim history. Our numerical simulation and real data analysis have demonstrated the potential of Mixed LRMoE as a powerful tool for more accurate insurance loss modeling and better a posteriori insurance risk classification and ratemaking. While our current work has already shown promising results in an illustrative example, some practical issues remain to be addressed in future research (see Section 5.6). From a technical and modeling perspective, one may consider the following extensions and directions for future work.
• In the current formulation of Mixed LRMoE, all past policy years are equally weighted by sharing the same realization of random effects. A more realistic and general approach is to apply a weighting scheme whereby recent claims are more influential in determining the posterior premium.
• We have taken the approach of modeling the total incurred loss as a mixture of ZI distributions, whereby the dependence between claim frequency and severity is not explicitly specified. An interesting extension is to incorporate such dependence in the (Mixed) LRMoE modeling framework.
• As observed in our numerical study, the shift of loss distributions over different policy years presents another challenge to a posteriori risk classification and ratemaking. This opens up potential research opportunities for modeling frameworks which account for, for example,

In general, one may specify a priori any distribution for $\Phi(\cdot)$, but a common choice for random effects is the normal distribution. In this paper, we will set each $\Phi_l(\cdot)$ to be a standard normal distribution for $l = 1, 2, \ldots, L$, which results in a multivariate standard normal distribution for $\Phi(\cdot)$, since all levels of the random effects $w_1, \ldots, w_L$ are marginally standard normal and are mutually independent. More discussions on the choice of $\Phi(\cdot)$ are given in Section 3.3.

FIGURE 1 Model structure of a three-class Mixed LRMoE model. The shaded boxes indicate the addition of random effects to the original LRMoE model to account for policyholder-level individual risks and temporal dependence among different policy years for the same policyholder. LRMoE, Logit-weighted Reduced Mixture-of-Experts.

FIGURE 4 Histogram of predicted posterior premium based on different models. Top row, from left to right: GLM-GLM, GLM-GLMM, GLMM-GLM, and GLMM-GLMM. Bottom row, from left to right: LRMoE and Mixed LRMoE. AIC, Akaike Information Criterion; BI, Bodily Injury; GLM, Generalized Linear Model; GLMM, Generalized Linear Mixed Model. [Color figure can be viewed at wileyonlinelibrary.com]

20 | TSEUNG ET AL. 15396975, 0, Downloaded from https://onlinelibrary.wiley.com/doi/10.1111/jori.12436 by University Of Toronto Libraries, Wiley Online Library on [05/07/2023]. See the Terms and Conditions (https://onlinelibrary.wiley.com/terms-and-conditions) on Wiley Online Library for rules of use; OA articles are governed by the applicable Creative Commons License

the blue histogram), which should be attributed to covariates such as the inherent risk level of certain age groups or the collision rating of a particular group of vehicles.

FIGURE 5 Comparison of the Ordered Lorenz Curves generated from various model candidates. (Left/right) Training/testing set. GLM, Generalized Linear Model; GLMM, Generalized Linear Mixed Model; LRMoE, Logit-weighted Reduced Mixture-of-Experts. [Color figure can be viewed at wileyonlinelibrary.com]

TABLE 5 Average of predicted posterior premium based on policyholders' claim history.

FIGURE 6 A posteriori predictive distributions for positive losses for sample policyholders. Top row, from left to right: A₁, B₁, and C₁. Bottom row, from left to right: A₂, B₂, and C₂. Since the LRMoE model does not consider claim history, the corresponding density functions are the same within each pair (A₁, A₂), (B₁, B₂), and (C₁, C₂), and may be viewed as a reference to compare the top and bottom rows. GLM, Generalized Linear Model; GLMM, Generalized Linear Mixed Model; LRMoE, Logit-weighted Reduced Mixture-of-Experts. [Color figure can be viewed at wileyonlinelibrary.com]

$x_{i7}$     {0, 1}           Indicator for policies issued in Urban region. Mean is 0.75.

FIGURE 3 Histogram and fitted density of positive claim distribution. (Left) Empirical density of positive losses by policy year. (Middle/right) Training/testing set. Only GLMM-GLMM is shown because all the benchmark models yield very similar results. BI, Bodily Injury; GLMM, Generalized Linear Mixed Model; LRMoE, Logit-weighted Reduced Mixture-of-Experts. [Color figure can be viewed at wileyonlinelibrary.com]

two-year contracts, or 60,594/15,148/75,742 unique policyholders, in the training/validation/testing set, respectively. Overall, 34%/22%/18%/26% of the policyholders have 3/4/5/6 years of claim history before the train-test split, while 11%/12%/13%/64% of the testing set are contracts from year 2016/2017/2018/2019, respectively.
Benchmark models for real data analysis.

LRMoE model offer much better fit to data in terms of loglikelihood on training and testing data sets, and outperforms the LRMoE model without random effects. This demonstrates the flexibility of the Mixed LRMoE as well as the advantage of incorporating policyholder-level random effects for more accurate modeling of the loss distribution. To account for model complexity, the tables also report AIC values for all model candidates, and these likewise favor the Mixed LRMoE model. Even though the Mixed LRMoE model has more parameters and hence a more complex structure, this added complexity greatly improves a posteriori risk classification and ratemaking, which is the ultimate goal in this context, as will be evident in Section 5.3.
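As a reminder of how AIC trades goodness-of-fit against the number of parameters, the following minimal sketch computes AIC = 2k - 2 log L for two candidates. The model names, loglikelihoods, and parameter counts are made-up illustrative values, not results from this paper, and the helper `aic` is a hypothetical name.

```python
def aic(log_likelihood: float, n_params: int) -> float:
    """Akaike Information Criterion, 2k - 2*loglik; lower values are preferred."""
    return 2.0 * n_params - 2.0 * log_likelihood


# Hypothetical candidates (illustrative numbers only, not taken from the paper).
candidates = {
    "simple": {"loglik": -1050.0, "n_params": 10},
    "complex": {"loglik": -1000.0, "n_params": 40},
}

# Score every candidate and pick the one with the smallest AIC.
scores = {name: aic(c["loglik"], c["n_params"]) for name, c in candidates.items()}
best = min(scores, key=scores.get)

for name, value in scores.items():
    print(f"{name}: AIC = {value:.1f}")
print("preferred by AIC:", best)
```

In this toy setting the more complex candidate is still preferred, because its gain in loglikelihood exceeds the penalty of two units per additional parameter.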
TABLE 3 (Mixed) LRMoE models for real data analysis.
Abbreviations: AIC, Akaike Information Criterion; LRMoE, Logit-weighted Reduced Mixture-of-Experts.
TABLE 4 Comparison of latent classes and the predicted probabilities by claim history.
Note: The first table summarizes the mean and standard deviation of the response by latent class, and we have manually categorized them into three risk levels. The second table compares the predicted latent class probabilities for different groups of policyholders by their claim history (No: no claim during the training period; Yes: at least one claim during the training period), calculated from different models. Abbreviation: LRMoE, Logit-weighted Reduced Mixture-of-Experts.
The cutoff points for positive claim sizes are the 33rd and 67th percentiles of their distribution. Percentages in brackets indicate the additional premium loadings compared with policyholders without any claim history, that is, Claim Indicator = No. Abbreviations: GLM, Generalized Linear Model; GLMM, Generalized Linear Mixed Model; LRMoE, Logit-weighted Reduced Mixture-of-Experts.

Note: Numbers in brackets indicate the 95%-level credible intervals estimated from 10,000 bootstrapped samples. Superscripts ***/**/* indicate that the difference is significant at the 0.05/0.10/0.20 levels, respectively, under a two-sided test against zero. Numbers below the brackets indicate the proportion of bootstrapped samples in which the model in the row outperforms the model in the column for each pairwise comparison, which is equivalent to a one-sided test of the difference against zero. Abbreviations: GLM, Generalized Linear Model; GLMM, Generalized Linear Mixed Model; LRMoE, Logit-weighted Reduced Mixture-of-Experts.
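The bootstrap comparison described in the note above can be sketched roughly as follows. This is an illustration under stated assumptions rather than the procedure used in the paper: it assumes a paired per-policy score for each model where higher is better, resamples policies with replacement, and reports a percentile interval for the mean difference. The function name `bootstrap_difference` and the synthetic data are hypothetical.

```python
import random


def bootstrap_difference(score_row, score_col, n_boot=10_000, level=0.95, seed=0):
    """Percentile bootstrap for the mean paired difference (row minus column).

    Returns (lower, upper, prop_row_better), where prop_row_better is the share
    of bootstrap samples in which the row model has the higher mean score.
    Assumption: a higher score means a better model.
    """
    if len(score_row) != len(score_col):
        raise ValueError("scores must be paired observation by observation")
    rng = random.Random(seed)
    n = len(score_row)
    diffs = [a - b for a, b in zip(score_row, score_col)]

    boot_means = []
    for _ in range(n_boot):
        sample = rng.choices(diffs, k=n)  # resample policies with replacement
        boot_means.append(sum(sample) / n)
    boot_means.sort()

    # Percentile interval: cut off alpha/2 of the sorted means in each tail.
    alpha = 1.0 - level
    lo_idx = int((alpha / 2) * n_boot)
    hi_idx = min(n_boot - 1, int((1 - alpha / 2) * n_boot))
    prop_row_better = sum(m > 0 for m in boot_means) / n_boot
    return boot_means[lo_idx], boot_means[hi_idx], prop_row_better


# Synthetic, made-up scores for two hypothetical models on 100 policies.
data_rng = random.Random(42)
col_scores = [data_rng.gauss(0.0, 1.0) for _ in range(100)]
row_scores = [c + data_rng.gauss(0.3, 0.5) for c in col_scores]

lower, upper, prop = bootstrap_difference(row_scores, col_scores, n_boot=2000)
print(f"95% interval for the difference: [{lower:.3f}, {upper:.3f}]")
print(f"share of bootstrap samples where the row model wins: {prop:.3f}")
```

An interval that excludes zero corresponds to significance under the two-sided test, while the reported share plays the role of the one-sided comparison mentioned in the note.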