Modelling the composition of household portfolios: A latent class approach

We introduce the latent class modelling approach to the analysis of financial portfolio diversification at the household level. We explore portfolio allocation in Great Britain using household panel data based on a nationally representative sample of the population, namely the Wealth and Assets Survey. The latent class aspect of the model splits households into four groups, which serves to unveil a more detailed picture of the determinants of portfolio diversification than existing econometric approaches. Our findings reveal a pattern of class heterogeneity that conventional econometric models are unable to identify, as the statistical significance as well as the direction of the effect of some explanatory variables varies across the four classes. When comparing our preferred latent class estimator to the commonly used approaches, we find that treating the population as a single homogeneous group may lead to biased parameter estimates, which suggests that policy based on such models could be inappropriate or erroneous.


Introduction and background
Over the last four decades, there has been considerable attention paid in the household finance literature to the composition of financial portfolios at the household level, exploring both the range of assets held as well as the amount of financial wealth allocated to distinct asset types. Such interest is not surprising given the significant increase in the number of financial products available over this time period, including shares and mutual funds, with varying degrees of risk and return associated with different asset types. Given that the composition of financial portfolios has implications for the exposure to financial risk faced by households, this remains an important area of research for both academics and policymakers.
In this context, many empirical studies, such as Bertaut (1998) and Shum and Faig (2006), have focused on the determinants of holding particular types of assets, with considerable interest in stockholding amongst U.S. households. This focus on risky asset holding has been explored in the context of the well-established 'stockholding puzzle', whereby very few households hold stocks despite their relatively high expected returns. In these studies, household characteristics such as age, gender, education, ethnicity and wealth are found to be important determinants of portfolio composition, as are health status, the level of risk aversion and the planning horizon of the household. Similar studies have been undertaken for other countries, including the Netherlands (Hochguertel et al., 1997), Australia (Cardak and Wilkins, 2009) and Italy (Guiso et al., 1996). Although such studies have revealed some interesting insights relating to the determinants of stockholding at the household level, it is important to acknowledge that the focus on a particular type of asset reveals limited information on the diversification of household portfolios. 1
In the early seminal contribution by Blume and Friend (1975), many private investors are found to hold undiversified portfolios of risky financial assets, in contrast to the predictions of portfolio theory, proposed by Markowitz (1952), which indicates that, regardless of the degree of risk aversion, households should hold diversified portfolios. Similar evidence of a lack of diversification, even in a sample of high-income households, is reported by Kelly (1995), who uses the number of stocks held as a measure of diversification. In general, these theoretical predictions of portfolio theory stand in stark contrast to the empirical observation that many households hold only a small number of asset types (e.g., Campbell, 2006; King and Leape, 1998).
1 It is also important to acknowledge that in a number of studies, asset types are classified into different types such as safe, risky and medium risk, i.e., the focus extends beyond risky asset holding. For example, Rosen and Wu (2004) split the household's financial assets into four categories, namely: safe, risky, retirement and bonds. This asset classification has also been adopted by Berkowitz and Qiu (2006), Fan and Zhao (2009) and Bogan and Fertig (2013).
One fundamental insight of portfolio theory is that, by holding a well-diversified portfolio, investors can reduce the idiosyncratic risk of their portfolio (i.e., the risk that is not compensated by the expected return) without sacrificing return. Underdiversification not only affects the asset-allocation and intertemporal consumption decisions of households but, upon aggregation, it also distorts aggregate growth, which in turn amplifies social welfare losses (see Bhamra and Uppal, 2019). Gaudecker (2015) argues that portfolio underdiversification ranks among the mistakes that are potentially the most costly. Florentsen et al. (2019) analysed data on stock market investors in Denmark and found that only 2% of their sample held more than 20 stocks in their portfolio. 2 They estimated that underdiversification costs the Danish population of stockholders 400 million US dollars annually, as investors could eliminate 60% of their portfolio risk by moving from a portfolio with one randomly selected stock to a well-diversified portfolio. 3 Given the implications of portfolio underdiversification, a number of studies have examined its determinants (e.g., Gaudecker, 2015; Goetzmann and Kumar, 2008; Calvet et al., 2007; Roche et al., 2013). Most of these studies report that the level of diversification is greater among older, wealthy, high-income, financially literate and educated investors.
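The idiosyncratic-risk argument can be illustrated with a stylised calculation. The sketch below assumes (our assumption, not the setup of the cited studies) that every stock has the same volatility σ and the same pairwise correlation ρ, and that the portfolio is equally weighted. The portfolio variance is then σ²/n + (1 − 1/n)ρσ², so only the first, idiosyncratic term shrinks as the number of holdings n grows. The parameter values are illustrative, not estimates from Florentsen et al. (2019).

```python
def equal_weight_variance(n: int, sigma: float, rho: float) -> float:
    """Variance of an equally weighted portfolio of n stocks.

    Stylised assumption: every stock has volatility `sigma` and every pair of
    stocks has the same correlation `rho`.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    idiosyncratic = sigma**2 / n               # diversifiable: vanishes as n grows
    systematic = (1 - 1 / n) * rho * sigma**2  # covariance term: cannot be diversified away
    return idiosyncratic + systematic


def share_of_variance_removed(n: int, sigma: float, rho: float) -> float:
    """Fraction of a single stock's variance eliminated by holding n stocks."""
    return 1 - equal_weight_variance(n, sigma, rho) / sigma**2


if __name__ == "__main__":
    # Illustrative (hypothetical) parameters: 30% volatility, correlation 0.4
    for n in (1, 5, 20, 100):
        print(n, round(share_of_variance_removed(n, sigma=0.3, rho=0.4), 3))
```

Under these assumptions the removable share of variance equals (1 − ρ)(1 − 1/n), which approaches 1 − ρ as n grows, so the benefit of adding holdings is capped by the share of risk common to all stocks.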
Other factors have also been examined in the literature. For example, Goetzmann and Kumar (2008), using data on retail investors at a major U.S. discount brokerage house, found that the level of underdiversification is correlated with three psychological biases: a propensity to hold local stocks, overconfidence and trend-following behaviour. Gaudecker (2015) shows that the largest losses from underdiversification are incurred by those who score low on financial literacy and who do not seek investment advice from professionals or private contacts. On the other hand, Calvet et al. (2007) examine the efficiency of Swedish households' investment decisions and find that many Swedish households' portfolios are well diversified, achieving a higher Sharpe ratio than the domestic stock index, which reflects the substantial share of international securities held through most Swedish mutual funds. Financial constraints are also found to be a significant determinant of the level of diversification (e.g., Roche et al., 2013; Liu, 2014).
As highlighted by Barasinska et al. (2009), although portfolio diversification has attracted the attention of academics for many decades, there is no commonly accepted approach to measuring the extent of diversification in household portfolios. Early contributions explored portfolio diversification from the perspective of the number of different types of assets held. In this vein, Blume and Friend (1975) use the total number of securities held as a measure of diversification. Barasinska et al. (2009) refer to the number of asset types held in a portfolio as 'naive' diversification, with greater diversification associated with a larger number of asset types held. They relate this to the approach whereby individuals split their wealth evenly among the available asset types, i.e., the 1/n strategy (see Benartzi and Thaler, 2001). The second measure explored by Barasinska et al. (2009) is based on grouping asset types according to their associated risk, specifically low, moderate and high risk. According to this approach, a sophisticated investor categorises assets according to their risk and return and assigns them to one of these three classes. They find that the number of asset types held is negatively associated with the degree of risk aversion and that the propensity to hold complete portfolios, i.e., portfolios containing assets from all three risk classes, decreases as risk aversion increases.
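The two measures can be made concrete with a short sketch. The asset-type names and their assignment to low, moderate and high risk classes below are hypothetical placeholders, not the WAS asset categories or the classification used by Barasinska et al. (2009). The code counts the asset types held (naive diversification), checks whether all three risk classes are represented (a complete portfolio) and computes the 1/n split.

```python
# Hypothetical asset types and risk classes, for illustration only.
RISK_CLASS = {
    "current_account": "low",
    "savings_account": "low",
    "premium_bonds": "low",
    "corporate_bonds": "moderate",
    "unit_trusts": "moderate",
    "uk_shares": "high",
    "overseas_shares": "high",
}
ALL_CLASSES = {"low", "moderate", "high"}


def naive_diversification(holdings: set[str]) -> int:
    """'Naive' measure: the number of distinct asset types held."""
    return len(holdings)


def risk_classes_held(holdings: set[str]) -> set[str]:
    """Risk classes represented in the portfolio (KeyError for unknown assets)."""
    return {RISK_CLASS[asset] for asset in holdings}


def is_complete_portfolio(holdings: set[str]) -> bool:
    """True if the portfolio contains at least one asset from every risk class."""
    return risk_classes_held(holdings) == ALL_CLASSES


def one_over_n_weights(holdings: set[str]) -> dict[str, float]:
    """The 1/n heuristic: split wealth evenly across the asset types held."""
    if not holdings:
        return {}
    n = len(holdings)
    return {asset: 1 / n for asset in sorted(holdings)}


household = {"current_account", "savings_account", "uk_shares"}
print(naive_diversification(household))  # 3
print(is_complete_portfolio(household))  # False: no moderate-risk asset
print(one_over_n_weights(household))
```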
In terms of the econometric methods used in the existing literature on household portfolios, studies focusing on the holding of asset types have generally used standard models for limited dependent variables such as probit and logit frameworks, whereas those exploring asset shares have tended to use linear regression models or models that account for the censored nature of the dependent variable, such as tobit models. In contrast, our methodological contribution builds on Abreu and Mendes (2010), who recognise that an appropriate approach to modelling a portfolio diversification measure based on the number of asset types held is a count model, given that the measure can only take non-negative integer values. 4 Using a Poisson model, they analyse a cross-sectional survey of 1,268 Portuguese investors and find that specific financial knowledge is positively associated with the number of assets in a financial portfolio.
In contrast to the existing literature, we explore portfolio allocation in Great Britain using panel data based on a nationally representative sample of the population. Interestingly, there is limited research on portfolio allocation in the UK, with much of the early literature based on U.S. data (e.g., Goetzmann and Kumar, 2008; Ivković et al., 2008; Dimmock et al., 2016). In addition, many existing studies are based either on cross-section data, such as the U.S. Survey of Consumer Finances (e.g., Kelly, 1995; Polkovnichenko, 2005), or on samples of subgroups of the population (i.e., investors only), such as data from online brokers and administrative records (e.g., Goetzmann and Kumar, 2008; Calvet et al., 2007; Grinblatt et al., 2012; Florentsen et al., 2019). Given that wealth is found to be an important determinant of asset holding and that wealth accumulates over the life cycle, the use of panel data appears to be particularly appropriate in this context. A related point is raised by Polkovnichenko (2005), who argues that one of the main limitations of the empirical literature on portfolio diversification is that the samples used for empirical analysis are frequently not representative of the entire population. Our nationally representative sample of households in Great Britain does not suffer from this limitation.
In addition to our focus on a nationally representative sample of households and on panel data, we make an important methodological contribution to the literature on the diversification of household financial portfolios by introducing the latent class modelling approach to this area of research. Latent class modelling has been used extensively in other areas of economics, including consumer behaviour (e.g., Chung et al., 2011) and health economics (e.g., Deb and Trivedi, 1997), but has yet to be widely applied in the household finance literature. One recent exception is Gerhard et al. (2018), who use a Finite Mixture Model (FMM) to explore whether psychological traits affect the level of household savings. The advantage of the FMM approach in their application lies in its ability to introduce unobserved heterogeneity by endogenously partitioning the sample into a number of homogeneous classes, rather than relying on user-defined sub-samples as in the existing literature. 5 They find evidence of two distinct classes and show that accounting for latent heterogeneity when studying the drivers of savings behaviour is important, as the drivers differ between the two groups. More broadly in the finance literature, Durand et al. (2022) use an FMM to examine the capital structure decisions of firms in relation to adjustment towards target levels of leverage that maximise firm value.
The latent class approach probabilistically divides the population into a set of homogeneous groups. Within each class, an appropriate statistical model applies, which in our case, following Abreu and Mendes (2010), is a count specification, as the number of asset types held can only take non-negative integer values. The latent class approach is arguably well suited to the analysis of portfolio diversification given the potential for very diverse financial behaviour within a population. For example, in our data set, the number of different asset types held by households in Great Britain ranges from 0 to 21. Such a latent class approach is advantageous, as it simultaneously introduces heterogeneity into the empirical framework and, ex post, allows the population to be split into sub-groups of households according to their portfolio diversification behaviour. We can then examine sample statistics for each class by detailed asset type to evaluate the extent of portfolio diversification. Building on the heterogeneity afforded by the latent class approach, we take advantage of the panel data available to us to account for unobserved heterogeneity that is likely to be an important driver of household financial decisions.

Methods
As stated above, the latent class approach involves probabilistically splitting the population into a finite number of homogeneous classes or types. Within each of these classes the same statistical model applies, but the explanatory variables are allowed to have different effects across the classes. Although the classes are latent, researchers frequently label them ex post according to the expected value of the outcome within each class; thus, characterising the features of each class is an important outcome of the modelling approach. Our basic hypothesis is that there are distinct, but unobserved, types (or classes) of households with respect to their asset holdings. Therefore, an appropriate framework is the generic Latent Class Model (LCM) (e.g., McLachlan and Peel, 2000). Importantly, we have priors as to the drivers of these unobserved classes, so our approach is based on the latent class modelling literature, but explicitly includes predictors in the class membership equation(s).
Initially, for ease of exposition, assume cross-sectional data, so that the overall density for household $i$ ($i = 1, \ldots, N$), $f(y_i \mid x_i, \beta)$, is an additive mixture of $Q$ distinct sub-densities weighted by their mixing probabilities, $\pi_{iq}$. The outcome variable of interest, $y_i$ (i.e., the number of financial assets held), is driven by the $(k_x \times 1)$ vector of covariates in the model, $x_i$. Importantly, these covariates are allowed to have different effects across the $Q$ classes. $\beta$ denotes all of the parameters of the model, with $\beta_q$ the elements specific to class $q$. Hence, the corresponding mixed density is

$$f(y_i \mid x_i, \beta) = \sum_{q=1}^{Q} \pi_{iq}\, f_q(y_i \mid x_i, \beta_q), \qquad 0 \le \pi_{iq} \le 1, \quad \sum_{q=1}^{Q} \pi_{iq} = 1. \tag{1}$$

A restricted version of the approach would set $\pi_{iq} = \pi_q$, i.e., mixing probabilities that are identical for all households; however, this would appear to be mis-specified here, especially as we have priors as to the drivers of class membership.
The former is generally referred to as an FMM and the latter as a latent class model. Here, we allow $\pi_{iq}$ to be a function of predictors, $z_i$, such that

$$\pi_{iq} = g(z_i, \gamma),$$

where $g$ is a function still to be specified and $\gamma$ are unknown parameters relating $z_i$ to the class probabilities.
However, we first require an appropriate form for the class-specific density $f_q(y_i \mid x_i, \beta_q)$. Given that the outcome variable is an observed count of the number of assets held, an appropriate form for $f_q$ in equation (1) is one that respects this count nature. Obvious examples are the Poisson and Negative Binomial models/densities (Cameron and Trivedi, 1998).
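We note that the latter is normally employed to relax the restriction of the former that the conditional mean and variance are equal. In our approach, we use the Poisson distribution for $f_q$,

$$f_q(y_i \mid x_i, \beta_q) = \frac{\exp(-\lambda_{iq})\,\lambda_{iq}^{y_i}}{y_i!}, \qquad \lambda_{iq} = \exp(x_i'\beta_q),$$

where $\beta_q$ denotes the class-$q$ parameters; once mixed as in equation (1), the resulting mixture Poisson density no longer embodies this restrictive assumption.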
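The claim that mixing relaxes equidispersion can be checked directly. For a mixture of Poisson densities with weights π_q and means λ_q, the mean is Σ π_q λ_q, while the variance equals that mean plus the between-class variance of the λ_q, so the variance exceeds the mean whenever the class means differ. The sketch below, with purely illustrative weights and means, computes both moments in closed form and by summing the mixture probability mass function.

```python
import math


def poisson_pmf(y: int, lam: float) -> float:
    """Poisson probability P(Y = y) for mean lam > 0, computed on the log scale."""
    return math.exp(-lam + y * math.log(lam) - math.lgamma(y + 1))


def mixture_pmf(y: int, weights: list[float], lams: list[float]) -> float:
    """Finite mixture of Poisson densities, as in equation (1) with fixed weights."""
    return sum(w * poisson_pmf(y, lam) for w, lam in zip(weights, lams))


def mixture_moments(weights: list[float], lams: list[float]) -> tuple[float, float]:
    """Closed-form mean and variance of a Poisson mixture."""
    mean = sum(w * lam for w, lam in zip(weights, lams))
    # For each Poisson class, E[Y^2] = lam + lam^2
    second_moment = sum(w * (lam + lam**2) for w, lam in zip(weights, lams))
    return mean, second_moment - mean**2


def numerical_moments(weights: list[float], lams: list[float], y_max: int = 200) -> tuple[float, float]:
    """Moments obtained by summing the mixture pmf over y = 0..y_max."""
    probs = [mixture_pmf(y, weights, lams) for y in range(y_max + 1)]
    mean = sum(y * p for y, p in enumerate(probs))
    var = sum((y - mean) ** 2 * p for y, p in enumerate(probs))
    return mean, var


weights, lams = [0.6, 0.4], [1.0, 6.0]  # hypothetical two-class example
print(mixture_moments(weights, lams))   # variance exceeds the mean
print(numerical_moments(weights, lams))
```

With a single class the two moments coincide, which is the equidispersion restriction of the plain Poisson model.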
As is usual (Greene, 2018), we employ a multinomial logit (MNL) functional form for $g(z_i, \gamma)$ to model class membership:

$$\pi_{iq} = \frac{\exp(z_i'\gamma_q)}{\sum_{j=1}^{Q} \exp(z_i'\gamma_j)}, \qquad q = 1, \ldots, Q,$$

with one $\gamma_q$ normalised to zero for identification.

An important part of the latent class approach concerns determining the appropriate number of classes, $Q^*$. A common approach is to use information criteria (IC) metrics, such as the BIC/SC (Schwarz, 1978), the AIC (Akaike, 1987), the consistent AIC, CAIC (Bozdogan, 1987), and the Hannan-Quinn criterion, HQIC (Hannan and Quinn, 1979). Such metrics can simultaneously be used to choose between specifications, including the choice of $Q^*$ and cross-sectional versus panel variants (see below). As such, we use these metrics in determining our preferred approach.
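The information criteria listed above can be computed from each fitted model's maximised log-likelihood ln L, its number of estimated parameters k and the sample size n, using the standard definitions AIC = −2 ln L + 2k, BIC = −2 ln L + k ln n, CAIC = −2 ln L + k(ln n + 1) and HQIC = −2 ln L + 2k ln(ln n). The sketch below applies them and picks the number of classes with the smallest value; the log-likelihoods and parameter counts are hypothetical numbers, not the results reported in Table II.

```python
import math


def information_criteria(loglik: float, k: int, n: int) -> dict[str, float]:
    """Standard penalised-likelihood criteria; smaller values are preferred."""
    return {
        "AIC": -2 * loglik + 2 * k,
        "BIC": -2 * loglik + k * math.log(n),
        "CAIC": -2 * loglik + k * (math.log(n) + 1),
        "HQIC": -2 * loglik + 2 * k * math.log(math.log(n)),
    }


def choose_num_classes(fits: dict[int, tuple[float, int]], n: int, criterion: str = "BIC") -> int:
    """Return the number of classes Q minimising `criterion`.

    `fits` maps Q -> (maximised log-likelihood, number of parameters).
    """
    return min(fits, key=lambda q: information_criteria(*fits[q], n)[criterion])


# Hypothetical fits for Q = 1, 2, 3 classes (illustrative numbers only)
fits = {1: (-5000.0, 10), 2: (-4800.0, 21), 3: (-4750.0, 32)}
for crit in ("AIC", "BIC", "CAIC", "HQIC"):
    print(crit, choose_num_classes(fits, n=1000, criterion=crit))
```

The criteria differ only in the penalty per parameter, so they can disagree when a larger model improves the log-likelihood only modestly; BIC and CAIC penalise extra classes most heavily.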
The Vuong test for non-nested models can also be used here (Vuong, 1989). As model size varies considerably across the different class models, the 'BIC' correction factor proposed in Vuong (1989) can be used. The standard Vuong test (see, for example, Greene, 2018) for comparing two competing models, $j = 1, 2$, is based on $m_i$, the individual differences in the two log-likelihoods, such that

$$m_i = \ln f_1(y_i \mid d_i) - \ln f_2(y_i \mid d_i),$$

where $f_j$ are the respective likelihoods from the two competing models and $d_i$ denotes all of the covariates in the model, i.e., $d_i = (x_i, z_i)$, for household $i$. The Vuong statistic is given by

$$V = \frac{\sqrt{n}\,\bar{m}}{s_m},$$

where $n$ is the sample size and $\bar{m}$ and $s_m$ are the sample mean and standard deviation of $m_i$, respectively. The statistic has a limiting standard normal distribution, with values of $|V| < 1.96$ being indeterminate at the 5% level, whereas large positive (negative) values favour model 1 (2). The BIC-corrected version (Vuong, 1989) is given by

$$V_{BIC} = \frac{\sqrt{n}\,\bar{m} - \dfrac{(k_1 - k_2)\ln n}{2\sqrt{n}}}{s_m},$$

where $k_j$ refers to the number of estimated parameters in model $j$. The Vuong test is strictly pairwise, so with many potential competing models we follow the approach suggested in Durand et al. (2022) and select the model that is favoured in the largest number of pairwise comparisons.
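The Vuong statistic, its BIC-corrected version and the pairwise "most wins" selection rule can be sketched as follows. The inputs are per-household log-likelihood contributions, ln f_j(y_i | d_i), evaluated at each model's estimates; the numbers in the example are made up. Counting only decisive comparisons (|V| > 1.96) as wins is our reading of the rule, not necessarily the exact implementation in Durand et al. (2022).

```python
import math
from itertools import combinations
from statistics import mean, stdev


def vuong_statistic(ll1, ll2, k1=0, k2=0, bic_correction=False):
    """Vuong (1989) statistic for model 1 versus model 2.

    ll1, ll2: per-observation log-likelihood contributions of the two models.
    Positive values favour model 1, negative values favour model 2.
    """
    if len(ll1) != len(ll2):
        raise ValueError("both models must be evaluated on the same observations")
    m = [a - b for a, b in zip(ll1, ll2)]  # m_i = ln f_1 - ln f_2
    n = len(m)
    numerator = math.sqrt(n) * mean(m)
    if bic_correction:
        # Schwarz-type penalty for the difference in the number of parameters
        numerator -= (k1 - k2) * math.log(n) / (2 * math.sqrt(n))
    return numerator / stdev(m)


def most_pairwise_wins(models, critical=1.96, bic_correction=True):
    """Select the model favoured in the most pairwise Vuong comparisons.

    models: dict name -> (per-observation log-likelihoods, number of parameters).
    Only comparisons with |V| > critical count as a win.
    """
    wins = {name: 0 for name in models}
    for a, b in combinations(models, 2):
        (ll_a, k_a), (ll_b, k_b) = models[a], models[b]
        v = vuong_statistic(ll_a, ll_b, k_a, k_b, bic_correction)
        if v > critical:
            wins[a] += 1
        elif v < -critical:
            wins[b] += 1
    return max(wins, key=wins.get), wins


# Made-up log-likelihood contributions for four households
models = {
    "A": ([-1.0, -1.2, -0.9, -1.1], 3),
    "B": ([-1.5, -1.4, -1.6, -1.3], 3),
    "C": ([-2.0, -2.1, -1.9, -2.2], 3),
}
print(most_pairwise_wins(models))
```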
Within each class, the expected values (EVs), predicted count probabilities and partial effects are given by the usual Poisson expressions. For example, the class-specific expected value, $EV_q$, is given by

$$EV_q = E(y_i \mid x_i, q) = \exp(x_i'\beta_q).$$

The comparable overall quantities are given by the probability-weighted average of the class-specific ones, where it is usual to use the prior probabilities as the weights, e.g., $EV = \sum_{q=1}^{Q} \pi_{iq}\, EV_q$.
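For a single household, the class-specific and overall expected values can be computed as below. The sketch assumes the MNL class-membership model described earlier, with the first class's γ fixed at zero as the reference; the covariate vectors and coefficients are hypothetical. The overall EV is the prior-weighted average of the class-specific EVs, as described above.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def mnl_priors(z, gammas):
    """Prior class probabilities pi_iq from a multinomial logit.

    gammas: one coefficient vector per class; fixing the first at zero is an
    identification convention assumed here.
    """
    scores = [dot(z, g) for g in gammas]
    top = max(scores)  # subtract the max for numerical stability
    expd = [math.exp(s - top) for s in scores]
    total = sum(expd)
    return [e / total for e in expd]


def class_expected_values(x, betas):
    """EV_q = exp(x'beta_q) for each class q."""
    return [math.exp(dot(x, b)) for b in betas]


def overall_expected_value(priors, evs):
    """Prior-probability-weighted average of the class-specific EVs."""
    return sum(p * ev for p, ev in zip(priors, evs))


# Hypothetical household: z = (constant, male), x = (constant, one covariate)
z, x = [1.0, 1.0], [1.0, 2.0]
gammas = [[0.0, 0.0], [0.5, -0.2], [-0.3, 0.4]]
betas = [[0.0, 0.1], [0.5, 0.1], [1.0, 0.2]]
priors = mnl_priors(z, gammas)
evs = class_expected_values(x, betas)
print(priors, evs, overall_expected_value(priors, evs))
```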
In predicting class membership, one could use the (prior) probabilities described above or, preferably, the posterior probabilities, which additionally take into account the information on the observed outcome (Greene, 2018). These are defined as

$$\tilde{\pi}_{iq} = \frac{\pi_{iq}\, f_q(y_i \mid x_i, \beta_q)}{\sum_{j=1}^{Q} \pi_{ij}\, f_j(y_i \mid x_i, \beta_j)} = \frac{\pi_{iq}\, f_q(y_i \mid x_i, \beta_q)}{L_i},$$

where $L_i$ is the likelihood contribution of household $i$ used in estimation. Standard errors for all secondary quantities of interest can be obtained using the delta method (Greene, 2018).
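Posterior probabilities follow from Bayes' rule by combining the prior class probabilities with the class-specific Poisson likelihood of the observed count. A minimal cross-sectional sketch, with hypothetical priors and class means, is given below; it works on the log scale to avoid underflow and assigns each household to its modal (highest posterior) class.

```python
import math


def poisson_logpmf(y: int, lam: float) -> float:
    """log P(Y = y) for a Poisson distribution with mean lam > 0."""
    return -lam + y * math.log(lam) - math.lgamma(y + 1)


def posterior_probabilities(y: int, priors: list[float], class_means: list[float]) -> list[float]:
    """Posterior class probabilities: pi_iq f_q(y_i) / sum_j pi_ij f_j(y_i)."""
    log_terms = [math.log(p) + poisson_logpmf(y, lam) for p, lam in zip(priors, class_means)]
    top = max(log_terms)
    terms = [math.exp(t - top) for t in log_terms]  # rescaled pi_iq * f_q
    total = sum(terms)                              # proportional to L_i
    return [t / total for t in terms]


def modal_class(posteriors: list[float]) -> int:
    """Index of the class with the highest posterior probability."""
    return max(range(len(posteriors)), key=posteriors.__getitem__)


priors, class_means = [0.5, 0.5], [1.0, 4.0]  # hypothetical two-class example
for y in (0, 2, 8):
    post = posterior_probabilities(y, priors, class_means)
    print(y, [round(p, 3) for p in post], modal_class(post))
```

Larger observed counts shift posterior weight towards the class with the higher expected count, even though the priors are equal.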
As noted above, our empirical analysis is based on panel data. Having repeated observations on each household allows us to better identify class membership. To this end, we treat the model parameters $\theta = (\beta, \gamma)$ non-parametrically as a random vector with discrete support, where the discrete outcomes define the classes. Thus, the class probabilities are constant for each household over time, and the joint density for the $T_i$ observations for household $i$, with $\mathbf{X}_i = (x_{i1}, \ldots, x_{iT_i})$, is given by

$$f(y_{i1}, \ldots, y_{iT_i} \mid \mathbf{X}_i, z_i) = \sum_{q=1}^{Q} \pi_{iq} \prod_{t=1}^{T_i} f_q(y_{it} \mid x_{it}, \beta_q),$$

and the corresponding log-likelihood is

$$\ln L = \sum_{i=1}^{N} \ln \left[ \sum_{q=1}^{Q} \pi_{iq} \prod_{t=1}^{T_i} f_q(y_{it} \mid x_{it}, \beta_q) \right].$$

It is also possible to allow for random effects in nonlinear panel models to account for unobserved household heterogeneity (Matyas and Sevestre, 2008). These can be class-specific, but will be independent, as households can only be in one class. This complicates estimation, as the $Q$ household effects need to be integrated out of the likelihood function; we therefore use simulated maximum likelihood techniques with 100 Halton draws.
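The panel log-likelihood contribution of a single household can be sketched as follows: each class contributes the product over waves of Poisson densities, weighted by a time-invariant class probability. For the random-effects variant, the sketch assumes (our assumption; the distribution of the effects is not stated above) a normally distributed household effect with a class-specific standard deviation entering the log mean, and averages over Halton-based draws. This illustrates the generic simulated maximum likelihood idea, not the authors' code; all inputs are hypothetical.

```python
import math
from statistics import NormalDist


def halton(n_draws: int, base: int = 2, skip: int = 0) -> list[float]:
    """Points of the one-dimensional Halton sequence in a prime base."""
    draws = []
    for i in range(skip + 1, skip + n_draws + 1):
        f, r, k = 1.0, 0.0, i
        while k > 0:  # radical inverse of i in the given base
            f /= base
            r += f * (k % base)
            k //= base
        draws.append(r)
    return draws


def poisson_logpmf(y: int, lam: float) -> float:
    return -lam + y * math.log(lam) - math.lgamma(y + 1)


def class_log_density(y_it, xb_q, u=0.0):
    """log prod_t f_q(y_it | x_it, beta_q, u), Poisson with mean exp(x_it'beta_q + u)."""
    return sum(poisson_logpmf(y, math.exp(xb + u)) for y, xb in zip(y_it, xb_q))


def household_loglik(y_it, xb_it, priors, sigmas=None, n_draws=100):
    """Log-likelihood contribution of one household.

    y_it:   counts observed over the household's T_i waves.
    xb_it:  xb_it[q][t] is the linear index x_it'beta_q for class q in wave t.
    priors: time-invariant class probabilities pi_iq.
    sigmas: optional class-specific std. dev. of a normal household effect
            (hypothetical specification); None gives the model without effects.
    """
    # Standard normal draws built from Halton points (first 10 points skipped)
    normals = [NormalDist().inv_cdf(u) for u in halton(n_draws, skip=10)] if sigmas else []
    lik = 0.0
    for q, pi_q in enumerate(priors):
        if sigmas is None or sigmas[q] == 0:
            lik_q = math.exp(class_log_density(y_it, xb_it[q]))
        else:
            # Simulated integral over u ~ N(0, sigma_q^2): average over the draws
            lik_q = sum(math.exp(class_log_density(y_it, xb_it[q], sigmas[q] * e))
                        for e in normals) / n_draws
        lik += pi_q * lik_q
    return math.log(lik)


# Hypothetical household observed in three waves, two classes
y_it = [2, 3, 3]
xb_it = [[0.5, 0.5, 0.6], [1.5, 1.4, 1.6]]
print(household_loglik(y_it, xb_it, priors=[0.7, 0.3]))
print(household_loglik(y_it, xb_it, priors=[0.7, 0.3], sigmas=[0.5, 0.3]))
```

The sample log-likelihood is the sum of these contributions over households. In a full estimator each household would normally receive its own segment of the Halton sequence rather than the shared draws used here for brevity.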

Data
Our empirical analysis is based on data from the Wealth and Assets Survey (WAS), a biennial longitudinal household survey for Great Britain that measures the personal and economic well-being of individuals and households by assessing levels of assets, debt, savings and planning for retirement. 6 The WAS also provides information on a host of socio-demographic factors that we control for in our analysis, as detailed below. The survey started in 2006 and covers Great Britain (England, Wales and Scotland). Our empirical analysis is based upon waves 2-5 of the survey, yielding 28,756 heads of household (N) and a total of 46,424 observations (NT). time preference; 10 whether their mother had post-school education; whether their father had post-school education; whether their mother was employed or self-employed; whether their father was employed or self-employed; whether they grew up in a single parent household; and the number of siblings when growing up. 11 The only control variable common to both the outcome and class membership equations, i.e., appearing in both $x_i$ and $z_i$, is gender.
Sample summary statistics are presented in Table I, where it can be seen that 58% of heads of household are male and their average age is 48. In terms of the controls for determining class membership, only 6% (9%) of the respondents' mothers (fathers) had post-school education. On average, respondents' fathers were more likely to have been employed than their mothers, at 70% and 46%, respectively. Approximately 10% of household heads grew up in a single parent household. Turning to the covariates in the outcome ($y_i$) equation, i.e., the number of financial assets held, 49% of respondents are single, around 30% have at least degree-level education and approximately 15% have no qualifications. Labour income is, not surprisingly, higher than non-labour income, with means of £3,480 and £3,095, respectively. Equal proportions of household heads are financially optimistic and financially pessimistic about their finances for the coming year.
9 This is a binary control constructed from the following question: If you had a choice between a guaranteed payment of one thousand pounds and a one in five chance of winning ten thousand pounds which would you choose? 0 = Guaranteed payment of £1,000; 1 = One in five chance of £10,000.
10 Defined as a binary control constructed from the following question: If you had a choice of receiving a thousand pounds today or one thousand one hundred pounds in 12 months which would you choose? 0 = £1,000 today; 1 = £1,100 next year. Both risk attitudes and time preference are observed to be time invariant in the data. More generally, Schildberg-Hörisch (2018) has recently argued that individual risk preferences appear to be persistent and moderately stable over time.
11 Childhood-related questions are specific to when the respondent was around the age of 14.
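The two binary measures embed simple arithmetic that helps interpret them. The sketch below computes the expected value of the lottery in the risk question and the one-year return implied by the time-preference question. Reading a choice as revealing risk aversion, or a discount rate above that return, is our interpretation under textbook expected-value and exponential-discounting assumptions, not part of the survey definition.

```python
from fractions import Fraction


def lottery_expected_value(prob: Fraction, prize: int) -> Fraction:
    """Expected payout of a gamble paying `prize` with probability `prob`."""
    return prob * prize


def implied_annual_return(amount_now: int, amount_later: int) -> Fraction:
    """One-year return from waiting: amount_later / amount_now - 1."""
    return Fraction(amount_later, amount_now) - 1


# Risk question: guaranteed £1,000 versus a one-in-five chance of £10,000
ev = lottery_expected_value(Fraction(1, 5), 10_000)
print(ev)                 # 2000
# Time question: £1,000 today versus £1,100 in 12 months
rate = implied_annual_return(1_000, 1_100)
print(rate, float(rate))  # 1/10 0.1
```

A respondent who takes the guaranteed payment therefore forgoes £1,000 of expected value, and one who takes £1,000 today forgoes a 10% one-year return.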

Results
In terms of model comparison, we compare a range of estimators using standard IC metrics to identify the preferred model. The models compared include a standard linear estimator, Poisson and negative binomial count models, and the latent class approach (with two to five classes). Table II presents these IC metrics, which favour the four-class latent class estimator. This is supported by the results in Tables III and A1, which reveal a pattern of class heterogeneity that conventional econometric models are unable to identify. More specifically, Table III shows that the magnitude, statistical significance and direction of the effect of some explanatory variables vary across the four classes. For example, the direction of the effect changes across the four classes for gender, marital status and labour income. In class 1, a class with a relatively low level of asset diversification, households with a male head hold fewer financial assets than households with a female head, by a factor of 0.886 (the incidence rate ratio, IRR), whereas in class 2 households with a male head hold more financial assets, by a factor of 1.061, ceteris paribus. However, the effect becomes negative again for class 3 and is statistically insignificant for class 4.
Households with a single head in the first three classes have more financial assets than households with a married head and the magnitude of the effect is similar across these classes, which is in line with the findings of Abreu and Mendes (2010), who argue that married investors are financially less well-informed.
The impact of labour income on the number of financial assets held also exhibits considerable heterogeneity across the four classes. Specifically, the IRR associated with labour income is 1.058 for households in class 2, so that higher labour income is associated with proportionally more financial assets, whereas it is 0.905 for those in class 3, indicating proportionally fewer financial assets, ceteris paribus. Such findings indicate that labour income influences diversification for the two middle classes, whereas it has no statistically significant effect for households in the lowest or the highest class of diversification. Other empirical studies usually report a positive relationship between income and the level of diversification (e.g., Calvet et al., 2007, 2009; Abreu and Mendes, 2010).
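Because the within-class models are Poisson, each reported IRR equals exp(β) and acts multiplicatively on the class-specific expected count. The sketch below converts IRRs into percentage changes and into absolute changes given a class EV (the EV used is hypothetical). It also shows, assuming a regressor enters in logarithms (an assumption; the transformation of income and wealth is not stated here), the effect of a 1% increase, for which β acts as an elasticity.

```python
import math


def irr_to_percent_change(irr: float) -> float:
    """Percentage change in the expected count for a one-unit change in a regressor."""
    return 100 * (irr - 1)


def change_in_expected_count(ev_q: float, irr: float) -> float:
    """Absolute change in the class-q expected number of assets, EV_q * (IRR - 1)."""
    return ev_q * (irr - 1)


def percent_change_for_log_regressor(irr: float, pct_increase: float = 1.0) -> float:
    """Effect of a pct_increase% rise in X when ln(X) is the regressor.

    beta = ln(IRR) is then an elasticity: E[y] scales by (1 + pct/100) ** beta.
    """
    beta = math.log(irr)
    return 100 * ((1 + pct_increase / 100) ** beta - 1)


# IRRs quoted in the text for gender (classes 1 and 2) and labour income (class 2)
print(round(irr_to_percent_change(0.886), 1))             # -11.4
print(round(irr_to_percent_change(1.061), 1))             # 6.1
print(round(percent_change_for_log_regressor(1.058), 4))  # about 0.056% for a 1% rise
print(change_in_expected_count(ev_q=3.0, irr=1.061))      # hypothetical EV_q = 3
```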
The other financial variables have the expected impact across the four classes. However, the magnitude of the effect varies between classes. For example, the IRR for pension wealth is 1.558 for class 1, compared with only 1.156 for class 4. On the other hand, net wealth has a more pronounced impact on households in the top two classes than on those in the bottom two classes (i.e., those classes characterised by less diversification). Specifically, the IRR for net wealth is only 1.019 for class 1, compared with 1.243 for class 4. Having a head of household with a defined benefit pension is found to be associated with holding fewer financial assets across all classes, a result that might reflect the possibility that those who have a defined contribution pension are more exposed to the concept and implications of diversification than those with defined benefit pension schemes. In general, such findings highlight not only the importance of allowing parameter estimates to vary by class, but also the importance of distinguishing between different income sources.
The impact of the age of the head of household is only statistically significant for class 1. The findings for class 1 suggest that the older the head of household, the lower the number of financial assets held, with the magnitude of the impact increasing at a decreasing rate, as shown by the quadratic term. A number of empirical papers report that age is a significant determinant of underdiversification (e.g., Goetzmann and Kumar, 2008; Calvet et al., 2007; Roche et al., 2013). Roche et al. (2013) argue that young investors are more likely to be financially constrained, as they generally have a low wealth-to-income ratio; given that financial constraints are a significant determinant of portfolio diversification, young investors therefore tend to hold underdiversified portfolios.
Education has the expected impact across the four classes, with the impact being strongest for those in class 2. Specifically, in class 2, having a head of household with a degree or above is associated with holding more financial assets than having a head with no qualifications, by a factor of 1.415. A similar pattern is found for the employment dummy, with the impact being strongest for those in class 2, which is as expected and accords with the existing literature (e.g., Dimmock et al., 2016; Calvet et al., 2007; Abreu and Mendes, 2010). Heads of household employed in lower supervisory and technical jobs have statistically significant coefficients only in class 1, and the associated IRRs are the smallest of all the occupations. In contrast, those in managerial and professional occupations have the largest IRRs, and this is the only statistically significant occupation for households in class 4 (i.e., the most diversified class). This is as expected, as those in these types of occupations arguably have more financial knowledge and experience.
The number of children and the number of adults in the household also have the expected effects.Specifically, the number of children is negatively associated with the number of financial assets and the opposite is observed regarding the number of adults.
These findings may reflect the presence of children in the household being associated with higher costs, whereas more adults may bring more financial and economic knowledge and/or an extra source of income into the household.Being financially optimistic has a positive impact on the number of financial assets held, but this impact is only statistically significant for heads of household in classes 1 and 2. On the other hand, being financially pessimistic is negatively associated with the number of financial assets.However, the impact is statistically insignificant across all classes.
To summarise, the most important factors associated with greater diversification, as observed in the coefficients for households in class 4, are having a head of household with a higher level of education or in a managerial or professional occupation, and having high levels of net wealth and pension wealth. Moreover, in general, the economic magnitudes of the effects of the covariates are non-trivial given the size of the IRRs relative to the class-specific EVs.
The ρ parameters in Table III capture the panel structure of the data, i.e., the extent of unobserved intra-household correlation over time. This may indicate some persistence in portfolio allocation. Specifically, the overall average of this correlation is 0.22, which is statistically significant at the 1% level. This highlights the importance of the longitudinal nature of the data in modelling the number of financial assets, particularly for those households with the least diversified portfolios.
To assess the factors that are correlated with the probability of belonging to a specific class, Table IV presents the partial effects of the covariates on the class probabilities, evaluated at the sample means of the covariates. In general, the findings indicate that gender, the birth cohort controls, the measures of risk attitudes and time preference, and the childhood conditions are mostly statistically significant, supporting a well-specified class membership equation.
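Partial effects on the class probabilities in an MNL have a standard closed form: for a continuous covariate z_k, ∂π_q/∂z_k = π_q(γ_qk − Σ_j π_j γ_jk), while for a binary covariate one can instead compare the probabilities at z_k = 1 and z_k = 0. Either way the effects sum to zero across classes, since the probabilities must sum to one. The sketch below implements both; the γ values and evaluation point are hypothetical (only the male share of 0.58 is taken from Table I as quoted above), and which of the two forms Table IV uses for its binary regressors is not stated here.

```python
import math


def mnl_probs(z, gammas):
    """Class probabilities exp(z'g_q) / sum_j exp(z'g_j)."""
    scores = [sum(a * b for a, b in zip(z, g)) for g in gammas]
    top = max(scores)
    expd = [math.exp(s - top) for s in scores]
    total = sum(expd)
    return [e / total for e in expd]


def mnl_marginal_effects(z, gammas, k):
    """d pi_q / d z_k = pi_q * (gamma_qk - sum_j pi_j gamma_jk), evaluated at z."""
    p = mnl_probs(z, gammas)
    weighted = sum(pj * g[k] for pj, g in zip(p, gammas))
    return [pq * (g[k] - weighted) for pq, g in zip(p, gammas)]


def mnl_discrete_change(z, gammas, k):
    """Change in class probabilities when binary covariate k moves from 0 to 1."""
    z0, z1 = list(z), list(z)
    z0[k], z1[k] = 0.0, 1.0
    return [b - a for a, b in zip(mnl_probs(z0, gammas), mnl_probs(z1, gammas))]


# Hypothetical 4-class model, z = (constant, male, risk-taker); class 1 is the reference
gammas = [[0.0, 0.0, 0.0], [-0.2, 0.3, 0.1], [-0.8, 0.1, 0.4], [-1.5, 0.9, 1.2]]
z_mean = [1.0, 0.58, 0.3]  # hypothetical evaluation point ("sample means")
print([round(e, 3) for e in mnl_marginal_effects(z_mean, gammas, k=2)])
print([round(e, 3) for e in mnl_discrete_change(z_mean, gammas, k=1)])
```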
Furthermore, the table reports the average posterior probabilities across classes, which show that class 1 is the largest class of the four, containing 39 percent of the sample and the smallest class is class 4, which contains only 11 percent of the sample.
Table IV shows that there is a clear impact of risk attitudes on the probability of belonging to each class. Specifically, households with heads who are more willing to take risks are more likely to be in class 4, the class with the highest number of financial assets held, and less likely to belong to classes 1 and 2. The measure of time preference, in contrast, is only statistically significant for class 1, where it has the expected sign. These findings are in line with the existing literature that examines the determinants of stockholding at the household level (e.g., Cardak and Wilkins, 2009; Shum and Faig, 2006), as those who are more willing to take risks are more likely to hold a higher number of risky financial assets.
Similarly, households with a male head are around 20 percentage points more likely to be in class 4 than households with a female head, which also ties in with the existing literature, which reports differences in risk preference by gender, e.g., Guiso and Sodini (2013), with females generally found to be more risk averse.
Households with a head who grew up in a single parent family are around 9 percentage points less likely to belong to class 4 and 2 percentage points more likely to belong to class 1, compared to those household heads who did not grow up in a single parent household.
However, the probability of being in class 4 is positively associated with the number of siblings the household head grew up with. The birth cohort controls are statistically significant across most of the four classes, with those born after 1965 being more likely to belong to class 1 and less likely to be in class 4 relative to those born before 1945 (the omitted category). However, it should be acknowledged that the pattern of the impact of the other birth cohort controls is not clear.
Parental employment status of the household head is also a strong predictor of class membership. To be specific, consider a household head whose mother (father) was an employee or self-employed when the head was around 14 years old. Such a head is approximately 19 (11) percentage points more likely to be in the class with the highest level of diversification.
In contrast, the probabilities of belonging to the other three classes are lower for these heads of household. Although parental education is statistically significant for classes 2 and 3, the direction of its impact is not as clear as that of parental employment status.
The discussion so far illustrates how the latent class approach unveils differential partial effects across classes. The approach can, however, be viewed essentially as a means of allowing for more unobserved heterogeneity in the model. If this is the case, the focus may instead lie on the overall partial effects and on whether these differ across model variants. To explore this, Table V compares the overall partial effects from three estimators: a linear random effects (RE) model, a negative binomial (NegBin) model and our preferred 4-class MNL estimator. The table also reports the AIC and BIC statistics to compare the overall statistical performance of these models. As mentioned above, both statistics indicate that the latent class MNL estimator is the preferred approach for modelling household portfolio diversification.
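For reference, the AIC and BIC used in Table V follow standard definitions, sketched below. The parameter count for a latent class model rests on an assumption of ours. Each of the Q classes is taken to have K outcome coefficients, and the MNL membership equation to have M coefficients for each non-reference class. Any further parameters, such as negative binomial dispersion terms, would have to be added to k. The function names are hypothetical.

```python
import math


def aic(loglik, k):
    """Akaike information criterion, 2k - 2 lnL (lower is better)."""
    return 2 * k - 2 * loglik


def bic(loglik, k, n):
    """Bayesian information criterion, k ln(n) - 2 lnL (lower is better)."""
    return k * math.log(n) - 2 * loglik


def latent_class_param_count(n_classes, k_outcome, m_membership):
    """Parameters in a Q-class model: Q sets of outcome coefficients plus
    (Q - 1) sets of MNL membership coefficients (one class is the reference)."""
    return n_classes * k_outcome + (n_classes - 1) * m_membership
```

With panel data, the n in the BIC penalty can be either the number of households or the number of household-wave observations. That choice is itself a modelling decision and is worth reporting alongside the statistics.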
Although the general pattern of results is broadly consistent across the three models, there are some substantive differences in the size and statistical significance of a number of explanatory variables.12 Specifically, in contrast to the results from the 4-class MNL model, the linear RE and the NegBin models reveal positive and statistically significant gender and marital status effects on the number of financial assets held. Similarly, according to the linear RE and the NegBin models, non-labour income, being financially pessimistic and age have negative and statistically significant effects. Furthermore, in terms of size, the linear model appears to overestimate the partial effects relative to the latent class modelling approach, whereas the negative binomial model appears to underestimate them.
In general, the average partial effects of some controls in the linear and the NegBin models are heavily influenced by the most populated class, class 1. Results based on these models may therefore not adequately reflect the determinants of diversification for the other groups. This suggests that policy based on such models could be inappropriate or erroneous.13
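Writing the overall partial effect of a latent class model as a class-probability-weighted average of the class-specific effects offers one way to see how an average can be pulled towards the most populated class. The sketch below is illustrative only. It assumes the covariate enters the class-specific count equations but not the class-membership equation. The numbers are hypothetical and are not taken from Table V.

```python
def overall_partial_effect(class_probs, class_effects):
    """Overall partial effect as sum_q pi_q * PE_q.

    Valid when the covariate enters only the class-specific count equations,
    not the class-membership equation (otherwise a term involving
    d pi_q / dx would also be needed).
    """
    if len(class_probs) != len(class_effects):
        raise ValueError("need one partial effect per class")
    if abs(sum(class_probs) - 1.0) > 1e-9:
        raise ValueError("class probabilities must sum to one")
    return sum(p * pe for p, pe in zip(class_probs, class_effects))


# Hypothetical numbers: a large effect confined to a small class (0.8 in a
# class holding 10% of households) is diluted once averaged with a large
# class in which the effect is zero.
probs = [0.4, 0.3, 0.2, 0.1]
effects = [0.0, 0.1, 0.1, 0.8]
overall = overall_partial_effect(probs, effects)  # about 0.13
```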

Analysis of ex-post statistics
Ex post, we are able to split the population into sub-groups of households, i.e., classes based upon the ex-post EVs, and analyse their portfolio diversification behaviour. To be specific, we can examine the composition of portfolios both within and across classes. This allows us to explore questions such as whether class 4 is characterised by more diversified portfolios. Consider the naive measure, i.e., the number of financial assets held (see Barasinska et al., 2009). On this measure, the EVs suggest that class 4 is, on average, characterised by more diversified portfolios, as the number of asset types held is higher.
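One simple way to operationalise this ex-post split is sketched below. Each household is assigned to the class with its highest posterior probability (modal assignment), and the naive measure, the number of asset types held, is then averaged within each class. Modal assignment is our assumption; the paper may instead weight households by their posterior probabilities. The function names are hypothetical.

```python
from collections import defaultdict


def modal_class(posterior):
    """Index of the class with the highest posterior probability."""
    return max(range(len(posterior)), key=lambda q: posterior[q])


def mean_assets_by_class(posteriors, n_assets):
    """Mean number of asset types held among households assigned to each class.

    posteriors : one list of posterior class probabilities per household
    n_assets   : number of financial asset types held by each household
    """
    totals, counts = defaultdict(float), defaultdict(int)
    for post, n in zip(posteriors, n_assets):
        q = modal_class(post)
        totals[q] += n
        counts[q] += 1
    return {q: totals[q] / counts[q] for q in sorted(counts)}
```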
However, in this section we explore how this relates to asset shares and to combinations of different types of assets. For example, a household in class 4 might hold five types of assets but keep, on average, 95% of their total value in a single asset. We also consider the Herfindahl-Hirschman Index (HHI), which can be used to measure portfolio diversification (see, e.g., Ivković et al., 2008). It is defined as follows:

$$HHI_q = \sum_{s=1}^{N_q} \omega_{sq}^{2}$$

where $HHI_q$ is the metric for class $q$ and there are $N_q$ assets in the class. The share of asset $s$ in the household financial portfolio is class specific ($q$) and is given by $\omega_{sq}$.
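The HHI definition translates directly into code. The sketch below computes the index for a single portfolio from the value held in each asset. Applying it to a class would mean passing that class's holdings of each asset type, which is our reading of the definition. The function name is hypothetical.

```python
def hhi(amounts):
    """Herfindahl-Hirschman Index: the sum of squared portfolio shares.

    amounts : value held in each asset (non-negative, with a positive total).
    With N assets actually held (positive amounts), the result lies between
    1/N (an equal split) and 1 (everything in a single asset).
    """
    total = sum(amounts)
    if total <= 0:
        raise ValueError("portfolio must have a positive total value")
    return sum((a / total) ** 2 for a in amounts)
```

The hypothetical household described above holds five asset types with 95% of the value in one of them. For that household, `hhi([95, 1.25, 1.25, 1.25, 1.25])` is about 0.903, close to the single-asset value of 1. An equal split across the same five assets would give 0.2.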
The index ranges from $1/N_q$ to unity; hence, classes that are more diversified have a lower $HHI_q$. The final row of the summary information reported in Table VI shows that this is indeed the case. The HHI decreases monotonically from class 1 to class 4, consistent with class 1 (4) being the least (most) diversified.15 For example, for class 4 the value HHI = 0.08 is equivalent to a household portfolio comprising 12 equally weighted financial assets. The HHI statistics support the conjecture that the LCM performs well in ranking the extent of portfolio diversification across the different sub-groups (i.e., classes).
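The correspondence between HHI = 0.08 and roughly 12 assets comes from inverting the HHI of an equally weighted portfolio. Below, $N^{*}_q$ denotes the number of equally weighted assets that would produce the same index; this notation is ours, not the paper's:

```latex
HHI = \sum_{s=1}^{N} \left(\frac{1}{N}\right)^{2} = \frac{1}{N}
\quad\Longrightarrow\quad
N^{*}_q = \frac{1}{HHI_q}, \qquad N^{*}_4 = \frac{1}{0.08} = 12.5
```

An HHI of 0.08 is therefore what an equally weighted portfolio of about 12 (strictly, 12.5) assets would produce.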
Table VI also reports the proportions of households holding each type of financial asset, within and across classes. In terms of savings and deposit accounts, only 41.31% of households in class 1 have such an account, which is below the sample mean. The proportion of households holding cash ISAs increases monotonically across classes, with around 78% of households in class 4 having cash ISAs. Similarly, the percentage of households with investment ISAs in their portfolio increases monotonically across classes, and it is noticeable that over half of the population have such assets. Share ownership also increases monotonically across classes. Class 4 dominates in terms of the proportion of households holding such assets: approximately 23% hold employee shares and just under 80% hold shares in UK and foreign companies. The same pattern emerges for other asset types, suggesting that diversification increases across the classes, i.e., as the ex-post EV increases.
It is noticeable from the statistics shown in Table VI that households do not appear to split their financial wealth evenly among the available asset types, which is at odds with the 1/n strategy (Benartzi and Thaler, 2001). Furthermore, for class 1, the majority of financial wealth is, on average, held in more liquid assets, i.e., savings and deposit accounts. For class 4, by contrast, the proportions of wealth held in liquid and illiquid assets (e.g., ISAs and shares) are of a similar order of magnitude.
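One way to quantify how far a portfolio departs from the 1/n rule is to measure the distance between its observed asset shares and equal weights over the asset types actually held. The sketch below uses half the L1 (total variation) distance. This is an illustrative choice of ours, not a measure used in the paper, and the function name is hypothetical.

```python
def deviation_from_equal_weights(amounts):
    """Total variation distance between observed shares and 1/n weights.

    Only asset types actually held (positive amounts) count towards n.
    Returns 0 for an exactly equal split; the value approaches 1 - 1/n as
    the portfolio concentrates in a single one of the n assets held.
    """
    held = [a for a in amounts if a > 0]
    if not held:
        raise ValueError("no assets held")
    n, total = len(held), sum(held)
    return 0.5 * sum(abs(a / total - 1 / n) for a in held)
```

For the five-asset portfolio with 95% in a single asset, the distance is 0.75, against a maximum of 0.8 for five assets held.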
In Table VII, again based upon ex-post analysis, we analyse the amount held in each type of asset and its proportion of financial wealth. The average amount of financial assets held by households over the period was £58,174. However, financial wealth is not evenly distributed across classes. Class 1, arguably the least diversified based upon the analysis of Table VI, has portfolios with an average total value of £11,548, compared with £215,857 for class 4. Savings accounts, one of the most liquid assets, illustrate the contrast. Class 4 has, on average, the highest monetary amount in such accounts, but as a proportion of financial assets its savings share is the lowest, at around 22%, which is below the sample mean. Indeed, for class 4, a higher proportion of financial assets is held in cash and investment ISAs, at just under 30%, than in savings accounts. The households with the most diversified portfolios, i.e., class 4, hold a higher monetary amount of each asset than the other groups (i.e., classes). Moreover, with the exception of savings accounts, each asset also constitutes a much higher percentage of their total financial assets. This is particularly apparent for more illiquid assets, such as fixed term investment bonds, unit trusts and shares.
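A class's proportion of financial wealth held in an asset can be computed in two ways that need not coincide. One is the ratio of the class totals; the other is the average of each household's own share. This section does not state which one Table VII uses. The sketch below therefore computes both from household holdings already labelled by class, with hypothetical function and field names.

```python
from collections import defaultdict


def class_asset_summary(households, asset):
    """Per-class summary of one asset type.

    households : iterable of (class_label, {asset_name: amount}) pairs
    asset      : the asset type to summarise, e.g. "savings"
    Returns {class_label: (mean_amount, ratio_of_totals, mean_household_share)}.
    """
    amount = defaultdict(float)   # class total held in `asset`
    wealth = defaultdict(float)   # class total financial assets
    shares = defaultdict(list)    # each household's own share of `asset`
    count = defaultdict(int)
    for q, holdings in households:
        a = holdings.get(asset, 0.0)
        w = sum(holdings.values())
        amount[q] += a
        wealth[q] += w
        count[q] += 1
        if w > 0:  # a share is undefined for households with no financial assets
            shares[q].append(a / w)
    summary = {}
    for q in sorted(count):
        ratio = amount[q] / wealth[q] if wealth[q] > 0 else float("nan")
        mean_share = sum(shares[q]) / len(shares[q]) if shares[q] else float("nan")
        summary[q] = (amount[q] / count[q], ratio, mean_share)
    return summary
```

In a toy class of two households, one might hold only 100 in savings and the other 100 in savings plus 900 in shares. Savings would then make up about 18% of the class's pooled financial assets but 55% of the average household portfolio, because the wealthier household dominates the pooled ratio.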
Having explored a wide range of financial assets, from very liquid to highly illiquid, we find the LCM approach convincing in terms of splitting households into sub-groups based on their underlying level of financial diversification. These sub-groups were ordered monotonically into classes by the ex-post EVs, which are based upon a naive measure of diversification derived from the number of financial assets held. The analysis in this section has revealed that asset shares and combinations of distinct asset types become considerably more diverse as the EV increases across the classes. This is consistent with the LCM providing information on heterogeneity in household financial behaviour across sub-groups, some of which are less likely to hold well-diversified portfolios (e.g., class 1).
There are potential implications for policymakers interested in promoting savings behaviour in less diversified sub-groups, e.g., class 1, where 59% of households do not have a savings or deposit account. Policy interventions may target this identified sub-group or attempt to manipulate certain of its characteristics. However, policy aimed at specific groups defined by observed behaviour is potentially limited. It can only use discernible behavioural differences and may overlook the latent heterogeneity in the data. Hence, it is potentially important to acknowledge that the impact of socioeconomic characteristics on financial behaviour generally differs across latent groups. For the least diversified sub-group, policy design could benefit from an appreciation of the complex relationships between diversification and behavioural traits such as risk attitudes and time preference. Both traits are found to be statistically significant determinants of class 1 membership.
Specifically, targeting this sub-group with interventions designed to improve financial literacy might be beneficial and more precise than basing an intervention on observable characteristics alone. It could also be less resource intensive than implementing policy aimed at the population as a whole.

Conclusion
Recent theoretical work suggests that the composition of household portfolios should be of interest to policymakers. In particular, Bhamra and Uppal (2019) show that underdiversified household portfolios can lead to lower macroeconomic growth. Encouraging greater diversification of household portfolios may consequently yield benefits that extend beyond improving household welfare. In this paper, we make an important methodological contribution to the literature on the diversification of household financial portfolios. We apply the latent class modelling approach, based upon a count model specification, to panel data from a nationally representative sample of households in Great Britain.
Wealth accumulates over the life cycle, and financial behaviour within a population can be very diverse. Using panel data and allowing the determinants of household portfolio composition to vary across subgroups of the population therefore appears important for fully understanding the drivers of household portfolio diversification. Our results confirm this. They show that both the statistical significance and the direction of the effect of some explanatory variables vary across the four classes supported by our data, which advocates a modelling approach that can reveal such a pattern of class heterogeneity.
Our key findings include considerable heterogeneity in the effect of labour income on the number of financial assets held. Labour income influences diversification for the two middle classes, whereas it has no statistically significant effect for households in the classes with the lowest or the highest level of diversification.
Furthermore, our empirical analysis suggests that there are noticeable differences in the magnitude of the effects of some explanatory variables across the four classes. In particular, the results show that pension wealth, net wealth and being in a managerial or professional occupation are the factors most strongly associated with greater diversification, yielding interesting insights into the drivers of portfolio diversification.
The ex-post analysis moves beyond the naive measure of diversification based upon the number of financial assets, and it shows that the class ordering produced by our modelling approach is consistent with household portfolio diversification. To be specific, we examine class-specific heterogeneity through ex-post summary statistics for detailed sub-categories of assets. These statistics cover rates of holding, monetary amounts and the ratio of asset value to total household financial assets. The resulting pattern is consistent with portfolio diversification increasing across the classes.
Moreover, we compare the statistical performance of the estimators typically used in the literature with that of our latent class approach, and the approach we adopt strongly dominates on the information criteria. This suggests that treating the population as a single homogeneous group when analysing household financial behaviour may lead to biased parameter estimates, and that policy based on such models could be inappropriate.
Finally, policy can be targeted at specific groups of interest by splitting the population according to observed behaviour and characteristics. Doing so, however, may introduce investigator bias arising from preconceived notions about how to categorise different sub-groups. The LCM approach does not suffer from this, as it is based upon latent heterogeneity. In future research, applying this type of framework more generally in the household finance literature could aid theoretical developments, as complex patterns may emerge that require novel explanations as well as appropriate policy responses.
the summary IC, where Panel A shows the IC for the pooled models and Panel B the IC for the random effects models, where the longitudinal nature of the data is taken into account. All of the IC metrics favour the panel models within each type of estimator, e.g., panel linear versus pooled linear (OLS), and also across the alternative estimators. The panel 4-class MNL latent class model dominates all alternative specifications (see Panel B). Moreover, in terms of the latent class approach, the optimal structure is found to be four classes. The Vuong test reported in Panel C also confirms that the 4-class MNL model is the optimal latent class structure amongst the competing alternatives. Tables III and A1 present the coefficients and partial effects, respectively, associated with the determinants of the number of financial assets held, evaluated at the sample means of the covariates. Although the MNL latent class approach does not impose any ordering on the expected values, we order the classes ex post according to the class expected values (EV). Hence, by definition, class 1 is characterised by a relatively low number of asset types held, at 1.61, and a relatively low level of diversification. In contrast, class 4 is characterised by a higher level of diversification, with an expected value of 4.06. The maximum number of financial assets held increases monotonically across classes 1 to 4, where the classes are ordered according to the ex-post EVs. One of the key features of the latent class approach is that the covariates are allowed to have different effects across the four classes. This unveils a more detailed picture of the determinants of household portfolio composition than the modelling approaches employed in the existing literature.
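The Vuong test referred to in Panel C compares two non-nested models through the unit-level differences in their log-likelihood contributions. The sketch below computes the basic, uncorrected statistic, which is asymptotically standard normal under the null that both models fit equally well. The paper does not state here whether it applies a correction for the number of parameters. The inputs are assumed to be per-household log-likelihood contributions from models fitted to the same households, and the function name is hypothetical.

```python
import math
from statistics import NormalDist, mean, stdev


def vuong_test(ll_a, ll_b):
    """Uncorrected Vuong statistic for two non-nested models.

    ll_a, ll_b : per-household log-likelihood contributions of models A and B,
                 computed for the same households in the same order.
    Returns (z, two-sided p-value). Large positive z favours model A,
    large negative z favours model B.
    """
    if len(ll_a) != len(ll_b) or len(ll_a) < 2:
        raise ValueError("need two equal-length sequences with at least 2 entries")
    diffs = [a - b for a, b in zip(ll_a, ll_b)]
    spread = stdev(diffs)  # sample standard deviation of the differences
    if spread == 0:
        raise ValueError("log-likelihood differences have zero variance")
    z = math.sqrt(len(diffs)) * mean(diffs) / spread
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value
```

Swapping the two models flips the sign of the statistic and leaves the two-sided p-value unchanged.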

Figure 1: Number of financial assets

The minimum (maximum) number of financial assets held is 0 (21), and the distribution is shown in Figure 1, where 80% of households hold fewer than five financial assets. Clearly, the number of assets held is not continuous. Its distribution has a kurtosis of 4.2, and the Shapiro-Wilk test for normality rejects the null at the 1% level.

…% (19%) of household heads are observed once (three times or more) in the panel. The outcome variable of interest is the number of financial assets held. It covers the following assets: savings accounts; national savings accounts; Individual Savings Accounts (ISAs); fixed term investment bonds; unit trusts; employee shares and/or share options; shares; bonds and gilts; insurance products; endowment or regular premium policies; single premium policy; and other types of investment.

Many empirical studies have explored the relationship between household portfolios and a wide range of household characteristics. These include socio-demographic characteristics such as age, education and health, and financial characteristics such as net wealth, employment status and income. Young households with low levels of financial wealth, for example, have been found to hold undiversified portfolios comprising a small number of assets (see Roche et al., 2013). Hence, our set of explanatory variables follows this literature. In terms of the covariates, $x_i$, used to model the number of financial assets held ($y_i$), we include the following head of household characteristics:
- a quadratic in age;
- whether single, never married (other marital states form the reference category);
- educational attainment, i.e., whether degree level or above, or a qualification below degree level (no education is the omitted category);
- being in good health;
- whether employed;
- occupation, i.e., managerial or professional, intermediate, small employer and own account, or lower supervisory and technical (semi routine and unemployed is the omitted category);
- having a defined benefit occupational pension;
- whether the head of household is financially optimistic or financially pessimistic (no expected change in financial position is the reference category).

In addition, a number of household characteristics are included:
- the natural logarithm of labour income;
- the natural logarithm of non-labour income;
- the natural logarithm of pension wealth;
- the natural logarithm of net wealth (defined as liquid assets plus house value minus the amount of unsecured and secured debt);
- the number of children in the household;
- the number of adults in the household (excluding the household head).

In our panel data framework, the class probabilities are constant over time for each household. We therefore follow the existing literature, e.g., Clark et al. (2005), Bago d'Uva and Jones (2009) and Greene (2018), and parameterise the model such that time-invariant covariates, $z_i$, influence the probability of being in a particular class ($q_i$). Specifying the class membership equation with time-invariant head of household controls in this way is akin to parameterising the household's fixed effect of being in each class.

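The kurtosis of 4.2 quoted for the distribution in Figure 1 is presumably the moment-based (Pearson) measure, for which a normal distribution scores 3. This is our assumption; if the figure were excess kurtosis it would be measured relative to 0 instead. The sketch below computes the moment-based measure in plain Python. In SciPy the same quantity is given by `scipy.stats.kurtosis(x, fisher=False)`, and the Shapiro-Wilk test is available as `scipy.stats.shapiro`. The function names below are hypothetical.

```python
from statistics import fmean


def pearson_kurtosis(xs):
    """Moment-based kurtosis m4 / m2**2 (equals 3 for a normal distribution).

    Uses population (biased) central moments, matching
    scipy.stats.kurtosis(xs, fisher=False) with its default bias=True.
    """
    mu = fmean(xs)
    m2 = fmean([(x - mu) ** 2 for x in xs])
    m4 = fmean([(x - mu) ** 4 for x in xs])
    if m2 == 0:
        raise ValueError("kurtosis is undefined for constant data")
    return m4 / m2 ** 2


def excess_kurtosis(xs):
    """Kurtosis relative to the normal benchmark (0 for a normal)."""
    return pearson_kurtosis(xs) - 3.0
```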
Table VI shows the mean EV for each class (as discussed above) and the proportion of households without any financial assets. The latent class approach allows for zero asset holdings in every class. This is indeed a characteristic of our data, where the minimum EV is zero across all classes. Interestingly, the proportion of households reporting zero assets does not decrease monotonically across classes (i.e., as the EVs increase),

Table I: Summary statistics

Table III: Coefficients and incidence rate ratios for number of financial assets by class expected values
EV denotes the ex-post expected value, i.e., the number of financial assets. The incidence rate ratio is given by $IRR = \exp(\beta_q)$. Standard errors are shown in parentheses. *** p < 0.01, ** p < 0.05, * p < 0.1.
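The table note defines the incidence rate ratio as IRR = exp(β_q). The sketch below applies that transformation and adds a Wald-type interval obtained by exponentiating β_q ± 1.96 × SE. The interval is our addition, since the table reports standard errors rather than intervals, and the function name is hypothetical.

```python
import math


def irr_with_ci(beta, se, z=1.96):
    """Incidence rate ratio exp(beta) and a Wald interval exp(beta -/+ z * se).

    beta, se : a class-specific count-model coefficient and its standard error
    z        : normal critical value (1.96 for a 95% interval)
    """
    irr = math.exp(beta)
    lower = math.exp(beta - z * se)
    upper = math.exp(beta + z * se)
    return irr, lower, upper
```

For instance, a class-specific coefficient of 0.1 gives an IRR of about 1.105. A one-unit increase in the covariate then multiplies the expected number of asset types held in that class by roughly 1.105, holding the other covariates fixed.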

Table VII: Ex-post summary statistics: amounts and proportion of total assets