Global household energy model: a multivariate hierarchical approach to estimating trends in the use of polluting and clean fuels for cooking

In 2017 an estimated 3 billion people used polluting fuels and technologies as their primary cooking solution, with 3.8 million deaths annually attributed to household exposure to the resulting fine particulate matter air pollution. Currently, health burdens are calculated by using aggregations of fuel types, e.g. solid fuels, as country level estimates of the use of specific fuel types, e.g. wood and charcoal, are unavailable. To expand the knowledge base about effects of household air pollution on health, we develop and implement a novel Bayesian hierarchical model, based on generalized Dirichlet–multinomial distributions, that jointly estimates non‐linear trends in the use of eight key fuel types, overcoming several data‐specific challenges including missing or combined fuel use values. We assess model fit by using within‐sample predictive analysis and an out‐of‐sample prediction experiment to evaluate the model's forecasting performance.


Introduction
In 2017, an estimated 3 billion people, or 39% of the global population, used a solid fuel (charcoal, coal, crop residues, dung or wood) or kerosene as their primary fuel for cooking. This results in the emission of dangerous levels of pollutants, including fine particulate matter, PM 2:5 , and carbon monoxide (World Health Organization, 2014). The World Health Organization (WHO) has estimated that about 3.8 million deaths per year world wide can be attributed to pollution from household cooking (World Health Organization, 2018a). This harm is compounded by the burden on people-notably women and children, who must dedicate large amounts of time to fuel collection which might otherwise be spent on education or work-and the risk of burn injuries.
To address this leading cause of disease and premature death in low and middle income countries, the '2030 agenda for sustainable development', which has been adopted by all United Nations member states, set targets of universal access to clean fuels and technologies for cooking (sustainable development goal (SDG) 7.1.2) and to reduce substantially the number of deaths from the joint effects of ambient and household air pollution (SDG 3.9). Although there have been improvements in the proportion with access to clean fuels and technologies in some regions, globally these have been largely outpaced by population growth. This means that the absolute number of people without access to clean fuels and technologies has stagnated, decreasing only by 3% between 2000 and 2017. As a result, the world is projected to achieve only 74% clean fuel use by 2030 under current policy scenarios (SDG 7 Custodial Agencies, 2019).
In 2016 the World Health Assembly adopted a roadmap consisting of four priority areas of action to tackle the health risks of air pollution, notably 'expanding the knowledge base about impacts of air pollution on health' (World Health Organization, 2016). Currently, the WHO publishes estimates of 'polluting fuel use' and 'clean fuel and technology use', representing the combined use of all polluting fuels and all clean fuels and technologies respectively, for SDG monitoring. Here 'use' is defined as the proportion of people primarily relying on a given fuel or technology for cooking. In addition, the WHO at present assumes that 'clean fuel use equals clean fuel and technology use', because of the limited availability of data on the types of stoves that are used for cooking and the current absence of any scalable biomass stoves which can be considered 'clean' for health. These estimates are available for most countries, separately for urban and rural areas where fuel use trends often differ systematically, and for each year between 1990 and 2017. Conventionally, these estimates then serve as a practical surrogate for estimating the global burden of disease that is associated with using polluting fuels for cooking (Bonjour et al., 2013). However, basing estimates of health effects on the combined use of polluting fuels fails to take into account variation in the risks that are associated with different fuels and technologies. Recently, Shupler et al. (2018) introduced a method for estimating exposure for several specific types of fuel that takes into account variation in exposure between countries. Despite this, global burden-of-disease estimates based on the use of specific fuels remain unavailable, as this would also require global estimates of specific fuel use.
In this paper, by developing and implementing a novel model for the use of eight specific types of fuel, we make a substantial contribution to the expansion of the knowledge base on the effects of household air pollution. Our aims are (a) to estimate trends in specific fuel usage, together with coherent estimates of uncertainty, (b) to provide meaningful estimates of individual fuel usage for countries where data are limited and (c) to predict present day fuel usage, addressing lags in data collection, and to project estimated trends into the future.
Trends in the use of specific types of fuel are modelled together with survey sampling variability, which may vary between urban and rural areas and by country. Where data for a given country are limited, the model structure can derive information from regional trends. The model allows for different fuel use trends in urban and rural areas and can produce predictions (with associated uncertainty) of future use of various types of fuel, providing policy makers with a baseline against which they can evaluate the effectiveness of future interventions.
The remainder of the paper is organized as follows: Section 2 provides details of the available data and the proposed modelling approach, including the implementation of a model using Markov chain Monte Carlo (MCMC) sampling; Section 3 presents posterior predictive model checking and a future forecasting experiment; finally, Section 4 provides an overall summary and a concluding discussion of the model's impact.
The programs that were used to analyse the data can be obtained from https://rss.onlinelibrary.wiley.com/hub/journal/14679876/seriesc-datasets.

Methodology
Information on the types of technologies and fuels that are used by households for cooking is regularly collected in nationally representative household surveys or censuses and compiled in the WHO household energy database (World Health Organization, 2018b). At mid-2019, the database contained over 1100 surveys, with over 150 countries having at least one survey over the period 1990-2017. For each survey, the database contains the proportion of surveyed households using as their primary cooking fuel each of 10 key types: biogas; charcoal; coal; crop residues; dung; electricity; kerosene; liquid petroleum gas (LPG); natural gas; wood. Over the period 1990-2017, the average number of surveys per country per year was around 0.3. Even if survey coverage were far greater, survey sampling variability means that individual surveys would still not be a reliable indicator on which to base policy decisions. Statistical models can be used to separate trends from sampling variability, while also enabling uncertainty in the trends to be appropriately quantified. Information from other sources, such as economic or social indicators, can also be included to allow for more reliable inference in countries with few surveys. For example, Rehfuess et al. (2006) used regression methods to quantify the association between solid fuel usage and a number of socio-economic factors to predict usage in countries where no data were available. An alternative source of information which can be exploited by statistical models is that the proportion of people using each type of fuel as their primary cooking fuel tends to be more similar, on average, between countries in the same region, than between countries in different regions. Fig. 1 illustrates differences in wood use by WHO region, with smooth density estimates of the proportion of households using wood as their primary cooking fuel, from surveys in years from 1990 to 2010. For example, the density estimates suggest that the use of wood is more prevalent in African countries than in European countries over this period.
The main modelling approach behind WHO monitoring of worldwide clean cooking fuel use was that of Bonjour et al. (2013) in the years leading up to 2018. Trends in overall solid fuel use (and more recently polluting fuel use) were estimated by using a multilevel (mixed effects) modelling approach. Unlike earlier regression-based approaches , Bonjour et al. (2013) did not include any covariates (e.g. national income). Instead, they relied exclusively on regional structures and smooth functions of time to estimate fuel use. To the best of our knowledge, the work that we present here constitutes the first major effort to estimate trends in the use of specific fuels for cooking. With that in mind, using data from the WHO household energy database to estimate trends in specific fuel use presents some challenges that are related to inconsistencies in both the quality and the quantity of information that is available from the surveys. We specifically address four of these issues in our modelling approach.
(a) Many surveys report fuel values which are in some sense incomplete. This often includes combining more than one specific type of fuel (e.g. LPG and natural gas) into a single option in the survey (e.g. gas). In some cases this can arise because cultures and/or languages have a single term which includes several distinct types of fuel (e.g. the French language term 'charbon' which can include both coal and charcoal). Another common problem is inconsistency in how subfuels are categorized: for example, grass may be included in the crop residues category in one survey and in the dung category in another. Other, less common, issues include non-exhaustive lists of individual fuel options, with key fuels included in an 'other' category, resulting in missing values for those fuels. These issues mean that the time series of survey values for some fuels in some countries can be highly unstable. (b) The total number of respondents is available for only approximately 50% of surveys in the database. For surveys where this information is not available, only the proportions using each fuel are given and the original counts (the number of respondents using each fuel) are non-recoverable. (c) Information on trends in the use of specific fuels is required for both urban and rural areas but, in many cases, surveys provide data for only the overall population.

Generalized Dirichlet-multinomial model
For clarity of exposition, the following explanation relates to y i , the number of respondents in a survey using fuel type i as their primary fuel for cooking, ignoring for now any indices that are related to the country and the year. If we knew the total number of survey respondents n for all data, a first approach to modelling could be to assume that data on y = {y i } arise from a multinomial.p, n/ distribution. Then p i would represent the proportion of people in the population using fuel i. This assumes that the survey sample is representative of the overall population. In reality, survey samples are imperfect and the multinomial model may not be sufficiently flexible to capture the extra variability that is caused by flaws in the survey design. For instance, the survey may not cover the whole geographical area of interest. A flexible extension of this approach is to model y by using a generalized Dirichlet multinomial.α, β, n/ (GDM) distribution (Zhang et al., 2017): a mixture of the generalized Dirichlet (GD) model with probability density function p.p 1 , p 2 , : : : , p k |α, β/ = p .1/ and the multinomial distribution, so that p ∼ generalized-Dirichlet.α, β/; y|p ∼ multinomial.p, n/: .2/ The marginal probability mass function of the GDM is then p.y 1 , y 2 , : : : , y k |α, β, n/ = Γ.n + 1/ Γ.y k + 1/ .3/ Any additional variability that is caused by non-representative sampling can be potentially captured by the GD component. The GD model also has a very flexible covariance structure compared with the Dirichlet model (for example it can allow for positive covariance between elements of p (Wong, 1998)), which it reduces to in the special case that β i = α i+1 + β i+1 for i ∈ 1, : : : , k − 2 and β k−1 = α k . The GDM model has strong potential as a flexible regression framework for multivariate count data (Zhang et al., 2017) that sum to a total. For instance, Stoner and Economou (2019) used the GDM to model reporting delays in time series of infectious disease counts. Nevertheless, the GDM model has seen little use in the modelling literature. Its use here is novel as a model for estimating trends and regional hierarchies in compositional data and for quantifying survey variability.
Recall from challenge (b) at the start of Section 2 that, for around half of the available data, only the proportion x = {y i =n} of respondents using each fuel is available, with the total number of respondents n being unknown. This means that we cannot use the GDM to model directly the number of respondents primarily using each fuel, if we wish to use all of the available data. However, as the principal interest lies in estimating or predicting trends in the fuel usage proportions x, an alternative approach would be to model the proportions themselves, e.g. by using a GD distribution. In that case, though, the presence of many 0% and 100% fuel usage observations (which fall outside the range space of the GD model) make this impractical. Instead, we opt for an approximate procedure for modelling x-which may also be applicable to other compositional data with a 0-1 inflation problem-where we transform observations of x i into conceptual counts v i , out of a chosen total N. To ensure that the sum of the transformed counts does not exceed N, one can compute v i = Nx i (using the floor function, as opposed to rounding). The counts v can then be modelled as GDM.α, β, N/, so that predictions are based on v i =N. The idea behind this is that the flexibility of the GDM model means that we can still capture the distribution of x well: any variability that is lost or gained from the multinomial component, by respectively using a larger or a smaller N compared with the original n, can be accounted for by appropriate adjustment in the parameters of the GD component. From a modelling perspective, this amounts to estimating GD parameter values that adequately capture the total survey variability arising from finite and non-representative sampling. To utilize the original n appropriately where they are known, we would need to estimate two separate sets of GD parameters: one set for when n is known, to capture survey sampling variability minus that arising from finite sampling, and a second set for when n is unknown, to capture all survey sampling variability. Alternatively, although not the approach that maximizes the available information in the data, treating all n as unknown affords a significant reduction in the number of parameters to estimate. We therefore choose to treat all n as unknown on the basis of practicality, subject to thorough model checking in Section 3 and Appendix C.
In Appendix A, we present a simulation study using the observed sample sizes n from the data. Indeed, we illustrate that this approximate method (where all n are assumed unavailable for modelling) yields an inference for the populationwide fuel usage which converges (as N increases) to the inference that is obtained by modelling y directly. Our simulation experiment suggested that values greater than N = 10000 are probably sufficiently large, so we conservatively opt for N = 100000. This results in a virtually zero contribution to the variability of v=N from the multinomial component, bearing in mind that the GD component can absorb any additional variation that is associated with smaller sample sizes.

Tiered approach
To motivate the way in which we shall employ the GDM for these data, it is instructive to consider  . 5/ Predictions for aggregate groups, e.g. solid fuels, can then be achieved by aggregating predictions for the individual fuels. However, recall that one of the key challenges with modelling these data, challenge (a), is inconsistency in data collection. For example, some surveys combine more than one type of fuel (e.g. charcoal and coal) into a single category. Furthermore, there is sometimes inconsistency in the way that surveys categorize subfuels (e.g. grass). The result of this issue is that, for some countries, the time series of affected individual fuels are unstable. As such, modelling the use of all individual fuel types with one GDM distribution (as in expression (4)) will adversely impact estimates for the mean trends, sampling variability and any associated uncertainty, not just for affected fuels but for the other fuels as well, owing to the multivariate nature of the model and the data. Fortunately, as they are the result of 'confusion' among certain types of fuel, these issues can be resolved by aggregating individual fuels into more general types of fuel. For example, confusion between wood, crop waste and dung can be resolved by aggregating data for these fuels into the more general category 'biomass' (which in this paper includes raw or unprocessed biomass fuels but excludes charcoal), and any outstanding confusion between charcoal and coal or between charcoal and wood can be resolved by aggregating into 'solid fuels'. Similarly, LPG and natural gas are very commonly combined at the survey level, which can be recognized by the formation of a 'gas' aggregate category.
This motivates the adoption of a tiered approach, where the use of the most aggregated fuel categories (e.g. solid fuels and gaseous fuels) are modelled as a GDM distribution at the 'top' tier (note that the tier does not relate to the merits or abundance of each fuel, only how we organize the fuels for modelling purposes), alongside other fuels that are unlikely to be confused or combined (e.g. kerosene and electricity) and an aggregation of other minor fuels and technologies (e.g. alcohol and solar stoves): .6/ This ensures that any instabilities arising from erroneous convolution of individual fuel types, e.g. charcoal and coal, does not propagate into the other fuel categories in the top tier. These categories can then be progressively disaggregated through nested GDM models. As in some countries there is a convolution between biomass fuel types (e.g. wood and crop waste), fully disaggregating solid fuels means that, in these countries, predictions for charcoal and coal will still be needlessly impacted. To address this, a 'mid'-tier is introduced to aggregate the biomass fuel types and to model these alongside charcoal and coal: The biomass fuel types can then be disaggregated in the 'lower' tier with a third GDM model: We could then disaggregate 'gas' into the three individual gaseous fuels with a fourth GDM model (a parallel mid-tier). This is, however, not essential for our application (estimating population exposure to household air pollution) as the difference between the different gaseous fuels in terms of pollutant concentrations is minimal compared with the difference between the gaseous fuels and the polluting fuels (World Health Organization, 2014). The upper, mid-and lower tiers are implemented together in a single Bayesian hierarchical model, so that uncertainty and variability are propagated both within and between tiers. Following this approach, the result is that a joint predictive inference for eight individual fuel types is achieved, but in a way which prevents inconsistency in particular types of fuel from affecting the others.

Conditional models
Recall that an additional challenge, challenge (a) at the start of Section 2, is that occasionally a value x i (and thus v i ) is missing for at least one individual fuel (for a given country-year combination). To model these data in a way that easily allows prediction of the missing fuel values, we implement each GDM distribution (from the three tiers) by using the implicit conditional mass functions rather than the joint mass function. Specifically, for counts v and total N, the conditional distribution of (fuel) v i given the others is Fitting this model in a Bayesian setting implies that any missing values v i can be sampled by using MCMC sampling. Furthermore, for ease of interpretation we reparameterize the conditional distributions in terms of their expectations ν i and dispersion parameters φ i : The relative mean ν i is interpreted as the expected proportion of households using fuel i out of those not using any of the fuels higher up the hierarchy .1, : : : , i − 1/. For example, in the top tier GDM model ν 1 is the expected proportion who use solid fuels from the whole population, ν 2 is the proportion who use kerosene from the population who do not use solid fuels and ν 3 is the proportion who use gas from the population who use neither solid fuels nor kerosene. Through parameter φ i , the model can compensate for any gain or loss of variance in the conditional multinomial model for v i that is caused by the introduction of the 'artificial' total N. For more interpretable inference, the marginal mean vector of proportions μ = {μ i } of households relying on each fuel i can be recovered from the relative means ν i : .12/

Country and regional models
Introducing indices for a survey that is conducted in area j (1, urban; 2, rural) of country c and in year t, the characterization of the relative mean ν i,j,c,t is defined by where the logistic transformation ensures that ν i,j,c,t ∈ .0, 1/. Here we characterize functions f as linear combinations of an intercept term, a linear term and non-linear thin plate spline terms: . 14/ Here X is a model matrix of spline terms, where X t,1 is linear in time and X t,2 , : : : , X t,K are nonlinear thin plate terms, and ξ 0,i,j,c , : : : , ξ K,i,j,c are the corresponding coefficients. The choice of the number of basis terms K must be made a priori, which corresponds to an upper bound on flexibility (similarly to choosing a number of polynomial terms). For larger K the functions are penalized for smoothness parametrically. Here we choose K = 10: approximately one basis term for every 3 years, which we found was sufficiently high not to limit substantially the flexibility of the trends a priori. All coefficients are modelled as random effects, whose prior distributions have expectations (characterized as fixed effects) that vary with the region that the country is in (denoted by region index r.c/). Several choices are available for regional classifications, such as the six WHO regions, the 21 global burden of disease (GBD) regions and the seven GBD superregions (Shaddick et al., 2018), or the eight SDG regions. For a chosen regional classification, the country random effects are modelled as .17/ The regional parameters (e.g. γ 0,i,j,r.c/ ) are then modelled as thin plate spline (fixed effect) coefficients. It can be shown that this model is equivalent to the additive combination of a regional level spline and a country level spline, where the former is a mean trend whereas the latter captures country level deviation from this mean. The advantage of this characterization over explicitly separating the country and regional effects into two different splines, however, is improved MCMC efficiency. Each precision matrix Ω −1.ξ/ i,j,c is known (Wood, 2016), and scaled by parameter λ .ξ/ i,j,c which penalizes the thin plate spline (specifically the deviation of the country spline from the regional spline) for smoothness, to avoid overfitting. Each log.λ .ξ/ i,j,c / is modelled as a random effect arising from an N.η .ξ/ i,j , σ 2.ξ/ 2,i,j / prior distribution.
The purpose of treating the coefficients ξ i,j,c and smoothing parameters λ .ξ/ i,j,c as random effects is to improve prediction in countries with sparse data, where regional trends and global hyperparameters can constrain the overall country effect to be not too extreme with respect to other countries in the same region. We trialled this approach using the six WHO regions and alternatively the 21 GBD regions. Using the six WHO regions, we found that they encompassed too broad a range of fuel use patterns to be particularly useful for improving prediction in countries with little data. Conversely, when using the 21 GBD regions we found that they often contained too few countries, or had too many countries with little data, to allow the precise estimation of regional trends.
To address the issues that were posed by this choice, we opted for a nested model which utilizes both GBD regional structures and GBD superregional structures (denoted by index s.r/): .20/ The regional thin plate spline coefficients (e.g. γ 0,i,j,r.c/ ) are now also modelled as random effects, with superregional expectations. Each precision matrix Ω −1.γ/ i,j,r is, as before, a known matrix scaled by parameter λ .γ/ i,j,r , to penalize for smoothness (of the deviation of the regional trend from the superregional trend). The penalty parameters λ .γ/ i,j,r are once more modelled at the log-scale as random effects, arising from an N.η i,j,s , whose prior distribution is N.0, 10 2 / at the logscale. Now each f i,j,c .t/ is equivalent to the additive combination of a superregional spline, a regional deviation spline and a country deviation spline. By adopting a nested regional structure, countries with little data benefit from more precise regional trend estimation, borrowing information from other countries in the same GBD region that have sufficient data. Failing that, further borrowing is achieved from the superregional trend.
Having specified models for the relative mean proportions ν i,j,c,t , it remains to define a model for the dispersion parameters φ i,c . Recall that these parameters are intended to capture additional survey variability (compared with the multinomial model), which is affected by the introduction of an 'artificial' sample size N. In countries with little data, we would like to constrain survey variability to reasonable levels, so we choose φ i,c as random effects which vary by country: .24/ In the absence of any belief that survey variability should be regionally structured, the random effects are constrained by global hyperparameters η .φ/ i,j and σ 2.φ/ i,j .

Urban and rural variability
A further challenge with modelling these data, challenge (c) at the start of Section 2, is that, although most surveys in the data report both urban and rural values, some report only an overall value for the whole sample. So that these surveys can inform the estimation of urban and rural trends, we incorporate a layer in the model which relates the marginal mean proportions of urban, rural and overall values as follows: .26/ The overall mean fuel usage proportions μ overall c,t are then defined as a weighted sum in equation (25), of the mean rural and urban proportions. The weights π c,t ∈ .0, 1/ represent the mean proportion of survey respondents living in an urban area, in country c and year t. To capture structured demographic variability between countries and over time, United Nations (UN) estimates (United Nations, 2018) of the proportion of people living in an urban area for each country and year, P c,t , are included as offsets in the model for π c,t . For each country, any remaining structured variability in the urban proportion is modelled by using a smooth function g c .t/. These functions should ideally be sufficiently flexible to capture the mean urban proportions well. However, from a modelling perspective, they also introduce extra degrees of freedom to capture the overall survey observations well. Therefore, to avoid overfitting, we once again employ penalized thin plate splines (with coefficients κ 0,c , : : : , κ K,c ) for g c .t/: .27/ is, for one final time, a known matrix, scaled by a penalty parameter λ .κ/ c for smoothness. Then, log.λ .κ/ c / ∼ N.η .κ/ , σ 2.κ/ 2 /. Unlike the splines for ν i,j,c,t , the prior expectations are zero, as opposed to regional or superregional. This is because we have no prior belief that residual deviation from UN estimates in the sampling of urban respondents should be regionally structured. Employing thin plate splines here enables g c .t/ to capture non-linear deviations from P c,t over time, but only when there is ample evidence in the data for a given country.

Robustness to outliers
In addition to the main data-specific modelling challenges that were highlighted in Section 2.1, the database contains some recorded values which truly defy the observed trend in their country. These values often cannot be explained by normal survey variability alone and can have an undue influence on the estimated trend if treated like ordinary observations. Although the beta-binomial conditional models that we employ are already more robust to outliers than equivalent binomial models, severe outliers can still cause issues, including causing the estimated trend to deviate substantially from other surveys to be closer to the outlier, or the overestimation of survey variability.
To address this problem, we model each observation as arising from a mixture distribution, which combines the beta-binomial conditional model with a discrete uniform distribution. The extent to which the model is either beta-binomial or uniform is controlled by the mixing parameter ρ. As ρ approaches 0, the mixture becomes beta-binomial and vice versa: . 31/ This approach effectively enables the model to decide, given sufficient evidence in the data, whether or not a survey observation could plausibly have arisen from the same model as other nearby (in time) surveys for that country and area. The degree of evidence that is required can be controlled through the prior distribution that is specified for each ρ. For example, a strong prior distribution with most of the probability mass close to 0 for each ρ corresponds to a strong belief that each survey value is very unlikely to be an outlier.
For this application, we introduce one ρ for each unique survey. This means that, if a survey has an urban, rural and an overall value, a single ρ controls the extent of mixing for all three. The reason for this is that if, for example, the model indicates that an urban value is a very severe outlier, we would prefer also to reduce the effect of the corresponding rural value on estimated trends and uncertainty. Including this layer in the model means that estimated trends are considerably more robust to outliers, as we shall highlight in Section 3.1. Additionally, predictions for ρ are useful as an indicator to flag surveys efficiently that may warrant further investigation.

Prior distributions and implementation
For all hyperparameters η which are the mean of a normal distribution (e.g. η .ξ/ i,j ), we specified non-informative N.0, 10 2 / prior distributions. For all hyperparameters σ which are the standard deviation of a normal distribution (e.g. σ .ξ/ 0,i,j ), we specified non-informative positive-truncated N.0, 10 2 / prior distributions.
All code was written and executed by using R (R Core Team, 2018) and the model was implemented by using NIMBLE (de Valpine et al., 2017), which is a facility for highly flexible implementation of MCMC models. For this application, we needed to add the beta-binomial distribution to NIMBLE, which was straightforward with only a few lines of R code. Four MCMC chains were run for 80000 iterations from different randomly generated initial values and with different random-number generator seeds. The first 40000 samples were discarded as burn-in and, to limit system memory usage, the remaining samples were thinned by 10. Convergence of the MCMC chains is discussed in Appendix B. The model was applied to a subset of the data consisting of 1084 surveys and predictions were made for all countries with at least one survey (after selection). Survey selection criteria are discussed in Appendix D. Associated NIMBLE model code is available from https://rss.onlinelibrary.wiley.com/hub/journal/14679876/ series-c-datasets and data are available on request.

Prediction
The posterior predictive distribution for a new set of fuel use proportions in area j (1, urban; 2, rural; 3, overall), x j,c,t , given all of the observed fuel use proportions x, is given by p.x j,c,t |x/ = p.x j,c,t |μ j,c,t , φ c /p.μ j,c,t , φ c |x/dμ j,c,t dφ c : . 32/ Here p.x j,c,t |μ j,c,t , φ c / represents the GDM likelihood for counts v j,c,t , where x j,c,t = v j,c,t =N, given mean fuel use μ j,c,t and variance parameters φ c . Also, p.μ j,c,t , φ c |x/ represents the joint posterior distribution for μ j,c,t and φ c , given the data x. Parameters μ j,c,t and φ c are composed of country, regional and superregional effects, meaning fuel use predictions may be correlated between countries in the same region or superregion. The joint posterior p.μ j,c,t , φ c |x/ is available to us in the form of MCMC samples, meaning that equation (32) can be evaluated by using Monte Carlo simulation. Specifically, for each MCMC sample of μ j,c,t and φ c we can simulate a new vector of fuel use counts v j,c,t from the GDM with mean μ j,c,t , variance parameters φ c and total N, before calculating x j,c,t = v j,c,t =N. This results in a set of samples of x j,c,t |x, which is a Monte Carlo approximation of the posterior predictive distribution.

Model checking
The task of assessing the validity of the statistical model is divided into two parts: basic procedures to check that there are no systematic issues with reproducing the observed data and a forecasting experiment to evaluate the ability of the model to predict future fuel usage values.

Posterior predictive checking
Given the Bayesian implementation of the model, assessing the fit to both in-sample and out-ofsample data is based on posterior predictive model checking (Gelman et al., 2014). For in-sample data, this involves simulating replicatesx|x of the observed fuel proportions x from the posterior predictive distribution, using the approach that was described in Section 2.8. The statistical properties of these replicates can then be compared with properties of the corresponding observations. For brevity, we present predictive checking for solid fuel use in this subsection and for all the other fuel types in Appendix C.
In the first instance, scatter plots comparing the posterior means of the replicates with the observed values can give an indication of any systematic issues. These are shown for solid fuels in Fig. 3 and, for the most part, there are no obvious systematic problems. Also shown are coverage values: the proportion of observed solid fuel use values which lie within the 95% posterior predictive intervals, computed from the corresponding replicates. A coverage that is substantially lower than 95% would mean that a high proportion of observed values are extreme values with respect to the posterior predictive model, implying a poor fit. In this case, the coverage values for the 95% credible intervals were higher than 95% for all fuels and areas. Taken together, these two checks indicate that the model captures the observed data well. Another way of checking the model is to compare predicted trends with survey observations on an individual country basis. Fig. 4 shows the median predicted proportion by using each fuel in each segment (urban, rural and overall) of India, with associated 95% posterior predictive intervals. Here it can be seen that the predicted trends follow the observed trends well, with prediction intervals that envelop a reasonable number of surveys. Moreover, by examining the tightness of the prediction intervals with respect to the variance of the observations, we can  value is likely to be an outlier, such that the estimated trend and variability are not adversely affected. This illustrates the effectiveness of incorporating mixture distributions (as described in Section 2.6) in making the model more robust to outliers.
To check whether the model reproduces the observed data well, the overall predictions in Figs 4 and 5 incorporate the model's prediction of any systematic deviation g c .t/ from the UN estimates of urban and rural proportions, in the sampling of urban and rural respondents. If desired, predictions of overall fuel usage can instead be based solely on the UN estimates of urban and rural proportions (rather than based on the proportions in the surveys). This is achieved by removing g c .t/ from equation (26)  We can also inspect the model's ability to capture structured between-country and temporal variability in the proportions of urban and rural respondents in the survey samples: Fig. 6 shows the proportion of (unweighted) respondents recorded as urban in the fuel surveys for Kenya ( Fig.  6(a)) and Malawi ( Fig. 6(b)) compared with UN estimates and predicted values from the model. The plot for Kenya shows evidence that the proportion of urban respondents in the surveys is, on average, higher than the UN estimates (g c .t/ > 0). The plot for Malawi, meanwhile, shows limited evidence of any systematic deviation (g c .t/ ≈ 0). In both of these cases, the spline that is incorporated in equation (26) appears to capture any remaining structured variability (or the lack thereof) well, enabling reliable prediction of urban and rural trends where surveys provide values for only the overall population.

Forecasting experiment
The model's ability to predict (forecast) fuel usage beyond the range of the data can be assessed by using out-of-sample predictive testing. This is important to validate the model's use for predicting present day fuel use, as there is a lag in data collection of 1-2 years, and for projecting estimated trends into the future, to provide a baseline against which the effects of interventions can be compared. To emulate a hypothetical forecasting scenario, the model was fitted only to surveys up to and including year 2012, therefore excluding 5 years (approximately 22% of the data). We then used the model to predict 5 years into the future and to produce predictive distributions for the out-of-sample surveys. As it is not our primary interest to forecast how any systematic trends in the sampling of urban and rural respondents will progress in the future, we focus on checking the out-of-sample prediction of urban and rural surveys. Fig. 7 shows scatter plots comparing the out-of-sample survey values with the mean predicted values from the model. Although there are some values which are not captured well (some potentially because of errors in data entry), generally the model does not seem to overpredict or underpredict systematically. Notably, the coverage values tend to be quite high, indicating that the model produces reliable uncertainty estimates when predicting into the future. To guard against high coverage values through unreasonably uncertain prediction intervals, we can assess the model's performance when forecasting by examining predictive plots for individual countries. Fig. 8 shows predictive fuel usage plots for Ghana, from the model where surveys from 2013 onwards are excluded. Here, the surveys removed are generally well within the 95% predictive intervals, which grow reasonably larger for predictions further into the future but are not so wide that they are impractical.

Discussion
Currently, the health burdens that are associated with exposure to air pollution from the use of polluting fuels for cooking are assessed on the basis of groupings of types of fuel (i.e. solid fuels or polluting fuels). However, this fails to take into account changes in the use of specific types of fuel that may affect the health impacts. For example, the results of the analyses that were performed here suggest that over the last few decades a substantial proportion of urban households in sub-Saharan Africa have switched from raw biomass fuels (i.e. wood, crop waste and dung) to charcoal, which has very different emissions characteristics. To expand the knowledge base about the effects of air pollution on health, burden-of-disease calculations should instead be based on the use of specific fuels, but until now country-specific estimates of specific fuel usage have been unavailable.
To address this, we have developed and implemented a novel multivariate hierarchical model for specific types of fuel which aims (a) to estimate trends and associated measures of uncertainty, for specific fuels, for every country, and separately for urban and rural areas within a coherent modelling framework, (b) to provide meaningful estimates in countries where there is limited data and (c) to forecast fuel usage up to the present day and into the future.
Based on GDM distributions, the global household energy model automatically constrains the proportions of populations using each of eight key types of fuel ensuring that their sum does not exceed 1. Set within a Bayesian modelling framework, parametric and predictive uncertainty is quantified (e.g. by 95% prediction intervals) and verified by using within-sample posterior predictive checking (see Section 3.1). Where data availability is limited within a country, the model can 'borrow' information from neighbouring countries by using nested country, regional and superregional random effects, reducing predictive uncertainty. The model can forecast a number of years beyond the extent of the data, with assessment of forecasted values performed by using an out-of-sample predictive experiment (see Section 3). This allows present day fuel use to be evaluated, as data collection lags behind by 1-2 years. In addition, fuel use predictions for future years provide a baseline representation of what might be expected in the absence of intervention, with which future surveys that are conducted post interventions can be compared.
In achieving these aims, the model overcomes several challenges that are associated with using these survey data: (a) inconsistency in survey design and collection, together with missing values, which can lead to highly unstable time series for some individual fuels in some countries; (b) the total number of respondents is unavailable for around half of surveys; (c) for many surveys, fuel use values are not available separately for urban and rural areas.
To address challenge (a), we adopted a tiered approach (Section 2.2) where we first modelled combined fuel use (e.g. solid fuels), which is progressively disaggregated into the component fuels. This ensures that excess variability and uncertainty among 'confused' fuels does not propagate into those which are unaffected, and that predictions for the aggregate quantities are stable. To address the problem where the total number of respondents is unknown, challenge (b), we approximate a GDM model for the number of respondents, transforming the proportions by using each fuel into counts from an artificial sample size (Section 2.1). We illustrate that this results in approximately the same inference for populationwide fuel usage as modelling the (unavailable) number of respondents in their original count form, through a simulation experiment (which is presented in Appendix A). We addressed the unavailability of information on separate urban and rural use of fuel for all surveys, challenge (c), by including a layer in the model which links the urban, rural and overall fuel use values for each survey. Structured between-country and temporal variability in the proportion of urban respondents was then accounted for by combining UN estimates with smooth functions of time for each country.
Finally, in addition to addressing these data-specific challenges, mixture distributions were employed to make the model more robust to potential outliers.
To date, the model has been adopted by the WHO to produce estimates of the proportion of people in each country who rely on polluting fuels as their primary fuel and technology for cooking and has played a central role in monitoring SDG 7.1.2 (SDG 7 Custodial Agencies, 2019). It has also played an important role in identifying data that appear to be out of line with general country level patterns for further investigation. Ultimately, the modelling approach proposed provides policy makers with decision quality information and enables a ground breaking reassessment of the health effects of cooking with polluting fuels and technologies. For each available n i (i = 1, : : : , 598), we simulate a vector of survey responses y i = {y i,1 , y i,2 , y i,3 , y i,4 } from a GDM model. Here, each country has a different (time constant) marginal mean vector μ c and variance parameters φ c (preserving the original associations between the countries and observed n i in the data, and ignoring countries with no observed n i ). Some countries will have only one y i and others will have several (each with its own unique n i ). We simulate all of the μ c from a Dirichlet(1) distribution, and all the φ c independently from a gamma(4, 0:1) distribution (inducing a moderately high degree of overdispersion, compared with the multinomial model): In the baseline scenario, against which we shall compare our approximate method, we have observations for all the n i and all the y i . This enables us to implement the above model directly, which we do in a Bayesian setting using a Dirichlet(1) prior for each μ c and a non-informative exponential(0.001) prior for each φ c .
In the second scenario, we do not know any of the n i or the y i , but we do have observations for x i = y i =n i . In this scenario, we can apply our approximate method (from Section 2.1), where we fit the GDM to constructed counts v i = Nx i . We proceed to apply this method while varying N over a range of values (10,20,30,50,100,300,1000,3000,10000,30000,100000,300000,1000000), so that we can investigate the effect of this choice on parameter inference.
Recall that in our application we are primarily interested in correct inference for the marginal mean proportions μ c (the populationwide fuel use in each country), and we claimed that a sufficiently large choice of N yields a parameter inference that is approximately the same as if we had modelled the y i directly, along with the sample sizes n i . To assess this, we begin by examining the models' accuracy when predicting the true marginal mean proportions μ c . For each posterior sample, we can compute the mean-squared error between the predicted values of μ c and the true values. Fig. 9(a) shows the median of this statistic, for varying N, as well as the interquartile range (dark), and 95% prediction interval (light). Compared with the same statistics for the baseline model, which are shown as horizontal lines, we can see that the distribution of mean-squared errors for the approximate method does indeed converge to the baseline model as N increases, from about N = 10000 onwards.
We can also examine how the approximate method quantifies uncertainty in μ c . For each individual μ 1, c , : : : , μ 4, c , we compute the standard deviation of the posterior samples. The median of these posterior samples are then shown for each N in Fig. 9(b), once again alongside the interquartile range and 95% interval. The distribution of posterior standard deviations for the approximate method also converges to the baseline model, but it does so for a much lower N (between 100 and 1000) than does the mean-squared error.
Finally, if we choose a single value of N, we can compare more closely the approximate method with the baseline model when estimating μ c . Fig. 10  for the μ 1, c , : : : , μ 4, c , from the approximate model with N = 10000, with the quantiles from the baseline model. The quantiles are virtually identical, suggesting that for these simulated data the same inference for μ c would be achieved either by modelling the true counts y i directly or by modelling the constructed counts v i = 10000x i .