Small area prediction of seat‐belt use rates using a Bayesian hierarchical unit‐level Poisson model with multivariate random effects

The Iowa Seat‐Belt Use Survey is an annual survey designed to provide estimates of seat‐belt use rates for the state of Iowa in the United States. A desire for county level (substate) estimates motivates the need for small area estimation. Developing a small area model for the seat‐belt survey data is challenging for two mean reasons. First, the data consist of multivariate counts. Second, the same sampling units are observed for five different time points. An appropriate model should reflect multivariate dependencies and the longitudinal data structure. We address these challenges though a unit‐level Bayesian hierarchical model. The observed counts have Poisson distributions. Latent random effects capture multivariate associations and correlations among the observations for the same sampling unit observed at different time points. We employ the posterior predictive distribution for model comparisons. Using the selected model, we construct small area predictors of two measures of seat‐belt use at the county level for 5 years.

types of counts may be correlated.Correlations across time may also be important.Therefore, the seat-belt survey application necessitates a model for multivariate count data observed over time.
We reflect the complex dependence structures in the seat-belt survey data through a Bayesian unit-level small area model.We assume that the collected counts for a road segment are realizations from Poisson distributions.We capture correlations among the four types of counts and the five time-points through latent random effects that have multivariate normal distributions.The model is in the category of "unit-level" small area models because the response variable in the model is the observed vector of counts for an individual road segment.In contrast, for "arealevel" small area models, the response variable would be a direct estimator for a county.
Existing small area estimation procedures for count data are not immediately suitable for the seat-belt survey application.Trevisani and Torelli (2017) review hierarchical Bayesian models for count data, focusing on area level models.One of the models that they consider is a univariate version of the multivariate area-level model used by Ferrante and Trivisano (2010) for the number of individuals recruited by business establishments.Boubeta et al. (2016) study empirical best predictors under an area level Poisson model, and Boubeta et al. (2017) extend Boubeta et al. (2016) to include time effects.DeSouza (1992) develops a bivariate Poisson lognormal model for area-level measures of disease frequencies.Bradley et al. (2018) cleverly utilize a multivariate log-gamma distribution for modeling area-level count data with spatial and temporal components.The multivariate log-gamma distribution leads to a Gibbs sampling routine in which conditional distributions have known forms.While Ferrante and Trivisano (2010), DeSouza (1992), Boubeta et al. (2017), and Bradley et al. (2018) consider multivariate and temporal data, they focus on area-level models.In contrast, we require a unit-level model.Molina et al. (2007) develop a unit-level multinomial-logit model for small area estimation, and L opez-Vizcaíno et al. (2015) extend Molina et al. (2007) to include time effects.Hobza and Morales (2016) develop empirical best prediction procedures for the unitlevel binomial distribution.Methods for multinomial or binomial data are not immediately applicable in our context because our counts are not constrained to sum to a fixed multinomial or binomial sample size.Instead, we observe a four-dimensional vector of unconstrained counts for each sampled unit.Therefore, the Poisson distribution is more appropriate than the multinomial distribution for our data.Ghosh et al. (1998) use a Poisson generalized linear mixed model (GLMM) to model lung cancer rates at the county level.Lu and Larsen (2007) conduct a Bayesian analysis of a unitlevel Poisson model for the purpose of small area estimation.Reluga et al. (2021) develop confidence intervals for small area predictions based on a GLMM, focusing on models for counts and proportions.Jiang and Lahiri (2006) review the use of a general GLMM for small area estimation.The GLMM's of Ghosh et al. (1998), Reluga et al. (2021), Lu and Larsen (2007), and Jiang and Lahiri (2006) are univariate, but we require a model for a multivariate response.Oleson et al. (2007) simultaneously model hunting success rates and hunting pressure using a bivariate GLMM for a pair consisting of a proportion and a count.They link a Poisson model for counts to a binomial model for proportions using correlated random effects.Li and Zaslavsky (2009) use latent variables to facilitate joint modeling of a binary and Gaussian response.The bivariate models of Oleson et al. (2007) and Li and Zaslavsky (2009) are not immediately suitable for modeling the multivariate count data observed in the seat-belt survey application.We uniquely employ multivariate latent variables in the Poisson framework to describe a four-dimensional vector of counts observed over time.
We propose a unit-level small area model with a Poisson random component that includes overdispersion and multivariate random effects.
To our knowledge, this is the first multivariate unit-level model with a Poisson random component in the small area estimation literature.This is also the first unit-level Poisson model in the small area estimation literature to integrate data for multiple time-points.We use hierarchical Bayesian methods for inference.Rue et al. (2009) use a Laplace approximation to sample from the posterior distribution of a model with Gaussian latent variables.We employ Hamiltonian Monte Carlo, as implemented in the statistical software Stan.Because the small area parameters are complex, nonlinear functions of the observed counts, Bayesian predictors for simple means are insufficient.To overcome this challenge, we use a variation of the method of Molina et al. (2014) to construct the small area predictors and obtain posterior standard deviations.The method of Molina et al. (2014) is general enough to accommodate small area parameters that are nonlinear functions of the model response variable.Molina et al. (2014) develop methodology under the premise of a linear model with normal distributions.We transfer the method of Molina et al. (2014) to the Poisson framework.In Section 2, we describe the exploratory analysis that led us to select the multivariate Poisson model.We also describe the design of the seat-belt-use survey in Section 2. In Section 3, we present the process that led us to select the overdispersed Poisson model with correlated random effects from a set of alternative models.In Section 4, we analyze the posterior distributions of the fixed parameters of the selected model.In Section 5, we present the small area predictors.Section 6 concludes.The supplement contains supporting plots that are omitted from the main document for brevity.

| SEAT-BELT USE SURVEY DATA
The Iowa Seat-Belt Use Survey is an annual survey intended to measure the overall seat-belt-use rate in Iowa.The sample is selected according to a stratified two-stage design.Primary sampling units are 15 counties.Secondary sampling units are road segments within counties.The road segments in a county are selected according to a stratified probability proportional to size (PPS) sample design.The strata are road types, defined as primary, secondary, and local roads.The size measure for probability proportional to size sampling is the vehicle miles traveled on a road segment (VMT).The sample sizes are determined with the aim of achieving a standard error for the state-level estimator below 2.5%.As the number of sampled segments per county ranges from 5 to 15, county estimation is clearly a small area problem.
The seat-belt survey is conducted annually.The sample for the 2017-2021 seat-belt-use surveys was selected in 2017, and the same road segments are observed over the 5-year period from 2017 through 2021.We use collected data for the years 2017-2021.
Every year, each sampled road segment is observed over a 45-min period.Data collectors record the number of drivers and passengers who are wearing or not wearing a seat-belt.This results in a vector of four counts for each road segment in a given year, defined by the number of belted drivers, the number of unbelted drivers, the number of belted passengers, and the number of unbelted passengers.The four collected counts for each road segment in a year serve as the vector of response variables for our model.
In small area estimation, model covariates are required not only for sampled road segments but also for road segments in the full population.
The two design variables of road type and VMT are known for all road segments in the population.These two design variables therefore serve as possible auxiliary variables in small area models.
We do not directly incorporate the survey weight in our model.We implicitly incorporate the information in the survey weight through our choice of model covariates.Because the road type and the VMT are the two design variables used to select road segments within counties, the auxiliary variables contain the information in the survey weight.Our choice of covariate essentially ensures that the design is noninformative for the specified model.The probabilities of selection for the counties are unimportant because we only construct predictors for counties that are included in the sample.
In this section, we report the results of an exploratory analysis that motivates a model form.To conserve space, we defer the details of the exploratory analysis to the Supporting Information.As a precursor to our discussion of the exploratory analysis, we establish notation in Section 2.1.We then discuss the results from the exploratory analysis in Section 2.2.

| Notation
We establish notation for the seat-belt survey data.Let i index the county, where i ¼ 1,…, 15.Let t ¼ 1,2,3,4,5 index the years 2017-2021.Let j index the road segment within a county, where j ¼ 1, …, n it , and n it is the number of road segments in the sample for county i and year t.
We observe four counts for each road segment in a given year.We refer to the four types of counts as categories, and we index the categories by k ¼ 1,2,3,4.The four categories are for drivers belted (k ¼ 1), drivers not belted (k ¼ 2), passengers belted (k ¼ 3), and passengers not belted ðk ¼ 4Þ.The response variable, denoted Y ijt,k , is the number of vehicle occupants in category k observed in year t for road segment j in county i.
For instance, Y ijt,1 is the number of belted drivers observed on road segment j of county i in year t.The interpretations of Y ijt,k for k ¼ 2,3,4 are analogous.
The two possible model covariates are the road type and the vehicle miles traveled.The covariates are constant across the years.The road type is a categorical covariate with three categories.We let x 0,ij denote the road type, where x 0,ij ¼ ð0,0Þ 0 if road segment ði, jÞ is a local road, x 0,ij ¼ ð1,0Þ 0 if road segment ðijÞ is a primary road, and x 0,ij ¼ ð0,1Þ 0 if road segment ði,jÞ is a secondary road.We let x 1,ij denote the log of the vehicle miles traveled for road segment ði, jÞ.We restrict attention to road segments with a positive value for the VMT.The two possible model covariates are x 0,ij and x 1,ij .

| Exploratory analysis
One modeling decision is the choice of the response distribution.Table 1 has summaries of the distributions of the observed counts.The counts for drivers belted are fairly large, and none of the counts is zero.For the other categories, the discrete nature of the data is more important, as several of the observed counts are zero.In preliminary work not presented here, we model the observed counts as realizations from a normal distribution, using a square root transformation to stabilize the variances.The normal distribution may be appropriate for drivers belted but does not reflect the discrete nature of the data for the other categories.A further difficulty with the normal distribution is that predicted counts can be negative.As a result of these issues associated with using the normal distribution, we prefer not to use a normal approximation for the distribution of transformed counts.We considered use of a binomial distribution instead of a Poisson distribution.The binomial distribution has two main disadvantages for this application.First, when using the binomial distribution, the binomial sample size is regarded as fixed.In this application, the total number of drivers or passengers observed on a road segment (the quantities that would define the binomial sample sizes) are random variables.The second issue is that the binomial distribution arises from independent Bernoulli trials.We do not think that the events that two individuals in the same car wear a seat-belt are independent.In contrast, we think that one passenger's decision to wear a seat-belt may be influenced by the behaviors of other passengers in the same car.For these reasons, we prefer the Poisson distribution over the binomial distribution.
We constructed several exploratory plots to identify the dominant trends in the data.For brevity, we relegate the plots to the Supporting Information.One goal of the exploratory analysis is to assess relationships between the response and the covariates.The other is to identify important dependence structures.We summarize the implications of the exploratory analysis for modeling.
The exploratory analysis suggests a model with several structures.The road type and vehicle miles traveled appear to be important covariates for some categories.The relationships between the covariates and the response vary across the four categories.The counts for a road segment appear to be correlated across time for a given category.County means are correlated across years for a given category, and interactions between counties and years appear to exist.The correlations among the counts for the four categories are important.
Two main questions remain.One issue is whether or not the relationships between the response and the covariates (road type and VMT) change over time.A second question is whether or not overdispersion exists.We will explore these issues formally through a comparison of alternative Bayesian hierarchical models.

| MODEL SELECTION, COMPARISONS, AND EVALUATION
We specify a general model that reflects the essential features of the exploratory analysis.We use a Poisson random component.We use multivariate random effects to capture correlations among the categories.We first define a full model.We will then consider three reduced versions of the full model.The four Bayesian hierarchical models integrate data from the five time-points and the four categories.
We define a full model by Þ and R ℓ is a 4Â4 correlation matrix with correlation parameter ρ ℓ,kk 0 for ℓ fb, u, η, ϵg and k, k 0 f1,2,3,4g.We explain the interpretations of several important components of the full model (1).The county effects are the parameters b i,k , and the county Â time interactions are given by the random effects u it,k .The η ij,k represent road segment effects.As our objective is to construct predictions for counties, we will include b i,k in all models that we consider.The exploratory analysis suggested the presence of county Â time interactions and road segment effects, so we will also include u it,k and η ij,k in our models.The full model (1) contains different regression coefficients associated with road type and vehicle miles traveled for the different time-points.Specifically, the parameters β t,k may vary across the time-points t ¼ 1, …, 5. Likewise, the parameters α t,k may differ across t ¼ 1,…, 5. We will consider reduced models, where these parameters are constant across time.The random variable ϵ ijt,k captures overdispersion relative to a model in which ϵ ijt,k ¼ 0. We will consider the need for these overdispersion parameters through formal model comparisons procedures.
We consider three reduced versions of the full model (1).This leads to four possible models.The models are defined as follows: • Model 1 (M1): We let Model 1 be the most restrictive model.For Model 1, we remove the overdispersion parameters so that ϵ ijt,k ¼ 0. Equivalently, Λ ϵ R ϵ Λ ϵ ¼ 0 for Model 1.For Model 1, we also restrict the regression coefficients to be constant across time.For Model 1, • Model 2 (M2): We define Model 2 to be a model with no overdispersion but where the regression coefficients are permitted to vary across the time-points.Model 2 is a restricted version of the general model in (1) without overdispersion but where the regression coefficients are permitted to vary across the time-points.For Model 2, we set ϵ ijt,k ¼ 0 (equivalently, Λ ϵ R ϵ Λ ϵ ¼ 0).We permit β t,k and α t,k to vary for t ¼ 1,…,5.
• Dispersed Model 1 (DM1): For the Dispersed Model 1, we add the overdispersion parameters, ϵ ijt,k , into Model 1.The Dispersed Model 1 is a reduced version of the full model, where the regression coefficients for road type and VMT are assumed to be constant across the time-points.
Specifically, the Dispersed Model 1 is a restricted version of the full model ( 1), where • Dispersed Model 2 (DM2): We refer to the full model in (1) as the Dispersed Model 2. The Dispersed Model 2 includes the overdispersion parameters and also permits the regression coefficients for road type and VMT to vary across the years.
We refer to the regression coefficients, standard deviation parameters, and correlation parameters as the fixed model parameters.For M1 and DM1, the fixed model parameters are γ t , α k , β k , Λ ℓ , and R ℓ for ℓ fb, u, η,ϵg.For M2 and DM2, the fixed parameters are γ t , α t,k , β t,k , Λ ℓ , and R ℓ for ℓ fb,u, η,ϵg.The fixed model parameters require prior distributions.We define prior distributions that are diffuse but proper.For models M1 and DM1, we define priors by α k $ iid Nð0,10 6 I 2Â2 Þ and β k $ iid Nð0,10 6 Þ.For Models M2 and DM2, we define priors by α t,k $ iid Nð0,10 6 I 2Â2 Þ and β t,k $ iid Nð0,10 6 Þ.For all models, we let the diagonal elements of Λ ℓ have independent Cauchy distributions with scale parameters of 10000.We let the priors for the correlation matrices be independent LKJ(1) distributions.The LKJ prior for a 4 Â 4 correlation matrix R is given by fðRjηÞ ¼ 2 where Bða, bÞ denotes the beta function.We set η ¼ 1 to obtain the LKJ(1) distribution.The half-Cauchy and LKJ prior distributions are recommended in the documentation for the STAN software (Stan development Team, 2021).Also, see Lewandowski et al. (2009).
We fit the models using Markov chain Monte Carlo (MCMC), as implemented in the statistical software package STAN (Stan development Team, 2021) via Rstan (Stan Development Team, 2020).For each model, we run three independent MCMC chains, each of length 1500.We discard the first 500 iterations for burn-in.The total number of resulting samples from the posterior distribution is 3000.
The models DM1 and DM2 utilize the random parameters ϵ ijt,k to model overdispersion.We considered using a negative binomial distribution to model overdispersion instead.When using the negative binomial distribution, we found that the trace plots exhibited a high degree of autocorrelation, and the chains did not appear to mix.This may occur because the negative binomial does not fit the data well.We therefore do not consider the negative binomial distribution.
In this section, we compare the four models (M1, M2, DM1, and DM2) in the direction of selecting one of them.We assess convergence to the stationary distribution in Section 3.1.We compare the alternative models through the posterior predictive distribution in Section 3.2.In Section 3.3, we consider the question of whether it is reasonable to restrict the coefficients for VMT and road type to be constant across the years.

| Convergence to stationary distribution
We assess convergence of the chains to the stationary distribution using trace plots and the scale reduction factor.We calculate the scale reduction factors for the fixed model parameters.Table 2 summarizes the distributions of scale reduction factors for the four models.The maximum value of the scale reduction factor is 1.15.The scale reduction factors indicate that the chains are converging.

| Model comparisons
We compare the models using the WAIC statistic.The WAIC statistic is a measure of goodness of fit.Alternative goodness of fit statistics are the BIC or DIC.We use the WAIC because it has been recommended for hierarchical models or when an objective is prediction (Gelman et al., 1995, p. 174).
T A B L E 2 Scale reduction factors for four models.3 contains the WAIC statistics for the four models.Incorporating overdispersion reduces the WAIC dramatically.We therefore prefer the models with overdispersion (DM1 and DM2) relative to the models without overdispersion (M1 and M2).The WAIC statistics for models DM1 and DM2 are similar.Likewise, the WAIC for models M1 and M2 are similar.Recall that models M1 and DM1 restrict the regression coefficients to be constant over time while M2 and DM2 permit the regression coefficients to vary across the years.The WAIC alone does not lead to a clear conclusion with respect to whether or not the coefficients should vary across the years.
We now examine the question of whether or not it is reasonable to restrict the coefficients for road type and VMT to be stable across the years more carefully.To do this, we examine the posterior distributions of the regression coefficients for DM2.The Supporting Information contains plots of the posterior distributions of β t,k and α t,k from DM2.The posterior distributions of fβ t,k : t ¼ 1,…,5g for fixed k overlap heavily.Similarly, many of the 95% credible intervals for α t,kq and α t 0 ,kq overlap.We infer that some of the coefficients for the same category at different timepoints differ.For example, a 95% credible interval for the difference between β 1,3 and β 4,3 is [0.019, 0.072].Nonetheless, most of the posterior distributions for different time-points but the same category overlap heavily.Further, the WAIC statistic for DM1 is slightly below the WAIC statistic for DM2.In the interest of parsimony, we think it is reasonable to restrict the coefficients for VMT and road type to be constant across the years.We therefore select DM1, the model with overdispersion that restricts the coefficients associated with the covariates to be stable across the years.

| Sensitivity to prior assumptions
We repeated the model selection process using a completely different set of priors.We specified improper uniform priors for the regression coefficients.We let the diagonal elements of Λ ℓ (ℓ fb, u, η, ϵg) have independent Cauchy distributions with scale parameters of 2.5.We obtained essentially the same results from this set of prior distributions.We omit tabular output for brevity.

| Evaluation of goodness of fit of selected model
We assess the goodness of fit of the selected model DM1 through the posterior predictive p value.We generate 1000 samples from the posterior predictive distribution and calculate a summary statistic, denoted as Qðy ðℓÞ Þ for simulated sample ℓ.We also calculate the statistic QðyÞ based on the observed data.We define the posterior predictive p value as L À1 P L ℓ¼1 I½QðyÞ ≥ Qðy ðℓÞ Þ.We first consider the summary statistics defined as the mean, median, and the standard deviation for groups defined by intersections of categories and time-points.The posterior predictive p values for the means are typically close to 0.5.None of the posterior predictive p values for the first two moments is below 0.15 or above 0.85.For the median, only three of the posterior predictive p values exceeds 0.85 (see Supporting Information).This indicates that DM1 captures the dominant trends in the data fairly well.
We next use as the summary statistic for the posterior predictive p value the pairwise correlation between each pair of categories (Table 5).
The correlations are across years and road segments.The posterior predictive p values for the Dispersed Model 1 for the pair-wise correlations are all between 34% and 72%.The Dispersed Model 1 captures the correlation among the categories well.
Finally, we calculate the posterior predictive p value using the Pearson chi-square statistic for each category separately.The posterior predictive p values range from 0.36 for passengers not belted to 0.47 for drivers belted.The posterior predictive p values indicate that the selected model ( DM1) is a reasonable representation of the observed data.

| INFERENCE FOR THE FIXED PARAMETERS OF DISPERSED MODEL 1
We conduct inference for the fixed parameters of the Dispersed Poisson Model 1.We plot the posterior distributions of the regression coefficients, variance parameters, and correlation parameters in Figures 1-3.The plots are constructed using the R package bayesplot.
Figure 1 contains plots of the posterior distributions of the regression coefficients for the vehicle miles traveled (β k ) and the road type The shaded regions indicate 95% credible intervals.The regression coefficients for the vehicle miles traveled (β k ) differ significantly from zero.The coefficient for VMT associated with the category drivers not belted (β 2 ) differs significantly from the coefficients for the other categories.Recall that α k ¼ ðα k1 , α k2 Þ, where α k1 is the coefficient for primary roads, and α k2 is the coefficient for secondary roads.With the exception of the category for un-belted drivers and primary roads, the 95% credible intervals for the coefficients associated with the road types (α kℓ ) overlap zero.The mean counts for primary roads appear to be lower for un-belted drivers than for the other categories.Although many of the coefficients for road type do not differ significantly from zero, we prefer to include road type in the model for two main reasons.One is that the coefficients for road type for drivers not belted differs significantly from zero.The second is that the road type is a design variable, so including road type guards against bias.Figure 3 contains the posterior means of the correlation parameters.These are the posterior means of the parameters ρ ℓ,kk 0 for ℓ fb,u, ηg and k, k 0 f1,2,3,4g.The posterior mean of the correlation between the road segment effects for each pair of categories ðρ η,kk 0 Þ is close to 0.8.
The posterior means of the pair-wise correlations for the county Â time interactions ðρ u,kk 0 Þ are also positive but are not as high as the posterior means of ρ η,kk 0 .For the county effects, the correlations are less important.The correlations between the categories are largely explained by the correlations between the road segment effects.Fitting a joint model to all of the categories simultaneously seems more appropriate for this data set than fitting separate models to the different categories.The multivariate structure of the data is important, on the basis of the posterior means of the correlation parameters.
T A B L E 5 Posterior predictive p values for correlation statistic.
Plots of posterior distributions for standard deviation parameters for the Dispersed Model 1. Notation in the plot: taub We define two types of small area parameters that give two different measures of the seat-belt use rate.The first type of parameter is defined as the proportion of belted drivers, passengers, and vehicle occupants who are wearing a seat belt.The second is the proportion of vehicle miles for which the driver, passenger, or occupant is wearing a seat-belt.Both types of parameters may be of interest to the user of the seat-belt survey data.The parameters are complex functions of the basic counts.We use a modification of the method of Molina et al. (2014) to construct the predictors.Molina et al. (2014) develops a general Bayesian procedure for obtaining small area predictors of parameters that are nonlinear functions of the response variable.We first define the parameters of interest.We then explain how we construct the predictors from the posterior samples.

| Definitions of small area parameters
Recall that Y ijt,k represents the count of type k observed in a 45-min period for unit j in county i for year t.The quantities of interest are not means of observations over a 45-min period.Instead, the long-term average seat-belt use rates are of interest.We therefore define the parameters of interest as functions of the mean counts instead of the observed counts.Let λ it,k ¼ ðλ i1t,k , …,λ iNit,k Þ 0 be the vector of mean counts for category k in the population of N i road segments in county i in year t.We first define parameters representing the proportions of belted drivers, passengers, and vehicle occupants.These measures of seat-belt use are defined as Plots of posterior means of correlation parameters for the Dispersed Model 1 and for drivers, passengers, and total occupants, respectively, where U i ¼ f1,…,N i g.In ( 2)-( 4), θ it,p is the proportion of belted passengers in county i for year t, θ it,d is the proportion of belted drivers in county i for year t, and θ it,v is the proportion of belted vehicle occupants for county i and year t.
We next define the parameters representing the proportions of belted vehicle miles traveled.We first convert the mean counts for a road segment to a proportion for a road segment.We define Note that p ijt,p is the proportion of belted passengers for road segment ij in year t.Likewise, p ijt,d and p ijt,v , respectively, are the proportions of belted drivers and vehicle occupants for road segment ij in year t.Then, define the area level parameters by and In ( 5)-( 7), v ij is the known vehicle miles traveled for road segment ði,jÞ.

| Predictors of parameters
We construct predictors for the 15 counties in the sample.The Markov chain furnishes samples for b i,k and u it,k .The η ij,k for nonsampled road segments is independent of the η ij,k for road segments in the sample.Likewise, ϵ ijt,k for nonsampled road segments is independent of ϵ ijt,k for road segments in the sample.We use the posterior draws for b i,k and u it,k .We generate independent samples for η ij,k and for ϵ ijt,k .
We let ℓ denote the draw from the posterior distribution.For every road segment in the population, we generate λ ðℓÞ ijt,k as logðλ We let and We define predictors as posterior means.We define the predictor of θ it,k by We define the predictor of μ it,k by We use the posterior standard deviation as a measure of uncertainty.The posterior standard deviation of θ it,k is approximated by The posterior standard deviation of μ it,k is approximated by We define approximate 95% credible intervals by 5.3 | Results of small area prediction Figures 4 and 5 contain plots of the 95% credible intervals for θ it,k and for μ it,k , respectively.The credible intervals are defined in ( 14) and ( 15).The small area predictors are usually between 90% and 95%.The predictors within a county are generally more similar to each other than the predictors between counties.
As a rough check, we compare the small area predictions to the direct, state-level estimates.Figure 6 contains a plot of μit,k against θit,k .The predictors of μ it,k tend to exceed the predictors of θ it,k .This is not surprising because the estimators of μ it,k assign higher weight to road segments with higher values for the vehicle miles traveled, where seat-belt use rates tend to be higher.

| Comparison of small area predictors to direct estimators
We compare the small area predictors to the direct estimators.We calculate the direct estimators and corresponding standard errors as if the design were probability proportional to size with replacement sampling within each county.The association between the predictors and the direct estimators is generally reasonable.The predictors tend to increase as the direct estimators increase.Small area prediction tends to shrink the direct estimators toward the overall mean.Small area prediction increases direct estimates that are far below the mean.Simultaneously, direct estimates that are close to 1 are reduced through small area prediction.
The estimates for passengers belted (in red) tend to be more variable than the estimates for drivers belted or for total vehicle occupants.This is the case for both μ it,k and θ it,k .Small area prediction increases estimates for passengers belted, where the direct estimator is roughly below 85%.Likewise, when the direct estimator for passengers belted is close to 1, small area prediction decreases the estimated proportion.The left and right panels of Figure 7 both have an outlying point, where the direct estimate is unusually low.The point corresponds to passengers belted for county 6 in year 2017.For θ, the direct estimate is 0.66 and the small area predictor is 0.82.For μ, the direct estimate is 0.58, and the small area predictor is 0.82.The direct estimates for the other 4 years for passengers belted in county 6 are above 83%.In this case, small area estimation shrinks the direct estimator for 2017 toward the estimators for the same county for the remaining years.
Figure 8 contains a plot of the posterior standard deviations against the standard errors of the direct estimators.A positive finding from Figure 8 is that the posterior standard deviations for drivers and total vehicle occupants are typically below the 2.5% threshold established for the state-level estimates.Unfortunately, the posterior standard deviations for passengers are often above the 2.5% level.For μ it,k , the relationship between the posterior standard deviations and the standard errors of the direct estimators is as expected.For μ it,k , the direct standard errors vary more than the posterior standard deviations.Many of the posterior standard deviations for μ it,k are below the corresponding direct standard error.
For θ it,k , the association between the posterior standard deviations and the direct standard errors is more variable than desired.A good quality is

| Sensitivity to prior assumptions
We constructed small area predictions for a model with a different choice of prior distribution.For the alternative prior distribution, we let the regression coefficients have improper uniform priors.We specify independent Cauchy distributions with a scale parameter of 10 for the diagonal elements of Λ ℓ .The main trends in the small area predictions based on the alternative prior distribution are the same as the basic patterns in the presented small area estimates.The small area predictors appear insensitive to the choice of prior distribution.

| DISCUSSION
We develop small area predictors based on a Bayesian hierarchical model with a Poisson random component and multivariate random effects.
Model covariates include the road type and the vehicle miles traveled.Random parameters account for road segment effects, county effects, and county Â time interactions.We also include random terms to account for overdispersion.The random parameters are multivariate and capture the correlations across the categories and correlations over time for a given sampling unit.
We compare four versions of the Bayesian hierarchical model using the WAIC statistic.On the basis of this evaluation, we prefer the Dispersed Model 1.This model allows for overdispersion but restricts the regression coefficients for VMT and road type to be stable over time.Posterior predictive p values indicate that the selected model fits the data adequately.
We construct small area predictions based on the Dispersed Model 1.The general magnitude of the small area predictors is reasonable, relative to the direct estimators.Small area prediction also has the anticipated effect of shrinking the county estimates toward the overall mean.The posterior standard deviations are reasonable relative to the direct standard errors.

BERGA
Poisson distribution is a natural response distribution for count data.The support of the Poisson distribution aligns with the support of the observed counts.The use of the Poisson distribution avoids a need for transformation.When coupled with a log link, predictions based on the Poisson distribution are guaranteed to be positive.We therefore use a Poisson response distribution with a log link.

Figure 2
Figure 2 contains plots of the posterior distributions of the standard deviation parameters.The road-segment effects are important, as the posterior distributions of the standard deviations for the road segment effects are stochastically greater than the posterior distributions of the other standard deviation parameters.With the exception of drivers belted, the posterior distributions of the standard deviations of the time Â county interactions are symmetric, and the lower endpoints of the 95% credible intervals are away from zero.The posterior distributions of the standard deviations of the county effects are highly skewed right.
Figure 7 contains a plot of the direct estimators on the horizontal axis with the corresponding small area predictor on the vertical axis.The left panel of Figure 7 contains the predictor of θ it,k and the right panel contains the predictor of μ it,k .The diagonal line in each plot is the 45 line through the origin.F I G U R E 4 Predictors and 95% credible intervals for θ it,k

F
I G U R E 5 Predictors and 95% credible intervals for μ it,k T A B L E 6 State-level direct estimates of θ and μ.

F
I G U R E 6 Small area predictors of μ it,k (vertical axis) plotted against small area predictors of θ it,k (horizontal axis) F I G U R E 7 Plot of direct estimator on horizontal axis with corresponding SAE predictor on vertical axis.Diagonal line in each plot is the 45 degree line through the origin.Left plot is for θ, and right plot is for μ that for cases where the direct standard error for an estimator of θ it,k is unusually high, the posterior standard deviation is below the direct standard error.Small area prediction, however, does not lead to a systematic decrease in the estimated uncertainty for θ it,k , as for μ it,k .The fact that small area estimation does not systematically decrease the standard errors does not necessarily indicate a failure of the model.The model may provide a better measure of uncertainty than the direct estimator.The sample sizes for the domains are only 5. The direct standard errors may understate the true standard deviations of direct estimators of the seat-belt use rates.

F
I G U R E 8 Plot of direct standard error on horizontal axis with corresponding SAE posterior standard deviation on vertical axis.Diagonal line in each plot is the 45 line through the origin Summaries of distributions of observed counts.
Note: The column "zeros" is the number of observed zeros.

Table
Table 4 contains the posterior predictive p values for these summary statistics.For brevity, we only provide results for 2017 in Table 4, and we present the entire table of p values in the Supporting Information.WAIC statistics for four models.
Note: p values for the remaining years (2018-2021) are presented in the Supporting Information.Abbreviations: DB, drivers belted; DNB, drivers not belted; PB, passengers belted; PNB, passengers not belted.
Table6contains the direct estimates of the state-level seat-belt use rates.The small area estimates of the proportions are reasonable, relative to the direct state-level estimates.The statelevel estimates are close to a typical value for a county-level predictor.The posterior standard deviations are also reasonable, relative to the target standard error of 0.025, established for the state level estimates.The posterior standard deviations for μ it,k range from 0.006 to 0.041, and the posterior standard deviations for θ it,k range from 0.008 to 0.044.The posterior standard deviations are usually below or close to the target standard error established for the state-level estimator.