### Abstract

- Top of page
- Abstract
- 1. Introduction
- 2. Modelling individual response probabilities
- 3. The model
- 4. Bayesian inference
- 5. Simulation study
- 6. Measuring drinking problems and alcohol-related expectancies among college students
- 7. Discussion
- Acknowledgement
- References
- Appendix: CPS-EQ Questionnaire
- Supporting Information

Randomized response (RR) models are often used for analysing univariate randomized response data and measuring population prevalence of sensitive behaviours. There is much empirical support for the belief that RR methods improve the cooperation of the respondents. Recently, RR models have been extended to measure individual unidimensional behaviour. An extension of this modelling framework is proposed to measure compensatory or non-compensatory multiple sensitive factors underlying the randomized item response process. A confirmatory multidimensional randomized item response theory model (MRIRT) is proposed for the analysis of multivariate RR data by modelling the response process and specifying structural relationships between sensitive behaviours and background information. A Markov chain Monte Carlo algorithm is developed to estimate simultaneously the parameters of the MRIRT model. The model extension enables the computation of individual true item response probabilities, estimates of individuals’ sensitive behaviour on different domains, and their relationships with background variables. An MRIRT analysis is presented of data from a college alcohol problem scale, measuring alcohol-related socio-emotional and community problems, and alcohol expectancy questionnaire, measuring alcohol-related sexual enhancement expectancies. Students were interviewed via direct or RR questioning. Scores of alcohol-related problems and expectancies are significantly higher for the group of students questioned using the RR technique. Alcohol-related problems and sexual enhancement expectancies are positively moderately correlated and vary differently across gender and universities.

### 1. Introduction

In sample surveys, it can be difficult to obtain reliable information on stigmatizing or socially undesirable/unacceptable matters using the common direct questioning procedure. Direct questioning on sensitive matters often leads to refusals, non-responses, or socially desirable answers. Warner (1965) developed the randomized response (RR) technique to gather information on such sensitive matters by protecting the privacy of the respondents. It is shown in several studies (e.g., Lensvelt-Mulders, Hox, van der Heijden, & Maas, 2005) that the cooperation of respondents improved due to the RR technique. But despite the evident usefulness of the RR technique, inferences from applications utilizing it are limited to estimating population proportions. Further, the traditional RR models (e.g., Greenberg, Abul-Ela, Simmons, & Horvitz, 1969; Warner, 1965) are only appropriate for the analysis of univariate RR data, they do not account for individual response probabilities, and they do not allow for heterogeneity across respondents. In many cases, outcome data are multivariate or correlated, and it is appealing to model the individual outcomes while taking account of the dependency structure.

To motivate the problem, questionnaire data measuring different psycho-social dimensions of problem drinking among college students were collected from 793 students at four colleges/universities in North Carolina. Furthermore, responses to alcohol expectancy questionnaire items were collected, which measure alcohol-related sexual enhancement expectancies. Some of the participants were questioned via an RR technique to investigate whether this increases the accuracy of self-reports of sensitive information on the different latent dimensions. Further, interest is focused on relationships between latent factors, and their relationship with background variables (e.g., age, gender, racial origin).

Several attempts have been made to extend the class of RR models by modelling the item response process and/or by including various sources of information such as ancillary variables. Scheers and Dayton (1988) model the relation between the population proportion with the sensitive characteristic and covariate information. They showed that the use of relevant covariate information improved the estimation of the population proportion with the sensitive characteristic. Böckenholt and van der Heijden (2007) and Böckenholt, Barlas, and van der Heijden (2009) proposed an item randomized response model for multivariate dichotomous RR data that accounts for the possibility (1) that not all respondents may follow the randomization instructions (say ‘no’ regardless of the question), and (2) that respondents provide intentionally misleading answers to conceal socially undesirable behaviour. Fox (2005) and Fox and Wyrick (2008) modelled multivariate dichotomous and polytomous data with a randomized item response theory (IRT) model that accounts for individual differences in the response process, and that enables the computation of individual response probabilities, and the measurement of a unidimensional underlying sensitive characteristic. De Jong, Pieters, and Fox (2010) developed a mixture randomized IRT model for polytomous randomized responses.

The IRT modelling approach to RR data is generalized to measure multiple latent sensitive characteristics together with relationships with background variables using a structural multivariate regression model, given dichotomous or polytomous randomized responses. The situation is considered in which multiple latent factors underlie the manifest randomized responses in a compensatory or non-compensatory way. This study extends the work of Böckenholt and van der Heijden (2007), who proposed a between-item multidimensional item randomized response model for binary response data. In their study, multiple item bundles are considered, where each item bundle is used to measure a specific construct using the Rasch model in a non-compensatory way. At the level of observations, responses are assumed to be conditionally independently distributed given one of the factors. The present generalization makes it possible to handle also polytomous randomized item responses, compensatory items when the response process involves multiple constructs, and more advanced item response models that allow item-specific discriminations/loadings for multiple factors. A full Bayesian estimation procedure is proposed which supports the joint estimation of all parameters, including the estimation of all factor loadings. Here, the mixture model component for dealing with non-compliant response behaviour will not be considered.

In the spirit of multidimensional confirmatory item-factor models, a Bayesian multidimensional confirmatory IRT model is proposed for dichotomous and polytomous data to measure and relate factors underlying the individual sensitive characteristics given randomized response observations. The unobservable factors can be interpreted as a combination of subscale components, or as compensatory or non-compensatory factors that influence the item probabilities in a combined way. This modelling approach connects with recent developments in multidimensional IRT research showing the computational feasibility and increasing attention in the methodology (e.g., Chambers, 2010; Edwards, 2010; Reckase, 2009; Wirth and Edwards, 2007).

The proposed model consists of three components. At the first stage the multivariate RR data are related to individuals’ response probabilities via an RR model. At the second stage, the response process is modelled by assuming a multidimensional IRT model for the underlying true responses, which would have been observed if the responses had not been randomized. This enables the measurement of individual response probabilities and individual latent sensitive characteristics. At the third stage, the latent sensitive characteristics are considered to be outcomes of a multivariate regression model. This enables a marginal interpretation for the individual outcomes while appropriately accounting for the dependency structure. The multivariate model has the advantage that the dependency structure can be described parsimoniously in terms of correlation coefficients of the underlying latent characteristics. That is, heterogeneity across respondents and across groups can be properly modelled.

A Markov chain Monte Carlo (MCMC) algorithm is developed to estimate all parameters simultaneously. It is shown that the posterior computation can proceed through a Gibbs sampling algorithm using auxiliary variables. Two augmentation steps facilitate a sampling-based approach for estimating all model parameters simultaneously. First, discrete variables are defined that represent the true item responses that would have been observed without randomizing responses. The conditional distribution of the latent true item responses given the randomized responses is derived via Bayes’ theorem. Second, normally distributed latent variables are defined that are manifested as discrete true item responses through a threshold specification. The developed algorithm generalizes the procedure of Fox and Wyrick (2008) and De Jong *et al*. (2010) to deal with the multidimensional IRT model and the structural multivariate latent variable component.
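The first augmentation step can be illustrated for a single dichotomous item. The following Python sketch assumes an RR design in which the true response is reported with probability `p1` and, otherwise, a positive response is generated with probability `p2`. The function names and the prior success probability `omega` (which in the model would be supplied by the IRT component) are illustrative assumptions and are not taken from the original algorithm.

```python
import numpy as np

def true_response_posterior(y, omega, p1, p2):
    """P(true response = 1 | randomized response y), via Bayes' theorem.

    y     : observed randomized responses (0/1)
    omega : prior probability of a positive true response (assumed given)
    p1    : probability that the true response is reported
    p2    : probability of a positive response when it is not
    """
    y = np.asarray(y)
    omega = np.asarray(omega, dtype=float)
    # Probability of observing 'yes' given a positive / negative true response.
    yes_given_1 = p1 + (1 - p1) * p2
    yes_given_0 = (1 - p1) * p2
    like_1 = np.where(y == 1, yes_given_1, 1 - yes_given_1)
    like_0 = np.where(y == 1, yes_given_0, 1 - yes_given_0)
    numerator = omega * like_1
    return numerator / (numerator + (1 - omega) * like_0)

def sample_true_responses(y, omega, p1, p2, rng):
    """Draw latent true responses from their conditional distribution."""
    post = true_response_posterior(y, omega, p1, p2)
    return (rng.random(post.shape) < post).astype(int)

# Hypothetical usage with illustrative values.
rng = np.random.default_rng(0)
draws = sample_true_responses([1, 0, 1], [0.5, 0.5, 0.2], 0.6, 0.5, rng)
```

When `p1` equals one the posterior reduces to the observed response, and when `p1` equals zero it reduces to the prior, which is a quick check on the behaviour of the sketch.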

In the following, the three-stage multidimensional randomized item response theory (MRIRT) model is presented. Then a general MCMC algorithm is described for dichotomous and polytomous RR data. Different prior choices are discussed that lead to proper posterior distributions. Then the posterior computations are illustrated with a simulation study. Subsequently, a description is given of the joint Bayesian multidimensional IRT analysis of the College Alcohol Problem Scale and Alcohol Expectancy Questionnaire RR data with careful attention to the underlying factor structure. Finally, the pros and cons of the new model are discussed and briefly compared with other approaches in the literature.

### 2. Modelling individual response probabilities

In general, the RR technique is used to estimate the proportion, *π*, of respondents belonging to a sensitive class in the population. Horvitz, Shah, and Simmons (1967) proposed an unrelated question RR design that is based on two questions. One provocative question is related to the sensitive characteristic, and the other unrelated question refers to a non-sensitive innocuous attribute. Each respondent selects randomly, by means of a randomizing device such as a die or spinner, one of the two questions and answers it truthfully. The respondent does not tell the interviewer which question was selected. If the population proportion of the non-sensitive characteristic is known, and this is built into the randomizing device (see Boruch, 1971), the probability of a positive response will be

$$P(Y = 1) = p_1 \pi + (1 - p_1)\, p_2, \qquad (1)$$

where the left-hand side is estimated by the proportion of ‘yes’ answers reported by the *n* individuals, and $p_1$ is the probability that the randomizing device selects the question related to the sensitive characteristic. Subsequently, with probability $1 - p_1$, the unrelated question is selected and with probability $p_2$ a positive response is given. Note that parameters $p_1$ and $p_2$ are known since they are specified by the randomizing device. Using equation (1), the proportion of persons affirming or denying the intrusive item can be accurately estimated. De Schrijver (2012) compared the forced response technique with the unrelated questioning technique and concluded that the forced response technique was better understood.
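As an illustration of how the population proportion follows from the observed ‘yes’ answers, the sketch below assumes the standard unrelated-question form of equation (1), in which a positive response has probability `p1 * pi + (1 - p1) * p2`, and solves it for `pi`. The function name, argument names, and the example counts are hypothetical; the standard error is the usual moment-based approximation and is not taken from the article.

```python
import math

def estimate_prevalence(n_yes, n, p1, p2):
    """Moment estimator of the sensitive proportion under the
    unrelated-question design with known design parameters.

    n_yes : number of 'yes' answers
    n     : number of respondents
    p1    : probability of selecting the sensitive question
    p2    : probability of 'yes' on the unrelated question
    """
    lam_hat = n_yes / n                        # observed proportion of 'yes'
    pi_hat = (lam_hat - (1 - p1) * p2) / p1    # invert the mixture equation
    # Binomial standard error of lam_hat, scaled by 1 / p1.
    se = math.sqrt(lam_hat * (1 - lam_hat) / n) / p1
    return pi_hat, se

# Illustrative numbers only: 40 'yes' answers among 100 respondents.
pi_hat, se = estimate_prevalence(40, 100, p1=0.6, p2=0.5)
```

The estimate can fall outside the unit interval in small samples, and the standard error grows as `p1` decreases, which reflects the price paid for the privacy protection.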

### 4. Bayesian inference

An MCMC procedure is proposed for model estimation. This MCMC algorithm builds on existing MCMC methods for multidimensional IRT and factor-analytic models (e.g., Béguin & Glas, 2001; Bolt & Lall, 2003; Jackman, 2001; Lopes & West, 2004; Sheng & Wikle, 2007; Shi & Lee, 1998; Song & Lee, 2001; Yao & Boughton, 2007). A straightforward MCMC implementation is not possible, since discrete randomized response data are observed. Therefore, following the MCMC method of Fox and Wyrick (2008) for unidimensional IRT, a double augmentation step is proposed to sample the discrete true item response data and continuous latent response data given the RR data. As a result, the joint distribution of the parameters and augmented data is considered, which circumvents a direct, computationally intensive evaluation of the likelihood function.

The joint posterior distribution can be expressed as

where defines the RR process given the characteristics of the randomizing device. The term defines the multidimensional IRT component for the true response data, and the multivariate model component for the factors.

Consider the multivariate regression model in equation (7) for the latent variables *θ* and the multidimensional IRT model in equation (3). Subsequently, the full conditional distribution is normal with mean

- (14)

where .

#### 4.1. Structural multivariate parameters

#### 4.2. Implementation issues

The MCMC algorithm can handle RR data as well as direct questioning data since the properties of the randomizing device are known and corresponding parameters can be set to specific values. For direct questioning data, the probability of answering the sensitive question directly is set to one. This corresponds with the approach of Chaudhuri and Mukerjee (1988), who permitted an option for direct questioning for those who volunteer to reveal the truth because they do not consider the attribute stigmatizing enough.

Ignorable missing response values are handled by sampling latent augmented data without truncating the values to a specific domain but based on given values of the item parameters and the latent variable. This imputation-based procedure creates a complete data set and the procedure is easily implemented in the MCMC algorithm. The imputed augmented values have larger standard deviations since they are not restricted to a specific domain such that uncertainty due to missing values is taken into account.
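The difference between sampling augmented data for observed and for missing responses can be sketched for a dichotomous item. The Python example below assumes a probit-type specification with unit variance and a threshold at zero, so that a positive true response corresponds to a positive latent value; the function name `sample_latent`, the linear predictor `eta`, and the coding of missing values as `nan` are illustrative assumptions rather than details of the original algorithm.

```python
import numpy as np
from statistics import NormalDist

_STD = NormalDist()  # standard normal distribution

def sample_latent(u, eta, rng):
    """Sample latent continuous responses z ~ N(eta, 1).

    u   : true binary responses, with np.nan marking missing values
    eta : linear predictor from item parameters and latent variables
    Observed responses truncate z to (0, inf) or (-inf, 0];
    missing responses are drawn without truncation.
    """
    u = np.asarray(u, dtype=float).ravel()
    eta = np.asarray(eta, dtype=float).ravel()
    z = np.empty_like(eta)
    for i, (u_i, eta_i) in enumerate(zip(u, eta)):
        if np.isnan(u_i):
            z[i] = rng.normal(eta_i, 1.0)      # imputation, no truncation
            continue
        p0 = _STD.cdf(-eta_i)                  # P(z <= 0)
        lo, hi = (p0, 1.0) if u_i == 1 else (0.0, p0)
        # Inverse-CDF draw restricted to the admissible region.
        q = min(max(rng.uniform(lo, hi), 1e-12), 1 - 1e-12)
        z[i] = eta_i + _STD.inv_cdf(q)
    return z

# Hypothetical usage with one positive, one negative, one missing response.
rng = np.random.default_rng(1)
z = sample_latent([1.0, 0.0, np.nan], [0.5, -0.5, 0.0], rng)
```

Because the untruncated draws range over the whole real line, their spread is larger than that of the truncated draws, which is how the additional uncertainty due to missing values enters the sampler.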

The convergence of the MCMC algorithm depends on several factors. Convergence can be slow when the amount of missing information is high. In that case the latent person parameters and item parameters are poorly estimated, with large variances, and more iterations might be needed to obtain stable parameter estimates. Convergence can also be slow when the number of clusters is large, because for large *J* the posterior distribution for **T** given *ζ* becomes very tight, and, as a result, a drawn value of the covariance matrix **T** may remain close to its previous value. Convergence can be informally assessed by examining trace plots, time series plots, plots of the average of each parameter across multiple chains, and plots of the running average. Formal and informal convergence diagnostics can be found in Brooks and Gelman (1998) and Gilks (1996). Starting values for the MCMC algorithm can be obtained by fitting a multidimensional IRT model to the data, ignoring the randomized response character of the discrete response data, using the MCMC algorithm of Béguin and Glas (2001). However, it will be shown in a simulation study that convergence properties of the proposed MCMC algorithm are good and independent of chosen starting values.

### 5. Simulation study

In this section, results are reported from a simulation study for parameter recovery based on the MRIRT model for randomized item response data.

Using randomly generated starting values, two MCMC chains of 10,000 iterations each were simulated, and the first 5,000 iterations were discarded as burn-in. Using the CODA package in R, convergence of the algorithm was assessed using several statistics as well as the visual inspection of trace plots. Figure 1 shows trace plots of two chains for the regression coefficients. For the MCMC chains of the regression parameters, Gelman's *R* statistic was estimated to be 1.01, which is below the recommended threshold of 1.10. For the other parameters, these statistics also suggested convergence of the algorithm. The partial autocorrelation function for several chains suggested a first-order Markov process, with only some chains showing a statistically significant, but minor, autocorrelation of around .20 at lag 2.
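The idea behind the reported *R* statistic can be sketched as follows. This Python example computes the basic potential scale reduction factor from the between-chain and within-chain variances; it is a simplified illustration and omits the degrees-of-freedom correction applied by CODA, so its values may differ slightly from those of the R package. The function name and the simulated chains are assumptions made for the example.

```python
import numpy as np

def gelman_rubin(chains):
    """Basic potential scale reduction factor for one parameter.

    chains : array of shape (m, n), m chains of n post-burn-in draws
    """
    chains = np.asarray(chains, dtype=float)
    m, n = chains.shape
    chain_means = chains.mean(axis=1)
    between = n * chain_means.var(ddof=1)          # between-chain variance B
    within = chains.var(axis=1, ddof=1).mean()     # within-chain variance W
    pooled = (n - 1) / n * within + between / n    # pooled variance estimate
    return float(np.sqrt(pooled / within))

# Illustrative chains: two chains sampling the same distribution,
# and two chains stuck around different values.
rng = np.random.default_rng(0)
mixed = rng.normal(size=(2, 5000))
stuck = mixed + np.array([[0.0], [3.0]])
r_mixed = gelman_rubin(mixed)
r_stuck = gelman_rubin(stuck)
```

Values close to one indicate that the chains agree, whereas chains centred on different values yield a statistic well above the 1.10 threshold.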

Recovery of the loadings and the factor values was assessed graphically (not shown) and showed that the true against the re-estimated parameters closely followed the identity line. Finally, Table 1 shows the true and re-estimated parameters for the structural multivariate model on the subject parameters, indicating that the MCMC algorithm worked well for estimation of the model.

Table 1. Results of simulation study. Generating values, means and standard errors of recovered values

| Fixed | Gen. coeff. | RIRT mean | RIRT SD | RIRT 95% HPD |
|---|---|---|---|---|
| *First factor* | | | | |
| | 0 | – | – | – |
| | 1.00 | 1.04 | 0.04 | [0.95, 1.12] |
| | 0.00 | 0.00 | 0.07 | [−0.13, 0.13] |
| *Second factor* | | | | |
| | 0 | – | – | – |
| | 0.00 | −0.02 | 0.06 | [−0.15, 0.10] |
| | 1.00 | 1.00 | 0.04 | [0.92, 1.08] |
| **Random** | **Gen. coeff.** | **RIRT mean** | **RIRT SD** | **RIRT 95% HPD** |
| | 1.00 | 0.92 | 0.09 | [0.79, 1.06] |
| | 0.30 | 0.35 | 0.10 | [0.17, 0.51] |
| | 1.00 | 1.01 | 0.08 | [0.88, 1.14] |

### 6. Measuring drinking problems and alcohol-related expectancies among college students

Thirteen items of the College Alcohol Problem Scale (CAPS; O'Hare, 1997) and four items of the Alcohol Expectancy Questionnaire (AEQ; Brown, Christiansen, & Goldman, 1987) were used to assess alcohol problems and alcohol-related expectancies among college students. The questionnaire items are given in the Appendix. The sensitive nature of the study supports an RR questioning technique to avoid refusals and misleading responses concealing socially undesirable behaviour. Any self-reported information about negative consequences of drinking is likely to be biased due to socially desirable responding. It is investigated whether the RR technique improved the cooperation of the respondents and the accuracy of the self-report data by comparing the RR outcomes with the direct questioning outcomes. Furthermore, using the MRIRT modelling approach, multiple sensitive constructs underlying both scales (CAPS and AEQ) are measured and their relationships with background information analysed.

The AEQ is used to measure the degree of expectancies associated with drinking alcohol. Expectancies related to alcohol use are known to influence alcohol use and behaviour while drinking (e.g., Werner, Walker, & Greene, 1995). The entire test consists of 90 items and covers six dimensions (see Brown *et al*., 1987). In the present study, attention was focused on alcohol use-related sexual enhancement expectancies using four items covering sexual enhancement expectancies, which are given in the Appendix.

The CAPS instrument is one of the major self-report measures used to assess drinking problems. The items cover socio-emotional (e.g., hangovers, memory loss, depression) as well as community problems (e.g., driving under the influence, engaging in activities related to illegal drugs, problems with the law). O'Hare (1997) developed the CAPS instrument to measure different psycho-social dimensions of problem drinking among college students. The two factors, socio-emotional and community problems, were identified from a factor analysis which explained more than 60% of the total variance. Fox and Wyrick (2008) analysed the CAPS data using a unidimensional RIRT model to measure a general construct alcohol dependence. Although the model described the data well, a multidimensional approach provides insight into the different factors related to problem drinking, factor-specific relationships with background variables, and supports the multidimensional nature of the CAPS.

Data were collected through a survey study in 2002 at four local colleges/universities, Elon University (*N* = 495), Guilford Technical Community College (*N* = 66), University of North Carolina (*N* = 166), and Wake Forest University (*N* = 66). A total of 351 students were assigned to the direct questioning (DQ) condition and 442 to the RR condition. Students in the DQ group served as the study's control group and were instructed to answer the questionnaire as they normally would. Students in the RR condition used a spinner to assist them in completing the questionnaire. For each item of the CAPS and AEQ, the spinner was used as a randomizing device and the outcome determined whether to answer honestly or to register a forced response. The properties of the spinner were set such that an honest answer was requested with a probability of 60% and a forced response with a probability of 40%. When a forced response was to be generated, each response was given an equal probability of 20%. No identifying information was collected, but age, gender, and ethnicity were registered. Because it was not possible to randomly assign individual students, each class of students (5–10 participants), selected from the same population, was randomly assigned to the DQ group or the RR group.
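The effect of the spinner on the observed response distribution can be made concrete with a small Python sketch. It assumes five response categories, which is implied by the equal forced-response probability of 20% per category but is not stated explicitly, and it uses hypothetical true response probabilities; the function name is illustrative as well.

```python
import numpy as np

def rr_category_probs(true_probs, p_honest=0.6, forced_probs=None):
    """Probabilities of the observed (randomized) response categories.

    true_probs   : true category probabilities for one respondent and item
    p_honest     : probability that the spinner requests an honest answer
    forced_probs : category probabilities of a forced response
                   (uniform over the categories if not given)
    """
    true_probs = np.asarray(true_probs, dtype=float)
    n_cat = true_probs.shape[-1]
    if forced_probs is None:
        forced_probs = np.full(n_cat, 1.0 / n_cat)
    forced_probs = np.asarray(forced_probs, dtype=float)
    # Mixture of the honest answer and the forced response.
    return p_honest * true_probs + (1 - p_honest) * forced_probs

# Hypothetical respondent who would rarely endorse the higher categories.
observed = rr_category_probs([0.7, 0.2, 0.1, 0.0, 0.0])
# Direct questioning corresponds to p_honest = 1.
direct = rr_category_probs([0.7, 0.2, 0.1, 0.0, 0.0], p_honest=1.0)
```

Under these assumptions every category has an observed probability of at least 8%, regardless of the true response probabilities, which explains why high observed responses to a rarely endorsed item are mostly attributable to forced responses.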

#### 6.1. Multidimensional scale analysis

First, a two-factor RIRT model was estimated, where items 1 and 14 had just one free non-zero loading to identify two factors. In Figure 2, the estimated factor loadings are given for the two-factor RIRT model stated in equation (16). The estimated factor loadings are standardized by dividing each loading by the average item loading. For each component, the sign of the loadings is set in such a way that a higher latent score corresponds to a higher observed score. It can be seen that items 14–17 measure the factor alcohol-related expectancy and that most other items are clearly measuring another factor, socio-emotional/community problems, which was labelled alcohol dependence in the Fox and Wyrick (2008) analysis. The loadings are all above .75, which indicates that both factors can be interpreted. Note that expectancies are increasing with alcohol consumption and slightly diminish socio-emotional/community problems given the negative factor loadings for the other component.

Second, a three-factor RIRT model was estimated to investigate whether problems associated with drinking were represented by two factors (i.e., socio-emotional and community problems), and sexual enhancement expectancies by another factor. Besides items 1 and 14, the loadings of item 5 were also restricted to identify the community problems factor as reported in the literature. In Table 2, the standardized estimated factor loadings of the three factors are given. Items 1–4, 6, 8, and 9 associate with the first factor, representing socio-emotional problems, and have factor loadings higher than .60. This first component represents drinking-related problems including depression, anxiety, and trouble with family, where the problems will increase with alcohol consumption. Some of the items also associate with the two other components. The second component, labelled community problems, covers items 5, 7, and 10–13, with loadings higher than .60, except for item 12. As reported in the literature, this item is associated with the community problems factor but relates also to the other components and most strongly with the third component. The community problems factor covers acute physiological effects of drunkenness together with illegal and potentially dangerous activities (e.g., driving under the influence).

Table 2. CAPS-EA scale: estimated weighted factor loadings for the three-factor analysis (three-factor RIRT model)

| Subscale items | Comp. 1 | Comp. 2 | Comp. 3 |
|---|---|---|---|
| **Socio-emotional** | | | |
| 1 Feeling sad, blue or depressed | 1.00 | .00 | .00 |
| 2 Nervousness or irritability | .99 | .12 | −.05 |
| 3 Hurt another person emotionally | .94 | .33 | .08 |
| 4 Family problems related to your drinking | .82 | .55 | .15 |
| 6 Badly affected friendship or relationship | .83 | .49 | .26 |
| 8 Caused other to criticize your behavior | .78 | .50 | .38 |
| 9 Nausea or vomiting | .73 | .39 | .58 |
| **Community problems** | | | |
| 5 Spent too much money on drugs | .00 | 1.00 | .00 |
| 7 Hurt another person physically | .49 | .83 | .24 |
| 10 Drove under the influence | .44 | .73 | .52 |
| 11 Spent too much money on alcohol | .61 | .65 | .46 |
| 12 Feeling tired or hung over | .59 | .41 | .69 |
| 13 Illegal activities associated with drug use | .08 | .95 | .30 |
| **Alcohol expectancy scale** | | | |
| 14 I often feel sexier after I've had a couple of drinks | .00 | .00 | 1.00 |
| 15 I'm a better lover after a few drinks | −.11 | −.06 | .99 |
| 16 I enjoy having sex more if I've had some alcohol | −.16 | −.01 | .99 |
| 17 After a few drinks, I am more sexually responsive | −.16 | −.02 | .99 |

For the two-factor RIRT model, the second and third response options of the AEQ items were more likely to be endorsed than the second and third response options of the CAPS items. The corresponding threshold estimates of the AEQ items in comparison to the CAPS items for response categories 2 and 3 are lower – except for CAPS item 12 (feeling tired or hung over) on which students scored relatively high. The AEQ items 14–17 and CAPS item 12 can be considered as the less severe items. Items with relatively high thresholds, item 4 (family problems related to drinking), item 5 (spending too much money on drugs), item 7 (hurting another person physically), and 13 (illegal activities associated with drug use) were severe, where most students responded almost never or seldom. The threshold estimates of the three-factor model were similar, except the threshold estimates of item 5 were not very stable and much higher. This follows from the fact that most responses above one were considered to be forced randomized responses. Item 5 with a very low prevalence did not provide much information to assess drinking problems. A prior restriction on the upper bound led to a numerically stable solution.

#### 6.2. Structural model analysis

In the two- and three-factor RIRT models, the multivariate latent factor model was extended with an explanatory variable called RR and an indicator variable called Female (which equals 1 when true). For each dimension, both explanatory variables were included.

In Table 3, the structural multivariate parameter estimates of the three-factor and two-factor RIRT models are given. For the two-factor model, loadings of items 1 and 14 were fixed to identify two factors. One overall factor represents a composite measure of alcohol-related problems (i.e., socio-emotional and community problems) and the other factor alcohol-related sexual enhancement expectancies. It can be seen that there is a moderate positive covariance of 0.65 between the two factors, where the component variances are slightly smaller than 1.

Table 3. CAPS-EA scale: Parameter estimates of two- and three-factor RIRT model

| Parameter | Two-factor mean | Two-factor SD | Three-factor mean | Three-factor SD |
|---|---|---|---|---|
| **Socio-emotional/community** | | | | |
| (RR) | .20 | .09 | .21 | .10 |
| (Female) | .01 | .06 | .05 | .07 |
| **Alcohol expectancy** | | | | |
| (RR) | .22 | .06 | .21 | .07 |
| (Female) | .03 | .04 | .06 | .05 |
| **Community** | | | | |
| (RR) | | | .32 | .10 |
| (Female) | | | −.30 | .09 |
| **Variance parameters** | | | | |
| | 0.96 | 0.05 | 0.98 | 0.05 |
| | 0.65 | 0.07 | 0.55 | 0.06 |
| | | | 0.35 | 0.08 |
| | 0.98 | 0.05 | 1.06 | 0.05 |
| | | | 0.37 | 0.08 |
| | | | 0.99 | 0.07 |
| **Information criteria** | | | | |
| −2log-likelihood | | 20,757 | | 19,678 |

The students in the RR condition score significantly higher for both factors. For the RR group, the average latent scores are .20 and .22 on the composite drinking problem scale and the alcohol-related expectancy scale, respectively, which are both zero in the DQ group. Fox and Wyrick (2008), who performed a unidimensional RIRT analysis using only the CAPS items, reported an RR effect of .23. The present multidimensional approach shows a comparable RR effect, as it does for the AEQ scores. The females and males show comparable scores on both factors.

In the three-factor RIRT model, problems associated with drinking are represented by two factors (i.e., socio-emotional and community problems) and sexual enhancement expectancies by another factor. The RR effects are significantly different from zero for all three factors, where the effect on the factor representing community problems related to alcohol use is around .32 and slightly higher than the effects on the other factors, which are around .21. It seems that students were less willing to admit to alcohol-related community problems, which induced more socially desirable responses than the other factors. The relatively high thresholds of items 5, 7, and 13 measuring community problems indicated that they were more severe and most likely more sensitive. Male students scored significantly higher than the female students on the factor representing community problems related to alcohol use. That is, male students were more likely to experience alcohol-related community problems than females. This gender effect was not found for the other factors.

Interest was focused on the average student drinking problems and expectancies of the four selected colleges/universities that took part in the experiment. The clustering of students in colleges/universities was represented using effect coding. In Table 4 the parameter estimates are given for the two-factor and three-factor models. For each factor, the intercept represents the average score across colleges and universities, which was set to zero. In the two-factor model, the average score of the RR group is .29 and .19, both significantly higher than zero. It follows that the mean scores of the alcohol-related problems factor for Guilford Technical Community College and Elon University are significantly higher than the means for University of North Carolina and Wake Forest. For the alcohol-related expectancy factor, Guilford Technical Community College scored on average higher than the other colleges and universities.
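Effect coding of the school membership can be sketched as follows. The Python example builds a design matrix in which the last listed school serves as the reference level and is coded −1 on every column, so that the school effects sum to zero and the intercept represents the unweighted average across schools. Which school served as the reference in the original analysis is not stated, so the ordering and the function name used here are assumptions.

```python
import numpy as np

def effect_code(labels, levels):
    """Effect-coded design matrix of shape (n, K - 1).

    labels : group label of each respondent
    levels : the K group labels; the last one is the reference level
    """
    index = {level: k for k, level in enumerate(levels)}
    n_levels = len(levels)
    design = np.zeros((len(labels), n_levels - 1))
    for i, label in enumerate(labels):
        k = index[label]
        if k == n_levels - 1:
            design[i, :] = -1.0    # reference level: -1 on all columns
        else:
            design[i, k] = 1.0     # other levels: indicator column
    return design

def reference_effect(effects):
    """Effect of the reference level implied by the sum-to-zero constraint."""
    return -float(np.sum(effects))

# Hypothetical ordering of the four schools.
schools = ["Elon", "UNCG", "Wake Forest", "Guilford"]
X = effect_code(["Elon", "Guilford", "UNCG"], schools)
```

Under this coding, the effect of the reference school is not estimated separately but equals minus the sum of the other effects, so the school effects within each factor should sum to approximately zero up to rounding.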

Table 4. CAPS-EA scale: Differences across colleges and universities using the two- and three-factor RIRT model

| Parameter | Two-factor mean | Two-factor SD | Three-factor mean | Three-factor SD |
|---|---|---|---|---|
| **Socio-emotional/community** | | | | |
| (RR) | .29 | .08 | .28 | .09 |
| *School variables* | | | | |
| (Elon) | .19 | .06 | .19 | .07 |
| (UNCG) | −.19 | .10 | −.14 | .11 |
| (Wake Forest) | −.23 | .12 | −.11 | .14 |
| (Guilford) | .24 | .12 | .05 | .13 |
| **Alcohol expectancy** | | | | |
| (RR) | .19 | .06 | .19 | .07 |
| *School variables* | | | | |
| (Elon) | .04 | .05 | .04 | .05 |
| (UNCG) | −.11 | .07 | −.10 | .08 |
| (Wake Forest) | −.23 | .08 | −.23 | .09 |
| (Guilford) | .30 | .09 | .29 | .11 |
| **Community** | | | | |
| (RR) | | | .30 | .09 |
| *School variables* | | | | |
| (Elon) | | | .12 | .06 |
| (UNCG) | | | −.13 | .10 |
| (Wake Forest) | | | −.37 | .13 |
| (Guilford) | | | .38 | .13 |
| **Variance parameters** | | | | |
| | 0.94 | 0.05 | 0.96 | 0.05 |
| | 0.62 | 0.07 | 0.55 | 0.06 |
| | | | 0.45 | 0.07 |
| | 0.97 | 0.05 | 0.99 | 0.05 |
| | | | 0.47 | 0.06 |
| | | | 0.94 | 0.05 |
| **Information criteria** | | | | |
| −2log-likelihood | | 20,858 | | 19,660 |

For the three-factor model, the mean scores of the RR group on the three factors are significantly higher than zero and comparable in size when controlling for differences across universities and colleges. Guilford Technical Community College has the highest average score on alcohol-related community problems and on alcohol-related sexual enhancement expectancies. The results show that alcohol-related sexual enhancement expectancies and community problems are positively correlated, with scores that differ across universities and colleges. The estimates of the RR effect indicate that the RR group scored significantly higher than the DQ group on each subscale. Although validation data are not available, it is to be expected that the RR technique made students more willing to answer truthfully, given the random assignment to the DQ and RR conditions.

### 7. Discussion

In educational and psychological measurement, it is often more realistic to assume that multiple constructs influence performance on test items. The multidimensional item response theory model can be used to assess the underlying latent variable structure given the test results (Reckase, 2009). When surveying sensitive topics, direct questioning may lead to social desirability bias. Therefore, in combination with a randomized response design, a multidimensional randomized item response model is proposed to measure multiple sensitive constructs given multivariate randomized response data. The confirmatory multidimensional model presented can handle dichotomous and polytomous randomized item responses. The application shows a model belonging to the class of compensatory models. However, when every item measures one construct, a non-compensatory MRIRT model can be stated in a similar way.

Markov chain Monte Carlo methods have been developed to tackle the high-dimensional integration problem in confirmatory multidimensional IRT analysis (e.g., Edwards, 2010; Sheng, 2010). Cai (2010) proposed a Metropolis–Hastings Robbins–Monro algorithm for an exploratory analysis given polytomous response data. Béguin and Glas (2001) developed a full Gibbs sampling procedure and proposed several posterior predictive checks. Yao and Boughton (2007) demonstrated MCMC estimation of multidimensional partial credit models and the assessment of subscale scores. The present MCMC algorithm for the estimation of the MRIRT model is based on a double augmentation scheme to deal with the categorical randomized response outcomes. Furthermore, the MCMC algorithm can also handle the estimation of the multivariate structural model parameters. These include the structural regression parameters, which specify the relationship between the multiple sensitive constructs and the explanatory background information, and the correlation structure among the latent variables. The MCMC algorithm has been developed in R and will be made freely available via the internet.
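The idea behind augmenting randomized response outcomes can be sketched for a single binary item. This is an illustrative Python sketch, not the authors' R implementation, and it rests on assumptions not stated in this section: a forced-response design in which a respondent answers truthfully with probability `p1` and otherwise gives a forced "yes" with probability `p2`, with hypothetical values `p1 = 0.8` and `p2 = 0.5`. Given the observed answer and a current value of the true response probability, the latent true answer is drawn from its conditional distribution by Bayes' rule. The function names are hypothetical.

```python
import numpy as np

def posterior_true_yes(y_obs, pi_true, p1, p2):
    """P(latent true response = 1 | observed RR response), forced-response design."""
    y_obs = np.asarray(y_obs)
    pi_true = np.asarray(pi_true, dtype=float)
    # Probability of the observed answer given each possible true answer.
    like_if_yes = np.where(y_obs == 1, p1 + (1 - p1) * p2, (1 - p1) * (1 - p2))
    like_if_no = np.where(y_obs == 1, (1 - p1) * p2, p1 + (1 - p1) * (1 - p2))
    num = pi_true * like_if_yes
    return num / (num + (1 - pi_true) * like_if_no)

def sample_true_responses(y_obs, pi_true, p1, p2, rng):
    """One augmentation draw of the latent true responses."""
    prob = posterior_true_yes(y_obs, pi_true, p1, p2)
    return (rng.random(prob.shape) < prob).astype(int)

rng = np.random.default_rng(1)
p1, p2 = 0.8, 0.5                        # hypothetical design constants
y_obs = np.array([1, 0, 1, 0])           # observed randomized responses
pi_true = np.array([0.3, 0.3, 0.7, 0.7]) # current true response probabilities

probs = posterior_true_yes(y_obs, pi_true, p1, p2)
draws = sample_true_responses(y_obs, pi_true, p1, p2, rng)
print(np.round(probs, 3))
print(draws)
```

In a full sampler a step of this kind would be followed by a second augmentation that links the sampled true responses to the latent constructs. That second step is omitted here because its form depends on model details given in the earlier sections.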

The present survey study on alcohol-related sexual enhancement expectancies and drinking problems showed that randomized response questioning improved the cooperation of the respondents and reduced domain-specific social desirability bias. The joint analysis results support the alcohol expectancy theory (e.g., Brown *et al*., 1987), which states that positive expectancies due to alcohol use lead to more positive initial drinking experiences, leading in turn to more positive expectancies. Here, it was shown that alcohol-related sexual enhancement expectancy scores were positively correlated with subscale scores for alcohol-related socio-emotional and community problems. In the literature, alcohol-related expectancies have been found to be useful in predicting drinking problems and drinking behaviour, and patterns of problematic use (e.g., Werner *et al*., 1995). The randomized response technique can improve the accuracy of the self-report data and related predictions, while the multidimensional modelling approach can improve the accuracy of subscale scores by using the additional subscale information (e.g., Yao & Boughton, 2007).

The measurement of individuals' alcohol expectancies is important for identifying current problem drinking and predicting future problem drinking. The randomized response technique can improve the quality of diagnostic self-report data when respondents tend to underreport alcohol consumption and the negative effects of alcohol use because of social desirability or potential legal ramifications. The multifactor modelling approach accommodates the multifactorial nature of expectancy questionnaires and supports individual measurement of expectancies given randomized response data.