# Journal of the Royal Statistical Society: Series A (Statistics in Society)

© Royal Statistical Society

Edited By: J. Carpenter and H. Goldstein

Impact Factor: 1.702

ISI Journal Citation Reports © Ranking: 2015: 13/49 (Social Sciences Mathematical Methods); 24/123 (Statistics & Probability)

Online ISSN: 1467-985X

Associated Title(s): Journal of the Royal Statistical Society: Series B (Statistical Methodology), Journal of the Royal Statistical Society: Series C (Applied Statistics), Significance

#### 177:3

*Using panel data for partial identification of human immunodeficiency virus prevalence when infection status is missing not at random*, by B. Arpino, E. De Cao and F. Peracchi, *Journal of the Royal Statistical Society, Series A, Statistics in Society*, Volume 177, part 3 (2014), pages 587–606

Data sets: The original datasets are freely available at the website:

http://malawi.pop.upenn.edu/malawi-data-mlsfh

We provide the datasets we prepared for the analysis for each wave separately: data_2004, data_2006, data_2008.

Each file is a comma-separated text file with one observation per line and variable names on the first line.

Each dataset contains the following variables:

- respid (respondent identification number. It can be used to merge the three waves);

- id (a new respondent identification number ranging from 1 to the sample size);

- Village (a code for each village);

- HIV (HIV status: 1 = infected; 2 = not infected; 3 = missing);

- HIV_dyn (HIV status with missing values imputed by exploiting the absorbing nature of HIV and past or future observations. It is the variable to be used to calculate the "dynamic bounds");

- Gender (respondent gender: 0=Female; 1=Male);

- Age (respondent age in categories,

in 2004 data: 0=equal to and below age 20; 1=between age 20 and 29 (included); 2=between age 29 and 39 (included); 3=above age 39;

in 2006 data: 0=equal to and below age 22; 1=between age 22 and 31 (included); 2=between age 31 and 41 (included); 3=above age 41;

in 2008 data: 0=equal to and below age 24; 1=between age 24 and 33 (included); 2=between age 33 and 43 (included); 3=above age 43)

- Region (respondent region of residence: 1=Centre; 2=South; 3=North);

- i_diffsex (a binary indicator equal to 1 if the gender of interviewer and interviewee is different, 0 otherwise. We used it as instrumental variable);

- i_int_before (a binary indicator equal to 1 for more experienced interviewers, 0 otherwise. We used it as instrumental variable);

- i_agecat (interviewer's age categorised in two classes: 0=below age 23; 1=equal to and above age 23. We used it as instrumental variable);

- i_month (the month of the first interview attempt in two categories: 0=May/June; 1=July/August. We used it as instrumental variable);

- miv (number of sexual partners categorised in 4 classes: 1=0 or 1 sexual partners; 2=2 sexual partners; 3=3 sexual partners; 4=4 or more sexual partners. We used it as monotone instrumental variable);

- Ethnic_group (respondent ethnic group: 1=Yao; 2=Chewa; 3=Lomwe; 4=Tumbuka; 5=Ngoni; 6=Sena; 7=Tonga; 8=Senga; 9=Other. We used this variable in the propensity score weighting method and Heckman selection model);

- Education (respondent education level: 0=no school; 1=primary level; 2=secondary level; 3=higher; 99=missing. We used this variable in the propensity score weighting method and Heckman selection model);

- Marital_status (respondent marital status: 1=married; 2=separated; 3=divorced; 4=widowed; 5=never married; 99=missing. We used this variable in the propensity score weighting method and Heckman selection model);

- Main_survey_outcome (indicates if the respondent participated in the main survey and the reasons for non-participation. The categories are: 1=completed; 2=refused; 3=hospitalised; 5=not known; 6=temporarily absent; 7=moved; 8=other. We used this variable together with the information on the completion of the VCT survey to define unit respondents - see the variable "Unit_resp");

- VCT_survey_outcome (indicates if the respondent participated in the Voluntary Counselling and Testing (VCT) survey and the reasons for non-participation. The categories are: 2=refused; 3=hospitalised; 5=not known; 6=temporarily absent; 7=moved; 8=other; 10=HIV negative; 11=HIV positive; 12=indeterminate; 13=results lost. We used this variable together with the information on the completion of the main survey to define unit respondents - see the variable "Unit_resp");

- Unit_resp (indicates if a respondent has to be considered as unit respondent or not: 0=no; 1=yes).
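Since respid links the three waves, the merge can be sketched as follows. This is a minimal Python illustration and not part of the authors' R programs; the helper names `read_wave` and `merge_waves` are hypothetical, and the two small inline tables are made-up stand-ins for the real files.

```python
import csv
import io

def read_wave(handle):
    """Read one wave (comma separated, header on the first line) into {respid: row}."""
    return {row["respid"]: row for row in csv.DictReader(handle)}

def merge_waves(waves):
    """Combine waves keyed by year into {respid: {year: row}}.

    A respondent who is absent from a wave simply has no entry for that year.
    """
    panel = {}
    for year, rows in waves.items():
        for respid, row in rows.items():
            panel.setdefault(respid, {})[year] = row
    return panel

# Made-up extracts standing in for data_2004 and data_2006 (not real data).
wave_2004 = io.StringIO("respid,HIV,Gender\n101,2,0\n102,3,1\n")
wave_2006 = io.StringIO("respid,HIV,Gender\n101,1,0\n103,2,1\n")

panel = merge_waves({2004: read_wave(wave_2004), 2006: read_wave(wave_2006)})
print(sorted(panel))               # respondents seen in at least one wave
print(panel["101"][2006]["HIV"])   # HIV code of respondent 101 in 2006
```

With the real files, each `io.StringIO` would be replaced by an opened file handle; values are read as strings, so codes such as HIV need converting to integers before analysis.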

Computer codes:

To estimate the bounds we created the following programs using the software R:

- bounds

- blow and bup

- 2sboot

- IVbounds

- IVblow and IVbup

- IV2sboot

- MIVbounds

- MIVblow and MIVbup

- MIV2sboot

The program "bounds" estimates classical and dynamic bounds.

It takes as input: HIV, a categorical variable representing the HIV status: 1 = infected; 2 = not infected; 3 and higher = missing.

It gives as output: lower and upper bound and width.
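The input and output just described can be illustrated with a short Python sketch. It assumes the bounds are the usual worst-case bounds, in which every missing status is treated as not infected for the lower bound and as infected for the upper bound; the authors' R code is not reproduced here, and the toy vector is made up.

```python
def bounds(hiv):
    """Worst-case bounds on prevalence.

    hiv: sequence of codes, 1 = infected, 2 = not infected, 3 and higher = missing.
    Returns (lower, upper, width).
    """
    n = len(hiv)
    if n == 0:
        raise ValueError("hiv must contain at least one observation")
    infected = sum(1 for h in hiv if h == 1)
    missing = sum(1 for h in hiv if h >= 3)
    lower = infected / n               # all missing cases assumed not infected
    upper = (infected + missing) / n   # all missing cases assumed infected
    return lower, upper, upper - lower

# Toy input: 2 infected, 6 not infected, 2 missing out of 10 respondents.
hiv = [1, 2, 2, 3, 2, 1, 3, 2, 2, 2]
lower, upper, width = bounds(hiv)
print(lower, upper, width)
```

The width equals the share of missing statuses, which is why reducing missingness tightens the bounds.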

The program "bounds" can be used both for classical and dynamic bounds. In the first case the input vector is the original (non adjusted) HIV status variable at time t.

In the second case the "adjusted" HIV status variable at time t has to be used; that is, past and/or future information on HIV status is used, exploiting the absorbing nature of HIV, to impute some of the missing HIV statuses at time t.
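The adjustment can be sketched in Python as follows. This is an illustration of the logic only, under the assumption that "absorbing" means an infected respondent stays infected and a respondent observed as not infected was not infected earlier; the function name `impute_dynamic` is hypothetical and the authors' exact rules (for example for inconsistent records) may differ.

```python
def impute_dynamic(statuses):
    """Impute missing HIV codes for one respondent across waves.

    statuses: codes ordered by wave (1 = infected, 2 = not infected,
    3 and higher = missing). A missing code becomes 1 if an earlier wave
    shows infection, or 2 if a later wave shows no infection; otherwise
    it stays missing. Only observed codes are used as evidence.
    """
    out = list(statuses)
    for t, code in enumerate(statuses):
        if code in (1, 2):
            continue                  # observed, nothing to impute
        if 1 in statuses[:t]:
            out[t] = 1                # infected earlier, so infected now
        elif 2 in statuses[t + 1:]:
            out[t] = 2                # not infected later, so not infected now
    return out

print(impute_dynamic([1, 3, 3]))
print(impute_dynamic([3, 3, 2]))
print(impute_dynamic([2, 3, 1]))
```

In the third example the middle wave stays missing, because infection could have occurred either before or after it.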

The program "bounds" assumes that all data management is done before running it.

The programs "blow" and "bup" give a separate estimate of the lower and upper bounds, respectively.

They work in the same way as the program "bounds" and are used by the program "2sboot" (see below).

They take as input: HIV (as for "bounds")

They give as output: the lower and upper bound, respectively.

The program "2sboot" can be used to obtain two-stage bootstrap estimates of the standard errors of the lower and upper bounds and the confidence interval for the HIV prevalence, using the procedure suggested by Imbens and Manski (2004).

In the first stage, villages are selected. In the second stage, a bootstrap sample of individuals within each selected village is considered.

It takes as input: HIV (as for "bounds"), id (identification code for each individual), village (categorical variable representing village codes), nboot (number of bootstrap replicates).

It gives as output: lower bound, standard error of lower bound, upper bound, standard error of upper bound, lower limit of confidence interval, upper limit of confidence interval, width of confidence interval.
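The procedure can be sketched in Python as below. Several things are assumptions rather than the authors' implementation: the function names, the omission of the id argument, resampling as many villages and individuals as were observed, and the form of the Imbens and Manski (2004) critical value, taken here as the c solving Φ(c + width / max(se)) − Φ(−c) = level.

```python
import random
from statistics import NormalDist, stdev

def blow(hiv):
    """Worst-case lower bound: share of observed infections."""
    return sum(1 for h in hiv if h == 1) / len(hiv)

def bup(hiv):
    """Worst-case upper bound: infections plus missing statuses."""
    return sum(1 for h in hiv if h == 1 or h >= 3) / len(hiv)

def two_stage_sample(by_village, rng):
    """Draw villages with replacement, then individuals within each drawn village."""
    villages = list(by_village)
    sample = []
    for v in rng.choices(villages, k=len(villages)):
        members = by_village[v]
        sample.extend(rng.choices(members, k=len(members)))
    return sample

def imbens_manski_c(width, se_max, level=0.95):
    """Solve Phi(c + width / se_max) - Phi(-c) = level for c by bisection."""
    if se_max == 0:
        return 0.0                    # no sampling variation: interval equals the bounds
    phi = NormalDist().cdf
    lo, hi = 0.0, 10.0
    for _ in range(100):
        mid = (lo + hi) / 2
        if phi(mid + width / se_max) - phi(-mid) < level:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def two_stage_boot(hiv, village, nboot, seed=0, level=0.95):
    """Bootstrap standard errors of the bounds and a confidence interval (nboot >= 2)."""
    rng = random.Random(seed)
    by_village = {}
    for h, v in zip(hiv, village):
        by_village.setdefault(v, []).append(h)
    lows, ups = [], []
    for _ in range(nboot):
        sample = two_stage_sample(by_village, rng)
        lows.append(blow(sample))
        ups.append(bup(sample))
    lb, ub = blow(hiv), bup(hiv)
    se_l, se_u = stdev(lows), stdev(ups)
    c = imbens_manski_c(ub - lb, max(se_l, se_u), level)
    ci_low, ci_high = lb - c * se_l, ub + c * se_u
    return {"lower": lb, "se_lower": se_l, "upper": ub, "se_upper": se_u,
            "ci_low": ci_low, "ci_high": ci_high, "ci_width": ci_high - ci_low}

# Made-up data: three villages of four respondents each.
hiv = [1, 2, 3, 2, 2, 2, 1, 3, 2, 3, 2, 2]
village = ["A"] * 4 + ["B"] * 4 + ["C"] * 4
result = two_stage_boot(hiv, village, nboot=200, seed=1)
print(result)
```

The critical value lies between about 1.645 (wide bounds) and 1.96 (point identification) at the 95% level, so the interval covers the prevalence itself rather than the whole identified set.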

Note: the program "2sboot" calls the functions "blow" and "bup".

The program "IVbounds" estimates bounds with IV restrictions.

It takes as input: HIV (see the note for the program "bounds") and x which is a categorical variable representing the IV variable.

It gives as output: a list containing: 1) a matrix including all the subsample bounds; 2) IV lower bound; 3) IV upper bound; 4) IV bound's width.

Note: the program "IVbounds" calls the program "bounds".
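A Python sketch of bounds under an IV restriction is given below. It assumes the standard construction in which worst-case bounds are computed within each level of the instrument and then intersected (largest lower bound, smallest upper bound); this is an illustration with made-up data, not the authors' R code, and `iv_bounds` is a hypothetical name.

```python
def bounds(hiv):
    """Worst-case bounds (lower, upper, width); codes 1, 2, and 3+ = missing."""
    n = len(hiv)
    infected = sum(1 for h in hiv if h == 1)
    missing = sum(1 for h in hiv if h >= 3)
    return infected / n, (infected + missing) / n, missing / n

def iv_bounds(hiv, x):
    """Intersect subsample bounds across the levels of the instrument x.

    Returns (subsample bounds by level, IV lower, IV upper, IV width).
    """
    groups = {}
    for h, level in zip(hiv, x):
        groups.setdefault(level, []).append(h)
    sub = {level: bounds(values) for level, values in groups.items()}
    lower = max(b[0] for b in sub.values())   # tightest lower bound
    upper = min(b[1] for b in sub.values())   # tightest upper bound
    return sub, lower, upper, upper - lower

# Made-up data: instrument level 0 for the first four respondents, 1 for the rest.
hiv = [1, 2, 2, 3, 1, 1, 2, 3, 3, 2]
x = [0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
sub, iv_lower, iv_upper, iv_width = iv_bounds(hiv, x)
print(sub)
print(iv_lower, iv_upper, iv_width)
```

If the returned lower bound exceeds the upper bound, the subsample bounds do not overlap, which is evidence against the IV restriction.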

The program "IV2sboot" is similar to "2sboot" but it exploits the IV restrictions.

It takes as input: data (a matrix, with HIV as first column and an instrumental variable as second column), id (identification code for each individual), village (categorical variable representing village codes), nboot (number of bootstrap replicates).

It gives as output: lower bound, standard error of lower bound, upper bound, standard error of upper bound, lower limit of confidence interval, upper limit of confidence interval, width of confidence interval.

Note: the program "IV2sboot" calls the functions "IVblow" and "IVbup".

The program "MIVbounds" estimates bounds with MIV restrictions.

It takes as input: HIV (as for "bounds") and z which is a categorical variable representing the MIV variable.

It gives as output: a list containing: 1) a matrix including all the subsample bounds; 2) MIV lower bound; 3) MIV upper bound; 4) MIV bound's width.

Note: the program "MIVbounds" calls the program "bounds".
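A Python sketch of bounds under an MIV restriction follows. It assumes the Manski and Pepper style construction with prevalence weakly increasing in z: at each level u the lower bound is the largest subsample lower bound over levels up to u, the upper bound is the smallest subsample upper bound over levels from u upwards, and the results are averaged with the level shares as weights. The direction of monotonicity, the name `miv_bounds` and the data are assumptions for illustration.

```python
def bounds(hiv):
    """Worst-case bounds (lower, upper); codes 1, 2, and 3+ = missing."""
    n = len(hiv)
    infected = sum(1 for h in hiv if h == 1)
    missing = sum(1 for h in hiv if h >= 3)
    return infected / n, (infected + missing) / n

def miv_bounds(hiv, z):
    """Bounds assuming prevalence is weakly increasing in the ordered variable z.

    Returns (subsample bounds by level, MIV lower, MIV upper, MIV width).
    """
    groups = {}
    for h, level in zip(hiv, z):
        groups.setdefault(level, []).append(h)
    levels = sorted(groups)
    sub = {u: bounds(groups[u]) for u in levels}
    n = len(hiv)
    lower = upper = 0.0
    for u in levels:
        lb_u = max(sub[v][0] for v in levels if v <= u)   # best lower bound from below
        ub_u = min(sub[v][1] for v in levels if v >= u)   # best upper bound from above
        weight = len(groups[u]) / n                       # share of respondents at level u
        lower += weight * lb_u
        upper += weight * ub_u
    return sub, lower, upper, upper - lower

# Made-up data: two levels of z with four respondents each.
hiv = [1, 2, 2, 3, 2, 2, 3, 3]
z = [1, 1, 1, 1, 2, 2, 2, 2]
sub, miv_lower, miv_upper, miv_width = miv_bounds(hiv, z)
print(sub)
print(miv_lower, miv_upper, miv_width)
```

In this toy example the pooled worst-case lower bound is 0.125, while the MIV restriction raises it to 0.25 because the lower bound at z = 1 also applies at z = 2.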

The programs "MIVblow" and "MIVbup" give separate estimates of the lower and upper bounds, respectively. They work in the same way as the program "MIVbounds" and are used by the program "MIV2sboot" (see below).

They take the same input as "MIVbounds".

They give as output: lower and upper bound, respectively.

Note: these programs call the program "bounds".

The program "MIV2sboot" is similar to "IV2sboot" but it exploits the MIV restrictions.

It takes as input: data (a matrix, with HIV as first column and a MIV variable as second column), id (identification code for each individual), village (categorical variable representing village codes), nboot (number of bootstrap replicates).

It gives as output: lower bound, standard error of lower bound, upper bound, standard error of upper bound, lower limit of confidence interval, upper limit of confidence interval, width of confidence interval.

Note: the program "MIV2sboot" calls the functions "MIVblow" and "MIVbup".

Elisabetta De Cao

Department of Economics, Econometrics and Finance

University of Groningen

Nettelbosje 2 9747 AE Groningen

The Netherlands

E-mail: elisabetta.decao@gmail.com

*Geostatistical survival models for environmental risk assessment with large retrospective cohorts*, by H. Jiang, P. E. Brown, H. Rue and S. Shimakura, *Journal of the Royal Statistical Society, Series A, Statistics in Society*, Volume 177, part

Dataset: We were not able to provide a real dataset because of privacy and confidentiality. However, we have provided the file "gmrfSim.rnw" which contains R code to simulate data and demonstrate the application.

To use the programs, one needs to download the following R libraries: geostatsp, INLA, geostatsinla and abind.

Huan Jiang

Prevention and Research

Cancer Care Ontario

620 University Avenue

Toronto, Ontario

Canada

M5G 2L7

E-mail: hedy.jiang@cancercare.on.ca

*On the epidemic of financial crises*, by N. Demiris, T. Kypraios and L. V. Smith, *Journal of the Royal Statistical Society, Series A, Statistics in Society*, Volume 177, part 3 (2014), pages 697–723

**************************************************************

The readme file describes the programs required for inferring the parameters of the two-level epidemic model and computing the threshold parameter Rstar. The programs are written in Fortran90.

**************************************************************

*Average household size and the eradication of malaria*, by L. Huldén, R. McKitrick and L. Huldén, *Journal of the Royal Statistical Society, Series A, Statistics in Society*, Volume 177, part 3 (2014), pages 725–742

INSTRUCTIONS:

The software is written for Stata version 12.

Download the file "maldeng.txt" and the accompanying csv data files into a folder.

- Rename maldeng.txt as maldeng.do (i.e. change the extension to .do)

- Create a subfolder called data and put all the csv files in it.

- Create a subfolder called figures

- Create a subfolder called logfiles

Execute maldeng.do in Stata. It will read and merge all data sets, then generate the output file and all the figures, putting them in their respective folders.
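The folder preparation described above can also be scripted before opening Stata. The Python sketch below is a convenience under stated assumptions: the function name `prepare_workspace` is hypothetical, and it assumes that maldeng.txt and the csv files sit directly in the download folder.

```python
from pathlib import Path
import shutil

def prepare_workspace(folder):
    """Rename maldeng.txt to maldeng.do and create the data, figures and logfiles subfolders."""
    folder = Path(folder)
    script = folder / "maldeng.txt"
    if script.exists():
        script.rename(folder / "maldeng.do")      # change the extension to .do
    for name in ("data", "figures", "logfiles"):
        (folder / name).mkdir(exist_ok=True)      # create subfolders if absent
    for csv_file in folder.glob("*.csv"):
        # move every downloaded csv file into the data subfolder
        shutil.move(str(csv_file), str(folder / "data" / csv_file.name))

# Example call (path is a placeholder for the download folder):
# prepare_workspace("path/to/download/folder")
```

The function is safe to run twice: existing subfolders are kept and an already renamed script is left alone.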

Ross McKitrick

Department of Economics and Finance

University of Guelph

Guelph

Ontario

N1G 2M5

Canada

E-mail: ross.mckitrick@uoguelph.ca