COLLECTING DATA DURING AN EPIDEMIC: A NOVEL MOBILE PHONE RESEARCH METHOD

This study developed a data collection method, combining (i) Random-Digit Dialing (RDD) and Interactive Voice Response (IVR) to sample and screen respondents, and (ii) Computer-Assisted Telephone Interviewing (CATI) to survey 2,265 respondents during the 2014 Ebola epidemic. The response, cooperation, refusal, and contact rates computed according to the American Association for Public Opinion Research guidelines were 51.97%, 52.62%, 41.85%, and 98.77% for IVR, and 91.10%, 91.65%, 8.30%, and 99.40% for CATI. A comparison with Demographic and Health Surveys confirmed that the sample is not nationally representative. However, this method offers promise for data collection at a low cost ($24) and without any in-person interaction.


Introduction
More and more researchers collect survey data through mobile phones (Toninelli et al., 2015). Mobile phone technology has removed significant barriers inherent to traditional survey data collection: It helps in gathering high-frequency panel data (Dillon, 2012; Hoogeveen et al., 2014; Ballivian et al., 2015), it provides timely access to and monitoring of data collected through face-to-face surveys (Tomlinson et al., 2009; Schuster and Brito, 2011; Hughes et al., 2016), and it allows for fast and low-cost data collection (Schuster and Brito, 2011; Mahfoud et al., 2015; Leo et al., 2015; Garlick et al., 2019). While mobile phones are already widely used as a data collection tool in developed countries where access is common, this research method is gaining traction in developing settings, too (Gibson et al., 2017).
Given the rise in mobile phone penetration rates in developing economies (World Bank, 2016), mobile phone surveys are increasingly used to gather national statistics and to conduct monitoring, bio-surveillance, and disaster management (Gallup, 2012; Twaweza East Africa, 2013; Bauer et al., 2013; Hoogeveen et al., 2014; van der Windt and Humphreys, 2014; Garlick et al., 2019). Researchers use mobile phones to conduct different types of interviews (Lau et al., 2019): Interactive Voice Response (IVR) surveys, which rely on a pre-recorded voice to ask respondents questions; Computer-Assisted Telephone Interviewing (CATI), which requires trained interviewers to make live calls to respondents following a script provided by a software application; and SMS surveys, which require respondents to type an answer and send back a text message (Dabalen et al., 2016).
Yet, gathering data in developing settings, where there might be weak institutions, limited resources and infrastructure, cultural constraints, and low literacy, can be even more challenging than in developed countries (Grosh and Glewwe, 2000; Ganesan et al., 2012; Dabalen et al., 2016). In developing economies, for example, there is often no access to an initial list of contacts or publicly available data sources, and researchers need to gather baseline data themselves to have a sampling frame. Times of emergency, such as conflicts, infectious disease epidemics, or weather-related disasters, when it is hard to reach respondents in person for interviews, exacerbate these challenges.
This study tested the feasibility of a novel mobile phone data collection method to interview more than 2,000 respondents during the 2014 Ebola epidemic in Liberia. The research method uses (i) Random-Digit Dialing (RDD) and Interactive Voice Response (IVR) surveys to sample and screen respondents, and (ii) Computer-Assisted Telephone Interviewing (CATI) to conduct 30-45-minute interviews. This method builds upon the RDD selection approach outlined by Leo et al. (2015), in which the sample was randomly selected through an online platform and the data collection was performed through an IVR survey.1 The authors assessed whether mobile phone surveys were a feasible and cost-effective approach to collect data in four middle- or low-income countries, focusing on whether the method could reach a nationally representative sample and how to improve its survey completion rates. Similarly, L'Engle et al. (2018) used RDD and IVR surveys to collect survey data in Ghana, assessing the response rate and representativeness of the obtained sample compared to face-to-face national surveys.
This method also builds upon previous studies that used CATI to collect high-frequency data. Hoogeveen et al. (2014) provided examples of high-frequency phone surveys conducted through a call center in Tanzania and South Sudan. A similar approach was used by Dillon (2012) to elicit data regarding farmer expectations, production, and income levels over time. Demombynes et al. (2013) also used a similar high-frequency survey approach, in which the authors randomized the level of incentives and the phone equipment to increase response rates in South Sudan. Finally, Garlick et al. (2019) compared frequencies of in-person or CATI interviews with micro-enterprises, and found no difference in data quality or response rates between high-frequency CATI interviews and low-frequency in-person interviews.
By combining these established methods (RDD, IVR, CATI) in a novel manner, this study developed a data collection procedure that does not require in-person contact, thus allowing researchers to gather survey data in challenging settings. In fact, although limitations in the application of mobile phones as unique data collection devices remain (Kempf and Remington, 2007), evidence regarding their use, both as a method to gather survey data and as a way to select and screen an initial sample of respondents to interview, is still lacking. This research method aims at overcoming two specific challenges related to data collection in developing countries and at times of emergency.
First, researchers begin field research by developing a sampling frame. Usually, they seek access to an initial list of contacts, such as a list of respondents from past studies or a list of phone numbers contained within the datasets from collaborating institutions (The World Bank Group, 2014; The World Bank Group, 2015). Alternatively, researchers may develop a sampling frame from publicly available data sources, such as large-scale national household surveys or population censuses, which provide the advantage of being nationally representative; or, they simply gather data themselves through a face-to-face baseline survey. However, getting a sampling frame is challenging when data do not exist or face-to-face data collection is not feasible.
Second, in a state of emergency, reaching survey respondents in person for interviews can be challenging, or the risks and costs associated with data collection in order to have a large enough sample may be insurmountable. For example, at the time of the Ebola epidemic (or currently during Covid-19), in-person contacts should be limited, if not avoided altogether. Similarly, in the aftermath of hurricanes, floods, or earthquakes, traveling to remote areas might be impossible. As weather-related disasters or infectious diseases remain a worldwide threat, especially in developing countries (United Nations Office for Disaster Risk Reduction, UNISDR, 2015), in-field data collection may not always be feasible.
1 See Waksberg (1978) and Massey et al. (1997) for the use of RDD in the developed world.
This article is protected by copyright. All rights reserved.
This study proposes to use mobile phone technology as the sole tool for all stages of data collection, in order to overcome both of the aforementioned challenges. The research method allows researchers to conduct the entire data collection while relying solely on mobile phones, removing the need for prior data or fieldwork activities to have a sampling frame or to gather survey data. While the studies mentioned above required at least one in-person interaction in order to facilitate data collection through mobile phones, the proposed method does not necessitate any physical contact with the respondents. This is key when a sampling frame is not available or fieldwork activities are excessively demanding, such as in the case of emergencies. Furthermore, this method was implemented and tested in a developing country, where this type of technology is most needed.
The paper is organized as follows: Section 2 describes the method used to collect the data and the analysis conducted; Section 3 provides a description of the results by estimating call outcomes and rates, the representativeness of the survey sample, and the implementation costs of this research method; Section 4 discusses the lessons learned; Section 5 concludes the paper.

Methods
The goal of the initial project was to study the political economy of the 2014 West African Ebola epidemic (Maffioli, 2020), by gathering survey data on individuals' level of trust and perceived corruption toward several institutions, and their opinions on the government's actions during the response. However, successfully accomplishing this goal required surmounting significant survey data challenges. Indeed, no baseline data were available to select respondents, and gathering face-to-face data was impossible due to the high costs and risks at the time of the epidemic. Thus, the project relied solely on mobile phone technology for both stages of the data collection procedure: (1) sampling and screening of the respondents, and (2) data gathering.
This study tested the feasibility of this research method by conducting more than 2,200 interviews in Liberia between October 2015 and June 2016. Call outcomes and response, cooperation, refusal, and contact rates were calculated according to the American Association for Public Opinion Research guidelines (AAPOR, 2016). The representativeness of the survey sample was explored using the nationally representative Demographic and Health Survey (DHS).

2.1.1. Sampling and Screening of Respondents (IVR). Due to the lack of access to a sample available in the pre-Ebola period, an online platform called VotoMobile was employed to draw an initial list of respondents through RDD. Given the known structure of the mobile phone numbers in Liberia, the platform used an algorithm to create a list of randomly generated phone numbers that fit that structure. The platform was set up to select phone numbers from the two main Liberian phone companies at that time (Liberian Telecommunications Authority, 2012), LonestarCell/MTN with a share of 49.55% and Cellcom with a share of 40.36%. These companies had a similar phone number structure but different mobile phone prefixes, which were exploited in the algorithm used for the random selection. The platform was set up to randomly select half of the numbers from LonestarCell/MTN and half from Cellcom. The platform also mimicked Liberian phone numbers, using the prefixes of these two main phone companies.
The platform went through the randomly generated phone numbers. It was set up to attempt up to four calls to the same phone number: After the first attempt, the second call was placed after 5 minutes, while the third and the fourth calls were placed after 8 hours each. The calls were made 7 days a week from 8:00 a.m. to 8:00 p.m. Once the phone number connected and a person picked up the call, implying that it was an existing Liberian phone number, a short pre-recorded survey (IVR) informed the respondent that she/he had been selected for an interview. The IVR message asked the respondent three questions to gather her/his residence location at the beginning of the epidemic: (1) whether she/he lived in Montserrado County; (2) if not, in which other county she/he lived; and (3) in which district she/he lived.
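The number generation and retry schedule described above can be sketched as follows. This is a minimal illustration, not the platform's actual algorithm: the operator names, prefixes, and subscriber-number length are hypothetical placeholders (the text does not list the real numbering plan), and the sketch ignores the 8:00 a.m. to 8:00 p.m. calling window, since the text does not say how out-of-window retries were handled.

```python
import random
from datetime import datetime, timedelta

# Hypothetical prefixes and subscriber-number length: placeholders only,
# not the actual Liberian numbering plan.
OPERATOR_PREFIXES = {"operator_a": ["077", "088"], "operator_b": ["055"]}
SUBSCRIBER_DIGITS = 7

# Gaps between consecutive attempts, as described in the text:
# 5 minutes before the second call, 8 hours before the third and the fourth.
RETRY_GAPS = [timedelta(minutes=5), timedelta(hours=8), timedelta(hours=8)]


def generate_numbers(n, rng):
    """Draw n distinct random numbers, alternating operators for a 50/50 split."""
    operators = sorted(OPERATOR_PREFIXES)
    numbers = []
    seen = set()
    while len(numbers) < n:
        operator = operators[len(numbers) % len(operators)]
        prefix = rng.choice(OPERATOR_PREFIXES[operator])
        suffix = "".join(rng.choice("0123456789") for _ in range(SUBSCRIBER_DIGITS))
        number = prefix + suffix
        if number not in seen:  # skip duplicates and redraw
            seen.add(number)
            numbers.append((operator, number))
    return numbers


def attempt_times(first_call):
    """Return the nominal times of the (up to) four attempts for one number."""
    times = [first_call]
    for gap in RETRY_GAPS:
        times.append(times[-1] + gap)
    return times
```

For a first attempt at 8:00 a.m., `attempt_times` yields nominal attempts at 8:05 a.m., 4:05 p.m., and 12:05 a.m. the next day; a real scheduler would have to shift the last one into the calling window.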
The aim was to gather a heterogeneous sample of individuals from all 15 counties in Liberia, to compare individuals affected by Ebola with those unaffected by it. More specifically, the first question was set up to limit respondents from Montserrado County, the most urbanized and populous region in Liberia and where most of the Ebola cases were concentrated: Selecting respondents from other counties would allow for a heterogeneous sample to make meaningful comparisons for the analysis. If the respondent answered all three IVR questions, then she/he would also be informed that someone would call back from the local NGO and that, upon completion of the CATI survey, she/he would receive $1 of free airtime for her/his phone as a sign of appreciation. The IVR survey was conducted between October 21 and 31, 2015.
As no restrictions were placed on the selection process through IVR, any person answering the phone was considered eligible. Following AAPOR (2016) guidelines, complete interviews (I) were defined as answering the three location questions, which gathered both the county and the district where the respondent resided at the beginning of the epidemic. Partial interviews (P) were defined as answering the first two questions of the survey, which gathered the county but not the district. Since the IVR confirms that a real person picked up the call only when the respondent answers the first question, refusals (R) were defined as not answering the first IVR question, i.e., whether the respondent resided in Montserrado County. Break-offs (R) were defined in a similar way for two reasons: First, following the definition of break-offs in AAPOR (2016) guidelines, the IVR survey was so short that answering the first question meant answering almost 50% of the survey; second, partial and complete interviews already took into account the completion of the second and third questions in the IVR message. In practice, I conservatively categorized as break-offs the phone numbers that were dialed and rang, but for which the respondent did not pick up the call. According to VotoMobile coding, this category does not make it possible to distinguish whether or not the user deliberately declined to respond to the call. In addition, phone numbers that were always busy were classified as of unknown eligibility (UH).
Finally, phone numbers that were dialed but could not be confirmed as known working numbers were classified as not eligible. A high number of ineligible calls is expected because of the automated nature of the RDD calling system. This category includes: (i) phone numbers that never responded because the call never rang on a person's phone due to an error on the provider's end (in practice, these are non-existing phone numbers resulting from the random dialing); (ii) phone numbers temporarily out of service; (iii) phone numbers unable to connect due to specific technological issues; and (iv) phone numbers for which the call connected at the network level, but a valid connection to an individual's mobile phone could not be confirmed. The numbers in group (iv) were categorized as phone numbers that connected but for which there was no or invalid selection, and they correspond to quick hang-ups.
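The IVR classification rules above amount to a lookup from call outcome to disposition code. The sketch below illustrates that mapping; the outcome labels are hypothetical names chosen for readability, not the platform's actual status codes, and "NE" is shorthand introduced here for "not eligible".

```python
from collections import Counter

# Outcome labels are hypothetical; the codes follow the definitions in the text.
IVR_DISPOSITION = {
    "answered_all_three": "I",              # complete: county and district known
    "answered_first_two": "P",              # partial: county but not district
    "no_answer_to_q1": "R",                 # refusal
    "rang_not_picked_up": "R",              # break-off (conservative coding)
    "always_busy": "UH",                    # unknown eligibility
    "never_rang_provider_error": "NE",      # non-existing number
    "temporarily_out_of_service": "NE",
    "connection_technical_issue": "NE",
    "connected_no_valid_selection": "NE",   # quick hang-ups
}


def tally(outcomes):
    """Count dispositions for a list of per-number final call outcomes."""
    counts = Counter(IVR_DISPOSITION[outcome] for outcome in outcomes)
    return {code: counts.get(code, 0) for code in ("I", "P", "R", "UH", "NE")}
```

Keeping the mapping in one table makes alternative codings, such as moving a category from not eligible to non-contact, a one-line change.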
2.1.2. Data Gathering (CATI). Respondents screened and selected through the IVR survey were called back by enumerators from a local NGO to conduct a 30-45-minute interview. The sample for CATI was selected for the initial research project (Maffioli, 2020) among the phone numbers called in stage (1) for which the IVR survey was either complete (I) or partial (P).
The enumerators from the local NGO were instructed to call the full list of phone numbers multiple times to reach the respondents. They also had the flexibility to re-contact the respondents at their preferred time and to call them back when the survey was interrupted for any reason. Due to budget constraints, the main data collection by the local NGO proceeded in two rounds. Round 1 was conducted between December 2015 and February 2016, while round 2 was conducted in June 2016.
The CATI survey was performed by 18 enumerators, five of whom were female, who had received training in human subject research and in mobile phone data collection. In addition to respondent and household socio-demographics, the survey tool collected data on: (1) political outcomes, such as self-reported level of trust in governmental and nongovernmental institutions and people, perceived corruption of similar institutions, and past voting behavior; and (2) Ebola-related questions, such as self-reported Ebola incidence in the community, the level of information received, the experience with the response, and perceptions about the government's performance (see Maffioli 2020; Gonzalez and Maffioli 2020 for the use of the survey sample for other research). The data were collected in KoBoToolbox through mobile phone devices and then exported automatically to Excel, and quality checks were performed by the researcher daily. Data cleaning and analysis were conducted using Stata v15. Ethical approval was obtained from the University of Liberia and Duke University.
The CATI interviews for which respondents gave verbal consent and that were more than 80% completed were classified as complete (I). Since all respondents finished the entire survey, there were no CATI interviews classified as partial (P). Refusals (R) were defined as not agreeing to participate in the survey, while break-offs (R) were defined as phone numbers that were dialed and rang, but for which the respondent did not pick up the call. Unknown eligibility (UH) was assigned to phone numbers not screened for eligibility, for example when respondents reported having already been interviewed. Finally, phone numbers were classified as not eligible if they (i) belonged to respondents younger than 18 years old (this restriction was placed on the CATI selection process since sensitive information was asked); (ii) were temporarily out of service; or (iii) never responded because the call never rang on a person's phone due to an error on the provider's end. In practice, since the call did not connect at the time of the CATI (between 2 and 8 months after the IVR survey), it was impossible to know whether the number still existed and was valid.
It is important to note that this last group of phone numbers was active during the IVR survey implemented in October 2015, but the phone numbers turned out to be non-existing or non-active at the time of CATI, which is why they could be classified as not eligible (4.31) according to the American Association for Public Opinion Research guidelines (AAPOR, 2016). This classification requires the phone numbers to be existing and valid at the time of CATI in order to be defined as eligible. However, an alternative classification could assume that these phone numbers were instead eligible but not interviewed (non-contacts, 2.20), since they were working the first time they were contacted for the IVR, i.e., between 2 and 8 months before CATI. An alternative estimation of call rates is performed under this assumption.
For both IVR and CATI, response, cooperation, refusal, and contact rates were computed according to the American Association for Public Opinion Research guidelines (AAPOR, 2016), as follows:

Response rate 1:

\[ \mathrm{RR1} = \frac{I}{(I + P) + (R + NC + O) + (UH + UO)} \tag{2.1} \]

where I, P, R, and UH were defined as above; NC were defined as non-contacts, including cases in which the number was confirmed as an eligible respondent, but the selected respondent was never available or only a telephone answering device was reached (see AAPOR 2016 categories 2.21 or 2.22); O were defined as other cases, such as instances in which there was a respondent who did not refuse the interview, but no interview was obtainable, including death or inabilities (see AAPOR 2016 categories 2.31-2.36); UO were defined as contacts who remained of unknown eligibility, such as failure to complete a needed screener, instances in which a person's eligibility status could not be confirmed or disconfirmed, and other miscellaneous cases in which the eligibility of the number was undetermined and which did not clearly fit into one of the other designations (see AAPOR 2016 categories 3.2-3.9). Response rate 2 added partial interviews P to the numerator; response rate 3 multiplied (UH + UO) by e, defined as the proportion of all callers screened for eligibility who were eligible (see details below); response rate 4 added P to the numerator and multiplied (UH + UO) by e.
Cooperation rate 1:

\[ \mathrm{COOP1} = \frac{I}{(I + P) + R + O} \tag{2.2} \]

where I, P, R, and O were defined as above. Cooperation rate 2 added partial interviews P to the numerator; cooperation rate 3 did not consider other causes (O) among respondents who were eligible but not interviewed (see AAPOR 2016 categories 2.0-2.3); cooperation rate 4 added partial interviews P to the numerator and did not consider other causes (O).
Refusal rate 1:

\[ \mathrm{REF1} = \frac{R}{(I + P) + (R + NC + O) + (UH + UO)} \tag{2.3} \]

where I, P, R, O, NC, UH, and UO were defined as above. Refusal rate 2 multiplied (UH + UO) by e, defined as the proportion of all callers screened for eligibility who were eligible; refusal rate 3 excluded (UH + UO).

Contact rate 1:

\[ \mathrm{CON1} = \frac{(I + P) + R + O}{(I + P) + (R + NC + O) + (UH + UO)} \tag{2.4} \]

where I, P, R, O, NC, UH, and UO were defined as above. Contact rate 2 multiplied (UH + UO) by e, defined as the proportion of all callers screened for eligibility who were eligible; contact rate 3 did not consider (UH + UO).
The best method to estimate e continues to be debated. Most AAPOR methodologies either assume that all callers are eligible (e = 100%) or all ineligible (e = 0%) to define a range of minimum or maximum response rate, respectively (Smith, 2009). However, assuming that some are eligible seems to be a more plausible assumption (Martsolf et al., 2012). I followed Martsolf et al. (2012) and estimated e using the most conservative method (AAPOR4), which considered all unknown eligibility refusals (quick hang-ups) to be eligible non-interviews. In practice, e was calculated by taking the sum of cases that were considered eligible (P, I, R) and dividing by the sum of those cases (P, I, R) plus the known ineligible non household cases (UH). However, robustness checks were also implemented assuming the extreme cases of e = 100% or e = 0%.
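As a worked check of the rate definitions, the sketch below computes a subset of the AAPOR rates from disposition counts. The function name and its keyword interface are illustrative choices, not part of any AAPOR tool; the counts are the ones reported for the IVR and CATI stages of this study, and e is left as a parameter rather than estimated.

```python
def aapor_rates(I, P=0, R=0, NC=0, O=0, UH=0, UO=0, e=1.0):
    """Return selected AAPOR outcome rates in percent.

    e is the assumed share of unknown-eligibility cases that are eligible;
    it only affects the e-adjusted rates (RR3, RR4, REF2, CON2).
    """
    interviews = I + P
    non_interviews = R + NC + O
    unknown = UH + UO
    full = interviews + non_interviews + unknown
    adjusted = interviews + non_interviews + e * unknown
    contacted = interviews + R + O
    return {
        "RR1": 100 * I / full,
        "RR2": 100 * interviews / full,
        "RR3": 100 * I / adjusted,
        "RR4": 100 * interviews / adjusted,
        "COOP1": 100 * I / contacted,
        "REF1": 100 * R / full,
        "REF2": 100 * R / adjusted,
        "CON1": 100 * contacted / full,
        "CON2": 100 * contacted / adjusted,
    }


# Counts as reported for the two stages of this study.
ivr = aapor_rates(I=12761, P=1216, R=10276, UH=302)
cati = aapor_rates(I=2271, R=113 + 94, UH=15)  # refusals plus break-offs

for name, rates in (("IVR", ivr), ("CATI", cati)):
    print(name, {k: round(rates[k], 2) for k in ("RR1", "COOP1", "REF1", "CON1")})
```

With these counts the sketch reproduces the reported rates: 51.97%, 52.62%, 41.85%, and 98.77% for IVR, and 91.10%, 91.65%, 8.30%, and 99.40% for CATI. Passing `NC=602` for CATI gives the alternative classification, and varying `e` between 0 and 1 gives the bounds used in the robustness checks.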

Sample Representativeness.
To assess the representativeness of the survey sample, the study used data from DHS (Demographic Health Survey Liberia, 2013), most recently collected in 2013, which is a nationally representative sample of household face-to-face interviews. A total of 4,118 male and 9,239 female respondents between 15 and 49 years old were used as a benchmark for the survey sample (2,265 respondents), to compare respondent and household socio-demographic characteristics. A z-test for the difference in proportions of respondent and household characteristics was performed between the DHS sample and the (unweighted) survey sample. A t-test for the difference in means was implemented instead for the one continuous variable (age). The statistical significance of the differences in proportions and means was assessed at the 5% level.
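A two-sample z-test for a difference in proportions can be sketched as below. This assumes the common pooled-variance form with a two-sided p-value; the text does not state which variant was used. The DHS counts in the example come from the benchmark described above (4,118 men among 13,357 respondents), while the survey count of 1,500 men is a made-up illustration.

```python
import math


def two_proportion_ztest(successes_a, n_a, successes_b, n_b):
    """Pooled two-sample z-test for proportions; returns (z, two-sided p)."""
    p_a = successes_a / n_a
    p_b = successes_b / n_b
    pooled = (successes_a + successes_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # Two-sided p-value from the standard normal: 2 * (1 - Phi(|z|)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical survey count (1,500 of 2,265) against the DHS male share.
z, p = two_proportion_ztest(1500, 2265, 4118, 4118 + 9239)
significant_at_5_percent = p < 0.05
```

With samples of this size, even modest differences in proportions are statistically significant at the 5% level, which is worth keeping in mind when reading the comparison tables.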
A second comparison was drawn between DHS and the survey sample, re-weighting the survey sample based on four selected socio-demographic characteristics that were widely unbalanced across samples: whether the respondent was male, whether she/he had no or primary education, whether she/he owned a mobile phone, and whether she/he lived in a rural area. Sixteen strata were constructed, given by the combination of the four socio-demographic characteristics. Conditional probabilities were derived in each stratum, and the survey sample weights were reconstructed as the proportion of respondents in DHS 2013 divided by the proportion of respondents in the survey sample within each stratum, in order to perfectly match the nationally representative distribution from DHS 2013. A similar z-test and t-test (for the difference in proportions and means, respectively) of respondent and household characteristics was performed between the DHS sample and the weighted survey sample.
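The re-weighting step can be illustrated with a short post-stratification sketch. Each respondent is represented by a stratum label; in the study this would be the combination of the four binary characteristics (16 strata), while the toy data in the usage note below uses a single characteristic. Function and variable names are illustrative.

```python
from collections import Counter


def poststratification_weights(survey_strata, benchmark_strata):
    """Weight per stratum = benchmark share / survey share.

    Strata present in the benchmark but absent from the survey cannot be
    matched by re-weighting and are simply not returned.
    """
    survey_counts = Counter(survey_strata)
    benchmark_counts = Counter(benchmark_strata)
    n_survey = len(survey_strata)
    n_benchmark = len(benchmark_strata)
    weights = {}
    for stratum, count in survey_counts.items():
        survey_share = count / n_survey
        benchmark_share = benchmark_counts.get(stratum, 0) / n_benchmark
        weights[stratum] = benchmark_share / survey_share
    return weights


def weighted_share(survey_strata, weights, stratum):
    """Share of a stratum in the survey after applying the weights."""
    total = sum(weights[s] for s in survey_strata)
    return sum(weights[s] for s in survey_strata if s == stratum) / total
```

For example, with a toy survey of three men and one woman and a benchmark split evenly, `poststratification_weights(["m", "m", "m", "f"], ["m", "f"])` gives a weight of 2.0 to the woman and about 0.67 to each man, so the weighted survey share of women equals the benchmark share of one half.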
2.3. Costs. Recording the implementation costs during the study period allows for a comparative analysis between this mobile phone research method and other data collection methods used in past studies. First, a detailed summary of the costs associated with each stage of the data collection, i.e., sampling and screening of respondents (through IVR) and data gathering (through CATI), was presented. Second, the costs of this novel method were compared to those of data collection implemented through IVR and CATI in past studies (see Introduction for these studies).

Sampling and Screening of Respondents (IVR)
Table 1 describes the classification of the mobile phone numbers used, as well as the response, cooperation, refusal, and contact rates for the IVR survey in stage (1), which are computed according to the American Association for Public Opinion Research guidelines (AAPOR, 2016).
The platform placed 214,823 calls, and the numbers were called on average 3.55 times. The average duration of the IVR survey was only 1.11 minutes (min 0.41, max 18.91), since the majority of eligible respondents (71%) who were from Montserrado County answered only two questions. A total of 12,761 respondents completed the IVR survey, while 1,216 answered only the first two questions out of the three in the survey. Break-offs and refusals, which were similarly defined as not answering the first IVR question, i.e., whether the respondent resided in Montserrado County or not, numbered 10,276. A further 302 phone numbers were always busy and thus classified as of unknown eligibility. The majority of the numbers dialed were classified as ineligible: They were not valid, as they did not connect with an eligible respondent due to an error on the provider's end (107,967), which reflects the automated nature of the RDD calling system; they were temporarily out of service (52,249); they had specific technological issues with connection (31); or they connected, but there was no or invalid selection, such as quick hang-ups (30,021). The proportion of all callers screened for eligibility who were eligible (e) was estimated at 11.30%. Appendix Table 1 describes alternative response, cooperation, refusal, and contact rates, assuming e = 100% or e = 0% (Smith, 2009).
Appendix Table 1 confirms similar rates under alternative assumptions of the proportion of all callers screened for eligibility who were eligible (e): Response rates 3 and 4 were still around 52% (52.62% and 51.97%) and 57% (57.63% and 56.92%), assuming e = 0% or e = 100%, respectively. Estimates were also similar for refusal rate 3 at around 42% (42.37% and 41.85%), and for contact rate 2 at more than 98% (100% and 98.77%). This is because the proportion of unknown household (UH) is small and the proportion of other (O) is null in the sample. In fact, only 302 phone numbers were of unknown eligibility and counted toward the part of the denominator (UH + UO) that was multiplied by e in the AAPOR call rates.
Overall, these estimates suggest that a short IVR is a feasible and successful procedure to sample and screen respondents.In fact, in line with another study using the same method (L'Engle et al., 2018), these results established the feasibility of RDD and IVR surveys in another developing country, such as Liberia, and at the time of an epidemic.
The local NGO, which conducted the CATI survey, was provided with a total of 3,779 phone numbers (Table 2) selected for the initial research project (Maffioli, 2020). Out of the 3,779 phone numbers, the enumerators called back 2,319 respondents in round 1 and 1,460 respondents in round 2. In round 1, 1,957 respondents completed the interview, while in round 2, due to the high marginal costs of interviewing additional respondents, the NGO stopped at 314 individuals interviewed. In fact, since phone number prefixes are associated with different phone companies and each phone company offers different call, text, or data promotions, it is common for Liberians to switch between phone companies and thus frequently change phone numbers. Between 2 and 8 months after the selection and screening process (stage (1)), the NGO found that, of the 1,460 phone numbers provided for round 2, 43% (634) were permanently switched off and 28% (408) were not ringing.
The final sample that was eligible for the initial research project (Maffioli, 2020) and consented to be interviewed through CATI included 2,271 individuals (1,957 from round 1 and 314 from round 2, Table 2) across the entirety of Liberia: These respondents were the ones completing the interview. In addition, 113 respondents refused to be interviewed; 94 phone numbers were dialed and rang, but the respondent did not pick up the call, and they were defined as break-offs; and 15 phone numbers were classified as of unknown eligibility since no screening was completed: These respondents reported that they had already been interviewed. Finally, phone numbers were classified as non-eligible if (i) respondents were younger than 18 years old (50); (ii) the phone numbers never responded because the call never rang on a person's phone (602), in which case it was unknown whether the number was valid, since the call did not connect; or (iii) the phone numbers were temporarily out of service (634). Appendix Table 2 describes alternative response, cooperation, refusal, and contact rates, defining the 602 phone numbers as non-contacts (2.20), since these phone numbers were valid and contact was established during the IVR survey.
Response rate 1 (&2), cooperation rate 1 (&3), refusal rate 1, and contact rate 1 were 91.10%, 91.65%, 8.30%, and 99.40%, respectively (Table 2). The implementation of the CATI survey was more successful in round 1 than in round 2, suggesting that a longer waiting time between stage (1) and stage (2), as in round 2, led to a higher number of non-eligible phone numbers as well as a higher proportion of refusals and break-offs. Call rates under the alternative classification of 602 phone numbers as non-contacts are slightly different: Response rate 1 (&2) is lower at 73.38% and contact rate 1 is lower at 80.06%. Refusal rate 1 is also lower at 6.69%, while cooperation rate 1 (&3) is the same at 91.65%. This is expected since non-contacts (2.20) count toward the denominator of the AAPOR response, refusal, and contact rates but not of the cooperation rate.
Overall, these estimates suggest that a 30-45-minute CATI survey with 100 questions was feasible in a developing country and at the time of an epidemic.They also shed light on how sensitive information on individuals' experience with Ebola or their political views could be asked through CATI without compromising the response rate.

Sample Representativeness.
To understand the representativeness of the survey sample, the analysis is conducted on 2,265 respondents out of the initial 2,271, since for six of them their reported location did not match up with the list of villages provided by the Liberian Institute of Statistics and Geo-Information Services (LISGIS).
The comparison in proportions of respondent and household characteristics between DHS and the survey sample yielded statistically significant differences at the 5% level, with the exception of one variable (whether the household has pigs). Table 3 describes a survey sample biased toward male, educated respondents from urban areas and with access to mobile phones (column 2 versus 1). Survey respondents were also on average wealthier as defined by several measures of asset ownership and improved sources of toilet, wall, and roof material.
The fact that the survey sample is not representative of the national Liberian population is not surprising, since both stages of the data collection were conducted through mobile phones, and individuals needed to have access to a mobile phone at the time of the call: Male, more educated, and wealthier individuals from urban areas are more likely to own mobile phones (Demographic Health Survey Liberia, 2013). Furthermore, stage (1) was set up to limit respondents from Montserrado County, the most economically developed and urban county. In addition, a selection based on the county and district where the respondent resided at the beginning of the epidemic was imposed to define the final sample frame (3,779 phone numbers, Table 2) that the local NGO called back in stage (2). Table 3 confirms that the final sample of 2,265 respondents is very different from a nationally representative survey.
Table 3 column 4 shows the mean estimates from the CATI survey weighted by four selected characteristics (whether the respondent is male, whether she/he has no or primary education, whether she/he owns a mobile phone, and whether she/he lives in a rural area). By construction, the weighted survey sample is identical to DHS 2013 in the four dimensions selected. After weighting, for 10 out of the 18 variables considered, the difference in proportions or means between the weighted survey sample and the DHS sample is reduced (Table 3, columns 3 versus 5). However, even comparing the DHS sample with the weighted survey sample, the proportions or means reported in Table 3 remain statistically significantly different from each other at the 5% level, suggesting that weighting did not entirely solve the bias.17,18 It is important to highlight that the IVR survey was not set up to target a nationally representative sample by imposing quotas of respondents with certain geographical or socio-demographic characteristics. Thus, it should not be surprising that the survey sample of mobile phone owners is not representative of the country's population.

Costs.
Table 4 presents a summary of the costs. The costs for the sampling and screening of respondents through IVR in stage (1) include the fixed initial cost of consulting for the use and maintenance of the platform, piloting costs, and airtime. The cost per picked-up call in stage (1) was only $0.10. The cost per IVR survey was higher: $1.49 when counting both complete and partial surveys, and $1.63 when counting only complete surveys.
Regarding the CATI survey used to gather data in stage (2), the costs depend on the country in which researchers work as well as constraints due to the emergency situation, lack of electricity to charge phones, or lack of money to buy airtime. In this study, the costs included both common data collection costs and additional costs due to the high risk of the epidemic: enumerators' monthly salaries; human resources costs for survey programming, testing, and revisions; other data cleaning costs; internet, fuel for electricity generators, and mobile phone airtime for enumerators and survey respondents (gift of $1 airtime); vehicle maintenance to bring enumerators to the office during the Ebola epidemic; and security and Ebola safety measures. Working with the local NGO partner resulted in a cost of about $13.49 per respondent they tried to reach, and $22.45 per complete survey. Completing a CATI survey 8 months after the IVR survey (round 2) costs about six times more than gathering the data between 2 and 4 months after (round 1) ($162.36 versus $26.05). In summary, the total cost of this novel method per complete survey (including both stage (1), sampling and screening, and stage (2), data gathering) is around $24.
17 It is worth highlighting, however, that although the variables compared in Table 3 were constructed in similar ways, the questions in the DHS and in the survey were asked differently. As a result, the construction of similar but not identical variables such as occupation or improved assets might contribute to explaining some of these differences.
18 The survey in the initial research project was conducted to collect political outcomes. Another potential comparison would be to test the difference between political outcomes in the survey sample and Afrobarometer data (2015). Unfortunately, trust and perceived corruption questions were asked so differently across surveys that it is hard to homogenize the outcomes. Future studies should take this into account, and they should consider asking questions in a format similar to existing nationally representative surveys to be able to weight the sample based on socio-demographic characteristics and compare the outcomes of interest.
This article is protected by copyright. All rights reserved.
Several studies estimated the costs of collecting data through different methods. The more expensive data collection methods are face-to-face surveys, which cost at least $25 but can reach values as high as $150, depending on the complexity of the survey and the distances that have to be covered to find respondents. For example, Lietz et al. (2015) estimated a cost of $25 per survey in Burkina Faso; Mahfoud et al. (2015) $36 per survey in Lebanon; Ballivian et al. (2015) $40 per survey in Peru and Honduras; Hoogeveen et al. (2014) and Dillon (2012) $50-150 and $97 per survey, respectively, in Tanzania; Dabalen et al. (2016) $150 per survey in Malawi.
On the other hand, costs for IVR and CATI surveys have been estimated to be much lower than those of face-to-face surveys in similar settings (Schuster and Brito, 2011, Mahfoud et al., 2015, Garlick et al., 2019, Lau et al., 2019). CATI interviews cost $4.10-7.30 per survey in Tanzania (Hoogeveen et al. 2014 and Dillon 2012), $5.80-8.80 per survey in Malawi (Dabalen et al., 2016), and $4.44-22.20 in Lebanon (Mahfoud et al., 2015). Similarly low costs have been estimated for IVR surveys, such as $17 in Ballivian et al. (2015), $4.95 in L'Engle et al. (2018), and about $2 in Leo et al. (2015), depending on the length of the survey and the criteria applied to select the sample.
Compared to face-to-face data collection, this two-stage mobile phone research method is advantageous because it eliminates many of the implementation costs associated with in-field sampling and screening (in stage (1)) and face-to-face surveys (in stage (2)), such as personnel, logistics, and distribution of phones. Compared to a single IVR or CATI survey, this method might be more expensive. However, both IVR and CATI surveys require a list of phone numbers to start with. If this initial list of respondents is not available and selecting a sample through in-person interviews is too costly or risky in challenging settings or during emergencies, then this method might still be a cost-effective solution. In fact, adding the costs of a baseline face-to-face data collection to gather the initial list of respondents to interview (stage (1)) to the costs of IVR or CATI surveys to gather data (stage (2)), the combined costs would be as high as $160, compared to $24 for the method proposed in this study. It is also important to note that the costs at each stage ($1.63 per IVR survey; $22.45 per CATI survey, Table 4) are similar to or lower than the estimated costs reported in other studies.
This does not indicate that this method is superior to others; rather, it provides evidence that this two-stage mobile phone research method can be affordably implemented in challenging settings where in-person baseline data collection is prohibitively costly or dangerous.
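The cost figures above can be cross-checked with a few lines of arithmetic. The sketch below assumes that the total of around $24 per complete survey is simply the sum of the two per-complete-survey stage costs reported in the text; the constant and function names are illustrative and do not come from the study.

```python
# Per-complete-survey costs reported in the text (Table 4), in US dollars.
IVR_COST_PER_COMPLETE = 1.63    # stage (1): RDD and IVR sampling and screening
CATI_COST_PER_COMPLETE = 22.45  # stage (2): CATI data gathering

def total_cost_per_complete(ivr_cost: float, cati_cost: float) -> float:
    """Add the two stage costs to obtain the cost of one complete survey."""
    return ivr_cost + cati_cost

total = total_cost_per_complete(IVR_COST_PER_COMPLETE, CATI_COST_PER_COMPLETE)

# CATI cost per complete survey by round, as reported in the text.
ROUND1_COST = 26.05   # 2 to 4 months after the IVR survey
ROUND2_COST = 162.36  # 8 months after the IVR survey
round_ratio = ROUND2_COST / ROUND1_COST

print(f"Two-stage total per complete survey: ${total:.2f}")
print(f"Round 2 cost relative to round 1: {round_ratio:.1f}x")
```

Under this assumption the two stages add up to $24.08, which is the "around $24" quoted in the text, and the round 2 cost is roughly six times the round 1 cost.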

Discussion
The two-stage data collection method proposed in this study uniquely samples and screens respondents and gathers data without any in-person interaction with respondents, in a developing country and at the time of an epidemic. The method allowed researchers to conduct more than 2,200 interviews in Liberia during the 2014 Ebola epidemic, with an average estimated cost of $24 per survey and with call rates comparable to those of similar survey research methods.
The main strength of this method is that it combines established data collection methods (RDD, IVR, CATI) in a novel manner and relies solely on mobile phone technology. This precludes the need for prior data or fieldwork activities to build a sampling frame, and it allows researchers to gather survey data in challenging settings. The sampling and screening of respondents through IVR (stage (1)) could be useful in any country with some phone access (Liberia has on average 65% phone coverage, Appendix Table 3), and in the total absence of any initial list of respondents. The data collection through CATI (stage (2)) documents how sensitive information can be asked in a 30-45-minute phone interview without compromising the response rate. Altogether, the survey data highlight how this innovative data collection method was feasible and cost-effective in a developing country and at the time of an epidemic.
There are several meaningful lessons learned, which are important to discuss for the use of this method in future research.
Firstly, the study found that the group of respondents was not representative of the general population of the country. In fact, in Table 3, the survey sample was compared to a nationally representative survey (Demographic Health Survey Liberia, 2013) and it was shown to be biased. Even after re-weighting on a few selected socio-demographic variables, most of the differences remained statistically significant. This is not surprising since the selection and screening were conducted through mobile phones and respondents needed to have access to a mobile phone in order to pick up the call. This is also in line with other research in developed (Lee et al., 2010) and developing countries (Leo et al., 2015; Lau et al., 2019) where respondents interviewed through telephone surveys are different from the entire population, even after controlling for demographic characteristics. This method, then, does not appear to be the best approach if researchers are interested in a nationally representative sample, unless fixed quotas are used in the IVR survey to select respondents based on pre-defined geographical and socio-demographic characteristics, in order to reproduce a nationally representative sample.
Reaching sample representativeness is achievable, but it would come at the expense of higher implementation costs in the IVR message, since several screening questions would need to be added to the survey tool. As the number of questions asked through the IVR survey increases along with sample specificity (for example, half women and half men, or a fixed number of respondents per geographical area), so do the difficulty and costs associated with finding the targeted nationally representative sample.19
The principal remaining advantage of gathering data using this innovative method (but from a non-nationally representative sample) is to collect high-quality data at a time of emergency when in-person interactions are impossible and no publicly available data exist to select a nationally representative sample. Several research questions are and could be answered by focusing on a specific sample of respondents selected with a small set of IVR quotas.
Secondly, future research should consider the attrition that researchers could face between the initial list of phone numbers generated by the RDD and the completion of the CATI survey.
World Bank researchers who performed phone surveys during the Ebola epidemic in Liberia reported that only 30% of the initial sample completed the survey (16% of the original sample, The World Bank Group, 2014). In Sierra Leone, the response rate was also lower than expected, given the nature of the survey and the difficult conditions under which it was conducted: About 69% of the sample respondents with phone numbers completed the survey (45% of the original sample, The World Bank Group, 2015). Other surveys, performed through face-to-face interviews in Montserrado County, reached 95% of the respondents (Blair et al., 2016). Follow-up phone surveys reached about 80% of the original sample. The initial in-person interaction between the field enumerators and respondents during the baseline survey seemed to have been the main factor determining the lower attrition rate. Since this project was conducted starting in late 2015, when the epidemic was not at its peak and life was going back to normal, the response rates were expected to be similar to or higher than those in the World Bank surveys, and they were indeed computed at 51.97% for the IVR survey (Table 2) and 91.10% for the CATI survey (Table 3).
Similarly, participation for confidentiality reasons was a related concern. First, although individuals never provided their phone numbers to enumerators, there was concern that respondents, once called back by the local NGO, would ask enumerators how they had obtained their phone number and refuse to participate. In Liberia, this was not a problem because people are used to receiving calls for advertisements or polls: The refusal rate of the CATI survey was in fact 8.30% across the two rounds of data collection (Table 3).
Second, because of the complete lack of in-person interaction at both stages of the data collection, there was a concern that respondents would not feel at ease providing opinions to a stranger.20 There was also concern that, due to the personal nature of the questions asked in this study (for example, about their experience with Ebola or their political views), respondents would be reluctant or would refuse to stay on the phone for a long time (the survey lasted 30-45 minutes with 100 questions). Instead, only 113 respondents refused to participate (Table 2), and once enumerators established a first phone contact and called at an appropriate time, individuals completed the CATI survey.
Third, respondents were provided with a $1 airtime incentive that was directly transferred upon the completion of the survey.21 Respondents were informed at the time of the sampling, through the IVR survey, about this direct benefit. Recent phone surveys in developing countries found small differences in attrition when incentives were randomly varied (Gallup, 2012, Demombynes et al., 2013, Hoogeveen et al., 2014, Leo et al., 2015); therefore, the incentive may have contributed to a lower non-response rate.
Finally, this method does not solve the problem of people sharing a phone number. Both for stage (1) and stage (2), whoever answered the call was interviewed. It is then possible that whoever picked up the call during stage (1) would not be the same person in stage (2). Even if that was the case, stage (1) only collected the geographical location of the respondent, while the survey data were collected in stage (2). During the CATI surveys, the respondent was asked about her/his location again. If it differed from what was reported during the IVR, the respondent was asked about the discrepancy and asked to confirm the correct location at the beginning of the epidemic. In about 16% of the cases, individuals reported a different location at the two stages. Qualitatively, the majority of respondents said that they had problems with the speed of the IVR message and with entering their answers on the keypad. For the analysis, the location data collected through CATI interviews were trusted more than the data collected through IVR. However, this problem should be taken into account and addressed in future uses of this method.
A last important lesson learned relates to the time elapsed between the sampling and screening through IVR (stage (1)) and the data collection through CATI (stage (2)). For budgetary reasons, a mobile phone data collection (round 2) was added at a later date, and this caused between 2 and 8 months of delay between stage (1) and stage (2). In round 2, the local NGO found that 43% (634) of the numbers provided (1,475) were permanently switched off and 28% (408) were not ringing (Table 3). While Liberians have the habit of owning multiple SIM cards and switching between them based on cost advantages for airtime, text messages, and internet data, it is not clear whether this would happen in other countries. Still, it is advised that researchers limit the time window between the two stages of data collection as much as possible to reduce potential additional problems of not finding (at least temporarily) working phone numbers.22 Despite this caveat, respondents were willing to answer long (30-45 minutes, 100 questions) CATI surveys with low refusal and break-off rates (Table 3). This is a good indication that CATI surveys are feasible in challenging settings such as at the time of an epidemic in a developing country.

Conclusion
This study proposes and describes a novel two-stage data collection method which combines: (1) Random-Digit Dialing (RDD) of phone numbers and an Interactive Voice Response (IVR) survey to conduct sampling and screening; and (2) data collection through Computer-Assisted Telephone Interviewing (CATI). This procedure was used to conduct more than 2,200 interviews in Liberia at the time of the 2014 Ebola epidemic. Following the American Association for Public Opinion Research guidelines (AAPOR, 2016), response, cooperation, refusal, and contact rates were computed at 51.97%, 52.62%, 41.85%, and 98.77% for the IVR survey. The CATI survey was more successful: Response, cooperation, refusal, and contact rates were 91.10%, 91.65%, 8.30%, and 99.40%, respectively.
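The reported rates can be checked for internal consistency. When the response and contact rates share the same denominator and the cooperation rate is the number of complete interviews over the number of contacted cases (as with AAPOR's RR1, COOP1, and CON1), the response rate equals the cooperation rate times the contact rate. The sketch below applies this identity to the figures in the text; which AAPOR variants were actually used is not stated in this section, so the match is indicative only, and the names are illustrative.

```python
# Rates reported in the text, in percent:
# (response, cooperation, refusal, contact).
REPORTED = {
    "IVR": (51.97, 52.62, 41.85, 98.77),
    "CATI": (91.10, 91.65, 8.30, 99.40),
}

def implied_cooperation(response: float, contact: float) -> float:
    """Cooperation rate (in percent) implied by response = cooperation * contact."""
    return 100.0 * response / contact

implied = {}
for mode, (response, cooperation, refusal, contact) in REPORTED.items():
    implied[mode] = implied_cooperation(response, contact)
    print(f"{mode}: reported cooperation {cooperation:.2f}%, "
          f"implied {implied[mode]:.2f}%")
```

For both surveys the implied cooperation rate agrees with the reported one to two decimal places, which suggests the four rates were computed on a common set of call outcomes.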
Unsurprisingly, since both stages of the data collection method were conducted by mobile phone and no quotas were set up to reach a nationally representative sample, male, educated respondents from urban areas and with access to mobile phones were more likely to be interviewed. A comparison between the survey sample and the nationally representative sample from the Demographic and Health Surveys indeed confirmed statistically significant differences in socio-demographic characteristics. The re-weighting of the sample on a few selected covariates reduced the gaps, but the differences remained statistically significant. Yet, in an analysis of the costs compared to other past research approaches, the study found that this method offers promise for data collection in developing countries at a low cost ($24 per survey), especially in challenging settings.
The results on call rates, sample representativeness, and costs are in line with other studies which use RDD, IVR, or CATI to collect data in developing countries. First, the closest studies to mine in terms of the methods used to gather data in other developing countries (L'Engle et al., 2018; Lau et al., 2019) found worse call rates. Second, similar to what I found, some research, both in developed (Lee et al., 2010) and developing countries (Leo et al., 2015; Lau et al., 2019), found that respondents interviewed through CATI were different from the entire population, even after controlling for demographic characteristics. Third, the costs of this innovative method were lower than face-to-face surveys (Dillon, 2012; Hoogeveen et al., 2014; Lietz et al., 2015; Mahfoud et al., 2015; Ballivian et al., 2015; Dabalen et al., 2016) and within the range of studies using IVR or CATI (Schuster and Brito, 2011; Dillon, 2012; Ballivian et al., 2015; Leo et al., 2015; Mahfoud et al., 2015; Dabalen et al., 2016; Garlick et al., 2019; L'Engle et al., 2018; Lau et al., 2019), suggesting that this method is feasible and affordable.
I suggest that researchers weigh the advantages and limitations of this approach before implementing it in any specific country or context. Still, the proposed method remains the first to rely only on mobile phone technology for all stages of data collection. As the use of mobile phones for data collection and research is increasing in developing countries, this innovative two-stage procedure improves the ability of researchers to gather information in emergencies in developing economies, making it a unique and feasible approach to implement in challenging settings.

Main Tables.
Table 1. Call outcomes and rates for IVR - sampling and screening
Survey (DHS) (Demographic Health Survey Liberia, 2013) as a benchmark. Costs were also computed in comparison to similar mobile phone approaches used in past research studies. 2.1. Data Collection.

Table 2. Call outcomes and rates for CATI - data gathering
Notes: This table illustrates call outcomes and rates for Computer-Assisted Telephone Interviewing (CATI) for data gathering, constructed using American Association for Public Opinion Research standards (AAPOR, 2016).

Table 3. Socio-demographic characteristics of national sample from DHS 2013 and survey sample 2016

Table 4. Costs, by stage and survey type

[Table columns: Type; Unit; No. units; Cost/Unit; Total cost. First panel: 1. RDD and IVR - sampling and screening; first row: Fixed cost.]
This table illustrates the costs for stage (1), the sampling and screening process through Random-Digit Dialing (RDD) and Interactive Voice Response (IVR), and stage (2), data gathering through Computer-Assisted Telephone Interviewing (CATI), by round of data collection. * The cost of $3,395 to call back each phone number up to 3 additional times was estimated assuming that 150,000 calls are made, and (1) 15% of them would be picked up a first time; thus, calling back 85% of phone numbers a second time would cost $1,275 (127,500 calls x $0.01 per call); (2) 12.5% of them would be picked up a second time; thus, calling back 87.5% of phone numbers a third time would cost $1,116 (111,562 calls x $0.01 per call); (3) 10% of them would be picked up a third time; thus, calling back 90% of phone numbers a fourth time would cost $1,004 (100,406 calls x $0.01 per call). The assumptions were made by VotoMobile based on their past experience. The total costs sum to $3,395 (1,275 + 1,116 + 1,004 = $3,395).
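The callback cost estimate described in the note can be reproduced step by step. The following sketch uses only the figures stated in the note (150,000 initial calls, pick-up shares of 15%, 12.5%, and 10%, and $0.01 per call); the function and variable names are illustrative and are not taken from VotoMobile.

```python
INITIAL_CALLS = 150_000
COST_PER_CALL = 0.01                # US dollars per callback, from the note
PICKUP_RATES = [0.15, 0.125, 0.10]  # assumed pick-up share at attempts 1, 2, 3

def callback_costs(initial_calls, pickup_rates, cost_per_call):
    """Return the cost of each callback round for numbers not yet picked up."""
    costs = []
    remaining = float(initial_calls)
    for rate in pickup_rates:
        # Numbers still unanswered after this attempt must be called again.
        remaining *= 1.0 - rate
        costs.append(remaining * cost_per_call)
    return costs

costs = callback_costs(INITIAL_CALLS, PICKUP_RATES, COST_PER_CALL)
rounded_costs = [round(cost) for cost in costs]
total_callback_cost = sum(rounded_costs)

print(rounded_costs, total_callback_cost)
```

Rounded to the dollar, the three callback rounds cost about $1,275, $1,116, and $1,004, which add up to the $3,395 stated at the start of the note.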

Table 1. Robustness. Call outcomes and rates for IVR - sampling and screening

Table 2. Robustness. Call outcomes and rates for CATI - data gathering

Table 3. Summary statistics, by county
[Table columns: County; Pop; Pop; Pop density; Rural; Phone coverage; No. resp; No. resp. Unit labels: (%); (pop/sq mile); (%).]
This table illustrates summary statistics by county, comparing available data sources provided by LISGIS (Liberian Institute of Statistics and Geo-Information Services) and the survey sample.