Feasibility of replacing face‐to‐face with telephone interviews for the World Mental Health Qatar survey during the COVID‐19 pandemic

Abstract Objectives We investigated the feasibility of replacing face‐to‐face with telephone interviews conducted as part of the World Mental Health Qatar (WMHQ) survey, and we discuss the main methodological changes across the two pilots that were subsequently implemented in the full‐scale WMHQ telephone survey. Methods We assessed the net mode effect by comparing the lifetime prevalence estimates of the main mental disorder classes (mood and anxiety disorders) and the number of disorders across the two survey pilots, conducted before and during the pandemic. Results The main methodological differences between the pilots stemmed from differences in the survey mode, including questionnaire length, study recruitment method, and fielding team size and structure. These factors influenced response rates and costs. However, the lifetime prevalence estimates and other key indicators of survey results did not differ across modes. Conclusions Our findings confirm the comparability of data collected via telephone and face‐to‐face modes, supporting the adoption of telephone surveys for future mental health studies, particularly in the context of pandemics. They also confirm the feasibility of changing or mixing modes depending on field conditions in future psychiatric epidemiological research.

face-to-face interviewing. As a result, many official surveys suspended their data collection efforts in response to pandemic-related physical distancing guidelines. For example, a survey conducted by the United Nations Statistics Division and the World Bank reported that in May 2020, 96% of national statistical offices had either partially or completely halted face-to-face data collection activities (Inter-Secretariat Working Group on Household Surveys, 2020).
WMH surveys were no exception to this trend. The COVID-19 outbreak triggered many changes, including the transition to remote surveying. National statistical offices resorted to remote data collection tools as the primary means of maintaining continuity in survey production (Gourlay et al., 2021). Since most psychiatric epidemiology surveys, including the WMH surveys, relied on face-to-face interviews, the change to remote modes of data collection was a significant one. In addition to flexibility, face-to-face interviewing is considered the best survey mode in terms of data quality and response rates (Hox & De Leeuw, 1994b; Lavrakas, 2008; Schröder, 2016; Smith, 1984). Furthermore, this mode of data collection enables interviewers to observe both verbal and nonverbal response cues from respondents and to take direct measurements if needed. It also allows for longer surveys. Due to these advantages, face-to-face interviewing is considered "the gold standard" of survey research practice.
Telephone interviews are considered the most effective alternative to face-to-face interviews for ensuring continuity of survey research activities if a probability sample is required (Groves, 2009).
In the context of the COVID-19 pandemic, it was hoped that telephone surveys could be applied to carry out WMH surveys, much like telepsychiatry effectively served patients with pre-existing mental health disorders during the pandemic (Li et al., 2021). Nevertheless, because all WMH surveys used the same standardized procedures for sampling, interviewing, and data analysis (Kessler et al., 2009), concerns emerged regarding whether a shift in survey mode would affect the comparability of results between WMH surveys conducted before and during the pandemic.
To compound these concerns, several published studies have suggested that telephone interviews were ineffective in gathering accurate survey data on sensitive topics (Gross et al., 2018; Gupta & Pathak, 2018; Montemurro & Riehman-Murphy, 2019; Taylor et al., 2018). Other reported issues included an increase in acquiescence and extremeness in telephone interview responses compared with face-to-face surveys (Groves & Kahn, 1979; Jordan et al., 1980). On the contrary, some studies found no differences in compliance, reliability of responses, or outcomes between these two main survey modes (Kennedy et al., 2016, 2017; Marel et al., 2015; Taylor et al., 2016). Historically, during the 1960s and 1970s, multiple studies addressed concerns about the methodological limitations of telephone surveys (Groves & Kahn, 1979; Hochstim, 1967; Sudman & Bradburn, 1974). Over the subsequent years, telephone surveys became more prevalent in psychiatric epidemiology (Stefl, 1984). Finally, there is also some evidence supporting phone surveys as more effective than either mail-in or face-to-face modes in community psychiatric surveys (Conwell et al., 2018; Fenig et al., 1993; Hinkle & King, 1978).
Researchers in Qatar, in consultation with experts from the WMH consortium, recognized the need to change the survey mode for their WMH study in response to the COVID-19 pandemic. Since the change happened suddenly, it was not possible to conduct experimental comparisons to assess the reliability and validity of the two survey modes. Therefore, the present study aimed to investigate the feasibility of replacing traditional face-to-face interviews with phone interviews for Qatar's national mental health survey as part of the WMH survey consortium. This involved redesigning and comparing results between the two pilots before subsequently implementing the revised methodology in the full-scale production of the WMH Qatar (WMHQ) survey. In this paper, we explore the feasibility of adapting data collection methods under conditions of necessity in the context of WMH surveys.

| The World Mental Health Qatar (WMHQ)
Face-to-face interviews were used for the first pilot survey, which was conducted in early 2020, before the onset of the pandemic. The methodological procedures used in the initial face-to-face pilot survey are fully described in a separate published article (Khaled et al., 2021). However, due to the COVID-19 pandemic in Qatar, telephone interviewing was chosen instead of face-to-face interviewing as the main mode of survey data collection. This revised methodology was tested in a second pilot survey conducted later that same year.
The telephone methodology employed in the second pilot survey served as the basis for the procedures used in the full WMHQ survey.
A comprehensive description of these procedures can be found in a separate article published in this issue (Khaled et al., 2024).
Table 1 summarizes the main methodological similarities and differences between the two pilot surveys of the WMHQ study. As shown in Table 1, the two pilot surveys were similar in many design-related respects, including study target population, questionnaire, and quality control system. The main differences in methodological aspects of both pilots stemmed from differences in the survey mode.
These differences in turn influenced other aspects of the study, including questionnaire length, study recruitment method, and the size and structure of the fielding team (Table 1). The survey's sample design, including the sampling strategy, was also influenced by the mode, as the information used to improve sampling efficiency differs between modes. As mentioned earlier, the sample design is described in detail elsewhere (Khaled et al., 2024).

| Survey questionnaire
We adapted the WMHQ survey instrument to phone mode through three main modifications. First, we revised the survey introduction. In phone surveys, the initial interaction with the interviewer typically starts with a skilled solicitation, but then shifts to a neutral and professional tone. As a result, our team modified the study questionnaire's introductory section to increase survey salience and align with the requirements of phone interviews. Second, we aimed to reduce the overall interview duration by an average of approximately 30 min relative to the previously piloted face-to-face survey instrument. Third, we incorporated COVID-19-related content to address the psychological impact of the crisis on respondents' mental health and capture any other pandemic-associated symptoms.

T A B L E 1 Main methodological similarities and differences in the face-to-face (pre-pandemic) and phone (post-pandemic) pilots conducted as part of the World Mental Health Qatar Survey. (Column headings: Methodological variables; Face-to-face pilot design; Telephone pilot design.)
The final questionnaire was reduced from 25 modules (face-to-face pilot) to 18 (phone pilot). This reduction involved the removal of nine (CIDI and non-CIDI) modules, including suicide, persistent depression, anger attacks, social anxiety, and tobacco and drug use. In their place, we added two new modules: one assessing the psychological toll of COVID-19 and one covering diagnostic criteria for obsessive-compulsive disorder.

| Recruitment methods
Compared with the face-to-face pilot, technology played a larger role in the recruitment process for the phone pilot. We initiated the process by sending a Short Message Service (SMS) text to each eligible respondent 24 h prior to the first interview phone call. We chose to use an SMS because the length and sensitivity of the questions made it essential to increase the salience of the study by leveraging the prominence of the survey's sponsors, which we believed would reassure respondents prior to our call.
The SMS served a dual purpose: it not only informed potential respondents of the upcoming study participation call, but also provided a link to the study website, which helped explain the purpose of the survey. However, some eligible participants expressed security concerns regarding the link. Many hesitated to click it until they had been contacted by an interviewer and received reassurance about the link's authenticity. Individual SMS requests were also sent daily to any respondents who wanted to participate but were unable to find the original text message or had concerns about its source.

| Fielding team
Due to the pandemic, the phone survey lab at Qatar University's Social and Economic Survey Research Institute (SESRI) had to be temporarily closed. Consequently, SESRI initiated and tested a direct dialing phase for a distributed (remote) Computer Assisted Telephone Interviewing (CATI) system during the summer of 2020. This was done in preparation for the phone pilot scheduled for the fall of 2020.
During the phone pilot, SESRI interviewers were able to call respondents safely from their homes rather than from the phone lab, and field supervisors closely monitored them using this remote capability. This approach allowed the study team to collect data over the phone while adhering to pandemic-related social distancing policies.
As indicated in Table 1, the fielding team was smaller in the phone pilot than in the face-to-face pilot. However, the interviewer-to-supervisor ratio was higher for the phone pilot. This structural adjustment was necessary to facilitate more extensive remote verification activities, including ensuring that interviewers adhered to their working schedules, met the required minimum number of working hours, and were subject to live monitoring of calls by supervisors.
The phone pilot involved an unusually long phone survey on a sensitive topic. Accordingly, the team placed increased emphasis on monitoring how questions were asked and paced, and on identifying any irregularities in the collected data, as crucial aspects of quality control. Therefore, we increased the proportion of live call monitoring sessions by involving research team members to assist the supervisors in this activity.

| Response rate & field cost
The response rates (RR) were calculated using standardized coding and interpretation procedures for different calling outcomes, following the guidelines set by the American Association for Public Opinion Research (AAPOR, 2015). Completed responses included those who finished the whole survey questionnaire (reaching the last question in the survey). Those who did not complete the survey interview were divided into three categories: eligible, ineligible, and cases of unknown eligibility. Eligible respondents ("eligibles") included Arab residents who either refused to participate in the study, agreed to an appointment but did not fulfill it upon follow-up, or completed part of the interview. Ineligible respondents ("ineligibles") included mostly non-Arabs and those under 18 years of age. Unknown eligibility cases ("unknowns") encompassed housing units with no one at home (in the face-to-face survey) or phone numbers with no answer (in the phone survey); those who immediately refused to participate before interviewers were able to determine their eligibility were also included in this category.

We report two response rates in Table 3. First, the raw response rate, which is the ratio of completions to the total sample size after excluding ineligibles: RR1 = C / (C + E + UE), where C is the number of completions, E is the number of eligible non-completions, and UE is the number of cases of unknown eligibility. Second, the adjusted response rate, RR2 = C / (C + E + e × UE), where e is the estimated proportion of eligible cases among those of known eligibility, given by e = (C + E) / (C + E + IE), where IE is the number of ineligible cases.
The break-off rate was calculated by dividing the number of break-offs by the sum of break-offs and completions. The break-off group includes people who agreed to participate in the survey and answered some questions but did not complete the entire survey interview.
The cost per completion was calculated by dividing the total survey fielding cost by the number of completions. The field costs cover only payments made to supervisors and interviewers for their working hours during training and fielding. The cost per completion does not include any costs associated with questionnaire development, sampling, training, programming, or administrative activities.
We focus on the field cost to compare the two pilots because, as the study transitions from the pilot to full-scale production, only the field cost will grow rapidly while other costs remain relatively stable.
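The outcome metrics above can be sketched in a few lines of code. This is an illustrative sketch, not the study's actual code: the function names and the disposition counts in the example are hypothetical, and the formulas follow the AAPOR-style definitions described in the text.

```python
# Sketch of the outcome-rate calculations described above.
# C, E, UE, IE follow the text: completions, eligible non-completions,
# unknown-eligibility cases, and ineligible cases, respectively.

def raw_response_rate(c, e, ue):
    """RR1: completions over all cases that are not known ineligibles."""
    return c / (c + e + ue)

def adjusted_response_rate(c, e, ue, ie):
    """RR2: weights unknown-eligibility cases by the estimated
    eligibility proportion among cases of known eligibility."""
    e_hat = (c + e) / (c + e + ie)
    return c / (c + e + e_hat * ue)

def break_off_rate(break_offs, completions):
    """Break-offs over the sum of break-offs and completions."""
    return break_offs / (break_offs + completions)

def cost_per_completion(total_field_cost, completions):
    """Total fielding cost divided by the number of completions."""
    return total_field_cost / completions

# Illustrative counts (not the study's actual numbers):
c, e, ue, ie = 200, 300, 500, 100
print(round(raw_response_rate(c, e, ue), 3))            # 0.2
print(round(adjusted_response_rate(c, e, ue, ie), 3))   # 0.218
print(round(break_off_rate(190, 200), 3))               # 0.487
print(round(cost_per_completion(15020.0, 200), 2))      # 75.1
```

Note that RR2 is always at least as large as RR1, because discounting unknown-eligibility cases by e shrinks the denominator.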

| Physical and mental health problems
During the study, two different sources were used to identify the history of physical and mental health disorders. The first source comprised direct responses from the respondents about whether they had ever been diagnosed by a health professional with major depression, panic attacks, post-traumatic stress disorder, obsessive-compulsive disorder, generalized anxiety disorder, mania, bipolar disorder, schizophrenia, or any other emotional problems. To ascertain the history of any chronic physical condition, respondents were asked whether they had any life-threatening or seriously impairing chronic physical health problems, such as cancer, heart disease, or lung disease. Respondents could then choose from a list of physical diseases the type of physical illness they had at the time of the interview (if any).
The second source was responses to the diagnostic modules within the interview, which were used to calculate the prevalence of experiencing any mental disorder by the time of the interview. These were based on DSM-5 criteria, as outlined in the CIDI (version 3.3).
Based on the modules assessed in both pilots, we defined three main groups of mental disorders.Any anxiety disorder included meeting diagnostic criteria for any of the following conditions: generalized anxiety disorder, panic disorder, and post-traumatic stress disorder.
Any mood disorder included meeting diagnostic criteria for either of the following conditions: major depressive disorder or bipolar I-II disorder.
Any disorder was defined as meeting diagnostic criteria for any of the abovementioned anxiety and/or mood disorders.
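The three disorder groupings above amount to a simple classification rule over the module-level diagnoses. The sketch below illustrates that rule; the diagnosis labels and function name are hypothetical and do not reflect the study's actual variable names.

```python
# Illustrative grouping of module-level diagnoses into the three
# disorder classes defined above (labels are hypothetical).
ANXIETY = {"generalized_anxiety", "panic_disorder", "ptsd"}
MOOD = {"major_depressive_disorder", "bipolar_I_II"}

def classify(diagnoses):
    """Map a respondent's set of diagnoses to the three class flags."""
    dx = set(diagnoses)
    any_anxiety = bool(dx & ANXIETY)
    any_mood = bool(dx & MOOD)
    return {
        "any_anxiety": any_anxiety,
        "any_mood": any_mood,
        "any_disorder": any_anxiety or any_mood,
    }

print(classify({"panic_disorder"}))
# {'any_anxiety': True, 'any_mood': False, 'any_disorder': True}
```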
As we aimed to examine survey mode effects, we needed to account for the effect of COVID-19 on the prevalence estimates in the phone pilot survey conducted during the pandemic. For this purpose, we identified and excluded 70 cases (16%) who reported the onset of mood and anxiety disorders only during the pandemic period. Since all the phone interviews were conducted during the COVID-19 pandemic period (2019-2022), we estimated the lifetime prevalence of the assessed disorders while excluding cases whose age of onset for any disorder fell only within the pandemic period. Therefore, only cases that met CIDI criteria for any disorder up to 2 years preceding the interview date were included and counted toward the lifetime prevalence rate.
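The exclusion rule above can be expressed as a simple filter. This is a minimal sketch under stated assumptions: the record fields (`meets_cidi_criteria`, `onset_ages`, `age_at_interview`) and the two-year window constant are hypothetical names for illustration, not the study's actual variables.

```python
# Hedged sketch of the pandemic-onset exclusion rule described above.
PANDEMIC_WINDOW_YEARS = 2  # onset within 2 years of the interview

def counts_toward_lifetime_prevalence(record):
    """A case counts only if some disorder onset predates the pandemic
    window, i.e. occurred more than PANDEMIC_WINDOW_YEARS before the
    interview; cases with only pandemic-era onsets are excluded."""
    if not record["meets_cidi_criteria"]:
        return False
    earliest_onset = min(record["onset_ages"])
    return earliest_onset <= record["age_at_interview"] - PANDEMIC_WINDOW_YEARS

sample = [
    # onset 1 year before interview -> pandemic-era only -> excluded
    {"meets_cidi_criteria": True, "onset_ages": [30], "age_at_interview": 31},
    # earliest onset 11 years before interview -> counts
    {"meets_cidi_criteria": True, "onset_ages": [20, 31], "age_at_interview": 31},
    # no CIDI diagnosis -> not a case
    {"meets_cidi_criteria": False, "onset_ages": [], "age_at_interview": 40},
]
cases = [r for r in sample if counts_toward_lifetime_prevalence(r)]
print(len(cases))  # 1: only the respondent with a pre-pandemic onset counts
```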

| Sociodemographic variables
In both pilots, we assessed the same basic sociodemographic variables, including age, gender, marital status, education, employment, income, and nationality. Qatar's income categories were constructed in reference to Qatar's census income data. For example, more than half of the income categories fall below the median personal earnings of 20,000 Qatari Riyals (QAR), equivalent to 5400 US dollars (USD). Similarly, other variables were adapted and modified for Qatar's context. For example, employment questions were adapted to reflect job categories and working hours in accordance with Qatar's employment system. Additionally, response options for the marital status question were slightly modified to reflect sanctioned cultural and religious aspects of marriage within the context of Qatar.

| Statistical analysis
We report descriptive statistics, including frequencies, percentages, and corresponding 95% confidence intervals (CI). All estimates were weighted to account for the sampling design in each pilot. We compared proportions using p-values based on the F-transformed version of the Pearson chi-square statistic, with a significance level of 0.05 for a two-tailed test. All statistical analyses were performed using Stata software version 16 (Stata, 2016).

As shown in Table 3, the raw response rates for the face-to-face and phone pilots were 32.4% and 18.8%, respectively. While the feasibility of conducting a lengthy phone interview was established, the incentive tested in the phone pilot was deemed ineffective in increasing the participation rate and thus was not used in the actual survey production.

| RESULTS
The adjusted response rate was almost two times higher for the face-to-face pilot than for the phone pilot (47.1% vs. 24.6%; Table 3). The difference in response rates between the two modes can largely be explained by the difference in break-off rates: in the face-to-face pilot this rate was only 4.6%, whereas in the phone pilot it was much higher, at 49.2%. By our calculations, had the break-off rate in the phone pilot been the same as in the face-to-face pilot, the response rates would have been similar between the two pilots.
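The claim that equalizing break-off rates would roughly equalize response rates can be checked with a back-of-envelope calculation. This is an illustrative approximation, not the authors' computation: it assumes the number of interview starts is held fixed and that every avoided break-off would have become a completion.

```python
# Adjusted response rates and break-off rates reported in Table 3.
phone_rr = 0.246        # phone pilot adjusted response rate
phone_breakoff = 0.492  # phone pilot break-off rate
f2f_breakoff = 0.046    # face-to-face pilot break-off rate

# Completions = starts * (1 - break-off rate), so holding starts fixed,
# the response rate rescales by the ratio of completion shares.
counterfactual_rr = phone_rr * (1 - f2f_breakoff) / (1 - phone_breakoff)
print(round(counterfactual_rr, 3))  # 0.462, close to the face-to-face 0.471
```

Under this rough adjustment, the phone pilot's counterfactual response rate (about 46%) approaches the face-to-face pilot's 47.1%, consistent with the interpretation above.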
We compared the fielding costs between the two pilots. The main cost indicator in Table 3 is the field cost per completion in the last row. The face-to-face pilot's cost was more than double that of the phone pilot, at 164.9 USD versus 75.1 USD, respectively. This suggests that the field cost during full-scale production would be much larger for the face-to-face survey. Such a substantial cost difference could render a face-to-face survey financially infeasible, while a phone survey may remain a viable option.
The duration of fielding was 24 days for the phone pilot and 15 days for the face-to-face pilot (Table 3). The average duration of the phone interview was approximately 77 min, compared with 97 min for the face-to-face interview. The maximum number of contact attempts was three in the face-to-face pilot compared with seven in the phone pilot (Table 3).
As shown in Table 4, the prevalence of any lifetime mental disorder, specifically mood or anxiety disorders, reported by the participant as diagnosed by a health professional was 11.3% in the face-to-face pilot compared to 12.0% in the phone pilot (p = 0.793).
The percentage of respondents who reported having only one mental disorder diagnosed by a health professional was 6.6% in both the face-to-face and phone pilots. Meanwhile, the percentages for two or more reported disorders were 4.7% in the face-to-face pilot compared with 5.4% in the phone pilot (p = 0.921). Table 4 also shows that the prevalence of any chronic physical condition, reported by the participant as diagnosed by a health professional, was 7.8% in the face-to-face pilot compared with 10.4% in the phone pilot (p = 0.238). In the face-to-face pilot, 5.3% of respondents reported having only one chronic physical condition diagnosed by a health professional, compared with 7.6% in the phone pilot. For those reporting two or more conditions, the percentages were 2.3% in the face-to-face pilot compared with 2.8% in the phone pilot (p = 0.436).
As shown in Table 5, the prevalence of any mood or anxiety disorder as defined by the CIDI was also similar in the face-to-face (19.3%) and phone (22.7%) pilots (p = 0.305). Also shown in Table 5, the number of disorders per CIDI criteria was similar across the two modes, with 4.9% of the face-to-face sample meeting criteria for two or more disorders compared with 5.5% in the phone pilot (p = 0.579). Furthermore, similar results were obtained when stratifying by gender, as shown in Appendix Table S1 (males) and Appendix Table S2 (females).

| DISCUSSION
This study investigated the feasibility of substituting telephone interviews for face-to-face interviews in the WMHQ survey during the COVID-19 pandemic. We assessed the net effect of survey mode by addressing the practical question of whether the resulting prevalence estimates of the two main classes of mental disorders in the WMHQ (mood and anxiety disorders) are similar or different across the two pilots. This assessment was made irrespective of pandemic-related influences on the prevalence of mental disorders or any specific methodological reasons behind these differences. We also compared response rates and fielding costs across modes.

T A B L E 4 Survey mode comparisons for any lifetime and number of mood, anxiety, or physical disorders as diagnosed by a health professional. (Column headings: Survey mode — face-to-face, phone; p-value.)
The main source of variance in the methodological aspects of both pilots stemmed from differences in the survey modes of the two pilots, including factors such as questionnaire length, study recruitment method, and fielding team size and structure.These aspects affected the survey response rate and field costs of both pilot studies.
While the face-to-face pilot generated a response rate two to three times higher than the phone pilot, the total fielding (variable) costs in the face-to-face mode were just over two times those of the phone mode. Lower response rates in telephone surveys compared with face-to-face surveys are consistent with previous studies (Groves & Kahn, 1979; Hox & De Leeuw, 1994a). The difference in response rate is largely attributed to the more personal nature of face-to-face interviews relative to phone (Drolet & Morris, 2000). Notably, the response rates between the two modes in our study were initially similar, but then diverged after accounting for the much higher break-off rate for the phone compared with face-to-face. In the face-to-face pilot, social norms likely made it less acceptable for participants to discontinue the interview after they had invited interviewers into their homes.
Conversely, in the phone survey, participants found it much easier to terminate the interview at any point.Given the survey's lengthy and sensitive questionnaire, break-offs were more likely, especially for the phone pilot.
The higher risk of break-offs in telephone surveys relative to in-person interviewing is well documented in survey research. It is easier to end a phone interview by simply hanging up, and talking on the telephone for extended periods can be especially tiring for some respondents (Holbrook et al., 2003). For the WMHQ survey, the median interview length was 80 min for the phone pilot and 90 min for the face-to-face pilot. The eligibility of potential respondents is also less frequently known in a phone survey than in a face-to-face survey. A non-contact in a phone survey generally provides little or no information about eligibility, whereas in a face-to-face survey the eligibility of the household can often be determined by interviewers through observation of the characteristics of the property. This was in fact the case for our study; the percentage of unknown eligibility in our phone sample was much higher than in the face-to-face sample.
Our results are consistent with the generally recognized higher costs of conducting face-to-face interviews relative to phone.
Furthermore, the ratio of costs is on par with the roughly 2-to-1 ratio reported in the literature (van Campen, 1998; Warner et al., 1983; Weeks et al., 1983). To our knowledge, this study is the first published to date that compares fielding costs per completion by mode within the WMH survey consortium. Arguably, the cost savings of conducting a lengthy survey over the phone can be reinvested into enhancing interviewer supervision, study visibility, and respondent assistance.
This reinvestment approach was employed in our phone study, where we allocated more resources to quality control monitoring, including a higher proportion of live interview monitoring and the use of technologies to capture and assess quality indicators from paradata and survey data in real time. These indicators, in turn, highlighted which interviewers needed more attention to correct undesirable and potentially bias-inducing behavior. The advertisement budget for the study was also robust, as was the investment in handling respondent questions and concerns. It is worth noting, however, that costs for the telephone-based survey increased further over time during the production phase, as detailed in another manuscript in this journal issue (Khaled et al., 2024).
Regarding differences in demographic variable distributions across the two modes, the two samples were similar on most basic sociodemographic characteristics, except for gender. We found a statistically significant gender difference, with males constituting a higher percentage of the phone respondents than of the face-to-face sample (60% vs. 51%). This finding is consistent with previous literature showing that males are somewhat more likely to participate in phone surveys than in face-to-face surveys (Aneshensel et al., 1982; Ellis & Krosnick, 1999; Groves & Kahn, 1979; Weeks et al., 1983).
In terms of the Middle East and Qatar in particular, men are generally less inclined to participate in research and probably even less likely to participate in mental health research because of negative cultural attitudes and stigma against mental illness (Zolezzi et al., 2017).Therefore, participation gains among members of this group of the target population are advantageous.The increased privacy offered by the phone may lead to higher participation among males than in face-to-face surveys and perhaps even more accurate reporting of less socially desirable attributes related to the symptoms and burden of mental illness.However, there is no way of ascertaining the latter possibility from our study.
Finally, in addressing the crucial question of whether mode effects would influence the main survey estimates of interest for the study, our findings are largely reassuring. Both modes resulted in similar lifetime prevalence estimates of mental illness for the two main classes of disorders assessed in the WMHQ. Surveys conducted within the WMH consortium are known for their high quality and rigor in terms of estimates of mental illness prevalence and their associations with risk factors for mental illness (Kessler et al., 2009). However, to date, none of these surveys has used the telephone as the primary data collection mode. It is standard practice in the WMH surveys that, while the majority of the interview is completed face-to-face, long interviews requiring multiple visits to a household may be completed using a telephone follow-up. Our results also logically support the validity of using the phone in this secondary role.
Importantly, in our study, the distribution of respondents across the number of disorders, which relates to severity of symptoms and burden of illness, was the same across both modes. This finding reassures us of the similar overall quality of the responses and supports the viability of a phone survey as a credible alternative to face-to-face interviewing, especially during a pandemic like COVID-19. Born of necessity, two large probability-based pilots were conducted prior to and during the COVID-19 pandemic, using different modes of data collection. These allowed us to compare and contrast the methodological aspects of each mode. To our knowledge, this study is the first to provide evidence supporting the feasibility of telephone interviewing as a substitute for face-to-face interviews within the WMH survey initiative. The study's findings confirm that telephone interviews can yield criterion-based mental disorder prevalence estimates similar to those from face-to-face interviews. There are some caveats and limitations, however.
First, this survey targeted an Arabic-speaking population in a region with relatively high telephone response rates compared to many Western countries.Second, it was conducted under conditions permitting a high-coverage, relatively efficient cellular phone sample.
Third, the savings from not conducting a more costly face-to-face survey were in part reallocated to robust quality monitoring, advertising, and respondent outreach. This reallocation was necessary to offset the difficulties of gaining and sustaining cooperation for a long interview on a sensitive topic. In addition to these costs, pandemic conditions required a distributed network CATI system so that interviewers and their supervisors could work remotely. Such a system tends to be more costly to administer than a centralized calling lab.
A theoretical limitation is that there was no experimental assignment of mode. The two samples were drawn using different methods, and even if this issue could have been overcome through another sampling method, such as address-based sampling (ABS), the comparison arose from the sudden, unexpected onset of a pandemic, which precluded a fully experimental design. With those caveats, this study overall lends support to the feasibility of adopting a phone strategy for future mental health surveys where a probability sample is desired, particularly in the context of future pandemics. It also demonstrates the potential for changing the way data are collected under conditions of necessity, even for very long and sensitive studies like those typically administered within the WMH consortium.
Even after shortening the WMH questionnaire for phone mode, it remained quite long, taking 50-60 min to complete, in contrast to the typical 20-25 min duration of a standard phone survey. Consequently, the telephone pilot included an experiment that employed a gift-based incentive to boost the participation rate. The incentive offered was an electronic gift card for food outlets (one card per eligible participant) valued at 14 US dollars (USD), delivered via SMS text, or the option to donate the equivalent value to a Qatar-based charity of the participant's choice.
T A B L E 2 Sociodemographic characteristics of Qatar's national mental health survey pilots.
T A B L E 3 Survey measures descriptive statistics by survey mode.