Is Client Reporting on Contraceptive Use Always Accurate? Measuring Consistency and Change with a Multicountry Study

The consistency of self-reported contraceptive use over short periods of time is important for understanding measurement reliability. We assess the consistency of and change in contraceptive use using longitudinal data from 9,390 urban female clients interviewed in DR Congo, India, Kenya, Niger, Nigeria, and Burkina Faso. Clients were interviewed in person at a health facility and four to six months later by phone. We compared reports of contraceptive use at baseline with recall of baseline contraceptive use at follow-up. Agreement between these measures ranged from 59.07 percent in DR Congo to 84.37 percent in India. Change in both contraceptive method type (sterilization, long-acting, short-acting, nonuse) and use status (user, nonuser, discontinuer, adopter, switcher) was assessed by comparing baseline to follow-up reports and retrospective versus current reports within the follow-up survey. More change in use was observed with panel reporting than within the cross section. The percent agreement between the two scenarios of change ranged from 64.8 percent in DR Congo to 84.5 percent in India, with cross-site variation. Consistently reported change in use status was highest for nonusers, followed by users, discontinuers, adopters, and switchers. Inconsistency in self-reported contraceptive use, even over four to six months, was nontrivial, indicating that studying measurement reliability of contraceptive use remains important.


INTRODUCTION
Self-reports of sensitive or personal behaviors cannot be avoided when studying sexual activity, contraceptive use or abortion practice, or illicit behaviors. Measures of these outcomes are necessary for understanding their risk factors and determinants. In the family planning field, reliance on national household surveys of women of childbearing age has provided a wealth of insights for decades but requires measurement of reproductive behaviors that may not be accurately reported.
In the early years of developing survey measurements of reproductive and contraceptive behaviors, more effort was devoted to establishing reliability and validity than in recent years (Bignami-Van Assche 2003). For example, Anderson and Cleland (1984) compared measurement of current contraceptive use between the World Fertility Surveys and the Contraceptive Prevalence Surveys, noting that ambiguity with how "current" is interpreted, excluding unmarried women as respondents on contraceptive use, and harmonizing the denominator of exposed women were some reasons behind disparate estimates. Pebley, Goldman, and Choe (1986) compared data from Korean surveys in the 1970s to assess consistency of reported contraceptive use to conclude that adequate interviewer training, having contraceptive method awareness questions precede questions on use, and asking about use within defined intervals all improved reporting reliability. An experimental contraceptive calendar introduced in the 1986 Peru Demographic and Health Survey (DHS) enabled assessment of data quality (Goldman, Moreno, and Westoff 1989) and was subsequently included in the core DHS female questionnaire. Two more recent studies of reporting reliability added further insights, both comparing baseline responses about contraceptive use at a particular point in time with retrospective reports using a calendar. The first is based on a panel of Moroccan DHS respondents followed up after three years (Strickler et al. 1997) and the second on a panel of rural Bangladeshi women also reinterviewed after three years (Callahan and Becker 2012). Both studies found substantial discordance in current use reports at the individual level, while aggregated use measured at the population level was more similar. 
In recent years, there have been fewer assessments of self-reports of contraceptive use, but with annual measurement of contraceptive use in selected countries through the Performance Monitoring for Action (PMA) surveys (Zimmerman et al. 2017), there is renewed opportunity to assess the reliability of self-reported contraceptive use.
Measurement errors during the data collection process can affect the validity and reliability of the key indicators and will be important in program monitoring of contraceptive behavior and validation of routine health information data (Nock, Zeller, and Carmines 1982). Response rates and reporting errors are affected by individual respondent and interviewer characteristics and the content and nature of the questions themselves. Sensitive, personal questions may be subject to larger biases in response and as a result be less reliable (Knodel and Piampiti 1977). Large surveys employ probability samples to reduce measurement errors by design, while also addressing reliability, stability, and internal consistency concerns by using standardized, well-tested questionnaire wordings (Kimberlin and Winterstein 2008). Another factor that may affect survey outcomes is the means of survey administration. Direct face-to-face surveys may offer the benefits of accuracy of screening responses but can also introduce interviewer effects and social desirability bias in responses to sensitive questions.
Because self-reported contraceptive use is a central measure in family planning research, it is important to continually assess the accuracy and consistency of self-reports. Survey-based instruments, such as recall using calendars, enhance the importance of accurate measurement. Expanded use of panel surveys to study change in contraceptive behaviors, such as Karp et al. (2021), further reinforces the need to assess the quality of personal reports. Gaps in knowledge remain, for example, in the degree of consistency over time, especially at short intervals and where methods are used in combination serially or simultaneously. At the population level, trends in contraceptive use status can mask change at the individual level. As fertility levels decline, the time spent being sexually active before, between, and after births may be more or less protected by contraception and other family planning means to avoid unintended births. This heightens the significance of understanding the dynamics of contraceptive use over different intervals of time and of establishing whether survey respondents who are interviewed multiple times recall their past behaviors accurately and consistently.
This study addresses the following two research questions using longitudinal data from six large samples of female clients in urban Democratic Republic of Congo (DR Congo), India, Kenya, Niger, Nigeria, and Burkina Faso, who consented to being reinterviewed by phone after an in-person exit interview at the attended health facility: (1) How consistently does a client report her contraceptive use status and method type at baseline interview compared to her retrospective report of that use at follow-up four to six months later? (2) How much change in contraceptive use status and method type is observed based on comparing female clients' reports (a) across surveys (baseline and follow-up) and (b) within the follow-up survey only (cross section)? Related to this, how similar or consistent are the two distributions?

Data
Data for this research come from panel surveys of female clients between the ages of 18 and 49 who attended an urban health facility. The surveys were conducted by in-country research institutions collaborating with the Performance Monitoring and Accountability (PMA) Agile project (www.pmadata.org/technical-areas/pma-agile). PMA Agile was a continuous data monitoring and evaluation system that collected quarterly estimates of family planning and reproductive health care readiness measures from a probability sample of up to 200 public and private health facilities in each survey site. Semiannual surveys were conducted with a systematic sample of female and male clients (ten per facility) to assess their family planning consumption behaviors. Data were collected in urban sites using resident enumerators; the surveys were conducted at low cost with rapid turnaround (Tsui et al. 2020). We use data from client exit interview (CEI) surveys and CEI follow-up surveys conducted by phone approximately four to six months later. Exit interviews were conducted with female clients systematically selected from a probability sample of public and private health facilities in the sites. Upon completion of the exit interview, participants were consented for a follow-up phone survey and asked to provide up to two telephone numbers where they could be reached. A mobile airtime card or recharge with a value of about 1 USD was provided to each respondent completing the baseline interview.
In five countries, the baseline CEI survey was conducted in 2018; in Niger it took place in 2019. The total sample of female clients with completed baseline interviews was 13,316: 1,226 in DR Congo, 1,596 in India, 4,431 in Kenya, 936 in Niger, 3,615 in Nigeria, and 1,512 in Burkina Faso. Consent for follow-up by telephone was obtained from 11,978 clients (90 percent), of whom 11,150 (93 percent) had telephone access. Of these, 9,390 (84 percent) completed follow-up interviews: 751 of 876 in DR Congo (86 percent), 659 of 1,002 in India (66 percent), 3,941 of 4,274 in Kenya (92 percent), 515 of 667 in Niger (77 percent), 2,326 of 2,947 in Nigeria (79 percent), and 1,198 of 1,384 in Burkina Faso (87 percent). Overall, 71 percent of participants from the baseline sample were successfully reinterviewed through phone follow-up. Usually, the interviewer who conducted the baseline interview also conducted the follow-up interview. Mobile phone airtime of 1 USD, transmitted electronically, was again provided to the followed-up clients.
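The follow-up arithmetic in this paragraph can be re-derived from the reported counts. The short Python sketch below is illustrative only: the variable and function names are our own, and the counts are transcribed from the text rather than taken from the survey files.

```python
# Counts transcribed from the paragraph above:
# site -> (completed follow-up interviews, clients with consent and phone access)
followup = {
    "DR Congo": (751, 876),
    "India": (659, 1002),
    "Kenya": (3941, 4274),
    "Niger": (515, 667),
    "Nigeria": (2326, 2947),
    "Burkina Faso": (1198, 1384),
}
baseline_total = 13316  # completed baseline interviews across all sites


def pct(numerator, denominator):
    """Return a percentage rounded to the nearest whole number."""
    return round(100 * numerator / denominator)


completed_total = sum(done for done, _ in followup.values())
eligible_total = sum(eligible for _, eligible in followup.values())

# Completion rate among reachable clients, by site and overall.
site_rates = {site: pct(done, eligible) for site, (done, eligible) in followup.items()}
rate_among_eligible = pct(completed_total, eligible_total)

# Share of the full baseline sample that was reinterviewed.
overall_rate = pct(completed_total, baseline_total)
```

Under these counts the site totals add up to the 9,390 completed and 11,150 reachable clients reported above.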
All survey data used for analysis were deidentified. Ethical approval was not sought for this analysis of secondary data.

Measures
In the baseline interview (T1), the client's current contraceptive use status was determined based on her responses to her reason for the visit. For a family planning client, her current contraceptive method was the one she was either prescribed or dispensed. If she came for nonfamily planning care, she was asked her current use status, "Are you or your partner currently taking any steps or using any method to avoid or prevent becoming pregnant?," irrespective of where she received this service. If she answered in the affirmative, she was asked about her (or her partner's) current method, including traditional methods. This point of measurement is defined as Time 1 current or T1c. At the follow-up interview (T2), the female client was asked to recall her contraceptive use and method at the time of the baseline exit interview: "We interviewed you at [facility name] about 4 months ago. At that time were you or your partner then using a method to avoid or to prevent becoming pregnant?" If she answered yes, she was then asked about the method she was prescribed or was using (this referent being Time 1 retrospective, or T1r). She was next asked about her current contraceptive use and method, repeating the same question wordings as in the baseline, for her T2 use status. To assess change, we classify the female client's use status as a (1) continuing user or (2) continuing nonuser if she reports using or not using consistently at both T1 and T2. We classify her as a (3) discontinuer if she reports using at T1 and not using at T2, or as (4) an adopter, if she reports not using at T1 but becomes a user at T2. Last, she is classified as (5) a method switcher if she reports using a different method at T1 compared to T2.
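The five-way use status classification described above can be sketched as a small Python function. This is a minimal illustration, not the study's code: the function name, the "none" code for nonuse, and the method labels are hypothetical, and we assume that a continuing user is a client who reports the same method at both time points. The T1 argument can be either the current baseline report (T1c) or the retrospective report (T1r).

```python
NONUSE = "none"  # hypothetical code for a client reporting no method


def classify_use_status(method_t1, method_t2):
    """Classify a client from her reported method at T1 and at T2.

    method_t1 may be the baseline report (T1c) or the retrospective
    report collected at follow-up (T1r); method_t2 is the follow-up report.
    """
    using_t1 = method_t1 != NONUSE
    using_t2 = method_t2 != NONUSE
    if not using_t1 and not using_t2:
        return "nonuser"        # continuing nonuser
    if using_t1 and not using_t2:
        return "discontinuer"   # using at T1, not using at T2
    if not using_t1 and using_t2:
        return "adopter"        # not using at T1, using at T2
    if method_t1 != method_t2:
        return "switcher"       # using at both times, different methods
    return "user"               # assumed: same method at both times


# The same client can fall into different statuses in the two scenarios.
panel_status = classify_use_status("pill", NONUSE)  # T1c-T2 comparison
cross_status = classify_use_status(NONUSE, NONUSE)  # T1r-T2 comparison
```

In this example the client is a discontinuer in the panel comparison but a continuing nonuser in the cross-sectional comparison, which is the kind of inconsistency the analysis quantifies.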
It can be challenging for women to recall their contraceptive status from four to six months ago accurately; retrospectively reported baseline contraceptive use may suffer from recall bias. It is also possible that we are measuring actual behavior change (adoption, discontinuation, and switching) in both scenarios and not inconsistent reporting. Unfortunately, we cannot disentangle these two sources of variation and will instead focus on agreement among pairwise reports from T1c, T1r, and T2. These comparison pathways are shown in Figure 1, distinguishing the consistency of reporting the baseline method T1c-T1r with the dashed line from the shaded change scenarios of T1c-T2 and T1r-T2. Consistency in change reporting then involves agreement between the T1c-T2 and T1r-T2 scenarios of method type and status.
To assess the compositional heterogeneity of the client samples, we examine selected characteristics of the facility where the in-person exit interview took place and the reason for her visit. The facility type is classified into broad categories of medical college or hospital, private health clinic, public health clinic, and pharmacy. The reason for her health visit is classified as for family planning, maternal health, child health, or general health/other services. The client's age is grouped into 18-24, 25-34, and 35-49 years. Her number of own children is classified as 0-1, 2, and 3 or more. Client schooling is measured with ordinal categories taking the underlying distribution in each country into account. Across the sites, the categories have been combined into three, representing (1) never attended or attended primary school, (2) attended secondary or vocational school, and (3) attended tertiary school or higher. Household well-being is measured using client's self-report, using a Cantril-like economic ladder question (Cantril 1966) scaled 1-10 with 1 being the lowest step for the poorest and 10 being the highest step for the richest. Economic ladder categories are (1) poorest (steps 1-3), (2) poorer (step 4), (3) richer (step 5), and (4) richest (steps 6-10). Marital status is classified as married or living with someone and not in-union, with the latter including clients who are single/never married, divorced, or widowed.
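The groupings described above amount to simple recodes. The Python sketch below shows one way to write them; the function names and category labels are our own, and only the cut points come from the text.

```python
def ladder_group(step):
    """Group a 1-10 economic ladder step into the four categories in the text."""
    if not 1 <= step <= 10:
        raise ValueError("ladder step must be between 1 and 10")
    if step <= 3:
        return "poorest"   # steps 1-3
    if step == 4:
        return "poorer"    # step 4
    if step == 5:
        return "richer"    # step 5
    return "richest"       # steps 6-10


def age_group(age):
    """Group client age into 18-24, 25-34, and 35-49 years."""
    if not 18 <= age <= 49:
        raise ValueError("age must be between 18 and 49")
    if age <= 24:
        return "18-24"
    if age <= 34:
        return "25-34"
    return "35-49"


def parity_group(children):
    """Group the number of own children into 0-1, 2, and 3 or more."""
    if children < 0:
        raise ValueError("number of children cannot be negative")
    if children <= 1:
        return "0-1"
    if children == 2:
        return "2"
    return "3+"
```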

Analyses
We first describe the composition of the client samples to assess the compositional heterogeneity across our six countries in Table 1. We then compare two outcome measures of the female client's self-reported contraceptive use at the baseline in-person interview (T1c) with her retrospective report of that use (T1r) at the follow-up telephone interview to examine consistency. The first outcome (Table 2) is the specific method she reports (sterilization, IUD, implant, injectable, pill, emergency contraception, condom, other modern or traditional method, and none), and the second outcome (Table 3) combines those methods into four categories (permanent/sterilization, long-acting, short-acting, and nonuse). These comparisons are made separately for each of the six site panels. By ordering and grouping contraceptive methods by their effectiveness, we further test the sensitivity of reporting consistency. We compute Cohen's kappa statistic comparing the distribution between T1c and T1r to assess response agreement. This coefficient accounts for chance agreement in a nominal variable by computing the proportion of agreement after chance agreement is removed. The proportion of units on which women's reports agreed, minus the proportion of units for which agreement is expected by chance, is divided by the proportion of units for which agreement is not expected by chance (Cohen 1960). The strength of agreement of a kappa statistic can be interpreted as almost perfect if the statistic falls between 0.81 and 1.00, substantial between 0.61 and 0.80, moderate between 0.41 and 0.60, fair between 0.21 and 0.40, and slight between 0.00 and 0.20 (Landis and Koch 1977).
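As an illustration of the kappa computation described above, the following Python sketch calculates observed agreement, chance-expected agreement, and kappa for two lists of paired reports. It is a minimal example under our own naming, not the code used for the tables. The cut points in the labeling function follow the Landis and Koch ranges quoted above; how values falling between the published ranges (for example, 0.805) are assigned is our assumption.

```python
from collections import Counter


def cohens_kappa(reports_a, reports_b):
    """Cohen's kappa for two paired lists of nominal reports (e.g., T1c and T1r)."""
    if len(reports_a) != len(reports_b) or not reports_a:
        raise ValueError("reports must be paired and non-empty")
    n = len(reports_a)
    # Observed proportion of clients whose two reports agree.
    observed = sum(a == b for a, b in zip(reports_a, reports_b)) / n
    # Agreement expected by chance, from the two marginal distributions.
    counts_a = Counter(reports_a)
    counts_b = Counter(reports_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (observed - expected) / (1 - expected)


def landis_koch_label(kappa):
    """Strength-of-agreement label; boundary handling is an assumption."""
    if kappa > 0.80:
        return "almost perfect"
    if kappa > 0.60:
        return "substantial"
    if kappa > 0.40:
        return "moderate"
    if kappa > 0.20:
        return "fair"
    if kappa >= 0.0:
        return "slight"
    return "poor"


# Toy example with four clients and two categories (hypothetical data).
t1c = ["none", "none", "pill", "pill"]
t1r = ["none", "none", "pill", "none"]
kappa = cohens_kappa(t1c, t1r)
```

In the toy example, observed agreement is 0.75 and chance agreement is 0.50, so kappa is (0.75 - 0.50) / (1 - 0.50) = 0.50, which falls in the moderate range.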
To assess the percent agreement in change over the four months, we next compare the client's baseline report of current use with her follow-up report of current use at time 2 (time 1c with time 2, or T1c-T2) using the four method type categories (Table 4, left panel). We also compare her retrospective recall of the baseline method reported at the follow-up interview with the method she reports currently using at the follow-up interview (time 1r with time 2, or T1r-T2), again with the four method type categories (Table 4, right panel). This comparison by specific methods is available in Online Appendix Tables T1 and T2.
We then assess consistency in reported change based on continuity or shift in the client's use status by comparing T1c-T2 against T1r-T2. We test the level of agreement between these two distributions with Cohen's kappa (Table 5). The last analysis examines the percent agreement (and 95 percent confidence intervals) in use status comparing T1c-T2 against T1r-T2 (Table 6) across the six urban samples. These percentages are based on pairwise agreement among subsamples of clients who report being in the same use status at both times (T1c-T2 and T1r-T2) or at least one of them (T1c-T2 or T1r-T2).
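One reading of the status-specific percent agreement is the number of clients classified in a given use status under both scenarios divided by the number classified in that status under at least one scenario. The Python sketch below implements that reading. The function name is hypothetical, and the normal-approximation (Wald) interval is our assumption, since the text does not state how the 95 percent confidence intervals were computed.

```python
import math


def status_agreement(panel_status, cross_status, status, z=1.96):
    """Agreement for one use status between the T1c-T2 and T1r-T2 scenarios.

    Returns (agreement, lower, upper), where agreement is the share of
    clients in `status` under both scenarios among those in `status`
    under at least one. The Wald interval is an assumed method.
    """
    if len(panel_status) != len(cross_status):
        raise ValueError("status lists must be paired")
    both = 0
    either = 0
    for panel, cross in zip(panel_status, cross_status):
        in_panel = panel == status
        in_cross = cross == status
        both += in_panel and in_cross
        either += in_panel or in_cross
    if either == 0:
        raise ValueError("no client is classified in this status")
    agreement = both / either
    half_width = z * math.sqrt(agreement * (1 - agreement) / either)
    return agreement, max(0.0, agreement - half_width), min(1.0, agreement + half_width)


# Hypothetical statuses for five clients under the two scenarios.
panel = ["user", "user", "nonuser", "discontinuer", "switcher"]
cross = ["user", "nonuser", "nonuser", "nonuser", "user"]
user_agreement, user_low, user_high = status_agreement(panel, cross, "user")
```

Restricting the denominator to clients in the status under at least one scenario keeps the many clients who are never in that status from inflating the agreement.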

RESULTS
A total of 9,390 female clients between the ages of 18 and 49, who consented to follow-up, were reached by phone and had complete information on contraceptive use consistency between and within follow-up and baseline. The client analysis samples across the countries are: DR Congo 755, India 653, Kenya 3,940, Niger 515, Nigeria 2,324, and Burkina Faso 1,198.
The percent distribution of urban contraceptive users by method at baseline varies by country sample (see Figure 2). In India, the largest share of users rely on sterilization (42 percent for female sterilization and less than one percent for male sterilization), followed by male condoms, reported by 32 percent of users. In Kenya and Nigeria, injectables are the most common method (46 and 30 percent of users, respectively), while in Niger most clients use the pill (57 percent). However, the second most used method differs across these countries: 31 percent of users in Kenya and 22 percent in Niger are implant users, while in Nigeria 26 percent are male condom users. In Burkina Faso, clients reported using IUDs and injectables about equally (28-29 percent). Contraceptive users in the DR Congo have a different method distribution than in the other five countries, with the largest share relying on male condoms (32 percent) and a smaller fraction on IUDs (20 percent). The method choice distribution, which will also be affected by facility supply, can play a role in continuity and consistency of reported use.
In Table 1 we present the composition of the six panel samples of female clients according to selected client characteristics: reason for facility visit, type of facility, age group, parity group, schooling, household wealth, and marital status. The percentages of urban clients presenting for family planning are relatively small, ranging from 4.1 percent in the DR Congo to 16.1 percent in the Niger sample. Most clients present for general health or child health, followed by maternal health reasons. Most clients were interviewed at hospitals and public health clinics in the DR Congo, Kenya, and Niger samples, while private health clinics were a major source of client interviews in the Nigeria and Burkina Faso samples. A high proportion of clients were interviewed at pharmacies in the Indian cities. The age composition of clients was relatively similar across sites, with about half of females being 25-34 years of age. Clients had a higher number of children (three or more) in Niger (46.7 percent), Nigeria (...) ... (41.7 percent). A majority of clients in each sample were married or in union, ranging from 92.2 percent in India to 65.3 percent in DR Congo.
NOTE: Cell percentages sum to 100.0 within country panels. Inconsistent any is defined as those clients with an unmatched consistency report between T1c-T2 scenario and T1r-T2 scenario within country panels, not in the diagonal of the matrix. (See Figure 1.)
Table 2 presents the percent agreement between the method clients reported currently using at baseline (T1c) and the baseline method they retrospectively recalled (T1r), by site. Each panel sums to 100.0 percent, with the percentage values on the diagonal indicating agreement and the off-diagonal elements disagreement. Nonuse tends to have the highest agreement, ranging from 54.22 percent in the Nigeria sample to 47.93 percent (India), 45.43 percent (DR Congo), 38.23 percent (Burkina Faso), 37.09 percent (Niger), and a low of 21.37 percent (Kenya). Low use of sterilization is reflected in small percentage agreements, except in India (15.31 percent), and a similar pattern is observed for IUDs. Consistent reporting on implant use is observed in Kenya (14.31 percent), Burkina Faso (11.69 percent), and Niger (8.54 percent) samples, with consistency in injectable reporting also high in the Kenya (23.10 percent) and Burkina Faso (10.85 percent) samples. Oral pill use is reported consistently by client users in Niger (22.72 percent) and condom use in India (11.03 percent). The total percent agreement, based on the sum of the diagonal values, is not perfect, despite the short interval, ranging from 59.07 percent in DR Congo to 84.37 percent in India.
These patterns are reinforced when comparing agreement in client reporting using the four method type categories, as seen in Table 3. Disagreement is observed, however, with the larger percentages for DR Congo, Niger, and Nigeria clients reporting short-acting method use in T1c but nonuse in T1r, that is, 15.23, 10.49, and 12.35 percent, respectively. Likewise, high off-diagonal values are seen for Burkina Faso clients reporting using long-acting methods in T1c and sterilization in T1r (14.69 percent) as well as short-acting methods in T1c and long-acting methods in T1r (23.54 percent).
With the exception of the DR Congo, the kappa values in Table 2 have a moderate or better strength of agreement and reveal varying levels of consistent reporting of the specific baseline method, ranging from a high value of 0.753 in the India sample to a low of 0.337 for DR Congo. Next highest are 0.633 and 0.631 for Burkina Faso and Niger, respectively, with 0.564 for Kenya and 0.471 for Nigeria. The kappa values in Table 3 for consistent reporting by method type are very similar to those in Table 2 with the DR Congo value of 0.420 being moderate this time.
Table 4 addresses the question of how much short-term change is observed in contraceptive method use, based on four categories, when judged by the baseline method as currently (T1c) or retrospectively (T1r) reported. The left panel of columns compares method type from T1c to T2 and the right panel from T1r to T2. Examining the percent agreement values on the diagonals for each country, we see greater agreement in the T1r-T2 reports than T1c-T2 ones. This suggests that clients are more likely to report to the interviewer that they are still using the same method at follow-up as recalled for baseline, than when compared to the method they reported at baseline. Because nonuse values are smaller in the T1c-T2 comparison (left panel), this implies clients are reporting more current use at baseline than four months later. A source of the shift is from baseline reports of short-acting method use to subsequent nonuse; for example, in the T1c-T2 comparison, we see percentages of 15.1 percent (DR Congo), 8.9 percent (India), 8.8 percent (Kenya), 15.3 percent (Niger), 10.3 percent (Nigeria), and 12.9 percent (Burkina Faso) for clients reporting discontinuation. The same percentages based on the T1r-T2 comparison are lower, that is, 7.2, 6.1, 5.7, 8.7, 2.4, and 8.9 percent, respectively.
The extent of change in contraceptive behavior may not be large over four months, especially for clients reporting use of permanent or long-acting methods. However, given research showing significant discontinuation of short-acting methods (Ali, Cleland, and Shah 2012) or of increased contraceptive availability (Ahmed et al. 2019), we expect to observe contraceptive switching, adoption, and discontinuation even after four months. We are interested in assessing both the extent of T1c-T2 and T1r-T2 change as well as how similar or consistent the reported change is. If the T1c-T1r comparison is highly consistent, we would expect T1c-T2 and T1r-T2 to also be consistent. Table 5 examines change and consistency in reported use and nonuse, discontinuation, adoption, or switching by comparing the T1c-T2 and T1r-T2 scenarios. First, we discuss their marginal distributions as these reflect measured change, and then we examine their internal agreement. We see that based on panel reports (T1c-T2) a smaller percentage of clients are classified as continuing users, and also as continuing nonusers, than when assessed in the cross section (T1r-T2). For example, 15.1 percent of the client panel in DR Congo reported themselves as users both in T1c and T2, whereas 24.6 percent reported accordingly based on the T1r and T2 comparison. For nonusers, the percentage reporting to be nonusers at both times is 41.6 percent based on T1c-T2 and 51.4 percent based on T1r-T2. This pattern holds true for all country panels except India where the difference for users and nonusers is small. The largest gap among continuing users is registered in Burkina Faso, 28.3 percentage points, and in Niger among nonusers, 15.3 percentage points. A similar pattern is observed for reporting of change in baseline method between T1c-T2 and T1r-T2 (Table 4).
It appears that, although separated by only about four months, retrospective reporting of one's contraceptive use results in higher prevalence than when based on actual in-time, longitudinal reporting. Longitudinal data measurement also appears to result in greater percentages reporting discontinuation, switching, and adoption than within a cross-sectional round. In terms of cross-site variation, greater change in discontinuation is seen in the Burkina Faso and Nigeria samples, while for switching, levels are higher in the DR Congo, Kenya, and Burkina Faso samples.
Consistency in reported use status between T1c-T2 and T1r-T2 is observed with the percent agreement findings in Table 5. For DR Congo, 38.8 percent of the client panel consistently reported being a nonuser over time and 11.9 percent a user, with 4.0, 3.6, and 6.5 percent reporting being a discontinuer, switcher, or adopter. These total 64.8 percent, with the remaining 35.2 percent inconsistently reporting their change over four months. For example, 6.9 percent of DR Congo clients are classified as discontinuers based on T1c-T2 and as nonusers based on T1r-T2. The total percent disagreement (off-diagonal totals) ranges from a high of 35.2 percent for DR Congo to a low of 15.5 percent for India, with the other sites falling between 23.7 percent (Niger) and 29.8 percent (Kenya). Many of the off-diagonal values reflect client reports of shifts toward nonuse (discontinuation), adoption, and switching whereas they retrospectively report themselves as users or nonusers at baseline. The degree of consistency in change reported with the two distributions is also revealed in the kappa values, all of which have a strength of agreement of moderate or better. These follow a similar pattern to the kappa values seen in Table 3, with the highest observed for the India sample (0.770), followed by Burkina Faso (0.669) and Niger (0.619), and the lowest in the DR Congo (0.420).
In Table 6, we further our examination of how consistent the reported change is by looking individually at the percent agreement for the five user statuses and across sites. The percent agreement is tabulated for clients classified in each of the five user statuses either in the T1c-T2 or T1r-T2 distributions. This avoids biasing the percent agreement upward with the many remaining clients who will consistently not be in most of the pairwise comparisons.
Studies in Family Planning
The results affirm again that nonusers have the most consistent reporting, with the highest percent agreement values, ranging from 0.893 in India to 0.636 in Kenya. After nonusers, clients who are users show the next highest percent agreement, ranging from 0.741 in India to 0.429 in DR Congo. Discontinuation also tends to be consistently reported either in the T1c-T2 or T1r-T2 change measure in India, Niger and Burkina Faso but least in Nigeria. The percent agreement for switching or adopting between the two change distributions is weak to modest across all sites, except India.

DISCUSSION
Survey interviews are the principal mode of data collection for individual-level contraceptive practice in low-income countries (Khan et al. 2007). The validity and reliability of women's reports of their status as users are critical to obtaining an accurate profile of their pregnancy management behaviors, not only for program planning purposes but also for scientific measurement and understanding. Our descriptive study documents that females interviewed four to six months apart, first upon exit from a health facility visit and then by phone, provide two different profiles of consistency in reported baseline method and subsequent change in use status. When we examined reported change across T1c to T2 rounds and then by T1r to T2, the former revealed more status changes for users than the latter comparison, except in India. For T1r-T2 transitions, being a continuing nonuser outsized most other categories, raising the question of whether use at baseline was more accurately reported under face-to-face circumstances than four months later by telephone. We next compared the reporting consistency of these two scenarios (T1c-T2 vs. T1r-T2). Only 0.6 percent of Nigerian female clients consistently reported themselves as contraceptive users both in T1c-T2 and T1r-T2, with the counterpart percentages being 11.9 percent for DR Congo, 25.1 percent for Burkina Faso, 28.0 percent for India, 28.2 percent for Niger, and 38.5 percent for Kenyan clients. Our measures of consistent reporting were highest in India and lowest in DR Congo overall. This is partially explained by the high use of sterilization among contraceptive users in India and the high use of short-term contraceptive methods among contraceptive users in DR Congo. Because we have observed nontrivial amounts of inconsistent reporting, continuing attention to measurement reliability is warranted.
Inconsistency may be the result of unwillingness to disclose methods at the baseline interview, misreporting, or other recall bias. Our findings have the following implications for survey measurement of contraceptive behavior: (1) reports of nonuse are likely to be reliable; (2) currently obtained reports of use are next most likely to be reliable; (3) change in use is more robust in a panel study than when assessed with retrospective reporting; and (4) the extent of change in starting, stopping, and switching methods is dependent on any change being measured reliably. The third finding has implications for the reliability of the information collected in the contraceptive calendar, which is widely used to measure contraceptive use dynamics in low-income countries.
Our study design offers a number of strengths, including its demonstration of the feasibility of interviewing urban female health clients about contraception by phone in low-income countries. A large share, 71 percent, of the female clients were successfully reinterviewed, after being consented for follow-up, providing contact telephone numbers (either theirs or a family member's phone number), and then being reached and completing the interview by phone four to six months after the baseline interview. Another study strength is its longitudinal design that enabled assessing contraceptive use change and in particular the rate of adoption and discontinuation for some country sites. Across the six countries between 2.5 and 7.7 percent consistently reported going from nonuse at baseline to adopting a contraceptive method at follow-up, and between 1.9 and 9.7 percent consistently reported stopping use. In addition, while studies have monitored fertility preferences in low-income countries at short intervals, for example, Sennott and Yeatman (2012), to our knowledge there are no longitudinal studies conducted in low-income countries assessing shifts in women's contraceptive use with short intervals of time between surveys.
The inconsistent reporting of one's contraceptive status just four months earlier raises the question of which report to accept as the actual situation: the one at baseline or the one at follow-up. Levels of change were observed to be higher between baseline and follow-up rounds than within the follow-up round, suggesting that inaccurate recall of use four months earlier may be a factor. We cannot eliminate the possibility that the mode of survey administration, face-to-face or telephone, was a factor. It would also be important to establish whether the baseline method was intentionally or unintentionally misreported, as we are unable to differentiate change from misreporting.
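The comparison of the two scenarios of change can be illustrated with a minimal sketch. It assumes each client has three method-type reports (the baseline report, the recall of baseline use given at follow-up, and current use at follow-up) and derives a use status under the panel scenario and under the within-follow-up scenario. The function names, the coding of method types as strings, and the five records are hypothetical and invented purely for illustration; they are not the study's data or its analysis code, and switching is detected here only as a change in method type.

```python
NONUSE = "nonuse"  # assumed label for the nonuse category


def classify_status(earlier, later):
    """Derive use status from method type at an earlier and a later point."""
    if earlier == NONUSE and later == NONUSE:
        return "nonuser"
    if earlier == NONUSE:
        return "adopter"
    if later == NONUSE:
        return "discontinuer"
    if earlier == later:
        return "user"
    return "switcher"


def percent_agreement(pairs):
    """Share of (a, b) pairs with a == b, expressed as a percentage."""
    pairs = list(pairs)
    if not pairs:
        raise ValueError("no pairs to compare")
    matches = sum(1 for a, b in pairs if a == b)
    return 100.0 * matches / len(pairs)


# Invented records: (baseline report, recall of baseline at follow-up,
# current use at follow-up). Not drawn from the study.
records = [
    ("nonuse", "nonuse", "nonuse"),
    ("short-acting", "short-acting", "short-acting"),
    ("short-acting", "nonuse", "nonuse"),
    ("nonuse", "nonuse", "long-acting"),
    ("short-acting", "long-acting", "long-acting"),
]

# Panel scenario: baseline report versus current use at follow-up.
panel = [classify_status(base, now) for base, _, now in records]
# Within-follow-up scenario: recalled baseline use versus current use.
cross = [classify_status(recall, now) for _, recall, now in records]

agreement = percent_agreement(zip(panel, cross))
print(panel)
print(cross)
print(agreement)
```

In this invented example the third and fifth clients are classified as a discontinuer and a switcher under the panel scenario but as a nonuser and a user within the follow-up survey, which mirrors the pattern of more change appearing in panel reporting than in retrospective reporting.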
Our study has its limitations, one being the limited generalizability of the baseline samples to all females seeking health care services. While systematic random sampling protocols were in place, their full implementation could not be assessed because the number of clients approached and recruited to participate was not uniformly recorded, limiting our ability to accurately record client flows per facility. Thus, we were unable to calculate client selection probabilities and weight the data accordingly. A second possible source of measurement bias was that contraceptive use at baseline was captured differently for family planning clients than for clients visiting for other services. For a family planning client, the method she reported as prescribed or dispensed was assigned as her current contraceptive method, whereas other clients were asked directly about their current method. However, the percentage of clients seeking family planning services at baseline was less than 16 percent in all countries. A third limitation is the low follow-up rates in India (59 percent) and DR Congo (68 percent) due to phone access and the inability to reach and reinterview many women by phone, which can potentially introduce selection bias to our estimates. Female clients in India reported low phone access (for example, phones were owned by their husbands and landline phones were scarce), while clients in DR Congo were difficult to recontact. Nonetheless, the other country samples had recontact rates of 73 percent or higher, and the ability to reinterview the same female clients adds to the validity of the results. Last, our analysis does not attempt to assess whether the mode of survey interview affected response reliability, as interviews were all face-to-face at baseline and by phone at follow-up. This can affect the assessed consistency of self-reported contraceptive use. Respondents could be more or less accurate about their use four months earlier or when followed up.
Our study's focus has been on the accuracy of clients' self-reported contraceptive use and assessing consistency and change both between baseline and follow-up and within follow-up rounds across six urban samples. By doubly comparing self-reported use under the two scenarios, our findings add to the research literature on reporting consistency of sexual and reproductive behaviors. The literature has largely focused on the reliability of reported sexual behaviors across panel rounds or waves (for example, Goldberg et al. 2014; Alexander et al. 1993; Sieving et al. 2005) or between cross-sectional samples and calendar-based retrospective reports (Strickler et al. 1997; Callahan and Becker 2012), but not when retrospectively recalled within a round. The study's cross-national samples, while not generalizable to the population of health-seeking females at large, are robust in size and exhibit patterns that would be expected in these settings. For example, consistency in being a continuing nonuser in India comports with the high proportion of client interviews taking place at pharmacies. Given that the predominant method in the country is female sterilization, women are unlikely to be obtaining this method at such locations, and their spouses are likely the ones purchasing condoms. Similarly, low contraceptive prevalence in Niger, Nigeria, and DR Congo indicates that most women are not using contraception, and thus there is a higher probability of continued nonuse being consistently reported by female clients both at four months and within the follow-up survey. Potential misreporting raises concerns of validity and reliability of measured indicators (Nock, Zeller, and Carmines 1982). The inconsistency in reporting that we observe, which may well have acceptable reasons behind it, has implications for studying contraceptive use dynamics specifically and sensitive behaviors generally.
Panel data are invaluable for studying the determinants of behavioral change. At the same time, continued evaluation of the validity of measures and indicators that are of significant scientific and programmatic importance is warranted.