The PRIDE Study: Evaluation of online methods of data collection

Abstract
Background: Large birth cohort studies are extremely valuable for assessing associations between early life exposures and long‐term outcomes. Establishing new birth cohorts is challenging because of declining participation rates. Online methods of data collection may increase feasibility but have not been evaluated thoroughly.
Objective: The primary objective of the ongoing PRegnancy and Infant DEvelopment (PRIDE) Study is to identify exposures during pregnancy and in early life that may affect the short‐term or long‐term health of mother and/or child. In this manuscript, we aimed to evaluate the methods of recruitment and online data collection applied in the study.
Population: Dutch women aged ≥18 years in early pregnancy.
Design: Prospective cohort study.
Methods: Initially, only prenatal care providers recruited participants, but alternative recruitment methods were added because of disappointing participation rates, including collaboration with “Moeders voor Moeders” (an organisation that visits women in early pregnancy) and Facebook advertisements. Data on demographic characteristics, obstetric history, maternal health, lifestyle factors, occupational exposures, nutrition, pregnancy complications, and infant outcomes are primarily collected through Web‐based questionnaires at multiple time points during and after pregnancy. Additional data collection components include paternal questionnaires, blood and saliva sampling, and linkage to medical records.
Preliminary results: By September 2019, 9573 women had been included in the PRIDE Study, of whom 1.3% completed paper‐based questionnaires. The mean age of the women analysed was 30.6 years, 71.1% had a high level of education, 57.2% were primiparae, and the mean gestational age at enrolment was 9.9 (range 3, 37) weeks, with slight differences between recruitment methods. Pregnancy outcome was known for 89.8%. The retention rate at 6 months after the estimated date of delivery was estimated at 70%. Multiple validation studies conducted within the PRIDE Study indicated high data quality.
Conclusion(s): Although challenging and time‐consuming, online methods for recruitment and data collection may enable the establishment of new birth cohort studies.


| INTRODUCTION
Large birth cohort studies have proven extremely valuable for assessing associations between exposures early in life and long-term outcomes. 1  Unfortunately, the cancellation of the US National Children's Study (NCS; 2014) and the UK Life Study (2015), which aimed to include 80 000-100 000 infants, may have discouraged other researchers from establishing large birth cohort studies in the near future. 8 Part of the reason for cancelling these two studies was low recruitment and participation rates, which have declined over recent decades both in birth cohort studies and in health-related research in general. 9 A major reason for this gradual decline, which has become steeper in recent years, is that potential study participants perceive themselves as too busy. 10 Efforts should therefore be made to reduce participant burden, for example by using modern methods of data collection. Implementing online data collection methods may increase the feasibility of health-related studies in general and birth cohort studies in particular. For example, completing a Web-based questionnaire was reported to take only about half the time needed to answer the same questions in a telephone interview. 11 However, reports of best practices for recruitment and data collection in the field of paediatric and perinatal epidemiology are scarce.
The PRIDE Study was designed with application and validation of Web-based questionnaires and other online methods of data collection in mind. The design of the PRIDE Study was published in detail previously. 12 In the current paper, we evaluate the methods of recruitment and online data collection that are being used in this ongoing study. In addition, we give some recommendations on how to optimise recruitment and data collection in such studies.

| Overview, structure, and operations
The primary goal of the PRIDE Study, a prospective cohort study with follow-up into childhood, is to identify factors and circumstances to which women and their (unborn) children are exposed during pregnancy and in early life that may affect short-term or long-term health of the mother and/or the child.

Study question
How did the methods of recruitment and online data collection implemented in the PRIDE Study, an ongoing national prospective cohort study among pregnant women, perform?

What's already known
Online methods of recruitment and data collection may increase the feasibility of new birth cohort studies, but these have not been evaluated thoroughly.

What this study adds
Although challenging with regard to recruitment and retention, it is feasible to collect high-quality data on large numbers of pregnant women and their offspring using online methods of data collection. Evaluation and validation of the new methods used enhance the value of these data for epidemiologic studies.
Analyses of PRIDE Study data are conducted by investigators from the project team as well as by national and international collaborators.

| Eligibility criteria
All Dutch pregnant women aged 18 years or older who are able to understand the Dutch language are eligible for participation in the PRIDE Study. Although we aimed to include only women before gestational week 17, this criterion could not always be upheld during enrolment in practice. As a result, 3.9% of the participants were ≥17 weeks pregnant at enrolment.
Gestational carriers and traditional surrogates are excluded from the PRIDE Study.
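The eligibility rules above can be expressed as a simple screening function. This is a minimal sketch, not the PRIDE Study's enrolment software: the `Candidate` fields, the reading of "Dutch" as residence in the Netherlands, and the choice to flag rather than exclude late enrolment (mirroring the 3.9% enrolled at ≥17 weeks) are all assumptions made for illustration.

```python
from dataclasses import dataclass

# Soft target from the protocol: enrolment before gestational week 17.
TARGET_MAX_GESTATIONAL_WEEK = 17


@dataclass
class Candidate:
    """Hypothetical screening record; field names are illustrative only."""
    age_years: int
    resides_in_netherlands: bool  # assumption: "Dutch" read as residence
    understands_dutch: bool
    is_pregnant: bool
    is_gestational_carrier_or_surrogate: bool
    gestational_week: float


def is_eligible(c: Candidate) -> bool:
    """Apply the hard inclusion and exclusion criteria."""
    return (
        c.is_pregnant
        and c.resides_in_netherlands
        and c.age_years >= 18
        and c.understands_dutch
        and not c.is_gestational_carrier_or_surrogate
    )


def is_late_enrolment(c: Candidate) -> bool:
    """Flag, but do not exclude, enrolment at or after the target week."""
    return c.gestational_week >= TARGET_MAX_GESTATIONAL_WEEK


if __name__ == "__main__":
    woman = Candidate(30, True, True, True, False, 18.0)
    print(is_eligible(woman), is_late_enrolment(woman))  # True True
```

Treating the week-17 limit as a flag keeps late enrollees in the cohort while still allowing sensitivity analyses that exclude them.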

| Recruitment methods
Based on a pilot study among all midwifery practices and hospitals in the Nijmegen region, 12 we initially planned to recruit pregnant women nationwide through prenatal care providers only.
Participating midwives and gynaecologists invite pregnant women just before or during their first prenatal care visit, which usually takes place between gestational weeks 8 and 12. However, because of disappointing inclusion rates and new opportunities, alternative methods for recruiting study participants were implemented as well. We partnered with "Moeders voor Moeders" (Mothers for Mothers), 13 an organisation that collects urine from women between gestational weeks 6 and 16 to extract human chorionic gonadotropin for the production of medication used in fertility treatment. Furthermore, we ran intermittent Facebook and Google AdWords advertisements, 14 participated in exhibitions at pregnancy fairs, and placed advertisements in magazines targeted at pregnant women. Figure 1 shows an overview of data collection in the PRIDE Study.

| Data collection
The study consists of two phases: phase 1 entails all data collection components until 6 months after the estimated date of delivery, whereas phase 2 involves biannual questionnaires starting at the age of 1 year until the children reach the age of 21 years. At enrolment, the participating women provide informed consent digitally for use of the self-reported data with the option to provide consent for linkage to medical records and registries as well.
For subgroups of participants, biological samples are collected as well. Blood samples for genetic and biochemical analyses are collected only from participants living in the Nijmegen region, at a mean gestational age of 11.0 weeks (standard deviation [SD] 2.0). In addition, all participants are asked to donate a single saliva sample in gestational week 17, using an at-home collection protocol, to measure awakening cortisol levels. Separate paper-based informed consent is obtained for all biological samples. To facilitate exposure assessment for some specific projects, several focus cohorts have been embedded within the PRIDE Study, including cohorts providing additional information on medication use through diaries and participants who collected multiple faecal samples from themselves and their infants.

| Questionnaires
Data for the PRIDE Study are primarily collected using Web-based questionnaires administered at baseline (gestational weeks 5-16), in gestational weeks 17 and 34, at 2 and 6 months after the estimated date of delivery, and biannually throughout childhood.
The topic lists are provided in Table S1. The questionnaires were constructed around a number of key exposures and outcomes, using standardised instruments whenever possible (Table 1). Furthermore, the women are asked for permission to send the prospective biological father a single questionnaire focusing on paternal exposures in the 3 months before pregnancy.
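The questionnaire time points described in this section can be turned into target dates once a participant's estimated date of delivery (EDD) is known. The sketch below is hypothetical and not the study's scheduling system: it assumes the usual obstetric convention that the EDD corresponds to 40 weeks 0 days of gestation, approximates the child's age from the EDD rather than the actual date of birth, and assumes that the biannual phase 2 questionnaires are sent every 6 months from age 1 up to and including age 21. The baseline questionnaire is omitted because it is completed at enrolment (gestational weeks 5-16).

```python
import calendar
from datetime import date, timedelta


def add_months(d: date, months: int) -> date:
    """Add calendar months, clamping the day to the last day of the target month."""
    month_index = d.month - 1 + months
    year = d.year + month_index // 12
    month = month_index % 12 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def date_at_gestational_week(edd: date, week: int) -> date:
    """Date on which `week` completed weeks are reached, taking the EDD as 40+0 weeks."""
    return edd - timedelta(weeks=40 - week)


def phase1_schedule(edd: date) -> dict[str, date]:
    """Target dates for the phase 1 follow-up questionnaires."""
    return {
        "gestational week 17": date_at_gestational_week(edd, 17),
        "gestational week 34": date_at_gestational_week(edd, 34),
        "2 months after EDD": add_months(edd, 2),
        "6 months after EDD": add_months(edd, 6),
    }


def phase2_schedule(edd: date, first_age: int = 1, last_age: int = 21) -> list[date]:
    """Questionnaires every 6 months from age `first_age` to `last_age` (inclusive)."""
    return [add_months(edd, m) for m in range(12 * first_age, 12 * last_age + 1, 6)]


if __name__ == "__main__":
    for label, due in phase1_schedule(date(2020, 3, 31)).items():
        print(label, due.isoformat())
```

Clamping the day in `add_months` avoids invalid dates such as 31 September; a production system would more likely anchor phase 2 on the recorded date of birth.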

| Data linkage
Obstetric records are requested to enrich the PRIDE Study database with clinical information, such as blood pressure readings, foetal growth measures, and events during delivery. Furthermore, obstetric records are used to obtain information on pregnancy complications and birth outcomes for participants lost to follow-up and for questionnaire validation purposes. Likewise, pharmacy records are requested to obtain information on the medications dispensed during pregnancy. We also plan to link the PRIDE Study database to the Perinatal Registry of the Netherlands (Perined) and to other national health registries for specific outcomes in the future.
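As an illustration of how obstetric records can fill gaps for participants lost to follow-up, the sketch below prefers the self-reported outcome and falls back to the linked record. The function name, the dictionary inputs keyed by a pseudonymous participant ID, and the precedence rule are assumptions made for illustration; the sketch also presumes the participant consented to record linkage. Validation analyses would instead keep both sources side by side.

```python
from typing import Optional


def resolve_outcomes(
    participant_ids: list[str],
    questionnaire_outcomes: dict[str, str],
    record_outcomes: dict[str, str],
) -> dict[str, tuple[Optional[str], str]]:
    """Return (outcome, source) per participant.

    Self-report is used when available; otherwise the linked obstetric record
    (only present for participants who consented to linkage) is used.
    """
    resolved: dict[str, tuple[Optional[str], str]] = {}
    for pid in participant_ids:
        if pid in questionnaire_outcomes:
            resolved[pid] = (questionnaire_outcomes[pid], "questionnaire")
        elif pid in record_outcomes:
            resolved[pid] = (record_outcomes[pid], "obstetric record")
        else:
            resolved[pid] = (None, "unknown")
    return resolved


if __name__ == "__main__":
    print(resolve_outcomes(
        ["P1", "P2", "P3"],
        questionnaire_outcomes={"P1": "live birth"},
        record_outcomes={"P1": "live birth", "P2": "miscarriage"},
    ))
```

Recording the source alongside each value makes it possible to check later whether conclusions change when record-derived outcomes are excluded.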

| Statistical analysis
Descriptive statistics were used to characterise the study population. Univariable linear regression models were used to compare continuous maternal characteristics between recruitment methods and modes of data collection (IBM SPSS version 25), whereas Episheet 15 was used to compare categorical characteristics.
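With a single binary predictor (for example, recruitment through one method vs another), the slope of a univariable linear regression equals the difference in group means. The analyses themselves were run in IBM SPSS; the pure-Python sketch below only illustrates the computation. The ages are hypothetical, and the confidence interval uses the normal quantile as a large-sample approximation to the t quantile, so it is only reasonable for large groups. Comparing more than two recruitment methods would require one indicator variable per non-reference method.

```python
import math
from statistics import NormalDist, fmean


def univariable_regression(x: list[float], y: list[float]) -> dict[str, float]:
    """Ordinary least squares for y = b0 + b1 * x with a single predictor."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need at least 3 paired observations")
    x_bar, y_bar = fmean(x), fmean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    residual_ss = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    se_b1 = math.sqrt(residual_ss / (n - 2) / sxx)
    z = NormalDist().inv_cdf(0.975)  # large-sample approximation to t quantile
    return {"intercept": b0, "slope": b1, "se_slope": se_b1,
            "ci_low": b1 - z * se_b1, "ci_high": b1 + z * se_b1}


# Hypothetical maternal ages: 0 = reference recruitment method, 1 = comparison method.
group = [0, 0, 0, 0, 1, 1, 1, 1]
age = [29, 31, 30, 32, 27, 28, 30, 29]
result = univariable_regression(group, age)
# Intercept = mean age in group 0; slope = mean age in group 1 minus group 0.
print(f"intercept {result['intercept']:.2f}, slope {result['slope']:.2f}")
```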

| Ethics approval
The PRIDE Study has been approved by the Regional Committee on Research Involving Human Subjects (CMO 2009/305).

| Recruitment
After the pilot phase in the region of Nijmegen, which started in July 2011, recruitment through prenatal care providers was gradually

| Data availability
The currently available dataset for analyses contains all PRIDE Study participants with an estimated date of delivery up to and including 31 December 2017 (n = 5826). Figure 2 shows an overview of the available ques-
Table footnotes: The advertisement was shown intermittently for a total of 211 days. c Presented as mean (standard deviation).

| Web-based vs paper-based questionnaires
Owing to the high rate of Internet access in the Netherlands, only 76 women (1.3%) in the current dataset participated using paper-based questionnaires. These participants were more likely to have a lower level of education (Table 3). Maternal age, ethnic background, and pre-pregnancy BMI did not seem to differ between the modes of data collection.
The larger proportions of item non-response among women who completed Web-based questionnaires are largely attributable to partially completed baseline questionnaires, for instance due to quitting halfway through the questionnaire. In that case, the completed sections of the Web-based questionnaires are saved, whereas partially completed paper-based questionnaires may be less likely to be returned.
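The distinction drawn above between item non-response (a returned questionnaire with unanswered items) and unit non-response (no questionnaire returned) can be made concrete with a small sketch. The data structures are hypothetical: each questionnaire is a mapping from item to answer, with `None` for an unanswered item, and a questionnaire that was never returned is represented by `None`. The toy example shows how saving partial Web-based responses raises measured item non-response, whereas a partially completed paper questionnaire that is never returned counts as unit non-response instead.

```python
from statistics import fmean
from typing import Optional

# Hypothetical representation: item -> answer (None = unanswered);
# a questionnaire that was never returned is None.
Questionnaire = Optional[dict[str, Optional[str]]]


def nonresponse(questionnaires: list[Questionnaire]) -> tuple[float, float]:
    """Return (unit non-response, mean item non-response among returned questionnaires)."""
    if not questionnaires:
        raise ValueError("no questionnaires sent")
    returned = [q for q in questionnaires if q is not None]
    unit = 1 - len(returned) / len(questionnaires)
    if not returned:
        return unit, float("nan")
    item = fmean(sum(answer is None for answer in q.values()) / len(q) for q in returned)
    return unit, item


complete = {"q1": "yes", "q2": "no", "q3": "yes", "q4": "no"}
half_done = {"q1": "yes", "q2": "no", "q3": None, "q4": None}

web = [complete, half_done]  # partial Web-based answers are saved
paper = [complete, None]     # partial paper questionnaire never returned
print(nonresponse(web))    # (0.0, 0.25)
print(nonresponse(paper))  # (0.5, 0.0)
```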
Of the 5414 participants who received a paper-based food frequency questionnaire (FFQ), 4676 (86.4%) returned a completed questionnaire. Between March 2018 and May 2019, 1176 participants received the Web-based FFQ, of whom 948 (80.6%) returned it. Thus, for reasons not yet known, the Web-based FFQ seems slightly less likely to be completed than the paper-based version (relative risk of completion, paper-based vs Web-based, 1.07, 95% CI 1.04, 1.10).
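The relative risk and confidence interval above can be reproduced from the four counts. Episheet's internal procedure is not described here; this sketch uses the standard large-sample method with a Wald interval on the log scale, which is an assumption about the method but recovers the reported figures.

```python
import math
from statistics import NormalDist


def risk_ratio(a: int, n1: int, c: int, n2: int, level: float = 0.95) -> tuple[float, float, float]:
    """Risk ratio (a/n1) / (c/n2) with a Wald confidence interval on the log scale."""
    rr = (a / n1) / (c / n2)
    se_log_rr = math.sqrt(1 / a - 1 / n1 + 1 / c - 1 / n2)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return (rr,
            math.exp(math.log(rr) - z * se_log_rr),
            math.exp(math.log(rr) + z * se_log_rr))


# Completed paper-based FFQs (4676 of 5414) vs Web-based FFQs (948 of 1176).
rr, low, high = risk_ratio(4676, 5414, 948, 1176)
print(f"RR {rr:.2f}, 95% CI {low:.2f}, {high:.2f}")  # RR 1.07, 95% CI 1.04, 1.10
```

The ratio is expressed as paper relative to Web, so a value above 1 means completion was more likely for the paper-based version.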

| Validation of self-reported data
In response to the initial lack of evidence on the validity of data obtained through Web-based questionnaires, 17 we initiated a series of validation studies within the PRIDE Study on a number of key exposures and outcomes. In general, the validity of data collected through the Web-based questionnaires was similar to, or even higher than, that of data collected through paper-based questionnaires and interviews in similar settings. For example, maternal medication use is assessed with a comprehensive, indication-oriented structure with closed-ended questions to obtain information on generic and brand names, periods and frequency of use, and quantity taken of prescription and over-the-counter medication. This approach was validated against medication diaries as the reference standard, with sensitivity ranging between 0.60 and 0.89 for pregnancy-related medication. Likewise, very few false-positive and false-negative reports were observed for gestational diabetes and preeclampsia, but the validity of reported gestational hypertension, although in line with previous studies, seemed to be lower because of a relatively high number of false-positive reports (submitted for publication). Additional validation studies of maternal report of childhood outcomes are being planned.
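Validation against a reference standard, such as medication diaries or obstetric records, reduces to a 2 × 2 table of self-report against the reference. The sketch below computes the usual measures. The counts in the example are invented for illustration and are not PRIDE Study results; they show how a relatively high number of false-positive reports lowers the positive predictive value even when sensitivity is high, which is the pattern described for gestational hypertension.

```python
def validity_measures(tp: int, fp: int, fn: int, tn: int) -> dict[str, float]:
    """Compare self-report with a reference standard.

    tp: reported and in reference; fp: reported but not in reference;
    fn: not reported but in reference; tn: in neither.
    """
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "positive predictive value": tp / (tp + fp),
        "negative predictive value": tn / (tn + fn),
    }


# Hypothetical counts for one outcome among 1000 women.
for name, value in validity_measures(tp=45, fp=15, fn=5, tn=935).items():
    print(f"{name}: {value:.2f}")
```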

| Principal findings
Although the number of participants stayed far below expectations, 12 we showed that it is still feasible to establish a large birth cohort study with over 9500 participants in the current era of declining response rates. Detailed information on key exposures and outcomes was mainly obtained through the use of Web-based questionnaires, which appear to yield highly accurate data. The first studies based on data from the PRIDE Study have been published. 21-28

| Strengths of the study
One of the major differences between the PRIDE Study and the cancelled NCS and Life Study is the sampling method: we applied a non-probability sampling approach, whereas the other two studies aimed to be representative of the underlying source population (ie national probability sampling). 29,30 Indeed, highly educated women are overrepresented in the PRIDE Study, while women with a non-Dutch ethnic background seem to be underrepresented. The study population became somewhat more diverse after Facebook Ads were implemented for recruitment, but only a minority of study participants was recruited through this method.
Consistent with previous studies, 31,32 the study's mixed-mode design (ie offering Web-based and paper-based questionnaires) also yielded a more diverse population than offering Web-based questionnaires alone would have. Although only a few participants requested paper-based questionnaires, we will continue to offer this method of data collection to increase diversity, even though it is labour-intensive.
Concerns regarding the validity and reliability of data collected through Web-based questionnaires, which were raised in the beginning of this century, 33

| Limitations of the data
The non-probability sampling approach prevents us from calculating national prevalence estimates for exposures and outcomes. However, the PRIDE Study does not aim to provide these figures; instead, it focuses on providing valid estimates of associations between exposures during pregnancy and early life and maternal and child health outcomes. Reassuringly, previous studies indicated that self-selection does not bias the exposure-outcome associations estimated from birth cohort studies. [44][45][46] In an overview of nine

| Interpretation
In summary, our recommendation to others who are considering starting a birth cohort study is to thoroughly consider different participant recruitment strategies. At first, we relied solely on traditional recruitment through health care providers. We observed that some minor protocol adjustments, including more personal contact and a small monetary token of appreciation to recruiting health care providers for each participant, while keeping their burden as low as possible, resulted in modest boosts in inclusion rates. Nevertheless, research will never become a priority in clinical settings, and competing studies also affect recruitment results. The non-traditional methods of recruitment added not only more inclusions but also more diversity to the PRIDE Study population, especially in terms of maternal level of education. In absolute numbers, however, the contribution of the online recruitment methods was limited and needs further refinement and extension, for example through optimal budget settings and the addition of other social media. 14 Furthermore, we are grateful for the collaboration with "Moeders voor Moeders," but are unaware of similar initiatives in other countries.
As we were among the first to implement Web-based questionnaires for data collection in health-related research, we gained some unique insights into the do's and don'ts of this method. Setting up and maintaining a good online system for recruitment and data collection takes a lot of time and effort.
Although technical problems should be prevented as far as possible, some are unavoidable, and they directly affect questionnaire completion and study retention rates. We also put major effort into the look-and-feel and user-friendliness of the questionnaires, while taking the strict regulations on data privacy and security into account. This may have contributed to the relatively low proportion of partially completed questionnaires. In phase 2, we decided to administer multiple shorter questionnaires biannually instead of one longer annual questionnaire, based on previous experience that more regular contact with the study population increases retention. We were pleasantly surprised by the results of the validation studies performed so far, which indicated very high data quality. Lastly, we learned that a substantial proportion of participants, in particular pregnant women and young mothers, prefer to complete the Web-based questionnaires on smartphones, which imposes additional programming requirements.

| CONCLUSIONS
Enrolment and follow-up for the PRIDE Study are ongoing and will continue in the coming years. This gives us the opportunity to incorporate other novel methods of data collection, such as mobile applications and wearables, into this cohort to collect even more detailed, timely, and clinically relevant data that cannot be obtained through traditional data collection methods. Statistical approaches that deal with the multitude of time-varying exposures and time-varying confounders will be applied to assess associations of in utero and early life exposures with maternal and child health outcomes. Based on the current inclusion rate, we expect to enrol the 10 000th participant by the beginning of 2020. Ultimately, the insights obtained from the PRIDE Study may be used to improve maternal and child health by developing and implementing preventive measures in preconception and prenatal care as well as during childhood.

ACKNOWLEDGEMENTS
We thank the mothers and children who continue to take part in