Fetal safety of nicotine replacement therapy in pregnancy: systematic review and meta-analysis

Background and aims Smoking in pregnancy causes substantial avoidable harm to mothers and offspring; nicotine replacement therapy (NRT) may prevent this, and is used to help women to quit. A recently updated Cochrane Review of randomized controlled trials (RCTs) investigating impacts of NRT in pregnancy focuses primarily on efficacy data, but also reports adverse impacts from NRT. Here we identify and summarize NRT impacts on adverse pregnancy outcomes reported in non-randomized controlled trials (non-RCTs).
Methods Systematic reviews and meta-analyses of RCTs and non-RCT studies of NRT in pregnancy, with design-specific risk of bias assessment and grading of recommendations, assessment, development and evaluations (GRADE) criteria applied to selected outcomes.
Findings Relevant Cochrane Review findings are reported alongside those from this new review. Seven RCTs were included; n = 2340. Nine meta-analyses were performed; non-statistically significant estimates indicated potentially reduced risk from NRT compared with smoking for mean birth weight, low birth weight, preterm birth, intensive care admissions, neonatal death, congenital anomalies and caesarean section and potentially increased risks for miscarriage and stillbirth. GRADE assessment for mean birth weight and miscarriage outcomes indicated 'low' confidence in findings. Eleven large studies from five routine health-care cohorts reported clinical outcomes; 12 small studies investigated mainly physiological outcomes within in-patient women given NRT. Findings from meta-analyses for congenital anomalies, stillbirth and preterm birth were underpowered and not in a consistent direction; GRADE assessment of confidence in findings was 'very low'. Routine health-care studies were of higher quality, but implications of reported findings were unclear as there was inadequate measurement and reporting of women's smoking.
Conclusions Available evidence from randomized controlled trials and non-randomized comparative studies does not currently provide clear evidence as to whether maternal use of nicotine replacement therapy during pregnancy is harmful to the fetus.


INTRODUCTION
Smoking in pregnancy has adverse effects on the health of pregnant women and their offspring in the pre- and perinatal periods and in later life [1][2][3]. Smoking rates are highest among younger, socially disadvantaged pregnant women [4,5], and up to 38% of socio-economic inequalities in stillbirths and infant deaths can be attributed to smoking [6].
Stopping smoking in pregnancy improves birth outcomes [7] and reduces the burden of health-care costs to the National Health Service (NHS) [8].
The National Institute for Health and Care Excellence (NICE) recommends nicotine replacement therapy (NRT) for women who are unable to stop smoking with non-pharmacological interventions [9]. However, even when pregnant women choose NRT, many do not use it for long [10], and adherence to NRT by pregnant women tends to be lower than in non-pregnant smokers [10][11][12]. This poor adherence may at least partially explain why NRT has been found to be less effective when used in pregnancy [13]. One possible reason for poor adherence to NRT in pregnancy is maternal concern about its safety. In qualitative interviews, pregnant women who sought support from NHS Stop Smoking Services often reported using NRT intermittently or stopping courses early because of safety concerns [14].
There is a strong theoretical rationale for using NRT to avoid smoking in pregnancy: even if women do not stop smoking completely, cigarette smoke exposes the fetus to numerous toxins whereas NRT exposes it only to nicotine, so NRT is very likely to be safer [15]. A Cochrane Review investigating the impacts of NRT in pregnancy has recently been updated [13]. RCTs produce the least biased evidence, but they also generally have small sample sizes, so even when they are combined in meta-analyses, small adverse impacts may not be detected. Well-conducted, large non-RCT studies may still be prone to bias, but with comprehensive confounder adjustment they could augment RCT data and provide sufficient power to investigate infrequent health outcomes following NRT use in pregnancy. The Cochrane Review focuses primarily on efficacy data, with adverse effects reported as secondary outcomes. Consequently, we conducted a systematic review of non-RCT studies reporting potentially adverse fetal or infant health outcomes after pregnant women's use of NRT. Here we report this process alongside the safety-orientated findings from the updated Cochrane Review [13], with the aim of providing a comprehensive, objective and contemporary assessment of whether and how use of NRT during gestation affects pregnancy outcomes.

Randomized controlled trials (RCTs)
Standard Cochrane Review (CR) methods used are described in the published review [13]. Searches, for RCTs only, were concluded by 20 May 2019 and from included studies we extracted data on the following outcomes: miscarriage/spontaneous abortion; stillbirth; birth weight; low birth weight (< 2500 g); preterm birth (< 37 weeks' gestation); neonatal intensive care unit admissions; neonatal death; caesarean section; congenital anomalies; infant development; and respiratory symptoms. We assessed study quality using Cochrane's 'risk of bias' tool. A priori, we planned to use grading of recommendations, assessment, development and evaluations (GRADE) criteria for birth weight and miscarriage/spontaneous abortion outcomes, to report studies separately where in meta-analyses I² > 75%, and to conduct subgroup analyses for placebo and non-placebo RCTs.
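The 75% threshold refers to the I² statistic. For reference, a standard definition (Higgins and Thompson) derives I² from Cochran's Q across k pooled studies. The notation below is ours, not the review's: \hat\theta_i is the log risk ratio (or mean difference) of study i and w_i its inverse-variance weight. Review Manager's Mantel-Haenszel implementation may centre Q on a slightly different pooled estimate, so treat this as the generic form:

```latex
\hat\theta_{\mathrm{F}} = \frac{\sum_{i=1}^{k} w_i \hat\theta_i}{\sum_{i=1}^{k} w_i},
\qquad
Q = \sum_{i=1}^{k} w_i \left(\hat\theta_i - \hat\theta_{\mathrm{F}}\right)^2,
\qquad
I^2 = \max\!\left(0,\; \frac{Q - (k - 1)}{Q}\right) \times 100\%
```

Under the pre-specified rule, a meta-analysis whose I² exceeded 75% would have its component studies reported separately rather than as a single pooled estimate.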

Non-RCTs
A study protocol, written in accordance with the Preferred Reporting Items for Systematic Reviews and Meta-analyses (PRISMA) statement, was registered on PROSPERO (International Prospective Register of Systematic Reviews) [16,17].

Inclusion criteria
We sought published non-RCT studies, of any design, in any language, reporting empirical data on potentially adverse fetal or infant health outcomes following NRT exposure or nicotine administration in pregnancy. Although we wanted to identify all health outcomes, we anticipated a priori that these would include at least some of the important clinical outcomes in a relevant 2015 Cochrane Review [18] (see below).

Exclusion criteria
We excluded RCTs and studies which reported only smoking-cessation outcomes [18].

Search strategy
A search strategy was developed in MEDLINE and then adapted for the CINAHL, Embase, PsycINFO, CAB Abstracts, Social Sciences Citation Index and Economic and Social Research Council databases. Supporting information, Table S1 gives search terms; we combined those relevant to pregnancy and fetal health with those referring to NRT or nicotine use. NRT became available in the 1980s, so we searched between 1980 and 12 June 2020, hand-searching references from retrieved full texts, including references from texts excluded from the review. Authors were contacted, as required, for study details.
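The combination step can be sketched as Boolean query construction: synonyms within one concept are joined with OR, and the two concepts (pregnancy/fetal health and NRT/nicotine use) are joined with AND. This is a minimal sketch under stated assumptions: the terms below are hypothetical placeholders rather than the review's Table S1 terms, and each real database interface has its own field tags, truncation and date-limit syntax.

```python
# Hypothetical sketch: the review's actual terms are listed in its Table S1,
# and MEDLINE, Embase, CINAHL etc. each use their own query syntax.

def or_block(terms: list[str]) -> str:
    """Join synonyms for one concept with OR, quoting multi-word phrases."""
    quoted = [f'"{t}"' if " " in t else t for t in terms]
    return "(" + " OR ".join(quoted) + ")"


def build_query(concept_groups: list[list[str]]) -> str:
    """Join concepts with AND: a record must match at least one term
    from every concept group."""
    return " AND ".join(or_block(group) for group in concept_groups)


# Illustrative (assumed) terms only, not the review's search terms.
pregnancy_terms = ["pregnan*", "fetal", "fetus", "birth weight"]
nrt_terms = ["nicotine replacement", "nicotine patch", "nicotine gum", "nicotine"]

query = build_query([pregnancy_terms, nrt_terms])
# Search window from the text; the 1 January start date is an assumption.
date_limits = ("1980-01-01", "2020-06-12")
print(query)
print(date_limits)
```

Keeping each concept in its own OR block makes it straightforward to adapt the same structure to each database's syntax while preserving the overall AND logic.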

Study selection and data extraction
One reviewer screened titles and abstracts, rejecting those which were not eligible for inclusion and retrieving manuscripts which appeared potentially includable or about which there was uncertainty. Two reviewers independently screened the full texts and a consensus decision was made on inclusion; if consensus was not possible, a third reviewer adjudicated. Study data were extracted by one reviewer and checked by a second, using a piloted form within Covidence (a web-based systematic review platform) [19]. Extracted data included authors' details, publication date, study design and objectives, recruitment and data collection methods, participants' characteristics and study outcomes. For NRT exposure, we extracted data on when women were issued with or reported using NRT, and on how many times and by what method these data were acquired. We also extracted smoking behaviour data, particularly any information on smoking before and after NRT use, including how often, and by what means, this was recorded.

Quality assessment
Two researchers independently quality-assessed studies using modified versions of the Newcastle-Ottawa scale (NOS) [20]. Disagreements about scoring were discussed and consensus reached using a third assessor, if necessary. One modified scale was created for studies in which NRT was used as part of routine clinical care; this had a maximum score of eight stars. The other was used for smaller cohorts in which NRT was an experimental intervention (maximum score: seven stars). Both assessed three domains: 'selection', 'design and analysis' and 'outcome', and were modified by removal of the 'demonstration that outcome of interest was not present at start of study' item as pregnancy outcomes could only occur at childbirth. The 'comparability' domain was renamed 'design and analysis', and we removed 'was follow up long enough for outcomes to occur?' from the 'outcome' domain. Supporting information, Appendix S1 details scale modifications and scoring.

Meta-analysis and GRADE criteria
We anticipated substantial variation in study designs and outcomes, so decisions about meta-analyses were made only after consideration of all included studies. Where appropriate, we planned to pool data comparing outcomes following NRT exposure with those following no NRT exposure. To provide contextual information from within the same studies, we also compared outcomes following reported NRT exposure with those after smoking.
We created three exposure groups of women who: (i) were prescribed, or reported being given or using, NRT; (ii) reported smoking but not being given NRT; or (iii) reported neither smoking nor using NRT. As the only indication for using NRT in pregnancy is as a substitute for smoking, we assumed that all women issued with NRT would have smoked at some point in pregnancy before this; so, where studies categorized women as having used only NRT and not having smoked, we combined these groups with NRT-exposed groups from other studies which did not make this claim. Review Manager version 5 software generated pooled risk ratios (RRs) using a random-effects model, with heterogeneity estimated using the I² statistic from the Mantel-Haenszel model [21]. As non-RCTs and RCTs are subject to very different biases and effects from unmeasured confounding, we present non-RCT and RCT studies in separate meta-analyses. We anticipated that confounding by women's smoking before, during or after use of NRT was likely to be particularly important for estimates derived from meta-analyses of non-RCTs, as few empirical studies attempted to adjust for this. Table 1 shows the GRADE [22] criteria that were applied to assess the strength of evidence for each meta-analysed outcome. These rate the quality or certainty of evidence as 'very low', 'low', 'moderate' or 'high'; ratings start at 'high' for RCTs and 'low' for observational studies, and GRADE criteria are used to downgrade or upgrade ratings as appropriate. Two reviewers independently applied the criteria for each meta-analysed outcome; disagreements were resolved by consensus [13].
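To make the pooling step concrete, the sketch below implements the DerSimonian-Laird inverse-variance random-effects method on the log risk-ratio scale, together with Cochran's Q and I². This is an assumption-laden illustration, not a reproduction of Review Manager: RevMan's Mantel-Haenszel random-effects option differs in detail (for example, in how Q is centred), and the 0.5 zero-cell correction and the counts in the example are our own illustrative choices, not data from the review.

```python
import math


def log_rr_and_var(events_nrt, n_nrt, events_ctrl, n_ctrl):
    """Log risk ratio and its variance for one study's 2x2 table.

    Adds 0.5 to every cell when any cell is zero (a common continuity
    correction) so that the logarithm and the variance are defined.
    """
    a, b = events_nrt, n_nrt - events_nrt
    c, d = events_ctrl, n_ctrl - events_ctrl
    if 0 in (a, b, c, d):
        a, b, c, d = a + 0.5, b + 0.5, c + 0.5, d + 0.5
    n1, n2 = a + b, c + d
    log_rr = math.log((a / n1) / (c / n2))
    var = 1 / a - 1 / n1 + 1 / c - 1 / n2
    return log_rr, var


def random_effects_rr(studies):
    """DerSimonian-Laird random-effects pooled RR.

    `studies` is a list of (events_nrt, n_nrt, events_ctrl, n_ctrl) tuples.
    Returns (pooled RR, lower 95% CI, upper 95% CI, tau^2, I^2 in %).
    """
    ys, vs = zip(*(log_rr_and_var(*s) for s in studies))
    w = [1 / v for v in vs]  # inverse-variance (fixed-effect) weights
    y_fixed = sum(wi * yi for wi, yi in zip(w, ys)) / sum(w)
    q = sum(wi * (yi - y_fixed) ** 2 for wi, yi in zip(w, ys))  # Cochran's Q
    df = len(ys) - 1
    scale = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / scale) if df > 0 else 0.0  # between-study variance
    i2 = 100 * max(0.0, (q - df) / q) if q > 0 else 0.0
    w_re = [1 / (v + tau2) for v in vs]  # random-effects weights
    y_re = sum(wi * yi for wi, yi in zip(w_re, ys)) / sum(w_re)
    se_re = math.sqrt(1 / sum(w_re))
    z = 1.959964  # 97.5th percentile of the standard normal distribution
    return (math.exp(y_re), math.exp(y_re - z * se_re),
            math.exp(y_re + z * se_re), tau2, i2)


# Hypothetical counts for illustration only (not data from the review).
example = [(12, 200, 10, 210), (30, 500, 22, 480), (5, 90, 6, 95)]
rr, lo, hi, tau2, i2 = random_effects_rr(example)
print(f"RR = {rr:.2f}, 95% CI = {lo:.2f}-{hi:.2f}, I2 = {i2:.0f}%")
```

When tau² is estimated as zero, the random-effects weights collapse to the fixed-effect weights, which is why the two models give identical answers for homogeneous data.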

RCTs
Full results, including the PRISMA diagram, are reported in the published CR [13]. Of nine RCTs which investigated NRT use in pregnancy, seven reported infant and fetal safety outcomes [23][24][25][26][27][28][29]; all were conducted in high-income countries (n = 2340). All RCTs recruited pregnant women who smoked and, as with non-RCTs, pregnancies would have been exposed to tobacco smoke before women joined the trials. Trial arms received behavioural support alone, behavioural support with a placebo, or active NRT. Four placebo-controlled RCTs were judged to be at low risk of bias [23,24,26,29] and two non-placebo RCTs at high risk [25,28]; for the remaining study, risk of bias was unclear [27]. High risk of bias was generally assigned to studies without a placebo control.
All seven studies reported mean birth weight, gestational age at delivery and incidence of low birth weight (< 2500 g). Six reported rates of preterm birth (birth before 37 weeks), miscarriage or spontaneous abortion and stillbirth [23,24,26-29], and four reported rates of infants' admission to special care and of neonatal death [23,24,26,28]. Three trials reported rates of congenital malformation [23,24,27] and two reported caesarean section rates [23,24]. One study [30] reported infants' 'survival without developmental impairment' and respiratory symptoms at 2 years. Figure 1 shows RCT meta-analysis findings. There was no evidence of a difference in risk of miscarriage/spontaneous abortion between NRT and control groups [RR = 1.60, 95% confidence interval (CI) = 0.53-4.83, I² = 0%; Fig. 1.1]. Similarly, there was no evidence of a difference in the number of stillbirths between NRT and control groups (RR = 1.24, 95% CI = 0.54-2.84, I² = 0%; Fig. 1.2). Pooled mean birth weight was higher in the NRT group than in the control group, but the CI included a small decrease in birth weight as well as a more substantial increase, and heterogeneity was high [mean difference (MD) = 99.73 g, 95% CI = -6.65 to 206.10, I² = 70%; Fig. 1.3]. There was no evidence of a difference in the incidence of low birth weight, and heterogeneity was substantial (RR = 0.69, 95% CI = 0.39-1.20, I² = 69%; Fig. 1.4).
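Because pooled results are reported only as point estimates with 95% CIs, a reader can back-calculate approximate standard errors and two-sided p-values using the usual normal approximation (the approach described in the Cochrane Handbook for obtaining standard errors from confidence intervals). This sketch assumes ratio CIs are symmetric on the log scale and mean-difference CIs on the raw scale; the resulting p-values are approximations derived by us, not values reported by the trials, and rounding of the published bounds limits their precision.

```python
import math

Z_975 = 1.959964  # 97.5th percentile of the standard normal distribution


def from_ratio_ci(estimate, lower, upper):
    """Approximate log-scale SE, z and two-sided p for a ratio (RR or OR)
    reported with a 95% CI, assuming the CI is symmetric on the log scale."""
    se = (math.log(upper) - math.log(lower)) / (2 * Z_975)
    z = math.log(estimate) / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    return se, z, p


def from_difference_ci(estimate, lower, upper):
    """Same calculation for a difference (e.g. mean birth weight in grams),
    assuming the CI is symmetric on the raw scale."""
    se = (upper - lower) / (2 * Z_975)
    z = estimate / se
    p = math.erfc(abs(z) / math.sqrt(2))
    return se, z, p


# Pooled RCT estimates as quoted in the text.
reported = {
    "miscarriage RR": from_ratio_ci(1.60, 0.53, 4.83),
    "stillbirth RR": from_ratio_ci(1.24, 0.54, 2.84),
    "low birth weight RR": from_ratio_ci(0.69, 0.39, 1.20),
    "birth weight MD (g)": from_difference_ci(99.73, -6.65, 206.10),
}
for name, (se, z, p) in reported.items():
    print(f"{name}: SE = {se:.3f}, z = {z:.2f}, approx. p = {p:.2f}")
```

All four approximate p-values exceed 0.05, which is consistent with the text's statements that these comparisons showed no evidence of a difference.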
GRADE assessment found a 'low' certainty of evidence for mean birth weight and miscarriage/spontaneous abortion outcomes.

Narratively reported outcomes: RCTs
Two RCTs [23,24] reported the distribution of Apgar scores at 5 minutes after birth, cord arterial blood pH, intraventricular haemorrhage, neonatal convulsions, necrotizing enterocolitis, mechanical ventilation of the infant, assisted vaginal delivery and maternal death in NRT and placebo groups; no statistically significant differences were noted. One RCT [30] reported infant outcomes after the neonatal period. Using a composite self-report outcome based on the Ages and Stages Questionnaire, 3rd edition [31], significantly better developmental outcomes were observed in infants born to women randomized to NRT than in those born to women in the placebo group: the odds ratio (OR) for infants reaching 2 years of age 'without developmental impairment' (i.e. with normal development) was 1.40 (95% CI = 1.05-1.86). However, there was no evidence of a difference in parental reports of infants' respiratory symptoms; the OR for reporting any respiratory problem in the NRT group was 1.32 (95% CI = 0.97-1.74).

Study selection, characteristics and outcome measures
A total of 18467 titles and abstracts were identified and, after duplicate removal, 9391 records were screened. Forty-five full-text articles were retrieved and 23 were included in the review; Fig. 2 shows the reasons for study exclusion. Table 2 presents characteristics of the 23 included studies (n = 931 163). Eleven were conducted in health-care settings, used routine clinical data [32-42], compared women prescribed or issued NRT with those who were not, and were derived from five discrete birth cohorts. A UK cohort reported outcomes in two manuscripts [32,34] and a PhD thesis [38]; a Danish cohort reported outcomes in five papers [33,35,37,39,42]; and Canadian [40], US [36] and Australian [41] cohorts were each reported in a single study. Eleven studies described …

Table 1. GRADE criteria applied to each meta-analysed outcome
Indirectness: This criterion assesses whether the evidence included in the review directly answers the review question. Quality of evidence was not downgraded on this criterion because of the problem/patient/population, intervention/indicator, comparison, outcome (PICO) criteria used when searching: we felt our narrow PICO criteria meant that all included studies reported data that answered the review question, as we wanted information on all health outcomes reported after NRT exposure in pregnancy.
Imprecision: If the confidence interval for the effect estimate was so wide that it could be consistent with an effect in either direction, this was deemed a sign of imprecision and the rating was downgraded by one level.
Publication bias: Quality of evidence was not downgraded on this criterion because of the types of studies appraised.
Upgrading: Quality of evidence was not upgraded, as there was no supporting evidence for any of the three recommended reasons to upgrade: a large magnitude of effect, the presence of a dose-response gradient, or that all plausible confounding factors would act to reduce the effect seen. Upgrading a downgraded outcome is also not recommended.
Criteria derived from the Grading of Recommendations, Assessment, Development and Evaluation (GRADE) Working Group Handbook [22]. For all criteria, the quality of meta-analysed studies was judged against the reasons to downgrade. If there was serious concern regarding any criterion (except 'upgrading'), quality of evidence was downgraded to 'very low' from the starting level of 'low' for observational (non-randomized controlled trial) studies.
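The rating logic above can be summarized as a small decision function: start at 'high' for RCTs or 'low' for observational studies, lower the rating one level per domain of serious concern, and consider upgrading only when nothing was downgraded. This is a simplified sketch: GRADE also permits two-level downgrades for very serious concerns, and the function and domain names are our own. The example applies only the imprecision rule, because the text does not itemize the other domain judgements for each outcome.

```python
LEVELS = ["very low", "low", "moderate", "high"]
DOWNGRADE_DOMAINS = ("risk of bias", "inconsistency", "indirectness",
                     "imprecision", "publication bias")


def spans_null(lower, upper, null=1.0):
    """Imprecision rule: the CI is consistent with an effect in either
    direction (for a ratio, the interval contains 1)."""
    return lower < null < upper


def grade_certainty(randomized, concerns, upgrade_levels=0):
    """Return a GRADE certainty label.

    randomized: True for RCT evidence (starts at 'high'), False for
        observational evidence (starts at 'low').
    concerns: maps downgrade domains to True when there is serious concern;
        each serious concern lowers certainty by one level (simplification).
    upgrade_levels: applied only if no domain was downgraded.
    """
    unknown = set(concerns) - set(DOWNGRADE_DOMAINS)
    if unknown:
        raise ValueError(f"unknown GRADE domain(s): {sorted(unknown)}")
    level = LEVELS.index("high" if randomized else "low")
    downgrades = sum(bool(v) for v in concerns.values())
    level -= downgrades
    if downgrades == 0:
        level += upgrade_levels
    return LEVELS[max(0, min(level, len(LEVELS) - 1))]


# Non-RCT 'NRT versus no NRT' congenital anomalies: RR 1.17, 95% CI 0.97-1.42.
print(grade_certainty(False, {"imprecision": spans_null(0.97, 1.42)}))  # very low
```

For observational evidence, a single serious concern is enough to move the rating from 'low' to the floor of 'very low', which matches the footnote to Table 1.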
NRT exposure data were obtained from electronic medical records or prospectively from telephone interviews in nine routine health-care studies [32-35,37-39,41,42]; the other two collected data retrospectively via self-administered postal questionnaires [36,40] sent 2-3 months [36] and 3-8 years [40] after pregnancy. Although women in the Danish cohort were asked in which gestational weeks they had used NRT or smoked, the manuscripts did not report these details [33,35,37,39,42], and one routine health-care study reported the median duration of NRT use but not when in pregnancy this occurred [40]. All 12 interventional studies reported women's gestational ages at NRT administration, with nine providing mean gestational ages at exposure (range = 21.5-35.6 weeks) [45,46,48-54]. Table 2 shows which studies reported or adjusted for women's smoking before or after NRT use. Of the six studies in meta-analyses, three reported women's smoking behaviour before NRT use/exposure [33,36,39], but data were collected by questionnaire at set time-points, so no information on smoking behaviour was available later in participants' pregnancies. Consequently, many pregnant women in the NRT-exposed arms of meta-analyses will also have smoked, and exposures to NRT and smoking are not completely differentiated. Two routine health-care studies adjusted for smoking status during NRT use [33,37]. Two routine health-care studies recruited only pregnant women who smoked, and investigated impacts of using NRT within this group [40,41]. All experimental studies recorded women's smoking status at recruitment; nine also validated abstinence just before NRT was given [43-46,48-52], and two followed participants until childbirth, collecting some information on smoking after NRT exposure [53,54]. Table 3 summarizes studies' outcomes.
Routine health-care cohorts reported pregnancy outcomes such as congenital anomalies [34,35], birth weight [36-40], gestational age at birth [36,37,39,40] and stillbirth [32,33]. Interventional studies generally monitored physiological observations, including biophysical profiles [49,52], umbilical and uterine artery Dopplers [45,46,50], fetal breathing [43,44] and heart rate [43,46-48,50,51], and maternal blood pressure and heart rate [43,46-52]; some also reported pregnancy outcomes [48,52-54].

Selected table entries (study characteristics and main/significant results):
To compare the risk of adverse perinatal outcomes between pregnancies exposed to pharmacotherapies (NRT, bupropion, varenicline) and pregnancies exposed to smoking but no pharmacotherapy; 3608 pregnancies exposed to either NRT or smoking; baseline characteristics were well balanced between groups, as pregnancies were matched between NRT users and smokers.
Non-users of NRT: n = 85105, sub-categorized into smokers and non-smokers (controls; n = 71 839), the latter being non-smokers who were not exposed to NRT, including those who quit before conception or who reported being an ex-smoker at interview but may have smoked in early pregnancy.
Systolic and diastolic BP and maternal HR were not significantly different between nicotine patch use and smoking. There were significant time effects for systolic BP and maternal HR (both P < 0.001), with maximum increases of 5 and 6 mmHg and 10-11 beats/minute 2 hours after baseline measurement in both groups. Diastolic BP also changed significantly over time in both groups (P = 0.007). The change in middle cerebral artery RI from baseline to 4 hours later was similar during patch use and smoking, and there were no group differences in Doppler measurements of the middle cerebral, umbilical and uterine arteries. Time effects were significant for the middle cerebral artery RI (P = 0.02) and the uterine artery RI (P = 0.02).
Changes in baseline FHR reactivity were variable, with no significant difference between groups, and mean FHR did not differ significantly between baseline and 4 hours in either group. There was no change in the blood velocity waveform in either the fetal aorta or the umbilical artery. One fetus had a sustained supraventricular arrhythmia 15 minutes after gum chewing started, but had a normal heart rhythm the next day. Mean gestational age at delivery was 39.8 weeks (SD = 1.2) and mean birth weight was 3424 g (SD = 445). All newborns had Apgar scores …
Ogburn Jr, USA, 1999 [49]: no comparator group for fetal/maternal observations other than baseline measurements; baseline measurements were compared with the inpatient phase for FHR, umbilical artery systolic:diastolic ratio, maternal vital signs and maternal nicotine withdrawal scores. During days 2, 3 and 4 of the inpatient phase, morning baseline FHR was significantly reduced relative to baseline when the mother was smoking. There were no significant changes in umbilical artery systolic/diastolic ratio from baseline at day 1 or day 4, and no changes from baseline in maternal HR. Overall maternal nicotine withdrawal score was reduced from baseline each morning.
Oncken, USA, 1996 [50]: smoking comparator n = 10 (continued smoking as usual).
There were no fetal deaths in either group and one placental abruption in the control group. Preterm birth: NRT n = 4 (5.3%), control n = 5 (3.3%) (P = 0.5). Small for gestational age: NRT n = 5 (6.7%), control n = 11 (7.3%) (P = 1.0).
All confidence intervals (CIs) are 95% unless stated otherwise.
a Pregnant women prescribed or issued nicotine replacement therapy (NRT) were assumed to have smoked before that point in pregnancy; hence this group was exposed to both smoking and NRT.
b Pregnant women prescribed or issued NRT were assumed to have smoked before that point in pregnancy, so women who reported using NRT 'on its own' were pooled with other NRT users (who smoked concurrently) for analysis.
c Episodes of infantile colic in smokers were quoted as 11417 in a table in this paper, but the odds ratio (OR) suggests that this is a typographical error; adjusted to 1417.

Quality assessment
Table 4 reports quality assessments. Routine health-care studies had a median score of 6/8 stars [interquartile range (IQR) = 5-7]; low scores often reflected a lack of validation of participants' exposures (e.g. NRT use), retrospective exposure assessment or a lack of adverse outcome validation. Interventional studies had a median score of 4/7 stars (IQR = 2.5-4.5); these often scored poorly on cohort representativeness but relatively well for biochemical validation of smoking abstinence.

Meta-analysis outcomes
We performed meta-analyses for congenital anomalies, stillbirth and preterm birth outcomes; for other outcomes this was not possible because of differences in study designs. Analyses included only routine health-care studies. As interventional cohorts used 'before-after' designs without appropriate comparison groups, the few which reported birth outcomes could not be included. The study which investigated a subsample of quasi-RCT intervention group participants selected intervention and comparison groups in very different ways, and was judged unsuitable for inclusion [54]. Major congenital anomalies after first-trimester NRT exposure were reported using the European Surveillance of Congenital Anomalies and Twins (EUROCAT) classification system in two studies [34,35,56]. Stillbirth was reported in two studies: one defined this as a baby born not showing signs of life at ≥ 28 weeks [32] and the other as such a birth after 20 weeks [33]; we pooled these, as both represented death in later pregnancy. One interventional study reported fetal deaths but was excluded for the reason outlined above [54]. Preterm birth (< 37 weeks) was an outcome in six studies, but only two were pooled [36,37]; three lacked appropriate comparison groups [40,41,54] and one [39] duplicated findings from another included study [37].

Meta-analysis results: non-RCTs
Figure 3 shows non-RCT meta-analysis findings. Compared with no NRT use, there was no evidence of an association between using NRT and risks of congenital anomalies (RR = 1.17, 95% CI = 0.97-1.42, I² = 0%; Fig. 3.1) or stillbirth (RR = 1.14, 95% CI = 0.63-2.04, I² = 56%; Fig. 3.2). Similarly, when compared with smoking, NRT use was not associated with anomalies (RR = 1.06, 95% CI = 0.86-1.32, I² = 0%) or stillbirth (RR = 0.75, 95% CI = 0.41-1.36, I² = 54%).
Compared with no NRT use, meta-analysis of two studies suggested a slightly increased risk of preterm birth (RR = 1.25, 95% CI = 1.07-1.46, I² = 0%; Fig. 3.3), but compared with smoking, NRT was not associated with a greater risk of preterm birth (RR = 1.12, 95% CI = 0.95-1.33, I² = 0%). For 'NRT versus no NRT' comparisons, GRADE certainty of evidence for all three outcomes was 'very low'.

Narratively reported outcomes: non-RCTs
Table 3 reports outcomes by study. Two studies excluded from the preterm birth meta-analysis compared risks of preterm birth with and without NRT use in women who smoked; one found a significantly reduced risk in NRT users compared with non-users (adjusted OR = 0.21, 95% CI = 0.13-0.34) [40], while the other showed no significant difference (HR = 1.00, 95% CI = 0.71-1.42) [41].
Four studies reported mean gestational age at birth for NRT-exposed women [40,48,52,53], but only one, which enrolled only women who smoked, had a comparison group [40]; without statistical comparison, it reported a mean (standard deviation, SD) gestational age at birth of 38.9 (1.9) weeks in NRT users and 37.5 (3.3) weeks in non-users.
Three studies reported small for gestational age (SGA) rates [40,41,54]. Two included only women who smoked: one reported a significantly reduced risk of SGA in those using NRT compared with those who did not (adjusted OR = 0.61, 95% CI = 0.41-0.90) [40], and the other showed no significant change in risk (HR = 0.77, 95% CI = 0.56-1.07) [41]. The third study used very different methods to select exposure groups, rendering them non-comparable, but reported no significant difference in SGA rates [54].
Mean birth weight was reported by six studies [37,38,40,48,52,53]; three were interventional [48,52,53] and three had comparison groups which were too dissimilar to be aggregated [37,38,40]. One of these, which enrolled only women who smoked, reported without statistical comparison a mean (SD) birth weight of 3257.9 g (553.1) in NRT users and 2943.5 g (733.5) in non-users [40]. A PhD thesis using medical record data compared mean birth weight in NRT users with that in women who neither smoked nor used NRT in pregnancy, and found it was lower in NRT users (β = -168 g, 99% CI = -214 to -122, P < 0.001) [38]. In a multivariable analysis which adjusted for reported smoking behaviour, a population-based cohort found no statistically significant association between duration of NRT use and mean birth weight (β = 0.25 g per week of NRT use, CI = -2.31 to 2.81) [37]. Low birth weight (< 2500 g) was reported by three studies which seemed similar enough to be aggregated but, because of heterogeneity (I² = 76%), are presented separately [36,38,39]. One reported low birth weight incidences of 2.4% in unexposed women, 2.9% in NRT users, 4.8% in women who smoked and used NRT, and 4.3% in smokers [39]. A retrospective questionnaire study found that 13.1% of NRT-exposed women delivered low birth weight infants, compared with 9.26% of women who smoked and 6.99% of women with neither exposure [36]. Another study reported that NRT exposure was associated with an increased risk of low birth weight compared with no exposure (OR = 1.88, 99% CI = 1.42-2.49, P < 0.001) [38]. Two of these studies had the lowest quality scores of all routine health-care studies (see Table 4) [36,39].
Fetal death (a composite of stillbirth and miscarriage) [38], delivery mode [38], infantile colic [39] and infant strabismus [42] were each reported in single studies; Table 2 reports these findings. Compared with no NRT use, NRT exposure was associated with a reduced risk of fetal death (OR = 0.44, 99% CI = 0.38-0.50, P < 0.001) [38] and of assisted delivery (relative risk ratio (RRR) = 0.68, 99% CI = 0.54-0.85, P < 0.001), but not with an increased risk of caesarean section [38]. A study of women who smoked reported a composite outcome, 'any adverse perinatal event', encompassing a number of separate birth outcomes [41]. Table 2 reports the individual outcome HRs; there was no significant change in the overall risk of any adverse perinatal event when comparing women who smoked and were exposed to NRT with those who were not (HR = 1.02, 95% CI = 0.84-1.23).
Table 5 presents physiological outcomes measured by study. In nine studies, fetal physiological observations were recorded at baseline and compared with readings taken when women were abstinent and using NRT [43-46,48-52]. Three also compared these within-patient changes from baseline with those recorded during or after smoking following a similar period of abstinence [43,45,46]. Results showed no consistent patterns, and most studies did not report significant outcome changes after NRT administration.

Key findings
Overall, we found no evidence that NRT used by pregnant women who smoke has adverse impacts on fetal and infant outcomes. Although the analyses were underpowered, the direction of the point estimates from most RCT meta-analyses suggests that NRT is unlikely to have adverse impacts or to be more harmful than smoking in pregnancy. The robustness of non-RCT evidence was poor, with meta-analyses' findings affected by imprecision or potential biases, which may explain the inconsistent direction of associations in non-RCT meta-analyses. NRT-exposed women are likely to have smoked at some point in pregnancy but, generally, this was not measured and so could not be adjusted for in non-RCTs, making these studies' findings particularly difficult to interpret.

Strengths and limitations
Our meta-analyses of non-RCT studies are limited by the biases inherent in these study designs. One issue was that ascertainment of NRT exposure relied on maternal self-report or prescription records. Women's recall may have been imperfect and, as some women prescribed NRT will not have used it, relying on prescription records could overestimate NRT exposure. More importantly, studies generally assessed NRT exposure at only one or two time-points in pregnancy and, in most, smoking intensity before or after NRT use was not reported, despite smoking being known to affect outcomes adversely. The omission of detailed smoking data from non-RCT reports was probably the greatest threat to these studies' validity.
It is logical to assume that all women issued NRT would have smoked at least in early pregnancy, and this will have tended to reduce differences between exposure groups' outcomes. Only two non-RCT studies adjusted for smoking behaviour [33,37]; the others could be subject to confounding of unknown magnitude. Another important issue was that NRT prescribing was subject to confounding by indication [57]. In three of the five birth cohorts that provided data for non-RCT studies, women issued NRT had higher rates of comorbidities and lower socio-economic status than other women who smoked, and so very probably experienced 'higher-risk' pregnancies [10,33,36], which may have substantially affected adverse outcomes. We believe that our modified NOS for quality assessment of non-RCTs and the application of GRADE criteria should help readers to judge the degree to which observed associations might be causal or due to bias, confounding or chance. For the non-RCT review, only one reviewer screened titles and abstracts and extracted data; although a second researcher checked this work, there was no parallel independent screening or extraction, so researcher bias is possible. Additionally, some non-RCTs may not have been indexed in databases, but we are confident that our comprehensive search strategy found all of those that were and, hopefully, our methods for assessing the bias and certainty of non-RCT evidence will assist interpretation of the findings.
Strengths of this work include applying 'Cochrane-type' review methods to find all available and relevant RCTs and non-RCTs. We believe this is the first attempt to systematically retrieve and synthesize all studies that report fetal and infant health outcomes after pregnant women have used or been offered NRT, and that we have successfully identified, assessed and presented together all relevant studies. This, coupled with objective methods for assessing studies' biases and the strength of evidence produced by meta-analyses, should provide a thorough account of what is known about the impact of NRT on pregnancy outcomes. Similar reviews have used less thorough search strategies, presented only narrative data or not attempted to assess bias [15,58,59]. Although the meta-analyses are underpowered, they remain the strongest currently available data on NRT safety in pregnancy, and they highlight the strengths and weaknesses of the literature. The juxtaposition of non-RCT and RCT meta-analyses is perhaps the most useful feature of the review, as illustrated by the findings for preterm birth. For this outcome, meta-analysis of two non-RCT studies revealed a statistically significant association between NRT use and higher rates of prematurity, in which we have 'very low' certainty. However, meta-analysis of data from seven RCTs provides a non-statistically significant 'best estimate' of this association in the opposite (protective) direction. This disparity might be explained by women's smoking before, after or alongside NRT exposure, which non-RCTs generally did not adjust for. Such a direct comparison helps readers to appreciate more clearly the quality of the available data before drawing conclusions.
Smoking is well known to increase the risk of preterm birth [60], and one of the studies included in this meta-analysis acknowledges that women recommended or prescribed NRT by a health-care professional might be those who smoke more heavily [36] and who find it harder to quit [61].

Findings in context of previous literature
The most robust research on the safety of NRT in pregnancy comes from RCTs, and we report meta-analyses for nine safety-orientated outcomes [13]. In RCTs there is no confounding by indication, and randomization tends to balance both known and unknown confounders between trial groups, so differences in birth outcomes can be attributed to NRT. Although the meta-analyses were underpowered and showed no significant differences between the NRT and control groups, the trend in the non-statistically significant point estimates derived from these analyses is noteworthy. For low birth weight, preterm birth, neonatal intensive care unit admissions, neonatal death and congenital anomalies, point estimates suggest a protective effect of NRT, whereas those for miscarriage and stillbirth do not. Additionally, caesarean section rates were non-significantly higher following NRT but, in the absence of contextual data, it is not clear whether this is an adverse or a positive outcome. This trend in point estimates suggests that, with more data from RCTs, NRT could well prove to be less harmful than smoking in pregnancy. Because of design issues, non-RCT meta-analyses are probably not methodologically robust enough to inform clinical practice, and their findings do not add to those from RCT meta-analyses. Pregnant women in non-RCT studies are only likely to have been prescribed or offered NRT by clinicians if they smoked. Consequently, to provide valid findings, these studies should have assessed pregnant women's smoking behaviour and adjusted their analyses for it. This is particularly important because NRT most probably improves birth outcomes by helping women to stop smoking or to smoke less.

Further work
RCTs and robust population-based cohort studies from routine health-care settings are needed to improve the evidence base for the safety of NRT use in pregnancy. Electronic medical record databases offer the potential to capture near-complete pregnancy outcome data validly. However, to make a valid contribution to the literature, future non-RCT studies need better methods for quantifying exposure to both NRT and smoking throughout pregnancy, and should adjust for smoking in their analyses.

CONCLUSIONS
The strongest data on the probable impacts of NRT exposure in pregnancy on birth outcomes come from RCTs, and these provide no suggestion that NRT might be harmful. Non-RCT studies have less consistent findings, most probably because of inherent design weaknesses, and future observational studies should provide analyses that account for the impact of smoking behaviour among women who also use NRT in pregnancy.

Declaration of interests
None.