Do clinical prediction models improve concordance of treatment decisions in reproductive medicine?
Dr JW van der Steeg, Department of Obstetrics and Gynaecology, Centre for Reproductive Medicine, Room H4-213, Academic Medical Centre, Meibergdreef 9, 1105 AZ Amsterdam, the Netherlands. Email firstname.lastname@example.org
Objective To assess whether the use of clinical prediction models improves concordance between gynaecologists with respect to treatment decisions in reproductive medicine.
Design We constructed 16 vignettes of subfertile couples by varying fertility history, postcoital test, sperm motility, follicle-stimulating hormone level and Chlamydia antibody titre.
Setting Thirty-five gynaecologists estimated three probabilities for each of 16 fictional couples: the 1-year probability of spontaneous pregnancy, the probability of pregnancy after intrauterine insemination (IUI) and the probability of pregnancy after in vitro fertilisation (IVF). They then proposed a treatment regimen for each couple: expectant management, IUI or IVF. Three months later, the participating gynaecologists again proposed treatment regimens for the same 16 fictional couples, but this time the vignettes were accompanied by pregnancy chances obtained from prediction models for spontaneous pregnancy, IUI and IVF.
Population Thirty-five gynaecologists working in academic and nonacademic hospitals in the Netherlands.
Methods See Setting.
Main outcome measures Concordance between gynaecologists in their probability estimates, expressed as the intraclass correlation coefficient (ICC), and concordance between gynaecologists in their treatment decisions, expressed as Cohen’s kappa (κ).
Results The gynaecologists differed widely in estimating pregnancy chances (ICC for spontaneous pregnancy: 0.34). There was also large variation in the proposed treatment regimens (κ: 0.21). The treatment decisions made by gynaecologists were consistent with the ranking of their probability estimates. When prediction models were used, concordance (κ) in treatment decisions increased from 0.21 to 0.38. The proportion of decisions for expectant management increased from 39 to 51%, whereas the proportion for IVF dropped from 23 to 14%.
Conclusion Gynaecologists differed widely in their estimates of prognosis for 16 fictional subfertile couples, and their proposed treatment regimens varied just as widely. When the same 16 cases were presented together with the results of prediction models, concordance in the proposed treatment regimens improved only slightly. Simply introducing validated prediction models is therefore insufficient to achieve concordant management between doctors.
The challenge of modern subfertility treatment is to offer tailored treatment to individual subfertile couples. Early treatment of couples with high chances of spontaneous pregnancy should be avoided, as should unnecessary delay of treatment (expectant management) in couples with poor chances of spontaneous pregnancy. Avoiding both improves cost-effectiveness and reduces multiple pregnancies and other complications of assisted reproductive technology.1
Can gynaecologists predict pregnancy chances on the basis of their clinical experience and gut feeling? One study reported good concordance between gynaecologists in predicting the chance of spontaneous pregnancy but poor concordance in predicting pregnancy after in vitro fertilisation (IVF).2 Validated prediction models could therefore be a useful tool for selecting the optimal treatment for subfertile couples on the basis of their individual pregnancy chances.
There are several clinical prediction models for the prediction of spontaneous pregnancy,3–7 pregnancy after intrauterine insemination (IUI)8 and pregnancy after IVF,9–11 of which some have been validated.12–14 These models are intended to help gynaecologists in patient communication and decision making about the timing of treatment.
In this study, we aimed to determine the concordance between gynaecologists in predicting pregnancy chances, their concordance in treatment decisions and the influence of prediction models on concordance in treatment decisions.
We constructed 16 fictional vignettes of couples who had completed the basic fertility work up. All women had a regular menstrual cycle. The vignettes differed in seven prognostic factors: female age, previous pregnancy, duration of subfertility, basal follicle-stimulating hormone (FSH) level, Chlamydia antibody titre (CAT), outcome of the postcoital test (PCT) and sperm motility. Apart from basal FSH level and CAT, these are the main prognostic factors in existing fertility prediction models. We included basal FSH and CAT because these tests have recently been introduced into the fertility work up and are therefore likely to influence decision making.
Female age was 25, 32 or 38 years. The duration of subfertility, defined as failure to conceive after 1 year of frequent unprotected intercourse, was 1, 2 or 3 years. The basal FSH level on cycle day 3 was 6 or 15 iu/l. The CAT was 1:8 or 1:128. The outcome of the PCT was ‘no spermatozoa’, ‘motile, nonprogressive spermatozoa’ (motile spermatozoa that are not moving forwards) or ‘progressive spermatozoa’ (motile spermatozoa that are moving forwards). Progressive sperm motility was 15, 35 or 65%. Tubal patency was confirmed by either hysterosalpingography or laparoscopy, depending on the result of the CAT.
Vignettes were generated from an orthogonal design. Orthogonal designs are constructed so that inferences are based on main effects only: the level combinations needed to estimate second- and higher-order interactions are excluded, which reduces the required number of cases.15 We used seven factors. Combining all factor levels would have resulted in 648 unique cases, whereas our orthogonal design needed only 16 cases while still allowing all main effects to be estimated. Table 1 shows the composition of the 16 case vignettes. Institutional Review Board approval was not requested because no real patients were involved in this study.
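The size of the full factorial that the orthogonal design avoids follows directly from the factor levels listed above. The minimal Python sketch below only reproduces that count; the dictionary keys are illustrative labels rather than names used by the authors, and selecting the 16-run orthogonal fraction itself would require design tables or software that the paper does not describe.

```python
import math
from itertools import product

# Factor levels as described for the vignettes (keys are illustrative labels).
levels = {
    "female_age": [25, 32, 38],
    "previous_pregnancy": ["primary", "secondary"],
    "duration_years": [1, 2, 3],
    "basal_fsh_iu_l": [6, 15],
    "chlamydia_titre": ["1:8", "1:128"],
    "pct": ["no spermatozoa", "motile, nonprogressive", "progressive"],
    "progressive_motility_pct": [15, 35, 65],
}

# Product of the number of levels per factor: 3 x 2 x 3 x 2 x 2 x 3 x 3.
full_factorial_size = math.prod(len(v) for v in levels.values())
print(full_factorial_size)  # 648

# Enumerating every combination confirms the count.
all_cases = list(product(*levels.values()))
print(len(all_cases))  # 648

# The orthogonal design used 16 of these combinations.
print(f"{16 / full_factorial_size:.1%} of the full factorial")  # 2.5% of the full factorial
```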
Table 1. Overview of the 16 constructed vignettes of subfertile couples who completed the basic fertility work up (sorted by female age, duration of subfertility and type of subfertility, respectively)
| Case | Female age (years) | Duration of subfertility (years) | Type of subfertility | Basal FSH (iu/l) | CAT | PCT | Progressive motility (%) | Respondents, estimates (n) | Spontaneous pregnancy in 1 year, median (range) | Pregnancy per IUI cycle, median (range) | Pregnancy per IVF cycle, median (range) | Respondents, decisions (n) | Expectant management (%) | IUI (%) | IVF (%) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 13 | 25 | 1 | Primary | 6 | 1:8 | Motile, progressive | 15 | 34 | 0.50 (0.20–0.90) | 0.12 (0.03–0.40) | 0.30 (0.20–0.68) | 34 | 88.2 | 11.8 | |
| 12 | 25 | 1 | Primary | 15 | 1:128 | Motile, progressive | 15 | 34 | 0.30 (0.10–0.90) | 0.10 (0.02–0.63) | 0.20 (0.10–0.45) | 33 | 39.4 | 36.4 | 24.2 |
| 8 | 25 | 1 | Secondary | 6 | 1:128 | Motile, nonprogressive | 65 | 35 | 0.50 (0.25–0.90) | 0.10 (0.03–0.40) | 0.26 (0.10–0.85) | 33 | 81.8 | 15.2 | 3.0 |
| 1 | 25 | 1 | Secondary | 15 | 1:8 | Motile, nonprogressive | 35 | 35 | 0.40 (0.15–0.90) | 0.10 (0.05–0.40) | 0.25 (0.10–0.60) | 35 | 71.4 | 14.3 | 14.3 |
| 7 | 25 | 2 | Primary | 15 | 1:128 | Nonmotile | 65 | 35 | 0.25 (0.10–0.90) | 0.10 (0.02–0.40) | 0.20 (0.08–0.60) | 34 | 23.5 | 50.0 | 26.5 |
| 15 | 25 | 2 | Secondary | 15 | 1:8 | Motile, progressive | 15 | 34 | 0.40 (0.05–0.90) | 0.10 (0.03–0.40) | 0.22 (0.05–0.50) | 34 | 41.2 | 41.2 | 17.6 |
| 3 | 25 | 3 | Primary | 6 | 1:8 | Nonmotile | 35 | 33 | 0.20 (0.10–0.50) | 0.10 (0.03–0.60) | 0.25 (0.15–0.60) | 35 | 5.7 | 82.9 | 11.4 |
| 10 | 25 | 3 | Secondary | 6 | 1:128 | Motile, progressive | 15 | 35 | 0.25 (0.03–0.90) | 0.10 (0.03–0.40) | 0.25 (0.10–0.65) | 34 | 11.8 | 70.6 | 17.6 |
| 4 | 32 | 1 | Primary | 6 | 1:8 | Motile, progressive | 65 | 33 | 0.50 (0.20–0.90) | 0.12 (0.05–0.55) | 0.25 (0.15–0.60) | 33 | 97.0 | 3.0 | |
| 16 | 32 | 1 | Secondary | 15 | 1:8 | Nonmotile | 15 | 34 | 0.30 (0.06–0.90) | 0.10 (0.03–0.40) | 0.20 (0.06–0.60) | 34 | 32.4 | 50.0 | 17.6 |
| 2 | 32 | 2 | Secondary | 6 | 1:128 | Motile, progressive | 35 | 35 | 0.30 (0.10–0.70) | 0.10 (0.04–0.55) | 0.25 (0.15–0.65) | 32 | 62.5 | 28.1 | 9.4 |
| 5 | 32 | 3 | Primary | 15 | 1:128 | Motile, nonprogressive | 15 | 35 | 0.10 (0.03–0.50) | 0.10 (0.01–0.30) | 0.20 (0.08–0.60) | 32 | 3.1 | 56.3 | 40.6 |
| 14 | 38 | 1 | Primary | 15 | 1:128 | Motile, progressive | 35 | 34 | 0.20 (0.05–0.90) | 0.09 (0.02–0.40) | 0.19 (0.05–0.40) | 33 | 18.2 | 30.3 | 51.5 |
| 9 | 38 | 1 | Secondary | 6 | 1:128 | Nonmotile | 15 | 35 | 0.20 (0.05–0.80) | 0.10 (0.03–0.40) | 0.20 (0.10–0.40) | 33 | 30.3 | 42.4 | 27.3 |
| 11 | 38 | 2 | Primary | 6 | 1:8 | Motile, nonprogressive | 15 | 34 | 0.20 (0.04–0.65) | 0.10 (0.03–0.30) | 0.20 (0.10–0.40) | 34 | 2.9 | 61.8 | 35.3 |
| 6 | 38 | 3 | Secondary | 15 | 1:8 | Motile, progressive | 65 | 35 | 0.18 (0.03–0.50) | 0.08 (0.00–0.30) | 0.18 (0.05–0.30) | 35 | 5.7 | 37.1 | 57.1 |
| Total | | | | | | | | 550 (98%) | | | | 538 (96%) | | | |
Thirty-five gynaecologists working in 14 academic and nonacademic hospitals participated in our survey. For all participants, we recorded the treatments offered in their clinics, their time in practice in fertility care, their age and whether they used prediction rules in daily care. For each of the 16 cases, participants were asked to estimate the 1-year probability of spontaneous pregnancy, the probability of conceiving after one IUI cycle and the probability of conceiving after one IVF cycle (questionnaire 1). Estimates were marked on a scale from 0 to 100%, divided into steps of 10%. After estimating the probabilities, participants were asked to make a treatment decision: expectant management for at least 6 months, IUI for six cycles (with or without ovarian hyperstimulation) or fresh IVF for three cycles (including frozen embryo replacement cycles). All participants received a cover letter together with the 16 cases, explaining how to use the prediction models for spontaneous pregnancy, IUI and IVF. Participants with any remaining questions could contact the first author.
Three months later, the participating gynaecologists again proposed treatment regimens for the same 16 fictional cases, but this time the cases were accompanied by pregnancy chances obtained from prediction models for spontaneous pregnancy, IUI and IVF (questionnaire 2).7,8,10
Concordance between the respondents in the first questionnaire was assessed in two ways. First, we analysed the concordance between gynaecologists in their probability estimates for spontaneous pregnancy, IUI and IVF by calculating the intraclass correlation coefficient (ICC).16 The ICC expresses the proportion of the total variance in the estimates that is attributable to true differences between cases. Its value normally lies between 0 and 1, and it can be regarded as a measure of concordance for continuous variables.
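The paper does not state which ICC form was used. As a hedged sketch, assuming the two-way random-effects, absolute-agreement, single-rater form (often written ICC(2,1)), the coefficient can be computed from a cases × raters matrix of estimates via two-way ANOVA mean squares. The function name `icc_2_1` and the small 6 × 4 example matrix are illustrative only; real survey data with missing answers would need a method that handles incomplete designs.

```python
def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    ratings[i][j] is the estimate for case i by rater j (complete data only).
    """
    n = len(ratings)      # number of cases (targets)
    k = len(ratings[0])   # number of raters
    if n < 2 or k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("need a complete matrix with at least 2 cases and 2 raters")
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(ratings[i][j] for i in range(n)) / n for j in range(k)]
    # Sums of squares for cases, raters and residual error.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols
    msr = ss_rows / (n - 1)
    msc = ss_cols / (k - 1)
    mse = ss_error / ((n - 1) * (k - 1))
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)

# Illustrative matrix: 6 cases rated by 4 raters.
example = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]
print(round(icc_2_1(example), 2))  # 0.29
```

Raters in this example rank the cases similarly but use very different parts of the scale, which the absolute-agreement form penalises; that is the situation the ICC is meant to expose when gynaecologists give systematically higher or lower estimates.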
Second, we analysed the concordance between gynaecologists in their treatment decisions by calculating Cohen’s kappa (κ), a concordance statistic for categorical outcomes that corrects for agreement expected by chance.17 A κ of 1 indicates perfect concordance, whereas a κ of 0 indicates no more concordance than expected by chance; negative values, indicating less than chance agreement, are possible but uncommon.
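Cohen’s κ is defined for two raters, and how a single κ for all participating gynaecologists was obtained is not described here. One common extension, assumed in this sketch, averages Cohen’s κ over all pairs of raters (sometimes called Light’s κ). The function names and the example decisions are hypothetical.

```python
from collections import Counter
from itertools import combinations

def cohen_kappa(a, b):
    """Cohen's kappa for two raters' categorical decisions on the same cases.

    Undefined (division by zero) when chance agreement p_e equals 1.
    """
    if len(a) != len(b) or not a:
        raise ValueError("need two equal-length, non-empty decision lists")
    n = len(a)
    # Observed proportion of cases on which the two raters agree.
    p_o = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement from each rater's own marginal frequencies.
    count_a, count_b = Counter(a), Counter(b)
    p_e = sum(count_a[c] * count_b[c] for c in set(a) | set(b)) / n ** 2
    return (p_o - p_e) / (1 - p_e)

def mean_pairwise_kappa(decisions_by_rater):
    """Average Cohen's kappa over all rater pairs (one multi-rater extension)."""
    pairs = list(combinations(decisions_by_rater, 2))
    return sum(cohen_kappa(a, b) for a, b in pairs) / len(pairs)

# Hypothetical decisions of three raters on four vignettes.
rater_1 = ["expectant", "expectant", "IUI", "IVF"]
rater_2 = ["expectant", "IUI", "IUI", "IVF"]
rater_3 = ["expectant", "expectant", "IUI", "IUI"]
print(round(cohen_kappa(rater_1, rater_2), 3))  # 0.636
print(round(mean_pairwise_kappa([rater_1, rater_2, rater_3]), 3))  # 0.479
```

Fleiss’ κ is another widely used multi-rater statistic; it can give somewhat different values from the pairwise average, so the two should not be compared directly.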
Finally, we used logistic regression analysis to assess whether the estimated probabilities were associated with the proposed treatment regimens, using a significance level of 0.05. We assumed that 1 year of IUI treatment consists of six IUI cycles and 1 year of IVF treatment of three IVF cycles.
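The questionnaire asked for per-cycle estimates for IUI and IVF, whereas the treatments compared last six IUI or three IVF cycles. The paper does not state how per-cycle estimates were combined. A common simplification, assumed here, treats cycles as independent with a constant per-cycle probability p, so that the cumulative probability over n cycles is 1 − (1 − p)^n. This ignores drop-out and any decline in success over successive cycles, so it is a rough sketch rather than the authors’ method; the function name is hypothetical.

```python
def cumulative_probability(p_per_cycle: float, cycles: int) -> float:
    """Probability of at least one pregnancy in `cycles` independent cycles."""
    if not 0.0 <= p_per_cycle <= 1.0:
        raise ValueError("probability must lie in [0, 1]")
    if cycles < 0:
        raise ValueError("number of cycles must be non-negative")
    # Complement of failing in every cycle.
    return 1.0 - (1.0 - p_per_cycle) ** cycles

# Median per-cycle estimates for case 13 in Table 1.
iui_year = cumulative_probability(0.12, cycles=6)  # six IUI cycles
ivf_year = cumulative_probability(0.30, cycles=3)  # three IVF cycles
print(f"IUI, 6 cycles: {iui_year:.2f}")  # IUI, 6 cycles: 0.54
print(f"IVF, 3 cycles: {ivf_year:.2f}")  # IVF, 3 cycles: 0.66
```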
In the second questionnaire, the results of formal prediction models had been added to the case profiles. We assessed to what extent the prediction models influenced concordance between gynaecologists in their treatment decisions by comparing the concordance of treatment decisions before and after exposure to the prediction models. The overall change in treatment decisions (active treatment versus expectant management) was tested with McNemar’s test (significance level 0.05), a nonparametric test for two related dichotomous variables.
Of the 35 participating gynaecologists, 32 (91%) completed both questionnaires. Twenty-five respondents (71%) worked in specialised fertility units and 10 (29%) in general departments of obstetrics and gynaecology. Their mean age was 40.2 years (range 26–57 years) and their mean experience in clinical practice was 8.9 years (range 1.7–24 years). Twenty-eight respondents (80%) could offer IVF in their own clinic, and all could offer IUI. Twenty-four participants (69%) reported working with formal prediction models for spontaneous pregnancy; none had worked with prediction models for IUI or IVF.
The median probability estimates for spontaneous pregnancy, IUI and IVF and the subsequent treatment decisions are shown in Table 1. Concordance between gynaecologists was poor for the probability estimates of spontaneous pregnancy, pregnancy after IUI and pregnancy after IVF, with ICCs ranging from 0.05 to 0.34 (Table 2). Concordance between gynaecologists in the treatment decisions based on these intuitive probability estimates was also poor (κ: 0.21; Table 2).
Table 2. Concordance between gynaecologists with respect to probability estimates of pregnancy rates and treatment decisions
| Measure | Questionnaire 1 (without prediction models) | 95% CI | Questionnaire 2 (with prediction models) | 95% CI | Difference | 95% CI |
|---|---|---|---|---|---|---|
| Concordance of probability estimates for spontaneous pregnancy (ICC) | 0.34 | 0.17–0.69 | — | — | — | — |
| Concordance of probability estimates for pregnancy after IUI (ICC) | 0.05 | 0.01–0.19 | — | — | — | — |
| Concordance of probability estimates for pregnancy after IVF (ICC) | 0.15 | 0.06–0.43 | — | — | — | — |
| Concordance of treatment decisions between gynaecologists (κ) | 0.21 | 0.19–0.24 | 0.38 | 0.34–0.43 | 0.17 | 0.12–0.22 |
Overall, the treatment decisions made by gynaecologists were consistent with the ranking of their probability estimates. In univariable analysis, the choice of expectant management was significantly associated with the probability estimates of spontaneous pregnancy, IUI and IVF, with the difference between the estimates for spontaneous pregnancy and IUI, and with the difference between the estimates for spontaneous pregnancy and IVF (Table 3). In multivariable analysis, only the probability estimate of spontaneous pregnancy was a statistically significant predictor of the choice of expectant management (OR per 10% increase: 3.5, 95% CI: 2.8–4.3); no other factor was significant once this factor had entered the model.
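The odds ratios in Table 3 are expressed per 10% increase in the intuitive probability estimate. Assuming the logit is linear in the estimate, as a standard logistic model does, log-odds add and odds ratios multiply, so a 20-point increase corresponds to an OR of 3.5² = 12.25. The sketch below shows this scaling and how the OR shifts a baseline probability of choosing expectant management; the 30% baseline is hypothetical and the calculation ignores uncertainty in the estimate.

```python
import math

OR_PER_10_POINTS = 3.5  # multivariable OR for the spontaneous-pregnancy estimate (Table 3)

# Log-odds coefficient per percentage point, assuming a linear logit.
beta_per_point = math.log(OR_PER_10_POINTS) / 10

def odds_ratio_for(points: float) -> float:
    """OR for an increase of `points` percentage points in the estimate."""
    return math.exp(beta_per_point * points)

def shifted_probability(p0: float, points: float) -> float:
    """Apply the OR to a baseline probability p0 of choosing expectant management."""
    odds = p0 / (1 - p0) * odds_ratio_for(points)
    return odds / (1 + odds)

print(round(odds_ratio_for(20), 2))  # 12.25
print(round(shifted_probability(0.30, 10), 2))  # 0.6 (hypothetical 30% baseline)
```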
Table 3. Association between intuitive probability estimates and the subsequent treatment decision (expectant management for 6 months): results of univariable and multivariable logistic regression analyses
| Variable | Univariable OR | 95% CI | P | Multivariable OR | 95% CI | P |
|---|---|---|---|---|---|---|
| Intuitive probability for (per 10% increase) | | | | | | |
| Spontaneous pregnancy (within 12 months) | 3.5 | 2.8–4.3 | <0.001 | 3.5 | 2.8–4.3 | <0.001 |
| Pregnancy after IUI (six cycles) | 1.2 | 1.1–1.3 | <0.001 | — | | |
| Pregnancy after IVF (three cycles) | 1.6 | 1.4–1.9 | <0.001 | — | | |
| Difference between probability estimates of | | | | | | |
| Spontaneous pregnancy versus pregnancy after IUI | 1.6 | 1.5–1.8 | <0.001 | — | | |
| Spontaneous pregnancy versus pregnancy after IVF | 2.0 | 1.7–2.3 | <0.001 | — | | |
The results of the second questionnaire are shown in Table 2. After prediction models had been added to the case profiles, the concordance between gynaecologists in their treatment decisions increased from κ 0.21 to κ 0.38 (difference: 0.17, 95% CI: 0.12–0.22).
The impact of prediction models on the gynaecologists’ treatment decisions is shown in Table 4. Because of missing data, we could analyse only 495 of 560 treatment decisions (88%). Of the 192 initial proposals for expectant management, 24 (13%) were changed to IUI and 5 (3%) to IVF. Of the 188 proposals for IUI, 66 (35%) were changed to expectant management and 11 (6%) to IVF. Of the 115 proposals for IVF, 25 (22%) were changed to expectant management and 39 (34%) to IUI. Overall, the proportion of decisions for expectant management increased from 39% (192 of 495) to 51% (254 of 495) after prediction models had been added. At the same time, the proportion for IUI fell from 38% (188 of 495) to 35% (174 of 495) and that for IVF from 23% (115 of 495) to 14% (67 of 495). This overall decrease in active treatment, from 61 to 49% of decisions, was statistically significant (P < 0.001, McNemar’s test).
Table 4. Comparison of treatment decisions made by the participating gynaecologists in questionnaire 1 and questionnaire 2
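| Questionnaire 1 (without prediction rules) | Questionnaire 2 (with prediction rules): expectant management | Questionnaire 2: IUI | Questionnaire 2: IVF | Total |
|---|---|---|---|---|
| Expectant management | 163 | 24 | 5 | 192 |
| IUI | 66 | 111 | 11 | 188 |
| IVF | 25 | 39 | 51 | 115 |
| Total | 254 | 174 | 67 | 495 |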
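McNemar’s test for the shift between active treatment and expectant management uses only the discordant pairs: decisions that moved from expectant management to IUI or IVF (24 + 5 = 29) and decisions that moved from IUI or IVF to expectant management (66 + 25 = 91). In the sketch below, the unchanged IUI and IVF cells (111 and 51) are derived from the row totals of 188 and 115 reported in the text. Whether the authors applied a continuity correction is not stated, so both variants are shown; the upper tail of a chi-square distribution with one degree of freedom is computed as erfc(√(x/2)).

```python
import math

# Counts of paired decisions: outer keys = questionnaire 1, inner keys = questionnaire 2.
table = {
    "expectant": {"expectant": 163, "IUI": 24, "IVF": 5},
    "IUI": {"expectant": 66, "IUI": 111, "IVF": 11},
    "IVF": {"expectant": 25, "IUI": 39, "IVF": 51},
}
TREATMENTS = ("IUI", "IVF")

# Collapse to active treatment (IUI or IVF) versus expectant management.
to_treatment = sum(table["expectant"][t] for t in TREATMENTS)  # 29
to_expectant = sum(table[t]["expectant"] for t in TREATMENTS)  # 91

def mcnemar(b: int, c: int, continuity: bool = False):
    """McNemar chi-square (1 df) and two-sided P value from discordant counts."""
    diff = abs(b - c) - (1 if continuity else 0)
    stat = diff ** 2 / (b + c)
    p_value = math.erfc(math.sqrt(stat / 2))  # upper tail of chi-square with 1 df
    return stat, p_value

total = sum(sum(row.values()) for row in table.values())
treated_before = sum(sum(table[t].values()) for t in TREATMENTS)
treated_after = sum(table[r][t] for r in table for t in TREATMENTS)
print(f"treatment: {treated_before / total:.0%} -> {treated_after / total:.0%}")  # 61% -> 49%
for cc in (False, True):
    stat, p = mcnemar(to_treatment, to_expectant, continuity=cc)
    print(f"continuity={cc}: chi2={stat:.2f}, P={p:.1e}")
```

Both variants give a chi-square of about 31 to 32 with P far below 0.001, consistent with the reported result.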
The importance of estimating conception chances after completion of the fertility work up has been stressed before.1,18 The key question of whether gynaecologists are able to do so has received little attention. This study therefore aimed to determine the concordance between gynaecologists in predicting pregnancy chances and in making treatment decisions, and to assess to what extent prediction models influence this concordance.
Unfortunately, the gynaecologists differed widely in estimating pregnancy chances, and their proposed treatment regimens varied greatly. Their treatment decisions were nevertheless consistent with the ranking of their own probability estimates. When prediction models were used, concordance in treatment decisions increased only slightly, and gynaecologists more often proposed expectant management instead of IVF.
Why did the use of prediction models improve concordance only slightly? The most obvious explanation may be that doctors interpret the same pregnancy chances differently. One doctor may feel that expectant management is still justified when a 30% chance of pregnancy without treatment is expected to increase to 60% with IVF, whereas another might recommend IVF in this situation. Such a difference was seen when the treatment decisions of older gynaecologists were compared with those of younger ones: gynaecologists older than 40 years (n = 18) were slightly more inclined to adopt a conservative approach, proposing expectant management in 55% of all cases, compared with 48% of cases for gynaecologists younger than 40 years.
Another explanation may be that gynaecologists differ in whether they offer treatment to older women despite a reasonable prognosis with expectant management, in view of the shrinking time remaining for conception, particularly if the couple wish to have more than one child. This hypothesis is supported by the finding that 71% of the gynaecologists would advise treatment for older women (38 years) with a good prognosis (>30% in 1 year), whereas only 16% would do so for younger women (25 and 32 years) with a good prognosis. Female age has been reported to play an important role in clinical decision making in subfertility.19
In daily practice, the choice of treatment is made jointly by doctor and patient and depends on more variables than fecundity alone. For example, in several countries fertility treatment is not reimbursed, and the choice of treatment then depends strongly on the patient’s financial situation.20
Some of the participating gynaecologists already worked with prediction models in their clinic, and their probability estimates may have been influenced by this experience. Most of these clinicians used prediction models only for spontaneous pregnancy, not for IUI or IVF. This may explain why concordance in prediction was better for spontaneous pregnancy than for IUI and IVF.
Although the PCT is considered to be a controversial test,21,22 it is used in the majority of the prediction models that are available.3,6,7 For this reason only, we chose to use the PCT in the vignettes of the present study. However, we do not believe that omission of the PCT from the paper cases would affect the results and conclusions of the present study.
The use of prediction models improved concordance in treatment decisions from fair (κ: 0.21) to moderate (κ: 0.38). The clinical implication of this improvement can be illustrated with an example. Suppose that two gynaecologists counsel the same 100 subfertile couples and that each proposes expectant management, IUI and IVF in 39, 38 and 23% of couples, respectively, so that agreement expected by chance alone is about 35%. With a κ of 0.2 between their treatment decisions, they will agree on the treatment of 48 couples; an increase to a κ of 0.4 means that they will agree on the treatment of 61 couples.
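The arithmetic behind this example rests on the definition κ = (p_o − p_e)/(1 − p_e), rearranged to p_o = p_e + κ(1 − p_e), where p_o is observed agreement and p_e is agreement expected by chance. Assuming, as in the example, that both gynaecologists share the marginal frequencies 39, 38 and 23%, p_e = 0.39² + 0.38² + 0.23² ≈ 0.35. A short sketch with hypothetical function names:

```python
def expected_agreement(marginals):
    """Chance agreement when both raters share the same category frequencies."""
    return sum(p * p for p in marginals)

def observed_agreement(kappa, marginals):
    """Invert kappa = (p_o - p_e) / (1 - p_e) to recover p_o."""
    p_e = expected_agreement(marginals)
    return p_e + kappa * (1 - p_e)

marginals = (0.39, 0.38, 0.23)  # expectant management, IUI, IVF
for kappa in (0.2, 0.4):
    agree = observed_agreement(kappa, marginals)
    print(f"kappa={kappa}: agree on {round(100 * agree)} of 100 couples")
# kappa=0.2: agree on 48 of 100 couples
# kappa=0.4: agree on 61 of 100 couples
```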
One previous study on this topic reported good concordance for the estimation of the probability of spontaneous conception (ICC: 0.71),2 in contrast to the much lower concordance for spontaneous pregnancy found in our study (ICC: 0.34). This difference may be explained by the fact that the previous study used only four cases, two of which had rather extreme profiles. For IVF, however, concordance of probability estimates was poor in both the previous study and ours (ICC: 0.24 and 0.15, respectively). The previous study did not assess the final treatment decision.
Gynaecologists favoured expectant management over IVF once prediction models were available. A possible explanation is that gynaecologists felt more comfortable with expectant management when they could rely on predicted pregnancy rates rather than on their own intuitive estimates. If so, prediction models could not only reduce treatment differences between gynaecologists but also prevent treatment at an early stage of subfertility, when prospects for spontaneous pregnancy are still good.
In conclusion, our study shows that gynaecologists differ widely in estimating the prognosis of fictional subfertile couples, and that their proposed treatment regimens vary just as widely. The use of prediction models for spontaneous pregnancy, IUI and IVF is thus not in itself sufficient to guarantee uniform counselling of subfertile couples. Future studies evaluating the potential effectiveness of prediction models should therefore provide clinicians with guidelines on how to use these models in treatment decisions.
The authors thank all participating gynaecologists for their contribution to this study. We also thank M Pel (Department of Obstetrics and Gynaecology, Academic Medical Centre, Amsterdam, the Netherlands) for her critical reading of and valuable suggestions on previous drafts of this paper.
This study was facilitated by grant 945/12/002 from ZonMW, The Netherlands Organization for Health Research and Development, The Hague, The Netherlands.