We agree, and have pointed out in our Conclusions, that the predictive model needs to be validated in a new cohort to confirm its predictive ability. When we performed the analyses, we decided not to split our data set into two halves, developing the model in one and testing it in the other. Given the limited number of patients, such a split would have substantially reduced the statistical power of the study. Our results must therefore be replicated in another cohort.
The prediction models in the paper were created using stepwise logistic regression [2]. The calculations yielded several models with nearly identical predictive capacity, of which we chose to present the best. A bootstrapping approach would likewise have shown many of these models to perform almost equally well, which could have produced misleading results whether presented for the single chosen model or for all possible models. Therefore, we chose not to use bootstrapping methodology in this study.
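As a minimal illustration of how near-equivalent candidates can behave under resampling, the sketch below draws stratified bootstrap samples from a purely synthetic cohort and counts how often each of two equally informative predictors is picked. Everything in it is an assumption made for illustration: the marker names, the effect size, the cohort size, and above all the selection rule (largest difference in group means), which is a deliberately simple stand-in for stepwise logistic regression and not the procedure used in the study.

```python
import random
from collections import Counter
from statistics import mean

# Hypothetical marker names; both are simulated with the same true effect.
MARKERS = ("marker_a", "marker_b")


def make_synthetic_cohort(rng, n_cases=30, n_controls=30):
    """Simulate a small cohort in which both markers are equally informative."""
    cohort = []
    for outcome, n in ((1, n_cases), (0, n_controls)):
        for _ in range(n):
            row = {"outcome": outcome}
            for name in MARKERS:
                # Cases are shifted by 0.8 SD on every marker (assumed effect size).
                row[name] = rng.gauss(0.8 * outcome, 1.0)
            cohort.append(row)
    return cohort


def best_marker(rows):
    """Simplified selection rule: largest absolute difference in group means."""
    cases = [r for r in rows if r["outcome"] == 1]
    controls = [r for r in rows if r["outcome"] == 0]

    def separation(name):
        return abs(mean([r[name] for r in cases]) - mean([r[name] for r in controls]))

    return max(MARKERS, key=separation)


def selection_frequency(cohort, n_resamples, rng):
    """Count how often each marker wins across stratified bootstrap resamples."""
    cases = [r for r in cohort if r["outcome"] == 1]
    controls = [r for r in cohort if r["outcome"] == 0]
    counts = Counter()
    for _ in range(n_resamples):
        # Resample cases and controls separately, with replacement.
        sample = rng.choices(cases, k=len(cases)) + rng.choices(controls, k=len(controls))
        counts[best_marker(sample)] += 1
    return counts


rng = random.Random(0)  # fixed seed so the run is reproducible
cohort = make_synthetic_cohort(rng)
counts = selection_frequency(cohort, 500, rng)
print(dict(counts))
```

When candidates are close in performance, the winner can change from one resample to the next, so a selection frequency reported for any single candidate depends heavily on how the competing candidates are defined.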
We agree that sensitivity, specificity, positive and negative predictive values, positive likelihood ratio (LR+), negative likelihood ratio (LR−) and odds ratio are among the measures that can be used to evaluate the validity of a test against a gold standard. As stated in the Discussion, the gold standard in our study is cervical length measurement by transvaginal ultrasound [2,3]. We believe that we have addressed Dr Sabour's concern in Table 4 [2], which compares our final model, the best model based on the biomarkers alone, and the gold standard. We also believe that we have addressed the LR− issue in Table 3 [3].
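For readers less familiar with these measures, the sketch below shows how each is conventionally derived from a 2×2 table of test result against gold standard. The counts are invented for illustration only and are not taken from the study or its tables; the class and function names are likewise hypothetical. The sketch assumes that no cell of the table is zero, so that every ratio is defined.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """Counts of test result versus gold standard (hypothetical container)."""
    tp: int  # test positive, gold standard positive
    fp: int  # test positive, gold standard negative
    fn: int  # test negative, gold standard positive
    tn: int  # test negative, gold standard negative


def diagnostic_measures(t: TwoByTwo) -> dict:
    """Compute the standard validity measures; assumes no zero cells."""
    sensitivity = t.tp / (t.tp + t.fn)
    specificity = t.tn / (t.tn + t.fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": t.tp / (t.tp + t.fp),  # positive predictive value
        "npv": t.tn / (t.tn + t.fn),  # negative predictive value
        "lr_pos": sensitivity / (1 - specificity),  # LR+
        "lr_neg": (1 - sensitivity) / specificity,  # LR-
        "odds_ratio": (t.tp * t.tn) / (t.fp * t.fn),  # diagnostic odds ratio
    }


# Invented counts, for illustration only.
example = TwoByTwo(tp=30, fp=10, fn=20, tn=40)
measures = diagnostic_measures(example)
for name, value in measures.items():
    print(f"{name}: {value:.3f}")
```

The diagnostic odds ratio computed from the cell counts equals LR+ divided by LR−, which is why the three are usually reported together.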