Comparison of major depression diagnostic classification probability using the SCID, CIDI, and MINI diagnostic interviews among women in pregnancy or postpartum: An individual participant data meta‐analysis

Abstract
Objectives: A previous individual participant data meta‐analysis (IPDMA) identified differences in major depression classification rates between different diagnostic interviews, controlling for depressive symptoms on the basis of the Patient Health Questionnaire‐9. We aimed to determine whether similar results would be seen in a different population, using studies that administered the Edinburgh Postnatal Depression Scale (EPDS) in pregnancy or postpartum.
Methods: Data accrued for an EPDS diagnostic accuracy IPDMA were analysed. Binomial generalised linear mixed models were fit to compare depression classification odds for the Mini International Neuropsychiatric Interview (MINI), Composite International Diagnostic Interview (CIDI), and Structured Clinical Interview for DSM (SCID), controlling for EPDS scores and participant characteristics.
Results: Among fully structured interviews, the MINI (15 studies, 2,532 participants, 342 major depression cases) classified depression more often than the CIDI (3 studies, 2,948 participants, 194 major depression cases; adjusted odds ratio [aOR] = 3.72, 95% confidence interval [CI] [1.21, 11.43]). Compared with the semistructured SCID (28 studies, 7,403 participants, 1,027 major depression cases), odds with the CIDI (interaction aOR = 0.88, 95% CI [0.85, 0.92]) and MINI (interaction aOR = 0.95, 95% CI [0.92, 0.99]) increased less as EPDS scores increased.
Conclusion: Different interviews may not classify major depression equivalently.

Semistructured diagnostic interviews, such as the Structured Clinical Interview for DSM (SCID; First, 1995), are designed to be administered by clinically trained professionals, who may insert unscripted queries and use judgement to decide whether symptoms are present.
Fully structured interviews, such as the Composite International Diagnostic Interview (CIDI; Robins et al., 1988), are completely scripted and can be administered by lay interviewers. The Mini International Neuropsychiatric Interview (MINI; Lecrubier et al., 1997; Sheehan et al., 1997) is a very brief fully structured interview that was designed for rapid administration and intended to be overinclusive.
However, a recent individual participant data meta-analysis (IPDMA) of 57 studies (17,158 participants) from diverse settings that controlled for participant characteristics and depressive symptom severity on the basis of the Patient Health Questionnaire-9 (PHQ-9) found that, among fully structured interviews, the MINI classified depression about twice as often as the CIDI. Compared with semistructured interviews, fully structured interviews (MINI excluded) classified fewer participants with high-level depressive symptoms as depressed (Levis et al., 2018). This was the first large study to compare major depression classification across diagnostic interviews. However, it is important to determine if findings can be replicated in more than a single study.
The present study aimed to determine whether similar associations between diagnostic interview and major depression classification could be seen among an independent set of studies that administered the Edinburgh Postnatal Depression Scale (EPDS; Cox, Holden, & Sagovsky, 1987) to women who were pregnant or had recently given birth, also using an IPDMA approach. As in the previous study, we first compared major depression classification odds within fully structured interviews to determine if different fully structured interviews perform differently (MINI vs. CIDI). Then, we compared the CIDI and MINI with the semistructured SCID, separately. In each case, we controlled for participant characteristics and depressive symptom severity on the basis of EPDS scores. Finally, we tested whether differences in classification rates between interviews were associated with depressive symptom severity.

| METHODS
We used data accrued for an IPDMA on the diagnostic accuracy of the EPDS, which is the most commonly used depression screening tool for women in pregnancy or postpartum (Hewitt et al., 2009). The IPDMA was registered in PROSPERO (CRD42015024785), a protocol was published (Thombs et al., 2015), and results were reported follow-

| Identification of eligible studies
For the main IPDMA, data sets from articles in any language were eligible for inclusion if (a) they included diagnostic classification for current major depressive disorder (MDD) or major depressive episode (MDE) using any version of Diagnostic and Statistical Manual of Mental Disorders (DSM; American Psychiatric Association [APA], 1987; APA, 1994; APA, 2000) or International Classification of Diseases (ICD; World Health Organization, 1992) criteria on the basis of a validated semistructured or fully structured interview; (b) they included EPDS scores; (c) the diagnostic interview and EPDS were administered within 2 weeks of each other, because DSM and ICD criteria specify that symptoms must have been present in the last 2 weeks; (d) participants were women aged ≥18 years who were not recruited from youth or college settings; and (e) participants were not recruited from psychiatric settings or because they were identified as having symptoms of depression, given that screening is done to identify previously unrecognised cases. For the present study, we only included studies that assessed major depression using the SCID, CIDI, or MINI because there were only three studies that used other interviews.
Data sets where not all participants were eligible were included if primary data allowed selection of eligible participants. For defining major depression, we considered MDD or MDE on the basis of the DSM or ICD. If more than one was reported, we prioritised MDE over MDD, because screening would attempt to detect depressive episodes and further interview would determine if the episode is related to MDD or bipolar disorder, and DSM over ICD. Two investigators independently reviewed titles and abstracts for eligibility. If either deemed a study potentially eligible, full-text review was done by two investigators, independently, with disagreements resolved by consensus, consulting a third investigator when necessary. A translator was consulted for determining the eligibility of one Chinese article.

| Data extraction, contribution, and synthesis
Authors of eligible data sets were invited to contribute de-identified primary data. We emailed corresponding authors of eligible primary studies at least three times, as necessary. If we did not receive a response, we emailed co-authors and attempted to contact corresponding authors by phone.
The diagnostic interview used as the reference standard and the country of each study were extracted from published reports by two investigators independently, with disagreements resolved by consensus. Countries were categorised as "very high," "high," or "low-medium" development on the basis of the United Nations Human Development Index, a statistical composite index that includes indicators of life expectancy, education, and income (United Nations, 2019). Participant-level data provided in data sets included age, pregnancy status (pregnant vs. postpartum), EPDS scores, and major depression status.
Individual participant data were converted to a standard format and synthesised into a single data set with study-level data. We compared published participant characteristics and diagnostic accuracy results with results from raw data sets and resolved any discrepancies in consultation with the original investigators. For the present study, we restricted our data to participants with complete data for all variables included in our analyses. Then, for studies that collected data at multiple time points, we restricted our data to the time point with the most participants. If there was a tie, we selected the time point with the largest number of major depression cases.
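The time-point selection rule described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' code: the record keys (`study`, `time_point`, `major_depression`) and the toy data are hypothetical, and the records are assumed to have already been restricted to participants with complete data. The text does not say how a tie on both criteria was handled; this sketch simply keeps the first such time point it encounters.

```python
from collections import defaultdict


def select_time_points(records):
    """Keep, for each study, only the records from a single time point.

    Rule from the text: choose the time point with the most participants;
    break ties by the largest number of major depression cases.
    `records` is a list of dicts with hypothetical keys
    'study', 'time_point', and 'major_depression' (bool).
    """
    # Count participants and major depression cases per (study, time point).
    counts = defaultdict(lambda: [0, 0])
    for r in records:
        key = (r["study"], r["time_point"])
        counts[key][0] += 1
        counts[key][1] += int(r["major_depression"])

    # For each study, keep the time point with the largest (participants, cases).
    best = {}
    for (study, tp), (n, cases) in counts.items():
        if study not in best or (n, cases) > best[study][0]:
            best[study] = ((n, cases), tp)

    return [r for r in records if best[r["study"]][1] == r["time_point"]]


# Toy data (hypothetical): study A has two time points tied on participants,
# so the one with more major depression cases (time point 2) is kept.
records = [
    {"study": "A", "time_point": 1, "major_depression": False},
    {"study": "A", "time_point": 1, "major_depression": False},
    {"study": "A", "time_point": 2, "major_depression": True},
    {"study": "A", "time_point": 2, "major_depression": False},
    {"study": "B", "time_point": 1, "major_depression": True},
]
selected = select_time_points(records)
```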

| Statistical analyses
To isolate the association between diagnostic assessment method and major depression classification, we estimated binomial generalised linear mixed models with a logit link function. All analyses controlled for depressive symptom severity (continuous EPDS scores), age (continuous), country Human Development Index (very high, high, or low-medium), and pregnant versus postpartum status. Given that each study only administered one diagnostic interview, these covariates were included in analyses to account for their potential influence on major depression classification. Covariates were chosen a priori on the basis of their potential influence on major depression classification as well as their availability across primary studies. To account for correlation between subjects within the same primary study, a random intercept was fit for each primary study. Fixed slopes were estimated for EPDS score, diagnostic interview, age, Human Development Index, and pregnant versus postpartum status.
We estimated generalised linear mixed models to compare major depression classification odds for MINI versus CIDI, CIDI versus SCID, and MINI versus SCID. We then fit additional models including an interaction between interview and EPDS score. All analyses were run in R using the glmer function within the lme4 package.
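The models described above can be written out as follows. This is a sketch of one plausible specification consistent with the text, not the authors' exact parameterisation: the symbols, the dummy coding, and the choice of reference categories (very high Human Development Index, pregnant) are assumptions made here for illustration. Participant i is nested in primary study j, and the random intercept is taken to be normally distributed, as is standard for this model class.

```latex
\begin{align*}
\operatorname{logit}\,\Pr(Y_{ij}=1)
  &= \beta_0 + u_j + \beta_1\,\mathrm{EPDS}_{ij} + \beta_2\,\mathrm{Interview}_j
     + \beta_3\,\mathrm{Age}_{ij} \\
  &\quad + \beta_4\,\mathrm{HDI}^{\text{high}}_j
     + \beta_5\,\mathrm{HDI}^{\text{low-medium}}_j
     + \beta_6\,\mathrm{Postpartum}_{ij},
  \qquad u_j \sim \mathcal{N}(0, \sigma_u^2) \\[6pt]
\text{Interaction model:}\quad
\operatorname{logit}\,\Pr(Y_{ij}=1)
  &= \text{(terms above)} + \beta_7\,\mathrm{EPDS}_{ij} \times \mathrm{Interview}_j
\end{align*}
```

Under this notation, the adjusted odds ratio for one interview versus another is exp(β2), and the interaction adjusted odds ratio is exp(β7), the factor by which that odds ratio changes for each 1-point increase in EPDS score.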

| RESULTS
Of 3,418 unique titles and abstracts identified from the database search, 3,097 were excluded after title and abstract review and 226 were excluded after full-text review, leaving 95 eligible articles with data from 64 unique participant samples, of which 45 (70% of data sets; 70% of participants) contributed data (Figure 1). Reasons for exclusion for the articles excluded at the full-text level are given in Table S1. In addition, authors of included studies contributed data from an additional eligible study that was not identified in the search, for a total of 46 data sets. Characteristics of included studies and eligible studies that did not provide data sets are shown in Table S2. In total, 12,759 participants (1,553 [12%] with major depression) were included, none of whom were included in the previous PHQ-9 analysis (Levis et al., 2018).
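The counts in this paragraph can be reconciled with a few lines of arithmetic. The Python sketch below uses only the figures reported above; the variable names are chosen here for illustration.

```python
# Study selection flow, using the counts reported in the text.
unique_records = 3418
excluded_title_abstract = 3097
excluded_full_text = 226
eligible_articles = unique_records - excluded_title_abstract - excluded_full_text

# Data set contribution.
eligible_samples = 64
contributed_from_search = 45
additional_data_sets = 1  # eligible study not identified in the search
total_data_sets = contributed_from_search + additional_data_sets
pct_contributed = round(100 * contributed_from_search / eligible_samples)

# Overall prevalence of major depression among included participants.
participants = 12759
cases = 1553
pct_cases = round(100 * cases / participants)

print(eligible_articles, total_data_sets, pct_contributed, pct_cases)
```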
As shown in Figure 2 and Table S3, for all interviews, the proportion with major depression generally increased as EPDS scores increased.
Model coefficients for each analysis are shown in Table S4.
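To make the interaction estimates easier to read, the short Python sketch below works through what the interaction aORs reported in the abstract (0.88 for CIDI vs. SCID, 0.95 for MINI vs. SCID) imply across a range of EPDS scores. It assumes the interaction is per 1-point increase in continuous EPDS score, so the interview-versus-SCID odds ratio is multiplied by the interaction aOR for each additional point. The 5-point increase is an arbitrary illustration, and only point estimates are used, so the uncertainty expressed by the confidence intervals is ignored.

```python
def relative_or_change(interaction_aor, epds_points):
    """Multiplicative change in the interview-vs-SCID odds ratio across a
    given increase in EPDS score, assuming a per-point interaction on the
    log-odds scale (as in a logit model)."""
    return interaction_aor ** epds_points


# Interaction aORs as reported in the abstract; 5 points is illustrative.
cidi_change = relative_or_change(0.88, 5)
mini_change = relative_or_change(0.95, 5)

print(f"CIDI vs. SCID odds ratio, per 5-point EPDS increase: x{cidi_change:.2f}")
print(f"MINI vs. SCID odds ratio, per 5-point EPDS increase: x{mini_change:.2f}")
```

Under these assumptions, the CIDI-versus-SCID odds ratio roughly halves (factor of about 0.53) across a 5-point increase in EPDS score, whereas the MINI-versus-SCID odds ratio falls by about a quarter (factor of about 0.77).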

| DISCUSSION
We compared depression classification across diagnostic interviews in studies that administered the EPDS to women in pregnancy or postpartum, controlling for participant characteristics and depressive symptom severity on the basis of EPDS scores. Among fully structured interviews, odds of major depression were substantially higher for the MINI than the CIDI. As depressive symptom severity increased, the probability of diagnosis increased more for the MINI than for the CIDI. There were no definitive differences in classification odds between the CIDI and SCID or between the MINI and SCID, but, as EPDS scores increased, likelihood of classification increased less for the CIDI and MINI than for the SCID. Results were similar to those of our previous study that assessed depressive symptom severity in diverse patient groups with the PHQ-9 (Levis et al., 2018). In that study, on the basis of subgroup analyses by PHQ-9 scores, we found that the CIDI classified fewer participants with high-level depressive symptoms as depressed than the SCID. Due to limited numbers of participants and major depression cases for each interview across EPDS scores in the present study, we were unable to conduct subgroup analyses based on EPDS scores. However, our interaction analyses were generally consistent with previous findings.
There are limitations to consider. First, we were unable to obtain primary data for 19 of 64 eligible data sets identified in our search (30% of data sets; 30% of participants). Second, only three included studies used the CIDI, one of which had only one major depression case. Third, across interviews, there were few participants with high EPDS scores and few major depression cases with low EPDS scores.
For the CIDI, data were sparse across EPDS scores. Notwithstanding,

FIGURE 1 Flow diagram of study selection process

| CONCLUSION
The previous PHQ-9 IPDMA found that different diagnostic interviews may not be equivalent for major depression classification. In the present study, we observed similar patterns. The CIDI and MINI were designed as less resource-intensive options that can be administered by research staff without diagnostic skills, but they may misclassify major depression in substantial numbers of patients compared with the SCID. The findings of both the previous and present IPDMAs suggest that different interviews may not classify major depression equivalently and should be combined in meta-analyses with caution.

DATA ACCESSIBILITY
Requests to access data should be made to the corresponding author at brett.thombs@mcgill.ca.

SUPPORTING INFORMATION
Additional supporting information may be found online in the Supporting Information section at the end of this article.