Maternal major depression disorder misclassification errors: Remedies for valid individual‐ and population‐level inference

Abstract Individual- and population-level inference about the risk and burden of major depressive disorder (MDD), particularly maternal MDD, is often made using case-finding tools that are imperfect and prone to misclassification error (i.e., false positives and negatives). These errors or biases are rarely accounted for and lead to inappropriate clinical decisions, inefficient allocation of scarce resources, and poor planning of maternal MDD prevention and treatment interventions. The argument that the use of existing maternal MDD case-finding instruments results in misclassification errors is not new; in fact, it has been made for decades, but its implications, and particularly how to correct for these errors for valid inference, remain largely unexplored. Correction of the estimates of maternal MDD prevalence and of case-finding tool sensitivity and specificity is possible and should be done to inform valid individual- and population-level inferences.

Some cohort studies have suggested that age at first onset (which can occur at any time) may reflect different causal mechanisms (Burke et al., 1991; Kessler et al., 2007; Weissman et al., 1988). First-time diagnosis during childhood may be indicative of genetic predisposition (Hazell, 2002a; Rice et al., 2002) or exposure to psychosocial childhood adversity (Hazell, 2002a). During adolescence, etiology has been mainly attributed to psychosocial and economic factors (Birmaher et al., 1996; Hazell, 2002b). At this age, a disparity in MDD incidence and prevalence by sex emerges, with significantly higher incidence and prevalence among girls than boys (Hankin & Abramson, 2001; Nolen-Hoeksema & Girgus, 1994). Reasons for this disparity include differences in biological mechanisms, stress sensitivity, culture, and stress coping strategies between males and females (Nolen-Hoeksema, 1991; Nolen-Hoeksema et al., 1991; Shih et al., 2006). This gender disparity in morbidity persists into adulthood. Women, as a function of changing biological and hormonal factors, remain at high risk of MDD during their childbearing years (Kessler, 2003; Kessler et al., 1994), particularly during prenatal and postnatal periods.

EFFECTS OF MATERNAL MDD
Among pregnant women, MDD negatively affects fetal health (Chung et al., 2001; Dieter et al., 2008; Kinsella & Monk, 2009; Allister et al., 2001; Sandman et al., 2003). Among pregnant women with MDD, higher baseline fetal heart rate (FHR) and delayed habituation poststimulation are associated with hypothalamic-pituitary-adrenal (HPA) axis dysregulation (linked to higher levels of glucocorticoid transfer from mother to fetus) that negatively impacts fetal development (Gilles et al., 2018; Sandman et al., 2003). Higher levels of fetal glucocorticoid exposure are associated with lower birth weight and shorter gestation at delivery (Gilles et al., 2018). Similar findings supportive of a causal hypothesis were reported in a prospective cohort study examining the association between FHR and general psychosocial stress, a risk factor for maternal MDD (DiPietro et al., 1996). Overall, MDD during pregnancy is linked to increased risk of negative obstetric and neonatal outcomes such as preeclampsia, premature delivery, and low birth weight (Buss et al., 2012; Chung et al., 2001). During the postnatal period, these effects may be compounded by poor mother-child interactions and nurturing among mothers with MDD, putting children at high risk of infant morbidity and mortality (as a function of either neglect or abuse), delay in meeting appropriate developmental milestones, and behavioral problems (Lovejoy, 1991; Surkan et al., 2012, 2014).

MATERNAL MDD DETECTION AND DIAGNOSIS
Similar to a diagnosis of MDD in the general population, maternal MDD is not an objective diagnosis because it is in part based on subjective experiences and perceptions. As a consequence of its subjective nature, a number of different maternal MDD tools have been adopted for screening, case-finding, and diagnosis as well as for monitoring treatment progress (Myers et al., 2013). The operational definitions of MDD under these tools typically involve a count and weighting of symptoms that are present over a period of 1 or 2 weeks. The number of symptoms present (including their severity ratings) is used to set a threshold above which a patient meets the MDD operational definition (Myers et al., 2013; Pignone et al., 2002). Often, the diagnostic performance of these case-finding tools is confounded by different perceptions, cultures, and assessment periods (prenatal versus postnatal) (Horwitz et al., 2007; Owora et al., 2016b). Indeed, maternal MDD is often under- or overdiagnosed due to the presence of symptoms that mimic those of normal prenatal and postnatal periods (Owora et al., 2016b). Heterogeneity has also been demonstrated in existing diagnostic accuracy studies (Owora et al., 2016a, 2016b), in part due to clinical diversity (i.e., differences between study participants) and methodological diversity (i.e., differences in the measurement, timing, and definition of MDD). These differences have important implications for the validity of case-finding tools used to classify mothers as either MDD-positive or MDD-negative. Some studies (Levis et al., 2020) have attempted to address potential misclassification by using higher cutoff values and/or redesigning self-reported questions (e.g., Edinburgh Postnatal Depression Scale) to reduce confusion between MDD and normal prenatal and postpartum symptoms, with mixed results for the reduction of false positives and false negatives.

MATERNAL MDD: INDIVIDUAL- AND POPULATION-LEVEL INFERENCE
In psychiatry, there continues to be a paucity of research on the impact of imperfect case-finding tools on individual- and population-level inference. Yet, if unaccounted for, misclassification of psychiatric disorders, such as MDD, can lead to inappropriate clinical decisions in patient care (e.g., treating or referring a patient without MDD for further diagnostic work-up or failing to do so for patients with MDD). Such misclassification may be more prevalent among nonspecialist than specialist clinicians in primary care and/or public health prevention program settings (Horwitz et al., 2007; Myers et al., 2013).
At the population level, estimation of disease burden or risk is hampered, with direct implications for the allocation of public health resources and the design or targeting of prevention efforts, respectively. The argument that the use of existing case-finding tools results in misclassification bias is not new; in fact, it has been made for decades, but its implications, and particularly how to correct for these errors for valid inference, remain largely unexplored.
In this perspective piece, I revisit why MDD measurement errors or bias are important to consider from a clinical and public health view using recent epidemiologic studies. This article is not intended to be a comprehensive review; rather, I have deliberately selected articles, some of which are my own (Owora & Carabin, 2018; Owora et al., 2019, 2016a), to illustrate how existing estimates of maternal MDD prevalence and of case-finding tool sensitivity and specificity can be used to generate accurate estimates of risk, burden, and association to inform valid individual- and population-level inference.

CORRECTING FOR MISCLASSIFICATION ERRORS TO MAKE INDIVIDUAL-LEVEL INFERENCE
The concept of quantifying perceptions or impressions in clinical decision making, especially regarding diagnosis and prognosis, is not new.
For instance, quantifying the likelihood of a specific diagnosis (i.e., the presence or absence of disease) is particularly appealing in the absence of confirmatory diagnostic testing. To evaluate a disease hypothesis based on nonconfirmatory test results, a positive predictive value (PPV) is defined as the probability of disease (e.g., MDD) given a positive test result (e.g., a Center for Epidemiologic Studies Depression Scale 20-item questionnaire [CESD20] score ≥16, indicating moderate or severe symptoms). Conversely, a negative predictive value (NPV) is defined as the probability of no disease given a negative test result (e.g., no or few MDD-related symptoms reported on the CESD20).
When combined with disease prevalence or pretest probability of disease (P_D) and known test properties, such as sensitivity (Se) and specificity (Sp), using Bayes' theorem, conditional probabilities (PPV and NPV) and likelihood ratios can be used to make individual-level inference about the probability of disease given a test result, that is, P(D|T).
Evidently, for fixed Se and Sp, the higher the disease prevalence (P_D), the higher the expected PPV and the lower the expected NPV. Moreover, these values are expected to vary by the test score cut points used to define MDD status.
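The Bayes' theorem calculation described above can be sketched in a few lines. This is a minimal illustration, assuming the CESD20 summary values reported later in this article (Se = 0.84, Sp = 0.78); the function name `predictive_values` and the grid of prevalence values are illustrative choices and do not come from the cited studies.

```python
def predictive_values(se: float, sp: float, prevalence: float) -> tuple[float, float]:
    """Return (PPV, NPV) from sensitivity, specificity, and pretest probability."""
    true_pos = se * prevalence                # P(T+ and D+)
    false_pos = (1 - sp) * (1 - prevalence)   # P(T+ and D-)
    true_neg = sp * (1 - prevalence)          # P(T- and D-)
    false_neg = (1 - se) * prevalence         # P(T- and D+)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


if __name__ == "__main__":
    # Assumed (illustrative) prevalence values; Se and Sp as cited for the CESD20.
    for prevalence in (0.05, 0.10, 0.20, 0.40):
        ppv, npv = predictive_values(0.84, 0.78, prevalence)
        print(f"prevalence={prevalence:.2f}  PPV={ppv:.3f}  NPV={npv:.3f}")
```

At a pretest probability of 10%, this gives a PPV of about 0.30 and an NPV of about 0.98; running the loop shows how both values shift as the assumed prevalence changes.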
On the other hand, likelihood ratios provide an intuitive and straightforward interpretation. A likelihood ratio is a ratio of two conditional probabilities: the probability of a given test result (positive or negative) when the disease is present divided by the probability of the same result when the disease is absent. Two variants are therefore needed, one for when an individual's test is positive (positive likelihood ratio: LR+) and another for when it is negative (negative likelihood ratio: LR−).
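In terms of Se and Sp, the two variants take the following standard forms. They are written here in LaTeX as a convenience, with T± denoting a positive or negative test result and D± the presence or absence of disease; this notation is assumed for the illustration.

```latex
\[
LR^{+} = \frac{P(T^{+}\mid D^{+})}{P(T^{+}\mid D^{-})} = \frac{Se}{1 - Sp},
\qquad
LR^{-} = \frac{P(T^{-}\mid D^{+})}{P(T^{-}\mid D^{-})} = \frac{1 - Se}{Sp}.
\]
```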
Applied to MDD, the post-test probability of disease, P(D|T), can be derived from the post-test odds (i.e., the product of the pre-test odds and the likelihood ratio) as:

$$\text{post-test odds} = \text{pre-test odds} \times \text{likelihood ratio} \tag{5}$$

and

$$\text{post-test probability} = \frac{\text{post-test odds}}{1 + \text{post-test odds}},$$

where

$$\text{odds} = \frac{P_D}{1 - P_D} \quad \text{and} \quad P_D = \frac{\text{odds}}{1 + \text{odds}},$$

with $P_D$ denoting the probability of having MDD. It should be noted, however, that the above example is only for illustration purposes and is not a substitute for a full clinical workup and differential diagnoses, but hopefully augments that process for better clinical judgment among mental healthcare providers.
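The odds-based update can be traced numerically with a short sketch. It assumes the CESD20 summary values cited in this article (Se = 0.84, Sp = 0.78) and a pretest probability of 10%; the function names are illustrative only.

```python
def likelihood_ratios(se: float, sp: float) -> tuple[float, float]:
    """Return (LR+, LR-) using the standard definitions Se/(1-Sp) and (1-Se)/Sp."""
    return se / (1 - sp), (1 - se) / sp


def post_test_probability(pretest_probability: float, likelihood_ratio: float) -> float:
    """Convert a pretest probability to a post-test probability via odds."""
    pretest_odds = pretest_probability / (1 - pretest_probability)
    post_test_odds = pretest_odds * likelihood_ratio
    return post_test_odds / (1 + post_test_odds)


if __name__ == "__main__":
    lr_pos, lr_neg = likelihood_ratios(0.84, 0.78)  # assumed CESD20 Se and Sp
    pretest = 0.10                                  # assumed pretest probability
    print(f"LR+ = {lr_pos:.2f}, LR- = {lr_neg:.2f}")
    print(f"P(MDD | positive test) = {post_test_probability(pretest, lr_pos):.3f}")
    print(f"P(MDD | negative test) = {post_test_probability(pretest, lr_neg):.3f}")
```

Under these assumptions, a positive result raises the probability of MDD from 0.10 to about 0.30, and a negative result lowers it to about 0.02. The first figure matches the PPV and the second matches 1 − NPV for the same inputs, as expected.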
It is important to note, however, that the use of the likelihood ratio approach for individual-level inference is not without its own limitations. For example, (1) a given LR+ value (e.g., 10) can be generated from different combinations of Se and Sp (e.g., 10 and 99% or 40 and 96%, respectively); (2) LRs are not linear (i.e., the formula involves division); and (3) the precision of high and low LRs is low. Despite these limitations, the translation of the likelihood ratio approach for individual-level inference using a nomogram (Fagan, 1975) can enhance its clinical utility. If the prevalence of disease and likelihood ratios are known, one can easily find the P(D|T) associated with a particular test result (±).

Prevalence estimation
In our recent article, we illustrate the prevalence estimation problem using results of the CESD20 (Owora & Carabin, 2018). Based on recent meta-analysis results (Owora et al., 2016b), the CESD20 is estimated to have, on average, a sensitivity (Se) of 84% and a specificity (Sp) of 78% for identifying patients with moderate or severe MDD symptoms based on a total score cut point of 16. If the CESD20 were administered to 1000 women in a population with a "true" MDD prevalence of 10%, we expect the results shown in Table 1.
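The expected cell counts for this scenario follow directly from the stated prevalence, Se, and Sp. A minimal sketch is given here; the function name `expected_counts` and the dictionary keys are illustrative and are not taken from the cited article.

```python
def expected_counts(n: int, prevalence: float, se: float, sp: float) -> dict[str, float]:
    """Expected 2x2 cell counts when an imperfect test is applied to n people."""
    diseased = n * prevalence
    non_diseased = n - diseased
    return {
        "true_positive": se * diseased,             # MDD, test positive
        "false_negative": (1 - se) * diseased,      # MDD, test negative
        "false_positive": (1 - sp) * non_diseased,  # no MDD, test positive
        "true_negative": sp * non_diseased,         # no MDD, test negative
    }


if __name__ == "__main__":
    counts = expected_counts(n=1000, prevalence=0.10, se=0.84, sp=0.78)
    for cell, value in counts.items():
        print(f"{cell}: {value:.0f}")
    test_positive = counts["true_positive"] + counts["false_positive"]
    print(f"observed prevalence: {test_positive / 1000:.3f}")
```

With these inputs, 84 of the 100 women with MDD and 198 of the 900 women without MDD are expected to test positive, for 282 test-positives in total.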
In this case, the estimated (biased) prevalence of MDD would be 28.2% (282/1000), which is 18.2 percentage points higher than the "true" prevalence. To correct for such misclassification error, if we assume T+ represents the number of individuals who test positive for MDD using the CESD20 and D+ represents the number of individuals who truly have MDD, then the conditional probability of an individual testing positive given that the individual truly has MDD is equal to the probability of truly having MDD and testing positive divided by the probability of truly having MDD, denoted as:

$$P(T^{+} \mid D^{+}) = \frac{P(T^{+} \cap D^{+})}{P(D^{+})}.$$

Using Bayes' theorem, if we assume a gold standard test for MDD exists, we can describe the association between the observed and true status as follows:

$$P(T^{+}) = P(T^{+} \mid D^{+})\,P(D^{+}) + P(T^{+} \mid D^{-})\,\bigl(1 - P(D^{+})\bigr),$$

where P(T+) corresponds to the proportion of individuals testing positive for MDD (observed prevalence), P(T+|D+) corresponds to the sensitivity of the test (Se), P(T+|D−) corresponds to one minus the specificity of the test (1 − Sp), and P(D+) to the true prevalence of MDD.
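Solving the relationship between observed and true prevalence for P(D+) gives a closed-form correction, commonly known as the Rogan-Gladen estimator: P(D+) = (P(T+) + Sp − 1)/(Se + Sp − 1). The sketch below applies it to the worked numbers above. The function name is illustrative, and truncating the estimate to the interval [0, 1] is a common convention assumed here rather than something stated in this article.

```python
def corrected_prevalence(observed_prevalence: float, se: float, sp: float) -> float:
    """Rogan-Gladen style correction of an observed (test-based) prevalence."""
    youden = se + sp - 1
    if youden <= 0:
        raise ValueError("Se + Sp must exceed 1 for the correction to be meaningful")
    estimate = (observed_prevalence + sp - 1) / youden
    # Sampling error can push the raw estimate outside [0, 1]; truncate (assumed convention).
    return min(1.0, max(0.0, estimate))


if __name__ == "__main__":
    observed = 282 / 1000  # observed CESD20-positive proportion from the example
    print(f"corrected prevalence: {corrected_prevalence(observed, 0.84, 0.78):.3f}")
```

Applied to the observed 28.2%, the correction recovers the "true" prevalence of 10% used to construct the example.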
In our previous study (Owora & Carabin, 2018), we extend these concepts in a Bayesian latent class model to demonstrate that ignoring the misclassification error of case-finding tools (e.g., CESD20) when estimating MDD prevalence among pregnant and postpartum women can result in an underestimation of the true MDD prevalence, with misclassification bias (i.e., the difference between adjusted and observed prevalence estimates) ranging from 6 to 43%, depending on the distribution of pre- versus postnatal assessments. Such bias can lead to the misallocation of scarce resources to tackle the issue of MDD among mothers.

Risk factor measures of association
As an extension to the above discussion, unbiased measures of association [...]

TABLE 2 Contingency [...]

where C = E− − B.
In real life, bias can involve more than just the measurement of the outcome of interest; it can also affect exposures, confounders, mediators, or moderators. These misclassification errors can be either nondifferential or differential (i.e., Se and Sp are different for E+ and E−). Moreover, these variables can be either categorical or continuous. The correction of misclassification bias (or measurement error) can range from a simple bias correction (Lash et al., 2009) to more complex approaches that include probabilistic bias correction (Fox et al., 2005), Bayesian bias correction (MacLehose et al., 2009), modified maximum likelihood (Edwards et al., 2014), multiple imputation (Cole et al., 2006), propensity score (Lunt et al., 2012), and/or regression calibration (Rosner et al., 1989).
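As a hedged illustration of the simplest of these approaches, the sketch below back-corrects the number of outcome-positive individuals in each exposure group under nondifferential outcome misclassification and then recomputes the odds ratio. The counts are hypothetical, Se and Sp are the CESD20 summary values cited earlier, and the function names are illustrative; this is not a reproduction of the worked example in the cited references.

```python
def correct_positive_count(observed_positive: float, group_size: float,
                           se: float, sp: float) -> float:
    """Back-calculate the expected number of true cases in one exposure group."""
    return (observed_positive - (1 - sp) * group_size) / (se + sp - 1)


def odds_ratio(pos_exposed: float, n_exposed: float,
               pos_unexposed: float, n_unexposed: float) -> float:
    """Odds ratio comparing exposed with unexposed, from case counts and group sizes."""
    odds_exposed = pos_exposed / (n_exposed - pos_exposed)
    odds_unexposed = pos_unexposed / (n_unexposed - pos_unexposed)
    return odds_exposed / odds_unexposed


if __name__ == "__main__":
    se, sp = 0.84, 0.78                    # assumed equal in both groups (nondifferential)
    n_exposed, n_unexposed = 500, 500      # hypothetical group sizes
    obs_exposed, obs_unexposed = 172, 141  # hypothetical test-positive counts

    true_exposed = correct_positive_count(obs_exposed, n_exposed, se, sp)
    true_unexposed = correct_positive_count(obs_unexposed, n_unexposed, se, sp)

    print(f"observed OR:  {odds_ratio(obs_exposed, n_exposed, obs_unexposed, n_unexposed):.2f}")
    print(f"corrected OR: {odds_ratio(true_exposed, n_exposed, true_unexposed, n_unexposed):.2f}")
```

With these hypothetical counts, the observed odds ratio is about 1.34, whereas the corrected odds ratio is approximately 2.25, which illustrates how nondifferential outcome misclassification can bias an association toward the null.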
In summary, interest in the validity of MDD case-finding tools among mothers of young children during the pre- and postnatal periods is well justified (Owora & Carabin, 2018; Owora et al., 2019, 2016a). There is a growing recognition of the multiple cross-cutting negative effects of MDD on maternal-child health during the critical developmental stages of a child (Ammerman et al., 2010; Chung et al., 2004; Heckman, 2006; Lyons-Ruth et al., 1990; Sills et al., 2007; Stewart & Vigod, 2019; Thombs et al., 2014; Whitaker et al., 2006). Errors in detection (i.e., false positives and false negatives) can result in initiation of unnecessary treatment or failure to treat maternal MDD. Valid population-level inference related to incidence or prevalence and to risk factor measures of association is critical to informing appropriate allocation of scarce healthcare resources and identifying modifiable factors for preventive intervention, respectively. Given the availability of methods that can be used to correct for MDD misclassification errors or bias derived from imperfect case-finding tools, we recommend that correction for these errors in clinical practice, public health practice, and research should be the default option, and not the exception.

CONFLICT OF INTERESTS
Arthur H. Owora (author) has no conflict of interest to declare.

DATA AVAILABILITY STATEMENT
The data that support the findings of this study are available from the corresponding author upon reasonable request.

PEER REVIEW
The peer review history for this article is available at https://publons.com/publon/10.1002/brb3.2614