Profiles of depression in a treatment‐seeking Hispanic population: Psychometric properties of the Patient Health Questionnaire‐9

Abstract Objectives Screening instruments can be powerful tools in assisting primary care providers with detecting depression in their patients and monitoring treatment response. Health disparities among racial and ethnic minorities result from inaccurate assessment in primary care. Methods The current study used baseline data from two federally funded research studies of treatment for depression among Hispanics in primary care. The Patient Health Questionnaire‐9 (PHQ‐9) was administered at baseline prior to the study interventions, and 499 participants provided responses. Results Confirmatory factor analyses found excellent factor validity for the PHQ‐9, yet reliability remained poor. Possible heterogeneity in depressive item scores was examined, and latent profile analysis identified four distinct profiles of PHQ‐9 responses: lower depression, moderate/somatization, moderate/negative self‐view, and severe depression. Results indicate modest support for the PHQ‐9 and its use among Hispanics for the purpose of depression screening. Conclusion Capturing four profiles of depression in a large primary care sample helps characterize the manifestation of depression in a Hispanic population. The single item related to fatigue had the greatest variation across groups, indicating it might be useful as a screening item. Inadequate evaluation of symptoms could lead to significant underidentification of the disorder among Hispanics.


| INTRODUCTION
Hispanic patients with depression and dysthymia have disproportionately high numbers of somatic symptoms, particularly among women less than 40 years of age (Chong, Reinschmidt, & Moreno, 2010). Within primary care, vague or unexplained symptoms such as aches, pain and fatigue are often presenting symptoms of depression (Trivedi, 2004). The lifetime prevalence of psychiatric disorders among Hispanics has been estimated to be 28.1% for men and 30.2% for women (Alegria et al., 2007), while overall prevalence of depression is estimated to be 27.0% (Wassertheil-Smoller et al., 2014). Hispanic patients with comorbid depression and chronic disease inadvertently delay intervention because of a somatic presentation of symptoms, which impedes accurate and timely detection of depression (Eghaneyan, Sanchez, & Mitschke, 2014; Interian et al., 2010). Such delays in addressing the underlying depression can not only make remission difficult but can make treatment of the physical condition challenging (Kravitz & Ford, 2008). Major depression increases the burden of chronic illness by increasing perception of physical symptoms, causing additional impairment in functioning (Katon, 2011; Unutzer, Schoenbaum, Druss, & Katon, 2006).

Self-report depression screening and measurement instruments can be powerful tools in assisting primary care providers with detecting depression in their patient population, diagnosing depression and monitoring treatment response. These instruments can also help measure a patient's overall depression severity over time, as well as the specific symptoms that are improving or not with treatment (Trivedi, 2009). Accurate screening, diagnosis and treatment of depression are entirely dependent on accurate measurement of symptoms (Thielke, Vannoy, & Unutzer, 2007). Inaccurate assessment in primary care is an independent predictor of poor control of chronic disease and is a significant contributor to health disparities, lack of patient satisfaction and poor-quality patient education and understanding of their disorder (Sanchez, Ybarra, Chapa, & Martinez, 2016).
A symptom-level approach may provide important insight into how various individual factors relate to the broader disorder (Djelantik, Robinaugh, Kleber, Smid, & Boelen, 2019). Researchers have increasingly used latent variable mixture modeling (LVMM; McCutcheon, 1987) techniques, which are person-centered analytic approaches, to identify groups of patients with similar profiles of symptoms. Within LVMM, latent profile analysis (LPA) uses symptoms to estimate an individual's likely membership in differing profiles of symptoms and then tests for differences in characteristics among the resulting profiles (e.g., demographics and differences in self-report measure scores; Muthén & Muthén, 2000).
Estimating possible heterogeneity in depressive symptoms and then grouping individuals through common profiles of responses allows researchers and clinicians to understand differing clinical presentations of those seeking treatment. Analysis of symptom-level associations between depression and other factors has previously suggested paths by which these factors affect or are affected by the broader syndrome and their relationship to other disorders such as grief and PTSD (Djelantik et al., 2019).
The central aims of the current study were to (1) evaluate the psychometric quality of the Patient Health Questionnaire-9 (PHQ-9) measure (Kroenke & Spitzer, 2002) through the use of confirmatory factor analysis (CFA) and (2) explore possible heterogeneity in the symptom-level presentation of depression in a large sample of Hispanic primary care patients. The PHQ-9 is the most commonly used depression screening and measurement tool in medical settings (Savoy & O'Gurek, 2016). We expect the unidimensional model of the PHQ-9 to be supported in the current sample. A previous study of the PHQ-9 in a large sample of educated Mexican women (n = 55,555; 89.4% with a university degree or more) found support for the PHQ-9 as a unidimensional model (Familiar et al., 2015). In the current analyses, we used CFA to evaluate the factor validity of the PHQ-9, followed by a three-step LPA approach which included the estimated profiles of depressive symptoms, posterior probabilities of membership, and classification uncertainty rates. The three-step LPA approach used here tested predictors of latent profile membership while incorporating the estimated posterior probabilities and classification uncertainty rates into the full model (Asparouhov & Muthén, 2013; Vermunt, 2010).

| Study design and setting
Data for the current study was baseline data collected as part of two federally funded intervention studies of Hispanic patients who screened positive for depression in primary care. The first study, Depression Screening and Education: Options to Reduce Barriers to Treatment (DESEO), was a one-group pretest-posttest design assessing a culturally-adapted Depression Education Intervention's (DEI) effects on depression knowledge, stigma, and engagement in treatment (Sanchez, Eghaneyan, & Trivedi, 2016 (Sanchez, Eghaneyan, Killian, Cabassa, & Trivedi, 2017

| Participants
All adult primary care patients were universally screened for depression using the PHQ-9 measure (Kroenke & Spitzer, 2002) delivered in English or Spanish, depending on the patient's preference, during annual or new/non-acute visits as part of normal clinical practice. When a patient scored five or more on the PHQ-9, the primary care provider would initiate a "warm hand off", wherein the patient was referred and introduced to the clinic's bilingual licensed clinical social worker (LCSW).
Once referred, the LCSW assessed patients for the presence of the nine diagnostic criteria for major depression from the Diagnostic and Statistical Manual of Mental Disorders. Patients who met the following inclusion criteria were invited to participate in the studies: 18 years or older, self-identified as Hispanic, met diagnostic criteria for major depressive disorder, and were not already receiving treatment for depression (medication and/or psychotherapy). During the screening process, all patients were also given the Generalized Anxiety Disorder-7 (GAD-7) anxiety measure (Spitzer, Kroenke, Williams, & Lowe, 2006). All participants signed informed consent prior to participation. The total sample for the current analysis included baseline data from the DESEO (N = 349) and METRIC (N = 150) samples for a combined total of 499 participants. In both studies, all measures were completed prior to receipt of the educational intervention.

| Depression
Depression severity was measured using the PHQ-9, a self-report measure that assesses the frequency of depression symptoms within the last 2 weeks using each of the nine DSM-IV criteria for depression. Total scores range from 0 to 27 with scores of 5-9 representing mild depression, 10-14 representing moderate depression, 15-19 representing moderately severe depression, and greater than or equal to 20 representing severe depression. The PHQ-9 has been demonstrated to be a reliable and valid measure of depression severity in racially and ethnically diverse primary care samples and is available in both English and Spanish (Huang, Chung, Kroenke, Delucchi, & Spitzer, 2006). Studies examining the factor structure of the PHQ-9 have demonstrated support for the measure as a single-factor structure among both English and Spanish speaking Hispanics with strong internal consistency (Huang et al., 2006).
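The scoring rule described above can be illustrated with a minimal sketch. It is not the scoring code used in the study: the function names are hypothetical, the 0-3 coding of each item follows the standard PHQ-9 response format, and the label for totals below 5 is an assumption because the text does not name that range.

```python
from typing import Sequence

# Severity bands as stated in the text, checked from highest to lowest.
# The label for totals below 5 is an assumption; the text does not name it.
PHQ9_BANDS = [
    (20, "severe"),
    (15, "moderately severe"),
    (10, "moderate"),
    (5, "mild"),
    (0, "below mild threshold"),  # assumed label
]


def phq9_total(item_scores: Sequence[int]) -> int:
    """Sum the nine PHQ-9 item responses (each coded 0-3)."""
    if len(item_scores) != 9:
        raise ValueError("PHQ-9 requires exactly nine item responses")
    if any(score not in (0, 1, 2, 3) for score in item_scores):
        raise ValueError("each item must be coded 0, 1, 2, or 3")
    return sum(item_scores)


def phq9_severity(total: int) -> str:
    """Map a PHQ-9 total score (0-27) to the severity band from the text."""
    if not 0 <= total <= 27:
        raise ValueError("PHQ-9 total must be between 0 and 27")
    for lower_bound, label in PHQ9_BANDS:
        if total >= lower_bound:
            return label
    raise AssertionError("unreachable: every valid total matches a band")
```

Two respondents with the same total fall in the same band regardless of which items they endorsed, which is the limitation of sum scores that a profile-based analysis is meant to address.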

| Anxiety
Anxiety symptom severity was measured using the GAD-7, a 7-item self-report scale for identifying the presence of generalized anxiety disorder. The items of the GAD-7 assess frequency of symptoms over the last 2 weeks based on the diagnostic criteria for generalized anxiety disorder in the DSM-IV, with responses ranging from 0 for "not at all" to 3 for "nearly every day." The GAD-7 has been found to be a reliable and valid measure for use with Hispanic Americans and has demonstrated strong internal consistency reliability for both the English and Spanish versions (Mills et al., 2014).

| Demographics
Demographic measures were collected via self-report and medical record extraction and included age at enrollment, gender, primary language, marital status, and education level.

| CFA of PHQ-9
CFA was used to assess the fit of the PHQ-9 to the baseline DESEO and METRIC data (N = 499). CFA modeling used the mean- and variance-adjusted weighted least squares estimator in Mplus 8.1 (Muthén & Muthén, 2018). Due to the clinical setting and nature of data collection supported by clinical staff in DESEO and METRIC, the PHQ-9 scores had no missing data. The model chi-square (χ²) value (Kline, 2011), model chi-square value per degrees of freedom (χ²/df; Bollen, 1989), and root-mean-square error of approximation (RMSEA; Hu & Bentler, 1999) with a 90% confidence interval (90% CI) were used to assess model fit. Good model fit was indicated by a lower and non-significant χ² value, χ²/df value less than 3.0, and RMSEA scores less than 0.08 or 0.

| Latent profile analysis
Self-reported items from the PHQ-9 were used as indicators in an LPA model using Mplus 8.2. The fit of the LPA model to the data was assessed through several fit indices and based on LVMM reporting recommendations of Nylund, Asparouhov, and Muthén (2007). Log likelihood (Lanza, Flaherty, & Collins, 2003) and information criteria-based fit statistics were used to assess model fit. Lower values of the Bayesian Information Criterion (BIC; Schwarz, 1978), Akaike Information Criterion (AIC; Akaike, 1987), and Sample-Size Adjusted BIC (Sclove, 1987) each indicate better fit and model parsimony (Muthén & Muthén, 2003). The Lo-Mendell-Rubin test (LMR; Lo, Mendell, & Rubin, 2001) and the parametric Bootstrap Likelihood Ratio Test (BLRT; McLachlan & Peel, 2000) are both log likelihood ratio tests which compare these values between the model with k profiles and a model with one fewer profile, or k − 1 (Bollen, 1989).
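For readers less familiar with the information criteria named above, the following sketch shows how they are conventionally computed from a model's log likelihood. It uses the standard textbook formulas rather than output from Mplus; the function names and the numeric inputs in the usage note are hypothetical and are not values from this study.

```python
import math


def information_criteria(log_likelihood: float, n_params: int, n_obs: int) -> dict:
    """Return AIC, BIC, and sample-size adjusted BIC for a fitted model."""
    deviance = -2.0 * log_likelihood
    return {
        "AIC": deviance + 2 * n_params,
        "BIC": deviance + n_params * math.log(n_obs),
        # The adjusted BIC replaces n with (n + 2) / 24 in the penalty term.
        "SABIC": deviance + n_params * math.log((n_obs + 2) / 24),
    }


def likelihood_ratio_statistic(ll_fewer_profiles: float, ll_more_profiles: float) -> float:
    """Raw statistic 2 * (LL_k - LL_{k-1}) comparing k against k - 1 profiles.

    Obtaining a p-value requires the LMR adjustment or a parametric
    bootstrap, neither of which is implemented in this sketch.
    """
    return 2.0 * (ll_more_profiles - ll_fewer_profiles)
```

As a usage illustration with made-up numbers, `information_criteria(-5000.0, 38, 499)` returns an AIC of 10076.0; comparing such values across solutions with different numbers of profiles is how the lower-is-better rule is applied.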
In the case of LPA, profile membership has traditionally been imputed from the maximum-probability assignment rule (Nagin, 2005), in which individuals are categorized into a profile based on their greatest posterior probability, that is, the estimated probability of membership in a particular profile computed for every individual in the sample (Muthén, 2001).
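The maximum-probability assignment rule can be sketched in a few lines. This is an illustration under stated assumptions, not the Mplus procedure: the function name is hypothetical and the posterior probabilities in the usage note are invented for the example.

```python
from typing import List, Sequence, Tuple


def assign_profiles(posteriors: Sequence[Sequence[float]]) -> List[Tuple[int, float]]:
    """For each person, return (1-based profile number, posterior probability).

    Each inner sequence holds one person's posterior probabilities of
    membership in each profile and is expected to sum to 1.
    """
    assignments = []
    for probs in posteriors:
        if abs(sum(probs) - 1.0) > 1e-6:
            raise ValueError("posterior probabilities must sum to 1")
        # Index of the profile with the greatest posterior probability.
        best = max(range(len(probs)), key=lambda k: probs[k])
        assignments.append((best + 1, probs[best]))
    return assignments
```

For example, a person with hypothetical posteriors `[0.40, 0.10, 0.45, 0.05]` would be assigned to Profile 3 even though the probability is only 0.45. Hard assignment discards that uncertainty, which is why three-step approaches carry classification uncertainty into the model with predictors.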

| Demographic characteristics
The sample of 499 Hispanics screening positive for depression across two samples (Table 1)

| Measurement validity of PHQ-9
Construct and criterion-related validity was assessed through testing the association between PHQ-9 scores and other reported patient demographics and self-report measure scores. PHQ-9 scores were strongly and significantly correlated with GAD-7 anx-

| Initial LPA modeling
LPA modeling indicated a four-profile solution best fit the data (Table 2) and was a significant improvement on a model with three profiles (LMR p = 0.0146; BLRT p < 0.001). Estimation of a model with five profiles resulted in a decrease in model quality and fit.
The entropy value (0.894) further supported the four-profile latent categorical variable and its fit to the data. Furthermore, profiles had varying patterns of responses across PHQ-9 items, addressing an LVMM modeling concern raised by Morin and Marsh (2015).
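The entropy statistic reported here is conventionally the relative entropy of the posterior probabilities, scaled so that 1 indicates perfect classification and 0 indicates assignment no better than chance. The sketch below uses that standard formula; the function name is hypothetical, and it is assumed rather than confirmed that this matches the exact quantity printed by Mplus.

```python
import math
from typing import Sequence


def relative_entropy(posteriors: Sequence[Sequence[float]]) -> float:
    """Relative entropy: 1 - sum_i sum_k(-p_ik * ln p_ik) / (n * ln K)."""
    n_people = len(posteriors)
    n_profiles = len(posteriors[0])
    if n_people == 0 or n_profiles < 2:
        raise ValueError("need at least one person and two profiles")
    uncertainty = 0.0
    for probs in posteriors:
        for p in probs:
            if p > 0.0:  # p * ln(p) is taken as 0 when p is 0
                uncertainty -= p * math.log(p)
    return 1.0 - uncertainty / (n_people * math.log(n_profiles))
```

A value near 0.9, as reported for the four-profile solution, means most individuals have one posterior probability close to 1.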

| Identified depression profiles
Profiles were found to be as follows (Figure 2 and Table 3): Profile 1: lower, mild depression profile (n = 104, 20.8%); reported lowest mean item scores across seven of the nine items with only slightly higher scores on feeling bad about oneself and thoughts of self-harm.
Profile 2: moderate/somatization profile (n = 183, 36.7%); reported higher mean item scores for items addressing anhedonia, sleeping problems, having little energy, and feeling like a failure.
Profile 3: moderate/negative self-view profile (n = 72, 14.4%); reported moderate mean scores across the PHQ-9 items. This group was found to have elevated mean item scores for items related to feeling down or hopeless, feeling bad about oneself, and psychomotor disturbances. In terms of PHQ-9 overall severity, the somatization profile most closely aligned with the moderate/negative self-view profile.
Profile 4: severe depression (n = 140, 28.1%); reported the highest mean item scores across all items in the PHQ-9.
PHQ-9 scores were significantly associated with profile membership, as expected (F[3, 495] = 199.33, p < 0.001). Individuals categorized as Profile 1 had significantly lower PHQ-9 scores than each of the other three profiles, and Profile 4 individuals reported significantly higher scores than the other three. Individuals in Profiles 2 and 3 did not report significantly different scores.

| Association with covariates
The three-step LPA approach was completed for all demographic variables and self-report measure scores predicting likelihood of profile membership (Table 4). Anxiety assessed through the GAD-7 was the most predictive of PHQ-9 profile membership. When

| DISCUSSION
Results indicate modest support for the PHQ-9 and its use among Hispanics in primary care for the purpose of depression screening and monitoring. Internal consistency reliability was poor, consistent with qualitative research on the measure which found patients felt the items did not adequately characterize their experience of symptoms of low mood (Malpass et al., 2016). The Cronbach's alpha coefficient assumes tau-equivalence, meaning that the statistic assumes all items have equal true scores (i.e., an assumption of equal factor loadings for all items in the model). Despite support for unidimensionality of the PHQ-9, the lower alpha coefficient estimate is likely due to a lack of tau-equivalence in the data (Dunn, Baguley, & Brunsden, 2014). Support for the unidimensional one-factor structure of the PHQ-9 was found with responses from Hispanic women from two community samples (Familiar et al., 2015; Merz, Malcarne, Roesch, Riley, & Sadler, 2011). PHQ-9 data from a large sample of highly educated Mexican, female educators fit the one-factor structure with higher reliability (α = 0.89) but lower reliability in subsamples within the study (α = 0.85 and 0.77; Familiar et al., 2015).
FIGURE 1 Patient Health Questionnaire-9 (PHQ-9) confirmatory factor analysis (CFA) diagram (N = 499)
wherein the PHQ-9 becomes less reliable when reported scores increase throughout the sample.
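The Cronbach's alpha coefficient discussed in this section can be computed directly from item-level responses. The sketch below uses the standard formula based on item variances and the variance of the total score; it is not the software routine used in the study, and the function name is hypothetical.

```python
from statistics import variance
from typing import Sequence


def cronbach_alpha(responses: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha; rows are respondents, columns are items.

    alpha = k / (k - 1) * (1 - sum(item variances) / variance(total score))
    """
    n_items = len(responses[0])
    if len(responses) < 2 or n_items < 2:
        raise ValueError("need at least two respondents and two items")
    # Sample variance of each item across respondents.
    item_variances = [
        variance([row[j] for row in responses]) for j in range(n_items)
    ]
    # Sample variance of each respondent's summed score.
    total_variance = variance([sum(row) for row in responses])
    if total_variance == 0:
        raise ValueError("total scores have zero variance")
    return (n_items / (n_items - 1)) * (1.0 - sum(item_variances) / total_variance)
```

Because the formula weights every item equally, alpha tends to underestimate reliability when factor loadings differ across items, which is the tau-equivalence issue raised above.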
The single item related to fatigue had the greatest variation across profiles, indicating it might be useful as a screening item. In Hispanic populations, especially women, careful attention to physical descriptions of symptoms could prove important to understanding depression severity and has implications for treatment. Over 100 studies have examined the PHQ-9 for use in primary care and general medical settings (Kroenke, Spitzer, Williams, & Löwe, 2010), and severity of depression is routinely assessed by adding up scores for disparate symptoms to create a sum-score, even though symptom variability among people diagnosed with depression is broad and collapsing all symptoms into a single sum-score fails to characterize the unique combination for an individual (Zimmerman, Ellison, Young, Chelminski, & Dalrymple, 2015). The negative correlations between error terms in the factor model appeared to reflect overlap among symptoms related to negative self-view and psychomotor disturbances. For example, we found that the symptom "feeling down, depressed, or hopeless" was strongly associated with "feeling bad about yourself," "trouble concentrating," and "moving or speaking so slowly that other people could have noticed," a community of symptoms found to cause significant impairment in functioning but which can be difficult to tease out in clinical practice.
Findings further support existing knowledge about the correlation between depression and anxiety, especially among women, which extends previous research and clinical discussions suggesting a close, predictable relationship between the two disorders (Goldberg & Fawcett, 2012). This finding also corresponds to current themes in the refining of the classification of anxiety and depression which suggest sufficient similarity and overlap of symptoms to consider a clinical grouping reflecting this comorbidity or that these disorders may reflect different manifestations of the same condition (Andrews, Anderson, Slade, & Sunderland, 2008).
The substantial symptom variation among individuals who all qualify for a single diagnosis has been previously examined, and our findings further highlight a potential explanation for the difficulty in achieving treatment efficacy given the different depression profiles. It is common practice to use measurement sum scores as a criterion for establishing depression severity and to use algorithms to treat accordingly, especially in primary care (Manea, Gilbody, & McMillan, 2015). However, such broad assumptions may be unjustified because depression symptoms differ in their impact on impairment and functioning, and individuals with similar total severity scores can have very different syndromes (Cohen, Greenberg, & IsHak, 2013). Results from the current study highlight the importance of looking beyond depression measurement summary scores and examining specific symptoms experienced by patients during treatment. Inadequate evaluation of specific subsets of symptoms could lead to significant underidentification of the disorder (Fried, 2017) and contribute to treatment disparities in Hispanic populations.
Clinician focus on acute symptoms of mood without consideration of residual symptoms, including anxiety and psychomotor functioning, is common and increases the likelihood of relapse (Trivedi, 2004).
Some specific clinical characteristics may inform the choice between medication and psychotherapy, the selection of specific medication, or the selection of a specific psychotherapy (Simon & Perlis, 2010).
A thorough assessment of all symptoms and their causal associations is necessary in order to achieve lasting remission and represents an initial step toward personalized treatment of depression that recognizes the heterogeneity of the disorder (Trivedi, 2009).
Study results should be interpreted in light of the study's limitations.
First, the study sample was primarily women, who generally report greater symptoms of anxiety/somatization compared to men (Kornstein et al., 2000). A majority of the participants were also Spanish-speaking, a variable that was not controlled for in the analyses. However, prior research among Hispanic women supports a similar one-factor structure of the PHQ-9 with equivalent response patterns among English and Spanish speakers (Merz, Malcarne, Roesch, Riley, & Sadler, 2011). Analyses with more heterogeneous samples or samples composed of primarily men warrant further investigation. Additionally, examining profiles of depression symptoms to better understand the various clinical presentations that may exist would benefit from multiple assessments including both self-report and clinician-rated measures. However, it should be noted that the utility of identifying subtypes of depression may be limited given that the extent to which proposed subtypes or even individual symptoms change over time is uncertain (Ulbricht, Rothschild, & Lapane, 2016).
The sample was obtained through convenience sampling and was from two intervention studies of Hispanic patients who met diagnostic criteria for depression in an urban primary care setting in Texas, hence generalizability of results may be limited.
In conclusion, applying psychometric approaches like CFA and LPA to the most widely used depression screening and monitoring measure in primary care can yield important insights at the level of the individual because it allows for examination of relationships between symptoms and underlying dimensions. The described profiles identified through person-centered statistical techniques may be meaningful in clinical assessment to tease out the burden of symptoms and personalize treatment accordingly. And while the present findings about profiles may be too complex to have acceptable clinical utility, a focus on varying manifestations of depression can be useful for clinicians to differentiate profiles from sum scores in conjunction with decision support tools when indicated.