Eating disorders in youth: Diagnostic variability and predictive validity

Authors


  • Dr. Loeb receives support from federal grants through the National Institutes of Health (National Institute of Mental Health and National Institute for Child Health and Human Development). Dr. Le Grange receives support from the National Institutes of Health (National Institute of Mental Health and National Institute for Child Health and Human Development) and the Baker Foundation, and receives royalties from Guilford Press. Dr. Lock receives support from the National Institutes of Health (National Institute of Mental Health), the Davis Foundation, and the Lucile Packard Foundation, and receives royalties from Guilford Press. Dr. Hildebrandt receives support from the National Institutes of Health (National Institute on Drug Abuse). Research on this project was partially supported by grants from the National Institute of Mental Health (NIMH): K24-MH074457 (PI: J. Lock), K23 MH074506-05 (PI: K. Loeb) and K23 MH01923 (PI: D. Le Grange), and the National Institute on Drug Abuse (NIDA): K23 DA024043 (PI: T. Hildebrandt).

Abstract

Objective:

The primary aim was to examine the utility of DSM-IV criteria in predicting treatment outcome in a sample of adolescents with eating disorders.

Method:

We (a) descriptively compared the baseline rates of anorexia nervosa (AN) and bulimia nervosa (BN) across multiple reference points for diagnostic criteria, (b) using ROC curve analyses, assessed the sensitivity and specificity of each diagnostic criterion in predicting clinical outcome, and (c) with logistic regression analyses, examined the incremental predictive value of each criterion.

Results:

Results show a high degree of variability in the baseline diagnostic profiles as a function of the information used to inform each DSM-IV criterion. For AN, Criterion A yielded the best predictive validity, with Criteria B-D providing no significant incremental value. For BN, none of the measures had a significant AUC, and results from logistic regression analyses showed that none of the indicators were robust in predicting outcome.

Discussion:

For AN, the existing Criterion A is appropriate for children and adolescents, and is sufficient to predict outcome in the context of active refusal to maintain a normal weight as well as multiple informants and behavioral indicators of the psychological aspects of AN. For BN, predictive validity could not be established. © 2010 by Wiley Periodicals, Inc. Int J Eat Disord 2010

Introduction

The primary purpose of diagnostic criteria, as put forth in the Diagnostic and Statistical Manual of Mental Disorders, 4th ed (DSM-IV), is improved care for patients with psychiatric disorders.1 To accomplish this stated mission, diagnostic criteria must have clinical utility, which is characterized by ease of use, reliability between clinicians, and predictive value.2 The diagnostic criteria for anorexia nervosa (AN) and bulimia nervosa (BN) may be compromised in these respects, especially for children and adolescents; the application of the current DSM criteria to younger populations can be awkward, resulting in a higher likelihood of misdiagnosis3 and an inflated representation of Eating Disorder Not Otherwise Specified (EDNOS) in specialty clinics.4–6 Concerns about accurate case identification using the existing DSM criteria for AN and BN are potentiated among children and adolescents, who often present atypically in the emerging stages of the disorder.3 Furthermore, the predictive validity of each individual diagnostic item with regard to outcome, a key issue to consider in the revisions for DSM-V,7 is largely unknown. A primary aim of the current study is to examine the utility of DSM-IV criteria in predicting treatment response in a sample of adolescents with eating disorders.

AN is characterized by refusal to maintain normal body weight, fear of weight gain despite being underweight, disturbance in how one experiences shape or weight, and loss of menses in postmenarcheal women.1 Both the physiological and psychological diagnostic criteria can be operationalized in multiple ways and have been criticized for their lack of developmental sensitivity by failing to adequately specify age-specific symptom manifestations.3 Criterion A, “the refusal to maintain body weight at or above a minimally normal weight for age and height,” does not specify how to measure “normal weight,” a dynamic phenomenon given natural variability in the onset of pubertal growth spurts,8, 9 and for which there are various definitions in children and adolescents. Importantly, it remains unknown whether 85% of ideal body weight (IBW), the DSM-suggested cutoff for underweight status, is the most clinically predictive cutoff for adolescent AN.10 Criterion D, amenorrhea, which has a broader history of controversy,11–17 is particularly concerning for children and adolescents as menstrual cycles are frequently irregular during the first two years following menarche, and the expected age of menarche is often uncertain,10 especially in the context of an eating disorder.18 With regard to the cognitive criteria for AN (Criteria B and C), research indicates that brain maturation continues through adolescence19 and advanced cognitive functions, including abstract reasoning and emotional awareness, may not be fully developed in this population.20 Therefore, adolescents might have difficulty describing their thoughts and feelings related to the eating disorder, which could affect early diagnosis and treatment. 
Some researchers have suggested that for both children and adults, behavior might be a more reliable index than self-report for the psychological features of eating disorders,21 yet the DSM-IV1 does not take such indicators into account, potentially leading to false negatives in diagnosis. Furthermore, the DSM does not provide guidelines for symptom ascertainment in younger populations or for reconciling parent-child reporting discrepancies22, 23 on Criteria B and C for AN, especially in light of adolescents' tendency to minimize, deny, or fail to appreciate the harmful implications of their behavior.24

Symptoms of bulimia nervosa (BN) include recurrent episodes of binge eating (Criterion A) accompanied by inappropriate compensatory behaviors (Criterion B); these behaviors must both occur at a minimum of twice per week for three months (Criterion C). In addition, the self-evaluation of individuals with BN is unduly influenced by body shape and weight (Criterion D).1 The utility of the frequency/duration criterion among adolescents has been questioned, with evidence that the current thresholds are arbitrary, unnecessarily high, and do little to differentiate between-group levels of pathology.25, 26 Moreover, the symptom patterns of adolescents with BN-spectrum EDNOS may be in evolution,26 yet they are as pernicious as a full diagnostic profile in terms of treatment response, as well as medical and psychological comorbidity and risk.27 In fact, it can be argued that these liabilities are even more pronounced in the developing body of an adolescent. Additionally, as with AN, the psychological features of BN may present atypically or require allowances for behavioral indicators.

The nosological and practical implications of subthreshold eating disorders in adolescents and adults are discussed in several recent publications,2, 5, 25, 28 and research on the parameters that should define full diagnoses of AN and BN across stages of development is especially timely as the preparation for the fifth edition of the DSM continues.29 The three main objectives of this study were to (a) descriptively compare the baseline rates of AN versus subthreshold AN (SAN) and BN versus subthreshold BN (SBN) among adolescents across multiple reference points for select AN and BN criteria, respectively; (b) assess the sensitivity and specificity of each criterion in predicting clinical outcome across strict and broader definitions of AN and BN, with the broader models reflecting potential modifications to symptom definition and ascertainment in youth; and (c) determine the incremental predictive value of each diagnostic criterion across strict and broader definitions of AN and BN. For cases of AN and SAN, we also sought to determine which normal weight reference points and which corresponding cutoffs have the best predictive validity. We predicted that for AN, Criterion A (refusal to maintain a normal body weight) would have the greatest predictive validity given measurement error in the psychological criteria and the validity concerns in the amenorrhea criterion outlined above. In addition, we hypothesized that for BN, inappropriate compensatory mechanisms would have the greatest predictive validity in light of research demonstrating that purging may carry the greatest pathology of the BN criteria.30, 31

Method

Participants

A total of 224 adolescents with an eating disorder diagnosis participated in this study. AN-spectrum participants (including SAN) (N = 144) were drawn from four sources across three sites: (a) a randomized controlled trial (RCT) at Stanford University of short- and long-term family-based treatment (FBT) for adolescent AN32; (b) a case series from the University of Chicago of FBT for adolescents with AN33; (c) a case series from Stanford University and the University of Chicago of FBT for children with AN34; and (d) an open dissemination trial at Columbia University of FBT for adolescent AN and SAN.35 Inclusion criteria for the RCT32 were age 12 to 18 and meeting DSM-IV criteria for AN (but allowing for partial weight restoration and the absence of one, not three, menstrual cycles). Exclusion criteria were physical health problems likely to affect weight (e.g., diabetes mellitus), psychiatric illness likely to interfere with treatment (e.g., psychosis), or a prior failed course of FBT. The adolescent case series33 included clinic patients (mean age 14.5, SD = 2.3, range = 9–18) who either met full DSM-IV criteria for AN at presentation, were partially weight restored in the hospital before commencing FBT, or met all criteria for AN except that significant dietary restriction led to weight loss above the AN criterion but below a BMI of 20. Consistent with the FBT model, parent(s) were required to be available for treatment. The child case series34 was derived from a retrospective chart review of consecutive cases over seven years to identify children ages 12 years or younger who received a diagnosis of either AN or EDNOS rule-out AN by clinical interview applying DSM-IV criteria. The dissemination trial35 included adolescents 12–17 with a diagnosis of AN or SAN, medically cleared for outpatient care, and with at least one parent or guardian willing to participate in treatment.
SAN was defined as either weight loss to below 100% IBW, but above the 85% cutoff for AN, plus secondary amenorrhea, or weight loss to below 85% IBW plus oligomenorrhea. Exclusion criteria were Axis I diagnosis of bipolar disorder, psychotic disorder, or substance dependence; mental retardation; acute suicidal risk; acute or unstable poorly controlled chronic medical illness; physical abuse perpetrated on the patient by the parent or guardian who would otherwise be participating in treatment; concurrent participation in another treatment for AN; and weight below 75% IBW.

BN-spectrum participants (including SBN) (N = 80) were derived from a randomized controlled trial at the University of Chicago comparing FBT and individual supportive psychotherapy for adolescent BN and SBN.36 Participants were males and females aged 12 to 19 years living with their families or adult caregivers, who either met DSM-IV criteria for BN (purging or non-purging type), or who engaged in binge eating and/or purging at least once per week for a stable period (6 months) in combination with the other DSM-IV criteria for BN. Exclusion criteria included physical or psychiatric disorder requiring hospitalization, insufficient proficiency in English to comprehend the intervention delivered in that language, current alcohol or other substance dependence, BMI ≤ 17.5, current treatment for the eating disorder, and physical conditions (e.g., diabetes mellitus or pregnancy), treatments, or medications known to affect eating or weight. Stable (≥4 weeks) antidepressants were permitted with the exception of ≥50 mg of fluoxetine.

All studies were approved by the Institutional Review Boards at their respective institutions. The combined AN dataset yielded a sample with a mean age of 14.3 years (SD = 1.9), a mean duration of illness of 12.0 months (SD = 11.2), and a mean BMI of 16.8 (SD = 1.6). The majority of the AN-spectrum sample was female (88.2%), white (82.6%), and from intact families (78.6%). A small majority (51.8%) met full diagnostic criteria for AN according to clinical interview, while the remainder (48.2%) met study criteria for SAN (defined as either weight below 85% ideal body weight (IBW) plus oligomenorrhea, or deliberate weight loss to below 100% IBW (but above 85% IBW) plus amenorrhea). The BN sample was characterized by a mean age of 16.1 years (SD = 1.6), a mean duration of illness of 21.2 months (SD = 22.3), and a mean BMI of 22.1 (SD = 3.0). Like the AN patients, the majority of the BN-spectrum sample was female (97.5%), white (63.8%), and from intact families (57.5%). Nearly half (46.3%) met full diagnostic criteria for BN, and 53.8% were best characterized as SBN (defined in the study as a combined frequency of binge eating and/or purging of at least once per week, but less than twice per week, over the previous 6 months).

Assessment and Measures

Height and Weight

We obtained height and weight on a physician's balance scale without shoes, in single-layer street clothes.

Eating Disorder Examination, 12th ed (EDE)

The EDE,37 the gold-standard assessment tool for eating disorders, is a semistructured interview that assesses eating behavior (e.g., binge eating on objectively large or subjectively large amounts of food), inappropriate compensatory behaviors (e.g., self-induced vomiting, laxative and diuretic misuse, fasting, excessive exercise), and four additional dimensions of eating disorder pathology (dietary restraint, eating concern, shape concern, and weight concern) over the past 1 to 6 months, depending on the item. For the purposes of this study, only present state (the prior 28 days) was analyzed given the potentially emerging nature of eating disorder symptoms in this younger sample. The EDE incorporates frequency and severity items that correspond to the DSM-IV physiological, behavioral, and psychological criteria for AN and BN. Severity items are scored on a 0-6 Likert scale, with a cutoff of 4 recommended as the diagnostic threshold. The EDE has been used successfully with adolescents.38–40

Clinical Interview

Across all source studies, psychiatric interviews were conducted with patients and their parent(s) to complement the semistructured interviews administered in the respective studies, and to ascertain current symptoms and diagnosis as well as history and details of present illness, plus other data including demographic variables. These interviews were not standardized. In addition to chief complaint and history of present illness, information was obtained regarding treatment history, history of other psychiatric symptoms and conditions (including substance abuse and dependence, psychosis, and suicidality, if not assessed through other standardized measures), psychosocial history, and medical history. Eating disorder symptoms were assessed using wording from the DSM-IV criteria for AN and BN, simplified as necessary for younger participants, and reworded when interviewing parents to allow them to provide their perspective on their offspring's clinical presentation and to elicit justification for their response (e.g., “Is your daughter afraid of gaining weight or becoming fat? How do you know? What does she say or do to indicate this?”). The clinical interview assessed for denial of seriousness of low weight (part of Criterion C for AN), which is not captured in the EDE. Working diagnoses generated from this interview were a function of clinical judgment based on all the information gathered.

AN Criteria

For AN, four reference points for Criterion A (“refusal to maintain body weight at or above a minimally normal weight for age and height (e.g., … less than 85% of that expected …”) at baseline were applied and compared: (a) percent ideal body weight (%IBW; (current weight/ideal weight) × 100) less than 85% using the weight corresponding to the 50th percentile for age, height, and gender according to older National Center for Health Statistics (NCHS) norms41 as a proxy for ideal; (b) % IBW less than 85% using the weight corresponding to the 50th percentile BMI-for-age according to newer NCHS norms42 as a proxy for ideal; (c) BMI equal to or below 17.5 [used in the International Classification of Diseases (ICD)-10 Diagnostic Criteria for Research]43; (d) BMI-for-age percentile equal to or below 5%. Older and newer NCHS norms were compared to determine if any changes in population-based trends (e.g., increased rates of pediatric obesity) would affect relative predictive validity, and because prior research on adolescent AN35 has used the older reference points. These four methods were used not just to determine how rates of AN in the sample would shift depending on whether respective thresholds for Criterion A were met, but also to analyze weight-related variables continuously to determine the thresholds that optimize predictive validity (see Data Analysis below). 
For AN Criterion B (“intense fear of gaining weight or becoming fat, even though underweight”), we compared three different criteria: (a) a stringent cutoff of 4 or greater for the EDE diagnostic item, Fear of Weight Gain; (b) a relaxed cutoff of 1 or greater for Fear of Weight Gain (i.e., documenting the presence of the symptom, albeit mild); and (c) intense fear of weight as determined by clinical interview, incorporating patient self-report, parent-report, and direct clinical observation, and allowing for behavioral evidence of this symptom (e.g., deliberate, rigid, extreme dietary restriction or expressions of extreme anxiety in response to being asked to consume foods that might increase weight by virtue of quantity or quality). Similarly, for AN Criterion C (“disturbance in the way in which one's body weight or shape is experienced, undue influence of body weight or shape on self-evaluation, or denial of the seriousness of the current low body weight”), we compared three methods: (a) a stringent cutoff of 4 or greater on one or more of the EDE diagnostic items Importance of Shape, Importance of Weight, and Feelings of Fatness; (b) a relaxed cutoff of 1 or greater for one or more of these three items; and (c) disturbance in experience of shape/weight, over-valuation of shape/weight, and/or denial of the seriousness of current low weight, as determined by clinical interview, incorporating patient self-report, parent-report, and direct clinical observation, and allowing for behavioral evidence of these symptoms (e.g., utterances such as “I'm so fat”). Criterion D (“in postmenarcheal females, amenorrhea, i.e., the absence of at least three consecutive menstrual cycles”) was determined by clinical interview. Amenorrhea was imputed based on weight status alone for females on oral contraceptives in the sample.
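The way a single patient can fall on different sides of Criterion A depending on the reference point can be sketched in a few lines. This is only an illustration under stated assumptions: all patient values are hypothetical, the function names are ours, the percentile and median values would in practice come from the NCHS growth references (no lookup is implemented here), and deriving the newer-norm ideal weight as median BMI-for-age multiplied by height squared is an assumption rather than a procedure spelled out in the text.

```python
def percent_ibw(current_weight_kg: float, ideal_weight_kg: float) -> float:
    """Percent ideal body weight: (current weight / ideal weight) x 100."""
    if ideal_weight_kg <= 0:
        raise ValueError("ideal weight must be positive")
    return current_weight_kg / ideal_weight_kg * 100.0


def bmi(weight_kg: float, height_m: float) -> float:
    """Body mass index in kg/m^2."""
    return weight_kg / height_m ** 2


def ideal_weight_from_median_bmi(median_bmi: float, height_m: float) -> float:
    """ASSUMPTION: ideal weight = 50th percentile BMI-for-age x height^2."""
    return median_bmi * height_m ** 2


def criterion_a_flags(weight_kg, height_m, ideal_weight_older_norms_kg,
                      median_bmi_for_age, bmi_for_age_percentile):
    """Return whether each of the four reference points flags low weight."""
    ideal_newer = ideal_weight_from_median_bmi(median_bmi_for_age, height_m)
    return {
        # (a) %IBW < 85 using the older-norm median weight as ideal
        "pct_ibw_older_norms": percent_ibw(weight_kg, ideal_weight_older_norms_kg) < 85.0,
        # (b) %IBW < 85 using the newer-norm median BMI-for-age as ideal
        "pct_ibw_newer_norms": percent_ibw(weight_kg, ideal_newer) < 85.0,
        # (c) BMI at or below 17.5 (ICD-10 research criterion)
        "bmi_icd10": bmi(weight_kg, height_m) <= 17.5,
        # (d) BMI-for-age at or below the 5th percentile
        "bmi_for_age_percentile": bmi_for_age_percentile <= 5.0,
    }


# Hypothetical patient: the four methods do not agree.
flags = criterion_a_flags(
    weight_kg=44.0,
    height_m=1.60,
    ideal_weight_older_norms_kg=50.0,   # hypothetical older-norm median weight
    median_bmi_for_age=20.5,            # hypothetical newer-norm median BMI
    bmi_for_age_percentile=7.0,         # hypothetical percentile
)
for method, met in flags.items():
    print(f"{method}: {'meets' if met else 'does not meet'} Criterion A")
```

In this made-up case two of the four reference points classify the patient as underweight and two do not, which is the kind of shift in baseline rates that the descriptive comparison is designed to expose.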

BN Criteria

For BN, several combinations and permutations of BN Criterion A (“recurrent episodes of binge eating…characterized by…eating, in a discrete period of time, an amount of food that is definitely larger than most people would eat…[and] a sense of lack of control over eating during the episode”), Criterion B (“recurrent inappropriate compensatory behavior in order to prevent weight gain”), and Criterion C (“both occur, on average, at least twice a week”) at baseline were compared to examine how rates of the disorder in this sample would change across criteria modifications. Specifically, we examined (a) objectively large binge eating episodes [EDE Objective Bulimic Episodes (OBE) item] at a minimum frequency of twice per week, reflecting original DSM-IV criteria; (b) objectively large binge eating episodes (EDE OBE item) at a relaxed frequency of at least once per week to address the controversy in the field around frequency thresholds25; or (c) objectively large binge eating episodes (EDE OBE item) and/or subjectively large binge eating episodes (EDE Subjective Bulimic Episode (SBE) item) at a combined frequency of twice per week or greater, thereby highlighting the loss of control over eating feature of BN, which some argue is more salient than size of binge per se.44 In addition, we compared rates of the disorder across two thresholds for inappropriate compensatory behaviors (spanning EDE items that correspond to DSM-IV examples of purging, specifically self-induced vomiting, laxative misuse, and diuretic misuse, plus an EDE item that assessed driven exercise): (a) a minimum of twice per week, reflecting original DSM-IV criteria and (b) a relaxed frequency criterion of at least once per week. 
These methods were used not just to determine how rates of BN in the sample would shift depending on whether respective thresholds for Criteria A-C were met, but also to analyze behavioral frequency variables continuously to determine the thresholds that optimize predictive validity (see Data Analysis below). For BN Criterion D (“self-evaluation is unduly influenced by body shape and weight”), we compared three methods: (a) a stringent cutoff of 4 or greater on one or both of the EDE diagnostic items Importance of Shape or Importance of Weight; (b) a relaxed cutoff of 1 or greater for one or both of these two items; and (c) over-valuation of shape/weight as determined by clinical interview, incorporating patient self-report, parent-report, and direct clinical observation, and allowing for behavioral evidence of these symptoms (e.g., the patient becoming more visibly or verbally distraught over weight gain or lack of weight loss than about other occurrences that would have previously caused paramount distress, such as a lower grade than anticipated or a fight with a best friend).
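The behavioral thresholds for BN Criteria A-C under the strict, moderate, and broad definitions can be expressed as a small rule table. The sketch below is illustrative only: the class and function names are hypothetical, the counts are invented, and converting 28-day EDE counts to a weekly average by dividing by four is our assumption about how "on average" frequency would be computed.

```python
from dataclasses import dataclass

WEEKS_IN_EDE_WINDOW = 4  # EDE present state covers the prior 28 days


@dataclass
class EdeCounts:
    """Hypothetical 28-day episode counts from the EDE."""
    obe: int  # objective bulimic episodes
    sbe: int  # subjective bulimic episodes
    icb: int  # inappropriate compensatory behavior episodes


# Thresholds follow the strict / moderate / broad definitions in the text.
MODELS = {
    "strict": {"include_sbe": False, "min_per_week": 2.0},
    "moderate": {"include_sbe": False, "min_per_week": 1.0},
    "broad": {"include_sbe": True, "min_per_week": 2.0},
}


def meets_bn_criteria_a_to_c(counts: EdeCounts, model: str) -> bool:
    """Check binge eating and compensatory behavior frequency for one model."""
    spec = MODELS[model]
    binge_episodes = counts.obe + (counts.sbe if spec["include_sbe"] else 0)
    binge_per_week = binge_episodes / WEEKS_IN_EDE_WINDOW
    icb_per_week = counts.icb / WEEKS_IN_EDE_WINDOW
    return (binge_per_week >= spec["min_per_week"]
            and icb_per_week >= spec["min_per_week"])


# Hypothetical patient: 5 OBEs, 6 SBEs, 9 compensatory episodes in 28 days.
patient = EdeCounts(obe=5, sbe=6, icb=9)
for name in MODELS:
    print(name, meets_bn_criteria_a_to_c(patient, name))
```

This invented patient falls below threshold under the strict definition but meets it under both the moderate and the broad definitions, which mirrors how rates of BN in a sample can move as the criteria are modified.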

Outcome

For AN, results were categorized according to revised Morgan-Russell criteria for AN outcome,45 with “good” outcome defined as weight restoration (>85% ideal body weight) plus resumption or onset of menses, “intermediate” outcome defined as weight restoration in the absence of menses, and “poor” outcome defined as neither weight restoration nor resumption or onset of menses. Post-treatment menstrual status was defined as present if one or more consecutive periods had been initiated (for cases of primary amenorrhea) or had resumed (in instances of secondary amenorrhea) without oral contraceptives. Multiple consecutive periods were not required since irregular menses is common among the general adolescent population. For males with AN, Morgan-Russell outcome was defined on the basis of weight status alone. For the receiver operating characteristic (ROC) analyses (see Data Analysis below), which require a binary outcome, intermediate and good categories were combined to represent a weight restored group versus a poor outcome group who failed to gain to above 85% IBW. We also analyzed two stricter outcome criteria: weight at 90% IBW at outcome, and weight at 95% IBW at outcome. In addition, we reanalyzed the data with a stricter binary outcome by combining the poor and intermediate Morgan-Russell groups (versus good), with abstinence from binge eating and purging added as additional criteria for good outcome. This modification to Morgan-Russell criteria as applied in recent research is important since over time, a percentage of adolescents with restricting AN presentations develop bulimic symptoms, and ultimately BN, during the course of illness.
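The AN outcome rules above can be written as a short decision function. This is a sketch rather than the scoring procedure actually used: the function names are hypothetical, a weight-restored male is scored as "good" on the assumption that "weight status alone" means exactly that, and a patient with menses but without weight restoration (a combination the definitions do not address) is assumed to be "poor".

```python
from typing import Optional


def morgan_russell_category(pct_ibw: float, menses_present: Optional[bool]) -> str:
    """Categorize AN outcome; pass menses_present=None for males."""
    weight_restored = pct_ibw > 85.0
    if not weight_restored:
        # ASSUMPTION: no weight restoration is scored as poor regardless of menses.
        return "poor"
    if menses_present is None:
        # ASSUMPTION: males are scored on weight status alone.
        return "good"
    return "good" if menses_present else "intermediate"


def binary_outcome(category: str, strict: bool = False, abstinent: bool = True) -> str:
    """Collapse the three categories into the two binary outcomes described.

    Default: good + intermediate versus poor (weight restored or not).
    strict=True: good versus poor + intermediate, where good also requires
    abstinence from binge eating and purging.
    """
    if strict:
        return "good" if (category == "good" and abstinent) else "poor"
    return "weight_restored" if category in ("good", "intermediate") else "poor"


# Hypothetical end-of-treatment values.
for pct, menses in [(90.0, True), (90.0, False), (80.0, False), (88.0, None)]:
    category = morgan_russell_category(pct, menses)
    print(pct, menses, category, binary_outcome(category),
          binary_outcome(category, strict=True))
```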

For BN, consistent with the methods in the parent RCT BN study,36 results were categorized as remission (no OBE, SBE, or compensatory behavior for the previous 4 weeks), partial remission (no longer meeting entry criteria for the study, i.e., termination OBE + purging frequency less than once per week), or no remission. For the ROC analyses (see Data Analysis below), full and partial remission categories were combined to represent a good outcome group who no longer met the study criteria for a BN-spectrum diagnosis versus a poor outcome group who at a minimum still met entry criteria for the study. We also analyzed full remission only as a stricter outcome criterion.
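The BN outcome categories can be sketched in the same way. The names below are hypothetical, the counts refer to the final 4 weeks of treatment, and treating "less than once per week" as a 4-week average of combined OBE and purging episodes below 1.0 is our assumption about the arithmetic.

```python
def bn_outcome(obe: int, sbe: int, icb: int, purge: int) -> str:
    """Categorize BN outcome from counts over the previous 4 weeks.

    icb counts all inappropriate compensatory behaviors; purge is the
    subset of those episodes that are purging.
    """
    if purge > icb:
        raise ValueError("purging episodes cannot exceed total compensatory episodes")
    if obe == 0 and sbe == 0 and icb == 0:
        return "remission"
    # ASSUMPTION: weekly frequency is the 4-week count divided by 4.
    if (obe + purge) / 4 < 1.0:
        return "partial remission"
    return "no remission"


def bn_binary_outcome(category: str, strict: bool = False) -> str:
    """Default: remission + partial remission versus no remission.
    strict=True: full remission only counts as a good outcome."""
    if strict:
        return "good" if category == "remission" else "poor"
    return "good" if category != "no remission" else "poor"


# Hypothetical end-of-treatment counts.
for counts in [(0, 0, 0, 0), (1, 2, 2, 1), (4, 0, 6, 5)]:
    category = bn_outcome(*counts)
    print(counts, category, bn_binary_outcome(category),
          bn_binary_outcome(category, strict=True))
```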

Data Analysis

Data analysis was conducted using a standard statistical software package (SPSS 17.0). The percentage of patients meeting each DSM-IV criterion reference point or modification was calculated for AN and BN. The predictive values of the continuous criteria were evaluated within a series of receiver operating characteristic (ROC) curves. This methodology yields a plot of sensitivity against 1 − specificity at each level of the predictor, and these coordinates can then be plotted to demonstrate the value of the measure against other measures. The area under the curve (AUC) provides a single overall measure of accuracy, and the coordinates can be examined to identify the point or threshold of the predictor where the greatest degree of sensitivity and specificity is achieved. The significance of the AUC is in reference to chance level prediction, or an AUC of 0.5, shown as the reference line in an ROC curve. ROC analyses were conducted to determine which of the four AN Criterion A (underweight) methods had the greatest AUC, and which specific weight threshold yielded the maximum combination of sensitivity and specificity in predicting outcome. Similarly, for BN, ROC curves were calculated to determine whether OBEs, SBEs, or OBEs plus SBEs had the greatest AUC, and which specific binge eating and inappropriate compensatory behavior frequency cutoffs yielded the maximum combination of sensitivity and specificity in predicting outcome.
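For readers less familiar with ROC analysis, the following minimal sketch shows the two quantities described: the AUC and the cutoff with the best combined sensitivity and specificity. It is not the SPSS procedure used in the study. The data are invented, the function names are ours, cutoffs are taken at observed values rather than at midpoints between them, and operationalizing "maximum combination" as Youden's J (sensitivity + specificity − 1) is an assumption. Lower scores are treated as positive, as with a low %IBW predicting poor outcome.

```python
from typing import List, Tuple


def auc_lower_is_positive(pos: List[float], neg: List[float]) -> float:
    """AUC as the probability that a poor-outcome case scores lower than a
    good-outcome case, with ties counted as one half."""
    wins = 0.0
    for p in pos:
        for n in neg:
            if p < n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def roc_coordinates(pos: List[float], neg: List[float]) -> List[Tuple[float, float, float]]:
    """(cutoff, sensitivity, specificity) for the rule 'positive if <= cutoff'."""
    coords = []
    for cutoff in sorted(set(pos) | set(neg)):
        sensitivity = sum(s <= cutoff for s in pos) / len(pos)
        specificity = sum(s > cutoff for s in neg) / len(neg)
        coords.append((cutoff, sensitivity, specificity))
    return coords


def best_cutoff(coords: List[Tuple[float, float, float]]) -> Tuple[float, float, float]:
    """ASSUMPTION: 'maximum combination' is taken to mean the largest Youden's J."""
    return max(coords, key=lambda c: c[1] + c[2] - 1.0)


# Invented baseline %IBW values, grouped by end-of-treatment outcome.
poor_outcome = [76.0, 79.0, 81.0, 84.0]
good_outcome = [80.0, 83.0, 86.0, 88.0, 90.0, 93.0]

auc = auc_lower_is_positive(poor_outcome, good_outcome)
cutoff, sens, spec = best_cutoff(roc_coordinates(poor_outcome, good_outcome))
print(f"AUC = {auc:.3f}; best cutoff <= {cutoff} "
      f"(sensitivity = {sens:.3f}, specificity = {spec:.3f})")
```

An AUC of 0.5 would correspond to the chance reference line; the further the value lies above 0.5, the better the baseline measure separates the two outcome groups.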

Finally, logistic regressions were conducted for AN and BN to assess the significance of each individual criterion and its unique contribution to the overall prediction of outcome. In addition, the sensitivity and specificity of each criterion in predicting clinical outcome were calculated, and the incremental predictive value of each diagnostic criterion across strict and broader definitions of AN and BN was inferred from the results of the regression analysis. Specifically, for AN, three models were examined: one that incorporated the stringent EDE diagnostic threshold of 4 or greater for the psychological items, a second that applied the relaxed EDE threshold of 1 or greater, and a third that included data from the clinical interview (Table 1). The AN Criterion A (underweight) method entered in these logistic regressions was determined by the results of the ROC analysis (i.e., which method of establishing weight status had the greatest AUC). For BN, three models were also examined (Table 2).
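The sensitivity, specificity, and total percent correct reported for each regression step can all be derived from a classification table of predicted versus observed outcome. The sketch below shows that arithmetic only; it does not fit a logistic regression, the function name is hypothetical, and the ten cases are invented.

```python
from typing import Sequence, Tuple


def classification_summary(actual_poor: Sequence[bool],
                           predicted_poor: Sequence[bool]) -> Tuple[float, float, float]:
    """Return (sensitivity, specificity, total percent correct)."""
    if len(actual_poor) != len(predicted_poor):
        raise ValueError("inputs must have the same length")
    pairs = list(zip(actual_poor, predicted_poor))
    tp = sum(1 for a, p in pairs if a and p)          # poor, predicted poor
    fn = sum(1 for a, p in pairs if a and not p)      # poor, predicted good
    tn = sum(1 for a, p in pairs if not a and not p)  # good, predicted good
    fp = sum(1 for a, p in pairs if not a and p)      # good, predicted poor
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one case in each outcome group")
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    total_percent_correct = 100.0 * (tp + tn) / len(pairs)
    return sensitivity, specificity, total_percent_correct


# Invented example: 2 poor-outcome and 8 good-outcome patients.
actual = [True, True] + [False] * 8
predicted = [True, False] + [False] * 7 + [True]
sens, spec, pct = classification_summary(actual, predicted)
print(f"sensitivity = {sens:.2f}, specificity = {spec:.3f}, "
      f"total percent correct = {pct:.1f}")
```

Because total percent correct is a weighted average of sensitivity and specificity, it is dominated by the larger outcome group; when adding a criterion leaves all three figures unchanged from the previous step, the added criterion has not changed any patient's predicted classification.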

Table 1. AN models

|                                                        | Strict | Moderate | Broad |
|--------------------------------------------------------|--------|----------|-------|
| Cutoff for EDE psychological diagnostic items (a, b)   | ≥4     | ≥1       | N/A   |
| Application of information from clinical interview (c) | No     | No       | Yes   |

Notes:
a. Criterion B = EDE fear of weight gain.
b. Criterion C = EDE importance of shape and/or importance of weight and/or feelings of fatness.
c. Psychological diagnostic criteria, including denial of seriousness of low weight, determined by incorporating patient self-report, additional informant (parent) report, direct clinical observation, and behavioral evidence of symptoms.

Table 2. BN models

|                                                        | Strict | Moderate | Broad     |
|--------------------------------------------------------|--------|----------|-----------|
| Cutoff for EDE psychological diagnostic items (a)      | ≥4     | ≥1       | N/A       |
| Application of information from clinical interview (b) | No     | No       | Yes       |
| EDE binge eating episode type                          | OBE    | OBE      | OBE + SBE |
| EDE binge eating episode frequency                     | ≥2×/wk | ≥1×/wk   | ≥2×/wk    |
| EDE inappropriate compensatory behavior frequency      | ≥2×/wk | ≥1×/wk   | ≥2×/wk    |

Notes: OBE = objective bulimic episodes; SBE = subjective bulimic episodes.
a. Criterion D = EDE importance of shape and/or importance of weight.
b. Psychological diagnostic criteria determined by incorporating patient self-report, additional informant (parent) report, direct clinical observation, and behavioral evidence of symptoms.

Results

Descriptive Comparisons

Figures 1 and 2 show the percentage of patients meeting each DSM-IV criterion at baseline for AN and BN (based on each individual criterion alone), respectively, depending on which reference point or methodology was applied. For AN Criterion A, rates of AN (i.e., the percent meeting the DSM-IV diagnostic cutoff for low weight, separate from the other diagnostic criteria for AN) ranged from 27% to 66% depending on which reference point was applied; across the psychological criteria (B and C), rates ranged from 38% to 100%. For BN Criteria A-C, rates of BN ranged from 54% to 92% depending on the types of eating behaviors and frequency thresholds applied; for BN Criterion D, rates of BN ranged from 88% to 100%. Values within each method total less than 100% in cases of missing data.

Figure 1.

Percentage of patients meeting the threshold for anorexia nervosa (AN) versus eating disorder not otherwise specified (EDNOS, or subthreshold AN)—based on each DSM-IV criterion alone—as a function of alterations in reference points for Criterion A (A), Criterion B (B), and Criterion C (C). NCHS: National Center for Health Statistics; BMI: Body Mass Index; EDE: Eating Disorder Examination.

Figure 2.

Percentage of patients meeting the threshold for bulimia nervosa (BN) versus eating disorder not otherwise specified (EDNOS, or subthreshold BN)—based on each DSM-IV criterion alone—as a function of alterations in reference points for Criterion A-C (A) and Criterion D (B). OBE: Objective Bulimic Episodes; SBE: Subjective Bulimic Episodes; ICB: Inappropriate Compensatory Behaviors; EDE: Eating Disorder Examination.

Sensitivity and Specificity Analyses: AN

Figure 3 and Table 3 display the ROC curve results for the four methods of determining Criterion A for AN. Percent IBW (using the weight corresponding to the 50th percentile BMI-for-age according to newer NCHS norms as a proxy for ideal) had the greatest AUC, although very similar to BMI adjusted for age, with a cutoff of 81.56 (sensitivity = 0.727, specificity = 0.821) for predicting poor outcomes from treatment. Results from the logistic regression analyses for AN (Table 4) showed that for each of the three models, Criterion A yielded the best predictive validity, with Criteria B-D providing no significant incremental value. Analyses with 90% IBW and 95% IBW as outcome variables, and with binary outcome redefined as poor + intermediate versus good Morgan-Russell groups (with abstinence from binge eating and purging added as additional criteria for good outcome), all showed identical patterns of results, which are available on request.

Figure 3.

ROC curve for anorexia nervosa Criterion A with four reference points for ideal body weight (IBW). Diagonal segments are produced by ties.

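Table 3. AN ROC curve results

| Criterion              | Area | Sig  | 95% CI Lower Bound | 95% CI Upper Bound | Positive if ≤ | Sensitivity | Specificity |
|------------------------|------|------|--------------------|--------------------|---------------|-------------|-------------|
| % IBW-older norms      | .789 | .002 | .687               | .891               | 83.46         | .909        | .632        |
| % IBW-newer norms      | .847 | .000 | .751               | .944               | 84.73         | .909        | .358        |
| BMI                    | .763 | .004 | .638               | .888               | 16.25         | .818        | .321        |
| BMI-for-age percentile | .830 | .000 | .738               | .922               | 9.35          | .909        | .358        |

Area, Sig, and the 95% CI bounds describe the area under the curve; Positive if ≤, Sensitivity, and Specificity are coordinates of the curve.
Notes: IBW = ideal body weight; BMI = body mass index.
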
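Table 4. AN logistic regression results

| Model | Step | Sensitivity | Specificity | Total Percent Correct |
| --- | --- | --- | --- | --- |
| Model 1 | Step 1 (Criterion A) | .50 | .92 | 91.6 |
| Model 1 | Step 2 (Criterion B) | .50 | .92 | 91.6 |
| Model 1 | Step 3 (Criterion C) | .50 | .92 | 91.6 |
| Model 1 | Step 4 (Criterion D) | .50 | .92 | 91.6 |
| Model 2 | Step 1 (Criterion A) | .50 | .92 | 91.2 |
| Model 2 | Step 2 (Criterion B) | .50 | .92 | 91.2 |
| Model 2 | Step 3 (Criterion C) | .50 | .92 | 91.2 |
| Model 2 | Step 4 (Criterion D) | .50 | .92 | 91.2 |
| Model 3 | Step 1 (Criterion A) | .50 | .92 | 91.6 |
| Model 3 | Step 2 (Criterion B) | .50 | .92 | 91.6 |
| Model 3 | Step 3 (Criterion C) | .00 | .92 | 91.6 |
| Model 3 | Step 4 (Criterion D) | .00 | .92 | 91.6 |

Notes: Model 1: stringent EDE diagnostic thresholds; model 2: relaxed EDE diagnostic thresholds; model 3: clinical interview.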
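
For readers less familiar with the three columns reported for the logistic regression models, the sketch below shows how sensitivity, specificity, and total percent correct follow from a 2 x 2 classification table. The counts are invented and are not meant to reproduce any row of the table; the function name is hypothetical.

```python
def classification_metrics(tp, fn, tn, fp):
    """Summarize a 2 x 2 classification table.

    tp: poor-outcome cases predicted as poor outcome
    fn: poor-outcome cases predicted as good outcome
    tn: good-outcome cases predicted as good outcome
    fp: good-outcome cases predicted as poor outcome
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    total_percent_correct = 100.0 * (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, total_percent_correct


# Hypothetical counts, invented for illustration only.
sens, spec, pct = classification_metrics(tp=6, fn=4, tn=27, fp=3)
print(f"sensitivity = {sens:.2f}, specificity = {spec:.2f}, correct = {pct:.1f}%")
```

Because total percent correct weights each outcome group by its size, it can remain high even when sensitivity is low, provided the positive group is small.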

Sensitivity and Specificity Analyses: BN

Figure 4 and Table 5 display the ROC curve results for the frequency of binge eating (OBEs alone, SBEs alone, or OBEs plus SBEs) and inappropriate compensatory mechanism episodes for BN. None of the measures had a significant AUC, indicating that they were no better than chance at predicting remission at the end of treatment. Results from the logistic regression analyses for BN (Table 6) showed that for each of the three models, none of the indicators were particularly robust. However, twice-per-week OBEs did perform better than once-per-week OBEs or the combination of OBEs and SBEs. Criteria B-D provided no significant incremental value. Analyses with full remission only as the outcome variable showed an identical pattern of results, which are available on request.
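
One informal way to read "no better than chance" is to check whether the 95% confidence interval for each AUC includes 0.5, the value expected from a marker with no discriminative ability. The sketch below applies that check to the interval bounds reported in Table 5; the helper name is hypothetical, and the interval check is only an approximate stand-in for the significance tests reported in the table.

```python
# 95% CI bounds (lower, upper) for the AUC of each BN measure, from Table 5.
bn_auc_ci = {
    "ICB frequency": (0.497, 0.755),
    "OBE frequency": (0.487, 0.745),
    "SBE frequency": (0.399, 0.669),
    "OBE + SBE frequency": (0.494, 0.751),
}


def excludes_chance(lower, upper, chance=0.5):
    """Return True when the interval lies entirely above or below chance."""
    return lower > chance or upper < chance


better_than_chance = {
    name: excludes_chance(lower, upper) for name, (lower, upper) in bn_auc_ci.items()
}
for name, flag in better_than_chance.items():
    print(f"{name}: interval excludes 0.5 -> {flag}")
```

All four intervals span 0.5, which is consistent with the nonsignificant AUCs described above.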

Figure 4.

ROC curve for bulimia nervosa Criteria A-C, specifically frequency of inappropriate compensatory behaviors (ICB), frequency of objective bulimic episodes (OBE), frequency of subjective bulimic episodes (SBE), and frequency of OBE + SBE. Diagonal segments are produced by ties.

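Table 5. BN ROC curve results

| Criterion | Area | Sig | 95% CI Lower Bound | 95% CI Upper Bound | Positive if ≤ | Sensitivity | Specificity |
| --- | --- | --- | --- | --- | --- | --- | --- |
| ICB frequency | .626 | .069 | .497 | .755 | 81.50 | .935 | .244 |
| OBE frequency | .616 | .093 | .487 | .745 | 33.50 | .935 | .317 |
| SBE frequency | .534 | .625 | .399 | .669 | 16.00 | .839 | .317 |
| OBE + SBE frequency | .622 | .077 | .494 | .751 | 39.50 | .935 | .634 |

Notes: ICB = inappropriate compensatory behaviors; OBE = objective bulimic episodes; SBE = subjective bulimic episodes. The Area, Sig, and 95% CI columns describe the area under the curve; the last three columns are coordinates of the curve.
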
Table 6. BN logistic regression results

| Model | Step | Sensitivity | Specificity | Total Percent Correct |
| --- | --- | --- | --- | --- |
| Model 1 | Step 1 (Criteria A/C) | .68 | .56 | 62.3 |
| Model 1 | Step 2 (Criteria B/C) | .74 | .59 | 66.2 |
| Model 1 | Step 3 (Criterion D) | .74 | .59 | 66.2 |
| Model 2 | Step 1 (Criteria A/C) | .59 | .50 | 57.1 |
| Model 2 | Step 2 (Criteria B/C) | .60 | .71 | 61.0 |
| Model 2 | Step 3 (Criterion D) | .62 | .78 | 63.6 |
| Model 3 | Step 1 (Criteria A/C) | .59 | .00 | 58.8 |
| Model 3 | Step 2 (Criteria B/C) | .62 | .71 | 62.5 |
| Model 3 | Step 3 (Criterion D) | .62 | .71 | 62.5 |

Notes: Model 1: stringent criteria; model 2: moderately flexible criteria; model 3: broadly flexible criteria.

Discussion

Results show a high degree of variability in the baseline diagnostic profiles of eating disorders among a sample of treatment-seeking youth as a function of the information used to inform each DSM-IV criterion. This pattern was markedly more pronounced in AN than in BN, likely reflecting both a greater range of symptom presentation in AN and greater minimization, on the EDE, of symptoms that would be better captured in the clinical interview with multiple informants and more flexible symptom ascertainment methods. Across AN and BN, the diagnostic variability found in this study is likely to be a clinical phenomenon seen in adults as well, but one that may be exacerbated in children and adolescents given an arguably increased tendency for denial of illness,46 a more limited cognitive capacity to directly endorse abstract psychological symptoms20 such as over-valuation of shape and weight, greater difficulty establishing norms regarding weight and menstrual status against which to determine psychopathology,10 and the nature of symptom progression (e.g., increasing size and frequency of binge eating episodes, decreasing weight) at stages of development during which eating disorders tend to be in evolution.26 Given that adolescence represents a high-risk period of onset for AN and BN, finding methods to increase early identification and treatment during this age range is key.
Researchers and clinicians have suggested both broadening the criteria for eating disorders among children and adolescents, thereby lowering the threshold for diagnosis, and increasing allowances for atypical symptom manifestations in youth, which would presumably improve the developmental sensitivity of the DSM criteria without necessarily altering the threshold for diagnosis.3 In other words, the criteria themselves could be modified for youth (e.g., by lowering the binge-purge frequency criterion for BN), or the criteria could remain intact while permitting age-specific manifestations, e.g., in the form of behavioral indicators. Results of this study support aspects of each approach, with methodology that speaks to the specific mission of DSM-IV as outlined in Walsh.2

Based on data from the sample in the current study, for AN, the newer reference points for normal weight, which are derived from current, population-based, and developmentally relevant data, prove the most useful in calculating current weight as a function of ideal weight. In addition, a threshold similar to the suggested cutoff in DSM-IV for normal weight (85% IBW) appears to maximize predictive validity and, in turn, clinical utility. While, consistent with our hypothesis, the additional criteria for AN do not seem to add incremental value, this does not imply that weight status alone can be used to determine a diagnosis of AN in adolescence; underweight or excessive weight loss was examined in the context of a sample referred for clinic- or research-based treatment, refusing to maintain a normal weight (per DSM-IV Criterion A wording), and meeting all psychological criteria for AN when a multiple-informant method was used that also permitted behavioral indicators of Criteria B and C. This converges with previous findings of low positive predictive values of the individual criteria for AN for a concurrent diagnosis,47 as well as growing evidence of the value of multiple informants in eating disorder diagnosis.22, 23 Therefore, other than Criterion D (amenorrhea), results of this study support current DSM-IV criteria for AN but strongly suggest a broader methodological scope in ascertaining the psychological criteria, similar to allowances in diagnosing anxiety disorders in younger individuals as outlined in DSM-IV.
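
To make the weight calculation concrete, the sketch below expresses current weight as a percentage of ideal body weight, taking the weight that corresponds to the 50th percentile BMI-for-age as the proxy for ideal. The median BMI used here is a placeholder rather than a value read from a growth chart, the patient is invented, and the function names are hypothetical; the 85% figure is the DSM-IV suggested cutoff mentioned above.

```python
def ideal_body_weight_kg(median_bmi_for_age, height_m):
    """Weight (kg) at which BMI equals the 50th percentile BMI for age and sex."""
    return median_bmi_for_age * height_m ** 2


def percent_ibw(weight_kg, height_m, median_bmi_for_age):
    """Current weight as a percentage of the ideal-weight proxy."""
    return 100.0 * weight_kg / ideal_body_weight_kg(median_bmi_for_age, height_m)


# Hypothetical patient; median_bmi_for_age=20.0 is a placeholder that would
# in practice be looked up from age- and sex-specific growth chart data.
pct = percent_ibw(weight_kg=42.0, height_m=1.60, median_bmi_for_age=20.0)
below_dsm_iv_cutoff = pct < 85.0
print(f"{pct:.1f}% IBW, below 85% cutoff: {below_dsm_iv_cutoff}")
```

Since height cancels, the same figure can be obtained as the ratio of the patient's BMI to the median BMI for age, multiplied by 100.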

For BN, in contrast to AN, criteria were no better than chance in predicting outcome, regardless of how criteria or outcome were defined. There was a suggestion that, consistent with current DSM-IV BN criteria, objectively large binge eating episodes at a twice-per-week frequency may be superior to the other permutations. The difficulty in ascertaining predictive validity may be a function of both the type of symptoms characteristic of BN (i.e., primarily frequency-based criteria) and the population in this particular sample (adolescents). Specifically, the binge eating and inappropriate compensatory mechanism frequencies measured in BN can be unstable over time, not only as a function of response to treatment.48, 49 The instability of such frequency-based criteria may be more pronounced in younger patients for whom the disorder and behaviors therein may be either transient or in evolution, suggesting that adults with BN-spectrum presentations would be a better sample with which to assess predictive validity. Moreover, the EDNOS subset of the BN spectrum sample included adolescents who were purging but not engaging in objectively large binge eating at baseline. In AN, weight is a better measure to provide reliable change information in that even in the context of adolescence, treatment, or both, it is a relatively stable variable. Contrary to our hypothesis, purging frequency was not a robust predictor of outcome in BN. This underlines the continued controversy and uncertainty regarding which features of BN represent the true psychopathology (i.e., binge size, loss of control, or inappropriate compensatory behaviors). However, the nature of the BN sample, which had a mean age of 16.1 (SD = 1.6), must be taken into consideration when interpreting the results of this study, since these patients had, on average, not passed through the high-risk period for the development of full BN, namely late adolescence to early adulthood.

Overall, results of this study suggest that for AN, the existing Criterion A is appropriate for children and adolescents, provided that clinicians use developmentally sensitive methods for ascertaining normal weight (BMI-for-age reference points). This, in the context of active refusal to maintain a normal weight as well as multiple informants and behavioral indicators of the psychological aspects of AN, is sufficient to predict outcome. For BN, predictive validity could not be established, and it is unclear how the current DSM-IV criteria fare in this regard with younger patients. As noted above, this may be a function of the current sample's developmental stage relative to the high-risk period for the development of full BN. Results hold implications for the development of revised AN and BN criteria for the upcoming DSM-V, which faces the challenge of the optimal disposition of subsets of the broad EDNOS category,5, 28, 50 including the significant representation of clinically meaningful child and adolescent eating disorder presentations. Specific justifications and recommendations, consistent with findings from this study, for DSM-V criteria revisions with greater sensitivity to how eating disorder symptoms manifest across early developmental stages have recently been proposed.51 Limitations of this study include its primarily research-based treatment-seeking sample, which may compromise generalizability. In addition, for AN, the Morgan Russell outcome criteria did not take shape and weight concerns into account beyond weight and menses, possibly explaining the predictive significance of AN Criterion A in this study. Also, the high treatment response rates may have limited the power to fully evaluate the predictive validity of the diagnostic criteria, and similar studies are warranted in community-based delivery of treatment for adolescents with eating disorders.
Finally, it is important for studies to assess the longer-term predictive validity of eating disorder diagnostic criteria in youth as these patients pass from adolescence into adulthood, since the natural course of illness includes diagnostic crossover52 from subthreshold to threshold presentations and, importantly, from AN to BN over time and across high-risk periods of development.

The authors acknowledge Judy Beenhackker, Sarah Forsberg, and Kristen Hewell for their generous help with data acquisition and management.
