Studies of the reliability and validity of the Addiction Severity Index

Authors

  • Klaus Mäkelä

    Corresponding author
    1. Finnish Foundation for Alcohol Studies, Finland
      Klaus Mäkelä Laivurinkatu 43 A 21 00150 Helsinki 15 Finland
      E-mail: klaus.makela@helsinki.fi

ABSTRACT

Aims  To examine the reliability and validity of the widely used Addiction Severity Index (ASI).

Material  Thirty-seven studies of the psychometric performance of the ASI.

Findings  The inter-rater and test–retest reliabilities of the severity ratings and composite scores vary from excellent to unsatisfactory. High internal consistencies have been reported regularly for only three of the seven composite scores (medical status, alcohol use, psychiatric status). The remaining four composite scores (employment status, drug use, legal status, family/social relations) have low consistencies in at least four different studies. Coefficients of criterion validity are often low.

Conclusions  There is a discrepancy between the psychometric performance of the ASI and its purported clinical, administrative and research uses.

INTRODUCTION

The Addiction Severity Index (ASI) was developed some 20 years ago, primarily for research purposes, by A. Thomas McLellan and his associates at the Center for Studies of Addiction in Philadelphia (McLellan et al. 1992). In the United States it has been used widely in clinical settings and in treatment research. The ASI has also been translated into a number of languages and adapted for use in various national circumstances (Hendriks et al. 1989; Scheurich et al. 2000; CUS 2002). Under the Cost-A-6 Programme of the European Commission, a European version of the instrument (EuropASI) was developed in the early 1990s (Kokkevi & Hartgers 1995).

The ASI is a 45–60 minute semistructured interview administered by a trained clinician or interviewer, who must spend another 10–20 minutes scoring it. It elicits the respondent's self-reported problems in seven areas: physical health, employment and financial support, illegal or criminal activity, family and social relationships, psychiatric symptoms, drug use and alcohol use. Both current (i.e. ‘last 30 days’) and life-time experience of symptoms and treatment are assessed in each problem area. Subjective measures of symptom distress and desire for additional treatment are elicited. The clinical aim is to identify treatment needs in each functional domain (Grissom & Bragg 1991; Corse et al. 1995; Butler et al. 2001).

One of the important new features of the ASI was that it broadened the perspective of treatment and treatment research to cover alcohol- and drug-related problems in various areas of life in addition to narrowly defined patterns of drinking and drug-taking. Another important idea was to combine objective and subjective measures in treatment planning.

In the literature, the psychometric properties of the ASI are usually summarized in very positive and categorical terms. The following examples are from reports from the group in Philadelphia.

‘. . . the ASI has been shown to be reliable and valid among substance abusers applying for treatment . . .’ (McLellan et al. 1992).

The ASI:

‘has been shown to be reliable and valid, when correctly administered, in a wide variety of clinical populations and treatment settings’ (Carise et al. 2001).

Independent authors often use even less qualified formulations:

‘The ASI has been found to be reliable and valid across clients of varying demographic features and problems’ (Wertz et al. 1995).

‘Reliability and validity studies were conducted on substance-abusing populations with very satisfactory results’ (Kokkevi & Hartgers 1995).

This paper presents a review of the psychometric properties of the ASI. The review is based on the following sources:

  1. Records found in a search of PubMed at the United States National Library of Medicine using the terms ‘addiction severity index’ and (validity or reliability or sensitivity or specificity). If the abstract included information on the psychometric performance of the ASI, the full report was studied. Twenty-nine relevant studies were identified.
  2. Five additional reports included in the list of publications at the home page of the Treatment Research Institute in Philadelphia (Treatment Research Institute 2002).
  3. Three additional publications referred to in the two previous groups of reports and presenting data on the properties of the ASI.

Altogether, 37 studies reporting empirical data on the psychometric performance of the ASI are included in the review.

ORIGINAL ASI SUMMARY INDICES

The designers of the ASI developed two summary indices for each problem area—interviewer severity ratings (ISRs) and composite scores (CSs). In the ISRs, interviewers follow a complicated procedure to establish a rating of the severity of ‘need for additional treatment’ in each area. A first estimate is made on the basis of quantitative information provided by the respondent. The critical items reflect, to a large extent, life-time problems. The manual instructs the interviewer to determine a 2–3-point severity range along a 10-point scale. Next, the interviewer has to refine this estimate using the patient's subjective ratings of the current importance of the problem and the need for treatment (Alterman et al. 1994).

The composite scores are summary scores of a defined set of items in each area. The composite scores only include items that are subject to change (occurrence in past 30 days or during the follow-up period).

The number of items included in each composite score varies from three (medical status) to 13 (family/social relationships). With the exception of the employment area, all composite indices include two items measuring how ‘troubled or bothered’ the respondent has been by problems in each area in the past 30 days and how important it is for the client to receive treatment, counselling or referral for problems.

In EuropASI, modifications have been made in the method of calculating the composite scores, but they still include both purposefully subjective and purportedly objective questions (Koeter & Hartgers 1997; Jansson 2001). In the Swedish version, the calculation of composite scores follows the American model (Andréasson et al. 1999; Jansson 2001).

STANDARDS OF RELIABILITY

A satisfactory level of reliability depends on how a measure is used. According to Nunnally & Bernstein (1994, p. 265), ‘increasing reliabilities much beyond 0.80 in basic research is often wasteful of time and money’, but ‘a reliability of 0.80 may not be nearly high enough in making decisions about individuals’.

An additional complication is that different authors use different coefficients to describe various aspects of the reliability of the ASI. There are no clear criteria for which threshold values of various indicators of reliability correspond to each other.

Any threshold values are bound to be arbitrary. In this review, values below 0.70 of test–retest correlation coefficients and of Cronbach's alpha and values below 0.60 of intraclass correlation coefficients are regarded as problematic (cf. Landis & Koch 1977; Dunn 1989; 37; Bravo & Potvin 1991).

INTER-RATER RELIABILITY

Inter-rater reliabilities have been presented both for the severity ratings and the composite scores. The technical details vary from one study to the next. In some studies the interviews were observed via a one-way mirror by other interviewers, who completed the ASI form and the severity ratings; in others, videotapes of live or role-played interviews were used. The resulting inter-rater reliabilities are upper estimates, because the effect of the primary interviewer on the responses is held constant.

Initially, high inter-rater reliabilities were obtained for the severity ratings (McLellan et al. 1980). High reliabilities were also obtained in a longitudinal study with very intensive training and monitoring of interviewers (Stöffelmayr et al. 1994). In other studies, inter-rater reliabilities that remain below the threshold values specified in the previous section have been reported in several problem areas (Hodgins & El-Guebaly 1992; Alterman et al. 1994; Wertz et al. 1995; Zanis et al. 1997). There is some variation between studies in the problem areas that obtain unsatisfactory reliabilities, but the employment, drugs, family/social and psychiatric areas seem to be particularly unstable.

The role of subjective judgement is smaller in the composite scores than in the severity ratings. As a consequence, inter-rater reliabilities of the composite scores have been more consistently high.

TEST–RETEST RELIABILITY

The originators of the ASI reported good test–retest reliabilities (McLellan et al. 1985).

In a study of a homeless sample, the Spearman–Brown test–retest coefficient of the employment severity rating was 0.51 and that of the family severity rating 0.60 (Zanis et al. 1994).

In a study at nine sites of homeless people with substance abuse problems, subjects were reinterviewed approximately 1 week after their initial interview (Drake et al. 1995). Just over half were reinterviewed by the same interviewer. As shown by the intraclass correlation coefficients presented in Table 1, some ASI scales showed high variability across sites in test–retest reliability.

Table 1.  Test–retest reliabilities of ASI composite scores (intraclass correlation coefficients).

                                          Site variation
                           Whole sample   Lowest value   Highest value
Medical status                 0.64           0.26           0.84
Employment status              0.82           0.45           0.94
Alcohol use                    0.86           0.53           0.97
Drug use                       0.83           0.49           0.93
Legal status                   0.78           0.08           0.92
Family/social relations        0.64           0.03           0.86
Psychiatric status             0.71           0.38           0.85

Source: Drake et al. (1995).
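
The threshold stated in the section on standards of reliability (intraclass correlation coefficients below 0.60 regarded as problematic) can be applied mechanically to the figures in Table 1. The following is a minimal sketch of that check; the names `ICC_THRESHOLD`, `TABLE_1` and `problematic` are illustrative and not part of any ASI software, and the values are copied from Drake et al. (1995) as reported above.

```python
# Threshold from the review's section on standards of reliability:
# intraclass correlation coefficients below 0.60 are regarded as problematic.
ICC_THRESHOLD = 0.60

# Table 1 (Drake et al. 1995): (whole sample, lowest site, highest site).
TABLE_1 = {
    "Medical status": (0.64, 0.26, 0.84),
    "Employment status": (0.82, 0.45, 0.94),
    "Alcohol use": (0.86, 0.53, 0.97),
    "Drug use": (0.83, 0.49, 0.93),
    "Legal status": (0.78, 0.08, 0.92),
    "Family/social relations": (0.64, 0.03, 0.86),
    "Psychiatric status": (0.71, 0.38, 0.85),
}


def problematic(values, threshold=ICC_THRESHOLD):
    """Return the names of the scales whose coefficient is below the threshold."""
    return [name for name, value in values.items() if value < threshold]


whole_sample = {name: row[0] for name, row in TABLE_1.items()}
lowest_site = {name: row[1] for name, row in TABLE_1.items()}

flagged_whole = problematic(whole_sample)
flagged_lowest = problematic(lowest_site)

print("Whole sample, below threshold:", flagged_whole)
print("Lowest site, below threshold:", flagged_lowest)
```

On these figures no whole-sample coefficient falls below the threshold, whereas the lowest site value does so for every composite score, which is the site variability the text describes.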

In a study of people with co-occurring severe mental illness and substance use disorders, the test–retest reliability of the composite scores was described as ‘good’ in the domains of family, alcohol, employment and psychiatric status, but the medical, drug and legal domains had ‘poor’ reliability (Corse et al. 1995).

In a study of alcohol-dependent patients in French-speaking Switzerland, the 10-day test–retest reliability of the composite scores varied between 0.71 and 0.95 (Daeppen et al. 1996).

In a study of the 6-day test–retest reliability of selected ASI items in a sample of detoxification clients, ‘approximately 24% of the items exhibited unacceptable levels of instability over time’ (Joyner et al. 1996). Examples include ‘days of medical problems’ [intraclass correlation coefficient (ICC) = −0.01], ‘income from mate, family, friends’ (ICC = 0.26), ‘income from panhandling’ (ICC = 0.15), ‘troubled by alcohol problems’ (kappa coefficient = 0.23), ‘troubled by family problems’ (kappa = −0.12), ‘experienced serious anxiety or tensions’ (kappa = 0.10) and ‘experienced serious depression’ (kappa = −0.28).

In a study of clients with severe mental illness, three composite scores had Pearson test–retest coefficients below 0.40: medical, drug use and legal (Zanis et al. 1997).

In a study of prison inmates, the test–retest reliabilities of the ASI composite scores for alcohol and drug use were close to 0.80, but somewhat lower than those of the Alcohol Dependence Scale (ADS), the Drug Abuse Screening Test (DAST) and the Michigan Alcoholism Screening Test (MAST) (Peters et al. 2000).

As shown by the figures presented in this section, the short-term test–retest reliabilities reported in the literature vary from excellent to unsatisfactory. Most of the problematic values come from studies of such special populations as mental patients or homeless clients.

INTERNAL CONSISTENCY

As seen in Table 2, regularly high internal consistencies have been reported for only three of the seven composite scores (medical status, alcohol use, psychiatric status). The remaining four composite scores (employment status, drug use, legal status, family/social relations) have low consistencies in at least four different studies.

Table 2.  Internal consistencies (Cronbach's alpha) of ASI composite scores.*

                           A  B  C  D  E  F  G  H  I  J  K  L

Medical status             0.81  0.88  0.77  0.93  0.73  0.76  0.89  0.91  0.86  0.86  0.89  0.93
Employment status          0.58  0.69  0.63  0.50  0.68  0.64  0.70  0.87  0.71  0.65  0.69
Alcohol use                0.92  0.74  0.87  0.87  0.46  0.84  0.87  0.75  0.87  0.74  0.84  0.91
Drug use                   0.73  0.58  0.62  0.70  0.79  0.77  0.77  0.64  0.69  0.71
Legal status               0.71  0.48  0.66  0.81  0.70  0.64  0.75  0.62  0.69  0.65  0.74
Family/social relations    0.73  0.64  0.72  0.52  0.78  0.52  0.75  0.52  0.72  0.54  0.74  0.71
Psychiatric status         0.79  0.74  0.87  0.89  0.80  0.81  0.83  0.81  0.83  0.85  0.84  0.77

  • * Coefficients below 0.70 are shown in bold type.
  • A Patients admitted to a clinical detoxification centre (Hendriks et al. 1989).
  • B Patients at an out-patient dual diagnosis clinic (Hodgins & El-Guebaly 1992).
  • C Methadone maintenance patients (Alterman et al. 1994).
  • D Homeless substance users awaiting temporary housing placement (Zanis et al. 1994).
  • E Alcohol-dependent patients admitted to a Dutch centre for the treatment of drug and alcohol addicts (DeJong et al. 1995).
  • F Alcohol-dependent patients of four treatment institutions in French-speaking Switzerland (Daeppen et al. 1996).
  • G Patients admitted to a public mental hospital in the United States (Appleby et al. 1997).
  • H Women enrolled in or applying for substance abuse treatment (Comfort et al. 1999).
  • I Veterans entering substance abuse treatment in a United States Department of Veteran Affairs medical centre (Rosen et al. 2000).
  • J Male in-patients requesting treatment for alcohol dependence or alcohol abuse (Scheurich et al. 2000). In addition to the original American composite scores, Scheurich and his associates calculated modified summary indices with the following internal consistencies: income, 0.92; satisfaction with work situation, 0.88; drug use I, 0.76; drug use II, 0.77; family relations, 0.77; other social relations, 0.72.
  • K Patients at inner-city alcohol and drug abuse clinics (Leonhard et al. 2000).
  • L Methadone maintenance patients entering treatment (Bovasso et al. 2001).

There are 22 coefficients below 0.70 in Table 2. Eight come from European studies, five from American studies of homeless or dual diagnosis populations and nine from American studies of other populations.

The value of Cronbach's alpha increases directly with the number of items in the scale. A relatively high internal consistency, therefore, does not prove that the scale is highly homogeneous, as can be seen from the examples presented in Table 3. The inter-item correlations of the drug scale are particularly low.

Table 3.  Number of items, Cronbach's alpha and mean inter-item correlation of selected ASI composite scores in a study of methadone maintenance patients.

                           No. of items   Cronbach's alpha   Mean inter-item correlation
Employment status                4              0.63                    0.30
Drug use                        10              0.62                    0.11
Legal status                     5              0.66                    0.28
Family/social relations         13              0.73                    0.35

Source: Alterman et al. (1994).
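
The dependence of alpha on scale length can be illustrated with the standardized form of Cronbach's alpha, which is computed from the number of items k and the mean inter-item correlation r as k·r / (1 + (k − 1)·r). This is a sketch under a stated assumption: the standardized form treats all items as having equal variance, so it only approximates the raw alpha values that studies usually report. The function name `standardized_alpha` is illustrative. The mean correlation of 0.11 is the value reported for the drug scale by Alterman et al. (1994); the item counts in the loop are hypothetical.

```python
def standardized_alpha(n_items: int, mean_r: float) -> float:
    """Standardized Cronbach's alpha from the number of items and the
    mean inter-item correlation (assumes equal item variances)."""
    return n_items * mean_r / (1 + (n_items - 1) * mean_r)


# Hold the mean inter-item correlation fixed at a low value (0.11) and
# vary only the number of items (hypothetical scale lengths).
for k in (5, 10, 20, 40):
    print(f"{k:>2} items: alpha = {standardized_alpha(k, 0.11):.2f}")
```

With the inter-item correlation held constant, alpha rises steadily as items are added, so a long scale can reach a conventionally acceptable alpha even when its items are only weakly related to each other.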

INDEPENDENCE OF PROBLEM AREAS

The designers of the ASI put much emphasis on the independence of the seven problem areas (McLellan et al. 1980). In line with this, considerable attention has been paid to the discriminant validity of the summary measures.

A test exhibits discriminant validity if it is highly correlated with a conceptually related standard measure and at the same time has low correlations with conceptually unrelated standard measures. The ASI studies are, however, restricted to a discussion of the intercorrelations of ASI problem areas. The intercorrelations are usually low, but there are exceptions. The drug measures and the legal measures are sometimes moderately correlated (Appleby et al. 1997), and the same is true for the psychiatric and family/social measures (Bilal 1988; Hendriks et al. 1989; Zanis et al. 1994; Appleby et al. 1997).

Whether any two variables, for example drug use and criminal activities, are positively related obviously depends on historical circumstances. In many situations it is a desirable feature if the variables in a test battery are mutually independent, but this is usually not an important requirement, and correlations between subscales are not signs of inadequate validity. On the other hand, low correlations between subscales do not necessarily indicate discriminant validity but may reflect low reliability.
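
The point that low correlations may simply reflect low reliability follows from the classical attenuation relation: the correlation observed between two measures equals the correlation between their true scores multiplied by the square root of the product of their reliabilities. The sketch below illustrates this with hypothetical numbers; the function name `attenuated_correlation` and all input values are assumptions chosen for illustration, not figures from any ASI study.

```python
import math


def attenuated_correlation(true_r: float, rel_x: float, rel_y: float) -> float:
    """Expected observed correlation between two measures, given the
    correlation of their true scores and the reliability of each measure
    (classical test theory attenuation relation)."""
    return true_r * math.sqrt(rel_x * rel_y)


# Hypothetical case: two subscales whose true scores correlate 0.50.
for reliability in (0.90, 0.60, 0.40):
    observed = attenuated_correlation(0.50, reliability, reliability)
    print(f"reliability {reliability:.2f}: observed r = {observed:.2f}")
```

Under these assumptions the same underlying association looks progressively weaker as reliability falls, so a low intercorrelation between two unreliable subscales says little about whether the problem areas are independent.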

CRITERION VALIDITY

Criterion validity refers to the extent to which the measurement correlates with an external criterion of the phenomenon under study. Concurrent validity refers to situations where the measurement and the criterion refer to the same point in time. The measurement's predictive validity is expressed in terms of its ability to predict the criterion (Last 1995, p. 171).

Correlations between severity ratings and composite scores

The correlations between severity ratings and composite scores for the same problem area present one potential indicator of the concurrent validity of ASI measures (Table 4). Many of the correlations are on the low side, keeping in mind that each pair of measures covers the same problem area and is based in part on the same questions. Of particular interest are the last two columns of Table 4, which compare the association of severity ratings and composite scores in groups of interviewers with varying amounts of training (Alterman et al. 2001b). The correlations were somewhat higher in the group that had received intense training, but even in this group the association was weak in the legal and drug areas and particularly in the employment/support area.

Table 4.  Correlations between ASI severity ratings and the corresponding composite scores.*

                           A  B  C  D  E  F  G  H  I

Medical status             0.75  0.71  0.84  0.74  0.88  0.75  0.77
Employment status          0.11  0.08  0.24  0.72  0.13  0.49  0.15  0.16
Alcohol use                0.68  0.62  0.78  0.67  0.03  0.79  0.47  0.68  0.77
Drug use                   0.65  0.66  0.53  0.71  0.78  0.16  0.44
Legal status               0.05  0.69  0.76  0.81  0.82  0.64  0.64
Family/social relations    0.48  0.70  0.53  0.21  0.70  0.54  0.71
Psychiatric status         0.46  0.72  0.90  0.64  0.56  0.60  0.85

There are 15 coefficients below 0.50 in Table 4. Four come from European studies, six from American studies of homeless or dual diagnosis populations and five from American studies of other populations.

In conclusion, the correlations of the two summary measures are often low in several problem areas. Intense training of the interviewers produces higher correlations, but the associations remain alarmingly low in the areas of employment/support, drugs and legal status.

Associations between ASI measures and external criterion variables

Criterion validity exists in degrees. How close the association should be in order to be considered sufficient depends on the nature of the criterion and on what kinds of decisions will be based on the measurement under study. In parts of the ASI literature, any statistically significant associations between ASI measures and criterion variables have been presented as evidence of the validity of the ASI.

Table 5 presents the correlations of severity ratings with criterion variables in an early study of opiate addicts seeking treatment (Kosten et al. 1983).

Table 5.  Correlations of ASI severity ratings with criterion variables in a study of opiate addicts seeking treatment.

Severity rating        Criterion                                        Correlation
Employment/support     Social Adjustment Scale (work factor)                0.39
Alcohol and drugs*     Opiates in past 30 days                              0.11
                       Years of regular opiate use                          0.17
                       MAST                                                −0.05
Legal                  Days spent gaining illegal profit                    0.43
Family/social          Social Adjustment Scale (mean of all factors)        0.46
Psychiatric            Beck Depression Inventory                            0.51
                       Global Assessment Scale                             −0.42
                       Maudsley neuroticism                                 0.55

* In the early 1980s, alcohol and other drugs were still given a joint rating.
Source: Kosten et al. (1983).

The concurrent validity of three composite scores was studied in a sample of homeless substance users. The correlation of the alcohol score with MAST was 0.31, the correlation of the drug score with the Risk for AIDS Behavior score was 0.54 and the correlation of the psychiatric score with the Symptom Checklist-90 was 0.66 (Zanis et al. 1994).

In a study of alcoholics admitted to a Dutch centre for addiction treatment, the correlation between the psychiatric severity rating and the Symptom Checklist-90 was 0.33 (DeJong et al. 1995).

In a study of patients at two inner-city psychiatric units, the point–biserial correlation between the presence of any current psychiatric diagnosis [determined according to the Structured Clinical Interview for the Diagnostic and Statistical Manual, 3rd edn, revised (DSM-III-R)] and the ASI psychiatric composite score was 0.22 (Dixon et al. 1996).

Table 6 presents the correlations of ASI composite scores with criterion variables among patients admitted to a public mental hospital in the United States (Appleby et al. 1997).

Table 6.  Correlations of ASI composite scores with criterion variables in a study of patients admitted to a public mental hospital in the United States.

Composite score   Criterion                                    Correlation
Alcohol           CAGE                                             0.50
                  Short Michigan Alcoholism Screening Test         0.59
                  Clinical Use, Abuse and Dependence Scale         0.72
Drugs             CAGEAID                                          0.64
                  Drug Abuse Screening Test                        0.73
                  Clinical Use, Abuse and Dependence Scale         0.70

Source: Appleby et al. (1997).

In the validation study of the German version of the ASI the correlation between the alcohol use composite score and MAST was as low as 0.34. In the same study, patients with an alcohol dependence diagnosis received somewhat higher ASI alcohol ratings and composite scores than those with no such diagnosis (Scheurich et al. 2000).

Further indications of the instability of the ASI are provided by a study comparing the performance of a CD-ROM-based self-administered, interactive multimedia simulation of the ASI and the interviewer-administered ASI (Butler et al. 2001). The correlations of the severity ratings and composite scores based on computerized calculations on the data elicited by the simulation with external criterion variables were, with very few exceptions, higher than the corresponding correlations of interviewer-based indices. The difference in favour of the self-administered and computerized version was often substantial.

The figures presented above show that the correlations between ASI summary measures and outside criterion variables are by no means uniformly high.

Moreover, in so far as the ASI is intended to provide a basis for clinical decisions, correlational studies are not enough. In addition, we would need data on the sensitivity and specificity of the summary measures. A measure lacking sensitivity will fail to identify individuals who have a problem. A measure lacking specificity will wrongly flag individuals who do not have a problem.
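
As a reminder of how the two quantities are obtained, the sketch below computes them from the four cells of a two-by-two table that cross-classifies a dichotomized test score against a criterion diagnosis. The function name `sensitivity_specificity` and the counts are hypothetical and serve only to illustrate the definitions.

```python
def sensitivity_specificity(true_pos: int, false_neg: int,
                            true_neg: int, false_pos: int) -> tuple[float, float]:
    """Sensitivity: share of criterion-positive cases the test identifies.
    Specificity: share of criterion-negative cases the test clears."""
    sensitivity = true_pos / (true_pos + false_neg)
    specificity = true_neg / (true_neg + false_pos)
    return sensitivity, specificity


# Hypothetical sample: 50 people with the diagnosis, 50 without.
# The test flags 45 of the 50 cases and wrongly flags 20 of the 50 non-cases.
sens, spec = sensitivity_specificity(true_pos=45, false_neg=5,
                                     true_neg=30, false_pos=20)
print(f"sensitivity = {sens:.0%}, specificity = {spec:.0%}")
```

In this hypothetical sample the test misses few cases but flags two in five of the people without the diagnosis, a pattern similar in kind to several of the results reported below.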

A Costa Rican version of the ASI was used to distinguish between male alcoholic patients at Costa Rican addiction treatment facilities and non-alcoholic controls (Sandí Esquivel & Avila Corrales 1990). A value of 4 or more on the alcohol severity rating had a sensitivity of 98% and a specificity of 100%.

In a study of women receiving out-patient treatment for intravenous drug dependence, ASI alcohol questions had high sensitivity (96%) and specificity (94%) in identifying alcohol abuse or dependence as measured by the Structured Clinical Interview for DSM-III-R (Svikis et al. 1996).

In a study of psychiatric in-patients, even very low ASI alcohol and drug composite scores were highly specific for a DSM-III-R substance use disorder diagnosis (Lehman et al. 1996). However, the authors argue that the sensitivity of the test scores was unsatisfactory. The authors conclude that ‘because a major concern in clinical practice is the underdiagnosis of substance use disorders among psychiatric inpatients, a self-report test with limited sensitivity and high specificity such as ASI cannot be used alone’.

In another study of psychiatric in-patients, a value of 1 or more on the ASI alcohol severity rating had a sensitivity of 93% with respect to current alcohol abuse as measured by the Structured Clinical Interview for DSM-III-R (Appleby et al. 1997). The corresponding specificity was 59%. A value of 1 or more on the drug severity rating had a sensitivity of 93% and a specificity of 55% with respect to current drug abuse.

A study of opiate addicts analysed the performance of the ASI in identifying depression diagnosed by research diagnostic criteria (RDC) (Kosten et al. 1983). With optimal cut-offs, the psychiatric rating had a sensitivity of 89% and a specificity of 67%. This compared favourably with the Beck Depression Inventory (BDI) sensitivity of 83% and specificity of 55%. The ASI also compared favourably with the Global Assessment Scale (GAS) as an instrument for assessing more global psychopathology in addicts.

Less impressive results were obtained in a Dutch evaluation of the ability of the psychiatric severity rating to detect current DSM-III major depressive disorder. The sensitivity was 86% and the specificity 42%. A higher cut-off score increased the specificity to 56%, but decreased the sensitivity to 72% (Hendriks et al. 1989).

In another Dutch study of opiate addicts, a value of 0.01 or more on the ASI psychiatric composite score was used to predict various DSM-III-R diagnoses as measured by the Composite International Diagnostic Interview (CIDI) (Eland-Goossensen et al. 1997). Even this low cutting-point missed a substantial number of cases with psychiatric disorders and at the same time produced a large proportion of false positives (Table 7).

Table 7.  Sensitivity and specificity values of the ASI psychiatric composite score in predicting DSM-III-R diagnoses in a study of opiate addicts.

Criterion diagnosis                  Sensitivity   Specificity
Affective disorders                      87            40
Anxiety disorders                        86            38
Schizophrenic disorders                  87            36
Antisocial personality disorders         80            43

Source: Eland-Goossensen et al. (1997).

In a study of inpatient drug addicts in The Hague (Franken & Hendriks 2001), a value of 0.01 or more on the ASI psychiatric composite score had a sensitivity of 91.3% and a specificity of 24.1% with respect to anxiety and mood disorders according to DSM-III-R criteria, as measured by the CIDI.

A study of methadone maintenance patients assessed at treatment entry and followed for 2 years tested the predictive validity of six of the seven composite scores (Bovasso et al. 2001). Because of the lack of a suitable outcome measure, the family/social area was not included in the analysis. Logistic regression was used to estimate the sensitivity and specificity of the composite scores at intake in predicting dichotomous indicators of follow-up outcomes. Except for the medical score, each of the other composite scores significantly predicted its validity criterion measure. The sensitivity and specificity values are given in Table 8.

Table 8.  Sensitivity and specificity values of ASI composite scores at intake in predicting criterion variables during follow-up in a study of methadone maintenance patients.

Composite score             Criterion variable                                                                                           Sensitivity   Specificity
Medical status              Hospitalization for a medical problem 3–24 months after study entry                                               0           100
Employment/support status   Absence of any full-time employment 3–24 months after study entry                                                76            30
Alcohol                     Any self-reported alcohol intoxication 3–24 months after study entry                                             46            91
Drugs                       Positive tests on 50% or more of screenings for cocaine and opiates during first 7 months of treatment           52            79
Legal status                Occurrence of criminal charges during the follow-up period                                                        9            96
Psychiatric status          Hospitalization for a psychiatric problem 3–24 months after study entry                                           0           100

Source: Bovasso et al. (2001).
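
One way to read sensitivity and specificity jointly is Youden's J statistic (sensitivity + specificity − 1), which is 0 for a test that does no better than chance and 1 for a perfect test. The statistic is not used in the studies reviewed; it is introduced here only as an illustrative summary. The sketch applies it to the percentages that Bovasso et al. (2001) report for the six composite scores; the names `TABLE_8` and `youden_j` are illustrative.

```python
# Sensitivity and specificity (in per cent) of ASI composite scores at intake,
# as reported by Bovasso et al. (2001).
TABLE_8 = {
    "Medical status": (0, 100),
    "Employment/support status": (76, 30),
    "Alcohol": (46, 91),
    "Drugs": (52, 79),
    "Legal status": (9, 96),
    "Psychiatric status": (0, 100),
}


def youden_j(sensitivity_pct: int, specificity_pct: int) -> float:
    """Youden's J = sensitivity + specificity - 1, with inputs in per cent."""
    return (sensitivity_pct + specificity_pct - 100) / 100


j_values = {name: youden_j(sens, spec) for name, (sens, spec) in TABLE_8.items()}

for name, j in j_values.items():
    print(f"{name:<28} J = {j:.2f}")
```

Read this way, the medical and psychiatric scores carry no information about their criteria at the chosen cut-offs, and the employment and legal scores carry very little, even though several of them are reported as statistically significant predictors.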

CLINICAL INDICES AND EVALUATION INDICES

Using more modern and rigorous psychometric measures, Paul McDermott and colleagues extracted seven clinical indices (CIs) based on items measuring both life-time and recent problems and corresponding to the original problem areas of the ASI. Of the 155 problem items in the questionnaire, 26 were eliminated on conceptual or formal grounds. Of the remaining 129 items, 83 survived the item analysis. Nine items measuring recent use of different drugs were combined to form a composite variable. Altogether, 75 items thus emerged from the item analysis. The clinical indices were extracted in a study of methadone maintenance patients (McDermott et al. 1996), but were found later to be generalizable to samples of primarily alcohol-dependent, primarily cocaine-dependent and polydrug-dependent patients (Alterman et al. 2000). The internal consistencies of the indices were generally good or acceptable.

In an effort to assess recent problems and change more efficiently, a new set of evaluation indices (EIs) was derived, again using rigorous psychometric methods. The evaluation indices measure five of the seven problem domains based on items covering the past 30 days. Sufficiently robust indices of recent medical and employment problems could not be derived (Alterman et al. 1998).

Alterman and his collaborators compared the predictive validity of the original and the more recent summary indices of the ASI by putting each of the old and new measures into a hierarchic logistic regression and noting whether the added term brought a significant addition to the explained variance of the dependent variable (Alterman et al. 2001a). Outcomes were medical hospitalization, employment, alcohol intoxication, drug hospitalization, psychiatric hospitalization and criminal charges during follow-up. The clinical indices were superior to the other indices in predicting three of six outcomes (psychiatric hospitalization, drug hospitalization and criminal charges). The evaluation index was the best predictor of alcohol intoxication, and the composite score the best predictor of unemployment.

ASI COMPOSITE SCORES AND THE MEASUREMENT OF TREATMENT EFFECTS

There are reasons to believe that ASI composite scores are particularly inadequate as instruments for the measurement of treatment effects. Usually, no data are presented on which particular items produce the treatment effect. However, in one study (Wertz et al. 1995) an examination of alcohol and drug composite scores showed that practically all changes between measures at intake and at 6-month follow-up were due to changes in the client's subjective evaluation of the severity of his problems and his need for additional treatment. The authors conclude: ‘By combining more subjective information with more objective information, it becomes difficult to identify exactly what changes with treatment.’

At intake, most clients are motivated to agree that they need treatment. Six months later they are much less likely to report that they need additional (institutional) treatment.

An additional complication is that depending on the ideology of the treatment programme and on how well the patient has internalized its tenets, people with no current alcohol or drug use may assess differently their need for continuous treatment. Many sober and clean members of AA and NA, for example, may feel that they have been ‘troubled or bothered in the past 30 days’ by alcohol and drug problems or that they need continuous treatment.

This is an important issue. That a measure is cross-sectionally reliable does not mean that it is a reliable measure of change. In the case of the ASI, it is not only possible but likely that measures of objective change and items measuring the subjective need for further treatment felt (or rather expressed) by the client do not correlate.

It should be pointed out that the relationship of the patient's subjective assessments to the more objective items varies from one area to the next. The relationship of ‘conflicts with mother at any time’ to the client's rating of his or her need for family treatment probably differs from that of ‘days intoxicated during past 30 days’ to the rating of need for alcohol treatment, which in turn differs from that of ‘number of arrests for robbery’ to the client's rating of the need for legal counseling.

It may be useful to make a clearer distinction between questions that aim at providing comparable information on the present status of all clients and questions that chart the concrete life situation of each client as a basis for treatment and counselling. It is a positive feature that the ASI covers life problems that might remain hidden in routine substance abuse treatment, but many of the questions are somewhat vague from the perspective of individual treatment planning and counselling. The client's subjective ratings of the current importance of each problem area and of the need for treatment could perhaps be replaced by open-ended questions. For example, instead of asking ‘how important to you now is treatment or counselling for these social problems?’ one might ask ‘what kind of assistance do you need to solve your present problems in relation to your family, friends, colleagues and neighbours?’. Such open-ended questions would not be included in summary measures, but would provide clinical background information.

CONTENT AND METRIC PROPERTIES OF INDIVIDUAL ITEMS

In view of the less than ideal performance of the summary indices in many contexts and particularly the low intercorrelations of questions belonging to the same problem area, there is a need for a detailed inspection of the content and metric characteristics of each of the items. In this context, however, it is enough to illustrate the types of problems with selected examples drawn from a field study (Corse et al. 1995) and a discussion of the metric and other properties of ASI items (Jansson 2001).
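Internal consistency of the items within one problem area is commonly summarized with Cronbach's alpha. The following is a minimal sketch of that standard coefficient, not a reproduction of the analyses in the studies reviewed. The function name and the small data sets are hypothetical.

```python
import statistics


def cronbach_alpha(items):
    """Cronbach's alpha for a list of item columns.

    items: list of k lists, each holding one item's scores for the
           same respondents in the same order.
    alpha = k / (k - 1) * (1 - sum of item variances / variance of total score)
    """
    k = len(items)
    if k < 2:
        raise ValueError("alpha needs at least two items")
    # Total score per respondent (sum across items).
    totals = [sum(scores) for scores in zip(*items)]
    total_var = statistics.variance(totals)
    if total_var == 0:
        raise ValueError("total score has zero variance")
    item_var = sum(statistics.variance(col) for col in items)
    return k / (k - 1) * (1 - item_var / total_var)


# Hypothetical data: two perfectly correlated items, then two uncorrelated items.
print(cronbach_alpha([[1, 2, 3, 4], [1, 2, 3, 4]]))
print(cronbach_alpha([[1, 2, 3, 4], [2, 4, 1, 3]]))
```

In this toy example the first pair of items yields an alpha close to 1 and the second pair an alpha close to 0, which is the pattern one would expect when questions assigned to the same problem area barely intercorrelate.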

Some oddities in the questionnaire are amended easily. In the American questionnaire, two of the key items in the employment section are about having access to a car, which is not a universally important condition for obtaining a job in Europe. In the legal section, questions are asked about charges for prostitution and contempt of court that are not criminal offences in most European countries. It is easy to drop questions such as these (although that, of course, affects the technical comparability of national summary indices), but there are many less obvious problems in the questionnaire.

In a study of people with substance use disorders and concurrent severe mental illness, experiential data on the ASI were collected from interviewers and patients (Corse et al. 1995). Debriefing interviews were conducted immediately after the completion of the ASI with 10 respondents. The reactions and observations of the interviewers were also recorded. All these records were put under a systematic qualitative analysis.

In addition to issues related to the special nature of the sample, the study also identified problems that are relevant to broad segments of the alcohol and drug clientele. For example, the questions are orientated towards people with the potential for full-time employment and responsibility for their financial affairs. In a similar fashion, the focus is on conflicts within active relationships as opposed to problems with isolation or estrangement. Moreover, ASI questions may be useful in classifying the type of the client's present living arrangements, but little relevant information is captured as to the instability of housing arrangements.

The problematic aspects noted by Ingegerd Jansson (2001) in her careful critique of the calculation of the composite scores include the following features, some of which have strong effects on the final score:

  • Ordinal scales are treated as interval or ratio scales, and complicated calculations are performed on arbitrary numerical values.
  • The guidelines provide no instructions for avoiding logarithms of 0 and division by 0.
  • Dichotomous items unintentionally receive more weight than questions with multiple or continuous response alternatives.
  • In some of the indices, items that are not relevant for important groups of respondents have a large impact on the final score.
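Two of these points can be made concrete with a small sketch. It does not reproduce the official ASI scoring rules: the rescaling scheme, the log(x + 1) convention, the function names and the data are all assumptions chosen for illustration.

```python
import math
import statistics


def scaled(value, maximum):
    """Rescale a raw item to the 0-1 range (hypothetical scheme)."""
    if maximum <= 0:
        raise ValueError("maximum must be positive to avoid division by 0")
    return value / maximum


def safe_log_scaled(amount, max_amount):
    """Log-rescale a money item to 0-1.

    math.log(0) raises ValueError, so log(x + 1) is used here.
    This guard is an assumed convention, not a rule from the ASI guidelines.
    """
    if amount < 0 or max_amount <= 0:
        raise ValueError("amount must be >= 0 and max_amount > 0")
    return math.log(amount + 1) / math.log(max_amount + 1)


# Hypothetical responses from six clients.
days_used = [0, 2, 5, 10, 15, 30]   # count item, range 0-30
any_problem = [0, 0, 1, 0, 1, 1]    # dichotomous yes/no item

days_scaled = [scaled(d, 30) for d in days_used]

# After rescaling, the yes/no item can only take the extremes 0 or 1,
# so in this sample it varies more than the count item and dominates
# an equal-weight average of the two.
print(statistics.pvariance(days_scaled))
print(statistics.pvariance(any_problem))

# A client reporting no money spent no longer breaks the calculation.
print(safe_log_scaled(0, 1000))
```

In this sample the variance of the dichotomous item is roughly twice that of the rescaled count item, so equal nominal weights do not translate into equal influence on the composite.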

CONCLUSIONS ABOUT THE PSYCHOMETRIC PROPERTIES OF THE ASI

Drinking and drug-taking exhibit exceptionally large short-term and long-term fluctuations. This intra-individual variability makes it difficult to obtain reliable individual measures. A further complication is that a number of clients may have difficulties in remembering the relevant information. In addition, the data are often collected under circumstances where it may be in the respondent's interest either to exaggerate or to understate their drug-related problems. Keeping these difficulties in mind, we cannot expect of any drinking and drug-taking measures the same level of psychometric performance as of traditional psychological tests.

Moreover, the range of variation among clinical populations is extremely wide. It is a formidable task to design an instrument that is equally relevant to the economic, employment and housing situation of married and employed middle-class alcoholics as well as of homeless opiate users who for many years have had no full-time employment.

In the light of the evidence available, however, it is not helpful to speak of the reliability and validity of the ASI as an entity or to use such summary statements as ‘the validity and reliability of this instrument is well established in numerous studies’ (Scottish Addiction Studies 2002) or ‘the ASI interview has been scientifically tested in groups of substance abusers’ (CUS 2002). Instead, we need a discussion of what measures based on what parts of the ASI can be used in what populations and for what purposes.

It is possible that the performance of the ASI is worse in Europe than in the United States, but measurement problems are visible in a number of American studies as well. Another possibility is that the use of the ASI is particularly problematic in homeless or double-diagnosis populations. This may be the case, but indications of less than ideal performance are not restricted to these special groups.

According to the ASI manual, the interviewer should first determine a 2–3-point range for each severity rating along a 10-point scale. Only in the next step should the patient's subjective assessment of the problem and the need for treatment be used for a refinement of the rating (Alterman et al. 1994). Already in the original studies of the psychometric properties of the ASI, however, the patients’ subjective reports of their problems were found to play a prominent role in the interviewers’ estimates of severity (McLellan et al. 1980). In a study of patients at a dual-diagnosis clinic, stepwise multiple regression analyses were performed to obtain an indication of which items the raters were using to derive the severity ratings. The first variable to enter was the subject's self-rating of either ‘treatment need’ or ‘problems seriousness’ (Hodgins & El-Guebaly 1992). Similar results have been obtained in studies of methadone maintenance patients, alcoholics and mental hospital patients (Alterman et al. 1994; DeJong et al. 1995; Appleby et al. 1997). The discrepancy between the instructions and the rating practice may contribute to the instability of the severity ratings.

Paradoxically, the complexity (and instability) of the rating process may make it more attractive in the eyes of clinical professionals. Not only does it require training that adds to a person's formal merits, but because of the combination of strict guidelines and intuitive judgement its mastery is a clinical skill that separates initiates from novices. The attractions of the severity ratings as an instrument of professionalization do not, of course, compensate for their psychometric instability.

One of the attractions of the severity ratings is that they can be made available immediately after the interview and without time-consuming calculations. In the computerized versions of the ASI, however, the printing of composite scores and other similar summary measures takes even less time.

Many of the composite scores seem to be somewhat more robust, but they are by no means ideal measures and some subscales are clearly inadequate. Moreover, the method of calculating the summary scores contains a number of incongruities (Jansson 2001).

A lack of training of the interviewers is often blamed for the substandard performance of the severity ratings and the composite scores. It is true that with intense training and continuous monitoring, the ASI reached excellent levels of performance in a longitudinal study of patients seeking treatment. However, ‘the level of stability attained in this study required 3 working days per assessor every 2 months’ (Stöffelmayr et al. 1994). Such expensive continuous training is unlikely to be feasible in most research projects, not to mention the day-to-day practice in clinical settings. In focus groups on the ASI with 34 substance abuse counsellors conducted at a meeting of the National Association of Alcoholism and Drug Abuse Counselors, counsellors who used the ASI reported that training at their clinical sites was patchy, at best (Butler et al. 2001). Many reports emphasize the need for intense training: ‘our experience with scores of trainees has shown that even after reading the manual and viewing the videos of an interview, the majority of the participants in training workshops misinterpret at least some aspect of the administration of the ASI’ (Grissom & Bragg 1991). However, why should one adopt an instrument that requires such a disproportionate amount of training and supervision?

The psychometric performance of the more recent summary measures based on the ASI is promising. If the ASI already is in wide use and the data are used for secondary analyses, the new measures are to be preferred. Before the new measures can be used in European countries, however, the original psychometric analyses need to be replicated locally. Keeping in mind that the ASI purports to measure problems in seven areas, it is a serious limitation that sufficiently robust indices of recent medical and employment problems could not be derived. It should also be kept in mind that a considerable number of items remain outside the new indices. As many of these items seem to be of little independent interest, it is not advisable to bring the ASI into use as a package in new countries.

POTENTIAL USES OF THE ASI

Administrative initiatives toward a routine use of the ASI are often supported by bold promises. It is therefore germane to evaluate how realistic and practical these promises are.

According to the literature, the ASI has been used or is meant to be used for a wide variety of clinical, administrative and research purposes (Hendriks et al. 1989; McLellan et al. 1992; Kokkevi & Hartgers 1995; Leonhard et al. 2000; CUS 2002).

Clinical uses

In clinical settings, the ASI has been used or is meant to be used:

  • to summarize the patient's overall status at treatment admission;
  • to formulate treatment plans and to match clients to treatment;
  • to provide a general prognosis for treatment;
  • to establish a common language among treatment professionals;
  • to pay attention to aspects of the life situation of the client that otherwise would go unnoticed; and
  • for follow-up of individual clients.

It may be that the most important function of the ASI has been in breaking down the general resistance of treatment professionals against the collection of systematic data. In the Netherlands, the ASI has been a ‘first generation instrument’ that ‘stimulated systematic registration and filing of patient characteristics, which until then was fairly nonexistent’ (Broekman et al. 2004).

No studies of the actual clinical use of the ASI appear to be available. The number of options in drug and alcohol treatment is usually limited, and the choice among alternatives is based on the demographic characteristics of the clients and on the length of their career in addiction. Separate studies would be needed to show whether the additional information collected by the ASI in fact helps to make better decisions, and whether other and shorter instruments could serve the same purposes.

The ASI may help to pay attention to aspects of the life situation of the client that otherwise would go unnoticed. In routine substance treatment, the legal or psychiatric problems of the clients may not receive enough attention.

The widespread use of an instrument such as the ASI may help to establish a common language among treatment professionals.

Administrative uses

For purposes of treatment administration, the ASI has been used or is meant to be used:

  • for collecting statistical information on the treatment clientele;
  • to plan the activities of treatment organizations;
  • for national comparisons of groups of clients; and
  • to evaluate and compare treatment programmes.

The ASI can be used to collect statistical information and for national comparisons of groups of clients. Many, and in some contexts most, treatment contacts are so short, however, that the ASI cannot be administered, and many more are so short that it makes little sense to collect large amounts of information or to make any plans for routine follow-up. In clinical practice the interview will not be carried out with clients who stay for only a couple of days in an in-patient programme or who vanish after one or two visits to an out-patient facility. The proportion of these cases will normally not be recorded but will vary widely from one programme to the next. No intention-to-treat design on short-term patients will be possible, and comparisons between types of treatment are likely to be misleading.

Uses in research

In treatment research, the ASI has been used or is meant to be used:

  • to compare clients and to identify client subgroups;
  • to measure treatment outcome;
  • to facilitate cross-study comparison from different countries; and
  • for international comparative studies.

Individual ASI items can be used to measure change during treatment follow-up, but the composite scores are dubious measures of treatment effects.

The comparative potential of the ASI should not be overstated. The very nature of the severity ratings renders them unsuitable for cross-national comparisons. The adjustments that have been made in European versions of the ASI (Hendriks et al. 1989; DeJong et al. 1995; Kokkevi & Hartgers 1995; Scheurich et al. 2000) already make the different versions technically incomparable. The psychometrically better indices developed by Arthur Alterman and Paul McDermott and their collaborators cannot be used as such in European countries without local item analyses.

In the long term, the diffusion of the ASI can pave the way for international comparative projects as it tends to bring national traditions of measurement closer to each other.

CONCLUDING RECOMMENDATIONS

ASI severity ratings should not be used in research or clinical decision making. ASI composite scores are useful but not ideal summary measures. Many individual ASI items can be used to describe clinical populations and to measure change.

In new settings and countries, the ASI should not be taken into use as a standard instrument and a ready-made package. As a pioneer instrument, the ASI provides a rich and extremely useful repository of experience. However, the way forward lies not in making small adjustments to the present ASI but in using the evidence produced by ASI and other similar measures to design new and better instruments item by item.

The life situation of drinkers and drug users varies so widely that perhaps no single instrument can cover the full range adequately. It is true that there is a huge variation among users of the same substance as well. It is also true that polysubstance use has become more common, but the clearest dividing line is still between alcoholics and users of heavy drugs. Asking exactly the same questions of both groups means that the clients have to respond to many irrelevant questions. If the instrument could be designed as a system of building blocks, different sets of questions could be put to clients with different user profiles and differentially long user careers.

The next step should be a close analysis of the conceptual content and metric properties of ASI items and new candidate items, coupled with small qualitative studies of how various groups of clients understand the content of the questions and how adequate the response alternatives are in relation to their life situation.
