Alcohol measurement methodology in epidemiology: recent advances and opportunities

Authors


  • Parts of this paper are built upon a paper presented at a US National Institute on Alcohol Abuse and Alcoholism Extramural Scientific Board Meeting, Bethesda, Maryland, 5–6 May 1999 and drew upon papers given at an International Symposium and Thematic Meeting of the Kettil Bruun Society, ‘Monitoring Alcohol and Drug Related Harm: Building Systems to Support Better Policy’, Sidney, BC, Canada, 7–10 May, 2007.

Thomas K. Greenfield, Alcohol Research Group, Public Health Institute, 6475 Christie Avenue, Suite 400, Emeryville, CA 94608, USA. E-mail: tgreenfield@arg.org

ABSTRACT

Aim  To review and discuss measurement issues in survey assessment of alcohol consumption for epidemiological studies.

Methods  The following areas are considered: implications of cognitive studies of question answering such as self-referenced schemata of drinking, reference period and retrospective recall, as well as the assets and liabilities of types of current (e.g. food frequency, quantity–frequency, graduated frequencies and heavy drinking indicators) and life-time drinking measures. Finally we consider units of measurement and improving measurement by detailing the ethanol content of drinks in natural settings.

Results and conclusions  Cognitive studies suggest inherent limitations in the measurement enterprise, yet diary studies show promise of broadly validating methods that assess a range of drinking amounts per occasion; improvements in survey measures of drinking in the life course are indicated; attending in detail to on- and off-premise drink pour sizes and ethanol concentrations of various beverages shows promise of narrowing the coverage gap plaguing survey alcohol measurement.

INTRODUCTION

The epidemiology of alcohol use and alcohol-related consequences plays a vital role by monitoring populations' alcohol consumption patterns, alcohol dependence and acute and chronic problems associated with drinking. Such studies aid efforts to explain relationships between these variables by investigating mechanisms linking alcohol consumptive behavior to outcomes or potential harms [1,2]. A host of methodological and analytical techniques is involved in this broad enterprise, but here we emphasize advances in alcohol consumption measurement. We explore implications for monitoring and understanding the bases of temporal changes [3] considering mainly the individual as the unit of analysis [4]. This review has had to be selective and seeks largely to identify developments that we believe are critical to alcohol surveys at this time and ways of advancing alcohol measurement. After summarizing previous consensus-building projects, we take as a point of departure cognitive studies on how survey participants respond to questions. We consider important design aspects in choosing specific measures such as the reference period and degree of retrospection. Next we take up particular strategies for measuring key facets of consumption, particularly ways of assessing so-called patterns of drinking. Among those, we identify the specific challenge of trying to assess drinking retrospectively at previous life periods (life-time consumption measures), providing some rationales and urging further validity work. We then motivate and describe new efforts to increase precision of measurement further by attending closely to the amount of ethanol in an individual's ‘drink’, affected by both the precise nature of the beverage or mixture being drunk as well as by the drink or pour size. 
We are among researchers who have realized what a large difference the ethanol content of drinks can make to our alcohol measures, so we provide both relevant new findings and also suggest ways in which this root source of variation might begin to be addressed more effectively. We end with a summary and key directions for further research on alcohol measurement. We believe that understanding and improving alcohol measurement is basic to the enterprise of analyzing differences in risk factors for individuals' current and historical drinking patterns, and investigating associated morbidity, mortality and harms. Improving self-report alcohol measures is also vital for clinical and prevention research.

Previous consensus projects on alcohol measurement

Several important efforts around the millennium to improve alcohol measurement are summarized here, to serve as a jumping-off point. Proceedings of a 1998 National Institute on Alcohol Abuse and Alcoholism (NIAAA) international workshop on alcohol measurement [5] included consideration of volume measures [6], drinking pattern [7] and life-time drinking [8], emphasizing developments since earlier reviews [9,10]. Also in that year, limitations in the state of the art and future directions for consumption measurement were discussed in an Addiction editorial [11] with five commentaries [12–16]. A year later a second conference on the topic was held in Skarpö, Sweden, resulting in a special edition of Journal of Substance Abuse with papers describing, for example, several graduated frequencies measures [17] and dimensions of alcohol-related problems [18], and summarizing consensus recommendations [19].

At about this time, international guidelines for alcohol use and problem monitoring were developed for the World Health Organization (WHO) by an international team and expert panel [20]. This is currently being updated (personal communication, Tim Stockwell, 6 May 2007) and may be expected soon. The 2000 WHO guide highlighted methodological concerns in cross-national comparisons, but most of these are equally critical for measurement in a single country. Several topics are addressed further here with recent studies. These include the importance of estimating typical beverage strength and beverage-specific serving size, especially relevant for special populations including heavy drinkers [11,21,22]; choice of reference period [6,18]; quantity per drinking day or per drinking occasion which, together with time of drinking session and biological factors, may seriously affect resultant blood alcohol concentration (BAC) and risks of acute alcohol consequences [23,24]; definitions of drinkers and non-drinkers [25]; and ways of assessing hazardous consumption [26,27].

The review is eclectic, but builds on these contributions and emphasizes methodological aspects involved in individual-level studies associated generally with survey data, those evolving from traditions that Dufour [28] has termed ‘psychosocial epidemiology’ and ‘epidemiological sociology’. These often rely on objectively scored scales using personally conducted interviews or postal surveys, among other approaches. The studies we are considering tend to assess alcohol use and its consequences independently rather than as a unitary condition, so that the relationship between consumption patterns and problems can be investigated [29,30]. To this day, despite the prevalence of widely used scales such as the DSM series on alcohol dependence and abuse [31,32] and screens such as the Alcohol Use Disorders Identification Test (AUDIT) [33,34], which includes alcohol consumption items, many surveys adopt an eclectic position on alcohol-related consequences and consider dimensionality an empirical question [18,35]. However, measurement and characterization of alcohol use disorders are not discussed here because these have been reviewed in another paper in this series [36]. Instead we focus upon recent improvements and future prospects for alcohol intake measurement, beginning with general considerations based on cognitive studies of assessment, because these inevitably underlie all self-report assessment efforts.

COGNITIVE STUDIES OF QUESTION ANSWERING: IMPLICATIONS FOR ALCOHOL MEASUREMENT

Cognitive studies of alcohol consumption measures have been undertaken for some time [37], but until recently were under-utilized. This experimental research by social psychologists has considerable relevance for self-report alcohol measurement. As summarized in Schwarz's [38] review of advances in psychological understanding of the self-report process, cognitive research tends to be consistent with ‘four maxims’, or respondents' implicit conceptions of a conversational interaction that govern the interview process. These are the maxim of relation, making the contribution relevant to the perceived aims of the conversation; of quantity, making production informative but not overly elaborate; of manner, making contributions clear rather than obscure; and of quality, holding speakers to not say anything deemed false or with insufficient basis. Both interviewer and interviewee are generally assumed to be guided by these maxims (violated in deceptive research paradigms). These macro conversational precepts combine with memory-based retrieval strategies and comprehension to produce specific responses. All aspects of the survey, including introductory material, cue the respondent to what the interviewer must be seeking, framing the understanding of what must be intended by the questioner and what response strategies are appropriate. For example, in answering survey questions respondents have been shown to give widely different response distributions depending on the range of response options; they also draw on purely formal features of a rating scale, including its numeric values (i.e. in mailed or self-administered questionnaires). This is because respondents use such scaling information to interpret the meaning of the question, and instructions may remind them of material they might otherwise not consider.
These formal features account in part for wide differences sometimes evidenced in open- versus closed-ended questions (asking for a number of days drinking or providing categories such as ‘every day or nearly every day’, ‘four or five times a week’, etc.). As one example, a strategy that may mitigate the oft-discussed tendency to minimize one's quantity or frequency of drinking is to provide large values for the top levels of categorical scales, which may serve to legitimate or ‘normalize’ genuine heavy drinking reports [39]. It is generally little recognized that the order of response alternatives may play a cuing role, although the order of items has long been recognized to affect response frequencies [40]. In part, this stems from respondents' inferences about the questioner's ‘epistemic interest’[38]. (‘Is the interviewer intending me to focus on everyday occurrences of a minor nature or on major, extraordinary events?’, for example.) A multitude of subliminal clues (outside conscious attention) is used by respondents to infer the nature of the task. Interestingly, Bradburn [40] showed that interviewer effects are typically not as large as some of the formal aspects of instruments described above. What we should take from this is that questionnaire designers must consider carefully what respondents may infer about intent, and choose the formal features with care and thoughtfulness, where possible, to counteract prevailing bias.

Preceding questions

The content of immediately preceding (and, in paper formats, more generally adjacent) items is especially critical to respondents' interpretations. To the same extent, the thrust of questioning ‘leads’ the respondent to formulate appropriate, cooperative answers, based on efforts to understand the question asked [41]. This can be very important in alcohol assessment, because generally a series of questions about certain aspects of drinking needs to be asked. Instructions play a role as well, and eliciting ‘truthful’ answers requires careful preparation during the explanatory and consent phase prior to answering questions [42]. Because items are embedded within this broader context, the survey process is transactional. The respondent generally undertakes the role of interviewee by drawing upon all available cues as to what the interviewer really seeks (the pragmatic rather than only the literal meaning of the question), and subtle cues help this ascertainment, based on tacit assumptions that speakers should try to be ‘informative, truthful, relevant, and clear’ [38].

Potential for bias despite ‘best’ practices: need for more study of sources

Although these features that combine to structure self-reports are in principle well known, the majority of research reports in the alcohol and epidemiological field take the responses ‘at face value’ in reporting parameters of drinking behavior and associated problems, despite periodic calls for more attention to validity and reliability [13,43–45]. Even simultaneous well-conducted large-scale national surveys with high response rates, undertaken by competent scientific and fieldwork teams asking about use and problems of a given substance (e.g. cannabis), can result in widely different prevalence rates [46,47]. In the absence of prospective methodological studies of possible influences, we remain uncertain about what underlies the differences seen. Fortunately in the US culture at large, unlike illegal substance use, alcohol consumption does not appear particularly sensitive to most Americans [48,49]. However, some co-occurring problems are socially disapproved of and so may be under-reported. Further prospective studies are needed to establish relative biases in consumption estimation [50]. Differential bias is especially important with regard to respondents with differing demographic characteristics, beverage choices and drinking profiles [15,26], as variation in cognitive ability, attitudes toward alcohol, gender and many other variables are known to affect both question interpretation and response [38]. For a number of purposes, including the ability to interpolate and extrapolate the form of drinking distribution from inherently limited measures [11,51–53], disaggregating the degree and sources of bias of various types of drinkers is an important research agenda. We need both experimental and cognitive methodological studies with well-characterized population subgroups, particularly those potentially omitted from standard general population surveys [54].

Self-referenced schemata of drinking

Although new computer assisted telephone (CATI) and personal (CAPI) interviewing methods show great promise [11,55,56], it must be borne in mind that many respondents give global appraisals of their drinking based on self-referenced schemata. Schwarz comments:

‘Ideally, the researcher wants respondents to identify the intended behavior, to search memory for relevant episodes, to date these episodes with regard to the reference period, and to count them up to arrive at a numeric answer. Unfortunately, this is the course of action that respondents are least likely to follow. In fact, unless the behavior is rare and of considerable importance, respondents are unlikely to have detailed episodic representations available in memory. Instead, the individual instances of frequent behaviors blend into generic, knowledge-like representations that lack the time and space markers that allow for episodic recall . . .’ [38].

The prevalence of this style of responding to alcohol consumption questions has been confirmed in cognitive studies conducted at our center [17,57]. This places certain limits on researchers' quest for veridical measures and makes it quixotic to assume that perfect measures suitable for all populations can be attained [11]. None the less, offsetting these concerns somewhat, several summary consumption measures seem to perform fairly well in practice and may be improved further, as will now be discussed.

Recent experience with latest generation measures such as the graduated frequency (GF) alcohol consumption measure [17,58,59], other validity studies [12,45] and reproducible risk relationships seen in numerous studies [7,29,30,39,60] imply that the consumption volume and heavy drinking measures are serving fairly well to order individuals with respect to intake. Preferred will be measures capable of reliably identifying those with hazardous drinking patterns [26,61] in order to characterize risk relationships more clearly [53,62]. Thus, while further methodological studies to improve measures are clearly needed [50], as Midanik [12] put it, ‘the glass is half full’, not half empty. We will now take up several aspects of alcohol measurement that need to be attended to and which may help to explain discrepancies seen across studies.

Reference period

As noted before, the ‘think aloud’ technique, in which respondents report how they go about answering questions, has been used to study the ways people with varying characteristics (such as gender, education and ethnic background) may interpret questions about drinking behavior [17,57,63] with the aim of applying results to improve measurement [12]. We have learned that it is crucial to provide a reference period—the period over which the respondent is instructed to provide summary information—such as 12 months or 30 days. When the reference period is not explicit (e.g. ‘how often do you usually drink alcohol’ rather than ‘during the last 12 months, how often . . .’), respondents tell us that they assume periods ranging from a week to several years when describing their recent drinking [17,25]. In population surveys, 30-day (or better, because of the commonly observed weekly cycle of drinking amounts, 28-day) measures can easily omit both infrequent light and intermittently heavy drinkers, captured more accurately by 12-month measures. Therefore, while the previous-month measures appear to offer better recall, cognitive studies suggest recourse in both instances (i.e. 12-month and past-month recall) to the ‘picture of the drinking self’ more than to specific and detailed memories of actual drinking events. Although schemata may underlie responses to both past-month and past-year measures, it remains true that people do consider the time-frame presented and drinking behaviors vary over time (as sales show). One reason for preferring 12-month measures is that seasonal variability is minimized by these. Measures of drinking on days of the previous week only, if plausibly better recalled than the longer duration measures, contain more happenstance variability on an individual basis; they have tended to result in lower estimates of harmful and hazardous drinking than GF measures [59]. 
Finally, although unsuitable as an individual consumption measure, assessment of the previous day's drinking may truly avoid Schwarz's ‘knowledge-like generic representations’ and be less downwardly biased, resulting in better coverage when aggregated to the population [64]. This method also has promise for calibrating summary consumption measures with longer reference periods (e.g. either previous year or previous month measures) to characterize individuals' consumption. For example, collaborative work in progress at the Centre for Addictions Research of British Columbia and the Alcohol Research Group in California uses yesterday reports and drinking diary measures to develop empirical category averages for the graduated frequencies quantity ranges (e.g. one to two, three to four, five to seven, eight to 11, 12+ drinks), by capturing more accurately the distribution of quantities reported in these GF ranges. This is particularly important for the highest, unbounded level, e.g. 12+ drinks, found to have a mean of 15.5 drinks in a recent US analysis of daily diaries.
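The calibration idea just described can be illustrated with a minimal sketch: daily drink counts from diaries or yesterday reports are grouped into the GF quantity ranges, and the mean within each range becomes that range's empirical category average. The function names and band table are hypothetical, the sketch assumes whole-number drink counts, and it is not the procedure used in the collaborative work mentioned above.

```python
from statistics import mean

# GF quantity ranges named in the text; None marks the open-ended top band.
BANDS = [("1-2", 1, 2), ("3-4", 3, 4), ("5-7", 5, 7), ("8-11", 8, 11), ("12+", 12, None)]


def band_for(drinks):
    """Return the GF band label for a whole-number daily drink count."""
    for label, low, high in BANDS:
        if drinks >= low and (high is None or drinks <= high):
            return label
    return None  # days with zero drinks fall outside every band


def empirical_band_means(daily_drinks):
    """Mean reported quantity within each GF band that has at least one day."""
    grouped = {}
    for drinks in daily_drinks:
        label = band_for(drinks)
        if label is not None:
            grouped.setdefault(label, []).append(drinks)
    return {label: mean(values) for label, values in grouped.items()}
```

Applied to real diary data, the value returned for the unbounded band would replace an arbitrary assumed quantity for 12+ drinks when volume is computed.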

Based presumably on different investigator assumptions, preferences and purposes, population survey alcohol measures have adopted numerous reference periods. Widely used instruments include those with the previous 12 months as the reference, such as the US National Alcohol Survey (NAS) [22,55,65] and the National Epidemiologic Survey on Alcohol and Related Conditions (NESARC) [66,67], or (rather problematically with a restriction to those who reported consuming at least 12 drinks during the period) the 1988 National Health Interview Survey (NHIS) [29,68–70] and the 1992 National Longitudinal Alcohol Epidemiologic Survey (NLAES) [24,71]. Previous-month [72] or previous 30-day [73] measures are also common, as represented by the National Survey on Drug Use and Health (NSDUH) [74,75] and the Behavioral Risk Factor Surveillance System (BRFSS) surveys [76]. As noted, 30-day measures may be downwardly biased for intermittent heavy drinkers who may report lower drinking, or even no drinking, accurately in the previous month when heavy drinking episodes or periods occurred earlier (an unspecified time-frame has the same deficiency). Some surveys have also used as a reference period the last 7 days, assessed as individual days [6,77], or as a combined previous week [78]. Such measures are even more prone to miss intermittent heavy drinking occasions than the 30-day measures. Methodological work, including several reference periods, has revealed the problem with the week-duration measures: by restricting the range of heavy drinking they may be insensitive risk measures in relation to acute and possibly even chronic harms [59]. Some Scandinavian strategies have involved varying reference periods based on the individual's recency of drinking occasions [79–81], but these can be difficult to compare with measures based on fixed drinking durations. We have already addressed measures of drinking in a single day (yesterday) [64].
The reference period, with or without associated minimum drinking criteria (such as the problematic 12 drinks in any year), implicitly defines current drinkers; by implication this also establishes the study's definition of non-current drinkers or abstainers [25]. Resultant between-study differences in drinking distributions can make results non-comparable in ways investigators may not appreciate [22,55]. This is only one, albeit a critical one, of the numerous wording differences that can make undertaking study comparisons sobering [43]. Another, of course, is beverage-specific versus combined-alcohol measures [17], an issue to be taken up later. Next, however, we consider issues of recall in more depth.

Retrospective recall

Questions assessing a respondent's drinking in the past are thought generally to be less reliable than assessments of current drinking patterns because they are subject to recall bias, in addition to having the same issues as current measures. Recall bias refers to a generally downward bias in response to questions about past drinking because of forgetting due to reduced salience and re-imagining, and minimizing or purposeful under-reporting of behaviors viewed currently as socially unacceptable, such as heavy drinking. Generally, the further in the past that the behavior occurred the stronger this effect will be. There is also some evidence that the respondents' current drinking behavior will influence their retrospective reports [82]. This generally means a downward bias, because drinking often decreases with age (although we note that this is by no means a universal maturational pattern, especially in cultures where drinking is a privilege of age). One Finnish longitudinal study found that 18-year retrospective reports resulted in double the alcohol of the original reports in a situation where the country's per capita consumption had tripled over the follow-up period [83]. Measures of the age of onset of drinking and even assessment of life-time abstention (see later section) are subject to this type of bias. A recent study of life-time drinking in the 1958 British Birth Cohort Study found that only half of those reporting never having a drink at age 42 also had no reports of drinking in any of the earlier assessments [84]. Changes in life-time status across longitudinal assessments are known as ‘recanting’ and have been found to be very high for illegal drug use as well as for alcohol use [85]. Such instability in respondents' characterization of ever using alcohol highlights the importance of social factors, perceived confidentiality and selective memory in alcohol use assessment in general.

Even for shorter-duration ‘current’ drinking assessment, there is also the issue of variability in drinking from day to day. Particular drinking episodes may stand out uniquely (for example, when one drank the most), offering some possibility of assessment [39] (discussed further under the heading Maximum measures). It is a paradox that the most common measurement approach asks for usual quantity and usual frequency (QF), while most researchers are aware that trying to capture ‘typical’ drinking behavior is problematic because of the normative day-to-day variability of drinking amounts (and variation in frequencies of drinking as well). In most places, individual drinking patterns are seldom so routine and unvarying as to make the average amount the same as the actual amount typically consumed—and researchers have long noted that individuals may estimate their own averages inaccurately [57]. Next, we consider this further.

SPECIFIC TYPES OF MEASURES: ‘CURRENT’ DRINKING

In this section we examine various types of measures designed to tap ‘current drinking’, whatever reference period is used, in contrast to drinking over the life-course, to be addressed later. We begin with the simplest (too simple) single item, the so-called food frequency measure, describe next the widely used two-item QF measure, followed by the GF measure, designed to assess pattern as well as volume, and end the section by considering so-called ‘binge drinking’ measures: frequency of large amounts, frequency of drunkenness and maximum per day measures, each sometimes used in addition to volume, to characterize drinking patterns associated with elevated risks of experiencing various harms and even mortality.

Single-item food frequency-based alcohol measures

Many epidemiological studies, particularly those focused upon alcohol-related health outcomes, have relied on the type of food frequency measures used commonly in nutrition and other medical research. In many cases all alcohol, or beer, wine and spirits separately, would be assessed in the course of obtaining rates of consuming other beverages such as milk, soda and coffee. These questions typically include several frequency categories for less than daily drinking and quantity levels for daily drinkers. For example, one study used in addition to categories of ‘less than monthly’, ‘at least monthly but less than daily’, one to two, three to five, six to eight and nine or more drinks per day [86]. In effect, such a measure confounds frequency and quantity by assuming that only daily drinkers consume large amounts on an occasion. Many who drink less than daily drink large quantities, but the form of the question makes reporting irregular heavy drinking impossible. As another example, in the American Cancer Society (ACS) prospective study begun in 1959 subjects reported beverage-specific consumption as occasional, irregular or daily, with daily categories of one, two, three, four, five and six or more [87]. This assessment of men aged 40–59 years found that about 70% of current drinkers were daily drinkers, far more than was typical for the United States even during heavier epochs than today (e.g. only about 14% of this group were daily drinkers in the 1984 National Alcohol Survey [88]). This discrepancy highlights the main problem with this type of measure, which is that it is not suited for populations that drink typically on a weekly or monthly basis and often have more than one drink per occasion. These common patterns would be classified crudely as monthly or irregular drinking, but it is likely that many non-daily drinkers respond using the daily quantity categories, given the excessive proportion of daily drinkers in the ACS study. 
In other words, many respondents who drank more than one drink, but did so less than daily, used the scale to indicate typical quantity and so appeared to be daily drinkers. In so doing they were following Schwarz's [38] principles and attempting to give the best answer possible to a single question mingling quantity and frequency, a hallmark of the food frequency approach. This makes these measures prone to misclassification because they provide one dimension rather than at least two (quantity and frequency), as considered next.

Two-item QF measures

When very few questions on alcohol consumption can be included in epidemiological surveys, the minimum set generally includes a ‘usual’ frequency of drinking F (number of days drinking in a given reference period, or a range of temporal categories that can be converted to this metric) and the ‘usual’ quantity Q of alcohol drunk (generally in standard drinks, however defined), since Q × F = V, or average volume. The QF measure has one advantage, simplicity and relative ease of completion, and numerous disadvantages, particularly as it has long been recognized that usual Q, as reported typically, tends not to be the arithmetic mean of a person's varying pattern and under-represents heavy occasions [72,89]. It has been noted and found empirically that this leads to downward bias in estimated volume [59,90,91] compared to measures designed to capture ‘unusual’ heavy drinking as well [89]. For this reason, a US NIAAA Task Force draft report on alcohol questions [92] recommended that epidemiological studies include a minimum of three alcohol questions: usual quantity, usual frequency and a third covering frequency of heavy drinking (see further later), an approach suggested long ago by Armor & Polich [89] as a way of adjusting the calculation of volume to account for ‘unusual’ quantities.
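The arithmetic behind Q × F = V, and the kind of correction a third heavy-drinking item makes possible, can be sketched as follows. This is an illustration under stated assumptions rather than the Armor & Polich algorithm itself: the function names are hypothetical, the quantity assigned to heavy days is supplied by the analyst, and heavy days are assumed to be a subset of the reported drinking days.

```python
def qf_volume(usual_quantity, usual_frequency):
    """Average volume V = Q x F, in drinks per reference period."""
    return usual_quantity * usual_frequency


def adjusted_volume(usual_quantity, usual_frequency, heavy_days, heavy_quantity):
    """Volume with heavy days counted at a heavier quantity.

    Assumption: heavy days are part of the reported drinking days,
    so they are capped at usual_frequency and subtracted from it.
    """
    heavy_days = min(heavy_days, usual_frequency)
    usual_days = usual_frequency - heavy_days
    return usual_quantity * usual_days + heavy_quantity * heavy_days
```

For a hypothetical respondent reporting a usual two drinks on 100 days of the past 12 months, with 10 of those days at six drinks, the simple QF volume is 200 drinks, whereas the adjusted volume is 240, showing the direction of the downward bias discussed above.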

GF measures

The GF measures of several types [9] were developed at the Alcohol Research Group to overcome some of the deficiencies of QF measures just described, and to allow pattern information as well as volume to be generated directly. In essence, all measures of this type seek to obtain QFs at a series of quantity levels, generally ranges or ‘bands’ of quantities (e.g. one to two drinks, three to four drinks, etc.). Three types of GF measures have been used fairly widely [17]. The most common asks about combined alcoholic beverages (e.g. typically any drink involving wine, beer or spirits, with a standard drink defined in the instructions), and uses a 12-month reference period. It is initiated by a question soliciting the maximum number of drinks in any day in the previous year, asked categorically (see Maximum section below). This maximum quantity defines the entry into a ‘GF series’ of questions at an appropriate level. If the individual ever drank 12 or more drinks, the frequency of drinking 12+ amounts is asked, and so on for each of a series of fixed levels in descending quantities (i.e. eight to 11, five to seven, three to four and one to two drinks). Thus, if the person reports a maximum of five to seven drinks in any day of the previous 12 months, only frequencies of drinking amounts in each of three bands (five to seven, three to four and one to two drinks) are asked. The GF is an efficient measure in countries where most individuals drink smaller quantities per occasion (or actually day) as few items will then need to be asked: the maximum (five to seven drinks in this example) and in this instance a further two stepped-down QF items (frequencies of drinking three to four and one to two drinks, respectively). Validity studies in the United States using daily diaries have shown reasonably good agreement between the GF's frequencies at the various Q levels and the diary-based number of days at equivalent quantities as well as diary- versus GF-estimated volumes [58].
The combined alcohol GF measure has been implemented successfully for general population surveys in various countries where a standard drink has some consensual meaning, such as the United States [91], Canada [59] and in a paper-and-pencil version in Australia [64]. Additionally, the GF approach has been recommended for international alcohol monitoring surveys [20]. One problem that must be addressed in a minority of cases, mostly certain heavy drinkers, is accounting for more than 365 days in the year in the summation of frequencies, addressed by truncating or prorating in the volume algorithm [17]. One implementation, referencing the GF levels in grams of ethanol equivalents, was incorporated into the GENACIS (Gender, Alcohol and Culture: an International Study) surveys [93,94]. This form of the GF has been criticized because of complexities leading to imperfect operationalization in several countries and inconsistent reports [95]; a recent Canadian pilot for that country's GENACIS survey also noted such problems [96], and other approaches than the combined-alcohol GF can at times yield higher volumes [97]. For these reasons, interviewer training in the use of the GF and correct programming, if implemented in a CATI survey, are particularly important, as well as adequate respondent instructions.
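The skip logic and volume summation described for the combined-alcohol GF can be sketched roughly as below. This is a simplified illustration, not the published volume algorithm [17]: the band labels, function names and the use of range midpoints for the bounded bands are assumptions, the 15.5 value for the open-ended band is the diary-based mean cited earlier, and prorating is shown as one of the two remedies mentioned for frequencies that sum to more than 365 days.

```python
# Quantity bands in descending order with an assumed drinks-per-day value.
GF_BANDS = [
    ("12+", 15.5),   # diary-based mean cited in the text for the unbounded band
    ("8-11", 9.5),   # midpoints for bounded bands are an assumption
    ("5-7", 6.0),
    ("3-4", 3.5),
    ("1-2", 1.5),
]


def bands_to_ask(max_band):
    """Bands whose frequency is asked, from the reported maximum downwards."""
    labels = [label for label, _ in GF_BANDS]
    return labels[labels.index(max_band):]


def gf_volume(days_by_band, days_in_period=365):
    """Sum days x quantity over bands, prorating if days exceed the period."""
    total_days = sum(days_by_band.values())
    scale = min(1.0, days_in_period / total_days) if total_days else 0.0
    return sum(days_by_band.get(label, 0) * scale * value
               for label, value in GF_BANDS)
```

For the example in the text, `bands_to_ask("5-7")` yields the three bands from five to seven drinks downwards. A hypothetical respondent reporting 4, 20 and 100 days in those bands would have an estimated volume of 244 drinks; one reporting 200 and 300 days in the two lowest bands would have each frequency scaled by 365/500 before summation.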

A second type of GF measure is the beverage-specific measure known as the Knupfer Series [9,17]. This approach asks frequency questions at various quantity levels for wine, beer and spirits (or ‘liquor’) separately [98]. For each beverage type the respondent is asked the proportion of time that they drink at three quantity levels: one to two, three to four and five, six or more drinks. Asking beverage-specific questions is one of the ways in which one may cue respondents about specifics of drinking occasions. Asking beverage-specific questions can yield better recall and lead to reporting higher volumes than combined-alcohol QF questions [99,100]. Also, there are demographically distinct patterns of drinking by beverage type [101] and this can have important implications for risk analyses [26,53], because risks may differ by the beverage drunk most often. Even risks of certain types of morbidity and mortality such as esophageal cancer may be affected by specific types of alcoholic beverages [102,103]. Thus, for various purposes, beverage-specific questions are essential. A positive feature of beverage-specific GF consumption measures is that ethanol adjustments for pour sizes and drink strengths can be made (see later section). Conversely, it is difficult using this method alone to estimate drinking or heavy consumption rates on particular days, because unless only one beverage type is drunk various combinations of beverages may, to an unknown extent, be drunk on the same day. Although rates of five or more beer, five or more wine and five or more spirits are obtained separately by this beverage-specific GF measure, some combinations of lower levels could result in exceeding five or more drinks of any alcoholic beverage (e.g. drinking three glasses of wine and two cocktails). 
Because heavy quantity consumption is vital for predicting acute harms such as injuries [104], when beverage-specific questions are used for volume calculations, frequency of combined-alcohol heavy drinking (or similar measures) should also be asked (see later sections).

Paper-and-pencil GF forms for postal surveys have also been developed, e.g. one used in both community [105] and college student [72] samples. For the past 30 days, respondents are asked on how many days they drank at least one drink. The next item asks: ‘On how many of those days (with a graphic pointing to the number of drinking days they just inserted) did you have at least four drinks?’; then, in ascending order, similar questions are asked for at least eight and at least 12 drinks [17]. Revised versions have used 28 days as a reference period in order to include exactly four weekends, given the common weekly variation in daily amounts. A self-completed GF was implemented for the 2001 and 2004 Australian National Drug Strategy Household Surveys (NDSHS). After an instruction noting the 12-month reference period and standard drink quantities defined previously (10 g ethanol in Australia) came a matrix of response options with eight rows, one for each quantity level beginning with ‘20 or more standard drinks in a day’ then, in descending quantity bands, 11–19, seven to 10 standard drinks, etc., each row providing eight frequency options across the page from ‘every day’ to ‘never’ [64]. For countries where ‘drinks’ have meaning, GF measures provide an individual-level profile of drinking quantities that can yield proportions of drinking at hazardous levels [26,106] and other pattern variables.

Heavy drinking per occasion or day

Frequency of heavy drinking or quantity per occasion measures have been recognized as critical for drinking pattern measurement for many years [72,107], but the appreciation of their importance in the last decade represents a paradigmatic shift [108], with even newer mortality studies finding drinking pattern effects on all-cause mortality [109,110]. The somewhat problematic term ‘binge drinking’ has become synonymous with heavy quantity consumption following the college studies of Wechsler and colleagues [111]. The Harvard group [112] also argued for a gender-based definition (five or more drinks for men, four or more for women) which has gained traction, although this distinction cannot be applied in all studies, e.g. when only a single five or more drinks measure is available in a survey series [76,113]. One advantage of the term ‘binge drinking’ for such quantity measures is that it may help to distinguish them from heavy drinking as sometimes is (mis)applied to high-volume drinking. The term ‘heavy episodic drinking’ is also in widespread use, but this is also problematic in that heavy drinking episodes can be routine, not episodic, as implied. We feel ‘heavy quantity drinking’ is a preferable term. Frequency of heavy quantity drinking has been found related to item response format, with categorical scales eliciting higher frequencies than open-ended formats [114]. One must keep in mind that definitions of heavy (or binge) drinking vary, some being considerably higher than five or more drinks, with eight or more equivalents in heavy drinking populations not uncommon [115]. It has been argued that pattern measures (e.g. four or more, eight or more, 12+ drinks per occasion) need to be scaled to an individual's volume to discriminate the tendency to obtain a given volume either by frequently drinking lower quantities or infrequently having larger quantities [72], the latter, of course, being the riskier pattern. 
Unpublished new work has found empirical pattern distinctions of this type to be highly predictive of alcohol use disorders [62]. Recently there is interest in trying to define binge drinking by the blood alcohol level obtained [116], with NIAAA suggesting five or more (four or more for women) drinks in 2 hours as an appropriate binge measure, because this would result typically in a 0.08 BAC. Appealing as this idea is, for epidemiological surveys this may be too restrictive a definition; unless five or more drinks in a day is also asked, several important types of analyses will not be possible, e.g. heavy drinking adjustments to volume measurement [89] or proportions of volume consumed hazardously [26].

Frequency of subjective drunkenness

Frequency of drunkenness attempts to capture the rate of reaching intoxication that results from intake. One rationale for subjective effects measures is that they may better take account of gender differences and other factors such as individual variations in body water, metabolism rate, etc. [117]. Yet it is also known that over time, populations' definitions of number of drinks to feel drunk change [118], with a substantial downward shift between 1979 and 2000 during the drying trend in the United States, later surveys showing fewer drinks to feel drunk on average. The strength of the measure, its subjective nature, is in some ways therefore also its weakness, in that even the same type of individuals may change over time. Nevertheless, frequency of drunkenness has proved superior to several other heavy drinking measures in predicting outcomes such as criminal behavior [7], so it can supplement usefully other more ‘objective’ pattern measures.

Maximum amount drunk in any day of reference period

The maximum amount consumed in any day during the reference period is an important pattern variable, probably the single best indicator of drinking variability [39]. Additionally, the most consumed in any day, e.g. in the previous year, can serve to define those whose drinking quantities are within (or exceed) safe drinking guidelines, however defined [119–123]. We know that ‘coverage’ of alcohol consumed, the average volume per individual as estimated by surveys, is a fraction of the per capita consumption estimated by national and state sales statistics [26,65,124]. Thus survey measures are, to a greater or lesser extent, downwardly biased, despite best practices. The maximum quantity measure is important of itself, but for the GF series [39,55] described earlier it also defines the quantity starting point. Because of the skip it therefore introduces, it is very important that the maximum not be misclassified in a downward direction. For this reason, as mentioned earlier, the initial response option is ‘24 or more drinks in a single day’, an unusual but not rare response. This is a valid response, because this quantity is quite common in clinical populations and those with alcohol abuse disorders [125,126]. However, the use of this high quantity also serves to cue more typical respondents (those occasionally drinking a lower but still high quantity) to the interviewer's acceptance of large quantities and establishes such amounts as ‘well within the normal range’. Permission-giving appears important to reduce downward bias: one would expect a very different drinking distribution in a sample if the maximum was not asked in this way and instead GF items were presented in ascending order [17]. Similarly, when the 12-month maximum amount in any day is asked in the standard categorical fashion one tends to obtain higher values for heavy drinkers than with an open-ended most drinks measure [39].

SPECIFIC TYPES OF MEASURES: DRINKING OVER THE LIFE COURSE

Life-time measures of alcohol consumption

Here we review findings related to life-time drinking assessment. These measures are important if one is limited to a single survey, or to include in the baseline of a prospective panel study, because they provide some historical picture of previous drinking which can have profound implications for prediction of future risks. Possibly because of the concern about the adequacy of long-term recall, as exemplified by Simpura & Poikolainen's [83] classic study of inaccuracies in recall after 18 years, life-time history measures, available for a quarter century [127], have been used less widely than one might expect. In the majority of surveys on alcohol, questions on consumption have been limited to recent drinking, however the reference period is defined. Implicitly, many researchers, even those skeptical of long-term drinking recall, believe certain aspects of drinking behavior can be assessed over the life-course. For example, in many studies an item is included that indexes ever having drunk alcohol (variously defined, e.g. more than a few sips) versus life-time abstention. Another fairly common measure for non-life-time abstainers is age of drinking onset, with younger age predicting risk of (or possibly mediating other risk factors for) life-time alcohol dependence [128,129]. Fortunately, a methodological analysis using the National Longitudinal Survey of Youth (NLSY) has suggested that age of onset questions found typically in surveys ‘are of sufficient reliability for most epidemiological applications’, although cautioning that when this variable is the focus of interest additional efforts to achieve reliability should be made [130]. Another similar, but more demanding, recall is number of drinks needed to ‘feel the effects’ the first five times the respondent drank, used among other measures to define low response (LR) to alcohol, viewed as a phenotype at risk for alcohol dependence [131,132]. 
Unfortunately, such variables tell very little directly about drinking history over the life course, and there is a need for many purposes to incorporate serviceable survey measures of life-time drinking [8,82]. The Lifetime Drinking History (LDH) of Skinner & Sheu (1982) asks about typical frequencies and quantities, as well as maximum quantity in self-defined life phases to capture major changes. A frequency of the maximum amount has been added to improve capture of heavy drinking [133]. Unfortunately the LDH, with its ‘floating’ time intervals, takes 20–30 minutes to administer and so is not really suitable for population surveys.

Fortunately, research has suggested that the use of simpler decade-based measures may provide useful information, comparable to more comprehensive drinking histories. For example, cumulative life-time consumption based on a more elaborate, time-consuming ‘floating’ format involving potential changes in drinking associated with important life events, specifically recalled, correlated in the 0.80s or high 0.70s with equivalent measures derived from a simpler ‘fixed’ version asking about decades of life [134]. Thus, survey applications of the simpler, fixed period-based variety are feasible [3,135]. The life-time drinking perspective is crucial for mortality studies [136,137], which have seldom been based on multi-point consumption measurement [86,138]. Single-point drinking assessments provide inadequate estimates of cumulative exposure [104,139]. Other types of alcohol epidemiological work may also benefit from development of efficient life-time measures for surveys such as studies of the ageing process [140], of natural versus treated recovery from drinking problems [135,141,142] and the sequencing and severity of health harms [143] associated with drinking at various life phases [144]. Prospective studies measuring drinking at various times in relation to life transitions and outcomes are highly desirable but expensive and difficult to accomplish [145], often requiring inferences to be based upon cross-sectional series or single point surveys [146]. Even in longitudinal panel studies, retrospective previous life-time measures are desirable at baseline as individuals differ in early adult drinking trajectories [147]. In clinical populations, life-course measures are also valuable. Studies have established reasonably unbiased recall for ‘remote’ (meaning a few years) recall for use in retrospective data imputation of intermediate missing data points [148]. 
Obviously, many methodological problems remain to be solved in survey life-time drinking measurement [8], and retrospective report has inherent limitations [149]. Writing some time ago, Lemmens [8] cited three problems requiring further work, which are still germane today: (i) how much detail can researchers expect respondents to give with reasonable reliability; (ii) how is recall of earlier periods influenced by current drinking; and (iii) how are inconsistencies in reported histories to be resolved (similar to inconsistent results in longitudinal waves)?

UNITS OF MEASUREMENT: THE ETHANOL CONTENT OF DRINKS

Now we take up the issue of the ethanol content of ‘drinks’, however defined in a survey. We give this attention because it has become apparent recently that the variation contributed by this source of measurement may be of equal importance to all the other influences discussed above from the perspective of accuracy of consumption and drinking pattern measurement. The ethanol content consumed by a particular individual in a survey, or the average ethanol content of drinks consumed in a country [150–152], are relevant to all aspects of research where a drink is the unit of measurement. Most surveys of alcohol consumption are phrased in terms of ‘drinks’, meaning standard drinks of the respective country or area surveyed. In turn, the findings and recommendations stemming from studies using measures of numbers of drinks (as most have) are also stated generally in terms of numbers or ranges of drinks consumed. In some cases, these drinks are defined for the respondents in terms of the milliliters (or typical container sizes) of beer, wine or spirits suggested to constitute a standard drink, but in many cases they are not. Regardless, it appears that respondents are likely to report in terms of the drink sizes they actually consume [153]. Only detailed analyses of that respondent's drinks, or the drinks consumed by similar people in a given society, can tell us what was meant by a report of a given number of drinks (in a day, typical occasion, etc.). Research has established that differences between individuals' drinks can be large, with some individuals reporting as little as 4 g of ethanol in their ‘drink’, while others report drinks of more than 30 g of ethanol; differences exist between the mean alcohol content of drinks by beverage type, with one US methodological survey indicating an average of 16.6 ml for beer, 19.5 ml for wine and 26.3 ml for spirits drinks [154]. 
These differences will affect the precision and significance of risk estimates as a factor of the accuracy of the assumptions made about the ethanol content of self-reported drinks. Fundamentally, a given consumer's ability to stay within safe drinking guidelines or avoid drunk driving, defined as remaining below a jurisdiction's legal BAC threshold, is also compromised when the drinks such an individual consumes differ from the standards used to define such guidelines (even if the standards were accurate, as is not always the case) [155].

Standard drinks

The standard drink concept suggests that there is a serving size of alcohol that is typical of a particular country. Regardless of whether this is true, messages about risk of health or social problems [120], or estimates of likely estimated BAC (eBAC) following a given period during which there is a specific level of intake, e.g. based on the Widmark or related formula [156,157], need to be made in terms of some units, and a standard drink or serving is the most logical choice (in countries where these can be defined or approximated). However, ounces or milliliters of a particular beverage or ounces, milliliters or grams of ethanol, as used commonly in research studies, will not be understood easily by most consumers. The use of the standard drink concept is complicated by different standards across countries and even within countries [20]. In the United States, for example, both 12 g and 14 g are cited commonly as standard drink amounts. Some countries use a smaller standard, such as 8–10 g in the United Kingdom or 10 g in Australia, while others use a larger standard, the highest being reported to be 23.5 g in Japan [158]. In practice, one standard will be taken by researchers to apply to all beverage types, while in reality the typical serving sizes and ethanol contents tend to differ by beverage type, leading to non-equivalence.
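The dependence of an eBAC estimate on the assumed standard drink can be made concrete with a minimal Python sketch of the basic Widmark relation. The distribution factors (about 0.68 for men and 0.55 for women) and the elimination rate (0.015% per hour) are commonly cited textbook values adopted here as assumptions; they are not taken from the studies reviewed, and the function name is hypothetical. The sketch also ignores absorption time and blood density, so it is a simplification rather than a forensic calculation.

```python
def widmark_ebac(ethanol_grams, body_weight_kg, r, hours, beta=0.015):
    """Estimate BAC (g per 100 ml, i.e. %) with the basic Widmark relation.

    r    : Widmark distribution factor (assumed, e.g. 0.68 men, 0.55 women)
    beta : elimination rate in % per hour (assumed typical value)
    """
    # Peak concentration if all ethanol were distributed instantly.
    peak = ethanol_grams / (r * body_weight_kg * 1000.0) * 100.0
    # Subtract ethanol eliminated since drinking began; never below zero.
    return max(0.0, peak - beta * hours)


# Five drinks in 2 hours for an 80 kg man, under two US standard drink sizes.
for grams_per_drink in (12.0, 14.0):
    ebac = widmark_ebac(5 * grams_per_drink, 80.0, r=0.68, hours=2.0)
    print(grams_per_drink, round(ebac, 3))
```

Under these assumptions the same report of ‘five drinks’ falls on different sides of a 0.08 threshold depending on whether 12 g or 14 g is taken as the standard, which illustrates why the definition matters.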

Sources of variation

A number of factors are associated potentially with variation in drink ethanol content, including beverage type and brand, glass size and shape, cultural and historical factors, the context in which drinking takes place, the specific drink recipe used (in the case of mixed drinks), the percentage alcohol content of the beverage/s used and the type of measuring device or other pouring strategy involved. Primarily, of course, the intentions of the person who makes the drink and the perceived expectations of the intended drinker (who may or may not be the same person) matter most. Beers and ready-to-drink beverages are often consumed from containers, but several sizes may be available, e.g. ranging from the most common 12 oz (355 ml) size to 40 oz (1184 ml) or even 64 oz (1894 ml) for beer and malt liquors sold in the United States [159]. Variation in the strength of beers commonly consumed complicates this further. Even mean drink size and drink strength will depend on the particular mix of products consumed by subgroups and variation within a population, often ignored, and may change as new products such as coolers or low-strength beer gain popularity in some market segments [150,160]. In the United States, the market shares of light beer and malt liquor, for example, vary widely by state [152] and have changed over time. We next examine in turn volumetric estimates in homes, in on-premise establishments and then the measurement of drink concentration.

Home drink measurement strategies

Several strategies for measuring or estimating the ethanol content of drinks consumed at home have been employed in research studies. One method is to ask the respondent to report their typical container size or pour volume in ounces or milliliters [64,159,161]. Separate questions on each beverage type are usually asked. In face-to-face studies such as the 2001–02 National Epidemiologic Survey of Alcohol and Related Conditions (NESARC) survey in the United States, pictures of commonly used glassware with the ounces represented by different pour levels displayed on them may be used to aid recall. Another alternative employed in the 2005 National Alcohol Survey (NAS) and discussed below is to ask about relative drink size as compared to some standard such as a typical bar drink or bottle of beer (used as a reference). Some studies have shown respondents pictures of containers and glasses with pour levels marked with letters rather than volume labels and asked them to find the closest matching vessel and pour level, or in face-to-face situations, to solicit responses using actual glassware with labeled pour levels [153]. Other strategies include having respondents pour a drink like their usual one into glasses provided to estimate their usual pour [162,163] or to have them attempt to pour a standard drink into provided glasses to gauge their ability to do so [164,165]. Finally, direct measurement of simulated usual drinks poured at home is the method that we believe is most accurate for use in any survey mode where respondents are contacted in their own home (and a large proportion of drinking is in home settings—not always the case). In these studies the respondent is asked to make their usual drink of each beverage type they drink using their own glassware at home. These drinks are then measured by the interviewer in a graduated cylinder [166–169]. 
Our group has extended these methods for use in telephone surveys by pre-mailing accurate plastic measuring beakers to the respondent, contacted by phone [154]. We selected an inverse conical beaker type because this offers accuracy at the smaller as well as the larger pour sizes. Because such studies are time-consuming, they may best be conducted as smaller sample methodological adjuncts to population surveys, as has been our practice [154]. The issue of concentration of alcohol in the beverages will be considered after the next section on another key drinking venue.

On-premise drink measurement strategies

Bar drinks can be measured in self-reports using pictures, relative size or direct report of amount (vessel volume) as just described, along with brand or percentage alcohol by volume (%ABV) as detailed in the next section. Alternatively, a bar measurement study can be conducted where common bar drinks of beer, wine, spirits or other beverage types are purchased and measured. Wine and beer drinks are relatively simple to buy and measure using a funnel and graduated cylinder [168], although beer foam can be problematic. A straight spirits drink can also be measured easily in the same way as wine or beer. Ethanol in spirits drinks is more difficult to assess when beverages are mixed in cocktails and combined with ice. To measure mixed drinks two strategies have been used. A Spanish study [162] determined popular drinks in each bar through observation of patrons and then worked with the bartender to measure each ingredient as an example of how each popular drink was made. One issue with this strategy is that the bartenders are aware of the measurement activity and may make the drink differently than they would otherwise. The alternative strategy, employed in a study conducted recently by our group in California, involves three or more study personnel who buy a variety of popular drinks naturalistically in a sampled establishment without the bartender's knowledge of measurement activity. Each mixed drink is poured off the ice through a funnel or cocktail shaker and the liquid volume is measured. Next, a small sample of each drink is taken with an eye-dropper bottle for later analysis. These samples are assayed later to determine the %ABV of the drink using a suitable device, in our case an Analox® AM3 alcohol analyzer. Either type of study should include different types of on-premise drinking venues such as bars, restaurants and nightclubs. 
Bartender interviews or focus groups may aid the design of such a study, as in the instance mentioned above, and should be carried out in advance to learn about possible issues with establishment types and popular drinks. An interesting note is that it was determined by appropriate oversight committees that, in the absence of interviews with the bartender from whom the purchase was made, human subjects were not involved, the measurements instead being of a physical sample.

Measurement of alcohol concentration

While drink size (amount or volume) appears to be the main source of drink ethanol variation within beverage type, the %ABV of drinks can also vary considerably. For beer in North America, the lowest strength is around 3.5%ABV, light beer is usually 4.2%ABV, standard beers are about 5%ABV and malt liquors or other craft beers may be 6–8%ABV [152]. Wine can vary greatly as well, with so-called wine coolers (usually not containing wine) at 4–5%ABV, table wines from 8 to 16%ABV and fortified wines at 17–22%ABV. The most variable, of course, are spirits drinks, which can reasonably be diluted to very low levels or drunk straight at 40–50%ABV or higher. Study participants' knowledge of the %ABV for any beverage type tends to be poor in the United States and probably in other countries as well. Thus, asking respondents to report the %ABV of drinks may very well result in inaccurate information. Drinkers can be expected to remember the preferred brand and type of each beverage (and varietal for table wine), which can be used to determine the %ABV. Many markets have thousands of brands and this may be difficult information to utilize; however, popular brands and types will be reported as typical, or preferred by many respondents. Also, especially for wine but also for other beverages, respondents may not be able to report a specific brand and may, e.g. for wine, give a varietal type such as red, Merlot or Chablis. In these instances, a market share (sales) weighted average of all or all major wines (or other beverages) of this type or description should be used where possible [152].
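The arithmetic linking drink volume, %ABV and ethanol content, and the sales-weighted averaging suggested for respondents who can give only a beverage type, can be sketched briefly in Python. The ethanol density of 0.789 g/ml is a standard physical constant; the function names and the market shares and strengths in the example are hypothetical values chosen for illustration, not data from any survey.

```python
ETHANOL_DENSITY_G_PER_ML = 0.789  # density of pure ethanol at about 20 °C


def ethanol_grams(volume_ml, abv_percent):
    """Grams of ethanol in a drink of the given volume and strength."""
    return volume_ml * (abv_percent / 100.0) * ETHANOL_DENSITY_G_PER_ML


def share_weighted_abv(products):
    """Market-share-weighted mean %ABV.

    products is a sequence of (market_share, abv_percent) pairs; shares
    need not sum to 1 because they are renormalized here.
    """
    products = list(products)
    total_share = sum(share for share, _ in products)
    if total_share <= 0:
        raise ValueError("market shares must sum to a positive value")
    return sum(share * abv for share, abv in products) / total_share


# A 355 ml (12 oz) container of 5%ABV beer.
print(ethanol_grams(355, 5.0))

# Hypothetical shares and strengths for three red table wines.
red_wines = [(0.5, 13.5), (0.3, 12.5), (0.2, 14.5)]
abv = share_weighted_abv(red_wines)
# A 150 ml pour (assumed size) of a 'typical' red wine by this average.
print(ethanol_grams(150, abv))
```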

Application of drink ethanol estimates

The usefulness of information on drink ethanol content extends beyond the straightforward modification of the respondents' drinking volume and pattern measures. Estimates of the mean ethanol content of drinks by beverage type, drinking context and across demographic groups can be utilized to adjust consumption reports from larger surveys where it is not possible to collect this level of detail. For example, we plan to apply the subgroup mean drink ethanol estimates from our 2005 NAS methodological sample (approximately 350 individuals) and the bar drink ethanol content estimates from the ARG Bar Drink Study to the full 2005 NAS sample's reported drinks. The main NAS survey requests selected information on the beverage type consumed, some details of specific drink and size and contexts of drinking in addition to the generally available demographic information. Application of mean drink-ethanol estimates based on the methodological substudy can be targeted more precisely by adjusting estimates based on subgroup characteristics and thus some level of information, incorporated into any survey, may increase precision. A good example of this is the Australian National Drug Strategy Household Survey [64] using detailed information gathered about yesterday's drinks. Drink ethanol estimates have been found generally to be larger than the contents of a standard drink so we know that under-coverage of alcohol sales by surveys is due partially to assumptions that underestimate drink ethanol. More importantly, we would like to know if the improved precision of individually measured drink ethanol content and the application of estimated drink ethanol content to subgroups according to beverage choice, drinking context and demographic or other factors will improve models relating consumption patterns to alcohol-related health and social outcomes. 
Theoretically, ethanol-adjusted drinking measures should outperform ones using ‘standard drinks’ in predicting alcohol-related problems, a strategy already found valuable in comparing alternative alcohol consumption measures [7].
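A minimal Python sketch shows how mean drink-ethanol estimates might be applied to reported drink counts in a larger survey. It uses the beverage-specific means from the US methodological survey cited earlier (16.6, 19.5 and 26.3 ml of ethanol for beer, wine and spirits drinks [154]); the 14 g standard drink used for comparison, the function names and the example respondent are assumptions for illustration, and a real application would use finer subgroup means (e.g. by drinking context and demographic group).

```python
ETHANOL_DENSITY_G_PER_ML = 0.789  # density of pure ethanol at about 20 °C

# Mean ethanol per drink (ml) by beverage type, as reported in [154].
MEAN_DRINK_ETHANOL_ML = {"beer": 16.6, "wine": 19.5, "spirits": 26.3}

STANDARD_DRINK_G = 14.0  # assumed standard drink used for comparison


def unadjusted_grams(reported_drinks):
    """Ethanol intake if every reported drink is one standard drink."""
    return sum(reported_drinks.values()) * STANDARD_DRINK_G


def adjusted_grams(reported_drinks):
    """Ethanol intake using beverage-specific mean drink ethanol.

    Beverage types without an estimate fall back to the standard drink.
    """
    total = 0.0
    for beverage, n_drinks in reported_drinks.items():
        mean_ml = MEAN_DRINK_ETHANOL_ML.get(beverage)
        if mean_ml is None:
            grams_per_drink = STANDARD_DRINK_G
        else:
            grams_per_drink = mean_ml * ETHANOL_DENSITY_G_PER_ML
        total += n_drinks * grams_per_drink
    return total


# Hypothetical respondent: drinks reported for one week by beverage type.
week = {"beer": 10, "wine": 5, "spirits": 2}
print(unadjusted_grams(week), adjusted_grams(week))
```

Keeping both totals in the analysis file lets models be fitted with and without the adjustment, so the two measures can be compared in their ability to predict outcomes.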

CONCLUSIONS

This is an exciting period for the epidemiological study of alcohol use patterns and associated problems. New methodologies for alcohol assessment, as well as more flexible and powerful analytical tools, are being developed that promise rapid advances in explanatory models of initiation and maintenance of patterns of alcohol consumption in populations and their subgroups. For example, analyzing the cross-sectional series of regularly repeated NAS monitoring surveys over a 20-year period using age–period–cohort (APC) modeling has allowed us to decompose raw trends for men and women in beer, wine and spirits consumption into components owing to maturation (or age effects), survey period effects (including any cross-cutting methodological differences) and birth cohort [170]. While some variation in methods was present over the five surveys, a high degree of consistency of measurement was very helpful and even essential, no doubt contributing to plausibility of findings. One of the challenges of such monitoring surveys is how to evolve the measures to take account of advances described earlier while allowing commensurate comparisons with measures used in earlier epochs. In some cases, as with the drink–ethanol adjustment strategies described earlier, if data gathering is augmented (e.g. by assessing beverage specifics such as brand as well as drink size) one may make the adjustment or not, allowing for the needed look-back to earlier measures based only on ‘drinks’. The later surveys with drink ethanol data may be useful for sensitivity analyses examining the influence on results of more imprecise ‘drink’ measurement.

Collectively, the new studies of drink size and strength in home and on-premise settings are revealing the importance for alcohol measurement in surveys of taking better account of drink ethanol, and we would argue that replications are needed. For the most part, we believe that unless the purpose is primarily methodological, for practical reasons deeper assessments of drink ethanol to improve survey precision will need to be framed as coupled substudies, the results of which can be applied to a study's full sample (albeit imperfectly, using modeling applied to more basic measures, as discussed above). Although face-to-face survey designs are able to incorporate container picture stimuli directly, as was done in NESARC, or use other ‘hands-on’ drink-size strategies, subject burden sets practical limits on the detail of assessment in population surveys. One question remaining unanswered is whether policies regarding container labels providing number of standard drinks, and requiring similar consumer information for bar drinks, could lead to greater public awareness and possibly even more accurate survey reporting. In principle, human factors research could help build the evidentiary basis for such policies.

We have described several approaches needed to estimate more precisely the ethanol content of an individual's typical drink using specialized studies. We believe repeated-measure designs are now needed to study variation of drinks within an individual across time and across drinking contexts. One such study of the performance of self-reported summary measures (such as the GF) is under way at our center involving 28-day prospective drinking diaries as well as physiological measures to validate these records [171]. In this case the diaries obtain details of all drink types and brands consumed on up to four drinking occasions in a day, as well as associated drink sizes. In this design, variation in drink ethanol can be examined both between and within subjects and across an individual's natural repertoire of drinking contexts, also reported for the occasions. Pours at home may differ from those at parties, and research described earlier indicates that they differ again from those in on-premise establishments. The more intensive designs offer promise of calibrating more accurately the types of alcohol measures described here. A remaining challenge, however, is deciding what essential information on all study participants needs to be collected routinely to make the best use of data generated from these more burdensome but detailed methodological studies, to allow for an efficient cross-walk.

We believe that beverage-specific measurement is essential (in part to improve recall and allow for drink ethanol adjustment), but so also are measures that combine into one amount all types of drinks consumed in a day, or in a defined drinking occasion, for assessing rates of heavy drinking or eBAC, respectively. In the latter instance duration of the drinking occasion, or rate of drinking, is needed and validation of such measures is another important need. Also important for future studies will be assessing relative amounts drunk in various venues [172], allowing adjustments that can address this variation on an individual basis. Alcohol-specific surveys have the luxury of permitting the incorporation of several types of alcohol measures, thereby meeting multiple study goals. However, the minimum set of alcohol consumption items required for adequate drinking pattern assessment in general-purpose epidemiological surveys remains controversial despite efforts to address this issue in expert panels [92].
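As a hedged illustration of combining beverage-specific reports into a single daily amount, the following Python sketch sums ethanol across all drink types reported for a day and flags heavy drinking days. The record layout and function names are hypothetical. The 14 g standard drink follows the US convention, and the threshold of five drinks in a day is an illustrative assumption only; both would need to be set to suit a study's purposes.

```python
from collections import defaultdict

ETHANOL_DENSITY_G_PER_ML = 0.789  # physical constant, approximate
STANDARD_DRINK_G = 14.0           # US convention; an assumption here
HEAVY_DAY_DRINKS = 5.0            # illustrative threshold, not prescriptive


def daily_standard_drinks(records):
    """Sum all beverage types into standard drinks per day.

    records is an iterable of (day, beverage, volume_ml, abv_percent).
    The beverage label is kept in the record for drink ethanol
    adjustment but is not needed for the daily total.
    """
    totals = defaultdict(float)
    for day, _beverage, volume_ml, abv_percent in records:
        grams = volume_ml * abv_percent / 100.0 * ETHANOL_DENSITY_G_PER_ML
        totals[day] += grams / STANDARD_DRINK_G
    return dict(totals)


def heavy_days(records, threshold=HEAVY_DAY_DRINKS):
    """Days on which the combined total meets or exceeds the threshold."""
    totals = daily_standard_drinks(records)
    return sorted(day for day, n in totals.items() if n >= threshold)


# Illustrative values only, not data from any study.
example = [
    (1, "beer", 355, 5), (1, "beer", 355, 5), (1, "beer", 355, 5),
    (1, "wine", 150, 12), (1, "wine", 150, 12),
    (2, "beer", 355, 5),
]
flagged = heavy_days(example)
```

An eBAC estimate would additionally require the duration of the drinking occasion, as well as body weight and sex, which this sketch does not model.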

Although methodological measurement studies are much more common in Europe and the United States (a limitation of this review), it will be very important for studies in developing countries to incorporate such approaches, especially drink ethanol assessment, into their designs. We are aware of several as yet unreported studies under way in the Indian subcontinent and Africa that carefully measure the strength and size of both commercial and locally fermented or distilled beverages.

It remains very important to conduct carefully designed studies incorporating the best available alcohol measures consistent with study purposes. We believe that methodological studies now being reported will lead to instrument refinements that will improve the measurement of self-reported ethanol intake. It is clear from data already available, and from results of newer studies expected to be published soon, that greater measurement precision can be achieved when resources permit. We can be confident that alcohol consumption measurement, in its broadest sense, is essential for the full characterization of human drinking, alcohol use disorders, alcohol treatment and prevention [173]; improving intake measurement precision is urgent because it lies at the heart of the field's anticipated progress.

Acknowledgements

Work on this paper was supported by the National Institute on Alcohol Abuse and Alcoholism, Center Grant P30 05595 to the Public Health Institute.
