Development of the Gambling Disorder Identification Test: Results from an international Delphi and consensus process

Abstract
Objectives: Diverse instruments are used to measure problem gambling and Gambling Disorder intervention outcomes. The 2004 Banff consensus agreement proposed necessary features for reporting gambling treatment efficacy. To address the challenge of including these features in a single instrument, a process was initiated to develop the Gambling Disorder Identification Test (GDIT), as an instrument analogous to the Alcohol Use Disorders Identification Test and the Drug Use Disorders Identification Test.
Methods: Gambling experts from 10 countries participated in an international two‐round Delphi (n = 61; n = 30), rating 30 items proposed for inclusion in the GDIT. Gambling researchers and clinicians from several countries participated in three consensus meetings (n = 10; n = 4; n = 3). User feedback was obtained from individuals with experience of problem gambling (n = 12) and from treatment‐seekers with Gambling Disorder (n = 8).
Results: Ten items fulfilled Delphi consensus criteria for inclusion in the GDIT (M ≥ 7 on a scale of 1–9 in the second round). Item‐related issues were addressed, and four more items were added to conform to the Banff agreement recommendations, yielding a final draft version of the GDIT with 14 items in three domains: gambling behavior, gambling symptoms and negative consequences.
Conclusions: This study established preliminary construct and face validity for the GDIT.

report assessment, such as the widely used Problem Gambling Severity Index (PGSI; Ferris & Wynne, 2001). From a clinical perspective, the diagnostic criteria in the latest edition of the Diagnostic and Statistical Manual of Mental Disorders (DSM-5) were revised in 2013 and labeled Gambling Disorder (GD), with three levels of symptom severity (American Psychiatric Association, 2013). At the same time, gambling was classified together with substance use disorders, covering alcohol and drug use, which have long been the focus of extensive research on assessment, trajectories of use, and treatment outcomes.
A major persistent issue has been how to measure PG and GD (Caler, Garcia, & Nower, 2016; Dowling et al., 2017; Pickering, Keen, Entwistle, & Blaszczynski, 2017). In an effort to examine the global prevalence of PG across countries and time, Williams, Volberg, and Stevens (2012)  Institute's 3rd Annual Conference (Walker et al., 2006), an annual independent gambling conference in Banff, Canada. The result, known as the Banff consensus agreement, was a major step forward in the conceptualization of a framework for minimal features of treatment outcome measures. The Banff framework stipulates three domains: (1) measures of gambling behavior (net expenditure each month, the frequency in days per month when gambling takes place, and time spent thinking about or engaged in the pursuit of gambling each month); (2) measures of the harms caused by gambling (personal health, relationships, financial and legal); and (3) measures of the proposed mechanism of change in a specific treatment. At the time of the Banff consensus, it was clear that one obstacle to its realization was the lack of existing gambling measures that fully complied with it (Walker et al., 2006).
A recent systematic review (Pickering et al., 2017) concluded that most gambling studies failed to fulfill the measurement guidelines outlined by Walker et al. (2006). Furthermore, a comprehensive analysis of existing gambling measures (Molander et al., 2019) identified limitations in terms of content validity. Categorization of all items in 47 different gambling measures showed that they targeted a wide range of constructs, such as PG symptoms and urges, gambling behavior, monetary aspects, negative consequences of gambling, cognitive distortions, motivation and self-efficacy (Molander et al., 2019). Despite the passage of time, it was still the case that no measure seemed to adequately fulfill the recommendations in the Banff consensus (Walker et al., 2006). An additional limitation was that few measures were validated in relation to the new DSM-5 criteria for GD (Molander et al., 2019). Even more recently, a systematic review identified 31 different screening instruments from 60 studies, finding that only 3 instruments had been validated against the DSM-5 criteria for GD (Otto et al., 2020).
In order to redress this situation, we initiated a process to develop the Gambling Disorder Identification Test (GDIT), as an instrument measuring the frequency of gambling behavior as well as related symptoms and consequences, analogous to the Alcohol Use Disorders Identification Test (AUDIT; Saunders, Aasland, Babor, de La Fuente, & Grant, 1993) and the Drug Use Disorders Identification Test (DUDIT; Berman, Bergman, Palmstierna, & Schlyter, 2005). Using the AUDIT and DUDIT as a point of reference for this development process has several potential advantages. First, the AUDIT and DUDIT content (substance use behaviors, dependence symptoms and negative consequences) corresponds to the first two domains recommended in the Banff consensus, namely gambling behavior and problems caused by gambling (Walker et al., 2006). We did not include the third Banff domain, items measuring processes of change, as such measures are treatment specific and need to be tailored to a range of possible theoretical assumptions. Second, the AUDIT and the DUDIT are widely used internationally to identify and assess problematic substance use within health care and social service systems, as well as public health agencies (for reviews see Hildebrand, 2015; Reinert & Allen, 2002). Developing a measure for gambling similar to the AUDIT and the DUDIT is compatible with the DSM-5 decision to label gambling as an addictive behavior, and may facilitate implementation of screening procedures for PG. Third, the AUDIT and the DUDIT use frequency-based response categories, asking the respondent to state how often substance use behavior as well as dependence symptoms and consequences occur, for example "Never, Less than once a month, Every month, Every week, Daily or almost every day". This is an advantage compared to existing gambling measures using dichotomous "Yes/No" responses (e.g., the NORC Diagnostic Screen for Gambling Problems [NODS; D. C. Hodgins, 2004]) or vaguely stated verbal item responses, for example "Not at all, Rarely, Sometimes, Often" in the PGSI (Ferris & Wynne, 2001). Developing a gambling measure using specified frequency-based behavioral categories will enable clearer measurement procedures (e.g., De Vet, Terwee, Mokkink, & Knol, 2011) and may also facilitate comparisons between problematic substance use and PG behavior.
The GDIT development process has included four steps, generally aligned with the instrument development steps outlined by Gehlbach and Brinkworth (2011): (1) identification of items that might be eligible for the GDIT from a pool of existing gambling measures; (2) presentation of proposed items for evaluation by invited experts in gambling research, clinical practice and treatment training, through an online Delphi process and subsequent consensus meetings to determine included items and formulate new items as necessary; (3) pilot testing of a draft version of the GDIT for face validity in a small group of participants with personal experience of PG (n = 12), as well as preliminary psychometric properties in a small group of treatment-seeking participants with PG or GD (n = 8); and (4) evaluation of the psychometric properties of the final GDIT measure in relation to existing instruments and semi-structured interviews assessing the DSM-5 criteria for GD, among individuals with PG or GD as well as non-problematic recreational gambling behaviors (sample target n = 600). The first, second and third steps have been completed and the fourth step is now underway. The first step, with identification and content-based categorization of 583 unique items from 47 existing gambling measures, has been described in a published research protocol (Molander et al., 2019). This first step also involved selection of 30 possible items eligible for inclusion in the GDIT, based on inter-rater agreement on items relevant for the proposed GDIT domains, previous psychometric findings regarding PG (Chamberlain, Stochl, Redden, Odlaug, & Grant, 2017; Stinchfield et al., 2016; Volberg & Williams, 2011) as well as the Banff consensus recommendations (Walker et al., 2006).
Our aim in this article is to describe steps two and three, showing how a consensus was reached regarding a specific set of items, and yielding a testable draft version of the GDIT. The consensus process built on prioritizing item domains recommended in the Banff agreement, with international input from a Delphi process with an ensuing consensus procedure. The research questions in this study are:
1. Which items should have the highest priority for inclusion in the GDIT?
2. What possible problematic issues emerged concerning the prioritized items?
3. How might problematic issues among the prioritized items be addressed?
4. Which additional items would need to be included in the first GDIT version, in order to fully comply with the Banff consensus agreement recommendation?

| METHOD
The methodology used in the GDIT development process has been described elsewhere (Molander et al., 2019). Briefly, the process builds on several interdependent stages (see Figure 1), where the recommendations from the Banff consensus were given priority beyond the Delphi results.

| Delphi survey rounds
An online international Delphi survey was launched with a presentation of the 30 items eligible for inclusion in the GDIT that were identified in step one (Molander et al., 2019). Using snowball sampling, we invited an extensive range of expert stakeholders to participate, aiming to include as many relevant stakeholders as possible. The invitation was sent to (1) all authors of the Banff consensus (Walker et al., 2006), (2) corresponding authors of articles reporting previous psychometric findings as well as reviews of gambling measures identified in our preparatory study (Caler et al., 2016; Chamberlain et al., 2017), (3) first and last authors of reports and articles on gambling measures (see Molander et al., 2019) as well as (4) authors of reports on randomized trials evaluating interventions for PG and GD, published in systematic reviews (Cowlishaw et al., 2012; Pallesen, Mitsem, Kvale, Johnsen, & Molde, 2005; Petry, Ginley, & Rash, 2017) identified in our preparatory study (Molander et al., 2019). We also invited all presenters at the Alberta Gambling Research Institute's 17th Annual Conference, 2018, members of the ongoing six-year research program on Responding to and Reducing Gambling Problem Studies, as well as members of the Swedish Gambling Research Network, a network convening Swedish researchers, clinicians and treatment trainers in the gambling field.
Invitations to participate in the first round of the Delphi process were sent by e-mail on March 16th, 2018 to 170 stakeholders, including the authors of this article. Some stakeholders were sent invitations to multiple email addresses that were identified, for example, via published articles or academic institutions. Stakeholders who completed the first round of the Delphi within two weeks were sent an invitation to participate in the second round. For each round, a single e-mail reminder was sent after one week to stakeholders who did not complete the questionnaire.
The 30 items were presented in the first Delphi round with a rationale for possible inclusion in the final GDIT draft. An example of the text presented is as follows: Item 8. How often have you gambled to win back money you lost, the past 12 months? Rationale: "Chasing losses" is a key dependence symptom in the diagnostic criteria of Gambling Disorder. Denis, Fatséas, and Auriacombe (2012) found that "chasing losses" in addition to three other DSM-IV criteria (repeated unsuccessful efforts to stop, lies, and jeopardized/lost relationships/job) best discriminated pathological and non-pathological gamblers. In a later study of DSM-5 criteria, Chamberlain et al. (2017) found that "the main diagnostic item serving to discriminate recreational from problem gamblers was endorsement of chasing losses".
Participants were asked to rate the importance of each item for inclusion in the GDIT on a scale from 1 to 9, where scores of 1-3 were classed as "not important for inclusion," 4-6 were classed as "important but not critical," and 7-9 were classed as "critical for inclusion" (see Guyatt et al., 2011). In addition, stakeholders were offered space for optional comments on each item regarding possible problematic issues, such as psychometric relevance and accuracy, semantic item structure and content of multiple-choice alternatives. For the second Delphi round, the results from the first Delphi round were compiled and item ratings as well as all stakeholder comments for each item were presented. The respondents were asked to reflect on the results and to rate and comment on each item again. The consensus criterion regarding the importance of including an item in the GDIT was set to M ≥ 7 for each item in the second survey round; in view of the lack of guidelines for Delphi consensus criteria, we chose to set the consensus criterion to include items rated in the top third of the rating scale. The results of Delphi rounds 1 and 2 were presented in three subsequent consensus meetings with gambling researchers, where each item with its response categories was reviewed and discussed. This yielded a final selection of items, based upon (1) the recommended features of gambling measures in the Banff consensus (Walker et al., 2006), and (2) the consensus criteria in the expert Delphi.
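The rating bands and the consensus criterion described above can be illustrated with a minimal Python sketch. The function names (`rating_band`, `meets_consensus`) and the ratings for the two made-up items are hypothetical and are not study data; the study's own analyses were run in R.

```python
from statistics import mean


def rating_band(score: int) -> str:
    """Map a 1-9 importance rating to the three bands described in the text."""
    if not 1 <= score <= 9:
        raise ValueError("ratings must lie between 1 and 9")
    if score <= 3:
        return "not important for inclusion"
    if score <= 6:
        return "important but not critical"
    return "critical for inclusion"


def meets_consensus(round2_ratings, threshold=7.0):
    """Return True when the mean second-round rating reaches the threshold (M >= 7)."""
    return mean(round2_ratings) >= threshold


# Hypothetical second-round ratings for two made-up items (NOT study data).
hypothetical_round2 = {
    "item_A": [8, 9, 7, 7, 8],  # mean 7.8
    "item_B": [5, 6, 7, 4, 6],  # mean 5.6
}

# Keep only the items whose mean rating satisfies the consensus criterion.
included = [name for name, ratings in hypothetical_round2.items()
            if meets_consensus(ratings)]
print(included)
```

In this sketch only `item_A` passes, because its mean rating falls in the top third of the scale.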

| Consensus procedure
The results from both Delphi rounds were first presented at a consensus meeting on April 14, 2018 at the Alberta Gambling Research Institute's 17th Annual Conference, in Banff, Alberta, Canada (Molander et al., 2018). Participants in the consensus meeting were 10 gambling and addiction researchers from five countries, eight of whom had participated in the Delphi, and two of whom were recruited on site; all agreed to participate in the consensus meeting. Two following consensus meetings with a sub-group of four gambling researchers from two countries were held in Stockholm, at Karolinska Institutet, on May 8, and May 30, 2018. The purpose of all consensus meetings was to resolve issues in items through discussion and consensus decisions, in order to arrive at a draft version of the GDIT. During the meetings, Delphi item ratings and categories of item issues identified in expert comments from the Delphi questionnaire were discussed in detail for each item, in relation to the recommendations in the Banff consensus (Walker et al., 2006). A PowerPoint presentation was used as a tool to summarize items, problematic issues and proposed solutions (see Figure S1). At each consensus meeting, the discussion involved how to resolve the item issues identified in the expert comments, which frequently concerned item phrasing or formulation of response categories, as well as whether to include the item in the draft version of the GDIT. The discussion ended in a consensus-based outcome for each item. Thereafter, a draft version of the GDIT was formulated.

| User experience and pilot testing
To evaluate user experience and face validity, the draft GDIT English version was translated into Swedish using a back-translation procedure. The Swedish version was then presented to participants with personal experience of PG, recruited from self-help groups (n = 12), using a "think aloud" procedure (Boren & Ramey, 2000; Ericsson & Simon, 1980). The interviews were conducted by authors OM and VM at the local Association for Gambling Addiction in Stockholm and the Center for Dependency Disorders in Falun. In order to assess feasibility and face validity of the GDIT draft version, it was then administered to a small sample (n = 8) of treatment-seeking gamblers at the Stockholm Center for Dependency Disorders. This procedure constituted a purely qualitative test of the draft version and as such the sample sizes were deemed sufficient when participant comments were saturated.

| Data analysis
Frequencies, means, standard deviations as well as "critical for inclusion" percentages reflecting item ratings of 7-9, in Delphi rounds 1 and 2 were calculated. All quantitative analyses were done in R Studio version 1.1.456 (R Core Team, 2018). Qualitative analysis of the Delphi expert comments of issues in the proposed items was conducted by author OM, using a simple review and categorization procedure. Participant responses in the "think aloud" interviews as well as data from the psychometric pilot were reviewed by author OM with the aim of identifying and addressing remaining item issues, and subsequently discussed with author AHB in order to reach consensus decisions for each remaining issue.

FIGURE 1 Development of the Gambling Disorder Identification Test (GDIT), in four steps
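As an illustration of the descriptive statistics named in this section, the following Python sketch computes frequencies, the mean, the standard deviation and the "critical for inclusion" percentage for a single item. It is not the authors' R code: the helper name `summarize_item` and the example ratings are hypothetical, and the use of the sample standard deviation is an assumption, since the text does not specify which estimator was used.

```python
from collections import Counter
from statistics import mean, stdev


def summarize_item(ratings):
    """Summarize 1-9 ratings for one item: n, frequencies, mean, SD, % rated 7-9."""
    critical = sum(1 for r in ratings if 7 <= r <= 9)
    return {
        "n": len(ratings),
        "frequencies": dict(sorted(Counter(ratings).items())),
        "mean": mean(ratings),
        "sd": stdev(ratings),  # sample SD (n - 1 denominator); an assumption
        "pct_critical": 100 * critical / len(ratings),
    }


# Hypothetical ratings from ten raters for one made-up item (NOT study data).
hypothetical_ratings = [9, 8, 7, 7, 6, 5, 8, 9, 4, 7]
summary = summarize_item(hypothetical_ratings)

for key, value in summary.items():
    print(f"{key}: {value}")
```

Applying the same function to each item in each Delphi round would yield the kind of per-item summary reported in Table 2.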

| Quantitative Delphi analysis
Of the 170 invited stakeholders, 61 stakeholders consented and completed the first Delphi round, and 30 of these completed the second Delphi round (49% completion rate). Stakeholders included gambling researchers, clinicians and trainers from 10 countries (31% women). Table 1 shows participant characteristics from the first and second rounds.
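The participation figures above can be checked with simple arithmetic. The short Python sketch below uses only the counts reported in the text; the variable names are illustrative, and the first-round response rate is derived here rather than quoted from the article.

```python
# Counts reported in the text.
invited = 170   # stakeholders invited to the first Delphi round
round1 = 61     # completed the first round
round2 = 30     # completed the second round

# Share of invited stakeholders who completed round 1 (derived, not quoted).
response_rate = 100 * round1 / invited
# Share of round-1 completers who also completed round 2 (reported as 49%).
completion_rate = 100 * round2 / round1

print(f"Round 1 response rate: {response_rate:.0f}%")
print(f"Round 2 completion rate: {completion_rate:.0f}%")
```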
The consensus process led to selection of 10 items, deriving from six different prior instruments (PPGM 10a, SOGS 4, PPGM 8, MAGS 25, PPGM 1b, CSPG 1, CPGI 8, NODS 14, CPGI 10, and "Gambling types") that fulfilled the criteria for consensus regarding importance of inclusion in the GDIT (see Table 2). These 10 items targeted the following constructs, listed in order of rating level, from highest to lowest: Loss of control, Chasing losses, Jeopardized opportunities, Financial problems, Frequency of gambling behavior, Tolerance, Relationship problems, Borrowed/Sold articles of value and Gambling types. Most items that fulfilled the criteria for consensus regarding importance of inclusion in the GDIT were in the domains of dependence symptoms (n = 4) and negative consequences (n = 4). None of the items targeting the constructs of Preoccupation or Expenditures were rated highly enough in terms of importance to be included in the GDIT. In general, all items targeting monetary constructs (e.g., losses, spending, income or net expenditures) were rated low in both Delphi questionnaire rounds 1 and 2 (mean < 6 on a scale from 1 to 9). Comments concerning the low ratings for monetary constructs suggested that such constructs are difficult to measure because they are complex and liable to misinterpretation (e.g., regarding the time frame, such as gambling session length, or spending versus winning/losing), impose a high cognitive load on the respondent due to this complexity, are vulnerable due to the lack of verifiability of monetary expenditures and, finally, are subject to recall bias.

| Qualitative Delphi analysis
A range of potential problematic issues in relation to the items rated in the Delphi questionnaire was identified in the expert comments, yielding six categories: Time frame, Response categories, Compound formulation (referring to double- or triple-barreled items), Phrasing, Lack of relevance/applicability, and Other/miscellaneous (see Table 2). Typically, comments on items in the domains of Dependence

| Item selection
The 10 items that fulfilled consensus criteria regarding importance of inclusion in the Delphi were reviewed by author OM in relation to the recommendations in the Banff consensus (Walker et al., 2006).
Several recommended constructs were lacking, for example, Preoccupation, Expenditures, and Health problems due to gambling, leading to construct under-representation in relation to the Banff recommendations. Therefore, 11 additional items below the Delphi consensus threshold (NODS 11, PPGM 12/BPGS 1, MAGS 21, CSPG 2, PPGM 2, "Income," CPGI 4, GPI 1e, GQPN 5, PGBS 1, and CPGI 32/CPGI 33; see Table 3 below) were added to be considered for inclusion in the GDIT in the three consensus meetings. Modified response categories, analogous to the AUDIT (Saunders et al., 1993) and the DUDIT (Berman et al., 2005) format, were also proposed for all the selected items within the GDIT domains Gambling consumption behavior, Dependence symptoms and Negative consequences.

| Consensus meetings
Three consensus meetings were held. The first meeting, held in Banff, included 10 gambling researchers from Canada, England, Sweden and the USA. The outcome of this meeting was the inclusion of six items fulfilling Delphi criteria in the draft GDIT version (PPGM 10a, SOGS 4, PPGM 8, MAGS 25, PPGM 1b, and NODS 14). Two changes were implemented: PPGM 8 was rephrased to avoid compound formulation and PPGM 1b was moved from the GDIT Dependence symptoms domain to the Negative consequences domain.
Also, a discussion was held concerning whether to expand response categories in the Gambling consumption behaviors domain, as gambling may occur more frequently than alcohol or drug use. The second meeting, held in Stockholm, included four gambling researchers from Sweden and Canada. The outcome of this meeting was the inclusion of four remaining items which fulfilled the Delphi criteria in the draft GDIT version (CSPG 1, CPGI 8, "Gambling types," and CPGI 10). Two items that were rated below the Delphi consensus threshold (NODS 1 and PPGM 12) were reviewed and included, based on their alignment with the Banff recommendations (Walker et al., 2006). In addition, CPGI 8 was moved from the GDIT Dependence symptoms domain to the Negative consequences domain, and CPGI 10 was moved from the Negative consequences domain to the Dependence symptoms domain. The instructions for the "Gambling types" item were rephrased and the "Gambling list" was reviewed and revised to improve categories and examples of gambling types. The third and final consensus meeting included three gambling researchers from Sweden, who reviewed the remaining items that were rated below the Delphi consensus threshold. Five of these (MAGS 21, CSPG 2, PPGM 2, "Income," and GQPN 5) were included in the draft version of the GDIT.
The constructed "Income" item was rephrased as "income after tax," including salary and welfare or other subsidies, and GQPN 5 was rephrased to only assess losses rather than spending and losses, to clarify the question and reduce confusion. At each meeting, included items were rephrased and clarified to match the GDIT format.
Following the consensus meetings, the GDIT draft version in English was translated into Swedish using a back-translation procedure (Kuliś, Whittaker, Greimel, Bottomley, & Koller, 2017).

| Preliminary testing and final draft version
Participants with personal experience of PG (n = 12) were recruited from gambling self-help groups, and gave feedback on each item in the GDIT draft version according to a "think aloud" procedure (Ericsson & Simon, 1980). Overall, the participants expressed that the items in the GDIT draft version were comprehensible and important from PG and GD perspectives. The participants also suggested that response alternatives should be added in the Gambling behavior domain to include gambling every day, and discussed whether gamblers could reliably estimate and report gambling losses in the expenditures and gambling types appendix. See Table 4 for examples of participant responses.
The GDIT draft version was then administered to a subsample (n = 8) of treatment-seeking gamblers, in a pilot test. The participant responses were reviewed and remaining issues with the expenditure

TABLE 1 Participants in Delphi rounds, consensus meetings and evaluation of user experience

| DISCUSSION
This article describes an iterative collaborative consensus process for specific item selection and modification in the development of a new gambling measure. A specific set of items with the highest priority was identified and included in a testable draft version of the GDIT.
Overall, the study established preliminary construct and face validity for the GDIT, with item domains that align with the constructs in the Banff consensus recommendations, as well as the AUDIT and DUDIT domains of consumption, symptoms and negative consequences.
Two major item-related issues were identified and addressed.
First, it became evident that many Delphi items, gathered from existing gambling measures, were phrased using double or triple compound phrasing. Some possible explanations for this could be that items were originally phrased in an effort to clarify their construct using examples, or that they emanated from the diagnostic criteria formulated in a compound manner, for example A8 "Has jeopardized or lost a significant relationship, job, or educational or career opportunity because of gambling" (American Psychiatric Association, 2013). However, while assessing items, many Delphi stakeholders emphasized that double or triple compound formulation of items can be problematic. Several participants in the "think aloud" interviews also remarked on this issue, stating for example that it was confusing to know which of the statements or examples to answer.
We addressed compound formulation issues, when applicable, by rephrasing the items so that they targeted a single construct of primary interest (e.g., gambling-related negative consequences for relationships), in an effort to strengthen the construct validity of the GDIT draft version.
Second, items targeting expenditures were frequently identified as problematic by participants throughout all phases in the Delphi process. The Banff consensus (Walker et al., 2006)
This study was characterized by numerous strengths. First, initial item selection was based upon a comprehensive analysis of existing self-report instruments for measuring PG and GD. This analysis included inter-rater reliability calculations regarding content of specific items (Molander et al., 2019), referencing of previous psychometric findings (Chamberlain et al., 2017; Stinchfield et al., 2016; Volberg & Williams, 2011) and previous consensus-based frameworks among gambling researchers (Walker et al., 2006), as well as taking the revised DSM-5 criteria for GD into account (American Psychiatric Association, 2013). Second, an international group of experts from a total of ten countries participated in the Delphi survey, in many cases giving detailed, specific feedback on each individual item. Third, transparent procedures were applied for arriving at consensus-based decisions. Fourth, we used think-aloud interviews to gather feedback from participants with experience of PG, in an effort to increase the face validity of the GDIT draft version. This led to extension of the response alternatives on the gambling behavioral frequency items to include multiple sessions during a 24-h period, as well as consideration of revising the expenditure items to follow Timeline Followback (TLFB) procedures. Further, pilot psychometric testing with participants with both PG and GD convinced us to revise the expenditure items, as the initial responses were very difficult to interpret. Fifth, structured consensus procedures were used to resolve item-related issues that were identified throughout the phases in the study, as well as to address the recommendations in the Banff consensus (Walker et al., 2006). An additional strength concerns the research strategy from a wider perspective.
As noted above, the gambling research field encompasses a large number of diverse measures and outcomes (Molander et al., 2019; Otto et al., 2020; Pallesen et al., 2005; Pickering et al., 2017), making it difficult to synthesize research findings, for example in systematic reviews and meta-analyses of trial outcomes. This problem has also been identified in the area of hazardous and harmful alcohol consumption, and is being addressed by an initiative to establish a minimum set of core outcomes for wide use in treatment outcome studies (Shorter et al., 2019). The problem of measure diversity not only hinders comparability but also leads researchers to spend valuable time and resources collecting data and analyzing results that may not make as great a contribution as desired. By joining forces, researchers can avoid "reinventing the wheel" and advance the gambling studies field. In sum, the development of the GDIT has the potential to resolve some of the field's current challenges related to measurement.
Some limitations also characterized this study. First, it was not possible to reach a broad consensus-based conclusion on how to measure gambling expenditures on a specific item level, reflecting the complexity of this issue. Second, the available period for stakeholders to complete the Delphi rounds was short. Third, it would have been preferable to include all gambling researchers from the consensus meeting in Banff in all consensus-based decisions regarding the GDIT, but this was not possible due to practical and time-related challenges. Finally, the use of a predetermined rating scale (see Guyatt et al., 2011) for item inclusion in the Delphi may have made it more difficult to include items of content-based value in relation to the Banff consensus (Walker et al., 2006). Had we chosen a more data-driven approach for establishing a tailored rating scale, the results of the Delphi process might have been more aligned with the recommendations of Walker et al. (2006).
Future psychometric studies on the GDIT instrument will evaluate the validity and reliability of the GDIT against the DSM-5 criteria for GD, a study that is ongoing as step 4 in the develop-