Development of a digestive health status instrument: tests of scaling assumptions, structure and reliability in a primary care population

Authors


Shaw Health Research Center, Park Nicollet Clinic, Institute for Research and Education, HealthSystem Minnesota, 3800 Park Nicollet Boulevard, Minneapolis, MN 55416, USA.

Abstract

Background:

The absence of valid and reliable health status measures for functional gastrointestinal illness has limited research and patient care for this common group of disorders. We describe the development of a self-report survey to address this need.

Methods:

Initial development focused on extensive pre-testing with patients, primary care physicians and gastroenterologists. The disease-specific portion included the Rome criteria for dyspepsia subgroups and the Manning and Rome criteria for irritable bowel syndrome. The Short Form-36 (SF-36) was added as a generic health status measure. Psychometric analyses included multitrait scaling, scale internal consistency and criterion validation.

Results:

Six hundred and ninety patients presenting to their primary care physician for treatment of heartburn, abdominal pain or discomfort completed the 98-question survey. The disease-specific portion revealed five components: reflux, dysmotility, a two-domain bowel dysfunction complex, and a pain index. Internal consistency measures demonstrated good to excellent reliability. Multitrait scaling showed scaling successes. The disease-specific portion was reduced to 34 questions. Criterion validity was demonstrated by correlating the disease-specific questions with the SF-36.

Conclusions:

The psychometric analyses lend credence to the concept of stomach and bowel symptom subgrouping as proposed by expert consensus. The psychometric properties of the five summated disease-specific scales compare favourably with standardized health status measures.

INTRODUCTION

Patients presenting to primary care physicians for evaluation of abdominal pain, discomfort or heartburn present many challenges during initial assessment and treatment. Most have functional gastrointestinal illness1–3 with no identifiable pathophysiological markers.4 Consequently, patient-reported symptoms are the predominant source of information for the physician. These symptoms are both heterogeneous, with frequent overlap of irritable bowel syndrome and dyspepsia symptom subgroups,5 and dynamic, as a majority of patients move from one subgroup to another over a 1-year period.6 It is not surprising that initial evaluation and subsequent follow-up are quite variable in both primary care and specialty settings.

The availability of valid and reliable survey instruments to characterize patient-reported symptoms could help overcome these problems. Surveys incorporating symptom-based diagnostic criteria may discriminate those more likely to have irritable bowel syndrome, reflux disease or non-functional illness from the large pool of patients presenting to primary care physicians with gastrointestinal complaints.2, 7 Such instruments would ensure that the full spectrum and intensity of gastrointestinal complaints were characterized and quantified. The capacity to measure changes in response to treatment would be particularly useful. Determination of the degree of interference with life caused by functional illness compared to a general clinic population could be achieved by inclusion of generic health status measures, such as the Medical Outcomes Study Short Form-36 (SF-36).8, 9

A number of symptom-based, expert classification systems for functional gastrointestinal illness have been developed.10, 11 These systems rely on patient symptoms and focus on diagnosis. The expert consensus-derived definitions have supported more systematic data collection in clinical and epidemiological research.12–15 However, psychometric assessment of these definitions has been minimal, and they have not been operationalized into self-report surveys, limiting their use in primary care.

Three dyspepsia-specific health status measures have been published.16–18 One of these included a generic health status measure and symptoms typical of irritable bowel syndrome in addition to dyspeptic complaints.16 Content validity of the preliminary instrument was not supported by expert consensus criteria for dyspepsia and irritable bowel syndrome. The dimensionality of symptoms was examined using factor analysis,16 but multitrait scaling was not employed in the initial analyses. This instrument has proved useful in clinical trials, and normative data are available.19 However, its limited content validity will compromise its ability to discriminate patients with the overlapping symptom complexes of dyspepsia and irritable bowel syndrome, or to detect the changes in symptom subgroup that are so characteristic of a primary care clinic population.6 The other two, shorter instruments may prove helpful in a limited number of patient groups for detecting change in severity of a very focused group of symptoms.17, 18

One disease-specific patient-report survey for irritable bowel syndrome has been developed in tertiary centre populations.20, 21 It included neither expert consensus definitions nor dyspepsia symptom complexes. Internal validity was not established by multitrait scaling, nor was dimensionality established by factor analysis. Reliability measurement was limited to internal consistency. The validation effort exceeded that typically seen for a preliminary instrument. Responsiveness was not reported. Because this instrument requires that the patient have a known diagnosis of irritable bowel syndrome, it should perform well in clinical trials or practice settings where patients have previously been diagnosed.

This paper reports the development of a discriminative digestive health status instrument (DHSI) designed to support the practising physician. The instrument builds on existing symptom surveys but improves upon them in several ways. The DHSI includes criteria consistent with expert classification systems. Its development was informed by established principles of instrument design and evaluation, including extensive cognitive interviewing of patients and primary care physicians, multitrait scaling, and various methods of assessing scale validity and reliability. These steps in instrument testing and refinement were carried out in a large cohort of unselected primary care patients presenting for management of abdominal pain, discomfort or heartburn.

METHODS

Subjects and setting

Consecutive patients over 18 years of age who presented to their primary care physician at four of 19 Park Nicollet Clinics (a 422-physician multispecialty group practice based in Minneapolis, MN, and its suburbs) complaining of abdominal pain, discomfort or heartburn were offered enrolment in the study by means of a survey mailed to their home. The clinics were selected to provide a 700-patient base with a broad spectrum of ages and at least some racial diversity over an 8-month enrolment period. Thirteen internists and 13 family practitioners practised at the four sites.

Appropriate subjects were identified at registration for the visit by a two-item questionnaire that asked if they were coming to see their physician for abdominal pain, discomfort or heartburn. The check-in clerks provided and collected the questionnaires at a site separate from the examination rooms. The screening questionnaires were forwarded to research staff who mailed all patients a packet containing a covering letter describing the study and a preliminary survey, including the SF-36 and stomach- and bowel-specific items. Patients were encouraged to return the questionnaire as soon as possible, and a self-addressed, stamped envelope was provided to facilitate this. Reminder postcards were sent 3 days after the initial mailing and non-responders were called after 5 working days.

Questionnaire development

Initial survey construction followed the method of Dillman.22 After a literature search for existing instruments, two gastroenterologists on the research team drafted the questionnaire.15–17, 23 All 42 questions from the Bowel Disease Questionnaire were included.24 The Rome definitions for dyspepsia subgroups were included, as were the Manning and Rome criteria for irritable bowel syndrome.10, 11, 25 Two groups of five practising gastroenterologists reviewed the draft for content. Survey and outcomes methodologists suggested further modifications and scaled the questions. All questions asked subjects about their symptom experience over the past 4 weeks.

Internists and family practitioners at each of the four clinic sites were invited to informational meetings at their clinic. Sixteen of the 26 physicians attended. Study design and in particular survey design were discussed with this important group of future survey users. Survey technicians and the principal investigator sought their general opinions on the content and clarity of the survey. The survey technicians engaged all attendees in prepared queries about individual questions. After the suggestions of the primary care physicians had been incorporated, the SF-36 was added at the end of the questionnaire, to give a total of 98 questions.

The survey was pre-tested in 10 patients presenting to the gastroenterology clinic at Park Nicollet with complaints of abdominal pain, discomfort or heartburn. Cognitive interviews were conducted to detect potential problems with the design or wording of questions. How respondents processed the solicited information, as well as problems with interpretation, were examined. Finally, subjects were queried as to whether the items adequately surveyed the breadth of their symptoms. Suggestions were made to improve the clarity of the questionnaire, but content changes were not suggested.

All subjects enrolled in the study were encouraged to communicate suggestions for improving the questionnaire or their health care to the research team, either by writing on the questionnaire or by phone. Ten patients among the first 100 receiving the questionnaire by mail made written comments about their health care and were contacted to arrange a meeting with one of the research assistants for similar cognitive interviews as described above. No revisions were suggested by this group after the interviews. Three of the 722 patients who returned the questionnaire commented on the survey content—two requested that it be shortened and one recommended asking questions about the impact of gastrointestinal tract symptoms on sexual function.

Multitrait scaling analysis

To assess the validity of presenting the DHSI as summated scales, a general approach adapted from multitrait scaling was utilized26 and included the following:

1 Factor analytic test

2 Item-response variability

3 Scale internal consistency

4 Item convergent validity

5 Item discrimination validity

6 Item-total correlations

Factor analysis.

We were interested in whether the themes related to functional illness identified during instrument development would be supported empirically. To examine the dimensionality of stomach and bowel symptoms, principal component analysis was undertaken. Three- to six-component models were considered. Criteria used to select the most plausible model were components accounting for at least 5% of total variance and with eigenvalues of one or greater. In addition, the scree plots were reviewed for a transition in the size of the eigenvalues. Retained components were required to have at least four items, with all the items on a given component appearing to measure the same concept. After component extraction, orthogonal and oblique rotations were performed.27 The SF-36 was analysed similarly.
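The two numerical retention rules above (eigenvalue of one or greater, at least 5% of total variance) can be sketched as follows. This is a minimal illustration, assuming item responses are held in a NumPy array with respondents as rows and items as columns; the function name `retained_components` is hypothetical, and the sketch covers neither the scree-plot review, the four-item rule nor the rotation step.

```python
import numpy as np

def retained_components(data, min_eigen=1.0, min_var_share=0.05):
    """Eigenvalues of the item correlation matrix (largest first) and the number
    of principal components with eigenvalue >= min_eigen that also account for
    at least min_var_share of total variance."""
    corr = np.corrcoef(data, rowvar=False)    # items are columns
    eigvals = np.linalg.eigvalsh(corr)[::-1]  # eigvalsh returns ascending order
    share = eigvals / eigvals.sum()           # total variance = number of items
    keep = (eigvals >= min_eigen) & (share >= min_var_share)
    return eigvals, int(keep.sum())

# Synthetic example (not study data): two latent traits, four items each.
rng = np.random.default_rng(0)
latent = rng.normal(size=(500, 2))
data = np.repeat(latent, 4, axis=1) + 0.5 * rng.normal(size=(500, 8))
eigvals, n_keep = retained_components(data)
```

Because the eigenvalues of a correlation matrix sum to the number of items, the 5% rule becomes stricter than the eigenvalue-of-one rule only when there are more than 20 items.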

Interpretability criteria were then applied to the items comprising each rotated component. Items that did not load at 0.40 or above on any component, or that loaded above 0.40 on two components, were dropped from the survey.27, 28
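A minimal sketch of this loading screen follows, assuming the rotated loadings are available as a mapping from item name to one loading per component and that absolute values are compared. The function name and the item names in the example are hypothetical. The `cutoff` parameter is exposed because, as reported in the Results, three items with loadings of 0.39 were accepted.

```python
def screen_items(loadings, cutoff=0.40):
    """Split items into kept and dropped lists.
    An item is dropped if no |loading| reaches the cutoff, or if it loads
    above the cutoff on two or more components (cross-loading)."""
    kept, dropped = [], []
    for item, row in loadings.items():
        abs_row = [abs(x) for x in row]
        too_weak = max(abs_row) < cutoff
        cross_loads = sum(x > cutoff for x in abs_row) >= 2
        (dropped if too_weak or cross_loads else kept).append(item)
    return kept, dropped

# Hypothetical loadings on three rotated components.
example = {
    "heartburn": [0.72, 0.10, 0.05],  # single clear loading: kept
    "bloating": [0.45, 0.48, 0.02],   # cross-loads on two components: dropped
    "nausea": [0.20, 0.31, 0.15],     # no loading reaches 0.40: dropped
}
kept, dropped = screen_items(example)
```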

Missing data were handled by using the respondent’s average for the remaining items in the scale if the subject was missing less than half of the items in the scale. The percentage of patients for whom scale scores could be calculated was noted for each sample (either from complete data or from imputation).
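The person-mean imputation rule above can be sketched for a single respondent and a single scale as follows. This is an illustrative sketch, not the study's code; `impute_scale` is a hypothetical name and `None` is assumed to mark a missing response.

```python
def impute_scale(responses):
    """Fill missing items (None) with the respondent's mean of the answered
    items on the same scale. Returns None (scale not scorable) when half or
    more of the items are missing."""
    answered = [r for r in responses if r is not None]
    if 2 * len(answered) <= len(responses):  # missing >= half of the items
        return None
    mean = sum(answered) / len(answered)
    return [mean if r is None else r for r in responses]

filled = impute_scale([2, None, 4, 3])         # one of four missing: imputed
unscorable = impute_scale([1, None, None, 2])  # two of four missing: not scored
```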

The raw scale score was calculated by summing across the items. The raw score was then transformed into a scale that had a range of 0–100 if fewer than half the items were missing. Symptoms of increasing severity were associated with higher scores. To determine whether the range of the hypothesized scales was appropriate for all subgroups, the skewness of each scale score distribution was estimated. The percentage of the sample achieving the lowest (floor effect) and highest (ceiling effect) possible score was also calculated.
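The text does not give the transformation formula. The sketch below assumes the usual linear rescaling used for SF-36-style scales, 100 × (raw − lowest possible) / (highest possible − lowest possible), together with a moment-based skewness and floor and ceiling percentages. The function names and the four-item, 1–5 coding in the example are hypothetical.

```python
def to_0_100(raw, min_raw, max_raw):
    """Linear transformation of a summed raw score to a 0-100 scale."""
    return 100.0 * (raw - min_raw) / (max_raw - min_raw)

def distribution_summary(scores):
    """Moment-based skewness (g1) plus the percentage of respondents at the
    floor (0) and at the ceiling (100) of the transformed scale."""
    n = len(scores)
    mean = sum(scores) / n
    m2 = sum((s - mean) ** 2 for s in scores) / n
    m3 = sum((s - mean) ** 3 for s in scores) / n
    skewness = m3 / m2 ** 1.5 if m2 > 0 else 0.0
    floor_pct = 100.0 * sum(s == 0 for s in scores) / n
    ceiling_pct = 100.0 * sum(s == 100 for s in scores) / n
    return skewness, floor_pct, ceiling_pct

# Four items coded 1-5 give raw scores from 4 to 20, so a raw 12 maps to 50.
mid_score = to_0_100(12, 4, 20)
skew, floor_pct, ceiling_pct = distribution_summary([0, 0, 0, 25, 100])
```

A positive skew with a large floor percentage, as in this toy example, is the pattern the Results report for most DHSI scales.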

Item-response variability.

The frequency of positive response to each item, the item means and standard deviations, and the distribution of responses were determined. The item-response variability criterion requires that items measuring the same construct have roughly comparable response distributions and standard deviations. Items for which a single response category was endorsed by more than 80% of patients, or by fewer than 10%, were discarded.29
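The endorsement rule can be read in more than one way. The sketch below adopts one reading as an assumption: an item is discarded when the share of respondents reporting the symptom at all (any response other than an "absent" category, coded here as 0) is above 80% or below 10%. Function names are hypothetical.

```python
def endorsement_rate(responses, absent=0):
    """Share of non-missing responses other than the 'absent' category."""
    answered = [r for r in responses if r is not None]
    return sum(r != absent for r in answered) / len(answered)

def fails_variability(responses, low=0.10, high=0.80, absent=0):
    """True when the symptom is endorsed by too many or too few respondents
    (assumed reading of the >80% / <10% rule)."""
    rate = endorsement_rate(responses, absent)
    return rate > high or rate < low

nearly_universal = fails_variability([1] * 9 + [0])          # 90% endorse
nearly_absent = fails_variability([0] * 19 + [1])            # 5% endorse
informative = fails_variability([0, 1, 2, 0, 0, 3, 0, 1, 0, 0])  # 40% endorse
```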

Scale internal consistency.

The internal consistency of the summary scales was determined by the method of Cronbach.30 Reliability was considered acceptable when Cronbach’s alpha values were greater than or equal to 0.70.31, 32
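Cronbach's coefficient alpha for a scale of k items is k/(k − 1) × (1 − sum of item variances / variance of the summed score). A minimal standard-library sketch, assuming each item is stored as a list of scores over the same respondents (the data in the example are invented):

```python
from statistics import variance

def cronbach_alpha(items):
    """items: k lists, each holding one item's scores for the same respondents."""
    k = len(items)
    totals = [sum(vals) for vals in zip(*items)]  # summed scale score per respondent
    item_var = sum(variance(col) for col in items)
    return k / (k - 1) * (1 - item_var / variance(totals))

# Invented three-item scale answered by five respondents.
alpha = cronbach_alpha([[1, 2, 3, 4, 5], [2, 1, 4, 3, 5], [1, 3, 2, 5, 4]])
acceptable = alpha >= 0.70  # threshold used in this study
```

Here the item variances sum to 7.5 and the total-score variance is 17, giving alpha = 1.5 × (1 − 7.5/17) ≈ 0.84.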

Item convergent and discrimination validity.

Item-scale correlations were examined in a matrix in which the items are rows and the scales are columns. Correlations between items and scales were corrected for overlap. The item internal consistency criterion (convergent validity) was set at 0.35 to retain an item in a scale.26
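"Corrected for overlap" means that an item is correlated with the sum of the other items in its own scale, so that the item does not inflate its own correlation. A minimal sketch, assuming items are lists of scores over the same respondents; the function names are hypothetical and the data are invented.

```python
from math import sqrt

def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def corrected_item_scale_r(scale_items, i):
    """Correlation of item i with the sum of the other items in its own scale."""
    rest = [col for j, col in enumerate(scale_items) if j != i]
    totals = [sum(vals) for vals in zip(*rest)]
    return pearson(scale_items[i], totals)

def meets_convergent_criterion(scale_items, i, threshold=0.35):
    return corrected_item_scale_r(scale_items, i) >= threshold

scale = [[1, 2, 3, 4, 5], [2, 1, 4, 3, 5], [1, 3, 2, 5, 4]]
r0 = corrected_item_scale_r(scale, 0)
```

Correlations of an item with scales it does not belong to need no correction, since the item is not part of those totals.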

Item-scale correlations were again examined in the same item-by-scale matrix. The item discrimination criterion (discriminant validity) counted a scaling success when the correlation between an item and its hypothesized scale exceeded every other correlation in the same row by more than two standard errors.26
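The text does not state how the standard error of a correlation was computed. The sketch below assumes the common large-sample approximation 1/√n, where n is the number of respondents. The function name is hypothetical; the scale names follow the paper, but the correlations are invented.

```python
from math import sqrt

def scaling_success(row, own, n):
    """row: scale name -> correlation of one item with each scale (own scale
    corrected for overlap). own: the item's hypothesized scale. n: respondents.
    Assumes SE of a correlation is approximately 1/sqrt(n)."""
    two_se = 2 / sqrt(n)
    return all(row[own] - r > two_se for scale, r in row.items() if scale != own)

# Invented correlations for one item, n = 690 (2 SE is about 0.076).
clear = scaling_success({"reflux": 0.60, "dysmotility": 0.30, "pain": 0.25}, "reflux", 690)
too_close = scaling_success({"reflux": 0.40, "pain": 0.35}, "reflux", 690)
```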

Item-total correlations.

Items in the same scale should contain the same proportion of information about the construct. The range of correlations corrected for overlap for each scale was examined. The impact on Cronbach’s alpha of deleting the items with the highest and lowest correlation was determined.31
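The check described above, recomputing alpha after deleting the item with the lowest and then the item with the highest corrected item-total correlation, can be sketched as below. This is a minimal illustration under the same data assumptions as the alpha sketch; `alpha_without_extremes` is a hypothetical name and the data are invented.

```python
from math import sqrt
from statistics import variance

def cronbach_alpha(items):
    k = len(items)
    totals = [sum(v) for v in zip(*items)]
    return k / (k - 1) * (1 - sum(variance(c) for c in items) / variance(totals))

def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))

def alpha_without_extremes(items):
    """Alpha for the full scale and after deleting the item with the lowest
    and the highest corrected item-total correlation (needs >= 3 items)."""
    def drop(k):
        return [col for j, col in enumerate(items) if j != k]
    # Corrected item-total correlation: item vs. sum of the remaining items.
    rs = [pearson(items[i], [sum(v) for v in zip(*drop(i))]) for i in range(len(items))]
    lo, hi = rs.index(min(rs)), rs.index(max(rs))
    return {
        "full": cronbach_alpha(items),
        "without_lowest": cronbach_alpha(drop(lo)),
        "without_highest": cronbach_alpha(drop(hi)),
        "item_total_r": rs,
    }

items = [[1, 2, 3, 4, 5], [2, 1, 4, 3, 5], [1, 3, 2, 5, 4], [2, 2, 3, 4, 4]]
summary = alpha_without_extremes(items)
```

If the two reduced alphas stay close to the full-scale value, as reported in the Results, no single item dominates the scale.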

Criterion validity assessment

We examined the extent to which the identified scales correlated with the SF-36, hypothesizing that the strongest correlations would be between the DHSI pain index and the pain and social functioning scales of the SF-36. The social functioning scale was included because our unpublished observations had revealed a high correlation between the pain and social functioning scales of the SF-36.
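A minimal sketch of the criterion-validity comparison follows: every DHSI scale is correlated with every SF-36 scale, and the pair with the largest absolute correlation is picked out. Because higher DHSI scores mean worse symptoms while higher SF-36 scores mean better health, the correlations are expected to be negative. Function names, the dictionary keys and all scores are hypothetical.

```python
from math import sqrt

def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def criterion_matrix(dhsi, sf36):
    """Both arguments map scale name -> scores for the same respondents."""
    return {(d, s): pearson(dhsi[d], sf36[s]) for d in dhsi for s in sf36}

def strongest_pair(matrix):
    # Largest absolute correlation, whatever its sign.
    return max(matrix.items(), key=lambda kv: abs(kv[1]))

# Invented scores for five respondents.
dhsi = {"pain": [10, 20, 30, 40, 50], "reflux": [5, 5, 10, 0, 15]}
sf36 = {"bodily_pain": [90, 80, 70, 60, 50], "social_functioning": [80, 85, 70, 75, 60]}
pair, r = strongest_pair(criterion_matrix(dhsi, sf36))
```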

RESULTS

Study population

Seven hundred and twenty-two of 1022 patients identified as eligible for the study returned the initial survey mailed to their home. Thirty-two responders were withdrawn from the study; reasons for exclusion were identified by chart review 1 year after study entry. The reasons for withdrawal were: 15 denied any abdominal complaints on the surveys and the chart note from the entry visit did not mention abdominal pain, discomfort or heartburn; four were terminally ill with a non-gastrointestinal illness; three had had a total gastrectomy; three had disabling dementia; three answered less than 50% of all questions; two had poor English speaking and reading skills; one subject’s spouse filled out the surveys; and one had active, major psychiatric illness. The mean age of the survey responders was 52.7 (s.d. 17.3) years; 64.6% were women and 94.1% were Caucasian. Non-responders were less likely to be women (54.9% women, P = 0.006) and were younger (mean age 44.4, s.d. 15.9 years, P = 0.0001). Socioeconomic status was not directly measured; however, 94.8% of responders were high school graduates, 77.5% had some post-secondary education, and 39.8% had at least graduated from college.

Multitrait scaling

Factor analysis.

Principal component analysis revealing the common features of stomach and bowel symptoms is shown in Table 1. Models containing three to six components were considered. Based on the criteria presented in the Methods section, a five-factor model obtained after oblique rotation seemed the most plausible.27, 28 For three items—acid taste in the mouth, postprandial fullness and early satiety—loadings of 0.39 were accepted.

Table 1. Principal component analysis: oblique rotated factor pattern (standard regression coefficients)

The first component contains four of the six Manning criteria for irritable bowel syndrome25 plus one Rome criterion, defecatory urgency.11 Bowel dysfunction in the second component centres on constipation and includes one Manning criterion (bloating) along with two Rome criteria (straining with bowel movements and decreased frequency of bowel movements). Incomplete evacuation is complex and relates to both of these components. Variables loading highly on the third component are accepted as common symptoms of reflux disorders. The fourth component comprises many items of the dysmotility-like dyspepsia subgroup postulated by Talley et al.10 A fifth, distinct component reflected the patient’s pain experience. For the most part, these components suggest clinically and conceptually distinct features of irritable bowel syndrome and dyspepsia.

On reviewing the frequencies of endorsement of the items and examining the results from the principal component analysis, 30 items were dropped from subsequent analyses because they did not contribute differential information about the features of functional gastrointestinal illness in this population. Completion of these analyses resulted in a reduction in stomach–bowel items from 54 to 34 questions.

Principal component analysis of the SF-36 confirmed the dimensionality identified during the Medical Outcomes Study.33 The pain and social functioning scales loaded on the same component. Item-scale correlations also showed high correlations between these two scales (data not shown).

Item-response variability.

Item means were comparable across items in the same scale (Table 2), as were the standard deviations (within 0.4). Missing value rates for the 34 items in the final scales were 7% or less for 17 items, greater than 7% but less than 10% for nine items, and 10–12.5% for nine items. The items with missing rates of 10% or greater were questions introduced with a single stem and organized in tabular form in the questionnaire.

Table 2. Item means and standard deviations

Table 3 presents the scale score distributions. Scores were computable for more than 90% of respondents on every scale. The full range of scores was observed for each scale (results not shown). All of the scales were positively skewed, with more respondents scoring in the less severe range. Floor effects were seen for all scales except the pain scale. Ceiling effects were minimal.

Table 3. Scale score distributions

Scale internal consistency.

Cronbach’s alpha was calculated to assess the cohesiveness of the items in each component. Alpha exceeded 0.5 for all scales and 0.7 for all scales except reflux (Table 3). For the SF-36, Cronbach’s alpha ranged from 0.81 for the pain scale to 0.95 for the physical function scale, similar to the values seen in the Medical Outcomes Study (data not shown).33

Item discriminant and convergent validity.

Item-scale correlations are shown in matrix form in Table 4. Within each scale, item-scale correlations were within 0.2 of each other for the diarrhoea-predominant irritable bowel scale, 0.26 for constipation-predominant irritable bowel, 0.29 for reflux, 0.2 for dysmotility and 0.24 for the pain scale. Every item correlated at 0.35 or greater with the scale it was hypothesized to represent. For each scale, the correlation between an item and its hypothesized scale exceeded its correlations with all other scales by more than two standard deviations.

Table 4. Correlations between items and factors

Item total correlations.

Every item assigned to a scale by principal component analysis correlated most highly with its hypothesized scale. Item-total correlations ranged from 0.63 to 0.8 for irritable bowel syndrome diarrhoea, 0.45–0.71 for irritable bowel syndrome constipation, 0.36–0.65 for reflux, 0.56–0.76 for dysmotility and 0.48–0.81 for the pain index. Not only were the highest correlations seen between an item and its hypothesized scale, but they were also within 0.3 of those of the other items on the same scale, indicating that each item contributes a similar proportion of information to the scale. Alphas did not change appreciably after deletion of the items with either the lowest or the highest correlation.

Scale criterion validity.

Correlations between the SF-36 and the stomach–bowel specific scales are presented in Table 5. The correlations are negative because the two instruments are scored in opposite directions. As predicted, the strongest correlations were between the pain scale and the pain and social functioning scales of the SF-36. These correlations are two standard deviations greater than any other correlation.

Table 5. Correlations between SF-36 and digestive-specific scales

DISCUSSION

Research and care of patients with functional gastrointestinal illness have been hampered by the lack of valid and reliable patient-report survey instruments. We have followed established methods to develop an instrument containing gastrointestinal symptom-specific measures, a pain index and a generic health-related quality of life measure. The responses of almost 700 primary care patients to the questionnaire have been assessed by multitrait scaling techniques, internal consistency measures and criterion validation. Our findings indicate that the DHSI meets or exceeds the minimum psychometric properties required to express the survey results as summated rating scales, demonstrates good scale internal consistency, and is valid by comparison with an established measure of health status, the SF-36.

A detailed development process was used, including reviews by a large number of gastroenterologists, primary care physicians and patients before the instrument was fielded. In addition, this is the only disease-specific health status measure to incorporate the Rome criteria for dyspepsia subgroups,10 the Manning criteria for irritable bowel syndrome,25 and the Rome criteria for IBS11 in the preliminary instrument. The initial questionnaire was longer than the published dyspepsia- and irritable bowel syndrome-specific instruments, which started with 15 to 46 questions.16–18, 20, 21 After item reduction, 34 questions remained. The scaling successes demonstrated on multitrait scaling confirmed the need for a lengthier questionnaire and support the extensive efforts to ensure content validity.

The low levels of unit and item non-response attest to the quality of the data collected. Our primary care population with a broad range of gastrointestinal conditions clearly wished to provide more detailed information about their health than typically occurs. We did not observe a lower rate of response in the elderly as reported by others.33 The willingness of our predominantly Caucasian, highly educated study population to communicate additional information to their physicians via surveys may not generalize to other populations as reduced response rates have been observed in lower socioeconomic groups.33

While high rates of data completeness were the rule, there were exceptions. Nine questions introduced with a single stem had missing rates of 10–12.5%. Review of these questions raised concerns about the clarity of their wording, despite the extensive patient, physician and expert reviews during instrument development. The questions have since been rephrased and compared with the original versions in pre-testing with 25 patients, who without exception found the rephrased questions more understandable. Item omission rates comparable to those of the other items are therefore expected with future use.

Item means and standard deviations were comparable within each scale, with some exceptions. The unstandardized item responses on the reflux scale had low scores. These items had been dichotomized according to the expert consensus definitions of dyspepsia subgroups.10 Typical multiple-response items have replaced the dichotomized responses in the survey resulting from this study, so the item-response variability criterion should now be satisfied for all scales.26

Disease-specific measures are focused on patients with symptoms and require measurement covering a range of symptom severity. Given negligible ceiling effects, this instrument should provide adequate score variability for a disease-specific questionnaire. A positive skew is expected in a survey covering multiple dimensions including some for which a patient will be asymptomatic. Despite that, floor effects were minimal or low on all scales.

An important first step in the psychometric analysis of the survey was to identify the number of discrete components. Principal component analysis including rotation identified a five component model: reflux, dysmotility, a pain index, and a two-domain bowel dysfunction complex. These results follow logically from the results of Whitehead et al. who used factor analysis of a 23-item stomach–bowel questionnaire to support the existence of irritable bowel syndrome.34 Four factors were suggested from a study population of only women, but the dyspeptic symptoms did not yield consistent results in two subsets of subjects.34 Our larger study population, which included men, yielded more consistent results from our survey containing more detail on upper gastrointestinal symptoms and the pain experience. Clustering of gastrointestinal symptoms into the same four dimensions plus a fifth one, the pain index, was seen.

Judged by internal consistency, reliability was good to excellent for an early-stage instrument (alpha greater than 0.7) on all scales except reflux. Converting the reflux scale from dichotomized to Likert-type responses should raise its internal consistency above the 0.7 threshold.35 Only the pain index reaches a reliability of 0.9, the level suggested as a minimum for measuring change over time.32 The responsiveness of individual scales is undergoing assessment.

None of the published dyspepsia-specific or irritable bowel-specific measures were assessed by multitrait scaling methods,16–18, 20, 21 which provide an assessment of internal validity.26 Convergent validity was shown by item-scale correlations that were much more substantial than correlations with other scales. Discriminant validity was confirmed by each item’s correlation with its hypothesized scale exceeding its correlations with the other scales by two standard deviations. The demonstration of scaling success supported combining the items into summated rating scales.26

Our earlier survey of a community-based sample revealed that the only independent predictors of presentation to a physician for dyspepsia were features of abdominal pain and discomfort.5 Specifically, severity and duration of pain, pain interfering with work, and pain interfering with recreational activities distinguished presenters from non-presenters. Consequently, questions enquiring about the severity of abdominal pain or discomfort and its effects on various activities were included. Principal component analysis demonstrated a distinct pain index, as seen in the Medical Outcomes Study (MOS).36 The multitrait scaling analyses confirmed the appropriateness of combining the items into a summated rating scale. Factor analysis of a published dyspepsia-specific measure also suggested a separate pain factor, but the authors elected to combine that factor with two others for scoring purposes.18 Taken together, these psychometric analyses support the patient’s pain experience as a major theme in functional gastrointestinal illness.

In designing the original survey, questions suggestive of gastro-oesophageal reflux were included, as were questions satisfying the Rome conference definitions for functional dyspepsia subgroups.10 This classification system was based on expert consensus. Our data lend empirical support to the concept of subgroups; however, there are significant differences between the system proposed by the expert panel and the empirically derived one. Patients classified with unspecified-type dyspepsia represented less than 10% of our clinic population. Because the unspecified type is defined as chronic upper abdominal pain or discomfort with an insufficient number of criteria for classification in another subgroup, it would not be expected to appear as a separate domain in these analyses. In the six-variable upper tract dysmotility complex, five of six Rome criteria for dysmotility-like dyspepsia were present. One Rome criterion for dysmotility-like dyspepsia, postprandial fullness, was found in the bowel dysfunction complex. The final item in this domain, anorexia, was not a Rome criterion.

More remarkable differences were seen in items hypothesized to be ulcer-like dyspepsia and consistent with gastro-oesophageal reflux. Three ‘classic’ criteria for ulcer-like dyspepsia—periodic pain, nocturnal pain, and pain well localized to the epigastrium10—were not identified as significant by principal component analysis. In addition, reflux and ulcer-like dyspepsia were not separate domains. Our psychometric results indicate that expert consensus classification systems specifying separation of ‘functional’ complaints from reflux complaints may be erroneous. Clinical studies on the importance of reflux symptoms in functional dyspepsia and the ‘sensitive’ oesophagus also bring this separation into question.37, 38

The validity of the clinically developed Manning criteria for irritable bowel syndrome and the expert consensus derived Rome criteria received psychometric support from these analyses.11, 25 As previously reported,34 one of the six criteria, mucus per rectum, did not co-vary with any component. In addition, the bowel dysfunction complex was not a single entity. The results are consistent with the long-standing clinical impression of diarrhoea predominant and constipation predominant variants of irritable bowel syndrome. A recently published community survey has suggested that subgrouping of irritable bowel syndrome based on disturbed bowel habit may not identify clinically distinct entities as a significant minority did not report abnormal bowel function.14 Given a separate pain index and items within each domain of the bowel dysfunction complex that do not enquire about diarrhoea or constipation, the DHSI should provide valid and reliable self-report of symptoms in those with irritable bowel syndrome regardless of bowel habit.

A useful disease-specific measure should report elements of the patient’s experience that are different from information that a generic measure provides. Consequently, criterion validity assessed by correlating scores on the SF-36 and the DHSI should show limited correlation. With the exception of the pain scales showing moderate correlation, such was the case when both measures were compared.

The DHSI is a new instrument and has some limitations. Although the study population was large, it was relatively homogeneous. Performance in other racial groups and in patients of lower socioeconomic status has not been assessed. Confirmation that the SF-12 has validity and reliability comparable to the SF-36 may facilitate use of a hybrid instrument containing both the DHSI and the SF-12, decreasing respondent burden in practice settings and in less well-educated populations.39 To ensure that measurement error is at an acceptable level, other types of reliability should be assessed, especially test–retest reliability if the DHSI is to be used to measure change in response to treatment. These initial psychometric analyses were performed in a primary care population presenting with abdominal pain. Psychometric performance in other settings where such an instrument would be helpful, including subspecialty gastroenterology practices and population-based epidemiological studies, has not been assessed. Evidence of other types of validity would help confirm that the instrument measures what it is intended to measure.40 The ability to measure change over time in various conditions would be an especially useful attribute.41 Such studies are under way.

The DHSI was developed and evaluated using a broad range of established principles of instrument construction and assessment. First, construct formation and item selection were informed by a review of the extant literature, expert input from a large number of gastroenterologists and primary care physicians, and extensive cognitive interviewing of patients during instrument development. Second, factor analysis and the multitrait scaling analyses provided empirical support for the theoretical constructs that we hypothesized while developing the instrument and informed the construction of five symptom scales: diarrhoea-predominant irritable bowel syndrome; constipation-predominant irritable bowel syndrome; reflux; dysmotility; and pain. Third, reliability was demonstrated by the high internal consistency of the scales. Fourth, evidence for criterion validity was provided by the correlation of the SF-36 with the DHSI. The DHSI should perform well in characterizing patients with gastrointestinal complaints at presentation to primary care physicians.

ACKNOWLEDGEMENTS

The authors acknowledge the excellent secretarial assistance provided by Beverly Gray and the technical support of Amy Castle and Ruth Taylor. The support provided by the nurses, office staff and physicians of Park Nicollet Clinic (Brookdale, St Louis Park, and Minneapolis Internal Medicine offices and St Louis Park Family Practice office) throughout this study was essential for its completion.

Robert Kane and Michael Newcomer critically reviewed the manuscript during its preparation.

Robert Cudeck provided many helpful suggestions on psychometric methods throughout this study.

The corresponding author will provide a copy of the questionnaire upon request.

Astra-Merck, Inc. provided an unrestricted grant in partial support of this project.
