Validation of a new scoring system for the assessment of clinical trial research of oral mucositis induced by radiation or chemotherapy

Authors

  • Stephen T. Sonis D.M.D., D.M.Sc.,

    Corresponding author
    1. Division of Dentistry, Brigham and Women's Hospital, Boston, Massachusetts
    • Division of Dentistry, Brigham and Women's Hospital, 75 Francis Street, Boston, MA 02115
  • June P. Eilers R.N., Ph.D.,

    1. University of Nebraska Medical Center, Omaha, Nebraska
  • Joel B. Epstein D.M.D., M.S.D.,

    1. Vancouver General Hospital and the British Columbia Cancer Agency, Vancouver, British Columbia, Canada
  • Francis G. LeVeque D.D.S.,

    1. Harper Hospital, Detroit, Michigan
  • William H. Liggett Jr. D.M.D., D.M.Sc., M.D.,

    1. Johns Hopkins Oncology Center, Baltimore, Maryland
    Current affiliation:
    1. 793 EKU Bypass, Richmond, KY 40475
  • Mary T. Mulagha M.D.,

    1. Novartis Pharmaceuticals, Inc., East Hanover, New Jersey
    • Deceased.

  • Douglas E. Peterson D.M.D., Ph.D.,

    1. Bone Marrow Transplant Program and Department of Radiation Oncology, University of Connecticut Health Center, School of Dental Medicine, Farmington, Connecticut
  • Ann H. Rose Ph.D.,

    1. OSI Pharmaceuticals, Inc., Washington, DC
  • Mark M. Schubert D.D.S., M.S.D.,

    1. Fred Hutchinson Cancer Center and the University of Washington School of Dentistry, Seattle, Washington
  • Frederik K. Spijkervet D.D.S., Ph.D.,

    1. University Hospital Groningen, Groningen, The Netherlands
  • Janet P. Wittes Ph.D.,

    1. Statistics Collaborative, Inc., Washington, DC
  • for the Mucositis Study Group

    1. Division of Dentistry, Brigham and Women's Hospital, Boston, Massachusetts
    • The following are members of the Mucositis Study Group: J. Costa, J. Geier, A. Mekala, and S. Woo (Division of Dentistry, Brigham and Women's Hospital, Boston, MA); A. Berger, C. Clarke, J. Kohtz, S. Kruse, and L. Wills (University of Nebraska Medical Center, Omaha, NE); A. Ransier, C. Sawyer (Vancouver General Hospital and the British Columbia Cancer Agency, Vancouver, British Columbia, Canada); K. Damato, J. D'Ambrosio, R. Dowsett, R. Lalla, K. Nuki, C. Tafas, and P. Tutschka (Bone Marrow Transplant Program and Department of Radiation Oncology, University of Connecticut Health Center, School of Dental Medicine, Farmington, CT); R. Dansey and B. Klein (Harper Hospital, Detroit, MI); V. Almonte (Johns Hopkins Oncology Center, Baltimore, MD); M. Lloid (Fred Hutchinson Cancer Center and the University of Washington School of Dentistry, Seattle, WA); E. Devries (University Hospital Groningen, Groningen, The Netherlands); H. Wysowskj (Novartis Pharmaceuticals, Inc., East Hanover, NJ); and S. Jauffret (Statistics Collaborative, Inc., Washington, DC).


Abstract

BACKGROUND

An impediment to mucositis research has been the lack of an accepted, validated scoring system. The objective of this study was to design, test, and validate a new scoring system for mucositis that can be used easily, is reproducible, and provides an accurate system for research applications.

METHODS

A panel of experts, convened to design an objective, simple, and reproducible assessment tool to evaluate mucositis with specific application to multicenter clinical trials, developed a scale that measured objective and subjective indicators of mucositis. Nine centers participated in the study's validation. Paired investigators at each center evaluated patients receiving chemotherapy or head and neck radiation. Objective measures of mucositis evaluated ulceration/pseudomembrane formation and erythema. Subjective outcomes of mouth pain, ability to swallow, and function were measured. Analgesia use for mouth sensitivity was recorded.

RESULTS

One hundred eight chemotherapy and 56 radiation therapy patients were evaluated. Seventy-eight percent of chemotherapy patients and 64% of radiation therapy patients had clinically significant mucositis. Cumulative daily mucositis scores demonstrated a high correlation among observers. Using area under the curve analysis, it was found that for chemotherapy patients, the highest correlations (correlation coefficient > 0.92) occurred for the scores that selected the three highest daily values over the course of mucositis assessment. High interobserver correlations were noted for patients receiving radiation therapy. Objective mucositis scores demonstrated strong correlation with symptoms.

CONCLUSIONS

The scoring system evaluated was easily used, showed high interobserver reproducibility, was responsive over time, and measured those elements deemed to be associated with mucositis. The use of concomitant symptomatic measurements appeared to be unnecessary. Cancer 1999;85:2103–13. © 1999 American Cancer Society.

With the availability of cytokine therapy to reduce the duration and severity of hematologic toxicities associated with cancer chemotherapy, the importance of oral mucositis as a significant side effect has increased dramatically. Not only are lesions caused by mucositis of such severity as to require routine parenteral analgesic intervention, but they often result in cancer therapy dose limitations.1–4 In addition, the concomitant presence of breaks in the oral mucosa in an environment rich in microflora has resulted in the well reported relation between the mouth and bacteremias and sepsis among myelosuppressed patients.5 Taken as a whole, these factors also affect hospital stay and outcome.6 Among patients receiving chemotherapy, the estimated overall incidence of mucositis is 40%. Mucositis is even more frequent among patients undergoing radiation therapy for tumors of the head and neck, affecting approximately 80% of patients. Currently, there is no effective intervention to prevent or treat mucositis.

We believe the dramatic increase in the number of clinical studies regarding mucositis has occurred in the past few years for three major reasons. First, bone marrow-stimulating cytokine therapy has had a major effect on reducing granulocytopenia as a long term consequence of cytoreductive therapy. Likewise, interleukin-11 for the treatment of chemotherapy-induced thrombocytopenia may result in the effective control of this side effect. The net result has been to increase the significance of mucositis as a toxicity. Second, because of an increased understanding of the biology of mucositis,7 investigators have identified several therapeutic opportunities to which biotechnology and pharmaceutical companies are responding with the application of innovative agents that are of potential therapeutic benefit.8 Finally, oncologists and radiotherapists are aggressively pursuing new agents, combination therapies, and dose escalations that are now limited by the development of mucositis.1–4

A major impediment to interventional and epidemiologic mucositis research has been the lack of an accepted, validated, and objective scoring system for mucositis.9 Instead, individuals and groups have developed many scoring systems, often with different objectives. Review of current scoring systems suggests three major reasons for their development. The widely used World Health Organization (WHO) and National Cancer Institute (NCI) systems were developed to describe toxicities associated with a particular chemotherapeutic agent or regimen.10 Both scales combine objective signs of mucositis (erythema and ulcer formation) with subjective and functional outcomes (pain and ability to eat). Although the developers of these scales intended the scorer to consider lesions painful if analgesia masked the pain, in practice many scorers ignore this differentiation, most likely resulting in underreporting and underscoring of mucositis.9

Oncology nurses have developed scoring systems for the assessment of mucositis and for patient management.11, 12 Many of these scales have a holistic quality; they include elements that are not defined traditionally as being related directly to mucositis. For example, in addition to evaluation of the integrity of the oral mucosa, these scales often include functional and subjective outcomes such as speech quality, avoidance of spicy foods, and swallowing as well as lip and mucosal dryness, infection, bleeding, and cleanliness.

Some research groups have attempted to develop mucositis scoring systems that are applicable as research tools.13, 14 These systems have attempted either to eliminate subjective findings completely or to evaluate them independent of objective findings and then integrate them into a comprehensive score. As has been the case with some management scales, several of the research scales have called for the evaluation of individual sites in the mouth to produce a single score.

Despite these attempts, mucositis researchers have not reached a consensus regarding an easily used, accurate, and reproducible scoring system designed specifically for investigative applications in this area. Our objective was to design, test, and validate a scoring system for mucositis that met this purpose.

MATERIALS AND METHODS

A panel of experts was convened to design an objective, simple, and reproducible assessment tool to evaluate oral mucositis with specific application to multicenter clinical trials. Attendees included a cross-section of health professionals (nurses, dental hygienists, physicians, and dentists) who had published peer-reviewed research articles in the fields of mucositis, oncology, and pain management, statisticians, and representatives of the pharmaceutical and biotechnology industries. The attendees agreed on a mucositis assessment scale that measured a consensus of indicators of mucositis severity. Primary indicators were the degrees of ulceration/pseudomembrane and mucosal erythema measured in specific sites in the mouth (Table 1). Secondary indicators included oral pain, swallowing, and the ability to eat as assessed by the patient.

Table 1. Sample Data Collection Form Indicating the Parameters and Sites for the Objective Scoring Used for Chemotherapy Patients
  • NCI: National Cancer Institute; WBC: leukocyte; ANC: absolute neutrophil count.

  • The same data were collected for patients receiving radiation therapy, although absolute neutrophil count was omitted.

    To describe the psychometric properties of the scale, the following criteria were established as measurement outcomes:

    1. Face validity. Did the scale measure what experts perceive as mucositis?
    2. Content validity. Did the indicators sample the entire domain of mucositis? Did the scale miss any important aspects of mucositis?
    3. Reliability. Was there interobserver reproducibility?
    4. Responsiveness. Did the scale detect clinically important changes in mucositis?
    5. Interpretability. Did a particular score correlate with the clinical condition of the patient?

    In addition, user evaluation in terms of the ease with which observers used the scale was to be ascertained.

    Study Design

    Nine centers participated in this study. Because the extent, characteristics, and time course of mucositis differ in patients undergoing chemotherapy and radiation therapy, the study was designed to have adequate statistical power to validate the scale in the two groups of patients separately. Inclusion criteria were established for both groups of patients. To be eligible for inclusion, chemotherapy patients must have received a drug regimen in which at least 50% of patients demonstrate Grade 3 or Grade 4 mucositis as defined by the WHO scale or the NCI's Common Toxicity Index (CTI) and must have been inpatients during the chemotherapy treatment period. Patients receiving radiation therapy were required to have been scheduled to receive a minimum of 50 grays. Patients of both genders were eligible for study provided they were age ≥ 12 years. Informed consent was obtained from all study participants.

    Patients were distributed among the study sites; at most, 25 evaluable patients for each arm could be enrolled at a single site. To assure a reasonable distribution of patients, we required that at least four sites enroll a minimum of eight evaluable patients for the chemotherapy arm and at least three sites enroll a minimum of seven patients for the radiation therapy arm before enrollment was stopped.

    At each site two trained investigators evaluated each patient for clinical manifestations of oral mucositis. Investigators were required to attend a study orientation meeting prior to participating in the project. Although all investigators had extensive experience in the oral care and assessment of patients, their educational backgrounds varied. They included dentists, oncology nurses, dental hygienists, and research assistants. At the time of each patient's enrollment, one individual was designated as Investigator 1 and the other as Investigator 2. This pair of individuals remained constant throughout the evaluation period of a specific patient; however, the roles could change from patient to patient. In addition to clinical scoring, Investigator 1 maintained responsibility for obtaining patient assessments of pain and function.

    Chemotherapy patients were evaluated beginning on the first day of treatment and continuing for 28 calendar days. Investigator 1 evaluated the patient daily (except on weekends and holidays) for the first 12 days, after which time the patient was evaluated each Monday, Wednesday, and Friday. Investigator 2 evaluated the patient on alternate days beginning on the first day of chemotherapy for the first week and then on Mondays, Wednesdays, and Fridays thereafter.

    Patients receiving radiation therapy were evaluated differently to accommodate their dosing schedules. Patients were examined by Investigator 1 on their first day of radiation therapy and then regularly until 2 weeks after the last dose of radiation. Both investigators evaluated patients on alternate days (except weekends and holidays) on days on which the patient received treatment.

    Investigators were instructed to examine each patient separately, with no more than 4 hours between the two examinations. After the patient's subjective assessment was obtained, a clinical examination was performed using a halogen light source (Centauri Headlight Systems, Tempe, AZ). Investigator 1 obtained an NCI-CTI score on each day of evaluation. Investigators were instructed not to compare scores. Scoring sheets were placed in individual envelopes and sealed.

    Each day, before investigators assessed oral mucositis, patients received a data collection form containing 2 visual analogue scales (VAS), each consisting of a 100-mm line with descriptors of severity at each end (Table 2). On the top and bottom lines, patients were asked to place a mark corresponding to the degree of oral pain and ability to swallow, respectively, at the time of assessment. After completing the VAS, patients were asked to complete a short questionnaire concerning their ability to eat (Table 2).

    Table 2. Sample of the Collection Form for Subjective Data Regarding Mouth Pain and Swallowing
  • Patients completed this form on each evaluation day after standardized instructions from Investigator 1. Patients indicated their ability to function based on their eating ability in the diary at the bottom of the page.

  • Identical forms were completed by patients receiving chemotherapy or radiation therapy.

    Daily analgesic use for oral pain was recorded for all patients, as were leukocyte and absolute neutrophil counts for chemotherapy patients. Analgesic use was classified into one of four levels: topical agents, nonsteroidal antiinflammatory drugs or equivalent, oral narcotics, or parenteral narcotics.

    Data Management

    Data collection sheets were mailed to a central site for processing. Data were double-entered using Microsoft Access (Microsoft Corporation, Redmond, WA), imported into SAS software (SAS Institute, Cary, NC), and transferred to a second site for statistical analysis. An audit of the SAS database was performed independently. A series of SAS diagnostic tests was performed to describe the distribution of variables. Out-of-range or questionable values were confirmed against original data forms. All analyses were performed using PC SAS Version 6.12 or S-Plus for Windows, Version 4 (SAS Institute).

    If on a given day an observer recorded data for some, but not all, sites in the mouth, the analytic plan specified that the missing observation was zero, assuming that ulceration and erythema were absent. When an entire set of nine observations was missing, all observations were considered to be missing, not zero. During the course of the study, the investigators realized that many of the missing sites were unmeasured because the mouth was too painful to allow assessment. To represent the temporal course of mucositis accurately, the calculation of scores assumes that a missing value for a site is simply missing; thus, the scores implicitly assign the missing values the same value as the average of measured sites.

    Four scores were analyzed:

    1. The mean mucositis score was equal to $\frac{\sum_i u_i}{n_u} + \frac{\sum_i e_i}{n_e}$, in which $u_i$ and $e_i$ were the ulceration and erythema scores at site $i$, each summation was over the nonmissing values, and $n_u$ and $n_e$ were the numbers of sites with nonmissing measurements of ulceration and erythema, respectively. This score ranged from 0–5.
    2. The weighted mean mucositis score was equal to $2.5\left[\frac{\sum_i u_i}{3 n_u} + \frac{\sum_i e_i}{2 n_e}\right]$. This score, which ranged from 0–5, reweighted the previous score to equalize the influence of ulceration and erythema.
    3. The extent of mucositis score (the number of sites with either ulceration = 3 or erythema = 2) ranged from 0–9.
    4. The worst site score was equal to the maximum erythema plus the maximum ulceration across all sites and ranged from 0–5.
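The four scores listed above can be illustrated with a minimal sketch. It assumes per-site ulceration grades of 0–3 and erythema grades of 0–2 (consistent with the stated score ranges), uses `None` for a site that could not be assessed, and treats an examination with no measured sites as missing. The function name, dictionary keys, and example values are hypothetical and are not taken from the study's analysis code.

```python
from typing import Dict, Optional, Sequence

Score = Optional[int]  # None marks a site that could not be assessed


def mucositis_scores(ulceration: Sequence[Score],
                     erythema: Sequence[Score]) -> Optional[Dict[str, float]]:
    """Return the four candidate scores for one examination.

    ulceration: per-site grades 0-3; erythema: per-site grades 0-2.
    Missing sites are skipped, so they implicitly take the average
    of the measured sites.
    """
    u = [x for x in ulceration if x is not None]
    e = [x for x in erythema if x is not None]
    if not u or not e:
        # Assumption: with no measured sites the whole examination is missing.
        return None

    mean_u = sum(u) / len(u)
    mean_e = sum(e) / len(e)
    return {
        # Mean ulceration plus mean erythema (range 0-5).
        "mean": mean_u + mean_e,
        # Each component rescaled to 0-1, then scaled to 0-5.
        "weighted_mean": 2.5 * (mean_u / 3 + mean_e / 2),
        # Number of sites with maximal ulceration or maximal erythema.
        "extent": float(sum(1 for ui, ei in zip(ulceration, erythema)
                            if ui == 3 or ei == 2)),
        # Worst ulceration plus worst erythema across sites.
        "worst_site": float(max(u) + max(e)),
    }


# Hypothetical examination of nine sites; the fourth site was too painful to assess.
example = mucositis_scores(
    ulceration=[0, 1, 3, None, 0, 0, 2, 0, 0],
    erythema=[0, 1, 2, None, 0, 0, 2, 1, 0],
)
print(example)
```

In this hypothetical examination the mean and weighted mean scores differ (1.5 vs. 1.5625) because the weighted form gives erythema, which has the shorter scale, the same influence as ulceration.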

    Several other scores were considered, but they had very little variability and hence would not be useful in clinical trials.

    At the conclusion of data collection, investigators were asked to assess scale utility and to provide specific comments with respect to certain aspects of scale use. Investigators also were given the opportunity to provide other, nondirected comments.

    RESULTS

    One hundred eight chemotherapy patients and 56 radiation therapy patients were evaluated. Chemotherapy patients were distributed over all but one of the study sites. Four sites contributed radiotherapy patients; 2 of these sites contributed 82% of the patients. The characteristics of patients in each cohort are described in Table 3. Patients receiving chemotherapy tended to be younger than patients being treated with radiation therapy (44.8 years vs. 58.8 years). Not surprisingly, the gender ratios for each form of therapy were opposite: the male:female ratio was 39:61 for chemotherapy compared with 66:34 for radiation therapy.

    Table 3. Baseline Characteristics of the Study Cohort

    Characteristic                      Chemotherapy (N = 108)    Radiation therapy (N = 56)
    Age (yrs)
      Mean                              44.8                      58.8
      Standard deviation                11.5                      10.3
      Median                            46                        58
      Percentiles (25th, 75th)          (36, 52)                  (52, 66)
      Min, Max                          (15, 77)                  (35, 82)
    Gender (%)
      Male                              39                        66
      Female                            61                        34
    Race (%)
      White                             94                        86
      Black                             4                         5
      Asian                             2                         4
      Other                             1                         5
    Patient status at entry
      Inpatient                         94                        13
      Outpatient                        6                         88
    Reported mucositis history (%)a
      Yes                               30
      No                                60
      Missing                           10

    • Min: minimum; Max: maximum.
    • a Mucositis history was not collected for radiation therapy patients.

    The overwhelming majority of patients in the chemotherapy cohort were bone marrow transplant recipients (102 of 108); slightly more received allogeneic transplants (n = 48) than autologous transplants (n = 42). Among patients receiving radiation therapy, the mandible and lower face were the most frequently targeted ports.

    More than two investigators participated at each clinical site. The professional training of the investigators varied: 39% were dentists, 18% were nurses, 28% were research dental hygienists, 10% were physicians, and 5% were research assistants. The average amount of investigator experience in assessing the oral mucosa of patients receiving antineoplastic therapy was 9.6 years.

    For the purposes of this study, clinically relevant mucositis was defined as oral mucositis having any one of the following properties: 1) NCI-CTI Grade of at least 2; 2) pain scale score, as measured by the patient diary 100-point VAS for pain, of at least 50; 3) swallowing difficulty scale, as measured by the patient diary 100-point VAS for swallowing, of at least 5; or 4) food intake limited by oral mucositis to liquids only.
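The four-part definition above amounts to a simple "any of" rule, sketched below. The thresholds are copied as printed in the text (including the swallowing threshold of 5). The function and parameter names are hypothetical, and treating an unrecorded value as not meeting its criterion is an assumption.

```python
from typing import Optional


def is_clinically_relevant(nci_grade: Optional[int],
                           pain_vas: Optional[float],
                           swallow_vas: Optional[float],
                           liquids_only: bool) -> bool:
    """Apply the study's definition of clinically relevant mucositis.

    A patient-day qualifies if ANY criterion is met. Values that were
    not recorded (None) are treated as not meeting that criterion,
    which is an assumption of this sketch.
    """
    return bool(
        (nci_grade is not None and nci_grade >= 2)          # 1) NCI-CTI grade
        or (pain_vas is not None and pain_vas >= 50)        # 2) pain VAS (0-100)
        or (swallow_vas is not None and swallow_vas >= 5)   # 3) swallowing VAS, as printed
        or liquids_only                                     # 4) intake limited to liquids
    )


# Hypothetical patient-days.
print(is_clinically_relevant(1, 10, 0, False))    # no criterion met
print(is_clinically_relevant(0, 62, 0, False))    # pain criterion met
```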

    Of all patients studied, 82% demonstrated some evidence of mucositis; 78% of chemotherapy patients and 64% of radiation therapy patients had clinically relevant mucositis.

    Interobserver reproducibility was assessed at three different levels: first, the score for a particular site in the mouth on a particular day; second, the cumulative daily score for all sites; and third, cumulative scores over the course of mucositis. The two investigators generally reported similar measurements of individual oral sites (Tables 4 and 5). Although a high percentage of measurements were identical, some discordance was noted. The majority of discrepancies reflected a single unit scoring difference.

    Table 4. Individual Measurements for Patients Receiving Chemotherapy by Investigator, in Which the Number of Matched Scores for Each Severity Level Was Compared for Sites of Erythema and Ulceration

    Erythema sites (rows: Investigator 1; columns: Investigator 2)

    Investigator 1    None    Not severe    Severe    Missing    Total
    None              5176    591           47        51         5865
    Not severe        678     719           195       15         1607
    Severe            66      214           602       12         894
    Missing           25      10            10        4          49
    Total             5945    1534          854       82         8415

    Ulceration sites (rows: Investigator 1; columns: Investigator 2)

    Investigator 1    No lesion    ≤ 1 cm²    1–3 cm²    ≥ 3 cm²    Missing    Total
    No lesion         6950         189        48         14         44         7245
    ≤ 1 cm²           290          181        57         17         6          551
    > 1–< 3 cm²       81           79         91         77         1          329
    ≥ 3 cm²           32           37         86         91         0          246
    Missing           26           5          5          3          5          44
    Total             7379         491        287        202        56         8415

    Table 5. Individual Measurements for Patients Receiving Radiation Therapy by Investigator, in Which the Number of Matched Scores for Each Severity Level Was Compared for Sites of Erythema and Ulceration

    Erythema sites (rows: Investigator 1; columns: Investigator 2)

    Investigator 1    None    Not severe    Severe    Missing    Total
    None              3818    284           38        26         4166
    Not severe        341     924           204       3          1472
    Severe            77      289           1280      8          1654
    Missing           9       2             3         74         88
    Total             4245    1499          1525      111        7380

    Ulceration sites (rows: Investigator 1; columns: Investigator 2)

    Investigator 1    No lesion    ≤ 1 cm²    1–3 cm²    ≥ 3 cm²    Missing    Total
    No lesion         5774         112        20         17         16         5939
    ≤ 1 cm²           177          373        69         23         2          644
    > 1–< 3 cm²       32           59         249        44         4          388
    ≥ 3 cm²           19           16         58         208        2          303
    Missing           37           2          1          2          64         106
    Total             6039         562        397        294        88         7380
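The statement that most measurements were identical and most discrepancies were a single unit can be checked against the cross-tabulated counts. The sketch below does so for the chemotherapy erythema counts in Table 4, excluding measurements that either investigator recorded as missing. The agreement summaries used here (exact agreement and agreement within one category) are illustrative choices and are not statistics reported in the study.

```python
from typing import List, Tuple

# Table 4, erythema sites, chemotherapy patients.
# Categories in order: none, not severe, severe ("missing" cells excluded).
ERYTHEMA_CHEMO: List[List[int]] = [
    [5176, 591, 47],
    [678, 719, 195],
    [66, 214, 602],
]


def agreement(table: List[List[int]]) -> Tuple[float, float]:
    """Return (exact agreement, agreement within one category)."""
    size = len(table)
    total = sum(sum(row) for row in table)
    exact = sum(table[i][i] for i in range(size))
    within_one = sum(table[i][j]
                     for i in range(size)
                     for j in range(size)
                     if abs(i - j) <= 1)
    return exact / total, within_one / total


exact, within_one = agreement(ERYTHEMA_CHEMO)
print(f"exact agreement: {exact:.3f}")
print(f"within one category: {within_one:.3f}")
```

For these counts, roughly 78% of paired erythema measurements were identical and about 99% differed by no more than one category.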

    The evaluation of cumulative daily scores showed a high correlation of scores between the two investigators. Although the correlation between investigators scoring chemotherapy-induced mucositis was high, it was lower than that for investigators scoring radiation-induced mucositis for each of the four outcomes considered.

    To compare the two investigators with respect to assessments of the duration and severity of mucositis, we calculated the area under the curve describing the time course of oral mucositis averaged over all chemotherapy and radiation therapy patients. Only measurements made by both investigators on the same day were included. All daily scores and summaries across time demonstrated high correlations. For the chemotherapy patients, the highest correlations (correlation coefficient [r] > 0.92) occurred for the scores that selected the three highest values over the course of mucositis assessment. The areas under the curve for the entire course of the study (r > 0.80) and the maximum scores (r > 0.84) of individual sites also showed high interobserver correlation (Tables 6 and 7).
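A minimal sketch of the time-course summaries described above follows. It assumes that the area under the curve is computed with the trapezoidal rule over study days and that the "three highest values" summary is the mean of the three largest daily scores; the text specifies neither detail, and the function names and example data are hypothetical. Only days scored by both investigators are retained, as in the study.

```python
from typing import Dict, List, Optional


def common_days(inv1: Dict[int, Optional[float]],
                inv2: Dict[int, Optional[float]]) -> List[int]:
    """Days on which both investigators recorded a score."""
    return sorted(d for d in inv1
                  if inv1[d] is not None and inv2.get(d) is not None)


def auc(days: List[int], scores: Dict[int, Optional[float]]) -> float:
    """Trapezoidal area under the score-versus-day curve (an assumption)."""
    return sum((d1 - d0) * (scores[d0] + scores[d1]) / 2
               for d0, d1 in zip(days, days[1:]))


def mean_of_three_highest(days: List[int],
                          scores: Dict[int, Optional[float]]) -> float:
    """Mean of the three largest daily scores (an assumed definition)."""
    top = sorted((scores[d] for d in days), reverse=True)[:3]
    return sum(top) / len(top)


# Hypothetical daily mean mucositis scores keyed by study day.
inv1 = {1: 0.0, 2: 1.0, 3: None, 4: 2.0, 6: 1.0}
inv2 = {1: 0.0, 2: 1.5, 4: 2.0, 5: 1.0, 6: 0.5}

days = common_days(inv1, inv2)            # days 1, 2, 4, and 6
print(auc(days, inv1), auc(days, inv2))
print(mean_of_three_highest(days, inv1))
```

Restricting both curves to the same days keeps the comparison between investigators from being driven by differences in examination schedules.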

    Table 6. Interobserver Reproducibility for Chemotherapy Patients for Three Highest Values, Area under the Curve, and Maximum Score

    Score                               r       m        b       |b−1| + |r−1|    p       95% CI
    Three highest values (N = 108)
      Mean mucositis                    0.98    −0.04    0.96    0.06             0.74    [0.65, 0.82]
      Weighted mean mucositis score     0.98    −0.04    0.96    0.06             0.74    [0.65, 0.82]
      Worst site                        0.93    0.18     0.89    0.17             0.48    [0.38, 0.58]
      Extent of mucositis               0.96    −0.04    0.94    0.10             0.53    [0.43, 0.62]
    Area under the curve
    Entire course (N = 108)
      Mean mucositis                    0.84    0.10     0.77    0.38             0.68    [0.58, 0.76]
      Weighted mean mucositis score     0.85    0.10     0.78    0.37             0.68    [0.58, 0.76]
      Worst site                        0.85    0.20     0.78    0.37             0.57    [0.48, 0.67]
      Extent of mucositis               0.81    0.09     0.76    0.43             0.65    [0.55, 0.74]
    Clinically relevant interval
    Entire course (N = 72)
      Mean mucositis                    0.80    0.23     0.78    0.42             0.33    [0.38, 0.62]
      Weighted mean mucositis score     0.80    0.23     0.79    0.41             0.32    [0.37, 0.61]
      Worst site                        0.76    0.51     0.74    0.50             0.23    [0.24, 0.47]
      Extent of mucositis               0.72    0.25     0.74    0.53             0.23    [0.24, 0.47]
    Maximum (N = 108)
      Mean mucositis                    0.85    0.15     0.89    0.26             0.44    [0.35, 0.54]
      Weighted mean mucositis score     0.86    0.16     0.89    0.25             0.44    [0.34, 0.53]

    • r: correlation coefficient between measurements for the two investigators; m: intercept of the estimated regression line between investigators; b: slope of the estimated regression line between investigators; p: proportion of |score Inv. 1 − score Inv. 2| ≤ 0.25; 95% CI: 95% exact confidence interval for the proportion of scores within 0.25.

    Table 7. Interobserver Reproducibility for Radiation Therapy Patients for Three Highest Values, Area under the Curve, and Maximum Score

    Score                               r       m        b       |b−1| + |r−1|    p       95% CI
    Three highest values (N = 53)
      Mean mucositis                    0.99    −0.03    1.00    0.01             0.94    [0.84, 0.99]
      Weighted mean mucositis score     1.00    −0.02    1.00    0.01             0.96    [0.87, 1.00]
      Worst site                        0.97    −0.01    1.01    0.03             0.64    [0.50, 0.77]
      Extent of mucositis               0.98    −0.09    1.00    0.03             0.51    [0.37, 0.65]
    Area under the curve
    Entire course (N = 53)
      Mean mucositis                    0.96    0.00     0.93    0.10             0.85    [0.72, 0.93]
      Weighted mean mucositis score     0.96    0.02     0.92    0.12             0.87    [0.75, 0.95]
      Worst site                        0.96    0.02     0.95    0.09             0.70    [0.56, 0.82]
      Extent of mucositis               0.93    0.05     0.84    0.23             0.77    [0.64, 0.88]
    Clinically relevant interval
    Entire course (N = 36)
      Mean mucositis                    0.95    0.04     0.90    0.15             0.51    [0.58, 0.88]
      Weighted mean mucositis score     0.96    0.02     0.92    0.12             0.55    [0.64, 0.92]
      Worst site                        0.96    0.24     0.91    0.13             0.51    [0.58, 0.88]
      Extent of mucositis               0.95    0.06     0.92    0.13             0.53    [0.61, 0.90]
    Maximum (N = 53)
      Mean mucositis                    0.93    0.10     0.97    0.09             0.62    [0.48, 0.75]
      Weighted mean mucositis score     0.92    0.15     0.97    0.11             0.58    [0.44, 0.72]

    • r: correlation coefficient between measurements for the two investigators; m: intercept of the estimated regression line between investigators; b: slope of the estimated regression line between investigators; p: proportion of |score Inv. 1 − score Inv. 2| ≤ 0.25; 95% CI: 95% exact confidence interval for the proportion of scores within 0.25.
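The reproducibility statistics in Tables 6 and 7 (r, m, b, p, and the exact 95% CI) can be sketched as follows. This sketch assumes that Investigator 2's scores were regressed on Investigator 1's by ordinary least squares and that the "exact" interval is the Clopper-Pearson interval; the text states neither, and the study's analyses were run in SAS or S-Plus, not Python. All function names and the example data are hypothetical.

```python
import math
from typing import Sequence, Tuple


def correlation_and_line(x: Sequence[float],
                         y: Sequence[float]) -> Tuple[float, float, float]:
    """Return (r, intercept m, slope b) for y regressed on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((c - my) ** 2 for c in y)
    sxy = sum((a - mx) * (c - my) for a, c in zip(x, y))
    slope = sxy / sxx
    return sxy / math.sqrt(sxx * syy), my - slope * mx, slope


def count_within(x: Sequence[float], y: Sequence[float],
                 tol: float = 0.25) -> int:
    """Number of pairs whose absolute difference is at most tol."""
    return sum(1 for a, c in zip(x, y) if abs(a - c) <= tol)


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p ** i * (1 - p) ** (n - i)
               for i in range(k + 1))


def _solve_increasing(g, target: float) -> float:
    """Bisection on [0, 1] for an increasing function g."""
    lo, hi = 0.0, 1.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if g(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def exact_ci(k: int, n: int, alpha: float = 0.05) -> Tuple[float, float]:
    """Clopper-Pearson interval for k successes in n trials."""
    lower = 0.0 if k == 0 else _solve_increasing(
        lambda p: 1.0 - binom_cdf(k - 1, n, p), alpha / 2)
    upper = 1.0 if k == n else _solve_increasing(
        lambda p: 1.0 - binom_cdf(k, n, p), 1 - alpha / 2)
    return lower, upper


# Hypothetical paired summary scores for four patients.
inv1 = [0.0, 1.0, 2.0, 3.0]
inv2 = [0.1, 1.1, 2.1, 3.1]
r, m, b = correlation_and_line(inv1, inv2)
print(r, m, b, abs(b - 1) + abs(r - 1))
print(count_within(inv1, inv2) / len(inv1))

# Interval for 80 of 108 pairs within 0.25 (a proportion of about 0.74).
print(exact_ci(80, 108))
```

The column |b−1| + |r−1| is zero only when the regression line has unit slope and the correlation is perfect, so small values indicate that the two investigators' scores were nearly interchangeable.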

    Objective mucositis scores demonstrated strong correlations with symptoms associated with the condition. Thus mucositis scores were correlated highly with NCI scores, pain, swallowing, and ability to eat (Table 8). This relation was present for the mean mucositis score, the weighted mean mucositis score, the worst site score, and the extent of mucositis score. Of the subjective findings measured, swallowing appeared to be most affected by analgesic use.

    Table 8. P Values from Analysis of Variance for the Four Candidate Scores: Relation on a Day-to-Day Basis to Pain, Swallowing, and NCI Score

                               Mean mucositis       Weighted mean        Worst site           Extent
    Source                     Chemo      Radio     Chemo      Radio     Chemo      Radio     Chemo      Radio
    NCI                        < 0.001    0.002     < 0.001    0.002     < 0.001    0.008     < 0.001    0.014
    NCI × analgesic            0.087      0.041     0.18       0.057     0.001      0.50      0.042      0.002
    Pain                       < 0.001    0.089     < 0.001    0.0191    < 0.001    0.44      < 0.001    0.30
    Pain × analgesic           < 0.001    0.005     0.005      0.012     0.37       0.35      0.006      0.38
    Swallowing                 < 0.001    0.010     < 0.001    0.010     < 0.001    0.59      0.001      0.041
    Swallowing × analgesic     < 0.001    < 0.001   < 0.001    < 0.010   < 0.001    0.073     < 0.001    0.003

    • NCI: National Cancer Institute; chemo: chemotherapy; radio: radiation therapy.

    The data suggest that the scale was effective in tracking temporal changes in mucositis. Plots generally showed a close tracking of the mucositis scores with expected scores over time and with the observed pain, swallowing, and NCI scores. In general, the plots for the chemotherapy patients appeared to parallel each other more closely than did the plots for the radiation therapy patients.

    All examiners reported that the scale was easy to use. Completion of examination and grading took between 1–5 minutes.

    DISCUSSION

    The quest for an effective intervention for mucositis induced by chemotherapy or radiation therapy necessitates the use of an assessment tool that measures targeted outcomes accurately. Endpoints for existing scales, designed for toxicity assessment or patient management, are not specific enough to meet research needs. The complexity of available scales for research use has hampered their routine adoption.

    The scale tested in the current study was designed to provide a simple, quantitative, and accurate mechanism for clinical mucositis assessment for use primarily as a research tool. Ease of use by investigators and tolerability by patients are two requirements of an effective mucositis scale. Investigators reported that the scale was used easily; evaluations were completed in < 5 minutes. However, the repetitive examinations proved to be difficult for patients with severe mucositis and may have affected the second examiner's ability to inspect all the designated sites. This double examination procedure would not be routine in typical clinical trials but was required to ascertain interobserver variability. In the current study, restricted access to certain painful areas in a patient's mouth may have resulted in underscoring by the second investigator.

    The scale's interobserver reproducibility was noted at each of the three levels assessed. In general, the two investigators reported similar measurements for individual oral sites. A high percentage of these were identical, although some discordance was noted. The majority of discrepancies reflected a difference of a single unit. Discordance of more than one unit was likely due to simple measurement error or the lack of succinct site definition, which may have led to opposite scoring of contiguous areas. For example, if severe erythema was present on the labial mucosa and extended to the buccal mucosa, one investigator may have scored the labial mucosa as severe but underscored the buccal mucosa whereas the second investigator did the opposite. The establishment of specific site definitions may eliminate this problem, although it did not appear to be significant enough to affect interobserver reproducibility markedly.

    The correlation between investigators also was excellent for daily mucositis scores. This trend was true for all mucositis scores (mean, weighted mean, worst site, or extent of severe mucositis). Elimination of scores of 0 did not lower this correlation markedly. The discordance between observers was higher for chemotherapy patients than for those receiving radiation therapy. At least two possible explanations are consistent with this observation. First, the majority of radiation therapy patients were assessed at two study locations whereas chemotherapy patients were distributed more evenly among six study centers. Thus fewer investigators scored the radiation therapy patients. Another possible reason is that the mucosal response to radiation therapy tends to be more predictable and defined than the response to chemotherapy; hence, scoring of the former is easier.

    The accurate evaluation of temporal changes in clinical status often is important in assessing treatment outcome. Investigators showed high correlation in scores across time. The highest correlation (r > 0.92) occurred for those scores that selected the three highest values over the course of the measurement. Both the area under the curve (r > 0.80) and maximum score analyses (r > 0.84) also showed high interobserver correlation. Interobserver reproducibility demonstrated acceptable correlations for each of the three levels studied. All correlations were highly significant (P < 0.001).
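
As an illustration of how an interobserver correlation of this kind can be computed, the following is a minimal sketch. It assumes that r denotes the Pearson product-moment correlation, which the text does not state explicitly, and the paired scores are hypothetical values, not study data.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sums of cross-products and squared deviations from the means
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant scores")
    return sxy / sqrt(sxx * syy)


# Hypothetical per-patient summary scores from two observers (not study data)
observer_a = [0.2, 0.8, 1.5, 2.1, 1.2]
observer_b = [0.3, 0.7, 1.6, 2.0, 1.1]
r = pearson_r(observer_a, observer_b)
```

Each list holds one summary value per patient (for example, a maximum score), so the correlation reflects agreement between observers across patients.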

    The current study data demonstrate that the variables measured correlated with the occurrence of mucositis as defined by the symptomatic parameters of pain and swallowing as assessed by VAS analysis. Although scoring of objective changes was expected to be correlated highly with the VAS scales, some patients had discordant subjective scale and clinical scale scores. The most likely explanation for discordance with the pain scale was the increased use of analgesics among patients with significant oral pain. Discordance of the scores derived from the swallowing and eating scales may have been due to the inability of the new scale to measure esophageal mucositis or lesions inferior to the oral pharynx.

    Each form of mucositis score evaluated (mean, weighted mean, worst site, and extent of mucositis) showed highly statistically significant associations (P < 0.001) with pain, swallowing, and NCI score in chemotherapy patients. As noted earlier, analgesic use affected the relation of these variables to the scale score. Interestingly, these associations were less strongly significant for the radiation therapy patients. These observations support the hypothesis that a research tool for measuring mucosal integrity and health, if it depends on or is modified by symptomatic outcomes, could yield misleading conclusions about the efficacy of a particular pharmacologic agent in patients who use analgesics to control oral symptoms.

    The data suggest that the new scoring system effectively measures changes in mucosal health over time. Of the measurements used, worst site and extent of severe mucositis appeared to be more responsive to change than mean mucositis score. One explanation may be that, on each day, a number of intraoral sites were scored as zero and thus dampened the mean. This pattern appeared in both chemotherapy and radiation therapy patients. Temporal plots of clinical changes associated with mucositis and measures of symptoms were well correlated. In fact, in the chemotherapy patients the new scale score and the NCI scores were virtually superimposable; although there appeared to be some inconsistencies among the radiation therapy patients, there nevertheless were similar scoring trends.
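
The dampening effect of zero-scored sites on the mean can be seen in a small sketch. The site scores, the number of sites, and the `severe_threshold` cutoff below are illustrative assumptions, not the definitions or data used in the study; the weighted mean is omitted because its weights are not given here.

```python
def daily_summaries(site_scores, severe_threshold=2):
    """Summarize one day's per-site scores for a single patient.

    severe_threshold is a hypothetical cutoff for "severe" mucositis.
    """
    n = len(site_scores)
    return {
        "mean": sum(site_scores) / n,
        "worst_site": max(site_scores),
        # Fraction of sites at or above the assumed severity cutoff
        "extent_severe": sum(1 for s in site_scores if s >= severe_threshold) / n,
    }


# Hypothetical scores at nine oral sites on two days (not study data):
# one site worsens from 1 to 3 while the other eight remain at 0.
day_1 = daily_summaries([0, 0, 0, 0, 0, 0, 1, 0, 0])
day_2 = daily_summaries([0, 0, 0, 0, 0, 0, 3, 0, 0])
mean_change = day_2["mean"] - day_1["mean"]               # about 0.22
worst_change = day_2["worst_site"] - day_1["worst_site"]  # 2
```

In this example the worst-site score rises by two units while the mean rises by roughly a fifth of a unit, because eight unaffected sites contribute zeros to the average.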

    The new scale scores and the NCI score were closely related. This was in some ways surprising given the starting hypothesis that the accuracy of the NCI scoring system had the inherent weakness of being modified by analgesic symptom relief in the face of substantive mucosal disruption. Two elements in the current study may explain the extraordinary concordance of the NCI and new scale score. First, the study investigators were highly experienced and directed in the evaluation of the oral cavity. In addition, the intensity and thoroughness of the oral examination most likely was greater than is performed typically. Consequently, the accuracy of the reported NCI score may have been greater than is noted routinely. Second, the sequence of scoring also may have influenced the NCI score because NCI scoring was performed after the examination of the nine study sites dictated by the new scoring system.

    The mean and weighted mean had similar properties. The mean, which is simpler to interpret, measured the average level of mucositis. The maximum mucositis score was less reproducible than the others. The extent of severe mucositis focused on the most severe sites.

    The data as a whole indicate that both the mean mucositis score and the extent of severe mucositis score, calculated over time either as the area under the curve or as the average of the three highest values, produced scores that were reproducible and responsive to change. The score with the best statistical properties was the average of the three highest measurements of the extent of severe mucositis; however, it was not sensitive to low grades of oral mucositis. This finding would not appear to affect the scale's utility as applied to clinical trials because, generally, efficacy of a study drug is measured by changes in clinically significant mucositis. The necessity to include all sites in an evaluation was substantiated by our inability to identify one or more specific sites that could serve as indicators for the remainder of the oral cavity.
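
The two over-time summaries named above can be sketched as follows. This is a minimal illustration that assumes the area under the curve is taken by the trapezoidal rule, a detail not specified here; the assessment days and daily scores are hypothetical.

```python
def area_under_curve(days, scores):
    """Trapezoidal area under a score-versus-day curve (assumed method)."""
    if len(days) != len(scores) or len(days) < 2:
        raise ValueError("need matching day and score lists of length >= 2")
    total = 0.0
    for i in range(1, len(days)):
        width = days[i] - days[i - 1]
        total += width * (scores[i] + scores[i - 1]) / 2
    return total


def mean_of_three_highest(scores):
    """Average of the three highest daily scores over the observation period."""
    if len(scores) < 3:
        raise ValueError("need at least three daily scores")
    top_three = sorted(scores, reverse=True)[:3]
    return sum(top_three) / 3


# Hypothetical daily scores for one patient (not study data)
days = [0, 1, 2, 3]
scores = [0, 1, 3, 2]
auc = area_under_curve(days, scores)         # 0.5 + 2.0 + 2.5 = 5.0
top3_mean = mean_of_three_highest(scores)    # (3 + 2 + 1) / 3 = 2.0
```

The area under the curve accumulates every assessment, whereas the average of the three highest values reflects only the peak of the course, which is consistent with its reported insensitivity to low grades of mucositis.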

    We believe that the mucositis scale presented can serve as an effective research tool for studies evaluating changes in the oral mucosa in response to chemotherapy or radiation therapy. The scale is easy to use. It showed high interobserver reproducibility, was responsive over time, and measured those elements deemed to be associated with mucositis. The use of concomitant symptomatic measurements by VAS and questionnaires is likely to be unnecessary because there was a strong correlation between objective findings and pain, swallowing, and eating.

    Acknowledgements

    The authors thank the following individuals who contributed to the design of the scale: W. Kohn, C. McGarigle, E. Peters, S. Porter, H. Stoetter, G. Vogelsang, and D. Weissman.
