• neurology;
  • patient-reported outcomes;
  • rating scales;
  • response category;
  • stability


Unambiguous use and interpretation of rating scale data assume that response categories are interpreted and work as intended. This study investigated the stability of interpretations of commonly used patient-reported rating scale response categories among people with neurological disorders.

Materials and methods

Forty-six people with neurological disorders (26 men; mean age, 57 years; Parkinson's disease, 50%; multiple sclerosis, 41%) indicated their interpretations of 21 response categories (representing frequencies, intensities, and levels of agreement) on 100-mm visual analog scales (VAS) on two occasions, ≥2 weeks apart. Data were analyzed using intraclass correlation coefficients and weighted kappa (ICC/Κw; values should exceed 0.4), mean/median differences, percentage agreement (PA), and the standard error of measurement (SEM).
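The study's analysis code is not given; as a minimal sketch, the test-retest ICC and SEM named above could be computed from paired VAS ratings along the following lines (function names are illustrative, and ICC(2,1) is assumed as the ICC form, which the abstract does not specify):

```python
import numpy as np

def icc_2_1(t1, t2):
    """Two-way random-effects, absolute-agreement ICC(2,1) for one test-retest pair.

    t1, t2: VAS ratings (mm) from occasions 1 and 2, one value per subject.
    """
    x = np.column_stack([t1, t2]).astype(float)
    n, k = x.shape                                   # subjects, occasions (k = 2)
    grand = x.mean()
    row_means = x.mean(axis=1)                       # per-subject means
    col_means = x.mean(axis=0)                       # per-occasion means
    msr = k * np.sum((row_means - grand) ** 2) / (n - 1)   # between-subjects MS
    msc = n * np.sum((col_means - grand) ** 2) / (k - 1)   # between-occasions MS
    sse = np.sum((x - row_means[:, None] - col_means[None, :] + grand) ** 2)
    mse = sse / ((n - 1) * (k - 1))                        # residual MS
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)

def sem_mm(t1, t2, reliability):
    """SEM = SD * sqrt(1 - reliability), pooling both occasions' ratings."""
    sd = np.std(np.concatenate([np.asarray(t1), np.asarray(t2)]), ddof=1)
    return sd * np.sqrt(1 - reliability)
```

Identical ratings on both occasions yield an ICC of 1 and an SEM of 0 mm; test-retest noise lowers the ICC and inflates the SEM accordingly.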


Results

Most response categories had ICC/Κw values <0.4. The overall average ICC/Κw was 0.279/0.294 (frequencies, 0.224/0.255; intensities, 0.265/0.251; levels of agreement, 0.362/0.376). The mean/median difference between time points across all 21 categories was 0.43/0.5 mm (mean/median absolute difference, 3.36/9 mm). The overall average PA and SEM were 6.5% and 16.1 mm, respectively.
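Κw and PA presuppose ordinal categories, so the continuous VAS responses must first be binned. The abstract does not state the binning used; a hedged sketch assuming 10-mm bins and linear kappa weights (both assumptions, not from the study):

```python
import numpy as np

def vas_to_bins(vas_mm, width=10):
    """Bin 0-100 mm VAS ratings into ordinal categories (bin width is an assumption)."""
    v = np.asarray(vas_mm, dtype=float)
    return np.minimum((v // width).astype(int), 100 // width - 1)

def percent_agreement(r1, r2):
    """PA: percentage of subjects placed in the same category on both occasions."""
    return 100.0 * np.mean(np.asarray(r1) == np.asarray(r2))

def weighted_kappa(r1, r2, n_cat):
    """Linear-weighted kappa for two ratings of the same subjects on an ordinal scale."""
    obs = np.zeros((n_cat, n_cat))
    for i, j in zip(r1, r2):
        obs[i, j] += 1
    obs /= obs.sum()                                 # observed joint proportions
    exp = np.outer(obs.sum(axis=1), obs.sum(axis=0)) # chance-expected proportions
    idx = np.arange(n_cat)
    w = np.abs(idx[:, None] - idx[None, :])          # linear disagreement weights
    return 1.0 - (w * obs).sum() / (w * exp).sum()
```

With this weighting, perfect test-retest agreement gives Κw = 1, chance-level agreement gives Κw ≈ 0, and systematic reversal gives Κw = -1, which is why values below 0.4 are read as low stability.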


Conclusions

The stability of interpretations of patient-reported rating scale response categories among people with neurological disorders was generally low. Categories expressing levels of agreement showed the best results, suggesting that these may be preferable when appropriate to the scale and its items. Future studies should consider response category interpretations in relation to various contexts. These observations suggest caution when interpreting raw rating scale data and argue for the use of modern rating scale methodologies such as the Rasch measurement model.