  • Hamilton Rating Scale for Depression;
  • interactive voice response program;
  • Japanese;
  • reliability;
  • validity


The aim of this study was to examine the reliability and validity of an Interactive Voice Response (IVR) program for rating the 17-item Hamilton Rating Scale for Depression (HAM-D) in Japanese patients with depression.


Depression severity was assessed in 60 patients by a clinician and by psychologists using the HAM-D. The IVR program scored the HAM-D on the same day and again on the following day. Test–retest reliability, internal consistency, and concurrent validity of the total HAM-D score were examined using the intraclass correlation coefficient, Cronbach's alpha, and Pearson's correlation coefficient, respectively. Inter-rater agreement on each HAM-D item was examined using Cohen's kappa.
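The four statistics named above can be computed from paired rating data roughly as follows. This is a minimal stdlib-only sketch, not the authors' analysis code: the abstract does not state which ICC form was used, so the sketch assumes ICC(2,1) (two-way random effects, absolute agreement, single measurement) and unweighted Cohen's kappa. The function names and the toy data are hypothetical and for illustration only.

```python
from collections import Counter
from math import sqrt
from statistics import mean, variance


def icc_2_1(scores):
    """ICC(2,1): two-way random effects, absolute agreement, single measure.

    Assumed form (not stated in the source). `scores` holds one row per
    patient, with that patient's total score on each occasion/rater.
    """
    n, k = len(scores), len(scores[0])
    grand = mean(x for row in scores for x in row)
    row_means = [mean(row) for row in scores]
    col_means = [mean(row[j] for row in scores) for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_err = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    )


def cronbach_alpha(items):
    """Alpha from one row per patient, one column per item score."""
    k = len(items[0])
    item_vars = [variance([row[j] for row in items]) for j in range(k)]
    total_var = variance([sum(row) for row in items])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def pearson_r(x, y):
    """Pearson correlation between two raters' total scores."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def cohen_kappa(a, b):
    """Unweighted kappa for two raters' scores on a single item.

    Undefined (division by zero) if both raters use only one category.
    """
    n = len(a)
    p_obs = sum(x == y for x, y in zip(a, b)) / n
    ca, cb = Counter(a), Counter(b)
    p_exp = sum(ca[c] * cb[c] for c in set(ca) | set(cb)) / n ** 2
    return (p_obs - p_exp) / (1 - p_exp)


if __name__ == "__main__":
    # Hypothetical toy data (NOT the study's data): 5 patients, 2 IVR sessions.
    ivr_day1 = [18, 22, 9, 30, 14]
    ivr_day2 = [17, 23, 11, 29, 15]
    print(icc_2_1([list(p) for p in zip(ivr_day1, ivr_day2)]))
    print(pearson_r(ivr_day1, ivr_day2))
```

ICC(2,1) is shown because it penalizes systematic shifts between occasions as well as random disagreement; a consistency-type ICC, such as ICC(3,1), would ignore such shifts.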


Test–retest reliability of the IVR program was high (intraclass correlation coefficient: 0.93). Internal consistency of the total scores obtained by the clinician, psychologists, and IVR program was high (Cronbach's alpha: 0.77, 0.79, 0.78, and 0.83). For concurrent validity, the correlation coefficients between total scores rated by the clinician and the IVR program, and between those rated by the clinician and the psychologists, were high (0.81 and 0.93, respectively). The HAM-D total score rated by the clinician was 3 points lower than that obtained by the IVR program. Inter-rater agreement on individual HAM-D items between the clinician and the IVR program was fair (Cohen's kappa: 0.02–0.50).
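The abstract does not say how the 3-point gap between clinician and IVR totals was computed. A plausible reading is the mean of the per-patient differences, which this minimal sketch computes together with their standard deviation. The function name and the example values are hypothetical.

```python
from statistics import mean, stdev


def mean_paired_difference(ivr, clinician):
    """Mean and SD of per-patient differences (IVR minus clinician).

    Assumption: the reported gap is a mean paired difference. A positive
    mean means the IVR program scores higher than the clinician on
    average, i.e. it tends to overestimate severity.
    """
    if len(ivr) != len(clinician):
        raise ValueError("need one IVR and one clinician score per patient")
    diffs = [a - b for a, b in zip(ivr, clinician)]
    return mean(diffs), stdev(diffs)


# Hypothetical toy totals (NOT the study's data).
print(mean_paired_difference([20, 15, 10], [17, 12, 7]))
```

The SD of the differences is included because a small mean bias can still hide large disagreements for individual patients.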


Our results suggest that the Japanese IVR HAM-D program is reliable and valid for assessing the 17-item HAM-D total score in Japanese patients with depression. However, the current program tends to overestimate depression severity, and its item-level scores did not always agree closely with the clinician's ratings, indicating that the program needs further improvement.