In studies on atopic dermatitis (AD), different scoring systems are used to evaluate the severity of the disease. The objectives of this study were to investigate interobserver agreement in assessing the overall severity of AD and interobserver variation in assessing each scoring item separately, using the Simple Scoring System (SSS), the Scoring Atopic Dermatitis (SCORAD) index, and the Basic Clinical Scoring System (BCSS), and to investigate agreement between these three scoring systems in assessing the overall severity of AD. Eighty-two patients with AD (42 male; mean age 13.4 years, range 0.2−67.0) were included. In 34 of these 82 patients, two physicians each scored AD severity with all three scoring systems to determine interobserver agreement on the overall severity scores and interobserver variation on each scoring item. To determine agreement between the scoring systems, one physician scored AD severity in all 82 patients with each of the three systems. Agreement between observers and between the three scoring systems was calculated with Cohen's kappa (κ) and with the method of Bland & Altman (mean difference and limits of agreement). κ>0.4 was taken to represent fair agreement and κ>0.75 excellent agreement. In addition, interobserver variation for each scoring item was assessed with the Wilcoxon signed-rank test. The mean differences (d) and the limits of agreement (d±2 SD of the differences) between observers were −0.82±5.58 for the SSS and −0.28±7.49 for the SCORAD. κ between observers for the BCSS was 0.90 (95% CI 0.79−1.03). With the SSS, significant interobserver variation was found in assessing the severity of excoriations (P=0.02) and scales (P=0.02). With the SCORAD, significant interobserver variation was found in assessing the severity of edema/papulation (P=0.04), erythema (P=0.04), and excoriations (P=0.01).
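The Bland & Altman summary used above (mean difference d and limits of agreement d±2 SD of the paired differences) can be sketched in a few lines. This is a minimal illustration, not the study's analysis code: the function name `bland_altman`, the subtraction order (first observer minus second), the use of the sample SD (n − 1 denominator), and the example scores are all hypothetical assumptions.

```python
from statistics import mean, stdev


def bland_altman(scores_a, scores_b):
    """Return (d, half_width) for paired scores; the limits of agreement
    are d - half_width and d + half_width.

    Assumptions (hypothetical): differences are scores_a - scores_b, and
    SD is the sample standard deviation (n - 1 denominator).
    """
    if len(scores_a) != len(scores_b) or len(scores_a) < 2:
        raise ValueError("need two equal-length lists with at least 2 pairs")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    d = mean(diffs)                # mean difference between observers
    half_width = 2 * stdev(diffs)  # 2 SD of the differences, as in the abstract
    return d, half_width


# Hypothetical total scores from two observers for four patients.
observer_1 = [10, 20, 30, 40]
observer_2 = [12, 19, 33, 41]
d, half = bland_altman(observer_1, observer_2)
print(f"{d:.2f}±{half:.2f}")  # prints -1.25±3.42
```

With these made-up scores the differences are −2, 1, −3 and −1, so d = −1.25 and 2 SD ≈ 3.42, written −1.25±3.42 in the abstract's notation. The sign of d depends on which observer is subtracted, which this abstract does not specify.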
No significant interobserver variation was found in assessing the extent of AD. The mean difference and the limits of agreement between the SSS and the SCORAD were −4.17±9.52. κ between the SSS and the BCSS was 0.21 (95% CI 0.09−0.33), and κ between the SCORAD and the BCSS was 0.38 (95% CI 0.26−0.51). We found good interobserver agreement on the overall severity of AD at both lower and higher scores with the SSS and the SCORAD, and excellent agreement with the BCSS. Significant interobserver variation was found for the individual intensity items scales, excoriations, edema/papulation, and erythema. We found poor agreement between the three scoring systems in assessing the overall severity of AD, indicating that the SSS, the SCORAD, and the BCSS cannot be used interchangeably to assess the overall severity of AD.
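Cohen's κ compares the observed proportion of agreement with the agreement expected by chance from each rater's marginal category frequencies. The sketch below computes unweighted κ and labels it with the cut-offs stated in the abstract. It is an illustration under assumptions: the abstract does not say whether weighted or unweighted κ was used, nor how continuous SSS or SCORAD totals were grouped into categories for comparison with the BCSS, so the function names, the "poor" label for κ≤0.4 (inferred from the conclusion), and the example ratings are hypothetical.

```python
from collections import Counter


def cohens_kappa(ratings_a, ratings_b):
    """Unweighted Cohen's kappa for two raters' categorical ratings."""
    n = len(ratings_a)
    if n == 0 or n != len(ratings_b):
        raise ValueError("need two non-empty rating lists of equal length")
    # Observed agreement: share of patients given the same category by both raters.
    p_observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Chance agreement: products of the raters' marginal counts, summed over categories.
    count_a, count_b = Counter(ratings_a), Counter(ratings_b)
    p_expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if p_expected == 1:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (p_observed - p_expected) / (1 - p_expected)


def interpret_kappa(kappa):
    """Label kappa with the abstract's cut-offs; 'poor' for <= 0.4 is inferred."""
    if kappa > 0.75:
        return "excellent"
    if kappa > 0.4:
        return "fair"
    return "poor"


# Hypothetical severity categories assigned by two raters to six patients.
rater_1 = ["mild", "mild", "moderate", "severe", "moderate", "mild"]
rater_2 = ["mild", "moderate", "moderate", "severe", "moderate", "mild"]
k = cohens_kappa(rater_1, rater_2)
print(round(k, 3), interpret_kappa(k))  # prints 0.739 fair
```

Here the raters agree on 5 of 6 patients (observed agreement 30/36), chance agreement is (3·2 + 2·3 + 1·1)/36 = 13/36, and κ = 17/23 ≈ 0.739. Applied to the reported values, the same cut-offs label 0.90 as excellent and 0.21 and 0.38 as poor, matching the abstract's conclusions.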