Measuring the severity of autism, both at a single time point and over time, is a widespread challenge for researchers. Recently, Gotham, Pickles, and Lord published a severity metric (calibrated severity scores; CSS) that takes into account age and language level and is based on raw total scores of the Autism Diagnostic Observation Schedule (ADOS), a standardized measure commonly used in autism diagnosis. The present study examined psychometric characteristics of the CSS compared to raw scores in an independent sample of 368 children aged 2 to 12 years with autism, pervasive developmental disorder-not otherwise specified (PDD-NOS), non-spectrum delay, or typical development. Reflecting the intended calibration, the CSS were more uniformly distributed within clinical diagnostic category and across ADOS modules than were raw scores. Cross-sectional analyses of raw and severity scores and their relationships to participant characteristics revealed that verbal developmental level was a significant predictor of raw score but accounted for significantly less variance in the CSS. Longitudinal analyses indicated overall stability of the CSS over 12 to 24 months in children with autism. Findings from this study support the use of the CSS as a more valid indicator of autism severity than the ADOS raw total score, and extend the literature by examining the stability of the CSS over 12 to 24 months in children with ASD. Autism Res 2012, 5: 267–276. © 2012 International Society for Autism Research, Wiley Periodicals, Inc.