A framework for evaluation and use of automated scoring of constructed-response tasks is provided that entails both evaluation of automated scoring as well as guidelines for implementation and maintenance in the context of constantly evolving technologies. Consideration of validity issues and challenges associated with automated scoring are discussed within the framework. The fit between the scoring capability and the assessment purpose, the agreement between human and automated scores, the consideration of associations with independent measures, the generalizability of automated scores as implemented in operational practice across different tasks and test forms, and the impact and consequences for the population and subgroups are proffered as integral evidence supporting use of automated scoring. Specific evaluation guidelines are provided for using automated scoring to complement human scoring for tests used for high-stakes purposes. These guidelines are intended to be generalizable to new automated scoring systems and as existing systems change over time.