In the design of new enzymes and binding proteins, human designers often use intuition to modify computationally designed amino acid sequences before experimental characterization. These manual changes include both reversions of mutated positions to the residue present in the parent scaffold and the introduction of residues that make additional interactions with the binding partner or back up first-shell interactions. Automating this manual sequence refinement would allow more systematic evaluation and considerably reduce the designer effort required. Here we introduce a benchmark for evaluating the ability of automated methods to recapitulate the sequence changes made to computer-generated models by human designers, and use it to assess alternative computational methods. We find the best performance for a greedy one-position-at-a-time optimization protocol that uses metrics (such as shape complementarity) and local refinement methods that are too computationally expensive for global Monte Carlo (MC) sequence optimization. This protocol should be broadly useful for improving the stability and function of designed binding proteins. Proteins 2014; 82:858–866. © 2013 Wiley Periodicals, Inc.
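
The greedy one-position-at-a-time idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function `greedy_refine`, the `allowed` mapping, and the placeholder `toy_score` are hypothetical names introduced here for illustration. In a real setting the `score` callable would stand in for an expensive evaluation (for example, local refinement followed by a shape complementarity calculation), which is affordable here because only one position is varied at a time.

```python
from typing import Callable, Dict, Iterable, Optional, Tuple

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 canonical residues


def greedy_refine(
    sequence: str,
    positions: Iterable[int],
    score: Callable[[str], float],
    allowed: Optional[Dict[int, str]] = None,
    max_rounds: int = 10,
) -> Tuple[str, float]:
    """Greedy one-position-at-a-time refinement (lower score is better).

    At each position, every allowed residue is tried and the single best
    substitution is kept only if it improves the current score. Passes over
    the positions repeat until no substitution helps or max_rounds is hit.
    """
    positions = list(positions)
    current = list(sequence)
    best = score("".join(current))
    for _ in range(max_rounds):
        improved = False
        for pos in positions:
            original = current[pos]
            best_res = original
            candidates = allowed.get(pos, AMINO_ACIDS) if allowed else AMINO_ACIDS
            for aa in candidates:
                if aa == original:
                    continue
                current[pos] = aa
                trial = score("".join(current))  # the expensive step in practice
                if trial < best:
                    best, best_res = trial, aa
            current[pos] = best_res  # keep the best residue, or revert
            if best_res != original:
                improved = True
        if not improved:
            break
    return "".join(current), best


# Hypothetical placeholder score: mismatches to an arbitrary target sequence.
# A real protocol would evaluate a structural model instead.
TARGET = "ACDA"


def toy_score(seq: str) -> float:
    return float(sum(a != b for a, b in zip(seq, TARGET)))


if __name__ == "__main__":
    print(greedy_refine("AAAA", range(4), toy_score))
```

Because each pass evaluates at most 19 substitutions per position, the number of score calls grows linearly with the number of positions, which is what makes costly per-candidate metrics practical compared with a global Monte Carlo search over the full sequence.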