Our programmatic article on Homo heuristicus (Gigerenzer & Brighton, 2009) included a methodological section specifying three minimum criteria for testing heuristics: competitive tests, individual-level tests, and tests of adaptive selection of heuristics. Using Richter and Späth’s (2006) study on the recognition heuristic, we illustrated how violations of these criteria can lead to unsupported conclusions. In their comment, Hilbig and Richter conduct a reanalysis, but again without competitive testing. They neither test nor specify the compensatory model of inference they argue for. Instead, they test whether participants use the recognition heuristic in an unrealistic 100% (or 96%) of cases, report that only some people exhibit this level of consistency, and conclude that most people would follow a compensatory strategy. We know of no model of judgment that predicts 96% correctly. The curious methodological practice of adopting an unrealistic measure of success to argue against a competing model, and to interpret such a finding as a triumph for a preferred but unspecified model, can only hinder progress. Marewski, Gaissmaier, Schooler, Goldstein, and Gigerenzer (2010), in contrast, specified five compensatory models, compared them with the recognition heuristic, and found that the recognition heuristic predicted inferences most accurately.