High-grading bias: subtle problems with assessing power of selected subsets of loci for population assignment



Robin S. Waples, Fax: +1 206 860 3335; E-mail: robin.waples@noaa.gov


Recognition of the importance of cross-validation (‘any technique or instance of assessing how the results of a statistical analysis will generalize to an independent dataset’; Wiktionary, en.wiktionary.org) is one reason that the U.S. Securities and Exchange Commission requires all investment products to carry some variation of the disclaimer, ‘Past performance is no guarantee of future results.’ Even a cursory examination of financial behaviour, however, demonstrates that this warning is regularly ignored, even by those who understand what an independent dataset is. In the natural sciences, an analogue to predicting future returns for an investment strategy is predicting how well a particular algorithm will perform with new data. Once again, the key to developing an unbiased assessment of future performance is testing with independent data, that is, data that were in no way involved in developing the method in the first place. A ‘gold-standard’ approach to cross-validation is to divide the data into two parts, one used to develop the algorithm, the other used to test its performance. Because this approach substantially reduces the sample size that can be used in constructing the algorithm, researchers often try other variations of cross-validation to accomplish the same ends. As illustrated by Anderson in this issue of Molecular Ecology Resources, however, not all attempts at cross-validation produce the desired result. Anderson used simulated data to evaluate the performance of several software programs designed to identify subsets of loci that can be effective for assigning individuals to their population of origin based on multilocus genetic data. Such programs are likely to become increasingly popular as researchers seek ways to streamline routine analyses by focusing on small sets of loci that contain most of the desired signal.
Anderson found that although some of the programs made an attempt at cross-validation, all failed to meet the ‘gold standard’ of using truly independent data and therefore produced overly optimistic assessments of the power of the selected set of loci, a phenomenon known as ‘high-grading bias.’
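The effect can be illustrated with a minimal simulation sketch. It is not Anderson's simulation, nor the procedure of any of the programs he evaluated; the function names, the linear assignment rule and all parameter values are assumptions chosen for illustration. Two populations are simulated with identical allele frequencies, so the true power to assign individuals is no better than chance. The same held-out individuals are then scored twice: once with loci selected using all individuals (test individuals included), and once with loci selected using the training individuals only.

```python
import random


def simulate_population(n_individuals, n_loci, rng, allele_freq=0.5):
    """Genotypes are allele counts (0, 1 or 2) at unlinked biallelic loci."""
    return [
        [(rng.random() < allele_freq) + (rng.random() < allele_freq)
         for _ in range(n_loci)]
        for _ in range(n_individuals)
    ]


def locus_means(individuals, n_loci):
    """Mean allele count at each locus."""
    n = len(individuals)
    return [sum(ind[l] for ind in individuals) / n for l in range(n_loci)]


def select_loci(pop_a, pop_b, n_loci, k):
    """Keep the k loci with the largest absolute difference in mean allele count."""
    mean_a = locus_means(pop_a, n_loci)
    mean_b = locus_means(pop_b, n_loci)
    ranked = sorted(range(n_loci),
                    key=lambda l: abs(mean_a[l] - mean_b[l]), reverse=True)
    return ranked[:k]


def assignment_accuracy(train_a, train_b, test_a, test_b, loci, n_loci):
    """Fit a simple linear rule on the training halves; score the test halves."""
    mean_a = locus_means(train_a, n_loci)
    mean_b = locus_means(train_b, n_loci)

    def assigned_to_a(ind):
        score = sum((mean_a[l] - mean_b[l]) * (ind[l] - (mean_a[l] + mean_b[l]) / 2)
                    for l in loci)
        return score > 0

    correct = sum(assigned_to_a(ind) for ind in test_a)
    correct += sum(not assigned_to_a(ind) for ind in test_b)
    return correct / (len(test_a) + len(test_b))


def compare_designs(seed, n_per_pop=40, n_loci=500, k=20):
    """Return (high-graded accuracy, independent accuracy) for one replicate."""
    rng = random.Random(seed)
    pop_a = simulate_population(n_per_pop, n_loci, rng)
    pop_b = simulate_population(n_per_pop, n_loci, rng)
    half = n_per_pop // 2
    train_a, test_a = pop_a[:half], pop_a[half:]
    train_b, test_b = pop_b[:half], pop_b[half:]

    # High-graded: loci chosen with ALL individuals, including the test half.
    loci_all = select_loci(pop_a, pop_b, n_loci, k)
    high_graded = assignment_accuracy(train_a, train_b, test_a, test_b,
                                      loci_all, n_loci)

    # Independent: loci chosen with the training half only.
    loci_train = select_loci(train_a, train_b, n_loci, k)
    independent = assignment_accuracy(train_a, train_b, test_a, test_b,
                                      loci_train, n_loci)
    return high_graded, independent


def mean_accuracies(n_replicates=10):
    """Average both accuracy estimates over several seeded replicates."""
    results = [compare_designs(seed) for seed in range(n_replicates)]
    high_graded = sum(r[0] for r in results) / n_replicates
    independent = sum(r[1] for r in results) / n_replicates
    return high_graded, independent


if __name__ == "__main__":
    high_graded, independent = mean_accuracies()
    print(f"loci selected with all data:      {high_graded:.2f}")
    print(f"loci selected with training only: {independent:.2f}")
```

Because the simulated populations do not differ, any assignment accuracy well above 0.5 in this sketch is an artifact of letting the test individuals influence which loci were chosen, even though they played no part in fitting the assignment rule itself.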