Abstract


The most important fact emerging from the combination of my article and the three commentaries is the consensus judgment that content validity is appropriate scientifically and professionally for use with tests of specific cognitive skills used in job performance. This is important because the 1978 Uniform Guidelines on Employee Selection Procedures have typically been interpreted as not permitting such usage, and this is particularly the case in the interpretation given to the Guidelines by federal government enforcement agencies. Although the Society for Industrial and Organizational Psychology Principles and the Standards do not prohibit such usage, many industrial–organizational psychologists believe that it is not professionally or scientifically appropriate to employ content validity methods with cognitive measures. The hope is that this series will convince them otherwise. On this point, all four authors in the series are in agreement. The major disagreement among us concerns whether specific cognitive skills used in content valid tests must be considered constructs or not. My position, and apparently that of Kehoe, is that they need not be so considered. I argue that constructs must be invoked only in the context of a substantive theory. Sackett and Ployhart, on the other hand, argue that all measures taken on people must be viewed as constructs, regardless of whether any theoretical propositions and assumptions are involved. In this response, I present reasons why this need not be the case.

1. Introduction


The comments by Kehoe (2012), Ployhart (2012), and Sackett (2012) are thoughtful and insightful, and I appreciate both their quality and the effort that went into producing them. The big picture in these comments is the fact that all three authors agree that, from both a scientific and a professional point of view, cognitive measures can have content validity. This is important because all three commentators are highly respected in the field of industrial–organizational (I/O) psychology and have made important contributions to the field. This consensus will go a long way toward establishing a more uniform position on this question in I/O psychology and beyond. It will now be much harder for anyone to advance and defend the position that content validity is not appropriate for application to measures of cognitive skills.

This consensus is in stark contrast to some of the earlier reactions to my paper. About half the responses went something like this: ‘The position taken in this paper is too radical to be acceptable. It is contrary to both the Uniform Guidelines [Equal Employment Opportunity Commission et al., 1978] and to the professional standards of our field.’ The other half of the responses were the polar opposite and went something like this: ‘There is nothing new in this paper. I/O psychologists already know and accept the idea that cognitive measures are appropriate for content validity, and this is reflected in common I/O practice.’ It seemed to me that the obvious conclusion was that the field was divided on this question. My hope is that the publication of the articles in this section will lead to a uniform stance in the field that content validity is appropriate for measures of cognitive skills.

I now move on to examine each of the commentaries.

2. Kehoe (2012)


Kehoe used a schematic adapted from Binning and Barrett (1989) to clarify and extend the ideas I presented in my article. This was very effective and will, I think, help readers to better understand my article. I am almost tempted to say that he did a better job of presenting my ideas than I did. Certainly, his presentation is clearer in some respects. It does a particularly good job of clarifying the separation between evidence for content validity and the empirical evidence from validity generalization studies that cognitive measures, particularly general cognitive ability (GCA) measures, predict future job performance. He argues that the content validity process does not per se provide any specific evidence that the resulting test will predict future job performance. Instead, the evidence for such prediction comes from the completely separate theoretical and empirical base of meta-analytic validity generalization findings. He views the validity generalization evidence as ‘supplementing’ the credibility of the content validity evidence. I agree but believe that this supplementation need not always be invoked, presented, or stressed in applied settings. Doing so in practice may be useful in arguing for what Ployhart (2012) calls ‘overall validity,’ but it is optional. For many purposes, the content validity evidence alone might be sufficient. This is the position taken by the 2003 Society for Industrial and Organizational Psychology (SIOP) Principles. Also, if both lines of validity evidence are presented, they can be viewed as separate and equal streams of evidence for validity. One need not be viewed as merely a supplement to the other.

According to Kehoe, the only point of intersection between the content validity process and the validity generalization process is the designation of the skills included in the content valid test as being cognitive in nature. That is, the conclusion that these skills are cognitive in nature is based on research on human ability constructs. While he is correct, I would add two considerations. First, even without input from this area of research, most people – even laymen – would probably recognize the skills in question as being cognitive (i.e., as involving thinking, problem solving, etc.). Second, strictly speaking, the content validity process does not require that these skills be designated or labeled as ‘cognitive.’ Such labeling need not be part of the content validity process per se. However, if one wishes to invoke the meta-analytic validity generalization evidence, it is necessary to designate the skills measured as cognitive.

Kehoe's comments reflect a thorough understanding of the fact that the content validity process and the empirical criterion-related validity process are entirely separate. They are parallel but separate. They are both important, but they do not intermix. As discussed later, my impression is that both Ployhart and Sackett challenge this separation, in particular in their contention that the basic job-specific cognitive skills incorporated into content valid tests must be considered constructs. My contention is that while these skills are to be viewed as constructs in the empirical validity research program, they need not be viewed as constructs in the content validity context.

There are two positions taken in Kehoe's comments that I believe can be misleading. The first is his statement that ‘While virtually all work behaviors have a cognitive component,’ there are some jobs for which ‘The theoretical framework of cognitive ability may have little practical, incremental value over content validity evidence.’ He cites typing as such a job because in this job, ‘job performance may be virtually equivalent to the skilled behavior operationalized in the (typing) test.’ The implication here is that in such a job, GCA would have no incremental validity over the test of typing skills. In my view, this would hold true only if the job involved nothing more than rote typing. I doubt that any such job exists. In almost all typing jobs, the incumbent must learn new features of word processors, must learn special symbols used in equations or numerical presentations, must remember and keep track of which individuals gave him or her which typing jobs, must learn new workflow procedures, and so on. All these tasks draw on GCA, and hence, GCA measures would have incremental validity over a typing test.

The second concerns subgroup differences. Kehoe accepts the conclusion that several specific cognitive skills included in a content valid test constitute a de facto measure of GCA. But he postulates that racial or ethnic differences might be smaller on job-specific skill/aptitude tests than on GCA tests. Stated another way, he in effect predicts that racial differences will be smaller on content valid job-specific cognitive tests than on GCA measures, even though the former are also GCA measures. But if both are in fact GCA measures, it seems likely that they would show similar group differences, presuming comparable reliabilities. In support of his hypothesis, Kehoe notes that standardized mean racial differences found in the literature for work samples, job knowledge tests, training program performance, and job performance ratings tend to be smaller than reported differences on GCA tests. I see difficulties here. First, work samples and training performance measures are not tests and so are not directly comparable. Second, none of these measures, including job knowledge tests, are measures of the specific cognitive skills that would make up the content valid tests discussed in my article and so are not comparable. Also, the racial differences on the measures he cites are typically obtained on incumbents and are thus substantially reduced by differential range restriction, as shown in the research by Phil Roth and his coauthors (Roth, Bobko, Switzer, & Dean, 2001; Roth, Huffcutt, & Bobko, 2003). On the other hand, differences on GCA tests are often calibrated on applicant or general population samples, which would cause them to be larger. Finally, differences in reliability can greatly affect the obtained standardized mean differences, as demonstrated by Sackett, Schmitt, Ellingson, and Kabin (2001). For example, the lower reliability of job performance ratings and work sample measures could be the cause of the smaller obtained standardized mean differences. In short, I would expect that once any reliability differences are taken into account, racial and ethnic differences on content valid measures of a combination of several job-specific cognitive skills will not differ from those on traditional measures of GCA. However, this is a question for future research.
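To make the reliability point concrete: under classical test theory, an observed standardized mean difference is attenuated by the square root of the measure's reliability, so less reliable measures yield smaller observed differences even when the underlying difference is the same. The following is a minimal sketch of this attenuation; the reliability values are illustrative assumptions, not estimates taken from the studies cited above.

```python
import math

def observed_d(d_true: float, reliability: float) -> float:
    """Expected observed standardized mean difference (Cohen's d) when the
    measure has the given reliability: d_obs = d_true * sqrt(r_xx)."""
    return d_true * math.sqrt(reliability)

# Hypothetical reliabilities (assumptions for illustration only): the same
# underlying 1.0 SD difference looks progressively smaller as reliability drops.
for label, r_xx in [("paper-and-pencil GCA test", 0.90),
                    ("job knowledge test", 0.80),
                    ("supervisory ratings of job performance", 0.52)]:
    print(f"{label:40s} r_xx = {r_xx:.2f}  observed d = {observed_d(1.0, r_xx):.2f}")
```

Under these assumed values, the same true difference of 1.0 SD would appear as roughly 0.95, 0.89, and 0.72 on the three measures, which is the pattern that must be ruled out before smaller observed differences can be attributed to the measures assessing something other than GCA.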

These points of apparent disagreement are not major, but I believe they merit being addressed.

3. Ployhart (2012)


Ployhart's commentary is a thoughtful and clear overview of the broad nature of the concept of validity, and I am basically in agreement with most of his arguments. The major point of disagreement seems to focus on whether the specific cognitive skills included in a content valid test must be viewed as assessing constructs. Ployhart believes they must be, while my position is that in the content validity context, they need not be viewed as constructs. Ployhart's position on this is also taken by Sackett in his commentary; to avoid repetition, I will address this question only in the section discussing Sackett's commentary. In the present section, I discuss other aspects of Ployhart's commentary.

Ployhart wishes I had distinguished more clearly between the latent GCA variable and its indicators. He emphasizes the valid point that a wide variety of indicators can be used to assess GCA, which is itself a latent factor that can be measured only via indicators that are caused by the latent GCA variable. While he may be right about a lack of clarity in my article, in a selection or other applied situation, it is not possible to employ the latent variable itself; one must use a combination of indicator variable scores (e.g., verbal, quantitative, and technical scores). Hence, the focus of my discussion was on scores used in applied practice and the predictive validity of those scores. This is also the focus of validity generalization research: reported validities are not corrected for measurement error in the tests. Also, I believe that figure 1 and its associated discussion in my article do bring out the concept of indicator variables and their importance. But Ployhart's point is a valid one, and in another article (Schmidt, 2011), I provide an extended discussion of the distinction between the construct of GCA and its empirical indicators and the relevance of this distinction for both theory and practice.
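The latent-indicator distinction can be illustrated with a small simulation. The sketch below is illustrative only; the loadings and indicator labels are assumptions, not estimates from any study cited here. Indicator scores are generated as a common latent factor plus unique error, and the unit-weighted composite of the indicators, which is what applied selection actually uses, correlates highly but imperfectly with the latent variable.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Assumed single-factor model: each indicator = loading * g + unique error.
g = rng.standard_normal(n)             # latent GCA (never directly observed)
loadings = np.array([0.8, 0.7, 0.6])   # hypothetical loadings for verbal,
                                       # quantitative, and technical indicators
unique = rng.standard_normal((n, 3)) * np.sqrt(1 - loadings**2)
indicators = g[:, None] * loadings + unique   # the observed scores

composite = indicators.sum(axis=1)            # what a selection battery actually reports

print("composite-to-latent correlation:",
      round(float(np.corrcoef(composite, g)[0, 1]), 3))
print("indicator intercorrelations:\n",
      np.corrcoef(indicators, rowvar=False).round(2))
```

The point of the sketch is simply that scores used in practice are fallible composites of indicators, which is why validity generalization findings are reported for observed scores rather than for the latent variable itself.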

The main thrust of Ployhart's commentary is a presentation of the broadest possible view of validity, based mostly on the ideas of Messick (1995). This broad view encompasses content validity evidence (two types), the response processes involved in the test, the internal structure of the test, relations with other variables (which include convergent, discriminant, and criterion-related validity), and the controversial concept of ‘consequential validity.’ Although I accept this broad characterization of potentially relevant validity evidence as appropriate, it goes beyond the focus of my paper. My contention is that while these are all aspects of validity broadly defined, not all such aspects are relevant in every applied situation. For example, there are situations in which content validity alone is sufficient to justify use of a test, both scientifically and professionally. This is also the position taken by the SIOP Principles. It is possible that Ployhart's focus on all the various types of evidence that can be relevant to the question of validity, broadly defined, is what leads him to adopt the position (discussed later) that the specific cognitive skills used in content valid selection tests must be considered constructs.

Ployhart (in his abstract) states that the content validity of tests of job-specific cognitive skills does not provide a useful or solid foundation for supporting the overall criterion-related validity of GCA. And he states this conclusion again in his list of final conclusions: ‘Content validity evidence does not provide strong evidence for the overall validity of GCA score.’ This statement, while true, appears to me to be something of a red herring because there is no need for such support from content validity. The evidence for the predictive validity of GCA measures for all jobs is found in the extensive validity generalization research (e.g., see Hunter, Schmidt, & Le, 2006; Schmidt, 2002). It is possible that the point intended by Ployhart is the same as that made by Kehoe, namely, that content validity does not directly guarantee criterion-related validity but that criterion-related validity is well established by the large body of validity generalization research. However, he states, ‘This leads to a bigger question – does content validity evidence provide a useful means for supporting the overall validity of GCA? This question is important because it is often not feasible to conduct an appropriate criterion-related study.’ But in light of the vast literature showing the generalizable validity of reliable GCA measures, there is no need for a situational criterion-related validity study. To assert otherwise would amount to endorsing the discredited situational specificity hypothesis. So I find it difficult to interpret the intended meaning of this aspect of the Ployhart commentary.

Ployhart takes issue with the definition of GCA as the ability to learn. His rationale for this is that GCA includes, in addition to the ability to learn, the ability to ‘perceive, interpret, manipulate, store, retrieve, and respond to data and information (see Jensen, 1998).’ GCA does include all these (and more), but these mental processes are all part of the process of learning; they are all used in the service of learning. In fact, there are so many of these GCA-related processes involved in learning (and in using that learning in performance) that from a complete theoretical point of view, it would be virtually impossible to list all of them. Hence, as a practical matter, it is appropriate to define GCA as the ability to learn.

Ployhart states that ‘the U.S. Federal Government’ endorses the use of cognitive skills in content validity because this appears to be an assumption underlying the Occupational Information Network (O*Net) system. I spent 11 years in Washington, DC, directing a selection research program for the US Office of Personnel Management. One of the first things I learned is that there is no such thing as the federal government – there is only a large number of federal agencies and departments operating independently and often in contradiction and at cross-purposes. That is the case here. O*Net is housed in the Department of Labor (in the US Jobs Service), and so is the Office of Federal Contract Compliance Programs (OFCCP). The US Jobs Service endorses the use of content validity with cognitive measures, while OFCCP's policy is the opposite. OFCCP has repeatedly challenged the use of content validity with cognitive content. The positions of the Equal Employment Opportunity Commission (EEOC) and the Civil Rights Division of the Department of Justice, and that of the Uniform Guidelines on Employee Selection Procedures (UGESP), are the same as that of OFCCP. So the endorsement of content validity methods by one component of one federal agency cannot be interpreted as endorsement by ‘the U.S. Federal Government.’

4. Sackett (2012)


Sackett's commentary, like those of Kehoe and Ployhart, is thoughtful and insightful, and I am in agreement with most of the points he makes. The key point of disagreement concerns whether or not all measures in all contexts must be interpreted as reflecting constructs. He maintains that they must be, while I maintain that this need not be the case in the context of demonstrations of content validity. But first I discuss three lesser points.

Sackett states that in contrast to section 14C of the UGESP, two of the Questions and Answers (Q&As) that accompany the UGESP appear to leave an opening for the use of content validity with cognitive measures. Q&A 73 appears to allow use of measures of knowledge, skill, and ability so long as these are ‘observable behaviors.’ Q&A 75 states that some selection procedures may actually be ‘observable behaviors’ despite having been given trait or construct labels. Sackett states that these two Q&As have been the basis for some content validity arguments for measures in the cognitive domain, and that is correct. But as pointed out in my article, the EEOC and OFCCP have interpreted the word ‘observable’ in a manner inconsistent with scientific research. By observable, they mean visible (i.e., able to be seen with the eyes), whereas in science, observation means measurement, not visual inspection. For example, we cannot see an electrical current, but we can measure it and thereby observe it in the scientific sense. Under this interpretation of ‘observable,’ it is difficult to see how cognitive skills could be made visually observable. Hence, I do not interpret these two Q&As as allowing the use of content validity with cognitive measures.

Sackett points out that my summary of the argument against the use of content validity with cognitive measures includes the statement that such use violates professional standards. He goes on to show that this is not true with regard to the SIOP Principles (Society for Industrial and Organizational Psychology, 2003) or the Standards (American Educational Research Association, American Psychological Association, & National Council on Measurement in Education, 1999). His analysis is correct. However, the relevant statement in my article was intended as a statement of beliefs that are common among I/O psychologists, and based on my experience, I believe it is an accurate statement of such beliefs. I would conclude that such beliefs indicate that many I/O psychologists are either not adequately familiar with the current versions of their professional standards or are familiar with them but reject them.

Sackett challenges my assertion that ‘construct validation is required if the theory has not been independently verified, i.e., construct validation is required if the measurement operations must be validated along with the theory itself.’ He maintains that construct validation may be required even if the theory in which the construct is embedded is already well supported. He states that there are two types of construct validity questions. First, does the construct ‘exist’ (‘e.g., can one define a construct labeled “integrity” and differentiate it from other constructs?’). Second, there are questions about the adequacy of a given measure of the construct. He concludes that even if the theory is well supported, the second question remains relevant. I find this argument puzzling. Defining and differentiating the construct from other constructs requires two things: (a) construction of the theory of which the construct is a part (this includes creating a conceptual definition of the construct); and (b) demonstrating the discriminant validity of the construct measure, which requires a measure of the construct. Likewise, the development of support for the theory requires measures of the constructs contained in the theory, and Sackett's argument assumes the prior existence of such support. It is well accepted that the process of developing support for a theory involves the simultaneous validation of both the theory and the measures of the constructs included in the theory. This is the process I described in the statement that Sackett referred to. The situation described by Sackett would occur only if a new way of measuring a construct included in a well-supported theory were proposed (e.g., measuring the construct of ‘field dependence’ using the embedded figures test instead of earlier, more expensive measures such as the tilting room). In that case, if the theory-based predictions were not met for the new measure, the only possibility is that the measure is inadequate; the theory would not be questioned, given the substantial prior support for the theory itself. But in a case of this sort, the theory and the (initial) measure of the construct would already have been validated simultaneously, which is the process described in my statement.

Both Sackett and Ployhart take issue with my statement that ‘The abilities measured in cognitive employment tests are not constructs’. After rereading this section of my article, I find I must take responsibility for a lack of clarity in this respect. I hope I can be clearer this time. Whether or not a measure of a cognitive skill is or must be viewed as a measure of a construct depends on context. Within the context of the theoretical model (hierarchical model) of the organization of human abilities and the relation of GCA tests to job and other performances, measures of cognitive skills (i.e., those at the bottom of the hierarchy; see figure 1 in my article) must obviously be viewed as measures of constructs because the whole theoretical structure is built on ability constructs. And this will be the case within any theoretical context. But this is not the case in a content validity context because content validity does not need to incorporate any substantive theoretical model or structure. The concept of a construct does not need to be involved because no substantive theoretical propositions or implications need to be advanced or relied on. As noted in my article, constructs exist as components of theories and theoretical explanations; they need not be invoked in an atheoretical context such as content validity, in which they would have ‘surplus meaning.’ In the context of content validity, a job-specific cognitive skill is just a job-specific skill; it need not be viewed as a construct (as the old saying goes, sometimes a cigar is just a cigar). In fact, doing so could be counterproductive because it would introduce surplus meaning – meaning and attributes drawn from theories that are not part of, and are not relevant to, the content validity context. For example, consider the specific cognitive skill of mental arithmetic in such a content valid test. It is a fact, established in other, theoretically based research (including the general finding of positive manifold for all ability measures and constructs), that people who are better at mental arithmetic also generally have better reading comprehension. But for content validity purposes, this is surplus meaning – it is not relevant to the establishment of content validity. However, I do agree with Sackett and Ployhart that content validity is possible with job-specific cognitive skills even when such skills are viewed as constructs. It is possible, but it is much more cumbersome and complex than it needs to be for purposes of content validity. Finally, it also seems that a requirement to treat any and all measures as measures of constructs denigrates the essential role constructs play in theory; constructs exist within theories and are the building blocks of theories. They are not to be found anywhere and everywhere, just lying around in isolation.

I can now see that another sentence in my article is also unclear and confusing, to wit: ‘Since the cognitive abilities and skills measured in employment tests are not constructs, they are suitable for use in content validity methodology’. My intended meaning was that such skills and abilities are suitable for use in content validity methodology under section 14C of the UGESP. That is, even if we take this section of the Uniform Guidelines at face value, its prohibition of the use of content validity with constructs does not apply to cognitive skills used in content validity because such skills are not constructs in that context.

Ployhart states that in content validity, ‘the scores from the measures are obviously not constructs, they are indicators of the constructs. We want to evaluate the content validity of the scores, but to do so effectively, we need to understand (a) whether the scores are consistent with the nature of the underlying GCA construct, and (b) whether the scores overlap with the demands of the job.’ This statement confuses two different and separate domains: the theoretical domain composed of research on ability constructs and the domain of content validity methodology. As Kehoe (2012) has noted, these are completely separate; and in the context of a content validity study, the GCA hierarchical model and its accompanying literature on criterion-related validity are not relevant to content validity. So in content validity, we do not need to ‘understand whether the scores are consistent with the nature of the underlying GCA construct.’ This is simply not relevant because content validity methods do not aim to assess ‘the underlying GCA construct’ – even though the final test will, as shown by a different line of evidence, be a de facto measure of GCA. The fact of this outcome is not relevant to the content validity process per se. This outcome is not a goal; it is a by-product. But we do need to know the second item listed by Ployhart: ‘whether the scores overlap with the demands of the job.’ This is what is critical in content validity.

The position taken by Sackett, and I believe by Ployhart, is that any measure taken on a person is a measure of a construct and must be viewed as such. Sackett cites the definition of a construct given in Cronbach and Meehl (1955): ‘A construct is some postulated attribute of people, assumed to be reflected in test performance’ (p. 283). This definition does not imply that every measure taken on people must be viewed as a measure of a construct. For example, we can measure people's age. Must age be viewed as a construct? If so, do all of the requirements for construct validity apply? What about weight? What about height? What about years of education? Must all these be considered constructs? I think not.

Sackett correctly states that different statements in Cronbach and Meehl (1955) can be interpreted differently but maintains that almost all such statements support his position that all measures taken on people must be viewed as measures of constructs. I do not see it this way. One example of a construct that Cronbach and Meehl discuss in some detail is anxiety, one of the examples I presented in my article. Cronbach and Meehl appear to agree that the meaning of anxiety can only be clear within the context of a specific theory of anxiety, and because of this, it must be considered a construct – in contrast to the cognitive skill of simple arithmetic, which does not require embedment in a theory for its meaning. Most of the other examples they present are of this nature – for example, those from the Minnesota Multiphasic Personality Inventory scales. In addition, they include a lengthy section on the nomological net – the total pattern of relations of a construct with other variables and the resulting theoretical interpretation of this net of relationships. The nomological net and its interpretations constitute the theory associated with the construct. All of this material supports the position taken in my article to the effect that constructs are components of theories and that atheoretical measures need not be viewed as constructs. Consider the following statement by Cronbach and Meehl (1955): ‘Whether or not an interpretation of a test's properties or relations involves questions of construct validity is to be decided by examining the entire body of evidence offered, together with what is asserted about the test in the context of this evidence’ (p. 283, emphasis added). This statement is consistent with the position that not every measure is a measure of a construct.

It is also relevant that the Cronbach and Meehl (1955) article, the first published article on construct validity, is nearly 60 years old. The philosophy of science that this article drew on (logical positivism) has long since been displaced by newer theories of the scientific process. These newer formulations make it clear that constructs are components of theories; the concept of a construct is not meaningful in the absence of a theory of which it is a part (e.g., see Brown, 1977; Kerlinger, 1986, chapter 1). Also, at the time the Cronbach and Meehl article was written, most psychological theories were cruder and less developed than is the case today, so at that time, less theoretical scaffolding may have been viewed as necessary to support the assertion of a construct. For example, the mere postulation of a trait, with virtually no supporting theory, might have been viewed as sufficient to justify the concept of a construct. Over a period of nearly 60 years, ideas and practices can change. The very idea of content validity in I/O psychology is an example of this. The concept of content validity was borrowed by I/O psychology from educational psychology, where it originated. In educational psychology, to be content valid, a test had to be (and still has to be) a representative sample of the entire domain of a course or curriculum. I/O psychology adapted this concept so that in I/O practice, a test can be content valid if it samples only the critical tasks from a single task domain making up the job. It need not be a representative sample of all the tasks that make up the job. This is a major change in the concept of content validity. Sackett states that the treatment of constructs in the 1999 Standards is based on the nearly 60-year-old Cronbach and Meehl (1955) article. To me, this is not reassuring. Surely over such a long period, psychologists should have progressed in their analysis and thinking about the role of constructs and construct validity. As I reflect on it now, I see my article as perhaps an attempt to start such a process.

The emphasis on constructs can be taken too far, with harmful effects on the science and especially the practice of I/O psychology. An example of this occurred during the development of the 2003 SIOP Principles, when one group of participants proposed that validity generalization findings could not be used by practitioners if there was uncertainty about what constructs were measured in the primary studies included in the meta-analysis. One draft of the Principles included the following sentence: ‘Accordingly, when studies are cumulated solely on the basis of common methods (e.g., interviews) instead of constructs, it is inappropriate to interpret the results as indicating the generalizability of the validity of inferences from scores derived from the method to other settings or for different purposes’ (emphasis added). This statement would preclude practitioners from using validity generalization findings for employment interviews, assessment centers, biographical data scales, situational judgment tests, and other selection methods. This would be a crippling loss to the practice of I/O psychology, and there is no logical basis for this prohibition. Generalization does not require explanation (which certainty about the constructs assessed would provide, at least to some degree); explanation is preferable but not necessary. For example, aspirin was widely prescribed and used to good effect before anything was known about its active ingredient or the mechanisms by which that ingredient worked. That is, the underlying theory and associated constructs were not known. Another example is the use of lithium in the treatment of bipolar disorder; the processes by which lithium works are still unknown today, but this does not prevent lithium from being useful and effective.

Consider employment interviews. It is true that it is generally not known exactly which constructs are assessed by any particular employment interview (Schmidt & Zimmerman, 2004). Regardless of that fact, it is possible to generalize the distribution of true validities based on a meta-analysis that contains many primary studies and a wide sampling of organizations, organization types, jobs, and interview variations – all of which are reflected in the obtained standard deviation (SD) of true validities. These SDs have been found to be quite small (e.g., see McDaniel, Whetzel, Schmidt, & Maurer, 1994), showing that the wide variation in interview types, job types, etc. has little effect on interview validity and that interview validity is almost always positive and substantial. Similar results have been found for assessment centers. These results are very similar to those from validity generalization studies in which the construct assessed is known (e.g., verbal ability). If a new organization is within the range of the variables sampled in the meta-analysis (which will almost always be the case), generalization of validity can be made on this empirical basis. There is no need to know what constructs are assessed by which interviews – or even whether any constructs at all are assessed. To insist under these circumstances that the meta-analytic results showing that validity generalizes cannot be used in practice because we do not know the specific constructs involved does not make sense.

It is relevant to note that this same process is employed in cases in which we do know what construct is measured – for example, verbal ability or the personality trait of conscientiousness. Here, as with employment interviews, the SD of true validities may not be zero, meaning that unknown moderators could be operating – and we do not know what these moderator constructs are. But the true validity distribution is based on a wide sampling of such potential moderators, and they are all reflected in that distribution (specifically, in its SD), allowing us to compute the probability that any new organization would fall outside the range of useful validity. That probability is low, so we can generalize validity on this empirical basis. This is identical to what is done in the case of validity generalization studies of employment interviews, assessment centers, biodata scales, and other measures for which there is uncertainty about the constructs they assess. So it should be clear that validity generalization is not dependent on knowledge of which constructs are being measured.
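As a concrete illustration of this probability calculation, the following is a minimal sketch using made-up meta-analytic values rather than figures from any study cited here. Assuming the distribution of true validities is approximately normal, its mean and SD yield a 90% credibility value and the probability that a new setting falls below a minimum useful level of validity.

```python
from statistics import NormalDist

# Hypothetical meta-analytic results (illustrative values only):
mean_rho = 0.40    # mean true validity across studies
sd_rho = 0.08      # SD of true validities after artifact variance is removed
min_useful = 0.10  # smallest validity considered practically useful

true_validity = NormalDist(mu=mean_rho, sigma=sd_rho)

# 90% credibility value: 90% of settings are expected to have true validity above this.
cred_90 = true_validity.inv_cdf(0.10)

# Probability that a new setting drawn from this distribution falls below the
# minimum useful validity.
p_below = true_validity.cdf(min_useful)

print(f"90% credibility value: {cred_90:.2f}")
print(f"P(true validity < {min_useful}): {p_below:.5f}")
```

With these assumed values, the 90% credibility value is about 0.30 and the probability of falling below 0.10 is negligible, which is the empirical basis for generalization described above.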

The empirical sampling procedure described here is sufficient to justify generalization of validity based on meta-analyses of selection methods such as the interview. But there is also a second way to refute the contention that because different interviews (or different assessment centers) may measure different constructs, meta-analytic findings cannot be generalized to new situations. To many people, this seems initially, and on its face, to be a plausible argument. But it ignores the facts created by positive manifold (Murphy, Dzieweczynski, & Zhang, 2009). It is well known that measures of aptitudes, abilities, and knowledge show positive manifold (substantial positive correlations); the same is true for the Big Five personality traits (with neuroticism reverse scored as emotional stability; cf. Mount, Barrick, Scullen, & Rounds, 2005). Positive manifold is likely to be even stronger when these traits are assessed by judges, as is the case with employment interviews and most dimensions assessed in assessment centers, because halo then enters to increase the intercorrelations (cf. Viswesvaran, Schmidt, & Ones, 2005). If the different constructs assessed by different interviews (or assessment centers, situational judgment tests, etc.) are all substantially correlated, then the final or total scores on the different interviews or assessment centers will be highly correlated. For example, Schmidt and Zimmerman (2004) examined a case in which each of two different interviews assesses five constructs, with none of the constructs in common across the two interviews, and showed that the correlation between the final (total) scores on these two interviews is 0.99. I predict that similar findings will also hold for the final scores of assessment centers, situational judgment tests, and other assessment methods. The fact that different applications of a method assess different combinations of constructs is of little import if the final (total) scores produced by the different applications of the method correlate very highly. This fact is very likely one of the reasons why the SDs of true validities in meta-analyses of methods such as employment interviews and assessment centers are so small. It is the validities of the final or total scores produced by these methods that are entered into the meta-analysis, and because these total scores are so highly correlated, the differences in the component constructs assessed by the different interviews or assessment centers do not and cannot create variance in the validities of the total scores. Hence, the result is the very small SDs of true validities.
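The arithmetic behind this composite argument can be sketched with the standard formula for the correlation between two unit-weighted composites. The numbers below are illustrative assumptions, not the values used by Schmidt and Zimmerman (2004): with two composites of five standardized constructs each, no constructs shared, and an average intercorrelation of r both within and across the composites, the composite correlation is 25r / (5 + 20r), which climbs toward 1.0 as r increases.

```python
import math

def composite_correlation(k: int, r_within: float, r_cross: float) -> float:
    """Correlation between two unit-weighted composites of k standardized
    variables each (no shared variables), given the average within-composite
    and cross-composite correlations."""
    sd = math.sqrt(k + k * (k - 1) * r_within)  # SD of each composite (equal by assumption)
    cov = k * k * r_cross                       # covariance between the two composites
    return cov / (sd * sd)

# Illustrative values only: as positive manifold plus halo push the construct
# intercorrelations up, two interviews measuring entirely different constructs
# produce nearly interchangeable total scores.
for r in (0.3, 0.5, 0.7, 0.9):
    print(f"average construct intercorrelation r = {r:.1f} -> "
          f"total-score correlation = {composite_correlation(5, r, r):.3f}")
```

Under these assumptions the total-score correlation rises from about 0.68 at r = 0.3 to about 0.98 at r = 0.9, which is why differences in the specific constructs assessed contribute so little variance to the validities of total scores.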

The offending statement quoted earlier from an early draft does not appear in the final 2003 SIOP Principles. I, along with others, protested strongly, and that specific wording was dropped from the final document. However, the final document still contains language that is contrary to the logic I present here. Consider the following statement (p. 30): ‘Because methods such as the interview can be designed to assess widely varying constructs (from job knowledge to integrity), generalizing from cumulative findings is only possible if the features of the method that result in positive method-criterion relationships are clearly understood, if the content of the procedure and meaning of the scores are relevant for the intended purpose, and if generalization is limited to other applications of the method that include these features.’ The following statement (p. 30) also reflects this position: ‘Generalizing from a meta-analysis of such data to a new similarly unspecified interview, to a different interview method, or to a different or new situation, is not warranted.’ These statements essentially disallow generalization of the findings of meta-analyses in which there is uncertainty about what constructs have been measured. They, in effect, deny the possibility of generalization of validity based on the logical empirical process I described earlier. In addition to its conceptual deficiencies, this position denies useful tools to practitioners seeking to solve important personnel selection problems. The position reflected in these statements is often referred to conversationally by some I/O psychologists, often practitioners, as ‘the tyranny of the constructs.’

5. Conclusion


The most important fact that emerges from the combined article and commentaries is the consensus judgment that content validity is appropriate scientifically and professionally for use with tests of specific cognitive skills used in job performance. This is important because the 1978 UGESP is typically interpreted as not allowing such usage, and this is particularly the case in the interpretation given to the Guidelines by federal government enforcement agencies. Although the SIOP Principles and the Standards do not prohibit such usage, many I/O psychologists believe that it is not professionally or scientifically appropriate to employ content validity methods with cognitive measures.

On this point, all four authors in the series are in agreement. The major disagreement among us concerns whether specific cognitive skills used in content valid tests must be considered constructs or not. My position, and apparently that of Kehoe, is that they need not be so considered. Sackett and Ployhart, on the other hand, argue that all measures taken on people must be viewed as constructs. In this response, I have presented reasons why this need not be the case. Application of content validity methodology to job-specific cognitive skills is still possible if the job-specific skills are considered to be constructs. However, such applications are more cumbersome and complex than they need to be because of the surplus meaning introduced by the construct designations.

References

  • American Educational Research Association, American Psychological Association, & National Council on Measurement in Education. (1999). Standards for educational and psychological testing. Washington, DC: American Educational Research Association.
  • Binning, J. F., & Barrett, G. V. (1989). Validity of personnel decisions: A conceptual analysis of inferential and evidentiary bases. Journal of Applied Psychology, 74, 478–494.
  • Brown, H. I. (1977). Perception, theory, and commitment: The new philosophy of science. Chicago, IL: University of Chicago Press.
  • Cronbach, L. J., & Meehl, P. E. (1955). Construct validity in psychological tests. Psychological Bulletin, 52, 281–302.
  • Equal Employment Opportunity Commission, Civil Service Commission, Department of Labor, & Department of Justice. (1978). Uniform guidelines on employee selection procedures. Federal Register, 43, 38294–38309.
  • Hunter, J. E., Schmidt, F. L., & Le, H. (2006). Implications of direct and indirect range restriction for meta-analysis methods and findings. Journal of Applied Psychology, 91, 594–612.
  • Jensen, A. R. (1998). The g factor: The science of mental ability. Westport, CT: Praeger.
  • Kehoe, J. F. (2012). What to make of content validity evidence for cognitive tests? Comments on Schmidt (2012). International Journal of Selection and Assessment, 20, 14–18.
  • Kerlinger, F. N. (1986). Foundations of behavioral research. New York: Holt.
  • McDaniel, M. A., Whetzel, D. L., Schmidt, F. L., & Maurer, S. D. (1994). The validity of employment interviews: A comprehensive review and meta-analysis. Journal of Applied Psychology, 79, 599–616.
  • Messick, S. (1995). Validity of psychological assessment: Validation of inferences from persons’ responses and performances as scientific inquiry into score meaning. American Psychologist, 50, 741–749.
  • Mount, M. K., Barrick, M. R., Scullen, S. M., & Rounds, J. (2005). Higher-order dimensions of the Big Five personality traits and the Big Six vocational interest types. Personnel Psychology, 58, 447–478.
  • Murphy, K. R., Dzieweczynski, J. L., & Zhang, Y. (2009). Positive manifold limits the relevance of content-matching strategies for validating selection test batteries. Journal of Applied Psychology, 94, 1018–1031.
  • Ployhart, R. E. (2012). The content validity of cognitively-oriented tests. Commentary on Schmidt (2012). International Journal of Selection and Assessment, 20, 19–23.
  • Roth, P. L., Bobko, P., Switzer, F. S., III, & Dean, M. A. (2001). Prior selection causes biased estimates of standardized ethnic group differences: Simulation and analysis. Personnel Psychology, 54, 591–617.
  • Roth, P. L., Huffcutt, A. I., & Bobko, P. (2003). Ethnic group differences in measures of job performance: A new meta-analysis. Journal of Applied Psychology, 88, 694–706.
  • Sackett, P. R. (2012). Cognitive tests, constructs, and content validity. A commentary on Schmidt (2012). International Journal of Selection and Assessment, 20, 24–27.
  • Sackett, P. R., Schmitt, N., Ellingson, J. E., & Kabin, M. B. (2001). High stakes testing in employment, credentialing, and higher education: Prospects in a post-affirmative-action world. American Psychologist, 56, 302–318.
  • Schmidt, F. L. (2002). The role of general cognitive ability in job performance: Why there cannot be a debate. Human Performance, 15, 187–210.
  • Schmidt, F. L. (2011). A theory of sex differences in technical aptitude and some supporting evidence. Perspectives on Psychological Science, 6, 560–573.
  • Schmidt, F. L., & Zimmerman, R. D. (2004). A counterintuitive hypothesis about employment interview validity and some supporting evidence. Journal of Applied Psychology, 89, 553–561.
  • Society for Industrial and Organizational Psychology. (2003). Principles for the validation and use of personnel selection procedures (4th ed.). Bowling Green, OH: SIOP.
  • Viswesvaran, C., Schmidt, F. L., & Ones, D. S. (2005). Is there a general factor in ratings of job performance? A meta-analytic framework for disentangling substantive and error influences. Journal of Applied Psychology, 90, 108–131.