Abstract

  1. Top of page
  2. Abstract
  3. 1. INTRODUCTION
  4. 2. AN EXAMPLE
  5. 3. TWO ASSIGNMENT PROCEDURES
  6. 4. TWO CRITERIA FOR ASSESSING ASSIGNMENT PROCEDURES M AND R
  7. 5. THE PARTITIONING OF ERROR RATES
  8. 6. CONCLUSION
  9. Appendix
  10. REFERENCES

Consider an m-way cross-classification table (for m = 3, 4, …) of m dichotomous variables that describes (1) the $2^m$ possible response patterns to a set of m questions (where the response to each question is binary), and (2) the number of individuals whose responses to the m questions can be described by a particular response pattern, for each of the $2^m$ possible response patterns. Consider the situation where the data in the cross-classification table are analyzed using a particular latent class model having T latent classes (for T = 2, 3, …), and where this model fits the data well. With this latent class model, it is possible to estimate, for an individual who has a particular response pattern, the conditional probability that this individual is in a particular latent class, for each of the T latent classes. In this article, the following question is considered: For an individual who has a particular response pattern, can we use the corresponding estimated conditional probabilities to assign this individual to one of the T latent classes? Two different assignment procedures are considered here, and for each of these procedures, two different criteria are introduced to help assess when the assignment procedure is satisfactory and when it is not. In addition, we describe here the particular framework and context in which the two assignment procedures, and the two criteria, are considered. For illustrative purposes, the latent class analysis of a classic set of data, a four-way cross-classification of some survey data, obtained in a two-wave panel study, is discussed; and the two different criteria introduced herein are applied in this analysis to each of the two assignment procedures.


1. INTRODUCTION

For expository purposes, we consider first the two-class latent class model (with, say, latent classes 1 and 2) applied to the observed data in a four-way 2 × 2 × 2 × 2 cross-classification of four dichotomous variables (say, variables A, B, C, and D). There are 16 possible response patterns in the four-way cross-classification table; and we let $f_{ijkl}$ denote the observed number (i.e., the observed frequency) of individuals whose response pattern is (i, j, k, l), with response i (for i = 1, 2) on variable A, response j (for j = 1, 2) on variable B, response k (for k = 1, 2) on variable C, and response l (for l = 1, 2) on variable D. When the latent class model is applied to the cross-classification table, the observed data in the table are used to estimate the corresponding expected frequency $F_{ijkl}$ under the latent class model; and, for an individual whose response pattern is (i, j, k, l), we can also estimate the conditional probabilities, $\mu^{ABCD}_{ijkl1}$ and $\mu^{ABCD}_{ijkl2}$, of this individual being in latent class 1 and latent class 2, respectively. When the latent class model fits the observed data in the cross-classification table well (i.e., when the estimated $F_{ijkl}$ are sufficiently close to the corresponding observed $f_{ijkl}$, for i = 1, 2, j = 1, 2, k = 1, 2, and l = 1, 2), the following question can arise: For an individual whose response pattern is (i, j, k, l), how can we use the corresponding estimated conditional probabilities, $\mu^{ABCD}_{ijkl1}$ and $\mu^{ABCD}_{ijkl2}$, to assign this individual to one of the two latent classes?

The above question was considered very briefly in a short two-page subsection of a long (81-page) article (Goodman 1974a) that was focused on many other matters pertaining to latent class analysis. More needs to be said on the assignment of individuals to latent classes.

In the short subsection of the long article cited above, an assignment procedure was described, but no criterion was introduced there to help assess when the assignment procedure is satisfactory and when it is not. In the present article, an additional assignment procedure is introduced, and two different criteria are also introduced that can be applied to each of the two assignment procedures in order to help assess each of them.

When the two-class latent class model is applied to the observed data in a four-way 2 × 2 × 2 × 2 cross-classification of four dichotomous variables, the 16 possible response patterns correspond to the 16 cells in the cross-classification table. We can think of these cells as the 16 possible observed classes into which each of the individual respondents can be classified in accordance with his or her response pattern. The two-class latent class analysis seeks to determine whether the observed classification of the individual respondents into the 16 observed classes can be described in terms of just two latent classes (say, latent classes 1 and 2), where the proportion of respondents in latent class 1 is $\pi^X_1$ and the proportion in latent class 2 is $\pi^X_2$ (with $\pi^X_1 + \pi^X_2 = 1$); and where these two proportions in the latent class model, and some other characteristics of the respondents in each of the two latent classes in the model, are estimated using the observed data. The estimates are then used to estimate the corresponding expected frequency $F_{ijkl}$ under the latent class model. When the estimated $F_{ijkl}$ are sufficiently close to the corresponding observed $f_{ijkl}$ (for i = 1, 2; j = 1, 2; k = 1, 2; l = 1, 2), we can view the latent class model with its two latent classes as possibly an underlying description, and a basic explanation, of the subject under study (that is, the observed classification of the respondents in the 16 observed classes). In this case, the latent class model with its two latent classes can be viewed as possibly more fundamental than the observed classification of the respondents in the 16 observed classes. This article considers two possible procedures for assigning the respondents in the 16 observed classes to the two latent classes.

In these introductory comments we have, for expository purposes, considered the two-class latent class model applied to the four-way cross-classification table of four dichotomous variables. These introductory comments can be directly generalized to the case where the T-class latent class model (for T= 2, 3, …) is applied to the m-way cross-classification table of m dichotomous variables (for m= 3, 4, …).

The two assignment procedures that will be considered here for assigning individuals to latent classes, and the two criteria that will be introduced here for assessing the assignment procedures, will be viewed as a contribution to the methodology of the subject of “classification.” In the present context, a classification study would determine whether a set of individuals, who have responded to a set of questions, can validly be described in terms of a small number of classes (or clusters) of similar individuals, whose responses to the questions are similar in some sense. (For views of the subject of classification applied in many other contexts, see, e.g., Gordon 1999.)

The classification issues considered in the present article are viewed here in the situation in which the T-class latent class model (for T = 2, 3, …) is applied to the m-way cross-classification of m dichotomous variables (for m = 3, 4, …). The same kinds of classification issues could also be considered in the situation in which other finite mixture models are applied in the analysis of various kinds of data; but this would be beyond the scope of the present article. (For views of other mixture models, and some extensions of latent class models, see, e.g., McLachlan and Peel 2000; Magidson and Vermunt 2001.)

2. AN EXAMPLE

We shall consider in this section a latent class model applied by Goodman (1974a, 1974b) in analyzing data presented earlier by Coleman (1964). Table 1 presents the observed cross-classification of individuals in interviews at two successive points in time, with respect to two dichotomous variables. Here we have data from a two-wave panel study of individuals interviewed with respect to two questions in the first wave and the same two questions in the second wave. The data in Table 1 are presented as a two-way 4 × 4 table, with the four row categories corresponding to the four possible responses in the first wave, and the four column categories corresponding to the four possible responses in the second wave. The data in the 16 cells of the 4 × 4 table can also be viewed as data in the corresponding 16 cells of a 2 × 2 × 2 × 2 table {A, B, C, D}, with A and B corresponding to the two questions in the first wave, and C and D corresponding to the same two questions in the second wave.

Table 1.  Observed Cross-Classification of 3398 Schoolboys in Interviews at Two Successive Points in Time, with Respect to Two Dichotomous Variables*

  First Interview            Second Interview
  Membership  Attitude       Membership:   1     1     2     2
                             Attitude:     1     2     1     2
      1          1                        458   140   110    49
      1          2                        171   182    56    87
      2          1                        184    75   531   281
      2          2                         85    97   338   554

  Note: With respect to self-perceived membership in the “leading crowd,” being in it is denoted 1 and being out of it is denoted 2; with respect to attitude concerning the “leading crowd,” a favorable attitude is denoted 1 and an unfavorable attitude is denoted 2. Membership and attitude will be denoted by the letters A and B, respectively, in the first interview; and by the letters C and D, respectively, in the second interview.

  *Variables = (1) self-perceived membership in the “leading crowd,” and (2) favorableness of attitude concerning the “leading crowd.”

When the two-class latent class model was applied to the data, the model did not fit the data. (The goodness-of-fit chi-square value was 251.17 on 6 degrees of freedom.) But when a special four-class latent class model was applied, the model fit the data very well indeed. (The corresponding chi-square value was 1.28 on 4 degrees of freedom.) The estimated conditional probabilities, $\mu^{ABCD}_{ijklt}$ (for t = 1, 2, 3, 4), of being in the corresponding latent classes (say, latent classes 1, 2, 3, 4) for those individuals whose response pattern is (i, j, k, l) in the four-way 2 × 2 × 2 × 2 table {A, B, C, D} are presented in Table 2. (This table differs from the corresponding table in Goodman [1974a] in that four decimal places are included here; the table in the earlier article included two decimal places. The four decimal places will be needed in the calculations that will be presented later herein.)

Table 2.  Estimated Conditional Probability $\mu^{ABCD}_{ijklt}$ That an Individual Will Be in Latent Class t (for t = 1, 2, 3, 4), Given That the Individual's Response Pattern Is (i, j, k, l) on Variables (A, B, C, D)

  Variable    Observed     Estimated Conditional Probability for Latent Class t
  A B C D     Frequency    t = 1    t = 2    t = 3    t = 4
  1 1 1 1        458       .9355    .0529    .0097    .0019
  1 1 1 2        140       .5937    .3866    .0062    .0136
  1 1 2 1        110       .3864    .0219    .4970    .0947
  1 1 2 2         49       .1737    .1131    .2234    .4899
  1 2 1 1        171       .5959    .3844    .0062    .0135
  1 2 1 2        182       .1150    .8539    .0012    .0299
  1 2 2 1         56       .1747    .1127    .2247    .4880
  1 2 2 2         87       .0239    .1773    .0307    .7681
  2 1 1 1        184       .7349    .0416    .1878    .0358
  2 1 1 2         75       .4053    .2640    .1036    .2271
  2 1 2 1        531       .0259    .0015    .8170    .1556
  2 1 2 2        281       .0098    .0064    .3082    .6757
  2 2 1 1         85       .4072    .2627    .1041    .2260
  2 2 1 2         97       .0664    .4927    .0170    .4240
  2 2 2 1        338       .0098    .0063    .3101    .6737
  2 2 2 2        554       .0012    .0090    .0381    .9518

  Note: The conditional probability is estimated under a special four-class latent class model applied to the four-way cross-classification (Table 1).

For readers who would like to know more about the special four-class latent class model that was applied to the data (Table 1), we include here the following brief description: The four latent classes (say, latent classes 1, 2, 3, 4) can be viewed as the latent classes of a latent variable X, and they can also be viewed as the four classes in a 2 × 2 table describing the joint distribution of two dichotomous latent variables (say, latent variables Y and Z). With respect to the observed data in Table 1, the observed variables A and C correspond to the first question (on membership) in the first and second interview, respectively; and the observed variables B and D correspond to the second question (on attitude) in the first and second interview, respectively. Under the special four-class latent class model considered here, the observed variables A and C are viewed as indicators of the dichotomous latent variable Y, and the observed variables B and D are viewed as indicators of the dichotomous latent variable Z. In other words, under this model, variable Y is the intrinsic latent variable on membership, and variable Z is the intrinsic latent variable on attitude; and the proportion of individuals estimated to be in each of the four latent classes in the 2 × 2 table describes the estimated joint distribution of the intrinsic latent variables on membership and attitude.

When the four-class latent class model considered here was applied to the data in Table 1, the proportion of individuals in each of the four latent classes was estimated under the latent class model, and the following results were obtained: $\pi^X_1 = .2720$, $\pi^X_2 = .1284$, $\pi^X_3 = .2315$, and $\pi^X_4 = .3680$, for latent classes 1, 2, 3, and 4, respectively. (These proportions differ from the corresponding proportions in Goodman [1974a, 1974b] in that four decimal places are included here; whereas the Goodman [1974a] article included two decimal places, and the Goodman [1974b] article included three decimal places. The four decimal places included here will be needed in the calculations that will be presented later herein.)

With respect to the above four proportions estimated under the latent class model, when these proportions are viewed as proportions in the 2 × 2 table describing the estimated joint distribution of the intrinsic latent variables Y and Z, we have $\pi^X_1 = \pi^{YZ}_{11}$, $\pi^X_2 = \pi^{YZ}_{12}$, $\pi^X_3 = \pi^{YZ}_{21}$, $\pi^X_4 = \pi^{YZ}_{22}$. In other words, the latent variable X here denotes the joint latent variable (Y, Z), and the four latent classes of latent variable X correspond to the four levels with respect to this joint latent variable, that is, (1, 1), (1, 2), (2, 1), and (2, 2), respectively. Using the corresponding four proportions in the preceding paragraph, the corresponding odds ratio in the 2 × 2 table is $(\pi^X_1 \pi^X_4)/(\pi^X_2 \pi^X_3) = 3.37$. So we see that there is a strong positive relationship between the intrinsic latent variable on membership and the intrinsic latent variable on attitude.

With the latent class analysis of the data in Table 1 presented in the present section, we shall next consider the two different assignment procedures for assigning individuals to latent classes, and then the two criteria for assessing these assignment procedures. The procedures and criteria will make use of the information in Table 2, and the second criterion will also make use of the information presented earlier in this section on the proportion of individuals (i.e., $\pi^X_t$, for t = 1, 2, 3, 4) estimated to be in each of the four latent classes under the latent class model.

Before closing this section, we note that more about the special four-class latent class model considered here is included in Section A.1 of the Appendix. We also refer interested readers to Goodman (1974a, 1974b) for additional material on this special latent class model and on other latent class models. Reference is also made here to, e.g., Clogg and Goodman (1984, 1985), Goodman (1987, 2002, 2007), and Hagenaars and McCutcheon (2002).

3. TWO ASSIGNMENT PROCEDURES

In this section, we first consider the assignment procedure described briefly in Goodman (1974a). This procedure is closely related to the measure of association λ, which is based on optimal prediction, presented in Goodman and Kruskal (1954). We then introduce the second assignment procedure. This is closely related to the measure of association τ, which is based on proportional prediction, also presented in Goodman and Kruskal (1954).

For an individual whose response pattern is (i, j, k, l), let us now consider the first assignment procedure. This procedure assigns the individual to a corresponding modal latent class, that is, to a particular latent class for which the corresponding estimated conditional probability $\mu^{ABCD}_{ijklt}$ (considering t = 1, 2, 3, 4) is the largest. If this assignment procedure is applied to all of the individuals whose response pattern is (i, j, k, l), using the estimated conditional probabilities in Table 2, we see, for example, that the 458 individuals whose response pattern is (1, 1, 1, 1) would be assigned to latent class 1; and the 554 individuals whose response pattern is (2, 2, 2, 2) would be assigned to latent class 4. This simple assignment procedure minimizes the number of incorrect assignments. Applying this procedure to the individuals whose response pattern is (i, j, k, l) for each of the 16 response patterns, we find that the number of individuals assigned to the latent classes 1, 2, 3, and 4 is 1113, 279, 641, and 1365, respectively, and the corresponding proportions assigned to the latent classes are as follows: $p^X_1 = .3275$, $p^X_2 = .0821$, $p^X_3 = .1886$, and $p^X_4 = .4017$. When these four proportions assigned to the latent classes are compared with the corresponding proportions presented in the preceding section (that is, the proportion of individuals in each of the four latent classes, $\pi^X_1$, $\pi^X_2$, $\pi^X_3$, $\pi^X_4$, estimated under the latent class model), we see that the proportions obtained with the first assignment procedure considered in this section (i.e., the assignment procedure that minimizes the number of incorrect assignments) can be very different from the estimated proportions obtained under the latent class model.
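The modal assignment described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `TABLE2`, `modal_class`, and `assign_modal` are hypothetical and not from the article, and the data are the four-decimal values printed in Table 2.

```python
from collections import Counter

# Table 2 of the text: (A, B, C, D, observed frequency, mu_1, mu_2, mu_3, mu_4).
TABLE2 = [
    (1, 1, 1, 1, 458, .9355, .0529, .0097, .0019),
    (1, 1, 1, 2, 140, .5937, .3866, .0062, .0136),
    (1, 1, 2, 1, 110, .3864, .0219, .4970, .0947),
    (1, 1, 2, 2,  49, .1737, .1131, .2234, .4899),
    (1, 2, 1, 1, 171, .5959, .3844, .0062, .0135),
    (1, 2, 1, 2, 182, .1150, .8539, .0012, .0299),
    (1, 2, 2, 1,  56, .1747, .1127, .2247, .4880),
    (1, 2, 2, 2,  87, .0239, .1773, .0307, .7681),
    (2, 1, 1, 1, 184, .7349, .0416, .1878, .0358),
    (2, 1, 1, 2,  75, .4053, .2640, .1036, .2271),
    (2, 1, 2, 1, 531, .0259, .0015, .8170, .1556),
    (2, 1, 2, 2, 281, .0098, .0064, .3082, .6757),
    (2, 2, 1, 1,  85, .4072, .2627, .1041, .2260),
    (2, 2, 1, 2,  97, .0664, .4927, .0170, .4240),
    (2, 2, 2, 1, 338, .0098, .0063, .3101, .6737),
    (2, 2, 2, 2, 554, .0012, .0090, .0381, .9518),
]


def modal_class(probs):
    """Return the 1-based latent class with the largest probability."""
    return max(range(len(probs)), key=lambda t: probs[t]) + 1


def assign_modal(table):
    """Count the individuals assigned to each latent class by the modal rule."""
    counts = Counter()
    for a, b, c, d, freq, *probs in table:
        # Everyone sharing a response pattern goes to that pattern's modal class.
        counts[modal_class(probs)] += freq
    return counts


n = sum(row[4] for row in TABLE2)
counts_m = assign_modal(TABLE2)
props_m = {t: counts_m[t] / n for t in (1, 2, 3, 4)}
print(dict(counts_m))
print({t: round(p, 4) for t, p in props_m.items()})
```

With the Table 2 values, the counts agree with the 1113, 279, 641, and 1365 individuals reported in the text.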

Under the four-class latent class model considered here, we noted in the preceding section that the particular ratio $(\pi^X_1 \pi^X_4)/(\pi^X_2 \pi^X_3)$ was of special interest. Using the estimated proportions $\pi^X_t$ (for t = 1, 2, 3, 4) in the preceding section, the corresponding estimated ratio was 3.37; whereas, using the corresponding proportions obtained when individuals are assigned to latent classes by applying the assignment procedure considered above, the corresponding ratio, $(p^X_1 p^X_4)/(p^X_2 p^X_3)$, is 8.49.

We shall now consider a different procedure for assigning individuals to latent classes. This assignment procedure is designed so that the expected proportion of individuals assigned to each of the four latent classes can approximate the corresponding proportion, $\pi^X_t$, for t = 1, 2, 3, 4, estimated under the latent class model.
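As a quick check of the two ratios just quoted, the following sketch recomputes them. The function name `cross_product_ratio` is a hypothetical helper; the inputs are the four-decimal proportions from the preceding section and the counts 1113, 279, 641, and 1365 reported above (a ratio of counts equals the ratio of the corresponding proportions).

```python
def cross_product_ratio(p1, p2, p3, p4):
    """Odds ratio (p1 * p4) / (p2 * p3) for cells (1,1), (1,2), (2,1), (2,2)."""
    return (p1 * p4) / (p2 * p3)


# Proportions estimated under the latent class model.
model_ratio = cross_product_ratio(.2720, .1284, .2315, .3680)

# Numbers of individuals assigned to latent classes 1-4 by the modal procedure.
modal_ratio = cross_product_ratio(1113, 279, 641, 1365)

print(model_ratio, modal_ratio)
```

Both results land close to the values reported in the text (3.37 and 8.49); small differences in the last digit can arise from rounding of the inputs.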

With this assignment procedure, each of the $f_{ijkl}$ individuals whose response pattern is (i, j, k, l) would be assigned at random to one of the latent classes 1, 2, 3, 4, using the corresponding estimated probability distribution, $\mu^{ABCD}_{ijklt}$, for t = 1, 2, 3, 4, to make the random assignments. For example, with this assignment procedure, using the estimated conditional probabilities in Table 2, each of the 458 individuals whose response pattern is (1, 1, 1, 1) would be assigned at random to one of the four latent classes, using the probability distribution .9355, .0529, .0097, .0019 to make the random assignments to latent classes 1, 2, 3, 4, respectively; each of the 554 individuals whose response pattern is (2, 2, 2, 2) would be assigned at random to one of the four latent classes, using the probability distribution .0012, .0090, .0381, .9518 to make the random assignments to latent classes 1, 2, 3, 4, respectively. So we see here that each of the 458 individuals sharing a given response pattern also shares a given assignment probability distribution; each of the 554 individuals sharing a very different given response pattern also shares a very different given assignment probability distribution.

The assignment procedure described in the preceding paragraph is such that, for the $f_{ijkl}$ individuals whose response pattern is (i, j, k, l), the expected proportion of individuals assigned to latent classes 1, 2, 3, 4 will be equal to the corresponding estimated probabilities $\mu^{ABCD}_{ijklt}$, for t = 1, 2, 3, 4. We explain in the next section why the expected proportion of individuals assigned to each of the four latent classes using this assignment procedure, applied to the individuals sharing a given response pattern for each of the 16 possible response patterns, can approximate the corresponding proportion $\pi^X_t$, for t = 1, 2, 3, 4, estimated under the latent class model.

The first assignment procedure considered in this section (and earlier in Goodman [1974a]) used a modal latent class based on the estimated probability distribution of the four latent classes corresponding to each of the 16 response patterns; we shall call this assignment procedure M. (For related material on this procedure, see Vermunt and Magidson [2005].) The second assignment procedure uses random assignments based on the estimated probability distribution of the latent classes corresponding to each of the 16 response patterns; we shall call this assignment procedure R.
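A single run of the random assignment procedure can be simulated as follows. This is a minimal sketch, not the article's own computation: `TABLE2` and `assign_random` are hypothetical names, the seed is arbitrary, and one independent draw is made per individual using the standard library's `random.choices`.

```python
import random
from collections import Counter

# Table 2 of the text: (A, B, C, D, observed frequency, mu_1, mu_2, mu_3, mu_4).
TABLE2 = [
    (1, 1, 1, 1, 458, .9355, .0529, .0097, .0019),
    (1, 1, 1, 2, 140, .5937, .3866, .0062, .0136),
    (1, 1, 2, 1, 110, .3864, .0219, .4970, .0947),
    (1, 1, 2, 2,  49, .1737, .1131, .2234, .4899),
    (1, 2, 1, 1, 171, .5959, .3844, .0062, .0135),
    (1, 2, 1, 2, 182, .1150, .8539, .0012, .0299),
    (1, 2, 2, 1,  56, .1747, .1127, .2247, .4880),
    (1, 2, 2, 2,  87, .0239, .1773, .0307, .7681),
    (2, 1, 1, 1, 184, .7349, .0416, .1878, .0358),
    (2, 1, 1, 2,  75, .4053, .2640, .1036, .2271),
    (2, 1, 2, 1, 531, .0259, .0015, .8170, .1556),
    (2, 1, 2, 2, 281, .0098, .0064, .3082, .6757),
    (2, 2, 1, 1,  85, .4072, .2627, .1041, .2260),
    (2, 2, 1, 2,  97, .0664, .4927, .0170, .4240),
    (2, 2, 2, 1, 338, .0098, .0063, .3101, .6737),
    (2, 2, 2, 2, 554, .0012, .0090, .0381, .9518),
]


def assign_random(table, rng):
    """Assign every individual at random, using the distribution of their pattern."""
    counts = Counter()
    for a, b, c, d, freq, *probs in table:
        # One independent draw per individual sharing this response pattern.
        draws = rng.choices((1, 2, 3, 4), weights=probs, k=freq)
        counts.update(draws)
    return counts


rng = random.Random(2007)  # arbitrary seed, chosen only for reproducibility
counts_r = assign_random(TABLE2, rng)
n = sum(counts_r.values())
props_r = {t: counts_r[t] / n for t in (1, 2, 3, 4)}
print({t: round(p, 4) for t, p in props_r.items()})
```

Because the assignments are random, the realized proportions vary from run to run; they should scatter around the proportions estimated under the latent class model rather than reproduce them exactly.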

4. TWO CRITERIA FOR ASSESSING ASSIGNMENT PROCEDURES M AND R

With respect to the first assignment procedure considered in the preceding section (i.e., assignment procedure M, the procedure that minimizes the number of incorrect assignments), here is a simple way to estimate the proportion of incorrect assignments obtained when this assignment procedure is used.

With $f_{ijkl}$ denoting the observed number of individuals whose response pattern is (i, j, k, l), we let n denote the total number of observed individuals in the 2 × 2 × 2 × 2 table {A, B, C, D}. Now we let $E_M$ denote the proportion of incorrect assignments obtained with assignment procedure M. The following formula can be used to estimate $E_M$:

  $$E_M = 1 - \frac{1}{n} \sum_{i,j,k,l} f_{ijkl}\, \mu^{ABCD}_{ijklm}, \tag{1}$$

where

  $$\mu^{ABCD}_{ijklm} = \max_{t}\, \mu^{ABCD}_{ijklt}.$$

Using the corresponding information in Table 2, we find with formula (1) that the estimate of $E_M$ is .24. (More on formula [1] will be included in Section A.2 of the Appendix.)

Strictly speaking, a small improvement in formula (1) can be obtained by reducing slightly each product $f_{ijkl} \cdot \mu^{ABCD}_{ijklm}$ in the formula, by using the corresponding whole number without the decimal that follows the whole number (i.e., by replacing the product with the largest integer that is less than or equal to the product). The replacement value is the number of (whole) individuals estimated to be in the modal latent class corresponding to response pattern (i, j, k, l). With this small improvement, again using the corresponding information in Table 2, we find that the estimate of $E_M$ becomes .25.
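Both versions of the estimate for procedure M can be reproduced from Table 2. The sketch below is illustrative only: `TABLE2`, `error_rate_modal`, and the `whole_individuals` flag are hypothetical names, and the computation is one minus the frequency-weighted average of each pattern's largest conditional probability, with the products optionally rounded down as described above.

```python
import math

# Table 2 of the text: (A, B, C, D, observed frequency, mu_1, mu_2, mu_3, mu_4).
TABLE2 = [
    (1, 1, 1, 1, 458, .9355, .0529, .0097, .0019),
    (1, 1, 1, 2, 140, .5937, .3866, .0062, .0136),
    (1, 1, 2, 1, 110, .3864, .0219, .4970, .0947),
    (1, 1, 2, 2,  49, .1737, .1131, .2234, .4899),
    (1, 2, 1, 1, 171, .5959, .3844, .0062, .0135),
    (1, 2, 1, 2, 182, .1150, .8539, .0012, .0299),
    (1, 2, 2, 1,  56, .1747, .1127, .2247, .4880),
    (1, 2, 2, 2,  87, .0239, .1773, .0307, .7681),
    (2, 1, 1, 1, 184, .7349, .0416, .1878, .0358),
    (2, 1, 1, 2,  75, .4053, .2640, .1036, .2271),
    (2, 1, 2, 1, 531, .0259, .0015, .8170, .1556),
    (2, 1, 2, 2, 281, .0098, .0064, .3082, .6757),
    (2, 2, 1, 1,  85, .4072, .2627, .1041, .2260),
    (2, 2, 1, 2,  97, .0664, .4927, .0170, .4240),
    (2, 2, 2, 1, 338, .0098, .0063, .3101, .6737),
    (2, 2, 2, 2, 554, .0012, .0090, .0381, .9518),
]


def error_rate_modal(table, whole_individuals=False):
    """Estimated proportion of incorrect assignments under modal assignment."""
    n = sum(row[4] for row in table)
    correct = 0.0
    for a, b, c, d, freq, *probs in table:
        product = freq * max(probs)  # estimated number correctly assigned
        # Optionally keep only whole individuals (largest integer <= product).
        correct += math.floor(product) if whole_individuals else product
    return 1.0 - correct / n


e_m = error_rate_modal(TABLE2)
e_m_whole = error_rate_modal(TABLE2, whole_individuals=True)
print(round(e_m, 2), round(e_m_whole, 2))
```

Rounded to two decimals, the two results agree with the .24 and .25 reported in the text.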

Now let us consider the second assignment procedure, assignment procedure R, described in the preceding section. With this procedure, the $f_{ijkl}$ individuals whose response pattern is (i, j, k, l) are assigned at random to latent classes 1, 2, 3, 4 using the corresponding estimated probability distribution, $\mu^{ABCD}_{ijklt}$, for t = 1, 2, 3, 4, to make the random assignments. The estimate of the expected proportion $E_R$ of incorrect assignments obtained with assignment procedure R can be described simply as follows:

  $$E_R = 1 - \frac{1}{n} \sum_{i,j,k,l} f_{ijkl} \sum_{t} \left(\mu^{ABCD}_{ijklt}\right)^2, \tag{2}$$

where the inner sum over t = 1, 2, 3, 4 is the estimated probability that a randomly assigned individual whose response pattern is (i, j, k, l) is assigned correctly.

Using the corresponding information in Table 2, we find with formula (2) that the estimate of $E_R$ is .34. (More on formula [2] will be included in Section A.3 of the Appendix.)

For those readers who may be interested in analyzing in more detail the proportion $E_M$ of incorrect assignments obtained with assignment procedure M, and the expected proportion $E_R$ of incorrect assignments obtained with assignment procedure R, a method of partitioning these error rates will be presented in the next section.
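The expected error rate for procedure R can be recomputed from Table 2 in the same way. In this sketch (hypothetical names `TABLE2` and `error_rate_random`), the probability that a randomly assigned individual lands in his or her own latent class is taken to be the sum of the squared conditional probabilities for that individual's response pattern.

```python
# Table 2 of the text: (A, B, C, D, observed frequency, mu_1, mu_2, mu_3, mu_4).
TABLE2 = [
    (1, 1, 1, 1, 458, .9355, .0529, .0097, .0019),
    (1, 1, 1, 2, 140, .5937, .3866, .0062, .0136),
    (1, 1, 2, 1, 110, .3864, .0219, .4970, .0947),
    (1, 1, 2, 2,  49, .1737, .1131, .2234, .4899),
    (1, 2, 1, 1, 171, .5959, .3844, .0062, .0135),
    (1, 2, 1, 2, 182, .1150, .8539, .0012, .0299),
    (1, 2, 2, 1,  56, .1747, .1127, .2247, .4880),
    (1, 2, 2, 2,  87, .0239, .1773, .0307, .7681),
    (2, 1, 1, 1, 184, .7349, .0416, .1878, .0358),
    (2, 1, 1, 2,  75, .4053, .2640, .1036, .2271),
    (2, 1, 2, 1, 531, .0259, .0015, .8170, .1556),
    (2, 1, 2, 2, 281, .0098, .0064, .3082, .6757),
    (2, 2, 1, 1,  85, .4072, .2627, .1041, .2260),
    (2, 2, 1, 2,  97, .0664, .4927, .0170, .4240),
    (2, 2, 2, 1, 338, .0098, .0063, .3101, .6737),
    (2, 2, 2, 2, 554, .0012, .0090, .0381, .9518),
]


def error_rate_random(table):
    """Expected proportion of incorrect assignments under random assignment."""
    n = sum(row[4] for row in table)
    # For each pattern, sum_t mu_t**2 is the chance that the assigned class
    # coincides with the class the individual is actually in.
    expected_correct = sum(
        freq * sum(p * p for p in probs)
        for a, b, c, d, freq, *probs in table
    )
    return 1.0 - expected_correct / n


e_r = error_rate_random(TABLE2)
print(round(e_r, 2))
```

Rounded to two decimals, the result agrees with the .34 reported in the text.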

We have been considering in this section the estimate of the proportion (or the expected proportion) of incorrect assignments obtained with an assignment procedure as our first criterion for assessing the assignment procedure. We noted earlier that assignment procedure M minimizes the proportion of incorrect assignments. (More on this in Section A.2 of the Appendix.) On the other hand, we also found that the proportion of individuals assigned to each of the latent classes by assignment procedure M can be very different from the corresponding estimated proportions obtained under the latent class model.

Our second criterion for assessing an assignment procedure is the comparison of the proportion of individuals assigned to each of the latent classes by the assignment procedure with the corresponding estimated proportions obtained under the latent class model. We shall now see why the expected proportion of individuals assigned to each of the four latent classes using assignment procedure R can approximate the corresponding proportion, $\pi^X_t$ for t = 1, 2, 3, 4, estimated under the latent class model.

To explain the above phenomenon, we make use of the following formula:

  $$\sum_{i,j,k,l} \left(\frac{F_{ijkl}}{n}\right) \mu^{ABCD}_{ijklt} = \pi^X_t, \tag{3}$$

for t = 1, 2, 3, 4, where $F_{ijkl}$ is the expected number of individuals whose response pattern is (i, j, k, l) under the latent class model, and where n is the total number of individuals in the cross-classification table. (More on this formula will also be included in Section A.3 in the Appendix.) When the latent class model fits the observed data in the cross-classification table well (i.e., when the estimated $F_{ijkl}$ are sufficiently close to the corresponding observed $f_{ijkl}$), we can approximate the quantity in parentheses on the left side in formula (3) by $f_{ijkl}/n$, and formula (3) will then be approximated by

  $$\sum_{i,j,k,l} \left(\frac{f_{ijkl}}{n}\right) \mu^{ABCD}_{ijklt} = \tilde{\pi}^X_t, \tag{4}$$

where $\tilde{\pi}^X_t$ denotes an approximation to $\pi^X_t$, for t = 1, 2, 3, 4. Since assignment procedure R assigns each of the $f_{ijkl}$ individuals whose response pattern is (i, j, k, l) at random to latent classes 1, 2, 3, 4, using the estimated probability distribution $\mu^{ABCD}_{ijklt}$ (for t = 1, 2, 3, 4), we see that the expected number of these individuals who are assigned to latent class t is simply $f_{ijkl}\, \mu^{ABCD}_{ijklt}$, for t = 1, 2, 3, 4. When assignment procedure R is applied to each of the $f_{ijkl}$ individuals whose response pattern is (i, j, k, l), for each of the 16 possible response patterns, the expected proportion of these n individuals who are assigned to latent class t is given by formula (4), which is an approximation to $\pi^X_t$, for t = 1, 2, 3, 4.

Using the corresponding information in Table 2, we find with formula (4) that, despite the fact that the $f_{ijkl}$ differ somewhat from the corresponding $F_{ijkl}$, the approximations $\tilde{\pi}^X_t$ given by formula (4) (for t = 1, 2, 3, 4) turned out to be equal to the corresponding $\pi^X_t$ to four decimal places. (For greater accuracy, the information actually used here with formula [4] used eight decimal places for the $\mu^{ABCD}_{ijklt}$ rather than the corresponding four decimal places presented in Table 2.)

Assignment procedure R replaces the $F_{ijkl}$ in formula (3) with the corresponding observed number $f_{ijkl}$, and it replaces the $\mu^{ABCD}_{ijklt}$ with a random assignment procedure in which the expected proportion of the $f_{ijkl}$ individuals who are assigned to latent class t is equal to $\mu^{ABCD}_{ijklt}$, for t = 1, 2, 3, 4. Since assignment procedure R is a random assignment procedure, the actual proportion of the $f_{ijkl}$ individuals assigned to each latent class will differ in a random way from the corresponding expected proportion. Also, the number of individuals $f_{ijkl}$ whose response pattern is (i, j, k, l) will differ from the corresponding $F_{ijkl}$ in a random way, given that the latent class model applied to the data in Table 1 is a correct model for describing these data. The actual proportion of individuals assigned to each latent class using the assignment procedure R should approximate the corresponding $\pi^X_t$, for t = 1, 2, 3, 4, estimated under the latent class model; but this approximation will depend on the magnitude of the differences between the $f_{ijkl}$ and the corresponding $F_{ijkl}$, on the size of the $f_{ijkl}$, and on the conditional probabilities $\mu^{ABCD}_{ijklt}$, for t = 1, 2, 3, 4.
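The approximation in formula (4), the frequency-weighted average of the conditional probabilities for each latent class, can be checked numerically. The sketch below uses hypothetical names (`TABLE2`, `approx_class_proportions`) and only the four-decimal values printed in Table 2, so agreement with the model proportions is expected to about three decimal places rather than the four obtained in the text with eight-decimal inputs.

```python
# Table 2 of the text: (A, B, C, D, observed frequency, mu_1, mu_2, mu_3, mu_4).
TABLE2 = [
    (1, 1, 1, 1, 458, .9355, .0529, .0097, .0019),
    (1, 1, 1, 2, 140, .5937, .3866, .0062, .0136),
    (1, 1, 2, 1, 110, .3864, .0219, .4970, .0947),
    (1, 1, 2, 2,  49, .1737, .1131, .2234, .4899),
    (1, 2, 1, 1, 171, .5959, .3844, .0062, .0135),
    (1, 2, 1, 2, 182, .1150, .8539, .0012, .0299),
    (1, 2, 2, 1,  56, .1747, .1127, .2247, .4880),
    (1, 2, 2, 2,  87, .0239, .1773, .0307, .7681),
    (2, 1, 1, 1, 184, .7349, .0416, .1878, .0358),
    (2, 1, 1, 2,  75, .4053, .2640, .1036, .2271),
    (2, 1, 2, 1, 531, .0259, .0015, .8170, .1556),
    (2, 1, 2, 2, 281, .0098, .0064, .3082, .6757),
    (2, 2, 1, 1,  85, .4072, .2627, .1041, .2260),
    (2, 2, 1, 2,  97, .0664, .4927, .0170, .4240),
    (2, 2, 2, 1, 338, .0098, .0063, .3101, .6737),
    (2, 2, 2, 2, 554, .0012, .0090, .0381, .9518),
]


def approx_class_proportions(table):
    """Frequency-weighted average of the conditional probabilities, per class."""
    n = sum(row[4] for row in table)
    totals = [0.0, 0.0, 0.0, 0.0]
    for a, b, c, d, freq, *probs in table:
        for t, p in enumerate(probs):
            totals[t] += freq * p  # expected number assigned to class t + 1
    return [total / n for total in totals]


approx = approx_class_proportions(TABLE2)
model = [.2720, .1284, .2315, .3680]  # proportions estimated under the model
for t, (a_t, m_t) in enumerate(zip(approx, model), start=1):
    print(t, round(a_t, 4), m_t)
```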

5. THE PARTITIONING OF ERROR RATES

We first consider the proportion $E_M$ of incorrect assignments using assignment procedure M. For the $f_{ijkl}$ individuals whose response pattern is (i, j, k, l), using assignment procedure M, the proportion of these individuals who are assigned correctly is $\mu^{ABCD}_{ijklm}$, where m is a modal latent class, and where

  $$\mu^{ABCD}_{ijklm} = \max_{t}\, \mu^{ABCD}_{ijklt}.$$

The proportion of these individuals who are assigned incorrectly is $\mu^{ABCD}_{ijklt}$ for all t ≠ m. And so, for the $f_{ijkl}$ individuals whose response pattern is (i, j, k, l), we find that $f_{ijkl}\, \mu^{ABCD}_{ijklm}$ of them are assigned correctly, and $f_{ijkl}\, \mu^{ABCD}_{ijklt}$ are assigned incorrectly for each latent class t ≠ m.

Now let us partition the 16 response patterns into four separate sets, with $\Delta_m$ denoting the set of those response patterns that have latent class m as their modal class, for m = 1, 2, 3, 4. For example, we see from the information in Table 2 that the set $\Delta_1$ consists of the response patterns (1, 1, 1, 1), (1, 1, 1, 2), (1, 2, 1, 1), (2, 1, 1, 1), (2, 1, 1, 2), (2, 2, 1, 1); and the set $\Delta_4$ consists of the response patterns (1, 1, 2, 2), (1, 2, 2, 1), (1, 2, 2, 2), (2, 1, 2, 2), (2, 2, 2, 1), (2, 2, 2, 2). Let $f_{\Delta_m}$ denote the total number of individuals whose response pattern is one of the response patterns in the set $\Delta_m$, for m = 1, 2, 3, 4. (From the information in Table 2, we see that $f_{\Delta_1} = 1113$, $f_{\Delta_2} = 279$, $f_{\Delta_3} = 641$, $f_{\Delta_4} = 1365$, as was noted earlier in Section 3.) Using assignment procedure M, the proportion of the n individuals in the cross-classification table who are assigned to latent class m is simply $f_{\Delta_m}/n$. And the proportion of the $f_{\Delta_m}$ individuals who are assigned correctly is $\sum_{\Delta_m} f_{ijkl}\, \mu^{ABCD}_{ijklm} / f_{\Delta_m}$, for m = 1, 2, 3, 4, where the summation is made over all response patterns in the set $\Delta_m$. Similarly, the proportion of the $f_{\Delta_m}$ individuals who are assigned incorrectly is $\sum_{\Delta_m} f_{ijkl}\, \mu^{ABCD}_{ijklt} / f_{\Delta_m}$, for all t ≠ m, for m = 1, 2, 3, 4. Using the information in Table 2, we now present in Table 3 a partitioning of the proportion of correct assignments and the proportion of incorrect assignments using assignment procedure M. (For related material on this procedure, see Vermunt and Magidson [2005].)

Table 3.  Partition of the Proportion of Correct Assignments and the Proportion of Incorrect Assignments Using Assignment Procedure M

                 Proportion        Proportion of Those Who Were in Column Latent Class
  Latent         Assigned to       Among Those Who Were Assigned to Row Latent Class*
  Class          Latent Class*       1        2        3        4
    1               .3275          .7311    .1742    .0517    .0430
    2               .0821          .0981    .7283    .0067    .1669
    3               .1886          .0878    .0050    .7621    .1452
    4               .4017          .0199    .0265    .1749    .7788

  * See corresponding comments in this section.
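The partition for assignment procedure M can be recomputed from Table 2 by grouping the response patterns according to their modal latent class. The following is a sketch with hypothetical names (`TABLE2`, `partition_modal`); each returned row holds the proportion assigned to a latent class followed by the estimated composition of the individuals assigned to it.

```python
# Table 2 of the text: (A, B, C, D, observed frequency, mu_1, mu_2, mu_3, mu_4).
TABLE2 = [
    (1, 1, 1, 1, 458, .9355, .0529, .0097, .0019),
    (1, 1, 1, 2, 140, .5937, .3866, .0062, .0136),
    (1, 1, 2, 1, 110, .3864, .0219, .4970, .0947),
    (1, 1, 2, 2,  49, .1737, .1131, .2234, .4899),
    (1, 2, 1, 1, 171, .5959, .3844, .0062, .0135),
    (1, 2, 1, 2, 182, .1150, .8539, .0012, .0299),
    (1, 2, 2, 1,  56, .1747, .1127, .2247, .4880),
    (1, 2, 2, 2,  87, .0239, .1773, .0307, .7681),
    (2, 1, 1, 1, 184, .7349, .0416, .1878, .0358),
    (2, 1, 1, 2,  75, .4053, .2640, .1036, .2271),
    (2, 1, 2, 1, 531, .0259, .0015, .8170, .1556),
    (2, 1, 2, 2, 281, .0098, .0064, .3082, .6757),
    (2, 2, 1, 1,  85, .4072, .2627, .1041, .2260),
    (2, 2, 1, 2,  97, .0664, .4927, .0170, .4240),
    (2, 2, 2, 1, 338, .0098, .0063, .3101, .6737),
    (2, 2, 2, 2, 554, .0012, .0090, .0381, .9518),
]


def partition_modal(table, n_classes=4):
    """Return one (proportion assigned, composition by latent class) pair per class."""
    n = sum(row[4] for row in table)
    assigned = [0] * n_classes  # individuals whose pattern has modal class m
    weighted = [[0.0] * n_classes for _ in range(n_classes)]
    for a, b, c, d, freq, *probs in table:
        m = max(range(n_classes), key=lambda t: probs[t])  # 0-based modal class
        assigned[m] += freq
        for t, p in enumerate(probs):
            weighted[m][t] += freq * p  # estimated number actually in class t
    return [
        (assigned[m] / n, [w / assigned[m] for w in weighted[m]])
        for m in range(n_classes)
    ]


table3 = partition_modal(TABLE2)
for m, (prop, composition) in enumerate(table3, start=1):
    print(m, round(prop, 4), [round(x, 4) for x in composition])
```

The diagonal entries are the proportions correctly assigned within each assigned class, and the off-diagonal entries show where the incorrectly assigned individuals are estimated to belong.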

We noted in the preceding section, using the information in Table 2, that the proportion of incorrect assignments obtained with assignment procedure M was .24, and so the proportion of correct assignments was .76. With Table 3, we see (1) how the proportion of correct assignments varies depending on which assignment latent class is considered, and (2) how the proportion of incorrect assignments varies depending on which assignment latent class is considered and on which latent class the individuals are in.

Now let us consider assignment procedure R. For the $f_{ijkl}$ individuals whose response pattern is (i, j, k, l), using assignment procedure R, we find that the expected number of these individuals who are assigned to latent class t is $f_{ijkl}\,\pi^{ABCD\bar{X}}_{ijklt}$, for t = 1, 2, 3, 4; and the expected number of these individuals who are assigned correctly to this latent class is $f_{ijkl}\,(\pi^{ABCD\bar{X}}_{ijklt})^2$. The expected number who are assigned incorrectly, that is, the expected number who are assigned to latent class t and are in latent class s, where s ≠ t, is

$$f_{ijkl}\,\pi^{ABCD\bar{X}}_{ijklt}\,\pi^{ABCD\bar{X}}_{ijkls}.$$

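Now using assignment procedure R, the following formula is the expected proportion of the n individuals in the cross-classification table who are assigned to latent class t:

$$\sum_{i,j,k,l} f_{ijkl}\,\pi^{ABCD\bar{X}}_{ijklt} \Big/ n \tag{5}$$

Formula (6) is the proportion of the expected number of individuals correctly assigned to latent class t among the expected number assigned (correctly or incorrectly) to that latent class:

$$\sum_{i,j,k,l} f_{ijkl}\,\bigl(\pi^{ABCD\bar{X}}_{ijklt}\bigr)^2 \Big/ \sum_{i,j,k,l} f_{ijkl}\,\pi^{ABCD\bar{X}}_{ijklt} \tag{6}$$

Formula (7) is the proportion of the expected number of individuals in latent class s among the expected number assigned (correctly or incorrectly) to latent class t, where s ≠ t:

$$\sum_{i,j,k,l} f_{ijkl}\,\pi^{ABCD\bar{X}}_{ijklt}\,\pi^{ABCD\bar{X}}_{ijkls} \Big/ \sum_{i,j,k,l} f_{ijkl}\,\pi^{ABCD\bar{X}}_{ijklt} \tag{7}$$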

Using the information in Table 2, we now present in Table 4 a partitioning of the expected proportion of correct assignments and the expected proportion of incorrect assignments using assignment procedure R. The entries in the column of the expected proportions assigned to each latent class were obtained using formula (5), for t = 1, 2, 3, 4; and the entries on the main diagonal of the part of Table 4 that can be viewed as a 4 × 4 table were obtained using formula (6), for t = 1, 2, 3, 4. The entries that are not on the main diagonal of the 4 × 4 table (say in row t and column s, for s ≠ t) were obtained using formula (7), for all s ≠ t, in row t = 1, 2, 3, 4.

Table 4. Partition of the Expected Proportion of Correct Assignments and the Expected Proportion of Incorrect Assignments Using Assignment Procedure R

Columns 1 to 4 give the proportion of the expected in column latent class among the expected assigned to row latent class.†

| Latent Class | Expected Proportion Assigned to Latent Class* | 1 | 2 | 3 | 4 |
|---|---|---|---|---|---|
| 1 | .2720 | .7135 | .1527 | .0822 | .0515 |
| 2 | .1284 | .3235 | .5025 | .0324 | .1415 |
| 3 | .2315 | .0966 | .0180 | .5787 | .3066 |
| 4 | .3680 | .0381 | .0494 | .1929 | .7196 |

\* See formula (5) and corresponding comments in this section.

† See formulas (6) and (7), and corresponding comments in this section.

We noted in the preceding section, using the information in Table 2, that the expected proportion of incorrect assignments obtained with assignment procedure R was .34, and so the expected proportion of correct assignments was .66. With Table 4, we see how the expected proportion of correct assignments varies, and how the expected proportion of incorrect assignments varies. Comparing the column of expected proportions assigned to each latent class in Table 4 with the corresponding column in Table 3, we also see how very different the expected proportions obtained with assignment procedure R are from the proportions obtained with assignment procedure M. We note again, as we did in the preceding section, that the expected proportion assigned to each latent class using assignment procedure R turned out, in this example, to be the same, to four decimal places, as the proportion of individuals in each of the latent classes estimated under the latent class model.
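The entries of Table 4 can be approximated in the same spirit. The Python sketch below is an illustration under stated assumptions, not the author's computation: it takes the conditional probabilities from Table 5, the observed frequencies from Table 6, and class proportions from the first column of Table 4, and it reads formulas (5), (6), and (7) as frequency-weighted sums of the estimated conditional class probabilities and their products. All names are hypothetical, and agreement with Table 4 is limited by the rounding of the inputs.

```python
from itertools import product

# Assumed inputs, copied from the article's tables (rounded to four decimals).
PI = [0.2720, 0.1284, 0.2315, 0.3680]   # class proportions (Table 4, first column)
P1 = [                                   # Table 5: P(response 1 | class t) for A, B, C, D
    [0.7543, 0.8056, 0.9098, 0.8325],
    [0.7543, 0.2665, 0.9098, 0.3015],
    [0.1112, 0.8056, 0.0755, 0.8325],
    [0.1112, 0.2665, 0.0755, 0.3015],
]
# Table 6 observed frequencies, ordered (1,1,1,1), (1,1,1,2), ..., (2,2,2,2).
OBS = [458, 140, 110, 49, 171, 182, 56, 87, 184, 75, 531, 281, 85, 97, 338, 554]
PATTERNS = list(product((1, 2), repeat=4))
N = sum(OBS)


def posterior(pattern):
    """Conditional probability of each latent class given a response pattern."""
    joint = []
    for t in range(4):
        p = PI[t]
        for var, resp in enumerate(pattern):
            p *= P1[t][var] if resp == 1 else 1.0 - P1[t][var]
        joint.append(p)
    total = sum(joint)
    return [j / total for j in joint]


row_total = [0.0] * 4                     # sum of f * p_t (expected number assigned to t)
cross = [[0.0] * 4 for _ in range(4)]     # sum of f * p_t * p_s (assigned to t, in s)
for pattern, f in zip(PATTERNS, OBS):
    post = posterior(pattern)
    for t in range(4):
        row_total[t] += f * post[t]
        for s in range(4):
            cross[t][s] += f * post[t] * post[s]

expected_assigned = [r / N for r in row_total]                  # formula (5)
table4 = [[cross[t][s] / row_total[t] for s in range(4)]        # (6) on the diagonal,
          for t in range(4)]                                    # (7) off the diagonal

for t in range(4):
    print(t + 1, round(expected_assigned[t], 4), [round(x, 4) for x in table4[t]])
```

Since the numerator of formula (7) is symmetric in s and t, the off-diagonal entries in row t and row s differ only through their denominators, which is a quick consistency check on the printed table.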

6. CONCLUSION

We have considered here the following two procedures for assigning individuals to latent classes: (1) assignment procedure M minimizes the number of incorrect assignments, and (2) assignment procedure R assigns individuals to the different latent classes in a way so that, for those individuals whose response pattern is (i, j, k, l), the expected proportion of these individuals assigned to each latent class will be equal to the corresponding conditional probability $\pi^{ABCD\bar{X}}_{ijklt}$ of being in that latent class, as estimated under the latent class model. Formulas (1) and (2) in this paper can be used to estimate the proportion of incorrect assignments using assignment procedure M and the expected proportion of incorrect assignments using assignment procedure R, respectively. In addition, with assignment procedure M, we can determine the proportion of individuals assigned to each of the latent classes; with assignment procedure R, we can determine the expected proportion of individuals assigned to each of the latent classes. These proportions and expected proportions can be compared with the corresponding proportion estimated to be in each latent class under the latent class model. The two criteria have been introduced here to help assess when assignment procedures M and/or R are satisfactory and when they are not.

Appendix

A.1. More on the Special Four-Class Latent Class Model

We shall begin this section by first considering the conditional probabilities $\pi^{\bar{A}X}_{1t}$, $\pi^{\bar{B}X}_{1t}$, $\pi^{\bar{C}X}_{1t}$, $\pi^{\bar{D}X}_{1t}$ (for t = 1, 2, 3, 4), which are parameters in the special four-class latent class model applied in Section 2. The $\pi^{\bar{A}X}_{1t}$ denotes the conditional probability of a response 1 on variable A, given that the respondent is in latent class t (for t = 1, 2, 3, 4); and so $1 - \pi^{\bar{A}X}_{1t}$ is the conditional probability of a response 2 on variable A, given that the respondent is in latent class t. The other three conditional probabilities, $\pi^{\bar{B}X}_{1t}$, $\pi^{\bar{C}X}_{1t}$, $\pi^{\bar{D}X}_{1t}$, are defined similarly. Table 5 presents each of these four conditional probabilities (for t = 1, 2, 3, 4) estimated under the special four-class latent class model.

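Table 5. Conditional Probabilities, $\pi^{\bar{A}X}_{1t}$, $\pi^{\bar{B}X}_{1t}$, $\pi^{\bar{C}X}_{1t}$, $\pi^{\bar{D}X}_{1t}$, Estimated Under the Special Four-Class Latent Class Model

| Latent Class t | $\pi^{\bar{A}X}_{1t}$ | $\pi^{\bar{B}X}_{1t}$ | $\pi^{\bar{C}X}_{1t}$ | $\pi^{\bar{D}X}_{1t}$ |
|---|---|---|---|---|
| 1 | .7543 | .8056 | .9098 | .8325 |
| 2 | .7543 | .2665 | .9098 | .3015 |
| 3 | .1112 | .8056 | .0755 | .8325 |
| 4 | .1112 | .2665 | .0755 | .3015 |

Note: The $\pi^{\bar{A}X}_{1t}$ is the conditional probability of a response 1 on variable A, given that the respondent is in latent class t (for t = 1, 2, 3, 4). The $\pi^{\bar{B}X}_{1t}$, $\pi^{\bar{C}X}_{1t}$, $\pi^{\bar{D}X}_{1t}$ are defined similarly.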

Under this latent class model, for each respondent in latent class t, the respondent's responses on variables A, B, C, and D are assumed to be mutually independent. The conditional probability of each of the 16 possible response patterns (i, j, k, l), for the respondents in latent class t, can thus be expressed as a product of the conditional probabilities described in the preceding paragraph; and the corresponding estimate can be calculated directly using the estimated conditional probabilities in Table 5.

Now let $\pi^{\bar{A}\bar{B}\bar{C}\bar{D}X}_{ijklt}$ denote the conditional probability of a response pattern (i, j, k, l), given that the respondent is in latent class t; and so $\pi^{X}_{t}\,\pi^{\bar{A}\bar{B}\bar{C}\bar{D}X}_{ijklt}$ is the expected proportion of individuals in the four-way cross-classification table {A, B, C, D} who are in latent class t and whose response pattern is (i, j, k, l). The estimate of the proportion $\pi^{X}_{t}$ of individuals in the four-way table {A, B, C, D} who are in latent class t was given earlier in Section 2 (for t = 1, 2, 3, 4), and the conditional probability $\pi^{\bar{A}\bar{B}\bar{C}\bar{D}X}_{ijklt}$ can be estimated for each of the 16 response patterns (i, j, k, l) using the estimated conditional probabilities in Table 5, as we noted in the preceding paragraph. For each response pattern (i, j, k, l), the sum of the $\pi^{X}_{t}\,\pi^{\bar{A}\bar{B}\bar{C}\bar{D}X}_{ijklt}$, summed from t = 1 to t = 4, is the expected proportion of individuals in the four-way table {A, B, C, D} whose response pattern is (i, j, k, l). And the expected frequency of response pattern (i, j, k, l) under the latent class model is obtained by multiplying the corresponding expected proportion by the total number of individuals in the table. The corresponding estimated expected frequencies for the 16 possible response patterns are presented in Table 6.

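Table 6. Comparison of Observed Frequencies in Table 1 with the Expected Frequencies Estimated Under the Special Four-Class Latent Class Model

| A | B | C | D | Observed Frequency | Estimated Expected Frequency |
|---|---|---|---|---|---|
| 1 | 1 | 1 | 1 | 458 | 454.7695 |
| 1 | 1 | 1 | 2 | 140 | 144.2282 |
| 1 | 1 | 2 | 1 | 110 | 109.1153 |
| 1 | 1 | 2 | 2 | 49 | 48.8634 |
| 1 | 2 | 1 | 1 | 171 | 172.3030 |
| 1 | 2 | 1 | 2 | 182 | 179.6693 |
| 1 | 2 | 2 | 1 | 56 | 58.2627 |
| 1 | 2 | 2 | 2 | 87 | 85.7585 |
| 2 | 1 | 1 | 1 | 184 | 188.5942 |
| 2 | 1 | 1 | 2 | 75 | 68.8156 |
| 2 | 1 | 2 | 1 | 531 | 530.5210 |
| 2 | 1 | 2 | 2 | 281 | 283.0927 |
| 2 | 2 | 1 | 1 | 85 | 82.1401 |
| 2 | 2 | 1 | 2 | 97 | 101.4501 |
| 2 | 2 | 2 | 1 | 338 | 337.2942 |
| 2 | 2 | 2 | 2 | 554 | 553.0921 |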
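The calculation of the expected frequencies described before Table 6 can be sketched in a few lines of Python. This is a minimal illustration under assumptions: the conditional probabilities are those in Table 5, the class proportions are taken from the first column of Table 4 (reported in the text as equal, to four decimal places, to the model estimates), and n is the total of the observed frequencies in Table 6. The names used are hypothetical, and because the inputs are rounded the results should agree with Table 6 only approximately.

```python
from itertools import product

# Assumed inputs, copied from the article's tables (rounded to four decimals).
PI = [0.2720, 0.1284, 0.2315, 0.3680]   # class proportions (Table 4, first column)
P1 = [                                   # Table 5: P(response 1 | class t) for A, B, C, D
    [0.7543, 0.8056, 0.9098, 0.8325],
    [0.7543, 0.2665, 0.9098, 0.3015],
    [0.1112, 0.8056, 0.0755, 0.8325],
    [0.1112, 0.2665, 0.0755, 0.3015],
]
# Table 6 observed frequencies, ordered (1,1,1,1), (1,1,1,2), ..., (2,2,2,2).
OBS = [458, 140, 110, 49, 171, 182, 56, 87, 184, 75, 531, 281, 85, 97, 338, 554]
PATTERNS = list(product((1, 2), repeat=4))
N = sum(OBS)


def pattern_given_class(pattern, t):
    """P(pattern | class t) as a product over A, B, C, D (local independence)."""
    p = 1.0
    for var, resp in enumerate(pattern):
        p *= P1[t][var] if resp == 1 else 1.0 - P1[t][var]
    return p


# Expected proportion of each pattern: sum over classes of pi_t * P(pattern | t).
expected_prop = [
    sum(PI[t] * pattern_given_class(pattern, t) for t in range(4))
    for pattern in PATTERNS
]
expected_freq = [N * p for p in expected_prop]

for pattern, obs, exp in zip(PATTERNS, OBS, expected_freq):
    print(pattern, obs, round(exp, 2))
```

The printed rows follow the row order of Table 6, so observed and expected frequencies can be compared line by line.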

The expected proportion $\pi^{X}_{t}\,\pi^{\bar{A}\bar{B}\bar{C}\bar{D}X}_{ijklt}$ considered in the preceding paragraph can also be viewed as the probability that an individual in the four-way table {A, B, C, D} is in latent class t and the individual's response pattern is (i, j, k, l). Thus, for each individual whose response pattern is (i, j, k, l), the conditional probability that the individual is in latent class t can be calculated by dividing $\pi^{X}_{t}\,\pi^{\bar{A}\bar{B}\bar{C}\bar{D}X}_{ijklt}$ by the sum of the $\pi^{X}_{t}\,\pi^{\bar{A}\bar{B}\bar{C}\bar{D}X}_{ijklt}$, summed from t = 1 to t = 4. The corresponding estimated conditional probability that an individual is in latent class t (for t = 1, 2, 3, 4), given that the individual's response pattern is (i, j, k, l), was presented earlier in Table 2.
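The division described above is an application of Bayes' rule, and can be sketched as follows. The Python function below is a hypothetical helper, not part of the article; it assumes the Table 5 conditional probabilities and class proportions taken from the first column of Table 4, so its output approximates, but need not exactly reproduce, the entries of Table 2.

```python
# Assumed inputs, copied from the article's tables (rounded to four decimals).
PI = [0.2720, 0.1284, 0.2315, 0.3680]   # class proportions (Table 4, first column)
P1 = [                                   # Table 5: P(response 1 | class t) for A, B, C, D
    [0.7543, 0.8056, 0.9098, 0.8325],
    [0.7543, 0.2665, 0.9098, 0.3015],
    [0.1112, 0.8056, 0.0755, 0.8325],
    [0.1112, 0.2665, 0.0755, 0.3015],
]


def class_posterior(pattern):
    """Return [P(class t | pattern) for t = 1..4] for a pattern such as (1, 2, 1, 1)."""
    joint = []
    for t in range(4):
        # Joint probability of class t and the pattern: pi_t * P(pattern | t).
        p = PI[t]
        for var, resp in enumerate(pattern):
            p *= P1[t][var] if resp == 1 else 1.0 - P1[t][var]
        joint.append(p)
    total = sum(joint)                    # marginal probability of the pattern
    return [j / total for j in joint]


post_1111 = class_posterior((1, 1, 1, 1))
post_2222 = class_posterior((2, 2, 2, 2))
modal_1111 = 1 + post_1111.index(max(post_1111))   # modal latent class, 1-based
modal_2222 = 1 + post_2222.index(max(post_2222))

print([round(p, 4) for p in post_1111], modal_1111)
print([round(p, 4) for p in post_2222], modal_2222)
```

The modal class returned for each pattern is the class to which assignment procedure M would assign every individual with that pattern, whereas assignment procedure R would use the four returned probabilities as assignment probabilities.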

A.2. On Assignment Procedure M

We first consider formula (1) in Section 4. For the $f_{ijkl}$ individuals whose response pattern is (i, j, k, l) on variables A, B, C, D, we find that, when assignment procedure M is applied, the number of individuals assigned correctly is $f_{ijkl}\,\mu^{ABCD}_{ijklm}$, where $\mu^{ABCD}_{ijklm}$ is defined just below formula (1), and $(f_{ijkl}\,\mu^{ABCD}_{ijklm})/n$ is the proportion of the total number of the n individuals in the cross-classification table whose response pattern is (i, j, k, l) and who have been assigned correctly by assignment procedure M. Thus, the proportion of the total number of individuals in the cross-classification table who have been assigned correctly is the quantity in braces on the right side of formula (1). And the proportion assigned incorrectly is one minus the proportion assigned correctly.

We noted earlier herein that assignment procedure M minimizes the proportion of incorrect assignments. If we first consider only assignment procedures in which all of the $f_{ijkl}$ individuals whose response pattern is (i, j, k, l) are assigned to the same latent class, then it is clear from formula (1) that assignment procedure M minimizes the proportion of incorrect assignments, since $\mu^{ABCD}_{ijklm}$ is the largest of the $\pi^{ABCD\bar{X}}_{ijklt}$ for t = 1, 2, 3, 4. (If there are two or more latent classes that are modal, then assignment of the $f_{ijkl}$ individuals to any one of the modal latent classes will still provide an assignment procedure that minimizes the proportion of incorrect assignments.)

We now consider an assignment procedure in which the proportion of the $f_{ijkl}$ individuals assigned to latent class t is $\theta_{ijklt}$ for t = 1, 2, 3, 4, where $\sum_{t=1}^{4}\theta_{ijklt} = 1$. The proportion of the $f_{ijkl}$ assigned correctly will then be $\sum_{t=1}^{4}\theta_{ijklt}\,\pi^{ABCD\bar{X}}_{ijklt}$, and this quantity will be maximized when

$$\theta_{ijklm} = 1 \quad\text{and}\quad \theta_{ijklt} = 0 \ \text{ for } t \neq m,$$

where m is a modal latent class. (If there are two or more latent classes that are modal, then the assignment procedure will minimize the proportion of incorrect assignments when $\theta_{ijklt} = 0$ for each latent class t that is not modal, and the $f_{ijkl}$ individuals are distributed among the latent classes that are modal in any manner.)
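The maximization claimed above follows from a one-line bound. In the sketch below, the pattern subscripts are suppressed and the shorthand is assumed here for illustration: $\theta_t$ stands for the proportion of the individuals with a given response pattern who are assigned to latent class t, $p_t$ for the conditional probability of being in latent class t given that pattern, and $\mu$ for the largest of the $p_t$.

```latex
\sum_{t=1}^{4} \theta_t \, p_t
  \;\le\; \Bigl( \max_{1 \le t \le 4} p_t \Bigr) \sum_{t=1}^{4} \theta_t
  \;=\; \mu ,
\qquad \text{given } \theta_t \ge 0 \ \text{ and } \ \sum_{t=1}^{4} \theta_t = 1 .
```

Equality holds exactly when $\theta_t = 0$ for every latent class t with $p_t < \mu$, which is the condition stated in the text, including the case of two or more modal latent classes.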

A.3. On Assignment Procedure R

We now consider formula (2) in Section 4. For the $f_{ijkl}$ individuals whose response pattern is (i, j, k, l) on variables A, B, C, D, we find that, when assignment procedure R is applied to these individuals, the expected number assigned correctly is $f_{ijkl}\sum_{t=1}^{4}\bigl(\pi^{ABCD\bar{X}}_{ijklt}\bigr)^2$, and $f_{ijkl}\sum_{t=1}^{4}\bigl(\pi^{ABCD\bar{X}}_{ijklt}\bigr)^2 / n$ is the expected proportion of the total number of the n individuals in the cross-classification table whose response pattern is (i, j, k, l) and who have been assigned correctly by assignment procedure R. Thus, the expected proportion of the total number of individuals in the cross-classification table who have been assigned correctly is the quantity in braces on the right side of formula (2). And so the expected proportion assigned incorrectly is one minus the expected proportion assigned correctly.
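The two error rates discussed in A.2 and A.3 can be approximated with a short Python sketch. It is illustrative only and rests on assumptions: formula (1) is read as one minus the frequency-weighted average of the modal conditional probability, formula (2) as one minus the frequency-weighted average of the sum of squared conditional probabilities, and the inputs are the rounded values in Table 5, Table 6, and the first column of Table 4. All names are hypothetical.

```python
from itertools import product

# Assumed inputs, copied from the article's tables (rounded to four decimals).
PI = [0.2720, 0.1284, 0.2315, 0.3680]   # class proportions (Table 4, first column)
P1 = [                                   # Table 5: P(response 1 | class t) for A, B, C, D
    [0.7543, 0.8056, 0.9098, 0.8325],
    [0.7543, 0.2665, 0.9098, 0.3015],
    [0.1112, 0.8056, 0.0755, 0.8325],
    [0.1112, 0.2665, 0.0755, 0.3015],
]
# Table 6 observed frequencies, ordered (1,1,1,1), (1,1,1,2), ..., (2,2,2,2).
OBS = [458, 140, 110, 49, 171, 182, 56, 87, 184, 75, 531, 281, 85, 97, 338, 554]
PATTERNS = list(product((1, 2), repeat=4))
N = sum(OBS)


def posterior(pattern):
    """Conditional probability of each latent class given a response pattern."""
    joint = []
    for t in range(4):
        p = PI[t]
        for var, resp in enumerate(pattern):
            p *= P1[t][var] if resp == 1 else 1.0 - P1[t][var]
        joint.append(p)
    total = sum(joint)
    return [j / total for j in joint]


correct_m = 0.0   # number assigned correctly under procedure M
correct_r = 0.0   # expected number assigned correctly under procedure R
for pattern, f in zip(PATTERNS, OBS):
    post = posterior(pattern)
    correct_m += f * max(post)                   # modal conditional probability
    correct_r += f * sum(p * p for p in post)    # sum of squared probabilities

error_m = 1.0 - correct_m / N
error_r = 1.0 - correct_r / N
print(round(error_m, 2), round(error_r, 2))
```

Because the largest of the conditional probabilities is never smaller than their sum of squares, the computed rate for procedure M can never exceed the expected rate for procedure R, in line with the values .24 and .34 reported in the text.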

We now consider formula (3) in Section 4. This formula was obtained in the following way: We first note that the formula for the conditional probability $\pi^{ABCD\bar{X}}_{ijklt}$ of being in latent class t, given that the response pattern was (i, j, k, l) on variables A, B, C, D, is the following:

$$\pi^{ABCD\bar{X}}_{ijklt} = \pi^{ABCDX}_{ijklt} \big/ \pi^{ABCD}_{ijkl}, \tag{A-1}$$

where $\pi^{ABCD}_{ijkl}$ is the probability that the response pattern will be (i, j, k, l), and $\pi^{ABCDX}_{ijklt}$ is the joint probability that the response pattern will be (i, j, k, l) and the latent class will be t. Also, the formula for the expected frequency $F_{ijkl}$ of response pattern (i, j, k, l) on variables A, B, C, D, can be written as follows:

$$F_{ijkl} = n\,\pi^{ABCD}_{ijkl}. \tag{A-2}$$

Now using formulas (A-1) and (A-2), we see that

$$F_{ijkl}\,\pi^{ABCD\bar{X}}_{ijklt} \big/ n = \pi^{ABCDX}_{ijklt}, \tag{A-3}$$

and so, formula (3) in Section 4 can be rewritten as follows:

$$\sum_{i,j,k,l} \pi^{ABCDX}_{ijklt} = \pi^{X}_{t}.$$

This formula states that the joint probability that the response pattern will be (i, j, k, l) on variables A, B, C, D, and the latent class will be t, when summed over variables A, B, C, and D, is simply equal to the probability that the latent class will be t. In other words, the probability distribution that the latent class will be t (for t= 1, 2, 3, 4) is simply equal to the marginal distribution obtained when the joint probability distribution that the response pattern will be (i, j, k, l) on variables A, B, C, D, and the latent class will be t (for t= 1, 2, 3, 4), is summed over variables A, B, C, and D.
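The marginalization just described can be checked numerically. The Python sketch below is a minimal illustration with hypothetical names; it assumes the Table 5 conditional probabilities and class proportions from the first column of Table 4, forms the model-based expected frequencies and conditional class probabilities, and confirms that the frequency-weighted sum of the conditional probabilities, divided by n, returns the class proportions that were put in.

```python
from itertools import product

# Assumed inputs, copied from the article's tables (rounded to four decimals).
PI = [0.2720, 0.1284, 0.2315, 0.3680]   # class proportions (Table 4, first column)
P1 = [                                   # Table 5: P(response 1 | class t) for A, B, C, D
    [0.7543, 0.8056, 0.9098, 0.8325],
    [0.7543, 0.2665, 0.9098, 0.3015],
    [0.1112, 0.8056, 0.0755, 0.8325],
    [0.1112, 0.2665, 0.0755, 0.3015],
]
PATTERNS = list(product((1, 2), repeat=4))
N = 3398   # total of the observed frequencies in Table 6


def joint_probs(pattern):
    """Joint probability of (pattern, class t) for t = 1..4."""
    joint = []
    for t in range(4):
        p = PI[t]
        for var, resp in enumerate(pattern):
            p *= P1[t][var] if resp == 1 else 1.0 - P1[t][var]
        joint.append(p)
    return joint


marginal = [0.0] * 4
for pattern in PATTERNS:
    joint = joint_probs(pattern)
    pattern_prob = sum(joint)                          # probability of the pattern
    expected_freq = N * pattern_prob                   # expected frequency, as in (A-2)
    post = [j / pattern_prob for j in joint]           # conditional probability, as in (A-1)
    for t in range(4):
        marginal[t] += expected_freq * post[t] / N     # each term is a joint probability

print([round(m, 4) for m in marginal])
```

The identity holds for the expected frequencies by construction; with the observed frequencies in their place it holds only approximately, to the extent that the model fits the data.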

REFERENCES

  • Clogg, C. C., and L. A. Goodman. 1984. “Latent Structure Analysis of a Set of Multidimensional Tables.” Journal of the American Statistical Association 79:762–71.
  • Clogg, C. C., and L. A. Goodman. 1985. “Simultaneous Latent Structure Analysis in Several Groups.” Sociological Methodology 16:81–110.
  • Coleman, J. S. 1964. Introduction to Mathematical Sociology. New York: Free Press.
  • Goodman, L. A. 1974a. “The Analysis of Systems of Qualitative Variables When Some of the Variables Are Unobservable. Part I-A Modified Latent Structure Approach.” American Journal of Sociology 79:1179–259.
  • Goodman, L. A. 1974b. “Exploratory Latent Structure Analysis Using Both Identifiable and Unidentifiable Models.” Biometrika 61:215–31.
  • Goodman, L. A. 1987. “New Methods for Analyzing the Intrinsic Character of Qualitative Variables Using Cross-Classified Data.” American Journal of Sociology 93:529–83.
  • Goodman, L. A. 2002. “Latent Class Analysis: The Empirical Study of Latent Types, Latent Variables, and Latent Structures.” Pp. 3–53 in Applied Latent Class Analysis, edited by J. A. Hagenaars and A. L. McCutcheon. Cambridge, England: Cambridge University Press.
  • Goodman, L. A. 2007. “Statistical Magic and/or Statistical Serendipity: An Age of Progress in the Analysis of Categorical Data.” Annual Review of Sociology 33.
  • Goodman, L. A., and W. H. Kruskal. 1954. “Measures of Association for Cross Classifications.” Journal of the American Statistical Association 49:732–84.
  • Gordon, A. D. 1999. Classification, Second Edition. Boca Raton, FL: Chapman & Hall/CRC.
  • Hagenaars, J. A., and A. L. McCutcheon, eds. 2002. Applied Latent Class Analysis. Cambridge, England: Cambridge University Press.
  • McLachlan, G. J., and D. Peel. 2000. Finite Mixture Models. New York: Wiley.
  • Magidson, J., and J. K. Vermunt. 2001. “Latent Class Factor and Cluster Models, Bi-Plots and Related Graphical Displays.” Pp. 223–64 in Sociological Methodology, vol. 31, edited by Michael E. Sobel and Mark P. Becker. Boston, MA: Blackwell Publishing.
  • Vermunt, J. K., and J. Magidson. 2005. Technical Guide for Latent GOLD 4.0: Basic and Advanced. Belmont, MA: Statistical Innovations.