*This paper describes a simple strategy for doing more reliable ethnography: after fieldwork has commenced, investigators can use thought experiments to recognize inconvenient phenomena. Two examples are discussed: “the ethnographic trial” and the “inconvenience sample.” The paper uses Clifford Geertz's classic “Notes on the Balinese Cockfight” as a case of how work could be made more reliable with such strategies. It highlights the value of systematically identifying aspects of the situation under study that have been excluded from the analysis.*

*Cross-cultural comparison of attitudes using rating scales may be seriously biased by response styles. This paper deals with statistical methods for detection of and correction for extreme response style (ERS), which is one of the well-documented response styles. After providing an overview of available statistical methods for dealing with ERS, we argue that the latent class factor analysis (LCFA) approach proposed by **Moors (2003)** has several advantages compared to other methods. Moors’ method involves defining a latent variable model which, in addition to the substantive factors of interest, contains an ERS factor. In LCFA the observed ratings can be treated as nominal responses, which is necessary for modeling ERS. We find strong evidence for the presence of ERS and, moreover, find that the groups differ not only in their attitudes but also in ERS. These findings underscore the importance of controlling for ERS when examining attitudes in cross-cultural research.*
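The core problem this abstract describes—groups with identical underlying attitudes producing different ratings because one group favors the scale endpoints—can be illustrated with a small simulation. The sketch below is not Moors's LCFA model; the thresholds, group setup, and sample sizes are invented purely for illustration.

```python
import random
import statistics

random.seed(0)

def rate(latent, high_ers):
    """Map a latent attitude (roughly N(0, 1)) to a 1-5 rating.
    High-ERS respondents use wider endpoint zones, so moderate
    attitudes still land on the extreme categories 1 and 5.
    The threshold values are invented for illustration."""
    cut = 0.5 if high_ers else 1.2
    if latent <= -cut:
        return 1
    if latent >= cut:
        return 5
    if latent < -cut / 3:
        return 2
    if latent > cut / 3:
        return 4
    return 3

# Both groups draw latent attitudes from the SAME distribution;
# only their response style differs.
group_a = [rate(random.gauss(0, 1), high_ers=False) for _ in range(5000)]
group_b = [rate(random.gauss(0, 1), high_ers=True) for _ in range(5000)]

def extreme_share(ratings):
    return sum(r in (1, 5) for r in ratings) / len(ratings)

print("extreme-response share, low ERS :", round(extreme_share(group_a), 2))
print("extreme-response share, high ERS:", round(extreme_share(group_b), 2))
print("rating variance, low ERS :", round(statistics.pvariance(group_a), 2))
print("rating variance, high ERS:", round(statistics.pvariance(group_b), 2))
```

Comparing raw means or variances across these groups would suggest substantive differences where none exist in the latent attitudes—which is why the abstract argues for modeling an explicit ERS factor.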

*The theoretical consequences of measurement error in outcome variables that are continuous are widely known by practitioners, at least for the classical model: purely random errors will lead to a loss of efficiency but not to bias in regression coefficients. When the outcome variable is binary, however, coefficients in both linear and nonlinear regression models will be biased, even if the measurement error (in this setting more commonly referred to as classification error) is purely random. This paper illustrates a method of correcting for misclassification bias that relies solely on the primary survey data. It is particularly suited to analyses of surveys where external validation of survey responses is unavailable but where there is strong reason to suspect contaminated data. This situation is common in observational studies of the health of populations. The technique is applied to a model of the antecedents of post-traumatic stress disorder (PTSD) using data from a large-scale cross-sectional survey of Vietnam-era veterans. Results show that when adjusted for errors in diagnoses, the sample PTSD prevalence estimate falls significantly; that failure to correct for misclassification in PTSD dramatically understates the effects of risk factors; and that this downward bias remains even when the model incorporates differential classification errors—that is, errors that are correlated with some of the explanatory variables in the model.*
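The attenuation the abstract describes, and the style of correction it applies, can be shown on a single two-by-two table. This is a minimal sketch, not the paper's estimator: the outcome rates and the sensitivity/specificity values are invented, and the correction shown is the standard matrix-method inversion for nondifferential misclassification.

```python
# true outcome rates by risk-factor status (invented for illustration)
p1_true, p0_true = 0.40, 0.10
sens, spec = 0.80, 0.90  # assumed classification error rates

def observe(p):
    """Rate actually recorded under nondifferential misclassification."""
    return sens * p + (1 - spec) * (1 - p)

def correct(p_obs):
    """Matrix-method correction: invert the misclassification equation."""
    return (p_obs - (1 - spec)) / (sens + spec - 1)

def odds_ratio(p1, p0):
    return (p1 / (1 - p1)) / (p0 / (1 - p0))

p1_obs, p0_obs = observe(p1_true), observe(p0_true)
print("true OR     :", round(odds_ratio(p1_true, p0_true), 2))
print("observed OR :", round(odds_ratio(p1_obs, p0_obs), 2))
print("corrected OR:", round(odds_ratio(correct(p1_obs), correct(p0_obs)), 2))
```

Even purely random errors pull the observed odds ratio toward 1, understating the risk factor's effect—exactly the downward bias the abstract reports for the PTSD model.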

*We examine several approaches for inferring logit models from empirical margins of predictor covariates and conditional margins containing the means of a binary response for each covariate margin. One method is to fit proxy data to the conditional response using the beta distribution, a process we call “margin analysis.” Proxy data can be obtained using three approaches: (1) implementing the iterative proportional fitting (IPF) procedure on the margin totals, (2) sampling from a larger relevant data source such as the census, and (3) enumerating, or sampling from, the combinatoric space of all possible tables constrained by the margins. The first procedure is a well-studied approach for estimating contingency tables from margins, but it does not necessarily maintain the associations between the covariates unless seeded with an initial table containing those associations. In the second approach, which is appropriate for analyzing sociodemographic covariates, we can use a large census sample adjusting for sampling biases observed in the empirical margins. However, the appropriateness of using a census proxy depends substantially on how similar the sampling pools are. Our third approach entails exploring the combinatoric space of all contingency tables constrained by the margins while considering the associations among the covariates. We aggregate the logit models estimated from each table in that space into a single model. This approach is more robust than the first two as it considers multiple proxies. While the estimated logit models from each approach are generally similar to one another, for the low-dimensional tables we explore in this paper, the combinatoric approach yields wider standard errors, which renders potentially significant coefficients insignificant. Finally, we suggest weighting the combinatoric models with evidence-relevant probabilities obtained using the multivariate Pólya distribution.*
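The IPF procedure named in approach (1)—and its key property that associations are preserved only through the seed table—can be sketched on a 2×2 table. The seed values and targets below are invented; the implementation is the textbook row/column rescaling algorithm, not the paper's code.

```python
def ipf(seed, row_targets, col_targets, tol=1e-10, max_iter=1000):
    """Iterative proportional fitting: alternately rescale rows and
    columns of `seed` until both margins match the targets."""
    t = [row[:] for row in seed]
    for _ in range(max_iter):
        for i, target in enumerate(row_targets):           # row step
            s = sum(t[i])
            t[i] = [x * target / s for x in t[i]]
        for j, target in enumerate(col_targets):           # column step
            s = sum(row[j] for row in t)
            for row in t:
                row[j] *= target / s
        # columns are exact after the column step; stop once rows fit too
        if all(abs(sum(t[i]) - r) < tol for i, r in enumerate(row_targets)):
            return t
    return t

# seed table carrying an association: odds ratio = (40*30)/(10*20) = 6
seed = [[40, 10],
        [20, 30]]
fitted = ipf(seed, row_targets=[60, 40], col_targets=[50, 50])

def odds(m):
    return (m[0][0] * m[1][1]) / (m[0][1] * m[1][0])

print([round(x, 2) for row in fitted for x in row])
print("seed OR  :", round(odds(seed), 3))
print("fitted OR:", round(odds(fitted), 3))
```

Because each step multiplies whole rows or columns by a constant, the cross-product (odds) ratio of the seed is carried into the fitted table unchanged—illustrating why an uninformative seed cannot recover the covariate associations.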

*We deal with the problem of quantifying the degree to which parameter estimates in a structural equation model can be biased when structural relationships are incorrectly specified by the researcher. We propose a framework that relates moment residuals to biases of parameter estimates and the overall noncentrality of the model. For each parameter in the model, the impact of either a particular moment residual or the overall model noncentrality can be evaluated, although the latter tends to give error bounds that are rather conservative. We provide illustrative analytical and empirical examples to demonstrate the steps in applying the proposed procedures. The first example is a mildly misspecified model with causal indicators mistaken to be effect indicators. The resulting biases can be approximated very accurately by accounting for the effect of a single misfitted residual moment. The second example is a grossly misspecified model in which a mediating latent variable was erroneously omitted. In this case, the misspecification spreads to all entries of the covariance matrix, and measures based on overall noncentrality give a good indication of the magnitudes of plausible biases. The third example is an empirical study using Holzinger-Swineford factor analysis data that shows how the procedures can be used in practice.*
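The mechanism behind the second example—omitting a mediator biases the remaining structural estimates—can be mimicked with a toy regression. This is ordinary least squares on simulated data, not the paper's SEM machinery, and the path coefficients are invented for illustration.

```python
import random

random.seed(3)

# true structure: X -> M -> Y, with NO direct X -> Y path
a, b = 0.7, 0.5  # hypothetical path coefficients
n = 20000
x = [random.gauss(0, 1) for _ in range(n)]
m = [a * xi + random.gauss(0, 1) for xi in x]
y = [b * mi + random.gauss(0, 1) for mi in m]

def ols_slope(xs, ys):
    """Simple-regression slope: cov(x, y) / var(x)."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((u - mx) * (v - my) for u, v in zip(xs, ys))
    sxx = sum((u - mx) ** 2 for u in xs)
    return sxy / sxx

# the misspecified model omits M and regresses Y directly on X:
# the estimated "direct effect" absorbs the whole indirect path a*b
direct_hat = ols_slope(x, y)
print("true direct effect of X on Y: 0.0")
print("estimate with mediator omitted:", round(direct_hat, 3))
```

The misspecified estimate converges to a·b = 0.35 rather than the true direct effect of zero, a bias that no amount of data removes—only a correctly specified structure does.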

*Recent research has shown that two entropy-based segregation indices possess an appealing mixture of basic and subsidiary but useful properties. It would appear that the only fundamental difference between the mutual information or M index, and the entropy information or H index, is that the second is a normalized version of the first. This paper introduces another normalized index in that family, the H* index, which captures segregation as the tendency of racial groups to have different distributions across schools. More importantly, the paper shows that applied researchers may do better using the M index than using either H or H* in two circumstances: (1) if they are interested in the decomposability of the measurement of segregation, and (2) if they are interested in a margin-free measurement of segregation changes. The shortcomings of the H and H* indices are illustrated by means of numerical examples, as well as with school segregation data by ethnic group in the U.S. public school system between 1989 and 2005.*
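The relationship between the M and H indices described above can be computed directly: M is the mutual information between the school and group margins of the joint distribution, and H normalizes M by the entropy of the group margin. The two-school, two-group counts below are invented for illustration.

```python
from math import log

# counts[school][group]: a toy two-school, two-group system
counts = [[80, 20],
          [20, 80]]

total = sum(sum(row) for row in counts)
p = [[c / total for c in row] for row in counts]
p_school = [sum(row) for row in p]
p_group = [sum(row[g] for row in p) for g in range(len(counts[0]))]

# mutual information (M) index: sum p_ug * log(p_ug / (p_u * p_g))
M = sum(p[u][g] * log(p[u][g] / (p_school[u] * p_group[g]))
        for u in range(len(p)) for g in range(len(p[0]))
        if p[u][g] > 0)

# entropy (H) index: M normalized by the entropy of the group margin
E = -sum(q * log(q) for q in p_group if q > 0)
H = M / E

print("M =", round(M, 4), " H =", round(H, 4))
```

Because H divides M by a margin-dependent entropy, a change in the ethnic composition alone can move H even when the association between schools and groups is unchanged—the margin-dependence the abstract warns about.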

*Optimal matching (OM) is a method that assesses sequence similarity. It was originally developed to study protein and DNA sequences and was later transferred to the social sciences. However, there is an ongoing debate on the adequacy of its use in the social sciences, as a superficial transfer may fail to account for the significant differences between typical sequences in biological and social settings. In this paper, I elaborate on these differences and introduce a distinction between two sequence types—namely, common ancestors and unfolding processes. While the first sequence type is typically found in biological settings (e.g., DNA sequences), the latter applies to most sequences studied in the social sciences (e.g., careers). Based on this distinction, I present a new way of coding sequences as an extension to conventional OM analyses and demonstrate its usefulness in simulated and empirical examples. The paper concludes with a discussion of this new approach and its integration into previous extensions of OM.*
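Conventional OM, the baseline the paper extends, computes a minimum-cost edit distance between sequences. The dynamic-programming sketch below uses a single substitution cost and a single indel cost; the career-state alphabet and the cost values are invented for illustration and do not reflect the paper's proposed coding.

```python
def om_distance(a, b, indel=1.0, sub=2.0):
    """Optimal matching distance: minimum total cost of insertions,
    deletions (cost `indel`), and substitutions (cost `sub`) needed
    to turn sequence a into sequence b (dynamic programming)."""
    n, m = len(a), len(b)
    d = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = i * indel
    for j in range(1, m + 1):
        d[0][j] = j * indel
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d[i][j] = min(
                d[i - 1][j] + indel,                          # delete a[i-1]
                d[i][j - 1] + indel,                          # insert b[j-1]
                d[i - 1][j - 1] + (0 if a[i - 1] == b[j - 1] else sub),
            )
    return d[n][m]

# toy career sequences: E = employed, U = unemployed, S = in school
print(om_distance("EEUUE", "EEUEE"))  # differs by one state
print(om_distance("SEEE", "EEE"))     # differs by one extra spell
```

Note that this alignment logic treats the sequences symmetrically, as if comparing descendants of a common ancestor—precisely the assumption the paper argues is inappropriate for unfolding processes such as careers.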

*This paper introduces a new method for decomposing group differences in the mean of a variable into various within-group and between-group components with respect to group categories of intermediary variables. This is accomplished by considering counterfactual outcomes that would be realized by social interventions that change the relationship among variables. Because such a change does not by itself determine the counterfactual outcome, the paper introduces and juxtaposes two different mechanisms—the mechanism of realizing the counterfactual state that deviates least from the existing state, and the mechanism of holding relations among variables other than those that are modified by a given intervention unchanged—and demonstrates that despite the large difference in the mechanisms, they yield highly congruent outcomes. As an illustrative example, the paper analyzes gender inequality in hourly wages in Japan and thereby demonstrates the usefulness of the new method for deriving policy implications.*
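The general idea of splitting a group mean difference into within-group and between-group components via a counterfactual distribution can be shown with a bare-bones standardization sketch. This is not the paper's estimator—the paper's counterfactual mechanisms are far more subtle—and the wage figures and employment-type categories below are invented.

```python
# hypothetical cell data: (mean hourly wage, head count) by employment
# type, for each group -- all numbers invented for illustration
data = {
    "men":   {"regular": (2500, 80), "nonregular": (1400, 20)},
    "women": {"regular": (2200, 40), "nonregular": (1300, 60)},
}

def shares(group):
    """Distribution of a group across employment types."""
    total = sum(n for _, n in data[group].values())
    return {k: n / total for k, (_, n) in data[group].items()}

def mean_wage(group, weight_shares=None):
    """Group mean wage, optionally under a counterfactual
    distribution across employment types."""
    w = weight_shares if weight_shares is not None else shares(group)
    return sum(data[group][k][0] * s for k, s in w.items())

gap = mean_wage("men") - mean_wage("women")
# counterfactual: women keep their within-type wages but take on the
# men's distribution across employment types
cf = mean_wage("women", shares("men"))
between = cf - mean_wage("women")   # component due to composition
within = mean_wage("men") - cf      # component due to within-type wages
print("total gap:", gap, "= between:", between, "+ within:", within)
```

The two components sum exactly to the total gap, which is the accounting identity any such decomposition must satisfy; the paper's contribution lies in defining which counterfactual states make these components interpretable as intervention effects.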

*Many basic questions in the social network literature center on the distribution of aggregate structural properties within and across populations of networks. Such questions are of increasing relevance given the growing availability of network data suitable for meta-analytic studies, as well as the rise of study designs that involve the collection of data on multiple networks drawn from a larger population. Despite this, little work has been done on model-based inference for the properties of graph populations, or on methods for comparing such populations. Here, we attempt to rectify this gap by introducing a family of techniques that combines an existing approach to the identification of structural biases in network data (the use of conditional uniform graph quantiles) with strategies drawn from nonparametric Bayesian analysis. Conditional uniform graph quantiles are the quantiles of an observed structural property in the reference distribution produced by evaluating that property over all graphs with certain fixed characteristics (e.g., size or density). These quantiles have long been used to measure the extent to which a property of interest on a single network deviates from what would be expected given that network’s other characteristics. The methods introduced here employ such quantile information to allow for principled inference regarding the distribution of structural biases within (and comparison across) populations of networks, given data sampled at the network level. The data requirements of these methods are minimal, thus making them well-suited to meta-analytic applications for which complete network data (as opposed to summary statistics) are often unavailable. The structural biases inferred using these methods can be expressed in terms of posterior predictives for familiar and easily communicated quantities, such as p-values. 
In addition to the methods themselves, we present algorithms for posterior simulation from this model class, illustrating their use with applications to the analysis of social structure within urban communes and radio communications among emergency personnel. We also discuss how this approach may be applied to quantiles arising from other reference distributions, such as those obtained using general exponential-family random graph models.*
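A conditional uniform graph quantile of the kind these methods build on can be estimated by Monte Carlo: sample graphs uniformly with the observed size and edge count, and locate the observed statistic in that reference distribution. The sketch below uses the triangle count on an invented ten-node graph; it illustrates only the quantile computation, not the Bayesian population model the abstract describes.

```python
import random
from itertools import combinations

random.seed(1)

def triangles(edges, n):
    """Count triangles in an undirected graph on nodes 0..n-1."""
    adj = {i: set() for i in range(n)}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return sum(1 for a, b, c in combinations(range(n), 3)
               if b in adj[a] and c in adj[a] and c in adj[b])

def cug_quantile(observed_edges, n, reps=2000):
    """Quantile of the observed triangle count in the uniform
    distribution over graphs with the same size and edge count."""
    m = len(observed_edges)
    obs = triangles(observed_edges, n)
    pairs = list(combinations(range(n), 2))
    hits = sum(triangles(random.sample(pairs, m), n) <= obs
               for _ in range(reps))
    return hits / reps

# an observed graph with one closed triangle plus scattered edges
observed = [(0, 1), (1, 2), (0, 2), (3, 4), (5, 6), (7, 8)]
q = cug_quantile(observed, n=10)
print("CUG quantile of triangle count:", q)
```

A quantile near 1 indicates more closure than expected given size and density alone; collections of such quantiles, one per sampled network, are exactly the minimal data the inferential methods above require.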

*General random graphs (i.e., stochastic models for networks incorporating heterogeneity and/or dependence among edges) are in increasingly wide use in the study of social and other networks, but few techniques other than simulation have been available for studying their behavior. On the other hand, random graphs with independent edges (i.e., the Bernoulli graphs) are well-studied, and a large literature exists regarding their properties. In this paper, we demonstrate a method for leveraging this knowledge by constructing families of Bernoulli graphs that bound the behavior of an arbitrary random graph in a well-defined sense. By studying the behavior of these Bernoulli graph bounds, we can thus constrain the properties of a given random graph. We illustrate the utility of this approach via application to several problems from the social network literature, including identifying degeneracy in Markov graph models, studying the potential impact of tie formation mechanisms on epidemic potential in sexual contact networks, and robustness testing of inhomogeneous Bernoulli models based on geographical covariates. Practical heuristics for assessing bound tightness and guidance for use in theoretical and methodological applications are also discussed.*
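The simplest instance of the bounding idea can be simulated directly: for an inhomogeneous Bernoulli graph whose edge probabilities all lie in an interval, the homogeneous Bernoulli graphs at the interval endpoints bound monotone statistics such as the expected edge count. The covariate-driven probabilities below are invented, and this sketch covers only this trivial case, not the general bounds the paper constructs for dependent-edge models.

```python
import random

random.seed(2)

n = 12
# an inhomogeneous Bernoulli model: hypothetical edge probabilities
# driven by a "distance" covariate, all within [p_lo, p_hi]
pij = {(i, j): 0.1 + 0.4 * abs(i - j) / (n - 1)
       for i in range(n) for j in range(i + 1, n)}
p_lo, p_hi = min(pij.values()), max(pij.values())

def sample_edges(prob):
    """Draw one graph and return its edge count; `prob` maps each
    dyad to its edge probability."""
    return sum(random.random() < prob[d] for d in pij)

def mean_edges(prob, reps=2000):
    return sum(sample_edges(prob) for _ in range(reps)) / reps

flat_lo = {d: p_lo for d in pij}   # lower Bernoulli bound
flat_hi = {d: p_hi for d in pij}   # upper Bernoulli bound
m_lo, m_mid, m_hi = mean_edges(flat_lo), mean_edges(pij), mean_edges(flat_hi)
print(round(m_lo, 1), "<=", round(m_mid, 1), "<=", round(m_hi, 1))
```

The gap between the two bounding means also gives a feel for bound tightness: the wider the spread of edge probabilities, the less the homogeneous bounds constrain the model in between.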