## Introduction

Reconstructability analysis (RA) is a modeling methodology developed in the systems community (Klir, 1976, 1985; Conant, 1981; Krippendorff, 1981, 1986; Broekstra, 1979; Cavallo, 1979; and others) based on the work of Ashby (Klir, 1986). It uses set theory or information theory to assess models whose structures are defined by graph theory. These hypergraph structures specify which relationships between variables satisfactorily model the data, and allow one to posit relationships that are not merely dyadic (two-way), but of arbitrary ordinality (triadic, tetradic, etc.). In the set-theoretic version of RA (SRA), the input data are a set-theoretic relation, that is, a subset of the Cartesian product of the sets of values of the variables. In the information-theoretic version of RA (IRA), the input data are a frequency or probability distribution. In both, a model is the relation or distribution (henceforth the word “relation” will be used for either) that maximizes entropy subject to the constraints of the model's structure. IRA partially overlaps log-linear (LL) methods and logistic regression (LR) (Bishop et al., 1978; Knoke & Burke, 1980), as well as Bayesian networks and other graphical models (Lauritzen, 1996). In the areas of overlap, RA and these other methods are typically equivalent. For example, in many contexts the maximum entropy solutions of RA give identical results to the maximum likelihood solutions of other methods.
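To make the maximum entropy idea concrete, the following minimal sketch fits an IRA-style model by iterative proportional fitting (IPF), a standard algorithm for finding the maximum entropy distribution that matches a given set of marginal distributions. The text above does not say which algorithm RA software uses, so the choice of IPF is an assumption here. The function names, the encoding of each relation as a tuple of variable positions, and the toy XOR-like distribution over three binary variables A, B, Z are likewise hypothetical and chosen only for illustration.

```python
from itertools import product
from math import log2


def margin(dist, idx):
    """Sum a distribution over all variables except those at positions `idx`."""
    out = {}
    for state, p in dist.items():
        key = tuple(state[i] for i in idx)
        out[key] = out.get(key, 0.0) + p
    return out


def max_entropy_fit(data, relations, sweeps=50):
    """IPF sketch: start from the uniform distribution and repeatedly rescale
    it so that its margins match the data margins named by `relations`."""
    n_vars = len(next(iter(data)))
    values = [sorted({s[i] for s in data}) for i in range(n_vars)]
    states = list(product(*values))
    q = {s: 1.0 / len(states) for s in states}
    targets = [margin(data, r) for r in relations]
    for _ in range(sweeps):
        for r, target in zip(relations, targets):
            current = margin(q, r)
            for s in states:
                key = tuple(s[i] for i in r)
                if current[key] > 0:
                    q[s] *= target.get(key, 0.0) / current[key]
    return q


def information_loss(data, model):
    """Kullback-Leibler divergence (in bits) of the data from the model."""
    return sum(p * log2(p / model[s]) for s, p in data.items() if p > 0)


# Hypothetical toy data: Z = A XOR B, all four (A, B) pairs equally likely.
xor_data = {(0, 0, 0): 0.25, (0, 1, 1): 0.25, (1, 0, 1): 0.25, (1, 1, 0): 0.25}

# Structure AB:AZ:BZ (all pairwise relations) versus the saturated structure ABZ.
pairwise = max_entropy_fit(xor_data, [(0, 1), (0, 2), (1, 2)])
saturated = max_entropy_fit(xor_data, [(0, 1, 2)])
print(information_loss(xor_data, pairwise), information_loss(xor_data, saturated))
```

For this toy distribution every pairwise margin is uniform, so the fitted AB:AZ:BZ model is itself uniform and loses one bit of information, while the saturated model ABZ reproduces the data with no loss. The example is deliberately extreme: it shows a triadic relationship that no set of dyadic relations captures.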

Given a frequency distribution, IRA is an information-theoretic approach to statistical multivariate analysis; given instead a probability distribution with no sample size, it yields an analysis that is nonstatistical. Set-theoretic mappings can also be treated probabilistically (and nonstatistically) with IRA. In its “k-systems” version (Jones, 1985), IRA can analyze continuous functions of nominal variables by rescaling the functions and treating them as probability distributions. IRA also has a Fourier version (Zwick, 2004b), which, in conjunction with the k-systems approach, resembles regression. The possibility of additional versions of RA is also implicit in generalized information theory (Klir, 2005), which includes fuzzy distributions.

From the input data one can generate projections; for example, a relation ABZ generates projections AB, AZ, BZ, A, B, and Z. Projection drops variables by doing a logical “or” (in SRA) or a summation (in IRA) over the values of the variables being dropped. If data are decomposable without information loss into a set of projections (lower-dimensional distributions or set-theoretic relations) represented by a hypergraph, then the hypergraph (more precisely, the set of projections it indicates) determines a calculated distribution or relation that satisfactorily models the observed data. The possible hypergraphs define a lattice of structures, and a major concern of RA is how best to search this lattice for good models. Here, a “lattice” is a partially ordered graph that has a single upper node (structure) where all variables are included in one relation and a single lower node (structure) where variables or sets of variables are distributed into separate relations, that is, are independent of one another; this is explained more fully later in the “Methods” section, under Lattice of Structures. The RA lattice can also be augmented by graphical models related to but not encompassed by the RA formalism. By contrast, other methods (e.g., LR) that are similar or possibly equivalent to RA in the estimation of individual models often do not explicitly articulate this lattice of possible models or provide heuristics for searching it. When applied to genomic data, the RA lattice offers a taxonomy of epistasis types, where a type is defined coarsely by the simplest structure that fits the data without loss or defined in more detail by the vector of information values for different decompositions.
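The projection operation described above can be sketched in a few lines. The example below is illustrative only: the function names, the dictionary and set encodings, and the toy data are assumptions, not part of any RA software. A distribution is stored as a mapping from states to probabilities (IRA), and a relation as a set of allowed states (SRA).

```python
from collections import defaultdict
from itertools import combinations


def project(dist, variables, keep):
    """IRA projection: sum probabilities over the variables not in `keep`."""
    idx = [variables.index(v) for v in keep]
    out = defaultdict(float)
    for state, p in dist.items():
        out[tuple(state[i] for i in idx)] += p
    return dict(out)


def project_relation(rel, variables, keep):
    """SRA projection: a reduced tuple is present if ANY full tuple maps to it
    (a logical "or" over the dropped variables)."""
    idx = [variables.index(v) for v in keep]
    return {tuple(state[i] for i in idx) for state in rel}


def all_projections(dist, variables):
    """Every proper projection of a distribution, keyed by name (e.g. "AB")."""
    result = {}
    for k in range(1, len(variables)):
        for keep in combinations(variables, k):
            result["".join(keep)] = project(dist, variables, keep)
    return result


# Hypothetical toy distribution over binary variables A, B, Z.
p_abz = {(0, 0, 0): 0.25, (0, 1, 1): 0.25, (1, 0, 1): 0.25, (1, 1, 0): 0.25}
# Hypothetical toy relation: the set of (A, B, Z) tuples that occur.
r_abz = {(0, 0, 0), (0, 1, 1), (1, 1, 1)}

print(sorted(all_projections(p_abz, "ABZ")))   # names of the six projections
print(project(p_abz, "ABZ", "AB"))             # the AB projection
print(project_relation(r_abz, "ABZ", "AZ"))    # the AZ projection of the relation
```

For ABZ this enumerates exactly the six projections named in the text: A, B, Z, AB, AZ, and BZ.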

This paper amplifies the previous use of RA (Shervais et al., 2010) to detect epistasis in genomic data. This amplification consists of (i) a more complete description of RA methodology than offered in that earlier study, (ii) the use of RA to define a taxonomy of epistasis types, and (iii) new extensions of RA methodology. Because the focus of this paper is on RA methodology itself, no attempt is made to provide definitive expositions of the similarities and differences between RA and other methods (this is the subject of ongoing work); nonetheless, the relationships between RA and other methods are briefly considered in the final “Discussion” section.