A note on investigating co‐occurrence patterns and dynamics for many species, with imperfect detection and a log‐linear modeling parameterization

Abstract Patterns in, and the underlying dynamics of, species co‐occurrence is of interest in many ecological applications. Unaccounted for, imperfect detection of the species can lead to misleading inferences about the nature and magnitude of any interaction. A range of different parameterizations have been published that could be used with the same fundamental modeling framework that accounts for imperfect detection, although each parameterization has different advantages and disadvantages. We propose a parameterization based on log‐linear modeling that does not require a species hierarchy to be defined (in terms of dominance) and enables a numerically robust approach for estimating covariate effects. Conceptually, the parameterization is equivalent to using the presence of species in the current, or a previous, time period as predictor variables for the current occurrence of other species. This leads to natural, “symmetric,” interpretations of parameter estimates. The parameterization can be applied to many species, in either a maximum likelihood or Bayesian estimation framework. We illustrate the method using camera‐trapping data collected on three mesocarnivore species in South Texas.

that is, species may occur at a surveyed location, yet be undetected by the field methods employed (but see Cam et al., 2000). This will lead to "false absences" that may result in misleading inferences about species co-occurrence patterns. MacKenzie et al. (2006) demonstrated that when the probability of species detection is unaffected by the presence of other species, the direction of any association between the two species (i.e., positive or negative effect on co-occurrence) may be correctly estimated using methods that do not account for imperfect detection, but the magnitude of the dependence will be underestimated. Whereas, when detection probability of one species is different depending on the presence of the second species (e.g., due to behavioral changes in the presence of a competing species), using methods that ignore imperfect detection may not even estimate the direction of any association correctly. MacKenzie et al. (2004) developed a modeling approach to investigate co-occurrence patterns between two species, while accounting for imperfect detection. An important basis of their method is recognizing that with two species of interest, a surveyed location may be in one of four possible states defined by the presence or absence of each species (i.e., species A and B present, only species A present, only species B present, or neither species present). MacKenzie et al. (2004) parameterized the co-occurrence component of their model in terms of the joint probability of both species occurring at a unit ( AB ) and the marginal, or overall, probabilities of each species occupying a unit (i.e., A and B ). They suggested the level of co-occurrence could be quantified in terms of: where a value of 1 would imply independence. They used a similar parameterization for the detection component, noting that which species could be detected in a survey of a unit would depend on the "true" state of the location. Potential covariate relationships with any of the parameters could be explored; however, it was found to be numerically unstable because of the constraints imposed upon possible parameter values (MacKenzie et al., 2006). Richmond et al. (2010) and Waddle et al. (2010) independently implemented an alternative parameterization (hereafter referred to as the RW parameterization) of the MacKenzie et al. (2004) model that was more numerically robust, particularly with covariates. The RW parameterization requires identifying a hierarchy between species where species A is defined as the "dominant" species and species B is the "subordinate" species, where the "subordinate" species is the focal species in an analysis (i.e., how is the occurrence of species B affected by the presence/absence of species A). The model is parameterized in terms of the marginal occurrence probability of species A and the occurrence probability for species B conditional on species A being either present or absent from the unit (denoted here as B | A and B | a , respectively; with the lowercase "a" indicating absence of species A). A similar conditional parameterization was also implemented for the detection component of the model. The RW parameterization could be regarded as "asymmetric" as a direction to the interaction between species is assumed, while the MacKenzie et al. (2004) parameterization is "symmetric" as no direction is assumed.
While both the MacKenzie et al. (2004) and RW models were initially presented in the context of co-occurrence between two species, they generalize to situations with a greater number of species, with the number of possible parameters to estimate increasing exponentially with the number of species (although constraints could be applied to reduce the number of parameters in the model). Rota et al. (2016) developed a species co-occurrence model using a "multivariate Bernoulli distribution," which has one Bernoulli random variable per species. However, this is essentially the same general approach used by earlier authors, where possible states are defined in terms of the combinations of which species are present or absent. Therefore, the Rota et al. (2016)  The underlying dynamic processes of species co-occurrence are also of interest to many ecologists, although methods to quantify them have received much less attention than those examining co-occurrence patterns, particularly while also accounting for the imperfect detection of the target species (although see Fidino et al., 2019;Haynes et al., 2014;MacKenzie et al., 2006;Miller et al., 2012;Yackulic et al., 2014). As in the static co-occurrence situation, there are numerous ways in which such a model could be parameterized to quantify the level of interaction between species in terms of co-occurrence dynamics (e.g., Fidino et al., 2019;MacKenzie et al., 2006MacKenzie et al., , 2018. In this paper, we first note the link between the "multivariate Bernoulli distribution" used by Rota et al. (2016) and the well-known statistical method of log-linear modeling used for analyzing contingency table or count data (e.g., Poisson regression). Understanding this connection improves our ability to formulate, and interpret, models for more than two species. We also detail how a dynamic multispecies model could be defined using the log-linear framework, with a simple example application. In the following, we focus on how the models can be parameterized in terms of log-linear models and do not supply the details of the underlying modeling procedure, as that has been suitably described elsewhere (e.g., Fidino et al., 2019;MacKenzie et al., 2004MacKenzie et al., , 2009MacKenzie et al., , 2018Richmond et al., 2010;Rota et al., 2016;Waddle et al., 2010).

| Log-linear models
Log-linear models are used to analyze count data, particularly to assess the independence of factors used to construct contingency tables, and possibly other predictor variables. Analyses can be con- where K is a normalizing constant such that the i 's sum to 1.0. The U parameter defines the effect of level 2 of factor U on the probability when v = 1, the V parameter defines the effect of level 2 of factor V on the probability when u = 1. The UV parameter defines the level of interaction, or dependence, between factors U and V on the probability structure. The two factors are independent when UV = 0, and in many applications, it is the nature of the interaction between the factors on the cell probabilities (or counts) that is of interest. The cell probabilities for a 2 × 2 table are given in more detail in Table 1, where An equivalent approach to using the corner-point constraint is from the other levels for that factor. In the 2 × 2 contingency table case, using the first level of factors U and V as the "reference" levels, then the indicator variables z U i and z V i can be defined, which equal 1 if the observed factor level was 2, and equal 0 otherwise (Table 1).
The log-linear model can then be expressed as: Hence, in a regression context, the indicator variables are predictor variables representing the combination of factor levels for an observation, and the α terms are regression coefficients quantifying the magnitude of the effect for each factor level. Coefficients associated with an interaction between two (or more) factors, for example, the parameter UV for the z U i z V i interaction, quantify how the effect of one factor is different depending on the value of the other factor(s).
When there are more than 2 levels for a factor, then the loglinear model generalizes in the obvious manner. For example, if factor U had 2 levels and factor V contained 3, the indicator variables z V2 i and z V3 i could be defined to equal 1 if the observed factor level was 2 or 3, respectively. The log-linear model would then be: Similarly, the approach easily generalizes to a greater number of factors. For example, with three factors (U, V, and W) with two levels each, then: In all cases, K would be defined differently to ensure that the cell probabilities sum to one.

| Species co-occurrence data-single season
Species co-occurrence data, assuming perfect detection, can be represented as a contingency table. Each factor is a species, and in the absence/presence case, there are two levels for each species (henceforth denoted with lowercase and uppercase characters, respectively). The structure of the possible observations for two species (species A and B), indicator variables, and associated cell probability structure is given in Table 2. The log-linear model, expressed in terms of the indicator variables, would therefore be: TA B L E 1 Example of cell probability ( i ) structure for 2 × 2 contingency table, using the corner-point constraint. U and V are the factors of interest, each with 2 levels. The binary indicator variables (z U i and z V i ) for the second level of each factor are also presented where z A and z B are the binary-valued variables indicating the presence of each species. While covariates have not been considered here, the general cell probability structure is the same as that used Rota et al. (2016) where the set of indicator variables represent their "multivariate Bernoulli distribution," with A , B , and AB being equivalent to the f 1 , f 2 , and f 12 parameters defined by Rota et al. (2016).
As shown by Rota et al. (2016), the model parameters are directly interpretable in terms of the probability of each species being present, conditional upon the presence or absence of the other species.
That is: Therefore, A and B determine the probability of occupancy (on the logit-scale) for each species given the absence of the other species, and AB is the effect that the presence of one species has on the other. Hence, AB parameter is a symmetric measure of cooccurrence between the two species, where AB = 0 indicates the species co-occur independently, while a negative value indicate some form of exclusion or avoidance and a positive value indicate the species tend to occur together. Inferences about the level of co-occurrence between species could be based on estimates of AB (e.g., by considering confidence intervals), or one could "test" for independence of the species by comparing the fit of a model where AB is estimated, to the fit of a model with the constraint AB = 0. Note that the level of association can also be expressed as an odds ratio: Therefore, this is similar to the RW parameterization, but the interaction between species is modeled as a symmetric relationship.
Heuristically, the presence or absence of one species is being used as a covariate on the probability of occurrence of the other species.
The extension to more than two species is therefore straightforward.
For example, with three species a third indicator variable can be defined (z C ) and the model for the contingency table cell probabilities becomes: The parameters AB , AC and BC quantify the two-way interactions between species and ABC the three-way interaction. As noted by Rota et al. (2016), and also MacKenzie et al. (2018, p. 555), it is not always necessary to estimate higher-order interaction terms between many species, and in fact, very large sample sizes may be required to obtain reliable parameter estimates. Furthermore, complex interactions between many species will be difficult to interpret biologically. Therefore some higher-order interaction terms may be set equal to zero. In the log-linear modeling literature, this is known as conditional independence. For example, the occurrence of species A and B may appear to be not independent, but that is because both species have a nonindependent co-occurrence relationship with species C. Given the presence or absence of species C, species A and B occur independently of each other (i.e., species A and B are conditionally, upon species C, independent). This hypothesis could be fit by constraining ABC = 0 and AB = 0.

| Covariates
The effect of potential covariates on the occurrence, or cooccurrence, for each species can be easily incorporated in the loglinear modeling framework, where the effect of such covariates may be the same, or different for each species. For example, if a covariate x 1 is thought to affect the occurrence of species A, the covariate x 2 affect the occurrence of species B, but the level of co-occurrence interaction is unaffected by either covariate, the following model could be fit to the data: If covariate x 1 is also thought to affect the level of interaction between species, then another model could be fit: Interpretation of the covariate effects would proceed exactly as normal.

| Extension to multiple seasons
To examine how species co-occurrences change over time, it is necessary to have data from multiple seasons, preferably at where X → Y denotes the probability of transitioning from occupancy state X in season t to state Y in season t + 1 (where the states are denoted as above). Importantly, the elements of each row must sum to 1, as a unit must be in one of the four states by the next season. When there are l species of interest, then the dimension of the TPM will be 2 l × 2 l .  (Table 3).

As noted by
Let z X i denotes the presence of species X in given state in season t, z X j denotes the presence of the species in season t + 1. The general structure for the cell probability in row i and column j could be defined as: where K is a normalizing constant defined to ensure the probabilities for each row of the TPM sum to 1. This is a very general formulation, allowing complex relationships about the dynamic co-occurrence processes to be evaluated, providing sufficient data. However, the model can be simplified by applying constraints to some parameters. being present in season t + 1. One could make a-priori predictions about the expected direction of such effects based on whether the species are considered to exclude one another, or not.
Generalizing to a greater number of species is achieved by defining the respective set of binary indicator variables for the presence of each species in seasons t and t + 1, with potentially a large number of parameters associated with the full model (including all interaction terms among species). Regardless of whether it is possible to estimate many of those parameters for a given dataset, interpretation of the effects may be challenging. Hence, it is recommended that practitioners limit the number of interaction terms they include in a model when analyzing data and carefully consider the biological interpretation of the estimates.

| Modeling the detection component
An important consideration for modeling the detection component is that the possible number of categories, or types of detection, will  Table 4 for the two-species case. The number of possible observations can be accounted for by defining the detection component to be both a function of the true (but unknown) presence/ absence of the species (z X i indicator variables) and binary indicator variables based on the observed outcomes of each survey, which will be defined as h X k . Detection probability can therefore be defined using a log-linear modeling framework as: where,

| Example-mesocarnivores in Texas
The motivation for developing this parameterization of the multiseason co-occurrence model was a 7-year camera trap dataset of bobcats (Lynx rufus), ocelot (Leopardus pardalis), and coyote (Canis latrans) collected in South Texas . This dataset is part of a long-term ocelot monitoring study on the East Foundation's El Sauz Ranch in Willacy and Kenedy counties, Texas. Although ocelot share a geographic overlap with bobcats and coyotes from South Texas to Central Mexico (Hody & Kays, 2018;Horne et al., 2009;Sánchez-Cordero et al., 2008), interactions among this community are poorly understood in this region. Coyote interactions with bobcats are well studied across their shared geographic range and often do not exhibit spatial or temporal segregation (Lesmeister et al., 2015;Thornton et al., 2004). Studies on bobcat-ocelot interactions have indicated the two species share a dietary overlap (Booth-Binczik et al., 2013), temporally segregate movement rates (Leonard et al., 2020), and may exhibit resource partitioning at the shrub level (Horne et al., 2009). Ocelots and coyote interactions are poorly known, with co-occurrence likely facilitated by high availability of food resources and abundant cover . . Camera stations were spaced 1 km apart, which was based on the mean minimum distance moved for ocelots in the region . At a station, cameras were placed facing each other and offset 1-2 m, with each camera attached to a tree or wooden stake about 30 cm above the ground. Camera stations were maintained all year, and cameras were replaced if they malfunctioned .
A sampling season was defined to be a 20-week period, either 8 May to 23 September (hot season) or 8 November to 24 March (cool season). A survey was defined to be a 4-week period, that is, a species was detected (h X k = 1) if it was photographed at least once at a station during the 4-week period, and undetected (h X k = 0) otherwise. Hence, each season was comprised of 5 surveys. Surveys were defined to be a 4-week period such that detections of bobcats and coyotes within a survey period could be assumed independent (Lombardi et al., 2020, i.e., aggregating the camera data at a temporal scale of 4 weeks effectively removes the effect of any short-term behavioral interactions between species).
The log-linear parameterization discussed above provides a great deal of flexibility for examining the patterns and dynamics of cooccurrence between multiple species, especially given the ability to incorporate spatial and temporal covariates. However, given the number of camera stations deployed (i.e., 28 surveyed units), only relatively simple models are fit to the data here to illustrate some key concepts. Lombardi et al. (2020) conduct a fuller analysis of the dataset examining the effect of covariates.
Due to the lack of hunting pressure (for coyotes and bobcats) in the area, we expected a natural dynamic between the three species and defined overarching hypotheses: (a) probability of ocelot and bobcat occurrence and detection will be negatively influenced by the presence/detection of coyotes, (b) ocelot and bobcat will exhibit positive co-occurrence values, and (c) the presence of species will be influenced by the presence of another the previous season. Five models were fit to

TA B L E 4 Possible observations admitting imperfect detection.
Lowercase characters for the true state or survey observation (Obs) indicate the absence or nondetection of that species, respectively, while uppercase characters indicate the presence or detection of that species. z X i is the binary indicator variable for the presence or absence of species X and h X k is the binary indicator variable for the detection or nondetection of species X in a survey  (Table 5). While model parameters could be season-specific, they have been assumed to be season invari-

ant. Additional information about the exact parameterization is supplied
in the Supplemental Material. The same detection component was assumed for all models, where a separate detection probability was estimated for each species, which was assumed to be independent of both the presence and detection of other species. Model 1 assumes species occur near camera trap stations independently of each other, and the probability of occurrence is the same each season and independent of the species being present near a station in the previous season. Model 3 also assumes species occur independently of each other, although the probability of occurrence after season 1 depends on the presence of the species in the previous season. This is equivalent to modeling the occurrence of each species as independent single-species multiseason models (MacKenzie et al., 2003), where changes in occurrence are assumed to be a first-order Markov process.
The species co-occurrence models were fit using maximum likelihood techniques (e.g., MacKenzie et al., 2004MacKenzie et al., , 2009MacKenzie et al., , 2018Richmond et al., 2010;Waddle et al., 2010) using custom-written R code, although Bayesian methods could also be used (e.g., Fidino et al., 2019;Rota et al., 2016). Models were compared on the basis of Akaike's information criterion (AIC). Table 6 presents a summary of the five models fit to the mesocar- (standard error in parentheses). For each of the three species, the probability of occurrence in the current season is estimated to be higher if they were present in the previous season, particularly for ocelots, although the effect is small for bobcats (Table 7;

| D ISCUSS I ON
The log-linear parameterization outlined here for the multiseason, multispecies co-occurrence model is not unique, and other parameterizations are possible (e.g., Fidino et al., 2019;MacKenzie et al., 2006MacKenzie et al., , 2018. The log-linear parameterization provides the ability to directly estimate, and interpret, how the presence of species is affected by the presence of other species in either the current, or the previous, season. With this structure, the presence of each species is essentially being used as a predictor variable for the presence of other species, although the general framework that accounts for imperfect detection allows for the fact that the presence of any species may not be known with certainty. Furthermore, the parameterization can also be applied to the detection process, to allow for nonindependent detections of each species. Complexity breeds complexity. As practitioners attempt to address more complex questions of ecological data, more complex methods of analysis are generally required to provide quantitative inspections of those data. Such is the case with multiseason, multispecies co-occurrence models. Irrespective of the preferred parameterization to be used, proper analysis should involve careful consideration of hypotheses of interest, which species interactions should be included and whether such interactions change over time, effect of potential covariates for co-occurrence-and detection-related parameters. Proper analysis will require time, and some degree of skill in fitting and interpreting model results. While tools can be developed to simplify certain aspects of the process, TA B L E 5 Summary of effects included in each model fit to the Texas camera-trapping data. "2-way interaction" is interaction effects between pairs of species; "Depends on z X i " and "Depends on z Y i ε indicate whether occurrence in the current season depends on the presence of the focal (X), or other (Y) species in the previous season practitioners should have a realistic expectation that such analyses require a substantial investment of time and effort.
Practitioners are strongly encouraged to gain a realistic expectation of the type, and quantity, of data required to achieve their objectives, before embarking on any data collection. Complex models, with a large number of biologically relevant parameters to estimate, will require relatively large datasets to produce accurate estimates with suitable levels of precision. Simulation studies are an incredibly useful approach to evaluating the expected quality of the results from a proposed study design. The outcome will often be enlightening, and sometimes, sobering. While the exact outcome will depend on the specifics of the situation, in general we suggest that typically the number of sampling units required to be surveyed will be in the 100's rather than the 10's of units. This is based on our experience with similar models, and on the simple premise that there is not a lot of information in binary observations, and therefore, a large number of them tend to be required to obtain adequate precision of parameter estimates.
Log-linear modeling can be used in situations where a factor of interest has m levels (with m ≥ 2), by defining m − 1 indicator variables. In this paper, we have focused on situations where m = 2 (i.e., species presence or absence), although as alluded to above, this parameterization extends naturally to situations where the occurrence of species may be defined using a greater number of categories (e.g., absent, present without breeding, present with breeding). The loglinear modeling parameterization therefore provides a framework for assessing relevant questions about co-occurrence patterns and dynamics for these more complex situations, in combination with multistate occupancy models (e.g., MacKenzie et al., 2009;Nichols et al., 2007;Royle & Link, 2005).
This parameterization of a many-species co-occurrence model is currently being incorporated into Program PRESENCE and the RPresence R package. The data and R code used for the mesocarnivore example are available from the Dryad repository https://doi. org/10.5061/dryad.59zw3 r26t.

ACK N OWLED G M ENTS
We thank the East Foundation and the Tim and Karen Hixon Foundation for financial support for this research. We thank the Caesar

CO N FLI C T O F I NTE R E S T
None declared.

DATA AVA I L A B I L I T Y S TAT E M E N T
Data and associated R code used for the example in this manuscript are accessible in the repository Dryad. Please see https://doi. org/10.5061/dryad.59zw3 r26t.