Bias‐reduced estimators of conditional odds ratios in matched case‐control studies with unmatched confounding

We study bias‐reduced estimators of exponentially transformed parameters in general linear models (GLMs) and show how they can be used to obtain bias‐reduced conditional (or unconditional) odds ratios in matched case‐control studies. Two options are considered and compared: the explicit approach and the implicit approach. The implicit approach is based on the modified score function where bias‐reduced estimates are obtained by using iterative procedures to solve the modified score equations. The explicit approach is shown to be a one‐step approximation of this iterative procedure. To apply these approaches for the conditional analysis of matched case‐control studies, with potentially unmatched confounding and with several exposures, we utilize the relation between the conditional likelihood and the likelihood of the unconditional logit binomial GLM for matched pairs and Cox partial likelihood for matched sets with appropriately setup data. The properties of the estimators are evaluated by using a large Monte Carlo simulation study and an illustration of a real dataset is shown. Researchers reporting the results on the exponentiated scale should use bias‐reduced estimators since otherwise the effects can be under or overestimated, where the magnitude of the bias is especially large in studies with smaller sample sizes.

In GLMs, it is often the case that the interest is not in itself but in some, usually nonlinear, transformation of . A most notable example of this is the logit link binomial GLM, whereˆ= (̂1, … ,̂) = exp(ˆ) = (exp(̂1), … , exp(̂)) is an estimate of the odds ratio (OR), . For example, when estimating the effect of an exposure on the outcome, OR with corresponding interval estimates are reported regularly in epidemiologic studies (Bishop et al., 2007;Kleinbaum et al., 1982;Schlesselman, 1982). Therefore, it seems natural to investigate the properties of the estimator̂. What are the bias, variance, and mean squared error (MSE) of this estimator? How does removing the first-order bias of̂, for example, when using FPE, affect the properties of the transformed estimator,̂? While there is a common presumption that the focus should be on the original scale, that is, scale, there are examples, where the focus should indeed be on the transformed scale, that is, scale (Greenland, 2000). For example, in a well-designed study, the ORs, and not their logarithms, are proportional to disease rates (Rothman & Greenland, 1998) which in turn are proportional to the overall costs of disease (Morgenstern & Greenland, 1990). Since a primary target of interest for public health and policy debates is the magnitude of these costs, the relevant estimation errors are proportional to arithmetic (exponentially transformed) and not logarithmic (the original scale) errors in relative-risk estimates (Greenland, 2000). Furthermore, when the outcome of interest is binary, it is becoming common to estimate the relative risk directly (instead of estimating the OR), by using the log Poisson GLM (Zou, 2004). Here, the quantity of interest, that is, the estimate of the relative risk, is and its bias is regularly evaluated (Yelland et al., 2011;Zou, 2004).
The bias of the MLE persists when considering the exponentially transformed estimator and using the bias-reduced estimators of does not alleviate the problem (Lyles et al., 2012). We provide a rather simple explanation of this phenomenon based on the Taylor series expansion of . Several approximate bias corrections for the exponentially transformed parameter have been proposed for the logit binomial GLM; however, their application is often limited to a setting with a single dichotomous exposure (Bishop et al., 1975;Ejigou, 1990;Good, 1980;Greenland, 2000;Jewell, 1984). An important exception is the estimator proposed by Lyles et al. (2012). Lyles et al. derived, using the (approximate) lognormality of̂, a bias-reduced estimator which can be applied to any regression model requiring only the existence of̂or its bias-reduced alternative and their respective variances. We show that other modifications in the spirit of Lyles et al., motivated by the Taylor series expansion of , are possible. We then explain that these explicit estimators are one-step approximations of the more general implicit iterative approach, wherein the spirit of Firth (Firth, 1993;Kosmidis & Firth, 2009) bias-reduced estimator of is obtained by introducing bias into the score function. The proposed approach could be applied to any GLM; however, we will focus in more detail only on the logit binomial GLM and (stratified) Cox proportional hazards model (Cox, 1972) since these two models can be used to obtain conditional OR estimates in the matched case-control studies.
The paper unfolds as follows. In Sections 2, 3, and 4, we introduce the notation, explain the bias of the estimators of , and describe the bias-reduced estimators, respectively. We then describe how these estimators can be used to obtain bias-reduced conditional OR estimators in matched case-control studies in Section 5. The presentation of the results of our Monte Carlo simulation study for matched sets with unmatched confounding and application to a real dataset follows in Sections 6 and 7, respectively. The paper concludes with a summary of the most significant findings and key conclusions in Section 8.

MODEL AND NOTATION
Assume observations on independent random variables 1 , … , each with density/mass function of the form where (⋅), (⋅), (⋅), and (⋅) are some specific functions, is the parameter of interest, and is a dispersion parameter, which is assumed to be known. Thence, where = 2 ∕ ( ) and = d ∕d and where is the ( , )th component of a model matrix , (⋅) is a known link function and = ( 1 , … , ) is a -vector of unknown regression parameters.
For some matrix , we use tr( ) to denote the trace of the matrix; , th element of is denoted by [ ] .

BIAS OF THE EXPONENTIALLY TRANSFORMED PARAMETER ESTIMATES
The interest here is about estimating = ( 1 , … , ) = (exp( 1 ), … , exp( )) = exp( ), where exp(⋅) when applied to a vector (or a matrix) should be understood elementwise. It is well known that the MLE of is given bŷ= exp(̂), = 1, … , ; that is, if̂has been computed, we do not need to maximize the log-likelihood in terms of to get the MLE of . However, if̃is a bias-reduced estimator, then (exp(̃1), … , exp(̃)) is not a bias-reduced estimator; in fact, it may even have worse bias properties than the MLE (see, e.g., , Section 4 for discussion on reparameterizations of bias-reduced estimators). For reduction of bias with methods like that of Firth (1993), we need to derive the expression for the first-order bias of̂. It turns out that the first-order bias of̂(see Kosmidis & Firth, 2010, Section 4.3., Remark 3) can be written in terms of the first-order bias of̂, the first two derivatives of the function exp and the inverse of the expected information matrix. The first-order bias of general transformations of MLE is given in Di Caterina and Kosmidis (2019) and is used to derive a simple location adjustment for common Wald statistics that considerably improves the performance of Wald-type inference. In particular when using the function exp, by Di Caterina and Kosmidis (2019, Expression 6) or Kosmidis and Firth (2010, Section 4.3., Remark 3), The same expression is obtained by the Taylor series expansion of the function exp around , which after taking the expectations on both sides yields where we have used that {̂( )} 2 is ( −2 ) and that var(̂) is equal to the th diagonal element of the inverse of ( ).
The same argument via the Taylor series expansion can then be used to derive the expression for E(̃) based on the FPE, which yields where we have used that̃− is ( −1∕2 ) (see Firth, 1993), (̃− ) = ( −2 ) and E(̃− ) 3 = ( −2 ), which can be established by using the results presented in Pace and Salvan (1997, Section 9.2). Thence,̃obtained by using the bias-reduced estimator of will on average be greater than , since 1 + 1 2 var(̃) > 1: for ≥ 0 will be overestimated and for < 0 will be underestimated. Note that when talking about over (under-) estimation of we make a distinction between ≥ 1 and < 1. For example, if the true value is equal to 2 then we would say that the true value is overestimated for any value larger than 2 and underestimated for any value smaller than 2, while when the true value is equal to 1∕2 then we would say that the true value is overestimated for values smaller than 1∕2 and underestimated otherwise.

THE PROPOSED BIAS-REDUCED ESTIMATORS
Two approaches for obtaining bias-reduced estimators of are presented: the explicit (direct) approach and the implicit approach. We start with the explicit approach which is then shown to be a special case of the implicit approach. Lyles et al. (2012) derived, using the (approximate) lognormality of̂, the following explicit bias-reduced estimator

Explicit estimators
Lyles et al. proposed to usẽ=̂−̂( ), wherê( ), evaluated at̂, is given by Equation (1). When̂is undefined, FPE (i.e., a solution to Equation 2) and its respective variance can be used instead. We refer to this estimator as explicit Lyles-type bias-reduced estimator (EL). Equation (4), however, suggests the following multiplicative bias-reducing adjustment We refer to this estimator as explicit multiplicative bias-reduced estimator (EM). Observe the inequalitỹ<̃<̃, which implies that the two bias-reduced estimators are shrunken towards zero on the scale and that the shrinkage is larger for EL than for EM. For ≥ 1 and < 1, this shrinkage should reduce the over-and underestimation of when using̃, respectively.
The Wald-type (1 − )100% confidence interval (CI) for based oñE L and̃E M can be obtained from where we have suppressed the and notation. Note the use of the first-order approximation in (6) by using the variance of̃instead of the variance of the explicit bias-reduced estimators. Also, the bias correction term requires the existence of the expression for var(̃), = 1, … , , where its respective estimate is then used in the actual implementation. When this is unavailable, the bootstrap can be applied. In situations where the first-order approximation is expected to perform poorly, bootstrap could be used to obtain var(log(̃)). Alternatively, the CI could be constructed directly using

± ∕2
√ var(̃), using bootstrap to obtain var(̃). However, since the convergence towards normality is faster at the than the parameterization, the first option is more appealing for the sample sizes usually encountered in practice; hence, this second option will not be considered herein.

Implicit estimators
The performance of the explicit bias-reduced estimators relies heavily on the performance of̃, = 1, … , . To avoid this, the following approach can be used to obtain implicit bias-reduced estimators. It will then be shown that the explicit bias-reduced estimators presented in the previous section are one-step approximations of these implicit bias-reduced estimators. Firth (1993) showed that an estimator of some parameter with ( −2 ) bias may be obtained through the solution of an adjusted score equation in the general form * ( ) = ( ) + ( ) = 0, where ( ), suitably chosen, is (1) as → ∞ and where for a vector parameter the equation should be read in the matrix notation (see Firth, 1993, or Kosmidis & Firth, 2010, for more details). Kosmidis and Firth (2009) gave a general family of bias-reducing adjustments to the score vector of the form where ( ) is either the observed or the expected information, ( ) is some matrix with expectation of order ( 1∕2 ), and the vector ( ) is the ( −1 ) asymptotic bias. Note that ( ) can be the zero matrix in which case the bias-reducing adjustment simplifies to ( ) = − ( ) ( ), which is the bias-reducing adjustment used in Firth (1993). In order to use Equation (8), with ( ) = 0, we can use the results presented in Section 3 (see Equation 3) which suggest that the first-order bias vector of̂iŝ( is defined elementwise and diag is a function ℝ × → ℝ that transforms the main diagonal of a given matrix to a (column-)vector. Then, using (8) based on the expected information in the parameterization, say ( ), the adjusted score equation in the parameterization, say * ( ), is given by where we used that the expected information for the parameterization, ( ), and score for the parameterization, ( ), are by the chain rule ( ) = ( ) ( ( )) ( ) and ( ) = ( ) ( ( )), respectively. The estimates could be obtained by using the quasi-Newton-Raphson iteration by Kosmidis and Firth (2010) or modified Fisher scoring iteration by Kosmidis and Firth (2009); however, the iteration may become unstable because the components of are necessarily nonnegative. To avoid this issue, we will proceed on the scale where the equation to be solved is where th diagonal element of (exp( )) is now exp(− ). Observe that the penalty function, ( ) + * * ( ), is a sum of the penalty function used to obtain FPE, ( ), and additional term, * * ( ). For the exponential family model with canonical parameter th element of ( ), saj is equal to where | ( )| denotes the determinant of ( ) and the term in brackets, {⋅}, is the derivative of the information matrix with respect to parameter (see Firth, 1993, for more details). The solution of (9) can be obtained by using the quasi-Newton-Raphson iteration by Kosmidis and Firth (2010), via ( +1) = ( ) + ( ( ) ) −1 * * ( ( ) ) = ( ) + ( ( ) ) −1 ( ( ) ) −̂( ( ) ) − 1 2 diag( ( ( ) ) −1 ).
, which after exponentiating is the EL estimator. We will denote the fully iterated solution, after exponentiating, as̃= (̃1 , … ,̃) , for implicit Lyles-type estimator (IL). What about the EM estimator, does it have a similar implicit counterpart as EL? Consider the following modified score equation: The iteration then has the form where ( ) is the candidate value of the bias-reduced estimate of at the th iteration. Starting from (0) =̃, the first step yields̃− log( + 1 2 diag( (̃) −1 ), which after exponentiating is the EM estimator. We denote the solution to * ( ) = , after exponentiating, as̃= (̃1 , … ,̃) , for implicit multiplicative bias-reduced estimator (IM). For both implicit estimators, the estimates could be, for some GLMs, obtained also through pseudo-responses as described in Firth (1992aFirth ( , 1992b and Kosmidis and Firth (2009) or by data augmentation; we show how this can be done in the case of the logit link binomial GLM in Section 5.1. The interval estimates can be obtained either based on the Waldtype or score-type tests (Heinze & Schemper, 2002;Kosmidis & Firth, 2009) first obtaining the CI at the level of and then exponentiating it to the desired scale. Jewell (1984) studied the point estimates of the conditional OR in a matched pairs case-control study with a single exposure variable, . For matched pairs, let denote the number of pairs where the cases were exposed to treatment and the controls were not, and let denote the number of pairs where the controls were exposed to treatment and the cases were not. The unconditional OR estimator is given bŷ= ( ∕ ) 2 , which is known to be biased (see Breslow & Day, 1980). The Kraus' estimate of the OR from matched pairs study (Kraus, 1960) (equivalent to maximizing the conditional likelihood or using the Mantel-Haenszel estimator (MH) (Mantel & Haenszel, 1959), treating each pair as a substratum (Jewell, 1984)) is given bŷ= ∕ . FPE is in this example equal tô= {( + 1∕2)∕( + 1∕2)} (observe that this estimator always exists and that the natural logarithm of his estimator is the Haldane estimator). Jewell (1984), showed that the Kraus' estimator is biased and proposed the following bias-reduced estimators:

BIAS-REDUCED ESTIMATORS OF THE CONDITIONAL ODDS RATIOS IN MATCHED CASE CONTROL STUDIES
for the approach based on jackknife, bias adjustment, and bias correction, respectively (see Jewell, 1984, for more details). Observe that̂, is not defined when = 0 or = 1, whilẽ, is not defined when = 0 and for = 1 it is equal to zero irrespective of . These estimators can, however, only be used when there is a single exposure variable and no unaccounted confounding. In contrast, the estimators presented in the previous sections can easily be used to obtain bias-reduced estimators of the conditional ORs in matched case-control studies where it is straightforward to include several exposure variables and to account for the unmatched confounding. Let be the number of controls in the matched set , = 1, … , . Let 0 = ( 01 , … , 0 ) be the -vector of independent variables for cases, and let = ( 1 , … , ) be the -vector of independent variables for the th control, = 1, … , . The conditional likelihood is then given by (see Breslow and Day, 1980, and the references therein) Observe that (11) is identical to the partial likelihood of the stratified Cox proportional hazards model, where the time variable, , is set to 1, the event indicator is set to one for cases and zero for controls and the variable used for stratification is set to the indicator variable uniquely defining each matched set, for example, = (1, … , 1, 2, … , 2, … , , … , ) . In our context where separate estimates of the baseline hazard are not required, the parameter estimates of the stratified Cox proportional hazards model can be obtained from the (unstratified) Cox proportional hazards model by appropriately specifying the start and end time variables, 0 and 1 , respectively, by defining 0 = − 1 and 1 = 0 + 1. When = 1, = 1, … , , (11) simplifies to where we recognize the likelihood of the zero intercept logit binomial GLM with the response variable set to one for all pairs and the data vector * = 0 − 1 , = 1, … , . Matching in the design to control for the matching factors can, however, introduce confounding by the matching factors even when it does not exist in the source population (Rothman et al., 2008). Thus, a matched design will require controlling for the matching factors in the analysis (Pearce, 2016). When using conditional methods, this means that adjusting for the matching factors could still be necessary (see Mansournia et al., 2018, for more details). With some care in modeling (see, e.g., Mansournia et al., 2018), this can also be done by using the unconditional methods (Breslow & Day, 1980;Kleinbaum et al., 1982;Mansournia et al., 2018;see Pearce, 2016, for a discussion about using unconditional methods for analyzing matched case-control studies with their advantages and (potential) disadvantages over the conditional methods). The estimators presented in the previous section can easily be used also to conduct the appropriate conditional and unconditional analysis adjusting, if necessary, for the matching factors or some other potential confounders which were not considered in the study design.
In the following two sections, we explain in more detail how our implicit bias-reduced estimators can be implemented for the binary logit link binomial GLM and the Cox proportional hazards model. (1 − − ) , and the score for the data with, if ≥ 0, the outcome equal to = 0 weighted by = (1 − ) or, if < 0, the outcome equal to = 1 weighted by = − , ∑ =1 (0 − )(1 − ) . Note that the second and the third terms in the above expression correspond to the penalty ( ), while the last term corresponds to the additional penalty term, * ( ) and * * ( ) for IM and IL estimators, respectively.

Implementation for the binary logit link binomial GLM
Hence, the proposed implicit estimators are formally equivalent to ML on the augmented data using appropriate weights for the original and pseudo-data. In practice this entails creating an augmented model matrix, is a column vector of ones and ⊗ denotes the Kronecker product, the augmented outcome vector, = ( 1 , … , , 1 , … , , 1 − 1 , … , 1 − , * , … , * ) , where * = 0 if ≥ 0 and * = 1 otherwise, and weights vector = (1, … , 1,  … , 1, 1 , … , ) . This is the approach that we will adopt in our simulation study for matched sets with unmatched confounding. Note that the weights depend on ; hence, an iterative procedure updating the weights at each iteration needs to be implemented.

Implementation for the Cox proportional hazards model
Extending the proposed approach to the Cox proportional hazards model is straightforward since the model can be interpreted as an exponential family with canonical parameter (McCullagh & Nelder, 1989). Using the results presented in Section 4.2, the Newton-Raphson iteration for the implicit multiplicative estimator is then where ( ) is given in Equation (10); see Heinze and Schemper (2001) for the expressions of the derivatives of the information matrix with respect to parameter required to calculate ( ). Implicit Lyles-type estimator would be obtained by replacing the final term in Equation (12), log( + 1 2 diag( ( ( ) ) −1 )), with 1 2 diag( ( ( ) ) −1 ). The estimators do not have closed-form expressions, not even for the matched sets with a single exposure, and no unmatched confounding.

Setup
We describe the setup of our simulation study using the framework by Morris et al. (2019). Aims: We investigated the performance with respect to effect and CI estimation. We compare the implicit and explicit bias-reduced estimators for the analysis of matched case-control studies with unmatched confounding.
Methods: We use the approach presented in Section 5 to utilize standard regression techniques for estimating the parameters of the conditional logistic regression. The models included the exposure variable ( ) and the unmatched confounder ( 2 ). The unconditional analysis was also performed for matched pairs using unconditional logistic regression where the models included , 1 , and 2 . The explicit estimators were based on FPE, obtained using the R package brglm2 for = 1 and coxphf R package for > 1. The median unbiased estimator (see , for more details) for = 1 was obtained using the R package brglm2 (for > 1 the median unbiased estimator was not considered). For = 1, the implicit estimators were implemented via the data augmentation approach presented in Section 5.1, using FPE in each iteration. For > 1, the implicit estimators were obtained by modifying the coxphf R package FORTRAN code. All CIs are Wald type. The analysis was performed on a cluster of CentOS-based containers.
Estimands: The estimands in this study were the ORs with a special focus on the OR corresponding to the exposure variable with a true OR of 1∕4, 1∕2, 1, 2, 4. The properties on the original scale are also investigated.
Performance measures: For the point estimate of 3 and exp ( 3 ), we evaluated bias and root mean squared error (RMSE). For CIs, we evaluated the coverage rates at the 3 scale (nominal level 0.95) and power (probability to exclude 0 on the 3 scale) as well as left-tailed and right-tailed coverage of one-sided 97.5% CIs. For some estimator̂of , bias is defined . Relative bias (rBias) is defined as b( )∕ × 100 and relative RMSE (rRMSE) as √ m( )∕ .

Results for matched pairs
Results for different values of 1 were similar (data not shown); hence, we only show the results for 1 = log(2). The performance of ML, median unbiased estimator (MD), and FPE for exp( 3 ) was not good with large MSE and bias, overestimating the true effect for exp( 3 ) > 1 and underestimating it for exp( 3 ) < 1 (Figure 1). The proposed bias-reduced estimators performed much better, attaining a much smaller MSE and having in general very small bias. The exceptions where the bias was slightly larger were limited to the two smallest considered sample sizes ( = 25 and 50) and only some estimators: the two implicit estimators, IL and IM with exp( 3 ) < 1 and EL with exp( 3 ) ≥ 1 where the effect size was underestimated. The MSE for different estimators for exp( 3 ) ≥ 1 was very similar although there were some settings with = 25 where EM had a slightly smaller MSE than the other estimators. MSE of the two implicit estimators with exp( 3 ) < 1 and smaller sample sizes was slightly larger than the MSE of the two explicit estimators. The two explicit esti- mators had similar MSE for exp( 3 ) = 0.5, but for exp( 3 ) = 0.25 and with a smaller sample size EL had slightly smaller MSE. Results on the original scale showed that FPE was unbiased in all settings, while the other estimators were not (Supporting Information). MD and especially ML overestimated 3 with bias decreasing quickly as the sample size increased (observe the symmetry for positive and negative values of 3 ). Our bias-reduced estimators overestimated 3 for 3 < 0 and underestimated it otherwise, but as expected this bias decreased with increasing sample size. For 3 ≤ 0, FPE had the smallest MSE; however for 3 > 0 our proposed bias-reduced estimators had smaller MSE (more so for larger effects). The coverage was in general larger than the nominal level for all estimators, but it approached the nominal level with increasing sample size (Supporting Information). Coverage of the left-tailed CIs for the bias-reduced estimators was larger than the nominal level; however, with exp( 3 ) > 1, the coverage of the right-tailed CIs was slightly too small. This was a consequence of shrinking the effect size towards 1 (on the OR scale). This was also the reason for a slightly smaller power observed for the bias-reduced estimators with exp( 3 ) > 1 and a smaller sample size (the power for the bias-reduced estimators with exp( 3 ) < 1 was, however, larger than observed with the other estimators). Similar trends were observed for the unconditional analysis, with all the methods attaining slightly smaller bias and MSE and slightly larger power than in the conditional analysis, but the differences between the conditional and unconditional analysis were not substantial (Supporting Information).

Results for 1-R matched sets
For brevity of the presentation, we only show the results for = 2; the results when increasing were as expected with smaller bias and RMSE (data not shown). The results for different values of 1 were similar (data not shown); hence, we only show the results for 1 = log(2). The results in terms of estimating exp( 3 ) were very similar as observed for the matched pairs, with as expected smaller bias and MSE obtained with all the estimators when compared with the simulation study with matched pairs (Figure 2). FPE and ML did not perform well having large bias and MSE, while our proposed bias-reduced estimators performed much better. Different bias-reduced estimators performed very similarly with ≥ 50. With = 25 and exp( 3 ) ≥ 1, EM and IL had the smallest bias with EL and IM under-and overestimating the effect size, respectively, but the bias was not substantial. Our proposed bias-reduced estimators had smaller MSE than ML and TA B L E 1 Lung cancer study: odds ratio estimates and Wald-type 95% two-sided confidence intervals for the effects of radiotherapy, smoking, and their interaction. FPE; the differences between the bias-reduced estimators were not substantial. The results for estimating 3 were also very similar as observed for the matched pairs: FPE was unbiased, while the bias-reduced estimators, EM, EL, IM, and IL, overand underestimated 3 for 3 < 0 and 3 > 0, respectively (Supporting Information). With 3 ≤ 0, FPE had the smallest MSE; however, with 3 > 0 the MSE obtained with the bias-reduced estimators was either comparable (exp( 3 ) = 2) or smaller (exp( 3 ) = 4) than obtained with FPE. The coverage of the CIs when using the proposed bias-reduced estimators was satisfactory, with our bias-reduced estimators either attaining the nominal level, or slightly exceeding it, but less so than when using ML or FPE. The results for the power left-and right-tailed coverage were also very similar as observed for the matched pairs (Supporting Information).

APPLICATION
Here we apply the proposed bias-reduced estimators to analyze a matched case-control study assessing the risk of developing lung cancer following radiation treatment for carcinoma of the breast (Hirji et al., 1988) which was analyzed also by Heinze and Puhr (2010). Fifty-two female controls were matched to 18 female cases by age and date of diagnosis of breast cancer. Risk factors were smoking and radiation therapy, with a particular interest in their two-way interaction.
The results are reported in Table 1. The interaction OR in Table 1 represents the ratio of two ORs related to smoking, that is, for those patients that were exposed to radiotherapy and those who were not. All CIs for the interaction OR include 1, which is an indicator of the small power of the study since there are only 18 matched sets. Since there are no matched sets where the control was exposed to both radiotherapy and smoking while the case was not, the OR estimate obtained by conditional ML is infinite and the CI is uninformative (we report in Table 1 the estimates obtained at the final iteration although the iterative procedure did not converge). Using FPE, we obtain a finite, albeit still very large point estimate with a very wide CI. The estimates obtained with the other estimators are much smaller, although they still imply the harmful effect of smoking is amplified in the presence of radiotherapy. Observe that the effect obtained by using IM is very similar to the one estimated by Heinze and Puhr (2010) by using LogXact (median unbiased) estimator (Hirji et al., 1988), but with a much narrower CI than obtained by LogXact (OR=2.5 [95% CI: 0.6-∞]). In the final column in Table 1, we report the OR for the comparison of patients exposed to both radiotherapy and smoking. This OR was obtained by reparameterization into a single-factor model with four levels. Observe that 0.8 ⋅ 3.25 ⋅ 10.51 = 27.37, which holds since penalization by the determinant of the information matrix is not influenced by this reparameterization. This equality, however, does not hold for the other estimators. Also in this parameterization, the point estimate obtained by using IM was very similar to the one obtained by Heinze and Puhr (2010) by using LogXact with the other estimates being much smaller.

DISCUSSION AND CONCLUSIONS
We showed that when exponentiating the bias-reduced estimator of , for example, FPE, this leads to overestimated effects for ≥ 1 and underestimated effects for < 1. Lyles et al. (2012) proposed to solve this issue by multiplying the estimate of by a factor proportional to the variance of the estimator. They proposed to use either MLE of and its variance (option 1) or the bias-reduced estimator of , obtained as the MLE minus the estimate of its bias as given in Equation (1) in Section 2 (option 2), or when this estimator does not exist, for example, when data are separated (Albert & Anderson, 1984;Bryson & Johnson, 1981;Jacobsen, 1989;Kolassa, 1997;Lesaffre & Albert, 1989), to use FPE, which for canonical and some noncanonical link functions (Kosmidis & Firth, 2009) is well defined also when there is separation (Heinze & Schemper, 2002;Kosmidis & Firth, 2009) (option 3). We showed in our exact numerical evaluation for the matched pairs with a single exposure and no unmatched confounding that using option 3 yields a bias-reduced estimator with better properties than when using the other two options, even if we define sensible estimates for the situations where they would otherwise not be defined (Supporting Information).
We showed that other explicit bias-reduced estimators, motivated by a different argument than the one used by Lyles et al., are possible. We then showed that all these explicit estimators are only one-step approximations of a more general implicit iterative procedure motivated by Firth's idea of introducing bias into the score equations. We showed how these implicit estimators can be applied to any GLM, and we gave more details for the logit binomial GLM and the Cox proportional hazards model. In particular, we showed that the implicit estimators for the logit binomial GLM can be obtained by MLE or FPE using the augmented dataset by stacking original and pseudo observations with appropriate weights for the original and pseudo observations, similarly as it can be done to obtain FPE by using the MLE (Puhr et al., 2017). Therefore, no special software is in this case needed to obtain implicit bias-reduced estimators since all that is required is a program for MLE or FPE that allows for weighted observations. Some care must be taken when calculating the variance of this estimator since it is not equal to the variance given at the final iteration, but it must be calculated using the final estimates and considering only the original observations. We then showed that these bias-reduced estimators can be applied directly to estimate the conditional ORs for matched sets with several exposures and unmatched confounders by using well-known relations between the conditional likelihood for matched sets and unconditional likelihood. In our simulation study, we showed that all proposed bias-reduced estimators have smaller bias and MSE than observed by exponentiating FPE, with only small differences between different bias-reduced estimators. The coverage of Wald-type two-sided CIs, where we used the variance of FPE instead of the variance of the bias-reduced estimator, was never smaller than the nominal level, and the power was comparable to the power obtained by using FPE. However, the right-tailed coverage was, for ≥ 1, in general slightly smaller than the nominal level (obviously, the left-tailed coverage was larger than the nominal level); for < 1 the coverage of left-and right-tailed coverage was satisfactory. We do not think that this is particularly problematic in applied research since one-sided CIs are rarely reported. If this would be of concern to someone then the coverage could be improved by using bootstrap to obtain the variance of the bias-reduced estimator. Alternatively, the CIs could be obtained by profiling the penalized likelihood (for the models where it exists) as in Heinze and Puhr (2010), but we leave this for potential future research.
Matched case-control studies can also be analyzed using the unconditional methods, appropriately adjusting for the matching factors in the analysis (Breslow & Day, 1980;Kleinbaum et al., 1982;Mansournia et al., 2018). The proposed bias-reduced estimators can also be used in such cases. In our simulation study with matched pairs, the results in terms of the comparison of different (unconditional) estimators were similar to those observed for the conditional analysis: the proposed (unconditional) bias-reduced estimators attained much smaller bias and MSE than (unconditional) MLE and FPE. The unconditional analysis yielded slightly better statistical precision than the conditional analysis for all estimators, which was expected given the simulation design, the differences were, however, not substantial. A thorough comparison of conditional and unconditional methods to analyze matched case-control study (or matched vs. unmatched study design) was, however, not the scope of the paper (see, e.g., Pearce, 2016, for a short introduction about the advantages of choosing the unconditional over the conditional analysis and matched vs. unmatched design). Adjusting for the matching factors might be required also when performing the conditional analysis (see Mansournia et al., 2018), which can easily be done with the proposed methods. When performing conditional analysis, the precision of the estimators can be improved by combining matched sets with identical matching values into a single matched set (Pearce, 2016;Mansournia et al., 2018). If this is not done, as in our simulation study and real data example, this will not affect the bias of the estimates, but it will make the estimates less precise since combining matched sets avoids conditioning on irrelevant distinctions. Assessing the potential gains in precision when combining matched sets exceeds the scope of the paper, but it would be a worthwhile exercise for future research.
For all proposed bias-reduced estimators, it is possible to apply the bias correction on the exponentiated scale only for some coefficients and apply the bias correction on the original scale for others. For the explicit approach, this is straightforward to achieve, since separate calculations are performed for each coefficient. For the implicit approach, this requires only a slight modification of the score functions. Our small-scale simulation study for the logit binomial GLM showed that this does not affect the properties of the estimators: for the coefficients where bias correction is applied on the exponentiated scale, similar bias and MSE are observed as in the example where the bias correction is applied to all coefficients, while the results for those coefficients where bias correction is not applied are similar as when using FPE (data not shown).
The methodology presented herein could be extended to any function, ℎ( ), which is at least three times differentiable, but this was not considered here due to practical and notational reasons.
There is a lack of invariance when considering the properties of the estimators on the scale. For example, in the matched pairs with a single exposure and no unmatched confounders, all bias-reduced estimators considered herein (i.e., those proposed by us, Lyles et al., 2012, andJewell, 1984) are not invariant under the change in the definition of exposure to unexposure and vice versa. While this is to be expected probabilistically, it might trouble applied researchers. Thence, the coding of the data is an important issue that needs to be carefully thought of in practical use and in particular set in advance prior to the analysis (as is the case for specifying whether one would like to compute left-tailed or right-tailsvalues when one wants to perform one-sided significance test). Appropriate specification of the reference category should, however, be a matter of content rather than a statistical argument.
To conclude, we propose that the researchers reporting results at the scale, use the bias-reduced estimators since otherwise, the reported estimates are overestimated for ≥ 1 and underestimated otherwise. The estimators proposed herein shrink the estimates of towards zero, thus successfully reducing this bias. Both approaches, the explicit and the implicit approach, were shown to perform similarly in most considered situations, so it is not really important which of the two is used. Regarding the multiplicative approach, motivated by the Taylor series expansion, and the Lyles-type approach, motivated by the lognormality of̂, we would prefer the former since it has a smaller bias and MSE in most considered scenarios. Researchers with some knowledge of computer programming can use the data augmentation approach to obtain the implicit estimators (limited however only to some GLMs, for example, logit binomial GLM and log Poisson GLM); those familiar with R can use the functions which we have used to obtain the results for this paper (available as Supporting Information). Those not familiar with R or computer programming can use the explicit estimators where all that is required are the FPE of and its variance, which are now widely available in statistical software.

A C K N O W L E D G M E N T S
The author thanks the associate editor and a referee for insightful comments that substantially improved the presentation. The author acknowledges the support of the Slovenian Research Agency (Predicting rare events more accurately, N1-0035; Methodology for data analysis in medical sciences, P3-0154).

C O N F L I C T O F I N T E R E S T S TAT E M E N T
The authors declare no conflict of interest.

D ATA AVA I L A B I L I T Y S TAT E M E N T
The data that support the findings of this study are openly available.

O P E N R E S E A R C H B A D G E S
This article has earned an Open Data badge for making publicly available the digitally-shareable data necessary to reproduce the reported results. The data is available in the Supporting Information section. This article has earned an open data badge "Reproducible Research" for making publicly available the code necessary to reproduce the reported results. The results reported in this article could fully be reproduced.