## 1 Introduction

Case-control designs are often used when investigating the effect of a treatment (exposure) on an outcome, e.g., a disease. The design is an efficient alternative for rare diseases, where a random sample would need to be very large to include sufficiently many cases for the subsequent analysis. In the case-control setting, the controls can be sampled in different ways, e.g., independently or matched on one or several variables believed to be confounders.

In case-control designs, odds ratios are commonly used to measure the effect of a binary treatment, *t*, on a binary outcome, *Y*, since the odds ratio expressed in terms of the outcome conditional on the treatment is equivalent to the odds ratio defined in terms of the treatment conditional on the outcome. A causal odds ratio is defined by the distribution of the potential outcomes under each treatment; see Holland [1] for the definition of a causal effect in the potential outcome framework. The causal effect of the treatment on the disease is identified if all confounding pretreatment variables, referred to as covariates henceforth, are observed [2]. An odds ratio is conditional or marginal depending on whether we condition on or marginalize over the confounding covariates. A conditional odds ratio can help a clinician decide whether a treatment is beneficial for a patient with particular characteristics, while a marginal odds ratio can be used to assess the effect of the treatment in the population as a whole. A conditional estimate will usually differ from its marginal counterpart due to noncollapsibility of the odds ratio; see Greenland, Robins, and Pearl [3] for a discussion of confounding and collapsibility in causal inference. Noncollapsibility can occur even in the absence of confounding. For instance, when a covariate is associated with the outcome but unassociated with the treatment, and the marginal odds ratio is not equal to 1, the conditional odds ratio will be farther from 1 than the marginal odds ratio [4]. Statistical development has to a large extent focused on the estimation of conditional odds ratios; see the review by Breslow [5] and the references therein.
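To make noncollapsibility concrete, the following minimal sketch computes conditional and marginal odds ratios from a hypothetical logistic outcome model. The coefficients `B0`, `BT`, `BX` and the binary covariate `x` with P(x = 1) = 0.5 are illustrative assumptions, not quantities from this paper; the covariate is assumed to be independent of the treatment, so there is no confounding.

```python
import math


def expit(z):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-z))


def odds(p):
    return p / (1.0 - p)


# Hypothetical outcome model (assumed values):
#   logit P(Y = 1 | t, x) = B0 + BT * t + BX * x
B0, BT, BX = -1.0, 1.0, 2.0
P_X1 = 0.5  # assumed P(x = 1); x is independent of the treatment


def conditional_or(x):
    """Odds ratio for treatment within the stratum defined by x."""
    return odds(expit(B0 + BT + BX * x)) / odds(expit(B0 + BX * x))


def marginal_risk(t):
    """P(Y = 1) if everyone received treatment t, averaging over x."""
    return (1.0 - P_X1) * expit(B0 + BT * t) + P_X1 * expit(B0 + BT * t + BX)


def marginal_or():
    """Odds ratio comparing the two marginal risks."""
    return odds(marginal_risk(1)) / odds(marginal_risk(0))


print("conditional OR, x = 0:", conditional_or(0))
print("conditional OR, x = 1:", conditional_or(1))
print("marginal OR:          ", marginal_or())
```

Under these assumed values the conditional odds ratio equals exp(1) in both strata, while the marginal odds ratio lies closer to 1, even though `x` is not a confounder.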

Several estimators of the marginal causal odds ratio have been investigated. Zhang [6] proposes an estimator based on a logistic regression model for the outcome conditional on the treatment and the covariates. An estimator that stratifies on the propensity score, the probability of treatment given the covariates, has also been introduced [7]. Alternatively, the inverse of the propensity score can be used as a weight in an inverse probability of treatment weighted estimator of the marginal causal odds ratio, as described by Robins [8]. In Vansteelandt, Bowden, Babanezhad, and Goetghebeur [9], estimators of both conditional and marginal causal odds ratios are proposed using an instrumental variable, i.e., a variable that is associated with the exposure but is related to the response only through the exposure.

In this paper, we study and compare the performance of estimators for the marginal causal odds ratio within matched and unmatched case-control designs when the outcome probability (the prevalence), P(*Y* = 1), is known in the population under study. The prevalence can be used to adjust the conditional outcome model in an unmatched case-control design using either intercept adjustment [10, 11] or weighted maximum likelihood (ML) [12]. For a matched design, we implement the theory described in van der Laan [13] and apply intercept adjustment and weighting. Here, knowledge of the prevalence conditional on the matching variables is required. After adjusting the conditional models, we marginalize over a weighted distribution of the cases and controls to obtain estimates of the marginal causal odds ratio. Approximate variances of the estimators are derived using the delta method. The estimators described are compared to case-control weighted targeted maximum likelihood estimators (TMLE) [13, 14]. For a TMLE, the conditional model is updated using a fitted model for the propensity score. The finite sample properties of all the estimators are studied in simulations and compared to the unadjusted sample odds ratio and the ML estimator of the conditional causal odds ratio.
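The role of the known prevalence in the unmatched design can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in this paper: the coefficients are hypothetical values standing in for a fitted case-control logistic model with one binary covariate, the intercept correction is the standard offset for case-control sampling, and the weights are scaled so that cases carry total mass `q0` and controls total mass `1 - q0`. All function and variable names are introduced here for illustration.

```python
import math


def expit(z):
    return 1.0 / (1.0 + math.exp(-z))


def adjusted_intercept(b0_cc, n_cases, n_controls, q0):
    """Shift a case-control intercept to the population scale.

    Subtracts log of the ratio of case to control sampling probabilities,
    which equals (n_cases * (1 - q0)) / (n_controls * q0).
    """
    return b0_cc - math.log((n_cases * (1.0 - q0)) / (n_controls * q0))


def case_control_weights(y, q0):
    """Weights under which the sample mimics a population with prevalence q0."""
    n_cases = sum(y)
    n_controls = len(y) - n_cases
    return [q0 / n_cases if yi == 1 else (1.0 - q0) / n_controls for yi in y]


def marginal_or(b0, bt, bx, x, weights):
    """Average model-based risks over the weighted covariate distribution."""
    p1 = sum(w * expit(b0 + bt + bx * xi) for w, xi in zip(weights, x))
    p0 = sum(w * expit(b0 + bx * xi) for w, xi in zip(weights, x))
    return (p1 / (1.0 - p1)) / (p0 / (1.0 - p0))


# Hypothetical toy inputs (assumed, not data from this paper).
y = [1, 1, 1, 0, 0, 0, 0, 0, 0]   # 3 cases, 6 controls
x = [1, 1, 0, 1, 0, 0, 1, 0, 0]   # binary covariate
q0 = 0.01                         # assumed known prevalence P(Y = 1)
b0_cc, bt, bx = -0.5, 0.7, 1.2    # assumed fitted case-control coefficients

b0 = adjusted_intercept(b0_cc, sum(y), len(y) - sum(y), q0)
w = case_control_weights(y, q0)
print("adjusted intercept:", b0)
print("marginal OR:       ", marginal_or(b0, bt, bx, x, w))
```

Only the intercept changes under the adjustment; the slope coefficients from the case-control fit are kept as they are, and the weights enter solely through the marginalization step.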

An example of a case-control design where the prevalence is known is a study based on individuals from a disease register where all incident cases of the disease are registered and a sample of disease-free controls is collected from the population giving rise to the cases. In this paper, we use data from the Swedish Childhood Diabetes Register (SCDR), a population-based incidence register providing information on the prevalence of type 1 diabetes mellitus (T1DM) in children aged 0-15 years. We implement the estimators described above and estimate the causal effect of low birth weight on the risk of T1DM. We show that ignoring confounding, or relying on the conditional estimate, leads to a different conclusion than the marginal estimate adjusted for confounding.

We proceed as follows. In Section 2, the theoretical framework and notation are presented. Section 3 describes estimators of the marginal causal odds ratio in matched and unmatched case-control designs. In a simulation study described in Section 4, we investigate the finite sample properties of the estimators. In Section 5, we use data from the SCDR to estimate the effect of low birth weight on childhood onset insulin-dependent diabetes. Section 6 concludes with a discussion.