Combining Mendelian randomization with the sibling comparison design

Mendelian randomization (MR) is a popular epidemiologic study design that uses genetic variants as instrumental variables (IVs) to estimate causal effects, while accounting for unmeasured confounding. The validity of the MR design hinges on certain IV assumptions, which may sometimes be violated due to dynastic effects, population stratification, or assortative mating. Since these mechanisms act through parental factors it was recently suggested that the bias resulting from violations of the IV assumptions can be reduced by combing the MR design with the sibling comparison design, which implicitly controls for all factors that are constant within families. In this article, we provide a formal discussion of this combined MR‐sibling design. We derive conditions under which the MR‐sibling design is unbiased, and we relate these to the corresponding conditions for the standard MR and sibling comparison designs. We proceed by considering scenarios where all three designs are biased to some extent, and discuss under which conditions the MR‐sibling design can be expected to have less bias than the other two designs. We finally illustrate the theoretical results and conclusions with an application to real data, in a study of low‐density lipoprotein and diastolic blood pressure using data from the Swedish Twin Registry.


INTRODUCTION
Most epidemiological studies aim to estimate the causal effect of a particular exposure on a particular outcome.When the exposure is not randomized there are almost always concerns about the estimated exposure effect being biased due to uncontrolled confounding.Mendelian randomization (MR) is a study design that uses genetic variants, often risk scores consisting of several single nucleotide polymorphisms (SNPs), as instrumental variables to eliminate or reduce confounding bias.The MR design has gained considerable popularity in the last decades and has been used to estimate, for instance, the effect of obesity on cardiovascular outcomes, 1 the effect of shortened telomere length on Alzheimer's disease, 2 and the effect of cannabis use on risk of schizophrenia. 3 Like all instrumental variable designs Mendelian randomization hinges on three conditions, often referred to as the instrumental variable (IV) assumptions: 4 (1) the genetic instrument and the exposure are statistically associated; (2) there is no uncontrolled confounding for the genetic instrument and the outcome; and (3) there are no pleiotropic effects, that is, the genetic instrument has no effect on the outcome, apart from an effect mediated through the exposure.Whereas the first assumption can easily be verified with data, the latter two are to a large extent untestable and must be defended with subject matter knowledge on a case-by-case basis. 5n the context of MR, Brumpton et al 6 discussed three potential threats to the second IV assumption: dynastic effects, population stratification and assortative mating.Dynastic effects means that the parental genetic variants that contribute to the offspring genetic instrument also have an influence on the outcome in the offspring.Population stratification is a similar mechanism, where demographic factors (eg, ancestry or region of birth) are simultaneously associated with the genetic instrument and the outcome in the offspring.Assortative mating means that partners tend to mate on the basis of the exposure or the outcome.Brumpton et al 6 showed that parental mating may act as a collider on a non-causal path between the genetic instrument and the outcome in the offspring, so that the implicit conditioning on parental mating may therefore induce a statistical association between these. 7,8ince these mechanisms act through factors on the parental level, Brumpton et al 6 proposed that the resulting biases can be eliminated or reduced by combining the MR design with the sibling comparison design.By conditioning on the family, the sibling comparison design implicitly controls for all factors that are constant within the family.When there are two siblings per family, this combined MR-sibling design is essentially a standard IV analysis applied to within-family differences in the instrument, exposure and outcome.Although the sibling comparison design has been used extensively in epidemiologic research (see Sjölander et al 9 and the references therein), the suggestion to combine it with the MR design appears novel; apart from the illustration with real data by Brumpton et al 6 we could only find one other study 10 that has used a combined MR-sibling design.
Brumpton et al 6 provided heuristic arguments for the MR-sibling design and showed results from a limited simulation study.In this simulation the standard MR design was biased when the second IV assumption was violated, as expected, whereas the MR-sibling design remained (asymptotically) unbiased.Although these arguments and results indicate that the MR-sibling design may indeed be useful, they also raise several questions and concerns.First, although heuristic arguments are helpful to gain intuition it is important to also have a more formal motivation for the MR-sibling design.Second, it is not obvious how strong general conclusions can be drawn from the simulation by Brumpton et al. 6 In particular, it has previously been shown that the sibling comparison design may amplify bias due to uncontrolled confounding if the exposure correlates strongly within families. 11With this in mind one may wonder whether we should always expect the MR-sibling design to be superior to the standard MR design, or whether there are situations where a standard MR design should be expected to have less bias.Finally, when sibling data are available there is also an option to discard the genetic instrument altogether and instead carry out a standard sibling comparison analysis with observational data.In their illustrations with real data Brumpton et al 6 provided standard sibling comparison estimates but they did not comment on when (if ever) these estimates should be expected to have less bias than those from a standard MR design and an MR-sibling design.
In this article, we address these issues.We use causal diagrams to formalize and contrast the sibling comparison design, the MR design and the MR-sibling design.We derive analytic expressions for the biases that result from these designs under standard parametric models and we use these bias expressions to discuss when and why one should expect each design to be less biased than the other two.Finally, we illustrate the theoretical results and conclusions with an application to real data, in a study of low-density lipoprotein and diastolic blood pressure using data from the Swedish Twin Registry. 12To facilitate analytic derivations we only consider families with two siblings, and like Brumpton et al 6 we only consider linear models.We further restrict attention to asymptotic (large sample) bias and ignore issues of statistical power and efficiency.Throughout, we use the term "bias" as shorthand for "asymptotic bias."

NOTATION, DEFINITIONS, AND ASSUMPTIONS
Let G ij , X ij , and Y ij be the genetic instrument, the exposure and the outcome for sibling j in family i, respectively, with j = 1, 2 and i = 1, … , n.Let U ij be a scalar construct representing the whole set of genetic and environmental factors that may confound X ij and Y ij , and/or G ij and Y ij .The causal diagram 13,14 in Figure 1 shows the assumed relations between these variables, with parameters measuring the strength of these relations as defined below.Directed arrows represent causal effects, whereas double-headed dashed arrows between pairs of variables indicate associations that are not necessarily due to a causal effect of one variable on the other.
In Figure 1, we have for simplicity assumed that G ij has a causal effect on X ij , but we note that the first IV assumption only requires a statistical association between these variables.The double-headed arrow between G i1 and G i2 indicates that these are associated due to common heritage from the parental genotypes.The double-headed arrow between U i1 and U i2 indicates that these may partly consist of factors that are shared (constant) within families, and may also be influenced by such familial factors.The double-headed arrow between G ij and U ij may represent dynastic effects, population stratification or assortative mating, if one views the familial factors that give rise to these mechanisms as a common cause of G ij and U ij .Alternatively, one may consider these familial factors as being absorbed into U ij , in which case U ij could have a causal effect on G ij .The double-headed arrow between G ij and U ij may also represent pleiotropic effects, if G ij has a causal effect on Y ij mediated through some components of U ij ; this would be a violation of the third IV assumption.By having double-headed arrows in the causal diagram, which allow for different causal mechanisms, we are able to derive analytic bias expressions that are valid under a wide range of different violations of the MR assumptions and the sibling-comparison assumptions.We are thus able to reason generally and qualitatively about these biases in a way that would be hard if one considered each of the violations separately.Notably, we allow for a "cross-individual" association between G ij and U ij ′ , where j ≠ j ′ .This association is most likely present in practice, but likely no larger than the "within-individual" association between G ij and U ij .In the special case when U ij only consists of parental genetic factors, U i1 and U i2 are identical so that the cross-individual association between G ij and U ij ′ is identical to the within-individual association between G ij and U ij .
We assume the causal linear models where the error terms ( X i1 ,  X i2 ) for the exposure are independent of the error terms ( Y i1 ,  Y i2 ) for the outcome, and all error terms are independent of , respectively.We emphasize that these correlations are marginal over sib-pairs.Thus, none of these correlations condition on, for instance, parental genetic factors or common familial environment.Since full siblings share on average half of their genome identical by descent in the last generation, we would expect that the correlation  G between G i1 and G i2 is approximately equal to 0.5.This degree of correlation has been validated empirically for polygenic scores in siblings. 15e make several clarifications regarding the models in (1) and (2).First, the models do not include any family-specific intercepts.This is because such intercepts are intended to represent confounders that are shared between siblings, and shared confounders are absorbed by the construct U. If most confounders are perfectly shared, then  U will be close to 1. Second, the models do not make any distributional assumptions, and do not assume a linear relationship between U and G. Thus, all our results and conclusions are valid even if the variables are, for instance, non-normal.Third, the models are symmetric in the sense that the model parameters are assumed to have the same values for both siblings in the pair.This assumption may be violated if, for instance, the siblings have different age or sex, and some of parameters are age and/or sex dependent.The assumption may also be violated if there is "sibling-sibling interaction" (aka cross-over effects), where the exposure and/or the outcome of one sibling may influence the exposure and/or the outcome of the other sibling.Symmetry of siblings is a standard assumption in the literature on sibling comparisons, and we refer to Sjölander et al 9 and the references therein for a thorough discussion on this and other related assumptions in sibling comparison designs.
As we shall see, it will be useful to define the ratio and the within-family difference for any variable V.
The target parameter is the causal effect of X ij on Y ij , denoted with  XY in Figure 1 and model (2).In standard potential outcome notation,  XY is the average 1-unit causal effect of X ij on Y ij , given U ij : In this expression, Y x ij and Y x+1 ij are the potential outcomes when X ij is set to x and x + 1, respectively.A standard sibling comparison analysis of these data ignores the genetic instrument and is expected to provide an unbiased estimate of  XY if all exposure-outcome confounders are perfectly shared within families, that is, if  U = 1. 11In contrast, a standard MR analysis ignores the familial membership of the siblings, and is expected to provide an unbiased estimate of  XY if the three IV assumptions hold.In terms of the parameters defined above, the first IV assumption holds if  2 G ≠ 0 and  GX ≠ 0, and the second and third IV assumptions hold if  UG = 0. We end this section with the technical but important remark that the parameters defined above are not free to vary independently of each other.Since the covariance matrix for (U i1 , U i2 , G i1 , G i2 ) must be positive semidefinite, it can be shown (see Appendix A) that  G ,  U ,  UG , and  * UG are restricted by the relations and In the special case where  U = 1 (ie, under optimal conditions for a sibling comparison design), it follows from (4) that ( UG −  * UG ) 2 ≤ 0. Since the left-hand side of this inequality cannot be negative, it follows that  UG −  * UG = 0, or R UG = 1.Thus, we conclude that  U = 1 implies that R UG = 1.

ESTIMATORS AND BIAS FOR THE DIFFERENT DESIGNS
A standard sibling comparison analysis conditions on the family by using a conditional (fixed effects) linear regression model on the form In this model, the family-specific intercept  0i is intended to absorb, and thus control for, the familial confounders.When there are two siblings per family, the estimated exposure effect is asymptotically equal to Since the familial confounders are assumed to have the same influence on both siblings, this influence is "canceled out" in the within-family differences ΔX i and ΔY i .
A standard MR analysis consists of two stages.In the first stage, one fits an ordinary linear regression model for the exposure against the genetic instrument and in the second stage one fits an ordinary linear regression model for the outcome against the genetic instrument Finally, the exposure effect on the outcome is estimated as the ratio  GY ∕ GX of the two regression slopes.Asymptotically, this ratio is equal to In the MR-sibling analysis proposed by Brumpton et al, 6 the constant intercepts  0 and  0 are replaced with family-specific intercepts  0i and  0i , so that both the first and second stage regression models control for familial confounders.When there are two siblings per family, the estimated exposure effect from the MR-sibling analysis is asymptotically equal to This expression reveals that, when there are two siblings per family, the MR-sibling analysis proposed by Brumpton et al 6 is essentially a standard IV analysis applied to within-family differences in the instrument, exposure and outcome.It further reveals that only sib-pairs with variation in the genetic instrument contribute to the estimated exposure effect, since sib-pairs with ΔG ij = 0 drop out from both numerator and denominator in Equation (5).As a consequence, an MR-sibling analysis can only use a genetic instrument that has within-family variation for at least some families.It can be shown (see Appendix B) that, under the parametric model in Section 2, the biases of  XY sib ,  XY MR , and  XY MRsib are given by and respectively.

CRITERIA FOR UNBIASEDNESS
Although the bias expressions from the previous section are relatively complex, in particular for B sib , they convey several simple and important qualitative insights.First, all bias terms are equal to 0 if  U = 0 or  UY = 0, since these appear in the numerator of all terms.This is intuitively reasonable; if  U = 0 then U ij is constant, and if  UY = 0 then U ij has no effect on Y ij .In either case, there is no confounding of X ij and Y ij , and we would expect all estimators to be unbiased.Apart from these common cases, there are essentially only two cases where the standard sibling comparison design is unbiased (B sib = 0).The first case is when  U = 1.To see this, note that setting  U = 1 makes the denominator of the first term in (6) "infinitely large," so that the first term becomes equal to 0. The same argument applies to the second term in (6), since, as noted in Section 2,  U = 1 implies that R UG = 1 as well.This case makes intuitive sense, since the sibling comparison design is expected to eliminate all confounding between X ij and Y ij if U i1 and U i2 perfectly correlated.
The second case is when both  UX = 0 and  UG = 0, since  UX appears in the numerator of the first term in (6) and  UG appears in the numerator of the second term in (6).This result can be understood from the causal diagram in Figure 1; if  UX = 0, then the confounding path In principle, the second term in ( 6) also becomes equal to 0 if either  GX = 0 or  G = 0; however, these cases would invalidate the first IV assumption, and would quickly be identified by the analyst in most cases and thus avoided in practice.In principle, one could also have that B sib = 0 if some of the parameters in ( 6) are positive and some are negative, in which case the two terms in (6) could cancel out.However, a perfect cancellation is unlikely to occur in practice.
Apart from the common cases  U = 0 or  UY = 0, there is essentially only one case where the standard MR design is unbiased (B MR = 0).This is when  UG = 0, since  UG appears in the numerator of the expression in (7).That B MR = 0 in this case is intuitively reasonable, since, as noted in Section 2, the second and third IV assumptions hold when  UG = 0.
Finally, apart from the common cases  U = 0 or  UY = 0, there are essentially two cases where the MR-sibling design is unbiased (B MRsib = 0); when  UG = 0, since  UG appears in the numerator of the expression in (8), and when R UG = 1, since this makes the denominator of the expression in ( 8) "infinitely large." It follows from the arguments above that, if B sib = 0 or B MR = 0, then we must essentially have that either  U = 0,  UY = 0,  UG = 0 or R UG = 1, or a combination of these.In any of these cases we have that B MRsib = 0.This leads to an important conclusion: the MR-sibling design is unbiased if either the standard MR design or the standard sibling design is unbiased, not necessarily both.We may thus say that the MR-sibling design is "doubly robust," in the sense that it gives the analyst two chances instead of only one to have an unbiased estimate of the causal exposure effect.

Provisional assumptions
Although the double robustness of the MR-sibling design has theoretical appeal, it may have less practical relevance.Indeed, all of the estimators in the previous sections are likely to have some bias, even asymptotically, since no genetic instrument is perfectly valid and there are always unmeasured confounders for the exposure and the outcome that are not perfectly shared within families.So under what conditions can the MR-sibling design be expected to have less bias than the standard MR and sibling comparison designs?It is difficult to answer this question in full generality, since some of the parameters in the bias terms from the previous section may have different signs, and thus cancel each other out.We hence proceed in this section by considering a somewhat restricted scenario where the following assumption holds: This assumption is not as restrictive as it may seem at first glance.That  G ,  U , and   X are non-negative is expected from the fact that most factors correlate positively within families.The signs of  GX ,  UX , and  UY can be swapped independently of each other, by swapping the signs of G ij , X ij , and Y ij , respectively.Thus,  GX ,  UX , and  UY can be defined as positive, by defining G ij , X ij , and Y ij accordingly.In practice, assumption (9) thus amounts to assuming that  UG and  * UG have the same signs as the other parameters in the model.In addition we assume that which is reasonable, since we would not expect the confounder and the genetic instrument to be more strongly correlated across individuals than within individuals.

The standard MR design vs the MR-sibling design
Consider the biases B MR and B MRsib of the standard MR estimator and the MR-sibling estimator in ( 7) and ( 8), respectively.These bias terms are identical, except for the factor in the denominator of B MRsib .If this term is larger than 1, then B MRsib has a larger denominator, and is thus smaller, than B MR .Under assumptions (9) and (10) it is easy to see that this happens if R UG ≥  G .We thus arrive at the following simple and important result: Since we would expect  G to be approximately 0.5, we may thus expect the MR-sibling design to have less bias than the standard MR design if the cross-individual correlation  * UG is no less than 50% of the within-individual correlation  UG .Importantly, the result in (11) also implies that the MR-sibling design will have more bias than the standard MR design if R UG <  G .To understand this "bias amplification," it is instructive to consider the special case when both G ij and U ij are binary.Due to the correlation  G , the variables G i1 and G i2 tend to be equal in a random sib-pair taken from the population.However, from the analytic expression of the estimator in (5) it is clear that only sib-pairs where G i1 and G i2 are unequal (ie, ΔG i ≠ 0) contribute to the estimated exposure effect in the MR-sibling design.Now, consider such a sib-pair with G i1 = 0 and G i2 = 1.The variables U i1 and U i2 are related to G i1 and G i2 in this sib-pair through two mechanisms.A positive correlation  UG suggests that U i1 = G i1 = 0 and U i2 = G i2 = 1.However, a positive correlation  * UG suggests the opposite; that UG is weak relative to  UG , that is, if R UG is small, then the first mechanism tends to dominate the second, so that we would expect a net positive correlation between U ij and G ij .Even in the whole population of all sib-pairs there is a positive correlation between U ij and G ij (equal to  UG ), but among the sib-pairs where G i1 and G i2 are unequal this correlation has to be stronger, in order to "override" the correlation between G i1 and G i2 .Since a stronger confounder-instrument correlation makes the genetic instrument less valid, we would thus expect this to result in a larger bias as well.
A similar bias amplification may occur if G ij and U ij are non-binary, since we would then expect a gradient where sib-pairs with less equal values of G i1 and G i2 contribute more information to the estimated exposure effect. 16Further, by the same reasoning as above we would expect a stronger confounder-instrument correlation, and thus a less valid instrument, in sib-pairs with less equal values of G i1 and G i2 than in the whole population of all sib-pairs.We illustrate this feature in Appendix C with an example.
The relation in (11) shows that it is important to reason about the value of R UG , in order to assess whether the MR-sibling design may perform better than the standard MR design in any given scenario.In Appendix D, we argue, based on theoretical considerations, that R UG is only likely to be smaller than  G if there is a direct effect of G ij on U ij , which acts in the opposite direction to other causes of the correlation  UG between G ij and U ij .For instance, if U ij has genetic components that are positively correlated with G ij through common heritage, then G ij must also have a negative direct effect on other (eg, environmental) components of U ij .Furthermore, in order for R UG to be smaller than  G this direct effect must be strong enough to "override" the other causes of the correlation, so that the net correlation  UG has the same sign as the direct effect of G ij on U ij .Thus, if we don't consider such a strong "inverse" direct effect as likely, then we may be fairly confident that the MR-sibling design has less bias than the standard MR design.In the illustration with real data below, we provide some empirical justification for why values of R UG smaller than  G may be considered unlikely.

The sibling comparison design vs the MR designs
The relations between B sib and the other two bias terms are more complex than the relation between B MRsib and B MR , due to the more complex analytic expression for B sib .However, there are simple qualitative relations between the biases, illustrated by Figure 2. In all plots of this figure we have used parameter values The parameter  U is set to 0.25, 0.5, and 0.75 in the top, middle, and bottom row, respectively, and the parameter   X is set to 0.25, 0.5, and 0.75 in the left, middle, and right column, respectively.In all plots, the horizontal, and vertical axes represent values of  UG and R UG , respectively, in the range (0, 1).The light gray area in each In line with the relation in (11) we observe that B MRsib is smaller than B MR when R UG is larger than  G (=0.5).We observe that both B MRsib and B MR tend to be smaller than B sib if  U or  UG is small.This is intuitive; if  U is small, then the confounds are only shared to a small extent within families, in which case the sibling comparison design will not provide a large bias reduction, and if  UG is small, then the genetic instrument is close to valid, in which case the MR design and the MR-sibling design are expected to provide a large bias reduction.Further, we observe that both B MRsib and B MR tend to be smaller than B sib if   X is large.This can be understood from the fact that the sibling comparison design tends to amplify bias due to uncontrolled confounding if the exposure correlates strongly within families. 11his feature of the sibling comparison design is analogous to the bias amplification of the MR-sibling design discussed above.We observe that B sib tends to be larger than B MR if R UG is small.That the sibling comparison design amplifies bias when R UG is small can be argued as for the MR-sibling design above, since only sib-pairs with unequal values of X i1 and X i2 contribute to the estimated exposure effect in the sibling comparison design, and the aforementioned mechanisms through  UG and  * UG will tend to act on X i1 and X i2 as well through their associations with G i1 and G i2 .Thus, the selection of sib-pairs with unequal values of X i1 and X i2 will tend to strengthen the correlation between U ij and X ij , thus increasing the confounding bias.Finally, we observe that B sib also tends to be larger than B MRsib if R UG is large, which indicates that the MR-sibling design reduces bias "more quickly" than the sibling comparison design as R UG approaches 1.
In Appendix E, we prove the qualitative relations observed in Figure 2, and we derive the parameter-specific thresholds for  U ,  UG ,   X , and R UG such that B MRsib ≤ B sib and B MR ≤ B sib .In Appendix F, we carry out a small simulation study, to verify the theoretical results and conclusions above.

ILLUSTRATION
To illustrate the theoretical results and conclusions above, we carried out a study of low-density lipoprotein (LDL) and diastolic blood pressure (DBP), using data from the Swedish Twin Registry. 12Several large cohort studies have indicated that high LDL levels are associated with high blood pressure, for example, see Otsuka et al 17 and the references therein.However, the strong potential for confounding by unmeasured (eg, genetic and lifestyle) factors raises concerns about bias in these observed associations.

Data
We used a sample of 2672 dizygotic Swedish twin pairs, born between 1911 and 1958.Both serum-LDL and DBP were measured at enrollment, which happened between 2004 and 2008.The LDL level ranged from 1.0 to 8.9 mmol/L, with a median equal to 3.7 mmol/L, and DBP ranged from 50 to 129 mmHg, with a median equal to 81 mmHg.Genetic data were available with Illumina OmniExpress bead chip in all twins.Basic quality control was done with removal of SNPs or samples due to missingness, minor allele frequency <1%, Hardy Weinberg disequilibrium, unmatched sex or low genotyping success.To construct a genetic instrument for LDL we used 91 available SNPs from the 99 SNPs selected by Valdes-Marquez et al 18 in their Mendelian randomization study of LDL and ischemic stroke and CHD.These SNPs were selected from the 2243 genome-wide significant associations in the Global Lipids Genetics Consortium using the clumping method implemented in PLINK1.9 and the 1000 Genomes Project Phase 3 (EUR).We regressed LDL on all SNPs jointly, and constructed our genetic instrument as the subject-specific linear predictor obtained from this regression. 19n our sample there were several measured potential confounders for LDL and DBP, including waist-to-hip ratio (WHR), body mass index (BMI; kg/m 2 ), amount of physical exercise per week (graded on a scale from 0 to 6), smoking and alcohol intake (ever vs never), education (total number of years in school), socioeconomic status (SES; graded on a scale from 1 to 7) and cognitive ability (graded on a scale from 0 to 19).Cognitive ability was assessed with a telephone-based cognitive screening instrument, and all other covariates were obtained from questionnaires.In addition, we had data on the first 10 genetic principal components, which were calculated from the genotype imputation with the Haplotype Reference Consortium (HRC) as the reference panel.
The participants have provided informed consent and the study has ethical approval from the Ethical Review Board in Stockholm.

Analysis
We first attempted to assess whether the models (1) and ( 2) fit reasonably well to our data; in particular, whether the absence of interactions between U ij and G ij , and between U ij and X ij in these models, respectively, are reasonable model assumptions.Using ordinary linear regression, we regressed LDL on the genetic instrument and all measured confounders except alcohol and cognition, including main effects and an interaction term between the instrument and each confounder.Similarly, we regressed DBP on LDL and all measured confounders except alcohol and cognition, including main effects and an interaction term between LDL and each confounder.We computed a P-value for each of the interaction terms in these models.The reason for excluding alcohol and cognition from these interaction analyses is that only 147 subjects had data on both alcohol and cognition, which is arguable too small sample to detect any interaction of realistic magnitude.
We next estimated the within-family correlation  G for the genetic instrument.We estimated the causal effect of LDL on DBP with four different methods: (1) ordinary linear regression; (2) sibling comparison, where each twin pair was entered as a separate stratum in a fixed effects linear regression model, thereby implicitly controlling for all confounders shared in the family; (3) MR with two-stage estimation, where we in the first stage regressed LDL on the genetic instrument with ordinary linear regression, in the second stage regressed DBP on the genetic instrument with ordinary linear regression, and finally estimated the causal LDL effect as the ratio of the two regression slopes; (4) MR-sibling with two stage estimation, where the ordinary linear regression models in the first and second stage were replaced with fixed effects linear regressions.In a real study, one would most likely want to control for all measured confounders, in order to reduce bias to the extent possible.However, since the purpose of our study was to illustrate and contrast the different methods in the presence of uncontrolled confounding, we did not control for any confounders when estimating the causal LDL effect with these four methods thus "pretending" that the measured confounders were unmeasured.
We observed in Section 5 that the relations between biases depend heavily on the within-family correlation  U in the confounders, on the within-individual correlation  UG between the confounders and the genetic instrument, and on the ratio R UG between  UG and the corresponding cross-individual correlation  * UG .In our definition of these parameters, the variable U ij is a scalar summary measure intended to represent the whole set of uncontrolled confounders.Although useful for conceptual insight, it is not clear how to construct such a scalar summary measure in practice from any given multidimensional set of confounders.Thus, to assess the bias relations in our study we considered the confounders one at a time, and estimated the parameters  U ,  UG , and R UG for each confounder separately.In this analysis, we only included those subjects for which confounder data was not missing, for each respective confounder (ie, complete-case analyses).
Finally, we repeated analyses (1)-( 4) above, controlling for the measured confounders.In these analyses we excluded alcohol and cognition for which there was substantial missingness, and we also excluded those subjects for which the remaining confounders were not missing (complete-case analyses), leaving 2247 subjects.
The complete-case analyses may give bias if the missingness is not completely at random, and the exclusion of alcohol and cognition from the controlled analyses may give bias if these variables have a substantial confounding influence.Such biases may be reduced to some extent by using, for instance, multiple imputation.
We computed 95% confidence intervals for all estimates with non-parametric bootstrap based on 1000 re-samples.To accommodate the dependence between twins from the same pair we re-sampled twin pairs rather than individuals in the bootstrap.By viewing the twin pair rather than the individual as the "unit of observation," and assuming that the twin pairs are independent, the validity of this bootstrap follows immediately from general results on bootstrap of independent observations.All analysis were carried out with R, version 4.0.5, 20 with packages drgee 21,22 and forestplot. 23

Results
In the regression model for LDL against the genetic instrument and the measured confounders, only 2 out of 16 interaction terms turned out statistically significant (P < 0.05 for BMI and smoking).In the regression model for DBP against LDL and the measured confounders, only 1 out of 16 interaction terms turned out statistically significant (P < 0.05 for education).Thus, models (1) and ( 2) that assume no interactions may be reasonable for our data.

F I G U R E 3
Confounder-specific estimates of the within-family correlation  U in the confounders, with 95% bootstrap confidence intervals.
The estimated within-family correlation  G for the genetic instrument was 0.53, with 95% confidence interval (0.49, 0.57), thus agreeing well with the expected theoretical value of 0.5.
Table 1 shows the estimated effect of LDL on DBP for the four different analyses, not controlling for any of the measured confounders.The ordinary and fixed effects linear regressions indicate strong positive association (slopes 1.17 and 1.15).The two MR analyses indicate a weaker positive association (slopes 0.40 and 0.55).If the genetic instrument is valid, then these analyses eliminate all confounding bias.However, the confidence intervals are much wider for the MR analyses than for the ordinary and fixed effects regressions, which indicates that this potential elimination of bias comes to the cost of severely reduced statistical efficiency/power.
The forest plots in Figures 3-5 show the confounder-specific estimates of the within-family correlation  U in the confounders, the within-individual correlation  UG between the confounders and the genetic instrument, and the ratio R UG between  UG and the corresponding cross-individual correlation  * UG , respectively, with 95% confidence intervals.In Figure 5, arrow heads indicate that the corresponding confidence limits are outside the range of the plot.The confounder-specific estimates of  U in Figure 3 indicate that all confounders have a positive within-family correlation as expected, possibly with an exception for alcohol intake.The principal components correlate strongly within families, whereas the correlations in the other confounders are more modest.This indicates that the sibling comparison analysis may have reduced confounding bias to some extent in our study, but that significant bias may still remain.The confounder-specific estimates of  UG in Figure 4 are all close to 0, and all confidence intervals include 0 except for the first  TA B L E 2 Estimated effect (mmol/L/mmHg) of LDL on DBP, controlling for all measured confounders except alcohol and cognition.and eighth principal components.This indicates that the genetic instrument may be at least approximately valid in our study, and one would thus expect both the standard MR analysis and the MR-sibling analyses to be less biased than the standard sibling comparison analysis.However, we caution that the genetic instrument may still be invalid by correlations with unmeasured confounders.The confidence intervals for the confounder-specific estimates of R UG in Figure 5 are very wide, so these estimates must be interpreted with a great deal of caution.However, we observe that the estimates appear to be distributed around the range (0.5, 1) indicated by the vertical dashed lines in Figure 5, where 6 estimates are above, 3 estimates are below, and 9 estimates are within the range.Figure 6 shows this distribution as a histogram, with a fitted normal distribution.This indicates that, when considering the confounders a joint set, the true value of R UG may plausibly be larger than  G and smaller than 1, so that the MR-sibling analysis has less bias than the standard MR analysis in our study.However, we again caution that this tentative conjecture may be contradicted by correlations between the genetic instrument and unmeasured confounders.

Analysis
Table 2 shows the estimated effect of LDL on DBP for the four different analyses, controlling for all measured confounders except alcohol and cognition.The results are fairly similar to those from the uncontrolled analysis in Table 1.An exception is the point estimate from the MR-sibling analysis, which is much larger when the confounders are controlled for.However, the confidence interval for this estimate is extremely wide, which indicates that the difference in point estimates could be largely explained by pure chance.

DISCUSSION
The MR design has become an important and popular component in the epidemiologic toolbox, due to its ability to reduce confounding bias.However, this attractive ability is endangered by several mechanisms, including dynastic effect, population stratification and assortative mating.In this article, we have discussed the MR-sibling design, which has previously been suggested to be less susceptible to such parental level mechanisms.We have shown that the MR-sibling design is indeed "doubly robust," in the sense that it is unbiased if either the standard MR design or the standard sibling is unbiased, not necessarily both.We have shown that the MR-sibling design can be expected to have less bias than the standard MR design if the cross-individual correlation between the confounders and the genetic instrument is no less than 50% of the within-individual correlation, and we have discussed when this condition is likely to hold in practice.Specifically, we have argued that this condition is only likely to be violated if there is a direct effect of the genetic instrument on the confounders, which acts in the opposite direction to other causes of the correlation between these variables.We have also derived the more complex relations between the bias of the sibling comparison design and the two MR designs.Finally, we have illustrated the theoretical results and conclusions with a real data example.In the real data example we used a sample of dizygotic twins.Arguably, twins tend to share more environmental factors than ordinary siblings, since they are born at the same time.Thus, for twins we expect  U to be closer to 1 than for ordinary siblings, thus potentially giving less bias in the sibling comparison design.Monozygotic twins are a special case, since they have identical genomes up to random mutations.We note though that, although this feature is an advantage in a sibling comparison design, it makes it impossible to use monozygotic twins in the combined MR-sibling design, since the genetic difference ΔG i is equal to 0 for all twin pairs.Our results assume that the genetic instrument is a scalar.This was indeed the case in our real data example as well, since we used the 91 available SNPs to construct a scalar instrument as the subject-specific linear predictor obtained from regressing LDL on all SNPs jointly.If several instruments are available, and one wishes to analyze these separately instead of combining them into a scalar score, then our results apply to each instrument separately.
All bias results are derived for linear models.Although it can be conjectured that similar results may hold for non-linear (eg, logistic or Cox proportional hazards) models, it would be useful to verify this formally, and we view this as an important topic for future research.We have throughout restricted attention to asymptotic bias.In practice, the choice between different study designs depends on other factors as well, in particular statistical power and efficiency.While having a potential for bias reduction, Brumpton et al 6 showed that the MR-sibling design also has lower power than the standard MR design.Thus, if the sample size is moderate and the bias due to parental-level factors (including dynastic effects, population stratification and assortative mating) is expected to be relatively small, then power considerations may outweigh concerns of bias, so that the standard MR design may be deemed preferable to the MR-sibling design.

How to cite this article
After some algebra, the eigenvalues of Σ UG are given by where Σ UG is positive semidefinite if and only if all eigenvalues are non-negative.The restrictions  1 ≥ 0 and  2 ≥ 0 are equivalent with (3), and the restrictions  3 ≥ 0 and  4 ≥ 0 are equivalent with (4).

APPENDIX B. ASYMPTOTIC BIAS
Combining ( 1) and ( 2) gives that We thus have that which gives (7).From ( 1) and (B1) we have that and We further have that We thus have that which gives (6), and which gives (8).

APPENDIX C. AN EXAMPLE OF AMPLIFIED CONFOUNDER-INSTRUMENT CORRELATION
Figure C1 shows the conditional correlation between normally distributed U ij and G ij , given the minimal absolute difference min |ΔG i | in the genetic instrument between the two siblings, for  U =  Z = 0.5,  UG = 0.3 and R UG = (0, 0.25, 0.5).By increasing the value of min |ΔG i | we create an increasingly smaller subpopulation of sib-pairs with increasingly larger difference in the genetic instrument between the two siblings in each pair.When considering the whole population (min |ΔG i | = 0), the conditional correlation is equal to the marginal correlation  UG = 0.3.However, as we restrict attention to sib-pairs with increasingly larger values of min |ΔG i |, the conditional correlation becomes increasingly larger.As expected, the conditional correlation increases faster when R UG (and thus also  * UG ) is small, since the conditional correlation is then dominated by the mechanism acting through  UG , as argued for binary variables above.

APPENDIX D. THEORETICAL COMPARISON OF 𝝆 G AND R UG
To study the relation between R UG and  G it is useful to partition the correlation structure between (U i1 , U i2 ) and (G i1 , G i2 ) into components due to common causes of these variables, and components due to a causal effect of G ij on U ij , as in the causal diagram in Figure D1.In this model, the familial common causes W i contains all parental genetic factors, whereas U ij contains all environmental and individual genetic factors that affect both the exposure and the outcome, except the alleles contributing to the genetic instrument G ij .The arrow from G ij to U ij represents an effect of the genetic instrument on the outcome, mediated through the environmental components of U ij , that is, a pleiotropic effect.
We augment the linear model in ( 1) and (2) with ) where the error terms ( G i1 ,  G i2 ) for the genetic instrument are independent of the error terms ( U i1 ,  U i2 ) for the confounders, and both these are independent of the error terms for the exposure and outcome in ( 1) and (2), respectively.Under this model we have the following relations between R UG and  G (see proof below): The relation in (D3) is obvious from Figure D1; without the arrow from G ij to U ij the correlation  UG is determined by the product of the coefficients  GU and  WG on the path U ij ← W → G ij , and the correlation  * UG is determined by the very same product of coefficients along the path U ij ← W → G ij ′ .The relation in (D4) also makes intuitive sense; without the arrows from W i to (U i1 , U i2 ), or the arrows from W i to (G i1 , G i2 ),  UG is determined by the coefficient  GU on the path G ij → U ij , whereas  * UG is determined by the product of the coefficients  GU and  G on the path Taking the ratio of these products gives  G .
The relation in (D5) is probably less intuitive, but highly important.In Figure D1, the influence of W i on G ij and U ij is determined by the product of the coefficients  WU and  WG on the path U ij ← W i → G ij .If this product is positive (ie, if  WU and  WG have the same sign), then W i pulls the correlation between U ij and G ij to positive values, whereas if this product is negative (ie, if  WU and  WG have different signs), then W i pulls the correlation to negative values.The net correlation  UG between U ij and G ij depends on both this influence of W i and the direct effect  GU of G ij on U ij .According to the relation in (D5), R UG is only smaller than  G if the influence of W i is in the opposite direction as the net correlation.Clearly, this can only happen if the direct effect of G ij on U ij is strong enough to "override" the influence of W i .
To prove the relations in (D3)-(D5), let  2 W and  2  G denote the variance of W i and  G ij , respectively.From (D1) we have that Combining (D1) and (D2) gives that

APPENDIX E. RELATIONS BETWEEN BIASES
We implicitly rely on assumptions ( 9) and (10) in this section.We show that and for certain parameter-specific thresholds K 1 , … , K 8 , which are derived below.Define We have that and We now have that We thus have that Combining with the fact that B MRsib is not a function of  U gives that Solving for   X gives that Similarly, combining with the fact that B MR is not a function of  U gives that B MR ≤ B sib if  U ≤ K 5 , where K 5 is the value of  U such that B MR = B sib .Straight-forward calculations gives that Solving for   X and R UG give that and respectively.The inequality  U ≤ K 1 can be rewritten as The left-hand side of this inequality is a quadratic function of  UG (1 − R UG ) with a positive second derivative.Thus, the inequality holds when x 1 ≤  UG (1 − R UG ) ≤ x 2 , where x 1 and x 2 are the solutions to the second order equation obtained by replacing the inequality sign in (E3) with an equality sign.Straight-forward calculations gives that of sib-pairs.Some of the cells in the table are empty; these correspond to parameter combinations that violate either of the restrictions in (3) or (4), and are thus logically impossible.We observe that βXYsib is unbiased when  U = 1 and that βXYMR is unbiased when  UG = 0, as expected.We further observe that βXYMRsib is indeed "doubly robust," in the sense that it is unbiased when either βXYsib or βXYMR is unbiased.We finally observe that the bias in βXYsib decreases with  U and R UG , and increases with   X , as discussed in Section 5.The 95% confidence intervals have close to the nominal 95% coverage probability in the scenarios where the corresponding estimate is unbiased, but virtually 0 coverage probability in the other scenarios.

5 .
Light gray area: B MRsib < min(B sib , B MR ); middle gray area: B MR < min(B sib , B MRsib ); dark gray area: B sib < min(B MR , B MRsib ); white area: logically impossible values of ( UG , R UG ).plot represents the region where the MR-sibling design has smallest bias (B MRsib < min{B sib , B MR }), the middle gray area represents the region where the standard MR design has smallest bias (B MR < min{B sib , B MRsib }) and the dark gray area represents the region where the standard sibling comparison design has smallest bias (B sib < min{B MR , B MRsib }).The white area represents values of  UG and R UG that violate the restriction in (3) or (4), and are thus logically impossible.

F I G U R E 4
Confounder-specific estimates of the within-individual correlation  UG between the confounders and the genetic instrument, with 95% bootstrap confidence intervals.FI G U R E 5Confounder-specific estimates of the ratio R UG between  UG and the corresponding cross-individual correlation  * UG , with 95% bootstrap confidence intervals.Arrow heads indicate that the corresponding confidence limits are outside the range of the plot.The estimates for alcohol and PC5 are 3.97 and −3.73, respectively.

F I G U R E 6
Distribution of confounder-specific estimates of the ratio R UG between  UG and the corresponding cross-individual correlation  * UG , with a fitted normal distribution.
TA B L E 1

TA B L E F1 Simulation results. 𝜷 XY sib 𝜷 XY MR 𝜷 XY MRsib
The table shows the mean estimates (est) of  XY and the empirical coverage probability (cp) of 95% bootstrap confidence intervals over 1000 samples consisting of n = 1000 sib-pairs each.The true causal effect  XY is equal to 0. Note: