Sensitivity Analysis of G-estimators to Invalid Instrumental Variables

Instrumental variables regression is a tool that is commonly used in the analysis of observational data. Instrumental variables make it possible to draw causal inferences about the effect of an exposure in the presence of unmeasured confounders. A valid instrumental variable is a variable that is associated with the exposure, affects the outcome only through the exposure (the exclusion criterion), and is not confounded with the outcome (exogeneity). These assumptions are generally untestable and rely on subject-matter knowledge. Therefore, a sensitivity analysis is desirable to assess the impact of violations of these assumptions on the estimated parameters. In this paper, we propose and demonstrate a new method of sensitivity analysis for G-estimators in causal linear and non-linear models. We introduce two novel aspects of sensitivity analysis in instrumental variables studies. The first is a single sensitivity parameter that captures violations of both the exclusion and the exogeneity assumptions. The second is an application of the method to non-linear models. The introduced framework is theoretically justified and is illustrated via a simulation study. Finally, we apply the method to real-world data and provide practitioners with guidelines on conducting sensitivity analysis.

exposure (exclusion), and is not confounded with the outcome (exogeneity). Notably, these three properties define a valid instrumental variable (IV). Such variables allow for the identification of causal effects, particularly the local average treatment effect (LATE), which is the average treatment effect (ATE) for individuals whose treatment status can be influenced by changing the IV value. 2 These causal effects are the target effects in many observational studies and the main focus of this article.
In contrast to RCTs, finding a valid IV in observational studies is complicated, since the exclusion and the exogeneity properties are testable only under strict constraints. 3 For example, Palmer et al 4 developed a statistical test based on inequalities induced on the joint distribution of the IV, the exposure, and the outcome. However, this test is only applicable to categorical data, which is an important limitation. Another approach utilizes overidentification, which refers to a situation where the number of IVs exceeds the number of causal parameters of interest. Overidentification tests, such as the Sargan test 5 and Hansen's J test, 6 inspect the degree of agreement between different estimators of the causal effect. Although there are several versions of these tests, Newey 7 showed that tests based on a finite set of moment conditions are asymptotically equivalent. The main drawback of these tests is that we do not know which, if any, of the IVs is valid. Therefore, we cannot determine which of the estimators converges to the true causal effect. As a result, the properties that make an IV valid are relegated to assumptions that typically rely on subject-matter knowledge. For example, in a well-known study in political economy, Acemoglu et al 8 estimated the causal effect of income on the level of democracy using past savings as an IV. Acemoglu et al 8 acknowledged that the exclusion assumption might be compromised, since the saving rates might be correlated with anticipated regime change. Furthermore, an exogenous factor may influence both the saving rates and the democracy level, violating the exogeneity assumption. Another example is epidemiological studies that use the Mendelian randomization approach to estimate causal effects. This approach uses a genetic marker that is associated with a particular exposure and affects an outcome of interest only through the exposure.
For instance, VanderWeele et al 9 present a study of the causal effect of smoking on lung cancer that uses genetic variants on chromosome 15 as an IV affecting the number of cigarettes smoked per day. However, this genetic variant also directly affects the probability of lung cancer and therefore violates the exclusion assumption. 10 If any of the IV assumptions are violated, the causal effect cannot be fully identified. Therefore, assessing the robustness of the obtained results to violations of the valid IV assumptions is desirable. This assessment can be done, for example, with a sensitivity analysis. The need for sensitivity analysis to violations of the IV assumptions has been addressed in several previous publications. 6,9,11,12 For example, Angrist et al 11 provide a rigorous analysis of the IV estimand when the exclusion assumption is violated. Wang et al 13 introduced a method for sensitivity analysis that is based on the Anderson-Rubin 14 (AR) test. The AR test is a uniformly most powerful test of the null hypothesis of no causal effect of the exposure on the outcome. Wang et al 13 considered two sensitivity parameters: one for the exclusion assumption and another for the exogeneity assumption. By assuming linearity of the functional form of both violations, one can add up the two parameters and construct a single parameter that captures both the exclusion and the exogeneity assumption violations. Conley et al 15 also considered one sensitivity parameter that is assumed to account for the two types of violations. The sensitivity analysis of the estimators was performed for a linear model by assuming a different set of values or a prior distribution of the sensitivity parameter. Imbens 16 performed a sensitivity analysis of the exogeneity assumption violation using two distinct sensitivity parameters in the framework of a linear causal model. Additional research on sensitivity analysis was conducted by Imbens & Rosenbaum, 17 Small & Rosenbaum, 18 and others.
13,19 However, all these studies dealt only with least-squares-based estimators in linear causal models. 6,11-13,16 This article starts by formulating the valid IV assumptions using Rubin's potential outcomes framework. 11,20 Then, we formulate violations of these assumptions using a single sensitivity parameter that captures violations of both the exclusion and the exogeneity assumptions. Using a single parameter requires specifying fewer parameters in the model, which is a valuable advantage. Next, we conduct a simulation study to illustrate the sensitivity analysis in linear and logistic causal models. Finally, we provide a real-world data example that illustrates the new method.

The structural mean model
Let Yx be the potential outcome 20 of Y when the exposure X is set to x. In addition, let L be a set of measured confounders for X and Y, let Z be the instrumental variable, and let ψ be a vector of the causal parameters. See Figure 1 for a graphical illustration of a causal model with a valid IV. A structural mean model 21,22 that parameterizes the average causal effect for individuals exposed to level x of X is defined as

g(E[Y | L, Z, X = x]) − g(E[Y0 | L, Z, X = x]) = ψᵀm(L)x, (1)

where g is a link function of a generalized linear model, and dim(m(L)) = dim(ψ).

Figure 1. A causal structure of a valid IV. Y is the outcome variable, X is the exposure, and Z is the instrument. U represents all unmeasured confounders of X and Y, whereas L represents all measured confounders. The instrument Z affects Y only through the exposure X, and is not confounded with the outcome by the unmeasured variables.

Figure 2. A twin causal network with an invalid IV. The solid arrows represent the underlying causal structure without the violations. Y is the outcome variable, X is the exposure, and Z is the instrument. U represents all unmeasured confounders of X and Y, whereas L represents all measured confounders. The dashed arrows from Z to Y and from Z0 to Y0 represent the exclusion assumption violation in the factual and hypothetical networks, respectively. The dotted arrows from U to Z and Z0 represent the exogeneity assumption violation in the factual and the hypothetical networks, respectively. Notably, the factual instrument equals the potential instrument, that is, Z = Z0, since it is determined in both networks prior to the treatment X by the same unmeasured confounder U and the noise term εZ (given the measured confounders L). The noise terms εZ and εY represent all other factors that determine the value of the instrument Z, the outcome Y, and the potential outcome Y0.
The conditional mean E[Y | L, Z, X = x] involves only observed variables; therefore, this part of the model is identifiable from the observed data, whereas identification of the counterfactual mean E[Y0 | L, Z, X = x] requires a valid IV. The composition of the vector-valued function m(L) defines the exact form of the causal model and may allow for interactions between L and the exposure X. For example, in a linear model, g is the identity link function. Furthermore, assuming m(L) = 1 and a binary exposure X simplifies the structural mean model to

E[Y | L, Z, X] − E[Y0 | L, Z, X] = ψX. (2)

In such a case, ψ is the average causal effect of the exposure for the exposed. In this scenario, ψ can be consistently estimated using the two-stage least squares (TSLS) method. 23 We use the twin network to demonstrate the counterfactual implications of violations of the IV assumptions. A twin network is a graphical method that presents two networks together: one for the hypothetical world where the exposure is set to a fixed level x for everyone, and the other for the factual world where the exposure varies randomly. The directed acyclic graph (DAG) of a twin network in Figure 2 provides a graphical illustration of a causal model with unmeasured confounders U and an instrumental variable Z. The solid arrows represent the underlying causal structure without the violations. In other words, if we keep only the solid arrows in the DAG, we remain within a network where the IV Z is valid. The dashed arrows from Z to Y and from Z0 to Y0 represent the exclusion assumption violation in the factual and the hypothetical networks, respectively. Namely, the dashed arrows represent the direct effect of the instrument on the outcome in each network. Notably, the factual instrument equals the potential instrument, that is, Z = Z0, since it is determined in both networks prior to the treatment X by the same unmeasured confounder U and the noise term εZ (given the measured confounders L).
The dotted arrows from U to Z and Z0 represent the exogeneity assumption violation in the factual and the hypothetical networks, respectively. In other words, if the instrument Z (Z0) is associated with the unmeasured confounders U that also confound the exposure X and the outcome Y, then the instrument becomes an endogenous variable. It is worth mentioning that the noise terms εZ and εY represent all other factors that determine the value of the instrument Z, the outcome Y, and the potential outcome Y0. Given εZ, L, and U, the value of the instrument Z is set deterministically. However, since we cannot measure every possible parent variable, we regard Z as a random variable. Namely, even if the values of U are known, Z remains a random variable, where εZ accounts for the unexplained variation of Z. The same explanation holds for Y (Y0), U, and εY. Another important feature of the twin network method is that it provides a graphical way of testing independence among counterfactual quantities. According to Figure 2, a counterfactual implication of the exogeneity and exclusion assumptions is that a valid IV Z satisfies Y0 ⊥ Z | L. 24 In other words, if Z is a valid IV, there is no open path between the potential outcome Y0 and the instrument Z, conditionally on L. 25 An important advantage of this formulation is that it avoids explicit reference to the unmeasured confounder U, allowing for greater generality.
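This counterfactual independence can be checked directly in a synthetic example where the potential outcome is available by construction. The following sketch is a hypothetical illustration (not taken from the paper's simulations): it generates Y0 with and without a direct effect of Z, and shows that the correlation between Y0 and Z vanishes only when the IV is valid.

```python
import numpy as np

# Hypothetical illustration: under a valid IV the potential outcome Y0 is
# independent of Z, while a direct effect of Z on Y (an exclusion violation,
# the dashed arrow in Figure 2) induces a correlation between Y0 and Z.
rng = np.random.default_rng(0)
n = 200_000
u = rng.normal(size=n)                  # unmeasured confounder U
eps_y = rng.normal(size=n)              # noise term of the outcome
z = rng.binomial(1, 0.5, n)             # no arrow from U to Z (exogeneity holds)

corrs = {}
for direct_effect in (0.0, 0.5):        # 0.0 = valid IV, 0.5 = exclusion violated
    y0 = direct_effect * z + u + eps_y  # potential outcome under no exposure
    corrs[direct_effect] = np.corrcoef(y0, z)[0, 1]

print(corrs)                            # correlation is near 0 only for the valid IV
```

The violated setting induces only a modest correlation here because the direct effect is small relative to the outcome noise; nevertheless, it is enough to bias an estimator that relies on Y0 ⊥ Z.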

G-estimation
Loosely speaking, the G-estimator 26 is the value of the causal parameter ψ under which the assumption Y0 ⊥ Z | L holds in the observed data. This value is obtained as a solution to a system of estimating equations. 27 In heterogeneous populations, the value of Y0 is determined by the causal effects specified in the model and the noise term εY that incorporates the set of all other exogenous unknown factors that determine the outcome. This factor is regarded as a disturbance and thus is suppressed in the definition of the causal model. The potential outcome Y0 is counterfactual, namely, an unobserved quantity for the exposed individuals. Thus, we can only estimate its mean value. For example, in a linear model, if the causal effect for the exposed is defined as ψᵀm(L)x, then subtracting it from the observed outcome Y results, on average, in the potential outcome Y0. We denote this quantity by h(ψ), where the exact form of h(ψ) depends on the link function g, such that

h(ψ) = Y − ψᵀm(L)X (3)

if g is the identity link function, and

h(ψ) = expit(logit(E[Y | L, Z, X; θY]) − ψᵀm(L)X) (4)

if g is the logit link function. G-estimators are obtained in the same manner in generalized linear models, following the transformation that is defined as the inverse of the link function. Formally, the G-estimator solves the estimating equation in which h(ψ) is multiplied by D(L, Z), an arbitrary function with the same dimension as ψ, such that E[D(L, Z) | L] = 0. Two common choices for D are 22

D(L, Z) = Z − E[Z | L] (5)

and

D(L, Z) = E[m(L)X | L, Z] − E[m(L)X | L]. (6)

The function in Equation (5) can only be used for one-dimensional instruments, while the function in Equation (6) can be used for multidimensional IVs as well. However, this advantage comes with a price: it requires the specification of an additional model for the exposure, whereas the function in Equation (5) requires only a model for E[Z | L]. Without loss of generality, from now on we will assume a one-dimensional IV and therefore consider the simpler function in Equation (5). Consistency of the G-estimator relies on the validity of the IV, namely, on the conditional independence of h(ψ) and Z, given L.
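To make the construction concrete, the following sketch is an illustrative implementation under the simplest assumptions: m(L) = 1, an identity link, and no measured confounders, so that D(Z) = Z − E[Z] as in Equation (5) and h(ψ) = Y − ψX. The hypothetical data include an unmeasured confounder, so a naive regression of Y on X would be biased, while the G-estimator is not.

```python
import numpy as np

def g_estimate_linear(y, x, z):
    """Solve sum_i D(z_i) * h_i(psi) = 0 for psi, with D(z) = z - mean(z)
    and h(psi) = y - psi * x (identity link, m(L) = 1, no covariates L)."""
    d = z - z.mean()          # sample analogue of D(Z) = Z - E[Z]
    return (d @ y) / (d @ x)  # closed-form root of the linear estimating equation

# Hypothetical data with a valid IV and an unmeasured confounder U.
rng = np.random.default_rng(1)
n = 100_000
u = rng.normal(size=n)                              # confounds X and Y
z = rng.binomial(1, 0.5, n)
x = rng.binomial(1, 0.1 + 0.5 * z + 0.3 * (u > 0))  # instrument is relevant
y = 1.5 * x + u + rng.normal(size=n)                # true causal effect psi = 1.5

print(g_estimate_linear(y, x, z))  # close to 1.5, unlike a naive regression of y on x
```

In this one-dimensional linear case the estimating equation is linear in ψ, so the root is available in closed form; the non-linear cases below require a numerical solver.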
A less strict condition of conditional mean independence between h(ψ) and Z can be represented via the following equation 28,29

E[h(ψ) | L, Z] = E[h(ψ) | L]. (7)

If the conditional mean independence does not hold, then Equation (7) does not hold either. Although a valid IV satisfies complete independence of Y0 and Z, violations that affect, for instance, only the variance of the variables will not affect the consistency of the estimators of the structural mean model parameters. To avoid confusion, by 'violation' we refer to settings that violate the mean independence of Y0 and Z, given L. The conditional dependence of Y0 and Z can arise from the violation of the exclusion assumption, the violation of the exogeneity assumption, or both. To illustrate this claim, we use the twin network in Figure 2. For example, if the exclusion assumption is violated, there is a direct effect of the IV Z on the outcome Y. Since the potential IV Z0 possesses the same parent variable εZ (in addition to the measured confounders L), this violation implies an association between the potential outcome Y0 and the observed instrument Z. If the exogeneity assumption is violated, both Z and Y0 have a common parent U. If U is unmeasured and thus not controlled, it creates an association between Z and Y0. A possible consequence of these violations is inconsistency of the G-estimator of ψ.

Sensitivity analysis with a single parameter
We model compositions of mean independence violations using a parametric function b(L, Z; δ), where for a generalized linear model, one can use the following form

g(E[Y0 | L, Z]) − g(E[Y0 | L, Z = 0]) = b(L, Z; δ). (8)

Since any source of violation will result in an association between Y0 and Z (given L), we can use a scalar function b(L, Z; δ), where δ is a parameter that incorporates this association. Therefore, we require that b(L, Z; 0) = 0. Namely, the violation is eliminated if there is no association between Y0 and Z. Consequently, b(L, Z; δ) is non-zero for δ ≠ 0. Defining a sensitivity parameter in terms of the deviation from a counterfactual independence assumption is standard in the causal inference literature, and is used, for instance, in sensitivity analysis for truncation by death, 30 marginal structural models, 31 and transportability in randomized trials. 32 Since |b(L, Z; δ)| expresses the magnitude of the assumptions' violation, we can expect that for given values of L and Z, |b(L, Z; δ)| will be a monotonically non-decreasing function of |δ|. For example, in a linear violation structure such that b(L, Z; δ) = δZ, δ incorporates the association's strength and direction. For readers who prefer to define or interpret the sensitivity parameter in terms of causal mechanisms (eg, arrows on a DAG), we show in the Appendix how our parameter can be decomposed into a direct effect of the instrument on the outcome, and an association between the instrument and the outcome due to confounding. For linear models, Wang et al 13 considered such a parametrization and used it for sensitivity analysis. Our results in the Appendix thus extend those of Wang et al and allow the reader to specify δ directly or indirectly in terms of parameters similar to those proposed by Wang et al. The interpretation of the direct effect is conceptually straightforward. However, the confounded association is more subtle, since assessing the strength (absolute value) and direction (sign) of δ induced by omitted confounders can be tricky.
In order to give meaningful values or even crude bounds on δ, subject-matter knowledge is required. For the aforementioned linear structure, a subject-matter expert can answer two main questions: (1) Is a direct causal effect of the instrument on the outcome plausible? (If the answer is affirmative: how strong is the effect, and what is its direction?) (2) Are there any important confounders that can be measured and therefore mitigate the possible bias caused by their omission? For example, in the special case of an RCT with imperfect compliance, one can rule out the exclusion assumption violation by the very design of the experiment. In addition, as a well-conducted experiment strives to account for every known confounder, one can assume that even if a violation occurs, it can only be due to the omission of non-essential confounders (an exogeneity assumption violation) and, therefore, should not be severe. In such a case, the sensitivity analysis may consider only small values of δ around 0.
Using Equation (3) in the estimating equations when δ* ≠ 0 will result in an inconsistent estimator, since in such a setting the estimating equations are biased, that is, their expectation is not 0. To resolve this pitfall, we reformulate h(ψ) from Equation (3) as a function of the parameter δ such that E[h(ψ; δ) | Z, L] = E[h(ψ; δ) | L] = a(L) for the true value of δ, denoted by δ = δ* (see the proof of Lemma 1 in the Appendix). In other words, for δ*, the assumption Y0 ⊥ Z | L is still violated; however, the estimating functions remain unbiased, and thus the G-estimator is consistent w.r.t. the true causal parameter ψ. Since the consistency depends on the correct specification of δ in h(ψ; δ) and b(L, Z; δ), we consider δ as a sensitivity parameter. The exact value of δ* would typically not be known to the analyst. Therefore, a sensitivity analysis can be carried out by varying δ over a range of values considered plausible by subject-matter experts, and estimating the causal exposure effect separately for each value of δ.
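As an illustration, the following sketch assumes the linear violation b(L, Z; δ) = δZ with m(L) = 1 and an identity link, so the corrected residual is h(ψ; δ) = Y − ψX − δZ. It maps a grid of δ values to G-estimators; only δ = δ* recovers the true effect.

```python
import numpy as np

def g_estimate(y, x, z, delta):
    """G-estimator using the corrected residual h(psi; delta) = y - psi*x - delta*z,
    under the assumed linear violation b(Z; delta) = delta * Z."""
    d = z - z.mean()
    return (d @ (y - delta * z)) / (d @ x)

# Hypothetical data with an invalid IV: a direct effect delta_star * Z on Y.
rng = np.random.default_rng(2)
n = 200_000
z = rng.binomial(1, 0.5, n)
x = rng.binomial(1, 0.2 + 0.6 * z)
delta_star, psi_true = 0.5, 1.0
y = psi_true * x + delta_star * z + rng.normal(size=n)

for delta in (0.0, 0.25, 0.5):                 # sensitivity analysis over a grid
    print(delta, g_estimate(y, x, z, delta))   # unbiased only at delta = delta_star
```

In this setting the mapping is explicit: the estimator at a misspecified δ is shifted by (δ* − δ) divided by the instrument-exposure association, so naively setting δ = 0 overstates the effect.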

Asymptotic variance and distribution of the G-estimator
The asymptotic variance of θ̂(δ) is given by the sandwich formula 27

A(θ0, δ)⁻¹ B(θ0, δ) A(θ0, δ)⁻ᵀ, (11)

where the "bread" A(θ0, δ) is the expectation of minus the matrix of partial derivatives of the stacked estimating functions w.r.t. θ, the "meat" B(θ0, δ) is the expectation of the outer product of the stacked estimating functions, and θ0 is the true value of the unknown parameters. It can be shown that the upper-left block of the "meat" matrix B(θ0, δ) is the expected value of the outer product of the unbiased estimating functions.
The vector of θ0 estimators is denoted by θ̂(δ). This asymptotic covariance results in a robust estimator of the coefficients' variance w.r.t. model misspecification. The asymptotic distribution of the estimators θ̂(δ) is multivariate normal, 27 namely

√n (θ̂(δ) − θ0) →D Np(0, A(θ0, δ)⁻¹ B(θ0, δ) A(θ0, δ)⁻ᵀ), (13)

where the subscript p denotes the dimension of the parametric space θ ∈ Θ, and the superscript D denotes convergence in distribution. For the sample version of the "bread" and "meat" matrices, we replace the expectation operator with the corresponding sample means, the true θ0ᵀ = (θYᵀ, αZ) with the corresponding unbiased estimators, and ψ with its G-estimator that is obtained for a given value of δ.
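For the simplest linear case, the sandwich variance can be assembled directly from the stacked estimating functions. The sketch below is illustrative, assuming m(L) = 1, L = ∅, the linear violation δZ, and a constant instrument model E[Z] = αZ, so that θ = (αZ, ψ); the "bread" is computed analytically for this model and the "meat" empirically.

```python
import numpy as np

def g_estimate_with_se(y, x, z, delta):
    """Point estimate and sandwich SE for theta = (alpha_Z, psi), stacking the
    instrument-model score Z - alpha_Z with (Z - alpha_Z) * h(psi; delta)."""
    n = len(y)
    alpha = z.mean()                       # estimator of alpha_Z = E[Z]
    d = z - alpha
    psi = (d @ (y - delta * z)) / (d @ x)  # G-estimator for the given delta
    h = y - psi * x - delta * z            # corrected residual h(psi; delta)
    U = np.column_stack([d, d * h])        # stacked estimating functions
    A = np.array([[1.0, 0.0],              # "bread": E[-dU/d theta^T], analytic
                  [h.mean(), (d * x).mean()]])
    B = (U.T @ U) / n                      # "meat": E[U U^T], empirical
    Ainv = np.linalg.inv(A)
    V = Ainv @ B @ Ainv.T / n              # sandwich covariance of theta_hat
    return psi, float(np.sqrt(V[1, 1]))    # psi_hat and its standard error

# Hypothetical data; a Wald-type 95% CI is psi_hat ± 1.96 * se.
rng = np.random.default_rng(3)
n = 5_000
z = rng.binomial(1, 0.5, n)
x = rng.binomial(1, 0.2 + 0.6 * z)
y = 1.0 * x + rng.normal(size=n)
psi_hat, se = g_estimate_with_se(y, x, z, delta=0.0)
print(psi_hat, se)
```

The off-diagonal "bread" entry reflects that the G-estimating function depends on the estimated αZ, which is exactly why the stacked formulation is needed for valid standard errors.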

Linear causal model
In order to illustrate the implication of a nonzero value δ*, we start with a linear model with explicit unmeasured confounders U, and possible violations of the exclusion and the exogeneity assumptions. Namely, we assume an underlying causal structure as illustrated in the DAG in Figure 2. However, for clarity of exposition, we assume no measured confounders L, that is, L = ∅. It can be shown (see, eg, section 2 of Vansteelandt & Didelez 24 ) that the structural mean model in Equation (2) can be obtained by averaging the structural mean model

E[Y | Z, X, U] − E[Y0 | Z, X, U] = ψX (14)

w.r.t. the conditional distribution of U, given Z and X. We specify a violation of the valid IV assumptions with b(Z, L; δ*) = δ*Z. This specification may represent a direct effect of Z on Y, a confounding effect of unmeasured variables U on Z and Y, or both.
Using simple algebra, as illustrated in Lemma 2 in the Appendix, the true causal effect can be formulated as a function of the sensitivity parameter δ* in the following way

ψ = (βYZ − δ*) / βXZ, (16)

where βXZ = cov(X, Z)/var(Z), and βYZ = cov(Y, Z)/var(Z). For the special case where U = ∅, the sensitivity parameter δ* is simply the true regression coefficient βYZ.X = cov(Y, Z | X)/var(Z | X). Hence, a G-estimator that is corrected using Equation (16) to remain consistent coincides with the least squares estimator of ψ (for further details, please refer to Lemma 4 in the Appendix). Following the same logic, in the presence of additive unmeasured confounders, with an underlying causal structure as in Figure 2, the sensitivity parameter δ* is βYZ.XU = cov(Y, Z | X, U)/var(Z | X, U). Namely, δ* depends on the correlation of the outcome with the IV conditional on the unmeasured confounders U and the exposure X, and is therefore not point-identifiable. Notably, from Equation (16), we can obtain bounds on the corresponding causal effect by setting bounds on the true value of the sensitivity parameter δ*.

Theorem 1. Assume the causal structure as in Figure 1, and a structural mean model as in Equation (2). The true sensitivity parameter δ*, as defined in (8), is non-identifiable.
For a detailed proof of Theorem 1, please refer to the Appendix. An immediate implication of Theorem 1 is that, in the presence of unmeasured confounders, the causal effect is not estimable using the G-estimation method alone. Therefore, it is desirable to perform a sensitivity analysis where δ is varied over a set of values considered plausible by a subject-matter expert. Otherwise, if such values cannot be determined, we suggest starting with a range of values symmetric around 0. Each value of δ from this set is mapped to a corresponding G-estimator, and this procedure thus provides bounds on the true causal effect.
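The relationship in Equation (16) can be checked numerically. The following sketch uses hypothetical data with a known direct effect δ* of Z on Y (an exclusion violation) plus an unmeasured confounder, and recovers the true effect from the two marginal regression coefficients.

```python
import numpy as np

# Hypothetical DGP: unmeasured confounder U, exclusion violated via delta_star * Z.
rng = np.random.default_rng(4)
n = 500_000
z = rng.normal(size=n)
u = rng.normal(size=n)                  # unmeasured confounder of X and Y
x = 0.8 * z + u + rng.normal(size=n)
delta_star, psi_true = 0.5, 2.0
y = psi_true * x + delta_star * z + u + rng.normal(size=n)

b_xz = np.cov(x, z)[0, 1] / np.var(z)   # beta_XZ = cov(X, Z) / var(Z)
b_yz = np.cov(y, z)[0, 1] / np.var(z)   # beta_YZ = cov(Y, Z) / var(Z)
psi = (b_yz - delta_star) / b_xz        # correction as in Equation (16)
print(psi)                              # close to the true effect 2.0
```

Plugging in δ = 0 instead of δ* would give the usual IV estimand βYZ/βXZ, which here is biased upward by δ*/βXZ; bounding δ* therefore bounds the causal effect.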
As illustrated in Equation (13), the asymptotic distribution of the G-estimator is normal, where its asymptotic variance can be computed using the sandwich formula as in Equation (11). For linear outcome models, the "bread" matrix A(θ0, δ) is a block-diagonal matrix, since the partial derivatives of D(L, Z; αZ) and h(ψ; δ) do not depend on θY. This result simplifies the calculation of the G-estimator variance, since the upper-right block derived for the estimators of θY does not contribute to the variance of the G-estimator. For example, for a one-dimensional Z and ψ, and a linear model E[Z | L; αZ] such that S(L, Z; αZ) = D(L, Z; αZ), the lower-right block of the "bread" matrix A(θ0, δ) has a simple explicit form.

Logistic causal model
Assume a binary outcome Y, a binary exposure X, a binary instrument Z, and an unmeasured confounder U. In addition, assume that there are no measured confounders, that is, L = ∅, and m(L) = 1. We define the structural mean model on the logit scale

logit(P(Y = 1 | Z, X)) − logit(P(Y0 = 1 | Z, X)) = ψX. (17)

Therefore, we define the violation on the logit scale as well

logit(P(Y0 = 1 | Z)) − logit(P(Y0 = 1 | Z = 0)) = δZ. (18)

In addition, let D(L, Z; αZ) = S(L, Z; αZ) = Z − αZ. Therefore, the last function D(L, Z; αZ)h(ψ; δ) in the system of estimating equations (10) is

(Z − αZ) expit(logit(E[Y | Z, X; θY]) − ψX − δZ).

Since h(ψ; δ) in the logistic causal model depends on the outcome model E[Y | Z, X; θY], the variance of the G-estimator is now affected by the variance of the θY estimators. In particular, the "bread" matrix A(θ0, δ) is no longer block diagonal as in the linear model, since the partial derivatives of D(L, Z; αZ)h(ψ; δ) w.r.t. θY are non-zero valued. Therefore, the computations of the covariance matrix are more involved, since they require the inversion of higher-order matrices. The simulation study in Section 3 illustrates a sensitivity analysis of the G-estimators of causal parameters using an invalid instrument in linear and logistic causal models.
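To illustrate the logistic case, the following sketch is a simplified illustration rather than the paper's full estimating-equation system: it fits E[Y | Z, X] with a saturated model (empirical cell means for the four binary (Z, X) cells) and solves the displayed equation for ψ at a given δ by bisection.

```python
import numpy as np

def expit(t):
    return 1.0 / (1.0 + np.exp(-t))

def g_estimate_logistic(y, x, z, delta, lo=-10.0, hi=10.0, tol=1e-8):
    """Solve sum_i (z_i - z_bar) * expit(logit E[Y|Z,X] - psi*x_i - delta*z_i) = 0
    for psi, fitting E[Y|Z,X] with a saturated model (empirical cell means)."""
    # saturated outcome model: P(Y=1 | Z=a, X=b) for the four binary cells
    m = np.array([[y[(z == a) & (x == b)].mean() for b in (0, 1)] for a in (0, 1)])
    lp = np.log(m / (1.0 - m))[z, x]   # logit E[Y|Z,X] evaluated per subject
    d = z - z.mean()
    ee = lambda psi: d @ expit(lp - psi * x - delta * z)
    # bisection: assumes ee changes sign on [lo, hi], which holds for a relevant IV
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if ee(lo) * ee(mid) <= 0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

# Hypothetical data satisfying the logistic SMM with psi = 0.7 and a valid IV
# (here P(Y0 = 1 | Z, X) is constant, so delta = 0 is the correct specification).
rng = np.random.default_rng(5)
n = 200_000
z = rng.binomial(1, 0.5, n)
x = rng.binomial(1, 0.3 + 0.4 * z)
y = rng.binomial(1, expit(-1.0 + 0.7 * x))

print(g_estimate_logistic(y, x, z, delta=0.0))   # close to psi = 0.7
```

Unlike the linear case, the root is not available in closed form, which is why the simulation study solves the estimating equations numerically.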

SIMULATION STUDY
In many practical situations, we cannot rule out the presence of unmeasured confounders that affect both X and Y, nor violations of the IV assumptions, as illustrated in Figure 2. In situations with invalid IVs, the G-estimators of the causal parameter will be asymptotically biased, 12 while their correction is not possible since the true δ* is not point-identifiable. Thus, a sensitivity analysis is desirable. To perform the sensitivity analysis, we vary δ over a range of plausible values and map them into a range of corresponding G-estimators of ψ. Since any violation of the valid IV assumptions results in an association between the instrument Z and the potential outcome Y0, we can simulate the appropriate data without explicitly using unmeasured confounders. In other words, instead of using explicit unmeasured confounders U, we start with the structural mean model and the consequence of the violations. Therefore, in the data generating process (DGP), the unmeasured confounders are implicit, whereas the consequence of their omission is explicitly specified. For the linear causal models, we consider a sample size of n = 1000, two values of ψ ∈ {0, 1.5}, two values of δ* ∈ {0, 0.5}, and all their combinations. For the logistic causal models, we consider a sample size of n = 1000, two values of ψ ∈ {0, 0.5}, two values of δ* ∈ {0, 0.5}, and all their combinations. For every model, we solve the estimating equations, as defined in Equation (10), for every δ in {δ* − c(1 − k/10) : k = 0, … , 20}, for δ* ∈ {0, 0.5} and c = 0.2. Namely, we map every δ to a G-estimator ψ̂G(δ) that is obtained by numerical solution of Equation (15). Furthermore, for every G-estimator ψ̂G(δ), we compute a 95%-level asymptotically-correct CI using the sandwich formula as in Equation (11). We repeat each simulation m = 1000 times, and compute the empirical coverage rates and the mean length of the 95%-level CIs for the true causal parameter ψ for every δ in the aforementioned sequence.
The coverage rates are visually illustrated in Figures 3 and 6 for every combination of ψ and δ. The distributions of the 95%-level CIs are illustrated in Figures 5 and 8, and in Tables 1 and 2. Additionally, to illustrate the bias of ψ̂G(δ) as a function of δ, we present in Figures 4 and 7 boxplots of the empirical distribution of ψ̂G(δ) for every δ. Further model-related specifications are presented in the following subsections. A summary of the simulation study and a discussion appear at the end of this section. The simulation source code is available on the authors' GitHub repository. †

Linear causal model
For the linear causal model simulation, we assume a normally distributed outcome Y, a binary exposure X, and a binary instrument Z. We assume that the structural mean model is as specified in Equation (14), and the violation structure is as defined in Equation (8). The DGP is set as follows:

Z ~ Bernoulli(pz),
X | Z ~ Bernoulli(expit(γ0 + γzZ)),
Y | X, Z ~ N(θ0 + θxX + θzZ + θxzXZ, σ²). (20)

We specify the marginal distributions of X and Z; thus, five out of the seven parameters of the DGP can be set freely.
In particular, we set P(Z = 1) = pz = 0.5, and P(X = 1) = px = 0.6. In order to relate the violation structure to the DGP parameters of the observed data, we use the fact that a valid IV satisfies Y0 ⊥ Z | L; in particular, it implies the mean independence E[Y0 | Z = 1, L] = E[Y0 | Z = 0, L]. Therefore, for L = ∅ and an invalid IV with the violation structure b(L, Z; δ*) = δ*Z, we obtain

E[Y0 | Z = 1] − E[Y0 | Z = 0] = δ*. (21)

We use the DGP presented in (20) to express E[Y0 | Z] as a function of the observed data parameters and the causal parameter ψ. Notably, we obtain a function w.r.t. the unknown parameters that characterize the DGP of Y. By plugging this result into Equation (21) for Z = 0 and Z = 1, we obtain the functional relationship between the sensitivity parameter δ* and the DGP parameters of the observed data. Please refer to the Appendix for a detailed derivation. Several of these parameters we set freely, while the others are calculated to satisfy the specified values of the marginal probabilities of Z = 1 and X = 1. These specifications leave three degrees of freedom for the θYᵀ = (θ0, θx, θz, θxz) vector, and one degree of freedom for the γᵀ = (γ0, γz) vector. Without loss of generality, we set θ0 = θx = θz = 1, and solve for θxz. Furthermore, we set γ0 = −1, and solve for γz. Figure 3 illustrates the coverage rates of the 95%-level CI for the true causal parameter ψ as a function of δ. Figure 4 illustrates the bias of the G-estimators as a function of δ. Figure 5 illustrates the distribution of the lengths of the 95%-level CIs. Table 1 presents the mean values of the 95%-level CI lengths as a function of δ.

Logistic causal model
For the logistic causal model simulation, we consider a binary outcome Y, a binary exposure X, and a binary instrument Z. The structural mean model is as specified in Equation (17), and the violation structure is as specified in Equation (18). The DGP is given by

Z ~ Bernoulli(pz),
X | Z ~ Bernoulli(expit(γ0 + γzZ)),
Y | X, Z ~ Bernoulli(expit(θ0 + θxX + θzZ + θxzXZ)), (22)

and we specify the marginal distributions of Y, X, and Z. In order to relate the violation structure to the DGP parameters of the observed data, we use the fact that a valid IV satisfies Y0 ⊥ Z | L; in particular, P(Y0 = 1 | Z = 1) = P(Y0 = 1 | Z = 0). Therefore, for an invalid IV with the violation structure as specified in Equation (18), we obtain

logit(P(Y0 = 1 | Z = 1)) − logit(P(Y0 = 1 | Z = 0)) = δ*. (23)

By using the causal parameter ψ and the DGP presented in (22), we obtain a function w.r.t. the unknown parameters of the observed data:

P(Y0 = 1 | Z = z) = expit(θ0 + θzz)(1 − expit(γ0 + γzz)) + expit(θ0 + θx + θzz + θxzz − ψ) expit(γ0 + γzz).
By plugging this result into Equation (23), we obtain the functional relationship between the sensitivity parameter δ* and the DGP parameters of the observed data. Please refer to the Appendix for a detailed derivation. Several of these parameters we set freely, while the others are calculated to satisfy the specified values of the marginal probabilities of Z = 1, X = 1, and Y = 1.

Simulation summary
In the simulation study, the parameters of the DGP are determined after the specification of the true ψ and δ*. Therefore, we can use δ* in the estimating equations to construct consistent G-estimators of ψ. In real-world applications, δ* is rarely known. Thus, finding a plausible interval for δ can be challenging, since such an interval depends on subject-matter knowledge.
1. The proposed sensitivity analysis method for G-estimators works well for valid and invalid IVs, and for linear and logistic models. Namely, if * = 0, then the G-estimator of is consistent for = 0, and its 95%-level CI attains approximately its nominal coverage rate.
2. The proposed sensitivity analysis method for G-estimators works well when the true causal effect is zero, both for linear and logistic models. Namely, even if the IV is invalid (* = 0.5) and there is no causal effect, that is, = 0, the G-estimator of is consistent for = 0.5, and its 95%-level CI attains approximately its nominal coverage rate.
3. In the linear causal model, the G-estimators are quite stable with narrow CIs. As a consequence, the coverage probability of the CI depends heavily on the assumed value of . In the examined scenarios, a deviation of 0.1 from the true value of * ∈ {0, 0.5} resulted in empirical coverage rates of less than 70% for the nominal 95%-level CIs. On the other hand, for a correctly specified , the G-estimator is stable with reliable CIs that achieve the nominal coverage rate.
4. In the logistic causal model with a logistic outcome model, the variance of the G-estimator depends on the dimension and the stability of the Y estimators. Therefore, compared to the linear causal models, the CIs are significantly wider.
5. In the logistic causal model, the simulation is sensitive to the sample size and the specified values of the marginal probability of the outcome Y. For example, values of p y in the vicinity of 0 might require a very large sample size to obtain a solution for the estimating equations and the covariance matrix. A singular covariance matrix might result from relatively low empirical frequencies of the possible combinations of Z, X, and Y, which are crucial for covariance matrix computations. Linear causal models, due to the relative simplicity of their covariance matrices, do not share this property. Therefore, in linear causal models, G-estimators and their corresponding CIs are less sensitive to the sample size and extreme values.
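Finding 3 above can be reproduced with a minimal pure-Python sketch. We stress that this is our own illustration, not the paper's simulation code: the DGP, the coefficient values, and the moment-based corrected estimator (cov(Y, Z) − rho·var(Z))/cov(X, Z), where rho plays the role of the assumed violation, are all stand-ins for the G-estimator of the linear structural mean model.

```python
import random
import statistics

def cov(a, b):
    """Population covariance of two equal-length sequences."""
    ma, mb = statistics.fmean(a), statistics.fmean(b)
    return statistics.fmean((x - ma) * (y - mb) for x, y in zip(a, b))

random.seed(1)
n, psi, rho_star = 200_000, 1.0, 0.5   # true causal effect and true violation

Z = [float(random.random() < 0.5) for _ in range(n)]         # binary IV
U = [random.gauss(0, 1) for _ in range(n)]                   # unmeasured confounder
X = [0.5 * z + u + random.gauss(0, 1) for z, u in zip(Z, U)]
# Invalid IV: Z affects Y directly with coefficient rho_star.
Y = [psi * x + rho_star * z + u + random.gauss(0, 1) for x, z, u in zip(X, Z, U)]

var_z, cov_xz, cov_yz = cov(Z, Z), cov(X, Z), cov(Y, Z)

def corrected(rho):
    """Moment-based corrected estimator for an assumed violation rho."""
    return (cov_yz - rho * var_z) / cov_xz

print(corrected(rho_star))        # close to psi = 1.0 when rho is correct
print(corrected(rho_star + 0.1))  # a 0.1 misspecification visibly shifts the estimate
```

In this DGP a 0.1 error in the assumed violation shifts the point estimate by 0.1 · var(Z)/cov(X, Z) ≈ 0.2, which is large relative to the narrow CIs of the linear model and explains the collapse of the coverage rate.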

REAL-WORLD DATA EXAMPLE
In order to illustrate our novel method of sensitivity analysis in a real-world scenario, we use the vitamin D data that are available in the ivtools R-package. These publicly available data are a modified version of the original data from a cohort study 33 on the causal effect of vitamin D status on mortality rates that were previously used by Sjölander & Martinussen. 22 Vitamin D deficiency has been linked with several lethal conditions, such as diabetes, cancer, and cardiovascular diseases. However, vitamin D status is also associated with several behavioral and environmental factors, such as season and smoking habits, that may result in biased estimators when standard statistical analyses are used to estimate causal effects. Mendelian randomization 34,35 is a method whose principles were introduced originally by Katan 36 in a strictly medical context. Subsequently, Youngman et al 37 introduced this method in the context of epidemiological studies and also coined the aforementioned term. Mendelian randomization uses genotypes as IVs to estimate the causal effect of a phenotype on disease-related outcomes. The population distribution of genetic variants is assumed to be independent of the behavioral and environmental factors that usually confound the effect of the exposure on the outcome. The process governing the distribution of genetic variants in the population resembles the randomization mechanism in RCTs. Namely, assuming that this process satisfies the assumptions presented in the Introduction section, the genotype distribution constitutes a form of natural experiment that allows for identification of the causal effect of the exposure on the outcome. However, these assumptions can be violated, for example, by possible developmental changes that compensate for genetic variations, and by linkage disequilibrium. Linkage disequilibrium is a term that describes the departure of the genetic variants' distribution from independence of the confounding factors.
Such a departure corresponds to a violation of the exogeneity assumption. Developmental changes (also known as canalization) refer to a situation where the effects of the genotype on the outcome bypass the exposure (phenotype), thus violating the exclusion assumption that the IV affects the outcome only through the exposure. For a detailed discussion of additional possible violations of the IV assumptions in the context of Mendelian randomization, we refer the reader to Lawlor et al. 38 Any combination of these and other possible violations motivates a sensitivity analysis of the causal effect estimator.
In our example, the phenotype is vitamin D status, the disease-related outcome is death during follow-up, and the genotype is mutations in the filaggrin gene. These mutations have been shown to be associated with a higher serum vitamin D concentration. The prevalence of this mutation is estimated to be 8%-10% in the northern European population. Possible developmental changes that compensate for genetic variations, linkage disequilibrium involving the filaggrin genotype, and possible epigenetic effects 35,39 may violate the IV assumptions for the filaggrin genotype. We used the modified version of the data obtained in the Monica10 population-based study. This is a 10-year follow-up study, started in 1982-1984, that initially included an examination of 3,785 individuals of Danish origin. The participants were recruited from the Danish Central Personal Register as a random sample of the population. In the follow-up study of 1993-1994, the participation rate was about 70%, resulting in a dataset with a total of 2,656 participants, of whom 2,571 were also available in the modified data after the removal of cases with missing information on filaggrin and/or on vitamin D status. 29,33 These data consisted of 5 variables: age (at baseline), filaggrin (a binary indicator of whether filaggrin mutations are present), vitd (vitamin D level at baseline, as assessed by the serum 25-hydroxyvitamin D (25-OH-D, nmol/L) concentration), time (follow-up time), and death (an indicator of whether the subject died during follow-up). This analysis is mainly for illustration purposes; therefore, we consider only the variables necessary for the required models. We use the binary indicator of death during follow-up as the point outcome Y, the binary indicator of the presence of the filaggrin gene mutation as the IV Z, and, following Martinussen et al, 29 the scaled version of the vitamin D status at baseline (vitd) as a continuous exposure variable X.
The structural equations are as given in Equation (22), and the structural mean model is as given in Equation (17), where the target parameter is the natural logarithm of the causal OR. The marginal probability p z is estimated with an intercept-only logistic regression model. The relevance assumption, that is, whether the instrument is associated with the exposure, was assessed with a linear model that yielded an F-statistic of 7.349 with 1 and 2569 degrees of freedom. The point estimator of the regression coefficient of the filaggrin gene mutation was 0.273, with a corresponding 95%-level CI of [0.076, 0.471]. This result strengthens the legitimacy of the instrument.
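A first-stage F-statistic like the one reported above can be computed directly: for a single instrument, it is the squared t-statistic of the slope in the linear regression of the exposure on the instrument. The sketch below is a hypothetical illustration on simulated stand-in data, not the Monica10 data; the prevalence and slope values merely echo the numbers quoted in this section.

```python
import random
import statistics

def first_stage_F(x, z):
    """F-statistic for H0: slope = 0 in the linear regression of exposure x on IV z."""
    n = len(x)
    mz, mx = statistics.fmean(z), statistics.fmean(x)
    sxz = sum((zi - mz) * (xi - mx) for zi, xi in zip(z, x))
    szz = sum((zi - mz) ** 2 for zi in z)
    slope = sxz / szz
    intercept = mx - slope * mz
    rss = sum((xi - intercept - slope * zi) ** 2 for zi, xi in zip(z, x))
    slope_var = rss / (n - 2) / szz     # estimated variance of the slope
    return slope ** 2 / slope_var       # F(1, n-2) = t^2

random.seed(2)
n = 2571                                               # same order as the analyzed sample
z = [float(random.random() < 0.09) for _ in range(n)]  # ~9% mutation prevalence
x = [0.27 * zi + random.gauss(0, 1) for zi in z]       # first-stage slope of 0.27

print(f"first-stage F = {first_stage_F(x, z):.3f}")
```

With real data, the resulting F-statistic is conventionally compared against the rule-of-thumb threshold of 10 for weak instruments.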
To the best of the authors' knowledge, the main concern in our setting is a violation of the exogeneity assumption, 40,41 namely, possibly unmeasured confounders associated with both the IV (filaggrin mutation) and death during follow-up. In such a case, the sensitivity parameter represents a possible bias of the estimated causal effect (with respect to the true causal effect) induced by the unmeasured confounders of the outcome and the IV. According to the vast clinical literature on the association between vitamin D deficiency and mortality rates, the true causal effect is expected to be negative. 33,42 The exact mechanism by which vitamin D affects mortality rates is yet unknown; 41 thus, we cannot rule out the possibility of no causal effect. However, since there is no known clinical evidence of positive effects of vitamin D deficiency on mortality rates, in the sensitivity analysis we can discard values of that correspond to positively valued estimators of . We can also discard values of that correspond to unprecedentedly high causal effects with no previous records or plausible clinical explanation. 43 Therefore, we empirically derive the set of plausible values of to be considered in the sensitivity analysis. The G-estimator ̂G( ) of the vitamin D causal effect on death during follow-up, as a function of the sensitivity parameter, is presented in Figure 9 and in the corresponding Table 3.
If the filaggrin gene mutation is a valid IV, that is, assuming * = 0, the G-estimator of equals 1.558, with a corresponding 95%-level CI spanning 0. Additionally, there is no solution to the estimating equations for < −0.17. Such results support the possibility of a bias, namely, an underestimation of the magnitude of the true causal effect of vitamin D deficiency on the mortality rate during follow-up. For all considered values of , the sign of the point G-estimator does not change (see Table 3 for further details). Therefore, we may conclude that if the structural mean models and the violation structure are correctly specified (given that the data constitute a representative sample of the target population and the sampling variability can be neglected), there is no evidence of a positive causal effect of vitamin D deficiency on the mortality rate. 44 If the IV is invalid, the true effect of vitamin D deficiency on the mortality rate is likely to be of a larger magnitude than the estimated value for = 0. Such findings are consistent with the clinical literature on the negative effects of vitamin D deficiency.

TABLE 3: G-estimator of and OR, with the corresponding 95%-level CI limits, of the vitamin D baseline status effect on the death rate during follow-up as a function of .
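The workflow behind Figure 9 and Table 3 — scan a grid of assumed violation values and report the point estimate with a 95%-level CI at each — can be sketched as follows. This is a hedged illustration, not the paper's code: it uses simulated data, a linear moment-based stand-in for the G-estimator, and a percentile bootstrap in place of the sandwich-type CIs.

```python
import random
import statistics

def cov(a, b):
    """Population covariance of two equal-length sequences."""
    ma, mb = statistics.fmean(a), statistics.fmean(b)
    return statistics.fmean((x - ma) * (y - mb) for x, y in zip(a, b))

def corrected(y, x, z, rho):
    """Moment-based corrected estimator for an assumed violation rho."""
    return (cov(y, z) - rho * cov(z, z)) / cov(x, z)

random.seed(3)
n, psi, rho_star = 2_000, -0.5, 0.3            # negative true effect, as expected here
Z = [float(random.random() < 0.5) for _ in range(n)]
U = [random.gauss(0, 1) for _ in range(n)]
X = [0.5 * z + u + random.gauss(0, 1) for z, u in zip(Z, U)]
Y = [psi * x + rho_star * z + u + random.gauss(0, 1) for x, z, u in zip(X, Z, U)]

def bootstrap_ci(y, x, z, rho, B=200, level=0.95):
    """Percentile bootstrap CI for the corrected estimator at a fixed rho."""
    n = len(y)
    ests = []
    for _ in range(B):
        idx = [random.randrange(n) for _ in range(n)]
        ests.append(corrected([y[i] for i in idx],
                              [x[i] for i in idx],
                              [z[i] for i in idx], rho))
    ests.sort()
    return ests[int((1 - level) / 2 * B)], ests[int((1 + level) / 2 * B) - 1]

# Scan a grid of assumed violations, as in a sensitivity analysis table.
for rho in [0.0, 0.1, 0.2, 0.3, 0.4]:
    est = corrected(Y, X, Z, rho)
    lo, hi = bootstrap_ci(Y, X, Z, rho)
    print(f"rho={rho:.1f}  est={est:+.3f}  CI=({lo:+.3f}, {hi:+.3f})")
```

Reading such a table, one would discard grid values whose estimates are clinically implausible and report the range of remaining estimates, exactly as done for Table 3.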

SUMMARY & CONCLUSIONS
In this study, we propose and demonstrate a new sensitivity analysis method for G-estimators in structural mean models. This study introduces two novel aspects of sensitivity analysis. The first is a single sensitivity parameter that captures violations of the exclusion and exogeneity assumptions. Using a single parameter is a valuable advantage of the new method, as fewer parameters need to be specified. This feature may increase model stability and decrease the risk of misspecification. The second is the application of the method to non-linear models. The proposed method is theoretically justified and is illustrated via a simulation study and a real-world example. This study highlights the importance of sensitivity analysis for G-estimators of causal parameters and provides general guidelines for conducting such an analysis. However, this study is not without limitations. The use of one parameter for two distinct violations complicates its interpretation. Therefore, specifying a plausible interval for the sensitivity parameter may be challenging, since it relies heavily on subject-matter knowledge and a solid understanding of causal inference. Another limitation is of a computational nature. The simulation study examines a rather limited set of scenarios. For example, we considered only a binary exposure with a binary instrument in the logistic causal model. Although considering continuous instruments in the logistic causal model introduces computational complications, it should not induce any qualitative difference in the proposed sensitivity analysis. An additional computational limitation is that the estimating equations were solved only for a single instrumental variable. This study's main contribution is providing a theoretical framework for conducting sensitivity analysis for G-estimators.
The presented framework can be readily extended to apply to various structural mean models and computationally involved scenarios, which may serve as directions for future research.

DATA AVAILABILITY STATEMENT
The source code of the simulations that support the findings of this study (Section 3) is openly available in the GitHub repository https://github.com/vancak/G_sensitivity. The data that support the findings of Section 4 are openly available as part of the ivtools R-package in the CRAN repository at https://CRAN.R-project.org/package=ivtools.

ENDNOTES
* For non-linear target parameters, for example, the OR and the hazard ratio (HR), a more careful analysis is required even in RCTs. This is due to the non-collapsibility of these measures. For example, in Cox models, the implicit conditioning of the risk sets on survival up to a certain time point challenges the causal interpretation of the HR estimators. However, since in this study we are not targeting the marginal OR or the HR in survival models, these issues are outside the scope of this paper. For further details, we refer the reader to Aalen et al. (2015). 1
† The simulations, figures, and vitamin D data analysis source code: https://github.com/vancak/G_sensitivity

ORCID
Valentin Vancak https://orcid.org/0000-0001-8732-7353
Arvid Sjölander https://orcid.org/0000-0001-5226-6685

Using our notations, we denote ≡ *. Wang et al. only considered a model with an identity link function. However, a straightforward generalization of their definitions to the logit link function allows expressing both violations on the logit scale. In such a case, the direct causal effect of the IV Z on the outcome Y is defined as and the induced association between the IV Z and the outcome Y by unmeasured confounders is defined as Using the same derivation as in Equation (A2) for the logit link function and our definition of the assumptions' violation, we can express both violations with a single parameter where, in our notations, ≡ *.

Proof of Lemma 1
Proof. When Y is a point outcome and is the identity link function, we have that where the third equality follows from Equations (14) and (8). ▪

Proof of Lemma 2
For a linear causal model as in Equation (14) with L = ∅, and m(L) = 1, the unbiased estimating equation for is

Proof of Lemma 3
The asymptotic bias of the G-estimator for an invalid instrument with b(L, Z; *) = *Z.
Proof. Assume a linear structural mean model as described in (14), and that there are no unmeasured confounders, that is, U = ∅. The G-estimator then equals the TSLS estimator. 22 Namely, the G-estimator converges in probability to Consequently, the asymptotic bias of the estimator is σYZ.L ∕ σXZ.L . ▪
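The content of this lemma can be checked numerically. Under a simple DGP of our own (no unmeasured confounder, a direct Z-to-Y coefficient standing in for the violation *Z), the TSLS estimator, to which the G-estimator reduces here, converges to the true effect plus a bias term that in this DGP works out to (direct effect) × var(Z) ∕ cov(X, Z):

```python
import random
import statistics

def cov(a, b):
    """Population covariance of two equal-length sequences."""
    ma, mb = statistics.fmean(a), statistics.fmean(b)
    return statistics.fmean((x - ma) * (y - mb) for x, y in zip(a, b))

random.seed(4)
n, psi, direct = 500_000, 1.0, 0.5      # true effect; direct Z->Y violation

Z = [float(random.random() < 0.5) for _ in range(n)]
X = [0.5 * z + random.gauss(0, 1) for z in Z]          # no unmeasured confounder
Y = [psi * x + direct * z + random.gauss(0, 1) for x, z in zip(X, Z)]

tsls = cov(Y, Z) / cov(X, Z)                           # = G-estimator in this setting
bias = direct * cov(Z, Z) / cov(X, Z)                  # predicted asymptotic bias

print(tsls, psi + bias)                                # the two nearly coincide
```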

Proof of Lemma 4
In a linear causal model without unmeasured confounders, the corrected G-estimator equals the least squares estimator of .
Proof. In a linear structural mean model as in (14), the ordinary least squares estimator of converges asymptotically to cov(Y , X|L) where G is the asymptotic value of the G-estimator as in Lemma 3. ▪

Proof of Theorem 1
Proof. In order to estimate * and a one-dimensional parameter , a system of two estimating equations is required. We construct another function D 2 = D 2 (L, Z) that is linearly independent of D 1 (L, Z) and satisfies where for a valid IV Z the coefficient matrix is singular. Therefore, one degree of freedom remains, which is the sensitivity parameter . ▪

Simulation study
Linear causal model-Simulation setup
The structural mean model is and the violation is defined as Assume that both X and Z are binary. We use the DGP presented in (20) to express E[Y 0 |Z] as a function of the observed data and the causal parameter . Then we use the DGP presented in (20)

Logistic causal model-Simulation setup
Assume that both X and Z are binary. We use the DGP presented in (22) to express P(Y 0 = 1|Z) as a function of the observed data and the causal parameter . Then we use the DGP presented in (22)
= expit( 0 + z Z)(1 − expit( 0 + z Z)) + expit( 0 + x + z Z + xz Z − )expit( 0 + z Z).
We have nine unknown parameters T 0 = ( 0 , x , z , xz , 0 , z , , , p z ) and the sensitivity parameter . If we specify the causal effect and the marginal probabilities of Z = 1, X = 1, and Y = 1, we are left with four degrees of freedom. Note: The true * = 0, 0.5, respectively. The marginal distribution of the outcome is P(Y = 1) = p y = 0.8, and P(X = 1) = p x = 0.6.
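The sparsity issue described in finding 5 of the simulation results can be diagnosed before solving the estimating equations by tabulating the empirical frequencies of all (Z, X, Y) combinations. The sketch below is a hypothetical illustration with made-up logistic DGP coefficients; any cell count below some threshold (30 here, an arbitrary choice) signals a risk of a singular empirical covariance matrix.

```python
import math
import random
from collections import Counter
from itertools import product

def expit(t):
    return 1.0 / (1.0 + math.exp(-t))

random.seed(5)
n, p_z = 5_000, 0.5
data = []
for _ in range(n):
    z = int(random.random() < p_z)                      # binary IV
    x = int(random.random() < expit(-0.5 + 1.0 * z))    # binary exposure
    y = int(random.random() < expit(-2.0 + 0.7 * x))    # rare-ish binary outcome
    data.append((z, x, y))

counts = Counter(data)
for cell in product((0, 1), repeat=3):
    k = counts.get(cell, 0)
    flag = "  <-- sparse cell" if k < 30 else ""
    print(f"(Z, X, Y) = {cell}: {k}{flag}")
```

Lowering the marginal outcome probability toward 0 (or shrinking n) empties the Y = 1 cells, which is exactly the regime in which the logistic-model covariance matrix becomes unstable.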