Assessable and interpretable sensitivity analysis in the pattern graph framework for nonignorable missingness mechanisms

The pattern graph framework solves a wide range of missing data problems with nonignorable mechanisms. However, it faces two challenges of assessability and interpretability, particularly important in safety-critical problems such as clinical diagnosis: (i) How can one assess the validity of the framework's a priori assumption and make necessary adjustments to accommodate known information about the problem? (ii) How can one interpret the process of exponential tilting used for sensitivity analysis in the pattern graph framework and choose the tilt perturbations based on meaningful real-world quantities? In this paper, we introduce Informed Sensitivity Analysis, an extension of the pattern graph framework that enables us to incorporate substantive knowledge about the missingness mechanism into the pattern graph framework. Our extension allows us to examine the validity of assumptions underlying pattern graphs and interpret sensitivity analysis results in terms of realistic problem characteristics. We apply our method to a prevalent nonignorable missing data scenario in clinical research. We validate our method and compare its results with a number of widely used missing data methods, including Unweighted CCA, KNN Imputer, MICE, and MissForest. The validation is done using both bootstrapped simulation experiments and real-world clinical observations in the MIMIC-III public dataset.

tilting, where an exponential factor captures departures from the aforementioned assumption. Subsequently, sensitivity analysis is done by varying the exponential factor in a chosen range.
The pattern graph framework assumption covers missingness problems with ignorable mechanisms as well as a wide range of ones with nonignorable mechanisms. However, like many of its counterparts, it faces two challenges of assessability and interpretability: (i) How can one assess the validity of its key assumption in a given problem, and how should the framework's result be adjusted if some aspects of the problem are known to violate the assumption? (ii) How can one interpret the process of exponential tilting in sensitivity analysis, that is, how does one unit of tilt correspond to meaningful real-world quantities? Does the tilting perturbation comply with the realistic characteristics of the problem? These challenges become particularly important in safety-critical areas such as clinical diagnosis, where an unnoticed departure from assumptions can lead to catastrophic results, and sensitivity analyses need to be calibrated accordingly.
In this paper, we introduce an extension of the pattern graph framework which enables us to incorporate substantive available knowledge about the missingness mechanism, encoded in m-graph causal models for missing data. Our extension, which we refer to as Informed Sensitivity Analysis (ISA), allows us to examine the validity of the pattern graph's prior assumption and, if necessary, correct for potential biases. It makes a connection between the sensitivity analysis parameters and the parameters of an assumed missingness model. As these parameters often have a clear interpretation in terms of prior knowledge, we achieve interpretability of the results in addition to their assessability. As a prerequisite, we modify the pattern graph assumption to a new form, which we name the edge-wise identifiability assumption.
Our study is inspired by the clinical sequential observations problem, a prevalent and potentially nonignorable missing data scenario in clinical research. Examples can be found in almost all diagnostic routines, where physicians collect additional evidence gradually to conclude a final diagnosis for patients. Owing to its unique nature, this problem admits pattern graph modeling but, at the same time, bears more information that can be leveraged to modify and adjust the framework's solution; this is the objective of ISA. In Sections 1.1 and 4, we further introduce and study the problem in detail.
The main contributions of this paper are as follows:
1. We develop a new and more effective identifiability assumption for pattern graphs. Accordingly, we propose a modification to the pattern graph framework (our ISA method) and demonstrate how both assessability and interpretability are achieved within the ISA extension.
2. By accounting for structured missingness mechanisms, we demonstrate that ISA enables pattern graphs to cover more nonignorable scenarios, ones that otherwise would violate the framework's assumption.
3. We demonstrate how a prevalent clinical scenario that cannot be solved correctly by conventional methods may be formulated and solved by the ISA method.
4. We compare and evaluate our method against widely used missing data methods, using both simulated data and real-world clinical data.
5. As a more theoretical contribution to the body of work on missing data analysis, ISA is the result of the first attempt (to the best of our knowledge) to study the confluence of two missing data models, namely pattern graphs and m-graphs.
The rest of the paper is organized as follows. The motivating medical problem for this paper is introduced in Section 1.1. Section 2 provides a brief overview of related works. We introduce our approach of Informed Sensitivity Analysis in Section 3. The results of simulation and empirical studies are presented in Section 4. Concluding remarks are made in Section 5.

The clinical sequential observations process
Bickley and Szilagyi 2 describe the process of clinical semiology as a sequential step-by-step process of observing signs and symptoms. During this process, the physician first performs primary tests such as blood pressure or heart rate measurements. Based on the findings, more specific secondary tests such as MRI scanning or genetic tests are performed. These tests are secondary due to their potential harm, availability, or cost. The process, which we call clinical sequential observations, is highly selective with regard to a pool of possible medical tests. It often continues for several steps and results in sets of observations that differ in number and type of features. As we show later in the paper, the resulting missingness problem may be nonignorable. A formulation of this scenario, together with the underlying assumptions, is presented in Section 4.
Clinical sequential observations is a unique process from a missing data analysis point of view; it often bears information about both the mechanism and pattern of missingness that can be leveraged for inference. [5][6] Within these standards, the final missingness pattern gives us enough information to recreate (probabilistically) the underlying observation process. Likewise, the process bears information regarding the reasons for acquiring a particular variable. The primary reason for acquisition is physicians' clinical judgment, supported again by diagnostic flowcharts: Based on the normal and abnormal values of primary tests, physicians decide to observe (or skip) later variables. In addition, unplanned reasons such as waiting queues, patients' reluctance to perform specific tests, insurance coverage policies, or missed hospital visits can induce more missingness in the data, all of which are often recorded in electronic health records. Altogether, these are considered elements of the mechanism model of missingness.
As an example of a clinical sequential observation process, Figure 1 presents a subpart of the flowchart for thalassemia carrier identification for differential diagnoses (DDx) of a set of related diagnoses such as -thalassemia (and other thalassemia variants), iron deficiency, and Hbs (associated with sickle cell disease), based on the following lab tests: Mean Corpuscular Volume (MCV), Mean Corpuscular Hemoglobin (MCH), Hb pattern, iron level (IL) and a number of genetic tests. 4 The first three tests are included in the Hematology category of the blood test. Based on the readings, clinicians then become suspicious of iron deficiency as a root cause and, therefore, decide to order Zinc protoporphyrin (ZnPP) and/or IL tests, which are items of the Chemistry category of the blood test. Finally, low values of IL (or high values of ZnPP) convince clinicians to diagnose iron deficiency. In general, hematology category tests are ordered more frequently, while ZnPP and IL tests are more specific and decisive, and are considered the main diagnostic tests for iron deficiency. In conclusion, one can consider the IL test as the secondary test, and the set of MCV/MCH/Hemoglobin A2 (HbA2) items as the primary tests.
This paper discusses why the pattern graph framework suits the clinical sequential observation process, why it is possible and necessary to adjust the framework, and how our proposed ISA method makes such adjustments. In Section 4, this flowchart will serve as a case study for ISA.

FIGURE 1 The iron deficiency branch in the thalassemia carrier DDx flowchart. 4

Missingness problem formulation
Consider an $n \times d$ dataset comprised of $n$ i.i.d. realizations of a $d$-dimensional random vector $L \in \mathbb{R}^d$ whose $i$-th component we denote by $L_i$. Define the missingness indicator $R \in \{0, 1\}^d$ as a binary random vector such that $L_i$ is missing if $R_i = 0$. 7 A realization of the missingness indicator, denoted by $r$, is called a missingness pattern, and the conditional probabilities of the patterns $p(r \mid l) \equiv p(R = r \mid L = l)$ describe the missingness mechanism. Given two patterns $r$ and $s$, we write $s > r$ if the observed components under $r$ are also observed under $s$, while there exists at least one component which is observed under $s$ but missing under $r$. We denote the difference between two patterns by $s - r$. Finally, we use the notation $|r| = \sum_j r_j$ to denote the number of observed components in pattern $r$.
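The pattern notation above can be made concrete with a minimal sketch. It assumes that patterns are stored as tuples of 0/1 values; the function names `num_observed`, `is_greater`, and `difference` are hypothetical and are introduced only for illustration.

```python
from typing import Tuple

# A missingness pattern r: 1 = component observed, 0 = component missing.
Pattern = Tuple[int, ...]


def num_observed(r: Pattern) -> int:
    """|r|: the number of observed components in pattern r."""
    return sum(r)


def is_greater(s: Pattern, r: Pattern) -> bool:
    """s > r: everything observed under r is observed under s,
    and at least one component is observed under s but missing under r."""
    return all(sj >= rj for sj, rj in zip(s, r)) and s != r


def difference(s: Pattern, r: Pattern) -> Pattern:
    """s - r: marks the components observed under s but missing under r
    (meaningful when s > r)."""
    return tuple(sj - rj for sj, rj in zip(s, r))


s, r = (1, 1, 1, 0), (1, 0, 0, 0)
print(is_greater(s, r), difference(s, r), num_observed(s))
```

Note that `>` is only a partial order: two patterns such as 1010 and 0100 are not comparable in either direction.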

Missingness mechanism assumptions
Missing data problems with nonignorable missingness mechanisms cannot be solved using methods that effectively ignore the mechanism. Therefore, further modeling assumptions are needed to proceed. 7 Examples of "nonignorable models" can be found in studies by Robins et al, 8 Robins, 9 Zhou et al 10 and Mohan et al. 11 In these works, assumptions are formulated as conditional independence statements about the variables and missingness indicators. Mohan et al, 11 Mohan and Pearl 12 and Shpitser et al 13 suggest that one can consider these assumptions as a factorization of a target distribution with respect to a causal directed acyclic graph (DAG). This way, identifiability in missing data problems is tackled as a causal identification problem. The associated causal DAG for the missing data problem is called an m-graph. Nabi et al 14 further study the completeness of the identification methods for m-graphs.
In general, m-graphs model the mechanism of missing data and view the missingness patterns only as realizations of the model; in other words, under m-graphs, a missingness pattern (eg, $R = 1101$) is viewed as a realized set of missingness indicators, and is studied by factorization of the distribution of $(R_1, \ldots, R_i)$ with respect to the m-graph. Under this view, the order of the acquisitions is ignored. In Section 3, we study in detail the m-graph assumptions and properties that are relevant to the current paper.

Pattern graphs
Chen 1 introduces the pattern graph framework for missing data problems. A pattern graph is a DAG whose nodes are the missingness patterns seen in the data, and whose directed edges represent "possible hidden scenarios that generate a response pattern". 1 Unlike m-graphs, pattern graphs directly model the emerging missingness patterns via the transition probabilities from one pattern to another. These probabilities are assigned to edges. Rather than adopting the m-graph's set-view on patterns, pattern graphs leverage the prior knowledge about the chronology of observations of the data components, for example, if a component has been missed/observed systematically after missing/observing another component.
Pattern graphs allow for an imputation algorithm, applied row-wise on incomplete data to impute the missing values, through the following steps (the related algorithms are presented in Appendix H).
(Step 1): A pattern graph is generated for a given dataset using the algorithm from Chen. 1
(Step 2): The following identifiability assumption is made: The conditional extrapolation density given the observed data in a missingness pattern is identical to that in the parents of the pattern, that is,
$$p(l_{\bar r} \mid l_r, R = r) = p(l_{\bar r} \mid l_r, R \in \mathrm{pa}_{PG}(r)), \qquad (1)$$
where $\mathrm{pa}_{PG}(r)$ denotes the set of parents of the pattern $r$ in the pattern graph $PG$, while $l_{\bar r}$ and $l_r$ denote the unobserved and observed components of the realization $l$ of the complete random vector $L$ under the pattern $r$, respectively.
(Step 3): Let $r$ be the pattern of the to-be-imputed row. First, we select a parent $s \in \mathrm{pa}_{PG}(r)$ based on the occurrence frequency of the parents in the data. Then we impute the components that are different between $s$ and $r$, that is, we impute the values of $L_{s-r}$, by sampling from the distribution $p(l_{s-r} \mid l_r, R = s)$ (Algorithm 1). We repeat this step in the same row for the new pattern $s$, and recursively thereafter across the parent patterns until no missing value is left (Algorithm 2).
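The parent-selection part of Step 3 can be sketched as follows. This is a minimal illustration and not the algorithm from Appendix H: it assumes the pattern graph is given as a dictionary `parents` mapping each pattern to its parent patterns, and that `counts` holds the occurrence frequency of each pattern in the data. Both names, and the function `sample_imputation_path`, are hypothetical. The sampling of the imputed values themselves, from $p(l_{s-r} \mid l_r, R = s)$, requires a fitted conditional model and is deliberately left out.

```python
import random
from typing import Dict, List, Tuple

Pattern = Tuple[int, ...]


def sample_imputation_path(
    r: Pattern,
    parents: Dict[Pattern, List[Pattern]],
    counts: Dict[Pattern, int],
    rng: random.Random,
) -> List[Pattern]:
    """Walk from pattern r up to the fully observed pattern, choosing a
    parent at each step with probability proportional to its frequency."""
    path = [r]
    while not all(path[-1]):  # stop once every component is observed
        candidates = parents[path[-1]]
        weights = [counts[s] for s in candidates]
        path.append(rng.choices(candidates, weights=weights, k=1)[0])
    return path


# Hypothetical three-variable pattern graph and pattern frequencies.
parents = {
    (1, 0, 0): [(1, 1, 0), (1, 0, 1)],
    (1, 1, 0): [(1, 1, 1)],
    (1, 0, 1): [(1, 1, 1)],
}
counts = {(1, 1, 0): 30, (1, 0, 1): 10, (1, 1, 1): 60}
print(sample_imputation_path((1, 0, 0), parents, counts, random.Random(0)))
```

At each node of the returned path, the components to impute are those that differ between the node and the next node on the path.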

Exponential tilting sensitivity analysis
Kim and Yu 15 consider unidentifiable missingness mechanisms as departures from an identified mechanism. They suggest modeling this departure based on the concept of exponential tilting introduced by Scharfstein et al. 16 Specifically, they investigate the unidentifiable self-censoring non-response missingness, where in a data model $(L_1, L_2)$, variable $L_1$ is fully observed and the response variable $L_2$ is subject to missingness. Based on their formulation, the observed conditional distribution $p(l_2 \mid l_1, R_2 = 1)$ deviates from the unobserved version $p(l_2 \mid l_1, R_2 = 0)$ by the factor given in (2). In the case of logistic modeling of the missingness indicator, (2) reduces to the form in (3), in which an exponential coefficient parameterizes the effect of self-censoring and $C(L_1)$ is independent of $L_2$. The expression in (3) is interpreted as an exponential tilt of the extrapolation density $p(l_2 \mid l_1, R_2 = 0)$ from the observed distribution $p(l_2 \mid l_1, R_2 = 1)$. Franks et al 17 demonstrate that such logistic modeling results in an interpretable connection between the substantive knowledge about the missingness mechanism and estimation of a parameter of interest. Carpenter et al 18 introduce a re-weighted multiple-imputation (MI) method to solve a bivariate self-censoring missingness problem. In their work, they use the exponential tilt as the importance sampling ratio in the sampling step of MI. The authors suggest that domain experts can verify the values of the tilting parameters.
Chen 1 implements the concept of exponential tilting as described in (3) and develops a sensitivity analysis scheme for pattern graphs. In this scheme, an exponential factor models departures from the assumption made in Equation (1), where a tilting parameter vector of size $|\bar r|$ for each pattern $r$ controls the amount of deviation. It is used to correct the imputation sampling procedure via rejection sampling, with a probability that depends on a candidate sampled imputation $l^{\dagger}_{\bar r}$, an upper bound on $L_{\bar r}$, and the exponential coefficient that parameterizes the tilt. The rejection step is applied at the end of Algorithm 2 to correct the sampling distribution. A range of variation of $[-1, +1]^{|\bar r|}$ for all tilting parameters is suggested by Chen. 1 The modified algorithm for sensitivity analysis, which replaces Algorithm 2, is presented in Algorithm 3. In this paper, we call Algorithm 3 the default sensitivity analysis approach.
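To illustrate how a rejection step can turn draws from one density into draws from its exponentially tilted version, the following is a generic sketch for a scalar variable. It is not claimed to reproduce Algorithm 3: the function name `tilt_by_rejection`, the scalar parameter `tilt`, and the explicit `lower`/`upper` bounds are assumptions made for illustration. A candidate $x$ is accepted with probability $\exp(\text{tilt} \cdot (x - \text{bound}))$, which never exceeds one when the bound is the upper bound for a positive tilt and the lower bound for a negative tilt; the accepted draws then follow a density proportional to $p(x)\exp(\text{tilt} \cdot x)$.

```python
import math
import random
from typing import Callable, List


def tilt_by_rejection(
    draw: Callable[[random.Random], float],
    tilt: float,
    lower: float,
    upper: float,
    n: int,
    rng: random.Random,
) -> List[float]:
    """Return n draws whose density is proportional to p(x) * exp(tilt * x),
    given a sampler `draw` for p with support inside [lower, upper]."""
    # Choose the bound that keeps the acceptance probability at most 1.
    bound = upper if tilt >= 0 else lower
    accepted: List[float] = []
    while len(accepted) < n:
        x = draw(rng)
        if rng.random() < math.exp(tilt * (x - bound)):
            accepted.append(x)
    return accepted


rng = random.Random(0)
samples = tilt_by_rejection(lambda g: g.random(), 1.0, 0.0, 1.0, 20000, rng)
# For a Uniform(0, 1) base density and tilt = 1, the tilted mean is 1 / (e - 1).
print(sum(samples) / len(samples), 1.0 / (math.e - 1.0))
```

A tilt of zero accepts every candidate and leaves the base density unchanged, which corresponds to no departure from the identifying assumption.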

METHODS
The validity of the pattern graph's results depends on the validity of its assumption. Also, the interpretability and practicality of the framework's sensitivity analysis rely on the possibility of providing a real-world interpretation of the tilting parameters. We claim that pattern graphs address both if we incorporate available prior (causal) knowledge about the missing data problem into the framework. Hence, we introduce the Informed Sensitivity Analysis (ISA) method to achieve this goal. In ISA, we utilize the causal information of m-graphs and missingness models to identify possible structural departures from the pattern graph assumption. The ISA establishes a connection between the tilting parameters and the parameters of an assumed missingness model (e.g., logistic model). Finally, by making informed guesses about the parameters of the model, we perform sensitivity analysis where perturbations are translated into meaningful real-world quantities. ISA utilizes a new modified identifiability assumption, which we call the edge-wise identifiability assumption. In this paper, we refer to the assumption introduced by Chen 1 as the original assumption. The relations among pattern graphs, m-graphs and ISA are depicted in Figure 2.

FIGURE 2 The relation between the pattern graph framework, m-graphs, missingness mechanism models, and the ISA method: Applying the causal information encoded in the m-graphs and the edge-wise assumption ratios results in local sensitivity models. The tilting parameters of these models will be connected to explainable missingness model parameters to allow for an informed sensitivity analysis. ew, edge-wise; id, identifiability; indep, independence; params, parameters.
In this section, we start by introducing the edge-wise assumption and show how it is an effective assumption for pattern graphs.We then introduce ISA's modified imputation and sensitivity analysis algorithms and discuss how assessability and interpretability are achieved.

Notations
Let $G = (R, E)$ represent the pattern graph for a missing data problem. For a given missingness pattern ($r \in R$, $r \neq 1_d$), the pattern graph imputation Algorithm 1 selects an imputation path $\pi \in \Pi_G$ by sampling iteratively on different depths from the ancestors of $r$. We denote the nodes lying on the imputation path by $r^{(k)}$, where the first node $r^{(0)}$ is the current pattern, $r^{(1)}$ is the selected parent of the pattern, and the last node $r^{(K)}$ is always the fully observed pattern, that is, $r^{(K)} = 1_d$. We denote the edge from $r^{(k)}$ to $r^{(k-1)}$ by $e^{(k)} \in E$. We index the components that are imputed while moving across the edge $e^{(k)}$, namely the components in $L_{r^{(k)} - r^{(k-1)}}$, by the subscript $\Delta_k$. Consequently, all components in $R_{\Delta_k}$ are 0 for the child pattern and 1 for the parent pattern, that is, $R^{(k-1)}_{\Delta_k} = 0$ and $R^{(k)}_{\Delta_k} = 1$. For ease of notation, we write 0 and 1 to refer to these two values. Finally, the subscript $-\Delta_k$ selects the remaining components other than $\Delta_k$. For every edge $e^{(k)}$, the missingness vector $R_{-\Delta_k}$ takes the same value in the child and parent patterns, that is, $r^{(k-1)}_{-\Delta_k} = r^{(k)}_{-\Delta_k}$.
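As a small illustration of this notation, the index sets $\Delta_k$ can be read off an imputation path directly. The sketch below assumes a path is given as a list of 0/1 tuples ordered from the current pattern $r^{(0)}$ to the fully observed pattern $r^{(K)}$; the function name `edge_increments` is hypothetical, and component indices are 0-based.

```python
from typing import List, Sequence, Tuple

Pattern = Tuple[int, ...]


def edge_increments(path: Sequence[Pattern]) -> List[List[int]]:
    """For a path [r(0), r(1), ..., r(K)], return [Delta_1, ..., Delta_K],
    where Delta_k lists the components observed in r(k) but missing in r(k-1)."""
    deltas: List[List[int]] = []
    for child, parent in zip(path, path[1:]):
        monotone = all(p >= c for p, c in zip(parent, child))
        if not monotone or parent == child:
            raise ValueError("each parent must strictly extend its child")
        deltas.append([i for i, (c, p) in enumerate(zip(child, parent)) if p > c])
    return deltas


# Imputation order for the path 1111 -> 1110 -> 1000 mentioned in the text.
path = [(1, 0, 0, 0), (1, 1, 1, 0), (1, 1, 1, 1)]
print(edge_increments(path))
```

On every edge the child and parent agree on all components outside $\Delta_k$, which is why the check above only needs to compare the two patterns componentwise.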

Edge-wise identifiability assumption and sensitivity analysis
M-graph models describe the causal relationships among individual components. In addition, the imputation Algorithm 2 recursively imputes the components in $L_{\bar r}$ along an imputation path. Therefore, in order to leverage the structure of the m-graph model in the imputation algorithm, we propose breaking the identifiability assumption into edge-wise statements about components that are imputed on each edge of the pattern graph. This way, it becomes more amenable to simplification on the basis of conditional independence relations holding in the m-graph model. Specifically, we consider the edge-wise identifiability assumption, which requires that the equality in Equation (7) holds for all edges $e^{(k)}$ lying on a selected imputation path $\pi \in \Pi_G$, and for every imputation path. We emphasize that Equation (7) is stated for an edge, conditioning on the imputation path being selected. This means that for an individual edge in a pattern graph, the associated edge-wise assumption differs when the edge belongs to different imputation paths. An example of such an edge is 1111 → 1110 in Figure 3 (right), which belongs to the two paths 1111 → 1110 → 1000 and 1111 → 1110 → 0000. Given that the subsequent discussions in this paper primarily focus on conditioning on an imputation path being selected, we will omit the selected path index ($\pi$) in the equations for the sake of brevity. When analyzing multiple imputation paths simultaneously, we will employ appropriate clarifying path indices for precision.
While the original pattern graph assumption is made about the entire set of unobserved components of a sample, Equation (7) makes assumptions separately about components that are imputed across a single edge. In other words, the former assumes that the imputed data in one complete round of Algorithm 2 ($L_{\bar r}$) is sampled from the unbiased extrapolation density, while the latter breaks this assumption down to each turn of partial imputation ($L_{\Delta_k}$). Nevertheless, the edge-wise assumption is as strong for the pattern graph as the original assumption, as each ratio in Equation (7) appears in the original assumption, although corresponding to a different pattern. In other words, the edge-wise assumption is a reordered and regrouped version of the original assumption (with the implicit consideration that it must structurally hold, as the equality must be true for two different orderings and groupings of a set of terms). However, as we will show later in Subsection 3.3, the edge-wise assumption proves beneficial in constructing the ISA method. In Subsection 3.4, we will discuss the relation between the two assumptions as well as the necessity of the edge-wise assumption in detail. The validity of the edge-wise assumption is discussed in Theorem 1.

Theorem 1 (Identifiability under edge-wise assumption). Let G = (R, E) be a pattern graph for a missing data problem modeled by (L, R), and the imputation Algorithm 1 imputes the partially observed data. Under the edge-wise assumption, sampling extrapolation densities in Algorithm
The ratio terms in Equation (7) are assumptions about the extrapolation densities. We can replace each ratio with an equivalent assumption about the missingness mechanism.

Corollary 1. Let G = (R, E) be a pattern graph for a missing data problem modeled by (L, R). For every path and edge selection (e ∈ 𝜋, 𝜋 ∈ Π G ), the edge-wise assumption holds iff
for any $H, J$ s.t. $H \cup J = \{1, \ldots, k\}$ and $H \cap J = \emptyset$, where $C$ is an arbitrary constant independent of $L_{\Delta_k}$.
For simplicity, we call the ratio terms in the first and second product terms in Equation (8) the L-sorted and R-sorted ratios, respectively. Corollary 1 states that we can replace any chosen subset of the ratios by its R-sorted counterpart. Note that the validity of Equation (8) still depends on the imputed components across a single edge, that is, $L_{\Delta_k}$. Since Equation (7) is a special case of Equation (8) with $J = \emptyset$, for the rest of the paper we refer to Equation (8) as the edge-wise assumption.
Associated with the edge-wise assumption, we now define the edge-wise sensitivity analysis by perturbing Equation (8) using the exponential tilting concept, as given in Equation (9), where the tilting parameter for the edge $e^{(k)}$ on a selected imputation path has size $|\Delta_k|$. Three illustrative examples for Theorem 1, Corollary 1 and Equation (9) are presented in Appendix B.
As Equation (8) makes edge-wise assumptions, we perform the rejection sampling for each partial-imputation step. Algorithm 4 presents the modified edge-wise sensitivity analysis algorithm. The associated rejection probability for Algorithm 4 is obtained as

Next, we discuss how the edge-wise perspective of the new sensitivity analysis helps us formalize a way to incorporate substantive knowledge in the framework.

Informing the edge-wise sensitivity analysis: ISA
As shown above, the edge-wise assumption in Equation (8) comprises (R-sorted) terms which relate to the missingness mechanism under study. In the absence of further information, Equation (8) must be assumed for inference. However, if mechanism information is available for a problem, some R-sorted terms can be evaluated. Consequently, the evaluated terms give us evidence for an informed choice of the tilting parameter in Equation (9). In the extreme case, the terms cancel out and hence do not appear in the sensitivity analysis step. Such a practice is carried out in our proposed method of informed sensitivity analysis (ISA). Upon such incorporation of missingness mechanism information, the global sensitivity model of the pattern graph (what we named the default sensitivity analysis) converts to a set of local sensitivity models, each of which is utilized for imputation of a single component.
The ISA method is built on the following setup: (i) Assume that besides the knowledge about missingness patterns, there exists prior knowledge about causes and the corresponding causal effects of the missingness indicators. This knowledge is encoded via m-graphs and the parameters of the missingness model (e.g., logistic model). (ii) In ISA, we simplify the left-hand side of Equation (9) using the conditional independence statements of the m-graph, and then (iii) bring the remaining terms to an exponential form which consists of the parameters of the missingness model. This way, we translate prior knowledge directly to the range of variation for the tilting parameters in the sensitivity analysis step. In summary, the ISA method proceeds in the following steps: (Step 1) incorporating m-graph information; (Step 2) decomposing the global sensitivity model into local sensitivity models; (Step 3) making an informed choice of local tilting parameters. The structure of ISA is illustrated in Figure 2. We continue through the rest of the section by discussing details of these steps.

3.3.1 Step 1: Incorporating m-graph information
As introduced in Subsection 2.2, m-graphs are causal DAGs for modeling the missing data problem. For a data distribution $(L, R)$, an m-graph is defined over a set of nodes $V$. The edges received by $R$ nodes (missingness indicators) determine the missingness mechanism in a problem. As an example, an m-graph with $R$s without parents (within $V$) represents the ignorable Missing-Completely-At-Random (MCAR) mechanism. In addition to the general causal DAG assumptions, m-graphs often follow the no-direct-effect (NDE) assumption, which states that testing/measurement has no direct and unmediated effect on the study variables. 19 This implies "no $R \rightarrow L$ edge" in the m-graphs. Through the rest of the paper, we make the NDE assumption for all inferences. In m-graphs, conditional independence relations can be discovered via the rules of d-separation. Different m-graphs encompass different conditional independence properties. In ISA, we seek such properties related to $R$ nodes to reduce their corresponding R-sorted ratios in Equation (9) up to a constant (independent of $L_{\Delta_k}$) or into conditional odds of individual $R$ variables, as these could be expressed by meaningful and interpretable parameters. According to Corollary 1, we freely split the $k$ index set for $e^{(k)}$ into the $H$ and $J$ subsets, that is, rewrite the necessary ratio terms in their R-sorted form, such that applying the independence statements becomes possible.
As the m-graph and pattern graph frameworks model two distinct types of information, scenarios with distinct m-graphs may share a single pattern graph. This means that the extracted conditional independence statements, and hence the simplification of ratio terms in the pattern graph assumptions, may differ case by case. Below, we discuss a sufficient condition for m-graphs which leads to simplification of R-sorted ratios. In the absence of this condition, we invoke a second sufficient condition which leads to a useful factorization for the ratios, such that incorporation of available knowledge becomes possible.
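Two of the structural properties mentioned above, the NDE assumption and the parentless-indicator characterization of MCAR, can be checked mechanically on an edge list. The following is a minimal sketch under an assumed encoding: nodes are strings, study variables are named `L1`, `L2`, ... and missingness indicators `R1`, `R2`, ..., and an edge is a `(tail, head)` pair. This naming convention and the function names are hypothetical, and the sketch does not implement d-separation.

```python
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str]  # (tail, head), i.e. tail -> head


def nde_violations(edges: List[Edge]) -> List[Edge]:
    """Edges of the form R -> L, which the NDE assumption rules out."""
    return [(a, b) for a, b in edges if a.startswith("R") and b.startswith("L")]


def indicator_parents(edges: List[Edge], indicators: List[str]) -> Dict[str, Set[str]]:
    """Parents of each missingness indicator in the m-graph."""
    parents: Dict[str, Set[str]] = {r: set() for r in indicators}
    for a, b in edges:
        if b in parents:
            parents[b].add(a)
    return parents


def looks_mcar(edges: List[Edge], indicators: List[str]) -> bool:
    """True when no missingness indicator has a parent."""
    return all(not p for p in indicators_parents_values(edges, indicators))


def indicators_parents_values(edges: List[Edge], indicators: List[str]) -> List[Set[str]]:
    """Helper: the parent sets of the indicators, in the given order."""
    table = indicator_parents(edges, indicators)
    return [table[r] for r in indicators]


edges = [("L1", "L2"), ("L1", "R2")]
print(nde_violations(edges), indicator_parents(edges, ["R1", "R2"]), looks_mcar(edges, ["R1", "R2"]))
```

In this toy m-graph the indicator `R2` has the fully observed variable `L1` as a parent, so the mechanism is not MCAR, while the NDE assumption is satisfied.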

Condition 1.
The following conditional independences hold:
$$R_{\Delta_j} \perp L_{\Delta_k} \mid \{L_{r^{(k-1)}}, R_{-\Delta_j}\}.$$
In other words, Condition 1 requires that $\{L_{r^{(k-1)}}, R_{-\Delta_j}\}$ is a d-separation set for $R_{\Delta_j}$ and $L_{\Delta_k}$ in the associated m-graph. Given Condition 1, the R-sorted ratio of the $\Delta_j$ index reduces to a constant factor in the edge-wise sensitivity analysis (9) (proof in Appendix A.3).
An m-graph which follows three restrictions satisfies Condition 1: (i) no path $L_{\Delta_k} \rightarrow R_{\Delta_j}$ which is not mediated by the d-separation set (e.g., a direct edge) should exist; (ii) no confounder path between $L_{\Delta_k}$ and $R_{\Delta_j}$ through $L_{1_d - r^{(k)}}$ which is not mediated by the d-separation set should exist; (iii) no node in the d-separation set should be a collider for $L_{\Delta_k}$ and $R_{\Delta_j}$. Recall that the subscript $1_d - r^{(k)}$ selects all variables that remain missing after the partial imputation along the edge $e^{(k)}$.

FIGURE 4 Red edges highlight the violating paths: For partial imputation across $e^{(1)}$, in (b) the direct edge $L_3 \rightarrow R_3$ violates restriction (i), and in (c) the confounder $\{L_1\}$ belongs to $L_{1_d - r^{(1)}}$, hence violates restriction (ii). For partial imputation across $e^{(2)}$ and $j = 2$, in (d) the node $R_2$ is a collider for $L_1$ and $R_1$, hence violates restriction (iii), whereas for $j = 1$, the set $\{L_2, R_2\}$ d-separates $L_1$ and $R_3$. In all three examples, Condition 1 holds if the red "spurious" paths are interrupted.

In the absence of Condition 1, we can alternatively invoke the following condition.

Condition 2.
$R_{\Delta_j}$ consists of only one component ($|\Delta_j| = 1$), or the conditional independences in (11b) hold, where the index $m$ enumerates the individual components of $R_{\Delta_j}$ in an arbitrary order.
To explain Condition 2 further, we define the conditional odds as follows.
Definition 1 (Conditional odds). Given a set of variables $Q$ and values $q$, the conditional odds for $R$ given $Q = q$ is defined as .
We will show in the next step of ISA that the conditional odds for individual $R$ components (i.e., $|R| = 1$ in Definition 1) may be conveniently connected to explainable and meaningful parameters in the sensitivity analysis. Thus, it is desirable if the R-sorted ratios which have escaped Condition 1 can be factorized into conditional odds for individual $R$ components. Condition 2 describes sufficient conditions for such a scenario: The first part of Condition 2 ($|\Delta_j| = 1$) trivially gives the desired result. To explain the second part (when $|\Delta_j| > 1$), we factorize the R-sorted ratio following the chain rule as in (11c), with a slight abuse of notation for the case $m = 1$, which does not have the term for the preceding components $1{:}m-1$ in the conditioning set. Without further assumptions, the crude factorization in (11c) cannot be simplified further to individual conditional odds terms. An effective assumption for simplification is stated by the second part of Condition 2 and the conditional independences (11b), namely, that the missingness indicators in the $R_{\Delta_j}$ block are independent given $\{l_{r^{(k)}}, r_{-\Delta_j}\}$. This lets us drop the indicators of the preceding components $1{:}m-1$ from the conditioning set of the right-hand side of (11c) and obtain (11d), for which the discussed case of $|\Delta_j| = 1$ is a special case, where only one conditional odds term exists.

FIGURE 5 For partial imputation across $e^{(1)}$, we have $R_{\Delta_1} = \{R_2, R_3\}$ (dashed bounding box). In (b) the direct edge $R_2 \rightarrow R_3$ violates restriction (i); in (c) the confounder $\{L_1\}$ belongs to $L_{1_d - r^{(1)}}$, hence violates restriction (ii); in (d) the node $R_1$ is a collider for $R_2$ and $R_3$, hence violates restriction (iii). In all three examples, Condition 2 holds if the red spurious paths are interrupted.

Step 2 of ISA will demonstrate how the conditional odds terms in (11d) are employed further in the ISA procedure. An m-graph that follows three restrictions satisfies the second part of Condition 2, namely the conditional independences in (11b) (the proofs follow an identical path to those of Condition 1, presented in Appendix A.3). Figure 5 depicts an imputation path and three example m-graph structures, demonstrating when these restrictions are met or violated. It is worth noting that these simplifying assumptions concern the edges of an imputation path. It is possible, and in fact highly likely, to have a mixture of simplifying cases applied to different edges and imputation paths.
To conclude Step 1 of ISA, we note an important limitation: in the absence of Conditions 1 and 2, ISA does not provide a viable alternative to the default sensitivity analysis within the scope of this paper. In Section 5, we express our anticipation of the discovery of additional conditions in future research.

3.3.2 Step 2: Decomposing global sensitivity model into local sensitivity models
In this step, we consider a working model according to our prior knowledge of the missing data problem, to bring the remaining ratios as in (11d) to an exponential form. This way, the parameters of the working model have a direct connection to the tilting parameters on the right-hand side of (9). In other words, we use the substantive prior knowledge embedded in the working model in the sensitivity analysis step.
Suppose the logistic model (12) for a missingness indicator R i , where  jk is the logistic coefficient for the Δ k (to-be-imputed) components in the R i model, and f (.) is an arbitrary model over the remaining factors. In (12), the exponential terms are grouped into factors that depend on L Δ k and factors that do not. We treat the second group of terms as a constant, and rewrite Equation (12) as (13), where C jk () are the obtained constants for the j-th odds in J for e (k) , conditioned on the path  being selected. The resulting exponential form in (13) matches the assumed exponential tilt in Equation (9). This means that the coefficients  in the assumed logistic model for each R-sorted ratio are connected to the tilting parameters .
After the first step of ISA, for all the irreducible R-sorted ratios which are expressed in the form of (11d), and given the exponential model in (13), we rewrite the terms as (14), where C = Π i C ik (), while i enumerates over all the conditional odds terms. By accepting a slight abuse of notation, we drop the k subscript and  argument from C to highlight the connection of Equation (14) to (9). Comparing the right-hand sides of Equations (14) and (9) gives Equation (15), which connects the parameters of an assumed working model for missingness indicators to the tilting parameters in the pattern graphs' sensitivity models. By associating the logistic coefficients with substantive and interpretable prior knowledge, we can leverage prior knowledge in the sensitivity analysis step. This will be discussed in the next ISA step.

3.3.3 Step 3: Informed choice of local tilting parameters
Let R u be the missingness indicator of a single component L u , with R u ∈ R Δ j . Suppose the component L v ∈ L Δ k is one of the components to impute across the edge e (k) . Assuming a logistic model for R u , we have a logistic coefficient  uv that appears in the exponential tilting sensitivity analysis for imputing L Δ k , that is, in (14). Let l v and l ′ v be two arbitrary values for L v . We evaluate the conditional odds of R u at these two values using (13), where A is a term including the covariates excluding L j . Solving for  uv , we obtain (16), which involves the odds ratio of R u between two levels of L v . We can translate a piece of prior knowledge to an informed guess for  uv if it resembles this odds ratio. Two possible cases from clinical studies are presented below:
• If L v is a binary variable representing the patients' health status (1=healthy, 0=sick), and we know that "the odds of a missing observation for healthy patients are approximately  times greater than the respective odds for sick patients", a choice of  ≈ log  would reflect the prior knowledge/assumption.
• Concerning the test protocols and diagnostic flowcharts, likewise, we might know from the clinical protocols that the odds of ordering test u after the occurrence of a symptom v is approximately  times greater than if symptom v is absent, which yields  ≈ − log .
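The translation from a stated odds ratio to a logistic coefficient can be sketched as follows. This is a minimal illustration, assuming a working model that is linear in L v on the logit scale, so that the odds ratio between two levels equals the exponential of the coefficient times the difference of the levels. The function name, its arguments, and the example odds ratios are hypothetical. The sign of the result depends on how R u is coded and on which level is taken as the reference, so the value may need to be negated to match the convention in use.

```python
import math


def coefficient_from_odds_ratio(odds_ratio, level_a=1.0, level_b=0.0):
    """Logit-scale coefficient implied by an odds ratio between two levels.

    Hypothetical helper. Assumes logit P(R = 1 | L = l) = a + coef * l, hence
    odds(level_a) / odds(level_b) = exp(coef * (level_a - level_b)).
    """
    if odds_ratio <= 0:
        raise ValueError("odds_ratio must be positive")
    if level_a == level_b:
        raise ValueError("the two levels must differ")
    return math.log(odds_ratio) / (level_a - level_b)


# Illustrative odds ratios (assumed values, not estimates from data).
print(round(coefficient_from_odds_ratio(2.5), 3))
print(round(coefficient_from_odds_ratio(1.35), 3))
# Swapping the reference level flips the sign of the coefficient.
print(round(coefficient_from_odds_ratio(2.5, level_a=0.0, level_b=1.0), 3))
```

For a binary covariate coded 0/1 the coefficient is simply the log odds ratio; for a continuous covariate the two levels must be chosen explicitly, which is itself a modeling decision.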
The  values associated with the odds ratio in Equation (16) give initial guesses for  and subsequently  parameters. We denote this initial guess for both  and  by a superscript asterisk, that is,  * ,  * . Since  is often a guess that can be neither inferred nor validated from the observations, we select a range by adding and subtracting an offset value to and from the initial guess. We define the offset value  such that [ * − ,  * + ] is considered as the range for  in the process of sensitivity analysis.
In the default sensitivity analysis, the range of [−1, +1] (one unit of tilt toward both directions) is chosen in the global sensitivity model of the pattern graph, thus the  parameters of all edges are set to a common value. 1 However, using the ISA method, we obtain different ranges for each local sensitivity model. Therefore, we need a procedure to perform sensitivity analysis on different parameter ranges. For that purpose, let  1 , … ,  i be the coefficients used in ISA, with the corresponding ranges. One option is then to perform calculations on a grid defined over the space  1 × • • • ×  e , and report the corresponding varying range of the parameter of interest. Another option is to vary the tilting parameters simultaneously from the minimum to the maximum values of their range and in equal relative steps. This is possible using a reference tilting parameter  ∈ [−1, +1] which simultaneously controls the variation on all  i parameters via (17). Using the definition in (17), we have a way to compare the estimates from the pattern graph approach under default sensitivity analysis with our ISA method, as we set the tilting parameter of the default method  to be equal to the reference parameter . This idea will be used in the comparisons in Section 4.
We established step 3 of ISA on the assumption that the missingness indicators follow a logistic model. However, the validity of the ISA method is not limited to a specific working model. For any assumed missingness mechanism model that comprises interpretable parameters and fits the form of the rejection sampling step of the imputation algorithm, step 3 of ISA can be re-defined and repeated. In this paper, we continue with only the logistic models, as they are also considered an effective working model for most missing data scenarios. 18
Also, note that the parameters of the working model may still be uninterpretable, not because of a lack of prior knowledge, but because of the structure of the assumed m-graph. As an example, the R-sorted ratios for Figure 4c,d and Figure 5b,c,d, respectively, cannot be canceled out and thus appear in the local sensitivity models, while the components are not causally connected (unlike in Figure 4b). Forming an informative choice of tilting parameter for such cases is not straightforward. In this regard, further extensions of the ISA methodology are needed, which could extend the list of sufficient conditions for applying ISA to a missing data problem.
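The two options for exploring the local ranges in step 3 (a grid over all ranges, or a single reference parameter in [−1, +1]) can be sketched as below. This is an illustration under stated assumptions: the map from the reference parameter to each local parameter is taken to be linear (−1 gives the lower end, 0 the initial guess, +1 the upper end), which is one natural reading of "equal relative steps". All function names and the numeric guesses and offsets are hypothetical.

```python
import itertools


def local_range(initial_guess, offset):
    """Interval [guess - offset, guess + offset] for one local parameter."""
    if offset < 0:
        raise ValueError("offset must be non-negative")
    return (initial_guess - offset, initial_guess + offset)


def grid_over_ranges(ranges, points_per_axis=5):
    """Option 1: Cartesian grid over all local ranges."""
    if points_per_axis < 2:
        raise ValueError("need at least two points per axis")
    axes = []
    for low, high in ranges:
        step = (high - low) / (points_per_axis - 1)
        axes.append([low + i * step for i in range(points_per_axis)])
    return list(itertools.product(*axes))


def from_reference(reference, guesses, offsets):
    """Option 2: move all parameters together via one reference in [-1, 1].

    Assumed linear map; not necessarily the exact definition used in (17).
    """
    if not -1.0 <= reference <= 1.0:
        raise ValueError("reference must lie in [-1, 1]")
    return [g + reference * o for g, o in zip(guesses, offsets)]


# Hypothetical initial guesses and offsets for two local parameters.
guesses = [-0.9, -0.3]
offsets = [0.5, 0.2]
ranges = [local_range(g, o) for g, o in zip(guesses, offsets)]

grid = grid_over_ranges(ranges, points_per_axis=5)
sweep = [from_reference(ref / 10, guesses, offsets) for ref in range(-10, 11)]
print(len(grid), len(sweep))
```

The grid grows exponentially with the number of local parameters, whereas the reference-parameter sweep stays one-dimensional and can be plotted on the same axis as the default analysis.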

Relation between the original and edge-wise assumptions
In the opening of the current section, we claimed that developing the edge-wise assumption is necessary for the introduction of the ISA method. We break the discussion of this claim into three questions: (i) whether the edge-wise assumption makes a stronger or weaker assumption about pattern graphs than the original assumption; (ii) how it facilitates assessing the identifiability assumption; and (iii) how it facilitates the interpretability of the sensitivity analysis results. In this subsection, we discuss these three questions.
Regarding the first question, Theorem 2 states that the edge-wise assumption is as strong as the original identifiability assumption for pattern graphs.
Theorem 2 (Equivalency of original and edge-wise identifiability assumptions). Let G = (R, E) be a pattern graph for a missing data problem modeled by (L, R). The identifiability assumption in (1) holds iff for every path, and for every edge on the path, the edge-wise assumption in (7) holds.
In order to discuss the second question, we recall the assessability process in ISA. In step 1 of ISA, we arrive at L-sorted or R-sorted ratios, which we simplify and cancel out by incorporating the m-graph conditional independence statements. This implies the need for having such conditional odds terms in order to be able to assess the assumptions. In the original pattern graph assumption, we have the R ∈ pa PG (r) term in the denominator of the ratio. This means that at least one missingness indicator in the denominator is marginalized over and canceled out of the conditioning set, while it is still present in the conditioning set of the numerator. As an example, take a case where r = (100) and pa PG (r) = {(110), (101)}. Then, the ratio on the left-hand side of Equation (1) cannot be factorized into similar ratio terms as in the edge-wise case, since R 2 = 0, R 3 = 0 do not have counterparts in the denominator to form a conditional odds term; it is thus inapplicable for the ISA method. This leads to an issue similar to the absence of Conditions 1 and 2.
Subsequently, this issue propagates to step 2 of ISA, where we establish a connection between  and  parameters via the conditional odds terms for individual R components. This brings us to the third question. If we do not make any connection between these two sets of parameters, explainability is lost as well. Furthermore, we bring the readers' attention to the fact that the tilting parameter in the default sensitivity analysis is a vector of the size |l r |, acting upon the entire imputed components in L r , and with equal components ranging from −1 to +1. This is due to the fact that in the default sensitivity analysis, we assume no further knowledge or assumption helps us with choosing the tilting parameters. In ISA, we assume that such knowledge exists, and that it is encoded in the assumed m-graph and the missingness model. This allows us to make informed guesses about the nature of the missing data problem, and accordingly choose each tilting component individually.
As a concluding remark, Theorem 2 states that in the absence of any extra information other than what we utilized to derive the pattern graph, ISA has no superiority over the default sensitivity analysis. ISA adopts the same pattern graph framework while incorporating applicable substantive knowledge from m-graphs.
FIGURE 6 A schematic of the domain of applicability of ISA in missing data problems. If only pattern emergence information is available (left circle), then we employ the pattern graph framework with its original assumption and default sensitivity analysis. If missingness mechanism information is available as well (intersection region), we can employ ISA. One of the scenarios for ISA is the introduced clinical sequential observations process. iron def, iron deficiency; Miss, missingness; Seq. obs, sequential observation.

VALIDATION AND EXPERIMENTS
In Section 1.1, we introduced the clinical sequential observations missing data problem, in which both the pattern- and the mechanism-related information are often available; hence, this problem can benefit from the ISA method. This may not be the only missing data problem that ISA is able to address. In general, ISA can be used to approach all problems modeled via both an m-graph and a pattern graph, for which a working model is assumed. Figure 6 illustrates the domain of applicability of the ISA method in relation to the pattern graph and m-graph frameworks, as well as the experiments for this section.
In this section, first, we present the ISA results for the bivariate sequential observations problem. Next, we set up a simulation study and generate a synthesized dataset according to the scenario to compare our method with the following four commonly used missing data methods: Unweighted CCA, KNN imputer, MICE, and MissForest (see Appendix D for implementation details). We further compare the sensitivity analysis results with the default sensitivity analysis. Finally, we use real-world clinical data from the MIMIC-III public dataset 20 to evaluate our method. There, we estimate the cohort's mean iron level in the presence of missing data and compare it with the estimated mean from the aforementioned four missing data methods. Furthermore, we demonstrate how, in the presence of extra information about the problem, one is able to validate the method and interpret the sensitivity analysis results.
Here, the physician becomes suspicious about a particular disease and orders the secondary tests, that is, L 1 .
Missingness scenario 3: Occurs when the variable L 2 is observed but not recorded in the database, for example, when L 2 represents a symptom. Symptoms are often recorded in a non-standard textual format and extracted via text mining and natural language processing algorithms. An error in these algorithms leads to invalid and hence missing values in the collected dataset. On the other hand, the secondary test results are gathered by labs in a structured fashion, hence less prone to missingness and mistakes.
Figure 7 presents the derived m-graph and pattern graph for this case study (see Appendices E.1 and E.2 for more details about the derivation processes). In summary, the resulting pattern graph includes the following nodes, edges and imputation paths (denoted by ): to impute r 2
By assuming logistic models for R 1 and R 2 , we obtain the informed tilting parameters via the ISA method as given in (19). The details of the ISA derivation are presented in Appendix E.3. The obtained tilting parameters in (19) imply a need for informed guesses for the three logistic coefficients, for example:
• The odds of missing primary and secondary test results for healthy against sick patients are approximately  21 
Next, we use this case study in the following two validation experiments.

Validation: Simulation experiment
Our goal in this simulation step was to demonstrate that the ISA method yields unbiased results if prior knowledge is available and the missing data assumptions are correct. We further attempt to demonstrate the successful incorporation of prior knowledge in ISA's sensitivity analysis by comparing the results with the default sensitivity analysis. Throughout the experiments, we thus utilized the true assumptions, models, and parameters (the ones used to synthesize the data).
For a detailed explanation of the experiments' setup as well as the results, see Appendix F. Figure 8 presents the estimation bias of expected values for the designed bivariate clinical sequential observations, as introduced in Section 4.1. The ISA method obtained better results compared to unweighted CCA, the single imputation KNN method, and the multiple imputation methods MICE and MissForest.
For the default versus edge-wise sensitivity analysis comparison, we made informed choices of tilting parameters according to the ISA steps. Figure 9 presents the results of the two sensitivity analysis approaches for the simulated dataset. As depicted, the varying range under ISA complies better with the ground truth. The improvement in estimating L 1 is more pronounced, as it is subjected to self-censoring, implying a higher departure from the pattern graph assumption. Figure 9 demonstrates how ISA improves the estimations if prior knowledge is valid.
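To make the role of self-censoring concrete, the toy simulation below shows why a complete-case mean is biased when the probability of missingness depends on the value itself. It is only an illustrative sketch and not the simulation setup of Appendix F: the standard normal variable, the logistic missingness model, the coefficient value, and all variable names are assumptions introduced here.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Hypothetical study variable; in this toy setup it censors itself.
values = rng.normal(loc=0.0, scale=1.0, size=n)

# Assumed logistic missingness model: larger values are more often missing.
coef = 1.0  # hypothetical self-censoring strength
p_missing = 1.0 / (1.0 + np.exp(-coef * values))
is_missing = rng.random(n) < p_missing

true_mean = values.mean()               # uses the values that would be hidden
cca_mean = values[~is_missing].mean()   # unweighted complete-case estimate
bias = cca_mean - true_mean

print(f"missing fraction: {is_missing.mean():.3f}")
print(f"complete-case bias: {bias:.3f}")
```

Because high values are preferentially removed, the complete-case mean falls below the full-sample mean; setting `coef` to zero turns the mechanism into MCAR and the bias vanishes up to sampling noise.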

Demonstration: Iron deficiency experiment
Section 1.1 introduced the diagnostics flowchart for thalassemia carrier identification. We further discussed how it can be considered as a clinical sequential observations scenario. This section aims to estimate the true average iron level (IL) in a hospital cohort in the presence of missing data. As a brief recall, we know, according to the flowchart (Figure 1), that physicians diagnose iron deficiency based on the values of the following tests (if available): MCV, MCH, Hb pattern, iron level and a number of genetic tests. 4 We took the set of MCV, MCH, and Hb pattern tests as primary, since all are part of the chemistry category blood test, and the IL test as the secondary test. Our dataset was the MIMIC-III public dataset (see Appendix G for details about the dataset). We employ the ISA solution for sequential observations obtained in Section 4.1. The remaining step is step 3 of ISA, i.e., making informed choices of the tilting parameters. According to (19), informed guesses are needed for the three logistic coefficients.
The first statement points to the epidemiological statistics of iron deficiency in the population of interest, which can be extracted from related literature. In order to proceed, we assumed that the missing values of the study variables positively correlate with the patients being extremely healthy (as the sick population visits the hospital to take the tests more often). Therefore, we associated the missing follow-ups with high values of IL, MCV and MCH. To formulate our prior knowledge, we assumed that the odds of missing follow-ups (both variables) for extremely healthy patients are  11 =  21 = 2.5 times higher compared to extremely sick patients. Using Equation (16), this assumption translates to  * 11 =  * 21 ≈ −0.9. We utilized these guessed values, knowing that the domain experts are able to validate and adjust the numbers if required.
Regarding the large size of missingness for HbA2, the thalassemia flowchart 4 supports the assumption that the Hematology lab tests are performed regardless of their direct observed values, since any range and value of the three items (MCV, MCH, HbA2) are used to order further tests. This means that we can consider the missingness in HbA2 as not self-censoring. Based on Figure 10, we modeled the missingness mechanism of HbA2 as a logistic regression model given the values of MCV and MCH. In other words, we imputed the HbA2 variable (only for the rows with observed MCV and MCH) as a function of the MCV and MCH variables.
The second statement points to physicians' opinions and the diagnostic flowchart associated with iron deficiency. Similar to the previous item, one can make informed guesses for this item by analyzing physicians' style of practice in a target hospital. In the absence of such information, we estimated E[R 1 |L 2 ] by assuming MAR, in order to make a reasonable initial guess for  * 12 . We acknowledge that this is potentially a biased estimation (as we must estimate E[ 12 |L 2 ]), but we accepted it and accounted for the potential bias later by selecting a range of variation and doing sensitivity analysis. This gave  12 ≈ 1.35, and thus  * 12 = −log( 12 ) ≈ −0.3 is the preliminary approximated value obtained from (20). Apart from the MAR assumption, we made another simplifying assumption: even though L 2 is continuous, we guessed a constant value for  * 12 based on the odds ratio of the lowest and highest values of L 2 . To account for these approximations and simplifying assumptions, we vary the parameter  12 around its initial guess in the sensitivity analysis step.
As the final step, we select an offset value for  11 ,  21 , and one for the third coefficient, as listed in (21). Subsequently, we calculated the informed tilting parameters according to Equation (19) and set the varying range for the tilting parameter of the default sensitivity analysis method to  default =  for future comparisons.

FIGURE 11 The estimated empirical distribution of IL for the entire cohort using different missing data approaches.

TABLE 1 The estimated mean and standard deviation of the IL in the entire cohort using different missing data approaches.
As explained throughout the paper, the choices in (21) are not solid inferred values but rather only guesses based on assumed available prior knowledge. What we achieved in the ISA method, however, was assessability (as clinical experts are able to verify and adjust the above statements) and also interpretability of the results (as the sensitivity analysis results have a clear mapping to the selected values).

Figure 11 and Table 1 present the estimated distribution of IL for the entire cohort using different methods. We imputed the partially-observed IL variable using unweighted CCA, KNN (K=33) imputation, MICE, MissForest, and ISA methods. We performed ISA using the initial guesses for tilting parameters presented in (21).
The sensitivity analysis results are Ê ISA [iron] ∈ [42.51, 65.20] for ISA and Ê default [iron] ∈ [29.22, 55.90] for the default approach. While the difference between the estimated ranges is considerable, we cannot claim that it implies any superiority of a particular approach. An advantage of our method, however, is that the edge-wise assumptions, the encoded scenario, and the tilting parameters can be discussed with and validated by medical experts. Furthermore, ISA results are interpretable, as the obtained range corresponds to meaningful quantities.

DISCUSSION AND FUTURE WORK
In this paper, we introduced a new identifiability assumption for pattern graphs and, subsequently, the ISA method for incorporating available prior knowledge about the missingness mechanism. The ISA method allowed us to validate the framework's assumption in a given problem and correct the bias accordingly (assessability). It also allowed us to make an interpretable connection between domain-expert knowledge and the sensitivity analysis results (interpretability). We concluded that the ISA method enables the use of pattern graphs in a broader range of nonignorable missingness problems which otherwise would violate the framework's assumption. We demonstrated how the causal independence statements from m-graphs are used to simplify the pattern graph assumptions. We also introduced examples of prior knowledge statements and how they relate to the parameters of the missingness model.
We implemented the ISA method to solve the bivariate sequential observations clinical case study, which involved nonignorable missingness, violating the pattern graph assumption. We assessed the edge-wise assumptions and calculated the corresponding tilting parameters. We identified two pieces of prior knowledge to make informed guesses for the logistic coefficients: information about the rate of hospital visits for healthy and sick patients, and protocols for ordering a test based on primary findings. We compared the performance of our method with widely-used missing data methods (including the original pattern graph) in a simulation study. We also studied routine real-world clinical data and compared our results with other methods.
One may adapt the methods presented in Section 3 to other missing data scenarios. The adaptation includes the calculation of tilting parameters and identifying interpretable prior knowledge for those scenarios. We considered using the logistic model for missingness indicators. There are other working models suggested in the literature, such as the Gaussian model. 21 One may investigate the use of such models in the ISA framework. Furthermore, we introduced sufficient conditions for incorporation of m-graph information in the pattern graph framework. This list can be extended by considering more real-world scenarios. In particular, one may investigate the sensitivity models with irreducible conditional odds that have no immediately interpretable missingness model coefficients despite available causal prior knowledge.

REFERENCES
As this ratio is going to be utilized either in the rejection probability of the rejection sampling method or in the importance ratio of the importance sampling method, any term C independent of the sampled imputation will cancel out (from the numerator and the denominator of the rejection probability, or via normalization of the weights of each sample in the weighted-averaging step, respectively). Thus, it is valid to have the R-sorted ratios up to a normalizing constant.
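The cancellation argument can be checked numerically with a small sketch. The candidate values, the tilting parameter, and the constant below are hypothetical, and the acceptance rule "ratio divided by its maximum" is a common textbook form assumed here for illustration, not necessarily the exact rule of the imputation algorithms in this paper.

```python
import numpy as np


def normalized_weights(unnormalized):
    """Self-normalized importance weights that sum to one."""
    w = np.asarray(unnormalized, dtype=float)
    return w / w.sum()


rng = np.random.default_rng(1)
candidates = rng.normal(size=8)     # hypothetical sampled imputations
tilt = 0.7                          # hypothetical tilting parameter
ratio = np.exp(tilt * candidates)   # exponential-tilt ratio, up to a constant
constant = 3.2                      # any factor free of the sampled value

# Importance sampling: the constant vanishes after normalization.
w_plain = normalized_weights(ratio)
w_scaled = normalized_weights(constant * ratio)

# Rejection sampling (assumed form): acceptance probability = ratio / max ratio.
accept_plain = ratio / ratio.max()
accept_scaled = (constant * ratio) / (constant * ratio).max()

print(np.allclose(w_plain, w_scaled), np.allclose(accept_plain, accept_scaled))
```

Both comparisons hold for any positive constant, which is why the R-sorted ratios only need to be known up to a normalizing factor.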

A.3 Proof of conditional independence (11a) and derivation of the three m-graph restrictions
Starting with the independence statement (11a), we recall an R-sorted ratio is written as ) .
By definition, we have L r (k) = {L Δ k , L r (k−1) }. Hence, we rewrite the conditioning set as , that is, the upstream components on the imputation path of the pattern graph. As this component does not appear in the conditioning set, any unmediated confounder path cannot be blocked by {L r (k−1) , R −Δ j }.
3. Restriction iii represents the 'no collider' rule. Since we assumed NDE (see Section 3.3), no L variable can be a collider for Rs, hence only the R −Δ j component needs to be mentioned in the restriction.

A.4 Proof of Theorem 2
After the imputation Algorithm 1 selects a parent r (k) to impute the pattern r (k−1) (not necessarily at the beginning of the imputation; r (k−1) can be any intermediate node for imputation of a lower pattern), the right-hand side of the original pattern graph assumption is stated as ) .
We factorize the components in L r (k−1) with respect to the ancestors of the current child, as we expect it to be imputed: Each ratio term written for L Δ k+i corresponds to a ratio term in the edge-wise assumption in (7) where Δ k+i ∶= Δ k and r (k−1) ∶=; in other words, if we factorize the original assumption for all patterns, and write the edge-wise assumption for all edges, every term in one is found in the other. This means that the set of all terms whose product must be equal to one is shared between the two assumptions, but in a different order. In theory, one could consider a case where no single term in an assumption statement equals one but the product of all terms does. However, this is an arbitrary case that might or might not occur in a problem. In order for the assumptions to hold structurally, all the terms should be equal to one. We write two pattern graph assumptions for two incomplete patterns: We write the edge-wise assumption for three edges (one for imputation of (110), two for imputation of (100)): It has been shown that three L-sorted ratios are shared between the two assumptions.

APPENDIX B. EXAMPLES OF EDGE-WISE ASSUMPTION
We provide three examples related to Figure 3 which better describe Theorem 1, Corollary 1 and the sensitivity model in Equation (9).
Example 1 (Edge-wise assumption). The pattern (1000) in Figure 3 is imputed according to the selected path by first imputing (L 2 , L 3 ), and then L 4 across two edges in (6b). Edge-wise assumptions are as follows:
Example 2 (Edge-wise assumption with L- and R-sorted ratios). We rewrite the edge-wise assumption for e (2) in Example 1 using L-sorted and R-sorted ratios. In Equation (8), let H = {1} and J = {2}, that is, the second ratio must be written in its R-sorted form. Then the edge-wise assumption for e (2) is expressed as
Example 3 (Edge-wise local sensitivity model). Continuing Example 2, we write the local sensitivity model for e (2) based on Equation (9) as where  2 is the tilting parameter for e (2) on the selected imputation path.

APPENDIX C. MCAR AND MAR MISSINGNESS IN ISA
MCAR missingness mechanism is defined by the independence assumption R ⟂ ⟂ L. In the corresponding m-graph, Rs accept no incoming edges from the L nodes.With a slight abuse of terminology, we assign the name "MCAR" to an individual R node of the same condition in an m-graph, that is, R i ⟂ ⟂ L, R −i (receiving no edge from other R nodes as well).
In this case, all three restrictions of Condition 1 hold, and hence the corresponding R-sorted ratio is cancelled out. In an extreme case, if all R nodes are of MCAR type, all partial imputations on all imputation paths are free of the need for sensitivity analysis.
Likewise, the MAR missingness mechanism is defined by the independence assumption R ⟂ ⟂ L miss |L obs , where the index set obs selects the fully-observed variables, and miss the remaining ones. In the corresponding m-graph, Rs are restricted to have incoming edges only from the fully-observed nodes. Similar to the MCAR case, we assign the name "MAR" to an individual R node of the same condition in an m-graph. Condition 1 holds for MAR R nodes as well, as the component L r (k−1) (including observed variables) is a sufficient d-separation set. Condition 1 also holds for the weaker "non-monotone MAR" mechanism, where the subscript obs selects the observed entries for a sample. This is a weaker assumption since L obs may include the partially-observed variables which are observed for that particular sample. Nevertheless, the observed entries are still included in L r (k−1) , hence a similar discussion as for MAR is valid.

APPENDIX E. DETAILS OF ISA METHOD FOR SEQUENTIAL OBSERVATIONS PROBLEM
E.1 Pattern graph
Chen 1 claims that generation of the correct pattern graph for a scenario is intrinsically an open problem. Nevertheless, a pattern graph structure selection procedure has been introduced by the author, which is summarized as follows. The general idea of the procedure is to have in mind a scenario of step-wise data collection that matches the reality of the scenario, be it the order of questions in a questionnaire or the chronological order of lab tests through time. We start from when it is not yet determined whether the study variables are missing, for example, at the beginning of the questionnaire or the first admission day. From that point, we specify, step by step, the possibilities of observing/missing a variable in the process. For that purpose, in addition to the 0 and 1 values for r, we also use a placeholder value "−". The placeholder implies that the observation of a variable has not yet been decided upon.
As an example, imagine a case of 3 lab tests, the first two of which are taken simultaneously while the third one is performed the next day. We start with the pattern s 0 = (−, −, −). Patients may either miss or take the first two tests on the first day, therefore we arrive at two new patterns s 1 = (1, 1, −) and s 2 = (0, 0, −). For s 1 the third test might or might not be performed, therefore two new patterns emerge from s 1 , which are s 3 = (1, 1, 1) and s 4 = (1, 1, 0). Moreover, suppose we know from the clinic protocols that the third test must be performed if the first two are missing. Therefore only one pattern, s 5 = (0, 0, 1), results from s 2 .
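The bookkeeping of this three-test example can be sketched in a few lines, including the final clean-up that the procedure prescribes (placeholders replaced by 1, self-looping edges dropped, duplicated edges kept once). The function names and the representation of patterns as tuples are hypothetical choices made for this illustration; edges are recorded in the order in which patterns emerge.

```python
PLACEHOLDER = "-"


def finalize(pattern):
    """Replace every undecided entry by 1, as in the final step of the procedure."""
    return tuple(1 if value == PLACEHOLDER else value for value in pattern)


def build_pattern_graph(steps):
    """Build nodes and edges from (earlier pattern, later pattern) steps.

    Hypothetical helper: self-loops are dropped and duplicate edges collapse
    because edges are stored in a set.
    """
    nodes, edges = set(), set()
    for earlier, later in steps:
        source, target = finalize(earlier), finalize(later)
        nodes.update([source, target])
        if source != target:
            edges.add((source, target))
    return nodes, edges


# Step-wise patterns of the three-test example.
s0 = ("-", "-", "-")
s1 = (1, 1, "-")
s2 = (0, 0, "-")
s3 = (1, 1, 1)
s4 = (1, 1, 0)
s5 = (0, 0, 1)
steps = [(s0, s1), (s0, s2), (s1, s3), (s1, s4), (s2, s5)]

nodes, edges = build_pattern_graph(steps)
print(sorted(nodes))
print(sorted(edges))
```

Under these rules the sketch yields three patterns, (1, 1, 1), (1, 1, 0) and (0, 0, 1), and two edges, both leaving the fully observed pattern.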
To finish the process, we replace all the placeholders with the value 1. Finally, the emerging patterns and the in-between edges are used to form the pattern graph, with the consideration that self-looping edges are dropped and duplicated edges are presented once. For the example at hand, we arrive at the following graph:
For our case study, considering the procedure explained above, we construct the pattern graph as follows:
• We start from the initial step s 0 = (−, −). A patient may be absent from the site, therefore no observations will be made for (z 1 , z 2 ), which represents the pattern s 1 = (0, 0). For a present patient, the symptom z 2 might or might not be recorded, which represents s 2 = (−, 1) and s 3 = (−, 0).
• Based on the value of z 2 , the test z 1 might or might not be taken disregarding whether or not z 2 is recorded.These represent s 4 = (1, 1), s 5 = (0, 1) for the next step of s 2 , and s 6 = (1, 0) and s 7 = (0, 0) for the next step of s 3 .
• Missingness scenario 1: This scenario implies L 1 → R 1 and L 1 → R 2 edges in the m-graph.
• Missingness scenario 3: This scenario points to an external cause of R 2 missingness, that is, other than L 1 and L 2 , and therefore does not imply any explicit edge in the m-graph.
In the constructed m-graph (Figure 7), the variable L 1 is subjected to self-censoring (L 1 → R 1 ). By subtracting the self-censoring edge, the m-graph becomes a subgraph of the block-parallel model. 11 Also, note that the missingness in both variables depends on the unobserved data.

E.3 Deriving informed tilting parameters via ISA
We describe the first two steps of ISA to derive the informed tilting parameters for the sequential observations problem.
This is an identifiable case, as presented by (11a), since L 1 ∈ pa m-graph (L 2 ); therefore, no sensitivity analysis is needed:
2.  2 : e 2 edge for imputing r 3 : the R-sorted exponential tilt for this edge, according to Equation (11b), is
Note that the condition term R 2 = 1 is removed from the terms since R 1 ⟂ ⟂ R 2 |L.
3.  3 : e 3 edge for imputing r 4 : In this case the number of components for imputation is more than 1 (L
5.  4 : e 1 edge for imputing r 4 : This is the second edge of  4 to complete the partially-imputed r 4 by sampling L 2 . Similar to  1 , the variable L 2 is in the condition set of the L-sorted ratio for this edge, and therefore this case is identifiable:
Step 2: Obtaining the tilting parameters
Assuming the logistic model, we obtain the tilting parameters via Equation (16):
• Equations (E1a) and (E1e) directly give

E.4 MI Sampling
To estimate the probability densities p(L s−r |L r , R = s) in Algorithm 4, we categorize the estimation task in two classes:
1. When L r ≠ ∅, that is, the target is a conditional density. We can utilize probabilistic models such as a Gaussian process or a Bayesian neural network. For larger datasets, more scalable models such as the neural linear model (NLM) are used. 23 Assuming linearity, one may also implement Bayesian linear regression for binary/categorical or the generalized linear model for continuous variables.
2. When L r = ∅. For example, in a dataset with three variables L = (L 1 , L 2 , L 3 ), to impute a row L i = (?, ?, ?) with the missingness indicator R i = r 0 = (0, 0, 0) where s → r 0 and s = (1, 1, 0), the target is p(l 1 , l 2 |R 1 = 1, R 2 = 1, R 3 = 0). This is the problem of density estimation, an unsupervised machine learning task where the goal is to establish an approach to sample from an unknown distribution from which the observations (in the above example, (L 1 , L 2 )) have been drawn. In the case of a univariate distribution with a large enough sample size, the most straightforward solution is based on the empirical cumulative distribution function (eCDF) and the universality of the uniform theorem. 24 The eCDF approach is equivalent to sampling with replacement from a realization vector. However, this solution is not feasible for high-dimensional data or for sparse and continuous variables. Dinh et al 25 summarize the following more advanced approaches:
• Maximum likelihood models such as Restricted Boltzmann Machines or Deep Boltzmann Machines described by probabilistic undirected graphs, and Variational Autoencoders described by directed graphical models.
• Adversarial models such as GAN, and the novel real-valued non-volume preserving transformations (real NVP) method proposed by Dinh et al. 25
Finally, if the conditional distribution models are available for a set of variables, Gibbs sampling 26 can also be used for both aforementioned cases of conditional and marginal distributions.
In this study we chose GLM and eCDF approaches.
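As an illustration of the eCDF approach described above, the following minimal Python sketch draws from a univariate empirical distribution by inverse-transform sampling and, for comparison, by sampling with replacement from the realization vector. The function names and the toy data are hypothetical and are not taken from the study; NumPy is assumed to be available.

```python
import numpy as np


def sample_from_ecdf(observed, size, rng):
    """Inverse-transform sampling from the empirical CDF of `observed`."""
    x = np.sort(np.asarray(observed))
    n = x.size
    u = rng.uniform(0.0, 1.0, size=size)  # U ~ Uniform[0, 1)
    # Generalized inverse of the eCDF: the smallest order statistic x_(k)
    # (1-indexed) with k / n >= u, i.e. k = ceil(u * n), clipped to 1..n.
    k = np.clip(np.ceil(u * n).astype(int), 1, n)
    return x[k - 1]


def sample_with_replacement(observed, size, rng):
    """Equivalent in distribution: resample the realization vector."""
    return rng.choice(np.asarray(observed), size=size, replace=True)


# Hypothetical toy data: observed values of a binary variable.
observed = np.array([0, 1, 1, 0, 1])
rng = np.random.default_rng(0)
draws_ecdf = sample_from_ecdf(observed, 10_000, rng)
draws_boot = sample_with_replacement(observed, 10_000, rng)
```

Both samplers assign probability 1/n to each observed value, so their sample means should both be close to the mean of the observed vector.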

FIGURE F2 Box plot visualization of the results of different simulation runs using Table F3.

TABLE F2 The mean estimation bias and the standard deviation (STD) of the bias in the first simulation for the study variables $L_1$, $L_2$, for n = 1000, number of iterations = 50, and simulation parameters according to Table F1.

The percentages of missingness were 10.01% jointly for MCV and MCH, 89.83% for HbA2, and 76.74% for IL. Finally, we used the binary interpretation of HbA2, as suggested by the flowchart in Figure 1, based on the threshold of 3.6%.

APPENDIX H. IMPUTATION ALGORITHMS
The original pattern graph imputation is presented in Algorithm 2, together with the partial imputation in Algorithm 1. The modified imputation algorithm for the default sensitivity analysis is presented in Algorithm 3. The partial imputation algorithm is modified under edge-wise sensitivity analysis (in ISA) as in Algorithm 4. This modified partial imputation algorithm is to be used with the original imputation Algorithm 2.

3: Set $L_{now} = L_{i,R_i}$ and $R_{now} = R_i$.
4: Execute Algorithm 1 with inputs $L_{now}$ and $R_{now}$.
5: Update $L_{now}$, $R_{now}$ to be the return of the previous step.

1. (Prerequisite) Construct the pattern graph and m-graph for a given problem;
2. Incorporating m-graph information: Extract the sufficient conditional independence statements from the m-graph to simplify the edge-wise exponential tilting terms;
3. Decomposing the global sensitivity model into local sensitivity models: Obtain the local tilting parameters as a function of the coefficients of the assumed missingness indicators' model for each imputation component;
4. Choice of local tilting parameters: Select an informed range of variation for the local tilting parameters based on the prior knowledge about the aforementioned coefficients.

Figure 4 depicts an imputation path and three example m-graph structures, demonstrating when the restrictions are met or violated. Details of the derivation of the restrictions are presented in Appendix A.3. Among the possible missingness mechanisms that conform to this condition are the ignorable MCAR and MAR mechanisms. For a detailed discussion of these mechanisms and their corresponding m-graphs, see Appendix C.

FIGURE 4 Three possible m-graphs (b, c, d) for a pattern graph and an imputation path in (a) to study the restrictions of Condition 1.

FIGURE 5 Three possible m-graphs (b, c, d) which violate Condition 2 for a pattern graph and an imputation path in (a). The blue edges highlight the edges which violate Condition 1 (and thus lead us to check Condition 2), while the red edges highlight the violating paths.

FIGURE 7 The assumed m-graph (a) and pattern graph (b) for the sequential observations problem.

4.1 Case study: Clinical sequential observations

Consider a problem with two study variables $L_1$ and $L_2$ and the corresponding missingness indicators $R_1$ and $R_2$. Let the variable $L_1$ represent the severity of a disease (0 = healthy, 1 = sick) and $L_2$ represent an associated symptom of the disease (0 = normal, 1 = abnormal). To derive the m-graph and pattern graph, we explore the reasonable causal relations as well as the most frequent missingness scenarios in clinical settings:

Causal structure of L: We define $L_2$ as the manifested symptom of the underlying disease $L_1$.

Missingness scenario 1: Occurs when a patient does not show up in the hospital (missing visit); for example, extremely healthy patients are less likely to visit doctors, and therefore their associated variables are missing.

Missingness scenario 2: Occurs for patients with abnormal (eg, high) primary test values, i.e., $L_2$.

FIGURE 8 Estimation results for the bivariate clinical sequential observations case study for a simulation round.

FIGURE 9 Sensitivity analysis for estimation of $E[L_1]$, $E[L_2]$ with agnostic and informed sensitivity parameters.

FIGURE 10 MCV-MCH plot for observed and unobserved HbA2 cases.

Figure E1 depicts the process of choosing the pattern graph and the final result.

= 0 and   = 0. • In both Equations (E1b) and (E1d), we have the conditional odds of R 1 in the process of imputing L 1 .It gives   11 .• In Equation (E1b), we have two conditional odds of R 1 and R 2 in the process of imputing L 1 and L 2 .The odds of R 1 gives  11 for L 1 and  12 for L 2 , while odds of R 2 gives  21 for L 2 only.The result is the tilting vector   = ( 11 +  21 ,  12 ).

Algorithm 1. Partial-imputation algorithm for pattern graphs
1: Inputs: variables $l_r$; the pattern $r$ is determined by the input $l$.
2: Sample a random pattern $S$ from the pattern set $\mathrm{pa}_{PG}(r)$ with probability
$p(S = s) = \dfrac{p(l_r \mid R = s)\, n_s}{\sum_{s' \in \mathrm{pa}_{PG}(r)} p(l_r \mid R = s')\, n_{s'}}$, where $n_s = \sum_{i=1}^{n} I(R_i = s)$.
3: Impute the components in $s - r$ by sampling from the conditional density: $L^{\dagger}_{s-r} \sim p(l \mid L_r = l_r, R = s)$.
4: Return $L_s = (L_{s-r}, L_r)$.

Algorithm 2. Pattern graph algorithm for imputing the entire data
1: Requires estimators $p(l_r \mid R = r)$; a regular pattern graph.
2: for $i = 1, \ldots, n$ do
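The pattern-sampling step of Algorithm 1 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the density values $p(l_r \mid R = s)$ are assumed to have been estimated beforehand and are passed in as plain numbers, and the function names, example patterns, and numeric values are hypothetical.

```python
import numpy as np


def parent_pattern_probabilities(densities, counts):
    """Return p(S = s) proportional to p(l_r | R = s) * n_s.

    densities[j]: estimated density p(l_r | R = s_j) at the observed l_r
    counts[j]:    n_{s_j}, the number of rows with missingness pattern s_j
    """
    weights = np.asarray(densities, dtype=float) * np.asarray(counts, dtype=float)
    total = weights.sum()
    if total <= 0:
        raise ValueError("at least one parent pattern needs positive weight")
    return weights / total


def sample_parent_pattern(parents, densities, counts, rng):
    """Draw one parent pattern of r according to the probabilities above."""
    probs = parent_pattern_probabilities(densities, counts)
    return parents[rng.choice(len(parents), p=probs)]


# Hypothetical example: pattern r = (1, 0, 0) with two parent patterns.
parents = [(1, 1, 0), (1, 1, 1)]
densities = [0.2, 0.5]   # assumed estimates of p(l_r | R = s)
counts = [300, 100]      # assumed pattern counts n_s
probs = parent_pattern_probabilities(densities, counts)
chosen = sample_parent_pattern(parents, densities, counts, np.random.default_rng(0))
```

In this toy setting the unnormalized weights are 60 and 50, so the two parent patterns are selected with probabilities 6/11 and 5/11.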
, we make informed guesses for  11 ,  12 , and  21 .1.The odds of missing primary and secondary test results for healthy patients are approximately  21 and  11 times larger compared to sick patients.This implies  * 11 ≈ − log  11 and  * 21 ≈ − log  21 .2. The odds of ordering the secondary test for patients with abnormal primary test results is approximately  12 times larger for patients with normal primary test results.This implies  * 12 ≈ − log  12 .
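The relation used above, in which each informed coefficient is approximately the negative logarithm of a guessed odds ratio, can be sketched in Python as follows. The function names and the numeric range are hypothetical and serve only to illustrate how a plausible interval of odds ratios could be translated into an interval for sensitivity analysis.

```python
import math


def coefficient_from_odds_ratio(odds_ratio):
    """Informed coefficient guess: minus the log of a guessed odds ratio."""
    if odds_ratio <= 0:
        raise ValueError("an odds ratio must be positive")
    return -math.log(odds_ratio)


def coefficient_range(odds_ratio_low, odds_ratio_high):
    """Map a plausible odds-ratio interval to a sorted coefficient interval."""
    a = coefficient_from_odds_ratio(odds_ratio_low)
    b = coefficient_from_odds_ratio(odds_ratio_high)
    return (min(a, b), max(a, b))


# Hypothetical guess: the odds are believed to be 1.5 to 3 times larger.
low, high = coefficient_range(1.5, 3.0)
```

Because the negative logarithm is decreasing, the larger odds ratio gives the lower end of the coefficient interval; an odds ratio of 1 corresponds to a coefficient of 0.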
Consequently, if the second and third terms of the conditioning set, that is, $\{L_{r^{(k-1)}}, R_{-\Delta_j}\}$, d-separate $R_{\Delta_j}$ and $L_{\Delta_k}$, the latter is removed from the conditioning set, and thus the R-sorted ratio reduces to a constant (hence, (11a)). By the rules of d-separation, a set $C$ is a d-separation set for $A$, $B$ if $A$ and $B$ do not have a direct causal edge, $C$ blocks all the confounder paths, and no variable in $C$ opens a collider path:

1. Restriction i represents the 'no direct edge' rule.
2. Restriction ii represents the 'no confounder' rule. The only component which is left out of the R-sorted conditioning set is $L_{1_d - r^{(k)}}$.

Table D1 introduces the missing data methods for comparison in our experiments.

Missing data methods for comparison with the informed sensitivity analysis pattern graph.
1 , L 2 ), and according to the m-graph, missingness indicators of the components have no direct causal edges.Therefore Condition 2 is met.We use (11d) to write the R-sorted exponential tilting terms O(R 1 |L)O(R 2 |L) = exp (   4 : e 4 edge for imputing r 4 : This is the first edge of the  4 imputation path for R 4 , where L 1 is being imputed.similar to  2 , we have The simulation parameters of the first round of simulation.( 11 ,  21 ,  12 ,  10 ,  20 ) (−0.6, −0.6, −0.3, 0.6, 0.3)

E[L 1 ] (bias ± std) E[L 2 ] (bias ± std)
The simulation parameters for the first round of simulation.