Revisiting the Analysis of Matched-Pair and Stratified Experiments in the Presence of Attrition

In this paper we revisit some common recommendations regarding the analysis of matched-pair and stratified experimental designs in the presence of attrition. Our main objective is to clarify a number of well-known claims about the practice of dropping pairs with an attrited unit when analyzing matched-pair designs. Contradictory advice appears in the literature about whether or not dropping pairs is beneficial or harmful, and stratifying into larger groups has been recommended as a resolution to the issue. To address these claims, we derive the estimands obtained from the difference-in-means estimator in a matched-pair design both when the observations from pairs with an attrited unit are retained and when they are dropped. We find limited evidence to support the claims that dropping pairs helps recover the average treatment effect, but we find that it may potentially help in recovering a convex weighted average of conditional average treatment effects. We report similar findings for stratified designs when studying the estimands obtained from a regression of outcomes on treatment with and without strata fixed effects.


Introduction
In this paper we revisit some common recommendations regarding the analysis of matched-pair and stratified experimental designs in the presence of attrition. Here, we define attrition to mean that we do not observe outcomes for some subset of the experimental units. This situation may arise, for instance, if subjects refuse to participate in the experiment's endline survey or if researchers lose track of subjects prior to observing their experimental outcomes.
Our main objective is to clarify a number of well-known claims about the practice of dropping pairs with an attrited unit in matched-pair designs. Specifically, when one unit in a pair is lost, several contradictory suggestions have been made in the literature about whether or not experimenters should drop the remaining unit in their analyses. For instance, King et al. (2007) and Bruhn and McKenzie (2009) assert that a key advantage of matched-pair designs is that dropping pairs with an attrited unit may protect against attrition bias when attrition is a function of the matching variables. In contrast, Glennerster and Takavarasha (2013) claim that dropping pairs may increase attrition bias, and point out that the widespread practice of including pair fixed effects in a regression of outcomes on treatment is equivalent to computing the difference-in-means estimator after dropping pairs. Accordingly, they go on to suggest that experimenters should instead stratify the units into larger groups if there is risk of attrition. Donner and Klar (2000) assert that dropping pairs with an attrited unit is a requirement in analyses of matched-pair designs with attrition, and characterize this as a weakness of matched-pair designs. As a result, they also recommend stratifying units into larger groups.
To address these claims, we first derive the estimands obtained from the difference-in-means estimator in a matched-pair design both when the observations from pairs with an attrited unit are retained and when they are dropped. We find that the estimand produced when retaining the units is simply the difference in the mean outcomes conditional on not attriting. In contrast, the estimand produced when dropping the units is a complicated function of the mean outcomes and attrition probabilities conditional on the matching variables. Using this result, we show that dropping pairs does not recover the average treatment effect when attrition is a function of the matching variables, and instead recovers a convex weighted average of conditional average treatment effects. Moreover, we argue that natural conditions under which this convex weighted average further collapses to the average treatment effect are in fact stronger than the condition that attrition is independent of experimental outcomes. From these results we conclude that, although dropping pairs may help in recovering a convex weighted average of conditional average treatment effects, there is limited evidence to support the claims that dropping pairs in a matched-pair design helps protect against attrition bias more generally.
Next, to address the claims that the issues surrounding whether or not to drop pairs with an attrited unit can be resolved by instead stratifying the experiment into larger groups, we repeat the above exercise in the context of a stratified randomized experiment where the strata are made up of a large number of observations. To mirror the analysis carried out for matched pairs, we study the estimands obtained from a regression of outcomes on treatment with and without strata fixed effects.
We find analogous results: the estimand produced when omitting strata fixed effects is once again the difference in mean outcomes conditional on not attriting, and the estimand produced when including strata fixed effects is a function of the mean outcomes and attrition probabilities conditional on the strata labels with very similar properties to what was obtained for matched pairs. From these results we conclude that we do not find compelling evidence to support the idea that stratifying into larger groups resolves the issues surrounding attrition that we explore in this paper.
Including pair fixed effects when conducting inference via linear regression is a widely adopted practice (see for instance the recommendations in Bruhn and McKenzie, 2009), and is numerically equivalent to dropping pairs with an attrited unit. As a consequence, inference considerations sometimes drive the discussion of whether or not to drop pairs (see for example Chapter 4, footnote 32 in Glennerster and Takavarasha, 2013). However, in our view this should not play a primary role when deciding whether or not to drop pairs for three reasons. First, as we show in this paper, including vs. excluding pair fixed effects produces estimands with distinct interpretations in the presence of attrition. Second, as argued in Bai et al. (2021) and Bugni et al. (2018) (in settings without attrition), including pair/strata fixed effects is not a requirement for conducting valid inference on the ATE in matched-pair/stratified experiments, and there is no clear benefit obtained from doing so in general.
Third, there are no formal results which justify the use of conventional robust standard errors in the presence of attrition (with or without fixed effects), and we conjecture that alternative inference procedures should be developed in this case (see Remark 3.1 for a preliminary discussion). For these reasons, in this paper our primary focus is on studying the interpretation of the resulting estimands.
Finally, we explore the empirical relevance of our results using experimental data collected in Groh and McKenzie (2016) as well as data collected from a systematic survey of all papers published in the American Economic Review (AER) and American Economic Journal: Applied Economics (AEJ: Applied) from 2020-2022 which conduct matched-pair or stratified experiments in the presence of attrition. Using these datasets we find that there can be noticeable differences between the point estimates obtained from dropping or retaining pairs with an attrited unit (or including/omitting stratum fixed effects), even when attrition is comparatively low. For instance, using the data in Groh and McKenzie (2016) we find an average absolute percentage difference of 13.82% in point estimates across a collection of outcomes even with an average attrition rate of only 1.4%.
Our paper is related to a large literature on the analysis of randomized experiments with attrition. Most of this literature focuses on developing methods to recover the average treatment effect, often by either modeling the missing data process (Heckman, 1979; Rubin, 2004), inverse probability weighting (Wooldridge, 2002; Little and Rubin, 2019), bounding (Horowitz and Manski, 2000; Lee, 2009; Behaghel et al., 2015), or testing for the presence of attrition bias (Ghanem et al., 2021). Instead, the focus of our paper is on studying the behavior of commonly used estimators in the analysis of matched-pair and stratified experiments. To our knowledge, the paper most similar to ours is Fukumoto (2022), who conducts finite population and super-population analyses of the bias and variance of the difference-in-means estimator in matched-pair designs with and without dropping pairs. However, his super-population analysis maintains a sampling framework where the observations are drawn together as pairs, whereas we consider a sampling framework where observations are drawn as individuals and then subsequently paired according to their covariates. As a consequence, his results and ours are not directly comparable (we note that every empirical application we consider in Section 4 describes a specific procedure by which they stratified their sample using available covariates, and thus does not feature a sample constructed from pre-formed strata as modelled in Fukumoto, 2022). Moreover, Fukumoto (2022) exclusively focuses on the setting of matched-pair designs and thus does not derive results for stratified randomized experiments.
The rest of the paper is structured as follows. In Section 2 we describe our setup and introduce the main assumptions we consider on the attrition process. Section 3 presents the main results. In Section 4 we present an empirical illustration. Finally, we conclude in Section 5 with some recommendations for empirical practice.

Setup and Notation
Let $Y_i^*$ denote the realized outcome of interest for the $i$th unit in the absence of attrition, $D_i \in \{0, 1\}$ denote treatment status for the $i$th unit, and $X_i$ denote the observed baseline covariates for the $i$th unit. Further denote by $Y_i(1)$ the potential outcome of the $i$th unit if treated and by $Y_i(0)$ the potential outcome if not treated. As usual, the realized outcome is related to the potential outcomes and treatment status by the relationship

$$Y_i^* = Y_i(1) D_i + Y_i(0) (1 - D_i). \tag{1}$$

We consider a framework which allows for the possibility that units collected in the baseline survey may drop out (attrit) after treatment is assigned. In particular, let $R_i \in \{0, 1\}$ be an indicator where $R_i = 1$ indicates the $i$th unit is present in the endline survey (i.e. has not attrited) and $R_i = 0$ indicates otherwise. Let $R_i(1)$ denote the potential attrition decision of the $i$th unit if treated, and $R_i(0)$ denote the potential attrition decision of the $i$th unit if not treated. As was the case for the realized outcome, the realized attrition decision is related to the potential attrition decisions and treatment status by the relationship

$$R_i = R_i(1) D_i + R_i(0) (1 - D_i). \tag{2}$$

With these definitions in hand, we define the observed outcome to be

$$Y_i = Y_i^* R_i. \tag{3}$$

We note that the observed outcome is undefined if individual $i$ is not observed in the endline survey, and so we set it arbitrarily to zero in equation (3).
We assume that we observe a sample of $n$ units. The distribution of the observed data is then determined by (1), (2), (3), $\{W_i : 1 \le i \le n\}$, and the mechanism for determining treatment assignment (which we specify in Sections 3.1 and 3.2). We maintain Assumption 2.1 on $\{W_i : 1 \le i \le n\}$ throughout the entirety of the paper. Assumption 2.1(a) imposes mild restrictions on the moments of the potential outcomes. Assumption 2.1(b) rules out situations where the probability of attrition is one for either treatment status.
Our parameter of interest is the average treatment effect, denoted as

$$\theta = E[Y_i(1) - Y_i(0)].$$

Without further assumptions on the nature of attrition, $\theta$ is not point-identified from the observed data. As a consequence, in this paper we first study the estimands produced by commonly used estimators in the analysis of matched-pair and stratified randomized experiments, and then document if and when these estimands collapse to $\theta$ under well-known, albeit strong, assumptions on the attrition process; see Remark 3.2 for further discussion. The first assumption we consider, Assumption 2.2, is that attrition is independent of the potential outcomes.
Under Assumption 2.2, the average treatment effect $\theta$ is point-identified in a classical randomized experiment by simply comparing the mean outcomes under treatment and control for the non-attritors (see for instance Gerber and Green, 2012). The next assumption we consider, Assumption 2.3, is that attrition is independent of the potential outcomes conditional on some set of observable characteristics $C_i$. Although Assumption 2.2 does not necessarily imply Assumption 2.3 or vice versa, it is often argued that Assumption 2.3 may be easier to defend in practice (Moffitt et al., 1999; Hirano et al., 2001; Gerber and Green, 2012; Little and Rubin, 2019). Under Assumption 2.3, $\theta$ is point-identified in a classical randomized experiment by first identifying the average treatment effect conditional on each value $C = c$ and then averaging these conditional treatment effects across $C$. Note that Assumption 2.3 generalizes the assumption discussed in the introduction that attrition is a function of observable characteristics. The final assumption we consider, Assumption 2.4, is that attrition is independent of some set of observable characteristics $C_i$.
A useful observation for the discussion which follows is that, although Assumptions 2.2 and 2.3 are not nested, Assumptions 2.3 and 2.4 do in fact imply Assumption 2.2. To see this, one can chain four equalities: the first follows from the law of iterated expectations, the second from Assumption 2.3, the third from Assumption 2.4, and the fourth from the law of iterated expectations once again.
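A sketch of this chain of equalities is given below. It assumes that Assumptions 2.3 and 2.4 are read as full (conditional) independence statements, namely $(Y_i(1), Y_i(0)) \perp (R_i(1), R_i(0)) \mid C_i$ and $(R_i(1), R_i(0)) \perp C_i$. The bounded test functions $f$ and $g$ are introduced purely for illustration, with $f_i = f(Y_i(1), Y_i(0))$ and $g_i = g(R_i(1), R_i(0))$.

```latex
\begin{align*}
E[f_i \, g_i]
  &= E\big[ E[f_i \, g_i \mid C_i] \big]
     && \text{(law of iterated expectations)} \\
  &= E\big[ E[f_i \mid C_i] \, E[g_i \mid C_i] \big]
     && \text{(Assumption 2.3)} \\
  &= E\big[ E[f_i \mid C_i] \big] \, E[g_i]
     && \text{(Assumption 2.4)} \\
  &= E[f_i] \, E[g_i]
     && \text{(law of iterated expectations)}
\end{align*}
```

Since the product rule holds for arbitrary bounded $f$ and $g$, the potential outcomes and the potential attrition decisions are independent, which is the content of Assumption 2.2 under this reading.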
3 Main Results

Matched-Pair Designs with Attrition
In this section we study the estimands produced by the difference-in-means estimator in a matched-pair design when the observations from pairs with an attrited unit are retained and when they are dropped.
Before defining the estimators we provide a formal description of the treatment assignment mechanism.
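To simplify the exposition, we assume that $n$ is even for the remainder of Section 3.1. For any random variable indexed by $i$, for example $D_i$, we denote by $D^{(n)}$ the random vector $(D_1, \ldots, D_n)$. Let $\pi = \pi_n(X^{(n)})$ be a permutation of $\{1, \ldots, n\}$, potentially dependent on $X^{(n)}$. The $n/2$ matched pairs are then represented by the sets

$$\{\pi(2j - 1), \pi(2j)\} \quad \text{for } j = 1, \ldots, n/2.$$

In other words, pairs are formed by arranging observations in the order $\{\pi(1), \pi(2), \ldots, \pi(n)\}$ according to the permutation $\pi$, and then forming pairs from the adjacent units as $\{\pi(1), \pi(2)\}$, $\{\pi(3), \pi(4)\}$, etc. Next, given such a $\pi$, we assume treatment status is assigned as follows:

Assumption 3.1. Treatment status is assigned so that $(Y^{(n)}(1), Y^{(n)}(0), R^{(n)}(1), R^{(n)}(0)) \perp D^{(n)} \mid X^{(n)}$ and, conditional on $X^{(n)}$, the pairs $(D_{\pi(2j-1)}, D_{\pi(2j)})$, $j = 1, \ldots, n/2$, are i.i.d. and each uniformly distributed over $\{(0, 1), (1, 0)\}$.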
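The assignment mechanism can be sketched in a few lines of Python. This is a minimal illustration only, assuming a scalar covariate so that pairs are formed by sorting and grouping adjacent units; the function name `assign_matched_pairs` and the simulated covariate are hypothetical and not part of the paper.

```python
import numpy as np

def assign_matched_pairs(x, rng):
    """Pair adjacent units after sorting on a scalar covariate,
    then treat exactly one unit per pair, chosen at random."""
    n = len(x)
    if n % 2 != 0:
        raise ValueError("n must be even")
    order = np.argsort(x, kind="stable")        # plays the role of the permutation pi
    pair = np.empty(n, dtype=int)
    pair[order] = np.arange(n) // 2             # units order[2j], order[2j+1] form pair j
    first_treated = rng.integers(0, 2, size=n // 2)  # 1 means the first unit of the pair is treated
    d = np.empty(n, dtype=int)
    d[order[0::2]] = first_treated
    d[order[1::2]] = 1 - first_treated
    return pair, d

rng = np.random.default_rng(0)
x = rng.normal(size=10)                         # illustrative covariate values
pair, d = assign_matched_pairs(x, rng)
```

Each pair label appears exactly twice and each pair contains one treated and one control unit, mirroring the uniform distribution over {(0, 1), (1, 0)} within pairs.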
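To summarize, the assignment mechanism first forms pairs of units (according to $\pi$) and then assigns both treatments exactly once in each pair at random.

The first estimator we consider is the standard difference-in-means estimator computed on non-attritors:

$$\hat\theta_n = \frac{\sum_{1 \le i \le n} Y_i D_i R_i}{\sum_{1 \le i \le n} D_i R_i} - \frac{\sum_{1 \le i \le n} Y_i (1 - D_i) R_i}{\sum_{1 \le i \le n} (1 - D_i) R_i}.$$

Note that $\hat\theta_n$ may be obtained as the estimator of the coefficient on $D_i$ in an ordinary least squares regression of $Y_i$ on a constant and $D_i$, computed on the non-attritors. The second estimator we consider is the difference-in-means estimator computed by first dropping any observations belonging to a pair with an attritor:

$$\hat\theta_n^{\rm drop} = \Big( \sum_{1 \le j \le n/2} R_{\pi(2j-1)} R_{\pi(2j)} \Big)^{-1} \sum_{1 \le j \le n/2} R_{\pi(2j-1)} R_{\pi(2j)} (Y_{\pi(2j-1)} - Y_{\pi(2j)}) (D_{\pi(2j-1)} - D_{\pi(2j)}).$$

Note that $\hat\theta_n^{\rm drop}$ corresponds to the estimator recommended in Bruhn and McKenzie (2009) and King et al. (2007). We emphasize that, in the absence of attrition, $\hat\theta_n$ and $\hat\theta_n^{\rm drop}$ are numerically equivalent.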
As a consequence of the Frisch-Waugh-Lovell theorem, $\hat\theta_n^{\rm drop}$ can equivalently be obtained as the ordinary least squares estimator of the coefficient on $D_i$ in the linear regression of $Y_i$ on $D_i$ and pair fixed effects computed on the non-attritors (i.e. individuals with $R_i = 1$):

$$Y_i = \beta D_i + \sum_{1 \le j \le n/2} \delta_j I\{i \in \{\pi(2j-1), \pi(2j)\}\} + \epsilon_i.$$

Similar regression specifications are extremely common in the analysis of matched-pair experiments.
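The numerical equivalence between the pair fixed effects regression and dropping pairs with an attritor can be checked on simulated data. The sketch below is illustrative only: the data generating process, sample size, and helper names (`diff_in_means`, `pair_fe_coefficient`) are assumptions made for the example, not taken from the paper.

```python
import numpy as np

def diff_in_means(y, d, keep):
    """Difference in mean outcomes between treated and control units flagged by `keep`."""
    return y[keep & (d == 1)].mean() - y[keep & (d == 0)].mean()

def pair_fe_coefficient(y, d, r, pair):
    """OLS coefficient on d from regressing y on d and pair dummies, non-attritors only."""
    keep = r == 1
    _, idx = np.unique(pair[keep], return_inverse=True)
    dummies = np.eye(idx.max() + 1)[idx]            # one dummy per pair with an observed unit
    X = np.column_stack([d[keep], dummies])
    beta, *_ = np.linalg.lstsq(X, y[keep], rcond=None)
    return beta[0]

# Simulated matched-pair experiment (all numbers are illustrative assumptions).
rng = np.random.default_rng(0)
n = 400
x = np.sort(rng.uniform(size=n))                    # sorting makes adjacent units close in x
pair = np.arange(n) // 2
first = rng.integers(0, 2, size=n // 2)
d = np.empty(n, dtype=int)
d[0::2], d[1::2] = first, 1 - first                 # exactly one treated unit per pair
r = rng.binomial(1, 0.5 + 0.4 * x)                  # attrition depends on the matching variable
y = (x + d * (1 + x) + rng.normal(size=n)) * r      # observed outcome, zero for attritors

complete = np.bincount(pair, weights=r)[pair] == 2  # both units of the pair are observed
theta_n = diff_in_means(y, d, r == 1)               # retain every non-attritor
theta_drop = diff_in_means(y, d, complete)          # drop pairs with an attritor
theta_fe = pair_fe_coefficient(y, d, r, pair)       # pair fixed effects on non-attritors
print(theta_n, theta_drop, theta_fe)
```

Pairs with a single observed unit are fit exactly by their own dummy and so contribute nothing to the coefficient on treatment, which is why `theta_fe` coincides with `theta_drop` up to floating-point error while generally differing from `theta_n`.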
We impose Assumption 3.2 in addition to Assumption 2.1. Assumptions 3.2(a)-(b) are smoothness requirements that ensure that units that are "close" in terms of their baseline covariates are also "close" in terms of their potential attrition indicators and potential outcomes on average. Similar smoothness requirements are also imposed in Bai et al. (2021) and Bai (2022).
Finally, we require, in Assumption 3.3, that the matched-pair design is such that the units in each pair are "close" in terms of their baseline covariates. See Bai et al. (2021) for sufficient conditions for Assumption 3.3. In particular, if $X_i$ is scalar and satisfies a suitable moment condition, Assumption 3.3 holds when we construct pairs by simply ordering the units from smallest to largest according to $X_i$ and then pairing adjacent units. For the case $\dim(X_i) > 1$, Bai et al. (2021) provide sufficient conditions under which Assumption 3.3 is satisfied when using the popular R package nbpMatching. Using appropriate laws of large numbers developed in Bai et al. (2021), we now establish the following result:

Theorem 3.1. Suppose the data satisfy Assumptions 2.1 and 3.2 and the treatment assignment mechanism satisfies Assumptions 3.1 and 3.3. Then, as $n \to \infty$, $\hat\theta_n \xrightarrow{P} \theta^{\rm obs}$, where

$$\theta^{\rm obs} = E[Y_i(1) \mid R_i(1) = 1] - E[Y_i(0) \mid R_i(0) = 1],$$

and $\hat\theta_n^{\rm drop} \xrightarrow{P} \theta^{\rm drop}$, where

$$\theta^{\rm drop} = E\big[ \big( E[Y_i(1) \mid R_i(1) = 1, X_i] - E[Y_i(0) \mid R_i(0) = 1, X_i] \big) \rho(X_i) \big],$$

with

$$\rho(x) = \frac{E[R_i(1) \mid X_i = x] \, E[R_i(0) \mid X_i = x]}{E\big[ E[R_i(1) \mid X_i] \, E[R_i(0) \mid X_i] \big]}.$$

Theorem 3.1 shows that the estimand produced by the difference-in-means estimator, $\theta^{\rm obs}$, is simply the difference in the mean outcomes conditional on not attriting (under the additional assumption that $R_i(1) = R_i(0)$ this could be interpreted as the average treatment effect for units who do not attrit: see Remark 3.3 for details). It follows immediately that, under Assumption 2.2, $\theta^{\rm obs} = \theta$ and thus under this assumption we recover the average treatment effect.
On the other hand, the estimand produced by first dropping units belonging to a pair with an attritor, $\theta^{\rm drop}$, is a complicated function of the mean outcomes and attrition probabilities conditional on the matching variables. First, note that unlike $\theta^{\rm obs}$, $\theta^{\rm drop}$ does not collapse to $\theta$ under Assumption 2.2. Moreover, $\theta^{\rm drop}$ does not collapse to $\theta$ under Assumption 2.3 with $C_i = X_i$. Instead, under Assumption 2.3 with $C_i = X_i$,

$$\theta^{\rm drop} = E[\tau(X_i) \rho(X_i)], \quad \text{where } \tau(x) = E[Y_i(1) - Y_i(0) \mid X_i = x],$$

i.e. $\theta^{\rm drop}$ may be written as a convex weighted average of the conditional average treatment effects $\tau(x)$. In some special cases this convex-weighted average has a simple and transparent interpretation: consider for example a setting where $X_i$ is a binary variable, and suppose that attrition is such that units with $X_i = 1$ always appear in the endline survey, so that $E[R_i(1) \mid X_i = 1] = E[R_i(0) \mid X_i = 1] = 1$, and $\rho(0) = 0$. We thus have that in this case

$$\theta^{\rm drop} = E[Y_i(1) - Y_i(0) \mid X_i = 1],$$

which is the average treatment effect for those units with $X_i = 1$. In contrast, $\theta^{\rm obs}$ does not lend itself to a straightforward causal interpretation in this example (however in Remark 3.3 we provide a favorable interpretation of $\theta^{\rm obs}$ under Assumption 2.3 and the additional assumption that $R_i(1) = R_i(0)$).

In general, straightforward algebra shows that $\rho(x) = 1$ if and only if

$$E[R_i(1) \mid X_i = x] \, E[R_i(0) \mid X_i = x] = E\big[ E[R_i(1) \mid X_i] \, E[R_i(0) \mid X_i] \big]. \tag{7}$$

In words, $\rho(x) = 1$ if and only if the conditional probability of attrition under treatment is inversely proportional to the conditional probability of attrition under control. A natural assumption which guarantees (7) for all $x$ is Assumption 2.4 with $C_i = X_i$, so that attrition is independent of the matching variables $X_i$. Finally, we note that under Assumption 2.4 with $C_i = X_i$, it follows that $\theta^{\rm drop} = \theta^{\rm obs}$. As a result, $\theta^{\rm drop} = \theta$ under Assumptions 2.2 and 2.4. We summarize the above discussion in Corollary 3.1.

We conclude this section by noting that, as explained in the derivation following the statement of Assumption 2.4, Assumptions 2.3 and 2.4 imply Assumption 2.2. In other words, we see that the sufficient conditions provided in Corollary 3.1 under which $\theta^{\rm drop} = \theta$ are in fact stronger than the conditions required for $\theta^{\rm obs} = \theta$. We thus find limited evidence to support the claims that dropping pairs in a matched-pair design helps in reducing attrition bias. However, we emphasize that dropping pairs may potentially help in recovering a convex weighted average of conditional average treatment effects.
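The binary-covariate example can be illustrated with a small simulation. The sketch below is not the paper's construction: it assumes pairs matched exactly on a binary $X_i$, control units that never attrit, and treated units with $X_i = 0$ that always attrit (one way of obtaining $\rho(0) = 0$). The outcome model and all numerical values are likewise assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20_000                                    # even, with an even number of units per X-cell
x = np.repeat([0, 1], n // 2)                 # binary matching variable, already sorted
pair = np.arange(n) // 2                      # adjacent units share X, so pairs match exactly on X
first = rng.integers(0, 2, size=n // 2)
d = np.empty(n, dtype=int)
d[0::2], d[1::2] = first, 1 - first           # one treated unit per pair

y0 = x + rng.normal(size=n)                   # potential outcome under control
y1 = y0 + 1 + 2 * x                           # tau(0) = 1, tau(1) = 3, so the ATE is 2
r1 = (x == 1).astype(int)                     # assumed: treated units with X = 0 always attrit
r0 = np.ones(n, dtype=int)                    # assumed: control units never attrit
r = np.where(d == 1, r1, r0)
y = np.where(d == 1, y1, y0) * r              # observed outcome, set to zero for attritors

keep = r == 1
theta_obs = y[keep & (d == 1)].mean() - y[keep & (d == 0)].mean()
complete = np.bincount(pair, weights=r)[pair] == 2
theta_drop = y[complete & (d == 1)].mean() - y[complete & (d == 0)].mean()
print(f"retain non-attritors: {theta_obs:.2f}")
print(f"drop incomplete pairs: {theta_drop:.2f}")
```

Under these assumptions the estimate after dropping pairs should be close to $\tau(1) = 3$, the average treatment effect for units with $X_i = 1$, whereas the estimate that retains all non-attritors compares treated units with $X_i = 1$ against all control units and should be close to 3.5, which matches neither a conditional nor the overall average treatment effect.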
Remark 3.1. In Appendix A.3 we develop the requisite distributional results to use $\hat\theta_n$ for inference about $\theta^{\rm obs}$. In contrast, the large sample distribution of $\hat\theta_n^{\rm drop}$

Remark 3.2. We note that, in the absence of additional assumptions like Assumptions 2.2-2.4, we are not able to conclude that either $\theta^{\rm obs}$ or $\theta^{\rm drop}$ is less biased for $\theta$ relative to the other, and in fact it is possible to construct data generating processes where either estimand is closer to the true average treatment effect. We present a concrete construction of such a set of DGPs in Appendix A.4.
Remark 3.3. From Theorem 3.1 we also observe that, in the absence of additional assumptions like Assumptions 2.2-2.4, neither $\theta^{\rm obs}$ nor $\theta^{\rm drop}$ can be interpreted as a "treatment effect parameter" (in the sense that neither parameter can be interpreted as an average treatment effect for some subset of individuals or more generally as a weighted average of treatment effects). This is because the subgroup of units who attrit under treatment ($R_i(1) = 0$) may not correspond to the subgroup of units who attrit under control ($R_i(0) = 0$). Under the additional assumption that $R_i(1) = R_i(0)$, so that these subgroups coincide, we obtain

$$\theta^{\rm obs} = E[Y_i(1) - Y_i(0) \mid R_i = 1],$$

and then $\theta^{\rm obs}$ could be understood as the average treatment effect for units who do not attrit. Imposing the same assumption for $\theta^{\rm drop}$ we obtain that

$$\theta^{\rm drop} = \frac{E\big[ (Y_i(1) - Y_i(0)) \, R_i \, E[R_i \mid X_i] \big]}{E\big[ E[R_i \mid X_i]^2 \big]},$$

and then $\theta^{\rm drop}$ could be understood as a "probability of attrition"-weighted average of individual-level treatment effects for the non-attritors. If we additionally impose Assumption 2.3 with $C_i = X_i$, we alternatively obtain

$$\theta^{\rm obs} = E\left[ \tau(X_i) \frac{E[R_i \mid X_i]}{E[R_i]} \right], \qquad \theta^{\rm drop} = E\left[ \tau(X_i) \frac{E[R_i \mid X_i]^2}{E\big[ E[R_i \mid X_i]^2 \big]} \right],$$

where $\tau(x) = E[Y_i(1) - Y_i(0) \mid X_i = x]$. In this case, both parameters can be interpreted as convex weighted averages of conditional average treatment effects, with the main difference being that $\theta^{\rm obs}$ is weighted using the conditional attrition rate $E[R_i \mid X_i]$, whereas $\theta^{\rm drop}$ "doubles down" by weighting using the squared conditional attrition rate $E[R_i \mid X_i]^2$.

Remark 3.4. Regardless of whether or not a practitioner finds the interpretation of $\theta^{\rm drop}$ more or less attractive than the interpretation of $\theta^{\rm obs}$, it is crucial to note that, even in the absence of attrition, inferences produced using robust standard errors obtained from a regression with pair fixed effects are generally conservative, but in some cases may in fact be invalid, in the sense that the limiting rejection probability could be strictly larger than the nominal level. See Bai et al. (2022) and de Chaisemartin and Ramirez-Cuellar (2020) for details.

Stratified Designs with Attrition
In this section we repeat the exercise presented in Section 3.1 but in the context of stratified designs.
Before describing the estimators, we provide a description of the class of treatment assignment mechanisms we consider. In words, our results accommodate any treatment assignment mechanism which first partitions the covariate space into a finite number of "large" strata, and then performs treatment assignment independently across strata so as to achieve "balance" within each stratum. Formally, let $S : {\rm supp}(X_i) \to \mathcal{S}$ be a function which maps the support of the covariates into a finite set $\mathcal{S}$ of strata labels. For $1 \le i \le n$, let $S_i = S(X_i)$ denote the strata label of individual $i$. For $s \in \mathcal{S}$, let

$$D_n(s) = \sum_{1 \le i \le n} (D_i - \nu) I\{S_i = s\},$$

where $\nu \in (0, 1)$ denotes the "target" proportion of units to assign to treatment in each stratum. Intuitively, $D_n(s)$ measures the amount of imbalance in stratum $s$ relative to the target proportion $\nu$.
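Our requirements on the treatment assignment mechanism are then summarized in Assumption 3.4.

As before, the first estimator we consider is the standard difference-in-means estimator computed on non-attritors, $\hat\theta_n$. The second estimator we consider, denoted $\hat\theta_n^{\rm sfe}$, is the estimator of the coefficient on $D_i$ in an ordinary least squares regression of $Y_i$ on $D_i$ and strata fixed effects computed on the non-attritors:

$$Y_i = \beta D_i + \sum_{s \in \mathcal{S}} \delta_s I\{S_i = s\} + \epsilon_i.$$

Similar regression specifications are extremely common in the analysis of stratified randomized experiments. See, for example, Bruhn and McKenzie (2009), Duflo et al. (2015), Glennerster and Takavarasha (2013), de Mel et al. (2019), and Callen et al. (2020). Using appropriate laws of large numbers developed in Bugni et al. (2018), we now establish the following result:

Theorem 3.2. Suppose the data satisfy Assumption 2.1 and the treatment assignment mechanism satisfies Assumption 3.4. Then, as $n \to \infty$, $\hat\theta_n \xrightarrow{P} \theta^{\rm obs}$, where

$$\theta^{\rm obs} = E[Y_i(1) \mid R_i(1) = 1] - E[Y_i(0) \mid R_i(0) = 1],$$

and $\hat\theta_n^{\rm sfe} \xrightarrow{P} \theta^{\rm sfe}$, where

$$\theta^{\rm sfe} = \frac{E\left[ \dfrac{E[Y_i(1) R_i(1) \mid S_i] \, E[R_i(0) \mid S_i] - E[Y_i(0) R_i(0) \mid S_i] \, E[R_i(1) \mid S_i]}{\nu E[R_i(1) \mid S_i] + (1 - \nu) E[R_i(0) \mid S_i]} \right]}{E\left[ \dfrac{E[R_i(1) \mid S_i] \, E[R_i(0) \mid S_i]}{\nu E[R_i(1) \mid S_i] + (1 - \nu) E[R_i(0) \mid S_i]} \right]}.$$

The conclusions we draw from Theorem 3.2 closely mirror those of Theorem 3.1. In this case, under Assumption 2.3 with $C_i = S_i$,

$$\theta^{\rm sfe} = E[\tau(S_i) \lambda(S_i)], \quad \text{where } \tau(s) = E[Y_i(1) - Y_i(0) \mid S_i = s]$$

and

$$\lambda(s) = \frac{E[R_i(1) \mid S_i = s] \, E[R_i(0) \mid S_i = s]}{\nu E[R_i(1) \mid S_i = s] + (1 - \nu) E[R_i(0) \mid S_i = s]} \Bigg/ E\left[ \frac{E[R_i(1) \mid S_i] \, E[R_i(0) \mid S_i]}{\nu E[R_i(1) \mid S_i] + (1 - \nu) E[R_i(0) \mid S_i]} \right],$$

so that $\theta^{\rm sfe}$ is also a convex weighted average of the strata-level treatment effects $\tau(s)$, although the weights $\lambda(s)$ are arguably more complicated to interpret than the weights $\rho(x)$ defined in Section 3.1.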
Straightforward algebra shows that $\lambda(s) = 1$ if and only if

$$\frac{E[R_i(1) \mid S_i = s] \, E[R_i(0) \mid S_i = s]}{\nu E[R_i(1) \mid S_i = s] + (1 - \nu) E[R_i(0) \mid S_i = s]} = E\left[ \frac{E[R_i(1) \mid S_i] \, E[R_i(0) \mid S_i]}{\nu E[R_i(1) \mid S_i] + (1 - \nu) E[R_i(0) \mid S_i]} \right]. \tag{8}$$

Conditions under which this holds seem difficult to articulate in words, but once again a natural assumption which guarantees (8) for every $s \in \mathcal{S}$ is that Assumption 2.4 is satisfied with $C_i = S_i$. We summarize these observations in a corollary analogous to Corollary 3.1.

We conclude this section by stating that, given how closely the results presented in Section 3.2 mirror those in Section 3.1, we do not find compelling evidence to support the idea that stratifying into larger groups resolves the issues surrounding attrition that we explore in this paper.
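One way to see where stratum-specific weights of this kind come from is the finite-sample algebra of the fixed effects regression. By the Frisch-Waugh-Lovell theorem, the coefficient on treatment equals an average of within-stratum differences in means with weights $n_s q_s (1 - q_s)$, where $n_s$ is the number of non-attritors in stratum $s$ and $q_s$ their treated share. The sketch below checks this identity on simulated data; the data generating process, the Bernoulli assignment rule, and the function names are illustrative assumptions rather than the paper's design.

```python
import numpy as np

def ols_strata_fe(y, d, s):
    """Coefficient on d from OLS of y on d and a full set of stratum dummies."""
    _, idx = np.unique(s, return_inverse=True)
    dummies = np.eye(idx.max() + 1)[idx]
    X = np.column_stack([d, dummies])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta[0]

def weighted_within_strata(y, d, s):
    """Average of within-stratum differences in means with weights n_s * q_s * (1 - q_s)."""
    num = den = 0.0
    for label in np.unique(s):
        m = s == label
        q = d[m].mean()                       # treated share among observed units in the stratum
        if q in (0.0, 1.0):
            continue                          # no within-stratum contrast, so the weight is zero
        w = m.sum() * q * (1 - q)
        num += w * (y[m & (d == 1)].mean() - y[m & (d == 0)].mean())
        den += w
    return num / den

rng = np.random.default_rng(1)
n = 3000
s = rng.integers(0, 4, size=n)                # four "large" strata (illustrative)
d = rng.binomial(1, 0.5, size=n)              # Bernoulli assignment, used only for illustration
y_star = s + d * (1 + s) + rng.normal(size=n)
p_stay = np.where(d == 1, 0.9 - 0.15 * s, 0.95 - 0.05 * s)  # attrition varies by stratum and arm
r = rng.binomial(1, p_stay)
keep = r == 1                                 # non-attritors

theta_sfe = ols_strata_fe(y_star[keep], d[keep], s[keep])
theta_w = weighted_within_strata(y_star[keep], d[keep], s[keep])
print(theta_sfe, theta_w)
```

Because attrition changes both $n_s$ and $q_s$ differently across strata, the implied weights depend on the attrition probabilities in each arm, which is consistent with the weights on $\tau(s)$ being harder to interpret than in the matched-pair case.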

Empirical Illustrations
4.1 Re-analysis of Groh and McKenzie (2016)
In this section we illustrate the potential empirical relevance of deciding whether or not to drop pairs with an attrited unit using the experimental data collected in Groh and McKenzie (2016), which implemented a matched-pair design in the presence of attrition. The regression specifications in the paper contain pair fixed effects, which, as explained in Section 3.1, is mechanically equivalent to dropping pairs with an attrited unit when regressing outcomes on a constant and treatment. Groh and McKenzie (2016) study the effect of insuring microenterprises (clients) against macroeconomic instability and political uncertainty in post-revolution Egypt. A baseline survey was completed for 2961 clients, who were then randomly assigned to treatment (1481 individuals) and control (1480 individuals) using a matched-pair design (see footnote 4). In Table 1 we reproduce the intention-to-treat estimates from Table 7 of their paper, which presents estimated treatment effects on profits, revenues, employees and household consumption. "Original" corresponds to the estimates obtained from running the regression specifications in the original paper which include pair fixed effects, and $\hat\theta_n$ corresponds to estimates obtained from running an identical regression specification without pair fixed effects (we note that we were able to successfully reproduce all of the reported estimates from the paper). We find an average absolute percentage difference of 13.82% (see footnote 5) for the point estimates of these effects, with the largest differences appearing for profits and revenue. One caveat to the findings in Table 1 is that the setting does not map exactly into our theoretical results: first, both regressions control for baseline covariates and second, the final assignment contained one stratum with 16 individuals, each belonging to a different branch office. Given this, in Table 2 we report the intention-to-treat estimates without baseline covariates and without this additional stratum.
In this case we find an average absolute percentage difference of 15.61% for the point estimates of the effects. We emphasize that we consider these differences particularly salient given that attrition is quite low (on average 1.4% across the outcomes), and that in the absence of attrition these estimates would be numerically identical, as illustrated by the estimates of the effect of treatment for monthly consumption.
4 Per the authors, they "created matched pairs [...] to minimize the Mahalanobis distance between the values of 13 variables that [they] hypothesized may determine loan take-up and investment decisions". The final assignment contained one stratum with 16 individuals, each belonging to a different branch office. We follow the authors' methodology in keeping this stratum when we conduct our analysis in Table 1. We drop these when we perform additional analyses in Table 2.
5 Here the absolute percentage difference is computed as |Alternative − Original| / |Original| × 100, where "Alternative" is the estimate $\hat\theta_n$ obtained without pair fixed effects.

Re-analysis of Recent Publications in AER & AEJ: Applied
Next, we perform a similar exercise using the data from a systematic survey of all papers published in the American Economic Review (AER) and the American Economic Journal: Applied Economics (AEJ: Applied) from 2020-2022 which conduct matched-pair or stratified experiments in the presence of attrition. For each paper, we collected a set of "relevant" regression specifications, and reproduced these regressions with and without pair/stratum fixed effects (we note that we were able to successfully reproduce all of the reported estimates from each paper). In Figure 1 we report the average absolute percentage change (computed as |Alternative − Original| / |Original| × 100, where "Original" corresponds to the point estimate computed in the paper, and "Alternative" corresponds to the estimate computed from the alternative specification with or without fixed effects) across all specifications for each paper.
Similar to our findings for Groh and McKenzie (2016), we find that there can be noticeable differences in the point estimates with and without fixed effects (although we emphasize that we do not claim that these differences are necessarily statistically significant).

Recommendations for Empirical Practice
We conclude with some recommendations for empirical practice based on our theoretical results. Our main takeaway is that choosing whether or not to include pair/strata fixed effects when attrition is a concern can make a substantive difference to empirical findings and to the interpretation of the resulting estimand. In our view, unless practitioners are interested in recovering the convex-weighted averages produced by θ drop and θ sfe under a conditional independence assumption (Assumption 2.3), primary analyses should be based on regressions without pair/strata fixed effects: the resulting estimand θ obs has a simple interpretation in the absence of any assumptions, and collapses to the average treatment effect under arguably weaker assumptions than θ drop and θ sfe. A secondary benefit of θ obs is that, under the additional assumption that R i (1) = R i (0), θ obs also enjoys an interpretation as a convex-weighted average under Assumption 2.3, with weights which may be more desirable than those appearing in θ drop or θ sfe in that they do not "double-up" on attrition: see Remark 3.3 for details.

A Appendix
A.1 Proof of Theorem 3.1
First we argue that $\hat\theta_n \xrightarrow{P} \theta^{\rm obs}$.
By Lemma S.1.5 in Bai et al. (2021), For B n , it follows Assumptions 3.1, 3.2(a), 3.3, the fact that R i (d) ∈ {0, 1} for d ∈ {0, 1} and therefore has finite second moments, and similar arguments to those in the proof of Lemma S.1.6 of Bai et al. (2021) that as n → ∞, Next, we turn to C n .Note

It follows from Assumption 3.1 and
Next, it follows from Assumptions 2.1(a), 3.2(b), 3.1, 3.3, and similar arguments to those in the proof of Lemma S.1.6 of Bai et al. (2021) that as n → ∞, Moreover, it can be shown using similar arguments to those in the proof of Lemma S.1.6 of Bai et al.
A.2 Proof of Theorem 3.2
First we argue that $\hat\theta_n \xrightarrow{P} \theta^{\rm obs}$.
By Lemma B.3 in Bugni et al. (2018), where we note that an inspection of their proof shows that Assumption 3.4(b) is sufficient to establish their result.Similarly, Hence the result follows by the continuous mapping theorem.Next we argue that θsfe n P → θ sfe .To that end, write θsfe where Di is the projection of D i on the strata indicators, i.e., Di = D i − n 1 (S i )/n(S i ), and for By Lemma B.3 in Bugni et al. (2018) and the continuous mapping theorem, we have Similarly, where in the last equality we used the fact that , where the second equality follows from 1≤i≤n R i Di n1(Si) n(Si) = 0, which is derived as follows: 1≤i≤n The conclusion then follows from the continuous mapping theorem.

A.3 The Limiting Distribution of $\hat\theta_n$
Theorem A.1. Suppose Q satisfies Assumption 2.1 (as well as $E[Y_i^2(d)] < \infty$) and Assumption 3.2 (as well as ), and the treatment assignment mechanism satisfies Assumptions 3.1, 3.3 as well as

where

for $d \in \{0, 1\}$.
Remark A.1. Following arguments similar to those in Bai et al. (2023), we can construct a consistent estimator of $\varsigma^2_{\mathrm{mp}}$. To that end, consider the observed adjusted outcome defined as:

We then propose the following variance estimator:

where

It follows from similar arguments to those used in Bai et al. (2023) that under appropriate assumptions

Proof of Theorem A.1. To begin, note $\hat\theta_n =$
Next, note by Assumption 3.1 that

and similarly for the other three terms. The desired conclusion then follows from Lemma A.1 together with an application of the delta method. In particular, for $g(x, y, z, w) = \frac{x}{y} - \frac{z}{w}$, observe that

and the Jacobian is

Note by the laws of total variance and total covariance that $V$ in Lemma A.1 is symmetric with entries

The conclusion of the theorem then follows from direct calculation.
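For readers who want the delta-method step written out, the gradient of $g(x, y, z, w) = x/y - z/w$ follows from elementary differentiation. The display below is a sketch rather than a reproduction of the original derivation: $\mu$ is a hypothetical label for the probability limit of the four sample averages, and $V$ is assumed to be the limiting covariance matrix of their normalized deviations from Lemma A.1.

```latex
\[
\nabla g(x, y, z, w)
  = \left( \frac{1}{y},\; -\frac{x}{y^{2}},\; -\frac{1}{w},\; \frac{z}{w^{2}} \right)' ,
\]
% Delta method: the limiting variance of the transformed statistic is the
% quadratic form of the gradient, evaluated at the limit point, in V.
\[
\text{limiting variance} \;=\; \nabla g(\mu)' \, V \, \nabla g(\mu) .
\]
```

Under these assumptions, the "direct calculation" amounts to expanding this quadratic form entry by entry.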
Lemma A.1. Suppose Q satisfies Assumption 2.1 (as well as ), and the treatment assignment mechanism

and similarly for the rest. Next, note $(L^{YA1}_{1,n}, L^{A1}_{1,n}, L^{YA0}_{1,n}, L^{A0}_{1,n})$, $n \ge 1$ is a triangular array of normalized sums of random vectors. We will apply the Lindeberg central limit theorem for random vectors, i.e., Proposition 2.27 of van der Vaart (1998), to this triangular array. Conditional on X

For the upper left component, we have

It follows from the weak law of large numbers, the application of which is permitted by

On the other hand, it follows from Assumptions 3.2 and 3.3 that

Meanwhile, )] < ∞, so it follows from the weak law of large numbers as above that

where the first inequality follows by inspection, the second follows from Assumption 3.2 and the Cauchy-Schwarz inequality, the third follows from $(a + b)^2 \le 2a^2 + 2b^2$, the last follows by inspection again, and the convergence in probability follows from (15). Therefore, it follows from (16) that

follows from Markov's inequality conditional on $X^{(n)}$ and $D^{(n)}$, and the fact that probabilities are bounded and hence uniformly integrable, that (L

Otherwise, it follows from similar arguments to those in the proof of Lemma S.1.5 of Bai et al. (2021) that

where L denotes the distribution and $\rho$ is any metric that metrizes weak convergence.
Next, we study $(L^{YA1}_{2,n}, L^{A1}_{2,n}, L^{YA0}_{2,n}, L^{A0}_{2,n})$. It follows from $Q_n = Q^{2n}$ and Assumption 3.1 that

Therefore, it follows from Markov's inequality conditional on $X^{(n)}$ and $D^{(n)}$, and the fact that probabilities are bounded and hence uniformly integrable, that

Similarly,

$+ \, o_P(1)$.
A.5 Additional Details for Empirical Survey in Section 4.2
A.6 Details for Equation (6)
Let $\hat\theta^{\mathrm{drop}}_n$ denote the OLS estimator of $\theta^{\mathrm{drop}}$ in (6) using only observations with $R_i = 1$. By construction, the jth entry of the OLS estimator of the projection coefficient of $D_i$ on the pair fixed effects is

seems non-trivial to characterize and may in fact feature an asymptotic bias in general. For this reason we leave an in-depth study of the limiting distribution of $\hat\theta^{\mathrm{drop}}_n$ to future work.
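The relationship between pair fixed effects and dropping pairs can be checked numerically. The sketch below, written in Python with NumPy, regresses observed outcomes on treatment and a full set of pair dummies using only units with $R_i = 1$, and compares the treatment coefficient with the difference in means computed after dropping pairs with an attrited unit. The function names and toy data are hypothetical and serve only as an illustration under the assumption that each complete pair contains exactly one treated and one control unit.

```python
import numpy as np


def ols_with_pair_fixed_effects(y, d, r, pair):
    """Coefficient on d from OLS of y on d and pair dummies, observed units only."""
    y, d, r, pair = (np.asarray(a) for a in (y, d, r, pair))
    obs = r == 1
    y_obs, d_obs, pair_obs = y[obs], d[obs], pair[obs]
    labels = np.unique(pair_obs)
    # One dummy column per pair that has at least one observed unit.
    dummies = (pair_obs[:, None] == labels[None, :]).astype(float)
    design = np.column_stack([d_obs.astype(float), dummies])
    coef, *_ = np.linalg.lstsq(design, y_obs, rcond=None)
    return coef[0]


def diff_in_means_drop_pairs(y, d, r, pair):
    """Difference in means using only pairs in which both units are observed."""
    y, d, r, pair = (np.asarray(a) for a in (y, d, r, pair))
    _, inverse = np.unique(pair, return_inverse=True)
    observed_per_pair = np.bincount(inverse, weights=r)
    keep = observed_per_pair[inverse] == 2
    y_keep, d_keep = y[keep], d[keep]
    return y_keep[d_keep == 1].mean() - y_keep[d_keep == 0].mean()


# Hypothetical toy data: three pairs, the control unit of pair 2 attrits.
pair = np.array([0, 0, 1, 1, 2, 2])
d = np.array([1, 0, 1, 0, 1, 0])
r = np.array([1, 1, 1, 1, 1, 0])
y = np.array([3.0, 1.0, 5.0, 2.0, 10.0, np.nan])  # nan marks a missing outcome

beta_fe = ols_with_pair_fixed_effects(y, d, r, pair)
theta_hat_drop = diff_in_means_drop_pairs(y, d, r, pair)
print(beta_fe, theta_hat_drop)
```

The two numbers agree up to floating-point error because the dummy for a pair with a single observed unit fits that unit exactly, so the unit contributes nothing to the treatment coefficient.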
every $s \in S$. Assumption 3.4(a) simply requires that treatment assignment be exogenous conditional on the strata labels. Assumption 3.4(b) formalizes the requirement that the assignment mechanism performs treatment assignment so as to achieve "balance" within strata. Assumption 3.4(b) is a relatively mild assumption which is satisfied by most stratified randomization procedures employed in field experiments: see Bugni et al. (2018) for examples.

Figure 1: Average absolute percentage difference for "Original" vs "Alternative" point estimates. Average attrition rate, defined as [number of individuals with missing outcome / total number of individuals], is reported in parentheses below each author label.

Table 1: Estimates Obtained from Empirical Application: Groh and McKenzie (2016)
Note: For each outcome regression specification listed in Table 7 of Groh and McKenzie (2016), we report (a) the original estimates obtained in the paper ("Original"), (b) the estimate on treatment status without pair fixed effects ($\hat\theta_n$), and (c) the attrition rate in % by outcome, defined as [number of individuals with missing outcome / total number of individuals]. The regression specifications here include baseline covariates; see Table 2 for analogous results without baseline covariates included.

Table 2: Estimates Obtained from Empirical Application: Groh and McKenzie (2016)
Note: For each outcome regression specification listed in Table 7 of Groh and McKenzie (2016), we report (a) the original estimates obtained in the paper ("Original"), (b) the estimate on treatment status without pair fixed effects ($\hat\theta_n$), and (c) the attrition rate in % by outcome, defined as [number of individuals with missing outcome / total number of individuals]. The regression specifications here exclude baseline covariates from the authors' original work.

Table 3: Additional notes about each paper used in Figure 1. For each paper considered in Section 4.2, we list the corresponding table/figure and specification(s) replicated in the second column. We include relevant notes for each application in the third column.