A comment on Dincecco et al. (2022): Pre-colonial warfare and long-run development in India

We test the reproducibility and replicability of M. Dincecco, J. Fenske, A. Menon and S. Mukherjee (2022), which reports a positive relationship between pre-colonial interstate warfare and long-run development patterns across India. Overall, we confirm that all of the study's estimates are computationally reproducible using the provided replication package in Stata, but note that the ease of replication could be improved by the provision of code and intermediate data sets for the conflict exposure measure. We test for and find no evidence of data manipulation in the final data sets. Concerning direct replicability, we consider different ways of measuring distance to conflicts and also alternative proxies for both the dependent variable and variables that capture channels by which the main effects operate. We find that some estimates are sensitive to the type of conflict considered. Other estimates are sensitive to the time period considered, most likely due to time heterogeneity in the number of conflicts recorded. Nevertheless, most estimates are substantially in line with the original study.


Introduction
Dincecco et al. (2022) investigate the relationship between pre-colonial interstate warfare and long-run development patterns across India. They construct a new geocoded database of historical conflicts on the Indian subcontinent and find a robust positive relationship between pre-colonial land battle conflict exposure and economic development. In their preferred specification, the authors find that pre-colonial exposure to land battles within 250 km of the district centroid between the years 1000 and 1757 is associated with increased contemporary economic development as measured by district-level luminosity averaged between 1992 and 2010. They argue that districts more exposed to pre-colonial conflict experienced greater early state-making that increased the powers of local government institutions. The more power a local government institution held, the more it promoted local long-term economic development through the provision of domestic security and investments in physical and human capital. In the long run, the authors argue, this led to higher levels of development and less political violence.
The goal of our study is to replicate all of the results of Dincecco et al. (2022) and to add further extensions and robustness checks to the study. We define a positive replication as an estimate of the same sign (positive/negative) and significance (significant/not significantly different from zero) as that reported in the original paper. This definition does not capture differences in the magnitudes of estimates, which we discuss in the text. Our study first successfully replicates 100% of the authors' main findings directly, using the Stata code and data provided by the original study. Given that raw data are not provided by the authors, we calculate the distributions of first digits in the prepared data provided by the study and compare them with the distributions we would expect from non-manipulated data. Using this technique, we find no evidence of manipulation of the data in 100% of tests.
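The replication criterion defined above can be made concrete with a short sketch. The `Estimate` class, the function names, the 10% significance threshold and the example numbers are all illustrative assumptions; the text specifies only that sign and binary significance must both match.

```python
from dataclasses import dataclass


@dataclass
class Estimate:
    coef: float     # point estimate
    p_value: float  # two-sided p-value


def sign(x: float) -> int:
    """Return -1, 0 or 1 according to the sign of x."""
    return (x > 0) - (x < 0)


def is_positive_replication(original: Estimate, replication: Estimate,
                            alpha: float = 0.10) -> bool:
    """Same sign and same binary significance as the original estimate.

    alpha = 0.10 is an assumed threshold, not one stated in the text.
    """
    same_sign = sign(original.coef) == sign(replication.coef)
    same_significance = (original.p_value < alpha) == (replication.p_value < alpha)
    return same_sign and same_significance


def replication_rate(pairs) -> float:
    """Percentage of (original, replication) pairs that replicate."""
    hits = sum(is_positive_replication(o, r) for o, r in pairs)
    return 100.0 * hits / len(pairs)


# Hypothetical numbers, for illustration only.
pairs = [
    (Estimate(0.50, 0.01), Estimate(0.40, 0.03)),    # replicates
    (Estimate(0.50, 0.01), Estimate(0.20, 0.40)),    # loses significance
    (Estimate(-0.10, 0.50), Estimate(-0.30, 0.60)),  # both insignificant, same sign
]
print(f"{replication_rate(pairs):.2f}% of tests replicate")
```

Because the criterion is binary, a replication that halves the coefficient but keeps its sign and significance still counts as positive, which is why magnitudes are discussed separately.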
To assess direct replicability, we apply the methodology of Dincecco et al. (2022) to alternative data. Firstly, we consider alternative proxies for conflict exposure using data that are provided by the authors but not used in the original paper. We find that some estimated coefficients on conflict exposure are sensitive to the type of conflict (60.42% of results replicated) and to the time period over which conflict exposure is considered (66.67% of results replicated). We also explore alternative ways of measuring conflict exposure using data from the Historical Conflict Event Dataset (Miller and Bakar 2023). Here, we are able to replicate the sign and significance of estimates in 100% of alternative measures of conflict exposure. We also find that the number of conflicts varies greatly with time, with most recorded conflicts being registered in the period 1500-1757, the period of the Mughal empire. We examine the extent to which time heterogeneity in recorded conflicts translates to time heterogeneity in the main results of Dincecco et al. (2022) and in the channels by which their main effects operate.
Finally, we re-examine one of the channels by which the main effects of Dincecco et al. (2022) operate: the relationship between pre-colonial conflict exposure and contemporaneous political violence levels. To do so, we use an alternative proxy for political violence based on new data from the Uppsala Conflict Data Program (Sundberg and Melander 2013). We find that the results for this channel are sensitive to the choice of proxy and time period considered in 50% of tests.
The remainder of the paper is organized as follows. Section 2 discusses the results of tests of computational reproducibility. Section 3 covers tests of direct replicability and section 4 concludes.

Computational reproducibility
This section considers computational reproducibility, i.e., the ability to duplicate the results of a study using the same data and procedures as were used by the original investigators. In section 2.1, we show reproducibility of the main results using the Stata code provided by Dincecco et al. (2022). In section 2.2, because only final data sets were provided by the authors, we test for data manipulation in these data sets and find no evidence of it.

Stata reproducibility
Both the code and full data sets are provided by the authors and published on the Economic Journal website, available at https://doi.org/10.5281/zenodo.5583263. We reproduce the paper's main result, an ordinary least squares regression of luminosity on pre-colonial conflict exposure, and main robustness check, which uses two stage least squares and instruments for pre-colonial conflict exposure.
For this analysis, we rely on the same specifications as Dincecco et al. (2022). The OLS specification is
$$Y_{i,j} = \beta\,\text{ConflictExposure}_{i,j} + \lambda\,\text{PopDensity}_{i,j} + \mu_j + X_{i,j}\phi + \varepsilon_{i,j}, \qquad (1)$$
where $i$ indexes districts and $j$ indexes states in modern-day India. $Y_{i,j}$ measures local economic development in terms of luminosity, $\ln(0.01 + \text{Luminosity}_{i,j})$. $\text{ConflictExposure}_{i,j}$ measures pre-colonial conflict exposure, the variable of interest. $\text{PopDensity}_{i,j}$ controls for log population density, $\mu_j$ are state fixed effects and $X_{i,j}$ is a vector of controls for geographic features including latitude, longitude, altitude, ruggedness, precipitation, land quality, dry rice suitability, wet rice suitability, wheat suitability and malaria risk.
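To illustrate how the state fixed effects in the OLS specification operate, the following minimal sketch estimates the slope on a single regressor after demeaning within states. It is a simplification under stated assumptions: the population density and geographic controls are omitted, the data are synthetic and noise-free, and the function names are hypothetical. It is not the authors' Stata code.

```python
import math
from collections import defaultdict


def log_luminosity(luminosity: float) -> float:
    """Dependent variable as defined in the text: ln(0.01 + Luminosity)."""
    return math.log(0.01 + luminosity)


def demean_within(values, groups):
    """Subtract each group's mean from its members (the within transformation)."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for v, g in zip(values, groups):
        sums[g] += v
        counts[g] += 1
    return [v - sums[g] / counts[g] for v, g in zip(values, groups)]


def fe_slope(y, x, groups):
    """OLS slope of y on x with group fixed effects and no other controls."""
    y_t = demean_within(y, groups)
    x_t = demean_within(x, groups)
    sxx = sum(a * a for a in x_t)
    sxy = sum(a * b for a, b in zip(x_t, y_t))
    return sxy / sxx


# Synthetic, noise-free data: y = 2 * x + state effect.
states = ["A", "A", "A", "B", "B", "B"]
state_effect = {"A": 1.0, "B": 5.0}
x = [0.1, 0.4, 0.9, 0.2, 0.6, 0.7]
y = [2.0 * xi + state_effect[s] for xi, s in zip(x, states)]
beta_hat = fe_slope(y, x, states)
print(f"estimated slope: {beta_hat:.3f}")
```

Demeaning within states removes the state intercepts, so the slope is identified only from variation between districts of the same state.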
Table 1 shows the paper's main results, reproduced using the authors' Stata code (original study). Table 2 reproduces the paper's main robustness check, which uses a district's proximity to the Khyber Pass as an instrument for pre-colonial conflict exposure because the Khyber Pass was the main route for invaders coming from Central Asia to India. We found no discrepancies between the original paper and the Stata reproductions.
Dincecco et al. (2022) provide the final data sets, but neither the base data sets nor the code to create them. As such, it was not possible to check data definitions or to recode key variables. Instead, we compare the relative frequency distribution of leading digits in each data set against Benford's law, which states that the proportion $p_n$ of observations beginning with first significant digit $n = 1, 2, \ldots, 9$ is approximately $\log_{10}(1 + 1/n)$. This law holds where data are scale-invariant, i.e., independent of the units chosen.1 Comparing data with this theoretical distribution is a technique used in forensic accounting to look for fraud and manipulation in financial records and other data sets. This test is most likely to capture data fraud when observations have been added, edited, or removed in a way that does not conform to the Benford distribution; see Durtschi et al. (2004) for a discussion. We calculate distributions for scale-invariant variables in all 19 data sets provided by Dincecco et al. (2022) and compare these distributions to the expected distribution.

Distribution of first digits
Figure 1 shows examples of calculated distributions compared with the expected distributions for the variables used in the main specifications of Dincecco et al. (2022). Following da Silva Azevedo et al. (2021), we define a calculated distribution as not conforming to the expected distribution when the mean squared error (MSE) is in excess of 0.015. 2 Panels A to C, which show measures of Luminosity, Conflict Exposure and Population Density, all exhibit MSEs less than the critical value and with distributions that clearly align to the expected frequencies. Panel D shows an example of data in which the calculated frequencies of first digits do not align to the Benford distribution. Because this is latitude data, we would not expect it to do so, given that the Indian subcontinent can lie only within given latitudes, i.e., this data is not scale-invariant. Repeating this process for all scale-invariant variables in the 19 data sets provided by Dincecco et al. (2022), we find no evidence of data manipulation.
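A first-digit comparison of this kind can be sketched in a few lines. The expected shares follow Benford's law, $\log_{10}(1 + 1/n)$. The MSE below is taken over the nine digit shares expressed as proportions, which is an assumption: the exact scaling used by da Silva Azevedo et al. (2021) is not given here, so values from this sketch may not be directly comparable with the 0.015 cut-off. The two example series are synthetic.

```python
import math
from collections import Counter


def benford_expected():
    """Expected share of each first digit 1..9 under Benford's law."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def first_digit(x):
    """First significant digit of a non-zero number, read off scientific notation."""
    return int(f"{abs(x):.15e}"[0])


def first_digit_shares(values):
    """Observed share of each first digit, ignoring zeros."""
    digits = [first_digit(v) for v in values if v != 0]
    counts = Counter(digits)
    return {d: counts.get(d, 0) / len(digits) for d in range(1, 10)}


def benford_mse(values):
    """Mean squared difference between observed and expected shares (assumed scaling)."""
    expected = benford_expected()
    observed = first_digit_shares(values)
    return sum((observed[d] - expected[d]) ** 2 for d in range(1, 10)) / 9


# Synthetic examples: powers of two are known to track Benford's law closely,
# whereas values confined to a narrow range (like latitudes) do not.
powers_of_two = [2 ** k for k in range(1000)]
latitude_like = [(80 + i) / 10 for i in range(290)]  # 8.0, 8.1, ..., 36.9
print(benford_mse(powers_of_two) < benford_mse(latitude_like))
```

Reading the digit from the formatted string avoids the floating-point error that dividing by a power of ten can introduce (for example, 0.3 / 0.1 evaluates to slightly less than 3).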

Direct replicability
In this section, we test the ability to duplicate the results of Dincecco et al. (2022) [...] appendix). They also explore a variable end-date cut-off, which allows them to include exposure to conflicts after 1757 but before British conquest of a district (table A.16, online appendix). In both cases, they find that the coefficient estimates are very similar in magnitude and significance to the main estimates. This section explores further robustness checks by using alternative measures of exposure to conflict.3 We begin by replicating the measure used in the paper, and then consider several alternative measures.
Dincecco et al. (2022) define the exposure to conflict as the sum of the inverse distance between each district centroid and pre-colonial conflicts that occurred between 1000 and 1757 within a radius of 250 km:
$$\text{ConflictExposure}_i = \sum_{c \in C} \left(1 + \text{distance}_{i,c}\right)^{-1}, \qquad (2)$$
where $C$ is the set of these conflicts and $\text{distance}_{i,c}$ is measured from the centroid of district $i$ to the location of conflict $c$. We use data from the Historical Conflict Event Dataset (HCED) of Miller and Bakar (2023) and construct the measure of exposure to conflict in equation (2), keeping only land battles that occurred in the Indian subcontinent4 between 1000 and 1757 and within 250 km. We calculate the geographical distance for each district and conflict pair. We do so by measuring the length of the shortest path between the two points along the surface of the earth, using the method of Vincenty (1975) to calculate distances on a reference ellipsoid. Figure 2 compares distributions of the original and reconstructed measures using histograms. They are not identical but close in distribution, with a notably fatter right tail in the replication. Table 6 compares the number of HCED conflicts with the number of conflicts calculated using data provided by Dincecco et al. (2022).5 There are fewer conflicts in total in the HCED data, concentrated between the periods 1000-1100 and post-1500.
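The construction of the exposure measure can be sketched as follows, reading equation (2) as the sum of $(1 + \text{distance}_{i,c})^{-1}$ over land battles between 1000 and 1757 within 250 km. One substitution should be noted plainly: this sketch uses the haversine great-circle formula on a sphere rather than Vincenty's (1975) ellipsoidal method, so distances will differ slightly from those in the replication. Function names and coordinates are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; a spherical approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km (haversine, not Vincenty's ellipsoidal method)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def conflict_exposure(centroid, conflicts, start=1000, end=1757, cutoff_km=250.0):
    """Sum of 1 / (1 + distance) over conflicts inside the time window and radius.

    centroid: (lat, lon); conflicts: iterable of (lat, lon, year).
    """
    lat, lon = centroid
    total = 0.0
    for c_lat, c_lon, year in conflicts:
        if not (start <= year <= end):
            continue
        d = haversine_km(lat, lon, c_lat, c_lon)
        if d <= cutoff_km:
            total += 1.0 / (1.0 + d)
    return total


# Hypothetical coordinates and years, for illustration only.
centroid = (28.6, 77.2)
conflicts = [
    (28.6, 77.2, 1526),  # at the centroid, inside the window: contributes 1
    (13.0, 80.2, 1600),  # inside the window but far beyond 250 km
    (28.7, 77.3, 1800),  # nearby but after 1757
]
exposure = conflict_exposure(centroid, conflicts)
```

Adding one to the distance caps the contribution of any single conflict at one, which is what keeps a battle fought at the centroid from dominating the measure.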
In columns (1) and (2) of table 4, we compare how the results of the OLS and IV specifications differ when we use the reconstructed measure (panel B) in place of that provided by Dincecco et al. (2022) (panel A). The main parameter of interest, i.e., the effect of exposure to conflict on present-day development, is still positive and statistically significant but marginally smaller in magnitude. As discussed in the preceding paragraph, measures using HCED data do not exactly match those of Dincecco et al. (2022). Dincecco et al. (2022) also do not clarify the exact algorithm that they rely on to measure distance. To understand whether the different data or different distance measurements are driving the change in magnitudes, we repeat the analysis using data from Dincecco et al. (2022) [...] that significance, signs, and magnitudes are almost exactly equal. We conclude that it is most likely the differences in the data that are driving the slight differences in estimated magnitudes.
Using HCED data, we now summarize the alternative measures of distance to conflict that we consider. In equation (2), Dincecco et al. (2022) add one to $\text{distance}_{i,c}$ before taking the inverse because not including this would mean that a district in which a conflict took place very near to the centroid would receive a large conflict exposure value, regardless of its proximity to any other conflicts.6 Nevertheless, this measure still gives a lot of weight to nearby conflicts and is not unit-invariant in relative terms. In order to explore the sensitivity of estimates to variation induced by smaller distances, we first explore how the results change if we scale distances by a factor $u > 0$, i.e.,
$$\text{ConflictExposure}_i = \sum_{c \in C} \left(1 + \text{distance}_{i,c}/u\right)^{-1}. \qquad (3)$$
As $u$ increases, the weight put on a given conflict decays less rapidly as its distance from the centroid increases, i.e., conflicts that are further away contribute more to conflict exposure.7 In the first and second alternative measures, we use the baseline measure as in equation (2) but explore using different distance units: 10 km and 100 km. If originally a distance from conflict a to district centroid b measured x kilometres, we now record x/10 (in units of 10 km) and x/100 (in units of 100 km), respectively. Here, we still follow their benchmark of including conflicts that occurred between 1000 and 1757, and within 250 km of the district. Comparing the OLS and IV specifications using these measures, panels C and D of table 4, compared with panel B, show that estimates decrease in magnitude in both specifications as $u$ increases. However, the standardized coefficients are broadly unchanged and the sign and significance of the estimates are unaffected.
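The effect of the scaling factor can be illustrated numerically. The sketch below assumes that the scaled measure takes the form $\sum_{c}(1 + \text{distance}_{i,c}/u)^{-1}$, which is the form implied by footnote 7 with $k = 1$ and unit elasticity; the distances are made up.

```python
def scaled_exposure(distances_km, u=1.0, cutoff_km=250.0):
    """Sum of 1 / (1 + d / u) over conflicts within the cut-off radius.

    u is the distance unit in km: u = 1 is the baseline, u = 10 and u = 100
    correspond to measuring distance in units of 10 km and 100 km.
    """
    if u <= 0:
        raise ValueError("u must be positive")
    return sum(1.0 / (1.0 + d / u) for d in distances_km if d <= cutoff_km)


# Hypothetical distances (km) from one district centroid to four conflicts;
# the last lies outside the 250 km radius and is ignored.
distances = [0.0, 50.0, 200.0, 400.0]
by_unit = {u: scaled_exposure(distances, u) for u in (1.0, 10.0, 100.0)}
for u, value in by_unit.items():
    print(f"u = {u:>5}: exposure = {value:.4f}")
```

In this example the exposure value rises with $u$ while never exceeding the count of conflicts inside the radius, so a coefficient estimated on the rescaled variable would be expected to shrink mechanically. That is why the comparison of standardized coefficients is the informative one.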
Secondly, we explore a Gaussian transformation of $\text{distance}_{i,c}$ (measured in units of 100 km) [...], which captures distance decay, a concept used in geography to describe the decline of influence on cultural or spatial interactions between places as distance increases (Pun-Cheng 2016). This measure of conflict exposure is positively correlated with the original exposure measure of Dincecco et al. (2022). Panel E of table 4 shows that the sign and significance of the estimated coefficients are unaffected, and the magnitudes of the standardized coefficients are marginally higher. Finally, we define a simple count-based measure of exposure to conflict, which is calculated as the number of conflicts within 250 km of the district centroid. Note that this corresponds to the limiting case in equation (3) as $u \to \infty$. Panel G of table 4 shows that the sign and significance of the estimated coefficients are again unaffected, and magnitudes are marginally higher.
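The limiting case can be checked in one line, assuming the scaled measure has the form implied by footnote 7 with $k = 1$ and unit elasticity, and writing $C$ for the set of conflicts within 250 km of the centroid of district $i$:

```latex
\lim_{u \to \infty} \sum_{c \in C} \left(1 + \frac{\text{distance}_{i,c}}{u}\right)^{-1}
  = \sum_{c \in C} \left(1 + 0\right)^{-1}
  = |C| .
```

Each term tends to one because $\text{distance}_{i,c}$ is bounded by the 250 km radius, so the sum converges to the number of conflicts in $C$, which is the count-based measure.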
In conclusion, 100% of tests using alternative measures of conflict exposure replicate the sign and significance of the original paper's results and are close to them in magnitude.
6 The authors also show that the results are robust to simply using the inverse of distance.
7 Re-expressing equation (3) as $\sum_{c \in C} (k + \text{distance}_{i,c}/u)^{-\epsilon}$, we note that the same effect could be achieved by increasing the $k$ term or by decreasing the absolute magnitude of the elasticity parameter, $\epsilon$.

Different time periods for pre-colonial conflict exposure
The replication of section 3.2 highlighted the heterogeneity in the number of conflicts recorded per century in the data of Dincecco et al. (2022). In this section, we therefore test the sensitivity of their results to the period of time over which pre-colonial conflict exposure is measured. In table 5, we replicate the main OLS results (table 1 of Dincecco et al. 2022) and IV specification (table 2 of Dincecco et al. 2022), respectively, but break down the pre-colonial conflict exposure variable into 100-year periods. Concretely, row 1 is exposure during years 1000-1757 (i.e., replicating the original study). Row 2 reports the estimates for the time period 1000-1100, row 3 for exposure during years 1101-1200 and so on until row 8 for years 1601-1700. Note that for the time period 1400-1500, estimates are omitted because there were no recorded conflicts in the Dincecco et al. (2022) data. We replicate the OLS results in terms of sign and binary significance in 100% of tests, and in 83% of the IV specification tests. However, we see great heterogeneity in estimates over 100-year time blocks. In the OLS results of table 5, in later time blocks (1501-1600 and 1601-1700), we find estimates of a much smaller magnitude. Earlier time periods, 1000-1100 to 1301-1400, have estimated coefficients that are larger. We see a similar pattern in the IV results, and in this specification, time block 1601-1700 now shows an insignificant effect. We note that Dincecco et al. (2022) restrict the time period for the conflict data to the sub-period of 1500 to 1757, re-estimate their main findings (i.e., replicating their tables 1 and 2) in their online appendix (their tables A.13 and A.14) and report robust results.
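The breakdown of exposure into 100-year blocks can be sketched as below. The block boundaries follow the rows listed in the text (1000-1100, 1101-1200, ..., 1601-1700); the inverse-distance weighting with a 250 km cut-off is carried over from the baseline measure as an assumption, and the conflict records are invented.

```python
def century_blocks(first=1000, last=1700):
    """Blocks (1000, 1100), (1101, 1200), ..., (1601, 1700), as listed in the text."""
    blocks = [(first, first + 100)]
    start = first + 101
    while start < last:
        blocks.append((start, start + 99))
        start += 100
    return blocks


def exposure_by_block(conflicts, blocks, cutoff_km=250.0):
    """Inverse-distance exposure per time block for one district.

    conflicts: iterable of (distance_km, year) pairs.
    """
    totals = {block: 0.0 for block in blocks}
    for distance_km, year in conflicts:
        if distance_km > cutoff_km:
            continue
        for lo, hi in blocks:
            if lo <= year <= hi:
                totals[(lo, hi)] += 1.0 / (1.0 + distance_km)
                break
    return totals


# Hypothetical (distance in km, year) records for a single district.
conflicts = [(10.0, 1050), (0.0, 1526), (100.0, 1556), (300.0, 1650)]
blocks = century_blocks()
totals = exposure_by_block(conflicts, blocks)
```

A block with no recorded conflicts yields zero exposure for every district, so the regressor has no variation and no coefficient can be estimated for it. This matches the omission of one period from table 5.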
Table 6 in our paper reports the number of pre-colonial conflicts, while table 7 reports the means of conflict exposure broken down by 100-year time periods. Both the number of conflicts and average exposure are much higher after 1500. This suggests that the observed pattern of coefficients could reflect a size effect owing to the lower number of conflicts before 1500, perhaps due to selection in the recording of conflicts. However, it could also be that conflicts before 1500 had a larger effect on current economic conditions due to decreasing returns to conflict exposure; either mechanism would rationalize smaller estimates. In order to analyze this further, we examine the time heterogeneity in the channels by which the main effect operates according to the theoretical framework of Dincecco et al. (2022). We therefore use this sub-period measure to estimate the remaining results of the paper, i.e., their tables 3 to 9.
Firstly, table 8 replicates table 3 in the original study, which estimates the relationship between pre-colonial state-making and pre-colonial conflict exposure. An important prediction of the theoretical framework of Dincecco et al. (2022) is a positive relationship between the two as measured by the number of important Mughal sites, and districts incorporated into the Mughal empire by rulers Babur and Akbar. We replicate the original study in the upper panel of table 8, and in the lower panel we use the later time period for pre-colonial conflict (years 1500-1757). We find a significant and positive relationship for important Mughal sites, and districts incorporated into the Mughal empire by rulers Babur and Aurangzeb. We also note that the estimated magnitude of the effect is greater during the later years, 1500-1757, than for earlier years.
Table 9 replicates the regression of colonial fiscal development on pre-colonial conflict exposure. Dincecco et al. (2022) report a positive and significant relationship for conflict exposure between 1000 and 1757, which the authors argue is suggestive evidence that pre-colonial conflict exposure played a role in colonial-era state-making. Using the later time period for the conflict exposure measure provided in the original study, 1500-1700, in row 2, we show results that are broadly consistent with the measures examined in the original study. Such measures include different scalings of the available tax revenue in 1881, by area and by person across states with direct rule (British India) or indirect rule (Princely states), and tax revenue in 1931 scaled by area and by person. Again, we find that estimated magnitudes are marginally greater during the later years, 1500-1757.
Table 10 replicates table 5 of Dincecco et al. (2022), which examines the relationship between pre-colonial conflict, colonial and post-colonial conflict. The authors find a positive and significant relationship with colonial conflict exposure between 1758-1839, a negative relationship with post-colonial conflict exposure and no relationship with colonial conflict exposure during 1840-1946. We find these results are robust to the use of the later time period for pre-colonial conflict exposure and that estimated magnitudes are greater during the later years, 1500-1757.

NOTES: The dependent variable is ln(0.01 + Luminosity) averaged between 1992 and 2010. Variable of interest is pre-colonial exposure to land battles between 1000 and 1757, with rows indicating the time period used in each specification in 100-year blocks. Conflict exposure during the period 1400-1500 is omitted as no land battles occurred during this time in the data. Geographic controls include latitude, longitude, altitude, ruggedness, precipitation, land quality, dry rice suitability, wet rice suitability, wheat suitability and malaria risk. Population density is ln(Population Density) in 1990. Robust SEs in parentheses are calculated using the robust command in Stata. Significant at the ***1%, **5%, *10% levels.
Row 1 of table 11 reports the replication of table 6 of Dincecco et al. (2022), which estimates a negative relationship between pre-colonial conflict and post-colonial political violence. The authors argue that reductions in political violence should result from greater colonial state-making. Row 2 shows the replication with the subset of conflicts, finding similar results. While the relationship with linguistic fractionalization (column (2)) is no longer significant, the relationship with political violence remains robust. Note that for column (2), which measures local Maoist control in 2003, the number of observations in the replication is lower than in the original study (293 versus 395) because this data set does not include the later time period data for pre-colonial conflict, and we are able to match observations for only a subset of the original data.8 Nevertheless, we estimate a relationship that is similar to the original study.
Table 12 reports the results from replicating table 7 of Dincecco et al. (2022), which estimates the relationship between pre-colonial conflict and irrigation infrastructure. They find a large positive relationship, which they argue is consistent with their theoretical framework: it predicts greater state-making in areas with more conflict exposure, resulting in more investment in physical capital. The relationship with the share of non-agricultural workers in 2011 (% Non-agriculture, column (4)) remains robust to the later time period for pre-colonial conflict exposure and is higher in magnitude for the later time period. However, column (1) shows that the positive relationship with the proportion of area sown with canal irrigation in 1931 (% Irrigated) is no longer significantly different from zero when using the later time period for conflict exposure. In columns (2) and (3), we are not able to directly replicate the results of Dincecco et al. (2022) as the data containing irrigation rates and crop yields do not contain the conflict exposure data for 1500-1757. We are able to match data for only 208 of the original 271 observations. With this subset of data, the relationship with irrigation rates averaged between 1956 and 1987 (column (2)) and the relationship with crop yield (column (3)) are no longer significantly different from zero. However, it is impossible to ascertain whether this is due to the later time period considered for conflict exposure or to missing observations.

NOTES: Estimation method is OLS using data from Dincecco et al. (2022). Unit of analysis is district. Dependent variable is colonial conflict exposure to land battles between 1758 and 1839 in column (1) and to all conflict types in column (2). Similarly, it is colonial conflict exposure between 1840 and 1946 in columns (3) and (4), and post-colonial conflict exposure between 1947 and 2010 in columns (5) and (6). Variable of interest is pre-colonial conflict exposure to land battles between 1500 and 1757. Geographic controls include latitude, longitude, altitude, ruggedness, precipitation, land quality, dry rice suitability, wet rice suitability, wheat suitability and malaria risk. Population density is ln(PopulationDensity) in 1750 in columns (1) and (2), in 1850 in columns (3) and (4) and in 1950 in columns (5) and (6). Robust SEs in parentheses are calculated using the robust command in Stata. Significant at the ***1%, **5%, *10% levels.

NOTES:
Estimation method is OLS using data from Dincecco et al. (2022). Unit of analysis is district. Dependent variables are as follows: % Irrigated measures the proportion of area sown with canal irrigation in 1931 (column (1)) and the proportion of gross cropped area that is irrigated averaged between 1956 and 1987 (column (2)), ln(Yield) measures the total yield across 15 major crops averaged between 1956 and 1987 (column (3)) and % Non-agriculture measures the share of non-agricultural workers in 2011 (column (4)). Variable of interest is pre-colonial conflict exposure to land battles between 1000 and 1757. Geographic controls include latitude, longitude, altitude, ruggedness, precipitation, land quality, dry rice suitability, wet rice suitability, wheat suitability and malaria risk. Population density is ln(PopulationDensity) in 1900 in column (1), in 1950 in columns (2) and (3) and in 1990 in column (4). Note that the number of observations in columns (2) and (3) differs between the original study and the replication due to data unavailability; see section 3.3 for a discussion. Robust SEs in parentheses are calculated using the robust command in Stata. Significant at the ***1%, **5%, *10% levels.

Table 13 replicates table 8 in Dincecco et al. (2022), which estimates the relationship between pre-colonial conflict exposure and district-level literacy rates under British colonial rule. Dincecco et al. (2022) estimate no relationship for literacy rates in 1881 and 1921 but strong positive relationships in 1961-1991 and 2011 (upper panel of table 13). We are able to directly replicate the results in columns (1) and (2) using the conflict exposure data from 1500 to 1757 provided by the authors, and also find no significant relationships. For literacy rates in 1961-1991 and 2011, the authors do not provide sufficient data in the replication package to create these estimates, as the data containing literacy rates do not include conflict exposure data for the limited time period. We create this variable but are unable to recover as many observations as the original study for columns (3) (we recover 264 observations versus 271 in the original study) and (4) (541 recovered versus 626 in the original study). Using this subset of data, we no longer find a significant relationship between literacy rates and pre-colonial conflict exposure. However, we are unable to disentangle whether this is due to the effect of dropped observations or to the conflict exposure period used.
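The matching problem described above, where an outcome file lacks the 1500-1757 exposure variable and must be linked to another file by state and district, can be sketched as a key-based merge that reports how many observations survive. Everything here is hypothetical: the names, the values and the normalisation rule (lower-casing and collapsing whitespace) are assumptions, not the procedure actually used.

```python
def normalise(name: str) -> str:
    """Lower-case and collapse whitespace so that trivially different spellings match."""
    return " ".join(name.lower().split())


def match_districts(outcomes, exposure):
    """Link two data sets keyed by (state, district) tuples.

    Returns the matched records and the list of outcome keys left unmatched.
    """
    lookup = {(normalise(s), normalise(d)): v for (s, d), v in exposure.items()}
    matched, unmatched = {}, []
    for (state, district), y in outcomes.items():
        key = (normalise(state), normalise(district))
        if key in lookup:
            matched[key] = (y, lookup[key])
        else:
            unmatched.append((state, district))
    return matched, unmatched


# Hypothetical records, for illustration only.
outcomes = {
    ("State A", "District One"): 0.42,
    ("State A", "district  two"): 0.35,
    ("State B", "District Three"): 0.51,
}
exposure = {
    ("state a", "District one"): 1.2,
    ("State A", "District Two"): 0.7,
}
matched, unmatched = match_districts(outcomes, exposure)
print(f"matched {len(matched)} of {len(outcomes)} observations")
```

Listing the unmatched keys explicitly makes it possible to check whether the dropped districts differ systematically from the retained ones, which bears on whether a change in significance reflects the time period or the sample.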
Finally, table 14 re-examines the relationship between pre-colonial conflict exposure and education provision by district. In column (1), the dependent variable is the proportion of villages with primary schools in 1981, column (2) measures the proportion with high schools, and column (3) is the infant mortality rate in 1991. Dincecco et al. (2022) find a positive relationship with primary education provision, and negative relationships with high school provision and infant mortality rates. They interpret this finding as conflict exposure promoting state-making which favoured investment in basic human capital at the expense of advanced human capital investments. For the Mughal empire period, we estimate a relationship of the same significance only for primary provision; the coefficients on high school and infant mortality are not significantly different from zero. We also note that the magnitude of the coefficient on primary education is much higher in the later time period. However, as again we are unable to match all observations, we cannot comment on whether the differences are due to reduced observations or to the effects of time period heterogeneity.
To further understand the importance of time heterogeneity in the number of conflicts recorded by century discussed in section 3.2, this section replicated the results of Dincecco et al. (2022) over 100-year time blocks, finding heterogeneous time effects. In particular, restricting the sample to the periods 1501-1600 and 1601-1700, the time of the Mughal empire, resulted in estimates of a much smaller magnitude. Re-examining the mechanisms via which the authors' effects operate for the period 1500-1757, we find estimates of the same sign and significance during this time period in 66.67% of replications. Furthermore, estimated magnitudes for estimates of channels are in general higher for this time period, providing suggestive evidence that the heterogeneous time effects estimated in table 5 are more likely due to size effects than diminishing returns to conflict exposure.

Alternative political violence data source
One of the three predictions of the conceptual framework of Dincecco et al. (2022) is that districts that were more exposed to pre-colonial conflict would experience lower political violence in the long term as a result of greater state-making. In this section, we use alternative conflict data provided by the Uppsala Conflict Data Program (UCDP) (Sundberg and Melander 2013) to re-estimate the relationship between organized violence per district between 2015 and 2018 and pre-colonial conflict exposure. The UCDP data cover individual events of organized violence, which they define as the phenomena of lethal violence occurring at a given time and place.
We use the same time period in the UCDP as Dincecco et al. (2022) use with their ACLED data source, 2015 to 2018. During this time period, there were incidents in 36 states and 659 districts, according to ACLED data used by Dincecco et al. (2022). The UCDP data include fewer incidents, in 28 states and 288 districts, during the same time period. We present the results using this different conflict data source in columns (2) and (3) of table 15. Using the UCDP data, we no longer find a negative and significant relationship between pre-colonial conflict exposure and contemporary political violence when using the same time period (2015 to 2018) and geographical level (district) as Dincecco et al. (2022). If we include conflicts between 2001 and 2021 (column (3), table 15), we do confirm the negative and significant relationship but note that it is sensitive to changes in the time period considered.
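The sensitivity to the time window can be illustrated by how the district-level outcome is assembled from event records. The sketch below simply counts events per district within a chosen window; the record layout and district labels are hypothetical, and the actual dependent variable in table 15 may be scaled or transformed differently.

```python
from collections import Counter


def events_per_district(events, first_year, last_year):
    """Count recorded events per district within [first_year, last_year].

    events: iterable of (district, year) pairs, one per recorded event.
    """
    return Counter(d for d, y in events if first_year <= y <= last_year)


# Hypothetical event records, for illustration only.
events = [("D1", 2003), ("D1", 2016), ("D2", 2017), ("D2", 2017), ("D3", 2020)]
narrow = events_per_district(events, 2015, 2018)  # window used with the ACLED data
wide = events_per_district(events, 2001, 2021)    # extended window
```

Districts absent from the counter have zero recorded events in that window. With sparse event data, a short window leaves many districts at zero, which reduces the variation available to identify a relationship.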

Conclusion
Table 16 consolidates the results of all tests discussed in this paper. We confirm direct reproducibility of 100% of the main results using the provided replication package in Stata (table 16, computational reproduction). By testing the distribution of first digits and comparing it to an expected distribution, we find no evidence of data manipulation in any data sets provided by the authors (first digits). For direct replicability, we consider alternative measures of conflict exposure using the Historical Conflict Event Dataset (Miller and Bakar 2023). Because Dincecco et al. (2022) do not include code and intermediate data for their conflict exposure measure, we are unable to replicate an exact match for the measure of conflict exposure. However, we replicate the sign and significance of the original findings in 100% of tests. We also examine alternative proxies for conflict exposure provided by the authors. Dincecco et al. (2022) argue that such alternative proxies were more likely to capture battles that affected the capital stock, diminishing the proposed mechanisms. Nevertheless, in 60.42% of tests (total, alternative conflict proxies) we are able to replicate the sign and significance reported by Dincecco et al. (2022), but note that some magnitudes of estimates also differ from the original paper. When considering different time periods between years 1000 and 1757 for the conflict exposure proxy, we are able to replicate results in 91.67% of tests (total, different time periods [main results]), but report heterogeneity in the magnitude of estimated coefficients across 100-year time blocks, with larger estimates concentrated in the pre-1500 time period. Analysis of the number of conflicts and mean of conflict exposure shows that there is some evidence for a size effect given that many more conflicts are recorded in the post-1500 period; however, another explanation could be diminishing returns to conflict exposure in later periods. To examine this latter explanation, we replicate all results focusing on the channels by which their main results operate for the 1500-1757 period. In this analysis, we replicate their results in 66.67% of tests (total, different time periods [mechanisms results]). For pre-colonial era state-making, colonial fiscal development, post-colonial conflict and post-colonial violence we find relationships that are broadly aligned with those estimated by Dincecco et al. (2022).
For irrigation infrastructure, literacy, presence of high schools and infant mortality, we do not find significant results using a later time period for conflict exposure. However, we note that a number of these results might be replicable if the authors provided a crosswalk for state and district between all data sets in their replication package. In general, we find higher magnitudes for estimates in the post-1500 period, suggestive evidence that size effects may play more of a role in time heterogeneity in the effects of conflict exposure on economic development than diminishing returns to exposure. Using an alternative proxy for political violence from the UCDP (Sundberg and Melander 2013), we are not able to replicate estimates using the time period for the dependent variable used in Dincecco et al. (2022) but are able to replicate the findings using a larger window including more recent data. Taken together, we confirm direct replicability in 69.52% of direct tests (total direct). Taking all results together (grand total), we find that 75.38% of tests have a positive replication result. Contributors to the journal are required to provide all the components necessary for others to duplicate the results of a study using the same materials and procedures as were used by the original investigator. With these comprehensive criteria of the journal in mind, and based on the replication package provided by the authors, we argue that ease of replicability could be increased by the inclusion of a crosswalk linking observations in the main and auxiliary data sets used in the channels analysis, and by the provision of code and intermediate data for the construction of the conflict exposure measure. However, we conclude that the results of Dincecco et al. (2022) are replicable and the replicated estimates are substantially in line with the original study.

NOTES:
We define a positive replication as an estimate of the same sign (positive/negative) and significance (significant/not significantly different from zero) as that reported in the original paper. This definition of course sets aside differences in the magnitudes of estimates, which we discuss in the main text.
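
The definition of a positive replication given in the notes (same sign and same significance status as the original estimate) can be illustrated with a minimal Python sketch. The `Estimate` class, the function names, the 10% significance threshold and the example numbers are all assumptions made for illustration; they are not taken from table 16 or from the replication package.

```python
from dataclasses import dataclass


@dataclass
class Estimate:
    coef: float      # point estimate
    p_value: float   # two-sided p-value for H0: coef = 0


def is_positive_replication(original, replication, alpha=0.10):
    """True if sign and significance status both match the original.

    alpha = 0.10 is an assumed threshold (the weakest level starred in the tables).
    """
    same_sign = (original.coef > 0) == (replication.coef > 0)
    same_significance = (original.p_value < alpha) == (replication.p_value < alpha)
    return same_sign and same_significance


def replication_share(pairs, alpha=0.10):
    """Percentage of (original, replication) pairs that count as replicated."""
    hits = sum(is_positive_replication(o, r, alpha) for o, r in pairs)
    return 100.0 * hits / len(pairs)


# Hypothetical estimates, for illustration only.
pairs = [
    (Estimate(0.50, 0.01), Estimate(0.30, 0.04)),    # same sign, both significant
    (Estimate(0.50, 0.01), Estimate(0.30, 0.20)),    # significance lost
    (Estimate(-0.20, 0.03), Estimate(-0.10, 0.08)),  # same sign, both significant
    (Estimate(0.10, 0.50), Estimate(0.20, 0.60)),    # same sign, both insignificant
]
print(replication_share(pairs))
```

Because the rule looks only at sign and significance, a replicated coefficient that is half the size of the original still counts as a positive replication, which is why magnitudes are discussed separately in the main text.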

FIGURE 1 Empirical Benford distributions
NOTES: Panel A: Proxy for economic development as measured by luminosity, the dependent variable, exponentiated to recover the distribution of digits. These data fit the expected distribution with a mean squared error (MSE) of 0.0003. Panel B: Conflict exposure, the main variable of interest, as measured by land battles between 1000 and 1757 within a radius of 250 km. These data fit the expected distribution with an MSE of 0.0003. Panel C: Population density, a control variable, exponentiated to recover the distribution of digits. These data fit the expected distribution with an MSE of 0.0002. Panel D: Latitude as measured using district centroids. These data fit the expected distribution with an MSE of 0.03, which fails the Benford test, as we would expect given the assigned nature of the data.
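
As a rough illustration of the first-digit comparison summarized in figure 1, the Python sketch below computes the share of each leading digit and its mean squared error against the Benford distribution, P(d) = log10(1 + 1/d). It assumes that the MSE is the average squared gap between observed and expected shares over the nine digits; the exact formula used for the figure is not stated here, and the function names and example data are hypothetical.

```python
import math
from collections import Counter


def first_digit(x):
    """Leading non-zero digit of x, or None for zero and non-finite values."""
    if x == 0 or not math.isfinite(x):
        return None
    # Scientific notation puts the leading digit first, e.g. '3.45...e-03'.
    return int(f"{abs(x):.15e}"[0])


def benford_mse(values):
    """Mean squared error between observed first-digit shares and Benford's law.

    Assumption: the error is averaged over the nine digits 1..9.
    """
    digits = [d for d in map(first_digit, values) if d is not None]
    if not digits:
        raise ValueError("no usable values")
    counts = Counter(digits)
    n = len(digits)
    errors = [(counts[d] / n - math.log10(1 + 1 / d)) ** 2 for d in range(1, 10)]
    return sum(errors) / 9


# Illustrative data only: powers of two track Benford's law closely, whereas
# the integers 10..99 have uniform first digits and fit poorly.
benford_like = [2.0 ** k for k in range(200)]
uniform_like = list(range(10, 100))
print(benford_mse(benford_like), benford_mse(uniform_like))
```

Variables stored in logs need to be exponentiated before the digits are extracted, as the notes to panels A and C indicate.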

TABLE 1
Dincecco et al. (2022) and economic development: Main results
Estimation method is OLS using data from Dincecco et al. (2022). Unit of analysis is district. The dependent variable is ln(0.01 + Luminosity) averaged between 1992 and 2010. Variable of interest is pre-colonial exposure to land battles between 1000 and 1757. Geographic controls are latitude, longitude, altitude, ruggedness, precipitation, land quality, dry rice suitability, wet rice suitability, wheat suitability and malaria risk. Population density is ln(Population Density) in 1990. Robust SEs in parentheses are calculated using the robust command in Stata. Significant at the ***1%, **5%, *10% levels.
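
The specification described in these notes, an OLS regression of ln(0.01 + Luminosity) on conflict exposure and controls with heteroskedasticity-robust standard errors, can be sketched in Python as follows. This is a sketch on simulated data, not the authors' Stata code: the function name, the data-generating process and the two stand-in controls are assumptions. The HC1 small-sample correction n/(n - k) is used because that is what Stata's robust option applies.

```python
import numpy as np


def ols_hc1(y, X):
    """OLS coefficients and HC1 robust standard errors (intercept added)."""
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones(len(y)), np.asarray(X, dtype=float)])
    n, k = X.shape
    xtx_inv = np.linalg.inv(X.T @ X)
    beta = xtx_inv @ X.T @ y
    resid = y - X @ beta
    # Sandwich estimator: (X'X)^-1 X' diag(e^2) X (X'X)^-1, scaled by n/(n-k).
    meat = X.T @ (X * resid[:, None] ** 2)
    cov = xtx_inv @ meat @ xtx_inv * n / (n - k)
    return beta, np.sqrt(np.diag(cov))


# Simulated districts (hypothetical numbers, for illustration only).
rng = np.random.default_rng(0)
n = 500
exposure = rng.gamma(2.0, 1.0, n)
controls = rng.normal(size=(n, 2))          # stand-ins for geographic controls
luminosity = np.exp(0.3 * exposure + controls @ np.array([0.2, -0.1])
                    + rng.normal(scale=0.5, size=n))
y = np.log(0.01 + luminosity)               # dependent variable as in the notes

beta, se = ols_hc1(y, np.column_stack([exposure, controls]))
print(beta[1], se[1])                       # coefficient on exposure and its SE
```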

TABLE 2
Estimation method is 2SLS using data from Dincecco et al. (2022). Unit of analysis is district. In panel A (first stage), the dependent variable is pre-colonial conflict exposure to land battles between 1000 and 1757 and the variable of interest is proximity to the Khyber Pass. In panel B (second stage), the dependent variable is ln(0.01 + Luminosity) averaged between 1992 and 2010. Variable of interest is pre-colonial exposure to land battles between 1000 and 1757. Geographic controls are latitude, longitude, altitude, ruggedness, precipitation, land quality, dry rice suitability, wet rice suitability, wheat suitability and malaria risk. Population density is ln(Population Density) in 1990. Robust SEs in parentheses are calculated using the robust command in Stata. Significant at the ***1%, **5%, *10% levels.

Alternative measures of conflict exposure
Dincecco et al. (2022) define the exposure to conflict as the sum of the inverse distance between each district centroid and pre-colonial conflicts. As robustness checks, the authors further use alternative radii cutoffs to define conflict exposure (table A.15 in the online

15405982, 0, Downloaded from https://onlinelibrary.wiley.com/doi/10.1111/caje.12693 by Nes, Edinburgh Central Office, Wiley Online Library on [07/11/2023]. See the Terms and Conditions (https://onlinelibrary.wiley.com/terms-and-conditions) on Wiley Online Library for rules of use; OA articles are governed by the applicable Creative Commons License

Distribution of the conflict exposure measure by district provided by Dincecco et al. (2022) (solid grey) versus replication using data from the Historical Conflict Event Dataset from Miller and Bakar (2023) (clear with black outline).

and report the estimates in panel B of table 4. Because Dincecco et al. (2022) provide a crosswalk for only the measure All conflicts, the relevant comparison is row 2 of table 3. Here we see

TABLE 6
Dincecco et al. (2022) exposure to land battles from 1001 to 1757 within 250 km of district centroid calculated over districts by century using data from Dincecco et al. (2022). Calculations by the present authors.
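
To make the construction of the exposure measure concrete, the Python sketch below sums inverse distances between one district centroid and the conflicts that fall within a 250 km cutoff and a chosen time window, then repeats the calculation by century. It is not the authors' code, which is not part of the replication package. The haversine distance, the weight 1/(1 + d) (the offset of 1 is an assumption that avoids division by zero for a battle at the centroid), the function names and all coordinates are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def conflict_exposure(centroid, conflicts, cutoff_km=250.0, start=1000, end=1757):
    """Sum of 1 / (1 + distance) over conflicts inside the cutoff and window."""
    lat, lon = centroid
    total = 0.0
    for c_lat, c_lon, year in conflicts:
        if not (start <= year <= end):
            continue
        d = haversine_km(lat, lon, c_lat, c_lon)
        if d <= cutoff_km:
            total += 1.0 / (1.0 + d)
    return total


# Hypothetical centroid and conflicts: (latitude, longitude, year).
centroid = (28.0, 77.0)
conflicts = [
    (28.0, 77.0, 1200),   # at the centroid
    (29.0, 77.0, 1526),   # about 111 km away
    (20.0, 77.0, 1600),   # beyond the 250 km cutoff
    (28.5, 77.5, 1800),   # outside the 1000 to 1757 window
]

# Exposure by century, with the last block truncated at 1757.
for block_start in range(1000, 1800, 100):
    block_end = min(block_start + 99, 1757)
    value = conflict_exposure(centroid, conflicts, start=block_start, end=block_end)
    print(block_start, block_end, round(value, 4))
```

Changing `cutoff_km` gives the alternative radii, and narrowing `start` and `end` gives the time-block variants of the measure.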
Estimation method is OLS using data from Dincecco et al. (2022). Unit of analysis is district. Dependent variables are as follows. ln(Tax/Area), 1881 and ln(Tax/Person), 1881 measure land revenue in 1,000 rupees per square kilometre or per capita in 1881 for districts under direct British rule and/or indirect rule (i.e., major Princely states). ln(Tax/Acre), 1931 and ln(Tax/Person), 1931 measure average land revenue in rupees per acre or per capita in 1931 for districts in British India. Variable of interest is pre-colonial conflict exposure to land battles between 1000 and 1757. Geographic controls include latitude, longitude, altitude, ruggedness, precipitation, land quality, dry rice suitability, wet rice suitability, wheat suitability and malaria risk. Population density is

the Herfindahl index of language population shares in 2001. Dependent variable in column (4) is Religious fractionalization, defined as 1 minus the Herfindahl index of religion population shares in 2001. Variable of interest is pre-colonial conflict exposure to land battles between 1500 and 1757. Geographic controls include latitude, longitude, altitude, ruggedness, precipitation, land quality, dry rice suitability, wet rice suitability, wheat suitability and malaria risk. Population density is ln(PopulationDensity) in 1990. Robust SEs in parentheses are calculated using the robust command in Stata. Significant at the ***1%, **5%, *10% levels.

Estimation method is OLS using data from Dincecco et al. (2022). Unit of analysis is district. Dependent variables are as follows: % Literacy 1881 is the proportion of "literate" persons in 1881, % Literacy 1921 is the proportion of persons that can read and write in 1921, % Literacy 1961-1991 is the literacy rate averaged between 1961 and 1991 and % Literacy 2011 measures the adult literacy rate across both rural and urban populations for ages 7 and above. Variable of interest is pre-colonial conflict exposure to land battles between 1000 and 1757 (top panel, direct replication of original study) and exposure to land battles between 1500 and 1757 (lower panel). Geographic controls include latitude, longitude, altitude, ruggedness, precipitation, land quality, dry rice suitability, wet rice suitability, wheat suitability and malaria risk. Population density is ln(PopulationDensity) in 1850 in column (1), in 1900 in column (2), in 1950 in column (3) and in 2011 in column (4). Observations differ between the direct replication (top) panel and the lower panel due to missing data in the replication package; see section 3.3 for a discussion. Robust SEs in parentheses are calculated using the robust command in Stata. Significant at the ***1%, **5%, *10% levels.

Estimation method is OLS using data from Dincecco et al. (2022). Unit of analysis is district. Dependent variables are as follows: % Primary measures the proportion of villages having a primary school in 1981, % High measures the proportion of villages having a high school in 1981 and % Infant mortality is the infant mortality rate in 1991. Variable of interest is pre-colonial conflict exposure to land battles between 1000 and 1757. Geographic controls include latitude, longitude, altitude, ruggedness, precipitation, land quality, dry rice suitability, wet rice suitability, wheat suitability and malaria risk. Population density is ln(PopulationDensity) in 1950 in columns (1) and (2), and in 1990 in column (3). Observations differ between the direct replication (top) panel and the lower panel due to missing data in the replication package; see section 3.3 for a discussion. Robust SEs in parentheses are calculated using the robust command in Stata. Significant at the ***1%, **5%, *10% levels.

schools and column (

TABLE 15
Dincecco et al. (2022) and post-colonial political violence
Estimation method is OLS using data from Dincecco et al. (2022). Unit of analysis is district. Column (1) replicates the results. The dependent variable is Political violence, defined as fatalities per district between 2015 and 2018 (in hundreds) using ACLED data. The dependent variable in columns (2) and (3) is Organized violence, defined as fatalities per state (in hundreds) based on the UCDP data. Column (2) focuses on the time from 2001 to 2021 and column (3) on the time from 2015 until 2018. The variable of interest is Pre-colonial conflict exposure to land battles between 1000 and 1757. Geographic controls include latitude, longitude, altitude, ruggedness, precipitation, land quality, dry rice suitability, wet rice suitability, wheat suitability and malaria risk. Population density is ln(PopulationDensity) in 1990. Robust SEs in parentheses are calculated using the robust command in Stata. Significant at the ***1%, **5%, *10% levels.