Revisiting the size–productivity relationship with imperfect measures of production and plot size

Monitoring smallholder agricultural productivity growth, one of the targets of the Sustainable Development Goals, rests on accurate measures of crop production and land area. Existing methods and protocols for measuring smallholder production and plot size are prone to various sources and forms of mismeasurement. Inaccuracies in production and land area measurement are likely to distort descriptive and predictive inferences. We examine the sensitivity of empirical assessments of the relationship between agricultural productivity and land area to alternative measurement protocols. We implement six production and six land area measurement protocols, and show that most of these protocols differ systematically in their accuracy. We find that an apparent inverse size–productivity relationship in our data is fully explained by measurement error in both production and plot size. Moreover, we show that some of the previously used "gold standard" measures are themselves prone to non-classical measurement error, and hence can generate spurious inverse size–productivity findings. Our results also show that slight improvements in the precision of objective measures significantly reduce the inferential bias associated with the size–productivity relationship.


| INTRODUCTION
Smallholder agriculture continues to account for a large proportion of income and employment in rural areas in developing countries (e.g., Davis et al., 2010, 2017). Growth in agricultural productivity is a fundamentally important channel for structural transformation and poverty reduction (Christiaensen et al., 2011; Diao et al., 2010; Timmer & Akkus, 2008). Quantifying progress toward productivity targets is complicated by difficulties in sourcing data that accurately measure the key agricultural metrics necessary to fully understand the factors determining the productivity of smallholder farmers, especially in the context of Africa. The consequences of such data limitations are arguably best illustrated with reference to one of the most widely debated empirical regularities in development and agricultural economics: the inverse land size-productivity relationship. An emerging literature shows that the empirically observed plot-level inverse relationship between size and productivity can be partly, or even wholly, explained by systematic measurement error in either self-reported plot area or production estimates (Abay et al., 2019; Desiere & Jolliffe, 2018; Gourlay et al., 2019).

The debate on the empirical foundations and relevance of the size-productivity relationship remains far from settled for several reasons. First, the benchmark production and area measurement protocols used in previous studies are themselves not free from potential mismeasurement. For example, most studies use subplot crop cuts for approximating total production, whereas recent studies show that some types of subplot crop cuts can suffer from nonclassical measurement error (Sida et al., 2021). Similarly, although land area measurement using global positioning systems (GPS) brings significant improvements over self-reported data (Carletto et al., 2015; Dillon et al., 2019; Kilic et al., 2017), the consumer grade GPS devices used in many of these studies are likely to suffer from some inaccuracies when applied to smaller plots (Dillon et al., 2019; Keita & Carfagna, 2009). Second, a number of recent studies have argued that the relationship between land area and agricultural productivity depends on the choice of agricultural productivity indicator used (Aragón et al., 2022; Ayaz & Mughal, 2023; Garzón Delvaux et al., 2020; Helfand & Taylor, 2021). Third, a recent literature highlights the role of the edge effect, whereby productivity is higher along the plot perimeter, in explaining why small farms (and plots) appear more productive (Bevis & Barrett, 2020). It is the first of these issues on which we focus.
We revisit the sensitivity of empirical assessments of the size-productivity relationship to different measurement protocols for crop production and plot size. We implement a comprehensive list of production and area measurement protocols, including those employed by the above studies, as well as more accurate and improved measurement protocols, on 300 maize plots during the 2018 harvest period in three districts of the Amhara region of Ethiopia. Specifically, on each plot, we evaluate six alternative protocols for measuring maize production: (i) farmers' self-reported production estimates; (ii) M-W walk cob sampling, which estimates production based on three randomly selected cobs from sampling points along imaginary diagonal lines connecting the longest two sides of the plot (approximating the shape of the letter M or W); (iii) transect cob sampling, which estimates production based on three randomly selected cobs from four sampling points defined at equal intervals on a transect that bisects the field across its longest horizontal axis; (iv) 4 × 4 meter random quadrat crop cuts; (v) three 4 × 4 meter diagonal quadrats regularly spaced along the longest diagonal of the plot; and (vi) a full plot harvest. We also consider six alternative land area measures: (i) farmers' self-reported estimates; (ii) estimates from low-cost old generation consumer-grade GPS receivers that have frequently been used in field data collection by research organizations over the past decade; (iii) estimates from single-frequency mobile phone GPS receivers; (iv) estimates from dual-frequency mobile phone GPS receivers; (v) compass-and-rope estimates; and (vi) total station theodolite measurement (henceforth, "total station"). Our study is the first to jointly evaluate the measurement error associated with all the main available measurement techniques for both production and area.
Our key point of departure is that we use the most accurate available production and area measures as our benchmark. Full plot harvest protocols provide the most accurate production data, and so this approach is used as our benchmark measure of production. Carefully recording measures from a complete harvest is, however, very costly, rendering this approach impractical in most survey settings (Lobell et al., 2020). Similarly, land area measurements derived from total station theodolites have become the new "gold standard." This approach, which estimates area on the basis of angles and distances measured with an accuracy of 1-2 millimeters, is commonly used for land surveying, engineering, and construction applications (e.g., Kavanagh & Bird, 1992; Kavanagh & Mastin, 2014), and hence serves as the benchmark for our land area measurement protocols. It is, however, also costly to implement in large household surveys. By comparing each of the other measurement protocols to these benchmarks, we evaluate the implications of measurement error associated with less precise measurement protocols for the estimation of the size-productivity relationship.

As shown by Abay et al. (2019), the empirical estimation of the size-productivity relationship is prone to nonclassical measurement error on both sides of the equation. However, the direction and extent of the bias in the parameter estimates depend on the nature of the measurement error in alternative measures of production and land area, as well as potential correlations between them. Thus, we first conduct an empirical characterization of measurement error associated with the different protocols, including some of those perceived as gold standards, and find that they suffer from systematic (nonclassical) measurement error.
Interestingly, we find that all types of crop-cut sampling involving nonrandomly selected quadrats suffer from systematic measurement error, suggesting the prevalence of systematic differences in productivity within plots, as has also been shown in some recent studies (e.g., Bevis & Barrett, 2020;Kosmowski et al., 2021;Sida et al., 2021). We also find both negative and positive correlations between measurement error associated with the different protocols. These findings have important implications for inferences related to the size-productivity relationship. Empirically, we find that measurement error in both production and area entirely explains the observed inverse size-productivity relationship in our data; when using the combination of our most accurate measures for production (i.e., full plot harvest) and area (i.e., total station), this empirical relationship becomes statistically insignificant.
Our results are consistent with prior studies that suggest that the plot-level inverse size-productivity relationship is an artifact of mismeasurement in production and/or land area (Abay et al., 2019; Desiere & Jolliffe, 2018; Gourlay et al., 2019). Our use of a full plot harvest to evaluate the accuracy of alternative variants of crop-cut estimates indicates that, although other reference measures may perform well (especially relative to self-reported data), they still suffer from nonclassical measurement error. Similarly, our use of the current generation of total station estimates of land area supports the consensus from the field literature in other disciplines that rope-and-compass measures may also suffer from systematic errors, albeit relatively small in magnitude in our sample (Almeida-Warren et al., 2021). Using these improved measures, we demonstrate that relatively imprecise "objective" measures (e.g., consumer grade GPS devices and nonrandomly selected subplot crop cuts) can generate a spurious inverse size-productivity relationship. Indeed, our findings clearly show that slight improvements in the precision of objective measures significantly reduce the bias associated with estimating the size-productivity relationship.
Although our results support some of the findings of prior studies, we also provide more nuance in some important respects. Most importantly, the implication of measurement error for statistical inference related to the size-productivity relationship depends on several parameters characterizing measurement error in production and land area, as well as potential correlations between these errors. These empirical patterns are likely to vary across different contexts, depending on the sources and drivers of measurement error. For instance, Abay et al. (2019), the paper most similar to ours, show that if measurement errors in production and land area are strongly and positively correlated, the "second-best" approach of ignoring mismeasurement in both area and production may be superior to correcting mismeasurement in only one of them. In contrast, we find negative correlations between pairwise alternative measures of area and production, which suggests that the "second-best" hypothesis does not hold in our sample. Our results thus suggest that the conclusions of Abay et al. (2019) do not apply in all contexts.

The rest of the paper is organized as follows. Section 2 provides more detail on the different measures of production and plot size considered and the nature of the measurement error associated with each. In Section 3, the data and summary statistics are presented. The empirical approach is outlined in Section 4. Section 5 presents the results, and Section 6 concludes.

| ALTERNATIVE MEASURES OF PRODUCTION AND PLOT SIZE AND ASSOCIATED MEASUREMENT ERROR
We use 12 measurement protocols, six measures of production and six measures of plot size. Each measurement protocol is described in detail in Table 1. Production measurement protocols are further illustrated in Figure 1.
Most agricultural data, including data on production and land area, come from farmers' self-reported estimates. In most cases, these data are collected as part of extensive multi-topic survey modules, which are usually administered in a single visit. Eliciting these measures does not require visiting individual plots and so the cost of collecting self-reported data is quite low. However, a number of studies (e.g., Abay et al., 2019, 2021; Carletto et al., 2013; Desiere & Jolliffe, 2018; Gourlay et al., 2019) have shown that self-reported agricultural data of this kind are prone to systematic errors that can distort statistical inferences. This is particularly the case for self-reported estimates of production and plot size.
There are several sources of errors in self-reported production and land area (Abay et al., 2023; Carletto et al., 2021). First, farmers may intentionally over- or underreport production or farm size for different reasons and motives. For example, farmers may be motivated to misreport if they believe that the data might be used for purposes that could affect their access to various public project benefits. This includes information required for tax calculation or to determine eligibility for various public programs such as access to credit or social protection (Diskin, 1997). Second, aggregating total harvest and land area requires some level of literacy and numeracy, skills that farmers in rural parts of developing countries sometimes lack (De Groote & Traoré, 2005). Third, in many parts of rural Africa standardized measurement and associated measurement units are not commonly used, with rural farmers using local units that may vary considerably across communities and locations. As a result, self-reported production and area estimates are often reported in local units, and conversion of these measures into standard units introduces additional errors. Fourth, most household surveys involve extended recall periods, where farmers may forget details of past events (Arthi et al., 2018; Beegle et al., 2012) or season-specific harvests (Ali et al., 2009; Howard et al., 1995). Finally, farmers are likely to round off values (e.g., reporting plot sizes to the nearest half hectare).
In relation to the measurement of crop output, the most reliable method, and hence the "gold standard" for estimating crop production, is full plot harvesting (Fermont & Benson, 2011; Kosmowski et al., 2021). This involves conducting a full plot harvest and recording the fresh weight of the crop. This fresh weight is then adjusted by the grain moisture content to compute dry weight. This approach, however, is very costly and not feasible for large household surveys. In practice, alternative variants of cob-based sampling and crop-cutting protocols are used to measure production (Sapkota et al., 2016). This usually involves taking cob samples or crop cuts from one or multiple sampling points/quadrats of varying size (typically 3-5 meters squared [m²]), estimating the dry grain weight from the samples/crop cuts, and extrapolating to estimate production for the full plot. Surprisingly, there are very few systematic studies of how the alternative protocols perform relative to one another (Sapkota et al., 2016). We experiment with different cob-based sampling and crop-cut protocols, and examine their accuracy and implications for the size-productivity relationship. The different protocols are likely to suffer from various sources of error, including error arising from sampling and the identification of subplots. We have ordered these protocols in accordance with the degree to which we expect such error. For example,

TABLE 1 Alternative production and area measurement protocols.

Measurement protocol: Definition

Panel A: Production measurement protocols

Self-reported: Farmers are directly asked to estimate the amount of production in quintals (100 kg bags, dry weight).

M-W path cob sampling: This approach, depicted in Figure 1a, estimates production based on three randomly sampled maize cobs at each of 10 one-square-meter sampling points. To identify the sampling points, the longest side of the plot is first identified. An imaginary "M" line starting 1 meter from the northwestern vertex of the plot is identified, and then a "W" line on the opposite long side of the plot is identified. The "feet" and the "vertices" of the "M" and "W" lines are used as sampling points (note a). At each sampling point, three maize cobs are randomly selected, the number of cob-bearing maize plants is counted, the cobs are weighed, and the field moisture content of the grain is recorded. The fresh grain yield is standardized at 12.5 percent moisture content. The fresh weight is then adjusted by the grain moisture content to compute dry weight using the formula: $\text{Dry weight} = \text{Fresh weight} \times \frac{100 - \text{moisture content of the shelled grain}}{100}$. Production for the whole plot is calculated by extrapolating the estimated production for the 10 m² sample to the full plot area measured using the total station approach.

Transect cob sampling: This approach is illustrated in Figure 1b. Using a hand-held GPS, the midpoints of the two short sides of the plot are first identified. Four one-square-meter sampling points are then selected along the horizontal line that connects these two midpoints. At each sampling point, three maize cobs are randomly selected, the number of cob-bearing plants is counted, and production is estimated and extrapolated using the same procedure as for M-W path cob sampling.

Random quadrat crop cut: Figure 1c illustrates the random quadrat crop-cut method. The random quadrats are selected by first identifying the northwest corner of the plot and measuring the distance from the long to the short side. Two random-walk distances are generated for each field using Open Data Kit (ODK), based on the length of the shortest and longest sides of the plot (note b). From the northwest corner, the enumerators walk toward the long side of the plot for the first random number of meters given by ODK and then walk parallel to the short side for the second random number of meters. A 4 × 4 meter quadrat is selected at this point, a crop cut is conducted, and the grain is weighed. A more detailed description of the method used is provided in Appendix C. Dry grain production is estimated using the same procedure as for the cob sampling described above.

Diagonal quadrat crop cut: This approach is illustrated in Figure 1d. Starting from the northwest corner of the plot, three 4 × 4 meter quadrats regularly spaced along the longest diagonal of the plot are identified. A crop cut is conducted for each quadrat (one at the center and two midway between the center and the corners) and the grain is weighed. The same procedure as above is used to estimate the dry grain weight.

Full plot harvest: Once all other production measurement protocols are finalized, a full plot harvest, as illustrated in Figure 1e, is conducted. All grain is weighed, and the fresh weight is recorded. Maize harvested using the previous methods is included in the full plot harvest production estimation. As above, the fresh weight is adjusted by the grain moisture level (measured in the field with a moisture meter) to compute the "dry weight."

Panel B: Area measurement protocols

Self-reported: Farmers are asked to estimate the given maize plot size in hectares.

Consumer grade GPS (note c): A low-cost consumer grade handheld GPS device, a Garmin eTrex 60, is used. In the past decade, this receiver has been frequently used in land area data collection by different research organizations including the Ethiopian Central Statistics Authority (CSA). Before the data are collected, farmers indicate to enumerators/supervisors where the plot boundaries are. Enumerators/supervisors then position themselves in the northwest corner of the plot and turn the GPS devices to tracking mode. This allows the devices to take measurements automatically every few steps as they walk with a farmer around the plot. Devices are kept as far apart as possible to avoid interference.

Single frequency mobile phone GPS receiver: A standard low-cost mobile phone with normal single-frequency GPS reception is used to measure plot size using the same procedure as for the consumer grade GPS described above. Single-frequency GPS receivers rely on positional information broadcast on the original L1 band operating at 1575.42 MHz.

Dual frequency mobile phone GPS receiver: A medium-cost mobile phone that receives dual-frequency global navigation satellite system (GNSS) signals is used to estimate land area using the same procedure as for the consumer grade GPS described above. Dual-frequency GPS receivers receive satellite signals from both the L1 and L5 (1176 MHz) frequency bands.

Compass-and-rope: In the absence of total station, the gold standard measurement method is compass-and-rope (Carletto et al., 2015; FAO, 1982; Fermont & Benson, 2011; Keita & Carfagna, 2009). First, farmers indicate to enumerators/supervisors where the plot boundaries are. Enumerators then measure the length and degree of each side of the plot using a compass and a rope, and calculate the area using standard trigonometry rules. One of the sources of measurement error in land area estimation is the irregularity in the shape of plots and their boundaries. To accommodate this plot irregularity, each plot is divided into eight subplots and the length of each side of the triangular subplots is measured using a measuring tape. The subplot area is first calculated using Heron's formula, and the total plot size is then constructed by aggregating subplot areas.

Total station theodolites: Area measurements derived from total station theodolites have become the new gold standard. This approach, which estimates the land area with an accuracy of approximately 1.5 millimeters, is commonly used for land surveying, engineering, and construction applications (e.g., Kavanagh & Bird, 1992; Kavanagh & Mastin, 2014). It comprises a theodolite, which precisely measures angles between visible points in horizontal and vertical planes, an electronic distance measurement device to precisely calculate the distance between points, and geo-positioning via global navigation satellite system receivers (Kavanagh & Mastin, 2014). An independent professional surveying team based in Addis Ababa, with extensive experience in surveying and related land area measurement, carried out the total station measurements for this project. Like the compass-and-rope and GPS measures, the total station area measurement was conducted under the farmers' guidance. The GIS team walked with a farmer along the plot boundary. A prism mounted on a handheld pole is positioned over another point on the field boundary and the theodolite uses that prism to determine the distance between the first and second points (derived from a laser pulse emitted by the theodolite and reflected back from the prism); distances between subsequent points are similarly calculated, with angles between convergent points measured by the theodolite.

Notes:
a. To avoid an edge effect, samples were not collected within 1 meter of the plot edges. As plot edges are more accessible, there is higher labor intensity, which can lead to higher reported productivity on plot peripheries (Bevis & Barrett, 2020). Moreover, plant spacing plays a role in driving the edge effect (Bevis & Barrett, 2020). This approach comes with the caveat that we may in fact be underestimating production using these alternative methods, which could account for some of the difference between these measures and the full plot harvest.
b. If the random numbers obtained from ODK for the long and short sides of the field did not fall in the crop field area, we dropped both random numbers and started the process again.
c. All GPS data were collected by supervisors. GPS-enabling atmospheric conditions were established prior to the implementation of the measurement protocol.
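The dry-weight adjustment and the extrapolation from a sampled area to the full plot described in Panel A of Table 1 can be illustrated with a minimal sketch. The function names and the example values (a single 16 m² quadrat, 12 kg of fresh grain at 20 percent moisture, a 1,500 m² plot) are hypothetical and are not taken from the study data; the sketch applies only the dry-weight formula stated in Table 1 and a simple proportional scaling by area.

```python
def dry_weight(fresh_weight_kg, moisture_pct):
    """Dry weight = fresh weight * (100 - moisture content) / 100."""
    if not 0.0 <= moisture_pct < 100.0:
        raise ValueError("moisture_pct must lie in [0, 100)")
    return fresh_weight_kg * (100.0 - moisture_pct) / 100.0


def extrapolate_to_plot(sample_dry_kg, sample_area_m2, plot_area_m2):
    """Scale dry grain weight from the sampled area up to the whole plot."""
    if sample_area_m2 <= 0 or plot_area_m2 <= 0:
        raise ValueError("areas must be positive")
    return sample_dry_kg * plot_area_m2 / sample_area_m2


def yield_kg_per_ha(production_kg, plot_area_m2):
    """Convert plot production to yield in kg/ha (1 ha = 10,000 m2)."""
    return production_kg * 10_000.0 / plot_area_m2


# Hypothetical example: one 4 x 4 m quadrat on a 1,500 m2 plot.
sample_dry = dry_weight(fresh_weight_kg=12.0, moisture_pct=20.0)
plot_production = extrapolate_to_plot(
    sample_dry, sample_area_m2=16.0, plot_area_m2=1500.0
)
plot_yield = yield_kg_per_ha(plot_production, plot_area_m2=1500.0)
print(round(sample_dry, 2), round(plot_production, 1), round(plot_yield, 1))
```

Because the extrapolation multiplies the sample outcome by the ratio of plot area to sampled area, any error in the plot area measure carries over one-for-one into estimated production, which is why the protocols in Table 1 extrapolate using the total station area.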
in the presence of strong within-plot heterogeneity in yields, cob-based sampling and crop-cut protocols based on a limited number of quadrats may have larger average deviations from true plot-level productivity outcomes. In general, we would expect that protocols involving crop cuts will outperform protocols involving transect-based cob sampling, because they collect more data per plot. Similarly, we expect crop-cut protocols involving larger areas and/or more quadrats to outperform protocols with smaller and/or fewer quadrats.

For the measurement of plot size, the compass-and-rope approach has traditionally been considered the "gold standard" method (Carletto et al., 2015; Dillon et al., 2019; FAO, 1982; Fermont & Benson, 2011; Keita & Carfagna, 2009). This involves measuring the length and degree of each side of a plot using a compass and a rope and calculating the area using standard trigonometry rules. However, this approach is not free from measurement error. Plots with irregular shapes and slopes are difficult to measure using this method. This method is also cumbersome, time-consuming, and infeasible to apply in large household surveys.
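As an illustration of the compass-and-rope aggregation step described in Table 1, in which triangular subplot areas are computed with Heron's formula and then summed, a minimal sketch follows. The function names and the tape measurements are hypothetical; only two triangles are used here for brevity, whereas the protocol divides each plot into eight subplots.

```python
import math


def heron_area(a, b, c):
    """Area of a triangle from its three side lengths (Heron's formula)."""
    s = (a + b + c) / 2.0  # semi-perimeter
    if min(s - a, s - b, s - c) <= 0:
        raise ValueError(f"sides {a}, {b}, {c} do not form a triangle")
    return math.sqrt(s * (s - a) * (s - b) * (s - c))


def plot_area_from_triangles(triangles):
    """Sum the Heron areas of the triangular subplots that tile the plot."""
    return sum(heron_area(a, b, c) for a, b, c in triangles)


# Hypothetical tape measurements (meters) for a plot split into two triangles.
subplots = [(30.0, 40.0, 50.0), (30.0, 40.0, 50.0)]
total_area_m2 = plot_area_from_triangles(subplots)
print(total_area_m2)
```

The triangle-inequality check matters in practice: side lengths recorded with error on sloped or irregular plots can fail to close into a valid triangle, which is one way the difficulties noted above show up in the data.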
Global positioning system (GPS) devices are increasingly being used to address potential errors in self-reported land area and have the advantage of being easier to implement than the compass-and-rope method (e.g., Carletto et al., 2021). Measuring plot size using GPS devices does require visiting plots and locating plot boundaries. This increases the accuracy of the estimates of plot size but also increases the cost as compared with self-reported measures. Several household surveys in Africa, including the LSMS-ISA initiative, use low-cost consumer grade GPS devices to measure cultivated land area. Land area measurement using consumer grade GPS devices may still suffer from potential inaccuracies arising from the quality of the devices used and associated implementation problems. Identified sources of inaccuracy in consumer grade GPS receivers include the nature and shape of plot boundaries, the satellite position, the quality of signal propagation, and the quality of the receivers themselves (Fermont & Benson, 2011; Keita & Carfagna, 2009). The accuracy of GPS devices is also influenced by temporal weather conditions and plot-specific factors related to the slope and shape of plots. The imprecision of GPS devices is likely to increase errors in the measurement of smaller plots as errors become large in relative terms (Dillon et al., 2019; Fermont & Benson, 2011).
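The point that a fixed positional error becomes larger in relative terms on smaller plots can be illustrated with a small simulation. This is a sketch under stated assumptions, not the procedure used by any GPS device: plots are assumed to be flat squares, only the four corners are recorded, each coordinate receives independent Gaussian noise with a hypothetical standard deviation of 2 meters, and area is computed with the shoelace formula.

```python
import random


def shoelace_area(points):
    """Area of a simple polygon from its vertices given in order (shoelace formula)."""
    twice_area = 0.0
    n = len(points)
    for i in range(n):
        x1, y1 = points[i]
        x2, y2 = points[(i + 1) % n]
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0


def mean_abs_relative_error(side_m, sigma_m, draws, rng):
    """Average |measured - true| / true area for a square plot with noisy corners."""
    corners = [(0.0, 0.0), (side_m, 0.0), (side_m, side_m), (0.0, side_m)]
    true_area = side_m ** 2
    total = 0.0
    for _ in range(draws):
        noisy = [(x + rng.gauss(0.0, sigma_m), y + rng.gauss(0.0, sigma_m))
                 for x, y in corners]
        total += abs(shoelace_area(noisy) - true_area) / true_area
    return total / draws


rng = random.Random(42)  # fixed seed so the illustration is reproducible
small_plot_error = mean_abs_relative_error(20.0, 2.0, 2000, rng)   # 400 m2 plot
large_plot_error = mean_abs_relative_error(100.0, 2.0, 2000, rng)  # 10,000 m2 plot
print(f"small plot: {small_plot_error:.3f}, large plot: {large_plot_error:.3f}")
```

Under these assumptions the absolute area error grows roughly in proportion to the side length while the area grows with its square, so the relative error shrinks as plots get larger. Real GPS track errors are autocorrelated and depend on signal conditions, so the magnitudes here are purely illustrative.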
Recent improvements in remote sensing and GPS technologies address some of the challenges in consumer grade GPS receivers. For example, single-frequency GPS receivers can measure position to within a few meters, whereas dual-frequency receivers, which receive more robust signals, can measure position to within a few centimeters (Elmezayen & El-Rabbany, 2019). Area measurements derived from total station theodolites, however, have become the new gold standard. This approach, which estimates distances, angles, and derived measures of land area with an accuracy of a few millimeters, is commonly used for land surveying, engineering, construction, and other high-accuracy applications (e.g., Kavanagh & Bird, 1992; Kavanagh & Mastin, 2014). It comprises a theodolite, which precisely measures angles between visible points in horizontal and vertical planes; an electronic distance measurement device to precisely calculate the distance between points; and geopositioning via global navigation satellite system receivers (Kavanagh & Mastin, 2014). Unlike the GPS measures, the total station measurement does not need GPS satellites once the baseline that is used as a reference for the parcel boundary is established. As with using full plot harvesting to measure production, the fixed cost of using the total station theodolite device to measure plot size may be prohibitively high, particularly for small household surveys.

| DATA COLLECTION AND MEASUREMENT
Our analysis uses data collected on 300 maize plots in three districts (woredas) of Ethiopia's South Gondar and West Gojjam zones (Dera, Merawi, and Funeteselam), which are part of the Ethiopian maize belt. In collaboration with agricultural extension workers, the survey team identified farmers who would be willing to harvest their maize field for data collection purposes and sampled one plot per household. Although this is clearly a selected sample, our interest lies in comparing different production and land area measures collected under alternative protocols within each plot rather than across plots or farms, and so we do not believe that this affects the internal validity of the analysis.
Maize production data were collected using the six methods described in Table 1 and illustrated in Figure 1: (i) self-reported; (ii) M-W path cob sampling; (iii) transect cob sampling; (iv) random quadrat crop cut; (v) diagonal quadrat crop cut; and (vi) full plot harvest. Land area is also estimated using six different methods, also described in Table 1: (i) self-reported; (ii) a consumer-grade GPS receiver; (iii) a single-frequency mobile phone GPS receiver; (iv) a dual-frequency mobile phone GPS receiver; (v) compass-and-rope; and (vi) total station.

Table 2 presents summary statistics for all plots used in this analysis, including each of the alternative measures for production and plot size based on the different protocols described above. The average maize yield ranges from 5152 kilograms per hectare (kg/ha) when measured using the M-W cob-based approach to 6199 kg/ha when we use the diagonal quadrat crop-cut approach. The average plot size ranges from 1413 m² when measured using a single-frequency mobile phone GPS receiver to 1627 m² when using the farmers' self-reported estimates. Table 2 also presents summary statistics on other characteristics of the plots that we use as controls in our analysis. Eighty percent of plots have a land title certificate, inorganic fertilizer was used on all plots, and 96% of plots received improved seed. Moreover, pesticides were applied on 15% of plots. The average household size, measured by the number of individuals living in the household during the survey period, was 5.6.
Before estimating the size-productivity relationship, we examine the extent to which there is measurement error associated with alternative measures of production and plot size. We do this by comparing each measurement protocol with its benchmark measure. This is important for understanding the implications of mismeasurement in each variable of interest in estimating the size-productivity relationship. We start this analysis by evaluating the correlation between the alternative measures of production and land area and the relevant benchmarks using the specifications in Equations (1) and (2), where $P_{pm}$ is the production from each plot $p$ measured using measurement method $m$; $A_{pi}$ is plot size measured using measurement method $i$; $FH$ is the full plot harvest, our benchmark measure for production; and $TS$ is the total station, our benchmark for land area measurement. $\beta_{m}$ and $\alpha_{i}$ are constant terms, whereas $\varepsilon_{pm}$ and $\varepsilon_{pi}$ are error terms capturing other unobservable factors associated with our dependent variables. In the absence of measurement error, $\beta_{1m} = 1$ and $\alpha_{1i} = 1$. A test of each of these hypotheses will allow us to determine whether there is a statistically significant difference between alternative production and area measures and the benchmark measures. We also consider specifications where we control for household and plot specific factors that could influence the various measurement protocols to ascertain the extent to which the measurement error can be explained through observable characteristics.

Although the empirical specifications in Equations (1) and (2) can serve to effectively evaluate the presence and extent of measurement error associated with each protocol, additional empirical tests are required to evaluate the nature (classical versus non-classical) of measurement error associated with each protocol. For this purpose, we estimate Equations (3) and (4), which characterize the relative bias associated with each measurement method.
Equation (3) characterizes measurement error associated with alternative production measurement protocols, whereas Equation (4) characterizes measurement error in alternative area measurement methods.
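A form of Equations (3) and (4) that is consistent with the definitions given in this section can be sketched as follows. The log-difference form of the dependent variable is inferred from the note to Table 4, and the intercept symbols $\delta_m$ and $\theta_i$ are assumptions chosen by analogy with the constant terms $\beta_m$ and $\alpha_i$; neither is notation confirmed by the text.

```latex
\begin{align}
\ln P_{pm} - \ln FH_p &= \delta_m + \delta_{1m} \ln FH_p + \delta_{2m} X_p + \mu_{pm} \tag{3} \\
\ln A_{pi} - \ln TS_p &= \theta_i + \theta_{1i} \ln TS_p + \theta_{2i} X_p + \mu_{pi} \tag{4}
\end{align}
```

Under this sketch, purely random (classical) error corresponds to $\delta_{1m} = 0$ and $\delta_{2m} = 0$, whereas a negative $\delta_{1m}$ means that the error shrinks as the benchmark value grows.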
All terms are as defined above, and $X_p$ captures additional household and plot-level characteristics that may explain measurement error in alternative methods of measuring production and area.²³ $\mu_{pm}$ and $\mu_{pi}$ capture other unobservable factors that may induce inaccuracies in production and plot size measures, respectively. If measurement error in production behaves nonclassically, then $\delta_{1m}$ and/or $\delta_{2m}$ are expected to be statistically significant. Similarly, statistically significant values of $\theta_{1i}$ and $\theta_{2i}$ imply that measurement error in the alternative land area measurement methods varies systematically with plot size or with other household and plot characteristics.
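The logic of this test can be illustrated with a minimal simulation. The sketch below uses synthetic data, omits the controls $X_p$, and assumes a log-difference definition of measurement error; the sample size, error variances, and the mean-reversion parameter `rho` are hypothetical and are not taken from the study.

```python
import numpy as np

def ols_slope(x, y):
    """Return the OLS slope from regressing y on x with an intercept."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc = x - x.mean()
    return float(xc @ (y - y.mean()) / (xc @ xc))

rng = np.random.default_rng(0)
n = 5000  # hypothetical number of plots, much larger than the study sample

# Synthetic "true" log production, standing in for the full plot harvest.
log_fh = rng.normal(loc=6.5, scale=0.5, size=n)

# Classical error: pure noise, unrelated to the true value.
log_classical = log_fh + rng.normal(0.0, 0.2, n)

# Mean-reverting error: large harvests under-reported, small ones over-reported.
rho = -0.4  # assumed strength of mean reversion
log_mean_rev = log_fh + rho * (log_fh - log_fh.mean()) + rng.normal(0.0, 0.2, n)

# Equation (3)-style slope: regress the log error on the log benchmark.
delta_classical = ols_slope(log_fh, log_classical - log_fh)
delta_mean_rev = ols_slope(log_fh, log_mean_rev - log_fh)

# Equation (1)-style slope: regress the measure itself on the benchmark.
beta_mean_rev = ols_slope(log_fh, log_mean_rev)

print(f"delta_1 (classical error):      {delta_classical: .3f}")
print(f"delta_1 (mean-reverting error): {delta_mean_rev: .3f}")
print(f"beta_1  (mean-reverting error): {beta_mean_rev: .3f}")
```

Without controls, the two slopes are linked algebraically: the slope of the error on the benchmark equals the slope of the measure on the benchmark minus one, so a slope below one in the first regression and a negative slope in the second describe the same mean-reverting pattern.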
To evaluate the size-productivity relationship, we follow the standard approach in the literature and estimate Equation (5), in which each of the variables is as defined in Equations (3) and (4), and $\vartheta_{pmi}$ is an error term capturing other unobservable factors affecting yield. We estimate Equation (5) for each possible combination of production and plot size measurement protocols, as denoted by $m$ and $i$, respectively. A negative and statistically significant value of $\gamma_{mi}$, the coefficient on plot size, suggests evidence of the inverse size-productivity relationship in our sample for that particular combination of measurement protocols.
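Equation (5) can be sketched from the surrounding definitions and from the note to Table 6, which defines the dependent variable as the log of production divided by plot size. The log-log form is the conventional one and is assumed here; the intercept $\gamma_{0mi}$ and the coefficient vector $\lambda_{mi}$ on the controls are hypothetical symbols introduced only for illustration.

```latex
\begin{equation}
\ln\!\left(\frac{P_{pm}}{A_{pi}}\right) = \gamma_{0mi} + \gamma_{mi} \ln A_{pi} + \lambda_{mi} X_p + \vartheta_{pmi} \tag{5}
\end{equation}
```

With six production protocols and six area protocols, estimating every combination amounts to 36 separate regressions for each set of controls.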
As discussed in detail in Abay et al. (2019) and described in Appendix A, measurement error in either production or land area, or in both, will bias the estimate of $\gamma_{mi}$ as well as other parameters in Equation (5).²⁴ Moreover, the direction of the bias will depend on several parameters characterizing the measurement error. For example, if land area suffers from classical measurement error but the production measure is accurate, there will be both attenuation bias and division bias, because land area appears on both sides of Equation (5). Nonclassical measurement error in production and land area will also bias the size-productivity relationship, and the direction of the bias will depend on the correlation between measurement error in production and measurement error in plot size. If the two errors are positively correlated, they could partially cancel each other, thus reducing the overall bias, as demonstrated by Abay et al.'s (2019) "second-best" hypothesis of ignoring both mismeasurements instead of addressing only one of them. As we demonstrate analytically in Appendix A, this will not be the case, however, if production and area measurement errors are negatively correlated.
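The division-bias argument can be made concrete with a small simulation. This is a sketch under stated assumptions, not the derivation in Appendix A: the data are synthetic, the true size-productivity slope is set to zero, area error is classical, and the production error is modeled as a hypothetical multiple (`error_link`) of the area error so that its sign controls the sign of the correlation between the two errors.

```python
import numpy as np

def ols_slope(x, y):
    """Return the OLS slope from regressing y on x with an intercept."""
    xc = x - x.mean()
    return float(xc @ (y - y.mean()) / (xc @ xc))

def simulate_gamma(error_link, n=20000, seed=0):
    """Estimate the slope of log yield on log area when area is mismeasured.

    error_link is the assumed loading of the production error on the area
    error: 0 gives accurate production, positive values give positively
    correlated errors, negative values give negatively correlated errors.
    """
    rng = np.random.default_rng(seed)
    log_area = rng.normal(7.2, 0.4, n)    # hypothetical true log plot size
    log_yield = rng.normal(-0.5, 0.3, n)  # true log yield, unrelated to size
    log_prod = log_yield + log_area       # so the true slope is zero

    u = rng.normal(0.0, 0.2, n)           # classical error in log area
    log_area_obs = log_area + u
    log_prod_obs = log_prod + error_link * u

    # Observed log yield uses the mismeasured area in the denominator.
    return ols_slope(log_area_obs, log_prod_obs - log_area_obs)

gamma_accurate_prod = simulate_gamma(0.0)   # accurate production
gamma_positive_link = simulate_gamma(0.5)   # positively correlated errors
gamma_negative_link = simulate_gamma(-0.5)  # negatively correlated errors

print(f"accurate production:      {gamma_accurate_prod: .3f}")
print(f"positive error correlation: {gamma_positive_link: .3f}")
print(f"negative error correlation: {gamma_negative_link: .3f}")
```

In this setup the population slope is $(c - 1)\sigma_u^2 / (\sigma_A^2 + \sigma_u^2)$, where $c$ is `error_link`, so a spurious inverse relationship appears even though none exists in the true data; a positive link shrinks it toward zero and a negative link makes it larger in magnitude.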

| RESULTS AND DISCUSSION
In this section, we first describe the extent to which there is evidence of measurement error in the different protocols that we use. We then examine the implications for the empirical relationship between size and productivity.

| Empirical characterization of measurement error in alternative measurement protocols
We begin by providing a graphical illustration of the relationship between the alternative production and land area measurement protocols and their benchmark estimates. We use a kernel-weighted local polynomial regression to plot the relationship between each protocol and the relevant benchmark. The results for the different production and land area measurement protocols are presented in Figure 2a and b, respectively. Figure 2a shows that all methods overestimate production on smaller plots and underestimate production on larger plots. The extent of the deviation from the 45-degree line, and hence the extent of measurement error, varies across alternative measures. The bias is relatively small for the random quadrat crop-cut method but large for the self-reported method. Figure 2b shows that all land area estimation methods overestimate the size of smaller plots and underestimate the size of larger plots. This implies that measurement error in alternative land area measurement protocols varies systematically with plot size. Figure 3a and b show that measurement error in production and plot size is negatively associated with true production and plot size, respectively, implying that larger harvests and plots are underestimated, whereas smaller ones are overestimated. This pattern is consistent with the mean-reverting nonclassical measurement error documented in recent studies (Abay et al., 2019; Desiere & Jolliffe, 2018; Gourlay et al., 2019). The strength of the relationship between measurement error and the true values (of production and plot size) varies across protocols and is strongest for self-reported values. For example, the relationship between measurement error associated with the random quadrat crop cut and the true value of production appears to be weak, suggesting that this error may be distributed randomly across the true value of production. We formally and parametrically evaluate these patterns in the next section.
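For readers unfamiliar with the smoothing step, a kernel-weighted local polynomial fit can be sketched in a few lines. The polynomial degree (one), the Epanechnikov kernel, the bandwidth, and the synthetic data below are all assumptions made for illustration; the text does not report which settings were used for Figures 2 and 3.

```python
import numpy as np

def local_linear(x, y, grid, bandwidth):
    """Kernel-weighted local linear fit of y on x, evaluated at each grid point."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    fitted = np.full(len(grid), np.nan)
    for j, g in enumerate(grid):
        z = (x - g) / bandwidth
        # Epanechnikov kernel: positive weight only inside the window.
        w = np.where(np.abs(z) < 1.0, 0.75 * (1.0 - z**2), 0.0)
        if np.count_nonzero(w) < 2:
            continue  # too few points in the window to fit a line
        # Weighted least squares on an intercept and the centered regressor.
        X = np.column_stack([np.ones_like(x), x - g])
        sw = np.sqrt(w)
        coef, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
        fitted[j] = coef[0]  # the intercept is the fitted value at g
    return fitted

rng = np.random.default_rng(0)
n = 2000  # hypothetical number of plots

# Synthetic benchmark and a mean-reverting alternative measure (in logs).
log_benchmark = rng.normal(6.5, 0.5, n)
log_measure = (log_benchmark
               - 0.4 * (log_benchmark - log_benchmark.mean())
               + rng.normal(0.0, 0.2, n))

grid = np.array([5.8, 6.5, 7.2])
fitted = local_linear(log_benchmark, log_measure, grid, bandwidth=0.3)

for g, f in zip(grid, fitted):
    print(f"benchmark {g:.1f} -> smoothed measure {f:.3f}")
```

Comparing each smoothed value with the grid value itself is the numerical counterpart of reading the curve against the 45-degree line: a fitted value above the grid point indicates overestimation at that benchmark level, and one below it indicates underestimation.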
To further examine the extent to which each of the measurement protocols deviates from the benchmark, we also estimate Equations (1) and (2); the results are reported in Table 3. Although all the alternative measurement protocols are positively and significantly correlated with the true production and plot size, in almost all cases we reject the null hypothesis that the coefficient on the benchmark measure is equal to one. The correlation between alternative production measurement protocols and full plot harvest increases as we move from self-reported production to crop-cut measures (Panel A). The random crop-cut method comes closest to the benchmark; when we include the control variables, we fail to reject the null hypothesis that there is a one-for-one relationship between this measure and the benchmark. Similarly, the magnitude of the correlation between alternative land area measurement protocols and the benchmark increases as we move from self-reported to GPS-based and compass-and-rope measures (Panel B).

Although it is beyond the scope of this paper to definitively explain why alternative protocols differ from one another, it is worth exploring what the potential reasons might be. In our analysis, we ordered the production measurement protocols by level of accuracy, and so most differences are unsurprising. It is notable, however, that the single random crop cut outperforms the three diagonally oriented crop cuts. One potential reason for this may be that, on very small plots, the first and last (i.e., noncentral) quadrats along the diagonal are more likely to be taken near plot edges, where productivity has been shown to be systematically higher (Bevis & Barrett, 2020). Similarly, in relation to the area measurement protocols, the differences in the precision of different types of GPS devices correspond to our priors, despite the slight underperformance of the dual-frequency receiver phone GPS. Because our sample comes from a farming system characterized by small plot sizes (the average plot in our sample is less than 0.15 ha), the known inaccuracies associated with GPS devices on small plots are likely to be particularly pronounced (e.g., Carlotto et al., 2017; Dillon et al., 2019).

FIGURE 3 (Continued) (b) Measurement error in area

TABLE 3 Partial correlations between alternative measurement approaches and the benchmark protocols.

Note: The dependent variables in Panel A are log-transformed values of alternative production measures (self-reported, M-W walk, transect, random quadrat subplot quadrat, and diagonal quadrats crop cut), and the dependent variables in Panel B are the log-transformed values of alternative plot size measures (self-reported, consumer grade GPS, single frequency receiver phone GPS, dual frequency receiver phone GPS, and compass and rope). Control variables include household size, amount of fertilizer in kilograms per hectare, labor hours, distance from the nearest road in kilometers, distance from the nearest market in kilometers, distance from home in kilometers, and dummy variables for improved seed use, pesticides, land title certification, steep slope, partial canopy (a control for the extent to which trees cover the plot boundaries, which may affect the GPS area measurement), and partial and mostly cloudy weather conditions during GPS area estimations. The F-statistic reported tests whether the coefficients associated with Log (full plot harvest) in Panel A and Log (plot size, total station) in Panel B are equal to one. Robust standard errors are reported in parentheses. *p < 0.10; **p < 0.05; ***p < 0.01.
In Table 4, we explore whether measurement error associated with the different protocols behaves classically or nonclassically. Panel A of Table 4 presents the regression results using Equation (3), which characterizes measurement error in alternative measures of production, whereas Panel B presents the regression results characterizing measurement error in plot size. Both sets of results show that most of the measurement error behaves nonclassically: the errors are negatively and significantly correlated with the true production and plot size values, respectively. This is in line with our conclusions from Figure 2. The magnitude of the correlation between measurement error in production and full plot harvest declines when we move from self-reported production to crop-cut measures (Panel A). Apart from the random quadrat crop-cut method, all production measurement protocols suffer from nonclassical measurement error. Similarly, the correlation between measurement error in area and true plot size decreases as we move from self-reported to GPS-based and compass-and-rope measures (Panel B). The nonclassical nature of measurement error in production and area can have important implications for the empirical evaluation of the size-productivity relationship.

TABLE 4 Characterizing measurement error in production and plot size.

Note: The dependent variables in Panel A are the logarithm of measurement error in plot size measures (self-reported, consumer grade GPS, single frequency receiver phone GPS, dual frequency receiver phone GPS, and compass and rope) relative to the total station theodolite measures and in Panel B the dependent variables are measurement error in alternative production measures (self-reported, M-W walk, transect, random quadrat subplot quadrat, diagonal quadrats crop cut, and full plot harvest) relative to the full plot harvest. Note that the drop in observations in Columns (2) and (5) is due to difficulties in implementing the consumer grade GPS and the compass-and-rope measurements on some plots. Control variables include household size, amount of fertilizer in kilograms per hectare, labor hours, distance from the nearest road in kilometers, distance from the nearest market in kilometers, distance from home in kilometers, and dummy variables for improved seed use, pesticides, land title certification, steep slope, partial canopy (a control for the extent to which trees cover the plot boundaries, which may affect the GPS area measurement), and partial and mostly cloudy weather conditions during GPS area estimations. Robust standard errors are reported in parentheses. *p < 0.10; **p < 0.05; ***p < 0.01.
In the presence of measurement error in multiple variables, the size and direction of the correlation between these mismeasurements affect statistical inferences. Abay et al. (2019) show that if measurement error in production and plot size are positively correlated, the biases triggered by these mismeasurements can cancel each other and hence the "second-best" option of ignoring both mismeasurements can be inferentially less consequential than correcting one or another mismeasurement. As we demonstrate analytically in Appendix A, this is not the case in the presence of negative correlations between measurement error in production and plot size. To assess this empirically, Table 5 presents the pairwise correlation between measurement error in alternative area and production measures.
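A pairwise correlation table of this kind can be assembled as in the following sketch. All data are synthetic, the protocol names are placeholders, measurement error is assumed to be the log difference from the benchmark, and the shared plot-level factor `shared` is a hypothetical device for producing one negatively correlated pair; none of these choices is taken from the study.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 249  # mirrors the number of plots with complete data; values are synthetic

log_fh = rng.normal(6.5, 0.5, n)  # stand-in for log full plot harvest
log_ts = rng.normal(7.2, 0.4, n)  # stand-in for log total station area

# Hypothetical plot-level factor that pushes the two errors in opposite directions.
shared = rng.normal(0.0, 0.1, n)

# Synthetic alternative measures with assumed error structures.
production = {
    "self_reported": log_fh - 0.4 * (log_fh - log_fh.mean()) + rng.normal(0, 0.3, n),
    "random_quadrat": log_fh - shared + rng.normal(0, 0.1, n),
}
area = {
    "self_reported": log_ts - 0.3 * (log_ts - log_ts.mean()) + rng.normal(0, 0.3, n),
    "consumer_gps": log_ts + shared + rng.normal(0, 0.05, n),
}

def error_correlations(production, area, log_fh, log_ts):
    """Correlate each production error with each area error."""
    table = {}
    for p_name, p_val in production.items():
        for a_name, a_val in area.items():
            # Errors are log differences from the respective benchmark.
            r = np.corrcoef(p_val - log_fh, a_val - log_ts)[0, 1]
            table[(p_name, a_name)] = float(r)
    return table

corr_table = error_correlations(production, area, log_fh, log_ts)
for (p_name, a_name), r in corr_table.items():
    print(f"{p_name:>15} x {a_name:<15} r = {r: .3f}")
```

Each entry pairs one production protocol with one area protocol, so the sign of an entry indicates whether the two errors would tend to offset or reinforce each other when both measures enter the yield regression.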
Unlike Abay et al. (2019), who document strong and positive correlations across measurement error in production and land area, we find negative correlations between measurement error across most production and area measures in our sample. The only exception is the correlation between measurement error in self-reported production and self-reported area, which is also positive in our sample. There are a number of possible explanations for this. One explanation relates to the differences in the benchmark measurement methods. Abay et al. (2019) employed a random quadrat crop-cut protocol as their benchmark. If there are any errors in the measurement of harvest in the sampled quadrat, then this will be multiplied by the total plot size, which in turn, can create a spurious positive correlation with the measurement error in plot size. Negative correlations could also arise from common underlying implementation or sampling challenges associated with each plot. For example, the shape or slope of some plots may complicate land area measurement, whereas taking crop cuts from these plots may be straightforward. Heterogeneity in production within plots coupled with sampling of subplots for crop cuts may also generate negative correlations between measurement error in production and plot size. As we show in Appendix A, the implications of negatively correlated measurement error are potentially more severe given that associated inferential biases will not cancel each other as in the Abay et al. (2019) case.

TABLE 6 Estimated relationship between yield and plot size, by alternative measurement method (rows: production measurement protocols; columns: area measurement protocols).

Note: The dependent variable in all regressions is maize productivity, which is defined as the log of production in kilograms measured by different production measurement protocols (self-reported, M-W walk, transect, random quadrat subplot quadrat, diagonal quadrats crop cut, and full plot harvest) divided by plot size measured by alternative area measurements (self-reported, consumer grade GPS, single-frequency mobile receiver, dual-frequency mobile receiver, compass-and-rope, and total station). Panel A provides results without controls, and Panel B reports estimates controlling for additional household and plot-level characteristics. Control variables include household size, amount of fertilizer in kilograms per hectare, labor hours, distance from the nearest road in kilometers, distance from the nearest market in kilometers, distance from home in kilometers, and dummy variables for improved seed use, pesticides, land title certification, steep slope, partial canopy (a control for the extent to which trees cover the plot boundaries, which may affect the GPS area measurement), and partial and mostly cloudy weather conditions during GPS area estimations. *p < 0.10; **p < 0.05; ***p < 0.01.

included in Table 2.²⁶ Each cell represents a separate regression coefficient for a combination of production and plot size measurement protocols. For example, the first column in Row 1 shows the relationship between productivity and plot size using self-reported production and self-reported plot size data. The estimates presented in Columns 2 to 6 of Row 1 are the coefficients from separate regressions of yield measured using self-reported production with land area measured using consumer grade GPS, single-frequency mobile GPS receiver, dual-frequency mobile GPS receiver, compass-and-rope, and total station, respectively.²⁷ In the first row of Table 6, where the dependent variable is yield based on self-reported production, we find strong evidence for an inverse size-productivity relationship, regardless of the area measurement protocol used. The magnitude of the relationship diminishes as the accuracy of the area measurement increases.
The second and third rows present the same estimates using the cob-based approaches, the M-W path and the transect methods, to estimate productivity. For these measures, we also find evidence of the inverse size-productivity relationship across all area measurement protocols except for the total station measure. The magnitude of the coefficient is, in most cases, lower than for the self-reported production measurement and diminishes as the area measurement protocol becomes more accurate. In Column 6, the inverse relationship disappears for both cob-based production measures when the most accurate plot size protocol, the total station approach, is used.
In the fourth and fifth rows, we test whether the inverse relationship holds using production data measured using the random quadrat and diagonal quadrat crop cuts, respectively. For the random quadrat crop cut, we only find evidence of an inverse relationship between size and productivity when we use self-reported plot size. In Columns 2 to 6, we do not find any statistically significant relationship between productivity and plot size when plot size is measured using any of the other approaches. For the diagonal quadrat crop cuts in Columns 1 and 2 of the fifth row, we find evidence of an inverse relationship between size and productivity for the self-reported and consumer grade GPS measures of area. The single-frequency and dual-frequency GPS devices perform somewhat better, although the negative relationship is marginally statistically significant for the latter. In Columns 5 and 6, the statistical significance of the relationship disappears when we use plot size based on the compass-and-rope and total station measurement protocols, respectively.
Finally, in the sixth row, we examine whether we find evidence for the inverse relationship when we use the most accurate production data, a full plot harvest. Even with accurate production data, we find strong evidence in support of an inverse relationship between productivity and plot size when plot size data are self-reported or are based on a consumer grade GPS receiver. The magnitude of the effect is much lower for the latter. The inverse relationship disappears when the single- or dual-frequency mobile GPS, compass-and-rope, and total station methods for measuring plot size are used. This suggests that measurement error in both production and plot size can generate a spurious inverse size-productivity relationship.²⁸ Despite minimizing measurement error in production data by using a full plot harvest, we still find strong evidence of an inverse size-productivity relationship when plot area measurement is inaccurate (self-reported or handheld GPS devices). This is surprising given that these consumer grade handheld GPS devices are widely accepted as gold standard measures. In contrast, addressing measurement error in land area estimation using the most accurate area measurement method (total station) removes the inverse size-productivity relationship. Comparing the size-productivity relationship estimates across the six columns reveals that the largest bias in this relationship arises when both production and land area come from self-reports.
In relation to the existing literature, our findings support the view that the inverse size-productivity relationships found in many empirical studies based on self-reported production and self-reported land area measurements are likely to be driven by measurement error in production and/or land area. Our results corroborate recent studies arguing that estimates of the inverse size-productivity relationship may be sensitive to the production and/or area measurement method used. For example, Desiere and Jolliffe (2018) use self-reported and subplot crop-cut production estimates combined with measures of plot size based on GPS receivers and show that the inverse relationship is strong when using self-reported production data but disappears when using crop-cut estimates. We find a similar result in Row 4, Columns 1 and 2 of Table 6. Abay et al. (2019) find that using a subplot crop-cut production measure and land area measurement based on the compass-and-rope method effectively eliminates the inverse relationship between size and productivity. The results we report in Row 4 and Column 5 of Table 6 are comparable and support their conclusions. Gourlay et al. (2019) and Lobell et al. (2020) are the only studies that used production data based on full plot harvest, which they compare to the subplot crop-cut production measure. They use a consumer grade and a single-frequency receiver mobile GPS to measure size. Their results are comparable to our estimates presented in Row 6 and Columns 2 and 3, and Row 4 and Columns 2 and 3. Our conclusions are similar: the inverse size-productivity relationship is highly sensitive to how plot-level production is measured.
To sum up, our results are consistent with some of the evolving studies which argue that the inverse size-productivity relationship is an artifact of mismeasurement in production and/or land area (Abay et al., 2019;Desiere & Jolliffe, 2018;Gourlay et al., 2019). We add to this literature by showing that even those objective measures that are widely considered gold standard measures (e.g., consumer grade GPS devices and subplot crop cuts) are prone to nonclassical measurement error and can also generate an inverse size-productivity relationship. The variability in our findings across the different measurement protocols justifies investments in data production and associated measurement protocols, particularly for those key agricultural metrics such as yield and land area, which are crucial for assessing agricultural growth.

| CONCLUDING REMARKS
The relationship between productivity and land size is arguably one of the most contested hypotheses in the agricultural economics literature (e.g., Assunção & Braido, 2007; Barrett, 1996; Barrett et al., 2010). Although the existence of an inverse relationship has been extensively documented, the robustness of this finding to measurement error has recently come under scrutiny, with an emerging literature suggesting that the relationship disappears once more accurate measures of production and plot size are used. Most of these studies examine the implication of measurement error in either area or production for the size-productivity relationship. We simultaneously investigate the role of measurement error in plot production and plot area, finding strong evidence that measurement error in both production and plot size estimation can explain the inverse size-productivity relationship. When only addressing measurement error in one dimension, either plot size or production, estimation results suggest that an inverse relationship exists. However, when accounting for measurement error in both variables, we find no statistically significant evidence of an inverse plot size-productivity relationship, a similar finding to that of Abay et al. (2019). Importantly, and in contrast to Abay et al. (2019), we find evidence of negative correlations between the measurement error in estimates of area and production, implying that the bias induced by two-way measurement error of this kind might be even more severe in certain contexts, given that the errors will not cancel each other out. This divergence of empirical findings suggests that neither study commands clear external validity. Additional research may usefully focus on identifying the conditions under which area and production errors are positively or negatively correlated.
A novel contribution is our evaluation of a broader range of measurement protocols than prior studies and our use of the most accurate reference measures currently available (i.e., total station for area measurement and full plot harvest for production measurement). Two key results stand out from this more nuanced comparison. First, we find significant differences among alternative GPS measures (which vary by receiver type and GPS protocol), as well as across alternative crop cut protocols. Although all "objective" measures are improvements over farmer estimates, they differ in the degree to which that is the case. This is an important corrective to the literature, which has typically failed to specify details of the GPS receivers or crop-cut protocols, implying that such details do not matter analytically. Our results suggest that crop cut protocol choice does matter, as does the type of GPS receiver used for area estimates.
Second, we find evidence of nonclassical error, even in high-quality GPS-based area measures and crop-cut estimates of production. In our data, these errors contribute to the spurious detection of an inverse size-productivity relationship. This raises the question of their validity as objective reference measures in prior studies (e.g., Abay et al., 2019;Desiere & Jolliffe, 2018). As farm sizes in sub-Saharan Africa continue to shrink and become more fragmented, smaller average plot sizes may exacerbate the analytical importance of measurement error associated with GPS-based area estimates and production estimates derived from crop cuts.
Our results support the hypothesis that the inverse relationship disappears once measurement error in plot-level data collection is comprehensively addressed. However, there are some caveats to our study that are worth emphasizing. First, given that land endowment and plot size are likely to be endogenous to productivity, our empirical analysis can only provide associational evidence. Second, our study is conducted on a relatively small sample of maize-growing farmers and so may not be generalizable to other contexts and crops. Further research exploring the role of measurement error in other settings is merited. Third, although most of the production and area measurement protocols we use are applicable to many crop types, the cob-based measures are most relevant for maize (but could also be adapted for other coarse grains). These limitations suggest that future studies may usefully investigate measurement error for other types of crops, and in other geographical and production contexts, in order to better understand the generalizability of our findings.

ACKNOWLEDGMENTS
We are very grateful to the editor, Marc Bellemare, and four anonymous reviewers who provided valuable comments and suggestions that have significantly improved our paper. We would also like to thank participants at the Oxford Center for African Economies (CSAE) lunchtime seminar for their comments. This project received funding from the project " Taking

ENDNOTES
1 In Africa 69% of household income is derived directly from agriculture (Davis et al., 2017).
2 A large literature focusing on explaining the existence of the inverse relationship between size and productivity also exists. Explanations include missing or imperfect land and labor markets (Assunção & Ghatak, 2003; Barrett, 1996; Carter & Wiebe, 1990; Deininger et al., 2018; Eswaran & Kotwal, 1986; Feder, 1985; Wineman & Jayne, 2020); undervaluation of labor inputs (Ali & Deininger, 2015; Benjamin & Brandt, 2002; Carter, 1984; Heltberg, 1998; Lamb, 2003; Wineman & Jayne, 2020); and unobserved heterogeneity and unobserved relative efficiency of alternative inputs (Assunção & Ghatak, 2003; Barrett, 1996; Benjamin, 1995; Benjamin & Brandt, 2002; Bevis & Barrett, 2020; Bhalla & Roy, 1988; Carter & Wiebe, 1990; Chen et al., 2011; Deininger et al., 2018; Eswaran & Kotwal, 1986; Feder, 1985).
3 Nonclassical measurement error occurs when error in the measurement of a variable of interest is correlated with that variable's true value, and/or with the true values of other model covariates, and/or with errors in the measurement of other model covariates (Bound et al., 2001).
4 A number of studies have uncovered more complex, including nonlinear and U-shaped, relationships between size and productivity (Foster & Rosenzweig, 2022; Muyanga & Jayne, 2019; Sheng et al., 2019). Although this literature is more relevant for a discussion regarding farms of different scale (including medium- and large-scale farming) rather than small farms, which are the focus of our study, it is worth noting that further investigation of the implications of measurement error for this empirical relationship is also warranted.
5 All production and area measurement protocols are described in more detail in Section 2.
6 It is worth noting that some of the cob-based measures are most applicable to maize and similar crops; however, the other crop-cut protocols considered here are widely applicable to other types of crops as well.
7 Quadrat is a term commonly used in ecology and agronomy to mean a designated area of land over which data collection is conducted.
8 A theodolite is a device that measures horizontal and vertical angles. A total station theodolite is a theodolite with an integrated distance meter that can measure angles and distances simultaneously.
9 We measure productivity as kilograms of maize produced per meter squared. As such, when referring to the size-productivity relationship throughout the paper we are referring to land productivity measured as plot yield.
10 The analytical implications of correlated measurement error demonstrated in Abay et al. (2019) are based on positive correlations between measurement error in production and area, which they find empirical support for using a very limited set of alternative measures. Although positive correlations between measurement error in these variables will cancel each other out, this is not the case with negative correlations.
11 For example, smallholder farmers in Ethiopia commonly use oxen days to measure land area (Abay et al., 2019). Timad is a measure of how much land can be plowed in a day by a pair of oxen. Although this is usually equated with 1/4 hectare, the true value of a timad can vary by topography, soil type, climate, etc.
12 We use the terms production and output interchangeably.
13 The dry weight of crop production is calculated using the formula: Dry Weight = Fresh Weight × (100 − Moisture content of the shelled grain) / 100.
14 It is worth noting that although the full plot harvest is considered the most accurate approach to measure production, it is possible that human error in the collection of the data could lead to some measurement error. In our study, measurement protocols were implemented by experienced and trained enumerators, so we believe that such errors are likely to be minimal.
15 Cob sampling involves randomly selecting maize cobs from different sampling points on the plot, which are selected using different protocols as described in Table 1.
16 Single-frequency GPS receivers rely exclusively on positional information broadcast on the L1 band. Signals sent via the L1 band may be obstructed by buildings or other tall structures, dense foliage, and/or atmospheric distortion. Dual-frequency GPS receivers are more robust as they also receive signals from the L5 band.
17 In our study, total station measurements were taken by an independent team of experts. Although human errors in the implementation and recording of the data are possible, we expect such errors to be minimal.
18 For additional information on how the total station measurement is implemented, please visit this link: https://www.southalabama.edu/geography/allison/GY301/Total%20Station%20Setup%20and%20Operation.pdf
19 Of the 300 plots included in our sample, 249 plots have data available on all variables. Only five plots are dropped due to issues with implementing the measurement protocols. The other missing data relate to the control variables. Our analysis is robust to focusing only on the measurement protocols and including all 295 plots.
20 We only selected maize plots that are larger than 500 m². The average plot size in our sample is around 1400 m². To put this in context, the recent nationally representative plot-level data from the LSMS-ISA dataset show that around 93.3% of Ethiopian plots are smaller than half a hectare.
21 The plot size measured by the most accurate area measure, total station, ranged between 612 and 2868 m².
22 Control variables include household size, amount of fertilizer in kilograms per hectare, labor hours, distance from the nearest road in kilometers, distance from the nearest market in kilometers, distance from home in kilometers, and dummy variables for improved seed use, pesticides, land title certification, steep slope, partial canopy (a control for the extent to which trees cover the plot boundaries, which may affect the GPS area measurement), and partial and mostly cloudy weather conditions during GPS area estimations.
23 See footnote 22.
24 It is possible that there are other biases associated with unobserved heterogeneity that could influence both land area and productivity measurement. We assume that all measurement methods are prone to similar unobserved heterogeneity. The addition of control variables may exacerbate the endogeneity issues if unobserved factors are also correlated with these control variables, and so we present the results both with and without controls.
25 For these specifications, we restrict the sample to the set of 249 observations for which data on all variables are available. Our results are very similar when we use the full set of data available for each individual specification. See Table B1 of Appendix B.
26 As pointed out in Section 4, the inclusion of additional control variables could introduce additional bias in the relationship between area and productivity given that they too could be measured with error or could be correlated with unobserved heterogeneity across plots. However, given the similarity in the coefficients in Panels A and B, this does not seem to be the case in our setting. We nevertheless rely on the specification that excludes the controls, to confine our focus to the implications of measurement error in production and area. To calculate yield in Columns 1-6, we used land area measured by the self-reported, consumer-grade GPS, single-frequency receiver mobile GPS, dual-frequency receiver mobile GPS, compass-and-rope, and total station methods, respectively.
27 The results presented here are for the sample of 249 plots for which data are available on all variables. The results are robust to using all plots for which data are available for each particular specification. These results are presented in Table B1 of Appendix B.
28 Our findings are limited to plots within the size range covered by our sample. We cannot comment on what the relationship might look like if a larger range of plot sizes were examined.