## Introduction

An essential element in the environmental risk assessment (ERA) of genetically modified (GM) plants is a comparative field trial in which the effects of the GM plant and its comparator(s) on non-target organisms (NTOs), such as aphids, beetles and bumble bees, are compared. Such an experiment ensures that the GM plant and its comparator(s) are grown under the same management and environmental conditions, thus enabling a fair and objective comparison. A basic statistical approach for designing and analyzing such field experiments has been outlined in an EFSA guidance document (EFSA, 2010), in Perry et al. (2009) and in Semenov et al. (2013). However, in practice the power of these experiments to detect environmental changes of a given magnitude is often unknown, partly because insufficient prior thought is given to exactly which endpoints the experiments are supposed to test, and partly because the complex nature of ecological data complicates the power calculations. One of the aims of the EU-funded project “Assessing and Monitoring the Impacts of Genetically modified plants on Agro-ecosystems” (AMIGA) is to devise statistically well-founded protocols for the design and analysis of field trials. In preparation, an inventory was made of existing field studies in the literature, and a statistical simulation model was developed to mimic ecological data such as those found in practice. The aim of the current paper is to describe this statistical simulation model and to show how it can be used in the design of field experiments.

Data collection in experimental fields with genetically modified crops has been conducted for many years, and a wide variety of experimental designs, sampling techniques, guilds of non-target arthropods and statistical methods has been used (e.g., Marvier et al., 2007). To summarize the different approaches presented in the scientific literature, a non-exhaustive inventory of 33 field studies with insect-resistant transgenic plants was compiled from among the earliest published studies in which the detection of possible effects of GM plants on natural enemies was the primary goal (Table 1). The papers were published from 1992 to 2005. This time span was chosen to include the very first published experiments of this kind, and also to incorporate the first available data from surveys in commercial GM fields. Different crops were included in the selection. The table presents some of the indicators relevant to the experimental design, collection methods and statistical analyses performed on the data. None of the papers provided a prospective power analysis for the experiments described.

Authors and Journal | Functional group | Crop | Measurement endpoint | Dimensions | Experimental design | Statistical method |
---|---|---|---|---|---|---|
Johnson & Gould (1992) Environ. Entomol. | Parasitoids | Tobacco | Parasitism rate | 9 replications, 2 years | Randomized blocks | Chi-square |
Johnson (1997) Environ. Entomol. | Parasitoids | Tobacco | Parasitism rate | 3 years, 15 sites | Randomized blocks | ANOVA |
Mascarenhas and Luttrell (1997) Environ. Entomol. | Parasitoids | Cotton | Host survival | 4 replications | Completely randomized | ANOVA |
Orr & Landis (1997) J. Econom. Entomol. | Parasitoids, Predators | Maize | Egg fate, Parasitism rate, Visual counts | 3 replications (50 plants), 3 sampling dates | Completely randomized | ANOVA |
Pilcher et al. (1997) Environ. Entomol. | Predators | Maize | Abundance, Visual counts | 2 years, 3 replications (6 plants in each), 3 sampling dates | Randomized blocks | ANOVA |
Riddick et al. (1998) Ann. Entomol. Soc. Am. | Predators | Potato | Abundance, Visual counts, Sweep nets, Pitfall traps | 2 years, 3 sites | Completely randomized | ANOVA |
Buckelew et al. (2000) J. Econom. Entomol. | Predators | Soybean | Abundance, Sweep nets | 2 sites, 2 years, weekly samplings | Randomized blocks | ANOVA |
Al-Deeb et al. (2001) J. Econom. Entomol. | Predators | Maize | Abundance, Visual counts | 40 plants, 2 locations | Completely randomized | ANOVA mixed model |
Reed et al. (2001) Entomol. Exp. Appl. | Predators | Potato | Abundance, Visual counts | 2 years, 6 replications | Latin square | ANOVA |
Wold et al. (2001) J. Entomol. Science | Predators | Maize | Abundance, Visual counts | 2 years, 4 replications, 6 sampling dates | Completely randomized | ANOVA |
Bourguet et al. (2002) Environ. Biosaf. Res. | Predators, Parasitoids | Maize | Abundance, Parasitization | 2 sites, 4 replications, weekly samplings | Split-plot | ANOVA |
Manachini & Lozzia (2002) Boll. Zool. Agr. Bachic. | Soil organisms | Maize | Abundance, Diversity | 2 separate fields, 8 locations, 50 soil samples | n.a. | ANOVA |
Al-Deeb and Wilde (2003) Environ. Entomol. | Predators | Maize | Abundance, Visual counts, Pitfall traps | 2 years, 8 locations | Completely randomized | ANOVA mixed model |
Jasinski et al. (2003) Environ. Entomol. | Predators | Soybean, Maize | Abundance, Sweep nets, Sticky traps, Soil samples | 24 commercial fields | n.a. | ANOVA |
Men et al. (2003) Environ. Entomol. | Herbivores, Predators, Parasitoids | Cotton | Abundance, Sweep nets, Visual counts | 3 years, 3 replications, 5 sampling dates | Completely randomized | ANOVA, Diversity indices |
Musser & Shelton (2003) J. Econom. Entomol. | Predators | Maize | Abundance, Egg predation | 2 years, 2–10 plants/replication | Randomized blocks | ANOVA |
Volkmar et al. (2003) Agric. Ecosys. Environ. | Predators | Sugar beet | Abundance, Pitfall traps | 4 replications | Randomized blocks | ANOVA |
Wu & Guo (2003) Environ. Entomol. | Predators | Cotton | Abundance, Visual counts | 3 replications | Completely randomized | ANOVA |
Candolfi et al. (2004) Biocontrol. Sci. Techn. | Predators, Herbivores, Soil organisms | Maize | Abundance, Pitfall traps, Yellow traps | 3 replications, field size | Completely randomized | Principal response curves, Diversity indices |
Duan et al. (2004) Environ. Entomol. | Predators | Potato | Abundance, Pitfall traps | 2 years, 6 replications | Latin square | ANOVA |
Manachini et al. (2004) IOBC/WPRS Bulletin | Soil organisms | Canola | Extraction from soil | 3 replications | Completely randomized | Multivariate |
Wade French et al. (2004) Environ. Entomol. | Predators | Maize | Abundance, Pitfall traps | 2 years, commercial fields | n.a. | Canonical correspondence |
Wei-Di et al. (2004) Chinese J. Agric. Biotec. | Herbivores, Predators, Parasitoids | Cotton | Abundance, Diversity | 2 years, 3 replications | Completely randomized | ANOVA, Diversity indices |
Bhatti et al. (2005a) Environ. Entomol. | Predators, Detritivores, Soil herbivores | Maize | Abundance | 3 years | Split-plot | ANOVA mixed model |
Bhatti et al. (2005b) Environ. Entomol. | Predators, Herbivores, Parasitoids | Maize | Abundance | 3 years, two-weekly samplings | Split-plot | ANOVA mixed model |
Daly & Buntin (2005) Environ. Entomol. | Predators, Herbivores | Maize | Abundance | 2 locations, 2 years, 4 replications, weekly samplings | Completely randomized | ANOVA mixed model |
De La Poza et al. (2005) Crop Protection | Predators | Maize | Abundance, Visual counts, Pitfall traps | 2 locations, 3 years, 3–4 replicates | Completely randomized (split for year and location) | ANOVA |
Hagerty et al. (2005) Environ. Entomol. | Predators, Herbivores | Cotton | Abundance, Damage | 2 years, 4 replications | Completely randomized | ANOVA |
Head et al. (2005) Environ. Entomol. | Predators, Herbivores | Cotton | Abundance, Predation rates | 3 years, 3–4 replications, 6–16 sampling dates | Completely randomized | ANOVA mixed model with repeated measures |
Naranjo (2005) Environ. Entomol. | Predators | Cotton | Diversity | 6 years, 3–4 replications | Completely randomized | ANOVA, PCA |
Pons et al. (2005) European J. Entomol. | Herbivores | Maize | Pest incidence | 3 years, 4 replications, various sampling dates | Completely randomized (year as factorial element) | ANOVA |
Torres and Ruberson (2005) Environ. Entomol. | Predators | Cotton | Abundance | 3 years, 3 replicates, weekly samplings | Completely randomized | ANOVA mixed model with repeated measures |
Whitehouse et al. (2005) Environ. Entomol. | Different guilds | Cotton | Diversity index | 3 years, 2–3 replicates, weekly samplings | Completely randomized | ANOVA mixed model with repeated measures |

Field trials are thus diverse, but an example shows some typical elements. Al-Deeb and Wilde (2003) describe experiments to test the effects of the Cry3Bb1 toxin in Bt corn on aboveground non-target arthropods. The experiments were performed at eight locations in one year and at three locations in a second year, and they involved three GM varieties and two isolines in combination with up to nine different seed and spraying treatments. Randomized complete block designs were used with 2–4 blocks and 8–40 plots. Visual inspection provided count data on 15–20 plants per plot. Average counts per plant for five NTOs varied between 0 and 70. Pitfall trap counts, observed at 3–7 time points, were reported as average numbers per pitfall trap and ranged between 0 and 616 for eight NTOs. Based on an analysis of variance, the authors concluded that no significant differences in numbers were detected between Bt corn and its non-Bt isoline. However, there is no mention of the effect sizes that these experiments would have been able to detect with reasonable statistical power. In fact, the data provided are insufficient to draw any conclusion about the statistical power of the experiments, and this is also the case for many other reported studies. The importance of such an analysis has been pointed out only occasionally (Andow, 2003), and field experiments have rarely been designed on this basis (e.g., Squire et al., 2003; Duan et al., 2006). To improve this situation, the EFSA guidance asks for prospective power analyses to be performed. This issue is further developed in the present paper.

Typical data in environmental risk assessment of GM plants are counts or presence/absence data of NTOs. The basic distribution for counts is the Poisson distribution, while presence/absence data can usually be modeled by a binomial distribution. Clumping of individuals might give rise to an overdispersed distribution, such as the negative binomial distribution for counts and the beta-binomial distribution for presence/absence data. In addition, the number of zero observations can be larger than predicted by these distributions, which gives rise to so-called excess-zero distributions. In many experiments, NTOs are sampled at different points in time, for example weekly, for all experimental units. The data are thus repeated measurements, probably with some form of autocorrelation across time within experimental units. Depending on the species, various patterns across time are possible. Moreover, experiments are frequently repeated at different locations and in different years.
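The count distributions named above can be illustrated with a minimal sketch. It is not the simulation model of this paper: the function names, the mean count of 5, the dispersion parameter `k = 1` and the excess-zero probability of 0.3 are all hypothetical choices made for illustration, and the negative binomial is generated in its usual form as a gamma mixture of Poisson distributions.

```python
import math
import random
from statistics import mean, variance


def poisson(rng, mu):
    """Draw one Poisson(mu) count (Knuth's method, adequate for modest mu)."""
    limit, count, p = math.exp(-mu), 0, rng.random()
    while p > limit:
        count += 1
        p *= rng.random()
    return count


def neg_binomial(rng, mu, k):
    """Gamma-Poisson mixture: mean mu, variance mu + mu**2 / k (clumping)."""
    return poisson(rng, rng.gammavariate(k, mu / k))


def zero_inflated_poisson(rng, mu, p_zero):
    """With probability p_zero the unit is empty; otherwise a Poisson count."""
    return 0 if rng.random() < p_zero else poisson(rng, mu)


rng = random.Random(2024)
n, mu = 20000, 5.0  # hypothetical number of draws and mean count
draws = {
    "Poisson": [poisson(rng, mu) for _ in range(n)],
    "negative binomial": [neg_binomial(rng, mu, k=1.0) for _ in range(n)],
    "zero-inflated Poisson": [zero_inflated_poisson(rng, mu, p_zero=0.3) for _ in range(n)],
}
for name, y in draws.items():
    zeros = sum(v == 0 for v in y) / n
    print(f"{name}: mean={mean(y):.2f} variance={variance(y):.2f} zeros={zeros:.3f}")
```

For the Poisson draws the sample variance should be close to the mean, for the negative binomial draws it should clearly exceed the mean, and the zero-inflated draws should contain far more zeros than a Poisson distribution with mean 5 would predict.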

The statistical analysis of ERA field trials comes in two flavors: difference testing and equivalence testing (van der Voet et al., 2011). The aim of the difference test is to reject the null hypothesis of no difference between the GM plant and its comparator. A significant result of the difference test is then a “proof of difference”, but it does not imply that the difference is biologically relevant and constitutes a true hazard to the environment. Poorly designed experiments with low levels of replication may have low statistical power to detect a true difference. The absence of a significant difference is therefore not a proof that there is no difference, or “absence of evidence is not evidence of absence” (Altman and Bland, 1995). An equivalence test, on the other hand, employs a null hypothesis of non-equivalence, that is, that the difference between the GM plant and its comparator is larger than some pre-specified equivalence limit, also called the limit of concern (LOC). Rejection of the non-equivalence hypothesis implies that the difference is smaller than the LOC, and this can be regarded as a “proof of safety”. The advantage of equivalence testing is therefore that the onus is placed back onto those who wish to demonstrate the safety of GMOs to do high-quality, well-replicated experiments with sufficient statistical power (Perry et al., 2009). Note that both the difference test and the equivalence test can be implemented by constructing a single confidence interval for the difference between the GM plant and its comparator. For equivalence testing, this employs the two one-sided tests (TOST) approach of Schuirmann (1987).
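The way a single confidence interval can serve both tests can be sketched as follows. This is an illustration under stated assumptions, not the procedure of any cited reference: the function `ci_tests` is hypothetical, the difference is expressed as a log ratio of GM plant to comparator, the LOC is taken as a ratio of 2 (limits 0.5 and 2), a normal approximation replaces the *t*-distribution, and a 90% interval is used so that each one-sided equivalence test has level 0.05 (the difference test is then two-sided at the 10% level).

```python
import math
from statistics import NormalDist


def ci_tests(log_ratio, se, loc_ratio=2.0, alpha=0.05):
    """Difference and TOST equivalence test from one (1 - 2*alpha) interval.

    log_ratio: estimated log(GM / comparator); se: its standard error.
    loc_ratio: hypothetical limit of concern on the ratio scale.
    """
    z = NormalDist().inv_cdf(1 - alpha)  # normal approximation (assumption)
    lower, upper = log_ratio - z * se, log_ratio + z * se
    limit = math.log(loc_ratio)
    return {
        "ci_ratio": (math.exp(lower), math.exp(upper)),
        # Difference shown when the interval excludes a log ratio of 0.
        "difference": lower > 0 or upper < 0,
        # Equivalence shown when the interval lies entirely within the LOCs.
        "equivalence": -limit < lower and upper < limit,
    }


precise = ci_tests(math.log(0.9), se=0.1)    # small effect, well replicated
imprecise = ci_tests(math.log(0.9), se=0.6)  # same effect, poorly replicated
shifted = ci_tests(math.log(0.7), se=0.1)    # clear but moderate effect
for label, res in [("precise", precise), ("imprecise", imprecise), ("shifted", shifted)]:
    print(label, res)
```

The three hypothetical cases show the possible combinations: equivalence without a significant difference, an inconclusive result where neither test rejects, and a significant difference that nevertheless stays within the limits of concern.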

It is important to know the statistical properties of difference and equivalence tests, for example the power and robustness of a test and whether the test has the assumed significance level. Such properties are well known for single experiments using tests based on the normal distribution, such as *t*-tests. For non-normal distributions, the small-sample properties of difference and equivalence tests are not straightforward. A simulation approach to sample size calculations for a difference test has been employed by many authors, for example Shieh (2001) and Hrdličková (2006) for the Poisson distribution, Shieh (2001) and Demidenko (2008) for the binomial distribution, and Aban et al. (2009) and Friede and Schmidli (2010) for the negative binomial distribution. A general practical approach to computing power for non-normal distributions is given by Lyles et al. (2007). However, field testing of environmental effects of GM plants on NTOs is much more complicated, as it may involve not only non-normal distributions, potentially with excess zeros, but also a set of reference varieties in addition to the GM plant and its comparator, randomized blocks within an experiment, multiple experiments across different sites and/or years with possible genotype by environment interaction, and finally repeated measures exhibiting some pattern in time, possibly with autocorrelation. The objective of this paper is to formalize all these elements in a single statistical simulation model which provides a framework for studying various statistical approaches for data analysis of such experiments. The simulation model was implemented in a user-friendly C# program, using R (R Core Team, 2012) for simulating from various distributions. The software is available as Supplementary Material to this paper.
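The idea of obtaining power by simulation can be sketched in a few lines. The example below is a deliberately simplified stand-in for the model developed in this paper: it has two treatments only, no blocks, no reference varieties and no repeated measures. All names and parameter values (a comparator mean of 10, dispersion `k = 2`, a halving of abundance as the effect, 4 or 20 plots per treatment) are hypothetical, and the test is a Welch-type statistic on log(y + 1) counts compared with a normal critical value, which is an approximation chosen to keep the sketch free of external libraries.

```python
import math
import random
from statistics import NormalDist, mean, variance


def neg_binomial(rng, mu, k):
    """Gamma-Poisson mixture: mean mu, variance mu + mu**2 / k."""
    lam = rng.gammavariate(k, mu / k)
    limit, count, p = math.exp(-lam), 0, rng.random()
    while p > limit:  # Knuth's Poisson sampler, adequate for modest lam
        count += 1
        p *= rng.random()
    return count


def rejects(rng, n_plots, mu_comparator, ratio, k, z_crit):
    """Simulate one two-treatment trial; test the difference on the log(y + 1) scale."""
    comp = [math.log(neg_binomial(rng, mu_comparator, k) + 1) for _ in range(n_plots)]
    gm = [math.log(neg_binomial(rng, mu_comparator * ratio, k) + 1) for _ in range(n_plots)]
    se = math.sqrt(variance(comp) / n_plots + variance(gm) / n_plots)
    if se == 0:
        return False  # no variation at all: count as "no difference detected"
    return abs(mean(gm) - mean(comp)) / se > z_crit


def rejection_rate(n_plots, ratio, mu_comparator=10.0, k=2.0,
                   alpha=0.05, n_sim=2000, seed=1):
    """Fraction of simulated trials in which the difference test rejects."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    hits = sum(rejects(rng, n_plots, mu_comparator, ratio, k, z_crit)
               for _ in range(n_sim))
    return hits / n_sim


size = rejection_rate(n_plots=4, ratio=1.0)          # no true effect
power_small = rejection_rate(n_plots=4, ratio=0.5)   # halved abundance, 4 plots
power_large = rejection_rate(n_plots=20, ratio=0.5)  # halved abundance, 20 plots
print(f"size={size:.3f} power(n=4)={power_small:.3f} power(n=20)={power_large:.3f}")
```

Running the function with `ratio=1.0` estimates the actual significance level of the test, which need not equal the nominal 5% in small samples, while `ratio=0.5` estimates the power to detect a halving of abundance for a given number of plots.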

This paper first summarizes potentially useful statistical distributions for ecological data. Then the other elements of the statistical simulation model are described, namely block effects, additional varieties, repeated measurements and multiple trials. Some applications of the simulation model for power analysis are described, and possibilities for use and future research needs are discussed.