Proper experimental design requires randomization/balancing of molecular ecology experiments

Abstract Properly designed (randomized and/or balanced) experiments are standard in ecological research. Molecular methods are increasingly used in ecology, but studies generally do not report the detailed design of sample processing in the laboratory. This may strongly influence the interpretability of results if the laboratory procedures do not account for the confounding effects of unexpected laboratory events. We demonstrate this with a simple experiment where unexpected differences in laboratory processing of samples would have biased results had randomization in the DNA extraction and PCR steps not provided safeguards. We emphasize the need for proper experimental design and reporting of the laboratory phase of molecular ecology research to ensure the reliability and interpretability of results.

appropriately as important conclusions and recommendations are drawn from them, often addressing issues of global importance for nature, society, and economy. Randomization or balancing in laboratory experiments is essential to avoid batch effects and other nondemonic intrusions (see Hurlbert, 1984). This issue has already been raised by Meirmans (2015) in a recent opinion paper on population genetics. Meirmans (2015) notes that "It is perfectly possible that such randomization is already practised in genotyping laboratories everywhere and I am simply unaware of it. […], if this is the case, this is nowhere evident in the literature". We have a similar impression, and a screening of one randomly selected 2016 issue from each of five relevant journals supports it (Molecular Ecology, The ISME Journal, Ecology and Evolution, Journal of Biogeography, Soil Biology and Biochemistry; Appendix S1). Only two of the 59 relevant studies report some form of randomization during the laboratory processing of samples. This small literature survey is surely not representative of molecular ecology research as a whole, but the pattern is worrying, as a simple Web of Science search for the keyword combination "molecul* AND ecol*" returned over 1,740 hits from 2016 alone.
The omission of randomization in the laboratory may allow chance events to systematically influence results. Such chance events can occur at any time and in any laboratory: power outages happen, sudden illness incapacitates laboratory personnel, DNA extraction kits are not delivered on time or have been stored inappropriately, to mention just a few. If samples are processed in batches, the coincidence of these events with particular batches confounds the results and renders interpretations unreliable.
The potential diversity of such events is so high that nothing can protect against them except randomization of laboratory procedures, potentially in combination with balanced designs. Hurlbert (1984) notes that most of the time chance events have immeasurably small effects on the results. However, by nature, they are also completely unpredictable, both in frequency and effect size.
As molecular ecology studies mostly work with high observation numbers (thousands of SNPs over genomes, thousands of operational taxonomic units (OTUs) in hundreds of samples, etc.), even small chance events may result in statistically significant results (Carver, 1993). Here, we demonstrate this with taxonomically informative marker gene fragments amplified from environmental DNA (eDNA metabarcoding). The eDNA was preserved in lake sediments and provides a perspective on lake ecosystem history over several decades.
We looked at three aspects of methodological or biological interest: extracted DNA concentration, PCR efficiency, and community properties (Figure 1). We evaluated several sources of variation: (1) expected laboratory biases (DNA extraction kit; Deiner, Walser, Mächler, & Altermatt, 2015; Barlow et al., 2016), (2) unexpected laboratory biases (e.g., a sudden change in laboratory personnel), and (3) an ecologically interesting predictor (either the age of the sediment or the effects of a power plant).

| Sampling
Two sediment cores from the same location in Lake Stechlin were taken on 14 May 2015 with a gravity corer (UWITEC®, Mondsee, Austria) and Perspex tubes (inner diameter 9 cm, length 60 cm). Lake Stechlin (latitude 53°10′N, longitude 13°02′E) is a dimictic meso-oligotrophic lake (maximum depth 69.5 m; area 4.5 km²) in the lowlands of northern Germany. The GDR's first nuclear power plant was built here between 1960 and 1966 and operated until 1990, connecting the lake with the nearby mesotrophic Lake Nehmitz and discharging its cooling water into Lake Stechlin. After coring, the cores were sliced immediately in the field at approximately 0.5-cm intervals. The first core was designated for eDNA. All sampling tools were H₂O₂-sterilized
FIGURE 1 Analysis scheme with predictors of variation in high-throughput-sequenced eDNA amplicon data. The first column lists typical analysis steps and their endpoints, the second column contains options to evaluate these endpoints through measurements, the third column lists several laboratory biases that may exert batch effects, and the fourth column contains biological factors of interest

| Date approximation
Approximate dates were obtained by comparing DDT decomposition compound concentrations with sedimentation rates inferred with ¹³⁷Cs (Casper, 1994)

| DNA extraction
We selected the youngest 21 horizons (the upper 13.5 cm of the core) for DNA extractions. Sample order was randomized before DNA extraction to minimize sampling biases. Four DNA extractions were carried out from each horizon with two commercial kits
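The randomization step described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the procedure actually used in the study: the kit labels `kit_A` and `kit_B`, the split of the four extractions into two per kit, the seed, and the function name `randomized_extraction_plan` are all assumptions introduced for the example.

```python
import random

def randomized_extraction_plan(n_horizons=21, kits=("kit_A", "kit_B"),
                               reps_per_kit=2, seed=2015):
    """Return a list of (processing_position, horizon, kit, replicate).

    Assumption: each horizon is extracted `reps_per_kit` times with each kit
    (2 kits x 2 replicates = 4 extractions per horizon). Kit names and the
    seed are placeholders for illustration only.
    """
    rng = random.Random(seed)  # fixed seed so the plan can be reported and reproduced
    units = [(horizon, kit, rep)
             for horizon in range(1, n_horizons + 1)
             for kit in kits
             for rep in range(1, reps_per_kit + 1)]
    rng.shuffle(units)  # random processing order across horizons and kits
    return [(pos, horizon, kit, rep)
            for pos, (horizon, kit, rep) in enumerate(units, start=1)]

plan = randomized_extraction_plan()
for row in plan[:5]:
    print(row)  # first five extractions in their randomized processing order
```

Recording the seed together with the resulting plan is one way to make the laboratory design reportable alongside the results.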

| Multiplexing strategy and sequencing
We indexed all samples for multiplexed sequencing in a subsequent short PCR with primers that contained a fraction of the Illumina se- (Illumina 2016). This protocol eliminates index jumps during library preparation (although a few index jumps are still known to happen on the sequencing plate; Schnell et al., 2015).
The cycling conditions were 95°C (10 min

| Sequence processing and data analysis
Raw sequence data were processed with OBITools. Potential contamination and false detection biases were con- were run with GNU "parallel" when possible (Tange, 2011 (Hill, 1973) to estimate the effects of potential laboratory biases and biological factors of interest. The first three Hill numbers correspond to species richness (H1), the exponential of Shannon diversity (H2), and the inverse of the Simpson diversity (H3). The identity of the sediment horizon was used as the random effect in these models. We used multispecies generalized linear models (GLMs) with the "mvabund" R package (Wang, Naumann, Wright, & Warton, 2012) to investigate the effects of the predictors on community composition.
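The three Hill numbers named above can be computed from a vector of OTU read counts as follows. This is a minimal sketch and not the code used in the study; the function name `hill_number` and the example counts are invented for illustration. The labels H1, H2, and H3 in the text appear to correspond to the conventional orders q = 0, 1, and 2.

```python
import math

def hill_number(counts, q):
    """Hill number of order q for a vector of non-negative read counts.

    q = 0: richness; q = 1: exponential of Shannon entropy;
    q = 2: inverse Simpson concentration.
    """
    total = sum(counts)
    p = [c / total for c in counts if c > 0]  # relative abundances of detected OTUs
    if q == 1:
        # The general formula is undefined at q = 1; use its limit instead.
        return math.exp(-sum(pi * math.log(pi) for pi in p))
    return sum(pi ** q for pi in p) ** (1.0 / (1.0 - q))

# Hypothetical read counts for one sample (four detected OTUs, one absent).
example_counts = [120, 30, 30, 20, 0]
for order in (0, 1, 2):
    print(order, hill_number(example_counts, order))
```

For a perfectly even community all three orders equal the number of OTUs; the more uneven the read counts, the further the higher orders fall below richness.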
The multispecies GLM cannot handle random effects. The community composition effects were visualized with a latent variable modelbased ordination performed with the boral R package (Hui, 2016).
Both compositional analyses assume a negative binomial distribution of the data, accounting for the sparse and overdispersed nature of read counts (Bálint et al., 2016

Community composition ~ reads + kit + person + nuclear,
where conc is the extracted DNA concentration, weight is the sediment weight used for DNA extraction, kit is the DNA extraction kit, person is the laboratory personnel, depth.nominal is the identity of the sediment horizon, PCR efficiency is estimated from HTS read numbers, and nuclear is the operational period of the nuclear power plant (Figure 1).

| RESULTS
The results are summarized in Table 1

| DISCUSSION
Our results demonstrate that nondemonic intrusions (Hurlbert, 1984) in the laboratory may produce strong, statistically significant effects that may severely confound results. Such effects render unequivocal interpretations impossible if they coincide with effects targeted by the study. For example, interpretation of power plant effects on community composition would be difficult if samples were processed in batches and the sudden change in laboratory personnel coincided with a shift between operation periods. Similarly, strong personnel effects are well known when scoring microsatellite genotypes (or allozyme electrophoresis patterns): scoring is influenced by personal judgment, and authors should also report how they dealt with this. Different pipelines and/or processing parameters (e.g., thresholds to discard rare OTUs) may produce different results in metabarcoding, for example, in diversity measures (Golob, Margolis, Hoffman, & Fredricks, 2017). We expect that all samples within a study are processed with the same pipeline and parameters; thus, the pipeline effect is uniform among all samples, allowing comparisons. However, differences in results caused by pipelines and parameters may influence metastudies. We hypothesize that these effects may reflect the study date due to developments in pipeline use. The uniform reprocessing of the data before such interstudy comparisons is thus important (Meiser, Bálint, & Schmitt, 2014).
The results allow to evaluate whether observed community We do not state that biases of comparable extent always appear in unrandomized, unbalanced laboratory experiments, but they certainly have the potential to do so. This is clear in our example: the effects of unexpected laboratory biases exceed the effects of known laboratory biases (DNA extraction kit effects) and of the biological signal in several models (Figure 2). Such effects potentially influence all molecular ecology studies and threaten the interpretability of results. Their importance and extent are known in biomedicine (Fungtammasan et al., 2015; Lambert & Black, 2012; Leek et al., 2010; Yang et al., 2008), and they urgently need to be considered in molecular ecology.
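The confounding scenario discussed above (a personnel change that coincides with a shift between operation periods) can be illustrated with a small simulation. Everything in it is hypothetical: the assignment of the seven youngest of 21 horizons to the post-shutdown period, the personnel change after the seventh processed sample, and the seed are assumptions chosen for illustration, not values from the study.

```python
import random
from statistics import mean

def pearson(x, y):
    """Pearson correlation of two equally long numeric sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

N_HORIZONS = 21
CHANGE_AFTER = 7  # hypothetical: personnel changes after the 7th processed sample

# Hypothetical predictor: horizons 1-7 (youngest) deposited after shutdown (0),
# horizons 8-21 deposited while the plant was operating (1).
nuclear = [0 if h <= 7 else 1 for h in range(1, N_HORIZONS + 1)]

def person_per_horizon(processing_order):
    """Map each horizon to the person (0 or 1) who processed it."""
    person = {}
    for position, horizon in enumerate(processing_order, start=1):
        person[horizon] = 0 if position <= CHANGE_AFTER else 1
    return [person[h] for h in range(1, N_HORIZONS + 1)]

# (a) Samples processed in depth order: person and period are collinear.
sequential_order = list(range(1, N_HORIZONS + 1))
r_sequential = pearson(person_per_horizon(sequential_order), nuclear)

# (b) Samples processed in randomized order: the collinearity is broken.
randomized_order = sequential_order[:]
random.Random(42).shuffle(randomized_order)
r_randomized = pearson(person_per_horizon(randomized_order), nuclear)

print("sequential:", r_sequential)
print("randomized:", r_randomized)
```

In the sequential case the two predictors are perfectly correlated, so no model can separate a personnel effect from a power plant effect. After randomization the personnel change is spread across both periods and the two effects become estimable.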
Generally, randomization of samples before major laboratory steps (particularly DNA extraction) is simple and low cost. The only case where this might be disputable is the processing of highly contamination-prone materials, where it is almost a laboratory rule that DNA extraction is performed consecutively from the most contamination-prone toward the least contamination-prone samples (although to our knowledge, the validity of this still needs to be tested). Obviously, nondemonic intrusions (including contamination) in the laboratory easily become collinear with the processing order, and this makes biological signals difficult to interpret (Salter et al., 2014).
We recommend the following: First, researchers involved in molecular ecology laboratory work need to properly design and report laboratory procedures. Guidelines exist in biomedicine and may be readily adapted, for example, reporting how and why samples were assigned to certain processing batches and processing timeframes, evaluating technical differences among equipment, and blinding the experiment by concealing information about sample identity from the laboratory personnel (Masca et al., 2015). Second, ecologists who rely on molecular data generated by laboratory personnel or companies must ensure (and should not take for granted) that principles of experimental design are followed in the laboratory. This is easiest when giving samples to a laboratory, as the ecologist can already rearrange and relabel his/

CONFLICT OF INTEREST
The authors state no conflict of interest.

AUTHOR CONTRIBUTIONS
MB conceived the ideas. MB and HPG designed the methodology and obtained the cores. MB and OM performed the molecular laboratory work. MS and RAD performed the organohalogen measurements. MB processed the sequences, analyzed the data, and led the writing of the manuscript. All authors contributed critically to the drafts and gave final approval for publication.