Consistency of aquatic enclosed experiments: The importance of scale and ecological complexity

Marine and freshwater ecosystems are increasingly threatened by human activities. For over a century, scientists have been testing many biological, chemical and physical questions to understand various ecosystems and their resilience to different stressors. While the majority of experiments were conducted in small-scale laboratory settings, large mesocosm experiments have recently become increasingly common. Yet, it remains unclear how the scale (i.e. space) and ecological complexity (i.e. community versus limited number of species) of experiments affect the results and to what extent different experimental types are comparable.


| INTRODUCTION
Current changes in the Earth's climate and environments are proceeding at unprecedented speed and extent, faster than any previously observed (IPBES, 2019; Stillmann, 2019). Over the last several decades, marine and freshwater ecosystems have been significantly affected by rising temperatures and other anthropogenic pressures, such as increased nutrient flows, overexploitation and the introduction of non-native species (Capinha et al., 2015; Chapman, 2017). Consequently, scientists have been using field observations and experiments, often in combination with mathematical modelling, to understand coastal ecosystems and their resilience to stress (Petersen et al., 2009). Over time, advances in science and technology have led to new and more sophisticated approaches and methods, including valuable improvements in experimental set-ups, data analyses and computation (Evans, 2012; Woodward et al., 2010).
While some ecological research can be successfully conducted by observational studies with specific support of mathematical modelling, other questions require manipulative experiments to prove or refute the tested hypotheses (Stewart et al., 2013; Widdicombe et al., 2010). There is no standard experimental design that fits all research questions, and the approaches used may differ not only in spatial scale, but also in ecological complexity. While spatial scale is defined by variables such as length, area and volume, ecological complexity is characterized by species diversity and levels of ecological organization (Petersen et al., 2009). In this context, research approaches can vary tremendously. On the one hand, there are laboratory experiments, which are usually conducted at small scale and at the individual or single-species level. These experiments often test many of the basic biological, chemical and physical questions in a controlled environment, allowing for high replication (Widdicombe et al., 2010). However, they are often less complex and exclude important ecological and biological components present under natural settings. As they create an artificial environment, their validity often raises concern among the scientific community (Carpenter, 1996; Stewart et al., 2013; Widdicombe et al., 2010). On the other hand, large-scale mesocosm infrastructure and approaches have been significantly improved in recent years, allowing replication of "near-realistic" scenarios that include important variables occurring in natural environments by using subsets of natural ecosystems (e.g. Dzialowski et al., 2014; Kraufvelin et al., 2006, 2010, 2020; Pansch & Hiebenthal, 2019; Wahl et al., 2015).
By creating almost natural ecological and biological dynamics of ecosystems while the tested variables are manipulated and controlled, the results and conclusions deduced from mesocosm experiments are usually assumed to be more reliable and predictive than those of laboratory tests (Kraufvelin et al., 2006, 2010, 2020; Petersen et al., 1999; Widdicombe et al., 2010). However, it should be emphasized that although mesocosm experiments use a subsample of a natural ecosystem, they are still a simplification of nature, and this should be considered when drawing wider conclusions from the observed results and when forecasting future scenarios (Petersen et al., 2009; Widdicombe et al., 2010).
Extreme isolated events associated with global warming, namely heatwaves, have recently drawn the attention of the scientific community due to their increasing frequency worldwide in both marine and freshwater ecosystems (Holbrook et al., 2019; Huber et al., 2012; Oliver et al., 2018). Heatwaves are isolated warm events that last for five or more days at temperatures warmer than the 90th percentile based on a 30-year historical baseline period (Hobday et al., 2016).
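The definition above can be illustrated with a minimal sketch. It assumes a single fixed threshold taken as the 90th percentile of a baseline temperature series, which is a simplification made here for illustration; the function names `percentile` and `find_heatwaves` and the example values are hypothetical and are not taken from Hobday et al. (2016).

```python
from typing import List, Tuple


def percentile(values: List[float], q: float) -> float:
    """Linear-interpolation percentile (q in 0..100) of a non-empty list."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100.0
    lower = int(pos)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (pos - lower)


def find_heatwaves(daily_temps: List[float], threshold: float,
                   min_days: int = 5) -> List[Tuple[int, int]]:
    """Return inclusive (start, end) day indices of every run of at least
    `min_days` consecutive days warmer than `threshold`."""
    events = []
    start = None  # index of the first day of the current warm run, if any
    for day, temp in enumerate(daily_temps):
        if temp > threshold:
            if start is None:
                start = day
        else:
            if start is not None and day - start >= min_days:
                events.append((start, day - 1))
            start = None
    # Close a warm run that extends to the end of the series.
    if start is not None and len(daily_temps) - start >= min_days:
        events.append((start, len(daily_temps) - 1))
    return events


if __name__ == "__main__":
    # Hypothetical baseline and observation series (degrees C).
    baseline = [float(t) for t in range(1, 101)]
    threshold = percentile(baseline, 90)
    observed = [80, 95, 96, 97, 98, 99, 85, 95, 96, 80]
    print(threshold, find_heatwaves(observed, threshold))
```

Only the first warm run in the example series spans five days; the second lasts two days and is therefore not counted as a heatwave under this definition.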
There are summer and winter heatwaves, with many of the former having devastating impacts on ecosystems, while some of the latter are even beneficial (Cavole et al., 2016; Hobday et al., 2016). In 2003, one of the first documented impacts of a heatwave occurred in the Northwestern Mediterranean region, causing extensive mortality among numerous benthic communities (Garrabou et al., 2009). Since then, several studies have reported similar events worldwide, such as the Western Australia heatwave in 2011 (Pearce & Feng, 2013) and the Northwest Atlantic heatwave only one year later (Mills et al., 2013).
Field observations have documented strong responses of marine and freshwater environments to summer heatwaves, such as toxic cyanobacteria blooms (Joehnk et al., 2008), mass coral bleaching (Hughes et al., 2017) and extensive mortalities of important commercial fish species (Caputi et al., 2016). As these isolated climatic events can negatively affect aquatic communities and are expected to increase in severity and frequency (Cavole et al., 2016; Smale et al., 2019), it is of great importance to understand the responses and resilience of ecosystems to this climatic abnormality in order to better protect coastal habitats (Frölicher et al., 2018; Sorte et al., 2010).
Although a great number of studies have been conducted to determine the responses of single species or communities to global warming and other anthropogenic impacts using mesocosm and laboratory experiments (e.g. Casties et al., 2019; Madeira et al., 2018; Pansch et al., 2018; Wahl et al., 2020), it remains unclear how the type of experiment (i.e. scale and ecological complexity) affects the outcome and to what extent the two types of experiments are comparable. In this study, we conducted two experiments at different levels of scale and ecological complexity: (a) an outdoor large-scale community-level mesocosm experiment and (b) a small-scale two-species laboratory experiment, to assess the effects of heatwaves on two gammarid species from the Baltic Sea. To compare the results of the two types of experiments, relative population growth was calculated for each species and each experimental type after three months of rearing animals in the different set-ups. We tested the null hypotheses assuming no difference in population growth: (a) for any of the species between the two experimental types; (b) between the two species in each experimental type; and (c) among different treatments.

| Specimen collection
Two species from the superfamily Gammaroidea (i.e. Gammarus locusta and G. salinus) were collected in April and May 2015 for the mesocosm experiment and in April and May 2016 for the laboratory experiment. Gammarus locusta was collected in Falkenstein, Germany (54°40' N 10°20' E), while G. salinus was collected in Kiel, Germany (54°33' N 10°15' E); the two sampling locations are less than 10 km apart. Specimens were transported in their ambient water to the laboratories at GEOMAR in Kiel, where each individual was morphologically identified according to Köhn and Gosselck (1989

| Experimental set-up
To determine whether the type of the experiment affects the results of the experiment, we conducted two experiments using dif-

The experimental set-up of the laboratory experiment consisted of six water baths (52 L each), with two experimental tanks (13.5 L each) set inside each bath (Figure 1b, d). The water temperature in the experimental tanks was manipulated by regulating the temperature of the water baths, following the same pattern as in the mesocosm experiment (see below; Figure 2). We acknowledge that this two-by-two block design may have had some effect on our results, but it was necessary for temperature regulation. Water in the experimental tanks was completely exchanged approximately every hour by a constant flow-through of filtered seawater (20 µm).
Twenty individuals each of G. locusta and G. salinus were added to each experimental tank. We emphasize here that there was a large difference between the density of gammarids per volume of water

| Heatwave treatments
The experimental design consisted of three treatments: (a) control; (b) one heatwave; and (c) three heatwaves. Each treatment consisted

FIGURE 1 Overview of the two types of experimental set-up: (a) mesocosm tank with tested community from above (see Pansch et al., 2018 and Wahl et al., 2015), (b) laboratory setting including the main water storage tank with water distribution hoses leading to the experimental tanks, (c) a scheme of the mesocosm tanks (Wahl et al., 2015), and (d) laboratory tanks with animals and artificial refugia structures from above

Each heatwave lasted nine days: temperature increased during the first three days, reached the peak phase on the fourth day and stayed at the peak for four days, followed by two days of cooling. The peak phase of the first two heatwaves was 3.6°C, while that of the third one was 5.2°C higher than the control treatment (for details see Pansch et al., 2018). For more details on the heating system of the mesocosm experiment, see Wahl et al. (2015). In the laboratory experiment, the temperature inside the experimental tanks was manually adjusted daily using aquaria heaters submerged in the water baths (Aqua Medic titanium heaters 100W), with submersed pumps ensuring homogeneous mixing of water inside the baths (Figures 1 and 2). The temperature inside each water bath
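The nine-day heatwave schedule described above (three days of warming, four days at the peak, two days of cooling) can be sketched as a list of daily temperature offsets relative to the control. The linear warming and cooling steps are an assumption made for illustration, since the exact daily increments are not stated here, and the function name `heatwave_offsets` is hypothetical.

```python
from typing import List

# Peak offsets (degrees C above control) of the three heatwaves, as stated in the text.
PEAKS = [3.6, 3.6, 5.2]


def heatwave_offsets(peak: float, ramp_days: int = 3, peak_days: int = 4,
                     cool_days: int = 2) -> List[float]:
    """Daily offsets above the control temperature for one heatwave.

    Assumption: warming and cooling proceed in equal daily steps.
    """
    # Warming days stay below the peak; the peak is reached on the next day.
    ramp = [peak * i / (ramp_days + 1) for i in range(1, ramp_days + 1)]
    plateau = [peak] * peak_days
    # Cooling days step back down towards the control temperature.
    cooling = [peak * (cool_days + 1 - i) / (cool_days + 1)
               for i in range(1, cool_days + 1)]
    return ramp + plateau + cooling


if __name__ == "__main__":
    for number, peak in enumerate(PEAKS, start=1):
        profile = [round(x, 2) for x in heatwave_offsets(peak)]
        print(f"heatwave {number}: {profile}")
```

Each profile has nine entries, with the peak offset held from the fourth to the seventh day.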

| Statistical Analysis
First, we calculated the percentage of population growth for each replicate of each treatment in each experimental type following the equation:

where R is the per cent change per unit time (i.e. growth rate) and k is the fractional change per unit time (Bartlett, 1993).

Data visualization was conducted with the "ggpubr" and "ggplot2" packages in R (Kassambara, 2018; Wickham, 2016).
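As a rough illustration of how R and k relate, the sketch below assumes continuous exponential growth, with k = ln(N_t / N_0) / t and R = 100 k. The exponential form, the function names and the example counts are assumptions made for illustration and are not necessarily the exact calculation used in this study.

```python
import math


def fractional_growth(n_start: int, n_end: int, duration: float) -> float:
    """Fractional change per unit time, k, ASSUMING exponential growth:
    n_end = n_start * exp(k * duration)."""
    if n_start <= 0 or n_end <= 0 or duration <= 0:
        raise ValueError("counts and duration must be positive")
    return math.log(n_end / n_start) / duration


def percent_growth(n_start: int, n_end: int, duration: float) -> float:
    """Per cent change per unit time, R, taken here as 100 * k (assumption)."""
    return 100.0 * fractional_growth(n_start, n_end, duration)


if __name__ == "__main__":
    # Hypothetical replicate: 20 individuals at the start, 40 after 3 months.
    print(round(percent_growth(20, 40, 3.0), 2), "% per month")
```

Under this assumption a replicate whose abundance is unchanged between the start and the end of the experiment has R = 0, and a declining population has a negative R. A replicate ending with zero individuals cannot be handled by the logarithm and is rejected by the sketch.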

| RESULTS
In the mesocosm experiment, there was no significant difference in the percentage of population growth between the two species

| DISCUSSION
While laboratory experiments are mostly focused on specific physiological and ecological responses of individuals or a single species to environmental changes, mesocosms embrace a higher complexity by including assemblages of a population subset, which increases the possibility of biological interactions and "ecosystem realism" (Stewart et al., 2013; Widdicombe et al., 2010). However, there is still considerable uncertainty with regard to both types of experiments, and to what extent their results can be extrapolated and generalized with confidence. Our comparative assessment of the mesocosm and laboratory experiments revealed that while for one species the results were similar regardless of the experimental type, for the other species the larger area of the mesocosm, together with the inclusion of the community, benefited the species' growth rate, resulting in stronger performance in the mesocosm than in the laboratory experiment. We acknowledge, however, that our study design did not allow us to distinguish whether the scale or the ecological complexity of the experiments, or both, caused the observed discrepancy between the two types of experiments. We also acknowledge that we did not statistically compare the experiments because of a potential confounding effect of time, as the experiments were conducted in different years; instead, we compared the results qualitatively. At the same time, our results revealed no difference in the heatwave impacts on any of the tested species regardless of which experimental type was used.
The potential impacts of climate change on coastal marine environments and freshwater ecosystems have been extensively studied since the early 1990s, with most studies focusing on the species level (Harley et al., 2006; Wrona et al., 2006 and references therein). More recently, much more information from mesocosm experiments has become available, which has been used, in some way, to calibrate and confirm that laboratory experiments properly represent natural ecosystems and their interactions (Schindler, 1998).
Interestingly, our results have led us to two different conclusions for the two studied species that were tested using different experimental types. Independently of the heatwave treatment, G. salinus results were similar regardless of the experimental type, while for G. locusta, our study revealed differences between the two experimental types, with much poorer performance of individuals under laboratory conditions. Therefore, if one were to study competition between these two species under current and/or future global warming scenarios under laboratory conditions, one could conclude that G. salinus would outcompete G. locusta or reduce its population size. Considering that the settings in our study were exactly the same for both species in both experimental types, we believe that G. locusta is more sensitive to the laboratory setting than G. salinus. Yet, our study design does not differentiate whether the scale or the ecological complexity of the experiments, or both, were responsible for the observed results. Under low-salinity conditions, several laboratory studies, for instance Bulnheim (1979) and Paiva et al. (2018), have determined that G. locusta is the most sensitive species among the Baltic gammarids, with the lowest capacity to survive. In addition, this species shows a much higher oxygen intake when exposed to such stress in comparison with other species, which seems to explain its absence from polluted areas (Bulnheim, 1979; Costa & Costa, 2000). Another explanation could be that the small space of the tanks used in the laboratory experiment, in addition to the absence of natural predators and food availability, triggered possible fighting and cannibalistic behaviour within and among species. Our laboratory experiment was started at a density of each species two orders of magnitude higher than that in the mesocosm experiment. Therefore, density-dependent effects experienced under laboratory conditions may have been different from those in the mesocosms, causing bias in the observed results. As reported by Dick (1995), [...] and might prey on congenerics in order to gain nutritional benefits.
Consequently, though artificial refuges were provided in our laboratory experiment, these might have been ineffective due to the high density of individuals and the consequent density-dependent effects, since the two studied species are potential competitors for both space and food. Because at the end of the experiment the majority of G. locusta individuals were adults, and the abundance of G. locusta was unchanged between the beginning and the end of the experiment, we believe that predation on its juveniles occurred, affecting the population growth rate of the species. Finally, due to the unpredictable behaviour of species in highly artificial environments, such as laboratory experiments, we emphasize the need for great caution when testing and interpreting results on species interactions and/or the impacts of species on each other.

TABLE 1 Number of individuals in each replicate for both species (Gammarus locusta and G. salinus) at the beginning and at the end of both experiments (i.e. mesocosm and laboratory) in all three treatments (i.e. control, one heatwave and three heatwaves), and the respective population growth (R). Columns: Control, One heatwave, Three heatwaves.
While the overall aim of experiments is to provide essential knowledge of current and future threats to diverse communities or to study species and/or communities in general, the conducted experiments may not always be representative of natural systems (Cooke et al., 2017; Kraufvelin et al., 2006, 2010, 2020; Widdicombe et al., 2010). In fact, our study raised an important question about the reliability of our own laboratory experiment and of laboratory experiments in general. In the mesocosm experiment, half of the tested species demonstrated tolerance to heatwaves, which included both G. locusta and G. salinus, with only a few species responding strongly negatively (see Pansch et al., 2018). However, in apparent contrast to the lack of sensitivity of G. locusta to heatwaves observed in both experimental types in our study, where temperature reached 25.2°C during the last heatwave, previous laboratory studies found high mortality rates of this species above 20°C and 22°C and suggested that future global warming scenarios would exceed the thermal limit of the species (Cardoso et al., 2018; Neuparth et al., 2002).
Similarly, Marenzelleria viridis, a successful invader in Baltic waters, showed a positive effect of heatwave treatments on both its biomass and abundance in our mesocosm experiment (see Pansch et al., 2018). However, Bochert et al. (1996), using laboratory experiments, found a temperature of 20°C to be too high for proper development of the species, suggesting abnormal growth during the larval phase. Therefore, while the laboratory experiment of our study produced results similar to those of the mesocosm experiment when testing the tolerance of species to heatwaves, our findings contrasted with those of previously conducted laboratory experiments testing the resilience of those species using constantly elevated temperatures (Bochert et al., 1996; Cardoso et al., 2018; Neuparth et al., 2002; Pansch et al., 2018). Though some organisms may, or may not, show resistance in controlled laboratory experiments when exposed to single or multiple stressors, their sensitivity may change when exposed to other factors, such as the complex physical components and biotic interactions of natural environments (Sommer et al., 2012).
Although such laboratory experiments may improve our knowledge of the physiological responses of individuals, they are not a true replication of what occurs in nature. In fact, our study strongly indicated that the same species may respond differently when tested at high density and isolated from a community than when at lower density and in the presence of a subset of a community, confirming a recent finding by Wahl et al. (2020). Furthermore, inconsistency in results among laboratory experiments complicates the extrapolation and generalization of laboratory results even more. Our findings indicate the importance of scale, density, biotic interactions and complexity of natural environments in buffering, or boosting, the direct effects of environmental stress on organisms. Therefore, we urge the use of mesocosm experiments whenever possible, and emphasize the need for great care when interpreting and generalizing the results of laboratory experiments.

ACKNOWLEDGEMENTS
We are grateful for financial support from the Alexander von Humboldt Sofja Kovalevskaja Award to EB. Special thanks to F.

PEER REVIEW
The peer review history for this article is available at https://publons.com/publon/10.1111/ddi.13213.

DATA AVAILABILITY STATEMENT
The primary data set containing experimental results is available