Quantifying the responses of biological indices to rare macroinvertebrate taxa exclusion: Does excluding more rare taxa cause more error?

Abstract Whether to include or exclude rare taxa in bioassessment is controversial, because the choice affects the reliability and accuracy of the results. In the present paper, we hypothesize that biological indices such as the Shannon–Wiener index, Simpson's index, the Margalef index, evenness, BMWP (Biological Monitoring Working Party), and ASPT (Average Score Per Taxon) respond differently to rare taxa exclusion. To test this hypothesis, a benthic macroinvertebrate data set based on studies conducted in China over the last fifteen years was built for suppositional plot analyses. A field study was also conducted in Nansi Lake to perform related analyses. The suppositional plot simulations showed that Simpson's index placed more weight on common taxa than any other index studied, followed by the Shannon–Wiener index, which remained high when rare taxa were excluded. These results indicate that excluding rare taxa had little effect on Simpson's index and the Shannon–Wiener index. Rare taxa played a more important role in the Margalef index and BMWP than in the other indices. Evenness showed an increasing trend, while ASPT varied inconsistently as rare taxa were excluded. The field study also indicated that rare taxa had little impact on the Shannon–Wiener index. Based on the relationships between rare taxa and biological indices examined in our study, we suggest that including rare taxa when using BMWP, and excluding them in the proposed way (e.g., fixed‐count subsampling) when calculating the Shannon–Wiener index and Simpson's index, could raise efficiency and reduce bias in the bioassessment of freshwater ecosystems.

Many researchers have paid attention to rare taxa (Jiang, Song, Xiong, & Xie, 2014) while also wondering whether rare taxa could be removed during subsampling (Chen, Hughes, & Wang, 2015). Gauch (1982) suggested that rare species add noise to the statistical solution. Using common taxa to interpret patterns of disturbance or ecosystem degradation is a current method of bioassessment (Marchant, 2002). However, Cao, Larsen, and Thorne (2001) argued that sample size and the rules for excluding rare species before conducting multivariate analyses need to be evaluated carefully for their unintended influences on the outcome of the analysis. Poos and Jackson (2012) summarized the different viewpoints on the inclusion and exclusion of rare taxa and argued that better justifications for the removal of rare species are needed to move bioassessment forward. Previous studies, however, mostly focused on the responses of multivariate analyses to the inclusion and exclusion of rare taxa (Cao et al., 2001; Poos & Jackson, 2012). Although biological indices are used in many methods of bioassessment, for example multi-metric indices (Chen et al., 2014), the responses of biological indices to rare taxa exclusion are seldom reported. Indices differ in emphasis: some put more weight on common species, while others do not. We therefore hypothesize that biological indices respond differently to rare taxa exclusion.
Before testing our hypothesis, a data set with more species than even a natural "high diversity" site is needed to allow generalization and simulation. Obviously, few current data sets meet these requirements.
To overcome the difficulty of simulations, a newly formed data set of benthic macroinvertebrates in China is utilized in this study. The data set, based on fifteen years (2001-2015) of data, was used to examine the responses of classical diversity indices and macroinvertebrate-based indices to rare taxa exclusion at several levels of rarity and several fixed-count sizes. According to the classic species-area relationship (SAR, Arrhenius, 1921), the number of rare taxa increases (or decreases) as sample size increases (or decreases). Sampling in the field and subsampling in the laboratory therefore determine the number of rare taxa. We excluded rare taxa stepwise by level of rarity to simulate the shrinkage of sample size. We also randomly selected a fixed number of individuals from the total sample to simulate fixed-count subsampling. To test our hypothesis under natural conditions, a field study in Nansi Lake was also conducted. Our work offers a new perspective by combining a large-scale data set with a field study, an approach that has not been used before to test these responses. This work may simplify procedures used during field sampling and subsampling and direct future efforts to develop bias-reduction sampling methods for biological indices.

| Published data collection
The data set originated from scientific literature on macroinvertebrates published in the last fifteen years (2001-2015) in China. Combining these approaches yielded a total of 1,115 papers. After the manual elimination of research on marine and estuary ecosystems, the remaining papers were chosen as our sample population. Second, the chosen studies were classified by period of publication and type of ecosystem in order to collect detailed information for further analyses (Figure 1). A total of 374 studied sites in 219 papers were displayed chronologically on a map using ArcGIS 10.2. Among the 374 studied sites, 220 sites with detailed taxa information were selected as the origin of the data set (Appendix S1), which was treated as a "suppositional plot" in the following analyses. The suppositional plot consisted of all taxa that occurred in the data set, in three groups: Annelida, Insecta, and Mollusca. The "frequency" with which a taxon was present in the data set was regarded as its "abundance" in the suppositional plot. For example, Branchiura sowerbyi was present 133 times across different sites, so the abundance of Branchiura sowerbyi was taken as 133. Crustacea was not listed in the inventory because it was not recorded in most studies. The data from our field research were not included in the data set because the entire data set was based on published data.
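The frequency-as-abundance convention described above can be illustrated with a minimal Python sketch. The function name `build_suppositional_plot`, the input layout (one list of taxon names per site), and the placeholder taxa "Taxon A" and "Taxon B" are assumptions made for illustration; they are not taken from Appendix S1.

```python
from collections import Counter


def build_suppositional_plot(site_taxa_lists):
    """Return {taxon: number of sites in which it occurs}.

    The per-taxon site frequency is then treated as the taxon's
    "abundance" in the suppositional plot.
    """
    frequency = Counter()
    for taxa in site_taxa_lists:
        # set() ensures a taxon is counted at most once per site
        frequency.update(set(taxa))
    return dict(frequency)


# Hypothetical toy input: three sites with their recorded taxa
toy_sites = [
    ["Branchiura sowerbyi", "Taxon A"],
    ["Branchiura sowerbyi", "Taxon B", "Taxon B"],
    ["Branchiura sowerbyi"],
]
toy_plot = build_suppositional_plot(toy_sites)
print(toy_plot)
```

In this toy input, Branchiura sowerbyi occurs at all three sites, so its abundance in the toy plot is 3, in the same way that 133 occurrences give an abundance of 133 in the real data set.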

| Rare taxa exclusion and subsampling simulation
According to our review of the bioassessment methods in the collected papers, diversity indices were much more popular than other methods in China over the past 15 years. Diversity indices such as the Shannon-Wiener index, Simpson's index, the Margalef index, and evenness (Table 1) were calculated from the total sample (true value) in this study. The Biological Monitoring Working Party (BMWP) score, which has been published as a standard method by an international panel (ISO-BMWP, 1979), is a simple and rapid index, but it is not commonly used in China.
We wondered whether it would respond to the exclusion of rare taxa differently from the diversity indices. Hence, BMWP and Average Score Per Taxon (ASPT), which are macroinvertebrate-based qualitative indices, were also included in the calculation (Table 1). The BMWP score is calculated by summing the scores for each family represented in the sample. The ASPT indicates the average sensitivity of the families and is obtained by dividing the BMWP score by the number of taxa present.
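As a rough illustration of the six indices, the sketch below uses common textbook forms. These are assumptions: the exact variants in Table 1 are not reproduced here, and in particular Simpson's index is taken as 1 - Σp², evenness as Pielou's J, and ASPT as the BMWP score divided by the number of scoring families present. The family names and scores are hypothetical placeholders, not the values from Appendix S3.

```python
import math


def shannon_wiener(abundances):
    """H' = -sum(p_i * ln p_i), p_i being the relative abundance of taxon i."""
    total = sum(abundances)
    return -sum((a / total) * math.log(a / total) for a in abundances if a > 0)


def simpson(abundances):
    """Simpson's index in the 1 - sum(p_i^2) form (assumed variant)."""
    total = sum(abundances)
    return 1.0 - sum((a / total) ** 2 for a in abundances)


def margalef(abundances):
    """(S - 1) / ln N; undefined when only one individual is present."""
    richness = sum(1 for a in abundances if a > 0)
    return (richness - 1) / math.log(sum(abundances))


def evenness(abundances):
    """Pielou's J = H' / ln S (assumed variant); undefined when S = 1."""
    richness = sum(1 for a in abundances if a > 0)
    return shannon_wiener(abundances) / math.log(richness)


def bmwp(family_scores, families_present):
    """Sum the tolerance scores of the scoring families present."""
    return sum(family_scores[f] for f in set(families_present) if f in family_scores)


def aspt(family_scores, families_present):
    """BMWP divided by the number of scoring families present."""
    scoring = [f for f in set(families_present) if f in family_scores]
    return bmwp(family_scores, families_present) / len(scoring)


# Hypothetical data: four equally abundant taxa and placeholder family scores
toy_abundances = [10, 10, 10, 10]
toy_scores = {"Family A": 10, "Family B": 5, "Family C": 2}
toy_present = ["Family A", "Family C"]

print(shannon_wiener(toy_abundances), simpson(toy_abundances))
print(margalef(toy_abundances), evenness(toy_abundances))
print(bmwp(toy_scores, toy_present), aspt(toy_scores, toy_present))
```

Note that BMWP and ASPT depend only on which families are present, not on how many individuals were counted, which is why they are described as qualitative indices.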
Two types of simulations were conducted for the exclusion of rare taxa. In the first simulation, we excluded taxa stepwise by level of rarity, calculated the indices for the remaining taxa, and compared the simulated value with the true value. Defining rare taxa is the most important part of this simulation. At present, various criteria exist to define common and rare species (Table 2). We ranked all taxa by frequency and tested all possible criteria for defining rare taxa (Figure 3). Taxa with a frequency lower than 10 were defined as rare taxa in this study (Figure 3). A medium value was chosen as the demarcation of rare taxa for the following reasons: (1) too many taxa would be excluded following A and B; (2) too little output would be obtained following D and E; and (3) C might be the most reasonable selection among the five criteria, but our pre-analysis showed that its output was still slightly more than we expected. Considering that moderate shifts in relative abundance do not affect the general conclusion (Magurran, 2004), ten was taken as the most suitable demarcation of rare taxa in this study. Taxa with a frequency of more than 10 were defined as common taxa, and their details are given in Appendix S2.
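The stepwise exclusion can be sketched as follows, under one possible reading of the procedure: at level k, every taxon with a frequency of k or less is dropped, for k from 1 up to one below the rarity threshold of 10, and the index of the remaining taxa is expressed as a fraction of the true value. The function names and the toy frequencies are hypothetical, and the Shannon-Wiener index is used only as an example index.

```python
import math


def shannon_wiener(abundances):
    """H' = -sum(p_i * ln p_i) over taxa with non-zero abundance."""
    total = sum(abundances)
    return -sum((a / total) * math.log(a / total) for a in abundances if a > 0)


def stepwise_exclusion(plot, rare_below=10, index=shannon_wiener):
    """Return [(level, simulated / true)] for levels 1 .. rare_below - 1.

    At each level, taxa whose frequency is <= level are excluded
    (an assumed interpretation of "stepwise on the level of rarity").
    """
    true_value = index(list(plot.values()))
    results = []
    for level in range(1, rare_below):
        kept = [a for a in plot.values() if a > level]
        if not kept:  # nothing left to evaluate
            break
        results.append((level, index(kept) / true_value))
    return results


# Hypothetical frequencies: three common taxa and four rare ones (< 10)
toy_plot = {"t1": 133, "t2": 40, "t3": 12, "t4": 9, "t5": 3, "t6": 1, "t7": 1}
for level, ratio in stepwise_exclusion(toy_plot):
    print(f"excluded frequency <= {level}: {ratio:.3f} of true value")
```

Passing a different function as `index` (for example a richness-based index) allows the same loop to be reused to compare how strongly each index responds.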
Subsampling is used as an effective way to limit sampling error and reduce workload in a wide range of subjects, including benthic macroinvertebrate sampling (Barbour & Gerritsen, 1996; Doberstein, Karr, & Conquest, 2000; Keen et al., 2014; Petkovska & Urbanič, 2010; Petreman, Jones, & Milne, 2014; Sovell & Vondracek, 1999; Wood & Wilmshurst, 2016). However, little attention has been paid to subsampling in the laboratory in China. Therefore, fixed-count subsampling was conducted in the second simulation. In this simulation, taxa were randomly selected from the total sample under different fixed-count sizes following the selection method in Appendix S4. Because the high richness of the suppositional plot is not available for most studies at the site scale, and the fixed-count size is usually set at 100-300 individuals (Barbour, Gerritsen, Snyder, & Stribling, 1999; Plafkin, Barbour, Porter, & Hughes, 1989), we conducted a pre-analysis to determine the demarcation using the inflection point criterion. Sizes from 100 to 5,200 in increments of 100 were randomly selected from the total sample. The ratio of selected richness to total richness showed significant inflections at 300 and 1,000 individuals (Figure 4). Therefore, the sizes of the fixed-count simulation were set from 300 to 1,000 in increments of 100. The random selection was repeated 30 times at each increment to compensate for the bias of the suppositional plot.
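A minimal sketch of the fixed-count simulation is given below. It assumes simple random sampling of individuals without replacement, which may differ from the selection method in Appendix S4; the function names, the seed, and the toy abundances are hypothetical.

```python
import random
from collections import Counter


def fixed_count_subsample(plot, size, rng):
    """Draw `size` individuals without replacement; return {taxon: count}."""
    # Expand the plot into one entry per individual
    individuals = [taxon for taxon, n in plot.items() for _ in range(n)]
    return Counter(rng.sample(individuals, size))


def mean_richness_ratio(plot, sizes, repetitions=30, seed=0):
    """Mean (subsample richness / total richness) for each fixed-count size."""
    rng = random.Random(seed)
    total_richness = len(plot)
    means = {}
    for size in sizes:
        ratios = [
            len(fixed_count_subsample(plot, size, rng)) / total_richness
            for _ in range(repetitions)
        ]
        means[size] = sum(ratios) / repetitions
    return means


# Hypothetical plot: 5 common taxa (300 each) and 20 rare taxa (2 each)
toy_plot = {f"common_{i}": 300 for i in range(5)}
toy_plot.update({f"rare_{i}": 2 for i in range(20)})

mean_ratio = mean_richness_ratio(toy_plot, range(300, 1001, 100))
for size, ratio in mean_ratio.items():
    print(size, round(ratio, 3))
```

Averaging over 30 repetitions per size mirrors the repetition scheme described above; any index can then be computed on each subsample's counts in place of the richness ratio.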

| Field research
A field study was conducted in Nansi Lake (116°34′E-117°21′E). Shandong Province is characterized by a temperate monsoon climate, with an average annual precipitation of 600 mm and a mean temperature between 13.5°C and 15°C. Nansi Lake has a total surface water area of 1,226 km² and a catchment area of 31,700 km².
The collection of macroinvertebrates was performed using the

TABLE 1 Six biological indices applied in the simulations
Indices | Calculation method | References
... | ... | Ricotta and Avena (2003)
BMWP | Summing the scores for each family (Appendix S3) | ISO-BMWP

TABLE 2 Criterion for defining common and rare taxa
A total of 58 samples, with taxa richness ranging from 6 to 32 per site, were collected during the field research. According to the work of Cao, Williams, and Williams (1998), the indices of disturbed sites will not vary significantly when rare taxa are excluded because few rare taxa exist at these sites. However, we wondered how indices respond to the exclusion of rare taxa at undisturbed sites. Unlike in the suppositional plot and the work of Cao et al. (1998), we did not pool the richness and abundance of all 58 samples to define rare taxa; instead, we excluded the rarest taxon at each site, then the second rarest, and so on. One drawback of this method is that common taxa will be excluded at disturbed sites where no rare taxa appear.
Hence, the results from potentially disturbed sites should not be included in further analyses. Referring to the number of common taxa (n = 15) in the work of Cao et al. (1998) and the average number of rare taxa per site (n = 15) in Heino's work (2008), a site richness of 16 to 30 was considered a reasonable range for our analyses. Indices were calculated for samples from these sites to explore the percentage of simulated values (simulated value/true value). In testing the effects of rare taxa exclusion with the data from this part, the most distinctive index (the Shannon-Wiener index) was used, to avoid repeating the simulation results obtained from the suppositional plot.
We defined 95% of the true value as an acceptable simulated value for practical use in routine bioassessment.
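The per-site exclusion and the 95% acceptance criterion can be sketched as below. This is an illustration under assumptions: taxa are ranked by abundance within a single site, ties among equally rare taxa are broken arbitrarily, and the function names and toy abundances are hypothetical rather than taken from the field data (Appendix S6).

```python
import math


def shannon_wiener(abundances):
    """H' = -sum(p_i * ln p_i) over taxa with non-zero abundance."""
    total = sum(abundances)
    return -sum((a / total) * math.log(a / total) for a in abundances if a > 0)


def percent_of_true(site_abundances, n_excluded):
    """Shannon-Wiener index after dropping the n rarest taxa, as % of true value."""
    ordered = sorted(site_abundances, reverse=True)  # most abundant first
    if n_excluded >= len(ordered):
        raise ValueError("cannot exclude all taxa from a site")
    kept = ordered[: len(ordered) - n_excluded]
    return 100.0 * shannon_wiener(kept) / shannon_wiener(ordered)


def is_acceptable(percent, threshold=95.0):
    """Apply the 95%-of-true-value criterion."""
    return percent >= threshold


# Hypothetical abundances for one site with 11 taxa
toy_site = [120, 80, 60, 40, 30, 20, 10, 5, 2, 1, 1]
for n in range(0, 6):
    pct = percent_of_true(toy_site, n)
    print(n, round(pct, 1), is_acceptable(pct))
```

Because the ranking is done within each site rather than on pooled data, a site with no genuinely rare taxa would lose common taxa first, which is the drawback noted above for disturbed sites.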

| Field research
Based on the results of the suppositional plot simulation, we decided to calculate the Shannon-Wiener index for the field research data (Appendix S6).

The percentages of simulated values (simulated value of Shannon-Wiener index/true value of Shannon-Wiener index) gradually decreased with the exclusion of rare taxa. Under a high level of taxa exclusion (10 taxa excluded), the percentages were all equal to or greater than 95% at the sites with richness ranging from 21 to 30 (Figure 7).

| DISCUSSION
Over 200 studies on benthic macroinvertebrates in freshwater ecosystems were carried out, and the number of investigations increased over the fifteen-year period (Figure 2). As ecosystem restoration progressed, researchers sought more integrated measures to assess the level of recovery. Research in eastern China, where many freshwater ecosystems have suffered various kinds of damage, apparently played a leading role because of the abundant water sources and anthropogenic activities there (Wang, Shen, Niu, & Liu, 2009; Ye, Li, Zhang, & Zhang, 2011). As research increased, the streams and lakes on plateaus in southwestern China (Cao et al., 2016; Wang, Cai, Tang, Yang, & Li, 2012) and even the headwaters on the Tibetan Plateau (Jiang, Xie, et al., 2014; Meng, Jiang, Xiong, Wu, & Xie, 2016; Wu, Zhang, & Wang, 2015) were included in these sorts of studies in the most recent five years. Our analysis shows, encouragingly, that the distribution of the studies became more even.
According to our brief review, the methodology in early studies on macroinvertebrates in China was poorly described or even omitted, and sampling methods were inconsistent. A standard sampling method, for example that of Hughes and Peck (2008), who along with other researchers (Buss et al., 2015; Li, Liu, Hughes, Cao, & Wang, 2014) listed a range of sampling details, is an important step in a study and should be widely adopted in China. By using standard methods, the data could become more interchangeable, and the sampling error could be reduced (Chen et al., 2015). We inferred that subsampling had been used in much Chinese research because of the large numbers of macroinvertebrate individuals and the extensive counting work involved, but the absence of detailed methodological descriptions makes the results difficult to standardize. Moreover, fixed-count subsampling is an efficient method, although shortcomings still exist; for example, subsamples should be drawn at random (Chen et al., 2015), but in practice they are not selected completely at random.
An increase in the calculated indices caused by the deliberate selection of rare species during subsampling usually reduces their sensitivity. An advantage of excluding rare taxa during subsampling, as in our research, is that it avoids such contrived treatment of rare taxa.
Both the rare taxa exclusion simulation and the fixed-count simulation had a relatively high H′ (Tables 3 and 4). However, there are also other problems in the process of bioassessment. Firstly, because the climatic conditions and disturbance levels differed significantly from site to site, the surrounding environment would undoubtedly have various effects on the macroinvertebrate community structure. In our study, only features of community structure were considered, not environmental variables.
Rare taxa accounted for a large portion of the total richness and abundance in the suppositional plot, which means that blindly excluding rare taxa might cause unmeasurable and confounding errors.
In conclusion, our study provides a critical test of the responses of biological indices to rare macroinvertebrate taxa exclusion. The responses of the indices differ from one another when rare taxa are excluded. Our study indicates that including rare taxa when using a rapid qualitative index (BMWP), and excluding them in the proposed way (e.g., fixed-count subsampling) when calculating diversity indices (the Shannon-Wiener index and Simpson's index), could raise efficiency, reduce the workload, and avoid bias in the bioassessment of freshwater ecosystems.

ACKNOWLEDGMENTS
We thank the editor and three reviewers for their constructive comments on the manuscript. We also thank Wiley Editing Service for language polishing. This study is financially supported by the "Major