Robust online active learning

In many industrial applications, obtaining labeled observations is not straightforward, as it often requires the intervention of human experts or the use of expensive testing equipment. In these circumstances, active learning can be highly beneficial in suggesting the most informative data points to be used when fitting a model. Reducing the number of observations needed for model development alleviates both the computational burden required for training and the operational expenses related to labeling. Online active learning, in particular, is useful in high-volume production processes where the decision about the acquisition of the label for a data point needs to be taken within an extremely short time frame. However, despite the recent efforts to develop online active learning strategies, the behavior of these methods in the presence of outliers has not been thoroughly examined. In this work, we investigate the performance of online active linear regression in contaminated data streams. Our study shows that the currently available query strategies are prone to sampling outliers, whose inclusion in the training set eventually degrades the predictive performance of the models. To address this issue, we propose a solution that bounds the search area of a conditional D-optimal algorithm and uses a robust estimator. Our approach strikes a balance between exploring unseen regions of the input space and protecting against outliers. Through numerical simulations, we show that the proposed method is effective in improving the performance of online active learning in the presence of outliers, thus expanding the potential applications of this powerful tool.


Introduction
Predictive models often need to be trained on a large amount of labeled data before being deployed. However, in industrial applications, data is often abundant only in unlabeled form. Active learning strategies provide a solution to this problem by prioritizing the labeling of the most useful instances for building the model, thus accelerating the convergence of its learning curve [1]. Active learning problems can be classified into three macro-scenarios [2]. The first and most studied is the pool-based scenario, where the learner selects the most useful instances to be labeled by maximizing an evaluation criterion over a closed set of observations. The second scenario is referred to as membership query synthesis, and it allows the learner to query the labels of synthetically generated instances rather than instances sampled from the process distribution. Finally, the third scenario is online, or stream-based, active learning [3]. In this case, the unlabeled observations are drawn sequentially by the learner, which must immediately decide whether to keep each instance and query its label or discard it. While many researchers have been working on active learning in recent years, the pool-based scenario has received the most attention [4,5]. Although online active learning has become more popular in the last few years [6-10], the majority of the methods have been developed for classification tasks [11]. An interesting approach to online active learning for fuzzy regression models has been proposed by Lughofer [12]. Other researchers have tried to adapt the optimality criteria of experimental design theory to the online active linear regression framework [13-16]. Linear regression models are still very useful in industrial applications, as they can be efficiently trained on a small number of observations. They offer a straightforward interpretation, along with the possibility of constructing confidence intervals on the parameter estimates
[17,18]. They can also be easily coupled with variable selection and robust estimation methods. Furthermore, whereas many pool-based active learning approaches employ ensemble methods or complex models, linear models can support online active learning thanks to the reduced computational cost associated with model training and updating. The main difference among the query strategies lies in how they assess the usefulness of an unlabeled instance when the learner samples it from the data stream. Another important aspect is the set of assumptions on the input distribution. Indeed, despite the increased interest in the online active linear regression framework, the performance of the sampling strategies in the presence of outliers has not been thoroughly explored. The few works we are aware of that analyze this issue are related to the pool-based scenario. Deldossi et al. [19] highlighted how sampling methods based on D-optimality are affected by outliers and high leverage points. Zhao et al. [20] focused on robust active representations based on $\ell_{2,p}$-norm constraints for selecting highly representative data. Finally, He et al. [21] emphasized the problem of being prone to sampling outliers while proposing a semi-supervised active learning strategy for multivariate time series classification, based on uncertainty and local density.
In this paper, we study the problem of learning from contaminated data streams with limited sampling resources. We first investigate the effects of outliers on the sampling decisions made by state-of-the-art online active learning approaches for linear regression, and then propose a solution to this issue. It should be noted that the presence of outliers considered in this work cannot be tackled using traditional anomaly detection methods. Indeed, most unsupervised anomaly detection strategies rely on the assumption that a large training set free from outliers, usually referred to as phase I data in the statistical process control literature, is available beforehand [22-25]. However, this assumption is violated in many practical applications [26], especially in label-scarce scenarios where few to no labels are available before the beginning of the active learning routine. The proposed strategy for online active learning utilizes a double-threshold approach to limit the search area of a conditional D-optimality (CDO) algorithm. By using two thresholds, the strategy aims to identify informative data points while excluding outliers. In highly contaminated environments, robust estimators based on the Huber and Tukey bisquare losses are employed.
The remainder of this paper is organized as follows. In Section 2, we introduce the terminology and describe the sampling strategies that are used as the baseline in our analysis. Section 3 offers a review of the use of robust estimators and introduces ways of modifying the CDO algorithm. In Section 4, we test our approach using numerical simulations in four scenarios with different contamination ratios. Section 5 offers a discussion of the results obtained. Finally, Section 6 provides some conclusions.

Background and related work
The labeled observations that are collected from the contaminated data stream are used to fit a linear model of the form $y = X\beta + \varepsilon$, where $y$ is an $n \times 1$ vector of response variables, $X$ is an $n \times p$ model matrix, $\beta$ is a $p \times 1$ vector of regression coefficients, and $\varepsilon$ is an $n \times 1$ vector representing the zero-mean Gaussian noise. Here, $n$ represents the total number of observations and $p$ the number of variables. Before starting the active learning routine and the collection of additional labels, we assume to have at our disposal an initial set of labeled observations, as in [5,27,28]. This set is used to obtain an initial estimate $\hat{\beta}$ of the coefficients $\beta$. Using an ordinary least squares (OLS) estimator, we have $\hat{\beta} = (X^\top X)^{-1} X^\top y$. Then, the fitted linear regression model is $\hat{y} = X\hat{\beta}$, and the residuals are obtained as $e = y - \hat{y}$.
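To make the notation concrete, the initial OLS fit and its residuals can be sketched as follows. This is a minimal illustration with synthetic data; `fit_ols` is a hypothetical helper, and `lstsq` is used instead of an explicit matrix inverse for numerical stability.

```python
import numpy as np

def fit_ols(X, y):
    """Ordinary least squares: beta_hat = (X'X)^{-1} X'y.

    lstsq solves the same normal equations without forming
    the inverse explicitly.
    """
    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    residuals = y - X @ beta_hat
    return beta_hat, residuals

# Toy usage: recover known coefficients from noiseless data.
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 3))
beta = np.array([1.0, -2.0, 0.5])
y = X @ beta
beta_hat, e = fit_ols(X, y)
```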
When the variables are highly correlated, pre-whitening may be performed to avoid an ill-conditioned problem when computing $(X^\top X)^{-1}$. It should be noted that the matrix $X^\top X$ is important for obtaining information about the design geometry. In particular, for a design composed of $n$ runs, the moment matrix $M = (X^\top X)/n$ plays a central role in the definition of optimal experimental designs. The two most commonly employed optimality criteria, which have been adapted to the online active learning scenario, are A-optimality and D-optimality. An A-optimal design is achieved by minimizing the trace of the inverse of the moment matrix $M$. It can be shown that this corresponds to minimizing the individual variances of the estimated coefficients. This approach has been adapted to the online active linear regression framework by Riquelme et al. [14]. They proposed a norm-thresholding algorithm that only selects observations $x$ with a large scaled norm, by estimating a threshold $\Gamma$ such that $P(\|x\| \geq \Gamma) = \alpha$, where $\alpha$ is the ratio of observations we are willing to label out of the incoming data stream. The probability distribution of the norms can be approximated using kernel density estimation (KDE) on a set of unlabeled observations $C$, which can be regarded as a warm-up or calibration set and can be retrieved either from historical data or by observing the data stream for a while. Using this thresholding approach, we sample, with high probability, observations that help achieve A-optimality. Given $n$ statistics $(s_1, \ldots, s_n)$, KDE can be used to estimate the shape of an unknown distribution $f$ using $$\hat{f}_h(s) = \frac{1}{nh} \sum_{i=1}^{n} K\!\left(\frac{s - s_i}{h}\right),$$ where the bandwidth $h$ is a positive number that controls the amount of smoothing, and the kernel $K$ is a smooth function such that $K(s) \geq 0$, $\int K(s)\,ds = 1$, $\int sK(s)\,ds = 0$, and $\sigma^2_K \equiv \int s^2 K(s)\,ds > 0$.
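As a rough sketch of the norm-thresholding idea, one can fit a Gaussian KDE to the calibration norms and invert the estimated CDF on a grid. The function and variable names here are illustrative, not taken from the original algorithm, and the grid-based inversion is a simplification.

```python
import numpy as np
from scipy.stats import gaussian_kde

def norm_threshold(calibration_X, alpha):
    """Estimate Gamma so that P(||x|| >= Gamma) is roughly alpha:
    fit a Gaussian KDE to the norms of the calibration observations
    and take the (1 - alpha) quantile of the estimated density."""
    norms = np.linalg.norm(calibration_X, axis=1)
    kde = gaussian_kde(norms)
    grid = np.linspace(norms.min(), norms.max(), 2000)
    cdf = np.cumsum(kde(grid))
    cdf /= cdf[-1]                       # normalize to a CDF on the grid
    return grid[np.searchsorted(cdf, 1.0 - alpha)]

rng = np.random.default_rng(1)
C = rng.normal(size=(500, 20))           # unlabeled warm-up set
gamma = norm_threshold(C, alpha=0.05)
# A new point would be queried only if np.linalg.norm(x) >= gamma.
```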
In this paper, the Gaussian (normal) kernel $K(s) = (2\pi)^{-1/2} e^{-s^2/2}$ is used. D-optimality is another fundamental criterion [29], which takes both the variances and the covariances of the model coefficients into account by maximizing the determinant of the moment matrix $M$. As in the case of A-optimality, D-optimality has been adapted to the online active learning scenario with the proposal of a conditional D-optimality (CDO) algorithm [16]. CDO suggests setting a threshold $\Gamma$ on the quantity $x_{l+1}^\top (X_l^\top X_l)^{-1} x_{l+1}$, where $X_l$ is the model matrix with the $l$ labeled observations currently available and $x_{l+1}$ is the unlabeled data point under evaluation. It can be shown that by selecting observations that maximize $x_{l+1}^\top (X_l^\top X_l)^{-1} x_{l+1}$, we are at the same time seeking D-optimality and labeling observations with a large unscaled prediction variance (UPV) [30], which is generally defined as $$\mathrm{UPV}(x^{(m)}) = x^{(m)\top} (X^\top X)^{-1} x^{(m)},$$ where $x^{(m)}$ represents the data point where the UPV is being estimated, expanded to the model form (e.g., if polynomial features are added to the model). To estimate the threshold $\Gamma$, we use KDE after computing the UPV of all the observations in $C$.
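The quantities involved in the CDO criterion can be sketched as follows; note that the empirical percentile is used here as a simple stand-in for the KDE-based threshold estimate described above, and the helper names are illustrative.

```python
import numpy as np

def upv(x, X_l):
    """Unscaled prediction variance x' (X_l' X_l)^{-1} x of a
    candidate point, given the currently labeled design X_l."""
    M_inv = np.linalg.inv(X_l.T @ X_l)
    return float(x @ M_inv @ x)

def cdo_threshold(X_l, calibration_X, alpha):
    """CDO threshold sketch: the (1 - alpha) empirical percentile
    of the calibration UPVs, in place of the KDE-based estimate."""
    upvs = np.array([upv(v, X_l) for v in calibration_X])
    return np.percentile(upvs, 100 * (1 - alpha))

rng = np.random.default_rng(2)
X_l = rng.normal(size=(25, 5))           # labeled design so far
C = rng.normal(size=(500, 5))            # calibration set
gamma = cdo_threshold(X_l, C, alpha=0.05)
x_new = rng.normal(size=5)
query = upv(x_new, X_l) >= gamma         # query only if informative
```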
The CDO intuition is coherent with the idea that a point with a large UPV value represents a less explored region of the input space and will help, with high probability, in attaining D-optimality, conditional on the already collected observations. The equivalence between sampling data points with high UPV and D-optimality is demonstrated in [16]. Given these preliminaries, we now propose methods that are robust to the presence of outliers in the data stream.

Methods
When training a linear regression model on a dataset corrupted by the presence of outliers, a simple yet effective solution is to resort to robust estimators. An extensive overview of robust regression has been provided by Fox and Weisberg [31]. In general, robust estimation methods estimate the coefficients $\beta$ by minimizing a loss function of the form $$\sum_{i=1}^{n} \rho(e_i),$$ where $\rho$ is a function that regulates the contribution of each residual to the loss, and $e_i$ is the residual for the $i$th observation $(x_i, y_i)$. The function $\rho$ is nonnegative, equal to zero when its argument is zero, symmetric, and monotone in $|e|$. In the case of an OLS estimator, the loss is given by $\rho_{\mathrm{OLS}}(e) = e^2$. It can be seen that the objective function minimized by an OLS estimator is equally affected by all the observations for which we measure the residuals. Instead, robust estimators try to reduce the impact of observations with very large residuals on the estimation of $\beta$. One of the most popular robust loss functions is the Huber loss [32], defined as $$\rho_H(e) = \begin{cases} \frac{1}{2}e^2 & \text{for } |e| \leq k \\ k|e| - \frac{1}{2}k^2 & \text{for } |e| > k, \end{cases}$$ where $k$ is a tuning parameter, usually set to $1.345\sigma$ to achieve 95% efficiency when the errors are normally distributed while keeping good protection against outliers [31]. It can be seen how the contribution of each observation is reduced based on the magnitude of the corresponding residual. However, despite being much more robust than the OLS loss, the Huber loss is still proportional to the magnitude of the residuals even when the absolute errors are larger than $k$. Conversely, the Tukey bisquare loss function [33] sets a threshold for the residuals, above which the value of the residuals does not influence the loss. The Tukey loss function is given by $$\rho_T(e) = \begin{cases} \frac{k^2}{6}\left[1 - \left(1 - (e/k)^2\right)^3\right] & \text{for } |e| \leq k \\ \frac{k^2}{6} & \text{for } |e| > k, \end{cases}$$ where the value of the tuning constant $k$ is usually set to $4.685\sigma$ [31]. Besides using a Huber or Tukey loss to obtain a robust estimator, we consider the possibility of filtering out outliers while selecting the most informative observations from the data stream. To this
end, we propose an adaptation of the CDO algorithm where, instead of estimating a single threshold, we define a bounded area of interest for the unscaled prediction variance of an observation: $$\Gamma_1 \leq \mathrm{UPV}(x) \leq \Gamma_2.$$ This approach is hereinafter referred to as bounded CDO. The idea is coherent with the method proposed by Hoaglin and Welsch [34,35] of considering as potential outliers the observations for which $x_i^\top (X^\top X)^{-1} x_i \geq 2p/n$. The filtering approach suggested by Hoaglin and Welsch is also used by Deldossi et al. [19] in the offline scenario. Here, instead of opting for a fixed value of $\Gamma_2$, we use KDE with a Gaussian kernel to estimate $\Gamma_1$ and $\Gamma_2$. The upper limit $\Gamma_2$ is selected by determining a cut-off value $c$, which is related to the amount of protection against outliers that we would like to achieve. This value is a tuning constant similar to the $k$ used by robust estimators and, when possible, should be selected by exploiting previous knowledge of the process. Given the cut-off value $c$ and the sampling rate $\alpha$, $\Gamma_2$ is given by the $100(1-c)$th percentile, and $\Gamma_1$ by the $100(1-c-\alpha)$th percentile. As anticipated in Section 2, the threshold estimation is based on a set of unlabeled data, which is also used to estimate the covariance matrix $\Sigma$ and whiten the observations to remove dependencies and facilitate the estimation of $\beta$. At this stage, semi-supervised methods might also be considered to perform tasks like feature extraction and to exploit all the information available in the unlabeled data [21,36-39].
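A minimal sketch of the double-threshold estimation, again substituting empirical percentiles for the KDE-based estimates described in the text; the chi-square draw is only a stand-in for real calibration UPV values.

```python
import numpy as np

def bounded_cdo_thresholds(upvs_calibration, alpha, c):
    """Bounded CDO limits: Gamma_2 is the 100(1-c)th percentile
    (outlier guard) and Gamma_1 the 100(1-c-alpha)th percentile
    (chosen so a fraction alpha of points falls in the band)."""
    g2 = np.percentile(upvs_calibration, 100 * (1 - c))
    g1 = np.percentile(upvs_calibration, 100 * (1 - c - alpha))
    return g1, g2

rng = np.random.default_rng(3)
upvs = rng.chisquare(df=5, size=500)     # stand-in UPV values
g1, g2 = bounded_cdo_thresholds(upvs, alpha=0.05, c=0.05)

def query(u, g1=g1, g2=g2):
    # Label only points whose UPV falls inside the band.
    return g1 <= u <= g2
```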
Algorithm 1 provides a detailed explanation of how to implement the bounded CDO strategy for online active learning in a fixed-budget setting. The strategy involves collecting new labels and incorporating them into the design until a specified budget constraint $B$ is reached. In some cases, it might be beneficial to stop the active learning routine early if the marginal improvement of the model is no longer significant [40]. Previous studies have proposed various stopping criteria to enhance the efficiency of data collection schemes based on active learning [41-45]. Appendix A explores how some of these approaches could be adapted to the regression framework. From a computational standpoint, $\hat{\beta}$ is updated by means of a complete retraining each time a new labeled example is added to the design. However, if the data matrix becomes considerably large and the time required for model updates increases, one may opt to update the model and estimate new thresholds only when a batch of new observations has been collected, in line with the principles of batch-mode active learning [46]. Additionally, incremental and recursive updating techniques can also be considered to improve computational efficiency. The core loop of Algorithm 1 proceeds as follows:

  Observe the $i$th data point $x_i \in S$
  Whiten $x_i$ by computing $z_i = \Lambda^{-1/2} U^\top x_i$
  if $\Gamma_1 \leq \mathrm{UPV}(z_i) \leq \Gamma_2$ then
    Ask for the label $y_i$ and augment the labeled dataset
    Pay for the label
    Update the thresholds $\Gamma_1, \Gamma_2$ using the augmented design
  else
    Discard $x_i$

The estimation of the UPV can be modified by taking into account the weight matrix obtained from the robust estimators. The weighted UPV ($\mathrm{UPV}_w$) is estimated as $$\mathrm{UPV}_w(x^{(m)}) = x^{(m)\top} (X^\top W X)^{-1} x^{(m)},$$ where $W$ represents the weight matrix used to downweigh the influence of outliers in the estimation of the regression parameters [31]. Each element of the weight matrix $W$ is a positive number that determines the weight given to each observation in the regression analysis. Larger weights correspond to observations with less outlier-like behavior, while smaller weights correspond to observations with
more outlier-like behavior. The weight matrix $W$ is a diagonal matrix, where each diagonal element corresponds to the weight assigned to a particular observation. In the case of an OLS estimator, we have $W = I$, as the weight given to each observation is not sensitive to the residual. In other words, $w_{\mathrm{OLS}}(e) = 1$, regardless of the specific residual observed. Then, to select the most informative observations while seeking protection against outliers, instead of estimating a single threshold, we define a bounded area of interest for the UPV of an observation: $$\Gamma_1 \leq \mathrm{UPV}_w(x) \leq \Gamma_2.$$

Experiments

In the experiments, we evaluate the performance of the active learning strategies in four scenarios, according to the percentage of outliers affecting the data stream. We compare the bounded CDO strategy, coupled with OLS and robust estimators, to the norm-thresholding approach, standard CDO, and random sampling. When using random sampling, each time a new sample arrives, a number $r \sim U(0, 1)$ is generated and the data point is selected only if $r \geq 1 - \alpha$, where $\alpha$ represents the labeling or sampling rate. The sampling strategies based on robust estimators select the most informative data points using the standard UPV, as in Equation 10. The results obtained with the weighted prediction variance, $\mathrm{UPV}_w$, were very similar and are included in Appendix B for completeness. All the approaches receive the same random design as input, and then they iteratively collect labeled observations until the budget constraint $B$ is met. The number of observations contained in the initial design is equal to $p + 2$, where $p$ is the number of process variables. We analyzed both the case of an outlier-free initial design and that of a contaminated one. The results assuming the presence of outliers also in the initial design are included in Appendix C.
For each simulated scenario, the $i$th observation of the process variables, here considered a row vector, is generated according to a joint multivariate normal distribution, $x_i \sim N(0, \Sigma_0)$, where $\Sigma_0$ is given by $\sigma_x^2 I$. The corresponding response is obtained using $y_i = x_i \beta + \varepsilon_i$. For normal data points, we used $\sigma_x = \sigma_\varepsilon = 1$ for both input and output variables and, for simulating outliers, we set $\sigma_x = \sigma_\varepsilon = 3$. Moreover, for each of the true coefficients of the underlying model, we assumed $\beta \sim U(-5, 5)$ for normal data points and $\beta \sim U(10, 15)$ for outliers. Similarly to [19], the outliers are introduced in the data stream in the form of isolated covariate and concept shifts. That is, an anomalous data point is a point that exhibits both a larger variation in the input space and a different relationship with the corresponding response variable. In the simulated scenarios, outliers are randomly distributed in the data stream according to a pre-defined percentage describing the contamination level of the environment. The performance of the models is expressed, in predictive terms, by the root mean squared error (RMSE) of the predictions on a separate test set composed only of normal observations. This is coherent with the objective of trying to understand the true underlying relationship between predictors and response, and not the erroneous one that could be derived from the outliers.
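The data-generating process described above can be sketched as follows. The `simulate_stream` helper is hypothetical; coefficient vectors are drawn once per stream, matching the description of isolated covariate and concept shifts.

```python
import numpy as np

def simulate_stream(n, p, contamination, rng):
    """Contaminated stream sketch: normal points use
    sigma_x = sigma_eps = 1 and beta ~ U(-5, 5); outliers use
    sigma_x = sigma_eps = 3 and beta ~ U(10, 15)."""
    beta_in = rng.uniform(-5, 5, size=p)
    beta_out = rng.uniform(10, 15, size=p)
    is_outlier = rng.random(n) < contamination
    X = np.where(is_outlier[:, None],
                 rng.normal(0, 3, size=(n, p)),
                 rng.normal(0, 1, size=(n, p)))
    eps = np.where(is_outlier, rng.normal(0, 3, n), rng.normal(0, 1, n))
    y = np.where(is_outlier, X @ beta_out, X @ beta_in) + eps
    return X, y, is_outlier

rng = np.random.default_rng(5)
X, y, mask = simulate_stream(n=2000, p=20, contamination=0.01, rng=rng)
```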
The effectiveness of the proposed approach is evaluated by comparing the learning curves reporting the average RMSE values at each learning step, obtained using 1000 simulations for each scenario. A learning step indicates the acquisition of a new labeled observation and its inclusion in the training set. Hence, at each step, we are comparing models that are trained using the same number of labeled examples. We set the number of process variables equal to 20, the budget constraint $B$ equal to 50, and the warm-up length $m$ to 500. The warm-up length indicates the number of unlabeled observations that are used to estimate the covariance matrix $\Sigma$ used for pre-whitening the observations. With regard to the sampling rate, we used $\alpha = 5\%$ for all the sampling strategies and $c = 5\%$ for the protection cut-off value used by the bounded CDO algorithm. We selected 5% as it is a commonly employed value, especially when no specific prior knowledge is available.

No outliers
We first evaluated the query strategies to assess their performance in the absence of outliers. Consistent with the findings reported in [16], our results in Figure 2 indicate that the standard CDO algorithm performs best when there are no outliers in the data stream. The use of robust estimators does not provide any added value in this scenario. Both the Huber and Tukey estimators are unable to outperform the bounded CDO strategy with the OLS model, which in turn is only marginally worse than the standard CDO. In Figure 2, plots (a) and (b) represent the strategies that rely on the OLS models, while plots (c) and (d) show the strategies that use robust models, with the bounded CDO based on OLS included for comparison.

0.275% outliers
The second scenario depicts a circumstance where only a modest fraction of the data stream is represented by outliers. We can see from plot (a) of Figure 3 how the performance of the norm-thresholding and CDO algorithms is dramatically worsened, as they are both prone to sampling outliers. The random strategy seems to be a better option, and the bounded CDO strategy offers the best results. In plots (c) and (d) of the same figure, we can see the comparison with the results obtained from the robust estimators. In this scenario, using a robust estimator does not seem to offer a significant improvement over the bounded CDO strategy based on OLS. Indeed, the learning curves obtained with the bounded strategy employing the OLS estimator and the Huber estimator are very similar.

1% outliers
The third scenario reports a worse situation, where the process is affected by a larger number of outliers, i.e., 1% of the total number of observations. The results in Figure 4 are similar to those of the previous scenario, with the exception that the gap between bounded CDO and random sampling is now much wider. This is likely because uniformly sampling observations with $\alpha = 5\%$ would almost certainly lead to the inclusion of a greater number of outliers in the training set. As for the robust estimators shown in plots (c) and (d) of Figure 4, the use of robust estimators now offers an evident added value, also when compared to the OLS-based bounded CDO. While the learning curves are more or less overlapping in the first five learning steps, the models fitted using the Huber and Tukey losses yield a lower prediction error in the remaining steps.

5% outliers
The final scenario simulates a pathological case, where 5% of the observations from the data stream are outliers. The results from the third scenario are exacerbated here. In the case of the OLS estimators, the bounded CDO is still the best strategy, being the only one with a descending learning curve (plots (a) and (b) of Figure 5). Instead, from plots (c) and (d) of Figure 5, we can see how the robust estimators are able to improve the results obtained with the bounded CDO strategy. In this circumstance, there is not a clear distinction between the Huber and Tukey models.

Discussion
The experiments presented in this study aimed to evaluate the performance of different active learning strategies in the presence of outliers in a data stream. The results showed that the standard CDO algorithm performed best in the absence of outliers, while the bounded CDO strategy coupled with OLS and robust estimators provided better results when outliers were present. In scenarios where an initial training set free from outliers is available and only a modest fraction of the data stream is represented by outliers, the bounded CDO strategy employing an OLS estimator seems to be the better option. Conversely, in the case of a larger contamination level, sampling strategies based on robust estimators yield the best results. When using robust estimators, for our datasets we did not find solid evidence that using a weighted prediction variance is an advantage. Another interesting observation is that, in the presence of outliers, the standard OLS methods (random, norm-thresholding, and CDO) never converge to the results obtained with the robust query strategies. This is because they tend to accumulate outliers in the training set, which degrade the predictive performance, as the model is not allowed to forget old or redundant data. The findings from this study have important consequences for practical applications of active learning strategies, especially in contexts where the data stream is contaminated by outliers. The results suggest that the choice of the active learning strategy should depend on the level of contamination of the data stream. When the data stream is free from outliers, the standard CDO is a good strategy. However, even when a modest fraction of the observations is corrupted, bounding the search area of the active learning algorithm or using robust estimators might be necessary. Overall, this study provides valuable insights into the performance of active learning strategies in the presence of outliers and can inform the development of more effective
approaches for real-world applications. However, it is worth noting that the simulations were based on specific assumptions about the data generation process and may not fully capture the complexity of real-world data streams. Further research is needed to validate these findings on real-world datasets and to investigate the generalizability of the proposed approach.

Conclusion
In many real-world problems, data is only available in an unlabeled form, and acquiring the labels is often an expensive and time-consuming task. In these circumstances, active learning is able to reduce the computational burden required to achieve compelling predictive performance by selecting the most informative data points to query.
In this paper, we analyze the online active learning framework when the data stream is corrupted by the presence of outliers. In general, we show how the presence of outliers dramatically worsens the performance of the currently proposed methods for active linear regression. To tackle this issue, we propose a modification of the CDO algorithm that filters out the outliers while still focusing on the most promising observations, based on the concepts of D-optimality and prediction variance. The analysis shows how this solution is sufficient to make the CDO strategy robust to a modest presence of outliers. When the percentage of outliers in the data stream is higher, the best results are obtained by coupling the bounded CDO strategy with a robust estimator. In general, the proposed approaches can effectively solve the problem of outliers contaminating the data stream without adding computational complexity compared to the original CDO strategy.

Appendix A Stopping criterion
In real-world applications of active learning, if we do not have an explicit operational budget on the number of experiments that can be run, it can be challenging to determine when to stop collecting new labels due to the unavailability of the true learning curves. To address this problem, it is beneficial to approximate the learning curve using proxy measures. In this study, we investigate the use of two proxy measures. First, we propose monitoring the slope of the stabilization score, drawing inspiration from the stabilizing predictions [47] and validation set agreement [48] methods employed in classification. In the regression framework, we calculate the stabilization of predictions by averaging the sum of squares of the differences between the predictions of the $w$ most recent pairs of models. Similarly to [47], we utilize a window size of 3 ($w = 3$). The values being compared are the predicted values on the calibration set $C$, obtained through successive models. As the examples in $C$ are not used in the annotation process, this curve is solely influenced by the impact of the selected and labeled examples on the newly trained models. Essentially, this curve monitors when the predictions from models trained with newly included observations start producing highly similar results. The stopping rule can then be determined through visual inspection of the curve, by setting a tolerance for the sum of squares not improving or approaching zero, or by applying a hypothesis testing procedure. Another performance-based metric we consider is the leave-one-out cross-validation (LOO-CV) score obtained by the model on the currently available labeled observations. While this technique relies on ground-truth labels and may appear advantageous, it may not be the optimal choice if the collected training set is biased or does not accurately represent the real data distribution [49]. On the other hand, the stabilization score, despite not relying on real labels, could be more reliable if the calibration
set $C$ follows the population distribution. Figure A1 demonstrates the effectiveness of the two proposed methods in approximating the true test error curve, offering valuable insights for determining when to halt the active learning routine.
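A possible implementation of the stabilization score, under the assumption that "the $w$ most recent pairs of models" means a moving average over the last $w$ pairwise squared differences of calibration-set predictions; the helper name is illustrative.

```python
import numpy as np

def stabilization_scores(pred_history, w=3):
    """For each learning step, average the sum of squared
    differences between the predictions (on the calibration set)
    of the w most recent pairs of successive models."""
    diffs = [np.sum((pred_history[t] - pred_history[t - 1]) ** 2)
             for t in range(1, len(pred_history))]
    return [float(np.mean(diffs[max(0, t - w + 1):t + 1]))
            for t in range(len(diffs))]

# Toy usage: predictions that converge give a decreasing score,
# suggesting the active learning routine could be halted.
history = [np.full(10, 1.0 / (t + 1)) for t in range(8)]
scores = stabilization_scores(history, w=3)
```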

Figure 1
Figure 1 depicts a general online active learning flowchart.

Algorithm 1 (fragment):

  ... end while
  Estimate the covariance matrix $\Sigma$ from $C$ and perform the eigendecomposition $\Sigma = U \Lambda U^\top$
  Whiten the initial design by computing $Z = \Lambda^{-1/2} U^\top X$
  Whiten the calibration set by computing $V = \Lambda^{-1/2} U^\top C$
  Estimate $\Gamma_1, \Gamma_2$ by estimating the UPV of the model trained on $Z$ at the points in $V$
  while $b \leq B$ and $i \leq |S|$ do ...

With a Huber estimator, $w_H(e) = 1$ if $|e| \leq k$ and $w_H(e) = k/|e|$ if $|e| > k$. Finally, with a Tukey model, $w_T(e) = 0$ if $|e| > k$ and $w_T(e) = \left(1 - (e/k)^2\right)^2$ if $|e| \leq k$.
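These weight functions, and the weighted UPV they feed into, can be sketched as follows. The matrix form used for $\mathrm{UPV}_w$, namely $x^\top (X^\top W X)^{-1} x$, is our reading of the text and should be treated as an assumption.

```python
import numpy as np

def w_huber(e, k=1.345):
    """Huber weights: 1 for |e| <= k, k/|e| beyond."""
    e = np.abs(np.asarray(e, dtype=float))
    return np.where(e <= k, 1.0, k / np.maximum(e, 1e-12))

def w_tukey(e, k=4.685):
    """Tukey bisquare weights: (1 - (e/k)^2)^2 inside [-k, k],
    0 outside, so gross outliers are discarded entirely."""
    e = np.asarray(e, dtype=float)
    return np.where(np.abs(e) <= k, (1.0 - (e / k) ** 2) ** 2, 0.0)

def upv_weighted(x, X_l, w):
    """Weighted UPV sketch: x' (X_l' W X_l)^{-1} x, with W the
    diagonal matrix of robust weights (assumed form)."""
    XtWX = X_l.T @ (w[:, None] * X_l)
    return float(x @ np.linalg.inv(XtWX) @ x)

# With unit weights (the OLS case, W = I) the weighted UPV
# reduces to the ordinary UPV.
rng = np.random.default_rng(4)
X_l = rng.normal(size=(30, 4))
x = rng.normal(size=4)
u_w = upv_weighted(x, X_l, np.ones(30))
u_plain = float(x @ np.linalg.inv(X_l.T @ X_l) @ x)
```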

Fig. 2
Fig. 2 Comparing query strategies in the absence of outliers: results from 1000 simulations. Plots (b) and (d) offer a closer view of the two best strategies from plots (a) and (c), respectively, with shaded regions indicating the standard deviation across the simulations.

Fig. 3
Fig. 3 Comparing query strategies with 0.275% outliers (1000 simulations). Plots (b) and (d) offer a closer view of the two best strategies from plots (a) and (c), respectively, with shaded regions indicating the standard deviation across the simulations.

Fig. 5
Fig. 5 Comparing query strategies with 5% outliers: results from 1000 simulations. Plots (b) and (d) offer a closer view of the two best strategies from plots (a) and (c), respectively, with shaded regions indicating the standard deviation across the simulations.

Fig. 4
Fig. 4 Comparing query strategies with 1% outliers: results from 1000 simulations. Plots (b) and (d) offer a closer view of the two best strategies from plots (a) and (c), respectively, with shaded regions indicating the standard deviation across the simulations.

Fig. C5
Fig. C5 Comparing query strategies with 0.275% outliers (1000 simulations). Plots (b) and (d) offer a closer view of the two best strategies from plots (a) and (c), respectively, with shaded regions indicating the standard deviation across the simulations.

Fig. C6
Fig. C6 Comparing query strategies with 1% outliers: results from 1000 simulations. Plots (b) and (d) offer a closer view of the two best strategies from plots (a) and (c), respectively, with shaded regions indicating the standard deviation across the simulations.

Fig. C7
Fig. C7 Comparing query strategies with 5% outliers: results from 1000 simulations. Plots (b) and (d) offer a closer view of the two best strategies from plots (a) and (c), respectively, with shaded regions indicating the standard deviation across the simulations.