Tutorial overview of simple, stratified, and parametric bootstrapping

Students pursuing baccalaureate degrees in electrical engineering and computer engineering are required to take a course in probability and statistics. While the course continues to be mostly conceptual, the author has started initiatives to introduce data analytics into this course, with special emphasis on machine vision applications. Topics such as receiver operating characteristic (ROC) curves and hypothesis testing are covered through examples and exercises, with each student receiving an individual dataset. Continuing this theme, bootstrapping and associated methodologies have now been introduced to facilitate the interpretation of machine vision experiments. A demo that illustrates simple, stratified, and parametric bootstrapping as a means of understanding the statistics of a machine vision sensor is presented. It encompasses a number of conceptual topics such as random variables, densities, parameter estimation, and chi square testing alongside data analytics, offering the undergraduate students a holistic picture of machine learning and machine vision.

Bootstrapping involving two cohorts of data, as in the ROC analysis of a machine vision receiver, requires a much more detailed approach. 7,10,11 The pedagogy and didactics of bootstrapping the AUC are not generally covered in courses because of the reliance on software packages to carry out bootstrapping. Coverage of bootstrapping is also missing from the standard textbooks used in engineering probability courses. 12-15 In addition, the literature abounds with terms such as stratified bootstrapping and parametric bootstrapping, making it necessary to offer a detailed pedagogical perspective so that bootstrapping is well understood. 16-18 Applications of bootstrapping can be found in many areas of science, engineering, medicine, social sciences, etc. 7,9 It is essential that students be exposed to the details of bootstrapping so that they can easily transition to undertaking statistical analysis when they enter their chosen professions.
This manuscript reports on a demo created for the students in an undergraduate course in electrical engineering and computer engineering probability at Drexel University.

BACKGROUND
Inferences and predictions are often made on the basis of a single experiment. Of specific interest in machine learning problems is the ability to predict the presence or absence of a target in front of a machine vision sensor based on the data collected. 6 This is accomplished by conducting an experiment that consists of collecting data from a sensor when a target is in its field of view as well as when the target is absent. The sensor may be wireless, radio frequency, infrared, or ultrasonic. 2,3 In all these cases, because of uncertainties from multipath scattering or other forms of unwanted interference, the data from the two sets, target present and target absent, will overlap. 2,5,19 A statistical approach is needed to establish an optimum threshold (value of the sensor output) for later decision making, namely predicting whether the target is present or absent on the basis of a single observation. In another instance, the efficacy of a new medical screening device may need to be tested. A number of subjects (with and without the illness) are recruited and the screening is administered, with the goal of determining an optimal threshold to predict (on the basis of a single screening) whether a subject suffers from the illness or not. 20-24 In both cases, the ROC, quantified through the AUC metric, is used to establish the performance of the screening device or the sensor. 5 The AUC lies between 0.5 and 1, with an ideal sensor making no prediction errors corresponding to an AUC of unity. In practical cases, AUC values never reach unity, and there is a need to know the statistics of the AUC to assess the efficacy. Very few formulaic tools are available to examine the statistics of the AUC to draw inferences on its variability (variance) and its 95% confidence level. 4 If there were a way to undertake the experiment multiple times, we would be able to study the inferential statistics.
Because of the difficulty of recruiting large numbers of subjects for screening, or the high cost of setting up machine vision testing multiple times, we need to explore other ways to study the efficacy of these devices and sensors.
A direct way to reproduce the experimental results is through the use of resampling. Bootstrapping is a simple technique that allows empirical regeneration of samples of the outcomes of the experiment through continual resampling with replacement, regardless of the underlying statistics of the data. 7 While resampling is relatively simple, its application to the ROC is not straightforward because of the existence of two cohorts of data. The two cohorts of data (one from the target present and the other from the target absent) also offer two ways of resampling. 25-28 One is to resample the composite data from the two cohorts while retaining their labels or tags (target absent and target present), constituting simple bootstrapping; the other is to resample the individual cohorts separately, constituting stratified bootstrapping. 7,16 These two forms of bootstrapping are nonparametric because we are not estimating any specific parameters associated with the data. If we know the underlying statistics of the two cohorts, we may generate multiple sets of samples satisfying those statistics. In other words, we can reproduce the values of the data through random number simulations. The creation of datasets based on underlying statistics constitutes the technique of parametric bootstrapping, even though no resampling is used. 7,21,22 Parametric bootstrapping may also address one of the issues associated with traditional bootstrapping, namely the existence of nonindependent samples arising from sampling with replacement.
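Resampling with replacement is easiest to see with a single cohort before turning to the two-cohort case. The sketch below (in Python with NumPy, rather than the Matlab used in the demo) bootstraps the statistics of the mean; the gamma-distributed cohort is a hypothetical stand-in for a measured dataset.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical single-cohort measurement (a stand-in for real sensor data)
data = rng.gamma(shape=2.0, scale=1.5, size=70)
N = data.size
M = 5000                                     # number of boot sets

boot_means = np.empty(M)
for m in range(M):
    idx = rng.integers(0, N, size=N)         # sample N indices with replacement
    boot_means[m] = data[idx].mean()         # statistic of this boot set

mean_est = boot_means.mean()                 # bootstrap estimate of the mean
se_est = boot_means.std(ddof=1)              # bootstrap standard error
ci = np.percentile(boot_means, [2.5, 97.5])  # 95% confidence interval
print(mean_est, se_est, ci)
```

The same loop works for any statistic of a single cohort; the two-cohort AUC case described next only changes what is resampled and what is computed per boot set.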
A number of publications offer insight into bootstrapping. 7,9,11 But very few offer a detailed step-by-step approach that can be followed by an undergraduate student taking a first course in probability. As the students are only beginning to understand the concepts of data analytics through examples and exercises dealing with confusion matrices, probabilities of false alarm and miss, ROC, etc., exposure to another new methodology requires a careful and detailed approach. The implementation of bootstrapping and the nuanced differences among its three forms must be presented at a very basic level, making it possible for students to appreciate the practical implications. The ideal time to demonstrate this to the students may be after they have been exposed to a number of data analytic applications. Students were familiar with data analytics because they were required to solve one homework problem in Matlab (www.mathworks.com) every week with a unique dataset for every student (besides a set of common problems for the class). Students had already completed exercises on positive predictive values, the confusion matrix, the AUC, chi square testing involving multiple densities to determine the best fit, maximum likelihood estimation of the parameters of densities, and statistical analysis of the improvement in performance achieved through signal processing algorithms, namely the arithmetic mean, geometric mean, and maximum. 2,3,29 The topic of bootstrapping was introduced after students had been exposed to mathematical statistics (marginal, conditional, and joint probabilities, and Bayes' rule), one and two random variables, parametric hypothesis testing (chi square tests), ROC curves, etc.
At Drexel University, every quarter consists of 10 weeks of classes followed by examinations during the 11th week. The engineering probability course is offered every quarter (a required course for students pursuing baccalaureate degrees in electrical engineering as well as computer engineering) as a 4-credit course with 3 hours of lecture followed by 1 hour of recitation every week. For the recitation sessions, the class is split into smaller sections (fewer than 30 students). Lectures and recitations (three nonoverlapping sessions) were taught by the author, while teaching assistants were responsible for grading the homework submissions. For every topic covered in data analytics, demos were created, and students were provided with the datasets ahead of time so that they could follow the procedure during the lecture. 2,3 The demo created is described next. It was implemented in Matlab (version 2019a).

CREATION OF THE DEMOS AND RESULTS
A simple model of a machine vision sensor is used in the demo, with samples of data collected both with a target present in the field of view of the sensor and with the target absent. The dataset used in the demo is shown in Table 1. It consisted of N0 = 70 samples (target absent: hypothesis H0) and N1 = 60 samples (target present: hypothesis H1).
The generation of this dataset is described elsewhere. 2 The datasets are merged into a single column, with the top 70 values belonging to hypothesis H0 and the rest belonging to hypothesis H1. This step is followed by concatenating a second column containing the labels, '0' for H0 and '1' for H1. This means that we have a new matrix, [data, labels], of size [N × 2], with the first column holding the data (values) and the second column the corresponding labels. The total number of samples is N = N0 + N1. We can examine the ROC of the machine vision sensor to characterize its performance in terms of the AUC. With the results from only a single experiment, we need techniques such as bootstrapping, which implies sampling of the data with replacement, to properly understand and interpret the statistics of the AUC. 7 Bootstrapping thus allows replication of experiments. In the absence of two cohorts (target absent and target present) of data, bootstrapping is simple and straightforward: the data are sampled with replacement, and a single boot sample (or boot set) is created by sampling N times, where N is the number of original samples of the single cohort. By repeating the resampling process M times, we can get M boot sets (each a matrix of size [N × 1]) that may be used to study the characteristics of the original cohort, for example, the statistics of the mean. Students had already seen similar analysis with a single cohort of data in lecture and had completed an assignment to estimate the statistics of the mean of a measurement. But the machine vision system has two cohorts of data, each distinctly identified by its label. This means that simple resampling of the composite or pooled data from the two cohorts is not sufficient. Bootstrapping of the machine vision data is undertaken on the basis of the rows of the [N × 2] matrix. This means that resampling results in the selection of a row of data (value) and the corresponding label.
The sampling with replacement is undertaken M times, creating M boot sets, each a matrix of size [N × 2]. It can easily be understood that this sampling may result in more or fewer than the original sample sizes of cohorts H1 and H0, because resampling is nothing but the choice of a number between 1 and N (repetition allowed). Resampling may be done in Matlab directly using the command randsample(.). It can also be done using a uniform random number generator as

k = ceil(N · rand(1, N)).    (1)

In Equation (1), rand(1,N) generates N uniform random numbers in [0,1], ceil(.) rounds up to the nearest integer, and k is an [N × 1] vector of integers between 1 and N (130 in the present case), with repetitions. Each of these M boot sets is used to obtain the AUC, providing M boot samples of the AUC. With M = 5000, this means that we can obtain the statistics of the AUC from the 5000 boot samples.
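The simple bootstrapping of the [N × 2] matrix can be sketched as follows. This is a Python/NumPy analogue of the Matlab procedure; the two gamma cohorts are hypothetical stand-ins for the Table 1 data, and the AUC is computed with the rank-based Mann-Whitney statistic, which equals the area under the empirical ROC.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical stand-ins for the Table 1 cohorts (the demo uses the real sensor data)
x0 = rng.gamma(2.0, 1.0, size=70)    # target absent (H0), N0 = 70
x1 = rng.gamma(4.0, 1.0, size=60)    # target present (H1), N1 = 60

data = np.concatenate([x0, x1])      # single column of values
labels = np.concatenate([np.zeros(70), np.ones(60)])   # second column of tags
N = data.size                        # N = N0 + N1 = 130

def auc(values, tags):
    """AUC via the Mann-Whitney statistic: P(H1 sample > H0 sample)."""
    a = values[tags == 0]
    b = values[tags == 1]
    diff = b[:, None] - a[None, :]   # all (H1, H0) pairs; ties counted half
    return ((diff > 0).sum() + 0.5 * (diff == 0).sum()) / (a.size * b.size)

M = 5000
boot_auc = np.empty(M)
for m in range(M):
    k = rng.integers(0, N, size=N)   # Equation (1): row indices with replacement
    boot_auc[m] = auc(data[k], labels[k])

print(boot_auc.mean(), boot_auc.std(ddof=1))
```

Note that each boot set carries whatever mix of '0' and '1' rows the resampling happens to draw, which is exactly the source of the cohort-count variation discussed next.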
One of the issues raised with regard to the bootstrapping described above (also identified as simple or direct bootstrapping) is the fact that the relative counts of the two cohorts vary with every bootstrap set, as explained in the previous paragraph, because resampling is undertaken on the [N × 2] matrix. One way to mitigate the variation in relative counts is to perform stratified bootstrapping. This means that bootstrapping is done for each cohort separately. The resampling is done first on the [N0 × 2] matrix and then on the [N1 × 2] matrix. This means that we have two vectors,

k0 = ceil(N0 · rand(1, N0)), k1 = ceil(N1 · rand(1, N1)).    (2)

In Equation (2), k0 is an [N0 × 1] vector with integers between 1 and N0, and k1 is an [N1 × 1] vector with integers between 1 and N1. With the two cohorts, M = 5000 boot sets are created. These sets provide M samples of the AUC, which are used for the analysis just as in the case of simple bootstrapping. It is clear that with stratified bootstrapping, the number of samples in the target absent cohort will always be N0 and in the other cohort always N1, identical in each case to the original dataset.
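Stratified bootstrapping differs from the simple version only in that each cohort is resampled separately, so the cohort sizes never change. A Python/NumPy sketch, again with hypothetical gamma stand-ins for the two cohorts:

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical stand-ins for the two cohorts of Table 1
x0 = rng.gamma(2.0, 1.0, size=70)    # target absent (H0), N0 = 70
x1 = rng.gamma(4.0, 1.0, size=60)    # target present (H1), N1 = 60
N0, N1 = x0.size, x1.size

def auc(a, b):
    """AUC via the Mann-Whitney statistic: P(H1 sample > H0 sample)."""
    diff = b[:, None] - a[None, :]
    return ((diff > 0).sum() + 0.5 * (diff == 0).sum()) / (a.size * b.size)

M = 5000
boot_auc = np.empty(M)
for m in range(M):
    k0 = rng.integers(0, N0, size=N0)   # Equation (2): resample H0 rows only
    k1 = rng.integers(0, N1, size=N1)   # Equation (2): resample H1 rows only
    boot_auc[m] = auc(x0[k0], x1[k1])   # cohort sizes stay fixed at N0 and N1

print(boot_auc.mean(), boot_auc.std(ddof=1))
```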
While the simple and stratified bootstrapping presented above do not require any knowledge of the underlying statistics of the data, we may explore another option if the statistics of the two cohorts are available. This is accomplished in two steps. First, a goodness-of-fit test is conducted on each cohort. 3 Students had already undertaken parametric hypothesis tests as part of the homework assignments. Since both datasets represented either power or amplitude values collected through the sensor, the hypothesis tests were conducted to find the closest density fit to the data (significance level of 5%) among the Rayleigh, Rician, gamma, Nakagami, and Weibull densities. 3 Once the density fits are obtained, M sets of each cohort (target absent and target present) are created and, in each case, ROC analysis is carried out, providing M samples of the AUC. This constitutes parametric bootstrapping, even though the M sets were not obtained through the traditional bootstrapping modality.

FIGURE 1 Histograms of the data and the best density fits associated with the data in Table 1

FIGURE 2 Receiver operating characteristic plots (input data) and bigamma fit. The SD of the area under the ROC curve obtained using Equation (6) is given

As mentioned above, chi square tests were carried out first on each of the datasets. Figure 1 displays the histograms of the data obtained using the ksdensity(.) command in Matlab and the closest fit based on the chi square test, along with the parameters (a and b) of the gamma density,

f(x) = x^(a−1) exp(−x/b) / (b^a Γ(a)), x ≥ 0.    (3)

In Equation (3), Γ(.) is the gamma function. 3,15 The parameters were estimated using fitdist(.), and chi square testing was undertaken using chi2gof(.) in Matlab. Figure 2 also displays the test statistic (χ²_T) and the degrees of freedom associated with the test. 3 Figure 2 shows the ROC curves for the data and the ROC curve obtained from the theoretical density fits (in this case, both fits are gamma).
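Parametric bootstrapping replaces resampling with random number generation from the fitted densities. A Python/SciPy sketch of the two steps, under the assumption (as in the demo's first dataset) that both cohorts fit gamma densities; the cohorts themselves are synthetic stand-ins for the measured data:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)

# Hypothetical stand-ins for the measured cohorts (Table 1 in the demo)
x0 = rng.gamma(2.0, 1.0, size=70)    # target absent (H0)
x1 = rng.gamma(4.0, 1.0, size=60)    # target present (H1)

# Step 1: maximum likelihood gamma fits (the counterpart of Matlab's fitdist(.))
a0, _, b0 = stats.gamma.fit(x0, floc=0)
a1, _, b1 = stats.gamma.fit(x1, floc=0)

def auc(a, b):
    """AUC via the Mann-Whitney statistic: P(H1 sample > H0 sample)."""
    diff = b[:, None] - a[None, :]
    return ((diff > 0).sum() + 0.5 * (diff == 0).sum()) / (a.size * b.size)

# Step 2: generate M synthetic cohort pairs from the fitted densities
M = 5000
boot_auc = np.empty(M)
for m in range(M):
    s0 = stats.gamma.rvs(a0, scale=b0, size=x0.size, random_state=rng)
    s1 = stats.gamma.rvs(a1, scale=b1, size=x1.size, random_state=rng)
    boot_auc[m] = auc(s0, s1)        # no resampling: each set is freshly simulated

print(boot_auc.mean(), boot_auc.std(ddof=1))
```

Because every boot set is drawn independently from the fitted densities, the nonindependence that comes with sampling with replacement does not arise here; the cost is the dependence on the quality of the density fits.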
For any arbitrary threshold xT, the probability of false alarm (PF) and the probability of detection (PD) are

PF = ∫ from xT to ∞ of f(x|H0) dx, PD = ∫ from xT to ∞ of f(x|H1) dx.    (4)

The theoretical ROC plot is obtained by calculating the probabilities in Equation (4) while varying the threshold xT from 0 to ∞. The AUC can be obtained using either the trapz(.) or polyarea(.) commands in Matlab. While the AUC may also be obtained using perfcurve(.) in Matlab, a simple script was written to accomplish this in the demo. One can see that the two curves fit well, with very close AUC values. Even though we only have a single set of data, it is possible to estimate the SD of the AUC obtained from the data using the formula in the literature 4 as

σ(AUC) = sqrt{ [AUC(1−AUC) + (N1−1)(Q1−AUC²) + (N0−1)(Q2−AUC²)] / (N0 N1) }.    (6)

In Equation (6), Q1 = AUC/(2−AUC) and Q2 = 2AUC²/(1+AUC). The value of the SD is also shown in Figure 2.

FIGURE 3 Summary results on bootstrapping with the mean (μ), SD (σ), and 95% confidence interval of the mean of the area under the ROC curve (5000 boot samples)

Figure 3 shows the summary results of bootstrapping. The top portion displays the histogram of the boot samples of the AUC (with a Gaussian fit) for the case of simple bootstrapping. The mean (μ), SD (σ), and the 95% confidence interval are also given. The middle portion displays the results for the case of stratified bootstrapping. The characteristics of both plots (simple and stratified) appear to match.
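The theoretical ROC sweep and the SD estimate can be sketched as follows. This Python/SciPy analogue uses hypothetical gamma parameters a0, b0, a1, b1 in place of the fitted values, evaluates Equation (4) over a threshold grid, integrates the ROC by the trapezoidal rule (the counterpart of Matlab's trapz(.)), and applies the Hanley-McNeil-style SD estimate of Equation (6):

```python
import numpy as np
from scipy import stats
from scipy.integrate import trapezoid

# Hypothetical gamma parameters for the two cohorts (the demo estimates these with fitdist)
a0, b0 = 2.0, 1.0    # target absent (H0)
a1, b1 = 4.0, 1.0    # target present (H1)
N0, N1 = 70, 60

# Equation (4): sweep the threshold xT and evaluate the tail probabilities
xT = np.linspace(0.0, 30.0, 2001)
PF = stats.gamma.sf(xT, a0, scale=b0)   # P(X > xT | H0), false alarm
PD = stats.gamma.sf(xT, a1, scale=b1)   # P(X > xT | H1), detection

# AUC by trapezoidal integration of the ROC (PF decreases as xT grows, hence the minus sign)
auc = -trapezoid(PD, PF)

# Equation (6): SD of the AUC from a single dataset
Q1 = auc / (2.0 - auc)
Q2 = 2.0 * auc**2 / (1.0 + auc)
var = (auc * (1 - auc) + (N1 - 1) * (Q1 - auc**2)
       + (N0 - 1) * (Q2 - auc**2)) / (N0 * N1)
sd_auc = np.sqrt(var)
print(auc, sd_auc)
```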
The variation in AUC values with both simple and stratified bootstrapping appears to be in the same range, as reflected in the matching statistics in both cases (top two displays). The results of parametric bootstrapping are displayed at the bottom. These statistics are not much different from those of the simple bootstrapping (top) or the stratified bootstrapping (middle).
To determine whether the data sizes influence the results of bootstrapping, the procedure was repeated for another dataset (also provided to the students). The dataset is shown in Table 2.
The dataset consists of N0 = 40 samples of target absent and N1 = 30 samples of target present. The best fits were Nakagami and Rayleigh, as seen in Figure 4. The two densities are

f(x) = [2 m^m x^(2m−1) / (Γ(m) Ω^m)] exp(−m x²/Ω), x ≥ 0,    (9)

f(x) = (x/b²) exp(−x²/(2b²)), x ≥ 0.    (10)

FIGURE 4 Histograms of the data and the best density fits associated with the data in Table 2

Equation (9) is the Nakagami density with the Nakagami parameter m and second moment Ω. Equation (10) is the Rayleigh density with the parameter b. 3 Figure 5 shows three ROC plots. In addition to the ROC from the data and the best fit, Figure 5 shows the ROC curve with a bigamma fit obtained by treating both sets as gamma distributed. 24 The lower sample size clearly leads to slightly different AUC values for each of the three cases, a departure from the results in Figure 2.

FIGURE 5 Receiver operating characteristic plots (input data), bigamma fit, and the exact fit. The SD of the area under the ROC curve obtained using Equation (6) is given

FIGURE 6 Summary results on bootstrapping with the mean (μ), SD (σ), and 95% confidence interval of the mean of the area under the ROC curve (5000 boot samples)

Figure 6 shows the results of bootstrapping with the dataset in Table 2. The problems with parametric bootstrapping (more appropriately, random number simulations) are seen from the departure of the AUC compared to the simple and stratified bootstrapping. The parametric bootstrapping will be influenced by the density fits, while both the simple and stratified techniques seem to be immune to it.
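For the Table 2 fits, the parametric boot sets would be drawn from the densities of Equations (9) and (10). A Python/SciPy sketch with hypothetical parameter values standing in for the fitted ones; note that scipy's nakagami takes nu = m with scale = sqrt(Ω), and its rayleigh takes scale = b:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)

# Hypothetical parameter values (the demo estimates them from the Table 2 data)
m, Omega = 1.8, 4.0    # Nakagami parameters of Equation (9)
b = 1.2                # Rayleigh parameter of Equation (10)

# One synthetic boot set of each cohort (repeated M times in the demo)
x_nak = stats.nakagami.rvs(m, scale=np.sqrt(Omega), size=40, random_state=rng)
x_ray = stats.rayleigh.rvs(scale=b, size=30, random_state=rng)

# Sanity check: the second moment of the Nakagami samples should approach Omega,
# and the Rayleigh second moment should approach 2 b^2
print(np.mean(x_nak**2), np.mean(x_ray**2))
```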

DISCUSSION AND CONCLUSIONS
A demo was created to familiarize students with applications of probability and statistics to data analytics. The goal was to expand the scope of the traditional course in engineering probability to provide insight into machine vision applications and a better understanding of current trends. While exploring data analytics is important, the association between it and the conceptual topics in statistics should not be ignored. The pedagogic and didactic strength of the demo described in this manuscript is that it offers a means of connecting multiple topics generally covered in a course on probability and random variables to the analysis and interpretation of data. The concepts of hypothesis testing, probability density and distribution functions of random variables, parameter estimation, and model-based probabilities of false alarm and miss are reinforced in the demo. Bootstrapping, more specifically parametric bootstrapping, bridges the gap between topics in probability and applications to machine vision and machine learning. The use of bootstrapping as a computational tool allows us to replicate experiments. Bootstrapping also makes it possible to evaluate the efficacy of the machine vision system, quantified through the AUC. Thus, the panoply of topics encompassed in this demo allows the students to see the relevance of the topics covered in the lecture to practical aspects of engineering. In addition, the extensive use of computational approaches in the demo provides a window into the importance of software packages and computational tools in engineering education. Together with other efforts reported by the author and the students' responses, 2,3,25 all the demos, including the one described here, made the course relevant to current topics in engineering applications to industry and medicine.
Starting with the 2019-2020 academic year, the course is being renamed Probability and Data Analytics for Engineers, taking into account the importance of data analytics in probability, as evident from the demos and exercise problems introduced over the past 2 years. During the upcoming academic year, the author also plans to expand the demos to include tests such as z tests, t tests, and Wilcoxon rank sum tests 15 to expose the students to other practical aspects of engineering probability.

CONFLICT OF INTEREST
The author has no conflicts of interest relevant to this article.