Filtering heart rates using data densities: The boxfilter R package

Over the past decades, there has been a growing interest in long-term heart rate records, especially from free-living animals. This interest arises largely because most metabolic activity of tissues depends on oxygen delivered by the heart. Heart rate has therefore served as a proxy for energy expenditure in animals. However, heart rates and other physiological variables recorded with loggers in humans and animals often contain noise. False measurements are sometimes eliminated by hand or by filters that reject values based on the shape or frequency of the signal. Occasionally, outliers are rejected because they lie far from the genuine data. We introduce an R package, boxfilter, which eliminates noise by counting the number of close neighbours inside a gliding window. Depending on the cut-off value chosen, a focal point with a low proportion of neighbours is rejected as noise. All three parameters, namely window width and height as well as the cut-off value, can be computed automatically. The choice of the cut-off value beyond which data points are discarded is crucial. Boxfilter cannot, of course, solve problems caused by completely erroneous measurements. Like the human eye, this filter prefers points that are part of a pattern, such as a dense band, and rejects isolated values. Boxfilter may also be applied to measures other than heart rate that do not change instantaneously, such as body temperature, blood pressure or sleep parameters.

Most metabolic activity of tissues depends on oxygen delivery by the heart. Therefore, some correlation between heart rate and energy expenditure is to be expected (Green, 2011). Records of heart rates thus provide information on temporal changes in energy expenditure in animals, even in free-living individuals, that would be impossible to obtain otherwise. However, especially in large mammals, recording heart rates is not without problems. In large animals, capturing electrocardiogram (ECG) signals typically requires a certain distance between electrodes, which in turn makes it necessary to use long subcutaneous wires, with the accompanying surgical complications and risks (Arnold et al., 2006). Therefore, particularly for large ruminants, a minimally invasive logger system based on the acoustic recording of heart beats has been developed (Signer et al., 2010). This system rests in the rumen and does not require surgical implantation. In smaller animals, loggers with integrated electrodes in a single plate located close to the heart may be used (Bjarnason et al., 2019). Overall, heart rates are typically recorded using ECG loggers, acoustic loggers or photoplethysmography (PPG) sensors (e.g. Delaunois et al., 2009; Signer et al., 2010; Ruf et al., 2021).
Heart rates and other signals recorded in humans and animals often contain noise. Such noise is generated by electrical devices or, especially in free-living animals, by movement within the body.
Noise can be removed, to various degrees, either by hand through visual inspection of ECGs, which still seems to be the gold standard for determining heart rate in beats per minute (bpm; e.g. Hochstadt et al., 2019), or by various filters.
Sometimes, noise is eliminated by comparing actual and expected values (Trondrud et al., 2021). However, although this method is often justified by the data and correct, it seems cumbersome and time-consuming. Automatic filters are typically based on specific signal characteristics or on restricting the signal to certain frequencies.
For example, ECG filters may require a certain shape of the electrical wave (Bjarnason et al., 2019; Syeda-Mahmood et al., 2007).
Alternatively, valid measurements may be restricted to certain frequencies by high-pass, low-pass or bandpass filters (e.g. Kaufmann et al., 2011).
Instead of these filters, or prior to their application, heart rate data are often subjected to outlier detection. A typical tool for this purpose is the Hampel filter, which rejects a value if it deviates from the median of a surrounding window by more than a chosen threshold, typically a multiple of the median absolute deviation (Hampel, 1974; van Gent et al., 2019).
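To make the contrast with the boxfilter concrete, the following is a minimal sketch of the textbook Hampel identifier in base R. The function name hampel_flag() and its defaults (a half-window of k = 3 points, i.e. six neighbours, and a threshold of t = 3 scaled median absolute deviations) are illustrative assumptions; this is not code from boxfilter or from the cited implementations.

```r
# Minimal sketch of the Hampel identifier (hypothetical helper, not part of boxfilter).
# A point is flagged when it deviates from the median of its local window
# by more than t scaled median absolute deviations (MAD).
hampel_flag <- function(y, k = 3L, t = 3) {
  n <- length(y)
  flag <- logical(n)
  for (i in seq_len(n)) {
    idx <- max(1L, i - k):min(n, i + k)   # focal point plus up to k neighbours on each side
    m <- median(y[idx])                   # local median
    s <- mad(y[idx], center = m)          # MAD, scaled by 1.4826 by default
    flag[i] <- abs(y[i] - m) > t * s
  }
  flag
}

y <- c(rep(60, 10), 150, rep(60, 10))     # flat series with a single spike
which(hampel_flag(y))                     # 11
```

Because the MAD is zero whenever more than half of a window is identical, this plain version flags any deviation in perfectly flat stretches; practical implementations often add a small minimum scale.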
A major step in the use of filters was the introduction of a quality index by one of the heart rate logger manufacturers, Star-Oddi, Iceland. This index (Bjarnason et al., 2019) is based on ECG signal quality, and each recorded value is coded from 0 (good quality) to 3 (hardly usable). It does not, however, consider temporal changes in physiological measurements. For example, implausibly fast changes in heart rate are not taken into account, so that even some "high-quality" values should in fact be removed. Thus, this index does not always lead to the desired result and may even lead to false conclusions.
An alternative, presented here, is the boxfilter. This filter assigns each data point a weight based on the number of neighbouring points. It is implemented by looping through all data points and counting the neighbours within a rectangular frame that is adjustable in width and height and centred on the focal data point. All data points of a data set are thus weighted according to their proportion of neighbours, and only points with a high proportion of similar surrounding points are retained. An earlier version of this procedure, without a reference to software, was briefly mentioned by Signer et al. (2010).
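The counting principle can be sketched in a few lines of base R. This is a minimal illustration under stated assumptions, not the boxfilter source: the helper names neighbour_proportion() and box_clip() are hypothetical, "width" is taken as the number of points on each side of the focal point, "height" as the full box height (so the box spans ±height/2 around the focal value), the window is simply truncated at the ends of the series, and the data are assumed to contain no missing values.

```r
# Minimal sketch of the box-counting principle (hypothetical helpers, not boxfilter code).
neighbour_proportion <- function(y, width, height) {
  stopifnot(!anyNA(y))                 # assumption: no missing values
  n <- length(y)
  prop <- numeric(n)
  for (i in seq_len(n)) {
    # indices of up to `width` points on either side, excluding the focal point
    idx <- setdiff(max(1, i - width):min(n, i + width), i)
    if (length(idx) == 0) next         # no neighbours at all: proportion stays 0
    # share of neighbours whose value lies within +/- height/2 of the focal value
    prop[i] <- mean(abs(y[idx] - y[i]) <= height / 2)
  }
  prop
}

box_clip <- function(y, width, height, clipit) {
  prop <- neighbour_proportion(y, width, height)
  y[prop < clipit] <- NA               # reject points with too small a proportion of neighbours
  y
}

spike <- c(rep(60, 20), 150, rep(60, 20))   # sudden jump and immediate return
which(is.na(box_clip(spike, width = 5, height = 10, clipit = 0.2)))   # 21

ramp <- seq(60, 100, by = 1)                 # gradual change of 1 bpm per sample
all(neighbour_proportion(ramp, width = 5, height = 10) == 1)          # TRUE
```

The two calls mirror the behaviour described in the text: the isolated spike has no neighbours inside its box and is rejected, whereas every point of the gradual ramp keeps all of its neighbours. A real implementation may treat window edges and box boundaries differently.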
Generally, outliers are identified by rejecting values that lie a certain factor outside the accepted variability of the measurements, as defined, for instance, by their standard deviation or their interquartile range. Automatic filters of heart rate records most frequently rely on variants of bandpass filters, with or without dynamic adaptation of their parameters (Kaufmann et al., 2011; Li et al., 2008; Motin et al., 2019). We are not aware of any other algorithm that, like the present one, relies on the similarity of measurements in a focal area. It should be especially attractive to researchers because its principle is intuitive, it is easy to implement and computationally fast, and, as an R package, it is readily available to all biomedical scientists. Not unlike the human eye, this filter prefers points within a dense band and rejects isolated values. By default, the window size and the cut-off value are computed automatically. The choice of the cut-off value beyond which data points are discarded is particularly critical.
Because the filter is based on the number of neighbouring points, it rejects sudden jumps (and returns) that would be physiologically unlikely, and accepts only values that are reached gradually. The filter is also illustrated in Figure 1.
Of course, boxfilter has its limitations. If the data are dominated by sudden changes in physiology or behaviour, boxfilter is inappropriate, because such points have almost no neighbours. To avoid its use in such cases, we make data inspection obligatory (see below).
In the following sections, we describe the implementation of this filter using simple case studies from wild boar (Sus scrofa) and Alpine ibex (Capra ibex). An example from humans (the last example of boxclip(), the main function of boxfilter) illustrates applications of this filter to variables other than heart rate.

| THE boxfilter R PACKAGE
The package requires a standard installation of R (R Core Team, 2019; version > 3.5) and is available on the Comprehensive R Archive Network (CRAN). To produce graphs, it imports the packages ggplot2 and gridExtra (Auguie, 2017; Wickham, 2016); Figure 4, for instance, was produced by the boxfilter package. After a one-time installation (install.packages("boxfilter")) and loading with library(boxfilter), the package is used through its main function, boxclip(). For a detailed description, see the help file or the vignette of boxfilter. At a minimum, the package requires a vector of measurements, that is, heart rate (or another physiological measure). It may also be called with an additional time variable, which may be a date or a time. Data may be stored in a comma-separated values file (.csv) consisting of a time (x), a heart rate (y) and, optionally, a quality index (QI). Assuming HRS.csv is in the working directory, the data can be read with dat = read.csv("HRS.csv"), which stores them in the data frame dat.
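As an illustration of the file layout described above, the sketch below writes a small hypothetical HRS.csv to a temporary directory and reads it back with base R only. The values, the three-minute sampling interval and the time format are invented for the example, and converting x to a date-time with as.POSIXct() is our own step, not something boxfilter is documented here to require.

```r
# Hypothetical example file: time (x), heart rate (y) and an optional quality index (QI).
path <- file.path(tempdir(), "HRS.csv")
hr <- data.frame(
  x  = format(as.POSIXct("2020-01-01 00:00", tz = "UTC") + (0:4) * 180,
              format = "%Y-%m-%d %H:%M"),   # one record every 3 minutes
  y  = c(62, 64, 150, 63, 61),              # invented heart rates in bpm
  QI = c(0, 0, 3, 1, 0)                     # invented quality codes (0 = best, 3 = worst)
)
write.csv(hr, path, row.names = FALSE)

dat <- read.csv(path)                       # x is read as text
dat$x <- as.POSIXct(dat$x, format = "%Y-%m-%d %H:%M", tz = "UTC")  # convert to date-time
str(dat)
```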

| Sampling
Boxfilter is designed to conserve gradual changes in the measured variable and to reject abrupt changes. Autocorrelation plots are not useful in this context, because even data without any significant autocorrelation may change gradually enough for points to be conserved.
Whether measurements change gradually or abruptly depends on the sampling rate: changes may appear gradual when measurements are taken frequently, but abrupt when the sampling rate is low. The rate of change also depends on the body size and shape of the animal studied, and on other factors related to the variable. Even if time were supplied as an input, an optimal window size for the boxfilter could only be calculated if the allometry of change in each variable, such as the temporal change of heart rate, were known, which is not the case. Therefore, before filtering, we provide a graph of the original data versus time, as well as a histogram of their temporal change. Because the direction of change is irrelevant here, the histogram depicts absolute changes. In addition, the average sampling interval is given numerically. The user may view these graphs and numbers to decide whether filtering should proceed. Should extreme values be deleted, as for very short times in bed suggested by an automatic sleep recorder (the last example of boxclip(), the main function of boxfilter), or retained, as in the case of brief periods of high heart rate? We do not see how this can be judged automatically.
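The inspection step described above can be approximated with base graphics. The sketch below is a hypothetical helper, inspect_changes(), and not the boxfilter implementation: it plots the raw data against time, draws a histogram of the absolute changes between consecutive values and returns the mean sampling interval (in seconds when x is a POSIXct date-time).

```r
# Minimal sketch of a pre-filtering inspection (hypothetical helper, not boxfilter code).
inspect_changes <- function(x, y, plot = TRUE) {
  abs_change <- abs(diff(y))                    # direction of change is ignored
  mean_interval <- mean(diff(as.numeric(x)))    # seconds if x is POSIXct
  if (plot) {
    op <- par(mfrow = c(1, 2))
    on.exit(par(op))
    plot(x, y, pch = 20, cex = 0.5, xlab = "Time", ylab = "Signal")
    hist(abs_change, breaks = 30, main = "", xlab = "Absolute change")
  }
  list(abs_change = abs_change, mean_interval = mean_interval)
}

t0 <- as.POSIXct("2020-01-01 00:00", tz = "UTC")
res <- inspect_changes(t0 + (0:5) * 120, c(60, 62, 61, 140, 63, 62), plot = FALSE)
res$mean_interval   # 120
res$abs_change      # 2 1 79 77 1
```

In this toy series, the two large absolute changes (79 and 77) stand out from the small ones, which is the kind of pattern the histogram is meant to reveal before deciding whether to filter.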

| White noise
The heart rates of an adult female wild boar recorded over 1 year may include measurements of different quality (coloured points), as indicated by the logger's manufacturer (Figure 2a). Many heart rates, even those with the best manufacturer quality index, still seem questionable (Figure 2b). Points remaining after the application of boxfilter are shown in Figure 2c. Quality is displayed in the graph of the original points (Figure 2a) to illustrate that the high heart rates that seem to occur in summer are very unreliable. Not surprisingly, then, the points filtered by density (Figure 2c) even show a drop in heart rate in summer and a peak in winter. This seems typical for wild boar (Ruf et al., 2021). The filter parameters (rectangle width and height, as well as the cut-off value) were computed automatically. The "clipit" value of 0.2 corresponds to the lowest point in the histogram of neighbour proportions, which usually best separates noise from true measurements. In this case, boxfilter even discards points that were considered "good quality" by the manufacturer.

FIGURE 1
Example explaining the boxfilter. A rectangular window (the box) is moved along all data points. Each focal point (green) in the middle of the box is characterized by neighbours inside the box (red) and outside it (dark blue). In the example, 11 of a total of 17 neighbours lie inside the box (proportion ≈ 0.65). Depending on the cut-off value, a focal point with that proportion of neighbours will be accepted or rejected as noise. Given this principle, boundary points are almost always eliminated.
The main function of boxfilter, boxclip(), also accepts "width" as a parameter. This is half the width of the rectangle surrounding the focal point, given as the number of points before or after the value in question. By default, the width is set to twice the height, so that the window is square.
The parameter "height" is the height of the box in units of the signal, for example, in bpm if the signal is a heart rate. The boxclip() function also accepts the parameter "QI", a quality index. This index is only used to display the original data in different shades of blue; it is ignored during the actual filtering of the signal.
The parameters "miny" and "maxy" represent the absolute minimum and maximum of the expected signal and may be set independently. If "plotit" is set to TRUE, the results are shown in a graph. When "histo" is TRUE, a histogram of the distribution of the proportion of neighbours is displayed in addition. According to this histogram (its leftmost trough), the best separation between real heart rates and noise is achieved with a "clipit" value of around 0.10.

| Clustered noise
It may seem that boxfilter is only suited to removing the white noise surrounding the data. This is not the case. Sometimes the genuine signal may contain "bands" of noise. For example, a heart rate logger based on the acoustics of the heart beat in ruminants (Signer et al., 2010) occasionally, in some animals, picked up noise corresponding to approximately 110 bpm. An example of a year-round record in an Alpine ibex is shown in Figure 3.
The obviously false bands are removed because they are not surrounded by genuine signal. A bandpass filter would inevitably fail to remove these false data, because the false points lie in the same frequency range as the genuine points.
Boxfilter is, of course, mainly suited to identifying the major tendency in a data set. The automatically computed "clipit" is merely a suggestion and may be far from a sensible value. A suitable value may be found by trial and error and by applying physiological constraints. For instance, for the heart rates in Figure 3, boxclip() suggests a smaller "clipit" value. Using this cut-off would leave a cluster of heart rates around 110 bpm in January (see Figure 3a). Raising the cut-off value to 0.65 removes these points (Figure 3b).

| Choosing parameters
The size of the rectangle (Figure 1) is determined by "width" and "height". By default, height is computed as a quarter of the mean, so that the window size increases with the magnitude of the measurement. The default width is twice the height. All three parameters influence the number of points rejected. Larger windows (larger values of width and height) lead to a higher proportion of neighbours, so more values are accepted. Based on previous trials, we recommend keeping the window size constant (for example, at the default values) and varying only the "clipit" value. For example, it seems impossible to completely remove the false band in the ibex data by changing only the window size, whereas changing "clipit" is successful.
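The default window rule stated above reduces to simple arithmetic. The helper default_window() below is hypothetical; rounding the width to a whole number of points, and treating it as points on each side of the focal value, are our assumptions rather than documented boxfilter behaviour.

```r
# Sketch of the default window size described in the text (hypothetical helper).
default_window <- function(y) {
  height <- mean(y, na.rm = TRUE) / 4                 # box height in signal units (e.g. bpm)
  width  <- max(1L, as.integer(round(2 * height)))    # twice the height, as a number of points
  list(width = width, height = height)
}

default_window(c(70, 80, 90))   # height 20 (bpm), width 40 (points)
```

For a mean heart rate of 80 bpm this gives a box 20 bpm high, so the vertical tolerance grows with the level of the signal, which is the stated purpose of scaling the height by the mean.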
Generally, the crucial issue in using boxfilter is finding the best value of "clipit", the cut-off constant. To this end, the clipview() function may be helpful. It shows the results of using a sequence of four different "clipit" values (Figure 4).
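A numerical counterpart to comparing several cut-offs is to tabulate the share of points retained for each candidate value. The sketch below is a hypothetical helper, retained_share(), and not a reimplementation of clipview(); it assumes that the neighbour proportions have already been computed and that points with a proportion at or above "clipit" are kept.

```r
# Share of points retained for each candidate "clipit" (hypothetical helper).
retained_share <- function(prop, clipits = c(0.05, 0.10, 0.15, 0.20)) {
  setNames(vapply(clipits, function(cl) mean(prop >= cl), numeric(1)),
           format(clipits))
}

prop <- c(0, 0, 0.08, 0.12, 0.18, 0.6, 0.7, 0.8, 0.9, 1)   # invented neighbour proportions
retained_share(prop)   # 0.8 0.7 0.6 0.5
```

A steep fall in the retained share between two adjacent cut-offs indicates that the cut-off is cutting into a populated region of the histogram, which is worth checking visually before accepting it.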

| SOURCES OF FALSE VALUES
Our filtering method is, of course, unable to solve problems with signal generation. Each method by which heart rate (HR) sensors generate an ECG or PPG signal may have its own problems, and the resulting artefacts and errors in the calculated HR can differ considerably.
For example, in human wearable devices, a so-called "cadence lock" can occur, in which the running movement generates a motion artefact that leads to a false HR calculation (Jarchi & Casson, 2017). This problem can only be detected if body movement is recorded simultaneously, which is beyond the scope of the present work.
Another artefact that can be present in ECG data is powerline interference, which algorithms may output as a plausible, stable HR value even if no signal is recorded at all. A possible example is our acoustic measurements in ibex, which correspond to approximately 110 bpm (Figure 3) and potentially overestimate HR during the summer months. In this case, however, we consider it very unlikely that an artefact is involved, because we found the same high, acoustically determined HR during summer in animals without possible interference (Signer et al., 2011). Nevertheless, some causes of wrong HR measurements, such as powerline interference, cannot be fixed afterwards by any filter.

| CONCLUSIONS
We provide a new R package to eliminate noise from measurements such as animal heart rates. The data do not need to have peaks and valleys. The principle behind the procedure is close to that of the Hampel filter (Hampel, 1974), which is based on the distance of each point from some of its neighbours. Strictly speaking, the Hampel filter compares each point with the median of a window that typically includes three points on either side, and rejects it as an outlier if it deviates from that median by more than k (usually 3; Löning et al., 2019) scaled median absolute deviations. There is no logical reason, however, to restrict this or similar methods to the elimination of a few outliers. Implementing a filter that actually determines many distances is computationally expensive, which is why it is not a default routine in heart rate analysis (van Gent et al., 2019).
We found no advantage of computing distances (only vertically, or in both x and y directions) over the much faster counting of points inside a gliding window, as suggested here. Currently, the R version of boxclip() takes less than 0.6 s for 10,000 time points on a 3.2 GHz 6-core Intel Core i7.
Of course, the method may eliminate even points that, although unlikely, could represent true sudden changes in the measured physiological variable. This is the price of relying on slowly changing patterns in the data. That the method can nevertheless be superior to the use of quality indices is illustrated in Figure 2. Finally, boxfilter may also be applied to measures other than heart rate that do not change instantaneously, such as temperature or blood pressure.

AUTHOR CONTRIBUTIONS
Thomas Ruf wrote the R package. Claudio Signer and Walter Arnold collected the data on ibex. Sebastian G. Vetter and Claudia Bieber collected the heart rates of wild boar. All authors tested the

FIGURE 2
Yearly heart rate of a wild boar. (a) Original points. The scale on the right depicts an individual quality index provided by Star-Oddi (0-3). Black points are of the best quality; inferior quality measurements are shown in colour (red = 1, green = 2, blue = 3). (b) Data filtered by quality index. (c) Data filtered using the package boxfilter with "clipit" = 0.2.

FIGURE 3
Yearly heart rate of an Alpine ibex. (a) Original points before filtering; (b) points filtered using boxfilter, clipit = 0.65.

The parameter "clipit" defines a cut-off value of neighbour proportions below which all signals are rejected as noise. Its default value is calculated as the leftmost (>0) drop in a histogram of neighbour proportions.

FIGURE 4
Result of a call to clipview(x, y, clipit = c(0.05, 0.10, 0.15, 0.20)) in the package boxfilter, where y are the heart rates of a female wild boar recorded over one month. The upper left panel shows the original data points in dark blue. The upper right graph is a logarithmic histogram of the neighbour proportions. The lower panels show the results of calling boxclip() with four different "clipit" values (0.05-0.20).

Figure 4 also shows a histogram of the proportion of neighbours surrounding a focal point (upper right).The histogram may be helpful in determining "clipit".In an automatic calculation, the value of "clipit" corresponds to the first (leftmost) drop >0 in the histogram.
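The automatic rule ("the first (leftmost) drop" or "leftmost trough" of the histogram) is stated only in words, so the sketch below shows one possible reading, not the boxfilter source. The helper suggest_clipit() and its bin width of 0.05 are assumptions: it bins the neighbour proportions and returns the midpoint of the first bin, from the left and after the bin that contains the zero proportions, whose count is lower than that of the preceding bin and not higher than that of the following one.

```r
# One possible reading of the automatic "clipit" rule (hypothetical helper).
suggest_clipit <- function(prop, breaks = seq(0, 1, by = 0.05)) {
  h <- hist(prop, breaks = breaks, plot = FALSE)
  counts <- h$counts
  # start at the second bin, so the bin holding zero proportions is skipped
  for (j in seq_len(max(0L, length(counts) - 2L)) + 1L) {
    if (counts[j] < counts[j - 1L] && counts[j] <= counts[j + 1L]) {
      return(h$mids[j])   # leftmost trough
    }
  }
  NA_real_                # no trough found
}

# invented proportions: many isolated points, a sparse tail and a dense band near 1
prop <- c(rep(0, 50), rep(0.07, 10), rep(0.12, 2), rep(0.17, 5), rep(0.93, 200))
suggest_clipit(prop)   # 0.125
```

As the ibex example shows, such an automatic suggestion is only a starting point; the cut-off may still need to be raised or lowered by inspecting the filtered data.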