Using quantile regression and relative entropy to assess the period of anomalous behavior of marine mammals following tagging

Abstract Tagging of animals induces a variable stress response which, following release, will obscure natural behavior. It is of scientific relevance to establish methods that assess recovery from such behavioral perturbation and generalize well to a broad range of animals, while maintaining model transparency. We propose two methods that allow for subdivision of animals based on covariates, and illustrate their use on N=20 narwhals (Monodon monoceros) and N=4 bowhead whales (Balaena mysticetus), captured and instrumented with Acousonde™ behavioral tags, but with a framework that easily generalizes to other marine animals and sampling units. The narwhals were divided into two groups based on handling time, short (t<58 min) and long (t≥58 min), to measure the effect on recovery. Proxies for energy expenditure (VeDBA) and rapid movement (jerk) were derived from accelerometer data. Diving profiles were characterized using two metrics (target depth and dive duration) derived from depth data. For accelerometer data, recovery was estimated using quantile regression (QR) on the log-transformed response, whereas depth data were addressed using relative entropy (RE) between hourly distributions of dive duration (partitioned into three target depth ranges) and the long-term average distribution. Quantile regression was used to address location-based behavior to accommodate distributional shifts anticipated in aquatic locomotion. For all narwhals, we found fast recovery in the tail of the distribution (<3 h) compared with a variable recovery at the median (∼1–10 h), and a significant difference between groups separated by handling time. Estimates of bowhead whale recovery times showed fast median recovery (<3 h) and slow recovery at the tail (>6 h), but were affected by substantial uncertainty.
For the diving profiles, as characterized by the component pair (target depth, dive duration), recovery was slower (narwhals, long handling: t<16 h; narwhals, short handling: t<10 h; bowhead whales: t<9 h), with a difference between narwhals with short vs. long handling times. Using simple statistical concepts, we have presented two transparent and general methods for analyzing high-resolution time series data from marine animals, addressing energy expenditure, activity, and diving behavior, and which allow for comparison between groups of animals based on well-defined covariates.

In this note, we will show how the methods "Quantile Regression" and "Jensen-Shannon Divergence" are implemented in the free software R for statistical computing (R Core Team, 2022). Using simulated data, we will then recreate figures similar to those in the article and also show how the supplementary figures (not present in the article) are obtained. Finally, we will present all supplementary figures for the actual datasets of the study animals. The "Mean-based Regression" code is available as the script "ReproduceCanada-NW.R" and should be easy to follow. For details on the implementation, see the section "Applying mean based regression" in the main article. All datasets as well as the original scripts are available at https://github.com/LaReiter/TaggingEffects

We use the following R packages:

library(ggplot2)      ## grammar of graphics
library(reshape2)     ## reshaping data frames
library(dplyr)        ## data manipulation
library(data.table)   ## data tables and data table functionality
library(readr)        ## reading csv files
library(gridExtra)    ## graphics
library(xtable)       ## LaTeX formatting of tables
library(splines)      ## splines
library(grid)         ## graphics
library(quantreg)     ## quantile regression
library(segmented)    ## segmented regression
theme_set(theme_bw()) ## set black-and-white plots

Gather data into a data frame
We now use the procedures above to generate artificial data for N = 10 individuals, half of which have a categorical covariate with value A and the other half value B. Type A corresponds to quick recovery relative to type B. The observation lengths of the different individuals are uneven (in order to introduce a weighting protocol for the depth data).
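The simulation can be sketched as below. This is a minimal illustration, not the authors' original simulation script: the function name `sim_one`, the decay-towards-baseline model, and the recovery time scales for types A and B are all assumptions.

```r
## Minimal sketch: N = 10 individuals, half type A (quick recovery),
## half type B (slow recovery), with uneven observation lengths.
set.seed(1)
N <- 10
sim_one <- function(id, type) {
  n_hours <- sample(60:120, 1)          # uneven observation lengths
  t <- seq_len(n_hours)
  tau <- if (type == "A") 5 else 15     # assumed recovery time scales (h)
  ## response decays from a perturbed level towards baseline, with noise
  y <- exp(2 * exp(-t / tau) + rnorm(n_hours, sd = 0.3))
  data.frame(id = id, type = type, hour = t, response = y)
}
df <- do.call(rbind, Map(sim_one,
                         id   = 1:N,
                         type = rep(c("A", "B"), each = N / 2)))
head(df)
```

Each individual gets its own time span, so per-individual weights can later be applied when pooling the depth data.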

Method 1: Quantile Regression
Here we present the method and R implementation of Quantile Regression (QR).
There are no distributional assumptions on the response in the QR model, and we do not require homogeneity of variance. We do, however, assume a log-linear relationship between the response and the predictor variables. This linearity assumption can be hard to verify, since we have a cluster of points when z ≡ 1/(t+1) is small (when t is large) and only few points when z is large (t is small). We can, however, introduce a slight time lag and disallow z from being too large (and t too small). Below we focus on the interval between 5 and 50 hours.
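The fit can be sketched with `quantreg::rq` on simulated data. The model form (log-response linear in z = 1/(t+1), restricted to the 5–50 h window) follows the text above; the simulated coefficients and noise model are assumptions for illustration only.

```r
## Sketch of the QR fit: log-response linear in z = 1/(t+1), with
## heteroscedastic noise, fitted at the median and the 97.5% quantile.
library(quantreg)
set.seed(2)
n <- 500
t <- runif(n, min = 5, max = 50)   # restrict to the 5-50 h window
z <- 1 / (t + 1)
## perturbation decays as t grows (z shrinks); noise scale also grows with z
logy <- 1 + 20 * z + rnorm(n, sd = 0.5 * (1 + 10 * z))
fit <- rq(logy ~ z, tau = c(0.5, 0.975))
coef(fit)   # slope at tau = 0.975 exceeds the median slope
```

Because the noise scale increases with z, the upper-quantile slope is steeper than the median slope, which is exactly the kind of distributional shift QR can detect and mean-based regression cannot.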

Get hourly distributions and hourly Jensen-Shannon divergence
We now loop over time and generate the hourly distributions, from which we derive the hourly divergence.
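The computation can be sketched as follows. The helper names (`kl`, `jsd`), the duration bins, and the simulated dive data are assumptions; the idea is the one described in the article, comparing each hourly distribution of dive duration with the long-term average distribution via Jensen-Shannon divergence.

```r
## Kullback-Leibler divergence (0-aware) and Jensen-Shannon divergence
kl  <- function(p, q) sum(ifelse(p > 0, p * log(p / q), 0))
jsd <- function(p, q) {
  m <- (p + q) / 2
  0.5 * kl(p, m) + 0.5 * kl(q, m)
}

set.seed(3)
## toy dive records: hour of observation and dive duration (min)
dives <- data.frame(hour     = sample(1:48, 2000, replace = TRUE),
                    duration = rexp(2000, rate = 1 / 5))
breaks <- c(0, 2, 5, 10, 20, Inf)    # assumed duration bins
## long-term average distribution over the bins
q_ref <- as.numeric(prop.table(table(cut(dives$duration, breaks))))
## hourly distributions and hourly divergence from the long-term average
J <- sapply(split(dives$duration, dives$hour), function(d) {
  p <- as.numeric(prop.table(table(cut(d, breaks))))
  jsd(p, q_ref)
})
head(J)
```

With natural logarithms the Jensen-Shannon divergence is bounded by log 2, which makes the hourly values directly comparable across individuals.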

Block bootstrapping
We will now walk through how block bootstrapping is used to produce estimates and confidence bands for the recovery times.
Once the temporal measures of relative cross entropy (RCE), denoted J_t, have been calculated, we compute the 97.5% and 2.5% quantiles of {J_t | t > t_N} and denote these α_u and α_l, respectively; the interval [α_l, α_u] constitutes the region of recovery (RoR). We then introduce a map m : R → {−1, 1} with m(J_t) = −1 if J_t > α_u or J_t < α_l, and m(J_t) = 1 otherwise. Defining the cumulative sums y_t = Σ_{t′≤t} m(J_t′), we perform block bootstrapping on the set {y_1, ..., y_tmax} with block width roughly t_max^(1/3) (a common rule in the literature), and for each sample we estimate a breakpoint using segmented regression (the package segmented in R). We create 200 samples, determine the average breakpoint, and calculate the 97.5% and 2.5% quantiles to obtain confidence bounds.
The code below implements the procedure. The plots above show three bootstrap samples. They all drop initially, suggesting that the first temporal blocks only produce negative values in the bootstrapping, which means that no RCE measurements fall inside the RoR. They rise sporadically where a few measurements fall inside the RoR, and eventually rise steadily as the measurements stabilize inside the RoR. The (small) variation between the samples means that the segmented regression will push the breakpoint estimate back or forward to obtain the best-fitting model.
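A compact sketch of the procedure is given below. It is one plausible reading, not the authors' original script: since the bootstrap samples described above retain the drop-then-rise shape, the sketch preserves temporal ordering by resampling each block from a window around its original position. The toy RCE series, the burn-in t_N, and the starting value for `psi` are all assumptions.

```r
## Block bootstrap of y_t with breakpoint estimation via segmented()
library(segmented)
set.seed(4)
t_max <- 200
tt    <- seq_len(t_max)
J     <- exp(-tt / 15) + abs(rnorm(t_max, sd = 0.05))  # toy RCE series
t_N   <- 100                                           # assumed burn-in
a     <- quantile(J[tt > t_N], c(0.025, 0.975))        # RoR bounds
m     <- ifelse(J > a[2] | J < a[1], -1, 1)            # -1 outside the RoR
blk   <- max(2, round(t_max^(1/3)))                    # block width ~ t_max^(1/3)
starts0 <- seq(1, t_max, by = blk)                     # original block starts
boot_bp <- replicate(200, {
  ## jitter each block start within one block width of its position
  s  <- pmin(pmax(starts0 + sample(-blk:blk, length(starts0), TRUE), 1),
             t_max - blk + 1)
  mb <- unlist(lapply(s, function(i) m[i:(i + blk - 1)]))[seq_len(t_max)]
  yb <- cumsum(mb)                                     # bootstrap sample of y_t
  fit <- lm(yb ~ tt)
  seg <- tryCatch(segmented(fit, seg.Z = ~tt, psi = t_max / 2),
                  error = function(e) NULL)
  if (is.null(seg)) NA_real_ else seg$psi[1, "Est."]
})
## average breakpoint and 95% confidence bounds
c(mean = mean(boot_bp, na.rm = TRUE),
  quantile(boot_bp, c(0.025, 0.975), na.rm = TRUE))
```

The `tryCatch` guard matters in practice: `segmented()` can fail to converge on individual bootstrap samples, and those samples are simply dropped before averaging.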