Statistical modelling:
Statistical analyses were undertaken using R version 4.1.2 (R Core Team,
2022). Prior to statistical modelling, data exploration was conducted
following Zuur et al. (2010). Autocorrelation was detected in the data
using ACF plots produced with the itsadug package (van Rij et al., 2015).
Generalised Additive Models (GAMs) were fitted using the function bam within the mgcv package, which is optimised to deal
with large data sets. GAMs were fitted with autoregressive (AR(1))
correlation structure to account for observed autocorrelation, and a
negative binomial error distribution (theta values obtained using
the function gam with the nb family), with a logarithmic link
function, to deal with zero-inflation in the data (Wood, 2011; Wood et al., 2015). The rho values for the AR structure, which
control the degree of permitted autocorrelation (Wood, 2017), were
determined using the itsadug package and ACF plots. The parameter
gamma was set to 1.2 to reduce potential overfitting of splines.
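The quantity read off an ACF plot can be illustrated with a minimal sketch. It assumes, as one common convention, that a candidate rho is taken from the lag-1 sample autocorrelation of residuals from a model fitted without the AR(1) structure; the function name and the residual values are hypothetical and are not taken from the analysis itself.

```python
from statistics import fmean


def autocorrelation(series, lag):
    """Sample autocorrelation at a given lag (the height of one ACF bar)."""
    n = len(series)
    if not 0 <= lag < n:
        raise ValueError("lag must satisfy 0 <= lag < len(series)")
    mean = fmean(series)
    deviations = [x - mean for x in series]
    denominator = sum(d * d for d in deviations)
    if denominator == 0:
        raise ValueError("series has zero variance")
    # Sum of products of deviations separated by `lag` time steps.
    numerator = sum(deviations[t] * deviations[t + lag] for t in range(n - lag))
    return numerator / denominator


# Hypothetical residuals, ordered in time, from a model without an AR(1) term.
residuals = [1.0, 2.0, 3.0, 4.0, 5.0]

# Candidate starting value for rho (assumption: lag-1 autocorrelation).
rho_start = autocorrelation(residuals, lag=1)
```

A value near 0 would suggest little serial dependence, whereas a value approaching 1 would indicate strongly autocorrelated residuals.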
The data were analysed for every hour and the response variables used
were the number of minutes with porpoise detections in each hour (0-60
detection-positive minutes, DPM) and the number of foraging buzzes
(inter-click interval, ICI, < 10 ms) recorded per hour. Explanatory variables included
diel period as a factor and month, temperature, noise, difference to
high tide and tidal range as smooth terms. Circular smoothers were used
for month and difference to high tide. Thin-plate regression splines
with shrinkage, which return the simplest effective spline, were used for
the remaining smooth terms. Generalised cross-validation and manual knot
selection were used, with the chosen values selected visually based on the
trade-off between the overall simplicity of the model and the
explanatory power of the smooth plots. To decide between the candidate
tidal variables, each was included in turn in the full model and the
resulting models were compared by AIC score. Time difference to high tide
resulted in the model with the lowest AIC and was used for further
analysis.
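The AIC comparison described above can be sketched as follows. This is an illustration only: the model labels, log-likelihoods and parameter counts are hypothetical placeholders rather than values from the fitted models, and for a GAM the parameter count would normally be the effective degrees of freedom.

```python
def aic(log_likelihood, n_parameters):
    """Akaike Information Criterion: 2k - 2 ln(L). Lower is better."""
    return 2.0 * n_parameters - 2.0 * log_likelihood


# Hypothetical candidate models differing only in the tidal variable used.
candidates = {
    "model_a": {"log_likelihood": -1520.0, "n_parameters": 18.0},
    "model_b": {"log_likelihood": -1528.5, "n_parameters": 17.0},
}

scores = {name: aic(**fit) for name, fit in candidates.items()}

# The preferred model is the one with the lowest AIC.
best = min(scores, key=scores.get)

# Differences from the best model show how much support the others retain.
delta = {name: score - scores[best] for name, score in scores.items()}
```

Under these placeholder values the first model would be preferred, since it has the lower score despite using one more parameter.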
The relatedness between the smooth terms in the model was measured
using the function concurvity, in a similar manner to variance
inflation factors used for Generalised Linear Models (GLMs). Relatedness
was measured on a scale of 0-1, with 0 indicating no concurvity and 1
indicating a total lack of identifiability between terms. Concurvity was
not found, so all terms were retained for analysis. Stepwise model
selection was performed, in which non-significant interactions were dropped
from the model (starting with the least significant) and model
validation repeated. Models were compared using AIC to choose the
final model. Model performance was checked using gam.check, based on
QQ and residual plots (Wood, 2006). Model goodness of
fit was described by deviance explained and by the area under the receiver
operating characteristic curve (AUC), calculated using the caret package (Kuhn, 2008). AUC was
calculated by predicting a binomial response variable from the fitted
model and comparing it to the observed presence/absence of the variable.
This results in a value ranging from 0-1, with values closer to 1
indicating better model fit (Boyce et al., 2002). Graphical
outputs were produced using the mgcViz package (Fasiolo et
al., 2018) and ggplot2 (Wickham, 2009).
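The AUC calculation can be illustrated with a minimal sketch based on the pairwise-comparison definition of AUC: the probability that a randomly chosen presence hour receives a higher predicted value than a randomly chosen absence hour. This is not the caret routine used in the analysis, and the function name and the observed and predicted values below are hypothetical.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison of scores.

    `labels` are booleans (True = presence); tied scores count as half.
    """
    positives = [s for s, y in zip(scores, labels) if y]
    negatives = [s for s, y in zip(scores, labels) if not y]
    if not positives or not negatives:
        raise ValueError("both presence and absence observations are required")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical hourly counts and the corresponding model predictions.
observed_dpm = [0, 3, 0, 12, 1, 0]
predicted = [0.4, 2.1, 1.5, 8.0, 0.9, 0.2]

# Convert counts to presence/absence before comparing with predictions.
presence = [count > 0 for count in observed_dpm]
model_auc = auc(predicted, presence)
```

A value of 0.5 corresponds to predictions no better than chance, and values closer to 1 indicate that presence hours are consistently ranked above absence hours.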