Methods
Overview of the approach
Previous approaches to predicting the nesting behaviour of birds from
tracking data used either simple decision trees and thresholds (Picardi
et al. 2020, Schreven et al. 2021, Ozsanlav-Harris et al. 2022), or
advanced artificial intelligence algorithms for daily status prediction
(Overton et al. 2022). We opted for a user-friendly approach that
employs easily accessible machine learning algorithms (Pichler and
Hartig 2023) to predict the occurrence of hierarchically nested breeding
behaviours at a seasonal timescale, rather than breeding status at a
shorter (daily or weekly) timescale (Picardi et al. 2020, Overton et al.
2022, Eisaguirre et al. 2023). In particular, we aimed to use GPS
tracking data to predict whether territorial ranging behaviour occurs
(i.e. whether the bird occupies a territory or ‘home range’), whether a
nesting attempt is made (breeding propensity), and whether the nesting
attempt is successful.
We developed ‘NestTool’, a tool that first identifies a plausible nest
location as the GPS location with the greatest amount of time spent and
the greatest number of revisits by the focal individual within a
user-specified radius (e.g. 50 m) and time period (the breeding season,
adjusted for local phenology). Because the radius and time period are
user-specified, NestTool can be adjusted to different populations or
species and their respective movement behaviour and phenology. The
plausible nest location is then used to calculate distances to all GPS
locations and to summarize the median daily distance from the nest, the
number of revisits to the nest radius, the time spent within the nest
radius, and the time spent outside another user-defined radius
indicative of a typical home range (e.g. 1 km; Hötker et al. 2017,
Scherler et al. 2023a). We repeat this procedure separately for all
nocturnal and all diurnal locations and calculate the distance between
the locations where most of the daytime and most of the nighttime is
spent, based on the premise that nesting birds will likely roost on or
very near their nest site, whereas non-nesting birds can often roost
farther away from any diurnal congregation sites (Heiniger et al. 2020,
Aebischer and Scherler 2021). We further calculate the home range area
as the 95% and 99% minimum convex polygons and extract the last day of
the season on which the nest was visited. All movement metrics are
calculated for distinct periods of the breeding season and are combined
with demographic information on individuals (age and sex) in three
successive models. NestTool is available as an R package on GitHub:
https://github.com/Vogelwarte/NestTool.
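The nest-identification step can be illustrated with a minimal Python sketch. It assumes the track has already been projected to metres and resampled to one fix per hour (so the time spent inside a circle equals the number of fixes inside it times one hour), and that each fix carries a day/night flag. The names `Fix`, `recursion_stats`, `candidate_nest` and `nest_metrics` are hypothetical and are not NestTool's R functions; ranking candidate locations by time first and revisits second, and testing every fix as a possible centre, are also assumptions of this sketch.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple

INTERVAL_H = 1.0  # assumption: track already resampled to one fix per hour


@dataclass
class Fix:
    x: float     # easting in metres (projected coordinates assumed)
    y: float     # northing in metres
    night: bool  # True if recorded between sunset and sunrise


def dist(a: Fix, b: Fix) -> float:
    return math.hypot(a.x - b.x, a.y - b.y)


def recursion_stats(fixes: List[Fix], centre: Fix, radius: float) -> Tuple[int, float]:
    """Return (visits, hours inside) for the circle of `radius` around `centre`.

    A visit is counted at every outside-to-inside transition (first entry
    included); hours inside = number of fixes inside x INTERVAL_H.
    """
    visits, n_inside, was_inside = 0, 0, False
    for f in fixes:
        inside = dist(f, centre) <= radius
        if inside:
            n_inside += 1
            if not was_inside:
                visits += 1
        was_inside = inside
    return visits, n_inside * INTERVAL_H


def candidate_nest(fixes: List[Fix], radius: float) -> Fix:
    """Fix with the most hours inside `radius`; ties broken by number of visits.

    Brute force, O(n^2): fine for a sketch, slow for long tracks.
    """
    def score(c: Fix) -> Tuple[float, int]:
        visits, hours = recursion_stats(fixes, c, radius)
        return hours, visits
    return max(fixes, key=score)


def nest_metrics(fixes: List[Fix], nest_radius: float = 50.0,
                 home_radius: float = 1000.0) -> dict:
    nest = candidate_nest(fixes, nest_radius)
    visits, hours_in_nest = recursion_stats(fixes, nest, nest_radius)
    # Time spent beyond the home-range radius around the candidate nest
    hours_outside_home = sum(INTERVAL_H for f in fixes if dist(f, nest) > home_radius)
    # Distance between the most used daytime and nighttime locations
    day = [f for f in fixes if not f.night]
    night = [f for f in fixes if f.night]
    day_night = (dist(candidate_nest(day, nest_radius), candidate_nest(night, nest_radius))
                 if day and night else float("nan"))
    return {"nest": (nest.x, nest.y), "visits": visits,
            "hours_in_nest": hours_in_nest,
            "hours_outside_home": hours_outside_home,
            "day_night_distance": day_night}
```

For a nesting bird the day/night distance should be small (a few tens of metres), whereas a non-breeder roosting away from its daytime haunts would yield a much larger value.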
Model description and specification
Each of NestTool’s three models aims to predict a binary outcome
(occurrence of a home range, a nest, and breeding success: yes or no)
using a random forest, a powerful machine-learning algorithm that can
extract useful information from many predictor variables (Breiman 2001,
Prasad et al. 2006, Cutler et al. 2007, Pichler and Hartig 2023).
Because many potential movement metrics may distinguish between breeding
behaviours and outcomes, and many of these variables may be correlated,
a random forest is a suitable algorithm; it has already been used to
distinguish other movement behaviours (Thiebault et al. 2018, Carneiro
et al. 2022). A random forest aggregates hundreds of individual
classification trees, the base learner used to infer behavioural
patterns in previously published approaches (Shamoun-Baranes et al.
2012, Picardi et al. 2020), and is conceptually similar to the
stochastic gradient boosting employed by approaches predicting daily
behavioural states (Overton et al. 2022), as both are ensembles of
decision trees.
Like other machine learning models, random forests can be tuned to
improve predictive accuracy by adjusting parameters that control how the
model distinguishes between outcomes. Because the values of these
parameters that yield the most accurate prediction may vary with the
composition of the training data set, our tool iterates over two
important parameters, namely the number of classification trees (from
500 to 5000) and the number of variables tried at each split (from 1 to
15). For each combination of parameters, a random forest model is fitted
using the R package ‘ranger’ (Wright and Ziegler 2017), and the
out-of-bag prediction error (the error on observations not used to grow
each tree, an internal form of validation) is extracted from each fitted
model. The parameter combination with the lowest prediction error is
then used to fit the main model and to identify the most important
predictor variables using a permutation procedure (Strobl et al. 2007,
Grömping 2009, Janitza et al. 2013). The available data are randomly
split: 70% are used to train the main model and 30% are retained to test
the accuracy of its predictions. We report prediction accuracy
separately for the training and test data.
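The tuning loop can be sketched in Python with scikit-learn as a stand-in for ranger: `n_estimators` and `max_features` play the roles of ranger's `num.trees` and `mtry`, and `oob_score_` gives the out-of-bag accuracy. The grid values, tuning on the 70% training portion only, and computing permutation importance on the held-out 30% (ranger permutes out-of-bag observations instead) are assumptions of this sketch, and `tune_and_fit` is a hypothetical name.

```python
import itertools

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split


def tune_and_fit(X, y, n_trees_grid=(500, 1000, 2500, 5000), mtry_grid=range(1, 16),
                 test_size=0.3, seed=0):
    """Pick (number of trees, mtry) by out-of-bag error, then fit and evaluate the main model."""
    # 70/30 split; in this sketch tuning uses only the training part.
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=test_size, random_state=seed, stratify=y)

    best, best_err = None, np.inf
    for n_trees, mtry in itertools.product(n_trees_grid, mtry_grid):
        if mtry > X.shape[1]:
            continue  # cannot try more variables per split than there are predictors
        rf = RandomForestClassifier(n_estimators=n_trees, max_features=mtry,
                                    oob_score=True, random_state=seed, n_jobs=-1)
        rf.fit(X_tr, y_tr)
        err = 1.0 - rf.oob_score_  # out-of-bag misclassification rate
        if err < best_err:
            best, best_err = (n_trees, mtry), err

    main = RandomForestClassifier(n_estimators=best[0], max_features=best[1],
                                  random_state=seed, n_jobs=-1).fit(X_tr, y_tr)
    # Permutation importance on the held-out data
    imp = permutation_importance(main, X_te, y_te, n_repeats=10, random_state=seed)
    return {
        "best_params": best,
        "oob_error": best_err,
        "train_accuracy": main.score(X_tr, y_tr),
        "test_accuracy": main.score(X_te, y_te),
        "importance_rank": np.argsort(imp.importances_mean)[::-1],  # most important first
    }
```

With the full grid (up to 60 forests of up to 5000 trees) this is slow; a reduced grid such as `n_trees_grid=(50,)` is enough to check the logic on synthetic data.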
Description of the analysis workflow
The analysis requires two input files, which contain the raw tracking
data and life history information per individual and season,
respectively. We first extract movement metrics for each individual and
season; these are calculated using various user-specified input values
that restrict the data to the spatial and temporal domain of interest
and define specific phases of the breeding season. These phases allow
the movement behaviour of breeding birds to differ between incubation
and chick rearing (Pfeiffer and Meyburg 2015, Aebischer and Scherler
2021, López-López et al. 2021, Spatz et al. 2022), and the input values
(dates) are deliberately user-specified to accommodate differences among
populations and species.
Similarly, the user must specify the spatial scale over which recursions
to a nest location and presence and absence times are evaluated. These
parameters therefore permit different scales for different species or
populations (Aebischer and Scherler 2021, Spatz et al. 2022), and
accommodate differences in the locational precision of the tracking
data. However, the absolute values of the movement metrics are not
important for the predictive accuracy of the model, because the tool
relies only on the differences between birds that nest (successfully)
and those that do not nest (or fail).
For each individual season, a total of 42 movement metrics that reflect
home range areas, distances, residence times and revisitation patterns
of the respective individual over the user-specified phases and radii of
the breeding season are calculated (Table S1). Because the absolute
distances moved by breeding individuals can vary substantially among red
kite populations (Aebischer and Scherler 2021, García-Macía et al.
2022b, Spatz et al. 2022), all movement metrics are scaled to the
maximum ranges and distances of each individual in each season, and are
thus comparable between individuals and populations on a relative
scale. This scaling is achieved by dividing the distances and areas
calculated for each individual, year and phase by the overall distances
and areas for that individual and year, calculated from all data over
the entire breeding period.
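The scaling amounts to dividing each phase-specific value by the whole-season value for the same individual and year. A minimal sketch, assuming metrics are stored in a dictionary keyed by (bird, year, phase), with a hypothetical `"season"` entry holding the whole-season values; the function name and metric names are illustrative only.

```python
def scale_by_season(metrics, season_key="season"):
    """Divide each phase-specific metric by the same metric for the whole season.

    `metrics` maps (bird, year, phase) -> {metric_name: value}; the entry whose
    phase equals `season_key` holds the whole-season denominator.
    """
    scaled = {}
    for (bird, year, phase), values in metrics.items():
        if phase == season_key:
            continue
        whole = metrics[(bird, year, season_key)]
        scaled[(bird, year, phase)] = {
            # A zero whole-season value cannot be scaled; mark it as missing.
            name: value / whole[name] if whole[name] else float("nan")
            for name, value in values.items()
        }
    return scaled
```

Two birds whose absolute ranges differ tenfold, but whose phase-specific ranges are the same fraction of their seasonal range, receive identical scaled values, which is what makes metrics comparable across populations.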
Once the movement metrics have been calculated, there are three
sequential steps to identify and predict breeding behaviours, namely (1)
whether individuals have a home range (or a ‘territory’), which is
typically shown by frequent returns to and long residence times at a
single location; (2) whether birds initiate a nesting attempt, which
results in movements similar to those of territorial birds, but with
even stronger attachment to a single location and very short distances
between the locations most used during day and night; and (3) whether
the nesting attempt was actually successful, which results in frequent
returns to the nest throughout the entire breeding season and regular
presence for long periods of time, especially during the early
chick-rearing period (Picardi et al. 2020). NestTool provides two
functions for each of these steps: one to train a model using data in
which the actual behaviour of the tracked birds was observed in the
field and is therefore known, and a second to use that model to predict
the behaviour of tracked birds (which can include birds whose actual
behaviour is unknown). If no training data with known behaviours exist
for a population, the models contained in the tool can be used to
predict breeding behaviours for any data set that meets the minimum data
requirements, but we caution that these predictions rely explicitly on
the assumption that the patterns observed in red kites from Switzerland
are transferable to the population of interest.
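One way to chain the three sequential predictions is sketched below. It assumes each model returns a probability for one individual season, that the nest probability is conditional on having a home range and the success probability conditional on a nesting attempt, and a cutoff of 0.5; `predict_hierarchy` is hypothetical and does not reproduce NestTool's prediction functions.

```python
def predict_hierarchy(p_home, p_nest, p_success, cutoff=0.5):
    """Combine the three sequential predictions for one individual season.

    Later steps are only classified as 'yes' if the previous step was 'yes'
    (nesting requires a home range; success requires a nesting attempt).
    """
    home = p_home >= cutoff
    nest = home and p_nest >= cutoff
    success = nest and p_success >= cutoff
    return {
        "home_range": home,
        "nest": nest,
        "success": success,
        # Unconditional probability of a successful nest, valid only if the
        # three probabilities are conditional on the previous step as assumed.
        "p_success_overall": p_home * p_nest * p_success,
    }
```

For example, `predict_hierarchy(0.9, 0.8, 0.3)` classifies the bird as territorial and nesting but unsuccessful.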
The steps above can classify recruitment, breeding propensity and
success based on movement metrics, but some individual movement
behaviours will always be classified with very high uncertainty.
NestTool therefore provides not only the functions above to predict
probabilities of each reproductive behaviour, but also a graphical user
interface (based on R shiny) to visually inspect an individual’s
movements over a season and manually determine behaviours when the
predicted probabilities fall below a user-specified threshold. To aid
this manual annotation, the user is presented with a map of the tracking
locations and graphs that show the development of five movement metrics
over time (Fig. S1). The manual annotation of breeding propensity and
success is then saved and can be combined with the automated
classification to extract important demographic parameters for the
tracked population solely from the tracking data.
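The selection of individuals for manual inspection, and the merging of manual and automated classifications, can be sketched as follows. We assume here that an ‘uncertain’ prediction is one where the probability of the predicted class, max(p, 1 - p), is below the user-specified threshold, and that manual annotations simply replace the automated yes/no classification; both functions are hypothetical.

```python
def needs_manual_review(probs, threshold=0.75):
    """Ids whose predicted class has a probability below `threshold`.

    The probability of the predicted class is max(p, 1 - p), so values near
    0.5 (the most uncertain) fall below the threshold.
    """
    return sorted(k for k, p in probs.items() if max(p, 1.0 - p) < threshold)


def combine(probs, manual, cutoff=0.5):
    """Automated yes/no classification, overridden by any manual annotation."""
    final = {k: p >= cutoff for k, p in probs.items()}
    final.update(manual)
    return final
```

A population-level parameter such as the proportion of successful nesting attempts can then be computed from the combined classification, e.g. `sum(final.values()) / len(final)`.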
Estimating reproductive parameters with red kite tracking data from Switzerland
Since 2013, we have tracked 324 red kites using harness-attached,
solar-powered GPS tags from different manufacturers (Aebischer and
Scherler 2021, Orgeret et al. 2023, Scherler et al. 2023b); such tags
have been shown to cause no adverse effects on the survival and breeding
performance of birds (Peniche et al. 2011, Sergio et al. 2015, Longarini
et al. 2023).
We first extracted data for the core breeding season of red kites in
Switzerland, discarding all tracking locations south of 45°N or west of
4°E, and all locations before 11 March and after 24 June, which are the
earliest egg-laying and earliest fledging dates in our study population
(Aebischer and Scherler 2021, Scherler et al. 2023a). We then resampled
all tracking data to a 1-hour interval, matching the temporal resolution
generally available for most datasets in Europe (Maciorowski et al.
2019, García-Macía et al. 2022b, Literák et al. 2022). Because the
identification of reproductive behaviour requires regular data
throughout the breeding season, we discarded all individual seasons
(n = 336; 33%) that did not provide at least 800 locations (approximately
8 locations per day) in this time period, to overcome the limitations
experienced by van der Wal et al. (2015).
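The filtering steps for the Swiss data can be expressed as a short sketch. The spatial and date limits and the 800-location minimum are taken from the text; keeping the first fix within each clock hour is only one simple way of resampling to a 1-hour interval and is an assumption, as are the function names and the (timestamp, latitude, longitude) record layout.

```python
from datetime import date, datetime


def filter_season(fixes, year, min_lat=45.0, min_lon=4.0, start=(3, 11), end=(6, 24)):
    """Keep fixes north of min_lat, east of min_lon and between start and end (inclusive)."""
    lo, hi = date(year, *start), date(year, *end)
    return [(t, lat, lon) for t, lat, lon in fixes
            if lat >= min_lat and lon >= min_lon and lo <= t.date() <= hi]


def resample_hourly(fixes):
    """Keep the first fix within each clock hour (one simple resampling rule)."""
    kept, seen = [], set()
    for t, lat, lon in sorted(fixes):
        hour = t.replace(minute=0, second=0, microsecond=0)
        if hour not in seen:
            seen.add(hour)
            kept.append((t, lat, lon))
    return kept


def usable_season(fixes, year, min_fixes=800):
    """Return the filtered, hourly track, or None if it has too few locations."""
    hourly = resample_hourly(filter_season(fixes, year))
    return hourly if len(hourly) >= min_fixes else None
```

A complete hourly track over the 106 days from 11 March to 24 June yields 2544 locations, well above the minimum; a track with one fix every four hours (636 locations) would be discarded.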
For each individual and season, we used the functions described above to
extract all the explanatory variables from the filtered tracking data.
We used a 50 m radius around the most frequently visited day and night
locations to identify potential nests, and a 2000 m radius to quantify
the time spent outside the typical home range (Scherler et al. 2023a).
We divided the breeding season of red kites in Switzerland into five
distinct phases during which different movement behaviours could be
expected: the settlement phase, when birds arrive in their territory and
prepare for nesting (11 March – 15 April); the first incubation phase
(15 – 30 April); the second incubation phase (30 April – 15 May); and
the first (15 – 30 May) and second (1 – 24 June) chick-rearing phases
(Aebischer and Scherler 2021, Spatz et al. 2022). The period during
which we expected very frequent returns to the nest because small chicks
were present was specified as 1 May to 1 June (Scherler et al. 2023a).
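Assigning each date to a phase makes the phase definitions explicit. Because the stated ranges share boundary dates (e.g. 30 April) and leave 31 May unassigned, this sketch assumes that each phase runs from its start date until the day before the next phase starts, with the season ending on 24 June; `phase_of`, `in_small_chick_window` and the phase labels are hypothetical.

```python
from bisect import bisect_right
from datetime import date

# Phase start dates (month, day) for Switzerland; each phase lasts until the next starts.
PHASES = [((3, 11), "settlement"), ((4, 15), "incubation_1"), ((4, 30), "incubation_2"),
          ((5, 15), "chick_rearing_1"), ((6, 1), "chick_rearing_2")]
SEASON_END = (6, 24)
STARTS = [start for start, _ in PHASES]


def phase_of(d: date):
    """Name of the breeding phase containing date `d`, or None outside the season."""
    md = (d.month, d.day)
    if md < STARTS[0] or md > SEASON_END:
        return None
    return PHASES[bisect_right(STARTS, md) - 1][1]


def in_small_chick_window(d: date, start=(5, 1), end=(6, 1)) -> bool:
    """True if `d` falls in the period of expected frequent nest returns (inclusive)."""
    return start <= (d.month, d.day) <= end
```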
Testing the tool with red kite tracking data from Germany
To test whether the models developed for red kites in Switzerland could
accurately predict the reproductive behaviour of individuals from
different populations in Germany, we used data from 99 individuals
tracked between 2010 and 2023 in eastern Germany (Pfeiffer and Meyburg
2015, Nicolai et al. 2017, Pfeiffer and Meyburg 2022). We discarded all
tracking locations south of 50°N or west of 9°E, and adjusted the
temporal definitions of the breeding season to the local phenology as
follows: start of season 15 March, start of incubation 5 April, start of
hatching 15 May, end of brooding period 15 June, first fledging 10 July.
Because home ranges of red kites in Germany are larger than in
Switzerland (Aebischer and Scherler 2021, Spatz et al. 2022), we also
adjusted the ‘nest radius’ to 150 m and the ‘home range radius’ to
5000 m. We extracted all movement metrics based on these spatial and
temporal definitions, and predicted reproductive behaviour for red kites
in Germany using the models trained with data from red kites in
Switzerland. For breeding success, we also explored whether models
trained with data from Germany achieved higher accuracy.
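The population-specific settings can be collected in one configuration object, so that the same extraction code can be run on both data sets. The field names are hypothetical and are not NestTool's arguments; the values are those stated in the text for Switzerland and Germany, with the earliest fledging date used as the end of the analysed period. The intermediate phase dates are omitted for brevity.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class PopulationConfig:
    name: str
    season_start: tuple    # (month, day)
    first_fledging: tuple  # (month, day); end of the analysed period
    nest_radius_m: float
    home_radius_m: float
    min_lat: float         # locations south of this latitude are discarded
    min_lon: float         # locations west of this longitude are discarded

    def season_days(self, year: int = 2023) -> int:
        """Number of days in the analysed period, both end dates included."""
        return (date(year, *self.first_fledging) - date(year, *self.season_start)).days + 1


SWITZERLAND = PopulationConfig("Switzerland", (3, 11), (6, 24), 50, 2000, 45.0, 4.0)
GERMANY = PopulationConfig("Germany", (3, 15), (7, 10), 150, 5000, 50.0, 9.0)
```

The German season covers 118 days compared with 106 in Switzerland, and its nest radius is three times larger, so raw counts such as the number of locations or hours are not directly comparable between the two configurations.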