Methods

Overview of the approach

Previous approaches to predicting the nesting behaviour of birds from tracking data used either simple decision trees and thresholds (Picardi et al. 2020, Schreven et al. 2021, Ozsanlav-Harris et al. 2022) or advanced artificial intelligence algorithms for daily status prediction (Overton et al. 2022). We opted for a user-friendly approach that employs easily accessible machine learning algorithms (Pichler and Hartig 2023) to predict the occurrence of hierarchically nested breeding behaviour at a seasonal timescale, rather than breeding status at a shorter (daily or weekly) timescale (Picardi et al. 2020, Overton et al. 2022, Eisaguirre et al. 2023). In particular, we aimed to predict from GPS tracking data whether territorial ranging behaviour occurs (i.e. whether the bird occupies a territory or ‘home range’), whether a nesting attempt is made (breeding propensity), and whether the nesting attempt is successful.
We developed ‘NestTool’, a tool that first identifies a plausible nest location as the GPS location with the greatest amount of time spent and the greatest number of revisits by the focal individual within a user-specified radius (e.g. 50 m) and time period (breeding season adjusted for local phenology). Both the radius and the time period are set by the user so that NestTool can be adjusted to different populations or species and their respective movement behaviour and phenology. The plausible nest location is then used to calculate distances to all GPS locations and to summarize median daily distances from the nest, the number of revisits to the nest radius, the amount of time spent within the nest radius, and the amount of time spent outside another user-defined radius indicative of a typical home range (e.g. 1 km; Hötker et al. 2017, Scherler et al. 2023a). We repeat this procedure separately for all nocturnal and all diurnal locations and calculate the distance between the locations where most of the daytime and most of the nighttime is spent, based on the premise that nesting birds will likely roost on or very near their nest site, whereas non-nesting birds can often roost farther away from any diurnal congregation sites (Heiniger et al. 2020, Aebischer and Scherler 2021). We further calculate the home range area as the 95% and 99% minimum convex polygon and extract the last day of the season when the nest was visited. All movement metrics are calculated for distinct periods of the breeding season and combined with demographic information on individuals (age and sex) in three successive models. NestTool is available as an R package on GitHub: https://github.com/Vogelwarte/NestTool.
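The nest-identification step can be illustrated with a minimal Python sketch. It is not the NestTool implementation (which is an R package): the function name, the projected coordinates in metres, and the rule of ranking candidates by time spent first and number of visits second are all assumptions made for illustration.

```python
import math

def find_candidate_nest(fixes, radius_m=50.0):
    """Return the fix that best matches a nest site, plus its metrics.

    fixes: list of (time_h, x_m, y_m) tuples sorted by time, with
    coordinates in a projected system measured in metres (assumed).
    Every fix is tried as a candidate, so the cost is quadratic.
    """
    best = None
    for _, cx, cy in fixes:
        time_inside = 0.0    # hours accumulated within the radius
        visits = 0           # number of separate entries into the radius
        was_inside = False
        for i, (t, x, y) in enumerate(fixes):
            inside = math.hypot(x - cx, y - cy) <= radius_m
            if inside and not was_inside:
                visits += 1
            if inside and i + 1 < len(fixes):
                # credit the interval until the next fix to this candidate
                time_inside += fixes[i + 1][0] - t
            was_inside = inside
        score = (time_inside, visits)    # assumed ranking rule
        if best is None or score > best[0]:
            best = (score, (cx, cy))
    (time_inside, visits), location = best
    return {"location": location, "time_h": time_inside, "visits": visits}

# Example: hourly fixes (hour, x, y) that cluster around the origin
track = [(0, 0, 0), (1, 10, 0), (2, 500, 0), (3, 5, 5),
         (4, 0, 0), (5, 900, 0), (6, 0, 10)]
nest = find_candidate_nest(track)
```

In this toy track the bird leaves the 50 m radius twice and returns each time, so the candidate at the origin accumulates three visits and four hours of residence.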

Model description and specification

Each of NestTool’s three models aims to predict a binary outcome (occurrence of home range, nest, and success – yes or no) using a random forest algorithm, which is a powerful machine-learning algorithm that can extract useful information from many predictor variables (Breiman 2001, Prasad et al. 2006, Cutler et al. 2007, Pichler and Hartig 2023). Because of the large number of potential movement metrics that may distinguish between breeding behaviours and outcomes, and the fact that many of these variables may be correlated, a random forest is a suitable algorithm that has been used to distinguish other movement behaviours (Thiebault et al. 2018, Carneiro et al. 2022). A random forest is based on the aggregation of hundreds of individual classification trees, the base learner used to infer behavioural patterns in previously published approaches (Shamoun-Baranes et al. 2012, Picardi et al. 2020), and conceptually similar to the stochastic gradient boosting employed by approaches predicting daily behavioural states (Overton et al. 2022).
Like other machine learning models, random forests can be tuned to improve predictive accuracy by adjusting parameters that control how the model distinguishes between outcomes. Because the ideal value of these parameters to achieve the most accurate prediction may vary depending on the composition of the training data set, our tool iterates over two important parameters, namely the number of individual classification trees (from 500 to 5000) and the number of different variables tried at each node (from 1 to 15). For each combination of parameters, a random forest model is fitted using the R package ‘ranger’ (Wright and Ziegler 2017), and the prediction error of internally cross-validated (out-of-bag) test data is extracted from each fitted model. The parameters resulting in the lowest prediction error are then used to fit the main model and extract the most important predictor variables using a permutation procedure (Strobl et al. 2007, Grömping 2009, Janitza et al. 2013). The available data are randomly divided and only 70% of the data are used to train the main model, while 30% are retained to test the accuracy of predictions of this model. We report the accuracy of predictions referring to the training and test data separately.
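The tuning and validation logic described above can be sketched in Python as follows. The sketch shows only the control flow: `fit_oob_error` is a hypothetical placeholder for fitting a random forest and returning its out-of-bag error (done with ‘ranger’ in the actual tool), and the grid step of 500 trees is an assumption, because the text gives only the ranges.

```python
import itertools
import random

def split_train_test(records, train_fraction=0.7, seed=1):
    """Randomly divide records into training and test subsets."""
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)
    cut = round(train_fraction * len(shuffled))
    return shuffled[:cut], shuffled[cut:]

def tune(train, fit_oob_error, n_trees_grid, mtry_grid):
    """Return the (n_trees, mtry) pair with the lowest out-of-bag error.

    fit_oob_error is a hypothetical callable that fits a random forest
    on `train` with the given settings and returns its out-of-bag error.
    """
    best_params, best_error = None, float("inf")
    for n_trees, mtry in itertools.product(n_trees_grid, mtry_grid):
        error = fit_oob_error(train, n_trees=n_trees, mtry=mtry)
        if error < best_error:
            best_params, best_error = (n_trees, mtry), error
    return best_params, best_error

# Example with stand-in data and an invented error surface
records = list(range(100))             # stand-ins for individual seasons
train, test = split_train_test(records)

def toy_error(train, n_trees, mtry):
    # invented for illustration: minimum at 2000 trees and mtry = 4
    return abs(n_trees - 2000) / 10000 + abs(mtry - 4) / 100

best_params, best_error = tune(
    train, toy_error, range(500, 5001, 500), range(1, 16))
```

The selected parameters would then be used to fit the main model on `train`, with accuracy reported separately for `train` and `test`.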

Description of the workflow of analysis

The analysis requires two input files which contain the raw tracking data and life history information per individual and season, respectively. We first extract movement metrics for each individual and season, which are calculated based on various user-specified input values to curtail the data to the spatial and temporal domain of interest, and to specify certain phases of the breeding season. The different phases allow for movement behaviour of breeding birds to change between incubation and chick rearing (Pfeiffer and Meyburg 2015, Aebischer and Scherler 2021, López-López et al. 2021, Spatz et al. 2022), and the input values (dates) are therefore deliberately required to accommodate differences among populations and species.
Similarly, the user must specify the spatial scale over which recursions to a nest location and presence and absence times are evaluated. These parameters permit different scales for different species or populations (Aebischer and Scherler 2021, Spatz et al. 2022) and accommodate differences in the locational precision of the tracking data. However, the absolute value of the movement metrics is not important for the predictive accuracy of the model, because the tool relies only on the difference between birds that are nesting (successfully) and those that are not nesting (or not successfully).
For each individual season a total of 42 movement metrics that reflect home range areas, distances, residence times and revisitation patterns of the respective individual over the user-specified phases and radii of the breeding season are calculated (Table S1). Because the absolute distances moved by breeding individuals can vary substantially among red kite populations (Aebischer and Scherler 2021, García-Macía et al. 2022b, Spatz et al. 2022), all movement metrics are scaled to the maximum ranges and distances of each individual in each season, and thus are comparable between individuals and populations on a relative scale. This scaling is achieved by dividing the distances and areas calculated for each individual, year and brood phase by the overall distances and areas for that individual and year, calculated with all data over the entire breeding period.
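As an illustration of this scaling, the following Python sketch divides each phase-level metric by the corresponding whole-season value for the same individual and year. The metric names, the nested dictionary layout, and the handling of a zero denominator are assumptions made for the example, not details taken from NestTool.

```python
def scale_metrics(phase_values, season_values):
    """Divide each phase-level metric by its whole-season counterpart.

    phase_values: {phase: {metric: value}} for one individual and year
    season_values: {metric: value} computed over the entire breeding period
    """
    scaled = {}
    for phase, metrics in phase_values.items():
        scaled[phase] = {}
        for name, value in metrics.items():
            overall = season_values[name]
            # a zero denominator leaves the ratio undefined (assumed handling)
            scaled[phase][name] = value / overall if overall else None
    return scaled

# Example for one individual season (values invented for illustration)
phase_values = {
    "incubation": {"median_dist_m": 150.0, "mcp95_km2": 0.5},
    "chick_rearing": {"median_dist_m": 600.0, "mcp95_km2": 2.0},
}
season_values = {"median_dist_m": 1200.0, "mcp95_km2": 4.0}
relative = scale_metrics(phase_values, season_values)
```

Because every value is expressed as a fraction of the same individual's seasonal total, a wide-ranging bird and a sedentary bird with the same relative pattern receive the same scaled metrics.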
Once the movement metrics have been calculated, there are three sequential steps to identify and predict breeding behaviours, namely (1) whether individuals have a home range (or a ‘territory’), which is typically shown by frequent returns to and long residency times at a single location; (2) whether birds initiate a nesting attempt, which results in similar movements to having a territory, but with even more stringent attachment to a single location and very short distances between the locations most used during day and night; and (3) whether the nesting attempt was actually successful, which results in frequent returns to the nest throughout the entire breeding season and regular presence for long periods of time, especially during the early chick-rearing period (Picardi et al. 2020). NestTool provides two functions for each of those steps: one to train a model using data where the actual behaviour of the tracked birds was observed in the field and is therefore known, and a second function to use that model to predict the behaviour for tracked birds (which can include birds for which the actual behaviour was not known). If no training data with known behaviours exist for a population, the models contained in the tool can be used to predict breeding behaviours for any data set that meets the minimum data requirements – but we caution that these predictions rely explicitly on the assumption that the patterns observed in red kites from Switzerland are transferrable to the population of interest.
The steps above can classify recruitment, breeding propensity and success based on movement metrics, but there will always be individual movement behaviours that have very high uncertainty in classification. NestTool, therefore, provides not only the functions above to predict probabilities of each reproductive behaviour, but also a graphical user interface (based on R shiny) to visually inspect an individual’s movements over a season to manually determine behaviours when the predicted probabilities fall below a user-specified threshold. To aid in this manual annotation, the user is presented with a map of the tracking data locations, and graphs that show the development of five movement metrics over time (Fig. S1). The manual annotation of breeding propensity and success is then saved and can be combined with the automated classification to extract important demographic parameters for the tracked population solely from the tracking data.
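One simple way to select individual seasons for manual inspection is sketched below in Python. The sketch assumes that the certainty of a binary prediction is measured as the larger of p and 1 − p and that seasons below the threshold are flagged; the function name, identifiers and default threshold are hypothetical, and NestTool may define the criterion differently.

```python
def needs_manual_review(predictions, threshold=0.75):
    """Return ids whose predicted class is less certain than the threshold.

    predictions: {individual_season_id: predicted probability of the behaviour}
    Certainty is taken as max(p, 1 - p), which is an assumption here.
    """
    return [key for key, p in predictions.items()
            if max(p, 1.0 - p) < threshold]

# Example with invented probabilities of a nesting attempt
predicted = {
    "kite_01_2020": 0.97,   # confident 'yes'
    "kite_02_2020": 0.55,   # uncertain
    "kite_03_2021": 0.10,   # confident 'no'
    "kite_04_2021": 0.70,   # uncertain
}
to_inspect = needs_manual_review(predicted)
```

Only the two seasons with probabilities near 0.5 would be passed on for visual inspection; the confident predictions are accepted as they are.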

Estimating reproductive parameters with red kite tracking data from Switzerland

Since 2013, we have tracked 324 red kites using harness-attached solar-powered GPS tags from different manufacturers (Aebischer and Scherler 2021, Orgeret et al. 2023, Scherler et al. 2023b), which have been shown to cause no adverse effects on the survival and breeding performance of birds (Peniche et al. 2011, Sergio et al. 2015, Longarini et al. 2023).
We first extracted data for the core breeding season of red kites in Switzerland and discarded all tracking locations south of 45°N and west of 4°E, and all locations recorded before 11 March and after 24 June, which are the earliest egg-laying and earliest fledging dates in our study population (Aebischer and Scherler 2021, Scherler et al. 2023a). We then resampled all tracking data to a 1-hourly interval, matching the temporal resolution that is available for most datasets in Europe (Maciorowski et al. 2019, García-Macía et al. 2022b, Literák et al. 2022). Because the identification of reproductive behaviour requires regular data throughout the breeding season, we discarded all individual seasons (n = 336; 33%) that did not provide at least 800 locations (approximately 8 locations per day) in the time period specified above (to overcome the limitations experienced by van der Wal et al. 2015).
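The filtering steps can be sketched in Python as follows. The sketch makes three assumptions that are not stated in the text: a location is dropped if it lies either south of the latitude limit or west of the longitude limit, resampling keeps the first fix in each clock hour, and the function and argument names are hypothetical.

```python
from datetime import datetime, timedelta

def filter_season(fixes, year, min_lat=45.0, min_lon=4.0, min_fixes=800):
    """Filter one individual's fixes for one season; None if too sparse.

    fixes: list of (datetime, lat, lon) tuples sorted by time.
    """
    start = datetime(year, 3, 11)
    end = datetime(year, 6, 25)    # exclusive bound, keeps all of 24 June
    kept, last_hour = [], None
    for when, lat, lon in fixes:
        if lat < min_lat or lon < min_lon:
            continue               # outside the spatial domain (assumed rule)
        if not (start <= when < end):
            continue               # outside the core breeding season
        hour = when.replace(minute=0, second=0, microsecond=0)
        if hour == last_hour:
            continue               # keep only the first fix per clock hour
        kept.append((when, lat, lon))
        last_hour = hour
    return kept if len(kept) >= min_fixes else None

# Example: a synthetic track with one fix every 30 minutes, 1 March to 30 June
track = [(datetime(2020, 3, 1) + timedelta(minutes=30 * i), 47.0, 7.0)
         for i in range(2 * 24 * 122)]
season = filter_season(track, 2020)
sparse = filter_season(track[::48], 2020)   # one fix per day only
```

The full synthetic track yields one location per hour over the 106-day season, whereas the one-fix-per-day version falls below the minimum of 800 locations and is discarded.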
For each individual and season, we used the functions described above to extract all the explanatory variables from the filtered tracking data. We used a 50 m radius around the most frequently visited day and night locations to identify potential nests, and a 2000 m radius to quantify the time spent outside the typical home range (Scherler et al. 2023a). We separated the breeding season of red kites in Switzerland into five distinct phases during which different movement behaviours could be expected: the settlement phase, when birds arrive in their territory and prepare for nesting (11 March – 15 April); the first incubation phase (15 – 30 April); the second incubation phase (30 April – 15 May); and the first (15 – 30 May) and second (1 – 24 June) chick-rearing phases (Aebischer and Scherler 2021, Spatz et al. 2022). The phase when we expected very frequent returns to the nest during the presence of small chicks was specified as 1 May to 1 June (Scherler et al. 2023a).

Testing the tool with red kite tracking data from Germany

To test whether the models developed for red kites in Switzerland could accurately predict the reproductive behaviour of individuals of different populations in Germany, we used data from 99 individuals tracked between 2010 and 2023 in eastern Germany (Pfeiffer and Meyburg 2015, Nicolai et al. 2017, Pfeiffer and Meyburg 2022). We discarded all tracking locations south of 50°N and west of 9°E, and adjusted the temporal definitions of the breeding season to the local phenology as follows: start of season 15 March, start of incubation 5 April, start of hatching 15 May, end of brooding period 15 June, first fledging 10 July. Because home ranges of red kites in Germany are larger than in Switzerland (Aebischer and Scherler 2021, Spatz et al. 2022), we also adjusted the definitions of the ‘nest radius’ to 150 m and of the ‘home range radius’ to 5000 m. We extracted all movement metrics based on those spatial and temporal definitions, and predicted reproductive behaviour for red kites in Germany using the models trained with data from red kites in Switzerland. For breeding success, we also explored whether training models with data from Germany resulted in improved accuracy.
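The population-specific settings reported in this section and the previous one can be collected side by side, as in the Python sketch below. The class, field and key names are hypothetical and do not correspond to NestTool arguments; only the numerical values are taken from the text.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class PopulationSettings:
    min_lat: float             # locations south of this latitude are discarded
    min_lon: float             # locations west of this longitude are discarded
    nest_radius_m: int         # radius used to identify potential nests
    home_range_radius_m: int   # radius indicating the typical home range

SWITZERLAND = PopulationSettings(45.0, 4.0, 50, 2000)
GERMANY = PopulationSettings(50.0, 9.0, 150, 5000)

# Phenology dates used for Germany, as (month, day)
GERMANY_PHENOLOGY = {
    "start_of_season": (3, 15),
    "start_of_incubation": (4, 5),
    "start_of_hatching": (5, 15),
    "end_of_brooding": (6, 15),
    "first_fledging": (7, 10),
}
```

Because the movement metrics are scaled within each individual season, only these spatial and temporal definitions need to change when the models trained in Switzerland are applied to another population.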