Parameter estimation – Bayesian sampling
MSOM was implemented using Bayesian statistics in the JAGS language (Plummer & others, 2003). Bayesian models take prior probabilities (beliefs; priors) of each parameter (e.g. occupancy of a species in a patch) and use the observed data to update these probabilities – posterior probabilities, thereby generating a distribution of probabilities for each parameter of interest. This differs from traditional maximum likelihood estimators, which provide a single best-fit estimate. Null models/hypotheses and p-values do not exist in Bayesian statistics.
To define a threshold for ”significance”, we calculated the Highest Density Interval (HDI) for each model parameter (e.g. the parameter defining the effect of habitat amount on species occupancy). HDIs were calculated as the interval where 95% of parameter posterior probability lies. We also defined a Region of Practical Equivalence (ROPE), ranging from -0.1 to 0.1, to determine the parameter space where an effect would be weak/equivalent to zero from a biological perspective. All comparisons of HDI and ROPE were performed using the coefficients obtained with the standardized predictor variables. Based on HDI and ROPE, three possible outcomes were defined for the effect of each predictor variable (see Fig. S2):
1) No effect, when HDI is completely within the Region of Practical Equivalence;
2) Lack of support, when HDI partially overlaps the Region of Practical Equivalence;
3) Evidence of effect, when HDI lies completely outside the Region of Practical Equivalence.
In our data, the [-0.1; 0.1] limits of ROPE indicate that the area effect was considered “significant” only if species occupancy increases or decreases at least 0.00268 with 1-ha of additional patch area at the steepest part of the Species-Area Relationship (assuming S = cAz). Similarly, an absence of practical effect occurs when the effect is weaker than this 95%-probability threshold. Regarding diversity, these intervals would represent an increase or decrease in 16% in species richness comparing the smallest (2.4 ha) to the largest patch ( 144,805 ha). This interval is conservative considering most of the conservation implications of the results, and therefore ”significant” effects represent strong associations with a high degree of certainty. We also include in results and discussion the exact number of species estimated to be gained or lost with the change in patch area or cattle intrusion, and provide the exact estimate whenever possible.
To estimate parameters, we calculated the posterior probability of parameters as:
In this equation, y represents the observed data (counts of individual species at each individual Winkler extractor), L(y | Ψ, p) represents the likelihood function, and P(Ψ) and P(p) denote the prior distributions for occupancy and detection rates. The likelihood function is defined as:
where N represents the number of patches surveyed, m is the number of Winkler extractors per patch (m = 10 for all patches), and P(z|Ψ) is the probability of the latent variable z that defines the true occurrence of a species per patch:
The joint posterior probability captures the simultaneous estimation of both occupancy and detection rates based on the observed data. The species-specific occupancy probability (Ψ) and detection probability (p) for species s at a given fragment i depend on the effect of environmental covariates:
The priors for the species-specific coefficients associated with the occupancy probability (Ψ) and detection probability (p) are assumed to follow Gaussian distributions with overall means and standard deviations:
These priors reflect the expected values and variability of occupancy and detection rates across species (random effects). On the other hand, the hyperpriors (fixed effects) capture the overall patterns across species by specifying the prior distributions for the overall mean of the occupancy probability. The hyperpriors provide a way to incorporate prior knowledge or assumptions about the underlying patterns in occupancy and detection across the entire species assemblage. All model priors were defined as non-informative/flat priors (no prior expectation on model parameters, totally informed by the data):
Posterior probabilities were calculated using three Monte Carlo Markov Chains (MCMC) with 50,000 steps each. The chains were initiated at different parameter values and chain convergence was checked by visual inspection and using the Gelman-Rubin statistic (Gelman & Rubin, 1992; convergence in parameters with R-hat lower than1.05). To remove the effect of initial parameters on the final estimates, a burn-in phase of 5,000 steps was run in each chain. The three chains were run in parallel (simultaneously in three computer cores) with different random number seeds.
All predictor variables were standardized (mean = 0, sd = 1) before analyses. Patch area was log-transformed before standardization and analyses. Untransformed measures of habitat amount (forest cover within a buffer) were more strongly correlated with the observed species richness, so these variables were not log-transformed (but log transformation provided almost identical results; results not shown). Because patch size, forest cover, and cattle presence were highly correlated with each other (r > 0.7), we present results using area and cattle in the same model and also each of these variables separately. Using non-standardized predictor variables produced qualitatively the same results (Fig. S3 in Supporting Information). We present the results using standardized predictors in the main text to allow for direct comparisons of the magnitude of effect directly based on the coefficients.
Effects on individual species
On the basis of MSOMs, we obtained detection, occupancy, and the effect of predictor variables on detection and occupancy for each species. We modeled individual species responses to each predictor as a random effect, centered around a community-wise mean. To illustrate the consistency or variability of these responses across species, we present the standard deviation of the random effects around the parameter estimates in the results. This allows us to showcase how species’ responses may be consistent or variable in relation to the predictor variables. In all models, species responses to habitat loss and other predictors was highly variable (see Results). To estimate how species traits are associated with their detection, occupancy, and responses to covariates (i.e. why some species are more common and/or more affected by habitat loss than others), we ran separate regressions and analyses of variance (ANOVAs) associating species body mass and foraging strategy (predictor variables) with species detection, occupancy, and their responses to environmental predictors. Here, individual species, rather than forest patches, represent units of analysis.
Regressions and ANOVAs were run using Bayesian statistics using a similar setup to the occupancy models described above. Species traits were not directly incorporated into MSOMs to facilitate model convergence.
Alpha, beta, and gamma diversities
In each patch, species richness was estimated as the sum of individual species occupancies obtained from MSOMs (alpha-diversity; sum of rows in the estimated species x patch matrix). Regional species richness (gamma-diversity) was calculated as the sum of all species occupancies regionally (number of species with at least one occurrence in the estimated species x patch matrix). We measured beta-diversity for each pair of patches using the turnover component of the pairwise Jaccard similarity index (Baselga 2010), which was calculated using the bias-corrected occupancy matrix obtained from MSOM (i.e. presences corrected for imperfect detection). The Jaccard similarity index is measured as the percentage of shared species between a pair of patches, which is likely to be grossly underestimated using raw occurrence data because many shared species are rare and go undetected in at least one of the sites in a pairwise comparison (Chao et al. , 2005; Chiuet al. , 2014). Moreover, the classic Jaccard index results from differences in species richness (nestedness) and true species turnover between any two patches (Baselga, 2010). The turnover component represents species replacements (beta-diversity) and was the only pairwise metric used (but see Supporting Figures for further details and comparisons with classic metrics).
To estimate the effect of predictor variables on alpha-diversity, we correlated species richness with the predictor variables. These correlation coefficients were used to measure the strength of the association, and the posterior probabilities of the correlation coefficient were used to assess the “significance” of this association.
To estimate the effect of predictor variables on pairwise beta-diversity, we created pairwise environmental distance matrices to represent the predictors of species turnover. These matrices were created by calculating pairwise differences for each predictor variable across all patches (e.g. two cattle-trampled patches have a difference of zero for cattle intrusion). These pairwise differences were then correlated with the pairwise turnover of the Jaccard similarity index using a multiple regression of distance matrices (similarly to Mantel tests; Tuomisto & Ruokolainen, 2006). Similarity in species composition is usually strongly associated with geographic distance (Nekola & White, 1999), thus distances separating patches were included as a predictor of the Jaccard index in all beta-diversity models along with differences in patch area, cattle intrusion, connectivity, and habitat amount. Because our study was conducted on a single landscape, gamma diversity across the entire landscape was represented as a single value, which cannot be directly used as a response variable in statistical models.
Although the association between alpha- or beta-diversity and predictor variables informs the degree to which these variables affect regional/landscape diversity, they fail to reveal the relative contribution of each component to regional diversity or how these contributions change with the amount of habitat across the landscape or the size of individual patches. Therefore, to determine if the effect of fragmentation on beta-diversity can compensate for losses in alpha-diversity within patches one should investigate the combined contribution of alpha and beta-components (Socolar et al. , 2016).
To understand the combined contribution of alpha- and beta-diversity to regional/gamma diversity with increasing spatial scales (scaling relationship), we ordered each patch from smallest to largest. We then sequentially grouped subsequent patches and calculated the maximum and average patch diversity (alpha) and overall/landscape diversity (gamma). Gamma diversity increases when either alpha-diversity increases or species composition diverges between patches (beta-diversity). Beta-diversity can be calculated using the additive (gamma = alpha + beta) or multiplicative (gamma = alpha × beta) partitioning of gamma diversity (Jost 2007). We used additive diversity partitioning to estimate the absolute number of species added by beta-diversity to the regional diversity with increasing habitat amount across the landscape, and multiplicative partitioning to estimate therelative contribution of beta-diversity to regional diversity. Species diversities (alpha, beta, and gamma) were calculated for each of the MCMC runs of MSOM, generating a distribution of values and density intervals for these diversity components and their scaling relationships (posterior probabilities).
The sequential grouping of patches allowed us to create cumulative areas and estimate diversities at each grouping of patches. By doing so, we were able to examine how alpha and beta-diversities changed from a single small forest fragment to the entire landscape, capturing the effects of habitat expansion and increasing spatial scale.
Species-Area Relationship (SAR) and landscape configuration
We did not only estimate the contribution of species turnover to gamma diversity but also compared the relative contribution of smaller patches to regional diversity based on the Species-Area Relationship (SAR). To do this, we used the model proposed by Arnillas et al. (2017) to compare the observed SAR with the expectation based on a unitary community, assuming free species dispersal and no spatial segregation. This allowed us to assess how the presence of smaller patches influenced regional diversity compared to the theoretical expectation under idealized dispersal and spatial conditions. This comparison is summarized by R, the relative difference between the observed and expected SAR:
In this equation, forest parches are sorted from largest to smallest; pi represents the probability of a patch having a species that is absent from larger patches; c is a constant; z is the species-area exponent; and Ai represents patch area. The numerator represents the number of species in a region with the observed SAR, and the denominator represents the expected number of species based on a uniform distribution of species across the landscape. c and z were estimated using a linear regression between the species richness (log S) and patch area (log A).
The R values range from R<1 when communities are nested (smaller patches have a subset of species from continuous forests), R = 1 when small and large communities are equivalent (small areas can still add species due to area effects), to R>1 when smaller patches have a distinct set of species and add more species than would be expected by chance based on landscape area alone (Arnillas et al., 2017).
All analyses were conducted in R (R Core Team, 2019) and JAGS (Plummer, 2019) programs using the rjags (Plummer, 2019), runjags(Denwood, 2016), reshape2 (Wickham, 2007), raster (Hijmanset al. , 2015), and vegan (Oksanen et al. , 2018) packages. All statistical models created in these analyses are provided with a tutorial on how to run MSOM in R (see Supporting Information).