V
\(\mathbf{Y}=\mathbf{\text{\ μ}}+\mathbf{\text{U\ }}\mathbf{V}^{\mathbf{t}}\mathbf{\sigma}\)(Eq. 1)
where the loading matrix U has dimensions equal to ij x k, or the number of measurements across all sites and indices by the number of principal components, k . The score matrix V has dimensions of t x k where t is the number of time steps. All avian diversity indices were first centered and normalized independently for each site and metric by calculating the long-term mean, µ, and standard deviation, σ, for each column of the original Ymatrix.
PC scores describe the temporal expression of each PC, centered around the long-term mean, µ. In the context of our analysis, PC scores capture the dominant seasonal pattern of avian diversity. For example, a transition of PC scores from strongly negative at the beginning of the year to strongly positive in the summer and strongly negative again towards the end of the year captures breeding-winter season variation in avian diversity. PC loadings indicate how strongly, positively or negatively, the temporal pattern given by PC scores is expressed at a given location. Strong positive loadings mean that the average temporal pattern given by PC scores is expressed strongly in that region, strong negative loadings indicate the temporal pattern given by PC scores is expressed strongly in the opposite direction, and loadings near zero indicate that the temporal pattern given by PC scores is barely expressed, producing values of biodiversity near the mean. Loadings maps can be produced for all k PCs, with later PCs often capturing increasingly random spatial variation. Note that the true temporal pattern of avian diversity will always be a combination of the principal modes, but the stronger loadings on a given PC the stronger the effect of that particular Principal Component on avian diversity (see below); PC loadings can thus be thought of as weights indicating the contribution of each PC to the true temporal pattern.
Once we calculated the PC scores and loadings and selected the number of PCs to consider, we used subset versions of the loading and score matrices, U and V, to show the effect of each PC as well as the cumulative effect of all considered PCs together on avian diversity over time, at selected sites. To accomplish this, we multiplied U by the transpose of the score matrix,Vt , to produce a matrix in ‘normalized space’. Because we originally chose to normalize the data,UVt must be multiplied by the standard deviation, σ, and added to the mean, µ, to obtain reconstructions of the original avian diversity metrics (Eq. 2). For a single site andk th PC, this becomes:
\(\text{Div}_{i,j,k}[t]=\ \mu_{i,j}+\ \sigma_{i,j}\times\left[U_{i,j,k}\times V_{k}[t]\right]\)(Eq. 2)
where Ui,j,k is a single loading value for PCk and a given site/avian diversity metric; Vk[t] is a score vector for PCk which changes over time. The long-term mean and standard deviation of an avian diversity metric j at location i are given by µi,jad σi,j, respectively. PC loadings and scores therefore work together to reconstruct the original avian diversity at each site and time step by calculating the number of standard deviations from the long-term mean. Lastly, we compared the reconstructed values of avian diversity metrics with true avian diversity time series for select locations.
We conducted Principal Component Analysis using function ‘prcomp’ from a package ‘stats’ in R.
Spatial congruence in seasonality of avian taxonomic and functional diversity. To assess agreement in spatiotemporal patterns of all avian diversity metrics, we first evaluated correlations among loadings for the first three PCs (which together accounted for ca. 65% of variance in the data; Fig. S1) for each pair-wise association of avian diversity metrics (SR, FRic, cFRic, FEve, FDis). To better synthesize findings from the three PCs and identify regions characterized by similar temporal patterns of avian diversity, we conducted a clustering procedure using the k-means clustering algorithm. The k-means algorithm partitions observations into k clusters in which each observation belongs to the cluster with the nearest mean in q-dimensional space, where the q axes represent the number of measurements. For this example, q = 15 because clustering was based on loading values from the five diversity metrics and three principal components. We first performed a k-means clustering procedure on a training dataset (a subset of 20,000 locations) and used a goodness-of-fit metric (maximum silhouette width) to select the most appropriate number of clusters (Fig. S2). We then used function ‘kcca’ from package ‘flexclust’ to partition observations into clusters with the closest k-centroid. Repeated tests with random samples produced stable cluster estimates, increasing confidence in the use of a training subset rather than the full dataset. All code will be available on GitHub and archived with zenodo (doi: to be provided upon acceptance of the manuscript).