V
\(\mathbf{Y}=\mathbf{\text{\ μ}}+\mathbf{\text{U\ }}\mathbf{V}^{\mathbf{t}}\mathbf{\sigma}\)(Eq. 1)
where the loading matrix U has dimensions equal to ij x k, or
the number of measurements across all sites and indices by the number of
principal components, k . The score matrix V has
dimensions of t x k where t is the number of time steps. All
avian diversity indices were first centered and normalized independently
for each site and metric by calculating the long-term mean, µ, and
standard deviation, σ, for each column of the original Ymatrix.
PC scores describe the temporal expression of each PC, centered around
the long-term mean, µ. In the context of our analysis, PC scores capture
the dominant seasonal pattern of avian diversity. For example, a
transition of PC scores from strongly negative at the beginning of the
year to strongly positive in the summer and strongly negative again
towards the end of the year captures breeding-winter season variation in
avian diversity. PC loadings indicate how strongly, positively or
negatively, the temporal pattern given by PC scores is expressed at a
given location. Strong positive loadings mean that the average temporal
pattern given by PC scores is expressed strongly in that region, strong
negative loadings indicate the temporal pattern given by PC scores is
expressed strongly in the opposite direction, and loadings near zero
indicate that the temporal pattern given by PC scores is barely
expressed, producing values of biodiversity near the mean. Loadings maps
can be produced for all k PCs, with later PCs often capturing
increasingly random spatial variation. Note that the true temporal
pattern of avian diversity will always be a combination of the principal
modes, but the stronger loadings on a given PC the stronger the effect
of that particular Principal Component on avian diversity (see below);
PC loadings can thus be thought of as weights indicating the
contribution of each PC to the true temporal pattern.
Once we calculated the PC scores and loadings and selected the number of
PCs to consider, we used subset versions of the loading and score
matrices, U and V, to show the effect of each PC as
well as the cumulative effect of all considered PCs together on avian
diversity over time, at selected sites. To accomplish this, we
multiplied U by the transpose of the score matrix,Vt , to produce a matrix in ‘normalized
space’. Because we originally chose to normalize the data,UVt must be multiplied by the standard
deviation, σ, and added to the mean, µ, to obtain reconstructions of the
original avian diversity metrics (Eq. 2). For a single site andk th PC, this becomes:
\(\text{Div}_{i,j,k}[t]=\ \mu_{i,j}+\ \sigma_{i,j}\times\left[U_{i,j,k}\times V_{k}[t]\right]\)(Eq. 2)
where Ui,j,k is a single loading value for
PCk and a given site/avian diversity metric;
Vk[t] is a score vector for PCk which changes over
time. The long-term mean and standard deviation of an avian diversity
metric j at location i are given by µi,jad σi,j, respectively. PC loadings and scores therefore
work together to reconstruct the original avian diversity at each site
and time step by calculating the number of standard deviations from the
long-term mean. Lastly, we compared the reconstructed values of avian
diversity metrics with true avian diversity time series for select
locations.
We conducted Principal Component Analysis using function ‘prcomp’ from a
package ‘stats’ in R.
Spatial congruence in seasonality of avian taxonomic and
functional diversity. To assess agreement in spatiotemporal patterns of
all avian diversity metrics, we first evaluated correlations among
loadings for the first three PCs (which together accounted for ca. 65%
of variance in the data; Fig. S1) for each pair-wise association of
avian diversity metrics (SR, FRic, cFRic, FEve, FDis). To better
synthesize findings from the three PCs and identify regions
characterized by similar temporal patterns of avian diversity, we
conducted a clustering procedure using the k-means clustering algorithm.
The k-means algorithm partitions observations into k clusters in
which each observation belongs to the cluster with the nearest mean in
q-dimensional space, where the q axes represent the number of
measurements. For this example, q = 15 because clustering was based on
loading values from the five diversity metrics and three principal
components. We first performed a k-means clustering procedure on a
training dataset (a subset of 20,000 locations) and used a
goodness-of-fit metric (maximum silhouette width) to select the most
appropriate number of clusters (Fig. S2). We then used function ‘kcca’
from package ‘flexclust’ to partition observations into clusters with
the closest k-centroid. Repeated tests with random samples produced
stable cluster estimates, increasing confidence in the use of a training
subset rather than the full dataset. All code will be available on
GitHub and archived with zenodo (doi: to be provided upon acceptance of
the manuscript).