Model transferability
To determine whether the model can appropriately be used to make predictions beyond the scope of the NEON dataset, we tested the performance of the combined model on novel locations and novel species using independent data from the Botanical Information and Ecology Network (BIEN) and a subset of TRY leaf trait datasets. These datasets include trait data on LMA, N% and C% for 62 species, including 27 species absent from the training data (Figure S.1, Supplement S1). The combined model showed good transferability to other data sources (mean R² = 0.54, 95% coverage = 91%, Figure S.6). The inclusion of full phylogenetic relationships in addition to environmental predictors yielded successful model transfer to species not sampled at NEON sites (mean R² = 0.4, 95% coverage = 94%, Figure 2). This was possible because phylogenetic relationships allow parameter estimates for unsampled species to borrow strength from closely related species included in the data (Evans et al., 2016). The model is therefore suitable for large-scale application.
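The two transferability metrics reported above can be illustrated with a minimal sketch. It assumes that R² is the predictive coefficient of determination (1 minus the ratio of residual to total sum of squares) and that 95% coverage is the fraction of held-out observations falling inside their 95% predictive interval; the text does not state either definition, and all function names and values below are hypothetical.

```python
from statistics import fmean

def r_squared(observed, predicted):
    """Predictive R^2 as 1 - SS_res / SS_tot (assumed definition)."""
    mean_obs = fmean(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot

def interval_coverage(observed, lower, upper):
    """Fraction of observations inside their [lower, upper] interval."""
    inside = sum(lo <= o <= hi for o, lo, hi in zip(observed, lower, upper))
    return inside / len(observed)

# Hypothetical held-out trait values, point predictions, and 95% interval bounds.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 2.0, 2.5, 4.0]
lower = [0.5, 1.0, 3.2, 3.0]   # third interval misses its observation
upper = [2.5, 3.0, 3.5, 5.0]

r2 = r_squared(observed, predicted)
coverage = interval_coverage(observed, lower, upper)
print(f"R^2 = {r2:.2f}, 95% coverage = {coverage:.0%}")
```

Under these definitions, a well-calibrated model has coverage close to 95%; values well below that would indicate intervals that are too narrow for the new data source.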
Predicting large-scale trait variation
To understand large-scale variation in traits using our combined model and to compare it to the phylogeny-only and environment-only approaches, we integrated each of the three models with tree species abundance and topographic data from ~30,000 Forest Inventory and Analysis (FIA) plots (~1.2 million trees) and climate data from DayMet (Figure S.7, S.8, S.9) to predict leaf traits across the eastern US. Predictions from the phylogeny-only model produced patterns similar to those of Swenson & Weiser (2010) (Figure S.10), suggesting that our approach to representing species-only methods (and any resulting deviations from them) is consistent with patterns previously reported in the literature.
Predictions from the combined model show broad-scale patterns associated with shifts in forest communities and large-scale climatic and topographic patterns across latitudinal and altitudinal gradients (Figure 3, S.11). In some cases, trait distributions shifted abruptly between neighboring ecoregions due to a combination of shifts in local environmental conditions and in community assembly (independent of the environment). Changes in species composition may explain trait patterns between the Mississippi Plains ecoregion and Southern Plains ecoregions (Figure S.12). The Mississippi Plains ecoregion is characterized by heavy disturbance from agricultural activities, and forests are often limited to riparian ecosystems favoring bottomland broadleaf species (e.g., Celtis laevigata, Fraxinus pennsylvanica, Salix nigra), in contrast to the needleleaf species (e.g., Juniperus virginiana, Pinus taeda, and Pinus echinata) more common in the neighboring ecoregions (Coastal plains mixed forests). Because broadleaf species are generally characterized by higher N% and lower LMA, this change in community assembly translates into predicted regional trait patterns. Other ecoregions instead show little change in species composition compared to neighboring ecoregions, suggesting that shifts in predicted patterns may be attributed to how environmental gradients affect traits directly. This seems to be the case for the mixed forests in the Appalachian region, where opposing patterns are exhibited at higher altitudes in the Blue Ridge ecoregion (lower N% and pigment concentrations; higher LMA and C%) compared to piedmont and valleys in the neighboring ecoregions (higher N% and pigments, lower LMA and C%) (Figure S.13).
Predictions from the combined model differed from those of the phylogeny- and environment-only models, suggesting that phylogeny and environmental drivers contain different information at large scales. Divergence from the combined model varied across ecoregions (Figure 4). These differences were complex, with regions exhibiting shifts of different magnitudes and directions from either phylogeny- or environment-only approaches (Figure 4, Figure S.14, Supplement 5). Overall, 80% of ecoregion-trait-model combinations exhibited significant differences in predicted traits between the combined and phylogeny- or environment-only models (p < 0.0001 in paired t-tests). Likewise, predictions from the phylogeny-only and environment-only models differed from each other (p < 0.0001 in paired t-tests) for 93% of ecoregion-trait combinations, which highlights the distinct effects of the environment and phylogeny on trait distributions and demonstrates the importance of a combined modeling approach for prediction and inference.
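The paired comparison described above can be sketched as follows for a single ecoregion-trait combination. This is an illustrative sketch only: it assumes the pairing is by plot (each plot contributes one prediction from each model), and the function name and values are hypothetical. Only the t statistic and degrees of freedom are computed, to keep the example dependency-free; a p-value would follow from the t distribution with those degrees of freedom (for example via `scipy.stats.ttest_rel`, which is not used here).

```python
from math import sqrt
from statistics import fmean, stdev

def paired_t_statistic(a, b):
    """Paired t statistic and degrees of freedom for equal-length samples.

    t = mean(d) / (sd(d) / sqrt(n)), where d are the pairwise differences.
    """
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    t = fmean(diffs) / (stdev(diffs) / sqrt(n))
    return t, n - 1

# Hypothetical plot-level predictions of one trait in one ecoregion.
combined_pred = [3.0, 5.0, 7.0]
phylogeny_only_pred = [2.0, 3.0, 4.0]

t_stat, dof = paired_t_statistic(combined_pred, phylogeny_only_pred)
print(f"t = {t_stat:.3f} with {dof} degrees of freedom")
```

Repeating this test for every ecoregion-trait-model combination and counting the fraction below the significance threshold would give a summary of the kind reported in the text.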
Patterns of divergence from the combined model indicate how phylogenetic and environmental effects vary biogeographically across the eastern US. Regions with no significant divergence may signal conditions where the environment affects traits by filtering for species better adapted to local conditions, so that traits are well estimated both by species' averages and directly from the environment (as in the case of the Mississippi Alluvial Plains region). For most ecoregions, the divergences of the phylogeny- and environment-only models move in opposite directions (i.e., positive divergence for one and negative divergence for the other). Significant divergence for the phylogeny-only model indicates that continental-scale species averages fail to correctly represent trait values at finer scales, because local environmental conditions and/or competition shift the trait values away from the species mean.
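As a small illustration of divergences moving in opposite directions, the sketch below computes the mean difference between the combined model and each reduced model for one ecoregion. The sign convention (combined minus reduced) is an assumption made for this example, not one stated in the text, and the function name and N% values are hypothetical.

```python
from statistics import fmean

def mean_divergence(combined, reduced):
    """Mean of (combined - reduced) across plots; sign convention is assumed."""
    return fmean([c - r for c, r in zip(combined, reduced)])

# Hypothetical plot-level N% predictions for a single ecoregion.
combined = [2.0, 2.2, 2.4]
phylogeny_only = [1.8, 2.0, 2.1]
environment_only = [2.3, 2.4, 2.6]

div_phylo = mean_divergence(combined, phylogeny_only)    # positive here
div_env = mean_divergence(combined, environment_only)    # negative here

# Opposite signs mean the two reduced models err in opposite directions
# relative to the combined model.
opposite = div_phylo * div_env < 0
print(div_phylo > 0, div_env < 0, opposite)
```

With this convention, a positive value for the phylogeny-only comparison means the combined model predicts a higher trait value than species averages alone would.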
Areas where the combined model predicts higher N% than the phylogeny-only model (blue shading in Fig. 4a) suggest the presence of environmental effects that increase N% above species means. Conversely, orange-shaded areas in Fig. 4a indicate environmental effects that lower N% below species means. These patterns may reflect regulatory mechanisms controlled by the environment, where leaves adjust allocation to proteins, pigments, or structural compounds to balance photosynthetic capacity, toughness, and chemical defense within ranges constrained by species life history (Weih & Karlsson, 2001; Tjoelker et al., 2001; Crous et al., 2019; Albert et al., 2010). In contrast, divergence between the combined and environment-only models (Fig. 4b) could be explained by (i) environmental effects that have a phylogenetic signal not captured by the environmental variables included in our analysis; and/or (ii) stochastic factors (e.g., disturbance history and dispersal limitation) that have resulted in species distribution patterns that are decoupled from current environmental conditions (Burns & Strauss, 2012; McIntyre et al., 1999). Testing mechanistic hypotheses for these divergence patterns is beyond the scope of our study. Nevertheless, identifying these biogeographic patterns is a novel step towards better understanding trait distributions and is only possible using modeling approaches that combine phylogenetic and environmental information.