Definition of the Climatic Niche limits for common ancestors in the hominin tree
The hominin phylogenetic tree was obtained by combining the Primate (and human) phylogenetic information published in recent papers49-51. We started by using the six climatic variables, representing the limits (minima and maxima) in temperature, precipitation and NPP. Since these variables are highly correlated to each other, we reduced covariation among variables by performing a Principal Component Analysis (PCA) on climatic variables associated with each hominin species occurrence in the fossil record. Then, we extracted the PC scores and used them as a multivariate dataset for the phylogenetic ridge regression.
We used the PC scores estimated by RRphylo at each node and back transformed the scores in climatic variables (MinTemp, MinPrec, min NPP, MaxTemp, MaxPrec, max NPP). For each node in the hominin tree, we mapped geographically the areas associated with its corresponding climatic estimates (i.e. the limits of the climatic envelope). The resulting map thus represents the geographic areas estimated to be climatically suitable for occupation by each ancestor in the tree. To account for uncertainty around the ages of individual nodes in the hominin phylogeny, we repeated the entire procedure at each node over 100 replicates by using the 100 alternative phylogenies generated fromswapONE function embedded in the RRphylo package. This function randomly changes the tree topology and branch lengths although it is possible to keep specific clades monophyletic. However, we accounted for a few, well-supported, monophyletic clades which are present in the hominin tree. In particular, in swapping the tree tips, we kept H. ergaster and H. erectus as sister species. We similarly kept monophyletic the clade subtending to the four australopithecines in the tree, the clade representing the genus Homo , and the clade including H. heidelbergensis , H. neanderthalensis andH. sapiens . Eventually, we recorded the number of times a given geographical cell counts as climatically suitable out of the 100 replicates. For each cell, estimated habitat quality thus ranges between 0 (never suitable irrespective of the phylogenetic tree used for climatic reconstructions at the tree nodes) to 1 (100 times suitable). Since the inclusion of particular taxa in the data may alter significantly the result of PCA ordination52 we repeated the swap procedure leaving one species at random out of the tree for each replicate.
To measure the association between climatic suitability and the presence of human species, we calculated the Area Under receiver-operator Curve (AUC) averaging over the 100 replicates. AUC theoretically ranges from 0 to 1. However, since random sampling points, (pseudoabsences) are not real absences, AUC cannot reach 153, as the maximum AUC value depends on the actual (unknown) area of distribution of the species. To obtain a null distribution of AUC values and assess significance for the real AUC, for each node in the tree we sampled 100 times as many point occurrences as with the real data (i.e. the fossil occurrences of the species descending from that node) within the biogeographic domain of the species, and calculated the random AUC. Climatic tolerance values at the tree node represent the estimated tolerance limits for hominin ancestors. Since these values are estimated, rather than observed, to assess the association between fossil localities and habitat quality for each ancestral species estimates we selected the fossil occurrences of its descendants, provided they are not included in a descending node which was itself tested. For instance, the EHS ancestor was tested by selecting the fossil occurrence of H. habilis , H. erectus and H. ergaster , but not H. sapiens , H. neanderthalensis andH. sapiens which were considered only descendant to the MHS ancestor. To account for sampling differences between the hominin species we analysed, we repeated the AUC computation after sampling randomly no more than 100 occurrences per species at each replicate.
We used the evolutionary rates provided by RRphylo to apply the function search.shift 23 which tests whether individual clades evolved at significantly different rates as compared to the rest of the tree. The function compares the rates attached to each branch descending from a particular node to the rates for the branches of the rest of the tree. The significance for the rate difference is assessed by means of randomization. In the case of multivariate data, as with this particular study, the multivariate rate is computed as the 2-Norm (Euclidean) vector of the rates of individual variables.
To look for possible evolutionary trends in climatic tolerances over time we used the function search.trend 54 in theRRphylo R package23. In search.trend , evolutionary rates and phenotypes (including the phenotypic estimates at the nodes) are regressed against their age and the resulting slopes compared to slopes randomly generated under the Brownian motion model of evolution.