Carbonate clumped isotope thermometry (Δ_47) is a temperature proxy that is becoming more widely used in the geosciences. Most calibration studies have used ordinary least squares linear regressions or York models to describe the relationship between Δ_47 and temperature. However, Bayesian models have not yet been explored for clumped isotopes, nor has there been a comprehensive study assessing the performance of the regression models commonly used in the field. Here, we use simulated datasets to compare the performance of seven regression models, three of which are new and fit using a Bayesian framework. While Bayesian and non-Bayesian ordinary least squares linear regression models show the best overall accuracy for calibrations, Bayesian models outperform other models in terms of precision, especially if datasets are sufficiently large (>50 data points). For temperature reconstructions, where a given regression model is applied to predict temperature from Δ_47, Bayesian and non-Bayesian models show variable performance advantages depending on the structure of errors in the calibration dataset. Overall, our analyses suggest that the advantages of using Bayesian models for calibrating and reconstructing temperatures using clumped isotope paleothermometry are realized through the use of large calibration datasets (>50 data points). When used with large datasets, Bayesian regressions are expected to substantially improve the accuracy and precision of (i) calibration parameter estimates and (ii) temperature reconstructions (e.g., typically improving precision by at least a factor of two). We implement our comparative framework in a new web-based interface, BayClump. This data tool should increase reproducibility by enabling access to the different Bayesian and non-Bayesian regression models. Finally, we apply BayClump to three published datasets to examine precision and accuracy in regression parameters and reconstructed temperatures. We show that BayClump yields results similarly accurate to those of the published studies. However, the use of BayClump generally produces temperature reconstructions with meaningful reductions in temperature uncertainty, as demonstrated through reanalysis of data from a Late Miocene hominoid site in Yunnan, China.
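To make the calibration-and-reconstruction workflow concrete, the sketch below contrasts an ordinary least squares fit with a simple Bayesian fit (random-walk Metropolis) and propagates the posterior draws into a temperature estimate with a credible interval. It is not the BayClump implementation: the calibration form Δ_47 = a × 10^6/T² + b (T in kelvin), the priors, the synthetic data, and all numeric values are illustrative assumptions only.

```python
# Minimal sketch (not BayClump): OLS vs. simple Bayesian calibration of
# Delta_47 = a * 10^6 / T^2 + b, then temperature reconstruction with
# propagated uncertainty. All numbers below are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(1)

# --- synthetic calibration data --------------------------------------------
T_cal = rng.uniform(273.0, 373.0, size=60)        # calibration temperatures, K
x = 1e6 / T_cal**2                                # regressor: 10^6 / T^2
true_a, true_b, sigma = 0.039, 0.154, 0.01        # hypothetical "truth"
y = true_a * x + true_b + rng.normal(0.0, sigma, size=x.size)

# --- ordinary least squares -------------------------------------------------
X = np.column_stack([x, np.ones_like(x)])
(a_ols, b_ols), *_ = np.linalg.lstsq(X, y, rcond=None)

# --- Bayesian fit: random-walk Metropolis with broad normal priors ----------
def log_post(a, b):
    resid = y - (a * x + b)
    loglik = -0.5 * np.sum((resid / sigma) ** 2)       # known-sigma likelihood
    logprior = -0.5 * (a / 10.0) ** 2 - 0.5 * (b / 10.0) ** 2
    return loglik + logprior

a_cur, b_cur = a_ols, b_ols
samples = []
for _ in range(20000):
    a_prop = a_cur + rng.normal(0, 0.001)
    b_prop = b_cur + rng.normal(0, 0.01)
    if np.log(rng.uniform()) < log_post(a_prop, b_prop) - log_post(a_cur, b_cur):
        a_cur, b_cur = a_prop, b_prop
    samples.append((a_cur, b_cur))
a_post, b_post = np.array(samples[5000:]).T            # discard burn-in

# --- temperature reconstruction with propagated uncertainty -----------------
d47_new = 0.62                                          # hypothetical unknown sample
T_draws = np.sqrt(1e6 * a_post / (d47_new - b_post))    # posterior draws of T (K)
lo, mid, hi = np.percentile(T_draws, [2.5, 50, 97.5]) - 273.15
print(f"OLS:      a = {a_ols:.4f}, b = {b_ols:.4f}")
print(f"Bayesian: T = {mid:.1f} degC (95% CI {lo:.1f} to {hi:.1f} degC)")
```

Because the reconstruction is computed from the full set of posterior draws rather than from point estimates of the slope and intercept, the reported interval reflects calibration uncertainty directly, which is the sense in which larger calibration datasets tighten both parameter estimates and reconstructed temperatures.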


Material samples are vital across multiple scientific disciplines, with samples collected for one project often proving valuable for additional studies. The Internet of Samples (iSamples) project aims to integrate large, diverse, cross-discipline sample repositories and enable access and discovery of material samples as FAIR data (Findable, Accessible, Interoperable, and Reusable). Here we report our recent progress in controlled vocabulary development and mapping. In addition to a core metadata schema that integrates SESAR, GEOME, Open Context, and Smithsonian natural history collections, three small but important controlled vocabularies (CVs) describing specimen type, material type, and sampled feature were created. The new CVs provide consistent semantics for high-level integration of the existing vocabularies used in the source collections. Two methods were used to map source record properties to terms in the new CVs. First, keyword-based heuristic rules were manually created where existing terminologies were similar to the new CVs, such as in records from SESAR, GEOME, and Open Context and in some aspects of Smithsonian Darwin Core records. For example, specimen type = "liquid>aqueous" in SESAR records mapped to specimen type = "liquid or gas sample" and material type = "liquid water". Second, a machine learning approach was applied to Smithsonian Darwin Core records to infer sampled feature terms from record text in the habitat, locality, higher geography, and higher classification fields. By applying fastText with a 600-billion-token general-domain corpus, we gave the model a level of "understanding" of English words. With training sets of 200 and 995 records, we obtained 87% and 94% precision and 85% and 92% recall, respectively, yielding performance sufficient for production use. Applying these approaches, more than 3 × 10^6 records from the four large collections have been mapped successfully to a common core data model, facilitating cross-domain discovery and retrieval of the sample records.
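The sketch below illustrates the keyword-based heuristic mapping in its simplest form: a manually curated lookup from source terminology to the new CV terms. The rule table, field handling, and function names are hypothetical illustrations, not the actual iSamples mapping tables; only the "liquid>aqueous" example is taken from the text above.

```python
# Minimal sketch of a keyword-based heuristic mapping rule (illustrative only).
from typing import Optional

# Source keyword -> (specimen type CV term, material type CV term).
# Only the first rule is drawn from the abstract; others would be curated
# per source collection (SESAR, GEOME, Open Context, Darwin Core).
HEURISTIC_RULES: dict[str, tuple[str, str]] = {
    "liquid>aqueous": ("liquid or gas sample", "liquid water"),
    # ... additional manually created rules ...
}

def map_specimen(source_value: str) -> Optional[tuple[str, str]]:
    """Map a source specimen-type string to controlled vocabulary terms,
    returning None when no heuristic rule applies."""
    return HEURISTIC_RULES.get(source_value.strip().lower())

print(map_specimen("liquid>aqueous"))
# -> ('liquid or gas sample', 'liquid water')
```

Records that fall outside such rules, such as the Smithsonian sampled-feature fields, are the ones handled by the fastText-based classifier described above.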