Allan C. Just

Daniel Carrión

Background: Studies of air temperature and health need accurate and precise estimates of ambient air temperature that capture fine-scale, within-day variability. Method: We developed statistical models to predict temperature at each hour in each cell of a 927-m square grid across the Northeast and Mid-Atlantic United States from 2003 to 2019. The models used observations from ~4,000 meteorological stations from the Integrated Mesonet, with inputs such as elevation, an inverse distance-weighted interpolation of temperature, and satellite-based vegetation and land surface temperature. We used a rigorous spatial cross-validation scheme and spatially weighted the errors to estimate how well model predictions would generalize to new cell-days. As an example application, we assessed the within-county association between temperature and social vulnerability during a heat wave. Results: A model based on the XGBoost machine-learning algorithm was fast and accurate, obtaining weighted root mean square errors (RMSEs) around 1.6 K, compared to standard deviations around 11.0 K. We found similar accuracy when validating our model on an external dataset from Weather Underground. Predictions from the North American Land Data Assimilation System-2 (NLDAS-2), another hourly model, were much less accurate when assessed in the same way, with RMSEs around 2.5 K. Finally, we demonstrated the health relevance of our model by showing that our temperature estimates were associated with social vulnerability across the region during a heat wave, whereas NLDAS-2 estimates showed a much weaker association. Conclusion: Our high spatiotemporal resolution air temperature model provides a strong contribution for future health studies in this region.
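Two of the building blocks named in the abstract, inverse distance-weighted interpolation and a weighted RMSE, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the planar coordinates, the distance exponent `power=2.0`, and the way the weights are supplied are all assumptions made for illustration, and the abstract does not specify how the spatial weights were actually derived.

```python
import numpy as np


def idw_interpolate(station_xy, station_temp, query_xy, power=2.0):
    """Inverse distance-weighted temperature at each query point.

    Assumes planar coordinates and a hypothetical exponent `power`;
    neither is taken from the paper.
    """
    station_xy = np.asarray(station_xy, dtype=float)
    station_temp = np.asarray(station_temp, dtype=float)
    query_xy = np.asarray(query_xy, dtype=float)

    # Pairwise distances, shape (n_query, n_station).
    dist = np.linalg.norm(
        query_xy[:, None, :] - station_xy[None, :, :], axis=2
    )

    out = np.empty(len(query_xy))
    for i, row in enumerate(dist):
        exact = row == 0
        if exact.any():
            # Query coincides with a station: use its value directly.
            out[i] = station_temp[exact].mean()
        else:
            w = 1.0 / row ** power
            out[i] = np.sum(w * station_temp) / np.sum(w)
    return out


def weighted_rmse(observed, predicted, weights):
    """Root mean square error with per-observation weights.

    The weights are an illustrative stand-in for spatial weighting
    (for example, down-weighting densely clustered stations).
    """
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    weights = np.asarray(weights, dtype=float)
    sq_err = (observed - predicted) ** 2
    return float(np.sqrt(np.sum(weights * sq_err) / np.sum(weights)))
```

For example, a query point midway between two stations reporting 280 K and 290 K receives the equal-weight average of the two, and a weight vector that emphasizes isolated stations makes errors at those stations count for more in the summary RMSE.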