Statistical analyses
The first analysis calculated the average difference between the summer
temperatures experienced by each ethnoracial group and the county average. To
estimate the mean difference for each ethnoracial group, we used fixed effects
regressions. To account for spatial variation in temperature and
ethnoracial composition, we used three strategies. First, regressions
were stratified by census regions, which closely align with NOAA climate
regions. Second, we included fixed effects for county. Finally, we used
Conley variance–covariance calculations to construct standard errors to
account for spatial autocorrelation in the data, calculated based on
population-weighted centroids of tracts. We also included a fixed effect
for the year, and we weighted our regression models by the total
population of the tract. These statistical procedures were conducted
using the fixest package in R (30). Regression models with
categorical variables such as ethnoracial group often use traditional dummy
coding with one referent group, typically white people. This is
problematic because it makes the referent group’s result invisible and treats
one group’s experience as the standard, norm, or aspiration, depending on
context (31). To avoid this, we implemented weighted effect coding,
which codes each category so that its coefficient represents the deviation from the
sample mean, in this case the county-averaged temperature (32).
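
To illustrate how weighted effect coding behaves, the following is a minimal sketch in Python with NumPy rather than the fixest workflow used in the analysis. The data are made-up toy values, and the helper names (weighted_effect_codes, weighted_least_squares) and the choice of omitted category are assumptions introduced only for this example. County and year fixed effects and Conley standard errors are left out for brevity. Under these assumptions, the intercept should equal the population-weighted mean temperature, and each coefficient should equal that group's deviation from it.

```python
import numpy as np


def weighted_effect_codes(groups, weights, omitted):
    """Build a design matrix with an intercept and weighted effect codes.

    Each non-omitted category gets a column that is 1 for its own rows,
    -(category weight / omitted-category weight) for rows in the omitted
    category, and 0 elsewhere.
    """
    groups = np.asarray(groups)
    weights = np.asarray(weights, dtype=float)
    levels = [g for g in sorted(set(groups.tolist())) if g != omitted]
    w_omitted = weights[groups == omitted].sum()
    columns = [np.ones(len(groups))]
    for level in levels:
        w_level = weights[groups == level].sum()
        col = np.where(groups == level, 1.0, 0.0)
        col[groups == omitted] = -w_level / w_omitted
        columns.append(col)
    return np.column_stack(columns), levels


def weighted_least_squares(X, y, weights):
    """Solve a weighted least squares problem by rescaling rows."""
    sw = np.sqrt(np.asarray(weights, dtype=float))
    beta, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    return beta


# Hypothetical toy data: one row per observation, weighted by population.
groups = np.array(["A", "A", "B", "B", "B", "C", "C"])
temps = np.array([30.0, 31.0, 28.0, 29.0, 27.5, 33.0, 32.0])
pop = np.array([1000.0, 2000.0, 1500.0, 500.0, 1000.0, 800.0, 1200.0])

X, levels = weighted_effect_codes(groups, pop, omitted="C")
beta = weighted_least_squares(X, temps, pop)

# Reference quantities computed directly from the data.
grand_mean = np.average(temps, weights=pop)
group_means = {
    g: np.average(temps[groups == g], weights=pop[groups == g])
    for g in ["A", "B", "C"]
}

# The omitted group's deviation is recovered from the other coefficients.
w = {g: pop[groups == g].sum() for g in ["A", "B", "C"]}
omitted_dev = -sum(w[g] * b for g, b in zip(levels, beta[1:])) / w["C"]

print("intercept:", beta[0], "weighted mean:", grand_mean)
for g, b in zip(levels, beta[1:]):
    print(g, "coefficient:", b, "deviation:", group_means[g] - grand_mean)
print("C (omitted) deviation:", omitted_dev)
```

Because no category serves as the baseline, every group, including the omitted one, has an estimate expressed relative to the same weighted mean.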
Our second analysis examined the association between our measure of residential
segregation and local air temperatures. Relating tract-level
temperature to the segregation index required a flexible regression
framework to accommodate nonlinearities, so we used generalized additive
models with smoothing splines. The segregation measure was modeled with
a natural cubic spline with three knots. We included fixed effects for
county and year and accounted for spatiotemporal dependence by modeling
a tensor product smooth of the geocoordinates of population-weighted
centroids by year. These regressions were implemented with the mgcv
package in R, specifically using the bam function for
computational efficiency in large datasets (33).
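
One way to write the model described above is sketched below. The notation is an assumption introduced here for illustration and does not appear in the original text: T is the temperature of tract i in county c and year t, S is the segregation index, f is the natural cubic spline with three knots, α and γ are the county and year fixed effects, and g stands for the tensor product smooth of the centroid coordinates by year, whose exact parameterization is not specified in the text.

```latex
\[
T_{ict} = f(S_{ict}) + \alpha_c + \gamma_t
          + g(\mathrm{lon}_i, \mathrm{lat}_i, t) + \varepsilon_{ict}
\]
```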