Spatial bootstrapped microeconometrics: forecasting for out-of-sample
geo-locations in big data
Abstract
Spatial econometric models estimated on big geo-located point data
face at least two problems: limited computational capabilities and
inefficient forecasting for new out-of-sample geo-points. This is
because the spatial weights matrix W is defined for in-sample
observations only and because of the computational complexity. Machine
learning models suffer from the same limitations when using kriging for
predictions; thus, this problem remains unsolved. The paper presents a
novel methodology for estimating
spatial models on big data and predicting in new locations. The approach
uses bootstrap and tessellation to calibrate both model and space. The
best bootstrapped model is selected with the PAM (Partitioning Around
Medoids) algorithm by classifying the regression coefficients jointly in
a non-independent manner. Voronoi polygons for the geo-points used in
the best model allow for a representative space division. New
out-of-sample points are assigned to tessellation tiles and linked to
the spatial weights matrix as a replacement for an original point, which
makes it feasible to use calibrated spatial models as a forecasting tool
for new locations. There is no trade-off between forecast quality and
computational efficiency in this approach. An empirical example
illustrates a model for business locations and firms’ profitability.
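
The assignment of a new out-of-sample point to a tessellation tile can be illustrated with a minimal sketch. It relies on the fact that a point lies in the Voronoi tile of its nearest seed point, so the new point can borrow that seed's row of W. The function names, the toy coordinates, the matrix W, and the values y are hypothetical and chosen only for illustration; this is not the paper's implementation, and the bootstrap and PAM selection steps are not shown.

```python
import math

def assign_to_tile(new_point, seeds):
    """Return the index of the seed whose Voronoi tile contains new_point.

    A point belongs to the Voronoi tile of its nearest seed, so the
    assignment needs no explicit polygon geometry.
    """
    return min(range(len(seeds)), key=lambda i: math.dist(new_point, seeds[i]))

def spatial_lag_for_new_point(new_point, seeds, W, y):
    """Spatial lag (row of W times y) for a new point.

    The new point replaces the in-sample seed of its tile, i.e. it
    reuses that seed's row of the spatial weights matrix W.
    """
    tile = assign_to_tile(new_point, seeds)
    lag = sum(w * v for w, v in zip(W[tile], y))
    return tile, lag

# Hypothetical toy data: three in-sample points, a row-standardised W,
# and observed values y at the in-sample points.
seeds = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
W = [
    [0.0, 0.5, 0.5],
    [0.5, 0.0, 0.5],
    [0.5, 0.5, 0.0],
]
y = [10.0, 20.0, 30.0]

tile, lag = spatial_lag_for_new_point((0.9, 0.2), seeds, W, y)
print(tile, lag)
```

In a fitted spatial lag model, the value of `lag` would enter the prediction for the new location together with its own covariates and the estimated coefficients; with many points, a spatial index would replace the linear nearest-seed search used here.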