High-resolution mapping of SOC at different subsidence stages in high
groundwater level mining areas using machine learning and model fusion
techniques
Abstract
Accurately estimating the spatial distribution of soil organic carbon
(SOC) in coal mining regions is crucial for soil quality restoration and
understanding global carbon cycling. Because the mechanisms influencing
SOC in coal mining areas are complex, few studies have produced dynamic,
high-precision digital maps of SOC content before and after subsidence
and reclamation in mining areas with high groundwater levels.
In this study, we employed four machine learning
algorithms—Cubist, Random Forest (RF), Support Vector Machine (SVM),
and Extreme Gradient Boosting (XGBoost)—in conjunction with a model
fusion technique to analyze SOC content across various subsidence stages
in high groundwater mining areas: control land (CL), subsided land (SL),
and reclaimed land (RL). By integrating high-resolution imagery from
China’s GF-1 satellite, we generated a predictive map of surface SOC
content. Additionally, we used an optimal parameters-based
geographical detector (OPGD) model to quantitatively identify the key
factors driving SOC spatial variation within the study area. Our results
indicate that the fusion model combining RF and Cubist outperformed the
others, achieving a coefficient of determination (R²)
of 0.73, a root mean square error (RMSE) of 0.73 g/kg, and a ratio of
performance to interquartile distance (RPIQ) of 2.50. The predictive map
highlights that high SOC concentrations in the mining area are
predominantly found in reclaimed lands. Organism-related factors emerged
as the strongest explanatory variables for SOC content in these areas
and formed the most important group of covariates in model development. This
cost-effective, high-efficiency approach offers valuable insights into
SOC research and informs strategies for soil remediation in
mining-affected lands.
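
For readers unfamiliar with the evaluation metrics quoted above, the following is a minimal sketch of how R², RMSE, and RPIQ can be computed, together with one simple way of fusing two models' predictions. It rests on several assumptions: the abstract does not state how the RF and Cubist outputs are combined, so the weighted average in `fuse` (and its `weight_a` parameter) is only an illustrative stand-in; the SOC values are invented for demonstration and are not data from this study; and all function names are hypothetical.

```python
from math import sqrt
from statistics import mean, quantiles


def rmse(observed, predicted):
    """Root mean square error, in the same units as the observations."""
    return sqrt(mean([(o - p) ** 2 for o, p in zip(observed, predicted)]))


def r2(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    obs_mean = mean(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - obs_mean) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def rpiq(observed, predicted):
    """Ratio of performance to interquartile distance: IQR(observed) / RMSE."""
    q1, _, q3 = quantiles(observed, n=4, method="inclusive")
    return (q3 - q1) / rmse(observed, predicted)


def fuse(pred_a, pred_b, weight_a=0.5):
    """Assumed fusion rule: weighted average of two models' predictions."""
    return [weight_a * a + (1.0 - weight_a) * b for a, b in zip(pred_a, pred_b)]


# Invented SOC values (g/kg) for illustration only, not from the study.
observed = [10.0, 12.0, 14.0, 16.0, 18.0]
rf_pred = [11.0, 11.0, 15.0, 15.0, 19.0]      # hypothetical RF output
cubist_pred = [9.0, 13.0, 15.0, 15.0, 17.0]   # hypothetical Cubist output
fused_pred = fuse(rf_pred, cubist_pred)

for name, pred in [("RF", rf_pred), ("Cubist", cubist_pred), ("Fused", fused_pred)]:
    print(name, round(r2(observed, pred), 3),
          round(rmse(observed, pred), 3), round(rpiq(observed, pred), 3))
```

In this toy example the two models err in opposite directions at some sites, so averaging cancels part of the error. That is the general motivation for model fusion, although the gain in practice depends on how correlated the individual models' errors are.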