Unfrozen Water Content Estimation: A Comparison between Ensemble and Non-ensemble Machine Learning Models
Abstract
Unfrozen water content (UWC) is a key parameter affecting a variety of physical and mechanical properties and processes in frozen soil systems. However, traditional estimation models are limited by oversimplified assumptions or narrow ranges of applicability. This motivates the exploration of alternative modeling approaches based on machine learning (ML) algorithms, which have shown increasing potential in engineering fields. To this end, this study evaluated and compared six widely used ML algorithms for modeling UWC from collected experimental datasets: three ensemble models (random forest, RF; LightGBM; and XGBoost) and three non-ensemble models (k-nearest neighbors, KNN; support vector regression, SVR; and backpropagation neural network, BPNN). The algorithms were tuned and evaluated within a framework combining Bayesian optimization and cross-validation to ensure model stability and generalization. The results demonstrated that the ensemble tree-based methods, particularly LightGBM and XGBoost, achieved the highest predictive accuracy and the best overall performance. In contrast, the non-ensemble methods generalized less well. Interestingly, during 10-fold cross-validation, one particular fold consistently underperformed, possibly because of the data distribution that fold received after random shuffling. The present study highlights the effectiveness of ensemble learning approaches, the importance of proper hyperparameter tuning and validation strategies, and the intrinsic modeling challenges arising from the difference between freezing and thawing phase-change behaviors. This comprehensive ML model comparison and robust training framework provide valuable guidance on selecting suitable data-driven techniques for modeling frozen soil properties in cold-region hydrogeology and engineering practice.
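As a rough illustration of the kind of per-fold evaluation described above, the following Python sketch runs shuffled 10-fold cross-validation for one ensemble model (RF) and one non-ensemble model (KNN) and reports the weakest fold. It is a minimal sketch under stated assumptions, not the study's pipeline: it assumes scikit-learn, uses synthetic placeholder data rather than the collected UWC datasets, uses fixed hyperparameters in place of Bayesian optimization, and the helper name `per_fold_r2` and the two input features are hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import KFold, cross_val_score
from sklearn.neighbors import KNeighborsRegressor


def per_fold_r2(model, X, y, n_splits=10, seed=0):
    """Return the R^2 score of each fold under shuffled K-fold CV.

    Fixing the seed gives every model the same folds, so a fold that
    is hard for all models can be identified.
    """
    cv = KFold(n_splits=n_splits, shuffle=True, random_state=seed)
    return cross_val_score(model, X, y, cv=cv, scoring="r2")


# Synthetic placeholder data (NOT the study's dataset): two assumed
# features and an arbitrary smooth target with a little noise.
rng = np.random.default_rng(0)
temperature = rng.uniform(-20.0, -0.1, size=300)  # assumed feature, deg C
clay_fraction = rng.uniform(0.0, 0.6, size=300)   # assumed feature
X = np.column_stack([temperature, clay_fraction])
y = (0.05
     + 0.3 * clay_fraction * np.abs(temperature) ** -0.5
     + rng.normal(0.0, 0.005, size=300))

# Fixed, untuned hyperparameters; the study tunes them instead.
models = {
    "RF": RandomForestRegressor(n_estimators=100, random_state=0),
    "KNN": KNeighborsRegressor(n_neighbors=5),
}

scores = {name: per_fold_r2(model, X, y) for name, model in models.items()}

for name, fold_scores in scores.items():
    worst = int(np.argmin(fold_scores))
    print(f"{name}: mean R2 = {fold_scores.mean():.3f}, "
          f"worst fold = {worst} (R2 = {fold_scores[worst]:.3f})")
```

Because both models share the same shuffled folds, comparing the index of the worst fold across models is one simple way to check whether underperformance follows the data in a fold rather than a particular algorithm.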