MaxEnt brings comparable results when the input data is being completed;
model parameterization and background manipulation of four species
distribution models
Abstract
Species distribution models (SDMs) are practical tools to assess the
habitat suitability of species with numerous applications in
environmental management and conservation planning. Manipulating the
input data to account for spatial bias is one way to enhance the
performance of SDMs. However, a model parameterization approach that
covers different SDMs to achieve well-performing models has not yet
been implemented. We
integrated input data manipulation and model tuning for four
commonly used SDMs: generalized linear model (GLM), gradient boosted
model (GBM), random forest (RF), and maximum entropy (MaxEnt), and
compared their predictive performance in modeling geographically
imbalanced and biased data for a rare species complex of mountain vipers.
Models were tuned over a range of model-specific parameters under two
background selection methods: random selection and background weighting.
The performance of the fine-tuned models was assessed using
recently identified localities of the species. The results indicated
that although the fine-tuned versions of all models performed well in
predicting the training data (AUC > 0.9 and TSS > 0.5), they produced
different results in classifying out-of-bag data. GBM and RF, which had
higher sensitivity on the training data, showed more divergent
performance. The GLM, despite having high predictive performance for
the test data, showed lower specificity. It was
only the MaxEnt model that showed high predictive performance and
comparable results for identifying test data in both random and
background weighting procedures. Our results highlight that while GBM
and RF are prone to overfitting the training data and GLM over-predicts
non-sampled areas, MaxEnt is capable of producing results that are both
predictable (extrapolative) and complex (interpolative). We discuss the
assumptions of each model and conclude that MaxEnt could be considered
a practical method to cope with imbalanced, biased data in species
distribution modeling.
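
The evaluation metrics named above (AUC, TSS, sensitivity, specificity) can be illustrated with a minimal sketch. It is not the authors' evaluation code. The function names, the 0.5 threshold, and the toy presence/background scores are assumptions made only for illustration. TSS is computed as sensitivity + specificity - 1, and AUC uses the rank-based (Mann-Whitney) formulation.

```python
from typing import Sequence, Tuple


def confusion_counts(y_true: Sequence[int], scores: Sequence[float],
                     threshold: float) -> Tuple[int, int, int, int]:
    """Return (tp, fp, tn, fn) after thresholding suitability scores."""
    tp = fp = tn = fn = 0
    for y, s in zip(y_true, scores):
        predicted_presence = s >= threshold
        if y == 1 and predicted_presence:
            tp += 1
        elif y == 1:
            fn += 1
        elif predicted_presence:
            fp += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def tss(y_true: Sequence[int], scores: Sequence[float],
        threshold: float = 0.5) -> float:
    """True skill statistic: sensitivity + specificity - 1."""
    tp, fp, tn, fn = confusion_counts(y_true, scores, threshold)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one presence and one background point")
    sensitivity = tp / (tp + fn)   # share of presences correctly predicted
    specificity = tn / (tn + fp)   # share of background points correctly predicted
    return sensitivity + specificity - 1.0


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random presence outscores a random background point.

    Ties count as one half (Mann-Whitney formulation of the ROC AUC).
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one presence and one background point")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy data (hypothetical): 1 = presence, 0 = background point.
    labels = [1, 1, 1, 0, 0, 0]
    suitability = [0.9, 0.8, 0.4, 0.6, 0.3, 0.1]
    print("AUC:", round(auc(labels, suitability), 3))
    print("TSS:", round(tss(labels, suitability), 3))
```

Computing both metrics separately on training data and on independent test localities is one way to expose the kind of gap described above, where a model scores well on training data but classifies new localities less reliably.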