Holger Robert Maier1, Firouzeh Rosa Taghikhah2, Ehsan Nabavi3, Saman Razavi4, Hoshin Gupta5, Wenyan Wu6, Douglas A. G. Radford1, Jiajia Huang6

1 School of Architecture and Civil Engineering, The University of Adelaide, Adelaide, 5005, Australia.
2 Business School, University of Sydney, Sydney, 2000, Australia.
3 Responsible Innovation Lab, Australian National Centre for Public Awareness of Science, The Australian National University, Canberra, 0200, Australia.
4 School of Environment and Sustainability, University of Saskatchewan, Saskatoon, S7N 5A1, Canada.
5 Department of Hydrology and Atmospheric Sciences, The University of Arizona, Tucson, 85721, United States.
6 Department of Infrastructure Engineering, The University of Melbourne, Melbourne, 3010, Australia.

Corresponding author: Holger Robert Maier ([email protected])

Introduction

The field of hydrological modelling has, in recent years, seen a resurgence in the use of Artificial Intelligence (AI), with Explainable AI (XAI) methods leading the way (Fan et al., 2023; Fleming et al., 2021; Papacharalampous et al., 2023). As these methods are the "new kid on the block", they can easily capture the imagination of water experts, due to their perceived novelty and their presumed promise of being able to explain complex phenomena. Unfortunately, the hype surrounding such methods can also hinder our understanding of their actual capabilities and limitations, as they are often viewed from overly optimistic perspectives. We caution that there is a need for objective and transparent assessment of their utility, in order to understand the conditions under which they add value and those under which they do not. To this end, this Perspective paper provides a brief explanation of how XAI methods work in comparison with classical methods (Section 2), articulates the shifts in mindset that must occur for the power of XAI to be leveraged in a responsible fashion (Section 3), and makes suggestions about the path forward (Section 4).

How does XAI work?

In the XAI literature, explanation methods generally follow one of three approaches (Ghaffarian et al., 2023), depending on whether their purpose is to identify decisive features, quantify feature contributions, or assess the robustness of a model to perturbations in features (see Table 1). Below, we briefly describe each of these approaches and offer insights from a geoscientific (e.g., hydrologic) modelling perspective.

Identification of decisive features

XAI approaches that focus on identifying decisive features (i.e., the model inputs that have the greatest influence on model outputs) are referred to as "anchor explanations" (see Table 1). Such explanations provide a form of "interpretability" by identifying which subset of features (referred to as "anchors") is sufficient to guarantee a specific prediction outcome. The core idea is that, while other features may vary, the prediction will not change as long as these anchor features remain the same. In other words, anchor explanations seek to identify the features that "anchor" the prediction, such that changes to other features will not affect the modelling outcome. They are typically used for their ability to explain individual predictions in a transparent way, rather than for their role in model development per se.
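To make this idea concrete, the following is a minimal sketch of the anchor logic for a tabular classifier (e.g., one predicting whether a flood warning should be issued for a given day). The greedy search, the marginal perturbation scheme, and all function and variable names are illustrative simplifications introduced here, not the full anchors algorithm from the XAI literature; production implementations (e.g., the AnchorTabular explainer in the alibi library) use beam search and statistically efficient precision estimates.

```python
# Minimal sketch of the core idea behind anchor explanations, assuming a trained
# scikit-learn-style classifier `model`, a training matrix `X_train`
# (n_samples x n_features) used as the perturbation distribution, and a single
# instance `x` (1-D array) whose prediction is to be explained.
import numpy as np

def anchor_precision(model, x, anchor, X_train, n_samples=1000, rng=None):
    """Fraction of perturbed samples whose prediction matches that of x,
    when only the features in `anchor` are held fixed at their values in x."""
    rng = rng or np.random.default_rng(0)
    target = model.predict(x.reshape(1, -1))[0]
    # Resample non-anchor features from the training data (marginal perturbation)
    idx = rng.integers(0, len(X_train), size=n_samples)
    perturbed = X_train[idx].copy()
    perturbed[:, anchor] = x[anchor]  # anchor features stay fixed
    return np.mean(model.predict(perturbed) == target)

def greedy_anchor(model, x, X_train, threshold=0.95):
    """Greedily add the feature that most increases precision until the
    prediction is (empirically) guaranteed with probability >= threshold."""
    anchor, remaining = [], list(range(X_train.shape[1]))
    while remaining:
        best = max(remaining,
                   key=lambda f: anchor_precision(model, x, anchor + [f], X_train))
        anchor.append(best)
        remaining.remove(best)
        if anchor_precision(model, x, anchor, X_train) >= threshold:
            break
    return anchor  # indices of the "decisive" (anchor) features for this prediction
```

Note that the returned anchor is specific to the instance x: it says nothing about which features the model as a whole relies on, which is precisely why anchors explain individual predictions rather than guide model development.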
From a geoscientific modeller's perspective, the fact that it is acceptable for AI models to include features that have very little influence on model performance is somewhat surprising, as this would most likely be considered questionable practice in hydrological modelling (see Maier et al. (2023a); Maier et al. (2010)). This highlights some of the cultural differences between computer scientists and geoscientists: the former often focus primarily on maximising predictive performance, whereas the latter typically try to ensure that models "give the right answers for the right reasons". Consequently, when developing AI models in the geosciences, it is general practice to identify "decisive features" as part of the process of "parsimonious" model development, using well-established Input Variable Selection (IVS) algorithms (e.g., PMIS, PCIS, IIS) (Bowden et al., 2005; Galelli et al., 2014; Sharma, 2000) to help ensure that only non-redundant features that have a significant influence on model performance are incorporated into the model (Maier et al., 2023a; Wu et al., 2014). Adopting this practice as standard prioritizes overall model stability and generalizability, whereas anchors are only useful when investigating the precise reasons behind a specific decision or observation.
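For comparison with the anchor sketch above, the following is a simplified sketch of the greedy, forward-selection logic that underpins PMI-style IVS. The function name, the use of scikit-learn's mutual_info_regression as a stand-in information estimator, and the linear-regression approximation of the "partialling out" step are assumptions made for illustration only; published PMIS implementations use kernel-based partial mutual information estimates and formal (e.g., bootstrap-based) stopping criteria rather than a fixed number of inputs.

```python
# Simplified sketch of greedy, PMI-style input variable selection (IVS),
# assuming a candidate input matrix X (n_samples x n_candidates) and an output
# vector y (n_samples,). Linear regression on already-selected inputs is used
# as a crude approximation of the "partialling out" step.
import numpy as np
from sklearn.feature_selection import mutual_info_regression
from sklearn.linear_model import LinearRegression

def pmi_style_ivs(X, y, n_select=5):
    """Greedily select inputs carrying the most information about the output
    after accounting for inputs already selected (so redundant candidates
    are not re-selected)."""
    selected, remaining = [], list(range(X.shape[1]))
    for _ in range(min(n_select, X.shape[1])):
        if selected:
            # Remove the (linear) influence of already-selected inputs from y
            # and from each remaining candidate
            basis = X[:, selected]
            y_res = y - LinearRegression().fit(basis, y).predict(basis)
            X_res = np.column_stack([
                X[:, [j]]
                - LinearRegression().fit(basis, X[:, j]).predict(basis).reshape(-1, 1)
                for j in remaining
            ])
        else:
            y_res, X_res = y, X[:, remaining]
        # Score each remaining candidate by its (partial) mutual information
        scores = mutual_info_regression(X_res, y_res)
        best = remaining[int(np.argmax(scores))]
        selected.append(best)
        remaining.remove(best)
    return selected  # indices of non-redundant, informative inputs
```

The partialling-out step is what distinguishes this family of IVS algorithms from simply ranking candidate inputs by their marginal relationship with the output, as it prevents redundant (highly correlated) inputs from being selected.

Table 1. Details of common XAI methods.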