Extracting latent variables from forecast ensembles and advancements in similarity metric utilizing optimal transport
Abstract
This study presents a novel methodology for extracting latent variables from high-dimensional sparse data, with particular emphasis on spatial distributions such as precipitation distributions. The approach applies multidimensional scaling (MDS) to a distance matrix built from a new similarity metric, the Unbalanced Optimal Transport Score (UOTS). UOTS captures discrepancies between spatial distributions while preserving physical units. It is similar to the mean absolute error, but it also accounts for location errors, which makes it a more robust measure of the differences between observations, forecasts, and ensembles. Estimating the probability distribution of these latent variables further increases their analytical utility by quantifying the characteristics of an ensemble. The method's adaptability to spatiotemporal data and its ability to handle errors suggest that it is a promising tool for diverse research applications.
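The abstract does not give the exact definition of UOTS, so the following Python sketch illustrates the overall pipeline under stated assumptions rather than reproducing the authors' method. It assumes a total-variation-penalized unbalanced optimal transport: moving a unit of mass between two grid points costs their distance divided by a hypothetical length_scale, destroying or creating a unit costs 1, and the total is divided by the number of grid points. Under these assumptions the score keeps the units of the field and equals the mean absolute error when transport is never cheaper than destruction plus creation. The sketch then embeds the ensemble members with classical (Torgerson) MDS, one possible MDS variant, and estimates the density of the first latent coordinate with a Gaussian kernel density estimate. The names uot_score, classical_mds and length_scale, as well as the synthetic ensemble, are illustrative assumptions. The code requires NumPy and SciPy.

```python
import numpy as np
from scipy.optimize import linprog


def uot_score(a, b, coords, length_scale):
    """Hypothetical unbalanced OT score between two non-negative fields.

    Assumed formulation (not necessarily the paper's UOTS): moving one unit
    of mass from grid point i to j costs d_ij / length_scale, and destroying
    a unit of `a` or creating a unit of `b` costs 1. Dividing by the number
    of grid points keeps the field's units; if length_scale is at most half
    the smallest grid spacing, the score equals the mean absolute error.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    coords = np.asarray(coords, dtype=float)
    n = a.size
    # Dimensionless transport cost from pairwise Euclidean distances.
    diff = coords[:, None, :] - coords[None, :, :]
    cost = np.linalg.norm(diff, axis=-1) / length_scale
    # Total cost = sum(a) + sum(b) + sum_ij (C_ij - 2) T_ij, because every
    # transported unit avoids one destruction and one creation.
    c = (cost - 2.0).ravel()
    rows = np.kron(np.eye(n), np.ones((1, n)))  # row sums of T <= a
    cols = np.kron(np.ones((1, n)), np.eye(n))  # column sums of T <= b
    res = linprog(c, A_ub=np.vstack([rows, cols]),
                  b_ub=np.concatenate([a, b]),
                  bounds=(0, None), method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return (a.sum() + b.sum() + res.fun) / n


def classical_mds(dist, n_components=2):
    """Classical (Torgerson) MDS on a precomputed distance matrix."""
    d2 = np.asarray(dist, dtype=float) ** 2
    m = d2.shape[0]
    j = np.eye(m) - np.ones((m, m)) / m      # centering matrix
    gram = -0.5 * j @ d2 @ j                  # double-centered Gram matrix
    vals, vecs = np.linalg.eigh(gram)
    order = np.argsort(vals)[::-1][:n_components]
    vals = np.clip(vals[order], 0.0, None)    # drop non-Euclidean (negative) part
    return vecs[:, order] * np.sqrt(vals)


if __name__ == "__main__":
    from scipy.stats import gaussian_kde

    # Synthetic stand-in for an ensemble: rain bumps at random positions.
    rng = np.random.default_rng(0)
    grid = np.arange(12, dtype=float)[:, None]
    centres = rng.uniform(2.0, 9.0, size=8)
    members = [5.0 * np.exp(-0.5 * (grid[:, 0] - c) ** 2) for c in centres]

    m = len(members)
    dist = np.zeros((m, m))
    for i in range(m):
        for k in range(i + 1, m):
            dist[i, k] = dist[k, i] = uot_score(members[i], members[k], grid, 3.0)

    latent = classical_mds(dist, n_components=1)[:, 0]
    density = gaussian_kde(latent)  # estimated density of the latent variable
    print(np.round(latent, 2))
    print(density(np.linspace(latent.min(), latent.max(), 5)))
```

Transport between two points only pays off when its cost is below 2, the price of destroying and re-creating the same unit. Therefore, length_scale sets how far a feature can be displaced before it is scored as a miss plus a false alarm. The dense linear program grows with the square of the number of grid points, so realistic grids would need a sparse or approximate solver.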