Large ensembles of model simulations require considerable resources, so defining an appropriate ensemble size for a particular application is an important experimental design criterion. Using the recently developed CLIVAR ENSO Metrics Package (Planton et al., 2021), we estimate the ensemble size (N) needed to assess a model’s ability to capture observed ENSO behavior. Using the larger ensembles available from CMIP6 and the CLIVAR Large Ensemble Project, we find that larger ensembles are needed to robustly capture baseline ENSO characteristics (N > 65) and physical processes (N > 50) than are needed for the background climatology (N ≥ 12) and remote ENSO teleconnections (N ≥ 6). While these results vary somewhat across metrics and models, our study highlights that ensembles are required to robustly evaluate simulated historical ENSO behavior, and it provides initial guidance for designing model ensembles to reliably evaluate and compare ENSO simulations.
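As an illustration of how a required ensemble size might be estimated, the sketch below resamples the per-member values of a single metric and returns the smallest N for which the subsample mean stays close to the full-ensemble mean. It is a generic bootstrap under stated assumptions, not necessarily the procedure used in this study and not part of the CLIVAR ENSO Metrics Package API; the function name `min_ensemble_size`, the `tolerance` criterion, and the 95% confidence level are all hypothetical choices made for the example.

```python
import numpy as np


def min_ensemble_size(metric_values, tolerance, n_boot=2000, confidence=0.95, seed=0):
    """Return the smallest N whose resampled mean lies within `tolerance` of the
    full-ensemble mean in at least `confidence` of the draws, or None if no
    N up to the available ensemble size satisfies the criterion.

    Assumption: `metric_values` holds one scalar metric value per ensemble member.
    """
    values = np.asarray(metric_values, dtype=float)
    rng = np.random.default_rng(seed)
    full_mean = values.mean()

    for n in range(1, values.size + 1):
        # Draw n_boot subsamples of size n (with replacement) from the members.
        idx = rng.integers(0, values.size, size=(n_boot, n))
        sub_means = values[idx].mean(axis=1)
        # Fraction of subsamples whose mean is acceptably close to the full mean.
        frac_ok = np.mean(np.abs(sub_means - full_mean) <= tolerance)
        if frac_ok >= confidence:
            return n
    return None


if __name__ == "__main__":
    # Synthetic stand-in for a 100-member ensemble (not real model output).
    synthetic = np.random.default_rng(42).normal(loc=0.0, scale=1.0, size=100)
    print(min_ensemble_size(synthetic, tolerance=0.5))
```

With this kind of criterion, a metric with large member-to-member spread relative to the chosen tolerance needs a larger N, which is consistent with the qualitative contrast between metric categories reported above.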