OpenET is a software system that makes satellite-based, multi-model estimates of evapotranspiration (ET) accessible at multiple spatial and temporal scales over the U.S. Large-scale ET estimates fill a critical data gap for irrigation management, water resources management, and hydrological modeling and research. We present the methods and results of the second phase of an intercomparison and accuracy assessment between the OpenET satellite-based models (ALEXI/DisALEXI, eeMETRIC, PT-JPL, geeSEBAL, SIMS, and SSEBop) and a benchmark ground-based ET dataset drawn from nearly 200 eddy covariance towers across the contiguous U.S. Processing steps for the benchmark dataset included gap-filling, energy balance closure correction, calculation of closed and unclosed daily ET, and multiple levels of data QA/QC. The dataset was split into three groups: one each for phases I and II of the intercomparison, and a reserve dataset for future studies. To sample satellite-based ET pixels, static flux footprints were generated at each station based on dominant wind speed and direction. Where data allowed, two-dimensional flux footprints weighted by hourly reference ET (ETo) were developed and used for ET pixel sampling. A wide range of visual and statistical comparisons between satellite and ground-based ET were conducted at each station and for stations grouped by land cover type. Based on key performance metrics including bias, coefficient of determination, and root mean square error, model results show promising agreement at many flux sites considering the inherent uncertainty in station data. Remote sensing models show the highest agreement with closed station ET in irrigated annual cropland settings, whereas locations with native vegetation and high aridity, and some forested stations, show relatively less agreement.
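As an illustration of the three performance metrics named above, the following minimal sketch computes mean bias, root mean square error, and the coefficient of determination for paired station and model ET values. The function name `agreement_metrics`, the sign convention (model minus station), and the use of the squared Pearson correlation for the coefficient of determination are assumptions made for this example, not the assessment's documented procedure.

```python
import math


def agreement_metrics(station_et, model_et):
    """Return mean bias, RMSE, and r^2 for paired ET values (e.g. mm/day).

    Hypothetical helper: bias is model minus station, and r^2 is the
    squared Pearson correlation between the two series.
    """
    n = len(station_et)
    if n != len(model_et) or n < 2:
        raise ValueError("need two equal-length series with at least 2 pairs")

    # Paired errors, positive when the model overestimates station ET.
    errors = [m - s for s, m in zip(station_et, model_et)]
    bias = sum(errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)

    mean_s = sum(station_et) / n
    mean_m = sum(model_et) / n
    cov = sum((s - mean_s) * (m - mean_m) for s, m in zip(station_et, model_et))
    var_s = sum((s - mean_s) ** 2 for s in station_et)
    var_m = sum((m - mean_m) ** 2 for m in model_et)
    if var_s == 0 or var_m == 0:
        raise ValueError("r^2 is undefined for a constant series")

    r2 = cov * cov / (var_s * var_m)
    return {"bias": bias, "rmse": rmse, "r2": r2}


# Illustrative values only, not data from the benchmark dataset.
metrics = agreement_metrics([1.0, 2.0, 3.0, 4.0], [1.5, 2.5, 3.5, 4.5])
```

In this toy case the model is offset from the station by a constant amount, so the correlation is perfect while bias and RMSE are both nonzero, which is why the metrics are reported together rather than individually.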
The benchmark ET dataset was used to explore different approaches to computing a single ensemble estimate from the six-model ensemble, with the goals of reducing the influence of model outliers and of selecting weighting and data sampling schemes that limit the influence of flux stations with sparse or extensive data records. We present the results from the model intercomparison and accuracy assessment and discuss model performance relative to accuracy requirements from the OpenET user community.
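One way to reduce the influence of model outliers on a single ensemble value can be sketched as follows. This example drops models that lie far from the ensemble median, measured in units of the median absolute deviation (MAD), and averages the rest. The MAD filter, the threshold of 2.0, and the function name `ensemble_et` are assumptions chosen for illustration; the abstract does not state which approach was adopted.

```python
import statistics


def ensemble_et(model_values, mad_threshold=2.0):
    """Mean of model ET values after dropping MAD-based outliers.

    Hypothetical helper: `model_values` holds one ET estimate per model
    for the same pixel and period; None marks a model with no estimate.
    """
    values = [v for v in model_values if v is not None]
    if not values:
        raise ValueError("no model values available")

    median = statistics.median(values)
    deviations = [abs(v - median) for v in values]
    mad = statistics.median(deviations)

    # With zero spread there is nothing to filter against, so keep all models.
    if mad == 0:
        return statistics.mean(values)

    kept = [v for v, d in zip(values, deviations) if d <= mad_threshold * mad]
    return statistics.mean(kept)


# Illustrative monthly ET values (mm) for six models; the last is an outlier.
estimate = ensemble_et([100.0, 102.0, 98.0, 101.0, 99.0, 160.0])
```

A median-based filter is used here because the median and MAD are themselves insensitive to a single extreme model, unlike a mean and standard deviation computed over only six values.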