For field workers around the world, wheat trials are often synonymous with counting wheat heads: a tedious but necessary task for measuring this key yield component. Deep learning is a promising way to automate the estimation of wheat head density from images acquired by high-throughput phenotyping systems, but it has been shown to be sensitive to changes in acquisition conditions, a problem known as “domain change.” In response, an international collaboration built the “Global Wheat Head Dataset” in 2020 and 2021. It is a collection of 6,515 images acquired during 47 different acquisition sessions in 12 countries. Alongside these datasets, two data competitions were held: one in 2020 (Kaggle, over 2,200 competitors) and one in 2021 (AIcrowd, over 400 competitors). The winning solutions are expected to be usable in plant phenotyping pipelines to robustly assess wheat spike density. We tested this hypothesis by evaluating the 2021 winning solution on an independent dataset. Its images were taken with the same acquisition protocol, and a human also counted the wheat heads for each of them, both in the field and in the image. Using triple collocation analysis, we show that the predicted density appears to be more reliable than either human density, whether measured in the field or in the image. Furthermore, we show that the Global Wheat Head Dataset can be used to estimate wheat ear density from drone images.
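Triple collocation estimates the error variance of three independent measurements of the same quantity without requiring the true value. As an illustration only, the sketch below implements the common covariance-based formulation. It assumes that each system (hypothetically labeled here as field count, image count, and model prediction) is linearly related to the unknown true density, and that the three errors are mutually uncorrelated and uncorrelated with the truth. Under these assumptions, the error variance of system x is Q_xx − Q_xy·Q_xz / Q_yz, where Q denotes a sample covariance. The function names and the synthetic numbers are hypothetical and are not taken from the study.

```python
import random
from typing import Dict, Sequence, Tuple


def _cov(a: Sequence[float], b: Sequence[float]) -> float:
    """Sample covariance (denominator n - 1) of two equal-length sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    return sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b)) / (n - 1)


def triple_collocation(
    x: Sequence[float], y: Sequence[float], z: Sequence[float]
) -> Tuple[Dict[str, float], Dict[str, float]]:
    """Covariance-based triple collocation (illustrative sketch).

    Assumes x = a_x + b_x * t + e_x (and likewise for y, z), with the errors
    e_x, e_y, e_z mutually uncorrelated and uncorrelated with the truth t.
    Returns, per system, the error standard deviation (in that system's own
    units) and the squared correlation with the unknown truth.
    """
    qxx, qyy, qzz = _cov(x, x), _cov(y, y), _cov(z, z)
    qxy, qxz, qyz = _cov(x, y), _cov(x, z), _cov(y, z)
    err_var = {
        "x": qxx - qxy * qxz / qyz,
        "y": qyy - qxy * qyz / qxz,
        "z": qzz - qxz * qyz / qxy,
    }
    rho2 = {
        "x": qxy * qxz / (qxx * qyz),
        "y": qxy * qyz / (qyy * qxz),
        "z": qxz * qyz / (qzz * qxy),
    }
    # Sampling noise can push a small variance estimate below zero; clip it.
    err_sd = {k: max(v, 0.0) ** 0.5 for k, v in err_var.items()}
    return err_sd, rho2


# Synthetic example with hypothetical values (not data from the study).
rng = random.Random(42)
n = 50_000
truth = [rng.gauss(400.0, 100.0) for _ in range(n)]
field = [t + rng.gauss(0.0, 30.0) for t in truth]                # noisiest
image = [t + rng.gauss(0.0, 20.0) for t in truth]
model = [0.9 * t + 20.0 + rng.gauss(0.0, 10.0) for t in truth]   # biased but least noisy

err_sd, rho2 = triple_collocation(field, image, model)
print({k: round(v, 1) for k, v in err_sd.items()})
print({k: round(v, 3) for k, v in rho2.items()})
```

Because each error standard deviation is expressed in its own system's units, the squared correlation with the truth (rho2) is a scale-free way to compare reliability. In this toy setup, the biased but less noisy "model" series should still rank as the most reliable of the three.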