Extreme ultraviolet images taken by the Atmospheric Imaging Assembly on board the Solar Dynamics Observatory make it possible to use deep vision techniques to forecast solar wind speed, which remains a difficult, high-impact, and unsolved problem. At a four-day time horizon, this study uses attention-based models and a set of methodological improvements to deliver an 11.1% lower RMSE and a 17.4% higher prediction correlation relative to previous work when tested on the period from 2010 to 2018. Our analysis shows that attention-based models combined with our pipeline consistently outperform convolutional alternatives. Our model has learned relationships between the characteristics of coronal holes and the speed of their associated high-speed streams, in agreement with empirical results. We also find that the performance of our best model depends strongly on the position in the solar cycle, with the best performance occurring in the declining phase.