We present an efficient implementation of the analog ensemble algorithm, the distilled analog ensemble, achieved by distilling the post-processing transformation generated by the analog ensemble into a deep neural network. While the analog ensemble has been shown to improve deterministic forecasts and to produce calibrated probabilistic predictions in many contexts, a common obstacle to operationalizing a large, global analog-ensemble-based system is the volume of data required (a corpus of historical forecasts) and the latency incurred processing that data in the time-critical path of producing a forecast. Deep neural networks are high-capacity function approximators, and we demonstrate that we can train a network to memorize the post-processing behavior of the analog ensemble on a particular corpus of forecasts. This technique decouples the size of the historical forecast corpus (where larger is better for forecast skill) from the computation required to post-process the current forecast in real-time operations. We show that the distilled analog ensemble improves European Centre for Medium-Range Weather Forecasts (ECMWF) high-resolution deterministic forecasts of winds in the lower stratosphere, using either the ECMWF analysis or observations from Loon high-altitude balloons as ground truth. In this case, rather than requiring terabytes of historical forecast data to apply the conventional analog ensemble, we can perform the quality-improving post-processing on the fly with computationally efficient forward passes through a pre-trained network only hundreds of kilobytes in size.
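To make the distillation step concrete, the following is a minimal illustrative sketch, not the code used in this work: it generates analog-ensemble targets from a synthetic historical corpus (using a simple Euclidean similarity over predictors) and trains a small network to reproduce that mapping, so that only the network is needed at inference time. All names, sizes, and the similarity metric are assumptions for illustration.

```python
# Sketch of distilling an analog ensemble (AnEn) into a small network.
# Step 1: the conventional AnEn corrects a raw forecast by averaging the
# observations that verified the k most similar historical forecasts.
# Step 2: a small MLP is trained to reproduce that forecast -> corrected
# mapping, removing the historical corpus from the real-time path.
import numpy as np
import torch
import torch.nn as nn

rng = np.random.default_rng(0)

# Synthetic "historical corpus": raw forecasts and verifying observations.
n_hist, n_feat = 10_000, 8
hist_fcst = rng.normal(size=(n_hist, n_feat)).astype(np.float32)
hist_obs = (hist_fcst.sum(axis=1, keepdims=True) * 0.5
            + rng.normal(scale=0.1, size=(n_hist, 1))).astype(np.float32)

def anen_correct(fcst, k=25):
    """Conventional AnEn: mean observation over the k nearest historical
    forecasts (Euclidean distance in predictor space, an assumption)."""
    d = np.linalg.norm(hist_fcst - fcst, axis=1)
    idx = np.argpartition(d, k)[:k]
    return hist_obs[idx].mean(axis=0)

# Build a training set that memorizes the AnEn transformation.
train_x = rng.normal(size=(4_000, n_feat)).astype(np.float32)
train_y = np.stack([anen_correct(x) for x in train_x])

# A small MLP distills the forecast -> AnEn-output mapping.
net = nn.Sequential(nn.Linear(n_feat, 64), nn.ReLU(),
                    nn.Linear(64, 64), nn.ReLU(),
                    nn.Linear(64, 1))
opt = torch.optim.Adam(net.parameters(), lr=1e-3)
x_t, y_t = torch.from_numpy(train_x), torch.from_numpy(train_y)
for _ in range(500):
    opt.zero_grad()
    loss = nn.functional.mse_loss(net(x_t), y_t)
    loss.backward()
    opt.step()

# In operations, only the (kilobyte-scale) network is needed: a single
# forward pass replaces the nearest-neighbor search over the full corpus.
new_fcst = rng.normal(size=(1, n_feat)).astype(np.float32)
corrected = net(torch.from_numpy(new_fcst)).detach().numpy()
```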