One of the largest sources of uncertainty in climate models comes from the parametrization of convective cloud systems, which is necessary at current horizontal resolutions of about 25 km. In order to fully resolve the effects of such systems, horizontal resolutions need to reach the 1-3 km range, which entails a huge increase in computational cost. In recent years, the performance of the world's largest supercomputers has steadily improved, to the point that we can now think of running a climate model at cloud-resolving resolution with a throughput, measured in Simulated Years Per Day (SYPD), that is suitable for decade- or century-long simulations. However, the architectures of the newest supercomputers are becoming more and more heterogeneous, which makes the task of maintaining a performant code base increasingly challenging. Here, we present our effort to upgrade the new non-hydrostatic atmosphere dycore of E3SM to a version that is highly performant on a variety of architectures, including GPUs, conventional CPUs, and many-core CPUs. When using GPUs, our implementation is one of the fastest, achieving 0.97 SYPD on the NGPPS benchmark when running on the full Summit supercomputer.