We explore how neural differential equations (NDEs) may be trained on highly resolved fluid-dynamical models of unresolved scales providing an ideal framework for data-driven parameterizations in climate models. NDEs overcome some of the limitations of traditional neural networks (NNs) in fluid dynamical applications in that they can readily incorporate conservation laws and boundary conditions and are stable when integrated over time. We advocate a method that employs a ‘residual’ approach, in which the NN is used to improve upon an existing parameterization through the representation of residual fluxes which are not captured by the base parameterization. This reduces the amount of training required and providing a method for capturing up-gradient and nonlocal fluxes. As an illustrative example, we consider the parameterization of free convection of the oceanic boundary layer triggered by buoyancy loss at the surface. We demonstrate that a simple parameterization of the process — convective adjustment — can be improved upon by training a NDE against highly resolved explicit models, to capture entrainment fluxes at the base of the well-mixed layer, fluxes that convective adjustment itself cannot represent. The augmented parameterization outperforms existing commonly used parameterizations such as the K-Profile Parameterization (KPP). We showcase that the NDE performs well independent of the time-stepper and that an online training approach using differentiable simulation via the Julia scientific machine learning software stack improves accuracy by an order-of-magnitude. We conclude that NDEs provide an exciting route forward to the development of representations of sub-grid-scale processes for climate science, opening up myriad new opportunities.