Recent research has shown that machine learning, in particular deep learning, can be applied with great success to a multitude of seismological tasks, e.g., phase picking and earthquake localization. One reason is that neural networks can be used as feature extractors, generating generally applicable representations of complex data. We employ a convolutional network to condense earthquake waveforms from a varying set of stations into a high-dimensional vector, which we call an event embedding. For each event, the embedding is calculated from instrument-corrected waveforms beginning at the first P pick and is updated continuously with incoming data. We employ event embeddings for real-time magnitude estimation, earthquake localization and ground motion prediction, which are central tasks for early warning and for guiding rapid disaster response. We evaluate our model on the IPOC catalog for Northern Chile, containing ∼100,000 events with low-uncertainty hypocenters and magnitude estimates. We split the catalog sequentially into a training and a test set, with the 2014 Iquique event (Mw 8.1) and its fore- and aftershocks contained in the test set. In preliminary results, the system achieves a test RMSE of 0.28 magnitude units (m.u.) in magnitude and 35 km in hypocentral distance 1 s after the first P arrival at the nearest station, which improves to 0.17 m.u. and 22 km after 5 s and to 0.11 m.u. and 15 km after 25 s. As applications in the hazard domain require proper uncertainty estimates, we propose a probabilistic model using Gaussian mixture density networks. By analyzing the calibration of the predictions, we show that the model exhibits overconfidence, i.e., overly optimistic confidence intervals. We show that deep ensembles substantially improve calibration. To assess the limitations of our model and elucidate the pitfalls of machine learning for early warning in general, we conduct an error analysis and discuss mitigation strategies.
Despite the size of our catalog, we observe issues with two kinds of data sparsity. First, we observe increased residuals for the source parameters of the largest events, as training data for these events is scarce. Second, similar inaccuracies occur in areas without events of a certain size in the training catalog. We investigate the impact of these limitations on the Iquique fore- and aftershocks.
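The calibration analysis of the Gaussian mixture predictions and the effect of ensembling can be illustrated with a minimal sketch. It is not the implementation used in this work: the function names, the one-dimensional (magnitude-only) setting, and the choice to combine ensemble members as an equally weighted mixture of their mixtures are assumptions made for illustration. For a well-calibrated model, the predictive CDF evaluated at the true value is uniformly distributed, so the observed frequency at each nominal probability level should match that level.

```python
import math
import numpy as np

# Element-wise error function (stdlib math.erf wrapped for arrays).
_erf = np.vectorize(math.erf)


def mixture_cdf(y, weights, means, sigmas):
    """CDF of a 1-D Gaussian mixture evaluated at the scalar y."""
    z = (y - means) / (sigmas * math.sqrt(2.0))
    return float(np.sum(weights * 0.5 * (1.0 + _erf(z))))


def ensemble_mixture(members):
    """Combine ensemble members into one mixture.

    members: list of (weights, means, sigmas) tuples, one per network.
    Assumption: members are pooled with equal weight 1 / len(members).
    """
    n = len(members)
    weights = np.concatenate([m[0] for m in members]) / n
    means = np.concatenate([m[1] for m in members])
    sigmas = np.concatenate([m[2] for m in members])
    return weights, means, sigmas


def calibration_curve(pit_values, levels):
    """Observed fraction of CDF values at or below each nominal level.

    pit_values: predictive CDF evaluated at the true value, one per event.
    A calibrated model gives a curve close to the diagonal; a curve that
    is too high at low levels and too low at high levels indicates
    overconfidence (true values fall in the tails too often).
    """
    pit_values = np.asarray(pit_values)
    return np.array([np.mean(pit_values <= p) for p in levels])
```

In use, one would evaluate `mixture_cdf` at the catalog magnitude of every test event, once with a single network's mixture and once with the output of `ensemble_mixture`, and compare the two resulting curves from `calibration_curve` against the diagonal.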