The annual area burned due to wildfires in the western United States (WUS) increased by more than 300% between 1984 and 2020. However, accounting for the nonlinear, spatially heterogeneous interactions among climate, vegetation, and human predictors that drive trends in fire frequency and size at different spatial scales remains a challenging problem for statistical fire models. Here we introduce a novel stochastic machine learning (ML) framework to model observed fire frequencies and sizes in 12 km × 12 km grid cells across the WUS. The framework is implemented using Mixture Density Networks trained on a wide suite of input predictors. Modeled WUS fire frequency corresponds well with observations at both monthly (r = 0.94) and annual (r = 0.85) timescales, as does area burned at monthly (r = 0.90) and annual (r = 0.88) timescales. Moreover, the annual time series of both fire variables exhibit strong correlations (r ≥ 0.6) in 16 out of 18 ecoregions. Our ML model captures the interannual variability and the distinct multidecadal increases in annual area burned for both forested and non-forested ecoregions. Evaluating predictor importance with Shapley additive explanations (SHAP), we find that fire-month vapor pressure deficit (VPD) is the dominant driver of fire frequencies and sizes across the WUS, followed by 1000-hour dead fuel moisture (FM1000), total monthly precipitation (Prec), mean daily maximum temperature (Tmax), and the fraction of grassland cover in a grid cell. Our findings highlight a promising use case of ML techniques for wildfire prediction in particular and extreme event modeling more broadly.
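To make the two named techniques concrete, the sketch below shows, under stated assumptions, how a Mixture Density Network and a SHAP importance analysis fit together. This is an illustrative reconstruction, not the authors' code: it assumes PyTorch, a Gaussian mixture head over a scalar target (the paper's mixture family is not specified in the abstract), an arbitrary predictor count of 8 standing in for VPD, FM1000, Prec, Tmax, land-cover fractions, and the rest, and `shap.KernelExplainer` as one of several SHAP variants.

```python
import numpy as np
import shap
import torch
import torch.nn as nn

class MDN(nn.Module):
    """MLP backbone with a Gaussian-mixture head over a scalar target."""
    def __init__(self, n_predictors, n_components=3, hidden=64):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Linear(n_predictors, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.logits = nn.Linear(hidden, n_components)      # mixture weights (pre-softmax)
        self.means = nn.Linear(hidden, n_components)       # component means
        self.log_sigmas = nn.Linear(hidden, n_components)  # log std devs, keeps sigma > 0

    def forward(self, x):
        h = self.backbone(x)
        return self.logits(h), self.means(h), self.log_sigmas(h)

def mdn_nll(logits, means, log_sigmas, y):
    """Negative log-likelihood of y under the predicted Gaussian mixture."""
    gmm = torch.distributions.MixtureSameFamily(
        torch.distributions.Categorical(logits=logits),
        torch.distributions.Normal(means, log_sigmas.exp()),
    )
    return -gmm.log_prob(y).mean()

# Toy training step: x stands in for per-cell-month predictors, y for the
# (transformed) fire variable, e.g., log area burned in a grid cell.
torch.manual_seed(0)
model = MDN(n_predictors=8)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
x, y = torch.randn(256, 8), torch.randn(256)
loss = mdn_nll(*model(x), y)
opt.zero_grad()
loss.backward()
opt.step()

# Predictor importance: explain the mixture-mean prediction with the shap
# package, then rank predictors by mean absolute SHAP value.
def predict_mean(x_np):
    logits, means, _ = model(torch.as_tensor(x_np, dtype=torch.float32))
    return (torch.softmax(logits, -1) * means).sum(-1).detach().numpy()

explainer = shap.KernelExplainer(predict_mean, x[:50].numpy())
shap_values = explainer.shap_values(x[:10].numpy())
print(np.abs(shap_values).mean(axis=0))  # mean |SHAP| per predictor
```

Because the network predicts a full mixture distribution rather than a point value, fire sizes can be sampled stochastically from the fitted mixture, which is what makes the framework stochastic in the sense the abstract describes.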