Historical simulations from models participating in phase 6 of the Coupled Model Intercomparison Project (CMIP6) are evaluated over ten Australian regions for their performance in simulating extreme temperatures. Using two observational datasets, the Australian Water Availability Project (AWAP) and the Berkeley Earth Surface Temperatures (BEST), we first analyze the models’ ability to simulate the probability distributions of daily maximum and minimum temperature (TX and TN), and then the spatial patterns and temporal variations of temperature-related extreme indices defined by the Expert Team on Climate Change Detection and Indices (ETCCDI). Overall, the CMIP6 models perform comparably to CMIP5, with modest improvements. Compared to CMIP5, the CMIP6 ensemble tends to have narrower interquartile model ranges for some cold extremes, as well as narrower ensemble ranges in temporal trends for most indices. Over the southeast, tropical and southern south regions, both CMIP ensembles generally exhibit relatively large deficiencies in simulating temperature extremes. Models with relatively coarse resolution sometimes perform better, suggesting that the representation of some localized processes may need further improvement in finer-scale models. By also assessing the probability distributions of TX and TN, this study makes the evaluation of extreme temperatures more robust and lends more confidence to future projections. The findings demonstrate only incremental improvement in the simulation of extremes over Australia from CMIP5 to CMIP6. Nevertheless, they are useful for informing and interpreting future projections of temperature-related extremes over the region.