Six conceptually different models of steady groundwater flow and conservative transport are applied to the heterogeneous MADE aquifer. Their predictive capability is assessed by comparing modelled and observed longitudinal mass distributions of the plume at different times during the MADE-1 experiment, as well as at a later time. The models differ in their conceptualization of the heterogeneous aquifer structure, in computational complexity, and in their use of permeability data obtained from various observation methods (DPIL, grain size analysis, pumping tests and flowmeter). The models rely solely on aquifer structure and flow data; none is calibrated against transport observations. Comparing model results through several measures (peak location, bulk mass and leading tail) shows that the predicted plumes agree reasonably well with observations, provided that the models share close values of a few key parameters: the mean velocity, a parameter reflecting log-conductivity variability, and a horizontal length scale related to the spatial correlation of conductivity. From a practitioner's perspective, this robustness of the models is an important and useful property. The model comparison provides insight into relevant features of transport in heterogeneous aquifers. After further validation by additional field experiments or numerical simulations, the results can be used to provide guidelines for selecting conceptual aquifer models, characterization strategies, quantitative models and their implementation for particular goals.
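The comparison measures named above can be illustrated with a small sketch that summarizes a binned longitudinal mass distribution m(x). The function name `plume_measures` and the exact definitions are hypothetical assumptions, not the study's own procedure: the peak location is taken as the bin with the largest mass, the bulk mass as the fraction of total mass inside a user-chosen window, and the leading tail as the fraction of mass beyond a user-chosen distance `x_tail`.

```python
import numpy as np


def plume_measures(x, mass, bulk_window, x_tail):
    """Summarize a binned longitudinal mass distribution m(x).

    Hypothetical definitions (assumptions for illustration only):
      - peak_x: bin location x holding the largest mass (first one if tied)
      - bulk:   fraction of total mass with bulk_window[0] <= x <= bulk_window[1]
      - tail:   fraction of total mass with x >= x_tail (leading edge)
    """
    x = np.asarray(x, dtype=float)
    mass = np.asarray(mass, dtype=float)
    total = mass.sum()
    if total <= 0:
        raise ValueError("total mass must be positive")
    frac = mass / total                       # normalize to unit total mass
    peak_x = x[np.argmax(mass)]               # location of the mass peak
    lo, hi = bulk_window
    bulk = frac[(x >= lo) & (x <= hi)].sum()  # mass fraction in the bulk window
    tail = frac[x >= x_tail].sum()            # mass fraction in the leading tail
    return {"peak_x": float(peak_x), "bulk": float(bulk), "tail": float(tail)}


# Synthetic (made-up) distributions, used only to show the comparison.
x = [0, 10, 20, 30, 40, 50]
observed = plume_measures(x, [5, 40, 25, 15, 10, 5], (0, 20), 40)
modelled = plume_measures(x, [10, 25, 30, 15, 12, 8], (0, 20), 40)
print(observed)
print(modelled)
```

Comparing the two dictionaries measure by measure (peak shifted from 10 to 20, bulk fraction 0.70 versus 0.65, tail fraction 0.15 versus 0.20 for these made-up numbers) mirrors, in simplified form, how a modelled plume can be judged against an observed one without reducing the comparison to a single misfit number.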