Effective permeability is a key physical property of porous media that defines its ability to transport fluid. Digital rock physics combines modern tomographic imaging techniques with advanced numerical simulations to estimate effective rock properties. Digital rock physics is used to complement or replace expensive and time-consuming or impractical laboratory measurements. However, with increase in sample size to capture multimodal and multiscale microstructures, conventional approaches based on direct numerical simulation (DNS) are becoming very computationally intensive or even infeasible. To address this computational challenge, we propose a hierarchical homogenization method (HHM) with a data-driven surrogate model based on 3-D convolutional neural network (CNN) and transfer learning to estimate effective permeability of digital rocks with large sample sizes up to billions of voxels. This workflow (HHM-CNN) divides the large digital rock into small sub-volumes and predicts the sub-volume permeabilities through a CNN surrogate model of Stokes flow at the pore scale. The effective permeability of the full digital rock is then predicted by solving the Darcy equations efficiently on the upscaled model in which the permeability of each cell is assigned by the surrogate model. The proposed method is verified on micro-CT data of both sandstones and carbonates as well as the reconstructed high-resolution digital rock obtained by multiscale data fusion. The computed permeabilities of our proposed hierarchical approach are consistent with the results of the DNS on the full digital rock. Compared with conventional DNS algorithms, the proposed hierarchical approach can largely reduce the computational time and memory demand.