In this work, we develop a data-driven subgrid-scale (SGS) model using a fully convolutional neural network (CNN) for large eddy simulation (LES) of forced 2D turbulence. Forced 2D turbulence is a fitting prototype for many large-scale geophysical and environmental flows (where rotation and/or stratification dominate) and has been widely used as a testbed for novel techniques, including machine-learning-based SGS modeling. We first conduct direct numerical simulation (DNS) and obtain training, validation, and testing data sets by applying a Gaussian spatial filter to the DNS solution. With the filtered DNS (FDNS) data in hand, we train the CNN to predict the SGS term from the filtered state variables. A priori analysis shows that the CNN-predicted SGS term accurately captures the inter-scale energy transfer. A posteriori analysis indicates that the LES-CNN outperforms physics-based SGS models in both short-term prediction and long-term statistics. Although the CNN-based model is promising in predicting the SGS term, it requires large amounts of training data to perform satisfactorily. In the small-data limit, the LES-CNN generates artificial instabilities and thus leads to unphysical results. We propose three remedies that allow the CNN to work in the small-data limit: data augmentation and a group convolutional neural network (GCNN), both of which leverage the rotational equivariance of the SGS term, and a physical constraint on the SGS enstrophy transfer. In a square periodic flow field, the SGS term is both translationally and rotationally equivariant (the latter under rotations by multiples of 90°). While a standard CNN captures translational equivariance, rotational equivariance can be accounted for either by including rotated snapshots in the training data set or by using a GCNN, which enforces rotational equivariance as a hard constraint. Additionally, the SGS enstrophy transfer constraint can be implemented in the loss function of the CNN.
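The data-generation and augmentation steps described above can be sketched in NumPy. This is a minimal illustration under stated assumptions, not the authors' code: it assumes the vorticity-streamfunction form on a doubly periodic [0, 2π)² grid, a spectral Gaussian filter with transfer function exp(-|k|²Δ²/24), and the sign convention Π = J(ψ̄, ω̄) − overline{J(ψ, ω)} for the SGS term. All function names are hypothetical, and dealiasing is omitted for brevity. The `augment_c4` helper implements the data-augmentation remedy by appending 90°, 180°, and 270° rotations of each snapshot.

```python
import numpy as np

def wavenumbers(n, L=2 * np.pi):
    """Angular wavenumber grids (kx along axis 0, ky along axis 1) for an n x n periodic box."""
    k = 2 * np.pi * np.fft.fftfreq(n, d=L / n)
    return np.meshgrid(k, k, indexing="ij")

def gaussian_filter(f, delta, L=2 * np.pi):
    """Spectral Gaussian filter with (assumed) transfer function exp(-|k|^2 delta^2 / 24)."""
    kx, ky = wavenumbers(f.shape[0], L)
    G = np.exp(-(kx**2 + ky**2) * delta**2 / 24.0)
    return np.real(np.fft.ifft2(G * np.fft.fft2(f)))

def streamfunction(omega, L=2 * np.pi):
    """Solve lap(psi) = -omega spectrally; the mean of psi is set to zero."""
    kx, ky = wavenumbers(omega.shape[0], L)
    k2 = kx**2 + ky**2
    k2[0, 0] = 1.0                      # avoid 0/0 for the mean mode
    psi_hat = np.fft.fft2(omega) / k2
    psi_hat[0, 0] = 0.0
    return np.real(np.fft.ifft2(psi_hat))

def jacobian(a, b, L=2 * np.pi):
    """J(a, b) = a_x b_y - a_y b_x using spectral derivatives (no dealiasing)."""
    kx, ky = wavenumbers(a.shape[0], L)
    ah, bh = np.fft.fft2(a), np.fft.fft2(b)
    d = lambda fh, k: np.real(np.fft.ifft2(1j * k * fh))
    return d(ah, kx) * d(bh, ky) - d(ah, ky) * d(bh, kx)

def sgs_term(omega, delta, L=2 * np.pi):
    """Assumed convention: Pi = J(psi_bar, omega_bar) - filter(J(psi, omega))."""
    psi = streamfunction(omega, L)
    omega_bar = gaussian_filter(omega, delta, L)
    psi_bar = gaussian_filter(psi, delta, L)
    resolved = jacobian(psi_bar, omega_bar, L)
    return resolved - gaussian_filter(jacobian(psi, omega, L), delta, L)

def augment_c4(inputs, targets):
    """Append 90, 180, and 270 degree rotations of each (n, n) snapshot in a batch."""
    xs = [np.rot90(inputs, k, axes=(-2, -1)) for k in range(4)]
    ys = [np.rot90(targets, k, axes=(-2, -1)) for k in range(4)]
    return np.concatenate(xs), np.concatenate(ys)

if __name__ == "__main__":
    n, delta = 32, 2 * np.pi / 8
    x = np.linspace(0, 2 * np.pi, n, endpoint=False)
    X, Y = np.meshgrid(x, x, indexing="ij")
    # Band-limited test vorticity (illustrative only, not DNS data)
    omega = np.sin(X) * np.cos(2 * Y) + 0.5 * np.cos(3 * X + Y)
    pi = sgs_term(omega, delta)
    # Rotating the input by 90 degrees rotates the SGS term in the same way
    assert np.allclose(sgs_term(np.rot90(omega), delta), np.rot90(pi))
```

The final check illustrates why rotated snapshots are valid training pairs: for band-limited fields on a square periodic grid, rotating the input by 90° rotates the SGS term identically, so a CNN trained on the augmented set sees each flow pattern in all four orientations. A GCNN instead builds this symmetry into its layers rather than learning it from data.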
A priori and a posteriori analyses show that the CNN/GCNN with knowledge of, or constraints on, rotational equivariance and SGS enstrophy transfer improves the SGS model and allows the data-driven model to work stably and accurately in the small-data limit. These findings can potentially help ongoing efforts to use machine learning for SGS modeling in weather and climate models, where high-quality training data are scarce and instabilities have been reported in many past studies.
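The enstrophy-transfer constraint can be added as a penalty term in the training loss. The NumPy sketch below is hypothetical: it assumes the constraint penalizes the mismatch in the domain-averaged SGS enstrophy transfer ⟨ω̄Π⟩ between the CNN prediction and the FDNS truth, with a user-chosen weight `lam`, and it pairs this with the Pearson correlation coefficient as an assumed a priori metric. In practice the loss would be written in an automatic-differentiation framework so gradients reach the CNN weights; these functions only evaluate it.

```python
import numpy as np

def correlation(pi_pred, pi_true):
    """Pearson correlation coefficient between predicted and true SGS fields."""
    a = pi_pred - pi_pred.mean()
    b = pi_true - pi_true.mean()
    return float(np.sum(a * b) / np.sqrt(np.sum(a * a) * np.sum(b * b)))

def enstrophy_transfer(omega_bar, pi):
    """Domain-averaged SGS enstrophy transfer <omega_bar * Pi>: (batch, n, n) -> (batch,)."""
    return np.mean(omega_bar * pi, axis=(-2, -1))

def constrained_loss(pi_pred, pi_true, omega_bar, lam=1.0):
    """Hypothetical loss: MSE on Pi plus lam times the squared enstrophy-transfer mismatch."""
    mse = np.mean((pi_pred - pi_true) ** 2)
    pz_err = enstrophy_transfer(omega_bar, pi_pred) - enstrophy_transfer(omega_bar, pi_true)
    return float(mse + lam * np.mean(pz_err ** 2))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    pi_true = rng.standard_normal((8, 32, 32))    # stand-in for FDNS SGS terms
    omega_bar = rng.standard_normal((8, 32, 32))  # stand-in for filtered vorticity
    pi_pred = pi_true + 0.1 * rng.standard_normal(pi_true.shape)
    print(correlation(pi_pred, pi_true), constrained_loss(pi_pred, pi_true, omega_bar))
```

Setting `lam=0` recovers a plain mean-squared-error loss; a larger `lam` trades pointwise accuracy for closer agreement in the net transfer of enstrophy between resolved and subgrid scales.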