Machine learning algorithms have been applied to shear wave velocity (Vs) inversion in surface wave tomography, where a set of 1-D Vs profiles and the corresponding synthetic dispersion curves are used to train a network. Previous studies showed that the performance of a trained network depends on the input training dataset, which has limited diversity, and that such networks therefore lack generalizability. Here, we present an improved network based on a semi-supervised algorithm that takes both model-generated and observed surface wave dispersion data in the training process. The algorithm is termed Wasserstein cycle-consistent generative adversarial networks (Wcycle-GAN). Unlike conventional supervised approaches, the GAN architecture extracts features from the observed surface wave dispersion data that can compensate for the limited diversity of the synthetically generated training dataset. The cycle-consistency constraint enforces the ability to reconstruct the input data from the predicted model using a separate data-generating network, while the Wasserstein metric provides improved training stability and enhanced spatial smoothness of the output Vs model. We demonstrate these improvements by applying the Wcycle-GAN method to 4076 pairs of fundamental-mode Rayleigh wave phase and group velocity dispersion curves obtained in Southern California. The final 3-D Vs model from the best-trained network shows large-scale features that are consistent with the surface geology. Our Vs model has smaller data misfits, yields better spatial smoothing, and provides sharper images of structures near faults in the top 15 km, suggesting that the proposed Wcycle-GAN algorithm has greater training stability and generalization ability than conventional machine learning methods.
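
As an illustration of how the loss terms described above might fit together, the following is a minimal NumPy sketch and not the implementation used in this study. The linear maps standing in for the inversion, data-generating, and critic networks, the array sizes, the weight `lam_cycle`, and all function names are hypothetical. The sketch also assumes that the critic compares synthetic training Vs profiles with Vs profiles predicted from observed data, and it omits the Lipschitz constraint (for example weight clipping or a gradient penalty) that a Wasserstein critic needs in practice.

```python
import numpy as np

rng = np.random.default_rng(0)

N_PERIODS = 17  # hypothetical number of periods per dispersion curve
N_LAYERS = 30   # hypothetical number of depth layers in a 1-D Vs profile
N_DATA = 2 * N_PERIODS  # phase and group velocity concatenated

# Linear stand-ins for the three networks (assumptions for illustration only).
W_inv = rng.normal(scale=0.1, size=(N_DATA, N_LAYERS))  # dispersion -> Vs
W_fwd = rng.normal(scale=0.1, size=(N_LAYERS, N_DATA))  # Vs -> dispersion
w_critic = rng.normal(scale=0.1, size=N_LAYERS)         # Vs -> scalar score


def invert(disp):
    """Inversion network: predict Vs profiles from dispersion curves."""
    return disp @ W_inv


def forward(vs):
    """Separate data-generating network: predict dispersion from Vs."""
    return vs @ W_fwd


def critic(vs):
    """Critic: one unbounded score per Vs profile (no sigmoid)."""
    return vs @ w_critic


def cycle_loss(disp):
    """Mean absolute error between input data and data rebuilt from the predicted model."""
    return float(np.mean(np.abs(forward(invert(disp)) - disp)))


def wasserstein_critic_loss(vs_real, vs_fake):
    """Quantity the critic minimizes: E[critic(fake)] - E[critic(real)]."""
    return float(np.mean(critic(vs_fake)) - np.mean(critic(vs_real)))


def generator_loss(disp_obs, disp_syn, vs_syn, lam_cycle=10.0):
    """Adversarial + cycle terms on observed data, supervised term on synthetic pairs."""
    adversarial = -float(np.mean(critic(invert(disp_obs))))
    supervised = float(np.mean((invert(disp_syn) - vs_syn) ** 2))
    return adversarial + lam_cycle * cycle_loss(disp_obs) + supervised


# Random placeholder batches, not real data.
disp_obs = rng.normal(size=(8, N_DATA))    # "observed" dispersion curves
disp_syn = rng.normal(size=(8, N_DATA))    # synthetic dispersion curves
vs_syn = rng.normal(size=(8, N_LAYERS))    # matching synthetic Vs profiles

print("critic loss:", wasserstein_critic_loss(vs_syn, invert(disp_obs)))
print("generator loss:", generator_loss(disp_obs, disp_syn, vs_syn))
```

In this sketch the observed curves enter only through the adversarial and cycle terms, which need no Vs labels, while the synthetic pairs supply the supervised term; this is one plausible reading of the semi-supervised training described above.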