Adjoint tomography (i.e., full-waveform inversion) has recently been applied to ambient seismic noise and teleseismic P waves separately to unveil fine-scale lithospheric structures beyond the resolving ability of traditional ray-based traveltime tomography. In this study, we propose a joint inversion scheme that alternates between frequency-dependent traveltime inversions of ambient noise surface waves and waveform inversions of teleseismic P waves to take advantage of their complementary sensitivities to the Earth’s structure. We apply our method to ambient noise empirical Green’s functions from 60 virtual sources and to direct P and scattered waves from 11 teleseismic events, recorded by a dense linear array (~7 km station spacing) and other regional stations (~40 km average station spacing) in central California. To evaluate the performance of the method, we compare tomographic results from ambient noise adjoint tomography, full-waveform inversion of teleseismic P waves, and the joint inversion of the two data sets. Both the field-data applications and the synthetic checkerboard tests demonstrate the advantage of the joint inversion over the individual inversions, as it combines the complementary sensitivities of the two independent data sets towards a more unified model. The 3D model from our joint inversion not only shows major features of velocity anomalies and discontinuities in agreement with previous studies, but also reveals small-scale heterogeneities that provide new constraints on the geometry of the Isabella Anomaly and mantle dynamic processes in central California. The proposed joint inversion scheme can be applied to other regions with similar array deployments for high-resolution lithospheric imaging.
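The idea of alternating between two data sets with complementary sensitivities can be illustrated with a toy linear problem. The sketch below is not the inversion workflow of this study: it replaces the wave-equation adjoint simulations with two hypothetical one-row sensitivity matrices (`G_a` standing in for surface-wave traveltimes, mostly sensitive to a "shallow" parameter, and `G_b` standing in for teleseismic P waveforms, mostly sensitive to a "deep" parameter). All names, values, and the step length are assumptions chosen for illustration only.

```python
import numpy as np

def misfit_gradient(G, d_obs, m):
    """Gradient of the least-squares misfit 0.5 * ||G m - d_obs||^2 with respect to m."""
    return G.T @ (G @ m - d_obs)

def single_inversion(G, d_obs, m0, step=0.5, n_iter=200):
    """Plain gradient descent on one data set only."""
    m = m0.copy()
    for _ in range(n_iter):
        m = m - step * misfit_gradient(G, d_obs, m)
    return m

def alternating_inversion(G_a, d_a, G_b, d_b, m0, step=0.5, n_iter=200):
    """Alternate one update on data set A with one update on data set B."""
    m = m0.copy()
    for _ in range(n_iter):
        m = m - step * misfit_gradient(G_a, d_a, m)  # e.g. surface-wave step
        m = m - step * misfit_gradient(G_b, d_b, m)  # e.g. teleseismic step
    return m

# Hypothetical two-parameter model: [shallow anomaly, deep anomaly].
m_true = np.array([1.0, -0.5])

# Hypothetical sensitivities: each data set constrains mainly one parameter.
G_a = np.array([[1.0, 0.1]])
G_b = np.array([[0.1, 1.0]])

# Noise-free synthetic data generated from the true model.
d_a = G_a @ m_true
d_b = G_b @ m_true

m0 = np.zeros(2)
m_a = single_inversion(G_a, d_a, m0)
m_b = single_inversion(G_b, d_b, m0)
m_joint = alternating_inversion(G_a, d_a, G_b, d_b, m0)

print("data set A only:", m_a)
print("data set B only:", m_b)
print("alternating    :", m_joint)
print("true model     :", m_true)
```

In this toy setting each individual inversion is underdetermined and recovers only the parameter combination its data set is sensitive to, whereas the alternating updates converge to the true model. Real applications differ in important ways (nonlinearity, noise, regularization, and the relative weighting of the two data sets), none of which are represented here.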