Conditional GANs for Biomarker Distribution Simulations for
Under-represented Groups
Dataset: Conditional GAN analyses were conducted with the same
14-biomarker diabetes-relevant set and the same training and test
methods as in the previous High-Dimensional Biomarker Joint
Distribution Simulations section.
Data Pre-processing: The race/ethnicity variable was obtained from the
RIDRETH1 variable in the NHANES datasets. The Non-Hispanic Black
group was categorized as Black, the Mexican American and Other Hispanic
groups were categorized as Hispanic, the Non-Hispanic White group was
categorized as White, and the Other Race-Including Multi-Racial group
was categorized as Other.
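
The recoding described above can be sketched as a simple lookup. This is a minimal illustration rather than the authors' code: the numeric codes follow the NHANES demographics codebook for RIDRETH1, while the names RIDRETH1_TO_GROUP and recode_race are hypothetical.

```python
# RIDRETH1 codes as documented in the NHANES demographics codebook.
# The dictionary and function names are illustrative assumptions.
RIDRETH1_TO_GROUP = {
    1: "Hispanic",  # Mexican American
    2: "Hispanic",  # Other Hispanic
    3: "White",     # Non-Hispanic White
    4: "Black",     # Non-Hispanic Black
    5: "Other",     # Other Race - Including Multi-Racial
}


def recode_race(codes):
    """Map raw RIDRETH1 codes to the four derived categories."""
    return [RIDRETH1_TO_GROUP[int(code)] for code in codes]


groups = recode_race([1, 2, 3, 4, 5])
```

Raising a KeyError on an unexpected code, instead of silently assigning a default category, is a deliberate choice in this sketch so that missing or miscoded values surface early.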
GAN Architecture: The generator and discriminator architectures
were identical to those used for the High-Dimensional Biomarker Panel
Joint Distribution Simulations, except that the derived race/ethnicity
categories were one-hot encoded and appended to the biomarker input.
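
The conditioning step can be illustrated with a short NumPy sketch. It assumes the four derived categories in a fixed (arbitrary) order and uses random placeholder values in place of real biomarker data; the helper names one_hot and condition are hypothetical, and the same concatenation would presumably also be applied to the generator's input.

```python
import numpy as np

# Category order is an assumption; any fixed order works if used consistently.
CATEGORIES = ["Black", "Hispanic", "White", "Other"]


def one_hot(labels, categories=CATEGORIES):
    """Return an (n_samples, n_categories) one-hot matrix for the labels."""
    index = {category: i for i, category in enumerate(categories)}
    encoded = np.zeros((len(labels), len(categories)), dtype=np.float32)
    encoded[np.arange(len(labels)), [index[label] for label in labels]] = 1.0
    return encoded


def condition(features, labels):
    """Append the one-hot condition vector to each feature row."""
    return np.concatenate([features, one_hot(labels)], axis=1)


# Placeholder batch: 3 samples x 14 biomarkers (random values, not real data).
rng = np.random.default_rng(0)
biomarkers = rng.normal(size=(3, 14)).astype(np.float32)
disc_input = condition(biomarkers, ["Black", "White", "Other"])
```

With 14 biomarkers and four categories, each conditioned row has 18 columns, so the first layer of a network receiving this input would need to be widened accordingly.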
The model was trained for 1000 epochs with a batch size of 300 and five
discriminator steps.
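
One common reading of this schedule is that the discriminator is updated five times for every generator update. The skeleton below sketches that interpretation only; the text does not state the ratio explicitly, and run_schedule, d_step, g_step, and the sample count are hypothetical placeholders rather than the authors' training code.

```python
def run_schedule(n_samples, epochs=1000, batch_size=300, n_critic=5,
                 d_step=lambda batch_index: None, g_step=lambda: None):
    """Count updates under an assumed n_critic:1 discriminator/generator ratio."""
    d_updates = 0
    g_updates = 0
    batches_per_epoch = -(-n_samples // batch_size)  # ceiling division
    for _ in range(epochs):
        for batch_index in range(batches_per_epoch):
            d_step(batch_index)  # one discriminator update per mini-batch
            d_updates += 1
            if d_updates % n_critic == 0:
                g_step()  # one generator update every n_critic batches
                g_updates += 1
    return d_updates, g_updates


# Small illustrative run: 3000 hypothetical samples, 2 epochs.
d_updates, g_updates = run_schedule(n_samples=3000, epochs=2)
```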
Data Analysis: The high-dimensional distributions were
visualized using t-SNE and UMAP, and the univariate distributions of
the GAN-generated data and the test data were compared using box
plots.
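
The univariate comparison can be made concrete by computing the quantities a box plot displays. The sketch below assumes generated and test data are arrays of shape (n_samples, 14) and uses random placeholder values; box_stats and median_gap are hypothetical names, and the per-biomarker median gap is an illustrative summary rather than a metric reported in the study.

```python
import numpy as np


def box_stats(x):
    """Per-column quartiles and interquartile range, as drawn in a box plot."""
    q1, median, q3 = np.percentile(x, [25, 50, 75], axis=0)
    return {"q1": q1, "median": median, "q3": q3, "iqr": q3 - q1}


# Placeholder arrays standing in for the test data and GAN-generated samples.
rng = np.random.default_rng(0)
test_data = rng.normal(size=(500, 14))
generated = rng.normal(size=(500, 14))

test_stats = box_stats(test_data)
gen_stats = box_stats(generated)

# Absolute difference in medians, one value per biomarker.
median_gap = np.abs(test_stats["median"] - gen_stats["median"])
```

For the conditional setting, the same summary could be computed separately within each race/ethnicity category by filtering both arrays on the condition label first.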