Overview of Generative Adversarial Networks
Generative Adversarial Networks (GANs) are an artificial intelligence (AI) method for generating synthetic samples from complex data distributions. GANs build on neural networks, which are powerful and versatile approximators of high-dimensional functions.
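A GAN is a system of two neural networks, the generator and the discriminator (Figure 1), which are trained against each other in a mutually competitive (adversarial) strategy. The training signal is supervised in a narrow sense: the labels "training" and "generated" are known by construction, so the data need no manual annotation. Samples (training data) from the target data distribution serve as input for training the GAN. The output is a trained generator that serves as a learned model of the underlying data distribution.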
The input to the generator is a random vector drawn from a latent space, typically a simple distribution (for example, a standard normal) of suitable dimensionality. The generator neural network transforms this latent vector into generated (synthetic) data as output.
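The mapping from latent space to generated data can be illustrated with a minimal sketch. It assumes a standard-normal latent distribution and a toy one-layer generator with a tanh output; the names sample_latent and make_generator, the dimensions, and the random initialization are hypothetical choices for illustration, not the architecture used in this work.

```python
import math
import random


def sample_latent(latent_dim, rng):
    """Draw one latent vector z with independent standard-normal components."""
    return [rng.gauss(0.0, 1.0) for _ in range(latent_dim)]


def make_generator(latent_dim, data_dim, rng):
    """Build a toy one-layer generator x = tanh(W z + b).

    W and b are randomly initialized here (an assumption); in a real GAN
    they are the trainable parameters of a deeper network.
    """
    weights = [[rng.gauss(0.0, 0.5) for _ in range(latent_dim)]
               for _ in range(data_dim)]
    biases = [0.0] * data_dim

    def generator(z):
        # Affine map followed by tanh, so each output lies in [-1, 1].
        return [math.tanh(sum(w * zi for w, zi in zip(row, z)) + b)
                for row, b in zip(weights, biases)]

    return generator


rng = random.Random(0)  # fixed seed so the sketch is reproducible
generator = make_generator(latent_dim=4, data_dim=2, rng=rng)
z = sample_latent(4, rng)   # input: one point in the latent space
x_fake = generator(z)       # output: one generated data point
```

Each new latent draw yields a new generated sample, which is how a trained generator can supply an arbitrary number of synthetic data points.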
The inputs to the discriminator are instances of generated data and training data. The discriminator neural network is a binary classifier designed to determine whether a given input was drawn from the training data or was produced by the generator. Its classification errors are used to compute the generator and discriminator loss functions.
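As a hedged illustration of how classification outputs become loss values, the sketch below uses a toy logistic-regression discriminator and the binary cross-entropy losses of the original minimax GAN formulation. The function names, weights, and example inputs are assumptions made for illustration; the loss functions actually used in this work may differ.

```python
import math


def sigmoid(t):
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def discriminator(x, weights, bias):
    """Toy discriminator: estimated probability that x is training data."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


def discriminator_loss(p_real, p_fake, eps=1e-12):
    """Binary cross-entropy with label 1 for training data, 0 for generated data."""
    real_term = -sum(math.log(p + eps) for p in p_real) / len(p_real)
    fake_term = -sum(math.log(1.0 - p + eps) for p in p_fake) / len(p_fake)
    return real_term + fake_term


def generator_loss(p_fake, eps=1e-12):
    """Minimax generator loss: mean of log(1 - D(G(z))).

    Minimizing it pushes D(G(z)) toward 1, i.e. toward discriminator error.
    """
    return sum(math.log(1.0 - p + eps) for p in p_fake) / len(p_fake)


# Hypothetical discriminator parameters and small example batches.
weights, bias = [1.0, -1.0], 0.0
p_real = [discriminator(x, weights, bias) for x in ([2.0, 0.0], [1.5, -0.5])]
p_fake = [discriminator(x, weights, bias) for x in ([-1.0, 1.0], [0.0, 2.0])]
d_loss = discriminator_loss(p_real, p_fake)
g_loss = generator_loss(p_fake)
```

The two losses pull in opposite directions on the same quantity, the discriminator's output on generated data. In practice, a non-saturating generator loss, the mean of -log D(G(z)), is a common alternative to the minimax form shown here.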
Gradients of the loss functions, computed by backpropagation, are used to update the parameters of the generator and discriminator neural networks via gradient descent. The training process in a GAN is adversarial because the generator is trained to maximize the discriminator's classification error, while the discriminator is trained to minimize it. This adversarial strategy guides the generator to become increasingly proficient at synthesizing data that approximates the training data distribution. Ideally, after training is complete, generated samples are statistically indistinguishable from samples drawn from the target data distribution. The performance of the GAN is usually assessed on independent test data.
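The alternating updates described above can be sketched on a one-dimensional toy problem. Everything in this example is an assumption for illustration: the target distribution is a normal with mean 3, the generator is the linear map G(z) = a*z + b, the discriminator is D(x) = sigmoid(w*x + c), and the gradients are written out by hand instead of using automatic differentiation. It is not the training setup of this work, and such a simple adversarial game is not guaranteed to converge.

```python
import math
import random


def sigmoid(t):
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def train_toy_gan(steps=2000, batch_size=32, lr=0.01, seed=0):
    """Alternate discriminator and generator gradient steps on 1-D data."""
    rng = random.Random(seed)
    a, b = 1.0, 0.0  # generator parameters: G(z) = a*z + b
    w, c = 0.0, 0.0  # discriminator parameters: D(x) = sigmoid(w*x + c)
    for _ in range(steps):
        real = [rng.gauss(3.0, 1.0) for _ in range(batch_size)]   # training data
        noise = [rng.gauss(0.0, 1.0) for _ in range(batch_size)]  # latent draws
        fake = [a * z + b for z in noise]                         # generated data

        # Discriminator step: minimize -log D(x_real) - log(1 - D(x_fake)).
        grad_w = grad_c = 0.0
        for x_r, x_f in zip(real, fake):
            d_r = sigmoid(w * x_r + c)
            d_f = sigmoid(w * x_f + c)
            grad_w += (d_r - 1.0) * x_r + d_f * x_f
            grad_c += (d_r - 1.0) + d_f
        w -= lr * grad_w / batch_size
        c -= lr * grad_c / batch_size

        # Generator step: minimize log(1 - D(G(z))), which is equivalent to
        # maximizing the discriminator's error on generated data.
        grad_a = grad_b = 0.0
        for z in noise:
            d_f = sigmoid(w * (a * z + b) + c)
            grad_a += -d_f * w * z
            grad_b += -d_f * w
        a -= lr * grad_a / batch_size
        b -= lr * grad_b / batch_size
    return a, b, w, c


a, b, w, c = train_toy_gan()
```

Because this discriminator is linear in x, it can only detect a shift in location, so the sketch can at best pull the generator offset b toward the target mean; matching richer features of a distribution requires more expressive networks.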
The GAN approach was used to simulate high-dimensional joint distributions of disease-relevant biomarkers in virtual patient populations for pharmacometrics applications. Large, heterogeneous public-domain datasets with biomedical information on diverse populations were used for GAN training and performance evaluation.