Overview of Generative Adversarial Networks
Generative Adversarial Networks (GANs) are an artificial intelligence
(AI) method for simulating samples from complex data distributions. GANs
cleverly harness neural networks, which are powerful and versatile at
learning approximations of arbitrary high-dimensional functions.
A GAN is a system of two neural networks, the generator and the
discriminator (Figure 1), which interact in a mutually competitive
(adversarial) supervised learning strategy. Samples (training data) from
the target data distribution serve as input for training the GAN. The
output is a learned model representing the underlying data distribution.
The input to the generator is a random variable vector drawn from a
latent space that provides a supply of random variables of suitable
dimensionality. The generator neural network transforms the input from
the latent space to synthesize generated data as output.
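As a minimal sketch of the generator's role, the following Python example draws a latent vector and passes it through a single randomly initialized layer. The names (sample_latent, make_generator, generate), the standard-normal latent distribution, the tanh activation, and the dimensions are illustrative assumptions and are not taken from the model described here; a practical generator would have several trained layers.

```python
import math
import random

def sample_latent(dim, rng):
    """Draw one latent vector z with independent standard-normal entries (assumed prior)."""
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]

def make_generator(latent_dim, data_dim, rng):
    """Create randomly initialized weights and zero biases for a one-layer generator."""
    weights = [[rng.gauss(0.0, 0.1) for _ in range(latent_dim)]
               for _ in range(data_dim)]
    biases = [0.0] * data_dim
    return weights, biases

def generate(params, z):
    """Map a latent vector z to one synthetic data point via tanh(W z + b)."""
    weights, biases = params
    return [math.tanh(sum(w * zi for w, zi in zip(row, z)) + b)
            for row, b in zip(weights, biases)]

rng = random.Random(0)                                 # fixed seed for reproducibility
params = make_generator(latent_dim=4, data_dim=3, rng=rng)
z = sample_latent(4, rng)                              # input from the latent space
x_fake = generate(params, z)                           # generated data (untrained)
```

Before training, the output is arbitrary; only the adversarial updates described below shape it toward the training data distribution.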
The inputs to the discriminator are instances of generated data and
training data. The discriminator neural network is a binary classifier
that is designed to determine whether a given input was drawn from the
training data or was generated. The classification errors are used to
compute the generator and discriminator loss functions.
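One common way to turn classification errors into loss functions is binary cross-entropy. The sketch below assumes this choice, together with the widely used "non-saturating" form of the generator loss; the text above does not specify the exact loss functions, so the function names and formulas here are illustrative only. The inputs d_real and d_fake stand for the discriminator's predicted probabilities that real and generated samples, respectively, are real.

```python
import math

def bce(prob, label, eps=1e-12):
    """Binary cross-entropy for one predicted probability and a 0/1 label."""
    p = min(max(prob, eps), 1.0 - eps)   # clip to avoid log(0)
    return -(label * math.log(p) + (1 - label) * math.log(1.0 - p))

def discriminator_loss(d_real, d_fake):
    """Low when real samples are scored near 1 and generated samples near 0."""
    real_term = sum(bce(p, 1) for p in d_real) / len(d_real)
    fake_term = sum(bce(p, 0) for p in d_fake) / len(d_fake)
    return real_term + fake_term

def generator_loss(d_fake):
    """Low when the discriminator scores generated samples as real (assumed non-saturating form)."""
    return sum(bce(p, 1) for p in d_fake) / len(d_fake)

# Hypothetical discriminator outputs for a batch of three real and three generated samples.
loss_d = discriminator_loss([0.9, 0.8, 0.7], [0.2, 0.1, 0.3])
loss_g = generator_loss([0.2, 0.1, 0.3])
```

The two losses pull in opposite directions on the generated samples: the discriminator is rewarded for scoring them near 0, the generator for having them scored near 1.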
Gradients of the loss functions, computed by backpropagation, are used
to update the parameters of the generator and discriminator neural
networks via gradient descent to enable supervised learning. The
training process in a GAN is
adversarial because the generator is trained to maximize the
discriminator's classification error, while the discriminator is trained
to minimize it. This adversarial strategy guides the generator
neural network to become increasingly proficient at synthesizing data
that approximates the training data distribution. Ideally, after
training is complete, the distribution of the generated output is
indistinguishable from the target data distribution. The performance of
the GAN is usually assessed on independent test data.
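The alternating updates can be illustrated with a deliberately tiny one-dimensional example in which the gradients are written out by hand. Everything in this sketch is an assumption made for illustration: the target distribution (a normal with mean 3.0 and standard deviation 0.5), the linear generator x = a*z + b, the logistic discriminator D(x) = sigmoid(w*x + c), the binary cross-entropy losses, and the hyperparameters. It is not the architecture or training procedure used for the biomarker models, and it makes no claim about convergence.

```python
import math
import random

def sigmoid(s):
    """Numerically stable logistic function."""
    if s >= 0:
        return 1.0 / (1.0 + math.exp(-s))
    e = math.exp(s)
    return e / (1.0 + e)

def train_toy_gan(steps=500, batch=32, lr=0.05, seed=0):
    """Alternate discriminator and generator gradient steps on a 1-D toy problem."""
    rng = random.Random(seed)
    a, b = 1.0, 0.0   # generator parameters: x_fake = a * z + b
    w, c = 0.0, 0.0   # discriminator parameters: D(x) = sigmoid(w * x + c)
    for _ in range(steps):
        real = [rng.gauss(3.0, 0.5) for _ in range(batch)]  # assumed target data
        z = [rng.gauss(0.0, 1.0) for _ in range(batch)]     # latent samples
        fake = [a * zi + b for zi in z]

        # Discriminator step: minimize -log D(real) - log(1 - D(fake)).
        gw = gc = 0.0
        for xr, xf in zip(real, fake):
            dr = sigmoid(w * xr + c)
            df = sigmoid(w * xf + c)
            gw += (dr - 1.0) * xr + df * xf
            gc += (dr - 1.0) + df
        w -= lr * gw / batch
        c -= lr * gc / batch

        # Generator step: minimize -log D(fake), i.e. try to fool the discriminator.
        ga = gb = 0.0
        for zi, xf in zip(z, fake):
            df = sigmoid(w * xf + c)
            ga += (df - 1.0) * w * zi   # chain rule through x_fake = a * z + b
            gb += (df - 1.0) * w
        a -= lr * ga / batch
        b -= lr * gb / batch
    return (a, b), (w, c)

generator_params, discriminator_params = train_toy_gan()
```

In practice the manual gradients would be replaced by automatic differentiation in a deep learning framework, and the quality of the generated samples would be checked against held-out data rather than inferred from the losses alone.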
The GAN approach was used to simulate high-dimensional joint
distributions of disease-relevant biomarkers of virtual patient
populations for pharmacometrics applications. Large, heterogeneous
public-domain datasets with biomedical information on diverse populations were
utilized for GAN training and performance evaluation.